Internationalization (i18n)
Long-term good, but some short-term chaos
-
Ahead of its time
XML 1.0 is built on a foundation of Unicode characters, giving it a wide and well-defined range of characters to use, rules for processing them, and different encodings for transmitting them. However, broad adoption of Unicode has been extremely slow.
-
The Encoding Declaration and auto-detection schemes
The XML declaration can carry an encoding declaration, but reading it requires already knowing enough about the encoding to decode the document's first bytes. Sometimes out-of-band information makes this easy (MIME types support a charset parameter), but sometimes it's impossible. Users of non-Unicode encodings have to rely on this declaration for any significant degree of interoperability, yet some applications (notably HTML browsers) mistake the declaration for document text.
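The chicken-and-egg problem described above is usually handled by inspecting the first few bytes before decoding anything. The following is a minimal sketch in Python, loosely modeled on the auto-detection approach outlined in the XML 1.0 specification's non-normative appendix. The function name `sniff_xml_encoding` is hypothetical, and the sketch deliberately ignores UTF-32, EBCDIC, and any charset supplied by a transport such as HTTP.

```python
import re

# Matches an encoding pseudo-attribute inside an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its leading bytes.

    Hypothetical helper: a simplified sketch, not a complete detector.
    """
    # Step 1: a byte order mark identifies the encoding outright.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"  # assumption: UTF-32 is not handled here

    # Step 2: no BOM, but "<?" in 16-bit units reveals UTF-16 byte order.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"

    # Step 3: for ASCII-compatible encodings the declaration is readable
    # as bytes, so the declared name can be extracted directly.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()

    # Step 4: with no BOM and no declaration, XML defaults to UTF-8.
    return "utf-8"
```

A caller would pass the first chunk of a file or network response and then decode the whole document with the returned name. When a transport-level charset is available, it would normally be consulted before this kind of sniffing.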
-
The shift from ISO-8859-1 (Latin-1) and Windows CP-1252
American users will likely have the easiest time transitioning to UTF-8, because ASCII is a strict subset of it: every ASCII document is already a valid UTF-8 document. For the rest of the world, there are many issues to address, including encoding conversions and (in some cases) handling characters that don't have a home in Unicode. These problems range from the minor to the intractable.
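A short Python sketch can illustrate both halves of this point: ASCII text needs no conversion at all, while Latin-1 and CP-1252 text must be transcoded and can silently disagree about the same byte. The helper name `transcode_to_utf8` and the sample strings are illustrative assumptions, not part of any particular toolchain.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Hypothetical helper for illustration only.
    """
    return data.decode(source_encoding).encode("utf-8")

# ASCII is a subset of UTF-8: the bytes are identical either way.
ascii_text = "Hello, XML"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Latin-1 stores U+00E9 (e with acute accent) as one byte; UTF-8 needs two.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
utf8_bytes = transcode_to_utf8(latin1_bytes, "iso-8859-1")

# Byte 0x80 is the euro sign in CP-1252 but a C1 control character in
# ISO-8859-1, so mislabeling one encoding as the other corrupts the text.
as_cp1252 = b"\x80".decode("cp1252")
as_latin1 = b"\x80".decode("iso-8859-1")
```

Here `same_bytes` is `True`, `latin1_bytes` is `b'caf\xe9'`, and `utf8_bytes` is `b'caf\xc3\xa9'`. The last two lines show why a document declared as ISO-8859-1 but actually saved as CP-1252 is a common source of trouble: the same byte decodes to U+20AC in one case and U+0080 in the other.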