Meta Charset Update

Anne van Kesteren: If your host has already configured your server like this you can not alter the character encoding using a META element. Every document that suggests otherwise is incorrect.
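Anne's rule amounts to a precedence order. A charset in the HTTP Content-Type header wins, and a META declaration is consulted only when HTTP is silent. Below is a minimal Python sketch of that order, under stated assumptions. `resolve_charset` and `META_CHARSET` are hypothetical names, and the regular expression is a rough stand-in for the much more careful META prescan a real browser performs.

```python
import re
from email.message import Message
from typing import Optional

# Hypothetical, simplified META scan: finds the first charset=... inside a <meta> tag.
# A real browser's prescan is far more careful than this regex.
META_CHARSET = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)

def resolve_charset(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the charset to use: the HTTP header first, then META, else None."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        http_charset = msg.get_content_charset()  # lowercased, or None if absent
        if http_charset:
            return http_charset  # HTTP has primacy; META is never consulted
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # nothing declared: do not invent a default

print(resolve_charset("text/html; charset=utf-8", b'<meta charset="iso-8859-1">'))  # utf-8
```

When neither source declares a charset, the sketch returns None rather than assuming one, leaving any fallback (sniffing, a user setting) to the caller.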

Nearly ten months ago, I set out to tackle internationalization issues on my weblog.  My research included not only the specs, but experimentation with web browser software.  My conclusion at the time was that they were out of sync.

Time for an update.  For starters, Anne points to a new emerging standard that is consistent with previous W3C specs and tutorials.

But do these specs represent reality?  In my DevCon 2004 slides, I asserted otherwise.  This was based on testing I had done, in particular two tests:

I'm now getting different results from the ones I reported last time, results that are more consistent with the standards as written.  Perhaps the declarations that XML on the Web Has Failed were premature, and we need only give it more time?

On the other hand, and on a much narrower scope, the consensus continues to build that the notion that HTTP has a meaningful default charset is foolish.

Meanwhile, try the two tests above, and if you get any interesting results, please leave a comment specifying what you saw and which browser (including version) you used.  As I understand the specs, the iso-8859-1 test should display as if it contained unprintable characters, and the utf-8 test should display correctly.  After you view each page, try a refresh, particularly in IE.
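To see why the iso-8859-1 test should look broken, assume (my reading, not something stated here) that both pages are served with an HTTP header declaring charset=utf-8, that the iso-8859-1 page's bytes really are Latin-1, and that the sample text is the usual "Iñtërnâtiônàlizætiøn". Decoding Latin-1 bytes as UTF-8 then yields U+FFFD replacement characters. The Python sketch below uses a hypothetical helper, `render_as_utf8`.

```python
# Assumed sample text; each accented letter is one byte in Latin-1, two in UTF-8.
WORD = "Iñtërnâtiônàlizætiøn"

def render_as_utf8(body: bytes) -> str:
    """Hypothetical helper: decode as a browser honouring charset=utf-8 might.

    Byte sequences that are not valid UTF-8 become U+FFFD, which browsers
    may draw as '?', a box, or nothing at all.
    """
    return body.decode("utf-8", errors="replace")

latin1_bytes = WORD.encode("iso-8859-1")
utf8_bytes = WORD.encode("utf-8")

print(render_as_utf8(utf8_bytes))                           # Iñtërnâtiônàlizætiøn
print(render_as_utf8(latin1_bytes).replace("\ufffd", "?"))  # I?t?rn?ti?n?liz?ti?n
print(render_as_utf8(latin1_bytes).replace("\ufffd", ""))   # Itrntinliztin
```

Drawing U+FFFD as "?", or dropping it, would account for the two renderings reported in the comments below; how a given browser displays U+FFFD is up to that browser.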


Using Safari 1.2.4, the utf-8 one works as expected but the iso-8859-1 omits the internationalised characters entirely, instead displaying "Itrntinliztin". In Firefox 1.0 on OS X the utf-8 one works but the iso-8859-1 displays "I?t?rn?ti?n?liz?ti?n".

Opera 7.50 on OS X displays the utf-8 one correctly but displays the iso-8859-1 in the same way as Firefox but with squares instead of question marks.

Aside: the title of the utf-8 example page is currently "iso-8859-1".

Posted by Simon Willison at

IE5/Mac 5.20 on OS X does something really weird: it displays the iso-8859-1 one the same way the other browsers display the utf-8 one, but completely mangles the utf-8 one. Picture here: [link]

Posted by Simon Willison at

The only bug that currently exists in browsers that choose application/xhtml+xml as the MIME type in the above test cases is that, in the iso-8859-1 test case, they should report a well-formedness error. This is a known bug in Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=174351
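Anne's point about application/xhtml+xml can be sketched the same way. XML 1.0 (section 4.3.3) makes it a fatal error for an entity to contain byte sequences that are not legal in the encoding determined for it, whether by default, by an encoding declaration, or by a higher-level protocol such as HTTP. The Python sketch below models that by decoding strictly with the HTTP charset before parsing. `well_formed_with_http_charset` is a hypothetical name, the sample page is an assumption, and this is not how Mozilla's parser is structured.

```python
import xml.etree.ElementTree as ET

def well_formed_with_http_charset(body: bytes, http_charset: str) -> bool:
    """Hypothetical check: decode strictly with the HTTP charset, then parse as XML.

    Bytes illegal in that charset are a fatal error under XML 1.0, so a
    strict decode failure is treated the same as a parse failure.
    """
    try:
        text = body.decode(http_charset)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Assumed XHTML test page without an XML declaration.
page = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Iñtërnâtiônàlizætiøn</p></body></html>'
print(well_formed_with_http_charset(page.encode("utf-8"), "utf-8"))       # True
print(well_formed_with_http_charset(page.encode("iso-8859-1"), "utf-8"))  # False
```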

Posted by Anne at

Understanding charsets on the web

Sam Ruby: Meta Charset Update. Anyone who wants to understand charset issues on the web should go through most of what he links. The main thing I took away is that the charset specified in HTTP has primacy. And that makes sense. Since that makes...

Excerpt from Keith's Weblog at

Sam Ruby: Meta Charset Update

[link]...

Excerpt from del.icio.us/jonas at


Firefox 1.0 and Opera 7.54 running on Linux both say it's text/html and display it as expected: the Latin1 page has unprintable character placeholders while the UTF-8 page shows up correctly.

Posted by Aristotle at


I use Opera 8.0 on Windows XP.
Everything displays correctly.

Posted by Code at


Anne

Friday 20 May 2005 08:57 No, that's nonsense. You have absolutely no idea how it works. Internet Explorer displays it correctly at first. But if you then press refresh, it looks at the page again and displays the corrected result. See...

Excerpt from GoT at


Sam,

It would be good to get HTTP fixed if that’s the case; they’re still collecting errata, AFAIK.

See: [link]

If you could relate the experiences you’ve had WRT the default charset, it would be helpful.

Cheers,

Posted by Mark Nottingham at

