Hossein Derakhshan: We should promote Unicode standard among English speaking programmers. Many tools do not work well with Unicode and this sucks.
I'm doing my part. It took only a few lines of code for me to convert my weblog over to utf-8 (plus changing the content type in a few templates and a configuration file... bah). Jacques Distler updated MTStripControlChars.
Character sets seem to be a classic leaky abstraction.
Java has excellent Unicode support on the inside, but you still need to worry about the last mile problem. David Czarnecki has his blojsom weblog working but apparently had to tweak his feed. Similar story for Simon Brown, but java.blogs garbles the post in translation. It took a few lines of change to get roller working, and it looks like a few more lines will be required.
How do some of the .Net and PHP based weblogs fare?
Ecto works. All clients should be tested.
Spread the meme.
blojsom never had a problem with encoding and always had a valid feed. The issue was on Simon's end but all seems to be better on his end now :)
I know PHP by default outputs ISO-8859-1, and ASP.NET defaults to UTF-8. Both can be overridden (although far more easily in ASP.NET than in PHP, at least the last time I had a look at it), but I would be pretty surprised to see any large group of developers doing this in either framework.
I'm having problems with your commenting system, btw. I can't have 'ø' in my name (I have to HTML encode it), and when I [Preview], the <textarea> is empty, so I have to copy/paste the stuff I've written into it to submit the comment. The error message I receive when posting with 'ø' in my name:
CGI Failure
traceback:Traceback (most recent call last):
  File "gateway.cgi", line 39, in ?
    post()
  File "/home/rubys/mombo/post.py", line 237, in post
    print template(searchList=[data, config])
  File "/home/rubys/mombo/template/comment.py", line 276, in respond
    write(filter(VFN(VFS(SL + [globals(), _builtin_],"parent",1),"get",0)('name',''), rawExpr="$parent.get('name','')")) # from line 152, col 64.
  File "Cheetah/Filters.py", line 106, in filter
UnicodeError: ASCII encoding error: ordinal not in range(128)
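An error like the one above is the usual symptom of non-ASCII text passing through an implicit ASCII conversion. The following is a minimal sketch in present-day Python 3, not the actual mombo or Cheetah code (which is not shown here); the function name `render_name` is hypothetical. It reproduces the failure and shows the text being encoded explicitly as UTF-8 at the output boundary instead:

```python
def render_name(name: str, encoding: str = "utf-8") -> bytes:
    """Encode a commenter's name for output using an explicit encoding.

    Hypothetical helper for illustration; not part of any real template engine.
    """
    return name.encode(encoding)


name = "Asbj\u00f8rn"  # U+00F8 is outside the ASCII range

# Forcing ASCII fails, much like the traceback above.
try:
    render_name(name, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and yields two bytes for U+00F8.
utf8_bytes = render_name(name)
```

The design choice is to keep text as Unicode internally and pick the encoding in exactly one place, where the bytes leave the program.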
Iñtërnâtiônàlizætiøn All internationalization tests pass. Entry title, text (includes filename on disk): Check Category name and description (includes directory on disk): Check Comments: Check (includes comment e-mail) Trackbacks: Check (includes...
I was sure I'd created the database with "-E Unicode". Oh well. Javablogs should be serving UTF-8 properly now, although old data will still be broken unless/until it can be refetched from the original RSS feeds.
Sam Ruby pointed out (http://www.intertwingly.net/blog/1763.html) that Javablogs was garbling posts that contain highbit characters. This was mainly due to the database having been created in ASCII mode. Javablogs is now serving pages in full UTF8,...
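One common way high-bit characters get garbled, though not necessarily what happened at Javablogs, is that correct UTF-8 bytes are read back as ISO-8859-1 somewhere along the way. A small Python sketch of that failure, and of the repair that works only when no bytes were lost:

```python
original = "Iñtërnâtiônàlizætiøn"

# Correct bytes on the wire.
utf8_bytes = original.encode("utf-8")

# A consumer that assumes ISO-8859-1 turns every byte into its own character.
garbled = utf8_bytes.decode("iso-8859-1")

# Because ISO-8859-1 maps all 256 byte values, the wrong step is reversible.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
```

When a store forces text down to ASCII, the high bytes may be discarded outright, and then there is nothing left to reverse. Refetching from the original feeds is the only fix in that case.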
Phil, stating that you output in UTF-8 isn't the same as actually outputting in UTF-8. The last time I checked, PHP didn't do any magic when setting the 'charset' with the header() function -- the characters will still be encoded with ISO-8859-1. In ASP.NET, though, this magic does happen, just by changing two values in web.config.
I think I'm pretty much there now. Everything internally is being represented okay but I was having some problems actually getting the XML feed to be streamed out as UTF-8 (rather than ISO-8859-1), even though the rest of my site was working fine. TrackBacks were also causing me some problems but this was as simple as changing the "Content-Type" HTTP header.
What else needs to be done to a PHP-based website, besides using the header() function, to deliver UTF-8? I would love to expand the article I wrote on serving XHTML properly to incorporate this.
Simon, if I'm not mistaken (I rarely program any PHP anymore), you also need to use the utf8_encode() function to actually serve the bytes as UTF-8. If you just set the 'Content-Type' header with the header() function, the script only declares that it uses UTF-8 -- the bytes are still served as ISO-8859-1.
I'm not sure whether this has changed in later versions of PHP, though.
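The difference between declaring UTF-8 and actually sending UTF-8 can be checked mechanically. Here is a rough sketch in Python rather than PHP; the helper names `declared_charset` and `body_matches_header` are made up for illustration. The transcoding step at the end does the same job as PHP's utf8_encode(), which converts ISO-8859-1 to UTF-8:

```python
import re


def declared_charset(content_type: str, default: str = "iso-8859-1") -> str:
    """Pull the charset parameter out of a Content-Type header value."""
    match = re.search(r'charset="?([\w-]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else default


def body_matches_header(content_type: str, body: bytes) -> bool:
    """Return True if the body bytes decode cleanly under the declared charset."""
    try:
        body.decode(declared_charset(content_type))
        return True
    except UnicodeDecodeError:
        return False


header = "text/html; charset=utf-8"

# The header says UTF-8, but these bytes are still ISO-8859-1.
latin1_body = "Asbjørn".encode("iso-8859-1")

# Transcoding the bytes is what makes the declaration true.
utf8_body = latin1_body.decode("iso-8859-1").encode("utf-8")
```

A check like this only catches mismatches in one direction: any byte sequence decodes as ISO-8859-1, so it cannot tell you that a page declared as ISO-8859-1 is really UTF-8.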
Thank you for that information, Asbjørn. I will investigate further.
It is cool that some of the blogging tools are already being readied for advanced character sets. As someone with a home-brewed system, I am always interested to learn about any techniques that can make websites more accessible, and I consider character sets to be an important accessibility consideration.
The multi-byte string extension looked like a nice solution, but having to compile stuff to get it to work is IMHO a bit much. My thoughts go to all the thousands and thousands of developers who don't have control over their web servers, and don't have understanding administrators who can do this for them. Many of them don't even pay for their hosting service, and thus can't really demand anything either.
It's sad that PHP doesn't support Unicode natively, and by default. I hope this will be corrected in future versions of the framework. Maybe it's already corrected in PHP 5.x(?).
Sam, when I press [Preview], everything previews fine, but the <textarea> is empty. So when I then press [Post], I get to a completely blank and empty page. A fix seems to be to copy the text I write in the primary comment form, paste this into the preview form, and then submit. Can you please make sure the comment gets its way into the <textarea> in the preview form as well?
Asbjørn, I see that you are using Opera. Unfortunately, I see no problems using IE or Mozilla. The last change I made which might have affected this area was the utf-8 change, but you have posted since then, so I am at a loss as to what might have caused it.
Can you "view source" on the page that results after you push [Preview]? Do you see something like:
<textarea cols="59" name="comment" rows="12">Asbjørn, I see that you are using Opera. Unfortunately, I see no problems using IE or Mozilla. The last change I made which might have affected this area was the utf-8 change, but you have posted since then, so I am at a loss as to what might have caused it.
Can you "view source" on the page that results after you push [Preview]? Do you see something like:</textarea>
Sam Ruby has kick started a wonderful meme of I18N awareness. I'm admittedly a late-comer to understanding the gory details of all of it. I've recently, for Lucene in Action, dug deeper into how to "analyze" text of other languages. Sam's blog...
Sam, the <textarea> actually has the content, but it isn't visible. I don't understand why. I'll look into it -- it might be a bug in Opera (I'm using 7.5 Beta 1). Something else: why not smack some <label>s into the comment form, at least around the «Remember info?» text?
All Unicode-enabled websites can have this icon (there's a French one as well) on them to declare to the world that they support the standard. The icon should of course link to the Unicode website.
I just stole this from Anne's weblog. I wanted to test this stuff anyways. How does my weblog perform using unicode. See also: Survival guide to i18n. Some tests: これは日本語のテキストです。読めますか Let’s see how Unicode...
This post will contain some tips on how to set up your web development process to use UTF-8 end to end. What happened was, I saw a pair of posts by Sam Ruby (Unicode and weblogs, Aggregator i18n tests). I can be a bit of a careful (read: slow)...
Hi, I am new to web design and have just started learning Unicode and how to make a UTF-8 website. I would like to offer my web design service to bilingual users, namely English and Chinese, and believe UTF-8 is the way to go. Am I being naive to think that setting <'Content-type: text/html; charset=utf-8'> is the answer for UTF-8 website development, or is it more complicated than I thought?
And how do I convert my Chinese to Unicode?
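On the conversion question: the header only declares the encoding, so existing Chinese text also has to be transcoded into UTF-8 bytes. A minimal Python sketch follows, assuming the existing text is in a known legacy encoding such as GB2312 (Simplified) or Big5 (Traditional). That assumption has to be checked against the actual files, and the helper name `to_utf8` is hypothetical:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode bytes from a known legacy encoding into UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


text = "中文"

# Stand-ins for content saved by older tools in legacy encodings.
gb_bytes = text.encode("gb2312")
big5_bytes = text.encode("big5")

# The same characters end up as the same UTF-8 bytes, whatever the source.
utf8_from_gb = to_utf8(gb_bytes, "gb2312")
utf8_from_big5 = to_utf8(big5_bytes, "big5")
```

Once the stored bytes really are UTF-8, the `charset=utf-8` declaration is accurate. If the source encoding is guessed wrong, decoding will either fail or silently produce the wrong characters, so it is worth spot-checking the output.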