intertwingly

It’s just data

PHP and Unicode.


Jarek Zgoda: It still doesn't have native unicode support, so all this XML buzz is just that -- a buzz. In modern world lacking of unicode awareness makes any solution incomplete.

I agree with Adam Trachtenberg: Unicode support is on my list of things that would be great to add to PHP 6.

Sterling Hughes and Thies Arntzen point out that Parrot is fully Unicode, but that is largely due to its use of the ICU libraries, and it only addresses a small part of the problem. The hard part is all of the inputs, outputs, and extensions.

My recommendation would be to first upgrade the current code base to use UTF-8 internally, and use that to shake out all of the interface problems. UTF-8 has a number of desirable features:

Overall, I would suggest the following:

Determining the output encoding by parsing the Content-Type header would be a good idea, as would providing functions that explicitly set the default input and output encodings.
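To make that concrete, here is a rough sketch of the idea, written in Python rather than as a PHP patch. It assumes strings are held internally as UTF-8 bytes and transcoded only on the way out. The function names (`set_default_output_encoding`, `charset_from_content_type`, `encode_output`) are hypothetical and invented for this illustration; they are not an existing or proposed PHP API.

```python
import re

# Hypothetical module-level default, standing in for an explicit
# "set the default output encoding" function.
_default_output_encoding = "utf-8"


def set_default_output_encoding(name):
    """Explicitly set the fallback output encoding (hypothetical helper)."""
    global _default_output_encoding
    _default_output_encoding = name


def charset_from_content_type(header):
    """Return the charset parameter of a Content-Type value, lowercased.

    Falls back to the configured default when no charset is present.
    """
    match = re.search(r';\s*charset\s*=\s*"?([^\s;"]+)"?', header, re.IGNORECASE)
    if match:
        return match.group(1).lower()
    return _default_output_encoding


def encode_output(utf8_bytes, content_type):
    """Transcode internal UTF-8 bytes to the encoding the header declares.

    Characters the target encoding cannot represent are emitted as
    numeric character references, which suits HTML and XML output only.
    """
    target = charset_from_content_type(content_type)
    text = utf8_bytes.decode("utf-8")
    return text.encode(target, errors="xmlcharrefreplace")
```

With this in place, `encode_output(data, "text/html; charset=iso-8859-1")` would emit Latin-1 bytes, while a header with no charset falls back to whatever default was set explicitly. The numeric-character-reference fallback is one possible choice among several; a real implementation would need to decide this per content type.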

Once the bugs are shaken out, upgrading to ICU (or converting to something like Parrot) would be a considerably simpler proposition.