Clean-utf-8-for-XML
Apparently, that’s the “top” of the Unicode range, in the so-called “plane 16”. Take a look at the regular expression here.
Posted by Sam Ruby at
Are you willing to have this code treated as public domain? It might be handy. :)
Posted by Paul Findlay at
Paul, I am comfortable with this code being treated as covered by either the MIT License or the Apache License, Version 2.0. If neither of these is acceptable to you, let me know and I will see what I can do to accommodate you.
Posted by Sam Ruby at
My fast XML generation library genx (see [link]) has the routines genxCheckText and genxScrubText; the latter brutally removes anything that’s not either well-formed UTF-8 or a valid XML character.
Posted by Tim Bray at
Diff for clean compile on OS X (Darwin 8.7.1):
--- clean_utf8_for_xml.c.orig 2006-07-04 07:57:28.000000000 -0500
+++ clean_utf8_for_xml.c 2006-07-05 21:25:13.000000000 -0500
@@ -1,4 +1,4 @@
-#include <malloc.h>
+#include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
 
@@ -11,7 +11,7 @@
  * At a minimum, XML markup characters needs to be escaped.
  *
  * In the normal case, this code does nothing more than a quick scan of
- * the input, and returns it back. If, however, it finds something amis
+ * the input, and returns it back. If, however, it finds something amiss
  * it will allocate another block of memory and attempt to correct a few of
  * the most common errors. If this occurs, it is the callers responsibility
  * to free the block that was allocated.
Posted by Paul Smith at
GentleCMS Development Log: Part 3
The extract method is basically done. I’m sure it could be improved a bit more, but it seems to be fairly effective. I added a few extra features beyond the original URI class’s capabilities, such as supplying a base uri to resolve...
Excerpt from Sporkmonger at
if (*in == 0x09 && *in == 0x0A && *in == 0x0D) {
    *c++ = *in;
} else {
looks to me as if it should be
if (*in == 0x09 || *in == 0x0A || *in == 0x0D) {
    *c++ = *in;
} else {
As for using it in my own projects, is the MIT/Apache license compatible with the GPL? What do I need to do to make it "right"?
Posted by Christian Forster at
Christian: good catch. Patches from both you and Paul have been applied. I’ve also added an explicit MIT/X11 license header to the code; the FSF considers this license GPL-compatible.
Posted by Sam Ruby at
Is the 0x8F on line 99 meant to be 0xBF? This is not a loaded question; I genuinely have no idea. It just jumped out at me because the right-hand sides of the blocks above and below are all 0xBF.
Posted by Jon Dowland at