Ben Smedberg: Announcing the release of the Atom 1.0 plugin for Wordpress, version 0.5
Looking at the output, it is valid. Bravo!
Looking closer, the plugin marks titles as html as they might contain markup. Most don’t. It marks content elements as html as they might not be well formed. Most are. And it uses CDATA, even though the text might contain the characters ]]>. Admittedly, most don’t, but they might.
Last time I checked, PHP has the ability to do if-checks. :-)
This might be overkill, but I’m just throwing it out there for consideration. text.php. text.phps. Test cases are near the bottom.
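For readers who just want the shape of the idea, here is a minimal sketch of the kind of if-check meant above. It is not the contents of text.php: the function name atom_text_construct() is hypothetical, the input is assumed to be an HTML fragment, and the well-formedness test assumes PHP's DOM extension is available.

```php
<?php
// Hypothetical helper (not the contents of text.php): emit an Atom text
// construct using the least permissive type that fits the value.
// Assumes $html is an HTML fragment, as WordPress titles and content are.
function atom_text_construct(string $tag, string $html): string
{
    // No tags and no entity references: the HTML is already plain text.
    if (strpbrk($html, '<&') === false) {
        $text = htmlspecialchars($html, ENT_NOQUOTES, 'UTF-8');
        return "<$tag type=\"text\">$text</$tag>";
    }

    // Well formed as XML: embed it directly as XHTML inside a div.
    $div = '<div xmlns="http://www.w3.org/1999/xhtml">' . $html . '</div>';
    $previous = libxml_use_internal_errors(true); // silence parse warnings
    $wellFormed = (new DOMDocument())->loadXML($div);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($wellFormed) {
        return "<$tag type=\"xhtml\">$div</$tag>";
    }

    // Otherwise fall back to escaped HTML. Escaping avoids CDATA entirely,
    // so a literal "]]>" in the value is harmless.
    $escaped = htmlspecialchars($html, ENT_NOQUOTES, 'UTF-8');
    return "<$tag type=\"html\">$escaped</$tag>";
}
```

A plain title such as "Hello world" comes out as type="text", "<p>Hi</p>" as type="xhtml", and tag soup like "<p>Hi<br>" as escaped type="html".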
Well, then it gets interesting.
WP’s titles are natively HTML: type ampersand-g-t-semicolon, your web page displays a greater-than symbol. In get_the_title_rss(), filters for both the_title and the_title_rss are applied, so it goes through wptexturize(), convert_chars(), trim(), strip_tags(), ent2ncr(), and wp_specialchars(). For the most part those are no-ops for us, but if you have a raw ]]> in your title, the behavior of strip_tags() depends on what else you have: “<3 the ]]>” will become “”. But if you get your ]]> past there, wp_specialchars() will replace the > with a named character entity reference. I think the standard expectation is that it’s producing plain text, which will not be in a CDATA section, but that seems to approximately work out as also being “HTML” which is in a CDATA section.
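To make the last step concrete: once the > is escaped, the sequence ]]> can no longer end a CDATA section early. The sketch below uses PHP's built-in htmlspecialchars() as a stand-in for wp_specialchars() (an assumption; the WordPress function may differ in detail), and adds a hypothetical cdata_wrap() helper for text that has not been escaped.

```php
<?php
// Stand-in for wp_specialchars(): an assumption, the WordPress function
// may differ in detail from PHP's built-in htmlspecialchars().
$title = 'I heart the ]]> sequence';
$escaped = htmlspecialchars($title, ENT_NOQUOTES, 'UTF-8');
// $escaped no longer contains "]]>", so it is safe inside a CDATA section.

// Hypothetical helper for text that has NOT been escaped: split every
// "]]>" across two adjacent CDATA sections so none of them ends early.
function cdata_wrap(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}
```

A parser reading the output of cdata_wrap() concatenates the adjacent sections and recovers the original text, including the ]]>.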
For summary? the_excerpt_rss() expects to be in a CDATA section, as the content of an RSS 2.0 description element, so I’m guessing it’s prepared as meatcake. I’ll try to start looking at it earlier, tomorrow night.