Until recently, Dare’s reds would show through on Planet Intertwingly, but Antonio’s yellows would be stripped. The reason was that Dare uses the <font> tag, and Antonio uses the style attribute. Both approaches should be equally valid; the only difference is that the latter is more difficult to parse correctly.
The Universal Feed Parser is not known for shying away from difficult problems, and I saw no reason why this situation should be any different. That being said, I didn’t aim to solve the general problem of parsing all possible CSS; I merely aimed to allow through a large subset of CSS that is both simple to parse and known to be safe.
This provided other benefits; for example, inset images on many feeds displayed as inset images on Planet Intertwingly.
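To make the "simple to parse and known to be safe" idea concrete, here is a minimal sketch of a whitelist-based filter for the style attribute. It is not the Universal Feed Parser's actual code: the function name `sanitize_style`, the property list, and the token pattern are all assumptions chosen for illustration.

```python
import re

# Hypothetical whitelist of properties; the real set is an assumption here.
SAFE_PROPERTIES = {
    "color", "background-color", "float", "clear",
    "font-weight", "font-style", "text-align", "margin", "padding",
}

# A value token is a hex color, a bare keyword, or a number with an optional
# unit. Anything containing parentheses, quotes, or slashes is rejected,
# which rules out url(...) and expression(...).
SAFE_TOKEN = re.compile(
    r"#[0-9a-fA-F]{3}|#[0-9a-fA-F]{6}|[a-zA-Z][a-zA-Z-]*|-?\d+(\.\d+)?(px|em|pt|%)?"
)

def sanitize_style(style):
    """Return only the declarations that are whitelisted and trivially parseable."""
    kept = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        name, value = declaration.split(":", 1)
        name = name.strip().lower()
        value = value.strip()
        if name not in SAFE_PROPERTIES:
            continue
        tokens = value.split()
        if tokens and all(SAFE_TOKEN.fullmatch(token) for token in tokens):
            kept.append(f"{name}: {value}")
    return "; ".join(kept)

print(sanitize_style("color: yellow; background: url(javascript:alert(1))"))
print(sanitize_style("float: right; margin: 0 0 1em 1em"))
```

The design choice is to drop anything the filter does not fully understand rather than try to repair it, which is what keeps the parsing simple.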
Per Feed Customization
While this made a dramatic improvement, it still didn’t capture everything. It turns out that a number of sources put either too much or too little style information into their feeds.
BoingBoing often puts a <br clear="both"> in descriptions. Engadget does something similar with <h6> tags. This has the effect of leaving large gaps when these items appear alongside the subscription list, which “floats” to the top right of the page.
Rogers Cadenhead places class="sourcecode" on paragraphs and span tags when he is referencing source code. This displays using a monospace font on his site, but this style information is not syndicated along with his feed. I do something similar on my site, but I use <code> tags, as these degrade nicely.
Henri Sivonen places <p> tags inside of <ol> elements, and then uses CSS to reduce the gaps between list items.
center class names on images to cause them to float or to be centered. Again, the style sheet which describes the desired behavior associated with these classes is not placed into the feed.
Most of these issues are solvable with a little CSS (search for "Accomodations"). However, as the body of Planet Intertwingly is not positioned absolutely and has a floating subscription list, setting the left and right margins to auto does not center an image. But even in this case, display:block is an improvement.
It occurs to me that I’ve seen these problems solved before, and with a better tool. And I even have the important piece installed on my machine...
I’d love to see all HTML processing in UFP become pluggable, and for a plug-in based on Mozilla to become a reality. Many of the pieces seem to be in place. After an apt-get install python2.4-gtk2, I find that I can import gtkmozembed from within Python. It looks like more pieces to the puzzle are (or will become) available with GtkMozEdit. But I don’t believe that fine-grained access to the DOM from within Python is either necessary or even desirable.
I’d much rather use DOM/XPath techniques than regular expressions.
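As a rough illustration of what a DOM/XPath-style accommodation could look like, here is a sketch using Python's standard xml.etree.ElementTree, which supports a limited XPath subset. It is not the Mozilla-based plug-in imagined above; the function name `mark_up_source_code` is hypothetical, only span elements are handled, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

def mark_up_source_code(xhtml_fragment):
    """Turn <span class="sourcecode"> into <code>, which degrades nicely."""
    # Assumption: the fragment is well-formed XML with a single root element.
    root = ET.fromstring(xhtml_fragment)
    # Select by attribute value instead of pattern-matching on raw markup.
    for span in root.findall(".//span[@class='sourcecode']"):
        span.tag = "code"
        del span.attrib["class"]
    return ET.tostring(root, encoding="unicode")

print(mark_up_source_code(
    '<div><p>Run <span class="sourcecode">make</span> now.</p></div>'
))
```

Real feed content is often not well-formed, which is part of why a browser-grade parser is attractive here.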
At this point, it occurs to me that a number of people who read this weblog have far more experience and/or better contacts than I do to help pull these pieces together.