GIGO: garbage in, garbage out. It is easier to produce correct XML output if you have correct XML input. One way to achieve this is to ensure that data that is not well-formed XML can never be stored. With Ruby on Rails, this can be enforced with validation rules that invoke a parser and record a validation error when parsing fails, thus:
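```ruby
require 'xml/parser'   # the xmlparser binding for Expat

class Entry < ActiveRecord::Base
  # Reject any title, summary or content that is not well-formed XML.
  validates_each :title, :summary, :content do |model, attr, value|
    # A single parser is created lazily and shared across validations.
    @@xmlparser ||= XML::Parser.new
    begin
      # Wrapping the value in <div> lets mixed text and markup (or several
      # sibling elements) form a single-rooted document.
      @@xmlparser.parse "<div>#{value}</div>" if value
    rescue
      model.errors.add attr, 'is not well formed XML'
    ensure
      # Reset the parser so it can be reused for the next value.
      @@xmlparser.reset
    end
  end
end
```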
Tests such as these can verify that the validation works:

```ruby
require File.dirname(__FILE__) + '/../test_helper'

class EntryTest < Test::Unit::TestCase
  fixtures :entries

  def setup
    @entry = Entry.find(:first)
  end

  def test_title_not_wellformed
    # An escaped ampersand is well formed, so the entry should save.
    @entry.title = "AT&amp;T"
    assert @entry.save, "well formed title can't be saved"

    # A bare ampersand is not well formed, so the save must fail.
    @entry.title = "AT&T"
    assert !@entry.save, "not well formed title saved"
    assert_equal "is not well formed XML", @entry.errors.on(:title)
  end
end
```
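To make concrete what the validation has to reject, here is a toy, dependency-free check written purely for illustration. It is not part of the original code and is no substitute for Expat. Like the model, it wraps the value in <div>. It then tracks open tags on a stack and requires every & to begin one of the five predefined entity references or a numeric character reference. It deliberately ignores comments, CDATA sections, processing instructions, DOCTYPEs and attribute syntax. The module name ToyXMLCheck and its error_for method are hypothetical.

```ruby
require 'strscan'

# A toy well-formedness check, for illustration only. It understands start
# tags, end tags, empty-element tags and entity/character references, but
# NOT comments, CDATA sections, processing instructions, DOCTYPEs or the
# rules for attributes. Use a real parser such as Expat in practice.
module ToyXMLCheck
  NAME = /[A-Za-z_][\w.\-]*/
  # The five predefined XML entities plus decimal and hex character references.
  REFERENCE = /&(?:amp|lt|gt|quot|apos|#\d+|#x\h+);/

  # Returns nil if +fragment+, once wrapped in <div>, looks well formed;
  # otherwise returns a short description of the first problem found.
  def self.error_for(fragment)
    scanner = StringScanner.new("<div>#{fragment}</div>")
    open_tags = [] # stack of element names awaiting their end tag

    until scanner.eos?
      if scanner.scan(%r{</(#{NAME})\s*>})
        return "unexpected </#{scanner[1]}>" unless open_tags.pop == scanner[1]
        # Once the root element closes, nothing may follow it.
        return "content after the root element" if open_tags.empty? && !scanner.eos?
      elsif scanner.scan(%r{<(#{NAME})[^<>]*?(/?)>})
        open_tags.push(scanner[1]) if scanner[2].empty? # <br/> opens nothing
      elsif scanner.check(/</)
        return "malformed tag at offset #{scanner.pos}"
      elsif scanner.check(/&/)
        return "bare '&' at offset #{scanner.pos}" unless scanner.scan(REFERENCE)
      else
        scanner.scan(/[^<&]+/) # ordinary character data
      end
    end

    open_tags.empty? ? nil : "missing </#{open_tags.last}>"
  end
end
```

With this sketch, "AT&amp;T" passes and "AT&T" is rejected for its bare ampersand, mirroring the unit test. A fragment such as "one <i>two</i> three", with text on both sides of an element, passes because the <div> wrapper gives it a single root.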
As a footnote, the verification logic took three attempts to get right. My first attempt was to use REXML. While it is certainly the most elegant Ruby XML API, it seems to accept a variety of ill-formed XML fragments. For example, parsing "<div>at&t" produces no error, despite the missing end tag and the unescaped ampersand.
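Next, I tried libxml2. The following correctly reported the errors, but it also printed them on STDERR:

```ruby
require 'xml/libxml'

parser = XML::Parser.new
parser.string = "<div>at&t"   # missing end tag and a bare ampersand
parser.parse                   # errors are reported, and also echoed to STDERR
```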
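If the only objection is the noise, one blunt workaround is to point the process's standard error somewhere else for the duration of the parse. This is not what this post went on to use. The helper name with_stderr_redirected_to is hypothetical; the sketch relies only on core Ruby's IO#dup and IO#reopen. Because reopen swaps the underlying file descriptor rather than just Ruby's $stderr object, it also catches output written directly by a C library such as libxml2.

```ruby
# Runs the block with the process's standard error (file descriptor 2)
# pointed at +path+, then restores it. STDERR.reopen swaps the underlying
# descriptor rather than just Ruby's $stderr object, so diagnostics written
# by C extensions or child processes are redirected as well.
# Caveat: this is process-wide, so output from other threads is caught too.
def with_stderr_redirected_to(path)
  saved = STDERR.dup        # duplicate fd 2 so it can be restored later
  STDERR.reopen(path, 'a')  # point fd 2 at the target file
  STDERR.sync = true
  yield
ensure
  if saved
    STDERR.reopen(saved)    # restore the original standard error
    STDERR.sync = true
    saved.close
  end
end
```

For example, with_stderr_redirected_to(File::NULL) { parser.parse } would discard the diagnostics and still let any exception raised inside the block propagate. Because the redirect affects the whole process, a library-level error handler is the cleaner fix wherever the binding exposes one.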
My third attempt uses
Expat and serves my
needs just fine.
Peeking into the implementation of REXML, I see that it is riddled with regular expressions. Having a parser that doesn’t detect errors properly is one thing, but having a parser that incorrectly parses valid input is quite another. I’ve opened a ticket on one such problem. Depending on how it is received, I may open others.
“While the following correctly reported the errors, it also did so on STDERR.”
I had the same problem. After lots of trawling through undocumented spaghetti code, I found it can be solved (in C) with a simple xmlSetGenericErrorFunc(NULL,xmlErrorHandler), where xmlErrorHandler is a dummy function that does nothing. Don’t know about Ruby.
via Sam Ruby: While [REXML] is certainly the most elegant Ruby XML API, it seems to accept a variety of ill-formed XML fragments, for example the following produces no error: [<div>at&t] F’real? That is, not only missing end tag, but...
I mentioned previously that libxml2 had a habit of writing to STDERR. With the Python bindings, this can be mitigated by the use of an error handler global to the library. The steps below describe how to add equivalent functionality to Ruby’s...
For the final project for my web architecture class, I can choose what I want to do as long as it’s sufficiently webby. I have a lot of ideas saved, but I’ll probably work alone and the project is due in less than a month; the proposal is due...
I have previously admired the Ruby language, albeit from a distance, and been impressed by the vigor of the Rails community. In the last week I have written a few hundred lines of Ruby code that actually do something useful and I’ll probably release...