Rick Blommers: ReXML seems to escape items very nicely when setting values. But it doesn’t unescape the values with REXML::Document.new( … )
A bare minimum amount of functionality that one would expect from an XML parsing library is the ability to round-trip data. If you parse a document and immediately reserialize the result, you would expect to get the original back. If you create a DOM, serialize it, and parse the results, you would expect to get the original back. The version of REXML that comes with Ruby 1.8.4 gives you the latter. The version of REXML that comes with Ruby 1.8.6 gives you the former. Neither gives you both.
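To make the two round-trip properties concrete, here is a minimal Ruby sketch that models a document as a single run of character data. escape_text and unescape_text are hypothetical stand-ins for a serializer and a parser. They are not REXML methods, and only the predefined XML entities are handled.

```ruby
# Toy model: a "document" is one run of character data.
# escape_text / unescape_text are hypothetical stand-ins, not REXML API.

ESCAPES   = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze
UNESCAPES = { "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
              "&quot;" => '"', "&apos;" => "'" }.freeze

# "Serialize": entity encode a decoded value.
def escape_text(value)
  value.gsub(/[&<>]/, ESCAPES)
end

# "Parse": entity decode markup in a single pass, so "&amp;lt;"
# becomes "&lt;" rather than being decoded twice into "<".
def unescape_text(markup)
  markup.gsub(/&(?:amp|lt|gt|quot|apos);/, UNESCAPES)
end

# Property 1: parse, then immediately reserialize; the markup comes back.
def parse_then_serialize_ok?(markup)
  escape_text(unescape_text(markup)) == markup
end

# Property 2: build a value, serialize it, parse it; the value comes back.
def serialize_then_parse_ok?(value)
  unescape_text(escape_text(value)) == value
end
```

Property 1 only holds for markup already in the canonical form escape_text produces (a literal &quot; comes back as a bare double quote), which is the same caveat any real serializer has. The paragraph above says one Ruby release's REXML satisfies only property 1 and the other only property 2; a library should satisfy both.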
This test case can be used to explore this situation. When you run it under Ruby 1.8.6 and pass nots (no test serializer) as a command-line argument, everything passes. If you pass notp (no test parser) instead, you will see 30 failures. Running with mp notp (monkey patch and no test parser) makes everything pass, but running with mp nots produces 30 failures.
The root problem is in text.rb. Line 147 will “normalize” (entity encode) @string in response to calls to to_s. Line 174 will “unnormalize” (entity decode) @string in response to calls to value.
The key question is: is @string already entity encoded (in which case normalize will double encode it)? Or is @string already entity decoded (in which case value will double decode it)? The answer can be found in @raw. If it is set, @string is assumed to be entity encoded, in which case to_s simply returns it. If it is not set (the default), you would assume that the reverse is true and that value would simply return @string, but no such short-circuiting exists in value. Additionally, the keyword return is missing in the first line of value, eliminating a potential optimization.
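Here is a minimal sketch of the symmetric short-circuiting described above: @raw decides the direction in both to_s and value, so neither method ever encodes or decodes twice. SketchText, its raw: keyword and its entity tables are hypothetical illustrations, not REXML::Text, which also deals with doctype-defined entities and character references that this ignores.

```ruby
# Hypothetical text node that honours @raw in BOTH directions.
# Illustrative only; this is not REXML::Text.
class SketchText
  ENCODE = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze
  DECODE = { "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
             "&quot;" => '"', "&apos;" => "'" }.freeze

  # raw: true  -> @string is already entity encoded (e.g. from a parser)
  # raw: false -> @string holds the decoded value (e.g. set by program code)
  def initialize(string, raw: false)
    @string = string
    @raw = raw
  end

  # Serialized form.
  def to_s
    return @string if @raw            # already encoded: never encode twice
    @string.gsub(/[&<>]/, ENCODE)
  end

  # Decoded value.
  def value
    return @string unless @raw        # already decoded: never decode twice
    @string.gsub(/&(?:amp|lt|gt|quot|apos);/, DECODE)
  end
end
```

For example, SketchText.new("a &lt; b", raw: true) serializes unchanged and has the value "a < b", while SketchText.new("a &lt; b") treats the same characters as a literal value and serializes them as "a &amp;lt; b".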
There are other issues with the code. For example, try REXML::Text.unnormalize('&') (which works as expected) and REXML::Text.unnormalize('&&') (which doesn’t).
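The post doesn't say exactly what REXML returns for '&&', so this is only one way a decoder can avoid both double decoding and mishandling stray ampersands: a single regex pass that replaces complete references and leaves anything else (such as a bare '&') untouched. unnormalize_sketch and PREDEFINED are hypothetical names; this is not REXML's code. Leaving a stray '&' alone is a lenient design choice, since a strict XML parser would reject it as not well-formed.

```ruby
# Hypothetical single-pass decoder: the five predefined XML entities plus
# decimal and hex character references. Anything that is not a complete
# reference is copied through unchanged. Code point ranges are not validated.
PREDEFINED = { "amp" => "&", "lt" => "<", "gt" => ">",
               "quot" => '"', "apos" => "'" }.freeze

def unnormalize_sketch(text)
  text.gsub(/&(?:#(\d+)|#x(\h+)|(amp|lt|gt|quot|apos));/) do
    if $1
      [$1.to_i].pack("U")          # decimal reference, e.g. &#60;
    elsif $2
      [$2.to_i(16)].pack("U")      # hex reference, e.g. &#x3E;
    else
      PREDEFINED.fetch($3)         # named predefined entity
    end
  end
end
```

Because each match is replaced exactly once, '&amp;lt;' decodes to '&lt;' rather than '<'.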
“when the world ends, the only things left will be cockroaches, rats, Keith Richards, and mangled text that has been escaped one-too-many or one-too-few times” — Dave Walker
The two things I have yet to find are where I can check out the latest code from SVN, and how to run the existing set of tests. I would like to submit new tests that expose the problems I have found so far, and patches to correct these issues. Ideally in time for 3.1.8.
Pointers appreciated.
svn co http://www.germane-software.com/repos/rexml/trunk/ works for me.
There’s a bin/suite.rb to run the test suite.
svn co http://www.germane-software.com/repos/rexml/trunk/ works for me.
Thanks!
There’s a bin/suite.rb to run the test suite.
346 tests, 1225 assertions, 10 failures, 8 errors
:-(
Running bin/suite.rb (as opposed to ruby bin/suite.rb) reduces this to one error: No such file or directory - test/xml/ticket_110_utf16.xml. I can work with that.
Mankind’s ability to write software is far in advance of mankind’s ability to determine what it does (or does not do). Scary, but true.
Also, from a commercial legal point of view, it is a question of ‘Do I have the right to distribute this software’ (for any price or for none). Consideration of whether it solves any problem for a client ... or indeed whether it solves any problem at all ... comes a distant second.
...Which ones? I’m sure you can find numerous others, but things like [link] (note that Sam hoped for inclusion in 3.1.8; we’re still at 3.1.7, 18 months after 3.1.8 was supposed to be