Having methods to init and complete the
data structure can take a lot of work off the application.
The ability to define an attribute as a list lets one iterate
over child elements of a like kind, a common need in many XML
grammars.
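As a rough illustration of the two ideas above (init/complete hooks, and attributes declared as lists), here is a minimal sketch. It is not the logic from the post: the `Node` and `Project` classes, the `list_tags` name, and the sample XML are all made up for this example, and it leans on the standard library's `xml.etree.ElementTree` for parsing.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical base class: maps child elements onto attributes."""

    # Tags named here are collected into lists instead of single values.
    list_tags = ()

    def __init__(self, element):
        self.init()
        for tag in self.list_tags:
            setattr(self, tag, [])
        for child in element:
            value = (child.text or "").strip()
            if child.tag in self.list_tags:
                getattr(self, child.tag).append(value)
            else:
                setattr(self, child.tag, value)
        self.complete()

    def init(self):
        """Hook: runs before any child element is mapped."""

    def complete(self):
        """Hook: runs after all child elements are mapped."""


class Project(Node):
    list_tags = ("depend",)

    def init(self):
        # Default used when the document has no <name> child.
        self.name = "unnamed"

    def complete(self):
        # Post-processing the application would otherwise do itself.
        self.depend.sort()


# Sample document invented for this sketch.
doc = ET.fromstring(
    "<project><name>gump</name><depend>xerces</depend><depend>ant</depend></project>"
)
project = Project(doc)
for dep in project.depend:
    print(project.name, "depends on", dep)
```

The application only ever sees `project.name` and a sorted `project.depend` list; the defaulting and sorting live in the hooks.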
When I wrote this logic, I marveled at the ability to easily
define an application specific mapping in Python vs using an
existing framework, but now that I look back, there would be value
in factoring this out into a reusable component.
Sam,
The TRAMP notion is very nice, and I found another use for it: devising an XML format which one stores somewhere. At work, I have a custom XML format used to define the different parts of an astronomical pipeline, and a rule format (like bpel-ws, but much simpler and less capable) which routes XML-RPC around the pipeline to control a bunch of telescopes. I'm at the point where I am experimenting with data being sent out of band, and in trying to come up with the right constructs in the pipeline to have components talk the same data transport, a non-validated quick add, instead of "if element: iterate over element", really helps. I was earlier using minidom and had an awkward two copies: an object format and the DOM itself.
Having said that, I can think of two or three aspects which might be very useful (or not) in such a situation. I wanted to toss them out for discussion.
(a) Should array-valued tags be accessible just by name? (Not sure at all about this.)
(b) Lists and dictionaries should be accessed by interface only. This may already be done, since I am guessing new-style classes allow reimplementation of all 'operator' methods. The point here is to allow a direct persistent implementation just by choosing the class of the key types. An example would be directly smacking the XML into ZODB.
(c) Certain attributes, again specifiable, ought to be 'better attributes' than others. Examples include id, class, and name. The advantage of this would be in XHTML and XML parsing where elements aren't descriptive: attributes could be used in the dotted notation. That is, div class='section' under body might more appropriately be body.section. This could make for easy non-templated weblog styles with both the CSS and the logic flow outside the so-called template. (Note this is not supposed to be like Zope's TAL, where logic is in the tags; rather, all logic ought to be Quixote-style in Python, but the designer can now show by example.)
(d) How does one handle XML namespaces? Using dicts?
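To make point (c) concrete, here is a small sketch of what attribute-aware dotted access might look like. This is not xmltramp's behavior: the `Dotted` wrapper and the `NAMING_ATTRIBUTES` tuple are hypothetical names invented for this example, built on the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Attributes treated as "better" names for an element; the choice of
# id, class, and name is taken from the examples in point (c).
NAMING_ATTRIBUTES = ("id", "class", "name")


class Dotted:
    """Hypothetical wrapper: find a child by tag or by naming attribute."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Private names are never looked up in the document.
        if name.startswith("_"):
            raise AttributeError(name)
        for child in self._element:
            names = [child.tag] + [child.get(a) for a in NAMING_ATTRIBUTES]
            if name in names:
                return Dotted(child)
        raise AttributeError(name)

    @property
    def text(self):
        return (self._element.text or "").strip()


# Sample markup invented for this sketch.
html = ET.fromstring(
    "<html><body><div class='section'>Hello</div></body></html>"
)
page = Dotted(html)
print(page.body.section.text)
```

Here `page.body.section` and `page.body.div` reach the same element; the first match wins, so a real design would still need a rule for several children sharing a class.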
Sam -- the URL to the gump file has an <em> element around the init name. I tried a couple ways to keep them out of this comment when I include the URL. Some step in the formatting of the comment is mucking with it.
Rahul, re (d): yes. The patch Aaron and I worked on uses dicts (__getitem__) and, in an unusual way, call syntax (__call__), depending on whether you're indexing attributes or content. Both take a tuple of (URI, localName), as is common in Python.
There's a helper class Namespace to do that for you:
DC = Namespace('http://purl.org/dc/elements/1.1/')
date = xml[DC.date]
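The patch itself isn't shown here, so the following is only a guess at how a helper like this could work: attribute access on the namespace object produces the (URI, localName) tuple described above. The class body is an assumption, and a plain dict with a made-up date stands in for a parsed element.

```python
class Namespace:
    """Hypothetical helper: attribute access yields a (URI, localName) tuple."""

    def __init__(self, uri):
        self.uri = uri

    def __getattr__(self, local_name):
        # Only reached for names not found normally, so self.uri is safe.
        if local_name.startswith("__"):
            raise AttributeError(local_name)
        return (self.uri, local_name)


DC = Namespace("http://purl.org/dc/elements/1.1/")

# A plain dict stands in for a parsed element; the value is invented.
element = {DC.date: "2003-05-01"}
print(DC.date)
print(element[DC.date])
```

Because the key is an ordinary tuple, any mapping type can be indexed this way without knowing about the helper.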
Hmm... XML Objectify (which I contributed to), available from David Mertz here: http://gnosis.cx/download/Gnosis_Utils-current.tar.gz, does the same thing as xmltramp. It can work off expat (scaling much better) and allows you to override class/tag behavior in Python. Xmltramp is a nice hack (and much prettier than Objectify, source-code wise), but XO is way more powerful.
There are many OSS packages that have attempted to do this. I have written at least three (lisp, python, java) and an XML schema code generator.
My experience is that they are fun toys but really can't be used in any large project because the abstraction ends up breaking down (at least with current languages).
Namespaces are a good example. In your binding how do you deal with ns prefixes in elements that have the same local name across schemas? You can prepend the schema prefix to the member or method name but it is still ugly.
I think XPath is a better way to handle this. The object nesting approach could be considered very simple xpath. The advantage here is that you don't have another abstraction but you can still get access to the data.
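For what it's worth, the same-local-name problem looks like this with a path query. The sketch uses the limited XPath subset in Python's standard `xml.etree.ElementTree` (not Jaxen or a full XPath engine); the sample document and the `http://example.org/ns#` URI are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define a "title" element. The second
# namespace URI is invented for this example.
DOC = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:ex="http://example.org/ns#">
  <dc:title>Dublin Core title</dc:title>
  <ex:title>Example title</ex:title>
</item>
"""

# Prefixes used in queries are bound here, independently of whatever
# prefixes the document itself happens to use.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "other": "http://example.org/ns#",
}

root = ET.fromstring(DOC)
dc_title = root.find("dc:title", NS).text
other_title = root.find("other:title", NS).text
print(dc_title)
print(other_title)
```

The two titles never collide because each query names its namespace explicitly, and no member or method names have to be mangled to tell them apart.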
This is what I use in some places of NewsMonster (as well as XSLT).... I think it works really well.
The biggest problem is the penetration of XPath implementations that can be used as a library. The Jaxen integration in JDOM is very nice, I might add. You can run an XPath query and get back a JDOM Element, which is really cool.
The major problem comes when you start getting into RDF, as no standard RDF query language is available. Jena provides a decent query language (Swish right?) and I might incorporate this into NewsMonster, but then I don't have native XML support for the data, just RDF.
... again... none of this stuff is ever perfect :)
Sam Ruby writes, "When I wrote this logic, I marveled at the ability to easily define an application specific mapping in Python vs using an existing framework, but now that I look back, there would be value in factoring this out into a reusable...
Trackback from Sterling Hughes
May I point out SXSLT then? I believe it achieves the complete
integration of XML and XPath into a programming language. There is no
longer an impedance mismatch. SXSLT has two notations: one of them is
the compliant XPath implementation. The other notation differs in the
style and placement of parentheses. The interpreters/compilers for
both notations are regular functions and can be used
anywhere. Furthermore, the SXPath notation permits any custom predicate as
a filter or a path adjoiner. SXPath has been used in practice. The papers
My gosh! There are so many cool XML APIs for Python! Check out xmltramp, and I've already linked to PyMeld before. Plus, check out this link from a comment on Sam Ruby's blog to this library for using S-expressions to deal with XML. There are two...