Gordon Weakliem: I’ve run into a few references to Atom - JSON mappings, but nothing really canonical yet.
In addition to NewsGator, there seems to be some interest at Google, AOL, and the ASF.
Perhaps it is time for an RFC?
I’ve also come to something similar as a storage format for my still-in-development blog system (which I should really get up and running, as I start to have some small things to say ;-) )
Major differences are that I’m using plurals for lists ("authors", "contributors", "links" and "categories" vs. "author", "link", etc.) and a big integer to represent date-times (milliseconds since the Epoch: "updated": 1168936318800).
I’m also doing (for the moment) like Google for text constructs, with a "type" property to reflect the @type attribute and "value" (not "$t") for the content; but Gordon’s proposal of using either a "text", "html" or "xhtml" property is really interesting (the only drawback being that "content" cannot be mapped that way; maybe something like this would do it: "content": {"type": "image/svg+xml", "xml": "..."}, "content": {"type": "image/png", "binary": "..."} and "content": {"type": "text/html", "src": "http://..."}).
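For concreteness, a complete entry under these conventions might look like the following sketch (the exact field set and values are my own illustration, not a settled mapping):

```python
import json

# Hypothetical sketch of a single entry using the conventions described
# above: plural keys for lists, epoch milliseconds for date-times, and
# typed content objects. Field names beyond Atom's own are illustrative.
entry = {
    "id": "tag:example.org,2007:entry-1",
    "title": {"type": "text", "value": "A sample entry"},
    "updated": 1168936318800,  # milliseconds since the Unix Epoch
    "authors": [{"name": "Alice"}],
    "links": [{"rel": "alternate", "href": "http://example.org/1"}],
    "categories": [{"term": "json"}],
    "content": {"type": "text/html", "src": "http://example.org/1.html"},
}

print(json.dumps(entry, indent=2))
```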
The main issue is with extensions. If you want to also map Atom extensions to JSON, you’ll have to deal with namespaces.
I see two options: either map each extension to a JSON structure, e.g.:
"revision": { "number": "0.1", "initial": true, "when": 1168936318800 }
or keep extensions as serialized XML in a single string property:
"extensions": "<revision xmlns=\"http://purl.org/atompub/revision/1.0\"><number>0.1</number><initial>yes</initial><when>2007-01-16T08:31:58Z</when></revision>"
As a storage format, I’ve chosen the second solution, but that’s not a solution for a "generic Atom-to-JSON" mapping.
There’s also a problem with xml:lang and xml:base (the latter can easily be solved: add a "base" property to the root item and to text constructs, and resolve every IRI in properties to an absolute one, or to an IRI relative to the root’s "base").
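The base-resolution idea can be sketched in a few lines (a minimal illustration; the property names are assumptions, not a spec):

```python
from urllib.parse import urljoin

# Minimal sketch: keep one "base" property on the root item and
# resolve every relative IRI in the properties against it on input.
root = {
    "base": "http://example.org/blog/",
    "links": [{"rel": "alternate", "href": "2007/01/entry.html"}],
}

for link in root["links"]:
    link["href"] = urljoin(root["base"], link["href"])

print(root["links"][0]["href"])
```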
I forgot to mention JsonML, so we now have a generic XML-to-JSON mapping:
"summary": { "xhtml": [ "p", "Just a single paragraph with an ", [ "em", "emphasized" ], " word" ] }
(Note that I didn’t include the <xhtml:div> that is required in Atom’s XML serialization.)
Still lacks namespace handling, so that’s still not usable for Atom extensions (or foreign markup –such as MathML or SVG– in XHTML content).
JsonML uses "xmlns" and "xmlns:…" attributes (the same kind of serialization as Namespaces in XML). Google uses something similar too.
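A minimal element-to-JsonML converter is easy to sketch (a simplified illustration; note how ElementTree’s {uri}localname tags would surface exactly where namespace handling is missing):

```python
import xml.etree.ElementTree as ET

def to_jsonml(elem):
    """Convert an ElementTree element to a JsonML-style list:
    [tag-name, {attributes}?, child-or-text*]."""
    item = [elem.tag]  # namespaced tags appear as "{uri}localname"
    if elem.attrib:
        item.append(dict(elem.attrib))
    if elem.text and elem.text.strip():
        item.append(elem.text)
    for child in elem:
        item.append(to_jsonml(child))
        if child.tail and child.tail.strip():
            item.append(child.tail)
    return item

doc = ET.fromstring(
    '<p>Just a single paragraph with an <em>emphasized</em> word</p>')
print(to_jsonml(doc))
```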
Thomas Broyer wrote, among other things:
In other words, you’re recreating the Metaweblog API. Badly. And all this time I didn’t think it was possible to make it any worse.
Thomas said: "If you want to also map Atom extensions to JSON, you’ll have to deal with namespaces"
No, you need to find a good, non-crappy way of representing those extensions in JSON. Trying to come up with a workable XML-to-JSON mapping that doesn’t suck is a waste of time and helps no one.
I initially proposed JEP to try to head off attempts to map Atom into JSON.
If you want Atom then use APP.
If you want a RESTful JSON-based protocol then use JEP.
If you want to map APP into JSON either you do a full mapping of XML into JSON - since that is what foreign markup in Atom would force you to do - which would give you gorpy JSON, or you could do a partial mapping of XML into JSON and invariably drop data on the floor and upset somebody. Either way it won’t be pretty.
Joe said: "If you want to map APP into JSON either you do a full mapping of XML into JSON - since that is what foreign markup in Atom would force you to do - which would give you gorpy JSON, or you could do a partial mapping of XML into JSON and invariably drop data on the floor and upset somebody. Either way it won’t be pretty."
It’s not like JSON is actually all that difficult to produce. So what if my Atom-to-JSON serializer doesn’t serialize everything you need. Write your own or download the source to mine and add it in. Very simple. It’s called open source for a reason.
“So what if my Atom-to-JSON serializer doesn’t serialize everything you need. Write your own or download the source to mine and add it in.”
Wow, what a great basis for a spec, writing the APP would have been so much easier if I had that sentence handy to throw at any problem that arose.
James: I’m using JSON as a storage format, behind the scenes, so I don’t need a generic Atom-to-JSON mapping. If I ever need to map Atom to JSON, it’s when implementing the APP, so – as a server – I can do whatever I want with extensions. Mapping JSON to Atom when serving a feed is pretty straightforward, and how I do it is my own business.
But if you want a "standard mapping" (e.g. an IETF RFC), it has to deal with extensions; which, in Atom, are specified as XML "foreign markup", not as "model extensions" (except for "simple extensions").
Just to say that I was pointing out issues for those who want an RFC for an Atom-to-JSON mapping; but if you consider JSON only for specific "private" use, then you don’t ever need such a mapping: just define your own "JSON-based format" (just as if you used your own "private" DTD/schema with an XML document). That’s what I’ll do (editing entries with AJAJ-enabled HTML forms). If a standard JSON mapping arises, I’ll probably use it as an alternate serialization too.
In brief ('cause the above might be a bit messy ;-) ): if you want a standard mapping, it’ll have to deal with extensions; if you use JSON in “private” exchanges, you don’t need a standard mapping.
I’ve spent a bit of time trying to represent Atom + Extensions faithfully in a non-XML form. For extensions such as dc:modified, implementations will just want the text content of the element, they don’t want to parse an XML fragment to get at it. For more complicated extensions with attributes, children, or mixed content, implementations need the content in XML form. It is also undesirable for complicated extensions to flip between a simplified form and complex form depending on whether they happen to contain optional attributes, because that makes more work for the consumer.
In my RDF implementation, I decided to introduce some duplication, so that extensions are represented using both an extensionXML property and an extensionText property. There is also xml:lang to worry about for non-simple extensions, xml:base, and the fact that you can’t just concatenate the namespace URI and localname, because {http://purl.org/dc/elements/1.1/}modified and {http://purl.org/dc/elements/1.1/mod}ified aren’t the same extension (unfortunately).
In JSON this would translate to something like:
"{http://purl.org/dc/elements/1.1/}modified": {"text": "2007-01-01T12:00:00", "xml": "<dc:modified xmlns:dc='http://purl.org/dc/elements/1.1/'>2007-01-01T12:00:00</dc:modified>", "lang": "en-GB", "base": "http://example.com"}
Of course, it isn’t necessary to serialize base, lang, or xml, if you know what the extension is, and know that it isn’t needed to consume the extension, but general purpose implementations can serialize to both syntaxes to be safe.
It might also be useful to provide the option of a “json” property too, so that structured extensions can be serialized to custom JSON structs optimized for that particular extension.
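The dual text/xml representation can be sketched like this (property names follow the example above; the helper name is my own):

```python
import xml.etree.ElementTree as ET

def extension_to_json(xml_fragment):
    """Sketch: serialize an extension element to both a plain-text
    property and an XML property, keyed by "{namespace-uri}localname"
    so that prefix choices don't matter."""
    elem = ET.fromstring(xml_fragment)
    return {
        elem.tag: {  # ElementTree already uses the {uri}localname form
            "text": "".join(elem.itertext()),
            "xml": xml_fragment,
        }
    }

ext = extension_to_json(
    "<dc:modified xmlns:dc='http://purl.org/dc/elements/1.1/'>"
    "2007-01-01T12:00:00</dc:modified>")
print(ext)
```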
Thomas said: "But if you want a 'standard mapping' (e.g. an IETF RFC), it has to deal with extensions; which, in Atom, are specified as XML 'foreign markup', not as 'model extensions' (except for 'simple extensions')."
Dealing with extensions != specifying how extensions are serialized. You can "deal with extensions" by simply saying that extensions in the JSON representation take the form of additional, arbitrary key/value pairs beyond those defined for the core Atom elements. Specifically how the values of those extensions are serialized can be left to other documents.
James wrote:
It’s not like JSON is actually all that difficult to produce.
Here’s a short quiz:
# In Python, can a list of integers be converted to JSON with print repr(theList)? Why or why not?
# In Python, can a list of strings be converted to JSON with print repr(theList)? Why or why not?
# (true or false) JSON is case-sensitive
# (true or false) This is valid JSON: [ "www.詹姆斯.com", "\uB7" ]
# (true or false) This is valid JSON: [ true, 'TRUE', false, null ]
# (true or false) JSON MAY be encoded in UTF-8, or one of the two variations of UTF-16, or one of the four variations of UTF-32
# (true or false) JSON MAY contain a BOM (byte order mark) to indicate a Unicode-based encoding
# (true or false) JSON parsers MUST treat windows-1252-encoded and iso-8859-1-encoded JSON files the same way
# If a UTF-8 JSON document is served over HTTP with a charset:UTF-16 parameter in the Content-Type header, which encoding should a parser respect?
None of these are trick questions, except possibly the last one.
Actually, the JSON media type does not define any parameters. That piece of metadata is encoded in the first four octets of the JSON record. Ruby’s postulate in full effect.
application/atom+json. Specifying application/atom-and-extensions-and-everything-you-can-possibly-cram-into-an-xml-document+json is neither necessary nor something anybody would want to do. IMO of course.
I’ll have a crack at this, no doubt making the gaps in my understanding of Unicode and character sets glaringly apparent. I am quite interested in the answers though.
# In Python, can a list of integers be converted to JSON with print repr(theList)? Why or why not?
It appears to me that print repr(intList) would produce a legal JSON array, but I’m not sure you can answer this question without knowing something about your Python implementation. By spec, Python ints are defined to be minimally in the range -2147483648 through 2147483647, but implementations can allow a larger range. Some ECMAScript operators are defined only on numbers within the minimal range, so it’s possible that your Python implementation could define integers with values outside the range that ECMAScript has arithmetic operations defined on. However, the JSON production for value simply defines number as one possible production, meaning that a number is not guaranteed to be an integer. So the lesson here may be that even though you’ve received a number in a JSON structure, certain operations on it may not be defined. Of course, this question gets much harder if your expected client is not actually a Javascript interpreter. For example, with Java or C#, you’d certainly have to take numeric limits into consideration.
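Trying this in a current interpreter (Python 3 shown; the quiz was written for Python 2, so this is only an approximation) bears the point out for the integer case, while still illustrating the numeric-limits trap:

```python
import json

# repr of a list of ints happens to coincide with JSON array syntax,
# so it round-trips through a JSON parser...
ints = [1, -2147483648, 2147483647]
assert json.loads(repr(ints)) == ints

# ...but an ECMAScript consumer stores numbers as IEEE-754 doubles,
# so an integer beyond 2**53 silently loses precision on that side.
big = 10**16 + 1
assert float(big) != big

print(repr(ints))
```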
# In Python, can a list of strings be converted to JSON with print repr(theList)? Why or why not?
You have to consider escaping. According to the grammar for JSON, strings are delimited by double quotes, but Python implementations may use single- or double- quotes. To wit:
>>> print repr(["A string", "Yell \"fire\" in a crowded theater", "Don\'t cry \"fire\" in a crowded theater", "Cry 'havoc' and let slip the dogs of war", """For some reason, "foo" was the answer"""])
['A string', 'Yell "fire" in a crowded theater', 'Don\'t cry "fire" in a crowded theater', "Cry 'havoc' and let slip the dogs of war", 'For some reason, "foo" was the answer']
My Python interpreter allows itself some latitude in generating a printable representation of a string. repr is intended to generate a representation that’s valid for Python, not some other language. I think that may be Mark’s overall point, but I’ll continue.
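That latitude is exactly the problem: repr prefers single quotes, which the JSON grammar does not allow (Python 3 shown; a real JSON serializer gets the escaping right):

```python
import json

strings = ['A string', 'Yell "fire" in a crowded theater']

# repr emits single-quoted strings here, which JSON forbids...
try:
    json.loads(repr(strings))
    parsed = True
except ValueError:
    parsed = False
print("repr output valid JSON?", parsed)

# ...whereas json.dumps escapes correctly and round-trips.
assert json.loads(json.dumps(strings)) == strings
print(json.dumps(strings))
```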
# (true or false) JSON is case-sensitive
True, or rather, true. { 'foo': 'bar' } and { 'Foo': 'bar' } are not equivalent.
# (true or false) This is valid JSON: [ "www.詹姆斯.com", "\uB7" ]
false, strictly speaking, as an array production isn’t valid at the top level of the grammar. It’s valid as an array, but "\uB7" is probably not what you meant; if you were looking for ·, that would be "\u00B7".
# (true or false) This is valid JSON: [ true, 'TRUE', false, null ]
True. Or rather, true. At least as far as being a valid array.
# (true or false) JSON MAY be encoded in UTF-8, or one of the two variations of UTF-16, or one of the four variations of UTF-32
true (see section 6)
# (true or false) JSON MAY contain a BOM (byte order mark) to indicate a Unicode-based encoding
true (see section 3)
# (true or false) JSON parsers MUST treat windows-1252-encoded and iso-8859-1-encoded JSON files the same way
true, in that neither are Unicode. JSON SHALL be encoded in Unicode - the way I read this, if a file was served as either encoding, it’s not valid JSON.
# If a UTF-8 JSON document is served over HTTP with a charset:UTF-16 parameter in the Content-Type header, which encoding should a parser respect?
Robert Sayre pointed out in the comments that “the JSON media type does not define any parameters”, so the only encoding that’s respected is in the document itself.
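Section 3’s detection rule is simple enough to sketch (a simplified illustration of RFC 4627; BOM handling is omitted for brevity):

```python
# The first two characters of a JSON text are ASCII, so the pattern of
# NUL octets in the first four bytes identifies the Unicode encoding.
def detect_json_encoding(octets):
    b = octets[:4]
    if len(b) == 4 and b[0] == 0 and b[1] == 0 and b[2] == 0:
        return "utf-32-be"   # 00 00 00 xx
    if len(b) == 4 and b[1] == 0 and b[2] == 0 and b[3] == 0:
        return "utf-32-le"   # xx 00 00 00
    if len(b) >= 2 and b[0] == 0:
        return "utf-16-be"   # 00 xx
    if len(b) >= 2 and b[1] == 0:
        return "utf-16-le"   # xx 00
    return "utf-8"

print(detect_json_encoding('{"a": 1}'.encode("utf-16-le")))
```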
an array production isn’t valid at the top level of the grammar.
JSON-text = object / array
An array is valid at the top... in my opinion.
True. Or rather, true. At least as far as being a valid array.
# (true or false) This is valid JSON: [ true, 'TRUE', false, null ]
It contains a single quoted string.
It contains a single quoted string.
Good catch. After being long winded about the Python interpreter’s quoting habits, I didn’t parse very well myself.
An array is valid at the top... in my opinion.
Indeed, RFC 4627 shows that in the BNF. The grammar given on the RHS of the json.org page shows only object at the top level.
Thomas Broyer: thanks for the JsonML link, i was going to ask if there were any projects like that underway. at some point though, there needs to be some kind of namespace support. step third and final is of course world domination.
i support the /notion/ of application/atom+json, but as we digress into the sea of extensions the usefulness becomes limited, at best. the net sum is, application/atom+json just doesnt seem wieldable nor practical. why one-off & codify one particular mapping? we really just need text/jsonized-xml.
the net sum is, application/atom+json just doesnt seem wieldable nor practical.
Given that Atom is somewhat more complete than the “core” elements in any version of RSS, it may very well turn out that a JSON mapping of just the core Atom elements, perhaps with additional limitations (like the ability for content to be of an arbitrary mime type) or a small number of known extensions (like threading) might very well meet an 80/20 point for a lot of people.
For people who need more, there’s always XML.
I see the situation as being like wikis. A lot of people criticize wiki syntax. There certainly isn’t anything that can be done in wiki syntax that can’t be done in HTML. But having a simpler syntax — even with known limitations — turns out to be fairly crucial to the success of wikis.
I agree. Of course, I have lots of opinions.
It’s not really any worse than accepting an Atom entry and displaying an HTML file as a result. All the information might not be included. The valuable thing about Atom is the field definitions anyway. The layout routes around XML (just like RSS), and the API amounts to an HTTP users' manual (not that it’s worse for it).
All the information might not be included.
I would argue that the Text Construct in Atom is an artifact of the desire to embed markup inside of markup, and is a result of the inevitable confusion that the similar escaping rules of the two layers create.
Needless to say, this doesn’t apply to JSON.
A simplification would be to say that all text constructs are the equivalent of what can occur inside an HTML5 div element. If you literally want an angle bracket, you would have to escape it. If you want an ampersand, you should escape it, but might be safe otherwise. If you wish to use any other HTML predefined entity, feel free.
I would argue that the Text Construct in Atom is an artifact of the desire to embed markup inside of markup
Yep. And the XHTML variant is an artifact of the desire to embed XML in XML+Namespaces. How would you differentiate between HTML and XHTML in JSON, without making implementations choose between “html” and “xhtml” fields with hazy multipart semantics? Co-constraint?
Having a single spec (Atom in XML 1.0) for the bits on the wire is good for interop. I think it would be bad if people started offering feeds in JSON and consumers had to support that in addition to Atom, jumped-the-gun Atom 0.3 and all the mutually incompatible flavors of RSS.
It is reasonable for a Web app to want to transfer data between a server and a browser in JSON, but I don’t see why this couldn’t be a private thing for a service that wants it.
How would you differentiate between HTML and XHTML in JSON
My point is that you wouldn’t, though I was unclear and imprecise. If maintaining that distinction is important, then use the XML representation.
Let me expand on that.
The Universal Feed Parser will convert a feed, be it Atom, jumped-the-gun Atom 0.3 or any of the mutually incompatible flavors of RSS, to a struct.
Along the way, it will drop many extensions, and horribly mangle a number of others. Yet many (including me) find this code to be incredibly useful.
As to text constructs, it will sanitize, canonicalize and serialize both HTML and XHTML into a string. Along the way, information such as whether a given attribute was single quoted, double quoted, or even unquoted is lost. Venus, which uses the UFP, doesn’t need or care to know about notational distinctions such as these in order to support SVG or MathML. In fact, it will happily render ill-formed SVG which has been escaped and stuffed into the description element of an ill-formed RSS 2.0 feed.
Yes, I think we agree. There will be information lost in translation. You could even call it silent data loss. But I don’t think it’s a bad idea.
I do question the value of standardizing at this time. Is there evidence of implementor or user pain? Or are we concerned Don Box will standardize it for us? ;)
sam, to that i concede: atom + atom threading would qualify for 80/20. there’s no doubt it would have significant impact on web technologies.
but you have to feel bad for the 20% left over that have to revert to hacking JsonML into usability, using XML, or fudging it when the spec hits up against use limitations. there’s no doubt i’m walking into a minefield for discussing XML-to-json mappings, but the alternative of special casing out one proprietary mapping is sacrificing a genuine durable interoperability between json and xml renditions (in this case, of atom), and compromising on some magic spice to make it work “at least well enough”. there’s ups and downs to that, but the greatest danger i see is providing just enough “core” to get by while never bothering to tackle the larger issue.
and what if the larger issue does get tackled? what if we create a clean way of reduce XML into object data structures, a universal bidirectional mapping? application/atom+jsonml? how many code paths is atom going to demand? when do we re-engineer atom for yaml?
i really haven’t seen much discussion on JsonML or projects like it, but as the core technological problem at hand, i think they deserve some very serious inquiry before one-off projects like application/atom+json can even be considered. i don’t know XML well enough to judge the full scope or feasibility of reducing XML into a simpler JSON data structure, but the idea seems noble, and JsonML seems so far both simple & bluntly practical. if possible, it would knock out this problem for good, and a million other similar problems down the road. i keep a reasonably active feed list going, but in all the json-xml hubbub, i’ve never seen mention of JsonML or any translation project like it before. undoubtedly there’ll have been some smart people thinking about the topic, but i’d like to see some kind of public discourse on translation before even starting the discussion on re-implementing specifications.
for my own web service projects, i’ve been using deliberately simple data structures that can easily be represented in both json and xml, then using output filters to serialize the data structures to the client’s desired markup. being able to use the appropriate serializer where desired has been enormously convenient. the notion of being able to map any xml doc into json is spine-tingling, and warrants some serious exploration before we start spec re-implementations.
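that filter approach can be sketched in a few lines (names and the entry shape here are illustrative, not from any spec):

```python
import json
import xml.etree.ElementTree as ET

# one simple flat structure, two output filters chosen per request
entry = {"title": "hello", "updated": "2007-01-16T08:31:58Z"}

def to_json(data):
    return json.dumps(data)

def to_xml(data, root="entry"):
    elem = ET.Element(root)
    for key, value in data.items():
        child = ET.SubElement(elem, key)
        child.text = str(value)
    return ET.tostring(elem, encoding="unicode")

print(to_json(entry))
print(to_xml(entry))
```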
hopefully you’ll pardon me for my naysaying; my intent was not purely to be contrary here. i just worry that a re-implementation will set an extremely poor (and very popular) precedent for future system intercompatibility, when really what’s wanted is a translation.
but the alternative of special casing out one proprietary mapping is sacrificing a genuine durable interoperability between json and xml
It’s not a sacrifice, because there is no such thing, there can never be such a thing without destroying JSON or XML or both, and that’s a feature not a bug. Of course there will be enormous amounts of time and energy wasted by people who don’t realize that yet, who venture out determined to build such a chimerical thing. These people will erect wikis and spawn mailing lists and spill much virtual ink arguing about the finer points of this thing-that-can-not-be-built. This course is as predetermined as the outcome, not even a matter of free will: now that both concepts exist in the collective consciousness, there exists a group of people who yearn to build a bridge between them. It is what they do. That it is guaranteed to end in tears or bloodshed or colossal failure or RDF does not deter them, has never deterred them, and will not deter them this time, or the next, or the time after that.
My God. I think Mr. Sayre and I actually agree on something. What in the hell is this world coming to.
I’m not in any particular rush to standardize, but having a common doc or example we can all generally point to and say, “We’re doing it like that” is useful. An I-D is as good as anything for that purpose.
mark, i need to have some inkling of an idea why json and xml are completely irreconcilable before i can scratch the notion out. i’m willing to accept that there are deep-running latent issues ready to foible the whole mess into a never-ending, hopelessly inexact half-specification, but you’ll forgive me if i dont take your naysaying as immediate canon in discovering that nasty truth.
taking the one and only case example on the entire subject, jsonml does a fairly good job of imposing a little added structure to allow json markup of the xml data structure. xml is preserved, untouched, and json gains a couple little syntax and structure requirements. its certainly against the spirit of json, but most any json structure could be ported with absolutely minimal difficulty to jsonml.
its all just data. xml just does a good job of making it overly complex. what couple bridges i’ve made have so far not ended in tears or bloodshed, and i have no reason to believe that a concerted effort couldnt produce something adequate.
I’m not in any particular rush to standardize, but having a common doc or example we can all generally point to and say, “We’re doing it like that” is useful.
Sounds like you’re in a rush to standardize to me. You can make a web page documenting what you’ve done. Just like Google has already done.
Sayre said: "You can make a web page documenting what you’ve done"
Nah, having the code available is good enough.
Any news on this recently? I’ve run into a similar need for the Open Cloud Computing Interface (OCCI) and would rather like an easy way to convert the native OCCI Atom feed into JSON for web interfaces and the like.
Sam
Ok, so it turns out James Snell had subsequently written Convert Atom documents to JSON, Apache Abdera looks interesting, and there are projects like XSLTJSON which look interesting too...
Sam