atom2json.erl converts a directory of Atom files to a directory of JSON files. As with most real-life problems, this one has multiple layers.
First, one needs to settle on an XML-to-JSON mapping. It turns out that there are many different approaches to this problem. For now, I elected to do some generic XML-to-JSON mapping crap. An RFC in this area would be helpful, particularly one that dealt with the notion of Extensions, and one that exposed the true structure of [x]html Text Constructs, as those would be crucial enablers for things like standard Map/Reduce jobs that extract Microformats and RDFa.
Next, it turns out that the data structures returned from the XML parser/builder (xmerl) are not what the JSON parser/builder (rfc4627) expects, so there’s yet another layer of impedance mismatch.
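To make the mismatch concrete, here is a minimal sketch rather than the code from atom2json.erl. xmerl hands back #xmlElement{} and #xmlText{} records, while the JSON builder wants plain lists, binaries, and {obj, [{Key, Value}]} tuples. The module name shape_demo, the function names, and the [Name, {obj, Attributes}, Children] layout are assumptions made for illustration.

```erlang
-module(shape_demo).
-export([parse/1, to_json_term/1]).

%% Record definitions for #xmlElement{}, #xmlAttribute{}, #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string; xmerl returns a tree of records, not JSON-ready terms.
parse(Xml) ->
    {Doc, _Rest} = xmerl_scan:string(Xml),
    Doc.

%% Hypothetical mapping: element -> [Name, {obj, Attributes}, Children].
to_json_term(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    [atom_to_binary(Name, utf8),
     {obj, [{atom_to_list(A#xmlAttribute.name),
             unicode:characters_to_binary(A#xmlAttribute.value)}
            || A <- Attrs]},
     %% Keep only elements and text; comments and the like are dropped.
     [to_json_term(C)
      || C <- Content,
         is_record(C, xmlElement) orelse is_record(C, xmlText)]];
to_json_term(#xmlText{value = Value}) ->
    unicode:characters_to_binary(Value).
```

Calling shape_demo:to_json_term(shape_demo:parse("<a href=\"x\">hi</a>")) yields a nested term built only from lists, binaries, and an {obj, ...} tuple, which is the kind of shape a JSON encoder can walk without knowing anything about xmerl records.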
The next level down, there are Erlang concepts of tuples, arrays, binary, and (lower case) atoms that need to be dealt with. Even lower down, there is UTF-8, which apparently the current rfc4627 implementation doesn’t handle properly, so that module needs to be patched. (Note: this is only for the JSON builder part; another patch would be required to support JSON parsing.)
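The UTF-8 layer can be illustrated without touching rfc4627 at all. The sketch below assumes text arrives as a list of Unicode code points and encodes it into a UTF-8 binary before it gets anywhere near a JSON builder. The module and function names are hypothetical; xmerl_ucs:to_utf8/1 ships with xmerl.

```erlang
-module(utf8_demo).
-export([to_utf8_binary/1]).

%% Encode a list of Unicode code points as a UTF-8 binary.
%% Calling list_to_binary/1 directly would not do: it treats each
%% integer as a raw byte and fails for values above 255.
to_utf8_binary(CodePoints) when is_list(CodePoints) ->
    list_to_binary(xmerl_ucs:to_utf8(CodePoints)).
```

For example, utf8_demo:to_utf8_binary([233]) turns the single code point for "é" into the two-byte binary <<195,169>>. On current OTP releases, unicode:characters_to_binary/1 would do the same job.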
Add in requirements like coalescing consecutive XML text nodes, and the desire to spawn a separate process per conversion, and the task seems like a fairly daunting one. Yet the resulting Erlang program is remarkably compact, clean, and simple.
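Both of those requirements fit in a few lines each. What follows is a sketch under stated assumptions, not the code from atom2json.erl: text nodes are modeled as hypothetical {text, String} tuples instead of xmerl records, and convert_all/2 is an invented helper that runs one process per input and collects the results in input order.

```erlang
-module(convert_demo).
-export([coalesce/1, convert_all/2]).

%% Merge runs of adjacent {text, String} nodes into a single node.
coalesce([{text, A}, {text, B} | Rest]) ->
    coalesce([{text, A ++ B} | Rest]);
coalesce([Node | Rest]) ->
    [Node | coalesce(Rest)];
coalesce([]) ->
    [].

%% Spawn one process per input, then gather the results in input order.
%% Note: if Fun crashes in a worker, the matching receive waits forever;
%% real code would want a monitor or a timeout.
convert_all(Fun, Inputs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Fun(Input)} end),
                Ref
            end || Input <- Inputs],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

In a converter like the one described, Inputs could be the file names returned by file:list_dir/1 and Fun the per-file conversion. Tagging each reply with a unique reference lets the parent match results to requests regardless of which process finishes first.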
With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges. With Erlang, I get the feeling of a Victorian mahogany armoire; one where scratches in the wood simply reveal more rich wood.
Interesting, and thanks again - my planned bit of late night fun for today is to drag various chunks of pre-existing XML data into Mnesia. It looks very much like your code will give me a bit of a leg up. At the very least, I no longer have to look in the documentation to find file:list_dir. :)
I have similar feelings about Erlang, it would seem. It was Ewan Silver’s comment of “The more I look at, and play with, Erlang the more I like it.” that made me finally take the plunge and start tinkering around with Erlang. I know exactly what he meant now - I like it more every day.
It’s worth noting though, that while YOUR resulting Erlang program is undeniably clean, compact and simple (I was expecting a lot more code after reading your post), it’s also possible to produce an extremely unpleasant mess with Erlang in the wrong hands. That’s true of any language of course, but I have a hunch that Erlang is very near the top of the “bad code potential per pound” list.
Beautiful work!
The part that converts an XML entity into an encoder-ready list/tuple for JSON can actually be even more concise:
%% Requires the xmerl record definitions: -include_lib("xmerl/include/xmerl.hrl").
%% An element becomes [Name, {obj, Attributes}, Children].
json(#xmlElement{name=Name, attributes=Attributes, content=Content}) ->
    [atom_to_binary(Name), {obj, json(Attributes)}, json(Content)];
json(List) when is_list(List) ->
    json_1(List, []).

json_1([], Acc) ->
    lists:reverse(Acc);
json_1([#xmlAttribute{name=Name, value=Value}|Rest], Acc) ->
    json_1(Rest, [{Name, list_to_binary(xmerl_ucs:to_utf8(Value))}|Acc]);
json_1([#xmlElement{}=Element|Rest], Acc) ->
    json_1(Rest, [json(Element)|Acc]);
%% Coalesce consecutive text nodes before emitting them.
json_1([#xmlText{value=Value1},#xmlText{value=Value2}|Rest], Acc) ->
    json_1([#xmlText{value = Value1 ++ Value2} | Rest], Acc);
json_1([#xmlText{value=Value}|Rest], Acc) ->
    json_1(Rest, [list_to_binary(xmerl_ucs:to_utf8(Value))|Acc]);
%% Skip anything else (comments, processing instructions, ...).
json_1([_Other|Rest], Acc) ->
    json_1(Rest, Acc).