In the current JSON vs. XML debate (see Bray, Winer, Box, Obasanjo, and many others), there are three things that are important to understand:
- There is no information that can be represented in an XML document that cannot be represented in a JSON document.
- There is no information that can be represented in a JSON document that cannot be represented in an XML document.
- There is no information that can be represented in an XML or JSON document that cannot be represented by a LISP S-expression.
They are all capable of modeling recursive, hierarchical data structures with labeled nodes. Do we have a term for that, like Turing completeness for programming languages? It would certainly be convenient in discussions like this.
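To make the equivalence claim concrete, here is a minimal Python sketch (Python is my choice for illustration, and the `Node` class and the three `to_*` functions are hypothetical names, not part of any standard). It assumes labels are legal in all three syntaxes and ignores attributes; it renders one tree of labeled nodes three ways:

```python
import json
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape


@dataclass
class Node:
    """A labeled node whose children are strings or further nodes."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def to_xml(node: Node) -> str:
    # Assumes node.label is a legal XML element name.
    inner = "".join(
        to_xml(c) if isinstance(c, Node) else escape(c) for c in node.children
    )
    return f"<{node.label}>{inner}</{node.label}>"


def to_json_obj(node: Node) -> dict:
    # One convention among many: {label: [children...]}.
    return {
        node.label: [
            to_json_obj(c) if isinstance(c, Node) else c for c in node.children
        ]
    }


def to_sexpr(node: Node) -> str:
    # json.dumps is used only as a convenient string quoter (an approximation
    # of LISP string syntax that is adequate for plain ASCII text).
    parts = [node.label] + [
        to_sexpr(c) if isinstance(c, Node) else json.dumps(c)
        for c in node.children
    ]
    return "(" + " ".join(parts) + ")"


tree = Node("names", [Node("name", ["Anna Maria"]), Node("name", ["Fitzwilliam"])])
xml_text = to_xml(tree)
json_text = json.dumps(to_json_obj(tree))
sexpr_text = to_sexpr(tree)
```

The same in-memory tree comes out in all three notations, and only the punctuation differs.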
Syntactic sugar
The only important differences among the three are the size of the user base (and opportunity for network effects), software support, and syntactic convenience or inconvenience. The first two are fickle — where are the Pascal programmers of yesteryear? — so let’s concentrate on syntax. Here’s a simple list of three names in each of the three representations:
<!-- XML --> <names> <name>Anna Maria</name> <name>Fitzwilliam</name> <name>Maurice</name> </names>
/* JSON */ {"names": ["Anna Maria", "Fitzwilliam", "Maurice"]}
;; LISP '(names "Anna Maria" "Fitzwilliam" "Maurice")
Nearly all comparisons between XML and JSON look something like this, and I have to admit, it’s a slam dunk. In an example like this, XML seems to go out of its way to violate Larry Wall’s second slogan: “Easy things should be easy and hard things should be possible.” On the other hand, I rarely see any data structures that are really this simple, outside of toy examples in books or tutorials, so a comparison like this might not have a lot of value; after all, I could have written the XML like this:
<names>Anna Maria, Fitzwilliam, Maurice</names>
Let’s dig a bit deeper and see what we find.
Node labels
In the previous example, I made some important assumptions: I assumed that the node label for the individual names (“name”) didn’t matter and could be omitted from the JSON and LISP, and I assumed that the node label for the entire list (“names”) was a legal XML and LISP identifier. Let’s break both of those assumptions now, and make the label for the list “names!” and the labels for the items “male-name” or “female-name”. Here’s what we can do now to handle this in XML, JSON, and LISP:
<!-- XML --> <list label="names!"> <female-name>Anna Maria</female-name> <male-name>Fitzwilliam</male-name> <male-name>Maurice</male-name> </list>
/* JSON */ {"names!": [ {"female-name": "Anna Maria"}, {"male-name": "Fitzwilliam"}, {"male-name": "Maurice"}]}
;; LISP '(names! (female-name "Anna Maria") (male-name "Fitzwilliam") (male-name "Maurice"))
XML is forced to use a secondary syntactic construction (an attribute value) to represent the top-level label, because it no longer matches XML’s syntactic rules for element names. LISP can still use names! as a token, and JSON doesn’t notice, because it has been using a string all along. XML syntax is convenient for trees of labeled nodes only when the labels are heavily restricted. That aside, however, note that as soon as we add any non-trivial complexity to the information (as soon as we assume that node labels matter), all three formats start to look a little more like XML.
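The label restriction is easy to check mechanically. The following Python sketch uses the standard library parsers; the two helper names are hypothetical, and the XML check is a rough test (it simply tries to parse an empty element with that name) rather than a full implementation of the XML name rules:

```python
import json
import xml.etree.ElementTree as ET


def xml_accepts_label(label: str) -> bool:
    """Rough check: does <label/> parse as a well-formed XML document?"""
    try:
        ET.fromstring(f"<{label}/>")
        return True
    except ET.ParseError:
        return False


def json_accepts_label(label: str) -> bool:
    """Does the label survive a JSON round trip as an object key?"""
    doc = json.dumps({label: []})
    return label in json.loads(doc)


plain_ok = xml_accepts_label("names")          # legal element name
hyphen_ok = xml_accepts_label("female-name")   # hyphens are allowed
bang_ok = xml_accepts_label("names!")          # "!" is not allowed in a name
json_bang_ok = json_accepts_label("names!")    # JSON keys are just strings
```

Because a JSON key is an ordinary string, any label passes; the XML parser rejects “names!” before the application ever sees it.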
Additional node attributes
Now, let’s add the next wrinkle, by allowing additional attributes (besides a label) for each node. In this case, we’re going to add a “lang” (language) attribute to each of the nodes:
<!-- XML --> <list label="names!"> <female-name xml:lang="it">Anna Maria</female-name> <male-name xml:lang="en">Fitzwilliam</male-name> <male-name xml:lang="fr">Maurice</male-name> </list>
/* JSON */ {"names!": [ {"female-name": [{"lang": "it"}, "Anna Maria"]}, {"male-name": [{"lang": "en"}, "Fitzwilliam"]}, {"male-name": [{"lang": "fr"}, "Maurice"]}]}
;; LISP '(names! (female-name (((lang it)) "Anna Maria")) (male-name (((lang en)) "Fitzwilliam")) (male-name (((lang fr)) "Maurice")))
Now, while XML is still using an ad-hoc convention to represent the “names!” label, JSON and LISP are forced to use ad-hoc conventions to represent attribute lists (a dictionary as the first list item for JSON, and an a-list for LISP). It’s also worth noting that JSON and LISP now look so much like XML, both in length and complexity, that it’s hardly possible to distinguish them. Node attributes are not esoteric; they’re the basis of such simple things as hyperlinks.
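The cost of an ad-hoc convention shows up in the reading code. In this Python sketch (the function names are hypothetical, and the JSON layout assumed is the one used in the example above: a dictionary of attributes followed by the text), the XML reader asks the parser for an attribute, while the JSON reader has to know by position which list item holds the attributes:

```python
import json
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml:" prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

xml_doc = (
    '<list label="names!">'
    '<female-name xml:lang="it">Anna Maria</female-name>'
    '<male-name xml:lang="fr">Maurice</male-name>'
    "</list>"
)
json_doc = (
    '{"names!": [{"female-name": [{"lang": "it"}, "Anna Maria"]},'
    ' {"male-name": [{"lang": "fr"}, "Maurice"]}]}'
)


def from_xml(text):
    """Return (label, lang, name) triples from the XML form."""
    root = ET.fromstring(text)
    return [(child.tag, child.get(XML_NS + "lang"), child.text) for child in root]


def from_json(text):
    """Return (label, lang, name) triples from the JSON form."""
    (_list_label, items), = json.loads(text).items()
    triples = []
    for item in items:
        # Convention: the first list item is the attribute dictionary.
        (label, (attrs, value)), = item.items()
        triples.append((label, attrs["lang"], value))
    return triples


xml_triples = from_xml(xml_doc)
json_triples = from_json(json_doc)
```

Both readers recover the same triples, which is the point: the information is identical, and only the place where the convention lives has moved.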
Data typing
XML certainly looks better for the attributes, but now let’s jump to data typing. Let’s assume that there is a country where people use real numbers as names, and we need to find a way to distinguish names that are real numbers from names that just happen to look like real numbers (say, a person named “1.7” in a country where names are strings). JSON and LISP can make that distinction naturally using first-class syntax, while XML has to use a different standard that is not part of the core language:
<!-- XML --> <list label="names!" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <female-name xml:lang="it">Anna Maria</female-name> <male-name xml:lang="en">Fitzwilliam</male-name> <male-name xml:lang="fr">Maurice</male-name> <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name> </list>
/* JSON */ {"names!": [ {"female-name": [{"lang": "it"}, "Anna Maria"]}, {"male-name": [{"lang": "en"}, "Fitzwilliam"]}, {"male-name": [{"lang": "fr"}, "Maurice"]}, {"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP '(names! (female-name (((lang it)) "Anna Maria")) (male-name (((lang en)) "Fitzwilliam")) (male-name (((lang fr)) "Maurice")) (female-name (((lang de)) 7.9)))
XML loses badly on this particular example; however, if the extra data were (say) a date or currency, we would have to make up an ad-hoc way to label its type in JSON and LISP as well, since they have no special syntax to distinguish a date or monetary value from a regular number or string. For anything other than simple numeric data types, this one’s actually a draw.
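A short Python sketch shows where the typing work lands in each case. It assumes the conventional `xsi:type` attribute from XML Schema with an `xsd:float` value; the `typed_value` helper is hypothetical, and it compares the literal prefix `xsd:` instead of resolving the prefix to its namespace, which a careful implementation would do:

```python
import json
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# JSON: the parser itself distinguishes the string "7.9" from the number 7.9.
json_values = json.loads('["7.9", 7.9]')

xml_doc = (
    '<list xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<name>7.9</name>"
    '<name xsi:type="xsd:float">7.9</name>'
    "</list>"
)


def typed_value(elem):
    """Apply the xsi:type hint by hand; ElementTree does not interpret it."""
    if elem.get(XSI_TYPE) == "xsd:float":  # simplification: literal prefix match
        return float(elem.text)
    return elem.text


xml_values = [typed_value(e) for e in ET.fromstring(xml_doc)]
```

For a date or a currency amount, the JSON side would need a helper much like `typed_value` as well, since the JSON parser has nothing to say about those types.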
Mixed content
And now, finally, for mixed content. I will add surnames to all of the (non-numeric) names in the list, and (here’s the kicker) will put those in their own labeled nodes:
<!-- XML --> <list label="names!" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <female-name xml:lang="it">Anna Maria <surname>Mozart</surname></female-name> <male-name xml:lang="en">Fitzwilliam <surname>Darcy</surname></male-name> <male-name xml:lang="fr">Maurice <surname>Chevalier</surname></male-name> <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name> </list>
/* JSON */ {"names!": [ {"female-name": [{"lang": "it"}, "Anna Maria", {"surname": "Mozart"}]}, {"male-name": [{"lang": "en"}, "Fitzwilliam", {"surname": "Darcy"}]}, {"male-name": [{"lang": "fr"}, "Maurice", {"surname": "Chevalier"}]}, {"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP '(names! (female-name (((lang it)) "Anna Maria" (surname "Mozart"))) (male-name (((lang en)) "Fitzwilliam" (surname "Darcy"))) (male-name (((lang fr)) "Maurice" (surname "Chevalier"))) (female-name (((lang de)) 7.9)))
Character for character, the JSON and LISP are still shorter, but the difference is not nearly as dramatic as it was in the very first example. In fact, typing all of these examples by hand, I find myself appreciating the redundant end tags on the XML parts, because it’s getting very hard to keep track of all the closing “]”, “}” and “)” for JSON and LISP.
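Reading mixed content back out takes about the same amount of code on either side. The Python sketch below handles one name node in each form; the function names are hypothetical, the JSON layout is the attribute-dictionary-first convention from the examples, and stripping whitespace around the text is a simplification that a real application might not be able to afford:

```python
import json
import xml.etree.ElementTree as ET

xml_doc = '<male-name xml:lang="en">Fitzwilliam <surname>Darcy</surname></male-name>'
json_doc = '{"male-name": [{"lang": "en"}, "Fitzwilliam", {"surname": "Darcy"}]}'


def xml_content(elem):
    """Flatten mixed content into strings and (label, text) pairs."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.append((child.tag, child.text))
        # Text following a child element is stored on that child as .tail.
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


def json_content(doc):
    """Same flattening for the JSON convention (attributes come first)."""
    (_label, (_attrs, *rest)), = doc.items()
    parts = []
    for item in rest:
        if isinstance(item, dict):
            (tag, text), = item.items()
            parts.append((tag, text))
        else:
            parts.append(item)
    return parts


from_xml = xml_content(ET.fromstring(xml_doc))
from_json = json_content(json.loads(json_doc))
```

Both functions yield the given name followed by a labeled surname, and neither is noticeably simpler than the other.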
No silver bullet
There are a few morals here. First, with markup, as with coding, there’s no silver bullet. JSON (and LISP) have the important advantage that they make the most trivial cases easy to represent, but as soon as we introduce even the slightest complexity, all of the markup starts to look about equally verbose. That means that the real problems we have to solve with structured data are no longer syntactic, and anyone trying to find a syntactic solution to structured data is really missing the point: JSON, XML (and LISP) people would do best to make common cause and start dealing with more important problems than whether we use braces, pointy brackets, or parentheses. That’s why I was excited to have JSON inventor Doug Crockford speak at XML 2006, and why I hope that we’ll get more submissions about JSON as well as XML for 2007.
Personally, I like XML because it’s familiar and has a lot of tool support, but I could easily (and happily) build an application based on any of the three — after all, once I stare long enough, they all look the same to me.