All markup ends up looking like XML

In the current JSON vs. XML debate (see Bray, Winer, Box, Obasanjo, and many others), there are three things that are important to understand:

  1. There is no information that can be represented in an XML document that cannot be represented in a JSON document.
  2. There is no information that can be represented in a JSON document that cannot be represented in an XML document.
  3. There is no information that can be represented in an XML or JSON document that cannot be represented by a LISP S-expression.

They are all capable of modeling recursive, hierarchical data structures with labeled nodes. Do we have a term for that, like Turing completeness for programming languages? It would certainly be convenient in discussions like this.
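
Here's a quick sketch in Python, just to make the equivalence concrete (the mapping is one I've made up for illustration; it handles elements, attributes, and text, but skips comments and processing instructions). Notice that the nested-list output reads almost like the LISP version, which is rather the point:

# Python
import json
import xml.etree.ElementTree as ET

def element_to_json(elem):
    # Encode each element as [tag, attribute-dict, children...], where a
    # child is either a string (character data) or a nested list (element).
    node = [elem.tag, dict(elem.attrib)]
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(element_to_json(child))
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node

doc = "<names><name>Anna Maria</name><name>Fitzwilliam</name></names>"
print(json.dumps(element_to_json(ET.fromstring(doc))))
# ["names", {}, ["name", {}, "Anna Maria"], ["name", {}, "Fitzwilliam"]]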

Syntactic sugar

The only important differences among the three are the size of the user base (and opportunity for network effects), software support, and syntactic convenience or inconvenience. The first two are fickle — where are the Pascal programmers of yesteryear? — so let’s concentrate on syntax. Here’s a simple list of three names in each of the three representations:

<!-- XML -->
<names>
  <name>Anna Maria</name>
  <name>Fitzwilliam</name>
  <name>Maurice</name>
</names>
/* JSON */
{"names": ["Anna Maria", "Fitzwilliam", "Maurice"]}
;; LISP
'(names "Anna Maria" "Fitzwilliam" "Maurice")

Nearly all comparisons between XML and JSON look something like this, and I have to admit, it’s a slam dunk — in an example like this, XML seems to go out of its way to violate Larry Wall’s second slogan: “Easy things should be easy and hard things should be possible.” On the other hand, I rarely see any data structures that are really this simple, outside of toy examples in books or tutorials, so a comparison like this might not have a lot of value; after all, I could have written the XML like this:

<names>Anna Maria, Fitzwilliam, Maurice</names>

Let’s dig a bit deeper and see what we find.

Node labels

In the previous example, I made some important assumptions: I assumed that the node label for the individual names (“name”) didn’t matter and could be omitted from the JSON and LISP, and I assumed that the node label for the entire list (“names”) was a legal XML and LISP identifier. Let’s break both of those assumptions now, and make the label for the list “names!” and the labels for the items “male-name” or “female-name”. Here’s how we can handle this in XML, JSON, and LISP:

<!-- XML -->
<list label="names!">
  <female-name>Anna Maria</female-name>
  <male-name>Fitzwilliam</male-name>
  <male-name>Maurice</male-name>
</list>
/* JSON */
{"names!": [
  {"female-name": "Anna Maria"},
  {"male-name: "Fitzwilliam"},
  {"male-name": "Maurice"}]}
;; LISP
'(names!
  (female-name "Anna Maria")
  (male-name "Fitzwilliam")
  (male-name "Maurice"))

XML is forced to use a secondary syntactic construction (an attribute value) to represent the top-level label, because “names!” no longer matches XML’s syntactic rules for element names. LISP can still use names! as a token, and JSON doesn’t notice, because it has been using a string all along — XML syntax is convenient for trees of labeled nodes only when the labels are heavily restricted. That aside, however, note that as soon as we add any non-trivial complexity to the information — as soon as we assume that node labels matter — then all three formats start to look a little more like XML.
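
As a quick sanity check (again, just an illustration of my own), any off-the-shelf XML parser will reject “names!” as an element name:

# Python
import xml.etree.ElementTree as ET

try:
    # "names!" is not a legal XML element name, so the parser rejects it;
    # that is why the XML version above retreats to a "label" attribute.
    ET.fromstring("<names!>Anna Maria</names!>")
except ET.ParseError as err:
    print("not well-formed:", err)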

Additional node attributes

Now let’s add the next wrinkle by allowing additional attributes (besides a label) for each node. In this case, we’re going to add a “lang” (language) attribute to each of the nodes:

<!-- XML -->
<list label="names!">
  <female-name xml:lang="it">Anna Maria</female-name>
  <male-name xml:lang="en">Fitzwilliam</male-name>
  <male-name xml:lang="fr">Maurice</male-name>
</list>
/* JSON */
{"names!": [
  {"female-name": [{"lang": "it"}, "Anna Maria"]},
  {"male-name: [{"lang": "en"}, "Fitzwilliam"]},
  {"male-name": [{"lang": "fr"}, "Maurice"]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria"))
  (male-name (((lang en)) "Fitzwilliam"))
  (male-name (((lang fr)) "Maurice")))

Now, while XML is still using ad-hoc convention to represent the “name!” label, JSON and LISP are forced to use ad-hoc conventions to represent attribute lists (a dictionary list for JSON, and an a-list for LISP). It’s also worth noting that JSON and LISP now look so much like XML, both in length and complexity, that it’s hardly possible to distinguish them. Node attributes are not esoteric — they’re the basis of such simple things as hyperlinks.
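
For what it’s worth, the attribute-based XML version is straightforward to consume programmatically; here’s a small sketch in Python (the parsing code is mine, not anything canonical):

# Python
import xml.etree.ElementTree as ET

# ElementTree predefines the "xml" prefix, so xml:lang appears under the
# full XML namespace URI in Clark notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(
    '<list label="names!">'
    '<female-name xml:lang="it">Anna Maria</female-name>'
    '<male-name xml:lang="en">Fitzwilliam</male-name>'
    '</list>')
for child in root:
    print(child.tag, child.get(XML_LANG), child.text)
# female-name it Anna Maria
# male-name en Fitzwilliam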

Data typing

XML certainly looks better for the attributes, but now let’s jump to data typing. Let’s assume that there is a country where people use real numbers as names, and we need to find a way to distinguish names that are real numbers from names that just happen to look like real numbers (say, a person named “1.7” in a country where names are strings). JSON and LISP can make that distinction naturally using first-class syntax, while XML has to use a different standard that is not part of the core language:

<!-- XML -->
<list label="names!" xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <female-name xml:lang="it">Anna Maria</female-name>
  <male-name xml:lang="en">Fitzwilliam</male-name>
  <male-name xml:lang="fr">Maurice</male-name>
  <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name>
</list>
/* JSON */
{"names!": [
  {"female-name": [{"lang": "it"}, "Anna Maria"]},
  {"male-name: [{"lang": "en"}, "Fitzwilliam"]},
  {"male-name": [{"lang": "fr"}, "Maurice"]},
  {"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria"))
  (male-name (((lang en)) "Fitzwilliam"))
  (male-name (((lang fr)) "Maurice"))
  (female-name (((lang de)) 7.9)))

XML loses badly on this particular example; however, if the extra data were (say) a date or currency, we would have to make up an ad-hoc way to label its type in JSON and LISP as well, since they have no special syntax to distinguish a date or monetary value from a regular number or string. For anything other than simple numeric data types, this one’s actually a draw.
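
Just to illustrate the date point, here’s one such ad-hoc convention, sketched in Python (the “type”/“value” wrapper is something I’m inventing on the spot, not any standard):

# Python
import json
from datetime import date

def tagged(value):
    # Ad-hoc convention, invented for illustration: wrap values JSON
    # cannot express natively in an object with an explicit type label.
    if isinstance(value, date):
        return {"type": "date", "value": value.isoformat()}
    return value  # strings and numbers need no tag

record = {"female-name": [{"lang": "de"}, tagged(date(1756, 1, 27))]}
print(json.dumps(record))
# {"female-name": [{"lang": "de"}, {"type": "date", "value": "1756-01-27"}]}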

Mixed content

And now, finally, for mixed content. I will add surnames to all of the (non-numeric) names in the list, and (here’s the kicker) will put those in their own labeled nodes:

<!-- XML -->
<list label="names!" xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <female-name xml:lang="it">Anna Maria 
    <surname>Mozart</surname></female-name>
  <male-name xml:lang="en">Fitzwilliam 
    <surname>Darcy</surname></male-name>
  <male-name xml:lang="fr">Maurice 
    <surname>Chevalier</surname></male-name>
  <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name>
</list>
/* JSON */
{"names!": [
  {"female-name": [{"lang": "it"}, "Anna Maria", {surname: "Mozart"}]},
  {"male-name: [{"lang": "en"}, "Fitzwilliam", {surname: "Darcy"}]},
  {"male-name": [{"lang": "fr"}, "Maurice", {"surname": "Chevalier"}]},
  {"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria" (surname "Mozart")))
  (male-name (((lang en)) "Fitzwilliam" (surname "Darcy")))
  (male-name (((lang fr)) "Maurice" (surname "Chevalier")))
  (female-name (((lang de)) 7.9)))

Character for character, the JSON and LISP are still shorter, but the difference is not nearly as dramatic as it was in the very first example. In fact, typing all of these examples by hand, I find myself appreciating the redundant end tags on the XML parts, because it’s getting very hard to keep track of all the closing “]”, “}” and “)” for JSON and LISP.
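
Mixed content is also where the programmatic data models get fiddly; in Python’s ElementTree, for instance, the character data around <surname> splits between the parent’s text and the child’s tail (a minimal sketch of my own):

# Python
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    '<male-name xml:lang="en">Fitzwilliam <surname>Darcy</surname></male-name>')
# In mixed content, character data before the first child lives in .text,
# and data that follows a child lives in that child's .tail.
print(repr(elem.text))            # 'Fitzwilliam '
print(elem.find("surname").text)  # 'Darcy'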

No silver bullet

There are a few morals here. First, with markup, as with coding, there’s no silver bullet. JSON (and LISP) have the important advantage of making the most trivial cases easy to represent, but as soon as we introduce even the slightest complexity, all of the markup starts to look about equally verbose. That means that the real problems we have to solve with structured data are no longer syntactic, and anyone trying to find a syntactic solution to structured data is really missing the point: the JSON, XML (and LISP) people would do best to make common cause and start dealing with more important problems than whether we use braces, pointy brackets, or parentheses. That’s why I was excited to have JSON inventor Doug Crockford speak at XML 2006, and why I hope that we’ll get more submissions about JSON as well as XML for 2007.

Personally, I like XML because it’s familiar and has a lot of tool support, but I could easily (and happily) build an application based on any of the three — after all, once I stare long enough, they all look the same to me.


Templating languages and XML

Erich Schubert is talking about web templating languages. He’s looking for a pure-XML templating solution, but that might not be necessary for simple web-page design, where we don’t need all the extra benefits of heavy-duty transformation standards like XSLT.

Keeping it simple

For PHP-driven web sites, I’m a big fan of Smarty, which uses braces (“{” and “}”) to delimit template constructions. Braces have no special meaning to XML parsers (they’re just character data), so it’s possible to put a template expression inside an attribute value (for example), while keeping the template itself as well-formed XML and not requiring the elaborate paraphrastic expressions you need to set up attribute values in XSLT:

<p id="x-{$myvalue|escape}">Hello, world!</p>

Concurrent markup resurrected

In effect, Smarty adds a second set of concurrent markup on top of the XHTML. Smarty constructs don’t have to balance with XML element boundaries, yet with only a little care, I’ve never ended up with a Smarty template that wasn’t well-formed. JSP’s mistake was using something that looks like XML but isn’t quite, which confuses parsers. Even the old SGML CONCUR feature would not have allowed markup inside attribute values. Sometimes there’s something to be said for using two different syntaxes to represent two different things.


Yahoo stands firm behind its search API

Early in the week, I posted about the end of the Google search API, and speculated that — since everyone else tends to copy Google — it might be the start of a general trend away from open data APIs and in favour of server-side AJAX widgets. In response, Amit Kumar of Yahoo sent me an e-mail message (after failing to get past Spam Karma in the comment system for my blog):

You don’t have to worry. We just posted a blog entry on this topic. Yahoo Search APIs are going strong – we welcome developers to use our APIs.

http://www.ysearchblog.com/archives/000393.html

Amit Kumar
Manager, Site Explorer

Thanks, Amit. Fortunately, megginson.com isn’t popular enough that it will break Yahoo’s 5,000 queries/day quota.

SOAP, REST, JSON, XML, and Serialized PHP

Note that Yahoo has a REST interface that can deliver results in XML, JSON, or serialized PHP, so if people get tired of the REST vs. SOAP perma-debate, there’s some alternative material for you here (if you want a good roaring debate, be careful to avoid reading Tim Bray’s carefully balanced view).
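
For the curious, a call looked roughly like this from Python; I’m writing the endpoint, parameter names, and response keys from memory of the 2006-era docs, so treat every one of them as an assumption and check Yahoo’s developer site before relying on them:

# Python
import json
import urllib.parse
import urllib.request

# Placeholder app ID; the endpoint, parameter names, and response keys
# are recalled from the 2006-era Yahoo docs and are not verified here.
params = urllib.parse.urlencode({
    "appid": "YOUR_APP_ID",
    "query": "xml conference",
    "output": "json",   # the service also offered xml and serialized PHP
})
url = "http://search.yahooapis.com/WebSearchService/V1/webSearch?" + params
with urllib.request.urlopen(url) as response:
    results = json.load(response)
print(results["ResultSet"]["totalResultsAvailable"])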


It’s OK to wish me “Merry Christmas”

Even if you don’t know whether I’m Christian, I promise not to take offense. I’m tired of watching friends, neighbours, and colleagues tying themselves in knots trying to think of a culturally-sensitive thing to say to me (usually “have a nice holiday” or something similar). You’re also welcome to wish me a happy version of any other holiday, religious or not — I prefer a world where we celebrate all cultures and religions to one where we pretend religion and culture don’t exist. I live in a big, multi-ethnic city, and my acquaintances who are Muslim, Sikh, Jewish, or Hindu seem to have no problem using the C word, so the awkwardness seems limited to, well, Christians (or at least people whose ancestors were Christians).

Actually, since my ancestors were Christian, I’d also be happy if you wished me a “Merry Xmas”. During the 70’s or 80’s, there was a wrong-headed reaction against this spelling, along the lines of “put the Christ back in Christmas.” That’s not an “X”, folks — it’s the Greek letter Chi, which starts the word “Christ”. “X” was very much used among early Christians as an abbreviation for the title of the person they believed was the son of God, with the added bonus that the letter makes the shape of the cross they believed he died on.

So, on that note, Merry Christmas, everyone!


Beginning of the end for open web data APIs?

[Update: hacking the Google Search AJAX API — see below.]

[Update #2: Don Box is thinking along the same lines as I am.]

[Update #3: Rob Sayre points out that there is, in fact, a published browser-side JavaScript API underlying the AJAX widget.]

Over on O’Reilly Radar, Brady Forrest mentioned that Google is shutting down its SOAP-based search API. Another victory for REST over WS-*? Nope — Google doesn’t have a REST API to replace it. Instead, something much more important is happening, and it could be that REST, WS-*, and the whole of open web data and mash-ups all end up on the losing side.

It’s not about SOAP

Forget about the SOAP vs. REST debate for a second, since most of the world doesn’t care. Google’s search API let you send a search query to Google from your web site’s backend, get the results, then do anything you want with them: show them on your web page, mash them up with data from other sites, etc. The replacement, the Google AJAX API, forces you to hand over part of your web page to Google so that Google can display the search box and show the results the way they want (with a few token user configuration options), just as people do with Google AdSense ads or YouTube videos. Other than screen scraping, like in the bad old days, there’s no way for you to process the search results programmatically — you just have to let Google display them as a black box (so to speak) somewhere on your page.

A precedent for widgets instead of APIs

An AJAX interface like this is a great thing for a lot of users, from bloggers to small web site operators, because it allows them to add search to their sites with a few lines of JavaScript and markup and no real coding at all; however, the gate has slammed shut, and the data is once again locked away out of the reach of anyone who wants to do anything else with it.

Of course, there are alternatives still available, such as the Yahoo! Search API (also available in REST), but how long will they last? Yahoo! has its own restructuring coming up, and if Nelson Minar’s suggestion (via Forrest) is right — that Google is killing their search API for business rather than technical reasons — this could set a huge precedent for other companies in the new web, many of whom look to Google as a model. Most web developers will probably prefer the AJAX widgets anyway because they’re so much less work, so by switching from open APIs to AJAX widgets, you keep more users happy and keep your data more proprietary. What’s an investor or manager not to like?

What next?

Data APIs are not going to disappear, of course. AJAX widgets don’t allow mash-ups, and some sites have user bases including many developers who rely on being able to combine data from different sources (think CraigsList). However, the fact that Google has decided that there’s no value playing in the space will matter a lot to a lot of people. If you care about open data, this would be a good time to start thinking of credible business cases for companies to (continue) offer(ing) it.

Update: Hacking the Google AJAX API (or, back to Web ’99)

The AJAX API is designed to allow interaction with JavaScript on the client browser, but not with the server; however, as Davanum Srinivas demonstrates, it’s possible to hack on the API to get programmatic access from the server backend. I’m not sure how this fits with Google’s terms of service, and obviously, they can make incompatible changes at any time to try to kill it, but at least there’s a back door for now. Thanks, Davanum.

Personally, I was planning to use the Yahoo (REST) search API for site search even before all this broke, because I didn’t want to waste time trying to figure out how to use SOAP in PHP. I’m glad now I didn’t waste any time on Google’s API, and I’ll just keep my fingers crossed that Yahoo’s API survives.


XML 2006 proceedings due today

If you gave a regular presentation at XML 2006, I’d like to remind you that your proceedings — slides and/or text — are due today (PDF or XHTML format, please). If you gave a keynote or participated in a panel or Masters Series and have slides or text to send in, we’d also be happy to put them on the site. Please send your presentation by e-mail to cmills at idealliance dot org.

Some presentations already online

Thanks to the many of you who sent in your presentations before the proceedings deadline. You can look at the conference programme page to see who those people are, since their presentations are highlighted in yellow (and with an asterisk, for text browsers or screen readers).

Fake real-time blog

Finally, for a creative approach to the proceedings, take a look at Rick Jelliffe’s Fake real-time blog from XML 2006: day one.


Good/bad/good/good news

Good news: the XML 2006 web site was far more popular than we anticipated.

Bad news: the site was so popular during the conference that we exceeded our bandwidth limit and went off line.

Good news: the site didn’t go down until two days after the conference was finished.

More good news: the site is back up now.

Apologies to everyone for the inconvenience. In a couple of weeks, we’ll be putting the proceedings online, and I’ll watch bandwidth closely in case we get linked to from somewhere popular. Maybe next year we should use Amazon’s Elastic Compute Cloud (EC2) instead of conventional shared hosting.

Now, hurry up and get your proposals in for XTech 2007, because the deadline is only four days away (if you liked Boston in December, you’ll love Paris in May).


XML 2006: Day 3

XML 2006 ended today, after more than a year of planning (I started working on the conference in November 2005). I am usually a politely cynical person, and tend to dismiss expressions of emotion in organizers as empty words; however, I am surprised at how strong my emotions are. Before the conference, everything is numbers and rooms and layouts, but during the conference, it’s hundreds of people who have taken days (or more) out of their lives and away from their families to come to Boston to give talks, listen to papers, socialize, and generally just be together. I wish that there were some words I could use to describe how I feel without being trite.

The last day went well, but it was a hard start: adrenaline is a fickle mistress, and while she stayed close by my side Tuesday and Wednesday, I woke up this morning to find her gone. Instead of bounding up and down stairs, I had to drag myself; instead of looking forward to breakfast, I had to force it down while trying not to be sick. Fortunately, as the day went on, some of my energy came back, and the morning had a good start with a long and enthusiastic audience discussion after Paolo Marinelli presented his and Stefano Zacchiroli’s XML Scholarship-winning paper, Co-constraint validation in a streaming context, describing a new way to validate XML documents.

While a few people had to leave early, we had good attendance right up until the final session, and well over 100 people came out to hear Jon Bosak’s closing keynote this evening, which included Renaissance poetry, 1990s doggerel, old e-mail threads, vigorous invective against software vendors, a committee status report, and everything else one would expect from Jon, ending in a loud standing ovation from the audience.

Please take a look at the lists of conference blogs and photos now on the web site. Next year, I’ll get the lists up earlier, and will remember to bring my own camera.

Thank you to everyone who made this such a wonderful, crowded, and energetic conference. I hope to see many of you in Paris at XTech 2007 next spring, in Montreal at Extreme Markup 2007 next summer, and of course, at XML 2007 here in Boston next fall. Now it’s time to check the weather for my flight home to Ottawa tomorrow.


XML 2006: Day Two

We had another good day at the conference today, though once again, you’ll find that my reports have little in common with those of others blogging about it, simply because I don’t get much time to sit in sessions. JustSystems sponsored a breakfast opener at 8:00 am this morning, and it was packed — I expect we’ll see the same thing at Microsoft’s breakfast opener tomorrow morning.

I’ve been receiving excellent comments all day about today’s keynote speaker, Darin McBeath, for three reasons:

  1. He’s from a publishing company (Elsevier) rather than a software company, and a lot of people feel that the publishing side of XML has been neglected for the past few years.
  2. He’s from a company that’s an XML user rather than an XML vendor or consultancy, so he brings a different perspective.
  3. He did a survey of several other publishing companies (anonymity preserved) about XML, and gave us the results.

Saunas

Of course, a conference isn’t a conference if something doesn’t go wrong. During the morning, two of our conference rooms became alarmingly warm, though the presentations were good enough that people stayed and sweated them out. The hotel had to send someone onto the roof to repair some ductwork, and things returned to normal after lunch. Apologies to everyone (both for the heat and for the banging).

Coffee

Elliotte Harold noted a shortage of coffee yesterday, but I didn’t get to his blog in time to fix things today. I’ll take it up with the staff tomorrow, but to be honest, if I were a coffee drinker, I’d take two minutes to slip down the escalator and grab something at the hotel Starbucks before I’d drink conference coffee. Then again, since I’m not a coffee drinker, I may not understand the urgency of the caffeine craving after 90 minutes in a conference session.

Presentation tips

Before the conference, I posted about How not to suck at your presentation, concentrating on technical issues like font size and live demos. Sarah O’Keefe covers some of the same points, but mostly addresses presentation style, including the excellent advice not simply to read the bullet points on your slides out loud to the audience.

I’d also like to add that you can probably simply delete 1/3 to 2/3 of your slides without losing any important information. I know that applies to the first drafts of all of my own presentations — most of my editing involves the [delete] key. Fewer slides are also more fun, because you have extra time left to talk with the audience.

Eating your own dog food

Keith Fahlgren mentions that there was a DocBook dinner tonight, thus neatly explaining the large gang I spotted following Norm Walsh through the halls of the Prudential Center around 8:00 pm. This is probably a sign of my fatigue (and not directed at DocBook, which is a fine spec), but I wonder whether, if each committee member were required to eat a printed copy of the spec with each release, we might end up with much shorter specs (and fewer versions, to boot). WS-* dinner, anyone?

Evening weather

In addition to his conference reports, Robin Hastings mentions the beautiful weather this evening. I was also out for a short walk on Newbury once I knew that everything else was OK, and it was a nice, warm 7°C with a moist (but not chilly) breeze (compared to −4°C and wind chill during my pre-dawn run). If I weren’t so tired, I would have liked to walk around Boston for hours. This is a beautiful city.


XML 2006: Day One

There are several people blogging about the content at XML 2006; unfortunately, as chair, I’m not able to do that, because I don’t actually get to see much of any presentation aside from the keynotes.

I was nervous coming into the first talk, Roger Bamford’s opening keynote, but that evaporated when I saw the huge crowd spilling out into the foyer (and staying, standing outside, for the whole talk). It’s been a great opening day, with no speaker no-shows, only one technical crisis, and huge enthusiasm from the attendees; with luck, the next two days will go as smoothly.

One experience that stood out for me today was Michael Smith’s PechaKucha this evening: we brought a lot of vendors and service providers into one room and gave each one exactly 6 minutes and 40 seconds to use 20 slides to tell us what they had to say. In many cases, with a lot of creativity, they managed to convey as much information in 400 seconds as they normally would in a 45-minute presentation, so it’s an astoundingly efficient way to get information. I have to admit, though, that after a while the sheer pace becomes overwhelming. Next year, we’re thinking of trying PechaKucha for standards-committee updates, so that you can get updates from 10 working groups in just over an hour — stay tuned.

It’s to bed for me soon, so that I can get up early tomorrow and go for a run outside before the conference starts again. If you’re here in Boston, don’t forget the free breakfast in Back Bay C tomorrow morning at 8:00 am, then Darin McBeath’s keynote at 9:00 am.
