This January marks seven years since I put out the first prerelease of SAX for consideration by the xml-dev mailing list. The final SAX releases contain the wisdom of a lot of people, but in the end, I had to make the final decisions about how it would work, and my record was mixed. Now that SAX is a standard (if unremarkable) part of the XML infrastructure, I thought it would be worth making two or three posts about what went wrong and what went right. In this post, I’ll start with my three biggest regrets about SAX/Java:
- SAXException does not extend IOException
XML parsing is a kind of I/O, and the exception should have reflected that. If we had done things that way, any library that does XML parsing could simply have thrown IOException, without exposing any XML-specific classes at all or forcing callers to tunnel exceptions inside other exceptions. This one bugs me every time I code with SAX.
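To make the cost concrete, here is a minimal sketch of a library method that wants its public signature to mention only IOException. The class ConfigLoader and the method countElements are hypothetical, invented for illustration; the JAXP calls (SAXParserFactory, SAXParser.parse) are the standard ones. Because SAXException does not extend IOException, the method has to catch it and tunnel it inside an IOException:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical library class whose public API declares only IOException.
class ConfigLoader {

    // Counts the elements in an XML document.
    static int countElements(String xml) throws IOException {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // SAXException is not an IOException, so it has to be tunnelled
            // inside one to keep XML types out of this method's signature.
            throw new IOException("XML parse failed", e);
        } catch (ParserConfigurationException e) {
            throw new IOException("No usable SAX parser", e);
        }
        return count[0];
    }
}
```

If SAXException extended IOException, the SAXException branch could simply go away, and parse errors would flow through the method's existing `throws IOException` clause unchanged.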
- SAX uses callbacks instead of a pull interface
In this case, though, I probably wouldn’t do things differently if I could go back in time. To get acceptance, SAX had to work with all existing Java/XML parsers. They used callbacks, and the only way to get a pull interface would have been to run the parser in a separate thread, an approach that wasn’t all that stable back in early 1998 (especially not on Windows). Callbacks are not a serious problem for most applications, but they do make event dispatching much more difficult, and sometimes they make for messy, hard-to-maintain code. Now that Java thread support is rock-solid on all platforms, it’s easy enough to write a good pull-parsing adapter for SAX (I have one that I can release, if anyone cares). I’ve played around with StAX a bit, but none of the StAX drivers seems as stable as the SAX ones.
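The thread-based approach described above can be sketched roughly as follows. This is a minimal illustration, not a production adapter, and it rests on a few assumptions: the class SaxPullReader and its Event/Kind types are hypothetical, only element starts, element ends, and character data are forwarded, and parse errors are reduced to a single ERROR event. A background thread runs an ordinary SAX parse and feeds events into a bounded java.util.concurrent.BlockingQueue, and the caller pulls them out one at a time with next():

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical push-to-pull adapter: a background thread runs the SAX
// parser and hands events to the caller through a bounded queue.
class SaxPullReader {

    enum Kind { START, END, TEXT, DONE, ERROR }

    static final class Event {
        final Kind kind;
        final String value; // element name, text, or error message
        Event(Kind kind, String value) { this.kind = kind; this.value = value; }
        @Override public String toString() { return kind + ":" + value; }
    }

    // Small capacity keeps the parser only a few events ahead of the reader.
    private final BlockingQueue<Event> queue = new ArrayBlockingQueue<>(16);
    private Event terminal = null; // DONE or ERROR, once seen by the reader

    SaxPullReader(String xml) {
        Thread producer = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DefaultHandler() {
                        @Override public void startElement(String u, String l, String q, Attributes a) {
                            put(new Event(Kind.START, q));
                        }
                        @Override public void endElement(String u, String l, String q) {
                            put(new Event(Kind.END, q));
                        }
                        @Override public void characters(char[] ch, int start, int len) {
                            // Note: SAX may split one text node across several calls.
                            put(new Event(Kind.TEXT, new String(ch, start, len)));
                        }
                    });
                put(new Event(Kind.DONE, null));
            } catch (Exception e) {
                put(new Event(Kind.ERROR, String.valueOf(e.getMessage())));
            }
        });
        producer.setDaemon(true); // don't keep the JVM alive if the reader stops early
        producer.start();
    }

    // Called on the parser thread; blocks while the queue is full.
    private void put(Event e) {
        try {
            queue.put(e);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ie);
        }
    }

    // Returns the next event, blocking until the parser produces one.
    // After DONE or ERROR, keeps returning that terminal event.
    Event next() {
        if (terminal != null) return terminal;
        try {
            Event e = queue.take();
            if (e.kind == Kind.DONE || e.kind == Kind.ERROR) terminal = e;
            return e;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for parser", ie);
        }
    }
}
```

The bounded queue is the central design choice: `queue.put()` blocks the parser thread whenever the reader falls behind, so the parse advances only as fast as events are consumed, which is what makes the callback-driven parser behave like a pull interface.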
- SAX2 isn’t really simple
The original vision for SAX was to keep it dead simple. The XML 1.0 REC required that we report certain information, like processing instructions, but otherwise, I wanted to keep it as close to elements-attributes-content as humanly possible. SAX1 didn’t do too bad a job of that. SAX2 had to add support for namespaces, which messed up all the interfaces; at that point, people were screaming for all kinds of esoteric stuff that about 12 people in the world care about (e.g., entity boundaries). Instead of making SAX even more complicated, I invented the property and extension interfaces so that people could invent new things without cluttering the core. Then SAX ended up with all kinds of new, optional interfaces in the distribution anyway, so it’s quite nightmarish for a new user trying to figure out what matters and what doesn’t. If I ever put out a SAX3, I’ll do most of the work using the delete key, but that’s probably not possible when things like JAXP depend so heavily on SAX.
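For a concrete look at how the property mechanism keeps such extensions out of the core, here is a minimal sketch that collects entity boundaries. It relies on the standard LexicalHandler extension interface from org.xml.sax.ext (through DefaultHandler2, which implements it) and the standard property ID `http://xml.org/sax/properties/lexical-handler`; the class EntityBoundaries and the method entitiesIn are hypothetical names used only for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: records the names of entities the parser reports
// through the optional LexicalHandler extension.
class EntityBoundaries {

    static List<String> entitiesIn(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startEntity(String name) {
                names.add(name);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Core interface: elements, attributes, and content.
        reader.setContentHandler(handler);
        // Extension: the LexicalHandler is installed through a property,
        // not through a dedicated method on XMLReader.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

Nothing on XMLReader itself had to change to support this. A parser that doesn't support the property is allowed to refuse it by throwing SAXNotRecognizedException from setProperty, which is what keeps the extension optional.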