Recently, I mentioned my biggest regrets about SAX. When we were building SAX, however, there were an awful lot of things that went right. Here are the three things that I’m happiest about:
- SAX was useful right from the start
Not just useful, in fact, but more useful than any alternative at the time. When I wrote the first draft of SAX over Christmas 1997 and put it up on the xml-dev mailing list for discussion and review in January 1998, the package included not only an interface definition but also drivers/adapters for all four existing Java XML parsers: James Clark’s XP, Tim Bray’s Lark, Microsoft’s MSXML (I don’t think a Java version is still available), and my own AElfred (now maintained by others). That meant that right away, a Java developer could write code that worked with any existing XML parser.
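To make the parser-independence point concrete, here is a minimal sketch in present-day Java. It is not the 1998 code: the first SAX release used the `Parser` and `DocumentHandler` interfaces, while this sketch assumes the later SAX2 `XMLReader` and `ContentHandler` (via `DefaultHandler`), and it assumes JAXP's `SAXParserFactory` as a convenient way to obtain some parser. The class and method names (`ElementCounter`, `countElements`, `defaultReader`) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: counts elements using only org.xml.sax types.
class ElementCounter extends DefaultHandler {
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++; // one callback per start tag, whichever parser is driving
    }

    // Accepts any XMLReader: nothing here names a concrete parser class.
    static int countElements(XMLReader reader, String xml) {
        ElementCounter handler = new ElementCounter();
        reader.setContentHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.count;
    }

    // Assumption: take whatever parser JAXP finds; any other XMLReader would do.
    static XMLReader defaultReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(defaultReader(), "<doc><p/><p/></doc>"));
    }
}
```

Swapping in a different parser only changes how the `XMLReader` is obtained; the handler and `countElements` stay the same.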
This was an important point because I was afraid that the big computer companies (IBM and Oracle were also working on parsers) were going to try to lock developers into their platforms through proprietary parser interfaces. XML is an open format, but if all your code and all the libraries available to you work only with (say) IBM’s or Microsoft’s parser interface, then you haven’t gained much over using a proprietary format.
Another advantage, one that I hadn’t anticipated, was that people started developing large-scale projects with SAX right away, so they shook out bugs and design problems very quickly. Running code is always a good thing, but running code that actually makes developers’ lives easier trumps anything else.
- SAX is efficient
There are so many things that we could have done to kill SAX’s efficiency: we could have returned strings for character data instead of arrays (which can be indexed directly into the parser’s buffer); we could have returned elaborate objects for events, managed from some kind of pool; we could have managed a context stack for the user, whether she needed it or not; but we did none of those things. I was tempted, sometimes, but the other volunteers in the project quickly slapped me back into line.
The rationale was simple: it is easy to build all of those things on top of SAX if you need them (and, in fact, Michael Kay’s SAXON started life as a friendly SAX helper library, before it evolved into an XSLT framework), but there is no way to remove them if you don’t need them. As a result, SAX concentrated on standardizing the way that parsers deliver information rather than providing a friendly user experience. Once that was standardized, it would be easy to build layers on top that would work with any parser. In short, the motto was “do no harm” rather than “make it fun and simple”; it turned out to be a perfect example of “worse is better”.
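As a rough illustration of layering such conveniences on top of SAX, the sketch below keeps its own context stack and copies character data out of the `char[]` slice only when it wants it. The class name `PathTextHandler`, the `extract` method, and the `path=text` output format are hypothetical, and the parser is again assumed to come from JAXP. The sketch ignores mixed content: text that appears before a child element is discarded.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records "path=text" for every element with non-blank text.
class PathTextHandler extends DefaultHandler {
    private final Deque<String> stack = new ArrayDeque<>();   // our own context stack
    private final StringBuilder text = new StringBuilder();   // text of the current element
    private final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.addLast(qName);
        text.setLength(0); // simplification: drops any text seen before a child element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX hands over a slice of an array; we copy it only because we choose to.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            results.add(String.join("/", stack) + "=" + value);
        }
        text.setLength(0);
        stack.removeLast();
    }

    static List<String> extract(String xml) {
        PathTextHandler handler = new PathTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.results;
    }
}
```

A caller who needs neither the stack nor the copied strings simply does not write this layer, which is the point of leaving it out of the core interface.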
I had assumed that just about everyone would work through those higher-level libraries, but in the end (to my surprise), lots of developers learned to love the clumsy, low-level SAX interfaces in all their ugly glory. I myself have messed around with writing higher-level libraries on top of SAX, only to go back to the raw ContentHandler and its friends every time. For some reason, hard-core XML developers like to stay close to the metal, no matter how many friendly high-level tools people offer them.
- SAX supports filter chains
SAX filter chains may seem obvious now, but I doubt I would ever have been able to think them up. I cannot remember who first suggested using SAX handlers in chains, like a Unix pipeline (perhaps the idea just evolved gradually as a kind of groupthink), but it was well established by SAX2 and officially supported by a dedicated interface. We don’t support filters perfectly (error handling is a bit kludgy), but people make beautifully simple yet powerful systems using them.
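A small sketch of such a chain, assuming the SAX2 helper class `XMLFilterImpl` as the base for each stage and a JAXP-supplied parser at the head of the pipeline. The two filters (`SkipElementFilter`, `UppercaseNameFilter`), the `FilterChainDemo` class, and the `comment` element name are all hypothetical, chosen only to show one stage dropping events and another rewriting them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Stage 1 (hypothetical): swallows a named element and everything inside it.
class SkipElementFilter extends XMLFilterImpl {
    private final String skip;
    private int depth = 0; // > 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skip) { super(parent); this.skip = skip; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        if (depth > 0 || qName.equals(skip)) { depth++; return; }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) { depth--; return; }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }
}

// Stage 2 (hypothetical): rewrites element names to upper case.
class UppercaseNameFilter extends XMLFilterImpl {
    UppercaseNameFilter(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        super.startElement(uri, local.toUpperCase(), qName.toUpperCase(), atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local.toUpperCase(), qName.toUpperCase());
    }
}

class FilterChainDemo {
    // Pipeline: parser -> SkipElementFilter -> UppercaseNameFilter -> collecting handler.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader chain = new UppercaseNameFilter(new SkipElementFilter(parser, "comment"));
            chain.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            chain.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }
}
```

Each stage looks like an ordinary `XMLReader` to whatever comes after it, so the final handler cannot tell whether it is attached to a parser or to the end of a chain.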
I don’t think that there will ever be substantial changes to SAX. Now that I’ve resumed maintaining it, I’ll try to fix bugs and keep it up to date with any new XML versions, but otherwise, it is what it is. Perhaps something newer, like StAX or some other pull interface, will eventually displace SAX, and that would be fine too. For now, though, it is an essential part of the XML infrastructure, used at tens or hundreds of thousands of sites, and the best thing I can do is keep it stable and make as few changes as possible.