Blame Larry Wall


Late yesterday I was working on a mind-numbingly simple XML data library in Java for use with a larger project. I spent about an hour on the first iteration, which could read and write through an event interface and/or into a data tree but used only simple names. After supper, I came back and spent another hour writing a beautifully elegant XMLName class and refactoring the rest of the code to support namespace-qualified names. The class had getters and setters for the namespace URI and local name, equals and hashCode methods, and, at one point, support for the Comparable and Serializable interfaces. It went further still: to support the flyweight design pattern, it was declared final and kept a weak-reference lookup table for interning, like the Java String class. It even had a static intern method that took two arguments, so that you could create an interned XMLName directly without first constructing a non-interned one:

XMLName name = XMLName.intern("http://www.w3.org/1999/xlink", "href");
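
A class along those lines might look something like the following minimal sketch. It is a hypothetical reconstruction, not the original code: it keeps only getters (mutating a shared, interned instance would change it for every holder), and it assumes the pool is a WeakHashMap whose values are WeakReferences to their own keys, so that names nobody uses any more can still be garbage-collected.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

/** Hypothetical sketch of an interned, namespace-qualified XML name. */
final class XMLName {
    // Canonical instances. WeakHashMap holds keys weakly, and each value is a
    // WeakReference to its own key, so the pool never keeps a name alive.
    private static final Map<XMLName, WeakReference<XMLName>> POOL = new WeakHashMap<>();

    private final String namespaceURI; // null for a name with no namespace
    private final String localName;

    private XMLName(String namespaceURI, String localName) {
        this.namespaceURI = namespaceURI;
        this.localName = Objects.requireNonNull(localName, "localName");
    }

    /**
     * Returns the canonical instance for this URI/local-name pair, creating it
     * if necessary. The constructor is private, so callers only ever receive
     * interned names; a temporary candidate is built internally for the lookup.
     */
    static synchronized XMLName intern(String namespaceURI, String localName) {
        XMLName candidate = new XMLName(namespaceURI, localName);
        WeakReference<XMLName> ref = POOL.get(candidate);
        XMLName existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        POOL.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }

    String getNamespaceURI() {
        return namespaceURI;
    }

    String getLocalName() {
        return localName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof XMLName)) {
            return false;
        }
        XMLName other = (XMLName) o;
        return Objects.equals(namespaceURI, other.namespaceURI)
                && localName.equals(other.localName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespaceURI, localName);
    }
}
```

Because every pair maps to a single instance, two interned names can be compared with == instead of equals.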

In other words, it was pretty cool — fast, memory-efficient, and properly designed. I’m sure that many of the people reading this posting have designed similar classes for XML work and taken similar pride in them. Unfortunately, before I went to bed, I realized I’d have to delete the class when I got up in the morning.

Why? I blame Larry Wall for all my grief, because it was his voice that started playing in my head, saying “easy things should be easy, and hard things should be possible.”

I messed up because I was focussing on the harder part of the problem. For simple XML configuration files, most people won’t be using namespaces most of the time, so forcing them to write

branch.setName(XMLName.intern(null, "foo"));

instead of

branch.setName("foo");

is a bad idea. Of course, I could hide that behind the scenes by adding extra methods, say, setNameString and getNameString, but then I end up cluttering my code (harder to learn, more bugs, trickier maintenance, etc.), again just to make the hard case easier.

The right solution for this particular library is one that James Clark suggested back in 1998 or 1999, when we were first trying to figure out how to get namespace support into SAX, and one that I sometimes wish we had taken up (though it’s not one of my biggest regrets): represent any XML name as a single string, with the namespace URI and the local name merged together. James preferred surrounding the namespace URI in braces, like this: “{http://www.w3.org/1999/xlink}href”; the other option is to separate the two with a space, like this: “http://www.w3.org/1999/xlink href”. Of course, any library that does this should provide helper functions for splitting the string into its two parts and recombining them.
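
Helpers for James’s brace notation could be as small as the following sketch. The class name Utils and the methods splitName and makeName are hypothetical (splitName matches the call used in the hard case below); the sketch assumes that a name without a leading brace has no namespace, and that the URI itself contains no “}”, which RFC 3986 does not allow unescaped in a URI.

```java
/** Hypothetical helpers for names in "{namespaceURI}localName" form. */
final class Utils {
    private Utils() {
    }

    /**
     * Splits "{uri}local" into {uri, local}. A name with no leading brace has
     * no namespace, so "foo" becomes {null, "foo"}.
     */
    static String[] splitName(String name) {
        if (name.startsWith("{")) {
            int close = name.indexOf('}');
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated namespace URI: " + name);
            }
            return new String[] { name.substring(1, close), name.substring(close + 1) };
        }
        return new String[] { null, name };
    }

    /** Recombines the parts; a null or empty URI yields just the local name. */
    static String makeName(String namespaceURI, String localName) {
        if (namespaceURI == null || namespaceURI.isEmpty()) {
            return localName;
        }
        return "{" + namespaceURI + "}" + localName;
    }
}
```

The space-separated form would work the same way, splitting at the single space instead, since neither a URI nor an XML local name can contain one.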

So, while I’m still channelling Larry’s voice, let’s see how well this solution fits. First, the easy case:

String name = branch.getName();
branch.setName("foo");

OK, looks good: the easy thing is easy. Now, the hard case:

String name = branch.getName();
String[] parts = Utils.splitName(name);
branch.setName("{http://www.example.org/ns}foo");

The hard thing is not easy, but it’s possible. Perhaps Larry’s voice will leave my head now, and I can get on with life and coding, in that order.

About David Megginson

Scholar, tech guy, Canuck, open-source/data/information zealot, urban pedestrian, language geek, tea drinker, pater familias, red tory, amateur musician, private pilot.

3 Responses to Blame Larry Wall


  1. String[] parts = branch.getNamepair();
    branch.setNamepair("foo", "http://www.example.org/ns");

  2. David Megginson says:

    That’s a good example, Bill — thanks. I thought about it (as well as separate getters and setters for the namespace URI and local part), and I agree that there are cases where those would be a good idea. Still, that approach adds the problem (for the easiest case) of dealing with two strings and passing them around. To be memory-efficient, there would also have to be some kind of interning facility for the pair.

  3. Brent Hendricks says:

    I’m not really a Java programmer, but couldn’t you reverse the order of the arguments and make the namespace URI optional? Then either of:


    branch.setName("foo");
    branch.setName("foo","http://www.example.org/ns");

    would be legal.
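
Java has no default parameter values, but method overloading gives exactly the call sites suggested in the last comment. A minimal sketch, assuming a hypothetical Branch class that stores its name as a single string in the brace form described above:

```java
/**
 * Hypothetical sketch: two overloads of setName stand in for an optional
 * namespace argument. The name is kept as one "{uri}local" string.
 */
final class Branch {
    private String name;

    /** Easy case: a name with no namespace (or an already-combined string). */
    void setName(String name) {
        this.name = name;
    }

    /** Hard case: local name first, namespace URI second, as in the comment. */
    void setName(String localName, String namespaceURI) {
        this.name = (namespaceURI == null || namespaceURI.isEmpty())
                ? localName
                : "{" + namespaceURI + "}" + localName;
    }

    String getName() {
        return name;
    }
}
```

The one-argument overload still accepts a full “{http://www.example.org/ns}foo” string, so both styles can coexist.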
