A new Namespaces discussion

Eliot Kimber and I were both on the old W3C XML Working Group during the development of the Namespaces in XML specification. Late in the process, pressure from outside the WG forced us to make a major change to the specification, angering many of the members. Eliot, who was already pretty unhappy with the Namespaces spec, left; I decided to stay.

Eliot has recently had the grace and integrity to make a posting in which he admits to being wrong about Namespaces, and states that he is now, with only a few caveats, a big fan of the spec. He even goes so far as to write the following:

If you’re not using namespaces you should be–I can’t see any excuse for anyone defining any set of XML elements that is not in a namespace. It should be required and it’s too bad that XML, for compatibility reasons, has to allow no-namespace documents.

The context problem

While I wasn’t originally as strongly opposed to Namespaces as Eliot was, I cannot claim to be as strongly in favour now. For me, the biggest problem with the Namespaces spec is the requirement for a context to interpret prefixed names. That’s no biggie as far as XML element and attribute names go:

<foo:bar xmlns:foo="http://www.example.org/foo/" foo:a="b"/>

Here, there’s no doubt that foo:bar stands for “{http://www.example.org/foo/}bar” (or however you want to notate it), while foo:a stands for “{http://www.example.org/foo/}a.”

QNames in content and attribute values

What happens, however, when the prefixed name appears in an attribute value or content?

<foo:bar xmlns:foo="http://www.example.org/foo/" foo:a="foo:b">foo:c</foo:bar>

Simply looking at this XML document in isolation, there’s no way to know whether the attribute value “foo:b” and the content “foo:c” are meant as literal strings or as qualified names. The context (the xmlns declaration) is still there to allow software to expand the prefix, but you need something else (an external schema, hard-coded application logic, or a prompt to a human operator) to decide whether it’s safe to expand the name. Any feature that requires the use of schemas to perform basic XML processing should raise red flags.
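The ambiguity is easy to see with an ordinary parser. The following is a minimal sketch in Python using only the standard library's xml.etree.ElementTree module, applied to the document above. The parser expands the element and attribute names on its own, but the attribute value and the content come back as the plain strings “foo:b” and “foo:c”. To expand them, the application has to capture the prefix declarations itself (here through iterparse's “start-ns” events) and then decide, outside the parser, to treat a particular string as a QName. The helper expand_qname is hypothetical, written only for this illustration, and the flat prefix dictionary deliberately ignores nested declaration scopes.

```python
import io
import xml.etree.ElementTree as ET

# The second example document from the text.
DOC = ('<foo:bar xmlns:foo="http://www.example.org/foo/" '
       'foo:a="foo:b">foo:c</foo:bar>')

def parse_with_prefixes(xml_text):
    """Parse xml_text, also recording the prefix -> URI declarations.

    Simplification: every declaration goes into one flat dict, ignoring
    nested scoping, which is enough for this one-element document.
    """
    nsmap = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item      # e.g. ("foo", "http://www.example.org/foo/")
            nsmap[prefix] = uri
        elif root is None:
            root = item             # the first "start" event is the root element
    return root, nsmap

def expand_qname(value, nsmap):
    """Hypothetical helper: turn "prefix:local" into "{uri}local".

    The parser never calls this; the application has to decide that
    a given attribute value or text node is meant as a QName.
    """
    prefix, sep, local = value.partition(":")
    if not sep or prefix not in nsmap:
        raise ValueError(f"cannot expand {value!r} as a QName")
    return "{%s}%s" % (nsmap[prefix], local)

root, nsmap = parse_with_prefixes(DOC)
FOO = "{http://www.example.org/foo/}"

print(root.tag)                    # prints {http://www.example.org/foo/}bar
attr = root.get(FOO + "a")
print(attr, root.text)             # prints foo:b foo:c (still plain strings)
print(expand_qname(attr, nsmap))   # prints {http://www.example.org/foo/}b
```

Nothing in the document itself tells the parser that foo:b should be expanded; the call to expand_qname on that attribute lives entirely in application code, or would have to be driven by a schema.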

QNames in XPath expressions

The biggest problem, however, comes with referring to parts of an XML document in non-XML syntax. Consider the following XPath expression:

//foo:bar/@foo:a

Unlike the XML document, this expression does not provide any way to expand the foo: prefix. It needs some kind of external context. That means that you can never simply pass this around as a string argument in a programming language, for example, without also passing around a whole set of Namespace declarations. Namespace processors cannot safely discard prefixes, because they might still be important later on. XML transformation filters have to try to preserve original prefixes whenever possible. In short, in non-trivial XML processing, the distinction between the Namespace prefix and the Namespace URI quickly becomes blurred. And this is not simply a problem for tool makers — it’s one that bites developers, script writers, database administrators, and even information authors.
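Python's ElementTree implements only a small subset of XPath, but it shows the same dependency. In this sketch, ElementTree's path syntax stands in for full XPath; because it cannot select attribute nodes, the path selects the foo:bar elements that carry a foo:a attribute rather than the attributes themselves. The bare string fails to compile, and it works only once a separate prefix-to-URI dictionary is passed along with it.

```python
import xml.etree.ElementTree as ET

FOO_URI = "http://www.example.org/foo/"
DOC = f'<doc xmlns:foo="{FOO_URI}"><foo:bar foo:a="b"/></doc>'

# ElementTree paths cannot select attribute nodes, so this stands in for
# //foo:bar/@foo:a by selecting the foo:bar elements that carry foo:a.
PATH = ".//foo:bar[@foo:a]"

root = ET.fromstring(DOC)

# On its own, the string is meaningless: nothing binds the foo: prefix.
try:
    root.findall(PATH)
except SyntaxError as err:
    print("without a prefix map:", err)

# The bindings must travel alongside the string as a separate argument.
# The prefix here only has to agree with PATH, not with the document.
prefix_map = {"foo": FOO_URI}
for el in root.findall(PATH, prefix_map):
    print(el.get(f"{{{FOO_URI}}}a"))   # prints b
```

Any function that accepts such a path therefore needs a second parameter for the declarations, and every caller has to keep the two in step.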

Namespaces if necessary, but not necessarily Namespaces

I don’t know an easy fix for this (perhaps including the full Namespace URI in XPath expressions would have been smarter), but given all of this hassle, I cannot agree with Eliot that Namespaces should always be mandatory. Where Namespaces are not needed for disambiguation, that is, where an XML document isn’t meant to be published to the web for general use, avoiding Namespaces (or at least using them sparingly) removes a huge amount of complexity from XML development, authoring, and information management. A script kiddie, for example, can easily write PHP code to deal with XML documents that are not Namespace-qualified, but may quickly fall out of his or her depth once we stir Namespaces into the mix.
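For comparison, ElementTree also accepts the full Namespace URI inside a path, written in the {uri}local form. This is a minimal sketch, again with ElementTree standing in for real XPath (XPath 3.0 later added a comparable Q{uri}local syntax). The path string is self-contained, and it matches whichever prefix a document happens to use.

```python
import xml.etree.ElementTree as ET

FOO_URI = "http://www.example.org/foo/"

# The same path, with the Namespace URI written out in place of the prefix.
# Nothing else needs to be passed around with this string.
PATH = f".//{{{FOO_URI}}}bar[@{{{FOO_URI}}}a]"

# Two documents that differ only in the prefix they happen to use.
docs = [
    f'<doc xmlns:foo="{FOO_URI}"><foo:bar foo:a="b"/></doc>',
    f'<doc xmlns:x="{FOO_URI}"><x:bar x:a="b"/></doc>',
]

for doc in docs:
    root = ET.fromstring(doc)
    print(len(root.findall(PATH)))   # prints 1 for both documents
```

The obvious cost is verbosity, since the URI is repeated at every step of the path.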

I do still believe that Namespaces are valuable, and in general, I’m not unhappy with the current specification; however, I also believe that simpler XML markup still has its place for a huge range of applications, especially when the XML document will be used in a specific way and not published to the world at large.


9 Responses to A new Namespaces discussion

Joe English says:

February 26, 2006 at 10:53 am

Curious — what was the major change in the spec alluded to in the first paragraph? (I have a hunch where it came from, but wasn’t in on the process so it’s just a hunch).

Wholeheartedly and vigorously agree about QNames-in-content — these are bad, bad, bad.
david says:

February 26, 2006 at 12:02 pm

Eliot referred to it in his posting — earlier drafts of the Namespaces spec used processing instructions for declarations, but we switched to attributes. It seems pretty trivial in retrospect, but at the time, we’d wanted to keep the element/attribute/content tree clean (i.e. you wouldn’t have declarations and real attributes mixed together).
Lars Marius Garshol says:

February 26, 2006 at 12:08 pm

The problem with QNames in queries is also faced by the RDF and Topic Maps query languages, all of which solve it by allowing the developer to declare prefixes inside the query. That would work for XPath, too, but the downside is that since XPath “queries” tend to be so short, the prefixes would be disproportionately long. Still, if you really want self-contained queries, I can’t really think of any other way.
Anthony B. Coates says:

February 26, 2006 at 5:56 pm

I can see Eliot’s point of view, since as he himself notes, he is a system integrator. I’m sure he’s suffered through many schemas from different sources which use the same element/attribute names for different things, exactly what namespaces can help fix. That said, it doesn’t mean that every little XML document needs to be namespaced, if it remains private enough.

As for XPath, I consider the difficulties in the use of namespaces with XPath to be a weakness in XPath, one that causes me a regular amount of pain. I don’t consider it a weakness in the way XML does namespaces. XPath is a convenient nuisance at any time; its text format is short and convenient, but the fact that it’s text and not XML sometimes makes it inconvenient. Not having anywhere to put namespace declarations is one side effect of the textual format. Still, would it be *so* hard to provide an extra parameter somewhere to allow the namespaces to be specified?

With XSLT, it becomes a nuisance that there isn’t an easy way to pass the namespace prefix to URI mappings into a script. If you could, it would certainly make XSLT transformations more robust with respect to namespace URI changes.

Cheers, Tony.
Ed Davies says:

February 26, 2006 at 6:59 pm

Norman Walsh has a proposal:

http://norman.walsh.name/2004/11/10/xml20

for “XML 2.0” which would fix the QNames in content problem pretty neatly, I think.
John Watson says:

February 27, 2006 at 6:04 am

I think the ‘QNames in XPath expressions’ problem weakens various important XML specs. For example XUpdate seems to be unable to update target documents that use namespaces. I imagine it’s for this reason. At any rate the spec has no mention of how to incorporate a namespace prefix from the target document into an XPath expression in the update.
Eric van der Vlist says:

February 28, 2006 at 5:19 am

David,

The general issue of QNames in content is one of the windmills against which I have been fighting with limited success. My main success has been to get alternatives to QNames in content for RELAX NG and Schematron but I am not sure I have been able to really convince James Clark about this point…

The issue you mention regarding the dependency between XPath expressions and their context isn’t new and, as far as I recall, it was the reason why XPointer was brought back from CR to Last Call in early 2001!

See http://www.xmlhack.com/read.php?item=982 .

Eric
Laender says:

May 2, 2006 at 4:30 pm

Thanks Ed, that's (http://norman.walsh.name/2004/11/10/xml20) a really good site!