Anne van Kesteren’s is the first report to reach me that the W3C’s xml:id spec has just moved up the food chain to Candidate Recommendation. I’m usually one of the first people to whine about too many XML-related specs, but I think this is a good one, despite a few minor problems like an incompatibility with XML Canonicalization.

Why does this matter? Any use of XML over the web that requires DTD or schema processing is broken because of all the extra security and availability risks involved in processing external files, especially when they’re hosted at other sites. The xml:id spec gives a quick and dirty way of identifying parts of an XML document without requiring a schema, DTD, or even a namespace declaration (since the xml: prefix is predeclared for XML documents). Basically, you just use something like this inside your XML document:

<employee xml:id="dmeg123">
 <name>David Megginson</name>
</employee>

and you’re done. Other XML documents can refer to part of yours using a fragment identifier, as in http://www.example.org/employees.xml#dmeg123, and that’s that — no schemas are harmed in the making of this link. I don’t know if XML data on the web ever will take off, but this small spec is a critical step in the right direction. Congrats to the editors and the working group for pushing it through this far.
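For anyone who wants to see what the receiving end might look like, here is a minimal Python sketch of resolving that kind of fragment against xml:id values. It is only an illustration under some assumptions: the document is inlined rather than fetched, the `employees` wrapper element and the second employee are made up for the example, and `find_by_xml_id` is a hypothetical helper, not part of any library. The one thing it relies on is that the predeclared xml: prefix is bound to http://www.w3.org/XML/1998/namespace, which is how ElementTree reports the attribute name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# The xml: prefix is always bound to this namespace, so no declaration is needed.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Stand-in for the document at employees.xml (contents are assumed for illustration).
DOC = """<employees>
  <employee xml:id="dmeg123">
    <name>David Megginson</name>
  </employee>
  <employee xml:id="jdoe456">
    <name>Jane Doe</name>
  </employee>
</employees>"""


def find_by_xml_id(root, wanted):
    """Return the first element whose xml:id equals `wanted`, or None."""
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    return None


# Split the reference into the document URL and the bare-name fragment.
url, fragment = urldefrag("http://www.example.org/employees.xml#dmeg123")

root = ET.fromstring(DOC)
target = find_by_xml_id(root, fragment)
if target is not None:
    print(target.find("name").text)
```

Note that no DTD or schema is loaded anywhere in that lookup; the attribute name alone is enough to find the element.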

If only we could make everything in XML this simple.


7 Responses to xml:id

  1. Pingback: Quoderat » Rumours of xml:id trouble in the W3C

  2. Pingback: Quoderat » REST design question #5: the “C” word (content)

  3. A bit of nitpicking: I’d say http://www.example.org/employees.xml#dmeg123 is bad example here.
    As per “Architecture of the World Wide Web”:
    “When the media type assigned to representation data is “application/xml”, there are no semantics defined for fragment identifiers, and authors should not make use of fragment identifiers in such data. The same is true if the assigned media type has the suffix “+xml” (defined in “XML Media Types” [RFC3023]), and the data format specification does not specify fragment identifier semantics. In short, just knowing that content is XML does not provide information about fragment identifier semantics.

    Many people assume that the fragment identifier #abc, when referring to XML data, identifies the element in the document with the ID “abc”. However, there is no normative support for this assumption. A revision of RFC 3023 is expected to address this.”

    In XInclude it would be

  4. Oops, I meant
    <xi:include href="http://www.example.org/employees.xml" xpointer="dmeg123"/>

  5. Anne says:

    I always thought that #foo was short for the XPointer #xpointer(id('foo')) in an XML context. However, I cannot find it in the XPointer specification.

  6. Anne and Oleg are both correct, as far as I understand the specs. In the general case, like in the toolbar of a browser, it’s not safe to assume that a bare name in a fragment identifier has any special meaning, like Oleg says; however, the XLink spec states that the fragment identifier in the value of the xlink:href attribute is an XPointer, and the XPointer spec states that a bare name is an ID (and explicitly mentions the application/xml MIME type), like Anne says.

    I didn’t explicitly mention XLink and XPointer in the original posting, so here’s an example that (I think) is correct, assuming that the xlink prefix is already declared on an ancestor element:

    <employee-ref xlink:type="simple" xlink:href="http://www.example.org/employees.xml#dmeg123"/>

    I’ll be glad when we don’t have to include xlink:type any more. Anne: here’s where the XPointer spec defines bare names.

  7. Anne says:

    Ah, thanks to “barenames” I found it. The official specification calls them shorthand pointers nowadays. Unfortunately the specification does not provide a simple example along with it.
