My first REST design question is about the fact that RESTafarians seem to consider identification and location to be the same thing, and following from that, the question of how to make identification persistent in XML resources. For example, assume that http://www.example.org/airports/ca/cyow.xml
is both the unique identifier of an XML data object and the location of that object on the web. That’s the whole point of REST, really. RESTafarians don’t like interfaces where identifiers are hidden inside XML objects returned from POST requests to unrelated URLs, for example (in fact, they get angry in quite an amusing way).
GET and PUT
So, here’s a simple use case. Let’s say that I download the XML data file at http://www.example.org/airports/ca/cyow.xml
and it looks like this simple example:
<airport>
  <icao>CYOW</icao>
  <name>Macdonald-Cartier International Airport</name>
  <political>
    <municipality>Ottawa</municipality>
    <region>ON</region>
    <country>CA</country>
  </political>
  <geodetic>
    <latitude-deg>45.322</latitude-deg>
    <longitude-deg>-75.669167</longitude-deg>
    <elevation-msl-m>114</elevation-msl-m>
  </geodetic>
</airport>
I then copy it onto a USB memory stick, bring it home from work, copy it onto my notebook computer, and work on it while offline during a business flight. The file no longer has any direct connection with its URL: it has gone through other transfers since the HTTP GET request I used to download it. How do I know what I’m working on or where I should PUT it when I’m done?
If this information has to be kept out of line, then some of REST’s advantages evaporate, because now I have to start using custom-designed clients again instead of simply piggybacking on existing web technologies. As an identifier, the URL is clearly part of the resource’s state and belongs in the XML data file; as a location, however, it is superfluous information and belongs only at the protocol (HTTP) level.
Where does the document identifier go?
Let’s assume that I get over my squeamishness and decide that the URL is a proper identifier and belongs in the XML representation. Now, how do I do that in a fairly generic way? xml:id is out of the question, since it’s designed only to hold an XML name for identifying part of a document, not a URL to identify an entire document. I could use (or abuse) xml:base, like this:
<airport xml:base="http://www.example.org/airports/ca/cyow.xml"> ... </airport>
I’m not certain, though, how XLink processors would deal with that. Would the relative URL “cyyz.xml” end up being resolved to http://www.example.org/airports/ca/cyyz.xml or to http://www.example.org/airports/ca/cyow.xmlcyyz.xml? There’s also the possibility that some highly-cooked APIs might predigest the xml:base attribute so that application code never sees it. Do the XML standards people believe this kind of xml:base usage is legit?
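For what it's worth, here is a minimal Python sketch of the round trip described above, assuming the xml:base attribute is still on the root element of the offline copy. The helper name put_target is made up for illustration, and treating xml:base as the document's identifier is exactly the debatable usage in question, not an established convention.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def put_target(xml_text):
    """Return the xml:base of the root element, or None if it is absent.

    Hypothetical helper: it assumes xml:base on the root doubles as the
    identifier (and PUT location) of the whole document.
    """
    root = ET.fromstring(xml_text)
    return root.get("{%s}base" % XML_NS)


# An offline copy that has lost any connection to the original GET request.
offline_copy = """<airport xml:base="http://www.example.org/airports/ca/cyow.xml">
  <icao>CYOW</icao>
</airport>"""

print(put_target(offline_copy))
```

A copy without the attribute yields None, which is the "where should I PUT it?" situation from the use case.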
If xml:id is unusable, and xml:base is problematic, it looks like there might be no standard way to identify RESTful XML documents, and each XML document type will need its own ad-hoc solution. Any suggestions? Does the world need one more xml:* attribute (I hope not)?
I’d be interested in hearing how REST developers have dealt with identifier persistence and round-tripping when the identifier is the URL.
Pingback: Quoderat » REST design question #3: meaning of a link
Pingback: Quoderat » REST design question #4: how much normalization?
Pingback: Quoderat » REST design question #5: the “C” word (content)
Pingback: AsynchronousBlog
I think this is exactly what xml:base is for. IMHO tools (like browsers) that allow XML documents to be retrieved from the web and stored somewhere else should automatically add an xml:base attribute.
XLink tools are either not xml:base aware or they will resolve the links correctly. I don’t think you’ll ever get cyow.xmlcyyz.xml.
Applications that are compatible with DOM Level 3 or with the XML Information Set should give you the base URI for every node.
XLink processors? What XLink processors?
Resolving cyyz.xml against a base URI of http://www.example.org/airports/ca/cyow.xml will result in http://www.example.org/airports/ca/cyyz.xml. That looks like a reasonable use of xml:base to me, though it seems like a slight extension. I’d have expected the xml:base to be http://www.example.org/airports/ca/ (the trailing slash is significant) but maybe that’s just narrow-minded of me.
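A quick way to check this kind of resolution is Python's standard urllib.parse.urljoin, which follows the generic URI reference-resolution rules. This is only a sketch of the resolution step itself, not of what any particular XLink processor does; the variable names are illustrative.

```python
from urllib.parse import urljoin

# Base URI naming the document itself, as in the xml:base example.
file_base = "http://www.example.org/airports/ca/cyow.xml"
# Base URI naming the containing "directory", with and without the slash.
dir_base = "http://www.example.org/airports/ca/"
no_slash_base = "http://www.example.org/airports/ca"

# The last path segment of the base is dropped before merging, so the
# first two bases resolve the relative reference identically.
resolved_from_file = urljoin(file_base, "cyyz.xml")
resolved_from_dir = urljoin(dir_base, "cyyz.xml")
# Without the trailing slash, "ca" is itself the last segment and is dropped.
resolved_no_slash = urljoin(no_slash_base, "cyyz.xml")

print(resolved_from_file)
print(resolved_from_dir)
print(resolved_no_slash)
```

The first two lines print http://www.example.org/airports/ca/cyyz.xml; the third prints http://www.example.org/airports/cyyz.xml, which is why the trailing slash is significant.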
xml:base isn’t just a cute hack to set the base URI. From the very beginning of HTML [1] the base href was meant to provide the original URI of the document, to be used when the document is read out of context. XInclude extends this practice by using xml:base to carry the URL of the originating document.
So I’d expect that xml:base would usually contain a resolvable URI of a document, and not a URI like http://www.example.org/airports/ca/
[1] http://w3future.com/weblog/2005/01/13.xml#stillBugsInTheImplementationOfHtmlHyperlinks
xml:base has, on occasion, the same value as the URL from which the document was retrieved, but that’s accidental. Consider that it makes no sense to have a base URI that ends with anything other than “/”, which most URLs do not do. rdf:about is the closest thing to what David’s looking for AFAICT.
Full response forthcoming on my weblog …
The default base URI of the root element when there is no xml:base attribute is the base URI of the document, i.e. the URI used to retrieve the document. So you’re just asserting what is already the default.
Even when you don’t need a document identifier it seems good practice, as it makes your data location independent.
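As a rough illustration of making the default explicit, a tool that saves a retrieved document could stamp the retrieval URI onto the root element when no xml:base is present. The function name stamp_base is hypothetical, and this sketch uses only Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def stamp_base(xml_text, retrieval_uri):
    """Return the document with xml:base set on the root element.

    Hypothetical helper: an existing xml:base is left alone; otherwise
    the URI the document was retrieved from is written in explicitly.
    """
    root = ET.fromstring(xml_text)
    if root.get(XML_BASE) is None:
        root.set(XML_BASE, retrieval_uri)
    return ET.tostring(root, encoding="unicode")


stamped = stamp_base(
    "<airport><icao>CYOW</icao></airport>",
    "http://www.example.org/airports/ca/cyow.xml",
)
print(stamped)
```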
I’m not 100% sure about RESTafarians considering identification and location one and the same thing – surely resources are identified (via URIs), representations are located (via HTTP).
The question of where the document identifier should go is spawned by your creating a new resource, albeit with the same representation as the original. A possible solution might be to wrap the content of the document to identify the resource which it represents, i.e.
<rdf:Description rdf:about="http://www.example.org/airports/ca/cyow.xml">
It may be interesting to note that Atom accepted <link rel="self"> for identifying the location of a feed instead of xml:base. See PaceFeedLink and the discussion on the atom-syntax mailing list.
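To make the Atom approach concrete, here is a small Python sketch that pulls the rel="self" link out of a feed. The feed content and its URL are invented for illustration, and self_link is a hypothetical helper, not part of any Atom library.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def self_link(feed_text):
    """Return the href of the feed-level rel="self" link, or None."""
    root = ET.fromstring(feed_text)
    # Only direct children of the root are checked, so entry-level
    # links are ignored.
    for link in root.findall("{%s}link" % ATOM_NS):
        if link.get("rel") == "self":
            return link.get("href")
    return None


# Made-up feed; the URLs are assumptions for the example.
feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Airports</title>
  <link rel="alternate" href="http://www.example.org/airports/"/>
  <link rel="self" href="http://www.example.org/airports/feed.atom"/>
</feed>"""

print(self_link(feed))
```

Unlike xml:base, the self link has no effect on how relative references in the document are resolved; it only records where the feed lives.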
This is great stuff, btw, David.
Your initial assumptions are wrong. Identifiers, in the business sense that you mean, have nothing to do with URLs. Identifiers should be GUIDs and should never, ever change. URLs are not identifiers and trying to use them as such is silly. It’s common in many systems for a single resource to be located by several URLs, and it’s also common for the business entity represented by a URL to change over time (e.g. http://www.weather.com/nyc resolves to a new business entity every 20 minutes). You’re not downloading a resource when you GET a URL; you’re downloading a representation of a resource. The representation of the resource should contain zero information about its URL (unless that makes sense for the format, for example RSS documents), but it should contain an identifier to prevent the need for conversational state.
PROBLEM: You cannot remember the source of the data you’ve just retrieved over an ambiguous duration or disposition through time.
SOLUTION{?}: Attach an external identifier (the information may not need to care where it’s from afterward, but you do). I would elect: http://www.example.org.airports.ca.cyow.xml
or allow embedded paths: ‘www.example.org/airports/ca/cyow.xml’