REST design question #3: meaning of a link

This is the third in a series of REST design questions. The first design question asked about keeping track of location and identification information after you have downloaded an XML file; the second design question asked about discovering resources and dealing with long lists of data in a RESTful way.

The very heart of REST, both in its narrow original sense (everything must have a URL) and its broader popular sense (basic HTTP + XML as an alternative to Web Services), is linking. REST insists that any information you can retrieve must have a single, unique address that you can pass around, the same way that you can pass around a phone number or an e-mail address. Those addresses make it possible to link resources (HTML pages or, in the future, XML data files) together into a web, so that either people or software agents can discover new pages by following links from existing ones.

Old-School Hypertext

But what does a link mean? That question matters a lot for anyone writing general-purpose REST software, such as search engines, data browsers, or database tools that are not designed to work with only a single XML markup vocabulary. The pre-HTML hypertext specialists believed that links could have many different meanings, and typically wanted to provide a way for the author to specify them; hiding in the shadows during the web revolution of the 1990s, the old school managed to keep the fire alive long enough to add the universally ignored xlink:role attribute to XLink. Do we need xlink:role for generic XML data processing in a REST environment?

I don’t think we do.

In fact, if you look closely, linking to an external resource from an HTML document always means the same thing:

“Here is a more complete version of what I’m talking about.”

It is very hard to think of any exceptions. For example, consider these three links from an HTML document:

<p>During the <a href="http://en.wikipedia.org/wiki/Renaissance">Renaissance</a> ...</p>
<img alt="Illustration of Galileo" src="galileo.jpg"/>
<script src="validate-form.js"></script>

In every case, the element containing the link attribute is a placeholder for something somewhere else. Obviously, they cause different browser behaviour (the picture will be inserted into the displayed document automatically and the script will be run, while the Wikipedia Renaissance entry will not be loaded unless someone follows the link), but in all three cases, the thing linked represents something more complete: the Wikipedia Renaissance article is more complete than the phrase “Renaissance”, the image galileo.jpg is more complete than the alternative text “Illustration of Galileo”, and the JavaScript code in validate-form.js is more complete than the empty script element.
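To make the placeholder idea concrete, here is a minimal Python sketch (standard library only) of how generic software might collect these three links without treating any of them specially. It assumes that the local placeholder is the link text for a, the alt text for img, and nothing at all for script; LINK_ATTRS, PlaceholderCollector, and collect_links are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical table: which attribute holds the link target for each element.
LINK_ATTRS = {"a": "href", "img": "src", "script": "src"}


class PlaceholderCollector(HTMLParser):
    """Record each linking element as (element, local placeholder, target)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._open_a = None  # the <a> record whose link text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        attr = LINK_ATTRS.get(tag)
        if attr is None or attrs.get(attr) is None:
            return
        # Assumption: alt text is the placeholder for an image; a script has none.
        placeholder = (attrs.get("alt") or "") if tag == "img" else ""
        record = {"element": tag, "placeholder": placeholder, "target": attrs[attr]}
        self.links.append(record)
        if tag == "a":
            self._open_a = record

    def handle_data(self, data):
        # Inside an <a>, the link text is the local, less complete version.
        if self._open_a is not None:
            self._open_a["placeholder"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_a = None


def collect_links(html):
    parser = PlaceholderCollector()
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE = """<p>During the <a href="http://en.wikipedia.org/wiki/Renaissance">Renaissance</a> ...</p>
<img alt="Illustration of Galileo" src="galileo.jpg"/>
<script src="validate-form.js"></script>"""

for link in collect_links(SAMPLE):
    print(f"{link['element']}: {link['placeholder']!r} -> {link['target']}")
```

Nothing in the loop depends on what a, img, or script do in a browser; each is handled the same way, as a local placeholder plus a pointer to something more complete.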

New-School XML

Exactly the same principle will likely apply to links in XML data files, like this example:

<person xml:base="http://www.example.org/people/e40957.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
  <name>Jane Smith</name>
  <date-of-birth>1970-10-11</date-of-birth>
  <employer xlink:href="http://www.example.org/companies/acme.xml">ACME Widgets, Inc.</employer>
  <country-of-birth xlink:href="http://www.example.org/countries/ca.xml">Canada</country-of-birth>
</person>

All of the information available for the person’s name is the string “Jane Smith”, and all of the information available for the date of birth is the string “1970-10-11”; however, there is more complete information about the employer at http://www.example.org/companies/acme.xml, and there is more complete information about the country of birth at http://www.example.org/countries/ca.xml.

It seems that unidirectional links like those used in the web always lead towards increasingly canonical information. If an XML element has a linking attribute, then, can we assume that the entire XML document subtree starting at that element represents a lesser version of the information available externally at the link target? If so, can we really gain much by adding xlink:role to the mix?
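If that assumption holds, a vocabulary-neutral processor needs very little code. The following Python sketch (standard library only) walks any XML document, resolves each xlink:href against the in-scope xml:base, and pairs the element's local text with the absolute URL of its presumably more complete version. The function name lesser_versions is hypothetical, the sketch looks only at xlink:href (ignoring xlink:type and the other XLink attributes), and the country-of-birth link has been made relative, unlike in the example above, purely to show xml:base at work.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports namespaced attributes in {namespace-URI}local form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def lesser_versions(element, inherited_base=""):
    """Yield (local text, absolute target) for every element with xlink:href.

    Assumes the rule discussed above: the subtree rooted at a linking element
    is a lesser version of the resource at its link target. No knowledge of
    the markup vocabulary (person, employer, ...) is used.
    """
    # xml:base may itself be relative, so resolve it against the parent's base.
    base = urljoin(inherited_base, element.get(XML_BASE, ""))
    href = element.get(XLINK_HREF)
    if href is not None:
        yield "".join(element.itertext()).strip(), urljoin(base, href)
    for child in element:
        yield from lesser_versions(child, base)


PERSON = """<person xml:base="http://www.example.org/people/e40957.xml"
        xmlns:xlink="http://www.w3.org/1999/xlink">
  <name>Jane Smith</name>
  <date-of-birth>1970-10-11</date-of-birth>
  <employer xlink:href="http://www.example.org/companies/acme.xml">ACME Widgets, Inc.</employer>
  <country-of-birth xlink:href="../countries/ca.xml">Canada</country-of-birth>
</person>"""

for local, target in lesser_versions(ET.fromstring(PERSON)):
    print(f"{local!r} is a lesser version of {target}")
```

The name and date of birth produce nothing, because they carry no link: as far as this processor can tell, the local strings are all the information there is.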

Snags

Is this a safe-enough assumption that we could use it with any RESTful XML data files, and perform some kinds of data processing without having to know about the specific XML vocabulary in use?

I can think of two counter-examples right away, and they both deserve some attention. First, there is one context where HTTP URLs frequently appear as attribute values in XML documents but do not refer to a more complete version of the information inside an element: XML Namespaces. Here’s an example:

<person xmlns="http://www.example.org/ns/">Jane Smith</person>

In this case, there may be no information available at all at the location http://www.example.org/ns/; if there is something there (like an RDDL file), it will most likely be information about the XML markup, not about the person Jane Smith. Of course, this URL does not appear as the value of an xlink:href attribute, so there is no ambiguity. More importantly, the use of URLs for Namespace identifiers (a choice which I supported) has caused an enormous amount of confusion among XML users, who expect the URL to point to something: something more complete or authoritative, that is. That very confusion is proof of how ingrained this use of linking is.
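The lack of ambiguity shows up in ordinary XML tooling. In this Python sketch, ElementTree folds each namespace URL into element names rather than reporting it as an attribute, so a scan for xlink:href (the hypothetical link_targets helper) returns only real links. The xlink:href on person is an assumption added for contrast; it does not appear in the example above.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """<person xmlns="http://www.example.org/ns/"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:href="http://www.example.org/people/e40957.xml">Jane Smith</person>"""


def link_targets(xml_text):
    """Return every xlink:href value; namespace declarations never show up."""
    root = ET.fromstring(xml_text)
    return [el.get(XLINK_HREF) for el in root.iter() if el.get(XLINK_HREF) is not None]


root = ET.fromstring(DOC)
print(root.tag)              # the namespace URL is part of the element's name
print(sorted(root.attrib))   # only the xlink:href attribute, no xmlns entries
print(link_targets(DOC))     # only the link to a fuller description of Jane Smith
```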

The second counter-example is the rise of the rel="nofollow" attribute in HTML links, partly as an attempt to counter spam in weblog comments and wiki sandboxes. If anything, this appears to vindicate the old-school hypertexters. They should be rushing into the street in disheveled clothing with a mad gleam in their eyes, shouting “Look, we were right! It took 15 years, but finally everyone sees that links do need semantic information attached!” and so on. But they’re not, probably because they’re smart enough to realize that this doesn’t quite change the primary meaning of a link. The rel="nofollow" attribute says that the author does not endorse the link target, but the target still provides a more complete version of the information. For example, someone who strongly dislikes the U.S. Libertarian Party might want to point to its web page without improving its Google search ranking, and thus include something like this:

<p>In contrast to the misinformation coming from the <a href="http://www.lp.org/" rel="nofollow">Libertarian Party</a> ...</p>

The resource at http://www.lp.org/ is still a more complete version of the information in the linking element, even if it is information that the author does not particularly like.
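One way for general-purpose software to respect that distinction is to keep every link as a pointer to a more complete version and to treat nofollow only as a flag about endorsement. The Python sketch below does that for a elements; the record fields (text, target, endorsed) and the record_links function are hypothetical, and nothing here describes how any real search engine handles nofollow.

```python
from html.parser import HTMLParser


class LinkRecorder(HTMLParser):
    """Record <a href> links; rel="nofollow" only clears an 'endorsed' flag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link whose text is being collected

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href") is None:
            return
        # rel holds a space-separated list of tokens, so split it.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        self._current = {
            "text": "",
            "target": attrs["href"],  # still the more complete version
            "endorsed": "nofollow" not in rel_tokens,
        }
        self.links.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


def record_links(html):
    parser = LinkRecorder()
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE = (
    '<p>In contrast to the misinformation coming from the '
    '<a href="http://www.lp.org/" rel="nofollow">Libertarian Party</a> ...</p>\n'
    '<p>During the <a href="http://en.wikipedia.org/wiki/Renaissance">Renaissance</a> ...</p>'
)

for link in record_links(SAMPLE):
    print(link)
```

Both links are kept with the same meaning; only the endorsed flag differs. Because the sketch splits rel into tokens rather than comparing the whole value, a value such as rel="nofollow external" is also recognized.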

Implications

If linking really can be this simple, then we will be able to do a lot with XML data and REST even if we do not agree on a common content encoding. That could be extremely valuable: the document web was an enormous success precisely because content encoding was standardized on HTML, so that people could build things like authoring tools and search engines. If we can do much of the same thing with an XML document web without forcing everyone to squeeze their data into something like RDF or XTM, we might just be able to get enough people to play along to make it work.
