REST design question #5: the "C" word (content)

The other posts in this series of REST design questions have danced around the edge of the content problem, dipping their toes in with issues like identification and linking. Now that the design questions are coming to a close, it’s time to dive right into REST’s biggest problem: content.

The principles of REST tell you how to manage resources in a CRUDdy way, but not what you can actually do with those resources. This is not a problem shared by other XML networking approaches. XML-RPC defines precisely what its XML content means, to the point that it can be serialized and deserialized invisibly and automatically. SOAP allows any kind of XML payload in principle (assuming it’s wrapped in a SOAP envelope), but most people use the default SOAP encoding, which, again, can be serialized and deserialized somewhat automatically. REST, on the other hand, is pure architecture without any direct mention of content. RESTafarians boast that there are RESTful web applications already online for Amazon, eBay, Flickr, and many others, but developers quickly figure out that they don’t get any benefit from that: each REST application requires its own separate stovepipe of code support right from the ground up, because they all use different content formats. If these sites all used XML-RPC or SOAP, there would be many standard libraries to simplify the developers’ work, and a lot of shared code that could work with all of them.
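To make the contrast concrete, here is a minimal sketch of what "invisible and automatic" looks like for XML-RPC, using Python’s standard xmlrpc.client module. The method name city.update and the record are made up for illustration; nothing is sent over the network.

```python
import xmlrpc.client

# Hypothetical record and method name, used only for illustration.
record = {"title": "San Diego", "population": 1223400}

# Serialize a call. The wire format is fixed by the XML-RPC specification,
# so the library needs no knowledge of this particular application.
payload = xmlrpc.client.dumps((record,), methodname="city.update")

# Deserialize it again: no application-specific parsing code is involved.
params, method = xmlrpc.client.loads(payload)
```

A RESTful site that serves its own XML vocabulary gets no equivalent of dumps and loads for free; somebody has to write that mapping for each format.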

Is REST, in practical terms, nothing more than a marketing word?

RESTafarians can argue that the lack of content standardization is a good thing, because it leaves the architecture flexible enough to deal with any kind of resource, from an XML file to an image to a video to an HTML page; moving the last two using XML-RPC or SOAP can be less than pleasant. On the other hand, the lack of any kind of standard content format makes it hard to do anything useful with RESTful resources once you’ve retrieved them. People have put forward candidates for standard XML-encoded REST content, including RDF and XTM, but it’s unlikely that either of these will take off, especially since RDF (the leader) does not even work nicely with most other XML-based specifications like XQuery or XSLT.

Standardizing XML REST content in bits and pieces

The alternative is to standardize content in bits and pieces — instead of trying to come up with a comprehensive data-encoding format, we can try to come up with a profile of standard markup bits that people can use in any kind of XML data document. Here are some of the possibilities:

xlink:href and xml:id for linking

I’ve already mentioned how the use of the xlink:href attribute will make it possible to design XML data crawlers similar to HTML crawlers, along with search engines and all the other good things that follow: no matter what the document type, the engine will be able to find the links.

Together with xlink:href, xml:id can allow links to point to fragments of XML documents easily, making it possible to refer to embedded resources.

<data xmlns:xlink="http://www.w3.org/1999/xlink">
  <person xml:id="dpm">
    <name>David Megginson</name>
  </person>
 
  <weblog>
    <title>Quoderat</title>
    <author xlink:href="#dpm"/>
  </weblog>
</data>

This stuff is critical — since REST is all about linking, lack of a standard linking mechanism in content will simply kill it before it can even start.
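As a rough sketch of why this matters, the following Python fragment finds every xlink:href in a document without knowing anything about its element names, and resolves a same-document fragment link to the element carrying the matching xml:id. It uses only the standard library; the function names find_links and resolve_fragment are my own, and the sample is the document above with the XLink namespace declared.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in {uri}local form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """\
<data xmlns:xlink="http://www.w3.org/1999/xlink">
  <person xml:id="dpm">
    <name>David Megginson</name>
  </person>
  <weblog>
    <title>Quoderat</title>
    <author xlink:href="#dpm"/>
  </weblog>
</data>
"""


def find_links(root):
    """Return every xlink:href value in document order, whatever the element names."""
    return [el.get(XLINK_HREF) for el in root.iter() if el.get(XLINK_HREF) is not None]


def resolve_fragment(root, href):
    """Return the element whose xml:id matches a same-document '#id' link, or None."""
    if not href.startswith("#"):
        return None
    target = href[1:]
    for el in root.iter():
        if el.get(XML_ID) == target:
            return el
    return None


root = ET.fromstring(SAMPLE)
links = find_links(root)
person = resolve_fragment(root, links[0])
```

Neither function mentions person, weblog or author, which is the whole point: a crawler written this way would work on any document type that follows the same two conventions.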

xml:base for document identification

Similarly, the xml:base attribute can provide an identifier and locator for an XML data document. An xml:base attribute attached to the root element can give both a base URL for resolving relative links in the document and a global identifier for the document.

<data xml:base="http://www.example.org/data/foo.xml">
  ...
</data>
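
A minimal sketch of the first use, resolving relative links against the root element’s xml:base, might look like this in Python. The related elements and their link targets are invented for the example, and the sketch only honours xml:base on the root element; it ignores xml:base attributes on descendants, which a complete implementation would also need to handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical document: the <related> elements are assumptions for illustration.
SAMPLE = """\
<data xml:base="http://www.example.org/data/foo.xml"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <related xlink:href="bar.xml"/>
  <related xlink:href="../index.xml#top"/>
</data>
"""


def absolute_links(root):
    """Resolve every xlink:href against the xml:base found on the root element."""
    base = root.get(XML_BASE, "")
    return [
        urljoin(base, el.get(XLINK_HREF))
        for el in root.iter()
        if el.get(XLINK_HREF) is not None
    ]


root = ET.fromstring(SAMPLE)
document_id = root.get(XML_BASE)
resolved = absolute_links(root)
```

The same attribute value serves as document_id, so a client that has saved the file locally can still tell where it came from.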

xsi:type for data typing (?)

Do we need data typing at all in XML? The use of external schemas is generally a bad idea both for performance and security reasons, so if we want typing at all (at least for simple data types), we should do it in the document instance itself, using something similar to the xsi:type attribute. Norman Walsh doesn’t like this approach, but for reasons different from mine: I think that typing information is useful mainly for authoring, not publishing; Norman would prefer to see it offloaded into external schemas. If you want typing at all, I think that something like

<start-date xsi:type="xsd:date">2005-02-23</start-date>

is generally inoffensive, aside from the fact that it uses Namespace prefixes in attribute values (a bit of a nasty kludge). Compared with bolting a whole schema onto our poor little XML data document, however, it’s a lightweight solution, assuming that it actually adds useful information.
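
To show both the benefit and the kludge, here is a hedged Python sketch that reads xsi:type and converts element text accordingly. Because the type name is a prefixed string inside an attribute value, the code has to track namespace declarations itself, which it does through the parser’s start-ns events. The converter table is a tiny illustrative subset, the event document is made up apart from the start-date element, and prefix scoping is simplified to a single flat map.

```python
import io
import xml.etree.ElementTree as ET
from datetime import date

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
XSD = "http://www.w3.org/2001/XMLSchema"

# Illustrative subset only: (namespace URI, local type name) -> converter.
CONVERTERS = {
    (XSD, "date"): date.fromisoformat,
    (XSD, "integer"): int,
}

# Hypothetical document built around the start-date example.
SAMPLE = """\
<event xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <start-date xsi:type="xsd:date">2005-02-23</start-date>
  <attendees xsi:type="xsd:integer">42</attendees>
  <venue>Ottawa</venue>
</event>
"""


def typed_values(xml_text):
    """Map element names to converted values for elements carrying a known xsi:type."""
    prefixes = {}  # simplification: one flat prefix map, no scoping
    values = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        elem = payload
        type_name = elem.get(XSI_TYPE)
        if type_name is None or elem.text is None:
            continue
        # The prefix lives inside the attribute value, so resolve it by hand.
        prefix, _, local = type_name.rpartition(":")
        convert = CONVERTERS.get((prefixes.get(prefix), local))
        if convert is not None:
            values[elem.tag] = convert(elem.text.strip())
    return values


result = typed_values(SAMPLE)
```

Elements without xsi:type, like venue here, are simply left out, so untyped documents cost nothing.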

Dublin Core for simple, basic properties (??)

The Dublin Core failed completely in the HTML meta element, and many people don’t think it’s particularly well set up, but somehow those original 15 simple property names still have a lot of popular recognition in the tech community. By far the most useful of the property names is dc:title, which identifies the name of a resource (for display in a pick list, search engine results, and so on).

<city xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>San Diego</dc:title>
  <region>California</region>
  <country>US</country>
  <population>1223400</population>
</city>

Will people go for this, though, or will the Dublin Core fizzle out here as well?
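
If people did go for it, a generic client could label any resource without understanding the rest of its vocabulary. The following Python sketch assumes that convention; the function name display_name and the fallback string are my own choices, and the second sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

CITY = """\
<city xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>San Diego</dc:title>
  <region>California</region>
  <country>US</country>
  <population>1223400</population>
</city>
"""

# Hypothetical document that uses its own <title> instead of dc:title.
WEBLOG = "<weblog><title>Quoderat</title></weblog>"


def display_name(xml_text, fallback="(untitled)"):
    """Return the first dc:title in the document, or a fallback label."""
    root = ET.fromstring(xml_text)
    title = root.find(f".//{DC_TITLE}")
    if title is None or not (title.text or "").strip():
        return fallback
    return title.text.strip()


city_label = display_name(CITY)
weblog_label = display_name(WEBLOG)
```

The weblog document falls through to the fallback because its title element is in no namespace, which is exactly the kind of per-format guesswork a shared property name would avoid.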

What else?

What other bits and pieces are out there that people would actually use in XML data files served out by RESTful web applications? I’m not convinced that the xml:space attribute is all that useful for generic XML data files, since it’s about formatting rather than meaning. The xml:lang attribute is useful in XML documents intended for human readers, as I’ve mentioned, but for fielded data, I’d rather see language information in its own proper field, maybe using the Dublin Core’s dc:language element (if the Dublin Core succeeds). Perhaps people will borrow rss:enclosure from RSS 2.0, for lack of any other standard way to indicate an external non-XML resource.

I’d love to hear other suggestions of what might appear in a simple profile for XML data REST content.

About David Megginson

Scholar, tech guy, Canuck, open-source/data/information zealot, urban pedestrian, language geek, tea drinker, pater familias, red tory, amateur musician, private pilot.

5 Responses to REST design question #5: the "C" word (content)

  1. Danny says:

    Personally I think I’d go with keeping the content (document, data or whatever) orthogonal with the transport. If cross-app format standardization is needed, for docs there’s XHTML and DocBook, for data there’s RDF/XML and more RDF/XML (this time derived from arbitrary XML via GRDDL). There is a standard way to refer to a non-XML (representation of a) resource, that’s a URI plus MIME type. URIs can appear in XML very nicely as rdf:about/rdf:resource attributes. Staying with the RDF theme, note that if you add a default namespace to your DC city example, it becomes valid RDF…

  2. You shouldn’t use elements from foreign vocabularies like the Dublin Core. There are often subtle semantic differences between the defined meaning of the elements and the way they seem to be applicable in other vocabularies. Those differences tend to reveal themselves only in practice (when the semantics are actually used, which is still quite rare on the web). It’s better to use only your own vocabulary, and then provide a translation (for example, in XSLT) to the other vocabularies. When the differences show up, you only have to change the translation, not your format.
