REST design question #2: listing and discovering resources

The second in my series of REST design questions is how to handle listing and paging, or, in fancier jargon, resource discovery. I prefer concrete examples, so I’ll start with one that I know is flawed and then try to find ways to fix it.

Let’s say that I have a large collection of XML data records with URLs like http://www.example.org/airports/cyow.xml and http://www.example.org/airports/cyyz.xml, and so on. Since they all share the same prefix, it would be reasonable to assume that performing an HTTP GET operation on that prefix (http://www.example.org/airports/) would return a list of links to all of the data records (though I acknowledge that URLs are opaque and no one should rely on that, etc. etc.):

<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink" xml:base="http://www.example.org/airports/">
  <airport-ref xlink:href="cyow.xml"/>
  <airport-ref xlink:href="cyyz.xml"/>
  <airport-ref xlink:href="cyxu.xml"/>
  ...
</airport-listing>

This is a wonderfully RESTful example, since it shows how (say) a search-engine spider could eventually find and index every XML resource. However, anyone who’s ever worked on a large, production-grade system can see that there’s a huge scalability problem here (I’m leaving out other possible issues like privacy and security). For a listing of a few dozen resources, this is a great approach. For a listing of a few hundred, it’s manageable. A listing of a few thousand resources will start to consume serious bandwidth every time someone GETs it, and a listing of a few million resources is simply ridiculous.
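To make the spider’s side concrete, here is a minimal sketch of how a client might turn a listing like the one above into absolute URLs. It uses only the Python standard library; the function name `listed_urls` and the inlined `SAMPLE` document are illustrative assumptions, not part of any real service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree spells namespaced attributes as "{namespace-uri}local-name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def listed_urls(listing_xml, fetched_from):
    """Return the absolute URL of every <airport-ref> in a listing document."""
    root = ET.fromstring(listing_xml)
    # xml:base, when present, overrides the URL the listing was fetched from.
    base = urljoin(fetched_from, root.get(XML_BASE, ""))
    return [urljoin(base, ref.get(XLINK_HREF))
            for ref in root.iter("airport-ref")
            if ref.get(XLINK_HREF) is not None]

# Hypothetical listing, inlined here instead of being fetched over HTTP.
SAMPLE = """<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink"
    xml:base="http://www.example.org/airports/">
  <airport-ref xlink:href="cyow.xml"/>
  <airport-ref xlink:href="cyyz.xml"/>
  <airport-ref xlink:href="cyxu.xml"/>
</airport-listing>"""

urls = listed_urls(SAMPLE, "http://www.example.org/airports/")
for url in urls:
    print(url)
```

The client never constructs a record URL by itself; it only resolves the links that the listing hands it, which is what keeps the URLs opaque.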

HTML-based web applications designed for humans typically employ a combination of querying and paging to deal with discovering resources from a large collection. For example, I might start by specifying that I’m interested only in airports with instrument approaches within 500 nautical miles of Toronto; then the application will return a single page of results (say, the first 20 matches), with a link to let me see the next page if I’m interested.

How would this work for a REST-based data application? Clearly, we want to use GET rather than POST requests, since pure queries are side-effect free, so presumably, I’d end up adding some request parameters to limit the results:

http://www.example.org/airports/?ref-point=cyyz&radius=500nm&has-iap=yes

That’s certainly not the kind of pretty REST URL that we see in the examples, but it does look a lot like the ones used in Amazon’s REST web services, so perhaps I’m on the right track. Of course, there will have to be some way for systems to know what the available request parameters are. Now, perhaps, the result will look something like this (assuming 20 results to the page):

<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink"
    xml:base="http://www.example.org/airports/?ref-point=cyyz&amp;radius=500nm&amp;has-iap=yes">
  <airport-ref xlink:href="cyow.xml"/>
  <airport-ref xlink:href="cyyz.xml"/>
  <airport-ref xlink:href="cyxu.xml"/>
  ...
  <next-page-link xlink:href="http://www.example.org/airports/?ref-point=cyyz&amp;radius=500nm&amp;has-iap=yes&amp;start=21"/>
</airport-listing>
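For what it’s worth, a server could assemble a page like this in a few lines. The sketch below is only an illustration under stated assumptions: `build_page`, the made-up airport identifiers, and the choice to set xml:base to the bare collection URL (relative links resolve the same way) are mine, not any real implementation. Building the document with an XML library also means the ampersands in the next-page link come out escaped as `&amp;`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("xlink", XLINK_NS)  # serialize as xlink:href, not ns0:href

def build_page(collection_url, matches, query, start=1, page_size=20):
    """Return one page of <airport-ref> links, plus a next-page link if more remain."""
    root = ET.Element("airport-listing", {f"{{{XML_NS}}}base": collection_url})
    # start is 1-based, matching the start=21 convention in the example URL.
    for ident in matches[start - 1:start - 1 + page_size]:
        ET.SubElement(root, "airport-ref", {f"{{{XLINK_NS}}}href": f"{ident}.xml"})
    next_start = start + page_size
    if next_start <= len(matches):
        params = dict(query, start=next_start)
        href = f"{collection_url}?{urlencode(params)}"
        ET.SubElement(root, "next-page-link", {f"{{{XLINK_NS}}}href": href})
    return ET.tostring(root, encoding="unicode")

# Hypothetical data: 45 made-up identifiers standing in for the query matches.
matches = [f"airport{i:02d}" for i in range(1, 46)]
query = {"ref-point": "cyyz", "radius": "500nm", "has-iap": "yes"}
first_page = build_page("http://www.example.org/airports/", matches, query)
last_page = build_page("http://www.example.org/airports/", matches, query, start=41)
print(first_page)
```

In this sketch the last page simply omits the next-page link, so a client knows it has reached the end when no such element is present.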

As far as I understand, this is good REST, because the XML resource contains its own transition information (i.e. a link to the next page). However, it is also remarkably ugly. Presumably, the same kind of paging could work on the entire collection when there are no query parameters, so that

http://www.example.org/airports/

or

http://www.example.org/airports/?start=1

would return the first 20 airport references, followed by a link to http://www.example.org/airports/?start=21, which will return the next 20 entries, and so on. The potential power of REST and XLink together is clear: a simple crawler can still start at a single URL and discover all of the available resources automatically, and, unlike with WS-*, I did it without having to deal with extra, cumbersome specs like UDDI and WSDL. Still, this looks like an ugly solution to me. I look forward to hearing whether anyone can come up with something more elegant.
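The “simple crawler” can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested client: `crawl` and the `PAGES` dictionary are hypothetical, the fake pages hold two entries each instead of 20 to keep the example short, and `fetch` stands in for a real HTTP GET (something like `urllib.request.urlopen(url).read()`).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def crawl(start_url, fetch):
    """Yield every airport URL reachable from start_url via next-page links."""
    url, seen = start_url, set()
    while url and url not in seen:  # 'seen' guards against a looping next link
        seen.add(url)
        root = ET.fromstring(fetch(url))
        base = urljoin(url, root.get(XML_BASE, ""))
        for ref in root.iter("airport-ref"):
            yield urljoin(base, ref.get(XLINK_HREF))
        nxt = root.find("next-page-link")
        url = urljoin(base, nxt.get(XLINK_HREF)) if nxt is not None else None

# Hypothetical canned responses standing in for the server.
HEAD = ('<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink" '
        'xml:base="http://www.example.org/airports/">')
PAGES = {
    "http://www.example.org/airports/":
        HEAD + '<airport-ref xlink:href="cyow.xml"/>'
               '<airport-ref xlink:href="cyyz.xml"/>'
               '<next-page-link xlink:href="http://www.example.org/airports/?start=3"/>'
               '</airport-listing>',
    "http://www.example.org/airports/?start=3":
        HEAD + '<airport-ref xlink:href="cyxu.xml"/></airport-listing>',
}

found = list(crawl("http://www.example.org/airports/", PAGES.__getitem__))
print(found)
```

The crawler needs to know only one URL and two element names in advance; everything else it learns from the documents themselves.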

About David Megginson

Scholar, tech guy, Canuck, open-source/data/information zealot, urban pedestrian, language geek, tea drinker, pater familias, red tory, amateur musician, private pilot.

4 Responses to REST design question #2: listing and discovering resources

  1. Pingback: Quoderat » REST design question #3: meaning of a link

  2. Mike Gratton says:

    Isn’t this sort of thing what extended XLinks, rather than basic ones, are for? I need to re-read the XLink spec and think about it some more, but it seems like you should be able to use extended XLinks to come up with a reasonable XML document that acts as a “cursor” for accessing sub-/child-resources.

  3. Pingback: AsynchronousBlog

  4. Bo says:

    No, ‘next-page’ is an ugly design in my opinion. Logically, /airports is a resource that represents a collection of other resources. You should query this collection using query strings. GET /airports should, in my opinion, return information about the COLLECTION of airports, e.g. the number of airports contained, the name of the first airport, the name of the last airport, etc. As for the general question of returning too many results for a query, I don’t think it’s a problem. You don’t need paging because a computer doesn’t need to scroll through thousands of results. Another solution I’ve toyed with is to make the query itself a resource. With this mechanism, GET /airports redirects to a ‘/queries/123/’ resource that contains ‘page’ links to each ‘page’ of the query. But this kind of messes with the notion that GET should return a representation of a resource and not create/update/delete stuff.
