The second in my series of REST design questions is how to handle listing and paging, or, in fancier jargon, resource discovery. I prefer concrete examples, so I’ll start with one that I know is flawed and then try to find ways to fix it.
Let’s say that I have a large collection of XML data records with URLs like http://www.example.org/airports/cyyz.xml, and so on. Since they all share the same prefix, it would be reasonable to assume that performing an HTTP GET operation on that prefix (http://www.example.org/airports/) would return a list of links to all of the data records (though I acknowledge that URLs are opaque and no one should rely on that, etc. etc.):
<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink"
                 xml:base="http://www.example.org/airports/">
  <airport-ref xlink:href="cyow.xml"/>
  <airport-ref xlink:href="cyyz.xml"/>
  <airport-ref xlink:href="cyxu.xml"/>
  ...
</airport-listing>
This is a wonderfully RESTful example, since it shows how (say) a search-engine spider could eventually find and index every XML resource. However, anyone who’s ever worked on a large, production-grade system can see that there’s a huge scalability problem here (I’m leaving out other possible issues like privacy and security). For a listing of a few dozen resources, this is a great approach. For a listing of a few hundred, it’s manageable. A listing of a few thousand resources will start to consume serious bandwidth every time someone GETs it, and a listing of a few million resources is simply ridiculous.
HTML-based web applications designed for humans typically employ a combination of querying and paging to deal with discovering resources from a large collection. For example, I might start by specifying that I’m interested only in airports with instrument approaches within 500 nautical miles of Toronto; then the application will return a single page of results (say, the first 20 matches), with a link to let me see the next page if I’m interested.
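Assembling such a query is mechanical. Here is a minimal Python sketch, using the (hypothetical) parameter names that appear in my examples later in this post:

```python
from urllib.parse import urlencode

# Hypothetical collection prefix and parameter names used for illustration.
BASE = "http://www.example.org/airports/"

def listing_url(**params):
    """Build a query URL for the airports collection."""
    return BASE + "?" + urlencode(params) if params else BASE

url = listing_url(**{"ref-point": "cyyz", "radius": "500nm", "has-iap": "yes"})
# -> http://www.example.org/airports/?ref-point=cyyz&radius=500nm&has-iap=yes
```

Nothing here is specific to REST, of course; the point is only that the query form is trivially constructible by a machine, which matters once systems (rather than humans) are issuing the requests.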
How would this work for a REST-based data application? Clearly, we want to use GET rather than POST requests, since pure queries are side-effect free, so presumably, I’d end up adding some request parameters to limit the results:

http://www.example.org/airports/?ref-point=cyyz&radius=500nm&has-iap=yes
That’s certainly not the kind of pretty REST URL that we see in the examples, but it does look a lot like the ones used in Amazon’s REST web services, so perhaps I’m on the right track. Of course, there will have to be some way for systems to know what the available request parameters are. Now, perhaps, the result will look something like this (assuming 20 results to the page):
<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink"
                 xml:base="http://www.example.org/airports/?ref-point=cyyz&amp;radius=500nm&amp;has-iap=yes">
  <airport-ref xlink:href="cyow.xml"/>
  <airport-ref xlink:href="cyyz.xml"/>
  <airport-ref xlink:href="cyxu.xml"/>
  ...
  <next-page-link xlink:href="http://www.example.org/airports/?ref-point=cyyz&amp;radius=500nm&amp;has-iap=yes&amp;start=21"/>
</airport-listing>
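Consuming a listing like this takes only a few lines with a standard XML library. A sketch using Python’s standard library, resolving the relative hrefs against xml:base (the element names follow my example; a shortened two-entry listing stands in for a real response):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

HREF = "{http://www.w3.org/1999/xlink}href"
BASE = "{http://www.w3.org/XML/1998/namespace}base"

listing = """<airport-listing xmlns:xlink="http://www.w3.org/1999/xlink"
    xml:base="http://www.example.org/airports/?start=1">
  <airport-ref xlink:href="cyow.xml"/>
  <airport-ref xlink:href="cyyz.xml"/>
  <next-page-link xlink:href="http://www.example.org/airports/?start=21"/>
</airport-listing>"""

root = ET.fromstring(listing)
base = root.get(BASE)
# Each relative href is resolved against xml:base to get an absolute URL.
airports = [urljoin(base, ref.get(HREF)) for ref in root.findall("airport-ref")]
nxt = root.find("next-page-link")
next_url = nxt.get(HREF) if nxt is not None else None
```

Note that urljoin correctly drops the query string when resolving the relative references, so each airport URL comes out clean.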
As far as I understand, this is good REST, because the XML resource contains its own transition information (i.e. a link to the next page). However, this is pretty unbelievably ugly. Presumably, the same kind of paging could work on the entire collection when there are no query parameters, so that a plain GET on http://www.example.org/airports/ would return the first 20 airport references, followed by a link to
http://www.example.org/airports/?start=21, which would return the next 20 entries, and so on. The potential power of REST and XLink together is clear: it is still possible to start at a single URL with a simple crawler and discover all of the available resources automatically, and unlike WS-*, I did it without having to deal with extra, cumbersome specs like UDDI and WSDL. Still, this looks like a bit of an ugly solution to me. I look forward to hearing whether anyone can come up with something more elegant.
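To make the crawler claim concrete, here is a minimal sketch of such a spider. It assumes the listing format above, and takes the actual HTTP fetching as a callable argument so the walking logic stays independent of any HTTP library:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

HREF = "{http://www.w3.org/1999/xlink}href"

def crawl(start_url, fetch):
    """Walk a paged listing, yielding every airport URL.

    `fetch` is any callable mapping a URL to that page's XML text
    (e.g. a thin wrapper around urllib.request.urlopen).
    """
    url = start_url
    while url:
        root = ET.fromstring(fetch(url))
        # Fall back to the request URL when no xml:base is given.
        base = root.get("{http://www.w3.org/XML/1998/namespace}base", url)
        for ref in root.findall("airport-ref"):
            yield urljoin(base, ref.get(HREF))
        nxt = root.find("next-page-link")
        url = nxt.get(HREF) if nxt is not None else None
```

Point it at the collection URL with a real fetch function and it will follow next-page-link elements until the last page, with no out-of-band knowledge beyond the two element names.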