Dare Obasanjo is giving a bit of pushback on the Atom Publishing Protocol, but the part that caught my attention was the section on the Lost Update Problem. This has less to do with REST per se than with the choice not to use resource locking, but since REST people tend to like their protocols lightweight, the odds are that you won’t see exclusive locks on RESTful resources all that often. (The problem applies to some kinds of POST updates as well as PUT.)
How to lose a REST update
- I check out a resource about “John Smith” (as a web form or an XML document, for example), and correct the first name field to “Jon”.
- You check out the same resource, and correct the last name field to “Smyth”.
- I check in my changes.
- You check in your changes.
You have corrected the last name to “Smyth”, but have inadvertently overwritten my correction of the first name with the old value “John”, because you never saw my update.
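The four steps above can be replayed in a few lines. This is only a minimal sketch: the in-memory `server` dictionary and the `check_out` / `check_in` helpers are made up for illustration, standing in for a GET and an unconditional PUT of the whole resource.

```python
import copy

# Hypothetical in-memory "server" holding a single resource.
server = {"given-name": "John", "family-name": "Smith"}

def check_out():
    """Return a private copy of the whole resource, as a GET would."""
    return copy.deepcopy(server)

def check_in(representation):
    """Blindly replace the stored resource, as an unconditional PUT would."""
    server.clear()
    server.update(representation)

mine = check_out()               # I check out the resource
yours = check_out()              # you check out the same resource
mine["given-name"] = "Jon"       # my correction
yours["family-name"] = "Smyth"   # your correction
check_in(mine)                   # I check in
check_in(yours)                  # you check in, carrying the old first name

print(server)  # {'given-name': 'John', 'family-name': 'Smyth'}
```

My "Jon" is gone, and nothing on the server ever noticed.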
Detection, not avoidance
Without exclusive locks, there’s no way to avoid this problem, but it is possible to detect it. What happens after detection depends on the application: if it’s interactive, for example, you might redisplay the form with both versions side by side. I don’t mean to diminish the difficulty of dealing with check-in conflicts and merges (it’s a brutally hard problem), but it’s one that you’ll have whenever you choose not to use exclusive resource locks (and even with resource locks, the problem still comes up if someone’s lock expires or is overridden). Managing multi-user resource locks properly can require a lot of extra infrastructure, and they have all kinds of other problems (ask an enterprise developer about the stale lock problem), so there are often good reasons to avoid them.
State goes in the resource, not the HTTP header
Dare points to an old W3C doc that talks about doing lost-update detection using all kinds of HTTP-header magic, requiring built-in support in the client (such as a web browser). That doesn’t make sense to me. A better alternative is to include version information directly in the resource itself. For example, if I check out the record as XML, why not just send me something like this?
<record version="18">
  <given-name>John</given-name>
  <family-name>Smith</family-name>
</record>
If I check it out as an HTML form, my browser should get something like this:
<form method="post" action="/actions/update">
  <div>
    <input type="hidden" name="version" value="18" />
    Given name: <input name="given-name" value="John" />
    Family name: <input name="family-name" value="Smith" />
    <button>Save changes</button>
  </div>
</form>
When you check out the resource, you’ll also get version 18. However, when I check in my changes (using PUT or POST), the server will bump the resource version to 19. When you try to check in your copy (still at version 18), the server will detect the conflict and reject the check-in. Again, what happens after that depends on your application.
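Here is a rough sketch of what the server-side check might look like, assuming the version arrives inside the submitted representation (as the `version` attribute or hidden form field does above). The `Record` class and `VersionConflict` exception are hypothetical names, and a real server would also need to make the compare-and-bump step atomic.

```python
class VersionConflict(Exception):
    """Raised when a check-in is based on a stale version."""


class Record:
    """Hypothetical server-side record with a serial version number."""

    def __init__(self, fields, version=18):
        self.fields = dict(fields)
        self.version = version

    def check_out(self):
        # The version travels inside the representation itself.
        return {"version": self.version, **self.fields}

    def check_in(self, representation):
        submitted = dict(representation)
        # Reject the check-in unless it was based on the current version.
        if submitted.pop("version", None) != self.version:
            raise VersionConflict(f"stale check-in; current version is {self.version}")
        self.fields = submitted
        self.version += 1
        return self.version


record = Record({"given-name": "John", "family-name": "Smith"})
mine, yours = record.check_out(), record.check_out()
mine["given-name"] = "Jon"
yours["family-name"] = "Smyth"

record.check_in(mine)        # accepted; the version becomes 19
try:
    record.check_in(yours)   # still carries version 18
    rejected = False
except VersionConflict:
    rejected = True          # the application decides what to do next
```

The second check-in is refused, so my correction survives and you get a chance to merge.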
The Sneakernet Test
I think that this is far better than the old W3C solution, because (1) it’s already compatible with existing browsers, and (2) it passes what I call the Sneakernet Test: I can take a copy of the XML (or JSON, or CSV, or whatever) version of the resource to a machine that’s not connected to the net, edit it (say, on the plane), then check it back in from a different computer. I can copy it onto a USB stick, take it to the beach, edit it on my laptop, then take it back to work and check it back in. All the state is in the resource, not hidden away in cryptic HTTP headers.
By the way, if you don’t trust programmers to be honest when designing their clients, you can use a non-serial, pseudo-random version so that they can’t just guess the next version and avoid the merge problem, but serial version numbers should be fine most of the time.
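One possible way to produce such a non-serial version is Python's standard `secrets` module. The helper names below are hypothetical, and the 16-byte token length is an arbitrary choice for this sketch.

```python
import secrets


def new_version_token():
    """Return an unguessable version token (32 hex characters)."""
    return secrets.token_hex(16)


def is_current(submitted_token, current_token):
    """Compare tokens in constant time; a serial number would just use ==."""
    return secrets.compare_digest(submitted_token, current_token)


current = new_version_token()
checked_out = current            # what a client received at check-out
current = new_version_token()    # someone else checked in meanwhile
stale = not is_current(checked_out, current)
```

The server has to store the current token alongside the resource, since it can no longer derive the next version by adding one.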
The GData API appends a version number to the URL, for updates and deletes. In order to pass the sneakertest, you’d have to also remember (or copy) the URL.
Thanks for that info, Han. I have no problem including the version number to retrieve a specific version of a resource from the history, e.g.
http://www.example.org/resource.xml?version=3
or even
http://www.example.org/resource/3/
but keeping the number in the URL for delete/update of the latest version fails the sneakernet test, as you point out.
Some immediate problems I see with this scheme are:
1. What if the resource is a binary object, e.g. a JPEG file?
2. Putting this information in the message violates the self-descriptive REST constraint. One could argue that a version number _is_ part of the representation, and not metadata external to the representation and valuable to intermediaries. But keeping this information in the header as an ETag allows me to let off-the-shelf software worry about this for me. For instance, a Web server can manage the lost update problem on ordinary files. Though, admittedly, none do. So, it seems that “version” information really belongs in the headers.
3. Ideally, one should be transferring higher-order media types, which may not be extensible. Or, if they are, everyone has to agree on this versioning extension for every different media type.
In a JPEG file, you could stick in the version number as extended EXIF data, but I understand your point in general. On the other hand, not only are ETags not widely supported, but as I mentioned, the information gets lost unless the representation stays on a single system with an open connection to the server, which is severely limiting.
For binary objects being edited by non-browser clients, maybe the best thing would be to distribute them in some kind of package with a small, standard metadata file (XML or otherwise). The model of JAR files springs immediately to mind.
FWIW, you do locking in REST just like you do everything else: by modelling locks as resources. You POST to the lock manager to ask for a lock, and it creates a new lock resource and gives you the URI. What you do with that depends on a number of considerations; maybe the resource contains some sort of auth token, or a special URI to which to PUT your update, or maybe the lock resource itself accepts the update on behalf of the actual resource. Once you’re done, you DELETE the lock.
Couldn’t be much simpler.
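As a rough illustration of locks modelled as resources, here is an in-memory sketch. The `LockManager` class, its method names, and the `/locks/...` URI scheme are all assumptions made up for this example; `post` stands in for POSTing to the lock manager and `delete` for DELETEing the lock resource.

```python
import secrets


class LockManager:
    """Hypothetical lock manager: each lock is a resource with its own URI."""

    def __init__(self):
        self._locks = {}  # resource URI -> lock URI

    def post(self, resource_uri):
        """Create a lock resource; return its URI, or None if already locked."""
        if resource_uri in self._locks:
            return None
        lock_uri = "/locks/" + secrets.token_hex(8)
        self._locks[resource_uri] = lock_uri
        return lock_uri

    def holds(self, resource_uri, lock_uri):
        """True if lock_uri is the current lock on resource_uri."""
        return self._locks.get(resource_uri) == lock_uri

    def delete(self, lock_uri):
        """Remove the lock resource; return False if it does not exist."""
        for resource_uri, held in list(self._locks.items()):
            if held == lock_uri:
                del self._locks[resource_uri]
                return True
        return False


manager = LockManager()
my_lock = manager.post("/records/42")     # I get a lock URI
your_lock = manager.post("/records/42")   # you get None: already locked
manager.delete(my_lock)                   # done editing, release the lock
```

This sketch never expires a lock, so a client that forgets the DELETE leaves the resource locked indefinitely.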
I don’t agree that using HTTP headers is a bad idea.
You have to write the ETag stuff anyway for caching GETs, so it would be easy to generalize the support to PUT as well.
As for browsers, the only way to do PUT in browsers is with XHR, which has full header support, so no browser changes are required.
So for webapps, which aren’t sneakernet-compatible anyway, the W3C solution sounds perfect to me.
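For comparison, the header-based approach could be sketched like this. The `make_etag` and `handle_put` functions are hypothetical helpers rather than part of any framework, and deriving the ETag from a hash of the body is just one common choice. The status codes are the standard ones: 412 Precondition Failed for a stale `If-Match`, and 428 Precondition Required when the header is missing.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the stored bytes (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_put(stored: bytes, if_match, new_body: bytes):
    """Return (status, stored_body) for a conditional PUT."""
    if if_match is None:
        return 428, stored       # refuse unconditional updates
    if if_match != make_etag(stored):
        return 412, stored       # someone else updated the resource first
    return 204, new_body         # precondition held; accept the update


original = b"<record><given-name>John</given-name></record>"
etag = make_etag(original)       # sent to both clients with the GET

status_mine, current = handle_put(original, etag, b"<record><given-name>Jon</given-name></record>")
status_yours, current = handle_put(current, etag, b"<record><given-name>John</given-name></record>")
```

My PUT succeeds and yours comes back with 412, which is the same detection as the in-resource version number, only carried in a header.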
Sjoerd: Most webapps still use HTML forms and POST, not PUT. Since they repost the whole form (rather than just the changed fields), they run into exactly the same lost update problem. You’re right that you could hand-code something in JavaScript that uses ETags with either POST or PUT and XMLHttpRequest, as long as you’re working with a web browser that has JavaScript enabled, and not, say, a cell phone or similar; on the other hand, people are starting to write AJAX-y webapps that work offline, and it’s only a small step from that to full sneakernet compatibility.
I guess that this is just an extension of the very old protocols-vs-formats debate.
“SSE is ideal for bidirectional, asynchronous synchronization, particularly for scenarios where multiple people can independently edit or create entries–where there is not a single “master” copy of the data and each end user has their own copy of the data.”
http://msdn2.microsoft.com/en-us/xml/bb190613.aspx
For GData, the edit URL that contains the versioning info is actually contained within the resource (atom:entry/atom:link[@rel="edit"]). So it could be viewed as an implementation of the algorithm proposed here.