Dare Obasanjo is pushing back a bit on the Atom Publishing Protocol, but the part that caught my attention was his section on the Lost Update Problem. The problem has less to do with REST per se than with the choice not to use resource locking, and it affects some kinds of POST updates as well as PUT. Still, since REST people tend to like their protocols lightweight, the odds are that you won’t see exclusive locks on RESTful resources very often.
How to lose a REST update
- I check out a resource about “John Smith” (as a web form or an XML document, for example), and correct the first name field to “Jon”.
- You check out the same resource, and correct the last name field to “Smyth”.
- I check in my changes.
- You check in your changes.
You have corrected the last name to “Smyth”, but have inadvertently overwritten my correction of the first name with the old value “John”, because you never saw my update.
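To make the sequence concrete, here is a minimal Python sketch of a hypothetical in-memory store with no locking and no version check, where every check-in simply replaces the stored record (last write wins). The names `store`, `check_out`, and `check_in` are illustrative assumptions, not part of any real API.

```python
import copy

# Hypothetical store: no locks, no versions, last write wins.
store = {"john-smith": {"given-name": "John", "family-name": "Smith"}}


def check_out(record_id):
    # Each client gets its own private copy of the record.
    return copy.deepcopy(store[record_id])


def check_in(record_id, record):
    # Blindly replace whatever is currently stored.
    store[record_id] = copy.deepcopy(record)


mine = check_out("john-smith")
yours = check_out("john-smith")

mine["given-name"] = "Jon"       # I fix the first name
yours["family-name"] = "Smyth"   # you fix the last name

check_in("john-smith", mine)     # I check in first
check_in("john-smith", yours)    # you check in second, from a stale copy

print(store["john-smith"])
# {'given-name': 'John', 'family-name': 'Smyth'}  (my "Jon" is gone)
```

Nothing in this store can notice that your copy was based on data that had already changed, which is exactly the gap the rest of this post is about.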
Detection, not avoidance
Without exclusive locks, there’s no way to avoid this problem, but it is possible to detect it. What happens after detection depends on the application; if it’s interactive, for example, you might redisplay the form with both versions side by side. I don’t mean to diminish the difficulty of dealing with check-in conflicts and merges (it’s a brutally hard problem), but it’s one you’ll face whenever you choose not to use exclusive resource locks. Even with locks, the problem still arises when someone’s lock expires or is overridden. Managing multi-user resource locks properly can require a lot of extra infrastructure, and locks bring all kinds of problems of their own (ask an enterprise developer about stale locks), so there are often good reasons to avoid them.
State goes in the resource, not the HTTP header
Dare points to an old W3C document that describes lost-update detection using all kinds of HTTP-header magic, which requires built-in support in the client (such as a web browser). That doesn’t make sense to me. A better alternative is to include version information directly in the resource itself. For example, if I check out the record as XML, why not just send me something like this?
<record version="18">
  <given-name>John</given-name>
  <family-name>Smith</family-name>
</record>
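On the server side, producing that representation takes nothing special: the version is just one more piece of data written into the document. A minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the helper name `render_record` is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_record(version, given_name, family_name):
    # Hypothetical helper: the version travels as an attribute inside the
    # document itself, not in an HTTP header.
    record = ET.Element("record", version=str(version))
    ET.SubElement(record, "given-name").text = given_name
    ET.SubElement(record, "family-name").text = family_name
    # encoding="unicode" returns a str without an XML declaration;
    # ElementTree escapes markup characters in the text for us.
    return ET.tostring(record, encoding="unicode")


print(render_record(18, "John", "Smith"))
# <record version="18"><given-name>John</given-name><family-name>Smith</family-name></record>
```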
If I check it out as an HTML form, my browser should get something like this:
<form method="post" action="/actions/update">
  <div>
    <input type="hidden" name="version" value="18" />
    Given name: <input name="given-name" value="John" />
    Family name: <input name="family-name" value="Smith" />
    <button>Save changes</button>
  </div>
</form>
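When the form is submitted, the hidden field arrives with the other fields in an ordinary `application/x-www-form-urlencoded` body, so the browser needs no special support. A minimal Python sketch of how the handler behind `/actions/update` might pull the version out, assuming it receives the raw body as a string; here `urlencode` builds a stand-in for what a browser would send.

```python
from urllib.parse import parse_qs, urlencode

# Stand-in for the body a browser would POST after I edit the first name.
body = urlencode({"version": "18", "given-name": "Jon", "family-name": "Smith"})

# keep_blank_values=True so a field the user clears is kept as "" rather
# than silently dropped from the result.
parsed = parse_qs(body, keep_blank_values=True)
fields = {name: values[0] for name, values in parsed.items()}

# Separate the version the copy was based on from the data being saved.
submitted_version = int(fields.pop("version"))

print(submitted_version, fields)
# 18 {'given-name': 'Jon', 'family-name': 'Smith'}
```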
When you check out the resource, you’ll also get version 18. When I check in my changes (using PUT or POST), the server confirms that my copy is based on the current version, accepts it, and bumps the resource version to 19. When you then try to check in your copy, which is still marked as version 18, the server sees that it no longer matches the current version, detects the conflict, and rejects the check-in. Again, what happens after that depends on your application.
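Here is the same John Smith scenario replayed against a version-checking store. This is a minimal, single-threaded Python sketch; `VersionedStore` and `VersionConflict` are hypothetical names, and a real server would need the compare-and-bump step to happen atomically (for example inside a database transaction).

```python
import copy


class VersionConflict(Exception):
    """Raised when a check-in is based on an out-of-date version."""


class VersionedStore:
    """Hypothetical in-memory store that uses versions instead of locks."""

    def __init__(self):
        self._records = {}

    def create(self, record_id, fields, version=1):
        self._records[record_id] = {"version": version, "fields": dict(fields)}

    def check_out(self, record_id):
        # A private copy of the data plus the version it is based on.
        return copy.deepcopy(self._records[record_id])

    def check_in(self, record_id, working_copy):
        current = self._records[record_id]
        if working_copy["version"] != current["version"]:
            # Someone else checked in first: refuse rather than overwrite.
            raise VersionConflict(
                f"copy is based on version {working_copy['version']}, "
                f"but the current version is {current['version']}"
            )
        new_version = current["version"] + 1
        self._records[record_id] = {
            "version": new_version,
            "fields": dict(working_copy["fields"]),
        }
        return new_version


store = VersionedStore()
store.create("john-smith", {"given-name": "John", "family-name": "Smith"}, version=18)

mine = store.check_out("john-smith")
yours = store.check_out("john-smith")
mine["fields"]["given-name"] = "Jon"
yours["fields"]["family-name"] = "Smyth"

print(store.check_in("john-smith", mine))  # 19
try:
    store.check_in("john-smith", yours)
except VersionConflict as err:
    # Application-specific: e.g. redisplay both versions side by side.
    print("conflict:", err)
```

No lock is held between check-out and check-in; the store only compares versions at the moment of writing. Over HTTP, a server might report the rejection with a `409 Conflict` status, though that mapping is one reasonable choice rather than something the approach requires.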
The Sneakernet Test
I think that this is far better than the old W3C solution, because (1) it’s already compatible with existing browsers, and (2) it passes what I call the Sneakernet Test: I can take a copy of the XML (or JSON, or CSV, or whatever) version of the resource to a machine that’s not connected to the net, edit it (say, on the plane), then check it back in from a different computer. I can copy it onto a USB stick, take it to the beach, edit it on my laptop, then take it back to work and check it in. All the state is in the resource, not hidden away in cryptic HTTP headers.
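As a rough illustration of the Sneakernet Test, this Python sketch saves the checked-out XML to a file (a temporary directory stands in for the USB stick), edits it later with no network involved, and reads it back. The file name is hypothetical; the point is that the version attribute survives the trip because it lives in the document itself.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

checked_out = (
    '<record version="18"><given-name>John</given-name>'
    "<family-name>Smith</family-name></record>"
)

with tempfile.TemporaryDirectory() as usb_stick:  # stand-in for the USB stick
    path = Path(usb_stick) / "john-smith.xml"
    path.write_text(checked_out, encoding="utf-8")

    # Later, offline, on a different computer: edit the file in place.
    record = ET.fromstring(path.read_text(encoding="utf-8"))
    record.find("family-name").text = "Smyth"
    path.write_text(ET.tostring(record, encoding="unicode"), encoding="utf-8")

    # Back at work: the file alone carries everything the server needs.
    returned = ET.fromstring(path.read_text(encoding="utf-8"))

print(returned.get("version"), returned.findtext("family-name"))
# 18 Smyth
```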
By the way, if you don’t trust programmers to be honest when designing their clients, you can use non-serial, pseudo-random version identifiers so that a client can’t simply guess the next version and sidestep the merge problem. Serial version numbers should be fine most of the time, though.
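One way to get unguessable versions, sketched in Python under the assumption that the standard `secrets` module is an acceptable source of randomness; the helper names are hypothetical. Because the tokens have no order, the server's only question on check-in is whether the submitted version still equals the current one.

```python
import secrets


def new_version_token():
    # 8 random bytes from a cryptographically strong source, as 16 hex
    # characters. Unlike 18 -> 19, the next token can't be guessed.
    return secrets.token_hex(8)


def is_current(submitted, current):
    # Opaque tokens carry no ordering: "still the same?" is the whole test.
    return submitted == current


current = new_version_token()
my_copy_version = current      # we both check out the same version
your_copy_version = current

assert is_current(my_copy_version, current)
current = new_version_token()  # my check-in is accepted: issue a fresh token

print(is_current(your_copy_version, current))
```

Guessing the fresh token would take on the order of 2**64 tries, so a client can’t fake being up to date; it has to check out the current version and merge.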