POST, PUT, idempotence, and self-identification

I realize that the title of this posting might be a bit off-putting for non-RESTafarians, but it’s about a topic that matters to anyone building an application that uses simple web standards for updates.

Background: how REST usually works

There are two HTTP methods that you can use for adding/updating information on a web site:

PUT: when you know the address (URL) of the thing you’re adding/updating.
POST: when you don’t know the address of the thing you’re adding/updating (you’re implicitly asking the server to assign a URL for you).

Since PUT refers to a specific URL, people say that it’s idempotent — that is, if you repeat the same operation three times, it won’t create three separate resources on the server (I’ve simplified the HTTP headers a fair bit in these examples):

PUT http://example.org/greetings/resource01.xml

<greeting>Hello!</greeting>

PUT http://example.org/greetings/resource01.xml

<greeting>Hello!</greeting>

PUT http://example.org/greetings/resource01.xml

<greeting>Hello!</greeting>

The result is just one resource at http://example.org/greetings/resource01.xml.
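
As a rough illustration of why repeating a PUT is harmless, here is a minimal Python sketch. The `store` dictionary and the `put` function are hypothetical stand-ins for a server, not a real HTTP implementation; they only model "write the body at the URL the client named".

```python
# Hypothetical in-memory stand-in for the server's storage (an assumption,
# not part of any real HTTP library).
store = {}

def put(url, body):
    """Store the body at the client-chosen URL, replacing whatever was there."""
    store[url] = body
    return url

# Repeat the same PUT three times, as in the example above.
for _ in range(3):
    put("http://example.org/greetings/resource01.xml",
        "<greeting>Hello!</greeting>")

print(len(store))  # number of resources the "server" now holds
```

Because the client supplies the key, the second and third calls simply overwrite the first with identical content.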

POST, on the other hand, sends the request to one URL (or other address) that will place the resource at a different URL. If PUT is like placing the resource in a file yourself, POST is like handing it to a file clerk. As a result, POST is not guaranteed to be idempotent:

POST http://example.org/actions/add-greeting

<greeting>Hello!</greeting>

POST http://example.org/actions/add-greeting

<greeting>Hello!</greeting>

POST http://example.org/actions/add-greeting

<greeting>Hello!</greeting>

In this case, HTTP itself makes no guarantee that this three-time repetition won’t result in representations of three different resources with the same content, say http://example.org/greetings/resource01.xml, http://example.org/greetings/resource02.xml, and http://example.org/greetings/resource03.xml.
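
To make the contrast concrete, here is a small Python sketch of a server that behaves like the file clerk. The `post` function, the `store` dictionary, and the numbering scheme are all assumptions made for illustration; a real server is free to assign URLs however it likes.

```python
import itertools

# Hypothetical in-memory stand-in for the server's storage.
store = {}
_next_number = itertools.count(1)

def post(body):
    """Let the server choose the URL: every call mints a fresh one."""
    url = f"http://example.org/greetings/resource{next(_next_number):02d}.xml"
    store[url] = body
    return url

# Repeat the same POST three times, as in the example above.
urls = [post("<greeting>Hello!</greeting>") for _ in range(3)]

print(urls)
```

Nothing in this handler looks at the content, so three identical requests produce three separate resources.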

Self-identification

So that’s it, as far as HTTP goes, because HTTP is just a transport layer — with a few exceptions (like re-encoding text files), it doesn’t care whether you’re using it to send a picture, video, web page, or XML file. Because HTTP doesn’t know anything about your resource, it can’t make any guarantee about the idempotence of POST.

Let’s say, however, that you’re designing an XML-based web application where every resource has its own, internal unique identifier. To take a simple example, let’s use ISO 3166-1 alpha-2 country codes, such as “US” for the United States or “DE” for Germany. Your XML files always contain the identifier in a place where the application can find it:

<country ident="CN">
  <name>China</name>
  <population>1338299500</population>
  <land-area unit="km2">9640011</land-area>
</country>

If your web application has a fixed URL scheme based on the ISO 3166 code, it will create this resource at (say) http://example.org/countries/CN.xml when you POST it. If you POST it a second time, it won’t create a second copy of the same information, because it has a fixed mapping between the code “CN” and the URL, and knows that this is a new representation of the same resource. In other words, while HTTP itself can’t guarantee idempotence for a POST operation, the web application can.
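
Here is one way such an application-level guarantee might look, sketched in Python. The `post_country` function and the `store` dictionary are hypothetical; the only things taken from the example above are the `ident` attribute and the `http://example.org/countries/<code>.xml` URL scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the server's storage.
store = {}

def post_country(xml_text):
    """Handle a POST by deriving the URL from the document's own identifier."""
    root = ET.fromstring(xml_text)
    ident = root.attrib["ident"]
    # Assumed sanity check: alpha-2 codes are two uppercase letters.
    if not (len(ident) == 2 and ident.isalpha() and ident.isupper()):
        raise ValueError(f"not an alpha-2 country code: {ident!r}")
    url = f"http://example.org/countries/{ident}.xml"
    store[url] = xml_text  # same code, same URL: a repeat just overwrites
    return url

CHINA = """<country ident="CN">
  <name>China</name>
  <population>1338299500</population>
  <land-area unit="km2">9640011</land-area>
</country>"""

first = post_country(CHINA)
second = post_country(CHINA)
print(first, len(store))
```

The client never names the target URL, yet posting the same document twice still leaves a single resource, because the mapping from “CN” to the URL is fixed on the server side.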

So now what?

So for a web application like this, does it make sense to have both POST and PUT operations? RDF fans would say clearly “yes”, since they would consider the URL — not the ISO 3166 code — to be the resource’s identifier, and I feel strong sympathy with that approach (URLs make great global identifiers for things). There is, however, a down-side. Let’s say I’m reading data from a legacy system that doesn’t know or care about URLs. Does it make sense to force that system to duplicate my URL-construction algorithm, and know to PUT its updates for China to http://example.org/countries/CN.xml, for Canada to http://example.org/countries/CA.xml, etc.? Or does it make sense just to let the client POST to a single URL, knowing that the application will enforce idempotence based on the domain-unique identifier in the XML file?
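
The two options look like this from the client’s side, in a Python sketch that only builds the requests and never sends them. The `build_put` and `build_post` helpers are hypothetical, and so is the `http://example.org/actions/add-country` endpoint, which I’ve modeled on the add-greeting URL used earlier.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_put(xml_text):
    """Option 1: the client duplicates the server's URL-construction rule."""
    ident = ET.fromstring(xml_text).attrib["ident"]
    return urllib.request.Request(
        f"http://example.org/countries/{ident}.xml",
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="PUT",
    )

def build_post(xml_text):
    """Option 2: the client knows one fixed endpoint (hypothetical URL)."""
    return urllib.request.Request(
        "http://example.org/actions/add-country",
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )

CANADA = '<country ident="CA"><name>Canada</name></country>'

put_request = build_put(CANADA)
post_request = build_post(CANADA)
print(put_request.get_method(), put_request.full_url)
print(post_request.get_method(), post_request.full_url)
```

With PUT, any change to the URL scheme has to be mirrored in every client; with POST, the scheme stays private to the server, at the cost of relying on the application rather than HTTP for idempotence.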

I actually haven’t made up my mind on this point yet. As I wrote in a 2005 posting, real-world REST applications generally stick to GET for retrieval and POST for creation/updating/deletion, however inelegant that approach may be. I hope things have improved in the last 6½ years, but I’m not confident that they have.

What do you think? Is allowing both POST and PUT unnecessarily complicating the interface for the sake of RESTful purity, or is using only POST for creation and updating breaking the implied contract of HTTP and risking confusion among users?


7 Responses to POST, PUT, idempotence, and self-identification

  1. Chuck says:

    If you trust clients to put the right representations at the right URLs, then PUT might be good; if you don’t, then maybe POST is better. If your URI structure is static and clients can easily construct the right URL, then PUT might be good. If it is important that your clients be able to simply try again when they experience a network error, PUT might work better for you. If your application can detect duplicate POST requests based on the content and do the right thing, then maybe POST is just fine. If your legacy app is the only thing using the service and it’s extra work to make it add the ISO code to the end of the URL for a PUT, then use POST. If some of your clients don’t support PUT, then maybe POST is better.

  2. Kurt Cagle says:

    David,

    The distinction actually is more subtle. Assume for the moment that you know nothing about the back end state or representation of the object in question – that is to say, there is a transformation that converts your POSTed or PUT document into the internal state of the server. If you assume that you have a purely idempotent system (one where a resource is never destroyed) then PUT becomes a mechanism for handling versioning. I’ve actually been working on such a system for a publishing client, in which POST effectively creates a new “resource” while PUT updates that resource, but in the back end, the POST actually creates two “documents” – a resource “proxy” and the first or base revision of that document. The PUT, on the other hand, will just create a revision of that original document and save it, with the revision having a different identifier than the base version of that document.

    When you retrieve that document, you are getting the latest revision of that document, but the system itself never destroys previous revisions. What’s more, the same system can store within each revision its revision chain (via URL pointers stored in an envelope holding the document), meaning that the base object will have an auditable history. This implies that in such an environment, even PUT is non-destructive.

    The processing pipelines in this case are really the key to this. A lot of (perhaps most) people have some strong misconceptions about REST. The first is that if you use GET and POST, you are engaging in REST, yet the vast majority of such calls are in fact RPCs (especially POST) – you are passing a bundle of parametric content along with an imperative expressed through either the URL or the payload, and you get back other content as a consequence.

    The second expression of REST is that it is in fact nothing but CRUD, that you save or load resources to a database through a URI. While this is technically more RESTful, it does not take advantage of the fact that the internal and external representations of a resource do not necessarily have to match – so long as the external representation can be transformed from an internal representation and vice versa (albeit with no requirements that the results coming in or going out have to be the same representations) you are still doing RESTful operations.

    Yet if you assume idempotency (and hence non-destructive PUTs and revisional content) then what emerges is a truly “stateless”, purely declarative system, because you can, in theory, pass in a time parameter and get back the state of that resource at that time for any time in a resource’s history.

    This has profound implications for semantic systems. One of the biggest challenges that linked data systems face is that in a destructive RESTful system, RDF triples can only describe relationships relative to the current state of the triple store system – even if such triples are themselves never destroyed. On the other hand, in an IDEMPOTENT PUT system, the relationships that resources have (assuming that the predicate of such a relationship is also defined in such a manner) can be rolled forward or back over time. As a consequence, such a triple store becomes a time machine that evinces the relational descriptions between its resources based upon the timestamp of the query.

    I’m aware that there are data modeling systems that do this already, but I think as the relationships between resources, REST and RDF become more fully understood (and I do believe there is a DEEP connection between these) and as hardware and software systems advance to the point where idempotent PUT becomes the norm rather than the exception, I believe that we’ll see some very interesting developments in this space.

  3. Ed Davies says:

    The problem is not subtle differences. The problem is that people just don’t care whether they actually implement HTTP or only something which uses port 80 and superficially looks a bit like HTTP:

    http://wiki.openstreetmap.org/wiki/Talk:API_v0.6#Overloading_of_PUT_method

    —-

    Kurt Cagle: “Yet if you assume idempotency (and hence non-destructive PUTs and revisional content)…”

    I think you misunderstand the word “idempotency”. From http://en.wikipedia.org/wiki/Idempotence

    “Idempotence (…) is the property of certain operations in mathematics and computer science, that they can be applied multiple times without changing the result beyond the initial application.”

    PUT can be destructive yet idempotent. The point is that if you do the PUT two or more times it has the same effect as only doing it once. If a system creates new revisions (and mints new revision URIs) for each PUT operation even if the representation put is the same then the operation is not idempotent. If it simply overwrites the resource with the given URI then it may well be.

  4. This PUT problem becomes easier if the representation is self-documenting.

    Your XML example of a country code doesn’t actually contain any URLs, so the application needs to know how to construct them. Why not just include the URL in the XML? After all, this is how it’s done in HTML and Atom. This way the application doesn’t need to know how to construct URLs, only how to follow links. The application still needs to know enough about the resource representations to know where the links are, but it doesn’t need to know how to construct links according to your (possibly changing) scheme.

    We can take this further and have the state representation include links and methods for all operations that can be done on the resource and related resources. In your country example, there could be a link “up” to a page that lists all country URLs, a notation that you can PUT to the present resource to replace it, and POST to another URL to create. Again, browsing web pages is a good human analogy: nav bars, breadcrumbs, etc., are all there to facilitate browsing. Just do for machine browsers what you would have done for a human browser. (The result will probably end up looking more like Gopher, if you remember that.)

    The problem is there’s no standard “language” (other than maybe HTML A, LINK, and FORM elements) for expressing these relationships, so every representation has to roll its own and every application has to understand that particular representation’s way of linking to resources.

    Also, this only gets you to the point where an application can POST for create, and then, after the application knows the URL, it can PUT for updates. Maybe it will always be too much of a burden to require PUT for create.

    Web Application Description Language (WADL) is trying to address this space for RESTful APIs in the same way that WSDL tried to do for more RPC-like protocols. I definitely think there’s a place for things like this (and it may have mechanisms to enable reliable “PUT-create” operations), but solutions like this are heavy and still don’t put the resource links *inside* the representation itself.

  5. Don Park says:

    I think it’s like screwdrivers. I get them as a full set, but the most often used ones get misplaced and I have to go out to get another full set of screwdrivers. In all the webapps I’ve written so far, I needed only one screwdriver: POST. Do I still want PUT in my toolbox? Absolutely. 😉

    Frankly, I am more concerned about the impact of REST on usability these days, because some of the extensive REST APIs I’ve seen, like SoundCloud’s, made my eyes roll back the way sound engineers’ desks do: a whole lot of knobs, buttons, and sliders that all look alike. They screamed of beauty and order yet radiated total indifference to common needs.

  6. Gaius Gracchus says:

    This is odd:
    ————————————–
    POST http://example.org/actions/add-greeting
    Hello!
    PUT http://example.org/actions/add-greeting
    Hello!
    PUT http://example.org/actions/add-greeting
    Hello!
    ————————————–

    Should you have used POST in the last two repetitions of this code?
    Gaius G.

  7. Hi, I would like to subscribe for this webpage to take most
    recent updates, thus where can i do it please help.
