A short while ago, I reluctantly acknowledged that RSS 2.0 will likely fill the same role for data that HTML fills for documents, providing a single, shared format across the web (the big missing piece of the puzzle for REST apps). Now it appears that someone a lot smarter than I am, none other than Adam Bosworth, is suggesting exactly the same thing.
If I’m wrong about RSS, at least I’ll be wrong in excellent company.
I really hope we end up with Atom in that role instead, because if we get stuck with RSS 2.0, we’re going to have to live with half-baked hacks for a long time.
Ok, say I want to communicate the age of my dog over the Web. I can imagine how to do it in a custom XML vocabulary (I’ll use square brackets just in case):
Ok, if you could get the parties involved to agree on that format and some kind of interpretation, that could work. Personally, I’d opt for doing this in RDF. The interchange format would be RDF/XML, and the syntax could be exactly the same as the example above (it’s valid RDF/XML). A likely benefit in this case is partial understanding: a receiver may not understand ‘Dog’, but it might still be able to work with the properties ‘name’ and ‘age’. As it stands, you can put that data into the W3C’s RDF validator and get a graphical representation (after switching the brackets, and selecting ‘no rdf:RDF’ element on the form). This vocabulary and serialization would also be entirely compatible with RSS 1.0.
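The bracketed example referred to above is not reproduced here, so what follows is only a guess at its general shape. It is a minimal Python sketch of the partial-understanding idea, assuming a hypothetical `http://example.org/pets#` namespace, a dog named Basil, and an invented age of 7. The receiver recognises `name` and `age` but has never heard of `Dog`.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary namespace; not taken from the original post.
PETS = "http://example.org/pets#"

# Assumed shape of the dog description (the age value is invented).
DOC = f"""
<Dog xmlns="{PETS}">
  <name>Basil</name>
  <age>7</age>
</Dog>
"""

# Properties this receiver understands; it knows nothing about 'Dog'.
KNOWN_PROPERTIES = {f"{{{PETS}}}name", f"{{{PETS}}}age"}


def read_known_properties(xml_text):
    """Return the properties we recognise, whatever the enclosing type is."""
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        if child.tag in KNOWN_PROPERTIES:
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            local_name = child.tag.split("}", 1)[1]
            found[local_name] = (child.text or "").strip()
    return found


properties = read_known_properties(DOC)
print(properties)
```

The function never inspects the root element's type, which is the point: an unknown class does not stop the receiver from using the properties it does know.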
So my questions are, how would you communicate the same information in RSS 2.0, and more to the point what would it gain you over the plain-XML and RDF/XML alternatives?
I agree with Danny that RDF or Plain Old XML models this more cleanly. In RSS 2.0, I guess, it would look something like this fragment:
(Or, alternatively, you could declare a separate namespace and include custom elements for age and type.) What you gain is an ability to tap into the enormous existing RSS infrastructure people have been building, just as HTML gave us an ability to tap into the enormous web infrastructure in the mid 1990s.
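As a rough illustration of the namespace alternative, here is a Python sketch under stated assumptions: the `pets` namespace URI, the feed contents, and the age of 7 are all hypothetical. It shows what a generic RSS tool would see in such an item versus what a consumer that knows the extension could read.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace for the dog-specific fields.
PETS = "http://example.org/pets#"

# A made-up RSS 2.0 feed; only the core element names come from RSS 2.0.
FEED = f"""
<rss version="2.0" xmlns:pets="{PETS}">
  <channel>
    <title>My pets</title>
    <link>http://example.org/pets</link>
    <description>Hypothetical feed describing pets</description>
    <item>
      <title>Basil</title>
      <description>My dog</description>
      <pets:type>Dog</pets:type>
      <pets:age>7</pets:age>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(FEED)
item = root.find("channel/item")

# What any generic RSS tool sees: the core elements only.
generic_view = {
    "title": item.findtext("title"),
    "description": item.findtext("description"),
}

# What a consumer that knows the extension namespace can also read.
ns = {"pets": PETS}
extended_view = {
    "type": item.findtext("pets:type", namespaces=ns),
    "age": int(item.findtext("pets:age", namespaces=ns)),
}

print(generic_view)
print(extended_view)
```

Existing readers would show an item titled “Basil” and ignore the rest; only software that has agreed on the extension gets the age as data.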
Ok, there is an enormous RSS infrastructure developing, but I’m skeptical about how usefully you could tap into it with something data-oriented when RSS 2.0 is so loose and so aligned to (blog) content. The polled-HTTP protocol part of RSS certainly brings in something new, as does microcontent (or whatever you wish to call less content/more metadata). I guess things should start to get interesting once the Atom stuff is in place for posting similar data. But so far that enormous infrastructure would allow you to see an item titled “Basil” in a viewer, which is not really an advance on HTML.
I come to the same conclusion as David, and just as reluctantly. As others have noted, RSS 2 is a bit half-baked, it’s very unclear how it could evolve, and its “true” definition for interoperability is embedded in lots of code rather than a real spec. All that was true of HTML in the early days too, but that didn’t stop it. All the sound logical arguments against “RSS as the HTML for data” were made against HTML as the world’s standard hypertext language. I personally wish the world were more of a place where doing the Right Thing was rewarded, but time and again, “worse is better” wins out.
Bosworth’s presentation is well worth studying in this respect. He says that successful Web-scale technologies tend to be simple (for users), sloppy, standardized (widely deployed in a more or less interoperable way, irrespective of formal status), and scalable. I don’t think Atom or RDF meet these criteria. Atom’s main value over RSS is supposed to be its FORMAL standardization, but apparently nobody really cares. (Tim Bray’s “Mr. Safe” has not appeared, but RSS interop and even extensibility are happening, making it boss-friendly in practice.) RDF is not simple for ordinary mortals, and its scalability is unproven. (I have been informed that actual RDF systems handle sloppiness well, even though one would think that its basis in formal logic would make it brittle … I don’t know how to evaluate that.) Or maybe these could just as well have become the “HTML for data” but didn’t, through the sheer perversity of timing or luck or personality … who knows. The apparent fact is that RSS has the momentum and mindshare, and the various power laws (Metcalfe’s, Pareto’s, … even Gresham’s!) predict it will dominate.
The way it (or Atom for that matter) would work as HTML for Data is by providing a framework in which information that is common to most data formats (an identity, title, description, timestamp, owner …) just works, and within which specific communities or domains can agree on conventions for marking up the content to be interoperable. For example, I have heard that Froogle is driving the evolution of markup for online stores toward conventions that it understands. If you build it and Froogle can find it, presumably the customers will come. If you wait for a “real” standard or try to do the Right Thing but publish your data in a format that consumers can’t consume, you’ll presumably go out of business. Given the intense competition among Yahoo, Google, and MSN, even slight movements toward domain standardization seem likely to be amplified, much like the way HTML markup evolved in the early ’90s. Again, this will be the quick-and-dirty kind of standardization that we may all regret in a few years, but if Bosworth is right, it is very likely to happen.
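To make the “common fields plus domain conventions” idea a little more concrete, here is a small Python sketch. It assumes a hypothetical `shop` namespace and a made-up catalog item; the core element names (`guid`, `title`, `description`, `pubDate`, `author`) are RSS 2.0 item elements, but the mapping onto identity, timestamp, and owner is just one plausible reading.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical domain namespace that a community of stores might agree on.
SHOP = "http://example.org/shop#"

# A made-up item: core RSS 2.0 elements plus one domain-specific element.
ITEM = f"""
<item xmlns:shop="{SHOP}">
  <guid>http://example.org/catalog/42</guid>
  <title>Dog basket</title>
  <description>A wicker basket for a medium-sized dog.</description>
  <pubDate>Mon, 06 Sep 2004 10:00:00 GMT</pubDate>
  <author>owner@example.org (Example Owner)</author>
  <shop:price currency="USD">19.99</shop:price>
</item>
"""

# Elements every generic consumer is assumed to understand.
COMMON = ("guid", "title", "description", "pubDate", "author")


def split_item(xml_text):
    """Separate the common fields from whatever the domain has added."""
    item = ET.fromstring(xml_text)
    common, domain = {}, {}
    for child in item:
        text = (child.text or "").strip()
        if child.tag in COMMON:
            common[child.tag] = text
        else:
            # Namespaced tags arrive as "{namespace-uri}localname".
            domain[child.tag] = text
    if "pubDate" in common:
        # RSS 2.0 dates follow the RFC 822 style, which email.utils parses.
        common["pubDate"] = parsedate_to_datetime(common["pubDate"])
    return common, domain


common, domain = split_item(ITEM)
print(common["title"], common["pubDate"].isoformat())
print(domain)
```

A generic aggregator can stop at `common`; a shopping crawler that has agreed on the `shop` conventions would go on to interpret `domain`.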
I would have agreed with Danny about RSS being only for blogs rather than data, but the way interoperable RSS enclosures for sound files (“podcasting”) have come out of thin air in less than a year suggests that it does provide that simple and sloppy framework that evolves and survives.
I would be perfectly happy to be wrong about this … this looks like an interesting real-world experiment to test the Worse Is Better hypothesis. I wish it weren’t true, but I’m not going to bet against it this time.
Pingback: Bill de hÓra