Burden of Proof

Bill de hÓra has a thoughtful piece about complexity experts and simplicity mavens. As Bill points out, conventional wisdom provides an easy way to criticize things without really thinking: in the 1990s, you could diss any project or spec simply by claiming that “it won’t scale”; this decade, you can diss any project or spec simply by claiming that “it’s too complex”.

I think that there’s more to the change in conventional wisdom than mindless pot-shots, though — it’s really a fundamental change in the burden of proof (in the popular sense of the obligation to defend a position). With the rise of agile development and worse is better, we’ve shifted from “required until proven unnecessary” 10 years ago to “unnecessary until proven required” today, and I think that’s a change for the better — after all, if we’re uncertain either way, we might as well pick the option that requires less time and money. So when people say “the project is too complex,” what they’re really saying is “prove to me that these features are justified.”

Burden of proof matters a lot in life: just ask someone like Schapelle Corby, who has been charged with a crime in a country whose legal system does not support the presumption of innocence.

Admin: WP 1.5.1 and Spam Karma

I’ve just upgraded Quoderat to WordPress 1.5.1 and have installed the Spam Karma plugin, as recommended by Lauren Wood in a comment to my previous whining posting about comment spam. With luck, legitimate comments should now all appear immediately, while links to Online Poker should silently disappear. I’ll keep my fingers crossed.

I’ve already noticed that the default WP 1.5.1 theme is too narrow for source-code/XML listings, so the formatting on some of my older postings is messed up. I’ll work on that later. In the meantime, please let me know (somehow) if Spam Karma prevents you from leaving comments.

Problem-first design

[Update: a nice analogy from Jon Udell, and a thoughtful response from Tim Bray.] WSDL is too abstract to be useful, according to both Tim Bray and Norm Walsh (and for what it’s worth, I agree with them). Tim and Norm both propose much simpler alternatives — hardly surprising from two brilliant guys who helped design so much of what actually works in the XML world — but I think they might be going about things the wrong way this time.

How many developers do you know who complain about working nights and weekends manually entering connection information for thousands of publicly available web services? Given that there are, at most, a few dozen sites offering web services over the public web (and that’s web services in the most general sense, including REST as well as SOAP), I’ll guess that the answer is “zero”.

So here’s my suggestion: let’s hold off on designing new specifications until there’s a real problem to solve. If online services continue to grow, some day my hypothetical overworked developers will emerge. When we find them, we can go and ask them what they need to make their lives easier, and then write a specification that does the simplest thing that can possibly work to solve their problem, and no more. Perhaps they’ll need all the abstraction of WSDL (though I’m dubious); perhaps they’ll need something with a message-passing focus like Tim’s SMEX-D, or something with an RPC-focus like Norm’s NSDL; perhaps they’ll need all of those, or something that none of us has thought of yet.

Update 1: Jon Udell has an excellent footpath analogy that was passed on to him by Larry Wall. It seems to work perfectly for specification writing.

Update 2: Tim Bray has updated his posting with a well-thought-out defense of WSDL’s role (though not WSDL itself).

More on RSS as the HTML for data …

A short while ago, I reluctantly acknowledged that RSS 2.0 will likely fill the same role for data that HTML fills for documents, providing a single, shared format across the web (the big missing piece of the puzzle for REST apps). Now, it appears that someone a lot smarter than I am — none other than Adam Bosworth — is suggesting exactly the same thing.

If I’m wrong about RSS, at least I’ll be wrong in excellent company.

Collateral Damage

I am a bystander in the war between spammers and virus writers on the one side, and Microsoft and the antivirus companies on the other. I have never in my life read or sent an e-mail message using Microsoft Outlook, I spend perhaps 4 hours/year using the Windows operating system (mostly helping other people with computer problems), and I never read e-mail or browse the web as root, so I should live in a fairly safe area, far from the battlefield. Nevertheless, I lost e-mail services for my whole domain this morning because of Outlook viruses on other people’s systems, and it will take at least a few hours before I can receive e-mail here again.

In fact, I’ve been hit by a lot of collateral damage over the years. I had to shut down my old e-mail account at this domain, david, when the volume of messages passed 1,000 per hour; even now, the megginson.com domain can receive over 30,000 messages a day — it’s a day-to-day challenge to keep the domain working at all, involving frequent changes of ISP.

What happened? Because my old e-mail address was well known, it ended up in a lot of people’s Outlook address books; then, predictably, some of those systems got infected, so their Outlook installations started sending out virus messages with my return address forged, and those messages infected more systems, which started sending out more, and so on. Those didn’t affect me directly (aside from writing the occasional polite reply to an irate message asking why I was mailing viruses), but then the warnings from the antivirus software at other people’s sites started pouring in. The antivirus makers know perfectly well that the return addresses on virus attacks are nearly always forged, but they still cannot resist the marketing opportunity of warning me that my non-existent Outlook installation is infected with a virus.

I don’t know how many more direct hits I’ll be able to withstand at megginson.com. I’ll never know, of course, how much business I’ve lost over the past couple of years because of these e-mail problems, and sometimes I’m tempted just to abandon the domain, or at least abandon any attempt to use it for e-mail.

If there’s a moral to this, it’s that sloppy design hurts more people than the immediate users — simply choosing not to use bad software does not protect you from its flaws. Security holes in Outlook hurt me, though I’ve never used the program; virus-warning spam from antivirus software makers repeatedly shut me down, though I’ve never bought their products. If we mess up too badly designing our next generation of XML-based systems (blogs, REST, Web Services, or what-have-you), it’s hard to predict how many people we’ll hurt beyond our immediate user base.

Gmail without AJAX, part 1

I noticed today that Gmail is now offering an alternative, non-AJAX interface, selectable by choosing “basic HTML” below the message listing. This is a great opportunity to experiment and see whether AJAX (or any other kind of heavy DHTML-style interaction) actually makes enough of a difference to justify the extra implementation work.

I’ll do all my Gmail browsing using old-style HTML forms until next week, observe how much I miss the extra features, and then report back here.

(First note: Gmail does not allow you to change account settings using the non-AJAX interface.)

Self-classification on the web

Coordinator: Crucifixion?
Prisoner: Er, no, freedom actually.
Coordinator: What?
Prisoner: Yeah, they said I hadn’t done anything and I could go and live on an island somewhere.
Coordinator: Oh I say, that’s very nice. Well, off you go then.
Prisoner: No, I’m just pulling your leg, it’s crucifixion really.
Coordinator: [laughing] Oh yes, very good. Well…
Prisoner: Yes I know, out of the door, one cross each, line on the left.

From Monty Python’s Life of Brian (1979)

And now, for the pure joy of killing the joke by trying to explain it, this scene from Life of Brian is funny for two reasons:

  1. the Romans allow the prisoners to classify themselves as condemned-to-death-by-crucifixion or free-to-go, even though the prisoners have every incentive to lie to save their own lives and no incentive to tell the truth; but
  2. the prisoners all classify themselves correctly anyway.

A lesser wit (like me) would have stopped at the first part of the joke and let all of the prisoners run off; however improbable the first part, it’s always the second part that gets the laugh.

Tim Bray is still wondering about tags, but what he’s really wondering about, I think, is the whole idea of self-classification on the web. Should we be as trusting as the Roman coordinator? Will web content creators classify themselves honestly? So far, the record has not been good — for example, web search engines quickly learned to ignore Dublin-Core-style information in the HTML meta element because, unlike the prisoners in Life of Brian, doomed by their own honesty, people who create content for the web lie. In fact, they lie a lot.

At this point, folksonomy tags are a bit of a cottage industry, so the incentive for lying is low (people are happy to tell the truth when it doesn’t cost them much). Self-classification can work when the costs of lying are unacceptably high and the benefits of lying are low or non-existent — for example, a departmental web site inside a government or large company, a member of a supply chain, or a major vendor with a reputation to protect would lose much and gain nothing by using deceptive metadata to pull in more traffic. That does not apply to the web as a whole, though. Once you move beyond established relationships (enterprise or inter-enterprise), trust is much more difficult to manage.

What will happen when tags become more popular? Will the current model be sustainable? Is there any future for using any kind of metadata to self-classify on the web? The answer probably has something to do with reputation management, though people are doing a good job gaming even that with link farms and comment-/wiki-spam. The crucifixion line looks rather empty right now.

POST in REST: create, update, or action?

(Personally, I think it would be healthier if we worked out the wrinkles in REST by writing lots of code rather than writing lots of blog entries, but blog entries are easier, and XML people have never been shy about pontificating, so here goes …)

Joe Gregorio has an excellent article on REST at XML.com, and I recommend that anyone interested in building an XML data app with REST rather than RPC (Web Services, etc.) take a look at it. However, one point in the article jumped out at me. Joe mapped CRUD to the standard HTTP verbs like this:

CRUD       HTTP
Create     POST
Retrieve   GET
Update     PUT
Delete     DELETE

Personally, I’ve always seen the mapping more like this:

CRUD       HTTP
Create     PUT
Retrieve   GET
Update     PUT
Delete     DELETE

In other words, PUT does double duty for both Create and Update. Of course, the sad point is that almost no actual REST applications work this way — most of them are read-only, so GET is the only verb that counts, and when they do allow updating information, they do not do it by sending an entire resource representation (i.e. XML file) via PUT.
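
To make the mapping concrete, here’s a minimal sketch in Python of PUT doing double duty (the host, path, and payload are hypothetical, and real code would need error handling):

import http.client

def put_resource(host, path, xml_body):
    # Create or update a resource by PUTting its full representation.
    conn = http.client.HTTPConnection(host)
    conn.request("PUT", path, body=xml_body.encode("utf-8"),
                 headers={"Content-Type": "application/xml"})
    response = conn.getresponse()
    conn.close()
    # 201 Created if the resource was new; 200 or 204 if an existing
    # resource was replaced. One verb, two CRUD operations.
    return response.status

# The same call creates lassie.xml if it's absent, updates it if present:
put_resource("www.example.org", "/pets/lassie.xml",
             "<pet-record>...</pet-record>")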

What about POST?

I think there are actually two roles that POST can play in a data application:

  1. Partial, in-place updates (i.e. send just pieces of changed information, rather than a whole resource).
  2. Actions (e.g. buy a book).

For example, consider this resource representation (fancy REST talk for “file”) using POX:

<pet-record xml:base="http://www.example.org/pets/lassie.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
 <name>Lassie</name>
 <gender>f</gender>
 <dame xlink:href="spot.xml">Spot</dame>
 <sire xlink:href="jenny.xml">Jenny</sire>
 <offspring xlink:href="marmaduke.xml">Marmaduke</offspring>
 <offspring xlink:href="snoopy.xml">Snoopy</offspring>
</pet-record>

Now, let’s say that Lassie has another puppy. Using REST, there are two obvious ways to update this information. The first is to download the XML file (using HTTP GET), add an extra offspring element, then upload the modified file (using HTTP PUT):

<pet-record xml:base="http://www.example.org/pets/lassie.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
 <name>Lassie</name>
 <gender>f</gender>
 <dame xlink:href="spot.xml">Spot</dame>
 <sire xlink:href="jenny.xml">Jenny</sire>
 <offspring xlink:href="marmaduke.xml">Marmaduke</offspring>
 <offspring xlink:href="snoopy.xml">Snoopy</offspring>
 <offspring xlink:href="clifford.xml">Clifford</offspring>
</pet-record>
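
Here’s what that read-modify-write cycle might look like as a rough Python sketch using ElementTree (assuming the server actually accepts PUT, and skipping error handling):

import http.client
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HOST, PATH = "www.example.org", "/pets/lassie.xml"

# Step 1: GET the current representation.
conn = http.client.HTTPConnection(HOST)
conn.request("GET", PATH)
record = ET.fromstring(conn.getresponse().read())
conn.close()

# Step 2: append the new offspring element locally.
puppy = ET.SubElement(record, "offspring",
                      {"{%s}href" % XLINK: "clifford.xml"})
puppy.text = "Clifford"

# Step 3: PUT the whole modified representation back.
ET.register_namespace("xlink", XLINK)
conn = http.client.HTTPConnection(HOST)
conn.request("PUT", PATH, body=ET.tostring(record),
             headers={"Content-Type": "application/xml"})
print(conn.getresponse().status)
conn.close()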

The second option is simply to send the REST server a message telling it to add an extra offspring, by using HTTP POST to a URL like http://www.example.org/pets/updates/add-offspring, with parameters identifying the sire or dame and the offspring. I don’t know that either approach is clearly better; obviously, the POST approach would make more sense for very large resources.

The other use of POST would be to execute actions that do not have an obvious correspondence with resource representations/files. A good example would be posting to a URL like http://www.example.org/pets/actions/buy with parameters describing the pet you want to buy (i.e. the URL of the pet’s XML file — we’re RESTful, after all) and the price you are willing to pay.
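
Here’s a rough sketch of both POST styles in Python (the add-offspring and buy URLs and the parameter names are my illustrative inventions, not part of any spec):

import http.client
import urllib.parse

def post_form(host, path, fields):
    # POST URL-encoded parameters and return the response status.
    conn = http.client.HTTPConnection(host)
    conn.request("POST", path, body=urllib.parse.urlencode(fields),
                 headers={"Content-Type": "application/x-www-form-urlencoded"})
    status = conn.getresponse().status
    conn.close()
    return status

# Partial, in-place update: tell the server about the new puppy.
post_form("www.example.org", "/pets/updates/add-offspring",
          {"dame": "http://www.example.org/pets/lassie.xml",
           "offspring-name": "Clifford"})

# Action with no obvious resource representation: buy a pet.
post_form("www.example.org", "/pets/actions/buy",
          {"pet": "http://www.example.org/pets/lassie.xml",
           "price": "500.00"})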

The one use I don’t see for POST is uploading entire XML files, except as a way to work around firewalls that block PUT. Maybe we should fix the firewalls, or maybe we’ll just have to learn to live with this (ab)use of POST as a practical necessity for making REST work with the current web infrastructure.
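
One common workaround (a convention some servers support, not a standard, so treat this sketch as hypothetical) is to tunnel PUT through POST with an override header:

import http.client

conn = http.client.HTTPConnection("www.example.org")
# The firewall sees an ordinary POST; a cooperating server treats it as PUT.
conn.request("POST", "/pets/lassie.xml",
             body=b"<pet-record>...</pet-record>",
             headers={"Content-Type": "application/xml",
                      "X-HTTP-Method-Override": "PUT"})
print(conn.getresponse().status)
conn.close()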

REST: is RSS the HTML for data?

As I’ve mentioned before, REST offloads complexity from the protocol (HTTP) to the content (XML). That makes REST look simple as long as you focus only on the protocol, but RESTafarians cannot get away forever with leaving the content format for data unspecified.

REST works with the existing document web because we have HTML to hold everything together — in other words, we have a standard protocol and a standard format. What’s the equivalent of HTML for the RESTful data web? RDF? XML Topic Maps? POX (Plain Old XML) with XLink? Nope — love it or hate it, I get the impression that it’s going to be RSS 2.0. People are starting to push the boundaries of RSS in serious ways, and so far, it’s not breaking. I have trouble imagining how we’re going to use RSS to encode information (say, a data record) rather than just pointing to information, but I’m ready to be surprised.

On the topic of RSS, I noticed that Open Search has introduced some RSS 2.0 extension properties (confusingly labelled the OpenSearch RSS 1.0 Specification) to handle result paging, which was at the centre of another of my REST design questions. The spec is admirably minimalist, introducing only three new child elements of channel: openSearch:totalResults, openSearch:startIndex, and openSearch:itemsPerPage. That way, a RESTful web app can return (say) results 65-97 of 200,000 in a reasonably portable way:

<rss version="2.0" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
 <channel>
  <title>Example.org search: REST</title>
  <link>http://www.example.org/search?q=REST&amp;start=65</link>
  <description>Search results for REST.</description>
  <openSearch:totalResults>200000</openSearch:totalResults>
  <openSearch:startIndex>65</openSearch:startIndex>
  <openSearch:itemsPerPage>33</openSearch:itemsPerPage>
  ...
 </channel>
</rss>

This is exactly the way people are supposed to use Namespaces (nicely done!), and I’m impressed that they require including the GET URL that can reproduce the search results. It would be even better, I think, if A9 added just two more elements to their RSS extensions:

  <openSearch:previousLink>http://www.example.org/search?q=REST&amp;start=32</openSearch:previousLink>
  <openSearch:nextLink>http://www.example.org/search?q=REST&amp;start=98</openSearch:nextLink>

That way, I would be able to page through the results without having to know how to construct query GET URLs for that particular site.
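
A hypothetical sketch of that generic paging in Python (remember, nextLink is my proposed element, not part of the A9 spec):

import urllib.request
import xml.etree.ElementTree as ET

OS = "{http://a9.com/-/spec/opensearchrss/1.0/}"

def fetch_all_items(url):
    # Walk every result page by following the (hypothetical) nextLink.
    while url:
        root = ET.parse(urllib.request.urlopen(url)).getroot()
        channel = root.find("channel")
        for item in channel.findall("item"):
            yield item
        # No knowledge of the site's query syntax needed; just follow the link.
        next_link = channel.find(OS + "nextLink")
        url = next_link.text if next_link is not None else None

for item in fetch_all_items("http://www.example.org/search?q=REST"):
    print(item.findtext("title"))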

I like RSS for syndication, but it wasn’t exactly what I had in mind for general data handling (I would at least have liked a common attribute identifying URLs, like xlink:href); then again, HTML wasn’t exactly what I had in mind for Hypertext in 1990 either, and it took me two years to stop being sniffy and start working with it. I won’t wait that long this time.

Admin: Comment and Pingback Limits

I’ve been spending a lot of time deleting comment and pingback spam from my two blogs (most of it from the moderation queue). My first impulse was to ban comments and pingbacks completely — after all, some blogs seem to do fine without them, and most people technically-oriented enough to read Quoderat already have their own blogs that they can use to comment on mine.

After some thought, however, I’ve decided on a compromise — I’m going to leave postings from the current and previous month open for comments, but close any older ones. That should eliminate a lot of the spam, but still allow discussion on recent postings. I might tighten that up a bit more, but I’ll give it a chance first.

How is everyone else dealing with comment/pingback/trackback spam? My blog isn’t all that popular — it must be much worse for blogs with high rankings.
