Self-classification on the web

Coordinator: Crucifixion?
Prisoner: Er, no, freedom actually.
Coordinator: What?
Prisoner: Yeah, they said I hadn’t done anything and I could go and live on an island somewhere.
Coordinator: Oh I say, that’s very nice. Well, off you go then.
Prisoner: No, I’m just pulling your leg, it’s crucifixion really.
Coordinator: [laughing] Oh yes, very good. Well…
Prisoner: Yes I know, out of the door, one cross each, line on the left.

From Monty Python, Life of Brian (1979)

And now, for the pure joy of killing the joke by trying to explain it, this scene from Life of Brian is funny for two reasons:

  1. the Romans allow the prisoners to classify themselves as condemned-to-death-by-crucifixion or free-to-go, even though the prisoners have every incentive to lie and save their own lives and no incentive to tell the truth; but
  2. the prisoners all classify themselves correctly anyway.

A lesser wit (like me) would have stopped at the first part of the joke and let all of the prisoners run off. However improbable the first part, though, it’s always the second part that gets the laugh.

Tim Bray is still wondering about tags, but what he’s really wondering about, I think, is the whole idea of self-classification on the web. Should we be as trusting as the Roman coordinator? Will web content creators classify themselves honestly? So far, the record has not been good — for example, web search engines quickly learned to ignore Dublin-Core-style information in the HTML meta element because, unlike the prisoners in Life of Brian, doomed by their own honesty, people who create content for the web lie. In fact, they lie a lot.
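To make the meta-element problem concrete, here is a minimal sketch in Python of what a naive indexer sees when it trusts a page's self-description. The page, the `MetaCollector` class, and the `self_declared_metadata` function are all made up for illustration; the `DC.title` and `DC.subject` names follow the usual Dublin Core naming convention for HTML meta elements. This is not a description of how any real search engine works.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def self_declared_metadata(html):
    """Return whatever the page claims about itself, with no verification."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


# Hypothetical page: the metadata describes film reviews,
# while the body sells something else entirely.
PAGE = """<html><head>
<title>Cheap pills</title>
<meta name="DC.title" content="Monty Python film reviews">
<meta name="DC.subject" content="comedy; film; Life of Brian">
</head><body>Buy cheap pills here.</body></html>"""

print(self_declared_metadata(PAGE))
```

Nothing in the extraction step can tell an honest description from a dishonest one: the indexer only ever receives the author's own claim, which is the position the Roman coordinator is in.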

At this point, folksonomy tags are a bit of a cottage industry, so the incentive for lying is low (people are happy to tell the truth when it doesn’t cost them much). Self-classification can work when the costs of lying are unacceptably high and the benefits of lying are low or nonexistent. For example, a departmental web site inside a government or large company, a member of a supply chain, or a major vendor with a reputation to protect would lose much and gain nothing by using deceptive metadata to pull in more traffic. That does not apply to the web as a whole, though. Once you move beyond established relationships (enterprise or inter-enterprise), trust is much more difficult to manage.

What will happen when tags become more popular? Will the current model be sustainable? Is there any future for using any kind of metadata to self-classify on the web? The answer probably has something to do with reputation management, though people are doing a good job gaming even that with link farms and comment-/wiki-spam. The crucifixion line looks rather empty right now.

5 Responses to Self-classification on the web

  1. Norman Walsh says:

    I heard Tim asking a different question. Even assuming everybody tells the truth, are tags useful? Do they actually help readers find the information they’re interested in more accurately or more quickly than your favorite search engine?

    I don’t know. I can’t get Technorati to recognize the tags in my posts and getting feedback from their support alias is more of a struggle than I have patience for more than about once a month.

  2. Nicely said. A related way of putting it, although perhaps not quite so pessimistic in its intent, is Don Turnbull’s bon mot that “the inmates are tagging the asylum”.

  3. Bob DuCharme says:

    As a follow-on to what Norm wrote: we have to consider the possibility that as systems like Technorati get more efficient in the handling and use of this metadata, there will be more incentives to use the metadata system, and therefore more incentives to lie. Formerly honest adders of metadata won’t necessarily start lying, but of the new people looking to take advantage of the increasingly useful system, we can expect to see more and more liars. (I won’t be one, though, I swear… yeah, that’s the ticket…)

  4. In reference to Prentiss’s comment, my impression was that Don was talking about letting people invent classification schemes (i.e. making up tags instead of using a pre-written scheme). My point is that any classification scheme, from folksonomies to Dublin Core to Library of Congress, runs into trouble when people can self-classify and the benefits of lying outweigh the potential costs. That’s why search engines are likely to have to continue to ignore metadata of any kind to return useful results, at least for the more popular queries.

  5. I think we will end up with tagging bubbles in a sea of spam: tags will remain useful in one’s personal closed space, even if they won’t be on the open web.
