[Updated] Over in my aviation weblog, I find myself more and more linking to Wikipedia whenever I’m discussing a concept, person, place, or anything else that doesn’t have its own, canonical home page. If, as I suspect, lots of other bloggers are doing the same, then links to Wikipedia articles may soon be the blogsphere’s answer to subject codes.
News wire services like Reuters or Dow Jones put a lot of time and money into maintaining long lists of subject codes to attach to their news products. Unlike the simple categories used in blogs, subject codes tell you not just that an article is about (say) computer technology, but that it is about specific companies, industries, people, places, and concepts. News customers use the codes to classify stories automatically, routing them to the appropriate editorial sections, displaying them on trading screens, sorting them into categories on web sites, or using them to improve searches. The providers are constantly sending out updated lists, keeping their customers’ technical departments very busy.
Should weblogs be using some kind of subject code (beyond categories)? Some areas already have standard identifiers that we could use, such as ICAO codes for airports, UPCs for retail products, ISBNs for books, CUSIPs for financial instruments, or ISO codes for countries, languages, and currencies. However, each of those requires some surrounding context: you need not only the code, but some indication that it refers to a currency or an airport. They’re also managed by central authorities, making them less attractive to the weblog community.
Enter Wikipedia. If I’m posting about Washington the U.S. state, I can link to the Wikipedia article about the state; if I’m posting about Washington the U.S. president, I can link to the article about the president; if I’m posting about Washington the U.S. capital, I can link to the article about the city; and if I’m using the word Washington by metonymy to refer to the U.S. government, I can link to the article about the government.
Bingo: subject codes, just like the big newswires use, only a lot more useful and totally open. I can link to abstract subjects like love or communism or to time periods like the Middle Ages just as easily as I can link to concrete people, places, or things; if there’s not already a Wikipedia article on my subject, I can always start a stub. If people keep linking to Wikipedia, search engines like Technorati and aggregators like Bloglines might start taking advantage of those links to do some automatic categorization, right down to offering links to other postings on the same subject (“Click here for other postings about Open Source”). Once people know the search engines are doing that, they’ll be bound to link to Wikipedia even more than they already are, creating a virtuous circle where both Wikipedia and the blogsphere become more valuable.
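To make the categorization idea concrete, here is a rough sketch of what an aggregator might do with those links. It is only an illustration, not how Technorati or Bloglines actually work: the function names, the sample postings, and the example.org URLs are all made up, and it assumes Wikipedia article links look like https://en.wikipedia.org/wiki/Title. It also treats the article title alone as the subject code, ignoring which language edition the link points to.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class WikipediaLinkParser(HTMLParser):
    """Collects the Wikipedia article titles linked from one posting."""

    def __init__(self):
        super().__init__()
        self.subjects = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parts = urlparse(href)
        host = (parts.hostname or "").lower()
        # Assumption: article URLs have the form <lang>.wikipedia.org/wiki/<Title>.
        if host.endswith(".wikipedia.org") and parts.path.startswith("/wiki/"):
            title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
            if title:
                self.subjects.add(title)


def subjects_of(post_html):
    """Return the set of Wikipedia subjects a posting links to."""
    parser = WikipediaLinkParser()
    parser.feed(post_html)
    parser.close()
    return parser.subjects


def index_by_subject(posts):
    """Map each subject to the set of posting URLs that link to it."""
    index = defaultdict(set)
    for post_url, post_html in posts.items():
        for subject in subjects_of(post_html):
            index[subject].add(post_url)
    return index


# Hypothetical postings, keyed by their own (made-up) URLs.
sample_posts = {
    "http://example.org/flying-west": (
        '<p>Flying into <a href="https://en.wikipedia.org/wiki/'
        'Washington_(state)">Washington</a> last week.</p>'
    ),
    "http://example.org/road-trip": (
        '<p>From <a href="https://en.wikipedia.org/wiki/Washington,_D.C.">'
        'Washington</a> to <a href="https://en.wikipedia.org/wiki/'
        'Washington_(state)">Washington</a>.</p>'
    ),
}
sample_index = index_by_subject(sample_posts)
```

Looking up "Washington (state)" in sample_index gives both postings, which is all a “Click here for other postings about…” link would need. Note that both postings use the same link text, Washington, and only the Wikipedia URL tells the state and the city apart.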
Of course, like anything that people actually do on the web (as opposed to drawing-board architectures that never get implemented), this approach is far from perfect. Once the search engines are paying attention to Wikipedia links, some people will deliberately include misleading links to have their weblog entries miscategorized, though rankings like Technorati’s should help make sure that the most relevant ones stay near the top of the list. Furthermore, Wikipedia URLs do change, especially for the sake of disambiguation, so the Wikipedia URLs will never be 100% accurate as subject codes. And finally, the Wikipedia project itself could shut down, leaving all of the subject codes orphaned. Still, since linking to Wikipedia is something many of us do anyway, it looks like a good, quick-and-dirty webby alternative to the news industry’s subject codes; it might even work better.
Update: James Tauber posted the same idea with slightly different language back in October, and has just put up a follow-up.
Couldn’t agree more. See Wikipedia URIs, which is a follow-up (inspired by your post) to an earlier post, Wikipedia as a URI Lookup Service.
Nice to see you blogging, David!
The problem for me with using Wikipedia URIs is the tie to a single topic scheme (albeit a good one). Ideally it would be better to enable the use of any, possibly in combination. The most promising work I’ve seen on this recently is the FOAF Output plugin, which (in part) auto-generates a personal ontology based on WordPress categories, expressed using SKOS, and inserts statements in the RSS feed for each item referring to the terms in the ontology. This is really more of a formalisation of the del.icio.us/Flickr/Technorati kind of informal tagging system, associating the words with the user’s own scheme. But this approach would be entirely compatible with the kind of referencing you describe, and the user could potentially make cross-references with other schemes such as Wikipedia and WordNet. Ok, so your suggestion would avoid the need for any RDF, and would I’m sure make a good quick-n-dirty approach. But it would be nice to use it in a way that would leave the door open for any RDF-capable clients/services to go further.
In a proposal for the DHS UICDS that didn’t make it, we proposed using Wikipedia and Wikipedia analogs to enable public safety agencies to begin creating subject matter expertise online and in community. That is a heckuva lot simpler approach to the problem of aggregating expertise bottom-up than waiting for subject codes and authorized ontologies to come out of the Beltway. People wait for funding, or wait for decisions, or wait for Jesus to come back when all they have to do is form a community, open an online editor, and just start adding content and services.
The web as a ‘world of ends’ really escapes too many industries looking for a way to create a product niche when what they really should be doing is creating a content niche and offering a service. It’s just too simple to get the vendor’s attention, so it is usually a customer who gets it done.
Good to see your blog, David. As always, I found my way here via Tim Bray.
I can see why using Wikipedia URLs is attractive, but I think the real way forward is to allow categorisation in hierarchies, rather than the flat (and very impermanent) Wikipedia namespace. I have a longer discussion of the issue in the context of scientific data management in my blog.