Tech Fashions: What's in a name?

Dare Obasanjo complains that new names like SOA, AJAX, and REST have more to do with fashion than software. He’s right, but his posting might be missing the point.

There are two reasons that a fuzzy, general approach to things (as opposed to a concrete standard or application) gets a name:

  1. the approach represents something a lot of people are actually doing; or
  2. the approach represents something someone wants a lot of people to do.

The second reason has no real value except hype, and Dare and I would probably agree in condemning it as vacuous marketing drivel. The first reason, though, has some real value: it represents the moment of self-recognition for a community of practice, a group of people who discover that they’re thinking and working in the same way. No doubt, many woodworkers made fine furniture when they were all still called carpenters; however, some of them must once have realized that they were spending most of their time doing fine, detailed work rather than laying studs and joists, and that the skills and even tools they were using were different — calling those people joiners or cabinet makers gives us a way to recognize that community of practice and the distinctive approaches and skills that set it apart from other woodworkers. (Of course, the distinction almost certainly originated in a language other than modern English, but I don’t feel like researching the original terms right now.)

So it’s true, as I mentioned in an earlier posting, that AJAX is nothing new — in fact, leaving aside XMLHttpRequest altogether, AJAX is nothing more than old-fashioned client-server under a new guise, just as cabinet making is nothing more than fussy carpentry. It’s also true that people were doing precisely what is now called AJAX long before the name was invented. But both statements are missing the point: the name AJAX is catching on because it represents a problem-solving approach that a lot of people now like and either use or want to start using; it represents the emergence of a community, not a standard or a chunk of code. Ditto for REST (though in this case Dare is complaining about the popular usage, plain XML over HTTP, rather than the original meaning, the principle that every resource must be directly addressable). People posted online diaries and columns before we called them blogs; people built hypertext systems on the Internet before we called one the Web; people were using event-based parsing APIs long before they were called SAX-like. In each case, though, the arrival of a widely-accepted name helps us to pinpoint the moment when a social change — a fashion, to use Dare’s own word — emerged in the technological community. These fashions are far more important than new standards or new code, at least when they represent genuine changes in people’s thinking.

So what about SOA? Does it represent another moment of self-realization among a group of people doing the same kind of thing, or is it an attempt to steer people the way someone wants them to go, a bit of vain marketing buzz? I’m still trying to decide.

Tagged | Comments Off on Tech Fashions: What's in a name?

AJAX as a privacy solution

There’s a lot of noise about AJAX recently, ranging from positive to negative to what’s the big deal?

It’s true that architecturally, AJAX is nothing new — basically, it’s just the old, pre-Web client-server model wrapped up in the browser using Javascript and XML. It’s also true that people were doing this kind of thing with Java applets or DHTML back in the late 1990s, avoiding the need to install custom client software on every workstation. So what’s the big deal? Think back to the late 1990s — these applications were horribly unstable. First, they were rarely cross-platform, or even cross-version — you had to (say) be running exactly the right version of MSIE under Windows with the right DLLs, or exactly the right version of Netscape and Java, even to start up the apps, and then they generally crashed before too long anyway. Web developers are excited about AJAX now because applications like GMail are actually working on just about everyone’s computer (*nix/Windows/MacOS, MSIE/Firefox/Opera/Safari), and they almost never crash. New ideas aren’t worth much on the web; it’s stable, running, cross-platform implementations that count. We’ve never had good, stable, platform-independent client-server before, period.

Moving past the specific technologies, though, what are the advantages of abandoning our traditional thin-client web model and going back to client-server? One of the most interesting will be the ability to do information aggregation while preserving privacy. Imagine, for example, that I’d like to see a single, consolidated view of all my finances — my stocks, bonds, bank accounts, retirement savings, and credit cards. Using a thin-client approach, I have to give some web site, somewhere, the ability to access all of my private financial information for me; using a client-server approach, my browser itself could go out and retrieve the information separately from each institution and then aggregate it right on my screen. I have all the advantages of a single view, without giving up any personal information.
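Here is a rough sketch of that aggregation step. In the browser it would of course be JavaScript and XMLHttpRequest; I’ve sketched it in Python just to show the shape of the idea, and the institution URLs and the <balance> element are invented for illustration:

import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical per-institution endpoints; an AJAX client would fetch
# these with XMLHttpRequest from inside the browser instead.
INSTITUTIONS = {
    "bank":   "https://bank.example.com/accounts.xml",
    "broker": "https://broker.example.com/portfolio.xml",
    "card":   "https://card.example.com/statement.xml",
}

def consolidated_view():
    """Fetch each institution's XML separately and merge it client-side."""
    view = {}
    for name, url in INSTITUTIONS.items():
        with urllib.request.urlopen(url) as response:
            doc = ET.parse(response)
        view[name] = doc.findtext("balance")   # assumed <balance> element
    return view

for name, balance in consolidated_view().items():
    print(name, balance)

The point is simply that each request goes directly from my machine to the institution that holds the data; no aggregator in the middle ever sees all of it at once.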

Privacy is going to be a bigger and bigger deal on the web over the next decade: as technology gets even better at violating it, governments will come under pressure to pass more and more legislation, limiting what corporations are allowed to ask for and do. AJAX in particular, and the client-server model in general, gives us one way to respect privacy without giving up the advantages of information aggregation.

The RESTafarians should be happy as well, since this involves using the browser as an XML+REST client.

Tagged | 2 Comments

Canadian Flag in CSS

Canadian flag in CSS (screenshot).

Via Anne van Kesteren (again), I have found a site with a pure-CSS rendition of the Canadian flag (the image here in my blog is a screenshot, not the live CSS). It’s a little squished, granted, but at least it’s the right way up.

Now, let’s see the XSL-FO version of the Canadian flag: any volunteers?

Tagged | 4 Comments

Attributes and Namespaces

Anne van Kesteren complains that the relationship between XML Namespaces and XML attributes bugs him, and I think that his annoyance might be justified. It’s been many years since we did the 1.0 Namespaces spec in the old XML working group, but as far as I can remember, the thinking that won out (not unanimously) was that attributes were similar to variables in programming languages: unqualified attributes were like automatic variables, scoped to a single element instead of a single function, while namespace-qualified attributes were like global variables:

int foo;                /* global: visible to every function */

void
adjustFoo ()
{
    int bar = 3;        /* automatic: scoped to adjustFoo only */
    foo = foo + bar;
}

The foo variable has a scope outside the adjustFoo function, while the bar variable does not. Similarly, in

<n:info xmlns:n="http://www.example.org/ns/n">
  <n:record n:foo="3" bar="4"/>
</n:info>

the n:foo attribute has a meaning independent of the meaning of the n:record element, while the bar attribute does not.
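If the analogy feels abstract, a namespace-aware parser makes the difference concrete: the qualified attribute keeps its full, global name no matter which element it sits on, while the unqualified one is just a local name. A quick sketch using Python’s ElementTree:

import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<n:info xmlns:n="http://www.example.org/ns/n">
  <n:record n:foo="3" bar="4"/>
</n:info>
""")

record = doc.find("{http://www.example.org/ns/n}record")
print(record.attrib)
# {'{http://www.example.org/ns/n}foo': '3', 'bar': '4'}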

Sense or Nonsense?

Does that make sense to you? If not, don’t feel bad: drawing analogies between markup and programming code is a dubious undertaking at the best of times. Unlike the foo variable, for example, the n:foo attribute doesn’t have a global value space, only a global meaning.

I do think that the idea of globally-defined, namespace-qualified attributes (like rdf:about, xml:id, xlink:href, etc.) is a very useful one. We messed up, though, by adding this extra level of complexity with unqualified attributes (they should have just inherited the parent element’s namespace) — in other words, we didn’t choose the simplest thing that could possibly work. It’s too late to change Namespaces now, though, and any attempts to codify or clarify things since Namespaces 1.0 seem only to increase the confusion.

Posted in General | 4 Comments

Big, public REST application: Seniors Canada Online

[Update: partial contact information at bottom.] Yesterday I found out about a major government XML+HTTP (i.e. REST) web application that has been open to the general public since October 2004 but was never formally announced — I’m posting about it here with permission from the federal department that’s hosting it.

The Seniors Canada Online web site is designed to provide amalgamated information for senior citizens from all levels of government — currently it contains seniors’ information from the Canadian federal government, the provincial and territorial governments, and the city of Brockville, but more municipalities and NGOs will likely be joining in the future. Instead of simply providing an HTML interface for human readers, however, the site’s maintainers decided to make information available via XML as well so that other jurisdictions (such as provinces and cities) could include the same seniors’ information in their web sites. In fact, since it’s wide open, anyone can experiment with using the XML data.

According to the developer, the implementation was trivial — the REST application shares its database and application logic with the HTML web site, so the XML part is just a thin view written on top of all that, running in parallel with the HTML view.

Simple Example

Currently, the REST interface is read-only, and all requests are HTTP GETs, so they are bookmarkable, cacheable, linkable, and all the other good stuff that comes with REST. Here’s a simple example that searches for the word “sports”:

http://www.seniors.gc.ca/servlet/SeniorsXMLSearch?search=sports

The result is an XML-encoded list of URLs and Dublin-Core-style metadata; here’s an example of one item in the result list:

<listing>
<realcount>1</realcount>
<offsetcount>1</offsetcount>
<referenceid>277103</referenceid>
<language>en</language>
<url>http://www.active2010.ca/index.cfm?fa=english.homepage</url>
<dctitle>ACTIVE2010</dctitle>
<priority></priority>
<dcdescription>ACTIVE2010 is a comprehensive strategy to increase participation in sport and physical activity throughout Ontario.</dcdescription>
<dcsource>ACTIVE2010</dcsource>
</listing>

Canada is a bilingual country, however, so you might reasonably expect to be able to make the same query for French-language resources. Give it a try:

http://www.seniors.gc.ca/servlet/SeniorsXMLSearch?search=sports&lang=fr

Nuts and Bolts

I’m not going to describe the XML format here, since anyone who knows XML and Dublin Core will be able to puzzle it out in a few seconds.

Here are some request parameters that work with many of the REST URLs:

lang
“en” (the default) to request English-language results, or “fr” to request French-language results.
geo
An identifier from the coverage metadata table (see below) to restrict results to a specific area.
cat
An identifier from the category metadata table (see below) to restrict results to a specific hierarchical category.

Here are the GET URLs with any local request parameters:

http://www.seniors.gc.ca/servlet/SeniorsXMLDCCoverages
Get a listing of three-level coverage metadata (i.e. geographical locations). Use the request parameter dccoverageid instead of geo to restrict the results to a specific subset.
http://www.seniors.gc.ca/servlet/SeniorsXMLCategories
Get a listing of three-level categorization metadata.
http://www.seniors.gc.ca/servlet/SeniorsXMLKeywords
Get an alphabetical listing of search keywords available. Use the request parameter letter to restrict the results to keywords beginning with a specific letter.
http://www.seniors.gc.ca/servlet/SeniorsXMLSearch
Get XML-encoded search results. Use the request parameter search to specify the search string, searchop to specify the search type (“all”, “or” or “exact”), and recfrom to specify the starting position in the results (defaults to 1).

For example, here is a list of French-language keywords beginning with “L”:

http://www.seniors.gc.ca/servlet/SeniorsXMLKeywords?lang=fr&letter=l

I’m not quite sure how the keywords relate to the search, but I’ll play around a bit and try to find out.
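In the meantime, here is a minimal sketch of a client for the search interface, assuming the unprefixed element names shown in the sample listing above:

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://www.seniors.gc.ca/servlet/SeniorsXMLSearch"

def search(term, lang="en"):
    """Run a search and print the title and URL of each listing."""
    query = urllib.parse.urlencode(
        {"search": term, "searchop": "all", "lang": lang})
    with urllib.request.urlopen(BASE + "?" + query) as response:
        doc = ET.parse(response)
    for listing in doc.iter("listing"):
        print(listing.findtext("dctitle"), "-", listing.findtext("url"))

search("sports")              # English results
search("sports", lang="fr")   # French results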

Update: Contact Information

Since my posting, the government department that maintains this REST application has already started receiving enquiries from others considering the same thing. The site is maintained by Veterans Affairs Canada (VAC) on behalf of the Canadian Seniors Partnership (which involves multiple departments and levels of government). The technical contact for this project at VAC is Ron Broughton.

Tagged | Comments Off on Big, public REST application: Seniors Canada Online

Cascading RSS

The idea of Cascading RSS (or aggregation aggregation) is so obvious that it has probably already been blogged to death or even implemented by well-known web sites; unfortunately, my short attention span ran out before I could think up the right search words for Google, so I’ll pretend that it’s my own, original idea for now. We use RSS or Atom to tell people when a web resource has changed, but that can still involve polling dozens or hundreds of RSS files frequently. With only a few tiny tweaks, we could also use master RSS files to tell us when other RSS files have changed, cutting the polling by (potentially) orders of magnitude.

I can think of a couple of places where this approach could allow RSS into places where it hasn’t been able to go yet:

Information Management

A very enlightened company might realize that RSS gives it an excellent way to manage information from all of its divisions, branches, subsidiaries, partners, and so on. Everyone simply puts data (sales figures, inventory, projects, and so on) on the company intranet as XML data files (presumably with appropriate authentication and authorization requirements) and then uses RSS to announce when new information is available or old information has changed. If division X needs to monitor inventory data from division Y, it polls division Y’s inventory RSS file every 5 minutes to see if there’s anything new.

The problem is that the network will get messy if the company ends up with thousands of RSS files, and everyone is polling everyone else’s every 5 minutes, especially if some of them are on old, slow servers. To simplify things (and speed them up), the company could have one fast server that polls all the RSS files in the company and then produces its own RSS file with the most recent change dates for each one. Now, everyone can poll only that central server, but the divisions still own their own data. Of course, it would be possible to build this up in several cascading layers to avoid one RSS file with 1,000 entries.
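Here is a rough sketch of what the central server’s job amounts to. I’m cheating by using each feed’s HTTP Last-Modified header as the change date, and the feed URLs are invented; a real implementation would more likely parse the dates out of the feeds themselves:

import urllib.request
from xml.sax.saxutils import escape

FEEDS = [
    "http://intranet.example.com/division-x/inventory.rss",
    "http://intranet.example.com/division-y/sales.rss",
]

def last_modified(url):
    """Ask the server when the feed last changed, without downloading it."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Last-Modified", "")

def master_feed(feeds):
    """Build one small RSS file that points at all the other feeds."""
    items = []
    for url in feeds:
        items.append("  <item><link>%s</link><pubDate>%s</pubDate></item>"
                     % (escape(url), escape(last_modified(url))))
    return ("<rss version=\"2.0\"><channel>\n"
            "  <title>Master change feed</title>\n"
            + "\n".join(items)
            + "\n</channel></rss>")

print(master_feed(FEEDS))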

Personal RSS

Sooner or later, we’ll have personal RSS for reporting information like credit card and bank transactions, as Tim Bray predicted almost two years ago. One of the biggest problems here, though, is that people might be reluctant to give personal passwords to online aggregators like Bloglines. People might, however, allow online aggregators to request the last-modified time of their credit card or bank feeds, and they could use these to build cascading RSS files, allowing users to reduce the amount of polling they have to do from home or on the road.

In other words, both advantages have to do with getting RSS into the business world (either B2B or B2C), not with improving the current blogosphere. I’ll look forward to finding out who has thought about this idea in more detail, or even implemented it.

Tagged | 1 Comment

Business requirements: the weakest link?

Nobody, or at least almost nobody, in the software engineering world believes in the waterfall design model any more. Like Santa Claus, waterfall sounded like a great idea (old guy comes down chimney and leaves free stuff in living room; planners work out all issues during design phase so that implementation is easy and predictable), but a little maturity and experience force us to leave both fantasies behind.

So now we mostly agree that software and information system design has to be iterative and agile, and after years … well, decades … of heartbreaking failure, we’re actually starting to be able to deliver projects that work on a reasonable schedule and for reasonable cost.

But how do customers know what to order in the first place? Out of curiosity, I spent some time reading through the UN/CEFACT Modeling Methodology, or UMM, the business-side counterpart to the technological ebXML specifications from OASIS (the two can be used independently). If you don’t feel like reading hundreds of pages of documentation, here’s a quick summary: the UMM is a top-down business-modelling approach using forms and UML models to move from an abstract, executive-level view of a business to a more concrete, information-management view. Once the final, low-level UML model is ready, it can be handed off to technologists for implementation using ebXML, Web Services, or anything else convenient (it tries hard to be technology-agnostic). The decomposition of business modeling goes through four stages, called views:

  1. the business domain view separates the business into areas and processes, from the perspective of senior management
  2. the business requirements view deals with scenarios, inputs, outputs, and so on, from the perspective of an expert in the business domain
  3. the business transaction view is pretty much the same thing, but more concrete and from a more technological perspective (i.e. less what? and more how?)
  4. the business service view deals with services, agents, and other stuff, and gets passed off to the software developer for implementation

In other words, it’s a waterfall. If that approach doesn’t make sense for technology, can it make sense for business management? As far as I can tell, business management and technology both deal with building robust, predictable systems against constantly-changing requirements and unpredictable inputs, so my own hunch is that a top-down approach like the UMM will not serve business itself well, and will make life hard for the technologists feeding out of the bottom of it.

In fact, I think that the problem may go much deeper, because no matter how business requirements themselves are developed, top-down or bottom-up, there is usually a waterfall-style leap between business requirements and technology requirements: businesses are supposed to figure out what they want, and technologists are supposed to figure out how to give it to them. In reality, however, business requirements are often driven by the technology available: few businesses were interested in setting up an online presence, for example, until the web made it cheap and easy to reach customers that way; outsourcing technical support to offshore call centres is economical only because of certain types of technology; opening up an organization’s systems makes sense only if enough potential partners are using something like ebXML or Web Services; and so on.

We’ve gotten a lot better at building systems to spec, so if we’re going to see similar improvements in the future, we’ll have to start looking at the specs themselves, learning to iterate all the way back up to the top, instead of just inside our little technology sandbox — in other words, it’s not just the technical requirements but the business requirements that have to be agile. If that means that the CEO occasionally has to be troubled with nuts-and-bolts details like web protocols or database scalability, so be it — it beats losing the whole company to a bad technology decision.

Posted in General | Comments Off on Business requirements: the weakest link?

REST design question #5: the "C" word (content)

The other posts in this series of REST design questions have danced around the edge of the content problem, dipping their toes in with issues like identification and linking, but now that the design questions are coming to a close, it’s time to dive right into REST’s biggest problem: content.

The principles of REST tell you how to manage resources in a CRUDdy way, but not what you can actually do with those resources. This is not a problem shared by other XML networking approaches: XML-RPC defines precisely what its XML content means, to the point that it can be serialized and deserialized invisibly and automatically; SOAP allows any kind of XML payload in principle (assuming it’s wrapped in a SOAP envelope), but most people use the default SOAP encoding which, again, can be serialized and deserialized somewhat automatically. REST, on the other hand, is pure architecture without any direct mention of content. RESTafarians boast that there are RESTful web applications already online for Amazon, eBay, Flickr, and many others, but developers quickly figure out that they don’t get any benefit: each REST application requires its own separate stovepipe of code support right from the ground up, because they all use different content formats. If these all used XML-RPC or SOAP, there would be many standard libraries to simplify the developers’ work, and a lot of shared code that could work with all these sites.

Is REST, in practical terms, nothing more than a marketing word?

RESTafarians can argue that the lack of content standardization is a good thing, because it leaves the architecture flexible enough to deal with any kind of resource, from an XML file to an image to a video to an HTML page — moving the last two using XML-RPC or SOAP can be less than pleasant. On the other hand, the lack of any kind of standard content format makes it hard actually to do anything useful with RESTful resources once you’ve retrieved them. People have put forward candidates for standard XML-encoded REST content, including RDF and XTM, but it’s unlikely that either of these will take off, especially since RDF (the leader) does not even work nicely with most other XML-based specifications like XQuery or XSLT.

Standardizing XML REST content in bits and pieces

The alternative is to standardize content in bits and pieces — instead of trying to come up with a comprehensive data-encoding format, we can try to come up with a profile of standard markup bits that people can use in any kind of XML data document. Here are some of the possibilities:

xlink:href and xml:id for linking

I’ve already mentioned how the use of the xlink:href attribute will make it possible to design XML data crawlers similar to HTML crawlers, along with search engines and all the other good things that follow: no matter what the document type, the engine will be able to find the links.

Together with xlink:href, xml:id can allow links to point to fragments of XML documents easily, making it possible to refer to embedded resources.

<data xmlns:xlink="http://www.w3.org/1999/xlink">
  <person xml:id="dpm">
    <name>David Megginson</name>
  </person>
 
  <weblog>
    <title>Quoderat</title>
    <author xlink:href="#dpm"/>
  </weblog>
</data>

This stuff is critical — since REST is all about linking, lack of a standard linking mechanism in content will simply kill it before it can even start.
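To show how little a generic tool needs to know, here is a sketch in Python of a link extractor that understands nothing about the document type, only the xlink:href and xml:id conventions (the file name is made up):

import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def links(root):
    """Yield (element, target) pairs for every link in the document."""
    ids = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    for element in root.iter():
        href = element.get(XLINK_HREF)
        if href is None:
            continue
        if href.startswith("#"):
            yield element, ids.get(href[1:])   # local xml:id reference
        else:
            yield element, href                # external resource to crawl

root = ET.parse("data.xml").getroot()
for element, target in links(root):
    print(element.tag, "->", target)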

xml:base for document identification

Similarly, the xml:base attribute can provide an identifier and locator for an XML data document. An xml:base attribute attached to the root element can give both a base URL for resolving relative links in the document and a global identifier for the document.

<data xml:base="http://www.example.org/data/foo.xml">
  ...
</data>
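Resolving relative links against xml:base then takes nothing more than Python’s standard library (the relative reference bar.xml is invented):

import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

root = ET.fromstring(
    '<data xml:base="http://www.example.org/data/foo.xml"/>')
print(urljoin(root.get(XML_BASE), "bar.xml"))
# http://www.example.org/data/bar.xml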

xsi:type for data typing (?)

Do we need data typing at all in XML? The use of external schemas is generally a bad idea both for performance and security reasons, so if we want typing at all (at least for simple data types), we should do it in the document instance itself, using something similar to the xsi:type attribute. Norman Walsh doesn’t like this approach, but for reasons different from mine: I think that typing information is useful mainly for authoring, not publishing; Norman would prefer to see it offloaded into external schemas. If you want typing at all, I think that something like

<start-date xsi:type="xsd:date">2005-02-23</start-date>

is generally inoffensive, aside from the fact that it uses Namespace prefixes in attribute values (a bit of a nasty kludge). Compared with bolting a whole schema onto our poor little XML data document, however, it’s a lightweight solution, assuming that it actually adds useful information.
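For what it’s worth, a client can get some mileage out of xsi:type with very little code. The sketch below keys on the literal prefixed value, which only works because of the prefixes-in-attribute-values kludge just mentioned; a more careful version would resolve the prefix through the namespace declarations:

import datetime
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

CASTS = {
    "xsd:date":    datetime.date.fromisoformat,
    "xsd:integer": int,
    "xsd:decimal": float,
    "xsd:boolean": lambda s: s in ("true", "1"),
}

def typed_value(element):
    """Convert element content according to xsi:type, if present."""
    cast = CASTS.get(element.get(XSI_TYPE, ""), str)
    return cast(element.text or "")

start = ET.fromstring(
    '<start-date xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="xsd:date">2005-02-23</start-date>')
print(typed_value(start))   # prints 2005-02-23 (as a datetime.date)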

Dublin Core for simple, basic properties (??)

The Dublin Core failed completely in the HTML meta element, and many people don’t think it’s particularly well set up, but somehow those original 15 simple property names still have a lot of popular recognition in the tech community. By far the most useful of the property names is dc:title, which identifies the name of a resource (for display in a pick list, search engine results, and so on).

<city xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>San Diego</dc:title>
  <region>California</region>
  <country>US</country>
  <population>1223400</population>
</city>
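Whatever the host vocabulary looks like, a generic tool can pull out dc:title for a pick list with a couple of lines (a sketch, reusing the record above):

import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

city = ET.fromstring("""
<city xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>San Diego</dc:title>
  <region>California</region>
  <country>US</country>
  <population>1223400</population>
</city>
""")
print(city.findtext(DC + "title"))   # San Diego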

Will people go for this, though, or will the Dublin Core fizzle out here as well?

What else?

What other bits and pieces are out there that people would actually use in XML data files served out by RESTful web applications? I’m not convinced that the xml:space attribute is all that useful for generic XML data files, since it’s about formatting rather than meaning; the xml:lang attribute is useful in XML documents intended for human readers, as I’ve mentioned, but for fielded data, I’d rather see language information in its own proper field, maybe using the Dublin Core’s dc:language element (if the Dublin Core succeeds). Perhaps people will borrow rss:enclosure from RSS 2.0, for lack of any other standard way to indicate an external non-XML resource.

I’d love to hear other suggestions of what might appear in a simple profile for XML data REST content.

Posted in General | 5 Comments

REST design question #4: how much normalization?

[Update: why this has to do with REST] Here is the fourth in a series of REST design questions: how much should the XML data files returned by a REST web application be normalized into separate XML files? For example, if an application is returning information about the film Sixteen Candles, should it try to put most of the relevant information into a single XML file, like this?

<film>
  <title>Sixteen Candles</title>
  <director>John Hughes</director>
  <year>1984</year>
  <production-companies>
    <company>Channel Pictures</company>
    <company>Universal Pictures</company>
  </production-companies>
</film>

Or should it link to separate XML documents containing information about people, companies, and so on, like this?

<film xml:base="http://www.example.org/objects/014002.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>Sixteen Candles</title>
  <director xlink:href="487847.xml"/>
  <year>1984</year>
  <production-companies>
    <company xlink:href="559366.xml"/>
    <company xlink:href="039548.xml"/>
  </production-companies>
</film>

(Of course, you can take this a lot further, making the relationships themselves, like isDirectorOf, into separate XML files, but this is enough to give a good flavour.)

Presumably, the REST server is creating the XML information from a relational database that is normalized, so the regular arguments about maintainability, etc. are not an issue. Still, each example has its disadvantages:

  • In the first example, the client application cannot be certain that two separate records are referring to the same director or production company, or to a different one that happens to have the same name. It will also be hard for the server to handle a PUT request to update the (normalized) database.
  • In the second example, the client application will have to make a ridiculous number of GET requests to assemble enough information for even the most basic application, like a cast list: gathering complete information about the cast, crew, and locations of even a single movie will likely involve retrieving hundreds or thousands of tiny XML files.

Would imitating HTML be the best compromise? HTML links (the a element) typically include both a reference to an external resource and a short, local description of the resource at the other end of the link (i.e. the blue, underlined text). There is no reason that XML data files in a REST application cannot do the same thing, combining the advantages of the normalized and unnormalized approaches, as in this example:

<film xml:base="http://www.example.org/objects/014002.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>Sixteen Candles</title>
  <director xlink:href="487847.xml">John Hughes</director>
  <year>1984</year>
  <production-companies>
    <company xlink:href="559366.xml">Channel Pictures</company>
    <company xlink:href="039548.xml">Universal Pictures</company>
  </production-companies>
</film>

Now, a simple REST client application does not need to retrieve extra data files simply to find the name of the director or production company, but it still knows where to look for more complete information. It can also use the link URLs as identifiers for disambiguating people, companies, and so on. The approach will also be familiar to web developers, the ones who will eventually decide whether to use REST for data retrieval.
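A client working against this hybrid format can stay cheap: it uses the inline text for display and hangs on to the link in case the user drills down. A minimal sketch (the URLs are the made-up ones from the example):

import urllib.request
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def show_film(url):
    """Display the film using only the inline labels; return the director link."""
    with urllib.request.urlopen(url) as response:
        film = ET.parse(response).getroot()
    base = film.get(XML_BASE, url)
    director = film.find("director")
    print(film.findtext("title"), "- directed by", director.text)
    return urljoin(base, director.get(XLINK_HREF))

director_url = show_film("http://www.example.org/objects/014002.xml")
# One extra GET happens only if the user asks for the full director record:
# full_record = urllib.request.urlopen(director_url).read()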

Now, what about a REST application that supports not only GET but PUT? What should it do when someone tries to check in this document? I’d suggest that any information under an element with an xlink:href attribute should be considered non-canonical and ignored during the checkin — you don’t want to rename John Hughes on the basis of the description of one of his films — and that the label information inside the link be autogenerated at the next GET (presumably from the resource at http://www.example.org/objects/487847.xml).

This particular design question comes from personal experience during the late 1990s — the project involved moving precisely this kind of information in very large quantities to eCommerce customers. In that case, PUT was not an issue, since the customers did not have write access to the provider’s database.

(Josh Sled quite reasonably asks what this question has to do specifically with REST. The main selling point of REST is linking resources together, so I believe that figuring out when to link and when to embed will be critical to making REST-based applications work. Josh also mentions RDF. The project I mentioned actually was trying to use RDF [first the 1.0 WD, then the REC]; unfortunately, RDF makes an example like my third one difficult, since in 1.0 at least, a property had to have either a link or content, but not both; you end up having to create a new, inline resource for every link, which is messy. I’m not too familiar with the newer RDF version, so I don’t know if they’ve fixed that by allowing labeled links.)

Tagged | 3 Comments

REST design question #3: meaning of a link

This is the third in a series of REST design questions. The first design question asked about keeping track of location and identification information after you have downloaded an XML file; the second design question asked about discovering resources and dealing with long lists of data in a RESTful way.

The very heart of REST, both in its narrow original sense (everything must have a URL) and its broader popular sense (basic HTTP + XML as an alternative to Web Services), is linking. REST insists that any information you can retrieve must have a single, unique address that you can pass around, the same way that you can pass around a phone number or an e-mail address — those addresses make it possible to link resources (HTML pages or, in the future, XML data files) together into a web, so that either people or software agents can discover new pages by following links from existing ones.

Old-School Hypertext

But what does a link mean? That question matters a lot for anyone writing general-purpose REST software, such as search engines, data browsers, or database tools, that are not designed to work with only a single XML markup vocabulary. The pre-HTML Hypertext specialists believed that links could have many different meanings, and typically wanted to provide a way for the author to specify them; hiding in the shadows during the web revolution of the 1990s, the old-school managed to keep the fire alive long enough to add the universally-ignored xlink:type attribute to XLink. Do we need xlink:type for generic XML data processing in a REST environment?

I don’t think we do.

In fact, if you look closely, linking to an external resource from an HTML document always means the same thing:

“Here is a more complete version of what I’m talking about.”

It is very hard to think of any exceptions. For example, consider these three links from an HTML document:

<p>During the <a href="http://en.wikipedia.org/wiki/Renaissance">Renaissance</a> ...</p>
<img alt="Illustration of Galileo" src="galileo.jpg"/>
<script src="validate-form.js"/>

In every case, the element containing the link attribute is a placeholder for something somewhere else. Obviously, they cause different browser behaviour — the picture will be inserted into the displayed document automatically, while the Wikipedia Renaissance entry will not — but in all three cases, the thing linked represents something more complete: the Wikipedia Renaissance article is more complete than the phrase “Renaissance”, the image galileo.jpg is more complete than the alternative text “Illustration of Galileo”, and the Javascript code is more complete than the script placeholder.

New-School XML

Exactly the same principle will likely apply to links in XML data files, like this example:

<person xml:base="http://www.example.org/people/e40957.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
  <name>Jane Smith</name>
  <date-of-birth>1970-10-11</date-of-birth>
  <employer xlink:href="http://www.example.org/companies/acme.xml">ACME Widgets, Inc.</employer>
  <country-of-birth xlink:href="http://www.example.org/countries/ca.xml">Canada</country-of-birth>
</person>

All of the information available for the person’s name is the string “Jane Smith”, and all of the information available for the date of birth is the string “1970-10-11”; however, there is more complete information about the employer at http://www.example.org/companies/acme.xml, and there is more complete information about the country of birth at http://www.example.org/countries/ca.xml.

It seems that unidirectional links like those used in the web always lead towards increasingly canonical information. If an XML element has a linking attribute, then, can we assume that the entire XML document subtree starting at that element represents a lesser version of the information available externally at the link target? If so, can we really gain much by adding xlink:role to the mix?
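If the assumption does hold, a generic processor can treat any element carrying an xlink:href as a summary and fetch the canonical version only when it needs it, without knowing anything about the vocabulary. A sketch (the file name is invented):

import urllib.request
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def canonical(element, base=""):
    """Return the most complete version of this element's information."""
    href = element.get(XLINK_HREF)
    if href is None:
        return element                        # the local content is all there is
    with urllib.request.urlopen(urljoin(base, href)) as response:
        return ET.parse(response).getroot()   # the linked resource wins

person = ET.parse("person.xml").getroot()
employer = canonical(person.find("employer"), person.get(XML_BASE, ""))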

Snags

Is this a safe-enough assumption that we could use it with any RESTful XML data files, and perform some kinds of data processing without having to know about the specific XML vocabulary in use?

I can think of two counter-examples right away, and they both deserve some attention. First, there is one context where HTTP URLs frequently appear as attribute values in XML documents but do not refer to a more complete version of the information inside an element: XML Namespaces. Here’s an example:

<person xmlns="http://www.example.org/ns/">Jane Smith</person>

In this case, there may be no information available at all at the location http://www.example.org/ns/; if there is something there (like an RDDL file), it will most likely be information about the XML markup, not about the person Jane Smith. Of course, this URL does not appear as the value of an xlink:href attribute, so there is no ambiguity. More importantly, the use of URLs for Namespace identifiers (a choice which I supported) has caused an enormous amount of confusion among XML users, who expect the URL to point to something — something more complete or authoritative, that is. That very confusion is proof of how ingrained this use of linking is.

The second counter-example is the rise of the rel="nofollow" attribute in HTML links, partly as an attempt to counter spam in weblog comments and wiki sandboxes. If anything, this appears to vindicate the old-school hypertexters. They should be rushing into the street in disheveled clothing with a mad gleam in their eyes, shouting “Look, we were right! It took 15 years, but finally everyone sees that links do need semantic information attached!” and so on. But they’re not, probably because they’re smart enough to realize that this doesn’t, quite, change the primary meaning of a link. The rel="nofollow" attribute says that the author does not endorse the link target, but it still provides a more complete version of the information. For example, someone who strongly dislikes the U.S. Libertarian Party might want to point to their web page without improving their Google search ranking, and thus include something like this:

<p>In contrast to the misinformation coming from the <a href="http://www.lp.org/" rel="nofollow">Libertarian Party</a> ...</p>

The resource at http://www.lp.org/ is still a more complete version of the information in the XML element, even if it is information that the author does not particularly like.

Implications

If linking really can be this simple, then we will be able to do a lot with XML data and REST even if we do not agree on a common content encoding. That could be enormously valuable: the document web was an enormous success precisely because content-encoding was standardized on HTML, so that people could build things like authoring tools and search engines. If we can do a lot of the same thing with an XML document web without forcing everyone to squeeze their data into something like RDF or XTM, we might just be able to get enough people to play along to make it work.

Tagged | 5 Comments