My biggest problem with Wikipedia


Summary: You can’t partition a web site’s users into discrete groups by language.

I don’t worry much about Wikipedia’s objectivity or reliability (no sources, especially not newspapers or Britannica, are objective or reliable, and at least Wikipedia preserves its conflicts and controversies in comments and edit history), but I do have one big problem with the project: WHY THE *^%*& DON’T THEY HAVE SINGLE SIGN-ON?

I usually edit in English, but I can also make at least minor contributions to Wikipedia in French, German, Spanish, Italian, and Latin, and sometimes also contribute to Wikimedia. Every one of those requires me to create a separate account! It is absurd that my username and password for en.wikipedia.org won’t work for fr.wikipedia.org.

Don’t make this mistake with your own webapps, kids. Lots of people in the world are comfortable working in more than one language, even if they’re not fluent in all. It’s good to make a site available in more than one language, but don’t expect language to partition your users into discrete groups. Don’t lock them into a single language with a cookie, or limit their accounts to one language domain — multilingualism is extremely common around the world, even in the U.S. (how many American users would want to be able to use a site in English and Spanish if given the opportunity?)
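
To make that concrete, here is a small Python sketch, assuming one shared account store behind every language edition. Account, SUPPORTED, and pick_language are hypothetical names for illustration, not anything Wikipedia or any real framework provides:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical set of language editions the site offers.
SUPPORTED = ("en", "fr", "de", "es")


@dataclass
class Account:
    """One account, valid on every language edition of the site."""
    username: str
    # An ordered list of preferences, not a single locked-in language.
    languages: List[str] = field(default_factory=lambda: ["en"])


def pick_language(requested: Optional[str], account: Account) -> str:
    """Choose the language for one request without touching the account."""
    # An explicit request (say, from the URL) always wins.
    if requested in SUPPORTED:
        return requested
    # Otherwise fall back to the user's own preference order.
    for lang in account.languages:
        if lang in SUPPORTED:
            return lang
    return SUPPORTED[0]
```

The point of the design is that language is a property of the request, so the same login can read in German one minute and edit in French the next.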


Maybe the women are right

Summary: Perhaps the women who don’t choose computer programming are making a good choice, especially with the deteriorating working conditions, stagnant or falling salaries, and offshoring.

Recently, we’ve had a few postings about women in computing (or the lack thereof) — see Bray, Wood, Tenison, and Bray (again), all ignited by a piece in devChix.

These postings all assume that we need to do something to pull more women into coding. Why? Do we think there are lots of women who would be happy coding, but aren’t smart enough or motivated enough to choose the right careers for themselves, or are too timid to deal with any barriers unless someone comes along and dismantles them first?

Listen to the market

In an age where we’ve come to trust central planning less and the free market more, why not try to learn from the labour market instead of trying to push it in ways it doesn’t want to go?

If we assume that the majority of working women are smart, strong, motivated, and brave, then we can also assume that they have good reasons for choosing their careers. And in fact, it turns out that their track record isn’t bad. For example, in the 1970s and 1980s, women were grossly underrepresented in manufacturing and overrepresented in lower-paying service-industry jobs like retail. But when manufacturing started offshoring in the 1980s and 1990s, it was the women who were still working (often as managers, at this point), while the men were at home, depressed, collecting welfare cheques or trying to retrain for jobs that paid a fraction of what they used to earn.

Now, while there’s lots of work connected with tech, we see pure coding increasingly being offshored, the same way that manufacturing was 20 years ago. There’s no shortage of women working in jobs connected with computers, but instead of coding, many women choose onsite consulting, training, marketing, and other jobs that are not only social but require face time with customers, and as a result, are much more difficult to offshore.

Of course, if you absolutely love coding, like I do (and most of the people reading this do), you’re going to work hard to try to find a way to keep doing it, whether you’re a man or a woman. But if you don’t feel that burning love, why let yourself be dragged kicking and screaming into an industry where salaries are falling, jobs are fleeing, hours are increasing (bye bye weekends!), and workers are increasingly treated as interchangeable cogs on a development assembly line, without even the (questionable) union protection their parents had in their factory jobs 20-30 years ago?


The lawyers …

Lawyers force companies to write page after page of end-user license agreements (“clause xix: The said user hereby indemnifies ACME Widgets against any harm caused to his/her pregnancy by use of this spreadsheet”) and disclaimers (“ACME Widgets does not intend that this flight-planning software be used to plan an actual flight or for any other aviation-related or planning-related activities”). We live in a litigious society, so companies need to protect themselves from spurious lawsuits.

But is that true?

When I read blogs and other stuff written by lawyers, VCs, and other similar professionals, I’m usually struck by the lack of legal disclaimers — they don’t seem to worry so much when it’s their own necks on the line, or are satisfied with one very brief, casual-language disclaimer instead of pages of B.S.

That leads me to two questions:

  1. Do disclaimers and long EULAs actually work? Does the amount of legal text have any correlation (positive or negative) with the likelihood of being successfully sued?
  2. Is legal mumbo-jumbo something that lawyers push on companies, or something that paranoid company execs demand from their lawyers (maybe because everyone else has it)?

I’d be interested in pointers to any studies, etc.


XML 2007 Call for Participation (closes 31 August)


The Call for Participation in XML 2007, the world’s largest and longest-running markup conference, is now open until Friday 31 August:

http://2007.xmlconference.org/public/cfp/4

The conference runs from Monday 3 December to Wednesday 5 December 2007, and includes a keynote by Jason Hunter, one of our most popular speakers at past conferences.

Thanks to Edd Dumbill and his startup, Expectnation, we have a much simpler, better-designed web interface for submitting proposals this year, so please drop by and take a look.

Changes for 2007

There are a few changes for the conference this year:

  • There will be no separate tutorial day; instead, the regular program (three tracks) ends at noon on Wednesday, then Wednesday afternoon will be devoted to a special training track for all different skill levels (no separate registration required).

  • This is the only call for participation; there will be no late-breaking call this year.

  • On one of the evenings, we plan to offer lightning sessions for standards groups — each group will have 20 slides and 6 minutes and 40 seconds to let us know what they’re working on. This will be a great way to learn a lot about a lot of specs and standards in a short time.

  • We continue to encourage presentations on open data and document technologies other than XML, such as JSON.

We look forward to seeing you in Boston this December.


REST, the Lost Update Problem, and the Sneakernet Test

Dare Obasanjo is giving a bit of pushback on the Atom Publishing Protocol, but the part that caught my attention was the section on the Lost Update Problem. This doesn’t have to do with REST per se as much as with the choice not to use resource locking, but since REST people tend to like their protocols lightweight, the odds are that you won’t see exclusive locks on RESTful resources all that often (it also applies to some kinds of POST updates as well as PUT).

How to lose a REST update

  • I check out a resource about “John Smith” (as a web form or an XML document, for example), and correct the first name field to “Jon”.
  • You check out the same resource, and correct the last name field to “Smyth”.
  • I check in my changes.
  • You check in your changes.

You have corrected the last name to “Smyth”, but have inadvertently overwritten my correction of the first name with the old value “John”, because you never saw my update.

Detection, not avoidance

Without exclusive locks, there’s no way to avoid this problem, but it is possible to detect it. What happens after detection depends on the application: if it’s interactive, for example, you might redisplay the form with both versions side by side. I don’t mean to diminish the difficulty of dealing with check-in conflicts and merges (it’s a brutally hard problem), but it’s one that you’ll have whenever you choose not to use exclusive resource locks (and even with resource locks, the problem still comes up if someone’s lock expires or is overridden). Managing multi-user resource locks properly can require a lot of extra infrastructure, and they have all kinds of other problems (ask an enterprise developer about the stale lock problem), so there are often good reasons to avoid them.

State goes in the resource, not the HTTP header

Dare points to an old W3C doc that talks about doing lost-update detection using all kinds of HTTP-header magic, requiring built-in support in the client (such as a web browser). That doesn’t make sense to me. A better alternative is to include version information directly in the resource itself. For example, if I check out the record as XML, why not just send me something like this?

<record version="18">
  <given-name>John</given-name>
  <family-name>Smith</family-name>
</record>

If I check it out as an HTML form, my browser should get something like this:

<form method="post" action="/actions/update">
  <div>
    <input type="hidden" name="version" value="18" />
    Given name: <input name="given-name" value="John" />
    Family name: <input name="family-name" value="Smith" />
    <button>Save changes</button>
  </div>
</form>

When you check out the resource, you’ll also get version 18. However, when I check in my changes (using PUT or POST), the server will bump the resource version to 19. When you try to check in your copy (still at version 18), the server will detect the conflict and reject the check-in. Again, what happens after that depends on your application.
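
The server side of that check is small. Here is a minimal Python sketch, assuming a single in-memory record; RecordStore, VersionConflict, check_out, and check_in are names made up for illustration, not part of any real framework:

```python
class VersionConflict(Exception):
    """Raised when a check-in is based on a version that is no longer current."""


class RecordStore:
    """Hypothetical in-memory store holding one versioned record."""

    def __init__(self, fields, version=18):
        self.fields = dict(fields)
        self.version = version

    def check_out(self):
        # The version travels inside the representation, next to the data.
        return {"version": self.version, **self.fields}

    def check_in(self, submitted):
        # Reject the update if the client edited a stale copy.
        if int(submitted["version"]) != self.version:
            raise VersionConflict(
                f"edited version {submitted['version']}, current is {self.version}"
            )
        # Accept the update and bump the version for the next editor.
        self.fields = {k: v for k, v in submitted.items() if k != "version"}
        self.version += 1
        return self.version
```

With the John Smith example, my check-in at version 18 succeeds and bumps the record to 19; yours, still carrying 18, raises VersionConflict instead of silently restoring “John”.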

The Sneakernet Test

I think that this is far better than the old W3C solution, because (1) it’s already compatible with existing browsers, and (2) it passes what I call the Sneakernet Test: I can take a copy of the XML (or JSON, or CSV, or whatever) version of the resource to a machine that’s not connected to the net, edit it (say, on the plane), then check it back in from a different computer. I can copy it onto a USB stick, take it to the beach, edit it on my laptop, then take it back to work and check it back in. All the state is in the resource, not hidden away in cryptic HTTP headers.
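
A rough Python sketch of the offline half of that trip, using only the standard library. It assumes the XML record format shown earlier; edit_offline and base_version are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def edit_offline(xml_text, field, value):
    """Change one field of a checked-out record, with no network involved.

    The version attribute is left alone, so it rides along with the data.
    """
    record = ET.fromstring(xml_text)
    record.find(field).text = value
    return ET.tostring(record, encoding="unicode")


def base_version(xml_text):
    """Read back the version the edited copy was based on."""
    return int(ET.fromstring(xml_text).get("version"))
```

Whatever machine finally checks the file in only needs the file itself: the server can compare base_version of the edited copy against the current version and decide whether there is a conflict.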

By the way, if you don’t trust programmers to be honest when designing their clients, you can use a non-serial, pseudo-random version so that they can’t just guess the next version and avoid the merge problem, but serial version numbers should be fine most of the time.
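
A sketch of the non-serial variant in Python, assuming the standard secrets module as the source of randomness; TokenVersionedRecord and next_version_token are hypothetical names:

```python
import secrets


def next_version_token():
    """Return an unguessable version identifier (32 hex characters)."""
    return secrets.token_hex(16)


class TokenVersionedRecord:
    """Hypothetical record whose version is an opaque random token."""

    def __init__(self, fields):
        self.fields = dict(fields)
        self.version = next_version_token()

    def check_in(self, base_version, fields):
        # A client cannot guess the current token; it has to have
        # checked out the current copy to know it.
        if base_version != self.version:
            return False
        self.fields = dict(fields)
        self.version = next_version_token()
        return True
```

The conflict check is the same as with serial numbers; the only difference is that the next version can’t be computed by adding one.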


Godwin's law goes mainstream

Godwin’s Law has finally left the geekier corners of the net and gone mainstream: political opponents and the press are lambasting Canadian Green Party leader Elizabeth May for an immature and totally gratuitous Nazi reference.

According to Godwin’s law, that means that the debate is now over and the environment has lost. Gee, thanks!


Country codes: a spreadsheet-sharing experiment

I’ve just uploaded a spreadsheet of country codes (plain HTML view) to Google documents and spreadsheets. The spreadsheet includes ISO 3166-1 alpha-2, alpha-3, and numeric codes together with FIPS 10-4 codes, and the country names as provided in each spec. I originally created it to help me map FIPS to ISO codes from some air navigation data.
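
To show why the mapping needs a table at all, here is a tiny Python sketch. The six rows are a hand-picked illustrative sample, not the contents of the spreadsheet, and FIPS_TO_ISO2 and fips_to_iso2 are hypothetical names:

```python
# FIPS 10-4 code -> ISO 3166-1 alpha-2 code (small illustrative sample).
FIPS_TO_ISO2 = {
    "AS": "AU",  # Australia
    "AU": "AT",  # Austria
    "GM": "DE",  # Germany
    "JA": "JP",  # Japan
    "SZ": "CH",  # Switzerland
    "UK": "GB",  # United Kingdom
}


def fips_to_iso2(fips_code):
    """Return the ISO alpha-2 code, or None if the FIPS code is not in the table."""
    return FIPS_TO_ISO2.get(fips_code.strip().upper())
```

Note that “AU” is Austria in FIPS but Australia in ISO, so passing codes through unconverted gives wrong answers rather than obvious errors.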

I’m interested in online data collaboration — what tools people need, how it will work in practice, etc. — and this seems like an easy way to experiment. If you’d like to make any corrections to the spreadsheet, let me know, and I’ll add you as a collaborator. I might also upload some spreadsheets of general geodata in the future, where there’s more opportunity for contributions.


Ruby on Rails pain at Twitter

Josh Kenzer has posted an interview with Alex Payne, a developer for Twitter, which is one of (if not the) biggest Ruby on Rails-based web apps.

A couple of years ago, when I was getting tired of working within the confines of the Java/J2EE bubblesphere, I tried out PHP and Ruby on Rails, intending to like Rails; instead, I surprised myself by preferring PHP, an ugly hack of a language optimized for script kiddies (I’ve been using it ever since). It looks like Payne is coming to the same conclusion, as his team has ended up working to keep Twitter running despite RoR instead of because of it. Here is an excerpt:

All the convenience methods and syntactical sugar that makes Rails such a pleasure for coders ends up being absolutely punishing, performance-wise. Once you hit a certain threshold of traffic, either you need to strip out all the costly neat stuff that Rails does for you (RJS, ActiveRecord, ActiveSupport, etc.) or move the slow parts of your application out of Rails, or both.

There’s lots more in Kenzer’s posting, including Payne’s claim (I don’t know enough to verify) that Rails cannot support more than one database at once, and that “Running on Rails has forced us to deal with scaling issues, issues that any growing site eventually contends with, far sooner than I think we would on another framework.”


Anonymity and freedom

Elliotte Rusty Harold is right that anonymity goes together with freedom, and I was happy to read his excellent posting How to Blog Anonymously. Rusty distinguishes three different kinds of anonymity — roughly “I don’t want to be embarrassed”, “I don’t want to be fired”, and “I don’t want to be hauled out of my bed by the secret police and shot” — and talks about the steps necessary to achieve each one.

Granted, anonymity has its ugly sides, like the disgusting online threats against Kathy Sierra and online abuse of Maryam Scoble, but it’s also sometimes the only conduit around the abusive authority of a government, employer, or even one’s peer group. As even Western democratic governments have become more authoritarian since 9/11, keeping these conduits open is more important than ever.

Granted, 99% or more of anonymous information is simply stupid or malicious, but if that’s the cost of freedom, it’s a relatively small cost to pay compared to the sacrifices our ancestors made to win us the freedoms in the first place.


Open Data matters more than Open Source

Dare Obasanjo just put up a posting with the title Open Source is Dead. Dare does happen to be a Microsoft employee, but his posting is none of the standard anti-Linux/OpenOffice/Apache/Firefox FUD. Instead, he voices a question that’s been floating around for a while:

… how much value do you think there is to be had from a snapshot of the source code for eBay or Facebook being made available? This is one area where Open Source offers no solution to the problem of vendor lock-in.

Let me out!!!

In other words, as the Web replaces Microsoft Windows as the world’s favorite desktop/laptop software platform (it may be there already), what good is Open Source to the ordinary computer user? Even if a web site happens to be built on Open Source software (like the LAMP stack), I’m still locked in:

  • How can I move my address book and archived e-mail from Hotmail to Yahoo or GMail?
  • How can I move my blog (with all postings and comments) from Blogger to Bloglines or WordPress?
  • How can someone move her contact list and comments from MySpace to Facebook?
  • How can a buyer in Yahoo’s auction thingy verify my reputation on eBay?
  • How can I move my old flight plans from Aeroplanner to FBOWeb?
  • How can I move my sales contacts and data from Salesforce.com to Highrise?
  • How can I move my pictures with their tags from Flickr to Smugmug?

A crack of light under the door

These are huge problems, and the solution is probably going to have a lot more to do with Open Data than with Open Source. There are already a couple of minor successes:

  • Blog reading sites almost universally support OPML import and export, so that you can save the list of blogs you read from one site and move it to another.
  • Online wordprocessors and spreadsheets, of course, support the Microsoft Office formats and/or the OpenDocument formats and/or RTF and CSV.
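
OPML shows how little it takes. Here is a rough Python sketch of exporting and re-importing a subscription list, assuming the usual OPML layout of outline elements with text and xmlUrl attributes; export_opml and import_opml are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def export_opml(title, feeds):
    """Serialize (name, feed_url) pairs as an OPML subscription list."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, feed_url in feeds:
        ET.SubElement(
            body, "outline", {"type": "rss", "text": name, "xmlUrl": feed_url}
        )
    return ET.tostring(opml, encoding="unicode")


def import_opml(opml_text):
    """Read the (name, feed_url) pairs back out of an OPML document."""
    root = ET.fromstring(opml_text)
    return [
        (outline.get("text"), outline.get("xmlUrl"))
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]
```

Any site that can write and read that handful of elements lets me take my reading list with me.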

That’s not much, though. Open Source (and its predecessor buzzword, Free Software) have been very important over the past couple of decades, giving us choices beyond the Microsoft/Apple duopoly that chained our desktops (and forcing the duopoly to open up a lot) and smashing the big-iron vendor cartel that owned our servers, but as the world shifts from desktop to web-hosted software, it can’t take us much further.
