Portlets are software modules that produce fragments of HTML markup that are assembled into a single HTML page, sharing common CSS stylesheet, cookies, etc. Final composition takes place on a portal server, and a single page is delivered to the client browser.
Portlets have a lot of features that iFrames don’t: they require fewer HTTP connections, they allow for common styling (one CSS stylesheet can style all the portlets on a page), and they can communicate with each other and take advantage of common authentication/authorization, etc. (so that a user doesn’t have to sign on to each portlet separately).
Portlets use a window-manager metaphor, allowing the portlet server to resize them, expand them etc. They also have modes, like edit and view, all of which can be accessed through a common interface. All of this happens on the server side.
iFrame-based widgets don’t normally do any of that, but they don’t require special portal servers, they can be embedded in more creative ways, and they offload the processing from the server to the client. They also introduce potential security holes, but only if they’re hosted somewhere that’s not under the original company’s control (the same applies to remote portlets using WSRP).
Portlets are used mainly in intranets, to provide a collection of enterprise apps on a single web page for employees (e.g. a news feed, calendar, expense forms, bug reports, etc.).
Widgets are used everywhere else (e.g. embedding Google maps, Facebook applications, etc.). While widget authors/consumers don’t tend to know (or care) much about portlets, the portlet people haven’t failed to notice the popularity of widgets — most (if not all) portal servers now have an iFrame portlet that does little more than wrap an iFrame and allow it to be resized, etc.
Are the extra features of portlets compelling enough to justify the extra cost and hassle of running a portlet server? Now that we have browser tabs, AJAX, etc., do enterprises really need to continue to squish all their apps into a single web page that looks like a 1995 Mac desktop gone bad?
My guess is that the only portlet feature with compelling benefits is common authentication/authorization — once the web community gets behind a solution to that problem (OpenID or something similar), widgets will probably push portlets out completely, even in the enterprise.
My ISP set up the server for me last summer with a bare-bones Ubuntu distro, then I installed the extra packages I needed using aptitude over ssh. Since then, I’ve done many Ubuntu in-place upgrades, rolled out hundreds of changes and upgrades to the web apps and dozens to the database schema (some very significant), and upgraded WordPress n-teen times. Check this out:
$ uptime
 13:08:31 up 175 days, 10:02, 1 user, load average: 0.23, 0.06, 0.02
That’s right — since my ISP first set up the server with a basic Ubuntu system, I’ve never had to restart it. In fact, if Apache and mod_php (PHP5) had ‘uptime’ commands, they’d show almost the same amount of time, since I restarted them only to make configuration changes in the first few days of setting up the server (unless apt stopped them to install a newer version during one of my upgrades). I’ve restarted MySQL more recently, but again, only to experiment with configuration changes (especially for fulltext).
Using reliable old technologies like Linux, Apache, MySQL, and PHP doesn’t win any cool points, but it certainly makes maintaining a web server and its applications easy. I can go on vacation, for example, without worrying about being able to get online to fix or restart my server every couple of days. I don’t have to stay up until 3:00 am on Sunday night so that I can take the server offline to roll out new software versions or bug fixes (aptitude installs any security fixes in place). I spend lots of time with my family. I go to my kids’ school concerts. I learned banjo and mandolin (why not, since I have the free time?).
And yes, my PHP web app is easy to maintain and extend, because I designed it to be that way (I can often implement, test and roll out new features in a matter of minutes, even when they require database schema changes) — it’s the developer, not the programming language, that determines the quality and maintainability of an app. A lot of newbies use PHP, so there’s a lot of bad PHP out there, but the same can be said for any language, even Ruby.
Image: Thomas Penn, second proprietor of Pennsylvania, not as nice as his dad William.
Almost a year ago, I wrote that Open data matters more than Open Source — it doesn’t matter (to you, the end user) whether a web site is using Open Source software or not, if they still keep your data locked up.
Here’s a nasty example: Robert Scoble has just had his Facebook account disabled for running a script to try to scrape his personal information off the site (since Facebook doesn’t provide him with any other way to get it).
I understand that Facebook needs to protect against malicious bots — and they might decide to restore his account once they know what Robert was actually trying to do (though for now all traces of him have vanished) — but do we really want to have to hope for the good will of social sites and beg for our own data every time we want it? Are web site owners the new version of the Proprietors in the early American colonies, who can grant rights as favours when they see fit?
Dear AWS Developers,
This is a short note to let a subset of our most active developers know about an upcoming limited beta of our newest web service: Amazon SimpleDB, which is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.
Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database – real-time lookup and simple querying of structured data – without the operational complexity.
We’re excited about this upcoming service and wanted to let you know about it as soon as possible. We anticipate beginning the limited beta in the next few weeks. In the meantime, you can read more about the service, and sign up to be notified when the limited beta program opens and a spot becomes available for you. To do so, simply click the “Sign Up For This Web Service” button on the web site below and we will record your contact information.
It’s not SQL, or even SQL-like, though, supporting only the operators “=, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION AND UNION”. I’m no relational expert, but I don’t think Codd would have been impressed. A distributed database is one of the big missing pieces from Amazon’s services, but I’m not sure if this will be it.
I didn’t have time to look at the OpenSocial API yesterday, so I’m continuing today looking at the data format for the last major area, persistence data.
My first impression of the persistence data API is that it doesn’t belong in v.1 of OpenSocial — unlike the member/friends and activities APIs, it doesn’t seem to be solving a core problem for social-site app writers (I have no way to get at a friends list except through the site’s API, but I can store my own data, thanks). I can see only two reasons that it’s here, neither of them very admirable:
I’ll give Google the benefit of the doubt and assume that it’s a vision thing, but that’s still very unhealthy — specs should solve the real problems of the present, not the speculative problems of the future, especially bare-bones v.1 specs like this.
Now that that’s out of my system, let’s take a look at what you get back from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global (and its many variants). From the spec, here’s what you get when you request a single piece of information from the API:
<entry xmlns='http://www.w3.org/2005/Atom'>
  <title type="text">somekey</title>
  <content type="text">somevalue</content>
</entry>
Or, in non-XML terms,
$globals{'somekey'} = 'somevalue'
That comes from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global/somekey which requests a single value. Using the first URL mentioned gets you a feed of name=value pairs, sort-of like an associative array:
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
  <id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global</id>
  <updated>2007-10-30T20:53:20.086Z</updated>
  <title>Persistence</title>
  <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
  <link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
  <generator version='1.0' uri='/feeds'>Orkut</generator>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey</id>
    <title>somekey</title>
    <content>somevalue</content>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
    <link rel='edit' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
  </entry>
</feed>
There’s only one entry in the spec’s example, but there could be a lot more. Basically, this is the equivalent of something like
$globals = { 'somekey' => 'somevalue' }
The comparison isn’t quite fair, because there are also some links explaining what you can do to modify this information, etc., but it still seems like a lot of markup for not much value (pun intended). I wonder if this would be a good place to use JSON instead of Atom+XML? After all, the serious apps will be doing their own data storage anyway, and the client-only apps will probably use a JavaScript API that hides the Atom from the developer.
As hinted at in my URL posting, there are several different data scopes:
It seems like a reasonable division of scope, especially since the app can’t get anything out that it didn’t put in.
I do believe that, eventually, many web apps will be able to outsource storage as a service instead of having to maintain their own databases and database clusters — in fact, Amazon’s S3 and its competitors already provide precisely this service, though they might not be optimized for a lot of name=value lookups. I’m surprised, though, that this could be considered a key feature of a social app spec, when so much else was left out.
This is the third part of a series where I’m working through the OpenSocial specs as I write — that means that I haven’t preread and predigested this stuff, but am creating a record of how I approach a new set of specifications and try to understand them. First, I looked at the basic URLs for data access, since they provide the best high-level description of the OpenSocial capabilities (read-only info on members and their friends, read/write info on a member’s activity notifications, and a simple data-storage API). Next, I looked at the data format for the most important content, the member profile and friends lists. This time, I’ll look at the format for activity notifications, which is also based on the Atom syndication format.
To get a list of a member’s recent activities (uploaded a photo, poked a friend, got a new job, or stuff like that, I guess) an OpenSocial application uses the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId} according to the specs, though I suspect that might be intended to be http://{DOMAIN}/feeds/activities/user/{userId} for consistency with the other data-access URLs — it’s hard to be certain. The host should return an Atom feed of activities, like this template example lifted from the spec:
<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
           xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
           xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID</atom:id>
  <atom:updated>1970-01-01T00:00:00.000Z</atom:updated>
  <atom:category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/activities/2007#activity'/>
  <atom:title>Feed title</atom:title>
  <atom:link rel='alternate' type='text/html' href='http://sourceID.com/123'/>
  <atom:link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml' href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='self' type='application/atom+xml' href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:author>
    <atom:name>unknown</atom:name>
  </atom:author>
  <openSearch:totalResults>1</openSearch:totalResults>
  <openSearch:startIndex>1</openSearch:startIndex>
  <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/activities/2007#activity'/>
    <atom:title>Activity title</atom:title>
    <atom:link rel='self' type='application/atom+xml' href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <atom:link rel='edit' type='application/atom+xml' href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>
There’s a lot of front-matter in this, so it’s hard to realize at first glance that it lists only a single activity (in the atom:entry element near the bottom). The entry itself uses mostly standard Atom elements, except for one extension element from the Google activities namespace, giving the date that the notification was received (received date is also important in the news industry, so maybe this is something Atom needs to add to its core). Other than that, the activity itself is easy enough to understand: it has a unique id, a couple of dates, a title (which seems also to serve as the sole description), and web links for viewing and editing.
Unlike the member and friends info, which was read-only, OpenSocial allows apps to post new activities and edit or delete existing ones, but only in what is called a “source-level feed” — that’s a list of a user’s activities limited to a single source (which, I assume, is the application), using the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId} (which, again, may be a typo with an extra “activities” path element at the beginning). In other words, an application can read activities from any source, but it can mess around only with the ones it created. I’m not sure yet how the application knows its source id, or how the host verifies the app’s identity, but I’ll be looking at those issues in a later posting.
For members and friends, I noted that the spec’s example included the OpenSearch namespace but didn’t use it. This time, the namespace is used for the totalResults, startIndex, and itemsPerPage elements. These suggest that it’s possible to page through long lists of activities, though I could find no mention of that in the spec. Again, I don’t know much about Atom, but I think the Atom-blessed way to handle paging would involve using “first”, “next”, and “last” links.
I’m not deeply into social networking myself — with my adolescent children using Facebook, my joining that site would be like showing up in a leather jacket at their high school dance, and 99% of the time I spend on the more grown-up sites like Plaxo, LinkedIn, and Dopplr is spent approving connection requests. As a result, I wasn’t aware of how important activity notifications were for a social-networking site.
Whatever happens with OpenSocial, I have found it to be a good architectural introduction to social networking in 2007, though I suspect that the next thing I’m going to look at — the persistence data API — has more to do with Google’s business requirements than with social networking itself.
See also First looks at OpenSocial: part 1 (URLs)
This is the second part of a series of postings describing how I’m trying to understand the technical specs for the new Google-led OpenSocial initiative. In the first part, I cut down through all the text in the specs to get at the basic URLs, which represent the raw skeleton of services defined by the spec. This time, I’m going to look at the data formats, starting with the real bread and butter of social networking, people and their friends.
The content format for OpenSocial is always the Atom syndication format, a competitor to RSS for syndicating blogs and other similar information. I haven’t spent very much time with Atom yet — I appreciate that it’s more fully-specified than RSS 2.0, but I already know RSS and have run into no practical problems with it (though I’m aware of the potential ones) — so I’m probably not going to notice if or where the OpenSocial specs are violating the spirit or even letter of the Atom specs. I’ve occasionally seen complaints from Atom-heads about Atom-compliance in Google’s GData, and assume those apply to OpenSocial as well.
When you ask an OpenSocial provider for information about a member (using the URL pattern http://{DOMAIN}/feeds/people/{userId}), the spec says you get back something like this, assuming you’re authorized to make the request (lifted straight from the spec, and not namespace-compliant):
<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:georss='http://www.georss.org/georss'
       xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569</id>
  <updated>2007-10-28T14:01:29.948-07:00</updated>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*' href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html' href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
  <gd:extendedProperty name='lang' value='en-US'/>
  <gd:postalAddress/>
</entry>
Aside from the fact that the tech writer is a Jane Austen fan, a couple of other points jump out:
Of course, in reality, the most important information about a member is the member’s friends list, but that information comes through a separate URL, http://{DOMAIN}/feeds/people/{userId}/friends.
This example is also lifted from the spec (and is still missing the declaration for the GML namespace):
<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
      xmlns:georss='http://www.georss.org/georss'
      xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends</id>
  <updated>2007-10-28T21:01:03.690Z</updated>
  <title>Friends</title>
  <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <author><name>Elizabeth Bennet</name></author>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/02938391851054991972</id>
    <updated>2007-10-28T14:01:03.690-07:00</updated>
    <title>Jane Bennet</title>
    <link rel='thumbnail' type='image/*' href='http://img1.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=574036770800045389'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/02938391851054991972'/>
    <georss:where>
      <gml:Point xmlns:gml='http://www.opengis.net/gml'>
        <gml:pos>51.668674 -0.066235</gml:pos>
      </gml:Point>
    </georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/12490088926525765025</id>
    <updated>2007-10-28T14:01:03.691-07:00</updated>
    <title>Charlotte Lucas</title>
    <link rel='thumbnail' type='image/*' href='http://img2.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=5799256900854924919'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/12490088926525765025'/>
    <georss:where>
      <gml:Point xmlns:gml='http://www.opengis.net/gml'>
        <gml:pos>0.0 0.0</gml:pos>
      </gml:Point>
    </georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/15827776984733875930</id>
    <updated>2007-10-28T14:01:03.692-07:00</updated>
    <title>Fitzwilliam Darcy</title>
    <link rel='thumbnail' type='image/*' href='http://img3.orkut.com/images/small/1193603277/115555466.jpg'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=14256507824223085777'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/15827776984733875930'/>
    <georss:where>
      <gml:Point xmlns:gml='http://www.opengis.net/gml'>
        <gml:pos>53.017016 -1.424363</gml:pos>
      </gml:Point>
    </georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
</feed>
Again, very straightforward, if not namespace-compliant (due to the missing GML namespace declaration). There’s also a declaration of an OpenSearch namespace URI that’s never used, suggesting a feature that was removed in haste just before release. The friends list is simply a feed of person entries, just like the single entry returned for the member query, with a title, date, etc. at the top. Note that you always get the full friends list — there’s no support for filtering — so this might not be fun for someone who has 10,000+ friends.
What I don’t see, either in the example or the spec, is a way to provide typed relationships, like “spouse”, “colleague”, “classmate”, etc. I don’t know how important that is to application developers — simply getting the list of friends is probably the most important thing.
Instead of reading and understanding everything first and then posting from a (virtual) podium, I’m going to try to work out my own understanding of the APIs right here on the web. That means that I’ll be asking questions that I’ll find the answers for later, that I’ll be making incorrect assumptions, and that I’ll be deferring hard stuff (like authorization/authentication) until I understand the basics. This is not, then, an OpenSocial primer by any stretch, since I don’t actually know what I’m talking about, but it might be useful as a snapshot of how a developer approaches a new API.
OpenSocial is designed so that any app can get information from any site as long as it has permission (I’ll figure out how that works later) — to accomplish that, it uses standard URL patterns on every site returning Atom entries and feeds. So after digging through a lot of crackerjack, I finally found the prize buried in the docs. Here are the URL patterns:
Listing all the URLs together like this, instead of spreading them out over pages and pages of docs, is the best way to start with a REST API. For example, you can tell at a glance what kind of information is available and what is and isn’t writable (you can’t add new friends for a user, but you can add a new activity). Sure, JavaScript libraries, etc. are nice, but the class hierarchies can obscure how simple the underlying data actually is (or “are”, if you’ve studied Latin). You can also spot possible typos in the docs — for example, what are the odds that the activity URLs are really supposed to start with “/activities/feeds/” when everything else starts with “/feeds/”? It could be poor, inconsistent design, but I suspect cut-and-paste errors.
The next time I get around to looking at OpenSocial, I’ll try to figure out the formats — it shouldn’t be too hard, since they’re all Atom entries or feeds. Then I’ll get into messier stuff like auth/auth, and I may eventually try adding OpenSocial support to my OurAirports hobby site, though it doesn’t even support friends yet.
Update: Nope, my solution won’t work. As Christian Matthies points out in the comments, it is possible to spoof the HTTP Host header as well (his link in the comment is broken because of an extra comma, but this one works). As a kludge, browsers could be modified to prevent Host header spoofing, but (a) it would take a long time to deploy to the world at large, and (b) it would be only a bandaid for a much bigger problem.
Summary: While there’s no way to protect browsers against the DNS rebinding attack, you can protect web sites and web services by forcing them to check the HTTP Host header with every request. This is easy to do for RESTful services going through a regular web server like Apache — you get it by default with virtual hosts — but might be trickier for WS-* services.
If you or your company is using HTTP-based web services (either WS-* or REST), you might be in trouble — a new exploit allows a web site from outside your firewall to use a web browser as a proxy to read any web site or service inside your firewall.
Artur Bergman at O’Reilly has a posting on the DNS rebinding (aka anti-DNS-pinning) attack that works against all major browsers, including all versions of Firefox and MSIE. There’s no obvious general fix for this, though there’s a Firefox extension that helps a tiny bit.
In a DNS-rebinding attack, the attacker is able to force your browser to read data from any IP address that your browser has access to, even if you’re behind a router/firewall, by changing the IP address associated with a domain name you’ve connected to. That means that, given an IP address, an outside attacker can read your local website (at 127.0.0.1), anything behind your corporate firewall (such as an intranet accounting page or a web service), or — I think (haven’t tested yet) — a website that you’re logged into using a cookie (HTTP authentication will force a popup, since the browser will see a different domain name, even if you’re logged into the site in another tab/window). If you run a local web server on your computer (say, at 127.0.0.1), you can go to http://www.jumperz.net/index.php?i=2&a=1&b=7, type in the local address, and see jumperz.net use the exploit to display the source of your home page.
There’s no way to protect the browser yet, but you can protect your HTTP-based sites and services from this attack very easily — in fact, many sites on the web are already unknowingly protected, though I don’t know if most enterprise web services are.
The trick is in the HTTP Host header. While the DNS rebinding attack can associate a new IP address with a hostname, it cannot change the hostname itself, so the browser will still send the original hostname to the new host. Nearly all shared-hosting servers — and many servers at dedicated hosts as well — will check the Host header to decide what pages to serve out. As long as the site does something harmless when it gets an unrecognized hostname (such as returning a “501 Not Implemented” HTTP status code), the site will be safe from the attack. In Apache, for example, you use the ServerName directive for each virtual host, and just make sure that there’s a default virtual host that returns an error or at least does nothing harmful.
For Web Services, the same thing applies. It’s often tempting to use IP addresses instead of hostnames for web services (including RESTful services), especially during development, but doing so opens you right up to a DNS-rebinding attack, which could be very harmful if you’re using real data for development and testing. To protect your HTTP-based services from this attack, you need to make sure that every web service is accessed via a hostname rather than a raw IP address, and that every service checks its hostname. For RESTful services, this is trivially easy (since you’re probably going through Apache or something similar anyway, just as with a web site); for WS-* services, I don’t know the implementations well enough to be sure, but it should be possible to force them to check the Host header somehow.
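If you’d rather enforce the check in application code than in the web-server configuration, a minimal PHP sketch might look like this (the hostnames are placeholders):

<?php
// Minimal sketch: refuse to serve anything when the Host header isn't
// one we expect. The hostnames are placeholders; use your real ones.
$allowed_hosts = array('www.example.org', 'example.org');

$host = isset($_SERVER['HTTP_HOST']) ? strtolower($_SERVER['HTTP_HOST']) : '';
$host = preg_replace('/:\d+$/', '', $host);   // ignore any :port suffix

if (!in_array($host, $allowed_hosts)) {
    header('HTTP/1.0 501 Not Implemented');   // harmless default response
    exit;
}
// ... normal request handling continues here ...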
Even if you’re not building web services, managing an enterprise intranet, or running a public web site, don’t forget to protect the web server on your local computer, if you have one.
Keep all the database code together. Put all your database calls into a single source file if you can — functions like mysqli_query (PHP) should never appear anywhere else but in this file — and create neutral functions like get_member() or delete_cart() for the rest of your code to call. The reason for this is not so that you can switch databases in the future (that’s easy enough to fix), but so that you can easily do a search/replace when you rename or modify tables. If all your database code is in the same place, your application will be orders of magnitude easier to maintain and upgrade a few months from now. Seriously.
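As a sketch of what I mean (the connection details, table and column names are invented), a single db.php along these lines keeps every query in one greppable place:

<?php
// db.php -- the only file in the application that talks to MySQL directly.
// Connection details, table and column names are invented for illustration.

function db_connect()
{
    static $db = null;
    if ($db === null) {
        $db = mysqli_connect('localhost', 'appuser', 'secret', 'foo');
    }
    return $db;
}

function get_member($id)
{
    $result = mysqli_query(db_connect(),
        'SELECT * FROM member WHERE id = ' . (int)$id);
    return $result ? mysqli_fetch_assoc($result) : null;
}

function delete_cart($cart_id)
{
    return mysqli_query(db_connect(),
        'DELETE FROM cart WHERE id = ' . (int)$cart_id);
}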
Make an extra database for junk. If your hosting account allows more than one database, create at least two, say “foo” and “foo_cache” — put all the tables you need to back up into the first one, and all the stuff you don’t need to back up (views, caching tables, session states, etc.) into the second. Write a SQL script to automatically regenerate any required tables in “foo_cache” when you restore. That way, you won’t waste time and bandwidth every day backing up megabytes or gigabytes of stuff you don’t need and can easily regenerate.
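The regeneration script can be as simple as a handful of CREATE TABLE IF NOT EXISTS statements; here’s a sketch run from PHP, with placeholder table definitions:

<?php
// regen_cache.php -- recreate the throwaway tables in foo_cache after a
// restore. The table definitions are placeholders; use your own.
$db = mysqli_connect('localhost', 'appuser', 'secret', 'foo_cache');

$ddl = array(
    'CREATE TABLE IF NOT EXISTS page_cache (
         url       VARCHAR(255) PRIMARY KEY,
         body      MEDIUMTEXT,
         generated DATETIME
     )',
    'CREATE TABLE IF NOT EXISTS session_state (
         id      CHAR(32) PRIMARY KEY,
         data    TEXT,
         updated DATETIME
     )',
);

foreach ($ddl as $statement) {
    mysqli_query($db, $statement);
}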
Make GET harmless. If you use HTTP GET (e.g. $_GET in PHP) to do things like deleting or modifying records, bad things will happen to your application — search engines will start randomly changing your database by following links (robots.txt might not be enough to protect you), browsers will delete records by trying to precache pages, etc. Always use POST (normally from a form button) for anything that can make a change. More here.
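One cheap safety net is to refuse destructive actions that don’t arrive by POST. Here’s a sketch (delete_record() is a hypothetical helper from your own database layer):

<?php
// delete.php -- a sketch of a destructive action that refuses GET.
// delete_record() is a hypothetical helper from your own database layer.
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    header('HTTP/1.0 405 Method Not Allowed');
    header('Allow: POST');
    exit('This action must be submitted with POST.');
}

delete_record((int)$_POST['id']);   // the only path that changes data
header('Location: /records/');      // then send the browser somewhere safe
exit;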
Summary: You can’t partition a web site’s users into discrete groups by language.
I don’t worry much about Wikipedia’s objectivity or reliability — no sources (especially not newspapers or Britannica) are objective or reliable, and at least Wikipedia preserves its conflicts and controversies in comments and edit history — but I do have one big problem with the project: WHY THE *^%*& DON’T THEY HAVE SINGLE-SIGNON?
I usually edit in English, but I can also make at least minor contributions to Wikipedia in French, German, Spanish, Italian, and Latin, and sometimes also contribute to Wikimedia. Every one of those requires me to create a separate account! It is absurd that my username and password for en.wikipedia.org won’t work for fr.wikipedia.org.
Don’t make this mistake with your own webapps, kids. Lots of people in the world are comfortable working in more than one language, even if they’re not fluent in all. It’s good to make a site available in more than one language, but don’t expect language to partition your users into discrete groups. Don’t lock them into a single language with a cookie, or limit their accounts to one language domain — multilingualism is extremely common around the world, even in the U.S. (how many American users would want to be able to use a site in English and Spanish if given the opportunity?)
You have corrected the last name to “Smyth”, but have inadvertently overwritten my correction of the first name with the old value “John”, because you never saw my update.
Without exclusive locks, there’s no way to avoid this problem, but it is possible to detect it. What happens after detection depends on the application — if it’s interactive, for example, you might redisplay the form with both versions side by side. I don’t mean to diminish the difficulty of dealing with check-in conflicts and merges — it’s a brutally hard problem — but it’s one that you’ll have whenever you chose not to use exclusive resource locks (and even with resource locks, the problem still comes if someone’s lock expires or is overridden). Managing multi-user resource locks properly can require a lot of extra infrastructure, and they have all kinds of other problems (ask an enterprise developer about the stale lock problem), so there are often good reasons to avoid them.
Dare points to an old W3C doc that talks about doing lost-update detection using all kinds of HTTP-header magic, requiring built-in support in the client (such as a web browser). That doesn’t make sense to me. A better alternative is to include version information directly in the resource itself. For example, if I check out the record as XML, why not just send me something like this?
<record version="18">
  <given-name>John</given-name>
  <family-name>Smith</family-name>
</record>
If I check it out as an HTML form, my browser should get something like this:
<form method="post" action="/actions/update">
  <div>
    <input type="hidden" name="version" value="18" />
    Given name: <input name="given-name" value="John" />
    Family name: <input name="family-name" value="Smith" />
    <button>Save changes</button>
  </div>
</form>
When you check out the resource, you’ll also get version 18. However, when I check in my changes (using PUT or POST), the server will bump the resource version to 19. When you try to check in your copy (still at version 18), the server will detect the conflict and reject the check-in. Again, what happens after that depends on your application.
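On the server side, the conflict check can be a single UPDATE that only succeeds against the expected version. Here’s a sketch in PHP, with invented table and column names:

<?php
// Sketch of lost-update detection with a version column. The table and
// column names are invented; $db is an open mysqli connection.
function save_record($db, $id, $given, $family, $version)
{
    $stmt = mysqli_prepare($db,
        'UPDATE record
            SET given_name = ?, family_name = ?, version = version + 1
          WHERE id = ? AND version = ?');
    mysqli_stmt_bind_param($stmt, 'ssii', $given, $family, $id, $version);
    mysqli_stmt_execute($stmt);

    // Zero rows affected means someone else already checked in a newer
    // version: reject the save and let the application handle the conflict.
    return mysqli_stmt_affected_rows($stmt) === 1;
}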
I think that this is far better than the old W3C solution, because (1) it’s already compatible with existing browsers, and (2) it passes what I call the Sneakernet Test — I can take a copy of the XML (or JSON, or CSV, or whatever) version of the resource to a machine that’s not connected to the net, edit it (say, on the plane), then check it back in from a different computer — I can copy it onto a USB stick, take it to the beach, edit it on my laptop, then take it back to work and check it back in — all the state is in the resource, not hidden away in cryptic HTTP headers.
By the way, if you don’t trust programmers to be honest when designing their clients, you can use a non-serial, pseudo-random version so that they can’t just guess the next version and avoid the merge problem, but serial version numbers should be fine most of the time.
… how much value do you think there is to be had from a snapshot of the source code for eBay or Facebook being made available? This is one area where Open Source offers no solution to the problem of vendor lock-in.
In other words, as the Web replaces Microsoft Windows as the world’s favorite desktop/laptop software platform (it may be there already), what good is Open Source to the ordinary computer user? Even if a web site happens to be built on Open Source software (like the LAMP stack), I’m still locked in:
These are huge problems, and the solution is probably going to have a lot more to do with Open Data than with Open Source. There are already a couple of minor successes:
That’s not much, though. Open Source (and its predecessor buzzword, Free Software) have been very important over the past couple of decades, giving us choices beyond the Microsoft/Apple duopoly that chained our desktops (and forcing the duopoly to open up a lot) and smashing the big-iron vendor cartel that owned our servers, but as the world shifts from desktop to web-hosted software, it can’t take us much further.
Open Source will still matter, of course (especially for server-side work), but it won’t be an important battle ground any more. Until we can convince (or force) web sites to embrace and standardize on Open Data formats — XML, JSON, or even CSV, as appropriate — we will be in some ways even more locked in than we were in the bad old desktop days. Celebrate Open Source’s victory, by all means, but get ready for an even bigger and bloodier battle over the next 10-20 years.
With REST, every piece of information has its own URL.
If you just do that and nothing else, you’ve got 90%+ of REST’s benefits right off the bat. You can cache, bookmark, index, and link your information into a giant, well, web. It works — you’re reading this, after all, aren’t you? Betcha got here by following a link somewhere, not by parsing a WSDL to find what ports and services were available.
If you want to do REST well (rather than just doing REST), you can spend 2-3 minutes after your elevator ride learning a few very simple best practices to get most of the remaining 10% of REST’s benefits:
Use HTTP POST to update information. Here’s the simple rule: GET to read, POST to change. That way, nobody deletes or modifies something by accident when trying to read it.
Make sure your information contains links (URLs) for retrieving related information. That’s how search engines index the web, and it can work for other kinds of information (XML, PDF, JSON, etc.) as well. Once you have one thing, you can follow links to find just about everything else (assuming that you understand the file format).
Try to avoid request parameters (the stuff after the question mark). It’s much better to have a URL like http://www.example.org/systems/foo/components/bar/ than http://www.example.org/get-component.asp?system=foo&component=bar. Search engines are more likely to index it, you’re less likely to end up with duplicates in caches and hash tables (e.g. if someone lists the request parameters in a different order), URLs won’t change when you refactor your code or switch to a different web framework, and you can always switch to static, pregenerated files for efficiency if you want to. Exceptions: searches (http://www.example.org/search?q=foo) and paging through long lists (http://www.example.org/systems/?start=1000&max=200) — in both of these cases, it’s really OK to use the request parameters instead of tying yourself in a knot trying to avoid them.
Avoid scripting-language file extensions. If your URLs end with “.php”, “.asp”, “.jsp”, “.pl”, “.py”, etc., (a) you’re telling every cracker in the world what exploits to use against you, and (b) the URLs will change when your code does. Use Apache mod_rewrite or equivalent to make your resources look like static files, ending in “.html”, “.xml”, etc.
Avoid cookies and URL rewriting. Well, maybe you can’t, but the idea of REST is that the state is in the thing the server has returned to you (an HTML or XML file, for example) rather than in a session object on the server. This can be tricky with authentication, so you won’t always pull it off, but HTTP authentication (which doesn’t require cookies or session IDs tacked onto URLs) will work surprisingly often. Do what you have to do to make your app work, but don’t use sessions just because your web framework tells you to (they also tie up a lot of resources on your server).
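As a sketch of how far plain HTTP authentication gets you in PHP (check_password() is a hypothetical lookup against your own user table), the credentials arrive with every request, so there is nothing to stash in a session:

<?php
// Sketch of stateless HTTP Basic authentication in PHP: no cookie, no
// server-side session. check_password() is a hypothetical helper.
if (!isset($_SERVER['PHP_AUTH_USER'])
    || !check_password($_SERVER['PHP_AUTH_USER'], $_SERVER['PHP_AUTH_PW'])) {
    header('WWW-Authenticate: Basic realm="Example"');
    header('HTTP/1.0 401 Unauthorized');
    exit('Authentication required.');
}
// ... serve the resource for $_SERVER['PHP_AUTH_USER'] ...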
The strength of REST is that it’s been proven through almost two decades of use on the Web, but not everything that some of the hard-core RESTafarians (and others) try to make us do has been part of that trial. Stop reading now if you just want to go ahead and do something useful with REST. Really, stop! Some of this stuff is moderately interesting, but it won’t really help you, and will probably just mess up your project, or at least make it slower and more expensive.
[maybe some day] Use HTTP PUT to create a resource, and DELETE to get rid of one. These sound like great ideas, and they add a nice symmetry to REST, but they’re just not used enough for us to know if they’d really work on a web scale, and firewalls often block them anyway. In real-life REST applications, rightly or wrongly, people just use POST for creation, modification, and deletion. It’s not as elegant, but we know it works.
[don’t bother] Use URLs to point to resources rather than representations. Huh? OK, a resource is a sort-of Platonic ideal of something (e.g. “a picture of Cairo”), while a representation is the resource’s physical manifestation (e.g. “an 800×600 24-bit RGB picture of Cairo in JPEG format”). Yes, as you’d guess, it was people with or working on Ph.D.’s who thought of that. For a long time, the W3C pushed the idea of URLs like “http://www.example.org/pics/cairo” instead of “http://www.example.org/pics/cairo.jpg”, under the assumption that web clients and servers could use content negotiation to decide on the best format to deliver. I guess that people hated the fact that HTTP was so simple, and wanted to find ways to make it more complicated. Fortunately, there were very few nibbles, and this is not a common practice on the web. Screw Plato! Viva materialism! Go ahead and put “.xml” at the end of your URLs.
[blech] Use URNs instead of URLs. I think even the hard-core URN lovers have given up on this now — it’s precisely the kind of excessive abstraction that sent people running screaming from WS-* into REST’s arms in the first place (see also “content negotiation”, above), and it would be a shame to scare them away from REST as well. URLs are fine, as long as you make some minor efforts to ensure that they don’t change.
[n/a] REST needs security, reliable messaging, etc. The RESTafarians don’t say this, but I’m worried that the JSR (the Java REST group) will. We already have a secure version of HTTP in TLS/SSL, and it works fine for hundreds of thousands or millions of web sites. Reliable messaging can be handled fine in the application layer, since everyone’s requirements are different anyway, or maybe we want a reliable-messaging spec for HTTP in general. In either case, please don’t pile this stuff on REST.
So to sum up, just give every piece of information its own URL, then have fun.
[Update #2: Don Box is thinking along the same lines as I am.]
[Update #3: Rob Sayre points out that there is, in fact, a published browser-side JavaScript API underlying the AJAX widget.]
Over on O’Reilly Radar, Brady Forrest mentioned that Google is shutting down its SOAP-based search API. Another victory for REST over WS-*? Nope — Google doesn’t have a REST API to replace it. Instead, something much more important is happening, and it could be that REST, WS-*, and the whole of open web data and mash-ups all end up on the losing side.
Forget about the SOAP vs. REST debate for a second, since most of the world doesn’t care. Google’s search API let you send a search query to Google from your web site’s backend, get the results, then do anything you want with them: show them on your web page, mash them up with data from other sites, etc. The replacement, Google AJAX API, forces you to hand over part of your web page to Google so that Google can display the search box and show the results the way they want (with a few token user configuration options), just as people do with Google AdSense ads or YouTube videos. Other than screen scraping, like in the bad old days, there’s no way for you to process the search results programmatically — you just have to let Google display them as a black box (so to speak) somewhere on your page.
An AJAX interface like this is a great thing for a lot of users, from bloggers to small web site operators, because it allows them to add search to their sites with a few lines of JavaScript and markup and no real coding at all; however, the gate has slammed shut and the data is once again locked away outside the reach of anyone who wanted to do anything else.
Of course, there are alternatives still available, such as the Yahoo! Search API (also available in REST), but how long will they last? Yahoo! has its own restructuring coming up, and if Nelson Minar’s suggestion (via Forrest) is right — that Google is killing their search API for business rather than technical reasons — this could set a huge precedent for other companies in the new web, many of whom look to Google as a model. Most web developers will probably prefer the AJAX widgets anyway because they’re so much less work, so by switching from open APIs to AJAX widgets, you keep more users happy and keep your data more proprietary. What’s an investor or manager not to like?
Data APIs are not going to disappear, of course. AJAX widgets don’t allow mash-ups, and some sites have user bases including many developers who rely on being able to combine data from different sources (think CraigsList). However, the fact that Google has decided that there’s no value playing in the space will matter a lot to a lot of people. If you care about open data, this would be a good time to start thinking of credible business cases for companies to (continue) offer(ing) it.
The AJAX API is designed to allow interaction with JavaScript on the client browser, but not with the server; however, as Davanum Srinivas demonstrates, it’s possible to hack on the API to get programmatic access from the server backend. This violates Google’s terms of service, and obviously, they can make incompatible changes at any time to try to kill it, but at least there’s a back door for now. Thanks, Davanum.
Personally, I was planning to use the Yahoo (REST) search API for site search even before all this broke, because I didn’t want to waste time trying to figure out how to use SOAP in PHP. I’m glad now I didn’t waste any time on Google’s API, and I’ll just keep my fingers crossed that Yahoo’s API survives.
Post/Redirect/Get (PRG) is a common web-application design pattern, where a server responds to an HTTP POST request not by generating HTML immediately, but by redirecting the browser to GET a different page. At the cost of an extra request, PRG allows users safely to bookmark, reload, etc.
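Here’s a minimal sketch of the pattern in PHP (create_order() and the URLs are hypothetical):

<?php
// order.php -- sketch of Post/Redirect/Get. The handler does its work on
// the POST, then redirects so that the page the user lands on came from a
// GET. create_order() and the URLs are hypothetical.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $order_id = create_order($_POST);      // do the real work exactly once
    header('HTTP/1.1 303 See Other');      // "the result lives over there"
    header('Location: /orders/' . urlencode($order_id));
    exit;                                  // render nothing from the POST
}
// A plain GET of /orders/{id} then renders the confirmation page,
// which is safe to reload or bookmark.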
When someone attempts to reload a page generated by a POST request, browsers will generally pop up a warning that reloading will cause a form to be resubmitted, possibly causing you to purchase two sports cars (etc.) — that warning is a good thing. Strangely, however, Firefox 1.5.03 will pop up the same warning after a PRG operation, when reloading should not cause anything bad to happen. I can think of a few possible reasons:
I’m leaning towards #3, but I’m curious about whether anyone thinks that Firefox is doing the right thing here, and whether other browsers (MSIE, Opera, Safari, etc.) act the same way.
It looks like continuations are back on the discussion board (Gilad Bracha, Tim Bray, and Don Box). I spent some time with Scheme a decade ago and continuations were one of the new features I had to try to understand. Then, as now, I found them more clever than practical.
Gilad sets up a use case for continuations before he goes on to oppose them: in essence, a web application could use continuations to maintain separate stacks, so that as a user hits the back button and then starts down new paths, the web application would not become confused, selling the user a trip to Hawaii instead of Alaska. I can see how continuations would work for that, just as I can see how a bulldozer could turn over the sod in my garden, but I’m far from convinced that either is the right tool for what is really a much simpler problem.
First, a continuation preserves the entire state of a program, including the stack, instruction counter, local variables, etc. How much of that do you really need for a hypothetical travel web app? In reality, you probably need, maybe, 1-5 variable values to restore a previous state in the travel app, so why not just save those explicitly? It would be faster, more secure (less information being saved), and much easier to performance tune and debug (since no magic is happening behind the scenes). Save those variables in a database, in a hash table, in an XML or CSV file, in memcached, or wherever happens to be most convenient. You may be looking at under 100 bytes for each saved state, so if you really want to do this, it’s not going to hurt too badly.
But do you really want to do this? Most of the discussion around REST has focussed on the use of persistent URLs and how to use HTTP verbs like GET, POST, PUT, and DELETE, but there’s another, perhaps more critical idea behind REST — that the resource you retrieve (a web page, XML document, or what-have-you) contains its own transition information.
Let’s say that you load a web page into your browser, load more web pages, then use the back button to return to the original one. Now, select a link. What happens? Did your browser have to go back to the original web server, which was using continuations (or other kinds of saved state) to keep track of the links from every page you visited, so that it won’t send you to the wrong one? Of course not. The web page that you originally downloaded already included a list of all its transitions (links), and intuitive things just happen naturally when you hit the back button.
The web is stateless, but web application toolkits maintain pseudo-sessions (using cookies, URL rewriting, or what-have-you) that make them look stateful, and that makes programmers lazy. Obviously, you don’t want to stick information like ‘isauthenticated’ on a web page, since it could be forged; likewise, you don’t want to put a credit-card number there. But it is trivially simple to make sure that forms, like links, go to the right place even when you hit the back button — just make the transitions fully independent of any session stored on the server side. For example, consider this:
<form method="post" action="/actions/book-trip">
  <button>Book this trip!</button>
</form>
Presumably, the trip the person was looking at is stored somewhere in a session variable on the server. DON’T DO THIS! As Gilad pointed out, someone hitting the back button might end up booking the wrong trip. There are gazillions of ways to push all of the context-sensitive stuff into the web page itself, where it belongs. Here’s one example:
<form method="post" action="/actions/book-trip">
  <label>Book your economy trip to Alaska!</label>
  <input type="hidden" name="destination" value="alaska"/>
  <input type="hidden" name="package" value="economy"/>
  <button>Book it.</button>
</form>
Here’s another:
<form method="post" action="/actions/book-trip/alaska/economy">
  <label>Book your economy trip to Alaska!</label>
  <button>Book it.</button>
</form>
This is 100% backbutton-proof and it’s trivially simple to implement. It took me a while after reading Gilad’s (admittedly, strawman) example to realize that there are people who do not develop webapps this way. If they do this much damage just with a Session stack, how much pain will they be able to cause with continuations?
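To round that out, here’s a sketch of the server side of the hidden-field form above (PHP assumed; book_trip() is hypothetical): everything the handler needs arrives with the request itself, so there is no session stack or continuation to fall out of sync.

<?php
// /actions/book-trip -- a sketch of a handler with no server-side session
// state at all. book_trip() is a hypothetical booking function.
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    header('HTTP/1.0 405 Method Not Allowed');
    exit;
}

$destination = isset($_POST['destination']) ? $_POST['destination'] : null;
$package     = isset($_POST['package'])     ? $_POST['package']     : null;

if ($destination === null || $package === null) {
    header('HTTP/1.0 400 Bad Request');
    exit('Missing destination or package.');
}

book_trip($destination, $package);    // books exactly what the form described
header('Location: /trips/confirmation');
exit;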
The REST people are right, at least on this point: there’s no need to drive a continuation bulldozer through your webapp, when a little REST garden spade will work quite nicely (and won’t tear up your lawn in the process). Don suggests that there may be other, more legitimate use cases for continuations outside of web applications, and I have no reason to disagree, but I would like to look at them pretty carefully.
Is this a serious defect in my own skills and practices, or do others feel the same way? To paraphrase an almost-famous saying by Tim Bray, here’s what I want most from a development environment:
Don Box got people talking last week in a posting where he distinguishes between two kinds of REST: lo-REST, which uses only HTTP GET and POST, and hi-REST, which also uses HTTP PUT and DELETE.
If this distinction doesn’t seem very important, don’t worry — it’s not. Tim Bray captured the most important point, that Don Box (who is heavily involved in REST’s nemesis, Web Services) is talking positively about REST at all. For the RESTafarians and some of their friends, however, Box’s heresy was even worse than his former non-belief, because heresy can easily lead the faithful astray: witness strong reactions from Dimitri Glazkov, Jonnay (both via Dare Obasanjo), and Dare Obasanjo himself. There is even a holy scripture, frequently cited to clinch arguments.
I do not yet have a strong opinion on which approach is better, but I do see a contradiction between the two arguments I hear most often from REST supporters:
Pick one, and only one of these arguments, please. As far as I can see, apart from a few rare exceptions (like WebDAV), Don’s lo-REST — HTTP GET and POST only — is what’s been proven on the web. The pure Book of Fielding, hi-REST GET/POST/PUT/DELETE version is every bit as speculative and unproven as Web Services/SOAP/SOA themselves (that’s not to say that it’s wrong; simply that it’s unproven). Some REST supporters, like Ryan Tomayko, acknowledge this contradiction.
Tim Bray proposes throwing out the REST name altogether and talking instead about Web Style. I like that idea, though the REST name may be too sticky to get rid of by now. Dumping the REST dogma along with the name would clear up a lot of confusion: HTTP GET and POST have actually been proven to work and scale across almost unimaginable volumes; on the other hand, like the WS-* stack, using HTTP PUT and DELETE remains a clever design idea that still needs to be proven practical and scalable.