Comments on: XML 2006 pickled and preserved

By: Anonymous Coward

Anonymous Coward — Wed, 18 Apr 2007 18:20:54 +0000

Hi David, just found your blog and I like it…

I agree that ideally you shouldn’t show {.asp,.jsp,.php} extensions… Yet many of the planet’s most successful websites (both in terms of availability and users) do use them (well, at least they don’t bother to hide these extensions).

Regarding the security concern, hiding the extension is just “security by obscurity”. Now I’m certainly not saying the website is *less* secure… but you’re not gaining much (some would say that with security by obscurity you’re gaining nothing).

Regarding the bank websites: I know that, out of security concerns, most of the banking industry uses Java and Java-backed webapp servers. I like that, for I know that there hasn’t been a single buffer overflow targeting any JVM since Java has existed (there have been exploits in third-party libs written in C, like zlib, though). So when I’m on a website and I see “.jsp” I tend to think: it’s backed by Java, it’s probably not as insecure as a .asp or .php website.

But that may be just me.

Anyway, this is a false argument, for only a very small percentage of the population, actually an insignificant one, knows what it means when they see .php or .asp or .jsp etc.

I agree it’s better not to show the extensions, but I wouldn’t consider showing them to be such a huge problem.

By: david

david — Tue, 23 Jan 2007 16:09:06 +0000

Deepak:

Exposing scripting extensions leads to a huge range of problems for web sites:

The extension exposes both the scripting environment being used and the architecture of the site (e.g. get-bank-account.asp?account=12345), providing valuable information to any would-be cracker.

The scripting extension makes long-term maintenance of the site very difficult, since you will have to either break all existing links and bookmarks or add a complicated, high-maintenance set of redirects as the site’s architecture or web framework changes over the years (even something as simple as splitting one script into two or vice-versa could break thousands of external links and bookmarks).

Search engines cannot index pages that rely on POST parameters, and often won’t index pages that rely on GET parameters, so you’re damaging the site’s search-engine placement.

(Less important) scripting extensions look tacky and amateurish, and reflect especially badly on big companies like banks and merchants who rely on online trust (if they don’t know enough to hide the script extensions, do they really know enough to protect my credit card number?).

For a more detailed and thoughtful discussion of this topic, see Tim Berners-Lee’s famous paper, Cool URLs don’t change — there are some real-world examples at the end of sites bombing miserably from not following the advice.
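To make the maintenance point concrete, here is a minimal Python sketch of the idea, not a description of any real site. The routing table, the script names, and the resolve() helper are all hypothetical; the point is only that the public URL stays fixed while the script behind it can be renamed, split, or replaced.

```python
# Hypothetical routing table: stable public paths -> whatever script serves them today.
# Changing the framework means editing the right-hand side only; links and bookmarks
# that point at the left-hand side keep working.
ROUTES = {
    "/programme/presentations": "show_presentation.php",
    "/accounts": "get-bank-account.asp",
}


def resolve(public_path):
    """Return (script, trailing_id) for a public path, or None if no route matches."""
    path = public_path.rstrip("/")
    for prefix, script in ROUTES.items():
        if path == prefix:
            return script, None
        if path.startswith(prefix + "/"):
            # Everything after the prefix is passed to the script as an identifier.
            return script, path[len(prefix) + 1:]
    return None
```

With this in place, resolve("/programme/presentations/123") hands the id "123" to whichever script is currently registered, and the visitor never sees the script name or its extension.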

By: Deepak Shetty

Deepak Shetty — Tue, 23 Jan 2007 08:33:27 +0000

What’s your rationale behind
“Script names are not shown to the public, so there are no URLs ending in “php” (hint: exposed script extensions like “php”, “asp”, or “jsp” are signs of gross incompetence in web design). “?

By: david

david — Sat, 20 Jan 2007 12:09:26 +0000

Thanks, Ed. Content negotiation sounded cool back in the 1990s when Tim B-L and others at the W3C were pushing it so hard, but outside of their own site, I haven’t seen it much (if at all) in the wild. I wonder if it’s one abstraction too far.

Relying simply on well-known file extensions (html, png, jpg, pdf) for media identification worked very well in this case, with one exception, which I didn’t notice until after I made the posting: I was relying on the web server to send out the right character encoding (I had used .htaccess in Apache), so I’ve had to send a note to IDEAlliance’s ISP asking all files to be served out as UTF-8. Right now, IIS is sending them simply as ‘Content-type: text/html’ with no encoding specified. Firefox guesses UTF-8 correctly, but MSIE doesn’t. That should be fixed early next week. IIS also sends out stupid caching headers, but I’ve also requested that those be fixed.
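For anyone unsure what the missing encoding looks like on the wire, here is a small Python sketch of the header difference. The content_type_header() helper is hypothetical and only illustrates the header text; it says nothing about how IIS or Apache are actually configured.

```python
from typing import Optional


def content_type_header(media_type: str, charset: Optional[str] = "utf-8") -> str:
    """Build a Content-Type header line, adding a charset for text/* types only."""
    if charset and media_type.startswith("text/"):
        return f"Content-Type: {media_type}; charset={charset}"
    # Binary types such as image/png carry no charset parameter.
    return f"Content-Type: {media_type}"
```

Calling content_type_header("text/html", None) gives the bare header that leaves the browser guessing, while the default call names UTF-8 explicitly so no guessing is needed.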

By: Ed Davies

Ed Davies — Sat, 20 Jan 2007 10:23:44 +0000

An issue you don’t mention is preservation of the media type. A cooler implementation would use “/programme/presentations/123” without the .html extension, leaving the choice of media type to content negotiation. However, that doesn’t work so well when the files are stored in most file systems, which do not preserve media type.

I assume that, in practice, you required a reliable mapping between filename/URL extension and media type.
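Such a mapping might look something like the Python sketch below. The table and the media_type_for() helper are assumptions for illustration, covering only a handful of well-known extensions; the fallback to application/octet-stream is likewise a choice made for the example.

```python
import posixpath

# Assumed extension -> media type table, limited to a few well-known extensions.
MEDIA_TYPES = {
    ".html": "text/html",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".pdf": "application/pdf",
}


def media_type_for(url_path: str) -> str:
    """Look up the media type from the URL's extension, with a generic fallback."""
    _root, ext = posixpath.splitext(url_path)
    return MEDIA_TYPES.get(ext.lower(), "application/octet-stream")
```

Note that an extensionless path such as “/programme/presentations/123” falls through to the generic type, which is exactly the information a plain file system loses.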