Implementing URL Shorteners

When all the talk about the problems with URL shorteners started, a few people, such as Jeremy Keith and Kellan Elliott-McCrea, suggested the use of rev="canonical" as a solution.

Kellan quickly whipped up a working solution which 'searches the referenced resource for a link element that (contains rev="canonical")', showing the shortened URL or, if nothing was found, the original URL. Since setting up this basic solution, he has also created a blog about RevCanonical, documenting all the ideas and implementations which are springing up at a surprising speed.
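To make the idea concrete, here is a minimal sketch of that kind of lookup in Python. It is not Kellan's code: the class and function names are made up for illustration, and it assumes the page's HTML has already been downloaded as a string.

```python
from html.parser import HTMLParser


class RevCanonicalFinder(HTMLParser):
    """Remembers the href of the first <link> whose rev attribute includes 'canonical'."""

    def __init__(self):
        super().__init__()
        self.short_url = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements are of interest, and only the first match is kept.
        if tag != "link" or self.short_url is not None:
            return
        attributes = dict(attrs)
        # rev may hold several space-separated tokens, or be missing entirely.
        rev_tokens = (attributes.get("rev") or "").lower().split()
        if "canonical" in rev_tokens and attributes.get("href"):
            self.short_url = attributes["href"]


def find_short_url(html_text, original_url):
    """Return the advertised short URL, or the original URL if none is found."""
    finder = RevCanonicalFinder()
    finder.feed(html_text)
    return finder.short_url or original_url
```

Calling find_short_url(page_html, long_url) hands back the long URL unchanged when the page advertises nothing, which matches the fallback behaviour described above.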

There is currently a small caveat in using the rev attribute, which is available for both <link> and <a> HTML elements. The newest version of HTML, HTML5, which is gaining great traction in both the web community and with browser vendors, has dropped support for the attribute. The reasoning behind this was that the attribute is rarely ever used. In contrast, the 'forward-facing' rel attribute is still available in HTML5, as it is used a lot when linking to alternative content, stylesheets, feeds etc. This is why some people are suggesting using rel="alternate short" or variants of this, but for the meantime you could implement both. However, because the HTML5 specification is still a work in progress, this sudden surge of use-cases could see the rev attribute re-appear.
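If you do decide to implement both, the markup is just two <link> elements pointing at the same short URL. A small sketch of generating them in Python follows; the helper name and the choice of two separate elements (rather than one element carrying both attributes) are my own assumptions.

```python
from html import escape


def short_link_tags(short_url):
    """Build <link> elements advertising short_url via both rev and rel."""
    # Escape the URL so characters such as & or " cannot break the attribute.
    href = escape(short_url, quote=True)
    return [
        f'<link rev="canonical" href="{href}">',
        f'<link rel="alternate short" href="{href}">',
    ]
```

Both strings would be printed inside the <head> of the long, descriptive page.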

Simon Collison has created an ExpressionEngine plugin, which you can read about by following his new shortened URL. Chris Shiflett talks about using an X-Rev-Canonical HTTP header, which saves services that want to implement the short URL lookup from having to parse the entire HTML. Akrabat has written a WordPress plugin for URL shortening on your own website, which implements the HTTP header that Chris Shiflett suggested as well as rev="canonical" and the HTML5-safe rel="alternate short".
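The appeal of the header approach is that a service only needs a HEAD request rather than the whole document. A rough sketch using Python's standard library is below; the function names are hypothetical, and it assumes the header value is simply the short URL on its own.

```python
import urllib.request


def short_url_from_headers(headers):
    """Pull the short URL out of a mapping of response headers, if present."""
    # Assumption: the header value is the bare short URL and nothing else.
    value = headers.get("X-Rev-Canonical")
    return value.strip() if value else None


def fetch_short_url(url, timeout=5):
    """Send a HEAD request and return the advertised short URL, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return short_url_from_headers(response.headers)
```

A service could try fetch_short_url first and only download and parse the HTML when it returns None.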

Some large sites have already implemented this system: PHP.net and Flickr, among others, are using rev="canonical". Flickr have even used a new shorter domain – http://flic.kr – to make their new URLs as short as possible.

Simon Willison has written a great article about "rev=canonical bookmarklet and designing shorter URLs" which includes a bookmarklet to help you discover whether the page you're on has explicitly specified a short URL itself, otherwise suggesting a URL shortening service, such as TinyURL.

What needs to happen now is for URL shortening services, such as TinyURL and bit.ly, to implement a lookup system similar to the one Simon Willison put together. When a user goes to shorten a URL, the service would give back the preferred version if one exists and otherwise fall back to its own system. Services which make great use of shortened URLs, such as Twitter, should implement something like this natively, which would give this new concept great traction.
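The 'preferred version first, own system second' behaviour could look something like the sketch below. This is not how any of these services actually work; lookup_preferred and fallback_shortener are hypothetical callables that the service would supply.

```python
def shorten(url, lookup_preferred, fallback_shortener):
    """Return the site's own short URL if it advertises one, else make our own.

    lookup_preferred(url) should return a short URL string or None.
    fallback_shortener(url) should always return a short URL string.
    """
    try:
        preferred = lookup_preferred(url)
    except Exception:
        # A failed lookup (timeout, bad HTML, etc.) should never block shortening.
        preferred = None
    return preferred or fallback_shortener(url)
```

For example, shorten(url, lambda u: None, lambda u: "http://sho.rt/abc") falls through to the service's own short URL, because the lookup found nothing.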

I have implemented the system, using both the rev and rel attributes on the <link> element, for my movies and blog sections. The redirects from the shortened URL to the full descriptive URL use the 301 Moved Permanently status code.
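For anyone wanting to do something similar, a 301 redirect from a short path to its descriptive counterpart can be sketched as a tiny WSGI application. This is not the code running on this site; the paths in the table are invented for the example.

```python
# Hypothetical mapping from short paths to full descriptive paths.
SHORT_TO_LONG = {
    "/b/1": "/blog/2009/04/implementing-url-shorteners/",
    "/m/7": "/movies/2009/some-film/",
}


def redirect_app(environ, start_response):
    """WSGI app that answers known short paths with a 301 to the long URL."""
    target = SHORT_TO_LONG.get(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    # 301 tells browsers and search engines the long URL is the permanent home.
    start_response("301 Moved Permanently", [("Location", target)])
    return []
```

It can be tried locally with wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever().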

Some people have complained that this solution is solving a non-problem; that is, URLs don't need to be long in the first place. I disagree. URLs should be easy to understand and 'hackable'. I am unaware of when the convention of dates and slugs for URLs started, but it is commonplace in blogs, such as those running WordPress, and in well-written websites. 'Nice' URLs, that is, URLs which describe the page you're visiting, are good for SEO, and this is what people who complain about them being too long use as their weapon. However, I think 'nice' URLs have two very important functions: firstly, describing the page you're visiting, and secondly, being hackable. In most blog-style URLs you can remove the article slug and view articles posted that month, and likewise, removing the month shows articles from that year; this is a great system for exposing more data. URLs with simply an ID in them tell the user nothing about the content and are not hackable, which is why the places that have quickly implemented these shortened URLs for their own content simply use them as a shortcut to the descriptive and hackable URL. The W3C have a good starter guide for these types of URLs: "Cool URIs don't change".
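To show what 'hackable' means in practice, this small Python sketch lists the pages you would reach by chopping segments off the end of a blog-style path. The function name and the example path are made up, and it assumes the site really does serve an archive page at each level.

```python
def hackable_parents(path):
    """List the parent paths reached by removing trailing segments one at a time."""
    segments = [segment for segment in path.split("/") if segment]
    # Drop one more segment on each step, stopping before the site root.
    return [
        "/" + "/".join(segments[:count]) + "/"
        for count in range(len(segments) - 1, 0, -1)
    ]


print(hackable_parents("/2009/04/implementing-url-shorteners/"))
# prints ['/2009/04/', '/2009/']
```

An ID-only URL such as /p/123 gives a visitor nothing comparable to work with.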