Many people believe that what they do on the Web will stay there forever. Perhaps it is a scaremongering tactic to discourage sharing information, and it is something I used to be wary of myself. However, it is a rather common misconception. Although the Web is vast in scale, and always growing, the assumption that URLs and their data will live on indefinitely is far from the truth.
In 1998, Tim Berners-Lee set out one of the guiding principles of the Web in Cool URIs don’t change:
There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice.
However, those who work on the Web know that, in practice, this ideal rarely holds. Web developer Jeremy Keith, knowing from experience how easily URLs disappear, made a “long bet” predicting:
The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years.
Thankfully, eleven years on, the URL still survives and Jeremy was happily proven wrong. However, that isn't the case for many URLs.
Sometimes URLs are destroyed on a seismic scale, as with GeoCities and the early social network Myspace. These losses are often reported in the media and, ironically, those reports can suffer the same fate of disappearing!
GeoCities was an important outlet for personal expression on the Web for almost 15 years but was discontinued on October 26, 2009.
Luckily, the Internet Archive stepped in and archived GeoCities in the Wayback Machine with a special deep crawl:
The Internet Archive launched several special deep collection crawls, including specific sites nominated by the public, over the last few months that GeoCities was in operation, to help make our archive of GeoCities sites as deep and thorough as possible.
Backups like this are the exception, not the norm. It is heartening to see that the UK Government Digital Service has a policy of No link left behind as part of its work to modernise and consolidate online services, which means legacy URLs from the early, ad hoc days of online government services are maintained.
Brian Suda sums up the scale of the problem in his article Link Rot:
In 18 years, we’ve lost 22% of the content on the Web, sure, the language and ideas are not extinct, but we certainly lost something for future generations.
There is also a steady loss of small, personal URLs from the Web. These losses don't get much attention, but they are something I have seen all too often, to my frustration and sadness, when revisiting links I have posted, especially in my on this day archive, which is otherwise a lovely dive into the past. There are plenty of reasons this happens, such as the recurring cost of domains and hosting, or moving to a new version of a site without taking care to redirect old URLs.