In my blog post about how to do digital preservation, I covered setting up a legacy PHP project using Docker, building a static version of the website and then deploying the HTML to GitHub Pages. However, the projects, each more than a decade old, needed a little tinkering to work correctly.
Digital Archaeology
The websites were built for the Apache web server and used .htaccess files to create nice URLs. This technique is employed so the website has friendly URLs, but it also hides the technology used; there are no .php suffixes in sight. This also helped maintain the same URL structure for the static-generated website, which is now serving .html files, but you would never know!
However, for whatever reason, a lot of these projects were missing the important .htaccess file, so I had to do a little digital archaeology. By investigating the code, I reverse engineered the mod_rewrite rules needed to build the correct query parameters. Luckily the projects were relatively small and I was able to map the $_GET parameters expected in PHP to the URLs the website was using.
Database Issues
I had to change from using mysql_query to mysqli_query (the original mysql extension was removed in PHP 7.0), and that also involved passing the connection as the first parameter. Once these changes were made, the database-driven projects started showing a lot more content.
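A minimal sketch of that change is below. The credentials, database name and the pages table are placeholders rather than details from the original projects.

```php
<?php
// Placeholder credentials and table name, for illustration only.
$connection = mysqli_connect('localhost', 'user', 'password', 'legacy_site');

if ($connection === false) {
    die('Connection failed: ' . mysqli_connect_error());
}

// Before: $result = mysql_query('SELECT id, title FROM pages');
// After: the connection handle is passed as the first argument.
$result = mysqli_query($connection, 'SELECT id, title FROM pages');

if ($result === false) {
    die('Query failed: ' . mysqli_error($connection));
}

// mysql_fetch_assoc($result) becomes mysqli_fetch_assoc($result).
while ($row = mysqli_fetch_assoc($result)) {
    echo $row['title'], PHP_EOL;
}

mysqli_close($connection);
```

Most of the other mysql_* calls have a mysqli_* counterpart, although the argument order sometimes differs, so each call is worth checking against the PHP manual.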
For two of the projects, I was missing a database export that was required for certain sections. The website still worked but was missing core functionality. Again, I investigated the codebase, looking for SELECT statements, and then rebuilt the database tables to match. Once I had a basic structure, I populated a few rows and the website sprang to life!
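To illustrate the idea, here is a hypothetical query and the table it implies. The table name, columns, column types and credentials are all invented for this sketch; a real reconstruction would take them from the queries found in the codebase.

```php
<?php
// Hypothetical example: every table, column and credential below is
// invented for illustration and does not come from the original projects.

// A query found in the legacy code might look like this:
//   SELECT id, title, body FROM news WHERE published = 1 ORDER BY created_at DESC
// It names five columns, so the rebuilt table needs at least those five.

$connection = mysqli_connect('localhost', 'user', 'password', 'legacy_site');

// Column types are guesses based on how the PHP code uses each value.
mysqli_query($connection, '
    CREATE TABLE IF NOT EXISTS news (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255) NOT NULL,
        body TEXT NOT NULL,
        published TINYINT(1) NOT NULL DEFAULT 0,
        created_at DATETIME NOT NULL
    )
');

// A placeholder row so the page has something to render.
mysqli_query($connection, "
    INSERT INTO news (title, body, published, created_at)
    VALUES ('Hello again', 'Placeholder content.', 1, NOW())
");

mysqli_close($connection);
```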
Paths and Other Issues
With the .htaccess and MySQL database issues solved, a few other issues cropped up. These were mainly down to URL paths missing a leading slash, which made the links relative to the current page, so when the build script ran it ended up in a loop, generating folders deeper and deeper!
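The loop is easier to see with a simplified link resolver. The resolveLink function below is a hypothetical helper written for this explanation, not part of the build script, and it only handles plain relative and root-relative paths.

```php
<?php
// Simplified resolver: no "..", query strings or absolute URLs,
// which is enough to show how the folders keep getting deeper.
function resolveLink(string $currentPage, string $href): string
{
    // A leading slash means the link is resolved from the site root.
    if (strpos($href, '/') === 0) {
        return $href;
    }

    // Otherwise it is resolved against the folder of the current page.
    $folder = substr($currentPage, 0, strrpos($currentPage, '/') + 1);

    return $folder . $href;
}

// Follow the same navigation link three times, starting from the home page.
$page = '/';
for ($i = 0; $i < 3; $i++) {
    $page = resolveLink($page, 'about/');
    echo $page, PHP_EOL;
}
```

Each pass through the loop prints a path one folder deeper than the last, whereas a link written as /about/ resolves to the same place from every page.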
One website complained about “no default timezone set”, which was easily fixed by adding a call to date_default_timezone_set.
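A minimal sketch of that fix, assuming it is placed in a file that runs before any date functions are called. The 'Europe/London' zone is a placeholder; the right value is whichever timezone the site was written for.

```php
<?php
// 'Europe/London' is a placeholder timezone identifier.
date_default_timezone_set('Europe/London');

// Date functions now have an explicit default timezone to work with.
echo date('Y-m-d H:i'), PHP_EOL;
```

Setting date.timezone in php.ini is another way to provide the default without touching the code.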
A bizarre issue was that GitHub Pages didn't seem to support files whose names contain [ or ] characters. This was how one project named its images so, although they worked locally and in the build folder, they showed the GitHub Pages 404 page when deployed. I had to rename these files and the references to them in the codebase, and they then worked correctly.
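A rename like this can be scripted. The sketch below uses two hypothetical helpers, stripBrackets and renameBracketFiles, and an arbitrary naming scheme (an opening bracket becomes a hyphen, a closing bracket is dropped); none of this is taken from the original project.

```php
<?php
// Turn "photo[1].jpg" into "photo-1.jpg". The scheme is an arbitrary choice.
function stripBrackets(string $name): string
{
    return str_replace(['[', ']'], ['-', ''], $name);
}

// Rename every file in $dir whose name contains brackets.
// Returns a map of old name => new name for updating references afterwards.
function renameBracketFiles(string $dir): array
{
    $renamed = [];

    foreach (scandir($dir) as $name) {
        $clean = stripBrackets($name);

        if ($clean !== $name && is_file("$dir/$name")) {
            rename("$dir/$name", "$dir/$clean");
            $renamed[$name] = $clean;
        }
    }

    return $renamed;
}
```

The returned map can then be applied to the templates with strtr, so that the image references are updated to match the new file names.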