Broken Links Analysis

In a previous post, I talked about the loss of links on the web. In Brian Suda’s research on his Pinboard links, he found that 22% of the links were gone. I have links on my website dating back to early 2006, so I was curious how many would still be working.

So I did a similar analysis of the links I have posted and found that almost a third of them are now broken: some because the domain is no longer available, and some because the website returns a “not found” (404) response. The method isn’t perfect, but as Brian writes:

If the HTTP code is less than 400 we mark it as a success. Without manually checking every URL, there might be some false positives: people selling existing domains, hosting provider redirects, etc. If the status code was 400 or higher, we marked it as a failure.
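Brian's rule can be written as a small, framework-free PHP function. This is only a sketch: isBrokenStatus is a hypothetical name, and a null status is an assumption standing in for a request that never got an HTTP response (for example, a domain that no longer resolves). Laravel's successful() check, used in the command further down, is slightly stricter because it only accepts 2xx codes. However, Laravel's HTTP client follows redirects by default, so a redirect to a live page still ends up as a success.

```php
<?php

/**
 * Hypothetical helper: classify a link using Brian Suda's rule.
 *
 * @param int|null $status Final HTTP status code, or null when no HTTP
 *                         response was received (DNS failure, refused
 *                         connection, timeout, ...).
 */
function isBrokenStatus(?int $status): bool
{
    // No response at all counts as broken (e.g. the domain is gone).
    if ($status === null) {
        return true;
    }

    // Below 400 is a success; 400 and above is a failure.
    return $status >= 400;
}

foreach ([200, 301, 404, 500, null] as $status) {
    printf("%s => %s\n", $status ?? 'no response', isBrokenStatus($status) ? 'broken' : 'ok');
}
```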

These are the results I found. Out of the 1546 links in my collection, 507 returned an error: 32.8% of all links, and over 40% of those I added in 2006.

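+-------+-------+--------+------------+
| Year  | Links | Broken | Percentage |
+-------+-------+--------+------------+
| 2006  | 760   | 325    | 42.8%      |
| 2007  | 156   | 49     | 31.4%      |
| 2008  | 50    | 19     | 38%        |
| 2009  | 96    | 26     | 27.1%      |
| 2010  | 159   | 62     | 39%        |
| 2011  | 102   | 25     | 24.5%      |
| 2022  | 199   | 1      | 0.5%       |
| 2023  | 24    | 0      | 0%         |
| Total | 1546  | 507    | 32.8%      |
+-------+-------+--------+------------+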
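Each percentage is the number of broken links divided by the number of links checked, rounded to one decimal place. For the overall total and for 2006:

```latex
\frac{507}{1546} \times 100 \approx 32.8\%
\qquad
\frac{325}{760} \times 100 \approx 42.8\%
```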

To investigate this, I built two Laravel Artisan commands: one to find broken links and another to generate the report above.

Finding Broken Links

I followed an approach very similar to Brian’s: make an HTTP request to each URL and check whether the response is unsuccessful. I can run the command just to check for broken links, or add the --update flag to also mark them as broken in the database:

php artisan link:error-checking
php artisan link:error-checking --update

Below is the command code (without the namespace declaration):

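use App\Models\Link; // Assumed location of the Link model.
use GuzzleHttp\Exception\RequestException;
use Illuminate\Console\Command;
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Support\Collection;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Str;
use Illuminate\Support\Traits\Conditionable;

class ErrorChecking extends Command
{
    use Conditionable;

    /** @var string */
    protected $signature = 'link:error-checking {--update}';

    /** @var string */
    protected $description = 'Check links for HTTP errors and optionally mark them as broken.';

    /**
     * Extra Guzzle options for every request. With certificate verification
     * disabled, an expired or self-signed certificate does not count as broken.
     */
    protected array $httpOptions = [
        'verify' => false,
    ];

    public function handle(): int
    {
        $errorLinks = Collection::make();
        $update = $this->option('update');

        // withProgressBar() returns the iterable it was given once it has finished.
        $links = $this->withProgressBar(Link::all(), function (Link $link) use ($errorLinks): void {
            if ($this->isLinkAnError($link)) {
                $errorLinks->push($link);
            }
        });

        $this->newLine();

        $errorLinks->each(function (Link $link) use ($update): void {
            $this->error($link->permalink);

            // Only write to the database when --update is passed.
            $this->when($update, fn () => $link->markAsBroken());
        });

        $this->info($this->message($links->count(), $errorLinks->count()));

        return Command::SUCCESS;
    }

    protected function isLinkAnError(Link $link): bool
    {
        try {
            // successful() is true only for a 2xx status; redirects are followed first.
            return ! Http::withOptions($this->httpOptions)->get($link->permalink)->successful();
        } catch (ConnectionException | RequestException) {
            // No usable response: DNS failure, refused connection, timeout,
            // too many redirects and similar transport errors.
            return true;
        }
    }

    protected function message(int $total, int $errorCount): string
    {
        $link = Str::plural('link', $total);
        $error = Str::plural('error', $errorCount);
        $were = $errorCount === 1 ? 'was' : 'were';

        return "{$total} {$link} checked. There {$were} {$errorCount} {$error}.";
    }
}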
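Stripped of Laravel, the command boils down to checking each URL, collecting the failures and printing a grammatical summary. The sketch below is a framework-free approximation, not the code above: checkLinks and summaryMessage are hypothetical names, the status lookup is passed in as a callable so no real HTTP requests are made, it applies Brian's "below 400 is a success" rule rather than Laravel's 2xx check, and a simple singular/plural switch stands in for Str::plural.

```php
<?php

/**
 * Hypothetical sketch: return the URLs that count as broken.
 *
 * @param string[] $urls
 * @param callable $statusFor Given a URL, returns the final HTTP status code,
 *                            or null when no response was received.
 * @return string[]
 */
function checkLinks(array $urls, callable $statusFor): array
{
    $broken = [];

    foreach ($urls as $url) {
        $status = $statusFor($url);

        // Brian's rule: no response at all, or a status of 400+, is an error.
        if ($status === null || $status >= 400) {
            $broken[] = $url;
        }
    }

    return $broken;
}

/**
 * Builds the same summary sentence as the command's message() method.
 */
function summaryMessage(int $total, int $errorCount): string
{
    // Simple singular/plural switch; enough for "link" and "error".
    $link = $total === 1 ? 'link' : 'links';
    $error = $errorCount === 1 ? 'error' : 'errors';
    $were = $errorCount === 1 ? 'was' : 'were';

    return "{$total} {$link} checked. There {$were} {$errorCount} {$error}.";
}

// Fake status lookup standing in for real HTTP requests.
$statuses = [
    'https://example.com/' => 200,
    'https://example.com/missing' => 404,
    'https://gone.example/' => null,
];

$broken = checkLinks(array_keys($statuses), fn (string $url): ?int => $statuses[$url]);

// Prints "3 links checked. There were 2 errors."
echo summaryMessage(count($statuses), count($broken)), PHP_EOL;
```

Passing the lookup in as a callable keeps the loop testable; in a real command it would wrap an HTTP request.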

Broken Link Report

For the report, I can query my local database, now that the broken links are marked in it. Running php artisan link:error-report generates a table:

+-------+-------+--------+------------+
| Year  | Links | Broken | Percentage |
+-------+-------+--------+------------+
| 2006  | 760   | 325    | 42.8%      |
| 2007  | 156   | 49     | 31.4%      |
| 2008  | 50    | 19     | 38%        |
| 2009  | 96    | 26     | 27.1%      |
| 2010  | 159   | 62     | 39%        |
| 2011  | 102   | 25     | 24.5%      |
| 2022  | 199   | 1      | 0.5%       |
| 2023  | 24    | 0      | 0%         |
| Total | 1546  | 507    | 32.8%      |
+-------+-------+--------+------------+

I can change the style of the table with the --style option (for example, php artisan link:error-report --style=box). It accepts the table styles built into the underlying Symfony Console component: default, borderless, compact, symfony-style-guide, box or box-double. Unfortunately, there isn't currently a Markdown style among them.
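One workaround is to format the rows as Markdown by hand. A minimal, framework-free sketch, assuming the rows are arrays of cell values like the ones the report builds; toMarkdownTable is a hypothetical helper, and it does not escape | characters inside cells:

```php
<?php

/**
 * Hypothetical helper: render headers and rows as a Markdown (pipe) table.
 *
 * @param string[] $headers
 * @param array[]  $rows Each row is an array of cell values; keys are ignored
 *                       and the values are used in order.
 */
function toMarkdownTable(array $headers, array $rows): string
{
    $line = fn (array $cells): string => '| ' . implode(' | ', $cells) . ' |';

    $lines = [
        $line($headers),
        // Separator row: one "---" per column marks the line above as the header.
        $line(array_fill(0, count($headers), '---')),
    ];

    foreach ($rows as $row) {
        $lines[] = $line($row);
    }

    return implode(PHP_EOL, $lines) . PHP_EOL;
}

echo toMarkdownTable(
    ['Year', 'Links', 'Broken', 'Percentage'],
    [
        [2022, 199, 1, '0.5%'],
        [2023, 24, 0, '0%'],
    ]
);
```

Inside a command, the returned string could be written out with $this->line() instead of calling $this->table().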

Below is the code for the report command:

use App\Models\Link; // Assumed location of the Link model.
use Illuminate\Console\Command;

class ErrorReport extends Command
{
    /** @var string */
    protected $signature = 'link:error-report {--style=default}';

    /** @var string */
    protected $description = 'Output a report about broken links.';

    protected array $tableHeaders = [
        'Year',
        'Links',
        'Broken',
        'Percentage',
    ];

    public function handle(): int
    {
        // One row per year with the number of links and how many are broken.
        // YEAR() is MySQL/MariaDB syntax; the conditional SUM counts the
        // broken links in the same pass over the table.
        $errorLinks = Link::query()
            ->selectRaw('YEAR(added) as `Year`')
            ->selectRaw('count(*) as `Total`')
            ->selectRaw('sum(case when broken = 1 then 1 else 0 end) as `Broken`')
            ->groupBy('Year')
            ->orderBy('Year')
            ->get()
            ->map(fn (Link $link): array => [
                'Year' => $link->Year,
                // MySQL can return SUM() as a numeric string, so cast explicitly.
                'Total' => (int) $link->Total,
                'Broken' => (int) $link->Broken,
                'Percentage' => $this->percentage((int) $link->Broken, (int) $link->Total),
            ]);

        // The sums are evaluated before add() appends the total row,
        // so they only cover the per-year rows.
        $this->table(
            $this->tableHeaders,
            $errorLinks->add([
                'Total',
                $errorLinks->sum('Total'),
                $errorLinks->sum('Broken'),
                $this->percentage($errorLinks->sum('Broken'), $errorLinks->sum('Total')),
            ]),
            $this->option('style')
        );

        return Command::SUCCESS;
    }

    protected function percentage(int $broken, int $total): string
    {
        // One decimal place, e.g. 325 of 760 gives "42.8%".
        $percentage = round(($broken / $total) * 100, 1);

        return "{$percentage}%";
    }
}
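The same report can also be built in plain PHP, which is handy for checking the query's output or for databases without a YEAR() function, such as SQLite or PostgreSQL. A hypothetical sketch: buildReportRows is an illustrative name, and the input shape (one [year, isBroken] pair per link) is an assumption. Like the command, it groups by year, counts all and broken links, sorts by year and appends a total row.

```php
<?php

/**
 * Hypothetical sketch: build the report rows from one [year, isBroken] pair per link.
 *
 * @param array<int, array{0: int, 1: bool}> $links
 * @return array<int, array{0: int|string, 1: int, 2: int, 3: string}>
 */
function buildReportRows(array $links): array
{
    $counts = [];

    // GROUP BY year: count every link and every broken link per year.
    foreach ($links as [$year, $isBroken]) {
        $counts[$year] ??= ['total' => 0, 'broken' => 0];
        $counts[$year]['total']++;

        if ($isBroken) {
            $counts[$year]['broken']++;
        }
    }

    // ORDER BY year.
    ksort($counts);

    $rows = [];
    foreach ($counts as $year => $count) {
        $rows[] = [$year, $count['total'], $count['broken'], percentage($count['broken'], $count['total'])];
    }

    // Grand total row, appended after the per-year rows.
    $total = array_sum(array_column($counts, 'total'));
    $broken = array_sum(array_column($counts, 'broken'));
    $rows[] = ['Total', $total, $broken, percentage($broken, $total)];

    return $rows;
}

/**
 * Same formatting as the command: one decimal place, trailing zeros dropped.
 */
function percentage(int $broken, int $total): string
{
    if ($total === 0) {
        return '0%'; // Avoid dividing by zero for an empty data set.
    }

    $percentage = round(($broken / $total) * 100, 1);

    return "{$percentage}%";
}
```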