Sean Conner:
I gave up on dealing with link rot years ago. If I come across an old post
with non-functioning links, I may just find a new resource, link to The
Wayback Machine, or (if some spammer is trying to get me to fix a broken
link by linking to their black-hat-SEO-laden spam farm) remove the link
outright. I don’t think it’s worth the time to fix old links, given that
the average lifespan of a website is two years and trying to automate the
detection of link rot is a fool’s errand (a page that goes 404 or does not
respond is easy; now handle the case where a new company is running the
site and all the old links still go to a page, but not the page that you
linked to). I’m also beginning to think it’s not worth linking at all, but
old habits die hard.
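Sean’s point about automation can be made concrete: a checker that looks only at status codes and timeouts is trivial to write, but it cannot catch the resold-domain case, where every old URL cheerfully returns 200. A minimal sketch of that naive checker (the function name is illustrative, not from anyone’s actual setup):

```python
from urllib.request import urlopen
from urllib.error import URLError, HTTPError


def looks_dead(url: str, timeout: float = 10.0) -> bool:
    """True only for the obvious failures: HTTP errors or no response at all."""
    try:
        with urlopen(url, timeout=timeout):
            # Got a 2xx/3xx response -- but the page behind it may still
            # be the wrong one (new owner, parked domain, etc.).
            return False
    except HTTPError:
        return True  # 404, 410, 500, ...
    except (URLError, OSError):
        return True  # DNS failure, refused connection, timeout


if __name__ == "__main__":
    # The .invalid TLD is reserved and never resolves, so this is "dead".
    print(looks_dead("http://example.invalid/"))
```

Everything past the `return False` line is the easy half of the problem; the hard half is exactly what this function can’t see.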
After about nine years of writing, I’ve concluded something similar: my
existing reactive approach isn’t going to scale, either with expanding
content or over time. Fixing individual links is fine if you have only a
few or aren’t going to be around for long, but as you approach tens of
thousands of links over decades, the dead links pile up.
So the solution I’m going to implement soon is to take a tool like
ArchiveBox or SinglePage and host my own copies of (most) external links,
so they’re cached shortly after linking and can’t break. The bandwidth
and disk space will be somewhat expensive, but it’ll save me and my
readers a ton in the long run.
Always glad to hear your perspective on this. I am working on my own archival
system for href.cool - I’ve already lost thewoodcutter.com and humdrum.life.
(Been going for one year.)
My approach right now is to verify all of the links weekly by comparing last
week’s title tag on each page to this week’s. I’ve had to tweak this a bit -
for PDF links or for pages with dynamically generated titles, I can opt to use
the ‘content-length’ header or a meta tag instead. (Of course, the old title
can be spoofed, so I’m going to improve my algorithm over time.)
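The weekly check could be sketched roughly like this - all names here are illustrative, and this isn’t my actual implementation, just the shape of the idea (extract the `<title>` from HTML responses, fall back to `Content-Length` for PDFs and the like):

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def link_changed(url: str, last_title: str,
                 last_length: Optional[str] = None) -> bool:
    """Return True if the page no longer matches last week's snapshot."""
    with urlopen(url, timeout=10) as resp:
        if "html" in resp.headers.get("Content-Type", ""):
            body = resp.read().decode("utf-8", "replace")
            return extract_title(body) != last_title
        # Non-HTML (e.g. PDF): fall back to comparing Content-Length.
        return resp.headers.get("Content-Length") != last_length
```

A spoofed title still passes this check, which is why it’s only a first line of defense.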
I wish I had a way of seeding sites with you. I imagine we have some crossover
in interests and would also love to contribute bandwidth - as would some of your
other readers, I’m sure.
Reply: Nine Years of Link Rot