Whenever I download drivers or installers for anything I own, I keep the files archived. However, that doesn't help when I get something new and its support pages are already gone.
It wasn't long ago that I needed to grab drivers from Toshiba's site for a particular Pentium 75 MHz laptop model that I had picked up from Goodwill. I was glad they were still there.
It's kind of annoying that stuff like this is being deleted. Storage is cheaper than ever, and the files in question for old obsolete devices will almost by definition be small by modern standards. They don't need to "maintain" the current trendy look for that part of the web site; they could just throw the files into an FTP directory or something.
raymangold wrote:
I've already performed a mass archive of many websites. One good piece of software is HTTRACK: http://www.httrack.com/
The few times I've used httrack, I've been paranoid that a site admin or even some automated server script would take offense and IP ban me. I've read some stories about that happening. The default download settings are aggressive enough to worry me, so I always end up configuring it to be much, much slower. I have no idea what level of abuse actually gets an admin's attention though, and I don't think asking them directly would go over well. I figure staying below the radar is best.
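To make "much, much slower" a bit more concrete, here is a minimal Python sketch of the general idea: never start a request sooner than a fixed number of seconds after the previous one. This is not how httrack does it internally, and the `Throttle` class and its `min_interval` parameter are made-up names for illustration only. What interval counts as polite is a guess that depends on the site.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before every download keeps the request rate under one per `min_interval` seconds, no matter how fast the server answers.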
I've had difficulty getting it configured to crawl a site just as I intend, without missing important sections or files but also without trying to archive the entire internet. I've had to do a lot of trial and error on that, and I almost always find flaws with my attempts, but at some point I settle for "good enough".
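The scoping problem boils down to deciding, for every link found, whether it is still part of the site section being mirrored. A rough Python sketch of such a filter, assuming the simplest possible rule (same host, under one path prefix), could look like this. The function name `in_scope` and the example host are hypothetical, and real sites often need more exceptions than this, for example downloads hosted on a separate file server.

```python
from urllib.parse import urlsplit


def in_scope(url, allowed_host, path_prefix="/"):
    """Return True if url is http(s), on allowed_host, and under path_prefix."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    if (parts.hostname or "").lower() != allowed_host.lower():
        return False
    path = parts.path or "/"  # treat a bare host as the root path
    return path.startswith(path_prefix)
```

A rule this strict errs toward missing files rather than wandering off across the internet, which is the trade-off described above.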
I had one quite large mirror project that thankfully had a straightforward site structure but lots of large files. I think it took over a month, and it still didn't quite finish before a power outage cut it short. That's the downside of trying to be inoffensively slow, I guess. Sadly, httrack's supposed ability to resume an incomplete mirror is pretty broken. It doesn't really work as far as I can tell.
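For individual large files, resuming is something plain HTTP supports through the `Range` request header, as long as the server honors it. The sketch below is a generic illustration, not a description of what httrack does: it builds the header from however many bytes are already on disk. The helper name `resume_headers` and the idea of a `.part` file are assumptions for the example.

```python
import os


def resume_headers(partial_path):
    """Return request headers that ask the server for only the missing bytes."""
    try:
        have = os.path.getsize(partial_path)
    except OSError:
        have = 0  # nothing on disk yet, so request the whole file
    if have == 0:
        return {}
    # "bytes=N-" means: send everything from byte offset N to the end
    return {"Range": f"bytes={have}-"}
```

If the server answers 206 Partial Content, the new data can be appended to the partial file. If it answers 200 instead, it ignored the range, and the file has to be downloaded again from the start.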
To the extent that I can tame it and get it to do what I actually want it to do, it's an awesome tool though. It's just really tricky and intimidating to get it set up right, and I find it requires a dedicated machine with no particular timeframe for completion, and a lot of patience for starting over when I screw something up. But when it goes right, the end result can be amazing.
A few years ago I mirrored a few motherboard manufacturer support sites, but I missed important files in some of those.
The oldest "mirror" I have is a set of CDs that Intel mailed to me. It's a copy of some portions of their web site from 1998. Recently I looked through it trying to find high-quality motherboard pictures, but they weren't high quality at all. It was 1998, the age of dialup.