VOGONS

First post, by tkrn

Rank Newbie

Dear Community -

Longtime listener, first-time caller here. It's wonderful what this community and so many other communities have done. As the internet ages, I've taken it upon myself to help index, crawl and preserve sites that are disappearing as time goes on. This is especially relevant for sites that are frozen in time because the maintainer is no longer maintaining them or has abandoned them. Once a site is abandoned, it's only a matter of time before it goes offline for good! My goal is to capture it before that happens!

This is a call to action to help me build a list of sites that fit the criteria to get indexed in this archive collection. You may ask, why not leverage the Wayback Machine? That's a fair question. In my observations, the Wayback Machine does not always capture the binaries associated with these types of sites, for one reason or another, leaving important binaries/data unarchived.

The archiving leverages the Heritrix engine, the same crawl engine behind the Wayback Machine and a number of other private archive projects, generally funded by universities. This means the archives in this collection are stored in an archive-quality format (WARC). It's also worth noting that I currently have approximately 250TB of storage and the potential to scale higher.
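For anyone who hasn't looked inside a WARC file: each record is a version line, a block of named headers, a blank line, and then the payload. Below is a minimal sketch in Python, assuming a single uncompressed WARC/1.0 record held in memory. The helper name `parse_warc_record` and the sample record are made up for illustration and are not part of Heritrix. Real archives are usually gzip-compressed per record and are better read with a dedicated WARC library.

```python
def parse_warc_record(data: bytes):
    """Split one uncompressed WARC record into (version, headers, payload)."""
    # Headers and payload are separated by an empty line (CRLF CRLF).
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/payload separator found")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError("not a WARC record")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the payload size in bytes.
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


# Illustrative record, trimmed for brevity: real records also carry
# mandatory fields such as WARC-Record-ID and WARC-Date.
payload = b"hello"
record = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/file.bin\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
    b"\r\n" + payload + b"\r\n\r\n"
)
version, headers, body = parse_warc_record(record)
```

Because the payload is stored byte for byte alongside its target URI, a binary download captured by the crawler can be pulled back out of the archive unchanged.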

For more details, you can find them here: https://blog.tkrn.io/vintage-computer-game-console-archive/
And a list of currently archived sites: https://blog.tkrn.io/tkrns-archive-indexed-sites/

Call to action: drop suggestions here or use the submission form on my blog for sites that fit the following mission statement:

tkrn’s archive is a niche web archive specifically focused on the preservation of vintage computer technology and video game console related information. There is an emphasis on preserving sites that have information on reverse engineering hardware/software and/or driver/game/mod repositories.

Last edited by tkrn on 2021-09-10, 18:31. Edited 1 time in total.

Reply 1 of 5, by BitWrangler

Rank l33t++

Hi hi, a worthy project, thanks, keep it up!

One set of things I am worried about losing is the Rebels Haven Computer Forum collection of stock and modified BIOSes for AMD boards. The forum went offline around 4 or 5 years ago, but their BIOSes, with a description page for each board, were hosted offsite. Unfortunately, the main directory of them seems to have been generated by a server-side script hosted on the forum's domain, so trying to bring it up through the Wayback Machine does not work. The domain hosting the BIOSes, lejabeach.com, does not appear to have an index accessible from the main page or any other page, just links back to rhcf.com.

However, the google search https://www.google.com/search?q=site:lejabeac … w&start=20&sa=N probably brings up most of what is there, but it needs re-indexing or something.

Last edited by BitWrangler on 2021-09-10, 16:10. Edited 1 time in total.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 2 of 5, by DJMadMax

Rank Newbie

Okay, first of all: this is just great! Preservation, especially of topics that went by faster than the new old next past iPhone and Samsung Galaxy, is pretty important, at least to people like me, and obviously there are a ton of people thinking similarly.

My question is: are you focusing strictly on English websites? I'm from Germany and I am pretty aware of some special German sites and forums revolving solely around oldschool computers and retro gaming.
Additionally: can you actually back up forums? I know that, especially when they're still frequently used, you don't really have a "final" version of an archived forum, but still, better than nothing.

The next question would be whether your archiving is limited to text and images or whether it could also include videos. I'm thinking of some really great YouTube channels revolving around your topic. As long as ABC/Google/YouTube wants, those videos are of course up and everyone can watch them at practically any given time. But what if an account gets deleted? What if YouTube decides to change its policies, what if stuff gets geoblocked, etc.?

As long as it's not a copyright problem (I'm not a lawyer), I would love to have the assurance that "my favourite YouTube channels" (regarding your topic) are redundantly backed up.

Reply 3 of 5, by tkrn

Rank Newbie

Very good points. Thanks for the feedback. To directly address some of the questions thus far: it's not language limited. There are a number of sites that appear to have tons of useful information, and I often translate them with Google to determine whether they have meaningful content. When they do, they get added. Simply put, I'm not tied to English because relevant information is relevant regardless of language or country of origin. The only real implication is that crawling over long distances takes more time.

Copyright is probably always a problem, but this would face the same challenges as university archive projects and the Wayback Machine. I have to think there is legal precedent here. I don't know what it is, and I don't claim to, but none of this is for profit. If anything, if it gains enough momentum (which I hope it does), it becomes its own 501(c)(3).

Interesting point on YouTube. Apart from the main crawler, Heritrix, there is a module that can extract YouTube videos using the youtube-dl package. I had this set up in a previous version, but it never performed correctly. It's on my docket to address because I agree there are a ton of great videos that need to get preserved as well.

@BitWrangler I'm working to get a copy of that. I see a lot of links are broken, but I'm manually building a seed URL list (https://pastebin.com/EfBTpyLa) from which I can start a crawl job. It's crawling right now and has grabbed 164MiB already, with 1400+ URLs queued.
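A hand-built seed list tends to pick up duplicates and stray fragments, so it can help to tidy it before handing it to a crawler. Here is a rough sketch in Python, assuming the seeds are plain text with one URL per line. The function name `clean_seeds` and the sample input are illustrative only and are not taken from the actual pastebin list or from Heritrix.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_seeds(lines):
    """Normalize and de-duplicate seed URLs, keeping first-seen order."""
    seen = set()
    seeds = []
    for raw in lines:
        url = raw.strip()
        # Skip blank lines and comment lines.
        if not url or url.startswith("#"):
            continue
        # Assumption: bare hostnames are meant as plain http:// seeds.
        if "://" not in url:
            url = "http://" + url
        parts = urlsplit(url)
        if not parts.netloc:
            continue
        # Lowercase scheme and host, default the path to "/", drop the fragment.
        normalized = urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(),
             parts.path or "/", parts.query, "")
        )
        if normalized not in seen:
            seen.add(normalized)
            seeds.append(normalized)
    return seeds


# Illustrative input: a duplicate that differs only by case and fragment,
# a blank line, and a bare hostname.
sample = [
    "http://www.lejabeach.com/ECS/ez.html",
    "HTTP://WWW.LEJABEACH.COM/ECS/ez.html#top",
    "",
    "www.lejabeach.com",
]
seeds = clean_seeds(sample)
```

Only the scheme and host are lowercased here, since paths on many web servers are case-sensitive.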

[Image: d5EAkxR.png]

Last edited by tkrn on 2021-09-10, 18:08. Edited 3 times in total.

Reply 4 of 5, by BitWrangler

Rank l33t++

Excellent! There are some manufacturer directories, like ECS, that don't have a real index page in the directory, but there are three pages under it: http://www.lejabeach.com/ECS/ez.html and http://www.lejabeach.com/ECS/k7s6aBIOS.html in addition to the one you've got. Google is being unfriendly and giving me "not a robot" checks every time I search with inurl:, so it's potentially not helpful to automate that part. Then inurl: or text search is giving me nothing for EVGA despite the one page you've listed there, so IDK if there's more than that one EVGA page.

Motherboard, BIOS, workshop, guide seem to be useful text terms, but the results partially duplicate each other. Not sure if they put a full whack of their other guides and things on this domain or if it's just a few strays.

Then there's the odd stray page in the root like http://www.lejabeach.com/CBROM.html, and I'm not sure how to chase all those out.
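One way to chase strays without a directory index is to pull every link out of the pages already known and keep the ones that stay on the same host. This is only a sketch in Python, under the assumption that the known pages link to their siblings. The `LinkCollector` and `same_host_links` names and the sample HTML are made up for illustration and do not reflect the real page contents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url, html):
    """Return absolute, de-duplicated links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc.lower()
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop fragments.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        if urlsplit(absolute).netloc.lower() == host and absolute not in found:
            found.append(absolute)
    return found


# Made-up HTML standing in for a fetched page.
sample_html = (
    '<a href="k7s6aBIOS.html">K7S6A</a> '
    '<a href="/CBROM.html">CBROM</a> '
    '<a href="http://rhcf.com/">forum</a>'
)
links = same_host_links("http://www.lejabeach.com/ECS/ez.html", sample_html)
```

Each newly found URL can be added to the seed list and run through the same step again until nothing new turns up. Pages that nothing links to will still be missed.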

Thanks for taking this on.


Reply 5 of 5, by DJMadMax

Rank Newbie

@tkrn

Thx for the answers 😀 And thanks in BitWrangler's name - that was quick! You might be a machine yourself 😁

If forums are a thing to you, I can name a couple of German ones:
www.voodooalert.de
Main topic is 3Dfx but also any other aspects of retro computer hardware and related gaming/software. It has a front page with some useful information but the biggest part of knowledge of course is hidden in its forum.

www.forum64.de
As its name implies, this site is solely dedicated to Commodore, mainly the C64 (but not only). They have a ton of technical knowledge and once helped me identify a bad memory chip in one of my C64s, which I was able to replace based on their help (finding a compatible memory chip wasn't that easy and some pins of the chip needed to be rerouted/rewired).

www.circuit-board.de
Same as the above but now completely revolving around retro console gaming with tons of DIY and how-to information.

Not a forum:
www.amoretro.de
It features a nice collection of old hardware, some hi-res pictures and lots of background information.

I'll post more when I come across/remember more 😀