VOGONS


First post, by pyrogx

User metadata
Rank Member

The other day I was messing around with some 3D accelerators and I was annoyed by the fact that Voodoo 1s (SST1) didn't want to cooperate with my 1GHz Athlon test system. Web search results mostly came down to "Yeah, Voodoo1s have issues with fast systems, FSB issue, don't go above 66MHz, don't use CPUs above 500MHz, use an older system, etc, etc..."
But my inner scientist was asking the question:
Why?
What's wrong with the first Voodoo cards that they stop working properly on systems with fast CPUs or FSB speeds, although the card itself still sits on a standard PCI bus with 33MHz? What does FSB speed have to do with it? Or CPU clock speed?

So I started poking around: Different drivers, different versions of the Glide libraries, different OSs (DOS, Windows, Linux), different chipset settings affecting the PCI bus behaviour. The result was always the same:
The card starts to initialize, the VGA passthrough is turned off - and then the card hangs, displays whatever garbage is currently in the framebuffer, and most of the time also takes the rest of the system down.

Anyway, I noticed a few things:
- On very rare occasions the Voodoo1 would work properly, but only once and not reproducibly.
- If I go to an 800MHz CPU, this happens more often but still not 100% of the time.
- If I go to a 500MHz Athlon (the slowest one I have), the card works fine.
The rest of the system including the 100MHz FSB is always the same. So the whole mess is not really related to FSB speed, at least not on my system. Also, if I turn off caches or slow the system down with something like THROTTLE.EXE, the Voodoo also works with the 1GHz Athlon, although as a slideshow simulator.

So I thought: Maybe it's not the hardware but the software, i.e. the drivers? Do they contain some speed-sensitive code which breaks on faster systems?
Since I had Linux on that machine already, I grabbed one of the many Glide source code forks (https://github.com/sezero/glide), compiled it for the SST1, verified that it worked on a slow system (it does) and that it failed on a fast system (it does).
I knew that the card failed early during init, so I started single-stepping through the glide init code with a debugger. Soon I noticed that if I single-stepped the program, the Voodoo works on my fast CPU, but if I set a breakpoint somewhere after initialization, it failed.

And then, when crawling through the actual code, I noticed it contains something like this:

/* glide2x/sst1/init/initvg/util.c */
/* Wait for command to execute... */
for(n=0; n<25000; n++)
    sst1InitReturnStatus(sstbase);

...and also this:

/* glide2x/sst1/init/initvg/video.c */
/* Wait for video clock to stabilize */
for(n=0; n<200000; n++)
    sst1InitReturnStatus(sstbase);

...and quite a few more variants of that in different files.

This reeks of a delay loop being executed too fast for the card. The function sst1InitReturnStatus does not seem to do anything special except for returning the value of the memory-mapped status register of the SST1 FBI.
But the for-loop does not even look at the result, so it is just a delay loop running in circles around that register.
That's a recipe for failure on faster systems. It looks like if you perform the initialization steps too quickly for the SST1, it gets confused and drops into an undefined state (i.e. it hangs...).
Just for fun, I tried to modify the source code of sst1InitReturnStatus() so that it slows down a bit by using a nanosleep() call in that function. I added a delay of just 20ns on each call of sst1InitReturnStatus(), recompiled and checked if it made any difference.
It did. The Voodoo came up with no issues using my modified glide library, every single time, with no more hangs or crashes. I could even play GLQuake and Quake II using FXMesa without problems.
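A rough sketch of the experiment described above, not the actual patch. Here `fake_status_reg` stands in for the memory-mapped SST1 status register, and `status_read_with_delay` and `wait_for_command` are hypothetical names, not functions from the Glide source. One caveat worth keeping in mind: on a normal Linux kernel, nanosleep() usually sleeps considerably longer than a requested 20 ns because of timer and scheduling overhead, so the effective delay per call was probably much larger than the number suggests.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the memory-mapped SST1 status register (assumption:
   the real code reads it through the mapped base address). */
static volatile uint32_t fake_status_reg = 0;

/* Hypothetical variant of the status read: sleep briefly, then return
   the current register value. */
static uint32_t status_read_with_delay(volatile uint32_t *status_reg)
{
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 20 }; /* 20 ns requested */
    nanosleep(&pause, NULL); /* sleeps at least this long, usually longer */
    return *status_reg;
}

/* Driver-style wait loop: the result is still ignored, but every
   iteration is now paced by the sleep inside the read. */
static void wait_for_command(volatile uint32_t *status_reg, int iterations)
{
    for (int n = 0; n < iterations; n++)
        (void)status_read_with_delay(status_reg);
}
```

With this change the total wait no longer depends only on how fast the CPU can spin through the loop, which would explain why the card came up reliably.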

So the whole problem (at least on my particular Athlon system, I didn't test even faster systems yet) was just a timer underrun during hardware setup. I don't know for sure why 3Dfx did this but I have a suspicion: The Voodoo1 (and the Voodoo2) have no active feedback channel of their current state like an interrupt. Just a status register that you have to poll a lot and maybe that register doesn't even track all the states of the hardware, especially during memory and video clock setup. That's a very simple way to implement that in hardware, just fire a series of commands at different registers, wait a bit and hope for the best that the hardware does what it needs to do in time...

I will try to compile a glide2x.dll and glide3x.dll for Win9x now, but I need to find a replacement for nanosleep(), which is a POSIX function and not available on Win9x.

Reply 1 of 30, by BitWrangler

User metadata
Rank l33t++

Wow, amazing that went so long without anyone patching it.

Did you think of taking a look at the Voodoo 2 drivers to see what they did there?

Otherwise, maybe use tricks from the slowdown utils, but only for a beat or two, not looping.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 2 of 30, by ChrisK

User metadata
Rank Member

Great news. Excellent work!
Keepin' my fingers crossed 😀

RetroPC: K6-III+/400ATZ @6x83@1.7V / CT-5SIM / 2x 64M SDR / 40G HDD / RIVA TNT / V2 SLI / CT4520
ModernPC: Phenom II 910e @ 3GHz / ALiveDual-eSATA2 / 4x 2GB DDR-II / 512G SSD / 750G HDD / RX470

Reply 3 of 30, by pyrogx

User metadata
Rank Member
BitWrangler wrote on 2026-05-29, 12:28:

Wow, amazing that went so long without anyone patching it.

Did you think of taking a look at the Voodoo 2 drivers to see what they did there?

Otherwise, maybe use tricks from the slowdown utils, but only for a beat or two, not looping.

I have looked at the Voodoo2 code as well and it is very similar to the Voodoo1, at least in terms of initialization. It also contains these "delay loops", so I am a bit surprised that Voodoo2s are not affected by speedy CPUs. But maybe the V2s are simply fast enough on startup and the delay times in the driver are still sufficient even on modern systems.

I'm still working on getting a usable Windows DLL from the source code. As a quick fix I'll probably just add another delay for-loop to this status function, but for a "proper" solution several parts of the init code need to be rewritten.
My coding skills are not really that great so I don't know if I want to do this...
Anyway, right now I'm fighting with the MinGW compiler to actually produce a working DLL file in the first place.

Reply 4 of 30, by AncapDude

User metadata
Rank Newbie

Would a simple sleep(1) solve the issue? I mean no one would complain waiting 1sec if the card works flawlessly then.

Reply 5 of 30, by pyrogx

User metadata
Rank Member
AncapDude wrote on 2026-05-29, 18:50:

Would a simple sleep(1) solve the issue? I mean no one would complain waiting 1sec if the card works flawlessly then.

Unfortunately this function which reads the status bit is called ~20-30 times during init (for-loops not included) so the waiting time would be much longer.

Anyway, I was finally able to compile a set of new DLLs which actually work. I took the "cheap" shortcut and just added another delay loop in the sst1InitReturnStatus function. Not the prettiest solution but it seems to do the job.
I had to use MSVC 6.0 to compile these; neither MinGW nor OpenWatcom produced anything useful.

To make things a bit more flexible, I also added a new environment variable: SST_INITDELAY
It takes a positive non-zero number which defines the number of for-loop runs in said function. It defaults to 250, which is the value that works on my 1GHz Athlon system.
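A minimal sketch of how such a variable could be handled, assuming nothing about the real patch beyond the variable name and the default of 250 given above. The helper names `parse_init_delay`, `init_delay_from_env` and `init_delay` are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_INIT_DELAY 250L /* default value named in the post */

/* Parse the SST_INITDELAY string. Fall back to the default when it is
   missing, empty, not a plain decimal number, or not positive. */
static long parse_init_delay(const char *value)
{
    if (value == NULL || *value == '\0')
        return DEFAULT_INIT_DELAY;

    char *end = NULL;
    errno = 0;
    long loops = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || loops <= 0)
        return DEFAULT_INIT_DELAY;
    return loops;
}

/* Read the setting from the environment. */
static long init_delay_from_env(void)
{
    return parse_init_delay(getenv("SST_INITDELAY"));
}

/* Extra busy-wait of the kind added to the status function. The
   volatile counter keeps the compiler from removing the empty loop. */
static void init_delay(long loops)
{
    for (volatile long i = 0; i < loops; i++)
        ;
}
```

On a faster machine one would then try a larger value, for example `set SST_INITDELAY=1000` before starting the game. The right number for a given system is not known in advance and has to be found by trial.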

I also created a Glide 2.1.1 compatibility dll (named glide.dll) which is required for some older 3dfx titles like Mechwarrior 2. I checked MW2 with that dll and it works for me, but it's not guaranteed to work with other games...

Check the files attached if they work for you, feedback is appreciated.

Reply 6 of 30, by Spark

User metadata
Rank Member

This is a great find! Do you think it might be possible that the glide2x.ovl dos driver might be fixed in a similar manner?

Reply 7 of 30, by AncapDude

User metadata
Rank Newbie

Awesome!

Reply 8 of 30, by pyrogx

User metadata
Rank Member
Spark wrote on 2026-05-30, 18:09:

This is a great find! Do you think it might be possible that the glide2x.ovl dos driver might be fixed in a similar manner?

Probably. But first I have to find out how to build that thing. This ovl file is some weird dll format for this DOS extender, dos4gw. No clue what compiler can build this. Watcom maybe.

Reply 9 of 30, by Ozzuneoj

User metadata
Rank l33t

This is awesome! Nice job!!

It's amazing that people are still figuring things like this out after all these years.

Now for some blitting from the back buffer.

Reply 10 of 30, by NeoG_

User metadata
Rank Oldbie

I wonder if one could write a simple calibration program that observes how fast the delay loop runs and suggests a value for SST_INITDELAY using a safe target delay time (or even installs the variable permanently)
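The calibration idea could look roughly like the sketch below. Everything here is an assumption: the function names are invented, the POSIX clock is only a stand-in for whatever timer the target OS offers, and the "safe" target time is unknown and would have to be determined by experiment. An empty loop is also only a proxy, since the real delay loop reads a register across the PCI bus on every pass.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time an empty busy loop and return the average seconds per pass. */
static double measure_seconds_per_loop(long sample_loops)
{
    double start = now_seconds();
    for (volatile long i = 0; i < sample_loops; i++)
        ;
    return (now_seconds() - start) / (double)sample_loops;
}

/* Number of passes needed to cover target_seconds, rounded up.
   Returns -1 if the measurement is unusable. */
static long loops_for_target(double seconds_per_loop, double target_seconds)
{
    if (seconds_per_loop <= 0.0)
        return -1;
    double loops = target_seconds / seconds_per_loop;
    if (loops >= 2147483647.0)
        return 2147483647L; /* clamp to a value every long can hold */
    long whole = (long)loops;
    return ((double)whole < loops) ? whole + 1 : whole;
}

/* Suggest an SST_INITDELAY value for a chosen target delay. */
static long suggest_init_delay(double target_seconds)
{
    return loops_for_target(measure_seconds_per_loop(10000000L), target_seconds);
}
```

A small tool built around `suggest_init_delay` could print the value or write it into the startup files, as proposed.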

98/DOS Rig: BabyAT AladdinV, K6-2+/550, V3 2000, 128MB PC100, 20GB HDD, 128GB SD2IDE, SB Live!, SB16-SCSI, PicoGUS, WP32 McCake, iNFRA CD, ZIP100
XP Rig: Lian Li PC-10 ATX, Gigabyte X38-DQ6, Core2Duo E6850, ATi HD5870, 2GB DDR2, 2TB HDD, X-Fi XtremeGamer

Reply 11 of 30, by BitWrangler

User metadata
Rank l33t++

There's an example of that I think in Linux, older kernels at least, where the Bogomips calibration loop runs to set a delay loop or timing variable.


Reply 12 of 30, by Thandor

User metadata
Rank Member

Excellent work pyrogx! 😀

thandor.net - hardware
And the rest of us would be carousing the aisles, stuffing baloney.

Reply 13 of 30, by schmatzler

User metadata
Rank Oldbie
Spark wrote on 2026-05-30, 18:09:

This is a great find! Do you think it might be possible that the glide2x.ovl dos driver might be fixed in a similar manner?

This would probably fix Archimedean Dynasty (Schleichfahrt) on faster systems and I hope it'll be possible.

I tried to run that on a Voodoo 2 and it works reliably with a P3 500MHz. I got it to run once on 800MHz and never on my 1.4GHz Tualatin.

"Windows 98's natural state is locked up"

Reply 14 of 30, by Spark

User metadata
Rank Member
schmatzler wrote on 2026-05-31, 23:50:

I tried to run that on a Voodoo 2 and it works reliably with a P3 500MHz. I got it to run once on 800MHz and never on my 1.4GHz Tualatin.

That's pretty interesting, as I had it working reliably on a Celeron 766 at 66MHz, which must be near its limit.
Not that it matters now that pyrogx has isolated the actual cause.

Reply 15 of 30, by pyrogx

User metadata
Rank Member

Managed to build the DOS driver. This OVL file really needs to be built with Watcom. I used OpenWatcom 1.9 (v2 wouldn't work) and some special dos4gw dll startup code.
Tomb Raider and Descent 2 worked just fine, didn't check any other DOS Glide game, though.

Reply 16 of 30, by DrAnthony

User metadata
Rank Member

Maybe I'm crazy (okay I know I am, but hear me out). What if we thought about this like the engineers at 3DFX in like mid 96. We could write a small program to track the wall time for running that delay loop and gather data from volunteers with period correct systems (say P133 or so). Basically have them run it and calculate an average "initialization time" so that we can migrate to a sleep for X milliseconds delay approach that should be universal for all hardware. Basically take this from a quick and dirty lab fix to something actually robust.
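The "sleep for X milliseconds" approach might be sketched like this. It is only an illustration under assumptions: `poll_for_duration` is an invented name, the register is a plain variable rather than the real hardware, and the POSIX clock used here would need a different timer source on DOS or Win9x. The duration itself would come from the measurements on period-correct machines.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Keep reading the status register until `seconds` of wall time have
   passed, instead of counting a fixed number of iterations.
   Returns how many reads were performed. */
static unsigned long poll_for_duration(volatile uint32_t *status_reg,
                                       double seconds)
{
    double deadline = now_seconds() + seconds;
    unsigned long reads = 0;
    do {
        (void)*status_reg; /* result ignored, as in the original loops */
        reads++;
    } while (now_seconds() < deadline);
    return reads;
}
```

The wait then takes the same wall time on a P133 and on a 1GHz Athlon; only the number of reads differs.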

Edit: Holy run on sentence. I definitely needed a cup of coffee before posting that. I'll leave it mostly intact but edit it to, you know, be somewhat coherent. Also it's fascinating that they stuck with this delay loop approach with Voodoo II. Perhaps it was less pressing of an issue given that it initialized faster or maybe they just had more pressing issues.

Last edited by DrAnthony on 2026-06-02, 12:34. Edited 1 time in total.

Reply 17 of 30, by Peckmore

User metadata
Rank Newbie
NeoG_ wrote on 2026-05-31, 03:20:

I wonder if one could write a simple calibration program that observes how fast the delay loop runs and suggests a value for SST_INITDELAY using a safe target delay time (or even installs the variable permanently)

Could we do something even simpler and just multiply all loops by the CPU clock speed vs a "baseline"? It seems like up to around 500MHz most people don't have an issue, so maybe that could be the baseline of "1.0". Any CPU speed up to and including 500MHz just gets the default loop count. Any CPU speed over that, multiply the loops by the clock speed ratio. E.g., 750MHz => loops * 1.5. 1GHz => loops * 2.0. If you wanted to add a "buffer" to account for different architectures, etc., you could even drop the baseline to 400MHz, or 350MHz.

I know that not all CPU architectures are equal, and speed doesn't scale linearly with clock speed across multiple architectures, but my guess would be a simple system like this could probably cover all use cases, and would be simpler than needing to benchmark the CPU to establish a multiplier?
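As a sketch of that scaling rule, with the caveat that `scaled_loop_count` is a made-up helper and that the CPU clock in MHz is assumed to be supplied by the caller (how the driver would detect it is left open):

```c
/* Scale a delay-loop count by the ratio of CPU clock to a baseline
   clock. CPUs at or below the baseline keep the original count.
   The result is rounded up so the delay never gets shorter. */
static long scaled_loop_count(long base_loops, unsigned cpu_mhz,
                              unsigned baseline_mhz)
{
    if (baseline_mhz == 0 || cpu_mhz <= baseline_mhz)
        return base_loops;

    long long scaled = ((long long)base_loops * cpu_mhz + baseline_mhz - 1)
                       / baseline_mhz;
    return (long)scaled;
}
```

With a 500MHz baseline, the 25000-iteration loop becomes 37500 iterations at 750MHz and 50000 at 1GHz. Lowering the baseline to 400MHz turns the 200000-iteration loop into 500000 iterations at 1GHz.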

DrAnthony wrote on 2026-06-01, 22:46:

Maybe I'm crazy (okay I know I am, but hear me out). What if thought about this like the engineers at 3DFX in like mid 96. What if we wrote a small program to track the wall time for running that delay loop and got some volunteers with period correct systems (say P133 or so) to run it and just calculate an average "initialization time" and migrated to to a sleep for x milliseconds delay rather that should be universal for all hardware?

Or this! Just wait for X seconds, rather than a loop. 😀

Reply 18 of 30, by pyrogx

User metadata
Rank Member

Looked a bit more thoroughly into that issue. Turns out that these for-loops around sst1InitReturnStatus are not the actual problem, although slowing them down fixes the crashes. The real culprit was a single function which queries the "BUSY" status of the FBI chip. This needs to be either slowed down or a command clearing the graphics pipeline needs to be executed first.

There are two functions in the driver code which poll the FBI status, one of them issues this "clear pipeline" command, the other one doesn't.
...and for some reason, only the latter one is actually used in the driver, and that's why the whole thing face-plants the card without a slowdown.

I have now changed the source code so that the clear pipeline command is always sent to the FBI chip before polling its status bit. It seems to work just fine, so I removed all the extra delay loop crap I added earlier. Everything looks okay, my Voodoo1 now even works on my old AMD FX-8350 system, the newest one I have with a PCI slot.
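The pattern described here, flush first and then poll, can be illustrated with a small simulation. This is not the driver code: `struct fake_fbi`, the function names and the position of the busy bit are all invented for the example, and the "hardware" is just a counter that reports busy for a few reads.

```c
#include <stdbool.h>
#include <stdint.h>

#define FBI_BUSY_BIT (1u << 7) /* hypothetical bit position */

/* Simulated chip: reports busy for a fixed number of status reads. */
struct fake_fbi {
    uint32_t status;           /* other status bits */
    unsigned flushes_seen;     /* how many flush commands were sent */
    unsigned reads_until_idle; /* busy reads left before going idle */
};

/* Stand-in for writing the "clear pipeline" command to the chip. */
static void fbi_send_flush(struct fake_fbi *fbi)
{
    fbi->flushes_seen++;
}

/* Stand-in for reading the memory-mapped status register. */
static uint32_t fbi_read_status(struct fake_fbi *fbi)
{
    if (fbi->reads_until_idle > 0) {
        fbi->reads_until_idle--;
        return fbi->status | FBI_BUSY_BIT;
    }
    return fbi->status & ~FBI_BUSY_BIT;
}

/* Always send the flush before polling, then wait for the busy bit to
   clear. The poll limit turns a hung chip into an error, not a hang. */
static bool fbi_wait_idle(struct fake_fbi *fbi, unsigned max_polls)
{
    fbi_send_flush(fbi);
    for (unsigned i = 0; i < max_polls; i++) {
        if ((fbi_read_status(fbi) & FBI_BUSY_BIT) == 0)
            return true;
    }
    return false;
}
```

Unlike a fixed-count delay loop, this waits on the chip's own state, so it does not depend on CPU speed.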

I also managed to patch the Win9x Direct3D driver, so that one also works fine now on my Athlon system.

Updated driver package attached.

Reply 19 of 30, by DrAnthony

User metadata
Rank Member

Brilliant work! It makes sense that the delay loops helped, and honestly they may have been inserted as a band-aid to fix this oversight. Maybe they had every intention to move that clear pipeline logic to its own subroutine and then just forgot to finish the job after a long weekend. It would be interesting to see if the same issue is present in later drivers for the Voodoo II.