Getting a ZOTAC GeForce FX 5200 PCI (256MB DDR RAM, 128-bit memory interface) to work in a Shuttle HOT-433 board under Windows 2000 turned out to be challenging, but at least for DirectX it now basically works; 3DMark 99 runs fine on it.

Let's start with the most obvious and easily solvable issue: no 3.3V supply on the PCI bus. This specific FX5200 needs a 3.3V supply, which is mandatory since PCI 2.2, even at 5V signalling voltage. Typical 486 boards implement PCI 2.1, use 5V signalling and omit the 3.3V supply, which is optional in that configuration. There is a VOGONs thread on adding 3.3V using a "bodge PCB" to inject 3.3V, but that requires soldering on the mainboard. For this one-off test project I decided to solder on the graphics card instead and came up with this ugly contraption:
[Photo: GF5200 3.3V.jpg]
I'm using DirectX 7 on Windows 2000. DirectX 8 requires a Pentium processor, so DirectX 7 is the last possible version. The DXDiag supplied with DX7 for Windows 2000 contains a bug that prevents it from starting on a 5x86 processor: it makes the common mistake of assuming that every processor that supports CPUID is at least a Pentium and therefore has RDTSC available. I patched DXDiag to bypass this check and not use RDTSC, which is a straightforward patch (a sketch of the check DXDiag should have made follows below). The offending piece of code is easily identifiable either by searching for the typical pushf/popf sequences used for CPU type detection, or by looking into the Dr. Watson log, which contains the ILLEGAL_INSTRUCTION exception and a pointer to the offending instruction. Let's not spend more time on this issue.

Testing Direct3D worked perfectly for software rendering (not that surprising), at awful performance (again, not that surprising on a 486-class processor), but for hardware rendering, DXDiag failed with "Step 18: CreateDevice failed with HRESULT 887602eb" (or a similar message).
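For reference, the correct approach is to test the TSC feature bit returned by CPUID before ever executing RDTSC. Here is a minimal sketch of that check (not DXDiag's actual code), written for a 32-bit MSVC-style compiler with inline assembly:

```c
/* Minimal sketch of the feature check DXDiag skips: CPUID leaf 1 returns
 * feature flags in EDX; bit 4 (TSC) tells whether RDTSC exists. Real code
 * would first verify that CPUID itself is available by toggling the ID bit
 * in EFLAGS (the pushf/popf test mentioned above); this sketch assumes
 * CPUID is present, as it is on this 5x86. */
#include <stdio.h>

static int HasTimeStampCounter(void)
{
    unsigned long features = 0;
    __asm {
        mov eax, 1          /* CPUID leaf 1: processor info and feature bits */
        cpuid
        mov features, edx   /* EDX bit 4 = TSC: RDTSC is available */
    }
    return (int)((features >> 4) & 1);
}

int main(void)
{
    if (HasTimeStampCounter())
        printf("RDTSC available, high-resolution timing possible\n");
    else
        printf("No TSC, fall back to timeGetTime()/QueryPerformanceCounter\n");
    return 0;
}
```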
I took some time to analyze the root cause of the CreateDevice failure, which took a tour through all the layers of the Windows 2000 graphics stack. The error occurs at the step where DXDiag has already obtained a "primary surface for full-screen use, double-buffering and 3D rendering capability" and asks for a hardware-accelerated 3D device for it. It turns out that DXDiag doesn't specify whether it wants that surface in video memory or system memory. In that case, DirectDraw tries to allocate the surface in video memory first and, failing that, retries the allocation in system memory. On this system, allocation in video memory failed: DirectDraw calls the GDI CanCreateSurface entry point to ask the NVidia graphics driver whether it can create that surface in video memory, and that call fails with the status code DDERR_OUTOFMEMORY. Yet DXDiag reports 128MB of graphics memory (only half of the 256MB, but I don't care about that problem [yet]; I wanted 128-bit memory access, and the amount of memory is secondary), which should be plenty for a double-buffered surface at 640x480 with 16bpp.
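To make this concrete, here is roughly what such a surface request looks like at the DirectDraw 7 API level. This is an illustrative sketch, not DXDiag's actual source; it assumes the DirectX 7 SDK headers, linking against ddraw.lib/dxguid.lib, and an existing top-level window handle:

```c
/* Sketch of a DXDiag-style surface request: a full-screen primary surface
 * with one back buffer and 3D capability, with neither DDSCAPS_VIDEOMEMORY
 * nor DDSCAPS_SYSTEMMEMORY set, so DirectDraw tries video memory first and
 * falls back to system memory. Hypothetical example code. */
#define INITGUID
#include <windows.h>
#include <ddraw.h>

HRESULT RequestTestSurface(HWND hwnd)
{
    LPDIRECTDRAW7        dd = NULL;
    LPDIRECTDRAWSURFACE7 primary = NULL;
    DDSURFACEDESC2       ddsd;
    HRESULT              hr;

    hr = DirectDrawCreateEx(NULL, (void **)&dd, &IID_IDirectDraw7, NULL);
    if (FAILED(hr))
        return hr;

    IDirectDraw7_SetCooperativeLevel(dd, hwnd, DDSCL_EXCLUSIVE | DDSCL_FULLSCREEN);
    IDirectDraw7_SetDisplayMode(dd, 640, 480, 16, 0, 0);

    ZeroMemory(&ddsd, sizeof(ddsd));
    ddsd.dwSize            = sizeof(ddsd);
    ddsd.dwFlags           = DDSD_CAPS | DDSD_BACKBUFFERCOUNT;
    ddsd.dwBackBufferCount = 1;
    /* No explicit memory placement: the driver is asked (CanCreateSurface)
     * for video memory first, then DirectDraw retries in system memory.
     * 640x480x16 double-buffered is only about 1.2MB, yet on this machine
     * the video-memory attempt was rejected with DDERR_OUTOFMEMORY, which
     * ultimately surfaced as the CreateDevice failure in DXDiag. */
    ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE | DDSCAPS_FLIP |
                          DDSCAPS_COMPLEX | DDSCAPS_3DDEVICE;

    hr = IDirectDraw7_CreateSurface(dd, &ddsd, &primary, NULL);

    if (primary)
        IDirectDrawSurface7_Release(primary);
    IDirectDraw7_Release(dd);
    return hr;
}
```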
The driver errors out because a call to the VDD via EngDeviceIoControl returns -1. The IO control code used to call the VDD is a vendor-specific code, so there is no documentation for what that call is supposed to do. It is issued from a place in the graphics driver that initializes "some stuff" that needs to be set up before surfaces can be managed.
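For readers unfamiliar with the Windows 2000 display driver model: the display driver DLL runs inside win32k and reaches its kernel-mode miniport through EngDeviceIoControl. The following is only a hedged sketch of that calling pattern; the IOCTL code, buffer layout and function name are invented placeholders, since the real NVidia code is undocumented:

```c
/* Sketch of how a display driver issues a vendor-specific IOCTL to its
 * miniport/VDD, in a Windows 2000 DDK display-driver build environment.
 * IOCTL_VENDOR_INIT_SURFACE_MGMT and the buffers are illustration only;
 * just the EngDeviceIoControl calling convention is real. */
#include <windef.h>
#include <wingdi.h>
#include <winddi.h>
#include <devioctl.h>

/* Vendor-specific codes use FILE_DEVICE_VIDEO with function numbers >= 0x800. */
#define IOCTL_VENDOR_INIT_SURFACE_MGMT \
    CTL_CODE(FILE_DEVICE_VIDEO, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

BOOL VendorInitSurfaceManagement(HANDLE hDriver)  /* hDriver from DrvEnablePDEV */
{
    ULONG result = 0;
    DWORD bytesReturned = 0;
    DWORD status;

    status = EngDeviceIoControl(hDriver,
                                IOCTL_VENDOR_INIT_SURFACE_MGMT,
                                NULL, 0,                  /* no input buffer      */
                                &result, sizeof(result),  /* vendor-defined reply */
                                &bytesReturned);

    /* EngDeviceIoControl returns NO_ERROR on success; any other value means
     * the miniport rejected the request. In the case described above, that
     * failure bubbles up to DirectDraw as DDERR_OUTOFMEMORY. */
    return (status == NO_ERROR) ? TRUE : FALSE;
}
```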
The VDD errors out because the kernel rejects a MmMapLockedPagesSpecifyCache call. The kernel driver tries to map a 6MB buffer into kernel address space after locking it, to allow busmaster DMA to it. The root issue is that the kernel runs out of "system PTEs". The kernel has a limited amount of address space into which it can map buffers. The management structure that tells the processor which page is mapped at which virtual address is a "page table", consisting of "page table entries". The page table structures for that part of kernel address space are allocated at a fixed size during boot. The allocated size depends on the amount of system RAM, can be overridden by a registry entry, and is clamped to "sensible values": no matter how much RAM you have and what the registry says, you will never get more than 50,000 pages of address space to map buffers into, and never less than 7,000 pages. 7,000 pages is 28MB, 50,000 pages is 200MB.

The GeForce FX driver maps the 16MB MMIO area of the graphics card into kernel space, as well as 128MB of the graphics RAM. That alone uses 144MB of address space. And it is not only the NVidia driver that uses buffer mappings; other drivers do so, too. As the computer has 256MB of RAM, Windows 2000 had already decided to allocate a lot of page table entries. Furthermore, the registry entry was set to an insanely high value (maybe by some graphics driver installer, maybe by SP4, or Update Rollup 1, or another update), so the kernel buffer mapping space was already maxed out at 200MB.

I adapted the kernel to raise the upper clamp limit to 200,000 pages, and set 70,000 pages in the registry in the value SystemPages under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management. The upper limit is set by a "MOV ECX, 50000" instruction in the kernel, which encodes as "B9 50 C3 00 00". You can patch this, for example, to 131,072 pages by replacing it with "B9 00 00 02 00". There is only one hit for this byte sequence in the Windows 2000 NTOSKRNL.EXE. When you patch kernel-mode modules (drivers or the kernel itself), don't forget to update the PE checksum as well, because loading a kernel module with a bad checksum results in a blue screen. I used https://www.coderforlife.com/projects/utilities/#PEChecksum to adjust the checksum.
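For the patch itself, a trivial search-and-replace over a copy of the kernel image is enough. Below is a minimal sketch of such a patcher (my own illustration, not a published tool); run it against a copy of NTOSKRNL.EXE, verify that exactly one occurrence was found, and fix the PE checksum with the tool linked above before booting from the patched file:

```c
/* Minimal sketch: replace the single "MOV ECX, 50000" (B9 50 C3 00 00) in a
 * copy of NTOSKRNL.EXE with "MOV ECX, 131072" (B9 00 00 02 00). Work on a
 * copy, keep a backup, and correct the PE checksum afterwards, or the
 * patched kernel will blue-screen on load. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    static const unsigned char find[5] = { 0xB9, 0x50, 0xC3, 0x00, 0x00 };
    static const unsigned char repl[5] = { 0xB9, 0x00, 0x00, 0x02, 0x00 };
    FILE *f;
    unsigned char *buf;
    long size, i, hits = 0;

    if (argc != 2) {
        fprintf(stderr, "usage: %s ntoskrnl-copy.exe\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "r+b");
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 0, SEEK_END); size = ftell(f); fseek(f, 0, SEEK_SET);
    buf = malloc(size);
    if (!buf || fread(buf, 1, size, f) != (size_t)size) {
        fprintf(stderr, "read error\n");
        return 1;
    }

    /* Search the whole image for the instruction bytes and replace them. */
    for (i = 0; i + 5 <= size; i++) {
        if (memcmp(buf + i, find, 5) == 0) {
            memcpy(buf + i, repl, 5);
            hits++;
        }
    }
    printf("%ld occurrence(s) patched (expect exactly 1)\n", hits);

    if (hits == 1) {              /* only write back if the hit is unambiguous */
        fseek(f, 0, SEEK_SET);
        fwrite(buf, 1, size, f);
    }
    fclose(f);
    free(buf);
    return hits == 1 ? 0 : 1;
}
```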
After a reboot, there were enough free pages for DXDiag to start the Direct3D tests, which worked perfectly.
To see how much (or how little) of the card's performance can be used, I started with 3DMark 99 Max (which is available for free). It turned out that 3DMark 99 does not report any kind of sensible results on this machine, see Re: 3dmark99 MegaThread. Finally, after fixing 3DMark, I obtained around 285 3DMarks at 4*33MHz. At 4*40MHz (PCI clocked at 40MHz), the system is unstable with the GeForce FX5200, but it works without problems with a Matrox G450. The stability issues might be thermal, so I will try adding a fan to get a 160MHz benchmark value as well.