This post discusses 3DMark 99 Max, which you can currently download from Futuremark for free.
Let's start with a disclaimer. 3DMark 99 is specified to require a 166MHz Pentium processor, and support for Windows 2000 is explicitly denied. This post, about running 3DMark 99 on a 486-class processor under Windows 2000, thus operates 3DMark 99 well outside its specified operating conditions.
It's widely known that 3DMark 99 doesn't start on Windows 2000 unless you remove the DirectX version check (which is programmed in a way that fails to cope with Microsoft's DLL versioning scheme used for DirectX on Windows NT) or run it in Windows 98 compatibility mode. Windows 98 compatibility mode includes a shim called "DxVersionLie" that reports the DirectX DLL file version numbers the corresponding DirectX release for Windows 98 would have. Furthermore, the Anti-Lockup patch is required for 3DMark 99 to start on NT-series Windows at all. The remaining parts of this post assume that these known issues have already been dealt with.
3DMark 99 tries to use RDTSC (read time-stamp counter, an instruction that is supposed to return the number of processor clocks since power-up) for precision timing. It properly detects whether the processor supports it. If the processor does not, 3DMark 99 falls back into a mode called "Cyrix mode", going by the name of the exported function in RLMFC.DLL. This is likely because the Cyrix 6x86 processor didn't support RDTSC.
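The capability check involved can be sketched like this: CPUID leaf 1 reports the TSC feature in EDX bit 4, which is how software generally decides whether RDTSC is safe to execute. The function name and the use of GCC's `<cpuid.h>` helper are my own; 3DMark's actual detection code is not shown here.

```c
#include <cpuid.h>

/* Sketch of a TSC capability check, assuming an x86 compiler with
 * GCC's <cpuid.h>. A 486 has no TSC (early 486es lack CPUID
 * entirely), and the Cyrix 6x86 reports the feature as absent, too.
 * This is an illustration, not the code from RLMFC.DLL. */
int cpu_has_tsc(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                   /* CPUID leaf 1 itself unsupported */
    return (edx & (1u << 4)) != 0;  /* EDX bit 4 = TSC feature flag */
}
```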
Interestingly, the lowest level abstraction has these methods:
- get the current time as 64-bit value (of unspecified format)
- calculate the difference between two 64-bit timestamps returned from method 1, returning a 64-bit difference (again of unspecified format)
- convert a difference returned from method 2 into a floating point value containing the number of "cycles". Here the format is specified.
- get the clock speed in "cycles" per second.
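One possible shape for that lowest-level abstraction, written out as a C interface; the names and the function-pointer layout are my own invention, not the actual RLMFC.DLL exports:

```c
#include <stdint.h>

/* Hypothetical rendering of the four methods the post describes. */
typedef struct {
    uint64_t (*now)(void);                          /* method 1: opaque 64-bit timestamp */
    uint64_t (*diff)(uint64_t start, uint64_t end); /* method 2: opaque 64-bit difference */
    double   (*to_cycles)(uint64_t d);              /* method 3: difference -> cycles */
    double   (*cycles_per_second)(void);            /* method 4: clock rate */
} TimeSource;

/* With this interface, elapsed seconds can be computed without
 * knowing which mode (RDTSC or fallback) is active. */
double elapsed_seconds(const TimeSource *ts, uint64_t start, uint64_t end)
{
    return ts->to_cycles(ts->diff(start, end)) / ts->cycles_per_second();
}

/* A TSC-style backend for illustration: plain integer timestamps,
 * pretending a fixed 100 MHz clock. now() is a placeholder where
 * real code would execute RDTSC. */
static uint64_t tsc_now(void)                    { return 0; }
static uint64_t tsc_diff(uint64_t s, uint64_t e) { return e - s; }
static double   tsc_to_cycles(uint64_t d)        { return (double)d; }
static double   tsc_cps(void)                    { return 100000000.0; }

const TimeSource tsc_like = { tsc_now, tsc_diff, tsc_to_cycles, tsc_cps };
```

50 million elapsed "cycles" on the pretend 100 MHz part come out as 0.5 seconds, regardless of which backend produced the numbers.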
In RDTSC mode, the timestamp and the timestamp difference are 64-bit integers, and the cycles per second are the actual processor clock (assuming there is no such thing as SpeedStep). The processor clock is determined by timing a loop that polls the wall-clock time until it has advanced by 0.5 seconds.
In Cyrix mode, 3DMark 99 falls back to reading a processor-speed-independent time value and converts it to seconds since program start, stored as a 64-bit double-precision floating-point number. The difference is again a 64-bit floating-point number, and the "convert difference to floating point number" function just returns its parameter. The processor clock is reported as "1 cycle per second", so the timing values in seconds used here can be treated as virtual processor clocks, too.
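The trick of smuggling a double through a 64-bit integer interface can be sketched as follows. All names here are mine; the point is only the convention the post describes: the difference conversion is the identity, and the clock rate of 1 "cycle" per second makes seconds and virtual clocks interchangeable.

```c
#include <stdint.h>
#include <string.h>

/* Pack seconds-since-start (a double) into the opaque 64-bit
 * timestamp slot; illustrative only. */
uint64_t cyrix_pack(double seconds)
{
    uint64_t bits;
    memcpy(&bits, &seconds, sizeof bits);
    return bits;
}

/* The difference of two packed timestamps is again a packed double. */
uint64_t cyrix_diff(uint64_t start, uint64_t end)
{
    double s, e, d;
    uint64_t bits;
    memcpy(&s, &start, sizeof s);
    memcpy(&e, &end, sizeof e);
    d = e - s;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* "Convert difference to floating point" just returns its parameter. */
double cyrix_to_cycles(uint64_t d)
{
    double v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* 1 "cycle" per second: seconds double as virtual processor clocks. */
double cyrix_cycles_per_second(void) { return 1.0; }
```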
That could be all there is to it, and it could work perfectly with a shared common code path above it. Too bad it is not implemented that way. The next layer up also distinguishes between processors with and without RDTSC. The code path for processors without RDTSC fails to initialize a variable that is supposed to hold the frame time in seconds. Too bad that this variable is used for all the score calculations in 3DMark. This can quite easily be worked around by just removing the no-TSC case from the higher-level code. The result: you get scores, but more often than not, the Result Browser locks up when you try to load the project file with the detailed benchmark values. This turns out to be due to the "2MB Texture Rendering Speed" being measured as infinite frames per second and being processed in a way that makes an intermediate result NaN (not a number). The code that processes these intermediate results can't deal with the special behaviour of NaN values in comparisons, and enters an endless loop.
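Why a NaN intermediate hangs comparison-based code: every ordered comparison involving NaN evaluates to false, so a loop whose exit condition is "value is finally in range" can never make progress. The following toy loop (my own construction, not the actual Result Browser code) shows the failure mode; the bail-out counter exists only so the sketch terminates.

```c
#include <math.h>

/* Hypothetical convergence loop: halve v until it drops below limit,
 * returning the number of steps taken, or -1 if it never converges.
 * With v = NaN, (v < limit) is always false, so !(v < limit) is
 * always true and the loop would spin forever without the bail-out. */
int steps_until_below(double v, double limit)
{
    int steps = 0;
    while (!(v < limit)) {
        v /= 2.0;
        if (++steps > 1000)
            return -1;          /* NaN (or +inf) never converges */
    }
    return steps;
}
```

Note also how the NaN arises in the first place: arithmetic on an infinite frame rate, such as `INFINITY - INFINITY` or `0.0 * INFINITY`, produces NaN.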
So another patch is needed. The Cyrix mode uses a generic function "getAppAge" to retrieve the wall-clock time since the start of 3DMark 99. This function is implemented on top of the WINMM function timeGetTime. On Windows 95, timeGetTime returns the time since boot in milliseconds (a value that overflows after 49.7 days); on later Windows versions, a certain offset is added so that the value overflows for the first time quickly after power-up. This was done to expose bugs in code that doesn't handle overflow of this value, the kind of bug behind the infamous issue of Windows 95 locking up after 49.7 days of uptime. While the unit of the timestamp is milliseconds, that doesn't mean we get millisecond resolution, even less so on Windows NT (and descendants). The "time counter" returned by timeGetTime is incremented by several milliseconds on every timer tick interrupt. To conserve processing power, Windows NT runs the timer at 100Hz, incrementing the time counter by ten milliseconds on each interrupt. You can request a higher precision of the timeGetTime counter using timeBeginPeriod and timeEndPeriod, up to 500Hz or even 1000Hz, but 3DMark 99 doesn't do so. This means that timeGetTime, while fast (it just reads one global kernel variable), can return the exact same value at the begin and end of a frame if the frame takes less than 10ms to render. (On Windows 95/98, the default rate of the timer is claimed to be around 300 to 500Hz, so the issue is less prominent there.) If the begin time equals the end time, the frame duration is 0ms, and thus the frame rate is INF fps. 3DMark uses a 16-frame moving average to calculate the resulting frame rate, so as soon as one frame in that window is determined to run at INF fps, the average of the 16 fps values is also INF fps. (That averaging is broken in its own right, as you should average over frame times, not over frame rates, but as long as the samples are close to each other, the difference is negligible.)
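The averaging problem is easy to demonstrate. A minimal sketch, with names and the window size of 4 chosen for brevity: averaging per-frame fps values lets a single 0ms frame (1/0 = +inf fps) poison the whole window, while summing frame times and inverting once at the end stays finite.

```c
#include <math.h>

/* Broken approach: average the per-frame rates. */
double avg_of_rates(const double *frame_ms, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += 1000.0 / frame_ms[i];   /* a 0 ms frame yields +inf fps */
    return sum / n;
}

/* Robust approach: average the frame times, invert at the end. */
double rate_of_avg(const double *frame_ms, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += frame_ms[i];
    return 1000.0 * n / sum;           /* one 0 ms sample barely matters */
}

/* Three 10 ms frames plus one frame where timeGetTime returned the
 * same value at begin and end of the frame. */
const double sample_ms[4] = { 10.0, 10.0, 0.0, 10.0 };
```

With these samples, `avg_of_rates` returns +inf while `rate_of_avg` returns a sane value near 133 fps.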
So what's required is a time source with higher precision. While you can increase the timer frequency using timeBeginPeriod, the increased timer frequency will take some CPU cycles for no actual gain. There is a different solution, though: the Win32 API contains the functions QueryPerformanceCounter and QueryPerformanceFrequency. These functions do not just read a value updated on each timer interrupt; they ask the timer chip for the time elapsed since the last interrupt and combine that with the timestamp of the last timer interrupt, so you get around microsecond resolution. The disadvantage of QueryPerformanceCounter is that it might be considerably slower than timeGetTime. As 3DMark only calls this function twice per frame, this disadvantage is irrelevant, though.
Please find attached a patched version of RLMFC.DLL that unifies the TSC and non-TSC code paths as much as possible, and switches non-TSC timing from timeGetTime to QueryPerformanceCounter. I validated it as thoroughly as I could: it works correctly on non-TSC machines and still behaves as before on TSC-capable machines.