First post, by mkarcher
A well-known sales pitch from the mid-90s was "we use the new modern EDO RAM, so we don't need to add L2 cache", countered by "we use EDO RAM and L2 cache for even better performance". Let's perform a Quake 320x200 shootout at 120MHz on a Cyrix 5x86.
I'm using a Biostar MB-8433UUD with the latest UMC 8881/8886 chipset revision. To boost performance slightly, RAM is set to "slow refresh".
The processor is a Cyrix 5x86 (rated 100) at 120MHz and 4.0V (the Biostar board doesn't have a 3.6V setting, and I haven't bodged one in yet), with PCR0=02, CCR2=D6, CCR3=1C and CCR4=38, and a cooling fan added to the original Cyrix heatsink. These register settings mean: branch prediction enabled, but loop optimization and return stack disabled; linear burst and burst writes enabled; "fast FPU" enabled; load/store serialization disabled.
These are the results I obtained. The columns mean:
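- CPU: CPU clock of 120MHz as multiplier*FSB, i.e. 2*60 or 3*40. My 5x86 doesn't have a 4x multiplier, and my board doesn't provide a 30MHz FSB, so there are no benchmarks for 4*30. This 5x86 doesn't work at 2*66 (133MHz).
- L2: L2 cache installed: either none (chips removed from the board) or 256KB in two banks (256K/2B), built from nine 15ns 32K x 8 chips. This is followed by the cache setting from the BIOS: "off" if the cache is disabled, otherwise the cache timing (3-2 or 2-1, where the first number is the leadoff cycle in clocks) and the write policy (WB = write-back, WT = write-through).
- RAM: amount of RAM installed, using 32MB SIMMs. The EDO modules are single-sided(!) with 4 chips; the FPM modules are double-sided with 16 chips.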
- SPD: speed of the slowest SIMM in nanoseconds
- WS: RAM wait states as configured in the BIOS. (read WS/write WS)
- RAM burst: burst timing as configured in the BIOS, 3-1-1-1 or 4-2-2-2 if EDO mode is active, or "FPM" if the RAM runs in FPM mode.
- Quake: Score in fps from timedemo 1 at standard resolution (dosbench option c) with a Trio64V+ PCI graphics card. PCI clock is 40MHz at 3*40, and 30MHz at 2*60.
- SPSYS: Speedsys CPU score.
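- remaining columns: memory throughput in MB/s as measured by Speedsys (shown when you press M, or written to the report file). The prefix gives the level (L1 cache, L2 cache, M = main memory), the suffix the operation (R = read, W = write, M = move).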
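The CPU and RAM burst notation can be turned into clock and bandwidth figures. This is a minimal sketch, not part of the original measurements: it assumes that one burst fills a 16-byte cache line with four 32-bit transfers, and the PCI divider rule is hypothetical, chosen only because it reproduces the two data points given (40MHz PCI at 3*40, 30MHz at 2*60). All function names are made up for illustration.

```python
# Sketch: turn the table's clock and burst notation into numbers.
# Assumptions (not from the post): one burst fills a 16-byte cache line with
# four 32-bit transfers; the PCI divider rule below is hypothetical and only
# reproduces the two data points given (3*40 -> 40MHz, 2*60 -> 30MHz).

LINE_BYTES = 16  # assumed 486-bus line fill: 4 transfers x 4 bytes


def split_cpu(notation: str) -> tuple[float, float]:
    """'3*40' -> (multiplier, FSB in MHz)."""
    mult, fsb = notation.split("*")
    return float(mult), float(fsb)


def cpu_clock_mhz(notation: str) -> float:
    """CPU clock in MHz = multiplier * FSB."""
    mult, fsb = split_cpu(notation)
    return mult * fsb


def pci_clock_mhz(fsb_mhz: float) -> float:
    """Hypothetical rule: PCI runs at FSB up to 40MHz, at FSB/2 above that."""
    return fsb_mhz if fsb_mhz <= 40 else fsb_mhz / 2


def burst_clocks(burst: str) -> int:
    """Bus clocks for one line fill, e.g. '4-2-2-2' -> 10."""
    return sum(int(c) for c in burst.split("-"))


def peak_burst_mb_s(fsb_mhz: float, burst: str) -> float:
    """Theoretical line-fill bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return LINE_BYTES * fsb_mhz / burst_clocks(burst)


for cpu, burst in [("2*60", "4-2-2-2"), ("3*40", "4-2-2-2"), ("3*40", "3-1-1-1")]:
    _, fsb = split_cpu(cpu)
    print(f"{cpu}: CPU {cpu_clock_mhz(cpu):.0f}MHz, PCI {pci_clock_mhz(fsb):.0f}MHz, "
          f"{burst} burst peak {peak_burst_mb_s(fsb, burst):.1f}MB/s")
```

Under these assumptions, 4-2-2-2 peaks at 96MB/s at FSB60 and 64MB/s at FSB40, while 3-1-1-1 at FSB40 reaches about 106.7MB/s. The theoretical 3-1-1-1 vs. 4-2-2-2 ratio at FSB40 is 10/6 ≈ 1.67. Speedsys measured 76.3MB/s vs. 50.9MB/s main-memory read for the corresponding EDO rows at 3*40, a ratio of about 1.5. Comparing ratios avoids having to know which definition of MB Speedsys uses.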
CPU   L2       L2 cfg  RAM   SPD  WS   RAM burst  Quake  SPSYS  L1R    L1W    L1M    L2R    L2W    L2M    MR     MW     MM
2*60  none     off     96MB  60   0/0  4-2-2-2    17.5   68.18  225.2  114.7  218.7  -      -      -      76.4   114.6  31.7
2*60  none     off     64MB  60   2/1  FPM        15.5   68.18  222.3  76.4   214.6  -      -      -      45.8   76.3   20.5
2*60  256K/2B  3-2 WB  96MB  50   1/0  4-2-2-2    17.1   68.09  226.6  76.6   211.2  83.7   76.3   38.4   54.0   76.5   18.8
2*60  256K/2B  3-2 WT  96MB  50   1/0  4-2-2-2    17.2   -      226.6  76.6   219.7  83.7   75.3   35.3   54.0   76.5   24.2
2*60  256K/2B  3-2 WB  32MB  60   1/0  FPM        16.9   68.68  226.4  76.6   210.7  83.7   76.3   38.4   57.3   76.5   18.8
2*60  256K/2B  3-2 WT  32MB  60   1/0  FPM        17.2   68.58  226.6  76.6   219.7  83.7   76.3   35.3   57.3   76.5   24.6
3*40  none     off     32MB  50   0/0  3-1-1-1    17.0   68.45  225.4  76.5   218.7  -      -      -      76.3   76.4   27.8
3*40  256K/2B  off     32MB  50   0/0  4-2-2-2    16.0   -      222.4  76.5   216.6  -      -      -      50.9   76.4   23.5
3*40  256K/2B  off     64MB  60   0/0  FPM        16.3   68.55  223.1  76.5   217.1  -      -      -      55.5   76.4   24.5
3*40  256K/2B  2-1 WB  32MB  60   0/0  4-2-2-2    16.8   68.51  227.1  76.5   216.0  87.6   76.2   40.1   50.9   76.4   18.0
3*40  256K/2B  2-1 WB  96MB  60   0/0  4-2-2-2    17.0   -- same, but partially uncached --
3*40  256K/2B  2-1 WT  32MB  60   0/0  4-2-2-2    17.1   68.46  227.1  76.5   220.1  87.6   76.2   36.0   50.9   76.4   24.5
3*40  256K/2B  2-1 WT  96MB  60   0/0  4-2-2-2    17.1   -- same as above, all memory Quake uses is cached --
3*40  256K/2B  2-1 WB  32MB  60   0/0  FPM        16.8   68.58  227.1  76.5   216.0  87.7   76.2   40.1   50.9   76.4   18.0
3*40  256K/2B  2-1 WB  64MB  60   0/0  FPM        17.0   -- same, but partially uncached --
3*40  256K/2B  2-1 WT  32MB  60   0/0  FPM        17.1   68.51  227.1  76.4   220.1  87.7   76.2   36.0   50.8   76.4   24.5
A dash marks a value that was not recorded (SPSYS) or does not apply (L2 columns without an active L2 cache).
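To compare the Quake scores across configurations, the fps values can be copied into a small script. The sketch below takes its scores from the table, while the tuple layout and labels are ad-hoc shorthand. It ranks the configurations and pairs each write-back row with the write-through row that differs only in the write policy.

```python
# Quake timedemo scores copied from the table:
# (CPU, L2 setting, RAM in MB, RAM mode, fps)
ROWS = [
    ("2*60", "none",   96, "EDO 4-2-2-2", 17.5),
    ("2*60", "none",   64, "FPM",         15.5),
    ("2*60", "3-2 WB", 96, "EDO 4-2-2-2", 17.1),
    ("2*60", "3-2 WT", 96, "EDO 4-2-2-2", 17.2),
    ("2*60", "3-2 WB", 32, "FPM",         16.9),
    ("2*60", "3-2 WT", 32, "FPM",         17.2),
    ("3*40", "none",   32, "EDO 3-1-1-1", 17.0),
    ("3*40", "off",    32, "EDO 4-2-2-2", 16.0),
    ("3*40", "off",    64, "FPM",         16.3),
    ("3*40", "2-1 WB", 32, "EDO 4-2-2-2", 16.8),
    ("3*40", "2-1 WB", 96, "EDO 4-2-2-2", 17.0),
    ("3*40", "2-1 WT", 32, "EDO 4-2-2-2", 17.1),
    ("3*40", "2-1 WT", 96, "EDO 4-2-2-2", 17.1),
    ("3*40", "2-1 WB", 32, "FPM",         16.8),
    ("3*40", "2-1 WB", 64, "FPM",         17.0),
    ("3*40", "2-1 WT", 32, "FPM",         17.1),
]


def ranked(rows):
    """Rows sorted by Quake fps, fastest first."""
    return sorted(rows, key=lambda r: r[4], reverse=True)


def wt_gain(rows):
    """fps gained by WT over WB for rows that differ only in write policy."""
    fps = {r[:4]: r[4] for r in rows}
    gains = {}
    for (cpu, l2, ram, mode), wb_fps in fps.items():
        if l2.endswith("WB"):
            wt_key = (cpu, l2[:-2] + "WT", ram, mode)  # "3-2 WB" -> "3-2 WT"
            if wt_key in fps:
                gains[(cpu, l2[:-3], ram, mode)] = round(fps[wt_key] - wb_fps, 1)
    return gains


print("fastest:", ranked(ROWS)[0])
for key, gain in wt_gain(ROWS).items():
    print(key, f"WT is {gain:+.1f} fps vs. WB")
```

The ranking puts the 2*60 configuration without L2 cache (17.5 fps) on top, and all five WB/WT pairs favor write-through, by 0.1 to 0.3 fps.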
This table shows some interesting insights:
- The highest Quake score is achieved at FSB60 with no L2 cache. I didn't test whether L2 chips inserted but disabled will work at 0/0 WS.
- Quake does not like write-back cache at all. It's generally slower in WB mode than in WT mode. In WB mode, Quake gets faster if I install 64MB of RAM with the second half uncached, compared to 32MB fully cached. Quake is even faster in write-through mode. DOS 6 (used as OS for these tests) doesn't use more than 64MB of RAM, so installing 96MB only shows that the system is stable with that much RAM installed. Memory above 64MB, cacheable or not, is never touched by these benchmarks, so it does not affect the scores.
- EDO at its slow burst (4-2-2-2) with 0WS/0WS is slower than FPM RAM in FPM mode.
- EDO at its fast burst (3-1-1-1) performs remarkably well, but this speed barely works at 40MHz (only if just a single 50ns SIMM is installed, and L2 cache is physically removed). This configuration is likely meant for 33MHz FSB maximum.
- Setting the cache leadoff cycle to 3 clocks limits RAM performance. This is not surprising, as a tag lookup is required for every memory cycle. So if you have fast RAM and the cache needs 3 clocks for a tag lookup, not using the L2 cache can be the better choice.
- None of the FPM modules I used can provide good performance at FSB60. I can run 60ns EDO modules in "EDO mode" at 0/0, but I need to run 60ns FPM modules in "FPM mode" at 2/1.
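The "partially uncached" notes for write-back with 64MB or 96MB, versus "all memory Quake uses is cached" for write-through with 96MB, fit a common 486-era tag RAM arrangement. This is only a hypothesis and has not been verified for the UMC 8881. With a direct-mapped cache, each tag value selects one cache-sized slice of memory, so the cacheable area is the cache size times 2 to the power of the tag bits. The assumption is that all 8 bits of the 32K x 8 tag chip hold address tag in write-through mode, while one of them serves as a dirty bit in write-back mode.

```python
# Sketch: cacheable area of a direct-mapped L2 cache with an external tag RAM.
# ASSUMPTION (not confirmed for the UMC 8881): write-through uses all 8 bits
# of the 32K x 8 tag chip as address tag; write-back gives up one bit as a
# "dirty" bit, leaving 7 tag bits.

KB = 1024
MB = 1024 * KB


def data_cache_bytes(chips: int, chip_kbytes: int) -> int:
    """Data capacity, e.g. 8 data chips of 32K x 8 = 256KB."""
    return chips * chip_kbytes * KB


def cacheable_bytes(cache_bytes: int, tag_bits: int) -> int:
    """Each tag value selects one cache-sized slice of memory, so a
    direct-mapped cache covers cache_size * 2**tag_bits bytes."""
    return cache_bytes * 2 ** tag_bits


cache = data_cache_bytes(chips=8, chip_kbytes=32)  # ninth chip holds the tags
for policy, tag_bits in [("WT", 8), ("WB", 7)]:
    print(policy, cacheable_bytes(cache, tag_bits) // MB, "MB cacheable")
```

Under that assumption, write-through would cover 64MB and write-back only 32MB. Since DOS 6 uses at most 64MB, this would explain why the 96MB write-through setup caches everything Quake touches, while the write-back setups with more than 32MB leave part of the used memory uncached.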
So, for Quake, optimal performance is obtained with EDO DRAMs and no L2 cache installed or active.
I intend to automate these tests, possibly collecting more numbers. In the long run, I want to continue testing "overly new graphics cards" like the Radeon 9250 or even the GeForce FX 5200 on this platform. If you have suggestions for which benchmarks to include, preferably ones whose runs can be automated, feel free to mention them.
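For automated runs, collecting the Quake results is easy to script once the console output is captured to a text file, for example with Quake's `-condebug` command-line option, which writes the console to `qconsole.log`. How the runs themselves are scripted under DOS is not covered here. This is a minimal sketch, assuming each timedemo result line has the form `<frames> frames <seconds> seconds <fps> fps`. The function names are hypothetical.

```python
import re
from statistics import median

# Assumed form of Quake's timedemo result line in the captured console log:
#   "<frames> frames <seconds> seconds <fps> fps"
# The numbers may be padded with spaces, hence the \s+ separators.
TIMEDEMO_RE = re.compile(r"(\d+)\s+frames\s+([\d.]+)\s+seconds\s+([\d.]+)\s+fps")


def parse_timedemo(log_text: str) -> list[tuple[int, float, float]]:
    """Return (frames, seconds, fps) for every timedemo result in log_text."""
    return [(int(f), float(s), float(fps))
            for f, s, fps in TIMEDEMO_RE.findall(log_text)]


def median_fps(log_text: str):
    """Median fps over repeated runs in one log, or None if none were found."""
    scores = [fps for _, _, fps in parse_timedemo(log_text)]
    return median(scores) if scores else None
```

Running the timedemo several times per configuration and taking the median, e.g. `median_fps(open("qconsole.log").read())`, keeps a single disturbed run from skewing the score.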