First post, by feipoa
The Ultimate 486 Benchmark Comparison
In this study, 28 socket 3 CPUs were tested under identical conditions using 23 different benchmark programs. It is believed that this work is the most comprehensive 486 comparison to date. The intent was to identify each CPU's performance relative to that of a Socket 5/7 Intel Pentium 100 (P54C). The names of the employed benchmark programs, and of which tests were run, can be found on the charts in Appendices 1-2. The test results are broken down into ALU, FPU, and overall performances. This methodology was decided upon as some applications are heavily ALU-specific, as with general clickidy-click Windows use, while others are largely FPU-specific, as with 3D games and mpeg/mp3 playback. The combinations of these tests are averaged in the Overall Performance chart.
EDIT (Nov. 2018): If you are looking for more of a 3D graphic intensive benchmark comparison performed on the socket 3 platform, you may wish to view this supplemental thread, https://www.vogons.org/viewtopic.php?f=63&t=61639
It is commonly asked what the fastest socket 3 486 CPU is. Depending on your computing goals, the answer may vary. An appropriately configured IBM/Cyrix 5x86-133 and an AMD X5-160 were the fastest commonly available CPUs for ALU-specific operations and were approximately equivalent to a Pentium 100. An IBM/Cyrix 5x86-133 (and perhaps an Intel POD-100-WB, if stable) was the fastest commonly available CPU for FPU-specific operations and was approximately equivalent to a Pentium 90. For the overall performance, an IBM/Cyrix 5x86-133 was the fastest commonly available CPU and was equivalent to a Pentium 100. However, if you are lucky enough to own an AMD X5-133/160 which overclocks well at 200 MHz, this configuration would be the fastest possible for ALU-specific and overall tasks.
Appendices 1-2 tabulate the raw data computed from the employed benchmark programs, while Appendices 3-4 tabulate the data normalised to that of a Pentium 100. In cases where the tests contained units of seconds, the values were inverted to reflect an increasing trend with increasing performance.
The bar chart in Appendix 5 shows the integer-only, or arithmetic logic unit (ALU), performance in descending order for each CPU, including the normaliser Pentium 100 CPU, which by definition will always have a value of 1.0 x 100. The normalised values from Appendices 3-4 have been averaged and multiplied by 100 to be more pleasing to the eye. The ALU chart contains the average of tests 5, 8, 20, 22, 59, 65, 67, and (75-78), the results of which have been tabulated in Appendix 4 under test Integer. The parentheses around a group of test results, such as (75-78), indicate that these tests were averaged independently as a single test so as to not give too much weight to any one benchmark program.
Similarly, Appendix 6 shows the averaged floating-point unit (FPU) performance, the values of which were taken from tests 4, 6, 9, 21, 23, 24, 48, 58, 60, 66, 68, 74, and (79-82). Appendix 7 portrays the overall CPU performance, and averages tests [1, 2, 3, 33, 36, 39, 44, 45, 46, 47, 54, 55, 57, 73], [5, 8, 20, 22, 59, 65, 67, (75-78)], and [4, 6, (9, 21), (23, 24), (48, 58), (60, 66), (68, 74), and (79-82)]. The pairing of the last several FPU tests was done so that the number of averaged FPU tests equated to the number of averaged ALU tests. There are also tests included in the Overall Performance chart which were not included on the ALU and FPU charts because their test methodology was either not ALU- or FPU-specific, or could not be determined.
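The normalisation and grouped averaging described above can be sketched roughly as follows. This is only an illustration of the stated method, not the spreadsheet actually used for the appendices; the function names and the example values are hypothetical.

```python
def normalise(raw, reference, lower_is_better=False):
    """Return a result relative to the Pentium 100 reference (1.0 = equal)."""
    if lower_is_better:
        # Timed tests (seconds) are inverted so that a larger value is faster.
        return reference / raw
    return raw / reference


def chart_score(normalised, groups):
    """Average each parenthesised group first, then average the groups, then x 100."""
    group_means = []
    for group in groups:
        values = [normalised[test] for test in group]
        group_means.append(sum(values) / len(values))
    return 100 * sum(group_means) / len(group_means)


# Hypothetical normalised results keyed by test number (NOT data from the appendices).
example = {5: 0.90, 8: 0.80, 75: 0.60, 76: 0.70, 77: 0.80, 78: 0.70}

# Tests 5 and 8 count individually; tests 75-78 are averaged as a single entry.
alu_groups = [(5,), (8,), (75, 76, 77, 78)]
score = chart_score(example, alu_groups)
print(round(score, 1))
```

Grouping (75-78) this way means one benchmark program contributes a single entry to the average no matter how many sub-tests it reports.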
● PC Chips M919 v3.4 B/F Motherboard (Hsing Tech) - UMC 8881F/8886BF chipset [BIOS: 05/06/1996]
● 128 MB Fast-page mode RAM (60 ns) [BIOS: 0WS/0WS]
● 256 KB Double-banked L2 SRAM Cache (15 ns), Write-back w/ 32 MB cacheable range [BIOS: 2-1-2]
PCI Slot 1 = Adaptec 2940U2W PCI SCSI Controller w/Seagate ST373307LW Ultra320 Hard drive
PCI Slot 2 = SIIG Intek21 (TKP2U022) OPTi 82C861 USB Host Controller (Disabled in Windows)
PCI Slot 3 = Matrox Millennium G200 PCI Graphics Card, 16 MB SDRAM
ISA Slot 2 = 3Com 3C509B-TPO Etherlink III Network Interface Card
ISA Slot 4 = Creative Labs AWE64 Gold, 4MB (CT4390)
● Windows 98SE: 1024x768x16bit, DirectX 6.1a, All security updates installed as of April 1, 2011
● L2 cache set to Write-through w/POD83-WT for stability in Windows
● Biostar MB-8433UUD v3.x UMC chipset motherboard [BIOS: UUD960326S, 03/26/1996] was used to test the POD83/POD100 in write-back mode since the M919 did not support the POD in Write-back mode. The MB-8433UUD and M919 both contain late model UMC 8881F/8886BF chipsets and are expected to yield test results consistent with one another. Three CPUs (A*, B*, C*) were later added using the MB-8433UUD.
● Some Windows test results (#48, 55, 64, and 72) will improve when all RAM is cacheable, i.e. when using 32MB RAM for 256KB WB L2 cache
● Doom fps = gametics / realtics x 35, where gametics is 2134 for demo3 and realtics is measured
● 3DMark99Max (C*, IBM 5x86C-133-FF): 3DMarks = 15, CPU Marks = 5, 3DMark99: 19
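As a small worked example of the Doom formula in the notes above, the conversion from realtics to frames per second might look like the following. The realtics values passed in are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
TICS_PER_SECOND = 35    # the constant from the formula above
DEMO3_GAMETICS = 2134   # gametics for demo3, as stated in the notes


def doom_fps(realtics, gametics=DEMO3_GAMETICS):
    """fps = gametics / realtics x 35."""
    return gametics / realtics * TICS_PER_SECOND


# Hypothetical realtics values, for illustration only.
print(doom_fps(2134))   # realtics equal to gametics
print(doom_fps(4268))   # twice as many realtics, i.e. half the frame rate
```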
PENTIUM REFERENCE SYSTEM (used for normalisation of data)
● AZZA PT-5IT v2.1 Motherboard - Intel 430TX chipset [BIOS: 07/17/1998]
● Intel Pentium-100 P54C CPU, S-spec: SX963, CPUID: 0525, Cache: 16KB-split Write-back
● 512 KB L2 Pipeline Burst SRAM Cache (12 ns TAG), Write-back w/64 MB cacheable range
● All other hardware identical to 486 test system
CYRIX 5X86 REGISTER BIT SETTINGS*
[PCR0=5, CCR1=2, CCR2=C6, CCR3=1C, CCR4=38, WBE (CD=0, NW=1)] (units are in hexadecimal)
RSTK_EN = 1
Enables the return stack so that RET instructions will speculatively execute following a CALL.
BTB_EN = 0
Invokes the branch target buffer for instruction addresses, thereby inducing branch prediction. Works reliably with stepping 1 CPUs only.
LOOP_EN = 1
Enables the prefetch buffer loop for destination jumps still present in the prefetch buffer (prevents buffer flushing/reloading).
LSSER = 0
If set to 0, memory reads and writes to the load/store memory management unit can be reordered for optimum performance.
USE_WBAK = 1
Enables write-back L1 cache pins.
WT1 = 1
Enables write-through in region 1 (640KB-1MB). Forces all writes to region 1 that hit the L1 cache to be sent to external bus.
BWRT = 1
Enables the use of 16-byte burst write-back cycles.
LINBRST = 1
Enables a linear address sequence while performing burst cycles (as opposed to i486 "1+4" address sequencing).
FP_FAST = 1
Enables Fast FPU exception handling.
MEM_BYP = 1
Enables memory read bypassing so that data can be read from the write buffers prior to being written to external memory.
DTE_EN = 1
Enables the directory table entry cache.
FP_FAST = 0 sometimes required for WinTune98 Direct3D test only
FP_FAST = 0 required for Bytemark Neural Net test only
FP_FAST = 1 possible for Bytemark Neural Net test if BTB = 1 and LOOP_EN = 0
BWRT = 0 required for stability (DOS / Windows)
RSTK = 0 required for stability (Windows)
RESULTS - ALU
From Appendix 5, it is clear that the Cyrix/IBM 5x86-133 had the best ALU performance for non-overclocked CPUs, whereby the Cyrix 5x86-120 was next in-line with 8% less performance. It may be argued, however, that the AMD X5-160 is long-term stable at 160 MHz and should not be considered an overclocked CPU, thereby outperforming the Cyrix 5x86-133 by 6%, that is, by 6 "Pentium points".
[Note: All referenced percent increases/decreases noted in this study are relative to the Pentium 100 reference CPU and not to the individual increases/decreases between compared CPUs. An increase of 10% can be thought of as 10 "Pentium points", meaning that this increase would be similar to an upgrade from a Pentium-90 to a Pentium-100. WB refers to L1 cache in write-back mode, while WT refers to L1 cache in write-through mode. If WB/WT is not specified, refer to L1 Cache Type on the charts in Appendices 1-4. BP is for a Cyrix 5x86 series processor with branch prediction enabled, while EO is for a Cyrix 5x86 series processor with its enhancement features disabled. An FF suffix refers to running the motherboard with a 66 MHz front-side bus (i.e. as opposed to a 33/40 MHz FSB). DR refers to CPU results which were derived from the linear slope of identically branded CPUs. More on DR CPUs can be found in Derived CPU Results. Lastly, RG refers to a CPU tested by Retro Games 100.]
If ALU performance is of primary interest, it was recently discovered that running an IBM/Cyrix 5x86 with a front-side bus (FSB) of 66 MHz greatly improved performance, whereby the IBM 5x86C-133-FF-BP's ALU performance closely approached that of the AMD X5-160. It is important to consider, though, that CPUs A*, B*, C* (AMD X5-200-RG, IBM 5x86C-133-FF-BP, IBM 5x86C-133-FF, respectively) were not run on the same system as the other CPUs portrayed in this study, so their scores will be very marginally higher than those of the other CPUs reported here. This is due to the entirety of the motherboard's RAM being cacheable by the system's L2 cache. For more on this topic, refer to Additional Discussion. While the two tested IBM 5x86C-133's are overclocked CPUs, a non-overclocked Cyrix 5x86-133 may also be run with the -FF and -BP suffixes and be considered a non-overclocked CPU yielding results consistent with the IBM 5x86C-133-FF-BP. It may also be counter-argued that running a 486's northbridge controller at 66 MHz is overclocking the motherboard, regardless of how stable the system appears. To please the masses, the IBM 5x86C-133-FF-BP will be considered a pseudo-overclocked CPU.
The massively overclocked AMD X5-200-RG was recently added to the benchmark charts and was tested by www.vogons.org user Retro Games 100. The motherboard used for these tests was a Biostar MB8433 UUD v3.1 with an identical graphics adapter, so that CPUs A*, B*, and C* are equivalently comparable. The long-term stability of the X5-200-RG is currently unknown, but if deemed stable and appropriately cooled, it would certainly be the fastest usable overclocked 486 inasmuch as ALU performance is concerned.
Clock-for-clock, the Cyrix 5x86 133, 120, and 100 CPUs outperformed all AMD DX4/DX5 and Intel DX4 parts, however the Intel DX4-100 and DX4-120 pieces weighed in very closely to the equivalently clocked Cyrix 5x86 CPUs. Following the trend, had Intel made a real DX4-120 (not one that required overclocking) or a DX5-133, their arithmetic logic units would have outperformed a similarly clocked AMD part by a significant margin.
[Note: The Cyrix 5x86-100-BP with branch prediction enabled is a stable configuration on Stepping 1, Revision 3 CPUs (S1R3). Branch prediction is DOS-only stable on Stepping 0, Revision 5 CPUs (S0R5) and was, therefore, not enabled for general testing. To retract slightly from this statement, it was recently determined that branch prediction will function in Windows using S0R5 CPUs if the system is first booted into DOS then into Windows (i.e. by typing win at the command prompt). Using this method, the entire suite of Windows tests was easily completed with an IBM 5x86C-133-FF-BP. It is still unclear whether or not Cyrix 5x86-120/133 CPUs were ever produced with S1R3; however, the Cyrix 5x86-100 came in both S1R3 and S0R5 flavours. All S1R3 units surveyed to date had production dates earlier than the S0R5 units; this may indicate that the S0R5 pieces are the more refined revisions.]
Observing just the three DX4-100's (Intel, AMD-WT, and Cyrix), the Intel piece won by a landslide, followed by the AMD and Cyrix parts. This may be understood by the following explanation: The Intel CPU has 16 KB of write-back cache while the Cyrix part only has 8 KB of write-back cache. The AMD falls in last because it has 8 KB of write-through cache. An AMD DX4-100 with 8 KB of write-back cache does exist, but was not available for testing. There also exist 16 KB, write-back versions of the AMD DX4-100 (which were later added to the chart and are referred to as AMD DX4-100-WB) which emerged more than a year after the X5 133. The AMD DX4-100-WB-16KB was not commonly found in consumer-marketed 486 machines, however, for completeness, a 100 MHz down-clocked X5-133 was added to the charts to simulate this CPU. The AMD DX4-100-WT was the most common of the AMD DX4's at 100 MHz. The AMD DX4-100-WB now steps ahead of the Cyrix DX4-100 in ALU performance (it has 8 KB more cache than the Cyrix), however still falls considerably behind the Intel DX4-100. The Intel piece clearly had a performance edge for DX4-100 class processors.
Considering that the ALU performance of the Intel DX4-100 and DX4-120 CPUs was similar to that of the Cyrix 5x86 units at the same clock rate, it may be that the Intel DX4 also contains some architectural enhancements, especially considering that the Intel DX4-120 marginally outperformed the AMD DX5-133 in ALU-focused operations.
Clock-for-clock, the ALU performances of the Cyrix 5x86-100-BP, Intel DX4-100, and AMD DX4-100-WB were 73%, 69%, and 62% of a Pentium-100 (P54C), respectively.
Observing the three DX2-66 pieces (Intel, AMD, and Cyrix), they all displayed similar performances at 66 MHz. This is surprising considering that the Cyrix company literature mentions the Cyrix DX2-66 as containing write-back cache, while the AMD and Intel units tested both had 8 KB of write-through cache. It may be that either the Cyrix part really contains WT cache or that the motherboard placed its cache in WT mode (one of the cache utility programs claimed the CPU was in WT mode, while another mentioned it was in WB mode. CTCM and Chkcpu16 are usually in agreement, but not for this CPU). The IBM-Cyrix literature mentions a 15% performance increase for chips with write-back cache due to eliminating unnecessary external memory write cycles. An Intel DX2-66 with 8 KB of write-back cache also exists, however it was far less common than the WT version and was not available for testing.
As far as the Pentiums are concerned, the P54C-100 (the reference CPU) contains half the L1 cache of the POD 100 WB (Socket 3, Pentium Overdrive) and scored 7% better. This can easily be understood as the P54C-100 was run on a motherboard with a 66 MHz FSB and with pipeline burst cache. By comparing the POD-83-WB with that of the POD-83-WT, it is clear that using the WB caching scheme over the WT scheme contributes significantly to performance, 15% in this case. Unfortunately, the POD 100 may be hit-or-miss with finding one which will overclock well to 100 MHz. Only 1 in 3 units tested for this project overclocked well enough in Windows to run the complete set of benchmark tests. The POD-100 in WT mode overclocks even worse than in WB mode, so this was used as a basis to establish which 1 in 3 were the best overclockers.
[Note: Only the M919 seemed to operate with WT mode correctly on the POD, whereas if you set the MB 8433UUD motherboard to WT mode on a POD, the cache utility programs would still report the CPU to be in WB mode.]
Turning off the Cyrix next generation enhancements, as witnessed with Cyrix 5x86-133-EO, brought the ALU performance down to a level barely above that of the AMD X5-133. Fortunately, most socket 3, PCI-based motherboards work with the majority of these enhancements enabled, with the worst case leaning towards BWRT and LINBRST being non-functional on some older PCI-based motherboards. Further individualised testing would be required to determine what significance each enhancement has over the benchmark results. No such testing is currently planned; however, it is evident from the ALU Performance chart that enabling branch prediction (5x86-100-BP) increased ALU performance by 3%. It has also been observed that turning on FP_FAST increased FPU performance by 10%.
Looking again at the AMD DX2-66 score of 40.3 and doubling it (to simulate 133 MHz operation), we get a score of 80.6, which is about the same as an AMD X5-133 (at 82.3). It appears as if not much has changed architecturally with the AMD unit, except for whatever layout rules and technology were needed to enable 133 MHz operation on a DX2. The increase was only 2.04 fold, whereby a 2 fold increase would be expected based on clock frequency alone. 2.04/2 = 1.02, or a 2% increase. This will be referred to as the design enhancement factor, or DEF. Using the same analogy for the Cyrix DX2-66 and Cyrix 5x86-133, we see a DEF of 1.20, or 20%! Through linear extrapolation, the ALU performance of a fictitious Intel DX2-60 was obtained, resulting in an Intel DX2-60 / DX4-120 DEF of 1.06, or a gain of 6%.
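The DEF arithmetic can be written out as a short sketch. The function name is made up for illustration; the two ALU scores are the AMD DX2-66 and X5-133 values quoted in the paragraph above.

```python
def design_enhancement_factor(slow_score, fast_score, clock_ratio=2.0):
    """Observed speed-up divided by the speed-up expected from clock frequency alone."""
    return (fast_score / slow_score) / clock_ratio


# ALU scores quoted in the text: AMD DX2-66 = 40.3, AMD X5-133 = 82.3.
amd_alu_def = design_enhancement_factor(40.3, 82.3)
print(round(amd_alu_def, 2))   # 1.02, i.e. a 2% gain beyond the clock doubling
```

A DEF above 1.0 suggests architectural gains beyond the clock increase, while a DEF below 1.0 suggests the faster part scales worse than its clock would predict.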
RESULTS - FPU
For non-overclocked CPUs, Appendix 6 indicates that the Cyrix 5x86-133 had the best FPU performance, whereby the Intel POD-83-WB was next in-line with 4% less performance. A Cyrix 5x86-120 and POD-83-WB were a close match, with a 3% lead by the POD-83-WB. A Cyrix 5x86-120, and even a POD-83-WT in write-through mode, outperformed the AMD X5-160 in FPU operations. Looking at the charted FPU results, an AMD X5-160 was approximately equivalent to a Cyrix 5x86-100 in floating-point operations, while an AMD X5-133, Cyrix 5x86-80, and Intel DX4-120 demonstrated similar performance.
For overclocked CPUs, the IBM 5x86C-133-FF-BP fell short of an Intel POD-100-WB by 5% and an AMD X5-200-RG fell short of an Intel POD-100-WB by 10%.
Observing just the three DX4-100's (Intel, AMD, and Cyrix), the Intel piece won by only a slight margin (2%) - the three were all basically equivalent in FPU operations. The same can be said for the three DX2-66 pieces; their performance deviated by less than 1%. Assuming all three DX2 / DX4 CPUs were available concurrently for purchase and there was a steep cost differential, the obvious decision would have been to buy the most conservatively priced CPU, that is, if FPU performance was the primary interest. In 1996, the main applications demanding FPU power were mathematical and simulation-based modeling for the scientific research community, as well as the newly released 3D game, Quake. Widespread mp3 decoding interest followed suit shortly thereafter.
Clock-for-clock, FPU performances of the Cyrix 5x86-100, Intel DX4-100, and AMD DX4-100 were 64%, 44%, and 42% of a Pentium-100 (P54C), respectively. Certain 3D games, such as Quake, make heavy use of Pentium-specific optimisations, so the FPU performance difference with Quake between Pentium vs. non-Pentium chips is expected to increase more than with other benchmark tests. Taking Quake as an example, and looking at Appendix 3, Test 47, we see that the performances of the Cyrix 5x86-100, Intel DX4-100, and AMD DX4-100 are 53%, 45%, and 43% of a Pentium-100 (P54C), respectively. As for the AMD DX4-100-WB, the addition of 8 KB more L1 cache, and with the CPU in write-back mode, did not seem to improve FPU performance.
When observing the Quake score of the POD-100-WB, it should be mentioned that this CPU was unable to complete the Quake timedemo test. This score had to be linearly extrapolated from the scores of the POD-83-WB, POD-83-WT, and POD-100-WT (not included on the charts). The fastest playable Quake score was from the POD-100-WT, which scored 23.6 fps. The POD-100-WB, however, was able to complete all other tests found in this report.
Looking again at the design enhancement factor for FPU operations, the AMD DX2-66 / X5-133 showed a DEF of 0.97, or a 3% loss, while the Cyrix DX2-66 and Cyrix 5x86-133 showed a DEF of 1.45. For the Cyrix 5x86-133, enhancements in CPU architecture (as compared to an old generation DX2-133) increased floating-point performance by 45%! Through linear extrapolation, the FPU performance of a fictitious Intel DX2-60 was obtained, resulting in an Intel DX2-60 / DX4-120 DEF of 1.0, or no gain/loss.
RESULTS - OVERALL
For non-overclocked CPUs, Appendix 7 indicates that the Cyrix 5x86-133 displayed the best overall performance, trailing the P54C-100 reference CPU by 8%. With a Cyrix/IBM 5x86-133 running on a 66 MHz FSB (same as the P54C-100 reference CPU) and with branch prediction enabled, the CPU matches the overall performance of an Intel Pentium 100. This is extraordinary considering that the P54C-100 was run on a motherboard which had technology at least 2 years newer and pipeline burst L2 cache! It would be interesting to see a similar CPU comparison done using the same hardware for the socket 7 ensemble of CPUs; particularly interesting would be how well a Cyrix 6x86 at 100 MHz performs compared to a P54C-100 for the same set of tests.
For the overall best overclocked CPU performance, the AMD X5-200-RG led the race, with the IBM 5x86C-133-FF-BP falling just behind. For typical use, these two CPU configurations would be fairly competitive and are hereby independently awarded the title of World's Fastest 486 ®. An AMD X5-133 or X5-160 which will stably overclock to 200 MHz is extremely rare, though, whereas the IBM 5x86C-100HF used in this study was relatively popular since IBM manufactured these chips well after Cyrix ceased production interest. The IBM branded chips were really just Cyrix designed chips with IBM external markings. Both versions came from IBM's semiconductor fabrication facility. It is thought that IBM maintained stricter fabrication and qualification yields on the 5x86 than Cyrix did, so the IBM branded versions may have a better chance of running at 133 MHz.
Comparing the Cyrix 5x86-133 to the IBM 5x86C-133-FF, we see a 6% overall performance enhancement in using a 66 MHz front-side bus as compared to using a 33 MHz front-side bus. The PCI bus for both cases was maintained at 33 MHz through the motherboard's onboard PLL chip. With a 66 MHz FSB, the socket 3 486 motherboard performs closer to a socket 5 Pentium motherboard with regular SRAM, which makes for a fairer chip-for-chip comparison to the Pentiums. Most users of an IBM 5x86-100HF or Cyrix 5x86-120 will likely run their chip at its advertised speed of 120 MHz, however a 15% speed boost can be obtained if the chip is run at 133 MHz (CLKMUL set to 2x, FSB set to 66 MHz). Recent tests seem to indicate that this modest overclock is stable in Windows 98SE, WinNT 4.0, and Win2000 when the CPU's core voltage is set to 3.85 volts and a cooling fan is added to the heatsink.
The POD-100-WB was comparable to the Cyrix 5x86-133, although its long-term stability remains an open question. The POD-83-WB's performance fell somewhere between that of a Cyrix 5x86-100-BP and a Cyrix 5x86-120. For non-overclocked and readily obtainable CPUs of the time, the Cyrix 5x86 120 was your best bet, with overall performance similar to that of a fictitious P54C-83.5. To include conservatively overclocked CPUs, the AMD X5-160 is a good compromise and performed similarly overall to a Cyrix 5x86-120.
The overall performance of the AMD X5-133 was marginally less than that of a Cyrix 5x86-100-BP, which was no surprise considering this is a very common conclusion that many benchmark results have independently arrived at. If you have your mind set on a Pentium Overdrive socket 3 processor and if your motherboard only works with the POD-83 in write-through mode, you're better off with an AMD X5 133 or an Intel DX4-120. The overclocked tests for the Intel DX4-120 never hung, however long-term stability hasn't been firmly established.
Not surprising were the overall Cyrix 5x86 scores, which weighed in closely to their expected Pentium equivalents. The Cyrix 5x86-100 has a Pentium rating of 75, or PR-75, whereby the results from Appendix 7 placed it at about a Pentium 72.5. The Cyrix 5x86-120 has a Pentium rating of PR-90, but fell short of this goal with a measured Pentium equivalent of only 83.5. The Cyrix 5x86-133 has a Pentium rating of PR-100 and the results awarded it a Pentium 91.5; however, if using branch prediction and a 66 MHz FSB, it met right up to that PR-100 expectation. Lastly, the AMD X5-133 has a Pentium rating of PR-75 and the results studied here indicated it was approximately equal to a Pentium 71.4.
Clock-for-clock, the overall performances of the Cyrix 5x86-100-BP, Intel DX4-100, AMD DX4-100-WB and AMD DX4-100-WT were 72.5%, 58.9%, 58.5%, and 51.7% of a Pentium-100 (P54C), respectively. From the AMD portion of the comparison, it was evident that having 16 KB of WB cache improved CPU performance by 7% compared to those with only 8 KB of WT cache.
For the overall design enhancement factor, the AMD DX2-66 / X5-133 had a DEF of 0.94, or a loss of 6%. Through linear extrapolation, the overall performance of a fictitious Intel DX2-60 was obtained, resulting in an Intel DX2-60 / DX4-120 DEF of 0.98, or a loss of 2%. The AMD X5 and Intel DX4 parts seem to contain no prominent design enhancements, although the Intel DX4 displayed an ALU DEF of 6%. The DEF results for the AMD X5 seem to be in agreement with reports made elsewhere stating that the AMD X5 is merely a clock-enhanced DX2/DX4. The Cyrix DX2-66 / 5x86-133 had an overall DEF of 1.26, or a 26% increase. Using the same extrapolation method, we can also obtain an Intel DX2-41.5 / POD83-WB DEF of 1.59, or a 59% increase. The CPU design architectural enhancements of the POD83 were twice that of the Cyrix 5x86; it is unfortunate that Intel did not produce a 100, 120, or 133 MHz version of the Pentium Overdrive for the socket 3 platform. Such high speed Pentium Overdrives would have undoubtedly squashed all other 486 competitors and perhaps even put AMD out of business. For AMD, sales from the X5-133 continued well into 1999 and were what kept them afloat during the K5 flop.
Unfortunately the three AMD X5's tested would not operate at 200 MHz in either the M919 or MB 8433UUD motherboards. From results reported elsewhere (Retro Games 100), it may be that an AMD X5-133ADW with CPUID 04F4 (and perhaps with a production date of 9630DPE) is required for operation at 200 MHz and 5V. One other online user with an AMD marked as an X5-160 has also reported success at 200 MHz. The AMD X5's tested for this study were of flavours 133ADZ (0494 & 04F4) and 133ADW (0494). The results from Retro Games 100 were later added to this benchmark comparison. Refer to the section under Derived CPU Results to see how an AMD X5-200 compares to other socket 3 CPUs tested herein.
It is thought that the 5x86-120 was somewhat plagued with running a 40 MHz front-side bus (as opposed to a relatively standard 33 MHz bus), but this only gave problems with select PCI cards, notably older 3Com network cards. Some modern 486 motherboards have a workaround for this issue by either auto-enabling a 2/3 FSB multiplier (so that the PCI bus runs at 27 MHz, as is suspected with the M919), or by implementing a user-controllable 1/2 and/or 2/3 multiplier option in the BIOS (as is the case with the MB 8433UUD). If a Cyrix 5x86-120 was problematic in your motherboard and if you were unable to obtain a Cyrix 5x86-133, the next best choice was probably a POD-83-WB or AMD X5-160. Unfortunately, the POD-83's were over-priced, late to market, and many motherboards didn't work properly with them installed. There exist some hard-to-find AMD X5 CPUs marked as 150 and 160 MHz. It is thought that some of the 133ADZ units are actually rated for 160 MHz, but down-marked by AMD for marketing reasons, particularly so that 486 sales did not heavily distract from AMD K5 sales. By this argument, the AMD X5-160 may be considered a non-overclocked CPU, whose overall performance, as noted in Appendix 7, equates to that of a Cyrix 5x86-120.
[Note: The difficulty in finding Cyrix 5x86 CPUs is due, in-part, to a short production run, from approximately August 1995 to February 1996. Production dates for the 5x86-133 were generally from the first several weeks of 1996, though some from Dec. 1995 were also produced. There exist mixed reports online which claim that all of the 5x86-133 CPUs were bought by either Evergreen Technologies, Gainbery Computer Products, or Computer Nerd (RA4) for use in 486 upgrade kits. The AMD X5, by contrast, was produced all the way from 1996 through 1999.]
The tabulated results found in Appendices 1-2 are not intended to reflect the absolute best performance of each CPU, but rather to serve as an equal comparison of CPUs within an identical system. Consider a motherboard with 1024 KB of L2, or Level 2, cache; it has a cacheable range of 128 MB, whereby the employed system, with 256 KB L2 cache, can only cache up to 32 MB of this 128 MB (in write-back mode). The remaining 96 MB (128 MB - 32 MB) remains uncached by the L2 cache system (although it is still cached by the CPU's L1 cache) and any data in RAM being accessed by the CPU which isn't L2 cached will usually be processed ~2 times slower than if it was L2 cached. For the purposes of this study, the effect of this uncached RAM seemed to be witnessed only in Windows and by specific benchmark programs, namely CPUMark, SuperPi, PassMark, and WinTune-Memory. In the case of these specific benchmark tests, using less RAM (32 MB, for this system) will actually increase test performances, e.g. for the Cyrix 5x86-133, CPUMark99 increased from 3.8 to over 5. Similarly, the use of a different graphics card, or a 40 MHz front-side bus will also affect benchmark performance. Luckily, these various system-wide optimisations should not affect each CPU's comparative benchmark score since all CPUs were tested in an identical system. If one were to re-do the ensemble of tests, it may be preferred to use an amount of RAM that is fully cached by the Level 2 cache. In this way, these particular Windows-based scores would be more properly comparable to test results taken by other users and in other systems.
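To make the cacheable-range arithmetic concrete, here is a rough sketch built only on the two data points quoted above (256 KB of L2 caching 32 MB and 1024 KB caching 128 MB, both in write-back mode). Scaling linearly between those points is an assumption made for illustration; on real boards the range also depends on the tag RAM and chipset, so treat the helper names and the ratio as hypothetical.

```python
# Ratio taken from the text: 256 KB of write-back L2 covers 32 MB of RAM.
# ASSUMPTION: the cacheable range scales linearly with L2 size.
MB_CACHEABLE_PER_KB_L2 = 32 / 256


def cacheable_range_mb(l2_kb):
    """Estimated RAM (in MB) covered by the L2 cache in write-back mode."""
    return l2_kb * MB_CACHEABLE_PER_KB_L2


def uncached_ram_mb(ram_mb, l2_kb):
    """RAM (in MB) that falls outside the L2 cacheable range."""
    return max(0.0, ram_mb - cacheable_range_mb(l2_kb))


print(uncached_ram_mb(128, 256))    # the test system: 96.0 MB left uncached
print(uncached_ram_mb(32, 256))     # 32 MB of RAM: everything is cached
print(uncached_ram_mb(128, 1024))   # 1024 KB of L2: everything is cached
```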
When using a Biostar MB-8433UUD motherboard (in contrast to the employed M919), the PCI bus can be set to 40 MHz without the fear of an auto-divider sneaking its way in. This increases performance with 40 MHz FSB CPUs only (5x86-120, X5-160, Intel/AMD DX4-120, and POD-100-WB). For example, with the PCI bus fixed at 40 MHz (as opposed to 27 MHz), a 15% improvement was noted in the 3Dbench score when tested with a Cyrix 5x86-120 in an MB-8433UUD motherboard. This enhancement is only noticed for highly graphic applications, like Quake, or any benchmark requiring extensive use of the PCI bus. Considering each 40 MHz FSB CPU was given the same treatment (of a 2/3 auto-multiplier on the M919) and noting that the 40 MHz FSB issue affected only 10% of the benchmark programs, most of the performance hit will average out given the large number of tests being averaged. For an entirely fair CPU-only comparison, the FSB would need to be fixed for all CPUs. This, unfortunately, is not possible without implementing a custom PLL circuit.
For CPUs running with a 40 MHz FSB, the observed decrease in heavy graphics-based performance (mainly in Quake, Doom, Pcpbench, and 3Dbench) was due to the M919 automatically adding a 2/3 FSB-to-PCI multiplier, thereby slowing graphics throughput to 27 MHz. Unfortunately the phase-locked loop (PLL) circuit used on 486 motherboards isn't complex enough to set the PCI bus to a fixed 33.3 MHz while maintaining the 40 MHz FSB to the RAM, although some Pentium-class motherboards have PLL circuits which allow for a relatively constant FSB frequency via the BSEL signal.
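The FSB-to-PCI relationship discussed above is a simple multiplication, sketched below. The multiplier values are the ones named in the text (1/2 and 2/3, plus 1 for an undivided bus); the function name is hypothetical.

```python
from fractions import Fraction


def pci_clock_mhz(fsb_mhz, multiplier):
    """PCI bus clock produced by applying an FSB-to-PCI multiplier."""
    return float(fsb_mhz * multiplier)


# 40 MHz FSB with the M919's suspected 2/3 auto-multiplier: roughly 27 MHz.
print(round(pci_clock_mhz(40, Fraction(2, 3)), 1))

# 40 MHz FSB with no divider, as can be forced on the MB-8433UUD.
print(pci_clock_mhz(40, Fraction(1, 1)))

# 66 MHz FSB with a 1/2 multiplier keeps the PCI bus at 33 MHz.
print(pci_clock_mhz(66, Fraction(1, 2)))
```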
DERIVED CPU RESULTS
There exist some preliminary tests online of an AMD X5-133 being overclocked to 180 MHz (3 x 60 MHz) and to 200 MHz (4 x 50 MHz), using a Shuttle HOT-433 v1-3 and a Biostar MB8433-UUD v3.1, respectively. Unfortunately, cache and memory wait state settings may need to be de-optimised from the static values used in this comparison to maintain stability. There are also reports of a Cyrix 5x86-100 being overclocked to 150 MHz (3 x 50 MHz) well enough to run several of the utilised benchmark programs, however cache and memory wait states were also significantly reduced. A second account of such an AMD-based system surfaced online in late 2010. It employed an AMD X5-160ADZ, overclocked to 200 MHz on a SOYO SY-4SAW2 motherboard, whereby the performance was stable enough to install and run Win98SE and WinME.
[Note: UMC-based 486 motherboards contain undocumented FSB jumper settings to enable 60/66 MHz operation.]
While the exact BIOS, memory, and cache settings adopted in these systems are not fully documented, the results are promising for the would-be retrocomputing overclocker. Without evidence of prolonged stability, such systems may be more of a curiosity than a practicality. In light of these curious findings, data from the current results have been extrapolated linearly to produce the results of an AMD X5-180, AMD X5-200, and Cyrix 5x86-150. As of the November edition of this comparison, actual results from an AMD X5-200 have been added.
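A linear extrapolation of this kind might be sketched as follows. Which CPUs were actually used to fit each derived result is not stated here, so the two data points below (the AMD DX2-66 and X5-133 ALU scores quoted earlier, with nominal clocks of 66.7 and 133.3 MHz) are an assumption chosen purely for illustration, and the output should not be read as the charted X5-200-DR value.

```python
def extrapolate_linear(point_a, point_b, target_mhz):
    """Extend the straight line through two (MHz, score) points to target_mhz."""
    (mhz_a, score_a), (mhz_b, score_b) = point_a, point_b
    slope = (score_b - score_a) / (mhz_b - mhz_a)
    return score_a + slope * (target_mhz - mhz_a)


# ASSUMED fit points: ALU scores quoted in the text, nominal clock speeds.
amd_dx2_66 = (66.7, 40.3)
amd_x5_133 = (133.3, 82.3)

x5_200_estimate = extrapolate_linear(amd_dx2_66, amd_x5_133, 200.0)
print(round(x5_200_estimate, 1))
```

Because the line assumes performance keeps scaling with core clock alone, it ignores any change in FSB speed or wait states, which is one reason a real overclocked system can land above or below the derived figure.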
Results for these overclocked CPUs have been added to the ALU, FPU, and Overall Performance charts and are identified as Derived CPU in Appendices 5-7. If you are lucky enough to have access to a system running with one of these CPUs, your system's performance may not necessarily reach the levels published in these charts. It is sometimes necessary to configure the system settings (via the BIOS) to slower I/O recovery times, cache wait states, and memory wait states, to name a few. Expect to reduce the derived CPU scores shown here anywhere from 0-15%. Put more quantitatively, there have been reported SpeedSys scores for the AMD X5-180 at 67.5 and AMD X5 200 at 75.1, whereby, for comparison, the Cyrix 5x86-133 tested in this study had a score of 73.6.
Looking at the ALU Performance chart, the three derived CPUs are the clear leaders of the race, with the tested AMD X5-200-RG matching perfectly with the linearly extrapolated AMD X5-200-DR. For FPU performance, the Cyrix 5x86-133/150 outperformed even the AMD X5-200-RG, and the Cyrix 5x86-150 stretched curiously close to the POD-100-WB. For the overall performance, an AMD X5-200-RG whopped even the Pentium 100 (P54C) by 9%. The 5.5% gain of the AMD X5-200-RG over the AMD X5-200-DR may be largely attributed to the use of a 50 MHz front-side bus and, to a lesser extent, all RAM being L2 cacheable. A Cyrix 5x86-150 also outperformed the Pentium 100 (P54C) by 2%.
Gentlemen, it's time to throw out those slow Intel Pentium 100's and treat yourself to a 486!