VOGONS


The Ultimate 486 Benchmark Comparison

First post, by feipoa

In this study, 28 socket 3 CPUs were tested under identical conditions using 23 different benchmark programs. It is believed that this work is the most comprehensive 486 comparison to date. The intent was to identify each CPU's performance relative to that of a Socket 5/7 Intel Pentium 100 (P54C). The names of the benchmark programs employed, and the tests that were run, can be found on the charts in Appendices 1-2. The test results are broken down into ALU, FPU, and overall performance. This methodology was chosen because some applications are heavily ALU-specific, as with general clickidy-click Windows use, while others are largely FPU-specific, as with 3D games and mpeg/mp3 playback. The combinations of these tests are averaged in the Overall Performance chart.

EDIT (Nov. 2018): If you are looking for more of a 3D graphic intensive benchmark comparison performed on the socket 3 platform, you may wish to view this supplemental thread, Voodoo 1 vs. Voodoo 2 on a 486

EXECUTIVE SUMMARY

It is commonly asked what the fastest socket 3 486 CPU is. Depending on your computing goals, the answer may vary. An appropriately configured IBM/Cyrix 5x86-133 and an AMD X5-160 were the fastest commonly available CPUs for ALU-specific operations and were approximately equivalent to a Pentium 100. An IBM/Cyrix 5x86-133 (and perhaps an Intel POD-100-WB, if stable) was the fastest commonly available CPU for FPU-specific operations and was approximately equivalent to a Pentium 90. For the overall performance, an IBM/Cyrix 5x86-133 was the fastest commonly available CPU and was equivalent to a Pentium 100. However, if you are lucky enough to own an AMD X5-133/160 which overclocks well to 200 MHz, this configuration would be the fastest possible for ALU-specific and overall tasks.

TEST METHODOLOGY

Appendices 1-2 tabulate the raw data computed from the employed benchmark programs, while Appendices 3-4 tabulate the data normalised to that of a Pentium 100. In cases where the tests contained units of seconds, the values were inverted to reflect an increasing trend with increasing performance.
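The normalisation step described above can be sketched in a few lines of Python. This is only an illustration: the function name, the lower_is_better flag, and the example numbers are assumptions made here, not values taken from the appendices.

```python
def normalise(raw, reference, lower_is_better=False):
    """Return a result relative to the reference CPU (1.0 = Pentium 100)."""
    if raw <= 0 or reference <= 0:
        raise ValueError("benchmark results must be positive")
    if lower_is_better:
        # Timed tests (seconds): invert so a shorter time gives a larger value.
        return reference / raw
    return raw / reference


# Hypothetical numbers, not taken from the appendices.
print(normalise(20.0, 40.0))                        # score-type test
print(normalise(50.0, 25.0, lower_is_better=True))  # time-type test
```

In both hypothetical cases the tested CPU comes out at half the reference, which is the "increasing trend with increasing performance" that the inversion is meant to preserve.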

The bar chart in Appendix 5 shows the integer-only, or arithmetic logic unit (ALU), performance in descending order for each CPU, including the normaliser Pentium 100 CPU, which by definition will always have a value of 1.0 x 100. The normalised values from Appendices 3-4 have been averaged and multiplied by 100 to be more pleasing to the eye. The ALU chart contains the average of tests 5, 8, 20, 22, 59, 65, 67, and (75-78), the results of which have been tabulated in Appendix 4 under test Integer. The parentheses around a group of test results, such as (75-78), indicate that these tests were first averaged together and then counted as a single test so as to not give too much weight to any one benchmark program.

Similarly, Appendix 6 shows the averaged floating-point unit (FPU) performance, the values of which were taken from tests 4, 6, 9, 21, 23, 24, 48, 58, 60, 66, 68, 74, and (79-82). Appendix 7 portrays the overall CPU performance, and averages tests [1, 2, 3, 33, 36, 39, 44, 45, 46, 47, 54, 55, 57, 73], [5, 8, 20, 22, 59, 65, 67, (75-78)], and [4, 6, (9, 21), (23, 24), (48, 58), (60, 66), (68, 74), and (79-82)]. The pairing of the last several FPU tests was done so that the number of averaged FPU tests equated to the number of averaged ALU tests. There are also tests included in the Overall Performance chart which were not included on the ALU and FPU charts because their test methodology was either not ALU- or FPU-specific, or could not be determined.
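As a rough sketch of the grouped averaging described above, the ALU test list can be written with a tuple standing in for the parenthesised group. The function name and the input dictionary layout are assumptions for illustration; the actual spreadsheet formulas used for the charts are not shown in this post.

```python
from statistics import mean

# ALU test list from the text; a tuple marks tests that are averaged
# together first and then counted as a single test.
ALU_TESTS = [5, 8, 20, 22, 59, 65, 67, (75, 76, 77, 78)]


def chart_score(normalised, tests):
    """Average normalised results (1.0 = Pentium 100) and scale by 100."""
    values = []
    for entry in tests:
        if isinstance(entry, tuple):
            # Collapse the group so one program cannot dominate the average.
            values.append(mean(normalised[t] for t in entry))
        else:
            values.append(normalised[entry])
    return 100 * mean(values)


# The reference CPU is 1.0 in every test, so it always charts at 100.
reference = {t: 1.0 for t in range(1, 83)}
print(chart_score(reference, ALU_TESTS))
```

With grouping, a hypothetical CPU that scores 2.0 on tests 75-78 and 1.0 elsewhere charts at 112.5 rather than the roughly 136 it would get if all four sub-tests were counted separately.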

TEST SYSTEM

● PC Chips M919 v3.4 B/F Motherboard (Hsing Tech) - UMC 8881F/8886BF chipset [BIOS: 05/06/1996]
● 128 MB Fast-page mode RAM (60 ns) [BIOS: 0WS/0WS]
● 256 KB Double-banked L2 SRAM Cache (15 ns), Write-back w/ 32 MB cacheable range [BIOS: 2-1-2]

PCI Slot 1 = Adaptec 2940U2W PCI SCSI Controller w/Seagate ST373307LW Ultra320 Hard drive
PCI Slot 2 = SIIG Intek21 (TKP2U022) OPTi 82C861 USB Host Controller (Disabled in Windows)
PCI Slot 3 = Matrox Millennium G200 PCI Graphics Card, 16 MB SDRAM
ISA Slot 2 = 3Com 3C509B-TPO Etherlink III Network Interface Card
ISA Slot 4 = Creative Labs AWE64 Gold, 4MB (CT4390)

● Windows 98SE: 1024x768x16bit, DirectX 6.1a, All security updates installed as of April 1, 2011
● L2 cache set to Write-through w/POD83-WT for stability in Windows
● Biostar MB-8433UUD v3.x UMC chipset motherboard [BIOS: UUD960326S, 03/26/1996] was used to test the POD83/POD100 in write-back mode since the M919 did not support the POD in Write-back mode. The MB-8433UUD and M919 both contain late model UMC 8881F/8886BF chipsets and are expected to yield test results consistent with one another. Three CPUs (A*, B*, C*) were later added using the MB-8433UUD.
● Some Windows test results (#48, 55, 64, and 72) will improve when all RAM is cacheable, i.e. when using 32MB RAM for 256KB WB L2 cache
● Doom fps = gametics / realtics x 35, where gametics is 2134 for demo3 and realtics is measured
● 3DMark99Max (C*, IBM 5x86C-133-FF): 3DMarks = 15, CPU Marks = 5, 3DMark99: 19
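
The Doom frame rate formula given in the notes above can be written out as a small Python helper. The function and constant names are made up for this sketch, and the realtics values in the example are hypothetical rather than measured.

```python
TICS_PER_SECOND = 35   # multiplier from the formula in the notes above
DEMO3_GAMETICS = 2134  # gametics for demo3, as stated in the notes above


def doom_fps(realtics, gametics=DEMO3_GAMETICS):
    """fps = gametics / realtics x 35, with realtics as reported by Doom."""
    if realtics <= 0:
        raise ValueError("realtics must be positive")
    return gametics / realtics * TICS_PER_SECOND


# Hypothetical realtics values, for illustration only.
print(doom_fps(2134))  # realtics equal to gametics
print(doom_fps(4268))  # twice as many realtics, so half the frame rate
```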

PENTIUM REFERENCE SYSTEM (used for normalisation of data)

● AZZA PT-5IT v2.1 Motherboard - Intel 430TX chipset [BIOS: 07/17/1998]
● Intel Pentium-100 P54C CPU, S-spec: SX963, CPUID: 0525, Cache: 16KB-split Write-back
● 512 KB L2 Pipeline Burst SRAM Cache (12 ns TAG), Write-back w/64 MB cacheable range
● All other hardware identical to 486 test system

CYRIX 5X86 REGISTER BIT SETTINGS*

[PCR0=5, CCR1=2, CCR2=C6, CCR3=1C, CCR4=38, WBE (CD=0, NW=1)] (units are in hexadecimal)

-------------------
RSTK_EN = 1
Enables the return stack so that RET instructions will speculatively execute following a CALL.

BTB_EN = 0
Invokes the branch target buffer for instruction addresses, thereby inducing branch prediction. Works reliably with stepping 1 CPUs only.

LOOP_EN = 1
Enables the prefetch buffer loop for destination jumps still present in the prefetch buffer (prevents buffer flushing/reloading).

LSSER = 0
If set to 0, memory reads and writes to the load/store memory management unit can be reordered for optimum performance.

USE_WBAK = 1
Enables write-back L1 cache pins.

WT1 = 1
Enables write-through in region 1 (640KB-1MB). Forces all writes to region 1 that hit the L1 cache to be sent to external bus.

BWRT = 1
Enables the use of 16-byte burst write-back cycles.

LINBRST = 1
Enables a linear address sequence while performing burst cycles (as opposed to i486 "1+4" address sequencing).

FP_FAST = 1
Enables Fast FPU exception handling.

MEM_BYP = 1
Enables memory read bypassing so that data can be read from the write buffers prior to being written to external memory.

DTE_EN = 1
Enables the directory table entry cache.
-------------------

*Cyrix 5x86-133
FP_FAST = 0 sometimes required for WinTune98 Direct3D test only

*Cyrix 5x86-100/120
FP_FAST = 0 required for Bytemark Neural Net test only
FP_FAST = 1 possible for Bytemark Neural Net test if BTB = 1 and LOOP_EN = 0

*Cyrix 5x86-80/100
BWRT = 0 required for stability (DOS / Windows)
RSTK = 0 required for stability (Windows)

RESULTS - ALU

From Appendix 5, it is clear that the Cyrix/IBM 5x86-133 had the best ALU performance for non-overclocked CPUs, with the Cyrix 5x86-120 next in line at 8% less performance. It may be argued, however, that the AMD X5-160 is long-term stable at 160 MHz and should not be considered an overclocked CPU, in which case it outperforms the Cyrix 5x86-133 by 6%, that is, by 6 "Pentium points".

[Note: All referenced percent increases/decreases noted in this study are relative to the Pentium 100 reference CPU and not to the individual increases/decreases between compared CPUs. An increase of 10% can be thought of as 10 "Pentium points", meaning that this increase would be similar to an upgrade from a Pentium-90 to a Pentium-100. WB refers to L1 cache in write-back mode, while WT refers to L1 cache in write-through mode. If WB/WT is not specified, refer to L1 Cache Type on the charts in Appendices 1-4. BP is for a Cyrix 5x86 series processor with branch prediction enabled, while EO is for a Cyrix 5x86 series processor with its enhancement features disabled. An FF suffix refers to running the motherboard with a 66 MHz front-side bus (i.e. as opposed to a 33/40 MHz FSB). DR refers to CPU results which were derived from the linear slope of identically branded CPUs. More on DR CPUs can be found in Derived CPU Results. Lastly, RG refers to a CPU tested by Retro Games 100.]
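
To make the "Pentium points" convention concrete, here is a small Python sketch contrasting it with an ordinary relative percentage. The function names are invented for this illustration; the two input values are the clock-for-clock ALU figures quoted in this section for the Cyrix 5x86-100-BP (73) and Intel DX4-100 (69).

```python
def pentium_points(score_a, score_b):
    """Difference between two chart scores, where Pentium 100 = 100."""
    return score_a - score_b


def relative_percent(score_a, score_b):
    """Conventional percentage gain of A over B (not used in this study)."""
    return (score_a / score_b - 1) * 100


cyrix, intel = 73, 69
print(pentium_points(cyrix, intel))               # 4
print(round(relative_percent(cyrix, intel), 1))   # 5.8
```

So a gap reported here as 4% means 4 Pentium points, even though the Cyrix part is about 5.8% faster than the Intel part when the two are compared directly.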

If ALU performance is of primary interest, it was recently discovered that running an IBM/Cyrix 5x86 with a front-side bus (FSB) of 66 MHz greatly improved performance, whereby the IBM 5x86C-133-FF-BP's ALU performance closely approached that of the AMD X5-160. It is important to consider, though, that CPUs A*, B*, C* (AMD X5-200-RG, IBM 5x86C-133-FF-BP, IBM 5x86C-133-FF, respectively) were not run on the same system as the other CPUs portrayed in this study, so their scores will be very marginally higher than those of the other CPUs reported here. This is due to the entirety of the motherboard's RAM being cacheable by the system's L2 cache. For more on this topic, refer to Additional Discussion. While the two tested IBM 5x86C-133's are overclocked CPUs, a non-overclocked Cyrix 5x86-133 may also be run with the -FF and -BP suffixes and be considered a non-overclocked CPU yielding results consistent with the IBM 5x86C-133-FF-BP. It may also be counter-argued that running a 486's northbridge controller at 66 MHz is overclocking the motherboard, regardless of how stable the system appears. To please the masses, the IBM 5x86C-133-FF-BP will be considered a pseudo-overclocked CPU.

The massively overclocked AMD X5-200-RG was recently added to the benchmark charts and was tested by http://www.vogons.org user Retro Games 100. The motherboard used for these tests was a Biostar MB8433 UUD v3.1 with an identical graphics adapter, so that CPUs A*, B*, and C* are equivalently comparable. The long-term stability of the X5-200-RG is currently unknown, but if deemed stable and appropriately cooled, it would certainly be the fastest usable overclocked 486 inasmuch as ALU performance is concerned.

Clock-for-clock, the Cyrix 5x86 133, 120, and 100 CPUs outperformed all AMD DX4/DX5 and Intel DX4 parts, however the Intel DX4-100 and DX4-120 pieces weighed in very closely to the equivalently clocked Cyrix 5x86 CPUs. Following the trend, had Intel made a real DX4-120 (not one that required overclocking) or a DX5-133, their arithmetic logic units would have outperformed a similarly clocked AMD part by a significant margin.

[Note: The Cyrix 5x86-100-BP with branch prediction enabled is a stable configuration on Stepping 1, Revision 3 CPUs (S1R3). Branch prediction is DOS-only stable on Stepping 0, Revision 5 CPUs (S0R5) and was, therefore, not enabled for general testing. To retract slightly from this statement, it was recently determined that branch prediction will function in Windows using S0R5 CPUs if the system is first booted into DOS then into Windows (i.e. by typing win at the command prompt). Using this method, the entire suite of Windows tests was easily completed with an IBM 5x86C-133-FF-BP. It is still unclear whether or not Cyrix 5x86-120/133 CPUs were ever produced with S1R3, however the Cyrix 5x86-100 came in both S1R3 and S0R5 flavours. All S1R3 units surveyed to date had production dates earlier than the S0R5 units; this may indicate that the S0R5 pieces are the more refined revisions.]

Observing just the three DX4-100's (Intel, AMD-WT, and Cyrix), the Intel piece won by a landslide, followed by the AMD and Cyrix parts. This may be understood by the following explanation: The Intel CPU has 16 KB of write-back cache while the Cyrix part only has 8 KB of write-back cache. The AMD falls in last because it has 8 KB of write-through cache. An AMD DX4-100 with 8 KB of write-back cache does exist, but was not available for testing. There also exist 16 KB, write-back versions of the AMD DX4-100 (which were later added to the chart and are referred to as AMD DX4-100-WB) which emerged more than a year after the X5 133. The AMD DX4-100-WB-16KB was not commonly found in consumer-marketed 486 machines, however, for completeness, a 100 MHz down-clocked X5-133 was added to the charts to simulate this CPU. The AMD DX4-100-WT was the most common of the AMD DX4's at 100 MHz. The AMD DX4-100-WB now steps ahead of the Cyrix DX4-100 in ALU performance (it has 8 KB more cache than the Cyrix), however still falls considerably behind the Intel DX4-100. The Intel piece clearly had a performance edge for DX4-100 class processors.

Considering that the ALU performance of the Intel DX4-100 and DX4-120 CPUs was similar to that of the Cyrix 5x86 units at the same clock rate, it may be that the Intel DX4 also contains some architectural enhancements, especially considering that the Intel DX4-120 marginally outperformed the AMD DX5-133 in ALU-focused operations.

Clock-for-clock, the ALU performances of the Cyrix 5x86-100-BP, Intel DX4-100, and AMD DX4-100-WB were 73%, 69%, and 62% of a Pentium-100 (P54C), respectively.

Observing the three DX2-66 pieces (Intel, AMD, and Cyrix), they all displayed similar performances at 66 MHz. This is surprising considering that the Cyrix company literature mentions the Cyrix DX2-66 as containing write-back cache, while the AMD and Intel units tested both had 8 KB of write-through cache. It may be that either the Cyrix part really contains WT cache or that the motherboard placed its cache in WT mode (one of the cache utility programs claimed the CPU was in WT mode, while another mentioned it was in WB mode. CTCM and Chkcpu16 are usually in agreement, but not for this CPU). The IBM-Cyrix literature mentions a 15% performance increase for chips with write-back cache due to eliminating unnecessary external memory write cycles. An Intel DX2-66 with 8 KB of write-back cache also exists, however it was far less common than the WT version and was not available for testing.

As far as the Pentiums are concerned, the P54C-100 (the reference CPU) contains half the L1 cache of the POD 100 WB (Socket 3, Pentium Overdrive) and scored 7% better. This can easily be understood as the P54C-100 was run on a motherboard with a 66 MHz FSB and with pipeline burst cache. By comparing the POD-83-WB with that of the POD-83-WT, it is clear that using the WB caching scheme over the WT scheme contributes significantly to performance, 15% in this case. Unfortunately, the POD 100 may be hit-or-miss with finding one which will overclock well to 100 MHz. Only 1 in 3 units tested for this project overclocked well enough in Windows to run the complete set of benchmark tests. The POD-100 in WT mode overclocks even worse than in WB mode, so this was used as a basis to establish which 1 in 3 were the best overclockers.

[Note: Only the M919 seemed to operate with WT mode correctly on the POD, whereas if you set the MB 8433UUD motherboard to WT mode on a POD, the cache utility programs would still report the CPU to be in WB mode.]

Turning off the Cyrix next generation enhancements, as witnessed with Cyrix 5x86-133-EO, brought the ALU performance down to a level barely above that of the AMD X5-133. Fortunately, most socket 3, PCI-based motherboards work with the majority of these enhancements enabled, with the worst case leaning towards BWRT and LINBRST being non-functional on some older PCI-based motherboards. Further individualised testing would be required to determine what significance each enhancement has over the benchmark results. No such testing is currently planned; however, it is evident from the ALU Performance chart that enabling branch prediction (5x86-100-BP) increased ALU performance by 3%. It has also been observed that turning on FP_FAST increased FPU performance by 10%.

Looking again at the AMD DX2-66 score of 40.3 and doubling it (to simulate 133 MHz operation), we get a score of 80.6, which is about the same as an AMD X5-133 (at 82.3). It appears as if not much has changed architecturally with the AMD unit, except for whatever layout rules and technology were needed to enable 133 MHz operation on a DX2. The measured increase was 2.04 fold, whereas a 2 fold increase would be expected based on clock frequency alone. 2.04/2 = 1.02, or a 2% increase. This will be referred to as the design enhancement factor, or DEF. Using the same approach for the Cyrix DX2-66 and Cyrix 5x86-133, we see a DEF of 1.20, or 20%! Through linear extrapolation, the ALU performance of a fictitious Intel DX2-60 was obtained, resulting in an Intel DX2-60 / DX4-120 DEF of 1.06, or a gain of 6%.
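
The DEF arithmetic above can be reproduced with a short Python sketch. The function name and its parameters are assumptions made for illustration; the two ALU scores (40.3 and 82.3) and the 2x clock ratio are the ones quoted in the paragraph above.

```python
def design_enhancement_factor(old_score, new_score, clock_ratio):
    """Measured speed-up divided by the speed-up expected from clock alone."""
    if old_score <= 0 or clock_ratio <= 0:
        raise ValueError("scores and clock ratio must be positive")
    return (new_score / old_score) / clock_ratio


# AMD DX2-66 (40.3) versus AMD X5-133 (82.3), clock doubled.
amd_def = design_enhancement_factor(40.3, 82.3, 2)
print(round(amd_def, 2))  # 1.02
```

A DEF of exactly 1.0 would mean the faster part gained nothing beyond its higher clock; values above 1.0 point to architectural or cache improvements.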

RESULTS - FPU

For non-overclocked CPUs, Appendix 6 indicates that the Cyrix 5x86-133 had the best FPU performance, with the Intel POD-83-WB next in line at 4% less performance. A Cyrix 5x86-120 and POD-83-WB were a close match, with a 3% lead by the POD-83-WB. A Cyrix 5x86-120, and even a POD-83-WT in write-through mode, outperformed the AMD X5-160 in FPU operations. Looking at the charted FPU results, an AMD X5-160 was approximately equivalent to a Cyrix 5x86-100 in floating-point operations, while an AMD X5-133, Cyrix 5x86-80, and Intel DX4-120 demonstrated similar performance.

For overclocked CPUs, the IBM 5x86C-133-FF-BP fell short of an Intel POD-100-WB by 5% and an AMD X5-200-RG fell short of an Intel POD-100-WB by 10%.

Observing just the three DX4-100's (Intel, AMD, and Cyrix), the Intel piece won by only a slight margin (2%) - the three were all basically equivalent in FPU operations. The same can be said for the three DX2-66 pieces; their performance deviated by less than 1%. Assuming all three DX2 / DX4 CPUs were available concurrently for purchase and there was a steep cost differential, the obvious decision would have been to buy the most conservatively priced CPU, that is, if FPU performance was the primary interest. In 1996, the main applications demanding FPU power were mathematical and simulation-based modeling for the scientific research community, as well as the newly released 3D game, Quake. Widespread mp3 decoding interest followed suit shortly thereafter.

Clock-for-clock, FPU performances of the Cyrix 5x86-100, Intel DX4-100, and AMD DX4-100 were 64%, 44%, and 42% of a Pentium-100 (P54C), respectively. Certain 3D games, such as Quake, make heavy use of Pentium-specific optimisations, so the FPU performance difference with Quake between Pentium vs. non-Pentium chips is expected to increase more than with other benchmark tests. Taking Quake as an example, and looking at Appendix 3, Test 47, we see that the performances of the Cyrix 5x86-100, Intel DX4-100, and AMD DX4-100 are 53%, 45%, and 43% of a Pentium-100 (P54C), respectively. As for the AMD DX4-100-WB, the addition of 8 KB more L1 cache, and with the CPU in write-back mode, did not seem to improve FPU performance.

When observing the Quake score of the POD-100-WB, it should be mentioned that this CPU was unable to complete the Quake timedemo test. This score had to be linearly extrapolated from the scores of the POD-83-WB, POD-83-WT, and POD-100-WT (not included on the charts). The fastest playable Quake score was from the POD-100-WT, which scored 23.6 fps. The POD-100-WB, however, was able to complete all other tests found in this report.

Looking again at the design enhancement factor for FPU operations, the AMD DX2-66 / X5-133 showed a DEF of 0.97, or a 3% loss, while the Cyrix DX2-66 and Cyrix 5x86-133 showed a DEF of 1.45. For the Cyrix 5x86-133, enhancements in CPU architecture (as compared to an old generation DX2-133) increased floating-point performance by 45%! Through linear extrapolation, the FPU performance of a fictitious Intel DX2-60 was obtained, resulting in an Intel DX2-60 / DX4-120 DEF of 1.0, or no gain/loss.

RESULTS - OVERALL

For non-overclocked CPUs, Appendix 7 indicates that the Cyrix 5x86-133 displayed the best overall performance, trailing the P54C-100 reference CPU by 8%. Now, for a Cyrix/IBM 5x86-133 running with a 66 MHz FSB (same as the P54C-100 reference CPU) and with branch prediction enabled, the CPU equates to the overall performance of an Intel Pentium 100. This is extraordinary considering that the P54C-100 was run on a motherboard which had technology at least 2 years newer and pipeline burst L2 cache! It would be interesting to see a similar CPU comparison done using the same hardware for the socket 7 ensemble of CPUs; particularly interesting would be how well a Cyrix 6x86 at 100 MHz performs compared to a P54C-100 for the same set of tests.

For the overall best overclocked CPU performance, the AMD X5-200-RG led the race, with the IBM 5x86C-133-FF-BP following closely behind. For typical use, these two CPU configurations would be fairly competitive and are hereby independently awarded the title of World's Fastest 486 ®. An AMD X5-133 or X5-160 which will stably overclock to 200 MHz is extremely rare, whereas the IBM 5x86C-100HF used in this study was relatively popular since IBM manufactured these chips well after Cyrix ceased production interest. The IBM branded chips were really just Cyrix designed chips with IBM external markings. Both versions came from IBM's semiconductor fabrication facility. It is thought that IBM maintained stricter fabrication and qualification yields on the 5x86 than Cyrix did, so the IBM branded versions may have a better chance of running at 133 MHz.

Comparing the Cyrix 5x86-133 to the IBM 5x86C-133-FF, we see a 6% overall performance enhancement in using a 66 MHz front-side bus as compared to using a 33 MHz front-side bus. The PCI bus for both cases was maintained at 33 MHz through the motherboard's onboard PLL chip. With a 66 MHz FSB, the socket 3 486 motherboard performs closer to a socket 5 Pentium motherboard with regular SRAM, which makes for a fairer chip-for-chip comparison to the Pentiums. Most users of an IBM 5x86-100HF or Cyrix 5x86-120 will likely run their chip at its advertised speed of 120 MHz, however a 15% speed boost can be obtained if the chip is run at 133 MHz (CLKMUL set to 2x, FSB set to 66 MHz). Recent tests seem to indicate that this modest overclock is stable in Windows 98SE, WinNT 4.0, and Win2000 when the CPU's core voltage is set to 3.85 volts and a cooling fan is added to the heatsink.

The POD-100-WB was comparable to the Cyrix 5x86-133, although its long-term stability remains an open question. The POD-83-WB's performance fell somewhere between that of a Cyrix 5x86-100-BP and a Cyrix 5x86-120. For non-overclocked and readily obtainable CPUs of the time, the Cyrix 5x86 120 was your best bet, with overall performance similar to that of a fictitious P54C-83.5. To include conservatively overclocked CPUs, the AMD X5-160 is a good compromise and performed similarly overall to a Cyrix 5x86-120.

The overall performance of the AMD X5-133 was marginally less than that of a Cyrix 5x86-100-BP, which was no surprise considering this is a very common conclusion that many benchmark results have independently arrived at. If you have your mind set on a Pentium Overdrive socket 3 processor and if your motherboard only works with the POD-83 in write-through mode, you're better off with an AMD X5 133 or an Intel DX4-120. The overclocked tests for the Intel DX4-120 never hung, however long-term stability hasn't been firmly established.

Not surprising were the overall Cyrix 5x86 scores, which weighed in closely to their expected Pentium equivalents. The Cyrix 5x86-100 has a Pentium rating of 75, or PR-75, whereby the results from Appendix 7 placed it at about a Pentium 72.5. The Cyrix 5x86-120 has a Pentium rating of PR-90, but fell short of this goal with a measured Pentium rating of only 83.5. The Cyrix 5x86-133 has a Pentium rating of PR-100 and the results awarded it a Pentium 91.5, however if using branch prediction and a 66 MHz FSB, it met right up to that PR-100 expectation. Lastly, the AMD X5-133 has a Pentium rating of PR-75 and the results studied here indicated it was approximately equal to a Pentium 71.4.
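
For a quick side-by-side view, the advertised and measured Pentium ratings quoted above can be tabulated with a few lines of Python. The dictionary layout and the helper name are assumptions for this sketch; the numbers are exactly those given in the paragraph above.

```python
# (advertised PR, measured overall score from Appendix 7), as quoted above.
RATINGS = {
    "Cyrix 5x86-100": (75, 72.5),
    "Cyrix 5x86-120": (90, 83.5),
    "Cyrix 5x86-133": (100, 91.5),
    "AMD X5-133": (75, 71.4),
}


def shortfall(advertised, measured):
    """Pentium points by which the measured score misses the advertised PR."""
    return advertised - measured


for cpu, (advertised, measured) in RATINGS.items():
    gap = shortfall(advertised, measured)
    print(f"{cpu}: PR-{advertised} advertised, {measured} measured, {gap:.1f} short")
```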

Clock-for-clock, the overall performances of the Cyrix 5x86-100-BP, Intel DX4-100, AMD DX4-100-WB and AMD DX4-100-WT were 72.5%, 58.9%, 58.5%, and 51.7% of a Pentium-100 (P54C), respectively. From the AMD portion of the comparison, it was evident that having 16 KB of WB cache improved CPU performance by 7% compared to those with only 8 KB of WT cache.

For the overall design enhancement factor, the AMD DX2-66 / X5-133 had a DEF of 0.94, or a loss of 6%. Through linear extrapolation, the overall performance of a fictitious Intel DX2-60 was obtained, resulting in an Intel DX2-60 / DX4-120 DEF of 0.98, or a loss of 2%. The AMD X5 and Intel DX4 parts seem to contain no prominent design enhancements, although the Intel DX4 displayed an ALU DEF of 6%. The DEF results for the AMD X5 seem to be in agreement with reports made elsewhere stating that the AMD X5 is merely a clock-enhanced DX2/DX4. The Cyrix DX2-66 / 5x86-133 had an overall DEF of 1.26, or a 26% increase. Using the same extrapolation method, we can also obtain an Intel DX2-41.5 / POD83-WB DEF of 1.59, or a 59% increase. The CPU design architectural enhancements of the POD83 were twice that of the Cyrix 5x86; it is unfortunate that Intel did not produce a 100, 120, or 133 MHz version of the Pentium Overdrive for the socket 3 platform. Such high speed Pentium Overdrives would have undoubtedly squashed all other 486 competitors and perhaps even put AMD out of business. For AMD, sales from the X5-133 continued well into 1999 and were what kept them afloat during the K5 flop.

ADDITIONAL DISCUSSION

Unfortunately the three AMD X5's tested would not operate at 200 MHz in either the M919 or MB 8433UUD motherboards. From results reported elsewhere (Retro Games 100), it may be that an AMD X5-133ADW with CPUID 04F4 (and perhaps with a production date of 9630DPE) is required for operation at 200 MHz and 5V. One other online user with an AMD marked as an X5-160 has also reported success at 200 MHz. The AMD X5's tested for this study were of flavours 133ADZ (0494 & 04F4) and 133ADW (0494). The results from Retro Games 100 were later added to this benchmark comparison. Refer to the section under Derived CPU Results to see how an AMD X5-200 compares to other socket 3 CPUs tested herein.

It is thought that the 5x86-120 was somewhat plagued with running a 40 MHz front-side bus (as opposed to a relatively standard 33 MHz bus), but this only gave problems with select PCI cards, notably older 3Com network cards. Some modern 486 motherboards have a workaround for this issue by either auto-enabling a 2/3 FSB multiplier (so that the PCI bus runs at 27 MHz, as is suspected with the M919), or by implementing a user-controllable 1/2 and/or 2/3 multiplier option in the BIOS (as is the case with the MB 8433UUD). If a Cyrix 5x86-120 was problematic in your motherboard and if you were unable to obtain a Cyrix 5x86-133, the next best choice was probably a POD-83-WB or AMD X5-160. Unfortunately, the POD-83's were over-priced, late to market, and many motherboards didn't work properly with them installed. There exist some hard-to-find AMD X5 CPUs marked as 150 and 160 MHz. It is thought that some of the 133ADZ units are actually rated for 160 MHz, but down-marked by AMD for marketing reasons, particularly so that 486 sales did not heavily distract from AMD K5 sales. By this argument, the AMD X5-160 may be considered a non-overclocked CPU, whose overall performance, as noted in Appendix 7, equates to that of a Cyrix 5x86-120.

[Note: The difficulty in finding Cyrix 5x86 CPUs is due, in part, to a short production run, from approximately August 1995 to February 1996. Production dates for the 5x86-133 were generally from the first several weeks of 1996, though some from Dec. 1995 were also produced. There exist mixed reports online which claim that all of the 5x86-133 CPUs were bought by either Evergreen Technologies, Gainbery Computer Products, or Computer Nerd (RA4) for use in 486 upgrade kits. The AMD X5, by contrast, was produced all the way from 1996 through 1999.]

The tabulated results found in Appendices 1-2 are not intended to reflect the absolute best performance of each CPU, but rather to serve as an equal comparison of CPUs within an identical system. Consider a motherboard with 1024 KB of L2, or Level 2, cache; it has a cacheable range of 128 MB, whereas the employed system, with 256 KB L2 cache, can only cache up to 32 MB of this 128 MB (in write-back mode). The remaining 96 MB (128 MB - 32 MB) remains uncached by the L2 cache system (although it is still cached by the CPU's L1 cache) and any data in RAM being accessed by the CPU which isn't L2 cached will usually be processed ~2 times slower than if it was L2 cached. For the purposes of this study, the effect of this uncached RAM seemed to be witnessed only in Windows and by specific benchmark programs, namely CPUMark, SuperPi, PassMark, and WinTune-Memory. In the case of these specific benchmark tests, using less RAM (32 MB, for this system) will actually increase test performance, e.g. for the Cyrix 5x86-133, CPUMark99 increased from 3.8 to over 5. Similarly, the use of a different graphics card, or a 40 MHz front-side bus, will also affect benchmark performance. Luckily, these various system-wide optimisations should not affect each CPU's comparative benchmark score since all CPUs were tested in an identical system. If one were to re-do the ensemble of tests, it may be preferable to use an amount of RAM that is fully cached by the Level 2 cache. In this way, these particular Windows-based scores would be more properly comparable to test results taken by other users and in other systems.
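
The cacheable-range arithmetic above can be illustrated with a short Python sketch. The function names are invented for this example, and the only figures used are the ones stated in the paragraph above (128 MB installed, 32 MB cacheable with 256 KB of write-back L2).

```python
def uncached_mb(installed_mb, cacheable_mb):
    """RAM that falls outside the L2 cacheable range."""
    return max(0, installed_mb - cacheable_mb)


def uncached_fraction(installed_mb, cacheable_mb):
    """Share of installed RAM that the L2 cache cannot cover."""
    return uncached_mb(installed_mb, cacheable_mb) / installed_mb


# Test system from this study: 128 MB RAM, 32 MB cacheable range.
print(uncached_mb(128, 32))        # 96
print(uncached_fraction(128, 32))  # 0.75
# Dropping to 32 MB of RAM leaves nothing outside the cacheable range.
print(uncached_mb(32, 32))         # 0
```

With three quarters of the installed RAM outside the L2 cacheable range, it is plausible that memory-heavy Windows benchmarks are the ones that show the penalty.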

When using a Biostar MB-8433UUD motherboard (in contrast to the employed M919), the PCI bus can be set to 40 MHz without the fear of an auto-divider sneaking its way in. This increases performance with 40 MHz FSB CPUs only (5x86-120, X5-160, Intel/AMD DX4-120, and POD-100-WB). For example, with a fixed 40 MHz FSB (as opposed to 27 MHz), a 15% improvement was noted in the 3Dbench score when tested with a Cyrix 5x86-120 in an MB-8433UUD motherboard. This enhancement is only noticed for highly graphic applications, like Quake, or any benchmark requiring extensive use of the PCI bus. Considering each 40 MHz FSB CPU was given the same treatment (of a 2/3 auto-multiplier on the M919) and noting that the 40 MHz FSB issue affected only 10% of the benchmark programs, most of the performance hit will average out given the large number of tests being averaged. For an entirely fair CPU-only comparison, the FSB would need to be fixed for all CPUs. This, unfortunately, is not possible without implementing a custom PLL circuit.

For CPUs running with a 40 MHz FSB, the observed decrease in heavy graphics-based performance (mainly in Quake, Doom, Pcpbench, and 3Dbench) was due to the M919 automatically adding a 2/3 FSB-to-PCI multiplier, thereby slowing graphics throughput to 27 MHz. Unfortunately the phase-locked loop (PLL) circuit used on 486 motherboards isn't complex enough to set the PCI bus to a fixed 33.3 MHz while maintaining the 40 MHz FSB to the RAM, although some Pentium-class motherboards have PLL circuits which allow for a relatively constant FSB frequency via the BSEL signal.
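
The effect of the FSB-to-PCI multiplier can be worked through with a small Python sketch. The function name is hypothetical, and the 1/2 case at 66 MHz is only an illustration of the divider arithmetic, not a statement about how any particular board derives its PCI clock.

```python
from fractions import Fraction


def pci_clock_mhz(fsb_mhz, multiplier=Fraction(1, 1)):
    """PCI bus clock obtained by applying an FSB-to-PCI multiplier."""
    return float(fsb_mhz * multiplier)


# 40 MHz FSB with the 2/3 multiplier: the "27 MHz" quoted in the text.
print(round(pci_clock_mhz(40, Fraction(2, 3)), 1))  # 26.7
# 40 MHz FSB with no divider.
print(pci_clock_mhz(40))                            # 40.0
# Illustrative 1/2 multiplier on a 66 MHz FSB.
print(pci_clock_mhz(66, Fraction(1, 2)))            # 33.0
```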

DERIVED CPU RESULTS

There exist some preliminary tests online of an AMD X5-133 being overclocked to 180 MHz (3 x 60 MHz) and to 200 MHz (4 x 50 MHz), using a Shuttle HOT-433 v1-3 and a Biostar MB8433-UUD v3.1, respectively. Unfortunately, cache and memory wait state settings may need to be de-optimised from the static values used in this comparison to maintain stability. There are also reports of a Cyrix 5x86-100 being overclocked to 150 MHz (3 x 50 MHz) well enough to run several of the utilised benchmark programs, however cache and memory wait states were also significantly reduced. A second account of such an AMD-based system surfaced online in late 2010. It employed an AMD X5-160ADZ, overclocked to 200 MHz on a SOYO SY-4SAW2 motherboard, whereby the performance was stable enough to install and run Win98SE and WinME.

[Note: UMC-based 486 motherboards contain undocumented FSB jumper settings to enable 60/66 MHz operation.]

While the exact BIOS, memory, and cache settings adopted in these systems are not fully documented, the results are promising for the would-be retrocomputing overclocker. Without evidence of prolonged stability, such systems may be more of a curiosity than a practicality. In light of these curious findings, data from the current results have been extrapolated linearly to produce the results of an AMD X5-180, AMD X5-200, and Cyrix 5x86-150. As of the November edition of this comparison, actual results from an AMD X5-200 have been added.
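As a rough illustration of what a linear extrapolation of this kind looks like, the Python sketch below fits a straight line through measured (clock, score) pairs and reads off the value at a higher clock. The exact procedure used for the charts is not spelled out here, so the least-squares fit, the function names `fit_line` and `extrapolate`, and the sample scores are all assumptions made for the example; the scores are placeholders in arbitrary units, not data from this study.

```python
def fit_line(points):
    """Least-squares fit of score = slope * mhz + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(points, target_mhz):
    """Predict the score at target_mhz from the fitted line."""
    slope, intercept = fit_line(points)
    return slope * target_mhz + intercept


# Placeholder (clock in MHz, score) pairs, NOT measurements from this study
measured = [(100, 50.0), (120, 60.0), (160, 80.0)]

for target in (180, 200):
    print(f"{target} MHz -> derived score {extrapolate(measured, target):.1f}")
```

A straight-line fit assumes the score scales with core clock alone. It ignores changes in FSB, cache timings, and memory wait states, which is one reason a real overclocked system may land above or below the derived value.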

Results for these overclocked CPUs have been added to the ALU, FPU, and Overall Performance charts and are identified as Derived CPU in Appendices 5-7. If you are lucky enough to have access to a system running with one of these CPUs, your system's performance may not necessarily reach the levels published in these charts. It is sometimes necessary to configure the system settings (via the BIOS) to slower I/O recovery times, cache wait states, and memory wait states, to name a few. Expect to reduce the derived CPU scores shown here by anywhere from 0-15%. Put more quantitatively, there have been reported SpeedSys scores for the AMD X5-180 at 67.5 and AMD X5-200 at 75.1, whereby, for comparison, the Cyrix 5x86-133 tested in this study had a score of 73.6.
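To put the quoted SpeedSys figures and the 0-15% allowance side by side, a short Python sketch follows. The three SpeedSys scores are the ones quoted above; the helper names `relative_to` and `derated_range`, and the example derived score of 100.0, are hypothetical and only serve the illustration.

```python
# SpeedSys scores quoted in the text
SPEEDSYS = {
    "AMD X5-180": 67.5,
    "AMD X5-200": 75.1,
    "Cyrix 5x86-133": 73.6,
}


def relative_to(score, reference):
    """Express a score as a fraction of a reference score."""
    return score / reference


def derated_range(derived_score, max_penalty=0.15):
    """Return (worst case, best case) after a 0 to max_penalty reduction."""
    return derived_score * (1 - max_penalty), derived_score


reference = SPEEDSYS["Cyrix 5x86-133"]
for cpu in ("AMD X5-180", "AMD X5-200"):
    ratio = relative_to(SPEEDSYS[cpu], reference)
    print(f"{cpu}: {ratio:.1%} of the Cyrix 5x86-133 SpeedSys score")

# Hypothetical derived score of 100.0 (arbitrary units)
low, high = derated_range(100.0)
print(f"Derived score 100.0 -> expect between {low:.1f} and {high:.1f}")
```

Seen this way, the reported X5-180 score sits a little below the tested Cyrix 5x86-133 and the X5-200 a little above it, which is consistent with relaxed cache and memory timings eating into the gain from the higher clock.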

Looking at the ALU Performance chart, the three derived CPUs are the clear leaders of the race, with the tested AMD X5-200-RG matching perfectly with the linearly extrapolated AMD X5-200-DR. For FPU performance, the Cyrix 5x86-133/150 outperformed even the AMD X5-200-RG, and the Cyrix 5x86-150 stretched curiously close to the POD-100-WB. For the overall performance, an AMD X5-200-RG whopped even the Pentium 100 (P54C) by 9%. The 5.5% gain of the AMD X5-200-RG over the AMD X5-200-DR may be largely attributed to the use of a 50 MHz front-side bus and, to a lesser extent, all RAM being L2 cacheable. A Cyrix 5x86-150 also outperformed the Pentium 100 (P54C) by 2%.

Gentlemen, it's time to throw out those slow Intel Pentium 100s and treat yourself to a 486!

Last edited by feipoa on 2018-11-30, 09:56. Edited 48 times in total.

Plan your life wisely, you'll be dead before you know it.

Reply 1 of 280, by feipoa

User metadata
Rank l33t++

Images of the CPUs and motherboard employed in this study plus the three most important charts from the report.

Attachments

  • Elpina_M919.jpg (1.18 MiB, 674 downloads): Elpina M919 v3.4 B/F with 256 KB cache
  • CPU_486_Scan.jpg (4.57 MiB, 766 downloads): Images of the CPUs used for this comparison (scanned with UMAX Astra 600S, circa 1996)
  • ALU_486.png (15.31 KiB, 107677 views)
  • FPU_486.png (15.06 KiB, 107631 views)
  • Overall_486.png (15.31 KiB, 106536 views)

File license (all files): Fair use/fair dealing exception
Last edited by feipoa on 2013-05-14, 10:45. Edited 23 times in total.


Reply 2 of 280, by SquallStrife

User metadata
Rank l33t

Ooh, very nice.

One small question though:

• PC Chips M919 v3.4 B/F Motherboard (Hsing Tech) - UMC 8881F/8886BF chipset [BIOS: 05/06/1996]
[...]
• 256 KB Double-banked L2 SRAM Cache (15 ns), Write-back [BIOS: 2-1-2]

Isn't the M919's cache fake?

VogonsDrivers.com | Link | News Thread

Reply 3 of 280, by feipoa

User metadata
Rank l33t++

The later model M919 motherboards with rev. 3.4 B/F didn't have the fake cache chips; they had no cache at all unless you installed a COAST-like cache module like the ones shown here:
Fastest PCI graphics card in a 486

The motherboard used in these tests had a real 256 KB SRAM module installed. SpeedSys and cachechk both report enhanced speed in the 16-256 KB range. The performance of the M919 with 256 KB cache agrees well with results found on the Biostar MB-8433UUD motherboard. Both boards have the same chipset as well.

EDIT (13 Sept. 2018): Later work, which compares 486 CPUs using glide games on a 486, can be found here: Voodoo 1 vs. Voodoo 2 on a 486
A brief summary is provided for convenience:

Average_all_games_normalised_to_POD100.png (7.97 KiB, 54190 views)
File license: Fair use/fair dealing exception
Last edited by feipoa on 2018-09-13, 09:38. Edited 2 times in total.

Reply 5 of 280, by udam_u

User metadata
Rank Member

Marvelous test [hat off]! Everything is perfect: a large number of tests, so the final result should be objective; normalization to the P100, which is interesting for everyone building a 486 machine but especially for 486 hot rod enthusiasts; and finally great charts that notably simplify reading the overall test results. This is the most comprehensive 486 CPU comparison I have ever seen. I admire that one man has made such a great test in such a short time!

I had thought that the POD83WB was the best at FPU tasks. It is amazing how good the Cx5x86 is after enabling some hidden enhancements. Too bad it was produced for only a short time.
The results obtained by the Intel 486s are also very surprising. I always thought that Intel withdrew from the 486 market because of the better AMD 486 core.

It was a great idea to introduce the "enhancement factor". Now everyone will respect the Cx5x86, and it has a chance to become the most desirable processor for 486 Hot Rod builders. However, I think that the position of the Am5x86@160MHz is still safe because it has the fastest ALU, which is very important in typical operating system work and old games. Availability is also very important.

Regards! (:

What doesn't kill you makes you stronger.

Reply 6 of 280, by DonutKing

User metadata
Rank Oldbie

excellent work feipoa 😀

I've just had a skim, will go through it more thoroughly later... interesting to note that for both the AMD and Intel DX4s, going from 100 to 120 MHz actually reduces the score in pcpbench, 3dbench and doom timedemo... did you have to loosen timings to get these CPUs to run at 120?

I have an M919, do these modules still appear on eBay from time to time?

I tried and never managed to find one 🙁
I did have a COAST module from a socket 7 board, which physically fit, but prevented the PSU from powering on when I installed it.

I'd read that doing this would destroy the board, but I wasn't too worried at the time 🤣

Reply 7 of 280, by feipoa

User metadata
Rank l33t++

Thanks to everyone for the positive encouragement.

To answer DonutKing's question, the decrease in heavy graphics-based performance is due to the M919 automatically adding a 2/3 FSB-to-PCI divider for CPUs running at a 40 MHz FSB. Unfortunately, the PLL used on the motherboard isn't fancy enough to set the PCI bus to a fixed 33.3 MHz; it can only run it at a fractional multiple of the FSB. I've seen PLL circuits on more modern motherboards that allow for a fixed PCI output, regardless of the FSB, but not on 486 motherboards.

Luckily, the performance degradation is limited to 40 MHz FSB CPUs and those 4 DOS-based benchmarks, and most of the hit will average out in the end. For an entirely fair CPU-only comparison, the FSB should be the same for all CPUs; this, unfortunately, is not possible without adding a custom PLL circuit. hmmm...

EDIT: I haven't seen an m919 cache stick on eBay in years. They were fairly common in 2002 when people were junking their 486s. I have two such cache modules and two m919s, but I bought the cache modules later. I'd be surprised if the m919s sold with the cache modules as that would have upped the cost, and we all know that PC Chips is notorious for being cheap.

Last edited by feipoa on 2011-05-13, 09:26. Edited 1 time in total.

Reply 8 of 280, by Tetrium

User metadata
Rank l33t++

Great post feipoa!!!

I'll go read it more thoroughly after posting this, it's very interesting!

One note though about the DX4's:
Intel and AMD made several different DX4's (not sure about Cyrix, they made a DX4 and a 5x86 clocked as a DX4).
Intel has a DX4 16kb WT and a DX4 16kb WB (The WT version is a lot more common and the older variant)
AMD made 3 different chips: a DX4 8kb WT, an 8kb WB and a 16kb WB

Personally I think it would've been a little bit more fair to compare the Intel 16kb WT with the AMD 16kb WT and then the Intel 16kb WB with the AMD 16kb WB.
Though I "think" the Intel DX4 is still the better unit.

Edit: I did a little bit more looking around, apparently AMD also made several versions of their DX2. One early one with 8kb WT and a much later variant with 8/16kb WB!
The difference is in whether it's an AMD 486 enhanced version or not 😉

Link: http://en.wikipedia.org/wiki/Am486
It's in the chart piece, scroll down a little bit.

Edit2: Now this is interesting, apparently AMD made PGA Socket 3 chips well into the 90's!!
Think of the overclocking potential 😁
http://www.cpu-world.com/CPUs/80486/MANUF-AMD.html

Last edited by Tetrium on 2011-05-13, 09:49. Edited 1 time in total.

Whats missing in your collections?
My retro rigs (old topic)
Interesting Vogons threads (links to Vogonswiki)
Report spammers here!

Reply 9 of 280, by feipoa

User metadata
Rank l33t++

Tetrium, you are correct in that this would make for a fairer clock-for-clock comparison, however I do not own an Intel DX4-100-WT. I did go into this a bit in the monologue.

You may have made a typo, the 3 AMD DX4's were 8KB WT, 8KB WB, 16KB WB. I do not own the AMD DX4-100 8KB WB.

I did notice there was an AMD DX2-WB, but I figured a downclock would be a good enough simulator. To do a DX2-WB comparison, I'd also need to buy the late era Intel DX2-WB.

The AMD 16KB WT would be the AMD X5 set in WT mode? Downclocking would be required for a 100 and 120 MHz comparison. A good observation, and maybe if I get the retro cpu itch again I'll add these to the list. I'm fairly retroed out. I am now focusing efforts on 5x86-133 stability in NT4 -- so far so good, but it hasn't been long.

Reply 10 of 280, by Tetrium

User metadata
Rank l33t++

Cheers feipoa, it was indeed a typo! I typed AMD 8kb WT twice. Corrected 😉

feipoa wrote:

I did notice there was an AMD DX2-WB, but I figured a downclock would be a good enough simulator.

You are absolutely correct 😉

And about the slower AMD 16kb WB models, these may be nothing more than downclocked 5x86 133 chips in the first place 😉

Edit: The results for the POD-100 and the P55C-100 are VERY interesting!
I hope one of my POD-83's will manage to run stable @ 100Mhz, but I have so much to do before I get around to it.
The giant stack of untested hardware has only slimmed a little bit so far, I'm too busy doing a cleanup and reordering of my attic before I can begin toying with my hardware in earnest again.


Reply 11 of 280, by DonutKing

User metadata
Rank Oldbie

Edit2: Now this is interesting, apparently AMD made PGA Socket 3 chips well into the 90's!!
Think of the overclocking potential
http://www.cpu-world.com/CPUs/80486/MANUF-AMD.html

If I'm not mistaken the Am486DX2-66V16BGC about 4th up from the bottom has a datecode of 1998... I'm guessing for embedded/industrial applications because I don't know why you'd buy a 486 desktop in 1998 😜

If you are squeamish, don't prod the beach rubble.

Reply 12 of 280, by Tetrium

User metadata
Rank l33t++

True, but for us the main point is that they work in the boards we have 😁


Reply 15 of 280, by swaaye

User metadata
Rank l33t++

Ok this is definitely the best 486 CPU study that I've seen. 😁

feipoa wrote:

I had hoped that the ultimate 486 comparison would have cleared a lot of 486 vs. pentium fpu myths, but I guess not.

According to your very impressive testing results, 486-like CPUs including the AMD X5 provide about 50% the FPU performance per clock of a Pentium. The Cyrix 5x86 moves that to about 65% performance per clock.

Although when one considers the fact that Pentium CPUs only reach at best 100 MHz on the 486 platform, the advantage is less profound in most cases. The X5 at 160 gets you to within about 15% of a Pentium 100 and the rare Cyrix 133 gets you to within 5%.

Still, the common Quake comparison shows a ~30% disadvantage even in this case with highly clocked X5 and Cx5x86. This brings to mind potential Pentium optimizations in software. Pentium's superscalar, but in-order design has potential for further speed if programmers optimize for it and this is shown in Quake. There may be other application scenarios like this but I do think they could be rare due to the time demands and complexity of assembly optimization.

I would love to see this kind of testing done for Socket 5/7 CPUs including the AMD K5, SSA-5, K6, 6x86, C6, Winchip 2 and the two Pentium cores. 😉 That's a lot of CPUs though....

Reply 16 of 280, by Tetrium

User metadata
Rank l33t++
swaaye wrote:

I would love to see this kind of testing done for Socket 5/7 CPUs including the AMD K5, SSA-5, K6, 6x86, C6, Winchip 2 and the two Pentium cores. 😉 That's a lot of CPUs though....

After I'm done rearranging my attic, I've thought of doing some benchmarking with lots of chips and from a bootable floppy disk (that, or I'll use 1 harddrive and pass it along different systems 😉 ).
The attic is nowhere near done yet, but if anyone wants some kind of benchmark that can be done from a bootable disk (AND if said person would be so kind as to upload a bootable floppy image file, .ima or .imz winimage files) then I'm sure I can take a couple detours 😉

Chips I have available for Socket 7-ish are:
Pentium (including Socket 4 and the 200Mhz one)
Pentium MMX (including 233 and Tillamook 266)
Cyrix MII (Could possibly overclock to say 300Mhz?)
mp6 PR266 (untested as of this time)
Winchip classic(?) 240 and/or 200 (untested as of this time)
K6-III+
K6-2+
K6-2 (Got the 550 laying around also I think)
I think that's kinda all of em, except for K5 which I don't have. I do have a couple earlier Cyrix MX thingy chips? But I'd rather not mess with them, I'll just use the MII's. Have 1 single K6-III but K6-III+ should be the same performance anyway.

For s370 I have available:
Celeron PPGA, most speeds
Celeron Coppermine with the crappy 66Mhz FSB.
Celeron Coppermine 100Mhz FSB
Tualerons ranging till 1400
And Coppermine to 1000Mhz
And Tualatin-s 1400, Tualatin classic 1200 (Do Tualatin 1400's with 256kb cache even exist??).
And a couple Nehemiah's, 1.2Ghz, 1Ghz and one 866 I think. Haven't been able to get any of the lower ones booting.

Soooo...if anyone wants a benchmark that can be booted from a floppy and can upload a working .ima/.imz file...😉


Reply 17 of 280, by feipoa

User metadata
Rank l33t++

Swaaye, good call on the Quake Pentium-specific optimisations. Do you think Quake makes heavy use of CMPXCHG8B, RDMSR, RDTSC, and WRMSR Pentium instructions and that these are the main contributors to the enhanced frames per second?

I should point out that when using a Biostar MB-8433UUD motherboard (in contrast to the employed M919), I can set the PCI bus to 40 MHz without the fear of an auto divider. This helps with the 40 MHz FSB CPUs only (5x86-120, X5-160, Intel DX4-120, POD-100-WB). With the PCI bus at 40 MHz, I notice a 15% improvement in the 3Dbench score for the Cyrix 5x86-120. This enhancement is only noticed in graphics-intensive applications, like Quake, or in any benchmark requiring extensive use of the PCI bus. Considering each 40 MHz FSB CPU was given the same treatment (2/3 M919 auto divider) and considering that the 40 MHz FSB affects only 10% of the benchmark programs, I do not expect the overall scores to shift much.

Tetrium, I too have toyed with the idea of a similar Socket 7 comparison, but as Swaaye pointed out, the number of CPUs is outrageous. I am having my AZZA BIOS inspected for the possibility of an upgrade to support the K6 2/3+ processors. The main interest would be to have the same benchmark suite comparison done on a fixed system (motherboard), but this project is way low on my list. I might get to it by the time Tetrium finishes cleaning his attic. I have much of what Tetrium has, but not all -- I'm mainly a socket 3 guru.


Reply 18 of 280, by swaaye

User metadata
Rank l33t++

There are articles about the Quake engine that go into great detail and are way beyond me, but I recall that they optimized specifically for the Pentium's dual-issue design. It was really no-holds-barred Pentium targeting.

Michael Abrash helped write Quake and is a renowned assembly programmer.
http://drdobbs.com/high-performance-computing/184404919

Last edited by swaaye on 2011-05-19, 18:35. Edited 3 times in total.

Reply 19 of 280, by sliderider

User metadata
Rank l33t++
SquallStrife wrote:

That's AWESOME!

I have an M919, do these modules still appear on eBay from time to time?

No, they don't. Most people don't even know what they are. You need a special type that only works with the M919 and when they do come up, they are frequently misidentified so it makes it hard to search for them. There's also a lot of buyers now who DO know what they are so you'd be competing against them and that drives the bids up.