VOGONS


Cyrix DX2-80 slower than DX2-66?

Reply 20 of 40, by mkarcher

Disruptor wrote on 2020-07-12, 08:29:
jakethompson1 wrote on 2020-07-11, 18:43:

It's in write through. The write line in Speedsys is a straight horizontal line about as fast as an L2-cached read, which seems consistent with that. No jumpers on the board for write back. The board is also 5V only so it may be too old. There is a BIOS setting for L2 write back, though, and the performance seems consistent with it being on.

I don't give a penny for Speedsys' write line on a 486. On this architecture it seems to be not affected by write back cache at all.
And I also have noticed the 27 MHz of Speedsys for my Cx486-DX2/80. It is definitely wrong for this CPU.

Speedsys uses an access pattern for write tests that causes a near 100% miss ratio on the write-back cache on most 486 and Pentium systems, so you always measure the cache-miss performance, even with just 4KB (which would fit in L1) or 128KB (which would fit in L2). I explained it in this post: Re: Pentium MMX 450MHz. Unfold the quoted text to get the required context to understand that post.

Reply 21 of 40, by mkarcher

jakethompson1 wrote on 2020-07-11, 18:49:
mkarcher wrote on 2020-07-11, 17:23:

My money is on "chipset autoconfiguration". Most BIOSes have the option to auto-configure RAM and cache timings automatically depending on the bus clock. On some BIOSes, auto-config is the only option (and no timing configuration is displayed in the advanced chipset setup). On most BIOSes, you can set auto-config to "disabled" and manually configure timings.

I replied to an old thread so I'm the recent one here, but what you're saying sounds like my suspicion. Would that also be consistent with Speedsys displaying 27 MHz?

The only performance-related options in the BIOS that I don't already understand are one about waiting 1T or none for FPU (I set it to none), and 1T or 2T for ELBA# (won't boot on 1T, default is 2T).

I think I have the jumpers set to max out all the VLB stuff now, and my issue is on some cold boots it locks up at the VGA BIOS copyright screen, and when it does boot, the memory graph in Speedsys beyond 256K is pretty unimpressive. Raw CPU performance, VESA Memory throughput, L1 and L2 cached memory, and disk I/O all look better though.

I know how to write asm to do i/o port reads and writes and such if I were curious about settings that don't show in the BIOS, but there seems to be very little documentation at that depth about the UMC 82C491F.

Yeah, good luck on finding UMC chipset data sheets. If your BIOS is AMI, you might want to try AMISETUP to get access to hidden configuration options. If your BIOS is Award, you might want to try MODBIN to create a modified BIOS for your board that exposes more configuration options. MODBIN is a tool aimed at mainboard manufacturers which lets them choose which of the options the chipset manufacturer provided should actually be exposed to the user. A deep understanding of mainboard operation is helpful for using MODBIN. Assembly language knowledge is not required, as MODBIN only edits configuration data in the BIOS and does not touch the code.

I guess the ELBA# pin means "external local bus access". This signal is driven by a VL card to indicate that the cycle is handled by a VL card and should thus not be handled by the ISA bridge or the memory controller, although I am unsure whether having VL memory space on top of mainboard memory is actually a supported configuration on most 486 chipsets. The VL slots have a pin called "LDEV#" (local bus device) that is to be driven by a VL card if it wants to respond to a cycle. The LDEV# signals of all VL slots (and onboard local-bus devices, if any) are typically ANDed together using a fast AND gate chip (remember, LDEV# is active low, so a physical "AND" gate actually means "OR"), and presented to the chipset. It seems UMC calls this OR of any LDEV# line "ELBA#".
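The active-low logic described above is easy to get backwards, so here is a minimal Python sketch of it. The function names and the three-slot example are illustrative assumptions and are not taken from any UMC or VESA documentation; the only thing modelled is that ANDing active-low lines gives a logical OR of the "asserted" states.

```python
def elba_n(ldev_n_lines):
    """Combine active-low LDEV# lines with a physical AND gate.

    Each entry is the electrical level of one LDEV# line:
    0 = asserted (a card claims the cycle), 1 = idle.
    The return value is the electrical level seen by the chipset.
    """
    level = 1
    for line in ldev_n_lines:
        level &= line
    return level


def any_card_claims(ldev_n_lines):
    """Logical view: True if at least one card asserts LDEV#."""
    return any(line == 0 for line in ldev_n_lines)


# Three hypothetical VL slots; only the card in the second slot claims the cycle.
slots = [1, 0, 1]
print(elba_n(slots))           # 0, i.e. ELBA# is asserted (low)
print(any_card_claims(slots))  # True
```

The AND output goes low exactly when at least one input is low, which is why the gate behaves as an OR on the asserted states.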

The setup option is about when the chipset is going to check its ELBA# input pin. It must not start an ISA cycle for cycles handled by VL cards, so it cannot forward the 486 cycle to the ISA bus until the decision for VL/non-VL is known, so it needs to wait for the ELBA# check result before forwarding. The fast option is T1, and it means that LDEV# is sampled at the end of the first clock cycle of a 486 bus cycle. The chipset loses one clock before it may assert any read/write command lines on the ISA bus. This might not be an issue because there might be address setup time requirements on the ISA bus that require a command delay anyway. The slower option is T2, which gives VL cards a whole extra cycle to assert LDEV#, but also delays forwarding of cycles to the ISA bus by an extra FSB cycle.

Disregarding the delay of the AND gate that computes ELBA# from LDEV#, the recommendation of VESA is to sample LDEV# at the end of T1 if clock speed is 33MHz or lower, and to sample LDEV# at the end of T2 if clock speed is above 33MHz. This is what the VL>33MHz jumper on the mainboard is for. It does not have any effect on mainboard operation itself; its only function is to drive the ID3 pin on the VL bus that informs cards about the bus speed. Setting this jumper to >33MHz tells the VL cards that it is OK to take an extra cycle on decoding the address, because LDEV# is sampled at the end of T2. Setting this jumper to >33MHz and configuring ELBA# sampling to T1 is a non-conforming configuration. The second VL jumper is about write wait states. It also does not influence mainboard operation at all (on the mainboards I know), but it is directly connected to the ID2 pin. This pin tells VL cards that they can assume there are no 0WS write cycles on the VL bus, so taking an extra cycle to pick up the data on write cycles is OK, even if the card signals LRDY# (to indicate readiness, i.e. terminate the cycle) very early. Again, the VESA recommendation is that 0WS write cycles are forbidden at clock frequencies above 33MHz.
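The sampling rule and the one non-conforming jumper/BIOS combination described above can be written down as a small check. This is only a sketch of the rule as stated in this post: the function names are made up for illustration, and treating "33MHz" as a nominal 33.33 MHz (with a small tolerance) is an assumption.

```python
def recommended_sample_point(bus_mhz: float) -> str:
    """Sample point for LDEV# as recommended in the post.

    Assumption: "33MHz" means the nominal 33.33 MHz clock, so anything
    up to 33.34 MHz counts as "33MHz or lower".
    """
    return "T1" if bus_mhz <= 33.34 else "T2"


def is_conforming(jumper_above_33: bool, elba_sample_point: str) -> bool:
    """The post names one bad combination: jumper at >33MHz with T1 sampling.

    With the jumper at >33MHz, cards may take an extra cycle to decode the
    address, so the chipset must not sample before the end of T2.
    """
    return not (jumper_above_33 and elba_sample_point == "T1")


for mhz in (25.0, 33.33, 40.0, 50.0):
    print(mhz, recommended_sample_point(mhz))

print(is_conforming(jumper_above_33=True, elba_sample_point="T1"))  # False
print(is_conforming(jumper_above_33=True, elba_sample_point="T2"))  # True
```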

ELBA# at T2 is a performance killer for ISA performance, and if you need that configuration at 40MHz (or even at 33MHz), you should consider using a different mainboard, or you should place everything performance sensitive on the VL bus (mostly hard disk controller and graphics card) to avoid paying the ISA performance penalty during normal operation.

Reply 22 of 40, by jakethompson1

mkarcher wrote on 2020-07-12, 08:50:

The memory graph seems to indicate that I indeed was right about "chipset autoconfiguration" being the cause of the problem, but I explained about L2 timings, whereas the problem is in fact about main memory timing. (Nearly?) all 486 consumer boards derive the memory timing (how long to wait after the falling edge on /RAS to switch to the column address, how long to wait after that to provide the falling edge on /CAS, how long to wait after /CAS being pulled low until the data output is valid, how long /CAS needs to be low and how long /CAS needs to be high after that before the next fast-page-mode access may occur) from the main oscillator, sometimes allowing specification in half-cycles. This means the resolution for setting these times is 30/15ns with 33MHz FSB, and 25/12.5ns with 40MHz FSB. To compensate for the faster timing, the BIOS is likely increasing the number of clocks or half-clocks to keep in specification with some memory timing requirements.
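To make the resolution argument concrete, the following sketch computes the clock and half-clock step for both bus speeds, and how many whole clocks are needed to cover a given timing requirement. The 80 ns requirement is a hypothetical value picked only for illustration; it is not a parameter of any particular DRAM or chipset.

```python
import math


def timing_step_ns(fsb_mhz: float):
    """Return (full clock, half clock) duration in nanoseconds."""
    period = 1000.0 / fsb_mhz
    return period, period / 2


def clocks_needed(required_ns: float, fsb_mhz: float) -> int:
    """Whole clocks needed so that the programmed delay covers required_ns."""
    period, _ = timing_step_ns(fsb_mhz)
    return math.ceil(required_ns / period)


for fsb in (33.33, 40.0):
    full, half = timing_step_ns(fsb)
    print(f"{fsb:.2f} MHz: {full:.1f} ns per clock, {half:.1f} ns per half-clock")

# Hypothetical 80 ns requirement: 3 clocks at 33 MHz, but 4 clocks at 40 MHz.
print(clocks_needed(80, 33.33), clocks_needed(80, 40.0))
```

With a faster clock, the same requirement in nanoseconds can need one more clock, which is the kind of adjustment auto-configuration is suspected of making here.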

As far as I know, BIOS auto-configuration on 486 boards usually tunes memory access times to be suitable for 80ns or 70ns RAM. If you have 60ns RAM installed, it is likely that the cycle counts you use at 33MHz would also work at 40MHz.

Finally, some thoughts about the numbers: 19MB/s is 1.19 cache lines per microsecond, or 840ns per cache line. This is around 28 cycles at 33MHz per cache line. 15MB/s is 0.94 cache lines per microsecond, or 1070 ns per cache line. This is around 43 cycles per cache line. The raw cycle count will be a bit lower, as the memory is also busy doing other things like refresh, but those numbers seem excessively slow for reading. Are you sure you did not mix up read timings and move timings? For moving, data is transferred twice over the bus, so speeds between 15MB/s and 20MB/s make much more sense.
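The arithmetic in the paragraph above can be reproduced with a few lines of Python. The sketch assumes a 16-byte cache line (the 486 line size) and MB = 10^6 bytes, which is what the quoted figures imply; pairing 19MB/s with 33MHz and 15MB/s with 40MHz follows the cycle counts given in the text.

```python
CACHE_LINE_BYTES = 16  # assumed: 486 cache line size


def line_cost(mb_per_s: float, fsb_mhz: float):
    """Return (cache lines per microsecond, ns per line, bus clocks per line)."""
    bytes_per_us = mb_per_s  # 1 MB/s equals 1 byte/us when MB = 10**6 bytes
    lines_per_us = bytes_per_us / CACHE_LINE_BYTES
    ns_per_line = 1000.0 / lines_per_us
    clocks_per_line = ns_per_line * fsb_mhz / 1000.0
    return lines_per_us, ns_per_line, clocks_per_line


for mb_s, fsb in ((19, 33.33), (15, 40.0)):
    lines, ns, clocks = line_cost(mb_s, fsb)
    print(f"{mb_s} MB/s at {fsb:.2f} MHz: {lines:.2f} lines/us, "
          f"{ns:.0f} ns/line, {clocks:.0f} clocks/line")
```

This gives about 28 clocks per line at 33MHz and about 43 clocks per line at 40MHz, matching the rounded figures in the post.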

I tried the tests again, this time keeping a careful log-book.
First things first, I was ultimately able to reproduce the intermittent hang on boot even at a bus speed of 33 MHz and the Intel 486DX2-66, so that doesn't seem to be related. I've made a ton of other changes recently including different SIMMs and an external CMOS battery, so I won't worry about that for now.

So I hit R for each Speedsys test and got these numbers from the SSTREP*.txt files. Speedsys seems to not distinguish between L1 & L2 cache likely for lack of CPUID, which is why I'm going into the .txt file and using the table - row "Memory", column "Reading."
Cx486DX2-80, bus speed 40 MHz, all other jumpers aggressive (<= 33 MHz) - CPU score 30.24, uncached Memory read 10.83 MB/s, disk linear read 5904 KB/s.
I tried setting JP18 (VLB speed > 33) to the conservative setting and it didn't affect the numbers.

Next, I set both JP18 (VLB speed > 33) and JP7 (CPUCLK >= 40 or 50). The CPU score worsened to 28.4, the memory read worsened to 10.66 MB/s, and the disk linear read worsened to 5634 KB/s.

At that point, I switched from VLB to ISA VGA to eliminate that as a factor. It didn't seem to change the scores, except worsen the disk linear read speed a little.

Next, I underclocked the Cx486DX2-80 to 66 MHz by lowering the bus speed to 33. I returned all other jumpers to aggressive settings.
Cx486DX2-80@66 - CPU score declined to 26.69, uncached Memory read improved to 17.17 MB/s, linear disk read worsened to 5703 KB/s.

Next, I switched from the Cx486DX2-80@66 back to an Intel 486DX2-66.
i486DX2-66 - CPU score declined to 25.17, uncached Memory read declined to 14.78 MB/s, linear disk read worsened to 5691 KB/s.

Stay tuned for another post in a bit though - AMISETUP seems to change things.

Reply 23 of 40, by jakethompson1

mkarcher wrote on 2020-07-12, 09:34:

Yeah, good luck on finding UMC chipset data sheets. If your BIOS is AMI, you might want to try AMISETUP to get access to hidden configuration options. If your BIOS is Award, you might want to try MODBIN to create a modified BIOS for your board that exposes more configuration options. MODBIN is a tool aimed at mainboard manufacturers which lets them choose which of the options the chipset manufacturer provided should actually be exposed to the user. A deep understanding of mainboard operation is helpful for using MODBIN. Assembly language knowledge is not required, as MODBIN only edits configuration data in the BIOS and does not touch the code.

First I'll get ELBA# out of the way. Let me clarify the hang was at the AMI WAIT... prompt. I tried flipping C: disk type to Not Installed, and also changing advanced IDE features (block read, LBA, 32-bit) to disabled. This avoided the hang. Verdict: my UMC8672 VLB IO card is what's forcing the ELBA# to be on T2.

Thanks for pointing me to AMISETUP as that seems key! Here are some of the unavailable settings:
Auto Config: Enabled
Cache Read: 3-2-2-2
Cache Write: 1 WS
DRAM Type: Fast Page
DRAM wait states: 1 WS
Keyboard Clock: 9.5 MHz
AT Clock: CPUCLK/4
IO Ready Time: 5/3 BCLK
Hold PD Bus: 2~3 T

Those do seem quite conservative. As an example I did nothing but change autoconfig to Disabled and DRAM wait states to 0. This alone improved memory read from 14.78MB/s to 18.82MB/s. This is with the Intel 486DX2-66 back in with bus speed of 33. Sounds like I have lots of things to play with. Unfortunately I have no cmos reset jumper so I'm stuck disconnecting the battery and waiting for it to clear if I make a mistake. My cache chips are 15ns and tag ram 5ns if that's a factor - seems those are roomy as far as increasing settings. I better set it back to autoconfig and put the Cx486DX2-80 back in before I tinker more.

By the way, I bought some EPROM chips and a programmer expecting to have to update the BIOS for LBA support. It came updated already. You think there is a bit I can flip to put these settings in the normal BIOS setup? I ran strings on the bios image and those are in there so hopefully the code to support them is there and just disabled.

Reply 24 of 40, by mkarcher

jakethompson1 wrote on 2020-07-12, 22:04:

First I'll get ELBA# out of the way. Let me clarify the hang was at the AMI WAIT... prompt. I tried flipping C: disk type to Not Installed, and also changing advanced IDE features (block read, LBA, 32-bit) to disabled. This avoided the hang. Verdict: my UMC8672 VLB IO card is what's forcing the ELBA# to be on T2.

So, obviously that card is too slow in pulling LDEV# for ELBA# to be active at T1. If you own another VLB IDE interface card, try it.

jakethompson1 wrote on 2020-07-12, 22:04:

Thanks for pointing me to AMISETUP as that seems key! Here are some of the unavailable settings:
Auto Config: Enabled
Cache Read: 3-2-2-2
Cache Write: 1 WS
DRAM Type: Fast Page
DRAM wait states: 1 WS
Keyboard Clock: 9.5 MHz
AT Clock: CPUCLK/4
IO Ready Time: 5/3 BCLK
Hold PD Bus: 2~3 T

Those do seem quite conservative.

While auto-config is often quite conservative, keep in mind that if auto-config is enabled, auto-config overrides at least Cache Read, Cache Write and DRAM wait states, so the awfully slow values printed there are most likely not in effect unless you run FSB50.

jakethompson1 wrote on 2020-07-12, 22:04:

I better set it back to autoconfig and put the Cx486DX2-80 back in before I tinker more.

By the way, Award BIOSes (I know them better than AMI) can use different auto-config tables for Cyrix and non-Cyrix CPUs. Speed differences between Cyrix and Intel processors might be caused by auto-config choosing a different chipset configuration as well as by actual core performance differences. I guess there are technical reasons (like different signal timing specifications) that require these differences, though.

jakethompson1 wrote on 2020-07-12, 22:04:

By the way, I bought some EPROM chips and a programmer expecting to have to update the BIOS for LBA support. It came updated already. You think there is a bit I can flip to put these settings in the normal BIOS setup? I ran strings on the bios image and those are in there so hopefully the code to support them is there and just disabled.

They are there and are just disabled. AMISETUP does not know anything about your boards or chipsets by itself. It just collects both enabled and disabled settings from the ROM BIOS and presents you all of them. It's likely just a matter of flipping a single bit to make the settings user-available, but you still need to adjust the checksum, too. And if your BIOS is of the newer self-decompressing variant, you also need a tool to unpack and repack the BIOS. I don't know about the tooling available for AMI BIOS ROMs, but I guess there must be some OEM BIOS adjustment tool for AMI BIOSes just as there is MODBIN for Award BIOSes. I hope someone else with AMI experience can chime in here...
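As a rough illustration of "flip a bit, then adjust the checksum", here is a sketch that assumes the simplest possible scheme: an 8-bit additive checksum where all bytes of the image must sum to 0 modulo 256, with one byte reserved for compensation. That scheme, the offsets, and the helper names are all assumptions made for the example. The thread does not say which checksum an AMI BIOS actually uses or where the relevant bit lives, so the real image would have to be analysed first.

```python
def flip_bit(image: bytes, offset: int, bit: int) -> bytes:
    """Return a copy of image with one bit toggled (hypothetical location)."""
    data = bytearray(image)
    data[offset] ^= 1 << bit
    return bytes(data)


def fix_checksum(image: bytes, checksum_offset: int) -> bytes:
    """Recompute an assumed 8-bit additive checksum byte.

    After the fix, the sum of all bytes is 0 modulo 256.
    """
    data = bytearray(image)
    data[checksum_offset] = 0
    data[checksum_offset] = (-sum(data)) % 256
    return bytes(data)


# Toy 4-byte "ROM" whose last byte is the compensation byte.
rom = fix_checksum(bytes([0x12, 0x34, 0x56, 0x00]), checksum_offset=3)
patched = fix_checksum(flip_bit(rom, offset=1, bit=0), checksum_offset=3)
print(sum(rom) % 256, sum(patched) % 256)  # 0 0
```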

Reply 25 of 40, by jakethompson1

mkarcher wrote on 2020-07-12, 22:38:

While auto-config is often quite conservative, keep in mind that if auto-config is enabled, auto-config overrides at least Cache Read, Cache Write and DRAM wait states, so the awfully slow values printed there are most likely not in effect unless you run FSB50.

It was actually overwriting them in addition to overriding. When I shut down, put the DX2-80 back in, and booted, I found they were even more conservative. Cache write and memory access were both 2 WS, etc.

This speedsys is with some tuning of the hidden settings. Looks more like it!

I can't get cache timings below 3-2-2-2 nor can I get cache write below 1 WS. Anything I'd be doing wrong? Again cache sram is 15ns and tag ram is 5ns.

Reply 26 of 40, by jakethompson1

mkarcher wrote on 2020-07-12, 22:38:

While auto-config is often quite conservative, keep in mind that if auto-config is enabled, auto-config overrides at least Cache Read, Cache Write and DRAM wait states, so the awfully slow values printed there are most likely not in effect unless you run FSB50.

Hey, I just realized the tag ram HM3-65756F-5 is a 20ns chip, not 5ns, which doesn't exist (sneaky!).
The F means 20ns and the -5 means commercial grade.
This is probably why I can't get cache reads faster than 3-2-2-2 and writes faster than 1 WS, right?
Looks like I can replace all 9 chips with 12ns ones for $3 apiece, might do that just for fun just to see what happens.

Reply 27 of 40, by maxtherabbit


you should be able to get 2-1-1-1 with 20ns chips at 33MHz, but that's probably too slow for 40MHz
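For a feel of what these burst patterns mean in time, the sketch below adds up the clocks of a four-transfer burst and converts them to nanoseconds per cache line. It also prints the clock period, since in a single-clock burst transfer the SRAM has roughly one period to deliver data. Comparing that period against a 20ns access time is a simplification that ignores tag lookup, address decoding and setup times, so treat it as an assumption rather than a timing analysis.

```python
def burst_clocks(pattern: str) -> int:
    """Total bus clocks for a burst pattern such as "3-2-2-2"."""
    return sum(int(n) for n in pattern.split("-"))


def line_time_ns(pattern: str, fsb_mhz: float) -> float:
    """Time to transfer one cache line with the given burst pattern."""
    return burst_clocks(pattern) * 1000.0 / fsb_mhz


SRAM_NS = 20  # access time of the chips discussed above

for fsb in (33.33, 40.0):
    period = 1000.0 / fsb
    print(f"{fsb:.2f} MHz: clock period {period:.1f} ns, "
          f"{period - SRAM_NS:.1f} ns left after a {SRAM_NS} ns access")
    for pattern in ("3-2-2-2", "2-1-1-1"):
        print(f"  {pattern}: {burst_clocks(pattern)} clocks, "
              f"{line_time_ns(pattern, fsb):.0f} ns per line")
```

At 33MHz a 20ns chip leaves about 10ns of the clock period for everything else, while at 40MHz only about 5ns remains, which fits the expectation that 2-1-1-1 is plausible at 33MHz but doubtful at 40MHz.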

Reply 28 of 40, by Anonymous Coward


The tag RAM is normally recommended to be one speed stepping faster than the rest of the cache. In theory if you have 20ns cache, the tag should be 15ns, and if you have 15ns cache the tag should be 12ns. However, in practice this often doesn't happen.

"Will the highways on the internets become more few?" -Gee Dubya
V'Ger XT|Upgraded AT|Ultimate 386|Super VL/EISA 486|SMP VL/EISA Pentium

Reply 29 of 40, by fsinan


I tested my Cx486DX2-80 on an M912 V1.7 board with the latest dual BIOS downloaded from the VOGONS driver library, using the Award BIOS. I will test it with an Intel DX2-66 OverDrive too to compare, but for now here are the results with the Cx486DX2-80 at the best optimized BIOS settings that work flawlessly.

Last edited by fsinan on 2025-04-28, 09:44. Edited 1 time in total.

System:1
Cyrix 5x86-120GP & X5-160ADZ
Lucky Star LS-486E
System:2
Intel DX4-WB & AMDDX4-120
PcChips M912 V1.7
System:3
AMD K6-2-475 & Cyrix 6x86MX PR-233
Asus P5A-B
System:4
UMC U5S-40
486UL-P101
System:5
P3 Coppermine 800EB
Gigabyte GA-6BX7

Reply 30 of 40, by Disruptor

fsinan wrote on 2025-04-28, 05:54:

I tested my Cx486DX2-80 on an M912 V1.7 board with the latest dual BIOS downloaded from the VOGONS driver library, using the Award BIOS. I will test it with a WB-cache-enabled Intel DX2-66 OverDrive too to compare, but for now here are the results with the Cx486DX2-80 at the best optimized BIOS settings that work flawlessly.

Oh, a ~5 year necro.
Do you use 2-1-1-1 L2 cache timings?

Reply 31 of 40, by fsinan

Disruptor wrote on 2025-04-28, 07:27:
fsinan wrote on 2025-04-28, 05:54:

I tested my Cx486DX2-80 on an M912 V1.7 board with the latest dual BIOS downloaded from the VOGONS driver library, using the Award BIOS. I will test it with a WB-cache-enabled Intel DX2-66 OverDrive too to compare, but for now here are the results with the Cx486DX2-80 at the best optimized BIOS settings that work flawlessly.

Oh, a ~5 year necro.
Do you use 2-1-1-1 L2 cache timings?

Yes.


Reply 32 of 40, by fsinan


Tested it with an Intel DX2-66 OverDrive, which is basically a DX2 with a heatsink.

For the synthetic tests except Sysinfo, the Cx486DX2-80 is faster. No picture here, but Sysinfo gives 144.3 for the Intel. In Sysinfo and Quake, the Intel DX2-66 is faster. All tested with the same, tightest memory timings.

Quake kills the Cyrix cpu, difference is huge.


Reply 33 of 40, by Disruptor


Well, this is an Enhanced 486 DX2 from Intel. Do you know whether you have L1 cache in WB or WT mode?
Intel's classic 486 DX2 processors do not have an L1 WB cache. They run in WT mode.
Cyrix' 486 processors have WB support depending on motherboard. However the DX and DX2 need a special wiring for that.

And, perhaps the FPU of the Intel processor is faster than the one from Cyrix.
Perhaps you bench Doom too 😉

Reply 34 of 40, by fsinan

Disruptor wrote on 2025-04-28, 15:54:

Well, this is an Enhanced 486 DX2 from Intel. Do you know whether you have L1 cache in WB or WT mode?
Intel's classic 486 DX2 processors do not have an L1 WB cache. They run in WT mode.
Cyrix' 486 processors have WB support depending on motherboard. However the DX and DX2 need a special wiring for that.

And, perhaps the FPU of the Intel processor is faster than the one from Cyrix.
Perhaps you bench Doom too 😉

It's just wt cache, there is no enhancement for ODPR66.

Cyrix is working in wb mode.


Reply 35 of 40, by Anonymous Coward

Disruptor wrote on 2025-04-28, 15:54:

And, perhaps the FPU of the Intel processor is faster than the one from Cyrix.
Perhaps you bench Doom too 😉

I thought it was pretty well established that the FPU on the Cyrix is actually stronger than the Intel...


Reply 36 of 40, by Deunan


No, I think it's the other way around. FPU on Intel DX2 is considerably faster than on Cyrix. Reason being the Cyrix is not (unlike AMD) a copy but they did their own implementation, based on earlier work with x87 chips.
What should be faster on Cyrix is integer multiplication (considerably so) and perhaps L1 (but not as much as UMC). And since it runs at 40MHz FSB Cyrix will be faster in every task that does not require heavy FPU usage, but it's considerably slower in Quake for example.
Also, reminder, Norton System Information was "helped" by Intel to show lower scores for Cyrix chips. So that particular benchmark uses integers only (AFAIR) but ends up with lower scores.

Reply 37 of 40, by fsinan

Deunan wrote on 2025-04-29, 11:19:

No, I think it's the other way around. FPU on Intel DX2 is considerably faster than on Cyrix. Reason being the Cyrix is not (unlike AMD) a copy but they did their own implementation, based on earlier work with x87 chips.
What should be faster on Cyrix is integer multiplication (considerably so) and perhaps L1 (but not as much as UMC). And since it runs at 40MHz FSB Cyrix will be faster in every task that does not require heavy FPU usage, but it's considerably slower in Quake for example.
Also, reminder, Norton System Information was "helped" by Intel to show lower scores for Cyrix chips. So that particular benchmark uses integers only (AFAIR) but ends up with lower scores.

NSSI shows slower FPU performance for the CxDX2-80 than for the Intel DX2-66, but Landmark doesn't. It even shows the Cyrix as faster at 66 MHz. This is complicated, and there is no detailed technical documentation or comparison anywhere.

Some say CxDX2 CPUs run their cache at bus speed, unlike Intel CPUs, which run it at the doubled core speed. I haven't seen any technical documentation on this either, so I'm not sure about it.

As for Quake, I don't think the FPU speed difference explains everything; the difference is huge! The AMD DX2-80 (a copy of the Intel) runs at 10.2 fps, but the Cyrix DX2-80 gives 5.7 fps. This is ridiculous on the same board with the same memory settings.


Reply 38 of 40, by Deunan

fsinan wrote on Yesterday, 20:34:

As for Quake, I don't think the FPU speed difference explains everything; the difference is huge! The AMD DX2-80 (a copy of the Intel) runs at 10.2 fps, but the Cyrix DX2-80 gives 5.7 fps. This is ridiculous on the same board with the same memory settings.

It depends on what instructions are used, and how heavily. In the 387 days the Cyrix coprocessors were faster, and actually more bit-accurate. Instructions like FSIN, FCOS, FSINCOS were running way faster on Cyrix - although that also depends on which version of the 387 you had. Cyrix had several and the 40MHz capable variants were slower than their previous 33MHz topping ones (but still faster than Intel).

I suppose Intel did their homework and designed a completely new FPU architecture that is very integrated into the 486 design. So much, in fact, that it can use the integer ALU to speed up some of its operations (if the ALU is not busy). That's why you can't have an external NPU for the 486SX; it's not how it works, and it would be way slower with an external coprocessor than a 486DX.

Cyrix on the other hand made their 486DLC core, then kept updating it. Their original 486S was actually just an improved DLC and could apparently work with an external Cyrix 387/487 coprocessor. It's quite likely that the integration that happened in their DX series, even the late DX2, was simply nowhere near what Intel did. Cyrix had this approach also with later chips, which were also usually faster in integer workloads but suffered in floating point code. One could argue both ways, that their x87 design team wasn't as good or that their integer core team was so good that the design was not easily married to the FPU, perhaps a bit of both.
In my own microbenchmarks the Cyrix 486DX2 FPU is faster than Intel with 287-class code that calculates sin(x) and cos(x) via FPTAN. But once you switch to the 387+ instruction set, Intel becomes the faster FPU. So it's clearly a completely different FPU pipeline and what you do with it will affect the results a lot.
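For readers unfamiliar with the 287-era approach: before FSIN and FCOS existed, sine and cosine were typically derived from a tangent. One common way is the half-angle identity, sketched below in Python with `math.tan` standing in for the FPTAN result. This is an assumption about the general technique; the post does not show the actual benchmark code, and the argument reduction a real 8087/287 routine needs is left out.

```python
import math


def sin_cos_via_tan(x: float):
    """Compute (sin x, cos x) from t = tan(x / 2).

    sin x = 2t / (1 + t^2), cos x = (1 - t^2) / (1 + t^2).
    math.tan stands in for the FPTAN instruction; range reduction of the
    argument, which real x87 code would need, is omitted.
    """
    t = math.tan(x / 2.0)
    denom = 1.0 + t * t
    return 2.0 * t / denom, (1.0 - t * t) / denom


for x in (0.1, 0.5, 1.0, 1.5):
    s, c = sin_cos_via_tan(x)
    print(f"x={x}: sin err {abs(s - math.sin(x)):.1e}, "
          f"cos err {abs(c - math.cos(x)):.1e}")
```

On top of the tangent itself, this path costs a few multiplications and two divisions, so the balance between FPTAN speed and divide speed can decide which FPU comes out ahead in such a test.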

Quake is also a special case. It not only uses the FPU for math but also for moving data around, since it was coded for the Pentium, and this chip could do 64-bit loads and stores but only via the FPU since integer registers were still 32-bit (I guess various memory combining tricks were not yet advanced enough to make up that difference). This approach hurts 486 (and lower) performance though, possibly more on Cyrix. Most FPU datasheets will tell you there is a penalty if you try to load random data to registers that is not properly formatted floating-point. It might be small enough on Pentium to be worth the trouble but could affect Cyrix chips way more than Intel/AMD.

Landmark code is using 16-bit only instructions and possibly their x87 code is also compatible with the 8087, which would use FWAIT for each instruction. That might flatten the results a lot. Plus with faster chips their measurement method is less precise and eventually starts reporting wrong values (somewhere around P200 and faster CPUs).

Reply 39 of 40, by fsinan


Yeah, I did a bit of searching on the issue. There is no detailed microarchitecture analysis of Cyrix M7 CPUs on the net; the one linked below is the best I could find.

Two specific reasons for Cyrix to be slower: ->Lack of specific address register unit of integer core.

A rumour that is not mentioned in this analysis is that the Cyrix has slower internal cache speeds, equal to bus speed, as opposed to Intel.

The FPU cannot be considered "slower" in every instance; in fact, this report calls it faster. It can be wrong.

Take a look at this:

https://www.cecs.uci.edu/~papers/mpr/MPR/ARTICLES/071101.pdf

Maybe we need a detailed test and analysis to fully understand the issue, maybe @feipoa knows more details on this. I can help for tests and analysis.
