VOGONS


A brief comparison of 386 FPUs

First post, by feipoa

Rank l33t++

I decided to run a quick comparison of the various 386 FPU types on an ALi1429-based 386 system. I have the FSB set to 33 MHz and am using a TI 486SXL for all FPU measurements except the Intel RapidCAD, which requires the RapidCAD CPU. In the chart below, the winner, though not by much, was the grey-top Cyrix FasMath. You will notice that it is a few points faster than the black-top versions or the DLC versions of the same chip. Cyrix likely added some DLC compatibility features to the FasMath, at the cost of marginally reduced performance. Conversely, the IIT 3C87 grey ceramic-top chip (126 a.u.) was slower than its counterparts with gold caps, black plastic caps, or the DLC marking (135 a.u.). The ULSI DX and DLC chips had the same result and were a little slower than the IIT chips.

[Image: 386_FPU_Roundup.jpg]

At rock bottom was the Intel i387DX. I am left to wonder whether Intel intentionally made it perform poorly to encourage RapidCAD or next-generation chip sales. You will notice that the RapidCAD beats all the single-clock competitors, though a DRx2 or SXL2 at 66 MHz using a basic black-top FasMath will still outperform the RapidCAD, at least as far as Landmark is concerned.

[Image: 386_SXL-33_Chart.png]

What I found most intriguing was the ULSI Math-Co DX-2 66 MHz FPU. With an incredible 2x clock multiplier, it performed just barely worse than the single-clock Cyrix FasMath grey-tops, and only a smidge above the FasMath black-tops. What is going on here? The ULSI DX2 at 66 MHz performs only 12% faster than the ULSI DX at 33 MHz: the ULSI DX-2 at 66 MHz scored 145 a.u., while the ULSI DX at 33 MHz scored 130 a.u. Does a clock-doubled FPU need a clock-doubled CPU to take advantage of its doubled clock rate? This had me wondering, so I also ran these chips with a clock-doubled SXL2 at 66 MHz.
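As a quick check of the 12% figure, here is a minimal Python sketch using the two Landmark scores quoted above. The "ideal" value is only an assumed upper bound for comparison: it is what a doubled FPU clock would give if the Landmark FPU score depended on nothing but the FPU clock.

```python
def percent_gain(new_score: float, old_score: float) -> float:
    """Percent improvement of new_score over old_score (higher Landmark scores are better)."""
    return (new_score / old_score - 1.0) * 100.0


ULSI_DX_33 = 130.0   # ULSI DX (single clock, 33 MHz) on the SXL-33, Landmark a.u.
ULSI_DX2_66 = 145.0  # ULSI DX2 (clock doubled, 66 MHz) on the SXL-33, Landmark a.u.

observed = percent_gain(ULSI_DX2_66, ULSI_DX_33)
# Assumed upper bound: the score doubles along with the FPU clock.
ideal = percent_gain(2.0 * ULSI_DX_33, ULSI_DX_33)

print(f"observed: +{observed:.1f}%, ideal for a 2x FPU clock: +{ideal:.0f}%")
```

This prints an observed gain of about 11.5% (the 12% above, rounded) against an idealized 100%.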

[Image: 386_SXL2-66_Chart.png]

With the clock-doubled SXL2 at 66 MHz, the ULSI DX2 (also at 66 MHz) came out on top with 225 a.u., while a 33 MHz FasMath black-top scored 209 a.u. Why doesn't the ULSI DX2 at 66 MHz yield significantly better results than the FasMath at 33 MHz? Is the CPU the bottleneck? Single-clock FPUs at 33 MHz, such as the FasMath and ULSI DX, both show a 46% increase in performance when paired with a clock-doubled CPU (SXL2-66) rather than a single-clock CPU (SXL-33). The ULSI DX2 showed a 55% boost going from the SXL-33 to the SXL2-66. With the SXL2-66, the increase from the single-clock ULSI DX to the double-clocked ULSI DX2 was only 18%, though that is at least larger than the 12% increase from ULSI DX to ULSI DX2 on the SXL-33. Why is this?
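One way to put rough numbers on the "is the CPU the bottleneck?" question is to invert Amdahl's law. This is a simplified model, not a measurement: it assumes the ULSI DX and DX2 differ only in FPU clock, that the Landmark score is proportional to throughput, and that the part of the run tied to the FPU clock speeds up exactly 2x while everything else (CPU, bus, CPU/FPU handshaking) stays fixed. The function name `fpu_bound_fraction` is just an illustrative choice.

```python
def fpu_bound_fraction(speedup: float, clock_ratio: float = 2.0) -> float:
    """Invert Amdahl's law.

    Given the observed overall speedup from raising the FPU clock by clock_ratio,
    return the fraction f of the original runtime that must have scaled with the
    FPU clock, assuming the remainder did not change at all.

    Amdahl: 1/speedup = (1 - f) + f/clock_ratio
    =>      f = (1 - 1/speedup) / (1 - 1/clock_ratio)
    """
    if clock_ratio <= 1.0:
        raise ValueError("clock_ratio must be greater than 1")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / clock_ratio)


# Observed ULSI DX -> ULSI DX2 speedups from the post above.
cases = {
    "SXL-33": 145.0 / 130.0,  # Landmark scores 145 vs 130
    "SXL2-66": 1.18,          # the stated ~18% increase (already rounded)
}

for cpu, speedup in cases.items():
    f = fpu_bound_fraction(speedup)
    print(f"{cpu}: {speedup:.2f}x overall -> ~{f:.0%} of runtime scaled with the FPU clock")
```

Under those assumptions, only about 21% of the run scales with the FPU clock on the SXL-33, rising to roughly 31% on the SXL2-66, which would be consistent with the CPU side limiting how much the doubled FPU clock can show.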

The one other curiosity was the C&T SuperMath chips. I was surprised to see them neck and neck with the Cyrix FasMath. I wonder how these chips were priced compared to the competition...

Data table follows

[Image: 386_FPU_Data.png]

Plan your life wisely, you'll be dead before you know it.

Reply 1 of 148, by Deunan

Rank Oldbie

Very nice! If I might suggest a small change to (hopefully) improve this benchmark: Add some high-contrast numbers to the FPU photo and then use these numbers in the result tables along with the names - this will make the chip identification way easier than using "black top", "gray package", etc.

In the unlikely case you haven't seen it yet, there is a nice document on the speed, accuracy and other differences of 386-era FPUs here: ftp://retronn.de/docs/FPU/coproc.txt
You might also want to google "paranoia.c" - in my humble opinion _the_ FPU accuracy and compliance test for any system that has a working C compiler.

Reply 2 of 148, by Anonymous Coward

Rank l33t

Please don't use Landmark to benchmark anything. Landmark sucks, and it is notoriously inaccurate.
A more realistic way to test FPUs is to time a CAD rendering. Even Fractint would be more useful than Landmark crap.

"Will the highways on the internets become more few?" -Gee Dubya
V'Ger XT|Upgraded AT|Ultimate 386|Super VL/EISA 486|SMP VL/EISA Pentium

Reply 3 of 148, by feipoa

Rank l33t++

That's a good idea. Is there a built-in benchmark, or do I need to get out the stopwatch? What version of Fractint is best for early 486 chips?
https://fractint.org/ftp/

I'm a little confused by the navigational structure of the ftp site. I'm guessing that the really old stuff (pre-version 20) from 1997 and earlier is here: https://fractint.org/ftp/archive/ and version 20+ here: https://fractint.org/ftp/old/dos/ But then they have a whole folder dedicated to what looks like an old version 20.0, https://fractint.org/ftp/release.20.0/dos/ , while version 20.04p14 appears to be in another folder called current, https://fractint.org/ftp/current/dos/

Reply 4 of 148, by Phido

Rank Newbie

Bravo!

I love this, Feipoa. You have a much more comprehensive collection than I do; I only have the following:
Intel
FasMath
ULSI
IIT

No Weiteks 🙁

However, I would recommend finding a benchmark that gives you breakdowns of performance in specific areas, like sin, cos, tan, floating-point addition/subtraction/multiplication/division, exp, root, etc.
There is a lot more going on here than first appears. Different FPUs are faster at different functions, in some cases wildly so, as different FPUs used different strategies (i.e. they weren't just cloning each other; they were being innovative).
Different workloads will perform significantly differently. The FasMath was generally quite zippy.

Intel unfortunately killed off the C&T chips quite early on. They were really competitive. Imagine if Dell, Compaq, Tandy and IBM had started shipping systems with C&T chipsets, C&T CPUs and FPUs, and C&T video cards. The world would have changed.

Also have a look at the bus speed: because of the 386 bus, there is a real limit to how fast you can make a 387 FPU work. However, faster buses do make for faster 387s. A 40 MHz 387 will be faster than a clock-doubled 33/66 FPU.

Weiteks were rumoured to be faster than some low-end Pentium models (60 MHz?) in certain functions over specific ranges. Because of the way they were addressed, they weren't limited by the x86 bus, but this also broke x87 compatibility.

Fascinating..

Reply 6 of 148, by feipoa

Rank l33t++

Quake - 0.1 to 0.3 fps at most I suspect.

I have run version 19.6 of Fractint on three different FPUs: Cyrix FasMath black-top, i387DX, and ULSI DX2-66. I used fractal image Mandelfn with Video size SF7 (1024x768x256c). All three FPUs took 32 seconds to finish drawing the fractal.

I also ran all three chips through a circuit simulation benchmark (CABT) and Roy Longbottom's optimised Whetstone in DOS.

CABT
Cyrix black-top: 1.48 seconds
Intel i387: 1.65 seconds
ULSI DX2: 1.43 seconds

Roy Longbottom's WHETCOD
Cyrix black-top: 5.18 MFLOPS
Intel i387: 3.96 MFLOPS
ULSI DX2: 5.61 MFLOPS

PiDOS to 25K decimal places
Cyrix black-top: 64 seconds
Intel i387: 64 seconds
ULSI DX2: 64 seconds
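Because CABT and PiDOS report times (lower is better) while WHETCOD reports MFLOPS (higher is better), one way to compare the trend against Landmark is to express every result as a speed relative to the i387. A minimal Python sketch using only the figures above; the dictionary layout and the `normalize` helper are illustrative choices, not part of any of these benchmarks.

```python
RESULTS = {
    "CABT (s)": {"Cyrix black-top": 1.48, "Intel i387": 1.65, "ULSI DX2": 1.43},
    "WHETCOD (MFLOPS)": {"Cyrix black-top": 5.18, "Intel i387": 3.96, "ULSI DX2": 5.61},
    "PiDOS 25K (s)": {"Cyrix black-top": 64.0, "Intel i387": 64.0, "ULSI DX2": 64.0},
}
TIME_BASED = {"CABT (s)", "PiDOS 25K (s)"}  # benchmarks where lower is better
BASELINE = "Intel i387"


def normalize(results, baseline, time_based):
    """Return each chip's speed relative to the baseline chip (>1.0 means faster)."""
    relative = {}
    for bench, scores in results.items():
        base = scores[baseline]
        if bench in time_based:
            # Times: the baseline time divided by the chip's time.
            relative[bench] = {chip: base / value for chip, value in scores.items()}
        else:
            # Rates: the chip's rate divided by the baseline rate.
            relative[bench] = {chip: value / base for chip, value in scores.items()}
    return relative


for bench, speeds in normalize(RESULTS, BASELINE, TIME_BASED).items():
    row = ", ".join(f"{chip} {speed:.2f}x" for chip, speed in speeds.items())
    print(f"{bench}: {row}")
```

On these numbers the ULSI DX2 comes out about 1.15x the i387 in CABT and about 1.42x in WHETCOD, while PiDOS shows no difference at all, which suggests that run is not FPU-limited.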

The trend here seems to be in some agreement with Landmark v2. Is there a better program to benchmark with?

Reply 8 of 148, by kixs

Rank l33t

I usually use AutoCAD 10 as it takes the longest to complete regen and hide. But differences aren't big in the end.

I was actually thinking about the same project for quite some time... but had no time to spare. I have most of these FPUs + Weitek 3167-33 - SuperMath.

Requests are also possible... /msg kixs

Reply 9 of 148, by kixs

Rank l33t
Phido wrote:

...

Weiteks were rumoured to be faster than some low end Pentium models (60mhz?) in certain functions with specific ranges.. Because the way they were addressed, they weren't limited by the x86 bus, but it also broke x87 compatibility.

Weitek was the fastest till 486DX2-66 came around.

Reply 10 of 148, by Anonymous Coward

Rank l33t

So based on these results, it seems that if you wanted to build a 386 for FPU performance, short of a Weitek, a black-top FasMath overclocked to 50 MHz would probably be the way to go.

Reply 11 of 148, by kixs

Rank l33t

Don't forget RapidCAD 😉

Weitek 4167 was fastest for PC till 486DX2/66. But it needed special support. 3167 was much slower (3-4X).

There was also Cyrix EMC87 with memory mapping. But without special support it performed the same as regular Cyrix FPU. I have it but don't know of any program that supports it 🙁

In a test, the EMC87 at 33 MHz ran the single-precision Whetstone benchmark at 7608 kWhetstones/sec, while the Cyrix 83D87 at 33 MHz had a speed of only 5049 kWhetstones/sec, an increase of 50.6% [63]. In another test, the EMC87 ran a fractal computation at twice the speed of the Cyrix 83D87 and 2.6 times as fast as an Intel 387DX [64]. A third test found the EMC87's overall performance to be 20% higher than the performance of the Cyrix 83D87 [65].

Reply 12 of 148, by sunaiac

Rank Oldbie
kixs wrote:

Don't forget RapidCAD 😉

He said 386 😁

R9 3900X/X470 Taichi/32GB 3600CL15/5700XT AE/Marantz PM7005
i7 980X/R9 290X/X-Fi titanium | FX-57/X1950XTX/Audigy 2ZS
Athlon 1000T Slot A/GeForce 3/AWE64G | K5 PR 200/ET6000/AWE32
Ppro 200 1M/Voodoo 3 2000/AWE 32 | iDX4 100/S3 864 VLB/SB16

Reply 14 of 148, by sunaiac

Rank Oldbie

The good ol' days of no cache and slow buses !
We had time to contemplate life !
Everything is too fast now with them 486s :p

Reply 15 of 148, by Anonymous Coward

Rank l33t

One point about the graph that lists "RapidCAD 2" as the FPU. Technically it's not an FPU. It's just a useless lump that plugs into the 387 socket and generates the #FERR signal. The actual FPU is in the RapidCAD-1 chip.

Reply 16 of 148, by alvaro84

Rank Member
feipoa wrote:

Quake - 0.1 to 0.3 fps at most I suspect.

Tests run a few years ago, on boards I don't have anymore, at 33MHz, with an ISA S3 805. The 386 had some older OPTi chipset, an Intel 386DX-20 (isn't it a nice overclock? 😁) and 256k of cache, the 486 was a tiny cheap SiS496 one and I equipped them with 8MB of RAM. Plus I attached a PCI S3 Trio64V+ to the 486 as it had awfully slow ISA performance.

Quake:

i387: 1.5 fps
C&T: 1.6 fps
RapidCad: 2.5 fps
i486DX: 4.2 fps

Fractint Mandelbrot 1024*768:

i387: 29.05s
IIT: 28.07s
Cyrix FasMath: 26.97s
Cyrix DLC: 26.97s
C&T: 26.58s
ULSI: 26.31s
RapidCAD: 17.96s
i486DX: 13.07s
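Expressed as a speedup over the i387 (the i387 time divided by each chip's time), the Fractint results above can be tabulated with a short Python sketch. Keep in mind that the RapidCAD and i486DX rows also involve a different CPU (and, for the 486, a different board), so those ratios are not FPU-only comparisons. The `speedups` helper is a hypothetical name chosen for this example.

```python
FRACTINT_SECONDS = {
    "i387": 29.05,
    "IIT": 28.07,
    "Cyrix FasMath": 26.97,
    "Cyrix DLC": 26.97,
    "C&T": 26.58,
    "ULSI": 26.31,
    "RapidCAD": 17.96,
    "i486DX": 13.07,
}


def speedups(times, baseline):
    """Speedup of each entry over the baseline: baseline time / entry time."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


# Print slowest to fastest.
for name, factor in sorted(speedups(FRACTINT_SECONDS, "i387").items(), key=lambda kv: kv[1]):
    print(f"{name:<14} {factor:.2f}x")
```

The 387-socket chips cluster between about 1.03x and 1.10x, while the RapidCAD reaches about 1.62x and the i486DX about 2.22x.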

Shame on us, doomed from the start
May God have mercy on our dirty little hearts

Reply 17 of 148, by rasz_pl

Rank l33t
dirkmirk wrote:

I wonder if you would see any difference when running quake......

Quake doesn't need a fast FPU; it needs a pipelined one, i.e. the ability to interleave instructions using zero-cycle FXCH register swaps. AFAIK nothing x86 before the Intel Pentium did that; AMD caught up in late 1998 with the CXT revision of the K6-2.

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 18 of 148, by feipoa

Rank l33t++
alvaro84 wrote:
Tests run a few years ago, on boards I don't have anymore, at 33MHz, with an ISA S3 805. The 386 had some older OPTi chipset, an […]

How did you get two-decimal-place precision running Fractint? Is there a built-in timer that I don't know about? And what version were you using?

Reply 19 of 148, by feipoa

Rank l33t++
jesolo wrote:

Navrátil System Information (NSSI 0.60) - it has a CPU & FPU benchmark.

I'll try ByteMark and NSSI to see if they are sensitive enough. My primary confusion is why the ULSI DX2-66 is so slow... Why would they make a clock-doubled FPU that isn't much faster than a Cyrix FasMath? What was the target CPU for the ULSI DX2 FPU?
