VOGONS


The World's Fastest 486

Reply 240 of 747, by The Serpent Rider

Rank l33t++

Am5x86-180Mhz additional testing under Windows 95 OSR2 with Voodoo 3 and 32Mb RAM:

PC Player Direct3D Benchmark 2.10 (320x200, perspective correction off) - 20.8 fps
PC Player Direct3D Benchmark 2.10 (640x480, perspective correction on) - 16.5 fps
Incoming (1024x768) - 10.5 fps
Quake 2 v3.20 Software (320x240, demo1) - 8.7 fps
Quake 2 v3.20 MiniGL (640x480, demo1) - 13.1 fps
Quake 2 v3.20 MiniGL optimised (640x480, demo1) - 16.8 fps
Quake 3 Demo v1.09 (640x480, Low geometry, Vertex, Demo001) - 7.9 fps
Quake 3 Demo v1.09 (640x480, High geometry, Lightmaps, Demo001) - 6.6 fps
MDK2 Demo (640x480) - 5.79 fps
SiSoft Sandra 97 CPU - 234 MIPS
SiSoft Sandra 97 FPU - 50 MFLOPS
CPU-Z Vintage CPU Speed - 169.2
CPU-Z Vintage FPU Speed - 418.8

Performance-wise, Windows 95 didn't add much, although it feels much snappier. Now some tests on a P75.

P75 / 430FX / 256Kb asynchronous L2 cache / 32Mb EDO RAM / Voodoo 3:

PC Player Direct3D Benchmark 2.10 (320x200, perspective correction off) - 15.7 fps*
PC Player Direct3D Benchmark 2.10 (640x480, perspective correction on) - 13.2 fps*
Incoming (1024x768) - 10.6 fps
Quake 2 v3.20 Software (320x240, demo1) - 8.0 fps
Quake 2 v3.20 MiniGL (640x480, demo1) - 13.9 fps
Quake 2 v3.20 MiniGL optimised (640x480, demo1) - 17.5 fps
Quake 3 Demo v1.09 (640x480, Low geometry, Vertex, Demo001) - 7.5 fps
Quake 3 Demo v1.09 (640x480, High geometry, Lightmaps, Demo001) - 5.9 fps
MDK2 Demo (640x480) - 5.01 fps

And P120:

PCPlayer Benchmark 320x200 - 28.2 fps
PCPlayer Benchmark 640x480 - 12.6 fps
SiSoft Sandra 97 CPU - 224 MIPS
SiSoft Sandra 97 FPU - 63 MFLOPS
CPU-Z Vintage CPU Speed - 166.5
CPU-Z Vintage FPU Speed - 692.4
C&C Tiberian Sun Demo (subjective experience) - more or less equal to Am5x86-180.
_________________________________________________
Overall conclusion:
Am5x86-180 Integer = ~P120 with asynchronous L2 cache
Am5x86-180 FPU = ~P75 with asynchronous L2 cache
Loading time in various games (subjective experience) = P90/P100

*Looks like this benchmark really loves overclocked PCI bus.
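
To put the Am5x86-180 and P75 lists side by side, here is a rough Python sketch that divides one set of scores by the other. The short labels used as dictionary keys are made up for this example; the numbers are copied from the two lists above.

```python
# fps scores copied from the Am5x86-180 and P75 result lists above.
# The key names are shorthand invented for this sketch.
am5x86_180 = {
    "PC Player D3D 320x200": 20.8,
    "PC Player D3D 640x480": 16.5,
    "Incoming 1024x768": 10.5,
    "Quake 2 software": 8.7,
    "Quake 2 MiniGL": 13.1,
    "Quake 2 MiniGL optimised": 16.8,
    "Quake 3 low/vertex": 7.9,
    "Quake 3 high/lightmaps": 6.6,
    "MDK2": 5.79,
}
p75 = {
    "PC Player D3D 320x200": 15.7,
    "PC Player D3D 640x480": 13.2,
    "Incoming 1024x768": 10.6,
    "Quake 2 software": 8.0,
    "Quake 2 MiniGL": 13.9,
    "Quake 2 MiniGL optimised": 17.5,
    "Quake 3 low/vertex": 7.5,
    "Quake 3 high/lightmaps": 5.9,
    "MDK2": 5.01,
}


def relative(a, b):
    """Return a[name] / b[name] for every benchmark in a."""
    return {name: a[name] / b[name] for name in a}


ratios = relative(am5x86_180, p75)
for name, ratio in ratios.items():
    # A ratio above 1.00 means the Am5x86-180 was faster in that test.
    print(f"{name:26s} {ratio:4.2f}x")
```

The two PC Player rows are the ones flagged with the asterisk, so their ratios probably say as much about the overclocked PCI bus as about the CPU.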

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 241 of 747, by ph4nt0m

Rank Member
feipoa wrote on 2019-12-27, 00:50:

Looking back through this thread, I have a screenshot showing 19.8 fps in Quake shareware v1.06. This was with the IBM 5x86c-133/2x. Several pages later, that same score was 19.1 fps, so not sure what I changed between those two screenshots. I'm still confused as to why ph4nt0m's score is lower on his LuckyStar.

I have revisited this build evaluating performance of EDO vs. Fast Page on SiS 496. It seems that my 17.1fps Quake timedemo1 score is for a Cyrix 5x86 accidentally set to 120MHz (2x60MHz). It should be 19.3fps on 133MHz (2x66MHz). BTW I have benchmarked it using both S1R3 and S0R5 CPUs, EDO and FP memory, though the performance is mostly FPU limited:

S1R3 and EDO: 19.3fps
S1R3 and FP: 19.0fps
S0R5 and EDO: 19.0fps
S0R5 and FP: 18.7fps

The timings were set the same in the BIOS, so it's purely EDO vs. FP. Speedsys is more positive though.

EDO read: 78.58 MB/s
EDO move: 26.23 MB/s

FP read: 63.83 MB/s
FP move: 24.36 MB/s

+23% on reads is a decent improvement.
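
For reference, the percentages can be worked out from the figures above with a few lines of Python. This is only a sketch: the dictionary layout and the `gain_percent` helper are made up for the example, while the numbers are the Speedsys and Quake results quoted in this post.

```python
# Speedsys throughput in MB/s and Quake timedemo fps, copied from the post.
speedsys = {
    "read": {"EDO": 78.58, "FP": 63.83},
    "move": {"EDO": 26.23, "FP": 24.36},
}
quake_fps = {
    "S1R3": {"EDO": 19.3, "FP": 19.0},
    "S0R5": {"EDO": 19.0, "FP": 18.7},
}


def gain_percent(edo, fp):
    """Percentage by which the EDO figure exceeds the FP figure."""
    return (edo / fp - 1.0) * 100.0


for test, result in speedsys.items():
    print(f"Speedsys {test}: {gain_percent(result['EDO'], result['FP']):+.1f}%")
for cpu, result in quake_fps.items():
    print(f"Quake {cpu}: {gain_percent(result['EDO'], result['FP']):+.1f}%")
```

The read gain comes out at roughly 23%, the move gain at under 8%, and the Quake gain at under 2%, which fits the remark that the game is mostly FPU limited.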

My Active Sales on CPU-World

Reply 242 of 747, by feipoa

Rank l33t++

Coincidence in timing I guess, but a few days ago I also tried EDO in my IBM 5x86c-133 system and EDO couldn't cope with the timings I use with FPM. I suspect the 19.1 fps vs. 19.8 fps discrepancy is related to what I had loaded in config.sys and autoexec.bat at the time of testing. I should retest with those files bypassed.

Branch prediction works with S0R5 in DOS, so the results should be the same as S1R3.

I'm curious about the large difference in Speedsys memory read speeds. Do cachechk and Sandra99's memory benchmark in Windows 9x show the same percent improvement? Also, CPUmark99 is pretty sensitive to these changes, so you might check that as well.

Plan your life wisely, you'll be dead before you know it.

Reply 243 of 747, by ph4nt0m

Rank Member
feipoa wrote on 2020-03-24, 20:35:

Branch prediction works with S0R5 in DOS, so the results should be the same as S1R3.

It seems it's more about 16-bit vs. 32-bit code than DOS vs. Windows. Anyway, I always set up S0R5 with BTB off and BWRT on, S1R3 with BTB on and BWRT off. LOOP is off for both because it hangs up Windows and doesn't seem important in general. All other optimisations are enabled.

Reply 244 of 747, by kool kitty89

Rank Member

I've discovered Tomb Raider might be an interesting 486 benchmark, albeit not as convenient as games with built-in timedemos. I'm not sure if there's any way to enable an in-game frame counter, otherwise it'd be a matter of using video capture frame counting. (I think that's the method Phil uses for Wing Commander III or IV)

Unlike Wing Commander III's engine or Descent's, Tomb Raider does make use of the FPU, but not like Quake. I actually got it to launch and enter the game on a TI DLC-40, but it was well under 1 FPS and horribly laggy (though sound didn't stutter like in Quake), so it must default to some sort of software floating point handler that's horrifically slow rather than using integer math. It was vastly faster with a IIT 387 installed and actually playable at minimum details (tiny screen), so it's probably using the FPU for vertex math and doing the rendering all on the ALU side.

If I'm right about the renderer working that way, it's way better suited to 486 and 5x86 class CPUs and Cyrix's 6x86 (or the K6 family for that matter) than Quake-like renderers pipelining their rendering through the FPU itself and using floating point for additional perspective correction on span segments.

I'd previously suspected Tomb Raider was a pure ALU based game (including based on an anecdote about someone running it, poorly, on a 386 back in the mid 90s), but it seems more like an unusual case of a 486DX optimized engine also banking on fast FPU performance for the breadth of Pentium and 586/686 class CPUs. That and possibly wanting to avoid 16-bit precision vertex math and taking note of 486 FPU multiply and divide execution times being similar to or faster than 32-bit ALU math (though there's a hit for add/subtract being slower in FP) and some 486/586/686 class CPUs supporting parallel execution on FPU and ALU via an instruction FIFO or pipeline, so there could be some ALU/FPU parallelism when FPU operations were kept relatively light.

I'm still not that surprised it turned out to be that sort of hybrid engine given I'd previously tried playing it on every Socket 5/7 CPU I could get my hands on and found the Cyrix and AMD offerings as fast or faster than Intel's at similar clock speeds. It probably likes really fast 486-like CPU performance in general, maybe an odd case for the Winchip and C3 to do well with.

That's in huge contrast to the Pentium FPU, which does multiplication significantly faster than the ALU and features many FPU instructions that can execute in parallel with each other, making things quite lop-sided and making Pentium-quirk-optimized rendering do the opposite sort of weighting as 486 ... or pretty much any Pentium competitor and maybe even the Pentium Pro. (I forget the execution time difference there, but 32-bit integer performance might exceed FPU performance. I know reviews from the era spoke about how its FPU was less impressive than integer performance compared to high-end RISC processors of the time, and peak multiply throughput is actually slower than the P5x FPU's.)

And even the AMD K5 with relatively fast (and pipelined with internal parallel execution and fast multiply and divide) FPU still benefitted more from relying on its very fast ALU, especially in the later bug-fixed revisions with all 5 integer units working properly. (though even the early K5 seems to do well in ALU integer tests, more DSP-like than the 6x86 or Pentium or K6 for that matter)

I've got a 120 MHz (3x40) Cyrix (ST) 486DX4 set-up working currently and Tomb Raider runs pretty decently with that, even up to the full screen size at 320x200, though it's better a bit smaller (and I prefer tweaking the aspect ratio closer to square pixel anyway, stretching the screen wider). 640x480 isn't playable other than down in a 320x240 window, and running at the max perspective correct detail setting hurts a bit, but each of the 3 texture detail settings seems to be about as much of a performance hit as 1 screen size step.

It's in an Opti 495SX 386/486 VLB board (using a CL5422 1MB ISA VGA card currently), 256kB of 20 ns write-through cache with 2-2-2-2 read timing, 32 MB 60 ns FPM DRAM (30 pin SIMMs) with zero wait states, Vibra 16S, and a generic multi-IO card for floppy and IDE, though I was using a motorola based floppy/IDE card a week or so ago. Also using a 1/4 ISA divider since everything seems happy there and the 170 MB quantum hard drive seems happier at 10 than 8 MHz for some reason, or just doesn't like specific dividers on this board. (It's OK at 40/5 at least, but it's happy at 40/10 without errors and several pulls and win98SE scandisk checks; I actually got more HDD errors caused by playing with unstable cache timings or unstable DRAM at 50 MHz FSB operation on a DX66 and DX80)

And yes, that should be a 5V only board and a 3.45V rated CPU, but it seems to be happy enough for now and cool enough with a 486 heatsink+fan installed. (I had a makeshift fan screwed onto generic heat sink stuck-on with sticky thermal paste, but switched to a new old stock 486 cooler ... I think the makeshift one has better heatsink/fin design though and maybe better surface contact, short of getting bumped loose)

I got a handful of ST DX4-100s from a Chinese eBay seller a few months ago in the $8 range, so I'm a bit more comfortable playing around with overvolting these than some others. The board's components don't seem stressed either, though it's only officially rated for a DX2-66 or DX-50, but it's from 1993 and nothing faster existed.

I do also have an actual 5V rated ST DX2-80 from a junk/scrap lot that had mangled pins I've straightened, and I'm a bit confused about where Cyrix and ST deviated from voltage ratings (most/all Cyrix DX2-66 and 80 CPUs seem to be 3.4V parts, or those ones with 3.3/3.6/4.0V marked on them), and it gets confusing as to when and why different ratings were used (i.e. heat, compatibility, actual silicon/transistor/diode tolerances) and what process those used vs the SGST made DX4 and 5x86s.

The packaging used on the SGST DX4s and Cyrix 5x86s also looks similar or near identical, but then the ceramic packaged IBM 5x86s look very similar, too. I wonder if there's any identification guide to sort out Cyrix 5x86s made by SGS vs IBM. (the IBM produced Cyrix 486DX-2s are also very similar, including the Blue Lightning DX2s)

SGS also supposedly kept making 486DX4s well into the late 90s or beyond, long after Cyrix and IBM had dropped them (though IBM kept making QFP 5x86Cs for a long time, too), but they may have continued using the same .65 micron (I think) process and the same masks, just a high yield part on a mature process with no investment in new masks or revisions. That should be the same process used for the 5x86, at least the SGS built ones, and if these 486s tolerate 5V for long periods, it might say the same about 5x86s. (but might not apply to IBM's manufacturing ... and would be worth a lot more testing before using on a rarer or more expensive 5x86, let alone a 133 ... but if a lowly 5x86-100 managed 150 MHz at 5V that might be interesting in itself, or maybe an Intel DX4 at 3x50 MHz)

If nothing else, I'd expect both IBM and Cyrix .65 micron parts to be more tolerant of overvolting than AMD's .35 micron 5x86 and DX4-100 (I think all DX4-100s with 16kB cache are that same process, just down-binned DX5s, unlike the older 8kB DX4-100 and 120 which were .5 micron, I think)

The die is also much larger vs the AMDX5 which heats up incredibly fast in a very small area around the die, not letting the ceramic package soak up the heat. (I found that out doing a finger-test when basic POST function testing CPUs without heatsinks ... the ST DX4 could run for a while without getting painfully hot or apparently indefinitely with a 90 mm fan blowing on it, and similar for an AMD DX2-66, but not the DX5 ... which I also didn't want plugged into a 5V board for more than a few seconds given they're more explicitly known for burning up if actually used at 5 volts for any length of time)

Also oddly, the current DX4 that's happy at 120 MHz won't even post at 150, but another that does 150 (and will boot DOS) keeps throwing protection errors (and crashes X-Wing) at any setting I've tried down to 25 MHz FSB and L2 cache disabled. (didn't bother with L1 disabling as that kind of defeats the purpose)

But that same CPU runs fine at the 2x multiplier setting on an SiS 496 based PCI board at 33, 40, and 50 MHz (though my Virge 325 didn't like 50 MHz in mode 13h ... palette errors and then crash) and did basic function testing OK at 2x 60 MHz, but that needs further testing. (that particular DX4 is from a different lot and has a different lot/date code on it, but I'll have to look that up for specifics ... or actually remove the heatsink that's on it right now) I'm thinking it may be more of a chipset issue than a voltage one, though I haven't tried setting the 496 board to 5V yet.

I haven't actually worked out the voltage select jumpers at all on that board. It's an Acorp SiS 496/497 board with 4 72 pin SIMM slots, 3 PCI, and 3 16-bit ISA, but Stason doesn't seem to have anything Acorp listed and I haven't sorted through all their unbranded 486 boards to match it, yet. (it appears to be set to 3.3~3.6V as it is, given how much cooler the CPUs run and that some 5V CPUs won't POST at all, though a couple Intel DX33s seem to run OK oddly enough, and a DX50 POSTed at 60, which seems even stranger; I considered probing the socket with a wire and multimeter, but it seemed like a bad idea at the time)

Tomb Raider is way more playable than Quake on that DX-120 at any screen size and doesn't have the audio stuttering issue or as much jerkiness from higher to lower framerate (getting closer to flat walls makes the framerate jump in Quake, and only big, open, detailed rooms/chambers tend to hurt Tomb Raider, but still not as badly ... plus the smoother, articulated model animation still shows). You get plenty of sub-pixel polygon seam errors and Z-sorting errors in Tomb Raider, but that's no different on other systems or the PlayStation (the Saturn mostly avoids that as do most accelerated DOS versions).

It's also a 1996 vintage 'killer app' sort of title like Quake, so it's interesting for comparison. Terminal Velocity might be another, but I think that may be ALU-only. I should try the shareware version of that with a 486DLC or SX. (it'd be nice to have an SX2-66, but those seem to be super rare)

Reply 245 of 747, by kool kitty89

Rank Member

I might as well include a few benchmark results, too, though not relevant to the non-overclocked topic of this thread, and not in a particularly fast 486 board anyway, but for reference:

the above Opti 495SX
40 MHz FSB x3 (120 MHz)
256 kB write-through cache at 2-2-2-2 read, zero wait cache write,
zero wait state DRAM settings, 32MB 60 ns (8x 4MB 30pin SIMMs)
ISA bus 40/4 MHz (10 MHz)

X-Wing detect.exe = 20 ticks best, 21 ticks average

Speedsys 478 = 40.06
L1 cache = 91.05 MB/s read, 30.11 MB/s write, 38.02 MB/s move
L2 cache = 68.14 MB/s read, 30.05 MB/s write, 24.58 MB/s move
RAM = 11.78 MB/s read, 30.46 MB/s write, 9.07 MB/s move

PC Player 320x200 = 11.2 fps
Doom = 2134 gameticks 3696 realticks

Quake 320x200 = 128.8s 7.5fps

Landmark 6.0 =
513.5MHz AT
938.48 MHz 287
4818.82 chr/ms

CPU Ident. Utility 1.25 = 120.1 MHz, 40 MHz bus
model= 1Fh
stepping = 36h

Norton System Information 8.0
119.1

chris's bench = 11.2

I forgot to record the 3Dbench scores, but I remember V1.0 was 50.0. (I think that's frame limited)
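
Since the Doom result above is given as raw gameticks and realticks rather than fps, here is a small Python sketch of the usual conversion. It assumes Doom's standard 35 tics per second game clock; the function names are made up for this example.

```python
DOOM_TICS_PER_SECOND = 35  # assumption: the standard Doom game clock


def doom_fps(gameticks, realticks):
    """Average fps of a Doom timedemo from its gameticks/realticks output."""
    return DOOM_TICS_PER_SECOND * gameticks / realticks


def implied_frames(seconds, fps):
    """Frame count implied by a run time and an average fps figure."""
    return seconds * fps


# Figures from the benchmark list above.
print(f"Doom: {doom_fps(2134, 3696):.1f} fps")
print(f"Quake: about {implied_frames(128.8, 7.5):.0f} frames in 128.8 s")
```

With the numbers listed, the Doom run works out to a little over 20 fps. The Quake line is only a sanity check that the reported time and fps agree with each other.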

Reply 246 of 747, by ph4nt0m

Rank Member
kool kitty89 wrote on 2020-03-25, 05:24:

That's in huge contrast to the Pentium FPU, which does multiplication significantly faster than the ALU and features many FPU instructions that can execute in parallel with each other, making things quite lop-sided and making Pentium-quirk-optimized rendering do the opposite sort of weighting as 486 ... or pretty much any Pentium competitor and maybe even the Pentium Pro. (I forget the execution time difference there, but 32-bit integer performance might exceed FPU performance. I know reviews from the era spoke about how its FPU was less impressive than integer performance compared to high-end RISC processors of the time, and peak multiply throughput is actually slower than the P5x FPU's.)

The only FP instruction the Pentium FPU can execute in parallel is FXCH. It's the actual execution latencies and pipelining that make this FPU shine.

Pentium:
FLD = 1 clock
FADD = 3 clocks
FMUL = 3 clocks
FDIV = 39 clocks

486:
FLD = 3 clocks
FADD = 8 to 20 clocks
FMUL = 14 clocks
FDIV = 73 clocks

In addition, Pentium can start execution of FADD and FMUL every clock while 486 has to wait for retirement of the previous instruction. This is why 486 couldn't outperform Pentium on FP even at 2x clock speed.
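
A toy model of that difference, as a Python sketch. It assumes a run of independent operations with no data dependencies and ignores loads, stores and FXCH. The latencies and the one-per-clock issue rate are the figures given in this post, not independently verified, and the function names are made up for illustration.

```python
def cycles_pipelined(n_ops, latency, issue_interval=1):
    """Clocks for n independent ops when a new one can start every issue_interval."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


def cycles_serial(n_ops, latency):
    """Clocks for n ops when each must finish before the next can start."""
    return n_ops * latency


N = 100  # arbitrary number of back-to-back FMULs
pentium = cycles_pipelined(N, latency=3)  # FMUL = 3 clocks, pipelined
i486 = cycles_serial(N, latency=14)       # FMUL = 14 clocks, not pipelined

print(f"Pentium: {pentium} clocks, 486: {i486} clocks")
print(f"486 needs {i486 / pentium:.1f}x as many clocks")
```

In this idealised case the 486 needs over 13 times as many clocks, so doubling its clock speed closes only a small part of the gap. Real code has dependencies between operations, which shrinks the Pentium's advantage toward the plain latency ratio.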

kool kitty89 wrote on 2020-03-25, 05:24:

And even the AMD K5 with relatively fast (and pipelined with internal parallel execution and fast multiply and divide) FPU still benefitted more from relying on its very fast ALU, especially in the later bug-fixed revisions with all 5 integer units working properly. (though even the early K5 seems to do well in ALU integer tests, more DSP-like than the 6x86 or Pentium or K6 for that matter)

FPUs of K5, K6, Cyrix and IDT were not pipelined. The K5 does not have 5 integer units: it has two ALUs, one FPU, one branch execution unit, and two load/store units sharing the same execution path, which bottlenecked the whole architecture.

kool kitty89 wrote on 2020-03-25, 05:24:

The packaging used on the SGST DX4s and Cyrix 5x86s also looks similar or near identical, but then the ceramic packaged IBM 5x86s look very similar, too. I wonder if there's any identification guide to sort out Cyrix 5x86s made by SGS vs IBM. (the IBM produced Cyrix 486DX-2s are also very similar, including the Blue Lightning DX2s)

ST didn't produce any 5x86. All were made by IBM. Even those sold under the brand of ST.

Reply 247 of 747, by kool kitty89

Rank Member

I'm seeing conflicting info about the K5, but multiple articles mentioning the 5 integer units, though I suppose they could be repeating an error originating here:
http://209.68.14.80/ref/cpu/fam/g5K5-c.html

It was my impression that 3 of the integer units (which might not be full ALUs) were disabled due to bugs in the early revisions of the K5 and that was part of the main difference with performance gained in the later versions. (this referring to the architecture of the native RISC core, not the x86 translation portion)

I forget if it was in an article or AMD's own literature that mentioned the K5 FPU pipeline, but I seem to recall it being in AMD's own documentation.

I forget the latency listed, but I remember peak fmul throughput was single clock tick. (I remember it being higher latency than the pentium's 3/1 latency/throughput, maybe 7/1)

The term pipeline might also be used loosely and might be applied to a prefetch queue like Cyrix's 4-stage FIFO. But in any case, I thought the K5 FPU was generally more competitive with Pentium-optimized FPU exploits/hacks, or at least in terms of Quake performance.

I don't recall the performance of fxch on the K7, but that seems to be the major weak point of non-Pentium FPUs (prior to the K7 and maybe Winchip 2) when it comes to Quake-type engines, or anything involving lots of register swapping and/or int/float translation. (there may be some other operations involved in int/float translation that are also slower on other FPUs)

fxch performance being the bottleneck also made sense to me given the relatively similar clock for clock performance of the K6 and 6x86 MX/MII in Quake and Quake II software rendering when the K6 FPU itself is significantly faster in most respects and in most benchmarks.

Albeit, the K6 may lack even an FPU FIFO prefetch/execution queue, in which case there'd be much less parallelism between ALU and FPU operations. (or have a queue shorter than the Cyrix one)

And did SGS-Thomson actually make 6x86s, or were the ST branded 6x86s also IBM manufactured?

Reply 248 of 747, by kool kitty89

Rank Member

On a more 486 (or Socket 3) relevant note: I'm not sure if it's already come up, but according to IBM's literature on the 5x86C, several Socket 3 chipsets support linear burst cache functionality.

See:
http://datasheets.chipdb.org/IBM/x86/
http://datasheets.chipdb.org/IBM/x86/5x86/40034.PDF

There's:

ALI M1489/87
OPTI 82C465MVB
UMC 8880

Symphony (Wagner) 491/492

PicoPower (Sequoia)
868/818 486,
668/618 486,
768/718

It lists the SiS 496/497 chipset as being slated for Linear Burst support in a later revision, but I'm not sure that ever happened. (or if there's a different model designation for it, or just a certain serial number or production code range)

Reply 249 of 747, by ph4nt0m

Rank Member

Early K5 aka 5k86 SSA/5 has no branch prediction. It was late to the market and AMD had to deliver anything Pentium class that worked. Engineering samples in mass production basically. It would be better if they just made a stripped down version of K5 for 486 sockets, but their 5x86 was making good sales and they didn't want internal competition.

K7 has a completely redesigned FPU, much better than anything Intel had at that time. It wasn't until Core 2 that Intel became the leader in FP again.

Cyrix 486, 5x86 and 6x86 basically share the same FPU design. In fact, nothing changed for Cyrix in this area since their 8087, 80287 and 80387 coprocessors, which were excellent back in the day. They kept the FPU transistor budget low because integer performance was considered more important. When they finally had resources available for a better FPU in the 6x86MX era, they took the easier route of quadrupling the cache. Cyrix was losing market share and sold itself to National. The new owners only wanted MediaGX and didn't care much for the 6x86 line. It wasn't RISC and didn't scale well anyway. The remaining Cyrix engineers nearly finished the next gen M3 Jalapeño core, which was very much P6 class: out-of-order execution, register renaming, integrated 256Kb L2 cache and Rambus memory controller, dual issue pipelined FPU with MMX and 3DNow!. They had it done for the most part by the end of 1998 and took another year to integrate a 3D core. Then National felt they had had enough and sold all of Cyrix except MediaGX to VIA, and Jalapeño was simply cancelled.

Reply 250 of 747, by ph4nt0m

Rank Member
kool kitty89 wrote on 2020-03-28, 09:29:

And did SGS-Thomson actually make 6x86s, or were the ST branded 6x86s also IBM manufactured?

Both. The following are ST branded 6x86 PR166. One made by IBM, another original.

Attachments

Reply 251 of 747, by kool kitty89

Rank Member

So the ST 6x86 is another thing Red Hill's museum got wrong.

On the 5x86 manufacturing end: if IBM was manufacturing all the Cx5x86 cores, what's the deal with (apparently) only the 5x86C revisions being IBM branded and, for that matter, there being no Cyrix branded 5x86C models? (or are late-model Cyrix 5x86s the same silicon, mask revision, etc, just marked and graded differently?)

It doesn't seem like IBM was just grading and marking their parts more conservatively given the actual chip functionality changed. Or am I mistaken and are IBM's 5x86 75 MHz parts equivalent to (at least some) 100 MHz Cyrix branded ones?

And for that matter, IBM started selling Cyrix based CPUs with their own IBM label on them starting with the 486DX2 (the Blue Lightning DX2 CPUs). So were those all IBM manufactured and were any of the ST branded 486s actually IBM produced?

I assume the 5V rated DX2 66 and 80 chips by ST were their own, but what about the DX4? Wikipedia lists that as being manufactured by SGST for a long time, but was it just sold through ST and just continued IBM manufacturing sort of like IBM's long-term 5x86 production?

Incidentally, I tried the 5V rated DX2-80 in that same Opti board and it wouldn't get past POST at 2x50 where I had a BL DX2-66 running at 2x50, apparently stable, but also open-bench 90 mm fan cooled (over both a heatsink and lots of airflow over the cache chips). The board is finicky at 50 MHz anyway, and I've had trouble with an Intel DX50 in it (but the bent pins might not make good enough contact in a non-ZIF socket and I'll probably save that thing for others). Besides that, with low wait states at 40 MHz, most performance is better there anyway. (Memory bandwidth is mediocre even at 'zero' wait state DRAM configurations, but they do work at 40 MHz and gave generally better benchmarks at 2x40 than 2x50 with that BL.)

It seems happy with that ST DX4 at 3x40, and an Avance Logic VESA card, too. It might do better at 33 MHz FSB and faster VLB settings, but I haven't messed with the video card's jumpers or the board's VLB selection at all. It's doing 3DBench at 55.5 FPS now. (And I was mistaken earlier: with the CL5422 it does 41.6 FPS in 3DBench 1.0 at 10 MHz ISA bus and 43.4 at 13.3 MHz; the IDE drive seems fine at that clock rate now, too.)
A CL5420 in low wait state jumper mode does 43.4 at 10 MHz and crashes or gives lots of screen glitches at 13.3 MHz. (At 13.3 MHz with wait states on it does 43.4 as well.)

PC Player was up to 12.5 FPS with the AL VESA card.

Quake at 320x200 was stuck at 7.7 FPS pretty much across the board, 7.6 worst-case with my ATi Graphics Ultra in 8-bit mode, 7.8 FPS best case with the low WS 10 MHz 5420 (I think I skipped that test on the VLB card). Doom varied a lot more, but the 8-bit mode performance loss was less dramatic than in 3DBench or PC Player, which were pretty much cut in half. (That makes sense since Doom does VGA RAM/register writes with 8-bit pixels or 2-pixel spans in low detail mode, while the others seem to be running in mode 13h and are bottlenecked by the block-copy rate. That doesn't say much for the VLB performance of this board, though ... and PCPlayer isn't 16-bit real-mode like 3DBench, so it should be doing 32-bit copy operations. Or maybe it also says the board has fairly fast ISA bus optimization and maybe DMA is being used, so ISA copies get buffered through chipset FIFOs and VLB might be limited to PIO and/or DMA is unbuffered and halts the CPU for entire transfers where ISA DMA allows more parallelism ... or I've just got unnecessary wait states enabled on the VLB card and/or board.)

Anyway, this board is more of a 'fastest 386' sort of category (or would be if I could work out how to set the clock synthesizer to 50 MHz), or fast 486DLC. (I'll have to try to put a DLC40 and 487DLC40 or Fasmath together for some Pentium era benchmarks for that) My DLC40 seems to go fine, though I wouldn't expect it to actually do 50 MHz like some AMD DX40s can given the reputation for Cyrix chips running close to their limits.

That's probably what the OPTi 486SX (or DLX or similar variations) is best at: fast 386-socket boards with full 486DLC support and 486SX/DX upgrade options. Maybe a bit like some of the ASRock boards oriented around flexible forward compatibility and legacy support: not great price to performance for a clean/new system build, but good for matching up mixed used (or cheap late-model old gen) parts and having flexibility for expansion. More so since a lot of slower rated, later production PGA Intel 386s as well as AMD's tended to overclock well, especially with a heatsink added. (like cases of DX20s used at 40 MHz; I've seen some anecdotes of PC shops setting up systems like that rather than end-user tweaks)

For AMD, I honestly thought a better move might have been to hedge their bets with a Socket 5 AM486 derived core, likely with a lot more L1 cache, sort of like an earlier WinChip. I think most engineering would be a new I/O block for the 64-bit wide P5 bus. Potentially even earlier than the 1995 launch of the .35 micron process x5 5x86. (a beefed-up .5 micron 486 derivative should've still used a lot less silicon than the .5 micron K5s)

That and pushing for voltage specs up to 4.0V. (ie getting good chipset and board vendor support for that) The AM486 core should've been low enough power to justify that where higher voltage on the K5 and 6x86 would have made them too exotic for the time. (PSU and board-level power concerns, plus the thermal issues)

I know NexGen resorted to 4.0V Vcore, but the Nx586 was both a smaller, less power hungry chip, and exclusively used on a proprietary board architecture.

As for the Cyrix FPU, I'd gotten a similar impression of the coprocessor core itself changing little or not at all (aside from refinements/tweaks specific to process changes) since the 386 or 486DLC days, but the I/O interface with the CPU core is where the changes were. (reducing latency and bandwidth bottlenecks; and with Cyrix's FPU already being fast that alone would've made a bigger impact than Intel or AMD's offerings: I think the AM486 had the more sluggish of the FPUs there too, depending on what operations and tests you compare, and I'm not sure the AM5x86 revision changed that much or at all, though I know Intel's DX4 had significant performance differences from the DX2, at least on the ALU end)

I'm not sure where the FIFO came in and if that was present from the first Cyrix 486DXs or not, but I could imagine that being one of the enhanced features added to the 5x86 core among the other tweaks/enhanced functionality, some of which were disabled and buggy, except on the 5x86C where they can usually be enabled without problems.

I do wonder how well the FPU scales or overclocks compared to the CPU cores of various 6x86 (M1 and M2) cores, especially given the FPU block took up less chip space and should have generated a minority of heat/power draw compared to the 486 and 5x86 (or Media GX). That and the MMX execution unit, which shares FPU registers anyway, and was kept relatively skimpy as well die-space and transistor budget wise, I think.

From my own experience, many of the Cyrix chips actually overclock quite well, sometimes at stock voltage (especially the 2.9V 250 nm IBM and NS chips), but get excessively hot compared to most or all other CPUs of their era when pushed to their max useable speeds. (though cooling sufficient for 3.2V K6 233s should work well with contemporary 6x86 chips)

So assuming the FPU itself overclocks at least as well as the chip as a whole, they could've used different clock multipliers for the FPU and CPU and kept the core clock lower and relatively cool.

In 1995 and most of 1996 FPU performance would've been pretty marginal to target for mass market competition, and aside from regaining their former high-end FPU glory in the workstation market, Cyrix had little reason to bother with that. (It was plenty fast for spreadsheet and office app calculation use and had the Pentium FDIV bug PR mess to lean on as well.)

But just slightly later, that PR-rating system would've scaled a lot better that way from 1996 onward (ie once games and multimedia applications became genuinely FPU-hungry and/or Pentium-specific FPU optimized) and more competitive with the Pentium II and Celeron. And honestly, beefing up raw FPU performance (one way or another) would've been more useful for the average end-user in 1996-1999 than having MMX or 3DNow! support at all.

That goes for the final revisions of the M1 6x86L as well. I'd think it would've made a good budget gaming capable CPU for 1997 with an overclocked FPU and voltage bump. They could've started speed-grading individual chips with overclockable FPUs and more conservatively rated CPU cores and potentially even gotten better yields by splitting them into Gaming+Multimedia and Business class variants.
(Use pads on the package for surface-mounted jumpers to hard-code the multiplier set-up at the factory, or just use jumper selection via the existing 2 select bits, or add a third and re-use the existing PLL/clock generator logic on-chip. If you stuck with just 2 bits you could have 2x and 3x CPU core settings, each with a synchronous and a 'fast' FPU setting.
So fast FPU would be: 2x CPU + 3x FPU or 3x CPU + 4x FPU.)
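
To make the 2-bit idea concrete, here's a rough sketch of how the four settings could map to clocks. The bit encoding and the `split_clocks` helper are purely my own illustration (no real Cyrix part has a split CPU/FPU multiplier), and the bus speeds are just examples.

```python
# Hypothetical 2-bit multiplier select for a split CPU/FPU clock.
# The bit-to-mode encoding is an illustrative assumption, not a
# documented Cyrix pin-out.
MULTIPLIER_MODES = {
    0b00: (2, 2),  # 2x CPU, synchronous FPU
    0b01: (3, 3),  # 3x CPU, synchronous FPU
    0b10: (2, 3),  # 2x CPU, 'fast' 3x FPU
    0b11: (3, 4),  # 3x CPU, 'fast' 4x FPU
}


def split_clocks(bus_mhz, select_bits):
    """Return (cpu_mhz, fpu_mhz) for a bus clock and a 2-bit select value."""
    cpu_mult, fpu_mult = MULTIPLIER_MODES[select_bits]
    return bus_mhz * cpu_mult, bus_mhz * fpu_mult


if __name__ == "__main__":
    for bus in (66.0, 75.0):
        for bits in sorted(MULTIPLIER_MODES):
            cpu, fpu = split_clocks(bus, bits)
            print(f"{bus:.1f} MHz bus, select {bits:02b}: "
                  f"CPU {cpu:.1f} MHz, FPU {fpu:.1f} MHz")
```

So on a 75 MHz bus the 'fast' 2x setting would leave the core at 150 MHz while the FPU runs at 225 MHz, assuming the FPU block actually tolerates that.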

It also took an oddly long time to get 100 MHz FSB parts out. I'm not sure if that (and the 2.9V rating) was aimed at compatibility with older and lower-end boards of the time or if there are more stability issues than I've experienced with my P5AB.

(I've got a couple MVP3 boards I can try now that might shed light on this, and I did a bit on a Shuttle HOT-591P which didn't want to even POST below 2.9V on my PR366 at any speed I tried, but seemed to work at 300 MHz at 2.9V for the bit I tried; that board had some corrosion on it though and some other oddities, like the voltage selector jumpers replaced by a single jumper block that toggled between 2.0 and 3.5V only, and no voltage warnings in POST or BIOS readings ... worse given it came with a K6-2 installed and 3.5V seems to kill those pretty much instantly, probably depending on the exact board and VRM configuration)

The P5A I have also tends to undervolt 250 nm Cyrix chips well (and K6-2s and P55Cs): it will POST at stock 250 MHz down to 2.0V, but isn't stable enough to boot Windows (or maybe it would at ~40C, like a cold boot, but then would obviously crash quickly as it warms up ... or it needs a beefy, out-of-era cooler). In any case it does work at 2.2V in that same board, but probably needs to stay cooler than the rated temps.

(it eventually freezes or crashes in my AT case when there was no exhaust fan other than the 2001-era ATX PSU's, and a wimpy and very choked intake fan, when the case was closed up, but never did that with the case open, though I've had similar issues running K6-2s or 2/3+s at stock speeds under those conditions ; I had mostly flawless results a few years back when I was doing all my tweaking tests in a makeshift test-bench or coverless case and some tests with a propped up 90 mm fan blowing on a fanless heatsink)

And the M2 core might not have scaled well, but it at least had decent real-world performance at the scaling and yields it did manage ... compared to the Centaur C3 that VIA chose over it. (or chose over the enhanced, sort of K6-III-like Jalapeno or Joshua core, but that also took up more die space where the 180 nm plain old MII-400/433 was slightly smaller than the 180 nm C3 and seems to be about 2x the speed per-clock in the overall 6x86 testing ... but even further ahead in FPU-bound stuff) So I'd have thought the genuinely cheap option would just be keeping the SS7 based M2 core in production, and do simple die shrinks for late-gen embedded use. The Winchip 2 also did better per clock, but not as well as the M2 and also didn't scale as high in clock speeds.

(I suppose the C3 still had advantages in super low power, notebooks and thin clients like the Media GX/Geode was used for ...the C3 maybe did the performance per watt thing better in VIA S370 based systems than an M2 would in an MVP4 based integrated board, but I'm not sure how chipset power consumption compares, even if the MVP board omitted the board-level cache)

But this is way far afield of the 486 and 5x86 era topic now. (though I suppose that poorly ventilated AT case observation is worth noting for 486 era stuff, too: fanless baby AT and XT clone cases ... also carpeted floor mounted, not desktop mounted, but that's how Dad set-up all of our PCs back then too: under-desk tower/mini-tower, or a horizontal case tucked way under desk drawer areas, or inside TV cabinets, though he did other cooling mods and fan additions: and started cutting away all the fan cages/guards on PSUs and case exhaust ports by the time it came to Athlon XP era rigs)

Actually that old ATX PSU in the AT case has its fan guard cut away, but airflow still wasn't good enough. (or it probably pulls too much through other cracks and across the drive bay rather than over the board)

Reply 252 of 747, by ph4nt0m

kool kitty89 wrote on 2020-03-31, 04:21:
On the 5x86 manufacturing end: if IBM was manufacturing all the Cx5x86 cores, what's the deal with (apparently) only the 5x86C revisions being IBM branded and for that matter: no Cyrix branded 5x86C models. (or are late-model Cyrix 5x86s the same silicon, mask revision, etc, just marked and graded differently?)

It doesn't seem like IBM was just grading and marking their parts more conservatively given the actual chip functionality changed. Or am I mistaken and are IBM's 5x86 75 MHz parts equivalent to (at least some) 100 MHz Cyrix branded ones?

And for that matter, IBM started selling Cyrix based CPUs with their own IBM label on them starting with the 486DX2 (the Blue Lightning DX2 CPUs). So were those all IBM manufactured and were any of the ST branded 486s actually IBM produced?

I assume the 5V rated DX2 66 and 80 chips by ST were their own, but what about the DX4? Wikipedia lists that as being manufactured by SGST for a long time, but was it just sold through ST and just continued IBM manufacturing sort of like IBM's long-term 5x86 production?

IBM didn't change any functionality of the 5x86 because they worked with the masks only. They could optimise for yields and clock speeds though. They also didn't overclock their branded 5x86 because all of them were 3.3V rated, therefore very few made it to 120MHz and none to 133MHz. ST also didn't offer any 133MHz chips. Cyrix branded 120MHz parts were 3.6V rated and 133MHz ones were either 3.6V or 3.7V rated.

All ST branded 486 chips that I'm aware of were actually produced by ST. Most were assembled in Canada, some in the USA.

My Active Sales on CPU-World

Reply 253 of 747, by kool kitty89

IBM making mask revisions could have impacted fixing buggy/unstable portions of the chips, couldn't it? Especially bugs caused by interconnect routing (too much signal noise or poor power or grounding routing to certain functional blocks).

Apparently one of the major bugs in the production version of the Atari Jaguar's GPU ASIC was caused by weak power distribution. (the 32-bit RISC MPU inside it is unable to execute code from external RAM because of it and needs to page code blocks into its scratchpad instead: there's also a software workaround for that discovered in the homebrew community 10+ years ago, but Atari/Flair and any of the game programmers of the time didn't work that out, including John Carmack who did a fair bit of other advanced work-arounds and in-house tool-set building)

AFAIK, overvolting that ASIC doesn't solve the problem either, but I'm not aware of anyone who's tried. (like running it at ~5.25V instead of 5.0V, like some P5 boards support, and there are simple linear voltage regulators with 5.2V output ... I think some use a 7852 part number; though 6V models are probably more common) I believe all of Atari's (preproduction and production) Jaguar chips use a 0.8 micron CMOS standard cell process done by Toshiba or Motorola. (and I think IBM manufactured the boards themselves)

I'd assume similar sorts of problems would be possible on standalone CPU cores as well. (it's the sort of thing that usually demands a new mask revision before mass production takes place, but sometimes bugs slip through or timing is too tight and forces an early mask revision to be put into mass production)

Which of course could also mean the 'buggy' chips could still have a few examples that work with all features, especially if underclocked and/or overvolted and/or cooled well.

Has anyone tried a Cyrix 5x86 100 (P75) at 2x40 or 3x25 to see if it works as well as an IBM 5x86C for enabling the enhanced FPU features? (or 2x33 for that matter)

Reply 254 of 747, by kool kitty89

Well, I overlooked what should have been an obvious jumper on the CL5422, and it sped it up beyond CL5420 speeds, but my attempt to compare them 'maxed out' in a S7 board didn't pan out, at least not this SiS 5571 based Mtech Mustang (I was also mistaken about ISA clock speeds, the fastest it goes is 1/3 PCI clock, so 13.89 MHz for 83.3 MHz FSB, that is unless I was mis-remembering and another 486 board I was testing had a 1/2 divider).
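
For reference, the ISA clock arithmetic above works out like this. This is just a quick sketch: the `isa_clock_mhz` name is made up, and the PCI = FSB/2 divider is my assumption for this board (the 1/3 PCI-to-ISA divider is the fastest setting mentioned above).

```python
def isa_clock_mhz(fsb_mhz, pci_divider=2, isa_divider=3):
    """Derive the ISA clock from the FSB via the PCI clock.

    Assumes PCI = FSB / pci_divider (2 is an assumption for this board)
    and ISA = PCI / isa_divider (1/3 being the fastest divider available).
    """
    return fsb_mhz / pci_divider / isa_divider


if __name__ == "__main__":
    # Example FSB settings; 83.33 MHz is the overclocked case discussed above.
    for fsb in (66.67, 75.0, 83.33):
        print(f"FSB {fsb:.2f} MHz -> ISA {isa_clock_mhz(fsb):.2f} MHz")
```

That gives roughly 11.1 MHz ISA at a 66 MHz FSB and 13.89 MHz at 83.3 MHz.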

Anyway, unlike the 5420, the 5422 seems to run fine at 13.3 or 13.9 MHz ISA clock with the waitstate jumper closed and is a bit faster than the 5420 both at 10 and 13+ MHz.

But the weird thing is that I was getting lower scores for Doom and 3DBench with a Pentium 133 and 11.1 MHz ISA clock than the 486 120 at 10 MHz, and even 150 MHz P54c didn't catch up there. At 2x 83 MHz it was able to slightly beat the Doom and 3Dbench scores, but not by a lot.

PC Player was about 2x as fast and Quake did much better than Doom, but I'm not sure what's going on. For Doom, I guess slow VGA register writes in the S7 board would explain it (vs fairly fast 16-bit block copy), but 3DBench uses mode 13h and should block copy over fine, too.

The ISA bus clock did make a significant difference in performance on the S7 board too, where dropping the clock down didn't hurt much on the 486. (I didn't try both at 8 MHz, but I suspect the Pentium would be too choked to catch up)

It also wasn't just the FSB and ISA clock limiting things as 1.5x 83 did substantially worse than 2x83 on the Pentium.

With a Virge 325 the Pentium was vastly faster in all tests and fast enough that it got the erroneous low 3DBench 1.0 scores and proper high 1.0C score of 122.9 (where with ISA it was slow enough to have similar scores on both tests).

The best 3Dbench score on the P54 2x83 MHz was 47.6 for 1.0 and 46.3 for 1.0c
and the 486 got 45.4 for 1.0 and 45.1 for 1.0c

PC Player 640x480 was 8.8 fps for the Pentium and 4.1 fps for the 486
320x200 was 29.3fps pentium and 11.7 fps 486

Doom high-detail got:
3504 realticks pentium
3298 realticks 486 (at 10 MHz; but at 13.3 MHz it was down to 3190 for some reason; several re-runs at 10 MHz gave similar results with 3291-3298)
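
If anyone wants those realticks as frame rates, the usual timedemo conversion is fps = gametics x 35 / realtics. Here's a small sketch; note the 2134 gametics figure is an assumption on my part (it's the demo3 length commonly used for Doom benchmarking, and the numbers above don't actually say which demo was run), and `timedemo_fps` is just a made-up helper name.

```python
DOOM_TICRATE = 35       # game tics per second in the Doom engine
DEMO3_GAMETICS = 2134   # ASSUMPTION: demo3 timedemo; the demo used is not stated


def timedemo_fps(realtics, gametics=DEMO3_GAMETICS):
    """Convert a Doom timedemo realtics result into average frames per second."""
    return gametics * DOOM_TICRATE / realtics


if __name__ == "__main__":
    results = {
        "Pentium": 3504,
        "486 @ 10 MHz ISA": 3298,
        "486 @ 13.3 MHz ISA": 3190,
    }
    for label, realtics in results.items():
        print(f"{label}: {timedemo_fps(realtics):.2f} fps")
```

Under that assumption the spread is only about two frames per second between the slowest and fastest result, so the gap is small either way.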

The 5571 isn't a super late-gen chipset, but were there 486 era ISA bus optimizations that got stripped down even by that point? Or for that matter, did it happen back during the socket 4 and 5 transition? I haven't compared those boards either.

I don't think it's using a PCI to ISA bridge interface, and I think that usually ends up making things even slower than this. (or my assumptions about that being why ISA video on my Shuttle AV18 is slow enough to see single-character printing in the POST screens are wrong and something else is going on there ... maybe just high ISA bus wait states not user configurable, aimed at compatibility with slow cards rather than performance)

And I didn't feel like opening up my P5A-B rig again and swapping cards around to make room for ISA video to compare that one.

In any case, it seems like OPTi 495SX (and presumably 495DLC) based boards don't lose a whole lot if they're ISA only, but VESA doesn't hurt either. I don't have enough data from other 486, 386, and then later boards to say, but I'm almost thinking this OPTi chipset is quite fast for ISA based stuff in general and I wonder if there are any 386 socket boards that are much faster. (I plan to do a 386DX vs 486DLC vs 486 comparison test at some point, including comparing the Doom results with Phil's previous 386 vs 486 comparisons, all using the same RAM and video card configurations)

Reply 255 of 747, by kool kitty89

Also, in relation to the argument over what constitutes a 486 or not some pages back in the thread, and the mention of some AM5x86 variants having not just the 'x5' printed on them, but actually AM486 DX5.

Here's one I found a few months ago (listed on ebay as an AMD 486, no 5x86 in the title or description).

Am486 DX5-133W16BGC
9738CPA

Reply 256 of 747, by ph4nt0m

kool kitty89 wrote on 2020-04-02, 13:04:

The 5571 isn't a super late-gen chipset, but were there 486 era ISA bus optimizations that got stripped down even by that point? Or for that matter, did it happen back during the socket 4 and 5 transition? I haven't compared those boards either.

In any case, it seems like OPTi 495SX (and presumably 495DLC) based boards don't lose a whole lot if they're ISA only, but VESA doesn't hurt either.

Those single chip SiS chipsets for Socket 7 or Socket 370 are horribly slow when it comes to memory performance. Their engineers had to sacrifice a lot in order to get all the stuff into a single chip to cut down the manufacturing costs. These chipsets were used for the cheapest boards back in the day.

OPTi 495SX / 495SLC / 495XLC also had slow memory performance. Although they were often featured on hybrid 386 / 486 boards, even with VESA slots, their memory performance limited everything.

Reply 257 of 747, by ph4nt0m

kool kitty89 wrote on 2020-04-02, 13:13:

Here's one I found a few months ago (listed on ebay as an AMD 486, no 5x86 in the title or description).

Am486 DX5-133W16BGC
9738CPA

That's not a surprise. AMD sold many 5x86 chips labelled simply as

Am486DX4-100
A80486DX4-100SV8B
package 25544

They still had 16Kb cache and could be overclocked to 160MHz easily. Original 500nm SV8B package 25398 couldn't do that.

Reply 258 of 747, by kool kitty89

User metadata
Rank Member
Rank
Member

I'm not sure how slow memory performance (I'm assuming L2 cache and EDO/SDRAM controller speed/performance) would make the ISA bus more of a bottleneck. Fast PCI cards, yes, and general RAM operations, and I'd think it'd make it harder for fast PCI cards to perform their best, but why would it bottleneck the ISA bus more than it already is?

Besides that, I was running SDRAM, and it's generally a lot easier to design a fast SDRAM controller with a more limited transistor budget than an FPM or EDO DRAM controller.

It seems more like there's some sort of forced wait states well in excess of the older 486 chipset or just really slow ISA interfacing. (I'll have to throw in some other Socket 7 and Super Socket 7 ... and Slot 1 and 370 ISA bus comparisons at some point to compare) And the wait states aren't adjusted proportional to ISA divider and FSB settings, so overclocking the ISA bus still improves speed (and asynch 7.16 MHz mode is worst) but still performs disproportionately slowly.

Or maybe it has something to do with the way the integrated IDE interface is implemented, but I kind of doubt it. (that Mustang board is still old enough to use a discrete Winbond multi-IO ASIC for the IDE interface ... which also means they could've thrown a gameport header on the board with little added effort: but that's true for most UART+PIO/IDE/etc interfaces using similar chips)

Unlike my Super 7 boards, there are no I/O recovery time or wait state selections for the expansion bus, though I'm not positive tweaking that would make the difference there, either.

Also, those SiS chipsets weren't always just used on low-end boards, there were some more performance optimized (and overclocking oriented) implementations, and M-Tech seems to have had a reputation for making those sorts of boards around the time. At least Anandtech gives that impression:
https://www.anandtech.com/show/45

OTOH, that's probably comparing typical FX and VX board performance when mentioning Intel (not HX). Albeit they're also talking about business performance scores and the Cyrix 6x86.

I know VIA's VPX chipset had a lot of high performance features that didn't end up demonstrated on a lot of cheaper boards using it, but a few FIC boards were known to be particularly fast with it.