VOGONS


3 (+3 more) retro battle stations


Reply 2480 of 2485, by sqpat

Rank Newbie

I actually tried the FPU for some RealDOOM stuff a couple of weeks ago. The obvious use case is to replace 48:32-bit division (a single instruction in x86-32, but a complicated mess in x86-16 that in the worst case might involve multiple div/mul instructions to replicate). But even in the worst case I could not get a performance increase; in fact it was generally a significant loss in performance. Probably something could be built from the ground up to use the 287 properly as a coprocessor rather than block on the calculation of values, but that's hard to integrate into Doom. I don't think the FPU is fast enough to be doing a FIDIV for every column drawn in the Doom engine. There's a handful of other operations done to prepare the column for drawing, but I think even if they were done in parallel you'd sit there waiting a long time for the FIDIV calculation to complete. And if it's not even fast enough to do this, it's hard to think of something it could do often and at a high frame rate to be useful.
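
For anyone following along, here is a rough model of the division in question. It is a sketch in Python, not RealDOOM code: the function names are made up for illustration, and a real x86-16 routine would work on 16-bit limbs with DIV/MUL rather than one bit at a time. It shows a 16.16 fixed-point divide as a 48-bit by 32-bit integer division, a shift-and-subtract version that hints at why this is painful without a wide divide instruction, and the floating-point route an FPU would take.

```python
FRACBITS = 16  # 16.16 fixed point, as used by the Doom engine


def fixed_div_int(a, b):
    """Fixed-point a / b via a widened integer dividend (truncates toward zero)."""
    num = abs(a) << FRACBITS          # up to 48 significant bits when |a| < 2**32
    q = num // abs(b)
    return -q if (a < 0) != (b < 0) else q


def div48_shift_subtract(num, den):
    """Unsigned 48-bit by 32-bit restoring division, one quotient bit per step."""
    if not (0 <= num < 1 << 48 and 0 < den < 1 << 32):
        raise ValueError("operands out of range")
    quotient = remainder = 0
    for bit in range(47, -1, -1):     # 48 iterations, most significant bit first
        remainder = (remainder << 1) | ((num >> bit) & 1)
        quotient <<= 1
        if remainder >= den:
            remainder -= den
            quotient |= 1
    return quotient, remainder


def fixed_div_float(a, b):
    """Same divide through floating point; may differ from the integer path in the last bit."""
    return int(a / b * (1 << FRACBITS))
```

With 1.5 represented as 0x18000, both `fixed_div_int(3 << 16, 2 << 16)` and `fixed_div_float(3 << 16, 2 << 16)` return 0x18000.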

That said, it was my first time using 287s, and when I tried to bench performance on various programs, it seemed the performance of both my FPUs was halved. At 25 MHz I scored ~521 KWhetstones on Navratil (some of the other benches were not working or were 'off the scale' low), while it seems you got 1097 KWhetstones with the same FPU and speed. I tried with both a 20 MHz IIT and a Cyrix CX-82S87-NP-SV (both ran at 25 MHz just fine) and both got about half your scores. I fiddled around with bus clocks and things like that, but the performance was consistently half of what I expected. The screen did say the FPU was running at 25 MHz and there was no sign it was a fake chip or anything. I could imagine something like wait states existing in 287 communication, but most of the cycles are spent with the FPU doing its work, so there's no reason for such a large performance drop. This was the SCAT router board; I can try another board later. I scoured all the chipset settings and docs and couldn't find anything to suggest the speed should be halved. If the FPU performance really was compromised in some way, I might have to revisit those RealDOOM FPU div tests at some point.
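
To put a number on the wait-state argument, here is a back-of-the-envelope model. It assumes (hypothetically) that some fraction of benchmark time goes to CPU/FPU bus traffic and that wait states stretch only that fraction. The fractions below are guesses for illustration, not measurements; only the 521 and 1097 scores come from this thread.

```python
def slowdown(comm_fraction, comm_penalty):
    """Overall slowdown when only the communication share of runtime is stretched."""
    return (1.0 - comm_fraction) + comm_fraction * comm_penalty


def penalty_needed(comm_fraction, target_slowdown):
    """How much slower the communication must get to cause target_slowdown overall."""
    return 1.0 + (target_slowdown - 1.0) / comm_fraction


observed = 1097 / 521  # ratio of the two Whetstone scores, about 2.1

for fraction in (0.05, 0.10, 0.25):  # hypothetical communication shares
    print(f"{fraction:.0%} bus time -> bus must be "
          f"{penalty_needed(fraction, observed):.1f}x slower")
```

Under these assumptions, if only 10% of the time were bus traffic, the transfers would have to be roughly 12x slower to explain the gap, which is why plain wait states look like an unlikely cause.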

Outside of benchmarks, is there anything actually 'cool' using the FPU? I guess I am mostly aware of CAD, Mandelbrot/fractal programs and flight sims. I never really messed around with this stuff either, so I have some learning to do. Perhaps a raytracer could use an FPU as an "accelerator" that traces other pixels in parallel or something. Might make a neat demo/benchmark itself.

Reply 2482 of 2485, by pshipkov

Rank l33t

Yeah, for this class of hardware it is unfair to compare FPUs against fixed-point math specialization in on-rails runtimes such as videogames.
FPUs will make a significant difference for general-purpose math computations, usually found in CAD systems, offline graphics software (image and video editing, vector and 3D rendering, etc.), spreadsheets, research/modeling (MATLAB, etc.).
In the late 80s and early 90s, compatible with 286/287, i can think of a handful of "cool" things such as AutoCAD, TurboCAD, MicroStation, POV-Ray, Chaos (a cool set of fractal generators published by Autodesk), CorelDRAW, driving/flight simulators with vector graphics, but that's about it.
From what i can see online, people tend to run a game or two on their 286es once in a while for the good old memories, so the obscurities above are largely a non-factor.

VLSI SCAMP is brutally slow on a clock-to-clock basis. It is incomparable to VLSI VL82CPCAT-16QC or Headland HT-18.
I was able to squeeze out of it ~750 kwhetstones/s in NSSI at 20MHz CPU and 10MHz FPU but that's about it. Link here. This is long-term stable.
Hope these notes help somewhat.

I prefer to test things with real software. For the purpose of benchmarking 287 FPUs i use Autodesk Chaos. It greatly stresses the system and also makes nice images (for its time).
Take a look at this post - the last chart. It gives a glimpse into how things are for 287. Time is measured in seconds. Only a handful of chipsets/motherboards are worth FPU-testing, as the rest are way, WAY too slow.
The fastest chipset and motherboard for 287 FPUs, that i have seen so far, is Protech PM286 based on Headland HT-18/C. The thing is a beast in this area.

(btw, this conversation reminds me that it is time to post info about a few more 286 chipsets - obscure ones)

retro bits and bytes | DOS media library

Reply 2483 of 2485, by pshipkov

Rank l33t

Ever since the very successful outcome with Chicony CH-471B rev 2.0 i have been wondering how Chico Jr. (the A models) would do in terms of performance and overclocking. Not long ago i managed to obtain rev 3.0 of the assembly and inspected it.

Chicony CH-471A rev:3.0 based on SiS 85C471, 85C407

motherboard_486_chicony_ch-471a_ver_3.0.jpg

Classic ISA/VLB layout. Nothing much to say really.

There was some minor corrosion from a leaked battery around the lower right corner of the memory slots. Cleaned it for good.

Upgraded it to 1MB level 2 cache. Board is not very picky about level 2 cache chips - was able to quickly bin the right set.
I see this class of hardware as mostly DOS-bound, so my preferred card is Ark1000VL, since it is the VGA blaster = fastest DOS interactive graphics. The card can be fussy in some motherboards with tight BIOS timings - this is a known issue. Instead of relaxing the wait states and slowing the system down, i switched to a Diamond Stealth 64 DRAM T VLB REV B2 (S3 Trio64), which is more resilient than Ark1000VL in such situations. The S3 Trio64 is slightly slower clock-to-clock than Ark1000VL at DOS interactive graphics, but it handles the tightest BIOS settings, which results in better overall performance.

Local storage through Promise EIDE2300Plus with CF card attached to it.

--- Am5x86 at 160MHz (4x40)

All BIOS settings on max.
1MB level 2 cache - achieved with a mix of 10/15 ns chips.
32MB (2x16) RAM, 60ns.

Level 1 cache policy is always in write-through mode regardless of what the corresponding BIOS setting says.

Nothing much to add really. Things just work. System is fully stable. Intermediate performance.
performance results

chicony_ch_471a_ver_3.0_speedsys_160.png

--- Am5x86 at 180MHz (3x60)

3.6V to CPU, 5V Peltier element for cooling.

All BIOS settings on max, except:
DRAM SPEED = "SLOWER" (best is FASTEST)
DRAM WRITE WS = 1 WS (best is 0 WS)
CACHE BURST READ = 1T (best is 0T)
LOCAL BUS READY = SYNCHRONIZE (best is TRANSPARENT)

Was not able to achieve a fully stable system.
All is good in simple DOS interactive graphics (Wolf3D, Doom, Quake 1, a bunch of other benchmarks/games) and standard Windows usability tests, but some of the heavy offline compute tests fail no matter what. Tried all CPU voltages, a 12V Peltier for deep processor freeze, and all sorts of BIOS setting combinations, from the most conservative to different grades of wait-state tightening. Rotated CPUs, L2 cache chips, video and IDE controllers, RAM modules. Used multiple sets of trusted components. Nothing helped.
For DOS gaming and casual Windows activities, the motherboard totally cuts it. For more intricate usage, there can be trouble.

Performance is lacking - below average.
performance results

chicony_ch_471a_ver_3.0_speedsys_180.png

--- Am5x86 at 200MHz (3x66)

System is unstable no matter what.

--- P24T (POD100) at 100MHz (2.5 x 40)

Similar to Chicony CH-471B ver 2.0, the system hangs at boot time no matter what. Tried everything - components, frequencies, BIOS settings - no luck.

---

All in all - a slightly disappointing motherboard.
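
A quick tally of the configurations above, as a sketch; the names CONFIGS and core_mhz are just for illustration, and the outcomes are the ones reported in this post. Core clock is simply bus clock times multiplier.

```python
# (CPU, multiplier, bus MHz, outcome) as reported in this post
CONFIGS = [
    ("Am5x86", 4.0, 40, "fully stable"),
    ("Am5x86", 3.0, 60, "not fully stable"),
    ("Am5x86", 3.0, 66, "unstable"),
    ("P24T (POD100)", 2.5, 40, "hangs at boot"),
]


def core_mhz(multiplier, bus_mhz):
    """Core clock is the bus clock times the CPU multiplier."""
    return multiplier * bus_mhz


for cpu, mult, bus, outcome in CONFIGS:
    print(f"{cpu}: {mult:g} x {bus} = {core_mhz(mult, bus):g} MHz, {outcome}")
```

Note that 3 x 66 comes out as 198 here; it is usually quoted as 200MHz because the nominal bus clock is 66.67MHz.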


Reply 2484 of 2485, by feipoa

Rank l33t++

The Chicony CH-471A rev 3.0 looks similar to Chicony CH-471B rev 2.0, except that the B-rev2 had to make space for onboard IDE/floppy/IO (UM82C865F, SMC FDC37C666GT, Appian ADI/2). The B-rev2 being the better board, I wonder if having these components integrated onto the motherboard somehow was the magic difference. Everything is hit or miss at 180 MHz.

Plan your life wisely, you'll be dead before you know it.

Reply 2485 of 2485, by pshipkov

Rank l33t

I expected that the simpler, more compact A model would do better than the B, but as you said - when operating out of specification, assumptions are meaningless.
