Reply 2480 of 2481, by sqpat
I actually tried using the FPU for some RealDOOM stuff a couple of weeks ago. The obvious use case is replacing 48:32-bit division (a single instruction on x86-32, but a complicated mess on x86-16 that in the worst case might take multiple div/mul instructions to replicate). Even in that worst case I could not get a performance increase; in fact it was generally a significant loss. Probably something could be built from the ground up to use the 287 properly as a coprocessor rather than blocking on each calculation, but that's hard to integrate into Doom. I don't think the FPU is fast enough to be doing a FIDIV for every column drawn in the Doom engine. There's a handful of other operations done to prepare the column for drawing, but I think even if they were done in parallel you'd sit there waiting a long time for the FIDIV to complete. And if it's not even fast enough for this, it's hard to think of something it could do often enough, at a high frame rate, to be useful.
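For anyone who hasn't looked at the division in question, here is a rough C sketch of the two paths I mean. To be clear, this is not RealDOOM's actual code: the names `fixed_div_int`, `fixed_div_fpu` and `fixed_div` are made up for illustration, and it assumes Doom-style 16.16 fixed point plus a compiler with a 64-bit integer type (so it shows the arithmetic, not what a 16-bit build looks like). The integer path is the wide divide that is cheap on x86-32 and painful on x86-16; the FPU path is the kind of thing that ends up as a load, divide and store on the coprocessor.

```c
#include <stdint.h>
#include <stdlib.h>

#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

typedef int32_t fixed_t;   /* 16.16 fixed point, as in Doom */

/* Integer path: widen the dividend, scale it by 2^16, then divide.
   Only 48 bits of the 64-bit dividend are ever significant. */
static fixed_t fixed_div_int(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * FRACUNIT) / b);
}

/* FPU path: do the divide in floating point and scale the quotient.
   On a 286/287 this is where the CPU ends up waiting on the FPU. */
static fixed_t fixed_div_fpu(fixed_t a, fixed_t b)
{
    return (fixed_t)(((double)a / (double)b) * FRACUNIT);
}

/* Wrapper with an overflow guard: if the quotient cannot fit in
   16.16 (this also catches b == 0), clamp to the extreme value with
   the right sign instead of dividing. */
static fixed_t fixed_div(fixed_t a, fixed_t b, int use_fpu)
{
    if ((llabs((long long)a) >> 14) >= llabs((long long)b))
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;
    return use_fpu ? fixed_div_fpu(a, b) : fixed_div_int(a, b);
}
```

For example, `fixed_div(3 * FRACUNIT, 2 * FRACUNIT, 0)` and the same call with `use_fpu` set to 1 both give 98304, which is 1.5 in 16.16. The results are the same either way; the whole question is which path costs fewer cycles per column.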
That said, it was my first time using 287s, and when I tried to bench various programs, the performance of both my FPUs seemed to be halved. At 25 MHz I scored ~521 KWhetstones on Navtratil (some of the other benches were not working or were 'off the scale' low), while it seems you got 1097 KWhetstones with the same FPU and speed. I tried both a 20 MHz IIT and a Cyrix CX-82S87-NP-SV (both ran at 25 MHz just fine) and both got about half your scores. I fiddled around with bus clocks and things like that, but the performance was consistently half of what I expected. The screen did say the FPU was running at 25 MHz and there was no sign it was a fake chip or anything. I could imagine something like wait states existing in 287 communication, but most of the cycles are spent with the FPU doing its work, so I see no reason for such a large performance drop. This was on the SCAT router board; I can try another board later. I scoured all the chipset settings and docs and couldn't find anything to suggest the speed should be halved. If the FPU performance really was compromised in some way, I might have to revisit those RealDOOM FPU div tests at some point.
Outside of benchmarks, is there anything actually 'cool' that uses the FPU? I guess I am mostly aware of CAD, Mandelbrot/fractal programs and flight sims. I never really messed around with this stuff, so I have some learning to do. Perhaps a raytracer could use the FPU as an "accelerator" that traces other pixels in parallel or something. That might make a neat demo/benchmark in itself.