Reply 20 of 41, by rasz_pl
MSxyz wrote on 2024-07-13, 15:26: With Quake 1.06, a 40MHz Am386DX + Cyrix 83D87 is capable of 2.0 frames per second on average. Swap the 386 for a Cyrix 486DLC and the framerate increases to 2.5. With the Intel RapidCAD, also running at 40 MHz, the framerate increases to 3.1.
For completeness, you should compare against a real 486DX at 40MHz with a similar graphics card, chipset and RAM timings.
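For reference, the quoted averages (all three setups at 40 MHz) work out to these relative gains:

```latex
\frac{2.5}{2.0} = 1.25\;(+25\%), \qquad
\frac{3.1}{2.0} = 1.55\;(+55\%), \qquad
\frac{3.1}{2.5} = 1.24\;(+24\%)
```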
MSxyz wrote on 2024-07-13, 15:26: ...since having the FPU integral to the CPU saves a lot of cycles.
Is there something special about being integrated? I don't think so. CPU-FPU communication speed comes down to the protocol:
- 8087: the FPU ran in lockstep with the CPU, snooping on the main CPU bus and taking over when necessary. Did it switch between CPU and FPU by bus mastering the CPU bus?
- 287/387: the FPU uses a different mode, something about message passing? Special 0F0-0FF I/O ports? Switching between CPU and FPU takes tens of cycles and is realized by raising exceptions? This is above my knowledge level.
- Weitek 4167 used yet another communication mechanism: a memory-mapped 64KB window at an address around 3GB. Communication runs at full external bus speed, much faster than the 387.
- AFAIK the 486DX's coprocessor didn't change the way the coprocessor communicates compared to the 387? Is it as slow as 386+387?
- The Pentium drastically changed the CPU-FPU communication protocol again.
Quake with Weitek support would be interesting 😀
MSxyz wrote on 2024-07-13, 15:26: To me this stuff is fascinating...
Definitely. I also love obscure outdated technical knowledge!
mkarcher wrote on 2024-07-13, 22:35: Q_memcpy at that line is inside an #if 0 block, so it is not compiled.
What do you mean? The only preprocessor conditionals in common.c are:
#ifdef PARANOID
#if defined(_WIN32)
#if WINDED
And Q_memcpy is used all over the place:
VGA palette https://github.com/id-Software/Quake/blob/bf4 … /vid_dos.c#L285
draw_pic https://github.com/id-Software/Quake/blob/bf4 … ent/draw.c#L366
models/skins https://github.com/id-Software/Quake/blob/bf4 … t/model.c#L1358
It's roughly half and half between Q_memcpy and memcpy in those files, as if they couldn't make up their minds, abandoned the optimization halfway (there's no point optimizing something that isn't in a hot loop), or deemed it inconsequential (realizing Watcom's memcpy generated the same code).
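For anyone following along without the source open: as far as I recall, Q_memcpy in common.c is a plain C loop that copies 32-bit words when dest, src and count are all multiples of 4 and falls back to single bytes otherwise, so nothing FPU-related. Below is a sketch modeled on that, not a verbatim copy; q_memcpy_sketch is a hypothetical name, and uintptr_t/uint32_t stand in for the original long/int casts so the alignment test also works on 64-bit hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch modeled on Quake's Q_memcpy (common.c), from memory:
 * if dest, src and count are all multiples of 4, copy 32-bit words,
 * otherwise copy byte by byte. Hypothetical name, not id's code. */
void q_memcpy_sketch(void *dest, const void *src, int count)
{
    assert(count >= 0);

    /* OR the three values together: any low bit set means misalignment. */
    if ((((uintptr_t)dest | (uintptr_t)src | (uintptr_t)count) & 3) == 0) {
        uint32_t *d = dest;
        const uint32_t *s = src;
        for (int i = 0; i < count / 4; i++)
            d[i] = s[i];            /* one 32-bit word per iteration */
    } else {
        unsigned char *d = dest;
        const unsigned char *s = src;
        for (int i = 0; i < count; i++)
            d[i] = s[i];            /* slow path: one byte per iteration */
    }
}
```

An optimizing compiler may well turn the word loop into something close to rep movsd, which would fit the "Watcom generated the same code" theory, but that is an assumption, not something checked against actual Watcom output.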
mkarcher wrote on 2024-07-13, 22:35: I could not find any traces of FPU memcpy in Quake 1.08 for DOS, especially not for VID_Update, so the use of FPU memcpy in Quake is likely a myth.
I first read that meme in @Jo22's posts (Re: 2D Acceleration - first chipsets) and never understood where he got the idea from.
Now the question is: would that even speed up Quake? Here is an FPU memcpy implementation http://www.pennelynn.com/Documents/CUJ/HTML/1 … HAM1/DURHAM.HTM promising a 16% gain on memory-to-memory transfers. That number is with a pre-warmed cache, so it's not real world. Without warming, it's:
"No Self-warming, float register copy, Pentium 90
(All cases were worse than memcpy)"
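To make the "float register copy" idea concrete: on a Pentium the integer registers are 32 bits wide, while an x87 register can load and store 8 bytes at once over the 64-bit data bus, so an FPU copy moves the buffer in 64-bit units. Standard C can't force x87 instructions, so the sketch below (copy64_sketch, a hypothetical name) only models that 8-byte granularity with a uint64_t temporary. An actual x87 version would do each step with something like FILD/FISTP on a 64-bit integer, which round-trips any bit pattern exactly. This is not the code from the linked article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models the 64-bit transfer granularity of an FPU copy.
 * Does NOT use the x87: a uint64_t temporary stands in for an
 * FPU register. Hypothetical helper, not from the CUJ article. */
void copy64_sketch(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Like memcpy, the two buffers must not overlap. */
    assert((uintptr_t)d + n <= (uintptr_t)s || (uintptr_t)s + n <= (uintptr_t)d);

    /* Bulk of the buffer: one 8-byte load and one 8-byte store per step.
     * Optimizing compilers typically turn each memcpy(.., 8) into a
     * single 8-byte move. */
    while (n >= 8) {
        uint64_t tmp;
        memcpy(&tmp, s, 8);
        memcpy(d, &tmp, 8);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail of 0-7 bytes, copied one at a time. */
    while (n--)
        *d++ = *s++;
}
```

The cold-cache result quoted above is consistent with this shape: the wider transfer only pays off when the data is already in cache, because otherwise the loop just waits on memory either way.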
It took Intel another ~15 years to finally start optimizing "rep movsb" and recommending it as the guaranteed full-cache-speed memory move operation.
https://github.com/raszpl/FIC-486-GAC-2-Cache-Module for AT&T Globalyst
https://github.com/raszpl/386RC-16 memory board
https://github.com/raszpl/440BX Reference Design adapted to Kicad
https://github.com/raszpl/Zenith_ZBIOS MFM-300 Monitor