First post, by mpe
Is there any trick to running Quake on a system without an FPU (NexGen Nx586)?
I tried loading the Q87 emulator, but it hangs on start.
From what I understand there are critical FPU instructions which make it impossible to run without one.
Never even considered this before...
http://porthos.ist.utl.pt/docs/fpc/user/node47.html
Maybe if you track down this WMEMU it would at least start. However, I can only imagine it is going to translate all floating-point operations into fixed-point integer ops at runtime. If (possible) precision issues don't cause the game to freak out, I would expect the game to run unbelievably slowly. You might have to track down the later NexGen 586 with an FPU if Quake is a must.
Sup. I like computers. Are you a computer?
Can't be done.
(though it probably is possible to excruciatingly hack up the source to force integers on everything including the progs system which stores floats for about every value)
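For readers wondering what "forcing integers on everything" would mean in practice, here is a minimal sketch of 16.16 fixed-point arithmetic. It is written in Python for brevity and is not taken from the Quake source; the names `to_fixed`, `fx_mul` and `fx_div` and the 16.16 format itself are assumptions made for illustration.

```python
FRAC_BITS = 16           # assumed 16.16 format: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS     # the value 1.0 in fixed point


def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point, rounding to the nearest step."""
    return round(x * ONE)


def to_float(f: int) -> float:
    """Convert a 16.16 fixed-point value back to a float."""
    return f / ONE


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values using integer operations only."""
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two fixed-point values using integer operations only."""
    # Pre-shift the numerator so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b
```

The smallest representable step is 1/65536, which is where the precision worries mentioned above come from: every value stored as a float in the engine would have to fit that range and resolution.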
wrote:Is there any trick how to run Quake on a system without FPU (NexGen Nx586).
Tried to load Q87 emulator, but it is hanging on start.
It should be running. What is the Q87 version? What is the EMM version? Please give more details about the "hanging on start", especially which program hangs when it starts!
@leileilol your reply is complete nonsense.
Then you have to count your gameplay in frames per minute, so what is really the point? It barely runs on a 486DX or Cyrix; it is tailored for a dual-pipelined Pentium, that's it.
I am aroused about any  X86 motherboard that has full functional ISA slot. I think i have problem. Not really into that original (Turbo) XT,286,386 and CGA/EGA stuff.  So just a DOS nut.
PS. If I upload RAR, it is a 16-bit DOS RAR Version 2.50.
wrote:Then you have to count your gameplay in frames per minute, so what is really the point? It barely runs on a 486DX or Cyrix; it is tailored for a dual-pipelined Pentium, that's it.
Try it yourself before you speak! Q87 outperforms a real 287 on a 386 CPU, not to mention on Pentium CPUs. "That's it" 😀 Poor AMD CPUs that don't have "dual pipelines" 😉 We can't run Quake or other games on them 😀
wrote:Q87 outperforms real 287 on 386 CPU
Show some evidence, please. It's quite unlikely that it does, because a real FPU is about 80 to 100 times faster than software emulation. So for your statement to be true, a 386 must be more than 80 to 100 times faster than an XT, which it isn't (not even close).
Also, you would need Q387, as emulating an 8087 won't help you run Quake. Well, it won't work anyway, as neither Q87 nor Q387 can run in protected mode...
Some time ago, I tried out many different 387 emulators, but couldn't find any that would work. I think I also used Quake as the test program.
With a real 387 co-processor, Quake does run... or walk, on my 386.
It would be interesting to try a working emulator, though.
"640K ought to be enough for anybody." - And i intend to get every last bit out of it even after loading every damn driver!
A little about software engineering: https://byteaether.github.io/
I think you need to recompile the Quake engine from source with a software-emulated math library. Old attempt to compile DOS Q1 sources: Any DOS coders around? (compiling Quake 1 source)
wrote:it is tailored for a dual pipelined Pentium, thats it.
More specifically, it is tailored for a Pentium which can execute FPU instructions in parallel with integer ones.
It fires off an fdiv instruction once every 16 pixels. This fdiv instruction is very slow.
However, on Pentium, the fdiv runs 'in the background', while the two integer pipelines continue churning through the unrolled loop to plot 16 pixels. By the time the 16 pixels are done, the fdiv has completed, and the result can be used for the next 16 pixels without any pipeline stalls. On a 486 or the Cyrix/AMD Pentium wannabes, the FPU cannot do this, so the pipeline stalls as soon as you fire off the fdiv, until the result is available, which means your inner loop gets a stall of ~40 cycles every 16 pixels. That hurts bad.
Aside from the fact that FPU emulation is horribly inefficient compared to fixed-point integer arithmetic, the whole trick of getting a 'free' fdiv in the inner loop obviously cannot work if the fdiv instruction itself has to be emulated on the integer pipeline as well.
The whole concept of the 'free fdiv' that Quake's renderer is based around would need to be abandoned for a non-FPU approach, otherwise it's never going to work on any 486/Pentium-level machine. You'd need a much faster machine to make up for the less efficient approach.
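The "one fdiv every 16 pixels" scheme described above can be sketched roughly as follows. This is an illustration in Python, not Quake's actual inner loop (which is hand-written assembly); the function names and the parameters `uoz0`, `uoz_step`, `ooz0` and `ooz_step` (u/z and 1/z at the span start, plus their per-pixel steps) are assumptions made for this example.

```python
SUBDIV = 16  # pixels per linear run, the figure quoted in the post above


def span_exact(n, uoz0, uoz_step, ooz0, ooz_step):
    """Reference version: one perspective divide per pixel."""
    return [(uoz0 + x * uoz_step) / (ooz0 + x * ooz_step) for x in range(n)]


def span_subdivided(n, uoz0, uoz_step, ooz0, ooz_step, subdiv=SUBDIV):
    """One perspective divide per `subdiv` pixels, linear steps in between.

    Returns (texture coordinates, number of perspective divides).
    """
    out = []
    u_left = uoz0 / ooz0          # exact u at the left edge of the span
    divides = 1
    x = 0
    while x < n:
        run = min(subdiv, n - x)
        x_right = x + run
        # The expensive divide: exact u at the right edge of this run.
        u_right = (uoz0 + x_right * uoz_step) / (ooz0 + x_right * ooz_step)
        divides += 1
        # Per-pixel step. For a full 16-pixel run this is a shift in
        # fixed-point code, so it is not counted as a perspective divide.
        du = (u_right - u_left) / run
        for i in range(run):
            out.append(u_left + i * du)
        u_left = u_right
        x = x_right
    return out, divides
```

For a 64-pixel span this needs 5 perspective divides instead of 64. The point made in the post is that on a Pentium the divide for the next run can be in flight while the integer pipelines plot the current one.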
wrote:I think you need recompile Quake engine from sources with software emulated math library. Old attempt to compile DOS Q1 sources: Any DOS coders around? (compiling Quake 1 source)
I was googling to see if anyone had done Quake without an FPU before, since it's been open source for 23 years and Quake has been ported to hundreds of different platforms by now, but I was surprised that I could not find anything. My conclusion is that any CPU that is remotely modern includes an FPU; the only exceptions are probably simple microprocessors that are still slower than a Pentium-class CPU.
wrote:wrote:it is tailored for a dual pipelined Pentium, thats it.
However, on Pentium, the fdiv runs 'in the background
On every x86 NPU since the very first 8087 it runs in parallel to the CPU!
wrote:On every x86 NPU since the very first 8087 it runs in parallel to the CPU!
Yes, but no.
Here is Michael Abrash's article on the Pentium FPU:
https://www.phatcode.net/res/224/files/html/ch63/63-04.html
However, remember that although FDIV has a latency of up to 39 cycles, it can overlap with integer instructions for all but one of those cycles. That means that if we can find enough independent integer work to do before we need the 1/z result, we can effectively reduce the cost of the FDIV to one cycle.
This overlapping was not possible on earlier FPUs. The CPU would wait for the FPU to complete before executing the next instruction. There was no pipelining involved, like with the Pentium.
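To put rough numbers on the quote above, here is a back-of-the-envelope cost model in Python. The 39-cycle latency is the worst case quoted from Abrash; the `loop_cycles` argument (integer work per 16-pixel run) is a hypothetical value chosen for illustration, and the model ignores everything else a real CPU does.

```python
FDIV_LATENCY = 39  # worst-case Pentium FDIV latency, from the Abrash quote


def cycles_per_run(loop_cycles, fdiv_latency=FDIV_LATENCY, overlapped=True):
    """Estimate cycles for one 16-pixel run plus its perspective divide."""
    if overlapped:
        # The FDIV costs 1 cycle to issue, then runs alongside the integer
        # loop. Any latency the loop does not cover shows up as a stall.
        return 1 + max(loop_cycles, fdiv_latency - 1)
    # No overlap: the integer loop only starts once the FDIV has finished.
    return fdiv_latency + loop_cycles
```

With an assumed 64 cycles of integer work per run, this gives 65 cycles with overlap against 103 without, which is the size of the gap being argued about in this thread.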
wrote:Would be interesting to try some working emulator though.
wrote:wrote:On every x86 NPU since the very first 8087 it runs in parallel to the CPU!
The CPU would wait for the FPU to complete before executing the next instruction.
Wrong! It will only wait if/when a WAIT/FWAIT instruction is executed. Until then, even the 8087 crunches numbers on its own. Pipelining is not related to this at all.
wrote:Wrong! It will wait if/when WAIT/FWAIT instruction is executed. Until then even 8087 crunches numbers by its own. Pipelining is not related to this at all.
Pretty sure it doesn't work like that on the 486 at least.
Else the whole Abrash article, and the fact that Quake runs like a total dog on any 486, doesn't make sense.
wrote:wrote:Wrong! It will wait if/when WAIT/FWAIT instruction is executed. Until then even 8087 crunches numbers by its own. Pipelining is not related to this at all.
Pretty sure it doesn't work like that on the 486 at least.
Else the whole Abrash article, and the fact that Quake runs like a total dog on any 486, doesn't make sense.
FYI this is exact quote from the "Intel 486 Processor Programmer's Reference Manual"
The i486 Integer Unit (IU) and FPU coordinate their activities in a manner transparent
to software. Moreover, built-in coordination facilities allow the IU to proceed with other
instructions while the FPU is simultaneously executing numeric instructions. Programs
can exploit this concurrency of execution to further increase system performance and
throughput.
14-2
Okay, then I wonder why Abrash wrote what he wrote, and why I have thought for the past 25+ years that the fdiv thing does not work, along with all my assembly programming peers.
Perhaps the answer is in the "Concurrent execution" measure that is listed for the FPU instructions in the manual?
It's not clear how that number is to be interpreted. Is that the number of cycles that allow concurrent execution? Or is it the cycle at which concurrent execution starts?
I think the latter makes the most sense (that's the thing a programmer wants to know when optimizing code, and it also seems to make most sense given the ratings for the various instructions), in which case we'd both be sorta right:
Theoretically the FPU can execute fdiv concurrently. But if, in practice, an fdiv takes 73 cycles and concurrency starts at the 70th cycle, there's little or no gain from concurrent execution of fdivs.
I will dig out my 486 and the Zen Timer to do some testing when I have more time, to see what happens exactly.
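The two readings of the "Concurrent execution" figure can be compared with a small Python sketch. The 73 and 70 are the hypothetical numbers from the post above, not values looked up in the manual, and the function names are made up for this example.

```python
def hidden_if_count(total, concurrent, integer_work):
    """Reading A: `concurrent` is how many of the cycles may overlap."""
    return min(concurrent, total, integer_work)


def hidden_if_start(total, concurrent, integer_work):
    """Reading B: overlap only begins at cycle number `concurrent`."""
    return min(max(total - concurrent, 0), integer_work)


def effective_cost(total, hidden):
    """Cycles of the fdiv that the integer code actually has to wait out."""
    return total - hidden
```

With `total=73`, `concurrent=70` and 64 cycles of independent integer work, reading A hides 64 cycles and leaves an effective cost of 9, while reading B hides only 3 and leaves 70. Timing a real 486 is the way to tell which reading is right.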
wrote:wrote:Would be interesting to try some working emulator though.
Thanks! Though, I've already tried that and couldn't get Quake running with it... 🙁