VOGONS


Reply 20 of 32, by Beerfloat

Rank Member
Gahhhrrrlic wrote on 2026-04-01, 05:40:

I just bought the IIT FPU from group 4 above (because it's supposedly the fastest and only has some fails in sine/cos). But I just want to confirm so I don't blow up my precious machine... These XC87-DLC chips work in a 386 computer??? I thought that was kind of the thrust of the original post but never saw it explicitly made clear whether these chips were being tested on a 386 or 486. Are all of the chips compatible as 387 coprocessors or do some have to be used with 486 sx or something like that? Thanks.

Yes, they are all for use with 386 CPUs. The name is just branding to make them appear particularly suitable for the 386+ CPUs that later appeared for the 386 PGA132 socket, such as the 486DLC, 486DRx2, 486SXL, etc.

Proper 486 CPUs (i.e. using PGA168+ socket) do not support external coprocessors.

Reply 21 of 32, by Gahhhrrrlic

Rank Member

Thanks. I'm excited to get mine to replace my ULSI. I think this A-type IIT should be much faster; I believe IIT had pretty good mul performance. I can't find any documentation on the ULSI's architecture or instruction latencies to compare, though...

https://hubpages.com/technology/How-to-Maximi … -Retro-Computer

Reply 22 of 32, by PiotrUU

Rank Newbie

The latest versions of the ULSI FPU are very fast, just like the Cyrix FPU. The IIT with a v4 core is slower than these late ULSI chips, but the early ULSI is slow.
Note that IIT FPUs from the v4 and v5 groups always carry the marking 1002FA, while chips marked 1002FB or 1002FC can be from either the v2 or the v3 group.

Reply 23 of 32, by PiotrUU

Rank Newbie

IIT XC87DLC from the v3 group with the inscription 1002FB.

Reply 24 of 32, by Gahhhrrrlic

Rank Member

I got my IIT chip in the mail and swapped it in. Before doing so, however, I ran the ULSI through as many benchmarks as I could, then re-ran them with the IIT. The IIT group 4 (A revision) was faster than the ULSI in every benchmark EXCEPT "CABT" (Circuit Analysis bench) and Quake. I can't find documentation to prove it, but I have a feeling the ULSI's microarchitecture is less sensitive to stack traffic than the IIT's. Quake is heavy on FXCH, which may have been free on the Pentium, but was it on the ULSI as well? What other distinctions could account for this discrepancy? I wonder what the circuit analysis bench and Quake have in common...

Reply 25 of 32, by rasz_pl

Rank l33t

FXCH wouldn't be free for any external FPU, as the opcode needs to be sent to it separately, eating cycles. FXCH is only one half of the puzzle; the other is the ability to fire the next instruction immediately afterwards thanks to pipelining. No one else made pipelined x87 FPUs before the Intel Pentium, with the AMD K7 joining the cool kids' club in 1999.

https://github.com/raszpl/sigrok-disk FM/MFM/RLL decoder
https://github.com/raszpl/FIC-486-GAC-2-Cache-Module (AT&T Globalyst)
https://github.com/raszpl/386RC-16 ram board
https://github.com/raszpl/Zenith_ZBIOS Zenith Z-386 MFM-300 ZBIOS disassembly

Reply 26 of 32, by Gahhhrrrlic

Rank Member

Oh, I agree. I'm just trying to understand the nuances between chips of the same era and class. The IIT outperforms the ULSI (well, the one I have anyway) in most applications but just falls on its face in Quake. They share the same IEEE standard and, for the most part, the same instructions, so there have to be some internal microcode differences or something to account for Quake.

Reply 27 of 32, by rasz_pl

Rank l33t

Do those FPUs have documented instruction cycle times? Without pipelining, Quake will stall on long chains of fmuls, waiting for every single one to finish before firing the next one https://github.com/id-Software/Quake/blob/bf4 … _drawa.asm#L248
and then waiting until the last one ends before executing FXCH just to add the results, then waiting for the add to finish before starting the fmuls again. The IIT might be slower on one of the instructions Quake is heavy on.

A pipelined FPU allows all those instructions to run simultaneously https://www.phatcode.net/res/224/files/html/c … ml#:~:text=FXCH

Without pipelining, you would have to carefully fine-tune FPU calculations and interleave them with other, unrelated integer code to hide FPU stalls.
I think Quake486 https://github.com/goshhhy/486quake went with the much more basic idea of just minimizing the number of FPU instructions (getting rid of FXCH, since it doesn't help without pipelining) plus some reordering:
this https://github.com/id-Software/Quake/blob/bf4 … e/d_draw.s#L121
becomes this https://github.com/goshhhy/486quake/blob/27bd … 6/d_draw.s#L125
Funnily enough, this doesn't seem to make much difference on 486 CPUs. Pentium and K6 both get a nice ~20% speed bump, pretty equally, so it's not clear whether the gain comes from messing with the FPU code or from something else.
One surprising thing about Quake486 is that it actually makes the Pentium run faster despite deoptimizing FXCH-heavy code 😮

Reply 28 of 32, by Gahhhrrrlic

Rank Member

https://limewire.com/d/lpUA7#mvn5CLax18
(upload was too big)

I think that of all the chips, the IIT would be the best at processing tons of muls; it's literally what it was designed to do. The canonical dot product is the heart of the F4X4 instruction, which is nothing but muls and adds.

Speaking of which, since I just started learning assembly: does anyone know how to write assembly with raw machine code instead of mnemonics? I'm trying to get the F4X4 instruction to work, but either I'm doing it wrong or the compiler doesn't recognize it. It throws a syntax error if I type F4X4, and if I use 0xDB 0xF1 it compiles, but when I debug, the opcode just vanishes and the instructions before and after it sit right next to each other instead.

Reply 29 of 32, by rasz_pl

Rank l33t

F4X4 is not a standard x87 instruction. No assembler or disassembler that isn't IIT-aware will recognize it, and nothing apart from dedicated IIT or custom-written software ever issued that instruction.

Now, taking something like Quake486 and rewriting all its FPU code to use this special IIT vector instruction would be a fascinating exercise. You could probably coax an LLM into helping you with the bulk of the work, or at least into sanity-checking/debugging a manual conversion.

Reply 30 of 32, by Gahhhrrrlic

Rank Member

I tried the software that comes on the demo disk, and the one file "F4X4.EXE" does run, so the instruction seems to work. I just need to know how to compile code that uses it. Do I need a specific compiler that is IIT-aware? I'm using Watcom 1.9; maybe that's the problem? Honestly, I thought that when you use machine code it's so low-level that the compiler becomes irrelevant. DB F1 just says stick F1 in the instruction queue... whatever that is, and it's up to the hardware to figure it out. Maybe I misunderstood, though...

Reply 31 of 32, by rasz_pl

Rank l33t
_asm {
    xor eax, eax
    db 0xDB, 0xF1
    xor eax, eax
}

I don't think there is a way to declare your own opcodes in Watcom. Maybe the IIT documentation has some tips?

Reply 32 of 32, by Gahhhrrrlic

Rank Member

Oh in Watcom you do it like this:

void FPU_Init(void);
#pragma aux FPU_Init = \
"finit" /* reset FPU */ \
"fstcw word ptr fpu_cw" \
"or word ptr fpu_cw, 0x1C3F" \
"fldcw word ptr fpu_cw" \
"fld1" \
"fld1" \
"fld qword ptr sixteen" \
"fld1" \
"fscale" \
"fld1" \
"fdiv st(0), st(1)" \
"fld1" \
"fld1" \
"fld st(3)" \
"fmul st(0), st(4)" \
modify exact [8087]