Are all IIT 387 FPUs the same? 3C87 =/= 4C87DLC after all?

Reply 20 of 27, by Beerfloat

Posted on 2026-04-01, 06:17

Rank: Newbie
Posts: 97
Joined: 2023-07-17, 22:07

Gahhhrrrlic wrote on 2026-04-01, 05:40:

I just bought the IIT FPU from group 4 above (because it's supposedly the fastest and only has some fails in sine/cos). But I just want to confirm so I don't blow up my precious machine... These XC87-DLC chips work in a 386 computer??? I thought that was kind of the thrust of the original post, but I never saw it made explicitly clear whether these chips were being tested in a 386 or a 486. Are all of the chips compatible as 387 coprocessors, or do some have to be used with a 486SX or something like that? Thanks.

Yes, they are all for use with 386 CPUs. The DLC name is just branding to make them appear particularly suitable for the enhanced 386-class CPUs that appeared later for the 386 PGA132 socket, like the 486DLC, 486DRx2, 486SXL, etc.

Proper 486 CPUs (i.e. those using the PGA168+ socket) do not support external coprocessors.

Reply 21 of 27, by Gahhhrrrlic

Posted on 2026-04-03, 02:50

Rank: Member
Posts: 484
Joined: 2017-12-05, 00:39

Thanks. I'm excited to get mine to replace my ULSI. I think this A-type IIT should be much faster; I believe IIT had pretty good multiply performance. I can't find any documentation on the ULSI's architecture or instruction latencies to compare, though...

https://hubpages.com/technology/How-to-Maximi … -Retro-Computer

Reply 22 of 27, by PiotrUU

Posted on 2026-04-09, 13:38

Rank: Newbie
Posts: 39
Joined: 2019-01-13, 14:13
Location: Poland

The latest versions of the ULSI FPU are very fast, just like the Cyrix FPU. The IIT with a v4 core is slower than the latest ULSI, but the early ULSI is slow.
Note that IIT FPUs from the v4 and v5 groups always carry the marking 1002FA.
1002FB and 1002FC can be from either the v2 or the v3 group.

Reply 23 of 27, by PiotrUU

Posted on 2026-04-09, 13:41

PiotrUU Offline

Rank Newbie

Rank: Newbie
Posts: 39
Joined: 2019-01-13, 14:13
Location: Poland

An IIT XC87DLC from the v3 group, bearing the marking 1002FB.

Reply 24 of 27, by Gahhhrrrlic

Posted on 2026-04-22, 03:08

Gahhhrrrlic Offline

Rank Member

Rank: Member
Posts: 484
Joined: 2017-12-05, 00:39

I got my IIT chip in the mail and swapped it in. Before doing so, however, I ran the ULSI through as many benchmarks as I could, then re-ran them with the IIT. The IIT group 4 (A revision) was faster than the ULSI in every benchmark EXCEPT "CABT" (Circuit Analysis bench) and Quake. I can't find documentation to prove it, but I have a feeling the ULSI's microarchitecture is less sensitive to stack traffic than the IIT's. Quake is heavy on FXCH, which may have been free on the Pentium, but was it free on the ULSI as well? What other distinctions could account for this discrepancy? I wonder what the circuit analysis bench and Quake have in common...

Reply 25 of 27, by rasz_pl

Posted on 2026-04-22, 11:35

Rank: l33t
Posts: 4432
Joined: 2017-06-04, 00:57

FXCH wouldn't be free for any external FPU, as the opcode needs to be sent separately, eating cycles. FXCH is also only one half of the puzzle; the other half is the ability to fire the next instruction immediately afterwards, thanks to pipelining. No one made pipelined x87 FPUs before the Intel Pentium, with the AMD K7 joining the cool kids' club in 1999.
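A toy cost model can make the FXCH point concrete. The Python sketch below assumes a non-pipelined external coprocessor where every instruction pays a fixed hand-off cost (`HANDOFF_CYCLES`) plus its own execution time (`EXEC_CYCLES`). Both values, and the instruction fragment itself, are hypothetical placeholders, not measured figures for the 387, IIT or ULSI.

```python
# Toy cost model of a NON-pipelined external x87 coprocessor.
# All cycle counts below are hypothetical placeholders, not measured
# figures for the 387, IIT, ULSI or any other real chip.

HANDOFF_CYCLES = 4  # assumed cost for the CPU to pass one opcode to the coprocessor

EXEC_CYCLES = {     # assumed execution time of each instruction on the FPU
    "fld": 4,
    "fmul": 10,
    "fadd": 8,
    "fxch": 3,
}


def serial_cost(stream):
    """Total cycles when every instruction pays the hand-off cost and must
    finish before the next one can start (no pipelining, no overlap)."""
    return sum(HANDOFF_CYCLES + EXEC_CYCLES[op] for op in stream)


# An FXCH-heavy fragment versus the same list with the FXCHs dropped.
# Dropping them is only an instruction-count comparison; a real rewrite
# must reorder operands so the code still computes the same result.
with_fxch = ["fld", "fmul", "fxch", "fmul", "fxch", "fadd"]
without_fxch = [op for op in with_fxch if op != "fxch"]

print(serial_cost(with_fxch))     # 62
print(serial_cost(without_fxch))  # 48
```

Under these assumptions each FXCH adds `HANDOFF_CYCLES + EXEC_CYCLES["fxch"]` cycles and buys nothing in return, because no other work overlaps with it.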

https://github.com/raszpl/sigrok-disk FM/MFM/RLL decoder
https://github.com/raszpl/FIC-486-GAC-2-Cache-Module (AT&T Globalyst)
https://github.com/raszpl/386RC-16 ram board
https://github.com/raszpl/440BX Reference Design adapted to Kicad

Reply 26 of 27, by Gahhhrrrlic

Posted on 2026-04-22, 22:48

Rank: Member
Posts: 484
Joined: 2017-12-05, 00:39

Oh, I agree. I'm just trying to understand the nuances between chips of the same era and class. The IIT outperforms the ULSI (well, the one I have, anyway) in most applications but just falls on its face in Quake. They share the same IEEE standard and, for the most part, the same instructions, so there must be some internal microcode differences or something to account for Quake.

Reply 27 of 27, by rasz_pl

Posted on 2026-04-23, 04:34

Rank: l33t
Posts: 4432
Joined: 2017-06-04, 00:57

Do those FPUs have documented instruction cycle times? Without pipelining, Quake will stall on long chains of FMULs, waiting for each one to finish before firing the next one (https://github.com/id-Software/Quake/blob/bf4 … _drawa.asm#L248), then waiting until the last one finishes before executing FXCH just to add the results, and then waiting for the add to finish before starting the FMULs again. The IIT might be slower on one of the instructions Quake is heavy on.

A pipelined FPU allows all those instructions to be in flight at the same time: https://www.phatcode.net/res/224/files/html/c … ml#:~:text=FXCH
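The contrast between a serialized and a pipelined FPU can be sketched as a small scheduling model. This is an idealised, in-order model with made-up latencies (all 3 cycles). The `FpuOp` class and the chain of ops are hypothetical and only loosely mimic the multiply-then-add pattern described above; FXCH is not modelled at all.

```python
from dataclasses import dataclass, field


# Hypothetical latencies only; not real figures for any FPU in this thread.
@dataclass
class FpuOp:
    name: str
    latency: int                                   # cycles from start to result
    deps: list[int] = field(default_factory=list)  # indices of ops whose results are needed


def serial_cycles(ops):
    """Non-pipelined FPU: each op starts only after the previous op finishes,
    so the total is simply the sum of the latencies."""
    return sum(op.latency for op in ops)


def pipelined_cycles(ops):
    """Idealised in-order pipelined FPU: at most one op starts per cycle,
    and an op waits only for the results it actually depends on."""
    done = []       # completion cycle of each op so far
    next_start = 0  # earliest cycle the next op may start (in-order issue)
    for op in ops:
        ready = max((done[d] for d in op.deps), default=0)
        start = max(next_start, ready)
        done.append(start + op.latency)
        next_start = start + 1
    return max(done, default=0)


# Simplified Quake-style pattern: four independent multiplies, then a tree of adds.
chain = [
    FpuOp("fmul", 3), FpuOp("fmul", 3), FpuOp("fmul", 3), FpuOp("fmul", 3),
    FpuOp("fadd", 3, [0, 1]), FpuOp("fadd", 3, [2, 3]), FpuOp("fadd", 3, [4, 5]),
]
print(serial_cycles(chain), pipelined_cycles(chain))  # 21 12
```

With these numbers the non-pipelined FPU needs 21 cycles, while the pipelined one overlaps the independent multiplies and finishes in 12.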

Without pipelining, you would have to carefully fine-tune FPU calculations and interleave them with other, unrelated integer code to hide FPU stalls.
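The interleaving approach can be sketched the same way. This Python model assumes the CPU keeps running integer instructions while the coprocessor is busy and only waits when it reaches the next FPU instruction; the latencies and integer-work sizes are hypothetical.

```python
# Toy model of hiding FPU latency behind unrelated integer code on a
# 386 + external coprocessor. Assumption: the CPU keeps executing integer
# instructions while the FPU is busy and only waits when it reaches the
# next FPU instruction. All cycle counts are hypothetical.


def back_to_back(fpu_latency, int_work):
    """FPU op i runs, then its block of integer work runs, with no overlap."""
    if len(fpu_latency) != len(int_work):
        raise ValueError("need one integer-work entry per FPU op")
    return sum(lat + work for lat, work in zip(fpu_latency, int_work))


def interleaved(fpu_latency, int_work):
    """Integer work placed after FPU op i runs while op i is still computing;
    the next FPU op can start once both have finished."""
    if len(fpu_latency) != len(int_work):
        raise ValueError("need one integer-work entry per FPU op")
    t = 0
    for lat, work in zip(fpu_latency, int_work):
        t += max(lat, work)
    return t


fmuls = [10, 10, 10]  # hypothetical FMUL latencies
work = [8, 8, 8]      # hypothetical integer work slotted after each FMUL
print(back_to_back(fmuls, work), interleaved(fmuls, work))  # 54 30
```

Once the integer work per slot reaches the FPU latency, the FPU time is fully hidden; beyond that point the integer code becomes the bottleneck.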
I think Quake486 (https://github.com/goshhhy/486quake) went with the much more basic idea of just minimizing the number of FPU instructions (getting rid of FXCH, since it doesn't help without pipelining) plus some reordering.
For example, this https://github.com/id-Software/Quake/blob/bf4 … e/d_draw.s#L121
becomes this https://github.com/goshhhy/486quake/blob/27bd … 6/d_draw.s#L125
Funnily enough, this doesn't seem to make much difference on 486 CPUs. The Pentium and K6 both get a nice ~20% speed bump, pretty equally, so it's not clear whether the speed gain comes from messing with the FPU code or from something else.
One surprising thing about Quake486 is that it actually makes the Pentium run faster despite de-optimizing FXCH-heavy code 😮
