VOGONS


Reply 20 of 24, by AlexZ

Rank Oldbie

I have seen cache issues discussed in algorithm interviews; people who work in C/C++ care about them. It is more about a few key algorithms/functions being cache optimized than the whole app.
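A minimal sketch of what "one key function being cache optimized" can look like: a matrix transpose that walks the data in small tiles instead of striding across whole rows. The function name and the tile size are hypothetical, and Python itself will not show a speedup here; the point is only the access pattern, which in C keeps both the source tile and the destination tile inside the cache while they are being worked on.

```python
def transpose_blocked(matrix, block=4):
    """Transpose a square list-of-lists one block x block tile at a time."""
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    # Visit the matrix tile by tile so reads and writes stay in a small region.
    for row0 in range(0, n, block):
        for col0 in range(0, n, block):
            for r in range(row0, min(row0 + block, n)):
                for c in range(col0, min(col0 + block, n)):
                    out[c][r] = matrix[r][c]
    return out


if __name__ == "__main__":
    m = [[r * 5 + c for c in range(5)] for r in range(5)]
    for row in transpose_blocked(m, block=2):
        print(row)
```

The rest of the program stays untouched; only the hot loop changes shape, which is the sense of "a few key functions" above.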

I would speculate that one of the reasons why Cyrix put just 1KB of cache on the 486DLC is that they were aware of the cache flush issue, couldn't solve it on the 386 platform and decided not to make the CPU more expensive than needed. They didn't sell well anyway; there weren't that many boards that supported them.

Pentium III 900E,ECS P6BXT-A+,384MB,GeForce FX 5600, Voodoo 2,Yamaha SM718
Turion 64 MT-40@2.4Ghz,Gigabyte GA-K8NE,2GB,GeForce GTX 275,Audigy 2ZS
Phenom II X4 955,Gigabyte GA-MA770-UD3,8GB,GeForce GTX 780
Vishera FX-8370,Asus 990FX,32GB,GeForce GTX 980 Ti

Reply 21 of 24, by MikeSG

Rank Oldbie
rasz_pl wrote on 2026-06-02, 02:38:

Ah, 386 chipset + 486 CPU, perfect! Now run tests that don't bottleneck on FPU and VGA 😀 Doom might be OK-ish; it does 8-bit video writes when drawing columns, and VGA will still bottleneck on a faster CPU. FastDoom is even better, as it renders to a RAM buffer and copies to video in bulk.

I don't recall ever seeing cachechk results from such a combo either 🙁 It would be a good comparison to the 486SXL2 @80MHz running on the (ALi M1429) ECS PANDA 386V, which is the opposite of the case we are wondering about: it's a 486 chipset on a 386 motherboard running a 486-like Cyrix over the 386 bus :-] 😮
Re: Custom interposer module for TI486SXL2-66 PGA168 to PGA132 - HELP!
486SXL2 @80Mhz + ALi M1429
L2 27 us/KB

Attached is the cachechk result for the 386DX SiS Rabbit hybrid with a DX4-100. FSB is 33MHz, L2 cache is 64KB, and L2 reads at 26 us/KB. L1 cache flies though.

The fastest frame rate I've seen is in Duke Nukem 3D: ~50 FPS using a VESA linear frame buffer mode at 320x200 on a Chips 65545. Otherwise the maximum is around 25 FPS (facing a wall), limited by either an ISA or a RAM bottleneck.

rasz_pl wrote on 2026-06-02, 02:38:

How do you write such an application? In the nineties even geniuses like Abrash and Carmack didn't bother, and Quake speeds up linearly all the way to 2MB of L2 and would probably go beyond: https://dependency-injection.com/2mb-cache-benchmarks/ The Quake codebase is not cache aware; neither total cache size nor cache line width was a factor during development. It was simply too much to worry about data locality when shipping a DOS game in the nineties. Nowadays everyone tries to fit the hot path in L1, but even then you can't predict cache sizes. Data-oriented programming only became popular with bad/convoluted console architectures, where you simply had to worry about data layout to avoid huge latency penalties.

You'd be able to see when performance starts to degrade because you're holding onto too much data to render one frame. You couldn't access all maps, all textures, all things in the game at once... You'd keep the room size small, reuse the same enemy, the same 64x64 textures (per level)... The maps themselves are pre-rendered lists of coordinates and lighting... In Doom 3 though they were taking data off the HDD mid-game and made everyone upgrade to SSDs.

Reply 22 of 24, by MikeSG

Rank Oldbie
AlexZ wrote on 2026-06-02, 06:30:

I would speculate that one of the reasons why Cyrix put just 1KB of cache on the 486DLC is that they were aware of the cache flush issue, couldn't solve it on the 386 platform and decided not to make the CPU more expensive than needed. They didn't sell well anyway; there weren't that many boards that supported them.

Cache also takes up a lot of silicon area, and a larger area means more chances for defects, so the cache size has to be kept down to get a good yield for the CPU.

Reply 23 of 24, by rasz_pl

Rank l33t
MikeSG wrote on 2026-06-02, 12:23:

cachechk-386-dx4-100.jpg
Attached is the cachechk result for the 386DX SiS Rabbit hybrid with a DX4-100. FSB is 33MHz, L2 cache is 64KB, and L2 reads at 26 us/KB. L1 cache flies though.

26 us/KB is fantastic for a 386 chipset; even your RAM speed is pretty much in line with the fastest 486 chipsets 😮 I seem to suck at theretroweb and wasn't able to find dual-socket SiS Rabbit boards 🙁

The numbers I cited earlier were for DX2 @66
>3-2-2-2: 32 us/KB
>2-1-1-1: 24 us/KB

SX @33
37 us/KB Compaq Re: Compaq (197005-001) cache module - open source reproduction (no idea if the cache was set to 2-1-1-1 (Turbo) or 3-2-2-2 (fast))

DX4 @100
3-2-2-2: 28 us/KB ALi M1489/DataExpert EXP8449 Re: 486 dx4-100 outperforms P100 . VT82C486 FIC 486-GAC-2 Re: FIC 486-GAC-2 and proprietary cache module - Prototype works! Revision in progress.
2-1-1-1: 20 us/KB UM82C881 Hot433 https://www.youtube.com/watch?v=wm_QoqwQ1hU&t=180 . VIA VT82C486 FIC 486-GAC-2 Re: FIC 486-GAC-2 and proprietary cache module - Prototype works! Revision in progress. . VLSI VL82C481 Compaq PROLINEA Re: Compaq (197005-001) cache module - open source reproduction

AMD Am5x86-P75 @133
2-1-1-1 18 us/KB VLSI VL82C481 Compaq PROLINEA Re: Compaq (197005-001) cache module - open source reproduction

DX4 100 @150 using 50MHz FSB 😮
2-1-1-1 14 us/KB VLSI VL82C481 Compaq PROLINEA Re: Compaq (197005-001) cache module - open source reproduction
2-1-1-1 13 us/KB with more tweaks to memory controller Re: Compaq (197005-001) cache module - open source reproduction

In conclusion, no burst support means a ~30% slower L2 cache no matter the CPU speed, but the impact is less significant with faster CPUs. On a DX2, 3-2-2-2 timings make L2 barely faster than RAM, while a DX4 is somehow capable of fetching from L2 much faster, making the L2-to-RAM difference almost 2x even at 3-2-2-2 timings.
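For a rough feel of where those numbers come from, here is a back-of-the-envelope sketch. It assumes the x-y-y-y figures are bus clocks for the four 32-bit transfers of one 16-byte 486 line fill, and it assumes a 33 MHz bus for the DX2-66 and DX4-100 boards cited above. The function name is made up, and the result is an idealized floor, not what cachechk measures.

```python
def line_fill_us_per_kb(timing, bus_mhz, line_bytes=16):
    """Idealized time to stream 1 KB through back-to-back line fills, in us."""
    clocks_per_line = sum(timing)        # e.g. 3+2+2+2 = 9 bus clocks
    lines_per_kb = 1024 // line_bytes    # 64 lines of 16 bytes
    return clocks_per_line * lines_per_kb / bus_mhz


# Assumed 33 MHz bus; both timings taken from the list above.
slow = line_fill_us_per_kb((3, 2, 2, 2), bus_mhz=33.0)
fast = line_fill_us_per_kb((2, 1, 1, 1), bus_mhz=33.0)

# Measured cachechk L2 figures quoted in this post: (3-2-2-2, 2-1-1-1) in us/KB.
measured = {"DX2 @66": (32, 24), "DX4 @100": (28, 20)}

if __name__ == "__main__":
    print(f"ideal 3-2-2-2: {slow:.1f} us/KB, ideal 2-1-1-1: {fast:.1f} us/KB")
    for cpu, (t3222, t2111) in measured.items():
        penalty = (t3222 / t2111 - 1.0) * 100.0
        print(f"{cpu}: 3-2-2-2 takes {penalty:.0f}% longer than 2-1-1-1")
```

On paper the slower timing costs 9 clocks per line against 5, yet the measured gap is only about a third. One possible reading is that a good part of each measured microsecond is spent on the CPU side rather than on the bus, but that is a guess, not something these numbers prove.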

https://github.com/raszpl/sigrok-disk FM/MFM/RLL decoder
https://github.com/raszpl/FIC-486-GAC-2-Cache-Module (AT&T Globalyst)
https://github.com/raszpl/386RC-16 ram board
https://github.com/raszpl/440BX Reference Design adapted to Kicad

Reply 24 of 24, by MikeSG

Rank Oldbie

I'll agree that in synthetic tests there's a 30% difference, but games that run their main loops in a small memory footprint are different. L1 has a "least recently used" structure, so the hot data stays there; L2 is not used as frequently, and RAM rarely.

I dug up the 486 motherboard cachechk results.
486 (SiS EISA - 85C406/85C411/85C431). L2: 20 us/KB (55.9MB/s), RAM: 30 us/KB (36.7MB/s). Write: 16 us/KB.
386 (SiS - 310/320/330). L2: 26 us/KB (42.5MB/s), RAM: 43 us/KB (25.9MB/s). Write: 24 us/KB.

On the 486 board, L2 was 31% faster and RAM 41% faster.
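The percentages can be reproduced from the MB/s figures in the two cachechk lines above. This is a small sketch with a made-up helper name; the numbers are the ones quoted in this post.

```python
def percent_faster(fast_mb_s, slow_mb_s):
    """How much higher the first throughput is than the second, in percent."""
    return (fast_mb_s / slow_mb_s - 1.0) * 100.0


# (486 board MB/s, 386 board MB/s), as reported by cachechk above.
results = {
    "L2": (55.9, 42.5),
    "RAM": (36.7, 25.9),
}

if __name__ == "__main__":
    for name, (board_486, board_386) in results.items():
        pct = percent_faster(board_486, board_386)
        print(f"{name}: {pct:.1f}% faster on the 486 board")
```

This gives roughly 31.5% for L2 and 41.7% for RAM, matching the 31% and 41% above. Working from the rounded us/KB values instead (26/20 and 43/30) gives about 30% and 43%.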

There's no dual-socket SiS 310/320/330 (SiS Rabbit) board anywhere that I can find except the Acer V5 (not on retroweb). I need to upload a picture some day.

Those cache module times are crazy.