rasz_pl wrote on 2023-04-09, 08:55:
Doom, while period correct, is not the greatest benchmark of VLB cards due to the way it writes to memory one byte at a time, because it's rendering columns of pixels. https://fabiensanglard.net/doomIphone/doomCla … sicRenderer.php
And it's the ET4000 that loses more potential performance from this than the Cirrus Logic chip: the CL card runs at 50% of its available bus width (which is 16 bits), whereas the ET4000/W32 runs at only 25% of its available bus width (which is 32 bits).
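For context, the column inner loop in question looks roughly like this (a simplified sketch of the classic R_DrawColumn; the colormap lookup and texture mid-point handling are left out). Every store is a single byte, one screen row apart, so each bus transaction carries only 8 useful bits:

```c
#include <stdint.h>

#define SCREEN_WIDTH 320   /* Mode 13h / Mode-X linear width */

/* Simplified Doom-style column loop: one byte per store, each
   SCREEN_WIDTH bytes apart, wasting most of a 16/32-bit bus. */
void draw_column(uint8_t *dest, int count,
                 const uint8_t *source,      /* 128-texel column */
                 uint32_t frac, uint32_t fracstep)
{
    while (count-- > 0) {
        *dest = source[(frac >> 16) & 127]; /* texture sample */
        dest += SCREEN_WIDTH;               /* next row, same column */
        frac += fracstep;
    }
}
```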
20 years ago, I toyed around with the idea of rendering 4 columns into a cache-hot buffer in main memory, then interleaving those 4 columns into 32-bit values that can be written as-is to the graphics card. If you render 4 columns that are 4 pixels apart instead of 4 adjacent columns, the same trick also works for Mode-X schemes.

That's how my prototype renderer core (nothing fancy or to show off here) worked: it first collected the drawing parameters for the 320 columns (probably in a stupid, uneducated way; I tried to re-invent BSP trees without reading the relevant literature, to learn something), then picked the columns that could be combined into dword writes, rendered them into a 4-column (800-byte) buffer, and transferred them to video memory while that 800-byte buffer was still in L1. (Obviously, that technique targets a 486 processor with considerably more than 1K of cache, so the original 486SLC is out. A write-back L1 is likely preferable, to keep the 800-byte sets from going to L2 at all.)
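The render-then-interleave step described above can be sketched as follows. This is my own minimal illustration, not the prototype's actual code: the staging buffer is laid out row-major with a stride of 4, so once all 4 columns are rendered, each 4-byte row is already the interleaved dword for the 4 adjacent screen columns, and the flush loop performs one 32-bit store per row:

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_WIDTH  320
#define COLUMN_HEIGHT 200

/* 4-column staging buffer: 4 * 200 = 800 bytes, small enough to stay
   in a 486's L1 cache while being filled and flushed. */
typedef uint8_t stage_t[COLUMN_HEIGHT][4];

/* Hypothetical column renderer: fills column `c` of the staging buffer
   (stride 4), standing in for the real texture-mapping inner loop. */
static void render_column(stage_t buf, int c, uint8_t color)
{
    for (int y = 0; y < COLUMN_HEIGHT; y++)
        buf[y][c] = color;
}

/* Flush: each 4-byte row of the staging buffer is already the dword
   for screen columns x0..x0+3, so every store to video memory uses
   the full 32-bit bus width. (For Mode-X, the 4 columns would be
   4 pixels apart in one plane instead of adjacent.) */
void flush_four_columns(uint8_t *vram, int x0, stage_t buf)
{
    for (int y = 0; y < COLUMN_HEIGHT; y++)
        memcpy(vram + y * SCREEN_WIDTH + x0, buf[y], 4); /* one dword store */
}
```

The memcpy of 4 bytes compiles down to a single 32-bit move on any reasonable compiler, without the alignment caveats of casting the byte pointer to uint32_t*.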
I might take a peek at FastDoom to see whether they also invented this technique (no offense taken, of course), or whether they had even smarter ideas.