Cirrus Logic GD5429 VLB significantly slower than a TSENG ET4000

Reply 20 of 33, by 386SX

Posted on 2023-03-25, 07:42

Rank: l33t
Posts: 3192
Joined: 2014-10-27, 12:56

I don't remember the benchmark numbers, but the last time I tried a DX4-100 with the GD5429 VLB it looked like a fast config, apart from being a bit unstable when I tried to increase performance with BIOS settings. The GD5429 maybe wasn't the fastest video chip, but it wasn't the slowest either. It worked with higher clocks, and for the Windows GUI it also had a faster BitBLT engine and fast video RAM modules.

Reply 21 of 33, by mbarszcz

Posted on 2023-03-25, 20:19

Rank: Newbie
Posts: 64
Joined: 2021-11-24, 16:11

Changing my BIOS setting "Local Ready Delay Setting" from "Delay 1T" to "None" has brought my VESA memory speed up to 8512KB/s. I ran a few more tests, and with mkarcher's utility I was able to increase the memory clock from the default 50MHz to 60MHz and 70MHz. The Local Ready Delay = None and a 70MHz memory clock really brought the Cirrus up to very near the performance of the TSENG ET4000. Interestingly, the ET4000 won't run with the Local Ready Delay Setting at None. It needs 1T.

                                   ET4000/W32P      CL-GD5429        CL-GD5429        CL-GD5429        CL-GD5429        % of ET4000
---------------------------------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
CPU / Bus                          100MHz / 33MHz   100MHz / 33MHz   100MHz / 33MHz   100MHz / 33MHz   100MHz / 33MHz   100MHz / 33MHz
Local Ready Delay Setting          Delay 1T         Delay 1T         None             None             None             None
MCLK (MHz)                         n/a              50.12            50.12            60.86            69.81            69.81
3D Bench (Faster PCs) fps          69.2             65.5             67.6             69.1             69.2             100.0%
Chris’s 3D Benchmark fps           48.2             41.3             45.6             43.4             43.4             90.0%
Chris’s 3D Benchmark 640x480 fps   14.2             13.5             14.1             14.2             14.2             100.0%
Doom Max Details fps               40.86            38.2             39.9             41               41.1             100.6%
PC Player Benchmark fps            20.8             19.6             20               20.1             20.1             96.6%
Speedsys VESA Memory Speed KB/s    22888            7672             8455             9300             9940             43.4%

Thanks everyone for helping me figure this out.

Reply 22 of 33, by mkarcher

Posted on 2023-03-25, 21:46

Rank: l33t
Posts: 3317
Joined: 2019-01-19, 16:29
Location: Germany

mbarszcz wrote on 2023-03-25, 20:19:

The Local Ready delay = none and a 70MHz memory clock really brought the Cirrus up to very near the performance of the TSENG ET4000.

I wouldn't call it "the Cirrus is near the performance of the ET4000", because raw throughput is still less than 50% of the ET4000 (which is not surprising), but on the other hand, the Cirrus card got fast enough that in most practical applications, the graphics card no longer is the bottleneck. So all the excess performance of the ET4000 card goes unused in 3D Bench, Chris (high) and Doom, and even in PCPlayer bench and Chris (low), the performance difference isn't that significant.
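
To put numbers on that, here is a small sketch that recomputes the Cirrus-to-ET4000 ratio for every row, using the figures from mbarszcz's table (the ET4000 column versus the GD5429 with Local Ready Delay = None and MCLK 69.81). The dictionary layout and the function name are just my choice for illustration.

```python
# Figures copied from the benchmark table in the post above:
# (ET4000/W32P, CL-GD5429 with Ready Delay = None and MCLK = 69.81 MHz)
results = {
    "3D Bench (Faster PCs) fps": (69.2, 69.2),
    "Chris's 3D Benchmark fps": (48.2, 43.4),
    "Chris's 3D Benchmark 640x480 fps": (14.2, 14.2),
    "Doom Max Details fps": (40.86, 41.1),
    "PC Player Benchmark fps": (20.8, 20.1),
    "Speedsys VESA Memory Speed KB/s": (22888, 9940),
}

def relative(et4000, cirrus):
    """Cirrus result as a percentage of the ET4000 result."""
    return 100.0 * cirrus / et4000

ratios = {name: relative(*pair) for name, pair in results.items()}

for name, pct in ratios.items():
    print(f"{name:34s} {pct:6.1f}%")
```

Only the raw VESA memory speed row comes out below 50%; the game benchmarks all land at 90% or higher, which is the "no longer the bottleneck" effect.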

mbarszcz wrote on 2023-03-25, 20:19:

Interestingly the Et4000 won't run with the Local Ready Delay Setting at None. It needs 1T.

I'm surprised that you need the Ready Delay at 33MHz with an ET4000 card. If you need that delay, it means that the RDY signal from the ET4000 card arrives so late that it's undefined which bus clock cycle it applies to. Delaying the signal until the start of the next cycle gives an unambiguous interpretation. I don't think there is anything you can do about it, though.

Reply 23 of 33, by drosse1meyer

Posted on 2023-03-25, 22:42

Rank: Member
Posts: 417
Joined: 2020-11-17, 17:43
Location: United States

Yes that looks much better. Glad you got it working.

P1: Packard Bell - 233 MMX, Voodoo1, 64 MB, ALS100+
P2-V2: Dell Dimension - 400 Mhz, Voodoo2, 256 MB
P!!! Custom: 1 Ghz, GeForce2 Pro/64MB, 384 MB

Reply 24 of 33, by dj_pirtu

Posted on 2023-03-29, 09:56

Rank: Member
Posts: 269
Joined: 2020-01-14, 11:32

Some time ago I was benchmarking Cirrus VLB cards and found the same thing: the 5428 was noticeably slower than the 5434.

And that 5434 is really a super fast card with 2MB of memory. It takes a 50MHz VLB bus clock and puts 486 PCI systems to shame.

Reply 25 of 33, by Anonymous Coward

Posted on 2023-04-01, 16:14

Rank: l33t++
Posts: 5005
Joined: 2008-03-20, 05:37
Location: Shandong, China

I've never been a fan of 542x VLB cards, but I have a 5429 that I tested briefly. I remember encountering the LFB limitation. I figured either my card was bad or I was using a bad version of UniVBE. Do these cards really have a 24-bit addressing limitation? That's pretty lame. I remember reading they had a 16-bit interface, but I didn't realise they were stuck with ISA addressing too. It almost seems more like something Trident would do.

What year did the 5429 come out? I think I remember something really late like 1995. It seems strange since the 5434 was already available in 1994.

"Will the highways on the internets become more few?" -Gee Dubya
V'Ger XT|Upgraded AT|Ultimate 386|Super VL/EISA 486|SMP VL/EISA Pentium

Reply 26 of 33, by rasz_pl

Posted on 2023-04-09, 08:55

Rank: l33t
Posts: 4208
Joined: 2017-06-04, 00:57

Doom, while period correct, is not the greatest benchmark of VLB cards, because it writes to video memory one byte at a time as it renders columns of pixels. https://fabiensanglard.net/doomIphone/doomCla … sicRenderer.php

Try https://github.com/viti95/FastDoom in VESA mode, which makes Doom render into a RAM buffer and then blast the framebuffer to VGA in one go at max speed. Another option is trying some other games, for example testing in Transport Tycoon (although I don't know how one would measure the performance difference).

https://github.com/raszpl/FIC-486-GAC-2-Cache-Module for AT&T Globalyst
https://github.com/raszpl/386RC-16 memory board
https://github.com/raszpl/440BX Reference Design adapted to Kicad
https://github.com/raszpl/Zenith_ZBIOS MFM-300 Monitor

Reply 27 of 33, by mkarcher

Posted on 2023-04-09, 12:00

Rank: l33t
Posts: 3317
Joined: 2019-01-19, 16:29
Location: Germany

rasz_pl wrote on 2023-04-09, 08:55:

Doom, while period correct, is not the greatest benchmark of VLB cards, because it writes to video memory one byte at a time as it renders columns of pixels. https://fabiensanglard.net/doomIphone/doomCla … sicRenderer.php

And it's the ET4000 that loses more potential performance due to this than the Cirrus Logic chip, because the CL card runs at 50% of its available bus width (which is 16 bits), whereas the ET4000/W32 runs at 25% of its available bus width (which is 32 bits).
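
The arithmetic behind those two percentages, as a tiny sketch. The bus widths are the ones stated above; the function name is only for illustration.

```python
def bus_utilisation(write_bits, bus_bits):
    """Fraction of the data bus that carries payload on each write."""
    return min(write_bits, bus_bits) / bus_bits

# Bus widths as stated above: 16 bits for the Cirrus Logic card,
# 32 bits for the ET4000/W32. Doom's column renderer writes 8 bits at a time.
for card, bus_bits in (("CL-GD5429", 16), ("ET4000/W32", 32)):
    print(f"{card}: {bus_utilisation(8, bus_bits):.0%} of bus width used")
```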

20 years ago, I toyed around with the idea of rendering 4 columns into hot cache of main memory, then interleaving these 4 columns into 32-bit values that can be written as-is into the graphics card. If you render 4 columns that are 4 pixels apart instead of 4 columns that are next to each other, this also works for Mode-X schemes. That's how my prototype renderer core (nothing fancy or to show off here) worked: It first collected the drawing parameters for the 320 columns (probably in a stupid uneducated way, I tried to re-invent BSP trees without reading the relevant literature to learn something), and then picked the columns that can be combined into dword writes, to have them rendered into a 4-column (800 byte) buffer and transferred to video memory with that 800-byte buffer still in L1. (Obviously, that technique is targeted at a 486 processor with considerably more than 1K of L1 cache. The original 486SLC is out. L1WB is likely preferable to avoid the 800-byte sets going to L2 at all.)
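
A minimal sketch of the interleaving step described above, written in Python only to show the data movement (a real version would be 486 assembly or C). It assumes a 320x200 8-bit linear frame, four already-rendered columns of 200 bytes each, and little-endian dwords so that the first column lands at the lowest address. The function names are hypothetical and not taken from any existing renderer.

```python
import struct

WIDTH, HEIGHT = 320, 200

def interleave_columns(cols):
    """Pack four rendered columns (HEIGHT bytes each) into one dword per row."""
    assert len(cols) == 4 and all(len(c) == HEIGHT for c in cols)
    # Row y gets the bytes cols[0][y], cols[1][y], cols[2][y], cols[3][y]
    # packed little-endian, so cols[0] ends up at the lowest address.
    return [struct.unpack("<I", bytes(c[y] for c in cols))[0]
            for y in range(HEIGHT)]

def blit_dwords(framebuffer, x, dwords):
    """Write one dword per scanline at column x (x must be a multiple of 4)."""
    assert x % 4 == 0
    for y, value in enumerate(dwords):
        struct.pack_into("<I", framebuffer, y * WIDTH + x, value)

# Demo: four adjacent columns filled with the constant colours 1..4.
columns = [bytes([colour]) * HEIGHT for colour in (1, 2, 3, 4)]
framebuffer = bytearray(WIDTH * HEIGHT)   # stands in for linear video memory
blit_dwords(framebuffer, 8, interleave_columns(columns))
```

The four source columns together are 4 x 200 = 800 bytes, which is the buffer size mentioned above. For the Mode-X variant the four columns would be x, x+4, x+8 and x+12, since those are the pixels that sit next to each other within one plane.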

I might take a peek into FastDoom whether they also invented this technique (no offense taken, of course), or whether they had even smarter ideas.

Reply 28 of 33, by ViTi95

Posted on 2023-04-09, 13:50

Rank: Oldbie
Posts: 555
Joined: 2017-02-14, 22:18

rasz_pl wrote on 2023-04-09, 08:55:

Doom, while period correct, is not the greatest benchmark of VLB cards, because it writes to video memory one byte at a time as it renders columns of pixels. https://fabiensanglard.net/doomIphone/doomCla … sicRenderer.php

Try https://github.com/viti95/FastDoom in VESA mode, which makes Doom render into a RAM buffer and then blast the framebuffer to VGA in one go at max speed. Another option is trying some other games, for example testing in Transport Tycoon (although I don't know how one would measure the performance difference).

That's correct, Doom only uses 8-bit writes to VRAM. Only the FastDoom mode 13h and mode VBR (VESA with backbuffer) executables use 32-bit writes to VRAM; for example, I noticed a huge speedup with Trident TGUI 9440AGi VLB cards (full 32-bit VLB support). Those modes also benefit a lot from the linear VRAM layout.

mkarcher wrote on 2023-04-09, 12:00:
I might take a peek into FastDoom whether they also invented this technique (no offense taken, of course), or whether they had even smarter ideas.

I'm able to do 32-bit writes to VRAM because the scene is rendered in a linear RAM backbuffer (using the Heretic rendering functions, which use 8-bit writes), so it is possible to copy that linear backbuffer to VRAM using 32-bit copies. On some systems it is faster to use the backbuffer; it depends a lot on the CPU/RAM/GPU combination used. Also, Mode X is really slow on some cards, even if they are VLB compatible. Maybe the OUT calls to select between planes cause those slowdowns.
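
As a rough illustration of why the backbuffer helps (a sketch, not FastDoom's actual code): a 320x200 8-bit frame copied byte by byte needs 64000 writes to VRAM, while the same linear buffer copied as 32-bit words needs 16000. The Python below only models the copy and counts the writes; the buffer and function names are made up for the example.

```python
WIDTH, HEIGHT = 320, 200

# Fill a fake backbuffer with a simple test pattern.
backbuffer = bytearray((x + y) & 0xFF for y in range(HEIGHT) for x in range(WIDTH))
vram = bytearray(WIDTH * HEIGHT)          # stands in for linear video memory

def copy_32bit(src, dst):
    """Copy src to dst as 32-bit words; returns the number of writes issued."""
    # "I" is the native unsigned int, which is 4 bytes on common platforms.
    src_words = memoryview(src).cast("I")
    dst_words = memoryview(dst).cast("I")
    for i in range(len(src_words)):
        dst_words[i] = src_words[i]       # one 32-bit write per iteration
    return len(src_words)

dword_writes = copy_32bit(backbuffer, vram)
byte_writes = len(backbuffer)             # what an 8-bit copy would need
print(byte_writes, dword_writes)
```

This only works because the backbuffer is linear; with the column renderer writing straight to VRAM, consecutive bytes of a column are a whole scanline apart and cannot be merged into one dword.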

https://www.youtube.com/@viti95

Reply 29 of 33, by mockingbird

Posted on 2023-04-09, 16:10

Rank: Oldbie
Posts: 1432
Joined: 2013-06-17, 02:57

ViTi95 wrote on 2023-04-09, 13:50:
That's correct, Doom only uses 8-bit writes to VRAM. Only the FastDoom mode 13h and mode VBR (VESA with backbuffer) executables use 32-bit writes to VRAM; for example, I noticed a huge speedup with Trident TGUI 9440AGi VLB cards (full 32-bit VLB support). Those modes also benefit a lot from the linear VRAM layout.

Off the top of your head, is there a way to execute FASTDOOMVBR.EXE without having the WAD/Episode selection menu appear so as to be able to do a qualitative benchmark? I use FASTDOOMVBR but I would like to see just how much faster it is than regular Doom.

Thanks

(Decommissioned:)

Reply 30 of 33, by rasz_pl

Posted on 2023-04-09, 22:59

Rank: l33t
Posts: 4208
Joined: 2017-06-04, 00:57

mkarcher wrote on 2023-04-09, 12:00:

20 years ago, I toyed around with the idea of rendering 4 columns into hot cache of main memory, then interleaving these 4 columns

how would you know the next column is going to be drawn until you get to it? any preprocessing passes will kill any eventual framebuffer write gains

mkarcher wrote on 2023-04-09, 12:00:

I might take a peek into FastDoom whether they also invented this technique (no offense taken, of course), or whether they had even smarter ideas.

One idea ViTi95 had was to exploit 2D accelerators (8514, so mach8/32/64, S3, ET4000 etc.) by drawing each column as a linear span into graphics memory and firing an accelerator command to BitBLT it at a 90 degree angle into its final place.

My idea was to flip monitor on the side 😀 but then you have ceiling/floor rendering to worry about so no free cake 😁

Reply 31 of 33, by ViTi95

Posted on 2023-04-09, 23:09

Rank: Oldbie
Posts: 555
Joined: 2017-02-14, 22:18

mockingbird wrote on 2023-04-09, 16:10:

Off the top of your head, is there a way to execute FDOOMVBR.EXE without having the WAD/Episode selection menu appear so as to be able to do a qualitative benchmark? I use FASTDOOMVBR but I would like to see just how much faster it is than regular Doom.

Thanks

You can select the IWAD automatically with the command line parameter "-iwad <wadfile>". I'm doing several changes to FastDoom to do better automated benchmarks (output CSV results), it will be available for next release.

Reply 32 of 33, by rasz_pl

Posted on 2023-04-09, 23:57

Rank: l33t
Posts: 4208
Joined: 2017-06-04, 00:57

ViTi95 wrote on 2023-04-09, 23:09:

mockingbird wrote on 2023-04-09, 16:10:

Off the top of your head, is there a way to execute FDOOMVBR.EXE without having the WAD/Episode selection menu appear so as to be able to do a qualitative benchmark? I use FASTDOOMVBR but I would like to see just how much faster it is than regular Doom.

Thanks

You can select the IWAD automatically with the command line parameter "-iwad <wadfile>". I'm doing several changes to FastDoom to do better automated benchmarks (output CSV results), it will be available for next release.

Can you do individual frame times? 😀 The article that started serious discussion about benchmarking games was 2011's 'Inside the second: A new look at game benchmarking' https://web.archive.org/web/20121031015421/ht … me-benchmarking This and the follow-up in 2013 https://www.guru3d.com/articles-pages/fcat-be … g-review,1.html led to the invention of G-sync/Adaptive-Sync/FreeSync.
Frametime instrumentation will be able to tell you where the biggest bottlenecks are in the Doom codebase on different hardware configurations. You could then go further by selectively skipping code blocks to look in more detail: nop wall drawing, nop ceiling/floor drawing, nop screen output altogether, replay geometry instead of calculating it live, etc. Poor man's profiling.
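
A small sketch of what per-frame numbers buy you over an average fps figure. The frame times below are invented sample data and the helper names are my own; the point is only that two runs with the same average can differ a lot in their worst frames.

```python
import math

def average_fps(frame_times_ms):
    """Average frames per second over the whole run."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

def percentile(frame_times_ms, pct):
    """Nearest-rank percentile of the frame times (pct in 0..100)."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

# Invented sample data: both runs take 1000 ms for 40 frames.
steady = [25.0] * 40
spiky = [20.0] * 36 + [70.0] * 4

for name, run in (("steady", steady), ("spiky", spiky)):
    print(name, average_fps(run), percentile(run, 99), max(run))
```

Both runs report the same average, but the 99th percentile frame time exposes the four slow frames in the second one. Logging one such list per code path (walls, floors, screen output) would give the breakdown described above.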

Reply 33 of 33, by ViTi95

Posted on 2023-04-10, 01:04

Rank: Oldbie
Posts: 555
Joined: 2017-02-14, 22:18

Right now FastDoom doesn't do it, but recently I've discovered a fork of FastDoom that analyzes frametimes function by function. I haven't spent much time researching how it works, but it's a good idea and I will look into it further.

https://github.com/Pixinn/FastDoom-Bench
