VOGONS


Reply 240 of 979, by rasz_pl

Rank: l33t

mrau wrote on 2021-06-01, 09:15:

thats 9 fps on that vid;

36 vs 37 fps

mrau wrote on 2021-06-01, 09:15:

btw is there any graphics card that will do the copying from buffer to screen for us?

no, but it's possible that cards sharing RAM with the CPU (Intel 810 on really bad boards without additional RAM?) could be faster when skipping the buffer and rendering in 13h/VESA directly to video RAM.

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 241 of 979, by ViTi95

Rank: Member

Some video cards are just faster in VBE2 mode, don't know exactly why. For now it's using exactly the same code path as mode 13h (look at the latest commit in the VBE2 branch); the only changes are the video memory address (a new memory address for LFB modes, and 0xA0000 for banked modes) and the video mode setting. I've also implemented a special mode for VBE2 cards that support protected mode extensions, which gives a little extra boost when changing palettes and setting video pages (I'll test my 486@150 MHz with an ATI Rage II PCI, which supports this mode very well, even without UniVBE).
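A minimal sketch of what "same code path, different address" could look like: the renderer fills a 320x200 backbuffer in RAM and a single copy routine pushes it to whatever base address the mode provides. The names `video_target` and `blit_frame` and the `pitch` field are illustrative assumptions, not FastDoom's actual code. On real hardware `base` would point at the VGA window or at the LFB address reported by the card; here any buffer will do.

```c
#include <stddef.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Hypothetical description of where a video mode wants its pixels. */
typedef struct {
    unsigned char *base;  /* start of visible video memory for this mode */
    size_t pitch;         /* bytes from one scanline to the next */
} video_target;

/* Copy the RAM backbuffer to the target, one scanline at a time.
   When pitch equals the line width the whole frame is a single memcpy. */
static void blit_frame(const video_target *t, const unsigned char *backbuffer)
{
    if (t->pitch == SCREEN_W) {
        memcpy(t->base, backbuffer, (size_t)SCREEN_W * SCREEN_H);
        return;
    }
    for (int y = 0; y < SCREEN_H; y++)
        memcpy(t->base + (size_t)y * t->pitch,
               backbuffer + (size_t)y * SCREEN_W, SCREEN_W);
}
```

With this shape, switching between mode 13h and a VBE2 mode only means filling in a different `video_target`; the renderer itself never changes.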

Maybe I'll implement something similar to Mode Y but for VBE2 cards, as there is support for multiple video pages. It should be faster than Mode Y, but will have the same problem when rendering invisible columns (requires reading from the video card, which is very slow). For now I have to find what makes the game unstable; I think there is some kind of memory corruption that affects all executables (which is a big problem).

MrFlibble wrote on 2021-06-01, 12:09:

Just checked the new release, and the crash on MAP15 in Freed∞m persists. Interestingly, with the default Mode X binary the game just froze with the music still playing normally, while in Mode 13h it crashed to a screenful of text gibberish with music/sound a garbled mess.

I still haven't found what makes the game crash with Freedoom 2: MAP15 just crashes randomly, with all the executables.

https://www.youtube.com/@viti95

Reply 243 of 979, by appiah4

Rank: l33t++

Why does FastDoom have so many vsync/tearing issues in Mode 13h whereas Heretic, for example, does not? Can this be fixed?

Retronautics: A digital gallery of my retro computers, hardware and projects.

Reply 244 of 979, by rasz_pl

Rank: l33t

ViTi95 wrote on 2021-06-01, 17:38:

Some video cards are just faster in VBE2 mode, don't know exactly why.

me neither, but this fascinates me 😮
Is the difference visible only when linear mode is enabled, or does it also show up in VESA banked?
From the hardware standpoint it should make no difference at all. Did some vendors implement optional FIFOs on linear memory access only, on >1MB accesses?
Does it depend on the VGA chip or the bus used? Is it a quirk of the VL Bus? Is accessing above the 1MB window somehow faster? VLB was such a crazy idea, 'let's directly tap the CPU address and full 32-bit data bus'.
Some VLB VGA cards cheated, like the Cirrus Logic CL-GD542x, which has a 16-bit host interface but handles 32-bit accesses by splitting them into two cycles, invisibly to the host. It looks bad in VIDSPEED, with the GD542x reaching the same ~10MB/s in both 16- and 32-bit modes, but pretty much no DOS games wrote using 32-bit accesses in the short VLB era, so game scores remained similar to the "proper" 32-bit competition.

ViTi95 wrote on 2021-06-01, 17:38:

For now it's using exactly the same code path as mode 13h (look at the latest commit in the VBE2 branch)

I'm sorry, I clicked on the first commit and forgot there were two more 🙁

ViTi95 wrote on 2021-06-01, 17:38:

Maybe I'll implement something similar to Mode Y but for VBE2 cards, as there is support for multiple video pages. It should be faster than Mode Y, but will have the same problem when rendering invisible columns (requires reading from the video card, which is very slow).

Is it really that slow in the case of VLB/PCI cards? ET4000 VLB does >30MB/s 32-bit writes, most VLB cards boasted 0 wait states, implemented large FIFOs, and operated at full CPU FSB frequency, so there should be no difference between writing to RAM and writing to a fast VLB card.
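To put those MB/s figures into frame terms, here is a small back-of-the-envelope helper. It assumes a 320x200 8-bit frame (64,000 bytes) and 1 MB = 1,000,000 bytes; both function names are made up for this example.

```c
/* Time in milliseconds to push `bytes` at `bytes_per_second`. */
static double copy_time_ms(double bytes, double bytes_per_second)
{
    return bytes / bytes_per_second * 1000.0;
}

/* Upper bound on frames per second if the copy were the only cost. */
static double copy_bound_fps(double bytes, double bytes_per_second)
{
    return bytes_per_second / bytes;
}
```

Under these assumptions a full-frame copy at 30MB/s takes a little over 2 ms, while at the GD542x's ~10MB/s it takes 6.4 ms, which still leaves a ceiling above 150 fps for the copy alone.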

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 245 of 979, by ViTi95

Rank: Member

appiah4 wrote on 2021-06-02, 08:00:

Why does FastDoom have so many vsync/tearing issues in Mode 13h whereas Heretic, for example, does not? Can this be fixed?

I'll check what I'm doing differently compared to Heretic and try to fix it, I'm still learning how to program video cards ^^

rasz_pl wrote on 2021-06-02, 08:23:

...

I guess some video cards internally optimize the way they draw the framebuffer in VBE2 modes compared to mode 13h, even in banked modes. VGA modes have lots of quirks and internal registers that maybe are not needed in VBE2 modes. I've tested my AMD 5x86 (150MHz) with a SiS 6326 PCI (4 MB) and the results are pretty good:

https://www.youtube.com/watch?v=minIT5dBCVQ

  • Mode Y: 55.481 fps
  • Mode 13h: 59.412 fps (+7%)
  • Mode VBE2 (LFB): 75.308 fps (+35.7%)
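
The percentages in this list are gains relative to the Mode Y result. A tiny helper (the name `gain_percent` is just for illustration) makes the arithmetic explicit:

```c
/* Percentage gain of `fps` over `baseline_fps`; 0 means no change. */
static double gain_percent(double fps, double baseline_fps)
{
    return (fps / baseline_fps - 1.0) * 100.0;
}
```

Calling it with 59.412 and 75.308 against the 55.481 fps baseline gives roughly 7% and 35.7%, matching the figures above.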

Next time I'll try the same with a VLB system, maybe it will also benefit from this mode. The idea of a new mode for VBE2 that draws columns and visplanes directly in video memory (as in Mode Y) is to avoid writing first to RAM and then copying the data to VRAM. That should be faster than mode 13h or VBE2 with a backbuffer if there isn't much overdraw and there isn't much data to be read (invisible objects).
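
As a rough sketch of the direct-to-VRAM idea, a column drawer for a linear 8-bit framebuffer only needs the framebuffer base and its pitch. This is a simplified stand-in loosely modelled on the classic Doom column loop, not FastDoom's code; the function name, the 128-texel column height and the parameter list are assumptions.

```c
#include <stddef.h>

/* Draw one vertical run of texels straight into a linear 8-bit framebuffer.
   `frac` and `step` are 16.16 fixed point; the texture column is assumed
   to be 128 texels tall. */
static void draw_column_direct(unsigned char *fb, size_t pitch,
                               int x, int y_top, int y_bottom,
                               const unsigned char *texcol,
                               unsigned int frac, unsigned int step)
{
    unsigned char *dest = fb + (size_t)y_top * pitch + (size_t)x;
    for (int y = y_top; y <= y_bottom; y++) {
        *dest = texcol[(frac >> 16) & 127];  /* write only, never read fb */
        dest += pitch;                       /* next scanline, same x */
        frac += step;
    }
}
```

Each pixel is written once and never read back, so the cost depends on write speed and on how much of the frame gets overdrawn.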

https://www.youtube.com/@viti95

Reply 246 of 979, by rasz_pl

Rank: l33t

ViTi95 wrote on 2021-06-02, 16:21:

I guess some video cards internally optimize the way they draw the framebuffer in VBE2 modes compared to mode 13h, even in banked modes. VGA modes have lots of quirks and internal registers that maybe are not needed in VBE2 modes.

They sure do, but 13h is the most basic no-frills mode. I just can't come up with a scenario where there would be something slowing it down. Would love to have someone who actually designed PC graphics chips in the eighties/nineties shed some light. I know early VGAs were super unoptimized, there is a great story about the Video Seven 1-word FIFO by Abrash in https://www.bluesnews.com/abrash/chap64.shtml That chipset is ridiculously slow, below 2MB/s writes, but was considered fast at some point 😮

ViTi95 wrote on 2021-06-02, 16:21:

SiS 6326 PCI (4 MB)

  • Mode 13h: 59.412 fps (+7%)
  • Mode VBE2 (LFB): 75.308 fps (+35.7%)

just wow

ViTi95 wrote on 2021-06-02, 16:21:

Next time I'll try the same with a VLB system, maybe it will also benefit from this mode. The idea of a new mode for VBE2 that draws columns and visplanes directly in video memory (as in Mode Y) is to avoid writing first to RAM and then copying the data to VRAM.

this should help all CPUs with no or WT cache and fast VLB, in theory there should be no difference between writing to 0 waitstate VLB card and ram.
EDIT: Just remembered, CPU Galaxy captured extreme case of this https://www.youtube.com/watch?v=qaGQxZEYby0 386DX-40 with 20MB/s VESA write speed while L1 cache read speed is mere 25MB/s

ViTi95 wrote on 2021-06-02, 16:21:

That should be faster than mode 13h or VBE2 with a backbuffer if there isn't much overdraw and there isn't much data to be read (invisible objects).

Reading will bottleneck it hard; the read/write speed difference is usually 2x on ISA, grows to 4x on VLB S3 cards, and finally to a whopping 8x on PCI S3 cards.
R_DrawFuzzColumnSaturnLow should in theory give a not insignificant boost, as afaik it skips reads. Another option would be dynamic switching: render directly to VGA RAM only when there are no invisible sprites on screen, otherwise fall back to buffered rendering to skip any direct VGA reads.
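
A rough sketch of both ideas, with heavy caveats: the read-modify-write fuzz below is a simplified stand-in for the classic effect, the mesh variant is only my guess at how a "Saturn" style column avoids reads, and `choose_target` is a hypothetical helper. None of this is taken from the FastDoom source.

```c
#include <stddef.h>

/* Simplified fuzz: darken what is already on screen.
   Every pixel costs one read from the destination. */
static void fuzz_column_rmw(unsigned char *dest, size_t pitch, int count,
                            const unsigned char *darken_map)
{
    while (count-- > 0) {
        *dest = darken_map[*dest];   /* read-modify-write */
        dest += pitch;
    }
}

/* Mesh variant (assumed): write a fixed dark color on a checkerboard and
   leave the other pixels alone, so the destination is never read. */
static void fuzz_column_mesh(unsigned char *dest, size_t pitch,
                             int x, int y, int count, unsigned char dark)
{
    for (int i = 0; i < count; i++, dest += pitch)
        if (((x + y + i) & 1) == 0)
            *dest = dark;
}

/* Per-frame choice: render straight to video memory only when nothing
   in the frame needs to read pixels back. */
typedef enum { TARGET_VRAM, TARGET_BACKBUFFER } render_target;

static render_target choose_target(int fuzz_sprites_visible)
{
    return fuzz_sprites_visible > 0 ? TARGET_BACKBUFFER : TARGET_VRAM;
}
```

The switch only pays off if counting invisible sprites before rendering is cheap, which is also an assumption here.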

PS since I didn't see it in your autoexec, for your Cyrix MediaGX you could also try
6x86opt.exe -l
"Searches for a Linear Frame Buffer and tries to define an ARR/RCR for it allowing Write Gathering." This is the Cyrix version of MTRR Write Combining https://en.wikipedia.org/wiki/Memory_type_range_register
The same goes for any P3 tests and FastVID/MTRRLFBE. This has the potential to make 8-bit writes as fast as 32-bit ones (at least the consecutive ones), and might help with Mode Y span speed.

EDIT:

I found something interesting regarding VESA vs 13h: http://www.geocities.ws/liaor2/myutil/m13speed.html

M13speed examines all video BIOS calls (10h.) If the call is a request for VGA mode13h (320x200 256-color), M13speed replaces the call with a request for VESA SVGA 640x400x8 (mode 100h.) M13speed ignores all other video BIOS calls and passes them on to the original interrupt handler. The emulated SVGA 320x200x256 display looks like a normal MCGA 320x200 256-color display, but operates in the framebuffer in native SVGA. The resulting emulation allows the full bus-throughput of the Trident 9440/96xx board. In other words, M13SPEED speeds up your Trident 9440/96xx's MCGA (320x200 256-color) performance.

The resident-TSR chains to the int10h (video) software-interface. It checks all BIOS int10h calls, responding to VGA setmode13h requests (320x200 MCGA). The TSR's response to setmode13h includes issuing a VESA setmode100h (640x400 SVGA) request, reprogramming the VGA CRTC and SR registers to setup a 320x200 display field, which looks, feels, smells like MCGA 320x200. Strangely enough, reprogramming these 100% standard VGA registers sends most chipsets into the world of screen gyrations, shattered displays, and disgruntled users. If you don't have a Trident 9440/96xx based SVGA, M13speed can't help you.
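
The dispatch logic described there can be modelled in portable C without touching a real interrupt vector. This is only a simulation under my own assumptions: `int10_regs` and `filter_int10` are hypothetical names, and the CRTC/SR reprogramming step is left out entirely.

```c
/* Simulated register pair for an int 10h call (hypothetical struct). */
typedef struct {
    unsigned short ax;
    unsigned short bx;
} int10_regs;

/* Returns 1 if the call was rewritten, 0 if it should pass through as is.
   AH=00h is "set video mode"; AX=4F02h is the VBE "set mode" function. */
static int filter_int10(int10_regs *r)
{
    unsigned char ah = (unsigned char)(r->ax >> 8);
    unsigned char al = (unsigned char)(r->ax & 0xFF);

    if (ah == 0x00 && al == 0x13) {
        r->ax = 0x4F02;   /* VBE set mode */
        r->bx = 0x0100;   /* VESA mode 100h, 640x400x8 */
        return 1;         /* a real TSR would now reprogram CRTC/SR */
    }
    return 0;             /* chain to the original handler untouched */
}
```

Everything that makes M13speed chipset-specific lives in the register reprogramming that follows the rewritten call, which is exactly the part this sketch skips.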

So this is a thing 😮 I wonder how many more vga chipsets could use similar hack.

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 247 of 979, by ViTi95

Rank: Member

I've uploaded a new version of FastDoom (0.8.2). This fixes an important bug that made Ultimate Doom switches behave badly, and adds the new VBE 2.0 modes.

You can grab it here: https://github.com/viti95/FastDoom/releases/d … tDoom_0.8.2.zip

rasz_pl wrote on 2021-06-02, 22:37:

...

I've tested the Cyrix MediaGX optimizing tools, but they tend to crash on my thin client, don't know why. Maybe they aren't very compatible with the CPU; it's a National Semiconductor Geode GX1 at 300MHz, which should be exactly the same as a Cyrix MediaGXm, but maybe the BIOS isn't as compatible as it should be. Anyway, the VBE 2.0 LFB version is nearly 20% faster even without those optimizations. What did work for me was MTRRLFBE: with my Pentium 3 550MHz the result was much more impressive, nearly 40% faster. I'll upload some videos comparing processors and video cards.

Regarding M13speed, VBE doesn't allow modifying VGA registers directly, that's why it doesn't work with most video cards. I think it's better to implement VBE 2.0 support directly, as it supports linear framebuffers and resolutions like 320x200 (VBE 2.0 is supported on nearly every video card thanks to UniVBE 5.3).
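
For reference, requesting a linear framebuffer through VBE 2.0 comes down to setting a flag bit in BX when calling function 4F02h. A minimal sketch of building that value (the helper name is made up, and the actual int 10h call is not shown):

```c
#define VBE_SET_MODE  0x4F02u  /* value for AX */
#define VBE_LFB_FLAG  0x4000u  /* BX bit 14: use linear framebuffer */
#define VBE_NOCLEAR   0x8000u  /* BX bit 15: keep display memory contents */

/* Build the BX value for a VBE set-mode call. */
static unsigned short vbe_mode_bx(unsigned short mode, int use_lfb,
                                  int keep_memory)
{
    unsigned short bx = (unsigned short)(mode & 0x01FFu);  /* mode number */
    if (use_lfb)
        bx |= VBE_LFB_FLAG;
    if (keep_memory)
        bx |= VBE_NOCLEAR;
    return bx;
}
```

As far as I know the mode number itself has to be found by walking the card's mode list and checking resolution and depth, since 320x200 at 8 bpp has no fixed VESA-assigned number.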

https://www.youtube.com/@viti95

Reply 248 of 979, by badmojo

Rank: l33t

My previous times:

doom (sound disabled in setup)
110 FPS

fdoom -timedemo demo3 -nosound
126 FPS

fdoom13h -timedemo demo3 -nosound
164.2 FPS

And here's my VBE 2.0 mode time:

FDOOMVBR -timedemo demo3 -nosound
171.3 FPS

FDOOMVBRP didn't work for me as predicted (I'm using UniVBE).

I can't see any screen tearing with this mode and the best part of all for me is that I can now crank up the refresh rate (using UNIRFRSH) - DOOM @ 130Hz! Brilliant 👍

The issue with DOOM2's in-game demo remains but that's just an FYI, it's not a big deal.

Life? Don't talk to me about life.

Reply 249 of 979, by ViTi95

Rank: Member

Finally I've tested the Rendition Vérité V2200 on an AMD K6-2 550MHz. The result is insane.

https://www.youtube.com/watch?v=6unNuPympE4

  • Mode Y: 14.446 fps
  • Mode 13h: 59.150 fps (4.09x Mode Y, about 309% faster; it seems to have applied VSync in hardware and locked to 60 fps. The FastDoom VSync option was disabled)
  • Mode VBE2 (LFB, PM): 305.446 fps (21.14x Mode Y, about 2014% faster, just wow)

https://www.youtube.com/@viti95

Reply 250 of 979, by trixster

Rank: Newbie

I can’t seem to get VBE working on my ISA Mach64, even using 64vbe. 64vbe is supposed to enable VESA 2.0 on ATI cards, similar to UniVBE, and even though it mentions 320x200 8-bit in the readme, the utility doesn’t seem to give me an 8-bit mode when I use the supplied VesaTest programme. As such, fdoomvbr fails to load and just displays a flashing cursor. Bummer!

Reply 251 of 979, by ViTi95

Rank: Member

Maybe you can test UniVBE 5.3 and the FDOOMVBR.EXE executable, it's the most compatible configuration for older video cards. I've tested an ATI Mach64 VT2 PCI and it works fine for me, using both UniVBE and the native VBIOS. I'll try 64vbe, maybe it's incompatible. I wish I had an ISA ATI Mach64 to test.

https://www.youtube.com/@viti95

Reply 252 of 979, by badmojo

Rank: l33t

Yes that is a bummer - I also settled on v5.3a of UniVBE with my older PCI VGA card, the newer versions sort-of worked but didn't configure the LFB correctly (and took up more memory).

Life? Don't talk to me about life.

Reply 253 of 979, by BitWrangler

Rank: l33t++

IF UniVBE is giving grief there's this to try... https://shawnhargreaves.com/freebe/

Not sure if that was the completely free one I was thinking about from back in the day, circa '97 we were getting fed up of UniVBE/display doctor getting both more bloaty and restricted, and someone wrote a good free alternative, but can't remember if that was it, or there's another one around.

Edit: woohoo lots of tasty stuff here including chipset specific Ati Matrox etc, https://dosdriver.de/graph.php

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 254 of 979, by trixster

Rank: Newbie

ViTi95 wrote on 2021-06-08, 23:48:

Maybe you can test UniVBE 5.3 and the FDOOMVBR.EXE executable, it's the most compatible configuration for older video cards. I've tested an ATI Mach64 VT2 PCI and it works fine for me, using both UniVBE and the native VBIOS. I'll try 64vbe, maybe it's incompatible. I wish I had an ISA ATI Mach64 to test.

Yeah I tried univbe 5.3 and it says it cannot find a Vesa 2.0 compatible graphics card. That’s when I tried ATI’s own 64vbe utility instead.

Reply 255 of 979, by trixster

Rank: Newbie

This is what the 64vbe readme says:

M64VBE SWITCHES:

s   - Enable single window implementation. This single window will be
      readable as well as writable.
d   - Enable dual read and write windows implementation.
3   - Enable 320x200 & 320x240 modes in 8bpp, 15bpp, 16bpp and 24bpp.
      VESA mode number 202h, 10dh, 10eh, 10fh are for 320x200 8bpp,
      15bpp, 16bpp and 24bpp respectively. Similarly, VESA mode number
      212h, 213h, 214h and 215h are for 320x240.
-3  - Disable 320x200 & 320x240 modes.
vw  - Set memory aperture to off.
-vw - Set memory aperture to on.
vga - Use standard VGA CRT parameters.
acc - Use accelerator CRT parameters.
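
For anyone probing these modes from code, the mode numbers in the readme can be put into a small lookup table. This only restates what the readme lists for the "3" switch (assuming the 320x240 numbers follow the same depth order); the struct and function names are made up, and whether a given card actually exposes the modes is a separate question.

```c
#include <stddef.h>

/* Mode numbers as listed in the M64VBE readme for the "3" switch. */
typedef struct {
    int width, height, bpp;
    unsigned short mode;
} m64_mode;

static const m64_mode m64_lowres_modes[] = {
    {320, 200,  8, 0x202}, {320, 200, 15, 0x10D},
    {320, 200, 16, 0x10E}, {320, 200, 24, 0x10F},
    {320, 240,  8, 0x212}, {320, 240, 15, 0x213},
    {320, 240, 16, 0x214}, {320, 240, 24, 0x215},
};

/* Returns the mode number, or 0 if the readme lists no such mode. */
static unsigned short m64_find_mode(int width, int height, int bpp)
{
    size_t n = sizeof m64_lowres_modes / sizeof m64_lowres_modes[0];
    for (size_t i = 0; i < n; i++) {
        const m64_mode *m = &m64_lowres_modes[i];
        if (m->width == width && m->height == height && m->bpp == bpp)
            return m->mode;
    }
    return 0;
}
```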

Reply 256 of 979, by ViTi95

Rank: Member

Some updates, the next release will have support for a new mode with "better" PC Speaker audio (1 bit resolution at 11 KHz, based on the same idea that Wolfenstein 3D implemented but never used) and Covox support (11 KHz).

https://www.youtube.com/watch?v=afdUcKzbqVY
https://www.youtube.com/watch?v=CSmfDD2OBVY
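
To give an idea of what 1-bit audio means in practice, here is a minimal sketch. It assumes unsigned 8-bit PCM input, a plain midpoint threshold and a sample rate of 11025 Hz for "11 KHz"; the real FastDoom code may well do something smarter. The 1,193,182 Hz figure is the standard PC timer (PIT) input clock, and both function names are made up.

```c
#include <stddef.h>

#define PIT_CLOCK_HZ 1193182UL  /* input clock of the PC interval timer */

/* Reduce unsigned 8-bit PCM to one bit per sample: 1 = speaker driven,
   0 = speaker released. A plain midpoint threshold is assumed. */
static void pcm8_to_1bit(const unsigned char *pcm, size_t n,
                         unsigned char *bits)
{
    for (size_t i = 0; i < n; i++)
        bits[i] = (unsigned char)(pcm[i] >= 128);
}

/* Timer divisor (rounded to nearest) that yields roughly `rate_hz`
   interrupts per second, one per output sample. */
static unsigned long pit_divisor(unsigned long rate_hz)
{
    return (PIT_CLOCK_HZ + rate_hz / 2) / rate_hz;
}
```

The Covox path would be simpler still, since the whole 8-bit sample goes out to the parallel port unchanged.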

https://www.youtube.com/@viti95

Reply 257 of 979, by b_w

Rank: Newbie

Some suggestions for FastDoom:
1) Use 8x8 Bayer dithering for monochrome (CGA, Hercules) output.
I use these tables:

/* This table favors devices like the Hercules card,
   where the aspect ratio is close to 1:1. */
unsigned char bayer64_1_1[] = {
     0,32, 8,40, 2,34,10,42,
    48,16,56,24,50,18,58,26,
    12,44, 4,36,14,46, 6,38,
    60,28,52,20,62,30,54,22,
     3,35,11,43, 1,33, 9,41,
    51,19,59,27,49,17,57,25,
    15,47, 7,39,13,45, 5,37,
    63,31,55,23,61,29,53,21};

/* This table favors devices like the CGA card,
   where the aspect ratio is close to 2:1. */
unsigned char bayer64_2_1[] = {
     0,32,16,48, 2,34,18,50,
    24,56, 8,40,26,58,10,42,
     4,36,20,52, 6,38,22,54,
    28,60,12,44,30,62,14,46,
     3,35,19,51, 1,33,17,49,
    27,59,11,43,25,57, 9,41,
     7,39,23,55, 5,37,21,53,
    31,63,15,47,29,61,13,45};

2) For the CGA binary, use the BrightRed-BrightGreen-BrightYellow-Black palette. It produces more accurate color "mixings", whereas DarkRed is nearly equal to DarkYellow. Or maybe the BrightCyan-BrightMagenta-White-Black palette? See the attachments for a comparison.
3) Make a binary for monochrome:
VGA 640x480 mode 0x11/0x12.
EGA 640x350 mode 0x0F/0x10.
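
For suggestion 1, this is roughly how such a table gets used for 1-bit output. It is a minimal sketch rather than my exact code: the scaling of an 8-bit brightness to 0..64 and the function name `dither_pixel` are choices made for this example, and the table is the 1:1 one from above, repeated so the snippet stands alone.

```c
/* 8x8 Bayer threshold matrix, values 0..63 (the 1:1 aspect table). */
static const unsigned char bayer8x8[64] = {
     0,32, 8,40, 2,34,10,42,
    48,16,56,24,50,18,58,26,
    12,44, 4,36,14,46, 6,38,
    60,28,52,20,62,30,54,22,
     3,35,11,43, 1,33, 9,41,
    51,19,59,27,49,17,57,25,
    15,47, 7,39,13,45, 5,37,
    63,31,55,23,61,29,53,21
};

/* Returns 1 if the pixel at (x, y) should be lit for brightness `lum`. */
static int dither_pixel(int x, int y, unsigned char lum)
{
    int level = (lum * 65) >> 8;   /* scale 0..255 to 0..64 */
    return level > bayer8x8[(y & 7) * 8 + (x & 7)];
}
```

With this scaling, black lights no pixels in an 8x8 cell, white lights all 64, and mid gray lights half of them.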

Here are samples of DOOM on CGA:

Attachments

  • doom3.png (58.45 KiB, public domain)
  • doom2.png (175.32 KiB, public domain)
  • doom1.png (299.68 KiB, public domain)
  • doom0.png (310.04 KiB, public domain)

Last edited by b_w on 2021-07-26, 15:25. Edited 1 time in total.

Reply 258 of 979, by Bondi

Rank: Oldbie

ViTi95 wrote on 2021-06-28, 21:48:

Some updates, the next release will have support for a new mode with "better" PC Speaker audio (1 bit resolution at 11 KHz, based on the same idea that Wolfenstein 3D implemented but never used) and Covox support (11 KHz).

https://www.youtube.com/watch?v=afdUcKzbqVY
https://www.youtube.com/watch?v=CSmfDD2OBVY

Adding Covox support is such a cool idea!! Thank you for your work.
Really impressive to see Doom modding continue on the DOS platform.
Is there an easy way to bring fdoom features to Heretic, Hexen and other games sharing the same engine?

PCMCIA Sound Cards chart
archive.org: PCMCIA software, manuals, drivers

Reply 259 of 979, by ViTi95

Rank: Member

b_w wrote on 2021-07-04, 16:02:

...

1) I first implemented a 4x4 matrix, but it was very slow on 486/586 processors, so I had to downgrade it to 2x2 and optimize it by precalculating LUTs with all the possible combinations to avoid all the branches. The 8x8 will look better for sure but will hamper the framerate.
2) Those images look very, very good, can you share the source code of your implementation? I've been looking for an implementation that generates good results, but most of them require very fast CPUs.
3) I rejected the idea of VGA 640x480 (16 colors) just because 320x200 (256 colors) looks much better and is much faster, and VGA cards support both modes anyway. EGA 640x350 was also rejected because most EGA cards are painfully slow; even in Mode D most of them can't produce more than 10 fps. Now I get why Doom runs so badly on Amiga computers 😅
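
To illustrate point 1, a precalculated 2x2 table could be organized like this. It is a simplified sketch, not the actual FastDoom tables: it indexes by an 8-bit brightness rather than by palette color, and the names `dither_lut`, `build_dither_lut` and `dither_lookup` are made up for the example.

```c
/* 2x2 Bayer thresholds, values 0..3, indexed by (y & 1) * 2 + (x & 1). */
static const unsigned char bayer2x2[4] = { 0, 2, 3, 1 };

/* dither_lut[lum][cell] is 1 when a pixel of that brightness is lit
   in that cell of the 2x2 pattern. */
static unsigned char dither_lut[256][4];

/* Fill the table once at startup; all comparisons happen here. */
static void build_dither_lut(void)
{
    for (int lum = 0; lum < 256; lum++) {
        int level = (lum * 5) >> 8;   /* scale 0..255 to 0..4 */
        for (int cell = 0; cell < 4; cell++)
            dither_lut[lum][cell] = (unsigned char)(level > bayer2x2[cell]);
    }
}

/* Per-pixel work is a single table read with no branches. */
static unsigned char dither_lookup(int x, int y, unsigned char lum)
{
    return dither_lut[lum][((y & 1) << 1) | (x & 1)];
}
```

The same layout extends to 8x8, but the table grows from 4 to 64 entries per brightness level, which is less friendly to the small caches on those CPUs.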

Bondi wrote on 2021-07-04, 17:09:

Adding Covox support is such a cool idea!! Thank you for your work.
Really impressive to see Doom modding continue on the DOS platform.
Is there an easy way to bring fdoom features to Heretic, Hexen and other games sharing the same engine?

Well it can be done, but requires time. The code with the new features can be ported to Heretic and Hexen without major issues, for example I've ported the VGA renderer back from Heretic to FastDoom and that wasn't a big deal, the other way around should be the same. Maybe in the future I can do that, but for now it's pretty much impossible for me (I'll be happy to help if anyone gets to work on it).

https://www.youtube.com/@viti95