VOGONS


Reply 820 of 987, by ViTi95

Rank Member

Not exactly. Doom displays every object in the portion of the BSP tree it selects for each frame (depending on the viewing angle and position), with no distance limit. This option in FastDoom limits the visible distance of objects that are not enemies.

https://www.youtube.com/@viti95

Reply 821 of 987, by ViTi95

Rank Member

Update: There will be frametime analysis (1% lows and per-frame frametimes), but only in a special benchmark mode, since it takes quite some CPU time to calculate on each frame. For example, this is the result on a simulated 486DX-33 in DOSBox-X, using Phil's benchmark High and Low detail configurations (Ultimate Doom 1.9, demo1):

Max detail benchmark: 14.062 fps, 11.082 1% low fps (without frametime calculation: 14.547 fps)
Min detail benchmark: 88.404 fps, 44.043 1% low fps (without frametime calculation: 91.513 fps)

[Attachment: demo1_frametimes.png (50.82 KiB, CC-BY-4.0): FastDoom 0.9.7 frametimes demo1]

https://www.youtube.com/@viti95

Reply 822 of 987, by rasz_pl

Rank l33t

Fascinating graph, shows that both high and low detail modes share the same bottlenecks for the most part 😮 this gives me hope some of them will be in untouched spots (BSP tree traversal etc).

imo 1ms precision is too low.

                G_SaveCSVResult(realtics, resultfps, onepercentlow_fps);

0.1% is also quite important, min/max frametimes too.

https://github.com/viti95/FastDoom/blob/73584 … 606C13-L1606C98

frametime = (unsigned int *)Z_MallocUnowned(20000 * sizeof(unsigned int), PU_STATIC);

is 20000 a safe number? can time demos be longer? potential for out-of-bounds write bug
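
One way to make a fixed-size buffer like this safe is to check the capacity before every write. The following is only a minimal sketch: `frametime_log_t` and `frametime_log_push` are hypothetical names for illustration, not FastDoom code.

```c
#include <stddef.h>

/* Hypothetical recorder, not FastDoom's actual data structure. */
typedef struct {
    unsigned int *samples; /* caller-provided buffer */
    size_t count;          /* samples stored so far */
    size_t dropped;        /* samples rejected because the buffer was full */
} frametime_log_t;

/* Store one frametime. Returns 1 on success, 0 if the buffer is full. */
static int frametime_log_push(frametime_log_t *log, size_t capacity,
                              unsigned int value)
{
    if (log->count >= capacity) {
        log->dropped++; /* count the loss instead of writing out of bounds */
        return 0;
    }
    log->samples[log->count++] = value;
    return 1;
}
```

With this shape a demo longer than the buffer only loses samples (and `dropped` reports how many) instead of corrupting memory.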

https://github.com/viti95/FastDoom/blob/73584 … 485C1-L1485C108

                for (i = fix_start; i < onepercentlow_num + fix_start; i++) // Omit first frame (load data)

I have a bad feeling about this, what does "Omit first frame (load data)" mean here? this is after bubblesort so 1st byte is no longer first benchmark frame, and it looks like you already skipped one frame earlier?

                fix_start = frametime_position - benchmark_gametics + 1;

https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L152

speed = 1192030L / TickBase;

14.31818 MHz / 12 = ~1.193182 MHz (about 1193182 Hz, not 1192030)
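
To see what the two constants mean in practice, here is a small sketch of the divisor arithmetic. The helper names are made up for illustration; only the 1192030 constant comes from the quoted ns_task.c line.

```c
#include <stdint.h>

#define PIT_HZ_EXACT   1193182UL /* 14.31818 MHz / 12, rounded to the nearest Hz */
#define PIT_HZ_NS_TASK 1192030UL /* constant from the quoted ns_task.c line */

/* Reload value (divisor) for a requested interrupt rate in Hz. */
static uint32_t pit_divisor(uint32_t pit_hz, uint32_t rate_hz)
{
    return pit_hz / rate_hz;
}

/* Rate a divisor really produces on a ~1193182 Hz PIT, in millihertz. */
static uint32_t pit_actual_rate_millihz(uint32_t divisor)
{
    return (uint32_t)(((uint64_t)PIT_HZ_EXACT * 1000u) / divisor);
}
```

For a 1000 Hz request the two constants give divisors of 1192 and 1193, so a "millisecond" timer built on the smaller constant would run roughly 0.1% fast.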

Is servicing the higher timer interrupt rate for I_TimerMS eating all this CPU?

A much faster and more precise way is reading the raw PIT counters: https://www.freebasic.net/forum/viewtopic.php?t=20941 TS_ServiceSchedule already keeps track of interrupts/counter loops in TaskServiceCount
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L231

    TaskServiceCount += TaskServiceRate;

raw PIT timer reads + TaskServiceCount will give you maximum precision possible without increasing interrupt rate

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 823 of 987, by ViTi95

Rank Member

imo 1ms precision is too low

Is servicing the higher timer interrupt rate for I_TimerMS eating all this CPU?

I wasn't able to use the PIT to get high precision timing, so I relied on the Apogee Sound System task services to generate a millisecond timer. It can be much more precise, but the faster it runs, the more cycles it steals from the cpu.

is 20000 a safe number? can time demos be longer? potential for out-of-bounds write bug

Yeah, I know this is unsafe. I'm trying to find an easy way to get the number of tics in a Doom demo prior to execution.
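
A possible approach, sketched under the assumption that the demo uses the vanilla Doom 1.9 lump layout (13-byte header, 4 bytes per active player per tic, terminated by a 0x80 marker). `demo_count_tics` is a hypothetical helper, not an existing FastDoom function, and older demo versions with a shorter header are not handled.

```c
#include <stddef.h>

#define DEMO_HEADER_SIZE 13   /* assumed v1.9 header; playeringame[4] at offset 9 */
#define DEMO_TIC_SIZE    4    /* forwardmove, sidemove, angleturn, buttons */
#define DEMO_MARKER      0x80 /* end-of-demo byte */
#define DEMO_MAXPLAYERS  4

/* Returns the number of tics in the lump, or -1 if it looks malformed. */
static long demo_count_tics(const unsigned char *lump, size_t length)
{
    size_t players = 0, i, pos, stride;
    long tics = 0;

    if (lump == NULL || length < DEMO_HEADER_SIZE + 1)
        return -1;

    for (i = 0; i < DEMO_MAXPLAYERS; i++)
        if (lump[9 + i] != 0)
            players++;
    if (players == 0)
        return -1;

    stride = players * DEMO_TIC_SIZE;
    pos = DEMO_HEADER_SIZE;
    while (pos < length && lump[pos] != DEMO_MARKER) {
        if (length - pos < stride)
            return -1; /* truncated tic */
        pos += stride;
        tics++;
    }
    return tics;
}
```

The result could then size the frametime buffer before playback starts, instead of relying on a fixed 20000 entries.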

A much faster and more precise way is reading the raw PIT counters: https://www.freebasic.net/forum/viewtopic.php?t=20941 TS_ServiceSchedule already keeps track of interrupts/counter loops in TaskServiceCount
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L231

I'll take a look at those variables, I'm still learning how it works 😅

EDIT:

I have a bad feeling about this, what does "Omit first frame (load data)" mean here? this is after bubblesort so 1st byte is no longer first benchmark frame, and it looks like you already skipped one frame earlier?

fix_start = frametime_position - benchmark_gametics + 1

The first frame in any demo run always takes a lot of time, so much that it basically destroys the whole 1% low average (we are talking about more than 500 ms). That's why I decided not to take it into account.

0.1% is also quite important

That's true, but the lack of frames in demos makes this value close to useless. For example, demo1 of Ultimate Doom is just 1710 frames. For the 1% low, only 17 frames are used for the average, and the 0.1% low would use just 1 frame.
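
The arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, it uses `qsort` rather than the bubble sort mentioned earlier in the thread, and it assumes the "low" figure is the fps of the averaged worst frametimes, which may differ from how FastDoom computes it.

```c
#include <stdlib.h>

/* Sort helper: slowest (largest) frametimes first. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x < y) - (x > y);
}

/* Frames in the worst 1/divisor slice: divisor 100 for 1%, 1000 for 0.1%. */
static size_t low_sample_count(size_t frames, size_t divisor)
{
    size_t n = frames / divisor;
    return n ? n : 1;
}

/* Average fps over the slowest slice. Sorts times_ms in place. */
static double low_fps(unsigned int *times_ms, size_t frames, size_t divisor)
{
    size_t i, n;
    unsigned long sum = 0;

    if (frames == 0)
        return 0.0;
    qsort(times_ms, frames, sizeof *times_ms, cmp_desc);
    n = low_sample_count(frames, divisor);
    for (i = 0; i < n; i++)
        sum += times_ms[i];
    if (sum == 0)
        return 0.0;
    return 1000.0 * (double)n / (double)sum;
}
```

With 1710 frames, `low_sample_count` gives 17 samples for the 1% low and a single sample for the 0.1% low, which matches the numbers above.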

https://www.youtube.com/@viti95

Reply 824 of 987, by rasz_pl

Rank l33t
ViTi95 wrote on 2023-07-07, 09:43:

I wasn't able to use the PIT to get high precision timing, so I relied on the Apogee Sound System task services to generate a millisecond timer. It can be much more precise, but the faster it runs, the more cycles it steals from the cpu.

from what I see in the code you are using /FASTDOOM/ns_task.c TS_ServiceSchedule and that is programming PIT directly
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L140

    outp(0x43, 0x36);
    outp(0x40, TaskServiceRate);
    outp(0x40, TaskServiceRate >> 8);
ViTi95 wrote on 2023-07-07, 09:43:

A much faster and more precise way is reading the raw PIT counters: https://www.freebasic.net/forum/viewtopic.php?t=20941 TS_ServiceSchedule already keeps track of interrupts/counter loops in TaskServiceCount
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L231

I'll take a look at those variables, I'm still learning how it works 😅

Delete I_TimerMS and don't call TS_ScheduleTask(I_TimerMS, 1000, 1, NULL). Replace counting milliseconds with counting PIT ticks
https://github.com/viti95/FastDoom/blob/73584 … M/d_main.c#L571

        start_time = mscount;

becomes something like

        start_time1 = TaskServiceCount;
        outp(0x43, 0x00);              // latch PIT counter for channel 0
        start_time2 = inp(0x40);       // low byte of PIT counter
        start_time2 |= inp(0x40) << 8; // high byte of PIT counter SHL 8

and https://github.com/viti95/FastDoom/blob/73584 … M/d_main.c#L604

       end_time = mscount - start_time;

goes

        outp(0x43, 0x00);            // latch PIT counter for channel 0
        end_time2 = inp(0x40);       // low byte of PIT counter
        end_time2 |= inp(0x40) << 8; // high byte of PIT counter SHL 8
        end_time = start_time1 - TaskServiceCount + start_time2 - end_time2;

I'm sure I made a mistake somewhere in there 😀
/Edit: sure did, plenty 😀
for starters, the PIT is counting down https://wiki.osdev.org/Programmable_Interval_Timer https://www.xtof.info/Timing-on-PC-familly-under-DOS.html
so it's more like

        outp(0x43, 0x00); pit = inp(0x40); pit |= inp(0x40) << 8; // latch, then low byte of PIT counter + high byte SHL 8
        start_time = TaskServiceCount + TaskServiceRate - pit;
        outp(0x43, 0x00); pit = inp(0x40); pit |= inp(0x40) << 8;
        end_time = TaskServiceCount + TaskServiceRate - pit - start_time;

TaskServiceCount overflows (with a long it would be every hour?); it looks like it's taken care of here https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L232

    TaskServiceCount += TaskServiceRate;
    if (TaskServiceCount > 0xffffL)
    {
        TaskServiceCount &= 0xffff;

so we just need some clever logic around start_time/end_time to take that into consideration
/edit

no more 1ms interrupts, highest precision possible (single PIT ticks at ~1.193182MHz), all related _time variables should be upgraded from int to longs. Final timings can be calculated back to ms or just truncated a bit to fit in int frametime array.
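
The "clever logic" for wraparound could look roughly like the sketch below. It is hardware-free so it can be tried on any machine: the latched counter value and the tick total are passed in as plain numbers. Assumptions, none of which come from FastDoom: `base_ticks` is a hypothetical 32-bit running total that the interrupt handler would advance by the reload value on every interrupt (the quoted TaskServiceCount is masked to 16 bits, so it cannot serve directly), and the counter decrements by one per PIT clock as in mode 2. The quoted control word 0x36 selects mode 3, where the counter steps down by two, so that case needs extra handling.

```c
#include <stdint.h>

#define PIT_HZ 1193182UL /* ~14.31818 MHz / 12 */

/* Timestamp in PIT ticks: total at the last interrupt plus ticks since then.
 * The counter counts down from the reload value, so the ticks elapsed in the
 * current period are (period - latched). A reload of 0 means 65536. */
static uint32_t pit_timestamp(uint32_t base_ticks, uint16_t reload,
                              uint16_t latched)
{
    uint32_t period = reload ? reload : 65536UL;
    return base_ticks + (period - latched);
}

/* Unsigned subtraction wraps modulo 2^32, so one overflow is harmless. */
static uint32_t pit_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert PIT ticks to microseconds for storing or reporting. */
static uint32_t pit_ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / PIT_HZ);
}
```

A 32-bit total wraps after about 2^32 / 1193182 ≈ 3600 seconds, which matches the "every hour" estimate above, and `pit_elapsed` stays correct as long as a single measurement is shorter than that. A real implementation would also have to deal with an interrupt arriving between reading the total and latching the counter.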

ViTi95 wrote on 2023-07-07, 09:43:

The first frame in any demo run always takes a lot of time, so much that it basically destroys the whole 1% low average (we are talking about more than 500 ms). That's why I decided not to take it into account.

but don't you already skip it here with that +1?

fix_start = frametime_position - benchmark_gametics + 1;
ViTi95 wrote on 2023-07-07, 09:43:

0.1% is also quite important

That's true, but the lack of frames in demos makes this value close to useless. For example, demo1 of Ultimate Doom is just 1710 frames. For the 1% low, only 17 frames are used for the average, and the 0.1% low would use just 1 frame.

perfect, so we will know the frametime of the slowest frame in the demo 😀 our absolute bottom performance

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 825 of 987, by ujav

Rank Newbie

Not sure if this has been discussed here before, but I found that Wolfenstein 3d has additional optimizations for wall pixel scaling, that's why it has a "Thinking" screen every time the viewport is resized. To what extent is this applicable to Doom engine?

https://fabiensanglard.net/b/gebbwolf3d.pdf
(page 178)

Reply 826 of 987, by Gmlb256

Rank l33t
ujav wrote on 2023-07-12, 00:36:

Not sure if this has been discussed here before, but I found that Wolfenstein 3d has additional optimizations for wall pixel scaling, that's why it has a "Thinking" screen every time the viewport is resized. To what extent is this applicable to Doom engine?

https://fabiensanglard.net/b/gebbwolf3d.pdf
(page 178)

Those optimizations wouldn't work in the way you expect in DOOM as the engine has different wall heights and its textures are varied in width and height.

VIA C3 Nehemiah 1.2A @ 1.46 GHz | ASUS P2-99 | 256 MB PC133 SDRAM | GeForce3 Ti 200 64 MB | Voodoo2 12 MB | SBLive! | AWE64 | SBPro2 | GUS

Reply 827 of 987, by ViTi95

Rank Member

Time for a new release! FastDoom 0.9.7

This has taken quite a long time to develop, but comes with some cool things:

* Add CPU selection for different render paths
* Optimized flat visplane rendering (handcrafted ASM)
* Optimized column rendering for some CPUs (Ken Silverman)
* Upgrade FPS display, now it's possible to show FPS on-screen and on a debug card at the same time
* Debug card port is now selectable on fdoom.cfg file. Full support for 4 digit debug cards
* Fixed issue #148. Now it's possible to use SB-MIDI and Sound Blaster Direct Mode without crashing
* Upgrade display menu, now it's possible to select options using left/right keys (previous/next)
* Optimized fuzz column rendering (handcrafted ASM)
* Optimized fuzz flat column rendering (handcrafted ASM)
* Optimized backbuffered non-VGA modes. EGA 320x200 16-color mode should be much faster now
* New in-game menu for benchmarks. Now it's possible to execute multiple benchmarks without reloading the game
* Automatically detect MDA/Hercules video card in FDSetup
* Updated bench.bat (new in-game benchmark options). Also a new FDBench executable to make it easier to launch benchmarks from the command line (FDBENCH.EXE)
* New advanced benchmark, which stores frametimes in the file ftime.csv. It also calculates 1% low and 0.1% low frametimes, which are saved in the bench.csv file. Now you will be able to create videos like Digital Foundry 😁
* New commandline parameter to execute benchmarks:
- Single benchmark with current parameters: -benchmark single [demofile] [-advanced]
- Multiple benchmark: -benchmark file [demofile] [benchmarkfile] [-advanced]
- [-advanced] is optional (frametimes)

https://github.com/viti95/FastDoom/releases/d … tDoom.0.9.7.zip

Attachments

  • fdoom_000.png (48.27 KiB, CC-BY-4.0): FastDoom 0.9.7 screenshot 1
  • fdoom_001.png (50.18 KiB, CC-BY-4.0): FastDoom 0.9.7 screenshot 2
  • fdoom_002.png (42.72 KiB, CC-BY-4.0): FastDoom 0.9.7 screenshot 3

https://www.youtube.com/@viti95

Reply 828 of 987, by marxveix

Rank Member

Thank you, I will test it soon with the slowest Transmeta Crusoe CPU:
HP Compaq t5300 thin client (with TM5600 533 MHz)
https://en.wikipedia.org/wiki/Transmeta_Crusoe

31 different MiniGL/OpenGL Win9x files for all Rage 3 cards: Re: ATi RagePro OpenGL files

Reply 829 of 987, by ViTi95

Rank Member

Forgot to say that with this new release you will be able to create new benchmark scripts easily and run them without effort. I still have to create a Wiki entry to explain how to do it, and will very probably make a video to explain the new features.

https://www.youtube.com/@viti95

Reply 830 of 987, by 7F20

Rank Member

Just tested it. Still seems like the music volume is super low when selecting Adlib sound for me. I know Max the Rabbit also noticed this, but I can't help but wonder if it's limited to some specific condition?

Reply 832 of 987, by 7F20

Rank Member
ViTi95 wrote on 2023-07-18, 07:14:

The Adlib volume is a known issue that I still haven't been able to solve 😔

Gotcha. No worries. I wanted to find out if I was missing something. Thanks for all the hard work you do on this. I appreciate it and I know a lot of other people do as well.

Reply 833 of 987, by ViTi95

Rank Member

I've discovered a new optimization for very slow VGA ISA cards on backbuffered modes (fdoom13h.exe, fdoomvbr.exe). Basically using differential copies to the VRAM is much faster, as the ISA bus becomes a huge bottleneck. This is the same optimization used for CGA/EGA modes:

[Attachment: 2023-07-19 12_03_30-_nuevo 55 - Notepad++.png (10.03 KiB, CC-BY-4.0): FastDoom new optimization for slow VGA cards]

This test was done with flat visplanes and full screen; there is also a performance advantage with regular visplane rendering. Differential 16-bit copies are a bit slower, since it's hard to find 16-bit screen matches in VGA modes.

https://www.youtube.com/@viti95

Reply 834 of 987, by ViTi95

Rank Member

Update: A bit faster with 32-bit reads + 8-bit writes

[Attachment: 2023-07-19 15_21_20-_nuevo 55 - Notepad++.png (18.92 KiB, CC-BY-4.0): FastDoom new optimization for slow VGA cards (ASM opt)]

https://www.youtube.com/@viti95

Reply 835 of 987, by wbc

Rank Member

That looks awesome. Does it work by iterating over the entire framebuffer and finding pixel-wise differences (like "dirty rectangles" but "dirty pixels" instead) between the current and previous frame, or does it take geometry data into account? (so flat visplanes could significantly boost the "dirty pixels" calculation itself)

--wbcbz7

Reply 836 of 987, by ViTi95

Rank Member

It iterates over the whole framebuffer against a copy of the previous frame, pixel by pixel. It's optimized in ASM (it works on groups of 4 pixels: a single 32-bit read, then 4 comparisons/writes if required). Basically this minimizes the number of writes to the video card. Not very useful for VLB cards, but it works great for 8-bit ISA cards.
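
In portable C the idea described above might look like the sketch below. It is not the FastDoom ASM routine: `diff_copy` and its parameters are hypothetical, and `vram` is an ordinary pointer here so the code can be run anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy only the pixels that changed since the previous frame.
 * frame: newly rendered backbuffer; prev: copy of what is already on screen;
 * vram: destination video memory. Returns the number of bytes written. */
static size_t diff_copy(const uint8_t *frame, uint8_t *prev,
                        volatile uint8_t *vram, size_t size)
{
    size_t i, j, written = 0;

    for (i = 0; i + 4 <= size; i += 4) {
        uint32_t cur, old;
        memcpy(&cur, frame + i, 4); /* one 32-bit read per group of 4 pixels */
        memcpy(&old, prev + i, 4);
        if (cur == old)
            continue;               /* nothing changed in this group */
        for (j = 0; j < 4; j++) {
            if (frame[i + j] != prev[i + j]) {
                vram[i + j] = frame[i + j]; /* 8-bit write, only if needed */
                prev[i + j] = frame[i + j];
                written++;
            }
        }
    }
    for (; i < size; i++) {         /* leftover bytes if size is not a multiple of 4 */
        if (frame[i] != prev[i]) {
            vram[i] = frame[i];
            prev[i] = frame[i];
            written++;
        }
    }
    return written;
}
```

Reads from system RAM are cheap compared with writes over the ISA bus, so skipping unchanged pixels trades a little extra CPU work for far fewer slow bus cycles.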

https://www.youtube.com/@viti95

Reply 839 of 987, by drosse1meyer

Rank Member

nice work

i'm gonna have to dig out my 486 and try this

P1: Packard Bell - 233 MMX, Voodoo1, 64 MB, ALS100+
P2-V2: Dell Dimension - 400 Mhz, Voodoo2, 256 MB
P!!! Custom: 1 Ghz, GeForce2 Pro/64MB, 384 MB