Not exactly. Doom displays every object in the portion of the BSP tree it selects for each frame (depending on the viewing angle and position), with no distance limit. This option in FastDoom limits the visible distance of objects that are not enemies.


Update: There will be frametime analysis (1% lows and per-frame frametimes), but only in a special benchmark mode, since it takes quite some CPU time to calculate on each frame. For example, this is the result on a simulated 486DX-33 in DOSBox-X, using Phil's benchmark High and Low detail configurations (Ultimate Doom 1.9, demo1):

Max detail benchmark: 14.062 fps, 11.082 1% low fps (without frametime calculation: 14.547 fps)
Min detail benchmark: 88.404 fps, 44.043 1% low fps (without frametime calculation: 91.513 fps)

Fascinating graph, shows that both high and low detail modes share same bottlenecks for the most part 😮 this gives me hope some of them will be in untouched spots (bsp tree traversal etc).

imo 1ms precision is too low.

                G_SaveCSVResult(realtics, resultfps, onepercentlow_fps);

0.1% is also quite important, min/max frametimes too.

https://github.com/viti95/FastDoom/blob/73584 … 606C13-L1606C98

frametime = (unsigned int *)Z_MallocUnowned(20000 * sizeof(unsigned int), PU_STATIC);

is 20000 a safe number? can time demos be longer? potential for out-of-bounds write bug

https://github.com/viti95/FastDoom/blob/73584 … 485C1-L1485C108

                for (i = fix_start; i < onepercentlow_num + fix_start; i++) // Omit first frame (load data)

I have a bad feeling about this, what does "Omit first frame (load data)" mean here? this is after bubblesort so 1st byte is no longer first benchmark frame, and it looks like you already skipped one frame earlier?

                fix_start = frametime_position - benchmark_gametics + 1;

https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L152

speed = 1192030L / TickBase;

14.31818 MHz / 12 = ~1.193182 MHz
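To put the two constants side by side, here is a minimal sketch. It assumes TickBase is the requested interrupt rate in Hz; the helper name pit_divisor is made up for illustration and is not part of ns_task.c.

```c
#define PIT_HZ_EXACT 1193182L /* 14318180 Hz / 12, rounded to the nearest Hz */
#define PIT_HZ_CODE  1192030L /* constant quoted from ns_task.c above */

/* Divisor that would be loaded into the PIT for a requested rate in Hz.
 * Hypothetical helper: it mirrors "speed = base / TickBase". */
static long pit_divisor(long base_hz, long rate_hz)
{
    return base_hz / rate_hz; /* integer division, as in the quoted line */
}
```

For a requested 1000 Hz this gives a divisor of 1192 with the constant from the code and 1193 with the rounded crystal-derived value, so the two resulting interrupt rates differ by less than 0.1%.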

Is servicing I_TimerMS higher timer interrupt rate eating all this cpu?

A much faster and more precise way is to read the raw PIT counters: https://www.freebasic.net/forum/viewtopic.php?t=20941. TS_ServiceSchedule already keeps track of interrupts/counter loops in TaskServiceCount
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L231

    TaskServiceCount += TaskServiceRate;

raw PIT timer reads + TaskServiceCount will give you maximum precision possible without increasing interrupt rate

imo 1ms precision is too low

Is servicing I_TimerMS higher timer interrupt rate eating all this cpu?

I wasn't able to use the PIT to get high precision timing, so I relied on the Apogee Sound System task services to generate a millisecond timer. It can be much more precise, but the faster it runs, the more cycles it steals from the cpu.

is 20000 a safe number? can time demos be longer? potential for out-of-bounds write bug

Yeah, I know this is unsafe. I'm trying to find an easy way to get the number of tics in a Doom demo prior to execution
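Until the demo length is known up front, one way to rule out the out-of-bounds write is to check the capacity on every store. This is only a sketch: the frametime_log type and frametime_log_push are hypothetical names, not FastDoom code, and the buffer could still come from the Z_MallocUnowned call quoted above.

```c
#include <stddef.h>

/* Hypothetical recorder state; the names are illustrative only. */
typedef struct {
    unsigned int *samples; /* caller-allocated buffer */
    size_t capacity;       /* number of slots in samples */
    size_t count;          /* slots filled so far */
    size_t dropped;        /* frames that did not fit */
} frametime_log;

/* Store one frametime. Returns 1 if stored, 0 if the buffer was full. */
static int frametime_log_push(frametime_log *log, unsigned int frametime)
{
    if (log->count >= log->capacity) {
        log->dropped++; /* remember that the results are incomplete */
        return 0;
    }
    log->samples[log->count++] = frametime;
    return 1;
}
```

A non-zero dropped count at the end of a run would signal that the demo was longer than the buffer, instead of silently corrupting memory.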

A much faster and more precise way is to read the raw PIT counters: https://www.freebasic.net/forum/viewtopic.php?t=20941. TS_ServiceSchedule already keeps track of interrupts/counter loops in TaskServiceCount
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L231

I'll take a look into those variables, i'm still learning how it works 😅


I have a bad feeling about this, what does "Omit first frame (load data)" mean here? this is after bubblesort so 1st byte is no longer first benchmark frame, and it looks like you already skipped one frame earlier?

fix_start = frametime_position - benchmark_gametics + 1

The first frame in any demo run always takes a lot of time, so much that it basically destroys the whole 1% low average (we are talking more than 500 ms). That's why I decided not to take it into account.

0.1% is also quite important

That's true, but the lack of frames in demos makes this value close to useless. For example, demo1 of Ultimate Doom is just 1710 frames. For the 1% low, only 17 frames are used for the average, and the 0.1% low will use just 1 frame.
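The frame counts above follow from taking the slowest fraction of all frames and averaging them. A minimal sketch of one common way to compute such a figure, assuming frametimes in milliseconds; low_fps and cmp_desc are hypothetical names, and FastDoom's own routine (which sorts with a bubble sort) may differ in detail.

```c
#include <stdlib.h>

/* qsort comparator: largest frametime first. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x < y) - (x > y);
}

/* Average FPS over the slowest permille/1000 of the frames.
 * Sorts frametimes_ms in place. Returns 0.0 if no frame qualifies. */
static double low_fps(unsigned int *frametimes_ms, size_t n, unsigned int permille)
{
    size_t worst = (n * permille) / 1000; /* 1710 frames at 1% -> 17 */
    unsigned long total_ms = 0;
    size_t i;

    if (worst == 0)
        return 0.0;
    qsort(frametimes_ms, n, sizeof *frametimes_ms, cmp_desc);
    for (i = 0; i < worst; i++)
        total_ms += frametimes_ms[i];
    if (total_ms == 0)
        return 0.0;
    return 1000.0 * (double)worst / (double)total_ms;
}
```

Calling it with permille set to 10 gives the 1% low and with 1 gives the 0.1% low; on a 1710-frame demo the latter reduces to the single slowest frame.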


ViTi95 wrote on 2023-07-07, 09:43:

I wasn't able to use the PIT to get high precision timing, so I relied on the Apogee Sound System task services to generate a millisecond timer. It can be much more precise, but the faster it runs, the more cycles it steals from the cpu.

from what I see in the code you are using /FASTDOOM/ns_task.c TS_ServiceSchedule and that is programming PIT directly
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L140

    outp(0x43, 0x36);
    outp(0x40, TaskServiceRate);
    outp(0x40, TaskServiceRate >> 8);

ViTi95 wrote on 2023-07-07, 09:43:

A much faster and more precise way is to read the raw PIT counters: https://www.freebasic.net/forum/viewtopic.php?t=20941. TS_ServiceSchedule already keeps track of interrupts/counter loops in TaskServiceCount
https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L231

I'll take a look into those variables, i'm still learning how it works 😅

Delete I_TimerMS and don't call TS_ScheduleTask(I_TimerMS, 1000, 1, NULL). Replace counting milliseconds with counting PIT ticks
https://github.com/viti95/FastDoom/blob/73584 … M/d_main.c#L571

        start_time = mscount;

becomes something like

        start_time1 = TaskServiceCount;
        outp(0x43, 0x0); // latch PIT counter for zero channel
        start_time2 = inp(0x40) + (inp(0x40) << 8); // low byte of PIT counter + (high byte of PIT counter SHL 8)

and https://github.com/viti95/FastDoom/blob/73584 … M/d_main.c#L604

       end_time = mscount - start_time;

becomes

        outp(0x43, 0x0); // latch PIT counter for zero channel
        end_time = start_time1 - TaskServiceCount + start_time2 - (inp(0x40) + (inp(0x40) << 8));

I'm sure I made a mistake somewhere in there 😀
/Edit: sure did, plenty 😀
For starters, the PIT counts down: https://wiki.osdev.org/Programmable_Interval_Timer https://www.xtof.info/Timing-on-PC-familly-under-DOS.html
so it's more like

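        outp(0x43, 0x0); // latch PIT counter for zero channel
        start_time = TaskServiceCount + TaskServiceRate - (inp(0x40) + (inp(0x40) << 8)); // counter counts down; the low byte must be read before the high byte
        outp(0x43, 0x0); // latch again before the second reading
        end_time = TaskServiceCount + TaskServiceRate - (inp(0x40) + (inp(0x40) << 8)) - start_time;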

TaskServiceCount overflows (with a long, would that be every hour?); it looks like it's taken care of here https://github.com/viti95/FastDoom/blob/d0e32 … /ns_task.c#L232

    TaskServiceCount += TaskServiceRate;
    if (TaskServiceCount > 0xffffL)
        TaskServiceCount &= 0xffff;

so we just need some clever logic around start_time/end_time to take that into consideration
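One possible shape for that logic, as a sketch only: because the quoted code masks TaskServiceCount with 0xffff, a timestamp built from it is only known modulo 65536, and a masked unsigned subtraction recovers the interval across a wrap. The function name elapsed_mod16 is hypothetical.

```c
/* Elapsed PIT ticks between two timestamps that are only known modulo
 * 65536. The result is correct only while the real interval is shorter
 * than 65536 ticks. */
static unsigned long elapsed_mod16(unsigned long start, unsigned long end)
{
    return (end - start) & 0xffffUL; /* unsigned subtraction wraps as needed */
}
```

The limit matters here: 65536 ticks at ~1.193182 MHz is about 55 ms, so any frame slower than roughly 18 fps would alias. Measuring frames like the 14 fps max-detail run would therefore need a wider tick total, for example a separate counter that is never masked, which is an assumption and not something the quoted code provides.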

no more 1ms interrupts, highest precision possible (single PIT ticks at ~1.193182MHz), all related _time variables should be upgraded from int to longs. Final timings can be calculated back to ms or just truncated a bit to fit in int frametime array.
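The arithmetic in this proposal can be sketched as two pure functions, which keeps it testable away from the hardware. The names pit_timestamp and pit_ticks_to_us are hypothetical. The sketch assumes that the two bytes were read from port 0x40 into separate variables after latching (C does not guarantee which of two inp() calls in one expression runs first), that the counter decrements once per input clock, and that the compiler has a 64-bit unsigned long long. The quoted setup writes 0x36, which selects PIT mode 3 where the count steps by two, so a real implementation would have to account for that or reprogram the mode.

```c
#define PIT_HZ 1193182UL /* 14.31818 MHz / 12, rounded */

/* Timestamp in PIT ticks.
 * isr_ticks: tick total accumulated by the interrupt handler
 * reload:    divisor programmed into the PIT (ticks per interrupt)
 * lo, hi:    low and high byte of the latched counter, read in that order
 * The counter counts down from reload, so (reload - latched) ticks have
 * passed since the last interrupt. */
static unsigned long pit_timestamp(unsigned long isr_ticks, unsigned long reload,
                                   unsigned int lo, unsigned int hi)
{
    unsigned long latched = (unsigned long)(lo & 0xffu)
                          | ((unsigned long)(hi & 0xffu) << 8);
    return isr_ticks + (reload - latched);
}

/* Convert a tick interval to microseconds, rounding down. The 64-bit
 * intermediate avoids overflow where unsigned long is only 32 bits. */
static unsigned long pit_ticks_to_us(unsigned long ticks)
{
    return (unsigned long)(((unsigned long long)ticks * 1000000ULL) / PIT_HZ);
}
```

A frametime would then be the difference between two pit_timestamp values, converted with pit_ticks_to_us (or divided further to milliseconds) before being stored in the frametime array.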

ViTi95 wrote on 2023-07-07, 09:43:

The first frame in any demo run always takes a lot of time, so much that it basically destroys the whole 1% low average (we are talking more than 500 ms). That's why I decided not to take it into account.

but don't you already skip it here with that +1?

fix_start = frametime_position - benchmark_gametics + 1;

ViTi95 wrote on 2023-07-07, 09:43:

0.1% is also quite important

That's true, but the lack of frames in demos makes this value close to useless. For example, demo1 of Ultimate Doom is just 1710 frames. For the 1% low, only 17 frames are used for the average, and the 0.1% low will use just 1 frame.

perfect, so we will know what is the timeframe of the slowest frame in the demo 😀 our absolute bottom performance

Not sure if this has been discussed here before, but I found that Wolfenstein 3d has additional optimizations for wall pixel scaling, that's why it has a "Thinking" screen every time the viewport is resized. To what extent is this applicable to Doom engine?

(page 178)

ujav wrote on 2023-07-12, 00:36:

Not sure if this has been discussed here before, but I found that Wolfenstein 3d has additional optimizations for wall pixel scaling, that's why it has a "Thinking" screen every time the viewport is resized. To what extent is this applicable to Doom engine?

(page 178)

Those optimizations wouldn't work in the way you expect in DOOM as the engine has different wall heights and its textures are varied in width and height.

Time for a new release! FastDoom 0.9.7

This has taken quite a long time to develop, but comes with some cool things:

* Add CPU selection for different render paths
* Optimized flat visplane rendering (handcrafted ASM)
* Optimized column rendering for some CPUs (Ken Silverman)
* Upgrade FPS display, now it's possible to show FPS on-screen and on a debug card at the same time
* Debug card port is now selectable on fdoom.cfg file. Full support for 4 digit debug cards
* Fixed issue #148. Now it's possible to use SB-MIDI and Sound Blaster Direct Mode without crashing
* Upgrade display menu, now it's possible to select options using left/right keys (previous/next)
* Optimized fuzz column rendering (handcrafted ASM)
* Optimized fuzz flat column rendering (handcrafted ASM)
* Optimized backbuffered non-VGA modes. EGA 320x200 16-color mode should be much faster now
* New in-game menu for benchmarks. Now it's possible to execute multiple benchmarks without reloading the game
* Automatically detect MDA/Hercules video card in FDSetup
* Updated bench.bat (new in-game benchmark options). Also new FDBench executable to make it easier to launch benchmarks from the command line (FDBENCH.EXE)
* New advanced benchmark, which stores frametimes on file ftime.csv. Also calculates 1% low and 0.1% low frametimes, which are saved on bench.csv file. Now you will be able to create videos like Digital Foundry 😁
* New commandline parameter to execute benchmarks:
- Single benchmark with current parameters: -benchmark single [demofile] [-advanced]
- Multiple benchmark: -benchmark file [demofile] [benchmarkfile] [-advanced]
- [-advanced] is optional (frametimes)

https://github.com/viti95/FastDoom/releases/d … tDoom.0.9.7.zip


Forgot to say that with this new release you will be able to create new benchmark scripts easily and run them without effort. Still have to create a Wiki entry to explain how to do it, and very probably will make a video to explain the new features.


Just tested it. Still seems like the music volume is super low when selecting Adlib sound for me. I know Max the Rabbit also noticed this, but I can't help but wonder if it's limited to some specific condition?

ViTi95 wrote on 2023-07-18, 07:14:

The Adlib volume is a known issue that I still haven't been able to solve 😔

Gotcha. No worries. I wanted to find out if I was missing something. Thanks for all the hard work you do on this. I appreciate it and I know a lot of other people do as well.

I've discovered a new optimization for very slow VGA ISA cards on backbuffered modes (fdoom13h.exe, fdoomvbr.exe). Basically using differential copies to the VRAM is much faster, as the ISA bus becomes a huge bottleneck. This is the same optimization used for CGA/EGA modes:

This test was done with flat visplanes in full screen; there is also a performance advantage with regular visplane rendering. Differential 16-bit copies are a bit slower, since it's hard to find 16-bit screen matches in VGA modes.


Update: A bit faster with 32-bit reads + 8-bit writes

That looks awesome. Does it work by iterating over the entire framebuffer and finding pixel-wise differences between the current and previous frame (like "dirty rectangles" but with "dirty pixels" instead), or does it take geometry data into account? (so flat visplanes could significantly boost the "dirty pixels" calculation itself)


It iterates the whole framebuffer against a copy of the previous frame, pixel by pixel. It's optimized in ASM (works on groups of 4 pixels, a single 32-bit read and 4 comparisons/writes if required). Basically this minimizes the number of writes to the video card. Not very useful for VLB cards, but works great for 8-bit ISA cards
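In plain C, the idea described above could look roughly like this. It is a sketch, not the actual ASM routine: diff_copy and its parameters are hypothetical names, and vram here is an ordinary buffer standing in for the video memory behind the slow bus.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy only the bytes that changed since the previous frame.
 * cur:  frame just rendered (system RAM)
 * prev: copy of the last frame sent to the card, updated here
 * vram: destination buffer (stand-in for video memory)
 * n:    size in bytes, must be a multiple of 4
 * Returns the number of single-byte writes made to vram. */
static size_t diff_copy(const uint8_t *cur, uint8_t *prev, uint8_t *vram, size_t n)
{
    size_t writes = 0, i, j;

    for (i = 0; i < n; i += 4) {
        uint32_t a, b;
        memcpy(&a, cur + i, 4);  /* one 32-bit read per group of 4 pixels */
        memcpy(&b, prev + i, 4);
        if (a == b)
            continue;            /* all four pixels unchanged, no bus traffic */
        for (j = i; j < i + 4; j++) {
            if (cur[j] != prev[j]) {
                vram[j] = cur[j]; /* 8-bit write, only where needed */
                prev[j] = cur[j];
                writes++;
            }
        }
    }
    return writes;
}
```

The extra reads and compares all happen in system RAM, which is why the approach pays off when the bus to the card is the bottleneck and less so on faster buses.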


Does FPU presence affect fastdoom performance? For example 80486SX vs 80486DX CPUs with same clock speed. How much?

No, FastDoom doesn't use the FPU at all, so no difference between a 486SX and a 486DX.


nice work

i'm gonna have to dig out my 486 and try this

