FastDoom. A new Doom port for DOS, optimized to be as fast as possible for 386/486 personal computers!

Reply 1220 of 1249, by digger

Posted on 2025-05-14, 22:45

Rank: Oldbie
Posts: 1450
Joined: 2010-02-12, 18:15
Location: Amsterdam, the Netherlands

analog_programmer wrote on 2025-05-14, 11:00:

rasteri wrote on 2025-05-14, 09:01:

FastDoom is written for 32-bit processors; it doesn't really make sense to have 16-bit binaries compiled from the same codebase. It would just be a total mess.

I thought it's mainly written in C, not 32-bit specific assembly.

Even if something (particularly a protected mode DOS game) is written in C, it would require extensive work to port it to 8086 real mode. For one thing, the pointer size does matter, even in C.
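To make the pointer-size point concrete, here is a minimal sketch in plain standard C (not FastDoom code; the function names are invented for illustration). It assumes the usual 8086 real-mode rule that a physical address is segment × 16 + offset, and shows that a 16-bit offset on its own cannot index past 64 KiB:

```c
#include <stdint.h>

/* Real-mode 8086 address: a 16-bit segment and a 16-bit offset combine
   into a 20-bit physical address (segment * 16 + offset). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* A near (16-bit) offset wraps at 64 KiB, so a flat 32-bit index
   loses its upper bits when it is squeezed into one. */
uint16_t truncate_to_near_offset(uint32_t flat_index)
{
    return (uint16_t)(flat_index & 0xFFFFu);
}
```

For example, real_mode_linear(0xA000, 0) gives 0xA0000, while a flat index of 100000 collapses to 34464 as a near offset. Code that assumes any object can be reached through one flat pointer has to be reworked around that limit.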

Also, the game wouldn't fit in conventional memory, which means that it would have to use EMS for the memory, which requires application-level memory management. Modifying a C program to use EMS memory in real mode instead of memory that can be directly allocated the usual way in protected mode, that's quite the ordeal.

I think this prospect was exactly why John Carmack opted to develop Doom as a 32-bit protected mode game in the first place.

There's really no point in doing this, since Doom (and even FastDoom) doesn't even run smoothly on a 386, let alone on anything older than that.

Reply 1221 of 1249, by analog_programmer

Posted on 2025-05-15, 04:33

Rank: Oldbie
Posts: 1370
Joined: 2023-06-22, 16:38
Location: eussr-idiocracy inhabitant

digger wrote on 2025-05-14, 22:45:

Even if something (particularly a protected mode DOS game) is written in C, it would require extensive work to port it to 8086 real mode...

Where did you see real mode porting? There are some 286 16-bit DOS extenders (TNT DOS extender, DOS/16M), even if they're very hard to find nowadays.

The word Idiot refers to a person with many ideas, especially stupid and harmful ideas.
This world goes south since everything's run by financiers and economists.
This isn't voice chat, yet some people overusing online communications talk and hear voices.

Reply 1222 of 1249, by 7F20

Posted on 2025-05-15, 14:58

Rank: Member
Posts: 303
Joined: 2015-04-10, 04:06

analog_programmer wrote on 2025-05-15, 04:33:
There are some 286 16-bit DOS extenders (TNT DOS extender, DOS/16M), even if they're very hard to find nowadays.

80286 DOS extenders introduce significant processing overhead because they act as an intermediary between DOS and the application. They are also incompatible with anything that has a device driver with DMA or TSRs.

Not that the original DOOM code (even in the form of FastDoom) could be made to run on something as slow as a 286 to begin with.

Any usable 286 version of DOOM would have to be a ground-up rewrite, and it would be a miracle if someone actually could do it and make it playable (by playable, I mean anything more than a slideshow).

Reply 1223 of 1249, by MrFlibble

Posted on 2025-05-15, 15:21

Rank: Oldbie
Posts: 1116
Joined: 2013-06-07, 19:56

7F20 wrote on 2025-05-15, 14:58:

Not that the original DOOM code (even in the form of FastDoom) could be made to run on something as slow as a 286 to begin with.

Any usable 286 version of DOOM would have to be a ground-up rewrite, and it would be a miracle if someone actually could do it and make it playable (by playable, I mean anything more than a slideshow).

I suppose you're talking about something like this?

DOS Games Archive | Free open source games | RGB Classic Games

Reply 1224 of 1249, by 7F20

Posted on 2025-05-15, 15:26

Rank: Member
Posts: 303
Joined: 2015-04-10, 04:06

MrFlibble wrote on 2025-05-15, 15:21:

I suppose you're talking about something like this?

Yeah, that's probably about as close as anyone will get, although it's a rewrite of the GBA Doom, not the OG game.

But, yes, you make a good point. If anyone wants to play Doom on a 286, that already exists.

Reply 1225 of 1249, by analog_programmer

Posted on 2025-05-15, 15:52

Rank: Oldbie
Posts: 1370
Joined: 2023-06-22, 16:38
Location: eussr-idiocracy inhabitant

analog_programmer wrote on 2025-05-14, 08:22:

Any chance we'll ever see a FastDoom version/port with 286 support through something like DOS|286 (TNT DOS) Extender or DOS/16M?

I know about Doom8088 and RealDoom ports, but they still seem so immature.

I clearly mean a FastDoom optimized and adjusted for low-res EGA, CGA or Hercules modes, with a 16-bit DOS extender, not a crippled console Doom version port like Doom8088 in real mode. And of course the deniers here are trying to twist my words even though they are written down.

16-bit extenders... what?

As for 7F20's statement about 286 DOS extenders: it's just a(n empty) statement. Here is an archived contrary opinion on the Rational Systems DOS/16M extender from 1989:


We are using this extensively for a major network product. I recommend it
highly for **most** applications that need lots of memory. In general, a
program such as a spreadsheet or database that is more *applications* oriented
is much easier to port to a protected mode / real mode environment. A TSR or
heavy systems projects such as a network are much more difficult to port.

DOS/16M (as it is called) provides several
object files that are linked in with your Microsoft C code. It also provides
a "post linker" called 'makepm' that creates a protected mode version of
your executable. A very nice source debugger is included. It looks similar to
a combination of CodeView and Periscope. It only works on protected mode
portions of your code. I recommend using both Rational's debugger and
Periscope (for real mode portions).

When you run your program, a minimum of 24K will still reside in real mode
memory space (below 1meg.). This is their run-time kernel. Memory can be
allocated in PM memory space or RM memory space. It can also be allocated as
'transparent'. This means that this memory can be both referenced by a PM
selector that has the same value as the physical address. This allows both
real mode and protected mode portions of the application to access the same
memory without converting a real mode segment to a selector. An example:

For time critical stuff (like a real-time interrupt handler) you MUST
avoid the mode switching time on each interrupt. It can take anywhere between
90 and 1100 microseconds to do a round trip from RM to PM and back again.
In this example you must implement a 'bi-modal' interrupt handler. You must
implement a handler in PM and another handler in RM. Protected mode has its
own int vector table called the IDT. DOS/16M supports calls to do this type
of stuff. It can get quite hairy with real-time systems stuff as you can see.

Here is a list of the functions that are provided in their interface:

Interrupt Handling Functions
----------------------------
D16IntTransparent -- Installs BIOS interrupt handler in a PM vector
D16Passdown -- Sets a protected-mode interrupt vector to pass down.
(A passdown int, is one that is signaled in PM. The system will
then switch to RM and resignal the interrupt).
D16Passup -- Sets a real-mode interrupt vector to pass up. (The opposite of
pass-down)
D16pmGetVector -- Gets the selector and offset for a PM vector.
D16pmInstall -- Installs a PM vector
D16rmGetVector -- Gets the segment and offset for a RM interrupt vector
D16rmInstall -- Installs a RM vector
D16rmInterrupt -- Signals a real-mode interrupt and sets the registers

Memory Management Functions
---------------------------
D16AbsAddress -- Returns the absolute address of a PM pointer
D16ExtAvail -- Returns bytes of extended memory available for allocation
D16GetAccess -- Returns the access byte for a protected-mode pointer
D16HugeAlloc -- Allocates a block of memory (very large)
D16LowAvail -- Returns the number of bytes in largest block of DOS managed
memory
D16MemAlloc -- Allocates a data segment
D16MemFree -- Frees a PM segment and cancels its selector
D16MemStrategy -- Sets strategy used for memory allocation (ForceLow, ForceHi,
HiFirst, LowFirst, Transparent)
D16ProtectedPtr -- Creates a PM pointer from a RM pointer
D16RealPtr -- Creates a RM pointer from a PM pointer
D16SegAbsolute -- Creates a PM selector for a given address
D16SegCancel -- Cancels a PM selector
D16CSAlias -- Creates a selector of type CODE
D16SegDataPtr -- Creates a data selector whose base is offset of a given ptr.
D16SetProtect -- Specifies whether a segment is read-only or read/write
D16SegRealloc -- Compares segments allocation with current memory strategy
D16SegTransparent -- Returns a transparent pointer to a RM segment
D16SetAccess -- Sets the type or access byte for a PM segment.

Process Management Functions
----------------------------
D16MoveStack -- Switches the location of the cpu stack
D16rmCall -- Switches to real mode, loads registers from given structure
D16ToProtected -- Switches the cpu to protected mode
D16ToReal -- Switches the cpu to real mode, translates segment registers
_intflag -- Sets the cpu interrupts enabled flag
_is_pm -- Returns nonzero if cpu is in protected mode

I could go on..and on... but I am running out of energy. In general I am
very happy with the product. The support from Rational Systems has been very
personal and very thorough. Price: I think you would need to talk to them
because they arrange different prices based on the volume of sales, source
code stuff, and other things. I think $5000 + royalties sounds familiar for
a typical arrangement. Please call them for a definitive price.

If you have any other questions, or any specific questions on this stuff feel
free to e-mail me @ rd...@tops.sun.com.

Ok, my mistake to ask about the "impossible" port.


Reply 1226 of 1249, by MrFlibble

Posted on 2025-05-15, 17:56

Rank: Oldbie
Posts: 1116
Joined: 2013-06-07, 19:56

7F20 wrote on 2025-05-15, 15:26:

Yeah, that's probably about as close as anyone will get, although it's a rewrite of the GBA Doom, not the OG game.

I think it's worth noting that the GBA Doom in question is not the original commercial GBA version of Doom, but a homebrew open source port based on PrBoom. So it is still derived from the original PC Doom code, however altered, while Doomwiki says the commercial GBA Doom was based on the Atari Jaguar code at Carmack's insistence (and originally could've used its own custom engine altogether).


Reply 1227 of 1249, by theelf

Posted on 2025-05-15, 22:56

Rank: Oldbie
Posts: 1081
Joined: 2011-09-25, 19:39

A 286 DOOM, the real Doom, with VGA, AdLib/SB sound, etc., would be the best thing ever. I really miss this, but I know how difficult it would be.

I hate it when people say "Doom runs on everything". Fuck no! It doesn't run on one of the most important CPUs of the era, the 286. Man, I remember I needed to buy a 386 just for Doom; otherwise I was happy with my Harris 286 at 25 MHz.

Reply 1228 of 1249, by leileilol

Posted on 2025-05-16, 01:17

Rank: l33t++
Posts: 12436
Joined: 2006-12-16, 18:03

hobby project; check your entitlement, no one is guaranteed to rewrite doom entirely for a functionally shittier processor on a whim. Odd time to demand that when they're reporting to be busy with a car, like that won't contribute to burnout 🙄 (PUN UNINTENDED)

long live PCem

Reply 1229 of 1249, by joeguy3121

Posted on 2025-05-16, 05:17

Rank: Newbie
Posts: 85
Joined: 2022-06-13, 05:17

leileilol wrote on 2025-05-16, 01:17:

hobby project; check your entitlement, no one is guaranteed to rewrite doom entirely for a functionally shittier processor on a whim. Odd time to demand that when they're reporting to be busy with a car, like that won't contribute to burnout 🙄 (PUN UNINTENDED)

Excuse me, this is off topic, but is there another way I can contact you, maybe e-mail, since your profile doesn't allow private messages?
Thanks

Reply 1230 of 1249, by xcomcmdr

Posted on 2025-05-16, 05:56

Rank: Oldbie
Posts: 1053
Joined: 2009-09-19, 01:03
Location: France

leileilol wrote on 2025-05-16, 01:17:

hobby project; check your entitlement, no one is guaranteed to rewrite doom entirely for a functionally shittier processor on a whim. Odd time to demand that when they're reporting to be busy with a car, like that won't contribute to burnout 🙄 (PUN UNINTENDED)

Yeah, FastDoom is amazing, but come on, there are *hardware requirements*!

Especially some measure of raw CPU performance, and protected mode. Doom for < 486 isn't FastDoom.

If one really wants it, they better start coding.

Reply 1231 of 1249, by Rav

Posted on 2025-06-27, 20:37

Rank: Member
Posts: 172
Joined: 2023-02-05, 05:42

While I was working on a "Fast Heretic", partly by looking at FastDoom code, I ended up with a better optimization for commit https://github.com/viti95/FastDoom/commit/e74 … 41fea89cd7f03ad

While applying the modification to the Heretic code, I ended up with lower performance (realticks went from ~3004 to ~3050; I reran the benchmark a few times to confirm).
It turns out that you do one check for the bottom sil and one check for the top sil, and if the check is true for both top and bottom, you end up running the same FOR loop twice.
The initial Heretic code, by contrast, checks beforehand whether it needs to clip the bottom and/or the top; if both need to be clipped, it takes a different branch and runs the FOR loop only once.

After refactoring the patch to apply it to Heretic while keeping the other part of the optimization, the realticks went down to ~2989 on average.

Here is the specific part of the code, in R_THINGS.C, R_DrawSprite function:

//
// clip this piece of the sprite
//
		silhouette = ds->silhouette;
		if (spr->gz >= ds->bsilheight)
			silhouette &= ~SIL_BOTTOM;
		if (spr->gzt <= ds->tsilheight)
			silhouette &= ~SIL_TOP;

		if (silhouette == 1)
		{	// bottom sil
			for (x=r1 ; x<=r2 ; x++)
				if (clipbot[x] == viewheight)
					clipbot[x] = ds->sprbottomclip[x];
		}
		else if (silhouette == 2)
		{	// top sil
			for (x=r1 ; x<=r2 ; x++)
				if (cliptop[x] == -1)
					cliptop[x] = ds->sprtopclip[x];
		}
		else if (silhouette == 3)
		{	// both
			for (x=r1 ; x<=r2 ; x++)
			{
				if (clipbot[x] == viewheight)
					clipbot[x] = ds->sprbottomclip[x];
				if (cliptop[x] == -1)
					cliptop[x] = ds->sprtopclip[x];
			}
		}

Reply 1232 of 1249, by ViTi95

Posted on 2025-07-01, 14:58

Rank: Oldbie
Posts: 563
Joined: 2017-02-14, 22:18

That's really interesting! It sounds like a solid optimization. If you're up for it, feel free to open a pull request so we can integrate your changes into FastDoom. I'm currently quite busy with this year's racing championship (building LoRa-based communication devices and learning FreeCAD to model the enclosures), so I haven't had much time to work on the code myself.

https://www.youtube.com/@viti95

Reply 1233 of 1249, by Frenkel

Posted on 2025-07-02, 20:27

Rank: Newbie
Posts: 28
Joined: 2023-12-10, 19:36

There's a comment under the video Exploring The Other GBA Doom Port that says the result of P_CheckSight can be cached. If the input is the same as in the previous call, then return the previous result.
I've tried it in Doom8088, but it doesn't work all the time.

However the idea of caching the previous result sounds promising.
I've tried it in several functions, but only in R_PointInSubsector did it improve performance.
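A minimal sketch of that "return the previous result if the input is the same" idea, in standard C. This is not Doom8088 or FastDoom code: expensive_lookup is a hypothetical stand-in for a function such as R_PointInSubsector, and lookup_calls exists only to make the saved work visible.

```c
#include <stdbool.h>

/* Counts how many times the real computation runs (illustration only). */
int lookup_calls = 0;

/* Hypothetical stand-in for an expensive function whose result
   depends on nothing but its two arguments. */
int expensive_lookup(int x, int y)
{
    lookup_calls++;
    return (x >> 4) * 31 + (y >> 4);
}

/* One-entry cache: if the arguments match the previous call,
   return the previous result without recomputing it. */
int cached_lookup(int x, int y)
{
    static bool valid = false;
    static int last_x, last_y, last_result;

    if (valid && x == last_x && y == last_y)
        return last_result;

    last_x = x;
    last_y = y;
    last_result = expensive_lookup(x, y);
    valid = true;
    return last_result;
}
```

A cache like this is only safe when the result depends on the arguments alone. If the function also reads state that changes between calls, a stale answer can be returned, which may be why the trick was unreliable for P_CheckSight.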

Reply 1234 of 1249, by Yoghoo

Posted on 2025-08-25, 14:23

Rank: Member
Posts: 457
Joined: 2021-07-17, 18:32
Location: Netherlands

I downloaded FastDoom 1.1.5 and I'm trying to do some benchmarks with bench.bat. But when starting FDOOM* (tried 5 versions) it says that it can't find demo1 (or 2, 3, 4). I tried 3 WAD versions (shareware, Ultimate and 1.9) but that didn't solve it. So no benchmark is generated.

What am I missing?

I found out this is a known message and can be ignored. I thought it was not doing a benchmark as it was repeating itself over and over again. But it seems that bench.bat does 10 timedemos in a row, which was way more than I expected. 😀

Reply 1235 of 1249, by ViTi95

Posted on 2025-10-07, 09:59

Rank: Oldbie
Posts: 563
Joined: 2017-02-14, 22:18

Hi everyone!

After some time without posting updates here, I’ve got something I think is quite interesting. I’ve been reviewing the span rendering code (R_DrawSpan...) for the backbuffered and VBE2 direct modes, and I managed to create a new optimized version specifically for Pentium processors.

On Pentium CPUs, the SHLD instruction is quite slow (4 cycles) and cannot be paired with other instructions in the U and V pipelines. So what I did was to create a new version that replaces those instructions with simpler combinations that can pair in the U and V pipes (MOV, SHL, AND, OR, etc.). Here are the results I’ve obtained:

Pentium 75:
- FDOOM13H i486: 67.670
- FDOOM13H Pentium: 72.109 (+6.5%)
- FDOOMVBR i486: 80.050
- FDOOMVBR Pentium: 86.337 (+7.8%)

Pentium 100:
- FDOOM13H i486: 85.518
- FDOOM13H Pentium: 90.985 (+6.3%)
- FDOOMVBR i486: 111.096
- FDOOMVBR Pentium: 119.862 (+7.8%)

Pentium 133:
- FDOOM13H i486: 95.349
- FDOOM13H Pentium: 99.196 (+4.0%)
- FDOOMVBR i486: 127.431
- FDOOMVBR Pentium: 134.398 (+5.4%)

Pentium 166:
- FDOOM13H i486: 101.201
- FDOOM13H Pentium: 102.583 (+1.3%)
- FDOOMVBR i486: 139.099
- FDOOMVBR Pentium: 140.108 (+0.7%)

Pentium 200:
- FDOOM13H i486: 110.371
- FDOOM13H Pentium: 110.012 (−0.4%)
- FDOOMVBR i486: 158.691
- FDOOMVBR Pentium: 157.032 (−1.1%)
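The percentages in these lists are consistent with the Pentium code path's score measured against the i486 code path on the same machine. A small sketch of that arithmetic follows; the helper name is invented here, and reading a higher score as better is an assumption on my part, not something stated in the post.

```c
#include <assert.h>

/* Relative change of `candidate` over `baseline`, in percent.
   A positive value means the candidate scored higher. */
double percent_change(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (candidate / baseline - 1.0) * 100.0;
}
```

With the Pentium 75 figures, percent_change(67.670, 72.109) comes out at roughly 6.56, listed above as +6.5%.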

Analyzing the results, it seems that the new code performs well, but the speedup decreases as CPU frequency increases, even becoming slightly slower on the Pentium 200. This made me think “maybe I’m hitting an architectural bottleneck”, so I decided to test on the Pentium MMX:

Pentium 166 MMX:
- FDOOM13H i486: 105.628
- FDOOM13H Pentium: 110.551 (+4.6%)
- FDOOMVBR i486: 148.413
- FDOOMVBR Pentium: 156.668 (+5.5%)

Pentium 200 MMX:
- FDOOM13H i486: 115.264
- FDOOM13H Pentium: 118.393 (+2.7%)
- FDOOMVBR i486: 169.006
- FDOOMVBR Pentium: 175.136 (+3.6%)

Indeed, the MMX models seem to solve the bottleneck issue that affects the non-MMX models. I also ran tests on other architectures; some with good results, others not so good:

IBM 6x86 PR166+ (133MHz):
- FDOOM13H i486: 92.227
- FDOOM13H Pentium: 93.502 (+1.3%)
- FDOOMVBR i486: 119.018
- FDOOMVBR Pentium: 120.934 (+1.6%)

AMD K5 PR100 (SSA5, 100MHz):
- FDOOM13H i486: 74.988
- FDOOM13H Pentium: 74.822 (−0.3%)
- FDOOMVBR i486: 88.775
- FDOOMVBR Pentium: 88.311 (−0.6%)

AMD K5 PR133 (5k86, 100MHz):
- FDOOM13H i486: 96.094
- FDOOM13H Pentium: 95.822 (−0.3%)
- FDOOMVBR i486: 125.074
- FDOOMVBR Pentium: 124.383 (−0.6%)

In the case of the Cyrix/IBM 6x86, the difference is minimally better, but I’d need to test more models at different frequencies to draw a solid conclusion (I only have this one). As for the K5, results are slightly worse, but again, more frequencies would be needed to confirm that.

And now for the one that completely surprised me, the IDT WinChip! This new code shouldn’t be faster on that CPU, since it has a design closer to a 486 than a Pentium… but the results speak for themselves:

IDT WinChip C6 200:
- FDOOM13H i486: 84.397
- FDOOM13H Pentium: 106.209 (+25.8%)
- FDOOMVBR i486: 110.371
- FDOOMVBR Pentium: 150.061 (+35.9%)

The conclusion I can draw from this CPU is that SHLD/SHRD instructions are extremely slow on the WinChip, and should be avoided whenever possible.
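As a rough illustration of what is being replaced (this is not FastDoom's actual assembly, and shld32 is a name invented here): for 32-bit operands and shift counts from 1 to 31, SHLD produces the same value as a left shift, a right shift and an OR.

```c
#include <assert.h>
#include <stdint.h>

/* Value produced by x86 `SHLD dest, src, count` on 32-bit operands:
   dest shifts left and the vacated low bits are filled from the
   top bits of src. This sketch handles counts 1..31 only. */
uint32_t shld32(uint32_t dest, uint32_t src, unsigned count)
{
    assert(count >= 1 && count <= 31);
    return (dest << count) | (src >> (32u - count));
}
```

For instance, shld32(0x12345678, 0x9ABCDEF0, 8) yields 0x3456789A. In assembly the same result would come from separate simple instructions (such as SHL, SHR and OR) instead of one SHLD. Whether a given sequence actually pairs in the U and V pipes depends on register dependencies, which this sketch does not model.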

I still need to test more architectures such as the K6, K7, or Pentium II (or maybe some rarer ones like the Transmeta). If anyone can help me with that, I’d really appreciate it. In theory, it should also be faster on the K6 since it decodes SHLD/SHRD poorly and they’re not very parallelizable. I’m attaching a compiled version that includes the new code. If you want to check and compare your results with mine, please use the included CFG (Ultimate Doom 1.9 is required but not included). The commands I used are:

fdoom13h -timedemo demo3 -i486
fdoom13h -timedemo demo3 -pentium
fdoomvbr -timedemo demo3 -i486
fdoomvbr -timedemo demo3 -pentium


Reply 1236 of 1249, by GigAHerZ

Posted on 2025-10-07, 10:49

Rank: Oldbie
Posts: 1141
Joined: 2018-12-17, 15:35
Location: Estonia

ViTi95 wrote on 2025-10-07, 09:59:

On Pentium CPUs, the SHLD instruction is quite slow (4 cycles) and cannot be paired with other instructions in the U and V pipelines. So what I did was to create a new version that replaces those instructions with simpler combinations that can pair in the U and V pipes (MOV, SHL, AND, OR, etc.).

Is it similar to what the Quake engine did? Kind of like a "dual core" effect where some instructions can run in parallel? (I think they were mostly able to run some FPU stuff in parallel.)

Amazing work! After every post of yours I have to search for my jaw under the table...

"640K ought to be enough for anybody." - And i intend to get every last bit out of it even after loading every damn driver!
A little about software engineering: https://byteaether.github.io/

Reply 1237 of 1249, by marxveix

Posted on 2025-10-07, 11:24

Rank: Oldbie
Posts: 720
Joined: 2018-03-05, 21:46

Did the AMD K5 PR166 and PR200 get a new core or tweaks, or was that already the case with the K5 PR133?

If possible, test with the fastest one, the 233 MMX, as well.

Best ATi Rage3 drivers for 3DCIF / Direct3D / OpenGL / DVD : ATi RagePro drivers and software
30+MiniGL / OpenGL Win 9x dll files for all ATi Rage3 cards : Re: ATi RagePro OpenGL files

Reply 1238 of 1249, by ViTi95

Posted on 2025-10-07, 13:18

Rank: Oldbie
Posts: 563
Joined: 2017-02-14, 22:18

@GigAHerZ Yes, it’s the same idea as Quake’s dual FPU instruction execution, but simpler since there are no special tricks involved. I just replaced and reordered some assembly code. The Pentium has two execution pipes that can run certain instructions in parallel, and SHLD isn’t one of them.

@marxveix The K5 PR133 uses the new core and runs at the same frequency as the K5 PR100, so I’m directly comparing both at the same MHz. I can’t test the Pentium 233 MMX since I don’t have one.


Reply 1239 of 1249, by 7F20

Posted on 2025-10-07, 22:03

Rank: Member
Posts: 303
Joined: 2015-04-10, 04:06

ViTi95 wrote on 2025-10-07, 09:59:

Pentium 75:
- FDOOM13H i486: 67.670
- FDOOM13H Pentium: 72.109 (+6.5%)
- FDOOMVBR i486: 80.050
- FDOOMVBR Pentium: 86.337 (+7.8%)

I feel pretty stupid here, but I don't understand what the numbers mean. I'm sure it's obvious and I'm simply failing to wrap my head around it. Are these increases relative to previous benchmark numbers you have in another post or on a website? I don't see them on the Git.

But I guess I can assume that this:
FDOOM13H i486: 67.670
FDOOM13H Pentium: 72.109 (+6.5%)

means that i486 showed no improvement with the new code, but Pentium expressed a 6.5% increase over your previous benchmark?

TIA
