VOGONS


Reply 1220 of 1241, by digger

Rank Oldbie
analog_programmer wrote on 2025-05-14, 11:00:
rasteri wrote on 2025-05-14, 09:01:

Fastdoom is written for 32-bit processors; it doesn't really make sense to have 16-bit binaries compiled from the same codebase. It would just be a total mess.

I thought it's mainly written in C, not 32-bit specific assembly.

Even if something (particularly a protected mode DOS game) is written in C, it would require extensive work to port it to 8086 real mode. For one thing, the pointer size does matter, even in C.
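Here is a minimal sketch of why pointer size matters, assuming the usual 8086 segment:offset scheme. A real-mode far pointer is a 16-bit segment plus a 16-bit offset, the physical address is segment * 16 + offset, and different far pointers can alias the same byte. The farptr_t type and helper names below are hypothetical. They model the arithmetic with plain integers rather than any DOS compiler's far or huge pointer support.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode far pointer: 16-bit segment plus 16-bit offset, versus the
   single flat 32-bit pointer a protected-mode build uses. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} farptr_t;

/* Physical address = segment * 16 + offset, wrapped to 20 bits (1 MB) as on an 8086. */
static uint32_t far_to_linear(farptr_t p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* "Normalize" a pointer so its offset is 0..15, similar in spirit to what
   16-bit compilers do for huge pointers that walk objects larger than 64 KB. */
static farptr_t far_normalize(farptr_t p)
{
    uint32_t lin = far_to_linear(p);
    farptr_t n = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xF) };
    return n;
}

/* Advance by a byte count through the segment as well as the offset; a plain
   far pointer increment only changes the offset and wraps inside 64 KB. */
static farptr_t far_add(farptr_t p, uint32_t bytes)
{
    uint32_t lin = far_to_linear(p) + bytes;
    assert(lin <= 0xFFFFFu); /* stay inside the 1 MB real-mode address space */
    farptr_t n = { (uint16_t)(lin >> 4), (uint16_t)(lin & 0xF) };
    return n;
}
```

For example, 1234:0010 and 1235:0000 both resolve to physical address 0x12350. That is why code that treats pointers as single flat values needs rework when ported to real mode.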

Also, the game wouldn't fit in conventional memory, which means that it would have to use EMS for the memory, which requires application-level memory management. Modifying a C program to use EMS memory in real mode instead of memory that can be directly allocated the usual way in protected mode, that's quite the ordeal.
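Here is what "application-level memory management" means in practice. Under LIM EMS the program only sees the standard 64 KB page frame, made of four 16 KB physical pages. It must explicitly map each 16 KB logical page into a slot before touching it; real code does this via INT 67h, function 44h. The sketch below only simulates that bookkeeping with a naive oldest-first eviction policy. The ems_frame_t type and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EMS_PAGE_SIZE   16384u  /* EMS pages are 16 KB */
#define EMS_FRAME_PAGES 4       /* the standard 64 KB page frame holds four physical pages */

/* Hypothetical bookkeeping: which logical page currently sits in each frame slot. */
typedef struct {
    int      logical[EMS_FRAME_PAGES]; /* -1 means the slot is empty */
    int      next_victim;              /* oldest-first (FIFO) replacement */
    unsigned map_calls;                /* simulated remaps we had to pay for */
} ems_frame_t;

static void ems_init(ems_frame_t *f)
{
    for (int i = 0; i < EMS_FRAME_PAGES; i++)
        f->logical[i] = -1;
    f->next_victim = 0;
    f->map_calls = 0;
}

/* Translate a byte offset within a large EMS allocation into a frame slot plus
   an offset inside that 16 KB page, remapping if the page is not resident.
   Real code would issue INT 67h, AH=44h at the marked point; this only counts it. */
static int ems_touch(ems_frame_t *f, uint32_t byte_offset, uint16_t *page_off)
{
    int page = (int)(byte_offset / EMS_PAGE_SIZE);
    assert(page_off != 0);
    *page_off = (uint16_t)(byte_offset % EMS_PAGE_SIZE);

    for (int i = 0; i < EMS_FRAME_PAGES; i++)
        if (f->logical[i] == page)
            return i;                     /* already mapped: no extra cost */

    int slot = f->next_victim;            /* evict the slot mapped longest ago */
    f->next_victim = (slot + 1) % EMS_FRAME_PAGES;
    f->logical[slot] = page;              /* <- the INT 67h map call goes here */
    f->map_calls++;
    return slot;
}
```

Whenever a loop crosses into a logical page that is not currently mapped, it pays for a remap. A flat protected-mode build never has that overhead.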

I think this prospect was exactly why John Carmack opted to develop Doom as a 32-bit protected mode game in the first place.

There's really no point in doing this, since Doom (and even FastDoom) doesn't even run smoothly on a 386, let alone on anything older than that.

Reply 1221 of 1241, by analog_programmer

Rank Oldbie
digger wrote on 2025-05-14, 22:45:

Even if something (particularly a protected mode DOS game) is written in C, it would require extensive work to port it to 8086 real mode...

Where did you see real mode porting? There are some 286 16-bit DOS extenders (TNT DOS Extender, DOS/16M), even if they're very hard to find nowadays.

The word Idiot refers to a person with many ideas, especially stupid and harmful ideas.
This world goes south since everything's run by financiers and economists.
This isn't voice chat, yet some people overusing online communications talk and hear voices.

Reply 1222 of 1241, by 7F20

Rank Member
analog_programmer wrote on 2025-05-15, 04:33:

There are some 286 16-bit DOS extenders (TNT DOS extender, DOS/16M), even if they're very hard to find nowadays.

80286 DOS extenders introduce significant processing overhead because they act as an intermediary between DOS and the application. They are also incompatible with TSRs and with device drivers that use DMA.

Not that the original DOOM code (even in the form of FastDoom) could be made to run on something as slow as a 286 to begin with.

Any usable 286 version of DOOM would have to be a ground-up rewrite, and it would be a miracle if someone actually could do it and make it playable (by playable, I mean anything more than a slideshow).

Reply 1223 of 1241, by MrFlibble

Rank Oldbie
7F20 wrote on 2025-05-15, 14:58:

Not that the original DOOM code (even in the form of FastDoom) could be made to run on something as slow as a 286 to begin with.

Any usable 286 version of DOOM would have to be a ground-up rewrite, and it would be a miracle if someone actually could do it and make it playable (by playable, I mean anything more than a slideshow).

I suppose you're talking about something like this?

DOS Games Archive | Free open source games | RGB Classic Games

Reply 1224 of 1241, by 7F20

Rank Member
MrFlibble wrote on 2025-05-15, 15:21:

I suppose you're talking about something like this?

Yeah, that's probably about as close as anyone will get, although it's a rewrite of the GBA Doom, not the OG game.

But, yes, you make a good point. If anyone wants to play Doom on a 286, that already exists.

Reply 1225 of 1241, by analog_programmer

Rank Oldbie
analog_programmer wrote on 2025-05-14, 08:22:

Any chance we'll ever see a FastDoom version/port with 286 support through something like the DOS|286 (TNT DOS) Extender or DOS/16M?

I know about Doom8088 and RealDoom ports, but they still seem so immature.

I clearly mean a FastDoom build optimized and adjusted for low-res EGA, CGA or Hercules modes and running on a 16-bit DOS extender, not a crippled console Doom port like Doom8088 in real mode. And of course the deniers here are trying to twist my words, even though they're written right there.

16-bit extenders... what?

As for 7F20's statement about 286 DOS extenders, it's just a(n empty) statement. Here is an archived contrary opinion on the Rational Systems DOS/16M extender from 1989:


We are using this extensively for a major network product. I recommend it
highly for **most** applications that need lots of memory. In general, a
program such as a spreadsheet or database that is more *applications* oriented
is much easier to port to a protected mode / real mode environment. TSRs or
heavy systems projects such as a network are much more difficult to port.

DOS/16M (as it is called) provides several
object files that are linked in with your Microsoft C code. It also provides
a "post linker" called 'makepm' that creates a protected mode version of
your executable. A very nice source debugger is included. It looks similar to
a combination of CodeView and Periscope. It only works on protected mode
portions of your code. I recommend using both Rational's debugger and
Periscope (for real mode portions).

When you run your program, a minimum of 24K will still reside in real mode
memory space (below 1 MB). This is their run-time kernel. Memory can be
allocated in PM memory space or RM memory space. It can also be allocated as
'transparent'. This means that this memory can be referenced both from real
mode and by a PM selector that has the same value as the physical address.
This allows both real mode and protected mode portions of the application to
access the same memory without converting a real mode segment to a selector.
An example:

For time critical stuff (like a real-time interrupt handler) you MUST
avoid the mode switching time on each interrupt. It can take anywhere between
90 and 1100 microseconds to do a round trip from RM to PM and back again.
In this example you must implement a 'bi-modal' interrupt handler: one
handler in PM and another handler in RM. Protected mode has its
own interrupt vector table, called the IDT. DOS/16M supports calls to do this
type of stuff. It can get quite hairy with real-time systems stuff, as you can see.

Here is a list of the functions that are provided in their interface:

Interrupt Handling Functions
----------------------------
D16IntTransparent -- Installs BIOS interrupt handler in a PM vector
D16Passdown -- Sets a protected-mode interrupt vector to pass down.
(A passdown int is one that is signaled in PM. The system will
then switch to RM and resignal the interrupt).
D16Passup -- Sets a real-mode interrupt vector to pass up. (The opposite of
pass-down)
D16pmGetVector -- Gets the selector and offset for a PM vector.
D16pmInstall -- Installs a PM vector
D16rmGetVector -- Gets the segment and offset for a RM interrupt vector
D16rmInstall -- Installs a RM vector
D16rmInterrupt -- Signals a real-mode interrupt and sets the registers

Memory Management Functions
---------------------------
D16AbsAddress -- Returns the absolute address of a PM pointer
D16ExtAvail -- Returns bytes of extended memory available for allocation
D16GetAccess -- Returns the access byte for a protected-mode pointer
D16HugeAlloc -- Allocates a block of memory (very large)
D16LowAvail -- Returns the number of bytes in largest block of DOS managed
memory
D16MemAlloc -- Allocates a data segment
D16MemFree -- Frees a PM segment and cancels its selector
D16MemStrategy -- Sets strategy used for memory allocation (ForceLow, ForceHi,
HiFirst, LowFirst, Transparent)
D16ProtectedPtr -- Creates a PM pointer from a RM pointer
D16RealPtr -- Creates a RM pointer from a PM pointer
D16SegAbsolute -- Creates a PM selector for a given address
D16SegCancel -- Cancels a PM selector
D16CSAlias -- Creates a selector of type CODE
D16SegDataPtr -- Creates a data selector whose base is offset of a given ptr.
D16SetProtect -- Specifies whether a segment is read-only or read/write
D16SegRealloc -- Compares segment's allocation with current memory strategy
D16SegTransparent -- Returns a transparent pointer to a RM segment
D16SetAccess -- Sets the type or access byte for a PM segment.

Process Management Functions
----------------------------
D16MoveStack -- Switches the location of the cpu stack
D16rmCall -- Switches to real mode, loads registers from given structure
D16ToProtected -- Switches the cpu to protected mode
D16ToReal -- Switches the cpu to real mode, translates segment registers
_intflag -- Sets the cpu interrupts enabled flag
_is_pm -- Returns nonzero if cpu is in protected mode
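Some context on what calls like D16SegAbsolute, D16ProtectedPtr and D16RealPtr have to deal with: on a 286, a protected-mode pointer is a 16-bit selector plus a 16-bit offset. The selector picks an 8-byte descriptor holding a 24-bit base and a 16-bit limit. The sketch below only models that hardware layout. The helper names are hypothetical, and this is not how DOS/16M itself is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte 80286 segment descriptor: 16-bit limit, 24-bit base, access byte,
   and a final word that must be zero on the 286 (the 386 reuses it). */
typedef struct {
    uint8_t bytes[8];
} desc286_t;

/* Build a descriptor covering a real-mode segment: base = segment * 16, 64 KB limit. */
static desc286_t desc_from_real_segment(uint16_t rm_seg, uint8_t access)
{
    uint32_t base = (uint32_t)rm_seg << 4;   /* at most 20 bits, well within 24 */
    uint16_t limit = 0xFFFF;
    desc286_t d = {{0}};
    d.bytes[0] = (uint8_t)(limit & 0xFF);
    d.bytes[1] = (uint8_t)(limit >> 8);
    d.bytes[2] = (uint8_t)(base & 0xFF);
    d.bytes[3] = (uint8_t)((base >> 8) & 0xFF);
    d.bytes[4] = (uint8_t)((base >> 16) & 0xFF);
    d.bytes[5] = access;                     /* e.g. 0x92: present, ring 0, writable data */
    /* bytes 6..7 stay zero on a 286 */
    return d;
}

/* Reassemble the 24-bit base address from a descriptor. */
static uint32_t desc_base(const desc286_t *d)
{
    return (uint32_t)d->bytes[2] | ((uint32_t)d->bytes[3] << 8) |
           ((uint32_t)d->bytes[4] << 16);
}

/* Selector layout: bits 15..3 table index, bit 2 table (0 = GDT, 1 = LDT), bits 1..0 RPL. */
static uint16_t make_selector(uint16_t index, int use_ldt, unsigned rpl)
{
    assert(index < 8192 && rpl < 4);
    return (uint16_t)((index << 3) | (use_ldt ? 4u : 0u) | rpl);
}
```

A descriptor built for real-mode segment A000h gets base 0xA0000. As a result, selector:offset in protected mode and A000h:offset in real mode reach the same bytes, which is roughly the idea behind the pointer-conversion functions listed above.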

I could go on... and on... but I am running out of energy. In general I am
very happy with the product. The support from Rational Systems has been very
personal and very thorough. Price: I think you would need to talk to them
because they arrange different prices based on the volume of sales, source
code stuff, and other things. I think $5000 + royalties sounds familiar for
a typical arrangement. Please call them for a definitive price.

If you have any other questions, or any specific questions on this stuff feel
free to e-mail me @ rd...@tops.sun.com.

Ok, my mistake to ask about the "impossible" port.


Reply 1226 of 1241, by MrFlibble

Rank Oldbie
7F20 wrote on 2025-05-15, 15:26:

Yeah, that's probably about as close as anyone will get, although it's a rewrite of the GBA Doom, not the OG game.

I think it's worth noting that the GBA Doom in question is not the original commercial GBA version of Doom, but a homebrew open source port based on PrBoom. So it is still derived from the original PC Doom code, however altered, while Doomwiki says the commercial GBA Doom was based on the Atari Jaguar code at Carmack's insistence (and originally could've used its own custom engine altogether).


Reply 1227 of 1241, by theelf

Rank Oldbie

A 286 Doom (real Doom, with VGA, AdLib/SB sound, etc.) would be the best thing ever. I really miss this, but I know how difficult it would be.

I hate when people say "Doom runs on everything". Fuck no! It doesn't run on one of the most important CPUs of the era, the 286. Man, I remember I had to buy a 386 just for Doom; otherwise I was happy with my Harris 286 at 25 MHz.

Reply 1228 of 1241, by leileilol

Rank l33t++

hobby project; check your entitlement, no one is obligated to rewrite Doom entirely for a functionally shittier processor on a whim. Odd time to demand that when they've reported being busy with a car, like that won't contribute to burnout 🙄 (PUN UNINTENDED)

long live PCem

Reply 1229 of 1241, by joeguy3121

Rank Newbie
leileilol wrote on 2025-05-16, 01:17:

hobby project; check your entitlement, no one is obligated to rewrite Doom entirely for a functionally shittier processor on a whim. Odd time to demand that when they've reported being busy with a car, like that won't contribute to burnout 🙄 (PUN UNINTENDED)

Excuse me, this is off topic, but is there another way I can contact you, like e-mail, since private messages aren't available on your profile?
Thanks

Reply 1230 of 1241, by xcomcmdr

Rank Oldbie
leileilol wrote on 2025-05-16, 01:17:

hobby project; check your entitlement, no one is obligated to rewrite Doom entirely for a functionally shittier processor on a whim. Odd time to demand that when they've reported being busy with a car, like that won't contribute to burnout 🙄 (PUN UNINTENDED)

Yeah, FastDoom is amazing, but come on, there are *hardware requirements*!

Especially some measure of raw CPU performance, and protected mode. Doom for < 486 isn't FastDoom.

If one really wants it, they better start coding.

Reply 1231 of 1241, by Rav

Rank Member

While I was working on a "Fast Heretic", partly by looking at the FastDoom code, I ended up finding a better version of the optimization from commit https://github.com/viti95/FastDoom/commit/e74 … 41fea89cd7f03ad

When I applied that modification to the Heretic code, I got lower performance (realticks went from ~3004 to ~3050; I reran a few benchmarks to confirm).
It turns out that the patch does one check for the bottom silhouette and a separate one for the top silhouette, so if both checks are true, you end up running the same FOR loop twice.
The original Heretic code instead checks beforehand whether it needs to clip the bottom, the top, or both; when both need clipping it takes a separate branch and runs the FOR loop only once.

After refactoring the patch to apply it to Heretic while keeping the other parts of the optimization, realticks went down to ~2989 on average.

Here is the relevant part of the code, in the R_DrawSprite function in R_THINGS.C:

```c
//
// clip this piece of the sprite
//
silhouette = ds->silhouette;
if (spr->gz >= ds->bsilheight)
    silhouette &= ~SIL_BOTTOM;
if (spr->gzt <= ds->tsilheight)
    silhouette &= ~SIL_TOP;

if (silhouette == 1)
{   // bottom sil
    for (x=r1 ; x<=r2 ; x++)
        if (clipbot[x] == viewheight)
            clipbot[x] = ds->sprbottomclip[x];
}
else if (silhouette == 2)
{   // top sil
    for (x=r1 ; x<=r2 ; x++)
        if (cliptop[x] == -1)
            cliptop[x] = ds->sprtopclip[x];
}
else if (silhouette == 3)
{   // both
    for (x=r1 ; x<=r2 ; x++)
    {
        if (clipbot[x] == viewheight)
            clipbot[x] = ds->sprbottomclip[x];
        if (cliptop[x] == -1)
            cliptop[x] = ds->sprtopclip[x];
    }
}
```
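Here is a self-contained sketch of the difference described above. clip_two_passes is a reconstruction from the prose description of the earlier patch (independent bottom and top checks), not the code from the linked commit. clip_branched follows the snippet. The arrays, WIDTH, and the loop_iterations counter are hypothetical harness scaffolding standing in for the renderer's state.

```c
#include <assert.h>

#define SIL_BOTTOM 1
#define SIL_TOP    2
#define WIDTH      8

/* Hypothetical stand-ins for the renderer state used in R_DrawSprite. */
static int viewheight = 100;
static int clipbot[WIDTH], cliptop[WIDTH];
static int sprbottomclip[WIDTH], sprtopclip[WIDTH];
static int loop_iterations;  /* counts passes over x to compare both shapes */

static void reset_clips(void)
{
    for (int x = 0; x < WIDTH; x++) {
        clipbot[x] = viewheight;  /* "not yet clipped" markers, as in the snippet */
        cliptop[x] = -1;
        sprbottomclip[x] = 90 - x;
        sprtopclip[x] = 10 + x;
    }
    loop_iterations = 0;
}

/* Independent tests: a sprite clipped on both edges walks r1..r2 twice. */
static void clip_two_passes(int silhouette, int r1, int r2)
{
    assert(r1 >= 0 && r2 < WIDTH);
    if (silhouette & SIL_BOTTOM)
        for (int x = r1; x <= r2; x++, loop_iterations++)
            if (clipbot[x] == viewheight)
                clipbot[x] = sprbottomclip[x];
    if (silhouette & SIL_TOP)
        for (int x = r1; x <= r2; x++, loop_iterations++)
            if (cliptop[x] == -1)
                cliptop[x] = sprtopclip[x];
}

/* Heretic-style: choose one branch up front, so "both" is a single pass. */
static void clip_branched(int silhouette, int r1, int r2)
{
    assert(r1 >= 0 && r2 < WIDTH);
    if (silhouette == SIL_BOTTOM) {
        for (int x = r1; x <= r2; x++, loop_iterations++)
            if (clipbot[x] == viewheight)
                clipbot[x] = sprbottomclip[x];
    } else if (silhouette == SIL_TOP) {
        for (int x = r1; x <= r2; x++, loop_iterations++)
            if (cliptop[x] == -1)
                cliptop[x] = sprtopclip[x];
    } else if (silhouette == (SIL_BOTTOM | SIL_TOP)) {
        for (int x = r1; x <= r2; x++, loop_iterations++) {
            if (clipbot[x] == viewheight)
                clipbot[x] = sprbottomclip[x];
            if (cliptop[x] == -1)
                cliptop[x] = sprtopclip[x];
        }
    }
}
```

With both silhouette bits set and a 4-column range, the two-pass shape runs 8 loop iterations while the branched shape runs 4. Both leave the clip arrays in the same state.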

Reply 1232 of 1241, by ViTi95

Rank Oldbie

That's really interesting! It sounds like a solid optimization. If you're up for it, feel free to open a pull request so we can integrate your changes into FastDoom. I'm currently quite busy with this year's racing championship (building LoRa-based communication devices and learning FreeCAD to model the enclosures), so I haven't had much time to work on the code myself.

https://www.youtube.com/@viti95

Reply 1233 of 1241, by Frenkel

Rank Newbie

There's a comment under the video "Exploring The Other GBA Doom Port" saying that the result of P_CheckSight can be cached: if the input is the same as in the previous call, return the previous result.
I've tried it in Doom8088, but it doesn't work all the time.

However, the idea of caching the previous result sounds promising.
I've tried it in several functions, but it only improved performance in R_PointInSubsector.
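A one-entry cache of this kind might look like the sketch below. slow_point_in_subsector is a hypothetical stand-in for R_PointInSubsector that just buckets the coordinates. The cache assumes the answer depends only on (x, y), so anything else that can change between calls, such as a new level being loaded, has to invalidate it.

```c
#include <assert.h>

/* Hypothetical expensive lookup standing in for R_PointInSubsector; it counts
   how often the slow path really runs. */
static int slow_calls;

static int slow_point_in_subsector(int x, int y)
{
    slow_calls++;
    return (int)((((unsigned)x >> 16) & 7) * 8 + (((unsigned)y >> 16) & 7));
}

/* One-entry cache: reuse the previous result if the arguments are identical. */
static int cache_valid;
static int cache_x, cache_y, cache_result;

static void point_cache_invalidate(void)
{
    cache_valid = 0;
}

static int cached_point_in_subsector(int x, int y)
{
    if (cache_valid && x == cache_x && y == cache_y)
        return cache_result;              /* hit: skip the slow lookup */
    cache_result = slow_point_in_subsector(x, y);
    cache_x = x;
    cache_y = y;
    cache_valid = 1;
    return cache_result;
}
```

The cache only pays off when consecutive calls repeat their arguments. Otherwise the extra comparison is pure overhead.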

Reply 1234 of 1241, by Yoghoo

Rank Member

I downloaded FastDoom 1.1.5 and I'm trying to do some benchmarks with bench.bat. But when starting FDOOM* (tried 5 versions) it says that it can't find demo1 (or 2, 3, 4). I tried 3 WAD versions (shareware, Ultimate and 1.9) but that didn't solve it. So no benchmark is generated.

What am I missing?

I found out this is a known message and can be ignored. I thought it wasn't doing a benchmark because it kept repeating itself over and over. But it turns out bench.bat runs 10 timedemos in a row, which was way more than I expected. 😀

Reply 1235 of 1241, by ViTi95

Rank Oldbie

Hi everyone!

After some time without posting updates here, I’ve got something I think is quite interesting. I’ve been reviewing the span rendering code (R_DrawSpan...) for the backbuffered and VBE2 direct modes, and I managed to create a new optimized version specifically for Pentium processors.
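For readers less familiar with this routine: the portable C R_DrawSpan in the released Doom source textures each pixel of a horizontal floor/ceiling span from a 64x64 flat. It uses two 16.16 fixed-point coordinates, roughly as in the sketch below. This version takes parameters instead of the original globals, and draw_span_c is a hypothetical name. FastDoom's optimized assembly versions discussed here are not reproduced.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;  /* 16.16 fixed point, as in Doom */

/* Draw count pixels of a span from a 64x64 flat, stepping the texture
   coordinates (xfrac, yfrac) by (xstep, ystep) per pixel. */
static void draw_span_c(uint8_t *dest, int count,
                        const uint8_t *source,   /* 64*64 flat texture */
                        const uint8_t *colormap, /* 256-entry light table */
                        fixed_t xfrac, fixed_t yfrac,
                        fixed_t xstep, fixed_t ystep)
{
    assert(count > 0);
    while (count--) {
        /* row = integer part of yfrac (mod 64) * 64, column = integer part of xfrac (mod 64) */
        int spot = ((yfrac >> (16 - 6)) & (63 * 64)) + ((xfrac >> 16) & 63);
        *dest++ = colormap[source[spot]];
        xfrac += xstep;
        yfrac += ystep;
    }
}
```

Every pixel pays for a few shifts and masks to turn the two coordinates into a texel index, so this inner loop is where instruction choice matters most.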

On Pentium CPUs, the SHLD instruction is quite slow (4 cycles) and cannot be paired with other instructions in the U and V pipelines. So what I did was to create a new version that replaces those instructions with simpler combinations that can pair in the U and V pipes (MOV, SHL, AND, OR, etc.). Here are the results I’ve obtained:

Pentium 75:
- FDOOM13H i486: 67.670
- FDOOM13H Pentium: 72.109 (+6.5%)
- FDOOMVBR i486: 80.050
- FDOOMVBR Pentium: 86.337 (+7.8%)

Pentium 100:
- FDOOM13H i486: 85.518
- FDOOM13H Pentium: 90.985 (+6.3%)
- FDOOMVBR i486: 111.096
- FDOOMVBR Pentium: 119.862 (+7.8%)

Pentium 133:
- FDOOM13H i486: 95.349
- FDOOM13H Pentium: 99.196 (+4.0%)
- FDOOMVBR i486: 127.431
- FDOOMVBR Pentium: 134.398 (+5.4%)

Pentium 166:
- FDOOM13H i486: 101.201
- FDOOM13H Pentium: 102.583 (+1.3%)
- FDOOMVBR i486: 139.099
- FDOOMVBR Pentium: 140.108 (+0.7%)

Pentium 200:
- FDOOM13H i486: 110.371
- FDOOM13H Pentium: 110.012 (−0.4%)
- FDOOMVBR i486: 158.691
- FDOOMVBR Pentium: 157.032 (−1.1%)

Analyzing the results, it seems that the new code performs well, but the speedup decreases as CPU frequency increases, even becoming slightly slower on the Pentium 200. This made me think “maybe I’m hitting an architectural bottleneck”, so I decided to test on the Pentium MMX:

Pentium 166 MMX:
- FDOOM13H i486: 105.628
- FDOOM13H Pentium: 110.551 (+4.6%)
- FDOOMVBR i486: 148.413
- FDOOMVBR Pentium: 156.668 (+5.5%)

Pentium 200 MMX:
- FDOOM13H i486: 115.264
- FDOOM13H Pentium: 118.393 (+2.7%)
- FDOOMVBR i486: 169.006
- FDOOMVBR Pentium: 175.136 (+3.6%)

Indeed, the MMX models seem to solve the bottleneck issue that affects the non-MMX models. I also ran tests on other architectures; some with good results, others not so good:

IBM 6x86 PR166+ (133MHz):
- FDOOM13H i486: 92.227
- FDOOM13H Pentium: 93.502 (+1.3%)
- FDOOMVBR i486: 119.018
- FDOOMVBR Pentium: 120.934 (+1.6%)

AMD K5 PR100 (SSA5, 100MHz):
- FDOOM13H i486: 74.988
- FDOOM13H Pentium: 74.822 (−0.3%)
- FDOOMVBR i486: 88.775
- FDOOMVBR Pentium: 88.311 (−0.6%)

AMD K5 PR133 (5k86, 100MHz):
- FDOOM13H i486: 96.094
- FDOOM13H Pentium: 95.822 (−0.3%)
- FDOOMVBR i486: 125.074
- FDOOMVBR Pentium: 124.383 (−0.6%)

In the case of the Cyrix/IBM 6x86, the difference is minimally better, but I’d need to test more models at different frequencies to draw a solid conclusion (I only have this one). As for the K5, results are slightly worse, but again, more frequencies would be needed to confirm that.

And now for the one that completely surprised me, the IDT WinChip! This new code shouldn’t be faster on that CPU, since it has a design closer to a 486 than a Pentium… but the results speak for themselves:

IDT WinChip C6 200:
- FDOOM13H i486: 84.397
- FDOOM13H Pentium: 106.209 (+25.8%)
- FDOOMVBR i486: 110.371
- FDOOMVBR Pentium: 150.061 (+35.9%)

The conclusion I can draw from this CPU is that SHLD/SHRD instructions are extremely slow on the WinChip, and should be avoided whenever possible.
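For reference, SHLD shifts one register left and fills the vacated low bits from the top of a second register. In portable C that is two ordinary shifts and an OR, the same kind of simple instruction sequence described above as the replacement. This is a generic sketch of the instruction's semantics for 32-bit operands, not the actual FastDoom span code. Whether a given compiler emits SHLD or separate shifts for it depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* What SHLD dst, src, n computes for 32-bit operands (count 1..31):
   dst shifts left, and the vacated low bits come from the top of src. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned n)
{
    assert(n > 0 && n < 32);  /* count 0 is a no-op; a shift by 32 is undefined in C */
    return (dst << n) | (src >> (32 - n));
}

/* SHRD is the mirror image: dst shifts right, filled from the bottom of src. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned n)
{
    assert(n > 0 && n < 32);
    return (dst >> n) | (src << (32 - n));
}
```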

I still need to test more architectures such as the K6, K7, or Pentium II (or maybe some rarer ones like the Transmeta). If anyone can help me with that, I’d really appreciate it. In theory, it should also be faster on the K6 since it decodes SHLD/SHRD poorly and they’re not very parallelizable. I’m attaching a compiled version that includes the new code. If you want to check and compare your results with mine, please use the included CFG (Ultimate Doom 1.9 is required but not included). The commands I used are:

fdoom13h -timedemo demo3 -i486  
fdoom13h -timedemo demo3 -pentium
fdoomvbr -timedemo demo3 -i486
fdoomvbr -timedemo demo3 -pentium


Reply 1236 of 1241, by GigAHerZ

Rank Oldbie
ViTi95 wrote on 2025-10-07, 09:59:

On Pentium CPUs, the SHLD instruction is quite slow (4 cycles) and cannot be paired with other instructions in the U and V pipelines. So what I did was to create a new version that replaces those instructions with simpler combinations that can pair in the U and V pipes (MOV, SHL, AND, OR, etc.).

Is it similar to what the Quake engine did? Kind of a "dual core" effect where some instructions can run in parallel? (I think they were mostly able to run some FPU stuff in parallel.)

Amazing work! After every post of yours I have to search for my jaw under the table...

"640K ought to be enough for anybody." - And i intend to get every last bit out of it even after loading every damn driver!
A little about software engineering: https://byteaether.github.io/

Reply 1237 of 1241, by marxveix

Rank Oldbie

Did the AMD K5 PR166 and PR200 get a new core or other tweaks, or was that already the case with the K5 PR133?

If possible, test with the fastest one, the 233 MMX, as well.

30+ MiniGL/OpenGL Win9x files for all Rage3 cards: Re: ATi RagePro OpenGL files

Reply 1238 of 1241, by ViTi95

Rank Oldbie

@GigAHerZ Yes, it’s the same idea as Quake’s dual FPU instruction execution, but simpler since there are no special tricks involved. I just replaced and reordered some assembly code. The Pentium has two execution pipes that can run certain instructions in parallel, and SHLD isn’t one of them.

@marxveix The K5 PR133 uses the new core and runs at the same frequency as the K5 PR100, so I’m directly comparing both at the same MHz. I can’t test the Pentium 233 MMX since I don’t have one.


Reply 1239 of 1241, by 7F20

Rank Member
ViTi95 wrote on 2025-10-07, 09:59:

Pentium 75:
- FDOOM13H i486: 67.670
- FDOOM13H Pentium: 72.109 (+6.5%)
- FDOOMVBR i486: 80.050
- FDOOMVBR Pentium: 86.337 (+7.8%)

I feel pretty stupid here, but I don't understand what the numbers mean. I'm sure it's obvious and I'm simply failing to wrap my head around it. Are these increases relative to their previous benchmark numbers you have in another post, or on a website? I don't see them on the Git.

But I guess I can assume that this:
FDOOM13H i486: 67.670
FDOOM13H Pentium: 72.109 (+6.5%)

means that i486 showed no improvement with the new code, but Pentium showed a 6.5% increase over your previous benchmark?

TIA
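Going by the figures and the command lines in the benchmark post, each pair appears to compare the same CPU and executable run once with -i486 (the existing span code) and once with -pentium (the new version). On that reading, the bracketed percentage is the relative change between those two runs, not a comparison with an earlier benchmark, and higher scores are better. The hypothetical helper below reproduces every bracketed value in that post to within about 0.1 percentage points.

```c
#include <assert.h>

/* Relative change of the -pentium score over the -i486 score on the same CPU,
   in percent. Positive means the new Pentium code path scored higher. */
static double speedup_percent(double i486_score, double pentium_score)
{
    assert(i486_score > 0.0);
    return (pentium_score / i486_score - 1.0) * 100.0;
}
```

For the Pentium 75 FDOOM13H pair (67.670 vs 72.109) this gives about +6.56%, listed as +6.5%. For the Pentium 200 FDOOM13H pair (110.371 vs 110.012) it gives about -0.33%, listed as -0.4%.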