Doom vs Heretic VGA performance difference

Reply 40 of 43, by kool kitty89

Posted on 2025-06-03, 13:23

Rank: Member
Posts: 438
Joined: 2012-02-15, 08:43
Location: San Jose, CA

The explanation for this that I've seen quoted from Carmack (and described by some people who worked, or are still working, on hobby source ports or from-scratch rebuilds of Doom) was that Doom's original engine on PC (the released one, early alphas aside) made use of Mode Y for 2 main reasons:
1. To allow multi-buffering in VGA memory. 320x200 allows 4 pages to be stored in VGA RAM, potentially smoothing out slow rendering, though it seems that triple buffering specifically was used, not quadruple, so maybe the other 64kB was used for something else. Some suggestions pointed to it being used as off-screen buffer space for additional assets, though to me the only thing that would make sense there is the 2D background layer shown when outside of buildings. Rendering that 2D scroll layer might also be one of the few other advantages of using unchained VGA mode, since a VGA latch assisted fast block copy could be used to speed it up when/where present. Or, as I believe has been stated elsewhere, that background is ALWAYS there, so it could be block-copied over at the start of each new screen page's render cycle in lieu of screen clearing, with the "3D" layer then drawn pixel-by-pixel on top of it.
That, and using no buffer in main RAM at all, which frees up an extra 64kB.

2. To take advantage of some VGA register features that allow some primitive pipelining and execution overlap (schedule a VGA pixel byte write, then calculate the next pixel while that write completes). Plus you get double-wide pixels for "free" by setting the VGA registers to write 2 pixels instead of 1, and in that low-detail mode you only have to step through 2 plane pairs instead of the 4 individual planes of full detail, since the extra pixel write goes to the adjacent memory plane.
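
To put numbers on the page budget from point 1, here is a rough sketch of the arithmetic. It assumes a standard 256 KB VGA and pages packed back to back; the constant names are mine, and the real engine may well place its pages on different boundaries.

```python
# Back-of-envelope layout of 320x200 unchained ("Mode Y") pages in VGA memory.
# Assumption: standard 256 KB VGA, pages packed tightly one after another.
# All names are illustrative and not taken from the Doom source.
VGA_RAM_BYTES = 256 * 1024
PLANES = 4                      # unchained modes expose four byte-wide planes
WIDTH, HEIGHT = 320, 200

page_bytes = WIDTH * HEIGHT                        # 64000 bytes per 8-bit page
pages_that_fit = VGA_RAM_BYTES // page_bytes       # 4 whole pages
addresses_per_page = page_bytes // PLANES          # 16000 CPU-visible addresses

# CPU-visible start offsets of each page, relative to segment A000h
page_offsets = [n * addresses_per_page for n in range(pages_that_fit)]

# Video memory left over when only three pages are used for display
spare_bytes = VGA_RAM_BYTES - 3 * page_bytes

print(pages_that_fit, [hex(o) for o in page_offsets], spare_bytes)
```

So three display pages leave a bit more than one page's worth of video memory unused, which is the "other 64kB" mentioned above.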

In that context, that version of Doom (ie the original DOS release and shareware version) relied on simple 8-bit pixel writes for texture mapping, and wrote directly into VGA memory (or directly to VGA registers, which then completed the write while the CPU continued processing). To me this sounds like targeting a mix of reasonably fast 486 class CPUs for 1993 along with ISA video cards, perhaps reasonably fast ISA cards (like ones with 0ws capability), but still on a 16-bit, probably-not-overclocked 8 MHz ISA bus. It would thus be a choice between rendering to a linear framebuffer in main RAM and having the CPU block copy it across a 16-bit wide ISA bus (so 16-bit writes at best) using mode 13h, or doing what they actually did: texture mapping directly into a VGA framebuffer in unchained Mode X (or rather Mode Y, since a 320x200 70 Hz screen was used). This should mean that there's little, or at least much less, to be gained from VLB or PCI compared to ISA cards, other than potentially faster access to VGA registers if that particular card allows it (ie if the VGA compatible portion of the implementation is also especially fast).

Doom also spends a lot of time rendering single-pixel wide columns, which have to be drawn one byte at a time even if they go into main memory. Doing that in main RAM would only be faster if the columns were batch rendered into a mini-framebuffer scratchpad space (small enough to fit into cache reliably without frequent cache flushing), but it seems like that wasn't practical to implement, or they were thinking in terms of small on-chip caches and not larger board-level caches (which are not as fast as the 486 on-chip cache). OTOH, the span-wise rendering for floors and ceilings would be easy to pipeline into 32-bit register wide line segments and write back to main RAM as 32-bit values, so that would take a bigger hit from the single-pixel unchained VGA memory rendering technique.
(I believe Quake took advantage of scratchpad space to allow efficient use of its subdivided texture span renderer, with enough buffer space to quickly and efficiently convert a linear bitmap to unchained VGA pixel orientation, thus also allowing pixels to be repacked into 32-bit CPU registers and written directly to VGA memory at 32-bit width in proper planar memory order. This might simply entail reading out pixels from the cached buffer area and quickly shifting bytes into 4 separate 32-bit registers, then doing 4 32-bit writes into VGA memory. I'm not sure if Quake switched to mode 13h for its baseline 320x200 resolution: it should still be faster, that resolution was intended as the lowest detail option, and no Doom style double-wide low-detail mode was supported to potentially justify using unchained mode for the benefit of minimum spec machines. So it's possible that Quake supported mode 13h for 320x200 but must have used unchained mode for all higher VGA resolutions, and then also supported linear VESA compatible SVGA modes.)
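
To illustrate the repacking idea I'm speculating about here, a minimal sketch: take 16 consecutive pixels from a linear scratch buffer and regroup them into four 32-bit words, one per VGA plane. This is not Quake's actual code; the function name is made up, and little-endian byte order (as on x86) is assumed.

```python
import struct

def linear_to_planar_words(pixels):
    """Repack 16 consecutive linear pixels into four 32-bit words, one per plane.

    Word p holds pixels p, p+4, p+8, p+12, lowest address first, which is the
    order they occupy in plane p of an unchained VGA mode.
    """
    if len(pixels) != 16:
        raise ValueError("expected exactly 16 pixels")
    words = []
    for plane in range(4):
        plane_bytes = bytes(pixels[plane::4])               # every 4th pixel
        words.append(struct.unpack("<I", plane_bytes)[0])   # little-endian dword
    return words

# Pixels numbered 0..15 make the regrouping easy to see in the hex output
for plane, word in enumerate(linear_to_planar_words(list(range(16)))):
    print(plane, f"{word:08X}")
```

Each of the four words would then go out as a single 32-bit write with the map mask set to that one plane, so 16 pixels cost 4 video writes instead of 16.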

The use of unchained mode for Doom was apparently Carmack's decision, and one that was reversed on the Heretic engine, as that went back to the more common method of block copying to a single linear mode 13h screen buffer from a render buffer in main RAM. (in which case, VLB and PCI cards would directly benefit from the faster, wider bus available, though I've seen some argument that the advantages claimed for Doom in 1993 are dubious as well in terms of real-world results)

If Heretic actually manages to run better than Doom in some examples of ISA VGA cards and lower spec machines, or at least runs proportionally better when comparing both Doom and Heretic on both machines (ie Heretic is slower in both cases, but the difference between lower and higher spec machines is less), then the logic behind Doom's methodology didn't pan out in those cases.
But, to me, the line of thinking Carmack was using must have been very heavily oriented around the VGA card being the biggest bottleneck in the whole system, and makes the most sense from a perspective of the ISA bus.

Now what should not be surprising at all is that the gains from a VLB card vs an ISA card would be relatively modest (especially if you could compare the same VGA/SVGA core on a VLB and ISA card, or on a PCI card for that matter).

Here's a direct quote from Carmack that covers some of this stuff:
https://groups.google.com/g/alt.games.doom/c/ … QMJ?hl=en&pli=1


John Carmack
May 18, 1994, 3:50:20 AM
to
>so let me restate the answer to your question ; yes, it uses MCGA
>320x200x256c and no it does not do any page-swaps, it can't, MCGA
>320x200x256c only has 1 page.

nope.

DOOM uses 320*200*256 VGA mode, which is slightly different from MCGA
mode (it would NOT run on an MCGA equiped machine). I access the
frame buffer in an interleaved planar mode similar to Michael
Abrash's "Mode X", but still at 200 scan lines instead of 240 (less
pixels == faster update rate).

DOOM cycles between three display pages. If only two were used, it
would have to sync to the VBL to avoid possible display flicker. If
you look carefully at a HOM effect, you should see three distinct
images being cycled between.

John Carmack
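
(A toy model of the three-page point above, not Doom's code. It assumes the display start address only takes effect at the next retrace and that at least one retrace happens between consecutive flips, so the two most recently requested pages are the ones that might be on screen while the next frame is drawn. The function name is made up.)

```python
def draws_into_visible_page(num_pages, frames=12):
    """Return True if the renderer ever draws into a page that may be on screen.

    The renderer cycles through the pages in order and never waits for the
    vertical blank before starting the next frame.
    """
    draw = 0
    recent = []                      # last two pages handed to the display
    for _ in range(frames):
        if draw in recent:
            return True              # would be drawing over a visible page
        recent = (recent + [draw])[-2:]
        draw = (draw + 1) % num_pages
    return False

print(draws_into_visible_page(2))    # two pages: conflict without a VBL wait
print(draws_into_visible_page(3))    # three pages: no conflict
```

With two pages the next draw target is always the page that was on screen just before the flip, hence the need to sync to the VBL; with three there is always one page that is neither shown nor pending.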

...

John Carmack
May 22, 1994, 4:23:14 AM
to
e...@agora.rdrop.com (Ed Hurtley) wrote:

>Check, please... In case you haven't hit ESC ever, the Options menu
>has a Low/High resolution toggle... Low is 320x200, High is
>640x400, with the border graphics (the score bar, menu, etc...) are
>still 320x200... (Just the same graphics files)

Low detail is 160*200 in the view screen. This is done by setting
two bits in the mapmask register whenever the texturing functions are
writing to video memory, causing two pixels to be set for each byte
written.

ui...@freenet.Victoria.BC.CA (Ben Morris) wrote:

>John,

>You're using a planar graphics system for a bitmapped game that
>updates the entire screen at a respectable framrate on a 486/66?

Its planar, but not bit planar (THAT would stink). Pixels 0,4,8 are
in plane 0, pixels 1,5,9 are in plane 1, etc.

>That's pretty incredible. I would have thought all the over-
>head for programming the VGA registers would kill that
>possibility.

The registers don't need to be programed all that much. The map mask
register only needs to be set once for each vertical column, and four
times for each horizontal row (I step by four pixels in the inner
loop to stay on the same plane, then increment the start pixel and
move to the next plane).

It is still a lot of grief, and it polutes the program quite a bit,
but texture mapping directly to the video memory gives you a fair
amount of extra speed (10% - 15%) on most video cards because the
video writes are interleaved with main memory accesses and texture
calculations, giving the write time to complete without stalling.

Going to that trouble also gets a perfect page flip, rather than the
tearing you get with main memory buffering.

John Carmack
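
As a rough illustration of the plane layout and the low-detail trick from the quote, here is a small simulation with four byte arrays standing in for the VGA planes. It is a sketch under my own assumptions (80 bytes per row per plane, low-detail writes only at even columns), and the function names are hypothetical; it only models the map mask, not the actual port I/O.

```python
BYTES_PER_ROW = 320 // 4        # 80 CPU-visible bytes per scan line in each plane

def plane_and_offset(x, y):
    """Pixels 0,4,8.. live in plane 0, pixels 1,5,9.. in plane 1, and so on."""
    return x & 3, y * BYTES_PER_ROW + (x >> 2)

def map_mask(x, low_detail=False):
    """Bit mask of the planes enabled for a write at column x."""
    if low_detail and x & 1:
        raise ValueError("low-detail writes start at even columns in this sketch")
    mask = 1 << (x & 3)
    if low_detail:
        mask |= mask << 1       # second bit: the adjacent plane gets the same byte
    return mask

def write_pixel(planes, x, y, color, low_detail=False):
    """Simulate one CPU byte write: every plane enabled in the mask receives it."""
    _, offset = plane_and_offset(x, y)
    mask = map_mask(x, low_detail)
    for p in range(4):
        if mask & (1 << p):
            planes[p][offset] = color

planes = [bytearray(16000) for _ in range(4)]
write_pixel(planes, 6, 1, 200, low_detail=True)    # one write covers x=6 and x=7
print([planes[p][81] for p in range(4)])
```

In low detail a column therefore only ever needs two mask values (planes 0+1 or planes 2+3), while full detail steps through all four.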

Reply 41 of 43, by auron

Posted on 2025-06-04, 15:19

Rank: Oldbie
Posts: 966
Joined: 2015-08-20, 01:56

heretic and hexen are more demanding than doom; hexen looks to run at around 10 fps on a dx2/66 even in an early area with not many enemies, so maybe they thought they had to go with this setup to get as much performance as possible even at the cost of introducing tearing.

the page flipping setup in doom works remarkably well in my opinion, no tearing at seemingly no input lag cost whatsoever, variable framerates and no visual glitches that i have seen. in comparison, duke3d (VBE 2.0) and quake also use page flipping, but duke suffers from split-second artifacts, at least on certain video cards. IIRC the quake tech help file says that artifacts can appear when page flipping occurs at the wrong timing or something and suggests to additionally engage vsync instead to prevent this, which will come with the usual drawbacks.

then with 3d acceleration, double buffered vsync became standard for a few years, locking the framerate to fractions of the target refresh rate and introducing more input lag, which really made those first-gen cards suffer more than anything else. likewise, on the n64 double-buffered vsync seemed to have been used much more frequently than on the ps1. so it seems that after doom this aspect of rendering just regressed.
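
a quick sketch of the "fractions of the refresh rate" effect, assuming a simple model where a double-buffered, vsynced renderer can only present on a retrace and has to wait for the flip before starting the next frame. the function name and the numbers are illustrative, not measurements from any real card.

```python
import math

def vsynced_fps(render_ms, refresh_hz):
    """Frame rate when every frame is held until the next vertical retrace."""
    refresh_ms = 1000.0 / refresh_hz
    # A frame occupies a whole number of refresh intervals, rounded up
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return refresh_hz / intervals

for render_ms in (16, 17, 34):
    print(render_ms, "ms ->", vsynced_fps(render_ms, 60), "fps")
```

in this model, going from 16 ms to 17 ms per frame at 60 Hz halves the frame rate, which is the kind of cliff that a free-running three-page setup avoids.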

Reply 42 of 43, by leileilol

Posted on 2025-06-04, 15:24

Rank: l33t++
Posts: 12347
Joined: 2006-12-16, 18:03

I'd think Heretic/Hexen does it that way to use translucency tables better.

long live PCem

Reply 43 of 43, by st31276a

Posted on 2025-06-05, 08:42

Rank: Member
Posts: 172
Joined: 2023-03-20, 13:11
Location: South Africa

Wrt original post; maybe it's those 8-bit writes that only use half of the ISA bus...
