VOGONS


MTRRLFBE and AGP/PCIe cards in DOS

First post, by bakemono

Rank Oldbie

First some results for framebuffer write speed:

486DX4-100, Trident 8900 ISA - 5.4MB/s
486DX4-100, Trident 9440 VLB - 31MB/s
Pentium II-350, i440BX, Trident 9680 PCI - 38MB/s
Pentium II-350, i440BX, Trident 9680 PCI - 62MB/s (write combining enabled)
Pentium III-600e, i440BX, GeForce FX5200 AGP - 47MB/s
Pentium III-600e, i440BX, GeForce FX5200 AGP - 240MB/s (write combining enabled)
Pentium M-1200, i855PM, Radeon 7500 AGP - 50MB/s
Pentium M-1200, i855PM, Radeon 7500 AGP - 169MB/s (write combining enabled)
Athlon XP, ViaKT333, GeForce FX5700 AGP - 83MB/s
Athlon XP, ViaKT333, GeForce FX5700 AGP - 192MB/s (write combining enabled)
Phenom II, AMD770, Radeon 5670 PCIe - 189MB/s
Phenom II, AMD770, Radeon 5670 PCIe - 2500MB/s (write combining enabled)

As you can see, write combining makes a big difference. Even so, I find the performance of AGP 4x/8x cards to be lackluster. The 440BX chipset is only AGP 2x but scores a higher result. Consequently, the Pentium III-600 returns a DOOM benchmark result of 5026 gametics in 1068 realtics (demo1), which beats the Pentium M (which took 1417 realtics).

On AMD CPUs I used MTRRLFBE to enable write combining for the LFB (vs. fastvid on intel CPUs). However it fails to enable WC for the VGA buffer. Without WC for the VGA buffer, the Athlon XP also fails to beat the Pentium III DOOM bench, and the Phenom II narrowly beats it with 802 realtics.
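For anyone comparing these timedemo numbers: DOOM runs at 35 tics per second, so the average frame rate is gametics × 35 / realtics. A minimal sketch, assuming all three runs used the same 5026-gametic demo1 (the function name is made up for illustration):

```python
TICRATE = 35  # DOOM's game clock runs at 35 tics per second


def timedemo_fps(gametics: int, realtics: int) -> float:
    """Average frames per second for a DOOM timedemo run."""
    return gametics * TICRATE / realtics


if __name__ == "__main__":
    # Realtics figures quoted in the post; lower realtics means faster.
    for system, realtics in [("Pentium III-600", 1068),
                             ("Pentium M-1200", 1417),
                             ("Phenom II", 802)]:
        print(f"{system}: {timedemo_fps(5026, realtics):.1f} fps")
```

Fewer realtics means a higher frame rate, which is why 802 beats 1068 and 1068 beats 1417.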

Anyone know why VGA WC isn't working or another way to enable it?

The Radeon 5670 is the only PCIe card I've tested and it's only running at PCIe 1.1 speed, but gives a whopping 2.5GB/s out of the 4GB/s theoretical maximum of the PCIe bus. AGP 8x should top out at 2.1GB/s. Has anyone gotten better performance than 240MB/s from an AGP card?
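The theoretical figures above can be reproduced with a little arithmetic. This is only a sketch under the usual textbook assumptions: a x16 PCIe 1.x link at 2.5 GT/s per lane with 8b/10b coding, and a 32-bit AGP bus with a 66.67 MHz base clock. The function names are made up for illustration.

```python
def pcie1_bandwidth_mb_s(lanes: int) -> float:
    """Per-direction payload bandwidth of a PCIe 1.x link in MB/s."""
    transfers_per_s = 2.5e9                    # 2.5 GT/s per lane
    payload_bits = transfers_per_s * 8 / 10    # 8b/10b line coding overhead
    return payload_bits / 8 * lanes / 1e6      # bits -> bytes -> MB/s


def agp_bandwidth_mb_s(multiplier: int) -> float:
    """Peak AGP bandwidth in MB/s (32-bit bus, 66.67 MHz base clock)."""
    base_clock_hz = 200e6 / 3
    return base_clock_hz * multiplier * 4 / 1e6


if __name__ == "__main__":
    print(f"PCIe 1.x x16: {pcie1_bandwidth_mb_s(16):.0f} MB/s")
    for mult in (2, 4, 8):
        print(f"AGP {mult}x: {agp_bandwidth_mb_s(mult):.0f} MB/s")
    # Measured figures from the post, as a fraction of the theoretical peak.
    print(f"Radeon 5670: {2500 / pcie1_bandwidth_mb_s(16):.0%} of peak")
    print(f"FX5200 on 440BX (AGP 2x): {240 / agp_bandwidth_mb_s(2):.0%} of peak")
```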

I've been playing with DOS lately to test DOS builds of FreeBASIC programs. It seems that the FB graphics routines must use the VGA buffer, because MTRRLFBE makes no difference, and framerates on the Phenom II are much slower in DOS than in Windows.

Reply 2 of 26, by bakemono

Rank Oldbie

Mostly I used the PROFILE utility that comes with SciTech Display Doctor, as it can test both banked and linear screen modes.

The old NES emulator for DOS, NESticle, runs at 2,200fps on the Phenom II in a 320x240 VESA 2 mode. With write combining enabled it goes up to 10,000fps.

Reply 3 of 26, by Falcosoft

Rank Oldbie

bakemono wrote:

On AMD CPUs I used MTRRLFBE to enable write combining for the LFB (vs. fastvid on intel CPUs). However it fails to enable WC for the VGA buffer. Without WC for the VGA buffer, the Athlon XP also fails to beat the Pentium III DOOM bench, and the Phenom II narrowly beats it with 802 realtics.
Anyone know why VGA WC isn't working or another way to enable it?

Starting from the first Athlon, AMD has used the same Intel-compatible MTRRs as the Pentium Pro and later. So fastvid should also work on Athlon/Athlon64/Phenom and set both the banked and linear framebuffer properly.
However, setting the banked VGA framebuffer to WC does not result in the same speed improvement as setting the LFB to WC. It's not an AMD-specific phenomenon; my Core 2 with a 7600GT shows the same results. I do not know the exact reason; maybe the bottleneck is elsewhere (e.g. the banking method itself). I have just written a little program that checks the WC status of the banked VGA framebuffer and also sets the necessary MTRRs properly. Source is included so you can play with it.

Attachment: VGAMTRR.zip (3.84 KiB)
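To illustrate what "setting the banked VGA framebuffer to WC" means at the register level, here is a rough sketch of the value involved. It is not the code from VGAMTRR.zip. It assumes the plain Intel layout of the fixed-range MTRR at MSR 259h, where each of the 8 bytes holds the memory type of one 16 KiB slice of A0000h-BFFFFh, and it only computes the value. The privileged RDMSR/WRMSR accesses and the cache-disable/flush sequence around them are not shown. The function names are made up for illustration.

```python
MSR_MTRR_FIX16K_A0000 = 0x259   # fixed-range MTRR covering A0000h-BFFFFh
MTRR_TYPE_UC = 0x00             # uncacheable
MTRR_TYPE_WC = 0x01             # write combining


def set_vga_window_wc(old_value: int) -> int:
    """Return a new MSR 259h value with A0000h-AFFFFh marked write combining.

    Bytes 0-3 (lowest first) cover A0000h-AFFFFh; bytes 4-7 cover
    B0000h-BFFFFh and are left untouched.
    """
    new_value = old_value
    for slice_index in range(4):
        shift = 8 * slice_index
        new_value &= ~(0xFF << shift)        # clear the old memory type
        new_value |= MTRR_TYPE_WC << shift   # store type 01h (WC)
    return new_value & 0xFFFFFFFFFFFFFFFF


def slice_types(value: int) -> list:
    """Split a fixed-range MTRR value into its eight per-slice type bytes."""
    return [(value >> (8 * i)) & 0xFF for i in range(8)]


if __name__ == "__main__":
    before = 0x0000000000000000              # everything uncacheable
    after = set_vga_window_wc(before)
    print(f"MSR {MSR_MTRR_FIX16K_A0000:03X}h: {before:016X} -> {after:016X}")
    print(slice_types(after))
```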

Website, Facebook, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper

Reply 4 of 26, by bakemono

Rank Oldbie

Thanks for posting. I tried your utility, as well as fastvid, but it still had no effect on Athlon/Phenom CPUs. Weird.

I tried a couple of additional AGP boards.
Pentium III-933, i815 - 57MB/s
Pentium III-933, i815 - 202MB/s with WC
Athlon XP 2800+, nForce2, 80MB/s
Athlon XP 2800+, nForce2, 237MB/s with WC

VGA WC worked on the Pentium but not the Athlon.

I tried both GeForce4 MX420 and Quadro FX 1000 AGP cards with no difference.

Reply 5 of 26, by gerwin

Rank l33t

AFAIK, contrary to the VESA LFB, VGA range write combining is practically useless, since the systems that support write combining can already run VGA 320x200 at 60 FPS anyway. Besides that, VGA WC can cause compatibility issues with a few games (Commander Keen IIRC).

--> ISA Soundcard Overview // Doom MBF 2.04 // SetMul

Reply 6 of 26, by Falcosoft

Rank Oldbie
gerwin wrote:

AFAIK Contrary to the VESA LFB, VGA range Write Combining is practically useless. Since the systems that support Write Combining can already run VGA 320x200 at 60 FPS anyways. That and VGA WC can cause some compatibility issues with a few games (Commander Keen IIRC).

Hi, AFAIK it does not work this way. The term 'VGA WC' is misleading. The whole problem has nothing to do with 320x200 or any other VGA resolution. All the VESA resolutions can be used in either banked or linear framebuffer mode. In banked mode you use the same A0000-AFFFF 64K memory area as VGA/MCGA mode 13h, as a window onto the whole framebuffer, and you have to use video page (bank) swapping. The problem that we do not understand is why e.g. mode 105h (1024x768x8) does not speed up in banked mode at all, even if you set the fixed range MTRR A0000-AFFFF to write combining mode.

@Edit:
Banked mode speed can be important, since e.g. Borland/3rd party VESA BGI drivers are banked mode only, and real mode 16-bit programs can generally only use banked VESA modes.
Here's an example of how banked VESA modes work (1024x768x8) in real mode.
http://falcosoft.hu/dos_softwares.html#vesaman
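The bank swapping described above boils down to simple address arithmetic. A minimal sketch for mode 105h, assuming 1024 bytes per scan line and a window granularity equal to the 64 KiB window size. A real program should read both from the VBE mode information block (BytesPerScanLine, WinGranularity) and switch banks through VBE function 4F05h. The function names are made up for illustration.

```python
WINDOW_SEGMENT = 0xA000        # real-mode segment of the banked window
WINDOW_SIZE = 64 * 1024        # A0000h-AFFFFh


def banked_pixel_location(x, y, bytes_per_line=1024, granularity=WINDOW_SIZE):
    """Return (bank, offset) for pixel (x, y) in an 8 bpp banked mode."""
    linear = y * bytes_per_line + x      # offset into the whole framebuffer
    bank = linear // granularity         # which bank must be mapped in
    offset = linear % granularity        # offset inside the A000h window
    return bank, offset


def banks_per_frame(height=768, bytes_per_line=1024, granularity=WINDOW_SIZE):
    """Number of banks (and so bank switches) needed to fill one frame."""
    total = height * bytes_per_line
    return -(-total // granularity)      # ceiling division


if __name__ == "__main__":
    for x, y in [(0, 0), (0, 64), (1023, 767)]:
        bank, offset = banked_pixel_location(x, y)
        print(f"pixel ({x},{y}) -> bank {bank}, {WINDOW_SEGMENT:04X}:{offset:04X}")
    print("banks per 1024x768x8 frame:", banks_per_frame())
```

Every full-screen update therefore needs a bank switch for each 64 KiB written, on top of the writes themselves.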

Reply 7 of 26, by bakemono

Rank Oldbie

I did some research in the AMD BIOS and Kernel Developer's Guide. There is a separate MSR $C0010113 which controls access to the $A0000 range (it can be remapped to DRAM in system management mode). But the register is locked, I can't modify it. So I guess that if I really wanted to enable VGA WC on my Phenom board I would have to patch the BIOS 🙁

edit:
I looked at the MSRs on my Athlon XP system as well (assuming they are the same as Phenom). On there I was able to set bit 4 of $C0010113 (the register wasn't locked). And VGA WC is working.

I also noticed that the BIOS had already enabled a 128MB region with WC at $E0000000. Too bad the actual LFB is at $D0000000, 🤣...
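For reference, a sketch of what a variable-range MTRR pair for the real LFB location would look like, and why a WC region at $E0000000 does not help an LFB at $D0000000. It assumes the standard PhysBase/PhysMask layout (type in the low byte of the base, valid flag in bit 11 of the mask), 36-bit physical addressing, and a 128MB LFB aperture. The aperture size is a guess, and the function names are made up for illustration.

```python
PHYS_ADDR_BITS = 36            # assumption: 36-bit physical addressing
MTRR_TYPE_WC = 0x01            # write combining
MTRR_VALID = 1 << 11           # valid flag in the PhysMask register


def wc_mtrr_pair(base, size, phys_bits=PHYS_ADDR_BITS):
    """Return (PhysBase, PhysMask) values describing a WC region."""
    if size < 4096 or size & (size - 1):
        raise ValueError("size must be a power of two, at least 4 KiB")
    if base % size:
        raise ValueError("base must be aligned to size")
    addr_mask = (1 << phys_bits) - 1
    phys_base = base | MTRR_TYPE_WC
    phys_mask = (~(size - 1) & addr_mask) | MTRR_VALID
    return phys_base, phys_mask


def covers(phys_base, phys_mask, address):
    """True if the MTRR pair matches the given physical address."""
    mask = phys_mask & ~0xFFF            # drop the flag bits
    return (address & mask) == (phys_base & mask)


if __name__ == "__main__":
    bios_pair = wc_mtrr_pair(0xE0000000, 128 << 20)   # what the BIOS set up
    lfb_pair = wc_mtrr_pair(0xD0000000, 128 << 20)    # what the LFB needs
    print("BIOS region covers LFB:", covers(*bios_pair, 0xD0000000))
    print(f"LFB pair: base={lfb_pair[0]:09X} mask={lfb_pair[1]:09X}")
```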

again another retro game on itch: https://90soft90.itch.io/shmup-salad

Reply 8 of 26, by wbc

Rank Member
gerwin wrote:

AFAIK Contrary to the VESA LFB, VGA range Write Combining is practically useless. Since the systems that support Write Combining can already run VGA 320x200 at 60 FPS anyways. That and VGA WC can cause some compatibility issues with a few games (Commander Keen IIRC).

Same here indeed, write combining is mostly useful for VESA modes only. For the VGA framebuffer it breaks compatibility with 16-color modes and Mode-X: the VGA graphics controller bitop/bitmasking/latch register tricks become useless, because they rely on a series of 8-bit read-then-write accesses, while WC merges all the 8-bit transfers into one 32/64-bit batch, messing the graphics up 😵

--wbcbz7

Reply 9 of 26, by Falcosoft

Rank Oldbie
wbc wrote:

Same here indeed, write combining is mostly useful for VESA modes only

Yep, and 'VGA write combining' affects all VESA modes that use the banked frame buffer instead of the linear one. As I have written before, the standard VGA framebuffer is used in all VESA modes in real mode programs, or in protected mode programs when the linear frame buffer is not used. To make things clear: before VESA 2.0 only banked mode was available, so all VESA 1.2 compatible modes use the banked VGA frame buffer and thus all VESA 1.2 modes are affected. So once again, this is not a VESA OR VGA situation.

bakemono wrote:

I did some research in the AMD BIOS and Kernel Developer's Guide. There is a separate MSR $C0010113 which controls access to the $A0000 range (it can be remapped to DRAM in system management mode). But the register is locked, I can't modify it. So I guess that if I really wanted to enable VGA WC on my Phenom board I would have to patch the BIOS 🙁
edit:
I looked at the MSRs on my Athlon XP system as well (assuming they are the same as Phenom). On there I was able to set bit 4 of $C0010113 (the register wasn't locked). And VGA WC is working.
I also noticed that the BIOS had already enabled a 128MB region with WC at $E0000000. Too bad the actual LFB is at $D0000000, 🤣...

Hey, nice findings!
Unfortunately, on Phenom I/II, MSR C001_0113 can be write-protected by setting bit 0 of MSR C001_0015. And this bit is a write-once bit: once it's set, it cannot be cleared anymore.
I have also tested your bit 4 method and it worked on my Phenom II and Turion64, but the speed improvement was nowhere near the linear framebuffer speed improvements.
Phenom II with dedicated Geforce GTX 960: ~2%
Turion64 with Integrated ATI Xpress 1150: ~15%
Contrary to your results on Athlon XP, I could set bit 4, but it did not change anything. The speed of VGA framebuffer writing remained the same.
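The decision described here (check the lock bit first, then set bit 4) can be sketched as below. It uses only the bit positions mentioned in this thread and works on plain integers. Reading and writing the MSRs themselves needs RDMSR/WRMSR at ring 0 and is not shown. The constant and function names are made up for illustration.

```python
MSR_C001_0015 = 0xC0010015     # bit 0 write-protects MSR C001_0113 (per this thread)
MSR_C001_0113 = 0xC0010113     # bit 4 enables WC for the A0000h range (per this thread)

LOCK_BIT = 1 << 0
ASEG_WC_BIT = 1 << 4


def plan_aseg_wc(msr_0015_value, msr_0113_value):
    """Return the value to write to MSR C001_0113, or None if it is locked."""
    if msr_0015_value & LOCK_BIT:
        return None                        # write-once lock already set
    return msr_0113_value | ASEG_WC_BIT    # keep other bits, set bit 4


if __name__ == "__main__":
    for lock_reg, mask_reg in [(0x0, 0x0), (0x1, 0x0)]:
        new_value = plan_aseg_wc(lock_reg, mask_reg)
        if new_value is None:
            print("locked: MSR C001_0113 cannot be modified")
        else:
            print(f"write {new_value:#x} to MSR {MSR_C001_0113:#x}")
```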

Reply 10 of 26, by gerwin

Rank l33t
Falcosoft wrote:

Hi, AFAIK it does not work this way. The term 'VGA WC' is misleading. The whole problem has nothing to do with 320x200 or any other VGA resolution. All the VESA resolutions can be used in either banked or linear framebuffer mode. In banked mode you use the same A0000-AFFFF 64K memory area as VGA/MCGA mode 13h, as a window onto the whole framebuffer, and you have to use video page (bank) swapping. The problem that we do not understand is why e.g. mode 105h (1024x768x8) does not speed up in banked mode at all, even if you set the fixed range MTRR A0000-AFFFF to write combining mode.

Okay, Thanks for explaining it, it is clear to me now.
In 2014 I rewrote all the VESA programming for Doom MBF, with both banked and Linear variants of the screen blitting, so I remember the differences a little. Looking at the source it seems it was necessary to always select banked mode when running inside Windows NT/XP etc. In addition a -nolfb parameter was added. There is this remark of mine in i_vgavbe.c: "The banked mode DPMI functions are not yet implemented, irq 10 is used.", which indicates that these are slightly different ways to deal with VESA banked mode.

wbc wrote:

Same here indeed, write combining is mostly useful for VESA modes only, as for VGA framebuffer it breaks compatibility with 16color modes and Mode-X (making VGA graphics controller bitop/bitmasking/latch register tricks useless as they involve series of 8 bit read-then-write's, while WC combines all 8bit transfers to one 32/64bit batch, messing graphics up 😵 )

Quite a lot of things that can mess up that way, good to know.

Reply 11 of 26, by wbc

Rank Member
Falcosoft wrote:

so all VESA 1.2 compatible modes use banked VGA frame buffer and thus all VESA 1.2 modes are affected. So once again, this is not a VESA OR VGA situation.

It is possible to create a TSR which enables WC when a VESA banked mode is set and disables it for VGA modes, but there is one caveat: the RDMSR and WRMSR instructions are privileged under protected/VM86 mode, so you need to mess with EMM386/QEMM/other VM86 memory managers. Moreover, you also have to deal with DOS extenders, as a lot of VESA software runs in protected mode. *ouch* 🤣

--wbcbz7

Reply 12 of 26, by Falcosoft

User metadata
Rank Oldbie
Rank
Oldbie
wbc wrote:

it is possible to create a TSR which will enable WC on VESA banked mode set and disable it for VGA modes, but there is one caveat: RDMSR and WRMSR instructions, which are privilegied under protected\VM86 mode, so you need to mess up with EMM386/QEMM/other VM86 memory managers. Moreover, you should also deal with DOS extenders as well, as a lot of VESA software run in protected mode. *ouch* 🤣

Yes, that could be the next problem, but the current problem is completely different. (BTW 99% of programs that require EMM386 work with JEMM386, which handles many privileged instructions including RDMSR and WRMSR.) The problem is that on more modern CPUs (Athlon/Athlon64/Phenom/Core2 tested) it's not enough to enable write combining in the A0000-BFFFF fixed range MTRR to speed up VGA frame buffer writing, as it is on Pentium Pro/II/III era CPUs. It seems (at least on AMD) that System Management Mode related settings interfere with banked VGA write combining by default. From the Phenom II BIOS Guide:


2.9.3.1.1 Determining The Cache Attribute
1. The CPU translates the logical address to a physical address. In that process it determines the initial cache attribute based on the settings of the Page Table Entry PAT bits, [The MTRR Default Memory Type Register (MTRRdefType)] MSR0000_02FF, [The Variable-Size MTRRs (MTRRphysBasen and MTRRphysMaskn)] MSR0000_02[0F:00], and [The Fixed-Size MTRRs (MTRRfixn)] MSR0000_02[6F:68, 59, 58, 50].
2. The ASeg and TSeg SMM mechanisms are then checked in parallel to determine if the initial cache attribute should be overridden (see [The SMM TSeg Base Address Register (SMMAddr)] MSRC001_0112 and [The SMM TSeg Mask Register (SMMMask)] MSRC001_0113). If the address falls within an enabled ASeg/TSeg region, then the final cache attribute is determined as specified in MSRC001_0113...
The ASeg address range is located at a fixed address from A0000h–BFFFFh.
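Step 1 of the quoted text lists the fixed-size MTRRs as MSR0000_02[6F:68, 59, 58, 50]. A rough sketch of how an address below 1 MiB maps onto those registers, assuming the standard layout (eight 64 KiB fields in MSR 250h, eight 16 KiB fields each in 258h and 259h, eight 4 KiB fields in each of 268h-26Fh). It does not model the ASeg/TSeg override from step 2; it only reports whether an address lies in the ASeg range where that override can apply. The function names are made up for illustration.

```python
FIXED_MTRRS = [
    # (MSR index, first address covered, bytes covered per 8-bit field)
    (0x250, 0x00000, 64 * 1024),
    (0x258, 0x80000, 16 * 1024),
    (0x259, 0xA0000, 16 * 1024),
] + [(0x268 + i, 0xC0000 + i * 0x8000, 4 * 1024) for i in range(8)]

ASEG_START, ASEG_END = 0xA0000, 0xBFFFF   # fixed ASeg range from the quote


def fixed_mtrr_for(address):
    """Return (msr, field_index) of the fixed-range MTRR field covering address."""
    for msr, start, step in FIXED_MTRRS:
        if start <= address < start + 8 * step:
            return msr, (address - start) // step
    raise ValueError("address is at or above 1 MiB; variable MTRRs apply")


def in_aseg(address):
    """True if the address lies in the A0000h-BFFFFh ASeg range."""
    return ASEG_START <= address <= ASEG_END


if __name__ == "__main__":
    for address in (0xA0000, 0xAFFFF, 0xB8000, 0xC0000):
        msr, field = fixed_mtrr_for(address)
        print(f"{address:05X}h -> MSR {msr:03X}h field {field}, "
              f"ASeg: {in_aseg(address)}")
```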

So even your hypothetical TSR could not use a general mechanism, since the above MSRs are vendor/CPU specific and in some cases cannot even be modified.
Anyway, I do not think there is high demand for such a TSR; only a few hardcore DOS enthusiasts (so far 2 😀 ) are interested in this problem. And the enable/disable job can be done manually if someone has the tools and knows what has to be done.
I have modified the VGAMTRR tool according to bakemono's info, but this way it is now an AMD Athlon+ only test/tool.

Attachment: VGAMTRR_AMD.zip (4 KiB)

Reply 13 of 26, by Iconclas

Rank Newbie

I run a real mode DOS program that uses Super VGA extensions from VESA 1.2 banked frame buffer; 640x400 @ 256color; and that requires EMS. It is the Microsoft Flight Simulator 5.1 for DOS. I currently run it on a P4 845 chipset system with an AGP Nvidia 7950 GT. I would like to get some of these tools working in this environment but need EMS. I can not for the life of me after using numerous combinations of Himemx, Jemm386 and umbpcie get Jemm386 to work to try out some of these tools for this simulator.

I was able to dumb down the card by using VESA12, which did increase the frame rates when flying around the buildings of Manhattan in the flight simulator. There is no stutter or lag anymore. I had always thought of such a program, so finally someone has made one and it works great.

The Northwood P4s have no Intel 64-bit support; the SL6K7 has HT, but it is not enabled on the motherboard currently in use. The game is run from a RAM drive, with a virtual CD-ROM also in the RAM drive.

Any insight on why Jemm386 will not work on an Intel D845PT motherboard other than the CPU?

Reply 15 of 26, by vlask

Rank Member

You should find and check a Permedia 2V 4MB AGP card; it should be cheap, just not very common. The card I tested got speeds almost comparable to having fastvid enabled, even though I didn't use it. The results from this card were very strange to me.
Quake 640 without fastvid: 72.4 FPS
Quake 640 with fastvid: 87.6 FPS
Almost all other cards ended up at 34.6 FPS without fastvid. They must have some hacks in the BIOS...

Not only mine graphics cards collection at http://www.vgamuseum.info

Reply 16 of 26, by The Serpent Rider

User metadata
Rank l33t++
Rank
l33t++

Is it a vendor-specific trait, or do all Permedia 2V cards perform so well?

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 17 of 26, by Falcosoft

User metadata
Rank Oldbie
Rank
Oldbie
vlask wrote:

You should find and check Permedia 2V card 4MB AGP, it should be cheap, just not much common. Card i tested that got speed almost comparable with fastvid enabled, but i didn't used it. Results from this card was very strange to me.
Quake 640 without fastvid... 72,4FPS
Quake 640 with fastvid - 87,6FPS
Almost all other cards ended at 34,6FPS without fastvid. They must have some hacks in bios.....

Unfortunately, Quake results are not relevant here, since Quake uses linear framebuffer modes exclusively for higher resolutions, not banked modes. If you force Quake to use only banked modes (e.g. by using Vesa12), the biggest resolution you can select is 360x480. This does not necessarily mean that the Permedia 2V is not faster in banked VESA modes than other cards, but it should be tested first.

Reply 18 of 26, by RayeR

Rank Oldbie

Important note about using MTRRLFBE under v86 mode (JEMM/EMM/QEMM386)

It has been observed that some DOS programs do not speed up at all after MTRRLFBE enables WC mode for the LFB under v86 mode, while they speed up significantly when run in real mode. Falcosoft and I confirmed this behavior on different hardware platforms (from Pentium Pro to Core i7 2600K), so it's not a rare hardware quirk but a general problem. I narrowed the problem down: only older DOS programs that use the DOS4GW extender are affected (like Blood, DN3D, and the perf and profile benchmarks...), while newer programs that are compiled with DJGPP and use an external DPMI server (CWSDPMI) work fine and speed up. This includes e.g. my VESATEST and the Qdos, Q2dos and Hexen II game engines.

We have no idea what causes this problem; maybe it has something to do with how the physical address is mapped to the program's linear address... The solution may require updating the DOS extender or the v86 memory manager. I already asked Japheth for help, but he's quite busy...

BTW, I recommend that all MTRRLFBE users update to the latest version at my homepage http://rayer.g6.cz/programm/programe.htm#MTRRLFBE which has improved MTRR handling.

Gigabyte GA-P67-DS3-B3, Core i7-2600K @4,5GHz, 8GB DDR3, 128GB SSD, GTX970(GF7900GT), SB Audigy + YMF724F + DreamBlaster combo + LPC2ISA

Reply 19 of 26, by gerwin

Rank l33t

Thanks for looking into this, and for updating your MTRR tool.
DOS4GW seems to be the most widely used extender for commercial games. On the other hand it practically only matters for DOS 3D games that support resolutions above 320x200.
