MTRRLFBE and AGP/PCIe cards in DOS

Postby bakemono » 2018-4-14 @ 22:02

First some results for framebuffer write speed:

486DX4-100, Trident 8900 ISA - 5.4MB/s
486DX4-100, Trident 9440 VLB - 31MB/s
Pentium II-350, i440BX, Trident 9680 PCI - 38MB/s
Pentium II-350, i440BX, Trident 9680 PCI - 62MB/s (write combining enabled)
Pentium III-600e, i440BX, GeForce FX5200 AGP - 47MB/s
Pentium III-600e, i440BX, GeForce FX5200 AGP - 240MB/s (write combining enabled)
Pentium M-1200, i855PM, Radeon 7500 AGP - 50MB/s
Pentium M-1200, i855PM, Radeon 7500 AGP - 169MB/s (write combining enabled)
Athlon XP, ViaKT333, GeForce FX5700 AGP - 83MB/s
Athlon XP, ViaKT333, GeForce FX5700 AGP - 192MB/s (write combining enabled)
Phenom II, AMD770, Radeon 5670 PCIe - 189MB/s
Phenom II, AMD770, Radeon 5670 PCIe - 2500MB/s (write combining enabled)

As you can see, write combining makes a big difference. Even so, I find the performance of AGP 4x/8x cards lackluster: the 440BX chipset is only AGP 2x, yet it scores a higher result. Consequently, the Pentium III-600 completes the DOOM timedemo (demo1, 5026 gametics) in 1068 realtics, which beats the Pentium M (1417 realtics).

On AMD CPUs I used MTRRLFBE to enable write combining for the LFB (vs. FASTVID on Intel CPUs). However, it fails to enable WC for the VGA buffer. Without WC for the VGA buffer, the Athlon XP also fails to beat the Pentium III in the DOOM benchmark, and the Phenom II only narrowly beats it with 802 realtics.
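
For comparing the realtics figures: DOOM's timedemo reports gametics and realtics, and since the game runs at 35 tics per second, the average framerate is usually worked out as gametics x 35 / realtics. A minimal sketch of that arithmetic (the function name is just for illustration; the realtics values are the ones quoted above):

```python
TICRATE = 35  # DOOM game tics per second


def timedemo_fps(gametics, realtics):
    """Average frames per second for a DOOM timedemo run."""
    return gametics * TICRATE / realtics


DEMO1_GAMETICS = 5026  # demo1 length as quoted in the post

for label, realtics in [("Pentium III-600", 1068),
                        ("Pentium M-1200", 1417),
                        ("Phenom II", 802)]:
    print(f"{label}: {timedemo_fps(DEMO1_GAMETICS, realtics):.1f} fps")
```

Fewer realtics means a faster run, so the ordering is the same as in the text.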

Anyone know why VGA WC isn't working or another way to enable it?

The Radeon 5670 is the only PCIe card I've tested and it's only running at PCIe 1.1 speed, but gives a whopping 2.5GB/s out of the 4GB/s theoretical maximum of the PCIe bus. AGP 8x should top out at 2.1GB/s. Has anyone gotten better performance than 240MB/s from an AGP card?
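
The theoretical figures can be reproduced with a little arithmetic. This is only a sketch under stated assumptions: the card is taken to be in an x16 slot (the post does not say so, but 4GB/s matches x16), PCIe 1.x is 2.5 GT/s per lane with 8b/10b coding, AGP is a 32-bit bus on a 66.67 MHz base clock, and 1 MB is 10^6 bytes. The function names are made up for the example.

```python
def pcie_gen1_mb_per_s(lanes):
    """Per-direction PCIe 1.x bandwidth in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_s = 2.5e9                   # 2.5 GT/s per lane
    payload_bits = transfers_per_s * 8 / 10   # 8b/10b line coding overhead
    return lanes * payload_bits / 8 / 1e6


def agp_mb_per_s(multiplier):
    """AGP bandwidth in MB/s: 4 bytes wide, 66.67 MHz base clock."""
    base_clock_hz = 200e6 / 3
    return base_clock_hz * 4 * multiplier / 1e6


print(f"PCIe 1.x x16: {pcie_gen1_mb_per_s(16):.0f} MB/s")
for mult in (2, 4, 8):
    print(f"AGP {mult}x: {agp_mb_per_s(mult):.0f} MB/s")

# Measured results from the list above as a fraction of the bus maximum
print(f"Radeon 5670 on PCIe: {2500 / pcie_gen1_mb_per_s(16):.0%}")
print(f"FX5200 on AGP 2x:    {240 / agp_mb_per_s(2):.0%}")
```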

I've been playing with DOS lately to test DOS builds of FreeBASIC programs. It seems that the FB graphics routines must use the VGA buffer, because MTRRLFBE makes no difference, and framerates on the Phenom II are much slower in DOS than in Windows.
bakemono
Newbie
 
Posts: 49
Joined: 2018-1-15 @ 06:56

Postby mrau » 2018-4-14 @ 22:35

How did you measure this? Does the 2GB/s-plus transfer rate on the Phenom translate well into framerate?
mrau
Oldbie
 
Posts: 902
Joined: 2015-11-28 @ 12:43

Postby bakemono » 2018-4-15 @ 08:15

Mostly I used the PROFILE utility that comes with SciTech Display Doctor, as it can test both banked and linear screen modes.

The old NES emulator for DOS, NESticle, runs at 2,200fps on the Phenom II in a 320x240 VESA 2 mode. With write combining enabled it goes up to 10,000fps.
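
To relate framerate to transfer rate, here is a rough conversion. It assumes the 320x240 mode is 8 bits per pixel and that every frame is written to the framebuffer in full, neither of which is stated above, so treat the output as a lower bound on the write bandwidth actually being used.

```python
def framebuffer_mb_per_s(width, height, bytes_per_pixel, fps):
    """Bytes written per second for full-frame updates, in MB/s (10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


# Assumption: 8 bpp (1 byte per pixel) in the 320x240 mode
for label, fps in [("without WC", 2200), ("with WC", 10000)]:
    rate = framebuffer_mb_per_s(320, 240, 1, fps)
    print(f"{label}: {fps} fps is about {rate:.0f} MB/s")
```

Under those assumptions the non-WC figure lands close to the 189MB/s PROFILE result, while the WC figure is well below 2500MB/s, which would suggest that something other than the bus is the limit at that point.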

Postby Falcosoft » 2018-4-15 @ 09:03

bakemono wrote:On AMD CPUs I used MTRRLFBE to enable write combining for the LFB (vs. fastvid on intel CPUs). However it fails to enable WC for the VGA buffer. Without WC for the VGA buffer, the Athlon XP also fails to beat the Pentium III DOOM bench, and the Phenom II narrowly beats it with 802 realtics.
Anyone know why VGA WC isn't working or another way to enable it?


Starting with the first Athlon, AMD has used the same Intel-compatible MTRRs as the Pentium Pro and later. So FASTVID should also work on Athlon/Athlon64/Phenom and set both the banked and the linear framebuffer properly.
However, setting the banked VGA framebuffer to WC does not give the same speed improvement as setting the LFB to WC. It's not an AMD-specific phenomenon: my Core 2 with a 7600GT shows the same results. I do not know the exact reason; maybe the bottleneck is elsewhere (e.g. the banking method itself). I have just written a little program that checks the WC status of the banked VGA framebuffer and also sets the necessary MTRRs properly. Source is included so you can play with it.
Attachment: VGAMTRR.zip (3.84 KiB)
Falcosoft
Oldbie
 
Posts: 556
Joined: 2016-5-21 @ 13:46
Location: Pécs, Hungary

Postby bakemono » 2018-4-15 @ 22:35

Thanks for posting. I tried your utility, as well as FASTVID, but it still had no effect on Athlon/Phenom CPUs. Weird.

I tried a couple of additional AGP boards.
Pentium III-933, i815 - 57MB/s
Pentium III-933, i815 - 202MB/s (write combining enabled)
Athlon XP 2800+, nForce2 - 80MB/s
Athlon XP 2800+, nForce2 - 237MB/s (write combining enabled)

VGA WC worked on the Pentium but not the Athlon.

I tried both GeForce4 MX420 and Quadro FX 1000 AGP cards with no difference.

Postby gerwin » 2018-4-16 @ 10:32

AFAIK, unlike for the VESA LFB, write combining for the VGA range is practically useless, since systems that support write combining can already run VGA 320x200 at 60 FPS anyway. In addition, VGA WC can cause compatibility issues with a few games (Commander Keen, IIRC).
gerwin
l33t
 
Posts: 2448
Joined: 2004-5-07 @ 19:21
Location: NL

Postby Falcosoft » 2018-4-16 @ 11:18

gerwin wrote:AFAIK, unlike for the VESA LFB, write combining for the VGA range is practically useless, since systems that support write combining can already run VGA 320x200 at 60 FPS anyway. In addition, VGA WC can cause compatibility issues with a few games (Commander Keen, IIRC).


Hi, AFAIK it does not work this way. The term 'VGA WC' is misleading. The whole problem has nothing to do with 320x200 or any other VGA resolution. All the VESA resolutions can be used in either banked or linear framebuffer mode. In banked mode you use the same 64K memory area at A0000-AFFFF as VGA/MCGA mode 13h, as a window onto the whole framebuffer, and you have to swap video pages (banks). The problem we do not understand is why e.g. mode 105h (1024x768x8) does not speed up at all in banked mode, even if you set the fixed-range MTRR for A0000-AFFFF to write combining.
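
The window arithmetic for a banked mode can be sketched as follows. This is an illustration only: it assumes a 64 KiB window granularity and a pitch of 1024 bytes per scanline for mode 105h, whereas real code should take both from the VBE mode information block. The function name is hypothetical.

```python
WINDOW_SIZE = 0x10000  # 64 KiB window at A0000-AFFFF


def banked_address(x, y, pitch=1024, bytes_per_pixel=1, granularity=WINDOW_SIZE):
    """Return (bank, offset_in_window) for a pixel in a banked VESA mode.

    The bank number is expressed in granularity units, which is what a
    bank-switch call expects; the offset is relative to segment A000h.
    """
    linear = y * pitch + x * bytes_per_pixel
    return linear // granularity, linear % granularity


# 1024x768x8 needs 768 KiB, i.e. 12 banks of 64 KiB
print(banked_address(0, 0))        # first pixel
print(banked_address(0, 64))       # first pixel that needs a bank switch
print(banked_address(1023, 767))   # last pixel
```

Every time the bank number changes, the program has to switch banks before writing, which is one candidate for the bottleneck mentioned earlier.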

@Edit:
Banked mode speed can be important since e.g. Borland/3rd party VESA BGI drivers are banked mode only and generally real mode 16-bit programs can only use banked VESA modes.
Here's an example of how banked VESA modes work (1024x768x8) in real mode.
http://falcosoft.hu/dos_softwares.html#vesaman

Postby bakemono » 2018-4-17 @ 08:23

I did some research in the AMD BIOS and Kernel Developer's Guide. There is a separate MSR $C0010113 which controls access to the $A0000 range (it can be remapped to DRAM in system management mode). But the register is locked, I can't modify it. So I guess that if I really wanted to enable VGA WC on my Phenom board I would have to patch the BIOS :(

edit:
I looked at the MSRs on my Athlon XP system as well (assuming they are the same as Phenom). On there I was able to set bit 4 of $C0010113 (the register wasn't locked). And VGA WC is working.

I also noticed that the BIOS had already enabled a 128MB region with WC at $E0000000. Too bad the actual LFB is at $D0000000, lol...
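
To illustrate why a WC region at $E0000000 does nothing for an LFB at $D0000000, here is a sketch of how a variable-range MTRR base/mask pair is built and matched. It models the register values only (no MSR access), and the 36-bit physical address width is an assumption; real code would query CPUID for it. Function names are made up for the example.

```python
MTRR_TYPE_WC = 0x01      # write-combining memory type code
MTRR_VALID = 1 << 11     # valid bit in the PhysMask register


def variable_mtrr(base, size, mem_type, phys_bits=36):
    """Build (PhysBase, PhysMask) values for a power-of-two sized region."""
    if size < 0x1000 or size & (size - 1):
        raise ValueError("size must be a power of two, at least 4 KiB")
    if base & (size - 1):
        raise ValueError("base must be aligned to size")
    addr_mask = (1 << phys_bits) - 1
    phys_base = base | mem_type
    phys_mask = (~(size - 1) & addr_mask) | MTRR_VALID
    return phys_base, phys_mask


def covers(phys_base, phys_mask, addr):
    """True if addr falls inside the region described by the pair."""
    mask = phys_mask & ~0xFFF
    return (addr & mask) == (phys_base & mask)


bios_base, bios_mask = variable_mtrr(0xE0000000, 128 << 20, MTRR_TYPE_WC)
print(hex(bios_base), hex(bios_mask))
print("LFB at D0000000 covered:", covers(bios_base, bios_mask, 0xD0000000))

lfb_base, lfb_mask = variable_mtrr(0xD0000000, 128 << 20, MTRR_TYPE_WC)
print("after retargeting:", covers(lfb_base, lfb_mask, 0xD0000000))
```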

Postby wbc » 2018-4-17 @ 11:31

gerwin wrote:AFAIK, unlike for the VESA LFB, write combining for the VGA range is practically useless, since systems that support write combining can already run VGA 320x200 at 60 FPS anyway. In addition, VGA WC can cause compatibility issues with a few games (Commander Keen, IIRC).

Same here indeed, write combining is mostly useful for VESA modes only, as for the VGA framebuffer it breaks compatibility with 16-color modes and Mode X. It makes the VGA graphics controller bit-op/bit-mask/latch register tricks useless, because they rely on a series of 8-bit read-then-write accesses, while WC combines the 8-bit transfers into one 32/64-bit batch, messing up the graphics.
--wbcbz7
wbc
Member
 
Posts: 130
Joined: 2015-3-14 @ 14:51
Location: Russia \ Omsk

Postby Falcosoft » 2018-4-17 @ 12:24

wbc wrote:Same here indeed, write combining is mostly useful for VESA modes only

Yep, and 'VGA write combining' affects all VESA modes that do not use linear frame buffer but banked one. As I have written before the standard VGA framebuffer is used in all VESA modes in real mode programs or in protected mode programs when linear frame buffer is not used. To make things clear: before VESA 2.0 only banked mode was available, so all VESA 1.2 compatible modes use banked VGA frame buffer and thus all VESA 1.2 modes are affected. So once again, this is not a VESA OR VGA situation.

bakemono wrote:I did some research in the AMD BIOS and Kernel Developer's Guide. There is a separate MSR $C0010113 which controls access to the $A0000 range (it can be remapped to DRAM in system management mode). But the register is locked, I can't modify it. So I guess that if I really wanted to enable VGA WC on my Phenom board I would have to patch the BIOS :(
edit:
I looked at the MSRs on my Athlon XP system as well (assuming they are the same as Phenom). On there I was able to set bit 4 of $C0010113 (the register wasn't locked). And VGA WC is working.
I also noticed that the BIOS had already enabled a 128MB region with WC at $E0000000. Too bad the actual LFB is at $D0000000, lol...

Hey, nice findings!
Unfortunately, on Phenom I/II, MSR C001_0113 can be write protected by setting bit 0 of MSR C001_0015. And this bit is a write-once bit: once it's set, it cannot be cleared anymore.
I have also tested your method of setting bit 4, and it worked on my Phenom II and Turion64, but the speed improvement was nowhere near the linear framebuffer speed improvements.
Phenom II with dedicated GeForce GTX 960: ~2%
Turion64 with integrated ATI Xpress 1150: ~15%
Contrary to your results on Athlon XP, I could set bit 4 but it did not change anything. The speed of VGA framebuffer writing remained the same.
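
The decision logic around the two registers can be sketched like this. It works on values that have already been read with RDMSR and touches no hardware; the bit positions are the ones given in this thread, the sample values are hypothetical, and the helper name is made up. Whether the lock bit behaves the same way on CPUs other than Phenom I/II is not covered here.

```python
MSR_HWCR = 0xC0010015       # holds the write-once lock bit (bit 0)
MSR_SMM_MASK = 0xC0010113   # bit 4 is the one set for VGA WC in this thread

HWCR_LOCK_BIT = 1 << 0
ASEG_WC_BIT = 1 << 4


def plan_aseg_wc(hwcr, smm_mask):
    """Decide what to do with the SMM mask register value.

    Returns (action, value): "already set" if bit 4 is set, "locked" if
    the lock bit prevents a write, else "write" plus the value to write.
    """
    if smm_mask & ASEG_WC_BIT:
        return "already set", smm_mask
    if hwcr & HWCR_LOCK_BIT:
        return "locked", smm_mask
    return "write", smm_mask | ASEG_WC_BIT


# Hypothetical register contents, for illustration only
print(plan_aseg_wc(hwcr=0x0, smm_mask=0x03))   # unlocked: bit 4 can be set
print(plan_aseg_wc(hwcr=0x1, smm_mask=0x03))   # locked by the BIOS
```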

Postby gerwin » 2018-4-17 @ 12:38

Falcosoft wrote:Hi, AFAIK it does not work this way. The term 'VGA WC' is misleading. The whole problem has nothing to do with 320x200 or any other VGA resolution. All the VESA resolutions can be used in either banked or linear framebuffer mode. In banked mode you use the same 64K memory area at A0000-AFFFF as VGA/MCGA mode 13h, as a window onto the whole framebuffer, and you have to swap video pages (banks). The problem we do not understand is why e.g. mode 105h (1024x768x8) does not speed up at all in banked mode, even if you set the fixed-range MTRR for A0000-AFFFF to write combining.

Okay, Thanks for explaining it, it is clear to me now.
In 2014 I rewrote all the VESA programming for Doom MBF, with both banked and Linear variants of the screen blitting, so I remember the differences a little. Looking at the source it seems it was necessary to always select banked mode when running inside Windows NT/XP etc. In addition a -nolfb parameter was added. There is this remark of mine in i_vgavbe.c: "The banked mode DPMI functions are not yet implemented, irq 10 is used.", which indicates that these are slightly different ways to deal with VESA banked mode.

wbc wrote:Same here indeed, write combining is mostly useful for VESA modes only, as for the VGA framebuffer it breaks compatibility with 16-color modes and Mode X. It makes the VGA graphics controller bit-op/bit-mask/latch register tricks useless, because they rely on a series of 8-bit read-then-write accesses, while WC combines the 8-bit transfers into one 32/64-bit batch, messing up the graphics.

Quite a lot of things that can mess up that way, good to know.

Postby wbc » 2018-4-17 @ 15:11

Falcosoft wrote:so all VESA 1.2 compatible modes use banked VGA frame buffer and thus all VESA 1.2 modes are affected. So once again, this is not a VESA OR VGA situation.

It is possible to create a TSR which enables WC when a VESA banked mode is set and disables it for VGA modes, but there is one caveat: the RDMSR and WRMSR instructions are privileged under protected/VM86 mode, so you need to mess with EMM386/QEMM/other VM86 memory managers. Moreover, you would also have to deal with DOS extenders, as a lot of VESA software runs in protected mode. *ouch*
--wbcbz7

Postby Falcosoft » 2018-4-17 @ 23:49

wbc wrote:It is possible to create a TSR which enables WC when a VESA banked mode is set and disables it for VGA modes, but there is one caveat: the RDMSR and WRMSR instructions are privileged under protected/VM86 mode, so you need to mess with EMM386/QEMM/other VM86 memory managers. Moreover, you would also have to deal with DOS extenders, as a lot of VESA software runs in protected mode. *ouch*

Yes, that could be the next problem, but the current problem is completely different. (BTW 99% of programs that require EMM386 work with JEMM386, which handles many privileged instructions including RDMSR and WRMSR.) The problem is that on more modern CPUs (Athlon/Athlon64/Phenom/Core2 tested) it's not enough to enable write combining in the A0000-BFFFF fixed-range MTRR to speed up VGA frame buffer writes, as it is on Pentium Pro/II/III era CPUs. It seems (at least on AMD) that System Management Mode related settings interfere with banked VGA write combining by default. From the Phenom II BIOS guide:
2.9.3.1.1 Determining The Cache Attribute
1. The CPU translates the logical address to a physical address. In that process it determines the initial cache attribute based on the settings of the Page Table Entry PAT bits,
[The MTRR Default Memory Type Register (MTRRdefType)] MSR0000_02FF, [The Variable-Size MTRRs (MTRRphysBasen and MTRRphysMaskn)] MSR0000_02[0F:00],
and [The Fixed-Size MTRRs (MTRRfixn)] MSR0000_02[6F:68, 59, 58, 50].
2. The ASeg and TSeg SMM mechanisms are then checked in parallel to determine if the initial cache attribute should be overridden
(see [The SMM TSeg Base Address Register (SMMAddr)] MSRC001_0112 and [The SMM TSeg Mask Register (SMMMask)] MSRC001_0113).
If the address falls within an enabled ASeg/TSeg region, then the final cache attribute is determined as specified in MSRC001_0113...
The ASeg address range is located at a fixed address from A0000h–BFFFFh.
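
For step 1 of the quoted procedure, the fixed-range MTRR covering A0000-BFFFF (MSR 259h) holds eight memory-type bytes, one per 16 KiB. The sketch below only computes the new register value; it treats each byte as a plain type code and ignores any vendor-specific extension bits, and the function name is made up. Per step 2 of the quote, the ASeg settings may still override whatever is programmed here.

```python
MSR_MTRR_FIX16K_A0000 = 0x259   # covers A0000-BFFFF in eight 16 KiB slots
MTRR_TYPE_UC = 0x00             # uncacheable
MTRR_TYPE_WC = 0x01             # write combining


def set_fixed_range_type(msr_value, first_slot, slot_count, mem_type):
    """Set the type byte of consecutive 16 KiB slots in the MSR value.

    Slot 0 is A0000-A3FFF, slot 4 is B0000-B3FFF, slot 7 is BC000-BFFFF.
    """
    if not (0 <= first_slot and first_slot + slot_count <= 8):
        raise ValueError("slots must lie within 0..7")
    for slot in range(first_slot, first_slot + slot_count):
        shift = slot * 8
        msr_value &= ~(0xFF << shift)      # clear the old type byte
        msr_value |= mem_type << shift     # insert the new one
    return msr_value & 0xFFFFFFFFFFFFFFFF


# WC for the 64 KiB graphics window A0000-AFFFF only (slots 0..3)
print(hex(set_fixed_range_type(0, 0, 4, MTRR_TYPE_WC)))
# WC for the whole A0000-BFFFF range
print(hex(set_fixed_range_type(0, 0, 8, MTRR_TYPE_WC)))
```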

So even your hypothetical TSR could not use a general mechanism, since the above MSRs are vendor/CPU specific and in some cases cannot even be modified.
Anyway I do not think there is a high demand for such a TSR, only a few hardcore DOS enthusiasts (so far 2 :) ) are interested in this problem. And the enable/disable job can be done manually if someone has the tools and knows what has to be done.
I have modified the VGAMTRR tool according to bakemono's info, but this way it is now an AMD Athlon+ only test/tool.
Attachment: VGAMTRR_AMD.zip (4 KiB)