VOGONS


First post, by kjliew

User metadata
Rank Oldbie

As an introduction, this is *NOT* an upstream-endorsed QEMU feature. QEMU's native Virgil3D/virtio-GPU provides complete OpenGL/GLES acceleration for Linux guests on Linux hosts. Efforts to support Windows guests started in 2018, but progress seems to have stalled for the moment. Even the code posted on GitHub is hard to rebuild because it requires Microsoft tools and SDKs.

Looking at how 3Dfx did standalone OpenGL for Voodoo/Voodoo2, using the same concept in a VM for full-screen OpenGL rendering is in fact a viable solution, and much simpler than writing a complete OpenGL ICD that has to interact with GDI/DirectX/display drivers. So, in addition to 3Dfx Glide pass-through, QEMU can now support full host OpenGL acceleration through MESA GL pass-through, with the focus on playing full-screen OpenGL Windows games. The Glide API is very similar to OpenGL. The lessons from Glide pass-through were leveraged heavily: it uses a similar guest push model and pass-through mechanism of command/data FIFOs, greatly enlarged to handle the data sets of OpenGL. No Microsoft tools or SDKs are required at all.
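To illustrate the guest push model with a command FIFO, here is a minimal sketch. It is not the actual QEMU code: the class names, the FIFO capacity and the rule "flush when full or when a return value is needed" are assumptions made for illustration. Each flush stands in for one guest-to-host transition.

```python
from dataclasses import dataclass, field


@dataclass
class HostGL:
    """Hypothetical stand-in for the host side; records calls instead of invoking real OpenGL."""
    executed: list = field(default_factory=list)

    def run_batch(self, batch):
        # One run_batch call models one guest-to-host transition.
        result = None
        for name, args in batch:
            self.executed.append((name, args))
            if name == "glGetError":
                result = 0  # GL_NO_ERROR
        return result


class GuestFifo:
    """Hypothetical guest-side FIFO that batches GL calls before handing them to the host."""

    def __init__(self, host, capacity=4):
        self.host = host
        self.capacity = capacity  # assumed size, chosen only for the demo
        self.pending = []
        self.exits = 0

    def flush(self):
        if not self.pending:
            return None
        self.exits += 1
        batch, self.pending = self.pending, []
        return self.host.run_batch(batch)

    def push(self, name, *args):
        """Queue a call that returns nothing; flush only when the FIFO is full."""
        self.pending.append((name, args))
        if len(self.pending) >= self.capacity:
            self.flush()

    def call(self, name, *args):
        """Issue a call whose return value the guest needs immediately."""
        self.pending.append((name, args))
        return self.flush()


def demo():
    host = HostGL()
    fifo = GuestFifo(host, capacity=4)
    fifo.push("glBegin", "GL_TRIANGLES")
    for vertex in ((0, 0, 0), (1, 0, 0), (0, 1, 0)):
        fifo.push("glVertex3f", *vertex)
    fifo.push("glEnd")
    error = fifo.call("glGetError")
    # Six GL calls reach the host, but only two transitions were needed.
    return error, fifo.exits, len(host.executed)
```

In this toy model, a GL function that bypasses the FIFO would cost one transition per call, which is why batching matters for call-heavy OpenGL 1.1 style rendering.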

I presume it is currently OpenGL 1.1 complete and includes some popular GL extensions that were widely used by OpenGL games in the years 1997-2002, such as ARB_multitexture, EXT_compiled_vertex_array etc. While dgVoodoo2 has been great for Glide Windows games on Windows, Linux users aren't so lucky. Fortunately, most of the late Glide games (2000+) also support OpenGL, as OpenGL had started gaining popularity. The addition of MESA GL pass-through largely addresses the shortcomings of OpenGlide for Linux KVM gaming. Furthermore, switching to OpenGL also enables higher visual quality, such as 32-bit color and high-res textures.

For Glide-only games without an OpenGL option, there is also the option of running OpenGL-based Glide wrappers in the guest. The current MESA GL pass-through supports enough OpenGL capabilities for the highly capable Zeckensack's Glide wrapper and Sven's Glide3-to-OpenGL wrapper for Diablo II. Games such as NFS Porsche 2000, the Mechwarrior 2 Titanium series and Diablo II run flawlessly with the OpenGL wrappers at high resolution, at the speed of modern CPU virtualization.

Other OpenGL games tested and running great (high FPS of 35~60, full details, equivalent to or better than Glide using dgVoodoo2) are GLQuake1/Quake2/Quake3, Half Life, Unreal/UT99, Homeworld, Serious Sam TFE and Hitman Codename 47. OpenGL games with much higher detail (typically after 2001) may run at 30FPS or lower, e.g. BloodRayne1 (2001), where enabling water reflection costs another 6~7FPS. UT2003 and Doom3 (2004) are able to run, but at <5 FPS.

Testing is always ad hoc and on a per-game basis. It typically involves analyzing a trace of the game's GL calls, then implementing pass-through for new GL functions or fixing existing ones. As usual, the QEMU source code overlay is provided on my GitHub; if you are interested in trying it out, you will need to be able to apply the patch and build QEMU from source.

Last edited by kjliew on 2020-04-23, 19:10. Edited 2 times in total.

Reply 1 of 13, by kjliew

Attached is the MESA GL guest wrapper for Windows 9x/ME/2K/XP/Vista/Win7, 32-bit version only, with the "wglgears" test included for a basic sanity check.

For standalone OpenGL, the easiest way is to drop the DLL into the same folder as the game EXE, so that the game finds that DLL first, before it looks in the Windows folder. Per Microsoft documentation, this should work on all Windows versions; if some "ghost" in the guest OS prevents this from working, I have no idea why.
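The lookup order described above can be sketched as follows. This is a simplified model, not the Windows loader: the function name and the two-folder view are assumptions for illustration, and the real search order has more steps and exceptions than shown here.

```python
def resolve_dll(dll_name, app_dir_files, system_dir_files):
    """Return which folder would satisfy a DLL load in this simplified model.

    Hypothetical helper: the application (game EXE) folder is checked first,
    then the Windows system folder. File names compare case-insensitively.
    """
    wanted = dll_name.lower()
    if wanted in {name.lower() for name in app_dir_files}:
        return "application folder"
    if wanted in {name.lower() for name in system_dir_files}:
        return "system folder"
    return None


# The wrapper in the game folder shadows the copy shipped with Windows.
game_folder = ["quake2.exe", "opengl32.dll"]
system_folder = ["OPENGL32.DLL", "GDI32.DLL"]
print(resolve_dll("OPENGL32.DLL", game_folder, system_folder))
```

Deleting the DLL from the game folder makes the same lookup fall through to the system folder copy.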

In theory, if you are able to defeat the OS system file protection, you can replace the copy in the Windows system folder and save the hassle of updating multiple game folders, but I do not recommend doing it that way. Do it at your own risk. Alternatively, if you are able to run Win95/OSR2 on QEMU, which does not ship with OPENGL32.DLL, you can just drop the DLL into the OS system folder and be done with it.

If both Glide and MESA pass-through are available in your QEMU build, the recommended placement is to put the Glide guest wrappers in the Windows system folder and the MESA GL guest wrapper in the game folder. If for some reason dgVoodoo2 does not work for you and you want to try the OpenGL Glide wrappers, you need to drop the wrapper's GLIDE2X/3X DLL into the game folder so that the game uses the right DLL for MESA GL pass-through. Simply delete the GLIDE2X/3X DLL from the game folder to switch back to Glide pass-through.

Last edited by kjliew on 2020-05-04, 02:14. Edited 1 time in total.

Reply 2 of 13, by kjliew

It's a dream come true 😁 !
With the Glide 2.1.1 wrapper, I am finally able to play Mechwarrior 2: 31st Century Combat (3Dfx) on a Linux laptop under QEMU KVM, using Zeckensack's Glide wrapper at 1280x960 with the wrapper's high-res option doubling the resolution. It plays smoothly, hovering between 20-30fps, which is perfectly fine and avoids the slow jump-jet recharge issue that shows up as the frame rate approaches 60fps. The mesa desert map of Trial of Grievance is huge and typically runs slower than other missions.
Thanks to Linux KVM, it does not even require a bulky desktop with a powerful CPU and a hot-air-blower GPU. The game plays perfectly fine on a Core i3-4010U and a Core m3-6Y30 with their respective Intel Graphics.

Attachment: mech2_1280x960.png (240.95 KiB). File comment: mech2 1280x960

Reply 3 of 13, by kjliew

I have finally nailed down a puzzling QEMU performance issue: games run super-fast on Win98SE/WinME (Quake2/3 at 200~300FPS) but slow (well, not exactly slow: 40~60FPS) on Win2K/WinXP. The puzzle had been there all along and also affected Glide pass-through on QEMU, but because the frame rate was still somewhat acceptable, I didn't spend time looking further. As more complex and recent games were brought into QEMU, it became a bottleneck, since some games simply require Win2K/WinXP, especially OpenGL games, and I wasn't interested in the after-life KernelEx patch for Win98SE. I prefer to keep everything as official as possible.

Adding more mystery to the puzzle, the performance penalty showed up uniformly across CPU (Intel, AMD) and GPU (Intel, AMD, NVIDIA) vendors on Windows 10 WHPX. On Linux KVM, Intel CPU and GPU performance simply crushed the rest. Unfortunately, I don't have an Intel/NVIDIA or Intel/AMD CPU/GPU combo; otherwise the puzzle would have been more obvious. To put it with some exaggeration, a Core i3-4010U Haswell delivered insane Quake2/3 frame rates that crushed the more recent AMD Ryzen 2500U APU and a desktop AMD FX-8300/NVIDIA GT730. Truly unbelievable! 😳 I knew it wasn't a v-sync limitation, because the moment the games ran on Win98SE/WinME, the FPS went through the roof.

My recent Googling pointed to an AMD-V NPT issue in QEMU/KVM, so I tried disabling NPT on Linux KVM, and yeah, the AMD FX-8300/NVIDIA GT730 regained the expected performance. However, the puzzle wasn't completely solved: NPT is supposed to improve KVM performance by reducing the frequent VMEXITs caused by guest memory paging. Numerous whitepapers from both Intel and AMD claim that EPT/NPT improves virtual machine performance by 40% in memory-intensive workloads. Adding to the puzzle, the AMD-V NPT issue was confirmed to be a KVM software issue (VMware and XenHV do not have it) and was fixed in 2017 kernel commits.

So it turned out that MSR_IA32_CR_PAT is the main culprit: it makes the virtual device-mapped host/guest shared memory uncacheable from the virtual CPU. Virtual device-mapped shared memory is the main implementation concept behind Glide and MESAGL pass-through. On real hardware, device-mapped memory would also typically be uncacheable or write-combined, with a device driver managing the memory mapping in kernel space. Glide and MESAGL pass-through, however, are driver-less virtual devices from the OS perspective, so it is inevitable that the shared memory gets mapped as uncacheable.

QEMU virtualizes guest access to MSRs. With WHPX on Windows, it does not even sync the IA32_PAT register during the host/guest state sync. There is no visibility into how the closed-source WHPX manages things behind the scenes, but from the performance figures I would conclude that it doesn't handle IA32_PAT correctly for either Intel or AMD. Fair enough, you might say, no red flag 😁. With KVM on Linux, a much more mature virtualization accelerator, the difference stood out: the VMX-based kvm_intel got it right, but the SVM-based kvm_amd still showed the penalty. Poor AMD; apparently Intel's lead in the fragmented x86 hardware virtualization extensions paid off. There was no shortage of rants, frustration and disappointment about how AMD had failed to address the issue in time for the Ryzen/EPYC launch. Well, this could also be a hard problem on the QEMU side of the implementation, but with the Intel-specific VMX-based implementation getting it right regardless of hypervisor implementation, it simply made AMD look very bad for virtualization.

I don't have the correct fix for QEMU, so let's just leave that to upstream. I have host-side hooks for Glide and MESAGL pass-through that reprogram and restore the guest-side IA32_PAT register when Glide or MESAGL is activated, and this completely restores the rightful performance for both Intel and AMD CPUs on Windows/WHPX and Linux/KVM. Perhaps a better solution would be to create proper kernel drivers for the virtual devices, but since that needs Microsoft tools and DDKs, I will just forget it.
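For readers unfamiliar with the IA32_PAT register discussed above, here is a small sketch of its layout. The register (MSR 0x277) holds eight entries, one per byte, each using the low 3 bits to encode a memory type; the value shown is the architectural reset default. The helper names are made up for illustration, and rewriting entry 3 is only an example: which entries the hooks actually reprogram is not stated in this thread.

```python
# Memory type encodings of one PAT entry (low 3 bits of each byte).
PAT_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}

# Architectural reset value of IA32_PAT.
PAT_RESET_VALUE = 0x0007040600070406


def decode_pat(msr_value):
    """Return the memory type names of entries PA0..PA7."""
    entries = []
    for index in range(8):
        encoding = (msr_value >> (8 * index)) & 0x7
        entries.append(PAT_TYPES.get(encoding, "reserved"))
    return entries


def set_pat_entry(msr_value, index, type_name):
    """Return a copy of msr_value with entry `index` set to `type_name` (hypothetical helper)."""
    encoding = {name: code for code, name in PAT_TYPES.items()}[type_name]
    shift = 8 * index
    return (msr_value & ~(0xFF << shift)) | (encoding << shift)


print(decode_pat(PAT_RESET_VALUE))
# Illustrative only: turn the uncacheable entry PA3 into write-back.
print(hex(set_pat_entry(PAT_RESET_VALUE, 3, "WB")))
```

A hook that reprograms the register would need to keep the original value around so it can be written back when the pass-through is deactivated.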

Anyway, I am thrilled with the improved performance: Unreal Tournament 2003 is now playable through OpenGL on QEMU 😀, all on a modest fanless, 8W TDP Core m3-6Y30 Skylake-GT2, on Windows 10 and Linux, even on battery (but it will drain fast 😜)

UT2003 demo benchmark wrote:

Flyby 148.59FPS
Botmatch 73.79FPS
Resolution 1024x768

Also, MESAGL pass-through seems able to achieve 90% of native performance in QEMU for advanced OpenGL techniques that leverage VBOs to minimize per-frame draw calls instead of using client-side vertex arrays. DOOM3 (2004) is still not playable, but it runs better and without crashing once the VBO extension is exposed.
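One way to see why VBOs suit a pass-through design is to estimate how much vertex data has to cross the guest/host boundary. The sketch below is a back-of-the-envelope model with made-up numbers, not a measurement: it assumes client-side arrays must be copied to the host on every frame, while a static VBO is uploaded once and then only referenced by draw calls.

```python
def vertex_bytes_transferred(vertex_count, bytes_per_vertex, frames, use_vbo):
    """Estimate vertex bytes crossing the guest/host boundary (simplified model)."""
    mesh_bytes = vertex_count * bytes_per_vertex
    if use_vbo:
        # Static buffer: uploaded once, later draws only reference it.
        return mesh_bytes
    # Client-side arrays: the host cannot know whether guest memory changed,
    # so the data is sent again for every frame.
    return mesh_bytes * frames


# Hypothetical scene: 10,000 vertices of 32 bytes each, drawn for 60 frames.
print(vertex_bytes_transferred(10_000, 32, 60, use_vbo=False))
print(vertex_bytes_transferred(10_000, 32, 60, use_vbo=True))
```

Dynamic geometry that changes every frame would narrow the gap, so the real benefit depends on how much of a game's data is static.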

Reply 4 of 13, by kjliew

Mechwarrior 2: Mercenaries from the Titanium series. QEMU KVM MESAGL pass-through with Zeckensack's Glide wrapper, upscaled to 1280x960.
The frame rate was simply phenomenal 😳: regardless of how intense the action got, it stayed locked at 60FPS. I remember not being able to get this kind of performance running the game on Windows XP in compatibility mode on a Core 2 Duo E8400 and GeForce 9400, whether with Glide or Direct3D. The frame rate would drop below 15FPS during missile firing and explosions, and the stutter was obvious when that happened. I was very disappointed back then because the new system no longer supported Win98. Now, worry no more 😁, and it plays well even on modest, power-efficient laptops.

Attachment: MercsTT-1.png (1002.76 KiB). File comment: Mech2 Mercs TT

Reply 5 of 13, by kjliew

Unreal Tournament 2004 demo v3334, OpenGL. I had to tweak the OpenGLDrv settings in the INI to enable VBO with "UseVBO=True".

Attachment: ut2004.png (1.28 MiB). File comment: UT2004

Reply 6 of 13, by Dominus

User metadata
Rank DOSBox Moderator

Wow! Seems I do need to jump on QEMU one of these days... But with Apple's help in crippling OpenGL it's probably all futile 🙁

Reply 7 of 13, by kjliew

😁 Anyone remember this game? It was one of the demanding games released in 2001 that required high system specs for its time. It wasn't very optimized, but luckily it supports 3Dfx Glide.
It is now playable on Linux KVM, on an 8W TDP fanless Intel laptop, upscaled to 1280x960. The OpenGL renderer was unfortunately extremely unoptimized and unable to attain a playable frame rate. The same shot position with dynamic lights and shadows on the OpenGL renderer only achieved 13~15FPS.

Attachment: GuessAGame.png (1.57 MiB). File comment: Severance Glide

More on Severance...

Attachment: SevernGL.png (1.84 MiB). File comment: Severance OpenGL

Severance OpenGL rendered at native 1024x768. It does not gain FPS when the resolution is reduced, nor lose FPS when it is increased. So basically this game tries to achieve its level of detail with just OpenGL 1.1-level APIs, which is not very efficient in the VM context of passing through GL calls, but the textures, lights and shadows do differ from the Glide rendering. Neither is better or worse; Glide still looks OK even though the wrapper upscaled it from 640x480, though that is a matter of personal taste. The OpenGL render plugin is beta and unsupported, so that's about it.
Update: Looks like I was wrong. I forgot to route the EXT_fog_coord GL calls through the FIFO, which caused frequent VMEXITs whenever the game used those calls. After fixing this, Severance OpenGL is fully playable, though on occasion vsync should perhaps be enabled to prevent drastic FPS fluctuation at 1280x960 native rendering resolution.

Last edited by kjliew on 2020-05-13, 12:58. Edited 4 times in total.

Reply 10 of 13, by kjliew

digger wrote on 2020-05-16, 09:31:

Any chance you can have it accepted to be merged in mainline QEMU any time soon?

I highly doubt it will be accepted. Many years back, some guys (presumably from Intel) did a similar thing, but no patch or source code was ever published; I only found a PDF presentation. They did it to get QEMU to run MeeGo as an emulator for development.

The QEMU developer community has clearly rejected the idea of passing through the entire OpenGL at the API level. It was said to be difficult to manage (a few thousand API function calls) and to pose a huge attack surface for potential security risks. Whitepapers published by the BlackHat community have targeted the guest-to-host 3D acceleration implementations of Hyper-V, VMware and VirtualBox as a way to attack the hypervisor, letting the guest escape containment and gain access to the host. I don't remember anyone daring to claim their implementation is secure. Neither VMware nor VirtualBox recommends enabling 3D acceleration for commercial VM deployment if security is important.

Anyway, I did mention it and provided the GitHub link when I reported the x86/PAT issue to the QEMU-devel mailing list, so they can take it if they want to. That's about all I would do.

Reply 12 of 13, by kjliew

robertmo wrote on 2020-05-16, 13:12:

That is the upstream QEMU solution for 3D acceleration. It has been working really well for several years now for games with an official Linux OpenGL port or a community source port. I was able to run the Linux versions of Heavy Gear II and Shogo MAD locked at 60FPS in QEMU. The Yamagi Quake II and AvP1 source ports also work. It supports OpenGL 3.2/OpenGL ES 2.0, and the solution allegedly does not pass through OpenGL at the API level; it works at a lower level, at Gallium3D TGSI, which is claimed to be more secure.

However, it requires a Linux guest on a Linux host; nothing is supported when either one is Windows. Perhaps it wasn't created with the expectation that anyone would prefer playing Windows games within KVM, when WINE already serves the purpose of running Windows games on Linux.

The Virgil3D solution does come with overhead, though it is not obvious in games that only require up to OpenGL 1.4 APIs. With the OpenGL 2.0 glmark2 benchmark, QEMU MESAGL running glmark2 from a WinXP guest outperforms Virgil3D running glmark2 from an Ubuntu guest by a huge margin on the same ArchLinux host. If Virgil3D Windows guest support ever materializes, more game-centric comparisons can be made to weigh the trade-offs between performance and security.

Reply 13 of 13, by kjliew

I was able to get WineD3D working over QEMU MESA GL for accelerated Direct3D games on a Windows XP guest. This was with an older version, Wine-staging 1.8.6. Long story short, the entire Wine development is Linux-focused, and re-purposing any of its components for Windows VMs is challenging. So I tried out several versions, and 1.8.6 has so far done great, at least for some of the games I was interested in. Compatibility is hit or miss, and because of Windows XP's aging MSVCRT, more recent versions of WineD3D would not work. Win98/ME does not work yet, although I would love to get it working there for better compatibility with legacy DirectX5~7 games.

This is essentially running a Direct3D guest wrapper on top of an OpenGL guest wrapper in an emulated guest environment, so the overhead is high. It is still very much WIP to identify opportunities for optimization, to speed things up and to make more efficient use of the FIFO by bulking up the payload. Since this is an old version of Wine, not everything will work. Any Direct3D apps/games that require windowed mode (for setup, cut-scenes etc.) or mix GDI32/DirectDraw 2D calls with Direct3D are likely to be in trouble because of the way MESA GL pass-through is implemented.

Here are some of the games/demos I tried:

  • PCPlayer Direct3D benchmark (DirectX5). Runs 1024x768x32bpp at near 60FPS, but app fault 0xC0000005 on exiting the demo.
  • Tomb Raider II (DirectX 5) good.
  • Shogo MAD (DirectX 6.1) good.
  • BloodRayne 1 (DirectX8.1) runs very slowly, <5FPS. The same scene with OpenGL rendered at ~35FPS.
  • 3DMark2001SE (DirectX8.1) OK.
  • 3DMark2000 (DirectX7) crashed.
  • MotoRacer 1 (DirectX5) game crashed, though I would love to be able to make it work.

WineD3D over QEMU MESA GL

Attachments

  • wined3d-qemu.png (603.16 KiB). File comment: WineD3D QEMU