VOGONS


First post, by kjliew

Rank Oldbie

As an introduction, this is *NOT* an upstream-endorsed QEMU feature. QEMU's native virgil3D/virtio-GPU provides complete OpenGL/GLES acceleration for Linux guests on Linux hosts. Efforts to support Windows guests started in 2018, but progress seems to have stalled for now. Even the code posted on GitHub is hard to rebuild because it requires Microsoft tools and SDKs.

Looking at how 3Dfx did standalone OpenGL for Voodoo/Voodoo2, using the same concept in a VM for full-screen OpenGL rendering is in fact a viable solution, and much simpler than writing a complete OpenGL ICD that interacts with GDI/DirectX/display drivers. So, in addition to 3Dfx Glide pass-through, QEMU can now support full host OpenGL acceleration through MESA GL pass-through, with the focus on playing full-screen OpenGL Windows games. The Glide API is very similar to OpenGL. The lessons from Glide pass-through were leveraged heavily: the same guest push model and pass-through mechanism of command/data FIFOs, but greatly enlarged to handle OpenGL's data sets. No Microsoft tools or SDKs are required at all.
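To make the guest push model a bit more concrete, here is a toy Python sketch of a command FIFO. It is only an illustration under my own assumptions, not the actual QEMU overlay code: the class `CommandFifo`, the `host_execute` callback and the `kicks` counter are all hypothetical names. The idea it shows is that calls without a return value are queued on the guest side, and the host is only entered when the queue fills up or when the guest needs a result back.

```python
class CommandFifo:
    """Toy model of a guest-side command FIFO (hypothetical, not QEMU code)."""

    def __init__(self, capacity, host_execute):
        self.capacity = capacity          # max queued commands before a forced flush
        self.host_execute = host_execute  # stands in for the host-side GL dispatcher
        self.queue = []
        self.kicks = 0                    # each kick models one guest-to-host transition

    def push(self, name, *args):
        """Queue a call that returns nothing (e.g. glVertex3f); no host entry yet."""
        self.queue.append((name, args))
        if len(self.queue) >= self.capacity:
            self.flush()

    def call(self, name, *args):
        """Issue a call whose result the guest needs right away (e.g. glGetError)."""
        self.queue.append((name, args))
        return self.flush()

    def flush(self):
        """Hand the whole batch to the host in one transition; return the last result."""
        if not self.queue:
            return None
        batch, self.queue = self.queue, []
        self.kicks += 1
        result = None
        for name, args in batch:
            result = self.host_execute(name, args)
        return result


# Usage: six GL calls reach the host with a single transition.
log = []

def fake_host(name, args):
    log.append(name)
    return 0 if name == "glGetError" else None

fifo = CommandFifo(capacity=64, host_execute=fake_host)
fifo.push("glBegin", 4)
for i in range(3):
    fifo.push("glVertex3f", float(i), 0.0, 0.0)
fifo.push("glEnd")
err = fifo.call("glGetError")
print(fifo.kicks, len(log), err)
```

In this model a GL call that bypasses the queue would cost one transition each time it is used, which is why batching matters so much for per-vertex style APIs.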

I presume it is currently OpenGL 1.1 complete and includes some popular GL extensions that were widely used by OpenGL games in the years 1997-2002, such as ARB_multitexture, ARB_compiled_vertex_array etc. While dgVoodoo2 has been great for Glide Windows games on Windows, Linux users aren't so lucky. Fortunately, most of the late Glide games (2000+) also support OpenGL, as OpenGL started gaining popularity. The addition of MESA GL pass-through greatly addresses the shortcomings of OpenGlide for Linux KVM gaming. Furthermore, switching to OpenGL also enables higher visual quality such as 32-bit color and high-res textures.

For Glide-only games without an OpenGL option, there is also the option of running OpenGL-based Glide wrappers on the guest. The current MESA GL pass-through supports enough OpenGL capabilities for the highly capable Zeckensack's Glide wrapper and Sven's Glide3-to-OpenGL wrapper for Diablo II. Games such as NFS Porsche 2000, the Mechwarrior 2 Titanium series and Diablo II run flawlessly with the OpenGL wrappers at high resolution, at the speed of modern CPU virtualization.

Other OpenGL games tested and running great (35~60 FPS, full details, equivalent to or better than Glide using dgVoodoo2) are GLQuake1/Quake2/Quake3, Half Life, Unreal/UT99, Homeworld, Serious Sam TFE and Hitman Codename 47. OpenGL games with much higher detail (typically after 2001) may run at 30 FPS or lower, e.g. BloodRayne1 (2001), where enabling water reflection costs another 6~7 FPS. UT2003 and Doom3 (2004) are able to run but at <5 FPS.

Testing is always ad hoc and on a per-game basis. It typically involves analyzing a trace of the game's GL calls, then implementing new or fixing existing GL function pass-through. As usual, the QEMU source code overlay is provided on my GitHub, and if you are interested in trying it out, you will need to be able to apply the patch and build QEMU from source.

Last edited by kjliew on 2020-04-23, 19:10. Edited 2 times in total.

Reply 1 of 16, by kjliew

Attached is the MESA GL guest wrapper for Windows 9x/ME/2K/XP/Vista/Win7, 32-bit version only, with the "wglgears" test included for a basic sanity check.

For standalone OpenGL, the easiest way is to drop the DLL into the same folder as the game EXE, so that the game finds it before it looks in the Windows folder. Per Microsoft documentation, this should work for all Windows versions, but if there is a "ghost" in the guest OS preventing this from working, I have no idea.

In theory, if you are able to defeat the OS system file protection, you can replace the copy in the Windows system folder to save the hassle of updating multiple game folders, but I do not recommend doing it that way. Do it at your own risk. Or, if you are able to run Win95/OSR2 on QEMU, which does not ship with OPENGL32.DLL, you can just drop the DLL in the OS system folder and be done with it.

If both Glide and MESA pass-through are available in QEMU, the recommended placement is Glide guest wrappers in the Windows system folder and the MESA GL guest wrapper in the game folder. If for some reason dgVoodoo2 does not work for you and you want to try the OpenGL Glide wrappers, you need to drop their GLIDE2X/3X DLL in the game folder so that the game uses the right DLL for MESA GL pass-through. Simply delete the GLIDE2X/3X DLL from the game folder to switch back to Glide pass-through.
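The placement rules above boil down to "game folder wins over system folder". Here is a small Python sketch of that lookup. It is a deliberately simplified model and an assumption on my part: real Windows DLL search has more steps than these two folders, and the function `resolve_dll` and the folder contents are made up for illustration.

```python
def resolve_dll(name, game_dir, system_dir):
    """Return (location, description) of the DLL a game would pick up.

    Simplified model: only the game folder and the Windows system folder are
    searched, in that order, and file names are compared case-insensitively.
    """
    key = name.upper()
    for label, folder in (("game folder", game_dir), ("system folder", system_dir)):
        files = {fname.upper(): desc for fname, desc in folder.items()}
        if key in files:
            return label, files[key]
    return None


# Recommended layout: Glide wrappers in the system folder, MESA GL in the game folder.
system_dir = {
    "OPENGL32.DLL": "Windows stock OpenGL",
    "GLIDE2X.DLL": "Glide pass-through wrapper",
}
game_dir = {"OPENGL32.DLL": "MESA GL pass-through wrapper"}

opengl_choice = resolve_dll("opengl32.dll", game_dir, system_dir)
glide_choice = resolve_dll("glide2x.dll", game_dir, system_dir)

# Dropping an OpenGL-based Glide wrapper into the game folder overrides the system copy.
game_dir["GLIDE2X.DLL"] = "OpenGL-based Glide wrapper"
glide_via_opengl = resolve_dll("glide2x.dll", game_dir, system_dir)

# Deleting it again switches the game back to Glide pass-through.
del game_dir["GLIDE2X.DLL"]
glide_restored = resolve_dll("glide2x.dll", game_dir, system_dir)

print(opengl_choice, glide_choice, glide_via_opengl, glide_restored, sep="\n")
```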

Last edited by kjliew on 2020-05-04, 02:14. Edited 1 time in total.

Reply 2 of 16, by kjliew

It's a dream come true 😁 !
With the Glide 2.1.1 wrapper, I am finally able to play Mechwarrior 2 31st 3Dfx on a Linux laptop on QEMU KVM with Zeckensack's Glide Wrapper at 1280x960, using the wrapper's high-res option to double the resolution. It plays smoothly, hovering between 20-30fps, which is perfectly fine and avoids the slow jumpjet recharge issue that shows up as the frame rate approaches 60fps. The mesa desert map of Trial of Grievance is a huge map and is typically slower than other missions.
Thanks to Linux KVM, it does not even require a bulky desktop with a powerful CPU and a hot-air blower GPU. The game plays perfectly fine on a Core i3-4010U and a Core m3-6Y30 with their respective Intel Graphics.

[Attachment: mech2_1280x960.png, 240.95 KiB, "mech2 1280x960"]

Reply 3 of 16, by kjliew

I have finally nailed down a performance puzzle with QEMU: games run super-fast on Win98SE/WinME (Quake2/3 at 200~300 FPS) but slow (... well, not exactly slow ... 40~60 FPS) on Win2k/WinXP. The puzzle had been there all along and also affected Glide pass-through on QEMU, but because the frame rate was still somewhat acceptable, I didn't spend time looking further. As more complex and recent games were brought into QEMU, it became a bottleneck, since some games simply require Win2K/WinXP, especially OpenGL games, and I wasn't interested in the after-life KernelEx patch for Win98SE. I prefer to keep everything as official as possible.

Adding more mystery to the puzzle, the performance penalty happened uniformly across CPU (Intel, AMD) and GPU (Intel, AMD, NVIDIA) vendors on Windows 10 WHPX. On Linux KVM, Intel CPU and GPU performance simply crushed the rest. Unfortunately, I don't have an Intel/NVIDIA or Intel/AMD CPU/GPU combo, otherwise the puzzle would have been more obvious. To put it with some exaggeration, a Core i3-4010U Haswell exhibited insane Quake2/3 FPS that crushed the more recent AMD Ryzen 2500U APU and a desktop AMD FX8300/NVIDIA GT730. Truly unbelievable! 😳 I knew it wasn't a v-sync limitation because the moment the games ran on Win98SE/WinME, the FPS went through the roof.

My recent Googling pointed to an AMD-V NPT issue in QEMU/KVM, so I tried disabling NPT on Linux KVM, and yeah, the AMD FX8300/NVIDIA GT730 regained the expected performance. However, the puzzle wasn't completely solved yet: NPT is supposed to improve KVM performance by reducing the frequent VMEXITs caused by guest memory paging. Numerous whitepapers from both Intel and AMD claim that EPT/NPT improves virtual machine performance by 40% in memory-intensive workloads. Adding fuel to the puzzle, the AMD-V NPT issue was confirmed to be a KVM software issue (VMWare and XenHV do not have it) and was fixed in 2017 kernel commits.

So it turned out that MSR_IA32_CR_PAT is the main culprit, making virtual device-mapped host/guest shared memory uncacheable from the virtual CPU. Virtual device-mapped shared memory is the main implementation concept for Glide and MESAGL pass-through. On real hardware, device-mapped memory would typically be uncacheable or write-combined, and device drivers manage the memory mapping in kernel space. Glide and MESAGL pass-through, by contrast, are driver-less virtual devices from the OS perspective, so it is inevitable that the shared memory gets mapped as uncacheable.
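For anyone unfamiliar with the PAT register, here is a short Python sketch that decodes it. The layout (eight 3-bit memory-type fields, one per byte, in MSR 0x277) and the power-on default value follow the Intel SDM. The helper names are hypothetical, and the entry and type changed in the example are picked purely for illustration; they are not necessarily what the pass-through hooks program.

```python
IA32_PAT_MSR = 0x277                    # MSR index of IA32_PAT
PAT_DEFAULT = 0x0007040600070406        # power-on default value

# Memory type encodings used in each PAT entry (other values are reserved).
MEMORY_TYPES = {
    0x00: "UC",    # uncacheable
    0x01: "WC",    # write-combining
    0x04: "WT",    # write-through
    0x05: "WP",    # write-protected
    0x06: "WB",    # write-back
    0x07: "UC-",   # uncacheable, can be overridden by MTRRs
}

def decode_pat(value):
    """Return the memory type names of PA0..PA7 (PA0 is the lowest byte)."""
    entries = []
    for index in range(8):
        field = (value >> (8 * index)) & 0x07
        entries.append(MEMORY_TYPES.get(field, "reserved"))
    return entries

def set_pat_entry(value, index, type_name):
    """Return a new PAT value with entry PA<index> set to the given memory type."""
    encoding = {name: code for code, name in MEMORY_TYPES.items()}[type_name]
    shift = 8 * index
    return (value & ~(0xFF << shift)) | (encoding << shift)

print(decode_pat(PAT_DEFAULT))
# Illustration only: turn the UC entry PA3 into WB and show the resulting value.
print(hex(set_pat_entry(PAT_DEFAULT, 3, "WB")))
```

Page table entries select one of the eight fields through their PAT/PCD/PWT bits, so changing a field changes the caching behaviour of every page that points at it.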

QEMU virtualizes guest access to MSRs. With WHPX on Windows, it does not even sync the IA32_PAT register during host/guest state sync. There is no visibility into how closed-source WHPX manages things behind the scenes, but from the performance figures I would conclude that it doesn't handle IA32_PAT correctly for either Intel or AMD. Fair enough, you might say, no red flag 😁. With KVM on Linux, a much more mature virtualization accelerator, the difference showed. The VMX-based kvm_intel got it right but the SVM-based kvm_amd remained the same. Poor AMD; apparently Intel's move to fragment the x86 hardware virtualization extensions paid off. There was no shortage of rants, frustration and disappointment about how AMD failed to address the issue in time for the Ryzen/EPYC launch. Well, this could also be a hard ball for the QEMU side of the implementation, but with the Intel-specific VMX-based implementation getting it right regardless of hypervisor implementation, it simply made AMD look very bad for virtualization.

I don't have the correct fix for QEMU, so let's just leave that for upstream. I have host-side hooks for Glide and MESAGL pass-through that reprogram and restore the guest-side IA32_PAT register when Glide or MESAGL is activated, and this completely restores the rightful performance for both Intel and AMD CPUs on Windows/WHPX and Linux/KVM. Perhaps a better solution would be to create proper kernel drivers for the virtual devices, but given that this needs Microsoft tools and DDKs, I will just forget it.

Anyway, I am thrilled with the improved performance, Unreal Tournament 2003 is now playable through OpenGL on QEMU 😀, all from a petty fanless, 8W TDP Core m3-6Y30 Skylake-GT2, Windows 10 and Linux, even on battery (but it will drain fast 😜)

UT2003 demo benchmark wrote:

Flyby 148.59FPS
Botmatch 73.79FPS
Resolution 1024x768

And MESAGL pass-through seems able to achieve 90% of native performance in QEMU for advanced OpenGL techniques that use VBOs to minimize per-frame drawing calls instead of client-side vertex arrays. DOOM3 (2004) is still not playable, but it runs better and without crashing once the VBO extension is exposed.

Reply 4 of 16, by kjliew

Mechwarrior2 Mercenaries from the Titanium series. QEMU KVM MESAGL pass-through with Zeckensack's Glide wrapper, upscaled to 1280x960.
The frame rate was simply phenomenal 😳 No matter how intense the action was, it simply stayed locked at 60FPS. I remember I was not able to get this kind of performance running the game on Windows XP in compatibility mode on a Core 2 Duo E8400 and Geforce 9400, with either Glide or Direct3D. The frame rate would drop below 15FPS during missile firing and explosions, and the stutter was obvious when that happened. I was very disappointed back then because the new system no longer supported Win98. Now, worry no more 😁, and it plays well even on petty, power-efficient laptops.

[Attachment: MercsTT-1.png, 1002.76 KiB, "Mech2 Mercs TT"]

Reply 5 of 16, by kjliew

Unreal Tournament 2004 demo v3334, OpenGL, had to tweak OpenGLDrv settings in INI to enable VBO with "UseVBO=True".
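If you prefer to script the INI tweak, a line-based Python helper like the one below would do it. Treat it as a sketch: the section name `OpenGLDrv.OpenGLRenderDevice` is my assumption about where the OpenGLDrv settings live, the sample text is made up, and the helper `set_ini_key` is hypothetical. It edits lines directly instead of using `configparser`, so any repeated keys elsewhere in the file are left alone.

```python
def set_ini_key(text, section, key, value):
    """Set key=value inside [section], adding the key or the section if missing."""
    out, in_section, done = [], False, False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_section and not done:
                # Leaving the target section without finding the key: add it here.
                out.append(f"{key}={value}")
                done = True
            in_section = stripped[1:-1].lower() == section.lower()
        elif in_section and not done and stripped.lower().startswith(key.lower() + "="):
            line = f"{key}={value}"
            done = True
        out.append(line)
    if in_section and not done:
        out.append(f"{key}={value}")
        done = True
    if not done:
        out.extend([f"[{section}]", f"{key}={value}"])
    return "\n".join(out) + "\n"


SECTION = "OpenGLDrv.OpenGLRenderDevice"   # assumed section name

# Made-up sample content, only to exercise the helper.
sample = (
    "[Some.OtherSection]\n"
    "OtherSetting=1\n"
    "\n"
    "[OpenGLDrv.OpenGLRenderDevice]\n"
    "UseVBO=False\n"
    "OtherSetting=2\n"
)

patched = set_ini_key(sample, SECTION, "UseVBO", "True")
print(patched)
```

Read the game's INI into a string, pass it through the helper and write it back; keep a backup copy of the original file first.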

[Attachment: ut2004.png, 1.28 MiB, "UT2004"]

Reply 6 of 16, by Dominus

Rank DOSBox Moderator

Wow! Seems I do need to jump on QEMU one of these days... But with Apple's help in crippling OpenGL it's probably all futile 🙁

Reply 7 of 16, by kjliew

😁 Anyone remember this game? One of the tough games released in 2001 that required high system specs for its time. It wasn't very optimized, but luckily it supports 3Dfx Glide.
It is now playable on Linux KVM, on an 8W TDP fanless Intel laptop, upscaled to 1280x960. The OpenGL renderer was unfortunately extremely unoptimized and was unable to attain a playable frame rate. The same shot position with dynamic lights and shadows on the OpenGL renderer only achieved 13~15 FPS.

[Attachment: GuessAGame.png, 1.57 MiB, "Severance Glide"]

More on Severance...

[Attachment: SevernGL.png, 1.84 MiB, "Severance OpenGL"]

Severance OpenGL rendered at native 1024x768. It won't get more FPS by reducing the resolution or less FPS by increasing it. So basically this game is trying to achieve its level of detail with just OpenGL 1.1 level APIs, which is not very efficient in the VM context of passing through GL calls, but the textures, lights and shadows are different from the Glide rendering. Neither is better or worse; Glide still looks OK even though the wrapper upscaled it from 640x480, though that is a matter of personal taste. The OpenGL render plugin is beta and unsupported, so that's about it.
Update: Looks like I was wrong. I forgot to route the EXT_fog_coord GL calls through the FIFO, which caused frequent VMEXITs whenever the game used those calls. After fixing this, Severance OpenGL is fully playable, and vsync should perhaps be enabled on occasion to prevent drastic FPS fluctuation at 1280x960 native rendering resolution.

Last edited by kjliew on 2020-05-13, 12:58. Edited 4 times in total.

Reply 10 of 16, by kjliew

digger wrote on 2020-05-16, 09:31:

Any chance you can have it accepted to be merged in mainline QEMU any time soon?

I highly doubt it will be accepted. Many years ago, some guys (presumably from Intel) did something similar, but no patch or source code was ever published. I only found a PDF presentation. They did it to get QEMU to run MeeGo as an emulator for development.

The QEMU dev community has clearly rejected the idea of passing through an entire OpenGL at API level. It was claimed to be difficult to manage (a few thousand API function calls) and to pose a huge attack surface for potential security risks. There were whitepapers published by the BlackHat community targeting the guest-to-host 3D acceleration implementations of Hyper-V, VMWare and VirtualBox, showing attacks on the hypervisors that let a guest escape containment and gain access to the host. I don't remember anyone daring to claim that their implementation is secure. Both VMWare and VirtualBox do not recommend enabling 3D acceleration for commercial VM deployment if security is important.

Anyway, I did mention it and provided the GitHub link when I reported the x86/PAT issue to the QEMU-devel mailing list, so they can take it if they want to. That's about all I would do.

Reply 12 of 16, by kjliew

robertmo wrote on 2020-05-16, 13:12:

That is the upstream QEMU solution for 3D acceleration. It's been working really well for several years now for games with an official Linux OpenGL port or a community source port. I was able to run the Linux versions of Heavy Gear II and Shogo MAD locked at 60FPS from QEMU. The Yamagi Quake II and AvP1 source ports also work. It supports OpenGL 3.2/OpenGL ES 2.0, and the solution allegedly does not pass through OpenGL at API level; it works at a lower level, at the Gallium3D TGSI, which is claimed to be more secure.

However, it requires a Linux guest on a Linux host; nothing is supported when either one is Windows. Perhaps it wasn't created with the expectation that one would prefer playing Windows games within KVM when WINE has been serving the purpose of running Windows games on Linux.

The Virgil3D solution does come with overhead. It is not obvious for games that only require up to OpenGL 1.4 APIs. With the OpenGL 2.0 glmark2 benchmark, QEMU MESAGL running glmark2 from a WinXP guest outperforms Virgil3D running glmark2 from an Ubuntu guest by a huge margin on the same ArchLinux host. If Virgil3D Windows guest support ever materializes, more games-centric comparisons can be made to justify the trade-offs between performance and security.

Reply 13 of 16, by kjliew

I was able to get WineD3D working over QEMU MESA GL for accelerated Direct3D games on a Windows XP guest. This was an old version, Wine-staging 1.8.6. Long story short, the entire Wine development is Linux-focused and re-purposing any of its components for Windows VMs is challenging. So I tried out several versions, and 1.8.6 has so far done great, at least for some games I was interested in. Compatibility was hit or miss, and due to Windows XP's aging MSVCRT more recent versions of WineD3D would not work. Win98/ME does not work yet, although I would love to get it working there for better compatibility with legacy DirectX5~7 games.

This is essentially running a Direct3D guest wrapper on top of an OpenGL guest wrapper in an emulated guest environment, so the overhead is high. It is still very much WIP to identify opportunities for optimization to speed things up and make more efficient use of the FIFO to bulk up the payload. Since this is an old version of Wine, not everything will work. Any Direct3D apps/games that require windowed mode (for setup, cut-scenes etc.) or mix GDI32/DirectDraw 2D calls with Direct3D are likely to run into trouble due to the way MESA GL pass-through was implemented.

Here's some of the games/demos I tried:

  • PCPlayer Direct3D benchmark (DirectX5). Runs 1024x768x32bpp at nearly 60FPS, but the app faults with 0xC0000005 on exiting the demo.
  • Tomb Raider II (DirectX5) good.
  • Shogo MAD (DirectX6.1) good.
  • Blood Rayne 1 (DirectX8.1) runs very slowly, <5FPS. The same scene with OpenGL rendered at ~35FPS.
  • 3DMark2001SE (DirectX8.1) OK.
  • 3DMark2000 (DirectX7) crashed.
  • MotoRacer 1 (DirectX3) game crashed at first; 😁 yeah, it worked after setting desktop color to 16-bit instead of 32-bit.

WineD3D over QEMU MESA GL

[Attachment: wined3d-qemu.png, 603.16 KiB, "WineD3D QEMU"]

Reply 14 of 16, by kjliew

Yeah, WineD3D works on Win98/ME!!😁 Some old Direct3D games (pre-DirectX5) may require setting the desktop to 16-bit color, as they can't handle the surface descriptor WineD3D returns for 32-bit color. Moto Racer 1, which is actually DirectX3, seems to be one of those. With Win98/ME on QEMU, it is now very easy to get Moto Racer 1 working, simply by installing from CD and applying the final official MR3.22 patch for the full Direct3D version.

Unfortunately, the game resolution is limited to 640x480, which is kinda lame by today's standards, and WineD3D does not have a scaling option (correct me if I am wrong). Otherwise, it would still be fun and thrilling to play on modern systems. Other Direct3D games such as Tomb Raider II, Fighting Force and Rage Incoming support all the resolutions reported by Direct3D, which lets them still play great on modern systems. Moto Racer probably made a short-sighted decision back then, assuming that no Direct3D hardware was capable of rendering higher than 640x480 at an acceptable frame rate for such a fast-paced game.

Anyway, 640x480 would still be OK on the 1366x768 HD panel of most 11.6" value laptops.

Moto Racer 1 on QEMU

[Attachment: moto1.png, 396.23 KiB, "Moto Racer 1"]

Reply 15 of 16, by digger

Rank Member

First of all: great work!

The VirtualBox GPU guest driver for Windows also relies on WineD3D for 3D acceleration in Windows guests.

It would be very useful if QEMU were enhanced to support that same guest driver. It might save you from having to reinvent the wheel w.r.t. full screen vs windowed switching, seamless windows, etc. As a bonus, the VirtualBox guest GPU driver also supports 2D video playback acceleration. And it's open source under GPL 2.0, the same license as QEMU.

Also, have you considered DXVK (Direct3D on top of Vulkan) with Vulkan passthrough, for potentially even better guest 3D performance?

Reply 16 of 16, by kjliew

digger wrote on Yesterday, 10:32:

First of all: great work!
The VirtualBox GPU guest driver for Windows also relies on WineD3D for 3D acceleration in Windows guests.

Thanks. I actually doubt that VirtualBox 3D acceleration for guests is better than QEMU's. Prior to version 6.x they had an implementation that makes use of the Chromium OpenGL renderer (not to be confused with Chromium the browser), and that was the implementation which uses WineD3D for Direct3D acceleration. However, support for games and anything below Direct3D9 was poor, as the implementation was focused on getting desktop 3D composition such as the Vista/Windows 7 Aero DWM to work in VMs. I did search the VirtualBox forum and there weren't many success stories of running games on VirtualBox. I hope someone can show me some proof of success running any 3D games, OpenGL or Direct3D, under VirtualBox. As far as I know, VMWare has the best Direct3D implementation for virtual machines, but again support for anything less than Direct3D9 was poor, not to mention that many pre-DirectX9 games may require Win98.

In my opinion, QEMU's own virgil3D/virtio-gpu is a better implementation of 3D acceleration for guests. It is just a matter of time before it becomes available for Windows, if ever. My interim solution is mainly focused on games and restores their playability through QEMU on modern systems, Linux and Windows 10.

digger wrote on Yesterday, 10:32:

Also, have you considered DXVK (Direct3D on top of Vulkan) with Vulkan passthrough, for potentially even better guest 3D performance?

I am already measuring insane performance on my desktop, and extremely fast, 60FPS-locked, playable frame rates even on laptops with efficient, low-power mobile CPUs, for games that are cumbersome to get running on Windows 10. Such as:

  • Shogo MAD, did not measure actual in-game FPS, but completely smooth to the point that the action felt like cloud walking.
  • Drakan Order of Flame 1280x960 32bpp, all details max, 92 FPS.
  • Expendable, 1280x960 32bpp, all details max over 100 FPS.
  • Moto Racer 1, over 250 FPS.
  • Incoming, 1024x768 32bpp over 100 FPS.
  • Tomb Raider II, no in-game FPS but again completely smooth at 1280x960 32bpp all details max. Testing of the remaining series III, IV, V and Angel of Darkness I, II is WIP.
  • Clive Barker's Undying, Direct3D renderer. With the Glide renderer (which is the best), mirror reflection only works with dgVoodoo2. OpenGlide, Zeck's GlideWrapper and nGlide all failed to render the mirror reflection. Modern UTGLR simply missed this one, so WineD3D actually did it, with similar graphics quality across Windows and Linux.

Games that require the performance level of DXVK should be modern enough to still run fine on native Windows 10, or with a little help from dgVoodoo2. On Linux, they would just use DXVK directly through Wine. I do not completely reject the idea of Vulkan API pass-through. But since Vulkan is a fairly recent graphics API and my interests are still mainly in old games (Glide, pre-2006 OpenGL and pre-Direct3D9), I am not motivated to work on it. Hopefully by the time I need the Vulkan API in my VMs, someone will have done it and I won't have to do it myself again 😁