VOGONS


SoftGPU: OpenGL + DirectX + Glide driver for Windows 95/98/Me

Reply 160 of 169, by JH64

Rank Newbie

LSS10999+RayeR: GDI is still a pain, and there are two reasons for this. The first is that the DIB driver reads from the GPU (when scrolling, it reads a block from VRAM, moves the pointer by x lines and writes it back). The second is that the DIB engine runs in PM16 and all operations crossing segments are super slow. Other drivers solve this with a HW blit, but VESA has nothing like that. A solution could be a shadow framebuffer in RAM, updated by a one-way copy to video RAM. I’m already doing this when emulating double buffering (on VMware); there the shadow buffer is still in VRAM, but the essence is similar. I can also implement BLIT in PM32 RING-0 (PM32 matters more than RING-0), or rather I can use existing code from the HAL, but I think that read operations from the GPU are the main bottleneck.

Fortunately, I made things a bit faster by implementing my own procedures for working with the mouse cursor, but GDI is still quite slow.
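
A minimal sketch of the shadow framebuffer idea described above, in Python for illustration only (the driver itself is not written this way). All names here (ShadowFramebuffer, fill_row, scroll_up, flush) are hypothetical, and a bytearray stands in for the real linear framebuffer. The point is that scrolling reads only from system RAM, and VRAM is only ever written to.

```python
class ShadowFramebuffer:
    """Toy model: all drawing happens in system RAM; VRAM is write-only."""

    def __init__(self, width, height, bytes_per_pixel=1):
        self.pitch = width * bytes_per_pixel          # bytes per scanline
        self.height = height
        self.shadow = bytearray(self.pitch * height)  # copy kept in system RAM
        self.vram = bytearray(self.pitch * height)    # stands in for the real LFB
        self.dirty = None                             # (first_row, last_row) or None

    def _mark_dirty(self, first, last):
        # Grow the dirty row range so that flush() copies everything changed.
        if self.dirty is None:
            self.dirty = (first, last)
        else:
            self.dirty = (min(self.dirty[0], first), max(self.dirty[1], last))

    def fill_row(self, row, value):
        start = row * self.pitch
        self.shadow[start:start + self.pitch] = bytes([value]) * self.pitch
        self._mark_dirty(row, row)

    def scroll_up(self, lines):
        """Move the image up by `lines` rows, reading only from the shadow."""
        if not 0 < lines < self.height:
            raise ValueError("lines must be between 1 and height - 1")
        n = lines * self.pitch
        self.shadow[:-n] = self.shadow[n:]   # block move inside system RAM
        self.shadow[-n:] = bytes(n)          # clear the rows that scrolled in
        self._mark_dirty(0, self.height - 1)

    def flush(self):
        """One-way copy of the dirty rows to VRAM; returns bytes written."""
        if self.dirty is None:
            return 0
        first, last = self.dirty
        start, end = first * self.pitch, (last + 1) * self.pitch
        self.vram[start:end] = self.shadow[start:end]
        self.dirty = None
        return end - start
```

The trade-off is extra system RAM and one additional copy per update, in exchange for never reading back across the bus.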

Reply 161 of 169, by MrMateczko

Rank Member

I've tested the newest version (v0.8.2025.53 special edition available for donors) on my ThinkPad X230, and DOS windows work quite well (even in windowed mode; I tested only text applications). Window dragging, scrolling and the shutdown animation are not as laggy as with VBEMP.
The update modes option from the tray icon worked fine, and after running it the 1366x768 resolution showed up correctly.

In 3DMark2001 SE with default installation settings I got... 241 points 😜 But all game tests did complete.

DX7/DX8/DX9 video tests in DXDiag are working, though of course, the cube spins somewhat slowly.

There are too many options in the installer and I have too much hardware... and not enough time to test everything 😒

I've also enabled the AVX hack, but llvmpipe still shows only 128-bit.

Reply 162 of 169, by LSS10999

Rank Oldbie

JH64 wrote on 2025-08-01, 08:02:

LSS10999+RayeR: GDI is still a pain, and there are two reasons for this. The first is that the DIB driver reads from the GPU (when scrolling, it reads a block from VRAM, moves the pointer by x lines and writes it back). The second is that the DIB engine runs in PM16 and all operations crossing segments are super slow. Other drivers solve this with a HW blit, but VESA has nothing like that. A solution could be a shadow framebuffer in RAM, updated by a one-way copy to video RAM. I’m already doing this when emulating double buffering (on VMware); there the shadow buffer is still in VRAM, but the essence is similar. I can also implement BLIT in PM32 RING-0 (PM32 matters more than RING-0), or rather I can use existing code from the HAL, but I think that read operations from the GPU are the main bottleneck.

Fortunately, I made things a bit faster by implementing my own procedures for working with the mouse cursor, but GDI is still quite slow.

Did some tests with the new driver.
- It seems GDI performance is indeed weak. Explorer windows, as well as games such as Solitaire, draw much slower compared to VBEMP9x.
- The mouse cursor may flicker when the system is busy. I don't recall this happening with VBEMP9x, but it is probably related to GDI performance.
- Scrolling in WordPad may perform a bit better compared to VBEMP9x under some circumstances, but still very slow.
- The background darkening effect when opening the Shutdown dialog, however, does not differ much in drawing time compared to VBEMP9x.

I think this driver's memory test may not be a good idea for bare-metal video cards, and it would be better to leave it disabled by default in such cases.
- With AMD video cards it leads to the BSOD you mentioned, requiring the use of "NoMemTest"="1", especially when setting a 32-bit color resolution.
- With nVidia video cards the BSOD doesn't occur, but the memory test can mess up the display the moment before Windows actually starts, if the startup logo is enabled. This is mostly harmless, though.

By the way, Direct3D does work with this driver, and can be tested with DXDIAG. The performance is far from ideal, however.

Reply 163 of 169, by LSS10999

Rank Oldbie

I just noticed another thing with the new driver, regarding issues with DOS boxes.
- DOS games that try to change to a fullscreen VGA mode while in a windowed DOS box will fail and get stuck in text mode. This includes starting the DOS game's executable directly.
- Making the MS-DOS Prompt enter full screen leads to a totally blank screen with only the text cursor visible, but it works normally, as if just the text mode fonts were emptied. DOS games, if launched in this state, will work properly. Exiting the game makes the text mode fonts visible again.
- The issue above also applies to "Exit to DOS" in the shutdown options. Exiting to DOS directly leads to an empty text mode, but if I start a DOS game in a fullscreen "MS-DOS Prompt" and then exit it (which restores text mode), text mode works normally upon exiting to DOS.

On the other hand, this breakage somehow avoids an annoying issue I'm having on a system with an nVidia video card: whenever I open an "MS-DOS Prompt", the system hangs for a while, beeps, then resumes. This could also happen with certain programs invoked by installers in the background (without an actual window). With this driver the issue doesn't happen. On systems with AMD cards the same issue manifests as flickers (the screen turning blank for a brief moment) without any hang/beep, and it also doesn't happen with this driver.

EDIT: Okay, I just read CONFIG.MD, and it seems this particular parameter (DosWindowSetMode) is responsible for controlling the modeset functionality. By default this is 0, so DOS programs cannot set the mode, which also prevents any odd behavior that may happen (like nVidia cards touching the CRTC register).

Reply 164 of 169, by JH64

Rank Newbie

MrMateczko: I’m glad that it works. To use AVX in LLVMpipe, you need to set HKEY_LOCAL_MACHINE\Software\vmdisp9x\apps\global\mesa\LP_NATIVE_VECTOR_WIDTH to “256” in the registry. Sorry to make this complicated, but I’m a bit paranoid about CPU registers when the operating system has no idea about their existence.
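
For anyone who prefers importing a .reg file over editing the registry by hand, here is a small sketch that generates one in the REGEDIT4 format used by Windows 9x. It assumes that LP_NATIVE_VECTOR_WIDTH is a string value under the mesa key (the post does not say which type it is), and make_reg_file is a hypothetical helper, not part of SoftGPU.

```python
def make_reg_file(key_path, values):
    """Return REGEDIT4 text that sets string values under key_path."""
    lines = ["REGEDIT4", "", f"[{key_path}]"]
    for name, data in values.items():
        # Backslashes and quotes inside string data must be escaped.
        escaped = data.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'"{name}"="{escaped}"')
    return "\r\n".join(lines) + "\r\n"


# Assumption: the setting is a string value named LP_NATIVE_VECTOR_WIDTH.
MESA_KEY = r"HKEY_LOCAL_MACHINE\Software\vmdisp9x\apps\global\mesa"
reg_text = make_reg_file(MESA_KEY, {"LP_NATIVE_VECTOR_WIDTH": "256"})
```

Writing reg_text to a file such as avx256.reg (a made-up name) and double-clicking it on the target machine should have the same effect as setting the value manually.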

LSS10999: thanks for testing and for the comparison with VBEMP9x. About the slowness: I’m able to back VRAM with RAM and send only the changes to VRAM. This speeds up the system GUI but slows down multiple buffering in games and fullscreen applications. There are more ways to solve it; so far, I like the idea of the linear framebuffer memory being mapped partly to video RAM and partly to system RAM, depending on whether it is a system area or a flipping surface.

The mouse cursor is hidden between the moment the system GUI starts drawing and the moment it finishes; normally this is fast enough to be invisible. When the system routines for the software cursor are used, the cursor is only removed from the area being accessed (not the whole screen). This reduces the flashing effect but isn’t 100% safe, and from time to time you can see cursor remnants in various places across the screen. This is most noticeable in DD/DX games, because most accelerated applications assume that the cursor is hardware accelerated. When I create extra buffering, this effect will disappear.
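
The "remove the cursor only from the accessed area" rule comes down to a rectangle intersection test. A rough sketch under that assumption follows; Rect and must_hide_cursor are hypothetical names, not the driver's actual routines.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive


def must_hide_cursor(cursor: Rect, draw_area: Rect) -> bool:
    """True only if the drawing operation overlaps the software cursor."""
    return (cursor.left < draw_area.right and draw_area.left < cursor.right
            and cursor.top < draw_area.bottom and draw_area.top < cursor.bottom)
```

Drawing that bypasses this check, as DD/DX applications writing directly to the surface would, is where stale cursor pixels can be left behind.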

Also, I’ll turn off video RAM testing and trust that the reported framebuffer size is real. The problem with the DOS window I describe here - you probably already found it (the fix is to start DOS games in fullscreen), and I don’t have a better solution: with the driver I can forbid a DOS application to exit fullscreen but can’t forbid it to enter fullscreen (I wonder what was going through the mind of whoever came up with this behaviour... but rather not 😀 ).

In conclusion, what I’ll be able to fix (I hope):
- slowness in system GUI
- reduce cursor flashing
- disable VRAM test by default
- better check AMD 16bpp modes
- handle the case when the system has multiple video cards (or one dual-head card) and use only the first one (currently a BSOD)

What I won’t be able to fix:
- 3D rendering speed (or not much)
- DOS window behaviour

I think I'll fix the things that can be fixed in a few days. The others stay unfixed until something enlightens me 😀

Reply 165 of 169, by LSS10999

Rank Oldbie

JH64 wrote on 2025-08-02, 21:31:

LSS10999: thanks for testing and for the comparison with VBEMP9x. About the slowness: I’m able to back VRAM with RAM and send only the changes to VRAM. This speeds up the system GUI but slows down multiple buffering in games and fullscreen applications. There are more ways to solve it; so far, I like the idea of the linear framebuffer memory being mapped partly to video RAM and partly to system RAM, depending on whether it is a system area or a flipping surface.

The mouse cursor is hidden between the moment the system GUI starts drawing and the moment it finishes; normally this is fast enough to be invisible. When the system routines for the software cursor are used, the cursor is only removed from the area being accessed (not the whole screen). This reduces the flashing effect but isn’t 100% safe, and from time to time you can see cursor remnants in various places across the screen. This is most noticeable in DD/DX games, because most accelerated applications assume that the cursor is hardware accelerated. When I create extra buffering, this effect will disappear.

Also, I’ll turn off video RAM testing and trust that the reported framebuffer size is real. The problem with the DOS window I describe here - you probably already found it (the fix is to start DOS games in fullscreen), and I don’t have a better solution: with the driver I can forbid a DOS application to exit fullscreen but can’t forbid it to enter fullscreen (I wonder what was going through the mind of whoever came up with this behaviour... but rather not 😀 ).

In conclusion, what I’ll be able to fix (I hope):
- slowness in system GUI
- reduce cursor flashing
- disable VRAM test by default
- better check AMD 16bpp modes
- handle the case when the system has multiple video cards (or one dual-head card) and use only the first one (currently a BSOD)

What I won’t be able to fix:
- 3D rendering speed (or not much)
- DOS window behaviour

I think I'll fix the things that can be fixed in a few days. The others stay unfixed until something enlightens me 😀

There's no need to change the DOS window behavior regarding forbidding modesetting by default. It's better this way, as it avoids many issues; the same kind of issue also affects VBESVGA and is currently blocking a system from booting Win3.x in 386 Enhanced Mode properly. As for the text mode font getting emptied... I'm not sure what caused it initially, but when I tried exiting to DOS from Windows directly (without starting any fullscreen DOS program) yesterday, I did not get the issue and the DOS command line was working properly.

By the way, is it possible to initialize the back buffer before the system starts up? When Windows is just about to start, the screen briefly flashes with garbled content consisting of whatever was in the video memory, similar to this issue that I once observed in VBESVGA (now fixed). I do not recall seeing this with VBEMP9x, though.

I don't know under what circumstances the other parameters (HWDoubleBuffer and MTRR) would make any difference. I tried fiddling with them on one system but saw no noticeable difference in performance...

Reply 166 of 169, by RayeR

User metadata
Rank Oldbie
Rank
Oldbie

MTRR should always be set to write-combining mode; it brings a significant memory transfer speed-up (on PCIe systems usually an order of magnitude or more). It can be set by another tool (e.g. with my mtrrlfbe) before win9x boots, so there is no need to implement it in drivers; AFAIK VBEMP implemented this itself. On some modern systems it may happen that the MTRRs are not set properly, and then no difference in performance is visible. It may be tricky, as sometimes there are not enough free MTRRs left for the LFB settings...
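
To illustrate what "setting an MTRR to write combining" means in terms of register values, here is a sketch that computes the variable-range IA32_MTRR_PHYSBASEn / IA32_MTRR_PHYSMASKn pair for an LFB region. It only does the arithmetic (writing MSRs needs ring 0), wc_mtrr_pair is a hypothetical helper, and the 36-bit physical address width and the example LFB address are assumptions; the real width comes from CPUID.

```python
MTRR_TYPE_WC = 0x01          # memory type encoding for write-combining
MTRR_MASK_VALID = 1 << 11    # "valid" bit in IA32_MTRR_PHYSMASKn


def wc_mtrr_pair(base, size, phys_bits=36):
    """Return (PHYSBASE, PHYSMASK) values that mark [base, base+size) as WC."""
    if size < 4096 or size & (size - 1):
        raise ValueError("size must be a power of two of at least 4 KiB")
    if base % size:
        raise ValueError("base must be aligned to size")
    addr_mask = (1 << phys_bits) - 1
    phys_base = (base & addr_mask) | MTRR_TYPE_WC
    phys_mask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID
    return phys_base, phys_mask


# Example: a 16 MiB LFB at 0xE0000000 (made-up address).
lfb_base, lfb_mask = wc_mtrr_pair(0xE0000000, 16 * 1024 * 1024)
```

The power-of-two and alignment restrictions are why an oddly sized or oddly placed LFB can need several MTRRs, which is one way to run out of free ones.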

Gigabyte GA-P67-DS3-B3, Core i7-2600K @4,5GHz, 8GB DDR3, 128GB SSD, GTX970(GF7900GT), SB Audigy + YMF724F + DreamBlaster combo + LPC2ISA

Reply 167 of 169, by MERCURY127

Rank Member

RayeR wrote on 2025-08-03, 14:06:

On some modern systems it may happen that the MTRRs are not set properly, and then no difference in performance is visible.

If the machine supports more than 64 GiB of total memory, UEFI sets the default memory type (IA32_MTRR_DEF_TYPE, MSR 2FFh = 06 0c 00 00 00 00 00 00) to WB, and the UEFI has to be patched to resolve the problem. Verified on a Huananzhi X99-TF + Xeon E5-2666 v3.
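
The MSR dump quoted above can be decoded as follows. This is just a sketch of the IA32_MTRR_DEF_TYPE bit layout (default type in bits 0-7, fixed-range enable in bit 10, MTRR enable in bit 11); decode_mtrr_def_type is a hypothetical helper name.

```python
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}


def decode_mtrr_def_type(raw):
    """Decode the 8 little-endian bytes of IA32_MTRR_DEF_TYPE (MSR 2FFh)."""
    value = int.from_bytes(raw, "little")
    return {
        "value": value,
        "default_type": MEMORY_TYPES.get(value & 0xFF, "reserved"),
        "fixed_mtrrs_enabled": bool(value & (1 << 10)),
        "mtrrs_enabled": bool(value & (1 << 11)),
    }


info = decode_mtrr_def_type(bytes.fromhex("06 0c 00 00 00 00 00 00"))
```

For the bytes in the post, the value is 0xC06: default type WB with both enable bits set, so any range not covered by an explicit MTRR (the LFB included) is treated as write-back rather than uncached.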

Reply 168 of 169, by DoZator

Rank Member

I tried to check all this, using the demo version ColinMcRaeRallyDemo.exe (1998) as an example. It has serious compatibility problems with nVidia under Windows 98 + ForceWare 60.86 (and above), which affect ALL PCI-E video cards from nVidia that are compatible with 9x, as well as all PCI\AGP solutions based on the NV40 chip (and above, up to the G7X). Specifically, the main game menu is not displayed (the game itself works flawlessly). Similar problems are found in some other DirectX 6 games.

1) Using "softgpu-0.6.2024.36.iso" I managed to make the game work:

The attachment CMSW36.PNG is no longer available

Used here: ddraw.dll, wined3d.dll, winedd.dll and OpenGL32.DLL (software). The main game menu is displayed correctly and the game works correctly. But there are three significant disadvantages:

- very low performance (about 30 FPS);
- there are no anti-aliasing\filtering effects (AA\AF) x16, which makes the image look bad (there are "steps" and constant flickering of textures, objects, and other particles, which is especially noticeable in motion);
- for some reason, the car looks "flattened"; for comparison, here is how the car looks on real hardware:

The attachment CMHW_NV.PNG is no longer available

2) Next I tried hardware acceleration using the same set from "softgpu-0.6.2024.36.iso" (ddraw.dll, wined3d.dll, winedd.dll), but without the software "OpenGL32.dll" (instead, it uses the shared system OpenGL32.dll + nvOpenGL.dll version 71.84 from nVidia, OpenGL version "1.5.3"). Surprisingly, the game works (although this is not documented). The main game menu is now fully displayed. However, the game itself plays almost, but not entirely, correctly:

The attachment CMSWHW1.PNG is no longer available
The attachment CMSWHW2.PNG is no longer available
The attachment CMSWHW3.PNG is no longer available

What's noteworthy is that it has a very good stable FPS (the same as with native DDRAW.DLL on real hardware) and it seems to correctly apply AA\AF x16. However, there is a significant drawback that you may have already noticed... It's a bit disappointing, isn't it? It's as if it's missing just a little bit to finally work properly with any problematic ForceWare driver, ranging from 60.86 (GL Version: 1.5.1) to 82.69 (GL Version: 2.0.1).

In the new version of "softgpu-0.8.2025.50.iso", there was even a slight regression compared to the version tested above, and the "wined3d.dll" component was completely broken (the game now displays a black screen, although the game itself works and the sound is audible) 🙁

PS: I was able to narrow the problem down a bit: any version of "wined3d.dll" newer than "1.7.55.38-sse3" causes a black screen (this is the latest version that still works reasonably well). Versions "wine9x-1.7.55.40-sse3" and "wine9x-1.7.55.45-sse3" consistently cause a black screen immediately when starting the game, regardless of the type of OpenGL used (llvmpipe\softpipe\nvOpenGL). In addition, after leaving the game, the desktop image no longer appears until the OS is rebooted.

Reply 169 of 169, by DoZator

Rank Member

Update:

I have tested "softgpu-0.8.2025.53.iso" and the "black screen" problem persists in this version as well. It seems to be a global change in "wined3d.dll". To work around this problem, you can roll back "wined3d.dll" to version "1.7.55.38-sse3" (or lower), or disable hardware acceleration in the display settings; then everything works, but only in software:

The attachment HWACCOFF.PNG is no longer available
The attachment CMSW53.PNG is no longer available
RayeR wrote on 2025-08-03, 14:06:

It may be tricky, as sometimes there are not enough free MTRRs left for the LFB settings...

In this case, the default WB will help.