First post, by NY00123
Update (May 5th, 2013): Please see a later post for an up-to-date patch.
In particular, the issue with "Threaded optimization" was really a bug in the patch.
Hey all,
I am attaching a patch that attempts to add support for host VSync without major emulation slowdowns. I did start a thread on a similar topic before, but the patch there is, in my view, vastly different.
=== How to use this ===
The patch makes a few changes to the DOSBox settings:
- "fulldouble" is renamed "vsync", for reasons of my own. (OK, one of them is related to some flag from SDL2, although there is nothing SDL2 related in this patch.)
- "vsync" applies to OpenGL output now, in addition to Surface and DirectDraw.
- A new "threading" setting has been added (to the [sdl] section in the configuration file). You probably want to keep it set to "auto", although you may change that if you wish.
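For illustration, the relevant part of the configuration file might look like the fragment below. This is only a sketch: it assumes that "vsync" stays a true/false setting like "fulldouble" was, and "auto" is the only "threading" value named in this post.

```ini
[sdl]
# Replaces the old "fulldouble" setting (assumed to remain a boolean).
vsync=true
# "vsync" now also applies to the OpenGL output.
output=opengl
# New setting added by the patch; "auto" is the suggested value.
threading=auto
```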
=== Known issues ===
If you try the OpenGL path with an Nvidia GPU, you may want to disable "Threaded optimization" in the driver settings, or else you may not benefit from the patch at all. I really don't know why this is the case, but I'm mentioning it so you know.
Furthermore, it should not be a surprise at all if new bugs show up. As usual, use at your own risk!
=== Does it imply threaded rendering? (The answer is *no*) ===
Unfortunately, for now I have decided to keep the rendering itself (e.g. the work of the scalers) in the emulator's thread. With a few(?) modifications to this patch it can probably be done, although one needs to take care of proper video capturing and more.
=== Some more details ===
OK, some forum readers may not be sure what the big deal is. Maybe you can just enable VSync in some way, as done for lots of games, and be done with it. However, chances are it is not going to work well here:
- Many DOS games may output 70 distinct frames per second at some point, while today's displays often refresh at a lower rate (e.g. 60Hz). This means that a noticeable slowdown (really a bit of slow motion) can be experienced.
- Even if that weren't a problem, the CPU will probably need to wait for host vertical sync more often than not. These are periods of time when the emulated machine can't run, which can have an impact on heavier DOS games and/or lighter (host) machines.
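To put a rough number on the first point above, here is a tiny sketch of the arithmetic. It assumes a simplified model where every emulated frame blocks until the next host vertical sync, so the host can show at most one emulated frame per refresh; the helper name is made up for illustration and is not part of the patch.

```cpp
// Relative speed of the emulated machine when every emulated frame has to
// wait for a host vertical sync (hypothetical helper, simplified model).
double relative_speed(double guest_fps, double host_hz) {
    // The host presents at most host_hz frames per second, so a guest that
    // wants to show more frames than that is throttled to host_hz / guest_fps.
    if (guest_fps <= host_hz) return 1.0;
    return host_hz / guest_fps;
}
```

With 70 emulated frames per second on a 60Hz display this gives 60/70, i.e. the game runs at roughly 86% of its intended speed.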
As hinted before, I did have an earlier attempt, based on time measurements. However, it is a bit too sensitive and may easily fail, it requires you to manually specify the monitor's refresh rate to work better, and a bit of potential CPU time (for emulation) is still lost.
In this patch, a different approach is attempted: Let one thread wait for vertical sync, while the emulator may continue to run in a different thread.
It cannot just work as-is by naively calling an existing screen update function with a few modifications, though. For one, many SDL functions related to video and event handling should be called from the thread where SDL_SetVideoMode is called for the very first time. It is also safer to let this thread be the main one.
And then, there are synchronization issues to take care of.
So, after an earlier attempt which semi-worked but was imperfect and possibly a bit messy, I arrived at the given patch. Basically, rather than calling specific functions like SDL_Flip directly, pointers to such functions are used. Without threading they simply point to what you'd expect (e.g. SDL_Flip). Otherwise they point to wrapper functions, so the secondary thread can push calls to the main thread. There are a few kinds of such calls in use:
- Synchronous calls: The secondary thread waits until the main thread is done with a call to some function. It may be a void function, or one that returns a value. In both cases it is synchronous.
- Asynchronous calls: The secondary thread schedules a function for the main thread and then returns immediately with no wait. The main thread will execute such a function (possibly with different arguments) a bit later.
- Special calls with unique handling.
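The function-pointer idea with synchronous and asynchronous calls can be sketched roughly as follows. This is not the code from the patch: the patch sticks to SDL 1.2 and needs no C++11, while this sketch uses the C++11 standard library only to stay self-contained. MainThreadQueue, fake_flip, flip_threaded, flip_ptr and demo are all made-up names, and fake_flip merely stands in for a real video call such as SDL_Flip.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical queue of calls that must be executed by the main thread.
class MainThreadQueue {
public:
    // Asynchronous call: schedule fn for the main thread and return at once.
    void post(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(fn));
        }
        wake_.notify_one();
    }

    // Synchronous call: schedule fn, then wait for its return value.
    int call(std::function<int()> fn) {
        std::mutex done_mutex;
        std::condition_variable done_cv;
        bool done = false;
        int result = 0;
        post([&] {
            int value = fn();
            std::lock_guard<std::mutex> lock(done_mutex);
            result = value;
            done = true;
            done_cv.notify_one();
        });
        std::unique_lock<std::mutex> lock(done_mutex);
        done_cv.wait(lock, [&] { return done; });
        return result;
    }

    // Ask run() to return once the calls queued before this one are done.
    void stop() { post(std::function<void()>()); }

    // Main thread loop: execute queued calls in order until stop() is seen.
    void run() {
        for (;;) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return !pending_.empty(); });
                fn = std::move(pending_.front());
                pending_.pop();
            }
            if (!fn) return;  // an empty function is the stop marker
            fn();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> pending_;
};

MainThreadQueue g_queue;
std::thread::id g_flip_thread;  // thread that last executed fake_flip
int g_async_calls = 0;          // only touched by the main thread

// Stand-in for a video call like SDL_Flip; returns 0 like a successful flip.
int fake_flip() {
    g_flip_thread = std::this_thread::get_id();
    return 0;
}

// Wrapper used with threading: push the call to the main thread and wait.
int flip_threaded() { return g_queue.call(fake_flip); }

// Without threading the pointer refers to the plain function.
int (*flip_ptr)() = fake_flip;

// The caller plays the main thread, the emulator runs in a secondary thread.
bool demo() {
    flip_ptr = flip_threaded;
    int result = -1;
    std::thread emulator([&result] {
        result = flip_ptr();                    // synchronous call
        g_queue.post([] { ++g_async_calls; });  // asynchronous call
        g_queue.stop();
    });
    g_queue.run();
    emulator.join();
    return result == 0 && g_async_calls == 1 &&
           g_flip_thread == std::this_thread::get_id();
}
```

The emulator code only ever calls through flip_ptr, so it does not need to know whether threading is enabled. In this sketch the stand-in flip always ends up running on the thread that called run().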
As you can see in the patch, locks are used often for synchronization. C++11 adds std::atomic and SDL 2.0 also has a few functions for atomic data accesses, but with this patch you don't need more than the usual SDL 1.2 setup and a compiler with no C++11 support.
Oh, and if you think that it's possible to avoid *both* locks and (hardware) atomic accesses in some way, I'm afraid it may not work as expected. Sure, one may think that if one thread is waiting for a boolean to become "true", it is sufficient that another thread sets it to "true" with no protection. However, the two threads may run on two distinct CPUs with their own caches, and one of these caches may still contain the old boolean value, even after the update. See where this is heading?
So, yes, locks and more have their costs. I haven't seen a large degradation in performance, though, and that's assuming there is any noticeable degradation at all.
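For comparison, here is a minimal sketch of waiting for a boolean the protected way. It uses the C++11 standard library for brevity, which is an assumption made for this example only; with plain SDL 1.2 one would use SDL_mutex and SDL_cond for the same purpose. FrameFlag and wait_for_flag_demo are made-up names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// A boolean that is only ever read or written while holding the lock.
struct FrameFlag {
    std::mutex mutex;
    std::condition_variable changed;
    bool ready = false;

    // Called by the thread that sets the flag to "true".
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            ready = true;
        }
        changed.notify_one();
    }

    // Called by the thread that waits for the flag; sleeps instead of spinning.
    void wait_and_clear() {
        std::unique_lock<std::mutex> lock(mutex);
        changed.wait(lock, [this] { return ready; });
        ready = false;
    }
};

// One thread prepares some data and raises the flag, the other waits for it.
int wait_for_flag_demo() {
    FrameFlag flag;
    int payload = 0;
    std::thread producer([&] {
        payload = 42;  // written before the flag is set
        flag.set();
    });
    flag.wait_and_clear();
    // Unlocking in set() and locking in wait_and_clear() order the write to
    // payload before this read, so the new value is visible here.
    int seen = payload;
    producer.join();
    return seen;
}
```

The lock guarantees not only that the waiting thread sees the updated flag, but also that it sees everything the other thread wrote before setting it.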