Reply 160 of 212, by krcroft
Let's talk more about nuked!
1. Performance: Let's restate our OPL differences as a rule-of-thumb. All we need to remember is "2 and 3". 'fast' is twice as fast as 'compat', which is three times faster than 'nuked'.
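To spell out what "2 and 3" implies when chained, here is a tiny sketch. The ratios are just the rule-of-thumb numbers above, not new measurements, and the names are made up for illustration.

```cpp
// Rule-of-thumb ratios taken from the post; not benchmark results.
constexpr int kFastVsCompat = 2;   // 'fast' is roughly 2x as fast as 'compat'
constexpr int kCompatVsNuked = 3;  // 'compat' is roughly 3x as fast as 'nuked'

// Relative CPU cost per generated sample, normalised so 'fast' costs 1.
constexpr int cost_fast() { return 1; }
constexpr int cost_compat() { return cost_fast() * kFastVsCompat; }
constexpr int cost_nuked() { return cost_compat() * kCompatVsNuked; }

// Chaining the two ratios: 'nuked' costs about 6x what 'fast' does.
static_assert(cost_nuked() == 6, "2 x 3 rule of thumb");
```

So if the rule holds, any speed-up in nuked has to recover roughly a factor of six to match 'fast'.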
2. SIMD: great point, dreamer. Doing it ourselves adds maintenance and complexity, but it's probably worth it if the hot spots are concentrated like you said. We can also come at it from the compiler side: have it flag problematic loops and blocks that need some adjustment or hinting before it can effectively SIMD them (nuked is simple and elegant enough that this approach will probably work).
Combined with profile feedback, this lazy approach can offer most of the benefits with little added maintenance burden. We'd decide on a handful of CPU models for each architecture and generate optimized release binaries for them (-march=..., -mtune=..., etc).
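As a rough illustration of the "hint the compiler" idea, here is the kind of loop an auto-vectoriser usually handles well once it is told the buffers don't overlap. This is a made-up helper, not code from nuked or DOSBox; `mix_samples` and its parameters are hypothetical, and `__restrict` is a compiler extension (GCC, Clang and MSVC accept it), not standard C++.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds `count` 16-bit samples from `src`, scaled by
// `volume`, into the 32-bit accumulators in `dst`.
//
// With GCC, something like
//   g++ -O3 -march=<cpu> -fopt-info-vec-missed ...
// reports loops the vectoriser gave up on (<cpu> is a placeholder).
// __restrict promises that dst and src never alias, which removes one of the
// common reasons a compiler refuses to vectorise a loop like this.
void mix_samples(std::int32_t* __restrict dst,
                 const std::int16_t* __restrict src,
                 std::size_t count,
                 std::int32_t volume)
{
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] += static_cast<std::int32_t>(src[i]) * volume;
    }
}
```

The appeal is that the source stays plain scalar C++. Whether nuked's real hot loops are this simple is something the compiler report would have to tell us.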
3. Quality! I am absolutely amazed that nuked is essentially a cycle-accurate emulator of the OPL processor based on inspecting the hardware die. I mean.. most huge companies and governments don't even go through this level of effort (nor have the talent) to retain or extend compatibility with their own hardware or custom ICs.. they just move on or give up.
Could we be any more lucky to have such a thing? wow. That's like little Johnny losing his favorite pet Fido and his parents just happen to be in charge of the government's clandestine bio-cloning laboratory. "Johnny, we got you another puppy.. remember how you lost Fido last year? Well.."
There's an impressive discussion involving James-F and others in the thread "The way to detect OPL3 clone", about testing actual OPL hardware. Maybe he and other hardcore OPL folks can comment on distinguishable differences between fast, compat, and nuked? Help wanted!
4. Threading: right now almost everything in DOSBox blocks and operates serially.
How true is this on actual hardware though? For example: the game generates FM notes which it sends to the OPL driver, which "sends" them across the bus to the card, which generates samples fed to its DAC and then to line-out and the speakers. How soon in this chain does the game's function call to the driver return, allowing the game to carry on running?
MSCDEX is asynchronous: the game simply says "play cdrom at sector X for N frames", and is handed back control the moment the cdrom starts playing.. the game code carries on without having to make even a single function call to babysit the cdrom; so audio plays for "free" and asynchronously.
This same MSCDEX flow in DOSBox is synchronous though: the CPU emulation core waits on the Ogg decoder to decode each Vorbis packet before the CPU emulator resumes, and in turn the game waits before its instructions resume executing. Fortunately it all happens fast enough that it all fits within the framerate!
It would be interesting if people with this deeper knowledge could comment on all the hardware flows, and which are mostly async vs mostly sync.
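To make the sync vs async distinction concrete, here is a minimal sketch of the asynchronous shape: a worker thread "decodes" packets ahead of time, so the consumer (standing in for the emulation loop) only waits when nothing is ready yet. Everything here is hypothetical. `AsyncDecoder` is not a DOSBox class, the decode step is a stand-in rather than a real Vorbis call, and the queue is unbounded for brevity.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch, not DOSBox code.
class AsyncDecoder {
public:
    explicit AsyncDecoder(int packet_count)
        : worker_([this, packet_count] { decode_all(packet_count); }) {}

    ~AsyncDecoder() { worker_.join(); }

    // Blocks only if the worker has not produced the next packet yet.
    // Returns false once every packet has been consumed.
    bool pop(std::vector<std::int16_t>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || done_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    // Stand-in for a real decode: packet n yields 4 samples of value n.
    void decode_all(int packet_count) {
        for (int n = 0; n < packet_count; ++n) {
            std::vector<std::int16_t> samples(4, static_cast<std::int16_t>(n));
            {
                std::lock_guard<std::mutex> lock(mutex_);
                queue_.push_back(std::move(samples));
            }
            ready_.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::vector<std::int16_t>> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

The caller would construct an `AsyncDecoder` and call `pop()` whenever the mixer wants more samples. In the synchronous flow described above, the equivalent of `decode_all` runs inline on the emulation thread instead.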