First post, by agovtman
I have created a fork of Nuked-OPL3 that runs 1.4x-2.2x faster while producing exactly the same output.
I noticed a while back that Nuked-OPL3 does a lot of unnecessary work inside its hot loop, such as dispatching through function calls to compute values that could be computed inline or simply precomputed into a lookup table (LUT). Moreover, Nuked-OPL3 is often the single heaviest consumer of CPU time in emulators that use it, which made it a productive target for optimization. I started experimenting, found even more optimizations I could make, and ended up with a version of Nuked-OPL3 that produces exactly identical output to the original but is as much as 2.2x faster!
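To give a rough idea of the kind of change described above, here is a minimal sketch of swapping a per-sample function dispatch for a precomputed LUT. This is not code from Nuked-OPL3 or from the fork: the waveforms are toy integer functions, and every name (`toy_saw`, `toy_dispatch`, `toy_lut`, and so on) is hypothetical and made up for illustration. It only shows the general technique, and that the LUT path can be checked against the dispatch path for identical output.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins, NOT Nuked-OPL3 code: four integer "waveforms"
 * evaluated over an 8-bit phase. All names here are hypothetical. */
enum { TOY_NUM_WAVES = 4, TOY_PHASE_STEPS = 256 };

int16_t toy_saw(uint8_t p)      { return (int16_t)(p - 128); }
int16_t toy_square(uint8_t p)   { return (int16_t)(p < 128 ? 127 : -128); }
int16_t toy_triangle(uint8_t p) { return (int16_t)(p < 128 ? 2 * p - 128 : 383 - 2 * p); }
int16_t toy_half_saw(uint8_t p) { return (int16_t)(p < 128 ? p : 0); }

typedef int16_t (*toy_wave_fn)(uint8_t);

static const toy_wave_fn toy_dispatch[TOY_NUM_WAVES] = {
    toy_saw, toy_square, toy_triangle, toy_half_saw
};

/* Baseline: one indirect function call for every sample. */
int16_t toy_sample_dispatch(unsigned wave, uint8_t phase)
{
    assert(wave < TOY_NUM_WAVES);
    return toy_dispatch[wave](phase);
}

/* Optimized: every (wave, phase) result is computed once up front. */
static int16_t toy_lut[TOY_NUM_WAVES][TOY_PHASE_STEPS];

void toy_lut_init(void)
{
    for (unsigned w = 0; w < TOY_NUM_WAVES; w++)
        for (unsigned p = 0; p < TOY_PHASE_STEPS; p++)
            toy_lut[w][p] = toy_dispatch[w]((uint8_t)p);
}

/* Hot path becomes a plain array read. Call toy_lut_init() first. */
int16_t toy_sample_lut(unsigned wave, uint8_t phase)
{
    assert(wave < TOY_NUM_WAVES);
    return toy_lut[wave][phase];
}

/* Counts inputs where the two paths disagree; 0 means identical output. */
unsigned toy_count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned w = 0; w < TOY_NUM_WAVES; w++)
        for (unsigned p = 0; p < TOY_PHASE_STEPS; p++)
            if (toy_sample_lut(w, (uint8_t)p) != toy_sample_dispatch(w, (uint8_t)p))
                mismatches++;
    return mismatches;
}
```

To try it, call `toy_lut_init()` once at startup and then compare `toy_sample_lut()` against `toy_sample_dispatch()`, or just check that `toy_count_mismatches()` returns 0. Whether a table actually beats the computation it replaces depends on table size and cache behavior, so profile rather than assume.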
I have offered to merge these changes upstream with NukeYKT, but the changes do some "clever" things (in the pejorative sense: opaque, tricky, weird) that he might not want to drag into his codebase, so for now this exists as an independent fork.
This is a drop-in replacement for Nuked v1.8, and has already been adopted by dosbox-staging, libADLMIDI, and 86box. dosbox-x looks to be ready to merge it imminently, too.
If your project uses Nuked-OPL3, try switching to my -fast fork, run some performance profiling, and see the difference for yourself!
PS: I have gotten Nuked-OPL3-fast running in realtime on an overclocked RPi Pico 2!