VOGONS


First post, by Kaizzer


Hi everybody!

After many, many months tinkering with translating the mighty Nuked OPL3 into SIMD intrinsics, I've finally managed to put something meaningful together:

https://github.com/TexZK/aymo

So, what is AYMO? What are its purposes?

AYMO is an attempt to take advantage of SIMD instructions of the most common CPUs to accelerate audio emulators.
Eventually, it's intended to be provided as an LGPL 2.1 library, in both static and dynamically loaded forms, to be linked against PC emulators (e.g. DOSBox) or music players (e.g. AdPlug).

This journey started after reading that Nuked OPL3 was too slow compared to other OPL3 emulators.
As a person fascinated by High-Performance Computing (just fascinated, not a pro at all!), I was wondering if there's some way to parallelize the OPL3 synthesis.
Rearranging the computation flow might give some parallelization opportunities, because the 4-stage synthesis is mostly the same for all the audio channels.
So, here's a first attempt at improving the performance of the Nuked OPL3 via x86 SSE4.1, x86 AVX2, and ARMv7 NEON.
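
To illustrate the idea (this is only a sketch, not AYMO's actual code): if the per-slot state is stored as a structure of arrays, the same step can be applied to several slots at once. The `slot_bank` layout, the function names, and the simple 32-bit phase accumulator are assumptions made up for this example; the real OPL3 phase generator is more involved. SSE2 is used here only because it is the x86-64 baseline, with a scalar fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

#define NUM_SLOTS 36u  /* 18 channels x 2 operators; a multiple of 4 */

/* Hypothetical structure-of-arrays layout: one array per field,
 * one element per operator slot. */
typedef struct {
    uint32_t phase[NUM_SLOTS];      /* phase accumulators */
    uint32_t phase_inc[NUM_SLOTS];  /* per-sample phase increments */
} slot_bank;

/* Scalar reference: one slot per iteration. */
void advance_phase_scalar(slot_bank *bank)
{
    assert(bank != NULL);
    for (size_t i = 0; i < NUM_SLOTS; ++i) {
        bank->phase[i] += bank->phase_inc[i];
    }
}

/* Lane-wise version: four slots per iteration when SSE2 is available,
 * otherwise it falls back to the scalar loop. */
void advance_phase_simd(slot_bank *bank)
{
    assert(bank != NULL);
#if defined(__SSE2__)
    for (size_t i = 0; i < NUM_SLOTS; i += 4) {
        __m128i p = _mm_loadu_si128((const __m128i *)&bank->phase[i]);
        __m128i d = _mm_loadu_si128((const __m128i *)&bank->phase_inc[i]);
        _mm_storeu_si128((__m128i *)&bank->phase[i], _mm_add_epi32(p, d));
    }
#else
    advance_phase_scalar(bank);
#endif
}
```

Both functions should leave the bank in the same state; the SIMD one simply needs a quarter of the loop iterations.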

As I'm also fascinated by the AdLib Gold soundcard, I've also rewritten my own TDA8425 and YM7128 emulators for SIMD.

Is it worth it?

Weeell, it really depends on your point of view. My opinion is that, whether I paid good money for a workstation or for a tiny single-board computer, I want it to perform as fast as possible!

As for the OPL3 emulator itself, it turns out that parallelizing the Nuked OPL3, which is inherently very convoluted, provides a speedup in the ballpark of 2x - 3x.
Honestly I was hoping for much more as I started this journey, but I think that such a speedup is still worth the HUUUGE effort (trust me, it's already been VERY demanding!).

I haven't run any benchmarks on the TDA8425 and YM7128 emulators yet, but I'm quite confident that those should perform much better, because of their simple DSP nature.

I also think that the pure C implementations themselves have room for some of the optimizations I've already put into AYMO.

One big caveat is that AYMO can't be a quick drop-in replacement for other implementations, because a SIMD code base requires some preliminary configuration.
This is why I'm using some complex meson build scripts (initially borrowed from the Opus codec project).
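
As a rough idea of the kind of configuration involved (a sketch under my own assumptions, not how AYMO actually does it): each SIMD variant is typically built with its own compiler flags, and the program then picks one at run time. The `backend_id` enum and the function names below are hypothetical; `__builtin_cpu_init` and `__builtin_cpu_supports` are GCC/Clang builtins for x86, and `__ARM_NEON` is the compile-time macro set when NEON is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend identifiers; not AYMO's actual names. */
typedef enum {
    BACKEND_SCALAR,
    BACKEND_SSE41,
    BACKEND_AVX2,
    BACKEND_NEON
} backend_id;

/* Pick the widest backend the current machine can run. */
backend_id pick_backend(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* run-time CPUID probe */
    if (__builtin_cpu_supports("avx2")) {
        return BACKEND_AVX2;
    }
    if (__builtin_cpu_supports("sse4.1")) {
        return BACKEND_SSE41;
    }
    return BACKEND_SCALAR;
#elif defined(__ARM_NEON)
    return BACKEND_NEON;   /* decided at compile time in this sketch */
#else
    return BACKEND_SCALAR;
#endif
}

/* Human-readable name, e.g. for a log line at start-up. */
const char *backend_name(backend_id id)
{
    switch (id) {
    case BACKEND_SSE41: return "sse4.1";
    case BACKEND_AVX2:  return "avx2";
    case BACKEND_NEON:  return "neon";
    default:            return "scalar";
    }
}
```

The build scripts still have to compile each variant with the right flags (for example `-mavx2` for the AVX2 one), which is the part a plain drop-in source file can't do by itself.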

Next Steps

AYMO is still at a very early development stage, and I hope I can still find time and interest in developing it.
The code base clearly needs some refactoring, the libraries have to be tested, benchmarks have to be run... and eventually, this toy might find its way into some big emulation projects.