krcroft, unfortunately the wolf3d menu fade effect is still causing audio stutter on my build when using output=surfacepp. The game itself seems to run fine though as long as you fix the cycles to around 4000. (At around 5000 cycles and higher the dosbox process starts to max out one of the cores.)
I was curious about this, so I did a bit more testing. The first thing I noticed was that even when using output=surface, my build of dosbox was far slower than the RetroPie packaged dosbox. (For example, my dosbox build would max out a CPU core at a fixed 10000 cycles with wolf3d, whereas the RetroPie dosbox binary could do 20000 cycles or higher.) From looking at the RetroPie dosbox build script (https://github.com/RetroPie/RetroPie-Setup/blob/master/scriptmodules/emulators/dosbox.sh), I noticed the following changes were being made to config.h:
if isPlatform "arm"; then
    # enable dynamic recompilation for armv4
    sed -i 's|/\* #undef C_DYNREC \*/|#define C_DYNREC 1|' config.h
    if isPlatform "armv6"; then
        <snip>
    else
        sed -i 's/C_TARGETCPU.*/C_TARGETCPU ARMV7LE/g' config.h
        sed -i 's|/\* #undef C_UNALIGNED_MEMORY \*/|#define C_UNALIGNED_MEMORY 1|' config.h
    fi
fi
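For anyone who wants to check what those substitutions do before touching a real config.h, here is a rough dry-run sketch. The three sample lines are my assumption of what the relevant part of a generated config.h might look like (in particular the ARMV4LE value is a guess), so compare against your own file. It runs the same sed expressions without -i, so nothing is modified in place.

```shell
# Hypothetical sample of the relevant config.h lines.
# Assumption: these are illustrative, not copied from a real build.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/* #undef C_DYNREC */
#define C_TARGETCPU ARMV4LE
/* #undef C_UNALIGNED_MEMORY */
EOF

# Same three substitutions as in the RetroPie script, but without -i,
# so the result is captured and printed instead of written back to the file.
patched=$(sed \
    -e 's|/\* #undef C_DYNREC \*/|#define C_DYNREC 1|' \
    -e 's/C_TARGETCPU.*/C_TARGETCPU ARMV7LE/g' \
    -e 's|/\* #undef C_UNALIGNED_MEMORY \*/|#define C_UNALIGNED_MEMORY 1|' \
    "$sample")

printf '%s\n' "$patched"
rm -f "$sample"
```

If a pattern doesn't match your config.h (for example because the #undef line is worded differently), sed leaves that line unchanged without any error, so it's worth eyeballing the output.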
After making those three changes to my config.h and rebuilding, my dosbox build performance was roughly doubled when using output=surface (to basically the same performance as the original RetroPie build). However, these config.h changes had no noticeable effect when using output=surfacepp - it was still limited to about 4000 cycles and still had audio stuttering during fade effects.
My next thought was that maybe the RPI3's CPU simply can't handle drawing a 1280x1000 surface in a single thread along with all the other emulation demands of dosbox. The default RetroPie dosbox config just draws the original 320x200 surface and lets the hardware scaler do the rest of the work - so using output=surfacepp on a 1080p display is by comparison drawing 20x (4x5y) the number of pixels. I thought a good way to confirm this was to patch the normal5x scaler into my build and see if it had similar performance issues. (As far as I know, normal5x is done purely in software and would actually draw even more pixels than the previous test: 25x vs 20x.)
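Just to spell out the pixel arithmetic behind the 20x and 25x figures, here is a quick sketch. It only assumes the 320x200 source resolution and the 4x5y / 5x5y scale factors mentioned above; the variable names are made up for illustration.

```shell
# Pixels written per frame for each case, starting from a 320x200 source.
src_w=320
src_h=200
src=$((src_w * src_h))

# surfacepp on a 1080p display: 4x horizontally, 5x vertically (1280x1000)
surfacepp=$(( (src_w * 4) * (src_h * 5) ))

# normal5x: 5x in both directions (1600x1000)
normal5x=$(( (src_w * 5) * (src_h * 5) ))

echo "source:    $src pixels"
echo "surfacepp: $surfacepp pixels ($((surfacepp / src))x)"
echo "normal5x:  $normal5x pixels ($((normal5x / src))x)"
```

So on raw pixel count alone normal5x writes 25% more pixels per frame than the 4x5y case, which is why I expected it to be the slower of the two.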
To my surprise, the normal5x scaler runs wolf3d perfectly fine on an RPI3 without any audio stutter at all. I was even able to bump the cycles up to 15000+ without maxing out a CPU core. I'm not sure why surfacepp's 4x5y scaling would run so much slower than normal5x's 5x5y scaling - is this even a valid comparison? Ant, would you expect your 5x5y algorithm to perform similarly to normal5x?