VOGONS


DOSBox SVN for Apple M1 Core Dynrec - A simple hack

First post, by kjliew

Rank Oldbie

This is a simple patch to enable CPU core dynrec for DOSBox SVN on Apple M1. The solution was extracted from dosbox-staging PR #1031 by @kklobe and simplified to be less intrusive on DOSBox SVN, at the expense of source portability. It also fixes OpenGL output on Apple M1, but that was only tested on the MacBook Air M1's built-in retina display. It may not work on an external display over USB-C or on a MacMini M1, where the scaling option can be different. The obsolete SDL1.2 does not seem to provide an API for detecting display scaling, so the fix uses hardcoded values. The OpenGL output fix is very simple to remove if it is not required: just ignore the 4-line change in sdlmain.cpp.
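
For readers wondering what enabling dynrec involves on Apple Silicon: macOS on M1 does not allow memory to be writable and executable at the same time, so a recompiler has to flip its code cache between the two states. The sketch below shows the usual pattern with `pthread_jit_write_protect_np` and `sys_icache_invalidate`. It is an illustration under assumptions, not the contents of the attached patch or of PR #1031: `JitWriteScope` and `emit_code` are hypothetical names, and the cache is assumed to have been mapped with `MAP_JIT` elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h>
#include <libkern/OSCacheControl.h>
#define SKETCH_APPLE_JIT 1
#else
#define SKETCH_APPLE_JIT 0
#endif

// Hypothetical RAII guard: while it is alive, MAP_JIT pages are writable
// (and not executable) for the current thread. No-op on other platforms.
class JitWriteScope {
public:
    JitWriteScope() { set_protect(false); }
    ~JitWriteScope() { set_protect(true); }
    JitWriteScope(const JitWriteScope&) = delete;
    JitWriteScope& operator=(const JitWriteScope&) = delete;

private:
    static void set_protect(bool on) {
#if SKETCH_APPLE_JIT
        pthread_jit_write_protect_np(on ? 1 : 0);
#else
        (void)on;
#endif
    }
};

// Hypothetical helper: copy freshly generated code into the cache, then
// flush the instruction cache for exactly the bytes that were written.
inline void emit_code(std::uint8_t* cache, const std::uint8_t* code, std::size_t len) {
    assert(cache != nullptr && code != nullptr);
    {
        JitWriteScope writable;  // write access only inside this block
        std::memcpy(cache, code, len);
    }
#if SKETCH_APPLE_JIT
    sys_icache_invalidate(cache, len);
#elif defined(__GNUC__)
    char* begin = reinterpret_cast<char*>(cache);
    __builtin___clear_cache(begin, begin + len);
#endif
}
```

On other platforms the guard compiles to a no-op, so the same call site can stay in shared code.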

Consider this a temporary hack for DOSBox SVN. It was deliberately kept simple so that it will survive SVN changes until an official upstream solution is presented. The devs really want to take their time.

Performance was great: almost 2X that of CPU normal core and almost equivalent to QEMU TCG performance in MDKDOS. However, I think it is close to, but not as fast as, Dominus's published PCPBENCH scores with Rosetta 2 translated dynamic_x64. I think it is good enough for all CPU-demanding DOS4GW games, including built-in Munt MT-32 MIDI music. Nowadays, I don't really need DOSBox to run Win9x games anymore. I have yet to check out 3Dfx DOS games. There is still a problem to solve before Glide pass-through can share the same OpenGlide as QEMU, but Voodoo chip emulation should work pretty well with the added performance of CPU core dynrec.

Debugging on macOS M1 is hard. I hate LLDB. 😜

[Attachment: Screen Shot 2021-07-13 at 11.55.10 PM.png, 1.28 MiB. File comment: CPU Core Dynrec]

EDIT: Update patch for dpiscale fix v2 to use OpenGL for obtaining DPI scale.

Last edited by kjliew on 2021-07-21, 09:45. Edited 4 times in total.

Reply 1 of 67, by valuedcustomer

Rank Newbie

Glad you found the PR useful, @kjliew.

My original hack was unnecessarily slow due to brute-force invalidation of the entire cache region every time, and I made it a bit smarter with https://github.com/dosbox-staging/dosbox-staging/pull/1072, if you're interested.
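
The difference between invalidating the whole cache region and only what changed can be sketched roughly as follows. This is a generic illustration, not the code from PR #1072: `DirtyRange` is a hypothetical helper that remembers the lowest and highest byte written since the last flush, and `__builtin___clear_cache` (GCC/Clang) stands in for whatever flush call the platform provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: tracks one byte range covering every write to the
// code cache since the last flush, so only that range gets invalidated.
class DirtyRange {
public:
    // Record that [offset, offset + len) was written.
    void mark(std::size_t offset, std::size_t len) {
        if (len == 0) return;
        lo_ = std::min(lo_, offset);
        hi_ = std::max(hi_, offset + len);
    }

    bool empty() const { return lo_ >= hi_; }
    std::size_t offset() const { return empty() ? 0 : lo_; }
    std::size_t size() const { return empty() ? 0 : hi_ - lo_; }

    // Invalidate only the dirty bytes and reset the tracker.
    // Returns the number of bytes that were flushed.
    std::size_t flush(std::uint8_t* cache_base) {
        assert(cache_base != nullptr);
        const std::size_t n = size();
        if (n != 0) {
            char* begin = reinterpret_cast<char*>(cache_base) + lo_;
#if defined(__GNUC__)
            __builtin___clear_cache(begin, begin + n);
#else
            (void)begin;  // no portable flush available in this sketch
#endif
        }
        lo_ = none();
        hi_ = 0;
        return n;
    }

private:
    static std::size_t none() { return std::numeric_limits<std::size_t>::max(); }
    std::size_t lo_ = none();
    std::size_t hi_ = 0;
};
```

A single merged range can still cover untouched bytes between two distant writes, but it is far less work than flushing the entire region on every block.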

There was also an experimental tweak I made that dramatically sped up several titles such as Frontier: First Encounters in commit https://github.com/dosbox-staging/dosbox-stag … ef22e1aaa815c26.

If you do come across M1-related issues with specific titles, feel free to drop a ticket in our repo.

Reply 2 of 67, by kjliew

Rank Oldbie

Thanks @valuedcustomer.

As a temporary hack, I think the PR is good enough. It's simpler and does not meddle with the inner implementation of core_dynrec.

Well, I have come to understand why the DOSBox devs are taking their time. 😉 Here's the comparison data between x86_64 and AArch64 with MDKDOS.

@max 105%   | normal | dynrec | dyn-x64 | build
Ryzen 2500U |     25 |     42 |     475 | pristine r4459
Apple M1    |     43 |    110 |     n/a | pristine r4459, Rosetta2
Apple M1    |    105 |    218 |     n/a | r4459 with hack
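
To put the two Apple M1 rows in relative terms, the speedups can be worked out directly from the numbers above. This is a small sketch: the struct and function names are made up for illustration, and only the scores come from the table.

```cpp
#include <cassert>
#include <cstdio>

struct Score {
    const char* config;
    double normal;
    double dynrec;
};

// MDKDOS scores for the two Apple M1 rows, copied from the table.
const Score kRosetta = {"pristine r4459, Rosetta2", 43.0, 110.0};
const Score kNative = {"r4459 with hack", 105.0, 218.0};

// How many times faster score `a` is than score `b`.
inline double speedup(double a, double b) {
    assert(b > 0.0);
    return a / b;
}

inline void print_speedups() {
    std::printf("native dynrec vs native normal:         %.2fx\n",
                speedup(kNative.dynrec, kNative.normal));
    std::printf("native normal vs Rosetta2 normal:       %.2fx\n",
                speedup(kNative.normal, kRosetta.normal));
    std::printf("native dynrec vs Rosetta2 dynrec column: %.2fx\n",
                speedup(kNative.dynrec, kRosetta.dynrec));
}
```

Calling `print_speedups()` reports about 2.08x, 2.44x and 1.98x respectively.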

If there were no dynamic_x86, or if @jmarsh hadn't come up with dynamic_x64, then Apple M1 would have blown any x86 out of the water with just normal core. Frankly, 90% of DOS games just need normal core on x86. Dynrec core was nice on x86 and 64-bit ready, but M1 is truly a beast. And those scores didn't take real screen pixels into consideration. The Ryzen has the advantage of just a 1080p screen with 1:1 display scale. The M1 would have pushed a lot more pixels with the retina screen.

Oh please, Intel/AMD don't cry... 🤣 And when can we have the next AArch64 SoC not from Apple for Windows/Linux on ARM laptops?

Or, shall we say "32-bit emulators are dead"?! 😉

Last edited by kjliew on 2021-07-21, 02:24. Edited 3 times in total.

Reply 3 of 67, by kjliew

Rank Oldbie

The hack didn't touch anything in core normal, I believe. Normal should be just the same, and dynrec does not work on pristine r4459 without the hack. So the scores are 105 and 0 if you want to complete the chart.

I would be more interested in adding more recent x86 implementations, such as Ryzen 5000 and Intel 11th Gen, to the chart, but I have none of those.

Reply 4 of 67, by kjliew

Rank Oldbie

I am still puzzled why dynrec targeting x86_64 is so slow compared to dynrec targeting AArch64. I wouldn't have thought dynrec was capable of matching QEMU TCG performance on AArch64, and apparently I was wrong. I am curious whether the same holds between DOSBox dynrec and QEMU TCG on other non-Apple AArch64 implementations such as the RPi4, RockPi64 and ODROID.

Kudos to @jmarsh! Just as x86 virtualization is the saving grace of x86 CPUs for QEMU, his masterpiece dynamic_x64 is the saving grace of x86 CPUs for DOSBox. 😉

If anyone has other ARM SBCs at hand and can post similar data for academic assessment, please be my guest. Thanks!

Reply 5 of 67, by kjliew

Rank Oldbie

I have to say that CPU core dynrec on Apple M1 is simply amazing. Though it is not faster than dynamic_x64 with Rosetta 2 translation, being native AArch64 is always more future-proof. There was a similar concern about dynamic_x86 before the CPU core was overhauled to be 64-bit ready. Kekko's Voodoo chip emulation was so fast with CPU core dynrec that it made me slack off on bringing up OpenGlide for Glide pass-through. I don't remember CPU core dynrec ever being that good on any x86 CPU in the past, even with a Core i7-6600U IIRC, which forced me to keep MSYS2/mingw-w64-i686 or bloat my Linux installation with 32-bit libs just to have DOSBox. CPU core dynrec couldn't play 3Dfx DOS games such as Tomb Raider 1 and Screamer 2 at 30 FPS. The same perception had made me skeptical about having DOSBox on ARM SBCs to play games.

The Apple M1 experience has shaken off all such beliefs. Tomb Raider 1 stays locked at 30 FPS, Screamer 2 races smoothly the whole time, and Blood 3Dfx frags with absolutely zero lag. There is no stuttering or audio crackling in any of those games. And Kekko's Voodoo emulation has the advantage of running statically linked 3Dfx DOS games. Battle Arena Toshinden just played flawlessly. OpenGlide also suffers from several issues with Apple OpenGL, so games such as Screamer 2 and Blood 3Dfx do not work with QEMU Glide pass-through on Apple M1 (I have no idea how to fix or debug them, because Windows and Linux work just fine).

The only problem is that Voodoo chip emulation cannot be played windowed: it is too small on the retina screen, and Voodoo chip emulation can't scale to match DOSBox. Everything becomes a tiny little square. Fortunately, SDL1.2 fullscreen mode switching works, and one can just play all 3Dfx DOS games in fullscreen. macOS Big Sur even has a way to save and restore the desktop layout across fullscreen mode switches. In fullscreen, the games are stretched to fill the retina screen, with black bars on both sides to keep the aspect ratio, at no cost. I think most would love to play 3Dfx DOS games this way. 😉

So I guess I will also take my time for DOSBox Glide pass-through with OpenGlide. 😉

Last edited by kjliew on 2021-07-21, 02:26. Edited 2 times in total.

Reply 6 of 67, by Dominus

Rank DOSBox Moderator

I need to test this! (@kjliew I was sound asleep when you pinged me on IRC 😀)
The most important test is whether I can codesign and notarize it with entitlements so that it actually works on other machines. It wouldn't work with another path we took a while back with the DTK.

As for working around SDL 1.2x limitations, the SDL12compat project has seen tons of improvements that work nicely on Intel macOS.

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox
DOSBox SVN snapshot for macOS (10.4-11.x ppc/intel 32/64bit) notarized for gatekeeper

Reply 8 of 67, by kjliew

Rank Oldbie

Shadow Warrior USA with BUILD_FPS=1 for performance stats measurement
Native AArch64 DOSBox SVN CPU core dynrec with Kekko's Voodoo chip emulation.

QEMU won't be able to play this game with Glide pass-through, so DOSBox SVN just fills its shoes. 😉
It was captured windowed at 800x600. Fullscreen mode at 640x480 will have better FPS. 800x600 is still too small to play on the retina screen, and the 800x600 fullscreen mode switch seems to be broken in macOS Big Sur, or Apple just didn't care to emulate it properly. I think an external standard display should work windowed, but I don't have mine hooked up to any.

Reply 9 of 67, by Dominus

Rank DOSBox Moderator

Apple M1 rosetta2: normal 43, dynrec 110
pristine r4459

Now I need to build the hack myself to see how that compares on my machine 😀

Reply 10 of 67, by kjliew

Rank Oldbie
Dominus wrote on 2021-07-16, 15:56:

Apple M1 rosetta2: normal 43, dynrec dynamic_x64 110
pristine r4459

Interesting, I believe the Rosetta 2 build should be dynamic_x64, compiled as x86-64 and translated. So native AArch64 dynrec is actually faster, which is yet another surprise.

Dominus wrote:
Benchmark PCPbench:
iMac Pro 3 GHz, Core 10 Intel Xeon: 167 fps
MacMini 2.5 GHz, Dual-Core Intel Core i5: 105 fps
MacMini M1 (Rosetta2 emulation): 127 fps

How was that possible in your other post? Did you have display scaling on the MacMini M1? Or perhaps it was the Apple M1 8-core GPU working wonders, plus the MacMini's better thermal budget compared to the MacBook Air.

Can you also redo the PCPBENCH benchmark between native AArch64 and Rosetta2 on MacMini M1?

Last edited by kjliew on 2021-07-21, 02:28. Edited 1 time in total.

Reply 11 of 67, by Dominus

Rank DOSBox Moderator

Interesting...
native M1 with this patch:
mdkdos: dynrec 232 (but on the very first run after starting DOSBox it goes up to 340-354), normal 107
pcpbench: dynrec 93, normal 33

rosetta2 64bit build:
PCPbench dynrec64 127.5, normal 23.7

But all of this is with output surface; your OpenGL fix doesn't seem to fix it, it only makes it "different" from what the problem was before.

Reply 12 of 67, by kjliew

Rank Oldbie
Dominus wrote on 2021-07-16, 19:52:

But all of this is with output surface; your OpenGL fix doesn't seem to fix it, it only makes it "different" from what the problem was before.

So how did it look when you removed the OpenGL fix?

I think that is "the fix", so I also have this to scale up all VGA games nicely.

windowresolution=2048x1280
scaler=none

I have `output=opengl`. Otherwise, `output=surface` causes crackling in Munt MT32 audio in games such as GODS and LSL3, and I don't know why. For both games, cycles was set to the default 3000 with normal core, so the audio crackling wasn't due to CPU hogging AFAIU. `output=opengl` or `openglnb` cured all the audio issues even before I had the fix. I don't have audio issues on the Ryzen laptop with any output type for those 2 games, be it on Windows or Linux. So I would say the OpenGL fix is more critical for DOS games; the dynrec is just for show. 😉

This is the same for the Windows build, too, with high display scaling. Otherwise, it will just be the same tiny rect when windowed for those with a Surface 4k panel crammed into 12.2 inches. I used to be able to get DOSBox SVN WIN64 to respect OS display scaling, so that 640x480 would look like 640x480 after display scaling, but I can't reproduce the build anymore.

Reply 13 of 67, by Dominus

Rank DOSBox Moderator

OK, tested it again without the fix and OpenGL works fine out of the box for me... huh...
So on my MacMini M1 OpenGL works right away. The only thing (that so far has never been an issue) is that the MacMini is not hooked up to any display and is controlled via remote desktop.
I did have OpenGL problems on the DTK machine. That's why I thought this would still be the case, didn't even test it again, and just assumed I'd still have a problem since you wrote you had one (on the DTK the DOSBox window was the right size but the actually displayed part would be much smaller in the lower left part of the window).
This doesn't make it any easier 🙁
The culprit could be the SDL 1.2 you are using... My SDL 1.2 was built from their development version of March 13 2021 against SDK 11.1 with deployment target 11.1.
And it's NOT using X11 which might be a possible clue since you are relying on X11 for your other stuff.

On Intel macOS all other output than opengl was very noticeably slower, btw. Curiously that is not the case via the Rosetta2 emulation, surface is reproducibly only slower by 3 or 4 fps.

Reply 14 of 67, by kjliew

Rank Oldbie

That's what I have been yelling about: *Display Scaling*. If you use remote desktop or a Hackintosh on a VM, then there is literally no display scaling in action.
If it drives a local retina display, such as on the MacBooks, or an 8k/4k panel at 27 inch on the MacMini, then I think you will see the problem. This is not just macOS; it is everywhere, though Linux typically makes poor use of display scaling.

Dominus wrote on 2021-07-16, 21:43:

(on the DTK the DOSBox window was the right size but the actually displayed part would be much smaller in the lower left part of the window).

This was exactly my OpenGL problem before the fix. Are you sure the window size was right? It only *looked* right because the window was created taking display scaling into consideration, while OpenGL rendered at real screen pixels. That's the reason the display stayed in the lower-left part of the window.
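
The mismatch described here comes down to simple arithmetic. As a rough sketch (hypothetical helper names, and an assumed integer scale factor), a viewport set in window points covers only a fraction of a framebuffer that is sized in real pixels, and since the OpenGL viewport origin is the lower-left corner, the image ends up there:

```cpp
#include <cassert>

struct Size {
    int w;
    int h;
};

// Window size in points -> framebuffer size in real pixels.
inline Size to_pixels(Size points, int scale) {
    assert(scale >= 1);
    return Size{points.w * scale, points.h * scale};
}

// Fraction of the framebuffer area that gets drawn when the viewport is
// (wrongly) set to the window size in points instead of pixels.
inline double covered_fraction(Size points, int scale) {
    const Size px = to_pixels(points, scale);
    return (static_cast<double>(points.w) * points.h) /
           (static_cast<double>(px.w) * px.h);
}

// Recover the scale factor by comparing the requested width with the
// drawable width actually reported (for example from a GL viewport query;
// how the attached patch obtains it is not shown here).
inline int scale_from_drawable(int requested_w, int drawable_w) {
    assert(requested_w > 0 && drawable_w >= requested_w);
    return drawable_w / requested_w;
}
```

For a 640x480 window at scale 2, the framebuffer is 1280x960 and an unscaled viewport covers one quarter of it, which matches the small image in the lower-left corner.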

Dominus wrote on 2021-07-16, 21:43:

This doesn't make it any easier 🙁
The culprit could be the SDL 1.2 you are using... My SDL 1.2 was built from their development version of March 13 2021 against SDK 11.1 with deployment target 11.1.
And it's NOT using X11 which might be a possible clue since you are relying on X11 for your other stuff.

Unlike for QEMU, my SDL1.2 is the pristine version out of Homebrew. I don't recompile it for customization. It does not rely on X11 at all. If I want an easy way out for Glide pass-through to work for DOSBox with OpenGlide, then I may recompile it for additional X11/XQuartz support. I think it may actually help the situation with display scaling because of X11/XQuartz. 😉

Dominus wrote on 2021-07-16, 21:43:

On Intel macOS all other output than opengl was very noticeably slower, btw. Curiously that is not the case via the Rosetta2 emulation, surface is reproducibly only slower by 3 or 4 fps.

Yeah, you're right. All other output than opengl was noticeably slower, even on my Ryzen laptop, but "no one needs more than 30 fps in PCPBench (tm)"... 😉 Munt MT32 BGM matters for DOS games that support it.

Last edited by kjliew on 2021-07-16, 23:45. Edited 1 time in total.

Reply 15 of 67, by Dominus

Rank DOSBox Moderator

I'll try connecting my Thunderbolt display next. Might be in a day, though.
And the DTK was also only on remote control, so that part did not change.

In the meantime maybe give a self-compiled SDL1.2 from current git a chance. I have no idea which commit the Homebrew version is at.

Reply 16 of 67, by Bruninho

Rank Oldbie

Has no one thought about installing glew and glfw (and maybe freeglut) from Homebrew, or compiling them yourself, to bump OpenGL on macOS from 2.1 to 4.1?

Even though you'd still have to (re)link and (re)compile any program that uses OpenGL to benefit from that?

"Design isn't just what it looks like and feels like. Design is how it works."
JOBS, Steve.
READ: Right to Repair sucks and is illegal!

Reply 17 of 67, by kjliew

Rank Oldbie
Dominus wrote on 2021-07-16, 22:36:

And the DTK was also only on remote control, so that part did not change.

The DTK was right in every sense by forcing HiDPI handling even through remote control. It was made to behave just like MacBooks with a retina display, to force early-access development to focus on the HiDPI aspects of software development that Apple has always been touting. So you already knew about the problem then, great! 😁

The Homebrew version is the last official release, 1.2.15, with macOS patches cherry-picked. I believe most systems do the same (MSYS2/mingw-w64, Arch Linux, Debian/Ubuntu etc.), each with its own platform-specific patches cherry-picked. I think going for current git is a bad idea.

Reply 18 of 67, by Dominus

Rank DOSBox Moderator

I have hooked up my Thunderbolt display and tested all available display options and it just works without your fix and goes wrong with it.
("Default for Display" and "Scaled" with all available modes: 2560x1440, 2048x1152, 1920x1080, 1600x900, 1280x720).

I just checked Brew and it is missing this vital SDL12 commit: https://github.com/libsdl-org/SDL-1.2/commit/ … cdac2636c82ff5f
Totally forgot that this was fixed due to my reporting 😀
So try getting the SDL --head via brew or however that works and your issue might be fixed without messing with DOSBox sources.

Reply 19 of 67, by Bruninho

Rank Oldbie

You can always make a custom Homebrew formula by modifying the original one and building it with "brew install --build-from-source formulaname.rb" (the name must match) to fix it...

I'm sure you already knew that, but I have no clue how to add that commit to the formula used for the build.

"Design isn't just what it looks like and feels like. Design is how it works."
JOBS, Steve.
READ: Right to Repair sucks and is illegal!