Reply 20 of 22, by jmarsh

I wasn't happy with the way it was done, because we were meant to be discussing the "best" (subjective) method when my suggestions suddenly got dumped into their source tree without even an acknowledgement. If the conversation had continued it could have been a more secure implementation - if you look at the "bulletproof jit" blackhat presentation and some of the changes made to SVN, you can see it's heading in this direction. DOSBox-X instead has a bunch of different implementations for different platforms, none of which are very secure.
If you're not happy with waiting for SVN's solution, the dual-mapping used in DOSBox-X is better than using MAP_JIT and juggling the w^x protection.

Reply 21 of 22, by krcroft

Indeed - the goal was to share @kklobe's work in hopes of progressing an elegant and performant solution; to help the general effort (without any project goals or affiliations in mind).

jmarsh wrote: "the dual-mapping used in DOSBox-X is better than using MAP_JIT and juggling the w^x protection."

Thanks for weighing in on this. Perhaps @kklobe will take this into account. One aspect that's really nice about his approach is that it's very thin and minimally invasive, while enabling the dynamic core on the latest macOS and Linux systems.

jmarsh wrote: "if you look at the "bulletproof jit" blackhat presentation and some of the changes made to SVN, you can see it's heading in this direction."

Thanks, and this is great news.

On the security-discussion side, I'd hazard we'd agree that DOSBox has fewer exploit opportunities and lower risk than an internet-connected hypervisor or browser JIT compiler (WebAssembly, etc.). Given that DOSBox doesn't have any active exploits against it (in its anything-goes W+X form), adding W^X, even in its weaker form, will move DOSBox further off the exploit radar. Of course, lack of evidence isn't proof that it's not possible.

Edit: Your original post does a great job of comparing implementation approaches. Could you detail the specific goals or criteria (security- and performance-wise) you'd like to hit, along with any anti-patterns or show-stoppers (like the MAP_JIT use mentioned above)?

Last edited by krcroft on 2021-04-29, 16:21. Edited 1 time in total.

Reply 22 of 22, by valuedcustomer

Hi all, author of the code here, just wanted to respond to some good points jmarsh has raised.

TLDR: I'm in favor of a "1.5" Approach that combines a single mmap region with write protect toggling. I started this experiment because I wanted DOSBox Staging dynrec running on my M1 MacBook Pro that I've had since December.

I like Approach 1 because of the simplicity, along with the relative portability of a single mmap using MAP_ANON | MAP_PRIVATE (| MAP_JIT for Apple).
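
For illustration, a minimal sketch of that single mapping might look like the following. The helper name alloc_code_cache is hypothetical and not taken from the DOSBox Staging source. The sketch assumes that on Apple platforms the region is mapped with MAP_JIT and execute permission up front (which, under the Hardened Runtime, also needs the JIT entitlement), and that elsewhere it starts out read/write only and is made executable later.

```c
/* Needed for MAP_ANON on glibc when compiling in a strict -std=c11 mode. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not from the DOSBox sources): reserve one anonymous,
 * private region to hold generated code. Returns NULL on failure. */
static void *alloc_code_cache(size_t size)
{
    int flags = MAP_ANON | MAP_PRIVATE;
    int prot  = PROT_READ | PROT_WRITE;

#if defined(__APPLE__) && defined(MAP_JIT)
    /* Assumption: on macOS the region is created as a JIT region with
     * execute permission, and write access is toggled per thread later. */
    flags |= MAP_JIT;
    prot  |= PROT_EXEC;
#endif

    void *p = mmap(NULL, size, prot, flags, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

The region is released with munmap(ptr, size) when the code cache is torn down.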

With regards to the 2016 presentation mentioned, a few things are new in the Apple ecosystem in the last several years:

- macOS Mojave shipped with the optional Hardened Runtime in 2018.
- macOS Big Sur shipped with per-thread write protect calls (pthread_jit_write_protect_np) in 2020.
- Apple Silicon hardware shipped in 2020.

Apple's recommended solution for security in 2021 seems to include the components mentioned above: enable the Hardened Runtime, enable the JIT Entitlement, and use per-thread write protect toggling to help reduce the attack surface.

And, I reasoned, if it's going to toggle, it might as well just have a single mapping, and not have to deal with the fiddly issues pointed out for dual mappings, not to mention the potential security issues with having an entire mapped region writeable for the whole process.

After looking around at some other projects that do codegen, I found that the toggling approach is common:

- OpenJDK approved this approach for the HotSpot VM JIT: https://github.com/openjdk/jdk/pull/2200
- Google's V8 JavaScript engine (used in Chrome and Node.js): https://github.com/v8/v8/blob/dc712da548c7fb4 … -space-access.h
- Steel Bank Common Lisp: https://github.com/sbcl/sbcl/search?q=pthread … rite_protect_np
- QEMU: https://github.com/qemu/qemu/search?q=pthread … rite_protect_np

So it looks like there's decent precedent for the toggling approach. It also worked in a quick test on Fedora 33 arm64 with SELinux enabled, using mprotect in place of pthread_jit_write_protect_np.

In summary, I came to the same conclusion as the author of the OpenJDK PR:

"It's implemented with pthread_jit_write_protect_np provided by Apple... This approach of managing W^X mode turned out to be simple and efficient enough."

Thanks for taking the time to discuss.