Hi all, author of the code here. I just wanted to respond to some good points jmarsh has raised.
TLDR: I'm in favor of a "1.5" Approach that combines a single mmap region with write protect toggling. I started this experiment because I wanted DOSBox Staging dynrec running on my M1 MacBook Pro that I've had since December.
I like Approach 1 because of the simplicity, along with the relative portability of a single mmap using MAP_ANON | MAP_PRIVATE (| MAP_JIT for Apple).
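For anyone following along, here is a minimal sketch of what that single mapping could look like. This is not the actual DOSBox Staging code: the helper name alloc_code_cache is made up for illustration, and requesting PROT_EXEC up front only on Apple is my assumption about a reasonable default (elsewhere the region starts read+write and is flipped later).

```c
#define _DEFAULT_SOURCE /* lets glibc expose MAP_ANON under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/* MAP_ANON is the BSD/macOS spelling; some systems only define MAP_ANONYMOUS. */
#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical helper: reserve one region for generated code.
 * Returns NULL on failure. */
void *alloc_code_cache(size_t size)
{
    int flags = MAP_ANON | MAP_PRIVATE;
    int prot = PROT_READ | PROT_WRITE;

#if defined(__APPLE__) && defined(MAP_JIT)
    /* Assumption: on Apple the region is mapped with MAP_JIT and all three
     * permissions, and write access is then toggled per thread. */
    flags |= MAP_JIT;
    prot |= PROT_EXEC;
#endif

    void *mem = mmap(NULL, size, prot, flags, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}
```

The region is released with munmap(mem, size) like any other mapping.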
Regarding the 2016 presentation mentioned, a few things have changed in the Apple ecosystem over the last several years:
- macOS Mojave shipped with the optional Hardened Runtime in 2018.
- macOS Big Sur shipped with per-thread write protect calls (pthread_jit_write_protect_np) in 2020.
- Apple Silicon hardware shipped in 2020.
Apple's recommended solution for security in 2021 seems to include the above-mentioned components: enable the Hardened Runtime, enable the JIT Entitlement, and use per-thread write protect toggling to help reduce the attack surface.
And, I reasoned, if the code is going to toggle anyway, it might as well use a single mapping. That avoids the fiddly issues pointed out for dual mappings, not to mention the potential security issue of keeping an entire mapped region writable for the whole process.
After looking around at some other projects that do codegen, I found that the toggling approach is common:
- OpenJDK approved this approach for the HotSpot VM JIT: https://github.com/openjdk/jdk/pull/2200
- Steel Bank Common Lisp: https://github.com/sbcl/sbcl/search?q=pthread … rite_protect_np
- qemu: https://github.com/qemu/qemu/search?q=pthread … rite_protect_np
So it looks like there's decent precedent for the toggling approach. It also worked in a quick test on Fedora 33 arm64 with SELinux enabled, using mprotect in place of pthread_jit_write_protect_np.
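To make the toggling concrete, here is a rough sketch of a wrapper that picks between the two calls. The function name code_cache_set_writable and the choice of guarding on __aarch64__ are my own assumptions for illustration, not what any of the projects above do verbatim. One difference worth keeping in mind: pthread_jit_write_protect_np only affects the calling thread, while mprotect changes the permissions for the whole process.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#if defined(__APPLE__)
#include <pthread.h>
#endif

/* Hypothetical helper.
 * writable != 0: the region can be written (emit or patch code).
 * writable == 0: the region can be executed.
 * Returns 0 on success, -1 on failure. */
int code_cache_set_writable(void *mem, size_t size, int writable)
{
#if defined(__APPLE__) && defined(__aarch64__)
    /* Applies to every MAP_JIT region, for the calling thread only.
     * Passing 1 turns write protection on, 0 turns it off. */
    (void)mem;
    (void)size;
    pthread_jit_write_protect_np(writable ? 0 : 1);
    return 0;
#else
    /* Process-wide fallback; mem must be page-aligned. */
    int prot = writable ? (PROT_READ | PROT_WRITE)
                        : (PROT_READ | PROT_EXEC);
    return mprotect(mem, size, prot);
#endif
}
```

The intended pattern would be: set writable, emit the code, set executable again, then (on ARM) flush the instruction cache for the modified range before jumping into it.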
In summary, I came to the same conclusion as the author of the OpenJDK PR:
"It's implemented with pthread_jit_write_protect_np provided by Apple... This approach of managing W^X mode turned out to be simple and efficient enough."
Thanks for taking the time to discuss.