VOGONS


First post, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie

As more operating systems adopt stricter security policies, there is a trend to stamp out programs using memory mapped as writable+executable. This is a problem for DOSBox's dynamic recompiler, as it currently just allocates a large chunk of cache and marks it read+write+executable. For now it works but probably not for much longer - SELinux is starting to see widespread use, macOS is pushing their hardened runtime, and other platforms just flat-out don't support it (no mprotect or similar mechanism) requiring other methods. This thread is to discuss at least two possible alternatives and get opinions from other devs about what the best fix would be, rather than a bunch of completely different fixes being implemented in different builds.

(Note that DOSBox has two distinct dynamic recompilers; one is generic, the other is x86-specific, they both allocate their cache the same way so for the sake of this discussion just pretend they're one and the same.)

What the current code does
Windows: Uses VirtualAlloc to allocate the code cache as read+write+executable. That's it.
Other platforms: Uses malloc to allocate the code cache and then (if the platform implements mprotect) uses mprotect to set that memory to read+write+executable. This is actually wrong (undefined behaviour); mprotect is only meant to be used on memory allocated using mmap, so that's at least one thing that needs fixing.

Approach 1: Switch between writable/executable
This method's pretty simple. The initial allocation is done with no permission (PAGE_NOACCESS / PROT_NONE) and when code is going to be placed into the cache it gets "locked" (marked as read+writable) then "unlocked" when writing is finished (marked as read+executable). This is pretty simple to implement in cache_openblock/cache_closeblock, only takes a few lines of code and doesn't require any changes to the architecture-specific dynrec source files. It does require some lines to be shuffled so that cache_block_closing() can be called while the memory is still writable (in case data/instruction cache flushing/invalidation is required) but that's trivial.
Pros: Simple to implement.
Cons: Relies on mprotect/VirtualProtect or other function to change memory permission, which not all platforms have. SELinux might still return EACCES when using mprotect to switch memory to executable. Small processing overhead every time a new CodeBlock is written as pages have to have their access modes switched back and forth.

Approach 2: Map the same memory at different addresses using different permission
This is more complicated. Basically you use mmap to map a file descriptor using read+write permission (which gives an "out" pointer), then mmap the same file descriptor using read+executable permission (which gives an "exec" pointer). Code is stored using the "out" pointer but the "exec" pointer is used for relocation calculations etc. since it is the address that will actually be executed. Most of this functionality would be hidden inside the cache_add* functions, if we wanted to get fancy it would be possible to turn cache.pos into a class with overloaded operators so referencing it would yield the exec pointer while dereferencing it would use the out pointer.
The main issue with this approach is that mmap (or MapViewOfFile on windows) requires an actual file (descriptor) for multiple mappings of the same physical memory. There are ways in linux to avoid using the actual filesystem (memfd_create,shm_open) but these are non-portable and may not always be supported depending on kernel configurations, so can't be relied on.
Pros: Practically no overhead. The recent NX (Switch) port uses this method. Ulrich Drepper suggests this method when working with SELinux (maybe this should be a con...)
Cons: A bit fiddly to setup and may be non-portable. Likely requires a physical file to be present while DOSBox is running, although it could be unlinked as soon as it is mapped (meaning it "should" be cleaned up as soon as DOSBox exits). May require some small changes to the existing risc_*.h source files.

So those are the options I see so far. I would like to hear from other devs (particular those running their own ports on non-mainstream OSes) what they think the best path to take is, and hopefully find a solution before everybody's GoG-bought DOSBox games start to become unrunnable.

Reply 2 of 22, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie

Guess there's not much interest in this so I'll just dump some patches in case they might be useful to someone in the future.
First patch: demonstrates approach 1 (lock/unlock) for windows using VirtualProtect(). Pretty basic stuff here.

Attachments

Reply 3 of 22, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie

Second patch: demonstrates approach 2 (multiple mappings of the same memory with different protection). This uses a specialized class for cache.pos with operator overloading to automatically return either the write pointer or execution pointer, depending on if the return value is unsigned/non-const (write pointer) or signed/const (exec pointer). I've tested this on linux x64 and am fairly sure it should also work with risc_armv8, but the other platforms might need some minor tweaking to ensure they cast to the correct type.

Attachments

  • Filename
    mmap_prot.diff
    File size
    6.3 KiB
    Downloads
    283 downloads
    File license
    Fair use/fair dealing exception

Reply 4 of 22, by Qbix

User metadata
Rank DOSBox Author
Rank
DOSBox Author

Thanks!
I don't know at this point what is the best solution.. Maybe the authors of platform specific dynrec cores have an opinion on it.

Do you have any data on the performance ?

Water flows down the stream
How to ask questions the smart way!

Reply 5 of 22, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie
Qbix wrote:

Do you have any data on the performance ?

The second patch (multiple mappings) seemed to give a slight performance boost over SVN but I can't really explain why. The mapped addresses were in similar ranges to what malloc returns so the same strategy would have been used for encoding 64-bit addresses (in gen_reg_memaddr and such).

The windows lock/unlock stuff I didn't compile in release mode so don't have accurate numbers.

Reply 6 of 22, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie

New patch attached.
This uses the same strategy for both windows and other platforms (separate mappings for writing vs. execution).
It merges cache.h from both dynamic cores since they were already 95% identical.
I had to touch a fair bit of code for platforms that I can't actually test (mips and arm), hopefully didn't break them but can't guarantee anything.

Attachments

  • Filename
    dynrec_wx.diff
    File size
    91.95 KiB
    Downloads
    316 downloads
    File license
    Fair use/fair dealing exception

Reply 7 of 22, by latalante

User metadata
Rank Newbie
Rank
Newbie

Under Linux it is most convenient to test using systemd. It's more widely available than SELinux.
MemoryDenyWriteExecute has been available since version 231 (over three years since release).

systemd-run -G --user -p MemoryDenyWriteExecute=yes /usr/bin/dosbox
systemctl --user status run-r9249eab05c714c1093d86e3e0919fb75.service
systemctl --user show run-r9249eab05c714c1093d86e3e0919fb75.service

Reply 8 of 22, by digger

User metadata
Rank Oldbie
Rank
Oldbie

As coincidence would have it, I just read a blog entry over on Talospace by ClassicHasClass, the same guy who also maintains TenFourFox, a Firefox fork for PowerPC-based Macs.

In the blog entry, he described how he modified DOSBox to support Dynamic Recompilation on the new POWER9 platform. He specifically described how he dealt with SELinux in the process. Quoting a relevant part of the blog:

Being able to emit generated code to memory and run it is actually a rather big security hole unless it is done carefully and co […]
Show full quote

Being able to emit generated code to memory and run it is actually a rather big security hole unless it is done carefully and correctly. Indeed, SELinux will halt you right in your tracks (as it should) unless you do the correct mating dance. The basic steps are:

  1. Allocate writeable memory.
  2. Emit machine code to that memory.
  3. Flush the caches.
  4. Make that tract of memory executable. (This is the dangerous bit. We'll talk about how to mitigate it.)
  5. Run the now executable code.
  6. Profit!

He goes in a lot more technical detail in the blog, which you can read here: https://www.talospace.com/2020/01/dosbox-jit- … ow-you-can.html

I hope this is useful to you as well.

Reply 10 of 22, by digger

User metadata
Rank Oldbie
Rank
Oldbie
jmarsh wrote on 2020-01-20, 00:32:

Uh yeah, we were posting back and forth in the PPC dynrec thread here...

Sorry about that. I guess I should have scrolled back a bit first.

Reply 13 of 22, by Ringding

User metadata
Rank Member
Rank
Member
jmarsh wrote on 2020-12-05, 13:11:

Apparently rather than discussing possible solutions, other people just want to be told what works so they can slap it in their project and take credit for figuring it out.

😁

I'm generally interested in this topic, but really short on time currently. And it seems you are the master of code generation here, so it will need to be your call.

Apparently the dosbox-x guys are also experimenting along similar lines, if I correctly interpret what I saw in their repository.

Reply 14 of 22, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie
Ringding wrote on 2020-12-05, 15:38:

Apparently the dosbox-x guys are also experimenting along similar lines, if I correctly interpret what I saw in their repository.

Yes, obviously they were "inspired" by me mentioning memfd_create and how to use mach_vm_remap.
A shame it was rather rushed and not discussed here like I suggested as there's still some bugs lurking in their implementation (and its security merits are pretty much worthless).

Reply 15 of 22, by krcroft

User metadata
Rank Oldbie
Rank
Oldbie

Carrying on the discussion, GitHub user @kklobe summarized use of MAP_JIT to get the dynamic core running natively on Apple's M1 ARM processor:

https://github.com/dosbox-staging/dosbox-stag … discussions/989

The plan mentioned is to continue following this thread's proposal to discuss and find the most elegant (and portable) path forward; and contribute to this effort if possible.

Reply 16 of 22, by jmarsh

User metadata
Rank Oldbie
Rank
Oldbie

Changing the state of the mapping (not to mention performing cache invalidation) for every single write is a huge performance killer.
MAP_JIT isn't necessary, it's a poor alternative to using mach_vm_remap with dual mappings (which is the recommended solution by Apple's own security team). It only exists for older apps that already have code written to toggle protection levels.

Reply 17 of 22, by krcroft

User metadata
Rank Oldbie
Rank
Oldbie

jmarsh, relaying an update from kklobe based on your feedback.
Exciting stuff.. hopefully a solution is close to converging!

ps- kklobe's wanted to reply here on vogons; just waiting for the account sign-up to clear.

Reply 19 of 22, by krcroft

User metadata
Rank Oldbie
Rank
Oldbie
jmarsh wrote on 2021-04-27, 19:26:

I would suggest copying what DOSBox-X does instead.

I understood you weren't happy with what DOSBox-X did from both of your responses:

Apparently rather than discussing possible solutions, other people just want to be told what works so they can slap it in their project and take credit for figuring it out.

A shame it was rather rushed and not discussed here like I suggested as there's still some bugs lurking in their implementation (and its security merits are pretty much worthless).

In both responses you wished for more discussion. @kklode put forward a solution, you noted a performance issue, which he has acknowledged and addressed.

But now you're not interesting in discussing this further, and instead recommend we do something that you weren't happy with: ie, "the DOSBox-X approach".

I believe everyone's intentions are good here - so I'm probably just missing something (seriously; and apologies for not cluing in).