VOGONS


PowerPC Dynamic Recompiler (patch)


First post, by jmarsh

Rank Oldbie

So I started this over 5 years ago, left it unfinished for a few years and now started from scratch and got it into a fit state for submitting...

This patch adds a dynamic recompiler for 32-bit PowerPC, based on the existing dynrec framework. I've only tested it on a Wii, but there should be no reason for it not to work on PowerPC-based Macs. As far as performance goes, with core=normal I get 0.7 fps from PCPBENCH and with core=dynamic I get 3.1 fps. There are some other big-endian improvements that get it up to 4.0 fps, but I haven't included them here as they aren't related to dynrec.

I haven't touched any of the autoconfigure scripts, so config.h needs the following settings:
#define C_TARGETCPU POWERPC
#define C_DYNREC 1
#define WORDS_BIGENDIAN
The compiler needs to support gcc inline assembly (checked via defined(__GNUC__)) for dcache flushing/icache invalidation. There doesn't seem to be a portable way to achieve this, but the instructions involved are not supervisor-level, so any userspace program should be able to use them.
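
For illustration, the flush/invalidate sequence could look roughly like the sketch below. This is not taken from the patch: the function name flush_code_range, the 32-byte cache line size and the __builtin___clear_cache fallback for non-PowerPC GCC hosts are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Make freshly written code in [start, start + len) visible to instruction fetch.
// Hypothetical helper; the 32-byte line size is an assumption and differs per CPU.
inline void flush_code_range(void* start, std::size_t len) {
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__) || defined(__PPC__))
    const std::uintptr_t kCacheLine = 32;
    const std::uintptr_t begin =
        reinterpret_cast<std::uintptr_t>(start) & ~(kCacheLine - 1);
    const std::uintptr_t end = reinterpret_cast<std::uintptr_t>(start) + len;
    // Write modified data cache lines back to memory.
    for (std::uintptr_t p = begin; p < end; p += kCacheLine)
        __asm__ __volatile__("dcbst 0,%0" : : "r"(p) : "memory");
    __asm__ __volatile__("sync" : : : "memory");
    // Discard stale instruction cache lines covering the same range.
    for (std::uintptr_t p = begin; p < end; p += kCacheLine)
        __asm__ __volatile__("icbi 0,%0" : : "r"(p) : "memory");
    __asm__ __volatile__("sync; isync" : : : "memory");
#elif defined(__GNUC__)
    // Non-PowerPC GCC hosts: let the compiler builtin do whatever is needed.
    char* b = static_cast<char*>(start);
    __builtin___clear_cache(b, b + len);
#else
    (void)start;
    (void)len;
#endif
}
```

A dynrec would call something like this once after writing a block of generated code and before jumping into it.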

Some comments on the changes:
- I had to name the FPU_Rec struct so it could be forward-declared in risc_ppc.h (having a dedicated register pointed to it helps FPU heavy code).
- Removed some WORDS_BIGENDIAN guards in the self-modifying code detection. They weren't needed, as the additions aren't meant to overflow between bytes.
- Made dyn_run_code() get called before dyn_return(BR_Link1/BR_Link2) and shuffled their locations a bit. The reason for this is that the PPC dynrec generates its epilog once in gen_run_code() and then puts a jump to it whenever gen_return_function() is called, rather than emitting a full epilog every time. If dyn_return() was called before dyn_run_code() the address of the epilog is unknown.
- Added the missing cache_block_before_close()/cache_block_closing() calls for those blocks.
- The dynrec decoder wasn't differentiating between little-endian (host) memory access and regular memory access. I added new functions where necessary (hopefully caught them all) and aliased them to the regular functions when WORDS_BIGENDIAN is not defined.
- dyn_ret_near() was buggy: it tried to write a dword to &reg_ip, which overran on big-endian.
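
The ordering constraint between dyn_run_code() and dyn_return() described in the list above can be modelled with a toy emitter. Everything in this sketch (ToyEmitter, the opcode values, the exception) is made up for illustration and says nothing about how the patch actually encodes instructions.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy code emitter: the epilogue is emitted once, every return is a branch to it.
class ToyEmitter {
public:
    enum : std::uint32_t { OP_EPILOG = 1, OP_BRANCH = 2 };

    // Emit the shared epilogue and remember where it starts.
    void emit_run_code() {
        epilog_ = static_cast<long>(code_.size());
        code_.push_back(OP_EPILOG);
    }

    // Emit a return as a single branch to the shared epilogue.
    // This cannot work before the epilogue address is known.
    void emit_return() {
        if (epilog_ < 0)
            throw std::logic_error("epilogue address not known yet");
        code_.push_back(OP_BRANCH);
        code_.push_back(static_cast<std::uint32_t>(epilog_));
    }

    const std::vector<std::uint32_t>& code() const { return code_; }

private:
    std::vector<std::uint32_t> code_;
    long epilog_ = -1;  // -1 means "not emitted yet"
};
```

Calling emit_return() first throws, which mirrors why the patch has to run dyn_run_code() before any dyn_return() call.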

Attachments

  • Filename: ppc_dynrec.diff
    File size: 43.35 KiB
    Downloads: 84
    File license: Fair use/fair dealing exception
Last edited by jmarsh on 2019-10-06, 18:41. Edited 2 times in total.

Reply 1 of 117, by digger

Rank Member

Promising work! 😀

How well do you think this will run on Talos and/or Blackbird hardware by Raptor Computing Systems? https://www.raptorcs.com/content/base/products.html

Is there anybody here on Vogons who owns such a system? If not, perhaps you could get in touch with the guy who manages the Talospace blog and ask if he'd be willing to try this out on his hardware: https://www.talospace.com/

He's quite the low-level software wizard himself, by the way. He's also the developer behind TenFourFox, the Firefox fork for PowerPC Macs that he still maintains and updates regularly.

Reply 3 of 117, by digger

Rank Member
jmarsh wrote:

Afaict that's a different ISA. PowerPC (what this dynrec is) branched off from the POWER line, so while they have a similar base they're not compatible.

Hmmm, according to this blog post on Talospace, it should be possible for these systems to virtualize (and partially emulate) any PowerPC CPU. To quote the blog post directly:

However, KVM-PR can also emulate other instructions and their desired behaviour, which theoretically allows it to act like any supported Power ISA or PowerPC CPU, including a G3, G4 or G5. Instructions which aren't supported natively are trapped and executed just like supervisor-level instructions, and everything else can still run on the metal.

So if I understand this correctly, it does have to trap and emulate certain instructions, but most of them can be executed natively. That should still allow these systems from Raptor CS to run DOSBox with your PowerPC Dynamic Recompiler patch with reasonable to good performance, wouldn't it? Perhaps you could even modify your patch to avoid any instructions that POWER8 and POWER9 would have to emulate, or at least reduce the number of those in your code as much as possible.

Of course, to be able to develop, test and debug for POWER8 and/or POWER9 architectures, you would have to be in possession of such a system, and this hardware is not exactly cheap. But if you don't have it, perhaps someone else here on the forum who has such a machine could try this out for you, using KVM-PR if necessary? 😀

Reply 4 of 117, by Qbix

Rank DOSBox Author

Thank you for the patch. As I said on IRC, I am not too sure about your fix for dyn_ret_near.

Your fixes/changes didn't fix the problem that I have been chasing for a while now.
I'll make a topic about it, as at the moment I am not really sure how to find what is going wrong.

The reordering that you did doesn't seem to break the x64 dynrec.

Water flows down the stream
How to ask questions the smart way!

Reply 5 of 117, by jmarsh

Rank Oldbie

There's actually a bug in gen_and_imm, but luckily none of the current use cases for that function trigger it. I can fix it up on the weekend and tweak the dyn_ret_near fix to be more correct rather than just doing the same as little-endian systems.

Reply 6 of 117, by Qbix

Rank DOSBox Author

Do you think the bigop?reg_eip:reip,bigop version was better, or what we came up with on IRC (which is also used in other places)?


Reply 7 of 117, by jmarsh

Rank Oldbie

Get rid of the zero extension when decode.big_op is false and use "decode.big_op?(void*)(&reg_eip):(void*)(&reg_ip),decode.big_op)" when storing the value from the host reg.
If the value is zero extended it's fine to write 32 bits to &reg_eip (that's what the normal and full cores always do), but why make 16-bit code emit an extra instruction when we can just write 16 bits directly instead?
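
To make the endianness issue concrete, here is a small host-side model of the two store variants. It is only a sketch: FakeRegs, ip_offset and store_ip are hypothetical names, and the layout (a 32-bit EIP followed directly by another field) is an assumption rather than DOSBox's real register file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the register file: EIP followed by another field.
struct FakeRegs {
    std::uint32_t eip;
    std::uint32_t neighbour;
};

// Byte offset of the low 16 bits ("ip") inside a 32-bit register on this host.
inline std::size_t ip_offset() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first ? 0 : 2;  // little-endian: 0, big-endian: 2
}

// Model of the store emitted for a near return: a 32-bit store to eip when
// big_op is set, otherwise a 16-bit store to ip only.
inline void store_ip(FakeRegs& r, std::uint32_t value, bool big_op) {
    unsigned char* base = reinterpret_cast<unsigned char*>(&r);
    if (big_op) {
        std::memcpy(base, &value, 4);
    } else {
        const std::uint16_t v = static_cast<std::uint16_t>(value);
        std::memcpy(base + ip_offset(), &v, 2);
    }
}
```

A 4-byte store at base + ip_offset() would spill two bytes into neighbour on a big-endian host, which is the overrun being discussed; choosing the address and width together avoids it.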

The "if (bytes) gen_add_direct_word(&reg_esp,bytes,true);" statement is a bit of a worry because it assumes the stack address size is always 32 bits but I guess it is not likely for sp to overflow. Could maybe be fixed with a branch based on cpu.stack.big, but if the new sp value needs to be fixed there's a good chance an SS exception should be taken too.
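
As a rough sketch of what a cpu.stack.big-aware adjustment might compute (not code from the patch or from DOSBox): adjust_sp is a hypothetical helper, and it assumes that with a 16-bit stack only the low half of ESP wraps while the upper half is left alone. It ignores the possible SS exception mentioned above.

```cpp
#include <cstdint>

// Add 'bytes' to the stack pointer, wrapping within 16 bits when the
// stack segment is not "big". Hypothetical helper for illustration only.
inline std::uint32_t adjust_sp(std::uint32_t esp, std::uint32_t bytes, bool stack_big) {
    if (stack_big)
        return esp + bytes;
    return (esp & 0xFFFF0000u) | ((esp + bytes) & 0x0000FFFFu);
}
```

With a 16-bit stack, adjust_sp(0x0000FFFE, 4, false) wraps to 0x00000002, whereas the unconditional 32-bit add yields 0x00010002.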

Reply 8 of 117, by jmarsh

Rank Oldbie

Related: Here is a patch that fixes drive_fat.cpp to work on big-endian systems (including fixing a bug that allocates one too many clusters when a file's length is a multiple of the cluster size) and makes use of gcc's bswap builtins (which for PowerPC translate to lwbrx/stwbrx) for host memory access.
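
The two ideas in that description can be sketched as follows. This is not the content of drive_fat_BE.diff: clusters_needed and load_le32 are hypothetical helpers, and the byte-order check relies on GCC/Clang predefined macros (a compiler without them is treated as little-endian here).

```cpp
#include <cstdint>
#include <cstring>

// Clusters needed to hold file_len bytes. A length that is an exact
// multiple of cluster_size must not be given an extra cluster.
inline std::uint32_t clusters_needed(std::uint32_t file_len, std::uint32_t cluster_size) {
    return file_len / cluster_size + (file_len % cluster_size != 0 ? 1u : 0u);
}

// Read a little-endian 32-bit on-disk field regardless of host byte order.
inline std::uint32_t load_le32(const void* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);  // swap only on big-endian hosts
#endif
    return v;
}
```

The off-by-one described above is what happens with the naive file_len / cluster_size + 1, which returns 2 for a 4096-byte file on 4096-byte clusters.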

Attachments

  • Filename: drive_fat_BE.diff
    File size: 8.14 KiB
    Downloads: 86
    File license: Fair use/fair dealing exception

Reply 9 of 117, by fr500

Rank Newbie

I don't have a PPC Mac so I wasn't able to test this on such a system.
I was able to test on WiiU via RetroArch (I have a DOSBox fork that sticks as closely as possible to upstream here https://github.com/fr500/dosbox-svn/tree/ppc)

Sadly it crashes and it's well beyond what I could possibly solve.
[Attached image: PeC95zL.jpg]

If there is anything I can do to contribute towards a fix, I'd be willing to!
Of course I don't expect anyone to put serious time into this considering the niche status of the WiiU.

Reply 10 of 117, by jmarsh

Rank Oldbie

I would need to double check the reported SR1/DSI values, but at first glance it looks like the memory that is malloc'd to hold the dynamic code isn't marked as executable. Does RetroArch have any other cores that use a dynarec on WiiU that might know how to change memory protection?

Reply 11 of 117, by fr500

Rank Newbie

Sadly no, WiiU isn't all that popular actually.

One of the toolchain developers said this:

Doesn't look like an easy one - likely some bad pointer math, or they're relying on some mprotect-ish function that's not really a thing on WiiU. dynarecs have gotta have at least a little bit of WiiU-specific code to make 'em work, be it through the usual OSCodegen methods or something a lil' more kernel-ly - you've gotta take the generated code and mark it as executable before you can run it.

Taking a guess based on the stuff on git w/o looking at a binary, my money's on this pointer being uninitialized, which isn't great because we try and jump to it here. Should've been initialized here, can't trace it much further than that

The code locations he's referring to are:
https://github.com/fr500/dosbox-svn/blob/ppc/ … dynrec.cpp#L142
https://github.com/fr500/dosbox-svn/blob/ppc/ … dynrec.cpp#L254
https://github.com/libretro/dosbox-libretro/b … ec/cache.h#L636

Reply 12 of 117, by jmarsh

Rank Oldbie

It's not uninitialised, it's just not executable. You need to use the WiiU equivalent of mmap/mprotect to make it so. I'm not sure if that's what OSCodegen does or if it's more complicated, e.g. having to switch memory back and forth between writable and executable.
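
For reference, this is roughly what the mmap/mprotect approach looks like on a POSIX host. It is only a sketch of the general technique: the helper names are made up, it does not apply to the WiiU as-is, and nothing here describes how OSCodegen works.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstring>

// Allocate page-aligned read/write memory for generated code.
// Returns nullptr on failure. Hypothetical helper.
inline void* alloc_code_buffer(std::size_t size) {
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// After the code has been written, switch the buffer to read+execute.
inline bool make_executable(void* p, std::size_t size) {
    return mprotect(p, size, PROT_READ | PROT_EXEC) == 0;
}

// Release the buffer again.
inline void free_code_buffer(void* p, std::size_t size) {
    munmap(p, size);
}
```

A dynrec that keeps patching its cache would either map it writable and executable at once (where the OS allows it) or flip the protection back and forth around each write.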

Reply 13 of 117, by Dominus

Rank DOSBox Moderator

Initial test, and I can't even compile PPC with a dynrec core. It seems I need some magic so that the OS X PPC build is seen as PPC.
Just adding a #define PowerPC and #define C_DYNREC in config.h was not enough, as it probably pulled in code for OS X.

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox

Reply 14 of 117, by jmarsh

Rank Oldbie

These are the lines to put in config.h (if they're not there already or set to different values):

#define C_TARGETCPU POWERPC
#define C_DYNREC 1
#define WORDS_BIGENDIAN

Reply 15 of 117, by Dominus

Rank DOSBox Moderator

That helped somewhat, but I ran into:

In file included from core_dynrec.cpp:155:
core_dynrec/risc_ppc.h:489:41: error: invalid suffix "b10100" on integer constant
core_dynrec/risc_ppc.h:536:31: error: invalid suffix "b10100" on integer constant
core_dynrec/risc_ppc.h:548:26: error: invalid suffix "b01100" on integer constant
core_dynrec/risc_ppc.h:561:26: error: invalid suffix "b00100" on integer constant
core_dynrec/risc_ppc.h:588:26: error: invalid suffix "b00100" on integer constant
core_dynrec/risc_ppc.h:597:26: error: invalid suffix "b00100" on integer constant
core_dynrec/risc_ppc.h:646:31: error: invalid suffix "b10100" on integer constant
core_dynrec/risc_ppc.h:654:30: error: invalid suffix "b10100" on integer constant
core_dynrec/risc_ppc.h:660:31: error: invalid suffix "b10100" on integer constant
core_dynrec/risc_ppc.h:676:31: error: invalid suffix "b10100" on integer constant
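
These "invalid suffix" errors are what older GCC releases report for 0b binary literals, which are a GCC extension (available from GCC 4.3) and only became standard in C++14. One possible workaround, assuming the constants in risc_ppc.h are plain 0b literals, is to spell them in hex. The names below are placeholders, not identifiers from the patch.

```cpp
// Hex spellings of the three binary constants named in the errors above.
// An enum is used so that pre-C++11 compilers accept it too.
enum {
    kBits10100 = 0x14,  // 0b10100
    kBits01100 = 0x0C,  // 0b01100
    kBits00100 = 0x04   // 0b00100
};
```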


Reply 16 of 117, by krcroft

Rank Oldbie

Dominus,

What kind of PPC machine and OS are you using?

I'd like to chip in some help here too; I have a Power Mac G4 'Sawtooth' collecting dust that I previously ran Gentoo on back in the day.

I'd start fresh with whatever OS jmarsh, Qbix, or yourself feel is the best target (I realize on Linux it doesn't matter much, just the kernel, user-space libraries, and build suite, but I might as well start off on the right foot).

I also figured I could be orthogonal to whatever you're using, to maximize our test coverage.

Reply 17 of 117, by Dominus

Rank DOSBox Moderator

I'm cross-compiling for OS X PPC on OS X 10.14 (yes, that works if you have all the tools from way back 😀)


Reply 19 of 117, by krcroft

Rank Oldbie
Dominus wrote:

I'm cross compiling for OS X PPC on a OS X 10.14 (yes that works if you have all the tools from way back 😀)

Ahh, right on Dominus! With you covering OSX, I'll go with Debian to give coverage for those trying to use a modern Linux.

Edit: I might be sticking with Gentoo; Debian dropped PPC support in Stretch, while Jessie only supports up to kernel 2.6.x. CentOS dropped PPC support in 8 and newer, while 7 only supports kernel 3.10. ArchLinux gave up on PPC around 2013. OpenSuse dropped PPC in 12, while 11.3 only supports kernel 3.x. YellowDogLinux's latest release was from 2012 (hello seven years' worth of security vulnerabilities), so that's not an option.

Gentoo's PPC port is active and as of today supports kernel 4.19, and it looks like kernel coverage will keep advancing too. I'm not a fan of how slow emerge and portage were back in the day, but hopefully they've improved things.

Last edited by krcroft on 2019-09-25, 15:42. Edited 4 times in total.