VOGONS


PowerPC Dynamic Recompiler (patch)


Reply 41 of 137, by jmarsh

Rank Oldbie
ClassicHasClass wrote:

sys_icache_invalidate() is better on OS X anyway.

I'm not sure. The docs don't mention it performing a data store/flush before invalidating, and it seems wasteful to make a syscall to perform non-supervisor operations.
If this were portable code it would be a no-brainer, but given it's explicitly for PowerPC we know exactly which instructions need to be executed.

Reply 42 of 137, by ClassicHasClass

Rank Newbie

It's more of a correctness question. sys_icache_invalidate() is a commpage routine in at least 10.4 and up; see https://opensource.apple.com/source/xnu/xnu-7 … ies.h.auto.html and https://opensource.apple.com/source/xnu/xnu-7 … ush.s.auto.html . You can see from the code that there are a couple of different things going on: first, it does flush the data cache (it uses dcbf rather than dcbst as yours does, but the difference is immaterial for this purpose), and second, and somewhat more germane, it knows the cache line size, which is especially relevant on the G5 (128 bytes, not 32).

Yes, it is technically a syscall, but commpage calls are pretty quick, and we don't have to reinvent the wheel. I don't recall that this routine specifically is patched, but the kernel does patch commpage routines as well for even better performance tuned to the specific machine in use, so I tend to prefer them for sysdep stuff like this. I suppose you could inline all of this and even eliminate that overhead, but if we're at the point where calls to invalidate the icache are a significant portion of runtime, I think that in and of itself is a much bigger problem than the syscall. That's why I used it here instead of trying to make old and busted gcc happy, and why I use it in TenFourFox instead of a different approach.

For Linux, of course, the instruction sequence makes more sense.
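
For reference, here is a minimal sketch of the kind of instruction sequence being discussed, with the cache line size passed in as a parameter since that is the part that differs between machines. The names cache_lines_covered and flush_code_range are hypothetical and not taken from the patch; the non-PowerPC branch falls back to the GCC/Clang builtin only so the sketch compiles elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Count the cache lines overlapped by [addr, addr + len); `line` must be a power of two.
static inline std::size_t cache_lines_covered(std::uintptr_t addr, std::size_t len,
                                              std::size_t line) {
    if (len == 0) return 0;
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(line - 1);
    const std::uintptr_t first = addr & mask;
    const std::uintptr_t last = (addr + len - 1) & mask;
    return static_cast<std::size_t>((last - first) / line) + 1;
}

// Make freshly written code visible to instruction fetch (hypothetical helper).
static inline void flush_code_range(void *start, std::size_t len, std::size_t line) {
    if (len == 0) return;
#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__)
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(line - 1);
    const std::uintptr_t first = reinterpret_cast<std::uintptr_t>(start) & mask;
    const std::uintptr_t end = reinterpret_cast<std::uintptr_t>(start) + len;
    for (std::uintptr_t p = first; p < end; p += line)
        __asm__ volatile("dcbst 0,%0" : : "r"(p) : "memory");  // write the data cache line back
    __asm__ volatile("sync" : : : "memory");                   // wait for the stores to complete
    for (std::uintptr_t p = first; p < end; p += line)
        __asm__ volatile("icbi 0,%0" : : "r"(p) : "memory");   // drop the stale icache line
    __asm__ volatile("sync\n\tisync" : : : "memory");          // discard prefetched instructions
#elif defined(__GNUC__)
    (void)line;
    char *begin = static_cast<char *>(start);
    __builtin___clear_cache(begin, begin + len);               // compiler-provided fallback
#else
    (void)start; (void)line;                                   // no portable fallback in this sketch
#endif
}
```

The same 64-byte range covers two lines at a 32-byte line size but only one at 128 bytes, so the stride has to match the machine the code actually runs on.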

Just my $0.02.

Reply 43 of 137, by ClassicHasClass

Rank Newbie

Okay, I've got the ppc64le version building, and it generates and executes code, but I'm hitting an issue I'm not sure how to resolve. I think your patches address this somewhat, so maybe you have an idea. I'm testing with Extreme Pinball since it's DPMI and I have it handy.

Right now I have traps in the epilogue, in gen_jmp_ptr (for reasons to be explained) and in the code gen_jmp_ptr emits.

The first emulated instruction ends up being a JMP. The prologue and epilogue of the generated assembly code work fine, deduct the right number of cycles, etc., and return to the main code. So far so good.

The second instruction calls gen_jmp_ptr as part of the setup. This seems to originate from CreateCacheBlock. However, the pointer passed seems to be uninitialized memory. Indeed, when the generated code is executed, it ends up setting PC to 0 and triggering a fault.

Does this sound familiar? I've attached my current diff and backend. This is Fedora 31 on a Raptor Talos II (POWER9) in little-endian mode.

Starting program: /home/censored/src/dosbox-code-0/src/dosbox 
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.30-8.fc31.ppc64le
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
DOSBox version SVN
Copyright 2002-2019 DOSBox Team, published under GNU GPL.
---
[New Thread 0x7fffe562f150 (LWP 457628)]
[Thread 0x7fffe562f150 (LWP 457628) exited]
CONFIG: Loading primary settings from config file /home/censored/.dosbox/dosbox-SVN.conf
[New Thread 0x7fffe562f150 (LWP 457630)]
MIXER: Got different values from SDL: freq 44100, blocksize 512
ALSA:Can't subscribe to MIDI port (65:0) nor (17:0)
No working midi device found/selected! Please check your settings and/or compilation environment.
MIDI: Opened device:none
DOSBox has switched to max cycles, because of the setting: cycles=auto.
If the game runs too fast, try a fixed cycles amount in DOSBox's options.
^C
Thread 1 "dosbox" received signal SIGINT, Interrupt. <<<< THIS IS THE TRAP INSTRUCTION IN THE EPILOGUE
0x00007fffe65c0068 in ?? ()
(gdb) disas $pc, $pc+0x40
Dump of assembler code from 0x7fffe65c0068 to 0x7fffe65c00a8:
=> 0x00007fffe65c0068: trap
0x00007fffe65c006c: ld r0,272(r1)
0x00007fffe65c0070: mtlr r0
0x00007fffe65c0074: ld r31,248(r1)
0x00007fffe65c0078: ld r30,240(r1)
0x00007fffe65c007c: ld r29,232(r1)
0x00007fffe65c0080: ld r28,224(r1)
0x00007fffe65c0084: ld r27,216(r1)
0x00007fffe65c0088: ld r26,208(r1)
0x00007fffe65c008c: addi r1,r1,256
0x00007fffe65c0090: blr
0x00007fffe65c0094: lis r12,0
0x00007fffe65c0098: ori r12,r12,0
0x00007fffe65c009c: rldicr r12,r12,32,31
0x00007fffe65c00a0: oris r12,r12,4097
0x00007fffe65c00a4: ori r12,r12,49664
End of assembler dump.
(gdb) set $pc+=4
(gdb) cont
Continuing.
gen_jmp_ptr
^C
Thread 1 "dosbox" received signal SIGINT, Interrupt.
gen_jmp_ptr (ptr=0x7fffe6de00d8, imm=imm@entry=16)
at core_dynrec/risc_ppc64le.h:582
582 fprintf(stderr, "gen_jmp_ptr\n"); __asm__("trap\n");
(gdb) bt
#0 gen_jmp_ptr (ptr=0x7fffe6de00d8, imm=imm@entry=16)
at core_dynrec/risc_ppc64le.h:582
#1 0x00000000100c0dc0 in dyn_exit_link (eip_change=<optimized out>)
at core_dynrec/decoder_opcodes.h:1100
#2 0x00000000100d40d4 in CreateCacheBlock (codepage=<optimized out>,
start=start@entry=124868, max_opcodes=31, max_opcodes@entry=32)
at core_dynrec/decoder.h:518
#3 0x00000000100d4b58 in CPU_Core_Dynrec_Run () at core_dynrec.cpp:213
#4 0x000000001000b538 in Normal_Loop () at dosbox.cpp:137
#5 0x000000001000b624 in DOSBOX_RunMachine () at dosbox.cpp:320
#6 0x0000000010011468 in CALLBACK_RunRealInt (intnum=<optimized out>)
    at callback.cpp:105
#7 0x000000001026c0dc in DOS_Shell::Execute (this=<optimized out>,
name=<optimized out>, args=<optimized out>) at shell_misc.cpp:549
#8 0x0000000010266250 in DOS_Shell::DoCommand (this=0x131b5ee0,
line=0x7fffffffd4b7 "") at shell_cmds.cpp:158
#9 0x000000001025da28 in DOS_Shell::ParseLine (this=this@entry=0x131b5ee0,
line=line@entry=0x7fffffffd4b0 "extreme") at shell.cpp:280
#10 0x000000001025ef00 in DOS_Shell::Run (this=0x131b5ee0) at shell.cpp:360
#11 0x000000001025e860 in SHELL_Init () at shell.cpp:766
#12 0x00000000102527e8 in Config::StartUp (this=<optimized out>)
at setup.cpp:941
#13 0x000000001016c9f0 in main (argc=<optimized out>, argv=<optimized out>)
at sdlmain.cpp:2190
(gdb) x/16wx ptr
0x7fffe6de00d8: 0x00000001 0x00000000 0x00000000 0x00000000
0x7fffe6de00e8: 0x00000000 0x00000000 0x00000001 0x00000000
0x7fffe6de00f8: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffe6de0108: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) set $pc+=4
(gdb) cont
Continuing.
^C
Thread 1 "dosbox" received signal SIGINT, Interrupt. <<<< THIS IS THE TRAP INSTRUCTION IN GEN_JMP_PTR GENERATED CODE
0x00007fffe65d00d0 in ?? ()
(gdb) disas $pc, $pc+0x40
Dump of assembler code from 0x7fffe65d00d0 to 0x7fffe65d0110:
=> 0x00007fffe65d00d0: trap
0x00007fffe65d00d4: ld r12,16(r12)
0x00007fffe65d00d8: mtctr r12
0x00007fffe65d00dc: bctr
0x00007fffe65d00e0: li r3,1
0x00007fffe65d00e4: b 0x7fffe65c0068
0x00007fffe65d00e8: .long 0x0
0x00007fffe65d00ec: .long 0x0
0x00007fffe65d00f0: .long 0x0
0x00007fffe65d00f4: .long 0x0
0x00007fffe65d00f8: .long 0x0
0x00007fffe65d00fc: .long 0x0
0x00007fffe65d0100: .long 0x0
0x00007fffe65d0104: .long 0x0
0x00007fffe65d0108: .long 0x0
0x00007fffe65d010c: .long 0x0
End of assembler dump.
(gdb) x/16wx $r12
0x7fffe6de00d8: 0x1273f158 0x00000000 0x00000000 0x00000000
0x7fffe6de00e8: 0x00000000 0x00000000 0x1273f1d8 0x00000000
0x7fffe6de00f8: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffe6de0108: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) set $pc+=4
(gdb) cont
Continuing.

Thread 1 "dosbox" received signal SIGSEGV, Segmentation fault.


Reply 44 of 137, by jmarsh

Rank Oldbie

There's something wrong with the block linking.
If you look near the beginning of cache_init(), each cache_block has its links initialized to (CacheBlockDynRec *)1. That seems to be what the initial dump of 0x7fffe6de00d8 shows, which is OK; the proper pointers to the link_block functions get placed in cache_closeblock(), which obviously gets run after gen_jmp_ptr(). The second dump of 0x7fffe6de00d8 shows this has happened and the 1s have been overwritten with (what I presume are) &link_blocks[0] (0x1273f158) and &link_blocks[1] (0x1273f1d8).

Where it looks like you've gone wrong is this line in gen_jmp_ptr:

gen_mov_qword_to_reg_imm(HOST_R12,(Bit64u)ptr);         // r12 = *(Bit64u*)ptr

The comment matches what should be done (load a 64-bit value from ptr) but gen_mov_qword_to_reg_imm is just storing the pointer to r12. Add "ld r12, 0(r12)" immediately after it (before the possible offset adjustment!) and it should be right.

(The reason for the indirection is that block links are dynamic - they get updated during execution as blocks are executed/created/recycled/invalidated/etc. If their addresses were hardcoded in the instruction stream the entire block would have to be recompiled.)
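
To make the missing load concrete, here is a small sketch of how "ld r12, 0(r12)" and the following "ld r12,16(r12)" from the dump could be encoded. encode_ld is a hypothetical helper written for illustration; it is not one of the backend's own macros.

```cpp
#include <cstdint>

// DS-form "ld RT, disp(RA)": primary opcode 58, with the low two bits 00 selecting plain ld.
// The displacement must be a multiple of 4 and is sign-extended from 16 bits by the CPU.
static inline std::uint32_t encode_ld(unsigned rt, unsigned ra, std::int16_t disp) {
    return (58u << 26) | ((rt & 31u) << 21) | ((ra & 31u) << 16) |
           (static_cast<std::uint16_t>(disp) & 0xFFFCu);
}
```

With r12 holding ptr, encode_ld(12, 12, 0) fetches the current link pointer stored at ptr, and encode_ld(12, 12, 16) then reads the jump target at offset 16 from it, which is the indirection described above.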

Reply 45 of 137, by ClassicHasClass

Rank Newbie

D'oh, I should have noticed that. That does indeed get me a few instructions further, but then it bombs with this. (I added annotations to every function so I could see if it was a JIT problem but I don't see any new codegen functions being called that weren't before.)

I'm not exactly sure why this is now null. I guess something could have corrupted it. Any quick guesses before I just step through it bit by bit?

ppc64le: gen_mov_direct_ptr
ppc64le: gen_mov_qword_to_reg_imm
ppc64le: gen_mov_word_to_reg
ppc64le: gen_create_branch_long_leqzero
ppc64le: gen_call_function_raw
ppc64le: gen_mov_regword_from_reg
ppc64le: gen_mov_regval16_from_reg
ppc64le: gen_mov_word_from_reg
ppc64le: gen_mov_word_to_reg
ppc64le: gen_add_imm
ppc64le: gen_call_function_raw
ppc64le: gen_mov_word_to_reg
ppc64le: gen_add_imm
ppc64le: gen_mov_word_from_reg
ppc64le: gen_sub_direct_word
ppc64le: gen_add_direct_word
ppc64le: gen_jmp_ptr
ppc64le: gen_mov_qword_to_reg_imm
ppc64le: gen_fill_branch_long
ppc64le: gen_fill_branch
ppc64le: gen_mov_dword_to_reg_imm
ppc64le: gen_return_function
ppc64le: gen_function
done


Thread 1 "dosbox" received signal SIGSEGV, Segmentation fault.
0x00000000100da65c in CacheBlockDynRec::LinkTo (toblock=0x7fffe6de0a90,
index=<optimized out>, this=0x0) at core_dynrec/cache.h:30
30 link[index].to=toblock;
Missing separate debuginfos, use: dnf debuginfo-install SDL-1.2.15-42.fc31.ppc64le SDL_net-1.2.8-15.fc31.ppc64le alsa-lib-1.2.1.2-4.fc31.ppc64le dbus-libs-1.12.16-3.fc31.ppc64le flac-libs-1.3.3-1.fc31.ppc64le gsm-1.0.18-5.fc31.ppc64le libICE-1.0.10-2.fc31.ppc64le libSM-1.2.3-4.fc31.ppc64le libX11-1.6.9-2.fc31.ppc64le libX11-xcb-1.6.9-2.fc31.ppc64le libXau-1.0.9-2.fc31.ppc64le libXcursor-1.1.15-6.fc31.ppc64le libXext-1.3.4-2.fc31.ppc64le libXfixes-5.0.3-10.fc31.ppc64le libXi-1.7.10-2.fc31.ppc64le libXrandr-1.5.2-2.fc31.ppc64le libXrender-0.9.10-10.fc31.ppc64le libXtst-1.2.3-10.fc31.ppc64le libasyncns-0.8-17.fc31.ppc64le libcap-2.26-6.fc31.ppc64le libgcc-9.2.1-1.fc31.ppc64le libgcrypt-1.8.5-1.fc31.ppc64le libglvnd-1.1.1-5.fc31.ppc64le libglvnd-glx-1.1.1-5.fc31.ppc64le libgpg-error-1.36-2.fc31.ppc64le libogg-1.3.3-3.fc31.ppc64le libpng-1.6.37-2.fc31.ppc64le libsndfile-1.0.28-11.fc31.ppc64le libstdc++-9.2.1-1.fc31.ppc64le libuuid-2.34-4.fc31.ppc64le libvorbis-1.3.6-5.fc31.ppc64le libxcb-1.13.1-3.fc31.ppc64le lz4-libs-1.9.1-1.fc31.ppc64le pulseaudio-libs-13.0-1.fc31.ppc64le systemd-libs-243.5-1.fc31.ppc64le xz-libs-5.2.4-6.fc31.ppc64le zlib-1.2.11-20.fc31.ppc64le
(gdb) bt
#0 0x00000000100da65c in CacheBlockDynRec::LinkTo (toblock=0x7fffe6de0a90,
index=<optimized out>, this=0x0) at core_dynrec/cache.h:30
#1 LinkBlocks (ret=ret@entry=BR_Link1) at core_dynrec.cpp:169
#2 0x00000000100da934 in CPU_Core_Dynrec_Run () at core_dynrec.cpp:298
#3 0x000000001000b538 in Normal_Loop () at dosbox.cpp:137
#4 0x000000001000b624 in DOSBOX_RunMachine () at dosbox.cpp:320
#5 0x0000000010011468 in CALLBACK_RunRealInt (intnum=<optimized out>)
at callback.cpp:105
#6 0x0000000010271d70 in DOS_Shell::Execute (this=<optimized out>,
name=<optimized out>, args=<optimized out>) at shell_misc.cpp:549
#7 0x000000001026bee4 in DOS_Shell::DoCommand (this=0x131b5ee0,
line=0x7fffffffd4b7 "") at shell_cmds.cpp:158
#8 0x00000000102636bc in DOS_Shell::ParseLine (this=this@entry=0x131b5ee0,
line=line@entry=0x7fffffffd4b0 "extreme") at shell.cpp:280
#9 0x0000000010264b94 in DOS_Shell::Run (this=0x131b5ee0) at shell.cpp:360
#10 0x00000000102644f4 in SHELL_Init () at shell.cpp:766
#11 0x000000001025847c in Config::StartUp (this=<optimized out>)
at setup.cpp:941
#12 0x0000000010172684 in main (argc=<optimized out>, argv=<optimized out>)
at sdlmain.cpp:2190

Reply 46 of 137, by jmarsh

Rank Oldbie

Double-check the code emitted by gen_mov_direct_ptr(), it's responsible for updating cache.block.running at the start of each block.

There's a couple of suspicious looking lines in gen_addr().

off = addr - (Bit64s)block_ptr;
if ((Bit64s)off == off)

I think this should be a cast to a Bit16s. Probably what's causing the issue at hand.

IMM_OP(15, dest, 0,    (addr & 0xffff000000000000)>>48); // lis dest, upper
IMM_OP(24, dest, dest, (addr & 0x0000ffff00000000)>>32); // ori dest, dest, ...
RLD_OP(30, dest, dest, 32, 31, 1, 0); // rldicr dest, dest, 32, 31
IMM_OP(25, dest, dest, (addr & 0x00000000ffff0000)>>16); // oris dest, dest, ...
addr = (Bit16s)addr;

When addr is used in a load/store instruction it will be sign-extended, you need to account for this when assembling the base. Or just do a final OR here and set addr to 0.
(Unless ppc64 doesn't sign-extend immediate offsets in which case disregard.)
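
Both points can be sketched in a few lines. This is an illustration under my own naming (SplitAddr, split_for_dform and fits_simm16 are hypothetical), not the backend's gen_addr(): the base is adjusted so that adding the sign-extended 16-bit displacement lands back on the original address.

```cpp
#include <cstdint>

struct SplitAddr {
    std::uint64_t base;  // value to materialize in the register
    std::int16_t disp;   // displacement the load/store will sign-extend
};

// Split addr so that base + sign_extend(disp) == addr.
static inline SplitAddr split_for_dform(std::uint64_t addr) {
    // Reinterpret the low 16 bits as signed, as the hardware will (two's complement assumed).
    const std::int16_t disp = static_cast<std::int16_t>(addr & 0xFFFFu);
    // Subtracting the signed value bumps the base up by 0x10000 when bit 15 is set.
    const std::uint64_t base = addr - static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
    return SplitAddr{base, disp};
}

// True when a block-relative offset fits the signed 16-bit immediate field.
static inline bool fits_simm16(std::int64_t off) {
    return static_cast<std::int16_t>(off) == off;
}
```

For example, 0x1031A6DC has bit 15 of its low half set, so the base becomes 0x10320000 with a displacement of -22820 instead of 0x10310000 with 0xA6DC.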

Reply 48 of 137, by ClassicHasClass

Rank Newbie

Well, that was short-lived. It does start Extreme Pinball and get to the main menu, and the menu works, but trying to actually play crashes. I'm having trouble determining what I should be seeing (especially since the only other 64-bit backend I have to work from is the x86_64 one, and that's not very similar). I think I'm having a 64-bit issue sorting out effective addresses.

Thread 1 "dosbox" received signal SIGSEGV, Segmentation fault.
mem_writeb_checked_drc (address=2411531, val=255 '\377')
at core_dynrec/decoder_basic.h:712
712 host_writeb(tlb_addr+address,val);
(gdb) bt
#0 mem_writeb_checked_drc (address=2411531, val=255 '\377')
at core_dynrec/decoder_basic.h:712
#1 0x00007fffe6a66104 in ?? ()
#2 0x00000000100d60ac in CPU_Core_Dynrec_Run () at core_dynrec.cpp:232
#3 0x000000001000b538 in Normal_Loop () at dosbox.cpp:137
#4 0x000000001000b624 in DOSBOX_RunMachine () at dosbox.cpp:320
#5 0x0000000010011468 in CALLBACK_RunRealInt (intnum=<optimized out>)
at callback.cpp:105
#6 0x000000001026d60c in DOS_Shell::Execute (this=<optimized out>,
name=<optimized out>, args=<optimized out>) at shell_misc.cpp:549
#7 0x0000000010267780 in DOS_Shell::DoCommand (this=0x131b5ee0,
line=0x7fffffffd4b7 "") at shell_cmds.cpp:158
#8 0x000000001025ef58 in DOS_Shell::ParseLine (this=this@entry=0x131b5ee0,
line=line@entry=0x7fffffffd4b0 "extreme") at shell.cpp:280
#9 0x0000000010260430 in DOS_Shell::Run (this=0x131b5ee0) at shell.cpp:360
#10 0x000000001025fd90 in SHELL_Init () at shell.cpp:766
#11 0x0000000010253d18 in Config::StartUp (this=<optimized out>)
at setup.cpp:941
#12 0x000000001016df20 in main (argc=<optimized out>, argv=<optimized out>)
at sdlmain.cpp:2190
(gdb) disas 0x00007fffe6a66104-0x40, 0x00007fffe6a66104
Dump of assembler code from 0x7fffe6a660c4 to 0x7fffe6a66104:
0x00007fffe6a660c4: stw r3,-288(r30)
0x00007fffe6a660c8: lwz r29,-276(r30)
0x00007fffe6a660cc: lwz r8,28(r30)
0x00007fffe6a660d0: add r29,r29,r8
0x00007fffe6a660d4: lwz r8,-260(r30)
0x00007fffe6a660d8: add r29,r29,r8
0x00007fffe6a660dc: li r4,255
0x00007fffe6a660e0: mr r3,r29
0x00007fffe6a660e4: lis r12,0
0x00007fffe6a660e8: ori r12,r12,0
0x00007fffe6a660ec: rldicr r12,r12,32,31
0x00007fffe6a660f0: oris r12,r12,4106
0x00007fffe6a660f4: ori r12,r12,58344
0x00007fffe6a660f8: mtctr r12
0x00007fffe6a660fc: nop
0x00007fffe6a66100: bctrl <<< call to mem_writeb_checked_drc
End of assembler dump.
(gdb) x/16wx $r30-276
0x1031a6dc <cpu_regs+12>: 0xffffebfe 0x001eaebc 0x00000000 0x00000000
0x1031a6ec <cpu_regs+28>: 0x0024e00d 0x001c0bf0 0x00000000 0x00000246
0x1031a6fc <cpu_regs+44>: 0x00000000 0x00000000 0x00000000 0x00000003
0x1031a70c <cpu+12>: 0x00000000 0x00000011 0x00000000 0x00000001
(gdb) x/16wx $r30+28
0x1031a80c <Segs+28>: 0x00000000 0x00000000 0x0001c110 0x00000000
0x1031a81c <Segs+44>: 0x00000000 0x100d5f50 0x00000000 0x000055d1
0x1031a82c: 0x00000000 0x003acde4 0x00000000 0x00000000
0x1031a83c <cpu_tss+4>: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) x/16wx $r30-260
0x1031a6ec <cpu_regs+28>: 0x0024e00d 0x001c0bf0 0x00000000 0x00000246
0x1031a6fc <cpu_regs+44>: 0x00000000 0x00000000 0x00000000 0x00000003
0x1031a70c <cpu+12>: 0x00000000 0x00000011 0x00000000 0x00000001
0x1031a71c <cpu+28>: 0x00000000 0x00170010 0x00000000 0x00003fff
(gdb) disas $pc
Dump of assembler code for function mem_writeb_checked_drc(unsigned int, unsigned char):
0x00000000100ae3e8 <+0>: lis r2,4143
0x00000000100ae3ec <+4>: addi r2,r2,32000
0x00000000100ae3f0 <+8>: mr r8,r3
0x00000000100ae3f4 <+12>: rldicl r10,r3,52,12
0x00000000100ae3f8 <+16>: addis r9,r10,16
0x00000000100ae3fc <+20>: addi r9,r9,4
0x00000000100ae400 <+24>: rldicr r9,r9,3,60
0x00000000100ae404 <+28>: addis r7,r2,2
0x00000000100ae408 <+32>: addi r7,r7,11776
0x00000000100ae40c <+36>: ldx r9,r7,r9
0x00000000100ae410 <+40>: cmpdi r9,0
0x00000000100ae414 <+44>: beq 0x100ae424 <mem_writeb_checked_drc(unsigned int, unsigned char)+60>
=> 0x00000000100ae418 <+48>: stbx r4,r9,r3
0x00000000100ae41c <+52>: li r3,0
0x00000000100ae420 <+56>: blr
0x00000000100ae424 <+60>: mflr r0
0x00000000100ae428 <+64>: std r0,16(r1)
0x00000000100ae42c <+68>: stdu r1,-32(r1)
0x00000000100ae430 <+72>: std r2,24(r1)
0x00000000100ae434 <+76>: addis r10,r10,48
0x00000000100ae438 <+80>: addi r10,r10,4
0x00000000100ae43c <+84>: rldicr r10,r10,3,60
0x00000000100ae440 <+88>: addis r9,r2,2
0x00000000100ae444 <+92>: addi r9,r9,11776
0x00000000100ae448 <+96>: ldx r3,r9,r10
0x00000000100ae44c <+100>: ld r9,0(r3)
0x00000000100ae450 <+104>: ld r12,104(r9)
0x00000000100ae454 <+108>: mr r5,r4
0x00000000100ae458 <+112>: mr r4,r8
0x00000000100ae45c <+116>: mtctr r12
0x00000000100ae460 <+120>: bctrl
0x00000000100ae464 <+124>: ld r2,24(r1)
0x00000000100ae468 <+128>: addi r1,r1,32
0x00000000100ae46c <+132>: ld r0,16(r1)
0x00000000100ae470 <+136>: mtlr r0
0x00000000100ae474 <+140>: blr
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) i reg r3 r4 r8 r9
r3 0x10024cc0b 4297378827
r4 0xff 255
r8 0x10024cc0b 4297378827
r9 0x12ab9a88 313236104
(gdb) p tlb_addr
$1 = (HostPt) 0x12ab9a88 <ram_page_handler> "\240\344.\020"
(gdb) p address
$2 = 2411531
(gdb) x/16wx tlb_addr+address
0x12d06693 <scalerSourceCache+1002603>: 0x00000000 0x00000000 0x00000000 0x00000000
0x12d066a3 <scalerSourceCache+1002619>: 0x00000000 0x00000000 0x00000000 0x00000000
0x12d066b3 <scalerSourceCache+1002635>: 0x00000000 0x00000000 0x00000000 0x00000000
0x12d066c3 <scalerSourceCache+1002651>: 0x00000000 0x00000000 0x00000000 0x00000000

I get similar errors in mem_writed_... and other related functions, so I think they all have the same problem. Is that 0xffffebfe which gets used in the address calculations probably valid, or probably a bug? Is addition to compute the address supposed to be signed or unsigned? I can't really tell from the x86_64 version, and running it with C_DEBUG tells me absolutely nothing.

Sorry to bug you so much, I'm unfamiliar with a lot of the undercode and not having a relatively similar backend to refer to (MIPS64, aarch64) makes it worse.

Reply 49 of 137, by jmarsh

Rank Oldbie

Smells like a 64-bit ABI issue, but I'm not completely positive because it looks like the backtrace info for the call to mem_writeb_checked_drc is hiding the real parameter value.
The emitted instructions look like a translation of LEA: r29 is loaded from a cpu reg (base), added to a segment reg, then added to another cpu reg (index). On an x86 system it's all 32-bit math, but on ppc64 there is an overflow that puts junk in the high half of r29. That's usually fine since only the lower 32 bits get stored to memory, but here the result is being used as a parameter to a function:

// load a host-register as param'th function parameter
static void INLINE gen_load_param_reg(Bitu reg,Bitu param) {
gen_mov_regs(RegParams[param], (HostReg)reg);
}

You need to zero-extend from 32 to 64-bits here. It looks like you already figured that out for gen_load_param_imm() but the code in risc_x64.h for this function is more subtle; the default data word-size for x86 long mode is 32-bit unless a rex.W prefix is used.
Even though the function definition for mem_writeb_checked_drc declares the first parameter as PhysPt (uint32_t) I'd guess the compiler assumes the upper 32-bits are zeroed by the caller, so despite gdb saying "address=2411531" (0x0024CC0B) I suspect it is actually 4297378827 (0x10024CC0B).

Edit: My bad, I missed the "show more lines" button at the bottom of the code block. That confirms that r3 (parameter 1) is expected to be zero-extended to 64 bits and in this case it has bit 32 set, causing the crash.
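
The numbers in the dump work out exactly that way. A quick sketch using the three words the emitted code loads (0xffffebfe, 0x00000000 and 0x0024e00d); which of them is base, segment and index is my reading of the listing, and the helper names are hypothetical:

```cpp
#include <cstdint>

// Three lwz loads (zero-extended to 64 bits) followed by two 64-bit adds,
// as in the emitted sequence above.
static inline std::uint64_t add_in_64bit_regs(std::uint32_t base, std::uint32_t seg,
                                              std::uint32_t index) {
    return static_cast<std::uint64_t>(base) + seg + index;
}

// What a 32-bit machine would have kept: only the low 32 bits.
static inline std::uint64_t zero_extend_32(std::uint64_t reg) {
    return reg & 0xFFFFFFFFull;
}
```

The 64-bit sum is 0x10024cc0b (4297378827), which is what r3 holds at the crash, while the truncated value is 0x0024cc0b (2411531), which is what gdb prints for the 32-bit parameter. So 0xffffebfe is most likely just a negative 32-bit displacement that relies on wraparound.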

Reply 50 of 137, by ClassicHasClass

Rank Newbie

It turned out the simplest solution was to use clrldi instead of mr in gen_mov_regs. This matches x86_64, which just uses mov as you pointed out.
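
For anyone comparing the two instructions, here is a sketch of their encodings. encode_mr and encode_clrldi32 are hypothetical helpers for illustration only and do not reproduce the macros in the backend.

```cpp
#include <cstdint>

// "mr RA,RS" is the simplified form of "or RA,RS,RS" (X-form, opcode 31, XO 444).
// It copies all 64 bits, including any junk in the upper half.
static inline std::uint32_t encode_mr(unsigned ra, unsigned rs) {
    return (31u << 26) | ((rs & 31u) << 21) | ((ra & 31u) << 16) |
           ((rs & 31u) << 11) | (444u << 1);
}

// "clrldi RA,RS,32" is the simplified form of "rldicl RA,RS,0,32" (MD-form, opcode 30, XO 0).
// It copies RS and clears the upper 32 bits. The 6-bit mask-begin field is stored
// with its low five bits first and its high bit last.
static inline std::uint32_t encode_clrldi32(unsigned ra, unsigned rs) {
    const unsigned sh = 0, mb = 32;
    return (30u << 26) | ((rs & 31u) << 21) | ((ra & 31u) << 16) |
           ((sh & 31u) << 11) | ((mb & 31u) << 6) | ((mb >> 5) << 5) |
           (0u << 2) | ((sh >> 5) << 1);
}
```

Both are single instructions, so swapping one for the other does not change the size of the emitted code.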

"The Emperor has approved your test demonstration, General Mohc."

I'm doing more testing before I optimize prematurely. 😀

Reply 51 of 137, by ClassicHasClass

Rank Newbie

So far tested with Dark Forces, Extreme Pinball, Pinball Illusions, Doom, Quake, Descent and various benchmarks. The only thing odd that turned up is that Descent's credits and "cut scenes" were abnormally slow with dynrec. It looks like this Talos II can run it substantially faster than the game expects -- if I back the cycles down to about 20%, those run normally. However, the game itself plays extremely well in dynrec, absolutely butter smooth.

Next is to whittle down the other patches to the minimum required.


Reply 52 of 137, by jmarsh

Rank Oldbie

Palette updates can cause slowdown in DOSBox, I know Descent's credits use them and the loading screens have fade-in/fade-outs. I can't remember if they're done unconditionally for every frame or only when a color is actually modified though.

Reply 53 of 137, by ClassicHasClass

Rank Newbie

No, I think this is something else. Even my Quad G5 does it with your 32-bit JIT if cycles are at 100%, even in Reduced performance. If you try the Descent Shareware episode, the little text cut scenes occasionally have stutters in the music as the text is printed and I don't see anything that looks palette related there. The credits also have little hiccups when it scrolls. Again, on both systems, backing down the % cycles fixes the issue, which makes me wonder if the dynrec is destabilizing some sort of timing loop. I'm curious if a high-performance ARM system has the same problem.

But, the game wouldn't be playable on the G5 without it, so I'll take it!

I've got the patch against source shaved down some more (basically the configure changes and your cacheblock code is left) and I'm making sure there aren't any other edges.

Reply 54 of 137, by jmarsh

Rank Oldbie

I definitely remember finding a bug in Descent's source code related to the timed text rendering on briefing screens but I can't remember the details. I think it was rendering everything twice when adding each letter which caused obvious slowdown - you can see it in the enemy briefings, the rotating models get faster when the text description reaches its end.

Reply 55 of 137, by ClassicHasClass

Rank Newbie

Okay, I think this is the minimum patch required for little-endian ppc64 (plus the backend).


Reply 56 of 137, by xjas

Rank l33t

I'm just about out of PowerPC machines (and never had anything POWER to begin with), but I just wanted to say it's pretty cool you guys are developing this. Awesome work!
Would it have any application to e.g. a future ARM dynarec? Or are they too dissimilar?

twitch.tv/oldskooljay - playing the obscure, forgotten & weird - most Tuesdays & Thursdays @ 6:30 PM PDT. Bonus streams elsewhen!

Reply 57 of 137, by M-HT

Rank Newbie
xjas wrote on 2020-01-07, 10:31:

Would it have any application to e.g. a future ARM dynarec? Or are they too dissimilar?

Dynarec for ARM already exists - 32-bit and 64-bit.

Reply 58 of 137, by kas1e

Rank Newbie

@jmarsh
I tried to use your patch on AmigaOS4 (PPC32, big-endian). Here is everything I did:

1). Downloaded today's SVN DOSBox code.

2). Put your patch in the root of the DOSBox directory and ran "patch -p1 < ppc_dynrec.diff". All fine, no errors.

3).
./autogen.sh
./configure --build=x86_64 --host=ppc-amigaos --target=ppc-amigaos --disable-dynamic-x86 --disable-opengl

If I understand correctly, I should only pass "--disable-dynamic-x86" (and not "--disable-dynrec" or "--disable-dynamic-core") to get the PPC RISC dynrec working. Is that correct? :)

4). at top of config.h add:

#define C_TARGETCPU POWERPC
#define C_DYNREC 1
#define WORDS_BIGENDIAN

5). make + link, and the binary is ready.

6). Tested that the binary runs and works fine with "core:normal".

7). Changed "core:normal" to "core:dynamic" in the config and ran DOSBox => crash.

The crash type is "Instruction Storage Interrupt", coming from CPU_Core_Dynrec_Runv(), which then shows "symbol not available 0x5929B000". On the assembler side, it crashes on "stwu r1,-256(r1)", and 0x5929B000 is what the r9 register contains at the moment of the crash. Before the crashing instruction there are a lot of identical repeated instructions that look like this: "abadcafe lha r29,-13570(r13)". On AmigaOS4, "abadcafe" ("a bad cafe") is the pattern used to initialize all unallocated memory.

Could it be an issue with unaligned memory access, for example? (On my OS that isn't handled automatically by the kernel.) Or maybe some "ifdef __amigaos4__" is needed in your patch?

Thanks for any answers 😀

Reply 59 of 137, by jmarsh

Rank Oldbie

Unaligned memory access would trigger either an alignment or DSI exception, so it's not that.
An ISI would likely be caused by stale data in the instruction cache (the code in cache_block_closing is meant to prevent this), or trying to run code from memory that isn't marked as executable. Most PPC32 platforms don't implement this, but if yours does it needs to support mprotect() to change the memory permissions (see the "#if (C_HAVE_MPROTECT)" section around line 604 in cache.h).
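
As a rough illustration of the mprotect() route on a POSIX-like system (this is a sketch, not the code from cache.h, and set_code_permissions is a hypothetical name), the range has to be widened to whole pages before the permissions are changed:

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Apply `prot` (e.g. PROT_READ | PROT_WRITE | PROT_EXEC) to every page that
// overlaps [start, start + len). Returns true on success.
static inline bool set_code_permissions(void *start, std::size_t len, int prot) {
    const std::uintptr_t page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(start);
    const std::uintptr_t begin = addr & ~(page - 1);                      // round down
    const std::uintptr_t end = (addr + len + page - 1) & ~(page - 1);     // round up
    return mprotect(reinterpret_cast<void *>(begin), end - begin, prot) == 0;
}
```

Whether AmigaOS4 offers an equivalent call is something I can't confirm; if it doesn't, the code cache would need to be allocated as executable by whatever means the OS provides.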
If you could examine the contents of the SRR1 register when the exception occurs, it will give more information.
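
A small decoder sketch for that register, assuming a classic (non-Book E) 32-bit PowerPC core. The bit masks are the ISI cause bits as I remember them from the programming environments manual, so please check them against the manual for your specific CPU; Book E parts (4xx, e500 and similar) report the cause differently. describe_isi_srr1 is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Translate the ISI cause bits of SRR1 into text (classic PowerPC assumed).
static inline std::string describe_isi_srr1(std::uint32_t srr1) {
    std::string out;
    if (srr1 & 0x40000000u) out += "no translation found (page fault); ";
    if (srr1 & 0x10000000u) out += "fetch from no-execute or guarded storage; ";
    if (srr1 & 0x08000000u) out += "protection violation; ";
    if (out.empty()) out = "no ISI cause bits set";
    return out;
}
```

If the no-execute bit turns out to be set, that points at memory permissions rather than stale instruction cache contents.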