I have been working with NovaCoder to enable 68k dynrec support. I have the dynrec cache buffers executing code but I believe I have found several bugs. Making function parameters fully stack based causes problems. This can be experienced by turning off the GCC fastcall function attribute (#define DRC_CALL_CONV _fastcall) as x86 uses all stack based calls by default. I have fixed some of the problems in decoder_basic.h which I will attach to this message. The first problems are with gen_call_function_xxx() functions where gen_call_function_setup() is called with the wrong number of parameters. With register function parameters this isn't a problem but the count can be used to deallocate the parameters on the stack after the function call. The gen_call_function_xxx() functions work well for register or stack based parameters but they aren't always used. For example, dyn_read_byte() calls gen_move_regs() to load it's register parameter and then gen_call_function_raw() to call the function. I changed this to gen_call_function_R() so the parameter can be passed in a register or on the stack. I have marked all my potential bug fixes in decoder_basic with *** Bug fix ***. This file was pretty straight forward to change and I used the general style. There is more problems like this in decoder_opcodes.h that aren't as simple to change. Someone experienced with this code could probably edit them in much less time with fewer problems and the way they want them. Note that gen_call_function_raw is correct where there are no parameters.
I also noticed while debugging that some dynrec generated code doesn't seem to have the caches flushed by cache_block_closing() before executing. This includes the dynrec code generated by gen_run_code(), a return (2) equivalent and a return (3) equivalent. None of these need to be dynrec generated anyway. gen_run_code() would be faster and wouldn't need the caches flushed if it was a normal function that was passed a pointer to the code to start rather like cache_block_closing() which doesn't generate dynrec code as it's already compiled/assembled. I haven't noticed any crashes from this but the 68k has a fraction of the caches of a modern processor.
Wow. After two days you begin throwing around names? Are your problems the most important ones in the world? Did you consider that maybe people have a real life as well? Goodbye 68k as...
Have you taken a look at the other platforms ?
Do arm and mipsel handle it differently ?
RISC ABI's (and risc_x64.h) pretty much all specify register based function parameters and this default is built into GCC for those platforms. The GCC _fastcall function attribute is ignored (doesn't exist) on those platforms. They end up calling a function like this:
1move par4 to r3 // gen_load_param_xxx() generated 2move par3 to r2 // gen_load_param_xxx() generated 3move par2 to r1 // gen_load_param_xxx() generated 4move par1 to r0 // gen_load_param_xxx() generated 5call function // gen_call_function_setup() generated
The paramcount passed to gen_call_function_setup() does not matter because it's not used because there is no stack to deallocate. If the gen_call_function_xxx() functions are bypassed in decoder_basic.h then the parameters are already in the correct register. The dynrec support has no problems here.
The x86 and 68k use stack based function parameters (_cdecl) by default. The x86 can use the _fastcall function attribute which passes the first 2 parameters in registers and the rest on the stack. Most other platforms do not support _fastcall though. This would look like:
1push par4 // gen_load_param_xxx() generated 2push par3 // gen_load_param_xxx() generated 3move par2 to r1 // gen_load_param_xxx() generated 4move par1 to r0 // gen_load_param_xxx() generated 5call function // gen_call_function_setup() generated
Stack paramaters are deallocated by the function. It looks to me like _fastcall in risc_x86.h will work but turning it off (_cdecl) will not. This line:
1cache_addb((!fastcall)?paramcount*4:0);
Won't work for all stack based parameters when gen_call_function_R3(), gen_call_function_m() and gen_call_function_mm() are used. The paramcount must match the number of paramaters pushed on the stack and it doesn't.
I think _stdcall parameter passing would be easier to implement than _fastcall and is possibly as fast in risc_x86.h. The x86 doesn't have many general purpose registers so the function may have to push some of the parameters back on the stack anyway defeating the work _fastcall did in putting them in particular registers. Working gen_load_param_xxx() functions would need slow difficult to predict branches to do _fastcall correctly. It's possible that _stdcall could be faster in this case for these reasons. This looks like:
With _stdcall (like _fastcall), the function deallocates the stack with the x86 RET instruction which accepts an immediate value for how many bytes of the stack to deallocate. It gets this from the function prototypes. The paramcount parameter of gen_call_function_setup() is then unused. It's easy, less prone to problems in this case and fairly fast. The risc_x86.h dynrec gen_call_function_setup() would look something like this:
1static Bit32u INLINE gen_call_function_setup(void * func,Bitu paramcount,bool fastcall=false) { 2 // Do the actual call to the procedure 3 Bit32u proc_addr=(Bit32u)cache.pos; 4 cache_addb(0xe8); 5 cache_addd((Bit32u)func - (Bit32u)cache.pos-4); 6 7#if DRC_CALL_CONV!=_stdcall 8 // Restore the params of the stack 9 if (paramcount) { 10 cache_addw(0xc483); //add ESP,imm byte 11 cache_addb(paramcount*4); 12 { 13#endif 14 return proc_addr; 15}
The 68k has an instruction called RTD (Return and Deallocate) that does the same as the x86 RET with immediate, however, GCC has not provided any function attributes for parameters which is why we use the old _cdecl style for now. Here is what it looks like:
The last line deallocates 4 parameters of 4 bytes each. Here the paramcount passed to gen_call_function_setup must be correct to deallocate the parameters from the stack. The gen_call_function_xxx() function calls must be used for a function called from the dynrec cache buffer when there is at least 1 parameter. The gen_call_function_xxx() functions must call gen_call_function_setup() with the correct number of parameters loaded with gen_load_param_xxx() calls. If everything is not correct, either the stack gets corrupted (crash) or the parameters are not where they are expected (memory corruption). All stack based parameters (_cdecl) is the best test for problems after dynrec code changes. All aspects of the interface have to be correct for it to work. The original interface was designed properly to support all kinds of parameter calling but it has to be used.
I'll attach the risc_68k.h file I have created. It's still under construction but it might be useful as 68k assembler is very easy to read. It's possible someone may spot errors as well.
I think I've figured out how to make _fastcall, _stdcall, and _cdecl compatible using the same interface. I've defined an optional new variable called DRC_STACKPARM for risc_anyCPU.h that pushes the _cdecl and _stdcall missing paramaters to the stack in decoder_basic.h gen_call_function_R3(), gen_call_function_m() and gen_call_function_mm(). The code remains the same for _fastcall which I think works (turn _fastcall off in risc_x86.h and I think it crashes though). I improved the comments some too 😉. There are still a lot of functions in decoder_opcodes.h that are not using this interface. They put parameters in FC_OP1 and FC_OP2 registers instead of on the stack before calling functions from the dynrec buffer.
Just an idea based on how some x86 compilers optimize function calls:
Instead of pushing parameters on stack and then reclaiming the stack after function call, the idea is to allocate the space for 4 parameters on stack (when entering dynarec) and when you call a function you move the parameters into this space. (The stack space is reclaimed when exiting dynarec.)
The function call would look like this:
1move par4 to [sp+12] 2move par3 to [sp+8] 3move par2 to [sp+4] 4move par1 to [sp] 5call function
where sp is the stack pointer.
I haven't thought about this much - whether it's better or not (and I didn't read your code), but it can be implemented in the 68k backend without changing other dosbox code.
@M-HT
I believe the stack on both the 68k and x86 grows negative.
The x86:
1push par1
is actually:
1move par1,-(sp)
And pop is:
1move (sp)+,par1
Where sp is the stack pointer and we use () instead of the x86 [] to represent memory indirect (we do use () and [] in double memory indirect addressing modes). This is the notation we use on the 68k as we don't have pop and push instructions (although macros are easy enough). It looks like your example would overwrite data already on the stack. You would have to write your new data to negative offsets, maybe like this:
1move par4 to [sp-20] 2move par3 to [sp-16] 3move par2 to [sp-12] 4move par1 to [sp-8] 5;[sp-4] reserved for return PC from function call 6call function
The return instruction would pop off the value at [sp-4] restoring the stack correctly but the paramaters would be at a negative offset instead of a positive offset. We can't easily change the way the parameters are to be received in the function calls. Even if this was workable somehow, it would not solve are problem of some of the parameters not being pushed on the stack. For example, gen_call_function_R3() in decoder_basic.h is used for calling functions with 3 parameters but only 1 is pushed on the stack and then paramcount=3 are deallocated from the stack with all stack based parameters. Originally, I made paramcount=1 which fixed the stack problem but then 3 parameters were read from the stack when there was 1 parameter passed to the function which causes memory corruption. This was wrong! DosBox was passing the first 2 parameters in registers sometimes which only works with _fastcall and all register function calling conventions where paramcount isn't used. The paramcount of gen_call_function_setup() is only needed by _cdecl (all stack) style parameters and the fact that it exists means the interface was originally designed to handle stack based passing as it serves no other purpose than to deallocate the stack after function calls. If all stack based parameter passing used to work, then partial register parameter passing also violates the original interface. I don't think the devs even care. I have done all the fixes required myself with conditional preprocessor instructions which is the least obtrusive way to handle this and NovaCoder and I are testing now. I bet the devs won't even accept the work when it's done and working but maybe it will get us up and running faster than creating assembler stubs or functions. Thanks for at least trying to help us.
Using Matt's updates (above), now produces this output from DosBox AGA.
1DOSBOX_Init() - Adding Init Functions 2DOSBox version 0.74 3Copyright 2002-2010 DOSBox Team, published under GNU GPL. 4--- 5Init SDL() - START 6Init SDL() - END 7CONFIG:Loading primary settings from config file dosbox.conf 8Init all the sections 9MIXER:No Sound Mode Selected. 10MIDI:Opened device:none 11CPU Message Set Video Mode 3 12CPU_Core_Dynrec_Run - START 13CPU_Core_Dynrec_Run - no block cache found 14CPU_Core_Dynrec_Run - let the dynamic core handle this instruction 15CPU_Core_Dynrec_Run - before runcode 16CPU_Core_Dynrec_Run - after runcode 17CPU_Core_Dynrec_Run - no block cache found 18CPU_Core_Dynrec_Run - let the dynamic core handle this instruction 19CPU_Core_Dynrec_Run - before runcode 20CPU_Core_Dynrec_Run - after runcode 21CPU_Core_Dynrec_Run - START 22CPU_Core_Dynrec_Run - no block cache found 23CPU_Core_Dynrec_Run - let the dynamic core handle this instruction 24CPU_Core_Dynrec_Run - before runcode 25CPU_Core_Dynrec_Run - after runcode 26CPU_Core_Dynrec_Run - START 27CPU_Core_Dynrec_Run - before runcode 28CPU_Core_Dynrec_Run - after runcode 29Illegal read from ffffff91, CS:IP 188:ffffe711 30DYNREC:Can't run code in this page 31Illegal read from ffffff91, CS:IP 188:ffffe711 32Illegal read from ffffff92, CS:IP 188:ffffe711 33Illegal read from ffffff93, CS:IP 188:ffffe713 34Illegal read from ffffff94, CS:IP 188:ffffe713 35Illegal read from ffffff95, CS:IP 188:ffffe715 36Illegal read from ffffff96, CS:IP 188:ffffe715 37Illegal read from ffffff97, CS:IP 188:ffffe717 38Illegal read from ffffff98, CS:IP 188:ffffe717 39Illegal read from ffffff99, CS:IP 188:ffffe719 40Illegal read from ffffff9a, CS:IP 188:ffffe719 41Illegal read from ffffff9b, CS:IP 188:ffffe71b 42Illegal read from ffffff9c, CS:IP 188:ffffe71b 43Illegal read from ffffff9d, CS:IP 188:ffffe71d 44Illegal read from ffffff9e, CS:IP 188:ffffe71d 45Illegal read from ffffff9f, CS:IP 188:ffffe71f 46Illegal read from ffffffa0, CS:IP 188:ffffe71f 47Illegal read from ffffffa1, CS:IP 188:ffffe721 48Illegal read from ffffffa2, CS:IP 188:ffffe721 49Illegal read from ffffffa3, CS:IP 188:ffffe723 50Illegal read from ffffffa4, CS:IP 188:ffffe723 51Illegal read from ffffffa5, CS:IP 188:ffffe725 52Illegal read from ffffffa6, CS:IP 188:ffffe725 53Illegal read from ffffffa7, CS:IP 188:ffffe727 54Illegal read from ffffffa8, CS:IP 188:ffffe727 55Illegal read from ffffffa9, CS:IP 188:ffffe729 56Illegal read from ffffffaa, CS:IP 188:ffffe729 57Illegal read from ffffffab, CS:IP 188:ffffe72b 58Illegal read from ffffffac, CS:IP 188:ffffe72b 59Illegal read from ffffffad, CS:IP 188:ffffe72d 60Illegal read from ffffffae, CS:IP 188:ffffe72d
…Show last 172 lines
61Illegal read from ffffffaf, CS:IP 188:ffffe72f 62Illegal read from ffffffb0, CS:IP 188:ffffe72f 63Illegal read from ffffffb1, CS:IP 188:ffffe731 64Illegal read from ffffffb2, CS:IP 188:ffffe731 65Illegal read from ffffffb3, CS:IP 188:ffffe733 66Illegal read from ffffffb4, CS:IP 188:ffffe733 67Illegal read from ffffffb5, CS:IP 188:ffffe735 68Illegal read from ffffffb6, CS:IP 188:ffffe735 69Illegal read from ffffffb7, CS:IP 188:ffffe737 70Illegal read from ffffffb8, CS:IP 188:ffffe737 71Illegal read from ffffffb9, CS:IP 188:ffffe739 72Illegal read from ffffffba, CS:IP 188:ffffe739 73Illegal read from ffffffbb, CS:IP 188:ffffe73b 74Illegal read from ffffffbc, CS:IP 188:ffffe73b 75Illegal read from ffffffbd, CS:IP 188:ffffe73d 76Illegal read from ffffffbe, CS:IP 188:ffffe73d 77Illegal read from ffffffbf, CS:IP 188:ffffe73f 78Illegal read from ffffffc0, CS:IP 188:ffffe73f 79Illegal read from ffffffc1, CS:IP 188:ffffe741 80Illegal read from ffffffc2, CS:IP 188:ffffe741 81Illegal read from ffffffc3, CS:IP 188:ffffe743 82Illegal read from ffffffc4, CS:IP 188:ffffe743 83Illegal read from ffffffc5, CS:IP 188:ffffe745 84Illegal read from ffffffc6, CS:IP 188:ffffe745 85Illegal read from ffffffc7, CS:IP 188:ffffe747 86Illegal read from ffffffc8, CS:IP 188:ffffe747 87Illegal read from ffffffc9, CS:IP 188:ffffe749 88Illegal read from ffffffca, CS:IP 188:ffffe749 89Illegal read from ffffffcb, CS:IP 188:ffffe74b 90Illegal read from ffffffcc, CS:IP 188:ffffe74b 91Illegal read from ffffffcd, CS:IP 188:ffffe74d 92Illegal read from ffffffce, CS:IP 188:ffffe74d 93Illegal read from ffffffcf, CS:IP 188:ffffe74f 94Illegal read from ffffffd0, CS:IP 188:ffffe74f 95Illegal read from ffffffd1, CS:IP 188:ffffe751 96Illegal read from ffffffd2, CS:IP 188:ffffe751 97Illegal read from ffffffd3, CS:IP 188:ffffe753 98Illegal read from ffffffd4, CS:IP 188:ffffe753 99Illegal read from ffffffd5, CS:IP 188:ffffe755 100Illegal read from ffffffd6, CS:IP 188:ffffe755 101Illegal read from ffffffd7, CS:IP 188:ffffe757 102Illegal read from ffffffd8, CS:IP 188:ffffe757 103Illegal read from ffffffd9, CS:IP 188:ffffe759 104Illegal read from ffffffda, CS:IP 188:ffffe759 105Illegal read from ffffffdb, CS:IP 188:ffffe75b 106Illegal read from ffffffdc, CS:IP 188:ffffe75b 107Illegal read from ffffffdd, CS:IP 188:ffffe75d 108Illegal read from ffffffde, CS:IP 188:ffffe75d 109Illegal read from ffffffdf, CS:IP 188:ffffe75f 110Illegal read from ffffffe0, CS:IP 188:ffffe75f 111Illegal read from ffffffe1, CS:IP 188:ffffe761 112Illegal read from ffffffe2, CS:IP 188:ffffe761 113Illegal read from ffffffe3, CS:IP 188:ffffe763 114Illegal read from ffffffe4, CS:IP 188:ffffe763 115Illegal read from ffffffe5, CS:IP 188:ffffe765 116Illegal read from ffffffe6, CS:IP 188:ffffe765 117Illegal read from ffffffe7, CS:IP 188:ffffe767 118Illegal read from ffffffe8, CS:IP 188:ffffe767 119Illegal read from ffffffe9, CS:IP 188:ffffe769 120Illegal read from ffffffea, CS:IP 188:ffffe769 121Illegal read from ffffffeb, CS:IP 188:ffffe76b 122Illegal read from ffffffec, CS:IP 188:ffffe76b 123Illegal read from ffffffed, CS:IP 188:ffffe76d 124Illegal read from ffffffee, CS:IP 188:ffffe76d 125Illegal read from ffffffef, CS:IP 188:ffffe76f 126Illegal read from fffffff0, CS:IP 188:ffffe76f 127Illegal read from fffffff1, CS:IP 188:ffffe771 128Illegal read from fffffff2, CS:IP 188:ffffe771 129Illegal read from fffffff3, CS:IP 188:ffffe773 130Illegal read from fffffff4, CS:IP 188:ffffe773 131Illegal read from fffffff5, CS:IP 188:ffffe775 132Illegal read from fffffff6, CS:IP 188:ffffe775 133Illegal read from fffffff7, CS:IP 188:ffffe777 134Illegal read from fffffff8, CS:IP 188:ffffe777 135Illegal read from fffffff9, CS:IP 188:ffffe779 136Illegal read from fffffffa, CS:IP 188:ffffe779 137Illegal read from fffffffb, CS:IP 188:ffffe77b 138Illegal read from fffffffc, CS:IP 188:ffffe77b 139Illegal read from fffffffd, CS:IP 188:ffffe77d 140Illegal read from fffffffe, CS:IP 188:ffffe77d 141Illegal read from ffffffff, CS:IP 188:ffffe77f 142CPU Message CPU:LOCK 143CPU Message CPU:LOCK 144CPU Message CPU:LOCK 145CPU Message CPU:LOCK 146CPU Message CPU:LOCK 147CPU Message CPU:LOCK 148CPU Message CPU:LOCK 149CPU Message CPU:LOCK 150CPU Message CPU:LOCK 151CPU Message CPU:LOCK 152CPU Message CPU:LOCK 153CPU Message CPU:LOCK 154CPU Message CPU:LOCK 155CPU Message CPU:LOCK 156CPU Message CPU:LOCK 157CPU Message CPU:LOCK 158CPU Message CPU:LOCK 159CPU Message CPU:LOCK 160CPU Message CPU:LOCK 161CPU Message CPU:LOCK 162CPU Message CPU:LOCK 163CPU Message CPU:LOCK 164CPU Message CPU:LOCK 165CPU Message CPU:LOCK 166CPU Message CPU:LOCK 167CPU Message CPU:LOCK 168CPU Message CPU:LOCK 169CPU Message CPU:LOCK 170CPU Message CPU:LOCK 171CPU Message CPU:LOCK 172CPU Message CPU:LOCK 173CPU Message CPU:LOCK 174CPU Message CPU:LOCK 175CPU Message CPU:LOCK 176CPU Message CPU:LOCK 177CPU Message CPU:LOCK 178CPU Message CPU:LOCK 179CPU Message CPU:LOCK 180CPU Message CPU:LOCK 181CPU Message CPU:LOCK 182CPU Message CPU:LOCK 183CPU Message CPU:LOCK 184CPU Message CPU:LOCK 185CPU Message CPU:LOCK 186CPU Message CPU:LOCK 187CPU Message CPU:LOCK 188CPU Message CPU:LOCK 189CPU Message CPU:LOCK 190CPU Message CPU:LOCK 191CPU Message CPU:LOCK 192CPU Message CPU:LOCK 193CPU Message CPU:LOCK 194CPU Message CPU:LOCK 195CPU Message CPU:LOCK 196CPU Message CPU:LOCK 197CPU Message CPU:LOCK 198CPU Message CPU:LOCK 199CPU Message CPU:LOCK 200CPU Message CPU:LOCK 201CPU Message CPU:LOCK 202CPU Message CPU:LOCK 203CPU Message CPU:LOCK 204CPU Message CPU:LOCK 205CPU Message CPU:LOCK 206CPU Message CPU:LOCK 207CPU Message CPU:LOCK 208CPU Message CPU:LOCK 209CPU Message CPU:LOCK 210CPU Message CPU:LOCK 211CPU Message CPU:LOCK 212CPU Message CPU:LOCK 213CPU Message CPU:LOCK 214CPU Message CPU:LOCK 215CPU Message CPU:LOCK 216CPU Message CPU:LOCK 217CPU Message CPU:LOCK 218CPU Message CPU:LOCK 219CPU Message CPU:LOCK 220CPU Message CPU:LOCK 221CPU Message CPU:LOCK 222CPU Message CPU:LOCK 223CPU Message Illegal Unhandled Interrupt Called 0 224CPU_Core_Dynrec_Run - START 225CPU_Core_Dynrec_Run - no block cache found 226CPU_Core_Dynrec_Run - let the dynamic core handle this instruction 227CPU_Core_Dynrec_Run - before runcode 228CPU_Core_Dynrec_Run - after runcode 229CPU_Core_Dynrec_Run - no block cache found 230CPU_Core_Dynrec_Run - let the dynamic core handle this instruction 231CPU_Core_Dynrec_Run - before runcode
But instead of using the instruction "sub esp, 4*4" (to reserve stack) before every function call, you execute it once, when entering dynrec code (in function gen_run_code).
And instead of using the instruction "add esp, 4*4" (to release stack) after every function call, you execute it once, when exiting dynrec code.
I hope I made it clear this time.
Reserving and releasing the stack (in function gen_run_code) might be a bit complicated, but it's doable.
M-HT wrote:Apparently I didn't explain the idea sufficiently enough.
I'll try it with examples (with x86 instructions, because I'm not fami […] Show full quote
Apparently I didn't explain the idea sufficiently enough.
I'll try it with examples (with x86 instructions, because I'm not familiar with 68k).
But instead of using the instruction "sub esp, 4*4" (to reserve stack) before every function call, you execute it once, when entering dynrec code (in function gen_run_code).
And instead of using the instruction "add esp, 4*4" (to release stack) after every function call, you execute it once, when exiting dynrec code.
Alright. I think I understand now. That would work if setup from 1 level of code and calling multiple functions without having to fix the stack after each function. However, calling the function here puts the return value on the stack and would not work at the level below without overwriting the return value. It might be possible to set it up at the start of the dynrec buffer code but there are multiple return points. It's a good assembler trick but it also has it's limitations 😀
I fixed another bug and we now get to the flashing command prompt but there are still some problems. Keyboard entry still isn't working. We are getting close though 😎.