When the PLT already contains the cos() address (because the library was loaded during a previous run with the normal core) there is no problem. If the library is not loaded yet, dyld_stub_binder() is called. It crashes at the first xmm0 instruction, so I assume the error is that the stack is not 16-byte aligned at that point.
Why is 0x08 added to rsp in gen_call_function_setup()?
Replaced gen_call_function_raw() with gen_call_function_setup() and it works the first time as well, so it is the stack alignment 😎. It is quite a lot slower, though... I guess the real solution would be to see which calls really need an aligned stack and fix only those 😁
Ok, tested it just now.
I assumed the changes by wd need to go on top of the diff by gulikoza (and at risc_x64.h line 336).
With this, core dynamic still crashes at PCPBENCH. It also crashes if PCPBENCH first ran in normal core and I then switched to dynamic.
I was able to steal the macbook for the weekend again 😁
With wd's changes, PCPBENCH works fine here. Dominus, are you sure you changed gen_call_function_raw()? That one is at line 378 here (although yes, my risc_x64.h is probably modified). gen_call_function_setup() should be left as it is (line 343 here). And yes, these changes should go after applying my patch from the previous page.
edit: nvm, I probably moved gen_call_function_raw, but still it should work. Let me try again with clean risc_x64.h
Ok...checked out risc_x64 from svn again. Touched the file (so the modifications get picked up by make). Ran dosbox - crash before prompt. Applied my patch. Crash just after running PCPBENCH. Pasted wd's code on line 336. PCPBENCH works. 😀
Ok, so your patch (fixed memory addressing) + the stack adjustment I've posted + small cache blocks makes this work? Nice.
Does it work with cache block entries:=32 and the _raw replaced by the other function setup routine? The logic behind my small change was just to always align the stack to 16-byte boundaries, since that seems to be an ABI requirement, but I'll check where we call that (though it doesn't sound like it'd be related to the max cache block entries).
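To make the alignment rule concrete, here is a minimal sketch of the arithmetic. It is not DOSBox code; `call_padding` and `is_call_aligned` are made-up helper names. It assumes the usual x86-64 convention on OS X and Linux that rsp must be a multiple of 16 immediately before a `call`. If the generated code is entered with only a return address on the stack, rsp is 8 off at that point, which would explain an adjustment of 0x08.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only, not part of risc_x64.h.

// True if rsp is a multiple of 16, i.e. safe to execute `call` with.
constexpr bool is_call_aligned(std::uint64_t rsp) {
    return (rsp & 0xF) == 0;
}

// Number of bytes to subtract from rsp (the stack grows downward)
// so that it becomes 16-byte aligned before a `call`.
constexpr std::uint64_t call_padding(std::uint64_t rsp) {
    return rsp & 0xF;
}
```

For example, with rsp = 0x7ffeefbff5a8 (an arbitrary value ending in 8), `call_padding` gives 8, and rsp - 8 = 0x7ffeefbff5a0 is aligned. Aligned SSE accesses to the stack in the callee would fault without that adjustment.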
It crashes at 0x11e04cdcc. 0x11e04cdce looks like the second part of the gen_call_function_raw(), but add BYTE PTR [rax], al?? Memory at that address is:
I guess those NOPs come from gen_fill_function_ptr() but they no longer correctly overwrite the function call. Disabling DRC_FLAGS_INVALIDATION fixes the crash.
I guess there's a problem with the default case (pos+2). This becomes pos+6 when _raw is called, but remains (?) pos+2 when _setup is called. I'm not sure I can find all the cases where pos+6 is required, so I've added -4 to cache.pos in _setup in the following patch. Feel free to change that (or put something in a comment) 😀
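A rough sketch of why the offset moves by 4, in case it helps anyone following along. This is not the risc_x64.h code; `emit_call`, `imm_offset`, `patch_target` and `read_target` are hypothetical names, and the emitted sequence (`sub rsp,8` / `mov rax,imm64` / `call rax`) is only an assumption about the general shape. The point is that a 4-byte `sub rsp,8` in front of the `mov` pushes the 64-bit immediate from offset 2 to offset 6, so any code that later patches the call has to know which layout it is looking at.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical emitter for illustration only.
// Emits: [sub rsp,8]  mov rax,imm64  call rax
// Returns the offset of the first emitted byte.
std::size_t emit_call(std::vector<std::uint8_t>& buf, std::uint64_t target,
                      bool align_stack) {
    std::size_t start = buf.size();
    if (align_stack) {
        const std::uint8_t sub_rsp_8[] = {0x48, 0x83, 0xEC, 0x08}; // sub rsp,8
        buf.insert(buf.end(), sub_rsp_8, sub_rsp_8 + 4);
    }
    buf.push_back(0x48);                    // REX.W
    buf.push_back(0xB8);                    // mov rax, imm64
    std::uint8_t imm[8];
    std::memcpy(imm, &target, sizeof imm);  // little-endian on x86-64
    buf.insert(buf.end(), imm, imm + 8);
    buf.push_back(0xFF);                    // call rax
    buf.push_back(0xD0);
    return start;
}

// Offset of the 64-bit immediate relative to the start of the sequence.
std::size_t imm_offset(bool align_stack) {
    return align_stack ? 6 : 2;
}

// Overwrite the call target in an already emitted sequence.
void patch_target(std::vector<std::uint8_t>& buf, std::size_t start,
                  bool align_stack, std::uint64_t new_target) {
    std::memcpy(buf.data() + start + imm_offset(align_stack), &new_target, 8);
}

// Read the call target back from an emitted sequence.
std::uint64_t read_target(const std::vector<std::uint8_t>& buf,
                          std::size_t start, bool align_stack) {
    std::uint64_t t;
    std::memcpy(&t, buf.data() + start + imm_offset(align_stack), 8);
    return t;
}
```

If a sequence emitted with the `sub rsp,8` prefix is patched at offset 2 instead of 6, the write lands on the end of the `sub` and on the `mov` opcode bytes, so the block no longer decodes as intended.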
The following patch has all the changes merged and seems to work (on Mac OS X at least, I'll try to test Linux later as well).