Reply 80 of 110, by wd
If somebody does some testing (osx or linux) on this, please post if it works fine now.
Yes, it works now without crashing. It is much slower than a 32bit build, though. Speed tested with pcpbench: 32bit got about 58 FPS, 64bit got about 22 FPS 😀
Thanks for testing 😀 (boy THAT was quick)
he he, I was just sitting down at my desk and thought I'd better do it now or forget about it again for a couple of weeks 😀
And what speed does normal core get? 😀
1+1=10
normal core 64bit: 1.4 FPS, 32bit: 5.8 FPS
Whoo lucky that the recompiler still beats it 😉
I suppose if both normal core tests used the same fpu backend the numbers should be in favor of the 64bit compile.
Split off the "how to compile in 32bit on 64bit OS X" part...
BUT I'm having trouble with SVN in 64bit on OS X 10.8 with the dynamic core. For some reason it is unbearably slow. PCPBenchmark isn't even starting anymore (I posted about this some time ago, but only now found more time to look into the dynamic core on OS X 64bit *again*). I *thought* it was some regression in SVN after this fix for the 64bit dynamic core, but on testing revision 3674, which first had this fix, I get the same problems... So it must be something with either my machine or my OS X version. Both changed since the fix was applied.
Qbix told me some time ago to mess with src/cpu/core_dynrec/decoder.h and comment out cases. I did that, with no reaction except when I commented out blocks of cases from this point downwards:
// case 0xa6 to 0xaf string operations, some missing
// movsb/w/d
case 0xa4:dyn_string(STR_MOVSB);break;
case 0xa5:dyn_string(decode.big_op ? STR_MOVSD : STR_MOVSW);break;
With these commented out, at least dos4gw started and then crashed with a "no memory" warning 😀
gulikoza, can you help me? Do you have any idea?
SVN still works fine for me in 10.6. It must be something with newer versions then...I'll see what I can do.
thanks a lot. A simple test is just running Dos4GW.exe. With current OS X it just stalls on starting Dos4GW.
Great...it's a gcc issue. It works fine with -O0 for me.
I don't have any good ideas how to debug this as even when I set a breakpoint it will be set at the wrong location...
$ gcc -v
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
After I wrote here today, it began to dawn on me that it's perhaps a gcc issue ;(
It might be a problem with the code after all. Can you try the following patch?
I have made no effort to see if any other functions are affected as well 😀
diff --git a/src/cpu/core_dynrec/risc_x64.h b/src/cpu/core_dynrec/risc_x64.h
--- a/src/cpu/core_dynrec/risc_x64.h
+++ b/src/cpu/core_dynrec/risc_x64.h
@@ -116,7 +116,11 @@ static INLINE void gen_memaddr(Bitu op,void* data,Bitu off) {
// move a 32bit (dword==true) or 16bit (dword==false) value from memory into dest_reg
// 16bit moves may destroy the upper 16bit of the destination register
static void gen_mov_word_to_reg(HostReg dest_reg,void* data,bool dword) {
- if (!dword) cache_addb(0x66);
+ if (!dword) {
+ cache_addb(0x31); // xor dest_reg to clear upper 16bit
+ cache_addb(0xc0+(dest_reg<<3)+dest_reg);
+ cache_addb(0x66);
+ }
cache_addb(0x8b); // mov reg,[data]
gen_reg_memaddr(dest_reg,data);
}
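For anyone wondering what the two extra bytes in the patch encode, here is a minimal sketch. It assumes dest_reg uses the standard x86 register numbers 0 to 7 (registers r8 to r15 would also need a REX prefix, which this does not cover). The helper name modrm_same_reg is made up for illustration and is not part of DOSBox.

```cpp
#include <cstdint>

// Opcode 0x31 is "xor r/m32, r32". It is followed by a ModRM byte.
constexpr std::uint8_t kXorOpcode = 0x31;

// ModRM layout: mod (2 bits) | reg (3 bits) | r/m (3 bits).
// mod = 11b selects register-direct mode, which gives the 0xc0 base.
// Using the same register number in reg and r/m yields "xor reg,reg".
// Hypothetical helper, only valid for register numbers 0..7.
constexpr std::uint8_t modrm_same_reg(unsigned reg) {
    return static_cast<std::uint8_t>(0xc0 + (reg << 3) + reg);
}

// 31 C0 = xor eax,eax   31 C9 = xor ecx,ecx   31 FF = xor edi,edi
static_assert(modrm_same_reg(0) == 0xc0, "eax");
static_assert(modrm_same_reg(1) == 0xc9, "ecx");
static_assert(modrm_same_reg(7) == 0xff, "edi");
```

Since the xor is a 32bit operation, it also clears the upper 32 bits of the 64bit register.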
The problem, as far as I could see, was that some stale part of the register was passed as a function parameter to dynrec_movsb_word()...count (the first parameter) was 0xC3D0001 instead of 1. If anybody has a better idea than xor, don't be shy 😀
ps: which is probably where gcc optimization kicks in...how could Bit16u count be larger than 0xFFFF?
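To illustrate how a stale value like 0xC3D0001 can appear, here is a rough model of a 64bit host register receiving a 16bit load. It is only a sketch of the hardware behaviour, not DOSBox code: the function names are invented, and the old register content 0x0C3D1234 is an assumed example value.

```cpp
#include <cstdint>

// A 16bit mov (0x66 prefix) writes only the low 16 bits of the register
// and leaves everything above them untouched.
constexpr std::uint64_t load16_partial(std::uint64_t old_reg, std::uint16_t value) {
    return (old_reg & ~std::uint64_t{0xFFFF}) | value;
}

// "xor reg,reg" first, then the 16bit mov: the old content is gone
// before the low 16 bits are written.
constexpr std::uint64_t load16_after_xor(std::uint64_t /*old_reg*/, std::uint16_t value) {
    return load16_partial(0, value);
}

// movzx writes the whole register with the zero-extended value.
constexpr std::uint64_t load16_movzx(std::uint64_t /*old_reg*/, std::uint16_t value) {
    return value;
}

// Assumed stale content 0x0C3D1234, loading count = 1:
static_assert(load16_partial(0x0C3D1234, 1) == 0x0C3D0001, "upper bits survive");
static_assert(load16_after_xor(0x0C3D1234, 1) == 1, "xor clears them");
static_assert(load16_movzx(0x0C3D1234, 1) == 1, "movzx clears them");
```

A callee that only ever looks at the low 16 bits never notices the difference. The value only becomes visible once the full register is used.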
wrote: If anybody has a better idea than xor, don't be shy
A different approach, not sure if better:
- if (!dword) cache_addb(0x66);
- cache_addb(0x8b); // mov reg,[data]
+ if (!dword) cache_addw(0xb70f); // movzx reg,[data]
+ else cache_addb(0x8b); // mov reg,[data]
gcc (at least on linux) uses movsx when passing short as a function parameter. The function comment says it is safe to destroy the upper part of the register. Perhaps that would be the safest bet? 😀
Yeah, it uses movsx when passing short, but movzx when passing unsigned short 😜
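The movsx/movzx difference corresponds to sign extension versus zero extension when a 16bit value is widened. A small sketch in plain C++ (the function names are made up for the example):

```cpp
#include <cstdint>

// Widening a signed 16bit value copies the sign bit into the upper bits.
// This is the effect movsx has on a register.
constexpr std::uint64_t widen_signed(std::int16_t v) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Widening an unsigned 16bit value fills the upper bits with zeros.
// This is the effect movzx has on a register.
constexpr std::uint64_t widen_unsigned(std::uint16_t v) {
    return v;
}

static_assert(widen_signed(-1) == 0xFFFFFFFFFFFFFFFFull, "sign bit replicated");
static_assert(widen_unsigned(0xFFFF) == 0xFFFFull, "upper bits zero");
static_assert(widen_signed(1) == widen_unsigned(1), "same for small positive values");
```

For an unsigned 16bit count like the one discussed here, zero extension is the widening that preserves the value.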
Anyway, I was able to disassemble the dynrec_movsb_word(), here's where the error occurs. The "if (count<(Bitu)CPU_Cycles)" is the culprit. The unoptimized version goes something like this:
mov si,di
mov WORD PTR [rbp-0x2], si
mov ax, WORD PTR [rbp-0x2]
movzx eax,ax
lea rcx,[<CPU_Cycles>]
mov ecx, DWORD PTR [rcx]
movsxd rcx,ecx
cmp rax,rcx
The DI register (RDI holds the first function parameter on x86_64) is zero extended before the compare (and moved to the stack and back 😜).
The optimized version goes something like:
lea rax,[<CPU_Cycles>]
movsxd rax,DWORD PTR [rax]
cmp rdi,rax
where full RDI is compared against CPU_Cycles. It will be extended later, but that cmp will obviously be wrong.
The x86_64 ABI does not specify if function parameters should be extended (edit: in the caller), "It's intentionally unspecified." But apparently GCC will extend the parameters in the called function while LLVM will not (and thus expects that the caller will). It's considered a bug, but I guess it wouldn't hurt to extend parameters in the generated code (according to some comments, extending would also avoid partial register stalls).
http://sourceware.org/ml/libffi-discuss/2013/msg00012.html
http://gcc.gnu.org/ml/gcc/2013-01/msg00448.html
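The two disassembly variants above can be modelled like this. It is a sketch only: rdi stands for the raw register content on entry to the function, the function names are invented, and the stale value 0x0C3D0001 and the cycle count 3000 are assumed example numbers.

```cpp
#include <cstdint>

// Sign-extend the 32bit cycle counter to 64 bits, as movsxd does.
constexpr std::uint64_t cycles_as_u64(std::int32_t cpu_cycles) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(cpu_cycles));
}

// Callee extends the parameter itself (the unoptimized code path):
// only the low 16 bits of the register take part in the compare.
constexpr bool less_with_callee_extension(std::uint64_t rdi, std::int32_t cpu_cycles) {
    return static_cast<std::uint16_t>(rdi) < cycles_as_u64(cpu_cycles);
}

// Callee trusts the caller (the optimized code path): the full register
// is compared, including whatever stale bits sit above bit 15.
constexpr bool less_without_extension(std::uint64_t rdi, std::int32_t cpu_cycles) {
    return rdi < cycles_as_u64(cpu_cycles);
}

// Stale upper bits flip the result of "count < CPU_Cycles":
static_assert(less_with_callee_extension(0x0C3D0001, 3000), "1 < 3000");
static_assert(!less_without_extension(0x0C3D0001, 3000), "0x0C3D0001 >= 3000");
// With a clean register both variants agree:
static_assert(less_without_extension(1, 3000), "1 < 3000");
```

This is why clearing or zero-extending the register in the generated caller code makes the result independent of which convention the compiler assumes.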
Wow, thanks for analyzing that.
Should I still try the patch? I couldn't react yesterday, I was sick the whole day...
No problem, hope you're feeling better today 😀
Both patches should work, but ripsaw's is probably slightly faster 😁
Tested and neither patch works 🙁
Both either crash DOSBox with
libc++abi.dylib: terminate called throwing an exception
Abort trap: 6
or only dos4gw crashes with either of those errors:
dos/16M error: [22] cannot free memory
dos/16M error: [40] not enough available extended memory (XMIN)
If I persist in running dos4gw, it will crash DOSBox on the second or third try.
*Sometimes* DOSBox prints the following to stdout before crashing:
Illegal read from af00fa54, CS:IP 70: 3fe7
Illegal read from af00fa55, CS:IP 70: 3fe7
Illegal read from af00fa56, CS:IP 70: 3fe7
Illegal read from af00fa57, CS:IP 70: 3fe7
Illegal read from af00fa50, CS:IP 70: 3fe7
Illegal read from af00fa51, CS:IP 70: 3fe7
Illegal read from af00fa52, CS:IP 70: 3fe7
Illegal read from af00fa53, CS:IP 70: 3fe7
Edit: also tested with -O0 and old (working) gcc-42 with -O2
gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
with -O0 both patches work and score about the same 35.1fps or 35.3fps
/developer-old/usr/bin/gcc-4.2 --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
with -O2 you get about 53.x or 54.x fps with both patches.
What compiler do you have then (and flags)? I thought I installed the latest xcode...