VOGONS


SVN on OS X - core=dynamic causes segfault

Reply 81 of 110, by Dominus

Rank DOSBox Moderator

Yes, it works now without crashing, though it is much slower than a 32bit build. Speed tested with pcpbench: 32bit got about 58 FPS, 64bit got about 22 FPS 😀

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox
DOSBox SVN snapshot for macOS (10.4-11.x ppc/intel 32/64bit) notarized for gatekeeper

Reply 83 of 110, by Dominus

He he, I was just sitting down at my desk and thought I'd better do it now or I'd forget about it again for a couple of weeks 😀

Reply 85 of 110, by Dominus

normal core 64bit: 1.4 FPS, 32bit: 5.8 FPS

Reply 87 of 110, by Dominus

Split off the "how to compile in 32bit on 64bit OS X" part...

BUT I'm having trouble with SVN in 64bit on OS X 10.8 and the dynamic core. For some reason it is unbearably slow. The PCPBenchmark isn't even starting anymore (I posted about this some time ago but only now found time to look into core=dynamic on OS X 64bit *again*). I *thought* it was some regression in SVN after this fix for the 64bit dynamic core, but testing revision 3674, which first had this fix, shows the same problems... So it must be something with either my machine or my OS X version. Both changed since the fix was applied.
Qbix had told me some time ago to mess with src/core_dynrec/decoder.h and comment out cases. I did that, with no reaction except when I commented out blocks of cases from this one downwards:

//		case 0xa6 to 0xaf string operations, some missing

// movsb/w/d
case 0xa4:dyn_string(STR_MOVSB);break;
case 0xa5:dyn_string(decode.big_op ? STR_MOVSD : STR_MOVSW);break;

With these commented out, dos4gw at least started, then crashed with a "no memory" warning 😀
gulikoza, can you help me? Do you have any idea?

Reply 89 of 110, by Dominus

thanks a lot. A simple test is just running Dos4GW.exe. With current OS X it just stalls on starting Dos4GW.

Reply 90 of 110, by gulikoza

Rank Oldbie

Great...it's a gcc issue. It works fine with -O0 for me.
I don't have any good ideas how to debug this as even when I set a breakpoint it will be set at the wrong location...

$ gcc -v
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

http://www.si-gamer.net/gulikoza

Reply 91 of 110, by Dominus

After I wrote here today, it began to dawn on me that it's perhaps a gcc issue ;(

Reply 92 of 110, by gulikoza

It might be a problem with the code after all. Can you try the following patch?
I have made no effort to see if any other functions are affected as well 😀

diff --git a/src/cpu/core_dynrec/risc_x64.h b/src/cpu/core_dynrec/risc_x64.h
--- a/src/cpu/core_dynrec/risc_x64.h
+++ b/src/cpu/core_dynrec/risc_x64.h
@@ -116,7 +116,11 @@ static INLINE void gen_memaddr(Bitu op,void* data,Bitu off) {
 // move a 32bit (dword==true) or 16bit (dword==false) value from memory into dest_reg
 // 16bit moves may destroy the upper 16bit of the destination register
 static void gen_mov_word_to_reg(HostReg dest_reg,void* data,bool dword) {
-    if (!dword) cache_addb(0x66);
+    if (!dword) {
+        cache_addb(0x31); // xor dest_reg to clear upper 16bit
+        cache_addb(0xc0+(dest_reg<<3)+dest_reg);
+        cache_addb(0x66);
+    }
     cache_addb(0x8b); // mov reg,[data]
     gen_reg_memaddr(dest_reg,data);
 }

The problem, as far as I could see, was that some stale part of the register was passed as a function parameter to dynrec_movsb_word()...count (the first parameter) was 0xC3D0001 instead of 1. If anybody has a better idea than xor, don't be shy 😀

ps: which is probably where gcc optimization kicks in...how could Bit16u count be larger than 0xFFFF?
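To illustrate how a value like 0xC3D0001 can arise, here is a minimal sketch that models a 64-bit host register as a plain integer. It is not DOSBox code: the helper names (write16, write32, xor_then_write16) and the assumed stale contents 0x0C3D0000 are made up for illustration. It relies only on the x86-64 rule that a 16-bit write leaves bits 16-63 untouched while a 32-bit write zeroes bits 32-63.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit host register; not part of DOSBox.
// A 16-bit write (66 8B /r) replaces only bits 0-15 of the register.
inline uint64_t write16(uint64_t reg, uint16_t value) {
    return (reg & ~uint64_t{0xFFFF}) | value;
}

// A 32-bit write zero-extends into the full 64-bit register.
inline uint64_t write32(uint64_t /*old contents are discarded*/, uint32_t value) {
    return value;
}

// xor r32,r32 followed by the 16-bit load, as in the patch above.
inline uint64_t xor_then_write16(uint64_t reg, uint16_t value) {
    reg = write32(reg, 0);  // the xor clears the whole register
    return write16(reg, value);
}

// Self-check of the model; the stale value is an assumption chosen to
// match the count reported in this thread.
inline void check_register_model() {
    const uint64_t stale = 0x0C3D0000;
    assert(write16(stale, 1) == 0x0C3D0001);  // stale upper bits survive
    assert(xor_then_write16(stale, 1) == 1);  // cleared before the load
}
```

Under this model the 16-bit load alone leaves the old upper bits in place, which matches the count seen in the debugger, while clearing the register first yields the expected 1.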

Reply 93 of 110, by ripsaw8080

Rank DOSBox Author

gulikoza wrote:

If anybody has a better idea than xor, don't be shy

A different approach, not sure if better:

-    if (!dword) cache_addb(0x66);
-    cache_addb(0x8b); // mov reg,[data]
+    if (!dword) cache_addw(0xb70f); // movzx reg,[data]
+    else cache_addb(0x8b); // mov reg,[data]
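
For comparison, a rough sketch of the opcode bytes each variant would emit for the 16-bit case. The Emitter struct is a hypothetical stand-in for DOSBox's cache_addb/cache_addw that appends to a vector instead of the code cache, and it assumes cache_addw stores the low byte first (as it would on a little-endian x86 host). The ModRM/address bytes produced by gen_reg_memaddr are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cache_addb/cache_addw; not DOSBox code.
struct Emitter {
    std::vector<uint8_t> bytes;
    void addb(uint8_t b) { bytes.push_back(b); }
    void addw(uint16_t w) {  // assumed little-endian: low byte first
        addb(static_cast<uint8_t>(w & 0xFF));
        addb(static_cast<uint8_t>(w >> 8));
    }
};

// Original code: optional operand-size prefix, then mov reg,[data].
inline std::vector<uint8_t> emit_original(bool dword) {
    Emitter e;
    if (!dword) e.addb(0x66);  // 16-bit operand size
    e.addb(0x8b);              // mov reg,[data]
    return e.bytes;
}

// xor variant: clear the register, then do the 16-bit load.
inline std::vector<uint8_t> emit_xor_variant(unsigned reg, bool dword) {
    assert(reg < 8);  // this sketch does not handle REX-extended registers
    Emitter e;
    if (!dword) {
        e.addb(0x31);  // xor r32,r32
        e.addb(static_cast<uint8_t>(0xc0 + (reg << 3) + reg));
        e.addb(0x66);
    }
    e.addb(0x8b);
    return e.bytes;
}

// movzx variant: one zero-extending load replaces prefix + mov.
inline std::vector<uint8_t> emit_movzx_variant(bool dword) {
    Emitter e;
    if (!dword) e.addw(0xb70f);  // 0F B7: movzx r32,r/m16
    else e.addb(0x8b);           // mov reg,[data]
    return e.bytes;
}
```

In this sketch the movzx variant emits two opcode bytes (0F B7) for a 16-bit load, against four bytes (31 C0 66 8B for register 0) for the xor variant; for dword loads all three emit the same single 8B.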

Reply 94 of 110, by gulikoza

gcc (at least on linux) uses movsx when passing short as a function parameter. The function comment says it is safe to destroy the upper part of the register. Perhaps that would be the safest bet? 😀

Reply 95 of 110, by gulikoza

Yeah, it uses movsx when passing short, but movzx when passing unsigned short 😜

Anyway, I was able to disassemble the dynrec_movsb_word(), here's where the error occurs. The "if (count<(Bitu)CPU_Cycles)" is the culprit. The unoptimized version goes something like this:

mov si,di
mov WORD PTR [rbp-0x2], si
mov ax, WORD PTR [rbp-0x2]
movzx eax,ax
lea rcx,[<CPU_Cycles>]
mov ecx, DWORD PTR [rcx]
movsxd rcx,ecx
cmp rax,rcx

The DI register (RDI holds the first function parameter on x86_64) is zero extended before the compare (and moved to the stack and back 😜).
The optimized version goes something like:

lea rax,[<CPU_Cycles>]
movsxd rax,DWORD PTR [rax]
cmp rdi,rax

where full RDI is compared against CPU_Cycles. It will be extended later, but that cmp will obviously be wrong.
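
The difference between the two compares can be sketched in plain C++. This is a hypothetical model, not the DOSBox function: rdi stands for the raw 64-bit register carrying the 16-bit count, and the cycle count of 3000 in the self-check is an arbitrary example value.

```cpp
#include <cassert>
#include <cstdint>

// Optimized build: cmp rdi,rax uses the full register as-is.
inline bool compare_unextended(uint64_t rdi, int32_t cpu_cycles) {
    // movsxd sign-extends CPU_Cycles; the compare itself is unsigned
    return rdi < static_cast<uint64_t>(static_cast<int64_t>(cpu_cycles));
}

// Unoptimized build: movzx eax,ax truncates to 16 bits before the compare.
inline bool compare_zero_extended(uint64_t rdi, int32_t cpu_cycles) {
    const uint64_t count = static_cast<uint16_t>(rdi);
    return count < static_cast<uint64_t>(static_cast<int64_t>(cpu_cycles));
}

// Self-check with the stale register value reported in this thread.
inline void check_compare_model() {
    const uint64_t rdi = 0x0C3D0001;  // intended count is 1
    assert(!compare_unextended(rdi, 3000));    // wrong branch taken
    assert(compare_zero_extended(rdi, 3000));  // expected result
}
```

With a clean register both functions agree, so the bug only shows when the caller leaves garbage above bit 15.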

The x86_64 ABI does not specify whether function parameters should be extended (edit: in the caller): "It's intentionally unspecified." But apparently GCC will extend the parameters in the called function while LLVM will not (and thus expects the caller to). It's considered a bug, but I guess it wouldn't hurt to extend parameters in the generated code (judging by some comments, extending would also avoid partial register stalls).

http://sourceware.org/ml/libffi-discuss/2013/msg00012.html
http://gcc.gnu.org/ml/gcc/2013-01/msg00448.html

Last edited by gulikoza on 2013-05-12, 11:44. Edited 1 time in total.

Reply 96 of 110, by Dominus

Wow, thanks for analyzing that.
Should I still try the patch? I couldn't react yesterday, I was sick the whole day...

Reply 98 of 110, by Dominus

Tested and neither patch works 🙁

Both either crash DOSBox with

libc++abi.dylib: terminate called throwing an exception
Abort trap: 6

or only dos4gw crashes with either of those errors:

dos/16M error: [22] cannot free memory
dos/16M error: [40] not enough available extended memory (XMIN)

If I persist in running dos4gw it will crash DOSBox on the second or third try.
*Sometimes* DOSBox prints the following to stdout before crashing:

Illegal read from af00fa54, CS:IP 70: 3fe7
Illegal read from af00fa55, CS:IP 70: 3fe7
Illegal read from af00fa56, CS:IP 70: 3fe7
Illegal read from af00fa57, CS:IP 70: 3fe7
Illegal read from af00fa50, CS:IP 70: 3fe7
Illegal read from af00fa51, CS:IP 70: 3fe7
Illegal read from af00fa52, CS:IP 70: 3fe7
Illegal read from af00fa53, CS:IP 70: 3fe7

Edit: also tested with -O0 and old (working) gcc-42 with -O2

gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
with -O0 both patches work and score about the same 35.1fps or 35.3fps

/developer-old/usr/bin/gcc-4.2 --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
with -O2 you get about 53.x or 54.x fps with both patches.
