VOGONS


First post, by kjliew

Rank Oldbie

Hi DOSBox devs,

The recompiling CPU core for x64 does not work on recent Intel CPUs when DOSBox is compiled under 64-bit Linux with gcc. It prints the following error when the CPU core switches to the dynamic recompiler, either explicitly with "core=dynamic" in the conf file or with "core=auto" when a protected-mode (DOS/4GW) DOS game is started.

Exit to error: DRC64:gen_reg_memaddr:Unhandled memory reference, data=0x7fa41a781206, diff=0x7fa4135e8b76

As you can see, I added extra output to print the data and diff values. I am running Ubuntu 14.04 LTS with gcc 4.8.4 from the standard Ubuntu repositories. A similar bug has also been filed by others on sourceforge.net:
http://sourceforge.net/p/dosbox/bugs/413/

An important note for the devs: this only happens with newer Intel CPUs. It *DOES NOT* reproduce on the older Core 2 based CPUs that I have access to. Here are the CPUs I have tried:
Core 2 E6400 (Conroe) - OK
Core 2 E8400 (Wolfdale) - OK
Core 2 Q9400 (Yorkfield) - OK
AMD C-60 - OK
Intel Celeron 847 (Sandy Bridge) - FAILED
Intel Core i3-4010u (Haswell) - FAILED

Reply 1 of 27, by Qbix

Rank DOSBox Author

Yeah, it is getting problematic.
Fixed some issues a while ago, but the newer compilers and the dynrec core aren't best friends....

Water flows down the stream
How to ask questions the smart way!

Reply 2 of 27, by gulikoza

Rank Oldbie

The problem, if I remember correctly, is that x64 doesn't have simple 64-bit absolute addressing. The DRC core currently handles only 2 cases:
- either data is below 2GB and can be referenced directly (as in 32-bit mode)
- or data is within ±2GB of the current instruction pointer (a 4GB window) and can be accessed RIP-relative

When using the small code model (which is the default AFAIK), program data is guaranteed to be below 2GB. I can't say for sure, but it would appear that allocated memory was always within ±2GB of RIP when we did our tests at the time. It seems this is no longer the case 😵

The memaddr functions would therefore need a third mode: load a 64-bit immediate into a free register and use that register to address memory.
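
The case split described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: AddrMode, fits_int32 and choose_mode are made-up names, not the actual DOSBox functions, and next_rip is assumed to be the address of the instruction that follows the one being emitted.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not taken from the DOSBox sources.
enum class AddrMode { RipRelative, Absolute32, Indirect64 };

// True if v can be encoded as a sign-extended 32-bit displacement.
constexpr bool fits_int32(std::int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

// data:     address the generated code needs to access
// next_rip: assumed address of the instruction after the one being emitted
constexpr AddrMode choose_mode(std::uint64_t data, std::uint64_t next_rip) {
    // Subtract modulo 2^64, then reinterpret the result as a signed distance.
    const std::int64_t diff = static_cast<std::int64_t>(data - next_rip);
    if (fits_int32(diff)) return AddrMode::RipRelative;       // mov [rip+diff], reg
    if (data < 0x80000000ULL) return AddrMode::Absolute32;    // mov [data], reg
    return AddrMode::Indirect64;  // mov tmp, data ; mov [tmp], reg
}
```

For instance, a data address of 0x7fe739350206 with code emitted near 0x7ad31e0 falls into neither of the first two cases, so a function like this would return AddrMode::Indirect64, which is the mode the DRC core is missing.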

http://www.si-gamer.net/gulikoza

Reply 3 of 27, by Qbix

Rank DOSBox Author

You are probably right. 64-bit asm isn't something I am very familiar with 🙁


Reply 4 of 27, by kjliew

Rank Oldbie

This still does not explain why things work on the older Core 2 and AMD C-60 CPUs. Can someone help check how AMD Richland and Kaveri CPUs behave with the dynamic recompiler core? Could the dynarec core be encoding x64 instructions that conflict with the new ISA extensions Intel added to the newer CPUs? Since I found that this fails on Sandy Bridge and Haswell, I believe Ivy Bridge will fail, too. Nehalem and Arrandale may not. I don't have enough data to draw conclusions on the AMD side. Hopefully we can gather more data to figure it out. Or it may uncover undocumented ISA behaviors.

The amount of RAM does not matter. It fails on my Intel Celeron 847 with 2GB RAM, and it works on my two other Core 2 desktops with 4GB and 8GB RAM respectively.

Reply 5 of 27, by gulikoza

Rank Oldbie

Maybe glibc optimizes malloc based on cpu features? Or maybe that 64-bit diff calculation is somehow messed up 😀

It's really pretty simple. cache.pos will be RIP (the instruction pointer) when dynrec executes that particular instruction. data is the memory address we need to [load into]/[save from] a register. These are helper functions that are used later when instructions are compiled. For instance, to do a gen_mov_word_from_reg (store the value of a register to a memory address), you need to issue something like:

mov [data],ecx

If the memory address is <2GB, we can simply do that and address it directly, the same as on 32-bit x86 (data will be an absolute 32-bit address). If it's within ±2GB of cache.pos, it can be accessed with diff, and the instruction issued is something like:

mov [rip+diff], ecx

If neither is true...dynrec core can't access it at the moment, since it would need to issue something like:

mov rax, data   ; load the 64-bit address into rax
mov [rax], ecx

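You cannot do 'mov [data],ecx' because there is no general instruction form that takes a full 64-bit address as an operand (the only direct 64-bit form is the special moffs encoding of mov, which works with the accumulator register only).
You can try printing cache.pos as well, and then we'll see exactly what the diff is 😀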
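
To make the three forms concrete, here is a hedged C++ sketch of the machine-code bytes for storing ecx to memory in each case. The struct and function names are invented for this example and are not DOSBox's emitter API; the encodings follow the standard x86-64 ModRM/SIB rules. The third form clobbers rax, so a real emitter would have to pick a register that is free at that point.

```cpp
#include <cstddef>
#include <cstdint>

// Small fixed-size byte buffer; hypothetical helper type for this sketch.
struct Code {
    std::uint8_t bytes[16];
    std::size_t  len;
};

constexpr void put8(Code& c, std::uint8_t b) { c.bytes[c.len++] = b; }

// Append the low nbytes of v in little-endian order.
constexpr void put_le(Code& c, std::uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; ++i)
        put8(c, static_cast<std::uint8_t>(v >> (8 * i)));
}

// mov [disp32], ecx  ->  89 0C 25 disp32  (ModRM + SIB, no base, no index).
// disp32 is sign-extended by the CPU, so this only reaches addresses below 2GB.
constexpr Code store_ecx_abs32(std::uint32_t addr) {
    Code c{};
    put8(c, 0x89); put8(c, 0x0C); put8(c, 0x25);
    put_le(c, addr, 4);
    return c;
}

// mov [rip+disp32], ecx  ->  89 0D disp32
constexpr Code store_ecx_rip(std::int32_t disp) {
    Code c{};
    put8(c, 0x89); put8(c, 0x0D);
    put_le(c, static_cast<std::uint32_t>(disp), 4);
    return c;
}

// mov rax, imm64 ; mov [rax], ecx  ->  48 B8 imm64 ; 89 08
constexpr Code store_ecx_via_rax(std::uint64_t addr) {
    Code c{};
    put8(c, 0x48); put8(c, 0xB8);
    put_le(c, addr, 8);
    put8(c, 0x89); put8(c, 0x08);
    return c;
}
```

The register-indirect form takes 12 bytes against 6 or 7 for the other two, which is one reason an emitter might keep the shorter encodings whenever they apply.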


Reply 6 of 27, by kjliew

Rank Oldbie
0x7ad31db 0x7ad31dc 0x7ad31dd 0x7ad31de 0x7ad31df 0x7ad31e0 0x7ad31e1 0x7ad31e2 0x7ad31e3 0x7ad31e4 0x7ad31e5 0x7ad31e6 0x7ad31e7 0x7ad31e8 0x7ad31e9 0x7ad31ea 
data=0x7fe739350206, diff=0x7fe73187d026
Exit to error: DRC64:gen_reg_memaddr:Unhandled memory reference

The 1st line - cache.pos+0 to cache.pos+0x10. Dereferencing cache.pos yields all '0', which makes sense as the opcodes have not been filled in yet.
The 2nd line - data and diff; as you can see, diff = data - (cache.pos+5).
It seems that the diff calculation goes seriously wrong when data is a full 64-bit pointer while cache.pos is a pointer that fits in 32 bits and never exceeds 2GB. The diff calculation seems to assume that cache.pos and data are 64-bit pointers close to each other, so that the upper 32 bits cancel out in the subtraction and the remaining lower 32 bits become the RIP-relative displacement.
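
The arithmetic can be checked with a small C++ sketch. The function names are hypothetical, and the instruction length of 5 is simply taken from the cache.pos+5 in the debug output above. The second function shows what address the CPU would really access if an out-of-range diff were truncated to a 32-bit displacement.

```cpp
#include <cstdint>

// diff as printed in the debug output: data minus the address of the
// next instruction (cache.pos + insn_len). Names are made up for this sketch.
constexpr std::int64_t rip_diff(std::uint64_t data, std::uint64_t cache_pos,
                                unsigned insn_len) {
    return static_cast<std::int64_t>(data - (cache_pos + insn_len));
}

// Address a RIP-relative access would hit if diff were truncated to disp32:
// only the low 32 bits survive and are sign-extended by the CPU.
constexpr std::uint64_t effective_address(std::uint64_t next_rip,
                                          std::int64_t diff) {
    const std::int32_t disp = static_cast<std::int32_t>(diff);
    return next_rip + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}
```

With the failing values (data=0x7fe739350206, cache.pos=0x7ad31db), rip_diff gives 0x7fe73187d026, and truncating that to 32 bits would point at 0x39350206 instead of data. With the values from a working run, the diff fits in 32 bits and the computed address equals data again.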

Does the same code work on 64-bit Windows when compiled with MinGW-w64?

Reply 7 of 27, by gulikoza

Rank Oldbie

If cache.pos is <2GB, that would mean malloc is still returning memory below 2GB. If malloc returned a higher address, then of course cache.pos would be larger...

The question is what exactly is at 0x7fe739350206, as program data should be below 2GB as well when using the small code model.

edit: what I'm trying to say is that data might not even be correct, and the DRC error might simply be a result of something else that has gone wrong. Can you dereference *data, or do you get a segfault?


Reply 8 of 27, by kjliew

Rank Oldbie
0x7f182519a501 0x7f182519a502 0x7f182519a503 0x7f182519a504 0x7f182519a505 0x7f182519a506 0x7f182519a507 0x7f182519a508 0x7f182519a509 0x7f182519a50a 0x7f182519a50b 0x7f182519a50c 0x7f182519a50d 0x7f182519a50e 0x7f182519a50f 0x7f182519a510 
*data=100005ff8
gen_reg_memaddr: data=0x8dd140, diff=0xffff80e7db742c3a

0x7f182519a58c 0x7f182519a58d 0x7f182519a58e 0x7f182519a58f 0x7f182519a590 0x7f182519a591 0x7f182519a592 0x7f182519a593 0x7f182519a594 0x7f182519a595 0x7f182519a596 0x7f182519a597 0x7f182519a598 0x7f182519a599 0x7f182519a59a 0x7f182519a59b
*data=7f1825994210
gen_reg_memaddr: data=0x2d021f8, diff=0xffff80e7ddb67c67

0x7f182519a613 0x7f182519a614 0x7f182519a615 0x7f182519a616 0x7f182519a617 0x7f182519a618 0x7f182519a619 0x7f182519a61a 0x7f182519a61b 0x7f182519a61c 0x7f182519a61d 0x7f182519a61e 0x7f182519a61f 0x7f182519a620 0x7f182519a621 0x7f182519a622
*data=1587
gen_reg_memaddr: data=0x8dcfc8, diff=0xffff80e7db7429b0

Here are a few samples from the "GOOD" Core 2 Q9400. Now cache.pos is >4GB while data is always <2GB. This produces a diff which satisfies the 1st if{...}. *data is dereferenced as (unsigned long *), but I don't think this matters at all, as the function does not seem to dereference data. data is used as a pointer type throughout the function.

Reply 9 of 27, by gulikoza

Rank Oldbie

Actually it shouldn't use the first if as diff is clearly less than -2GB 😉
Dereferencing data is what this function is trying to do in asm. If the address in data is <2GB, then it can be addressed directly.

The interesting thing would be if the high address of data in "BAD" case is correct, which could be tested by printing it here in the function and seeing if it segfaults.


Reply 10 of 27, by kjliew

Rank Oldbie
gen_reg_memaddr: data=0x7f48f79d1206, diff=0x7f48efa64b76, cache.pos=0x7f6c690 (57f8026fa5e03f8)
Exit to error: DRC64:Unhandled memory reference

gen_reg_memaddr: data=0x7f4e80d42206, diff=0x7f4e799f48f6, cache.pos=0x734d910 (57f8026fa5e03f8)
Exit to error: DRC64:Unhandled memory reference

gen_reg_memaddr: data=0x7f8219878206, diff=0x7f8211012306, cache.pos=0x8865f00 (57f8026fa5e03f8)
Exit to error: DRC64:Unhandled memory reference

The (...) is the result of dereferencing data as (unsigned long *). The cache.pos shown is really (cache.pos+5). It did not trigger a segmentation fault. I tried 3 runs and *data had the same dereferenced value each time.

@gulikoza: BTW what's your existing setup for compiling DOSBox? Do 64-bit Windows compilers from Visual Studio have the same issue?

gen_reg_memaddr: data=0x7fab9a108206, diff=0xb8c272b, cache.pos=0x7fab8e845adb (57f8026fa5e03f8)

On the "GOOD" Core 2 Q9400, *data also dereferences to the same value, but cache.pos is above 4GB, which satisfies the 1st if{...}

Reply 11 of 27, by gulikoza

Rank Oldbie

I haven't actually compiled 64-bit DOSBox in a while...but when I did, I used Linux. I don't know if the dynrec core was ever really tested on Windows (as it is slower than dynamic, and 32-bit DOSBox works just fine).

OK, the second output indicates that we would indeed need the third case of addressing data.
I've actually looked into this briefly, but it would need a somewhat larger modification, since opcode prefixes are added to the cache before gen_reg_memaddr is called...


Reply 12 of 27, by vext01

Rank Newbie

Hi,

I was the guy that raised bug #413. FWIW, I'm still having that issue.

FWIW, my setup is an x86_64 laptop running OpenBSD-current. I've tried compiling with g++-4.2.1 and g++-4.9.3 on today's SVN and the issue persists. I tried with clang++-3.5 and got:

Making all in core_dynrec
clang++ -DHAVE_CONFIG_H -I. -I../.. -I../../include -I/usr/local/include -I/usr/local/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT -I/usr/X11R6/include -DXTHREADS -g -O2 -mno-ms-bitfields -MT callback.o -MD -MP -MF .deps/callback.Tpo -c -o callback.o callback.cpp
clang-3.5: error: unknown argument: '-mno-ms-bitfields'

I kludged the configure script to not use this flag, but anyway it seems the autoconf check for this feature is broken.
Once kludged, it builds, and the outcome is the same.

My CPU:
cpu0: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 1197.48 MHz

In case it helps, OpenBSD does not use glibc. They have their own libc.

As I said in my bug report, I tried a troublesome game under debian/dosbox on the same machine, and it worked. So I'm not certain it is anything to do with the CPU.

FWIW, OpenBSD has pretty extensive ASLR. IIRC PIE is the default, so maybe this is why memory is being allocated higher than expected(?).
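
One way to test the ASLR/PIE theory would be a small standalone C++ diagnostic that prints where static data and a heap block land in the process. This is only a sketch with made-up names (below_2gb, print_address_report, g_static_data); it is not part of DOSBox, and the results will vary with the OS, the allocator and the link flags.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// True if the address is in the low 2GB, i.e. directly addressable
// with a 32-bit absolute displacement.
constexpr bool below_2gb(std::uint64_t addr) { return addr < 0x80000000ULL; }

static int g_static_data = 0;  // stands in for ordinary program data

// Print where static data and a freshly malloc'ed block ended up.
void print_address_report() {
    void* heap = std::malloc(4096);
    const auto s = reinterpret_cast<std::uintptr_t>(&g_static_data);
    const auto h = reinterpret_cast<std::uintptr_t>(heap);
    std::printf("static data: 0x%llx (%s 2GB)\n",
                static_cast<unsigned long long>(s),
                below_2gb(s) ? "below" : "above");
    std::printf("heap block:  0x%llx (%s 2GB)\n",
                static_cast<unsigned long long>(h),
                below_2gb(h) ? "below" : "above");
    std::free(heap);
}
```

Calling print_address_report() from main() on both a working and a failing machine would show whether the difference is in where the allocator and the loader place memory rather than in the CPU itself.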

I am happy to try other things out if anyone has any ideas.

Cheers

Reply 15 of 27, by gulikoza

Rank Oldbie

Here's a first shot. It only tries to fix gen_reg_memaddr; gen_memaddr will still exit with the same error if it occurs.
The new code should work in every case (it doesn't need the existing if() blocks), but it is probably slightly slower since it uses more instructions (then again, this is just speculation, as I have no idea which is faster). So I've left the if()s as they were...

In a quick test, the new code started Prince, Transport Tycoon & Settlers2 successfully.

Attachments


Reply 16 of 27, by vext01

Rank Newbie

Hey,

I applied your patch, and indeed, gen_memaddr() is hitting the crashout case.

I added the following line:
printf("data = %p, diff=%p\n", data, diff);

and tested with the game "Fragile Allegiance" (which always crashes after the cracktro). I get:

DOSBox version SVN
Copyright 2002-2015 DOSBox Team, published under GNU GPL.
---
CONFIG:Loading primary settings from config file /home/edd/.dosbox/dosbox-SVN.conf
MIXER:Got different values from SDL: freq 44100, blocksize 882
MIDI:Opened device:none
Using joystick /dev/uhid0 with 0 axes, 0 buttons and 0 hat(s)
Using joystick /dev/uhid1 with 0 axes, 0 buttons and 0 hat(s)
DOSBox switched to max cycles, because of the setting: cycles=auto. If the game runs too fast try a fixed cycles amount in DOSBox's options.
data = 0x14f4d1c6a130, diff=0xfffffffda75850f9
Exit to error: DRC64:Unhandled memory reference

Cheers.

Reply 17 of 27, by gulikoza

Rank Oldbie

So I have to do that one as well...sigh

I feel somewhat less confident about this one, but it loads Settlers2 for me 😀

Attachments


Reply 18 of 27, by vext01

Rank Newbie

Well, it doesn't crash now. I've managed to start a new game! I can't vouch for the correctness of the diff, but at least we run now!

Something is clearly still not right though. It took a good 10 minutes to get to the main menu of the game. Once at the menu, the game runs smooth and fluid, but up until that point performance is really bad.

I looked at top during the bad performance phase to see what the CPU was up to, and dosbox was consuming 0%, stuck in biowait. In other words, dosbox is doing very little actual work at this time. It's unlikely that your change introduced the slowdown, since a similar thing happens under the 'normal' core.

This is likely a separate issue (?). Can you try "Fragile Allegiance" on SVN head and see if you can repro?

Cheers

Reply 19 of 27, by gulikoza

Rank Oldbie

Hmm...The game shows the intro screens immediately after start, but then I get a DOS4GW transfer stack overflow randomly on different screens. This happens with both the normal and dynamic (patched & unpatched) cores. I've changed memsize to 32 and actually got to the main menu once, but the next startup crashed on the second screen.

I'm running this over ssh with X forwarding to my Windows desktop, so it might not be the best way, but I don't have Linux on my desktop at the moment 😀 I'll see if the game runs on a 32-bit Windows build...
