Reply 20 of 123, by jmarsh
AFAIK the difference between 1 and 2 is that at level 2, memory returned by brk() is also randomized.
Data segments can't be randomized because the text segment contains fixed RIP-relative offsets to them.
wrote:kjliew, does disabling randomization of the virtual address space help?
I am not interested in tinkering with the kernel to get things to work. The core should be able to handle any form of the CPU instructions dealing with relative and absolute offsets.
Several observations so far:
- Real-mode code is usually OK, even when the dynamic core is forced from the start. GODS played over 2 Worlds at Level 1. However, DUNE2 errors out with the same issue. DUNE2 is purely real-mode code, but it makes use of EMS memory.
- With DOS4G/DOS4GW the error is almost certain to show up, perhaps while the protected-mode stub is being set up.
- CWSDPMI is OK. Quake1 will run, but at some point it will error out.
@krcroft, were you able to run any DOS4G/DOS4GW games without replacing the stub on your Core i7-6700K, e.g. WarCraft?
@jmarsh, the error is rarely reproducible on Windows. Did you get a chance to reproduce it consistently to address the issue?
The old Core2 Quad with ArchLinux is fine. It completed a W98SE installation all the way from a blank image.
The Core i7-6000 series is fine, too, on Windows 10 Pro. It booted W98SE without any error.
wrote:@krcroft, were you able to run any DOS4G/DOS4GW games without replacing the stub on your Core i7-6700K, e.g. WarCraft?
kjliew, I unfortunately wasn't able to reproduce any crashes or failures, regardless of whether kernel.randomize_va_space was set to 0 or 2. I've tried:
- Space Hulk with dos4gw.exe 1.92 and 2.01a
- Redneck Rampage with 1.97 and 2.01a
- Warcraft I with 2.01a
- Warcraft II, with war2.exe's stub replaced with dos32a 9.1.2
wrote:kjliew, I unfortunately wasn't able to reproduce any crashes or failures, regardless of whether kernel.randomize_va_space was set to 0 or 2.
Thanks. I would expect similar results, since my Core i7-6600U laptop does not show any errors either.
You need to test games that use self-modifying code, which causes hostmem to get accessed directly by the emitted code (for frequently modified instructions). Games using the Build engine are the most common example so if RR didn't trigger it you likely won't have an issue.
There's not much point going further into it now anyway; the way the code cache is allocated needs to be switched from malloc to mmap (since using mprotect on malloc'd memory is not meant to be done), and that can affect the returned address.
wrote:You need to test games that use self-modifying code, which causes hostmem to get accessed directly by the emitted code (for frequently modified instructions). Games using the Build engine are the most common example so if RR didn't trigger it you likely won't have an issue.
For the problematic Intel Haswell CPU and ArchLinux (and I presume the same for most Linux distros with a recent kernel), it only takes a simple DOS4GW application to reproduce the error. In my case, the simple TEST04 demo from the 3Dfx Glide2 SDK suffices. I am pretty sure the code is not self-modifying. PCPBENCH is another good one. The bad RIP address error reproduces easily and consistently.
@Dominus had an even worse error scenario on macOS, where just switching the core at the DOSBox command prompt would trigger the bad RIP address error; I couldn't reproduce that on ArchLinux. Since it's on macOS, it has to be an Intel CPU, but there weren't further details about the CPU family.
Anyway, let's see if the mmap allocation would fix the issue for all. I am looking forward to the next patch update.
After jmarsh told me to replace the first line of gen_jmp_ptr() with "opcode(0).set64().setimm((Bitu)ptr,8).Emit8Reg(0xA1);", I had no more crashes and an extreme speed gain.
First post updated with the latest patch and a Windows build.
I've just tried the Windows build and it's faster indeed. Great job! 😎
Is it possible to port any of the performance improvements back to the x86 dynamic core for 32-bit OSes?
From the first post:
Results look good; a huge speed boost over current 64-bit builds and around 10-20% improvement over 32-bit dyn_x86.
The post did not distinguish between testing on a 32-bit OS and on a 64-bit OS. The former leads to errors in my build, but I need to check further before I can submit a useful report.
Are you saying the patch breaks 32-bit builds?
Not yet; I haven't tested enough. I would need a better control build. Warcraft 2 runs, but Quake led to a page fault on startup.
Thank you for the great contribution with this cpu core.
If you haven't fetched the latest SVN commit (r4252), that would be a problem.
I had that commit. Edit: I'll verify I didn't overwrite the commit while patching.
Verified that the r4252 commit is present. The 32-bit build works now, but only after undefining X86_DYNFPU_DH_ENABLED and using the non-x86 FPU interpreter code instead. In this non-standard configuration it runs significantly slower than the previous dynamic x86 CPU core on 32-bit Windows.
It seems this core makes few changes to the 32-bit dynamic x86 core apart from removing redundant x86 FPU code.
Why are you undefining X86_DYNFPU_DH_ENABLED and making it use the slower code?
For a working performance comparison against the original x86 dynamic core. The result was unexpected, since there are few relevant changes.