VOGONS


64-bit dynamic_x86 (patch)


First post, by jmarsh

Rank Oldbie

Since the current 64-bit builds of DOSBox only support the slower dynrec core, I decided it was time to port the dynamic_x86 core.
Results look good; a huge speed boost over current 64-bit builds and around 10-20% improvement over 32-bit dyn_x86. Just in time for Apple and Ubuntu to drop 32-bit support...
It needs lots of testing since it is basically a new core. Different OSes, compilers, PIE, ASLR, etc. can all trigger different codegen strategies (which hopefully have all been implemented). Ideally the x32 ABI would be used instead and we could go back to assuming all pointers fit in 32 bits, but that's not likely to happen any time soon.
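
As a rough illustration of the "all pointers fit in 32 bits" assumption, here is a minimal C++ sketch of the kind of check a code generator would need before embedding a host address as a 32-bit value. This is not code from the patch; the function names are made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper: true when a host address can be stored in 32 bits
// without losing information (the assumption the 32-bit core could rely on).
bool fits_in_32_bits(std::uint64_t addr) {
    return addr <= 0xFFFFFFFFull;
}

// Convenience overload for real pointers.
bool pointer_fits_in_32_bits(const void* p) {
    return fits_in_32_bits(reinterpret_cast<std::uintptr_t>(p));
}
```

On a typical x86-64 system with PIE and ASLR, heap and stack addresses often fail this check, which is why a 64-bit port cannot simply reuse absolute 32-bit addressing everywhere.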

Patch may need slight formatting adjustments due to being generated by git.

Attachments

  • Filename
    64b-dyn_x86.patch
    File size
    100.39 KiB
    Downloads
    151 downloads
    File license
    Fair use/fair dealing exception
  • Filename
    dosbox-x64-win.exe
    File size
    3.35 MiB
    Downloads
    125 downloads
    File license
    Fair use/fair dealing exception
Last edited by jmarsh on 2019-07-01, 17:34. Edited 3 times in total.

Reply 1 of 123, by kjliew

Rank Oldbie

Wow! I never thought that this could be done. Kudos, jmarsh!

On MSYS2/mingw-w64 current, everything built, and 64-bit DOSBox and a few DOS4GW games checked out fine on Windows 10 Pro x64 with great performance, including both Glide pass-through and Voodoo chip emulation. This is with GCC 9.10/binutils 2.30 with -O3 and native optimization. I will be testing across different Intel/AMD CPUs, both desktop and laptop class. This is very exciting news for me: I recently adopted ArchLinux for x86_64 and its rolling updates, knowing that DOSBox would be slower on x86_64 and that there is no official i686 multilib support. I will get back with more results from the ArchLinux build, including running Windows 98 from DOSBox 😵

I hope your contribution gets accepted upstream.

Reply 2 of 123, by cyclone3d

Rank l33t++

Great! Bookmarked for future use.

Yamaha modified setupds and drivers
Yamaha XG repository
YMF7x4 Guide
Aopen AW744L II SB-LINK

Reply 3 of 123, by DosFreak

Rank l33t++

AFAIK, Ubuntu is not dropping 32bit application support in 19.10 just the libraries so it'll be up to the developer of the application to supply them or for the user to hunt them down and add them manually. More than likely the people who care will switch to Debian rather than going through the hassle of using chroot, LXD, snaps, flatpak or VMs for all their 32bit programs. Ubuntu is just being lazy. It is pretty sad seeing the comments from people who don't realize how much software is still 32bit and dismiss people's concerns.

How To Ask Questions The Smart Way
Make your games work offline

Reply 5 of 123, by kjliew

Rank Oldbie
DosFreak wrote:

AFAIK, Ubuntu is not dropping 32bit application support in 19.10 just the libraries so it'll be up to the developer of the application to supply them or for the user to hunt them down and add them manually.

That really means dropping 32-bit support. No offence 😀 ArchLinux is spearheading this. Ubuntu is just following.

Reply 6 of 123, by kjliew

Rank Oldbie

Hi jmarsh!

ArchLinux build was great, too, really great performance from the new 64-bit dyn_x86 core. I even have a record score running AutoCAD R13 with BENCH13 from SoftEngine. I think it is very well optimized for modern 32-bit DOS protected-mode with DOS extenders.

Win98SE boots too, but less deterministically than with the previous dyn_x86 core compiled for i686 or the dynrec core compiled for x86_64. If Win98SE runs SCANDISK during Windows boot, it gets flooded with divide overflow errors. This is the most frequent failure. The next Windows boot then automatically selects Safe Mode, the core hangs a couple of times, and eventually Safe Mode comes up. After a proper shutdown from Safe Mode, the next boot works and, voilà, Windows boots. The illegal read/write [ADDR] from CS:IP and PageFault messages are normal; both the previous dyn_x86 and dynrec cores showed the same messages on successful Windows boots.

Copyright 2002-2019 DOSBox Team, published under GNU GPL.
---
CONFIG: Loading primary settings from config file /tmp/win.conf
Memory sizes above 31 MB are NOT recommended.
Stick with the default values unless you are absolutely certain.
MIXER: Got different values from SDL: freq 44100, blocksize 512
MIDI: Opened device:mt32
Glide:LFB access: read-write (no aux)
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal write to dad45000, CS:IP 28:c02322b1
Illegal write to dad45001, CS:IP 28:c02322b1
Illegal write to dad45002, CS:IP 28:c02322b1
Illegal write to dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal write to dad45000, CS:IP 28:c02322b1
Illegal write to dad45001, CS:IP 28:c02322b1
Illegal write to dad45002, CS:IP 28:c02322b1
Illegal write to dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal write to dad45000, CS:IP 28:c02322b1
Illegal write to dad45001, CS:IP 28:c02322b1
Illegal write to dad45002, CS:IP 28:c02322b1
Illegal write to dad45003, CS:IP 28:c02322b1
PageFault at 8088A000 type [6] queue 1
Left PageFault for 8088a000 queue 1
VOODOO: OpenGL: mode set, resolution 800:600
VOODOO: OpenGL: quit


Reboot requested, quitting now.


Once in Windows, it ran great most of the time. Intermittently it would hit "Reboot requested, quitting now" when exiting a program/game (e.g. GLQuake/Quake2). I guess this could be a fault within the fault handler (triple fault). It also reproduces consistently when trying to shut down Windows properly: it never reaches the "It is Safe to Turn Off" screen, but ends in "Reboot requested, quitting now" on the DOSBox console, and the next boot triggers SCANDISK and the whole story repeats. The behavior is the same on Windows 10 Pro x64 and ArchLinux.

I wonder if you would consider making the new 64-bit dyn_x86 core support Win98. Though running Win98 in DOSBox is unsupported, both previous dynamic cores run Win98 pretty well (with some quirks). I understand that debugging Win98 quirks is challenging.

Reply 8 of 123, by krcroft

Rank Oldbie

Can't say enough good stuff about this work, jmarsh!

This ensures DOSBox will be a first-class 64-bit binary, offering the best performance on modern free software platforms, without the pile of obsolete or kludged-in 32bit libraries that were previously required to get the best performance.

Given that open source labour isn't free, yet the amount of software and the number of security issues continue to grow, it's only a matter of time before more distributions shed support for platforms that have long been replaced (AMD's 64bit processors were selling 16 years ago, and Intel's one year later with the Pentium 4 model F).

Yeah, our community here is unique with many of us still tinkering with very old hardware, and for those with 32bit-only OSes there will still be the 32bit dynarec core.

Regarding X32 on Linux - it's been a while since I played with this on Gentoo. If the kernel, glibc, build-stack, and X11 libraries support it, then presumably we could get a pure static build of dosbox out of it (assuming building sdl, audio libs, and so on from source to static libs). Feels like a fair bit of work though, and this hybrid binary format never gained traction with distros.

I also looked into managing your own 32bit "pointers" using an index table (instead of being forced to use 64bit pointers), but the performance gains aren't there (or are negligible), while code cleanliness suffers from the additional layer of indirection.
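
For anyone wondering what the index-table idea looks like, here is a minimal sketch, assuming a simple append-only table. The class and method names are hypothetical and this is not what I actually benchmarked; it only shows where the extra indirection comes from.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical table that hands out 32-bit handles for 64-bit host pointers.
class PointerTable {
public:
    // Stores the pointer and returns its 32-bit handle (its index in the table).
    std::uint32_t add(void* p) {
        slots_.push_back(p);
        return static_cast<std::uint32_t>(slots_.size() - 1);
    }

    // Resolves a handle back to the pointer; throws std::out_of_range if invalid.
    void* get(std::uint32_t handle) const {
        return slots_.at(handle);
    }

private:
    std::vector<void*> slots_;
};
```

Every access through a handle costs an extra table lookup compared to dereferencing a plain pointer, which is the added layer of indirection mentioned above.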

Can't wait to give this a spin on my systems!

Reply 9 of 123, by kjliew

Rank Oldbie
jmarsh wrote:

Unless something works with core=normal, it's out of scope for dynamic cores.

I am sorry, I didn't quite get what you mean. DOSBox dynamic cores (dyn_x86 & dynrec) work pretty well for Win9x, even if that was just a coincidence. In fact, core=normal has never worked for Win9x and is too slow.

All the CPU-heavy, late 90's DOS games at my disposal work really well with the new 64-bit dynamic_x86. I couldn't find anything that doesn't run perfectly. I don't have Win3.1/WfW3.11 to check. Hopefully someone will do that shortly, if that is what would expose the differences in running Win9x between the old and new dynamic cores.

Reply 10 of 123, by krcroft

Rank Oldbie
jmarsh wrote:

Results look good; a huge speed boost over current 64-bit builds and around 10-20% improvement over 32-bit dyn_x86.

Quake is now buttery smooth with software rendering.

Speed test
Quake.exe, timedemo demo1

  • 33.7 FPS dosbox-r4236
  • 182.4 FPS dosbox-r4236-jmarsh-x86_64-dynarec

Test environment

  • CPU: Quad Core Intel Core i7-6700K (-MT MCP-) speed/min/max: 3900/800/4300 MHz
  • Kernel: 5.1.7-kyber-20ms-bmq96-100hztick-fullnohz-pre-teo x86_64
  • Up: 10d 20h 15m
  • Mem: 3840.0/15886.3 MiB (24.2%)
  • Storage: 19.45 TiB (78.6% used)
  • Procs: 310
  • Shell: bash 5.0.3
  • inxi: 3.0.33

For those running Linux who want to try jmarsh's patch but don't have a build environment, here's my patched DOSBox 64bit Linux binary (sdl and audio deps are statically-linked while x11 libs are dynamically-linked, and optimized with generic-tuning)

Reply 11 of 123, by Dominus

Rank DOSBox Moderator

I wrote to you on IRC, but in case it gets lost: on OS X 10.14, DOSBox crashes with this patch as soon as the dynamic core is activated (for example by the command "core dynamic").
The error message is:
Exit to error: DynCore: illegal option in opcode::Emit: bad RIP address

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox
DOSBox SVN snapshot for macOS (10.4-11.x ppc/intel 32/64bit) notarized for gatekeeper

Reply 12 of 123, by kjliew

Rank Oldbie

Finally, I nailed down the divide overflow issue with the new 64-bit dyn_x86 core. The IDIV/DIV word division helpers should be updated to take care of 64-bit Bitu type. Otherwise, an unsigned word division with DX:AX > 0x7fffffff will result in divide overflow exception when (quo!=quo16).

After the patch, Win98SE booted and I no longer observed the storm of divide overflow exceptions. Hurray 😀 ! And all games enjoy a very nice speed boost!

--- ../orig/r4237/src/cpu/core_dyn_x86/helpers.h        2019-06-23 21:24:56.955482000 -0700
+++ ./src/cpu/core_dyn_x86/helpers.h 2019-06-23 22:17:48.359367200 -0700
@@ -40,8 +40,8 @@

 static bool dyn_helper_divw(Bit16u val) {
 	if (!val) return CPU_PrepareException(0,0);
-	Bitu num=(reg_dx<<16)|reg_ax;
-	Bitu quo=num/val;
+	Bit32u num=(reg_dx<<16)|reg_ax;
+	Bit32u quo=num/val;
 	Bit16u rem=(Bit16u)(num % val);
 	Bit16u quo16=(Bit16u)(quo&0xffff);
 	if (quo!=(Bit32u)quo16) return CPU_PrepareException(0,0);
@@ -52,8 +52,8 @@

 static bool dyn_helper_idivw(Bit16s val) {
 	if (!val) return CPU_PrepareException(0,0);
-	Bits num=(reg_dx<<16)|reg_ax;
-	Bits quo=num/val;
+	Bit32s num=(reg_dx<<16)|reg_ax;
+	Bit32s quo=num/val;
 	Bit16s rem=(Bit16s)(num % val);
 	Bit16s quo16s=(Bit16s)quo;
 	if (quo!=(Bit32s)quo16s) return CPU_PrepareException(0,0);
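
The effect of the type change can be reproduced outside DOSBox. The sketch below is a standalone approximation, not the helper itself: it assumes reg_dx and reg_ax are 16-bit values, so that (reg_dx<<16)|reg_ax is computed as a 32-bit int and gets sign-extended when stored in a 64-bit unsigned type. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Raw 32-bit pattern of DX:AX.
std::uint32_t pack_dx_ax(std::uint16_t dx, std::uint16_t ax) {
    return (static_cast<std::uint32_t>(dx) << 16) | ax;
}

// Approximates the pre-patch helper on a 64-bit build: the 32-bit int result
// is sign-extended into a 64-bit numerator before the division.
bool divw_overflows_64(std::uint16_t dx, std::uint16_t ax, std::uint16_t val) {
    assert(val != 0);  // the real helper handles division by zero separately
    const std::uint32_t bits = pack_dx_ax(dx, ax);
    const std::uint64_t num =
        (bits & 0x80000000u) ? (0xFFFFFFFF00000000ull | bits) : bits;
    const std::uint64_t quo = num / val;
    return quo != static_cast<std::uint16_t>(quo & 0xffff);
}

// Approximates the patched helper: numerator and quotient stay 32-bit.
bool divw_overflows_32(std::uint16_t dx, std::uint16_t ax, std::uint16_t val) {
    assert(val != 0);
    const std::uint32_t num = pack_dx_ax(dx, ax);
    const std::uint32_t quo = num / val;
    return quo != static_cast<std::uint16_t>(quo & 0xffff);
}
```

With DX:AX = 0x80000000 and a divisor of 0xFFFF, the true quotient is 0x8000 and fits in 16 bits, so the 32-bit version reports no overflow, while the sign-extended 64-bit numerator produces a huge quotient and a spurious divide overflow.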

Reply 13 of 123, by winuser_pl

Rank Member

So there may be a chance of running Windows 95 fast enough?

PC1: Highscreen => FIC PA-2005, 64 MB EDO RAM, Pentium MMX 200, S3 Virge + Voodoo 2 8 MB
PC2: AOpen => GA-586SG, 512 MB SDRAM, AMD K6-2 400 MHz, Geforce 2 MX 400

Reply 14 of 123, by Dominus

Rank DOSBox Moderator

Please keep on topic. It's about making the 64bit dynamic core faster than it currently is. The 32bit core is already way faster than the 64bit one. Running Windows 95 is not a speed issue anymore. Please use DOSBox-x for that. Further replies on the W95 topic will be moved out of this thread but please just don't 😉


Reply 15 of 123, by kjliew

Rank Oldbie
Dominus wrote:

The error message is:
Exit to error: DynCore: illegal option in opcode::Emit: bad RIP address

I got the same error on ArchLinux on Intel Haswell Core i3-4010U. Windows 10 build with MSYS2/mingw-w64 is fine. AMD FX and Ryzen CPUs are fine for both ArchLinux and Windows 10 builds.

This smells very similar to an early 64-bit dynarec issue several years ago for new Intel CPUs. Older Intel CPUs such as Core2 are fine.
risc_x64.h - Error DRC64 unhandled memory reference

I will see when I get the chance to build and run it on my other Core2 Quad desktop with ArchLinux.

Update: Windows 10 build with MSYS2/mingw-w64 sees the same issue on booting Windows 98SE

Last edited by kjliew on 2019-06-25, 02:35. Edited 1 time in total.

Reply 16 of 123, by jmarsh

Rank Oldbie

It has nothing to do with the CPU type, but how your libc allocates memory. Specifically how two consecutive allocations can return pointers that are at completely opposite ends of the virtual address range instead of being close to each other.
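
To make the distance problem concrete, here is a minimal sketch, assuming the generated code wants to reach data through x86-64 RIP-relative addressing, which only has a signed 32-bit displacement. The function name is hypothetical and this is not the emitter from the patch.

```cpp
#include <cstdint>
#include <limits>

// next_ip: address of the byte following the instruction being emitted.
// target:  address of the data the instruction needs to reach.
// Returns false when the target is further than +/-2 GiB away, in which case
// a RIP-relative encoding is impossible and another strategy is needed.
bool rip_relative_disp(std::uint64_t next_ip, std::uint64_t target,
                       std::int32_t& disp) {
    const std::int64_t diff = static_cast<std::int64_t>(target - next_ip);
    if (diff < std::numeric_limits<std::int32_t>::min() ||
        diff > std::numeric_limits<std::int32_t>::max()) {
        return false;
    }
    disp = static_cast<std::int32_t>(diff);
    return true;
}
```

Two allocations that land 1 MiB apart pass this check; one block near 0x5555'5555'4000 and another near 0x7F00'0000'0000 do not, regardless of which CPU is installed.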

Reply 17 of 123, by kjliew

Rank Oldbie
jmarsh wrote:

It has nothing to do with the CPU type, but how your libc allocates memory. Specifically how two consecutive allocations can return pointers that are at completely opposite ends of the virtual address range instead of being close to each other.

I know about the potential randomness of libc memory allocation. But to be frank, I never understood why it occurs every time with specific families of Intel CPUs, but not even once on AMD CPUs.

Anyway, I wish you could nail down this last issue for Intel CPUs for the rest of the audience. My AMD FX and Ryzen are pretty happy with the new 64-bit dyn_x86 core. I even tried a clean Win98SE installation all the way from blank image and it worked very well, including necessary official updates such as IE 5.5, WM player 9.0 and DirectX 7.0a, together with MagicDisc CD emulator.

64-bit code is better than 32-bit code at pushing pixels. The Voodoo1 chip emulation got a nice 10~15% speed boost in the 64-bit dyn_x86 core. Similar improvements can also be seen in Glide pass-through by using 64-bit build of OpenGlide, psVoodoo and dgVoodoo2.

Reply 18 of 123, by jmarsh

Rank Oldbie
kjliew wrote:

I know about the potential randomness of libc memory allocation. But to be frank, I never understood why it occurs every time with specific families of Intel CPUs, but not even once on AMD CPUs.

Possibly it's related to the read/write fsbase/gsbase instructions being available, but that seems unlikely: those registers weren't included in the ABI, so it would possibly break existing programs if libc had started using them for heap management. More likely your systems contain different software versions.

64-bit code is better than 32-bit code at pushing pixels. The Voodoo1 chip emulation got a nice 10~15% speed boost in the 64-bit dyn_x86 core. Similar improvements can also be seen in Glide pass-through by using 64-bit build of OpenGlide, psVoodoo and dgVoodoo2.

That's possibly because the 32-bit dyn_x86 core is bugged, causing all unaligned dword memory accesses to take the slow path instead of only ones that cross page boundaries.
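
To illustrate the difference between the two conditions, here is a small sketch, assuming 4 KiB x86 pages. It is not the actual dyn_x86 code and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// True when a dword access at addr is not 4-byte aligned.
bool is_unaligned_dword(std::uint32_t addr) {
    return (addr & 3u) != 0;
}

// True when an access of `size` bytes at addr straddles a 4 KiB page boundary.
// Only these accesses actually need the slow path.
bool crosses_page(std::uint32_t addr, std::uint32_t size) {
    assert(size >= 1 && size <= 0x1000);
    return (addr & 0xFFFu) + size > 0x1000u;
}
```

A dword read at 0x1001 is unaligned but stays within one page, so it could take the fast path; only reads at page offsets 0xFFD to 0xFFF cross into the next page.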

Reply 19 of 123, by krcroft

Rank Oldbie
kjliew wrote:
jmarsh wrote:

It has nothing to do with the CPU type, but how your libc allocates memory. Specifically how two consecutive allocations can return pointers that are at completely opposite ends of the virtual address range instead of being close to each other.

I know about the potential randomness of libc memory allocation. But to be frank, I never understood why it occurs every time with specific families of Intel CPUs, but not even once on AMD CPUs.

kjliew, does disabling randomization of the virtual address space help?

sudo sysctl -w kernel.randomize_va_space=0

If you build your own kernel and set CONFIG_COMPAT_BRK ("Disable heap randomization"), it will have a value of 1 which means the stack, shared object pages, and shared memory regions will be randomized; however the data segments will /not/ be randomized. This is what my kernel is set to, and I haven't seen this RIP error yet.

Almost all release-build kernels will have this set to 2, which additionally enables randomization of the data segments.
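
If you want to check the setting from a program instead of via sysctl, something like the following should work on Linux. This is only a sketch: the function names are made up, and the descriptions paraphrase my understanding of the three values.

```cpp
#include <fstream>
#include <string>

// Human-readable meaning of kernel.randomize_va_space.
std::string describe_randomize_va_space(int value) {
    switch (value) {
        case 0: return "disabled";
        case 1: return "stack, mmap and shared libraries randomized; heap (brk) not";
        case 2: return "full randomization, including heap (brk)";
        default: return "unknown";
    }
}

// Reads the current value; returns -1 if the file is missing or unreadable
// (for example on a non-Linux system).
int read_randomize_va_space(
        const char* path = "/proc/sys/kernel/randomize_va_space") {
    std::ifstream in(path);
    int value = -1;
    if (!(in >> value)) return -1;
    return value;
}
```

Comparing the value on a machine that shows the RIP error against one that doesn't would at least tell us whether the two systems differ in this setting.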