Since the current 64-bit builds of DOSBox only support the slower dynrec core I decided it was time to port the dynamic_x86 core.
Results look good; a huge speed boost over current 64-bit builds and around 10-20% improvement over 32-bit dyn_x86. Just in time for Apple and Ubuntu to drop 32-bit support...
Needs lots of testing due to it basically being a new core. Different OSes, compilers, PIE, ASLR, etc. can all trigger different codegen strategies (which hopefully have all been implemented). Ideally the x32 ABI would be used instead and we could go back to assuming all pointers fit in 32-bits but that's not likely to happen any time soon.
Patch may need slight formatting adjustments due to being generated by git.
Wow! I never thought that this could be done. Kudos, jmarsh!
On current MSYS2/mingw-w64, everything built, and 64-bit DOSBox and a few DOS4GW games checked out fine on Windows 10 Pro x64 with great performance, including both Glide pass-through and Voodoo chip emulation. This is with GCC 9.10/binutils 2.30 with -O3 and native optimization. I will be testing across different Intel/AMD CPUs, both desktop and laptop class. This is very exciting news for me, as I recently moved to ArchLinux for x86_64 and rolling updates, knowing that DOSBox would be slower on x86_64 and that there is no official i686 multilib support. I will get back with more results on the ArchLinux build, including running Windows 98 from DOSBox 😵
I wish your contribution would be accepted upstream.
AFAIK, Ubuntu is not dropping 32bit application support in 19.10 just the libraries so it'll be up to the developer of the application to supply them or for the user to hunt them down and add them manually. More than likely the people who care will switch to Debian rather than going through the hassle of using chroot, LXD, snaps, flatpak or VMs for all their 32bit programs. Ubuntu is just being lazy. It is pretty sad seeing the comments from people who don't realize how much software is still 32bit and who dismiss people's concerns.
AFAIK, Ubuntu is not dropping 32bit application support in 19.10 just the libraries so it'll be up to the developer of the application to supply them or for the user to hunt them down and add them manually.
That really means dropping 32-bit support. No offence 😀 ArchLinux is spearheading this. Ubuntu is just following.
ArchLinux build was great, too, really great performance from the new 64-bit dyn_x86 core. I even have a record score running AutoCAD R13 with BENCH13 from SoftEngine. I think it is very well optimized for modern 32-bit DOS protected-mode with DOS extenders.
Win98SE boots, too; however, it is less deterministic than with the previous dyn_x86 core compiled for i686 or the dynrec core compiled for x86_64. If Win98SE runs SCANDISK during Windows boot, it gets stormed with divide overflow errors. This happens most frequently. Then the next Windows boot will automatically select Safe Mode, the core will hang a couple of times, and eventually Safe Mode will come up. Then, shut down properly from Safe Mode and, on the next boot, voilà! Windows boots. The illegal read/write [ADDR] from CS:IP and PageFault messages are normal; both the previous dyn_x86 and dynrec showed the same messages on successful Windows boots.
Copyright 2002-2019 DOSBox Team, published under GNU GPL.
---
CONFIG: Loading primary settings from config file /tmp/win.conf
Memory sizes above 31 MB are NOT recommended.
Stick with the default values unless you are absolutely certain.
MIXER: Got different values from SDL: freq 44100, blocksize 512
MIDI: Opened device:mt32
Glide:LFB access: read-write (no aux)
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal write to dad45000, CS:IP 28:c02322b1
Illegal write to dad45001, CS:IP 28:c02322b1
Illegal write to dad45002, CS:IP 28:c02322b1
Illegal write to dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal write to dad45000, CS:IP 28:c02322b1
Illegal write to dad45001, CS:IP 28:c02322b1
Illegal write to dad45002, CS:IP 28:c02322b1
Illegal write to dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal read from dad45000, CS:IP 28:c02322b1
Illegal read from dad45001, CS:IP 28:c02322b1
Illegal read from dad45002, CS:IP 28:c02322b1
Illegal read from dad45003, CS:IP 28:c02322b1
Illegal write to dad45000, CS:IP 28:c02322b1
Illegal write to dad45001, CS:IP 28:c02322b1
Illegal write to dad45002, CS:IP 28:c02322b1
Illegal write to dad45003, CS:IP 28:c02322b1
PageFault at 8088A000 type [6] queue 1
Left PageFault for 8088a000 queue 1
VOODOO: OpenGL: mode set, resolution 800:600
VOODOO: OpenGL: quit
Reboot requested, quitting now.
Once in Windows, it ran great most of the time. Intermittently it would hit "Reboot requested, quitting now" on exiting a program or game (e.g. GLQuake/Quake2). I guess this could be a fault within a fault handler (triple fault). This also reproduces consistently when trying to shut down properly from Windows. It never reached the "It is Safe to Turn Off" screen, but ended up with "Reboot requested, quitting now" on the DOSBox console, and the next boot triggers SCANDISK and the whole story repeats. The behavior is the same on Windows 10 Pro x64 and ArchLinux.
I wonder if you would consider making the new 64-bit dyn_x86 core support Win98. Though running Win98 from DOSBox is unsupported, both of the previous dynamic cores run Win98 pretty well (with some quirks). I understand that debugging Win98 quirks is challenging.
Can't say enough good stuff about this work, jmarsh!
This ensures DOSBox will be a first-class 64-bit binary, offering the best performance on modern free software platforms, without needing a pile of obsolete or kludged-in 32bit libraries that were previously needed to get the best performance.
Given that open source labour isn't free, yet the amount of software and security issues continues to grow, it's only a matter of time before more distributions shed support for platforms that have long been replaced (AMD's 64bit processors were selling 16 years ago, and Intel's one year later with the Pentium 4 model F).
Yeah, our community here is unique with many of us still tinkering with very old hardware, and for those with 32bit-only OSes there will still be the 32bit dynarec core.
Regarding X32 on Linux - it's been a while since I played with this on Gentoo. If the kernel, glibc, build-stack, and X11 libraries support it, then presumably we could get a pure static build of dosbox out of it (assuming building sdl, audio libs, and so on from source to static libs). Feels like a fair bit of work though, and this hybrid binary format never gained traction with distros.
I also looked into managing your own 32bit "pointers" using an index table (instead of being forced to use 64bit pointers), but the performance gains aren't there (or are negligible), while code cleanliness suffers from the additional layer of indirection.
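For anyone wondering what such an index table looks like, here is a minimal sketch. It is not taken from DOSBox or the patch; PointerTable and its methods are hypothetical names, and the only point is to show where the extra indirection comes from.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical handle table (not DOSBox code): hands out 32-bit indices
// that stand in for full 64-bit host pointers.
class PointerTable {
public:
    // Store a pointer and return a 32-bit handle for it.
    std::uint32_t add(void* p) {
        slots_.push_back(p);
        return static_cast<std::uint32_t>(slots_.size() - 1);
    }

    // Every use of a handle pays for this lookup: that is the extra
    // layer of indirection mentioned above.
    void* get(std::uint32_t handle) const {
        assert(handle < slots_.size());
        return slots_[handle];
    }

private:
    std::vector<void*> slots_;
};
```

Generated code would then embed the 4-byte handle instead of an 8-byte address and call something like get() before every dereference, which is where any size savings get eaten up again.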
Unless something works with core=normal, it's out of scope for dynamic cores.
I am sorry, but I didn't quite get what you mean. DOSBox dynamic cores (dyn_x86 & dynrec) work pretty well for Win9x, even if that was just a coincidence. In fact, core=normal has never worked for Win9x and is too slow.
For all the CPU-heavy, late 90's DOS games at my disposal, they work really well with the new 64-bit dynamic_x86. I couldn't find anything that it won't run perfectly. I don't have Win3.1/WfW3.11 to check. Hopefully, someone will do that shortly if that's what would expose the quirks of running Win9x between the old and new dynamic cores.
For those running Linux who want to try jmarsh's patch but don't have a build environment, here's my patched DOSBox 64bit Linux binary (sdl and audio deps are statically-linked while x11 libs are dynamically-linked, and optimized with generic-tuning)
Wrote you on IRC but in case it gets lost: on OS X 10.14, DOSBox crashes with this patch as soon as the dynamic core is activated (for example by the command "core dynamic").
The error message is:
Exit to error: DynCore: illegal option in opcode::Emit: bad RIP address
Finally, I nailed down the divide overflow issue with the new 64-bit dyn_x86 core. The IDIV/DIV word division helpers need to be updated to handle the 64-bit Bitu type. Otherwise, an unsigned word division with DX:AX > 0x7fffffff will result in a divide overflow exception when (quo!=quo16).
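One plausible way this goes wrong, sketched below under assumptions: if the helper packs DX:AX through an int-sized intermediate and then widens it into a 64-bit unsigned type, any dividend above 0x7fffffff gets sign-extended and the quotient can no longer fit in 16 bits. The function names are hypothetical and this is not the actual helper or the actual fix, just an illustration of the (quo!=quo16) check tripping.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: DX:AX is packed via a signed 32-bit intermediate and then
// widened to 64 bits (standing in for Bitu on a 64-bit build).
// The explicit casts reproduce the sign extension without relying on a
// signed left shift.
std::uint64_t dividend_sign_extended(std::uint16_t dx, std::uint16_t ax) {
    std::uint32_t packed = (static_cast<std::uint32_t>(dx) << 16) | ax;
    std::int32_t as_signed = static_cast<std::int32_t>(packed);
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(as_signed));
}

// Packing in an unsigned 32-bit type first keeps the upper 32 bits zero.
std::uint64_t dividend_zero_extended(std::uint16_t dx, std::uint16_t ax) {
    return (static_cast<std::uint32_t>(dx) << 16) | ax;
}

// Mirrors the (quo != quo16) check: true means the quotient does not fit
// in 16 bits and a divide overflow exception would be raised.
bool div16_overflows(std::uint64_t num, std::uint16_t divisor) {
    assert(divisor != 0);  // division by zero is a separate exception path
    std::uint64_t quo = num / divisor;
    std::uint16_t quo16 = static_cast<std::uint16_t>(quo & 0xffff);
    return quo != quo16;
}
```

With DX=0x8000, AX=0 and a divisor of 0xffff, the zero-extended dividend gives a quotient of 0x8000, which fits, while the sign-extended one gives a huge quotient and reports an overflow that real hardware would not raise.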
After the patch, Win98SE booted and I no longer observed the storm of divide overflow exceptions. Hurray 😀 ! And all games enjoy a very nice speed boost!
Please keep on topic. It's about making the 64bit dynamic core faster than it currently is. The 32bit core is already way faster than the 64bit one. Running Windows 95 is not a speed issue anymore. Please use DOSBox-x for that. Further replies on the W95 topic will be moved out of this thread but please just don't 😉
The error message is:
Exit to error: DynCore: illegal option in opcode::Emit: bad RIP address
I got the same error on ArchLinux on Intel Haswell Core i3-4010U. Windows 10 build with MSYS2/mingw-w64 is fine. AMD FX and Ryzen CPUs are fine for both ArchLinux and Windows 10 builds.
It has nothing to do with the CPU type, but how your libc allocates memory. Specifically how two consecutive allocations can return pointers that are at completely opposite ends of the virtual address range instead of being close to each other.
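To make the constraint concrete: x86-64 RIP-relative addressing encodes a signed 32-bit displacement, so the target has to be within roughly 2 GiB of the instruction that references it. A minimal sketch of that reachability check follows; fits_rip_relative is a hypothetical name, not a function from the patch, and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if `target` can be reached with a signed 32-bit
// displacement measured from `next_ip`, the address of the instruction
// following the one that performs the access.
bool fits_rip_relative(std::uint64_t next_ip, std::uint64_t target) {
    std::int64_t delta = static_cast<std::int64_t>(target - next_ip);
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

Two allocations a page apart pass this check, whereas something like a code cache at 0x555555554000 and data at 0x7ffff7a00000 do not, and a code generator that only has a RIP-relative strategy for that access has nothing left to emit.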
It has nothing to do with the CPU type, but how your libc allocates memory. Specifically how two consecutive allocations can return pointers that are at completely opposite ends of the virtual address range instead of being close to each other.
I know about the potential randomness of libc memory allocation. But to be frank, I never understood why it occurred every time with specific families of Intel CPUs, but not even once on AMD CPUs.
Anyway, I wish you could nail down this last issue for Intel CPUs for the rest of the audience. My AMD FX and Ryzen are pretty happy with the new 64-bit dyn_x86 core. I even tried a clean Win98SE installation all the way from blank image and it worked very well, including necessary official updates such as IE 5.5, WM player 9.0 and DirectX 7.0a, together with MagicDisc CD emulator.
64-bit code is better than 32-bit code at pushing pixels. The Voodoo1 chip emulation got a nice 10~15% speed boost in the 64-bit dyn_x86 core. Similar improvements can also be seen in Glide pass-through by using 64-bit build of OpenGlide, psVoodoo and dgVoodoo2.
I know about the potential randomness of libc memory allocation. But to be frank, I never understood why it occurred every time with specific families of Intel CPUs, but not even once on AMD CPUs.
Possibly it's related to the read/write fsbase/gsbase instructions being available, but that seems unlikely: those registers weren't included in the ABI, so it would possibly break existing programs if libc had started using them for heap management. More likely your systems contain different software versions.
64-bit code is better than 32-bit code at pushing pixels. The Voodoo1 chip emulation got a nice 10~15% speed boost in the 64-bit dyn_x86 core. Similar improvements can also be seen in Glide pass-through by using 64-bit build of OpenGlide, psVoodoo and dgVoodoo2.
That's possibly because the 32-bit dyn_x86 core is bugged, causing all unaligned dword memory accesses to take the slow path instead of only ones that cross page boundaries.
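The difference between the two conditions is easy to see in code. This is only a sketch of the two predicates, assuming 4 KiB pages; it is not the check used in either core, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPageSize = 4096;  // assumption: 4 KiB x86 pages

// True if a 4-byte access at `addr` is not dword-aligned.
bool is_unaligned_dword(std::uint32_t addr) {
    return (addr & 3u) != 0;
}

// True only if a 4-byte access at `addr` spills into the next page,
// i.e. it starts within the last three bytes of a page.
bool dword_crosses_page(std::uint32_t addr) {
    return (addr & (kPageSize - 1)) > kPageSize - 4;
}
```

Out of every 4096 starting offsets, 3072 are unaligned but only 3 actually cross a page, so testing the first condition instead of the second sends far more accesses down the slow path than necessary.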
It has nothing to do with the CPU type, but how your libc allocates memory. Specifically how two consecutive allocations can return pointers that are at completely opposite ends of the virtual address range instead of being close to each other.
I know about the potential randomness of libc memory allocation. But to be frank, I never understood why it occurred every time with specific families of Intel CPUs, but not even once on AMD CPUs.
kjliew, does disabling randomization of the virtual address space help?
sudo sysctl -w kernel.randomize_va_space=0
If you build your own kernel and set CONFIG_COMPAT_BRK ("Disable heap randomization"), kernel.randomize_va_space will default to 1, which means the stack, shared object pages, and shared memory regions will be randomized; however, the data segments will /not/ be randomized. This is what my kernel is set to, and I haven't seen this RIP error yet.
Almost all release-build kernels will have this set to 2, which additionally enables randomization of the data segments.
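If you want to check what your own system is using without touching sysctl, the setting can be read from /proc/sys/kernel/randomize_va_space. A small sketch follows; the function names are made up, and the descriptions just restate the three documented values.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Maps the documented values of kernel.randomize_va_space to a description.
std::string describe_randomize_va_space(int value) {
    switch (value) {
    case 0: return "address space randomization disabled";
    case 1: return "stack, mmap base and VDSO randomized; heap (brk) not randomized";
    case 2: return "same as 1, plus heap (brk) randomized";
    default: return "unknown value";
    }
}

// Reads the current setting on Linux; returns -1 if it cannot be read
// (for example on a non-Linux system).
int read_randomize_va_space() {
    std::ifstream in("/proc/sys/kernel/randomize_va_space");
    int value = -1;
    if (!(in >> value)) return -1;
    return value;
}
```

Printing describe_randomize_va_space(read_randomize_va_space()) on each affected and unaffected machine would be a quick way to see whether the "bad RIP address" reports line up with the setting being 2.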