VOGONS


64-bit dynamic_x86 (patch)

Reply 80 of 123, by latalante

Rank Newbie

A small comparison of the distribution build of DOSBox (0.74-3) from Arch Linux (x86-64) with the SVN version (r4271).
Core2 2GHz
lame.3.97
lame.exe --nohist --abr 64 -mm test.wav test.mp3
2.1348x | 0.8133x

lua 5.2.2 (fannkuch-redux, spectral-norm; run time in seconds)
runtime.exe lua.exe fannk.lua 9
23.4 | 42.25
runtime.exe lua.exe spec.lua 400
32.97 | 66.37

PythonD 2.4.2r1 for DJGPP [GCC 3.3.2] on ms-dos5 (fannkuch-redux, spectral-norm; run time in seconds)
runtime.exe python24.exe spec.py 200
36.7 | 57.42
runtime.exe python24.exe fannk.py 9
79.67 | 122.91
On average, the 64-bit dynamic_x86 version is about twice as fast. The picture is very similar in games.
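
The "twice as fast" figure can be checked against the numbers above. This is a minimal sketch, assuming the first value of each pair belongs to the faster (64-bit dynamic_x86) build. The post does not label the columns, so that reading is an assumption, and the benchmark labels and variable names are made up for illustration.

```python
from math import prod

# (label, first value, second value, higher_is_better)
# Values are copied from the post. Treating the first column as the faster
# build is an assumption; the post does not label the columns.
results = [
    ("lame --abr 64", 2.1348, 0.8133, True),    # a rate ("x"), higher is better
    ("lua fannk 9", 23.4, 42.25, False),        # seconds, lower is better
    ("lua spec 400", 32.97, 66.37, False),
    ("python spec 200", 36.7, 57.42, False),
    ("python fannk 9", 79.67, 122.91, False),
]


def speedup(first, second, higher_is_better):
    """Return how many times faster the first column is than the second."""
    return first / second if higher_is_better else second / first


speedups = {name: speedup(a, b, hib) for name, a, b, hib in results}

# The geometric mean is the usual way to average ratios.
geo_mean = prod(speedups.values()) ** (1 / len(speedups))

for name, s in speedups.items():
    print(f"{name:16s} {s:.2f}x")
print(f"{'geometric mean':16s} {geo_mean:.2f}x")
```

Under that assumption the individual speedups range from roughly 1.5x to 2.6x, with a geometric mean a little under 2x.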

Edit:
The SVN version was built with PGO (profile-guided optimization). This is not crucial; the difference can be up to 2-5% in favor of PGO.
Profile generation pass:
CXXFLAGS="-mtune=native -O3 -g0 -fprofile-arcs"
Profile use pass:
CXXFLAGS="-mtune=native -O3 -g0 -fbranch-probabilities -fprofile-use -fprofile-correction -Wno-error=coverage-mismatch -Wno-missing-profile"

Last edited by latalante on 2019-10-11, 13:39. Edited 3 times in total.

Reply 82 of 123, by Dominus

Rank DOSBox Moderator

Yes, the patch made the 64bit dynamic core blazing fast 😉

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox
DOSBox SVN snapshot for macOS (10.4-11.x ppc/intel 32/64bit) notarized for gatekeeper

Reply 83 of 123, by Yesterplay80

Rank Oldbie

So, a 64-Bit build would actually make sense now?

My full-featured DOSBox SVN builds for Windows & Linux: Vanilla DOSBox and DOSBox ECE (Google Drive Mirror)

Reply 84 of 123, by Dominus

Rank DOSBox Moderator

Yes, the big gap is gone and for some the 64bit build is even faster.
And a 64bit build might be good for catching bugs.


Reply 85 of 123, by kjliew

Rank Oldbie

Well, if we look at how the open-source software world works, 32-bit software will be in nobody-cares 😵 land very soon. Functional testing will be reduced to nothing, since no one runs 32-bit systems anymore, and only build tests will go on. Bugs will slowly creep in and no one will notice until they are reported (and then only if we are lucky will someone be interested in fixing them...).

64-bit software is the future. The only minor issue is off-the-shelf Glide wrappers, if one cares about Glide pass-through in DOSBox, since all of them are 32-bit DLLs; but at least we have dgVoodoo2 coming to the rescue. I believe nGlide is ramping up its 64-bit efforts to catch up. Anyway, Kekko's Voodoo chip emulation can still be used as a last resort, and the 64-bit version gives it an additional 10~15% boost in pushing pixels.

Reply 86 of 123, by Kerr Avon

Rank Oldbie

Is there any real advantage or disadvantage to running a 64-bit version of DOSBox when it comes to actually emulating DOS games? I wouldn't think that many DOS games process numbers larger than 32 bits, nor of course would they need memory addressing beyond 32 bits. And even if a given DOS game does use numbers larger than 32 bits, the game would break them down into 32-bit (or 16-bit) pieces for processing on the hardware of its time. Could DOSBox then somehow use its own native 64-bit features (and the current host's 64-bit features) to speed this up, without the game itself being specifically rewritten to use 64-bit code? It does seem unlikely.

Reply 87 of 123, by jmarsh

Rank Oldbie

Trying to emulate an entire 32-bit x86 CPU on a 32-bit x86 CPU is awkward because much of the host CPU is unusable by user code: segment registers, system registers, and even ESP cannot be freely modified. A 64-bit CPU has an extra 8 general purpose registers to help get the job done.
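
The register pressure described above can be put into rough numbers. This is only an illustrative sketch, not DOSBox's actual register allocation: the function name and the figure of two scratch registers are assumptions, while the register lists themselves are architectural facts.

```python
# Host general-purpose registers.
X86_32_GPRS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
X86_64_GPRS = (
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    + [f"r{i}" for i in range(8, 16)]
)

# The emulated 32-bit CPU exposes eight general-purpose registers of its own.
GUEST_GPRS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def free_for_guest(host_gprs, reserved, scratch_count):
    """Count host registers left over for holding guest register values.

    `reserved` holds registers the host needs for itself (its stack pointer);
    `scratch_count` is an assumed number of temporaries for generated code.
    """
    usable = [reg for reg in host_gprs if reg not in reserved]
    return max(len(usable) - scratch_count, 0)


SCRATCH = 2  # assumption for illustration only

free32 = free_for_guest(X86_32_GPRS, {"esp"}, SCRATCH)
free64 = free_for_guest(X86_64_GPRS, {"rsp"}, SCRATCH)

for name, free in (("32-bit host", free32), ("64-bit host", free64)):
    spilled = max(len(GUEST_GPRS) - free, 0)
    print(f"{name}: {free} registers free, {spilled} guest registers kept in memory")
```

With these assumed numbers a 32-bit host cannot keep all eight guest registers in host registers at once, while a 64-bit host can with room to spare.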

Reply 88 of 123, by fr500

Rank Newbie

Hello
So I maintain the libretro port and I've been trying to enable this on my builds.
On the 32-bit build it's working, business as usual.

But on 64 bit I'm getting these build errors once I switch to dynamic_x86:
https://hastebin.com/jigejapiri.sql

For reasons I can't really explain, the libretro guys don't like autogen/autotools etc., so I have a baked config.h.
That said, I regenerated config.h on standalone and used that, but I still get the same errors.

If I add -fpermissive I get these in addition to the others:

D:\Tools\msys64\tmp\ccbuZ6oW.s: Assembler messages:
D:\Tools\msys64\tmp\ccbuZ6oW.s:22812: Error: invalid instruction suffix for `push'
D:\Tools\msys64\tmp\ccbuZ6oW.s:22813: Error: invalid instruction suffix for `push'
D:\Tools\msys64\tmp\ccbuZ6oW.s:22814: Error: invalid instruction suffix for `push'
D:\Tools\msys64\tmp\ccbuZ6oW.s:22817: Error: invalid instruction suffix for `pop'

Any ideas?

Reply 90 of 123, by robertmo

Rank l33t++

It looks like the 32-bit Windows build is also affected on Ryzen CPUs:
Benchmarking emulators on latest machines
though there is "a fair amount of variance between runs",
so it would need confirmation by someone else.

Reply 91 of 123, by jmarsh

Rank Oldbie
robertmo wrote:
it looks 32-bit windows build is also affected with Ryzen cpus.
Benchmarking emulators on latest machines
though "a fair amount of variance between runs"
would need confirmation by someone else

Something smells wrong, the difference between 32-bit and 64-bit should not be that large.

Reply 92 of 123, by robertmo

Rank l33t++

Don't forget it's AMD.
It looks like they released a fast 64-bit mode
and left 32-bit at the same speed as the previous generation.
I guess they no longer care about 32-bit.
That may also explain its unstable performance in 32-bit.

Reply 93 of 123, by Kisai

Rank Member
jmarsh wrote:
robertmo wrote:
it looks 32-bit windows build is also affected with Ryzen cpus.
Benchmarking emulators on latest machines
though "a fair amount of variance between runs"
would need confirmation by someone else

Something smells wrong, the difference between 32-bit and 64-bit should not be that large.

It might depend on the compiler. MinGW (GCC for Windows) doesn't use native Windows threads and only uses the Windows C runtime, not the C++ runtimes. So compiling something with MinGW is not the same as compiling with MS Visual Studio, and thus the compiler optimizations are different. So if there are specific optimizations for GCC, clang, MSVC, etc., there may be different outcomes, even if the same compiler is used to target Windows and Linux.

Reply 94 of 123, by jmarsh

Rank Oldbie

Dynamic code (where the majority of execution time is spent during that benchmark) is assembled at run-time so the compiler doesn't matter. Those results look more like normal/dynrec core vs. dyn_x86 - which is very possible since 0.74-3 uses a different .conf file than an SVN build.

Reply 95 of 123, by latalante

Rank Newbie

DOSBox is not only CPU emulation but also, in the first place, graphics. For Quake (timedemo demo1 at 800x600) I can see what generates the load.

perf top
3.14% dosbox64 [.] Normal1x_8_32_R
2.77% libc-2.30.so [.] __memcpy_ssse3
2.35% dosbox64 [.] RENDER_StartLineHandler
1.62% dosbox64 [.] mem_readd
1.29% [JIT] tid 2332 [.] 0x000076983e10e44c
1.16% dosbox64 [.] CPU_Core_Dyn_X86_Run
1.14% dosbox64 [.] mem_writed_checked
0.95% dosbox64 [.] FPU_FLD_32
0.63% dosbox64 [.] mem_readd_checked
0.62% [JIT] tid 2332 [.] 0x000076983e2e3ab7
0.52% [JIT] tid 2332 [.] 0x000076983e2e38cd
0.51% [JIT] tid 2332 [.] 0x000076983e2e3a64
0.51% [JIT] tid 2332 [.] 0x000076983e2e3786
0.51% [JIT] tid 2332 [.] 0x000076983e10e377
0.48% [JIT] tid 2332 [.] 0x000076983e2e3940
0.45% [JIT] tid 2332 [.] 0x000076983e2e3a6f
0.41% dosbox64 [.] mem_writed
0.40% dosbox64 [.] Normal1x_9_32_R
In total, though, CPU emulation dominates.
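
To make that split visible, the listed rows can be grouped and summed. This is a rough sketch: the grouping rules in `classify` are an assumption based purely on symbol names (JIT blocks, `mem_*`, `FPU_*` and the dynamic core counted as CPU emulation; `Normal1x_*` and `RENDER_*` counted as rendering), and only the rows quoted in the post are included.

```python
from collections import defaultdict

# Rows copied from the perf top output in the post.
PERF_TOP = """\
3.14% dosbox64 [.] Normal1x_8_32_R
2.77% libc-2.30.so [.] __memcpy_ssse3
2.35% dosbox64 [.] RENDER_StartLineHandler
1.62% dosbox64 [.] mem_readd
1.29% [JIT] tid 2332 [.] 0x000076983e10e44c
1.16% dosbox64 [.] CPU_Core_Dyn_X86_Run
1.14% dosbox64 [.] mem_writed_checked
0.95% dosbox64 [.] FPU_FLD_32
0.63% dosbox64 [.] mem_readd_checked
0.62% [JIT] tid 2332 [.] 0x000076983e2e3ab7
0.52% [JIT] tid 2332 [.] 0x000076983e2e38cd
0.51% [JIT] tid 2332 [.] 0x000076983e2e3a64
0.51% [JIT] tid 2332 [.] 0x000076983e2e3786
0.51% [JIT] tid 2332 [.] 0x000076983e10e377
0.48% [JIT] tid 2332 [.] 0x000076983e2e3940
0.45% [JIT] tid 2332 [.] 0x000076983e2e3a6f
0.41% dosbox64 [.] mem_writed
0.40% dosbox64 [.] Normal1x_9_32_R
"""


def classify(obj, symbol):
    """Assign a row to a category; the rules are illustrative assumptions."""
    if obj == "[JIT]" or symbol == "CPU_Core_Dyn_X86_Run":
        return "cpu emulation"
    if symbol.startswith(("mem_", "FPU_")):
        return "cpu emulation"
    if symbol.startswith(("Normal1x_", "RENDER_")):
        return "rendering"
    return "other"


totals = defaultdict(float)
for line in PERF_TOP.splitlines():
    percent, obj, rest = line.split(maxsplit=2)
    symbol = rest.split()[-1]          # last token is the symbol or address
    totals[classify(obj, symbol)] += float(percent.rstrip("%"))

for category, share in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:14s} {share:5.2f}%")
```

The quoted rows cover only a fraction of all samples, so the sums show the proportion between the groups rather than absolute shares of the whole run.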

Over the last few days I tested DOSBox builds made with different compilers: gcc-9.2.0, gcc-4.9.4, and clang version 10.0.0. The differences are small but clearly noticeable. Whether they matter depends on your point of view. For me, 1.5% is something to fight for. For someone with a lot of CPU power to spare, it is not worth the effort.

Reply 97 of 123, by latalante

Rank Newbie

Different processor models show different gaps between the 32-bit and 64-bit versions.

cat /proc/cpuinfo
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz

dmesg | grep 'x86/fpu'
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE

dosbox32 DHRY1ND.EXE - 58.19 (VAX MIPS rating)
dosbox64 DHRY1ND.EXE - 60.91

cat /proc/cpuinfo
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) CPU @ 2.00GHz

dmesg | grep 'x86/fpu'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.000000] x86/fpu: Enabled xstate features 0xff, context size is 2560 bytes, using 'compacted' format.

dosbox32 DHRY1ND.EXE - 79.90 (VAX MIPS rating)
dosbox64 DHRY1ND.EXE - 95.48

Edit:
The Xeon results are certainly heavily underestimated, because other processes were running on the same core and quite clearly loading the processor (it runs under a cloud system).
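
For reference, the relative gain of the 64-bit build on each machine follows directly from the ratings above. A small sketch, with the dictionary keys chosen here only as labels:

```python
# Dhrystone (VAX MIPS) ratings copied from the post: (dosbox32, dosbox64).
ratings = {
    "Core 2 T7200": (58.19, 60.91),
    "Xeon (cloud)": (79.90, 95.48),
}

# Percentage by which the 64-bit rating exceeds the 32-bit one.
gains = {cpu: (r64 / r32 - 1) * 100 for cpu, (r32, r64) in ratings.items()}

for cpu, gain in gains.items():
    print(f"{cpu}: 64-bit build rates {gain:.1f}% higher")
```

That works out to a gain of roughly 4.7% on the Core 2 and roughly 19.5% on the Xeon, keeping in mind the caveat about the Xeon numbers.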

Reply 99 of 123, by latalante

Rank Newbie
jmarsh wrote:

The dynamic core always uses FSAVE/FRSTOR regardless of the host CPU's capabilities

Thanks for this information.

jmarsh wrote:

plus the Dhrystone benchmark is specifically designed to not test floating point performance.

So it tests only the emulated processor? Is this benchmark reliable?
I wondered why it achieved such high results under QEMU, when QEMU is generally not particularly suitable for DOS.
dosbox64 DHRY1ND.EXE - 60.91
qemu-system-x86_64 DHRY1ND.EXE - 96.76, a completely implausible result (for such a weak processor and compared with the other benchmarks).