VOGONS


First post, by swaaye

User metadata
Rank l33t++

I've spent a few hours compiling DOSBOX. I have an EeePC 900 and wanted to see if I could get a bit more speed out of the little 900MHz Celeron M inside it. I've only compiled DOSBOX one other time, years ago, so I needed to relearn how to do it. The wiki and some forum threads here were invaluable. I am a PHP programmer / Linux admin by day, so I have some decent experience with using GCC on a Linux-based server and was at home once things were up and running.

I got DOSBOX CVS from the Wiki, DOSBOX 0.72 source from the main site, MINGW 5.1.4, MSYS 1.0.10, and MsysDTK-1.0.1. I also got nasm.exe and the DirectX headers as instructed in the SDL Win32 guide. I compiled my own SDL with the DDRAW fix and didn't mess with SDL_Net or SDL_Sound.

To figure out CFLAGS/CXXFLAGS, I usually go to the "safe cflags" list in the Gentoo Wiki. I went with "-march=pentium-m -O2 -pipe -fomit-frame-pointer". I also tested with -O3 and -Os. I tried out the --enable-core-inline DOSBOX option too. And, after reading about it in a thread here, I tried out GCC profile guided optimization. Compiled SDL without profile optimization, but with the above flags.

Profile optimization seems like it would add some significant variability to the compilation results, depending on what games you use, how long you run them, what you do in the games, what parts of DOSBOX you use, etc.
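For readers who have not used it, the two-pass profile-guided build described above can be sketched roughly as follows. This is only a sketch under assumptions: the post does not show the exact commands used, the function names pgo_pass1 and pgo_pass2 are made up for illustration, and the -fprofile-generate / -fprofile-use switches are the standard GCC options for this workflow rather than anything confirmed in the post.

```sh
#!/bin/sh
# Sketch of a two-pass profile-guided (PGO) build.
# Nothing is built until one of the functions below is called.

# Base flags quoted in the post.
BASE_FLAGS="-march=pentium-m -O2 -pipe -fomit-frame-pointer"

# GCC's PGO switches: instrument in pass 1, consume the profile in pass 2.
GEN_FLAGS="$BASE_FLAGS -fprofile-generate"
USE_FLAGS="$BASE_FLAGS -fprofile-use"

# Pass 1 (hypothetical helper): build an instrumented binary. Afterwards,
# run the games you care about so the binary writes its .gcda profile files.
pgo_pass1() {
    CFLAGS="$GEN_FLAGS" CXXFLAGS="$GEN_FLAGS" ./configure &&
    make
}

# Pass 2 (hypothetical helper): rebuild the same tree using the profile.
# Assumption: "make clean" leaves the .gcda files in place; verify this
# in your tree before relying on it.
pgo_pass2() {
    make clean &&
    CFLAGS="$USE_FLAGS" CXXFLAGS="$USE_FLAGS" ./configure &&
    make
}
```

Run pgo_pass1 from the DOSBox source directory, play for a while with the resulting binary, then run pgo_pass2. Whatever you do during the profiling run is what the second pass optimizes for, which is where the variability mentioned above comes from.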

References:
http://gentoo-wiki.com/Safe_Cflags
http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/i … _002d64-Options
http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/O … ptimize-Options
http://www.libsdl.org/extras/win32/mingw32/README.txt
CVS compile question(s)
http://www.dosbox.com/wiki/BuildingDOSBox

To test the results, I ran Chris Dial's CBENCH DOS SVGA 3D benchmark and QUAKE TIMEDEMO DEMO2 (in default VGA mode). Both programs showed the same relative performance boost or loss. I realize that there are a whole load of variables to DOSBOX performance, and picking a benchmark is going to be troublesome.

The results:

  • -O2 is best. -O3 and -Os were slower. (CVS)
  • Profile guided optimization can give some definite benefits. The fastest compile I produced was a profile optimized DOSBOX 0.72 with "-march=pentium-m -O2 -pipe -fomit-frame-pointer". I used Crusader: No Regret, Quake, and Dark Forces as the games to profile with, running the latter in FM music mode. (CVS,0.72)
  • Profile guided optimizations produce a smaller executable. Resulting compile with "-march=pentium-m -O2 -pipe -fomit-frame-pointer" and profile optimization with 0.72 was 2.75MB. (CVS,0.72)
  • -O3 caused compilation to fail in the "use profile" recompile. (CVS)
  • DOSBOX's inlined memory cpu core option was not faster. (CVS)
  • The current CVS is somewhat slower than 0.72. The same profile guided compile with the above options was ~30% slower (14.5 vs. 19.5 fps) in the Quake timedemo. However, performance was equal in CBENCH SVGA. I couldn't make a CVS compile run as fast as 0.72 no matter the flags.
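The "~30%" figure depends on which build you take as the baseline. A small sketch of the arithmetic, using the two frame rates quoted above (the helper names pct_slower and pct_faster are made up for illustration):

```sh
#!/bin/sh
# Relative difference between two frame rates, in percent.
# $1 = slower fps, $2 = faster fps.

# How much slower the slow build is, relative to the fast one.
pct_slower() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", (fast - slow) / fast * 100 }'
}

# How much faster the fast build is, relative to the slow one.
pct_faster() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", (fast - slow) / slow * 100 }'
}

pct_slower 14.5 19.5   # prints 25.6
pct_faster 14.5 19.5   # prints 34.5
```

So CVS is about 26% slower than 0.72, or equivalently 0.72 is about 34% faster than CVS; "~30%" sits between the two readings.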

In Quake, my build was ~5% faster than the stock 0.72 exe. This is going to vary though. Tested in full screen with fullresolution=0x0, dynamic core, cycles=max, frameskip=0, output=ddraw, scaler=none, aspect=false.
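To reproduce the test setup, the settings listed above could be collected in a small config file. This is a sketch, not the file actually used: the file name bench.conf and the BENCH_CONF variable are made up, and the section layout ([sdl], [render], [cpu]) is an assumption based on the stock DOSBox config.

```sh
#!/bin/sh
# Write a minimal DOSBox config holding the benchmark settings from the post.
# BENCH_CONF is a hypothetical variable; it defaults to ./bench.conf.
BENCH_CONF="${BENCH_CONF:-bench.conf}"

cat > "$BENCH_CONF" <<'EOF'
[sdl]
fullscreen=true
fullresolution=0x0
output=ddraw

[render]
frameskip=0
aspect=false
scaler=none

[cpu]
core=dynamic
cycles=max
EOF

echo "wrote $BENCH_CONF; start DOSBox with: dosbox -conf $BENCH_CONF"
```

Keeping the settings in one file makes it easier to run every build under identical conditions, which matters when the differences being measured are only a few percent.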

If you'd like to download the executable from my fastest result and check it out:
http://rapidshare.com/files/137358177/DOSBOX_P-M.7z.html (~800KB)

Last edited by swaaye on 2008-08-20, 18:49. Edited 1 time in total.

Reply 1 of 16, by Pickle

User metadata
Rank Member

Interesting read. -fomit-frame-pointer generally helps performance a lot; it did for the GP2X.

Reply 2 of 16, by Qbix

User metadata
Rank DOSBox Author

core-inline doesn't help because you are running a game with the dynamic cpu core.
core-inline inlines the memory read and write functions in the normal cpu core only.
So to test the effect of that you have to set core=normal instead of core=auto.
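One way to run that comparison is to derive a second config that differs only in the core= line. A minimal sketch, assuming the config has a single core= line as in the stock dosbox.conf; set_core is a hypothetical helper, not part of DOSBox:

```sh
#!/bin/sh
# set_core: copy a DOSBox config, forcing the cpu core to a given value.
# $1 = source conf, $2 = destination conf, $3 = core name (e.g. normal)
set_core() {
    sed "s/^core=.*/core=$3/" "$1" > "$2"
}

# Example (file names are placeholders):
#   set_core dosbox.conf normal.conf normal
#   dosbox -conf normal.conf
```

Benchmarking a core-inline build and a regular build with the same core=normal config isolates the effect of the option.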

Water flows down the stream
How to ask questions the smart way!

Reply 3 of 16, by swaaye

User metadata
Rank l33t++
Qbix wrote:

core-inline doesn't help because you are running a game with the dynamic cpu core.
core-inline inlines the memory read and write functions in the normal cpu core only.
So to test the effect of that you have to set core=normal instead of core=auto.

Ok. Thanks for the heads up.

I'm experimenting with GCC 4.3.1 now. Can't get DOSBOX 0.72 to compile with it (errors), but CVS is working.

Reply 4 of 16, by Qbix

User metadata
Rank DOSBox Author

I recall some includes missing for gcc 4.x
Should be fairly simple to port back from the cvs if needed


Reply 5 of 16, by swaaye

User metadata
Rank l33t++

Yeah, I did some file compares and rigged up a 0.72 with some fixed CPP files with the added includes. It worked. The compiler was warning about some sort of typedef misuse too, but it didn't seem to cause anything to fail.

GCC 4.3.1 won't make me a faster EXE though. I tried lots of stuff: the supposedly improved profiled compiling, the vectorizer, etc. It just can't quite beat the 0.72 compile I made above. It also gave me a few non-functional compiles. Profiling + O3 doesn't seem to work right (DOSBOX crashed immediately), and a profiled vectorizer compile crashed too. And I was excited to try a profiled vectorized compile... The render_scalers.cpp file gets vectorized in a ton of places, yet a vectorized compile without profiling is only insignificantly faster than a non-vectorized one. You'd think they could instrument the exe during profile generation to determine which vectorized loops really did perform and drop those that didn't.

Notably, CVS seems to compile fine and without any warnings. 😀

Reply 6 of 16, by wd

User metadata
Rank DOSBox Author

As Qbix already said you don't gain much from compiler optimizations as
(if you're using the dynamic core) the heavy parts are generated (recompiled)
where the compiler does not matter.
Only things like scalers or non-cpu hardware (adlib) benefit from better compiler
optimizations/profiling.

Reply 7 of 16, by swaaye

User metadata
Rank l33t++

Yeah, that's what I'm seeing. The only compiler options that made a tangible improvement were O2 + profile guided optimization + omitted frame pointer with GCC 3.4.5 and DOSBOX 0.72. That gave me a small improvement of about 5% over the 0.72 compile on the DOSBOX site. Everything else, even vectorizing and mfpmath=sse, either did basically nothing or made performance worse.

I haven't tested scalers much at all because a) I don't like them most of the time and b) they are slower than just using ddraw to scale.

I'm trying to understand the code better but obviously it's a major endeavor to get a grasp on the stuff you guys have put together. C++ is easy to read of course, but understanding the hardware emulation is another story.

Reply 8 of 16, by Qbix

User metadata
Rank DOSBox Author

For your information:
DOSBox 0.72 was compiled with GCC 3.4.4

CXXFLAGS=" -s -O3 -fomit-frame-pointer -ffast-math -march=i586 -mtune=i686 -fexpensive-optimizations"
and
--enable-core-inline

-O2 instead of -O3 might be faster. At least I had that feeling a few times, but it is hard to benchmark.


Reply 9 of 16, by wd

User metadata
Rank DOSBox Author

even vectorizing and mfpmath=sse, did basically nothing to improve or even worsen performance.

The dynamic core uses some non-trivial loading/saving of the fpu state which
might not work well together with mmx/sse/whatever code in other parts of
the emulation (it's funny that it works at all *g* ).

Reply 10 of 16, by jal

User metadata
Rank Oldbie
swaaye wrote:

out of the little 900MHz Celeron M inside it

Unless you have overclocked it, it actually runs at 630MHz, since it is underclocked by default.

JAL

Reply 11 of 16, by swaaye

User metadata
Rank l33t++

Actually, on the Eee900 it starts at 900 MHz. There is an option in the BIOS for "power save mode" which puts it at 630 MHz and undervolts everything. In "performance" mode, it starts at 900 MHz and I've found that it'll do 1000 MHz fine. In power save mode, it can only handle a 750 MHz overclock.

Reply 12 of 16, by jal

User metadata
Rank Oldbie
swaaye wrote:

Actually, on the Eee900 it starts at 900 MHz. There is an option in the BIOS for "power save mode" which puts it at 630 MHz and undervolts everything. In "performance" mode, it starts at 900 MHz and I've found that it'll do 1000 MHz fine. In power save mode, it can only handle a 750 MHz overclock.

Ok, that's good to know, as I've got a 900. The wiki on eeeuser.com is quite out of date.

JAL

Reply 13 of 16, by swaaye

User metadata
Rank l33t++

I've found that my Athlon 64 3000+ in my big notebook is a little over 2x faster than this EeePC at 1000 MHz, going by a Quake timedemo, and that's with the same compile optimized for a Pentium M 🤣. Poor little Dothan CPU misses its 2MB cache, I think. The A64 is an old Clawhammer with 1MB L2 @ 1800MHz.

Reply 14 of 16, by jal

User metadata
Rank Oldbie

Sigh... I was being lazy and downloaded the precompiled executable, only to find out it was a Windows EXE (I'm running Linux 😀)

JAL

Reply 15 of 16, by swaaye

User metadata
Rank l33t++

well you'll just have to try to figure out how to build it on there. 😀

Reply 16 of 16, by jal

User metadata
Rank Oldbie

yeah, no doubt I can, but then I have to find out how to install gcc, sdl, and the whole muck, and I was happy to avoid that 😀.

JAL