VOGONS

Reply 440 of 733, by Ant_222

Rank Oldbie
krcroft wrote:

Unther,

Does your build solve the issue mentioned here?

VIDEO Patch for pixel-perfect scaling (SDL1)

My build was based on r2025 and the current pixel perfect patch at the time, and I haven't checked to see if performance has improved since then (I'm running profile-feedback built with gcc 7.x)

You must have mistyped the revision. DOSBox was well over 3000 at the inception of the patch. The patch has become faster, but the speedup may not be sufficient for your situation, so please compile it with -O3 or higher and test. Edit: Implementing pixel-perfect scaling as a pixel shader or via the hardware-accelerated scaling functionality of SDL 2.0 will solve all performance problems.

Last edited by Ant_222 on 2017-09-06, 08:53. Edited 3 times in total.

Reply 441 of 733, by Ant_222

Rank Oldbie
lukeman3000 wrote:
Ant_222 wrote:
Giuliano wrote:

What I thought was wrong was the resulting aspect ratio of the games with 640x400 graphics. But now I see that, for surfacepp to bring those graphics closer to 4:3, it would require a 3200x2400 full screen area.

Yes, except that 3200 x 2400 will give you precisely a 4:3 aspect ratio, whereas 2560 x 2000 will be nearly perfect.

Wouldn't 1600x1200 give him a unity PAR as well as a 4:3 aspect ratio?

Yes, it would, but 640x400 cannot be scaled to this resolution in a pixel-perfect manner:

640 x 400 -> [2.5 x 3] -> 1600 x 1200,

where the horizontal scale 2.5 is not an integer.
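
The check above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the patch; the function names are made up for the example.

```python
from fractions import Fraction

def scale_factors(src_w, src_h, dst_w, dst_h):
    """Return the horizontal and vertical scale factors as exact fractions."""
    return Fraction(dst_w, src_w), Fraction(dst_h, src_h)

def is_pixel_perfect(src_w, src_h, dst_w, dst_h):
    """True when both scale factors are whole numbers."""
    sx, sy = scale_factors(src_w, src_h, dst_w, dst_h)
    return sx.denominator == 1 and sy.denominator == 1

# 640x400 -> 1600x1200 needs a horizontal factor of 5/2, so it is rejected.
print(scale_factors(640, 400, 1600, 1200))
print(is_pixel_perfect(640, 400, 1600, 1200))
# 640x400 -> 3200x2400 uses the whole factors 5 and 6.
print(is_pixel_perfect(640, 400, 3200, 2400))
```

Exact fractions are used instead of floats so that a factor such as 2.5 is never mistaken for a whole number through rounding.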

I was under the impression that 1600x1200 is the lowest resolution for which you will have a truly perfect PAR and aspect ratio for games with native resolution of 320x200 and 1.20 PAR. To my understanding, it is as follows:

For games with native resolution of 320x200 and 1.20 PAR
960x800 = 89% perfect PAR; 92% perfect aspect ratio
1280x1000 = 96% perfect PAR; 99% perfect aspect ratio
1600x1200 = 100% perfect PAR; 100% perfect aspect ratio

Correct.

Assuming I'm understanding all this correctly so far, I am still confused by something.

You first quote my explanation about 640x400 but then switch to 320x200.

Why is 4:3 the correct aspect ratio when the native resolution's aspect ratio (320x200) is 1.6?

Because some games are intended to be displayed using rectangular pixels. Some other games running at 320x200, such as Lure of the Temptress, are designed for square pixels. These should be played without aspect-ratio correction at 1600x1000 or at another proportional resolution.

Reply 442 of 733, by Ant_222

Rank Oldbie
lukeman3000 wrote:

Assuming I'm understanding all this correctly so far, I am still confused by something. Why is 4:3 the correct aspect ratio when the native resolution's aspect ratio (320x200) is 1.6?

Note that 4:3 is the ratio of the width to the height of the image as displayed on the monitor, whereas 1.6 (or 8:5) is the ratio of the pixel dimensions, which in the case of non-square pixels differs from that of the image dimensions. For a PAR of 1.2:

(320 / 200) / 1.2 = 4 / 3
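
The same relation can be written as a small Python sketch. It assumes, as the formula above implies, that PAR means pixel height divided by pixel width; the function names are hypothetical and not taken from the patch.

```python
from fractions import Fraction

def display_aspect(width_px, height_px, par):
    """Width:height of the displayed image; par = pixel height / pixel width (assumed)."""
    return Fraction(width_px, height_px) / Fraction(par)

def par_of_scale(sx, sy):
    """PAR produced by scaling square source pixels by whole factors sx and sy."""
    return Fraction(sy, sx)

# 320x200 with a PAR of 1.2 fills a 4:3 screen.
print(display_aspect(320, 200, Fraction(6, 5)))
# A 5x6 scale (1600x1200) gives exactly 1.2; a 4x5 scale (1280x1000) gives 1.25.
print(par_of_scale(5, 6), par_of_scale(4, 5))
# Square-pixel games keep the 8:5 shape, e.g. 1600x1000.
print(display_aspect(320, 200, 1))
```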

Reply 443 of 733, by Ant_222

Rank Oldbie
unther wrote:

After testing on my desktop, I wanted to get it running on my Raspberry Pi 3 (RPI3). However, I ran into compiling problems related to OpenGL calls. From what I understand, the RPI3 GPU driver doesn't support standard OpenGL, so SDL functions related to OpenGL aren't available.

Normally, when compiling dosbox for use on an RPI3, the lack of OpenGL is worked around by passing "--disable-opengl" to the dosbox configure script, which uses the C_OPENGL preprocessor macro to disable OpenGL calls. Unfortunately, your new code for handling opengl surfaces is not wrapped in C_OPENGL #ifdefs, so it's not being disabled with "--disable-opengl" and hence causes compile-time errors.

I will fix it and ask you to test the patch again.

Reply 444 of 733, by Ant_222

Rank Oldbie

unther, please test whether the conditional compilation of OpenGL support is correct in the attached patch.

Attachments

  • openglfix.patch (84.61 KiB, 69 downloads; file license: Fair use/fair dealing exception)

Reply 445 of 733, by unther

Rank Newbie
krcroft wrote:

Unther,

Does your build solve the issue mentioned here?

VIDEO Patch for pixel-perfect scaling (SDL1)

krcroft, unfortunately the wolf3d menu fade effect is still causing audio stutter on my build when using output=surfacepp. The game itself seems to run fine though as long as you fix the cycles to around 4000. (At around 5000 cycles and higher the dosbox process starts to max out one of the cores.)

I was curious about this, so I did a bit more testing. The first thing I noticed was that even when using output=surface, my build of dosbox was far slower than the RetroPie packaged dosbox (i.e. my dosbox build would max out a CPU core at a fixed 10000 cycles with wolf3d, whereas the RetroPie dosbox binary could do 20000 cycles or higher). From looking at the RetroPie dosbox build script (https://github.com/RetroPie/RetroPie-Setup/bl … ators/dosbox.sh), I noticed the following changes were being made to config.h:

    if isPlatform "arm"; then
        # enable dynamic recompilation for armv4
        sed -i 's|/\* #undef C_DYNREC \*/|#define C_DYNREC 1|' config.h
        if isPlatform "armv6"; then
            <snip>
        else
            sed -i 's/C_TARGETCPU.*/C_TARGETCPU ARMV7LE/g' config.h
            sed -i 's|/\* #undef C_UNALIGNED_MEMORY \*/|#define C_UNALIGNED_MEMORY 1|' config.h
        fi
    fi

After making those three changes to my config.h and rebuilding, my dosbox build performance was roughly doubled when using output=surface (to basically the same performance as the original RetroPie build). However, these config.h changes had no noticeable effect when using output=surfacepp - it was still limited to about 4000 cycles and still had audio stuttering during fade effects.

My next thought was that maybe the RPI3's CPU simply can't handle drawing a 1280x1000 surface in a single thread along with all the other emulation demands of dosbox. The default RetroPie dosbox config just draws the original 320x200 surface and lets the hardware scaler do the rest of the work, so using output=surfacepp on a 1080p display is by comparison drawing 20x (4x5y) the number of pixels. I thought a good way to confirm this was to patch the normal5x scaler into my build and see if it had similar performance issues. (As far as I know, normal5x is done purely in software and would actually draw even more pixels than the previous test: 25x vs 20x.)

To my surprise, the normal5x scaler runs wolf3d perfectly fine on an RPI3 without any audio stutter at all. I was even able to bump the cycles up to 15000+ without maxing out a CPU core. I'm not sure why surfacepp's 4x5y scaling would run so much slower than normal5x's 5x5y scaling - is this even a valid comparison? Ant, would you expect your 5x5y algorithm to perform similarly to normal5x?
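
For readers following the 20x vs 25x comparison, here is a deliberately naive Python sketch of integer scaling by pixel replication. It is not the surfacepp routine and not DOSBox's normal5x code (neither is shown in this thread); it only illustrates why the output pixel count grows by the product of the two factors.

```python
def scale_integer(frame, sx, sy):
    """Repeat every pixel sx times horizontally and every row sy times vertically."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(sx)]
        for _ in range(sy):
            out.append(list(wide))
    return out

def pixel_multiplier(sx, sy):
    """Output pixels written per source pixel."""
    return sx * sy

frame = [[1, 2], [3, 4]]               # a tiny 2x2 stand-in for a 320x200 frame
scaled = scale_integer(frame, 2, 3)
print(len(scaled[0]), len(scaled))     # width and height of the scaled frame
print(pixel_multiplier(4, 5), pixel_multiplier(5, 5))
```

By pixel count alone, 5x5 should cost more than 4x5, which is why the measured result is surprising.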

Reply 446 of 733, by unther

Rank Newbie
Ant_222 wrote:

unther, please test whether the conditional compilation of OpenGL support is correct in the attached patch.

Ant, the attached patch works correctly with --disable-opengl - thanks!

Reply 447 of 733, by Ant_222

Rank Oldbie
unther wrote:

Ant, would you expect your 5x5y algorithm to perform similarly to normal5x?

I know it doesn't, but I do not understand the implementation of the normalnx scalers so I cannot tell you now whence the difference. I will think about it.

Reply 448 of 733, by Ant_222

Rank Oldbie

I have measured the performance of my scaling routine on my PC with AMD A4 3400 at the scaling of a 320x200 image to 1600x1200. Here are the results for different optimisation levels:

-O0: 134 fps
-O1: 201 fps
-O2: 395 fps
-O3: 504 fps

They look pretty good for practical purposes. I wonder why it works slower when compiled as part of DOSBox. Whosoever shall desire to try the test on his own machine, let him download the source from the attachment to this post. The program outputs the result in this form:

Scaling 320x200 -> 1600x1200 at 504 fps.

Attachments

  • pixelscale-test.zip (52.64 KiB, 367 downloads; file license: Fair use/fair dealing exception)

Reply 449 of 733, by unther

Rank Newbie
Ant_222 wrote:

Whosoever shall desire to try the test on his own machine, let him download the source from the attachment to this post.

Ant,

I'm getting the following results with your test program running on a Raspberry Pi 3 (Quad-core ARM Cortex-A53 @ 1.2 GHz):

    -O0: Scaling 320x200 -> 1600x1200 at 24 fps.
    -O1: Scaling 320x200 -> 1600x1200 at 31 fps.
    -O2: Scaling 320x200 -> 1600x1200 at 32 fps.
    -O3: Scaling 320x200 -> 1600x1200 at 32 fps.

I think Dosbox normally wants to render at 60 or 70 fps (depending on the video mode), so I guess that explains the slowdowns on full-screen fade effects.
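
A rough Python sketch of that shortfall, assuming every frame is redrawn in full (as during a full-screen fade) and using the 1600x1200 target of the test program; the helper name is made up for the example.

```python
def pixels_per_second(width, height, fps):
    """Output pixels the scaler must write each second at a given frame rate."""
    return width * height * fps

measured = pixels_per_second(1600, 1200, 32)   # the -O2/-O3 result above
needed_60 = pixels_per_second(1600, 1200, 60)
needed_70 = pixels_per_second(1600, 1200, 70)

print(measured, needed_60, needed_70)
print(needed_60 / measured, needed_70 / measured)   # required speedup factors
```

Under these assumptions the routine would need to run roughly twice as fast on this board to keep up with full-screen changes.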

Reply 450 of 733, by ZakMcKracken

Rank Newbie

Hi, I recently searched for better rendering output and stumbled on this thread. I have been building my own debug version for Linux (Arch) from svn for some time now (http://svn.code.sf.net/p/dosbox/code-0/dosbox rev. 4052); patch 14 applied OK and looks beautiful!

Are there any plans to integrate this patch into some other builds (or even official ones; is anyone there still alive? 😵)?
I'm dreaming of a working git repo with continuous builds in the future; the only ones I found did not want to compile because of reasons 😢

Running the benchmark yields the following results on my Core i5-4690 (3.9 GHz Turbo) (had to append the -lm flag in your Makefile or it would not build)

-O0 668 fps
-O1 902 fps
-O2 1506 fps (3 run average)
-O3 1322 fps (3 run average)

so building dosbox with default -O2 it is 😁

Thanks again for this great scaler!

EDIT:
I also have a Raspberry Pi 3 (arch armv7 kernel, stock clock of 1.2GHz); benchmarks:

-O0 43 fps
-O1 74 fps
-O2 80 fps
-O3 81 fps

Reply 451 of 733, by Ant_222

Rank Oldbie
ZakMcKracken wrote:

Are there any plans integrating this patch to some other builds (or even official ones, is anyone there still alive?)

I should be glad if there were and should help all I could.

Running the benchmark yields the following results on my Core i5-4690 (3.9 GHz Turbo) (had to append the -lm flag in your Makefile or it would not build)

What does the flag do?—I did not find it in the documentation.

-O0 668 fps
-O1 902 fps
-O2 1506 fps (3 run average)
-O3 1322 fps (3 run average)

so building dosbox with default -O2 it is :-D

Strange deterioration—seems to depend upon the CPU type. But your numbers are heart-warming.

Also got a Raspberry Pi 3 (arch armv7 kernel, stock clock of 1.2GHz), benchmarks:

-O0 43 fps
-O1 74 fps
-O2 80 fps
-O3 81 fps

What makes your RPi so much faster than unther's?

Everybody is welcome to help me optimise the scaling algorithm. Should anybody know how the built-in normalnx scalers work and why they perform many times better than my routine, I will be grateful for an explanation.

Reply 452 of 733, by ZakMcKracken

Rank Newbie
Ant_222 wrote:

What does the flag do?—I did not find it in the documentation.

When searching for why some math.h functions were causing build errors, I only found:
"That's a linker option. It tells the linker to link with (-l) the m library (libm.so/dll). That's the math library. You often need it if you #include <math.h>."
So no idea why it's needed on my setup 😒

What makes your RPi so much faster than unther's?

Arch Linux has very recent kernels (maybe with optimizations for armv7 cores?). I am also running with the VC4 video core enabled, so maybe it's offloading some work from the CPU and freeing up resources... But dosbox is not very fast to begin with: benchmarking Doom and Quake (super small window benchmarks using Phil's Computer Lab benchmark package) shows that it's less than 1/2 the speed, ugh.

BUT on my Core i5 it's actually faster:

normal3x / overlay stock 0.74 dosbox
Doom (35*gameticks/realticks formula): 93 fps
Quake: 31.1 fps
PC Player Benchmark (640x480): 43.2

surface/no aspect and scaler none stock 0.74 dosbox build (fastest code path?):
Doom: 139 fps
Quake: 41.5 fps
PC Player: 53 fps

surfacepp:
Doom: 121 fps
Quake: 39.2
PC Player: 46.1 fps

So it's a faster renderer for me 😎, almost as fast as doing nothing at all to the picture. Did you by any chance write code that is super optimized for some SSE instruction set by accident? 🤣
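
For reference, the Doom figures above come from the timedemo formula quoted in the first list: Doom runs at 35 game tics per second and reports how many real tics the demo took. A minimal Python sketch follows; the tic counts in the example are hypothetical, not taken from these runs.

```python
TICRATE = 35  # Doom game tics per second

def doom_fps(gametics, realtics):
    """Average frame rate of a Doom timedemo: 35 * gametics / realtics."""
    return TICRATE * gametics / realtics

# Hypothetical timedemo output: "timed 2134 gametics in 800 realtics"
print(round(doom_fps(2134, 800), 1))
```

A demo that plays back in exactly real time (gametics equal to realtics) therefore scores 35 fps.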

Reply 453 of 733, by Ant_222

Rank Oldbie

Thanks for the testing, Zak. I am glad my patch performs well on your machine. To make the results more accurate, could you use surface instead of overlay and make sure that the scaler uses the same scale as the pixel-perfect mode, because comparing normal3x with, say, 4x5 pixel-perfect scaling is wrong without the introduction of a correction coefficient of 9/20. In other words, we are interested in pixels per second, rather than frames per second.
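
The correction can be sketched in Python as below. It assumes the scaling cost is proportional to the number of output pixels and ignores emulation overhead; the names and the 100 fps figure are illustrative only.

```python
from fractions import Fraction

def correction_coefficient(scale_a, scale_b):
    """Ratio of output pixels per source pixel for scale_a relative to scale_b."""
    return Fraction(scale_a[0] * scale_a[1], scale_b[0] * scale_b[1])

def pixels_per_second(src_w, src_h, scale, fps):
    """Output pixels written per second when scaling src_w x src_h by (sx, sy)."""
    sx, sy = scale
    return (src_w * sx) * (src_h * sy) * fps

k = correction_coefficient((3, 3), (4, 5))
print(k)  # 9/20

# Hypothetical: normal3x at 100 fps does the same pixel work as 4x5 scaling at 45 fps.
print(pixels_per_second(320, 200, (3, 3), 100))
print(pixels_per_second(320, 200, (4, 5), 100 * k))
```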

Reply 454 of 733, by ZakMcKracken

Rank Newbie
Ant_222 wrote:

To make the results more accurate, could you use surface instead of overlay and make sure that the scaler uses the same scale as the pixel-perfect mode

Here you go. I also included an adjusted PC Player benchmark with 2x2 scale / unscaled for raw throughput comparison.

surface normal3x aspect=false
Doom 133fps
Quake 40.6
PC Player 52.6 fps (runs at 640x480 and is not scaled)

scalerpp windowsize=960x600
Doom 130 fps (320x200 -> 960x600) 3x3
Quake 39.7 (320x200 -> 960x600) 3x3
Pc Player 50.0 (640x480 -> 640x480) 1x1

surface normal2x forced
PC Player 50.1 fps (1280x960)

scalerpp windowsize=desktop (1920x1080)
PC Player 46 fps 640x480 --> 1280x960 2x2 (same resulting scale and result as in my previous run)

Reply 456 of 733, by ZakMcKracken

Rank Newbie
Ant_222 wrote:

Those are excellent results, Zak, thank you. I still fail to understand, however, why on unther's RPi wolf3d works fast with normal5x but stutters with surfacepp...

Playing around with the CFLAGS and -O3, I only seem to get decent performance when using some form of opengl output so far, but I didn't patch anything for sdl2 or similar. I have to check what the best output for stock dosbox is first: performance is all over the place, and overlay either completely locks up dosbox or is so slow that I never see the start of any benchmark...

Reply 459 of 733, by Yesterplay80

Rank Oldbie
Ant_222 wrote:

This is what I feared. I will look into it at the weekend. Edit: Yesterplay80 has managed to apply it to 4063.

Actually, not that much has changed. It's just a matter of replacing all instances of "#if (HAVE_DDRAW_H) && defined(WIN32)" with "#if C_DDRAW". That's because of the change introduced in r4056 that moved the ddraw detection to a configure option. Apart from that, only two line definitions for dosbox.cpp changed, but the patch command should get around that by itself. However, here's a fixed patch that works flawlessly with r4063:

pixel-perfect-alpha14_fixed.diff (84.22 KiB, 72 downloads; file license: Fair use/fair dealing exception)

My full-featured DOSBox SVN builds for Windows & Linux: Vanilla DOSBox and DOSBox ECE (Google Drive Mirror)