VOGONS

Reply 520 of 725, by pantercat

Rank Newbie

It works!!

I downloaded DOSBox SVN r4163, applied the latest pixel-perfect patch ([2018-09-22] Alpha 14), and compiled DOSBox with:

patch -p0 <pixel-perfect-alpha14-4163.patch && cd src && ./autogen.sh && ./configure && make

Now I can run dosbox with surfacepp 😀 Thank you

Reply 521 of 725, by Ant_222

Rank Oldbie
pantercat wrote:
It works!!

I downloaded DOSBox SVN r4163, applied the latest pixel-perfect patch ([2018-09-22] Alpha 14), and compiled DOSBox with:

patch -p0 <pixel-perfect-alpha14-4163.patch && cd src && ./autogen.sh && ./configure && make

Now I can run dosbox with surfacepp :) Thank you

I am glad it does, but we should thank aaronp for finding the bug and blame myself for introducing it :-)

Reply 522 of 725, by Yesterplay80

Rank Member

A new ECE build including the fixed patch is available as well. Please check whether it now works as it should.

My full-featured DOSBox SVN builds (without debugger) for Windows: Vanilla DOSBox and DOSBox ECE (Enhanced Community Edition)

Reply 524 of 725, by unther

Rank Newbie
Ant_222 wrote:
unther wrote:

Ant, would you expect your 5x5y algorithm to perform similarly to normal5x?

I know it doesn't, but I do not understand the implementation of the normalnx scalers so I cannot tell you now whence the difference. I will think about it.

Ant_222, just a follow-up to our performance discussion a year ago: I created a patch (attached) that adds a normal4x5y scaler to DOSBox, which uses the same scaling technique as the built-in normal2x/normal3x. Using this normal4x5y scaler along with 'output=surface' is much faster on an RPi3 than using 'output=surfacepp', while producing exactly the same pixel output at 1080p. I don't have an effective way to benchmark the two methods, but normal4x5y can handle the Wolf3D fade effect without stutter on the RPi3, even at 2 or 3 times the cycle count at which surfacepp stutters.

I'm not sure how the normalNx scalers differ from the way surfacepp does scaling, but maybe you'll be able to see something here that'll give you a hint.

Note: I actually created this patch a year ago against r4025 and just never got around to posting it, but it still applies and works with r4163.

Attachments

  • normal4x5y.diff (5.88 KiB, 4 downloads; license: fair use/fair dealing exception)

Reply 527 of 725, by unther

Rank Newbie
krcroft wrote:

This is an excellent addition for those on single-GHz systems (early Pentium 3s, RPi3, etc.) to get as close as possible to crisp pixels on modern displays. Thank you!

Not a problem, but just to clarify for others, because 'normal4x5y' is a fixed scaler, it's only optimal for running 320x200 games on a display with a height of just over 1000 pixels (e.g. 1280x1024, 1680x1050, 1920x1080). For displays with a pixel height of 1200 or more, you'd want to create a patch to add a 'normal5x6y' scaler.
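To make the arithmetic behind those display heights concrete, here is a minimal sketch of how one might pick the factors for a fixed scaler. It is not DOSBox code: `ScaleFactors` and `pick_factors` are hypothetical names, and the rule it uses (largest vertical factor that fits, then the horizontal factor closest to the 4:3 pixel shape) is just one reasonable assumption. For 320x200 shown at 4:3, each pixel should look (4/3) / (320/200) = 1.2 times taller than wide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of DOSBox: integer factors for a fixed
// "normalNxMy"-style scaler.
struct ScaleFactors {
    int x;  // horizontal factor
    int y;  // vertical factor
};

// Takes the largest vertical factor that fits the display height, then the
// horizontal factor closest to y / pixel_height_ratio that still fits the
// display width. pixel_height_ratio says how much taller than wide a source
// pixel should appear (1.2 for 320x200 on a 4:3 picture).
ScaleFactors pick_factors(int src_w, int src_h, int disp_w, int disp_h,
                          double pixel_height_ratio) {
    assert(src_w > 0 && src_h > 0 && pixel_height_ratio > 0.0);
    assert(disp_w >= src_w && disp_h >= src_h);

    const int max_x = disp_w / src_w;  // widest factor that fits
    const int max_y = disp_h / src_h;  // tallest factor that fits
    const int ideal_x =
        static_cast<int>(std::lround(max_y / pixel_height_ratio));
    return ScaleFactors{std::max(1, std::min(ideal_x, max_x)), max_y};
}
```

Under these assumptions, `pick_factors(320, 200, 1920, 1080, 1.2)` yields 4x5 and `pick_factors(320, 200, 1920, 1200, 1.2)` yields 5x6, which agrees with the display heights mentioned above.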

Since I run RetroPie on my RPi3, I just use game-specific configs to set the optimal scaler. I have my RPi3 connected to a 1280x1024 monitor, so for 320x200 games I use 'normal4x5y', and for 640x480 games I use 'normal2x'. (And I just use Ant's pixel-perfect patch on my desktop, which is connected to a 1920x1200 monitor.)

Reply 528 of 725, by unther

Rank Newbie
Ant_222 wrote:

unther, I never understood how those built-in scalers worked. Can you explain, perhaps?

Unfortunately, I don't have any real understanding of how they work either (or really any familiarity with the DOSBox code base). I just used the existing normalNx scalers as a template to create normal4x5y, without delving into how these scalers actually push pixels to the display.

I just took a look now, but the code is hard for me to follow because nested includes and conditional preprocessor directives are used to reduce code duplication. It looks like the actual scaling code is in render_simple.h, which is itself included multiple times by render_templates.h, once in each scaler definition. render_templates.h is in turn included multiple times by render_scalers.cpp, apparently once for each variation of color bit depth.

At its heart, it looks like these scalers just copy a source pixel into a grid/block for output. Here's the definition from render_templates.h for the built-in normal3x and the normal4x5y that I created from it. Note that SCALERFUNC is then inserted into the code included from render_simple.h:

#define SCALERNAME Normal3x
#define SCALERWIDTH 3
#define SCALERHEIGHT 3
#define SCALERFUNC \
line0[0] = P; \
line0[1] = P; \
line0[2] = P; \
line1[0] = P; \
line1[1] = P; \
line1[2] = P; \
line2[0] = P; \
line2[1] = P; \
line2[2] = P;
#include "render_simple.h"
#undef SCALERNAME
#undef SCALERWIDTH
#undef SCALERHEIGHT
#undef SCALERFUNC

#define SCALERNAME Normal4x5y
#define SCALERWIDTH 4
#define SCALERHEIGHT 5
#define SCALERFUNC \
line0[0] = P; \
line0[1] = P; \
line0[2] = P; \
line0[3] = P; \
line1[0] = P; \
line1[1] = P; \
line1[2] = P; \
line1[3] = P; \
line2[0] = P; \
line2[1] = P; \
line2[2] = P; \
line2[3] = P; \
line3[0] = P; \
line3[1] = P; \
line3[2] = P; \
line3[3] = P; \
line4[0] = P; \
line4[1] = P; \
line4[2] = P; \
line4[3] = P;
#include "render_simple.h"
#undef SCALERNAME
#undef SCALERWIDTH
#undef SCALERHEIGHT
#undef SCALERFUNC
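
For readers who find the macros hard to picture, the same "copy one source pixel into a block" idea can be sketched as an ordinary function. This is only an illustration and not how DOSBox implements it: `scale_blocks` is a hypothetical name, it assumes 32-bit pixels in a flat row-major buffer, and it uses run-time loops where the SCALERFUNC definitions are unrolled for one fixed pair of factors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in, not DOSBox code: replicate every source pixel into
// an sx-by-sy block of the output image.
std::vector<std::uint32_t> scale_blocks(const std::vector<std::uint32_t>& src,
                                        std::size_t src_w, std::size_t src_h,
                                        std::size_t sx, std::size_t sy) {
    assert(src.size() == src_w * src_h);
    assert(sx > 0 && sy > 0);

    const std::size_t dst_w = src_w * sx;
    std::vector<std::uint32_t> dst(dst_w * src_h * sy);

    for (std::size_t y = 0; y < src_h; ++y) {
        for (std::size_t x = 0; x < src_w; ++x) {
            const std::uint32_t p = src[y * src_w + x];  // "P" in SCALERFUNC
            for (std::size_t by = 0; by < sy; ++by) {    // line0 .. line(sy-1)
                std::uint32_t* line = &dst[(y * sy + by) * dst_w + x * sx];
                for (std::size_t bx = 0; bx < sx; ++bx) {
                    line[bx] = p;                        // lineN[bx] = P
                }
            }
        }
    }
    return dst;
}
```

Calling `scale_blocks(frame, 320, 200, 4, 5)` on a 320x200 frame would produce the 1280x1000 image that a 4x5 scaler outputs.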

That's as far as I went - hopefully that points you in the right direction.

BTW, if you can't figure out how to get your scaler to run as fast as these built-in ones, another approach might be to just patch in all the scaler variants you need (normal4x5y, normal5x6y, etc.) and just change to them on the fly after you've calculated the optimal one from the PAR. (You can change the scaler on the fly from the command line, so it might be doable. You might not even need surfacepp anymore?)

Reply 529 of 725, by Ant_222

Rank Oldbie

Thanks for the explanation, unther.

unther wrote:

BTW, if you can't figure out how to get your scaler to run as fast as these built-in ones

I do have a couple of ideas. One is to simplify my overcomplicated code by removing surfacenb (which is already available as openglnb) and surfacenp (which is too slow and hardly different from nearest neighbor), and then to analyse the simplified code carefully. The second idea is to parallelise the scaling.

another approach might be to just patch in all the scaler variants you need (normal4x5y, normal5x6y, etc.) and just change to them on the fly after you've calculated the optimal one from the PAR. (You can change the scaler on the fly from the command line, so it might be doable. You might not even need surfacepp anymore?)

Indeed, but this is so ugly that I will let someone else do it :-) I will help with the selection of the optimal scaling factors.

Reply 530 of 725, by Yesterplay80

Rank Member

FYI: r4178 once again breaks compatibility with your patch. Since I had to adapt the modified patch I use anyway, I quickly did the same with the original patch.

Attachments

  • pixel-perfect-alpha14-4178.zip (20.36 KiB, 4 downloads; license: fair use/fair dealing exception)


Reply 532 of 725, by Ant_222

Rank Oldbie
Yesterplay80 wrote:

FYI: r4178 once again breaks compatibility with your patch

I have fixed it in alpha 15 and also introduced a minor change to the implementation. Let me know if it breaks anything.

Reply 533 of 725, by Ant_222

Rank Oldbie
unther wrote:

Ant_222, just a follow-up to our performance discussion a year ago: I created a patch (attached) that adds a normal4x5y scaler to DOSBox, which uses the same scaling technique as the built-in normal2x/normal3x. Using this normal4x5y scaler along with 'output=surface' is much faster on an RPi3 than using 'output=surfacepp', while producing exactly the same pixel output at 1080p. I don't have an effective way to benchmark the two methods, but normal4x5y can handle the Wolf3D fade effect without stutter on the RPi3, even at 2 or 3 times the cycle count at which surfacepp stutters.

I have tested the scaling algorithm from alpha 15 separately from DOSBox:

320x200 -> [4x5] -> 1280x1000
FPS: 438
MPS: 561

320x200 -> [8x10] -> 2560x2000
FPS: 145
MPS: 743

where MPS stands for output megapixels per second. As you can see, the results are more than sufficient even on my ancient PC with an AMD A4-3400 APU. Does anyone have an idea how to determine the bottleneck of this algorithm when it runs as part of DOSBox?
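
For anyone who wants to collect comparable numbers for another scaler, here is a minimal sketch of a standalone measurement. It is not the harness used for the figures above: `megapixels_per_second` and `measure_fps` are hypothetical names, and the sketch assumes MPS is simply FPS times the output pixel count divided by one million, which matches the first result (438 x 1280 x 1000 / 10^6 is about 561).

```cpp
#include <cassert>
#include <chrono>

// Output megapixels per second for a given frame rate and output size.
double megapixels_per_second(double fps, int out_w, int out_h) {
    assert(fps >= 0.0 && out_w > 0 && out_h > 0);
    return fps * static_cast<double>(out_w) * static_cast<double>(out_h) / 1e6;
}

// Hypothetical timing harness: calls scale_frame() `frames` times and
// returns the average number of calls per second of wall-clock time.
template <typename ScaleFrame>
double measure_fps(ScaleFrame scale_frame, int frames) {
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        scale_frame();  // scale one full frame into the output buffer
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return frames / elapsed.count();
}
```

Passing the same callable once on its own and once wrapped around the emulator's output path would be one way to see how much of the time is spent outside the scaling loop, though that is only a suggestion and not something tested here.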

Reply 535 of 725, by Yesterplay80

Rank Member
Ant_222 wrote:
Yesterplay80 wrote:

FYI: r4178 once again breaks compatibility with your patch

I have fixed it in alpha 15 and also introduced a minor change to the implementation. Let me know if it breaks anything.

Nope, seems to work so far. I also updated DOSBox ECE with A15 of your patch, available in ECE r4180.2.
