Ant, would you expect your 5x5y algorithm to perform similarly to normal5x?
I know it doesn't, but I do not understand the implementation of the normalnx scalers so I cannot tell you now whence the difference. I will think about it.
Ant_222, just a follow-up to our performance discussion a year ago: I created a patch (attached) that adds a normal4x5y scaler to dosbox, which uses the same scaling technique as the built-in normal2x/normal3x. Using this normal4x5y scaler along with 'output=surface' is much faster on an RPi3 than using 'output=surfacepp', while having exactly the same pixel output @1080p. I don't have an effective way to benchmark the two methods, but normal4x5y can handle the Wolf3D fade effect without stutter on the RPi3, even at 2 or 3 times the cycle count that surfacepp stutters at.
I'm not sure how the normalNx scalers differ from the way surfacepp does scaling, but maybe you'll be able to see something here that'll give you a hint.
Note: I actually created this patch a year ago against r4025 and just never got around to posting it, but it still applies and works with r4163.
This is an excellent addition for those on single-GHz systems (early Pentium 3's, RPi3, etc.) to get as close as possible to crisp pixels on modern displays. Thank you!
Not a problem, but just to clarify for others, because 'normal4x5y' is a fixed scaler, it's only optimal for running 320x200 games on a display with a height of just over 1000 pixels (e.g. 1280x1024, 1680x1050, 1920x1080). For displays with a pixel height of 1200 or more, you'd want to create a patch to add a 'normal5x6y' scaler.
Since I run RetroPie on my RPI3, I just use game-specific configs to set the optimal scaler. I have my RPI3 connected to a 1280x1024 monitor, so for 320x200 games I'll use 'normal4x5y', and for 640x480 games I'll use 'normal2x'. (And I just use Ant's pixel perfect on my desktop connected to a 1920x1200 monitor.)
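As a rough illustration of how the "optimal" fixed scaler could be chosen for a given mode and display, here is a minimal sketch. It is not the selection logic of Ant's patch or of DOSBox: `pick_scale`, `ScaleFactors` and the `pixel_aspect` parameter are hypothetical names, and the rule (take the largest vertical factor that fits, then the horizontal factor nearest to the target pixel aspect) is a simplifying assumption. For 320x200 shown at 4:3, each source pixel should be 1.2 times as tall as it is wide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type: integer scale factors for a fixed normalNxMy-style scaler.
struct ScaleFactors {
    int sx;  // horizontal factor
    int sy;  // vertical factor
};

// pixel_aspect is the desired height/width ratio of one source pixel
// (1.2 for 320x200 on a 4:3 display, 1.0 for square-pixel modes such as 640x480).
ScaleFactors pick_scale(int src_w, int src_h, int disp_w, int disp_h,
                        double pixel_aspect) {
    assert(src_w > 0 && src_h > 0 && pixel_aspect > 0.0);
    const int max_sx = disp_w / src_w;  // largest factor that fits horizontally
    const int max_sy = disp_h / src_h;  // largest factor that fits vertically
    if (max_sx < 1 || max_sy < 1) return {1, 1};  // image does not fit: no upscaling

    // Start from the tallest image that fits, then match the aspect ratio.
    int sy = max_sy;
    int sx = static_cast<int>(std::lround(sy / pixel_aspect));
    sx = std::max(1, std::min(sx, max_sx));

    // If the width was the limiting dimension, shrink the height to match it.
    sy = std::max(1, std::min(sy, static_cast<int>(std::lround(sx * pixel_aspect))));
    return {sx, sy};
}
```

With these assumptions, `pick_scale(320, 200, 1920, 1080, 1.2)` and `pick_scale(320, 200, 1280, 1024, 1.2)` both give 4x5, `pick_scale(320, 200, 1920, 1200, 1.2)` gives 5x6, and `pick_scale(640, 480, 1280, 1024, 1.0)` gives 2x2, which agrees with the scalers named above.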
unther, I never understood how those built-in scalers worked. Can you explain, perhaps?
Unfortunately, I don't have any real understanding of how they work either (or really any familiarity with the dosbox code base). I just used the existing normalNx scalers as a template to create a normal4x5y without delving into how these scalers actually push pixels to the display.
I just took a look now, but the code is hard for me to follow because nested includes and conditional preprocessor directives are used to reduce code duplication. It looks like the actual scaling code is in render_simple.h, which is itself included multiple times by render_templates.h, once in each scaler definition. render_templates.h is in turn included multiple times by render_scalers.cpp, apparently once for each variation of color bit depth.
At its heart, it looks like these scalers just copy a source pixel into a grid/block for output. Here's the definition from render_templates.h for the built-in normal3x and the normal4x5y that I created from it. Note that SCALERFUNC is then inserted into the code included from render_simple.h.
That's as far as I went - hopefully that points you in the right direction.
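To make the "copy a source pixel into a grid/block" idea concrete, here is a minimal standalone sketch of a fixed-factor block scaler. It is not the DOSBox code and does not use SCALERFUNC or the render_simple.h machinery: `scale_block` and its template parameters `SX` and `SY` are hypothetical names, and 32-bit pixels in a flat vector are an assumption made for brevity. The factors are compile-time constants, as they would be for a scaler generated from a template.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-factor scaler: every source pixel becomes an SX-by-SY block.
template <int SX, int SY>
std::vector<std::uint32_t> scale_block(const std::vector<std::uint32_t>& src,
                                       int w, int h) {
    assert(w > 0 && h > 0);
    assert(src.size() == static_cast<std::size_t>(w) * h);

    const std::size_t out_w = static_cast<std::size_t>(w) * SX;
    std::vector<std::uint32_t> dst(out_w * h * SY);

    for (int y = 0; y < h; ++y) {
        // First output row belonging to source row y.
        std::uint32_t* first = &dst[static_cast<std::size_t>(y) * SY * out_w];

        // Expand the source row horizontally: each pixel is written SX times.
        for (int x = 0; x < w; ++x) {
            const std::uint32_t p = src[static_cast<std::size_t>(y) * w + x];
            for (int i = 0; i < SX; ++i) first[x * SX + i] = p;
        }

        // Duplicate the expanded row SY - 1 more times.
        for (int j = 1; j < SY; ++j)
            std::copy(first, first + out_w, first + j * out_w);
    }
    return dst;
}
```

A 4x5 instance would be called as `scale_block<4, 5>(frame, 320, 200)`, producing a 1280x1000 image. Whether fixed factors like these explain the speed difference from surfacepp is not established here; the sketch only shows the replication pattern.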
BTW, if you can't figure out how to get your scaler to run as fast as these built-in ones, another approach might be to just patch in all the scaler variants you need (normal4x5y, normal5x6y, etc.) and just change to them on the fly after you've calculated the optimal one from the PAR. (You can change the scaler on the fly from the command line, so it might be doable - you might not even need surfacepp anymore?)
BTW, if you can't figure out how to get your scaler to run as fast as these built-in ones
I do have a couple of ideas. One is to simplify my overcomplicated code by removing surfacenb (which is already available as openglnb) and surfacenp (which is too slow and hardly different from nearest neighbor), and then to analyse the simplified code carefully. The second idea is to parallelise the scaling.
another approach might be to just patch in all the scaler variants you need (normal4x5y, normal5x6y, etc.) and just change to them on the fly after you've calculated the optimal one from the PAR. (You can change the scaler on the fly from the command line, so it might be doable - you might not even need surfacepp anymore?)
Indeed, but this is so ugly that I will let someone else do it :-) I will help with the selection of the optimal scaling factors.
FYI: r4178 once again breaks compatibility with your patch. Since I had to adapt the changes for the modified patch I use, I quickly did the same for the original patch.
The attachment pixel-perfect-alpha14-4178.zip is no longer available
Ant_222, just a follow-up to our performance discussion a year ago: I created a patch (attached) that adds a normal4x5y scaler to dosbox, which uses the same scaling technique as the built-in normal2x/normal3x. Using this normal4x5y scaler along with 'output=surface' is much faster on an RPi3 than using 'output=surfacepp', while having exactly the same pixel output @1080p. I don't have an effective way to benchmark the two methods, but normal4x5y can handle the Wolf3D fade effect without stutter on the RPi3, even at 2 or 3 times the cycle count that surfacepp stutters at.
I have tested the scaling algorithm from alpha 15 separately from DOSBox:
where MPS stands for output megapixels per second. As you see, the results are more than sufficient even on my ancient PC with AMD A4-3400 APU. Does anyone have an idea how to determine the bottleneck of this algorithm when it works as part of DOSBox?
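For anyone who wants to reproduce this kind of standalone measurement, here is a minimal sketch of how an output-megapixels-per-second figure could be computed. It is not the harness used for the numbers above: `megapixels_per_second` and `measure_mps` are hypothetical names, and `scale_frame` stands for whatever callable scales one frame.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Output megapixels per second for `frames` frames of out_w x out_h pixels.
double megapixels_per_second(std::size_t frames, std::size_t out_w,
                             std::size_t out_h, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(frames) * out_w * out_h / seconds / 1e6;
}

// Times `frames` calls of scale_frame() with a monotonic clock.
template <typename ScaleFrame>
double measure_mps(ScaleFrame scale_frame, std::size_t frames,
                   std::size_t out_w, std::size_t out_h) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < frames; ++i) scale_frame();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    if (elapsed.count() <= 0.0) return 0.0;  // too fast to measure reliably
    return megapixels_per_second(frames, out_w, out_h, elapsed.count());
}
```

For example, 100 frames of 1280x1000 output scaled in exactly one second correspond to 128 MPS. A timing like this covers only the scaling loop itself, so it says nothing about time spent elsewhere when the scaler runs inside DOSBox.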