VOGONS

Graphics performance boost


First post, by Kronuz

User metadata
Rank Member

Hello everybody. This is my first patch for DOSBox, and it significantly improves the performance of the scalers.
What I did was modify the scalers so that they update only the parts of the screen that actually changed since the last time they were processed.
This has a very large positive impact on some of the scalers and gives a clearly noticeable overall performance boost.
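
For readers who want to see the general idea, here is a minimal sketch of "only rescale what changed". It is not code from the patch: the struct name, the per-line granularity and the 8-bit pixel format are all assumptions made for illustration, and the real patch tracks changed regions in its own way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not DOSBox code: a Normal2x-style scaler that only
// rescales source lines whose contents differ from the previous frame.
struct DirtyLineScaler2x {
    std::size_t width, height;
    std::vector<std::uint8_t> cache;   // copy of the last frame seen
    std::vector<std::uint8_t> output;  // 2x-scaled image (2w x 2h)
    bool primed = false;               // false until the first frame is drawn

    DirtyLineScaler2x(std::size_t w, std::size_t h)
        : width(w), height(h), cache(w * h), output(4 * w * h) {}

    // Returns the number of source lines that were actually rescaled.
    std::size_t update(const std::uint8_t* src) {
        assert(src != nullptr);
        std::size_t scaled = 0;
        for (std::size_t y = 0; y < height; ++y) {
            const std::uint8_t* line = src + y * width;
            std::uint8_t* cached = cache.data() + y * width;
            if (primed && std::memcmp(line, cached, width) == 0)
                continue;                      // unchanged line: skip the work
            std::memcpy(cached, line, width);  // remember the new contents
            std::uint8_t* out0 = output.data() + (2 * y) * (2 * width);
            std::uint8_t* out1 = out0 + 2 * width;
            for (std::size_t x = 0; x < width; ++x) {
                // Each source pixel becomes a 2x2 block in the output.
                out0[2 * x] = out0[2 * x + 1] = line[x];
                out1[2 * x] = out1[2 * x + 1] = line[x];
            }
            ++scaled;
        }
        primed = true;
        return scaled;
    }
};
```

The first call to `update` rescales every line. Later calls only touch lines that differ from the cached frame, which is why a mostly static screen costs so little and a busy one (like Doom) gains less.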

These are some numbers comparing the performance of the scalers before and after the patch (version 4 of the patch):

Scaler         Old        New     Imp.
Normal         127.3      67      90%
Normal2x       356        69      416%
AdvMame2x      825.5      72      1047%
AdvMame3x      1665       77      2062%
AdvInterp2x    841        72      1068%
Interp2x       868        72      1106%
TV2x           413.5      71      482%
Hq2x           10480      109     9515%
TVHq2x         10694      110     9622%
* timings are from the high-resolution performance counter
averaged over about 5000 frames and divided by 1,000,000.
(A 100% improvement is equivalent to twice the speed.)
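
As a sketch of how the "Imp." column relates to the two timings (the function name is made up for illustration), the improvement is the old time divided by the new time, minus one, expressed as a percentage:

```cpp
#include <cassert>
#include <cmath>

// Improvement as used in the tables: how much faster the new code is.
// 100% means the new time is half the old one (twice the speed).
double improvement_percent(double old_time, double new_time) {
    assert(old_time > 0.0 && new_time > 0.0);
    return (old_time / new_time - 1.0) * 100.0;
}
```

For example, `improvement_percent(127.3, 67)` is about 90, matching the Normal row, and `improvement_percent(356, 69)` is about 416, matching Normal2x.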

Of course, these numbers are for low-action games. Others, like DOOM, can give different numbers under stress:

Scaler         Old        New     Imp.
Normal         130        79      65%
Normal2x       354        104     239%
AdvMame2x      1146       191     500%
TVHq2x         10890      959     1005%

(All my tests were run without frameskip at 8000 cycles on a 3 GHz P4, using the MS VC 7.1 compiler.)

You can also get the patch at SourceForge under the name "Scalers performance boost". The patch also includes the Hq2x scaler by Moe (now optimized as well), a Normal3x scaler, and modified versions of the Normal, Normal2x, AdvMame2x and Hq2x scalers with TV scanlines.

To recap: in the latest version of the patch (version 8), the Hq2x scaler went down from 109 to 89 in the numbers above, so I leave the math for the other scalers to you as an exercise 😉


Version History:
+ Version 11b (still RC7) Small bugfix on OpenGL code.
+ Version 11 (RC7) TV mode added to the optimized scalers.
+ Version 10 (RC6) double buffer not supported; fixed minor bugs.
+ Version 9b (RC5) just added the exposure event
+ Version 9 (RC5) fixed the aspect correction, improved screen updates
+ Version 8 (RC4) updated region detection speed improvements
+ Version 7 (RC3) implemented gulikoza's suggestions
+ Version 6 (RC2) added modified-chunks "prediction"
+ Version 5 (RC1) fixes the toggle fullscreen/windowed issue.
+ Version 4 fixes the Warcraft 2 & other games issue
+ Version 3 fixes some artifacts and improves speed
+ Version 2 greatly improves the speed


!!! ATTENTION !!!
This (RC7) is now, hopefully, truly the last release candidate. This is a call for everyone to stress-test the patch before it's officially released, so that I can move on to new areas of DOSBox optimization (if I can find some extra time). I'll be around on #dosbox.

(Note: my patch being officially released does NOT mean it will be in CVS yet, if it ever makes it into CVS at all.)

Kronuz.

Attachments

  • dosbox-optscalers-20051213.diff (77.5 KiB, 217 downloads): Optimized Scalers v.11b (still RC7)
  • dosbox-optscalers-20051205b.diff (77.5 KiB, 100 downloads): Optimized Scalers v.11 (RC7)
  • dosbox-optscalers-20051203b.diff (81.93 KiB, 117 downloads): Optimized Scalers v.10 (RC6)
  • dosbox-optscalers-20051202.diff (78.4 KiB, 96 downloads): Optimized Scalers v.9b (RC5)
  • dosbox-optscalers-20051201.diff (77.75 KiB, 103 downloads): Optimized Scalers v.9 (RC5)

File license for all attachments: Fair use/fair dealing exception
Last edited by Kronuz on 2005-12-13, 18:17. Edited 17 times in total.

Kronuz
"Time is of the essence"

Reply 2 of 227, by `Moe`

User metadata
Rank Oldbie

Especially for Hq2x, more optimization is possible. I had to remove some of it in order to accommodate the latest render changes.

QBix, now that it seems like DRAW_PARTS is not exactly needed (see the discussion about it elsewhere), could scalers do their own loop over the source framebuffer again, as it used to be (instead of being called once per scanline)? Optimizing the render loop for Hq2x makes quite a big difference.

Reply 6 of 227, by Kronuz

User metadata
Rank Member

Okay, this just keeps getting better and better... The new version of the patch is ready. It has many improvements, and I fixed some glitches and a small bug.
The new version gives (once again) almost twice the speed of the previous patch 😀

There are still some issues with the palette (I noticed them in Warcraft 2). It's nothing big, just a missing update for some weird reason. Anyway, I feel we're getting close to having the patch at its best 😉
In this patch I worked on improving the None scaler and higher resolutions. I gained an extra boost for all scalers as well, and the command line timing dropped from 740 with the original code to 420, so it's now a 76% improvement, compared with the initial gain of 13% and the 40% of the second version of the patch.

I fixed the <windows.h> issue and updated the patch in the first post of my topic. Please test it, if you can, and report your results; if you see a bug, or something is not working properly, please let me know so I can fix it.

Last edited by Kronuz on 2005-11-24, 18:18. Edited 1 time in total.

Kronuz
"Time is of the essence"

Reply 8 of 227, by Kronuz

User metadata
Rank Member

What are the differences between all the versions of the patches?
I've run the benchmarks again and here are the numbers:

Platform games & Warcraft 2
Test           unopt      v1      v2      v3
640x480        620        614     374     275
Prompt         740        670     538     426
Normal         126        139     92      67
Normal2x       362        134     95      69
AdvMame2x      646        142     97      72
Hq2x           10676      217     128     109

For games where movement is high and constant, like with Doom:
Test           unopt      v1      v2      v3
Normal         130        146     104     79
Normal2x       353        166     128     104
AdvMame2x      1146       253     213     191
Hq2x           10600      1160    1019    959

* I used 8000 fixed cycles and no frameskip for all tests, on a P4 @ 3 GHz, using the MS VC 7.1 compiler.

** Applying the timesync patch improves the speed by 30% in my test with Doom.

Reply 12 of 227, by Kronuz

User metadata
Rank Member

Oh, I forgot to take out the <windows.h>....
I'm using it to profile the optimizations (QueryPerformanceCounter())...
it's safe just to delete that include... sorry 'bout that.
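
For anyone who wants to repeat this kind of measurement without pulling in <windows.h>, here is a rough sketch of a section timer. It is an assumption-laden stand-in, not what the patch uses: it swaps QueryPerformanceCounter() for the portable `std::chrono::steady_clock`, and the class name `SectionTimer` is made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical profiling helper: accumulates the time spent in a section of
// code (e.g. one scaler pass per frame) and reports the mean per sample.
class SectionTimer {
public:
    using Clock = std::chrono::steady_clock;

    void begin() {
        running_ = true;
        start_ = Clock::now();
    }

    void end() {
        assert(running_);  // end() must follow a begin()
        total_ += Clock::now() - start_;
        running_ = false;
        ++samples_;
    }

    std::uint64_t samples() const { return samples_; }

    // Mean duration per sample in nanoseconds; 0 when nothing was measured.
    double mean_ns() const {
        if (samples_ == 0) return 0.0;
        return std::chrono::duration<double, std::nano>(total_).count() /
               static_cast<double>(samples_);
    }

private:
    Clock::time_point start_{};
    Clock::duration total_{};
    std::uint64_t samples_ = 0;
    bool running_ = false;
};
```

Calling `begin()` before the scaler runs and `end()` after it, once per frame, and reading `mean_ns()` after a few thousand frames gives an average comparable in spirit to the numbers posted in this thread (the units will differ).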

I have posted the fixed patch now.

Last edited by Kronuz on 2005-11-24, 18:21. Edited 1 time in total.

Reply 13 of 227, by Qbix

User metadata
Rank DOSBox Author

mmm. I don't know exactly which parts are your doing.
Let's see:
1) changing all scalers to start with a capital is a cosmetic change. So you don't need to modify all the strcasecmp as that one ignores the case.
2) at first glance it looks like a lot of code is "double". Did you consider using a define for the starting and the ending block?

3) it's a bit hard to figure out, as you merged it with the hq2x scaler and the 16-bit VESA support of Moe and the normal3x scaler of ykhwong.

4) those palette glitches with Warcraft 2: are they still present?

5) timesync shouldn't alter your timings. The max amount of cycles is another story though.
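
To illustrate point 1: a case-insensitive comparison treats "Normal2x" and "normal2x" as the same name, so capitalizing the scaler names does not require touching the comparisons. The sketch below is a portable ASCII-only equivalent written for illustration (the function names are hypothetical); it is not the `strcasecmp` implementation itself.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when both names are equal ignoring ASCII case,
// which is the property a strcasecmp-style comparison relies on.
bool equals_ignore_case(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Small usage example: capitalization does not matter, the name itself does.
void self_check() {
    assert(equals_ignore_case("Normal2x", "normal2x"));
    assert(equals_ignore_case("hq2x", "HQ2X"));
    assert(!equals_ignore_case("normal2x", "normal3x"));
}
```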

Water flows down the stream
How to ask questions the smart way!

Reply 14 of 227, by GreatBarrier86

User metadata
Rank Newbie

This seems incredible. I don't mean to rain on your parade but with all these advantages, is there a downside or is this really well optimized code?

IBM ThinkPad X40
1.2Ghz with 2MB L2 cache
1.0GB DDR2 SDRAM
Intel 852/855GME Graphics Media Accelerator with 64MB
12'' LCD Screen
SoundMax Integrated Audio

Peas Pobie!

Reply 15 of 227, by Kronuz

User metadata
Rank Member

Hi Qbix, it's good to hear from you. Here are some answers:

Qbix wrote:

mmm. I don't know exactly which parts are your doing.
Let's see:
1) changing all scalers to start with a capital is a cosmetic change. So you don't need to modify all the strcasecmp as that one ignores the case.

I know, but it's also a cosmetic change for the source code 😉 ...I'm still not sure how good that is: since most of the other code is lowercase too, it might actually be better to leave it lowercase as well...

Qbix wrote:

2) at first glance it looks like a lot of code is "double". Did you consider using a define for the starting and the ending block?

Yep, a lot is double. I was just figuring out which parts need to be joined or put together (still a work in progress) 😉
There's a lot of duplicated stuff because I was first trying to see how big the gains would be, so I just started writing without thinking about that. I'll fix it the best I can, don't worry about that 😀

Qbix wrote:

3) it's a bit hard to figure out, as you merged it with the hq2x scaler and the 16-bit VESA support of Moe and the normal3x scaler of ykhwong.

I wrote the new normal3x scaler, was there a normal3x scaler already? 😜
Anyway, that one was pretty easy (the hard part was understanding the hq2x algorithm well enough to convert the code). And yes, I'm merging with Moe's hq2x for a reason: the scaler was really, really slow and it still needed modifications to run faster (you can't just apply my patch and then Moe's and expect hq2x to be faster), so that was a must.
About merging with vesa16... well, that's not really needed for my patch. It's that way because I wanted to keep compatibility for the future, so I was testing with vesa16 (if it works with vesa16 it works without it, but not necessarily the other way around). I can always take vesa16 out if you want, just let me finish the patch first. To remove vesa16, all that's needed is to delete the first line of every scaler, change the variable type the scaler functions receive, and of course drop any other changes the vesa patch makes.

Qbix wrote:

4) those palette glitches with Warcraft 2: are they still present?

I think they are, but they are not very noticeable... I'm working on those (as well as on other optimization stuff)

Qbix wrote:

5) timesync shouldn't alter your timings. The max amount of cycles is another story though.

But it does, that's what I found weird... when I activated it, the scalers seemed to take less time to do their job...

*or* it could be that for some reason, when the timesync is activated the VGA_DrawPart() function in vga_draw.cpp is called more often with no lines to be drawn.

Okay, that's all for the questions. If you need to know anything else, please ask 😀 and... please join #dosbox more often if you can 😉

Kronuz
"Time is of the essence"

Reply 16 of 227, by Kronuz

User metadata
Rank Member
GreatBarrier86 wrote:

This seems incredible. I don't mean to rain on your parade but with all these advantages, is there a downside or is this really well optimized code?

It is really optimized code. There's no downside: once it's finished there should be nothing at all wrong with it. The one exception, right now, is that when you toggle fullscreen the display isn't updated right away. That can easily be fixed in several ways, but they would be too intrusive in other parts of DOSBox, so we'll need to talk to someone with the authority to make small changes to the underlying architecture of DOSBox, or with enough knowledge of it to suggest the best solution. Still, this small issue is nothing you can't live with... I mean, the screen does eventually get updated after you switch between fullscreen and windowed, and let's face it, we don't do that often, so it's not a big deal.

I've noticed two other small issues. One is with the palette in some games, but I'll fix that; it's not a problem (I hope). The other is that Interp2x leaves some traces as it interpolates lines just above parts of the screen that changed. This one is a bit harder to fix, but it's doable... or we can just keep the old Interp2x scaler, though it's slower 🙁
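
One possible way to deal with the traces, sketched here purely as an assumption (it is not necessarily how the patch ends up fixing it): because an interpolating scaler reads neighbouring lines, a changed line also affects the output of the lines next to it, so the set of lines to redraw can be widened by one line in each direction. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given per-line "changed" flags, also mark the line
// directly above and below every changed line, so that a scaler which reads
// neighbouring lines redraws everything a change can influence.
std::vector<bool> expand_dirty_lines(const std::vector<bool>& changed) {
    std::vector<bool> redraw(changed);
    for (std::size_t y = 0; y < changed.size(); ++y) {
        if (!changed[y]) continue;
        if (y > 0) redraw[y - 1] = true;                     // line above
        if (y + 1 < changed.size()) redraw[y + 1] = true;    // line below
    }
    assert(redraw.size() == changed.size());
    return redraw;
}
```

The cost is a little extra scaling work around each changed area, which should still be far cheaper than rescaling the whole frame.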

Kronuz
"Time is of the essence"

Reply 17 of 227, by Kronuz

User metadata
Rank Member
`Moe` wrote:

Optimizing the render loop for Hq2x makes quite a big difference.

Moe, could you elaborate a bit more on this? How exactly can Hq2x be optimized if it's in its own loop?

...and hey, Moe, join the IRC channel more often 😉
It's been a long time since we last talked there...

Kronuz
"Time is of the essence"

Reply 19 of 227, by Kronuz

User metadata
Rank Member

Okay, this is yet another version of my patch. This one fixes the problems with the palette in some games (e.g. Warcraft 2), and it now works perfectly. I only need to clean up the code and run some tests on the algorithms to verify their sanity.

(Get the patch above or from sourceforge)

Kronuz
"Time is of the essence"