VOGONS


First post, by ih8registrations

Rank Oldbie

http://sourceforge.net/tracker/index.php?func … 551&atid=467234

<text>
The patch this time speeds up 16 & 32 bit normal2x video rendering by
3x for graphics and 25% for text.

Use 16 bit, and opengl, for best results.

To apply (to the CVS source):
cd dosbox
patch -up1 < render_cvs.diff

Enjoy!

Gritty details:

For a video-bound game or scene, the render handler is by far the most expensive routine:

CVS build:
Each sample counts as 0.01 seconds.
     % cumulative     self                self    total
  time    seconds  seconds      calls   s/call   s/call  name
 64.31      23.21    23.21    1552200     0.00     0.00  void Normal<8, 8, true>(unsigned char*)
 14.91      28.59     5.38    2072187     0.00     0.00  BituMove(unsigned char*, unsigned char*, unsigned)
  8.04      31.49     2.90     228400     0.00     0.00  RENDER_Init(Section*)
  5.26      33.39     1.90                               void Normal<8, 8, false>(unsigned char*)
  2.02      34.12     0.73     228400     0.00     0.00  VGA_TEXT_Draw_Line(unsigned, unsigned, unsigned)
  1.03      34.49     0.37    6819579     0.00     0.00  THEOPL3::advance(THEOPL3::OPL3*)
  0.58      34.70     0.21                               get_thrpc

New handler:
Each sample counts as 0.01 seconds.
     % cumulative     self                self    total
  time    seconds  seconds      calls   s/call   s/call  name
 68.38       9.84     9.84    1548600     0.00     0.00  Normal2x816(unsigned char*)
 10.70      11.38     1.54     285200     0.00     0.00  Normalx816(unsigned char*)
  5.70      12.20     0.82     285200     0.00     0.00  VGA_TEXT_Draw_Line(unsigned, unsigned, unsigned)
  2.57      12.57     0.37    2277486     0.00     0.00  THEOPL3::advance(THEOPL3::OPL3*)
  0.97      12.71     0.14   40994748     0.00     0.00  THEOPL3::chan_calc(THEOPL3::OPL3_CH*)
  0.90      12.84     0.13    9040201     0.00     0.00  THEOPL3::op_calc1(unsigned, unsigned, int, unsigned)
  0.83      12.96     0.12  209501567     0.00     0.00  mem_readw_inline(unsigned long)

Above is profiling output of the intro from Microprose F-117A.

The new handler is a merge of Normal & BituMove, so the fair comparison is their combined 28.59s (23.21s + 5.38s) vs 9.84s for Normal2x816, roughly a 2.9x speedup.
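
To illustrate what merging a convert step with a copy step can look like, here is a minimal sketch. It assumes the old path expanded each line into a temporary buffer and then copied it out; that is one plausible reading of the profile, not something taken from render_cvs.diff. All function names and the palette layout are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical two-pass version: expand one 8-bit paletted line into a
// temporary pixel-doubled 16-bit line, then copy that line to both output rows.
void scale2x_two_pass(const std::uint8_t* src, std::size_t width,
                      const std::uint16_t* palette,
                      std::uint16_t* row0, std::uint16_t* row1) {
    assert(src && palette && row0 && row1);
    std::vector<std::uint16_t> tmp(width * 2);
    for (std::size_t x = 0; x < width; ++x) {
        const std::uint16_t c = palette[src[x]];
        tmp[2 * x] = c;
        tmp[2 * x + 1] = c;
    }
    // Second pass over the same pixels: this is the work a fused loop avoids.
    std::memcpy(row0, tmp.data(), tmp.size() * sizeof(std::uint16_t));
    std::memcpy(row1, tmp.data(), tmp.size() * sizeof(std::uint16_t));
}

// Hypothetical fused version: one loop writes straight into both output rows,
// with no temporary buffer and no separate copy call.
void scale2x_fused(const std::uint8_t* src, std::size_t width,
                   const std::uint16_t* palette,
                   std::uint16_t* row0, std::uint16_t* row1) {
    assert(src && palette && row0 && row1);
    for (std::size_t x = 0; x < width; ++x) {
        const std::uint16_t c = palette[src[x]];
        row0[2 * x] = c;
        row0[2 * x + 1] = c;
        row1[2 * x] = c;
        row1[2 * x + 1] = c;
    }
}
```

Both functions produce identical output; the fused one simply touches each pixel once instead of twice.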

</text>

Attachments

  • render_cvs.diff (5.44 KiB, 198 downloads)
    File license: Fair use/fair dealing exception

Reply 1 of 13, by Magamo

Rank Member

On my Linux box (2.4.25, Matrox G550, 32-bit, PIII/600), this patch segfaults when running Wing Commander: Privateer. It's going to be difficult to pin down; I'll run it through DOSBox's debugger later.

But I figured that the feedback might be useful to you. (It segfaults before the sound drivers initialize, and long before anything is displayed on the screen.)

Edit: It actually seems to segfault on my system whenever a graphical mode is initializing (pretty much all of my games).

Reply 4 of 13, by Qbix

Rank DOSBox Author

Magamo, can you run it under gdb to see where it segfaults?
./configure CXXFLAGS="-ggdb3" && make clean && make

cd src
gdb dosbox
run


Reply 5 of 13, by Harekiet

Rank DOSBox Author

Not that I put much trust in gprof results anyway. It seems more likely that gcc doesn't inline the BituMove function, so that should probably be changed with some force-inline attribute. But I'm changing them all a bit again anyway.
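
For reference, a force-inline attribute can be wrapped roughly as follows. This is only a sketch: `__attribute__((always_inline))` and `__forceinline` are the real GCC/Clang and MSVC spellings, but the macro name, `copy_bytes`, and `double_line` are invented here and are not DOSBox code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro name; the attribute spellings themselves are the
// compilers' own.
#if defined(__GNUC__)
#  define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define SKETCH_FORCE_INLINE __forceinline
#else
#  define SKETCH_FORCE_INLINE inline
#endif

// Stand-in for a small copy helper in the spirit of BituMove.
SKETCH_FORCE_INLINE void copy_bytes(std::uint8_t* dst, const std::uint8_t* src,
                                    std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Caller: with the attribute in effect, both copy_bytes calls are expanded
// in place instead of being emitted as function calls.
void double_line(const std::uint8_t* src, std::size_t n,
                 std::uint8_t* row0, std::uint8_t* row1) {
    assert(src && row0 && row1);
    copy_bytes(row0, src, n);
    copy_bytes(row1, src, n);
}
```

Whether forced inlining actually helps here would still need to be measured; it removes the call overhead but leaves the second pass over the data.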

Reply 6 of 13, by ih8registrations

Rank Oldbie

Even with those little code snippets set up as inline functions, they jump all over the place, repeat setup code, and prevent or complicate other optimizations. Forcing BituMove to inline would just be a start. Also, sitting behind an array of pointers makes it unnecessary to have a unified function that requires a bunch of 'who are you' checks everywhere. Put another way, the design is already primed for streamlined, separate handlers, and everything I've been seeing tells me this is a bottleneck area that's begging for exactly that. If you don't trust gprof (why not, and what's your profiling preference?), run some games to see the difference.
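
The "array of pointers" argument can be sketched like this: the mode is resolved once when the handler is selected, so each handler body needs no per-pixel checks. Everything below (the handler names, the `OutputMode` enum, and the placeholder conversion) is hypothetical and only meant to show the shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One signature for every per-line handler.
using LineHandler = void (*)(const std::uint8_t* src, std::size_t width, void* dst);

// 8-bit source to 16-bit destination. Placeholder conversion: it only widens
// the value; real code would look the pixel up in a palette.
void line_8_to_16(const std::uint8_t* src, std::size_t width, void* dst) {
    auto* out = static_cast<std::uint16_t*>(dst);
    for (std::size_t x = 0; x < width; ++x) out[x] = src[x];
}

// 8-bit source to 32-bit destination, same placeholder conversion.
void line_8_to_32(const std::uint8_t* src, std::size_t width, void* dst) {
    auto* out = static_cast<std::uint32_t*>(dst);
    for (std::size_t x = 0; x < width; ++x) out[x] = src[x];
}

enum OutputMode { kOut16 = 0, kOut32 = 1, kOutCount };

// The table replaces the "who are you" checks: index once, then call.
const LineHandler kHandlers[kOutCount] = { line_8_to_16, line_8_to_32 };

LineHandler select_handler(OutputMode mode) {
    assert(mode >= 0 && mode < kOutCount);
    return kHandlers[mode];
}
```

A caller would pick the handler when the video mode changes, for example `LineHandler draw = select_handler(kOut16);`, and then call `draw(src, width, dst)` for every line.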

Reply 8 of 13, by ih8registrations

Rank Oldbie

Um, it's staring us in the face. Each time one is called, they're blitting the screen. Doing the math shows they have reeeeaally high refresh rates :) It should at most match the refresh rate plus the time for rendering.

Reply 9 of 13, by Magamo

Rank Member

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 3370)]
0x080bfc21 in Normal2x816 (src=0x9571410 "") at render_normal.h:133
133 *(long double*)dst=*(long double*)src1;

Looks like that's where our segfault is.

Reply 10 of 13, by ih8registrations

Rank Oldbie

What's your compiler version? Does it segfault on the equivalent lines in the other three new handlers? Set scale to none in 16-bit and see if it pukes. My hunch is that it's not handling long doubles as 128-bit. Your Pentium III supports SSE, so I'm thinking it's a compiler issue.
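
A quick way to test that hunch is to ask the compiler what it does with `long double`. The size is implementation-defined: GCC on 32-bit x86 Linux commonly uses 12 bytes and on x86-64 Linux 16 bytes. The sketch below is not from the patch. The helper names are made up, and `copy16` is just one width-independent way to move 16 bytes, not a claimed fix for the crash.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>

// Report what this particular compiler and target do with long double.
inline std::size_t long_double_size()  { return sizeof(long double); }
inline std::size_t long_double_align() { return alignof(long double); }

// True only if the type's storage is exactly 128 bits wide on this target.
inline bool long_double_is_128_bit() {
    return sizeof(long double) * CHAR_BIT == 128;
}

// Hypothetical helper: copies exactly 16 bytes regardless of how wide
// long double is, and without any alignment or floating-point assumptions.
inline void copy16(void* dst, const void* src) {
    assert(dst && src);
    std::memcpy(dst, src, 16);
}
```

Printing `long_double_size()` and `long_double_align()` from a build made with the same compiler and flags would show whether a copy through `long double*` can really be assumed to cover 16 bytes.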

Reply 12 of 13, by ih8registrations

Rank Oldbie

Well, VGA_SetupDrawing in vga_draw.cpp already tries to set things up as I thought it should, that is, to update at the rate of screen refresh. Yet, as we know, DOSBox is doing thousands of updates/sec instead of 72, so something somewhere is getting lost in translation. Fix this bug and performance should go through the roof. On with the hunt...
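
One way to confirm the updates/sec figure while hunting is to count calls per second directly. A minimal sketch, assuming the caller can supply a millisecond timestamp (for example from SDL_GetTicks()); the class is hypothetical and not part of DOSBox.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts how many updates land in each one-second
// window, using millisecond timestamps supplied by the caller.
class UpdateRateCounter {
public:
    // Record one update at time now_ms. Returns true when a window of at
    // least 1000 ms has just closed and last_rate() holds its count.
    bool tick(std::uint32_t now_ms) {
        if (!started_) {
            started_ = true;
            window_start_ms_ = now_ms;
        }
        bool rolled = false;
        if (now_ms - window_start_ms_ >= 1000) {
            last_rate_ = count_;        // updates seen in the closed window
            count_ = 0;
            window_start_ms_ = now_ms;  // this update starts the next window
            rolled = true;
        }
        ++count_;
        return rolled;
    }

    std::uint32_t last_rate() const { return last_rate_; }

private:
    bool started_ = false;
    std::uint32_t window_start_ms_ = 0;
    std::uint32_t count_ = 0;
    std::uint32_t last_rate_ = 0;
};
```

Calling `tick()` wherever the screen update happens and logging `last_rate()` whenever it returns true would show whether the rate sits near 72 or in the thousands.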