VOGONS


First post, by gulikoza

User metadata
Rank Oldbie

I've been waiting for some time for this 😀
I've noticed that my ATi drivers (Cat 7.1) finally support the pixel buffer object extension (ARB_pixel_buffer_object), which is the ARB counterpart of the NV_pixel_data_range extension. So I rewrote the DOSBox OpenGL handling to use PBOs (nVidia cards should already have support). This should finally make OpenGL the same speed on ATi (btw: NV_PDR seems to have been disabled in sdlmain.cpp for some time already, so I guess nVidia is currently as fast as ATi, hehe)...well, in theory. In practice, it doesn't seem to work any faster 🙁. I don't know whether the drivers lack full support or whether a different texture format has to be used; maybe somebody can figure it out...

Attachments

  • dosbox_pbo.diff (7.58 KiB)

http://www.si-gamer.net/gulikoza

Reply 1 of 18, by gulikoza

User metadata
Rank Oldbie

Using a test program I found, texture download speed goes up from 750MB/s to about 910MB/s with PBOs on my ATi, so I guess it must be doing something 😀. Unfortunately, if I change the pixel data format from the default GL_UNSIGNED_INT_8_8_8_8_REV to anything else (even the simplest combination, GL_RGBA with GL_UNSIGNED_BYTE), I get a crash in atioglxx.dll...
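
To relate MB/s figures like these to frame rates, here is a small sketch in C. The texture size used by that test program isn't stated in the post, so the 640x480 size in the usage note and the function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Bytes needed for one frame of a 32-bit texture (4 bytes per pixel). */
size_t frame_bytes(size_t width, size_t height) {
    return width * height * 4u;
}

/* Upload rate in MB/s (1 MB = 1,000,000 bytes) when `fps` frames of
   `bytes_per_frame` bytes are sent to the card every second. */
double upload_mb_per_s(size_t bytes_per_frame, double fps) {
    return (double)bytes_per_frame * fps / 1e6;
}
```

For example, upload_mb_per_s(frame_bytes(640, 480), 70.0) gives the rate needed to refresh a hypothetical 640x480 32-bit texture 70 times a second.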


Reply 2 of 18, by `Moe`

User metadata
Rank Oldbie

While I haven't worked with PBOs, I had these stupid crashes during the OpenGL-HQ work in two situations: with ATI's triple buffering feature (no idea why that was, no idea if it still happens), and when I actually did something wrong: some subtly wrong programming that didn't exactly follow the OpenGL spec. I was able to track down the latter type of problem by querying the OpenGL error status very frequently to find out which call didn't work. Often it was _not_ the crashing call, but something earlier.
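
A minimal sketch of that "query the error status very frequently" approach, in C. This is not the OpenGL-HQ code; drain_errors and the fake error source are hypothetical names. In a real program the getter would presumably be glGetError, which returns 0 (GL_NO_ERROR) when nothing is pending; passing it through a plain function pointer is an assumption here, since calling conventions differ on some platforms.

```c
#include <stdio.h>

/* Shape of an error getter that returns 0 when no error is pending.
   glGetError() has this shape; wiring it in is left as an assumption. */
typedef unsigned int (*error_getter)(void);

/* Drain pending error flags, report where they were noticed, and return
   how many were found. The cap of 16 guards against a getter that never
   returns 0 (for example when no GL context is current). */
int drain_errors(error_getter get_error, const char *where) {
    int count = 0;
    unsigned int err;
    while (count < 16 && (err = get_error()) != 0u) {
        fprintf(stderr, "GL error 0x%04X noticed after %s\n", err, where);
        count++;
    }
    return count;
}

/* Stand-in error source for trying the helper without a GL context:
   replays GL_INVALID_ENUM (0x0500) and GL_INVALID_OPERATION (0x0502). */
unsigned int fake_queue[] = {0x0500u, 0x0502u, 0u};
int fake_pos = 0;

unsigned int fake_get_error(void) {
    unsigned int e = fake_queue[fake_pos];
    if (e != 0u) fake_pos++;   /* stay on the terminating 0 once reached */
    return e;
}
```

Calling the helper after every GL call, with the call's name as `where`, narrows down which earlier call actually failed, rather than the one that finally crashes.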

Reply 3 of 18, by gulikoza

User metadata
Rank Oldbie

No, no errors unfortunately. The disassembly is rather interesting:

Access violation reading location 0x00000024.
692A8C57 call 692E4920
-> 692A8C5C mov ecx,dword ptr [eax+24h]

seems somebody's forgetting if(ret==NULL) 🤣
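
Why the faulting address is exactly 0x00000024: if the call returns NULL in eax, the following `mov ecx,dword ptr [eax+24h]` reads from 0 + 0x24. The sketch below shows the same arithmetic in C. The struct is purely hypothetical, since the real driver structure is unknown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: nine 32-bit fields (36 = 0x24 bytes) come before
   the field being read. The real driver structure is not known. */
struct driver_object {
    uint32_t earlier_fields[9];
    uint32_t field_at_0x24;
};

/* Address touched by `mov ecx, dword ptr [eax+24h]` for a given eax. */
uintptr_t address_read(uintptr_t eax) {
    return eax + offsetof(struct driver_object, field_at_0x24);
}
```

With eax equal to 0 (a NULL return value), address_read yields 0x24, which matches the access violation above.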

For further testing, I've compiled 4 versions of dosbox (disabled all render optimizations, so it was drawing full 70fps):
1. original texture format (GL_UNSIGNED_INT_8_8_8_8_REV)
2. GL_UNSIGNED_BYTE
3. PBO + GL_UNSIGNED_INT_8_8_8_8_REV
4. PBO + GL_UNSIGNED_BYTE

The results on my ATi are (tested on dualcore...50% = 100% of a single core)
1. 50 % cpu usage
2. 20-25 % cpu usage
3. 50 % cpu usage
4. crash

Tested with nvidia card:
1. 5-10%
2. 5-10%
3. 0-2%
4. 0-2%
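
The "50% = 100% of a single core" note above can be written out as a tiny helper. This is a sketch that assumes the program is effectively single-threaded; the function name is made up for illustration.

```c
/* Convert a whole-machine CPU percentage into the load on one core,
   assuming all the work runs on a single thread (an assumption here).
   Capped at 100 because one thread cannot use more than one core. */
double single_core_load(double total_percent, int cores) {
    double load = total_percent * (double)cores;
    return load > 100.0 ? 100.0 : load;
}
```

So the 50% readings on the dual-core machine mean one core is completely busy, while 20-25% corresponds to 40-50% of one core.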

The results are rather interesting. It seems my PBO code does work 😀. Also, on ATi, changing the texture format from GL_UNSIGNED_INT_8_8_8_8_REV to GL_UNSIGNED_BYTE roughly halves CPU usage (a gain of 100% or more)! Why is GL_UNSIGNED_INT_8_8_8_8_REV used anyway? I read it's some kind of architecture-independent format, but should that matter in DOSBox?
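
On the "architecture-independent" question, a sketch in C of how the two types differ, as far as the packed pixel types are defined: GL_UNSIGNED_INT_8_8_8_8_REV describes the pixel as one 32-bit integer with the first component in the lowest 8 bits, while GL_UNSIGNED_BYTE describes it as four bytes in memory order. The helper names are hypothetical, and this only models the layouts without calling OpenGL.

```c
#include <stdint.h>
#include <string.h>

/* Pack four 8-bit components the way an 8_8_8_8_REV packed type defines
   them: the first component sits in the lowest 8 bits of the integer. */
uint32_t pack_8888_rev(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return (uint32_t)c0 | ((uint32_t)c1 << 8) |
           ((uint32_t)c2 << 16) | ((uint32_t)c3 << 24);
}

/* 1 if this machine stores the low byte of an integer first. */
int host_is_little_endian(void) {
    const uint32_t probe = 1u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* 1 if the packed pixel's bytes in memory come out as c0,c1,c2,c3,
   i.e. the same bytes a GL_UNSIGNED_BYTE array would hold. */
int same_as_byte_array(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    const uint32_t packed = pack_8888_rev(c0, c1, c2, c3);
    const uint8_t bytes[4] = {c0, c1, c2, c3};
    return memcmp(&packed, bytes, 4) == 0;
}
```

On a little-endian machine such as x86 the two layouts are byte-for-byte identical, so any speed difference between them would come from the driver's code path rather than from the data. On a big-endian machine the packed type keeps the same integer value but the bytes in memory are reversed.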

Next step, I wrote a simple app, just drawing some random texture to the screen. Made the same 4 builds:

ATi:
1. 42fps
2. 222fps
3. 46fps
4. crash

nVidia:
1. 293fps
2. 274fps
3. 1122fps
4. 1183fps

Again, using GL_UNSIGNED_BYTE is 5(!) times faster on ATi. And again, PBOs seem to help a lot; unfortunately the most interesting combination crashes on ATi. I hope this will be resolved in the next driver version...
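
The ratios behind those statements, as a small C sketch (the function name is just for illustration):

```c
/* How many times faster `after_fps` is than `before_fps`. */
double speedup(double before_fps, double after_fps) {
    return after_fps / before_fps;
}
```

With the numbers above, speedup(42, 222) is about 5.3 for the format change on ATi, and speedup(293, 1122) is about 3.8 for PBOs on nVidia.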

Attached: my test app 😀

Attachments

  • pbo_test.c (5.57 KiB)


Reply 5 of 18, by gulikoza

User metadata
Rank Oldbie

# Makefile for pbo_test (MinGW). Recipe lines must start with a tab.
CC = g++
CFLAGS = -Wall -O3 -fomit-frame-pointer -march=i586 -mtune=i686 -I/usr/include/sdl -I/usr/include
LIBS = -L/usr/lib -lmingw32 -lSDLmain -lSDL -lopengl32 -lglu32 -mwindows
OBJECTS = pbo_test.o

pbo_test.exe : $(OBJECTS)
	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)

%.o : %.c
	$(CC) $(CFLAGS) -c $< -o $@

clean :
	rm -f $(OBJECTS)


Reply 6 of 18, by wd

User metadata
Rank DOSBox Author

Compared it with some pbo implementations from the net, and there
doesn't seem to be anything wrong. Maybe just the drivers aren't
good enough yet...

> btw: NV_PDR seems to be disabled in sdlmain.cpp for some time already

What do you mean? The define is enabled, and the actual code is not
commented out either.

Reply 7 of 18, by gulikoza

User metadata
Rank Oldbie

#if defined(NVIDIA_PixelDataRange)
	sdl.opengl.pixel_data_range=(strstr(gl_ext,"GL_NV_pixel_data_range") >0 ) &&
		glPixelDataRangeNV && db_glAllocateMemoryNV && db_glFreeMemoryNV;
	/* the next line overrides the detection above, so PDR is never used */
	sdl.opengl.pixel_data_range = 0;
#endif
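
As a side note on the `strstr(...) >0` test in the snippet above: it compares a pointer against 0, and a plain substring search would also match a longer extension name that merely starts with the same text. A stricter check could look like the sketch below. This is not DOSBox code; has_extension is a hypothetical helper, and it assumes the extension string is the usual space-separated list returned by glGetString(GL_EXTENSIONS).

```c
#include <stddef.h>
#include <string.h>

/* Return 1 only if `name` appears in the space-separated `ext_list` as a
   whole token, so "GL_NV_pixel_data_range" does not match a longer name
   that merely begins with it. */
int has_extension(const char *ext_list, const char *name) {
    size_t len;
    const char *p = ext_list;
    if (ext_list == NULL || name == NULL || name[0] == '\0') return 0;
    len = strlen(name);
    while ((p = strstr(p, name)) != NULL) {
        int at_start = (p == ext_list) || (p[-1] == ' ');
        int at_end = (p[len] == ' ') || (p[len] == '\0');
        if (at_start && at_end) return 1;
        p += len;   /* skip this partial match and keep searching */
    }
    return 0;
}
```

The same helper would work for checking "GL_ARB_pixel_buffer_object" before taking a PBO path.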


Reply 9 of 18, by gulikoza

User metadata
Rank Oldbie

It was added with the render optimizations. I don't know why since I can't test if it works 😀 That's also why I wrote a separate app for PBO testing, just to rule out any dosbox incompatibility. I don't know what different scalers do to the locked memory - I'm locking PBO with WRITE_ONLY, I assume scalers don't read from the surface when GFX_HARDWARE is specified. That would probably be very bad 😀


Reply 10 of 18, by gulikoza

User metadata
Rank Oldbie

I got a response from ATi today. I don't know whether to laugh or not...

Solution: Please refer to some of the online forums that deals directly with the issues you're having. Below are a couple of forums we have found.

ARB_pixel_buffer_object patch

http://forums.miranda-im.org/showthread.php?t=525&page=30

I've had my share of experiences with customer support, and the answers are usually useless...but this tops it all...
Is this the worst customer support or what 🤣 🤣


Reply 11 of 18, by MiniMax

User metadata
Rank Moderator

Try sending them an invoice for providing support to yourself 😀

DOSBox 60 seconds guide | How to ask questions
_________________
Lenovo M58p | Core 2 Quad Q8400 @ 2.66 GHz | Radeon R7 240 | LG HL-DT-ST DVDRAM GH40N | Fedora 32

Reply 12 of 18, by gulikoza

User metadata
Rank Oldbie

The story goes on. After writing a 'not-so-very-polite' answer I got a response again:

"This is customer care for support of AMD products. We troubleshoot issues relating to graphics card and software installation. We don't have any training on programming, application test or software engineering."

So now I really wonder who writes the drivers 🤣 🤣

This time I waited a day to blow off some steam and then politely asked why writing to "Catalyst Crew Feedback" cannot solve issues with "Catalyst drivers" 😀 Usually I'm not such a PITA, but listening to that guy trying to get Verizon to acknowledge the difference between 0.002 $ and 0.002 cents has got me thinking 😁


Reply 13 of 18, by gulikoza

User metadata
Rank Oldbie

They've fixed the crash in 7.4! 🤣
Well...kinda fixed it. I wonder, do they even try to run the code they write? 😒
Unfortunately the speed is still lower than with no PBO and the GL_UNSIGNED_BYTE texture format.

Attachments

  • dosbox-pbo.png (22.58 KiB)


Reply 15 of 18, by gulikoza

User metadata
Rank Oldbie

Seems less and less likely somebody will get it right 🙁
In 7.5, the images render ok, but the speed is terrible. Here are the updated results:

ATi:
1. 50fps
2. 245fps
3. 47fps
4. 172fps

compare this to the same program running on nvidia:

nVidia:
1. 293fps
2. 274fps
3. 1122fps
4. 1183fps

my next card just might be an 8800, I doubt the x2900 performs any better...

btw, I also ported my test app to Direct3D. The results are (1. is using normal textures, 2. is using D3DUSAGE_DYNAMIC flag which should be comparable to OGL PBO):

1. 470fps
2. 751fps

(those of you that saw lower numbers before the edit...I added -O3 when compiling 😜 😉)


Reply 16 of 18, by gulikoza

User metadata
Rank Oldbie

Let's repeat the tests again with 7.12 😀

Just to refresh everyone's memory, I've compiled 4 versions of dosbox (disabled all render optimizations, so it's drawing full 70fps). I'm still using the very same builds as above. I'm testing different texture formats:
1. original texture format (GL_UNSIGNED_INT_8_8_8_8_REV)
2. GL_UNSIGNED_BYTE
3. PBO + GL_UNSIGNED_INT_8_8_8_8_REV
4. PBO + GL_UNSIGNED_BYTE

The results on my ATi are (tested on dualcore...50% = 100% of a single core)
1. 50 % cpu usage
2. 50% cpu usage
3. 15% cpu usage (< 10% with mouse pointer outside the window)
4. 15% cpu usage (< 10% with mouse pointer outside the window)

Holy cow batman! 😳 Something has actually changed!
No. 2 (which I've used in my builds ever since my tests) is now unusable. On the other hand, PBO shows a significant improvement. It would be almost as good as nVidia if it weren't performing worse when the mouse cursor is over the window (why? 😕)

Let's do the numbers (with PBO_test app):

1. 40.40 fps
2. 40.36 fps
3. 290.23 fps (~200 fps with mouse cursor hovering over the window)
4. 289.32 fps (~200 fps with mouse cursor hovering over the window)

The results seem to mostly confirm the DOSBox CPU usage. It's clear that without PBO, performance has dropped significantly. It is also clear that OpenGL still cannot push more than 300fps even with PBO enabled. PBOs also seem to help CPU usage a bit, although they don't increase the overall texture upload speed by much. Now if these tests are real, then OpenGLide LFB performance should also be a lot lower than with previous driver versions - well, at least until I write a PBO patch 😀


Reply 18 of 18, by gulikoza

User metadata
Rank Oldbie

Yes, I saw that patch...it's basically the same as mine (but mine is better 😜). I have it somewhere on my hdd, I wanted to test it some time ago but haven't had the time yet 😀
