VOGONS


First post, by superfury

I want to optimize UniPCemu further so that it runs at full speed, with accuracy, on PCs slower than what it currently requires (a 2.0 GHz Intel P6100 is the slowest I've tested).

Profiling with gprof produces the following report (everything taken until 1.0%):

Flat profile:

Each sample counts as 0.01 seconds.
     % cumulative     self                self    total
  time    seconds  seconds      calls   s/call   s/call  name
 39.76      94.90    94.90      38800     0.00     0.00  GPU_textrenderer
  5.90     108.97    14.07  146136052     0.00     0.00  DMA_tick
  4.42     119.52    10.55   56745039     0.00     0.00  updateCMOS
  3.65     128.23     8.71   56804662     0.00     0.00  CPU_exec
  3.19     135.84     7.61  526691878     0.00     0.00  readfifobuffer
  3.11     143.27     7.43   56750760     0.00     0.00  tickPIT
  2.99     150.40     7.13   56776238     0.00     0.00  updateVGA
  2.55     156.49     6.09     801790     0.00     0.00  DoEmulator
  2.41     162.25     5.76  434046491     0.00     0.00  writefifobuffer
  1.75     166.42     4.17  154619149     0.00     0.00  CPU_MMU_checklimit
  1.19     169.26     2.84                               __umoddi3
  1.18     172.08     2.82   73058552     0.00     0.00  VGA_ActiveDisplay_Text
  1.11     174.72     2.64   56803261     0.00     0.00  updateAdlib
  1.10     177.34     2.62   56810180     0.00     0.00  update8042
  1.03     179.81     2.47  291783501     0.00     0.00  DMA_SetDREQ
  1.00     182.20     2.39   56818199     0.00     0.00  updateUART
  0.90     184.35     2.15  381284146     0.00     0.00  CPU_MMU_start
  0.89     186.47     2.12  136013945     0.00     0.00  applySoundFilter
  0.87     188.55     2.08  169668414     0.00     0.00  BIOS_readhandler
  0.81     190.49     1.94   97679318     0.00     0.00  CPU_readOP
  0.79     192.38     1.89                               floorf
  0.78     194.23     1.85  112846373     0.00     0.00  MMU_rb
  0.77     196.07     1.84  141328267     0.00     0.00  MMU_INTERNAL_directrb_realaddr
  0.74     197.83     1.76  145830462     0.00     0.00  FLOPPY_DMADREQ
  0.65     199.39     1.56  154576918     0.00     0.00  checkMMUaccess
  0.64     200.92     1.53   56753199     0.00     0.00  CPU_tickPrefetch
  0.64     202.44     1.52  145714078     0.00     0.00  DRAM_DMADREQ
  0.63     203.95     1.51   57337497     0.00     0.00  updateAudio
  0.55     205.27     1.32         97     0.01     0.01  zoomSurfaceRGBA
  0.53     206.54     1.27   56732338     0.00     0.00  tickssourcecovox
  0.52     207.78     1.24   57030472     0.00     0.00  MMU_INTERNAL_directwb_realaddr
  0.51     208.99     1.21  154134323     0.00     0.00  fifobuffer_freesize
  0.49     210.17     1.18   70104238     0.00     0.00  VGA_Sequencer_TextMode
  0.46     211.26     1.09  324439737     0.00     0.00  latchBUS
  0.41     212.24     0.98  324520791     0.00     0.00  is_paging

So, essentially, the heaviest part appears to be the text surface renderer. It can be found in the text surface module:
https://bitbucket.org/superfury/unipcemu/src/ … ext.c?at=master

The rendering cost should already have been minimized, because the buffered display is only updated when it actually changes. That buffer is essentially a mask used to render the pixels of the text display transparently: each pixel is either 100% transparent or some pixel of the contained text surface. Even so, it still seems to be pretty heavy on the CPU at 60 FPS.

Does anyone know a way to optimize it further and make it less heavy to use? It essentially draws, transparently (with 100% or 0% transparency), a VGA 8x8 text output with a border around it in character-based cells (VGA-style), but with each cell having a font color and a border color instead of a background color. The color is stored as RGB for both font and background color for the entire cell. It draws its pixels to an intermediate buffer first (only updated when any character/font/background is changed). The intermediate buffer is simply plotted to the actual display each frame, on top of the video display (which is itself redrawn only when it changes).
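
One common way to cut the per-frame cost of a fully-opaque-or-fully-transparent overlay is to record, whenever the intermediate buffer is rebuilt, where the opaque runs are in each row, and then copy only those runs every frame instead of testing every pixel. The following is a minimal sketch of that idea, not UniPCemu's actual code: the `Overlay` and `Span` types, the `overlay_*` functions, the tiny buffer size and the use of 0 as the transparent value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OVERLAY_W 64                   /* hypothetical size, kept small here */
#define OVERLAY_H 16
#define TRANSPARENT 0u                 /* assumed "100% transparent" value */
#define MAX_SPANS (OVERLAY_W / 2 + 1)  /* worst case: alternating pixels */

typedef struct { uint16_t start, length; } Span;

typedef struct {
    uint32_t pixels[OVERLAY_H][OVERLAY_W];  /* the intermediate buffer */
    Span     spans[OVERLAY_H][MAX_SPANS];   /* opaque runs per row */
    uint16_t span_count[OVERLAY_H];
    int      dirty;                         /* set when pixels change */
} Overlay;

/* Change one pixel; mark the overlay dirty only on a real change. */
static void overlay_set_pixel(Overlay *o, int x, int y, uint32_t color)
{
    assert(x >= 0 && x < OVERLAY_W && y >= 0 && y < OVERLAY_H);
    if (o->pixels[y][x] != color) {
        o->pixels[y][x] = color;
        o->dirty = 1;
    }
}

/* Rescan the buffer and record every run of opaque pixels per row. */
static void overlay_rebuild_spans(Overlay *o)
{
    for (int y = 0; y < OVERLAY_H; ++y) {
        uint16_t n = 0;
        int x = 0;
        while (x < OVERLAY_W) {
            while (x < OVERLAY_W && o->pixels[y][x] == TRANSPARENT) ++x;
            int start = x;
            while (x < OVERLAY_W && o->pixels[y][x] != TRANSPARENT) ++x;
            if (x > start) {
                o->spans[y][n].start  = (uint16_t)start;
                o->spans[y][n].length = (uint16_t)(x - start);
                ++n;
            }
        }
        o->span_count[y] = n;
    }
    o->dirty = 0;
}

/* Per-frame blit: copies only the opaque runs. dst_pitch is in pixels. */
static void overlay_blit(Overlay *o, uint32_t *dst, size_t dst_pitch)
{
    assert(dst_pitch >= OVERLAY_W);
    if (o->dirty) overlay_rebuild_spans(o);  /* only after a change */
    for (int y = 0; y < OVERLAY_H; ++y) {
        uint32_t *row = dst + (size_t)y * dst_pitch;
        for (uint16_t i = 0; i < o->span_count[y]; ++i) {
            const Span *s = &o->spans[y][i];
            memcpy(row + s->start, &o->pixels[y][s->start],
                   s->length * sizeof(uint32_t));
        }
    }
}
```

With mostly empty text surfaces, the per-frame work then scales with the number of opaque runs rather than with the surface area. Whether this helps in practice depends on how much of the surface is actually covered by text.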

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 6, by vladstamate

If you are trying this on Windows, try running AMD CodeXL. It will show you which part of the GPU_textrenderer function takes the most time (down to the line).

Also, I know this does not help with 2.0 GHz processors and lower (as they tend to be single core), but you could move the VGA emulation onto its own thread, so that on (modern) CPUs it is executed on a separate core.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 2 of 6, by superfury

I've just installed AMD CodeXL, which doesn't seem to work well:

- With AMD CodeXL installed, all output windows in Visual Studio refuse to show themselves.
- When I try to profile and debug, both from the AMD CodeXL application and from Visual Studio using the AMD CodeXL options, the application starts and immediately terminates without any error, and neither AMD CodeXL nor Visual Studio shows that anything was done. The AMD CodeXL profile is a barely filled report of 1 second of runtime, with only 2 hottest functions: BHDrvx64.sys!0xf809e612f024 (1 sample) and other (27 samples).

Edit: After restarting Visual Studio, it seems to work somewhat (after opening and running the Teapot example). The windows show themselves again and the menu options seem to work again.

Do you know why the application just stops running within 1 second? There is no message or indication of the cause whatsoever.

Reply 3 of 6, by vladstamate

Personally, I am not running it through Visual Studio, although I know it should work. I just run CodeXL separately, create a new project, point it to my emulator executable and make sure to add the parameters I need. Then from Profile I choose CPU profile. I usually let it run for about 1 minute (at most). The performance feedback is pretty good. Not as good as Intel VTune, but then again, it is free.

Reply 4 of 6, by superfury

Looking at the rendering function of the text surfaces, it appears that the main loop that renders the pixels to the screen (which are already prerendered to a buffer) is taking quite a lot of CPU time, even though it's optimized. That loop also handles stretching (for Android): it stretches somewhat, both horizontally and vertically.
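
For a loop that both stretches and skips transparent pixels, one general technique is to precompute the source coordinate for every destination column and row once (whenever the sizes change), so the per-pixel loop does no division, modulo or floating-point work. The sketch below assumes nearest-neighbor scaling and a single "transparent" key value. The function names and parameters are hypothetical and are not taken from UniPCemu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill map[d] with the nearest-neighbor source index for each
   destination index d. Rebuild only when dst_len or src_len changes. */
static void build_stretch_map(uint16_t *map, size_t dst_len, size_t src_len)
{
    assert(dst_len > 0 && src_len > 0 && src_len <= 65536);
    for (size_t d = 0; d < dst_len; ++d)
        map[d] = (uint16_t)((d * src_len) / dst_len);
}

/* Stretch src onto dst using the precomputed maps. Pixels equal to
   `key` are treated as fully transparent and leave dst untouched.
   Pitches are in pixels, not bytes. */
static void stretch_blit_keyed(uint32_t *dst, size_t dst_w, size_t dst_h,
                               size_t dst_pitch,
                               const uint32_t *src, size_t src_pitch,
                               const uint16_t *xmap, const uint16_t *ymap,
                               uint32_t key)
{
    for (size_t y = 0; y < dst_h; ++y) {
        const uint32_t *srow = src + (size_t)ymap[y] * src_pitch;
        uint32_t *drow = dst + y * dst_pitch;
        for (size_t x = 0; x < dst_w; ++x) {
            uint32_t p = srow[xmap[x]];   /* table lookup, no arithmetic */
            if (p != key)
                drow[x] = p;
        }
    }
}
```

The maps cost one division per column and per row at setup time, compared with per-pixel arithmetic in the hot loop. Whether the existing loop already does something equivalent can only be checked against the CodeXL per-line results.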

Attachment: profiling_GPU_textrenderer.zip (1.46 KiB), GPU_textrenderer rendering part profiled using AMD CodeXL

I've copy-pasted it into a text file (seeing as it keeps everything separated with commas, it seems to be in CSV format).

Any tips on how to optimize it? It does seem to be the heaviest part (over 30%) of the entire emulator. The others, including the VGA, are in the 3-5% range.

(Once again, this forum is pretty restrictive about uploaded file extensions. Text files (.txt) are allowed, but log files (.log), CSV files etc. aren't, even though they're plain text files after all?)

Reply 6 of 6, by superfury

It locks and unlocks a total of 180 times per second (60 frames times 3 text surfaces). It locks before rendering and unlocks after rendering.
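
As a rough cross-check against the gprof report in the first post: GPU_textrenderer has 94.90 s of self time over 38800 calls, which the two-decimal s/call column rounds to 0.00. The small sketch below (helper names are hypothetical) computes the average cost per call and what 180 such calls per second would add up to, assuming every call costs the profiled average, which the thread does not confirm.

```c
#include <assert.h>

/* Average self time per call in milliseconds, from gprof's
   "self seconds" and "calls" columns. */
static double per_call_ms(double self_seconds, unsigned long calls)
{
    assert(calls > 0);
    return self_seconds * 1000.0 / (double)calls;
}

/* CPU milliseconds consumed per second at a given call rate. */
static double ms_per_second(double call_ms, double calls_per_second)
{
    return call_ms * calls_per_second;
}
```

With the figures above, `per_call_ms(94.90, 38800)` comes to about 2.45 ms per call, and `ms_per_second` at 180 calls per second to about 440 ms of CPU time per second. Under that assumption, most of the time would go to the work done between each lock and unlock, not to the number of lock/unlock pairs.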