First post, by superfury
I want to optimize UniPCemu further so that it runs on slower PCs than what is currently required to run it at full speed with accuracy (a 2.0 GHz Intel P6100 is the slowest I've tested).
Profiling it with gprof gives the following report (the top entries of the flat profile):
Flat profile:
Each sample counts as 0.01 seconds.
% time | cumulative seconds | self seconds | calls | self s/call | total s/call | name
39.76 94.90 94.90 38800 0.00 0.00 GPU_textrenderer
5.90 108.97 14.07 146136052 0.00 0.00 DMA_tick
4.42 119.52 10.55 56745039 0.00 0.00 updateCMOS
3.65 128.23 8.71 56804662 0.00 0.00 CPU_exec
3.19 135.84 7.61 526691878 0.00 0.00 readfifobuffer
3.11 143.27 7.43 56750760 0.00 0.00 tickPIT
2.99 150.40 7.13 56776238 0.00 0.00 updateVGA
2.55 156.49 6.09 801790 0.00 0.00 DoEmulator
2.41 162.25 5.76 434046491 0.00 0.00 writefifobuffer
1.75 166.42 4.17 154619149 0.00 0.00 CPU_MMU_checklimit
1.19 169.26 2.84 __umoddi3
1.18 172.08 2.82 73058552 0.00 0.00 VGA_ActiveDisplay_Text
1.11 174.72 2.64 56803261 0.00 0.00 updateAdlib
1.10 177.34 2.62 56810180 0.00 0.00 update8042
1.03 179.81 2.47 291783501 0.00 0.00 DMA_SetDREQ
1.00 182.20 2.39 56818199 0.00 0.00 updateUART
0.90 184.35 2.15 381284146 0.00 0.00 CPU_MMU_start
0.89 186.47 2.12 136013945 0.00 0.00 applySoundFilter
0.87 188.55 2.08 169668414 0.00 0.00 BIOS_readhandler
0.81 190.49 1.94 97679318 0.00 0.00 CPU_readOP
0.79 192.38 1.89 floorf
0.78 194.23 1.85 112846373 0.00 0.00 MMU_rb
0.77 196.07 1.84 141328267 0.00 0.00 MMU_INTERNAL_directrb_realaddr
0.74 197.83 1.76 145830462 0.00 0.00 FLOPPY_DMADREQ
0.65 199.39 1.56 154576918 0.00 0.00 checkMMUaccess
0.64 200.92 1.53 56753199 0.00 0.00 CPU_tickPrefetch
0.64 202.44 1.52 145714078 0.00 0.00 DRAM_DMADREQ
0.63 203.95 1.51 57337497 0.00 0.00 updateAudio
0.55 205.27 1.32 97 0.01 0.01 zoomSurfaceRGBA
0.53 206.54 1.27 56732338 0.00 0.00 tickssourcecovox
0.52 207.78 1.24 57030472 0.00 0.00 MMU_INTERNAL_directwb_realaddr
0.51 208.99 1.21 154134323 0.00 0.00 fifobuffer_freesize
0.49 210.17 1.18 70104238 0.00 0.00 VGA_Sequencer_TextMode
0.46 211.26 1.09 324439737 0.00 0.00 latchBUS
0.41 212.24 0.98 324520791 0.00 0.00 is_paging
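Because gprof rounds the s/call columns to 0.00, the per-call cost has to be derived from the "self seconds" and "calls" columns. A minimal sketch of that arithmetic follows; the helper name `self_ms_per_call` is hypothetical and not part of gprof or UniPCemu.

```c
/* Average self time per call in milliseconds, computed from gprof's
   "self seconds" and "calls" columns. Hypothetical helper for illustration. */
double self_ms_per_call(double self_seconds, unsigned long long calls)
{
    if (calls == 0) {
        return 0.0; /* rows such as __umoddi3 and floorf list no call count */
    }
    return self_seconds * 1000.0 / (double)calls;
}
```

For example, `self_ms_per_call(94.90, 38800ULL)` comes out at roughly 2.4 ms per call for GPU_textrenderer, while `self_ms_per_call(14.07, 146136052ULL)` is roughly 0.0001 ms per call for DMA_tick. The renderer is called rarely but each call is expensive; the tick functions are cheap per call but run tens of millions of times.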
So, essentially, the heaviest part appears to be the text surface renderer. It can be found in the text surface module:
https://bitbucket.org/superfury/unipcemu/src/ … ext.c?at=master
The rendering cost should already have been minimized by only updating the buffered display when it actually changes. That buffer is essentially a mask that is applied to render the text display transparently: each pixel is either 100% transparent or a pixel of the contained text surface. Still, it seems to be pretty heavy on the CPU at 60 FPS.
Does anyone know a way to optimize it further and make it less heavy to use? It essentially draws, with either 100% or 0% transparency, a VGA 8x8 text output with a border around it in character-based cells (VGA-style), but with each cell having a font color and a border color instead of a background color. Both colors are stored as RGB values for the entire cell. It draws its pixels to an intermediate buffer first (only updated when any character, font, or background changes). The intermediate buffer is then simply plotted to the actual display each frame, on top of the video display (which is itself only redrawn when it changes).
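To make the question concrete, here is a minimal sketch of the per-frame plotting step described above, with one possible way to cut its cost: a flag per row that records whether the row contains any visible pixel, so fully transparent rows are skipped. This is not the actual UniPCemu code. The `Overlay` struct, the `OVERLAY_TRANSPARENT` marker value, and both function names are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define OVERLAY_TRANSPARENT 0u /* hypothetical marker for a 100% transparent pixel */

/* Hypothetical overlay: the intermediate buffer plus one flag per row. */
typedef struct {
    uint32_t *pixels;  /* width * height entries, row-major */
    uint8_t *row_used; /* height entries, nonzero if the row has a visible pixel */
    size_t width;
    size_t height;
} Overlay;

/* Recompute the row flags. Only needed after the intermediate buffer changed,
   not on every frame. */
void overlay_update_rows(Overlay *o)
{
    for (size_t y = 0; y < o->height; ++y) {
        const uint32_t *row = o->pixels + y * o->width;
        uint8_t used = 0;
        for (size_t x = 0; x < o->width; ++x) {
            if (row[x] != OVERLAY_TRANSPARENT) {
                used = 1;
                break;
            }
        }
        o->row_used[y] = used;
    }
}

/* Plot the overlay on top of the frame, skipping rows without visible pixels.
   dst must hold at least width * height pixels in the same row-major layout.
   Returns the number of rows that were actually scanned. */
size_t overlay_composite(const Overlay *o, uint32_t *dst)
{
    size_t scanned = 0;
    for (size_t y = 0; y < o->height; ++y) {
        if (!o->row_used[y]) {
            continue; /* whole row is transparent, nothing to plot */
        }
        const uint32_t *src = o->pixels + y * o->width;
        uint32_t *out = dst + y * o->width;
        for (size_t x = 0; x < o->width; ++x) {
            if (src[x] != OVERLAY_TRANSPARENT) {
                out[x] = src[x];
            }
        }
        ++scanned;
    }
    return scanned;
}
```

With this layout, `overlay_update_rows` would run only when a character or color changes, and `overlay_composite` would run once per frame. How much it helps depends on how much of the text surface is empty; a surface with text on every row gains nothing from it.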
Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io