Optimizing UniPCemu further for slower (2.0GHz-) PCs?

Post by superfury » 2017-1-07 @ 20:39

I want to optimize UniPCemu further so that it runs on slower PCs than what's currently needed to run it at full speed with accuracy (a 2.0GHz Intel P6100 is the slowest I've tested).

Profiling with gprof gives the following report (cut off at the low end, around 1.0%):
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name   
 39.76     94.90    94.90    38800     0.00     0.00  GPU_textrenderer
  5.90    108.97    14.07 146136052     0.00     0.00  DMA_tick
  4.42    119.52    10.55 56745039     0.00     0.00  updateCMOS
  3.65    128.23     8.71 56804662     0.00     0.00  CPU_exec
  3.19    135.84     7.61 526691878     0.00     0.00  readfifobuffer
  3.11    143.27     7.43 56750760     0.00     0.00  tickPIT
  2.99    150.40     7.13 56776238     0.00     0.00  updateVGA
  2.55    156.49     6.09   801790     0.00     0.00  DoEmulator
  2.41    162.25     5.76 434046491     0.00     0.00  writefifobuffer
  1.75    166.42     4.17 154619149     0.00     0.00  CPU_MMU_checklimit
  1.19    169.26     2.84                             __umoddi3
  1.18    172.08     2.82 73058552     0.00     0.00  VGA_ActiveDisplay_Text
  1.11    174.72     2.64 56803261     0.00     0.00  updateAdlib
  1.10    177.34     2.62 56810180     0.00     0.00  update8042
  1.03    179.81     2.47 291783501     0.00     0.00  DMA_SetDREQ
  1.00    182.20     2.39 56818199     0.00     0.00  updateUART
  0.90    184.35     2.15 381284146     0.00     0.00  CPU_MMU_start
  0.89    186.47     2.12 136013945     0.00     0.00  applySoundFilter
  0.87    188.55     2.08 169668414     0.00     0.00  BIOS_readhandler
  0.81    190.49     1.94 97679318     0.00     0.00  CPU_readOP
  0.79    192.38     1.89                             floorf
  0.78    194.23     1.85 112846373     0.00     0.00  MMU_rb
  0.77    196.07     1.84 141328267     0.00     0.00  MMU_INTERNAL_directrb_realaddr
  0.74    197.83     1.76 145830462     0.00     0.00  FLOPPY_DMADREQ
  0.65    199.39     1.56 154576918     0.00     0.00  checkMMUaccess
  0.64    200.92     1.53 56753199     0.00     0.00  CPU_tickPrefetch
  0.64    202.44     1.52 145714078     0.00     0.00  DRAM_DMADREQ
  0.63    203.95     1.51 57337497     0.00     0.00  updateAudio
  0.55    205.27     1.32       97     0.01     0.01  zoomSurfaceRGBA
  0.53    206.54     1.27 56732338     0.00     0.00  tickssourcecovox
  0.52    207.78     1.24 57030472     0.00     0.00  MMU_INTERNAL_directwb_realaddr
  0.51    208.99     1.21 154134323     0.00     0.00  fifobuffer_freesize
  0.49    210.17     1.18 70104238     0.00     0.00  VGA_Sequencer_TextMode
  0.46    211.26     1.09 324439737     0.00     0.00  latchBUS
  0.41    212.24     0.98 324520791     0.00     0.00  is_paging


So, essentially, the heaviest part appears to be the text surface renderer. It can be found in the text surface module:
https://bitbucket.org/superfury/unipcem ... ?at=master

The rendering cost should already be minimal, because the buffered display is only updated when it actually changes. That buffer is essentially a mask: each pixel is either 100% transparent or a pixel of the contained text surface. Even so, it still seems to be pretty heavy on the CPU at 60 FPS.

Does anyone know a way to optimize it further and make it less heavy? It essentially draws, with either 100% or 0% transparency, VGA-style 8x8 text with a border around it in character-based cells, but each cell has a font color and a border color instead of a background color. The color is stored as RGB for both the font and background color of the entire cell. It draws its pixels to an intermediate buffer first (only updated when any character, font or background changes). The intermediate buffer is then simply plotted to the actual display each frame, on top of the video display (which is itself only redrawn when it changes).
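
One idea for the per-frame plot described above, as a minimal sketch rather than UniPCemu's actual code: whenever the intermediate buffer changes, also record the first and last opaque pixel of each row, so the per-frame loop only visits the span that can contain text. All names here (Overlay, overlay_blit, the 16x8 size, the 0xAARRGGBB layout with alpha 0 meaning transparent) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OVL_W 16 /* hypothetical overlay size, kept tiny for the example */
#define OVL_H 8

typedef struct {
    uint32_t pixels[OVL_H][OVL_W]; /* prerendered overlay, 0xAARRGGBB, alpha 0 = transparent */
    int first[OVL_H];              /* first opaque x per row, OVL_W when the row is empty */
    int last[OVL_H];               /* last opaque x per row, -1 when the row is empty */
    int dirty;                     /* set whenever the overlay content changes */
} Overlay;

/* Make the whole overlay transparent and mark it as changed. */
static void overlay_clear(Overlay *o)
{
    memset(o->pixels, 0, sizeof o->pixels);
    o->dirty = 1;
}

/* Change one overlay pixel; the spans are recomputed lazily on the next blit. */
static void overlay_set_pixel(Overlay *o, int x, int y, uint32_t argb)
{
    assert(x >= 0 && x < OVL_W && y >= 0 && y < OVL_H);
    o->pixels[y][x] = argb;
    o->dirty = 1;
}

/* Record the opaque extent of every row; runs only after a change. */
static void overlay_update_spans(Overlay *o)
{
    for (int y = 0; y < OVL_H; ++y) {
        o->first[y] = OVL_W;
        o->last[y] = -1;
        for (int x = 0; x < OVL_W; ++x) {
            if (o->pixels[y][x] >> 24) {
                if (o->first[y] == OVL_W) o->first[y] = x;
                o->last[y] = x;
            }
        }
    }
    o->dirty = 0;
}

/* Composite the overlay onto the frame; returns how many pixels were visited. */
static long overlay_blit(Overlay *o, uint32_t *frame, int pitch_pixels)
{
    long visited = 0;
    if (o->dirty) overlay_update_spans(o);
    for (int y = 0; y < OVL_H; ++y) {
        uint32_t *dst = frame + (size_t)y * (size_t)pitch_pixels;
        for (int x = o->first[y]; x <= o->last[y]; ++x) {
            uint32_t p = o->pixels[y][x];
            ++visited;
            if (p >> 24) dst[x] = p; /* 0% or 100% transparency only */
        }
    }
    return visited;
}
```

Rows without any text cost nothing per frame with this layout, and the span scan is paid only when a character or color changes. Whether it helps in practice depends on how much of each text surface is actually covered.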

Post by vladstamate » 2017-1-08 @ 00:04

If you are trying this on Windows, try running AMD CodeXL. It will show you which part of the GPU_textrenderer function takes the most time (down to the line).

Also, I know this doesn't help with processors at 2.0GHz and below (as they tend to be single-core), but you could move the VGA emulation to its own thread, so that on (modern) multi-core CPUs it runs on a separate core.

Post by superfury » 2017-1-08 @ 14:31

I've just installed AMD CodeXL, which doesn't seem to work well:

- When opening Visual Studio with AMD CodeXL installed, none of the output windows will show.
- When trying to profile and debug, both from the AMD CodeXL application and from Visual Studio using the AMD CodeXL options, the application starts and immediately terminates without any errors. Neither AMD CodeXL nor Visual Studio shows that anything was done, and the AMD CodeXL profile is a barely filled report covering 1 second of runtime, with only 2 hottest functions: BHDrvx64.sys!0xf809e612f024 (1 sample) and other (27 samples).

Edit: After restarting Visual Studio, it seems to work somewhat (after opening and running the Teapot example). The windows show up again and the menu options seem to work again.

Do you know why the application just stops running within 1 second? There is no message or indication of the cause whatsoever.

Post by vladstamate » 2017-1-08 @ 14:51

I am personally not running it through Visual Studio, although I know it should work. I just run CodeXL separately, create a new project, point it to my emulator executable and make sure to add the parameters I need. Then from Profile I choose CPU profile. I usually let it run for about 1 minute (at most). The performance feedback is pretty good. Not as good as Intel VTune, but then again, it is free.

Post by superfury » 2017-1-08 @ 16:57

Currently, looking at the rendering function of the text surfaces, it looks like the main loop that renders the pixels to the screen (they are already prerendered to a buffer) takes quite a lot of CPU time, although it's optimized for stretching (Android) too (it stretches somewhat, both horizontally and vertically).

Attachment: profiling_GPU_textrenderer.zip (GPU_textrenderer rendering part profiled using AMD CodeXL, 1.46 KiB)

I've copy-pasted it into a text file (seeing as everything is separated by commas, it seems to be in CSV format).

Any tips on how to optimize it? It does seem to be the heaviest part of the entire emulator (over 30%); the others are in the 3-5% range, and that's including the VGA.
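
For a stretching loop like the one described, one common approach is to move the coordinate math out of the per-pixel path: compute the source column for every destination column (and the source row for every destination row) once when the sizes change, then only do table lookups per frame. The sketch below assumes nearest-neighbor stretching and a 0xAARRGGBB pixel with alpha 0 meaning transparent; StretchMap and its functions are hypothetical names, not taken from UniPCemu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int src_w, src_h, dst_w, dst_h;
    int *xmap; /* xmap[dx] = source column for destination column dx */
    int *ymap; /* ymap[dy] = source row for destination row dy */
} StretchMap;

/* Build the lookup tables; call again only when a size changes. Returns 1 on success. */
static int stretchmap_init(StretchMap *m, int src_w, int src_h, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    m->src_w = src_w; m->src_h = src_h;
    m->dst_w = dst_w; m->dst_h = dst_h;
    m->xmap = malloc(sizeof *m->xmap * (size_t)dst_w);
    m->ymap = malloc(sizeof *m->ymap * (size_t)dst_h);
    if (!m->xmap || !m->ymap) {
        free(m->xmap);
        free(m->ymap);
        m->xmap = m->ymap = NULL;
        return 0;
    }
    for (int dx = 0; dx < dst_w; ++dx)
        m->xmap[dx] = (int)((int64_t)dx * src_w / dst_w);
    for (int dy = 0; dy < dst_h; ++dy)
        m->ymap[dy] = (int)((int64_t)dy * src_h / dst_h);
    return 1;
}

static void stretchmap_free(StretchMap *m)
{
    free(m->xmap);
    free(m->ymap);
    m->xmap = m->ymap = NULL;
}

/* Per-frame work: lookups and copies only, no division or multiplication per pixel. */
static void stretch_blit(const StretchMap *m, const uint32_t *src, uint32_t *dst)
{
    for (int dy = 0; dy < m->dst_h; ++dy) {
        const uint32_t *srow = src + (size_t)m->ymap[dy] * (size_t)m->src_w;
        uint32_t *drow = dst + (size_t)dy * (size_t)m->dst_w;
        for (int dx = 0; dx < m->dst_w; ++dx) {
            uint32_t p = srow[m->xmap[dx]];
            if (p >> 24) drow[dx] = p; /* skip fully transparent pixels */
        }
    }
}
```

The destination is assumed to be tightly packed (pitch equal to dst_w pixels); a real surface would pass its pitch separately. The per-line timings in the CodeXL report would show whether the coordinate math is actually where the time goes.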

(Once again, this forum is pretty restrictive about the file extensions you can upload. Text files (.txt) are allowed, but log files (.log), CSV files etc. aren't, even though they're plain text files after all.)

Post by Alegend45 » 2017-1-11 @ 14:23

Are you locking and unlocking the surface you're writing to often? I'm thinking that's probably the answer.
User avatar
Alegend45
Newbie
 
Posts: 70
Joined: 2012-6-23 @ 18:18

Re: Optimizing UniPCemu further for slower (2.0GHz-) PCs?

Post by superfury » 2017-1-11 @ 20:50

It locks and unlocks a total of 180 times each second (60 frames times 3 text surfaces). It locks before rendering and unlocks after rendering.
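
As a rough cross-check, the gprof figures from the first post can be combined with this call rate. This assumes that each GPU_textrenderer call corresponds to one lock/render/unlock of a text surface, and that the 180 calls per second held throughout the profiled run:

```latex
\[
t_{\text{call}} = \frac{94.90\ \text{s}}{38800\ \text{calls}} \approx 2.45\ \text{ms},
\qquad
180\ \tfrac{\text{calls}}{\text{s}} \times 2.45\ \text{ms} \approx 0.44\ \text{s of CPU time per second}
\]
```

At roughly 2.4 ms per call, the cost looks more like per-pixel work inside each call than the overhead of 180 lock/unlock pairs per second, although the flat profile alone cannot confirm that.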