First post, by superfury
I'm running UniPCemu's cycle-accurate 80386 core (mostly cycle-accurate, except for the new 80386+ instructions and things like task switching and segment descriptor loading) on a stock 4.0GHz i7 CPU. When I run the Compaq Deskpro 386 at 16MHz (the same speed the Inboard 386 XT uses, while the Inboard 386 AT uses a 32MHz CPU clock instead), it runs at only 20% speed, while 100% speed is required to play games at normal speed.

The emulated hardware always runs at a base clock of 14.31818MHz, except for hardware using its own clocks. Video cards are one case: even the CGA is handled this way for simple video card compatibility with the VGA and SVGA emulation, even though it runs its own 14.31818MHz clock instead of the general clock used by the CPU. Sound card output is the other: it runs off the 14MHz clock converted to normal time in nanoseconds, which provides a basic timing reference for outputting samples to the renderer at a fixed rate (like 44.1kHz, depending on the sound card). On top of that there are realtime clocks: the Sound Blaster recording clock is modified to use the actual time the emulator has been running, so recording happens at realtime without the recorded input being distorted by a variable rate (which depends on the CPU being able to match realtime speed, which isn't always the case, especially with the heavier CPUs like the 80386+ at 16MHz+).

Is it normal for a cycle-accurate 16MHz CPU emulation to be so heavy on an Intel i7@4.0GHz CPU? Or does that mean my emulator is badly optimized for some reason?

It's essentially running three heavy clocks on the system in that situation:
- a 16MHz CPU clock (an integer clock that's converted to the 14.31818MHz clock most PC-compatible hardware is based on);
- a video clock running off the CPU clock converted to nanosecond units (double-precision floating point), which runs at different speeds (e.g. 25/28MHz for VGA, the 14.31818MHz CGA/MDA clock, or the ET3000/ET4000 SVGA clocks, which software can set in the (S)VGA cases);
- a realtime clock that's used directly by specific hardware: 44.1kHz Sound Blaster output, CMOS timing, the floppy disk controller (TODO, planned for some future version supporting physical disk movements for more accuracy), 44.1kHz Game Blaster output, ATA/ATAPI controllers, joystick timings, modem timings, parallel port timings, PIT sound output, PS/2 keyboard and mouse timings, Sound Source/Covox Speech Thing output, and UART timings.
Profiling shows that about 35% of the time is spent in the CPU EU/BIU emulation and 20% in the video card emulation (plain VGA in this case); the remaining units barely take any time, up to 7% each depending on the hardware.
Is it normal for a 16MHz 80386 cycle-based emulation to be this heavy? Or am I simply optimizing it wrong in some way?
Profiler output from Compaq Deskpro 386 POSTing:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative    self               self     total
 time   seconds    seconds     calls  s/call   s/call  name
 36.28     66.76     66.76                             _mcount_private
 20.46    104.40     37.64                             __fentry__
  3.57    110.97      6.57 259658824    0.00     0.00  updateVGA
  3.39    117.20      6.23   2621155    0.00     0.00  DoEmulator
  2.47    121.75      4.55 259633524    0.00     0.00  CPU_tickBIU
  1.95    125.33      3.58 319094823    0.00     0.00  VGA_ActiveDisplay_Text
  1.86    128.76      3.43                             floor
  1.59    131.68      2.92 555702049    0.00     0.00  getnspassed
  1.53    134.50      2.82 259637130    0.00     0.00  CPU_exec
  1.26    136.81      2.31 259661765    0.00     0.00  update8042
  1.22    139.05      2.24 259650694    0.00     0.00  updateATA
  1.07    141.02      1.97 259637810    0.00     0.00  updateGameBlaster
  0.98    142.83      1.81                             floorf
  0.83    144.36      1.53 304588206    0.00     0.00  checkMMUaccess
  0.79    145.82      1.46 304586596    0.00     0.00  CPU_MMU_checklimit
  0.71    147.12      1.30 259659161    0.00     0.00  debugger_step
  0.66    148.33      1.21 259661051    0.00     0.00  needdebugger
  0.65    149.52      1.19  77454927    0.00     0.00  DMA_StateHandler_SI
  0.64    150.70      1.18 452976015    0.00     0.00  fifobuffer_freesize
  0.64    151.88      1.18 259654778    0.00     0.00  tickPIT
  0.62    153.02      1.14 177706134    0.00     0.00  CPU_fillPIQ
  0.61    154.15      1.13 229797672    0.00     0.00  BIOS_readhandler
Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io