VOGONS


First post, by superfury

I've created a little BIOS ROM program myself (can be found in UniPCemu's git repository, at UniPCemu/assembly/testRAMthroughput.asm, compiled with nasm on Windows in my case).

It essentially keeps executing a MOVSD instruction at unaligned addresses (starting at offset 1) in a loop.
But the loop doesn't just loop and reset the SI/DI registers. It also swaps the ES and DS registers on every iteration (alternating between segments 0h and 1000h).
The code segment does the very same thing, alternating between segments 2000h and 3000h.
Execution itself starts at segment F000h (through the initial far jump at the reset vector, FFFF:0 or F000:FFF0). The little startup code I wrote simply copies the ROM to segments 2000h and 3000h (except the initial jump and the bytes following it in the ROM) and jumps to the entry point of the loop.

So that should stress the caches as much as possible, at least in theory: unaligned accesses combined with constant cache flushes when using a simple 1-segment cache for code segment reads and a 1-segment cache each for data segment reads and writes, as UniPCemu itself uses for its memory caches.
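As a rough illustration of why the segment swapping should defeat a one-entry cache, here is a minimal sketch in C. It is not UniPCemu's code: the OneEntryCache type, the 64 KiB window granularity and the function names are all assumptions made up for this example. It only models the data read side and counts how often the single cached window has to be replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-entry cache: remembers only the last 64 KiB window touched. */
typedef struct {
    uint32_t window;      /* physical address >> 16 (assumed granularity) */
    int valid;            /* 0 until the first access */
    unsigned long misses; /* number of times the entry had to be replaced */
} OneEntryCache;

static void cache_access(OneEntryCache *c, uint32_t physaddr)
{
    assert(c != NULL);
    uint32_t window = physaddr >> 16;
    if (!c->valid || c->window != window) {
        c->window = window; /* replace the single entry */
        c->valid = 1;
        ++c->misses;
    }
}

/* Real-mode address translation: segment * 16 + offset. */
static uint32_t real_mode_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Counts read-cache misses for a loop that reads from DS:0001 each iteration.
 * With swap_segments set, DS and ES trade places after every iteration,
 * alternating the read segment between 0000h and 1000h as in the test ROM. */
unsigned long count_read_misses(unsigned long iterations, int swap_segments)
{
    OneEntryCache cache = {0, 0, 0};
    uint16_t ds = 0x0000, es = 0x1000;
    for (unsigned long i = 0; i < iterations; ++i) {
        cache_access(&cache, real_mode_address(ds, 1)); /* unaligned source */
        if (swap_segments) {
            uint16_t tmp = ds;
            ds = es;
            es = tmp;
        }
    }
    return cache.misses;
}
```

In this model, count_read_misses(n, 1) misses on every iteration, while count_read_misses(n, 0) misses only on the first access. The same reasoning would apply to the write side and to the code segment alternating between 2000h and 3000h.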

Weirdly enough, it keeps the emulated CPU at a steady 22% of realtime speed, even though Windows 95 slows it down to (at worst) 7% while it's executing.
It drops down to 17% with two instances, 12% with three instances and 10% with four instances.
All running at a configured speed of 3000000 instructions per second (3 MIPS) on an i7-4790K CPU @ 4GHz.

I don't know if those are very good speeds for such a fast host CPU, though (the emulated CPU is an interpreter running in IPS cycle mode, the Dosbox-compatible mode, which should be faster than cycle-accurate mode).
It's still running at about 660000 instructions per second in realtime, calculated from the cycle count (3000000 instructions per second) and the rate it reports running at (22% of realtime speed): 3000000 x 0.22 = 660000 instructions per second.
So that's the equivalent of a 0.66MHz 486 (assuming 1-cycle instructions on average)?
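The calculation above can be written down as a small helper. This is only a sketch in C; the function name effective_ips and its parameters are made up for this example and are not part of UniPCemu.

```c
#include <assert.h>
#include <stdint.h>

/* Effective emulated instructions per second, given the configured rate
 * and the reported realtime speed as a whole percentage.
 * Integer math keeps the result exact for whole percentages. */
uint64_t effective_ips(uint64_t configured_ips, unsigned percent_of_realtime)
{
    /* Guard against overflow in the multiplication below. */
    assert(percent_of_realtime == 0 ||
           configured_ips <= UINT64_MAX / percent_of_realtime);
    return configured_ips * percent_of_realtime / 100;
}
```

With the figures from this post, effective_ips(3000000, 22) gives 660000.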

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 1, by superfury

Managed to stress it more by fixing some issues with it. Now it's down to 17%.

Although Windows 9x/NT somehow does a better job (sometimes lowering it to 7% and 14%)?

Oddly enough, even though RAM accesses look more optimized in the profiler, total throughput seems lower?

If you look at UniPCemu's mmu/mmuhandler.c, there's a (relatively) heavy function called applyMemoryHoles which does most of said heavy lifting, even though it's still mostly cached somehow (the second if-clause is the cache check itself, taking most of the CPU time)?
There are essentially 8 cache entries, split into two groups of 4 (2x3 in use). One group is for non-DMA accesses, the other (the upper 4 entries) for DMA accesses. Of the 4 entries in each group, 3 are used: data write (0/4), data read (1/5) and code read (3/7).
In the newer (not yet released on itch.io) version, most parsing and mapping are now precalculated (the (pre)calc functions). Although stuff like the BIOS seems to have slowed down in the new version?
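The entry numbering described above can be sketched as follows. This is only an illustration in C: the enum, the constant names and cache_slot_index are hypothetical and not taken from mmuhandler.c. Only the numbers (0/4, 1/5, 3/7) come from the description.

```c
#include <assert.h>

/* Hypothetical names for the three cache kinds in use; the values follow
 * the numbering given in the post (index 2 is not listed as used). */
typedef enum {
    CACHE_DATA_WRITE = 0,
    CACHE_DATA_READ  = 1,
    CACHE_CODE_READ  = 3
} CacheKind;

enum {
    DMA_SLOT_OFFSET = 4, /* DMA accesses use the upper 4 entries */
    CACHE_SLOTS     = 8  /* 2 groups of 4 entries */
};

/* Maps an access kind plus a DMA flag to one of the 8 cache entries. */
int cache_slot_index(CacheKind kind, int is_dma)
{
    int slot = (int)kind + (is_dma ? DMA_SLOT_OFFSET : 0);
    assert(slot >= 0 && slot < CACHE_SLOTS);
    return slot;
}
```

For example, a DMA code read would land in entry 7 and a non-DMA data read in entry 1.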

Cache behaviour sure is weird sometimes, although theoretically simple.
