First post, by vladstamate
So I finally got around to implementing what I think is more correct CPU bus and prefetch behavior in my emulator. I wanted to share some interesting findings. But first, this is what is emulated:
- instruction timings (not 100% accurate, but quite close)
- EA calculation timing (5 clocks for this)
- taken vs. not-taken jumps (different timings in each case)
- jumps (including calls, int, etc.) clear the prefetch queue
- correct bus behavior: a transfer every 4th clock, with the prefetch queue filling only during idle bus cycles.
- all data transferred in and out of the CPU obeys the rule above
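To make the bus rule above concrete, here is a minimal sketch of the idea in Python. It is not the emulator's actual code: the class name `PrefetchModel`, its methods, and the choice to count bytes instead of tracking real addresses are all assumptions made for illustration. The queue sizes used (4 bytes with 1 byte per transfer for the 8088, 6 bytes with 2 bytes per transfer for the 8086) are the documented hardware values, and discarding an in-flight prefetch on a flush is a simplification.

```python
class PrefetchModel:
    """Toy model of the prefetch queue: counts bytes only, no addresses."""

    def __init__(self, capacity=4, bytes_per_transfer=1, clocks_per_transfer=4):
        self.capacity = capacity                      # 4 on 8088, 6 on 8086
        self.bytes_per_transfer = bytes_per_transfer  # 1 on 8088, 2 on 8086
        self.clocks_per_transfer = clocks_per_transfer
        self.queued = 0        # bytes currently waiting in the queue
        self.fetch_clocks = 0  # clocks spent on the in-flight prefetch

    def flush(self):
        # Taken jumps, calls, ints: drop everything, including (by
        # assumption) a prefetch that is still in flight.
        self.queued = 0
        self.fetch_clocks = 0

    def tick(self, eu_uses_bus=False):
        """Advance one clock. Prefetch only progresses on idle bus clocks."""
        if eu_uses_bus:
            return  # bus is busy with a data transfer this clock
        if self.queued + self.bytes_per_transfer > self.capacity:
            return  # no room for another transfer
        self.fetch_clocks += 1
        if self.fetch_clocks == self.clocks_per_transfer:
            self.fetch_clocks = 0
            self.queued += self.bytes_per_transfer

    def take_byte(self):
        """Execution unit asks for one instruction byte. False means a miss."""
        if self.queued == 0:
            return False
        self.queued -= 1
        return True
```

A caller would invoke `tick()` once per emulated clock, pass `eu_uses_bus=True` while a data read or write owns the bus, and call `flush()` on any control transfer.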
I've tried this with both the 8086 and the 8088, and found that the prefetch queue is empty on about 9% of requests for the 8086 and about 15% of requests for the 8088. This means that even on an 8088, roughly 5 out of 6 times the CPU reads an instruction byte it finds it in the prefetch queue, and only about 1 time in 6 does it have to wait 4 more cycles. Is this in the realm of expected?
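As a quick back-of-the-envelope check of what those miss rates cost, the snippet below turns a miss rate into an average stall per instruction-byte request. The function names are made up for this illustration, and it assumes every miss costs one full 4-clock bus transfer, which overstates the cost if a prefetch is already partway done when the request arrives.

```python
def average_stall_clocks(miss_rate, clocks_per_transfer=4):
    """Average extra clocks per instruction-byte request, assuming each
    miss waits for one complete bus transfer."""
    return miss_rate * clocks_per_transfer


def requests_per_miss(miss_rate):
    """How many requests there are, on average, for each miss."""
    return 1.0 / miss_rate


for name, rate in (("8086", 0.09), ("8088", 0.15)):
    print(name,
          round(average_stall_clocks(rate), 2), "clocks per request,",
          "one miss per", round(requests_per_miss(rate), 1), "requests")
```

Under that assumption the 8088 figure works out to 0.6 clocks per request on average, with one miss in roughly every 6.7 requests.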
Here is a question though:
In the case of Jxx, when the branch is not taken the instruction only takes 4 cycles. All good. However, when the branch is taken, I clear the prefetch queue and then wait 16 cycles. The penalty to the following instructions is not that high, because by the time the 16 cycles have elapsed the prefetcher has had a chance to fetch 2 bytes or so, and the queue has data again. If I instead cleared the prefetch queue at the end of the 16 cycles, that would hurt more, since I would then have to wait 4 more cycles (for at least 1 bus transfer) before the next instruction byte is available.
Currently I am implementing the first way, and I am not sure which one is more correct.
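The difference between the two options can be put into numbers with a small sketch. This only compares the two emulator policies described above and says nothing about which one real hardware follows. The function name and parameters are hypothetical, and `idle_bus_clocks` (how many of the jump's clocks leave the bus free for prefetching) is an assumption that depends on what else the emulator does on the bus during the jump.

```python
def stall_after_taken_jump(flush_early, jump_clocks=16, idle_bus_clocks=16,
                           clocks_per_transfer=4):
    """Clocks the CPU waits for the first instruction byte at the jump
    target, once the jump's own clocks have been charged."""
    if flush_early:
        # Queue cleared at the start: prefetch from the target overlaps
        # with the jump, using whatever bus clocks are idle.
        progress = min(idle_bus_clocks, jump_clocks)
    else:
        # Queue cleared at the end: nothing useful was fetched during
        # the jump, so the first transfer starts from scratch.
        progress = 0
    if progress >= clocks_per_transfer:
        return 0  # at least one transfer finished, a byte is ready
    return clocks_per_transfer - progress


print(stall_after_taken_jump(flush_early=True))   # flush at start of jump
print(stall_after_taken_jump(flush_early=False))  # flush at end of jump
```

With the defaults, flushing early leaves no extra wait, while flushing late adds one full transfer on top of the 16 cycles, so the two policies differ by up to 4 clocks per taken jump.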
YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/