My code calculates 5 EA cycles for BX and 2 EA cycles for the CS override. Add 1 cycle for reading the prefetch and you end up with the correct 8-cycle delay. I just need to adjust the 8086 core to apply the EA cycles separately, like the current prefetch cycles, and apply that to the BIU and debugger. That should automatically fix that bug.
Thinking about the EA cycles again, they're probably used in every instruction that takes ModR/M parameters. One thing is strange, though: almost all documentation I've seen on the 808X+ lists either X cycles for register operands or X+EA cycles for memory operands. So that isn't about whether the Reg or the R/M field is used, but rather about the top 2 bits of the ModR/M byte being 11b (register cases) vs 00b-10b (memory cases)?
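To make the register/memory split concrete, here is a rough sketch of how I'd expect the EA cycle lookup to work, keyed on the mod and r/m fields of the ModR/M byte. The cycle counts are the ones from the 8086 effective address timing table as I remember it; the function names `modrm_is_memory` and `ea_cycles` are made up for this example and aren't taken from any existing core. It also assumes the 2-cycle segment override penalty is only charged when there is a memory operand.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when mod (top 2 bits) is 00b..10b,
   i.e. the r/m field addresses memory instead of a register. */
int modrm_is_memory(uint8_t modrm)
{
    return (modrm >> 6) != 3;
}

/* Hypothetical helper: EA cycles for a 16-bit ModR/M byte.
   r/m order: BX+SI, BX+DI, BP+SI, BP+DI, SI, DI, BP (or disp16), BX. */
unsigned ea_cycles(uint8_t modrm, int has_segment_override)
{
    /* mod=00b: no displacement (r/m=110b is the direct disp16 case). */
    static const uint8_t no_disp[8]   = { 7, 8, 8, 7, 5, 5, 6, 5 };
    /* mod=01b/10b: disp8 or disp16 added to the base/index. */
    static const uint8_t with_disp[8] = { 11, 12, 12, 11, 9, 9, 9, 9 };

    unsigned mod = modrm >> 6;
    unsigned rm  = modrm & 7;
    unsigned cycles;

    if (mod == 3) {
        return 0; /* register operand: no EA calculation at all */
    }

    cycles = (mod == 0) ? no_disp[rm] : with_disp[rm];

    if (has_segment_override) {
        cycles += 2; /* assumption: override only charged for memory operands */
    }
    return cycles;
}
```

With a CS override on a plain [BX] operand (mod=00b, r/m=111b) this gives 5 + 2 = 7 EA cycles, and adding the 1 cycle for the prefetch read gets to the 8 cycles mentioned above.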
Edit: After the EA cycle change, it runs at 1261 cycles (with EU execution and fetch starting at any T cycle, not just T3(+1), during prefetching).
Edit: 8088 MPH runs like crazy, with most CPU-speed-sensitive parts being way too fast (the DeLorean's even/odd lines disappearing when it touches the top of the screen, 3D objects turning super fast, the credits music playing in high-speed fast-forward, Kefrens Bars going wrong as usual). The remaining parts run without problems.
Edit: Looking at your sniffer log, I see 'I' at T1 and 'S' at T3 most of the time (although sometimes they're all over the place?). Also, the BIU fetches into the PIQ or normal memory/IO on T4 only. Any idea how this works?