superfury wrote on 2023-07-14, 06:23:
Is that 1 cycle before pop for all POPs? What about RETF? UniPCemu does 2 cycles before all POPs.
Also, what do you mean with "write operand"? What timing is that (in cycles?)? Also, memory? Reg operand?
Nope, just that particular form (8F). Since there is a memory operand, POP rm8/16 has to stash the index value produced by the EA calculation in a temporary register before copying the stack pointer to the index for the stack read; that accounts for one cycle.
By 'write operand' I mean the instruction has to write the popped value from the stack back to either memory (BIU timings) or a register (1 cycle).
superfury wrote on 2023-07-14, 06:23:
Edit: OK. Prefetching is now moved into the BIU request handler, which checks on T1 and processes on T3 (T1 on 286; T0 only on 486+, where all T-states are T0 afaik, and that is implemented).
Unclear what you mean by that exactly, but I'll mention that I make prefetch decisions on T3/TwLast. I can only speak for the 8088, but the CPU makes a decision to prefetch at least 2 cycles in advance of when it will actually try to begin a CODE bus cycle; I call this prefetch scheduling. So this usually lands on T3 if there are no wait states. The prefetch is not always scheduled 2 cycles later; based on the length of the queue the CPU may delay the fetch an additional 2-3 cycles.
In any case, doing the logic this way means that you can prevent a prefetch from being scheduled during a given bus cycle if an EU bus request arrives before the prefetch is scheduled, and will incur a prefetch abort penalty if it arrives on or after the cycle the prefetch is scheduled. The latter is always a 2 cycle delay after T4.
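As a rough illustration of that rule, here is a minimal Python sketch. The function name, the cycle numbering and the return tuples are all made up for the example (this is not MartyPC code); the only things taken from the description above are the before/on-or-after comparison and the 2 cycle abort penalty. It only covers an EU request that arrives during the bus cycle in which the prefetch would be scheduled.

```python
ABORT_PENALTY = 2  # cycles of delay after T4 when a scheduled prefetch is aborted


def resolve_prefetch(schedule_cycle, eu_request_cycle):
    """Decide what happens to a prefetch within one bus cycle.

    schedule_cycle:   hypothetical cycle index at which the prefetch is scheduled
    eu_request_cycle: cycle index at which an EU bus request arrives, or None
    Returns (outcome, penalty_cycles).
    """
    if eu_request_cycle is None:
        # No EU request: the scheduled prefetch simply goes ahead.
        return ("prefetch", 0)
    if eu_request_cycle < schedule_cycle:
        # The EU request arrived first, so no prefetch is scheduled at all.
        return ("eu_first", 0)
    # The request arrived on or after the scheduling cycle: prefetch abort.
    return ("abort", ABORT_PENALTY)


print(resolve_prefetch(3, 2))     # EU request before scheduling
print(resolve_prefetch(3, 3))     # EU request on the scheduling cycle
print(resolve_prefetch(3, None))  # no EU request
```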
superfury wrote on 2023-07-14, 06:23:
The only thing left to do now would be to convert all bus memory and I/O transactions using a motherboard-specified waitstate count (1 bus waitstate on XT architecture) to trigger when starting a request, before the I/O or memory access (between T2 and T3 on 808x) instead of between T3 and T4. Luckily most of the handling is already there. I'll just need to load the bus waitstate count into the BIU as it already does (if set and both of the flags mentioned below are cleared), set a flag (lower bit), and perform a normal waitstate abort. Then, when it returns to said code with that bit set, it shifts the bit one position up in the flag to mark it as processed, performs the actual memory or I/O access, and checks for waitstates the usual way. Once the T4 cycle handler activates, it will clear both flags, causing future transfers (or the second access of a broken-up word transfer) to perform the motherboard waitstate once again.
My wait state processing is super simple; basically I have a counter for how many cycles READY should be deasserted. It is always decremented on each CPU cycle, and if it is >0 when we reach T3 we insert wait states until it is 0. Bus operations that incur wait states just increment the counter, as does DMA.
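A minimal Python sketch of that counter idea follows. The class and function names are hypothetical, and the exact point at which the counter is checked relative to the T3 decrement is an assumption made for the example, not a statement about real hardware or about MartyPC's code.

```python
class WaitStateCounter:
    """Hypothetical counter of cycles for which READY stays deasserted."""

    def __init__(self):
        self.pending = 0

    def add(self, cycles):
        # Bus operations that incur wait states (and DMA) just add to the counter.
        self.pending += cycles

    def tick(self):
        # Called once per CPU cycle; never goes below zero.
        if self.pending > 0:
            self.pending -= 1


def run_bus_cycle(ws):
    """Return the T-states of one bus cycle, inserting Tw while READY is low."""
    states = []
    for state in ("T1", "T2"):
        states.append(state)
        ws.tick()
    states.append("T3")
    # Assumed ordering: the counter is checked on reaching T3, and each
    # inserted Tw consumes one pending cycle until the counter reaches 0.
    while ws.pending > 0:
        states.append("Tw")
        ws.tick()
    states.append("T4")
    return states


ws = WaitStateCounter()
ws.add(3)  # e.g. a device asks for READY to stay low for 3 cycles
print(run_bus_cycle(ws))
```

With this ordering, two of the three pending cycles are absorbed by T1 and T2, so only the remainder shows up as Tw states.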
superfury wrote on 2023-07-14, 06:23:
Running 8088 MPH again now...
Credits still hang? It seems to have filled the 'blank canvas' code with a 'mov sp,02db' (at 1df8:025d) and 12 NOPs.
Although I've forgotten to enable the cycle logging, this is what happens (common log format):
debugger_8088MPH_credits_UniPCemu_20230714_1738.7z
Perhaps the contents of the instruction and registers is a hint as to what's happening and when?
The main thing with the end credits of 8088MPH is that it's got a bunch of self-modifying code, so your prefetch and instruction queue emulation must be spot on, or you'll end up executing the wrong instructions. It's so tight that you must fetch operands in a cycle-accurate way. For example, it is tempting to read an immediate operand during instruction decode, but that is too early. Instruction decode handles reading the prefixes, opcode and modrm for you, and common microcode routines handle loading the displacement and EA operand for you, but immediate operands have to be deliberately fetched by the specific opcode's main microcode. That's usually done first thing by an instruction with an immediate operand form, but not always: JCXZ and LOOP, for example, both wait two cycles before fetching their operands.
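To make the self-modifying code problem concrete, here is a toy Python sketch. It is not cycle accurate and is not how any emulator is structured; the `Fetcher` class, `run` helper and byte values are invented for the example. The only hardware fact used is that the 8088 instruction queue holds 4 bytes. It shows that the value an instruction sees for its immediate depends on whether that byte was already prefetched before the write to memory happened.

```python
from collections import deque

QUEUE_SIZE = 4  # the 8088 instruction queue holds 4 bytes


class Fetcher:
    """Toy prefetch queue: bytes are copied from memory when prefetched."""

    def __init__(self, memory, ip):
        self.memory = memory
        self.fetch_ptr = ip
        self.queue = deque()

    def prefetch(self):
        # Copy one byte into the queue if there is room.
        if len(self.queue) < QUEUE_SIZE:
            self.queue.append(self.memory[self.fetch_ptr])
            self.fetch_ptr += 1

    def next_byte(self):
        # If the queue is empty, the byte has to be fetched from memory now.
        if not self.queue:
            self.prefetch()
        return self.queue.popleft()


def run(prefetched_before_write):
    memory = bytearray([0xB0, 0x11, 0x90, 0x90])  # MOV AL,11h; NOP; NOP
    fetcher = Fetcher(memory, 0)
    for _ in range(prefetched_before_write):
        fetcher.prefetch()
    memory[1] = 0x22  # self-modifying write patches the immediate
    opcode = fetcher.next_byte()
    immediate = fetcher.next_byte()
    return opcode, immediate


print([hex(b) for b in run(1)])  # immediate not yet in the queue at write time
print([hex(b) for b in run(2)])  # immediate already in the queue at write time
```

If the immediate was already sitting in the queue, the old value is used; if not, the patched value is. Getting the prefetch timing wrong by a cycle or two is enough to flip between the two cases.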
MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc