Reply 100 of 122, by superfury
I've just been thinking about something.
Your prefetch algorithm mentions that on T3 it makes a decision to switch to either PF or EU state after T4.
But can it also decide to go idle (ticking said 3rd state transition on T1 after that, delaying 1 cycle)? That is, when there's no request from the EU and the PIQ is full after T4?
UniPCemu just blindly switches to idle state in 3 cycles including the T3 cycle (so T3, T4, --(final cycle of the switch to idle)).
Or should UniPCemu never decide to switch to idle state on T3, and go to PF state instead when it's on T3 (while T1 can still move to idle regardless)?
Edit: Just modified the BIU so that, instead of moving to Idle state on T3, it moves to PF state on T3 and only allows moving to Idle state on T1 after that (unless it can actually start, of course).
So that should fix some issues with T3-T4-FirstIdle(T1)-IdleModeCycle(T1 again) (due to full PIQ at T3, the EU not requesting anything new). If the PIQ had one byte left and it's filled at T3, behaviour is unchanged (as T1 would see a full PIQ without EU request, thus 3 cycles to Idle mode being performed already).
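To make the changed rule concrete, here's a minimal sketch of the decision as I described it above. All names (BiuMode, DecisionPoint, biu_next_mode) are made up for illustration and are not the actual UniPCemu or MartyPC identifiers. It also assumes an EU request always wins and that PF chosen on T3 is re-evaluated on the following T1.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum { BIU_MODE_EU = 0, BIU_MODE_PF = 1, BIU_MODE_IDLE = 2 } BiuMode;
typedef enum { AT_T1, AT_T3 } DecisionPoint;

/* Decide which mode the BIU heads for next.
 * Assumption: an EU request always takes priority over prefetching. */
BiuMode biu_next_mode(DecisionPoint at, bool eu_request, bool piq_full)
{
    if (eu_request)
        return BIU_MODE_EU;
    if (at == AT_T3)
        return BIU_MODE_PF;   /* never pick Idle on T3, even with a full PIQ */
    /* On T1: a full PIQ with no EU request starts the switch to Idle. */
    return piq_full ? BIU_MODE_IDLE : BIU_MODE_PF;
}
```

With a full PIQ and no EU request, T3 now yields PF and only the T1 after it yields Idle, which is the one-cycle delay I was asking about.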
Edit: 8088 MPH now reports 1656 cycles (1%), thus only 22 cycles missing from the count (in total).
I at least think that the IN/OUT instructions' specific timings might be part of the problem (it allows a prefetch in between, which will fetch the 12h byte before the OUT instruction actually performs its own I/O write (T1-T2-T3-Tw-T4 cycles)), seeing as it happens at the start of the log in the spreadsheet. But somehow MartyPC's spreadsheet doesn't seem to perform that fetch before it?
May I ask what your emulator does when executing an OUT or IN instruction? What steps does the EU perform before and after the request to the BIU to read/write the I/O port?
Edit: Found a slight bug in the detection of the idle mode when determining the next mode to operate in. It was always detecting an active BIU, because the mode read would never be 2 (the idle mode), only 0 (EU) or 1 (PF), depending on the current transfer.
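Roughly, the bug had this shape. This is a simplified sketch with invented names (Biu, transfer_is_prefetch, bus_cycle_active), not the real UniPCemu code; only the mode values 0/1/2 are as described above.

```c
#include <stdbool.h>

enum { MODE_EU = 0, MODE_PF = 1, MODE_IDLE = 2 };

/* Hypothetical BIU state, reduced to what matters for the bug. */
typedef struct {
    bool transfer_is_prefetch; /* kind of the current (or last) transfer */
    bool bus_cycle_active;     /* is a bus cycle actually in progress? */
} Biu;

/* Buggy shape: the mode is derived from the transfer kind only,
 * so it can never be MODE_IDLE and the BIU always looks active. */
int biu_mode_buggy(const Biu *b)
{
    return b->transfer_is_prefetch ? MODE_PF : MODE_EU;
}

/* Fixed shape: report Idle first when no bus cycle is running. */
int biu_mode_fixed(const Biu *b)
{
    if (!b->bus_cycle_active)
        return MODE_IDLE;
    return b->transfer_is_prefetch ? MODE_PF : MODE_EU;
}
```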
8088 MPH reports 1664 cycles now (14 cycles unaccounted for). Getting close now, but something somewhere is still erratic?
Although (according to your article (https://martypc.blogspot.com/2023/08/the-8088 … h-cpu-test.html)) only 1 more PIT clock (because that's 4 metric cycles?) will make 8088 MPH pass it as a real CPU, so it can't be used when getting closer than that?
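For reference, the bookkeeping behind those numbers as a small sketch. The 1678 target is only what my two reports imply (1656 + 22 and 1664 + 14), and the 4 CPU cycles per PIT tick follows from the PIT running at a quarter of the CPU clock on the PC. The actual pass window 8088 MPH uses is not modelled here.

```c
/* Target implied by the reports above: 1656 + 22 == 1664 + 14 == 1678. */
enum { TARGET_CYCLES = 1678, CPU_CYCLES_PER_PIT_TICK = 4 };

/* How many CPU cycles the emulator still comes up short. */
int missing_cycles(int measured_cycles)
{
    return TARGET_CYCLES - measured_cycles;
}

/* The same deficit expressed in PIT ticks (can be fractional). */
double missing_pit_ticks(int measured_cycles)
{
    return missing_cycles(measured_cycles) / (double)CPU_CYCLES_PER_PIT_TICK;
}
```

So a 14 cycle deficit is 3.5 PIT ticks, which is why anything finer than one PIT tick can't be seen by the test itself.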
8088 MPH's 16/256 color part now shows noise on 3 scanlines (each scanline shifted one character to the right), starting at the second character clock of active display and repeated over the screen's entire height (so second on the first scanline, fourth on the second, sixth on the third, second on the fourth etc.), so that's increased somehow?
In realtime measurements, the Kefrens effect 'blocks' (actually two ends of the same scanline being displayed on two consecutive rows) seem to shift off the screen in roughly 1 second intervals (with the CPU at 22% realtime). So combined, that means about every 220ms of emulated time one of those blocks moves off the screen and another one takes its place.
Using a simple calculator (4.77 MHz x 0.22 s), that's about 1050000 cycles on the CPU for each of those split scanline ends to move off the screen entirely and the next to take its place (effectively the scanline displacement, which shouldn't exist).
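The same calculation as a sketch in C, in case someone wants to plug in other speeds. The function names are made up. I use the nominal PC clock of 4772727 Hz (14.31818 MHz / 3) instead of the rounded 4.77 MHz, and integer math that truncates.

```c
#include <stdint.h>

#define CPU_HZ 4772727u /* nominal 8088 clock on the PC: 14.31818 MHz / 3 */

/* Emulated milliseconds that pass during wall_ms of real time
 * when the emulator runs at speed_percent of realtime. */
uint32_t emulated_ms(uint32_t wall_ms, uint32_t speed_percent)
{
    return (uint32_t)((uint64_t)wall_ms * speed_percent / 100u);
}

/* CPU cycles executed in that same wall-clock interval (truncated). */
uint64_t emulated_cycles(uint32_t wall_ms, uint32_t speed_percent)
{
    return (uint64_t)CPU_HZ * wall_ms * speed_percent / (1000u * 100u);
}
```

For 1000 ms of wall time at 22% this gives 220 ms emulated and just under 1050000 cycles, matching the estimate above.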
Edit: Fixing the DMA to end with 3 extra cycles after S4 (as in https://martypc.blogspot.com/2023/05/explorin … -on-ibm-pc.html 's cycle log at the end of that page) makes 8088 MPH report a 'true 8088' now.
But it hangs immediately after that because the DMA controller keeps hogging the bus permanently somehow (S4 always transferring into S1)?
Edit: After modifying the DMA controller to tick S4 as S0-S1-S2 as well when SI on S4's clock cycle detects another transfer pending (so S4 is followed by S3), 8088 MPH doesn't crash the DMA controller on the infinite stream of super-fast transfer requests (it would get a request before it could finish one with the PIT counter set up for 2 PIT ticks per transfer, which would be way too fast).
Edit: Removing that to become just SI-S0 on S4 changes the behaviour back. Then adding back the 3 extra 'Tw' cycles, performed afterwards as a special waitstate DMA S-state, fixes it: 8088 MPH runs without crashing again when starting the first screen, and the cycle count reports a 'real' 8088.
The extra Tw-states (on XT only) are now performed using an extra Sw S-state handler (just like the SI and S0-S4 handlers). It is only triggered to tick after T4 with the purpose of keeping the bus busy for 3 more cycles in this case.
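A stripped-down sketch of how such an S-state sequence with the extra Sw state could look. This is not the actual UniPCemu handler code: the names (DmaState, Dma, dma_tick, dma_holds_bus) are invented, the transfer itself is left out, and it assumes Sw only follows an S4 that sees no new request.

```c
#include <stdbool.h>

/* Hypothetical S-state machine; 'state' is the state executed this tick. */
typedef enum { DMA_SI, DMA_S0, DMA_S1, DMA_S2, DMA_S3, DMA_S4, DMA_SW } DmaState;

typedef struct {
    DmaState state;
    int sw_left; /* remaining Sw ticks */
} Dma;

/* Execute one DMA clock and select the next state. */
void dma_tick(Dma *d, bool dreq)
{
    switch (d->state) {
    case DMA_SI: if (dreq) d->state = DMA_S0; break;
    case DMA_S0: d->state = DMA_S1; break;
    case DMA_S1: d->state = DMA_S2; break;
    case DMA_S2: d->state = DMA_S3; break;
    case DMA_S3: d->state = DMA_S4; break;
    case DMA_S4:
        if (dreq) {
            d->state = DMA_S1;   /* S4 doubles as SI-S0 for the next transfer */
        } else {
            d->state = DMA_SW;   /* XT only: keep the bus busy 3 more cycles */
            d->sw_left = 3;
        }
        break;
    case DMA_SW:
        if (--d->sw_left == 0) d->state = DMA_SI;
        break;
    }
}

/* Assumption: the CPU's BIU can't get the bus during S1-S4 and Sw. */
bool dma_holds_bus(const Dma *d)
{
    return d->state != DMA_SI && d->state != DMA_S0;
}
```

Starting from SI with DREQ raised, a single transfer runs SI, S0, S1, S2, S3, S4, Sw, Sw, Sw and then returns to SI.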
Edit: The racing-the-beam part (Kefrens) gives a black screen now? It doesn't crash though, just continues to the next part after some time.
So it might have become more 'accurate' according to the real 8088 check (falling within range somehow), but the Kefrens effect fails 100% now (black screen, which is weird. That has never happened in any UniPCemu test so far, from what I've seen).
I do see 8088 MPH performing DMA transfers back-to-back when starting up, though? That happens when switching to the old vs new CGA screen from MS-DOS, after a 'real' 8088 is found and some (relatively) long delay (a few seconds of emulated time) has finished, while switching to the hacked color mode. The MS-DOS text screen is visible and stretched at that point, with the DMA executing: SI-S4 (which has a merged SI-S0 in the S4 cycle), followed by S1-S4 (since DREQ is raised again), followed by 3 Sw cycles (the special Tw cycles, but induced by XT DMA only, preventing the BIU from obtaining the bus (achieving 'ready')).
Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io