VOGONS


UniPCemu 8088 cycle accuracy


Reply 21 of 122, by superfury

Rank l33t++
GloriousCow wrote on 2023-07-15, 14:43:

are you able to check if DMA was disabled as expected?

If the PIT register is written properly, DMA should be disabled while the test is running: the timer waits for input and holds its output permanently high or low until it is properly started by writing the counter.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 23 of 122, by superfury

GloriousCow wrote on 2023-07-15, 19:38:

Assume or verify... up to you!

Well, if the proper value is set in the PIT command register, it should work (it's fully implemented, with the PIT0 and PIT1 gates wired permanently high and the PIT2 gate driven by a bit of port 61h).
And yes, it should work if it follows the documented bits (all are implemented).


Reply 24 of 122, by superfury

GloriousCow wrote on 2023-07-15, 19:38:

Assume or verify... up to you!

Do you know whether the register is written only once in the program, before and after the tests?
Edit: Also just found out a slight BIU timing issue. When 16-bit or 32-bit transfers are performed (hardware not responding at said address), it will perform the 16-bit(to 8-bit) and 32-bit(to 16-bit) break up in one go and perform the actual hardware i/o access (on the emulated hardware side only, not the BIU) in the T3 cycle on the 808x.
I sure hope 8088MPH or area5150 doesn't require that (or use 16-bit port i/o)?

UniPCemu will perform the correct number of BIU cycles in that case, but only performs the hardware access (the hardware port I/O) on the first cycle. So for an OUT DX,AX where DX addresses the CGA CRTC register and value ports, the hardware will receive both bytes in the very same cycle (without wait states in between, of course).

Thinking about it, how are port accesses broken up by the BIU? If no 16-bit port responds to a 16-bit access, it's obvious to split it into byte accesses. But what about a 32-bit access on a 16-bit bus (or unaligned accesses)? Is it always split into 2x16-bit, 4x8-bit, or perhaps 1x8 + 1x16 (perhaps converted to 2x8) + 1x8 etc., depending on alignment?

Edit: Just modified it to actually split up the 16-bit and 32-bit memory accesses (added theoretical support for 64-bit split into 32-bit accesses as well, although usually not used but supported by BIU flags in theory). So an OUT DX,AL/AX will try a port as 16-bit (if the bus is capable), otherwise split into 2 (as in the 8088's case) accesses with two transfers (T1-T4 twice, once for each split i/o port). Previously that wasn't split at all (on the BIU only, not in hardware receiving/giving the request).
Just spent hours trying to find out why the BIOS wasn't booting anymore. Eventually found the cause: it wasn't returning the read value for port reads, so the CPU kept reading 0x00 from any I/O port it tried! Whoops! 😖

So for example writing CGA registers using a single OUT DX,AX will now perform two BIU T1-T4 cycles, with proper timing on the hardware (the second I/O port of the higher address received by the hardware during the second T3 of the request, the first I/O port during the first T3). And of course both ports will be properly waitstating before the write at said I/O port when split up now.
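
To make the split concrete, here is a minimal Python sketch (not UniPCemu's actual code). It assumes an aligned access whose width is a multiple of the bus width, and it places TW before T3 the way the log format in this thread does. The function names and the example values for DX and AX are made up for illustration.

```python
def split_port_write(port, value, width_bits, bus_bits=8):
    """Split one port write into bus-width pieces, lowest address first.

    Assumption: aligned access, width a multiple of the bus width.
    Unaligned splitting is deliberately not modelled here.
    """
    if width_bits % bus_bits:
        raise ValueError("width must be a multiple of the bus width")
    mask = (1 << bus_bits) - 1
    step = bus_bits // 8  # port address increment per piece
    pieces = []
    for i in range(width_bits // bus_bits):
        pieces.append((port + i * step, (value >> (i * bus_bits)) & mask))
    return pieces


def bus_cycles(pieces, wait_states=0):
    """Expand each piece into its own T1..T4 transfer.

    The hardware only sees the piece on that transfer's T3.
    """
    trace = []
    for port, data in pieces:
        trace.append(("T1", None))
        trace.append(("T2", None))
        trace.extend(("TW", None) for _ in range(wait_states))
        trace.append(("T3", (port, data)))
        trace.append(("T4", None))
    return trace


# OUT DX,AX with DX=0x3D4 and AX=0x500C on an 8-bit bus (example values).
pieces = split_port_write(0x3D4, 0x500C, 16)
trace = bus_cycles(pieces)
```

With these example values the low byte goes to port 0x3D4 on the first T3 and the high byte to port 0x3D5 on the second T3.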

Ah well, that's fully fixed now and working! 😁

Edit: And the latest timings from your program:

disk:  real:  comp:    disp:  comp:    disp2:  comp:
FF36   FF43   <(-13)   FF36   <(-13)   FF36    <(-13)
FE3C   FE59   <(-29)   FE3D   <(-28)   FE3D    <(-28)
FDA0   FDC5   <(-37)   FDA1   <(-36)   FDA3    <(-34)
FD26   FD58   <(-50)   FD27   <(-49)   FD28    <(-48)
FCF7   FD2A   <(-51)   FCF8   <(-50)   FCF8    <(-50)
FC31   FC6B   <(-58)   FC31   <(-58)   FC31    <(-58)
FB6D   FBB7   <(-74)   FB6E   <(-73)   FB6D    <(-74)
F93C   F9A9   <(-109)  F93D   <(-108)  F93D    <(-108)
CPU test complete. Elapsed timer ticks:
07F1   07CA   <(+27)   07F2   <(+28)   07F1    <(+27)

Edit: You were right about the DMA though. It was ticking in PIT mode 2, which should be stuck in state 1 in UniPCemu (which means: listening for reload, which never happens during the testing, thus stuck output high).
But UniPCemu wasn't doing that because of a missing C++ break statement, so execution fell through into state 1 of modes 3/7 (the same for mode 6), which toggled the output of PIT1 on reaching 0 (even) or -1/FFFF (odd).

Having fixed that, I don't see any DMA anymore during the test.
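
As a rough illustration of the intended behaviour, here is a toy Python model of a mode 2 (rate generator) channel. It is only a sketch and does not mirror UniPCemu's internal states: the class and method names are hypothetical, and "not armed" stands in for the "listening for reload" state described above. The reload value 18 (12h) is the value the BIOS writes to port 41h in the boot log later in this thread.

```python
class Mode2ChannelSketch:
    """Toy model of a PIT channel in mode 2. Hypothetical, for illustration."""

    def __init__(self):
        self.armed = False   # True once a count has been written
        self.reload = 0
        self.counter = 0
        self.output = 1      # output idles high while waiting for a count

    def write_control_mode2(self):
        # Writing the control word alone does not start counting.
        self.armed = False
        self.output = 1

    def write_count(self, value):
        self.reload = value if value else 0x10000
        self.counter = self.reload
        self.armed = True

    def tick(self):
        """Advance one PIT clock; return True on a rising output edge."""
        if not self.armed:
            return False     # waiting for reload: output stays high
        self.counter -= 1
        if self.counter == 1:
            self.output = 0  # low for one clock
            return False
        if self.counter == 0:
            self.counter = self.reload
            self.output = 1  # rising edge, the event that would request DMA
            return True
        return False


idle = Mode2ChannelSketch()
idle.write_control_mode2()
idle_edges = sum(idle.tick() for _ in range(100))

running = Mode2ChannelSketch()
running.write_control_mode2()
running.write_count(18)
running_edges = sum(running.tick() for _ in range(36))
```

In this sketch the channel that never receives a count produces no edges at all, while the one loaded with 18 produces one rising edge every 18 clocks.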

The results (disp3 and the comp column following it show the new results and the difference):

disk:  real:  comp:    disp:  comp:    disp2:  comp:    disp3:  comp:
FF36   FF43   <(-13)   FF36   <(-13)   FF36    <(-13)   FF46    >(+3)
FE3C   FE59   <(-29)   FE3D   <(-28)   FE3D    <(-28)   FE61    >(+8)
FDA0   FDC5   <(-37)   FDA1   <(-36)   FDA3    <(-34)   FDD3    >(+14)
FD26   FD58   <(-50)   FD27   <(-49)   FD28    <(-48)   FD63    >(+11)
FCF7   FD2A   <(-51)   FCF8   <(-50)   FCF8    <(-50)   FD37    >(+13)
FC31   FC6B   <(-58)   FC31   <(-58)   FC31    <(-58)   FC81    >(+22)
FB6D   FBB7   <(-74)   FB6E   <(-73)   FB6D    <(-74)   FBCD    >(+22)
F93C   F9A9   <(-109)  F93D   <(-108)  F93D    <(-108)  F9C9    >(+32)
CPU test complete. Elapsed timer ticks:
07F1   07CA   <(+27)   07F2   <(+28)   07F1    <(+27)   074D    <(-125)
Last edited by superfury on 2023-07-16, 03:10. Edited 1 time in total.


Reply 25 of 122, by superfury


Just a small question: what is the program dumping? PIT clock counts (compared to the reload value)? CPU cycles? Raw PIT clock counts read from the PIT registers (after latching)?

Edit: 8088MPH's snow at 16 colors/256 colors is now at the first clock of the scanline?
Kefrens is still showing the even/odd scanlines.
Although I did see something interesting now: the bottom of the screen sometimes shows entire green scanlines from all the way left to the right side of the screen? That probably shouldn't happen?

Edit: Made a recording just to be sure. And I was right: they're actually there (green scanlines instead of proper background!):

Attachment: UniPCemu_8088MPH_Kefrens_strangegreenscanlineeffectsatbottom.png (432.89 KiB)
File comment: 8088 MPH Kefrens green scanlines showing.
File license: Fair use/fair dealing exception

Edit: 8088 MPH credits run again! 😁
And sound great as expected.


Reply 26 of 122, by superfury


Hmmm... The queue status (QS0/1)...
Is it representative of what the EU is doing with the queue on a cycle? Or is it driven by the BIU only?

Edit: I've implemented it based on the EU side of reading the BIU for now:
- : idle (or waiting)
I : first byte fetched from the PIQ
S : second byte and onwards (unless it's fetched in a single cycle together with the first byte, which is reported as I instead)
E : the EU empties the PIQ

This is what I'm getting atm:

BIU cycles of the first few instructions executing in the BIOS
00:00:21:77.06138: BIU T1 -		
00:00:21:77.07244: BIU T2 -
00:00:21:77.07444: BIU T3 - Physical(p):000ffff0=ea(ê); Paged(p):000ffff0=ea(ê); Normal(p):00000000=ea(ê)
00:00:21:77.07616: BIU T4 -
00:00:21:77.07786: BIU T1 I
00:00:21:77.07952: BIU T2 -
00:00:21:77.08124: BIU T3 - Paged(p):000ffff1=5b([); Normal(p):00000001=5b([)
00:00:21:77.08292: BIU T4 -
00:00:21:77.08456: BIU T1 S
00:00:21:77.08622: BIU T2 -
00:00:21:77.08792: BIU T3 - Paged(p):000ffff2=e0(à); Normal(p):00000002=e0(à)
00:00:21:77.08958: BIU T4 -
00:00:21:77.09126: BIU T1 S
00:00:21:77.09292: BIU T2 -
00:00:21:77.09462: BIU T3 - Paged(p):000ffff3=00( ); Normal(p):00000003=00( )
00:00:21:77.09630: BIU T4 -
00:00:21:77.09798: BIU T1 S
00:00:21:78.00154: BIU T2 -
00:00:21:78.00272: BIU T3 - Paged(p):000ffff4=f0(ð); Normal(p):00000004=f0(ð)
00:00:21:78.00384: BIU T4 -
00:01:02:64.02888: BIU T1 S
00:01:02:64.03148: BIU T2 E ffff:0000 (EA5BE000F0)JMP F000:E05B
00:01:02:64.03432: BIU T3 - Physical(p):000fe05b=fa(ú); Paged(p):000fe05b=fa(ú); Normal(p):0000e05b=fa(ú)
00:01:02:64.03584: BIU T4 -
00:01:02:64.03720: BIU T1 I
00:01:02:64.03852: BIU T2 - f000:e05b (FA)CLI
00:01:02:64.03980: BIU T3 - Paged(p):000fe05c=fc(ü); Normal(p):0000e05c=fc(ü)
00:01:02:64.04096: BIU T4 -
00:01:02:64.04220: BIU T1 I
00:01:02:64.04340: BIU T2 - f000:e05c (FC)CLD
00:01:02:64.04460: BIU T3 - Paged(p):000fe05d=b0(°); Normal(p):0000e05d=b0(°)
00:01:02:64.04572: BIU T4 -
00:01:02:64.04684: BIU T1 I
00:01:02:64.04796: BIU T2 -
00:01:02:64.04908: BIU T3 - Paged(p):000fe05e=00( ); Normal(p):0000e05e=00( )
00:01:02:64.05024: BIU T4 -
00:01:02:64.05132: BIU T1 S
00:01:02:64.05256: BIU T2 - f000:e05d (B000)MOV AL,00
00:01:02:64.05372: BIU T3 - Paged(p):000fe05f=e6(æ); Normal(p):0000e05f=e6(æ)
00:01:02:64.05484: BIU T4 -
00:01:02:64.05596: BIU T1 I
00:01:02:64.05708: BIU T2 -
00:01:02:64.05824: BIU T3 - Physical(p):000fe060=a0( ); Paged(p):000fe060=a0( ); Normal(p):0000e060=a0( )
00:01:02:64.05940: BIU T4 -
00:01:02:64.06048: BIU T1 S
00:01:02:64.06172: BIU T2 - f000:e05f (E6A0)OUT A0,AL
00:01:02:64.06376: BIU T3 - Paged(p):000fe061=ba(º); Normal(p):0000e061=ba(º)
00:01:02:64.06496: BIU T4 -
00:01:02:64.06604: BIU T1 -
00:01:02:64.06720: BIU T2 -
00:01:02:64.06836: BIU T3 - Paged(p):000fe062=d8(Ø); Normal(p):0000e062=d8(Ø)
00:01:02:64.06944: BIU T4 -
00:01:02:64.07060: BIU T1 -
00:01:02:64.07168: BIU T2 -
00:01:02:64.07276: BIU TW -
00:01:02:64.07452: BIU TW -
00:01:02:64.07580: BIU T3 -
00:01:02:64.07692: BIU T4 -
00:01:02:64.07804: BIU T1 -
00:01:02:64.07916: BIU T2 I
00:01:02:64.08032: BIU T3 - Paged(p):000fe063=03(); Normal(p):0000e063=03()
00:01:02:64.08140: BIU T4 -
00:01:02:64.08256: BIU T1 S
00:01:02:64.08364: BIU T2 -
00:01:02:64.08492: BIU T3 - f000:e061 (BAD803)MOV DX,03D8 Paged(p):000fe064=ee(î); Normal(p):0000e064=ee(î)
00:01:02:64.08604: BIU T4 -
00:01:02:64.08712: BIU T1 I
00:01:02:64.08832: BIU T2 - f000:e064 (EE)OUT DX,AL
00:01:02:64.08948: BIU T3 - Paged(p):000fe065=b2(²); Normal(p):0000e065=b2(²)
00:01:02:64.09056: BIU T4 -
00:01:02:64.09172: BIU T1 -
00:01:02:64.09280: BIU T2 -
00:01:02:64.09396: BIU T3 - Paged(p):000fe066=b8(¸); Normal(p):0000e066=b8(¸)
00:01:02:64.09504: BIU T4 -
00:01:02:64.09620: BIU T1 -
00:01:02:64.09728: BIU T2 -
00:01:02:64.09836: BIU TW -
00:01:02:64.09952: BIU TW -
00:01:02:65.00060: BIU T3 -
00:01:02:65.00176: BIU T4 -
00:01:02:65.00284: BIU T1 -
00:01:02:65.00396: BIU T2 I
00:01:02:65.00508: BIU T3 - Paged(p):000fe067=fe(þ); Normal(p):0000e067=fe(þ)
00:01:02:65.00624: BIU T4 -
00:01:02:65.00732: BIU T1 S
00:01:02:65.00856: BIU T2 - f000:e065 (B2B8)MOV DL,B8
00:01:02:65.00972: BIU T3 - Paged(p):000fe068=c0(À); Normal(p):0000e068=c0(À)
00:01:02:65.01080: BIU T4 I
00:01:02:65.01188: BIU T1 -
00:01:02:65.01304: BIU T2 S
00:01:02:65.01420: BIU T3 - Paged(p):000fe069=ee(î); Normal(p):0000e069=ee(î)
00:01:02:65.01548: BIU T4 - f000:e067 (FEC0)INC AL
00:01:02:65.01740: BIU T1 -
00:01:02:65.01848: BIU T2 I
00:01:02:65.01964: BIU T3 - Paged(p):000fe06a=b0(°); Normal(p):0000e06a=b0(°)
00:01:02:65.02084: BIU T4 - f000:e069 (EE)OUT DX,AL
00:01:02:65.02200: BIU T1 -
00:01:02:65.02316: BIU T2 -
00:01:02:65.02428: BIU T3 - Paged(p):000fe06b=99(™); Normal(p):0000e06b=99(™)
00:01:02:65.02540: BIU T4 -
00:01:02:65.08740: BIU T1 -
00:01:02:65.08900: BIU T2 -
00:01:02:65.09016: BIU T3 - Paged(p):000fe06c=e6(æ); Normal(p):0000e06c=e6(æ)
00:01:02:65.09132: BIU T4 -
00:01:02:65.09240: BIU T1 -
00:01:02:65.09724: BIU T2 -
00:01:02:65.09836: BIU TW -
00:01:02:65.09948: BIU TW -
00:01:02:66.00064: BIU T3 -
00:01:02:66.00188: BIU T4 -
00:01:02:66.00296: BIU T1 -
00:01:02:66.00412: BIU T2 I
00:01:02:66.00540: BIU T3 - Paged(p):000fe06d=63(c); Normal(p):0000e06d=63(c)
00:01:02:66.00648: BIU T4 -
00:01:02:66.00764: BIU T1 S
00:01:02:66.00884: BIU T2 - f000:e06a (B099)MOV AL,99
00:01:02:66.01000: BIU T3 - Paged(p):000fe06e=b0(°); Normal(p):0000e06e=b0(°)
00:01:02:66.01108: BIU T4 I
00:01:02:66.01224: BIU T1 -
00:01:02:66.01332: BIU T2 S
00:01:02:66.01448: BIU T3 - Paged(p):000fe06f=a5(¥); Normal(p):0000e06f=a5(¥)
00:01:02:66.01568: BIU T4 - f000:e06c (E663)OUT 63,AL
00:01:02:66.01684: BIU T1 -
00:01:02:66.01792: BIU T2 -
00:01:02:66.01916: BIU T3 - Physical(p):000fe070=e6(æ); Paged(p):000fe070=e6(æ); Normal(p):0000e070=e6(æ)
00:01:02:66.02024: BIU T4 -
00:01:02:66.02132: BIU T1 -
00:01:02:66.02248: BIU T2 -
00:01:02:66.02356: BIU TW -
00:01:02:66.02464: BIU TW -
00:01:02:66.02580: BIU T3 -
00:01:02:66.02696: BIU T4 -
00:01:02:66.02804: BIU T1 -
00:01:02:66.02912: BIU T2 I
00:01:02:66.03028: BIU T3 - Paged(p):000fe071=61(a); Normal(p):0000e071=61(a)
00:01:02:66.03144: BIU T4 -
00:01:02:66.03252: BIU T1 S
00:01:02:66.03372: BIU T2 - f000:e06e (B0A5)MOV AL,A5
00:01:02:66.03488: BIU T3 - Paged(p):000fe072=b0(°); Normal(p):0000e072=b0(°)
00:01:02:66.03604: BIU T4 I
00:01:02:66.03712: BIU T1 -
00:01:02:66.03820: BIU T2 S
00:01:02:66.03936: BIU T3 - Paged(p):000fe073=54(T); Normal(p):0000e073=54(T)
00:01:02:66.04052: BIU T4 - f000:e070 (E661)OUT 61,AL
00:01:02:66.04168: BIU T1 -
00:01:02:66.04276: BIU T2 -
00:01:02:66.04396: BIU T3 - Paged(p):000fe074=e6(æ); Normal(p):0000e074=e6(æ)
00:01:02:66.04508: BIU T4 -
00:01:02:66.04620: BIU T1 -
00:01:02:66.04732: BIU T2 -
00:01:02:66.04840: BIU TW -
00:01:02:66.04948: BIU TW -
00:01:02:66.05064: BIU T3 -
00:01:02:66.05172: BIU T4 -
00:01:02:66.05288: BIU T1 -
00:01:02:66.05404: BIU T2 I
00:01:02:66.05524: BIU T3 - Paged(p):000fe075=43(C); Normal(p):0000e075=43(C)
00:01:02:66.05632: BIU T4 -
00:01:02:66.05740: BIU T1 S
00:01:02:66.05864: BIU T2 - f000:e072 (B054)MOV AL,54
00:01:02:66.05980: BIU T3 - Paged(p):000fe076=b0(°); Normal(p):0000e076=b0(°)
00:01:02:66.06088: BIU T4 I
00:01:02:66.06196: BIU T1 -
00:01:02:66.06312: BIU T2 S
00:01:02:66.06428: BIU T3 - Paged(p):000fe077=12(); Normal(p):0000e077=12()
00:01:02:66.06540: BIU T4 - f000:e074 (E643)OUT 43,AL
00:01:02:66.06656: BIU T1 -
00:01:02:66.06764: BIU T2 -
00:01:02:66.06880: BIU T3 - Paged(p):000fe078=e6(æ); Normal(p):0000e078=e6(æ)
00:01:02:66.06988: BIU T4 -
00:01:02:66.07100: BIU T1 -
00:01:02:66.07212: BIU T2 -
00:01:02:66.07324: BIU TW -
00:01:02:66.07432: BIU TW -
00:01:02:66.07548: BIU T3 -
00:01:02:66.07660: BIU T4 -
00:01:02:66.07772: BIU T1 -
00:01:02:66.07884: BIU T2 I
00:01:02:66.07996: BIU T3 - Paged(p):000fe079=41(A); Normal(p):0000e079=41(A)
00:01:02:66.08108: BIU T4 -
00:01:02:66.08220: BIU T1 S
00:01:02:66.08332: BIU T2 - f000:e076 (B012)MOV AL,12
00:01:02:66.08448: BIU T3 - Paged(p):000fe07a=b0(°); Normal(p):0000e07a=b0(°)
00:01:02:66.08564: BIU T4 I
00:01:02:66.08672: BIU T1 -
00:01:02:66.08788: BIU T2 S
00:01:02:66.08904: BIU T3 - Paged(p):000fe07b=40(@); Normal(p):0000e07b=40(@)
00:01:02:66.09020: BIU T4 - f000:e078 (E641)OUT 41,AL
00:01:02:66.09212: BIU T1 -
00:01:02:66.09320: BIU T2 -
00:01:02:66.09436: BIU T3 - Paged(p):000fe07c=e6(æ); Normal(p):0000e07c=e6(æ)
00:01:02:66.09608: BIU T4 -
00:01:02:66.09724: BIU T1 -
00:01:02:66.09832: BIU T2 -
00:01:02:66.09940: BIU TW -
00:01:02:67.00056: BIU TW -
00:01:02:67.00164: BIU T3 -
00:01:02:67.00280: BIU T4 -
00:01:02:67.00388: BIU T1 -
00:01:02:67.00504: BIU T2 I
00:01:02:67.00620: BIU T3 - Paged(p):000fe07d=43(C); Normal(p):0000e07d=43(C)
00:01:02:67.00732: BIU T4 -
00:01:02:67.00844: BIU T1 S
00:01:02:67.00964: BIU T2 - f000:e07a (B040)MOV AL,40
00:01:02:67.01080: BIU T3 - Paged(p):000fe07e=b0(°); Normal(p):0000e07e=b0(°)
00:01:02:67.01188: BIU T4 I
00:01:02:67.01296: BIU T1 -
00:01:02:67.01412: BIU T2 S
00:01:02:67.01528: BIU T3 - Paged(p):000fe07f=00( ); Normal(p):0000e07f=00( )
00:01:02:67.01648: BIU T4 - f000:e07c (E643)OUT 43,AL
00:01:02:67.01756: BIU T1 -
00:01:02:67.01872: BIU T2 -
00:01:02:67.01988: BIU T3 - Physical(p):000fe080=e6(æ); Paged(p):000fe080=e6(æ); Normal(p):0000e080=e6(æ)
00:01:02:67.02104: BIU T4 -
00:01:02:67.02212: BIU T1 -
00:01:02:67.02320: BIU T2 -
00:01:02:67.02436: BIU TW -
00:01:02:67.02552: BIU TW -
00:01:02:67.02660: BIU T3 -
00:01:02:67.02776: BIU T4 -
00:01:02:67.02884: BIU T1 -
00:01:02:67.03000: BIU T2 I
00:01:02:67.03116: BIU T3 - Paged(p):000fe081=81(); Normal(p):0000e081=81()
00:01:02:67.03224: BIU T4 -
00:01:02:67.03340: BIU T1 S
00:01:02:67.03452: BIU T2 - f000:e07e (B000)MOV AL,00
00:01:02:67.03576: BIU T3 - Paged(p):000fe082=e6(æ); Normal(p):0000e082=e6(æ)
00:01:02:67.03684: BIU T4 I
00:01:02:67.03800: BIU T1 -
00:01:02:67.03908: BIU T2 S
00:01:02:67.04024: BIU T3 - Paged(p):000fe083=82(‚); Normal(p):0000e083=82(‚)
00:01:02:67.04140: BIU T4 - f000:e080 (E681)OUT 81,AL
00:01:02:67.04248: BIU T1 -
00:01:02:67.04364: BIU T2 -
00:01:02:67.04476: BIU T3 - Paged(p):000fe084=e6(æ); Normal(p):0000e084=e6(æ)
00:01:02:67.04588: BIU T4 -
00:01:02:67.04700: BIU T1 -
00:01:02:67.04812: BIU T2 -
00:01:02:67.04920: BIU TW -
00:01:02:67.05036: BIU TW -
00:01:02:67.05144: BIU T3 -
00:01:02:67.05252: BIU T4 -
00:01:02:67.05368: BIU T1 -
00:01:02:67.05488: BIU T2 I
00:01:02:67.05612: BIU T3 - Paged(p):000fe085=83(ƒ); Normal(p):0000e085=83(ƒ)
00:01:02:67.05720: BIU T4 -
00:01:02:67.05828: BIU T1 S
00:01:02:67.05948: BIU T2 - f000:e082 (E682)OUT 82,AL
00:01:02:67.06064: BIU T3 - Paged(p):000fe086=e6(æ); Normal(p):0000e086=e6(æ)
00:01:02:67.06172: BIU T4 -
00:01:02:67.06284: BIU T1 -
00:01:02:67.06396: BIU T2 -
00:01:02:67.06512: BIU T3 - Paged(p):000fe087=0d( ); Normal(p):0000e087=0d( )
00:01:02:67.06620: BIU T4 -
00:01:02:67.06732: BIU T1 -
00:01:02:67.06844: BIU T2 -
00:01:02:67.06956: BIU TW -
00:01:02:67.07064: BIU TW -
00:01:02:67.07180: BIU T3 -
00:01:02:67.07288: BIU T4 -
00:01:02:67.07404: BIU -- -
00:01:02:67.07516: BIU T1 I
00:01:02:67.07628: BIU T2 -
00:01:02:67.07740: BIU T3 S Paged(p):000fe088=b0(°); Normal(p):0000e088=b0(°)
00:01:02:67.07856: BIU T4 -
00:01:02:67.07980: BIU T1 - f000:e084 (E683)OUT 83,AL
00:01:02:67.08088: BIU T2 -
00:01:02:67.08204: BIU T3 - Paged(p):000fe089=58(X); Normal(p):0000e089=58(X)
00:01:02:67.08312: BIU T4 -
00:01:02:67.08428: BIU -- -
00:01:02:67.08536: BIU -- -
00:01:02:67.08644: BIU -- -
00:01:02:67.08760: BIU T1 -
00:01:02:67.08868: BIU T2 -
00:01:02:67.08976: BIU TW -
00:01:02:67.09092: BIU TW -
00:01:02:67.09200: BIU T3 -
00:01:02:67.09308: BIU T4 -
00:01:02:67.09424: BIU -- -
00:01:02:67.09540: BIU T1 I
00:01:02:67.09648: BIU T2 -
00:01:02:67.09764: BIU T3 S Paged(p):000fe08a=e6(æ); Normal(p):0000e08a=e6(æ)
00:01:02:67.09872: BIU T4 -
00:01:02:67.09996: BIU T1 - f000:e086 (E60D)OUT 0D,AL
00:01:02:68.00104: BIU T2 -
00:01:02:68.00220: BIU T3 - Paged(p):000fe08b=0b(); Normal(p):0000e08b=0b()
00:01:02:68.00328: BIU T4 -
00:01:02:68.00448: BIU -- -
00:01:02:68.00564: BIU -- -
00:01:02:68.00672: BIU -- -
00:01:02:68.00780: BIU T1 -
00:01:02:68.00896: BIU T2 -
00:01:02:68.01004: BIU TW -
00:01:02:68.01120: BIU TW -
00:01:02:68.01228: BIU T3 -
00:01:02:68.01340: BIU T4 -
00:01:02:68.01452: BIU -- -
00:01:02:68.01568: BIU T1 I
00:01:02:68.01676: BIU T2 -
00:01:02:68.01792: BIU T3 S Paged(p):000fe08c=b0(°); Normal(p):0000e08c=b0(°)
00:01:02:68.01908: BIU T4 -
00:01:02:68.02024: BIU T1 - f000:e088 (B058)MOV AL,58
00:01:02:68.02132: BIU T2 -
00:01:02:68.02248: BIU T3 I Paged(p):000fe08d=41(A); Normal(p):0000e08d=41(A)
00:01:02:68.02364: BIU T4 -
00:01:02:68.03880: BIU T1 -
00:01:02:68.04000: BIU T2 S
00:01:02:68.04116: BIU T3 - Paged(p):000fe08e=e6(æ); Normal(p):0000e08e=e6(æ)
00:01:02:68.04236: BIU T4 - f000:e08a (E60B)OUT 0B,AL
00:01:02:68.04348: BIU T1 -
00:01:02:68.04500: BIU T2 -
00:01:02:68.04648: BIU T3 - Paged(p):000fe08f=0b(); Normal(p):0000e08f=0b()
00:01:02:68.04764: BIU T4 -
00:01:02:68.04876: BIU T1 -
00:01:02:68.04988: BIU T2 -
00:01:02:68.05096: BIU TW -
00:01:02:68.05212: BIU TW -
00:01:02:68.05320: BIU T3 -
00:01:02:68.05436: BIU T4 -
00:01:02:68.05548: BIU -- -
00:01:02:68.05660: BIU T1 I
00:01:02:68.05772: BIU T2 -
00:01:02:68.05896: BIU T3 S Physical(p):000fe090=b0(°); Paged(p):000fe090=b0(°); Normal(p):0000e090=b0(°)
00:01:02:68.06004: BIU T4 -
00:01:02:68.06124: BIU T1 - f000:e08c (B041)MOV AL,41
00:01:02:68.06236: BIU T2 -
00:01:02:68.06348: BIU T3 I Paged(p):000fe091=42(B); Normal(p):0000e091=42(B)
00:01:02:68.06464: BIU T4 -
00:01:02:68.06580: BIU T1 -
00:01:02:68.06688: BIU T2 S
00:01:02:68.06804: BIU T3 - Paged(p):000fe092=e6(æ); Normal(p):0000e092=e6(æ)
00:01:02:68.06920: BIU T4 - f000:e08e (E60B)OUT 0B,AL
00:01:02:68.07036: BIU T1 -
00:01:02:68.07144: BIU T2 -
00:01:02:68.07260: BIU T3 - Paged(p):000fe093=0b(); Normal(p):0000e093=0b()
00:01:02:68.07372: BIU T4 -
00:01:02:68.07488: BIU T1 -
00:01:02:68.07596: BIU T2 -
00:01:02:68.07708: BIU TW -
00:01:02:68.07820: BIU TW -
00:01:02:68.07932: BIU T3 -
00:01:02:68.08044: BIU T4 -
00:01:02:68.08156: BIU -- -
00:01:02:68.08268: BIU T1 I
00:01:02:68.08380: BIU T2 -
00:01:02:68.08500: BIU T3 S Paged(p):000fe094=b0(°); Normal(p):0000e094=b0(°)
00:01:02:68.08608: BIU T4 -
00:01:02:68.08724: BIU T1 - f000:e090 (B042)MOV AL,42
00:01:02:68.08840: BIU T2 -
00:01:02:68.08956: BIU T3 I Paged(p):000fe095=43(C); Normal(p):0000e095=43(C)
00:01:02:68.09064: BIU T4 -
00:01:02:68.09172: BIU T1 -
00:01:02:68.09288: BIU T2 S
00:01:02:68.09404: BIU T3 - Paged(p):000fe096=e6(æ); Normal(p):0000e096=e6(æ)
00:01:02:68.09524: BIU T4 - f000:e092 (E60B)OUT 0B,AL
00:01:02:68.09632: BIU T1 -
00:01:02:68.09740: BIU T2 -
00:01:02:68.09856: BIU T3 - Paged(p):000fe097=0b(); Normal(p):0000e097=0b()
00:01:02:68.09972: BIU T4 -
00:01:02:69.00080: BIU T1 -
00:01:02:69.00188: BIU T2 -
00:01:02:69.00304: BIU TW -
00:01:02:69.00412: BIU TW -
00:01:02:69.00540: BIU T3 -
00:01:02:69.00652: BIU T4 -
00:01:02:69.00760: BIU -- -
00:01:02:69.00876: BIU T1 I
00:01:02:69.00984: BIU T2 -
00:01:02:69.01100: BIU T3 S Paged(p):000fe098=b0(°); Normal(p):0000e098=b0(°)
00:01:02:69.01208: BIU T4 -
00:01:02:69.01328: BIU T1 - f000:e094 (B043)MOV AL,43
00:01:02:69.01436: BIU T2 -
00:01:02:69.01564: BIU T3 I Paged(p):000fe099=ff(ÿ); Normal(p):0000e099=ff(ÿ)
00:01:02:69.01676: BIU T4 -
00:01:02:69.01788: BIU T1 -
00:01:02:69.01900: BIU T2 S
00:01:02:69.02012: BIU T3 - Paged(p):000fe09a=e6(æ); Normal(p):0000e09a=e6(æ)
00:01:02:69.02128: BIU T4 - f000:e096 (E60B)OUT 0B,AL
00:01:02:69.02244: BIU T1 -
00:01:02:69.02352: BIU T2 -
00:01:02:69.02468: BIU T3 - Paged(p):000fe09b=01(); Normal(p):0000e09b=01()
00:01:02:69.02584: BIU T4 -
00:01:02:69.02692: BIU T1 -
00:01:02:69.02800: BIU T2 -
00:01:02:69.02916: BIU TW -
00:01:02:69.03024: BIU TW -
00:01:02:69.03132: BIU T3 -
00:01:02:69.03324: BIU T4 -
00:01:02:69.03440: BIU -- -
00:01:02:69.03580: BIU T1 I
00:01:02:69.03696: BIU T2 -
00:01:02:69.03812: BIU T3 S Paged(p):000fe09c=e6(æ); Normal(p):0000e09c=e6(æ)
00:01:02:69.03920: BIU T4 -
00:01:02:69.04044: BIU T1 - f000:e098 (B0FF)MOV AL,FF
00:01:02:69.04152: BIU T2 -
00:01:02:69.04272: BIU T3 I Paged(p):000fe09d=01(); Normal(p):0000e09d=01()
00:01:02:69.04380: BIU T4 -
00:01:02:69.04496: BIU T1 -
00:01:02:69.04604: BIU T2 S
00:01:02:69.04720: BIU T3 - Paged(p):000fe09e=40(@); Normal(p):0000e09e=40(@)
00:01:02:69.04836: BIU T4 - f000:e09a (E601)OUT 01,AL
00:01:02:69.04952: BIU T1 -
00:01:02:69.05060: BIU T2 -
00:01:02:69.05176: BIU T3 - Paged(p):000fe09f=e6(æ); Normal(p):0000e09f=e6(æ)
00:01:02:69.05284: BIU T4 -
00:01:02:69.05400: BIU T1 -
00:01:02:69.05516: BIU T2 -
00:01:02:69.05624: BIU TW -
00:01:02:69.05732: BIU TW -
00:01:02:69.05848: BIU T3 -
00:01:02:69.05956: BIU T4 -
00:01:02:69.06072: BIU -- -
00:01:02:69.06180: BIU T1 I
00:01:02:69.06296: BIU T2 -
00:01:02:69.06412: BIU T3 S Physical(p):000fe0a0=08(); Paged(p):000fe0a0=08(); Normal(p):0000e0a0=08()
00:01:02:69.06524: BIU T4 -
00:01:02:69.06640: BIU T1 - f000:e09c (E601)OUT 01,AL
00:01:02:69.06756: BIU T2 -
00:01:02:69.06864: BIU T3 - Paged(p):000fe0a1=e6(æ); Normal(p):0000e0a1=e6(æ)
00:01:02:69.06980: BIU T4 -
00:01:02:69.07088: BIU -- -
00:01:02:69.07196: BIU -- -
00:01:02:69.07312: BIU -- -
00:01:02:69.07420: BIU T1 -
00:01:02:69.07536: BIU T2 -
00:01:02:69.07644: BIU TW -
00:01:02:69.07756: BIU TW -
00:01:02:69.07868: BIU T3 -
00:01:02:69.07980: BIU T4 -
00:01:02:69.08092: BIU -- -
00:01:02:69.08204: BIU T1 I
00:01:02:69.08332: BIU T2 - f000:e09e (40)INC AX
00:01:02:69.08444: BIU T3 I Paged(p):000fe0a2=0a( ); Normal(p):0000e0a2=0a( )
00:01:02:69.08560: BIU T4 -
00:01:02:69.08668: BIU T1 -
00:01:02:69.08780: BIU T2 S
00:01:02:69.08900: BIU T3 - Paged(p):000fe0a3=b0(°); Normal(p):0000e0a3=b0(°)
00:01:02:69.09016: BIU T4 - f000:e09f (E608)OUT 08,AL
00:01:02:69.09132: BIU T1 -
00:01:02:69.09240: BIU T2 -
00:01:02:69.09356: BIU T3 - Paged(p):000fe0a4=36(6); Normal(p):0000e0a4=36(6)
00:01:02:69.09508: BIU T4 -
00:01:02:69.09628: BIU T1 -
00:01:02:69.09744: BIU T2 -
00:01:02:69.09852: BIU TW -
00:01:02:69.09968: BIU TW -
00:01:02:70.00076: BIU T3 -
00:01:02:70.00192: BIU T4 -
00:01:02:70.00300: BIU -- -
00:01:02:70.00416: BIU T1 I
00:01:02:70.00524: BIU T2 -
00:01:02:70.00640: BIU T3 S Paged(p):000fe0a5=e6(æ); Normal(p):0000e0a5=e6(æ)
00:01:02:70.00756: BIU T4 -
00:01:02:70.00872: BIU T1 - f000:e0a1 (E60A)OUT 0A,AL
00:01:02:70.00980: BIU T2 -
00:01:02:70.01096: BIU T3 - Paged(p):000fe0a6=43(C); Normal(p):0000e0a6=43(C)
00:01:02:70.01212: BIU T4 -
00:01:02:70.01320: BIU -- -
00:01:02:70.01428: BIU -- -
00:01:02:70.01544: BIU -- -
00:01:02:70.01652: BIU T1 -
00:01:02:70.01768: BIU T2 -
00:01:02:70.01876: BIU TW -
00:01:02:70.01984: BIU TW -
00:01:02:70.02100: BIU T3 -
00:01:02:70.02208: BIU T4 -
00:01:02:70.02316: DMA S0 -
00:01:02:70.02432: DMA S0 -
00:01:02:70.02548: DMA S1 -
00:01:02:70.02656: DMA S2 -
00:01:02:70.02764: DMA S3 -
00:01:02:70.02880: DMA S4 -
00:01:02:70.02988: BIU -- -
00:01:02:70.03104: BIU T1 I
00:01:02:70.03212: BIU T2 -
00:01:02:70.03328: BIU T3 S Paged(p):000fe0a7=b0(°); Normal(p):0000e0a7=b0(°)
00:01:02:70.03436: BIU T4 -
00:01:02:70.03560: BIU T1 - f000:e0a3 (B036)MOV AL,36
00:01:02:70.03668: BIU T2 -
00:01:02:70.03788: BIU T3 I Paged(p):000fe0a8=00( ); Normal(p):0000e0a8=00( )
00:01:02:70.03904: BIU T4 -
00:01:02:70.04012: BIU T1 -
00:01:02:70.04124: BIU T2 S
00:01:02:70.04236: BIU T3 - Paged(p):000fe0a9=e6(æ); Normal(p):0000e0a9=e6(æ)
00:01:02:70.04360: BIU T4 - f000:e0a5 (E643)OUT 43,AL
00:01:02:70.04468: BIU T1 -
00:01:02:70.04584: BIU T2 -
00:01:02:70.04700: BIU T3 - Paged(p):000fe0aa=40(@); Normal(p):0000e0aa=40(@)
00:01:02:70.04812: BIU T4 -
00:01:02:70.04924: BIU T1 -
00:01:02:70.05032: BIU T2 -
00:01:02:70.05148: BIU TW -
00:01:02:70.05256: BIU TW -
00:01:02:70.05372: BIU T3 -
00:01:02:70.05492: BIU T4 -
00:01:02:70.05608: DMA S0 -
00:01:02:70.05716: DMA S0 -
00:01:02:70.05832: DMA S1 -
00:01:02:70.05940: DMA S2 -
00:01:02:70.06048: DMA S3 -
00:01:02:70.06156: DMA S4 -
00:01:02:70.06272: BIU -- -
00:01:02:70.06380: BIU T1 I
00:01:02:70.06496: BIU T2 -
00:01:02:70.06612: BIU T3 S Paged(p):000fe0ab=e6(æ); Normal(p):0000e0ab=e6(æ)
00:01:02:70.08832: BIU T4 -
00:01:02:70.08960: BIU T1 - f000:e0a7 (B000)MOV AL,00
00:01:02:70.09068: BIU T2 -
00:01:02:70.09184: BIU T3 I Paged(p):000fe0ac=40(@); Normal(p):0000e0ac=40(@)
00:01:02:70.09300: BIU T4 -
00:01:02:70.09408: BIU T1 -
00:01:02:70.09524: BIU T2 S
00:01:02:70.09640: BIU T3 - Paged(p):000fe0ad=ba(º); Normal(p):0000e0ad=ba(º)
00:01:02:70.09756: BIU T4 - f000:e0a9 (E640)OUT 40,AL
00:01:02:70.09868: BIU T1 -
00:01:02:70.09980: BIU T2 -
00:01:02:71.00092: BIU T3 - Paged(p):000fe0ae=13(); Normal(p):0000e0ae=13()
00:01:02:71.00204: BIU T4 -
00:01:02:71.00316: BIU T1 -
00:01:02:71.00428: BIU T2 -
00:01:02:71.00540: BIU TW -
00:01:02:71.00652: BIU TW -
00:01:02:71.00764: BIU T3 -
00:01:02:71.00876: BIU T4 -
00:01:02:71.00984: BIU -- -
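
For anyone who wants to tally a log like this instead of reading it by eye, here is a small Python sketch. The regular expression is inferred only from the lines shown above (timestamp, unit, bus state, one queue-status character, optional remainder) and is an assumption, not a documented UniPCemu log format. The function names are made up.

```python
import re
from collections import Counter

# Assumed layout: "<timestamp>: <BIU|DMA> <state> <queue status> [rest]"
LINE_RE = re.compile(
    r"^(?P<time>\S+): (?P<unit>BIU|DMA) (?P<state>\S+) (?P<qs>\S)\s*(?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LINE_RE.match(line.rstrip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count bus states per unit and non-idle queue statuses."""
    states = Counter()
    queue = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip anything that isn't a cycle line
        states[(rec["unit"], rec["state"])] += 1
        if rec["qs"] != "-":
            queue[rec["qs"]] += 1
    return states, queue


sample = [
    "00:01:02:64.02888: BIU T1 S",
    "00:01:02:64.03148: BIU T2 E ffff:0000 (EA5BE000F0)JMP F000:E05B",
    "00:01:02:64.03584: BIU T4 -",
    "00:01:02:64.03720: BIU T1 I",
    "00:01:02:70.02316: DMA S0 -",
]
states, queue = summarize(sample)
```

Counting the ("BIU", "TW") and ("DMA", ...) entries over a run is a quick way to see whether wait states or DMA cycles show up where you don't expect them.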


Reply 27 of 122, by GloriousCow

Rank Member
superfury wrote on 2023-07-16, 03:01:

Just a small question: what is the program dumping? Pit clock counts(compared to the reload value)? CPU cycles? Raw pit clock counts read from the PIT registers (after latching)?

It sends the latch command and reads timer 0. The timer started at FFFF so you can calculate the elapsed ticks at each checkpoint. ticks*4 = cpu cycles
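
As a worked example of that arithmetic, here is a short Python sketch. The helper names are made up; the start value FFFF and the factor of 4 come from the explanation above, and the sample readings are taken from the first row of the results posted earlier in this thread.

```python
PIT_START = 0xFFFF       # timer 0 start value, as stated above
CYCLES_PER_TICK = 4      # ticks * 4 = CPU cycles


def elapsed_ticks(latched, start=PIT_START):
    """Ticks elapsed since the timer started (the PIT counts down)."""
    return (start - latched) & 0xFFFF


def elapsed_cycles(latched, start=PIT_START):
    """CPU cycles elapsed at this checkpoint."""
    return elapsed_ticks(latched, start) * CYCLES_PER_TICK


def delta(reading, reference):
    """Difference between two latched readings, in ticks.

    Assumes the counter has not wrapped between the two readings.
    Because the PIT counts down, a positive delta means fewer ticks elapsed.
    """
    return reading - reference


ticks = elapsed_ticks(0xFF43)
cycles = elapsed_cycles(0xFF43)
```

For the first checkpoint, a latched value of FF43 gives 188 elapsed ticks, or 752 CPU cycles.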

superfury wrote on 2023-07-16, 11:23:

Hmmm... The queue status (QS0/1)...
Is it representative of what the EU is doing with the queue on a cycle? Or is it driven by the BIU only?

The queue status lines reflect a queue read operation occurred on the PREVIOUS cycle. Also good to note, a single instruction can have multiple "First byte" fetches - prefixes are always considered "first byte" fetches, regardless of how many there are. So an instruction with a prefix, opcode, modrm and 16 bit displacement would have queue statuses of F, F, S, S, S.
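
A minimal Python sketch of that rule, for illustration only. The function names and parameters are hypothetical; the only behaviour encoded is what is stated above: every prefix and the opcode count as a first byte, everything after is a subsequent byte, and the pins show a queue read one cycle late.

```python
def queue_statuses(prefixes=0, has_modrm=False, disp_bytes=0, imm_bytes=0):
    """Queue status letters for the bytes of one instruction, in fetch order."""
    first = ["F"] * (prefixes + 1)  # each prefix plus the opcode itself
    rest = ["S"] * (int(has_modrm) + disp_bytes + imm_bytes)
    return first + rest


def pins_from_reads(reads):
    """Shift per-cycle queue reads by one cycle, as seen on QS0/QS1.

    `reads` holds one entry per cycle, with "-" meaning no queue read.
    """
    return ["-"] + list(reads)


# Prefix + opcode + modrm + 16-bit displacement, as in the example above.
example = queue_statuses(prefixes=1, has_modrm=True, disp_bytes=2)
```

For the example instruction this gives F, F, S, S, S.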

I confess there are some parts of the CPU where I am not sure whether they belong to the BIU or the EU. It can be the EU sometimes, as when a fetch is part of a microcode program, but is the decode phase the BIU or the EU? Or something else? Not sure.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 28 of 122, by superfury

GloriousCow wrote on 2023-07-16, 14:09:
superfury wrote on 2023-07-16, 03:01:

Just a small question: what is the program dumping? Pit clock counts(compared to the reload value)? CPU cycles? Raw pit clock counts read from the PIT registers (after latching)?

It sends the latch command and reads timer 0. The timer started at FFFF so you can calculate the elapsed ticks at each checkpoint. ticks*4 = cpu cycles

superfury wrote on 2023-07-16, 11:23:

Hmmm... The queue status (QS0/1)...
Is it representative of what the EU is doing with the queue on a cycle? Or is it driven by the BIU only?

The queue status lines reflect a queue read operation occurred on the PREVIOUS cycle. Also good to note, a single instruction can have multiple "First byte" fetches - prefixes are always considered "first byte" fetches, regardless of how many there are. So an instruction with a prefix, opcode, modrm and 16 bit displacement would have queue statuses of F, F, S, S, S.

I confess there are some parts of the CPU where I am not sure whether they belong to the BIU or the EU. It can be the EU sometimes, as when a fetch is part of a microcode program, but is the decode phase the BIU or the EU? Or something else? Not sure.

Well, in UniPCemu, a prefetch is always done by the BIU and a fetch is always done by the EU. Decoding is also done on the EU (it has nothing to do with the BIU, after all).
Currently every parameter fetch (8, 16 or 32-bit) executes in 1 cycle (so a 16-bit fetch from the prefetch queue takes 1 cycle, not 2; it isn't broken up into 8-bit fetches for timing purposes).
So, for example, MOV BX,4 would be 2 ticks (excluding prefetching): 1 tick for the MOV BX opcode (after 1 read on the BIU) and 1 tick for the parameter (after 2 reads on the BIU).
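
Put as a tiny Python sketch (hypothetical helper, not the emulator's code), the accounting described above looks like this: one EU tick per fetch regardless of its width, and one BIU read per byte on the 8088's 8-bit bus.

```python
def eu_fetch_cost(fetch_sizes, bus_bytes=1):
    """Return (EU ticks, BIU reads) for a list of fetch sizes in bytes.

    Models only the accounting described above: each EU fetch costs 1 tick,
    while the BIU needs one read per bus-width chunk.
    """
    eu_ticks = len(fetch_sizes)
    biu_reads = sum(-(-size // bus_bytes) for size in fetch_sizes)  # ceiling division
    return eu_ticks, biu_reads


# MOV BX,4: a 1-byte opcode followed by a 16-bit immediate.
mov_bx_imm16 = eu_fetch_cost([1, 2])
```

For MOV BX,4 this gives 2 EU ticks and 3 BIU reads.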

UniPCemu currently reports the -/I/E/S state on the cycle it occurs(which is before the BIU ticks 1 cycle).
So the cycle after the final S state will be when the instruction handling actually starts (when the EU starts 'decoding' etc.). Then the instruction is logged once 'decoding' ends, which is usually the cycle after S, unless some extra timing is applied. Those extra cycles are EA calculations, some string extra timings(MOVS,LODS,CMPS,SCAS,STOS), and REP starting/finishing before the instruction.

Edit: Just found a bug in the EU's PIQ fetching. It was counting prefetch cycles into the BIU. So the BIU would tick until those cycles were spent (these are the total of prefetch and EA cycles). But actually, the BIU is supposed to ignore that and simply continue to become ready during that.
UniPCemu's CPU reset still ticks those though (which usually isn't documented at all).

Although those EA and prefetch cycles won't make the BIU non-ready anymore, they still affect the EU, of course (just tick the BIU a few more times before resuming).
Edit: And after fixing some bookkeeping in the EU's prefetching (it was counting prefetch cycles with an extra cycle on top of its 1 cycle, which it shouldn't), the timing now seems proper (with regard to the I/S prefetch states).
That should shave off at least 1 cycle for each prefetch (up to 3 if prefetching dwords in one go).
Edit: 8088 MPH reports 1564 cycles now.

Latest 8088tst3 results:

disk:  real:  comp:    disp:  comp:    disp2:  comp:    disp3:  comp:    disp4:  comp:
FF36   FF43   <(-13)   FF36   <(-13)   FF36    <(-13)   FF46    >(+3)    FF48    >(+5)
FE3C   FE59   <(-29)   FE3D   <(-28)   FE3D    <(-28)   FE61    >(+8)    FE65    >(+12)
FDA0   FDC5   <(-37)   FDA1   <(-36)   FDA3    <(-34)   FDD3    >(+14)   FDD7    >(+18)
FD26   FD58   <(-50)   FD27   <(-49)   FD28    <(-48)   FD63    >(+11)   FD67    >(+15)
FCF7   FD2A   <(-51)   FCF8   <(-50)   FCF8    <(-50)   FD37    >(+13)   FD3C    >(+18)
FC31   FC6B   <(-58)   FC31   <(-58)   FC31    <(-58)   FC81    >(+22)   FC87    >(+28)
FB6D   FBB7   <(-74)   FB6E   <(-73)   FB6D    <(-74)   FBCD    >(+22)   FBD2    >(+27)
F93C   F9A9   <(-109)  F93D   <(-108)  F93D    <(-108)  F9C9    >(+32)   F93E    >(+32)
CPU test complete. Elapsed timer ticks:
07F1   07CA   <(+27)   07F2   <(+28)   07F1    <(+27)   074D    <(-125)  0735    <(-149)


Reply 29 of 122, by GloriousCow

superfury wrote on 2023-07-16, 14:21:

So the cycle after the final S state will be when the instruction handling actually starts (when the EU starts 'decoding' etc.). Then the instruction is logged once 'decoding' ends, which is usually the cycle after S, unless some extra timing is applied. Those extra cycles are EA calculations, some string extra timings(MOVS,LODS,CMPS,SCAS,STOS), and REP starting/finishing before the instruction.

What I've been trying to impart this whole time is that some of those S's might be fetches being done by the instruction execution itself. Instruction execution starts after the "First byte" representing the opcode when there is no modrm, or after the first "Subsequent byte" when a modrm is present. The instruction is then running: if there was a modrm, the microcode calculates the EA, fetches any displacement (one or two more 'S'), reads the EA operand if necessary, then jumps to the appropriate microcode for the opcode in question, and that opcode-program might fetch immediate operands (more 'S's).

Decode doesn't read your displacement or your immediate operand.
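
As a rough sketch of that ordering (the function, its parameters and the step labels are hypothetical and only restate the description above; this is not code from MartyPC or UniPCemu):

```python
def instruction_steps(has_modrm, is_memory_operand, disp_bytes, loads_operand, imm_bytes):
    """List (phase, step) pairs in the order described above."""
    # Decode only pulls the opcode and, if present, the modrm byte.
    steps = [("decode", "fetch opcode (F)")]
    if has_modrm:
        steps.append(("decode", "fetch modrm (S)"))
        if is_memory_operand:
            # From here on, everything is done by microcode.
            steps.append(("microcode", "calculate EA"))
            steps += [("microcode", "fetch displacement byte (S)")] * disp_bytes
            if loads_operand:
                steps.append(("microcode", "read operand at EA"))
    # Immediates are fetched last, by the opcode's own microcode program.
    steps += [("microcode", "fetch immediate byte (S)")] * imm_bytes
    return steps


# TEST word [bx], imm16: modrm, memory operand, no displacement, operand load, 2 immediate bytes
for phase, step in instruction_steps(True, True, 0, True, 2):
    print(phase, "-", step)
```

The point of the sketch is only the order: the operand read at the EA comes before any immediate byte is taken from the queue.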

Here's another great example: F6/F7, TEST rm, imm.

With a memory operand, does UniPCemu fetch the immediate before or after reading the EA operand?

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 30 of 122, by superfury

GloriousCow wrote on 2023-07-16, 16:15:
superfury wrote on 2023-07-16, 14:21:

So the cycle after the final S state will be when the instruction handling actually starts (when the EU starts 'decoding' etc.). Then the instruction is logged once 'decoding' ends, which is usually the cycle after S, unless some extra timing is applied. Those extra cycles are EA calculations, some string extra timings(MOVS,LODS,CMPS,SCAS,STOS), and REP starting/finishing before the instruction.

What I've been trying to impart this whole time is that some of those S's might be fetches being done by the instruction execution itself; instruction execution starts after "First byte" representing the opcode with no modrm, or after the first "Subsequent byte" assuming a modrm was present. The instruction is then running - if there was a modrm, the microcode calculates the EA, fetches any displacement (one or two more 'S'), reads the EA operand if necessary, then jumps to the appropriate microcode for the opcode in question, and that opcode-program might fetch immediate operands (more 'S's')

Decode doesn't read your displacement or your immediate operand.

Here's another great example, F6/F7, TEST rm, imm.

If a memory operand, does UniPCemu fetch the immediate before or after reading the EA operand?

It does it in the following order:
First read the TEST opcode (F6), then the rm parameters, then the imm, then perform EA timing, and then the first cycle of the instruction handler starts. Every byte/word/dword parameter so far is a 1-cycle fetch from the prefetch buffer, unless it's not buffered yet (then it's kept pending for 1 cycle until buffered). That's as per the 8088/8086 documentation afaik (every byte/word/dword parameter fetched in 1 cycle from the PIQ). The byte/word/dword buffering is modified to be byte buffering on the 808x CPUs.
Edit: Just checked the word timings a bit for prefetching. Apparently a word isn't fetched from the PIQ in 1 cycle but in 2 instead. Just adjusted the EU to perform it as 8-bit fetches instead (2 cycles).


Reply 31 of 122, by GloriousCow

superfury wrote on 2023-07-16, 16:41:

It does it in the following order:
First read TEST opcode (F6), then rm parameters, then imm(every byte/word/dword parameter so far are in 1-cycle fetches from the prefetch buffer, unless not buffered yet(kept pending for 1 cycle until buffered), thus as per the 8088/8086 documentation afaik (every byte/word/dword parameter fetched in 1 cycle from the PIQ)), then perform EA timing, then the first cycle of the instruction handler starts. The byte/word/dword buffering is modified to be byte buffering on the 808x CPUs.
Edit: Just checked the word timings a bit for prefetching. Apparently it's not fetched from the PIQ in 1 cycle but 2 instead. Just adjusted the EU to perform it as 8-bit fetches instead (2 cycles).

And that's not correct.

Since the TEST instruction is responsible for fetching its own immediate within its specific microcode, what happens is that the EA operand is loaded *first*.

This is a hardware cycle trace of executing TEST rm, imm:

017   [F0104] CS M:R.. I:... Q:.. CODE T2 <-r 07 | F0 [        ] <-q F7 TEST word [bx], 0BEEFh
018 [F0104] CS M:R.. I:... Q:.. PASV T3 <-r 07 | 0 [ ]
019 [F0104] CS M:... I:... Q:.. PASV T4 | 0 [ ]
020 A:[F0105] M:... I:... Q:.. CODE T1 | 1 [07 ]
021 [F0105] CS M:R.. I:... Q:.. CODE T2 <-r EF | S0 [ ] <-q 07 ; modrm fetched
022 [F0105] CS M:R.. I:... Q:.. PASV T3 <-r EF | 0 [ ]
023 [F0105] CS M:... I:... Q:.. PASV T4 | 0 [ ]
024 A:[F0106] M:... I:... Q:.. CODE T1 | 1 [EF ]
025 [F0106] CS M:R.. I:... Q:.. CODE T2 <-r BE | 1 [EF ]
026 [F0106] CS M:R.. I:... Q:.. PASV T3 <-r BE | 1 [EF ]
027 [F0106] CS M:... I:... Q:.. PASV T4 | 1 [EF ]
028 A:[1EA44] M:... I:... Q:.. MEMR T1 | 2 [EFBE ] ; EA operand loaded here
029 [1EA44] DS M:R.. I:... Q:.. MEMR T2 <-r 00 | 2 [EFBE ]
030 [1EA44] DS M:R.. I:... Q:.. PASV T3 <-r 00 | 2 [EFBE ]
031 [1EA44] DS M:... I:... Q:.. PASV T4 | 2 [EFBE ]
032 A:[1EA45] M:... I:... Q:.. MEMR T1 | 2 [EFBE ]
033 [1EA45] DS M:R.. I:... Q:.. MEMR T2 <-r 00 | 2 [EFBE ]
034 [1EA45] DS M:R.. I:... Q:.. PASV T3 <-r 00 | 2 [EFBE ]
035 [1EA45] DS M:... I:... Q:.. PASV T4 | 2 [EFBE ]
036 A:[F0107] M:... I:... Q:.. CODE T1 | 2 [EFBE ]
037 [F0107] CS M:R.. I:... Q:.. CODE T2 <-r 90 | 2 [EFBE ]
038 [F0107] CS M:R.. I:... Q:.. PASV T3 <-r 90 | S1 [BE ] <-q EF ; immediate fetched here!
039 [F0107] CS M:... I:... Q:.. PASV T4 | S0 [ ] <-q BE
040 A:[F0108] M:... I:... Q:.. CODE T1 | 1 [90 ]
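
To make the ordering explicit, here is a small sketch that pulls the queue reads and the memory-read bus cycles out of a few lines copied from the trace above. The parsing rules (the `<-q` marker and `MEMR T1`) are just what this particular log format appears to use, so treat them as assumptions:

```python
import re

# A few lines copied from the trace above (trailing disassembly/comments omitted).
TRACE = """\
017 [F0104] CS M:R.. I:... Q:.. CODE T2 <-r 07 | F0 [ ] <-q F7
021 [F0105] CS M:R.. I:... Q:.. CODE T2 <-r EF | S0 [ ] <-q 07
028 A:[1EA44] M:... I:... Q:.. MEMR T1 | 2 [EFBE ]
032 A:[1EA45] M:... I:... Q:.. MEMR T1 | 2 [EFBE ]
038 [F0107] CS M:R.. I:... Q:.. PASV T3 <-r 90 | S1 [BE ] <-q EF
039 [F0107] CS M:... I:... Q:.. PASV T4 | S0 [ ] <-q BE
"""


def events(trace):
    """Yield (cycle, kind, detail) for memory-read starts and queue reads."""
    for line in trace.splitlines():
        cycle = int(line.split()[0])
        if "MEMR T1" in line:
            # T1 of a memory read: the address is latched on this cycle.
            addr = re.search(r"A:\[([0-9A-F]+)\]", line).group(1)
            yield cycle, "mem_read_start", addr
        queue_read = re.search(r"<-q ([0-9A-F]{2})", line)
        if queue_read:
            yield cycle, "queue_read", queue_read.group(1)


for event in events(TRACE):
    print(event)
```

Listed this way, the two memory reads of the EA operand (cycles 28 and 32) sit between the modrm queue read (cycle 21) and the two immediate queue reads (cycles 38 and 39).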


Reply 32 of 122, by superfury

GloriousCow wrote on 2023-07-16, 17:17:
superfury wrote on 2023-07-16, 16:41:

It does it in the following order:
First read TEST opcode (F6), then rm parameters, then imm(every byte/word/dword parameter so far are in 1-cycle fetches from the prefetch buffer, unless not buffered yet(kept pending for 1 cycle until buffered), thus as per the 8088/8086 documentation afaik (every byte/word/dword parameter fetched in 1 cycle from the PIQ)), then perform EA timing, then the first cycle of the instruction handler starts. The byte/word/dword buffering is modified to be byte buffering on the 808x CPUs.
Edit: Just checked the word timings a bit for prefetching. Apparently it's not fetched from the PIQ in 1 cycle but 2 instead. Just adjusted the EU to perform it as 8-bit fetches instead (2 cycles).

And that's not correct.

Since the TEST instruction is responsible for fetching its own immediate within its specific microcode, what happens is that the EA operand is loaded *first*.


Isn't that essentially the same? First the TEST opcode is fetched and read in 1 cycle (waiting 1 cycle whenever the PIQ is empty), then the same for the modr/m byte and the remaining modr/m parameters, and finally the immediate.
It's still the same result?

That's exactly what UniPCemu does, only in the prefetching part of operations instead of the EU handler (since it's the same for all instructions anyway).
Btw, with rm parameters I meant the modrm byte (1 cycle) followed by the EA word (if any). Each is done in 1 cycle on the 8088 (with the words of course broken up into separate cycles now).


Reply 33 of 122, by GloriousCow

superfury wrote on 2023-07-16, 17:24:

Isn't that essentially the same? First the TEST opcode is fetched and read (every empty PIQ waiting 1 cycle) in 1 cycle, then the same for the modr/m byte and the remaining modr/m parameters and finally the immediate.
It's still the same result?

No, it is not the same. We're talking about cycle accuracy, and this is not cycle-accurate. Your reads and fetches will happen at the wrong time, and you will have an instruction that executes in a different number of cycles than on a real 8088.

If you don't care, that's fine, it's your decision how accurate you really want to be and how much time you want to spend getting there. But it's a bit silly to sit and wonder why the 8088MPH CPU test or Area 5150 end credits don't work if that is not your goal.


Reply 34 of 122, by superfury

GloriousCow wrote on 2023-07-16, 17:36:
superfury wrote on 2023-07-16, 17:24:

Isn't that essentially the same? First the TEST opcode is fetched and read (every empty PIQ waiting 1 cycle) in 1 cycle, then the same for the modr/m byte and the remaining modr/m parameters and finally the immediate.
It's still the same result?

No it is not the same. We're talking about cycle-accuracy. It's not cycle accurate. Your reads and fetches will happen at the wrong time, you will have an instruction that executes in a different amount of cycles than a real 8088.

If you don't care, that's fine, it's your decision how accurate you really want to be and how much time you want to spend getting there. But it's a bit silly to sit and wonder why the 8088MPH CPU test or Area 5150 end credits don't work if that is not your goal.

What is the difference then, wrt that TEST opcode? Doesn't each prefetch just stretch each instruction byte fetch to 4 cycles (T1-T4 each), with the fetch then succeeding (1 cycle) and proceeding on to the next byte or the instruction-specific part? Because that's what UniPCemu is doing in that case. It's still the same order of fetches and PIQ waits. The main difference is the start timing of the first opcode fetch, which adds T4-Tn (as in MINUS) cycles before the first opcode byte is fetched. And ofc the reporting of the specific S cycles, which needs to be done through the BIU anyway.


Reply 36 of 122, by superfury

GloriousCow wrote on 2023-07-16, 21:13:

you stated you fetch the rm and then the imm right after, correct?

Yes, each in 1 cycle of its own, delayed by (-)T2-T4 ticking in between (to fill it with 1 byte each time) if the PIQ is empty.
Atm 286+ will fuse a 16-bit (or, on 386+, 32-bit) parameter fetch into 1 cycle, but the 8086 uses 8-bit fetches only (each taking 1 cycle plus any waiting for the BIU to fill the queue, which is essentially an EU stall).
In fact, an 8-bit vs 16-bit vs 32-bit fetch in 1 cycle is just an 8-bit blocking fetch (think of blocking in OS process threads, but for the EU instead) that is triggered at different moments, preventing more fetching and stalling the EU for that 1 cycle until the next cycle is ticked on the BIU.
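
A rough sketch of that rule, for illustration only. The names are made up and this is not UniPCemu's actual code; it models only the fetch width described above, with the queue assumed to be already filled (no stall cycles):

```python
# Widest queue read that is fused into a single EU cycle, per CPU family,
# as described in this post (an emulator model, not measured hardware).
FUSED_BITS = {"808x": 8, "286": 16, "386": 32}


def eu_fetch_cycles(param_bits, cpu):
    """EU cycles spent reading one parameter out of an already-filled PIQ."""
    if param_bits not in (8, 16, 32):
        raise ValueError("parameter must be 8, 16 or 32 bits wide")
    # Anything wider than the fused width is split into several reads;
    # a parameter narrower than the fused width still costs one cycle.
    return max(1, param_bits // FUSED_BITS[cpu])


print(eu_fetch_cycles(16, "808x"))  # word on 808x: two byte reads
print(eu_fetch_cycles(16, "286"))   # word on 286+: one fused read
```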

Last edited by superfury on 2023-07-16, 21:21. Edited 1 time in total.


Reply 38 of 122, by superfury

GloriousCow wrote on 2023-07-16, 21:17:

Did you look at the cycle trace I posted? Do you see that is not what the CPU does?

OK. So if I read it correctly, the fetch from the BIU goes up to the immediate, then execution starts, which reads it (documented instruction), and then it reads the immediate from the PIQ instead? Is that what is happening?

Edit: I think I've simply misunderstood what you meant by reading the 'EA'. I thought you meant the address that follows the modr/m byte (BEEF, as in "DS:BEEF"), but you meant the value at the address that TEST compares the immediate against, didn't you? I usually don't refer to the modr/m destination in memory as the 'EA' being read; instead I call its address the EA and its memory location (in physical memory) just src (source) or dst (destination), depending on the disassembly.
Actually, what I usually call EA is really the 'displacement'. And if I say modr/m byte I usually mean the first byte after the opcode(s), while modr/m usually means both the modr/m byte and the displacement being fetched one after the other (as separate reads by the main instruction decoder).

Is fetching the immediate inside the instruction itself common behaviour, or more of an exception? It complicates disassembly though.


Reply 39 of 122, by GloriousCow

superfury wrote on 2023-07-16, 21:31:

OK. So if I read it correctly, the fetch from BIU goes until the immediate, then execution starts, which reads it(documented instruction), then it reads the immediate from PIQ instead? Is that what is happening?

Edit: I think I've simply misunderstood what you meant with reading the 'EA'. I thought you meant the address that follows the modr/m byte(BEEF, as in "DS:BEEF"), but you meant the address value TEST is comparing the immediate against, don't you? I usually don't count the modr/m destination in memory as 'EA' being read, but instead call it's address EA and it's memory location (in physical memory) just src(source) or dst(destination), depending on disassembly.

Whenever you have a modrm with a memory operand (i.e., the modrm encodes one of the 24 memory addressing modes), the EA / effective address is the final result of calculating base+index+displacement. The 8088 lacks dedicated circuitry for calculating the EA, so it's all done in microcode that begins the cycle after the modrm is fetched. This same microcode will load the byte or word at the EA if necessary; this is done via the EALOAD procedure. Only after that does the opcode's microcode program execute, which will fetch any immediate.
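
For reference, a minimal sketch of the value being calculated, using the standard 16-bit modrm addressing table. It only computes the result; it does not model the microcode or its timing. The register values and the DS value below are made up, chosen so that the result matches the address seen in the trace:

```python
# rm field -> registers summed for the 16-bit memory addressing modes
RM_TERMS = {
    0: ("bx", "si"), 1: ("bx", "di"), 2: ("bp", "si"), 3: ("bp", "di"),
    4: ("si",), 5: ("di",), 6: ("bp",), 7: ("bx",),
}


def effective_address(modrm, regs, disp=0):
    """Return the 16-bit offset selected by a modrm byte with a memory operand."""
    mod, rm = (modrm >> 6) & 3, modrm & 7
    if mod == 3:
        raise ValueError("mod=3 selects a register operand, there is no EA")
    if mod == 0 and rm == 6:
        return disp & 0xFFFF              # direct address: disp16 only
    if mod == 0:
        disp = 0                          # no displacement bytes follow
    elif mod == 1:
        disp = (disp & 0x7F) - (disp & 0x80)  # sign-extend disp8
    total = sum(regs[name] for name in RM_TERMS[rm])
    return (total + disp) & 0xFFFF        # offsets wrap at 64 KiB


def physical_address(segment, offset):
    """20-bit physical address from a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


# modrm 07h is mod=0, rm=7, i.e. [BX]. Hypothetical register values:
regs = {"bx": 0x0A44, "bp": 0, "si": 0, "di": 0}
ea = effective_address(0x07, regs)
print(hex(ea), hex(physical_address(0x1E00, ea)))
# prints 0xa44 0x1ea44
```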

[Attachment: ea_timings.png - EA calculation timings]

This explains Intel's published cycle timings for EA calculations: it's simply the number of microcode cycles it typically takes in each case to calculate the EA. Even then, these figures aren't set in stone and can vary in practice.

The main point is: the EA load happens before the immediate fetch.

Let's annotate the cycle listing. Cycles 17-21 are the CPU's decode phase, fetching the opcode byte and modrm. Microcode execution starts at cycle 22, in the procedure for [BX] (each addressing mode sans displacement has its own program). Having no displacement, we jump directly to EALOAD. The jump costs one cycle; EALOAD begins on cycle 24 and issues a bus request to read the EA operand. The BIU is busy (it's T1 of a prefetch), so we wait until cycle 28, when the load begins. The load is done on cycle 34. We return from EALOAD on cycle 35 and begin microcode execution of TEST on cycle 37. The first thing TEST does is fetch its immediate operand, on cycles 37 and 38 (reflected in the QS status bits a cycle later, on cycles 38 and 39 respectively). TEST signals that it is finishing on cycle 39 by setting the NX flag. This would normally allow the CPU to pipeline the last microcode instruction of TEST with the first cycle of the next instruction, but we end up waiting on the next instruction opcode to be put in the queue on cycle 39 and read out on cycle 40.

017   [F0104] CS M:R.. I:... Q:.. CODE T2 <-r 07 | F0 [        ] <-q F7 TEST word [bx], 0BEEFh 
018 [F0104] CS M:R.. I:... Q:.. PASV T3 <-r 07 | 0 [ ] ; fetch modrm
019 [F0104] CS M:... I:... Q:.. PASV T4 | 0 [ ] ; fetch modrm
020 A:[F0105] M:... I:... Q:.. CODE T1 | 1 [07 ] ; fetch modrm
021 [F0105] CS M:R.. I:... Q:.. CODE T2 <-r EF | S0 [ ] <-q 07 ; modrm fetched ; dispatch to microcode
022 [F0105] CS M:R.. I:... Q:.. PASV T3 <-r EF | 0 [ ] ; execute [BX] procedure - no displacement; so jump directly to [EALOAD] (cycle 1/2)
023 [F0105] CS M:... I:... Q:.. PASV T4 | 0 [ ] ; jump to [EALOAD] (cycle 2/2)
024 A:[F0106] M:... I:... Q:.. CODE T1 | 1 [EF ] ; execute [EALOAD] procedure - READ request to BIU
025 [F0106] CS M:R.. I:... Q:.. CODE T2 <-r BE | 1 [EF ] ; wait for BIU
026 [F0106] CS M:R.. I:... Q:.. PASV T3 <-r BE | 1 [EF ] ; wait for BIU
027 [F0106] CS M:... I:... Q:.. PASV T4 | 1 [EF ] ; wait for BIU
028 A:[1EA44] M:... I:... Q:.. MEMR T1 | 2 [EFBE ] ; READ request from BIU serviced
029 [1EA44] DS M:R.. I:... Q:.. MEMR T2 <-r 00 | 2 [EFBE ] ; EA operand loading
030 [1EA44] DS M:R.. I:... Q:.. PASV T3 <-r 00 | 2 [EFBE ] ; EA operand loading
031 [1EA44] DS M:... I:... Q:.. PASV T4 | 2 [EFBE ] ; EA operand loading
032 A:[1EA45] M:... I:... Q:.. MEMR T1 | 2 [EFBE ] ; EA operand loading
033 [1EA45] DS M:R.. I:... Q:.. MEMR T2 <-r 00 | 2 [EFBE ] ; EA operand loading
034 [1EA45] DS M:R.. I:... Q:.. PASV T3 <-r 00 | 2 [EFBE ] ; EA operand load complete
035 [1EA45] DS M:... I:... Q:.. PASV T4 | 2 [EFBE ] ; return from EALOAD (cycle 1/2)
036 A:[F0107] M:... I:... Q:.. CODE T1 | 2 [EFBE ] ; return from EALOAD (cycle 2/2)
037 [F0107] CS M:R.. I:... Q:.. CODE T2 <-r 90 | 2 [EFBE ] ; execute TEST opcode microcode: read byte from queue for imm16 operand
038 [F0107] CS M:R.. I:... Q:.. PASV T3 <-r 90 | S1 [BE ] <-q EF ; first queue read reflected on QS lines; read another byte for imm16 operand
039 [F0107] CS M:... I:... Q:.. PASV T4 | S0 [ ] <-q BE ; second queue read reflected on QS lines; raise NX flag, next instruction byte placed into queue
040 A:[F0108] M:... I:... Q:.. CODE T1 | 1 [90 ] ; TEST execution completes; instruction byte is read out of queue this cycle to begin next instruction next cycle
Last edited by GloriousCow on 2023-07-16, 22:44. Edited 2 times in total.
