UniPCemu cycle accurate 8088 implementation


Re: UniPCemu cycle accurate 8088 implementation

Postby superfury » 2019-5-19 @ 13:12

So perhaps the STOSW instruction is failing for some unknown reason?

Also, see my previous post, which I edited just before your post, about the missing timings in UniPCemu. Could those be the timings that cause UniPCemu's CPU to miss some EU cycles, making the CPU too fast (with 8088 MPH reporting too few PIT cycles elapsed during its startup speed check)?

The difficult part is that UniPCemu's EU ticks 1 cycle at a time until the BIU has completed its transfer (if there actually is a BIU memory transfer). That amounts to waiting for T1 (completing the current transfer, if it isn't there already), waiting for any running DMA transfer to release the bus, ticking through T1 (if nothing was running), then ticking the transfer through to T4 (the memory transfer itself), after which, at T1, the BIU returns success to the EU (the EU runs before the BIU starts a next transfer at its T1 clock).
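To make that waiting scheme concrete, here is a minimal sketch in C. It is not UniPCemu's actual code: the `Biu` struct, its flags, `biu_tick()` and `eu_wait_for_transfer()` are hypothetical names, and every bus cycle is assumed to take exactly T1-T4 with no wait states.

```c
#include <stdbool.h>

/* Hypothetical model of the scheme described above, not UniPCemu's code. */
typedef enum { BIU_T1, BIU_T2, BIU_T3, BIU_T4 } BiuState;

typedef struct {
    BiuState state;
    bool eu_cycle;         /* the bus cycle in flight was requested by the EU */
    bool request_pending;  /* an EU request is waiting to be accepted at T1 */
    bool result_ready;     /* set once the EU's transfer has completed */
    bool dma_holds_bus;    /* while set, the BIU spins in T1 */
    bool piq_full;         /* no prefetch is started while the queue is full */
} Biu;

/* Advance the BIU by one clock. */
void biu_tick(Biu *b)
{
    switch (b->state) {
    case BIU_T1:
        if (b->dma_holds_bus) break;       /* wait for DMA to release the bus */
        if (b->request_pending) {          /* accept the EU request */
            b->request_pending = false;
            b->eu_cycle = true;
            b->state = BIU_T2;
        } else if (!b->piq_full) {         /* nothing requested: prefetch */
            b->eu_cycle = false;
            b->state = BIU_T2;
        }
        break;
    case BIU_T2: b->state = BIU_T3; break;
    case BIU_T3: b->state = BIU_T4; break;
    case BIU_T4:
        b->state = BIU_T1;                 /* back at T1: the transfer is done */
        if (b->eu_cycle) {
            b->eu_cycle = false;
            b->result_ready = true;        /* report success to the EU */
        }
        break;
    }
}

/* Issue one EU request and count the 1-cycle EU waits until it completes.
   The caller must make sure dma_holds_bus is eventually cleared. */
unsigned eu_wait_for_transfer(Biu *b)
{
    unsigned waited = 0;
    b->result_ready = false;
    b->request_pending = true;
    while (!b->result_ready) {
        biu_tick(b);
        ++waited;
    }
    return waited;
}
```

In this model a request issued while the BIU is idle in T1 costs 4 waits, while one issued when a prefetch is sitting in T3 costs 6, because the prefetch has to finish first.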

So should I just add those cycles from the busInit() function to UniPCemu's timings for said instructions (ignoring the timings within the if statements, since they're probably already handled by the BIU as said parallel process)?

So, ignoring those BIU-related timings, should I add the following timings for those instructions using those _accessNumber settings?
1,6=1
2=1
3=2
4=3
5=2(INT 3 instruction) or 3 otherwise
7=1
8=1
9=1
10=3
11=2
12,13=4
14=4
15=2
16=2
17=1
18,19=3
20,21,24=1
22,23=2
25=2
26=2
27,32,37=3
28=1
29,30=4
31=6
33=4
34,39,41=4
35=2
36=5+m(1 if memory)
38=5
40=6
42=3
43=3
44,45=2
46=2
47,48,49,50,51=1
52=2
53=1
54=2
55=1
56=1
57,58=4+m(2 if memory)
59=5+m(1 if memory)
60=4
62=1
65=3+m(1 if memory)
68=1
70=5
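One way to hold the list above would be a lookup table indexed by access number. This is only a sketch: `access_base_cycles`, `extra_eu_cycles()` and the `is_memory`/`is_int3` parameters are hypothetical names of my own, and access numbers that the list doesn't mention are simply left at 0.

```c
#include <stdint.h>

/* Hypothetical table: base extra EU cycles per _accessNumber, copied from the
   list above (C99 designated initializers). Unlisted entries stay 0. */
static const uint8_t access_base_cycles[71] = {
    [1] = 1,  [2] = 1,  [3] = 2,  [4] = 3,  [5] = 3,  [6] = 1,  [7] = 1,
    [8] = 1,  [9] = 1,  [10] = 3, [11] = 2, [12] = 4, [13] = 4, [14] = 4,
    [15] = 2, [16] = 2, [17] = 1, [18] = 3, [19] = 3, [20] = 1, [21] = 1,
    [22] = 2, [23] = 2, [24] = 1, [25] = 2, [26] = 2, [27] = 3, [28] = 1,
    [29] = 4, [30] = 4, [31] = 6, [32] = 3, [33] = 4, [34] = 4, [35] = 2,
    [36] = 5, [37] = 3, [38] = 5, [39] = 4, [40] = 6, [41] = 4, [42] = 3,
    [43] = 3, [44] = 2, [45] = 2, [46] = 2, [47] = 1, [48] = 1, [49] = 1,
    [50] = 1, [51] = 1, [52] = 2, [53] = 1, [54] = 2, [55] = 1, [56] = 1,
    [57] = 4, [58] = 4, [59] = 5, [60] = 4, [62] = 1, [65] = 3, [68] = 1,
    [70] = 5
};

/* Return the extra EU cycles for one access number.
   is_memory: the operand is a memory operand (the "+m" entries).
   is_int3:   the instruction is INT 3 (only matters for access number 5). */
unsigned extra_eu_cycles(unsigned access_number, int is_memory, int is_int3)
{
    if (access_number > 70) return 0;
    unsigned cycles = access_base_cycles[access_number];
    if (access_number == 5 && is_int3) cycles = 2;   /* 5=2 for INT 3, else 3 */
    if (is_memory) {
        switch (access_number) {
        case 36: case 59: case 65: cycles += 1; break;
        case 57: case 58:          cycles += 2; break;
        default: break;
        }
    }
    return cycles;
}
```

For example, `extra_eu_cycles(57, 1, 0)` gives 6 (4 plus 2 for the memory operand) and `extra_eu_cycles(5, 0, 1)` gives 2.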


These are my current EU instruction emulation timings for the 808X core:
https://bitbucket.org/superfury/unipcem ... des_8086.c

Look for cycles_OP for the timings I've implemented for those instructions so far. All timings are applied at only one point during the instruction, though, not at two locations (before and after the memory access cycles). Is that an issue? It's mainly a consequence of the way UniPCemu handles all BIU transfers: using requests, which are accepted, transferred and returned, all the while ticking the EU in NOP cycles of 1 cycle at a time.
superfury
l33t
 
Posts: 3228
Joined: 2014-3-08 @ 11:25
Location: Netherlands

Re: UniPCemu cycle accurate 8088 implementation

Postby reenigne » 2019-5-19 @ 13:49

superfury wrote:Edit: Looking a bit further, it seems all those timings concerning "_accessNumber = " are the timings that are probably missing from UniPCemu(except perhaps the CS/IP-related timings, which don't seem to match at all, perhaps because it's based on one of the earlier replies in this thread instead). So perhaps I would need to take all those timings and add them to UniPCemu's timings for said instruction?


Yes, that represents the limits of my knowledge of the 8088 cycle-exact timings at the moment. It's not perfect yet (there are some corner cases I haven't written tests for yet) but it does give the same results as the real hardware for millions of testcases.

Having said that, _accessNumber is a horrible hack and busInit() really should be much simpler. Some trial and error will be needed to find those simplifications. I have a suspicion that the key may be taking into account the 8088's 8086 heritage and original 16-bit bus width. The even and odd bytes of the prefetch queue might not be symmetrical. There is also some official documentation in the "8086 Instruction Sequence" section starting on page 4-37 of the iAPX 86,88 User's Manual, which I recently discovered covers some details of the timing that I had previously only found by observation, but which it documents differently from my model. In particular, that document says (third paragraph in second column of page 4-37):
Instead of completing the opcode fetch and forcing the EU to wait four additional clock cycles, the BIU immediately aborts the fetch cycle (resulting in two idle clock cycles (T_I) in clock cycles 19 and 20) and performs the required memory write. This interaction between the EU and BIU results in a single clock extension to the execution time of the PUSH AX instruction, the maximum delay that can occur in response to an EU bus cycle request.

I had observed these two idle clock cycles before but hadn't modelled them as an aborted fetch cycle. Doing so might simplify the code significantly.
reenigne
Oldbie
 
Posts: 509
Joined: 2006-11-30 @ 05:13
Location: Cornwall, UK

Re: UniPCemu cycle accurate 8088 implementation

Postby superfury » 2019-5-23 @ 22:28

reenigne wrote:
superfury wrote:Edit: Looking a bit further, it seems all those timings concerning "_accessNumber = " are the timings that are probably missing from UniPCemu(except perhaps the CS/IP-related timings, which don't seem to match at all, perhaps because it's based on one of the earlier replies in this thread instead). So perhaps I would need to take all those timings and add them to UniPCemu's timings for said instruction?


Yes, that represents the limits of my knowledge of the 8088 cycle-exact timings at the moment. It's not perfect yet (there are some corner cases I haven't written tests for yet) but it does give the same results as the real hardware for millions of testcases.

Having said that, _accessNumber is a horrible hack and busInit() really should be much simpler. Some trial and error will be needed to find those simplifications. I have a suspicion that the key may be taking into account the 8088's 8086 heritage and original 16-bit bus width. The even and odd bytes of the prefetch queue might not be symmetrical. There is also some official documentation in the "8086 Instruction Sequence" section starting on page 4-37 of the iAPX 86,88 User's Manual, which I recently discovered covers some details of the timing that I had previously only found by observation, but which it documents differently from my model. In particular, that document says (third paragraph in second column of page 4-37):
Instead of completing the opcode fetch and forcing the EU to wait four additional clock cycles, the BIU immediately aborts the fetch cycle (resulting in two idle clock cycles (T_I) in clock cycles 19 and 20) and performs the required memory write. This interaction between the EU and BIU results in a single clock extension to the execution time of the PUSH AX instruction, the maximum delay that can occur in response to an EU bus cycle request.

I had observed these two idle clock cycles before but hadn't modelled them as an aborted fetch cycle. Doing so might simplify the code significantly.


What do you mean by 'The even and odd bytes of the prefetch queue might not be symmetrical'? Is there even a concept of even and odd bytes in a PIQ (a circular buffer of sorts)? What's symmetrical about a PIQ? It can't be the data stored within it (instruction data), as that would corrupt the entire instruction stream.

Also, how do you suppose I implement this, seeing as UniPCemu works in an entirely different way? It has a simple 1-command FIFO for sending requests to the BIU (empty only when the BIU isn't busy handling a command, and only fillable when it's ready), and the reverse for its result FIFO (containing 1 for writes, or the data read from memory for reads, filled upon reaching the T1 that completes the memory transfer).
UniPCemu's BIU simply spins on 'T1' during DMA or when it has nothing to do (PIQ full and no I/O requests from the EU). The EU spins as well, ticking in 1-cycle increments: first while waiting for the request FIFO to become empty so the BIU can accept a command, then while waiting for the BIU's result to arrive. The BIU receives only a few basic commands from the EU: memory read byte/word/dword, memory write byte/word/dword, bus read byte/word/dword and bus write byte/word/dword. And of course, when the BIU has nothing to do on T1 and the PIQ isn't full, it just starts a memory fetch to fill the PIQ (dword on 80386+, word on 8086+, byte on 8088; dword->word->byte also depends on physical memory alignment, of course). Finally there's DMA, which currently takes the bus between any byte/word/dword transfers, unless the LOCK prefix or XCHG is used.
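Roughly, the two single-entry FIFOs could look like the sketch below. The type and function names (`BiuRequest`, `RequestSlot`, `ResultSlot`, `request_put()` and so on) are made up for illustration and are not the identifiers UniPCemu uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the four command kinds come from the text above. */
typedef enum { REQ_MEM_READ, REQ_MEM_WRITE, REQ_BUS_READ, REQ_BUS_WRITE } ReqKind;

typedef struct {
    ReqKind  kind;
    uint8_t  size;     /* 1 = byte, 2 = word, 4 = dword */
    uint32_t address;
    uint32_t value;    /* data to write; unused for reads */
} BiuRequest;

typedef struct { bool full; BiuRequest req; } RequestSlot;  /* EU -> BIU */
typedef struct { bool full; uint32_t value; } ResultSlot;   /* BIU -> EU */

/* EU side: fails while the BIU has not yet taken the previous request. */
bool request_put(RequestSlot *s, BiuRequest r)
{
    if (s->full) return false;
    s->req = r;
    s->full = true;
    return true;
}

/* BIU side: take the pending request, if any, freeing the slot. */
bool request_take(RequestSlot *s, BiuRequest *out)
{
    if (!s->full) return false;
    *out = s->req;
    s->full = false;
    return true;
}

/* BIU side: store 1 for a completed write or the data that was read. */
bool result_put(ResultSlot *s, uint32_t value)
{
    if (s->full) return false;
    s->value = value;
    s->full = true;
    return true;
}

/* EU side: fails until the BIU has delivered a result. */
bool result_take(ResultSlot *s, uint32_t *out)
{
    if (!s->full) return false;
    *out = s->value;
    s->full = false;
    return true;
}
```

The EU would keep ticking 1 cycle at a time while `request_put()` or `result_take()` returns false, which is where the extra NOP cycles come from.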

With a DMA transfer, T3 always releases the bus. DMA takes the bus at that cycle and sets a delay (performing the first S0 one cycle too early): it starts S0 at the T1 cycle of the CPU's blocking loop and clears the delay flag at that cycle. Then, at the time of 'T2' (of course the CPU doesn't tick; it just waits patiently in the T1 state for DMA to release the bus so it can continue with its next request/PIQ fetch), it ticks the actual S0. The following cycles are the usual documented DMA states (S1-S5, looping S5=S5+SI+S0 without releasing the bus during block transfer/burst mode). That simulates the usual DMA in a compatible way (at S4/S1). Of course, bus locking blocks the DMA from taking the bus until the EU finishes the entire instruction (including trailing cycles).

Re: UniPCemu cycle accurate 8088 implementation

Postby superfury » 2019-5-24 @ 13:58

Is the REP prefix used in the 8088 MPH credits? I just found a bug where a REP instruction that went past its first iteration (decreasing CX because the iteration was finished) caused all other instructions to effectively become NOP instructions with 1-cycle timings :S This was simply because the repeat check forgot to reset the execution phase handler to start a new instruction (it thought the instruction was finished and thus had nothing to do, so it no longer called the instruction handler).
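A stripped-down sketch of the fix, assuming a phase-based instruction handler. The `Cpu` struct, `string_op_step()` and `run_rep()` are hypothetical stand-ins, not UniPCemu's real structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state for illustration only. */
typedef struct {
    uint16_t cx;
    unsigned phase;            /* 0 = start of the instruction body */
    bool     instruction_done; /* the body reported completion */
    unsigned iterations;       /* how many times the body really executed */
} Cpu;

/* One step of a two-phase string instruction body. */
static void string_op_step(Cpu *c)
{
    if (c->phase == 0) {
        c->phase = 1;                 /* e.g. the memory access phase */
    } else {
        c->iterations++;              /* the actual work of one iteration */
        c->instruction_done = true;
    }
}

/* Run a REP-prefixed instruction until CX reaches 0. */
unsigned run_rep(Cpu *c)
{
    c->iterations = 0;
    while (c->cx != 0) {
        /* The fix: restart the execution phase handler for every iteration.
           Without these two lines the body is only executed the first time,
           and later iterations do nothing but decrement CX. */
        c->phase = 0;
        c->instruction_done = false;
        while (!c->instruction_done) string_op_step(c);
        c->cx--;
    }
    return c->iterations;
}
```

With CX=3 the body runs three times; with the two reset lines removed it would run only once while CX still counts down to 0.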

Re: UniPCemu cycle accurate 8088 implementation

Postby superfury » 2019-5-24 @ 18:04

Just noticed something interesting after fixing the REP prefix. Now that the instructions used with it actually execute (instead of the second byte/word/dword onwards transferring essentially nothing and having no instruction execution phase, while still counting 1 cycle each on the BIU), certain parts of the demo are simply skipped? For example the Delorean sprite demo, the part immediately after the first 3D pyramid, and the vectorballs part.

Edit: Tried it again with the latest updates (which also fix the prefetch buffer clearing and the jump back to the start of the REPeated instruction). It now also handles the REP instruction properly, no longer prefetching during that instruction's runtime (which spans multiple executions of the instruction, CX times). Now I notice the sprite recompiler somehow failing completely, with the Delorean car disappearing against the background entirely (might be a timing issue, though)?

The vector balls part still has a lot of noise in the background outside the vector balls' moving area. It seems to switch between two different static backgrounds?

Re: UniPCemu cycle accurate 8088 implementation

Postby superfury » 2019-6-01 @ 23:20

Just found a tiny 'bug' in the CGA/MDA timings. It was properly applying waitstates and address wrapping to memory writes, but not to memory reads. That might account for some timing differences and drift?
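The idea of the fix, sketched for the CGA case: route reads and writes through one shared helper so neither path can miss the waitstates or the wrapping. The `Cga` struct and function names are hypothetical, and the fixed per-access waitstate count is a placeholder assumption (the real card's delay is not a single constant); only the 16 KiB video memory size is a given.

```c
#include <stdint.h>

#define CGA_VRAM_SIZE 0x4000u   /* the CGA has 16 KiB of video memory */

/* Hypothetical adapter state for illustration only. */
typedef struct {
    uint8_t  vram[CGA_VRAM_SIZE];
    unsigned waitstates_per_access; /* placeholder value chosen by the caller */
    unsigned waitstates_total;      /* waitstates charged so far */
} Cga;

/* Shared by reads and writes: charge the waitstates and wrap the offset. */
static uint32_t cga_prepare_access(Cga *g, uint32_t offset)
{
    g->waitstates_total += g->waitstates_per_access;
    return offset & (CGA_VRAM_SIZE - 1u);
}

void cga_write(Cga *g, uint32_t offset, uint8_t value)
{
    g->vram[cga_prepare_access(g, offset)] = value;
}

uint8_t cga_read(Cga *g, uint32_t offset)
{
    return g->vram[cga_prepare_access(g, offset)];
}
```

A write to offset 0x4005 then lands at 0x0005, a read from either offset returns the same byte, and both accesses add to `waitstates_total`.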
