VOGONS


UniPCemu cycle accurate 8088 implementation


Reply 160 of 176, by superfury

Rank l33t

So perhaps the STOSW instruction is failing for some unknown reason?

Also, see my previous post, which I edited just before your reply, about the timings missing from UniPCemu. Could those be the timings that cause UniPCemu's CPU to miss some EU cycles, making the CPU too fast (with 8088 MPH reporting too few PIT cycles elapsed during its startup speed check)?

The difficult part is that UniPCemu ticks the EU one cycle at a time until the BIU has completed its transfer (if there actually is a BIU memory transfer). That amounts to waiting for T1 (completing the current transfer, if the BIU isn't at T1 already), waiting for any running DMA transfer to release the bus, ticking to T1 (if nothing was running), and ticking the transfer through to T4 (the memory transfer itself). At the following T1 the BIU returns success to the EU, with the EU running before the BIU starts the next transfer at its T1 clock.

So should I just add those cycles in the busInit function to UniPCemu's timings for said instructions (ignoring the timings within the if statements, since they're probably already handled by the BIU as said parallel process)?

So, ignoring those BIU-related timings, should I add the following timings for those instructions using those _accessNumber settings?

1,6=1
2=1
3=2
4=3
5=2(INT 3 instruction) or 3 otherwise
7=1
8=1
9=1
10=3
11=2
12,13=4
14=4
15=2
16=2
17=1
18,19=3
20,21,24=1
22,23=2
25=2
26=2
27,32,37=3
28=1
29,30=4
31=6
33=4
34,39,41=4
35=2
36=5+m(1 if memory)
38=5
40=6
42=3
43=3
44,45=2
46=2
47,48,49,50,51=1
52=2
53=1
54=2
55=1
56=1
57,58=4+m(2 if memory)
59=5+m(1 if memory)
60=4
62=1
65=3+m(1 if memory)
68=1
70=5
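
As a rough sketch of how the list above could be applied, the table below maps each _accessNumber to the extra EU cycles proposed there. The names extra_eu_cycles, is_int3 and is_memory are made up for illustration and are not part of UniPCemu or XTCE. It assumes "1,6=1" means access numbers 1 and 6 each cost 1 cycle, and it returns -1 for access numbers the list doesn't mention (61, 63, 64, 66, 67, 69).

```c
/* Base EU cycles per _accessNumber, transcribed from the list above.
   -1 marks access numbers that the list does not mention. */
static const signed char base_cycles[71] = {
    -1,                              /* 0: unused */
     1, 1, 2, 3, 3, 1, 1, 1, 1, 3,   /* 1-10  (5: 3, or 2 for INT 3) */
     2, 4, 4, 4, 2, 2, 1, 3, 3, 1,   /* 11-20 */
     1, 2, 2, 1, 2, 2, 3, 1, 4, 4,   /* 21-30 */
     6, 3, 4, 4, 2, 5, 3, 5, 4, 6,   /* 31-40 (36: +1 if memory) */
     4, 3, 3, 2, 2, 2, 1, 1, 1, 1,   /* 41-50 */
     1, 2, 1, 2, 1, 1, 4, 4, 5, 4,   /* 51-60 (57,58: +2; 59: +1 if memory) */
    -1, 1,-1,-1, 3,-1,-1, 1,-1, 5    /* 61-70 (65: +1 if memory) */
};

/* Hypothetical helper: extra EU cycles for one access number.
   Returns -1 when the access number is not in the list. */
int extra_eu_cycles(int access_number, int is_int3, int is_memory)
{
    int cycles;

    if (access_number < 1 || access_number > 70)
        return -1;
    cycles = base_cycles[access_number];
    if (cycles < 0)
        return -1;

    if (access_number == 5 && is_int3)
        cycles = 2;                  /* "5=2(INT 3 instruction) or 3 otherwise" */

    if (is_memory) {
        switch (access_number) {
        case 36: case 59: case 65: cycles += 1; break;  /* "+m(1 if memory)" */
        case 57: case 58:          cycles += 2; break;  /* "+m(2 if memory)" */
        default: break;
        }
    }
    return cycles;
}
```

Keeping the numbers in one table makes it easier to diff them against another emulator than having them scattered over the opcode handlers, though whether these values are correct is exactly the open question here.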

These are my current EU instruction emulation timings for the 808X core:
https://bitbucket.org/superfury/unipcemu/src/ … /opcodes_8086.c

Look for cycles_OP for the timings I've implemented for those instructions so far. All timings are applied at only one point during the instruction, not at two locations (before and after the memory access cycles). Is that an issue? It's mainly a consequence of the way UniPCemu handles all BIU transfers: using requests, which are accepted, transferred and returned, all the while ticking the EU in NOP cycles (one cycle at a time).

UniPCemu Git repository
UniPCemu for Android, Windows and PSP on itch.io
Older UniPCemu PC/Android/PSP releases

Reply 161 of 176, by reenigne

Rank Oldbie
superfury wrote:

Edit: Looking a bit further, it seems all those timings concerning "_accessNumber = " are the timings that are probably missing from UniPCemu(except perhaps the CS/IP-related timings, which don't seem to match at all, perhaps because it's based on one of the earlier replies in this thread instead). So perhaps I would need to take all those timings and add them to UniPCemu's timings for said instruction?

Yes, that represents the limits of my knowledge of the 8088 cycle-exact timings at the moment. It's not perfect yet (there are some corner cases I haven't written tests for yet) but it does give the same results as the real hardware for millions of testcases.

Having said that, _accessNumber is a horrible hack and busInit() really should be much simpler. Some trial-and-error will be needed to find those simplifications. I have a suspicion that the key may be taking the 8088's 8086 heritage and original 16-bit bus width into account. The even and odd bytes of the prefetch queue might not be symmetrical. There is also some official documentation in the "8086 Instruction Sequence" section starting on page 4-37 of the iAPX 86,88 User's Manual, which I recently discovered covers some details of the timing that I had previously only discovered by observation, but which is documented differently to my model. In particular, that document says (third paragraph in second column of page 4-37):

Instead of completing the opcode fetch and forcing the EU to wait four additional clock cycles, the BIU immediately aborts the fetch cycle (resulting in two idle clock cycles (T_I) in clock cycles 19 and 20) and performs the required memory write. This interaction between the EU and BIU results in a single clock extension to the execution time of the PUSH AX instruction, the maximum delay that can occur in response to an EU bus cycle request.

I had observed these two idle clock cycles before but hadn't modelled them as an aborted fetch cycle. Doing so might simplify the code significantly.

Reply 162 of 176, by superfury

reenigne wrote:
superfury wrote:

Edit: Looking a bit further, it seems all those timings concerning "_accessNumber = " are the timings that are probably missing from UniPCemu(except perhaps the CS/IP-related timings, which don't seem to match at all, perhaps because it's based on one of the earlier replies in this thread instead). So perhaps I would need to take all those timings and add them to UniPCemu's timings for said instruction?

Yes, that represents the limits of my knowledge of the 8088 cycle-exact timings at the moment. It's not perfect yet (there are some corner cases I haven't written tests for yet) but it does give the same results as the real hardware for millions of testcases.

Having said that, _accessNumber is a horrible hack and busInit() really should be much simpler. Some trial-and-error will be needed to find those simplifications. I have a suspicion that the key may be taking the 8088's 8086 heritage and original 16-bit bus width into account. The even and odd bytes of the prefetch queue might not be symmetrical. There is also some official documentation in the "8086 Instruction Sequence" section starting on page 4-37 of the iAPX 86,88 User's Manual, which I recently discovered covers some details of the timing that I had previously only discovered by observation, but which is documented differently to my model. In particular, that document says (third paragraph in second column of page 4-37):

Instead of completing the opcode fetch and forcing the EU to wait four additional clock cycles, the BIU immediately aborts the fetch cycle (resulting in two idle clock cycles (T_I) in clock cycles 19 and 20) and performs the required memory write. This interaction between the EU and BIU results in a single clock extension to the execution time of the PUSH AX instruction, the maximum delay that can occur in response to an EU bus cycle request.

I had observed these two idle clock cycles before but hadn't modelled them as an aborted fetch cycle. Doing so might simplify the code significantly.

What do you mean by 'The even and odd bytes of the prefetch queue might not be symmetrical'? Is there even a concept of even and odd bytes in a PIQ (a circular buffer of sorts)? What's symmetrical about a PIQ? It can't be the data stored within it (instruction data), as that would corrupt the entire instruction stream. What do you mean by that?

Also, how do you suppose I implement this, seeing as UniPCemu works in an entirely different way? It has a simple 1-command FIFO for sending requests to the BIU (empty only when not busy handling a command, and only fillable when ready), and the reverse for its result FIFO (containing 1 for writes, or the read memory data, filled on reaching the T1 that completes the memory transfer).
UniPCemu's BIU simply spins on 'T1' during DMA or when it has nothing to do (PIQ full and no I/O requests from the EU). The EU spins as well, ticking in 1-cycle increments, while waiting for the BIU to get ready to receive a command (the request FIFO becoming empty) and while waiting for the BIU's result to arrive (the result FIFO becoming filled). The basic commands the BIU receives from the EU are just a few things: memory read byte/word/dword, memory write byte/word/dword, bus read byte/word/dword and bus write byte/word/dword. And of course, when the BIU has nothing to do on T1 and the PIQ isn't full, it just starts a memory fetch to fill the PIQ (dword on 80386+, word on 8086+, byte on 8088; dword->word->byte also being dependent on physical memory alignment, of course). And finally there's DMA, which currently takes the bus between any byte/word/dword transfers, unless the LOCK prefix or XCHG is used.

With a DMA transfer, T3 always releases the bus. DMA takes the bus at that cycle and sets a delay flag (performing the first S0 one cycle too early): it starts S0 at the T1 cycle of the CPU's blocking loop and clears the delay flag at that cycle. Then, at what would be T2 (the CPU doesn't tick, of course; it just patiently waits in the T1 state for DMA to release the bus so it can continue with its next request/PIQ fetch), it ticks the actual S0. The following cycles are the usual documented DMA states (S1-S5, looping S5=S5+SI+S0 without releasing the bus during block transfer/burst mode). So that simulates the usual DMA in a compatible way (at S4/S1). Of course, bus locking blocks the DMA from taking the bus until the EU finishes the entire instruction (including trailing cycles).

Reply 163 of 176, by superfury

Is the REP prefix used in the 8088 MPH credits? I just found a bug where a REP instruction that went past its first iteration (decreasing CX because the iteration was finished) caused all other instructions to effectively become NOP instructions with 1-cycle timings 😖 Simply because the repeat check forgot to reset the execution phase handler to start a new instruction (it thought the instruction was finished and thus had nothing to do, so it didn't call the instruction handler anymore).

Reply 164 of 176, by superfury

Just noticed something interesting after fixing the REP prefix. Now that the instructions that are used with it actually execute(instead of the second byte/word/dword onwards essentially transferring nothing and have no execution instruction phase while still counting 1 cycle each on the BIU), certain parts of the demo simply skip now? Like the Delorean sprite demo, also immediately after the first 3D pyramid and the vectorballs part as well?

Edit: Tried it again with the latest updates(which also fixes the prefetch buffer clearing and jumping back to the start of the REPeated instruction). It now also properly uses the REP instruction, not prefetching anymore during said instruction's runtime(over multiple instructions being executed that way for CX count times). Now I notice the sprite recompiler somehow failing completely, with the Delorean car disappearing against the background completely(might be a timing issue, though)?

The vector balls still have a lot of noise in the background besides the vector balls' moving area? It seems to switch between two different static backgrounds?

Reply 165 of 176, by superfury

Just found out a tiny 'bug' in the CGA/MDA timings. It was properly applying waitstates and address wrapping to memory writes, but not applying it to memory reads as well. That might account for some timings and drifts?

Reply 166 of 176, by superfury

I've just been looking at https://github.com/reenigne/reenigne/blob/mas … r/8088/8088.txt again.

I see those IN and OUT instructions mentioning 4+4/8(for AL/AX using DX) and 6+4(for AL/AX).

You've mentioned the 2 cycles before the transfer(and 1 cycle waitstate)? I'd assume the second 4/8 is the actual transfer to the port(T1-T4). Why does it mention the first being 4 or 6? Didn't you mention 2 cycles only?

Edit: At least, the most recent changes increase the metric cycle count to ~1545. They are: proper fetch termination; perhaps most of the IN/OUT instructions (except those using DX) using a 1-cycle startup, followed by up to 4 cycles to complete the current transfer (4 cycles when starting at T1) or the previous one (when at T2-T4, lasting until T1 is reached again); the 1-cycle waitstate on the bus transactions (in/out); and a 2-cycle idle bus after those cycles for the E4-E7 opcodes. The fake text screen that rolls over at the start of the demo (after the calibration screen) also seems to run without visible issues now?

Reply 167 of 176, by reenigne

superfury wrote on 2020-03-11, 18:10:

Pay no attention to that document - it's an older one and the timings there are just from the published ones if I recall correctly. https://github.com/reenigne/reenigne/blob/mas … 088/xtce/xtce.h is the second best source of timing information I know of right now, the best being the XT Server and ISA bus sniffer.

Reply 168 of 176, by superfury


OK. But those 1-cycle startup after the start of IN/OUT, 2-cycle for non-DX IN/OUT BIU idle following that and 1-cycle waitstate on all I/O bus operations are correct?

8088MPH reports 1547 metric cycles now.

Reply 169 of 176, by reenigne

superfury wrote on 2020-03-12, 11:44:

OK. But those 1-cycle startup after the start of IN/OUT, 2-cycle for non-DX IN/OUT BIU idle following that and 1-cycle waitstate on all I/O bus operations are correct?

8088MPH reports 1547 metric cycles now.

The PC/XT motherboard imposes a 1-cycle waitstate on all port IO instructions, yes.

As for the other questions, I'm not sufficiently familiar with your model of the 8088's timings to say for sure. But take a look at http://www.reenigne.org/misc/inout_sniffer.zip for some ISA bus sniffer logs of tricky sequences involving IN and OUT. If your emulator has the same timings for these sequences, it's probably correct for these instructions.

Reply 170 of 176, by superfury


Do you have some sniffer log of the metric cycle check done at the start of the demo (as well as its starting address, perhaps the instruction jumping to it for the first time)?

Edit: Btw, UniPCemu now (new addition to CPU emulation!) waits for the prefetch timings to complete before starting the request (on that cycle) for the instruction execution (e.g. the memory accesses or port I/O part of the instruction, its execution phase). Previously it did the request and the start of the I/O/MMU request cycle (T1) during the last (few) cycles of the prefetch for execution (essentially overlapping them incorrectly).

Now, after the prefetch cycles, the execution phase(normal instruction handling) starts executing the cycle(s) after that, timing properly.

So for port I/O(BUS as UniPCemu calls it, the other one being MMU/memory), it's one cycle normal behaviour by the BIU(prefetching if T1), then wait for T1 again(if not T1 after that yet), then the 2-cycle idle(for non-DX), then the actual T1-T2-T3-Tw(only 1)-T4 cycles, finishing the instruction on reaching T1(which will start to prefetch, if possible, which it will due to the instruction not lasting long enough to fill it fully again).
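
To make the sequence above concrete, here is a minimal sketch that just adds up the cycles of one IN/OUT under this model. It is only the model as described in this post, not verified against the sniffer logs. port_io_cycles and its parameters are made-up names: cycles_until_t1 is how long the BIU still needs to get back to T1 after the first cycle, and uses_dx selects the DX forms, which skip the 2-cycle idle.

```c
#include <assert.h>

enum {
    IO_STARTUP_CYCLES = 1,  /* one cycle of normal BIU behaviour */
    IO_IDLE_NON_DX    = 2,  /* 2-cycle bus idle for the non-DX forms */
    IO_BUS_CYCLES     = 5   /* T1-T2-T3-Tw-T4 with the single PC/XT waitstate */
};

/* Hypothetical helper: total cycles from the start of the I/O phase
   until the BIU is back at T1 after the transfer. */
int port_io_cycles(int cycles_until_t1, int uses_dx)
{
    assert(cycles_until_t1 >= 0);
    return IO_STARTUP_CYCLES
         + cycles_until_t1
         + (uses_dx ? 0 : IO_IDLE_NON_DX)
         + IO_BUS_CYCLES;
}
```

For example, port_io_cycles(0, 1) gives 6 for a DX-form access that finds the BIU already at T1, and port_io_cycles(0, 0) gives 8 for the immediate-port form.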

Reply 171 of 176, by reenigne

superfury wrote on 2020-03-12, 13:13:

Do you have some sniffer log of the metric cycle check done at the start of the demo (as well as its starting address, perhaps the instruction jumping to it for the first time)?

Unfortunately that's easier said than done - the ISA bus sniffer only has 2kB of RAM on the ATMega328 microcontroller that runs it, limiting it to runs of 2048 cycles. I could capture it with multiple runs but it'd be a bit of work.

However, bear in mind that the 8088MPH speed test was never meant to be an emulator torture test - it was just sufficiently sensitive to tell IBM PC/XTs from contemporary machines with similar (but not identical) timings. Getting the speed test to pass doesn't mean that all instructions are correct, nor will even guarantee that the rest of the demo will run correctly. These logs of carefully curated patterns are much more of a torture test. I plan to make an actual emulator torture test out of them soon.

Reply 172 of 176, by superfury


What about just splitting the 8088 MPH speed test part into manageable chunks for the bus sniffer (of course starting with a jump to get the correct start cache situation)? If I can run those as some kind of BIOS on UniPCemu in cycle-accurate mode, it might be able to find the offending instructions not taking up enough time (seeing as the count with UniPCemu is less than what it should be on a real 8088)?

Also, having such smaller chunks to test with might at least indicate what opcodes are actually acting up?

Edit: Btw, don't you have some kind of list of all opcodes that are used in xtce.h and their counts for the different parts of the instruction? Like it is atm I have to keep jumping up and down the code for those _accessNumber methods of accessing memory/bus, which is kind of confusing trying to verify it against other emulators? And from what I remember, some other emulators that copy said behaviour use almost exactly the same method, which is confusing as hell, having to jump up and down the code just to find out one instruction's time (or a group of related timings)?

Edit: Just implemented the timings up to and including INCDEC(at least the 16-bit versions) in your code(skipping 00-3B for now), then implemented all timings using jumpNear/jumpShort from your code.

Is it really true that the conditional jumps and normal jumps using jumpShort seem to wait for T1 in the middle of the instruction?

Edit: Applying the missing 00-3B opcode timings, it reports 1546 cycles right now. Something's still missing, obviously.

Last edited by superfury on 2020-03-13, 17:57. Edited 1 time in total.

Reply 173 of 176, by Alegend45

Rank Newbie
superfury wrote on 2020-03-13, 13:43:

What about just splitting the 8088 MPH speed test part into manageable chunks for the bus sniffer (of course starting with a jump to get the correct start cache situation)? If I can run those as some kind of BIOS on UniPCemu in cycle-accurate mode, it might be able to find the offending instructions not taking up enough time (seeing as the count with UniPCemu is less than what it should be on a real 8088)?

Also, having such smaller chunks to test with might at least indicate what opcodes are actually acting up?

Edit: Btw, don't you have some kind of list of all opcodes that are used in xtce.h and their counts for the different parts of the instruction? Like it is atm I have to keep jumping up and down the code for those _accessNumber methods of accessing memory/bus, which is kind of confusing trying to verify it against other emulators? And from what I remember, some other emulators that copy said behaviour use almost exactly the same method, which is confusing as hell, having to jump up and down the code just to find out one instruction's time (or a group of related timings)?

Edit: Just implemented the timings up to and including INCDEC(at least the 16-bit versions) in your code(skipping 00-3B for now), then implemented all timings using jumpNear/jumpShort from your code.

Is it really true that the conditional jumps and normal jumps using jumpShort seem to wait for T1 in the middle of the instruction?

You could try looking at 86Box's 808x.c code, as it does pass the 8088MPH check, and it runs the demo just fine all the way through 😜

Reply 174 of 176, by reenigne

superfury wrote on 2020-03-13, 13:43:

What about just splitting the 8088 MPH speed test part into manageable chunks for the bus sniffer (of course starting with a jump to get the correct start cache situation)?

Like I said, that's the tricky bit. The queue and bus state also need to be in the right state for each instruction.

superfury wrote on 2020-03-13, 13:43:

If I can run those as some kind of BIOS on UniPCemu in cycle-accurate mode, it might be able to find the offending instructions not taking up enough time(seeing as the count with UniPCemu is less than what it should be on a real 8088)?

Taking the right number of cycles is only part of the puzzle, though. Each instruction also has to leave the bus and prefetch queue in the right state. You might have some instructions that are taking too long as well. So for your purposes it is better to do a lot of small tests than one big one like the 8088 MPH benchmark.

superfury wrote on 2020-03-13, 13:43:

Edit: Btw, don't you have some kind of list of all opcodes that are used in xtce.h and their counts for the different parts of the instruction?

Unfortunately not as the 8088 timings are more complicated than that.

superfury wrote on 2020-03-13, 13:43:

Like it is atm I have to keep jumping up and down the code for those _accessNumber methods of accessing memory/bus, which is kind of confusing trying to verify it against other emulators?

_accessNumber is a hack which I would like to get rid of, once I've figured out how to do so and keep the same timings. I have some ideas about how to do this, but I need to sit down with it for a while and work through it. However, as I am having to cancel plans all over the place for pandemic-related reasons, I might have time to do this quite soon.

superfury wrote on 2020-03-13, 13:43:

And from what I remember, some other emulators that copy said behaviour use almost exactly the same method, which is confusing as hell, having to jump up and down the code just to find out one instruction's time(or a group of related timings)?

That may be because the emulator that you are thinking of uses XTCE's code (with my blessing).

Edit: Just implemented the timings up to and including INCDEC(at least the 16-bit versions) in your code(skipping 00-3B for now), then implemented all timings using jumpNear/jumpShort from your code.

superfury wrote on 2020-03-13, 13:43:

Is it really true that the conditional jumps and normal jumps using jumpShort seem to wait for T1 in the middle of the instruction?

That is the best explanation that I have so far been able to come up with based on the observed behaviour. I'm hoping that once I sort out the _accessNumber mess then a lot of other things like that can be done in a way that seems more likely to reflect what the chip is actually doing.

Reply 175 of 176, by superfury


About UniPCemu's cycle-accurate method, it's pretty simple:
- The BIU runs after each EU cycle (the EU only runs when not sleeping) in UniPCemu. The EU posts a request for the current cycle(s) to execute on the BIU. After posting it, it essentially sleeps for n cycles (leaving only the BIU to run). Once the BIU has finished its job (reached T1 again from T4), it posts its result in a reply buffer (data read for reads, 1 for writes). Once the BIU has processed all required cycles, the EU starts running again.
- The BIU runs without stopping, either processing T1-T2-T3-Tw(zero or more)-T4 cycles (when doing a memory access) or sitting at T1 with the bus idle when doing nothing. It fills the prefetch queue when it's empty enough; otherwise (without a request from the EU) it idles. And of course, requests from the EU have priority over prefetching bytes/words/dwords from memory (which depends on the PIQ free size in bytes and the current PIQ prefetch physical memory address).
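
The two points above can be sketched as a tiny state machine. This is a simplified toy, not the code in biu.c: toy_biu, toy_biu_tick and all field names are invented for illustration, prefetching is modelled one byte per bus cycle (as on the 8088), and DMA is left out entirely.

```c
typedef enum { BIU_T1, BIU_T2, BIU_T3, BIU_TW, BIU_T4 } biu_state;
typedef enum { BIU_IDLE, BIU_PREFETCH, BIU_EU_REQUEST } biu_job;

typedef struct {
    biu_state state;
    biu_job   job;             /* what the current bus cycle is doing */
    int       eu_request;      /* 1 while an EU request is waiting */
    int       eu_result_ready; /* 1 once the EU request has completed */
    int       piq_used;        /* bytes currently in the prefetch queue */
    int       piq_size;        /* 4 bytes on an 8088 */
    int       waitstates;      /* Tw cycles inserted per transfer */
    int       tw_left;         /* Tw cycles left in the current transfer */
} toy_biu;

/* Advance the toy BIU by one clock. */
void toy_biu_tick(toy_biu *b)
{
    switch (b->state) {
    case BIU_T1:
        if (b->eu_request) {                    /* EU requests beat prefetch */
            b->eu_request = 0;
            b->job = BIU_EU_REQUEST;
            b->state = BIU_T2;
        } else if (b->piq_used < b->piq_size) { /* room in the queue */
            b->job = BIU_PREFETCH;
            b->state = BIU_T2;
        }                                       /* else: idle, stay at T1 */
        break;
    case BIU_T2:
        b->state = BIU_T3;
        break;
    case BIU_T3:
        b->tw_left = b->waitstates;
        b->state = (b->tw_left > 0) ? BIU_TW : BIU_T4;
        break;
    case BIU_TW:
        if (--b->tw_left == 0)
            b->state = BIU_T4;
        break;
    case BIU_T4:
        if (b->job == BIU_EU_REQUEST)
            b->eu_result_ready = 1;             /* post the reply for the EU */
        else if (b->job == BIU_PREFETCH)
            b->piq_used++;                      /* one more byte queued */
        b->job = BIU_IDLE;
        b->state = BIU_T1;                      /* back at T1 next clock */
        break;
    }
}
```

With no waitstates, a request posted while the toy BIU idles at T1 has its result ready after four ticks, and the BIU is back at T1 at that point; each waitstate adds one tick.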

https://bitbucket.org/superfury/unipcemu/src/ … PCemu/cpu/biu.c

Most timings by the EU are simply kept using various instruction and internal counters(instruction counters for the instruction level itself, internal counters for the common instruction handlers and interrupt handling).
Said instruction counters are simply instruction state registers, which increment by 1 for each step. Steps come in pairs: the request (even step number) and the retrieval of the result (odd step number).
Said request and result functions can either pass(return 0) to make the instruction continue on or skip already done steps, or fail(return 1) to make the instruction abort and wait for the BIU to become ready for the step.

For example the add instruction to modr/m uses this:

byte CPU8086_instructionstepreadmodrmw(word base, word *result, byte paramnr)
{
    byte BIUtype;
    if (CPU[activeCPU].modrmstep==base) //First step? Request!
    {
        if ((BIUtype = modrm_read16_BIU(&params,paramnr,result))==0) //Not ready?
        {
            CPU[activeCPU].cycles_OP += 1; //Take 1 cycle only!
            CPU[activeCPU].executed = 0; //Not executed!
            return 1; //Keep running!
        }
        ++CPU[activeCPU].modrmstep; //Next step!
        if (BIUtype==2) //Register?
        {
            ++CPU[activeCPU].modrmstep; //Skip next step!
        }
        else //Memory?
        {
            BIU_handleRequests(); //Handle all pending requests at once when to be processed!
        }
    }
    if (CPU[activeCPU].modrmstep==(base+1))
    {
        if (BIU_readResultw(result)==0) //Not ready?
        {
            CPU[activeCPU].cycles_OP += 1; //Take 1 cycle only!
            CPU[activeCPU].executed = 0; //Not executed!
            return 1; //Keep running!
        }
        ++CPU[activeCPU].modrmstep; //Next step!
    }
    return 0; //Ready to process further! We're loaded!
}

So, for example, a simple load&store instruction goes like this(e.g. ADD [0],12h):
- Request [0]. On failure (BIU not ready for a request yet, still busy handling something), it waits for the BIU by aborting the instruction for this cycle (reaching line 8 in the code above, then the return in C). On success, it continues on to check the result (which isn't there yet for new requests), reaching line 12.
- Check the result. If there's a result, read it (the BIU is now ready for a new request). Registers are already read during the previous request step (which returns the value 2). Otherwise, abort the instruction until the result is ready to be read (reaching line 26).
- Once the result is successfully read, the function returns 0, allowing the caller to continue with the next step of the instruction timing.
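
The retry behaviour in these steps can be shown with a self-contained toy version, in which the caller simply re-enters the step function every cycle until it returns 0. Everything here is hypothetical: toy_eu, the stub_* functions and toy_run stand in for the real CPU state, the BIU calls and the main loop, and the stub BIU is just two countdown timers.

```c
typedef struct {
    int step;            /* request/result step counter (cf. modrmstep) */
    int cycles_op;       /* EU cycles charged while waiting */
    int busy_cycles;     /* stub BIU: clocks until it accepts a request */
    int transfer_cycles; /* stub BIU: clocks the accepted transfer still needs */
    int requested;       /* stub BIU: a request has been accepted */
} toy_eu;

/* Stub BIU: accept a request only when not busy. */
static int stub_request(toy_eu *c)
{
    if (c->busy_cycles > 0) return 0;
    c->requested = 1;
    return 1;
}

/* Stub BIU: hand over the result once the transfer has finished. */
static int stub_read_result(toy_eu *c, unsigned short *out)
{
    if (!c->requested || c->transfer_cycles > 0) return 0;
    *out = 0x1234;       /* arbitrary placeholder data */
    c->requested = 0;
    return 1;
}

/* Stub BIU: advance by one clock. */
static void stub_biu_tick(toy_eu *c)
{
    if (c->busy_cycles > 0) c->busy_cycles--;
    else if (c->requested && c->transfer_cycles > 0) c->transfer_cycles--;
}

/* Returns 1 to be retried next cycle, 0 when the word has been read. */
int toy_step_read_word(toy_eu *c, int base, unsigned short *result)
{
    if (c->step == base) {               /* even step: post the request */
        if (!stub_request(c)) { c->cycles_op += 1; return 1; }
        ++c->step;
    }
    if (c->step == base + 1) {           /* odd step: fetch the result */
        if (!stub_read_result(c, result)) { c->cycles_op += 1; return 1; }
        ++c->step;
    }
    return 0;
}

/* Re-enter the step function once per clock; returns the retry count. */
int toy_run(toy_eu *c, unsigned short *result)
{
    int retries = 0;
    while (toy_step_read_word(c, 0, result)) {
        stub_biu_tick(c);
        ++retries;
    }
    return retries;
}
```

With a BIU that is busy for 2 clocks and a transfer that needs 4, toy_run retries 6 times and charges 6 cycles to cycles_op, one per failed attempt, which is the "1 cycle at a time" waiting described above.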

The same kind of request/result method (setting CPU[activeCPU].executed to 0 to not finish the instruction and make the BIU tick some) is basically used for any timing in the EU core.

For the EU core itself, see: https://bitbucket.org/superfury/unipcemu/src/ … /opcodes_8086.c

The basics are pretty simple: look up the handler for the opcode (CPU8086_OPxx), which will either handle the instruction using the functions mentioned above (the CPU_instructionstep* functions), or call the generic handlers that are shared by multiple instructions (the CPU_internal_* functions, e.g. for ADD/SUB/XOR/CMP/XCHG), as well as some misc instructions: XLAT, the string instructions, adjustment instructions (DAA/DAS/AAD/AAS), RET(F) instructions, INTO, LxS (LDS/LES) and far call (which is partly handled externally, in the 80386 protected mode handlers; see protection.c, at the end of function segmentWritten's else clause, mostly for compatibility with 80286+ segment writes and jumps/calls etc.).

Edit: So currently, at a metric cycle count of 1546 in 8088 MPH, I'm 114 cycles short. That's still quite a lot of cycles missing?

As can be seen in BIU.c, the 808X requests are pretty much always handled on T1, so a request made while the BIU is not at T1 yet is automatically delayed until T1 is actually reached (the EU might place the request, but the BIU will only start handling it when the state becomes T1 again, after finishing the prefetch operation or the previous request). Normally (as can be seen) the requests are finished by the EU after the BIU posts its result value (what's read, or 1 for writes), so the request should always get posted (although the BIU will finish the prefetch it's handling before getting to said request and clearing the buffer). And since the BIU will sleep the EU while it's handling previous instruction timings (e.g. the cycles from a MUL instruction), the EU won't start checking again until its cycles are fully handled, making it sync properly now (previously it didn't do this properly at the start of an instruction, which was an obvious bug).

Essentially it's like the EU (the client) talking to the BIU (the server). That's the way it's built. Of course the EU keeps track of its execution state using some simple counters (for different kinds of steps), which increase by 1 for request/response or by 2 with functions which simply deal with timings and delays (e.g. CPU8086_instructionstepdelayBIUidle). Those are split up as mentioned above, with separate counters for normal steps and modr/m-based steps.

Reply 176 of 176, by superfury

Thinking about it, can't I just run 86Box with some logging and compare it to UniPCemu's during the metric cycle count part? That would at least indicate to me what might be wrong?
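
A minimal sketch of such a comparison, assuming both emulators can be made to emit one record per executed instruction. The trace_entry layout and first_divergence are invented here; neither 86Box nor UniPCemu is known to produce this format, so the logging on both sides would have to be written to match.

```c
#include <stddef.h>

/* One logged instruction (assumed layout, to be produced by custom logging). */
typedef struct {
    unsigned long address; /* linear address of the instruction */
    unsigned long cycles;  /* cycles the instruction took */
} trace_entry;

/* Returns the index of the first entry where the two traces disagree,
   or n if the first n entries are identical. */
size_t first_divergence(const trace_entry *ref, const trace_entry *test, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (ref[i].address != test[i].address ||
            ref[i].cycles  != test[i].cycles)
            return i;
    }
    return n;
}
```

The first mismatching index points at the earliest instruction whose timing differs. Since each instruction also leaves the bus and prefetch queue in some state, the real cause may lie an instruction or two earlier than the reported one.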
