VOGONS


IBM PC Speaker RC values?


Reply 60 of 66, by superfury

Rank l33t++

I've made a small log of the 8088 MPH credits executing. It includes register states etc., and now also logs the cycles used by the different parts of each instruction:
https://www.dropbox.com/s/674e6s71smifkp3/deb … redits.zip?dl=0

Each instruction's information is dumped right after its disassembly.

Edit: Looking at the source code of your EU ( https://bitbucket.org/vstamate/cape-public/sr … U.cpp?at=master ), the rough workflow seems to be about the same as UniPCemu's. The main difference is that in your emulator, fetches to/from memory and delays are separated from the main execution flow, whereas UniPCemu does everything in the same order all at once. Your decode/rmmode/EA/imm cycles are handled in my CPU_readOP_prefix, combined with the results of CPU_readOP. The rest of your execution (read/delay/execute/write) is done in the function from the huge multi-CPU (8086-80586) lookup table, although in terms of cycle counts only. BIU prefetching is then applied to the unused cycles afterwards, instead of during execution/fetching.

My CPU core (close to cycle-exact; only the general CPU requirements are handled here, the rest, i.e. the basic execution/read/write timings, is in the 8086, NEC V30 (80186), 80286 and 80386 core files): https://bitbucket.org/superfury/unipcemu/src/ … cpu.c?at=master
My 808X core: https://bitbucket.org/superfury/unipcemu/src/ … 086.c?at=master

Can you see what's going wrong, vladstamate, reenigne, Jepael?

Edit: The CPU core could be adjusted to provide (semi-)exact cycle timings, but since the CPU has to wait for the BIU transfer to complete anyway (the only thing that can happen in parallel is prefetching?), that might be wasted effort. So the only difference might be that my emulator has 'faster' prefetching: the prefetch cycles (4 for each prefetched byte) are applied to the total time the BIU is idle, lumped together, instead of to each short idle period separately. For example, your emulator might do: 2 cycles decoding, 4 cycles fetching, 2 cycles delay, 2 cycles writing, 2 cycles delay. UniPCemu will simply do the equivalent of: 2 cycles decoding, 4 cycles fetching, [2 cycles writing, 4 cycles delay]. The total number of cycles is the same, but since UniPCemu has a single 4-cycle delay, it will fetch one byte into the prefetch queue, while your emulator won't prefetch at all. This is because UniPCemu doesn't emulate the read/write/execution phases separately.
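
To illustrate the difference, here is a minimal C sketch (not taken from UniPCemu or CAPE) that counts how many 4-cycle prefetches fit into the bus-idle periods of the two phase sequences above. The Phase type and count_prefetches are made-up names. It assumes that decoding and delay cycles leave the bus idle, that fetching and writing cycles keep it busy, and that a prefetch only starts if it fits completely into an idle run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One phase of an instruction as seen from the bus (hypothetical type). */
typedef struct {
    int  cycles;    /* length of the phase in CPU cycles */
    bool bus_busy;  /* true if the EU is using the bus during this phase */
} Phase;

enum { PREFETCH_CYCLES = 4 };  /* cycles per prefetched byte, as stated above */

/* Count the prefetches that fit entirely inside runs of consecutive idle
   phases. Simplification (an assumption of this sketch): a prefetch is
   never started unless the whole 4-cycle access fits in the idle run. */
int count_prefetches(const Phase *phases, size_t n)
{
    int fetched = 0;
    int idle_run = 0;

    assert(phases != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (phases[i].bus_busy) {
            /* A busy phase ends the current idle run. */
            fetched += idle_run / PREFETCH_CYCLES;
            idle_run = 0;
        } else {
            idle_run += phases[i].cycles;
        }
    }
    return fetched + idle_run / PREFETCH_CYCLES;
}
```

With the first sequence (2 decode, 4 fetch, 2 delay, 2 write, 2 delay) no idle run reaches 4 cycles, so the count is 0. With the lumped sequence the single 4-cycle delay leaves room for 1 prefetch.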

One thing that's got me wondering though is this: how does the BIU know when to prefetch into the prefetch buffer while the EU is busy? It cannot know whether to fetch a byte into the prefetch buffer, since it doesn't know how long the EU is still going to be busy with the execution phase. How is this synchronized?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 61 of 66, by superfury

Rank l33t++

Btw, vladstamate, since you've managed to get pretty much cycle-accurate 8088 emulation, have you figured out the exact cycle counts of (i)div and (i)mul? I have the mul instruction working afaik, but the (i)div formulas are still unknown. Have you managed to figure those out?


Reply 62 of 66, by vladstamate

Rank Oldbie
superfury wrote:

Btw, vladstamate, since you've managed to get pretty much cycle-accurate 8088 emulation, have you figured out the exact cycle counts of (i)div and (i)mul? I have the mul instruction working afaik, but the (i)div formulas are still unknown. Have you managed to figure those out?

No, I did not figure those two out. We should probably implement the algorithm that Scali (or was it reenigne) mentioned a while ago: basically iterative subtraction/addition with one operation per cycle, plus some setup cost.
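
A rough sketch of what such a model could look like in C, for the 8-bit unsigned DIV (AX divided by an 8-bit operand). This is only a guess at the shape of the algorithm, not the real 8088 microcode: it is a plain restoring division where each trial subtraction costs one cycle and each add-back costs one more. The names DivResult and div8_model and the setup_cycles parameter are hypothetical, and the resulting cycle counts are not measured timings.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  quotient;   /* would end up in AL */
    uint8_t  remainder;  /* would end up in AH */
    unsigned cycles;     /* modeled cycles, NOT measured 8088 timings */
} DivResult;

/* Restoring division of a 16-bit dividend by an 8-bit divisor.
   Assumed cost model: setup_cycles up front, 1 cycle per trial
   subtraction, 1 extra cycle whenever the divisor is added back. */
DivResult div8_model(uint16_t dividend, uint8_t divisor, unsigned setup_cycles)
{
    uint16_t rem = (uint16_t)(dividend >> 8);  /* high byte of the dividend */
    uint8_t  lo  = (uint8_t)(dividend & 0xFF); /* low byte, shifted in bit by bit */
    uint8_t  q   = 0;
    unsigned cycles = setup_cycles;

    /* Divisor 0 or a quotient above 255 would raise a divide error instead. */
    assert(divisor != 0 && rem < divisor);

    for (int i = 0; i < 8; i++) {
        rem = (uint16_t)((rem << 1) | (lo >> 7));
        lo  = (uint8_t)(lo << 1);
        q   = (uint8_t)(q << 1);
        cycles += 1;                 /* trial subtraction */
        if (rem >= divisor) {
            rem = (uint16_t)(rem - divisor);
            q |= 1;
        } else {
            cycles += 1;             /* restore: add the divisor back */
        }
    }

    DivResult r = { q, (uint8_t)rem, cycles };
    return r;
}
```

In this model the cycle count depends on the data: it is setup_cycles + 8 plus one extra cycle for every zero bit in the quotient. Whether the real chip behaves like this would have to be checked against measurements.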

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 63 of 66, by vladstamate

Rank Oldbie
superfury wrote:

Edit: The CPU core could be adjusted to provide (semi-)exact cycle timings, but since the CPU has to wait for the BIU transfer to complete anyway (the only thing that can happen in parallel is prefetching?), that might be wasted effort. So the only difference might be that my emulator has 'faster' prefetching: the prefetch cycles (4 for each prefetched byte) are applied to the total time the BIU is idle, lumped together, instead of to each short idle period separately. For example, your emulator might do: 2 cycles decoding, 4 cycles fetching, 2 cycles delay, 2 cycles writing, 2 cycles delay. UniPCemu will simply do the equivalent of: 2 cycles decoding, 4 cycles fetching, [2 cycles writing, 4 cycles delay]. The total number of cycles is the same, but since UniPCemu has a single 4-cycle delay, it will fetch one byte into the prefetch queue, while your emulator won't prefetch at all. This is because UniPCemu doesn't emulate the read/write/execution phases separately.

But here is the problem: it matters exactly when (in terms of which cycle) during an instruction's execution the EU decides it needs some data. Once that request has been made to the BIU, the BIU can no longer do any prefetches. It might have been lucky enough to squeeze one in before, but maybe not. Also, if the instruction needs data from the BIU before the actual execution, as in "AND [BX], AX", then the BIU can only fill the prefetch queue after that. And it might not get a chance before the next instruction.

superfury wrote:

One thing that's got me wondering though is this: how does the BIU know when to prefetch into the prefetch buffer while the EU is busy? It cannot know whether to fetch a byte into the prefetch buffer, since it doesn't know how long the EU is still going to be busy with the execution phase. How is this synchronized?

The BIU does not care what the EU is doing. Every T4 cycle it checks whether it has any valid requests from the EU. If it has any (like reads/writes/ins/outs), it does those (a byte at a time on the 8088); if not, it just issues a memory read and puts the byte in the prefetch queue. In my emulator the byte is available to the EU in the next cycle (T1).
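
As a minimal sketch of that decision in C (this is not the actual CAPE code; BiuModel, BusAction and biu_decide_at_t4 are names made up for illustration): at each T4 a pending EU request wins, otherwise a prefetch is started. The queue-full check is an extra assumption added here, using the 8088's 4-byte queue size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { BUS_IDLE, BUS_EU_ACCESS, BUS_PREFETCH } BusAction;

typedef struct {
    bool eu_request_pending;  /* EU wants a read/write/in/out */
    int  queue_len;           /* bytes currently in the prefetch queue */
} BiuModel;

enum { QUEUE_SIZE_8088 = 4 };  /* the 8088 prefetch queue holds 4 bytes */

/* Decide what the next bus cycle will be, evaluated at T4 of the current one. */
BusAction biu_decide_at_t4(const BiuModel *biu)
{
    assert(biu != NULL);
    if (biu->eu_request_pending)
        return BUS_EU_ACCESS;      /* EU requests take priority */
    if (biu->queue_len < QUEUE_SIZE_8088)
        return BUS_PREFETCH;       /* nothing to do for the EU: prefetch */
    return BUS_IDLE;               /* queue full (assumption of this sketch) */
}
```

The caller would then run the chosen bus cycle one byte at a time and, for a prefetch, push the byte into the queue so the EU can see it from the following T1.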


Reply 64 of 66, by reenigne

Rank Oldbie
superfury wrote:

One thing that's got me wondering though is this: how does the BIU know when to prefetch into the prefetch buffer while the EU is busy? It cannot know whether to fetch a byte into the prefetch buffer, since it doesn't know how long the EU is still going to be busy with the execution phase. How is this synchronized?

The BIU starts a prefetch operation if all of the following conditions hold:
1) It's not busy doing a non-prefetch bus operation (i.e. a bus operation initiated by the EU)
2) The prefetch queue is not full
3) The BIU isn't paused (the EU sets a flag to pause bus operations in some circumstances when it is known that the results won't be useful - in particular, during execution of a jump instruction).
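
The three conditions above can be written as a single predicate. This is only a sketch in C with hypothetical names (PrefetchState, can_start_prefetch); how and when the EU actually sets the pause flag is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool eu_bus_op_active;  /* 1) a bus operation initiated by the EU is in progress */
    int  queue_len;         /* 2) bytes currently in the prefetch queue */
    int  queue_capacity;    /*    4 bytes on the 8088 */
    bool paused;            /* 3) pause flag set by the EU, e.g. during a jump */
} PrefetchState;

/* True if the BIU may start a prefetch bus cycle now. */
bool can_start_prefetch(const PrefetchState *s)
{
    assert(s != NULL && s->queue_len <= s->queue_capacity);
    return !s->eu_bus_op_active
        && s->queue_len < s->queue_capacity
        && !s->paused;
}
```

Note that an idle bus and free space in the queue are not sufficient on their own: with the pause flag set, the predicate is false and no prefetch starts.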

Reply 65 of 66, by vladstamate

Rank Oldbie
reenigne wrote:

3) The BIU isn't paused (the EU sets a flag to pause bus operations in some circumstances when it is known that the results won't be useful - in particular, during execution of a jump instruction).

Oh that is interesting. I am not modeling that in CAPE. According to your logs from capturing the bus operations it seems the prefetch is cleared towards the beginning of the JMP execution. I did not know the BIU is also paused. However, of the 15 cycles that Intel advertises for JMP, the last 4 are just prefetching the next byte from the new location. So you have about 11 cycles of JMP execution. The first 3 cycles are spent reading from the prefetch queue (in an ideal world where everything IS in the prefetch queue already, as the Intel timings assume): the opcode and 2 bytes for the offset. That leaves 8 cycles for actual "execution", such as clearing the prefetch queue and setting the new IP. What you are saying is that the BIU is paused during those 8 cycles. Interesting.
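
The same cycle budget written out as a tiny C sketch, just to make the arithmetic explicit. The constant and function names are made up; the numbers are the ones quoted above, not new measurements.

```c
#include <assert.h>

enum {
    JMP_TOTAL_CYCLES      = 15, /* Intel's advertised figure quoted above */
    JMP_REFILL_CYCLES     = 4,  /* prefetch of the next byte at the new location */
    JMP_QUEUE_READ_CYCLES = 3   /* opcode + 2 offset bytes read from the queue */
};

/* Cycles left for the internal "execution" part of the instruction,
   i.e. the window during which the BIU would be paused. */
int internal_cycles(int total, int refill, int queue_reads)
{
    assert(total >= refill + queue_reads);
    return total - refill - queue_reads;
}
```

internal_cycles(JMP_TOTAL_CYCLES, JMP_REFILL_CYCLES, JMP_QUEUE_READ_CYCLES) gives 15 - 4 - 3 = 8.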

However the problem is the BIU must be only paused AFTER everything is in the processor, as it might happen (and it usually does) that the offset for example is not in the prefetch so the BIU has to work to get it. I'll update CAPE with this behavior. Nothing really happens during that time anyway (maybe DMA transfers? but those take 5 cycles per 8bit value so not many can happen during that time). Also I assume that applies to CALLs, FAR JMPs, FAR CALLs, etc.

I am not contradicting you, reenigne. You have gathered more data than I have; I am just thinking out loud.


Reply 66 of 66, by reenigne

Rank Oldbie
vladstamate wrote:

Oh that is interesting. I am not modeling that in CAPE. According to your logs from capturing the bus operations it seems the prefetch is cleared towards the beginning of the JMP execution. I did not know the BIU is also paused.

You can see this behaviour in action in this dump: http://reenigne.homenet.org/ri8JtfcEAYZzsiFF0.sniff.txt - look at the sequence of "JMP IP+00" instructions starting at (say) line 160. The prefetch queue is emptied on line 167, so the prefetching of the next instruction happens on lines 168-175. So we know there are two bytes in the queue at this point. The instruction starts executing at line 174 and a third prefetch starts at line 176. Now, when we get to line 180 we know there can't be more than 3 bytes in the prefetch queue (since that's how many we prefetched since the queue was emptied). At this point, the bus isn't busy (the third prefetch finished) and there's space in the prefetch queue but still no prefetch starts, so we know there must be some kind of "pause" mechanism in action to prevent a prefetch.

vladstamate wrote:

However the problem is the BIU must be only paused AFTER everything is in the processor, as it might happen (and it usually does) that the offset for example is not in the prefetch so the BIU has to work to get it.

Yep - it would be quite counterproductive of the CPU to pause prefetching before it's got all the bytes of the instruction it's working on!

vladstamate wrote:

I'll update CAPE with this behavior. Nothing really happens during that time anyway (maybe DMA transfers?

Yes, DMA transfers can still happen.

Emulating this pause behaviour will be necessary for correct timings, since if you continue to prefetch when you should be paused then the post-queue-empty prefetch might not start on the correct cycle.

vladstamate wrote:

Also I assume that applies to CALLs, FAR JMPs, FAR CALLs, etc.

I think so, though it's more difficult to prove it in cases where there could be 4 bytes in the prefetch queue (since we don't know precisely when bytes are removed from the queue).