UniPCemu cycle accurate 8088 implementation

Reply 40 of 198, by superfury

Posted on 2017-04-05, 08:30

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5822
Joined: 2014-03-08, 11:25
Location: Netherlands

vladstamate wrote:
superfury wrote:
Do the PIQ fetches (1 cycle/instruction byte) also need to be subtracted?

Yes. And I do that. I have the concept of "wasted cycles" I think I call them and those I subtract from final EU time.

superfury wrote:

Or do only the EA timings and memory timings need to be subtracted? Also, the references say X + EA cycles, so does only X need to be reduced by 4 cycles/memory access?

Yes. You need to subtract the 4 cycles/memory access from X.

So if I understand it correctly: the documentation says X (+EA) cycles. I ignore the EA part (it's already separated from the EU cycles), then subtract 4 cycles per memory access (for both bytes and words, since the X cycles are 8086 cycles; the second set for the 8088 is listed separately, just like the EA cycles), and also subtract 1 cycle for each instruction byte?

One complication: how many instruction bytes should be subtracted? The count (including ModR/M data and/or prefixes) differs per instruction. How many of those need to be subtracted for instructions using ModR/M parameters and prefixes? Is the only constant the opcode byte and the 0F extension byte?
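
A minimal sketch of the bookkeeping asked about above, assuming the documented count X already excludes the EA term. The function name and its parameters are hypothetical and come from neither UniPCemu nor CAPE. Whether fetched instruction bytes really have to be subtracted is the open question of this post, so that count is left as a parameter.

```c
#include <stdint.h>

/* Hypothetical helper: derive an EU-only cycle count from a documented
 * "X (+EA)" timing. Assumptions (not confirmed by the manual):
 *  - documented_cycles is X without the EA term;
 *  - each memory access accounts for 4 bus cycles inside X;
 *  - each instruction byte taken from the PIQ accounts for 1 cycle. */
static int32_t eu_only_cycles(int32_t documented_cycles,
                              int32_t memory_accesses,
                              int32_t fetched_instruction_bytes)
{
    int32_t cycles = documented_cycles
                   - 4 * memory_accesses
                   - fetched_instruction_bytes;
    return cycles < 0 ? 0 : cycles; /* never report negative time */
}
```

Under these assumptions, a documented 9-cycle timing with one memory access and two fetched bytes leaves 3 EU cycles; passing 0 for the last argument models the "only subtract memory cycles" variant.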

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 41 of 198, by vladstamate

Posted on 2017-04-05, 15:09

vladstamate Offline

Rank Oldbie

Rank: Oldbie
Posts: 967
Joined: 2015-08-23, 01:43

superfury wrote:
One complication: how many instruction bytes should be subtracted? The count (including ModR/M data and/or prefixes) differs per instruction. How many of those need to be subtracted for instructions using ModR/M parameters and prefixes? Is the only constant the opcode byte and the 0F extension byte?

There is no fixed number. In CAPE every case is treated separately and microcoded. So I count all the "fake cycles" in the different EU_PHASE_xxx phases and subtract them from the EU cycles value at the end (at the point where I have to wait out the execution, so that I wait less).

Prefixes are treated separately in my case (they generally include only the decode cycle, but not always). That is true for things like CS:, ES:, REPZ, WAIT, etc.

CAPE is not really a table based emulator so I do not have that data for you I am afraid.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 42 of 198, by superfury

Posted on 2017-04-05, 17:41

You say you subtract them from the EU cycles value at the end? You'll have a problem if those "fake cycles" exceed the number of cycles used in EU_PHASE_XXX. Won't it underflow into a huge or invalid (zero) number of cycles? Say 5 cycles used with 8 "fake cycles": you'll either get around 65533 cycles to spend (if the counter is an unsigned 16-bit value) or none (when using a simple <-operator comparison on a signed number).

I've just finished adjusting the BIU, ModR/M and stack handling of UniPCemu. It should now be able to handle (as an 8-bit BIU only, at the moment) all possible requests from the EU to the BIU. It works using two 1-entry queues: one for requests and one for responses (a response is 1 for write cycles and the value read for read cycles). The CPU fills the request queue with access requests. The BIU pops an entry off this queue and processes cycles (in parallel with the EU) until the MMU or bus I/O access is complete. Once it's complete, it pushes a response (1 for writes, the value read for reads) onto the response queue.

During reads/writes the EU first queues a request for an MMU/IO read/write and then idles 1 cycle at a time. Once the response buffer is filled, it pops the response (1 or the memory/bus value read) and continues execution. The stages are tracked with a simple increasing counter that records which execution state to return to (a chain of: if counter equals STEP1, else if counter equals STEP2, ..., else finish the instruction with 0 cycles, or delay the EU when needed). Each STEP* point calls a function that does something (queue a BIU request or collect a response) and adds some delay cycles (1 cycle while waiting for the request to complete or for room to add a request), or performs an action and delays for the actual execution cycles.

Using such a step-based system allows for the EU to roughly do the same as your version of those execution queues in your EU files, only it's done with one or a few counters in my case.
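
The two 1-entry queues and the step counter described above can be sketched roughly as follows. This is an illustration only, not UniPCemu's code: all type and function names are hypothetical, only reads are modelled, a bus access is assumed to take 4 cycles, and the "memory" simply returns address + 1 as placeholder data.

```c
#include <stdbool.h>
#include <stdint.h>

/* One-entry queue: holds at most a single pending value. */
typedef struct { bool full; uint32_t value; } OneEntryQueue;

static bool queue_push(OneEntryQueue *q, uint32_t v)
{
    if (q->full) return false;      /* caller must retry next cycle */
    q->value = v;
    q->full = true;
    return true;
}

static bool queue_pop(OneEntryQueue *q, uint32_t *out)
{
    if (!q->full) return false;     /* nothing available yet */
    *out = q->value;
    q->full = false;
    return true;
}

typedef struct {
    OneEntryQueue requests;         /* EU -> BIU */
    OneEntryQueue responses;        /* BIU -> EU */
    bool     biu_busy;
    int      biu_cycles_left;       /* cycles left in the active access */
    uint32_t biu_pending;           /* address being read */
} Bus;

/* Advance the BIU by one cycle (assumed 4 cycles per access). */
static void biu_tick(Bus *b)
{
    if (!b->biu_busy) {
        if (!queue_pop(&b->requests, &b->biu_pending)) return; /* idle */
        b->biu_busy = true;
        b->biu_cycles_left = 4;
    }
    if (b->biu_cycles_left > 0) b->biu_cycles_left--;
    /* Placeholder memory: the value read is address + 1. */
    if (b->biu_cycles_left == 0 &&
        queue_push(&b->responses, b->biu_pending + 1))
        b->biu_busy = false;
}

typedef struct { int step; uint32_t operand; bool done; } EuState;

/* Run one EU cycle of a hypothetical "read one operand" instruction. */
static void eu_tick(EuState *eu, Bus *b, uint32_t address)
{
    if (eu->step == 0) {            /* STEP1: post the read request */
        if (queue_push(&b->requests, address)) eu->step++;
    } else if (eu->step == 1) {     /* STEP2: idle until the response arrives */
        if (queue_pop(&b->responses, &eu->operand)) eu->step++;
    } else {                        /* last step: finish the instruction */
        eu->done = true;
    }
}

typedef struct { int cycles; uint32_t operand; } RunResult;

/* Drive EU and BIU in lock-step until the instruction completes. */
static RunResult run_read_instruction(uint32_t address)
{
    Bus bus = {0};
    EuState eu = {0};
    RunResult r = {0, 0};
    while (!eu.done) {
        eu_tick(&eu, &bus, address);
        biu_tick(&bus);
        r.cycles++;
    }
    r.operand = eu.operand;
    return r;
}
```

The point of the step counter is that `eu_tick` can return after every cycle and resume at the right stage on the next call, without a separate coroutine or execution queue.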

I'm still thinking about how to handle the interrupts etc. Things get a bit complicated when handling 80286+ interrupts using this method as well, as they can be nested.

Implementing that step-based system in the 8086 core is the only big thing still left to do to make it run more like your emulator (and more cycle-accurately in general). Currently the EU cycle counts of the instructions haven't been adjusted yet to exclude PIQ fetches or memory access cycles (4 per access, byte or word, since those are 8086 cycles afaik; 4 more cycles need to be added manually to get 8088 timings, which is currently done by adding to the cycles_MMUR, cycles_MMUW or cycles_IO variables). The EA timings have already been separated and moved back to the start of the 'execution phase' (for lack of a better term for the actual execution of the instruction itself, which is essentially everything after the prefetch phase; the prefetch phase consists of the opcode fetch, ModR/M fetch, parameter fetch, the 1-cycle idle timings during those, and the EA decode cycles). Currently, though, the EA decode is immediately absorbed into the first stage of the 'execution phase' instead of being separated so that the BIU can do a little work before instruction execution actually starts.

Edit: I've just modified the EA cycles to be consumed first by the BIU(and hardware), before starting the 'execution-phase'. Now the opcode-specific handler itself starts once the EA cycles(if any) have completed.

Reply 43 of 198, by vladstamate

Posted on 2017-04-05, 18:16

superfury wrote:
You say you subtract them from the EU cycles value at the end? You'll have a problem if those "fake cycles" exceed the number of cycles used in EU_PHASE_XXX. Won't it underflow into a huge or invalid (zero) number of cycles? Say 5 cycles used with 8 "fake cycles": you'll either get around 65533 cycles to spend (if the counter is an unsigned 16-bit value) or none (when using a simple <-operator comparison on a signed number).

Yes which is why I guard against that:

// now account for lost cycles
instr.cyclesToDelay -= instr.wastedCycles;
if (instr.cyclesToDelay < 0)
	instr.cyclesToDelay = 0;
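
To illustrate why the guard matters, here is a small sketch of the 5-used / 8-wasted case from the quoted question. The field types in CAPE are not shown in the thread, so the 16-bit unsigned and 32-bit signed types below, as well as the function names, are assumptions made purely for the example.

```c
#include <stdint.h>

/* Unsigned 16-bit subtraction wraps modulo 65536 instead of going negative. */
static uint16_t subtract_wrapping(uint16_t used, uint16_t wasted)
{
    return (uint16_t)(used - wasted);
}

/* Signed subtraction followed by a clamp, mirroring the guard above. */
static int32_t subtract_clamped(int32_t used, int32_t wasted)
{
    int32_t remaining = used - wasted;
    return remaining < 0 ? 0 : remaining;
}
```

With 5 cycles used and 8 wasted, the wrapping variant yields 65533 while the clamped variant yields 0, so the guard only works if the counter is a signed type.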

superfury wrote:
I'm still thinking about how to handle the interrupts etc. Things get a bit complicated when handling 80286+ interrupts using this method as well, as they can be nested.

Yes task switching in 286 in cycle mode was a big beast to handle as there was a lot of read some bytes, execute, write some bytes, execute some more, write some more, etc. Took me a whole release just getting that to be cycle based. And even now it is not dealing properly with some error codes. But it was a big achievement nonetheless.

superfury wrote:
Implementing that step-based system in the 8086 core is the only big thing still left to do to make it run more like your emulator (and more cycle-accurately in general). Currently the EU cycle counts of the instructions haven't been adjusted yet to exclude PIQ fetches or memory access cycles (4 per access, byte or word, since those are 8086 cycles afaik; 4 more cycles need to be added manually to get 8088 timings, which is currently done by adding to the cycles_MMUR, cycles_MMUW or cycles_IO variables). The EA timings have already been separated and moved back to the start of the 'execution phase' (for lack of a better term for the actual execution of the instruction itself, which is essentially everything after the prefetch phase; the prefetch phase consists of the opcode fetch, ModR/M fetch, parameter fetch, the 1-cycle idle timings during those, and the EA decode cycles). Currently, though, the EA decode is immediately absorbed into the first stage of the 'execution phase' instead of being separated so that the BIU can do a little work before instruction execution actually starts.

Edit: I've just modified the EA cycles to be consumed first by the BIU(and hardware), before starting the 'execution-phase'. Now the opcode-specific handler itself starts once the EA cycles(if any) have completed.

To be honest, subtracting 1 cycle per PIQ fetch will help you get more accurate results, but it matters less for overall accuracy than doing the reads/writes to memory at the correct cycles/points within an instruction's lifetime. From what you are saying, it sounds like you are already doing some work in that direction.

Reply 44 of 198, by superfury

Posted on 2017-04-05, 20:50

Thinking about it: Execution phase is probably the correct term. I've seen emulation of a CPU explained as being 'fetch-decode-execute', so fetching opcodes(my CPU_readOP_prefix function), decode(EA cycles) and my opcode handlers are essentially the execution phase.

Edit: It seems that I was right: https://en.m.wikipedia.org/wiki/Instruction_cycle

Also, wouldn't just the memory cycles need to be subtracted? I assume the manual's cycle counts don't include fetches or delays during fetching (which depend on the ModR/M length etc.), and the EA cycles are already separated. So only 4 cycles/memory access need to be subtracted, 8086 style (the manual gives the 8086 timings, after all; the only thing mentioned about the 8088 in that section is the need to add 4 cycles for each memory access on the 8088 with word transfers or odd physical memory addresses).

One thing I've changed about the BIU compared to earlier is that it now uses a basic MMU-style interface (segment selector index, offset, word/dword offset, plus the value to be written for write requests) instead of physical memory addresses. This is done to properly support things like address wrapping (on the offset side of things). Of course, unused parameters have been removed (the sub-byte index within a (d)word transfer and the instruction fetch indication, since fetching is done by the BIU itself on defaulted T4 cycles, i.e. empty requests).
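
The benefit of passing a logical address to the BIU can be sketched as below. The request layout and names are hypothetical (UniPCemu's actual structure is not shown in the thread); the sketch only assumes real-mode 8086/8088 addressing, where the 16-bit offset wraps inside the segment before the segment base is added.

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset, limited to the
 * 20 address lines of the 8086/8088. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Hypothetical request record: keeps the logical address instead of a
 * precomputed physical one. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
    uint8_t  byte_index;   /* 0 = low byte, 1 = high byte of a word */
} BiuRequest;

static uint32_t request_address(const BiuRequest *r)
{
    /* The uint16_t cast wraps the offset before the base is added. */
    uint16_t wrapped = (uint16_t)(r->offset + r->byte_index);
    return physical_address(r->segment, wrapped);
}
```

For a word access at offset 0xFFFF, the high byte then comes from offset 0x0000 of the same segment rather than from the byte after the segment's 64 KiB window, which a precomputed physical address plus one would get wrong.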

Last edited by superfury on 2017-04-05, 21:12. Edited 1 time in total.

Reply 45 of 198, by vladstamate

Posted on 2017-04-05, 21:05

I agree. But keep in mind that for some instructions (like a simple INT call on an 8088) the execution phase is not a single item. I break the INT execution phase into about 3 parts in CAPE: the part where we decide what interrupt number it is (is it an INTO? INT 3? INT X?), then the part where we instruct the BIU to write out the current CS/IP and also instruct the BIU to read the new CS/IP for us, and finally the last execution part where we set the new CS/IP. All of those are intermingled with a lot of bus operations, and the EU ends up waiting a good number of cycles for the BIU to be done before it can process the last part.

But that is just implementation detail of how I treat that.

Reply 46 of 198, by superfury

Posted on 2017-04-05, 21:44

What about execution phase exceptions(like DIV0)? When are they handled?

Reply 47 of 198, by vladstamate

Posted on 2017-04-05, 23:01

If a DIV by 0 exception happens then the following things occur:

1) the EU goes back to its initial phase, which is EU_PHASE_DECODE
2) in there I check whether there is an exception (which is the same workflow as described in the 8088 datasheet from Intel; I've basically implemented that diagram)
3) if there is an exception, I effectively create an instruction descriptor out of thin air that is an INT x (it is actually an exception instruction, but really it is an INT). The x is pre-populated, as the CPU should not need to fetch what interrupt type it is
4) then it gets executed as a normal interrupt

Other things like NMI or traps (INT 3) are treated more or less the same. This mechanism allows me to have cycle based interrupt/exception/HWinterrupt/NMI handling, which is nice.
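
The four steps above can be sketched as follows. Only the phase names EU_PHASE_DECODE and EU_PHASE_EXECUTE come from the post; the structures and functions are hypothetical stand-ins, not CAPE's actual code. Vector 0 is the x86 divide-error interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EU_PHASE_DECODE, EU_PHASE_EXECUTE } EuPhase;

typedef struct {
    bool    is_interrupt;  /* descriptor describes an INT-like instruction */
    uint8_t vector;        /* pre-populated interrupt number */
} InstrDescriptor;

typedef struct {
    EuPhase phase;
    bool    exception_pending;
    uint8_t exception_vector;
} Eu;

/* Called from the execute phase when a fault is detected (step 1). */
static void eu_raise_exception(Eu *eu, uint8_t vector)
{
    eu->exception_pending = true;
    eu->exception_vector  = vector;
    eu->phase = EU_PHASE_DECODE;       /* back to the initial phase */
}

/* Decode phase (steps 2-4): if an exception is pending, synthesize an
 * INT-like descriptor instead of decoding bytes from the queue. */
static bool eu_decode_exception(Eu *eu, InstrDescriptor *out)
{
    if (eu->phase != EU_PHASE_DECODE || !eu->exception_pending)
        return false;                  /* normal decode would run instead */
    out->is_interrupt = true;
    out->vector = eu->exception_vector; /* no fetch of the INT number needed */
    eu->exception_pending = false;
    eu->phase = EU_PHASE_EXECUTE;      /* executed like a normal interrupt */
    return true;
}

/* Convenience wrapper for the sketch: raise a divide error and decode it.
 * Returns the vector of the synthesized INT, or -1 if none was produced. */
static int demo_divide_error_vector(void)
{
    Eu eu = { EU_PHASE_EXECUTE, false, 0 };
    InstrDescriptor d = { false, 0xFF };
    eu_raise_exception(&eu, 0);
    return (eu_decode_exception(&eu, &d) && d.is_interrupt) ? d.vector : -1;
}
```

Because the exception becomes an ordinary descriptor, the same cycle-stepped interrupt path could serve software INTs, NMIs and hardware interrupts alike, which seems to be the advantage described in the post.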

Reply 48 of 198, by superfury

Posted on 2017-04-06, 06:20

Does the CPU (in your emulator and on real hardware) do this check for a zero divisor at the very first cycle of the execution phase, or is there some small checking time/delay after reading all the necessary information to start calculating (register/memory contents and/or divisor)? I assume not, since it always needs to check for a zero divisor?

Reply 49 of 198, by superfury

Posted on 2017-04-06, 14:00

I've improved the BIU a bit to be more accurate. Also, instead of reading the ModR/M data immediately, the CPU now tries (but fails for some reason) to use a 1-cycle waitstate function to fetch the data into a buffer before passing control to the internal handlers (which then run until they finish). This should already improve accuracy somewhat, but the Turbo XT BIOS v3.0 now fails at the end of the memory test with a System Board error 02.

Edit: This has to do with the new BIU fetching (read data phase of your emulator) it seems.

Reply 50 of 198, by vladstamate

Posted on 2017-04-06, 15:24

superfury wrote:
Does the CPU (in your emulator and on real hardware) do this check for a zero divisor at the very first cycle of the execution phase, or is there some small checking time/delay after reading all the necessary information to start calculating (register/memory contents and/or divisor)? I assume not, since it always needs to check for a zero divisor?

No, no. The zero divide flag (and other overflow flags) are triggered in the execute phase which is on cycle N. Then in cycle N+1 the CPU will go back to decode phase (due to exception triggered in cycle N).

Technically this is wrong because a real CPU will have to actually perform the operation first (the actual divide) which does not take 0 cycles. So the proper workflow would be:

Cycle N: start execution (EU_PHASE_EXECUTE)
Cycle N+EXECUTE_CYCLES: divide ended, divide by 0 flag set, CPU put in decode phase
Cycle N+EXECUTE_CYCLES+1: decode phase detects divide by 0 flag, prepares exception
Cycle N+EXECUTE_CYCLES+2: execution of INT-like instruction (the exception) starts

EXECUTE_CYCLES is however long it takes for the DIV instruction to do its work. I shall fix that, thank you for pointing it out.

EDIT: It seems I am already doing it properly and the handling of the exception (say from a divide by 0 from an AAM instruction) happens after the AAM finished.
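
The workflow listed above can be sketched as a tiny cycle-stepped state machine. The phase names and the function are hypothetical, and the cycle count passed in is an arbitrary placeholder, not a real DIV timing.

```c
#include <stdbool.h>

typedef enum { PHASE_EXECUTE, PHASE_DECODE, PHASE_INT_EXECUTE } Phase;

/* Returns the cycle (relative to N = 0) at which the INT-like
 * instruction starts executing, for a divide lasting execute_cycles. */
static int cycle_of_exception_start(int execute_cycles)
{
    Phase phase = PHASE_EXECUTE;
    bool divide_error = false;
    int remaining = execute_cycles;
    int cycle = 0;

    for (;;) {
        switch (phase) {
        case PHASE_EXECUTE:
            if (remaining > 0) { remaining--; break; } /* divide still busy */
            divide_error = true;       /* divide ended: flag set */
            phase = PHASE_DECODE;      /* CPU put back in decode phase */
            break;
        case PHASE_DECODE:
            if (divide_error)          /* decode detects the flag */
                phase = PHASE_INT_EXECUTE;
            break;
        case PHASE_INT_EXECUTE:
            return cycle;              /* exception starts executing here */
        }
        cycle++;
    }
}
```

For any EXECUTE_CYCLES value the function returns EXECUTE_CYCLES + 2, matching the last line of the listing.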

Reply 51 of 198, by superfury

Posted on 2017-04-06, 16:20

Having implemented the first stage (reading data from memory) to use the BIU instead of direct access changes 8088 MPH's cycle count to 1558 cycles (deviating 7%). Now I still need to modify the internal functions to use their own substep system to make the final bits accurate. All GRP opcodes still need to be modified as well. Things are slowly but surely getting closer to the real thing. Also, the clock cycles themselves (as in EU cycles) still have their old values, so those need to be modified as well once the stages are implemented, subtracting 4 cycles/memory access. The register timings would probably stay unmodified?

One last confirmation about the cycles to apply for the EU: do I need to subtract extra cycles for the fetching (1 cycle/instruction byte)? I would think not, because these differ for each opcode, while the 8086/8088 manual doesn't give the information for the separate opcodes themselves, just for the instructions?

Edit: Also, the stack handling needs to be updated(together with the remaining misc. instructions that don't use ModR/M parameters).

Reply 52 of 198, by superfury

Posted on 2017-04-06, 19:53

I've just implemented most (except the internal 8088 functions for general support and the I/O(BUS) functionality) to use the BIU for input/output. The instruction cycles are still unchanged. 8088 MPH now thinks it's running on a real 8088! 😁

Edit: Still some timing problems as far as I can see. The Delorean car is once again working without problems. Kefrens Bars are messing up, and so are the credits(crashing the demo, making it hang).

https://bitbucket.org/superfury/unipcemu/src/ … 086.c?at=master

It's the CPU8086_internal_* helper functions that are after the BIU helper functions that still need modification to use the BIU, using said helper functions. Also, a few opcodes and the GRP opcodes still need modification to implement the BIU into them. The CPU8086_internal_RET(F) instructions are already modified. Also, the calls to the functions MMU_rb/w and MMU_wb/w need to be adjusted to use the BIU instead in their remaining calls.

Strange that 8088 MPH detects the CPU as a true 8088 while it's using the EU execution phase timings from the manuals (and doubled timings for common instructions: 4 or 8 cycles/access extra for the instructions still using the old cycle style of the CPU8086_internal_* helper functions, which set the cycles themselves). Also, the (I)DIV instructions currently use 0 cycles, as far as I saw earlier while modifying them and implementing basic support for the DIV0 exception (strange, I thought I had implemented the max cycles, like (I)MUL). Interrupts still use the old timings as well, without using the BIU.

Reply 53 of 198, by superfury

Posted on 2017-04-07, 12:23

After fixing the Dosbox-style handling as well as the 80286 (which was crashing because of improper use of the BIU vs CPU and Dosbox-style cycles), it now gives a divergence of 1% with 1667 cycles. I still need to apply the new memory and I/O handling to the internal functions and IN/OUT instructions though. Also, the PUSH word instruction had a little bug on 80286+ CPUs causing it to post a request to the BIU and keep posting requests infinitely (always returning 0 instead of 1 when requested, thus causing the PUSH instruction to keep executing the requesting clock cycle over and over again).

Otherwise, it's still not accurate enough(MIPS 1.10 still reports diverging timings, while the IBM PC AT timings are reasonably constant for some reason?), since the timings have still to be adjusted.

Edit: I'm currently up to the XOR instructions in the 8086 internal opcodes. The CPU is still working correctly afaik. MIPS 1.10 reports as an average speed of 1.02.
Edit: Updated internal timings up to the non-algorithmic opcodes comment.
Edit: The only internal handlers left (apart from the opcode handlers themselves, and the 0xFF opcode as far as I saw when roughly patching it (the "JMP Mp" (0xFF /5) instruction)) are the string instructions, INTO, XLAT, XCHG and LXS (LDS, LES) internal handlers. After that I only need to patch the remaining opcode handlers, where needed, and maybe update the execution cycles to apply (I still need to check whether I already did this during implementation or while fixing bugs. Looks like I'll have to get my 8086/8088 manual again when I reach that point).
Edit: Implemented the 0xFF opcode that remained right away, so I don't have to look at the GRP opcodes anymore. Now only those mentioned remaining internal handlers and the remaining opcode handlers themselves are left.
Edit: MIPS 1.10 reports 0.93 general instructions, 1.45 integer instructions, 0.77 memory to memory(string instructions not done yet), 1.65 register to register, 0.92 register to memory and 1.02 performance rating. The IBM/AT 8MHz column gives the values 0.27, 0.23, 0.24, 0.21, 0.28 and 0.25 instead, which is oddly constant?
Edit: MOVSB(the most complicated instruction of the 8086 string instructions) has been implemented with steps. Now only the other string instructions need to be made like them(MOVSB/MOVSW).
Edit: Just implemented the other string instructions. 8088MPH cycle count decreases again and MIPS 1.10 now reports General Instructions 1.00! 😁
Left: (INTO,) XLAT, XCHG and LXS(LDS, LES) instructions.

Reply 54 of 198, by superfury

Posted on 2017-04-08, 17:32

I've implemented those remaining cases (the remaining opcodes being A0/A1 only). Now 8088MPH reports 1633 cycles, although the EU timings haven't been reduced yet (4/memory access). Also, the port I/O isn't redirected to the BIU yet.

Reply 55 of 198, by superfury

Posted on 2017-04-08, 19:52

I've just implemented the port I/O BIU handling and cleaned up the 8086 BIU handlers. Now 8088 MPH reports a metric cycle count of 1634? It keeps going down instead of up, even though it should slow down instead of speed up? Do BIU cycles still continue during execution phase of the EU?

Edit: Just tried 8088 MPH. It strangely slows down instead of speeding up?
Edit: Is this just because I'm still using the EU timings described in the 8086/8088 user manual?
Edit: 8088 MPH crashes when starting the credits.

These are the current timings (as currently implemented, according to the 8086/8088 user manual):
https://bitbucket.org/superfury/unipcemu/src/ … 086.c?at=master

Look for cycles_OP for the used EU timings for each instruction.

All EU timings were directly taken from: http://matthieu.benoit.free.fr/cross/data_she … sers_Manual.pdf

Should I subtract 4 cycles/memory access (each byte/word memory access done during the instruction)? Would that increase the 8088 MPH speed to its correct speed?
Edit: I've just restored the documented timings from the user manual to the IN/OUT instructions, while applying the new defines for applying memory/IO cycles to it(10 or 8 cycles).

Reply 56 of 198, by vladstamate

Posted on 2017-04-08, 23:00

superfury wrote:

Do BIU cycles still continue during execution phase of the EU?

Yes. If nothing else the BIU will be busy prefetching from CS:IP. If you mean does the CPU execute some part of the instruction while it is waiting for the BIU to do some work I do not know about that. I work under the assumption that the CPU will need all the data (all sources) available before starting the execution of said instruction.

That being said, a good number of instructions do BIU work at the end for writing out results. In CAPE, in that situation, the EU is simply waiting.

Reply 57 of 198, by superfury

Posted on 2017-04-08, 23:34

What about jumps? If a jump is taken(16 cycles), does that mean the prefetch is fully filled(16/4=~4 bytes(or 3) fetched at the new address, 3/4 depending on the starting T cycle of the execution phase)? If it only will reload one byte, how long (in cycles) is the prefetch stalled? In this example, is it always stalled for 12 cycles, or is it using some kind of formula depending on the T-state?

Reply 58 of 198, by superfury

Posted on 2017-04-09, 00:53

I've just modified the writeback to not start until the execution phase is finished. 8088 MPH now reports 1568 cycles(no 4 cycles/memory access applied yet.).

Also, does the CPU disable prefetching after a jump until the next instruction starts its fetch phase? So 16 cycles disabled with most jumps?

Reply 59 of 198, by superfury

Posted on 2017-04-09, 12:57

I've subtracted 4 cycles/bus access from all instructions and their variants that use them, but I still get a metric cycle count of 1542? Why is this so low?

Edit: Also, the (I)DIV timings have been reimplemented(they were lost at some point during conversion).
Edit: The demo is ridiculously slow now and the Delorean car has about 1 disappearance(one wave of background lines from bottom to top(covering the entire sprite halfway), on even/odd lines) each second. The Kefrens Bars mess up as usual:

The attachment 358-KefrensBars_20170409_1445.jpg is no longer available

Also, the music hangs with a high note(1000Hz?) during the 3D pyramid? Becomes silent during the faces of the creators of 8088 MPH, then a reasonably soft high tone while it's hanging at the credits.
