VOGONS


UniPCemu 8088 cycle accuracy

Reply 80 of 122, by superfury

Rank l33t++
GloriousCow wrote on 2023-08-07, 14:15:
superfury wrote on 2023-08-06, 11:52:

Just implemented an extra timer on the BIU emulation of the 808x:
- When T3 is ticked (proceeding onto T4), it will set a flag if the prefetch isn't empty.
- When T1 arrives to tick and either said flag is set, or no request is made, an additional check is made before checking the requests from the EU:
-- (in both cases below the above flag is cleared, preventing it from retriggering until after the EU request finishes after this)
-- PIQ not full? Perform a prefetch instead.
-- PIQ full? Perform 4 idle clock cycles instead (thus taking T1-T4 cycles with idle bus).

Bear in mind I model this as a state we enter; not a one time event.

It's perhaps most noticeable in string instructions with REP prefixes, for example a REP MOVSB will incur the 3 cycle delay after the first R/W iteration as the queue fills up. Once full, we now delay 3 extra cycles per iteration (effectively 18% of iteration time) because we do not 'resume' the BIU until a byte is actually read out of the queue again, which it won't until the operation is complete. There are other possible ways to model this logic - perhaps you could assume that the prefetcher still schedules on a full queue, and so the delay is explained by a fetch attempt each time instead of specifically delaying EU operations - then you wouldn't need to track state. I don't yet know how to test which underlying theory is correct.

I don't know if you can just add these delays into your code easily, since you are likely accounting for them in some sort of static cycle count already. To properly emulate all the 8088 bus delays, i think one pretty much has to model the microcode execution time exactly and let the BIU delay logic fill in the rest.

Also, it won't set the flag if the current operation is a prefetch operation. Otherwise, any prefetch would ignore the EU requests until the PIQ is fully filled, which shouldn't happen.

Currently, on the 8088, the two byte accesses of a word access aren't interrupted by PIQ fetching (even if a prefetch is pending), since the word access is considered atomic. So it's T1-T4 of byte +0, followed by T1-T4 of byte +1, followed by either one guaranteed T1-T4 prefetch or 3 idle cycles (if the buffer is full).
If T1 arrives without any request from the EU, a prefetch will always be attempted (T1-T4 if not full, 1 idle cycle otherwise).

If requests from the EU keep coming each cycle, it will flip-flop between prefetching a byte and transferring a pending request byte, back and forth between the two.
Once the EU stops requesting before/at T1, the PIQ starts doing requests on T1 until it's filled, at which point the PIQ simply ticks 1 cycle at a time, waiting for it to empty or for a request to be made by the EU. The EU gets priority, unless it made the last T1-T4 transfer and is asking for another byte; in that case it's interrupted by either a prefetch or a 3-cycle stall once a byte is transferred.
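
To make the T3 flag and the T1 decision above a bit more concrete, here's a minimal Python sketch. It only models the decision itself (EU transfer, prefetch or idle), not the cycle counts. The class and method names are made up for illustration and aren't UniPCemu's actual code; the 4-byte queue size is the 8088's.

```python
from dataclasses import dataclass


@dataclass
class BIUArbiter:
    """Hypothetical model of the T3 flag / T1 decision described above."""
    queue_len: int = 0                # bytes currently in the prefetch queue
    queue_size: int = 4               # the 8088 prefetch queue holds 4 bytes
    prefetch_pending: bool = False    # the flag that is set when T3 is ticked

    def on_t3(self, current_is_prefetch: bool) -> None:
        # The flag is never set by a prefetch itself, so a prefetch
        # can't keep EU requests away until the queue is full.
        if not current_is_prefetch and self.queue_len > 0:
            self.prefetch_pending = True

    def on_t1(self, eu_request: bool) -> str:
        # Flag set, or nothing requested: the prefetcher goes first.
        if self.prefetch_pending or not eu_request:
            self.prefetch_pending = False
            if self.queue_len < self.queue_size:
                return "prefetch"
            return "idle"
        return "eu"


biu = BIUArbiter(queue_len=1)
biu.on_t3(current_is_prefetch=False)   # an EU transfer reaches T3
print(biu.on_t1(eu_request=True))      # the prefetch gets the bus first
biu.on_t3(current_is_prefetch=True)    # the prefetch reaches T3, no flag
print(biu.on_t1(eu_request=True))      # now the EU request is serviced
```

With EU requests pending on every T1, this alternates between "prefetch" and "eu", which is the flip-flop behaviour described above.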

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 81 of 122, by superfury


I've just added a neat little feature to UniPCemu's softdebugger: software triggered debugger logging.

So software can instruct the debugger to start logging at a specified point in execution and instruct it to stop logging when needed.

The basic protocol to set it up is done using a normal softdebugger command. That command sets the debugger mode and after how many instructions to start (including the initial arming read).

So the protocol is pretty simple:
- First start the softdebugger to accept commands (using port E9h).
- Then execute a command on port EAh to arm the debugger logging. When executing said command, the parameters tell it what debugger logging mode to use (see settings file for the values) and after how many instructions (including the starting trigger) to start.
- The result of the command will tell if the debugger logging mode is armed(1) at port E9h or fully disarmed(0).
- Read port E9h when armed to start counting down (it's armed to countdown mode). Said read also counts one on the counter and will start the debugger if it counts down to 0.
- At this point, further reads from port E9h will disarm the countdown if counting instead and terminate the logging after the read instruction from port E9h.
- Instructions executed tick the counter when armed. Once it hits 0, the next instruction executing will have said debugger logging mode enabled.

So that way, the software can tell the softdebugger module to use, for example, a timeout of 3 (1 for the IN instruction to port E9h and 2 for the cleanup and RET instructions) and a valid debugger logging mode.

Then, once it's ready to start logging:

push ax
in al,0xE9 ;Start the debugger logging countdown!
pop ax ;First instruction ticked.
ret ;Second instruction ticked. After this, the debugging starts.

Then, once it's back at the main module and ready to terminate logging:

push ax
in al,0xE9 ;Stop the debugger logging, if active, or abort the countdown!
pop ax ;This isn't logged anymore.
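
The arming/countdown protocol above can be sketched as a small state machine. This is a simplified Python model at instruction granularity, not the softdebugger's real implementation: the class name, the state names and the rule that any mode above 0 is valid are assumptions made for the example.

```python
class SoftDebuggerLogTrigger:
    """Hypothetical model of the software-triggered logging countdown."""

    def __init__(self):
        self.state = "disarmed"   # disarmed -> armed -> counting -> logging
        self.mode = 0
        self.counter = 0

    def arm(self, mode: int, count: int) -> int:
        """Model of the arming command; returns 1 (armed) or 0 (disarmed)."""
        if mode <= 0 or count <= 0:   # assumption: what counts as "valid"
            self.state = "disarmed"
            return 0
        self.state, self.mode, self.counter = "armed", mode, count
        return 1

    def _tick(self) -> None:
        self.counter -= 1
        if self.counter == 0:
            self.state = "logging"    # takes effect on the next instruction

    def step(self, reads_e9: bool = False) -> bool:
        """Execute one instruction; return True if that instruction is logged."""
        logged = self.state == "logging"
        if reads_e9:
            if self.state == "armed":
                self.state = "counting"
                self._tick()          # the triggering read counts as one
            elif self.state in ("counting", "logging"):
                self.state = "disarmed"
        elif self.state == "counting":
            self._tick()
        return logged


trigger = SoftDebuggerLogTrigger()
trigger.arm(mode=1, count=3)
trace = [trigger.step(reads_e9=True),   # in al,0xE9
         trigger.step(),                # pop ax
         trigger.step(),                # ret
         trigger.step()]                # first instruction after the ret
print(trace)
```

With a count of 3, the IN, POP and RET all go unlogged and the first instruction after the RET is the first one logged. A later read of port E9h is itself still logged and ends the logging afterwards.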


Reply 82 of 122, by GloriousCow

Rank Member
superfury wrote on 2023-08-10, 20:13:

I've just added a neat little feature to UniPCemu's softdebugger: software triggered debugger logging.

So software can instruct the debugger to start logging at a specified point in execution and instruct it to stop logging when needed.

The basic protocol to set it up is done using a normal softdebugger command. That command sets the debugger mode and after how many instructions to start (including the initial arming read).

So the protocol is pretty simple:
- First start the softdebugger to accept commands (using port E9h).
- Then execute a command on port EAh to arm the debugger logging. When executing said command, the parameters tell it what debugger logging mode to use (see settings file for the values) and after how many instructions (including the starting trigger) to start.
- The result of the command will tell if the debugger logging mode is armed(1) at port E9h or fully disarmed(0).
- Read port E9h when armed to start counting down (it's armed to countdown mode). Said read also counts one on the counter and will start the debugger if it counts down to 0.
- At this point, further reads from port E9h will disarm the countdown if counting instead and terminate the logging after the read instruction from port E9h.
- Instructions executed tick the counter when armed. Once it hits 0, the next instruction executing will have said debugger logging mode enabled.

This is a good feature. You can also build a loader using Int 21h, AH==4Bh, AL==01h to load a program and then start your logger before jumping to the new code segment

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 83 of 122, by superfury


Just added a new interrupt vector (FEh) with functions to terminate the emulator and send a command to the debugger.
The sending of the command to the debugger only sets the carry flag on error though; no result code is given. No other registers are changed either (all are saved and restored using the stack).

This should make it easier to implement the debugger logging functionality etc. when I start on that (which is easy now). Result codes aren't implemented yet, though (the BIOS COM ROM INT FEh handler doesn't support reading them yet).

Edit: Implemented the full INT FEh 00h service to allow for proper result stage parsing on the debugger (unverified to be working though).

I'll still need to add the debugger logging command call though.


Reply 84 of 122, by GloriousCow


I've changed a bit of how I handle the BIU state logic. In my previous post, I talked about the BIU "stalling" when the queue is full - I now no longer use this terminology; but the behavior is essentially the same. I now call this the BIU "Idle" state. I'll have a writeup about this soon.


Reply 85 of 122, by superfury


After a lot more testing, I finally managed to get the entire softdebugger interface to work: interrupt FEh function 00h to send commands to the softdebugger, function 4Ch to terminate for real (which uses function 00h), and INT 20h (which is also called directly by INT 21h function 4Ch), which calls interrupt FEh function 4Ch to terminate the app.

The logging is a bit weird somehow, though? I only see some of the test logs coming through (using the INT21h and INT10h teletype and text services).
INT10h seems to go through, but INT 21h interface seems to do nothing for some weird reason (other than ofc terminate using function 4Ch)...

Edit: Found the issue! The INT 21h vector wasn't hooking correctly, incorrectly being hooked as the INT FEh vector instead.
Fixing that makes everything run as it should again! 😁
At least for the test COM program that's within UniPCemu's repository.

But at least it's fully automated now: just run the UniPCemu emulator and it will run the tests, log in the specified logging mode and then close the emulator when it's finished. Logging stops once it detects the termination call inside the INT 20h/INT FEh (function 4Ch) handler, at the very first point after its specific jump to the start of its termination routine.

The example ROM (inside UniPCemu, it's a simple COM ROM test routine that just calls the various supported interrupt functions to test logging etc.):

Attachment: debugger_AutomaticLogging_BIOSCOMROM_20230814_2219.7z (9.69 KiB, 33 downloads)
File comment: Example log for the test function log, all automatic (except starting UniPCemu itself and basic UniPCemu configuration of how to log exactly (registers etc.)).
File license: Fair use/fair dealing exception

Also made a 8088 MPH instruction mix (the COM program from the earlier assembler code):

Attachment: debugger_8088MPHinstructionmix_20230814_automaticlog.7z (15.39 KiB, 30 downloads)
File comment: 8088 MPH instruction mix running inside UniPCemu using the BIOS COM ROM.
File license: Fair use/fair dealing exception

Also made using the BIOS COM ROM of course (so just a surplus of two extra instructions being logged (the INT 20h at CS:0 and the IN instruction that terminates the logging)).

It's still weird that the 8088 seems to run fine, as does the 80186? But the 286 and up (perhaps some of the newer instructions) crash in weird ways for unknown reasons, although they can still boot MS-DOS and run part of the 286+ software (vdiag for the Tseng Labs ET4000/W32 crashes midway through the final ACL test while rendering (using reversed order directly after drawing a quarter screen))?
Edit: OK. I see it executing opcode 0F85, but instead it's just executing 808x opcode 85h? That isn't supposed to happen?
Edit: Woo... Interesting! It's executing said opcode, reads the (special-cased) opcode 'prefix' 0F, then never finishes the 0F prefix parsing correctly, moving on to the normal 85h opcode handling (which is a TEST instruction 😖).

Edit: OK. I see it detecting the special 0F opcode 'prefix'. It's not counted as a prefix though, but as a special opcode byte that extends the normal opcode byte as a 'high' byte of sorts (think opcode OR (0F shl 8), except the 0F opcode is 01h instead).
But when trying to read the actual 0F opcode (the byte following 0F), it's peeking inside the PIQ to check if it needs to block the loading. Said peeking overwrites the 0F opcode prefix that's used for detection (which isn't expected by the caller), thus effectively aborting the 0F opcode prefix and replacing the 0F 'prefix' with the escaped opcode instead (in this case 85h). And opcode 85h (instead of 0F85h) is a TEST instruction instead of a proper JNZ rel16/32 opcode! That wasn't expected!
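
The failure mode above comes down to a peek helper that has side effects on decoder state. As a rough Python sketch of the intended behaviour (for the 286+ case where 0F is an escape byte): the peek only looks at the queue, and the "0F seen" state survives a stall while waiting for the second byte. All names here are hypothetical and this isn't UniPCemu's decoder. It also uses a plain `0F00h | opcode` encoding for readability rather than the internal encoding mentioned above.

```python
from collections import deque


class OpcodeDecoder:
    """Hypothetical two-byte opcode decoder with a side-effect-free peek."""

    def __init__(self, code=()):
        self.piq = deque(code)     # stand-in for the prefetch queue
        self.escape_0f = False     # set once the 0F byte has been consumed

    def peek(self):
        """Look at the next queued byte without touching any decoder state."""
        return self.piq[0] if self.piq else None

    def decode(self):
        """Return the full opcode, or None if the queue can't supply it yet."""
        if not self.escape_0f:
            if self.peek() is None:
                return None
            opcode = self.piq.popleft()
            if opcode != 0x0F:
                return opcode
            self.escape_0f = True
        if self.peek() is None:
            return None            # stall; escape_0f stays set for the retry
        self.escape_0f = False
        return 0x0F00 | self.piq.popleft()


decoder = OpcodeDecoder([0x0F])
print(decoder.decode())            # None: second byte not fetched yet
decoder.piq.append(0x85)
print(hex(decoder.decode()))       # 0xf85 (JNZ), not 0x85 (TEST)
```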

The 440fx BIOS still executes an invalid INT opcode to location 0000:0000, which obviously shouldn't happen.
Edit: Hmmm... It still happens reverting the CPU files back to commit 10ed6f35843ed99f4c9c27189195e3032c4fa4b0 ?
The proper final commit of 2023/06/13 still has it running.
So far have managed to narrow it down to the entire CPU emulation changes and DMA S-states being handled only.
Edit: Found an issue with DMA cleaning up its waitstate handling for the next accesses. Other than that, DMA seems clean.
So it's just the CPU itself now that's left to fully check.
Edit: Just got the EU new timings for fetching instructions and pretty much the entire BIU changes left. Somewhere in there there's still the error, if it's still left.

Edit: And with the new BIU active (and the EU not handling the new way), it crashes in the specified 0000:0000 interrupt way.

So there's definitely something wrong in the BIU's latest code. Probably with reading and/or writing 32-bit operands (assuming 16-bit operands don't fail as well, since 808x does boot)?
The EU seems to be functioning properly with the BIU of the 13/07 commit (with some of the new EU-related additions of the new EU instruction fetching algorithm added).
Once the new BIU (mainly the changed ticking of T3 and waitstates etc.) is used, the i440fx BIOS starts to crash on the invalid interrupt to 0000:0000.
Edit: Hmm... With the latest bugfix, it no longer crashes that way, instead performing a CLI HLT hang before it displays anything? So there's definitely something wrong in the BIU now?


Reply 86 of 122, by GloriousCow


Hi superfury, are you familiar with Tom Harte's CPU test suites?

https://github.com/TomHarte/ProcessorTests/

they are a set of JSON files with initial and ending state information for single instructions, including cycle state information.
I'm working on generating a set of these for the 8088. Do you think it would be a lot of work for UniPCemu to utilize them? It would be a good way to validate your 8088 implementation.

I posted an issue that has an example 8088 test so you can see what you'd be working with:

https://github.com/TomHarte/ProcessorTests/issues/47
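
To give a rough idea of what a harness for this kind of test looks like, here's a minimal Python sketch. The field names ("name", "bytes", "initial", "final", "regs") and the single test record are invented for this example and are not necessarily the schema of the real test files. The "CPU" is a stand-in that only knows NOP.

```python
import json

# Made-up test record, only for illustration (144 = 90h = NOP).
TEST_JSON = """
[{"name": "nop", "bytes": [144],
  "initial": {"regs": {"ip": 256, "ax": 4660}},
  "final":   {"regs": {"ip": 257, "ax": 4660}}}]
"""


def run_instruction(regs, code):
    """Stand-in for an emulator core: executes one instruction (NOP only)."""
    regs = dict(regs)
    if code == [0x90]:
        regs["ip"] = (regs["ip"] + 1) & 0xFFFF
    return regs


def run_tests(text):
    """Load initial state, run one instruction, compare with final state."""
    failures = []
    for test in json.loads(text):
        result = run_instruction(test["initial"]["regs"], test["bytes"])
        if result != test["final"]["regs"]:
            failures.append(test["name"])
    return failures


print(run_tests(TEST_JSON))   # names of failing tests, empty when all pass
```

A real harness would also have to load and compare memory contents and the per-cycle bus state, which is where the cycle-accuracy validation comes from.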


Reply 87 of 122, by superfury

GloriousCow wrote on 2023-08-20, 23:37:

Hi superfury, are you familiar with Tom Harte's CPU test suites?

https://github.com/TomHarte/ProcessorTests/

they are a set of JSON files with initial and ending state information for single instructions, including cycle state information.
I'm working on generating a set of these for the 8088. Do you think it would be a lot of work for UniPCemu to utilize them? It would be a good way to validate your 8088 implementation.

I posted an issue that has an example 8088 test so you can see what you'd be working with:

https://github.com/TomHarte/ProcessorTests/issues/47

Well, in this case I know exactly where the issue is: it's in the BIU itself, either in the instruction fetching (whose settings are already correct as validated by the sandsifter/baresifter projects against real CPU results, though unvalidated on the latest commits since last release) or a weird case issue with the BIU handling memory reads/writes.
Luckily both are easy to verify with using simple shifting/write/readback tests afaik.

So I made exactly that:
https://bitbucket.org/superfury/unipcemu/src/ … bly/biutest.asm

It's a simple program that writes a test pattern to memory (at different memory offsets for each pattern type used for the final test), then proceeds to read it back in different quantities (performing the register jumbling for testing using a simple xchg for now, although a simple xor with ff/ffff/ffffffff should also work). It performs each pattern a total of 4 times in a loop to validate things like unaligned offsets as well (this is all done using ecx/cx, or bx in the special 8-bit immediate tests).
I've also added an extra test that performs the same kind of tests, but using 8/16/32-bit modr/m immediates on top of 8/16/32-bit immediate values to test those as well (using ecx for 32-bit and bx for 8/16-bit to obtain the different modr/m immediate modes to combine with the immediate size under test).

At the end it will simply print a Y or N with a simple 0xD newline (Carriage Return) to indicate it's finished with success or failure.
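
The idea behind this kind of write/readback test can be sketched in Python as follows. This isn't a translation of biutest.asm; the memory classes, the pattern value and the base address are assumptions picked for the example. It writes a 32-bit pattern at 4 consecutive offsets (so unaligned ones are covered) and checks that byte, word and dword reads all agree.

```python
class FlatMemory:
    """Little-endian memory with byte/word/dword accesses (reference model)."""

    def __init__(self, size=256):
        self.data = bytearray(size)

    def write(self, addr, value, size):
        self.data[addr:addr + size] = value.to_bytes(size, "little")

    def read(self, addr, size):
        return int.from_bytes(self.data[addr:addr + size], "little")


class SwappedReadMemory(FlatMemory):
    """Deliberately broken bus: multi-byte reads come back byte-swapped."""

    def read(self, addr, size):
        return int.from_bytes(self.data[addr:addr + size], "big")


def biu_pattern_test(bus):
    pattern = 0x12345678
    for offset in range(4):                      # aligned and unaligned
        base = 0x20 + offset
        bus.write(base, pattern, 4)
        from_bytes = sum(bus.read(base + i, 1) << (8 * i) for i in range(4))
        from_words = bus.read(base, 2) | (bus.read(base + 2, 2) << 16)
        from_dword = bus.read(base, 4)
        if not (from_bytes == from_words == from_dword == pattern):
            return "N"
    return "Y"


print(biu_pattern_test(FlatMemory()))          # Y
print(biu_pattern_test(SwappedReadMemory()))   # N
```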
Edit: It seems it needs some fixing. Dosbox-X gives N as a result, which shouldn't happen?
Edit: After the latest bugfixes in the BIU test program, Dosbox-X has interesting behaviour on it. Instead of properly giving Y or N, it results in "[Y/N]?" being asked, even though the program never asks for any (it just sends Y or N followed by a carriage return to the output using interrupt 21h function 02h).

UniPCemu runs both it and the BIOS COM ROM fine now.

Edit: It reports "Y"! So the basic BIU tests using the BIU seem to have succeeded, at least on the BIOS COM ROM executing said COM file! 😁
Edit: But the 440fx still crashes before starting its memory test, executing an INT to a 0000:0000 vector for some weird reason?

Edit: Also, as for your question, it'll be difficult. It's dynamic JSON data, and UniPCemu can't parse it and probably never will. UniPCemu is a compiled x86/PSP/Android/Windows/'whatever else' executable (the CPUs and platforms it runs on aren't predetermined, other than the basic supported SDL requirements), and it would require me to either implement or use an external full JSON parser, neither of which I'd like to do (due to complexity and simply because of unportability).


Reply 88 of 122, by superfury


I've just managed to get the new BIU states implemented (3 cycles to switch between EU, PIQ and idle states in parallel to normal transfers), as mentioned in your article on the BIU's 3-modes machine.

It's now taking 3 parallel cycles (executed in parallel to the normal T-states) for every mode change. It performs the mode changes on T3 (for most cases) and on T1 when in idle mode.

8088 MPH now jumps up to the 1903 cycle count, which is way too heavy?

The 16/256 color noise moved to the clock after the | of 16 (so the clock with the grey following the clock of the white of the longest side of the '1' (as in '| roughly in pixels).
Although another appears after the final clock of the bottom line of the '6' (the bottom scanline of it, at the position of the single light grey 256-color 'pixel'), vertically on about every 1/4th screen at fixed intervals, jumping up and down 1 character.

Racing the beam shows a black screen only. So that's going horribly wrong?


Reply 89 of 122, by GloriousCow

superfury wrote on 2023-08-26, 13:45:

I've just managed to get the new BIU states implemented (3 cycles to switch between EU, PIQ and idle states in parallel to normal transfers), as mentioned in your article on the BIU's 3-modes machine.

It's now taking 3 parallel cycles (executed in parallel to the normal T-states) for every mode change. It performs the mode changes on T3 (for most cases) and on T1 when in idle mode.

It's not 3 cycles for every state change. It's 3 to go from Idle to PF or EU, only 2 to go from PF<->EU


Reply 90 of 122, by superfury

GloriousCow wrote on 2023-08-26, 14:48:
superfury wrote on 2023-08-26, 13:45:

I've just managed to get the new BIU states implemented (3 cycles to switch between EU, PIQ and idle states in parallel to normal transfers), as mentioned in your article on the BIU's 3-modes machine.

It's now taking 3 parallel cycles (executed in parallel to the normal T-states) for every mode change. It performs the mode changes on T3 (for most cases) and on T1 when in idle mode.

It's not 3 cycles for every state change. It's 3 to go from Idle to PF or EU, only 2 to go from PF<->EU

And PF/EU->idle?

Edit: Just corrected the EU<->PF and EU/PF<->idle ticks. They now tick 2 cycles(EU<->PF) and 3 cycles(EU/PF<->idle) respectively.

8088 MPH now drops down to 1715 cycles again (2%).
Edit: 1706 cycles (2%) with the following timings:
- Idle to PF/EU: 3 cycles
- Idle to idle: 0 cycles
- PF/EU to idle: 3 cycles
- PF to EU or EU to PF: 2 cycles.
- Unchanged state: 0 cycles (nothing ticking).

Idle to PF/EU is done when activity is detected during idle.
PF/EU to idle is done when neither has activity during PF/EU states (usually after T4).
The idle state itself just ticks 1 cycle at a time, waiting for EU activity. PF/EU decides on a switch on T3 (to idle or to PF/EU). Switching to the same state (i.e. deciding on an unchanged state) takes 0 cycles.

All are done in parallel to the usual T-states. It mostly affects the BIU when at T1 state with remaining clocks left to tick (from those 3 or 2 cycles).
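
As a compact way to look at the timings listed above, here's a small Python lookup sketch. The function and constant names are made up for illustration. The cost of moving to idle is kept as a parameter, since that's the figure that's least certain; it defaults to the 3 cycles from the list.

```python
IDLE, PF, EU = "idle", "PF", "EU"


def switch_cycles(old, new, to_idle_cycles=3):
    """Cycles needed to switch BIU mode, ticked in parallel to the T-states."""
    if old == new:
        return 0                # unchanged state: nothing ticking
    if new == IDLE:
        return to_idle_cycles   # PF/EU to idle
    if old == IDLE:
        return 3                # idle to PF/EU
    return 2                    # PF to EU or EU to PF


for old, new in [(IDLE, PF), (PF, EU), (EU, IDLE), (IDLE, IDLE)]:
    print(old, "->", new, ":", switch_cycles(old, new))
```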

So:
T3 (2 cycles to switch to new mode, new requests aren't accepted and moved to the next cycle here)
T4
T1(ticked right away in the new mode).

Or:
T3 (3 cycles to idle, this is the first clock)
T4 --(final clock of the move to idle)
-- idle clock(s) final clock, delaying PF/EU clocks
-- (Perhaps moving back in 3 clocks total to T1 from this point onwards)
-- second of 3 idle clocks
-- third of 3 idle clocks
T1 (the first clock of the waitstate clocks is included in the switch, but the switch is retriggered each clock during waitstates, so the move restarts on each Tw state (counted as just another T3 state)).

So moving to idle mode:
T3 (3 cycles to idle)
Tw (3 cycles to idle)
T4 (2 cycles to idle)
-- (1 cycle to idle)
-- (first clock in idle mode, request made to become active again, which is its first clock)
-- (second clock to becoming active again)
-- (third clock of becoming active again)
T1 (first active clock)
...

The move to idle/PF/EU is made based on the prefetch queue state and EU request state (simple EU > PIQ > idle).

The special case with the EU requests before/at/after T3 is made by blocking requests from being made on T3 when in PIQ or EU modes. This pushes the request to Tw or T4 (depending on the hardware waitstates), preventing the EU from requesting on T3 and allowing the BIU to execute properly in that way.


Reply 91 of 122, by GloriousCow

superfury wrote on 2023-08-26, 14:55:

And PF/EU->idle?

I'm not sure, actually, but I model it as one cycle.

superfury wrote on 2023-08-26, 14:55:

- PF/EU to idle: 3 cycles

This is likely going to be too slow

I've published a blog article on the 8088MPH CPU test. Included is a cycle-perfect trace of the CPU test execution including a column that shows my BIU state transition logic, you might find it helpful to compare.
https://martypc.blogspot.com/2023/08/the-8088 … h-cpu-test.html
https://docs.google.com/spreadsheets/d/1bHCCS … C9OA9-L-N3xv8TQ


Reply 92 of 122, by superfury

GloriousCow wrote on 2023-08-26, 16:41:
superfury wrote on 2023-08-26, 14:55:

And PF/EU->idle?

I'm not sure, actually, but I model it as one cycle.

superfury wrote on 2023-08-26, 14:55:

- PF/EU to idle: 3 cycles

This is likely going to be too slow

I've published a blog article on the 8088MPH CPU test. Included is a cycle-perfect trace of the CPU test execution including a column that shows my BIU state transition logic, you might find it helpful to compare.
https://martypc.blogspot.com/2023/08/the-8088 … h-cpu-test.html
https://docs.google.com/spreadsheets/d/1bHCCS … C9OA9-L-N3xv8TQ

Edit: Changing said clock to 1 cycle as you mentioned makes 8088 MPH report 1708 cycles now.


Reply 94 of 122, by superfury

GloriousCow wrote on 2023-08-26, 17:14:

The second tab of that spreadsheet has cycle counts for every instruction in the test, too.

I'll look at that later. Although I'll need to identify the erroring instructions first.

For now I've made a simple log file using the BIOS COM ROM again, with the advanced log enabled (so memory I/O is visible; this also enables interrupt logging as an extra line for each interrupt, which luckily isn't used in the affected code afaik) and cycle logging and register logging disabled:

Attachment: debugger_UniPCemu_COMROM_8088MPHmix_20230826_1858.zip (29.24 KiB, 28 downloads)
File comment: BIOS COM ROM executing the 8088 MPH mix inside UniPCemu with the latest BIU mode logic.
File license: Fair use/fair dealing exception

Perhaps putting it side-by-side with your log can somehow reveal what is going wrong timing-wise?
Edit: Wait a sec.... Is your spreadsheet really of the 8088 MPH code running? It starts with different instructions, according to your log?

Edit2: Did you know that the first two instructions (from your assembly file of the 8088 MPH mix) are missing from your spreadsheet?

begin_test:

mov al, 54h ; Timer 1, LSB, Mode 2
out 43h, al ; Timer 8253-5 (AT: 8254.2).
mov al, 12h ; Timer 1 = 18
out 41h, al ; Timer 8253-5 (AT: 8254.2).

mov ax, 1234h

It starts logging at mov al,12h in the spreadsheet.

Also, your log is kind of confusing. Where do instructions start fetching, I can't see where they start?

I've tried to compare it to UniPCemu's output, by what seems to be the corresponding address of the immediate of "mov al,12h" being fetched into the PIQ during T3:
https://docs.google.com/spreadsheets/d/1dcslw … dit?usp=sharing


Reply 95 of 122, by GloriousCow

superfury wrote on 2023-08-26, 18:04:

Also, your log is kind of confusing. Where do instructions start fetching, I can't see where they start?

In many cases, instructions are already being fetched during the previous instruction. If you look in the last column, "RNI" indicates the last cycle of microcode execution for an instruction, which will either read the next instruction byte from the queue, or fetch it if the queue is empty, whereupon it will terminate at "RNI_END" and the next instruction will begin on the next cycle.

If there is instruction disassembly in that column, that is when an instruction is beginning. I have peeked ahead so you have access to the full disassembly on the first cycle of the instruction.


Reply 96 of 122, by superfury

GloriousCow wrote on 2023-08-26, 22:55:
superfury wrote on 2023-08-26, 18:04:

Also, your log is kind of confusing. Where do instructions start fetching, I can't see where they start?

In many cases, instructions are already being fetched during the previous instruction. If you look in the last column, "RNI" indicates the last cycle of microcode execution for an instruction, which will either read the next instruction byte from the queue, or fetch it if the queue is empty, whereupon it will terminate at "RNI_END" and the next instruction will begin on the next cycle.

If there is instruction disassembly in that column, that is when an instruction is beginning. I have peeked ahead so you have access to the full disassembly on the first cycle of the instruction.


So basically RNI_END would be where UniPCemu stops dumping the instruction address (where EU cycles ended and fetching the next instruction starts)?

And the second cycle is the start of the MOV AL,12 EU's fetching of the immediate microcode?

Still, the instructions before it in the mix are missing (the PIT setup), except for its final cycle?

Also, is it correct that T4 ending of an instruction (like the T4 of an I/O operation without anything else past the I/O, just RNI as you call it), like "OUT DX,AL", starts the RNI immediately when the T4 is ticking? So T1 IS the RNI?

UniPCemu reads its result in the EU handler 'microcode' on T1 after T4 of the transfer to the I/O port (so T2, if any, is the first cycle of the new instruction read from the PIQ (I PIQ status)). So it's always 1 cycle late?


Reply 97 of 122, by GloriousCow

superfury wrote on 2023-08-27, 10:18:

So basically RNI_END would be where UniPCemu stops dumping the instruction address (where EU cycles ended and fetching the next instruction starts)?

I don't know what "dumping the instruction address" refers to, sorry.

superfury wrote on 2023-08-27, 10:18:

And the second cycle is the start of the MOV AL,12 EU's fetching of the immediate microcode?

The immediate is fetched there on cycle_n == 202 (Row #7). You can tell because the next cycle lights up the queue status lines, which then read out 'S' for 'Subsequent byte read' and the queue_data becomes 12 to indicate 12h was read from the queue (the immediate). The queue status lines reflect what was done on the *previous* cycle. Recall that the instruction microcode fetches immediate operands, this is why execution of 'mov al, 12h' began on the first cycle even though the immediate wasn't fetched yet. The byte was placed into the queue on cycle_n == 201, but it takes 1 cycle to read from the queue.

superfury wrote on 2023-08-27, 10:18:

Still, the instructions before it in the mix are missing (the PIT setup), except it's final cycle?

I only included the last OUT that resets the value of the PIT channel so that you could synchronize the PIT with the cycle in which the PIT timer is set. But the entire OUT instruction is present. The real measurement starts at cycle_n == 220, in any case.

superfury wrote on 2023-08-27, 10:18:

Also, is it correct that T4 ending of an instruction (like the T4 of an I/O operation without anything else past the I/O, just RNI as you call it), like "OUT DX,AL", starts the RNI immediately when the T4 is ticking? So T1 IS the RNI?

A bus write operation ends on T3 (or TwLast if wait states, such as we see with OUT). So if the microcode instruction is labelled as "W, RNI", then RNI executes on T3/Tw as well. This means the next instruction can begin on T4, as we see on cycle_n == 220 when mov ax, 1234h begins on T4 of OUT's write bus transfer.

superfury wrote on 2023-08-27, 10:18:

UniPCemu reads its result in the EU handler 'microcode' on T1 after T4 of the transfer to the I/O port (so T2 if any is the first cycle of the new instruction read from the PIQ (I PIQ status)). So it's always 1 cycle late?

Sounds like it might be. For a read operation, the value read is available in OPR on T4. So if an instruction does something with a read operand, it can do so on T4 of the read; it doesn't have to wait until T1. Code bytes fetched are also put into the queue on T4, but they can't be read out until one cycle later, because of the 1 cycle cost of reading the queue.
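To make the one-cycle difference concrete, here is a small C sketch. It assumes the same T1, T2, T3, Tw..., T4 ordering and absolute cycle numbers; `BusRead` and the function names are hypothetical, and the last helper only models the "consume on the T1 after T4" behaviour as it was described for UniPCemu in the question.

```c
#include <assert.h>

/* Hypothetical description of one bus read. */
typedef struct {
    int t1;           /* absolute cycle on which T1 occurs */
    int wait_states;  /* number of Tw cycles between T3 and T4 */
} BusRead;

static int read_t4(BusRead r) {
    assert(r.wait_states >= 0);
    return r.t1 + 3 + r.wait_states;
}

/* Data read: the operand is in OPR on T4, so the EU may use it then. */
static int operand_usable_cycle(BusRead r) {
    return read_t4(r);
}

/* Code fetch: the byte enters the queue on T4, and reading the
   queue costs one cycle, so it is available one cycle later. */
static int code_byte_usable_cycle(BusRead r) {
    return read_t4(r) + 1;
}

/* Model in which the EU only picks up the result on the T1 after T4. */
static int operand_usable_cycle_t1_model(BusRead r) {
    return read_t4(r) + 1;
}
```

For any read, `operand_usable_cycle_t1_model(r) - operand_usable_cycle(r)` is 1, which is the "always 1 cycle late" effect for data operands.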

EDIT:

I made a video to explain the trace log fields better. Please forgive the bad audio. https://www.youtube.com/watch?v=cE8MihFf6OI

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 98 of 122, by superfury

User metadata
Rank l33t++
GloriousCow wrote on 2023-08-27, 12:09:
I don't know what "dumping the instruction address" refers to, sorry. […]

I've modified the T3 to T4 transition so that the result of a transfer is ready on T4 (unless it's a multibyte transfer that still needs another byte, word or 3 bytes, of course).
I've also modified the debugger to handle the final cycle of an OUT instruction as a special case. In that case, T4 gives the instruction disassembly on a non-ticking BIU step, which ticks '0' cycles: the EU just reads the BIU result and discards it, taking no cycles, so it is basically a special kind of NOP for the BIU (hardware isn't ticked either in this special case). The debugger logging makes note of this case. Then, once T4 ticks, the EU starts a new instruction fetch from the PIQ, and the debugger notices that the last non-BIU cycle wasn't logged, so it logs the disassembly of the last instruction on the T4 cycle (you can see this happened because the instruction address of the previous instruction isn't logged anymore in this case).

Filename
debugger_UniPCemu_COMROM_8088MPHmix_20230827_1608.zip
File size
27.78 KiB
Downloads
29 downloads
File comment
Fixed logging of the OUT instructions and other related instructions finishing without BIU ticking due to writes.
File license
Fair use/fair dealing exception

Edit: I've been comparing, and it seems to be off right from the start already.

It seems your BIU starts the MOV AL,12h PIQ I-state on line 2, but UniPCemu (log line 31) manages to load the second byte off the PIQ on the very next cycle, thus starting way too early (it's already in the queue!).
So the queue starts diverging (way) earlier, in one of the instructions before the MOV AL,12h is reached?

UniPCemu (according to my log) loads it during T3, immediately before the port write (to port 43h) is performed in a T1-T2-TW-T3-T4 cycle. So the immediate of the MOV AL,12h is already loaded during the OUT instruction (opcode E6h). That is line 25 of the UniPCemu log; in your log it would be the equivalent of line -4, at the T3 before your OUT instruction writes the port with the waitstate in it.

Edit: Interestingly (and perhaps related) UniPCemu's OUT instruction has some cycles before it calls the BIU:

	if (CPU_readimm(0)) return; //Read immediate!
	if (EMULATED_CPU <= CPU_NECV30) //Valid CPU to apply?
	{
		if (CPU8086_instructionstepdelayBIU(0, 1)) return; //1 cycles before we start, active cycles only, wait for it to finish!
		if ((BIU_getcycle()==1) && (getActiveCPU()->timingpath==0)) //T2?
		{
			getActiveCPU()->timingpath = 0;
			if (CPU8086_instructionstepdelayBIUidle(2, 2)) return; //2 cycles before we start, active cycles only!
		}
		else
		{
			getActiveCPU()->timingpath = 1; //Other path!
			if (CPU8086_instructionstepwaitBIUready(2)) return; //Wait for ready!
			if (CPU8086_instructionstepdelayBIU(4, 1)) return; //1 cycles before we start, active cycles only, wait for it to finish!
		}
	}
	if (CPU_readimm(1)) return; //Read immediate!
	INLINEREGISTER byte theimm = getActiveCPU()->immb;
	debugger_setcommand("OUT %02X,AL", theimm);
	if (CPU_PORT_OUT_B(0,theimm,REG_AL)) return;
	if (CPU_apply286cycles()==0) /* No 80286+ cycles instead? */
	{
		//getActiveCPU()->cycles_OP += 1;
	}
	/*Timings!*/

CPU_PORT_OUT_B is the point where the EU actually requests the BIU to start performing the transfer and waits for a response. Perhaps the delays before it, which only apply when the emulated CPU is a NEC V30 or lower (to use 808x timings), are what cause this?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 99 of 122, by superfury

User metadata
Rank l33t++

I've fixed up the logging of the instructions that weren't being logged (due to the BIU not ticking).
Now it will take the logged instruction and keep it pending if the BIU isn't ticking. When the BIU ticks a cycle again, it will add a special caret sign (^) instead of the address (unless the EU is active, which adds the address as well).
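Roughly, the deferred logging works like the following C sketch. This is a simplified stand-in rather than the actual UniPCemu debugger code: `TraceLog`, `trace_defer`, `trace_row` and the exact row layout (caret, then optionally CS:IP, then the disassembly) are assumptions made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for a disassembly line awaiting a ticking cycle. */
typedef struct {
    char pending[64];
    int  has_pending;
} TraceLog;

/* The instruction finished while the BIU wasn't ticking: keep it pending. */
static void trace_defer(TraceLog *t, const char *disasm) {
    assert(strlen(disasm) < sizeof t->pending);
    snprintf(t->pending, sizeof t->pending, "%s", disasm);
    t->has_pending = 1;
}

/* Called on every cycle the BIU does tick; formats one log row into out. */
static void trace_row(TraceLog *t, int eu_active,
                      unsigned cs, unsigned ip,
                      char *out, size_t n) {
    if (t->has_pending) {
        if (eu_active) {
            /* EU active: caret plus the address */
            snprintf(out, n, "^ %04X:%04X %s", cs, ip, t->pending);
        } else {
            /* caret instead of the address */
            snprintf(out, n, "^ %s", t->pending);
        }
        t->has_pending = 0;
    } else if (eu_active) {
        snprintf(out, n, "%04X:%04X", cs, ip);
    } else {
        snprintf(out, n, "%s", "");
    }
}
```

So an OUT that terminates on a non-ticking step is deferred with `trace_defer`, and the next call to `trace_row` emits it with the ^ marker exactly once.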

https://docs.google.com/spreadsheets/d/1dcslw … dit?usp=sharing

As can be seen with the OUT instructions, the ^ is logged immediately after the otherwise hidden cycle because the BIU wasn't ticking for the termination of the instruction. It's logged on the first cycle of the next instruction trying to fetch from the PIQ in that case.

I've moved the rows that precede the part you logged down, so that UniPCemu's log lines up with yours at roughly the same point (though it obviously doesn't match exactly, due to EU timing differences).

Could you make a more complete log with the entire code, from the point of the initial jump to the mov al,54h instruction? There obviously is already an issue in those very first instruction timings?

Edit: 8088 MPH reports 1653 cycles (1%) right now. The Kefrens effect is a bit more stable, but still a few cycles short: looking at the bottom, it moves one scanline to the left in about 10 seconds while running at 20% speed (so it would be one scanline short about every 2 seconds at 100% realtime speed, or half a scanline a second).
Edit: So it's 25 cycles too few right now (assuming it's supposed to be 1678 cycles), an improvement of only 2 cycles in total since the commit from 26 days ago?
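The arithmetic behind those numbers, as a small C sketch. The function names are invented for this example, and the truncated whole-percent calculation is only a guess at where the reported 1% could come from; how 8088 MPH actually computes its figure isn't confirmed here.

```c
#include <assert.h>

/* Cycle shortfall relative to the expected count. */
static int cycle_deficit(int expected, int measured) {
    return expected - measured;
}

/* Shortfall as a whole percentage, truncated (assumed, see above). */
static int deficit_percent(int expected, int measured) {
    assert(expected > 0);
    return cycle_deficit(expected, measured) * 100 / expected;
}

/* Drift at full speed, given the seconds per scanline observed while
   the emulator runs at speed_fraction of realtime (0.20 for 20%). */
static double scanlines_per_second(double observed_seconds_per_scanline,
                                   double speed_fraction) {
    assert(observed_seconds_per_scanline > 0.0 && speed_fraction > 0.0);
    return 1.0 / (observed_seconds_per_scanline * speed_fraction);
}
```

`cycle_deficit(1678, 1653)` gives the 25 cycles mentioned above, and `scanlines_per_second(10.0, 0.20)` gives the half scanline per second of drift.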

Edit: There are two noise parts on the 16/256 color part now: one on the first character clock and one directly left of the | of the 1 of '16'.
8088 MPH still reports a true 8088.
Edit: End result currently:
https://docs.google.com/spreadsheets/d/1dcslw … dit?usp=sharing

Edit: During the 8088 MPH Kefrens part, it's never retracing, so never rendering frames?

Last edited by superfury on 2023-08-29, 14:23. Edited 3 times in total.
