UniPCemu 8088 cycle accuracy

Reply 100 of 122, by superfury

Posted on 2023-08-27, 22:53

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5831
Joined: 2014-03-08, 11:25
Location: Netherlands

I've just been thinking about something.

Your prefetch algorithm mentions that on T3 it makes a decision to switch to either PF or EU state after T4.
But can it also decide to go idle (ticking said 3rd state transition on T1 after that, delaying 1 cycle)? So if no request from EU and PIQ is full after T4?

UniPCemu just blindly switches to idle state in 3 cycles including the T3 cycle (so T3, T4, --(final cycle of the switch to idle)).
Or should UniPCemu never decide to switch to the idle state on T3, and instead always go to the PF state when it's on T3 (while T1 can still move to idle regardless)?

Edit: Just modified the BIU so that, instead of moving to the Idle state on T3, it moves to the PF state on T3 and only allows moving to the Idle state on the T1 after that (unless it can actually start, of course).
So that should fix some issues with T3-T4-FirstIdle(T1)-IdleModeCycle(T1 again) (due to full PIQ at T3, the EU not requesting anything new). If the PIQ had one byte left and it's filled at T3, behaviour is unchanged (as T1 would see a full PIQ without EU request, thus 3 cycles to Idle mode being performed already).
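
A minimal sketch of the decision described above, in Python for brevity (UniPCemu itself is C). The function name and its arguments are hypothetical; the mode numbers 0 (EU), 1 (PF) and 2 (idle) are the ones mentioned further down in this post. It only models the choice of the next state, not the 3-cycle transition delay.

```python
from enum import Enum


class BiuState(Enum):
    EU = 0    # bus cycle on behalf of the EU
    PF = 1    # prefetch
    IDLE = 2  # nothing to do


def next_biu_state(t_state, eu_request, piq_full):
    """Pick the next BIU state at a decision point (hypothetical helper)."""
    if eu_request:
        return BiuState.EU
    if t_state == "T3":
        # Changed behaviour: never pick Idle on T3, always fall back to PF.
        return BiuState.PF
    if t_state == "T1" and piq_full:
        # Only T1 may decide to go idle (full queue, no EU request).
        return BiuState.IDLE
    return BiuState.PF


print(next_biu_state("T3", eu_request=False, piq_full=True))
print(next_biu_state("T1", eu_request=False, piq_full=True))
```

With a full queue and no EU request, T3 now yields PF and the following T1 yields Idle, which is the one-cycle delay asked about above.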

Edit: 8088 MPH now reports 1656 cycles (1%), thus only 22 cycles missing from the count (in total).

I at least think that the specific timings of the IN/OUT instructions might be part of the problem (UniPCemu allows a prefetch in between, which will fetch the 12h byte before the OUT instruction actually performs its own I/O write (T1-T2-T3-Tw-T4 cycles)), seeing as it happens at the start of the log in the spreadsheet. But somehow the spreadsheet of MartyPC doesn't seem to perform that fetch before it?
May I ask what your emulator does when executing an OUT or IN instruction? What steps does the EU perform before and after the request to the BIU to read/write the I/O port?
Edit: Found a slight bug in the detection of the idle mode when determining the next mode to operate in. It was always detecting an active BIU (because the mode read would never be 2(which is the idle mode) only 0(EU) or 1(PF) depending on the current transfer).

8088 MPH reports 1664 cycles now (14 cycles unaccounted for). Getting close now, but something somewhere is still erratic?
Although (according to your article (https://martypc.blogspot.com/2023/08/the-8088 … h-cpu-test.html)) only 1 more PIT clock (because that's 4 metric cycles?) will make 8088 MPH pass it as a real CPU, so it can't be used when getting closer than that?

8088 MPH's 16/256 color part is now showing 3 scanlines of noise (each scanline shifted one character to the right), starting at the second character clock of active display and repeated over the screen's entire height (so second on the first scanline, fourth on the second, sixth on the third, second on the fourth etc.), so that's increased somehow?

In realtime measurements, the Kefrens effect 'blocks' (actually two ends of the same scanline being displayed on two consecutive rows) seem to shift off the screen in roughly 1 second intervals (with the CPU at 22% realtime). So combined, about every 220ms of emulated time one of those blocks moves off the screen and another one takes its place.
Using a simple calculator (4.77MHz x 0.22s), that's about 1050000 CPU cycles for each of those split scanline ends to move off the screen entirely and the next to take its place (effectively the scanline displacement, which shouldn't exist).
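
The same arithmetic written out, as a quick check (Python, purely illustrative). The only thing added here is the exact clock: the PC/XT derives the 8088 clock from the 14.31818 MHz crystal divided by 3. The 1 second interval and the 22% speed are the rough observations from above, so the result is only an estimate.

```python
# 8088 clock on the IBM PC/XT: 14.31818 MHz crystal divided by 3.
CPU_HZ = 14_318_180 / 3          # about 4.77 MHz
REALTIME_FRACTION = 0.22         # emulation speed observed above
WALL_SECONDS_PER_BLOCK = 1.0     # observed interval, roughly

emulated_seconds = WALL_SECONDS_PER_BLOCK * REALTIME_FRACTION
cycles_per_block = CPU_HZ * emulated_seconds

print(f"{emulated_seconds * 1000:.0f} ms emulated per block")
print(f"about {cycles_per_block:,.0f} CPU cycles per block")
```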

Edit: Fixing the DMA to end with 3 extra cycles after S4 (as in https://martypc.blogspot.com/2023/05/explorin … -on-ibm-pc.html 's cycle log at the end of that page) makes 8088 MPH report a 'true 8088' now.
But it hangs immediately after that because the DMA controller keeps hogging the bus permanently somehow (S4 always transferring into S1)?
Edit: Modifying the DMA controller to tick S4 as S0-S1-S2 as well when SI on S4's clock cycle detects another transfer pending (so S4 is followed by S3), 8088MPH doesn't crash the DMA controller on the infinite stream of super-fast transfer requests (it would get a request before it could finish one with the PIT counter setup for 2 PIT ticks per transfer, which would be way too fast).
Edit: Removing that to become just SI-S0 on S4 changes back the behaviour. Then adding back the extra 'Tw' 3 cycles to be performed after as a special waitstate DMA S-state fixes it to run 8088 MPH without crashing again when starting the first screen, as well as the cycle count to be reporting a 'real' 8088.
The extra Tw-states (on XT only) are now performed using an extra Sw S-state handler (just like the SI and S0-S4 handlers). It is only triggered to tick after T4 with the purpose of keeping the bus busy for 3 more cycles in this case.

Edit: The racing the beam (Kefrens) gives a black screen now? It doesn't crash though, just continues to the next part after some time.

So it might have become more 'accurate' according to the real 8088 check (falling within range somehow), but the Kefrens effect fails 100% now (black screen, which is weird. That didn't ever happen from what I've seen, in not a single UniPCemu test so far).

I do see 8088 MPH performing DMA transfers back-to-back when starting up, though? That happens when switching to the old vs new CGA screen from MS-DOS, after a 'real' 8088 is found and some (relatively) long delay (a few seconds of emulated time) has finished, while switching to the hacked color mode. The MS-DOS text screen is visible and stretched at that point, with DMA executing: SI-S4 (which has a merged SI-S0 in the S4 cycle), followed by S1-S4 (since DREQ is raised again), followed by 3 Sw cycles (the special Tw cycles, but induced by XT DMA only, preventing the BIU from obtaining the bus (achieving 'ready')).
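
The state order described above can be written out as a small sketch. This is Python for brevity rather than UniPCemu's C, the function name is made up, and the expansion of "SI-S4" into SI, S0, S1, S2, S3, S4 is just my reading of the sequence, not a statement about how a real 8237 behaves.

```python
def dma_state_sequence(transfers, wait_states=3):
    """List the DMA S-states for `transfers` back-to-back transfers.

    Hypothetical helper: the first transfer starts with SI and S0, every
    following transfer starts directly at S1 (SI/S0 are merged into the
    previous S4), and the XT-only Sw states keep the bus busy at the end.
    """
    states = ["SI", "S0"]
    for _ in range(transfers):
        states += ["S1", "S2", "S3", "S4"]
    states += ["Sw"] * wait_states
    return states


print(" ".join(dma_state_sequence(1)))
print(" ".join(dma_state_sequence(2)))
```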

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 101 of 122, by GloriousCow

Posted on 2023-08-28, 15:15

GloriousCow Offline

Rank Member

Rank: Member
Posts: 488
Joined: 2022-09-12, 20:00

superfury wrote on 2023-08-27, 22:53:

May I ask what your emulator does when executing a OUT or IN instruction? What steps does the EU perform before and after the request to the BIU to read/write the I/O port?

They're pretty simple. The OUT imm8, al/ax has two cycles between reading the immediate and writing to the port; the OUT dx, al has one cycle before writing to the port.
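
Put as a rough timeline (a Python sketch, not MartyPC code): the two delays are the ones given above, the T1-T2-T3-Tw-T4 bus cycle is the XT I/O write mentioned earlier in the thread, and the names are invented for illustration. It covers a single byte write and ignores instruction fetch and any prefetch activity.

```python
# EU cycles spent before the BIU is asked to write the port.
EU_CYCLES_BEFORE_PORT_WRITE = {
    "OUT imm8, al/ax": 2,  # after the immediate has been read
    "OUT dx, al": 1,
}

# One I/O bus cycle on the XT, including the wait state.
IO_BUS_CYCLE = ["T1", "T2", "T3", "Tw", "T4"]


def out_timeline(form):
    """Return the cycle labels for the EU delay plus one I/O write."""
    return ["EU"] * EU_CYCLES_BEFORE_PORT_WRITE[form] + IO_BUS_CYCLE


for form in EU_CYCLES_BEFORE_PORT_WRITE:
    print(form, "->", " ".join(out_timeline(form)))
```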

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 102 of 122, by superfury

Posted on 2023-08-29, 09:13

GloriousCow wrote on 2023-08-28, 15:15:

superfury wrote on 2023-08-27, 22:53:

May I ask what your emulator does when executing a OUT or IN instruction? What steps does the EU perform before and after the request to the BIU to read/write the I/O port?

They're pretty simple. the OUT imm8, al/ax has two cycles between reading the immediate and writing to the port, the OUT dx, al has one cycle before writing to the port.

OK. Is the same true for the IN versions?
What about the BIU during those cycles? Can it start T1-T4 of transfers during those cycles (or is it blocked from doing so)?
Because it isn't blocked currently and will start a prefetch cycle while the 8088 is setting up the timer, fetching the immediate of the instruction after it (MOV AL,12h), which it shouldn't according to the spreadsheet. In UniPCemu's COM ROM dump, the second OUT instruction to set up DMA (to port 41h) performs a prefetch of the 12h immediate during said 2 cycles between fetching the immediate (41h) and the cycles writing port 41h (although the T3 won't show the I/O port being written, it can be detected by a missing memory logging on said line and a Tw cycle as well on the XT).

Edit: Just made a new log of the BIOS COM ROM executing the 8088 MPH mix again, this time with new IO port reads/writes added to the log:

The attachment debugger_UniPCemu_COMROM_8088MPHmix_20230829_1207.zip is no longer available

So the actual T3 of IO ports are actually visible in the log now (instead of being like memory writes but without memory addresses).
You can also see the width of the broken up writes (if broken up) by looking at the width of the read/written value (for example 0000=0000( ) vs 0000=000000000( )).
The width of the port is of course always fixed to 16 bits, since that's a limit of the hardware itself afaik.

For the common log format with advanced logging (not officially defined by the emulator authors though) it's now like:

BIU T3 -	2000:00000102 	IO(w):0043=54(T)
BIU T4 I	^ E6 43 out 43,al

Without advanced logging the IO(w) part is dropped and not logged.
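
For anyone wanting to pick these fields out of a log, here is a small parsing sketch in Python. It is based only on the sample line above; the `IO(r)` form for reads and the helper name are my assumptions, and the width is taken from the number of hex digits of the value, as described earlier.

```python
import re

# Matches e.g. "IO(w):0043=54"; "IO(r)" for reads is an assumption.
IO_FIELD = re.compile(
    r"IO\((?P<dir>[rw])\):(?P<port>[0-9A-Fa-f]{4})=(?P<value>[0-9A-Fa-f]+)"
)


def parse_io_field(line):
    """Return the I/O access found on a log line, or None if there is none."""
    match = IO_FIELD.search(line)
    if match is None:
        return None
    digits = match.group("value")
    return {
        "direction": "write" if match.group("dir") == "w" else "read",
        "port": int(match.group("port"), 16),
        "value": int(digits, 16),
        "width_bytes": len(digits) // 2,  # two hex digits per byte
    }


print(parse_io_field("BIU T3 -\t2000:00000102 \tIO(w):0043=54(T)"))
print(parse_io_field("BIU T4 I\t^ E6 43 out 43,al"))
```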
The same goes for triggering interrupts, which are currently still logged on a separate line (also part of the advanced logging).
Edit: And that's parsed just like memory accesses from current commits onwards. So the rogue interrupt (or any interrupt, for that matter) etc. that was logged in advanced logging mode by the old logging code is now properly put into the same format as memory accesses etc.
The old logging modes are still kept on single lines as before; they're routed to the same equivalent of a log function (except being parsed in a smaller log buffer first, which shouldn't be an issue since those advanced logging texts are short by nature (usually less than 20 characters long)).

So advanced logging doesn't trigger any extra line breaks in its logging anymore. The only parts that still do that directly are extra stuff like register logging (can be disabled) and some misc. register data from hardware (VGA etc.). But those are disabled (except registers, which is configurable in the emulator debugger settings itself) in single line logging formats, like the one used for this test case (the mode that the BIOS COM ROM logs, which is mode 12 (Always log, even during skipping, common log format)).

Reply 103 of 122, by superfury

Posted on 2023-08-29, 13:35

The whole routine is now timed from row 0 to 6397 for the entire batch (from the first instruction fetch until the RET instruction).
So it's 6398 cycles for the entire 8088 MPH batch. Although different if taken from the first effective instruction you mentioned.

The attachment debugger_UniPCemu_COMROM_8088MPHmix_20230829_1524.zip is no longer available

Reply 104 of 122, by superfury

Posted on 2023-08-29, 13:45

During those 2 or 1 cycles before writing to the port, can the BIU start a prefetch operation if it's on T1, causing a delay on the EU side for the port write? Or is it forced to stay idle if not prefetching already?

What about IN instructions? Do they behave the same?

The attachment debugger_UniPCemu_COMROM_8088MPHmix_20230829_1547.zip is no longer available

6396 cycles for the entire mix now.

8088 MPH still seems to somehow generate a black screen during the Kefrens bars? Any idea if simple instruction timings would cause this?

Reply 105 of 122, by superfury

Posted on 2023-08-29, 15:05

Interestingly, when looking at the Kefrens writing the CRTC registers, I see them happening at virtual scanline(pixel clock location) locations:
0,1C5=setting register 4 to 3B.
14h,12F=setting register 4 to 1.
0,1C6=setting register 4 to 3B.
14h,130=setting register 4 to 1.
0,1C7=setting register 4 to 3B.
14h,131=setting register 4 to 1.

So the start and end register updates keep flipping register 4 between 1 and 3B (59 decimal) every 'screen' of 14h scanlines, with each update being 1 clock later every screen?

Because it only reaches scanline 14h (20 decimal), it never retraces (which happens on scanline 3B), thus never draws any frames!
It will never retrace, because the vertical sync (18h) is never reached (only up to scanline 14h is reached)! So only 20/59 scanlines are clocked during the VRetrace clocks (assuming it sets it to 3Bh for that)?
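
The hex values above in decimal, with the comparison spelled out (a trivial Python check; the constant names are made up, and the values and units are used exactly as they appear in this post):

```python
VERTICAL_TOTAL = 0x3B      # value written to CRTC register 4 at scanline 0
LAST_SCANLINE_SEEN = 0x14  # scanline at which register 4 is set to 1
VSYNC_POSITION = 0x18      # vertical sync position quoted above

print(LAST_SCANLINE_SEEN, VERTICAL_TOTAL, VSYNC_POSITION)  # 20 59 24

# The counter restarts before it gets to the vertical sync position.
vsync_reached = LAST_SCANLINE_SEEN >= VSYNC_POSITION
print("vsync reached:", vsync_reached)  # vsync reached: False
```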

Edit: Latest 8088tst3 results:

real:	disp1:	comp1:		disp3:	comp3:
FF43	FF38	<(-11)	FF35	-14	FF2F	-20
FE59	FE48	<(-17)	FE3D	-28	FE2E	-43
FDC5	FDB3	<(-18)	FDA4	-33	FD91	-52
FD58	FD44	<(-20)	FD33	-37	FD18	-64
FD2A	FD1A	<(-16)	FD07	-35	FCEC	-62
FC6B	FC63	<(-8)	FC49	-34	FC2A	-65
FBB7	FBAE	<(-9)	FB92	-37	FB6C	-75
F9A9	F990	<(-25)	F96C	-61	F948	-97
CPU test complete. Elapsed timer ticks:
07CA	0786	<(-68)	07AF	-27	07DA	+10

It got even worse with those tests? Total ticks is close though?
Edit: Huh? The i440fx miraculously POSTs again after fixing the BIU new request variable to behave properly (clearing it after the first byte)! 😁 It tests RAM again now at least.
Edit: And immediately after that, the BIU hangs somehow? The EU isn't executing anything anymore and the CPU doesn't seem to be executing new instructions (some OR [mem] instruction, which is odd).
Edit: It seems to be fixed now, after some more slight bugfixes in the CPU reset algorithm.
Windows 95 boots on the i440fx, but oddly enough, the 8042 goes a bit wrong now: the output buffer is filled, but the CPU never seems to properly read it? The IRQ line, oddly enough, seems to be low (it should be high until the buffer is read after all)?

Reply 106 of 122, by superfury

Posted on 2023-08-30, 14:51

OK. Since Windows 95 starts to hang receiving output from the PS/2 keyboard/mouse, I'd assume there might be an issue with I/O ports.
And ofc the only way to test that is by probing some 8/16/32-bit I/O ports and see if the results match up.

So I've added a little 8/16/32-bit (16/32-bit need to be aligned) I/O port at 0xEC (inside the softdebugger module). It functions just like a little buffer that's (like RAM) writable and readable at all offsets. The 8-bit handlers (r/w) handle all offsets if used. The 16-bit handlers just handle even offsets (0/2 in this case, so port ECh and EEh) and 32-bit handles the base offset only (port EC).
Then writing to those addresses and reading the values back would tell if the I/O operations in the BIU are operating properly at least.

Edit: And it's reporting to be failing (N). So the I/O mechanism on the 80386 is failing reads/writes somehow (perhaps on 808x as well though).

Edit: After some more fixing of the port testing program, I moved UniPCemu's callback I/O port, a 16-bit port at ECh (also mapped as 2x8-bit). That port is used for callbacks into the internal handling of ROMs inside UniPCemu, as well as for the bootstrapping and related startup of UniPCemu itself. Callback #0 is a mandatory callback: it's called when the emulator starts up and performs an INT19h handler call, where UniPCemu displays the yellow text when starting up (to enter the settings menu before proper emulation starts, by pressing the key, tapping/clicking the yellow text with touch or mouse, or using the "Set" button), loads all ROMs after that if the settings menu isn't entered, and performs the proper INT19h bootstrapping interrupt (which is handled Dosbox-style) using the internal BIOS.
Said 8/16-bit I/O port was interfering with the I/O ports used to test the functionality, so the port EDh 8-bit operation wasn't getting through (it was caught by the wrong handler).

Having fixed that, the I/O port test program actually managed to finish and report a proper 'Y' in the port e9 log.
So the BIU is now properly doing its job, at least wrt port I/O.

So that means that there's probably an issue with prefetching or memory BIU (requested by EU) operations?

Edit: Just changed the breaking up into bytes and words to depend on which hardware responds. It's basically dword > word > byte, whichever responds first (and if enough bytes to transfer are left). If nothing responds at the requested size (which must be aligned for dword or word accesses, though), a lower size is used instead on said cycle.
The next cycle would then get the chance of being aligned (if enough bytes to transfer are left), so you can get:
port=value written on a cycle
EC=12345678
or
ED=12
EE=3456
F0=78 (16-bit listening on EE, only 8-bit or none on ED/F0).
or
EC=1234
EE=5678 (32-bit port doesn't exist, but 16-bit does).
or
EC=12
ED=34
EE=5678 (EC/ED are 8-bit, EE supports 16-bit)
or
EC=12
ED=34
EE=56
EF=78 (all ports are 8-bits supporting only).

So it basically just checks each cycle where it left off for an access, determines 32/16/8-bit depending on alignment, uses the first size at which something responds (32-bit > 16-bit > 8-bit), and advances the base address for the next cycle according to what responded (nothing responding means an 8-bit access is assumed).
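
The splitting rule above, as a short sketch. It is written in Python for brevity and is not UniPCemu's actual code; the function name and the two "which ports respond at this size" sets are made up for the illustration. It only returns which port gets how many bytes per cycle, not the data.

```python
def split_io_access(port, size, dword_ports=(), word_ports=()):
    """Split one I/O request into per-cycle (port, width_in_bytes) chunks.

    dword_ports / word_ports: ports where a 32-bit / 16-bit handler responds
    (hypothetical stand-ins for the registered hardware handlers).
    """
    chunks = []
    remaining = size
    while remaining > 0:
        if remaining >= 4 and port % 4 == 0 and port in dword_ports:
            width = 4
        elif remaining >= 2 and port % 2 == 0 and port in word_ports:
            width = 2
        else:
            width = 1  # byte access, also assumed when nothing responds
        chunks.append((port, width))
        port += width
        remaining -= width
    return chunks


# The cases listed above, in the same order.
print(split_io_access(0xEC, 4, dword_ports={0xEC}))
print(split_io_access(0xED, 4, word_ports={0xEE}))
print(split_io_access(0xEC, 4, word_ports={0xEC, 0xEE}))
print(split_io_access(0xEC, 4, word_ports={0xEE}))
print(split_io_access(0xEC, 4))
```

Because alignment is re-checked on every cycle, an unaligned start (ED) can still pick up a word access at EE before falling back to a byte at F0.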
Edit: Windows 95 also doesn't hang for input anymore for some reason. I've also fixed some 8042 issues, that might also have been it.
Edit: Hmmm... Both it and Windows NT 4.0 seem to run, but CD-ROM booting fails for some weird reason? ATAPI commands seem to execute though (although erroring out for some reason) when booting Windows NT 4.0?
Edit: OK. So the basic CD-ROM somewhat works. Using Windows NT 4.0, it properly seems to read the CD-ROM images (and run stuff on them). But when trying to boot them, it somehow fails?
Edit: OK. Booting them succeeds now at least (perhaps some ejecting problem, which should work fine though).
As XP home setup seemed to run fine on Windows NT 4.0, perhaps it'll run using a CD-ROM boot environment as well (previously failed on that though)?

Reply 107 of 122, by superfury

Posted on 2023-09-08, 00:44

The latest CGA VRAM improvements essentially exploit the VGA-compatible monochrome mode memory addressing, combined with some added special logic (in the VGA byte mode address clocking and the half character clock for 4-color graphics and text mode). That logic latches the character code (or the first 4 pixels in 4-color mode) on the second half clock (halfway through the final character clock before horizontal total, and halfway through the previous character clock for all other horizontal character clocks), then the attribute byte (or the second 4 pixels in 4-color mode) on the start of the actual first character clock (with the same repeating across the entire width of the active display).
The monochrome graphics mode is basically unchanged, latching 8 pixels every 8 pixels (except the first set of 8 pixels of a scanline being fetched on the first clock instead of the horizontal/vertical total clock).

8088 MPH still seems to run. The Kefrens effect still can't show itself due to CPU timing problems.
The noise at the 16/256-color part of the demo is now on the first scanline and on two scanlines (before the 1's vertical bar and before the '2' of '256' at the bottom), as well as on the first character clock of the first scanline.

Reply 108 of 122, by GloriousCow

Posted on 2023-09-08, 01:20

My best suggestion to you at this point would be to utilize these tests if you can.

A Test Suite for the Intel 8088 CPU

Reply 109 of 122, by superfury

Posted on 2023-09-08, 01:41

Area 5150 changes with the latest code as well. No more corruption on the first part of the displayed Chaplin, but instead he's completely missing?

After that part finishes, he's displayed for a frame, but the image is shifted in a weird way, with vertical slices in the wrong positions it seems?

The credits have the same effect, with blocks of rendering data interleaved with roughly equal blocks of black (looks like roughly 8 scanline vertical blocks, with rendered non-black output having all even or odd scanlines black, producing a scanline effect).

Chaplin:

The attachment 1760-Area 5150 chaplin UniPCemu 20230908_0156.png is no longer available

Credits:

The attachment 1762-Area 5150 credits UniPCemu 20230908_0156.png is no longer available

Interestingly, when rendering the text of the credits on top of the static image, the horizontal and vertical movement stops and the image becomes fullscreen (on the top and bottom green borders). A bit after it finishes rendering, the top 1/3 to 1/2 of the screen starts waving a bit vertically (like a wave going up).
Directly after the text is rendered (with red background), the screen looks like it's jumping about 8 character clocks in width, although vertically it seems to be roughly stable as far as I can see. A few seconds (much faster in realtime though, as it's only running at about 20% realtime speed) after that, the wavy effect on the top +/- 1/3 to +/- 1/2 of the screen starts, like a wave flowing upwards (all scanlines together).
Said weird effects repeat each time text is drawn again for the next block of credits.

Reply 110 of 122, by superfury

Posted on 2023-10-04, 00:40

Hmm... Adding some bugfixes to the BIU for the newer CPUs (for the 80486 and up crashing in weird ways) somehow caused 8088 MPH to report too many cycles again. So the CPU is running too slow again, at 1746 cycles (it reported a true 8088 as of the final commit of 2023/09/27).

Edit: Found it! The issue was with the T1 cycle being an idle cycle (which now reports itself as being a true idle cycle instead of a fake EU T3 cycle). It wasn't being detected anymore, so it wouldn't report ready for processing more data until T4 was encountered.
Having fixed that (idle (in this case a full PIQ) T3 cycles reporting ready for transfers on T4), 8088 MPH reports a true 8088 IBM PC again.
Edit: Fixed a missing BIU state change from idle/PF to EU state.
Having that now implemented once again puts the 8088 MPH metric cycle count too high: it's reporting 1715 cycles (2%) now instead of cycle-accurate.

8088tst3 behaves as expected, once again lowering all counts read (even slower), and the total elapsed count increasing a bit more to 7F3h.

One nice thing happened to the 286 CPU with those latest BIU (all CPUs now actually use the PF-EU-Idle switching method in cycle-accurate mode) changes: it can finally POST with default 6MHz settings as intended! 😁

Though the 8088 still seems to be a bit too slow now (1715 cycles metric cycle count is too much).

Reply 111 of 122, by superfury

Posted on 2023-10-04, 09:26

GloriousCow, do you have a list of all opcodes and their EU-only cycle counts (so excluding the BIU timing, which is common to all instructions)? So if it's for example 2 cycles waiting, BIU read/write, 2 cycles waiting, I only need to know those 2 cycles waiting before and after (as the BIU should be working properly now, according to your documentation).

Or some easy way to extract that information from your emulator without having to check multiple subroutines to figure out what the EU internal timings are (how many cycles before and after the BIU operation for each instruction).
UniPCemu's EU is still based on the old Reenigne code wrt to those.

Reply 112 of 122, by GloriousCow

Posted on 2023-10-04, 16:31

superfury wrote on 2023-10-04, 09:26:

GloriousCow, do you have a list of all opcodes and their EU-only cycle counts (so excluding the BIU timing, which is common to all instructions)? So if it's for example 2 cycles waiting, BIU read/write, 2 cycles waiting, I only need to know those 2 cycles waiting before and after (as the BIU should be working properly now, according to your documentation).

Or some easy way to extract that information from your emulator without having to check multiple subroutines to figure out what the EU internal timings are (how many cycles before and after the BIU operation for each instruction).
UniPCemu's EU is still based on the old Reenigne code wrt to those.

I don't have a list. You could come up with one, but only for instructions that do not branch or loop in the microcode. For instructions that do, it will always be "it depends".

I've had a mind to create sort of a 'microcode explorer' webpage where you could pull up an opcode and see the microcode and maybe even step through it, but of course, I have limited time and resources - if we could live forever, i'd make a lot of cool things.

All of MartyPC's basic instruction timings are in one file, execute.rs:
https://github.com/dbalsom/martypc/blob/versi … 808x/execute.rs

opcode execution begins here:
https://github.com/dbalsom/martypc/blob/versi … execute.rs#L197

But by the time execute is called, all of the modrm handling and EA calculation and fetching has already been done; as long as you are doing that correctly, that can be considered common boilerplate. execute_instruction() just runs the opcode's base microcode routine.

Wherever you see a call to one of my cycle*() functions, that is a microcode step. Cycle functions that end in _i take a list of hex values; those represent microcode line numbers. MartyPC's execute stage ends at any microcode statement flagged with NXT or RNI, so the actual instruction may be 1-2 cycles longer than it appears. If you see a call to cycles_nx or cycles_nx_i, subtract one cycle; they are mostly for documentation purposes. NX cycles are not actually executed.

While we're discussing this allow me to re-advertise my extensive 8088 CPU test suite:
https://github.com/TomHarte/ProcessorTests/tree/main/8088

I built this especially for other emulator developers like yourself. There is no better way to validate your instruction implementation for cycle-accuracy than checking against these tests; they were hardware generated.

Reply 113 of 122, by superfury

Posted on 2023-10-05, 13:26

GloriousCow wrote on 2023-10-04, 16:31:

superfury wrote on 2023-10-04, 09:26:

GloriousCow, do you have a list of all opcodes and their EU-only cycle counts (so excluding the BIU timing, which is common to all instructions)? So if it's for example 2 cycles waiting, BIU read/write, 2 cycles waiting, I only need to know those 2 cycles waiting before and after (as the BIU should be working properly now, according to your documentation).

Or some easy way to extract that information from your emulator without having to check multiple subroutines to figure out what the EU internal timings are (how many cycles before and after the BIU operation for each instruction).
UniPCemu's EU is still based on the old Reenigne code wrt to those.

I don't have a list. You could come up with one, but only for instructions that do not branch or loop in the microcode. for those instructions it will always be "it depends"

I've had a mind to create sort of a 'microcode explorer' webpage where you could pull up an opcode and see the microcode and maybe even step through it, but of course, I have limited time and resources - if we could live forever, i'd make a lot of cool things.

All of MartyPC's basic instruction timings are in one file, execute.rs:
https://github.com/dbalsom/martypc/blob/versi … 808x/execute.rs

opcode execution begins here:
https://github.com/dbalsom/martypc/blob/versi … execute.rs#L197

but by the time execute is called, all of the modrm handling and EA calculation and fetching has already been done; but as long as you are doing that correctly, that can be considered common boilerplate. execute_instruction() just runs the opcode's base microcode routine.

Wherever you see a call to one of my cycle*() functions, that is a microcode step. cycle functions that end in _i will take a list of hex values, those represent microcode line numbers. MartyPC's execute stage ends at any microcode statement flagged with NXT or RNI; so the actual instruction may be 1-2 cycles longer than it appears. if you see a call to cycles_nx cycles_nx_i subtract one cycle; they are mostly for documentation purposes. NX cycles are not actually executed.

While we're discussing this allow me to re-advertise my extensive 8088 CPU test suite:
https://github.com/TomHarte/ProcessorTests/tree/main/8088

I built this especially for other emulator developers like yourself. There is no better way to validate your instruction implementation for cycle-accuracy than checking against these tests; they were hardware generated.

I've tried interpreting your code.

The "if let OperandType::AddressingMode(_) = self.i.operand1_type {" parts are weird though? I see some instruction types execute in 0 cycles effectively? Sometimes they set that nx variable as well (looking at the other file with the cycles functions), which will add a single cycle on the start of the EU execution of the next instruction?
But in some cases it doesn't tick any cycles at all and doesn't set the nx, so effectively executes an ALU or CMP in 0 cycles (not a single cycle timed)! Huh? What's happening there?
See opcode 83h for example, I see it only timing if destination operand is memory (if I understand the AddressingMode if-clause correctly), but doesn't set NX (using "self.cycle_nx();"), thus the whole EU executes in 0 cycles using a register operand!?

Also I can't use your testsuite, since it requires some interpreter and manual text parsing, which UniPCemu doesn't support (unless I create an entire Rust interpreter just for it or include some external interpreter mess, which I'd rather not). My emulator isn't written in Rust, but in C/C++ (although the entire code is purely in (ANSI) C for portability reasons on some supported platforms).

Reply 114 of 122, by GloriousCow

Posted on 2023-10-05, 14:08

superfury wrote on 2023-10-05, 13:26:

I've tried interpreting your code.
The "if let OperandType::AddressingMode(_) = self.i.operand1_type {" parts are weird though?

The if let syntax is a bit counterintuitive for non-Rust programmers. It still looks a bit weird to me. What that is doing is checking if operand1_type is any type of addressing mode (the _ means I don't care which mode, just that it is one).
Some instructions have different timings when they have a register vs memory operand.
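
As a rough illustration of what that if let line tests, here is a minimal sketch. The enums, their variants and the cycle numbers are made up for the example and are not MartyPC's real definitions; only the pattern-matching shape is the point. In C terms it is roughly a tag check on a tagged union, something like "if (op.kind == OPERAND_ADDRESSING_MODE)".

```rust
// Hypothetical stand-ins for the real types; names and fields are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AddressingMode {
    BxSi,
    Disp16(u16),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum OperandType {
    Register8(u8),
    Immediate8(u8),
    AddressingMode(AddressingMode),
}

/// Returns true when the operand is a memory operand, whatever its mode.
fn is_memory_operand(op: OperandType) -> bool {
    // `if let PATTERN = value` runs the block only if `value` matches PATTERN.
    // The `_` matches any addressing mode without binding it to a name.
    if let OperandType::AddressingMode(_) = op {
        true
    } else {
        false
    }
}

/// Illustrative only: extra cycles charged for a memory operand.
/// The numbers are placeholders, not real 8088 timings.
fn extra_cycles(op: OperandType) -> u32 {
    if let OperandType::AddressingMode(_) = op {
        2
    } else {
        0
    }
}
```

Calling is_memory_operand with OperandType::Register8(0) takes the else branch, while any OperandType::AddressingMode(..) value takes the first branch regardless of which mode it carries.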

superfury wrote on 2023-10-05, 13:26:

I see some instruction types effectively execute in 0 cycles? Sometimes they also set that nx variable (judging by the other file with the cycle functions), which adds a single cycle at the start of the EU execution of the next instruction?
But in some cases it doesn't tick any cycles at all and doesn't set nx either, so it effectively executes an ALU or CMP instruction in 0 cycles (not a single cycle timed)! Huh? What's happening there?
See opcode 83h for example: I see it only ticking cycles if the destination operand is memory (if I understand the AddressingMode if-clause correctly), but it doesn't set NX (using "self.cycle_nx();"), so the whole EU execution takes 0 cycles with a register operand!?

I mentioned before I do not execute RNI cycles in execute_instruction(). Part of what RNI does is fetch the next instruction, so that is handled as 'boilerplate' elsewhere; so all the cycle timings in execute are 1-2 cycles shorter than their microcode programs would suggest. Let's look at 83h.

The attachment microcode_83.png is no longer available

You can see that the very first line of microcode execution is tagged with RNI; if there is no memory operand, you are correct, no cycles are executed for it. The next instruction is just immediately fetched.
There is a cycle spent, but it's in the fetching code. Maybe it would have been clearer if I had put those RNI cycles in execute, but I guess I just had to go and be confusing.

superfury wrote on 2023-10-05, 13:26:

Also, I can't use your test suite, since it requires some interpreter and manual text parsing, which UniPCemu doesn't support (unless I write an entire Rust interpreter just for it or include some external interpreter mess, which I'd rather not). My emulator isn't written in Rust but in C/C++ (although the entire code is purely (ANSI) C, for portability to some of the supported platforms).

The test suite has nothing to do with Rust whatsoever. It is JSON.

Several emulator authors have used it now. It's been interesting seeing different approaches.

You can use a JSON parsing library directly (many are listed for C at https://www.json.org/json-en.html )
You can use Python to convert the JSON tests to C code - an interesting approach taken by phix of VirtualXT https://github.com/andreas-jonsson/virtualxt/ … 8v1/generate.py
You can use Python to convert the JSON to some more easily parsed binary format
You can use Python to read the JSON and then manipulate your debugger to execute the test

You're a talented programmer; I have no doubt you'll find an acceptable solution.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 115 of 122, by superfury

Posted on 2023-10-05, 15:09

So it's executing an ALU/CMP instruction in fetching code? That can't be correct, can it? And an ALU instruction (same as CMP) can never take 0 cycles, which is what's happening in said case (the ALU must need at least 1 cycle to shift or add etc. before writing the result to the register)?

Reply 116 of 122, by GloriousCow

Posted on 2023-10-05, 16:17

superfury wrote on 2023-10-05, 15:09:

So it's executing an ALU/CMP instruction in fetching code? That can't be correct, can it?

And an ALU instruction (same as CMP) can never take 0 cycles, which is what's happening in said case (the ALU must need at least 1 cycle to shift or add etc. before writing the result to the register)?

A line of microcode can do two things in a single cycle: some operation, then some other operation. For example, some ALU operation, then an RNI.

I do not execute RNI cycles in execute_instruction(). I don't know how to rephrase that in any way that is clearer.

Therefore, if an ALU op is coming along for the ride on RNI, I don't have to explicitly account for that cycle. It doesn't mean the operation takes 0 cycles. It took 1 - the same cycle we did the RNI on.
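
A toy model of that bookkeeping, in case it helps. Everything here (the struct, the function names, the made-up "ADD AX, imm" instruction and its cycle count) is invented for illustration and is not MartyPC's actual code; it only shows how an instruction can appear to cost 0 cycles in the execute routine while still costing 1 cycle overall.

```rust
/// Hypothetical cycle counter; not MartyPC's real CPU struct.
struct Cpu {
    cycles: u64,
    ax: u16,
}

impl Cpu {
    fn new() -> Self {
        Cpu { cycles: 0, ax: 0 }
    }

    /// Advance the clock by one cycle.
    fn cycle(&mut self) {
        self.cycles += 1;
    }

    /// Execute a made-up "ADD AX, imm" with a register destination.
    /// In this sketch its only microcode line is "do the ALU op, RNI",
    /// so no cycles are ticked here: that one cycle is charged in
    /// `finish_and_fetch_next`.
    fn execute_add_ax(&mut self, imm: u16) {
        self.ax = self.ax.wrapping_add(imm);
    }

    /// Shared boilerplate run after every instruction: this is where the
    /// RNI cycle (and whatever rode along with it) gets counted.
    fn finish_and_fetch_next(&mut self) {
        self.cycle();
    }

    /// Run one instruction and return how many cycles it took in total.
    fn step_add_ax(&mut self, imm: u16) -> u64 {
        let start = self.cycles;
        self.execute_add_ax(imm); // ticks nothing in this sketch
        self.finish_and_fetch_next(); // ticks the shared RNI cycle
        self.cycles - start
    }
}
```

So the execute routine alone reports 0 cycles, but a full step (execute plus the shared fetch boilerplate) reports 1.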

Reply 117 of 122, by superfury

Posted on 2023-10-05, 16:48

So basically all those RNI instructions are calculating results as well for register operands (or just store them)? How come they're merged into essentially the start of the next instruction? Does that RNI happen after the 1st byte fetch or before it?

Reply 118 of 122, by GloriousCow

Posted on 2023-10-06, 15:26

superfury wrote on 2023-10-05, 16:48:

So basically all those RNI instructions are calculating results as well for register operands (or just store them)? How come they're merged into essentially the start of the next instruction? Does that RNI happen after the 1st byte fetch or before it?

An RNI-flagged microcode instruction could do anything. Being the effective end of the instruction, putting the results of some operation in a register is just a common thing to do along with RNI, but it doesn't mean that's all it could be doing. You have to look at the microcode.

superfury wrote on 2023-10-05, 16:48:

How come they're merged into essentially the start of the next instruction?

An RNI signals the end of an instruction, so you would expect the next thing to happen to be the start of the next instruction. An NXT flag even allows the RNI-tagged microcode instruction to execute on the cycle in which the next instruction byte is read out. It's a primitive sort of pipelining - rather than emulate that, I just end instructions on NXT or RNI, whichever comes first.

superfury wrote on 2023-10-05, 16:48:

Does that RNI happen after the 1st byte fetch or before it?

RNI *is* the fetch. It literally means "Read Next Instruction"
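
To make the "end on NXT or RNI, whichever comes first" rule a bit more concrete, here is a small sketch. The MicroLine struct and the example programs are hypothetical and do not reproduce real 8088 microcode, and treating the flagged line itself as belonging to the shared fetch code is my simplified reading of the rule, not a statement of how MartyPC implements it. The function just counts how many microcode lines get charged to the execute routine before the instruction is considered finished.

```rust
/// Hypothetical, simplified microcode line: only the two flags matter here.
#[derive(Clone, Copy)]
struct MicroLine {
    nxt: bool, // NXT flag set on this line
    rni: bool, // RNI flag set on this line
}

/// Number of lines charged to the execute routine: every line before the
/// first one flagged NXT or RNI. The flagged line itself (and anything
/// after it) is left to the shared fetch code in this sketch.
fn cycles_in_execute(program: &[MicroLine]) -> usize {
    program
        .iter()
        .position(|line| line.nxt || line.rni)
        .unwrap_or(program.len())
}
```

With a program whose very first line carries RNI, the result is 0, which matches the "0 cycles in execute" case for a register operand discussed earlier. A program ending in an NXT line followed by an RNI line comes out 2 lines shorter than its full length, and one ending in a lone RNI line comes out 1 shorter.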

Reply 119 of 122, by superfury

Posted on 2023-10-06, 18:11

So basically, I need to do the inverse in my case: tick an extra cycle (NOP cycle on the EU) if an instruction ends without said flag set?
