UniPCemu cycle accurate 8088 implementation

Reply 100 of 198, by superfury

Posted on 2017-04-28, 15:21

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

Here's the link with the full log of the executed cycles during the credits(which still hangs):
https://www.dropbox.com/s/o8b2dzkxbf9ppaj/deb … 7_2054.zip?dl=0

Edit: It jumps back to the top at the end, to offset 0x26C?

Last edited by superfury on 2017-04-28, 15:48. Edited 1 time in total.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 101 of 198, by reenigne

Posted on 2017-04-28, 15:44

reenigne Offline

Rank Oldbie

Rank: Oldbie
Posts: 610
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

I don't see IP going into the weeds in this log...

I found where it's writing to the "mov cl,xx" instruction by searching for "BX: 029C" but I don't see any indication of what value is being read from the stack (and written to memory) by this "POPW [CS:BX]" instruction.

Reply 102 of 198, by superfury

Posted on 2017-04-28, 15:50

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

You're right, I'm missing that information too. I'm now going to look into why it isn't logging that...

Edit: Just after, I see 202B2 being read(B3) and 202B4 being read(8A), but not 202B3(which should be prefetched(07 according to the disassembly)). So, for some strange reason, parts of the log by the BIU is missing...

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 103 of 198, by superfury

Posted on 2017-04-28, 16:27

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

Looking at my own memory access code(3 layers), there's 3 kinds of memory accesses(and layers to travel through. Layer 3 through 1 is the normal data path during valid mapped RAM accesses using Segment:Offset accessing):
1. Actual RAM addresses(non-memory mapped device only, so actual emulated RAM. Doesn't log for memory holes. Uses memory access descriptor type -1(The segment descriptors are all type >=0)). These are logged as "Reading/writing to/from RAM: %08X=%02X (%c)".
2. Paged RAM addresses(including memory-mapped devices). Uses memory access descriptor type -2(Used for anything requiring paging, but not paging translation itself. So this is used for anything by the CPU through Paging. So all normal segment:offset memory addresses and things needing paging(descriptor cache loads, interrupt vectors etc.)). This layer doesn't log anything.
3. Normal MMU accesses. These are the actual Segment:Offset addresses normally used for most operations(instruction fetches, memory fetching using segment:offset, memory writes using segment:offset).

Currently, two data types are logged: data written to and read from RAM itself(Layer 1) and data written to and read from all hardware(Between layer1 and any other higher layers, logging as "Read from memory: %08X=%02X (%c)" and "Writing to memory: %08X=%02X (%c)". Read logging at this level only applies to normal memory accesses currently(so no opcode fetches are logged at this level)).

Since the opcodes aren't all logged(read from RAM itself), could this mean that it's somehow reading from MMIO devices?

Looking at the logs(looking at the MMU read/write cycles), it looks like it's actually reading data from the hardware(reading two bytes, then writing two bytes), but it doesn't log the actual access to/from RAM in any way?
Edit: I've just modified the MMU to also log memory accesses that are opcode fetches. But it's still strange as to why the POPW accesses disappear from the logs. This shouldn't be the case(at least both logs should be there).

Edit: I've just implemented logging at the Paging and Normal layers. So any kind of access from the BIU will be logged now. I'm starting 8088 MPH now...

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 104 of 198, by reenigne

Posted on 2017-04-28, 17:31

reenigne Offline

Rank Oldbie

Rank: Oldbie
Posts: 610
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

superfury wrote:
Since the opcodes aren't all logged(read from RAM itself), could this mean that it's somehow reading from MMIO devices?

Since it should be copying a word from 2004:030D to 2004:029C it should be real RAM!

superfury wrote:
Looking at the logs(looking at the MMU read/write cycles), it looks like it's actually reading data from the hardware(reading two bytes, then writing two bytes), but it doesn't log the actual access to/from RAM in any way?

If the "POP mw" opcode was broken and had no effect, that would account for the crash! I can't think of anywhere else offhand where it's used in 8088MPH - it's not a very common operation.

Reply 105 of 198, by superfury

Posted on 2017-04-29, 07:51

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

Well, when you look at the cycles execution you see that the BIU does execute the operation: First two times 4 memory read cycles(reading from the stack), then two times 4 memory write cycles(writing to modr/m address). The only strange thing is that nothing's logged about the RAM writes themselves. They're both layer 3 accesses, so maybe some kind of faulty abort?

Edit: Sorry for it taking this long. I'm currently on my 10 year old laptop this weekend, so the demo is taking quite a while to get to the point we need logging(been running since this morning and (besides the PC showing it's age due to problems with the power supply, which will make it turn off(because of power loss(w/ possible hard disk/OS damage)) when disconnected from the wall socket. Atm it's at the sprite demo(at 14:39, running for about 2:39 hours at 1% speed atm).). For some reason the task planning of Windows isn't working properly either(I keep entering "CHKDSK /F C:" in the Command prompt(with Administrator privileges), but it won't start and just start Windows 10 when I reboot, so no way of knowing if the hard disk is intact from last power failure). I'll need to wait manually until it gets to the point that I can enable the logging capture(when it reaches the 3D objects part of the demo), so it will break into the debugger when it gets to the point it starts the credits(any sooner and it will break into the wrong points, due to the 8088 MPH loader reusing memory).
Edit: Was busy looking for a new phone, so lost the exact moment to set the breakpoint. I think I'll first need to optimize the emulator more in order to make it fast enough on the slower laptop to be workable(instead of running below 1% for most of the demo, making it unrunnable for some unknown reason). Maybe too much cross-threading(delays)? It reported around 1600 cross-threading synchronization (1ms) lagging on the 4.0GHz i7 CPU. Mostly the renderer update(SDL itself flipping the rendered surface into view), so the SDL_Flip functionality(SDL_RenderSurface to be exact).

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 106 of 198, by superfury

Posted on 2017-05-01, 07:34

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

Looking at the newly generated code(generated just now), it looks like it's somehow disabling the log during execution. Only one memory access is logged(special flag which is set before each CPU exection step(each 1 cycle step)) during the entire instruction, no matter what it's logging(memory reads or memory writes or instruction fetches). So there might be a problem in the logging detection itself here.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 107 of 198, by superfury

Posted on 2017-05-01, 09:08

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

I've just found out something odd: it was only loading the debugger state(to log and to generate debugger information(to log MMU accesses or not etc.)) when starting a new instruction, until the first byte of the instruction was read. After each BIU cycle, this flag is cleared to block hardware(DMA to be specific) to log memory accesses. This caused it to only log the very first instruction fetch(if any) and stop logging immediately after the very first cycle it's read the very first instruction byte into the BIU.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 108 of 198, by reenigne

Posted on 2017-05-01, 09:16

reenigne Offline

Rank Oldbie

Rank: Oldbie
Posts: 610
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

superfury wrote:
the demo is taking quite a while to get to the point we need logging(been running since this morning

Oh wow, I didn't realize that you needed to run the entire demo for each iteration of debugging the credits. That's going to slow down your development cycle quite a lot. If there's no other option I could send you some development tools we used for the demo that means you can start it at any part, but I think this would be a really good point for you to implement state saving/loading (since that'll help you debug your emulator's handling of programs other than 8088MPH).

Reply 109 of 198, by superfury

Posted on 2017-05-01, 10:19

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

Logging works properly now. Here's the full log, with the missing data:
https://www.dropbox.com/s/3tm1gim2sx0km30/deb … 1_1105.zip?dl=0

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 110 of 198, by Scali

Posted on 2017-05-01, 10:29

Scali Offline

Rank l33t

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

I think the credits could be run standalone. You can just start "8088MPH.00D" directly.
This won't work with all parts, because some of them rely on the resident loader.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 111 of 198, by superfury

Posted on 2017-05-01, 11:00

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

Just tried copying 8088MPH.00D to CREDITS.00D and running it. It directly launches into the credits and ends up hanging(as expected).

Now the question: can you see anything going wrong in the log? Everything is emulated cycle-by-cycle(although the DMA states are handled after each BIU(and EU by extension, since it's based on the BIU and giving the BIU timings to work with) cycle and isn't logged in the log).

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 112 of 198, by Scali

Posted on 2017-05-01, 11:44

Scali Offline

Rank l33t

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

What is interesting to note is that the credits also crash on certain real hardware, such as my Commodore PC20-III (real 8088 at 4.77 MHz).
It plays a few notes, shows the initial scroller screen, and then locks up.
We aren't entirely sure why it crashes.
The one issue we knew of is the prefetch buffer with the SMC, but since the PC20-III uses a real 8088, it shouldn't be affected. I actually switched the CPU with a known good 8088, and it didn't make a difference. The CPU from the PC20-III worked fine in another machine, and the known good CPU crashed in the exact same way in the PC20-III.

All we know is that if we replace the CRTC writes with nops, the crash does not happen. But we can't explain it.
Perhaps you're running into that same mysterious issue here.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 113 of 198, by superfury

Posted on 2017-05-01, 12:35

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

With CRTC writes, do you mean the pseudocode that's mentioned in reenigne's post?

http://www.reenigne.org/blog/8088-pc-speaker- … r-how-its-done/

reenigne wrote:
Here is one of the v-instructions from the CRTC update v-instruction routine in 8088 MPH: […]
Show full quote
Here is one of the v-instructions from the CRTC update v-instruction routine in 8088 MPH:

forget 7
startAt 4
MOV_BX_DX
MOV_DX_iw
w 0x3d4
MOV_AX_iw
w 0x990d
OUT_DX_AX
MOV_DX_BX
runV 1
This translates to the 8088 instructions:

mov bx,dx
mov dx,0x03d4
mov ax,0x990d
out dx,ax
mov dx,bx

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 114 of 198, by reenigne

Posted on 2017-05-01, 13:53

reenigne Offline

Rank Oldbie

Rank: Oldbie
Posts: 610
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

superfury wrote:
Logging works properly now. Here's the full log, with the missing data:
https://www.dropbox.com/s/3tm1gim2sx0km30/deb … 1_1105.zip?dl=0

It is, as I predicted, a problem with the prefetching. At line 9729 in the latest debugger.log, the instruction at 029C is "MOVB CL,01". The POP instruction at line 10492 rewrites this as "MOVB CL,02". This instruction should be in the prefetch queue by this time. However, at line 10520, the instruction executed is "MOVB CL,02" (the new one) rather than "MOVB CL,01" (the old one) so it's executing the instruction as it is in memory, not as it is in the prefetch queue.

Reply 115 of 198, by superfury

Posted on 2017-05-01, 14:03

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

So of the instructions in the loop executed immediately before it, one of those instructions has incorrect timings?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 116 of 198, by Scali

Posted on 2017-05-01, 14:28

Scali Offline

Rank l33t

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

superfury wrote:
So of the instructions in the loop executed immediately before it, one of those instructions has incorrect timings?

Looks like it...
Assuming your prefetch buffer works correctly, then apparently mov cl, 01 wasn't prefetched before it was modified.
So either there were too many bus cycles stolen by the instructions above, preventing the prefetcher from doing its magic... or some instructions executed more quickly than they should have, so that prefetching hadn't reached the mov cl, 01 yet.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 117 of 198, by reenigne

Posted on 2017-05-01, 14:39

reenigne Offline

Rank Oldbie

Rank: Oldbie
Posts: 610
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

superfury wrote:
So of the instructions in the loop executed immediately before it, one of those instructions has incorrect timings?

I think the timings of the "POP [CS:BX]" instruction itself are incorrect. Take a look at http://reenigne.homenet.org/sTwTH9fedRosj2a50.sniff.txt, which is essentially the same loop. Execution of the POP instruction starts at line 308. The stack fetches don't start until line 320, so there is time for at least two prefetches (the ones that start on lines 310 and 314) before the stack fetches start. There is also time for a third prefetch at line 328, in between the stack fetches and the result writes (which start on line 334). So by the time the following instruction starts executing at line 342, it's definitely going to be in the prefetch queue no matter what occurred prior to the POP.

As your log isn't laid out cycle-by-cycle like the sniffer logs are, it's hard for me to tell exactly what the problem is, but I suspect that you're doing too much of the POP instruction's work too early in its execution.

Reply 118 of 198, by superfury

Posted on 2017-05-01, 18:45

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5463
Joined: 2014-03-08, 11:25
Location: Netherlands

In UniPCemu's case, the instruction is handled as follows:
1. The cycle following the cycle the instruction bytes and parameters are fetched, execution is started(S point in the sniffer log, can be any T-state).
2. The EU will delay 5 cycles(EA timings for BX address).
3. The Stack request is requested. The BIU will respond with value read on the first T3 cycle(starting the request on the very first following T1 cycle after request).
4. The EU will delay 17-8 (memory access for ModR/M after all) cycles, thus 9 cycles.
5. The EU will request a memory write at the ModR/M location.
6. When the memory request is given OK by the BIU(on T4), the next instruction will start (I-phase, as soon as the prefetch has 1 byte in it).

The BIU starts memory requests on T1 cycle and finishes at T4(Thus the following T1 cycle the EU has it's data).

BIU script:

1		if (((BIU[activeCPU].prefetchclock&3)==3) && (BIU[activeCPU].waitstateRAMremaining)) //T4? Check for waitstate RAM first!
2		{
3			//WaitState RAM busy?
4			--BIU[activeCPU].waitstateRAMremaining; //Tick waitstate RAM!
5		}
6		else //No waitstates to apply?
7		{
8			if (BUSactive==2) {++CPU[activeCPU].cycles_Prefetch_DMA; } //Handling a DRAM refresh? We're idling on DMA!
9			else //Active CPU cycle?
10			{
11				cycleinfo->curcycle = (BIU[activeCPU].prefetchclock++&3); //Current cycle!
12				if (cycleinfo->cycles_stallBIU) //To stall?
13				{
14					--cycleinfo->cycles_stallBIU; //Stall the BIU instead of normal runtime!
15				}
16				else if ((cycleinfo->curcycle==0) && (BUSactive==0)) //T1 while not busy? Start transfer, if possible!
17				{
18					if (cycleinfo->prefetchcycles) {--cycleinfo->prefetchcycles; goto tryprefetch808X;}
19					else if (cycleinfo->iorcycles) { cycleinfo->iorcycles -= 4; BUSactive = BUSactive?BUSactive:1; } //Skip read cycle!
20					else if (cycleinfo->iowcycles && (cycleinfo->cycles<=cycleinfo->iowcyclestart)) { cycleinfo->iowcycles -= 4; BUSactive = BUSactive?BUSactive:1; } //Skip write cycle!
21					else
22					{
23						tryprefetch808X:
24						if (BIU_processRequests(memory_waitstates)) //Processing a request?
25						{
26							BIU[activeCPU].requestready = 0; //We're starting a request!					
27						}
28						else if (fifobuffer_freesize(BIU[activeCPU].PIQ)>=((uint_32)2>>CPU_databussize)) //Prefetch cycle when not requests are handled? Else, NOP cycle!
29						{
30							BUSactive = BUSactive?BUSactive:1; //Start memory cycles!
31							CPU_fillPIQ(); //Add a byte to the prefetch!
32							if (CPU_databussize==0) CPU_fillPIQ(); //8086? Fetch words!
33							++CPU[activeCPU].cycles_Prefetch_BIU; //Cycles spent on prefetching on BIU idle time!
34							BIU[activeCPU].waitstateRAMremaining += memory_waitstates; //Apply the waitstates for the fetch!
35							BIU[activeCPU].requestready = 0; //We're pending a request!
36						}
37					}
38				}
39				else if ((BUSactive==1) && (cycleinfo->curcycle==3)) //Finishing transfer on T4?
40				{
41					BUSactive = 0; //Inactive BUS!
42					BIU[activeCPU].requestready = 1; //The request is ready to be served!
43				}
44				--cycleinfo->cycles; //Decrease the amount of cycles that's left!
45			}
46		}

BIU_processRequests handles one byte(still needs to be extended for word accesses on 8086+) read/written to/from memory or BUS. prefetchclock(0-3) contains the current T-state.

Edit: Why does it wait 8 cycles? Only 5 cycles are documented(when it starts) for EA calculation?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 119 of 198, by reenigne

Posted on 2017-05-01, 19:59

reenigne Offline

Rank Oldbie

Rank: Oldbie
Posts: 610
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

superfury wrote:
Edit: Why does it wait 8 cycles? Only 5 cycles are documented(when it starts) for EA calculation?

This isn't documented anywhere as far as I know...

According to the "8086 family users manual" the best case 8086 timing for this instruction is 17+EA, EA being 5 for [bx]. So that's a total of 22. Since this is an 8088, we have to add another 4 clocks for each word transfer giving 30 cycles. The actual measured value from the sniffer log (counting from the instruction's "I" cycle to the "I" cycle of the next instruction) is 34 cycles. Presumably the other 4 cycles are due to the state of the prefetch queue and the BIU cycle at the start of the instruction (I'm not sure if the theoretical best time of 30 cycles is actually achievable in practice).

Counting the last S cycle as cycle 0 (the point at which the EU has all the instruction bytes) the stack read starts at cycle 8 and then there's 6 cycles between the end of the stack read the start of the store. I'm not sure exactly what the EU is doing during these 8-cycle and 6-cycle delays. It might be executing some microcode. Some of the cycles are to do with the EA decoding (which is presumably 5 cycles). Presumably 1 cycle is for getting the modrm byte from the prefetch queue. Presumably another 4 cycles are the logic of the "POP" operation itself (that's what "POP rw" and "POP segreg" take when you subtract the 4 cycles for an 8086 bus word operation). That leaves another 4 cycles for the part of "POP mw" which is not shared with "POP rw" (possibly something to do with coordinating the various BIU operations and shuffling data around).

Main menu

Common searches