UniPCemu progress

Reply 300 of 710, by superfury

Posted on 2020-11-29, 03:16

superfury Offline

Rank l33t++

Rank: l33t++
Posts: 5822
Joined: 2014-03-08, 11:25
Location: Netherlands

Just managed to fix some bugs in UniPCemu's paging handling. It still crashes when using the EMM386 driver, though (with the paging unit enabled).

So far I've managed to fix some processing bugs with memory locks that occur during paging, paging issues in interrupts (on fetching the IDT entry), and IPS clocking mode memory requests (a bus read request, for example) being accepted by the BIU and then somehow discarded due to the paging handling of the lock variables (which are meant for non-bus memory transfers only, causing issues with I/O transfers when used).
It still ends up at the PUSH EDX instruction that's somehow triple faulting the CPU due to a stack overflow on EMM386's kernel stack?

Edit: Hmmm... I see the EMM386 kernel in protected mode somehow trying to jump to real-mode segment selectors (which it can't, obviously), followed by all kinds of weird things (IRET/interrupt general protection faults and stack faults, which should never occur during normal operation).

Something's definitely going crazy with the new paging locking feature enabled. Can anyone see what's going wrong?
It all starts with the BIU_obtainbuslock() call in paging.c returning 1, which makes the paging unit tell the calling routine (like a stack or memory check for an instruction) that it needs to wait for the paging unit to obtain a bus lock. That's done by returning the value 2 to the caller (which propagates all the way up to the main memory check functions or the segmentWritten function). Some functions return a plain 2; others return the special -2 instead of the usual -1 (which is used for page faults).
Edit: Those seem to happen within segment 48h in kernel mode?
Edit: It eventually occurred to me that some instructions change ESP (some POP instructions) and then modify a segment register. If loading said segment register caused a new paging lock abort (to restart the load once the lock is granted), it would restore ESP to its value from before the instruction started, instead of keeping the ESP value as already adjusted by the pop.
The same issue applied to the RETF instruction, which does roughly the same (pops registers, then writes a segment's new value). By extension, the RETF handling in the segmentation-specific stack return to a lower privilege level required said functionality as well (as it would alter ESP in a non-recoverable way when such a page lock occurred).
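
A minimal sketch of the restart problem described above, for anyone following along. This is not UniPCemu's code: `struct cpu`, `instruction_begin`, `instruction_abort` and `pop_ds_step` are hypothetical names, and the `lock_granted` parameter simply stands in for the result of the paging lock request. The point it illustrates is that once the pop phase has completed, the restart value of ESP has to be committed, so a later "lock pending" abort neither undoes nor repeats the pop.

```c
#include <stdint.h>

enum step_result { STEP_DONE = 0, STEP_PENDING = 2 };

struct cpu {
    uint32_t esp;          /* live stack pointer */
    uint32_t esp_restart;  /* value to restore when the instruction is aborted */
    int phase;             /* 0 = pop not done yet, 1 = segment load outstanding */
    uint16_t popped;       /* value popped in phase 0 */
    uint16_t ds;           /* segment register being loaded */
};

/* Called when an instruction starts: the restart point is the current ESP. */
void instruction_begin(struct cpu *c)
{
    c->esp_restart = c->esp;
    c->phase = 0;
}

/* Called on a "lock pending" abort: roll ESP back to the last committed value. */
void instruction_abort(struct cpu *c)
{
    c->esp = c->esp_restart;
}

/* One attempt at a 16-bit POP DS. Call it again after STEP_PENDING. */
enum step_result pop_ds_step(struct cpu *c, const uint8_t *stack, int lock_granted)
{
    if (c->phase == 0) {
        c->popped = (uint16_t)(stack[c->esp] | (stack[c->esp + 1] << 8));
        c->esp += 2;
        c->esp_restart = c->esp; /* commit: the pop must not be undone or repeated */
        c->phase = 1;
    }
    if (!lock_granted) {
        instruction_abort(c);    /* keeps the post-pop ESP thanks to the commit */
        return STEP_PENDING;
    }
    c->ds = c->popped;           /* the segment write phase */
    return STEP_DONE;
}
```

Without the commit line, the abort would put ESP back to its value from before the instruction while the instruction resumes after the pop phase, which is the kind of mismatch described in the edit above.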

That seems to have fixed the EMM386 loading process. Testing EMS with no$gmb(which uses EMS) seems to run fine?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 301 of 710, by superfury

Posted on 2020-11-29, 11:35

It eventually crashed, because it was causing a pseudo-interrupt when the EIP value reached 0x10000 in the V86-mode. So I left the EXT bit intact, but caused it to throw a #GP fault for invalid CPL rights.

Edit: Just moved all stack-specific ESP committing for page loading to the stack module instead, so it will only be triggered once required to trigger (preventing reloads of it when not supposed to).

Edit: EMM386 seems to be mostly working now. I just noticed something odd: pressing the escape key while the emulator is running eventually causes a pseudo-#GP fault (because it's in Virtual 8086 mode) due to executing at EIP 0x10000. This eventually leads to the kernel code executing an invalid LXS instruction (LDS or LES, I think), which throws a #UD in the kernel mode handler? I don't think that's supposed to happen? The ModR/M byte of it is 0x1E.
Edit: Hmmm... I see the V86 program executing opcode 8E B2, which is an invalid MOV Sreg instruction with an invalid segment register specified?
Edit: Weird. When enabling the new paging locking functionality, it keeps trying to load segment ECC0 in the EMM386 VMM? That's using the DS segment (GRP5 opcode) with the used data segment(DS) being 0xC0?
However, it's weirdly in low RAM instead of upper 1MB RAM?

Reply 302 of 710, by superfury

Posted on 2020-11-29, 19:33

So, with the latest source code, any EMM386-based OS (EMM386 under MS-DOS, Windows 3.x (untested) and 9x (confirmed)) fails to start when the paging unit is enabled (it always crashes and burns with the paging bus locking feature enabled).

The bus lock feature mainly improves paging with multiple cores, because it performs the mandatory (documented) bus locks. Those prevent the other bus masters (only the other CPU and DMA) from locking the bus for the same purpose, or for LOCK-prefixed instructions, until the affected instruction finishes.

Unfortunately, the pending state of the bus lock (waiting for approval by the bus arbitration in the emulator core itself) somehow causes some instruction or mechanism to go awry. The VMM kernel then jumps to or calls a weird segment:offset pair (the main issue being said segment), which causes the kernel (CPL 0) to #GP(selector) fault, causing eventual stack faults that lead to a double and then a triple fault.
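
To make the pending state a bit more concrete, here is a minimal sketch of a bus arbiter and a locked access that reports "pending" to its caller instead of completing. It is an illustration only, not the emulator's implementation: `bus_try_lock`, `bus_unlock`, `paged_read` and the `MEM_*` codes are hypothetical names, and the return convention of `bus_try_lock` (1 = granted) is an assumption of this sketch rather than the convention BIU_obtainbuslock() uses.

```c
#include <stdint.h>

/* Hypothetical result codes for a memory access attempt. */
enum mem_result {
    MEM_OK = 0,       /* access completed */
    MEM_FAULT = 1,    /* a fault was raised; abort the instruction */
    MEM_PENDING = 2   /* bus lock not granted yet; retry the instruction later */
};

#define NO_OWNER (-1)

/* Hypothetical arbiter state: which CPU currently holds the bus lock. */
static int bus_lock_owner = NO_OWNER;

/* Try to take the bus lock for `cpu`. Returns 1 when granted, 0 when it has to wait. */
int bus_try_lock(int cpu)
{
    if (bus_lock_owner == NO_OWNER || bus_lock_owner == cpu) {
        bus_lock_owner = cpu;
        return 1;
    }
    return 0;
}

/* Release the lock at the end of the instruction, but only if `cpu` owns it. */
void bus_unlock(int cpu)
{
    if (bus_lock_owner == cpu) {
        bus_lock_owner = NO_OWNER;
    }
}

/* A locked page-table read: either it completes, or it reports "pending". */
enum mem_result paged_read(int cpu, const uint8_t *memory, uint32_t address, uint8_t *out)
{
    if (!bus_try_lock(cpu)) {
        return MEM_PENDING;   /* nothing has been modified, so a retry is safe */
    }
    *out = memory[address];
    return MEM_OK;
}
```

The important property is that a caller receiving the pending result must leave all architectural state exactly as it was (or at a committed restart point), because the whole access is attempted again once the lock is granted.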

Can anyone look at the protection.c and related CPU modules and see what's going wrong?
It looks like the stack is having issues somehow, but I'm not sure if that's actually the issue? Some check done during segmentWritten's (the main protected-mode segment handling routine) calls somewhere perhaps? DS looks like it's receiving a weird(sub-1MB non-zero) base? Perhaps some issue with that?

The feature can be enabled by uncommenting the small block of code in paging.c, before the prefetch abort. Windows NT 4.0 doesn't seem to have additional issues with it, but EMM386 OSes and drivers (DOS, 9x/3.x) do crash without visible explanation atm, apparently because of some weird data corruption?

Reply 303 of 710, by superfury

Posted on 2020-11-30, 09:01

Found and fixed some more stack and paging bugs related to the new paging locks.

Also changed the small block to enable the functionality to a simple define at the top of the paging module instead of being commented out. So to enable it, simply remove the line-comment before said line at the start of the paging module.

Edit: It seems the issue isn't in the protected mode segmentation logic itself?
Edit: This is what's happening:

The attachment debugger_EMM386_crashingcause_UniPCemu_20201201_1354.7z is no longer available

Edit: Just added a bit of extra logging to the advanced logging methods. It will now log "Paging pending" whenever it's waiting for a paging operation to complete the bus lock.
That should make it possible to find the exact cause of this?

The attachment debugger_EMM386_crashing_UniPCemu_20201201_1400.7z is no longer available

Just improved the log a bit with pending accepted cycles included:

The attachment debugger_EMM386_crashing_UniPCemu_20201201_1748.7z is no longer available

As can be seen, when the paired accept ends up below the instruction itself, it was pending due to the instruction itself dereferencing said memory address. Otherwise, it was due to a prefetch paging operation (which is accepted before the instruction's disassembly is logged).

Reply 304 of 710, by superfury

Posted on 2020-12-01, 19:10

So far I've confirmed that the basic instructions don't seem to cause the issue (because they're written according to the cycle-accurate stepping used in UniPCemu, they can't cause issues directly and should be restartable).
Interrupts themselves also seem fine: I see each of them properly finishing. All interrupts that actually occur are from Virtual 8086 mode to the privilege level 0-mode handler.
So perhaps some issue with the segmentation to be applied somewhere?
Edit: Just compared the running of EMM386 with the non-pagefault version. It doesn't even throw the #GP(ECC0) fault? So there's definitely an error caused by the paging unit somehow, before ending up with the ECC0 in protected mode?

I'm generating the debugger logs with the following parameters in UniPCemu's debugger settings:
Breakpoint: 0000:0000PM
CR3 breakpoint: 118000
Debug mode: Enabled, just run, don't show, ignore shoulder buttons
Debugger log: Only when debugging, common log format

So the ECC0 isn't supposed to have been loaded at all? There's something going very wrong in protected mode with the new paging enabled? Or perhaps somewhere inside the normal Virtual 8086 mode even (which I didn't check)?
Edit: So the last occurrence of "ESP: 00000fdc" seems to be the last occurrence of virtual 8086 mode being ended by a fault or interrupt? It's starting execution at 0048:00000166 in protected mode.

Edit: The following sequence of instructions seems to be responsible for getting the SP register impossibly low? That eventually causes an underflow on the stack?

	RealRAM(p):001389df=03(); RAM(p):001389df=03(); Physical(p):001389df=03(); Paged(p):001389df=03(); Normal(p):000005df=03(); RealRAM(p):001389e0=e9(é); RAM(p):001389e0=e9(é); Physical(p):001389e0=e9(é); Paged(p):001389e0=e9(é); Normal(p):000005e0=e9(é)
0048:000005d2 8B 76 08 mov si,word ss:[bp+08]	RealRAM(r):00142edc=5a(Z); RAM(r):00142edc=5a(Z); Physical(r):00142edc=5a(Z); Paged(r):00142edc=5a(Z); RealRAM(r):00142edd=03(); RAM(r):00142edd=03(); Physical(r):00142edd=03(); Paged(r):00142edd=03()
Registers:
EAX: 000000fa EBX: 000800d8 ECX: 00007601 EDX: 00001a96
ESP: 00000fcc EBP: 00000fd4 ESI: 001232be EDI: 0011a32e
CS: 0048 DS: 00d8 ES: 0050 FS: 0000 GS: 0000 SS: 0058 TR: 0028 LDTR: 0000
EIP: 000005d2 EFLAGS: 00003016
CR0: e0000011 CR1: 00000000 CR2: 00000000 CR3: 00118000
CR4: 00000000
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: 00000000 DR7: 00000000
GDTR: 0000001437800177 IDTR: 000000142f8007ff
CS descriptor: 00009B138400FFFF
DS descriptor: 00009308E810FFFF
ES descriptor: 00009313E800FFFF
FS descriptor: 0000130035F0FFFF
GS descriptor: 0000130029E0FFFF
SS descriptor: 000093141F000FFF
TR descriptor: 00008B13EED4FFFF
LDTR descriptor: 0000000000000000
FLAGSINFO: 0000000000ipfavr0n11oditsz0A0P1c
	RealRAM(p):001389e1=d5(Õ); RAM(p):001389e1=d5(Õ); Physical(p):001389e1=d5(Õ); Paged(p):001389e1=d5(Õ); Normal(p):000005e1=d5(Õ); RealRAM(p):001389e2=05(); RAM(p):001389e2=05(); Physical(p):001389e2=05(); Paged(p):001389e2=05(); Normal(p):000005e2=05(); RealRAM(p):001389e3=83(ƒ); RAM(p):001389e3=83(ƒ); Physical(p):001389e3=83(ƒ); Paged(p):001389e3=83(ƒ); Normal(p):000005e3=83(ƒ)
00:04:15:41.02640: Paging pending!
0048:000005d5 8A 1C mov bl,byte ds:[si]
00:04:15:41.02816: Paging pending: bus locked!
	RealRAM(r):00118003=00( ); RAM(r):00118003=00( ); Physical(r):00118003=00( ); RealRAM(r):00118002=12(); RAM(r):00118002=12(); Physical(r):00118002=12(); RealRAM(r):00118001=40(@); RAM(r):00118001=40(@); Physical(r):00118001=40(@); RealRAM(r):00118000=67(g); RAM(r):00118000=67(g); Physical(r):00118000=67(g); RealRAM(r):0012423b=00( ); RAM(r):0012423b=00( ); Physical(r):0012423b=00( ); RealRAM(r):0012423a=08(); RAM(r):0012423a=08(); Physical(r):0012423a=08(); RealRAM(r):00124239=e0(à); RAM(r):00124239=e0(à); Physical(r):00124239=e0(à); RealRAM(r):00124238=67(g); RAM(r):00124238=67(g); Physical(r):00124238=67(g); RealRAM(r):0008eb6a=63(c); RAM(r):0008eb6a=63(c); Physical(r):0008eb6a=63(c); Paged(r):0008eb6a=63(c); RealRAM(p):001389e4=c6(Æ); RAM(p):001389e4=c6(Æ); Physical(p):001389e4=c6(Æ); Paged(p):001389e4=c6(Æ); Normal(p):000005e4=c6(Æ); RealRAM(p):001389e5=02(); RAM(p):001389e5=02(); Physical(p):001389e5=02(); Paged(p):001389e5=02(); Normal(p):000005e5=02()
0048:000005d7 B7 00 mov bh,00
Registers:
EAX: 000000fa EBX: 00080063 ECX: 00007601 EDX: 00001a96
ESP: 00000fcc EBP: 00000fd4 ESI: 0012035a EDI: 0011a32e
CS: 0048 DS: 00d8 ES: 0050 FS: 0000 GS: 0000 SS: 0058 TR: 0028 LDTR: 0000
EIP: 000005d7 EFLAGS: 00003016
CR0: e0000011 CR1: 00000000 CR2: 00000000 CR3: 00118000
CR4: 00000000
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: 00000000 DR7: 00000000
GDTR: 0000001437800177 IDTR: 000000142f8007ff
CS descriptor: 00009B138400FFFF
DS descriptor: 00009308E810FFFF
ES descriptor: 00009313E800FFFF
FS descriptor: 0000130035F0FFFF
GS descriptor: 0000130029E0FFFF
SS descriptor: 000093141F000FFF
TR descriptor: 00008B13EED4FFFF
LDTR descriptor: 0000000000000000
FLAGSINFO: 0000000000ipfavr0n11oditsz0A0P1c
	RealRAM(p):001389e6=68(h); RAM(p):001389e6=68(h); Physical(p):001389e6=68(h); Paged(p):001389e6=68(h); Normal(p):000005e6=68(h); RealRAM(p):001389e7=b8(¸); RAM(p):001389e7=b8(¸); Physical(p):001389e7=b8(¸); Paged(p):001389e7=b8(¸); Normal(p):000005e7=b8(¸)
0048:000005d9 D1 E3 shl bx,1
Registers:
EAX: 000000fa EBX: 00080063 ECX: 00007601 EDX: 00001a96
ESP: 00000fcc EBP: 00000fd4 ESI: 0012035a EDI: 0011a32e
CS: 0048 DS: 00d8 ES: 0050 FS: 0000 GS: 0000 SS: 0058 TR: 0028 LDTR: 0000
EIP: 000005d9 EFLAGS: 00003016
CR0: e0000011 CR1: 00000000 CR2: 00000000 CR3: 00118000
CR4: 00000000
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: 00000000 DR7: 00000000
GDTR: 0000001437800177 IDTR: 000000142f8007ff
CS descriptor: 00009B138400FFFF
DS descriptor: 00009308E810FFFF
ES descriptor: 00009313E800FFFF
FS descriptor: 0000130035F0FFFF
GS descriptor: 0000130029E0FFFF
SS descriptor: 000093141F000FFF
TR descriptor: 00008B13EED4FFFF
LDTR descriptor: 0000000000000000
FLAGSINFO: 0000000000ipfavr0n11oditsz0A0P1c

It only seems to happen when the new paging method is enabled.
And if you look closely, you can see that the mov to BL doesn't seem to finish executing properly, since the registers aren't dumped for the instruction finishing?
It's indeed not finishing the instruction in this case, as verified by looking at the debugger. The registers are only logged once an instruction completes. Since this 8A instruction doesn't log its registers, that means the CPU is somehow skipping or aborting said instruction once it has started, which is incorrect behaviour?
Edit: It's executing alright. But since the debugger wasn't called when the locked instruction cycle started, it wasn't logged at all! That's because said lock acknowledge cycle-merge was handled without handling the debugger and callback handlers! Thus they would never show up in the debugger output and log.
Edit: The fixed log:
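
For anyone reading the register dump above, the addresses in it can be cross-checked by hand. The sketch below decodes a descriptor and performs the two-level page walk; it assumes the log prints each descriptor as the raw 64-bit value in the standard x86 layout and that paging uses plain 4 KiB pages (no PAE/PSE). The function names are made up for this example and are not UniPCemu's.

```c
#include <stdint.h>

/* Base address of a segment descriptor, given the raw 64-bit value from the log. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | (((d >> 56) & 0xFFu) << 24));
}

/* 20-bit limit field (the granularity bit is not applied here). */
uint32_t descriptor_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16));
}

/* Physical address of the page directory entry used for a linear address. */
uint32_t pde_address(uint32_t cr3, uint32_t linear)
{
    return (cr3 & 0xFFFFF000u) + ((linear >> 22) * 4u);
}

/* Physical address of the page table entry, given the page directory entry. */
uint32_t pte_address(uint32_t pde, uint32_t linear)
{
    return (pde & 0xFFFFF000u) + (((linear >> 12) & 0x3FFu) * 4u);
}

/* Final physical address, given the page table entry. */
uint32_t physical_address(uint32_t pte, uint32_t linear)
{
    return (pte & 0xFFFFF000u) | (linear & 0xFFFu);
}
```

Applied to the dump: the DS descriptor 00009308E810FFFF decodes to base 0008E810 and limit FFFF, so ds:[si] with SI=035A is linear 0008EB6A. With CR3=00118000 the directory entry sits at 00118000 (the log reads 00124067 there), the table entry at 00124238 (read as 0008E067), and the final address is 0008EB6A, which is exactly the sequence of reads logged for the mov bl instruction. Likewise the SS descriptor gives base 00141F00, so ss:[bp+08] with BP=0FD4 lands at 00142EDC.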

The attachment debugger_EMM386_crashing_UniPCemu_20201201_2314.7z is no longer available

Edit: Hmmm... So perhaps the issue of address 000101e3 is somewhere in Virtual 8086 mode instead?
Edit: Virtual 8086 mode log:

The attachment debugger_EMM386_crashing_V86mode_UniPCemu_20201201_2314.7z is no longer available

Edit: Hmmm... One good thing can be seen in the V86 log: Each "pending" paging request also has at least one "bus locked" logged in the same instruction. So the locking itself is at least operating in the same instruction, aborting as it should.
Now to hunt for the offending address and beyond...
Edit: It doesn't read or write said address. I do see some weird #UD(ARPL and the like) at the final instructions before the triple fault occurs because of the VMM being activated?
Edit: Hmmm... I see the offending ECC0 fault (not) happening in the following circumstances:
- Only during prefetches use the paging lock: ECC0 fault doesn't happen.
- During prefetches and instructions not affecting segment registers use the paging lock: fault happens.

So the issue is with some instruction not properly handling its paging abort to wait for a lock when it's supposed to? That's somewhere inside the instruction handling, and has nothing to do with how the segmentation itself handles page faults and interrupts. Some instruction aborts to wait for the lock, but doesn't properly handle being relaunched after the lock has been obtained.
Edit: Just tried filtering the paging lock to only happen when a normal instruction is executing which isn't writing segment registers. The V86 mode programs will end up at a FFxx #UD instruction.
So there's definitely something going wrong in the instruction handling itself, not within the segmentation, interrupt or prefetch part. It's some instruction's execution part itself that's misbehaving somehow.

Last edited by superfury on 2020-12-02, 17:23. Edited 2 times in total.

Reply 305 of 710, by superfury

Posted on 2020-12-02, 16:36

Hmmm... Limiting the locking prefetch to only lock the bus when prefetching makes it boot normally.
So the issue isn't somewhere in the prefetching part, but actually somewhere in the execution part(interrupts, task switches(not happening for this setup), instructions and segment writes)...

Edit: Putting the following condition on the bus locking to be applied makes it properly boot:

if (!(
	(
		(CPU[activeCPU].instructionfetch.CPU_isFetching == 0) &&
		(!(
			(CPU[activeCPU].currentEUphasehandler == &CPU_executionphase_normal) &&
			(
				(CPU[activeCPU].currentOP_handler == &CPU8086_OPCF) ||
				(CPU[activeCPU].currentOP_handler == &CPU80386_OPCF) ||
				(CPU[activeCPU].segmentWritten_instructionrunning)
			)
		))
	) &&
	(CPU[activeCPU].currentEUphasehandler != &CPU_executionphase_interrupt)
)) //Are we fetching an instruction or executing one which isn't a segment write?

That pretty much confirms there is some instruction, which isn't faulting in a segment write function, that is getting the pending paging result and doesn't properly handle the restartability of the instruction!

Reply 306 of 710, by superfury

Posted on 2020-12-03, 07:57

Hmmm... Returning 2 to pend the paging bus lock on instruction handlers only seems to have the effect of ending up with a FFFFh instruction, infinitely faulting? That's just with a handful of instructions?

The filter that's matched so far:
(CPU[0].instructionfetch.CPU_isFetching==0) && (CPU[0].currentEUphasehandler==&CPU_executionphase_normal) && (!(((CPU[activeCPU].currentEUphasehandler == &CPU_executionphase_normal) && ((CPU[activeCPU].currentOP_handler == &CPU8086_OPCF) || (CPU[activeCPU].currentOP_handler == &CPU80386_OPCF) || (CPU[activeCPU].segmentWritten_instructionrunning))))) && (CPU[0].currentOP_handler!=CPU80386_OP59) && (CPU[0].currentOP_handler!=CPU8086_OPC6) && (CPU[0].currentOP_handler!=CPU8086_OP80) && (CPU[0].currentOP_handler!=CPU80386_OP81) && (CPU[0].currentOP_handler!=CPU8086_OP1F) && (CPU[0].currentOP_handler!=CPU8086_OPF7) && (CPU[0].currentOP_handler!=CPU80386_execute_MOV_modrmmodrm32)&& (CPU[0].currentOP_handler!=CPU80386_OP83)&& (CPU[0].currentOP_handler!=CPU80386_OPA3_32)&& (CPU[0].currentOP_handler!=CPU8086_OP83)&& (CPU[0].currentOP_handler!=CPU8086_OP59)&& (CPU[0].currentOP_handler!=CPU8086_execute_MOV_modrmmodrm16)&& (CPU[0].currentOP_handler!=CPU8086_OP1E)&& (CPU[0].currentOP_handler!=CPU8086_OPFF)&& (CPU[0].currentOP_handler!=CPU8086_execute_MOV_modrmmodrm8)&& (CPU[0].currentOP_handler!=CPU80386_OP87)&& (CPU[0].currentOP_handler!=CPU80386_OP0FBA_16)&& (CPU[0].currentOP_handler!=CPU8086_OPE8)&& (CPU[0].currentOP_handler!=CPU8086_OPA7)&& (CPU[0].currentOP_handler!=CPU8086_OPA3)&& (CPU[0].currentOP_handler!=CPU80586_OP9C_16)&& (CPU[0].currentOP_handler!=CPU8086_execute_CMP_modrmmodrm8)&& (CPU[0].currentOP_handler!=CPU8086_OPF6)&& (CPU[0].currentOP_handler!=CPU8086_OP50)&& (CPU[0].currentOP_handler!=CPU386_OP6D)&& (CPU[0].currentOP_handler!=CPU80386_OP0FB6_16)&& (CPU[0].currentOP_handler!=CPU8086_OPA1) && (CPU[0].currentOP_handler!=CPU8086_OP8F) && (CPU[0].currentOP_handler!=CPU8086_execute_ADD_modrmmodrm8)&& (CPU[0].currentOP_handler!=CPU8086_OP55)&& (CPU[0].currentOP_handler!=CPU80386_OPA2_16)

All those OP and execute_*_modrmmodrm* handlers are seemingly involved in the cause of the issue? No other handlers are triggered in the conditional breakpoint until the FFFFh instruction occurs?
Although afaik all those instructions SHOULD be resumable, since they're all cycle-accurate based?
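
As a side note on the bisecting technique itself: a long chain of `!=` comparisons like the filter above is easier to grow and shrink when the excluded handlers live in a table. The following is only a sketch with stand-in names (`handler_is_excluded`, the `demo_*` handlers and the table are all hypothetical); in the emulator the entries would be the real handlers such as CPU8086_OP8F.

```c
#include <stddef.h>

typedef void (*op_handler)(void);

/* Returns 1 when `handler` is in the exclusion list, 0 otherwise. */
int handler_is_excluded(op_handler handler, const op_handler *excluded, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        if (excluded[i] == handler) {
            return 1;
        }
    }
    return 0;
}

/* Stand-in opcode handlers, only here so the sketch is self-contained. */
static int demo_last_op;
void demo_op59(void) { demo_last_op = 0x59; }
void demo_op8f(void) { demo_last_op = 0x8f; }
void demo_opff(void) { demo_last_op = 0xff; }

/* Handlers already cleared of suspicion; add one line per newly matched handler. */
const op_handler demo_excluded[] = { demo_op59, demo_op8f };
const size_t demo_excluded_count = sizeof(demo_excluded) / sizeof(demo_excluded[0]);
```

The lock condition then only needs one call such as `!handler_is_excluded(current, demo_excluded, demo_excluded_count)`, and narrowing the search down is a matter of editing the table.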

Ofc there's also the question about opcode 8F being used correctly as well...
Edit: Just fixed opcode 8F's stack in this case. The other related stack_push/pop special modrm instructions need to be patched as well(FF /6 etc.)
Edit: And after that fixed said operation when page faulting(which isn't supposed to reset the esp of the fault handler's execution).
Edit: Having fixed opcode 8F, the MS-DOS prompt starts successfully and is operational, at least with the filter on instructions executing only(no prefetch, segment write logic or interrupts/task switches).
Edit: The same for full mode.

Reply 307 of 710, by superfury

Posted on 2020-12-03, 18:15

Hmmmm.... Just tried Windows 3.0, 3.1, 3.11 and WFW3.11 on UniPCemu's latest commit, and I notice the following:
- Windows 3.0a in standard mode crashes with a double fault.
- Windows 3.1 in standard mode is unresponsive in graphics mode. Does display the program manager, though.
- Windows 3.11 in standard mode acts the same way as 3.1.
- Windows for Workgroups 3.11 in 386-Enhanced mode acts normally. Changing from the full-screen MS-DOS prompt by Alt-Tab to Windows causes it to hang?

I even reinstalled Windows 3.0a from the CD-ROM on the i430fx, but it still crashes.
Edit: Hmmm... Perhaps an incompatibility with the i430fx BIOS?
Edit: The Slackware 1.01 LILO bootloader also hangs at the first L being displayed?

Reply 308 of 710, by superfury

Posted on 2020-12-06, 13:53

Just found a small bug in the writeback part of the new CMPXCHG instruction. The 16-bit form is supposed to write a 16-bit value in both the zero and non-zero cases, but it was writing only an 8-bit truncated value to the memory address (or to an invalid register) instead of a 16-bit data block in memory or the valid 16-bit register.
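
For reference, a minimal sketch of what the 16-bit writeback is expected to do. The struct and function names are hypothetical, and the arithmetic flags other than ZF (which CMPXCHG sets like a CMP) are left out to keep it short.

```c
#include <stdint.h>

struct cmpxchg_state {
    uint16_t ax;  /* accumulator */
    int zf;       /* zero flag */
};

/* 16-bit CMPXCHG dest, src: compare AX with *dest and write all 16 bits back either way. */
void cmpxchg16(struct cmpxchg_state *s, uint16_t *dest, uint16_t src)
{
    uint16_t old = *dest;
    if (s->ax == old) {
        s->zf = 1;
        *dest = src;      /* equal: store the source operand */
    } else {
        s->zf = 0;
        s->ax = old;      /* not equal: load the accumulator */
        *dest = old;      /* the destination is still written, with its old value */
    }
}
```

In both branches the destination receives a full 16-bit write, which is the part that was being truncated to 8 bits.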

Reply 309 of 710, by superfury

Posted on 2020-12-11, 00:01

Does anyone know more software I can use to verify/diagnose x86 emulation (Pentium II w/o FPU)?

Slackware linux 1.01's (linux 0.99pl12) boot disk and Minix 3.3.0 both crash with page faults somehow?

Reply 310 of 710, by superfury

Posted on 2020-12-18, 18:07

Just implemented a slight multitasking improvement: when loading descriptors for the TR register (with a valid TSS access type (present TSS, either busy or not)) or loading a descriptor which needs to have its accessed bit set (code/data type with the accessed bit cleared (always assumed cleared on 80386 and below)), it will actually lock the bus now.
So that's another simple SMP improvement wrt bus locking. Afaik, all the 80386 cases are now handled. The only thing that isn't handled is the stuff with external caches (not the descriptor/TLB ones), so the L1 cache and beyond. But those aren't emulated at all, so it shouldn't be a problem?
So it's all the cases at https://xem.github.io/minix86/manual/intel-x8 … 80e0ce-259.html , section 8.1.2.1. The only case that isn't done is the acknowledging interrupts case, but this is atomic in UniPCemu anyway and doesn't require direct use of the bus (thus nothing to corrupt). It uses its own specialized way to handle this (a separate bus of sorts, which is handled in one go anyway).
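
A small sketch of the accessed-bit check for the code/data case, under the assumption that the decision is made from the descriptor's access byte alone. The names are hypothetical, and the "always assumed cleared on 80386 and below" shortcut and the TSS busy-bit case are not modelled here.

```c
#include <stdint.h>

#define DESC_ACCESSED 0x01u  /* bit 0 of the access byte for code/data segments */
#define DESC_S        0x10u  /* 1 = code/data descriptor, 0 = system descriptor */
#define DESC_PRESENT  0x80u  /* segment present */

/* Returns 1 when loading this descriptor needs a locked write to set the accessed bit. */
int needs_accessed_writeback(uint8_t access)
{
    return (access & DESC_PRESENT) && (access & DESC_S) && !(access & DESC_ACCESSED);
}

/* The access byte as it should be written back under the bus lock. */
uint8_t mark_accessed(uint8_t access)
{
    return (uint8_t)(access | DESC_ACCESSED);
}
```

For example, an access byte of 92h (present, writable data, not yet accessed) needs the locked writeback and becomes 93h, while 93h needs no write at all.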

Reply 311 of 710, by mr.cat

Posted on 2020-12-18, 19:13

mr.cat Offline

Rank Member

Rank: Member
Posts: 400
Joined: 2020-12-13, 11:56
Location: Finland

superfury wrote on 2020-12-11, 00:01:

Does anyone know more software I can use to verify/diagnose x86 emulation (Pentium II w/o FPU)?

Slackware linux 1.01's (linux 0.99pl12) boot disk and Minix 3.3.0 both crash with page faults somehow?

How much memory did you have for the Linux test? I tested some minimal 32-bit Linux distros (current ones) a while back and found out that many of them don't like to boot at all if you have less than 192MB memory (I don't know the exact limit).
This must be a recent development though, I don't think 192MB was a requirement in the Linux 0.xx days...
Depending on the distro there were some requirements on the CPU (cmov support) but I don't think that's a problem for P-II.

One suggestion would be to try NetBSD, it's quite minimal and should have a good support for i386 (well, why not FreeBSD too while you're at it).
On the FPU testing though, idk.

Reply 312 of 710, by superfury

Posted on 2020-12-18, 20:26

It's a 1GB RAM setup of an i440fx motherboard (the first one in the PCem-X hardware list in the source code).

Reply 313 of 710, by mr.cat

Posted on 2020-12-18, 22:13

OK, should be more than enough... Have you tried to dig in the boot logs to see the exact spot where the crash happens?
NetBSD seems to be less picky on hw, but on the Linux front there's a distro called Slitaz that stood out as perhaps the most resilient booter. Very minimal and very customized, that one.

Reply 314 of 710, by superfury

Posted on 2020-12-19, 14:15

That's a problem: Minix 3.3.0 kernel panics, which doesn't allow viewing any logs(it does say that the buffer contained some APIC or APM text).
Linux 0.99pl12 gets a page fault, which results in a kernel panic. Same issue as Minix, but nothing to retrace other than a process VM number (being 9) and an EIP value.
Edit: Also, UniPCemu's CPUs don't have FPU support. They do have FPU emulation support, though(like Windows uses, using the FPU exception and CR0 EM bits).
Executing such an instruction without the emulation exception handler is simply a NOP. Or #UD for the newer instructions (like the 0F-prefixed ones on the Pentium and up).

Reply 315 of 710, by mr.cat

Posted on 2020-12-19, 14:27

mr.cat Offline

Rank: Member
Posts: 400
Joined: 2020-12-13, 11:56
Location: Finland

superfury wrote on 2020-12-19, 14:15:

mr.cat wrote on 2020-12-18, 22:13:

superfury wrote on 2020-12-18, 20:26:

It's a 1GB RAM setup of an i440fx motherboard (the first one in the PCem-X hardware list in the source code).

OK, should be more than enough... Have you tried to dig in the boot logs to see the exact spot where the crash happens?
NetBSD seems to be less picky on hw, but on the Linux front there's a distro called Slitaz that stood out as perhaps the most resilient booter. Very minimal and very customized, that one.

That's a problem: Minix 3.3.0 kernel panics, which doesn't allow viewing any logs(it does say that the buffer contained some APIC or APM text).
Linux 0.99pl12 gets a page fault, which results in a kernel panic. Same issue as Minix, but nothing to retrace other than a process VM number (being 9) and an EIP value.

Right...I guess it would require some kernel-fu to get to the root cause. I think Linux has some Magic-SysRq key combination that can be used to get some information from a crashed kernel, haven't used that myself though.
EDIT: A list of capabilities here: https://www.kernel.org/doc/html/latest/admin- … uide/sysrq.html
Also, there are several APIC and ACPI related boot parameters that you could play with. It's quite common that physical Linux machines need some of those to be able to boot (or shut down).
Not sure if 0.xx kernels have these.

Reply 316 of 710, by superfury

Posted on 2020-12-19, 15:54

superfury Offline

Rank: l33t++
Posts: 5822
Joined: 2014-03-08, 11:25
Location: Netherlands

mr.cat wrote on 2020-12-19, 14:27:

superfury wrote on 2020-12-19, 14:15:

mr.cat wrote on 2020-12-18, 22:13:

OK, should be more than enough... Have you tried to dig in the boot logs to see the exact spot where the crash happens?
NetBSD seems to be less picky on hw, but on the Linux front there's a distro called Slitaz that stood out as perhaps the most resilient booter. Very minimal and very customized, that one.

That's a problem: Minix 3.3.0 kernel panics, which doesn't allow viewing any logs(it does say that the buffer contained some APIC or APM text).
Linux 0.99pl12 gets a page fault, which results in a kernel panic. Same issue as Minix, but nothing to retrace other than a process VM number (being 9) and an EIP value.

Right...I guess it would require some kernel-fu to get to the root cause. I think Linux has some Magic-SysRq key combination that can be used to get some information from a crashed kernel, haven't used that myself though.
EDIT: A list of capabilities here: https://www.kernel.org/doc/html/latest/admin- … uide/sysrq.html
Also, there are several APIC and ACPI related boot parameters that you could play with. It's quite common that physical Linux machines need some of those to be able to boot (or shut down).
Not sure if 0.xx kernels have these.

Thinking about it, I believe Minix 3.3.0 complained about ACPI. UniPCemu does implement the APIC, but implements neither SMM nor ACPI. Anything signalling SMM (all kinds of interrupts and inter-processor triggers) is ignored by the targeted CPU. The same goes for the i430fx/i440fx APM triggers and causes, which aren't emulated at all apart from the I/O and PCI registers. Those can be written to and read back, but the values are just stored without any SMM interrupts being triggered.
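
To illustrate what "stored and read back" means here, a minimal sketch in C. The names (`apm_regs`, `apm_write`, `apm_read`) and the register count are hypothetical and not taken from UniPCemu's source. The only point is that a write updates the stored byte and never requests an SMM interrupt.

```c
#include <stdint.h>

#define APM_REG_COUNT 2 /* hypothetical: e.g. a control and a status byte */

typedef struct {
    uint8_t regs[APM_REG_COUNT]; /* last value written to each register */
    unsigned smi_raised;         /* stays 0: no SMM interrupt is ever requested */
} apm_regs;

/* Store the value. Returns 1 if the index was handled, 0 otherwise. */
int apm_write(apm_regs *dev, unsigned index, uint8_t value)
{
    if (index >= APM_REG_COUNT) {
        return 0;
    }
    dev->regs[index] = value; /* latch only, deliberately no SMI request */
    return 1;
}

/* Read back whatever was stored. Returns 1 on success, 0 otherwise. */
int apm_read(const apm_regs *dev, unsigned index, uint8_t *value)
{
    if (index >= APM_REG_COUNT) {
        return 0;
    }
    *value = dev->regs[index];
    return 1;
}
```

Software that only checks whether the register holds what it wrote will be happy with this. Software that waits for the SMI handler to do something will not.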


Reply 317 of 710, by mr.cat

Posted on 2020-12-19, 16:14

mr.cat Offline

Rank: Member
Posts: 400
Joined: 2020-12-13, 11:56
Location: Finland

superfury wrote on 2020-12-19, 15:54:

mr.cat wrote on 2020-12-19, 14:27:

superfury wrote on 2020-12-19, 14:15:

That's a problem: Minix 3.3.0 kernel panics, which doesn't allow viewing any logs(it does say that the buffer contained some APIC or APM text).
Linux 0.99pl12 gets a page fault, which results in a kernel panic. Same issue as Minix, but nothing to retrace other than a process VM number (being 9) and an EIP value.

Right...I guess it would require some kernel-fu to get to the root cause. I think Linux has some Magic-SysRq key combination that can be used to get some information from a crashed kernel, haven't used that myself though.
EDIT: A list of capabilities here: https://www.kernel.org/doc/html/latest/admin- … uide/sysrq.html
Also, there are several APIC and ACPI related boot parameters that you could play with. It's quite common that physical Linux machines need some of those to be able to boot (or shut down).
Not sure if 0.xx kernels have these.

Thinking about it, I believe Minix 3.3.0 complained about ACPI. UniPCemu does implement the APIC, but implements neither SMM nor ACPI. Anything signalling SMM (all kinds of interrupts and inter-processor triggers) is ignored by the targeted CPU. The same goes for the i430fx/i440fx APM triggers and causes, which aren't emulated at all apart from the I/O and PCI registers. Those can be written to and read back, but the values are just stored without any SMM interrupts being triggered.

I see, that doesn't seem very likely to cause problems.
But if you want to try it, a typical incantation on Linux would be something like "acpi=off noapic" and maybe "nomodeset" thrown in for good measure.

Btw, I quite like that you've documented your progress with UniPCemu, even if it does tend to turn into a monologue (and this low-level stuff goes way over my head).
It could be very helpful for emu builders following in your footsteps. Or maybe when UniPCemu is "finished", you can gather all that into a book 😁

Last edited by mr.cat on 2020-12-19, 23:05. Edited 1 time in total.

Reply 318 of 710, by superfury

Posted on 2020-12-19, 21:57

superfury Offline

Rank: l33t++
Posts: 5822
Joined: 2014-03-08, 11:25
Location: Netherlands

This is what happens with Minix, directly after starting the kernel's APIC it seems, according to the source code (acpi_init at https://github.com/Stichting-MINIX-Research-F … rch/i386/acpi.c , perhaps after this line: https://github.com/Stichting-MINIX-Research-F … h_system.c#L264 ?):

The attachment 17-Minix panic past ACPI initialization.jpg is no longer available


Reply 319 of 710, by mr.cat

Posted on 2020-12-19, 23:15

mr.cat Offline

Rank: Member
Posts: 400
Joined: 2020-12-13, 11:56
Location: Finland

Right. I'm not familiar with Minix, but it seems there have been somewhat similar issues with Minix and qemu (that same panic spot, but with the system rebooting there).
Customizing the kernel might be a possibility if the kernel boot params don't cut it. If nothing else, just adding some printf's in the right spot could prove useful, it's the poor man's debugger hah 😁

After ACPI is dealt with, the kernel goes after the APIC next. I understand that Minix has very rudimentary/non-existent support for SMP, so this single-CPU init, apic_single_cpu_init(), is probably the default:
https://github.com/Stichting-MINIX-Research-F … 86/apic.c#L1124

EDIT: Since I'm playing around with qemu anyway, I did a test run with the Minix ISO + qemu. Does UniPCemu have a serial port capability?
You need to add the parameter -serial tcp:127.0.0.1:1234,server to the qemu command line, and then telnet to port 1234.
You should then give the Minix kernel some additional boot parameters: cttyline=0 cttybaud=115200 verbose=2

Here's a sample of what it shows with qemu:

$ telnet 127.0.0.1 1234
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
cstart
intr_init(0)
acpi: returning 0x7fe144b as vir addr
acpi: returning 0x7fe0040 as vir addr
acpi: poweroff initialized
APIC disabled, using legacy PIC
main()
initializing asyncm... done
initializing idle... done
initializing clock... done
initializing system... done
...
etc. etc.

So it seems APIC is skipped here.
