UselessSoftware wrote on 2025-03-31, 20:21:
vstrakh wrote on 2025-03-31, 09:25:
So what was the issue with PIC not relocating interrupt vectors?
I think the issue was unrelated and it wasn't even supposed to do that, rather it only services IRQs in real mode.
The PIC always services interrupts, though, in all CPU modes, as long as the CPU allows it (interrupt flag). The host OS does disable (mask) all PIC interrupts when it switches to multiprocessor mode using the local APIC and I/O APIC configuration, though. In that case, the local APIC on the CPU's internal bus and the I/O APIC (replacing or mixed with the PIC, depending on its setup, like virtual wire mode etc.) take over multiprocessor-capable interrupt handling. Newer CPUs (a 486 with external support chips, or a Pentium with its internal per-CPU APIC) can also disable the INTR line itself through APIC registers, either by masking it entirely or by handling the INTR/INTA signals manually using APIC commands. DOS-based Windows versions don't use the (I/O) APIC, though.
If the interrupt mask register stays at 0xFF, IRQ detection might fail, or an early protected mode crash might occur before interrupts are re-enabled after the switch to protected mode.
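As a minimal sketch of that masking behaviour (the names here are illustrative, not any emulator's actual code): an IRQ in the 8259's IRR only reaches the CPU when its bit in the IMR (OCW1) is clear, so an all-ones IMR silences everything.

```c
#include <stdint.h>

/* Returns nonzero when at least one requested IRQ is unmasked.
   irr = interrupt request register, imr = interrupt mask register. */
static int pic_irq_pending(uint8_t irr, uint8_t imr)
{
    return (irr & (uint8_t)~imr) != 0; /* IMR == 0xFF masks all IRQs */
}
```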
Just a simple check, though: are you implementing the descriptor caches and the paging TLB? Windows requires those two caches to work properly. For example, setting CR0.PE doesn't toggle the descriptor cache on or off; it's always on and retains the (un)real-mode-compatible values until CS is reloaded (through a far jump, exception or interrupt). Real mode, for example, simply (partly) loads the descriptor cache with some or all descriptor values, leaving the limit alone in some cases. CS behaves differently from the other registers depending on whether a Pentium or a 486-class CPU is used: newer CPUs ignore some values in real mode, while older CPUs load values into the cache that newer CPUs leave alone. Then the 286 and 386/486 have LOADALL in different formats, plus the sticky PE bit (286), and even a SAVEALL (undocumented; it hangs the CPU after saving, requiring an external reset afterwards due to missing CPU connections). HIMEM, for example, uses LOADALL on the 286 at least to implement unreal mode and access high memory locations; maybe some 386+ software uses it as well (with the BIOS emulating the 286 LOADALL using the 386/486 one).
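To illustrate the real-mode segment load described above (a sketch with made-up field and function names, not UniPCemu's actual structures): only the cached base is recomputed from the selector, while the cached limit and access rights stay whatever they were, which is exactly what makes unreal mode work after returning from protected mode with a 4 GiB limit still cached.

```c
#include <stdint.h>

/* Hypothetical per-segment descriptor cache entry. */
typedef struct {
    uint32_t base;   /* linear base address */
    uint32_t limit;  /* segment limit */
    uint8_t  access; /* access rights byte */
} desc_cache_t;

/* Real-mode segment register load: base = selector * 16.
   Limit and access rights are deliberately left untouched. */
static void realmode_load_seg(desc_cache_t *cache, uint16_t selector)
{
    cache->base = (uint32_t)selector << 4;
}
```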
Windows also requires the Paging TLB to behave properly, causing weird crashes if it doesn't. It also performs relatively odd (late) TLB invalidation in some cases.
Edit: Paging TLB info:
https://blog.stuffedcow.net/2015/08/pagewalk-coherence/
UniPCemu for example provides a 4-way 32-entry TLB, split for 4KB and 4MB/2MB entries (so 64 entries in total on Pentium CPUs and newer).
It keeps a relatively big 1MB (4KB pages) + 2KB (2/4MB pages) lookup table for each CPU (up to four) to speed up lookups, each byte specifying one real TLB entry, with zero meaning "not in the TLB". That's 4MB+8KB of fast lookup data, though, which is quite a lot on the lowest-memory device it supports (only some 20MB of RAM is available on the PSP, for example, so that leaves less than 16MB after subtracting about the same 4MB for the executable itself right now).
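A byte-per-page lookup table like that could be sketched as follows (illustrative names and encoding, assuming "entry index + 1" so that zero can mean "miss"; this is not UniPCemu's actual code):

```c
#include <stdint.h>

/* One byte per 4 KiB page of a 32-bit linear address space:
   2^20 pages -> a 1 MiB table. 0 = not in the TLB, otherwise
   the value is the TLB entry index plus one. */
#define PAGES_4K (1u << 20)

static uint8_t tlb_lookup_4k[PAGES_4K];

static void tlb_map_4k(uint32_t lin_addr, uint8_t entry_index)
{
    tlb_lookup_4k[lin_addr >> 12] = (uint8_t)(entry_index + 1);
}

/* Returns the TLB entry index, or -1 on a lookup-table miss. */
static int tlb_find_4k(uint32_t lin_addr)
{
    uint8_t v = tlb_lookup_4k[lin_addr >> 12];
    return v ? (int)(v - 1) : -1;
}
```

The point of the encoding is that a single array read resolves a lookup; the actual associative TLB only has to be walked when the byte is stale or zero.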
It also uses a doubly linked list pointing to the actual TLB entries to provide fast MRU/LRU services. Basically, MRU is updated by moving a list item to the head of the list. Invalidation is performed by moving an entry from the used (in-use) list to its corresponding free list, so there are actually 4 pointers for each way: one for the in-use (cached) head, one for the in-use (cached) tail, one for the free head and one for the free tail. A simple move is performed by unlinking the entry (updating the head/tail if the entry's previous/next pointer is zero), then adding it to the head of the destination list (used or free).
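The unlink-and-push-to-head operation described above can be sketched like this (structure and function names are my own, not UniPCemu's; each way would own one used list and one free list):

```c
#include <stddef.h>

typedef struct node {
    struct node *prev, *next;
    int payload; /* stands in for the TLB entry index */
} node_t;

typedef struct {
    node_t *head, *tail;
} list_t;

/* Remove a node from its list, fixing head/tail at the ends. */
static void list_unlink(list_t *l, node_t *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Insert a node at the head of a list. */
static void list_push_head(list_t *l, node_t *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n; else l->tail = n;
    l->head = n;
}

/* MRU touch: move the entry to the head of the in-use list. */
static void mru_touch(list_t *used, node_t *n)
{
    list_unlink(used, n);
    list_push_head(used, n);
}

/* Invalidation: move the entry from the in-use list to the free list. */
static void tlb_invalidate(list_t *used, list_t *freelist, node_t *n)
{
    list_unlink(used, n);
    list_push_head(freelist, n);
}
```

Both operations are O(1), which is what makes the per-way head/tail pointer pairs worthwhile.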
Its TLB is fast enough not to show up in profiling on the devkits I used (mostly Visual Studio). Most of the actual overhead is in the physical memory accesses themselves (mostly reads, since those are more common). Perhaps I'll someday add the same kind of caching for memory accesses by the BIU to improve that (it's currently basically a 1-entry TLB each for reads, writes and code fetches, so it has pretty high overhead).
Still getting over 10% speed even with the slow RAM/ROM accesses, though. Protected mode is almost just as fast, due to the TLB and descriptor caches. The only thing adding extra overhead there is mainly protected-mode stuff (interrupts, exceptions, page table walks on TLB misses), but those still pale in comparison with the memory accesses themselves, those being too random (not to mention PCI emulation overhead, and the different (split) memory spaces for certain memory ranges, UMA for example, and their behaviour). So the lookups (actually performed for RAM mapping, much like paging table walks) invalidate themselves a lot, despite 128-bit data caches (though those do speed up things like 16/32/64-bit memory reads and PIQ prefetching in DOSBox-compatible IPS clocking mode, up to max-instruction-length bytes).