Managed to fix the x86 BCD instructions to work properly again, based on the IBMulator's source code.
Somehow the code adapted from Reenigne's XTCE emulator doesn't behave properly wrt flags and results.
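For reference, the documented DAA behaviour can be sketched as below. This is a minimal sketch that follows the pseudocode in Intel's instruction set reference; it is not IBMulator's or UniPCemu's actual code. The `BcdState` struct and the `daa` function name are made up for this example, and SF/ZF/PF updates are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register/flag holder for this sketch (not UniPCemu's types). */
typedef struct {
    uint8_t al; /* accumulator low byte */
    bool cf;    /* carry flag */
    bool af;    /* auxiliary carry flag */
} BcdState;

/* DAA as described by Intel's documented pseudocode (SF/ZF/PF omitted). */
static void daa(BcdState *s)
{
    assert(s != NULL);
    uint8_t old_al = s->al;
    bool old_cf = s->cf;

    s->cf = false;
    if ((s->al & 0x0F) > 9 || s->af) {
        uint16_t sum = (uint16_t)(s->al + 6);
        s->al = (uint8_t)sum;
        s->cf = old_cf || (sum > 0xFF); /* carry out of the low-nibble fixup */
        s->af = true;
    } else {
        s->af = false;
    }

    /* The high-nibble fixup is decided on the ORIGINAL AL and CF. */
    if (old_al > 0x99 || old_cf) {
        s->al = (uint8_t)(s->al + 0x60);
        s->cf = true;
    } else {
        s->cf = false;
    }
}
```

For example, after adding packed BCD 79h + 35h (giving AL=AEh with CF and AF clear), this adjusts AL to 14h with CF set, i.e. 114 in decimal.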
Currently UniPCemu still uses a Dosbox-adapted MPU-401 implementation. Perhaps I should rewrite it from scratch to become a full MPU-401 implementation (using UniPCemu's own functionality). That can also implement other things like recording and all missing modes (don't know how much is missing from the implementation though).
It might not be that difficult to implement?
One nice thing that'd add is accurate timing for the MPU-401 and MIDI IN/OUT being fully timed etc.?
Just improved the various system realtime commands (MIDI clock, MIDI start, MIDI continue, MIDI stop, Active Sense and Reset) to not modify running status. Although reset will clear running status (as state after it isn't documented afaik).
That's of course in preparation for the new MPU-401 code I might be implementing soon, once I have enough time to start on such a major rework (starting from scratch, discarding all the Dosbox MPU-401 code).
It should also help in the handling of the future MPU-401 code, since it requires such behaviour (for those realtime commands to work properly).
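The running status rule described above can be sketched roughly like this. It is only an illustration, not UniPCemu's code: the `MidiParser` struct and `midi_track_status` name are hypothetical. Clearing running status on Reset (FFh) is the choice made in the post; clearing it on System Common/SysEx status bytes follows the MIDI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser state for this sketch; 0 means "no running status". */
typedef struct {
    uint8_t running_status;
} MidiParser;

/* Update the running status for one byte of the MIDI stream. */
static void midi_track_status(MidiParser *p, uint8_t byte)
{
    assert(p != NULL);
    if (byte < 0x80) {
        return;                 /* data byte: uses the current running status */
    }
    if (byte == 0xFF) {
        p->running_status = 0;  /* Reset: cleared here (choice made in the post) */
        return;
    }
    if (byte >= 0xF8) {
        return;                 /* other System Realtime: leave running status alone */
    }
    if (byte >= 0xF0) {
        p->running_status = 0;  /* System Common / SysEx cancel running status */
        return;
    }
    p->running_status = byte;   /* channel message: becomes the running status */
}
```

So a MIDI clock (F8h) arriving in the middle of a Note On message leaves the Note On status intact for the data bytes that follow.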
Implemented a full MPU-401 interface with timing now.
It's currently forced into dumb UART mode on reset (maybe becoming an option setting at a later point in time).
But since it now times the sending/receiving of MIDI data and MPU-401 commands, it breaks the MIDI player inside UniPCemu itself. That player just sends data to the command register (reset and then kick to UART), after which the player channels start banging MIDI data to the data port without waiting for the buffer to become empty. So the song output (bytes and running status too) currently mostly gets discarded on the full buffer 😖.
I'll need to improve the player to properly wait for bit 6 of the status register to clear and interrupt the player stream sending in the meantime. Basically make the player itself interruptible (like a state machine) while keeping timing to sync the next notes etc.
Though extreme stuff like black MIDI might make it hang in theory? Is there anything that MIDI players do to prevent that?
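The interruptible sending described above could look something like the following. It's a minimal sketch under assumptions: the `MidiSender` struct, the `SendResult` enum and `midi_sender_step` are hypothetical names, and the caller is assumed to read the status byte from port 331h and write the returned byte to port 330h itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_STATUS_DRR 0x40u /* bit 6 of the status port: set = not ready for a write */

/* Hypothetical sender state: a buffer of pending MIDI bytes and a cursor. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} MidiSender;

typedef enum { SEND_DONE, SEND_WAIT, SEND_BYTE } SendResult;

/*
 * One non-blocking step of the sender. 'status' is the value just read from
 * the status port. On SEND_BYTE, *out_byte must be written to the data port.
 * On SEND_WAIT nothing is consumed, so the caller can return to the emulator
 * and retry later without losing data.
 */
static SendResult midi_sender_step(MidiSender *s, uint8_t status, uint8_t *out_byte)
{
    assert(s != NULL && out_byte != NULL);
    if (s->pos >= s->len) {
        return SEND_DONE;
    }
    if (status & MPU_STATUS_DRR) {
        return SEND_WAIT; /* buffer still full: try again on a later step */
    }
    *out_byte = s->data[s->pos++];
    return SEND_BYTE;
}
```

Because the cursor only advances when a byte is actually handed out, a dense stream just takes longer to drain rather than overflowing the buffer.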
Improved the MIDI player somewhat, now performing all MIDI output (to the MPU-401 emulated in UART mode, as well as any commands sent to the MPU-401) using automatically timed I/O on ports 330h/331h.
Somehow, it does seem to trigger some extreme speed at the start of any song? Like it's rushing all notes at the start, to stabilize a bit later.
A big-ish CPU bug was fixed. The RSM instruction wasn't faulting properly: it still executed (probably minus the CR0 and CR4 writes) when the CPU wasn't in System Management Mode. So it loaded state from 'SMRAM' at A0000-BFFFF (where the SMRAM aperture is in SMM). But since SMM wasn't active, that range points to video RAM (if enabled, which is the case when Baresifter is running), or no device responds otherwise (reading 0xFF bytes). So it loaded a very weird CPU state, probably leaving CR0/CR4 unaffected if interrupted (which didn't happen while I ran this, since it was in IPS clocking mode, unless external factors like bus locking interrupt it somehow). The #UD was handled first, so the RSM effectively executed directly after the #UD fault, which it shouldn't.
With that fixed, Baresifter is now running onwards. Although it'll take some 2 or 3 days to complete I think (due to only running at roughly 300 KIPS I think). It got to opcode 0F E0-ish in about 18 hours, so a long time is still left to finish it.
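The intended ordering can be sketched as follows. This is only an illustration with made-up names (`Cpu`, `execute_rsm`, and the counter standing in for the SMRAM state load); it isn't the emulator's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU state for this sketch. */
typedef struct {
    bool in_smm;          /* true while in System Management Mode */
    bool ud_raised;       /* set when a #UD fault is raised */
    unsigned state_loads; /* counts SMRAM state restores (placeholder) */
} Cpu;

/* RSM: fault first, and only touch SMRAM when actually in SMM. */
static void execute_rsm(Cpu *cpu)
{
    assert(cpu != NULL);
    if (!cpu->in_smm) {
        cpu->ud_raised = true; /* #UD, and nothing else happens */
        return;
    }
    cpu->state_loads++;        /* placeholder for restoring state from SMRAM */
    cpu->in_smm = false;       /* leaving SMM */
}
```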
Edit: It reached 23 0C F5 00 00 00 00 after 30 hours and 47 minutes. So that's 00-0E, then 0F prefixes and then 10-23h opcode range.
It still needs to run through to FFh, including any and all prefixes. So about 4 or 5 days I think to sift its instructions?
Edit: it finished sifting.
150:54:34:91.05840: >>> Done!
So it took some 50 hours and 54 minutes to sift. So that's roughly 2 days, 2 hours and 54 minutes.
Although that didn't include any prefixes. So I'll need to run it again with all prefixes (66/67h, which are the operand and address size overrides).
Although that quadruples the execution time to sift (to roughly 203 hours and 36 minutes I think, so about 8 and a half days). So it'd need to churn for about a week and a day or so?
Edit: Only doubled it:
197:32:49:97.09392: >>> Done!
97 hours and 32 minutes to finish with apparently only the 66h or 67h or no prefix.
Oddly enough it didn't try the combination prefixes of both active at once? Even with 2 prefixes enabled and both 66h and 67h prefixes enabled?
Although the filtered output (with just 66h and 67h prefixes tested) gives OK results according to the analyzer.
It just complains about the combinations with all the other prefixes (lock, rep([n]e/z), segment overrides), which is obvious, as those combinations are skipped using my configuration settings of the sifter (to save time, as they don't affect the opcode length, which is tested in this case (other than being consumed from the queue of course)).
So the prefetch unit is working properly at least. So the latest problems with UniPCemu's CPU are probably with some other component in the CPU?
With regard to Jazz Jackrabbit 2 crashing after the first cutscene, I got some information now (using DR Watson for Windows NT):
(The attachments 1835-DRwatson1.png through 1839-DRwatson5.png are no longer available.)
Can anyone see anything in there that's incorrect?
Edit: Extracted the game from the disk image and ran it on Windows 10. It ran without problems.
So the issue isn't with Jazz Jackrabbit 2 itself, but in the CPU emulation? Or perhaps NT 4.0?
Although I remember the same issue happening on Windows 9x inside UniPCemu.
Edit: Also discovered a disk image conversion bug where it would transfer 0 bytes infinitely when converting from the sfdimg disk image format to plain static disk images. It doesn't affect the last official release though, as it's in the increased buffering mechanics that the newer versions introduced somewhere in the last year of development of the app.
Improved the ATA-1 hard drive specification a bit.
Now the (custom) sector size is properly reported and the set multiple command's maximum value is adjusted properly according to it.
Also words 80, 81, 82, 85 from ATA/ATAPI-4 are now used on the ATA-1 hard drives (although the ATA-1 specs don't specify anything past word 63, this improves compatibility by specifying the limited instruction set, NOP support and that it conforms to said ATA-1 specification).
Also added reporting for 32-bit drive I/O according to the ATA-1 specification.
Also fixed some issues with the CD-ROM drives not reporting some bits they should (major/minor version for ATA/ATAPI-4 properly filled for SFF-8020i, the RESET PACKET DEVICE/NOP command feature set now properly reported as enabled, one documented bit in word 87 properly set according to the ATA/ATAPI-4 specs).
Edit: Also fixed some text field errors in the ATA/ATAPI drive identification data to parse the length of odd length strings properly, preventing the 0x00 bytes from appearing in the text fields.
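As an illustration of the odd-length string case, here's a minimal sketch of filling an IDENTIFY text field. The function name `ata_fill_string` is hypothetical and this isn't UniPCemu's code. It relies on the ATA convention that the first character of each pair sits in the high byte of the 16-bit word, and it pads with spaces so no 0x00 byte ends up in the field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill an IDENTIFY text field (field_len bytes, must be even) from a C string.
 * The field is a little-endian byte view of the 16-bit words, so the two
 * characters of each word are stored swapped. Short or odd-length strings are
 * padded with spaces instead of 0x00.
 */
static void ata_fill_string(uint8_t *field, size_t field_len, const char *str)
{
    assert(field != NULL && str != NULL);
    assert(field_len % 2 == 0);

    size_t n = strlen(str);
    if (n > field_len) {
        n = field_len; /* truncate strings that don't fit */
    }
    for (size_t i = 0; i < field_len; i++) {
        uint8_t c = (i < n) ? (uint8_t)str[i] : (uint8_t)' ';
        field[i ^ 1] = c; /* swap the two bytes within each word */
    }
}
```

With a 4-byte field and the string "ABC", the buffer ends up as the bytes 'B', 'A', ' ', 'C', which a tool reading word-wise decodes back to "ABC " without any embedded zero byte.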
It seems that the atademo program is kind of successful for interpreting said data at least, showing me if there's an error in the data I make my emulator produce.
Oddly enough, when performing the ID command on my ATAPI-4 CD-ROM drives (also based on the SFF-8020i documentation), it complains about some fields it says aren't correct in the ID data (the IDENTIFY PACKET DEVICE results):
- It says that word 49 should have the DMA bit cleared? Isn't this supposed to be set when the drive is supporting DMA commands? Or does it follow the PCI configuration's DMA port being enabled or disabled (set to port 0)?
It also seems to complain about all kinds of ATA-1 defined values, undefined in ATA-7 or so it looks like?
After looking into the newer 486+ instructions, I found a 486+ bug:
The XADD instruction has its destination and source operands in reversed order (basically adding dest to source instead of the other way around).
Basically the opcode information has it specified as DST,SRC but the instruction parses the second operand (thus SRC) as DST of the instruction, which it shouldn't do.
I'll need to fix that in a new commit once I got the time to implement that.
Edit: Fixed.
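For reference, the documented operand roles of XADD can be sketched like this (a 32-bit version with the flag updates omitted; `xadd32` is just a name for this example, not the emulator's function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * XADD dest, src (32-bit, flags omitted):
 *   temp = src + dest; src = dest; dest = temp
 * dest is the first (r/m) operand, src the second (register) operand.
 */
static void xadd32(uint32_t *dest, uint32_t *src)
{
    assert(dest != NULL && src != NULL);
    uint32_t sum = *dest + *src; /* wraps modulo 2^32 like the real ADD */
    *src = *dest;                /* the source receives the old destination */
    *dest = sum;                 /* the destination receives the sum */
}
```

With dest=5 and src=3 this leaves dest=8 and src=5. With the operands reversed, the sum would land in the register and the old register value in memory, which is the bug described above.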
Last edited by superfury on 2025-03-10, 17:22. Edited 1 time in total.
So BSWAP on 16-bit general purpose registers brings the upper 16 bits, byte-swapped, into the low bits (basically reading the 32-bit register, swapping 32 bits, writing the lower 16 bits of the result into the 16-bit register)?
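To make the behaviour being asked about concrete, here's a small sketch of exactly that hypothesis. Note that Intel documents the result of BSWAP on a 16-bit register as undefined, so this only models the question, not confirmed hardware behaviour. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 32-bit byte swap. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/*
 * Hypothesis from the question: read the full 32-bit register, swap all
 * 32 bits, then write only the low 16 bits of the result back (the upper
 * half of the register is left untouched by a 16-bit write).
 */
static uint32_t bswap16_as_described(uint32_t reg32)
{
    uint32_t swapped = bswap32(reg32);
    return (reg32 & 0xFFFF0000u) | (swapped & 0x0000FFFFu);
}
```

Under that hypothesis EAX=12345678h would become 12343412h: the upper word 1234h shows up byte-swapped as 3412h in AX.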
Found some issues with the SYSEXIT instruction as well, as it shouldn't affect any flags.
It previously cleared the Virtual 8086 mode and interrupt flags, which it shouldn't (probably because it was a modified copy of the SYSENTER code).
Found that the Pentium Pro NOP (opcode 0F0D) and HINT_NOP (0F18 through 0F1F) undocumented opcodes did not properly raise a #UD fault for undefined cases (like /0 and below /4 for those cases). That's now properly implemented.
OK. During the Windows 2000 boot process from the installation CD-ROM I see the following new instructions being called:
CMPXCHG dword
WRMSR 8b
RDMSR 8b
WRMSR 10
RDTSC
WRMSR 10
CMPXCHG8B (equal) x2
XADD dword x3 ...
WRMSR 10
" 4c
" 174 =8
" 176 =807ec1e0
" 175 =0
No new instructions after those?
The primary master hard drive's last sector read was sector 1. Its IRQ is lowered. It's selected. Busmastering DMA has status 24h, all other registers cleared.
The CD-ROM is reporting its signature. Its last read LBA was 10h (sector 16). Its IRQ is lowered. Busmastering DMA has status 04h with all other registers cleared (00h).
I'm wondering a bit about something. Since the busmastering DMA both have bit 2 set (which indicates an IRQ), is it raised for any IDE IRQ raise? Or just when DMA is transferring?
OK. After some more updates to the hardware (minus the ATAPI result phase not loading the task file with the signature (for the cylinder high, cylinder low and sector number fields)), I see Windows 2000 starting to access the PCI configuration space for the IDE hard drive controller.
I see it reading byte 0 of the IDE configuration space (of the PIIX3 IDE controller).
It reads the entire PCI ID of the controller (8086:1237).
It then selects register 04 of the i440fx host bus.
It reads 06008002h from it. (the command&status words)
It selects register 08 of the host bus.
It reads 06000000h from it (the class code&revision ID).
It selects register 0C of the host bus.
It reads 00002000h from it (header type, caching etc.).
It selects register 10h of the host bus (the first BAR range).
It reads zeroes from it.
It selects register 14h.
Also zeroed read.
Register 18...
Also zeroes read.
Register 1C...
More zeroes.
Register 20...
Again zeroes.
Register 24...
"
Register 28...
"
Register 2C...
"
Register 30...
"
Register 34...
"
Register 38...
"
Register 3C...
"
It disables the PCI configuration space.
It enables the PCI configuration space to point to the controller again.
It reads the ID again.
It selects register 04...
It reads its command/status.
It selects register 08...
It reads 06000000h.
It selects register 0C...
It reads 00002000h.
It disables the PCI configuration space.
It enables the PCI configuration space to point to the controller again.
It reads its ID.
Register 04...
It reads it.
Register 08...
It reads it.
Register 0C...
It reads it.
Disable/enable again.
It reads the ID.
It eventually moves over to the IDE controller.
It reads its ID.
It reads its class code/revision ID.
It reads its cache line/header type.
It reads its BARs and other registers.
It then reaches the final device on the bus (the PCI-to-ISA adapter where the video card is located) and reads all of its data.
It returns to the host controller again.
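The register selects in the walk above go through the standard PCI configuration mechanism #1, where a dword written to port CF8h picks the bus/device/function/register and the data is then accessed through port CFCh. A minimal sketch of building that address dword follows. The helper name is made up, and the device/function numbers in the example below are arbitrary, not taken from the trace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the dword written to port CF8h (PCI configuration mechanism #1).
 * Bit 31 enables configuration access; writing a value with bit 31 clear
 * is the "disable" step seen between the scans.
 */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func, uint8_t reg)
{
    assert(dev < 32 && func < 8);
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)dev << 11)
         | ((uint32_t)func << 8)
         | (uint32_t)(reg & 0xFCu); /* registers are selected dword-aligned */
}
```

Selecting register 04h of bus 0, device 0, function 0 gives 80000004h; register 08h of a hypothetical device 1, function 1 gives 80000908h.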
I then see it invalidating E10A9000...
E111E000...
E111F000
E111E000
E111F000
FC75000
FC4E000
FC9C9000
...
E1127000
non-responding port 3C4 (16-bit, obvious, because it's on a PCI bus only supporting 8-bit).
"
I/O port 330h read not responding. Odd. That's the MIDI implementation (if it's enabled).
Same for 334h (no 3rd MIDI device).
234h 8-bit read fail.
134h too.
130h too.
230h too.
35Fh too.
140h too.
101-141h etc. (all ending in xx1h).
Page E1129000
Page FC199000
FC1A9000
FC7E6000
E112D000
E114D000
E114E000
E114F000
E114E000
I/O to port 3B4 not responding.
3B5 too (input).
3D4 not responding (write).
...
Edit: interestingly, FC90E63C seems to contain some information (just a bit):
D\x00F\x00\x80\x38\x80\x38\x12\xE1\\ArcName\%s\x00
(converting the ASCII characters into simple 8-bit C-style escaped string, followed by a zero string termination?).
So it looks like some data(44004600803812E1) followed by some ArcName sprintf specifier?
Is there some kind of resolution of an arcname supposed to be happening that doesn't? Or sprintf failing to be called?
Edit: Looking into memory, C0000040 seems to contain some kind of lookup table? I see what looks like some addresses? Or perhaps that's the page table? The entries all start with 67h and end with 01h. Before 01h there's 9F, A4 and A5h, another row 78,B8,F8,38 (byte index 1).
It's the memory range of C0000040-C00000CF.
It's probably the PDE table I think?
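To check what a low byte of 67h means, a paging entry can be decoded as sketched below (32-bit non-PAE format, 4KB pages). The struct and function names are made up for this example, and the sample value 019F7867h is only assembled from the byte values mentioned above as an assumption about how they combine. Bit 6 is only defined as 'dirty' for entries that map a page; in a directory entry pointing to a page table it has no such meaning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a 32-bit (non-PAE) paging entry. */
typedef struct {
    bool present;   /* bit 0 */
    bool writable;  /* bit 1 */
    bool user;      /* bit 2 */
    bool accessed;  /* bit 5 */
    bool dirty;     /* bit 6 (meaningful for entries that map a page) */
    uint32_t frame; /* bits 31..12: physical base of the page or page table */
} PteInfo;

static PteInfo decode_pte(uint32_t entry)
{
    PteInfo info;
    info.present  = (entry & 0x001u) != 0;
    info.writable = (entry & 0x002u) != 0;
    info.user     = (entry & 0x004u) != 0;
    info.accessed = (entry & 0x020u) != 0;
    info.dirty    = (entry & 0x040u) != 0;
    info.frame    = entry & 0xFFFFF000u;
    return info;
}
```

For 019F7867h this yields present, writable, user, accessed and dirty all set, with the frame at physical 019F7000h.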
mr.cat wrote on 2025-03-11, 08:53: Believe it or not, M$ does sometimes document their doings, as seen here:
https://learn.microsoft.com/en-us/windows-har … ble-bo […]
The PCI IDE ID in this case (the onboard one that's used to boot Windows and contains both the CD-ROM (secondary master) and HDD(primary master)) is 8086:7010. Base and sub-class are set to 01h.
One little question: is the i440fx PIIX's ProgIF supposed to be writable? So are bits 0/2 supposed to be modifiable (which is probably indicated by bits 1/3 respectively being forced set)?
Did some more deep diving into the page faults and invalidations. I see the following happening (not every fault/invalidation is listed though):
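The reading in that question (bits 1 and 3 acting as 'programmable' indicators for the mode bits 0 and 2) matches the generic PCI IDE programming interface layout, and could be modelled as a write mask like this. It's a sketch under that assumption only; whether the PIIX itself sets bits 1/3 isn't settled here, and `progif_write` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Apply a write to the PCI IDE programming interface byte:
 *   bit 0: primary channel mode, writable only if bit 1 is set
 *   bit 2: secondary channel mode, writable only if bit 3 is set
 * All other bits are treated as read-only in this sketch.
 */
static uint8_t progif_write(uint8_t current, uint8_t written)
{
    uint8_t mask = 0;
    if (current & 0x02u) {
        mask |= 0x01u; /* primary mode bit is programmable */
    }
    if (current & 0x08u) {
        mask |= 0x04u; /* secondary mode bit is programmable */
    }
    return (uint8_t)((current & (uint8_t)~mask) | (written & mask));
}
```

With a current value of 8Ah, writing FFh gives 8Fh; with 80h (no indicator bits set), any write leaves it at 80h.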
7fff0000 (pf)
IDE PCI 0
IDE PCI 4
IDE PCI 8
IDE PCI C
IDE PCI 10
...
e1069000 (pf)
e106a000 (...)
e106b000
e106c000
e106d000
e106e000
e1075000 x3
e1076000
e1077000
e1078000
e1079000
e107a000
e107b000
e107c000
e107d000
e107e000
e107f000
e1080000
(...)
e1081000
e10a3000 (^\-stosd)
-e10a9000 x2
INVLPG e10a9000
e10a9000
e10aa000
-e1114000
e1150000
e114e000
e114f000
fc19a9a4
fc1d14e0
fc1d26e4
fc1c9204
fc1abfb6
fc1a53a0
fc1a86ec
fc252d4c
fc1b32ef
fc1d0eac REP SCASW
fc1b0695
fc1ce234 MOV (pf-np)
fc1cdaa4 opcode 80
e114f000
clear TLB by MOV CR3
fc1aabbe fetch np
fc253208 A1 (32 bits oper/addr)
again e114e000
Then a blue screen (BSOD) is shown: STOP: 0x0000007B (0xFC90E63C,0xC0000034,0x0...,0x0...)
As a side note, I also saw some interesting things:
opcode FFh /2 page fault on addr fc1ca598
opcode A1 16-bit operand with 32-bit address size PFaddr:7ffe02d0 not present
Just found another small bug. On 80486 and higher CPUs, the non-INVLPG instructions on opcode 0F01 were falling back to the 80286 version when the 16-bit version was used. But they require the 80386 version instead, which overrides both 16-bit and 32-bit (with fallback to 16-bit in some specific cases). This mostly matters for SMSW though.