Just tried the Supersoft/Landmark diagnostics ROM (1.2.0) from minuszerodegrees again. Everything seems to pass except the U126 keyboard controller test and the protected mode CPU test. The usual cause of the latter is the IRET giving a #GP fault (when it apparently shouldn't) due to an invalid segment register load. Or maybe a task switching problem? The normal BIOS passes the PS/2 keyboard/8042 checks correctly, but this diagnostics ROM fails either or both of them?
Edit: I just noticed it only uses the 8042, not the keyboard itself. So there's a problem with the 8042?
Oddly enough, sometimes dreams themselves can find faults (if they're related!). In this case, the dream was about a problem with FDC port 3F4, which after some searching led me to a bug in both my emulation and the osdev.org article describing the BAR1/3 values of its PCI IDE. The article describes values 0/1 becoming 3F6/376, while it's actually 3F4/374 (according to https://forum.osdev.org/viewtopic.php?f=1&p=167798 ). So the osdev article has this little fact backwards (forgetting to subtract 2 for that register!). Now implementing that into my emulation.
Edit: Implemented it, but nothing changes from the software's perspective. The end result is the same either way, just with different non-precalculated base ports: n instead of n+2, with the +2 (and the -2 when decoding) moved into the port scanning/decoding part.
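To illustrate the base-versus-register distinction described above, here is a minimal sketch. The function and table names are made up for this example and are not taken from the emulator; it assumes that a BAR1/BAR3 value of 0 or 1 selects the legacy base (3F4/374) and that the device control/alternate status register sits at base + 2.

```python
# Sketch only: all names here are illustrative, not the emulator's own.
LEGACY_CONTROL_BASE = {"primary": 0x3F4, "secondary": 0x374}
CONTROL_REG_OFFSET = 2  # device control / alternate status register at base + 2


def control_block_base(bar_value, channel):
    """Return the control block base port for BAR1 (primary) or BAR3 (secondary)."""
    if bar_value in (0, 1):
        # Unprogrammed BAR: use the legacy base (n, not n + 2).
        return LEGACY_CONTROL_BASE[channel]
    # Programmed I/O BAR: bit 0 flags I/O space and bit 1 is reserved.
    return bar_value & ~0x3


def control_port(bar_value, channel):
    """Port the guest actually accesses for device control / alternate status."""
    return control_block_base(bar_value, channel) + CONTROL_REG_OFFSET


def decode_control_port(port, bar_value, channel):
    """Map an accessed port back to its offset inside the control block."""
    return port - control_block_base(bar_value, channel)


print(hex(control_port(0, "primary")))    # 0x3f6
print(hex(control_port(1, "secondary")))  # 0x376
```

Either way the guest still sees 3F6/376, which matches the observation that nothing changes from the software's perspective.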
Just thinking about something... Could the reason the FDC worked somewhat on the AT before be because the FDC was properly preconfigured using GSETUP? I was trying to get it working with that(720K setting?)? That might help with the AT DIAGNOSTICS disks?
Booting from Dos 5.0a, running checkit 720K checks out properly during the read test. But trying the 1.44MB disks(B-drive) seems to fail?
It tells me that cylinders 2 and 4 (head 0), 62 (heads 0 and 1) and 68 (head 0) fail with type MAJOR and the note "Drive Not Ready".
As far as I can see, that can only happen when seeking takes too long (random sector reads moving the head over too great a distance to finish in time, due to CPU timing issues?)?
So it's the same issue the AT had? Easily confirmed reading two tracks far apart after each other(e.g. first track 0 and then track 62(which times out))?
It didn't take much to create a simple program that, for each track on an 80-track disk (tracks 0 to 79), executes a drive reset and then reads the first sector on said track. Any errors are printed on screen. Since it resets the controller (thus recalibrating) before each read, every read has to seek the full distance from track 0, allowing FDC errors to be easily spotted.
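The loop of that test program can be sketched roughly as follows. This is not the actual DOS program: `reset_drive` and `read_sector` are hypothetical callables standing in for INT 13h AH=00h (reset) and AH=02h (read sectors), and the fake drive at the bottom exists only to make the sketch runnable.

```python
def scan_tracks(reset_drive, read_sector, drive=1, tracks=80):
    """Reset, then read CHS (C,0,1) for every cylinder; collect failures.

    reset_drive(drive) and read_sector(drive, cylinder, head, sector) are
    hypothetical stand-ins for the BIOS calls; read_sector returns an
    INT 13h style status byte (0 means success).
    """
    failures = []
    for cylinder in range(tracks):
        reset_drive(drive)  # recalibrates to track 0, so each seek starts there
        status = read_sector(drive, cylinder, 0, 1)
        if status != 0:
            failures.append((cylinder, status))
    return failures


# Fake drive for demonstration only: cylinder 62 reports 0x80 (not ready).
def fake_reset(drive):
    pass


def fake_read(drive, cylinder, head, sector):
    return 0x80 if cylinder == 62 else 0x00


for cylinder, status in scan_tracks(fake_reset, fake_read):
    print(f"cylinder {cylinder}: error {status:#04x}")
```

Drive number 1 corresponds to the B drive, which is what the real program is hardcoded for.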
Just ran my little program on the 1.44MB B drive(for which it's hardcoded) containing a simple testdisk(just some random 1.44MB MS-DOS 6.22 disk). All seeks(and corresponding reads of the first sector) check out correctly on the Compaq Deskpro 386 emulation! 😁
So that actually means that the BIOS FDC handling works without problems (at least for a basic first read of the first sector on each track after a reset). The application itself just performs a simple test for all 80 tracks: perform a disk controller reset for the B-drive (which resets the drive and recalibrates it to track 0), then read one sector, CHS C,0,1, from the selected track. 😁 But if the CheckIt! diagnostics software is failing somehow, then why is that happening? Why do some of the random read accesses fail? It cannot be because the FDC times out, since handling that is still the BIOS's job?
Just tried booting a Dutch DOS 6.22 setup disk (disk 1) again. The setup seems to start without problems, but when detecting the hardware, it once again seems to crash into offset FFFF with an IRET? So there's definitely something going on there? That's not an FDC bug, but an actual CPU bug?
The problem is, are there any methods left to find the bug in the remaining CPU opcodes(ALL 80386 opcodes) not tested with the test386.asm testsuite?
Just found a little bug in the 32-bit popping of CS during a RETF, causing it to only pop a word instead of a dword. It now does two dword pops instead of a dword for EIP and a word for CS, which resulted in a misaligned stack.
Edit: It seems to have been correct before after all. It's thus restored to its previous state.
I'm currently drawing a blank on what's causing the odd jump during the different setups (Win95 as well as DOS 6.22). Earlier it worked, but somewhere along the line it started failing (might be due to who knows which commit). Since the last test of the DOS 6.22 setup was long ago, I also don't know which change introduced the crash during setup. Anyone? Somewhere during "checking your system configuration" it goes off into the bushes?
Just tried running the MS-DOS 6.22 setup on the XT NEC V30 configuration, which seems to end up in a "System error: 20<Newline>Continue?" from the Turbo XT BIOS v3.0, after which it performs a seemingly limited hardware reboot? So there's something awfully wrong there?
Edit: That error apparently is a Bad Expansion ROM Checksum after soft-reset?
Can anyone find the relevant errors? Jepael? Vladstamate? It's in the common log format, of course. It ends up back running in the Turbo XT BIOS v3.0 code, maybe part of the invalid jump? Maybe part of the cause?
Anyone? Jepael? Vladstamate?
Edit: At line 29544349 it seems to still be executing the program? Somewhere in there or near there the error occurs?
I've managed to track it down to line 29547644, which is the last point it logs before rebooting? It seems to request a verify sectors operation from the FDC and/or DMA controller? It's calling interrupt 13h from 2403:00000016, executing function 04h (verify sectors). Maybe a DMA problem?
Reducing the machine to an 8088 XT seems to aggravate the situation: it now reports error 65?
Edit: This seems to have been due to multiple keypresses during booting?
Edit: Since it's even crashing on the basic 8088 emulation, that means that there's probably a problem in the base 808X core? But what's going wrong?
Can anyone see what's going wrong? Reenigne? Jepael?
Edit: Just ran the 80186 testsuite again (it was actually originally recorded on an 80286, according to the flags dumps). Three modules fail the tests:
- Shifts: All give 'incorrect' values in the low byte of the flags, on bits 7, 6 and 3? The test386.asm testsuite checks out on the 80386 emulation.
- BCDCNV: test386.asm checks out. 80286 fails on flags AAA(bit 7, 6, 4, 2), AAS(bit 7, 6), DAS(bit 11)?
- MUL: AAD 39h(bit 4, 11 set when expecting cleared), AAD 12h(bit 4, 0 set when expecting cleared)?
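To make the bit numbers in the list above easier to read, here is a small helper that maps FLAGS bit positions to their names. It is only a sketch: the bit positions are the standard x86 FLAGS layout, the "undefined" sets follow Intel's documented behaviour for AAA, AAS, DAS and AAD, and the function names are invented for this example. Whether the testsuite masks undefined flags is not something this sketch can tell.

```python
# Standard x86 FLAGS bit positions (bits 1, 3 and 5 are reserved).
FLAG_NAMES = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
              8: "TF", 9: "IF", 10: "DF", 11: "OF"}

# Flags Intel documents as undefined after each instruction.
UNDEFINED_FLAGS = {
    "AAA": {"OF", "SF", "ZF", "PF"},
    "AAS": {"OF", "SF", "ZF", "PF"},
    "DAS": {"OF"},
    "AAD": {"OF", "AF", "CF"},
}


def flag_name(bit):
    """Name a FLAGS bit; anything outside the table is reserved or another field."""
    return FLAG_NAMES.get(bit, f"bit {bit} (reserved/other)")


def classify(instruction, bits):
    """Label each mismatching bit as documented-defined or undefined."""
    undefined = UNDEFINED_FLAGS.get(instruction, set())
    result = []
    for bit in bits:
        name = flag_name(bit)
        result.append((name, "undefined" if name in undefined else "defined"))
    return result


# Bit numbers as reported in the list above.
for instruction, bits in {"AAA": [7, 6, 4, 2], "AAS": [7, 6],
                          "DAS": [11], "AAD": [4, 11, 0]}.items():
    print(instruction, classify(instruction, bits))
print("Shifts", [flag_name(bit) for bit in (7, 6, 3)])
```

Most of the reported BCD and AAD mismatches land on flags that are documented as undefined, so they may reflect undocumented 80286 behaviour rather than a plain logic error. AF after AAA and bit 3 after the shifts do not fall into that group.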
Just found a possible bug on the 80186+ LEAVE instruction. This was related to (E)SP being reset while popping (E)BP. This now works as it should.
Edit: Now FreeDOS seems to crash with a #UD exception? So there's now something (else?) going very wrong?
Edit: Looking at the exception being triggered, it's actually an ARPL instruction executed from real mode, which causes a #UD exception because ARPL isn't supported in real mode? So maybe a problem with setting up protected mode in FreeDOS? The instruction read is 636520, which is an ARPL instruction?
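As a sanity check on that decode, here is a minimal sketch that breaks the bytes down by hand. It assumes 636520 means the hex bytes 63 65 20 and that 16-bit addressing is in effect (real mode, no prefixes); the function name is made up for this example.

```python
# 16-bit ModR/M effective address bases and 16-bit register names.
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_arpl(code):
    """Decode opcode 63 /r (ARPL r/m16, r16) using 16-bit addressing."""
    if code[0] != 0x63:
        raise ValueError("not an ARPL opcode")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    src = REG16[reg]
    if mod == 3:  # register destination
        return f"ARPL {REG16[rm]}, {src}"
    if mod == 0 and rm == 6:  # direct 16-bit address
        addr = code[2] | (code[3] << 8)
        return f"ARPL [0x{addr:04X}], {src}"
    if mod == 0:  # no displacement
        return f"ARPL [{RM16[rm]}], {src}"
    if mod == 1:  # sign-extended 8-bit displacement
        disp = code[2] - 0x100 if code[2] & 0x80 else code[2]
    else:  # mod == 2: 16-bit displacement
        disp = code[2] | (code[3] << 8)
    sign = "-" if disp < 0 else "+"
    return f"ARPL [{RM16[rm]}{sign}0x{abs(disp):X}], {src}"


raw = bytes.fromhex("636520")
print(decode_arpl(raw))          # ARPL [DI+0x20], SP
print(repr(raw.decode("ascii")))  # 'ce '
```

The same three bytes are also the printable ASCII text "ce ". That may be a coincidence, but it could also hint that execution ended up in text data rather than in real code.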
Edit: Now dealing with the weirdness of the Compaq Deskpro accessing the FDC, regarding invalid track accesses and detection.
It seems to somehow seek to track 2 and then proceeds to read a sector from track 1, which fails due to a current track mismatch (would also happen on a real controller?). Is it somehow using double track seeking?
OK. So it's an 80386+ problem: Just tried the same on the AT configuration. It boots properly and seeks to cylinder 1 properly. Only the Compaq Deskpro 386 BIOS seems to somehow incorrectly seek to cylinder 2 instead of cylinder 1? Since it last worked, only the ENTER and LEAVE instructions have been improved (80386+) and the FDC has been made more accurate regarding track management (seeks, recalibrates, disk access errors and physical vs controller's idea of the current cylinder).
Just looked at the int13h request that's being done in that case: It's requesting a read on cylinder 01h (register CH), but seems to seek to cylinder 02h instead, for some odd reason? Thus probably a CPU bug?
Hmmmm..... Line 2170 loads the value 0x74 from the Drive 0,1,2,3 media state BDA entry at 40:90. That causes it to double step, multiplying the FDC SEEK track by 2 (by shifting left 1), when seeking only. Thus it will end up at the incorrect track?
The double stepping shouldn't be used in the first place: The hardware uses single-stepping for all different FDC media types. So why does it detect so?
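A minimal sketch of how that media state byte breaks down and why the seek ends up on the wrong cylinder. It assumes the bit layout of the 40:90 byte as I have it in my notes (bits 0-2 media state, bit 4 established, bit 5 double step, bits 6-7 data rate); the function names are invented for this example.

```python
DATA_RATES = {0: "500k", 1: "300k", 2: "250k", 3: "reserved"}


def decode_media_state(value):
    """Split the BDA media state byte into its fields (layout assumed as above)."""
    return {
        "media_state": value & 0x07,
        "established": bool(value & 0x10),
        "double_step": bool(value & 0x20),
        "rate": DATA_RATES[(value >> 6) & 0x03],
    }


def seek_cylinder(requested, media_state_byte):
    """Cylinder sent with the FDC SEEK command: shifted left 1 when double stepping."""
    if media_state_byte & 0x20:
        return requested << 1
    return requested


print(decode_media_state(0x74))
print(seek_cylinder(1, 0x74))  # 2: the cylinder the BIOS seeks to
print(seek_cylinder(1, 0x54))  # 1: same byte with bit 5 cleared
```

With 0x74 the read for cylinder 1 is preceded by a seek to cylinder 2, which is exactly the mismatch seen in the log.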
My notes:
CALLER_AL equ 0x00 ; [BP+0x00] = AL SECTORS TO READ
CALLER_AH equ 0x01 ; [BP+0x01] = AH FUNCTION NUMBER
CALLER_BL equ 0x02 ; [BP+0x02] = BL LOW PART OF DESTINATION OFFSET
CALLER_BH equ 0x03 ; [BP+0x03] = BH HIGH PART OF DESTINATION OFFSET
CALLER_CL equ 0x04 ; [BP+0x04] = CL SECTOR NUMBER
CALLER_CH equ 0x05 ; [BP+0x05] = CH TRACK SELECTED
CALLER_DL equ 0x06 ; [BP+0x06] = DL DRIVE TO USE
CALLER_DH equ 0x07 ; [BP+0x07] = DH HEAD
CALLER_SI equ 0x08 ; [BP+0x08] = SI
CALLER_DI equ 0x0A ; [BP+0x0A] = DI
CALLER_ES equ 0x0C ; [BP+0x0C] = ES DESTINATION SEGMENT
CALLER_DS equ 0x0E ; [BP+0x0E] = DS
CALLER_BP equ 0x10 ; [BP+0x10] = BP
CALLER_IP equ 0x12 ; [BP+0x12] = IP
CALLER_CS equ 0x14 ; [BP+0x14] = CS
CALLER_FLAGS equ 0x16 ; [BP+0x16] = FLAGS

ecc0: call 8fa4: Loads 40:8F(Combination hard/floppy disk card when bit 0 set)
x9190: Load media state address in BX based on [BP+06]. This results in [40:90] to be loaded for drive A(bit 0-2=media state, 4=drive established, 5=double step drive, 6-7=BPS rate(0=500k,1=300k,2=250k,3=reserved)).
xecf0: Start of the FDC operation
xca4f: Setup DMA for the transfer of data
:note opcode 86 has it's parameters reversed(two registers being switched around)? This has now been fixed(still reversed in the current log).
xc956: Enable the drive motors as required(and delay when started).
Edit: Looking at the occurrences of reads/writes to memory location 00000494 (the byte containing the double step setting at bit 5, value 0x20), I only see reads with the unchanged value. So the double step bit isn't set during the current INT13h call, but during some earlier INT13h call that causes it to detect a double step drive?
Edit: This seems to be established during the very first FDC access, reading the boot sector?
Edit: Looking at it again, the first time that bit (bit 5) is set (and it stays set once set) is during the very first sector read operation of the FDC (C,H,S address 0,0,1; the boot sector). After that it remembers that it's supposed to be a 'double stepping' drive, thus multiplying all track numbers by 2 when seeking. This causes the current boot problem in some way.
Now the question: Why is it setting said bit? The drive isn't a double stepping drive, nor should it be giving the CPU any information that identifies it as one: it reports the same track as was seeked to. That is what the IBM AT BIOS checks: it tries seeking and then uses a simple READ ID floppy command to verify it's on the right track, instead of on a double stepped track.
Time for a dump of the boot sector being read...
Edit: This is the dump of the BIOS routine (interrupt 13h) executing, in the common emulator log format. It details what happens when reading the very first FDC boot sector into memory, and consequently the detection of a 'double stepping' drive, which it isn't. That causes all following Seek commands to use double the track number of their Read sector counterparts, making the FDC error out with ST0=0x40.
https://www.dropbox.com/s/o5fob1o0rf79jcm/deb … 02_1954.7z?dl=0
Warning: Since the text file is so large (1.5GB), it might crash/hang the OS (like when using Notepad on Windows), because the file is too large to process in memory. Use a text editor that can handle large files to read the log.
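For reference, the ST0=0x40 value mentioned with the dump can be decoded with a small sketch like the one below. It assumes the usual 765-style ST0 layout (bits 7-6 interrupt code, bit 5 seek end, bit 4 equipment check, bit 3 not ready, bit 2 head, bits 1-0 drive select); the function name is invented for this example.

```python
INTERRUPT_CODES = {
    0: "normal termination",
    1: "abnormal termination",
    2: "invalid command",
    3: "abnormal termination due to polling",
}


def decode_st0(st0):
    """Split FDC status register 0 into its fields (765-style layout assumed)."""
    return {
        "interrupt_code": INTERRUPT_CODES[(st0 >> 6) & 0x03],
        "seek_end": bool(st0 & 0x20),
        "equipment_check": bool(st0 & 0x10),
        "not_ready": bool(st0 & 0x08),
        "head": (st0 >> 2) & 0x01,
        "drive": st0 & 0x03,
    }


print(decode_st0(0x40))
```

So 0x40 is an abnormal termination on drive 0, head 0, with no seek end or equipment check flagged. That fits a read that was started and then failed, rather than a failed seek.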
Looking for the changes to address 00000490 shows me that at 000090A1, 0x61 (300Kbps drive, double stepping required, media/drive not established, 360Kb diskette/1.2Mb drive not established) is loaded into the byte, which shouldn't happen? Or it should at least be overwritten later? The block that sets this starts at (F000:)0000909B. Why is it ending up at that invalid location? Hmmm....
Edit: It seems the cause of it ending up there is the jump in the block at F000:00009072? So the cause starts at line 25817631?
Addition to my notes:
F000:0000B544 Reads the CMOS register AL into AL.
The CMOS register that's read in the relevant block is register E of the CMOS(which is set to 0x44, being two 1.44MB floppy drives).
It tests bit 7(in this case, the second floppy drive's third bit, which is set), then jumps to the code that sets the invalid 0x61 value because the bit is set(the 0x40 bit in the floppy configuration byte in the CMOS RAM)?
Why would it set said bit(double stepping drive) when detecting the second floppy disk drive to be of type 4+(4=1.44MB and 5=2.88MB), thus 1.44MB or larger disks?
Edit: So, does that mean I have to set the second floppy disk to 720K at most(even though 1.44MB+ is theoretically supported by the BIOS) in order to keep booting properly?
Edit: Just changed it manually, since FDC booting fails. It will now seek correctly, but fail anyways?