VOGONS


Reply 100 of 145, by maxtherabbit

I guess we kinda already knew this, but whatever the problem is with this 73GB drive on this system is specific to the CHS type int 13 calls. If I partition/format it with a single 100% FAT32 partition it actually works fine. The format will crash at the end as shown in my screenshot before, but after a reboot the drive is actually accessible in DOS. Can read directory and do some basic file manipulation.

My next step will be to try the 2.10 ROM

Reply 102 of 145, by mkarcher

maxtherabbit wrote on 2023-09-27, 20:34:

I guess we kinda already knew this, but whatever the problem is with this 73GB drive on this system is specific to the CHS type int 13 calls. If I partition/format it with a single 100% FAT32 partition it actually works fine. The format will crash at the end as shown in my screenshot before, but after a reboot the drive is actually accessible in DOS.

I don't think it matters whether you do CHS-style or LBA-style calls. In both cases, the read and write calls don't know and don't care about how big the drive actually is. The LBA-style call just forwards the LBA to the SCSI drive, and the CHS-style call uses the number of "heads" and "sectors per track" to calculate an LBA from the CHS value. After that, they are processed in the same way (at least after I fixed the ID0 bug). You should be able to test my theory that LBA vs. CHS doesn't matter by trying to re-create the FAT16 partition that causes the issue, and switching between type 6 (which makes IO.SYS use CHS calls) and 0E (which makes IO.SYS use LBA calls). I don't expect any difference.
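For readers who want to see the arithmetic, the CHS-to-LBA step described above can be sketched as follows. This is the textbook conversion, not code taken from the Adaptec BIOS; the function and parameter names are made up for illustration.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert a CHS address to an LBA; sectors count from 1, heads and cylinders from 0."""
    if not (0 <= head < heads and 1 <= sector <= sectors_per_track):
        raise ValueError("CHS address does not fit the given geometry")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse mapping, handy for checking a partition table by hand."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# The same CHS triple lands on different LBAs under the two translations
# mentioned in this thread (64/32 and 255/63).
print(chs_to_lba(1, 0, 1, heads=64, sectors_per_track=32))    # 2048
print(chs_to_lba(1, 0, 1, heads=255, sectors_per_track=63))   # 16065
```

Note that the total drive size does not appear anywhere in the conversion; only the two geometry values do.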

The strongest indication that we are not dealing with a geometry calculation issue is the drive LED that stays on. Both the CHS and the LBA code call the same "SCSI execution engine" function. This function is supposed to turn on the LED, run the SCSI command, and then turn the LED back off. If the LED stays stuck on, this execution function "forgot" to turn it off, probably because some timeout handling in it is buggy.

SCSI command execution (at least on the level supported by the BIOS) is very straightforward: you select the device; the device asks for a control message that selects the LUN; the device asks for the command to execute, which will be a 10-byte data block containing the read or write command; the device requests to transfer data; then the device requests to send a status byte, which will indicate "OK"; and finally the device requests to send a message, which will be "command complete". It's difficult to mess this up, especially if you have hardware that helps the software recognize what kind of action the device is currently requesting.

There is only one thing that can complicate the handling of SCSI command execution, and that is a device sending special control messages during the transfer. The device may interrupt the normal SCSI command execution process at any time to send a control message. The most prominent use of the "interrupt execution and send a message" system is the "disconnect" feature. When the device detects that it will take some time until the next byte can be transferred, it will switch from the "data" phase to the "message in" phase, and send a "DISCONNECT" message. In that case, the bus will be freed after the host adapter receives the message, and the device will "re-select" the host adapter later to continue the execution of the command.

The disconnect/reconnect process during transfer can be combined with "SAVE POINTERS"/"RESTORE POINTERS", which control what data range will be transferred after re-connection. This adds a lot of complexity to the process, which makes it much easier to break. So my intuition is that the Atlas firmware sends SCSI messages in a way or timing that is incompatible with the more recent BIOS versions. My suggestion to try with "disconnect" disabled (via the ECU) still stands. If disabling disconnect helps, I may take a look at what changed regarding SCSI command execution, and whether I can identify some sequence that would cause the execution engine to deadlock.
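The phase sequence described above can be modelled in a few lines. The following is only a toy simulation under stated assumptions: the target is a plain list of (phase, payload) events, the `Led` class and `execute` function are hypothetical names, and nothing here is derived from the actual BIOS code. The message codes and the READ(10) layout are the standard SCSI ones. The point of the sketch is the `finally` clause: if the LED is only switched off on the success path, any timeout or unexpected message leaves it lit.

```python
import struct

# SCSI single-byte message codes.
COMMAND_COMPLETE = 0x00
SAVE_DATA_POINTER = 0x02
RESTORE_POINTERS = 0x03
DISCONNECT = 0x04


def build_read10_cdb(lba, blocks):
    """Build the 10-byte READ(10) command block: opcode 0x28, 32-bit LBA, 16-bit length."""
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


class Led:
    """Stand-in for the activity LED (hypothetical; real hardware toggles a register bit)."""
    def __init__(self):
        self.on = False


def execute(events, led, max_events=64):
    """Follow the phases requested by a simulated target; return (status, data)."""
    led.on = True
    try:
        status, data = None, bytearray()
        for count, (phase, payload) in enumerate(events):
            if count >= max_events:
                raise TimeoutError("target kept the engine busy for too long")
            if phase == "DATA_IN":
                data += payload
            elif phase == "STATUS":
                status = payload
            elif phase == "MESSAGE_IN":
                if payload == COMMAND_COMPLETE:
                    return status, bytes(data)
                if payload not in (DISCONNECT, SAVE_DATA_POINTER, RESTORE_POINTERS):
                    raise ValueError(f"unexpected message 0x{payload:02X}")
                # On DISCONNECT the bus is freed and the target reselects later;
                # the simulation simply carries on with the next event.
            # MESSAGE_OUT and COMMAND are host-to-target phases; nothing to collect.
        raise TimeoutError("event stream ended without COMMAND COMPLETE")
    finally:
        led.on = False  # runs on every exit path, including the timeouts


cdb = build_read10_cdb(lba=63, blocks=1)
normal = [("MESSAGE_OUT", 0x80), ("COMMAND", cdb), ("DATA_IN", b"\x00" * 512),
          ("STATUS", 0x00), ("MESSAGE_IN", COMMAND_COMPLETE)]
with_disconnect = normal[:2] + [("MESSAGE_IN", SAVE_DATA_POINTER),
                                ("MESSAGE_IN", DISCONNECT)] + normal[2:]
```

Feeding `execute` a truncated event list (for example `normal[:3]`) raises `TimeoutError` while still leaving `led.on` false, which is the behaviour a stuck LED suggests the real engine lacks on some path.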

Reply 103 of 145, by maxtherabbit

mkarcher wrote on 2023-09-27, 23:11:

the CHS-style call uses the number of "heads" and "sectors per track" to calculate an LBA from the CHS value. After that, they are processed in the same way (at least after I fixed the ID0 bug). You should be able to test my theory that LBA vs. CHS doesn't matter by trying to re-create the FAT16 partition that causes the issue, and switching between type 6 (which makes IO.SYS use CHS calls) and 0E (which makes IO.SYS use LBA calls). I don't expect any difference.

The BIOS decides whether to do "extended translation" (X/255/63) or regular (X/64/32) translation based on whether the total drive size is >1GB right? So wouldn't the CHS calls need to process the total drive size for that reason? Even if that was only done once at the time of enumeration, if it overflowed couldn't it poison the well for all future CHS calls?

I don't see how the "disconnect" theory would explain why DOS can read the drive fine when there is a single 73GB FAT32 partition on it, necessitating the use of LBA calls, but not when there is a single 2GB FAT16 partition. I'm willing to try it since it's easy enough though.

Reply 105 of 145, by maxtherabbit

mkarcher wrote on 2023-09-27, 23:11:

My suggestion to try with "disconnect" disabled (via the ECU) still stands.

No effect

Attachment: 20230928_162459.jpg (1.77 MiB, license CC-BY-4.0)

Reply 106 of 145, by mkarcher

maxtherabbit wrote on 2023-09-27, 23:37:

The BIOS decides whether to do "extended translation" (X/255/63) or regular (X/64/32) translation based on whether the total drive size is >1GB right? So wouldn't the CHS calls need to process the total drive size for that reason? Even if that was only done once at the time of enumeration, if it overflowed couldn't it poison the well for all future CHS calls?

You are correct. At the time of enumeration, the high word of the 32-bit sector count is compared to 32, which amounts to comparing the sector count to 2 "binary millions" (2 * 2^20 sectors of 512 bytes), i.e. 1 GiB. If it is less than 32, the 64/32 geometry is chosen, otherwise the 255/63 geometry. This is stored as a single bit in a drive flags register on the controller card. In case the number of cylinders is required (for INT 13, AH=8), the maximum sector count representable with the chosen geometry is calculated (1024*H*S) as a 32-bit number. If the actual sector count returned by the drive (it is queried when INT 13, AH=8 is called) is bigger than the maximum count representable by the geometry (determined by comparing the 32-bit numbers), the actual sector count is replaced by the maximum sector count. Afterwards, the number of virtual cylinders is calculated. This approach eliminates any possibility of overflow, at least up to 1TB.
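The geometry selection and clamping just described can be written out as a short sketch. It follows the prose above step by step; the function names are invented for illustration, and whether the BIOS reports the cylinder count or the highest cylinder index to the caller is not covered here.

```python
def pick_geometry(total_sectors):
    """Return (heads, sectors_per_track) from the high word of the 32-bit sector count."""
    high_word = (total_sectors >> 16) & 0xFFFF
    return (64, 32) if high_word < 32 else (255, 63)


def virtual_cylinders(total_sectors):
    """Clamp the sector count to what 1024 cylinders can address, then divide."""
    heads, sectors_per_track = pick_geometry(total_sectors)
    max_sectors = 1024 * heads * sectors_per_track
    usable = min(total_sectors, max_sectors)
    return usable // (heads * sectors_per_track)


# A drive of exactly 1 GiB (2 * 2**20 sectors of 512 bytes) is the first one
# that gets the 255/63 translation.
print(pick_geometry(2 * 2**20 - 1))   # (64, 32)
print(pick_geometry(2 * 2**20))       # (255, 63)
```

Because the clamp happens before the division, a very large sector count (the 143,000,000 used in the test below is just an arbitrary figure in the 73GB range) still yields 1024 cylinders rather than an overflowed value.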

maxtherabbit wrote on 2023-09-27, 23:37:

I don't see how the "disconnect" theory would explain why DOS can read the drive fine when there is a single 73GB FAT32 partition on it, necessitating the use of LBA calls, but not when there is a single 2GB FAT16 partition. I'm willing to try it since it's easy enough though.

You already disproved my disconnect theory. The idea was that fine details of the access patterns influence at what time the drive firmware will disconnect and reconnect. I supposed some access patterns (depending on what parts of a read request hit the on-disk cache) cause a disconnect/reconnect pattern that is incompatible with the 2.11 BIOS. The layout of the on-disk structures is obviously different on the FAT16 and the FAT32 partition.

You can prove or disprove my theory that the issue is not related to LBA vs. CHS by creating a partition in the first 8GB, and then switching between non-LBA types (06/0B) and LBA types (0E/0C).

maxtherabbit wrote on 2023-09-28, 20:31:
mkarcher wrote on 2023-09-27, 23:11:

My suggestion to try with "disconnect" disabled (via the ECU) still stands.

No effect

Thanks for testing that.

Reply 108 of 145, by maxtherabbit

jakethompson1 wrote on 2023-09-27, 23:51:

Do SCSI cards place a FDPT at INT 41h/INT 46h or is that strictly for ATA-compatible?

It would appear the answer to that is no

Attachment: 20230928_180603.jpg (1.74 MiB, license CC-BY-4.0)

Reply 110 of 145, by Disruptor

maxtherabbit wrote on 2023-09-28, 22:34:

Are you suggesting I just hex edit the partition type field in the table and change nothing else? Should that work normally?

It is just an attempt. We don't know yet what will happen. Please try it.
Thanks.

Reply 111 of 145, by maxtherabbit

Disruptor wrote on 2023-09-29, 00:32:
maxtherabbit wrote on 2023-09-28, 22:41:

It would appear the answer to that is no
20230928_180603.jpg

Which tool do you use for that?

Checkit 3. Go to the memory map, you should see "Interrupt Vectors" first on the list. Select it and press enter and it will display the entire IVT
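If no such tool is at hand, the same information can be read from a raw dump of the first 1 KiB of real-mode memory. A minimal sketch, assuming the dump has already been obtained by some other means (how is left open here) and using hypothetical helper names: each interrupt vector is a 4-byte far pointer stored as offset first, then segment, so INT 41h sits at byte 0x104 and INT 46h at byte 0x118.

```python
import struct


def ivt_vector(ivt_dump, int_no):
    """Return (segment, offset) of the far pointer stored for interrupt int_no."""
    offset, segment = struct.unpack_from("<HH", ivt_dump, int_no * 4)
    return segment, offset


def linear_address(segment, offset):
    """Real-mode linear address of a segment:offset pair."""
    return (segment << 4) + offset


# Made-up example data: an otherwise empty table with INT 41h set to F000:E401.
dump = bytearray(1024)
struct.pack_into("<HH", dump, 0x41 * 4, 0xE401, 0xF000)
segment, offset = ivt_vector(dump, 0x41)
print(f"INT 41h -> {segment:04X}:{offset:04X}")
```

Where a vector points tells you who installed it; what the screenshot in this thread actually showed is not reproduced here.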

Reply 112 of 145, by maxtherabbit

Disruptor wrote on 2023-09-29, 00:36:
maxtherabbit wrote on 2023-09-28, 22:34:

Are you suggesting I just hex edit the partition type field in the table and change nothing else? Should that work normally?

It is just an attempt. We don't know yet what will happen. Please try it.
Thanks.

Ok, I'll try it first on another machine to make sure DOS can tolerate it. Then try on the EISA box.

Reply 113 of 145, by mkarcher

maxtherabbit wrote on 2023-09-28, 22:34:

Are you suggesting I just hex edit the partition type field in the table and change nothing else? Should that work normally?

Yes, exactly that. As long as the CHS and LBA fields in the partition table are consistent, and you don't change a partition that exceeds the CHS boundary (i.e. 8GB with the usual 255/63 mapping) to use CHS calls, it should work without issues. Windows 9x FDISK uses the CHS type code whenever the partition is CHS addressable to be compatible with older non-LBA capable operating systems and environments, and uses the LBA type code if CHS access is going to fail (because the partition exceeds the CHS bounds) to prevent damage from operating systems that don't implement LBA calls. Marking a CHS-compatible partition "LBA only" is not going to hurt, but of course DOS 6.2 will no longer see it.
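As a rough sketch of what the hex edit amounts to: the type byte is at offset 4 of each 16-byte partition entry, and the entries start at offset 446 of the MBR sector. The code below works on an in-memory copy of the sector; the function name and the type mapping tables are illustrative choices, and it refuses to switch a partition to a CHS type if it reaches past the 1024*255*63 sector boundary mentioned above. It does not touch the CHS or LBA fields, so it assumes they are already consistent.

```python
import struct

CHS_LIMIT_SECTORS = 1024 * 255 * 63       # 16,450,560 sectors, the "8GB" boundary
TO_LBA = {0x06: 0x0E, 0x0B: 0x0C}         # FAT16 and FAT32, CHS type -> LBA type
TO_CHS = {lba: chs for chs, lba in TO_LBA.items()}


def set_partition_type(mbr, index, use_lba):
    """Return a copy of a 512-byte MBR with only the type byte of one entry changed."""
    if len(mbr) != 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entry = 446 + 16 * index
    ptype = mbr[entry + 4]
    start, count = struct.unpack_from("<II", mbr, entry + 8)
    table = TO_LBA if use_lba else TO_CHS
    if ptype not in table:
        return bytes(mbr)                 # already the requested kind, or not handled
    if not use_lba and start + count > CHS_LIMIT_SECTORS:
        raise ValueError("partition extends past the CHS limit")
    patched = bytearray(mbr)
    patched[entry + 4] = table[ptype]
    return bytes(patched)
```

Writing the patched sector back to the disk is deliberately left out; try it on a disk image or a scratch drive first.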

I'm thinking about trying to get a Quantum ATLAS hard drive similar to yours to try to reproduce the issue. What series of ATLAS is it? According to my (quick Google) research, there is a 73.4GB model both in the ATLAS 10K II and the ATLAS 10K III series. Knowing the firmware version of your drive would be nice, but likely I can't pick a specific firmware anyway...

Reply 114 of 145, by maxtherabbit

mkarcher wrote on 2023-09-29, 18:27:
maxtherabbit wrote on 2023-09-28, 22:34:

Are you suggesting I just hex edit the partition type field in the table and change nothing else? Should that work normally?

Yes, exactly that. As long as the CHS and LBA fields in the partition table are consistent, and you don't change a partition that exceeds the CHS boundary (i.e. 8GB with the usual 255/63 mapping) to use CHS calls, it should work without issues. Windows 9x FDISK uses the CHS type code whenever the partition is CHS addressable to be compatible with older non-LBA capable operating systems and environments, and uses the LBA type code if CHS access is going to fail (because the partition exceeds the CHS bounds) to prevent damage from operating systems that don't implement LBA calls. Marking a CHS-compatible partition "LBA only" is not going to hurt, but of course DOS 6.2 will no longer see it.

I'm thinking about trying to get a Quantum ATLAS hard drive similar to yours to try to reproduce the issue. What series of ATLAS is it? According to my (quick Google) research, there is a 73.4GB model both in the ATLAS 10K II and the ATLAS 10K III series. Knowing the firmware version of your drive would be nice, but likely I can't pick a specific firmware anyway...

I changed the type field to 0E and confirmed DOS can still interact with the partition just fine on the other machine. Haven't been able to test it on the EISA machine yet.

It's an Atlas 10K II. I believe the firmware is DDD6.

Attachment: 20230929_163208.jpg (1.46 MiB, license CC-BY-4.0)

Reply 115 of 145, by maxtherabbit

Your suspicion was correct: the EISA system's behavior was unaffected by the partition type change to 0E. So it's something specifically triggered either by the size or actual format (FAT16/32) of the partition.

Reply 116 of 145, by Disruptor

Disruptor wrote on 2023-09-24, 12:03:

I currently use an 82.4 G IDE drive using an ACARD adapter on the 2842VL (single, ID0).
SCSISelect 1.01 is currently verifying that disk and does a wraparound at 12997 MBytes.
FDISK from Windows 95 B is showing a disk capacity of 12993 MB.

We'll do some more examination soon.

I finally connected a 2 TB SATA HDD via a SATA-to-IDE adapter and an ACard adapter to a SCSI bus and verified it with SCSISelect on an Adaptec 19160. That took half a night.
Note: The verify speed was not affected by the interface speed.

Reply 117 of 145, by maxtherabbit

Figured I'd report back and update with my findings about this Atlas 10KII drive. There is indeed something about that drive specifically (and it may not be the capacity, but since it is my only SCSI drive >64GB I can't be sure) that makes it not work 100% on several old pre-EDD SCSI controllers. There is nothing physically broken on the drive, as it works perfectly on several more "modern" controllers. Perhaps a firmware bug?

It has problems with ASPI4DOS, the 2.x 274x Adaptec BIOSes, and a Compaq fast-wide SCSI2 EISA controller I've tried (using an NCR ASIC). On all of these systems the 36.4GB "minion" drive works perfectly.

Reply 118 of 145, by maxtherabbit

Does anyone else have an AIC-7770 based card running in win95?

I'm getting a maximum of 5MB/s transfers in 95 on two totally different systems with different HDDs attached. In both cases I can get close to a full 10MB/s in pure DOS (speedsys) using the ASPI driver.

Reply 119 of 145, by ltning

I'm trying to upgrade my 2740 (EISA) board with this, but it seems to be causing me nothing but headaches:
- With original BIOS (1.1), the BIOS message shows, but complains about invalid EISA config and doesn't even try to detect devices
- With new BIOS (2.11, unpatched), I get no BIOS messages from the board at all
I haven't even tried with a patched BIOS yet.

The board is configured using newest EISA .CFG/.OVL files. I'll try original BIOS with older .CFG/OVL to be sure, but..

Any ideas?

/Eirik

The Floppy Museum - on a floppy, on a 286: http://floppy.museum
286-24/4MB/ET4kW32/GUS+SBPro2
386DX-40/20MB/CL5434 ISA/GUSExtreme
486BL-100/32MB/ET4kW32p VLB/GUSPnP/AWELegacy

~ love over gold ~