VOGONS


First post, by Kahenraz

User metadata
Rank l33t

In Windows 98, it's as easy as checking a box in Device Manager. Is DMA mode for disks available in Windows 3.1? Does it require DOS or Windows drivers? And what tests can I run to validate that it's actually working?

Reply 1 of 56, by jakethompson1

User metadata
Rank Oldbie

"DMA" is somewhat of an ambiguous term here.

The "DMA" checkbox on an IDE hard drive in Windows 95 OSR2 and later refers to the PCI IDE bus-mastering standard described here, introduced with the Triton chipset: https://pdos.csail.mit.edu/6.828/2018/reading … E-BusMaster.pdf
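
That bus-master interface works off a table of Physical Region Descriptors (PRDs) that the driver builds in memory before starting a transfer. A minimal sketch of packing such a table, assuming the 8-byte entry layout from the SFF-8038i spec (the function name is mine):

```python
import struct

def prd_entry(phys_addr: int, byte_count: int, end_of_table: bool) -> bytes:
    """Pack one 8-byte Physical Region Descriptor (PRD) entry.

    Layout per the SFF-8038i bus-master IDE spec (as I read it):
      bytes 0-3: 32-bit physical base address of the buffer (word-aligned)
      bytes 4-5: byte count (0 means 64 KiB); must not cross a 64 KiB boundary
      bytes 6-7: flags; bit 15 set on the final entry (end of table)
    """
    assert phys_addr % 2 == 0, "buffer must be word-aligned"
    assert 0 < byte_count <= 0x10000 and byte_count % 2 == 0
    flags = 0x8000 if end_of_table else 0
    return struct.pack("<IHH", phys_addr, byte_count & 0xFFFF, flags)

# A two-entry table describing a 64 KiB + 4 KiB scatter/gather transfer:
table = prd_entry(0x100000, 0x10000, False) + prd_entry(0x110000, 0x1000, True)
```

The driver then writes the table's physical address into the controller's bus-master registers and lets the card fetch descriptors on its own; the CPU only services the completion interrupt.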

As that didn't exist yet when Windows 3.1 and Windows for Workgroups 3.11 were released, it couldn't be supported. And in fact, it didn't even make it into the original Windows 95 release.

The FastDisk part of Windows 3.1 that provides 32-bit disk access does not even assume IDE (e.g., it does not attempt the IDENTIFY DISK command). It works with stock AT disk controllers, too, and does some heuristics to see if there is geometry translation going on that it doesn't know about and can't support.

Maybe there is a third party driver? I don't run DOS or Windows 3.x on machines where I care about PCI IDE bus mastering so I haven't thought deeply about this. Pre-Triton PCI IDE chipsets do have their own custom drivers sometimes.
It would indeed need both DOS and Windows drivers if you want 32-bit disk access.

Reply 2 of 56, by douglar

User metadata
Rank Oldbie

My experience is that you need to have a vendor provided driver for your storage controller and you need a storage device that likes your driver and your controller. The Promise drivers from that era let you specify the transfer modes as passed parameters and would let you know if they worked or not.

Is there a specific set of components you are working with here?

Reply 3 of 56, by rasz_pl

User metadata
Rank l33t

Afaik there was no bus mastering on Win 3.1, but there were 32-bit drivers.
Re: List of VLB IDE Controllers
https://www.os2museum.com/wp/how-to-please-wdctrl/

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 5 of 56, by Horun

User metadata
Rank l33t++

I thought for DOS (and Win3.1) that bus mastering is BIOS- plus hardware-dependent more than driver-dependent, but am probably wrong 😀

Hate posting a reply and then have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. Stuff: https://archive.org/details/@horun

Reply 6 of 56, by rasz_pl

User metadata
Rank l33t

You need hardware that supports it (i.e., with its own DMA engine, so Intel 430 or higher) and an appropriate driver. There is this https://www.vogonsdrivers.com/getfile.php?fileid=398 claiming Windows 3.1 support, so apparently I was wrong 😀 There were probably never official Microsoft bus-mastering drivers, nor official Intel ones.
Maybe this would work loaded before Win3.1 https://home.mnet-online.de/willybilly/fdhelp … d/base/uide.htm but then you lose the protected-mode driver advantages (32-bit disk access).


Reply 7 of 56, by douglar

User metadata
Rank Oldbie
rasz_pl wrote on 2024-02-27, 04:03:

You need hardware that supports it (i.e., with its own DMA engine, so Intel 430 or higher) and an appropriate driver. There is this https://www.vogonsdrivers.com/getfile.php?fileid=398 claiming Windows 3.1 support, so apparently I was wrong 😀 There were probably never official Microsoft bus-mastering drivers, nor official Intel ones.
Maybe this would work loaded before Win3.1 https://home.mnet-online.de/willybilly/fdhelp … d/base/uide.htm but then you lose the protected-mode driver advantages (32-bit disk access).

I’m no expert, but I think what you are describing is the way busmaster DMA worked for PCI. Before PCI, there was DMA, but it was more primitive and depended on the device driver (or BIOS) to “set the table” for the transfers to take advantage of it.

I think there are a handful of Win 3.1 storage drivers for VLB cards capable of enabling “Multiword DMA” if paired with amenable storage devices, but it was tied to specific cards. There were some late ISA cards whose controllers did more than just pass signals through to the ISA bus, but I don’t remember ever seeing Win 3.1 drivers for them, and I don’t know if they supported DMA.

Reply 8 of 56, by rasz_pl

User metadata
Rank l33t
douglar wrote on 2024-02-27, 05:36:

I’m no expert, but I think what you are describing is the way busmaster DMA worked for PCI.

and VLB, and ISA like the Adaptec AHA-1540/42. Basically any third-party device capable of taking over the role of 'master of the bus' 😀

douglar wrote on 2024-02-27, 05:36:

Before PCI, there was DMA, but it was more primitive and depended on the device driver (or bios) to “set the table” for the transfers to take advantage of it.

ISA DMA, very slow and problematic https://www.os2museum.com/wp/the-danger-of-datasheets/ https://www.os2museum.com/wp/more-fun-with-isa-dma/
Afaik after a long search, the "GSI, INC. MODEL 4C INTELLIGENT IDEA" might be the only IDE controller using legacy ISA DMA https://forum.vcfed.org/index.php?threads/was … 02/#post-527616 but even this one is unconfirmed, as nobody was able to get their hands on one; it's all speculation based on the available jumper description https://arvutimuuseum.ee/th99/c/E-H/20413.htm

douglar wrote on 2024-02-27, 05:36:

I think there are a handful of Win 3.1 storage drivers for VLB cards capable of enabling “Multiword DMA” if paired with amenable storage devices

When we last had discussions about those, it mostly came down to DMA on the drive side with a FIFO facing the CPU, in effect hiding the DMA from the computer/drivers.

'In Windows 98, it's as easy as checking a DMA box in device manager' is about bus-mastering DMA, between computer RAM and the disk adapter/controller. MWDMA happens between the disk adapter/controller and the HDD. One could even build and provide drivers for a DMA bus-mastering controller supporting just PIO modes. Confusingly similar naming for two only slightly related concepts.


Reply 9 of 56, by douglar

User metadata
Rank Oldbie
rasz_pl wrote on 2024-02-27, 23:43:
douglar wrote on 2024-02-27, 05:36:

I’m no expert, but I think what you are describing is the way busmaster DMA worked for PCI.

and VLB, and ISA like Adaptec AHA-1540/42. Basically any third party device capable of taking over role of 'master of the bus' 😀

Like I said, I'm not an expert on this and I'm trying to figure it out, so bear with me here while I try to read and regurgitate:

So I read this: https://en.wikipedia.org/wiki/Direct_memory_access and this https://wiki.osdev.org/ISA_DMA

Here is my summary:

Most well-known ISA "DMA" used "third-party DMA", meaning there was a chip on the motherboard that did the DMA: the 8-bit, 4 MHz Intel 8237 DMA controller. The PC bus had one of these chips; the AT bus had two.

ISA "third-party DMA" was initially provided for hard drives but was quickly replaced by PIO: by the time CPUs got to 8 MHz, the CPU could move data faster than the 8237 chips, PIO was easier to code, and there was rarely any real-world multitasking benefit to third-party DMA on the ISA bus because the 8237 chips put a heavy load on the memory bus. Also, IBM never connected all of the features of the 8237 chips to the ISA bus, which further reduced their efficiency, because IRQs were still needed to indicate that a transfer was complete.

ISA DMA was still used for low-speed transfers that needed guaranteed bandwidth. PIO transfers could be disrupted by IRQ requests from other devices, and this could cause problems for things like floppy drives, DOS audio, and ECP printer ports, i.e., things that needed uninterrupted transfers.

And from reading this, https://en.wikipedia.org/wiki/WDMA_(computer) even the slowest WDMA mode was well past the theoretical limit of the 8237 DMA controllers.
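
A quick sanity check of that comparison (the cycle count and the single-word DMA mode 0 figure of 2.08 MB/s are my assumptions, not from the posts above):

```python
# Back-of-the-envelope check; clock and cycle figures are assumptions.
DMA_CLOCK_HZ = 4_770_000     # 8237 clock in PC-class machines
CLOCKS_PER_TRANSFER = 4      # best case for one 8-bit single transfer

isa_dma_peak_mb_s = DMA_CLOCK_HZ / CLOCKS_PER_TRANSFER / 1_000_000
swdma0_mb_s = 2.08           # slowest ATA single-word DMA mode

print(f"8237 peak ~{isa_dma_peak_mb_s:.2f} MB/s, SWDMA mode 0 = {swdma0_mb_s} MB/s")
```

Even under those generous assumptions, the 8237's theoretical peak comes out well under the slowest ATA DMA mode, which matches the Wikipedia claim.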

https://wiki.osdev.org/ISA_DMA
To get around the limitations of the 'on board' DMA controller, expansion card manufacturers began to put their own DMA controllers on their expansion cards. They functioned exactly the same way as the 'on board' DMA, 'stealing memory bus cycles' when the processor wasn't looking and thus improving the performance of the system as a whole. These "ISA Bus Masters" are still usually limited to the lower 16 MiB of memory, but do not have the 4.77 MHz issue. This trend continued through the creation of the PCI bus, which eventually entirely replaced the ISA bus in PCs.

So it seems possible that there are first party DMA IDE controllers on the ISA bus, but so far it seems rare.

PCI provided "First Party" DMA, where any expansion card could become a "bus master" and run at high speeds.

VLB allowed for legacy "Third Party" DMA over the ISA bus or "First Party" DMA over the VESA connector, but first-party DMA was limited to the Promise 20630 and some of the VLB caching controllers. I'll do a survey of that in the VLB controller thread at some point.

Reply 10 of 56, by jakethompson1

User metadata
Rank Oldbie
douglar wrote on 2024-02-28, 15:53:

ISA "third-party DMA" was initially provided for hard drives but was quickly replaced by PIO: by the time CPUs got to 8 MHz, the CPU could move data faster than the 8237 chips, PIO was easier to code, and there was rarely any real-world multitasking benefit to third-party DMA on the ISA bus because the 8237 chips put a heavy load on the memory bus. Also, IBM never connected all of the features of the 8237 chips to the ISA bus, which further reduced their efficiency, because IRQs were still needed to indicate that a transfer was complete.

Not just 8 MHz but also the presence of the REP INSW and REP OUTSW instructions.
Without those, you have to read 16-bit words from the device in a loop, and because there is no cache and the prefetch queue is cleared on a branch, the code has to be re-fetched from RAM for every loop iteration, competing with accesses of the data words and greatly slowing things down.

However, since the AT added 16-bit ISA bus mastering support (e.g., as used by SCSI cards), it's interesting as to why PIO was used for the AT disk controller instead of that. Anyone know?

Reply 11 of 56, by douglar

User metadata
Rank Oldbie
jakethompson1 wrote on 2024-02-28, 19:11:

However, since the AT added 16-bit ISA bus mastering support (e.g., as used by SCSI cards), it's interesting as to why PIO was used for the AT disk controller instead of that. Anyone know?

I think it might be because the SCSI cards were adding a relatively small amount of circuitry to an already complicated board, which let them do first-party bus mastering at an incremental cost, for an incremental performance increase that was more valuable in the multitasking operating systems that often used SCSI storage.

Inexpensive IDE controllers were essentially pass-through devices, so adding bus-master circuitry to the IDE boards or hard drives would likely have raised the cost of the devices by a larger percentage without a corresponding performance gain, especially when using a single-tasking OS like DOS and a slow bus like ISA.

It wasn't until ATA-2 and VLB/PCI that IDE DMA started to make sense: IDE controllers started to have some brains in them, storage devices had gotten fast enough that they could clearly benefit from the extra bandwidth, and multitasking operating systems were becoming more common, which made PIO less desirable.

Reply 12 of 56, by jakethompson1

User metadata
Rank Oldbie
douglar wrote on 2024-02-28, 19:46:

Inexpensive IDE controllers were essentially pass-through devices, so adding bus-master circuitry to the IDE boards or hard drives would likely have raised the cost of the devices by a larger percentage without a corresponding performance gain, especially when using a single-tasking OS like DOS and a slow bus like ISA.

This decision would still have been pre-IDE, as the original AT came with an ST-506-style disk.
For IDE, the decision to use PIO may have been pure backward compatibility.
I sent a PM to mkarcher asking what he thinks.

Reply 13 of 56, by mkarcher

User metadata
Rank l33t
jakethompson1 wrote on 2024-02-28, 19:11:

However, since the AT added 16-bit ISA bus mastering support (e.g., as used by SCSI cards), it's interesting as to why PIO was used for the AT disk controller instead of that. Anyone know?

I guess it's a matter of cost cutting. Implementing bus mastering on the ISA bus is far more complicated than responding to 16-bit I/O on port 1F0h. A bus-master card has to generate the timing of the bus cycle, and it has to make sure the ISA bus is returned for refresh once every 15 µs. At a sensible bus-master burst rate of 4 MB/s (1 WS at 6 MHz bus clock), the transfer of 512 bytes still takes 128 µs, so a sector burst has to be interrupted around 8 times. It seems WD didn't have a chip for that at hand, and adding an external bus-master DMA engine would have added significant complexity (and cost) to the card, most likely making the floppy controller no longer fit on the same card.

Furthermore, you didn't need that speed on this kind of controller anyway. The WD1006 uses a cheap design with single-ported RAM for the sector buffer. Single-ported RAM means that the transfer from the sector buffer to the host can start only after the sector has been completely transferred from disk to the sector buffer. The transfer from the sector buffer to the host cannot be fast enough to fit in a single sector gap to allow a 1:1 interleave, so you have the time of a sector including the gap to transfer the sector to the host (minus overhead for setting up the transfer of the next sector). Assuming 1:2 interleave is possible, a 17-sector MFM hard drive at 3600 RPM (60 rotations per second) transfers 17*60 = 1020 sectors per second, i.e. one millisecond per sector. So there is one millisecond to get 512 bytes to the host, i.e. 500 KB/s. To get that rate using "REP INSW", you need to perform REP INSW at a rate higher than 250 kHz, i.e. 24 cycles per iteration at a 6 MHz clock. This is easily achievable, so there was no point in implementing a faster host interface. Providing background transfers for multitasking systems was likely not an issue IBM wanted to address with their basic MFM hard drive controller.
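
The arithmetic above can be re-run in a few lines (all input figures come straight from the post; Python used as a calculator):

```python
# Re-running the figures from the post above.
sectors_per_second = 17 * (3600 // 60)      # 17-sector track at 3600 RPM
ms_per_sector = 1000 / sectors_per_second   # ~0.98 ms, rounded to 1 ms above

bytes_per_second = 512 * sectors_per_second          # one sector per sector-time
words_per_second = bytes_per_second / 2              # REP INSW moves 16-bit words
cycles_per_iteration = 6_000_000 / words_per_second  # budget at a 6 MHz clock

print(sectors_per_second, round(ms_per_sector, 2), round(cycles_per_iteration))
```

The exact budget comes out just under 23 cycles per iteration; the post's 24 comes from rounding the rate down to 250 kHz first. Either way the conclusion stands: REP INSW keeps up comfortably.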

I don't think IBM/WD designed the WD1006 knowing that this interface would later become "IDE" and be the primary PC hard disk interface for more than 10 years. They were just transitioning from the XT interface to the AT interface when the XT interface itself was only a couple of years old, so a more advanced interface for more advanced drives some years later likely seemed like a valid option. Thus I don't think the question "why didn't IBM introduce a higher-performance interface with that MFM controller to be future-proof" is valid.

Reply 14 of 56, by douglar

User metadata
Rank Oldbie

I thought I'd set up a nice 256MB DOS 6.22/Win 3.11 image that I could use for testing this. Got it all built, and I can use 32-bit disk access and such, but while the 256MB CF will do PIO3 and PIO3 w/ IORDY, it doesn't do multi-word DMA. So I tried to transfer the image to a 512MB CF that does do multi-word DMA transfers using WinImage, but regardless of what I did with the MBR and boot sector, the volume was not bootable from XUB and wasn't visible when I booted from DOS 6.22, though it was visible when I booted from a Win95b boot disk or looked at it from a contemporary computer. Anyone have any suggestions for how to fix the 512MB copy I made with WinImage?

p.s. I'm using this card & driver https://www.vogonsdrivers.com/getfile.php?fileid=2076 and if I add the /M0:8 parameter to VG4.386 in the system.ini with the 256MB drive, it crashes when Windows starts, and if I try doing that with the DOS driver, it just tells me that it defaulted to the PIO setting selected by jumper. /D0:8 and /D0:A work OK.

Last edited by douglar on 2024-03-01, 03:41. Edited 1 time in total.

Reply 15 of 56, by kingcake

User metadata
Rank Oldbie
douglar wrote on 2024-03-01, 03:36:

I thought I'd set up a nice 256MB DOS622/Win311 image that I could use for testing this. Got it all built and I can use 32 bit disk access and such, but while the 256MB CF will do PIO3 and PIO3 w/ IORDY, it doesn't do multi-word DMA. So I tried to transfer the image to a 512MB CF that does do DMA multi-word transfers using winimage, but regardless of what I did with the MBR & Bootsector, the volume was not bootable from XUB and wasn't visible when I booted from DOS 6.22, but was visible when I booted from a Win95b boot disk or looked at it from a contemporary computer. Anyone have any suggestions for how to fix the 512MB copy I made with WinImage?

p.s. I'm using this card & driver https://www.vogonsdrivers.com/getfile.php?fileid=2076 and if I add the /M0:8 parameter in the system.ini with the 256MB drive, it crashes when windows starts.

What does "transfer the image" mean?

I would read/write the image with DD then resize the partition to the new card size.

Reply 16 of 56, by douglar

User metadata
Rank Oldbie
kingcake wrote on 2024-03-01, 03:40:

What does "transfer the image" mean?

I would read/write the image with DD then resize the partition to the new card size.

I made a *.VHD file of the 256MB CF from win image and then wrote the VHD image to the 512MB CF as a 512MB volume.

Reply 17 of 56, by kingcake

User metadata
Rank Oldbie
douglar wrote on 2024-03-01, 03:43:
kingcake wrote on 2024-03-01, 03:40:

What does "transfer the image" mean?

I would read/write the image with DD then resize the partition to the new card size.

I made a *.VHD file of the 256MB CF from win image and then wrote the VHD image to the 512MB CF as a 512MB volume.

I'm not familiar with WinImage or VHDs, so someone might have to correct me, but I think that captures the partition(s) and not the MBR.
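
One way to check which case you have is to inspect the first sector of the image file: a full-disk image starts with an MBR, which ends with the 0x55AA signature and has its partition table at offset 446. A rough Python sketch (the heuristic is mine and not foolproof, since a volume boot sector also ends in 0x55AA):

```python
def looks_like_full_disk_image(path: str) -> bool:
    """Heuristic: does this image start with something resembling an MBR?

    Checks the 0x55AA signature at offset 510 and that the first partition
    entry's status byte (offset 446) is 0x00 or 0x80, as an MBR's would be.
    A partition-only image usually starts with a volume boot sector instead,
    where offset 446 falls in the middle of boot code.
    """
    with open(path, "rb") as f:
        sector = f.read(512)
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        return False
    return sector[446] in (0x00, 0x80)
```

If the 512MB copy fails this kind of check, writing a fresh MBR (e.g. with FDISK /MBR from the same DOS version) and marking the partition active would be the usual fix.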

Reply 18 of 56, by rasz_pl

User metadata
Rank l33t
jakethompson1 wrote on 2024-02-28, 19:11:

However, since the AT added 16-bit ISA bus mastering support (e.g., as used by SCSI cards), it's interesting as to why PIO was used for the AT disk controller instead of that. Anyone know?

Designing, taping out, fabbing, debugging, packaging and shipping a whole DMA controller is all very expensive compared to one address decoder on a sliver of PCB Re: 8-bit compatible ISA-16 Multi-I/O (and not only) 😀
[photo: Microflex UTC-3001I (ATF20V8B) ISA IDE controller]
Not many people know Adaptec ran a big and very advanced silicon design group. Adaptec not only designed its own SCSI controllers, it also did ASICs for disks, both SCSI and IDE; it even did a whole controller with custom chips for the Apple LaserWriter.
Oral History of Grant Saviers, CEO of Adaptec https://archive.computerhistory.org/resources … 7-05-01-acc.pdf
Oral History of Grant Saviers, part 2 of 2 https://www.youtube.com/watch?v=Od830KDrLUU
"we were designing our own chips and having them fabbed. TSMC, Taiwan Semiconductor Manufacturing was the company that made them for us, most of them. We had a couple of other suppliers. We had a good relationship with TSMC. We had a couple of other businesses but the SCSI business was the biggest and most profitable we had. We were making disk drive controller chips for Maxtor and Conner and for Seagate. So we had a pretty capable not drive development team but everything but the mechanics development, doing all the data paths and the SCSI interface or whatever the drive interface was. So that business, it was a dogfood business. It was really, really difficult building custom ASICS."
" We also had a printer controller business. We actually made the imager, the raster imaging guts for the Apple LaserWriter."

douglar wrote on 2024-02-28, 15:53:

ISA "Third Party DMA" DMA was initially provided for hard drives

Floppy. I don't think any MFM controllers ever used ISA DMA. Some network cards tried in the mid-eighties, but it was quickly abandoned.

douglar wrote on 2024-02-28, 15:53:

Also, IBM never connected all of the features of the 8237 chips to the ISA bus, which further reduced their efficiency, because IRQ's were still needed to indicate that the transfer was complete.

Can you elaborate? Afaik there are no hidden unused features in the 8237. Do you mean wiring a chip originally designed for the 8080 as-is, leading to forced 64K boundary alignment issues?

douglar wrote on 2024-02-28, 15:53:

So it seems possible that there are first party DMA IDE controllers on the ISA bus, but so far it seems rare.

so far, potentially one 😀

douglar wrote on 2024-02-28, 15:53:

PCI provided "First Party" DMA, where any expansion card could become a "bus master" and run at high speeds

PCI merely supports bus arbitration (the ability to take over the bus); it doesn't provide any DMA on its own. One could argue it got worse with PCI compared to ISA, EISA and MCA: at least all of those had a standard-defined central DMA engine for cards and drivers to rely on.

It's easy to excuse VESA for VLB being just as bare as PCI. The whole point of VLB was to quickly ship something delivering high transfer rates at absolute minimum cost, and wiring cards directly to the CPU bus using recycled MCA connectors did just that 😀

For Intel there are no excuses; they already employed thousands of engineers and knew better. It wouldn't be out of line to suspect malice. Intel at the time was busy doing tons of shady stuff:

- suing AMD

- bought DVI (no, not that DVI) from General Electric's Sarnoff Labs (no, not Smirnoff). It was rapidly abandoned once Intel and Microsoft together got caught stealing Apple QuickTime source code https://en.wikipedia.org/wiki/San_Francisco_Canyon_Company https://www.theregister.com/1998/10/29/micros … aid_apple_150m/ Intel silently dropped its video ambitions, and Microsoft settled by "investing" $150M in Apple:

"David Boies, attorney for the DoJ, noted that John Warden, for Microsoft, had omitted to quote part of a handwritten note by Fred Anderson, Apple's CFO, in which Anderson wrote that "the [QuickTime] patent dispute was resolved with cross-licence and significant payment to Apple." The payment was $150 million."

"Microsoft and Intel had been shocked to find that Apple's QuickTime product made digital video on Windows seem like continuous motion, and was far in advance of anything that either of them had, even in a planning stage. The speed was achieved by bypassing Windows' Graphics Display Interface and enabling the application to write directly to the video card. The result was a significant improvement over the choppy, 'slide-show' quality of Microsoft's own efforts. Apple's intention was to establish the driver as a standard for multimedia video imaging, so that Mac developers could sell their applications on the Windows and Mac platforms. Microsoft requested a free licence from Apple for QuickTime for Windows in June 1993, and was refused. In July 1993, the San Francisco Canyon Company entered into an agreement with Intel to deliver a program (codenamed Mario) that would enable Intel to accelerate Video for Windows' processing of video images."

"Intel gave this code to Microsoft as part of a joint development program called Display Control Interface."

"Canyon admitted that it had copied to Intel code developed for and assigned to Apple. In September 1994, Apple's software was distributed by Microsoft in its developer kits, and in Microsoft's Video for Windows version 1.1d."

- the "Designed for Intel MMX" campaign, where Intel paid game publishers $1 million per game for a sticker on every box and some token software changes. For example, the game POD employs MMX to deliver one optional sound effect 😀 but the marketing worked, and to this day people associate POD with MMX "acceleration". This later grew into forming a whole Developer Relations Group doing the same payoffs when pushing SSE for the new Pentium III.

- came up with the Native Signal Processing (NSP) initiative to inject MMX dependencies into the industry. The idea was purely software peripherals (modems, sound cards, video cards) requiring MMX, to force users into buying Intel CPUs. Microsoft got jealous and "asked" Intel to stop https://www.theregister.com/1998/11/11/micros … _said_drop_nsp/

"Microsoft had threatened to withdraw support for MMX if Intel did not drop NSP software development."
ps: Many years later Microsoft did the same thing to Creative, with Vista killing DirectAudio.

Intel repeated the PCI crime in PCIe! This is a big pain point in PCs to this day (performance & security). PCIe didn't standardize a unified DMA engine; every PCI and PCIe card ever manufactured needs to provide its own DMA implementation.


Reply 19 of 56, by mkarcher

User metadata
Rank l33t
rasz_pl wrote on 2024-03-01, 04:12:
douglar wrote on 2024-02-28, 15:53:

ISA "Third Party DMA" DMA was initially provided for hard drives

Floppy. I dont think any MFM controllers ever used ISA DMA. Some network cards tried in mid eighties but it was quickly abandoned.

It is likely true that ISA DMA was initially provided for the floppy controller, as IBM probably did not even think about hard drive controllers for the PC while designing it in 1980/1981. When designing the IBM PC, cassette was considered the "standard" storage option and "floppy" the luxury upgrade (a strategy that worked for the C64, released one year later, but at a considerably different price point). Yet every standard (i.e., WD1002-compatible) XT MFM/RLL controller used DMA. The standard resource assignment was port 320, IRQ 5, DMA 3. Things looked much more in favor of DMA on the PC/XT than they did on the AT, with the availability of "REP INSB"/"REP INSW" being at least as important as the clock of the bus relative to the DMA controller. The standard substitute for REP INSB on an 8088 PC is

get_data:
IN AL, DX ; 1 byte of code, one I/O cycle
STOSB ; 1 byte of code, one data memory cycle
LOOP get_data ; 2 bytes of code, pipeline empty after execution

Being lazy, the performance of this loop can easily be seen to be no faster than the duration of 6 bus cycles, just for the instruction fetches and data transfer cycles. As you are competing with memory refresh on the PC/XT bus, and considering the general inefficiency of the 8088 execution engine, we can safely assume an "effective net bus clock" of 4 MHz (instead of the actual bus clock of 4.77 MHz), so code executing 6 bus cycles takes 6 microseconds (probably even more, because the empty prefetch queue after the LOOP instruction has not been fully factored into the calculation), giving a top transfer rate of 166 KB/s unless you start unrolling the loop. With an unrolled loop, you can asymptotically approach 4 bus cycles per iteration, increasing the theoretical maximum to 250 KB/s.

If the XT hard disk controller responded to two consecutive data ports, so that a split 16-bit cycle could be handled correctly, you could even get down to 6 bus cycles per word, reaching an absolute maximum of 333 KB/s with sufficient unrolling. Compare that to the DMA controller, which manages to do most cycles within 4 clocks. If we disregard the extra clock every 256 cycles and the necessary breaks for memory refresh, and again cumulate these effects into an "effective bus clock" of 4 MHz, this would be a 1 MB/s data transfer rate at no effort at all.
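
For what it's worth, those estimates reduce to a couple of lines of arithmetic (the 4 MHz "effective" clock and the per-case bus-cycle counts are taken from the post, not measured):

```python
# Reproducing the PC/XT transfer-rate estimates from the post above.
EFFECTIVE_CLOCK_HZ = 4_000_000   # "effective net bus clock" after refresh losses
CLOCKS_PER_BUS_CYCLE = 4         # 8088 bus cycle length in CPU clocks

def rate_kb_s(bus_cycles: int, bytes_moved: int = 1) -> float:
    """Transfer rate if moving `bytes_moved` bytes costs `bus_cycles` bus cycles."""
    seconds = bus_cycles * CLOCKS_PER_BUS_CYCLE / EFFECTIVE_CLOCK_HZ
    return bytes_moved / seconds / 1000

print(round(rate_kb_s(6)))       # IN/STOSB/LOOP, one byte at a time
print(round(rate_kb_s(4)))       # fully unrolled byte loop
print(round(rate_kb_s(6, 2)))    # split 16-bit cycles, unrolled
print(round(rate_kb_s(1)))       # 8237 DMA, ~4 clocks per byte
```

The outputs line up with the 166, 250, 333 and 1000 KB/s figures in the posts (give or take rounding), which makes the cost-cutting argument easy to see at a glance.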

rasz_pl wrote on 2024-03-01, 04:12:
douglar wrote on 2024-02-28, 15:53:

Also, IBM never connected all of the features of the 8237 chips to the ISA bus, which further reduced their efficiency, because IRQ's were still needed to indicate that the transfer was complete.

Can you elaborate? Afaik there are no hidden unused features in 8237. Do you mean wiring chip originally designed for 8080 as is leading to forced 64K boundary alignment issues?

I expect douglar refers to the "end of process" indication provided by the 8237. This pin is connected to the ISA bus, on a pin called "TC". The signal goes active after the last byte has been transferred using DMA. It could have been used to trigger an IRQ, but while it is available on the ISA bus, it is not routed to the interrupt controller. The only meaningful use of that signal I know of is that the floppy controller can write consecutive sectors as long as the DMA controller provides more data, and knows that "enough sectors" have been written by getting the TC signal from the DMA controller, thus generating an IRQ.

rasz_pl wrote on 2024-03-01, 04:12:
douglar wrote on 2024-02-28, 15:53:

So it seems possible that there are first party DMA IDE controllers on the ISA bus, but so far it seems rare.

so far potentially one 😀

Just connect a SCSI-to-IDE converter to an Adaptec 1542, and there is your "first-party DMA ISA IDE controller". It's a quite expensive "solution", though.

rasz_pl wrote on 2024-03-01, 04:12:
douglar wrote on 2024-02-28, 15:53:

PCI provided "First Party" DMA, where any expansion card could become a "bus master" and run at high speeds

PCI merely supports bus arbitration (ability to take over), it doesnt provide any DMA on its own.

Well, "first party" DMA has never been more than arbitration to get the bus to the expansion card that wants to perform the transfer, as this is the definition of first-party DMA. The first party is the controller card that requires the data transfer, the second party is the CPU, and the third party is an external DMA controller provided by the platform. So if PCI had provided some DMA mechanism on its own, it would have been a kind of third-party DMA.

rasz_pl wrote on 2024-03-01, 04:12:

One could argue it got worse with PCI compared to ISA, EISA and MCA - at least all of those had standard defined central DMA engine for cards and drivers to rely on.

The ISA DMA engine was quite lacking, compared to the needs of anything better than an XT, for anything more demanding than stream transfers to a sound card or providing a means to arbitrate for the bus to do first-party DMA.

For example, think of a network card: the network card does not receive a stream of bytes, it receives a stream of packets. You likely want each packet in its own packet buffer, so the card needs a way to skip to the next buffer after receiving one packet. We don't want Ethernet cards to apply back-pressure to the coax cable (jamming the cable until the next receive buffer has been set up), so receiving a new packet while the previous one is still being transferred out of the card's buffer is a requirement. To get the full advantage of DMA, a first-party DMA card writes the received packets into OS buffers in host memory and can advance to the next buffer on its own. You could still use third-party DMA in convoluted ways, like assigning two DMA channels to the receiver, used in ping-pong fashion, and a third DMA channel to the transmitter. Hey, wait! How many DMA channels does ISA provide? Do you want to assign 50% of the available DMA channels (after deducting DMA 2 for the floppy) to a single expansion card?

Then think of SCSI controllers with SCSI disconnect and tagged queueing. They have a lot of requests pending, and we can't prepare a dedicated ISA DMA channel for every pending request. So the controller needs to generate an IRQ to have the host set up DMA for handling a request at the point where the controller learns which request is next to get a data phase. The 1542 is even more flexible: it supports the arcane SCSI "SET POINTERS" message, allowing a hard disk (controller) to tell the SCSI host adapter which part of the data block is transferred next. The sectors don't have to be transferred in the order of the request; the hard drive controller can start with the sector just under the head, even if it is in the middle of the requested area. Good luck doing that with third-party DMA.

Designing a third-party DMA engine that caters to all the needs of different DMA-capable cards (not even talking about a card like the SB Live! that reads wavetable samples for MIDI synthesis over the PCI bus) would result in a very complicated DMA engine, and would likely create extra complexity on the cards to interface with that logic. So I think it makes a lot of sense to abandon the idea of a central DMA engine, especially as chip complexity got cheap: in the times of PCI, adding first-party DMA to a card no longer significantly increased the cost of chip and card manufacturing.

rasz_pl wrote on 2024-03-01, 04:12:

For Intel there are no excuses, they already employed thousands of engineers and knew better. Wouldnt be out of line to suspect malice. Intel at the time was busy doing tons of shady stuff:

As the people doing the shady stuff you list are not the engineers who could have specified a DMA engine for PCI, I don't think the task of doing "capitalism bullshit" and the task of designing an advanced bus system competed for the same resources.