digger wrote on 2025-07-30, 07:48:
That's the cool thing about this forum. I keep learning new stuff from the DOS PC era. Thanks. 🙂
Nevertheless, this thread is about EXMS86, not the design(?) of the DMA system of IBM-compatible computers, so we shouldn't derail it too much with off-topic posts. If you are interested in a more detailed discussion and have further questions, I suggest you open a new thread. Feel free to send me a PM with a link if you are afraid I'll miss it otherwise.
digger wrote on 2025-07-30, 07:48:
Wasn't it still the case that ISA DMA was kind of a bottleneck, especially on anything faster than a 286, since it always had to access the system RAM at 5MHz?
You have a point in that the transfer rate of ISA DMA is quite low; the base frequency of the DMA controller in AT computers is half the ISA clock, i.e. around 4MHz, which is even less than the 4.77MHz in the PC/XT. Still, calling it a "bottleneck" is an exaggeration.
digger wrote on 2025-07-30, 07:48:
It's kind of disappointing that it ended up being the primary I/O method for playing back digitized audio samples on popular sound cards back in the day, causing major headaches w.r.t. DOS sound compatibility once PCs moved beyond ISA slots.
For ISA sound cards, ISA DMA is the method of choice for data transfer. Having a central multi-channel DMA controller on the mainboard instead of a separate DMA implementation on each sound card is also a smart choice, as it makes card design easier. The issue with non-ISA sound cards is that there was no standard way for PCI cards to become the target of the central (ISA) DMA controller, which was a deliberate design choice when PCI was specified.
digger wrote on 2025-07-30, 07:48:
If I understand correctly, from the 386 and up, this became such a bottleneck, that even driving a Covox-like dumb LPT DAC directly would take up less overhead than relying on ISA DMA for digital sound playback. Or does that deserve more nuance?
This paragraph mirrors the common sentiment about ISA DMA, which clearly is deficient, and in my opinion was barely adequate for the original PC and never kept up, yet it is exaggerated so far that one has to consider it factually false. So let's put the rants aside and get back to the facts.
ISA DMA is inconvenient to use, as the DMA controller used in the original IBM PC was actually designed for 8080 or 8085 systems with a total address space of 64K. IBM added a "page register" that supplies four extra address bits per channel, but these page registers are not tied to the address bits in the DMA controller, so a transfer always stays inside one 64K block. You can't do a transfer starting at physical address 60K (which is in the first 64K block) and ending at address 70K (which is in the second 64K block). The IBM mainboard BIOS rejects a floppy operation like this with error code 9 ("DMA segment overrun"). It is quite likely you never knew this, even if you wrote programs that directly access the floppy drive using INT 13h, because DOS installs a shim layer above the BIOS which transparently works around this limitation, so the disk operating system actually did care about making disk operations easier.

As the second DMA controller in the AT is wired to address 64K words (128K bytes, but each transfer has to start at an even address), you get barriers every 128K instead of every 64K, but the issue persists, as the AT still uses the 8237A meant for 8-bit systems. If the AT hard drive controller had used a 16-bit DMA channel instead of port I/O, all hard disk transfers would have required word alignment, which is something XT software didn't care about, and that is likely a contributing factor why the AT hard drive controller did not use DMA.
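To illustrate the constraint, here is a minimal sketch (plain C, names made up for this post) of the check a driver or BIOS shim has to perform before programming the controller:

```
/* Returns nonzero if a DMA buffer of 'len' bytes starting at physical
 * address 'phys' crosses a 64K boundary. The 8237A only counts within
 * the low 16 address bits; the page register supplies the upper bits
 * and never increments during a transfer, so the whole buffer has to
 * stay inside one 64K page (128K for the 16-bit channels of the AT). */
int dma_crosses_64k(unsigned long phys, unsigned long len)
{
    return (phys >> 16) != ((phys + len - 1) >> 16);
}

/* In real-mode DOS, the physical address of a far pointer is simply
 * segment * 16 + offset. */
unsigned long phys_of(unsigned int seg, unsigned int off)
{
    return ((unsigned long)seg << 4) + (unsigned long)off;
}
```

If the check fails, the usual workaround is to bounce the data through a scratch buffer that is known not to cross a page, or to split the request at the boundary, which is exactly the kind of work DOS does above INT 13h.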
Indeed, the clock speed of the 8237A is limited to 5 MHz; the controller is operated at 3MHz in the original 6MHz AT, and at 4 to 4.2 MHz in later systems. The standard configuration of the ISA DMA controller requires 4 clock cycles per transfer (plus some overhead), so at 4MHz a rate of 1 mega-transfer per second is theoretically obtainable, which on a 16-bit channel is 2MB/s. This is faster than the transfer rate of an MFM hard drive.

Typical controllers first transferred a sector from the drive surface into controller RAM, and only after getting the signal from the hardware CRC comparator that the checksum is OK was the sector transferred from the controller to the host. The hardware CRC comparator checks the CRC while the data is read from the drive, so there is no notable latency for the CRC check, and the transfer to the host can begin immediately after reading the sector plus its CRC. In case the CRC mismatched, the correction was a slow process performed on the controller, but let's omit this fringe case for now.

A well-tuned design uses interleave 2, which records the sectors in the order 1 - 10 - 2 - 11 - 3 - 12 - 4 - 13 - 5 - 14 - 6 - 15 - 7 - 16 - 8 - 17 - 9. This means that the data from sector 1 can be transferred from the controller to the host while sector 10 passes the drive head. Thus 50% of the time is used for reading data from the disk into the controller, and the other 50% is used to transfer data from the controller to the host. At 3600rpm (60 rotations per second), we can read 17 sectors in two revolutions, so 30 * 17 = 510 sectors can be read per second, yielding a net transfer rate of 261 KB/s. As only 50% of the time is spent on transferring data from the controller to the host, that transfer requires twice the speed, i.e. around 520KB/s. This is way below the 2MB/s theoretical (and likely 1.6MB/s practical) limit of ISA DMA, so contrary to what some people say, when IBM designed the AT, ISA DMA was not prohibitively slow for hard drive transfers. Even with very modern MFM drives that have 50% higher transfer rates, you still end up at around 780KB/s of required DMA rate. While faster hard drives did exist in 1984, they were not available at a form factor or price point that was a good fit for the IBM AT. So while the transfer rate is limited, disregard any claim that it is unusable for any practical purpose.
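For anyone who wants to verify the arithmetic, the numbers above follow directly from spindle speed, sector count and interleave. A small throw-away C program, using the figures assumed in the text:

```
#include <stdio.h>

int main(void)
{
    const double rpm         = 3600.0;  /* MFM drive spindle speed          */
    const double rev_per_sec = rpm / 60.0;
    const int    spt         = 17;      /* sectors per track                */
    const int    sector_size = 512;
    const int    interleave  = 2;       /* one full track needs 2 revs      */

    /* sectors actually delivered per second at interleave 2 */
    double sectors_per_s = rev_per_sec / interleave * spt;        /* 510    */
    double net_rate      = sectors_per_s * sector_size;           /* ~261KB/s */

    /* only half of each revolution is spent shipping data to the host,
     * so the host-side burst rate must be roughly twice the net rate */
    double burst_rate    = net_rate * 2.0;                        /* ~522KB/s */

    /* ISA DMA ceiling: 4 clocks per transfer at ~4 MHz, 16-bit channel */
    double dma_ceiling   = 4e6 / 4.0 * 2.0;                       /* 2 MB/s  */

    printf("net %.0f KB/s, burst %.0f KB/s, DMA ceiling %.0f KB/s\n",
           net_rate / 1000.0, burst_rate / 1000.0, dma_ceiling / 1000.0);
    return 0;
}
```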
Furthermore, it is true that DMA transfers, even at their quite low rate, put a burden on the ISA bus. If you run a WSS-type card in the less efficient "single mode DMA" variant at 48kHz 16-bit stereo (around 200KB/s), expect ISA transfer rates to go down by 33%. This is because WSS-type cards use an 8-bit DMA channel, and "single mode" is the least efficient of the three available modes. Basically, you have:
- Block mode: The device requests DMA, the DMA controller obtains bus ownership and then issues as many DMA read or write cycles from/to the device as the DMA controller is set up for. It doesn't matter whether the device stops requesting DMA during the transfer; the transfer continues until the programmed number of bytes/words has been transferred.
- Single mode: The device requests DMA, the DMA controller obtains bus ownership, transfers a single byte or word and then releases bus ownership. If the device is still requesting DMA after bus ownership has been released, the DMA controller enters bus arbitration again to obtain the bus for the next byte or word.
- Demand mode: This mode is a cross-over of the previous two: if the device requests DMA, the DMA controller obtains the bus and starts sending/requesting multiple bytes/words just like in block mode, but the device can interrupt the transfer at any time by deasserting its DMA request. In that case, the current transfer cycle is finished and the DMA controller releases the bus, even if it still has bytes left to transfer.
So, assuming the device wants to transfer a certain number of bytes (or words, but ignore that for now), the DMA controller is set up for exactly that size, and the device keeps its DMA request line active all the time, it will work flawlessly with all modes, even without knowing what mode the DMA controller is in! The DMA controller is quite primitive: it cannot interrupt a transfer if a higher-priority request comes in. As long as bus ownership for one DMA channel is established, the DMA controller keeps the bus assigned to that channel. This effectively means block transfers cannot be interrupted, not even for memory refresh! Also, demand mode transfers can only be interrupted if the device temporarily releases the DMA request line. Single mode transfers can be interrupted at any time. As we know that we get around one transfer per microsecond, the rule that you may not "own" the ISA bus for longer than 15µs immediately shows that transfers exceeding 15 bytes or words must not be performed in block mode, and in demand mode the device needs to "cooperate" by periodically releasing the bus. In single mode, re-arbitration happens after every byte/word transferred, so there is no issue with RAM refresh in single mode.
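To make the mode discussion concrete, here is a rough sketch of how channel 1 of the first 8237A (the classic Sound Blaster default) gets programmed for a single-mode playback transfer. This assumes a Borland-style 16-bit DOS compiler that provides outportb() in dos.h; the port numbers and mode bits are the standard PC/AT assignments, the function name is made up for this post:

```
#include <dos.h>     /* outportb(), Borland/Turbo C style */

/* Program 8-bit DMA channel 1 for a single-mode, auto-initialize
 * playback transfer (memory -> device). The buffer at physical
 * address 'phys' must not cross a 64K boundary (see the check above). */
void dma1_setup_playback(unsigned long phys, unsigned int len)
{
    unsigned int  offs  = (unsigned int)(phys & 0xFFFFUL);
    unsigned char page  = (unsigned char)(phys >> 16);
    unsigned int  count = len - 1;       /* the 8237A performs N+1 transfers */

    outportb(0x0A, 0x05);        /* mask channel 1 (0x04 = set mask, ch 1)   */
    outportb(0x0C, 0x00);        /* clear the byte-pointer flip-flop         */

    /* Mode byte: 0x40 = single mode, 0x10 = auto-init, 0x08 = read transfer
     * (memory to device), low two bits select channel 1 -> 0x59.
     * Demand mode, as WSS-type cards expect, would put 0x00 in the top
     * two bits instead of 0x40. */
    outportb(0x0B, 0x59);

    outportb(0x02, offs & 0xFF);         /* base address, low byte then high */
    outportb(0x02, offs >> 8);
    outportb(0x83, page);                /* page register of channel 1       */
    outportb(0x03, count & 0xFF);        /* base count, low byte then high   */
    outportb(0x03, count >> 8);

    outportb(0x0A, 0x01);        /* unmask channel 1, transfers may start    */
}
```

Note that only the mode byte differs between the three modes; the device on the other end never sees which one you picked, it only notices the difference in efficiency.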
WSS-type cards are meant to be used in demand mode. The AD1848 (or later compatible chips) and the SB16 have a FIFO on the card; they transfer a couple of bytes or words in demand mode until the FIFO is full, and then release the bus until the FIFO is nearly empty again. This is more efficient than arbitrating for the ISA bus for every single byte/word. A sound chip designed for demand mode will not notice if the DMA controller is in single mode instead - except for the worse performance delivered in single mode.
So, ISA DMA does put a clearly relevant burden on the ISA bus, which reduces the bandwidth available for writing to an ISA graphics card (which is not good for games). On the other hand, the comparison with simple LPT DACs is nonsense. Using a device like that in the background means that for every single sample the processor gets interrupted, needs to save the current execution address and processor state, look up the timer interrupt handler, jump there, push some registers onto the stack, output a sample to the DAC, pop the registers it saved manually and then return to the interrupted task. This is clearly more overhead than even DMA in single mode for 16-bit stereo sound. Things might turn against DMA if the parallel-port sound device has a FIFO, so you don't have to feed it every single byte by hand. The Disney Sound Source works this way, but as the parallel port usually uses 8-bit I/O at default wait states (which are usually set quite high to stay PC compatible), I don't think you can beat DMA.

Now, if EMM386 is loaded, things look even worse for non-DMA sound playback, because EMM386 is basically a virtual machine monitor (or hypervisor) that runs the DOS tasks. As the monitor program, EMM386 receives all interrupts and handles them in protected mode. When a DOS program is interrupted, the processor switches from virtual-8086 mode (a sub-mode of protected mode) to standard protected mode (which takes ~100 clock cycles) so that EMM386 can handle the interrupt. EMM386 is then supposed to "forward" the interrupt to the virtualized DOS task, which means it has to switch back to virtual-8086 mode. I assume it can set up the processor in a way that returning from the interrupt (for example the timer interrupt used to send samples to a parallel-port DAC) does not require another round trip through standard protected mode. Every kind of DMA is more efficient than a high-frequency timer interrupt with EMM386 loaded.
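To put a shape on that overhead, here is roughly what the per-sample interrupt path for a Covox-type DAC looks like. This is only a sketch assuming a Borland-style 16-bit DOS compiler (interrupt keyword, getvect/setvect and outportb from dos.h) and LPT1 at port 0x378; reprogramming PIT channel 0 to the sample rate and chaining to the original INT 8 handler are omitted:

```
#include <dos.h>

#define LPT_DATA  0x378           /* LPT1 data port, where a Covox DAC sits */
#define PIC_CMD   0x20            /* primary 8259A command port             */

static unsigned char far *samples;   /* playback buffer                     */
static unsigned int  pos, length;
static void interrupt (*old_int8)(void);

/* Runs once per sample. Before the first instruction here executes, the
 * CPU has already pushed FLAGS/CS/IP and fetched the vector; with EMM386
 * loaded, a V86 -> protected mode -> V86 round trip comes on top. That
 * fixed cost is paid at the sample rate, e.g. 22050 times per second. */
static void interrupt timer_isr(void)
{
    if (pos < length)
        outportb(LPT_DATA, samples[pos++]);   /* write one 8-bit sample     */

    outportb(PIC_CMD, 0x20);                  /* non-specific EOI for IRQ 0 */
    /* a real player would also chain to old_int8 at the original 18.2 Hz
     * rate so the DOS clock keeps ticking */
}

void start_playback(unsigned char far *buf, unsigned int len)
{
    samples = buf; length = len; pos = 0;
    old_int8 = getvect(0x08);
    setvect(0x08, timer_isr);
    /* PIT channel 0 would be reprogrammed here to fire at the sample rate */
}
```

With ISA DMA, none of this per-sample work happens on the CPU; the host only gets one interrupt per buffer (or half-buffer), not one per sample.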
In the end, (non-busmaster) ISA DMA was only used in cases where high throughput wasn't required but background transfers were important. Sound cards fit this pattern perfectly. The programming model of ISA DMA is awful, and if EMM386 virtualizes RAM, it also needs to virtualize DMA, which is a very cumbersome operation, so nobody in the software industry liked ISA DMA. This explains why system designers were happy to get rid of the obsolete ISA DMA scheme when they designed PCI systems. They did not expect the use case of "ISA-compatible sound cards" to be that important. Only later were some standards (e.g. PC/PCI) designed to give PCI sound cards access to the ISA DMA/IRQ system, and because such a retrofit never became universal, PCI cards generally can't offer a nice Plug&Play experience including Sound Blaster compatibility.