douglar wrote on 2024-03-01, 13:25: This guy blames IBM, not Intel-- […]
rasz_pl wrote on 2024-03-01, 12:03:
Thank you for the info on MFM controllers. So after all there were three legitimate users of ISA DMA - floppy, MFM until the AT, and sound cards because Creative was lazy 😀.
...
Intel was, and is again with Gelsinger back, famously run by engineers. Can't play ignorance with Andy Grove at the wheel 😀
This guy blames IBM, not Intel--
[...]
In the beginning there was a PC, but the PC was slow. IBM looked down from the heavens and said "Slap on a DMA controller -- that should speed it up." IBM's heart was in the right place; its collective brains were elsewhere as the DMA controller never met the needs of the system. The PC/AT standard contains 2 Intel 8237A DMA chips, connected as Master/Slave. The second chip is Master, and its first line (Channel 4) is used by the first chip, which is Slave. (This is unlike the interrupt controller, where the first chip is Master.) The 8237A was designed for the old 8080 8-bit processor and this is probably the main reason for so many DMA problems. The 8088 and 8086 processors chosen by IBM for its PC were too advanced for the DMA controller.
Reading this comment, you should keep in mind that this paragraph applies to the AT. DMA in the PC/XT was a mostly adequate solution considering the general performance of the system; the DMA controller in the PC/XT did actually speed it up. The 64K pagination was annoying on the XT, but nothing that couldn't be worked around in software. I guess (pure speculation) that IBM originally intended to run the DMA subsystem in the AT at the system clock rate of 6MHz, but discovered that some XT hardware designed for 4.77 MHz did not work reliably with the DMA timings that result from clocking the DMA controller at 6MHz, or that IBM could not source 8237A chips specified for more than 5 MHz. Faster chips did exist at some point, called 82C37A (CMOS process instead of NMOS process), but I didn't research the market availability of these chips at the introduction of the AT.
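To illustrate what "worked around in software" means for the 64K pagination: before programming a channel, a driver just has to check whether the physical buffer crosses a 64 KiB page. A minimal sketch (the function name and the bounce-buffer remark are mine, not from any specific BIOS or driver):

```python
def crosses_64k_page(phys_addr: int, length: int) -> bool:
    """Check whether a DMA buffer would cross a 64 KiB page boundary.

    The 8237A only increments address bits A0..A15; the upper address
    bits come from a separate page register, so one transfer must stay
    inside a single 64 KiB page.
    """
    return (phys_addr & 0xFFFF) + length > 0x10000

# A driver would typically fall back to a "bounce buffer" that is known
# not to cross a page boundary when this check fires.
print(crosses_64k_page(0xFFF0, 0x20))     # straddles a page -> True
print(crosses_64k_page(0x20000, 0x10000)) # exactly one full page -> False
```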
The primary issue of the AT is not that the 8088 is "too advanced" for the 8237 (while this statement is a valid opinion, it is not the primary issue). Rather, the 80286 (not the 8086) is not sufficiently assisted by a DMA controller operating at half the bus clock: with REP INSW as a "DMA substitute", its frontside bus allows a bus cycle every 2 clocks (3 clocks in the AT, which inserts 1 wait state), whereas the 8088 required 4 clocks per bus cycle. Furthermore, with the advancement of technology, typical data set sizes rose too, and the 64K/128K barrier of the DMA pagination started to hurt harder than it did in the PC. Starting with the 80386 and its virtual memory / paging support, the statement that "the 8237(A) is not advanced enough to be useful for more demanding applications than playing back digitized sound" became true - but that is around 7 years after IBM designed the 8237 into the IBM PC.
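To put rough numbers on that comparison (a sketch using the cycle counts from this paragraph; the 6 MHz clock and the assumption of one wait state on the I/O cycle as well are mine, real I/O cycles may carry more wait states):

```python
# Back-of-the-envelope: REP INSW vs. 8237A demand mode on a 6 MHz AT.
BUS_MHZ = 6.0

def cycle_ns(clock_mhz: float, clocks: int) -> float:
    return clocks / clock_mhz * 1000.0

# 80286: 2 clocks + 1 wait state = 3 clocks per bus cycle. One REP INSW
# iteration needs two bus cycles (an I/O read plus a memory write).
insw_word_ns = 2 * cycle_ns(BUS_MHZ, 3)   # 1000 ns per word

# 8237A demand mode: 4 DMA clocks per word, DMA clock = half the bus clock.
dma_word_ns = cycle_ns(BUS_MHZ / 2, 4)    # ~1333 ns per word

print(f"REP INSW: {insw_word_ns:.0f} ns/word, DMA: {dma_word_ns:.0f} ns/word")
```

Even under these simplifying assumptions, programmed I/O on the 286 beats the half-clocked 8237A per word, which is exactly why the DMA controller stopped being a speed-up on the AT.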
douglar wrote on 2024-03-01, 21:56:
Seems like in order to get an IDE drive working in multi-word DMA mode, you would have to have a VLB or PCI controller, yes?
https://en.wikipedia.org/wiki/WDMA_(computer)
While it looks like "DMA 0" could be implemented on an ISA bus, it seems unlikely that you could get multi-word DMA 0 working through ISA in practice: you would need either a fancy ISA controller that can bus master (which are so rare that they might as well not exist), or perhaps an IDE drive that can work as an ISA bus master - and if someone ever made one of those, they didn't leave a written record about it.
IDE drives could work perfectly well as targets for 3rd-party DMA driven by the 8237A DMA controller. The "single-word DMA" mode is specified to be compatible with the "single" transfer mode of the 8237A, and the "multi-word DMA" mode is specified to be compatible with the "demand" transfer mode of the 8237A. A shortcoming of the classic 8237A design is that once a device has arbitrated for DMA, the 8237A doesn't give up the bus for exchanging data with this device until the "transfer" ends. While in single mode "the transfer ends" after every transferred byte (or word), in block and demand mode the transfer doesn't end before the block size is exhausted or (in the case of demand mode) the requesting device voluntarily relinquishes the bus request. On the other hand, the AT needs to arbitrate the bus to the refresh controller (which is no longer part of the DMA controller, as it was on the PC/XT) every 16µs. Even at the maximum permitted MWDMA0 rate, a single sector takes around 120µs, so the classic ISA bus requires that a DMA-capable I/O device periodically drops DRQ in order not to disturb RAM refresh. This requirement might be "missing" from the IDE specification.
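The 120µs figure follows directly from the MWDMA0 timing; a quick check (480 ns is the ATA minimum cycle time for multi-word DMA mode 0, the 16µs refresh interval is the figure used above):

```python
# One 512-byte sector at the fastest MWDMA0 timing vs. the AT refresh interval.
MWDMA0_CYCLE_NS = 480
WORDS_PER_SECTOR = 512 // 2            # 16-bit transfers

sector_us = WORDS_PER_SECTOR * MWDMA0_CYCLE_NS / 1000.0
blocked_refresh_slots = sector_us / 16  # refresh is due every ~16 us

print(f"sector transfer: {sector_us:.1f} us")
print(f"refresh slots spanned: {blocked_refresh_slots:.1f}")
```

So a drive that held DRQ for a whole sector in one demand burst would run over roughly seven or eight refresh slots, hence the need to drop DRQ periodically.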
The point of 3rd-party DMA is that the coordination of bus signals is performed by the DMA controller, not by the I/O device. This in turn implies that the DMA controller can not be "too slow" to cooperate with a device, unless we are talking about real-time transfers and buffer over-/underruns in the DMA target. At a 4MHz DMA clock with compressed timing disabled (the typical AT timing), demand mode results in a cycle time of 1µs. The "official maximum" for the ISA bus, 8.33MHz, with the DMA controller clocked at 4.16MHz, would result in a cycle time of 960ns, which is still twice as long as permitted by the IDE specification. Nevertheless, a 2.1MB/s transfer between an IDE drive and an ISA mainboard at 8.33MHz ISA / 4.16MHz DMA clock would have been possible with the drive configured to "MWDMA 0".
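The cycle times and the resulting throughput can be reproduced from the demand-mode timing (4 DMA clocks per transfer, as used in this paragraph; 2 bytes per 16-bit transfer):

```python
# Demand-mode cycle time of the 8237A and the resulting 16-bit throughput.
def demand_cycle_ns(dma_mhz: float) -> float:
    return 4 / dma_mhz * 1000.0        # 4 DMA clocks per transfer

def throughput_mb_s(cycle_ns: float) -> float:
    return 2 / cycle_ns * 1000.0       # 2 bytes per 16-bit transfer

print(demand_cycle_ns(4.0))                               # 1000 ns, typical AT
print(round(demand_cycle_ns(4.16)))                       # ≈ 962 ns at 8.33 MHz ISA
print(round(throughput_mb_s(demand_cycle_ns(4.16)), 2))   # ≈ 2.08 MB/s
```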
EISA improved a lot on the ISA DMA controller: it provides modes with tighter timing (as long as the "memory" affected by the DMA transfer is standard system memory), scatter/gather lists, addressing of the whole 4G address space, and elimination of the 64K/128K page boundaries. All of these enhancements are also available for transfers with ISA cards; the only mode of the EISA DMA controller that requires an EISA target is the burst mode, because that mode uses burst handshaking that is not part of the ISA protocol. The EISA implementation of the ISA DMA controller could have helped ISA DMA a lot, but it seems it never got enough market penetration for it to be deemed useful to design ISA cards or ISA card drivers specifically to take advantage of the EISA DMA controller.

One of the curiosities around EISA DMA is the ISA south bridge used in the Saturn / Saturn II chipsets: the Intel 82378ZB System I/O supports all the EISA DMA enhancements except burst mode in a completely non-EISA system. Contrary to the original 8237A, the new DMA timing modes introduced with EISA allow a higher-priority master to take over during a demand-mode transfer, so the device is no longer required to repeatedly relinquish DMA itself to keep the system alive. For a well-implemented DMA target, temporarily losing arbitration during a demand-mode transfer just looks like extra wait states, so no extra measures need to be implemented on the device to make use of the interruptibility. On the other hand, in the enhanced EISA modes, a device must not depend on a maximum time between transfers after it has won DMA arbitration.
Newer PCI-to-EISA bridges still implement "quick DMA timings" on the ISA bus (called "type-F DMA"), which would yield a cycle time of 360ns at 8.33MHz ISA clock, but they skip all the niceties that were implemented in the EISA DMA controller. Type-F DMA is actually faster than the two ISA-compatible accelerated DMA modes provided by an EISA DMA controller (called type-A and type-B), and the Saturn ISA south bridge also supports this type-F mode instead of the EISA burst mode.
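As a sanity check on the 360ns figure: assuming 3 BCLK cycles per type-F transfer (which is the assumption that reproduces the quoted number), the 16-bit throughput ceiling comes out like this:

```python
# Type-F DMA ceiling at the official maximum ISA clock.
BCLK_MHZ = 8.33
TYPE_F_CLOCKS = 3                        # assumed BCLKs per transfer

cycle_ns = TYPE_F_CLOCKS / BCLK_MHZ * 1000.0   # ≈ 360 ns
mb_s = 2 / cycle_ns * 1000.0                   # ≈ 5.6 MB/s for 16-bit transfers

print(f"type-F cycle: {cycle_ns:.0f} ns -> {mb_s:.1f} MB/s")
```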
rasz_pl wrote on 2024-03-02, 03:24:
When ISA yields the bus to a bus master it also halts the CPU; the CPU can't even run from its internal cache. That was the case all the way to the 486 and maybe even the Pentium? This is the tragedy of allowing external bus masters on your computer's main data artery: you have to stop everything. No sweat on the XT, big problem on xx-xxx MHz platforms.
Do you have any reliable source for this claim? It sounds wrong. It is true that neither the CPU nor the refresh controller can actively get hold of the ISA bus while a master owns it (although a master can trigger a refresh cycle while it owns the bus, temporarily relinquishing control to the refresh controller), and it is also true that most AT/386/486 chipsets claimed the CPU frontside bus while ISA bus mastering was in progress. But the claim that the CPU can not run code from L1 cache seems wrong, and I don't remember the 486 databook even describing a way to inhibit execution of cached instructions, short of stopping the CPU clock (which is not permitted on early 486 processors at all). As 286 and 386 processors do not have any kind of L1 cache, those processors will indeed stop doing anything useful during ISA bus mastering as soon as they perform any kind of memory access or the prefetch queue is drained.
When the (ISA) bus is owned by a non-CPU master, the CPU needs to be informed about memory addresses touched by that master to maintain cache coherency. The complexity of cache coherency grew even higher as CPUs started to have write-back L1 caches. As the cache coherency protocol uses the same address lines as the CPU uses for addressing memory, performing cache coherency ("snoop cycles") obviously claims some bandwidth on the frontside bus. But running cache coherency cycles just requires the CPU to give up the address bus, not a full bus arbitration, so this is a "lightweight" operation, and at least for classic ISA DMA timings it would make sense not to block the frontside bus for the full duration of an ISA DMA transaction.