spacesaver wrote on 2025-02-27, 18:30:
This is pretty fascinating. I thought I knew a lot of computer architecture, but I didn't realize the ISA bus is so inefficient.
The ISA bus was designed at a time when cards were built from standard 74-series TTL logic chips, with maybe one more highly integrated chip (like the 6845 CRTC on the MDA and CGA, or the NEC µPD765 on the floppy controller). The bus is not optimized to be fast; it is optimized to be easy to interface to.
spacesaver wrote on 2025-02-27, 18:30:
Apparently, ISA doesn't have burst transfers! so those address setup and wait states aren't overlapping with data transfer.
This is mostly correct, but ISA has another peculiarity up its sleeve. On the PC/XT, you are absolutely correct: address and data transfer don't overlap, and there is no pipelining. And transfers are slow if you measure them in processor clocks. That's because the ISA bus is basically what you get if you combine the 8088 processor and the 8288 bus controller. The 8088 requires 4 cycles per byte transferred (compare that to the 6502, which can transfer a byte every cycle). This makes comparing clock speeds between Intel-style systems (including Z80 systems) and MOS-style systems quite difficult. OTOH, a C64 at 1MHz still has a lower bus transfer rate than a PC at 4.77MHz, so measured in throughput in actual systems, the ISA bus is not worse than the cartridge slot of a C64.
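To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It only computes peak rates from the clocks-per-byte figures above; it assumes no wait states and no DRAM refresh, and it pretends the CPU uses every possible bus cycle, so real throughput is lower on both machines. The function name is made up for this example.

```python
def peak_bytes_per_second(clock_hz, clocks_per_transfer, bytes_per_transfer=1):
    """Peak bus rate: one transfer every `clocks_per_transfer` clocks."""
    return clock_hz / clocks_per_transfer * bytes_per_transfer

# IBM PC/XT: 14.31818 MHz crystal divided by 3, 4 clocks per byte on the 8088.
pc_xt = peak_bytes_per_second(14_318_181 / 3, 4)

# C64: nominal 1 MHz 6502-style bus, one byte per clock (assumption: exact
# PAL/NTSC clock differences are ignored here).
c64 = peak_bytes_per_second(1_000_000, 1)

print(f"PC/XT peak: {pc_xt / 1e6:.2f} MB/s")
print(f"C64 peak:   {c64 / 1e6:.2f} MB/s")
```

So even with 4 clocks per byte, the 4.77MHz PC comes out slightly ahead of the 1MHz machine in this idealized model.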
On the AT, though, the 286 processor was introduced. IBM kept the bus close to the processor (we are talking about 6MHz and 8MHz AT systems, not the later AT clones). The 286 can handle a 16-bit bus cycle in 2 clocks instead of an 8-bit bus cycle in 4 clocks (if you look at execution unit clocks, that is). OTOH, if you look at the clock signal, the 286 still needs 4 clock cycles per bus transfer; it's just that the execution clock of the 286 (and 386) is half of the clock frequency you feed into the clock pin, while the execution unit of the 8088 runs at exactly the clock at the clock pin.
As the 8088 multiplexed address and data pins (Intel had no choice for the 8086: there is no room for 16 dedicated data pins and a full set of dedicated address pins on a 40-pin DIP package for a processor as complex as the 8086), the 8088 mainboards had to latch the address (and that's how the "address latch enable" signal came to be on the ISA bus: it's generated on the board, and it might be useful for some cards as well, so why not put it on the bus).
With the 286 having more pins, address/data multiplexing was no longer needed, and the 286 designers decided that in the last half execution clock (i.e. the last full clock-pin clock) of one bus cycle, the address pins may already carry the address of the subsequent bus cycle, so the time from the address appearing at the processor address pins to the cycle being finished is not just 2 execution clocks, but 2.5 execution clocks. You don't get the extra .5 clock on the 8-bit ISA bus, though, as ISA is defined to have a valid address over the whole bus cycle. And that's why ISA added "unlatched" high address pins on the 16-bit connector: those pins are valid half an execution clock period earlier, but don't necessarily stay valid over the whole cycle. If you need that, you must add an extra latch on the card and use the ALE signal to control it.
The key point of having the top 7 address bits "early" on the bus is to give more time to the address decoder, and most importantly the circuit that decides whether a target supports 16-bit transfers.
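The "2 execution clocks" versus "4 pin clocks" bookkeeping is easy to check with a few lines of arithmetic. This is only a sketch for a 6MHz AT, assuming a 12MHz signal at the CPU clock pin (which follows from the halving described above) and zero wait states; real systems typically add wait states on ISA cycles, so take the byte rate as an upper bound. The names are made up for this example.

```python
def transfers_per_second(clock_hz, clocks_per_transfer):
    """Bus cycles per second for a given clock and cycle length."""
    return clock_hz / clocks_per_transfer

PIN_CLOCK_HZ = 12_000_000          # assumed clock-pin frequency of a 6 MHz AT
EXEC_CLOCK_HZ = PIN_CLOCK_HZ // 2  # the 286 runs its execution unit at half that

# Counting in pin clocks (4 per cycle) or execution clocks (2 per cycle)
# describes the same bus cycle, so both give the same cycle rate.
by_pin_clock = transfers_per_second(PIN_CLOCK_HZ, 4)
by_exec_clock = transfers_per_second(EXEC_CLOCK_HZ, 2)

# Each cycle moves 16 bits, i.e. 2 bytes, in this idealized zero-wait model.
peak_bytes = by_exec_clock * 2

print(by_pin_clock, by_exec_clock, peak_bytes)
```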
spacesaver wrote on 2025-02-27, 18:30:
Using voices for uploading soundfonts sounds like a very adhoc way of sharing the memory. It sounds like only the EMU8K can drive the DRAM instead of also allowing memory requests from ISA to be a bus master.
That's exactly how it works. The DRAM is not shared at all; it is 100% occupied by the EMU8K, so the processor has to access the DRAM through the EMU8K, and thus you need to free up memory time by allocating "voices" to the RAM interface. Contrast this with the IBM CGA card, which also had a fixed access pattern: in all video modes (except the 80-column text modes), the video card used 50% of the bus bandwidth, and the processor got to use the other 50% (yet it couldn't fully use it, as the duration of a single ISA cycle on the IBM PC is longer than the period between two processor-dedicated memory access time slots of the CGA, so the processor missed at least every other chance to transfer data). In high-res text mode, the required bandwidth doubled, so the CRTC required 100% of the bandwidth. Yet the CGA allows the 8088 to access the video RAM, and if it does so outside of the blanking/retrace period, the CRTC gets its memory cycle stolen and the processor cycle is executed instead, causing the well-known "CGA snow". Using a similar scheme on the EMU8K would cause static noise during sample upload.
spacesaver wrote on 2025-02-27, 18:30:
mkarcher suggested "likely 4 consecutive samples," though one would expect the 3 older samples to be buffered, not reread. Also, that upper bound speed only considers the speed between the EMU8K and its onboard RAM. It seems the ISA transfers are the real bottleneck.
I did suspect 4 samples to allow good (3rd-order) interpolation. While you can buffer the 3 older samples when slowing down the sample from the RAM/ROM, this breaks down when you need to speed up samples, i.e. when the sample position pointer advances more than 1 sample in RAM per 44.1 kHz output sample.
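Here is a toy model of why buffering stops helping when pitching up. It is not the EMU8K's actual interpolator (how the chip really does it is my guess); it just assumes a 4-sample window around the playback position, keeps already-read samples in a buffer, and counts how many new RAM reads each output sample needs for a given pitch step. The function name and window layout are made up for this example.

```python
import math

def fetches_per_output(step, n_outputs=1000, taps=4):
    """Average number of new RAM reads per output sample.

    Assumes a sliding window of `taps` consecutive samples around the
    playback position; samples read earlier stay buffered.
    """
    pos = 0.0
    newest_buffered = -2   # nothing buffered yet (window starts at index -1)
    fetched = 0
    for _ in range(n_outputs):
        first = math.floor(pos) - 1          # oldest sample in the window
        last = first + taps - 1              # newest sample in the window
        first_missing = max(first, newest_buffered + 1)
        fetched += max(0, last - first_missing + 1)
        newest_buffered = max(newest_buffered, last)
        pos += step                          # step > 1 means pitching up
    return fetched / n_outputs

for step in (0.5, 1.0, 2.0, 5.0):
    print(step, fetches_per_output(step))
```

In this model, pitching down needs less than one read per output sample, but once the step reaches the window size every output sample needs all 4 reads again, so hardware with a fixed access pattern has to budget for 4.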
spacesaver wrote on 2025-02-27, 18:30:
It does seem pretty inefficient. You have to write an address in addition to the data.
You can write the address once, so writing addresses does not contribute to the slowness:
EMU8K programming guide wrote:
If you wish to do a write transfer, simply write the data words to be transfered to sequential sound memory addresses into sound memory to SMLD (left) or SMRD (right). The address will be automatically incremented.
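As an illustration of the auto-increment idea, here is a small software mock. It is not the real EMU8K register map or driver code: the class, its methods, and the address register are hypothetical stand-ins, and only the "write the address once, then stream data words" behaviour is taken from the quoted guide.

```python
class SoundMemoryPort:
    """Toy model of a data port whose address auto-increments on each write."""

    def __init__(self, size_words):
        self.mem = [0] * size_words
        self.addr = 0
        self.address_writes = 0
        self.data_writes = 0

    def set_address(self, addr):
        # Hypothetical stand-in for programming the write address once.
        self.addr = addr
        self.address_writes += 1

    def write_data(self, word):
        # Stand-in for a write to a data register such as SMLD:
        # store the 16-bit word, then advance the address automatically.
        self.mem[self.addr] = word & 0xFFFF
        self.addr += 1
        self.data_writes += 1

port = SoundMemoryPort(16)
port.set_address(4)                      # one address write ...
for word in (0x1111, 0x2222, 0x3333):    # ... then only data writes
    port.write_data(word)

print(port.address_writes, port.data_writes, port.mem[4:7])
```

The bus traffic per sample word is therefore one data write, with the address cost paid once per block.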