So, I've designed the beta PCB and sent a few boards out to testers in North America. A few other people have also fabricated their own boards using the KiCad project I have on my GitHub. If you search for picogus on Twitter you'll see them! I'm not confident enough in the design to call it final yet, but it's close.
As for better IRQ support, that's been eluding me for a while. The symptom is the IRQ line going high and never going back down: the interrupt handler on the PC side somehow doesn't convince the emulated GUS that it has serviced the IRQ. This usually manifests as hanging playback. I waited until I could get a logic analyzer with more channels to dig into it, and after a false start with the Digilent Digital Discovery (which has clunky software and produced questionable captures that I don't think fully reflected reality...), I finally have one I'm happy with: the DSLogic U3Pro32. The problem seems to be with voice IRQs, which according to the GUS docs have to be serviced in a particular way by reading the voice IRQ source register:
Note: It is possible that multiple voices could interrupt at virtually the same time. In this case, this register will behave like a fifo. When in your IRQ handler, keep reading (and servicing) this register until you do a read with both IRQ bits set to a 1. This means there are no voice IRQs left to deal with.
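To make that note concrete, here is a minimal C sketch of a handler that drains the register the way the docs describe. It is an illustration, not code from any real driver or from the PicoGUS firmware. The bit layout (bit 7 = wavetable IRQ, bit 6 = volume ramp IRQ, both active low, voice number in bits 4..0) is my recollection of the GUS SDK and should be treated as an assumption. `read_voice_irq_source()` and `mock_queue_voice_irq()` are hypothetical stand-ins for the card, so the loop can be run on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the voice IRQ source register:
   bit 7 = wavetable IRQ, bit 6 = volume ramp IRQ (both active low),
   bits 4..0 = voice number. */
#define IRQ_NO_WAVE 0x80u
#define IRQ_NO_RAMP 0x40u
#define IRQ_NONE    (IRQ_NO_WAVE | IRQ_NO_RAMP)
#define VOICE_MASK  0x1Fu

/* Hypothetical stand-in for the card: a small queue of register values. */
#define FIFO_MAX 32
static uint8_t fifo[FIFO_MAX];
static int fifo_head, fifo_count;

/* Queue one pending entry; wave/ramp are nonzero if that IRQ type is pending. */
static void mock_queue_voice_irq(uint8_t voice, int wave, int ramp) {
    assert(fifo_count < FIFO_MAX);
    uint8_t v = voice & VOICE_MASK;
    if (!wave) v |= IRQ_NO_WAVE;
    if (!ramp) v |= IRQ_NO_RAMP;
    fifo[(fifo_head + fifo_count) % FIFO_MAX] = v;
    fifo_count++;
}

/* Each read pops one entry; an empty queue reads back with both IRQ bits set. */
static uint8_t read_voice_irq_source(void) {
    if (fifo_count == 0) return IRQ_NONE;
    uint8_t v = fifo[fifo_head];
    fifo_head = (fifo_head + 1) % FIFO_MAX;
    fifo_count--;
    return v;
}

/* Keep reading until both IRQ bits are 1, recording which voices fired.
   Returns the number of entries serviced. */
static int service_voice_irqs(uint32_t *wave_mask, uint32_t *ramp_mask) {
    int serviced = 0;
    for (;;) {
        uint8_t src = read_voice_irq_source();
        if ((src & IRQ_NONE) == IRQ_NONE) break;   /* nothing left */
        uint32_t bit = 1u << (src & VOICE_MASK);
        if (!(src & IRQ_NO_WAVE)) *wave_mask |= bit;
        if (!(src & IRQ_NO_RAMP)) *ramp_mask |= bit;
        serviced++;
    }
    return serviced;
}
```

A handler that calls `read_voice_irq_source()` only once would leave the remaining entries queued, which is the situation described next.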
However, a lot of demoscene productions only read the IRQ source register once after getting a voice IRQ, even if there are multiple voice IRQs queued on the PicoGUS waiting to be serviced. On real GUS hardware with its timing, that assumption appears safe, but the PicoGUS reacts to ISA bus events a bit slower than the real thing, so things get a bit out of whack. I'm currently experimenting with re-raising the IRQ if there are still voices to be serviced. Things aren't perfect yet, but demos that hung previously now continue running. I'm also working on improving the speed at which I react to bus events. For example, I currently lower IOCHRDY on every IO write or read whether it's needed or not, because that keeps the PIO code much simpler, and PIO instructions are precious on the RP2040: there are only 32 available on each of the two PIO units. I may have enough instructions left to only lower IOCHRDY on writes or reads that I know can't complete in the typical 500ns event time on the ISA bus.
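The re-raising idea can be sketched roughly like this in C. This is only a model of the logic, not the actual PicoGUS firmware: `gus_irq_state`, `voice_irq_raise()` and `on_irq_source_read()` are hypothetical names, the pending voices are kept in a simple bitmask rather than a true FIFO, and the IRQ pin is just a boolean with a rising-edge counter. Real hardware would also need the low pulse to be long enough for the PC's interrupt controller to register a new edge, which the sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pending;      /* one bit per voice with an unserviced IRQ */
    bool     irq_line;     /* modelled level of the ISA IRQ pin */
    int      rising_edges; /* times the host would have seen a new interrupt */
} gus_irq_state;

/* Drive the modelled IRQ pin and count low-to-high transitions. */
static void set_irq_line(gus_irq_state *s, bool level) {
    if (level && !s->irq_line) s->rising_edges++;
    s->irq_line = level;
}

/* A voice generates an IRQ: remember it and make sure the line is high. */
static void voice_irq_raise(gus_irq_state *s, unsigned voice) {
    s->pending |= 1u << (voice & 31u);
    set_irq_line(s, true);
}

/* The host reads the IRQ source register. Report the lowest pending voice,
   drop the line, and raise it again if other voices are still waiting, so a
   handler that reads only once gets interrupted again.
   Returns the voice number, or -1 if nothing was pending. */
static int on_irq_source_read(gus_irq_state *s) {
    if (s->pending == 0) return -1;
    int voice = 0;
    while (!(s->pending & (1u << voice))) voice++;
    s->pending &= ~(1u << voice);
    set_irq_line(s, false);
    if (s->pending != 0) set_irq_line(s, true);  /* re-raise for the rest */
    return voice;
}
```

With two voices pending, a single read in this model services one voice and produces a second rising edge for the other, instead of leaving the line stuck high.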
As for DMA, I've had a bit of a regression. DMA sample upload worked at least some of the time on my first prototype board, but now it fails on the beta PCB. I'll wait until I get IRQs fully sorted out before I tackle that one, though.