EDIT: I built the Athlon and P3 again!
That's great, but I think we have too many configurations to test, and it looks to me like not all of them suffer from the problem described in the topic. I'd like to focus on one configuration that shows the "Super fast DOSBox+MT-32 but very slowww mt32emu Win32 driver" problem.
Actually, I'm still uncertain how this is possible... All three systems I tested (unfortunately, all three are quite powerful compared to yours: a dual-core AMD Athlon II X2 240, an AMD Turion 64 X2, and an Intel P6200) show the opposite trend: reasonable CPU load with the driver (20-40% depending on the tune playing), a bit higher with my DOSBox build, and higher still with Ykhwong's (all tested with default settings). Those figures are with the CPUs underclocked to 800-900 MHz; at full speed the CPU load is nearly unmeasurable.
Also note that the CPU load for dual-core systems is measured as an average over both cores, while the emulation is single-threaded, so a fully loaded emulation thread shows up as at most 50%.
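To make the averaging point concrete, here is a minimal sketch. It assumes the load meter reports the plain arithmetic mean of the per-core loads; the function names are made up for illustration.

```python
def averaged_load(per_core_loads):
    """Overall CPU load (percent) as shown by a meter that averages all cores."""
    return sum(per_core_loads) / len(per_core_loads)


def single_core_equivalent(averaged_percent, n_cores):
    """Load on the one busy core, assuming a single thread accounts for all of it."""
    return averaged_percent * n_cores


# A single-threaded emulator saturating one core of a dual-core CPU:
print(averaged_load([100.0, 0.0]))  # 50.0
# A reading of 20% on a dual-core meter means about 40% of one core:
print(single_core_equivalent(20.0, 2))  # 40.0
```

In other words, the 20-40% readings above correspond to roughly 40-80% of the single core the emulation actually runs on.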
When I run DOSBox at full speed, the CPU load meter shows 0, whereas with the driver it shows a few percent. I assume this is due to imprecise Windows performance counters. The difference is that DOSBox by default renders audio in 1-millisecond chunks, while the driver uses 10-millisecond chunks by default. So the process scheduler might consider a time slice not fully used by the emulation as "free". 😀
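For a sense of scale, the sketch below works out what the two default chunk sizes mean in frames and render wakeups per second. It assumes a 32000 Hz output rate (the MT-32's native rate); the actual rates configured in DOSBox and the driver may differ, and the helper names are hypothetical.

```python
# Assumption: 32000 Hz output rate; real DOSBox/driver settings may differ.
SAMPLE_RATE_HZ = 32000


def frames_per_chunk(chunk_ms, rate_hz=SAMPLE_RATE_HZ):
    """Number of sample frames rendered in one chunk of chunk_ms milliseconds."""
    return round(rate_hz * chunk_ms / 1000)


def wakeups_per_second(chunk_ms):
    """How often the renderer must be scheduled to keep up in real time."""
    return 1000 / chunk_ms


for label, chunk_ms in (("DOSBox default", 1), ("driver default", 10)):
    print(f"{label}: {frames_per_chunk(chunk_ms)} frames per chunk, "
          f"{wakeups_per_second(chunk_ms):.0f} wakeups/s")
```

With 1 ms chunks the renderer wakes up ten times as often but does only a tenth of the work each time, which makes each burst short relative to a scheduler time slice.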
Anyway, I also tested Munt on my old Intel PIII 800 MHz system, but the results are not as nice as yours. It seems totally incapable of running DOSBox with MT-32 emulation. Even DOSBox by itself takes 20% of the CPU (without any sound output) at 3000 emulation cycles, so the sound was awful until I reduced the emulation cycles to 1000. But it was acceptable (well, no underflows or crackles at least) when I tried mt32emu_qt with its internal MIDI player, and it was OK with Media Player + driver as well.
So I'm personally very interested in a solution to this problem, but alas I have no way to diagnose it properly. 🙁