The problem with CPU-local cache memory is how to invalidate the entries when DMA is moving data (from FDC/HDD, sound card maybe) to RAM. There's obviously no protocol for that on a 386 since it's not supposed to have cache at all.
Cyrix assigned some unused 386 pins, on both the PGA and the SX PQFP packages, to new signals that allow rudimentary cache coherency to be maintained. But only newer mobos/chipsets support those signals (like the ALI M1217 for the SX series), and worse yet, some mobos have the N/C pins actually connected to GND or VCC by mistake, so the new pins have to be enabled in software. If the BIOS doesn't know anything about Cyrix SLC/DLC-like CPUs and won't do it at boot time, you have to run a DOS utility yourself. Sometimes you need to do that anyway to work around certain BIOS shortcomings.
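To give an idea of what such a DOS utility actually does, here's a rough sketch in Python. It only computes the byte that would go into the CPU's CCR0 configuration register and the port writes needed to load it, it doesn't touch any hardware. The bit layout and the 22h/23h index/data ports are as I remember them from the Cyrix SLC/DLC data sheet, so check them against the real document before relying on this. The function names and the choice to fall back to flush-on-hold when the board doesn't drive FLUSH# are my own assumptions, not what any particular utility does.

```python
# CCR0 lives at configuration index 0xC0 (assumed from the Cyrix data sheet).
CCR0_INDEX = 0xC0

# Bit assignments as recalled from the data sheet; treat them as assumptions.
NC0   = 1 << 0  # don't cache the first 64 KB of each 1 MB boundary
NC1   = 1 << 1  # don't cache 640 KB..1 MB (video RAM, ROMs)
A20M  = 1 << 2  # enable the A20M# input pin
KEN   = 1 << 3  # enable the KEN# input pin
FLUSH = 1 << 4  # enable the FLUSH# input pin
BARB  = 1 << 5  # flush the cache whenever the CPU grants a bus hold
CO    = 1 << 6  # cache organisation: 1 = direct mapped, 0 = 2-way
SUS   = 1 << 7  # enable the suspend pins


def ccr0_value(board_drives_flush, board_drives_ken, board_drives_a20m):
    """Pick a CCR0 byte for a given board (hypothetical policy)."""
    value = NC1  # never cache the adapter/ROM area
    if board_drives_a20m:
        value |= A20M
    if board_drives_ken:
        value |= KEN
    if board_drives_flush:
        value |= FLUSH  # chipset tells the CPU when DMA touched RAM
    else:
        value |= BARB   # no hardware help: flush on every bus hold
    return value


def port_writes(index, value):
    """Return the (port, byte) pairs a utility would output, in order."""
    return [(0x22, index), (0x23, value)]


if __name__ == "__main__":
    v = ccr0_value(board_drives_flush=False,
                   board_drives_ken=False,
                   board_drives_a20m=False)
    for port, byte in port_writes(CCR0_INDEX, v):
        print(f"out {port:#04x}, {byte:#04x}")
```

On a board where those pins are tied to GND or VCC, enabling the matching bit is exactly what you must not do, which is why the flags are passed in rather than all switched on.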
The point here is, it's not just the CPU or the interface PCB that decides how well the upgrade will run. It's down to the mobo and BIOS as well. The best-case scenario is full BIOS support: then you'll get a decent performance boost. A reasonably well designed mobo with no support can usually match that with some carefully chosen BIOS settings and DOS utils. But if the mobo doesn't support hidden RAM refresh and requests a bus hold for each refresh cycle, the performance will be bad and you can't do anything about it short of soldering wires and adding some extra logic to try and overcome that.
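A back-of-the-envelope way to see why the non-hidden refresh hurts, assuming the cache has to be set to flush on every bus hold (my assumption about the mechanism, it isn't spelled out above) and assuming a refresh request roughly every 15 µs, the usual ballpark for PC/AT-class boards:

```python
def clocks_between_holds(cpu_mhz, refresh_interval_us=15.0):
    """CPU clocks that pass between two refresh-driven bus holds.

    refresh_interval_us is an assumed ballpark, not a measured value.
    """
    return cpu_mhz * refresh_interval_us


if __name__ == "__main__":
    for mhz in (25, 33, 40):
        print(mhz, "MHz:", clocks_between_holds(mhz), "clocks per cache lifetime")
```

Under those assumptions the cache contents survive only a few hundred clocks at a time, so it spends much of its life being refilled from RAM.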
So you can't just compare the results from different mobos and upgrade kits: it might work great for one person and not at all for another. It's even possible to misconfigure the cache settings so badly that you get a performance regression vs the native 386.