I have a 256K COAST module installed in my Pentium i200. I've recently recorded footage comparing performance with no L2 cache, 256K, and 512K. I've yet to edit the footage, but my results suggest that adding more L2 cache on a stick does slightly improve performance. For example, playing a 256Kbps MP3 file uses about 50% of the CPU with no L2 cache, about 40% with 256K, and about 35% with 512K. Performance might be better still with 512K of old-fashioned onboard L2 cache, but I can't say, since I've never tried it.
One obnoxious problem I have with COAST is inserting and removing the cache module. Pushing the module in is hard because the connectors are excessively tight, and I'm worried I'll flex the motherboard too much. It doesn't help that there weren't many usable mounting holes on my board! Pulling the module back out is equally terrible. I'm guessing the connection is so vise-tight because there's nothing else to hold the module in place, like the plastic retention clips on DIMM slots. It might vary between motherboards; I used an MS-5128.