I've been trying to validate some of the claims about what would happen if one were to run SSE-capable programs under Windows 95. The usual concern is that Windows 95 predates SSE and doesn't save the SSE registers on a task switch. On a 450MHz Pentium III running Windows 95 OSR2.1 with new chipset drivers and an updated USB supplement, I ran three programs at once that, as far as I understand, support SSE optimizations. As it turns out, none of the programs corrupted each other, but I can't give a firm reason why. I doubt these programs are actually using SSE here, as there don't seem to be any updates for Windows 95 that add native SSE support. My best guess is one of two things. The programs may disable their SSE code paths when they detect Windows 95. Alternatively, Windows 95 itself may leave the processor unable to execute SSE instructions, so the programs fall back to MMX.
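As a point of reference, here is how software of that era commonly decided whether to take an SSE path. This is not necessarily what these three programs do. It checked two things. First, whether the CPU reports SSE via CPUID (leaf 1, EDX bit 25). Second, whether an SSE instruction actually executes. The second check matters because the processor raises an invalid-opcode fault on SSE instructions unless the OS has set the CR4.OSFXSR bit, and an OS without SSE support never sets it. The sketch assumes GCC or Clang on x86/x86-64 with POSIX signals. A Windows program would catch the fault with structured exception handling rather than `SIGILL`. The function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <cpuid.h>   /* GCC/Clang: __get_cpuid, bit_SSE */
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf sse_probe_env;

/* Invoked if the probe instruction faults; jump back into os_allows_sse(). */
static void on_sigill(int sig) {
    (void)sig;
    siglongjmp(sse_probe_env, 1);
}

/* Check 1 (hypothetical helper): does the CPU advertise SSE? CPUID leaf 1, EDX bit 25. */
int cpu_has_sse(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (edx & bit_SSE) != 0;
}

/* Check 2 (hypothetical helper): will an SSE instruction actually run?
 * If the OS never set CR4.OSFXSR, xorps raises #UD, which a POSIX
 * system delivers to the process as SIGILL. */
int os_allows_sse(void) {
    struct sigaction sa, old;
    int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);      /* install temporary handler */

    if (sigsetjmp(sse_probe_env, 1) == 0) {
        __asm__ volatile ("xorps %%xmm0, %%xmm0" ::: "xmm0");
        ok = 1;                        /* instruction executed normally */
    } else {
        ok = 0;                        /* we arrived here via the SIGILL handler */
    }

    sigaction(SIGILL, &old, NULL);     /* restore the previous handler */
    return ok;
}

/* Take the SSE path only if both checks pass; otherwise fall back (e.g. to MMX). */
int use_sse_path(void) {
    return cpu_has_sse() && os_allows_sse();
}
```

A program would call `use_sse_path()` once at startup and select its SSE or MMX routines from the result. On a Pentium III under Windows 95, the first check would pass and the second would fail.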
I know 3DMark 99 MAX doesn't offer an option to enable SSE optimizations when run in Windows 95, whereas it does under either edition of Windows 98. This system uses an NVidia TNT2 Pro, but a separate set of benchmarks I ran on an 800MHz Pentium III with a Voodoo3 3000 left me unsure whether SSE is being used. Running the demo "four" in Quake III Arena, I got 52.2 fps under Windows 95B and 52.4 fps under Windows 98SE. Do the Voodoo3 drivers not support SSE, even the latest ones, or is SSE actually being used in Windows 95? I'll run some Windows 95/98 benchmarks with the TNT2 Pro tomorrow to see whether that makes any significant difference.
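For scale, the two Quake III results differ by well under one percent:

```latex
\frac{52.4 - 52.2}{52.2} \times 100\% \approx 0.38\%
```

A gap that small may be ordinary run-to-run variation. Averaging several runs per OS would make the comparison more telling.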
I also have two other computers handy, one with a 450MHz K6-2 and one with a 650MHz Athlon, so I'll benchmark those as well. 3DNow uses the same registers as the FPU and MMX, so the FPU state that Windows 95 already saves on a task switch covers it. I therefore expect it to work, as many of you have said.