Pentium MMX has 32K of L1 cache (16K code + 16K data) vs 16K for Pentium Classic, and that is what had the most impact on its speed. There are only a few "tech demo" games from those days that have MMX support. I know of POD and Rebel Moon Rising offhand, neither of which is that great.
I think the best example of heavy MMX use in games is Unreal. The game uses MMX for its audio engine and its 3D software renderer (which was incredible, by the way). The rest of the game also makes some limited use of 3DNow and KNI (SSE). Unfortunately a Pentium MMX is really too slow for that game, unless maybe you could crank it to 300 MHz. An interesting catch with a PPro running Unreal is that the game will automatically set you to low quality audio since you lack MMX (you can tweak the INI to set it back, though). A PPro at 233 MHz runs Unreal fairly well, but you really want something more like a P2 400 for that game or you'll be chugging whenever there's any significant action going on.
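For anyone curious how a game could make that kind of call, here is a rough sketch. This is not Unreal's actual code; the function names and the "low"/"high" policy are made up for illustration. The only hard facts in it are the CPUID feature bit positions: MMX is EDX bit 23 and SSE is EDX bit 25 of standard leaf 1, and 3DNow is EDX bit 31 of extended leaf 0x80000001. The EDX values are passed in as plain integers, so it runs anywhere.

```python
# CPUID feature bit positions (documented by Intel and AMD).
MMX_BIT = 1 << 23     # standard leaf 1, EDX bit 23
SSE_BIT = 1 << 25     # standard leaf 1, EDX bit 25 (KNI)
NOW3D_BIT = 1 << 31   # extended leaf 0x80000001, EDX bit 31 (3DNow)


def detect_simd(std_edx, ext_edx=0):
    """Decode SIMD support from raw CPUID EDX values.

    std_edx: EDX from CPUID leaf 1.
    ext_edx: EDX from CPUID leaf 0x80000001 (0 if the leaf is absent).
    """
    return {
        "mmx": bool(std_edx & MMX_BIT),
        "sse": bool(std_edx & SSE_BIT),
        "3dnow": bool(ext_edx & NOW3D_BIT),
    }


def default_audio_quality(features):
    """Hypothetical policy: fall back to low quality audio without MMX."""
    return "high" if features["mmx"] else "low"


# A CPU with no MMX bit set (PPro-like) gets the low quality default.
ppro_like = detect_simd(std_edx=0)
# A CPU with only the MMX bit set (Pentium MMX-like) gets high quality.
pmmx_like = detect_simd(std_edx=MMX_BIT)

print(default_audio_quality(ppro_like))
print(default_audio_quality(pmmx_like))
```

The point is that the default comes from a feature flag rather than a speed test, which is why a fast PPro still lands on low quality until you override it in the INI.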
The PPro FPU definitely helps with 3D games, but you're still talking about a chip that's somewhat slower than a Pentium II and that officially topped out at 200 MHz. Pentium II not only adds MMX, but also fixes 8/16-bit code execution speed and bumps the L1 to 32K like the PMMX. That must have been a happy size, going by how the P3 Tualatin also has 32K. Sure, the PII's L2 is sorta gimped by running at half speed, but at 512K it was also 2x the size of the L2 on the most popular PPro (256K), and the PII's clock speed ramped up like crazy too.
The PPro Overdrive (officially the Pentium II OverDrive for Socket 8) is basically a Xeon CPU (512K cache) turned into a PPro-shaped circuit board with pins. It's only 333 MHz though, vs 400 MHz for the original Slot 2 Xeons. And it would have to deal with that gimpy 66 MHz PPro bus and EDO DRAM.