Here is Phil's video comparing a 486 DX 33 vs a DX2 66: https://www.youtube.com/watch?v=HNlcZetLzY8
And here is a DX2 66 MHz with benchmarks: https://www.youtube.com/watch?v=9bqqpuJ7M_I
I have run some benchmarks on the Pentium 233MMX with the L1 cache disabled; it comes in just short of the 66 in some benchmarks and a smidge higher in others.
There are two kinds of benchmarks:
Constant load without graphics like Speedsys, Landmark, Topbench.
Varied load with varied graphics like Doom, Quake, Wolfenstein, 3Dbench.
The former give a stable clock/load measurement; the latter show an average of a varied load over time.
Judging by Doom gameplay with L1 disabled on the 233MMX and Phil's video of the 33/66, it is much closer to the 66, just eyeballing the smoothness of gameplay. The 33 is definitely choppier than what I see with L1 disabled.
The Doom benchmark averages a lot of frames, and the processor load varies throughout the fairly lengthy demo. So it is not really a good measurement of CPU power, but of the system as a whole.
With Wolfenstein I get exactly the same FPS score (70) as the 66; with Quake, an FPS (6.3) very close to the 66; with Doom, a score like the 33. Topbench gives 181.
Practically all benchmarks score just a hair short of the 66 except Doom for some reason, yet Doom "choppiness" looks closer to the 66 even if the score is like the 33.
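One way to see how an averaged score and perceived smoothness can disagree is a quick Python sketch. The frame times below are made up purely for illustration, not measured on any of these machines, and the function names are my own. It just shows that two runs can have the same average FPS while one of them has a much longer worst frame.

```python
# Hypothetical per-frame render times in milliseconds.
# These numbers are invented for illustration, not measured on real hardware.
steady = [40, 40, 40, 40, 40, 40, 40, 40]
spiky = [20, 20, 20, 20, 20, 20, 20, 180]


def average_fps(frame_times_ms):
    """Average FPS over the whole run: frame count divided by total seconds."""
    return len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)


def worst_frame_ms(frame_times_ms):
    """Longest single frame, a rough proxy for visible choppiness."""
    return max(frame_times_ms)


for name, run in (("steady", steady), ("spiky", spiky)):
    print(name, round(average_fps(run), 1), "fps avg,", worst_frame_ms(run), "ms worst frame")
```

Both lists add up to the same total time, so the averaged score is identical, but the second run has one very long frame that an average alone would hide.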
Anyway, a Pentium 233MMX with the cache disabled gets me in the ballpark between the 33 and the 66, which is sufficient for my 486 gaming needs.