Somehow I'd failed to notice the odd performance distribution in the Quake 1 tests before. The oddly high performance of the Pentium II OverDrives was noted soon after the results were posted, but there are other oddities all over the range too.
I know Quake is hand-optimized for the P5 architecture's quirks, and the P55C obviously does the best clock for clock of any of the CPUs tested, so that's not so surprising. What's weird is how well the Winchip (even the C6), K6, and 6x86 (and MII) compare to the Pentium Pro (even the 1 MB cache version) and Pentium II, and how poorly the Athlon does. The K6 family and P6 (PPro, II, III, Celeron) also seem very close in performance to each other at similar clock and bus speeds, leaving only the P5 family to have a serious clock-for-clock disparity on Intel's end.
Given that the P5, P6, and K7 all have dual-issue pipelined FPUs (and only the P5 requires specific optimization for in-order execution), I'd think there'd be more consistency in performance there, assuming the typical FPU bottleneck in Quake. Even allowing for some performance oddities between the P5 and P6 FPUs (like the theoretical multiplication throughput of the P5 being double that of the P6), that wouldn't explain the K7's odd performance. It's stranger still since the 640x480 resolution should be pretty heavy on FXCH and FDIV operations due to the way Quake texture maps, so the division performance lead that the P6 and K7 have should show up more prominently. (Granted, that might help the 6x86 a bit too, with its decent FDIV performance.) That, or I've been misled as to what Quake's most finicky requirements are.
It's particularly odd that the Winchip 2 and C6 perform so well, often matching or slightly beating similarly clocked P6 chips. I could maybe understand the 1 MB cache SS7 results skewing some things, but that wouldn't account for the 1 MB cache Xeon and PPro performance. Plus, that advantage should disappear for Socket 5/7 CPUs tested on 512 kB boards ... unless perhaps Quake has some odd affinity for the direct-mapped caching scheme used for board-level caches over the set-associative caches of the Socket 8, Slot 1, and Slot A examples. (Except that still wouldn't explain the PII OD's high performance, unless it used a different caching scheme from other PIIs.)
The affinity for the 75 MHz (and some 83 MHz) bus examples seems pretty reasonable at least, given the slightly overclocked PCI/AGP bus. (Presumably the 83 MHz tests that don't show such a disparity were using 2.5x rather than 2x PCI dividers.) Though if it is the board-level cache making a significant difference in a lot of these figures, I'd think the 100 MHz bus would be favored more consistently than it is.
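To put rough numbers on the divider guess above, here's a minimal sketch of the PCI clock arithmetic. It assumes the PCI clock is simply the front-side bus divided by the board's divider and that 33.3 MHz is the nominal PCI speed; the function names and the list of FSB/divider pairs are my own illustration, not the settings of the boards actually tested, and the AGP divider isn't modeled at all.

```python
# Nominal PCI clock in MHz (33.3 MHz); anything above this is an overclocked PCI bus.
PCI_NOMINAL_MHZ = 100 / 3


def pci_clock_mhz(fsb_mhz, divider):
    """PCI clock for a given front-side bus speed and FSB:PCI divider."""
    return fsb_mhz / divider


def pci_overclock_pct(fsb_mhz, divider):
    """Percentage by which the PCI clock exceeds the nominal 33.3 MHz."""
    return (pci_clock_mhz(fsb_mhz, divider) / PCI_NOMINAL_MHZ - 1) * 100


# Hypothetical FSB/divider pairs for illustration only (assumed, not taken from the tests).
CASES = [(200 / 3, 2), (75, 2), (250 / 3, 2), (250 / 3, 2.5), (100, 3)]

for fsb, div in CASES:
    print(f"FSB {fsb:6.2f} MHz / {div:<3} -> PCI {pci_clock_mhz(fsb, div):5.2f} MHz "
          f"({pci_overclock_pct(fsb, div):+5.1f}%)")
```

With a 2x divider, a 75 MHz bus gives a 37.5 MHz PCI clock and an 83.3 MHz bus gives about 41.7 MHz, while 83.3 MHz with a 2.5x divider lands back at the stock 33.3 MHz, which would fit the 83 MHz results that show no extra boost.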
Perhaps including a Duron in testing might have shed a little more light on things, at least as far as the cache-specific performance issues go. (It would give a decent contrast to the Samuel II and Ezra with their similar cache configurations, and would obviously be useful for comparing far more than just Quake.)
Quake II very consistently favors all the pipelined-FPU CPUs, and favors the more advanced P6 and K7 types over the P5, to the point that it seems to run noticeably faster than Quake 1 on most of them, while several of the other CPUs fall behind on Quake II compared to Quake 1. (And SIMD extensions aren't taken advantage of in either of them.)
Aside from the Athlon, the Winchip performance is probably the most surprising thing I noticed when sifting back through these tests. Quake 1 does oddly well on it compared to most other computationally intensive applications (FPU or ALU wise).
It does make me wonder how much better the VIA C3 family would've performed as Socket 7 parts with the benefit of said caches. The FPU should be pretty close to the Winchip 2's but clocked at half the ALU speed, and indeed it shows just slightly better than 1:1 performance at 2x the clock speed in other FPU benchmarks. So it seems like the cache is the weak link setting those C3 chips well below the 1/2 or even 1/3 clock-for-clock performance mark in Quake compared to the C6 and Winchip 2. By that logic, a 500 MHz SS7 Samuel should comprehensively outperform a 250 MHz Winchip 2, which the S370 Samuel certainly does not, even at 600 MHz.
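For what it's worth, the half-speed FPU reasoning can be sketched as a back-of-the-envelope calculation. This assumes the C3's FPU runs at half the core clock and matches the Winchip 2's FPU clock for clock, as described above; the function names are made up for illustration, and it deliberately ignores cache and memory effects, which are exactly what seems to be hurting the real S370 parts.

```python
def c3_fpu_clock_mhz(core_mhz, fpu_ratio=0.5):
    """Effective FPU clock of a C3, assuming the FPU runs at fpu_ratio of core speed."""
    return core_mhz * fpu_ratio


def expected_fpu_ratio_vs_winchip2(c3_core_mhz, winchip2_mhz, fpu_ratio=0.5):
    """Expected FPU-bound performance of a C3 relative to a Winchip 2.

    Assumes per-clock FPU parity between the two designs (an assumption,
    not a measured figure) and no cache or memory bottleneck.
    """
    return c3_fpu_clock_mhz(c3_core_mhz, fpu_ratio) / winchip2_mhz


# Samuel at 500 and 600 MHz against a 250 MHz Winchip 2.
for core in (500, 600):
    ratio = expected_fpu_ratio_vs_winchip2(core, 250)
    print(f"C3 @ {core} MHz vs Winchip 2 @ 250 MHz: expected FPU ratio {ratio:.2f}")
```

On FPU throughput alone this puts a 500 MHz Samuel level with a 250 MHz Winchip 2 and a 600 MHz part about 20% ahead, before counting the full-speed ALU, so anything well short of that in Quake points at something other than the FPU.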