noshutdown wrote:
fpu running at half speed is not a good explanation for poor fpu performance, because the nehemiah core, which has full speed fpu, is only about 10% faster than previous half-speed-fpu cores in superpi.
nehemiah-1g ~9min
ezra-1g ~10min
idt c6-250 ~13min
p55c-300 ~7min
I wasn't sure how to interpret figures for the Nehemiah given the motherboard-matching issues mentioned above. (unless those are already best-case figures)
and while idt c6's fpu performance is slower than intel and amd's, the gap isn't that great at all.
From the benchmarks I've seen, the gap is pretty big (3DNow! aside), almost on 486 levels and noticeably below the 6x86. (though certainly much closer clock for clock than the C3 appears to be)
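As a rough sanity check on the clock-for-clock point, here is a quick Python sketch that normalizes the SuperPi times quoted above by clock speed. It assumes the label suffixes are the actual clocks in MHz ("1g" = 1000, "-250" = 250, "-300" = 300) and that the times are comparable runs, neither of which is confirmed. SuperPi is also sensitive to cache and memory, so this is only a ballpark figure, not a pure FPU measurement.

```python
# Approximate SuperPi times quoted above, as (assumed clock in MHz, minutes).
# Reading the clock off the label suffix is an assumption, not a confirmed spec.
results = {
    "nehemiah-1g": (1000, 9),
    "ezra-1g": (1000, 10),
    "idt c6-250": (250, 13),
    "p55c-300": (300, 7),
}


def mhz_minutes(mhz, minutes):
    """Clock-normalized cost: proportional to total clock cycles spent."""
    return mhz * minutes


def relative_cycles(results, baseline):
    """Cycles each chip needs for the run, relative to the baseline chip."""
    base = mhz_minutes(*results[baseline])
    return {name: mhz_minutes(*v) / base for name, v in results.items()}


ratios = relative_cycles(results, "p55c-300")
for name, ratio in ratios.items():
    print(f"{name:12s} needs {ratio:.2f}x the cycles of p55c-300")
```

On those assumptions the C6 needs roughly 1.5x the cycles of the P55C for the same run, while the two C3 cores need somewhere over 4x, which is what I mean by the C6 looking much closer clock for clock.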
Of course, comparisons will be benchmark dependent too, and there are a lot of variables to consider there. (some benchmarks lean more heavily on certain instructions than others, and some code performs particularly well on one architecture but disproportionately poorly on other CPUs)
On the issue of FPUs, the K5 is one chip that seems to have very mixed benchmark figures for the FPU. (I've seen claims that it's faster than the 6x86, but most benchmarks show it much lower than the 6x86 at the same clock rate. I haven't seen specific cycle time figures for different operations, so perhaps it has an advantage in some areas but not others. If mult and/or add is fast but div is especially slow, that could throw things way off, especially since the Cyrix is relatively fast at div and relatively slow at add and mult by Pentium standards)
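To illustrate how the instruction mix alone could flip a comparison like that, here is a toy Python sketch. The per-operation cycle counts are invented placeholders, not real figures for the K5, 6x86 or any other chip (as I said, I don't have those), and the model ignores pipelining and overlap entirely. It only shows the arithmetic of a weighted average.

```python
# All latencies below are invented placeholders for illustration only.
# They are NOT measured or documented figures for any real CPU.
cpu_a = {"add": 3, "mul": 3, "div": 40}  # hypothetical: fast add/mul, slow div
cpu_b = {"add": 5, "mul": 5, "div": 20}  # hypothetical: slower add/mul, faster div


def avg_cycles(latency, mix):
    """Average cycles per FP operation for a mix of op shares summing to 1."""
    return sum(latency[op] * share for op, share in mix.items())


# Two hypothetical benchmark profiles (fraction of FP ops of each kind).
mixes = {
    "add/mul heavy": {"add": 0.50, "mul": 0.45, "div": 0.05},
    "div heavy": {"add": 0.35, "mul": 0.35, "div": 0.30},
}

for name, mix in mixes.items():
    a = avg_cycles(cpu_a, mix)
    b = avg_cycles(cpu_b, mix)
    winner = "cpu_a" if a < b else "cpu_b"
    print(f"{name:14s} cpu_a={a:.2f} cpu_b={b:.2f} -> {winner} ahead")
```

With these made-up numbers the first chip comes out ahead on the add/mul heavy mix and clearly behind on the div heavy one, so two benchmarks could honestly report opposite rankings for the same pair of FPUs.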
swaaye wrote:An interesting topic to consider is Pentium 4. Intel did a lot of the same things Cyrix/Centaur did, in order to make a CPU more about clock speed headroom than instructions per clock. Pentium 4's x87 FPU is quite weak, for example. Unfortunately Centaur and Cyrix CPUs clock low and have low IPC.
The Pentium 4 had other design trade-offs though, and the Centaur design was kept simple for cost/power reasons rather than for clock speed (the C6 did the same, but scaled poorly). Other design aspects of the C3 certainly focused more on clock speed scalability though. (the long pipeline being an obvious one, especially compared to the 4-stage C6)
And the P4 obviously still has much higher IPC than the C3 (or C6 for that matter) for both integer and floating point. (and per-clock, the FPU is still not that far off from the Athlon or PII/III FPUs: well ahead of the old 6x86 one and faster in some respects than the K6 FPU)
But, in any case, you're right about the C3's (relatively) poor clock speed scaling keeping it from being really useful in some respects (compared to the P4), and unlike the P4, it wasn't hindered by heat issues but just core stability. (and with the low IPC rate and only modest clock rates for the time, it also lost much of the perceived advantage over the Cyrix M2 based parts VIA was also considering -aside from the clock-speed marketing related issues)