Reply 60 of 72, by Falcosoft
jtchip wrote on 2025-06-27, 23:51:
Anyway, interestingly Esther (VIA C7) slightly beats Bonnell in SSE2 on this workload, 16 pixels/ms vs 14, the only "win" it has.
The rest of my results are from a NSC Geode GX1 300MHz (FPU_FAST enabled, slowest ALU result), Athlon 5350 (Kabini, slowest AVX), and Athlon 64 X2 5000+. The model names from CPUID (perhaps the DOS version should output this too), including the C7-D, are (from /proc/cpuinfo in Linux):
- VIA Esther processor 1500MHz
- Geode(TM) Integrated Processor by National Semi
- AMD Athlon(tm) 5350 APU with Radeon(tm) R3
- AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
Thanks, I have uploaded the 3 result sets you attached.
Maybe I overlooked something, but I cannot find the VIA Esther results you mentioned.
Regarding your Athlon 64 results:
It's interesting that your desktop version of Athlon 64 X2 is ~5% faster in 1GHz normalized ALU/integer calculations (and only in ALU/integer) compared to my Turion 64 X2.
I re-tested the mobile Turion X2 also with the DOS version and the difference is consistent.
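To make the "1GHz normalized" comparison concrete, here is a minimal sketch. It assumes the raw score is simply scaled linearly by clock speed; the function names and all numbers are made up for illustration and are not results or code from the benchmark.

```python
def normalize_to_1ghz(score: float, clock_mhz: float) -> float:
    """Scale a raw score (e.g. pixels/ms) to what it would be at 1000 MHz.

    Assumption: the score scales linearly with the CPU clock.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return score * 1000.0 / clock_mhz


def percent_faster(a: float, b: float) -> float:
    """Return by how many percent score a exceeds score b."""
    return (a / b - 1.0) * 100.0


# Hypothetical raw ALU scores and clocks, NOT measured values.
desktop = normalize_to_1ghz(score=273.0, clock_mhz=2600.0)
mobile = normalize_to_1ghz(score=200.0, clock_mhz=2000.0)

print(f"desktop: {desktop:.1f}, mobile: {mobile:.1f} (per 1 GHz)")
print(f"difference: {percent_faster(desktop, mobile):.1f}%")
```

Once the clock is divided out, whatever difference remains has to come from something else, such as caches, memory access or the core itself.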
Both have the same size L1 and L2 caches, so cache size cannot explain the difference.
Maybe it's because the mobile version shares its memory with the integrated ATI Radeon Xpress 1150. The ALU/integer code path is the only one that also accesses memory inside the 'hot' inner loop.
It has to access memory because it has fewer freely available registers than the other instruction sets. There are only 6 freely available general purpose integer registers in 32-bit code (namely EAX, EBX, ECX, EDX, EDI, ESI), while in all other code paths you have 8 freely available FPU/MMX/SSE/AVX registers plus the general purpose integer ones.