First post, by Falcosoft
Hi,
I have released version 1.7 of MandelX, which now includes benchmarks:
https://falcosoft.hu/otesz/mandelx.zip
MandelX is portable, so it does not require installation. You only have to copy the MandelX folder from the zip file to your desktop or any other folder.
It stores its settings in an ini file, so a folder where normal users have write permission is preferred.
MandelX works on any Windows version from Win98 to Win11.
It uses hand-written, optimized assembly routines to generate Mandelbrot/Julia fractals. The code paths do not deliberately favor any CPU vendor or microarchitecture, and in practice they work well with both older and newer CPU generations.
The benchmark runs 6 code paths of the same algorithm, each using a different instruction set.
Only additions, subtractions and multiplications, which are usually fast, are used. There are no costly divisions, square roots, trigonometric or other rarely optimized instructions.
So the benchmarks can faithfully represent the pure number-crunching performance of different x86 microarchitectures.
Both the data and the code easily fit into the L1 cache, so multi-level cache hierarchies and memory subsystem performance should not play a big role.
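For readers unfamiliar with the algorithm, here is a minimal Python sketch of the per-pixel escape-time iteration that a Mandelbrot benchmark like this one typically performs. It is not MandelX's actual assembly code: the bailout radius of 2 and the iteration limit of 256 are the conventional choices, assumed here. It shows how the inner loop can get by with only additions, subtractions and multiplications.

```python
def escape_time(cx: float, cy: float, max_iter: int = 256) -> int:
    """Return the iteration at which z = z^2 + c escapes, or max_iter if it never does."""
    zx, zy = 0.0, 0.0
    for i in range(max_iter):
        zx2 = zx * zx
        zy2 = zy * zy
        # |z|^2 > 4 is equivalent to |z| > 2, so no square root is needed
        if zx2 + zy2 > 4.0:
            return i
        # z^2 + c split into real and imaginary parts: only add/sub/mul
        zy = 2.0 * zx * zy + cy
        zx = zx2 - zy2 + cx
    return max_iter


print(escape_time(0.0, 0.0))  # inside the set: runs all 256 iterations
print(escape_time(2.0, 2.0))  # far outside: escapes after 1 iteration
```

Comparing the squared magnitude against 4 is what avoids the square root.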
More info about the benchmark routines:
1. x86 ALU is a fixed-point integer routine that uses 32-bit integers and 32-bit x86 integer registers and calculates 1 pixel per round.
2. x87 FPU is a floating-point routine that uses 80-bit extended precision floats and the 80-bit registers of the FPU and calculates 1 pixel per round.
3. 3DNow! is a SIMD floating-point routine that uses AMD's 3DNow! instruction set, introduced with the K6-2. It uses 32-bit floats and 64-bit MMX registers and calculates 2 pixels per round.
4. SSE is a SIMD floating-point routine that uses Intel's SSE instruction set, introduced with the Pentium 3. It uses 32-bit floats and 128-bit XMM registers and calculates 4 pixels per round.
5. SSE2 is a SIMD floating-point routine that uses Intel's SSE2 instruction set, introduced with the Pentium 4. It uses 64-bit doubles and 128-bit XMM registers and calculates 2 pixels per round.
6. AVX is a SIMD floating-point routine that uses Intel's AVX instruction set, introduced with Sandy Bridge CPUs. It uses 64-bit doubles and 256-bit YMM registers and calculates 4 pixels per round.
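The x86 ALU routine works with fixed-point integers instead of floats. The Python sketch below shows the general idea. The Q4.28 format (4 integer bits including sign, 28 fraction bits) is a hypothetical choice for illustration; the post does not say which format MandelX uses. Python integers never overflow, so this sketch does not model 32-bit wraparound. A real 32-bit routine has to pick a format that keeps intermediate values in range.

```python
FRAC_BITS = 28          # hypothetical Q4.28 format, not necessarily MandelX's
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    return int(round(x * ONE))


def fmul(a: int, b: int) -> int:
    # Full double-width product (like x86 IMUL into EDX:EAX), then an
    # arithmetic shift right drops the surplus fraction bits
    return (a * b) >> FRAC_BITS


def escape_time_fixed(cx: float, cy: float, max_iter: int = 256) -> int:
    fcx, fcy = to_fixed(cx), to_fixed(cy)
    zx = zy = 0
    four = 4 * ONE
    for i in range(max_iter):
        zx2 = fmul(zx, zx)
        zy2 = fmul(zy, zy)
        if zx2 + zy2 > four:      # same bailout test as the float version
            return i
        zy = 2 * fmul(zx, zy) + fcy
        zx = zx2 - zy2 + fcx
    return max_iter
```

Only integer multiplies, shifts and adds are needed, which is why this path runs on any x86 CPU.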
Notice that since the SSE routine calculates 4 pixels per round and SSE2 only 2, the SSE routine is faster than SSE2, but SSE2 is more precise (64-bit double vs. 32-bit float).
The main goal of the benchmark is to compare different microarchitectures and their efficiency with the different instruction sets.
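The pixels-per-round figures of the SIMD routines follow from the register width divided by the element width, since each lane holds one pixel's value. The short Python sketch below just reproduces that arithmetic from the list of routines. The dictionary and function names are illustrative only.

```python
# (register width in bits, element width in bits) for each SIMD routine listed above
ROUTINES = {
    "3DNow!": (64, 32),   # MMX registers, 32-bit floats
    "SSE": (128, 32),     # XMM registers, 32-bit floats
    "SSE2": (128, 64),    # XMM registers, 64-bit doubles
    "AVX": (256, 64),     # YMM registers, 64-bit doubles
}


def pixels_per_round(register_bits: int, element_bits: int) -> int:
    # One lane per pixel, so lanes = register width / element width
    return register_bits // element_bits


for name, (reg, elem) in ROUTINES.items():
    print(f"{name}: {pixels_per_round(reg, elem)} pixels per round")
```

SSE fits twice as many 32-bit floats as SSE2 fits 64-bit doubles into the same XMM register. This is the speed-versus-precision trade-off described above.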
It's important to note that I could not find a generic and reliable way to determine the actual 1-core turbo frequency of modern processors. I could only detect the CPU speed using the TSC, which usually reports only the base speed. So you have to manually type in the correct 1-core turbo frequency of your CPU before running the benchmark to get one of the most important benchmark results:
the 1 GHz normalized pixels/millisecond value.
With this value you can easily compare the execution efficiency of different CPU generations running at different clock speeds.
So please use CPU-Z (https://www.cpuid.com/softwares/cpu-z.html) or read your CPU's datasheet to determine the correct 1-core turbo frequency.
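The Python sketch below shows why the entered frequency matters. It assumes the normalization is plain linear scaling, with pixels/ms divided by the clock in GHz. That is an assumption based on the name "1 GHz normalized", not MandelX's confirmed formula, and the sample numbers are made up for illustration.

```python
def normalized_pixels_per_ms(pixels: int, elapsed_ms: float, clock_mhz: float) -> float:
    """Pixels per millisecond scaled to a 1 GHz clock (assumed linear scaling)."""
    pixels_per_ms = pixels / elapsed_ms
    return pixels_per_ms / (clock_mhz / 1000.0)


# Hypothetical run: 1,000,000 pixels in 500 ms on a CPU with a 4000 MHz 1-core turbo
print(normalized_pixels_per_ms(1_000_000, 500.0, 4000.0))  # 500.0
# Entering a lower 3000 MHz base clock instead overstates efficiency (about 666.67)
print(normalized_pixels_per_ms(1_000_000, 500.0, 3000.0))
```

If the TSC base speed were used for a CPU that actually runs at a higher turbo clock, the result would look better than the core really is. That is why the real 1-core turbo value is needed.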
I have already uploaded the results of my home PCs. You can find them here:
https://falcosoft.hu/mandelx_benchmark_results.php
It's not a very big collection yet, so I kindly ask you to download MandelX, run the benchmarks and upload the results. No information is collected other than what you can see in the benchmark dialog of MandelX.
I'm particularly interested in K6-2/3, Pentium 3/4, Bulldozer and Ryzen results, but of course any other results are welcome.
Thanks in advance!