VOGONS


First post, by Falcosoft

Rank l33t

Hi,
I have released version 1.7 of MandelX, which now includes benchmarks:
https://falcosoft.hu/otesz/mandelx.zip

MandelX is portable, so it does not require installation. Just copy the MandelX folder from the zip file to your desktop or any other folder.
It stores its settings in an ini file, so a folder where normal users have write permission is preferred.

The attachment MandelX _Benchmark.png is no longer available

MandelX runs on any Windows version from Windows 98 to Windows 11.
It uses hand-written, optimized assembly routines to generate Mandelbrot/Julia fractals. The code paths do not deliberately favor any CPU vendor or microarchitecture, and in practice they work well on both older and newer CPU generations.
The benchmark runs 6 code paths that implement the same algorithm with different instruction sets.
Only additions, subtractions and multiplications, which are usually fast, are used. There are no costly divisions, square roots, trigonometric functions or other rarely optimized instructions.
So the benchmarks should faithfully represent the pure number-crunching performance of different x86 microarchitectures.
Both the data and the code easily fit into the L1 cache, so the multi-level cache hierarchy and memory subsystem performance should not play a big role.

More info about the benchmark routines:

1. x86 ALU is a fixed-point integer routine that uses 32-bit integers and the 32-bit x86 integer registers. It calculates 1 pixel per round.
2. x87 FPU is a floating-point routine that uses 80-bit extended-precision floats and the 80-bit FPU registers. It calculates 1 pixel per round.
3. 3DNow! is a SIMD floating-point routine that uses AMD's 3DNow! instruction set (introduced with the K6-2) with 32-bit floats and 64-bit MMX registers. It calculates 2 pixels per round.
4. SSE is a SIMD floating-point routine that uses Intel's SSE instruction set (introduced with the Pentium 3) with 32-bit floats and 128-bit XMM registers. It calculates 4 pixels per round.
5. SSE2 is a SIMD floating-point routine that uses Intel's SSE2 instruction set (introduced with the Pentium 4) with 64-bit doubles and 128-bit XMM registers. It calculates 2 pixels per round.
6. AVX is a SIMD floating-point routine that uses Intel's AVX instruction set (introduced with Sandy Bridge CPUs) with 64-bit doubles and 256-bit YMM registers. It calculates 4 pixels per round.
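
Routine 1 performs the same iteration in fixed point. The post does not say which fixed-point format MandelX uses, so the C sketch below assumes a hypothetical Q4.28 format: 32-bit values with 28 fraction bits, with products formed in 64 bits (comparable to the EDX:EAX result of a one-operand 32-bit IMUL) and then shifted back. The names TO_FIX, fix_mul and iterate_fixed are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q4.28 format: 4 integer bits (including sign), 28 fraction bits,
   so values lie in [-8, 8). MandelX's real format is not stated in the post. */
#define FRAC_BITS 28
#define TO_FIX(x) ((int32_t)((x) * (double)(1 << FRAC_BITS)))

/* 32x32 -> 64-bit product (56 fraction bits), shifted back to 28 fraction bits.
   Assumes >> on a negative int64_t is an arithmetic shift, as on GCC/Clang/MSVC. */
static int64_t fix_mul(int32_t x, int32_t y)
{
    return ((int64_t)x * (int64_t)y) >> FRAC_BITS;
}

/* One pixel per call: iterate z = z^2 + c from z = 0 and return the number of
   iterations completed before |z|^2 > 4, capped at max_iter. */
static int iterate_fixed(int32_t ca, int32_t cb, int max_iter)
{
    const int32_t two = TO_FIX(2.0);
    const int64_t bailout = (int64_t)4 << FRAC_BITS;   /* 4.0 in the same scale */
    assert(ca >= -two && ca <= two && cb >= -two && cb <= two);

    int32_t re = 0, im = 0;
    int n;
    for (n = 0; n < max_iter; n++) {
        int64_t re2 = fix_mul(re, re);
        int64_t im2 = fix_mul(im, im);
        if (re2 + im2 > bailout)
            break;                                      /* escaped */
        /* Here |re|, |im| <= 2 and |c| <= 2, so both new parts stay within [-6, 6]. */
        int32_t new_im = (int32_t)(2 * fix_mul(re, im) + cb);
        re = (int32_t)(re2 - im2 + ca);
        im = new_im;
    }
    return n;
}
```

Testing |z|^2 before forming the next z keeps |re| and |im| at or below 2 whenever new values are computed, which is why such a narrow integer range can be enough for this algorithm.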

Note that since the SSE routine calculates 4 pixels per round and SSE2 only 2, the SSE routine is faster than SSE2, but SSE2 is more precise (64-bit double vs. 32-bit float).
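
To illustrate what "N pixels per round" means, here is a portable C emulation of the lockstep idea, not real SSE code: LANES pixels are iterated together, an escaped lane simply stops counting (the job a compare mask does in a SIMD routine), and the loop ends only when every lane has escaped. The function name iterate4 is hypothetical. With 128-bit XMM registers, 32-bit floats give 128/32 = 4 lanes (SSE), while 64-bit doubles give 128/64 = 2 lanes (SSE2), which is where the speed vs. precision trade-off comes from.

```c
#include <assert.h>

/* SSE: 128-bit XMM register / 32-bit float = 4 lanes.
   An SSE2 version would use double and 128 / 64 = 2 lanes. */
#define LANES 4

/* Portable emulation of one SIMD routine call: LANES pixels iterated in lockstep.
   counts[k] receives the iterations lane k completed before |z|^2 > 4 (capped at max_iter). */
static void iterate4(const float ca[LANES], const float cb[LANES],
                     int max_iter, int counts[LANES])
{
    assert(max_iter >= 0);
    float re[LANES] = {0}, im[LANES] = {0};
    int active[LANES];
    for (int k = 0; k < LANES; k++) {
        counts[k] = 0;
        active[k] = 1;
    }

    for (int n = 0; n < max_iter; n++) {
        int any_active = 0;
        /* A real SIMD routine would do each arithmetic step here once for all lanes. */
        for (int k = 0; k < LANES; k++) {
            if (!active[k])
                continue;
            float re2 = re[k] * re[k];
            float im2 = im[k] * im[k];
            if (re2 + im2 > 4.0f) {      /* this lane escaped: mask it out */
                active[k] = 0;
                continue;
            }
            im[k] = 2.0f * re[k] * im[k] + cb[k];
            re[k] = re2 - im2 + ca[k];
            counts[k]++;
            any_active = 1;
        }
        if (!any_active)                 /* every lane has escaped */
            break;
    }
}
```

In a lockstep scheme like this one, a group of pixels costs as much as its slowest lane, so the real speed-up can be somewhat below the lane count where neighboring pixels escape at very different iteration counts.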

The main goal of the benchmark is to compare different microarchitectures and how efficiently they execute the different instruction sets.

It's important to note that I could not find a generic and reliable way to determine the actual single-core turbo frequency of modern processors. I can only detect the CPU speed using the TSC, which usually reports only the base clock. So you have to manually enter the correct single-core turbo frequency of your CPU before running the benchmark to get one of the most important benchmark results:
the pixels/millisecond value normalized to 1 GHz.
With this value you can easily compare the execution efficiency of different CPU generations running at different clock speeds.
So please use CPU-Z (https://www.cpuid.com/softwares/cpu-z.html) or your CPU's datasheet to determine the correct single-core turbo frequency.
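
The post does not spell out how the normalized value is computed. A natural reading, shown here as a C sketch under that assumption, is the raw throughput in pixels per millisecond divided by the entered single-core turbo clock in GHz. The function name normalized_px_per_ms is hypothetical.

```c
#include <assert.h>

/* Assumed formula (not confirmed by the post):
   normalized = (pixels / elapsed_ms) / (clock_mhz / 1000). */
static double normalized_px_per_ms(double pixels, double elapsed_ms, double clock_mhz)
{
    assert(pixels >= 0.0 && elapsed_ms > 0.0 && clock_mhz > 0.0);
    double px_per_ms = pixels / elapsed_ms;   /* raw throughput */
    double clock_ghz = clock_mhz / 1000.0;
    return px_per_ms / clock_ghz;             /* pixels/ms per 1 GHz */
}
```

For example, 1,000,000 pixels in 250 ms on a CPU entered as 4000 MHz gives 4000 px/ms / 4 GHz = 1000. If a 3000 MHz base clock reported via the TSC were used instead of the real 4000 MHz turbo clock, the same run would score about 1333, overstating efficiency by the turbo/base ratio.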

I have already uploaded the results from my home PCs. You can find them here:

https://falcosoft.hu/mandelx_benchmark_results.php

It's not a very big collection yet, so I kindly ask you to download MandelX, run the benchmarks and upload the results. No other information is collected, only what you can see in the MandelX benchmark dialog.
I'm particularly interested in K6-2/3, Pentium 3/4, Bulldozer and Ryzen results, but of course any other results are welcome.

Thanks in advance!

Last edited by Falcosoft on 2025-06-13, 08:12. Edited 3 times in total.

Website, Facebook, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper

Reply 1 of 5, by Falcosoft

Thanks @argh for your results!

So far it seems the Ryzen 5 3500X dominates the microarchitecture race, except in x87 FPU performance.
On Ryzen the x87 FPU seems to be not just 'not faster' but definitely slower than on previous AMD/Intel CPU generations.
This is somewhat understandable, since modern x64 code normally does not use the x87 FPU at all (scalar floating-point math is done with SSE2). Still, it is an interesting result for legacy FPU-intensive tasks...

Reply 2 of 5, by bakemono

Rank Oldbie

In Geekbench 2 there are also some FPU tasks which run faster on K8/K10 than they do on Steamroller or Zen.

GBAJAM 2024 submission on itch: https://90soft90.itch.io/wreckage

Reply 3 of 5, by Falcosoft

bakemono wrote on Yesterday, 09:05:

In Geekbench 2 there are also some FPU tasks which run faster on K8/K10 than they do on Steamroller or Zen.

I suspect this is true for all common FPU-intensive tasks, since the x87 FPU function only uses additions, subtractions, multiplications and FPU stack manipulation instructions.
Nothing fancy like costly trigonometric functions, FDIV, FSQRT, etc.

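function iterate_FPU_x87(a, b, re, im, cnst: extended; max2: integer): integer; assembler; stdcall;
asm

xor edx,edx            // iteration counter = 0
fld a                  // a = real part of c
fld b                  // b = imaginary part of c
fld cnst               // cnst = value |z|^2 is compared with
fld re
fld st(0)
fmul st(0),st(1)       // ST0 = re*re
fld im
fld st(0)
fmul st(1),st(0)       // ST1 = im*im
fincstp                // rotate the stack so that
fxch                   // re*re and im*im swap places,
fdecstp                // giving ST0..ST6 = im, re^2, im^2, re, cnst, b, a
mov ecx,[max2]

@iter: fmul st,st(3)   // ST0 = im*re
fadd st,st(0)          // ST0 = 2*re*im
fadd st,st(5)          // ST0 = new im = 2*re*im + b
fincstp                // park new im in ST7
fsub st,st(1)          // ST0 = re^2 - im^2
fadd st,st(5)          // ST0 = new re = re^2 - im^2 + a
fst st(2)              // ST2 = new re (replaces old re)
fmul st,st(0)          // ST0 = new re^2
inc edx                // count this iteration
fst st(6)              // copy new re^2 into the free register ST6
fdecstp                // ST0 = new im again
fst st(2)              // ST2 = new im (replaces old im^2)
fmul st(2),st          // ST2 = new im^2
fdecstp                // ST0 = copy of new re^2
fadd st,st(3)          // ST0 = new re^2 + new im^2 = |z|^2
fcomp st(5)            // compare |z|^2 with cnst and pop
fnstsw ax
and ah,41H             // C3 and C0 are both clear only if |z|^2 > cnst
JZ @ok                 // escaped
cmp edx,ecx
jb @iter               // continue while counter < max2
@ok:
fcompp                 // pop the 7 values left on the FPU stack
fcompp
fcompp
fstp st(0)
mov eax,edx            // return the iteration count
end;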
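
For readers who would rather not follow the FPU stack juggling, here is one reading of what the routine computes, written in C. It is a sketch derived from the asm, not MandelX source: (a, b) is the constant c added in each round, (re, im) is the starting z, and cnst is the value |z|^2 is compared with (presumably 4, the usual squared bailout radius). The name iterate_fpu_ref is hypothetical. With GCC or Clang targeting x86 Linux, long double is typically the same 80-bit extended format as Delphi's extended; with MSVC it is an ordinary 64-bit double.

```c
#include <assert.h>

/* C reading of iterate_FPU_x87 (a sketch, not MandelX source). */
static int iterate_fpu_ref(long double a, long double b,
                           long double re, long double im,
                           long double cnst, int max2)
{
    assert(max2 >= 1);
    long double re2 = re * re;          /* kept on the FPU stack in the asm */
    long double im2 = im * im;
    int n = 0;                          /* edx */
    do {
        im  = 2.0L * re * im + b;       /* uses the old re and im */
        re  = re2 - im2 + a;            /* uses the old squares */
        re2 = re * re;
        im2 = im * im;
        n++;                            /* inc edx */
        if (re2 + im2 > cnst)           /* fcomp / "and ah,41H" / jz: escape when |z|^2 > cnst */
            break;
    } while (n < max2);                 /* cmp edx,ecx / jb @iter */
    return n;
}
```

The counter is incremented before the bailout test, so a point that escapes in the first round returns 1, and a point that never escapes returns max2.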

Reply 4 of 5, by Falcosoft

Hmm. According to the new results, Zen 4 Ryzen seems to be even slower at x87 FPU calculations than Zen 2...

The attachment Ryzens_FPU.png is no longer available

@Edit:

In the case of Zen 4 the FPU result is less than 1/2 of the ALU result, less than 1/4 of the SSE2 result, and less than 1/8 of the SSE/AVX result.
On other, non-Zen CPUs the FPU/ALU ratio is close to 1, and the FPU result is about 1/2 of the SSE2 result and about 1/4 of the SSE/AVX result.
The results of the 2nd group are more logical, considering that ALU/FPU calculate 1 pixel, SSE2 2 pixels and SSE/AVX 4 pixels per round.
It seems that Zen 4 has a particularly slow x87 FPU (and of course particularly fast ALU/SSE/AVX units).
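
The reasoning above compares each routine against a simple model: if every routine needed the same time per round, scores would scale with pixels per round (ALU/FPU 1, SSE2 2, SSE/AVX 4, as listed in the first post). The C sketch below, with hypothetical helper names, measures how far an FPU result falls short of that model.

```c
#include <assert.h>

/* Pixels per round for each routine, as listed in the first post. */
enum { PX_ALU = 1, PX_FPU = 1, PX_3DNOW = 2, PX_SSE = 4, PX_SSE2 = 2, PX_AVX = 4 };

/* Simple model: same time per round for every routine, so the expected
   score ratio (other / FPU) equals the ratio of pixels per round. */
static double expected_ratio_to_fpu(int pixels_per_round)
{
    assert(pixels_per_round > 0);
    return (double)pixels_per_round / PX_FPU;
}

/* Measured ratio divided by expected ratio: about 1.0 when the model holds,
   above 1.0 when the x87 FPU is slower than the model predicts. */
static double fpu_shortfall(double fpu_score, double other_score, int other_pixels_per_round)
{
    assert(fpu_score > 0.0 && other_score > 0.0);
    return (other_score / fpu_score) / expected_ratio_to_fpu(other_pixels_per_round);
}
```

With this measure the non-Zen CPUs described above come out near 1.0, while all three Zen 4 comparisons point to the same conclusion: an FPU score below 1/2 of ALU, below 1/4 of SSE2 and below 1/8 of SSE/AVX each means a shortfall factor above 2.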

The attachment Zen4.png is no longer available

Reply 5 of 5, by Falcosoft

Could someone with a Pentium 4 Northwood/Prescott please upload some results? 😀
Real AMD K6-2/3 results are also still missing.
Thanks in advance!

And of course thanks to everyone who has run the benchmark and uploaded results!
