VOGONS


x86 microarchitecture benchmark (MandelX)

First post, by Falcosoft

Rank: l33t

Hi,
I have released version 1.7 of MandelX, which includes benchmarks:
https://falcosoft.hu/otesz/mandelx.zip

MandelX is portable, so it does not require installation. You only have to copy the MandelX folder from the zip file to your desktop or any other folder.
It uses an ini file to store settings, so a folder where normal users have write permission is preferred.

The attachment MandelX _Benchmark.png is no longer available

MandelX works on any Windows version from Win98 to Win 11.
It uses hand-written, optimized assembly routines to generate Mandelbrot/Julia fractals. The code paths do not deliberately favor any CPU vendor or microarchitecture, and in practice they work well with both older and newer CPU generations.
The benchmark uses 6 code paths of the same algorithm, each using a different instruction set.
Only additions, subtractions and multiplications are used, which are usually fast. There are no costly divisions, square roots, trigonometric or other rarely optimized instructions.
So the benchmarks can faithfully represent the pure number-crunching performance of different x86 microarchitectures.
Both the data and the code fit easily into L1 cache, so multi-level cache hierarchies and memory subsystem performance should not play a big role.

More info about the used benchmark routines:

1. x86 ALU is a fixed point integer routine that uses 32-bit integers and 32-bit x86 integer registers and calculates 1 pixel per round.
2. x87 FPU is a floating point routine that uses 80-bit extended precision floats and the 80-bit registers of the FPU and calculates 1 pixel per round.
3. 3DNow! is a SIMD floating point routine that uses AMD's 3DNow! instruction set introduced with the K6-2 using 32-bit floats and 64-bit MMX registers and it calculates 2 pixels per round.
4. SSE is a SIMD floating point routine that uses Intel's SSE instruction set introduced with the Pentium 3 using 32-bit floats and 128-bit XMM registers and it calculates 4 pixels per round.
5. SSE2 is a SIMD floating point routine that uses Intel's SSE2 instruction set introduced with the Pentium 4 using 64-bit doubles and 128-bit XMM registers and it calculates 2 pixels per round.
6. AVX is a SIMD floating point routine that uses Intel's AVX instruction set introduced with Sandy Bridge CPUs using 64-bit doubles and 256-bit YMM registers and it calculates 4 pixels per round.
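
To illustrate what a fixed point integer routine like the first one does, here is a minimal Python sketch of the escape-time iteration using integers only. The Q8.24 layout, the helper names and the bail-out value are assumptions made for illustration; the post does not describe the format MandelX actually uses.

```python
FRAC_BITS = 24            # assumed Q8.24 layout; the post does not give the real one
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to the assumed fixed point representation."""
    return int(round(x * ONE))


def fixed_mul(x, y):
    """Multiply two fixed point values.

    On x86, one-operand IMUL leaves the 64-bit product in EDX:EAX; shifting
    it right by FRAC_BITS returns it to fixed point. Python integers do not
    overflow, so 32-bit wraparound is not modelled here.
    """
    return (x * y) >> FRAC_BITS


def iterate_fixed(c_re, c_im, max_iter, bailout=4.0):
    """Escape-time iteration z = z*z + c using integers only."""
    a, b, limit = to_fixed(c_re), to_fixed(c_im), to_fixed(bailout)
    re = im = 0
    count = 0
    while count < max_iter:
        re2, im2 = fixed_mul(re, re), fixed_mul(im, im)
        if re2 + im2 > limit:
            break
        im = 2 * fixed_mul(re, im) + b
        re = re2 - im2 + a
        count += 1
    return count
```

Only additions, subtractions, multiplications and shifts appear in the loop, which matches the description of the benchmark routines.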

Note that since the SSE routine calculates 4 pixels per round and the SSE2 routine only 2, the SSE routine is faster, but SSE2 is more precise (64-bit double vs. 32-bit float).
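
The pixels-per-round figures follow from dividing the register width by the element width. A small Python sketch of that arithmetic, using the widths quoted in the list of routines (the dictionary layout and function name are just for illustration):

```python
# (register width in bits, element width in bits), as listed for each routine
ROUTINES = {
    "x86 ALU": (32, 32),
    "x87 FPU": (80, 80),
    "3DNow!": (64, 32),
    "SSE": (128, 32),
    "SSE2": (128, 64),
    "AVX": (256, 64),
}


def pixels_per_round(register_bits, element_bits):
    """Number of values that fit side by side in one register."""
    return register_bits // element_bits


lanes = {name: pixels_per_round(r, e) for name, (r, e) in ROUTINES.items()}

for name, n in lanes.items():
    print(f"{name}: {n} pixel(s) per round")
```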

The main goal of the benchmark is to compare different micro-architectures and their efficiencies using the different instruction sets.

It's important to note that I could not find a generic and reliable way to determine the actual 1 core turbo frequency of different modern processors. I can only detect the CPU speed using the TSC, which usually gives back only the base speed. So you have to type in the proper 1 core turbo frequency of your CPU manually before running the benchmark to get one of the most important benchmark results:
the 1 GHz normalized pixels/millisecond value.
With this value you can easily compare the execution efficiency of different generations of CPUs running at different clock speeds.
So please use CPU-Z (https://www.cpuid.com/softwares/cpu-z.html) or read your CPU's datasheet to determine the correct 1 core turbo frequency value.
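
As a rough sketch of the normalization, assuming it is a simple linear scaling of the raw score to 1000 MHz (the exact formula MandelX uses is not given in the post, and the function name and numbers below are made up):

```python
def normalized_per_ghz(pixels_per_ms, turbo_mhz):
    """Scale a raw pixels/ms score to what it would be at 1 GHz (1000 MHz)."""
    if turbo_mhz <= 0:
        raise ValueError("turbo_mhz must be positive")
    return pixels_per_ms * 1000.0 / turbo_mhz


# Hypothetical example: the same raw score measured at two different clocks.
at_2000_mhz = normalized_per_ghz(100.0, 2000)
at_4000_mhz = normalized_per_ghz(100.0, 4000)
print(at_2000_mhz, at_4000_mhz)
```

This also shows why the turbo frequency matters: entering the base clock instead of the real 1 core turbo clock would overstate the per-GHz efficiency.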

I have already uploaded the results of my home PCs. You can find them here:

https://falcosoft.hu/mandelx_benchmark_results.php

The collection is not very big yet, so I kindly ask you to download MandelX, run the benchmarks and upload the results. No information is collected other than what you can see on the benchmark dialog of MandelX.
I'm particularly interested in K6-2/3, Pentium 3/4 as well as Bulldozer and Ryzen results, but of course any other results are welcome.

Thanks in advance!

Last edited by Falcosoft on 2025-06-13, 08:12. Edited 3 times in total.

Website, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper
x86 microarchitecture benchmark (MandelX)

Reply 1 of 39, by Falcosoft


Thanks @argh for your results!

So far it seems Ryzen 5 3500X dominates the micro-architecture race except for x87 FPU performance.
It seems on Ryzen the x87 FPU is not only 'not faster' but definitely slower compared to previous AMD/Intel CPU generations.
This is somewhat understandable since in case of modern x64 code the FPU is not used at all. Yet it is still an interesting result for legacy FPU intensive tasks...


Reply 2 of 39, by bakemono

Rank: Oldbie

In Geekbench 2 there are also some FPU tasks which run faster on K8/K10 than they do on Steamroller or Zen.

GBAJAM 2024 submission on itch: https://90soft90.itch.io/wreckage

Reply 3 of 39, by Falcosoft

bakemono wrote on 2025-06-12, 09:05:

In Geekbench 2 there are also some FPU tasks which run faster on K8/K10 than they do on Steamroller or Zen.

I suspect that this is true for all common FPU intensive tasks, since the x87 FPU function uses only additions, subtractions, multiplications and FPU stack manipulation instructions.
Nothing fancy like costly trigonometric functions or FDIV, FSQRT etc.

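function iterate_FPU_x87(a, b, re, im, cnst: extended; max2: integer): integer; assembler; stdcall;
asm
    xor edx,edx          // edx = iteration counter
    fld a                // a and b are the constants added to the real and imaginary parts
    fld b
    fld cnst             // cnst = limit that re^2 + im^2 is compared against
    fld re
    fld st(0)
    fmul st(0),st(1)     // st(0) = re^2
    fld im
    fld st(0)
    fmul st(1),st(0)     // st(1) = im^2
    fincstp              // rotate the stack so re^2 and im^2 can be swapped
    fxch
    fdecstp              // stack is now: im, re^2, im^2, re, cnst, b, a
    mov ecx,[max2]       // ecx = iteration limit

@iter:
    fmul st,st(3)        // im * re
    fadd st,st(0)        // 2 * im * re
    fadd st,st(5)        // new im = 2 * im * re + b
    fincstp
    fsub st,st(1)        // re^2 - im^2
    fadd st,st(5)        // new re = re^2 - im^2 + a
    fst st(2)            // store new re
    fmul st,st(0)        // new re^2
    inc edx
    fst st(6)            // keep a copy of new re^2 for the escape test
    fdecstp
    fst st(2)
    fmul st(2),st        // new im^2
    fdecstp
    fadd st,st(3)        // new re^2 + new im^2
    fcomp st(5)          // compare with cnst and pop
    fnstsw ax
    and ah,41h           // isolate C3 and C0
    jz @ok               // both clear: the sum is greater than cnst, stop
    cmp edx,ecx
    jb @iter
@ok:
    fcompp               // pop the seven remaining stack entries
    fcompp
    fcompp
    fstp st(0)
    mov eax,edx          // return the iteration count
end;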
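
For readers who do not want to trace the FPU stack by hand, here is a plain Python sketch of the same loop as I read it from the assembly above. It is only a reading aid: Python floats are 64-bit doubles rather than 80-bit extended values, and the function name is made up.

```python
def iterate_reference(a, b, re, im, cnst, max2):
    """Scalar equivalent of the x87 loop: returns the iteration count."""
    count = 0
    re2, im2 = re * re, im * im
    while True:
        im = 2.0 * im * re + b      # uses the old re
        re = re2 - im2 + a          # uses the old squares
        re2, im2 = re * re, im * im
        count += 1
        if re2 + im2 > cnst:        # the "jz @ok" exit
            break
        if count >= max2:           # the "jb @iter" loop condition fails
            break
    return count
```

Like the assembly, the loop body always runs at least once, and the only arithmetic in it is addition, subtraction and multiplication.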


Reply 4 of 39, by Falcosoft


Hmm. According to new results it seems Zen 4 Ryzen is even slower on x87 FPU calculations than Zen 2...

The attachment Ryzens_FPU.png is no longer available

@Edit:

In the case of Zen 4 the FPU result is less than 1/2 of the ALU result, less than 1/4 of the SSE2 result, and less than 1/8 of the SSE/AVX result.
In the case of other, non-Zen CPUs the FPU/ALU ratio is close to 1 and the FPU result is about 1/2 of the SSE2 result and about 1/4 of the SSE/AVX result.
The results of the 2nd group are more logical considering that ALU/FPU calculates 1 pixel per round, SSE2 2 pixels, and SSE/AVX 4 pixels.
It seems that Zen 4 has a particularly slow x87 FPU (and of course particularly fast ALU/SSE/AVX units).
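
The comparison can be written down as a small Python sketch: divide each routine's speedup over the FPU by the speedup expected from its pixels-per-round count. The scores below are hypothetical placeholders, not values from the results page, and the function name is made up.

```python
PIXELS_PER_ROUND = {"ALU": 1, "FPU": 1, "SSE2": 2, "SSE": 4, "AVX": 4}


def ratio_vs_fpu(scores):
    """Measured speedup over FPU divided by the expected speedup."""
    out = {}
    for name, score in scores.items():
        measured = score / scores["FPU"]
        expected = PIXELS_PER_ROUND[name] / PIXELS_PER_ROUND["FPU"]
        out[name] = measured / expected
    return out


# Hypothetical scores in pixels/ms
balanced = {"ALU": 100.0, "FPU": 100.0, "SSE2": 200.0, "SSE": 400.0, "AVX": 400.0}
weak_fpu = {"ALU": 250.0, "FPU": 100.0, "SSE2": 450.0, "SSE": 900.0, "AVX": 900.0}

print(ratio_vs_fpu(balanced))
print(ratio_vs_fpu(weak_fpu))
```

A value near 1.0 for every routine corresponds to the "logical" second group; values above 2 across the board correspond to the Zen 4 pattern described above.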

The attachment Zen4.png is no longer available


Reply 5 of 39, by Falcosoft


If someone has a Pentium 4 Northwood/Prescott, please upload some results 😀
Real AMD K6-2/3 results are also still missing.
Thanks in advance!

And of course thanks to everyone who has run the benchmark and uploaded results!


Reply 6 of 39, by Falcosoft


Thanks to @Dothan Burger from the 'Netburst: Aiming for the Stars' topic, the 1st Pentium 4 result has arrived:
Re: Netburst: Aiming for the Stars

It's interesting that the SSE result is virtually the same as the SSE2 result.
The SSE2 time result is normal in the sense that it is about half of the FPU result (2 vs. 1 pixel per round), but the SSE time result should be about half of the SSE2 result (4 vs. 2 pixels per round).
The problem seems to be the doubled number of 'CMOV' instructions used. CMOV was introduced with the P6 (Pentium Pro) and is generally faster than traditional branches, but apparently not on the Pentium 4.
No other microarchitecture has been affected by this so far.


Reply 7 of 39, by Falcosoft


It seems that with the newest Core Ultra, Intel has also left legacy x87 FPU performance behind.
The Intel Core Ultra 9 285K has x87 FPU performance similar to the Ryzens (a little better).
There are no results from 14th Gen Core 14xxx so far, but the result of the 12th Gen Core i5-12400F (the best x87 FPU performance so far) suggests that 14th Gen could be the last Intel generation with a strong x87 FPU.


Reply 8 of 39, by myne

Rank: Oldbie

Which core of the 12th gen?
I suspect that eventually, once the Windows scheduler has caught up to Linux and can handle truly heterogeneous core clusters, the P and E cores will diverge significantly in instruction capability and performance.
E.g. E-cores might retain 32-bit compatibility and all those instructions, while P-cores will pick a 64-bit era baseline like Skylake and only be compatible from that point on.

Which would theoretically allow both backward compatibility, and a streamlined x64.

It's just my theory.

I built:
Convert old ASUS ASC boardviews to KICAD PCB!
Re: A comprehensive guide to install and play MechWarrior 2 on new versions on Windows.
Dos+Windows 3.11+tcp+vbe_svga auto-install iso template
Script to backup Win9x\ME drivers from a working install
Re: The thing no one asked for: KICAD 440bx reference schematic

Reply 9 of 39, by Falcosoft

myne wrote on 2025-06-17, 08:08:
Which core of the 12th gen?
I suspect that eventually, once the Windows scheduler has caught up to Linux and can handle truly heterogeneous core clusters, the P and E cores will diverge significantly in instruction capability and performance.
E.g. E-cores might retain 32-bit compatibility and all those instructions, while P-cores will pick a 64-bit era baseline like Skylake and only be compatible from that point on.

Which would theoretically allow both backward compatibility, and a streamlined x64.

It's just my theory.

Hi,
Specifically, the Core i5-12400F (whose result I referred to as 12th Gen) has no E-cores, only 6 P-cores.
But I do not think this matters too much, since at the very beginning the benchmark sets the affinity to the 1st core only (AFAIK it's always a P-core) and sets the thread priority to high.
This way the OS gets the hint that this is a high priority task.
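
For illustration, pinning a thread to one logical processor and raising its priority could look roughly like this in Python via ctypes. This is a sketch and not MandelX's code: the helper names are made up, and the post does not say which priority level the benchmark actually uses, so THREAD_PRIORITY_HIGHEST is an assumption.

```python
import sys

THREAD_PRIORITY_HIGHEST = 2  # Win32 constant; which level MandelX uses is an assumption


def affinity_mask(core_index):
    """Bit mask selecting a single logical processor (0 = first core)."""
    if core_index < 0:
        raise ValueError("core_index must be non-negative")
    return 1 << core_index


def pin_current_thread(core_index=0):
    """Pin the calling thread to one logical processor and raise its priority.

    Returns True if both Win32 calls succeeded. On non-Windows systems it
    does nothing and returns False.
    """
    if sys.platform != "win32":
        return False
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.SetThreadAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
    kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t
    kernel32.SetThreadPriority.argtypes = [wintypes.HANDLE, ctypes.c_int]
    kernel32.SetThreadPriority.restype = wintypes.BOOL

    thread = kernel32.GetCurrentThread()
    old_mask = kernel32.SetThreadAffinityMask(thread, affinity_mask(core_index))
    ok = kernel32.SetThreadPriority(thread, THREAD_PRIORITY_HIGHEST)
    return old_mask != 0 and bool(ok)
```

Calling pin_current_thread(0) before a measurement keeps the work on the first logical processor for the duration of the run.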

Last edited by Falcosoft on 2025-06-17, 08:20. Edited 1 time in total.


Reply 10 of 39, by myne


Ah, right. Be interesting to see the contrast of p and E.


Reply 11 of 39, by Falcosoft

myne wrote on 2025-06-17, 08:19:

Ah, right. Be interesting to see the contrast of p and E.

Do you know any reliable method to determine which cores are the E-cores? More specifically how different cores are numbered by modern Windows?
Unfortunately I have no such system to experiment with.


Reply 12 of 39, by myne


The simplest, most reliable way I can think of without much extra code is to just run the test on every core sequentially and spot what should be a clear performance difference between them, since at a bare minimum the E-cores clock lower.
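
A minimal Python sketch of that idea, assuming one score per logical processor has already been collected. The 0.8 cut-off, the function name and the sample scores are arbitrary placeholders:

```python
def split_cores(scores, gap=0.8):
    """Split logical processors into 'fast' and 'slow' groups.

    scores: benchmark results, where the list index is the logical processor number.
    gap: a core counts as slow if it scores below gap * best score (assumed cut-off).
    """
    best = max(scores)
    fast = [i for i, s in enumerate(scores) if s >= gap * best]
    slow = [i for i, s in enumerate(scores) if s < gap * best]
    return fast, slow


# Hypothetical scores for a layout with 4 fast and 2 slow logical processors
scores = [100.0, 99.0, 101.0, 100.5, 62.0, 61.5]
fast, slow = split_cores(scores)
print("fast:", fast, "slow:", slow)
```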


Reply 13 of 39, by Falcosoft

myne wrote on 2025-06-17, 09:05:

The simplest, most reliable way I can think of without much extra code is to just run the test on every core sequentially and spot what should be a clear performance difference between them, since at a bare minimum the E-cores clock lower.

OK,
I have added manual processor affinity selection to the benchmark dialog.
This way P-cores and E-cores on modern CPUs can be tested separately.
According to Google the cores are numbered the following way:
"the numbering of cores in Task Manager typically prioritizes Performance cores (P-cores) followed by Efficient cores (E-cores). For example, on a system with 8 P-cores and 8 E-cores, the numbering might start with P-core 0, then its hyper-threaded logical processor, then P-core 1, its hyper-threaded logical processor, and so on. After all the P-cores and their logical processors are listed, the E-cores would be numbered next. "

So if you want to test P-core performance you should select the 1st core (default), while if you want to test E-core performance you should select the last one.
Version 1.7.1 of MandelX:
https://falcosoft.hu/otesz/mandelx.zip

Ps:
If anyone uploads E-core results, please note it explicitly in the user comment part. And of course do not forget to set the '1 Core turbo CPU speed' field to the max turbo clock of the E-core!
Thanks!


Reply 14 of 39, by nickles rust

Rank: Newbie
Falcosoft wrote on 2025-06-11, 10:11:

I'm particularly interested in K6-2/3 ...

I tried this on a 400MHz K6-2 and got ALU 7.82 P/S, FPU 4.39, 3DN 24.7

What are the different results telling you?

Reply 15 of 39, by Falcosoft

nickles rust wrote on 2025-06-17, 15:11:
Falcosoft wrote on 2025-06-11, 10:11:

I'm particularly interested in K6-2/3 ...

I tried this on a 400MHz K6-2 and got ALU 7.82 P/S, FPU 4.39, 3DN 24.7

What are the different results telling you?

Hi,
Thanks for the results, but would you be so kind as to take at least a screenshot of the result dialog (if you cannot upload the result)?
Back to your question: the results say that the K6-2/3 has a not too strong, non-pipelined FPU but a very fast 3DNow! execution unit. Clock for clock it is even faster than later generations of AMD CPUs (Athlons, Athlon 64s, Phenoms).
Actually it shows that with optimized 3DNow! code the K6-2 is more than 5x faster than with FPU code!
Starting with the Athlons, AMD CPUs also have a fast pipelined FPU and a slightly slower 3DNow! unit, so 3DNow! code can be only 2x faster at best compared to FPU code.
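
The "more than 5x" figure can be checked directly from the numbers reported for the real 400 MHz K6-2 earlier in the thread (ALU 7.82, FPU 4.39, 3DN 24.7). A short Python sketch of the arithmetic, with variable names chosen for illustration:

```python
# Values reported in the thread for a real 400 MHz K6-2
k6_2 = {"ALU": 7.82, "FPU": 4.39, "3DNow!": 24.7}

speedup_3dnow_vs_fpu = k6_2["3DNow!"] / k6_2["FPU"]
speedup_alu_vs_fpu = k6_2["ALU"] / k6_2["FPU"]

print(f"3DNow! vs FPU: {speedup_3dnow_vs_fpu:.2f}x")
print(f"ALU vs FPU:    {speedup_alu_vs_fpu:.2f}x")
```

Since 3DNow! processes 2 pixels per round, a ratio well above 2 points at the FPU being slow rather than at 3DNow! being unusually wide.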

PS:
The results also say that the 'K6-2 233 MHz' emulated by PCem 17 is substantially faster than a real one. But the ALU/FPU/3DNow! results proportionally are almost correct.


Reply 16 of 39, by nickles rust


I ran it again and got basically the same numbers: (I had to change the file name to .zip instead of .bmp for it to work with the forum software)

The attachment untitled.zip is no longer available

Reply 17 of 39, by Falcosoft

nickles rust wrote on 2025-06-17, 16:39:

I ran it again and got basically the same numbers: (I had to change the file name to .zip instead of .bmp for it to work with the forum software)

The attachment untitled.zip is no longer available

Thanks! I uploaded your results.

The attachment untitled.png is no longer available
The attachment k62_400.png is no longer available


Reply 18 of 39, by Start me up

Rank: Newbie

Here are my results of an Intel Atom and an Intel Core running at slightly different clock rates. The screenshots include the measurement readings for the power consumption.

Reply 19 of 39, by Falcosoft

Start me up wrote on 2025-06-17, 16:54:

Here are my results of an Intel Atom and an Intel Core running at slightly different clock rates. The screenshots include the measurement readings for the power consumption.

Hi,
I'm sorry to ask this, but would you repeat the Core i5 tests with the correct 1 core turbo speed entered?
According to this Intel page the correct 1 core max turbo is 2700 MHz.
https://www.intel.com/content/www/us/en/produ … ifications.html

Edit:
Hmm. But according to these results the 1 core speed was definitely not 2700 MHz. I would say it was not even 1800 MHz. Your CPU must have run at a lower P-state...
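
The post does not say how this estimate was made. One plausible way, sketched here in Python with made-up numbers, is to divide the raw score by the per-GHz score of another CPU with the same microarchitecture, assuming the per-GHz efficiency is the same for both:

```python
def estimated_mhz(raw_pixels_per_ms, reference_per_ghz):
    """Estimate the effective clock from a raw score and a known per-GHz score."""
    if reference_per_ghz <= 0:
        raise ValueError("reference_per_ghz must be positive")
    return 1000.0 * raw_pixels_per_ms / reference_per_ghz


# Hypothetical: a CPU of the same microarchitecture is known to reach
# 50 pixels/ms per GHz, and the raw score in question is 85 pixels/ms.
effective = estimated_mhz(85.0, 50.0)
print(f"estimated effective clock: {effective:.0f} MHz")
```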

Last edited by Falcosoft on 2025-06-17, 17:17. Edited 2 times in total.
