VOGONS


x86 microarchitecture benchmark (MandelX)


Reply 40 of 67, by mihai


Uploaded some results for a 5700X3D, with the boost core clock at 4050 MHz (checked with HWiNFO).

The decrease in FPU performance across the generations is quite strange. Why is this happening? Is FPU performance no longer needed?

Reply 41 of 67, by Falcosoft

mihai wrote on 2025-06-21, 09:30:

Uploaded some results for a 5700X3D, with the boost core clock at 4050 MHz (checked with HWiNFO).

The decrease in FPU performance across the generations is quite strange. Why is this happening? Is FPU performance no longer needed?

Thanks!
Yep, the traditional x87 FPU is not the architectural floating point unit in x64 code anymore.
When the AMD64 specification was released more than 20 years ago, AMD defined SSE/SSE2 (128-bit XMM registers) to be the standard unit for floating point calculations.
For legacy x86/32-bit code, however, the FPU is still relevant. Of course you can use SSE/SSE2 or even AVX in new 32-bit code, but the standard floating point unit there is still the x87 FPU.

It may be worth mentioning that with SSE/AVX, 64-bit (double precision) is the most precise floating point format available, while the x87 FPU also offers 80-bit extended precision.
For MandelX this is relevant: with the x87 FPU you can zoom deeper at native hardware speed once the 64-bit precision offered by SSE2/AVX is no longer enough.
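
To put a rough number on "zoom deeper", here is a minimal Python sketch (not taken from MandelX). It estimates the magnification at which neighbouring pixel columns stop mapping to distinct coordinates, for a 53-bit significand (double) and a 64-bit one (x87 extended). The view centre of -0.75, the width of 3.0, the 1024-pixel row and the function names are all assumptions made for illustration. Rounding errors that build up during iteration will degrade the image earlier than this coordinate-grid limit suggests.

```python
import math

def ulp(x, significand_bits):
    """Gap between neighbouring representable values near x in a binary
    floating point format with the given significand width."""
    _, exponent = math.frexp(abs(x))   # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - significand_bits)

def max_magnification(center, view_width, pixels, significand_bits):
    """Magnification at which the per-pixel step shrinks to one ulp at `center`."""
    step_at_1x = view_width / pixels
    return step_at_1x / ulp(center, significand_bits)

DOUBLE_BITS = 53      # SSE2/AVX double precision
EXTENDED_BITS = 64    # x87 80-bit extended precision

# Hypothetical view: centre -0.75, width 3.0, 1024 pixels per row.
for name, bits in (("double", DOUBLE_BITS), ("x87 extended", EXTENDED_BITS)):
    zoom = max_magnification(-0.75, 3.0, 1024, bits)
    print(f"{name}: pixel grid collapses near {zoom:.3g}x")
```

The 11 extra significand bits mean the extended format reaches this limit at a magnification 2^11 = 2048 times higher than double precision.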

Website, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper
x86 microarchitecture benchmark (MandelX)

Reply 42 of 67, by Falcosoft


Hi,
v1.7.2 of MandelX has been released.
https://falcosoft.hu/otesz/mandelx.zip

The benchmark results of the new version are fully compatible with the previous ones.

The only addition is the option to compare offline against local DB results.
The results are in the 'results.csv' file within MandelX's folder. The csv format recognized by MandelX is fully compatible with the export format of the online database, more precisely with the comma-separated CSV (,) version.
So if you want to use the newest results for comparison you should select the 'All' view first and then press the 'CSV (,)' button here:
https://falcosoft.hu/mandelx_benchmark_results.php
Then you can use the exported csv file in MandelX by overwriting the 'results.csv' file in the folder of MandelX with the exported one.

The attachment mandelx_compare.png is no longer available


Reply 43 of 67, by jtchip


More results from a Pentium III 500 (Katmai) and a Celeron 900 (Coppermine), run a few days ago. One issue is that it misidentifies PIII CPUs; I guess the CPUID instruction didn't report a model name back then, so I've included CPU-Z in the screenshot.

I was moderately surprised that the normalised SSE results were the same since (according to Wikipedia anyway), the SSE implementation in Katmai double-cycles the existing 64-bit data paths. Is the SSE code scheduled for Katmai or a later CPU?

Falcosoft wrote on 2025-06-18, 07:17:

It would be interesting to see the results of older, rarer CPUs such as later Cyrix/VIA chips. Or even IDT WinChip and Transmeta Crusoe 😀

I have some of these CPUs but they don't have Windows (or even a fixed storage device) so a DOS or Linux version of the benchmark would be ideal.

Reply 44 of 67, by Falcosoft

jtchip wrote on 2025-06-22, 23:43:

More results from a Pentium III 500 (Katmai) and a Celeron 900 (Coppermine), run a few days ago. One issue is that it misidentifies PIII CPUs; I guess the CPUID instruction didn't report a model name back then, so I've included CPU-Z in the screenshot.

I was moderately surprised that the normalised SSE results were the same since (according to Wikipedia anyway), the SSE implementation in Katmai double-cycles the existing 64-bit data paths. Is the SSE code scheduled for Katmai or a later CPU?

Falcosoft wrote on 2025-06-18, 07:17:

It would be interesting to see the results of older, rarer CPUs such as later Cyrix/VIA chips. Or even IDT WinChip and Transmeta Crusoe 😀

I have some of these CPUs but they don't have Windows (or even a fixed storage device) so a DOS or Linux version of the benchmark would be ideal.

Hi,
Thanks for the results, I have uploaded them.
No code is optimized specifically for any microarchitecture, and that includes the SSE code path. It is also true for all code paths that the busiest inner loop has no memory access at all, only register-to-register operations.
So I think the narrower bandwidth does not play a big role, since 128-bit data is loaded only once at the very beginning.
According to Agner Fog's instruction tables, the latencies and throughput of SSE instructions do not differ between Katmai and Coppermine.
This can explain the similar results, which are not bad at all: 3x faster SSE performance compared to the x87 FPU is rather good considering the well-performing FPU in the PIII.
Compared to contemporary AMD K6 chips, the PIII's FPU performance is about 2x clock for clock.
It's also interesting that clock for clock the K6-2/3 has the same 3DNow! performance as the PIII's SSE performance, about 60 pixels/msec (1GHz normalized).
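
To illustrate why such an inner loop can stay entirely in registers, here is a generic escape-time iteration as a minimal Python sketch. This is not MandelX's actual code; the function name and the 256-iteration limit are assumptions. The point is only that the loop touches a handful of scalar values, and that the pixel's coordinates are read once before the loop starts.

```python
def escape_iterations(c_re, c_im, max_iter=256):
    """Count iterations of z = z*z + c until |z| exceeds 2 (or max_iter)."""
    z_re = z_im = 0.0               # the whole working set: a few scalars
    for n in range(max_iter):
        re2 = z_re * z_re
        im2 = z_im * z_im
        if re2 + im2 > 4.0:         # |z|^2 > 4 means the point has escaped
            return n
        z_im = 2.0 * z_re * z_im + c_im
        z_re = re2 - im2 + c_re
    return max_iter

print(escape_iterations(-0.75, 0.1))
```

In a SIMD version the same arithmetic would run on several pixels per register, which is where the width of the execution units matters rather than the width of the load path.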


Reply 45 of 67, by Falcosoft

jtchip wrote on 2025-06-22, 23:43:

...
I have some of these CPUs but they don't have Windows (or even a fixed storage device) so a DOS or Linux version of the benchmark would be ideal.

Unfortunately a native DOS or Linux version is not an option currently. But MandelX works perfectly under Wine. So if you could run the benchmark on Linux/Wine that would be perfect.


Reply 46 of 67, by Standard Def Steve

jtchip wrote on 2025-06-22, 23:43:

I was moderately surprised that the normalised SSE results were the same since (according to Wikipedia anyway), the SSE implementation in Katmai double-cycles the existing 64-bit data paths.

Even the P4 and Athlon 64 have 64-bit wide SIMD hardware & require two cycles for SSE instructions.
Core 2 and Phenom were the first CPUs with 128-bit FPUs that could pull off single-cycle SSE.

"A little sign-in here, a touch of WiFi there..."

Reply 47 of 67, by Falcosoft

Standard Def Steve wrote on 2025-06-23, 01:51:
jtchip wrote on 2025-06-22, 23:43:

I was moderately surprised that the normalised SSE results were the same since (according to Wikipedia anyway), the SSE implementation in Katmai double-cycles the existing 64-bit data paths.

Even the P4 and Athlon 64 have 64-bit wide SIMD hardware & require two cycles for SSE instructions.
Core 2 and Phenom were the first CPUs with 128-bit FPUs that could pull off single-cycle SSE.

Thanks for the info!
Yep, the Athlon 64 (Turion) vs. Phenom II results seem to confirm your claim: the ALU/FPU/3DNow! normalized results are almost exactly the same, while the SSE/SSE2 performance of the Phenom II is about 25% better.

The attachment A64_Phenom2.png is no longer available


Reply 48 of 67, by nickles rust


I ran this on an IBM 6x86M at 200MHz and got ALU 2.03, FPU 1.15. I think this is the oldest CPU I have.

Reply 49 of 67, by Falcosoft

nickles rust wrote on 2025-06-23, 13:23:

I ran this on an IBM 6x86M at 200MHz and got ALU 2.03, FPU 1.15. I think this is the oldest CPU I have.

Thanks,
I know this can be painful because of the speed of this CPU, but a screenshot would be very helpful: the 'msec' value is the real raw data, while the others are derived from it (with rounding).
If you do not want to run the benchmark again because of the long run time, at least take a screenshot of the benchmark dialog without pressing Start. The Vendor/CPU name info is still better than nothing.
BTW, your results confirm the common knowledge about Cyrix CPUs: better integer performance than the Pentium but worse FPU performance.
(PCem v17 Pentium 200 MHz - ALU: 1.75, FPU: 2.34)


Reply 50 of 67, by jtchip

Falcosoft wrote on 2025-06-23, 01:31:

Unfortunately a native DOS or Linux version is not an option currently. But MandelX works perfectly under Wine. So if you could run the benchmark on Linux/Wine that would be perfect.

To be clear, I was referring to a command-line benchmark-only port but I understand even that takes some work refactoring, porting, etc. I boot Linux (console-only Debian) via PXE and nfsroot on the vintage systems so they're not set up with an X server, or any GUI.

Standard Def Steve wrote on 2025-06-23, 01:51:

Even the P4 and Athlon 64 have 64-bit wide SIMD hardware & require two cycles for SSE instructions.
Core 2 and Phenom were the first CPUs with 128-bit FPUs that could pull off single-cycle SSE.

Thanks for the clarification. Reading that section again, it mentions the 64-bit data path then instruction scheduling on Katmai vs. later CPUs and I mistakenly conflated the two. Looking at the Intel results, the normalised SSE results go from about 60 for the PIII, 68 for a Pentium M (and Atom Silvermont), then to 108 for a Core 2 so that matches the expectation (ignoring the unusually low 33 for the P4 Prescott).

Reply 51 of 67, by Falcosoft

jtchip wrote on 2025-06-23, 23:05:

...
To be clear, I was referring to a command-line benchmark-only port but I understand even that takes some work refactoring, porting, etc. I boot Linux (console-only Debian) via PXE and nfsroot on the vintage systems so they're not set up with an X server, or any GUI.

Hi,
I have written a DOS version of the benchmark. The results are somewhat comparable to the Windows version's results but at the same CPU speed the DOS version tends to be faster (not so surprising).
If you have a CPU with SSE/AVX support you should run the included simd95.com utility before the benchmark (or run the included AVXSSE.BAT to start the benchmark).
It's important that no memory manager that puts the CPU into protected/V86 mode is loaded (HIMEM.SYS is OK, EMM386 is not).
You can choose right at the beginning to run the test in graphics or text mode. The text mode is marginally faster. The graphics mode requires VESA 1.2 1024x768 8-bit resolution.
At the end of the run you can choose to save a 'result.txt' file. Then you can upload the results here. The format is similar to the Windows version:

CPU Vendor: AuthenticAMD
CPU ID: 100FA0
CPU speed: 3449 MHz
System: DOS 7.10
Mode: Text

         Time(msec)  Pixels/msec  Pixels/msec(1GHz)
ALU:           5944       132.31              38.36
FPU:           6983       112.62              32.65
3DNow!:        3874       203.00              58.86
SSE:           2086       377.00             109.31
SSE2:          3529       222.85              64.61
AVX:            N/A          N/A                N/A
The attachment MANBENCH_DOS.ZIP is no longer available
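
For anyone who wants to re-derive the two computed columns from the raw 'msec' value, here is a minimal Python sketch. It assumes the benchmark renders exactly one 1024x768 frame (786432 pixels). That figure is an inference that reproduces the sample numbers above, not something stated by the program, and the helper names are hypothetical.

```python
PIXELS = 1024 * 768  # assumption: one full 1024x768 frame per test

def derived(msec, mhz):
    """Return (pixels/msec, pixels/msec normalized to 1 GHz), rounded to 2 places."""
    per_msec = PIXELS / msec
    normalized = per_msec * 1000 / mhz
    return round(per_msec, 2), round(normalized, 2)

def check(report):
    """Recompute the derived columns of a result.txt-style report.

    Returns {test name: True/False} for every row that has a numeric time.
    """
    mhz = None
    results = {}
    for line in report.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if name.strip() == "CPU speed":
            mhz = int(fields[0])
        elif mhz is not None and len(fields) == 3 and fields[0].isdigit():
            expected = derived(int(fields[0]), mhz)
            results[name.strip()] = expected == (float(fields[1]), float(fields[2]))
    return results

print(derived(5944, 3449))  # ALU row of the sample: (132.31, 38.36)
```

Rows marked N/A are skipped, and the CPU speed line has to appear before the result rows, as it does in the sample.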


Reply 52 of 67, by jtchip

Falcosoft wrote on 2025-06-25, 00:32:

I have written a DOS version of the benchmark. The results are somewhat comparable to the Windows version's results but at the same CPU speed the DOS version tends to be faster (not so surprising).

Thanks, that was quick! I've attached results for a Vortex86DX2 933MHz and a VIA C7-D 1500MHz. These appear to be the slowest normalised FPU (and SIMD in the case of the C7) results so far, the latter being about 41.9% of the Cyrix 6x86MX result and the former another 12.4% slower. The early VIA C3 with its half-speed FPU should be slower still but I don't have one. I do have a Geode GX1 (based on the Cyrix 5x86 core) which I'll post in a few days as that system needs some assembly.

Remember to also post Start me up's Atom D2550 result, that seems to be the only Bonnell result so far.

Reply 53 of 67, by Falcosoft

jtchip wrote on 2025-06-26, 00:48:

Thanks, that was quick! I've attached results for a Vortex86DX2 933MHz and a VIA C7-D 1500MHz. These appear to be the slowest normalised FPU (and SIMD in the case of the C7) results so far, the latter being about 41.9% of the Cyrix 6x86MX result and the former another 12.4% slower. The early VIA C3 with its half-speed FPU should be slower still but I don't have one. I do have a Geode GX1 (based on the Cyrix 5x86 core) which I'll post in a few days as that system needs some assembly.
...

Thanks!
I have uploaded the results.

jtchip wrote on 2025-06-26, 00:48:

...
Remember to also post Start me up's Atom D2550 result, that seems to be the only Bonnell result so far.

I do not think that the CPU MHz value is correct in that case (Atom D2550 also has an invariant TSC).
Similarly to Start me up's Core i5 results I have the feeling that the CPU operated below its nominal speed. Unfortunately ThrottleStop cannot help in case of Atoms.

BTW, according to Intel the Atom D2550 is not a Bonnell but a Saltwell/Cedarview one:
https://www.intel.com/content/www/us/en/produ … ifications.html

There is no sign the Cedarview Atoms should be so much slower on 1 core and clock for clock compared to Bay Trail/Cherry Trail Atoms:
https://www.cpubenchmark.net/cpu.php?cpu=Inte … +1.86GHz&id=606
https://www.cpubenchmark.net/cpu.php?cpu=Inte … 1.46GHz&id=2233

And here are Start me up's results. According to these results the Cedarview Atom D2550 has about half of the 1 core 1GHz normalized processing power of the Atom E3815.

Intel Atom D2550 MandelX.png
Intel Atom E3815 at 1.46 GHz.png

Last edited by Falcosoft on 2025-06-26, 12:04. Edited 1 time in total.


Reply 54 of 67, by Falcosoft


I'm sorry, but I was wrong, so I have to correct myself.
I succeeded in starting an old Samsung NC210 netbook with a "Pineview/Bonnell" Intel Atom N570 1.66 GHz (using an MS-DOS pendrive).
And its results perfectly mirror the results of Start me up's Atom D2550:

CPU Vendor: GenuineIntel
CPU ID: 0106CA
CPU speed: 1662 MHz
System: DOS 7.10
Mode: Text

         Time(msec)  Pixels/msec  Pixels/msec(1GHz)
ALU:          32698        24.05              14.47
FPU:          47738        16.47               9.91
3DNow!:         N/A          N/A                N/A
SSE:          11544        68.12              40.99
SSE2:         33557        23.44              14.10
AVX:            N/A          N/A                N/A

So it seems early Atoms are really that much slower...
I will upload both results as soon as I can.


Reply 55 of 67, by myne


IIRC they were modified Pentium cores.

I built:
Convert old ASUS ASC boardviews to KICAD PCB!
Re: A comprehensive guide to install and play MechWarrior 2 on new versions on Windows.
Dos+Windows 3.11+tcp+vbe_svga auto-install iso template
Script to backup Win9x\ME drivers from a working install
Re: The thing no one asked for: KICAD 440bx reference schematic

Reply 56 of 67, by Falcosoft

myne wrote on 2025-06-26, 10:55:

IIRC they were modified Pentium cores.

Do you mean the Atoms are modified Pentium 1 cores?
In a broad sense this may be true (e.g. neither the Pentium nor the early Atoms have out-of-order execution), but in a stricter sense it does not seem to be.
The design philosophy and performance characteristics are rather different.
The original Pentium has better x87 FPU performance compared to its integer performance. E.g. its floating point multiplication is much faster than the integer one.
The Atoms seem to be just the opposite.


Reply 57 of 67, by Falcosoft


Does someone know what this anomaly is about?

The attachment Intel_0000_anomaly.png is no longer available

CPU ID string says Genuine Intel(R) CPU 0000 @ 2.80GHz.
Searching with Google it seems that this is rather widespread:
https://www.google.com/search?client=firefox- … 28R%29+CPU+0000

Maybe it would have been better to save/send the CPUID directly, similarly to how the DOS version does, since this way it's hard to tell what the affected CPUs really are.
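
As an illustration of what a raw CPUID value can tell you even without a usable name string, here is a minimal Python sketch that splits a signature into family, model and stepping. It assumes the 'CPU ID' field written by the DOS version is the EAX value returned by CPUID leaf 1; the function name is hypothetical. The field layout follows the usual convention: stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20.

```python
def decode_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is prepended for base family 6 or 0xF (Intel's rule;
    # AMD documents it for 0xF only, which gives the same answer for these IDs).
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping

# The two IDs from the DOS results posted earlier in the thread.
for text in ("100FA0", "0106CA"):
    family, model, stepping = decode_signature(int(text, 16))
    print(f"{text}: family {family:#x}, model {model:#x}, stepping {stepping}")
```

For the two DOS results posted earlier this gives family 10h, model 0Ah for the AMD chip and family 6, model 1Ch for the Atom N570. A family/model pair like that narrows down the core even when the name string is a placeholder such as "CPU 0000".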


Reply 58 of 67, by myne


OK, apparently it was a clean-sheet design, but it is similar to the P5.


Reply 59 of 67, by jtchip

Falcosoft wrote on 2025-06-26, 08:27:

BTW, according to Intel the Atom D2550 is not a Bonnell but a Saltwell/Cedarview one:
https://www.intel.com/content/www/us/en/produ … ifications.html

You corrected yourself later but for anyone else wondering, Bonnell refers to the CPU microarchitecture and Saltwell/Cedarview refers to the processor code name (which also includes other aspects like semiconductor process).

Anyway, interestingly Esther (VIA C7) slightly beats Bonnell in SSE2 on this workload, 16 pixels/ms vs 14, the only "win" it has.

The rest of my results are from a NSC Geode GX1 300MHz (FPU_FAST enabled, slowest ALU result), Athlon 5350 (Kabini, slowest AVX), and Athlon 64 X2 5000+. The model names from CPUID (perhaps the DOS version should output this too), including the C7-D, are (from /proc/cpuinfo in Linux):

  • VIA Esther processor 1500MHz
  • Geode(TM) Integrated Processor by National Semi
  • AMD Athlon(tm) 5350 APU with Radeon(tm) R3
  • AMD Athlon(tm) 64 X2 Dual Core Processor 5000+