VOGONS


x86 microarchitecture benchmark (MandelX)


Reply 80 of 97, by Falcosoft

Rank l33t
Hoping wrote on 2025-07-06, 23:04:

I guess it's my QTJ0, BGA 1440 ES with a Socket 1151 interposer.
I didn't notice the comment box before I clicked submit.
Sorry.

Hi,
Thanks for your many new results!
1. One of the 2 'Intel Classmate PC v3' Atom results seems to be a duplicate. I will remove the latter if you do not mind.
2. The results of the Intel(R) Pentium(R) CPU N4200 @ 1.10GHz are surprisingly good. It belongs to the new low-power 'Atom' series, yet its 1 GHz normalized results are close to those of 4th Gen Core i5 processors!

Website, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper
x86 microarchitecture benchmark (MandelX)

Reply 81 of 97, by Hoping

Rank Oldbie
Falcosoft wrote on 2025-07-07, 13:42:

Hi,
Thanks for your many new results!
1. One of the 2 'Intel Classmate PC v3' Atom results seems to be a duplicate. I will remove the latter if you do not mind.
2. The results of the Intel(R) Pentium(R) CPU N4200 @ 1.10GHz are surprisingly good. It belongs to the new low-power 'Atom' series, yet its 1 GHz normalized results are close to those of 4th Gen Core i5 processors!

Yes, the N2600 result is possibly a duplicate. It is so slow that I may have clicked twice, thinking the first submission hadn't worked. This happens easily with slower machines if you don't pay attention.
The N4200 performs very well in real use. I use it for Windows XP virtual machines in VirtualBox, running somewhat older car workshop software (Saab's, for example).
I still have more laptops and desktops left to test.
Your benchmark is very easy and fast to use.
I'm taking screenshots of each test, as in the first case, but I don't upload them here because I think too many images would clutter the thread. If you are interested in a specific one, just ask.

Edit:
I'm using the maximum single-core frequency reported by HWiNFO, not the theoretical maximum from the datasheet, because sometimes the CPU can't reach its maximum frequency due to the TDP limit. So in another environment, the same CPU might reach a higher frequency, or never reach its target turbo frequency.
The A10-5750M and the A10-7300M are two examples of this. Both CPUs have trouble reaching their maximum turbo frequency at stock settings and need an undervolt to sustain their turbo frequencies.
Am I doing it wrong?
Using your benchmark is the perfect excuse to power on the computers that are in storage. 😉

Reply 82 of 97, by Falcosoft

Hoping wrote on 2025-07-07, 21:17:

Yes, the N2600 result is possibly a duplicate. It is so slow that I may have clicked twice, thinking the first submission hadn't worked. This happens easily with slower machines if you don't pay attention.
The N4200 performs very well in real use. I use it for Windows XP virtual machines in VirtualBox, running somewhat older car workshop software (Saab's, for example).
I still have more laptops and desktops left to test.
Your benchmark is very easy and fast to use.
I'm taking screenshots of each test, as in the first case, but I don't upload them here because I think too many images would clutter the thread. If you are interested in a specific one, just ask.

Edit:
I'm using the maximum single-core frequency reported by HWiNFO, not the theoretical maximum from the datasheet, because sometimes the CPU can't reach its maximum frequency due to the TDP limit. So in another environment, the same CPU might reach a higher frequency, or never reach its target turbo frequency.
The A10-5750M and the A10-7300M are two examples of this. Both CPUs have trouble reaching their maximum turbo frequency at stock settings and need an undervolt to sustain their turbo frequencies.
Am I doing it wrong?
Using your benchmark is the perfect excuse to power on the computers that are in storage. 😉

Hi,

1. It's not a problem if you don't upload the screenshots. Uploading results directly to the database is the more straightforward approach. Originally I thought all results would be uploaded this way, but it later became obvious that not everyone can or wants to do so. So the 'screenshot method' was born, and it's perfectly fine as long as I can handle the volume and enter the data into the database manually.

2. The MandelX benchmark uses only 1 thread and sets the processor affinity to a single selected CPU core (the 1st core by default). This is exactly the scenario in which modern CPUs can use their maximum '1 core turbo frequency'. So even if the CPU's datasheet lists lower maximum frequencies for more cores/all cores, the 1 core maximum frequency should be used. With such a workload the TDP limit is almost never a constraint, since on a multi-core processor only a fraction of the available resources is used during the benchmark.
The problem you described (undervolting being needed to sustain turbo frequencies) usually only happens with multi-threaded workloads that use all available CPU cores (including hyper-threaded logical ones). Of course there can be exceptions, so you should decide whether your own CPU really reaches the documented 1 core turbo frequency. Comparing your results to existing ones can help you decide.
A typical problem is that background tasks (web browsing, active downloads, etc.) keep other cores busy, so the core the benchmark runs on cannot reach the maximum 1 core turbo frequency.

The HWiNFO method you mentioned is very trustworthy. Make sure you enter the 'Max' frequency reported by HWiNFO, not the current one.
But I reserve the right to correct suspicious 1 core MHz values 😀

3. Results from all your stored computers are welcome!
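The single-core setup from point 2 (one thread pinned to one core, with the highest clock seen during the run noted down) can be sketched for Linux as below. This is not MandelX code: MandelX runs on Windows/DOS and the thread doesn't show how it sets affinity. The function names, the cpufreq sysfs file used as the clock source (standing in for HWiNFO's 'Max' column) and the 50 ms polling interval are all assumptions for illustration.

```python
import os
import threading
import time


def sysfs_cur_mhz(cpu):
    """Current clock of one logical CPU from Linux cpufreq sysfs (the file is in kHz).

    Assumption: not every system exposes this file (VMs and containers often don't).
    """
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq"
    with open(path) as f:
        return int(f.read()) / 1000.0


def run_single_core(work, core=None, sample_mhz=sysfs_cur_mhz, interval_s=0.05):
    """Run work() pinned to one logical CPU and track the highest clock seen there.

    Returns (result, elapsed_ms, max_mhz). Linux-only (os.sched_setaffinity).
    """
    original = os.sched_getaffinity(0)
    if core is None:
        core = min(original)  # lowest-numbered allowed CPU, i.e. the "1st core"

    samples = []
    done = threading.Event()

    def sampler():
        # Keep every reading, like a monitoring tool's "Max" column.
        while not done.is_set():
            samples.append(sample_mhz(core))
            done.wait(interval_s)

    # Start the sampler before pinning: on Linux the affinity mask is per
    # thread, so only the benchmark thread ends up restricted to `core`.
    monitor = threading.Thread(target=sampler, daemon=True)
    monitor.start()
    os.sched_setaffinity(0, {core})
    try:
        start = time.perf_counter()
        result = work()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    finally:
        done.set()
        monitor.join()
        os.sched_setaffinity(0, original)  # restore the original mask

    samples.append(sample_mhz(core))  # guarantees a reading even for very short runs
    return result, elapsed_ms, max(samples)
```

For example, `run_single_core(lambda: sum(i * i for i in range(10**7)))` returns the work's result, the elapsed time and the highest MHz reading on the pinned core, which is the figure that corresponds to HWiNFO's 'Max' field.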


Reply 83 of 97, by Falcosoft


Some interesting evidence that the benchmark really is 'vendor neutral' 😀

The Core 2 and K10 were rivals about 15 years ago. In this case the Core 2 Duo E7500 and the Phenom II X4 960T run at almost the same frequency and produce almost the same results:

The attachment Core2_Phenom2.png is no longer available

And present-day rivals, also with almost identical frequencies and results: Intel(R) Core(TM) Ultra 9 285K vs. AMD Ryzen 7 9700X (Zen 5)

The attachment CoreUltra_Ryzen5.png is no longer available


Reply 84 of 97, by Hoping

Falcosoft wrote on 2025-07-08, 07:06:

Hi,

1. It's not a problem if you don't upload the screenshots. Uploading results directly to the database is the more straightforward approach. Originally I thought all results would be uploaded this way, but it later became obvious that not everyone can or wants to do so. So the 'screenshot method' was born, and it's perfectly fine as long as I can handle the volume and enter the data into the database manually.

2. The MandelX benchmark uses only 1 thread and sets the processor affinity to a single selected CPU core (the 1st core by default). This is exactly the scenario in which modern CPUs can use their maximum '1 core turbo frequency'. So even if the CPU's datasheet lists lower maximum frequencies for more cores/all cores, the 1 core maximum frequency should be used. With such a workload the TDP limit is almost never a constraint, since on a multi-core processor only a fraction of the available resources is used during the benchmark.
The problem you described (undervolting being needed to sustain turbo frequencies) usually only happens with multi-threaded workloads that use all available CPU cores (including hyper-threaded logical ones). Of course there can be exceptions, so you should decide whether your own CPU really reaches the documented 1 core turbo frequency. Comparing your results to existing ones can help you decide.
A typical problem is that background tasks (web browsing, active downloads, etc.) keep other cores busy, so the core the benchmark runs on cannot reach the maximum 1 core turbo frequency.

The HWiNFO method you mentioned is very trustworthy. Make sure you enter the 'Max' frequency reported by HWiNFO, not the current one.
But I reserve the right to correct suspicious 1 core MHz values 😀

3. Results from all your stored computers are welcome!

I was trying to be faithful to the test scenario, sorry. If what you need is the maximum frequency from the datasheet, or the theoretical one reported by CPU-Z/HWiNFO, I will use that; it's your test and your work, and I didn't mean to get in the way.
Sorry again.
This problem is typical of laptops, where cooling is not easy to improve.
According to its specifications, the A10-7300M has a maximum frequency of 3200 MHz and a maximum TDP of 19 W. HWiNFO shows it reaching about 27 W at most, its temperature never exceeds 64 °C, and without undervolting it never exceeds 2700 MHz on a single core.
The A10-5750M is known to overheat in many cases and rarely reaches its maximum frequency. With undervolting and a custom heatsink it sustains the maximum frequency, but that is not the usual scenario.
The N4200 has also been found to be limited by the TDP set by the computer manufacturer: although its specified maximum frequency is 2500 MHz, I have never seen mine go over 2400 MHz, even though the temperature stays within the specified limits. There are threads on other forums about changing the maximum and minimum TDP of this processor.
The problem is that the operating system loads all threads to a greater or lesser extent: even if the test runs on only one thread, the operating system is using the others. I know this behavior well from the A10-7300M under Windows 10, as I used that laptop as my main machine for many years.

Anyway, I think I have already tested all the laptops I have that show this behavior.
I'll keep running the test while I check on the machines I have stored; as I said, it's the perfect excuse to turn them on.

Reply 85 of 97, by Falcosoft

Hoping wrote on 2025-07-08, 08:40:

I was trying to be faithful to the test scenario, sorry. If what you need is the maximum frequency from the datasheet, or the theoretical one reported by CPU-Z/HWiNFO, I will use that; it's your test and your work, and I didn't mean to get in the way.
Sorry again.
This problem is typical of laptops, where cooling is not easy to improve.
According to its specifications, the A10-7300M has a maximum frequency of 3200 MHz and a maximum TDP of 19 W. HWiNFO shows it reaching about 27 W at most, its temperature never exceeds 64 °C, and without undervolting it never exceeds 2700 MHz on a single core.
The A10-5750M is known to overheat in many cases and rarely reaches its maximum frequency. With undervolting and a custom heatsink it sustains the maximum frequency, but that is not the usual scenario.
The N4200 has also been found to be limited by the TDP set by the computer manufacturer: although its specified maximum frequency is 2500 MHz, I have never seen mine go over 2400 MHz, even though the temperature stays within the specified limits. There are threads on other forums about changing the maximum and minimum TDP of this processor.
The problem is that the operating system loads all threads to a greater or lesser extent: even if the test runs on only one thread, the operating system is using the others. I know this behavior well from the A10-7300M under Windows 10, as I used that laptop as my main machine for many years.

Anyway, I think I have already tested all the laptops I have that show this behavior.
I'll keep running the test while I check on the machines I have stored; as I said, it's the perfect excuse to turn them on.

Hi,
I understand your arguments, and I agree that if the CPU never reaches the advertised '1 core turbo speed' during single-core testing, then the real value shown in HWiNFO's 'Max' field should be used.
Overall, I see you know what you are doing, so I trust your judgement. Seeing the N4200's results, I thought the 2400 MHz was a typo and that it really ran at the advertised 2500 MHz. Given your detailed explanation, I now know I was wrong, so I'm correcting it in the table right now (the 1 GHz normalized values are recalculated automatically from this change).


Reply 86 of 97, by Hoping

Falcosoft wrote on 2025-07-08, 07:06:

Hi,
I understand your arguments, and I agree that if the CPU never reaches the advertised '1 core turbo speed' during single-core testing, then the real value shown in HWiNFO's 'Max' field should be used.
Overall, I see you know what you are doing, so I trust your judgement. Seeing the N4200's results, I thought the 2400 MHz was a typo and that it really ran at the advertised 2500 MHz. Given your detailed explanation, I now know I was wrong, so I'm correcting it in the table right now (the 1 GHz normalized values are recalculated automatically from this change).

Well, it's clear that I don't know what I'm doing; that's why I'm taking screenshots. I was convinced, but I think I mixed up data from different computers.
I redid the tests and took new screenshots, this time of the HWiNFO summary and the sensors.
N4200, maximum frequency during the test, 2500 MHz.
I made the same mistake with the A10-5750M: one core reaches the maximum frequency of 3530 MHz.
With the A10-7300M I repeated the test several times, and during the test it reaches 3240 MHz.
These frequencies are the maximum reported by HWinfo after the test.
There is no excuse for my mistake.
You are right, and it is your project, your rules.
Thanks for your patience.

What do you need me to do now to correct my mistakes? I'm very sorry.

Reply 87 of 97, by Falcosoft

Hoping wrote on 2025-07-08, 16:53:

...
N4200, maximum frequency during the test, 2500 MHz.

Yeah, the results were too good to be true 😀

Hoping wrote on 2025-07-08, 16:53:

...
What do you need me to do now to correct my mistakes? I'm very sorry.

Actually, nothing. The results are valid; only the 1 core MHz values have to be corrected. As I said, the 1 GHz normalized results are calculated in real time by the SQL query, and they are the only results that depend on the 1 core MHz value.
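Since only the MHz value enters the normalization, correcting it shifts the 1 GHz result by the same ratio while the raw results stay valid. A minimal sketch, assuming the normalized value is raw pixels/msec × 1000 / (1 core MHz), which is consistent with the K10 figures posted later in the thread; the actual SQL query isn't shown, and the raw score of 24 pixels/msec is made up:

```python
def normalized_1ghz(pixels_per_ms, core_mhz):
    """Scale a raw pixels/msec figure to an equivalent clock of 1 GHz (1000 MHz)."""
    if core_mhz <= 0:
        raise ValueError("core_mhz must be positive")
    return pixels_per_ms * 1000.0 / core_mhz


raw = 24.0  # hypothetical raw result; it does not change when the MHz entry is corrected
with_2400 = normalized_1ghz(raw, 2400)  # MHz value entered first
with_2500 = normalized_1ghz(raw, 2500)  # corrected MHz value
print(with_2400, with_2500)  # 10.0 9.6
```

Entering 2400 MHz instead of 2500 MHz inflates the normalized result by a factor of 2500/2400, roughly 4%.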


Reply 88 of 97, by Falcosoft


BTW, thanks for the results, especially the last 1.6 GHz Willamette Pentium 4 one!

It's very interesting that the Willamette P4 does not produce the same poor SSE results as the Northwood/Prescott ones. In the case of Willamette, the SSE result is almost 2x the SSE results of the later Northwood/Prescott P4s.

The attachment Willamette_Northwood.png is no longer available

@Edit:
And Pentium MMX results, very nice 😀


Reply 89 of 97, by Falcosoft


The AMD K10 has not only multipliers but also dividers. With the minimum multiplier and the maximum divider, its clock can be set to FSB/2 (100 MHz with the default 200 MHz FSB).
With the DOS utility CPUSpd you can set this minimum clock:
CpuSpd - A Hardware Based CPU Speed Control Utility for DOS/Win9X Retro Gaming

And then you get this result:

CPU Vendor: AuthenticAMD
CPU ID: 100FA0
CPU speed: 114 MHz
System: DOS 7.10
Mode: Text

          Time(msec)  Pixels/msec  Pixels/msec(1GHz)
ALU:          179943         4.37              38.34
FPU:          211349         3.72              32.64
3DNow!:       117389         6.70              58.77
SSE:           63111        12.46             109.31
SSE2:         106812         7.36              64.59
AVX:             N/A          N/A                N/A
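The columns above are internally consistent. A quick check, assuming each run renders 1024 × 768 = 786,432 pixels (an inference: that pixel count reproduces every row, but the thread doesn't state the image size) and that the 1 GHz column is pixels/msec × 1000 / CPU MHz:

```python
# Times (msec) and clock speed from the K10 run above.
TIMES_MS = {"ALU": 179943, "FPU": 211349, "3DNow!": 117389, "SSE": 63111, "SSE2": 106812}
CPU_MHZ = 114
ASSUMED_PIXELS = 1024 * 768  # inferred image size, not documented in the thread


def table_row(time_ms, mhz, pixels=ASSUMED_PIXELS):
    """Return (pixels/msec, pixels/msec at 1 GHz), rounded to 2 decimals like the output."""
    per_ms = pixels / time_ms
    return round(per_ms, 2), round(per_ms * 1000 / mhz, 2)


for path, ms in TIMES_MS.items():
    print(f"{path:7} {ms:>7} {table_row(ms, CPU_MHZ)}")
```

Every recomputed pair matches the posted Pixels/msec and Pixels/msec(1GHz) values.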

At this speed its 32-bit ALU and FPU performance is still better than that of a Pentium MMX 266, but its 16-bit real mode performance is comparable to a Pentium MMX (e.g. it does not trigger the infamous 'division by zero' error that unpatched DOS Turbo Pascal programs produce on CPUs that are too fast).
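The Turbo Pascal error mentioned above is usually explained as an overflow in the CRT unit's start-up delay calibration: it counts busy-loop iterations during one ~55 ms BIOS timer tick and divides the count by 55 with a 16-bit DIV. If the quotient does not fit in 16 bits, the CPU raises a divide error, which the runtime reports as Runtime error 200 (division by zero). A simplified model of that arithmetic; the function and exception names are hypothetical, and the real code is 16-bit assembly:

```python
TICK_MS = 55        # the calibration divides by 55 (one timer tick is ~54.9 ms)
WORD_MAX = 0xFFFF   # a 16-bit DIV quotient must fit in one word


class DivideError(ArithmeticError):
    """Stands in for the x86 divide-error exception (also raised on quotient overflow)."""


def delay_loops_per_ms(loops_per_tick):
    """Model the 32-by-16-bit division done when the CRT unit calibrates Delay()."""
    quotient = loops_per_tick // TICK_MS
    if quotient > WORD_MAX:
        raise DivideError("Runtime error 200: quotient does not fit in 16 bits")
    return quotient


# Largest loop count per tick that still works: 55 * 65536 - 1.
print(delay_loops_per_ms(55 * 65536 - 1))  # 65535
```

Slowing the CPU down, as with the K10 at 114 MHz above, keeps the loop count per tick below this limit.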

The attachment P1MMX_K10.png is no longer available


Reply 90 of 97, by Falcosoft

myne wrote on 2025-06-17, 08:19:

Ah, right. Be interesting to see the contrast of p and E.

Hi,
Thanks to BB we have i7-12700KF P-core vs. E-core results. It's interesting that, except for AVX, the 1 GHz normalized results of the E-cores are not much worse than those of the P-cores.

The attachment P_core_vs_Ecore.png is no longer available

https://falcosoft.hu/mandelx_benchmark_results.php


Reply 91 of 97, by Falcosoft


Hi,
A new version (1.7.3) of MandelX has been released:
https://falcosoft.hu/otesz/mandelx.zip

New features in benchmark section:

1. Added an Auto Affinity option that can be useful for 12th Gen and newer Intel processors that use Turbo Boost Max Technology 3.0.
According to Intel, in-die variation during manufacturing makes some cores faster than others, so some P-cores can outperform the rest.
Turbo Boost Max 3.0 capitalizes on these differences by identifying the best P-cores within the processor and routing work to them.

2. Added an option to run the benchmark multiple times (1-4). For every code path, the best result from any round counts as the final result.
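Both new options boil down to simple selection rules. The sketch below is only one way to express them and assumes results are kept as times in msec per code path (lower is better). In particular, the thread doesn't say how Auto Affinity identifies the best core (it may rely on the ranking exposed by the OS/firmware rather than a timing trial), so `pick_fastest_core` and all timings below are hypothetical:

```python
def best_of_rounds(rounds):
    """Keep, for every code path, the fastest time seen in any round.

    `rounds` is a list of dicts mapping code path -> time in msec,
    with None for code paths the CPU does not support (e.g. AVX: N/A).
    """
    best = {}
    for result in rounds:
        for path, ms in result.items():
            if ms is None:
                continue
            if path not in best or ms < best[path]:
                best[path] = ms
    return best


def pick_fastest_core(trial_ms_by_core):
    """Hypothetical auto affinity: choose the core whose short trial run was fastest."""
    return min(trial_ms_by_core, key=trial_ms_by_core.get)


# Hypothetical timings for two rounds on one CPU.
rounds = [
    {"ALU": 1210.0, "FPU": 905.0, "AVX": None},
    {"ALU": 1185.0, "FPU": 912.0, "AVX": None},
]
print(best_of_rounds(rounds))  # {'ALU': 1185.0, 'FPU': 905.0}
print(pick_fastest_core({0: 105.2, 2: 101.7, 4: 103.9}))  # 2
```

Note that the best ALU and FPU times come from different rounds, matching the rule that the best result from any round counts per code path.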

The attachment autoaffinity_runcount.png is no longer available

@Edit:
The first Intel 13th Gen result has arrived and confirms that it still has strong legacy x87 performance. As we suspected, Intel 14th Gen would most likely produce similar results, so it seems Intel 14th Gen is the last generation with an emphasis on legacy x87 performance (Intel Core Ultra has Ryzen-like, relatively weak x87 FPU performance).


Reply 92 of 97, by myne

Rank Oldbie

I doubt it will change much.
I'd assume x87 instructions are simply put through the AVX-128 (or similar) pipeline to get 80 bits, and presumably the extra precision is dropped.

I.e. the instruction decoder just decodes it as an AVX instruction.

I built:
Convert old ASUS ASC boardviews to KICAD PCB!
Re: A comprehensive guide to install and play MechWarrior 2 on new versions on Windows.
Dos+Windows 3.11+tcp+vbe_svga auto-install iso template
Script to backup Win9x\ME drivers from a working install
Re: The thing no one asked for: KICAD 440bx reference schematic

Reply 93 of 97, by Falcosoft

myne wrote on 2025-10-02, 09:32:

I doubt it will change much.
I'd assume x87 instructions are simply put through the AVX-128 (or similar) pipeline to get 80 bits, and presumably the extra precision is dropped.

I.e. the instruction decoder just decodes it as an AVX instruction.

Hi,
I do not think this is exactly what happens. AVX (even AVX-512) still does not support floats wider than 64 bits. So rather than 'dropping' extra precision (which does not exist), extra precision needs to be added for x87's 80-bit format.
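To make the precision argument concrete, here is a sketch of the x87 80-bit extended layout (1 sign bit, 15-bit exponent with bias 16383, 64-bit significand with an explicit integer bit). Converting a double into it is an exact widening; converting back has to round the 64-bit significand to 53 bits. This only illustrates the formats and says nothing about how any particular CPU implements x87 internally; the function names are made up for the example:

```python
import math
import struct

BIAS = 16383  # x87 extended-precision exponent bias


def double_to_x87_extended(x):
    """Encode a float (IEEE 754 double) in the 10-byte x87 extended format.

    Little-endian layout: 64-bit significand with an explicit integer bit,
    then a 16-bit word holding the sign bit and the 15-bit biased exponent.
    Every double converts exactly: this is a widening step. inf/NaN are not handled.
    """
    if not math.isfinite(x):
        raise ValueError("inf/NaN are not handled in this sketch")
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    if x == 0.0:
        exponent, significand = 0, 0
    else:
        m, e = math.frexp(abs(x))     # abs(x) = m * 2**e with 0.5 <= m < 1
        significand = int(m * 2**64)  # exact; the integer bit lands in bit 63
        exponent = e - 1 + BIAS       # value = (2m) * 2**(e - 1)
    return struct.pack("<QH", significand, (sign << 15) | exponent)


def x87_extended_to_double(raw):
    """Decode 10 bytes back to a double; the 64-bit significand is rounded to 53 bits."""
    significand, sign_exp = struct.unpack("<QH", raw)
    sign = -1.0 if sign_exp >> 15 else 1.0
    exponent = sign_exp & 0x7FFF
    if significand == 0:
        return sign * 0.0
    return sign * math.ldexp(float(significand), exponent - BIAS - 63)


print(double_to_x87_extended(1.0).hex())  # 0000000000000080ff3f
# 1 + 2**-63 needs all 64 significand bits, so it collapses to 1.0 as a double:
print(x87_extended_to_double(struct.pack("<QH", (1 << 63) | 1, BIAS)))  # 1.0
```

The second line shows where extended precision is lost: in the narrowing direction, not the widening one.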


Reply 94 of 97, by myne


That's not the way I read it.
https://en.m.wikipedia.org/wiki/Advanced_Vector_Extensions


Reply 95 of 97, by Falcosoft

myne wrote on 2025-10-02, 10:54:

Then read it again 😀
There is no sign in your linked document either of support for floating-point formats wider than 64 bits (i.e. double).
From your linked document:


AVX uses sixteen YMM registers to perform a single instruction on multiple pieces of data (see SIMD). Each YMM register can hold and do simultaneous operations (math) on:

eight 32-bit single-precision floating-point numbers or
four 64-bit double-precision floating-point numbers.

The width of the SIMD registers is increased from 128 bits to 256 bits, and renamed from XMM0–XMM7 to YMM0–YMM7 (in x86-64 mode, from XMM0–XMM15 to YMM0–YMM15). The legacy SSE instructions can still be utilized via the VEX prefix to operate on the lower 128 bits of the YMM registers.

All the relevant AVX instructions have 2 precision-related forms, similar to SSE instructions:
The single precision (32-bit) ones have an 'S' suffix and the double precision (64-bit) ones have a 'D' suffix. There are no versions with higher precision (e.g. there is no 'Q' suffix for 128-bit quad-precision floats).
Examples:

vaddss - add scalar single
vaddsd - add scalar double
vaddps - add packed single
vaddpd - add packed double

vmulss - multiply scalar single
vmulsd - multiply scalar double
vmulps - multiply packed single
vmulpd - multiply packed double

And so on...
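The naming scheme above is regular enough to decode mechanically. A simplified sketch that only understands the v<op>{s|p}{s|d} floating-point pattern listed above (real mnemonics have many more forms); `register_bits` is 128 for XMM, 256 for YMM and 512 for ZMM registers, and `describe` is a made-up helper name:

```python
ELEMENT_BITS = {"s": 32, "d": 64}  # single / double; there is no quad-precision FP form


def describe(mnemonic, register_bits=256):
    """Decode e.g. 'vaddpd' into (scalar/packed, element bits, lanes used).

    Scalar forms operate only on the lowest element of the register.
    """
    kind, precision = mnemonic[-2], mnemonic[-1]
    if kind not in ("s", "p") or precision not in ELEMENT_BITS:
        raise ValueError(f"not a single/double floating-point form: {mnemonic}")
    bits = ELEMENT_BITS[precision]
    lanes = 1 if kind == "s" else register_bits // bits
    return ("scalar" if kind == "s" else "packed", bits, lanes)


for name in ("vaddss", "vaddsd", "vaddps", "vaddpd"):
    print(name, describe(name), describe(name, register_bits=512))
```

At 512 bits the packed forms give 16 × 32-bit and 8 × 64-bit elements, the same counts as the _ps/_pd rows of Intel's Table 2 quoted later in the thread; a 'q' precision letter is rejected because no such floating-point form exists.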


Reply 96 of 97, by myne


"In CPUs with the vector length (VL) extension—included in most AVX-512-capable processors (see § CPUs with AVX-512)—these instructions may also be used on the 128-bit and 256-bit vector sizes."


Reply 97 of 97, by Falcosoft

myne wrote on 2025-10-02, 11:23:

"In CPUs with the vector length (VL) extension—included in most AVX-512-capable processors (see § CPUs with AVX-512)—these instructions may also be used on the 128-bit and 256-bit vector sizes."

Which instructions exactly? I strongly suspect that this is just unfortunate wording, and the sentence simply refers to the 128-bit and 256-bit wide vector registers, nothing more.
AVX/AVX-512 certainly has no support for a 128-bit floating-point format (let alone 256-bit!) in normal arithmetic operations (addition, subtraction, multiplication, division). The wider vector registers only let each instruction operate on more elements at once.

@Edit:
I attached the official documentation from Intel:

The attachment intel-avx-512-instruction-set-for-packet-processing-technology-guide-1645717553.pdf is no longer available

From the attached document:


3.1.1 Packed Data Types
In the Intel® AVX-512 instruction set, each intrinsic’s suffix is used to indicate how the operands are treated, adopting the same instruction naming conventions as its predecessors. The pi suffix indicates packed integer operands, pu suffix indicates packed unsigned integer operands, the pd suffix indicates packed double-precision floating-point operands, and the ps suffix indicates packed single-precision floating-point operands.
The C data type of the intrinsic operands is declared as either __m512i, indicating packed integers, or __m512d, indicating packed double precision floating points, or __m512, indicating packed single precision floating points. It follows that AVX-512 intrinsics with a ps suffix would therefore have operands with a data type of __m512.
To understand this concept in detail, consider the example of vector addition shown in Table 2.

Table 2. Packed Data Type Vector Addition
INTRINSIC SUFFIX | C INTRINSIC FORM OF INSTRUCTION | PACKED DATA TYPE
_epi64 | __m512i _mm512_add_epi64 (__m512i a, __m512i b) | 8 x 64-bit Integer
_epi32 | __m512i _mm512_add_epi32 (__m512i a, __m512i b) | 16 x 32-bit Integer
_epi16 | __m512i _mm512_add_epi16 (__m512i a, __m512i b) | 32 x 16-bit Integer
_epi8 | __m512i _mm512_add_epi8 (__m512i a, __m512i b) | 64 x 8-bit Integer
_pd | __m512d _mm512_add_pd (__m512d a, __m512d b) | 8 x 64-bit double precision floating point
_ps | __m512 _mm512_add_ps (__m512 a, __m512 b) | 16 x 32-bit single precision floating point

There is no mention of 128-bit/256-bit float support.
