VOGONS


Reply 120 of 124, by Marco Pistella

Rank Member
Falcosoft wrote on Yesterday, 21:40:
Marco Pistella wrote on Yesterday, 19:24:
Falcosoft wrote on Yesterday, 19:16:

GitHub is still indexing the code base, so searching does not work properly.
Can you tell me which files contain the SSE and AVX benchmark routines?

VESA_Speed_Routines.ASM + VESA_COMMAND_02.ASM

Thanks, I found them.

There is one thing I could not find in X-VESA (even on the detailed video mode information page): the bits of the ModeAttributes field in the ModeInfoBlock struct.
Some of these bits, such as 'VGA compatible mode', could be useful for troubleshooting, since it seems that e.g. Build engine games use them to choose code paths.
Thanks in advance!

All ModeAttributes bits are decoded in VESA_Command_01. The decoding is version-dependent: bits 5-6-7 (VGA compatible + windowed + linear frame buffer) are shown only for VESA 2.0, additional fields for VESA 3.0. The Block_Yes_Or_No routine handles the actual bit testing, iterating through the mask defined by DH/DL.
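
For readers who do not want to read the assembly, here is a minimal sketch of version-dependent ModeAttributes decoding, written in Python rather than the project's ASM. The table layout and the names MODE_ATTRIBUTE_BITS and decode_mode_attributes are hypothetical and do not mirror VESA_Command_01 or Block_Yes_Or_No. The bit meanings follow the VBE specification; treating "VESA 2.0" and "VESA 3.0" as "that version or later" is an assumption of this sketch. Note that bits 5 and 6 are inverted in the specification: a cleared bit means "yes".

```python
# Hypothetical table: (mask, first VBE major version that defines the bit,
# bit value that means "yes", label). Reserved bits are left out.
MODE_ATTRIBUTE_BITS = [
    (0x0001, 1, 1, "Mode supported by hardware"),
    (0x0004, 1, 1, "BIOS TTY output supported"),
    (0x0008, 1, 1, "Color mode"),
    (0x0010, 1, 1, "Graphics mode"),
    (0x0020, 2, 0, "VGA compatible mode"),                  # inverted
    (0x0040, 2, 0, "VGA compatible windowed memory mode"),  # inverted
    (0x0080, 2, 1, "Linear frame buffer available"),
    (0x0100, 3, 1, "Double scan available"),
    (0x0200, 3, 1, "Interlaced available"),
    (0x0400, 3, 1, "Hardware triple buffering"),
    (0x0800, 3, 1, "Hardware stereoscopic display"),
    (0x1000, 3, 1, "Dual display start address"),
]


def decode_mode_attributes(attrs, vbe_major):
    """Return {label: bool} for the bits defined in the given VBE version."""
    decoded = {}
    for mask, min_major, yes_value, label in MODE_ATTRIBUTE_BITS:
        if vbe_major < min_major:
            continue  # bit is not defined in this VBE version
        bit = 1 if attrs & mask else 0
        decoded[label] = (bit == yes_value)
    return decoded


if __name__ == "__main__":
    # 0x009B: bits 0, 1, 3, 4 and 7 set; bits 5 and 6 clear.
    for label, yes in decode_mode_attributes(0x009B, 2).items():
        print(f"{label:40} {'Yes' if yes else 'No'}")
```

With the example value 0x009B on a VBE 2.0 BIOS, the mode is reported as VGA compatible because bit 5 is clear, which is the flag mentioned above as relevant to Build engine games.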

Reply 121 of 124, by Falcosoft

Rank l33t
Marco Pistella wrote on Today, 06:44:
All ModeAttributes bits are decoded in VESA_Command_01. The decoding is version-dependent: bits 5-6-7 (VGA compatible + windowed + linear frame buffer) are shown only for VESA 2.0, additional fields for VESA 3.0. The Block_Yes_Or_No routine handles the actual bit testing, iterating through the mask defined by DH/DL.

Ahh, OK. Thanks!

Regarding the AVX benchmarks: don't you think that VZEROALL or VZEROUPPER should be executed after the AVX block of instructions, to prevent a false dependency on the upper 128 bits of the ymm registers when SSE instructions are used later?
I am referring to this problem (implicit widening):
https://stackoverflow.com/questions/66874161/ … 895855#66895855

@Edit:
Since it seems that all routines only ever use one SIMD register (xmm0, ymm0 or zmm0 respectively), VXORPS on that register should be enough.
Or do you only ever move (zeroed) data, so the issue is not relevant in your case?

Website, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper
x86 microarchitecture benchmark (MandelX)

Reply 122 of 124, by Marco Pistella

Falcosoft wrote on Today, 07:18:
Ahh, OK. Thanks!

Regarding the AVX benchmarks: don't you think that VZEROALL or VZEROUPPER should be executed after the AVX block of instructions, to prevent a false dependency on the upper 128 bits of the ymm registers when SSE instructions are used later?
I am referring to this problem (implicit widening):
https://stackoverflow.com/questions/66874161/ … 895855#66895855

@Edit:
Since it seems that all routines only ever use one SIMD register (xmm0, ymm0 or zmm0 respectively), VXORPS on that register should be enough.
Or do you only ever move (zeroed) data, so the issue is not relevant in your case?

As you guessed in your edit, VZEROALL/VZEROUPPER are not needed: the SSE/AVX routines only transfer data and never read the previous register state, and these instructions are not used anywhere else in the program. The state of the upper bits is irrelevant in this context.

Reply 123 of 124, by Falcosoft

Marco Pistella wrote on Today, 08:16:
As you guessed in your edit, VZEROALL/VZEROUPPER are not needed: the SSE/AVX routines only transfer data and never read the previous register state, and these instructions are not used anywhere else in the program. The state of the upper bits is irrelevant in this context.

OK, I see.

Reply 124 of 124, by Marco Pistella

Falcosoft wrote on Today, 08:47:
OK, I see.

If you're interested in the SSE/AVX handling, I'd also suggest looking at LIBS\Retrieve_CPU_Capabilities.ASM: it contains the full detection and enabling sequence (SSE/AVX/AVX512F).
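
As a rough illustration of what a detection sequence of this kind has to decide, the sketch below checks already-read register values for SSE, AVX and AVX512F. It is not the content of Retrieve_CPU_Capabilities.ASM: the function name usable_simd and its parameters are hypothetical, and the values are passed in as plain integers because Python cannot execute CPUID or XGETBV itself. The bit positions are the documented ones from the Intel/AMD manuals. The enabling half (setting CR4.OSFXSR and CR4.OSXSAVE, then writing XCR0 with XSETBV) needs ring 0 and is only reflected here as "is the bit already set".

```python
# CPUID feature bits (documented positions).
EDX1_SSE = 1 << 25        # CPUID.1:EDX
ECX1_OSXSAVE = 1 << 27    # CPUID.1:ECX, mirrors CR4.OSXSAVE
ECX1_AVX = 1 << 28        # CPUID.1:ECX
EBX7_AVX512F = 1 << 16    # CPUID.(EAX=7,ECX=0):EBX

# Control register bits.
CR4_OSFXSR = 1 << 9       # FXSAVE/FXRSTOR and SSE enabled
XCR0_SSE = 1 << 1         # XMM state enabled
XCR0_AVX = 1 << 2         # upper YMM state enabled
XCR0_AVX512 = 0b111 << 5  # opmask, ZMM_Hi256 and Hi16_ZMM state enabled


def usable_simd(edx1, ecx1, ebx7, cr4, xcr0):
    """Return the SIMD levels that are both present and enabled."""
    levels = []
    if edx1 & EDX1_SSE and cr4 & CR4_OSFXSR:
        levels.append("SSE")

    # XCR0 is only meaningful once OSXSAVE is on.
    xsave_on = bool(ecx1 & ECX1_OSXSAVE)
    ymm_state = XCR0_SSE | XCR0_AVX
    if xsave_on and ecx1 & ECX1_AVX and (xcr0 & ymm_state) == ymm_state:
        levels.append("AVX")

    zmm_state = ymm_state | XCR0_AVX512
    if xsave_on and ebx7 & EBX7_AVX512F and (xcr0 & zmm_state) == zmm_state:
        levels.append("AVX512F")
    return levels


if __name__ == "__main__":
    # Example: CPU with SSE and AVX, OSXSAVE on, XCR0 = x87 | SSE | AVX.
    print(usable_simd(1 << 25, (1 << 27) | (1 << 28), 0,
                      (1 << 9) | (1 << 18), 0b111))
```

The point of checking XCR0 as well as the CPUID feature flag is that a CPU can report AVX while the corresponding register state is still disabled, in which case executing an AVX instruction faults.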