@RayeR
Re: AVX with DJGPP, alignment, V86, SMI
DJGPP + AVX
CWSDPR0 is the right call. The fundamental issue is XSETBV requiring ring 0 — under standard CWSDPMI (ring 3 DPMI) you can't execute it directly. CWSDPR0 solves this cleanly. GCC supports AVX intrinsics with -mavx / -mavx2 / -mavx512f flags on any reasonably recent version, so the compiler side is straightforward once XCR0 is properly enabled.
Alignment
Correct on the requirements: 16B for SSE, 32B for AVX, 64B for AVX-512 zmm registers. However the distinction between aligned and unaligned variants matters: vmovaps/vmovapd will fault on misaligned data while vmovups/vmovupd handle any alignment. On Haswell and later the penalty for unaligned vmovups on actually-aligned data is zero at runtime — the hardware detects alignment. On older AVX (Sandy Bridge / Ivy Bridge) the penalty is real even with vmovups, so explicit 32B alignment in the data structure is worth it for portable code.
V86 / JEMM386
The core problem is that XSETBV in V86 generates #GP, which the VMM (JEMM386) would need to intercept, validate, and re-execute in ring 0. Technically feasible — Japheth would know the internal hooks required — but non-trivial. Worth asking, the JEMM386 codebase is well-maintained.
CLI/STI and SMI — you are right, I was imprecise
CLI does not block SMI. SMI has higher priority than NMI and is entirely invisible to the processor's interrupt flag — it triggers an immediate switch to SMM regardless of CPL or EFLAGS.IF. What CLI prevents in the YMM corruption scenario is a regular IRQ firing between the XCR0 write and the first YMM instruction, but the actual culprit on i5-7500/ASRock 200-series is the USB Legacy SMI from the xHCI controller, which CLI cannot stop.
The correct fix is exactly what you described: disable xHCI LEGSUP via PCI config space. The sequence is: locate the xHCI controller (class 0C/03/30), walk the extended capabilities to find USB Legacy Support (cap ID 1), set HC_OS_OWNED and clear the SMI enables in USBLEGCTLSTS. This is chipset-dependent in detail but the xHCI spec defines the capability structure so it is portable across compliant implementations. That work is planned for X-PCI, the PCI config space utility currently in the X-VESA roadmap.
@jmarsh
Both good points worth expanding.
On the Intel/AMD AVX-512 situation
The irony is complete: Intel introduced AVX-512 with Skylake-X in 2017, then dropped it from all consumer parts starting with Alder Lake (gen 12) due to the hybrid P+E core architecture — E cores don't support it, so rather than handle the asymmetry Intel disabled it globally on consumer SKUs. Raptor Lake, Meteor Lake, and Arrow Lake all follow the same pattern. Intel Xeon server parts still have it.
AMD went the opposite direction: Zen 4 (Ryzen 7000, 2022) added AVX-512 and it has been present on every AMD desktop part since. The AMD implementation covers the most practically useful subsets (F, BW, CD, DQ, VL, VNNI, VBMI, VBMI2 among others) — not the full Intel server feature set but more than enough for compute workloads.
So the current landscape is: if you want AVX-512 on a consumer CPU today, buy AMD.
On VEX encoding and the three-operand form
Exactly right. The non-destructive destination is underappreciated as an optimization opportunity. Legacy SSE: paddq xmm0,xmm1 destroys xmm0. VEX: vpaddq ymm0,ymm1,ymm2 leaves ymm1 and ymm2 intact. In tight loops this eliminates several register-to-register moves that exist purely to preserve values, reducing both code size and execution pressure on the move units.
There is a related benefit: VEX-encoded 128-bit instructions zero the upper 128 bits of the destination YMM register, which avoids the transition penalty between legacy SSE and VEX code paths that AVX-capable CPUs impose to manage dirty upper state. Mixing legacy SSE and VEX instructions in the same code path without being aware of this is a common source of unexpected performance degradation.
@zyzzle
Thanks for the detailed report. Addressing each point:
1366x768 — by design, not a bug
The horizontal resolution must produce a scanline length in bytes divisible by 4 — this is a hard requirement for all of X-VESA's internal graphic routines. 1366x768 fails this at 8 bpp and 24 bpp (1366 and 4098 bytes per scanline respectively).
The obvious workaround would be to use 4F06h to request a logical scanline width of 1368 pixels, which is divisible by 4 at every depth. However, testing showed that on several controllers this produces an undefined state — the graphics controller accepts the call but enters an inconsistent configuration. Since a clear error message is preferable to a random freeze, all resolutions that cannot produce a valid scanline length are explicitly rejected. This is unlikely to change without a substantial rewrite of the rendering routines.
Ivy Bridge VGA timings — confirmed
Ivy Bridge is the last Intel iGPU generation with complete legacy VGA timing register support. From Haswell onwards those code paths are progressively disabled or emulated. This is hardware behavior, not an X-VESA limitation.
Kaby Lake LFB freeze — confirmed
Known behavior on modern Intel iGPUs in CSM mode. Banked mode is the correct workaround for bandwidth testing on that system.
Regarding the Kaby Lake Write 64b anomaly (182,000 MiB/s)
The transfer rate measurement itself is reliable. What fails at these speeds is exclusively the overhead calculation.
X-VESA uses the PIT for all timing, including overhead measurement, in order to maintain compatibility with CPUs that predate RDTSC. At transfer rates in the GB/s range the overhead calculation becomes unreliable for a specific reason: the statistical sample used (32 iterations) produces a value that is of the same order of magnitude as the PIT measurement error itself. X-VESA includes a compensation algorithm for this — effectively measuring the overhead of the overhead measurement — but at these speeds even that cannot fully compensate for the fundamental granularity limit of the PIT (~838 ns per tick).
Increasing the number of samples for the overhead calculation would reduce the problem but not eliminate it, and would introduce an asymmetry in the measurement methodology between overhead and
transfer rate that would complicate interpretation of results.
The correct reading of the Kaby Lake data is therefore: the raw transfer rate values are valid, the overhead-subtracted values at very high bandwidths are not meaningful and should be disregarded.
This is a known limitation of PIT-based measurement at GB/s speeds and not a defect in the transfer rate benchmark itself.
Renaming X-VESA.COM
The file is not designed to be renamed. Please use it as distributed.
SuperDoubleTiny and UPX
The STD-STUB V1.0.0 you found is the SuperDoubleTiny bootstrap, a custom loader developed specifically for X-VESA. The reason SDT exists is architectural: DOS COM files are inherently limited to a single 64KB segment shared by code, data, and stack. X-VESA requires a full 64KB code segment AND a separate full 64KB data segment simultaneously — a memory model that is impossible for a standard COM file.
Here is how SDT works. Three 64KB segments are involved:
SEG1 — starting segment of X-VESA.COM (code segment)
SEG2 — used by APACK as decompression workspace
SEG3 — used by the stub and as stack
a) STUB.COM starts and copies itself into SEG3
b) STUB.COM executes a RETF from SEG3
c) STUB.COM copies compressed DATA.COM from SEG1 into SEG3
behind itself
d) STUB.COM reallocates compressed CODE.COM in SEG1 at ORG 0100h
e) STUB.COM executes RETF handing control to CODE.COM in SEG1:0100h
f) Compressed CODE.COM copies itself into SEG2
g) APACK stub executes FAR JMP into SEG2 and decompresses CODE.COM
back into SEG1
h) FAR JMP from SEG2 to SEG1:0100h — decompressed CODE.COM runs
i) CODE.COM detects stack != 0FFFEh — STUB.COM return value is on
stack in SEG3
j) CODE.COM swaps STUB.COM RETF address in SEG3 with its own
reentry point in SEG1
k) CODE.COM executes RETF jumping to STUB.COM in SEG3
l) STUB.COM copies compressed DATA.COM into SEG1 immediately after
the end of CODE.COM
m) STUB.COM executes RETF returning to CODE.COM in SEG1
(reentry point set in step j)
n) CODE.COM pushes its own reentry address onto the stack then
executes RETF jumping to compressed DATA.COM
o) DATA.COM APACK stub copies itself into the next 64KB and executes
FAR JMP which decompresses DATA.COM in place
p) Decompressed DATA.COM contains a single RETF as its first
instruction, returning control to CODE.COM in SEG1
q) X-VESA is now fully operational with 64KB code and 64KB data
in separate segments
UPX is incompatible with this model for three concrete reasons. First, during decompression UPX writes into memory areas beyond its expected output range, corrupting the stub residing in SEG3. Second,
even with --ultra-brute UPX produces files slightly larger than the APACK result — there is no compression gain to justify the effort. Third, when applied to the SDT model specifically, UPX-compressed
binaries fail to execute on an IBM 5150 — this is not a general UPX limitation on 8088 hardware, but a failure specific to the interaction between UPX's decompression behaviour and the SDT memory layout.
X-VESA requires 302240 bytes of conventional memory and is designed to run on any hardware from an IBM 5150 to a 2026 system with a legacy CSM BIOS. On an IBM 5150 it starts correctly and reports
"80386 or above required" — a clean, graceful exit. If conventional memory is insufficient it reports the shortage and exits without crashing. This compatibility range is non-negotiable and any change
to the bootstrap that breaks it, as UPX does, is not acceptable regardless of other considerations.