VOGONS


First post, by mdrejhon

Rank: Newbie

I'm the founder of Blur Busters and creator of TestUFO, and I want to announce an open source code module (used in TestUFO and VSYNCtester) that will be useful to some emulator authors.

I happen to have experimented with the world's first cross platform Kefrens Bars (works on PC and Mac via generic OpenGL VSYNC OFF tearline-position beamracing).

Here are some fruits I'd like to share.

Excellent news: New VSYNC Estimator Source Code
- Accurate estimate of your real refresh rate (to many decimal digits)
- Accurate estimate of next VSYNC
- Useful for improved lag-reducing inputdelay algorithms (vsync phase offsetting between the executing emuHz and the graphics drivers' realHz)
- Accurate enough for cross platform beam racing applications (to help sync emulator display's beam to real display's beam ala WinUAE Lagless VSYNC)
- Useful for both non-beamracing and beamracing purposes

https://github.com/blurbusters/RefreshRateCalculator

- One "RefreshRateCalculator()" class object, self contained.
- About 200 lines of code (+ ~100 lines of comments)
- No external dependencies
- Easy to port to almost any language on almost any platform

Purposes for emulators

- Non-beamraced:
- .....This can simply be used for crossplatform nudging/flywheeling emuHz (CPU clocked) slowly towards realHz (GPU clocked) to prevent latency phase slewing effects
- .....This can be used for crossplatform VSYNC phase offsets (input delay algorithms).
- Beamraced:
- .....It is also accurate enough for beam racing applications, such as cross platform Lagless VSYNC (like WinUAE, but crossplatform on any VSYNC OFF supported platform)
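
As a rough illustration of the non-beamraced "flywheeling" idea above, here is a minimal sketch. The function name nudgeEmulatorHz and the maxSlewPerStep tuning knob are hypothetical and not part of RefreshRateCalculator; realHz is assumed to come from hertz.getCurrentFrequency().

```javascript
// Hypothetical helper (not part of RefreshRateCalculator):
// move the emulator's frame rate a small, bounded step toward the measured display rate.
// maxSlewPerStep is an assumed tuning knob limiting how fast emuHz may change per call.
function nudgeEmulatorHz(emuHz, realHz, maxSlewPerStep) {
  const error = realHz - emuHz;
  // Clamp the correction so the emulated clock drifts gently instead of jumping
  const step = Math.max(-maxSlewPerStep, Math.min(maxSlewPerStep, error));
  return emuHz + step;
}

// Example: emulated machine runs at 59.94 Hz, display measured at 60.002 Hz
let emuHz = 59.94;
const realHz = 60.002; // would come from hertz.getCurrentFrequency()
for (let i = 0; i < 100; i++) {
  emuHz = nudgeEmulatorHz(emuHz, realHz, 0.001);
}
console.log(emuHz);
```

A bounded step keeps audio pitch and game speed changes small while still preventing the slow latency phase slew between the two clocks.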

It's up -- https://github.com/blurbusters/RefreshRateCalculator

________________

RefreshRateCalculator CLASS

  • PURPOSE: Accurate cross-platform display refresh rate estimator / dejittered VSYNC timestamp estimator.
  • Input: Series of frame timestamps during framerate=Hz (Jittery/lossy)
  • Output: Accurate filtered and dejittered floating-point Hz estimate & refresh cycle timestamps.
  • Algorithm: Combination of frame counting, jitter filtering, ignoring missed frames, and averaging.
  1. This is also a way to measure a GPU clock source indirectly, since the GPU generates the refresh rate during fixed Hz.
  2. IMPORTANT VRR NOTE: This algorithm does not generate a GPU clock source when running this on a variable refresh rate display (e.g. GSYNC/FreeSync), but can still measure the foreground software application's fixed-framerate operation during windowed-VRR-enabled operation, such as desktop compositor (e.g. DWM). This can allow a background application to match the frame rate of the desktop compositor or foreground application (e.g. 60fps capped app on VRR display). This algorithm currently degrades severely during varying-framerate operation on a VRR display.
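
To give a feel for the ingredients listed above (frame counting, jitter filtering, ignoring missed frames, averaging), here is a heavily simplified sketch. It is NOT the RefreshRateCalculator implementation; the function name estimateHz and the median-based approach are my own assumptions for illustration, and timestamps are assumed to be in milliseconds.

```javascript
// Simplified illustration only; NOT the actual RefreshRateCalculator algorithm.
// Input: array of jittery frame timestamps in milliseconds. Output: estimated Hz.
function estimateHz(timestampsMs) {
  if (timestampsMs.length < 3) return NaN;
  const intervals = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    intervals.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  // The median interval is a jitter-robust first guess at one refresh cycle
  const sorted = intervals.slice().sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  // Count how many refresh cycles each interval spans, so that a missed frame
  // (an interval near 2x or 3x the median) counts as 2 or 3 cycles
  let cycles = 0;
  for (const dt of intervals) {
    cycles += Math.max(1, Math.round(dt / median));
  }
  // Averaging over the whole span shrinks the effect of per-frame jitter
  const elapsedMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  return (1000 * cycles) / elapsedMs;
}

// Synthetic 60 Hz data: deterministic jitter of up to +/-0.25 ms and two dropped frames
const period = 1000 / 60;
const ts = [];
for (let i = 0; i <= 600; i++) {
  if (i === 100 || i === 250) continue; // dropped frames
  const jitter = (((i * 7919) % 11) - 5) * 0.05;
  ts.push(i * period + jitter);
}
const hz = estimateHz(ts);
console.log(hz.toFixed(3));
```

The longer the series of timestamps, the less the jitter on the first and last samples matters, which is why the estimate improves over several seconds.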

LICENSE - Apache-2.0

Copyright 2014-2023 by Jerry Jongerius of DuckWare (https://www.duckware.com) - original code and algorithm
Copyright 2017-2023 by Mark Rejhon of Blur Busters / TestUFO (https://www.testufo.com) - refactoring and improvements

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

*** First publicly released July 2023 under mutual agreement
*** between Rejhon Technologies Inc. (Blur Busters) and Jongerius LLC (DuckWare)
*** PLEASE DO NOT DELETE THIS COPYRIGHT NOTICE

JAVASCRIPT VSYNC API / REFRESH CYCLE TIME STAMPS

CODE PORTING

  • This algorithm is very portable to most languages, on most platforms, via high level and low level graphics frameworks.
  • A generic VSYNC timestamp can usually be taken immediately after exit of almost any frame presentation API during VSYNC ON framerate=Hz
  • APIs for timestamps include RDTSC / QueryPerformanceCounter() / std::chrono::high_resolution_clock::now()
  • APIs for low level frame presentation include DirectX Present(), OpenGL glFinish(), Vulkan vkQueuePresentKHR()
  • APIs for high level frame presentation include XBox/MonoGame Draw(), Unity3D Update(), etc.
  • APIs for zero-graphics timestamps (e.g. independent/separate thread) include Windows D3DKMTWaitForVerticalBlankEvent()
  • While not normally used for beam racing, this algorithm is accurate enough for cross-platform raster estimates for beam racing applications, based on a time offset between refresh cycle timestamps! (~1% error vs vertical resolution is possible on modern AMD/NVIDIA GPUs)
  • Can be used for tearingless VSYNC OFF algorithms (scanline-specific tearline steering offscreen, ala RTSS Scanline Sync or SpecialK Latent Sync) as long as a separate thread is able to monitor and provide your (jittery) VSYNC or refresh cycle timestamps, or if your platform/framework supports simultaneous VSYNC ON (offscreen) and VSYNC OFF (visible) in separate threads/contexts.

SIMPLE CODE EXAMPLE

var hertz = new RefreshRateCalculator();

[...]

// Call this inside your full frame rate VSYNC ON frame presentation or your VSYNC listener.
// It will automatically filter-out the jitter and dropped frames.
// For JavaScript, most accurate timestamp occurs if called at very top of your requestAnimationFrame() callback.

hertz.countCycle(performance.now());

[...]

// This data becomes accurate after a few seconds

var accurateRefreshRate = hertz.getCurrentFrequency();
var accurateRefreshCycleTimestamp = hertz.getFilteredCycleTimestamp();

// See code for more good helper functions

OPTIONAL: If you use this for cross platform "lagless vsync"

WinUAE implements a "lagless vsync" algorithm based on beam raced synchronization between emulator refresh cycle and real refresh cycle.
For cross platform beam racing, you'd use your own ported version of the JavaScript (!) code below.
Remember, you need VSYNC OFF while also concurrently being able to listen to the real display's VSYNC.

// Run this after a 10-second refresh cycle counting initialization at startup
// (but keep counting beyond that, to incrementally improve accuracy enough for beam racing apps)
var accurateRefreshRate = hertz.getCurrentFrequency();
// Refresh interval in milliseconds, to match performance.now()
// (assumes the timestamps passed to countCycle() were in milliseconds)
var accurateRefreshInterval = 1000.0 / accurateRefreshRate;
var accurateRefreshCycleTimestamp = hertz.getFilteredCycleTimestamp();

// Vertical screen resolution
var height = screen.height;

// Common VBI size for maximum raster accuracy, adjust as needed. VGA 480p has 45, and HDTV 1080p has 45
// Or optionally use #ifdef type for platform-specific APIs like Linux modelines or Windows QueryDisplayConfig()
var blanking = 45;

var verticaltotal = height + blanking;
var elapsed = performance.now() - accurateRefreshCycleTimestamp;
// Fraction of the refresh cycle elapsed since the last VSYNC, scaled to scanlines
var raster = Math.round(verticaltotal * (elapsed / accurateRefreshInterval));

// OPTIONAL: If your VSYNC timestamp is end-of-VBI rather than start-of-VBI, then compensate
raster += blanking;
raster = (raster % verticaltotal);
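
For porting, the raster arithmetic can be packaged as a pure function with no browser globals. This is only a sketch: the name estimateRaster and its parameter list are hypothetical, and it assumes millisecond timestamps (as performance.now() returns) and a refresh rate in Hz.

```javascript
// Hypothetical pure-function form of the raster estimate (name and signature are illustrative).
// nowMs / vsyncTimestampMs: milliseconds; refreshHz: measured refresh rate in Hz;
// height: visible scanlines; blanking: VBI size in scanlines.
function estimateRaster(nowMs, vsyncTimestampMs, refreshHz, height, blanking, timestampIsEndOfVbi) {
  const intervalMs = 1000.0 / refreshHz;
  const verticalTotal = height + blanking;
  const elapsedMs = nowMs - vsyncTimestampMs;
  let raster = Math.round(verticalTotal * (elapsedMs / intervalMs));
  // Compensate if the VSYNC timestamp marks end-of-VBI rather than start-of-VBI
  if (timestampIsEndOfVbi) raster += blanking;
  // Wrap into [0, verticalTotal), including for a timestamp slightly in the future
  return ((raster % verticalTotal) + verticalTotal) % verticalTotal;
}

// Example: 50 Hz display (20 ms refresh interval), 1080 visible lines + 45 blanking lines,
// sampled 5 ms after the filtered VSYNC timestamp
console.log(estimateRaster(1005, 1000, 50, 1080, 45, false));
```

Keeping the function free of screen.height and performance.now() makes it straightforward to port to C#, C++ or Rust and to unit test with fixed inputs.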

This will freaking actually (if uselessly) work in a web browser: I got roughly 5-10% raster scan line position accuracy in a WEB BROWSER running on an NVIDIA GPU, fer crissakes, e.g. a raster guesstimate within 50-75 pixels vertically on a 1080p display (NVIDIA-type GPU, i7-type CPU).

...This won't be useful for rasterdemos in a web browser since they're permanently VSYNC ON and do not generate tearlines (no way to listen to VSYNC ON tick-tocks while running in VSYNC OFF mode) -- but it will work with high-framerate VSYNC OFF standalone software, for precise tearline steering. Remember to Flush() before timestamping for more accurate raster guesstimates. But the fact remains: I could get a raster scan line estimate in a FREAKING WEB BROWSER to roughly a 5% error margin (on landscape desktop displays, i7 CPU, NVIDIA GPU)....

Don't expect any good rasterdemo accuracy on Intel GPUs (but it's accurate enough for emulator frameslice beamracing).

Now, if you do want to raster-estimate a mobile LCD or OLED display, make sure you rotate it to its default landscape orientation (top-to-bottom scanout), and verify with a high speed camera on https://www.testufo.com/scanout, creating videos similar to those at https://www.blurbusters.com/scanout. Some mobile displays scan sideways, and you can't detect scanout direction in JavaScript. Boo. However, GPUs scan top-to-bottom to the GPU output, so landscape-monitor-mode will always be displaying a signal that's being scanned top-to-bottom, so you can certainly cross-platform beam race that (more or less).

If ported to C#, you can get sub-1% accuracy, much like Tearline Jedi.

If ported to C / C++ / Rust and using lower level VSYNC listeners, you can sometimes (on high performance platforms) get even better accuracy, down to as little as 1 scanline on certain less-hyperpipelined GPUs, although likely with a fixed offset that needs to be compensated for. Emulator frameslice beamracing tolerates a much larger error margin: one frameslice worth of jitter (e.g. 10 frameslices per refresh cycle at 60Hz = 1/600sec of beamrace jitter is allowed before artifacts appear).
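
The frameslice jitter budget mentioned above is simple arithmetic. A small sketch, with hypothetical helper names (frameSliceBudgetMs, frameSliceIndex) that are not from any existing emulator:

```javascript
// Hypothetical helpers for frameslice beam racing arithmetic.

// Allowed beamrace jitter in milliseconds: one frameslice worth of scanout time
function frameSliceBudgetMs(refreshHz, slicesPerRefresh) {
  return 1000 / (refreshHz * slicesPerRefresh);
}

// Which frameslice (0-based) the real raster is currently scanning.
// verticalTotal includes blanking, so the last slice overlaps the VBI.
function frameSliceIndex(raster, verticalTotal, slicesPerRefresh) {
  return Math.floor((raster * slicesPerRefresh) / verticalTotal);
}

// 10 frameslices per refresh cycle at 60 Hz gives 1/600 sec, about 1.67 ms
console.log(frameSliceBudgetMs(60, 10).toFixed(2) + " ms");
console.log(frameSliceIndex(563, 1125, 10));
```

More frameslices per refresh lower the latency but shrink the budget, so the raster estimate has to be correspondingly more precise.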

Remember, this cross platform module need not be used for beamracing:

- Non-beamraced:
- .....This can simply be used for crossplatform nudging/flywheeling emuHz (CPU clocked) slowly towards realHz (GPU clocked) to prevent latency phase slewing effects
- .....This can be used for crossplatform VSYNC phase offsets (input delay algorithms).
- Beamraced:
- .....It is also accurate enough for beam racing applications, such as cross platform Lagless VSYNC (like WinUAE, but crossplatform on any VSYNC OFF supported platform)
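
For the non-beamraced VSYNC phase offset (input delay) case, one possible sketch follows. The function name msUntilFrameStart and the leadMs parameter are hypothetical; the idea is to delay input polling and emulation until a fixed lead time before the next predicted VSYNC, using the filtered timestamp and measured Hz. It assumes millisecond timestamps and leadMs smaller than one refresh interval.

```javascript
// Hypothetical helper: how long to wait (ms) before starting the next emulator frame
// so that it begins leadMs before the next predicted VSYNC.
function msUntilFrameStart(nowMs, vsyncTimestampMs, refreshHz, leadMs) {
  const intervalMs = 1000 / refreshHz;
  // Time elapsed within the current refresh cycle, wrapped into [0, intervalMs)
  const phaseMs = (((nowMs - vsyncTimestampMs) % intervalMs) + intervalMs) % intervalMs;
  const untilNextVsyncMs = intervalMs - phaseMs;
  let waitMs = untilNextVsyncMs - leadMs;
  // Already too late for this refresh cycle: aim for the following one
  if (waitMs < 0) waitMs += intervalMs;
  return waitMs;
}

// Example: 50 Hz display (20 ms interval), 5 ms into the cycle, 4 ms of lead time wanted
console.log(msUntilFrameStart(1005, 1000, 50, 4));
```

A smaller leadMs means lower input latency but a higher risk of missing the refresh cycle if emulation plus rendering takes longer than expected.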

For more information about lagless VSYNC algorithms, see HOWTO: Possible Lagless VSYNC for Emulator Devs (implemented in WinUAE/etc), via beam-raced tearingless VSYNC OFF


Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 1 of 4, by mdrejhon

...This won't be useful for rasterdemos in a web browser since they're permanently VSYNC ON and do not generate tearlines (no way to listen to VSYNC ON tick-tocks while running in VSYNC OFF mode) -- but it will work with high-framerate VSYNC OFF standalone software, for precise tearline steering. Remember to Flush() before timestamping for more accurate raster guesstimates. But the fact remains: I could get a raster scan line estimate in a FREAKING WEB BROWSER to roughly a 5% error margin (on landscape desktop displays, i7 CPU, NVIDIA GPU)...

Correction. "I WAS WRONG" </Area5150>

Hint:

"C:\Program Files\Google\Chrome\Application\chrome_proxy.exe" --profile-directory=Default --app-id=secondinstance-kefrensbars-testufo-demo --disable-gpu-vsync --disable-frame-rate-limit --user-data-dir=c:\temp\javascript-kefrens-bars

Getting 4000 frameslices/sec in a browser on an RTX 3080 force-upclocked with AfterBurner. Still needs a 2nd browser process (VSYNC ON) as a surrogate blanking interval heartbeat WebSocket proxy.

(Keep tuned for a world's first. Anybody who can read command lines, can guess where this is going...)

Reply 2 of 4, by mdrejhon

SUCCESS! (shh). Just PM'd VileR who I'm letting beta test the world's first browser-based rasterdemo.
EDIT: Confirms that it also works on VileR's GPU too, so he can vouch I actually put "web browser" and "beam racing" in the same sentence!
Keeping details under wraps, trying to decide how to announce.

Anyway, for now, a learning experience for emulator authors who do not realize that simple graphics = worse jitter! (Dosbox transitioned to a 1ms system, which accidentally made dosbox run much smoother.)

RULE OF THUMB:
Prevent GPU power management jitter: Never let GPU idle for a millisecond between frames

This is useful for reducing latency in emulators used for video games.

Some of the random performance / random rendertimes / random speed on my GPU was actually traced to power management jitter.

We've found algorithms to solve/stabilize (if necessary). You wouldn't need to do any beam racing, but simply sync emulator Hz to real Hz -- as long as the machine was performant enough.

Although you don't have to do beam racing:

Some emulators have transitioned to spreading out GPU rendering so that draw commands are less than 1ms apart (Rule: "Never Never Let The GPU Idle For A Full 1ms Between Draw Commands"), to force the GPU to stop its jittery (4-8ms jitter) power management that occurs during low GPU load between frames.

That's the main technique: Use performance mode, and never let the GPU idle for a full millisecond. Tardy-out the rendering to ensure graphics draw commands and frame presentation don't ever, ever, ever idle for a full millisecond. That means adding CPU inputdelays between draw commands, or other random tricks some emu authors do -- and suddenly the GPU gives you much more deterministic behavior! (At least when not in Battery Saver Mode.)

Although not all emulator authors were aware of what we know about GPUs' jittery power management behaviors, dosbox apparently accidentally made the GPU behave more smoothly with their 1ms-surge-execute system, compared to "try to render quick and idle between frames" approaches on an underutilized GPU. The idling between frames creates really nasty jitter from the cost of power management -- the GPU actually falls asleep (brief moments of 0MHz-like operation) between frames!

The algorithms built into emulators to intentionally spread out GPU draw commands help massively.

The GPU actually goes to SLEEP (0MHz) between frames. That's why you get a lot of jitter when waking up a GPU.

Techniques to reduce/prevent GPU power management jitter from low GPU % utilization:

1. Use Performance Power Plan. There are APIs to warn the user (console, log, warning) if the Performance Power Plan is not selected.

2. Spread your GPU draw commands in a way that only about ~100-500 microseconds elapse between consecutive GPU draw commands.

3. Or if you must sleep (save battery on laptop), wake up the GPU about ~4ms (configurable) before frame presentation and then "thrash" it with a series of final spread-out draw commands or repeat-renders (be careful: drivers will ignore NOP commands, so be creative). The GPU can still sleep 75% of a 60Hz refresh cycle on a laptop, and still give you reasonable frame-presentation precision, if you wake the GPU early before doing your render work. Some GPUs need 8+ms to "wake up", other GPUs only need 1ms to "wake up".
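
Techniques 2 and 3 above boil down to a timing schedule. Here is a minimal sketch of just the scheduling arithmetic; the function name gpuKeepAwakeSchedule and its parameters are hypothetical, and what you actually submit at each time (a real, non-NOP draw command) depends on your graphics API.

```javascript
// Hypothetical scheduler: returns the times (ms) at which to issue small draw commands
// so the GPU is woken wakeLeadMs before presentation and never idles longer than maxGapMs.
function gpuKeepAwakeSchedule(presentTimeMs, wakeLeadMs, maxGapMs) {
  const times = [];
  // Number of draw commands needed so that consecutive gaps stay within maxGapMs
  const count = Math.ceil(wakeLeadMs / maxGapMs);
  const gapMs = wakeLeadMs / count;
  for (let i = 0; i < count; i++) {
    times.push(presentTimeMs - wakeLeadMs + i * gapMs);
  }
  return times;
}

// Example: present at t=100 ms, wake the GPU 4 ms early, at most 0.5 ms between draw commands
const schedule = gpuKeepAwakeSchedule(100, 4, 0.5);
console.log(schedule);
```

Both wakeLeadMs and maxGapMs would need tuning per GPU, since wake-up time varies from about 1ms to 8+ms as noted above.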

Emulation usually does not push GPUs (e.g. NVIDIA/AMD) hard at all, and they often sleep for several milliseconds at random points throughout rendering when GPU utilization is extremely low.

Our workarounds, which we needed in order to pull off beamracing feats, cut GPU render jitter dramatically, by 90%-99% (depending on GPU). (Some of this rendertime dejittering is applicable to emulator authors who want to improve framepacing accuracy, even without beamracing.)

Once I made sure to never let a GPU idle for long -- I was able to get sub-1ms precision in browser based stuff. I can now definitively say it is possible to get a "lagless vsync" algorithm (sync emulator raster to real raster, like WinUAE does) into a browser-based emulator, to allow JavaScript to approach within ~2ms of original machine latency / FPGA latency. Imagine that. But it's not a very intuitive algorithm -- however, it's also applicable to simpler sync of emulator Hz to real Hz.

Reply 3 of 4, by mdrejhon

Still discussing behind the scenes, how to publicize browser-based raster beam racing.

Practical prototype confirmed on several Windows machines. Works in multiple Chromium forks, although not in Edge or Firefox.

P.S. I wish an HTML5 API existed to turn VSYNC ON/OFF, but perhaps publicizing this will convince the web standards bodies to provide an API to activate VSYNC OFF mode.

Reply 4 of 4, by mdrejhon

Video of Kefrens Bars in Javascript via VSYNC OFF frameslice beam racing: https://www.youtube.com/watch?v=Bk20l7akRUk

It's doing 2000 frames/sec, so that's 2000 pixel rows/sec; it works best at low refresh rates.

Also unscreenshottable (only captures one pixel row as a stretched full-height frame), so it's true raster.

It may be time to collaborate on creating a cross-platform VSYNC timestamps library/daemon, based on the information I've found. This would be a good way to make all of this much easier.
