superfury wrote on 2023-07-11, 01:22: Wouldn't that simply be filling a 1-byte FIFO buffer during each T3 state with the read/write value, cleared after T4 occurs (finishing cycle)? Then simply replace CGA read ram when fetching for rendering with said value.
Although UniPCemu still renders the EGA/VGA way, latching 4 bytes from all 4 planes in one go into the shift register.
Should I just replace all 4 bytes with the buffer contents if non-empty?
I admit I'm not familiar enough with the CGA's CRTC *specifically* to understand what kind of few-pixels-buffer D/A conversion workflow it might or might not have -- or whether the CRTC has on-chip memory / shift registers -- or whether it did it more directly off the main memory (unbuffered), etc. Theoretically a sufficient RAMDAC FIFO could have eliminated CGA snow, but there are so many things to go wrong, especially in the early age, e.g. unexpected waitstates longer than a single-byte/single-word FIFO.
I'm just providing a generic-RAMDAC-knowledge theory of possible offset for your snow where it might be offset a few pixels to the left (offscreen) versus the real 5150/5160.
___
If it behaved roughly like a typical (slightly more recent) retro RAMDAC, it was often a separate chip -- and it often contained a super tiny amount of on-chip FIFO memory separate from the main memory. Sometimes only a few bits, perhaps just enough for maybe 2, 4, 8 or 16 horizontal pixels. That level of tininess is almost unmeasurable, and not usually interesting to emulation since you're only concerned about the pre-analog side -- but it is possibly theoretically relevant to CGA snow emulation.
It may be implemented as a shift register, or a circular buffer, or other, to assist the D/A conversion.  Can't know for sure without delidding the silicon and reverse engineering, but you can at least oscilloscope the digital side and analog side simultaneously (with a good high-samplerate oscilloscope), and check the phase offset -- and count the cycles/pixel offset that way.
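To turn such a scope measurement into a pixel count, the arithmetic is just delay times dot clock. A minimal sketch, assuming the standard CGA dot clock of 14.31818 MHz in 640-pixel modes (half that in 320-pixel modes); the function name and the 280 ns figure are made up for illustration, not a measured value:

```python
CGA_DOT_CLOCK_HZ = 14_318_180  # 640-pixel modes; 320-pixel modes run at half this


def delay_to_pixels(delay_ns, dot_clock_hz=CGA_DOT_CLOCK_HZ):
    """Convert a digital-to-analog phase offset (in ns) into a pixel count."""
    return delay_ns * 1e-9 * dot_clock_hz


# Hypothetical scope reading: 280 ns between the digital fetch and the analog edge.
print(delay_to_pixels(280))                        # in 640-pixel mode
print(delay_to_pixels(280, CGA_DOT_CLOCK_HZ / 2))  # in 320-pixel mode
```

A result close to a whole number would hint at a pixel-clocked buffer stage; a fractional result would point more towards plain analog propagation delay.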
For the FIFO-behavior location, there can be two 'internal' places (additional potential conflicts away from main memory) where memory reads/writes can theoretically collide -- the digital side (where the buffer writeout side injects the pixels) -- and the analog side (where the buffer readout finally converts to analog). Depending on which of those locations the memory collision occurred at, there might or might not be a tapedelay effect on the snow result.
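Here is a toy model of that tapedelay effect, purely as a sketch of the theory and not a claim about what the real CGA does. It assumes a hypothetical pixel pipe of `delay_pixels` stages sitting between the memory fetch and the output, with collisions corrupting the value on the fetch (digital) side; all names and values are invented for illustration:

```python
from collections import deque


def scan_out(fetched, collisions, delay_pixels, bus_value=0xFF, idle=0x00):
    """Push one scanline through a delay pipe of `delay_pixels` stages.

    fetched      -- pixel values read from video memory, in fetch order
    collisions   -- set of fetch positions where a CPU access collided
    bus_value    -- what the fetch sees instead of video data on a collision
    """
    pipe = deque([idle] * delay_pixels)  # pipe starts out holding idle pixels
    out = []
    for x, pixel in enumerate(fetched):
        if x in collisions:
            pixel = bus_value            # snow is injected on the digital side
        pipe.append(pixel)
        out.append(pipe.popleft())       # what reaches the output this pixel clock
    return out


line = list(range(1, 13))
unbuffered = scan_out(line, {5}, delay_pixels=0)
buffered = scan_out(line, {5}, delay_pixels=3)
print(unbuffered.index(0xFF), buffered.index(0xFF))
```

With no pipe, the snow lands at the fetch position itself; with a 3-stage pipe, it lands 3 pixel clocks later relative to the fetch. If the emulator assumes the wrong depth, the snow ends up offset by that difference compared to the real machine.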
  
While I am an expert at Present()-to-Photons, my per-retro-platform knowledge of specific RAMDAC/transceiver ultratiny-tapedelay behaviors (e.g. VGA vs EGA vs CGA vs Matrox vs ATI vs Nintendo vs Sega vs etc) is limited -- different chips RAMDAC'd differently. And even into the digital era, we still have tapedelay-style buffering lag (still under 1ms) at the port level. The HDMI/DisplayPort transceiver / multiplexing (audio/video/etc) / micropacketization latency in the HDMI/DP chip (the bigger digital version of RAMDAC tapedelay effects, typically a tapedelay effect of a scanline or a few) is also a giant rabbit hole unto itself.
While DisplayPort was micropacketized practically from the beginning (page 5, PDF from displayport.org 2008 on Internet Archive), HDMI versions had a progression from more rudimentary transmission (for passive adaptoring between HDMI and DVI) to a multiplexing-capable micropacket format. Note that early HDMI used separate wires for audio (and still does today for the baseline minimum audio spec), but newer versions of high-bitrate HDMI multiplex more digital streams onto the high-bandwidth wires, to accommodate higher-bitrate audio formats and other metadata, etc, so HDMI 2.x is way more packetized than HDMI 1 was 20-ish years ago. And now we have optional compression like DSC. So there is more transceiver buffering on both ends nowadays to dejitter the micropackets into a constant-rate bitstream for the display scaler/tcon -- even if less than 1ms. But way more than a few pixels, indeed.
But fortunately you don't have to worry about real world behaviors, since you're also cycle-accurately emulating the display and composite output into a digital framebuffer, and can compensate accordingly without worrying one iota about modern RAMDAC/transceiver behaviors. You only need to focus on accurate CGA behavior including snow -- you're lucky.
A great way to learn about the responsibilities of a RAMDAC is this video -- https://youtu.be/l7rce6IQDWs -- about building your own rudimentary graphics adaptor out of common electronics parts.
  
And a YouTuber did exactly that -- it is highly educational on why a RAMDAC needs a shift register or FIFO to dejitter the digital side before analog output, as a CRT tube will not pause mid-scanline even for a nanosecond for you. Someone rolled their own electronics to create a de facto graphics adaptor (outputting low rez large pixels via an SVGA signal) using homebrew electronics chips / Arduino style microcontroller -- and found out that things like memory timings (waitstates?) inserted glitches as thin black vertical lines between large pixels into the VGA video signal. A proper RAMDAC would use a FIFO or shift register (away from main video RAM) of a few pixels to prevent glitches like these. Ironically -- in theory, if properly implemented in 1981, this could have fully prevented CGA snow (if colliding memory access events were automatically serialized, and the FIFO smoothed that digital timing jitter over into a merrily jitter-free analog output). Ha.
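The dejitter idea can be shown with a tiny simulation. This is only a sketch under my own assumptions, not how any particular chip works: one pixel leaves per tick no matter what, memory can deliver up to `fetch_per_tick` pixels per tick except on stalled ticks, and the FIFO is prefilled during blanking. All names are hypothetical:

```python
from collections import deque


def scan_line(n_ticks, stalled_ticks, fifo_depth, fetch_per_tick=2):
    """Return the pixel index shown on each tick; None marks a black glitch."""
    out = []
    next_pixel = 0
    if fifo_depth == 0:
        # Unbuffered: a stalled fetch means nothing to show on that tick,
        # and every later pixel ends up shifted to the right.
        for t in range(n_ticks):
            if t in stalled_ticks:
                out.append(None)
            else:
                out.append(next_pixel)
                next_pixel += 1
        return out
    fifo = deque(range(fifo_depth))      # prefilled during horizontal blanking
    next_pixel = fifo_depth
    for t in range(n_ticks):
        # Output side: the beam never waits, so pop or glitch.
        out.append(fifo.popleft() if fifo else None)
        # Fetch side: refill (possibly catching up) unless memory is stalled.
        if t not in stalled_ticks:
            for _ in range(fetch_per_tick):
                if len(fifo) < fifo_depth:
                    fifo.append(next_pixel)
                    next_pixel += 1
    return out


stalls = {4, 5, 10}
print(scan_line(16, stalls, fifo_depth=0))  # glitches plus shifted pixels
print(scan_line(16, stalls, fifo_depth=4))  # clean, stalls absorbed
print(scan_line(16, {4, 5}, fifo_depth=1))  # stall longer than the FIFO: glitches
```

The last call is the "waitstates longer than a single-byte FIFO" case: a FIFO only hides stalls that are shorter than what it holds, and only if the fetch side can catch up afterwards.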
(Screenshot from YouTube of home made graphics adaptor, showing vertical black lines from memory-timings-induced glitches from not having a FIFO/shiftregister dejitterer during D/A conversion -- some "pixels" are even horizontally offset relative to previous scanlines! The same artifact appears on any VGA-capable display, whether a CRT tube or an LCD converting the VGA output)
