VOGONS


MartyPC


Reply 40 of 181, by Scali

Rank: l33t
GloriousCow wrote on 2023-07-09, 18:43:

4 phases would make sense if we're talking about the CPU vs PIT, since 1 PIT tick is 4 CPU cycles.

I don't recall the exact details. I suppose this is something that needs to be verified against real hardware.
But yes, there is some quirk whereby not all components start at the same relative phase to each other every time, which is quite unique; I don't recall any other system having such an issue.
But it might go beyond just the CPU and the PIT. It could also affect for example the CRTC, the DMA controller, and who knows what other peripherals.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 41 of 181, by GloriousCow

Rank: Member
Scali wrote on 2023-07-10, 12:56:

But it might go beyond just the CPU and the PIT. It could also affect for example the CRTC, the DMA controller, and who knows what other peripherals.

The CGA phase would shift the order of the resulting wait states, which could matter in theory; but unless people had to reboot the system 16 times to keep Kefrens stable, it's apparently not causing problems in practice.
The DMA controller, as far as I can tell, gets the same clock as the CPU, so it should synchronize on a DREQ when that whole process kicks off.

We could have system tick boot-offsets for all of these devices; if there's a rationale for it, it's easy to add.
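For instance, a minimal sketch of what that could look like (illustrative names only, not MartyPC's actual internals): each device gets a clock divisor relative to the master system tick, plus a randomized power-on phase offset.

```rust
// Illustrative sketch only -- not MartyPC's actual internals.
trait Device {
    fn tick(&mut self);
}

struct ScheduledDevice<D: Device> {
    device: D,
    divisor: u64,     // system ticks per device tick (e.g. the PIT ticks once per 4 CPU clocks)
    boot_offset: u64, // chosen in 0..divisor at power-on to model the boot-to-boot phase quirk
}

impl<D: Device> ScheduledDevice<D> {
    fn run_system_tick(&mut self, system_tick: u64) {
        // The device ticks only when the system tick lands on its phase.
        if (system_tick + self.boot_offset) % self.divisor == 0 {
            self.device.tick();
        }
    }
}
```

With the PIT, the divisor relative to CPU clocks would be 4, so boot_offset would take one of 4 phases -- matching the quirk Scali describes.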

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 42 of 181, by mdrejhon

Rank: Newbie
GloriousCow wrote on 2023-07-08, 16:10:

We have snow!

A nice good old fashioned Canadian snowstorm!

(I'm Canadian, by the way; ATI and Matrox were Canadian companies too.)

I suppose my question was a fantastic one that triggered your decision to implement CGA snow -- because artifacts are "unexpectedly" a great debugging tool (much as tearline positions were a great visual raster-debugging tool for me in modern GPU-based beam racing through modern APIs).

I suspect there's a slight tapedelay effect (off by a cycle or two) for CGA snow. I did hear that some (many?) retro graphics chips had an ultra-tiny FIFO buffer (just a few pixels) to allow for things like one-cycle or partial-cycle jitter in processing (e.g. reading character map data from wait-stated memory). So it is possible that the actual snow output is delayed by a clock or a fraction thereof. I don't know whether this behavior applied to the CRTC, though.

So, if you want to reverse-engineer whether CGA snow undergoes a tiny tapedelay effect:
An oscilloscope on a real machine (probing the digital side of the CRTC, then the analog side) may determine how big a theoretical undocumented D/A-conversion FIFO buffer is (it might be a tiny tapedelay of 8 pixels, for example).

However...
...it's probably easier to try a temporary artificial offset and see if you can reproduce the snow position in Scali's post.

So the CRTC may unexpectedly have the 'behavior' of an early proto-RAMDAC (Random Access Memory Digital-to-Analog Converter): adding a very tiny FIFO-like tapedelay, sometimes of only a few pixels.

RAMDACs have an extremely minor (almost unmeasurable) latency effect. On-chip buffer memory in a retro "RAMDAC" (whether tapedelayed or not) may also be utilized for SECAM, and possibly PAL -- I heard that delay-line logic often improved PAL picture quality further. But the memory in the RAMDAC may still be utilized for NTSC or RGBI for other reasons, such as microscopic amounts of de-jittering during D/A conversion, or attenuating out certain kinds of composite artifacts, such as reducing the visibility of horizontally-moving beads. These are all tricks that signal engineers came up with to cope with a metaphorically impatient analog beam that never waits a nanosecond for the digital side to finish the deed. So tiny on-chip FIFO buffers of a few bytes often exist in the D/A conversion logic. While RAMDACs are fairly lagless (microsecond-league), they weren't perfectly lagless even during the VGA era;

From Scali's post:
file.php?id=167984

Last edited by mdrejhon on 2023-07-11, 01:23. Edited 1 time in total.

Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 43 of 181, by superfury

Rank: l33t++
mdrejhon wrote on 2023-07-11, 00:57:
A nice good old fashioned Canadian snowstorm! […]

Wouldn't that simply be a 1-byte FIFO buffer that's filled during each T3 state with the read/write value and cleared after T4 occurs (finishing the cycle)? Then the CGA's RAM read when fetching for rendering would simply be replaced with said value.
Although UniPCemu still renders the EGA/VGA way, latching 4 bytes from all 4 planes in one go into the shift register.
Should I just replace all 4 bytes with the buffer contents if non-empty?

Edit: Does the CGA, like the VGA in CGA-compatible mode, fetch 2 bytes (1 from plane 0 and 1 from plane 1) in one read, then parse them (text mode), shift odd/even (4-color mode), or shift serially (monochrome mode) as the VGA does?

Last edited by superfury on 2023-07-11, 01:30. Edited 2 times in total.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 44 of 181, by mdrejhon

Rank: Newbie
superfury wrote on 2023-07-11, 01:22:

Wouldn't that simply be a 1-byte FIFO buffer that's filled during each T3 state with the read/write value and cleared after T4 occurs (finishing the cycle)? Then the CGA's RAM read when fetching for rendering would simply be replaced with said value.
Although UniPCemu still renders the EGA/VGA way, latching 4 bytes from all 4 planes in one go into the shift register.
Should I just replace all 4 bytes with the buffer contents if non-empty?

I admit I'm not familiar enough with the CGA's CRTC *specifically* to understand what kind of few-pixels-buffer D/A conversion workflow it might or might not have -- or whether the CRTC has on-chip memory / shift registers, or reads more directly off the main memory (unbuffered), etc. Theoretically a sufficient RAMDAC FIFO could have eliminated CGA snow, but there were so many things that could go wrong, especially in that early era, e.g. unexpected wait states longer than a single-byte/single-word FIFO could absorb.

I'm just providing a generic-RAMDAC-knowledge theory of possible offset for your snow where it might be offset a few pixels to the left (offscreen) versus the real 5150/5160.

___

If it behaved roughly like a typical (slightly more recent) retro RAMDAC, it was often a separate chip -- and it often contained a super tiny amount of on-chip FIFO memory separate from the main memory. Sometimes only a few bits, perhaps just enough for maybe 2, 4, 8 or 16 horizontal pixels. That level of tininess is almost unmeasurable, and not usually interesting to emulation since you're only concerned with the pre-analog side -- but it's possibly relevant to CGA snow emulation.

It may be implemented as a shift register, a circular buffer, or something else, to assist the D/A conversion. You can't know for sure without delidding the silicon and reverse engineering it, but you can at least scope the digital side and the analog side simultaneously (with a good high-sample-rate oscilloscope), check the phase offset, and count the cycle/pixel offset that way.

For the FIFO-behavior location, there can be two 'internal' places (additional potential conflicts away from main memory) where memory reads/writes can theoretically collide -- the digital side (where the write side injects pixels into the buffer) and the analog side (where the buffer readout finally converts to analog). Depending on where the memory collision occurred, there might or might not be a tapedelay effect in the snow result.
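To make the idea concrete, here's a minimal Rust sketch of the kind of few-pixel FIFO I'm describing -- purely hypothetical, not any specific chip: the digital side pushes pixels as memory timing allows, the analog side pops one pixel per dot clock and never waits.

```rust
// Hypothetical few-pixel output FIFO; a depth-N FIFO adds an N-pixel
// "tapedelay" by construction.
struct PixelFifo<const N: usize> {
    buf: [u8; N],
    head: usize, // analog-side read position
    tail: usize, // digital-side write position
    len: usize,
}

impl<const N: usize> PixelFifo<N> {
    fn new() -> Self {
        Self { buf: [0; N], head: 0, tail: 0, len: 0 }
    }
    /// Digital side: push a pixel whenever the fetch completes (may jitter).
    fn push(&mut self, px: u8) {
        if self.len < N {
            self.buf[self.tail] = px;
            self.tail = (self.tail + 1) % N;
            self.len += 1;
        } // else: overflow -> visible glitch, the case a deeper FIFO prevents
    }
    /// Analog side: pop exactly one pixel per dot clock, never waiting.
    fn pop(&mut self) -> u8 {
        if self.len == 0 {
            return 0; // underrun -> black glitch, like the vertical lines below
        }
        let px = self.buf[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        px
    }
}
```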


While I am an expert at Present()-to-Photons, my knowledge of specific per-retro-platform RAMDAC/transceiver ultra-tiny-tapedelay behaviors (e.g. VGA vs EGA vs CGA vs Matrox vs ATI vs Nintendo vs Sega vs etc.) is limited -- different chips RAMDAC'd differently. And even into the digital era, we still have tapedelay-style buffering lag (still under 1ms) at the port level. The HDMI/DisplayPort transceiver / multiplexing (audio/video/etc.) / micropacketization latency in the HDMI/DP chip (the bigger digital version of the RAMDAC tapedelay effect, typically a tapedelay of a scanline or a few) is a giant rabbit hole unto itself. While DisplayPort was micropacketized practically from the beginning (page 5, PDF from displayport.org 2008 on Internet Archive), HDMI progressed from more rudimentary transmission (allowing passive adapting between HDMI and DVI) to a multiplexing-capable micropacket format. Note that early HDMI used separate wires for audio (and still does today for the baseline minimum audio spec), but newer versions of high-bitrate HDMI multiplex more data onto the high-bandwidth wires to accommodate higher-bitrate audio formats and other metadata, so HDMI 2.x is far more packetized than HDMI 1 was 20-ish years ago. And now we have optional compression like DSC. So there's more transceiver buffering on both ends nowadays to dejitter the micropackets into a constant-rate bitstream for the display scaler/TCON -- even if less than 1ms. But way more than a few pixels, indeed.

But fortunately you don't have to worry about real-world behaviors, since you're also cycle-accurately emulating the display and composite output into a digital framebuffer, and can compensate accordingly without worrying one iota about modern RAMDAC/transceiver behaviors. You only have to focus on accurate CGA behavior, including snow -- you're lucky.

A great way to learn about the responsibilities of a RAMDAC is to build your own rudimentary graphics adaptor out of common electronics parts -- and this video does exactly that: https://youtu.be/l7rce6IQDWs

The YouTuber in that video did exactly that -- and it's highly educational on why a RAMDAC needs a shift register or FIFO to dejitter the digital side before analog output, as a CRT tube will not pause mid-scanline for you, not even for a nanosecond. He rolled his own electronics to create a de facto graphics adaptor (outputting low-res large pixels via an SVGA signal) using homebrew chips / an Arduino-style microcontroller -- and found that memory timings (wait states?) inserted glitches into the VGA video signal as thin black vertical lines between large pixels. A proper RAMDAC would use a FIFO or shift register (away from main video RAM) of a few pixels to prevent glitches like these. Ironically, if properly implemented in 1981, this could have fully prevented CGA snow (if colliding memory-access events were automatically serialized, and the FIFO smoothed that digital timing jitter over in the merrily jitter-free analog output). Ha.

(Screenshot from YouTube of the home-made graphics adaptor, showing vertical black lines from memory-timing-induced glitches caused by not having a FIFO/shift-register dejitterer during D/A conversion -- some "pixels" are even horizontally offset relative to previous scanlines! The same artifact appears on any VGA-capable display, whether a CRT tube or an LCD converting the VGA output.)

file.php?mode=view&id=168127

Attachments:
  • home-made-adaptor.png (503.38 KiB, fair use/fair dealing exception)
Last edited by mdrejhon on 2023-07-11, 06:25. Edited 8 times in total.

Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 45 of 181, by superfury

Rank: l33t++
mdrejhon wrote on 2023-07-11, 01:26:
I admit I'm not familiar enough with the CGA's CRTC *specifically* to understand what kind of few-pixels-buffer D/A conversion w […]

Well, for now I tried the algorithm I mentioned, and the results were surprising:
https://www.dropbox.com/s/tqwz6lwdcabstnd/Uni … 4-28-06.7z?dl=0

The CGA snow is clearly present during 8088MPH running inside UniPCemu, although it might be a bit much? Or is that a sign of a different problem in UniPCemu's CGA emulation?
UniPCemu simply makes the access (which lasts from T1 through T4) fill a buffer on T1 (only when reading/writing to VRAM), clearing it after T1 ticks again.
Then the CGA/MDA card is simply modified so that when it reads its RAM (for filling its latches) it checks whether the buffer is filled. If it is, both bytes read from VRAM (char/attr, or 2 bytes of rendering data for the graphics modes (planes 0/1 data)) are replaced with a duplicated copy of the value in the buffer (so the two bytes to render get replaced with the value that's being read/written to/from VRAM).
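In other words, the scheme boils down to something like this (a Rust-flavored sketch rather than UniPCemu's actual C code; names are illustrative):

```rust
// A CPU access to VRAM latches the bus value; a CRTC fetch that overlaps
// the in-flight access sees that value instead of the addressed bytes.
struct SnowLatch {
    value: Option<u8>, // bus value captured while a CPU VRAM access is in flight
}

impl SnowLatch {
    // Called when the CPU starts a VRAM read/write (T1).
    fn cpu_access(&mut self, bus_value: u8) {
        self.value = Some(bus_value);
    }
    // Called when the next T1 begins (access complete).
    fn cpu_access_done(&mut self) {
        self.value = None;
    }
    // Called for each CRTC fetch of a (char, attr) or plane-pair.
    fn fetch(&self, char_byte: u8, attr_byte: u8) -> (u8, u8) {
        match self.value {
            // Collision: both bytes replaced with the in-flight bus value.
            Some(v) => (v, v),
            None => (char_byte, attr_byte),
        }
    }
}
```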

Edit: Most 8088 MPH effects now get garbled up quite a bit (due to the buffer being filled while rendering)? Pretty much all the effects are affected by it. It looks like it's happening during almost every write to VRAM (whenever the display is overwritten with something else).

I see some interesting effects later on as well, though (especially the Kefrens part):
https://www.dropbox.com/s/l3417r7x32k41yy/Uni … 4-42-27.7z?dl=0

I think I see some kind of pattern in the noise?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 46 of 181, by Scali

Rank: l33t

If you have implemented your wait states and bus logic correctly, then snow should only occur in 80c textmode, as that's the only mode that uses twice the bandwidth.
Wait states prevent the CPU from accessing the bus during regular VRAM reads in 40c textmode and the graphics modes. So although you could theoretically enable the snow emulation only in 80c mode, a more correct implementation has snow enabled all the time, with the wait state emulation preventing it from occurring in any mode other than the high-bandwidth 80c mode: 80c mode does twice as many fetches from VRAM, and the wait state only prevents snow on every other fetch.
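As a minimal sketch (hypothetical names, not any particular emulator's code), the fetch path doesn't need a mode check at all; the wait-state logic keeps the CPU off the bus everywhere except in 80c mode:

```rust
// Snow logic runs unconditionally; correct wait-state emulation means the
// CPU can only actually be on the bus during a fetch in 80c textmode,
// where the CGA does twice as many fetches and the wait state only
// protects every other one.
fn crtc_fetch(vram: &[u8], addr: usize, cpu_on_bus: bool, bus_value: u8) -> u8 {
    if cpu_on_bus {
        // Collision: the fetch returns whatever the CPU has on the bus.
        // In 40c/graphics modes, wait states hold the CPU off the bus for
        // every fetch slot, so this branch simply never fires there.
        bus_value
    } else {
        vram[addr]
    }
}
```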

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 47 of 181, by mdrejhon

Rank: Newbie

On the related "video port idiosyncrasies" talk such as RAMDAC/transceiver lag behaviors, there's another rabbit hole: the GPU-vs-CPU clock slew effect (unrelated to CGA snow).

So as another potential latency optimization for emulator authors, I added yet another lag-reducing tip/algorithm (much easier than WinUAE lagless vsync):
New easier lag-reduction algorithm sub-post within the lagless vsync thread

This is because CPU and GPU clocks are never synchronized on modern machines.

So CPU and GPU clocks can drift apart, and the drift speed can change microscopically as the machine warms up (thermals), even if it's only a 0.001Hz change. For DOS emulators syncing to a CPU clock, this creates a latency-sawtoothing effect: a cyclic [no added lag ... +one refresh cycle of added lag] latency that slowly beat-frequencies over, say, 60 minutes, depending on how fast the emulator drifts versus your custom non-VRR fixed-Hz refresh rate mode.

That can happen gradually over time if the emulator clocks the emulated refresh cycle via the CPU (e.g. RDTSC or QueryPerformanceCounter) rather than via the GPU as the refresh-rate clock source (e.g. VSYNC detection). The former is good for VRR, while the latter is better for fixed-Hz. This can also be done automatically, for user friendliness. Selecting the master clock source for emulated refresh cycles is a useful setting that several emulators have (e.g. "Sync to refresh rate ON/OFF") as a de facto switch between CPU and GPU clock, to prevent the latency-slewing effects -- which can affect muscle memory for video games.
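As a hedged sketch of the two pacing strategies (the vblank wait below is a placeholder for whatever present/swapchain-blocking call your frontend actually has; names are illustrative):

```rust
use std::time::{Duration, Instant};

enum RefreshClock {
    HostTimer { frame: Duration, next: Instant }, // good for VRR displays
    Vsync,                                        // good for fixed-Hz displays
}

impl RefreshClock {
    fn wait_for_next_frame(&mut self, wait_for_vblank: impl Fn()) {
        match self {
            RefreshClock::HostTimer { frame, next } => {
                // Paced off the host CPU clock: drifts slowly against the
                // GPU's refresh, producing the latency sawtooth described.
                let now = Instant::now();
                if *next > now {
                    std::thread::sleep(*next - now);
                }
                *next += *frame;
            }
            RefreshClock::Vsync => {
                // Paced off the display itself: no drift, fixed-Hz friendly.
                wait_for_vblank();
            }
        }
    }
}
```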

Remember, even a continual 2ms lag change (over time) can mean a 2-millimeter "muscle memory" misaim for objects moving 1000 millimeters per second (e.g. archery arrows, moving targets, pinballs, FPS enemies in esports, other fast-moving objects, etc.). The latency sawtooth can span a full refresh cycle (60Hz = 1/60sec = 16.7ms), so a game may lag 16ms more or less in the next minute (or hour) than in the previous one, depending on how fast the latency slews between a CPU-clocked emulator refresh cycle and the GPU refresh cycle. It's as if the slew effect is wreaking havoc on your average human reaction time (aiming): 16ms is over 10% of a 150ms human reaction time (well-attuned players can hit 100ms for button-press reactions once you subtract the software and Present()-to-Photons latencies, which is why you should subtract 2-3 refresh cycles from a web-browser benchmark such as humanbenchmark). This can mean the difference between feeling you're playing expertly, and feeling you're not playing as well as you did on the original machine.

Scali wrote on 2023-07-11, 05:48:

If you have implemented your wait states and bus logic correctly, then snow should only occur in 80c textmode, as that's the only mode that uses twice the bandwidth.

Great point on this!

Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 48 of 181, by superfury

Rank: l33t++
Scali wrote on 2023-07-11, 05:48:

If you have implemented your wait states and bus logic correctly, then snow should only occur in 80c textmode, as that's the only mode that uses twice the bandwidth.
Wait states prevent the CPU from accessing the bus during regular VRAM reads in 40c textmode and the graphics modes. So although you could theoretically enable the snow emulation only in 80c mode, a more correct implementation has snow enabled all the time, with the wait state emulation preventing it from occurring in any mode other than the high-bandwidth 80c mode: 80c mode does twice as many fetches from VRAM, and the wait state only prevents snow on every other fetch.

Well, the wait states on a CGA memory read/write (from the CPU) last from T1 (the start of the memory access) until the next match on the CGA rendering side (first a straight 8 hdots, then waiting for modulo 15 on the pixel counter to become 0 (the next lchar)). The counter for that increases every video card clock and is never reset.
After that lchar (modulo 15)==0 match, the CPU will start the next access to VRAM. During that entire stretch, the BIU is effectively halted in the Tw state, with the buffer (the latch) kept filled because of the VRAM access.
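As a small sketch of that rule (illustrative names; the 8-hdot lead-in and the modulo-15 boundary are taken from the description above and deserve verification against real hardware):

```rust
// "Modulo 15 on the pixel counter becoming 0" is taken at face value here;
// if it actually means (counter & 15) == 0, the period would be 16 hdots.
const LCHAR_PERIOD: u64 = 15;

fn cga_wait_hdots(hdot_counter_at_t1: u64) -> u64 {
    // First a straight 8 hdots from T1...
    let after_lead_in = hdot_counter_at_t1 + 8;
    // ...then stall in Tw until the free-running hdot counter reaches the
    // next lchar boundary.
    let to_boundary = (LCHAR_PERIOD - (after_lead_in % LCHAR_PERIOD)) % LCHAR_PERIOD;
    8 + to_boundary
}
```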

Edit: After some more work on the BIU and CGA (to make it properly count the waitstates and keep the BIU in T1 state) it now seems to work properly.

It tries to execute a memory access first. Then, if any wait states occur on the access, the T1 state is kept so it can try again next time, and the access is transformed into a wait-state access on the BIU instead.

The CGA/MDA, on the other hand, simply performs a check when starting an access. When that check hasn't yet been done for the current access, it will mark the access as started and set a flag in the CPU halt flags, which is checked once the CPU's halt state (due to the wait state) is released. That release is done by the hdot counter mentioned above the Edit in this post. Once the CPU again tries to execute T1, the CGA/MDA will recognise the flag set in the CPU halt flags and let the access succeed completely. After that, the access to VRAM (actually performed on every wait state) will succeed and see the flag; it will fill the RAM-access buffer for the snow and clear the bit in the CPU halt flags that marks it to do so, completing the access. Since the BIU/CPU isn't halted anymore at that point, it will actually proceed to T2 until the operation finishes.

8088MPH still shows quite a lot of snow though?

Edit: Although the Kefrens effect still has noise (at what seem like regular intervals across the screen on certain scanlines), it keeps sync with the display now at last 😁
Edit: Made a capture of 8088MPH again with the snow emulated:
https://www.dropbox.com/s/2820fv5fojci2qc/Uni … 3-16-24.7z?dl=0

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 49 of 181, by GloriousCow

Rank: Member

If I may be so bold, I'd like to nudge this conversation over to a new thread I've started on CGA snow specifically. superfury, I will reply to your post there.

Emulating CGA Snow

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 50 of 181, by keenmaster486

Rank: l33t

Very nice project. Excited to see EGA emulation introduced.

I may be missing this, but is there a way to configure the video output? That is, the scaling method, fullscreen, hardware accelerated vs. not, etc.?

One other thing I noticed is that the navigational keys (home, end, etc.) don't work.

I had to get all my ROMs from minuszerodegrees. Using ROMs from 86box and other sources simply did not work.

Also, the hard drive controller emulation did not work when the machine type was set to 5150. I assume this configuration would work in real life, so why not allow it?

World's foremost 486 enjoyer.

Reply 51 of 181, by GloriousCow

Rank: Member
keenmaster486 wrote on 2023-07-11, 15:20:

Very nice project. Excited to see EGA emulation introduced.

Thanks! Just need to tidy up the design of my display system and I can restart work on EGA and VGA, which brings us to the next point...

keenmaster486 wrote on 2023-07-11, 15:20:

I may be missing this, but is there a way to configure the video output? That is, the scaling method, fullscreen, hardware accelerated vs. not, etc.?

That's coming soon. The 'pixels' crate I'm using has just the one scaling method, where it only scales the pixel buffer by an integer factor. This is nice for keeping things crisp, but does lead to a lot of negative space. I might be outgrowing this crate, and may just use bare wgpu going forward; in any case, I'm planning on having several scaling options (Integer, Fit, and Stretch, for starters).
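Roughly, the three modes just compute different output rectangles (an illustrative sketch, not the actual scaler code):

```rust
enum ScalingMode { Integer, Fit, Stretch }

// Given the emulated frame size and the window size, compute the output size.
fn output_size(mode: ScalingMode, src: (u32, u32), dst: (u32, u32)) -> (u32, u32) {
    let (sw, sh) = (src.0 as f64, src.1 as f64);
    let (dw, dh) = (dst.0 as f64, dst.1 as f64);
    match mode {
        // Largest whole multiple that fits: crisp pixels, negative space.
        ScalingMode::Integer => {
            let k = (dw / sw).min(dh / sh).floor().max(1.0);
            ((sw * k) as u32, (sh * k) as u32)
        }
        // Fill one axis, preserve aspect ratio on the other.
        ScalingMode::Fit => {
            let k = (dw / sw).min(dh / sh);
            ((sw * k) as u32, (sh * k) as u32)
        }
        // Fill the window, ignore aspect ratio.
        ScalingMode::Stretch => (dst.0, dst.1),
    }
}
```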

Unfortunately, hardware acceleration will always be on. The GUI requires it, and the scalers are all shaders. I am thinking about a software frontend in the future for lower-spec machines, but you won't have the debugger.

keenmaster486 wrote on 2023-07-11, 15:20:

One other thing I noticed is that the navigational keys (home, end, etc.) don't work.

Should be fixed in 0.1.4, which is introducing an entirely rewritten keyboard system including low-level keyboard emulation and optional keyboard remapping and macro support. (woo!)

keenmaster486 wrote on 2023-07-11, 15:20:

I had to get all my ROMs from minuszerodegrees. Using ROMs from 86box and other sources simply did not work.

Would you be so kind as to give me the filenames and md5's you tried using? The 86box ROM set should definitely work.
On the ROMs section of my wiki, https://github.com/dbalsom/martypc/wiki/ROMs, I list the 86box ROMs that correspond to the MZD ROMs. I'm curious whether you have an older or newer set(?)

keenmaster486 wrote on 2023-07-11, 15:20:

Also, the hard drive controller emulation did not work when the machine type was set to 5150. I assume this configuration would work in real life, so why not allow it?

Probably. You need a specific version of the IBM 5150 BIOS with option ROM support for the hard drive to work. The ROM situation is painful enough as it is, so I didn't want to add confusion there. I can probably add a flag to my ROM definitions that specifies whether a BIOS supports option ROMs, then give an error if we load a ROM set that doesn't support them and then try to add a device with an option ROM. But I'll be reworking the whole ROM management and machine definition stuff as well - the ROM database will become an external file so people can contribute new definitions or add their own custom ones.
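Something like this hypothetical shape, say (field names invented for illustration, not the actual schema):

```rust
#[derive(Debug)]
struct RomSetDef {
    name: String,
    md5: String,
    machine: String,            // e.g. "ibm5150"
    supports_option_roms: bool, // gates hard disk controllers & other option-ROM devices
}

fn validate(rom: &RomSetDef, wants_option_rom_device: bool) -> Result<(), String> {
    if wants_option_rom_device && !rom.supports_option_roms {
        return Err(format!(
            "ROM set '{}' does not support option ROMs; cannot attach device",
            rom.name
        ));
    }
    Ok(())
}
```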

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 52 of 181, by GloriousCow

Rank: Member

Here's a peek at the new keyboard system for 0.1.4

https://www.youtube.com/watch?v=qEV8z-4kGV8

It may not seem like much, but mapping a modern keyboard to its 1980s-equivalent scancodes so your keys do what you expect is not trivial! For example, the 1980s Italian keyboard layout had no pipe character, so we have to synthesize one with an alt-codes macro.
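For example, the pipe macro boils down to injecting an Alt+numpad 1-2-4 sequence (ASCII 124 = '|'). A sketch with set-1 scancodes and an invented structure, not the actual remapper code:

```rust
// A host key resolves either to a single emulated scancode or to a macro
// (a sequence of scancodes injected in order).
enum KeyAction {
    Scancode(u8),
    Macro(Vec<u8>),
}

fn italian_pipe_macro() -> KeyAction {
    // Alt + keypad 1, 2, 4 produces ASCII 124 ('|') via BIOS alt-codes.
    KeyAction::Macro(vec![
        0x38,       // Alt make
        0x4F, 0xCF, // keypad 1 make/break
        0x50, 0xD0, // keypad 2 make/break
        0x4B, 0xCB, // keypad 4 make/break
        0xB8,       // Alt break
    ])
}
```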

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 53 of 181, by VileR

Rank: l33t
GloriousCow wrote on 2023-07-11, 16:05:

I am thinking about a software frontend in the future for lower-spec machines, but you won't have the debugger.

Ah, then no chance for a simpler console/TUI-based debugger interface? I was wondering if that could make things easier to implement, maybe through one of those curses libraries that can use various backends... especially if you want 'standard' text/keyboard operations, clipboard, etc.

[ WEB ] - [ BLOG ] - [ TUBE ] - [ CODE ]

Reply 54 of 181, by GloriousCow

Rank: Member
VileR wrote on 2023-07-12, 08:48:
GloriousCow wrote on 2023-07-11, 16:05:

I am thinking about a software frontend in the future for lower-spec machines, but you won't have the debugger.

Ah, then no chance for a simpler console/TUI-based debugger interface? I was wondering if that could make things easier to implement, maybe through one of those curses libraries that can use various backends... especially if you want 'standard' text/keyboard operations, clipboard, etc.

No, a debugging console is still very much planned; just stating that the SDL backend would probably be very bare-bones, if it ever even exists.

I've got some big plans for the debugger CLI; I need to work out exactly how I want to do some expression parsing, but I think it will really take MartyPC's debugging capabilities to the next level.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 55 of 181, by superfury

Rank: l33t++

Also slightly related to the improved timings perhaps (since the new waitstate behaviour):

When using the Generic Super PC/Turbo XT BIOS, pressing the ESC key causes it to beep and somehow hang the EU in the first cycle of INTA being processed (because, oddly enough, the output buffer of the CPU is still filled with some result, which shouldn't normally happen, as it means an instruction terminated its EU operation improperly without reading back its result and finishing its timings).
Pressing any other key will boot from the floppy drive; pressing no key continues on to the hard disk properly.

Edit: Hmmm... the cause seems to be the INTA cycle itself. After adding some more logging of the opcode and 0F status (for 286+ CPUs) to the request and result (the result can give back the request with its payload (the address and whether the paging TLB is being used)) and checking what the result returns, I clearly see it's receiving the result of an A1 cycle while requesting an A1 cycle, which shouldn't ever happen (successfully sending the A1 cycle request to the BIU should make the special EU handler proceed to the receiving phase and read the result, instead of sending the A1 request again (a double request, which is invalid))!
Edit: OK. The issue is that the BIU is handling an IRQ and waiting for a result, then receives a new IRQ (which it shouldn't, ever!) that resets the EU step counter, causing this issue. The new IRQ was raised by the EU performing the hardware interrupt (through the 8259 PIC) while keeping the instruction in the to-be-starting state (basically the EU's T1, if you can call it that), where it initializes CPU state for the new instruction to start executing.
Since it was in the to-be-starting state and another IRQ appeared to be pending, the emulator core started to acknowledge it (while the CPU was still performing the acknowledge simulation for the first one), because it thought it was starting a new instruction after that (which is incorrect in this special case).
So making the emulator core check that flag before acknowledging the (A)PIC request fixes the problem (making the CPU properly handle hardware interrupts again).
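In essence, the guard looks like this (a Rust-flavored sketch of the shape of the fix, with invented names; UniPCemu itself is C):

```rust
struct CpuIrqState {
    inta_in_progress: bool, // set when the EU starts servicing a hardware interrupt
}

fn maybe_acknowledge_irq(cpu: &mut CpuIrqState, irq_pending: bool) {
    if irq_pending && !cpu.inta_in_progress {
        cpu.inta_in_progress = true;
        // ... run the INTA bus cycles, fetch the vector from the PIC ...
        // inta_in_progress is cleared once the interrupt entry completes.
    }
    // If inta_in_progress is already set, the new IRQ stays pending in the
    // PIC instead of resetting the EU step counter mid-acknowledge.
}
```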

Edit: Tried 8088 MPH again. All effects seem correct, except for the crash (a hanging loop without audio or display rendering) immediately after the credits display (as in, immediately after that first frame).
The Kefrens are correct (except for the final few clocks at the end of all scanlines, which stay fixed across the entire height of the screen, not moving left or right).

I do notice an interesting thing in the Kefrens effect, though: the Chinese wall (I don't know what it's actually called), as it were, seems to only be rendered every other scanline? Can anyone shed any light on that?
Edit: Those lines not displaying the Chinese wall (are those the Kefrens bars?) all seem to be displaying the background that's to the left and right of the Kefrens. That results in a vertical on/off for every scanline (1 on, 1 off, 1 on, 1 off, etc.), with the 'off' being entirely overscan?
Edit: The credits of 8088 MPH run again, with the latest commits having improved the BIU wait states and related BIU timings (and with EU prefetching improved, 8088 MPH's cycle count has improved into the 1600s range).
About 8088 MPH's Kefrens effect, see the cycle accuracy thread I made instead: UniPCemu 8088 cycle accuracy
It's interesting what happens now, but it'll be better to continue that discussion in said thread.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 56 of 181, by VileR

Rank: l33t

Will the "device control" debug tool be back in some fashion? Looking through the commits tells me it was considered unstable and hidden behind a 'devtools' build config option.

As I seem to recall, it allowed you to step through execution in multiples of PIT/CGA/etc ticks... at least I assumed that's what it did. Something like that could come in very handy.

[ WEB ] - [ BLOG ] - [ TUBE ] - [ CODE ]

Reply 57 of 181, by GloriousCow

Rank: Member
VileR wrote on 2023-07-30, 06:58:

Will the "device control" debug tool be back in some fashion? Looking through the commits tells me it was considered unstable and hidden behind a 'devtools' build config option.

As I seem to recall, it allowed you to step through execution in multiples of PIT/CGA/etc ticks... at least I assumed that's what it did. Something like that could come in very handy.

That's not quite what it did; it ticked the device without ticking the CPU. It was built pretty specifically for debugging the end credits of Area 5150. It might be back at some point, but my CGA optimizations broke it.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 58 of 181, by VileR

Rank: l33t
GloriousCow wrote on 2023-07-31, 02:15:

That's not quite what it did; it ticked the device without ticking the CPU. It was built pretty specifically for debugging the end credits of Area 5150. It might be back at some point, but my CGA optimizations broke it.

Alright, then consider my wrong impression of what it did to be a feature request, if it's no big thing. Could be a nice addition if you get to revisit the debugging tools at some point.

[ WEB ] - [ BLOG ] - [ TUBE ] - [ CODE ]

Reply 59 of 181, by GloriousCow

Rank: Member
VileR wrote on 2023-07-31, 19:38:
GloriousCow wrote on 2023-07-31, 02:15:

That's not quite what it did; it ticked the device without ticking the CPU. It was built pretty specifically for debugging the end credits of Area 5150. It might be back at some point, but my CGA optimizations broke it.

Alright, then consider my wrong impression of what it did to be a feature request, if it's no big thing. Could be a nice addition if you get to revisit the debugging tools at some point.

Such a thing is unlikely to ever be added; not without completely rewriting the CPU core. Instructions execute pretty much atomically, so you can't just tick the CPU by an arbitrary number of system ticks, unfortunately.
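The core loop is roughly this shape (an illustrative sketch, not MartyPC's actual code), which is why there's no clean point at which to stop mid-instruction:

```rust
trait TickableDevice {
    fn tick(&mut self, cycles: u32);
}

struct Cpu;
impl Cpu {
    fn execute_next_instruction(&mut self) -> u32 {
        4 // stub: a real core returns the cycles the instruction consumed
    }
}

fn step(cpu: &mut Cpu, devices: &mut [Box<dyn TickableDevice>]) {
    let cycles = cpu.execute_next_instruction(); // atomic: runs to completion
    for dev in devices.iter_mut() {
        dev.tick(cycles); // devices catch up in one lump afterwards
    }
}
```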

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc