VOGONS


Emulating Area5150

Reply 100 of 113, by superfury

Rank l33t++
GloriousCow wrote on 2023-06-30, 02:16:
superfury wrote on 2023-06-30, 01:31:

Still not there entirely, but making progress on it.
Although the parts mentioned (the vertical scrolling part until the bob at the bottom of the houses image and the part before the dancing elephant) should work, assuming that it's not doing anything weird with horizontal timings?
Both seem to have issues with some frames being half-width instead of full width? Or perhaps the issue is something taking double the time in clocks for even or odd frames only (seeing as such a thing would cause such an effect)?

How do you handle CGA mode byte 0x03? Both these effects use that trick. VileR explains how it works here: https://pcem-emulator.co.uk/phpBB3/viewtopic.php?t=3831

So you mean setting of the CGA mode register (3d8)?
That's quite complicated. Most of the heavy lifting is done by setting various VGA registers (MAP14 of CRTC mode control is adjusted to use scanline counter bit 0 instead of bit 1).
https://bitbucket.org/superfury/unipcemu/src/ … a/vga_cga_mda.c
Search for applyCGAModeControl() occurrences, as well as the function itself.

https://bitbucket.org/superfury/unipcemu/src/ … /vga_renderer.c, look for said variable storage and CGAMDA macros for some more functionality.

Interestingly, the Windows 95 boot logo seems to have the same kind of issue (VGA rendering with twice the horizontal timings needed, effectively causing 1 active screen of overscan after the horizontal total point).
So a htotal (DCR?) bug?

Edit: OK. It's indeed a DCR-related bug on the CGA. Did some modifications on the conditions for double width pixels and only applied it to the CRT calculations (there's still DCR affecting the clock rate of rendering pixels horizontally though, which is basically the base CRT timing, not the timing used for the VGA-compatible horizontal rendering pixel clocks).

Edit: That makes the pre-elephant tap dance wobbly effect work properly at fullscreen! 😁
The vertical scrolling effect until the glass ball effect now works properly as well! 😁

I do notice that the wobbly effect for the pre-elephant wobbly text has some interesting corruption above and below it, which moves with the text and seems to follow some wave function as well (at the start of the corrupted scanlines)?

Attachment: 1678-UniPCemu_area5150_canyouteachan effect corruption.png (3.74 KiB)
File comment: Can you teach an scrolling in and wobbling.
File license: Fair use/fair dealing exception

Edit: The 3d vector part has some interesting flickering now?
https://www.dropbox.com/s/e7x436ck0q4xm63/Uni … 5-47-05.7z?dl=0
Charlie part is the same as during the last recording, as are the credits.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 101 of 113, by GloriousCow

Rank Member
superfury wrote on 2023-06-30, 06:37:

So you mean setting of the CGA mode register (3d8)?

I mean specifically what happens when bit 0 (HIRES) AND bit 1 (GRAPHICS) is set in the byte written to the CGA mode register.

I'm guessing based on the corruption you're seeing in the village scroll/sphere effect and the elephant intro text, that you might be going into graphics mode seeing the presence of the graphics mode bit. But when both bits are set, you remain in hires text mode, just the background color handling changes - see the link in previous post.
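A minimal sketch of the decode described above, not code taken from MartyPC or UniPCemu. The bit positions are the standard CGA mode control register layout at port 3D8h; the function name `decode_cga_mode` and the returned field names are hypothetical. The only special case modelled is the one from this post: HIRES and GRAPHICS set together stays in hires text mode.

```python
# Bit masks for the CGA mode control register (port 0x3D8).
HIRES_TEXT = 0x01    # bit 0: 80-column (hires) text timing
GRAPHICS = 0x02      # bit 1: graphics mode
VIDEO_ENABLE = 0x08  # bit 3: video output enabled


def decode_cga_mode(value):
    """Classify a byte written to 0x3D8 (illustrative helper, not a real API)."""
    hires = bool(value & HIRES_TEXT)
    graphics = bool(value & GRAPHICS)
    # Per the post above: with both bits set the card stays in hires text
    # mode and only the background color handling differs.
    text_mode = (not graphics) or hires
    return {
        "text_mode": text_mode,
        "hires_timing": hires,
        "alt_background": hires and graphics,  # hypothetical flag name
        "video_enabled": bool(value & VIDEO_ENABLE),
    }


if __name__ == "__main__":
    for byte in (0x01, 0x02, 0x03, 0x0B):
        print(hex(byte), decode_cga_mode(byte))
```

Testing only bit 1, as in the bug described in the next reply, would classify 0x03 as graphics mode; testing both bits keeps it in text mode.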

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 102 of 113, by superfury

Rank l33t++
GloriousCow wrote on 2023-06-30, 14:31:
superfury wrote on 2023-06-30, 06:37:

So you mean setting of the CGA mode register (3d8)?

I mean specifically what happens when bit 0 (HIRES) AND bit 1 (GRAPHICS) is set in the byte written to the CGA mode register.

I'm guessing based on the corruption you're seeing in the village scroll/sphere effect and the elephant intro text, that you might be going into graphics mode seeing the presence of the graphics mode bit. But when both bits are set, you remain in hires text mode, just the background color handling changes - see the link in previous post.

Oh. That's interesting. Indeed, graphics mode was activated when bit 1 was set, not taking bit 0 into account!
Thanks for the info! I don't think that's documented anywhere?

The part with the wave effect before the elephant changed once again? Made a short video capture of it:
https://www.dropbox.com/s/9y4e4u0hgrxkz5a/Uni … 0-56-25.7z?dl=0
Not entirely there, but getting close.

8088MPH doesn't use said mode bit combination (bit 0 is never set together with bit 1 (graphics mode)), so it should still work properly. Official software never sets both bits either, at least (nor does the BIOS)?

What happens when said low 2 bits are set and the video enable bit (bit 3) is cleared in the register? Should active display render overscan instead of VRAM output (while still timing and fetching everything)? Currently, as far as I know, it performs no VRAM accesses and simply NOPs the rendering part (while rendering black) during active display.
UniPCemu just maps it onto a VGA clocking mode register bit (set to the inverted value of bit 3 of the CGA mode control register), with that bit's effect performing the actual blanking.
Should bit 3 affect the overscan or the active display of the normal high-res text mode when bits 0 and 1 are set? What if it's cleared (is it a normal text mode in that case, with normal overscan behaviour)?
So what happens to the overscan color in that mode, what happens to the pixels rendered in active display, and is VRAM fetched normally?

The first 8 or so pixels of the area above and below the wobble effect behave strangely. What's happening there in the app? It looks fine at the end of the scanline.

Edit: The 3D vector effect is running properly now, apart from two things: black appears between the text mode area at the top and the graphics area during a top bounce (cleared overscan, even though 40 column mode is forced? It may be rendering 0 for overscan or some normal text-mode blanking signal?), and the second 3D object spawns a corrupted-looking scanline at the bottom of the graphics area in the middle.
Just like the vector effect, the credits still have the same problem. I did notice, though, the scanline that's rendered at the very top of active display (or is it the bottom?) before the overscan (if it's that; it's probably the overscan at the bottom of the vertical effect, since it behaves like the water effect, with what look like vertical waves rippling through it).

Edit: Just implemented blanking (which is triggered by bit 3 of the CGA mode control register in UniPCemu) to render overscan if graphics mode (bit 1) is set. Otherwise, it'll just render 0 during blanking (as happens with the VGA-compatible cards during overscan).
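The rule in that last edit can be sketched as below. This only restates the behaviour described in the post; it is not UniPCemu source, it is not confirmed hardware behaviour, and `active_display_color` and its parameters are hypothetical names.

```python
GRAPHICS = 0x02      # bit 1 of the CGA mode control register
VIDEO_ENABLE = 0x08  # bit 3 of the CGA mode control register


def active_display_color(mode_reg, overscan_color, vram_color):
    """Pick the color shown in active display for one pixel.

    Assumed rule (from the post): with video enabled, show VRAM output;
    when blanked, show overscan in graphics mode and 0 otherwise.
    """
    if mode_reg & VIDEO_ENABLE:
        return vram_color
    if mode_reg & GRAPHICS:
        return overscan_color
    return 0


if __name__ == "__main__":
    print(active_display_color(0x0A, overscan_color=6, vram_color=15))  # video on
    print(active_display_color(0x02, overscan_color=6, vram_color=15))  # blanked graphics
    print(active_display_color(0x01, overscan_color=6, vram_color=15))  # blanked text
```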

Also interesting is the "See ya" text at the end: UniPCemu renders a small version with a large text-hacked version below it, both moving from left to right and back a few times?


Reply 103 of 113, by mdrejhon

Rank Newbie
VileR wrote on 2023-06-25, 18:28:

Nice!
Speaking of the display: would it be possible to improve the centering of the active area within the emulated monitor? As far as I can tell, +HRES modes (80-col text) are shifted far to the right, while -HRES (everything else) is way off to the left

GloriousCow wrote:

The version of MartyPC you're using returns the raster to the left side of the screen at the end of hblank. So it's not just the HorizontalSyncPosition that matters, but SyncWidth as well. You are correct in that the HorizontalSyncPosition is effectively the same in 80 column and 40 column mode, but SyncWidth is programmed as 10 in both modes, so in 40 column mode the SyncWidth is effectively twice as wide. I already decided that waiting for end of hblank was probably incorrect.

Hello!

I'm the founder of Blur Busters and TestUFO, and I'm the resident expert of the "Present()-to-photons" black box on PC based systems.

Most of you have figured this out by now, but, I'd like to provide a useful diagram that enlightens many people a bit more quickly.

The display is not "deciding". It's definitely something digital before the analog side (either spewing out of the CRTC, or some digital trigger circuit between the CRTC and the display itself).

Terminology varies a lot between communities, e.g. broadcast vs programmers vs manufacturers vs API naming nomenclature. For example, "Overscan" = "Porch", or "VBLANK" = "VSYNC", but they can differ if one community considers VBLANK as (VSYNC + vertical porches) and another community does not. It is sort of a "Semantics Schmemantics" rabbit hole, but for simplicity, I'll stick to Blur Busters's usual terminology for the purposes of this post. A specific API or register named "SyncLength" might refer to only the sync pulse, or to the entire blanking interval (sync and porches) -- on different platforms. I don't know what the 5150 does terminologically / signally, but it merits keeping in mind.

So, here are some generalities:

  • What the software thinks is not always aligned with the analog signal.
  • What some elsewhere call HBLANK often accidentally includes horizontal back porch (left overscan) and horizontal front porch (right overscan), in addition to the real HBLANK.
  • What some elsewhere call VBLANK often accidentally includes vertical back porch (top overscan) and vertical front porch (bottom overscan), in addition to the real VBLANK.
  • What some elsewhere call OVERSCAN (for borders) isn't always pixel-aligned with signal-level horizontal back porch and vertical back porch.
  • Most color-border overscan is actually not part of the signal vertical back porch nor vertical front porch!
  • The (turned-off but sometimes still spewing a few latent electrons) CRT beam is held stationary beyond left edge of tube if it's finished moving there during true horizontal signal blanking (HBLANK analog side)
  • The CRT beam yoke electromagnets instantly start deflecting left-to-right upon the analog signal edge transition from HBLANK to signal-level overscan (horizontal back porch)
  • Thus, horizontal image position on a real tube is typically very dependent on the exit out of signal HSYNC (transition from signal HSYNC to left overscan).

So you've got two terminological overscans (sometimes three!) involved in concentric rectangles around your addressable graphics -- the color borders overscan and the signal overscan.

Most are usually unaware that overscan actually contains two overscans on some platforms.

Here is a modified version of my earlier signal structure diagram, to highlight the two overscans.

SignalStructureForRetroEmulatorCoders.jpg

Sometimes, on certain platforms, the two overscans merge (e.g. border color in the front porches, with a very thin black-color back porch). Or, depending on platform, signal, and signal type, there can be three concentric overscan rectangles, e.g. a different-color padding between the outer signal overscan and the inner border-color overscan.

In programming some platforms, there can be some transfers between them: most CRTs will be fine if you randomly transfer pixels between inner and outer every line (horizontals) or every frame (verticals), such as color in the actual signal overscan, or signal overscan in what was normally border color. It's typically harmless to "slush" a little that way, as long as you don't bleep around with those "blanking"->"porch" signal transitions, and you don't bleep around with the clocks per line or per refresh, since overscan is historically just black pixels. In some signal standards like NTSC, porches sit at blanking level (0 IRE), just below black pixel voltage (7.5 IRE), and sync is well below that (-40 IRE). RGBI does not work the same way, but it still uses signal blanking and signal porches on the analog side.

Sometimes you've got some weirdnesses on some signals of merged overscans only on certain edges, where the back porch is normal specification but the front porch is out-of-spec with non-black pixels (merged bordercolor with front porch) -- whether by programming tricks or quirks on specific platforms. This, too, typically works on real tubes, though can create side effects, like edge distortions or slight black level risings from the electron backscatter of the beam hitting the inside of the tube outside the phosphor mask area. So it's not always concentric rectangles.

If there are still problems after some debugging -- perhaps some dummy padding is being kicked/slushed around a bit from abusing CRTC like a TIA. Your logical sync stuff may or may not line up with the signal sync stuff.

I am not an expert on the CRTC, just on the generic signal layout found in all video signals, analog and digital, including the components that modify analog horizontal position.

From the existing registers, some math between HorizontalSyncPosition and SyncWidth may allow you to determine where the signal HSYNC ends (overscan starts), since that's the important image-positioner dependency on a real CRT, unless some black box circuitry is executing some automatic compensation.
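As a rough sketch of that register math: the 6845 horizontal registers are counted in character clocks, so converting to hdots puts both text modes on the same scale. The register values in the usage note are the stock BIOS text-mode values as I recall them, so treat them as assumptions; `CrtcHorizontal` and `hsync_window_hdots` are hypothetical names, and a sync pulse that runs past the end of the line is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CrtcHorizontal:
    horizontal_total: int  # R0: characters per line, minus one
    sync_position: int     # R2: character count at which hsync starts
    sync_width: int        # R3 (low 4 bits): hsync width in characters
    hdots_per_char: int    # 8 in 80-column mode, 16 in 40-column mode


def hsync_window_hdots(regs):
    """Return (hsync_start, hsync_end, line_length) in hdots."""
    line = (regs.horizontal_total + 1) * regs.hdots_per_char
    start = regs.sync_position * regs.hdots_per_char
    end = (regs.sync_position + regs.sync_width) * regs.hdots_per_char
    return start, end, line


if __name__ == "__main__":
    col80 = CrtcHorizontal(113, 90, 10, 8)   # assumed stock 80-column values
    col40 = CrtcHorizontal(56, 45, 10, 16)   # assumed stock 40-column values
    print(hsync_window_hdots(col80))
    print(hsync_window_hdots(col40))
```

With those assumed values both modes start hsync at hdot 720 on a 912-hdot line, but the programmed pulse ends at hdot 800 in 80-column mode and at hdot 880 in 40-column mode.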

There are possibly three components (horizontal front porch, horizontal overscan, horizontal back porch) you're going to have to math-out from the registers you are successfully emulating. Depending on what the registers really create analog-side, it is possible to ignore the front porch as part of your mathematics, if you don't need it to determine the size of the back porch. But if you aren't emulating enough data to determine the exact moment signal HSYNC falls from true->false (transition from HSYNC to overscan), which dictates horizontal image positioning on a real CRT tube, then you're going to have to reverse-engineer with an oscilloscope next.

You might be witnessing a situation where you've got thicker/thinner HBLANK and thinner/thicker porches (still same clocks per line). Or where you've got your HBLANK offsetting left/right (a "thinner back porch/thicker front porch" situation). Either way, same clocks per line. Most CRTs work fine that way, although the image position usually shifts left/right if you transfer timings between the front and back horizontal porch.
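To illustrate the "same clocks per line, different picture position" point, here is a small sketch. All numbers are made up for illustration (they merely sum to 912), and the helper names are hypothetical; the image offset is measured from the end of sync, which is where the post says the beam starts moving.

```python
def total_clocks(layout):
    """Clocks per scanline: active + front porch + sync + back porch."""
    return sum(layout.values())


def image_offset(layout):
    """Distance from the end of sync to the first active pixel."""
    return layout["back_porch"]


def shift_sync(layout, clocks):
    """Move `clocks` from the front porch to the back porch.

    A negative value moves them the other way. The line length is unchanged.
    """
    front = layout["front_porch"] - clocks
    back = layout["back_porch"] + clocks
    if front < 0 or back < 0:
        raise ValueError("porch cannot be negative")
    return dict(layout, front_porch=front, back_porch=back)


if __name__ == "__main__":
    base = {"active": 640, "front_porch": 80, "sync": 80, "back_porch": 112}
    moved = shift_sync(base, 16)
    print(total_clocks(base), total_clocks(moved))  # same line length
    print(image_offset(base), image_offset(moved))  # picture moves right
```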

However, your situation of:
- Horizontally shifted image in the emulator
- Constantly centered image on real machine
Might be from the emulator assuming incorrect end-of-hblank position, like not knowing where the signal-hblank transitions to signal-hbackporch, which might not be easy to determine (?) without a bit of further reverse engineering. Most emulators simply treat the blanking intervals as one monolithic block without all those pesky little components...

There are some shenanigans that seem to shift blanks around a bit -- but it's definitely something before the analog side, as the beam electromagnets only start deflecting the cathode ray from left-to-right upon the edge-transition from hblank to horizontal back porch. So reverse engineer that edge transition with an oscilloscope! The signal HSYNC and VSYNC (also called HBLANK or VBLANK, if it's not including the porches) "holds" the turned-off beam stationary at the very left (when in HSYNC) and very top (when in VSYNC) if it's already finished moving there, and the beam doesn't start moving until the edge-transition into porch (signal overscan).

To help debug how the CRTC decides to mis-align signal blanks/overscan timings with assumed blanks/overscans -- I suggest connecting an oscilloscope.

There may be multiple possible oscilloscope-debug workflows, such as connecting oscilloscope input #1 to the analog signal of a real IBM CGA monitor while executing 8088 MPH as well as AREA 5150, while connecting oscilloscope input #2 to some debugging trigger (e.g. an Arduino logic-monitor "listening into a CRTC pin and triggering a marker pulse when a specific value is written to specific registers") -- and watching the offsets between #1 and #2 on the oscilloscope screen. Finding the certain patterns where the Ground Truth changes (signal edge of hblank suddenly transitioning to horizontal back porch), and modifying the emulator to be more compatible, perhaps. There may be easier oscilloscope debugging workflows, but an oscilloscope may be needed to debug the offset between assumed vs actual. Reverse engineer those sync->porch transitions. This Is The Way </mandalorian>

(Did you know? This signal structure has been around for a century, still in 2020s DisplayPort, even in a 240Hz DSC G-SYNC signal! It is the topology found from 1930s analog TV broadcast signal through 2020s DisplayPort (in digital format), so it's impressive we've stuck to the same raster topology for a century! The digital transition kept the analog timings like time-paddings and comma separators so precisely that a generic passive synchronous 1:1 VGA->HDMI adaptor (in the era before full HDMI packetization) works perfectly timing-wise. So everything's still there digitally in a 2020s DisplayPort signal as in an analog signal of the same timing parameters. Modern DP/HDMI packetization and multiplexing (audio packets, etc) jitters by an error margin of about 1 scanline nowadays, but even that is still good enough for line-based beam racing today. It's just an obvious de facto serialization of 2D imagery into a 1D spew through a cable / radio / drinking straw / etc.)

_____

Some more background: I am in 25 research papers (purple "Research" tab at Blur Busters website). I don't have much demoscene cred, but I cut my teeth on Commodore 64, with 6502 and beam racing experience.

I'm the author of the still-WIP Tearline Demo (volunteers welcome) for treating any tearing-capable GeForces/Radeons/Intel like an Atari TIA (beam racing modern GPUs). I posted videos at pouet.net including real-raster Kefrens Bars on an NVIDIA GeForce GTX 1080 GPU a few years ago ("Tearlines Are Just Rasters"), under Windows in mere C# HLE and the cross-platform MonoGame game engine. I also recompiled it on a Mac, and most of it worked, so it was crossplatform-capable, albeit with lots of raster jitters. Code was shelved, but with some babying, could become the world's first crossplatform rasterdemo, tolerating a bit of very fun raster jitter from all the varying error margins of the various platforms, abusing a number of modern GPUs like an Atari TIA, rendering unscreenshottable beamraced graphics on the fly. If any democoder would like to resurrect this concept and/or team up and include me in the credits, I'm open. Time is limited on my side, but I can teach modern-GPU beamracing basics.

Some emulators (Toni's WinUAE and Tom Harte's CLK) now use my findings to create a new lagless VSYNC mode, with sub-refresh latencies, by Present()ing frameslices during VSYNC OFF mode later blocks of scanlines to the GPU even while the GPU is already video-outputting earlier scanlines, from an invention of mine called "frameslice beam racing", which has been named "lagless vsync" by Toni of WinUAE Amiga Emulator. Windows D3DKMTGetScanLine() is sometimes used, or a raster-estimate via time-offset between consecutive VSYNCs, with Present()+Flush() precisely timed by CPU busywait loops, during a Full Screen Exclusive VSYNC OFF mode (that has ugly tearing unless the tearing is hidden by beamracing). This works just fine since there's a performance jitter safety margin in frameslice beamracing. Since WinUAE/CLK respective lagless vsync races ahead of the real displays' raster, VSYNC OFF never tears, and is a solid lagless "perfect VSYNC ON lookalike" emulated mode, e.g. 600fps VSYNC OFF at 10 frameslices per emulated 60Hz refresh cycles. And tearing/glitching never appears unless the system jitters 1/600sec "late", etc. Assuming the realworld raster is inside the jitter margin of one frameslice ago, aka (0.5 + 1)/600sec, this is only ~1.5/600sec input lag behind a real machine! (Configurable higher or lower, e.g. 1200fps, 1800fps, etc). Thin VSYNC OFF frameslices that are beamraced onto the screen, appended to the still-invisible "tearline" before the beam hits it. That way, you can stay tearingless. That creates subrefresh-latency "VSYNC ON" done via raster beam racing software tricks on VSYNC OFF -- with no tearing! Some GPUs can do over 2000 frameslices per second in WinUAE, creating sub-1ms lag relative to original machine! While not useful for emulators playing demoscene content, but great for gaming in an emulator to have original-machine lagfeel -- it is pretty clever.
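The latency arithmetic and the "raster estimate via time-offset between consecutive VSYNCs" can be sketched as follows. This is only the math: it calls no real GPU or OS API (`D3DKMTGetScanLine()` and `Present()` are deliberately not used), and the function names and parameters are hypothetical.

```python
def frameslice_latency_ms(slices_per_refresh, refresh_hz, margin_slices=1.5):
    """Approximate added lag for frameslice beam racing, in milliseconds.

    margin_slices=1.5 is the (0.5 + 1) slice jitter margin from the post.
    """
    slices_per_second = slices_per_refresh * refresh_hz
    return margin_slices / slices_per_second * 1000.0


def estimate_scanline(now, last_vsync, refresh_period, total_lines):
    """Guess the current scanline from the time elapsed since the last VSYNC.

    Assumes a constant scan rate across the whole refresh, blanking included.
    """
    phase = ((now - last_vsync) % refresh_period) / refresh_period
    return int(phase * total_lines)


if __name__ == "__main__":
    # 10 frameslices per 60 Hz refresh = 600 slices per second.
    print(frameslice_latency_ms(10, 60))
    # Halfway through a refresh of 262 total lines.
    print(estimate_scanline(now=0.5, last_vsync=0.0,
                            refresh_period=1.0, total_lines=262))
```

At 10 slices per 60 Hz refresh this works out to about 2.5 ms, matching the ~1.5/600 sec figure above.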

Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 104 of 113, by superfury

Rank l33t++

Just did some looking at your code for HLT and INTA handling and added the following to UniPCemu:
- INTA for hardware interrupts using the PIC directly (on non-multiprocessor CPUs) are now done through the BIU.
- Two INTA fetches (using memory T-states without waitstates) have been added for 80(1)8x CPUs. Newer CPUs use 1 cycle instead (only 1 INTA for each interrupt).
The above wasn't implemented previously (all INTA were instantly progressing into the INT EU handling on the CPU).
This adds roughly 8 cycles to each PIC hardware interrupt (4 cycles for each INTA), plus any states needed to reach T1 before that, of course.
NMI is unchanged. It still executes untimed (no INTA or anything like that after all), just INT 02h handling.
- HLT now properly waits for an idle BIU, then starts ticking 1 (or 2 in the case a REP instruction was executing) EU cycle with BIU forced idle, then puts the entire CPU in HLT state (until an IRQ triggers INTA, mentioned above).
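The cycle count in the list above is simple arithmetic, sketched here under the post's own assumptions (two INTA bus cycles of four T-states each on the 8088). The function name and the `wait_states` parameter are hypothetical hooks; wait states default to 0, as in the post.

```python
T_STATES_PER_BUS_CYCLE = 4  # T1..T4 of a basic 8088 bus cycle


def inta_overhead_cycles(inta_count=2, wait_states=0):
    """CPU cycles spent on interrupt acknowledge before INT handling starts.

    Does not include the states spent reaching T1 or the INT microcode itself.
    """
    return inta_count * (T_STATES_PER_BUS_CYCLE + wait_states)


if __name__ == "__main__":
    print(inta_overhead_cycles())               # two INTAs, no wait states
    print(inta_overhead_cycles(wait_states=1))  # hypothetical: 1 wait state each
```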

Edit: Just modified the debugger to log VGA and PIC interrupt requests as it was before the instruction executed instead of as it is after it executed (making it match the register state).
Edit: Found out when double checking on the PSP build that it was requesting too much memory to be able to run. The only solution was to halve the renderer framebuffer (that the VGA draws to), giving a maximum rendered resolution of 1024x512 pixels (which makes it use 4MB for all the required display buffers, about the maximum it can use while still leaving enough memory for the app to allocate itself and have some to play with).

Edit: Just checked out that weird 'blanking' issue on the 3d vector object bouncing up against the text area above it.
It looks like it isn't rendering blanking at all? Perhaps a black overscan is rendered instead (overscan being set to 0)?


Reply 105 of 113, by GloriousCow

Rank Member
mdrejhon wrote on 2023-07-01, 10:24:

Hello!

I'm the founder of Blur Busters and TestUFO, and I'm the resident expert of the "Present()-to-photons" black box on PC based systems.

Hi! This is a wealth of information, and I appreciate you sharing it. It will take me a while to digest it all.

mdrejhon wrote on 2023-07-01, 10:24:

The display is not "deciding". It's definitely something digital before the analog side (either spewing out of the CRTC, or some digital trigger circuit between the CRTC and the display itself).

My assumption was there was something in say, the IBM 5153, that would do an hsync if the CGA/CRTC did not otherwise trigger one, otherwise, what happens? The beam just sits on the right side of the screen forever?
I have a fixed render buffer representing the display 'field', so the only logical thing for me to do when I reach the right side of it is to wrap around to the left, which effectively is performing hsync.

There's one test in CGACOMP that my emulator doesn't handle and it's one where hsync and vsync are disabled entirely and just a solid color is drawn. It's supposed to fill the screen with that color, implying that the monitor is still scanning out as normal in the absence of those signals - but without a vsync, my emulator won't generate a frame, so you just see black.
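One hypothetical way to handle that case is a free-running raster that wraps on its own when no sync arrives. This is a sketch of the idea only, not how MartyPC works, and whether a real 5153 behaves this way is not established here. Sync inputs are treated as one-tick edge events, and all names are made up.

```python
class RasterField:
    """Fixed-size render field with a beam position that free-runs."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.x = 0
        self.y = 0
        self.frames = 0

    def tick(self, hsync=False, vsync=False):
        """Advance one clock. Returns True when a frame is completed.

        hsync/vsync are edge events: pass True for a single tick only.
        """
        self.x += 1
        # Wrap to the next line on hsync, or when the right edge is reached.
        if hsync or self.x >= self.width:
            self.x = 0
            self.y += 1
        # Finish the frame on vsync, or when the bottom edge is reached.
        if vsync or self.y >= self.height:
            self.x = 0
            self.y = 0
            self.frames += 1
            return True
        return False


if __name__ == "__main__":
    field = RasterField(width=4, height=3)
    done = [field.tick() for _ in range(12)]  # no sync signals at all
    print(field.frames, done.count(True))
```

With this fallback, a program that disables hsync and vsync still produces one frame per field's worth of clocks instead of none.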

mdrejhon wrote on 2023-07-01, 10:24:

  • The (turned-off but sometimes still spewing a few latent electrons) CRT beam is held stationary beyond left edge of tube if it's finished moving there during true horizontal signal blanking (HBLANK analog side)
  • The CRT beam yoke electromagnets instantly start deflecting left-to-right upon the analog signal edge transition from HBLANK to signal-level overscan (horizontal back porch)
  • Thus, horizontal image position on a real tube is typically very dependent on the exit out of signal HSYNC (transition from signal HSYNC to left overscan).

Thanks, that's exactly the sort of thing I was curious about. I wasn't sure if the beam ever stopped moving or not.

mdrejhon wrote on 2023-07-01, 10:24:

From the existing registers, some math between HorizontalSyncPosition and SyncWidth may allow you to determine where the signal HSYNC ends (overscan starts), since that's the important image-positioner dependency on a real CRT, unless some black box circuitry is executing some automatic compensation.

I think I have the end of HSYNC dialed in pretty well now, at least, everything appears well-centered if I assume that hsync lasts 80 hdots. This returns my virtual raster to the left side of the screen to draw the remaining hblank period (back porch?), which pushes the display area to the right. So things like 40 column vs 80 column modes now line up, and effects in Area 5150 that use a sync width of 15 also work.
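One reading of that approach, as a sketch: the virtual raster returns after a fixed 80 hdots of sync, and whatever is left of the programmed sync width is drawn at the left of the next line. The 80-hdot constant comes from the post; the 8 and 16 hdots per character are the usual CGA character clocks; the `min()` for pulses shorter than 80 hdots is my assumption, and the names are hypothetical.

```python
HDOTS_PER_CHAR = {"80col": 8, "40col": 16}
MONITOR_HSYNC_HDOTS = 80  # fixed retrace length assumed in the post


def split_hsync(sync_width_chars, mode):
    """Return (retrace_hdots, leftover_hdots) for a programmed sync width.

    leftover_hdots is the part of the programmed pulse that ends up at the
    left edge after the raster has already returned.
    """
    programmed = sync_width_chars * HDOTS_PER_CHAR[mode]
    retrace = min(programmed, MONITOR_HSYNC_HDOTS)  # assumption for short pulses
    return retrace, programmed - retrace


if __name__ == "__main__":
    print(split_hsync(10, "80col"))  # stock 80-column width
    print(split_hsync(10, "40col"))  # same register value, twice as wide
    print(split_hsync(15, "80col"))  # width of 15, as used by some effects
```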

mdrejhon wrote on 2023-07-01, 10:24:

There are possibly three components (horizontal front porch, horizontal overscan, horizontal back porch) you're going to have to math-out from the registers you are successfully emulating. Depending on what the registers really create analog-side, it is possible to ignore the front porch as part of your mathematics, if you don't need it to determine the size of the back porch. But if you aren't emulating enough data to determine the exact moment signal HSYNC falls from true->false (transition from HSYNC to overscan), which dictates horizontal image positioning on a real CRT tube, then you're going to have to reverse-engineer with an oscilloscope next.

You might be witnessing a situation where you've got thicker/thinner HBLANK and thinner/thicker porches (still same clocks per line). Or where you've got your HBLANK offsetting left/right ("thinner back porch/thicker front porch" situation) Either way same clocks per line. Most CRTs work fine that way, although the image position usually shifts left/right if you transfer timings between the front-vs-back horizontal porch.

To help debug how the CRTC decides to mis-align signal blanks/overscan timings with assumed blanks/overscans -- I suggest connecting an oscilloscope.

I've done some scope work on the CGA, particularly in terms of vsync/vblank timings, as I was curious about the exact timings of the last raster line of vblank where you kind of have vblank and hblank overlapping. The vsync output from the DIN connector does seem to correlate with the vblank output from the CRTC, at least. There's more to be done there, certainly, my poor CGA card is bristling with test point wires at this point.

mdrejhon wrote on 2023-07-01, 10:24:

There may be multiple possible oscilloscope-debug workflows, such as connecting oscilloscope input #1 to the analog signal of a real IBM CGA monitor while executing 8088 MPH as well as AREA 5150, while connecting oscilloscope input #2 to some debugging trigger (e.g. an Arduino logic-monitor "listening into a CRTC pin and triggering a marker pulse when a specific value is written to specific registers") -- and watching the offsets between #1 and #2 on the oscilloscope screen. Finding the certain patterns where the Ground Truth changes (signal edge of hblank suddenly transitioning to horizontal back porch), and modifying the emulator to be more compatible, perhaps. There may be easier oscilloscope debugging workflows, but an oscilloscope may be needed to debug the offset between assumed vs actual. Reverse engineer those sync->porch transitions. This Is The Way </mandalorian>

Some good ideas there. An Arduino might be a little slow for the 14 MHz CGA clock, but I'm working on a 5 V Teensy board that might be able to do the job. I thought it might even be a fun project to make a Teensy CGA.

mdrejhon wrote on 2023-07-01, 10:24:

Some more background: I am in 25 research papers (purple "Research" tab at Blur Busters website). I don't have much demoscene cred, but I cut my teeth on Commodore 64, with 6502 and beam racing experience.

I've been to your site a few times, so I'm not completely unfamiliar with your work. I certainly appreciate you taking the time to provide your insight.

mdrejhon wrote on 2023-07-01, 10:24:

Some emulators (Toni's WinUAE and Tom Harte's CLK) now use my findings to create a new lagless VSYNC mode, with sub-refresh latencies, by Present()ing frameslices during VSYNC OFF mode later blocks of scanlines to the GPU even while the GPU is already video-outputting earlier scanlines, from an invention of mine called "frameslice beam racing"

This would be extremely cool to try to implement. Right now I'm double-buffering so I know there's a bit of latency, although at 60fps with most old PC titles it's probably not much of a concern(?) But since I'm already drawing the CGA in a clock by clock fashion there's nothing technically stopping me from sending as many screen updates as I want whenever, but I have to see if it would be possible with wgpu/Vulkan. I'm sort of at the mercy of that library, unless I port to another front-end. Can it be done through SDL?


Reply 106 of 113, by GloriousCow

Rank Member
superfury wrote on 2023-07-01, 15:14:

Just did some looking at your code for HLT and INTA handling and added the following to UniPCemu:
- INTA for hardware interrupts using the PIC directly (on non-multiprocessor CPUs) are now done through the BIU.
- Two INTA fetches (using memory T-states without waitstates) have been added for 80(1)8x CPUs. Newer CPUs use 1 cycle instead (only 1 INTA for each interrupt).

I think that the INTA bus cycles can be affected by wait states as well if DRAM refresh is occurring. Don't know if modelling that is required for Area 5150 or not, but I can try turning it off and seeing if it breaks...
You'll also want to model what the PIC does on that second INTA. Area 5150 does some funny stuff setting up lockstep: PIT timer #1 is set to a very short period (1 or 2), so there are cases where refresh will happen back-to-back and cases where it will be too late and skip one. If you haven't yet, check out my blog on DMA: https://martypc.blogspot.com/2023/05/explorin … -on-ibm-pc.html.

superfury wrote on 2023-07-01, 15:14:

- HLT now properly waits for an idle BIU, then starts ticking 1 (or 2 in the case a REP instruction was executing) EU cycle with BIU forced idle, then puts the entire CPU in HLT state (until an IRQ triggers INTA, mentioned above).

Don't take my HLT behavior for gospel, I still have some research/validation to do on that. You are correct, though, that it will wait for the BIU, as HALT is just another (very short) bus state. There's some funky timing business waking up from HLT on an interrupt: it's not immediate and there's a variable cycle in there. We had a thread on that: "8088 Interrupt delay timing". This probably has some relevance in Area 5150, since the Wibble/Lake effects do halt for a bit at the end of the effect before letting the timer ISR wake things back up. You have a short window to hit the ISR in before the CRTC updates will be late, so a significant inaccuracy in halt resuming could be an issue.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 107 of 113, by mdrejhon

User metadata
Rank Newbie
Rank
Newbie

[Edited again for corrections; grammar; terminology. May still contain slip-ups, like interchanges between HBLANK vs HSYNC]

GloriousCow wrote on 2023-07-01, 18:14:

Hi! This is a wealth of information, and I appreciate you sharing it. It will take me a while to digest it all.

No rush. It's a hobby of mine to dabble in all of this on the side!

GloriousCow wrote on 2023-07-01, 18:14:

My assumption was there was something in say, the IBM 5153, that would do an hsync if the CGA/CRTC did not otherwise trigger one, otherwise, what happens? The beam just sits on the right side of the screen forever?

Only within the tight confines of the display's horizontal scanrate specifications.

For example, if your CRT tube supports a 31.5kHz-67kHz horizontal scan rate, you have to begin deflecting the beam again somewhere between 1/67000sec and 1/31500sec after the previous line, or the tube is gonna go out of sync (e.g. roll, tear, skew, whatever). At higher scanrates, the beam has little time to be parked, while at lower scanrates, the beam has lots of time to be parked. Some deflectors are so slow (especially on early fixed-sync tubes) that there's no time for the beam to loiter at the left edge of the screen waiting for the HSYNC to disappear from the signal.
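As a quick worked example of that window, here is a small sketch that converts a horizontal scan rate into the time available per scanline. The helper name is made up for illustration; the two rates are the ones from the example above.

```python
# Sketch: time budget per scanline for a given horizontal scan rate.

def line_period_us(scanrate_hz: float) -> float:
    """Return the duration of one scanline in microseconds."""
    return 1_000_000.0 / scanrate_hz

for hz in (31_500, 67_000):
    print(f"{hz} Hz -> {line_period_us(hz):.2f} us per line")
```

So the tube in this example has to start a new sweep somewhere between roughly 15 and 32 microseconds after the previous one.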

It is just analog electronics behavior. Metaphorically, like a fully charged capacitor no longer accepting electrons -- depending on how quickly you interrupt the charging. If your scanrate is really high, and the horizontal deflector is slow, the beam has no time to park itself. If your scanrate is low and/or the horizontal deflector is powerful, the beam can be parked for a long time.

I only explain this to show why the horizontal picture position changes on an analog tube if you vary the HSYNC edge position in a signal. Fixed-sync CRTs are very forgiving with slightly early / slightly out-of-spec late HSYNC -- as long as everything else is in-spec. It merely affects horizontal picture position, which is affected by whatever you slush between the left/right porches (overscan) at the HSYNC signal edge.

The (turned-off) beam may or may not loiter parked motionless at the left edge. The CRT tube's "horizontal scan rate" range simply makes sure there's enough time for the beam to reach the left side (and enough beam acceleration/deceleration room in the porches -- that's why linearity is often compressed if you try to put picture data inside the signal overscan area) -- whether it immediately scans the next scanline, or just loiters parked until signal HSYNC ends. Hope this makes sense.

Also you have to keep a fixed scanrate through the entire refresh cycle, or you will have really bad glitches (Pictures will be very unstable on raster CRTs if you vary the horizontal scanrate).

Also, on many platforms (I don't know if this applies to the CRTC) there are electronics that force a fixed-size HSYNC and a fixed-size VSYNC. The platform's graphics chip simply triggers it, and downstream electronics emit it for a fixed time. That's why, if you trigger it, you always get the same number of black pixels or black scanlines as artifacts in mid-refresh-cycle. If the CRTC behaves this way, the great thing is that you can simply use constants from the signal specifications (e.g. the number of cycles an HSYNC takes) and call it a day. You're only concerned with the offset (position) of the HSYNC, and you can use a constant for the size of HSYNC in specific video modes (e.g. 320x240, 512x384, 60Hz, 70Hz, etc). Knowing that constant size (if the electronics always emit fixed-size, unchangeable HSYNCs), you can subtract sizes/offsets to come up with the signal-overscan numbers (horizontal back porch and horizontal front porch). This means that if overscanned and non-overscanned text modes create different video modes (e.g. a different number of CPU cycles per scanline or per refresh cycle), then you will have to do different calibrations (lookup tables) for different video modes on a multisync display, since horizontal picture position behavior will differ between video modes (different signal timings).

I don't know how big a rabbit hole we're dealing with here, but isn't the 5150 display a single-video-mode fixed-frequency display -- essentially 40x25 and 80x25 are refreshing with exactly the same number of cycles per line and per refresh? If so, your job is probably some simple adjustments. Otherwise...

GloriousCow wrote on 2023-07-01, 18:14:
mdrejhon wrote on 2023-07-01, 10:24:

- The (turned-off but sometimes still spewing a few latent electrons) CRT beam is held stationary beyond the left edge of the tube if it's finished moving there during true horizontal signal blanking (HBLANK analog side)
- The CRT beam yoke electromagnets instantly start deflecting left-to-right upon the analog signal edge transition from HBLANK to signal-level overscan (horizontal back porch)
- Thus, horizontal image position on a real tube is typically very dependent on the exit out of signal HSYNC (transition from signal HSYNC to left overscan).

Thanks, that's exactly the sort of thing I was curious about. I wasn't sure if the beam ever stopped moving or not.

It may, or it may not. It depends on how fast the deflector managed to move the turned-off beam back to the left edge. Don't worry about that; it's /somewhat/ of a red herring (barking up the wrong tree, missing the forest for the trees). I only explained it to account for the analog horizontal-picture-positioning behavior -- the problem we're dealing with -- and the infodump is meant to point you at the specific tree worth focusing on.

Picture positioning flexibility may behave differently in different modes (e.g. 40x25 vs 80x25 vs non-standard modes), but it's still the offset of the HSYNC within the combined horizontal blanking (signal HSYNC + signal horizontal porches), that is affecting horizontal picture position. Theoretically easy to determine if you're permanently dealing with a single fixed-frequency mode (same clocks per line, same clocks per refresh). But if you're treating IBM's CRTC like an Atari TIA, I know that you could add/remove scanlines from refresh cycles on an Atari -- like having weird too-long or too-short refresh cycles. If you're going out of that one fixed-frequency spec of whatever standard (NTSC or RGBI or whatever), you'll definitely need to compensate as horizontal position during out-of-spec operation will be unpredictable, and will need to be validated by reverse engineering (comparing real system behavior to emulator behavior).

mdrejhon wrote on 2023-07-01, 10:24:

From the existing registers, some math between HorizontalSyncPosition and SyncWidth may allow you to determine where the signal HSYNC ends (overscan starts), since that's the important image-positioner dependency on a real CRT, unless some black box circuitry is executing some automatic compensation.

GloriousCow wrote on 2023-07-01, 18:14:

I think I have the end of HSYNC dialed in pretty well now, at least, everything appears well-centered if I assume that hsync lasts 80 hdots.

That's a fair assumption.

Since you're dealing with a fixed-frequency mode, the specifications of a fixed-frequency mode are conveniently hardcodeable -- so using magic numbers (80 hdots) for cycles is perfectly fair game here.

Just make sure you've hardcoded the right numbers for the fixed-frequency analog timings that all the digital modes run through. Proceed with your path, if it worked. If it is flawless position sync vs real machine, then thank goodness you don't have to reverse engineer with an oscilloscope!

...Unless you abuse the spec (different scanrate, different refresh rate, excessively large/tiny porches, etc).
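A minimal sketch of why a hardcoded 80-hdot HSYNC makes 40-column and 80-column modes line up. The sync position values (90 and 45 character clocks) and the 912 hdots per line are the stock CGA text-mode numbers as I remember them, so treat them as assumptions; the function names are made up for illustration.

```python
# Sketch: where HSYNC ends, and how much back porch is left, when the
# HSYNC pulse is treated as a fixed 80 hdots regardless of mode.
# Assumptions: 912 hdots per scanline; stock sync positions of 90 chars
# (80-column, 8 hdots per char) and 45 chars (40-column, 16 hdots per char).

HDOTS_PER_LINE = 912
FIXED_HSYNC_HDOTS = 80

def hsync_window(sync_pos_chars, hdots_per_char, sync_hdots=FIXED_HSYNC_HDOTS):
    """Return (start, end) of the HSYNC pulse in hdots from line start."""
    start = sync_pos_chars * hdots_per_char
    return start, start + sync_hdots

def back_porch_hdots(sync_pos_chars, hdots_per_char):
    """Return the hdots left on the line after HSYNC ends."""
    _, end = hsync_window(sync_pos_chars, hdots_per_char)
    return HDOTS_PER_LINE - end

MODES = {"80x25": (90, 8), "40x25": (45, 16)}
for name, (sync_pos, char_width) in MODES.items():
    print(name, hsync_window(sync_pos, char_width),
          back_porch_hdots(sync_pos, char_width))
```

Under these assumptions both modes exit HSYNC at the same hdot, so the picture lands in the same horizontal position, which is the only thing the tube cares about.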

GloriousCow wrote on 2023-07-01, 18:14:

This returns my virtual raster to the left side of the screen to draw the remaining hblank period (back porch?), which pushes the display area to the right. So things like 40 column vs 80 column modes now line up, and effects in Area 5150 that use a sync width of 15 also work.

Good. I get the feeling you are doing it correctly now. (I think)

GloriousCow wrote on 2023-07-01, 18:14:

I've done some scope work on the CGA, particularly in terms of vsync/vblank timings, as I was curious about the exact timings of the last raster line of vblank where you kind of have vblank and hblank overlapping. The vsync output from the IN connector does seem to correlate with the vblank output from the CRTC, at least. There's more to be done there, certainly, my poor CGA card is bristling with test point wires at this point.

There may be a HSYNC pulse embedded in the last VSYNC scanline, simply to make sure the beam is really at the left edge. HSYNC embedded early in VSYNC is typically meaningless because it doesn't matter what the (turned off) CRT beam horizontally does when it's being moved vertically back to the top of the screen, as long as you have at least one HBLANK pulse before the next active scanline (time-aligned with previous HBLANK pulses).

Simple analog circuits (lowest cost) expect to see a continuous VSYNC signal for a time period (with no HSYNC pulses in VSYNC scanlines) to move the beam more quickly to the top edge. But a bit of optional spec-violating (or spec-meeting!) HSYNC pulse late inside a final VSYNC scanline(s) is harmless, and just makes sure the beam is already leftwards when at the top.

GloriousCow wrote on 2023-07-01, 18:14:

Some good ideas there. An Arduino might be a little slow for the 14MHz CGA clock, but I'm working on a 5V Teensy board that might be able to do the job. I thought it might even be a fun project to make a Teensy CGA.

The latest Teensy can be overclocked to about a gigahertz too, if necessary. Spewing out RGBI in realtime would be a fun microcontroller project! You won't have time to do interrupts (USB communications etc), but I suspect we're reaching an era where it's becoming possible to software-generate RGBI in realtime.

Semi-related, but my Tearline Jedi Kefrens Bars animation [YouTube] on a GeForce GTX 1080 Ti running at 8000fps is half NTSC scanrate (GPU generating 8000fps in 1-pixel-row frameslices). Every single pixel row is a VSYNC OFF tearline right above/below! So it's all one pixel row with approximately 100-150 VSYNC OFF tearlines per refresh cycle at 60Hz, in a super-abuse of VSYNC OFF tearing to generate true-raster Kefrens Bars via any DirectX/OGL/Vulkan API in VSYNC OFF mode!

So, that means if I had a GPU capable of 15750fps, and I could trust an RTOS to spew the scanlines deterministically enough, I could just Atari-TIA-style spew emulator scanlines out at NTSC scanrate! (Though unfortunately it won't work in practice, as horizontal jitter is too imprecise, and GPUs don't support low scanrates).
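The numbers above can be sanity-checked with a few lines. This is only the arithmetic, not how the beam-racing code itself works, and the helper name is hypothetical.

```python
# Sketch: frameslices per refresh cycle for a given frame rate,
# using the 8000fps, 15750fps and 60Hz figures from the post.

def slices_per_refresh(fps: float, refresh_hz: float) -> float:
    """Return how many presented frames (frameslices) fit in one refresh."""
    return fps / refresh_hz

print(slices_per_refresh(8000, 60))    # frameslices per refresh at 8000fps
print(slices_per_refresh(15750, 60))   # one frameslice per NTSC scanline
print(8000 / 15750)                    # 8000fps as a fraction of NTSC scanrate
```

8000fps at 60Hz comes out at about 133 slices per refresh, inside the 100-150 range quoted above, and 15750fps at 60Hz is 262.5, i.e. one slice per scanline of an NTSC field.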

However, scaled to destination resolution and displayed on modern HDTVs, this would be an ultra-low-latency emulator running only one scanline behind the original machine connected to the same HDTV display!!! (Ideally with two or three scanlines of raceahead margin, as a computer-performance jitter safety margin).

[EDIT -- one reply block was forked to a new thread. See below]

Last edited by mdrejhon on 2023-07-03, 00:27. Edited 4 times in total.

Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 108 of 113, by superfury

User metadata
Rank l33t++
Rank
l33t++
GloriousCow wrote on 2023-07-01, 18:29:
superfury wrote on 2023-07-01, 15:14:

Just did some looking at your code for HLT and INTA handling and added the following to UniPCemu:
- INTA for hardware interrupts using the PIC directly (on non-multiprocessor CPUs) are now done through the BIU.
- Two INTA fetches (using memory T-states without waitstates) have been added for 80(1)8x CPUs. Newer CPUs use 1 cycle instead (only 1 INTA for each interrupt).

I think that the INTA bus cycles can be affected by wait states as well if DRAM refresh is occurring. Don't know if modelling that is required for Area 5150 or not, but I can try turning it off and seeing if it breaks...
You'll also want to model what the PIC does on that second INTA. Area 5150 does some funny stuff setting up lockstep: PIT timer #1 is set to a very short period (1 or 2), so there are cases where refresh will happen back-to-back and cases where it will be too late and skip one. If you haven't yet, check out my blog post on DMA: https://martypc.blogspot.com/2023/05/explorin … -on-ibm-pc.html.

superfury wrote on 2023-07-01, 15:14:

- HLT now properly waits for an idle BIU, then starts ticking 1 (or 2 in the case a REP instruction was executing) EU cycle with BIU forced idle, then puts the entire CPU in HLT state (until an IRQ triggers INTA, mentioned above).

Don't take my HLT behavior for gospel, I still have some research/validation to do on that. You are correct though, it will wait for the BIU, as HALT is just another (very short) bus state. There's some funky timing business waking up from HLT on an interrupt: it's not immediate and there's a variable cycle in there. We had a thread on that: 8088 Interrupt delay timing. This probably has some relevance in Area 5150, since the Wibble/Lake effects do halt for a bit at the end of the effect before letting the timer ISR wake things back up. You have a short window to hit the ISR in before the CRTC updates will be late, so a significant inaccuracy in halt resuming could be an issue.

It's currently implemented as a modified memory or bus (I/O) operation. It's just reading from an internally cached variable holding the interrupt number. The first INTA gives the PIC's interrupt number directly on 286+ systems; on 808x it gives all bits set. The second INTA (only on 808x, where the first had all bits set) actually gives the interrupt number, which the CPU uses to start up the INT nn handler and tick all previously discussed timings. Both of those are just like a normal memory/IO access in all respects, just with a different result source.
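Roughly, the scheme described above can be sketched like this. It is not UniPCemu's actual code (that is C); the class and method names are made up for illustration, and it only models the returned bus values, not wait states or PIC side effects.

```python
# Sketch: values seen by the CPU on successive INTA bus cycles.
# Hypothetical names; models only the behaviour described in the post.

class IntaSequencer:
    def __init__(self, is_808x: bool):
        self.is_808x = is_808x
        self.cycle = 0
        self.vector = None  # cached interrupt number from the PIC

    def latch(self, vector: int) -> None:
        """Cache the interrupt number before the INTA cycles start."""
        self.vector = vector & 0xFF
        self.cycle = 0

    def inta(self) -> int:
        """Run one INTA cycle and return the byte read from the bus."""
        if self.vector is None:
            raise RuntimeError("no interrupt pending")
        self.cycle += 1
        if self.is_808x and self.cycle == 1:
            return 0xFF  # first INTA on 808x: all bits set
        vector, self.vector = self.vector, None
        return vector    # the actual interrupt number

cpu = IntaSequencer(is_808x=True)
cpu.latch(0x08)
print(hex(cpu.inta()), hex(cpu.inta()))
```

Counting cycles instead of testing for 0xFF keeps it working even if the real vector happens to be FFh.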

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 109 of 113, by mdrejhon

User metadata
Rank Newbie
Rank
Newbie

UPDATE: I have forked the potentially useful "Lagless Vsync HOWTO" for emulator developers to a separate thread here. Even if you don't plan to ever implement it, I wanted to post the "2023 best practices" there.

GloriousCow wrote on 2023-07-01, 18:14:
mdrejhon wrote on 2023-07-01, 10:24:

Some emulators (Toni's WinUAE and Tom Harte's CLK) now use my findings to create a new lagless VSYNC mode, with sub-refresh latencies, by Present()ing frameslices during VSYNC OFF mode later blocks of scanlines to the GPU even while the GPU is already video-outputting earlier scanlines, from an invention of mine called "frameslice beam racing"

This would be extremely cool to try to implement. Right now I'm double-buffering so I know there's a bit of latency, although at 60fps with most old PC titles it's probably not much of a concern(?) But since I'm already drawing the CGA in a clock by clock fashion there's nothing technically stopping me from sending as many screen updates as I want whenever, but I have to see if it would be possible with wgpu/Vulkan. I'm sort of at the mercy of that library, unless I port to another front-end. Can it be done through SDL?

As that is a separate topic than emulating Area 5150, I have moved my reply to this to this separate thread:

HOWTO: Possible Lagless VSYNC for PC/DOS Emulator Devs (implemented in WinUAE/etc), via beam-raced tearingless VSYNC OFF

This information is useful to all emulator developers currently emulating graphics chips in a beamraced manner (such as MartyPC, but also applies to most 8-bit and 16-bit platforms too)

Founder of www.blurbusters.com and www.testufo.com
- Research Portal
- Beam Racing Modern GPUs
- Lagless VSYNC for Emulators

Reply 110 of 113, by superfury

User metadata
Rank l33t++
Rank
l33t++

Just checked Windows 95 booting (also weird overscan behaviour).

It displays the same horizontal rendering errors as the CGA demo version did now?
OK. Just found the cause: the Tseng chips will have it render double pixels on the overscan due to memory clock divide being set in 8-bit clocking mode. This is now restricted to active display only.
That fixes the overscan part of booting Windows 9x (320x200 256-color to be exact).
Though only the first row of active display is behaving with a slight skew of 1 pixel on the very first scanline of active display?
Would that perhaps be happening on the CGA as well, some common bug? Hmm....
Edit: Managed to fix it. The issue was with the horizontal/vertical total handlers updating to the next coordinate to render misbehaving.
Text mode otoh (scandisk running in Windows 95's boot) seems to use some weird timings (27 front porch, 45 back porch with 9 pixel wide character clocks?).
Although that might be how it's supposed to be, since base timing is 9 clocks on the character clock (which is set to use that in the sequencer, which applies to text mode)?
Edit: After that fixed EGA and up overscan rendering to properly render during vertical overscan. It was having the same issue as above, just a different cause (doubling pixel clocks when it shouldn't for overscan, as it isn't obeying DCR on EGA and up).
Now the screens look fine again on EGA and up, although some timings seem to be a bit off (retrace and/or horizontal total seems a bit off in some modes?).
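The gist of both overscan fixes, as a minimal sketch: pixel doubling only applies to the active display part of a scanline, never to overscan. The function and the pixel counts are purely illustrative (not taken from UniPCemu or any real mode table).

```python
# Sketch: pixel clocks spent on one scanline, with and without the bug.
# gate_to_active=False reproduces the bug where overscan is doubled too.

def scanline_clocks(active_pixels, overscan_pixels, double_width,
                    gate_to_active=True):
    """Return the number of pixel clocks used to render one scanline."""
    rate = 2 if double_width else 1
    overscan_rate = 1 if gate_to_active else rate
    return active_pixels * rate + overscan_pixels * overscan_rate

# Illustrative numbers only.
print(scanline_clocks(320, 16, True, gate_to_active=False))  # buggy
print(scanline_clocks(320, 16, True))                        # fixed
```

The extra clocks in the buggy case are what end up as a scanline that runs longer than horizontal total expects.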

CGA still seems fine (running area5150 atm to verify).
No changes so far until the elephant part at least (still the corruption on the single lines part above and below the wavey effect part). The wavey effect scrolling and wavey effect run fine btw.
It looks like the start 1/8th of the scanline is those lines with random fattening on it? It seems like it might be related to the waves a bit? From that point until the last part of the scanlines it's a solid block of overscan color it looks like? Then the final part (1/8th or so) of the scanlines is actually correct as in the official recording of the demo, displaying the thin and fat horizontal lines?

Edit: Hmmm... Just looked up VileR's video of the demo on YouTube...
https://www.youtube.com/watch?v=fWDxdoRTZPc&t=271s
Hmmm... I do see that fat/thin effect happening at the first few character clocks of those lines as well. Only in UniPCemu, most of the remainder, until about the location where the next text to wave scrolls in (its spawn point, so to speak), is a single color, the same as the area above it? Just one big block of a single color, no lines continued on. Perhaps something to do with the blanking effect you guys talked about? Is it actually used at that point in the demo?

Edit: Yay! Some progress at least: the 3D vector image part doesn't show those black renderings on the top when bouncing the object(s) against the top part anymore! 😁

https://www.youtube.com/watch?v=fWDxdoRTZPc&t=476s
Just looked at that again. The whole effect with the black bars overlapping the screen doesn't happen? Instead the whole animation just freezes, with the ship warping to the top of the screen on a static background image for a few seconds?
Edit: Also, the green part on the credits right side looks like it's overscan? So that would make that weird part to the right of it, if it's on the same scanline, the next row being rendered?
The row at the top of the image looks like it's being animated. So that would mean it's actually the bottom of the image that got pushed past vertical retrace somehow?

Also, the "See ya" part after the credits is kind of weird. It's moving right and left, but has a copy in what looks like ANSI art below it moving with it? I don't think that's supposed to happen?
Does anyone know what's happening there?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 111 of 113, by GloriousCow

User metadata
Rank Member
Rank
Member

My ultimate goal is to help other emulators achieve emulating Area5150 and help elevate PC emulation in general.

To that end, I'm excited to publish what should be a pretty useful resource, a full logic analyzer dump of the Lake (end credits) effect from Area 5150:
https://github.com/dbalsom/marty_tools/blob/m … a5150/README.md

Together with the 8088 sigrok decoder I wrote, you can see a timeline of the entire effect including instruction disassembly and t-states, CS:IP calculation, interrupts, DMA, timer clock - and crucially, VS, HS and DEN off the CRTC.

There's also a traditional cycle log in CSV format there as well, if you'd prefer automated processing. It can optionally be converted into an excel formatted file with hyperlinks via the 'excelify.py' tool.

The decode can also be visualized as a CGA video stream with events overlaid:

[Attachment: timeline.png (6.83 KiB, public domain): area 5150 isr setup timing]

Even if you're not interested specifically in conquering Area 5150 - this is a pretty useful resource for research into general timings of the IBM 5150 and the 8088. There are a lot of halts and interrupt transitions in here! Some things surprised me, like the horizontal offset for DEN. The original 50MHz capture is also available but too big for GitHub; PM me if you're interested and I'll figure out a way to send it to you.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 112 of 113, by superfury

User metadata
Rank l33t++
Rank
l33t++

Perhaps a bit related to the whole 8088 accuracy one.

Just found this description of pretty much all 808x microcoded (undocumented/missing) opcodes and their behaviour:
https://www.righto.com/2023/07/undocumented-8 … structions.html

Might be interesting to implement one day?

I have some already implemented of course (I think most of the more easy ones are already implemented, as they're widely documented):
D6 SALC (widely documented on newer CPUs)
0F POP CS (one thing I haven't implemented yet is the specific no PIQ flush behaviour). Edit: Implemented the PIQ non-flushing behaviour.
60-6F conditional jumps (widely documented)
C0 to C2, C8 to CA aliasing. Implemented as well.
C1 to C3 aliasing. Implemented as well.
C9 to CB aliasing. Implemented as well.
F1 is a LOCK alias (it actually triggers the same logic as F0). Not implemented yet in UniPCemu: it decodes F1 as a special prefix on the 80286 (where it only has an effect on the undocumented SAVEALL before locking up the CPU, waiting for a reset signal from external hardware), and as an opcode (ICEBP/INT1) on all other CPUs (including the 8086). Edit: Just implemented that one.
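The single-byte aliases in this list boil down to a small lookup, sketched below. This is not UniPCemu code; the function names are made up, and the 60-6F to 70-7F mapping is taken from the linked article's description of the 808x decoder.

```python
# Sketch: map 808x alias opcodes to the documented opcode they behave as,
# plus the result of SALC (D6). Hypothetical helper names.

_ALIASES = {
    0xC0: 0xC2,  # RET imm16 (near)
    0xC1: 0xC3,  # RET (near)
    0xC8: 0xCA,  # RETF imm16
    0xC9: 0xCB,  # RETF
    0xF1: 0xF0,  # LOCK prefix
}

def canonical_opcode_808x(opcode: int) -> int:
    """Return the documented opcode that an 808x alias decodes to."""
    if 0x60 <= opcode <= 0x6F:
        return opcode + 0x10  # conditional jumps alias to 70-7F
    return _ALIASES.get(opcode, opcode)

def salc(carry_flag: bool) -> int:
    """SALC: AL becomes FFh if CF is set, else 00h."""
    return 0xFF if carry_flag else 0x00

print(hex(canonical_opcode_808x(0x65)), hex(canonical_opcode_808x(0xC9)))
print(hex(salc(True)), hex(salc(False)))
```

POP CS (0F) is left out on purpose, since it is a distinct instruction and not an alias.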

Then the two-byte opcode holes are interesting too:
D0-D1 /6: SETMO. Set minus one. Basically sets FFh or FFFFh. Don't know about any flags modified. Edit: Probably implemented correctly now.
D2-D3 /6: Don't know what to make of this. Some unknown ALU logic (or maybe a repeated SETMO 'shift' operation using CL)? Edit: Probably implemented correctly now.
F6 /1, F7 /1 aliased to F6 /0 and F7 /0 respectively. Already implemented of course.
FE/FF /7 alias to FE/FF /6. This is pretty simple aliasing (the documented PUSH). UniPCemu performs #UD on 186+ and NOP on 808x though (due to no #UD existing on said CPU). Edit: Implemented this aliasing now.
82 is simply aliased to 80 in UniPCemu. Looks the same as the description. 82 simply sign-extends the immediate, as is documented on newer CPUs (and the description mentions it as well), so it should be no issue.
FE /2-6 are interesting. UniPCemu handles them as undefined (same as FF /7 right now), but they're apparently special 'versions' of FF /2-6, because they use the same microcode as the GRP5 opcodes (opcode FF). An interesting thing there is that they seem to behave the same, except that all memory accesses (stack and modr/m reads) are performed as byte accesses. From what I can tell, FE /2 (from what he describes) probably behaves like that as well, but I'm not 100% sure?
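For the modr/m-based ones, the aliasing happens on the reg field rather than on the opcode byte. A minimal sketch of that, plus the SETMO result value (flags deliberately left out, since they're unknown above); all names here are hypothetical.

```python
# Sketch: reg-field aliasing for the group opcodes listed above, and the
# value SETMO writes. Flags and cycle counts are not modelled.

def modrm_reg(modrm: int) -> int:
    """Extract the reg field (bits 5-3) from a modr/m byte."""
    return (modrm >> 3) & 7

def canonical_group_reg(opcode: int, reg: int) -> int:
    """Return the documented /reg that an 808x alias behaves as."""
    if opcode in (0xF6, 0xF7) and reg == 1:
        return 0  # /1 aliases /0 (TEST)
    if opcode in (0xFE, 0xFF) and reg == 7:
        return 6  # /7 aliases /6 (PUSH)
    return reg

def setmo(width_bits: int) -> int:
    """SETMO (D0/D1 /6): operand becomes all ones (FFh or FFFFh)."""
    if width_bits not in (8, 16):
        raise ValueError("operand width must be 8 or 16 bits")
    return 0xFF if width_bits == 8 else 0xFFFF

print(canonical_group_reg(0xF6, modrm_reg(0xC8)))  # F6 /1 -> /0
print(canonical_group_reg(0xFF, modrm_reg(0xF8)))  # FF /7 -> /6
print(hex(setmo(8)), hex(setmo(16)))
```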

Edit: So now I've implemented most of them. All that remains are (GRP2) SETMO, the D2-D3 /6 (perhaps also SETMO, but using CL somehow?) and FE (GRP4) /2-/6 (/7 aliased to /6) with their 8-bit modr/m and stack access reads and writes.

Edit: Interesting, the FE 'corrupted' versions might not be that useful (perhaps as a random input generator or something like that?) but POP CS, SETMO and the GRP5(opcode FFh) /7 PUSH Ev and perhaps D2-D3 /6 (unknown behaviour. Perhaps SETMO as well, but different cycle count?) might be interesting to emulate for accuracy and future demoscene?

Edit: And with the SETMO and SETMOC instructions implemented (808x only) that only leaves the weird versions of the FE /2-/7 opcodes to implement. The timings for SETMO/SETMOC are probably the same as for the other shifts I think (common shift logic after all)?
Edit: Implemented the 808x undocumented GRP4 (opcode FEh) instructions as well, using GRP5(opcode FFh) as a base (reimplementing the 808x INC/DEC 8-bit GRP4 instructions though).

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io