VOGONS


Software for the IBM PC Light Pen?

Reply 20 of 30, by reenigne

Rank: Oldbie
reenigne wrote on 2023-09-07, 12:30:

I'm not actually sure if the CGA loads the character or attribute first, now you mention it. I had been assuming character, but it's not actually very easy to tell from the schematic (and not observable from software either - at least not easily).

I've now determined that it's not observable from software, and have made myself a diagram showing when everything happens within each 16-hdot CGA cycle. Because of the pipelining induced by the various latches, it actually takes about 32 hdots from when the CRTC outputs a new address until the last pixel of the corresponding character hits the monitor cable.

Counting in hdots:
0: CRTC starts outputting address for location 0.
6: CRT address latched for location 0.
6.5: CRT RAS (Row Address Strobe) output enabled for location 0.
8: RAS for location 0. CRTC starts outputting address for location 1 (+HRES only).
8.5: CRT CAS (Column Address Strobe) output enabled for character 0.
9: CAS for character 0.
11: Character 0 data latched.
12: CRT CAS changed to odd address for attribute 0.
12.5: CAS for attribute 0.
14: CPU or CRT(+HRES) address latched for location 1.
14.5: CPU or CRT(+HRES) RAS output enabled for location 1.
15: Attribute 0 data latched (+GRPH only).
16: RAS for location 1. Attribute 0 data latched (-GRPH only). Shift register loaded and starts outputting hdot 0.
16.5: CPU or CRT(+HRES) CAS output enabled for character 1. Output buffer starts outputting hdot 0.
17: CAS for character 1. Shift register starts outputting hdot 1.
17.5: Output buffer starts outputting hdot 1.
18: Shift register starts outputting hdot 2.
18.5: Output buffer starts outputting hdot 2.
19: Character 1 data latched. Shift register starts outputting hdot 3.
19.5: Output buffer starts outputting hdot 3.
20: CRT(+HRES) CAS changed to odd address for attribute 1. Shift register starts outputting hdot 4.
20.5: CAS for attribute 1. Output buffer starts outputting hdot 4.
21: Shift register starts outputting hdot 5.
21.5: Output buffer starts outputting hdot 5.
22: Shift register starts outputting hdot 6.
22.5: Output buffer starts outputting hdot 6.
23: Shift register starts outputting hdot 7.
23.5: Output buffer starts outputting hdot 7.
24: Attribute 1 data latched (+HRES-GRPH only). Shift register loaded and starts outputting hdot 8.
24.5: Output buffer starts outputting hdot 8.
25: Shift register starts outputting hdot 9.
25.5: Output buffer starts outputting hdot 9.
(and so on for hdots 10-15).

This sequence overlaps (offset by 16 hdots) the next cycle over.
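
The pixel-output half of the sequence above can be sketched in a few lines of Python. This only models the shift register and output buffer steps, extrapolates the "(and so on for hdots 10-15)" entries linearly, and uses function and constant names made up for illustration; it is not derived from the schematic.

```python
# Times are in hdots, measured from the moment the CRTC starts
# outputting the address for location 0 (time 0 in the list above).
SHIFT_LOAD = 16.0      # shift register loaded, starts outputting hdot 0
BUFFER_DELAY = 0.5     # output buffer lags the shift register by half an hdot
HDOTS_PER_CYCLE = 16   # one CGA cycle


def shift_register_time(hdot):
    """Time at which the shift register starts outputting the given hdot."""
    return SHIFT_LOAD + hdot


def output_buffer_time(hdot):
    """Time at which the output buffer starts outputting the given hdot."""
    return shift_register_time(hdot) + BUFFER_DELAY


def last_hdot_time():
    """Time at which the final hdot of the 16-hdot cycle reaches the buffer."""
    return output_buffer_time(HDOTS_PER_CYCLE - 1)
```

Under these assumptions `last_hdot_time()` comes out at 31.5, which is consistent with the "about 32 hdots" figure.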

It's unfortunate that attribute (odd) byte 1 is not latched in +HRES+GRPH mode, or more interesting things could be done with this improper mode. But that would have required more gates and the CGA doesn't have enough VRAM for a 640x200x4 image.

Reply 21 of 30, by superfury

Rank: l33t++

OK. So basically 8 clocks to fetch each byte from VRAM, with the latches loading and the shifting starting once both bytes have been read in text mode.

How does that combine with horizontal setup timings? UniPCemu shifts out every 1 cycle (80-column text or 640 graphics) or every 2 cycles, depending on the double-width setting (80 vs 40 column mode) that's set for the mode.
What about the first byte fetched from RAM?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 22 of 30, by reenigne

superfury wrote on 2023-09-08, 15:11:

How does that combine with horizontal setup timings? UniPCemu shifts out every 1 cycle (80-column text or 640 graphics) or every 2 cycles, depending on the double-width setting (80 vs 40 column mode) that's set for the mode.
What about the first byte fetched from RAM?

That's all a matter of the display enable and horizontal sync logic, which is separate from the pixel sequencing stuff. The timings of these signals are just tweaked to put the overscan in the right place with respect to the active pixels, and to horizontally centre the image, respectively.

Reply 23 of 30, by superfury


So those 16 clocks of fetching the character and attribute (and the first byte in graphics modes) happen before horizontal total? Or is it actually the cause of the "+1" (or more) on the horizontal total registers of all CGA/EGA+/MDA cards? So does that mean the display start (start address register) and related horizontal/vertical timings (vertical timings and horizontal display skew etc. on EGA+) that apply to the first scanline and first character clock are always loaded on/right before the final clock of horizontal display (double bytes and 8 clocks (so 8 clocks x2) on text mode CGA)?

So in text mode the above first scanline displacements (and memory addresses etc.) and character/attribute are loaded/increased 16 clocks before horizontal total?

Reply 24 of 30, by Calvero

Rank: Member
reenigne wrote on 2023-09-03, 15:48:
GloriousCow wrote on 2023-09-03, 15:07:

Kinda like the NES Zapper.

Kind of, except that if I remember correctly the NES zapper didn't actually find the position of the sensor using the timing of the raster beam - it replaced the target with a white square on a black background for a frame when you pressed the trigger, and just recorded whether a light pulse was received (hit) or not (miss). So you could hit the target every time by pointing the zapper at a light bulb!

The light bulb cheat doesn't work with a NES Zapper.

Reply 25 of 30, by reenigne

superfury wrote on 2023-09-08, 19:07:

So those 16 clocks of fetching the character and attribute (and the first byte in graphics modes) happen before horizontal total?

Horizontal total isn't relevant to the sequence I mentioned above (except in so far as the memory addresses will be non-sequential). Horizontal total is a concept entirely internal to the CRTC, whereas the sequence is what happens to the memory addresses after they leave the CRTC. I realise we're talking at cross-purposes a bit here since I'm describing what the hardware does whereas you're coming at it from the point of view of an emulator author - clearly an emulator isn't going to do all that pixel sequencing stuff during overscan as that would be extra work for no benefit, but the hardware does it because it simplifies things (and it also helps with DRAM refresh, but that's another story).

superfury wrote on 2023-09-08, 19:07:

Or is it actually the cause of the "+1" (or more) on the horizontal total registers of all CGA/EGA+/MDA cards?

No, that's because the CRTC needs to know a cycle before the end of the line starts in order to set things up in advance.

superfury wrote on 2023-09-08, 19:07:

So does that mean the display start (start address register) and related horizontal/vertical timings (vertical timings and horizontal display skew etc. on EGA+) that apply to the first scanline and first character clock are always loaded on/right before the final clock of horizontal display (double bytes and 8 clocks (so 8 clocks x2) on text mode CGA)?

I'm not sure what you mean by "loaded" here. I don't know for certain, but it's quite possible that inside the CRTC, the start address register is copied to the "address of start of row" internal register on the last character of the last row (i.e. when column == horizontal total register value and row == vertical total register value) so that "address of start of row" doesn't suffer a race condition by being read and written on the same cycle.
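
A toy model of that guess, as a sketch only: the class and its fields are invented for illustration, each row is a single scanline, and the per-row advance by the number of displayed characters is an assumption of the sketch rather than something established in this thread.

```python
class ToyCrtc:
    """Hypothetical CRTC model: start address is copied at end of frame."""

    def __init__(self, horizontal_total, horizontal_displayed,
                 vertical_total, start_address=0):
        self.horizontal_total = horizontal_total
        self.horizontal_displayed = horizontal_displayed
        self.vertical_total = vertical_total
        self.start_address = start_address  # CPU-writable register
        self.row_start = start_address      # internal "address of start of row"
        self.column = 0
        self.row = 0

    def tick(self):
        """Advance one character clock and return the address output for it."""
        address = self.row_start + self.column
        if self.column == self.horizontal_total:
            self.column = 0
            if self.row == self.vertical_total:
                # Last character of the last row: copy the start address,
                # so the internal register is written here and only read
                # on other cycles.
                self.row = 0
                self.row_start = self.start_address
            else:
                self.row += 1
                self.row_start += self.horizontal_displayed
        else:
            self.column += 1
        return address
```

In this model a write to `start_address` in the middle of a frame has no effect on the addresses output until the frame wraps around.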

superfury wrote on 2023-09-08, 19:07:

So in text mode the above first scanline displacements (and memory addresses etc.) and character/attribute are loaded/increased 16 clocks before horizontal total?

No, the VRAM circuitry can't look ahead of what the CRTC is doing. I think part of the difficulty here is that you are thinking of "horizontal total" as being a specific moment in time, but things take a non-zero time to percolate through the graphics pipeline from the CRTC to the screen. So "horizontal total" from the CRTC's perspective is not the same thing as the end of overscan in terms of what's being loaded from VRAM, which is not the same thing as the pixels being scanned out. That's what I was trying to illustrate with the sequence.

Reply 26 of 30, by superfury


So the fetching actually starts on character clock #0 with the output for character clock #0 (1 clock late) starting at character clock #1? So it's basically an extra overscan character clock before active display starts? And all horizontal active display timings should be shifted late by 1 clock (wrt the horizontal display register)?

Reply 27 of 30, by reenigne

superfury wrote on 2023-09-10, 09:58:

So the fetching actually starts on character clock #0 with the output for character clock #0 (1 clock late) starting at character clock #1? So it's basically an extra overscan character clock before active display starts? And all horizontal active display timings should be shifted late by 1 clock (wrt the horizontal display register)?

This still isn't a question I can answer, because the time when a thing happens has to be specified relative to the time when some other thing happens. What do you mean by "the fetching actually starts"? From the sequence in my earlier message, you can see there are quite a few possible steps that could refer to. Likewise with "the output for character clock #0". And "an extra overscan character clock" - measured from when to when? The sequence isn't necessarily relevant to emulator development as it's such a low-level detail of how the CGA hardware works. The important thing for emulators is that they produce the correct output and that software can't tell if it's running on an emulator or real hardware.

Bringing this back around to the topic, the light pen does introduce a situation where the exact timing matters because the system as a whole (emulator and the software running on it) will have to compensate for the delay in the system from the CRTC's idea of what the position is, through the CGA pixel generation logic, monitor, light pen and back to the CRTC for the light pen position latch. It's difficult to tell what the correct adjustment should be - I think the best way would be to actually try it out - calibrate my light pen software on real hardware and then try the same software on an emulator. I'll dig out my light pen later and see if I can get it working.
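
As a sketch of what that compensation looks like on the software side: the 6845 latches a 14-bit memory address into its light pen registers, and the program has to subtract a calibrated delay before turning it into a row and column. The function name and the `delay_chars` parameter are hypothetical, and the right delay value is exactly the unknown discussed above; it has to be measured.

```python
ADDRESS_MASK = 0x3FFF  # the 6845 light pen registers hold a 14-bit address


def light_pen_cell(latched_address, start_address, columns_per_row, delay_chars):
    """Convert a latched light pen address to a (row, column) pair.

    delay_chars is a hypothetical calibration constant: the number of
    character clocks the whole chain (pixel pipeline, monitor, pen)
    lags behind the CRTC's own idea of the position.
    """
    offset = (latched_address - start_address - delay_chars) & ADDRESS_MASK
    return divmod(offset, columns_per_row)
```

For example, with an 80 column screen starting at address 100 and an assumed delay of 3 character clocks, a latched address of 268 maps to row 2, column 5.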

Reply 28 of 30, by superfury


I've now disabled the look-ahead mechanism and restored it to its old state.
Instead, the active display now always starts and ends one character clock later on all video cards.
So each scanline will match the behaviour of your documented fetching and rendering. For that first character clock spent fetching from VRAM (clocks 0-15 in your example), UniPCemu will now simply render overscan instead of what's in the latches (delaying the output until the latches are loaded).
Reaching the horizontal total with the horizontal counter has its old behaviour again (the normal CGA-documented loading of the new scanline's timings; the same goes for vertical total).
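
A rough illustration of that one-character-clock delay, not UniPCemu's actual code: the function below is hypothetical and just labels each character clock of a scanline as overscan or as the index of the character being displayed. It relies on the 6845 convention that a scanline has horizontal total + 1 character clocks.

```python
def render_plan(horizontal_displayed, horizontal_total, delay=1):
    """Return, per character clock, the displayed character index or None.

    None stands for overscan/blanking. delay is the number of character
    clocks by which the active display lags the CRTC's own counting.
    """
    plan = []
    for clock in range(horizontal_total + 1):
        if delay <= clock < horizontal_displayed + delay:
            plan.append(clock - delay)  # character fetched `delay` clocks ago
        else:
            plan.append(None)           # nothing latched yet, or past the end
    return plan
```

With 4 displayed characters and a horizontal total of 6, the first clock is overscan and the four characters occupy clocks 1 to 4.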

The CGA/MDA will fetch the character first on a scanline, using known half clocks (each worth 4 rendered pixel units of time, doubled in 40 column text modes). As a result, a character or attribute is fetched from VRAM every 8 CGA clocks (but not rendered until the clock after the attribute). The fetches are timed in parallel with the normal rendering, as on VGA and other graphics cards (so both tick at the same time).

So from clocks 0-8 in 80 column mode, 0-15 in 40 column text mode. The fetching is at 0 and 4 in 80 column modes, 0 and 8 in 40 column modes, although each fetch only takes a single cycle at the moment.
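
In code form the fetch points described above amount to something like the following. This is a hypothetical helper rather than anything from the UniPCemu source, with the half clock taken as 4 clocks in 80 column modes and 8 in 40 column modes.

```python
def fetch_clocks(forty_column):
    """Return the clock offsets at which each text-mode byte is fetched."""
    half_clock = 8 if forty_column else 4  # doubled in 40 column modes
    return {"character": 0, "attribute": half_clock}
```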
When on the VGA, the active display can be moved to a later point in time using the horizontal display skew register setting (0-3 character clocks). This will move the fetching start now as well (to 8 or 16 character clocks before the active display signal starts, although done using a look-ahead of the horizontal timing).

Edit: Hmmm... VGA 8BPP modes and up seem to be broken on the ET4000 for some reason (including mode 13h). 4BPP (16 colors) and less seem to run without issues though.
WhatVGA also crashes on the i440fx now?

Reply 29 of 30, by reenigne

reenigne wrote on 2023-09-10, 10:31:

Bringing this back around to the topic, the light pen does introduce a situation where the exact timing matters because the system as a whole (emulator and the software running on it) will have to compensate for the delay in the system from the CRTC's idea of what the position is, through the CGA pixel generation logic, monitor, light pen and back to the CRTC for the light pen position latch. It's difficult to tell what the correct adjustment should be - I think the best way would be to actually try it out - calibrate my light pen software on real hardware and then try the same software on an emulator. I'll dig out my light pen later and see if I can get it working.

I found my light pen and connected it, but unfortunately it seems to be dead (switch works, but not seeing anything on the strobe signal). I also can't figure out how to take it apart to fix it.

I did find this thread on VCFed which points to some light pen software: https://forum.vcfed.org/index.php?threads/tan … annigans.71591/ . The links to the files are broken but perhaps messaging creepingnet would unearth them.