VOGONS


Reply 40 of 57, by clb

mkarcher wrote on 2023-08-22, 18:24:

I don't want to spoil the fun in this thread, but as I understand it, the "RNG" is just reading the currently displayed address (which is actually a very great demo hack if you try to race the beam!). Couldn't we get similar results by just reading the low byte of system timer 0? Perhaps the VGA "RNG" is more interesting, as it doesn't increase during the blanking interval, but most of the RNG effect seems to be that you can access a counter that is not synchronized to the processor clock.

Yeah, that would indeed also be a source of entropy. Also one could set Sound Blaster mic input volume to maximum and record entropy from there. Or record entropy by collecting timer timestamps for when the user triggers a keyboard interrupt or clicks on a mouse, or examine the low bits of mouse cursor moving directions. No claims to this being the only invented source of entropy on a PC system, there are definitely several. Nor the most practical, or ubiquitous or anything like that.

It is just fun to discover something new.
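As a rough illustration of the timestamp idea mentioned above, the sketch below folds the low byte of each event timestamp into a small pool. The `entropy_pool` type, the `pool_add_timestamp` name and the rotate-by-5 mixing step are assumptions made up for this example; none of them come from RNG.EXE.

```c
#include <stdint.h>

/* Hypothetical 32-bit entropy pool; illustrative only. */
typedef struct {
    uint32_t state;
} entropy_pool;

/* Rotate left by r bits (0 < r < 32). */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));
}

/* Mix the low byte of a timer timestamp (taken when a key press or mouse
   click arrived) into the pool. Only the low byte is used, since the high
   bits of the timestamp are far more predictable. */
static void pool_add_timestamp(entropy_pool *p, uint16_t timer_ticks)
{
    p->state = rotl32(p->state, 5u) ^ (uint32_t)(timer_ticks & 0xFFu);
}
```

In a real program the timestamp would be captured inside the keyboard or mouse interrupt handler; that part is left out here.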

Reply 42 of 57, by clb

Pierre32 wrote on 2023-08-22, 19:26:
mkarcher wrote on 2023-08-22, 18:24:

I don't want to spoil the fun in this thread

Yet you continued typing.

Just jokes. It's all interesting stuff.

It's alright, Mkarcher intended no pun. James Acaster knows how these things go: https://www.youtube.com/watch?v=wEPHYbRDQb0

"They'll hate you more than the time you openly defended murder." 😁

Reply 43 of 57, by clb


Uploaded a new version of RNG Adventure with some new features and cleanups, for all the two other people in the world who will be able to play. The game is now in the same RNG.zip file as the RNG test tool: https://github.com/juj/crt_terminator/raw/mai … DOS/bin/RNG.zip


Reply 44 of 57, by ViTi95

clb wrote on 2023-08-22, 18:40:
mkarcher wrote on 2023-08-22, 18:24:

I don't want to spoil the fun in this thread, but as I understand it, the "RNG" is just reading the currently displayed address (which is actually a very great demo hack if you try to race the beam!). Couldn't we get similar results by just reading the low byte of system timer 0? Perhaps the VGA "RNG" is more interesting, as it doesn't increase during the blanking interval, but most of the RNG effect seems to be that you can access a counter that is not synchronized to the processor clock.

Yeah, that would indeed also be a source of entropy. Also one could set Sound Blaster mic input volume to maximum and record entropy from there. Or record entropy by collecting timer timestamps for when the user triggers a keyboard interrupt or clicks on a mouse, or examine the low bits of mouse cursor moving directions. No claims to this being the only invented source of entropy on a PC system, there are definitely several. Nor the most practical, or ubiquitous or anything like that.

It is just fun to discover something new.

I wonder if there is a faster way to have hardware RNG on x86 computers (without any special instruction like RDRAND, just i386 instructions). This method requires 2 OUT + 2 IN instructions per generated number, which is a bit slow. Same for using PIT timer as a randomizer, it requires 1 OUT + 2 IN.
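For reference, the "1 OUT + 2 IN" PIT read could look roughly like the sketch below. Port access goes through a hypothetical `port_io` callback pair so the code also builds on a modern host; on a DOS compiler the callbacks would simply wrap `outp()`/`inp()`. It assumes channel 0 is programmed for low-byte-then-high-byte access, and the `fake_*` helpers are only a stand-in for real hardware.

```c
#include <stdint.h>

/* Hypothetical port access callbacks (would wrap outp()/inp() under DOS). */
typedef struct {
    void    (*out8)(uint16_t port, uint8_t value);
    uint8_t (*in8)(uint16_t port);
} port_io;

#define PIT_CH0_DATA 0x40u  /* channel 0 counter port */
#define PIT_COMMAND  0x43u  /* mode/command port */

/* Latch and read the 16-bit channel 0 count: 1 OUT + 2 IN. */
static uint16_t pit_read_counter0(const port_io *io)
{
    uint8_t lo, hi;
    io->out8(PIT_COMMAND, 0x00u);   /* counter latch command, channel 0 */
    lo = io->in8(PIT_CH0_DATA);     /* low byte first */
    hi = io->in8(PIT_CH0_DATA);     /* then high byte */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Fake PIT for host-side experiments: fixed count, access counters. */
static uint16_t fake_count = 0xBEEFu;
static unsigned fake_outs, fake_ins, fake_phase;

static void fake_out8(uint16_t port, uint8_t value)
{
    (void)port; (void)value;
    fake_outs++;
    fake_phase = 0;                 /* next read returns the low byte */
}

static uint8_t fake_in8(uint16_t port)
{
    uint8_t v = (uint8_t)(fake_phase ? (fake_count >> 8) : (fake_count & 0xFFu));
    (void)port;
    fake_ins++;
    fake_phase ^= 1u;
    return v;
}
```

If only the low byte is wanted as a random sample, the second IN can be dropped only when the channel is reprogrammed for low-byte-only access; otherwise the byte order of later reads gets out of step.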

https://www.youtube.com/@viti95

Reply 45 of 57, by clb

ViTi95 wrote on 2023-08-22, 21:20:

I wonder if there is a faster way to have hardware RNG on x86 computers (without any special instruction like RDRAND, just i386 instructions). This method requires 2 OUT + 2 IN instructions per generated number, which is a bit slow. Same for using PIT timer as a randomizer, it requires 1 OUT + 2 IN.

I was thinking that very thing as well last weekend. For testing, I implemented a hardware RNG into CRT Terminator that is just behind a single IN port, via

unsigned char rng = inp(0x125);

i.e. no indexed subregister business, and as a result, no need to disable() + enable() interrupts either. IIRC I got about 4000 kbit/sec on my Cyrix 486 80 MHz with the ISA bus at 10 MHz, though I'll have to verify that number.

(I am not yet 100% sure if we will ship that RNG as part of CRTT though, since our floorplan is *extremely* constrained at the moment. It would be a cool homage though)

You could try changing the Lightpen RNG code to only do one uchar read from the indexed subregister 03D5h/11h and ignore 03D5h/10h completely, and then remove the interrupt disable+enables and hence the write to 03D4h index register. If you can guarantee that the 3D4h index won't change underneath, that will let you prototype the absolute max performance.
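A minimal sketch of that fast path, assuming the light pen readback has already been enabled and that nothing else touches the 3D4h index in the meantime. The `port_io` callbacks, the `lpen_*` names and the fake CRTC are hypothetical helpers for this example; they are not taken from RNG.EXE.

```c
#include <stdint.h>

/* Hypothetical port access callbacks (would wrap outp()/inp() under DOS). */
typedef struct {
    void    (*out8)(uint16_t port, uint8_t value);
    uint8_t (*in8)(uint16_t port);
} port_io;

#define CRTC_INDEX    0x3D4u  /* CRTC index register (color modes) */
#define CRTC_DATA     0x3D5u  /* CRTC data register */
#define CRTC_LPEN_LOW 0x11u   /* index 11h: light pen address, low byte */

/* Select index 11h once, outside the sampling loop. */
static void lpen_select_low(const port_io *io)
{
    io->out8(CRTC_INDEX, CRTC_LPEN_LOW);
}

/* One sample costs a single IN, as long as the index stays at 11h. */
static uint8_t lpen_read_fast(const port_io *io)
{
    return io->in8(CRTC_DATA);
}

/* Fake CRTC for host-side experiments: a counter that advances per read. */
static uint8_t  fake_index;
static uint8_t  fake_counter;
static unsigned fake_index_writes;

static void fake_out8(uint16_t port, uint8_t value)
{
    if (port == CRTC_INDEX) {
        fake_index = value;
        fake_index_writes++;
    }
}

static uint8_t fake_in8(uint16_t port)
{
    if (port == CRTC_DATA && fake_index == CRTC_LPEN_LOW)
        return fake_counter++;
    return 0xFFu;
}
```

Any interrupt handler or TSR that writes to 3D4h would silently redirect the reads, which is why the original code brackets the access with interrupt disable/enable.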

If that is still not fast enough, the only remedy I think would be to start to decimate calls to the port, i.e. run a LFSR for several rng calls, and then only e.g. every 4th or 8th call wash new entropy into the LFSR state. The LFSR function should then be changed a little bit to advance more than one bit at an iteration though to retain randomness.
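The decimation idea could be sketched as follows. The 16-bit Galois LFSR with toggle mask 0xB400 is a well-known maximal-length example and is used here only as a stand-in; the `decimated_rng` type, the `period` field and the `fake_hw` source are assumptions for illustration, and the actual LFSR in RNG.EXE may differ.

```c
#include <stdint.h>

/* Hypothetical hardware read, e.g. a wrapper around inp(). */
typedef uint8_t (*entropy_fn)(void);

typedef struct {
    uint16_t   lfsr;     /* LFSR state, must never be zero */
    unsigned   calls;    /* number of outputs produced so far */
    unsigned   period;   /* wash in hardware entropy every `period` calls (> 0) */
    entropy_fn read_hw;
} decimated_rng;

/* One step of a 16-bit Galois LFSR (x^16 + x^14 + x^13 + x^11 + 1). */
static uint16_t lfsr_step(uint16_t s)
{
    unsigned lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0xB400u;
    return s;
}

/* Return one byte. The port is touched only on every `period`-th call;
   the LFSR advances 8 bits per call so consecutive outputs do not overlap. */
static uint8_t decimated_rng_next(decimated_rng *r)
{
    int i;
    if (r->calls++ % r->period == 0) {
        r->lfsr ^= r->read_hw();
        if (r->lfsr == 0)
            r->lfsr = 1;             /* avoid the all-zero lock-up state */
    }
    for (i = 0; i < 8; i++)
        r->lfsr = lfsr_step(r->lfsr);
    return (uint8_t)(r->lfsr & 0xFFu);
}

/* Fake entropy source for host-side experiments. */
static unsigned fake_hw_reads;
static uint8_t fake_hw(void)
{
    fake_hw_reads++;
    return 0x5Au;
}
```

With `period` set to 4 or 8, the I/O cost per output drops accordingly, at the price of most outputs being purely pseudo-random between washes.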

Reply 46 of 57, by clb

BloodyCactus wrote on 2023-08-22, 17:28:

hmmm my tandy 1000 sx has a lightpen interface built in (but i have no light pen for it)... its just not vga 😀

If the lightpen interface has a manual way to latch the pen address register, then I think it would be possible to use it as well. Earlier I commented that the RNG.EXE code was found not to work on IBM EGA adapter, but EGA cards do have a manual way to latch the pen address register for diagnostic purposes, so I think that will make it work - though the code needs updating from the current VGA format to match that EGA mechanism.

If Tandy 1000 sx lightpen interface has that kind of manual testing mechanism as well, then it might be possible to work.

Reply 47 of 57, by rasz_pl

mkarcher wrote on 2023-08-22, 18:24:

I don't want to spoil the fun in this thread, but as I understand it, the "RNG" is just reading the currently displayed address (which is actually a very great demo hack if you try to race the beam!).

Only if its dummy implementation is always triggering? Would reprogramming CRTC Horizontal Total/Blanking to make most of the screen blank (stopping video core from fetching new data) affect it into returning same value most of the time? One could also slow video output down substantially (soft15khz).

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 48 of 57, by mkarcher

rasz_pl wrote on 2023-08-23, 01:35:

Only if its dummy implementation is always triggering?

As I understood the quote from the Headland documentation, it is "never triggering", not "always triggering". The lightpen register is counting with the current address counter until the light pen emits the signal that it got hit by the beam. That's the point in time the light pen register freezes. As there is no lightpen trigger input on the VGA anymore, there is no way to freeze the counter, so it happily keeps counting all the time.

rasz_pl wrote on 2023-08-23, 01:35:

Would reprogramming CRTC Horizontal Total/Blanking to make most of the screen blank (stopping video core from fetching new data) affect it into returning same value most of the time?

That's a very interesting question! I expect different clones to behave differently in this regard. The datasheet of the 6845 (which the EGA/VGA CRTC is loosely based on) seems to indicate that at the start of blanking, the address counter is captured into a shadow register, but it keeps counting into the blanking interval. At the end of blanking, it is restored from that shadow register. On the EGA/VGA, on the other hand, there is the programmable line offset register, so there likely is a shadow register in the CRTC that gets incremented by the offset register each line (or by twice the offset register? I vaguely remember that the offset register is to be set to half the desired offset in byte mode, a quarter of the desired offset in word mode, and so on), and at the start of each scan line the address counter is loaded from that shadow register. In contrast to the 6845, the displayed line length has no effect on the start address of the next scan line.

rasz_pl wrote on 2023-08-23, 01:35:

One could also slow video output down substantially (soft15khz).

Indeed, using the lowest available dot clock (25.xxx MHz / 2) and the widest available character width (9 pixels) and then REP INSBing the light pen register will likely provide a nice set of samples to examine the address counter behaviour.

Reply 49 of 57, by ViTi95

clb wrote on 2023-08-22, 22:19:

...
If that is still not fast enough, the only remedy I think would be to start to decimate calls to the port, i.e. run a LFSR for several rng calls, and then only e.g. every 4th or 8th call wash new entropy into the LFSR state. The LFSR function should then be changed a little bit to advance more than one bit at an iteration though to retain randomness.

I've been testing a stupid idea that works kinda well (at least for the Doom fuzzy renderer), which is to use the parity flag as a mixer function for the input data:

mov edx,0x40 ; Use PIT as input

...

loop(
...
in al,dx ; Read port -> 6 cycles (386), 8 cycles (486)

xor ebp,eax ; XOR with random unused register, set PF flag -> 2 cycles (386), 1 cycles (486)
lahf ; Get flags (PF -> AH) -> 2 cycles (386), 3 cycles (486)
mov al,ah ; Move flags to lower register (AH -> AL) -> 2 cycles (386), 1 cycles (486)
and eax,0x4 ; Use PF as pointer to the random table values -> 2 cycles (386), 1 cycles (486)

mov ebx,[_rndtable+eax] ; +80 or -80 depending on PF -> 4 cycles (386), 1 cycles (486)
; TOTAL: 18 cycles (386), 15 cycles (486)
...
)
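For readers less fluent in x86, the loop body above can be approximated in C as below. This is only a sketch: `pf_of` mimics the x86 parity flag (set when the low byte of the result has an even number of 1 bits), `pf_mixer` plays the role of EBP, and the order of the two entries in `rnd_table` is an assumption, since the original only says "+80 or -80 depending on PF".

```c
#include <stdint.h>

/* Mimic the x86 PF flag: 1 when the byte has an even number of set bits. */
static unsigned pf_of(uint8_t b)
{
    b ^= (uint8_t)(b >> 4);
    b ^= (uint8_t)(b >> 2);
    b ^= (uint8_t)(b >> 1);
    return (unsigned)(~b) & 1u;
}

/* Accumulator that stands in for the "random unused register" (EBP). */
typedef struct {
    uint32_t acc;
} pf_mixer;

/* Two-entry table indexed by PF; which entry holds which sign is assumed. */
static const int32_t rnd_table[2] = { -80, 80 };

/* XOR the new port sample into the accumulator, then pick a table entry
   from the parity of the accumulator's low byte. */
static int32_t pf_mixer_next(pf_mixer *m, uint8_t sample)
{
    m->acc ^= sample;
    return rnd_table[pf_of((uint8_t)m->acc)];
}
```

Because the accumulator carries state between iterations, two identical port reads in a row do not necessarily select the same table entry.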

https://www.youtube.com/@viti95

Reply 50 of 57, by clb


Very nice use of the flags register! Awesome to see your assembly skills so sharp.

I thought I'd try to incorporate that into my code, where I was calculating the parity of an "unsigned char rng" with

rng ^= rng >> 4;
rng ^= rng >> 2;
rng ^= rng >> 1;
return rng & 1;

that netted 131.047 kbits/second RNG generation.

I switched that to the same code from you

asm {
mov al, [rng]
or al, al
lahf
mov al, ah
shr al, 1
shr al, 1
and al, 1
mov [rng], al
}
return rng;

adapted a little bit for the 8088 instruction set (which only has shift-by-1 immediates), although my result was then only 129.949 kbits/second when running on a 486. This is with Borland Turbo C++ 3.0 targeting the 8088 instruction set. Hmm, need to dig more into it. Although performance is not the critical thing in this usage.

Reply 51 of 57, by ViTi95

User metadata
Rank Member
Rank
Member

Maybe this can optimize the parity calculation even further for 486 CPUs, @clb, but it may be slower on 8088/286 CPUs, as shifting by the CL register is usually slow:

 Compute parity in parallel

unsigned int v; // word value to compute the parity of
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v &= 0xf;
return (0x6996 >> v) & 1;

The method above takes around 9 operations, and works for 32-bit words. It may be optimized to work just on bytes in 5 operations by removing the two lines immediately following "unsigned int v;". The method first shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of v. Next, the binary number 0110 1001 1001 0110 (0x6996 in hex) is shifted to the right by the value represented in the lowest nibble of v. This number is like a miniature 16-bit parity-table indexed by the low four bits in v. The result has the parity of v in bit 1, which is masked and returned.

Thanks to Mathew Hendry for pointing out the shift-lookup idea at the end on Dec. 15, 2002. That optimization shaves two operations off using only shifting and XORing to find the parity.

(From https://graphics.stanford.edu/~seander/bithacks.html, an awesome web with C bit fiddling tricks)
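Applied to a single byte, as the quoted text suggests, the shift-lookup trick could be written as below, next to the plain XOR fold from earlier in the thread for comparison. The function names are made up for this example.

```c
#include <stdint.h>

/* Byte parity via the 0x6996 mini lookup table: fold the high nibble into
   the low nibble, then index the 16-bit constant by the resulting nibble.
   Returns 1 for an odd number of set bits. */
static unsigned parity8_lookup(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v &= 0x0Fu;
    return (0x6996u >> v) & 1u;
}

/* Reference: the three-step XOR fold, which gives the same result. */
static unsigned parity8_fold(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}
```

Note that both functions return 1 for odd parity, which is the opposite polarity of the x86 PF flag used in the LAHF variant; for RNG purposes the inversion does not matter.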

https://www.youtube.com/@viti95

Reply 53 of 57, by wbc


seems I came late to the party :D

Indeed, as mkarcher already pointed out, this Light Pen position readback basically works as a readback of the current CRTC position pointer, letting you find out where the CRT beam is, theoretically with 8/9 pixels and 1-2 line precision, depending on . Actually, as the VGA runs totally asynchronously from the CPU (unlike e.g. CGA on a genuine IBM PC), the actual precision is more like 24-64 pixels off or even worse, depending on the I/O access performance of the CRTC registers. And since this feature is not supported by a fair bunch of VGA clones, it is pointless to rely on: CRT beam tracking can be more or less precisely simulated by synchronizing IRQ0 or i8254 PIT channel 2 (aka PC Speaker) to the VGA vertical retrace and reading back the current counter value to estimate the current beam position (and unlike the VGA Light Pen Position, PIT counters can be latched, so you don't have to worry about high byte carryover).
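The PIT-based beam estimate described above boils down to simple proportional arithmetic, sketched here in portable C99 rather than for a DOS compiler. It assumes the counter was restarted at a known point of the vertical retrace and that the number of PIT ticks per frame was measured beforehand; the `frame_timing` type and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of one video frame in PIT terms. */
typedef struct {
    uint32_t ticks_per_frame;  /* PIT ticks from one retrace to the next (measured) */
    uint16_t total_lines;      /* total scanlines per frame, including blanking */
} frame_timing;

/* The PIT counts down, so elapsed ticks = reload value - current value,
   taken modulo 65536 (a reload value of 0 means 65536). */
static uint16_t pit_elapsed(uint16_t reload, uint16_t current)
{
    return (uint16_t)(reload - current);
}

/* Estimate which scanline the beam is on, counted from the sync point. */
static uint16_t estimate_scanline(const frame_timing *t, uint32_t elapsed_ticks)
{
    uint32_t in_frame = elapsed_ticks % t->ticks_per_frame;
    return (uint16_t)(((uint64_t)in_frame * t->total_lines) / t->ticks_per_frame);
}
```

As a purely illustrative example, with 17000 ticks per frame and 449 total lines, 8500 elapsed ticks map to line 224. One PIT tick is a sizeable fraction of a scanline at VGA line rates, so the estimate is good to about a line at best.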

In conclusion, this is just another prime example of a half-baked feature that is potentially usable but done wrong enough to lose any of its advantages. The same goes for the VGA Vertical Retrace IRQ, which on most VGA cards (especially PCI/AGP) is routed anywhere but IRQ2 and is triggered at the start of the active screen, while using the end of the active screen, or at least the VSYNC leading edge, would make more sense (as the CRTC Start Address and Pixel Panning registers, used to perform smooth hardware scrolling, are latched for the next frame at the leading and trailing VSYNC edge, respectively).

Anyway, I made another small testing utility, which enables Light Pen Position readback and REP INSB's the corresponding CRTC registers to the disk, so you can examine how it works on your VGA card. Additionally, it enables the Video Status Mux (which is used in EGA for diagnostics purposes, exposing 2 of 8 bits of the VGA-RAMDAC pixel bus in the Input Status Register (0x3DA) bits 5-4) to feed back bits 6 and 7, with the frame buffer cleared to 00 and the border set to color F0, then similarly dumps port 3DA contents to the disk. All of this is done in 320x200 16-color mode, 256-color chained mode 0x13, and Mode-X. Sources included, as always :)
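When looking through such a 3DA dump, each byte can be split into its fields along these lines. The bit positions for display-disabled (bit 0) and vertical retrace (bit 3) follow the commonly documented VGA Input Status #1 layout, and bits 5-4 are the mux feedback described above; the `isr1_fields` type and `decode_isr1` name are made up for this sketch and are not part of LPENTEST.

```c
#include <stdint.h>

/* Fields of one Input Status #1 (0x3DA) sample. Hypothetical helper type. */
typedef struct {
    unsigned display_disabled;  /* bit 0: 1 during horizontal/vertical blanking */
    unsigned vretrace;          /* bit 3: 1 during vertical retrace */
    unsigned diag;              /* bits 5-4: the two pixel bits selected by the mux */
} isr1_fields;

static isr1_fields decode_isr1(uint8_t v)
{
    isr1_fields f;
    f.display_disabled = v & 1u;
    f.vretrace         = (v >> 3) & 1u;
    f.diag             = (v >> 4) & 3u;
    return f;
}
```

With the frame buffer cleared to 00 and the border set to F0, the `diag` field would be expected to read 0 inside the active area and 3 in the border, assuming the card implements the mux the way the EGA does.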

more info about LPENTEST
syntax is:
lpentest.exe [buffer_size_in_kbytes]

where [buffer_size_in_kbytes] = optional I/O port buffer size
in kbytes (default is 128kb)

LPENTEST does the following:
- tests 320x200 16 color and 256 color modes, including Mode-X
- set screen buffer color to 00, border color to F0
- enable the Video Status Mux to feed back bits 7 and 6 of video
data in bits [5:4] of Input Status Register
- init Light Pen Position Readback (controlled by CR03 bit 7),
and test for its presence
- set CRTC Start Address to 0x0000, then 0x5555
- dump Input Status Register (0x3DA) to I/O port buffer to get
reference horizontal and vertical timing info
- dump Light Pen High, then Low Address (CR10/11), to test light
pen readback behavior
- time the dumping process to estimate I/O performance
- dump the data for further analysis

timings are shown in PIT (1.19 MHz) ticks, divide by 0x1234DD to get
I/O buffer readback time in seconds.

dump filename format:
[cc]C_[yyy].[zzz], where

[cc] - mode [16C - 16 color, 256C - 256 color, MX - Mode-X]
[yyy] - type of test data
[3DA - Input Status Register, LO - Light Pen Low Position,
HI - Light Pen High Position]
[zzz] - CRTC Start Address shifted right by 4 (either 000 or 555)

--wbcbz7 o7.o9.2o23
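The tick-to-seconds conversion mentioned in the notes above is a single division; a small sketch follows, with a derived throughput helper added. The function names are hypothetical and not part of LPENTEST.

```c
/* PIT input clock in Hz, as given in the notes: 0x1234DD = 1193181. */
#define PIT_HZ 0x1234DDul

/* Convert a PIT tick count to seconds. */
static double pit_ticks_to_seconds(unsigned long ticks)
{
    return (double)ticks / (double)PIT_HZ;
}

/* I/O throughput in bytes per second for a buffer of `bytes` bytes that
   took `ticks` PIT ticks to read; returns 0 if no time was measured. */
static double bytes_per_second(unsigned long bytes, unsigned long ticks)
{
    if (ticks == 0)
        return 0.0;
    return (double)bytes / pit_ticks_to_seconds(ticks);
}
```

For example, a default 128 KB buffer that takes exactly 0x1234DD ticks to fill corresponds to 131072 bytes per second.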

Attachment: LPENTEST.zip (56.61 KiB)
Last edited by wbc on 2023-09-07, 18:08. Edited 1 time in total.

--wbcbz7

Reply 54 of 57, by wbc


By the way, I've also tested Intel 865 integrated graphics with RNG.EXE; it says there is no PRNG, and my utility confirms that: reading CRTC index 0x10/11 returns the Vertical Retrace Start/End register contents, as usual. That is at least unexpected, since some sources claim that the Intel VGA core is derived from Chips&Technologies chips, and those C&T cards seem to support light pen position readback (or was it true for i740/752 but not for newer Intel chips?)

--wbcbz7

Reply 55 of 57, by wbc


One more interesting tidbit - Matrox Millennium programming manual states that CRTC register 0x03 bit 7 (which routes CR10/11 reads between Light Pen Position and Vertical Blank Start/End) is used "for chip testing on the IBM VGA":

[Attached image: Снимок.PNG (79.59 KiB)]

perhaps we have to test it on an actual IBM VGA card (and/or PS/2 integrated video, if that makes sense) to solve this :)

--wbcbz7

Reply 57 of 57, by clb


While testing the behavior in Snow on an ISA VGA card, I found out that the Ahead V5000B card also implements the free-running lightpen registers:

[Attached image: SNOOP-Ahead-5000B.png (90.15 KiB)]
[Attached image: RNG-Ahead-V5000B.png (24.25 KiB)]
[Attached image: RNG-Ahead-V5000B-MonteCarlo.png (1.26 MiB)]