reenigne wrote on 2020-02-19, 12:13:
wbhart wrote on 2020-02-19, 00:08:
2 stosbs = 2x11 cycles + 2x8 cycles of 8-bit memory accesses (including what the manual calls bus cycles) = 38 cycles
1 stosw = 15 cycles + 16 cycles of 16->8 bit conversion costs + 2x8 cycles of 8 bit memory accesses = 47 cycles
stosw is 11 cycles on the 8086. I think the 2x8 cycles of 8-bit memory accesses are included in the conversion costs. The total time for a 16-bit access on the 8-bit bus is 2 µs, or 16 cycles, which is 12 more than the 4 it would take on the 16-bit bus. So the stosw time would be 11 + 12 = 23 cycles, not including CGA-specific wait states.
That would mean the numbers really don't add up. A raster is 509.6 CPU cycles. If we take 11 cycles for stosw (granted, I used the wrong number here) and just 12 additional cycles for the part of the access not already included in the stosw time, then 7 stosws and 21 nops should take 7x23 + 21x3 = 161 + 63 = 224 cycles. That is one hell of a long way short of 509.6.
The nops are surely executing while the CGA wait states are happening between stosws. Also, two (or three; I forget) of the stosws cannot be replaced with a pair of stosbs, and the rest can only be replaced with pairs of stosbs separated by 2 nops at most. So there can't be that many CGA wait states happening during the stosws themselves.
And now you see precisely the source of my confusion regarding the timings! It's almost twice as slow as one would expect based on the info in the manual.
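For anyone following along, the budget above can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted in this thread (11 cycles for stosw, 16 vs. 4 bus cycles, 3 cycles per nop, 509.6 cycles per raster); the constant and function names are made up for illustration, and CGA wait states are deliberately left out.

```python
# Cycle budget for one raster, using only figures quoted in the thread.
STOSW_BASE = 11       # stosw on the 8086 (reenigne's figure)
BUS_16BIT = 4         # cycles for a 16-bit access on a 16-bit bus
BUS_8BIT_TOTAL = 16   # cycles for the same access on the 8-bit bus
NOP = 3               # cycles per nop
RASTER = 509.6        # CPU cycles per raster


def stosw_cycles():
    """stosw cost on the 8-bit bus, excluding CGA wait states."""
    return STOSW_BASE + (BUS_8BIT_TOTAL - BUS_16BIT)


def raster_budget(n_stosw=7, n_nop=21):
    """Return (cycles accounted for, cycles left unexplained) per raster."""
    used = n_stosw * stosw_cycles() + n_nop * NOP
    return used, RASTER - used


used, unexplained = raster_budget()
print(f"stosw = {stosw_cycles()} cycles")
print(f"accounted for = {used} cycles, unexplained = {unexplained:.1f} cycles")
```

Under these assumptions, more than half of the raster is not accounted for by the documented instruction timings.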
reenigne wrote on 2020-02-19, 12:13:
wbhart wrote on 2020-02-19, 00:08:
* Try changing the palette every second raster to get more colours (interrupts probably have to be off, e.g. the keyboard would probably stuff this up; might be ok for a demo effect though)
The keyboard will only give you an interrupt if you press or release a key. But some ISA cards have their own interrupts, so it's always best to work with interrupts off when doing stuff that requires very precise timings.
Sure. I was wondering whether this could be used for games. Probably not: California Games, for example, already gets a significant amount of colour bleed at the points where it changes palettes in its more-colour CGA mode (on an IBM PC) when the keys are mashed during play.
reenigne wrote on 2020-02-19, 12:13:
wbhart wrote on 2020-02-19, 00:08:
* Try to figure out why I never see anything like 46 cycle gaps (with 3 cycles per nop, the most I ever see is actually 15 cycles)
Ah, so the 46 cycle gap was documented rather than observed? It's possible that they were conservative with the documentation and they changed the design so that the refresh wait states were spread out along the scanline, meaning that the 46 cycle wait never actually occurs.
Yes, I am taking the figure of 46 cycles every scanline from the Amstrad manual. I do not observe it in practice.
Going from memory right now, but I think the best pattern of stosws and nops per raster is:
stosw, nop x5, stosw, nop x2, stosw, nop x3, stosw, nop x2, stosw, nop x3, stosw, nop x4, stosw, nop x2.
I don't remember off the top of my head which of the stosws can and which can't be replaced with a pair of stosbs.
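As a sanity check on that pattern (which is quoted from memory, so treat it as provisional), here is a short Python sketch that parses the pattern string and reports the nop gap after each stosw. The helper name `nop_gaps` is hypothetical, and the 3 cycles per nop is the figure used earlier in the thread.

```python
import re

# Pattern as quoted above (from memory, so possibly not exact).
PATTERN = ("stosw, nop x5, stosw, nop x2, stosw, nop x3, stosw, nop x2, "
           "stosw, nop x3, stosw, nop x4, stosw, nop x2")
NOP_CYCLES = 3  # cycles per nop, as assumed earlier in the thread


def nop_gaps(pattern):
    """Return the number of nops that follow each stosw in the pattern."""
    gaps = []
    for item in (s.strip() for s in pattern.split(",")):
        if item == "stosw":
            gaps.append(0)
            continue
        match = re.fullmatch(r"nop x(\d+)", item)
        if match is None or not gaps:
            raise ValueError(f"unexpected item: {item!r}")
        gaps[-1] += int(match.group(1))
    return gaps


gaps = nop_gaps(PATTERN)
print("stosws per raster:", len(gaps))
print("nops per raster:", sum(gaps))
print("gap lengths in cycles:", [g * NOP_CYCLES for g in gaps])
print("longest gap in cycles:", max(gaps) * NOP_CYCLES)
```

The totals agree with the 7 stosws and 21 nops used in the budget, and the longest gap (5 nops) corresponds to the 15-cycle maximum mentioned above.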
I tried to find a two raster pattern last night, and whilst the behaviour was very slightly different, there seems to be no variation in the maximum number of nops that do not cause additional glitches.
Again going from memory, but I recall that the nop x5 and the final nop x4 and nop x2 absolutely can't be increased. The others can nearly be increased by a single nop each, but every few frames there will be glitches as a stosw gets delayed until the next gap. This could be due to other overheads in my code, but I did some work to cut down the variation in detection of vertical retrace, and unfortunately that made no difference. If I add additional nops, the code occasionally misses the best place to insert stosws, which causes very regular and noticeable glitches, though clearly on most rasters the extra nop is tolerated without causing a stosw to miss its schedule. I also tried adding extra rasters at the beginning, with fewer or no nops in between, to help things settle down before inserting extra nops in the pattern later. Again, no difference.
At one point I had the code set up in such a way that addition of a single extra nop per frame (not per raster) would cause totally different behaviour. That was very surprising, and I don't have an explanation just yet, though my vertical retrace code is designed to be very fragile so it will quit immediately if it misses detection in a small window. The extra nop was presumably pushing it off one end of the short window I set up. But this is still very surprising as the additional nop was after the v-retrace detection, which means that everything else in the frame was so tightly packed that it pushed a subsequent v-retrace detection off the ends of the window for detection.
I will try three-raster patterns tonight. I intend to go all the way up to six-raster patterns if necessary, as the numbers as I currently have them just don't add up.