Reply 40 of 127, by reenigne
For "VGA": the "smart DMA" that would make the card pull and convert or interpret the data on-the-fly rather than just dumb RAM to VRAM copy would be nice. That would allow you to prepare a pixel group in system RAM, without having to worry about packing - just a pixel value per byte or word, and then let the card handle it.
That would indeed have been nice with the benefit of hindsight. I suspect that would have been way down on IBM's list of priorities for VGA, though, for a couple of reasons. One is that system RAM to VRAM blitting speed was probably not very important for the kind of workloads that they were envisioning (business graphics, desktop GUIs and perhaps a little bit of 2D scrolling games). One of the more demanding applications at the time was CAD so if they were adding hardware features to accelerate anything, then drawing Bresenham lines would have been it (and indeed I suspect some of the ALU/read-mode/write-mode stuff on the card was actually designed with that in mind).
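To make that concrete: the innermost operation of a Bresenham loop is a single-pixel write, and the Graphics Controller's latches and write modes make that cheap. A minimal sketch, assuming mode 12h (640x480x16) has already been set via INT 10h and a Borland-style compiler (outportb, MK_FP):

#include <dos.h>

#define GC_INDEX 0x3CE               /* Graphics Controller index port */

void put_pixel(int x, int y, unsigned char color)
{
    unsigned char far *p = (unsigned char far *)
        MK_FP(0xA000, (unsigned)y * 80 + (x >> 3));

    outportb(GC_INDEX, 5);                   /* GC mode register      */
    outportb(GC_INDEX + 1, 2);               /* write mode 2          */
    outportb(GC_INDEX, 8);                   /* GC bit mask register  */
    outportb(GC_INDEX + 1, 0x80 >> (x & 7)); /* touch only this pixel */

    (void)*p;        /* dummy read fills all four plane latches at once */
    *p = color;      /* hardware expands the 4-bit color to the planes  */
}

A real line loop would set the mode once outside the loop and restore it afterwards, of course, but the point stands: the per-pixel cost is one read and one write, with the planar bookkeeping done by the card.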
The way I see it, we didn't need that clunky bitplane stuff to make graphics fast on early CPUs; it could've been done the right way from the start.
You see bitplanes as clunky because (as an interface from software to hardware) they are a lot more complicated and difficult to understand than a linear framebuffer of packed pixels. But (and I guess this was my point when talking about the Wolfenstein 3D code) once you understand them, they increase the flexibility of the hardware and can drastically accelerate a number of common operations (by a factor approaching 4) by operating on all four planes in parallel. So they were a good design given the constraints of the time.
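Here's the kind of parallelism I mean, under the same mode 12h / Borland-style assumptions as the sketch above: with set/reset enabled, each 8-bit write the CPU issues updates 8 pixels across all four planes simultaneously.

#include <dos.h>
#include <mem.h>                     /* _fmemset, Borland-style */

#define GC_INDEX 0x3CE

void fill_screen(unsigned char color)
{
    unsigned char far *vram = (unsigned char far *)MK_FP(0xA000, 0);

    outportb(GC_INDEX, 0);           /* GC set/reset register          */
    outportb(GC_INDEX + 1, color);   /* one bit of the color per plane */
    outportb(GC_INDEX, 1);           /* enable set/reset...            */
    outportb(GC_INDEX + 1, 0x0F);    /* ...on all four planes          */

    _fmemset(vram, 0xFF, 38400u);    /* 80 bytes x 480 rows; each byte
                                        written sets 8 pixels at once  */

    outportb(GC_INDEX, 1);           /* disable set/reset again        */
    outportb(GC_INDEX + 1, 0x00);
}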
And if that took any changes to the IBM PC to make it happen? So be it. We did move from XT to AT, increased the bus width and the IRQ/DMA channel count, and then we had PCI - it was all about incremental changes for the better.
Perhaps. But part of the reason the PC was so successful was incremental upgrades and backwards compatibility. You could upgrade from a CGA to an EGA without changing your monitor. You could upgrade to VGA without getting a new motherboard as well, and so on. If VGA had required upgrading other parts of the system as well it might not have taken off nearly as well as it did.
Now, a DMA transfer can saturate the bus, true, but if that is happening then you are trying to push too much data through it. How is that different from the CPU not being fast enough to push all that by itself in the same amount of time?
It's not, apart from DMA adding more complexity to both the hardware and the programming. Which is rather my point - when you have a framebuffer that the CPU can copy to at close to bus-saturation speed, it rather negates the need for DMA.
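For illustration, that "dumb copy" in VGA's chunky mode 13h (320x200, one byte per pixel) is a single block move - a hypothetical blit helper, Borland-style far pointers assumed:

#include <dos.h>
#include <mem.h>                      /* _fmemcpy, Borland-style */

void blit(const unsigned char far *backbuf)
{
    unsigned char far *vram = (unsigned char far *)MK_FP(0xA000, 0);

    /* 64000 pixels; a decent compiler turns this into a single
       REP MOVSW, which keeps the bus busy on every cycle it can
       get - leaving nothing for a DMA channel to win back.      */
    _fmemcpy(vram, backbuf, 64000u);
}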
If it turned out that CPU writes were faster than the PC's DMA, then the DMA could simply have been upgraded with another controller that was faster and could do microbursts, pacing itself rather than transferring everything in one go. Then you just need your code to work on a pixel group the size of the burst while the DMA is transferring the previous group. That would steal some cycles from you but not halt the CPU completely, so it works out faster than having the CPU do everything.
Throughput is maximised when the bus is saturated, regardless of whether it's the CPU or the DMA controller that is saturating it.
For "CGA": I'd give it 32k RAM so that it could do 320x200 in 16 colors, and 640x200 in 4 colors. Simple 2 or 4 pixels per byte. Ideally it would use EGA-like 2-bit per pixel output but I don't want to sound like I'm just replacing CGA with EGA-lite. So let's stick to original RGBI. Then I would add palette LUT,
So far this is sounding exactly like PCjr graphics! Which did use a ULA (or something very similar).
or 2 in fact. Even and odd pixel columns would use LUT0 and LUT1 respectively, and those would be independent. The assignment would also swap every row, like this (row (y), column (x)):
0,0: L0; 0,1: L1; 0,2: L0; 0,3: L1; ...
1,0: L1; 1,1: L0; 1,2: L1; 1,3: L0; ...
That's easy to do: a couple of XOR gates on the lowest counter bits to drive the enable signal of the correct LUT to the output amps. Those LUTs would be small enough to use SRAM cells inside the GPU chip itself, but external SRAM also works. This way not only are you not forced to use "blue or not blue" colors, but you could also do dithering very cheaply - if nothing else, to be used while showing static images.
Clever! I like that idea a lot. The "dither clock" doesn't even need to be tied to the pixel clock - it could have been sub-pixel dithering. I wonder what the availability and cost of suitable SRAMs (like the 7489?) would have been in 1981, and if the designers of the CGA considered using them instead of the palette logic that they ended up with.
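Since the hardware is hypothetical, the selection logic is easy to model in a few lines of C - this just reproduces the pattern quoted above:

#include <stdio.h>

unsigned char lut0[16], lut1[16];   /* two independent palette tables */

unsigned char lookup(int x, int y, unsigned char pixel)
{
    /* the XOR of the lowest column and row counter bits picks the LUT */
    return ((x ^ y) & 1) ? lut1[pixel & 15] : lut0[pixel & 15];
}

int main(void)
{
    int x, y;
    for (y = 0; y < 2; y++) {        /* prints the table quoted above */
        for (x = 0; x < 4; x++)
            printf("%d,%d: L%d; ", y, x, (x ^ y) & 1);
        printf("\n");
    }
    return 0;
}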
But it should be fast enough to use in games, and having palettes and an HSYNC interrupt allows for all kinds of cool color-increasing tricks - if you like demos.
HSYNC interrupts were possible with the original PC and CGA (we used one in the final version of 8088 MPH). I'm not sure why they weren't used more. One game that I know used them (California Games) had some bugs which made the effect kind of fragile, so I guess it was a bit of a black art back in the day, without the benefit of the internet to share ideas and documentation. By the time people had figured it out, CGA had been superseded already and was only interesting as a fallback for customers who didn't have EGA/VGA.
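The interrupt version is the fiddly part; the simpler, widely documented cousin is to busy-wait on the CGA status register at port 0x3DA and reprogram the color-select register once per scanline. A rough sketch (Borland-style inportb/outportb assumed; on a real 4.77MHz machine the loop body has to be tight enough not to miss a retrace):

#include <dos.h>

#define CGA_STATUS 0x3DA
#define CGA_COLOR  0x3D9             /* color-select register */

void raster_bars(void)
{
    int line;

    /* bit 3 of the status register is vertical retrace: wait for it
       to start and then end, so we begin at the top of the frame   */
    while (!(inportb(CGA_STATUS) & 0x08)) ;
    while (inportb(CGA_STATUS) & 0x08) ;

    for (line = 0; line < 200; line++) {
        /* bit 0 is 1 while the display is blanked */
        while (inportb(CGA_STATUS) & 0x01) ;    /* active picture begins */
        while (!(inportb(CGA_STATUS) & 0x01)) ; /* next horizontal blank */
        outportb(CGA_COLOR, line & 0x0F);       /* new background color  */
    }
}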
Ha, I suppose I just don't see that "38 years later" as a positive thing. It should've been so easy to use that people could utilize 99% of its performance a year after it was released.
I guess my point was that a good design works well for the use cases it was designed for, and a great design is sufficiently flexible to accommodate use cases the designer never thought of. The important use cases at the time (drawing graphs and simple games where the speed of the display adapter wasn't critical) were well documented and performed adequately for the time. With modern eyes we can envisage much more difficult cases, and (surprisingly) the CGA can rise to the challenge more often than we felt entitled to expect it to. So yes, in hindsight we can see some ways in which CGA's design could have been done better, but overall it's remarkably good given the constraints!