mkarcher wrote on 2020-09-28, 22:40:
VileR wrote on 2020-09-28, 22:12:
I'm not sure why a grand total of two people seem to be interested in undocumented surprises like that.... but I'm glad to have learned this, even if I indirectly have to thank a boorish youtube heathen who dares to mutilate screws with a dremel. 😉
The whole MCGA stuff is very interesting from a technological point of view. It is claimed to be "an improved CGA version with some VGA features", but technically, it's quite its own thing!
If you take a look at the PS/2 Model 30 planar (IBM speak for mainboard), you will find only 2 video RAM chips, 64k x 4 bits each. That's not such a surprise at first, because everyone knows MCGA has 64KB of RAM. What is surprising is that MCGA manages to get 320x200 pixels at 256 colors at full VGA frequencies from an 8-bit video memory, whereas the 32-bit interface of the VGA is also maxed out in that mode. They can't magically quadruple the memory throughput in MCGA, a design on the budget side? Of course they can't, but they can use VRAM instead of DRAM. You get the high-bandwidth video output basically for free on VRAM chips. So MCGA seems to be the first original IBM PC graphics solution that uses VRAM.
Another interesting thing: 80x25 with custom font. 80 characters at a pixel clock of 25MHz is around 3 MHz character rate. The MCGA (if it doesn't use the VRAM serial output for it) needs 2 memory cycles to load character and attributes. This makes it difficult to squeeze in a third memory cycle for the font load. Even if the two cycles for character and attribute are fast-page-mode cycles, the font load won't be. They solve the problem just like CGA/MDA: they have a separate 8 bit wide memory chip that contains font data. But this chip is a RAM chip, not a ROM chip. On my PS/2 planar, it's a HM6264FP-12T 8k x 8 CMOS static SRAM. This memory is not accessible over the system bus; it can only be read or written by the MCGA ASIC. You load a font by first placing it into the first half of VRAM (in an interesting format), and then instruct a "memory to memory DMA engine" inside the MCGA ASIC to copy font data from the VRAM to the font SRAM. It seems you can choose between a fast transfer that disrupts displaying text and a slow transfer that only takes place during blanking. MCGA seems to be the first affordable IBM PC graphics solution that includes a memory copying engine. I specified "affordable" to avoid discussion about the PGA, which most likely also was able to copy data on its own. You can't really call the MCGA font copying function an "accelerator function", as it can only copy data from VRAM to the font RAM. Writing a demo that uses the slow "only on blanking" transfer mode to dynamically modify character shapes might be quite interesting, though.
Huh, so that's how MCGA gets its bandwidth for line doubling (it wouldn't need it for the 640x480 monochrome mode ... that actually uses less bandwidth than the Atari ST's hi-res mono mode, assuming a 25.175 MHz pixel clock vs the 32 MHz one on the ST), but VGA mode 13h (70 Hz, 31.46875 kHz, 12.5875 MHz 8bpp pixel clock) should use 4x the bandwidth. (Or ~28.322/14.161 MHz for 720x4bpp and 360x8bpp; funny that it's not just 28.63636/14.31818 MHz, though ... they also could've synthesized those pixel clocks from a 14.3181818 MHz signal, but the PS/2 boards I've seen all look like they use discrete 25 and 28 MHz 2-pin crystals next to the VGA chip rather than PLLs and dividers; 14.318181818*160/91 = 25.174825175 or *180/91 = 28.321678321 ... though they could've just used 910 pixel clocks per line at 28.636363636 MHz for 720-width modes, unless DRAM was really at its bare limit at 28.32 MHz.)
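To sanity-check those figures, here's a quick back-of-the-envelope in Python (the mode timings are the standard documented VGA ones; everything else follows from the numbers above):

```python
# Rough bandwidth check for the modes discussed above (standard VGA timings).
MHz = 1e6
px_640 = 25.175 * MHz            # 640-wide modes (and MCGA 640x480 mono)
px_720 = 28.322 * MHz            # 720-wide text modes

bw_13h  = px_640 / 2             # mode 13h: /2 dot clock, 1 byte/pixel -> ~12.59 MB/s
bw_mono = px_640 / 8             # 640x480 mono: 1 bit/pixel           -> ~3.15 MB/s
bw_360  = px_720 / 2             # 360-wide 8bpp: /2 dot clock          -> ~14.16 MB/s

print(f"mode 13h  : {bw_13h/MHz:6.3f} MB/s ({bw_13h/bw_mono:.1f}x the mono mode)")
print(f"640x480x1 : {bw_mono/MHz:6.3f} MB/s")
print(f"360x8bpp  : {bw_360/MHz:6.3f} MB/s")

# The /91 ratios that would have let IBM derive both clocks from 14.318 MHz:
ntsc = 14.3181818 * MHz
print(f"14.318 * 160/91 = {ntsc*160/91/MHz:.6f} MHz (vs 25.175)")
print(f"14.318 * 180/91 = {ntsc*180/91/MHz:.6f} MHz (vs 28.322)")
```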
I was thinking they might've implemented a line buffer in MCGA to allow for line doubling after scanning out the line at a slower rate (or using page-mode to fill the line buffer quickly and block less CPU access time to video RAM, but there's also not enough hblank time to fill that with just 1 line buffer, and then you're getting into dual 320x8-bit buffers, or an optimally sized series of smaller buffers, or a dual-port 320x8 buffer that can start scanning out the new line as soon as the first pixel of the second instance of the doubled line has been displayed on-screen). But VRAM means less hardware to actually design, and was presumably cheaper than SRAM or using dual-sync monitors (with 15.7 and 31.468 kHz modes, though they COULD have done it that way, and still used 70 Hz for the 256 color mode: just use 224 lines rather than 262 lines ... and a 2-sync monitor with a dedicated pin for mode detect would've been a lot cheaper than the auto-detect multi-sync monitors NEC was putting out back then).
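For what it's worth, the "not enough hblank time" point checks out numerically; a rough check, assuming standard mode 13h timing and a guessed 100 ns page-mode cycle on an 8-bit path:

```python
# Does one line of mode 13h fit into horizontal blanking? (a rough check)
line_period_us = 1 / 31.46875e3 * 1e6        # ~31.78 us per scanline
active_us      = 640 / 25.175e6 * 1e6        # ~25.42 us visible
hblank_us      = line_period_us - active_us  # ~6.36 us of blanking

bytes_per_line = 320                         # one byte per pixel in 13h
cycle_ns       = 100                         # assumed page-mode DRAM cycle
fill_us        = bytes_per_line * cycle_ns / 1000

print(f"hblank: {hblank_us:.2f} us, buffer fill: {fill_us:.2f} us")
# -> ~6.4 us of blanking vs ~32 us to fetch 320 bytes over an 8-bit bus:
#    a single line buffer can't be refilled during hblank alone.
```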
Though, I'd also simply considered that IBM went really cheap with 64kB of DRAM that required bus saturation during 320x200x8bpp 31 kHz mode, so video RAM would be hogged by the MCGA core during all active scanline time (only hblank and vblank free to external access). But I guess IBM didn't want it to be that slow, and I also guess VRAM was at least less than 2x as expensive as simple DRAM, otherwise dual 64kx8-bit banks of DRAM would have been a cheaper and more useful solution. (it would allow page flipping and free CPU access to the off-screen buffer while access to the on-screen buffer was blocked completely or limited to blanking areas) You wouldn't even need to use a bus transceiver + bank flipping logic with 2 fully separate data paths, but could just do 2-bank interleave for DRAM access, with the CPU reading/writing through an 8-bit latch. (the active screen buffer could be working in page-mode for entire scanlines long and the dual-bank arrangement would allow the same page/row to be held open continuously without the accesses in the alternate bank forcing a row-change and page break)
But just two little 64kx4-bit VRAM chips was cheap enough, it seems. It's also exactly what Sega put into the Mega Drive. (The VDP actually supports 128kB 16-bits wide, but only needs 64kB 8-bits wide for all its video modes to work: the 16-bit mode allows for faster DMA transfers on top of double the memory, and was used in some arcade boards, but it added no exclusive video modes.)
Also interesting that MCGA has any DMA transfer functions at all within video memory. I wonder if it could be used for accelerating simple block transfers in the monochrome mode, since only 37.5 kB of the 64kB of VRAM are used in that mode. (OTOH support for a 640x400 mono mode at 70 Hz with 2 pages in VRAM would also have been useful ... 320x200x4bpp, 16 colors from the 18-bit RGB palette, would also have been interesting for 2 pages and some graphics/games use, and would be pretty minimal to add logic-wise, assuming packed pixels are used; you've already got 1-, 2-, and 8-bit packed-pixel modes. But OTOH maybe implementing a compatible mode for that in VGA would be difficult, though I'd think it'd just be another modification of the chain-4 functionality used for 13h.)
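The sizes all check out against the 64kB, for what it's worth (the 4bpp and dual-page modes here are the hypothetical ones from the paragraph above, not anything MCGA actually offers):

```python
# Framebuffer sizes for the modes speculated about above, vs 64 KiB of VRAM.
KiB = 1024
modes = [
    ("640x480x1 (MCGA mono)",         640 * 480 * 1 // 8),
    ("640x400x1, two pages",     2 * (640 * 400 * 1 // 8)),
    ("320x200x4 (hypothetical)",      320 * 200 * 4 // 8),
    ("320x200x4, two pages",     2 * (320 * 200 * 4 // 8)),
    ("320x200x8 (mode 13h-alike)",    320 * 200 * 8 // 8),
]
for name, size in modes:
    print(f"{name:28s} {size:6d} bytes = {size/KiB:5.2f} KiB "
          f"({'fits' if size <= 64*KiB else 'too big'} in 64 KiB)")
```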
Also, I'm not sure IBM's implementation of VGA is actually 32-bit in the conventional sense ... maybe I'm wrong and such could be disproven by the pinouts and traces routed from the VGA ASIC and DRAM, but I have a suspicion that IBM configured that DRAM as 4 physical 8-bit wide banks of 64kB and used 4-way bank interleave on an 8-bit bus to achieve the same bandwidth, plus additional access slots for CPU access (or memory-to-memory copy using the VGA latches). EGA would've been implemented in the same manner, given VGA is an extension of EGA, and using 4 physical banks that way would be a convenient way to implement 4 bitplanes. Or ... do EGA and VGA not address their bitplanes that way, ie for a given byte address you have 1 byte for each bitplane via memory planes 1, 2, 3, and 4? If external CPU access had been directly mapped to those interleaved byte addresses (ie the CPU sees 4 successive plane bytes as 4 adjacent byte addresses), you'd lose efficiency for doing word- or longword-wise 1-bit operations (fast manipulation of monochrome pixels on 1 plane of a 16 or 32 pixel wide area), but OTOH you could manipulate 4 bitplanes at once as 4 interleaved bytes. (ie 32 bits = 4 8-bit wide plane sections, 16 bits = 2 8-bit wide bitplane segments; so on the original AT, you could focus on fast, 4 color, 2-bitplane oriented graphics operations, though pure 1-bitplane operations would be byte-wise and less efficient, so actual pixel-accurate shifts would be slower, but coarse 8-pixel-wide copies/moves/clears/etc would be just as fast for 1-bit plane operations, and 2x as fast for handling 2 or 4 planes at once; though I suppose the emphasis was on pixel-accurate shifting for graphics operations and not faster, coarser options or tricks like using up RAM with pre-shifted graphics for byte- or word-aligned writes ... Atari ST style: the ST, of course, used bitplanes interleaved on a word basis, not byte basis, so there wasn't really a performance trade-off, though some address holes/skipping for working on wider-than-16-bit screen areas: OTOH 32-bit copy instructions could quickly move 2 bitplanes at a time ... except in monochrome mode, where you'd be moving 32 contiguous pixels instead.)
Had they mapped it that way, with the CPU interface seeing the 4 banks/planes of VGA RAM as successive linear byte addresses, then unchained Mode X would've been linear, too. (OTOH it would've been pretty easy to just allow both memory mapping options, with VGA latches and processor interface able to select the ability to view 4-bank-interleaved byte memory as 4 contiguous, sequential bytes, and bank/plane select determined by 2 bits of the address, with the alternate mapping being planar mode with the external interface seeing 4 adjacent byte addresses as 32 contiguous bits of 1 plane)
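The difference is easier to see in code than in prose. A sketch: mode_13h() and mode_x() follow the documented VGA behavior, while hypothetical_linear() is the speculated alternative mapping described above:

```python
# Where pixel (x, y) of a 320x200x8 screen physically lives (plane, offset),
# and what the CPU has to do to put it there, under the three schemes.

def mode_13h(x, y):
    # Chain-4: CPU writes linear address y*320 + x; the hardware uses the
    # low two bits as the plane select and presents the whole offset to
    # the plane, so 3/4 of each 64K plane is wasted (documented behavior).
    addr = y * 320 + x
    return {"cpu_addr": addr, "plane": addr & 3, "offset": addr}

def mode_x(x, y):
    # Unchained: all 256 KiB usable, but the CPU must program the Map Mask
    # register for plane x%4 and compute the packed offset itself.
    return {"map_mask": 1 << (x & 3), "cpu_addr": y * 80 + (x >> 2)}

def hypothetical_linear(x, y):
    # The speculated alternative: same packed physical layout as Mode X,
    # but the hardware derives (plane, offset) from a linear CPU address.
    addr = y * 320 + x
    return {"cpu_addr": addr, "plane": addr & 3, "offset": addr >> 2}

print(mode_13h(5, 10))             # wasteful, but linear for the CPU
print(mode_x(5, 10))               # dense, but awkward for the CPU
print(hypothetical_linear(5, 10))  # dense *and* linear
```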
If my assumption is right, and EGA (and VGA) were implemented using interleaved 8-bit wide banks, it would make perfect sense from an engineering perspective as you can reduce the number of external pins and traces going from the logic to the DRAM chips, and you minimize the width and number of FIFO stages or latches required for buffering the bitplane data. (interleaving along byte boundaries means you'd only need to have 8 total 8-bit latches/buffer stages in the FIFO, 4 bytes for fetching and 4 bytes for active use by the shift logic for serially shifting 1 bit from each plane into a 4-bit latch which then either gets output as 4-bit RGBI or translated through the 6-bit RGB palette CLUT ... you'd actually be using half the latch space/size that the Atari ST SHIFTER needs to use to do the same thing ... except EGA also implements hardware smooth scrolling, so there's some additional logic and buffering going on there, though the latter might actually be cheapest to implement with a tiny multi-pixel 4-bit packed line buffer, or line segment buffer, rather, especially since EGA modes are strictly 4 bitplanes, not variable plane count like the Amiga, or even the ST, so the final output will always be 4-bit)
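Counting bits for that latch comparison (assuming both designs double-buffer: one set of latches being filled while the other shifts out, which is my reading of both, not something documented):

```python
# Bit count for the buffering comparison above.
ega_bits = 2 * (4 * 8)    # 4 planes x 8-bit bytes, fetch set + shift set = 64
st_bits  = 2 * (4 * 16)   # ST SHIFTER: 4 planes x 16-bit words           = 128
print(f"EGA-style byte-wide plane latches: {ega_bits} bits")
print(f"ST SHIFTER word-wide registers:    {st_bits} bits")
print(f"ratio: {st_bits // ega_bits}x")   # -> the 'half the latch space' claim
```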
Maybe they did it some other way, but I can't see anything being much cheaper than doing it that way. (And it explains the weird planar memory mapping ... they'd have put in added work to map it that way for faster manipulation of 1 bitplane at a time on as wide a row as possible, so you'd have the largest destination buffer width for shifted/manipulated source graphics bits, plus any planning for hardware bitblits would have been biased towards bitwise logical operations at the time. Though purely chunky-pixel optimized blitter designs and concepts were obviously a thing, too: the Sinclair Loki project that eventually became the Flare Slipstream did that, as did the Epyx "Handy" project that became the Atari Lynx. Kind of funny that there are some parallels and cross-licensing between Epyx in the US and Konix in the UK with their joysticks, and the Flare team; the Slipstream+Konix Multisystem project was going on in parallel with the Handy, both had financial problems, but Atari took over the Epyx project where Konix sold the farm, and the Multisystem still never made it to market ... but 1 of the Flare engineers consulted with Atari on the Panther, then 2 of the 3 Flare engineers ended up designing the Atari Jaguar, while the third joined up with Argonaut Software and designed the Super FX chip for Argonaut and Nintendo. Somewhere between all that there were also upgrades to the Slipstream, up to a 32-bit bus version with 16-bit color Gouraud shading plus a basic texture rendering scaling/rotation engine, and then, of course, the former Amiga and Atari 8-bit engineers that did the Lynx ended up doing the 3DO chipset ... and John Mathieson of the Flare team, more or less head of the Jaguar team, has been at Nvidia since 2001 and led development of the Tegra processor, so ... the heart of the Nintendo Switch has Atari Jaguar lineage in it, sort of?)
mkarcher wrote on 2020-09-28, 22:50:
MMaximus wrote on 2020-09-28, 22:21:
But I find it amazing there are still discoveries to be made more than 30 years after this standard was invented! And if it means it's possible to drive a 15khz monitor with a simple VGA card then it's definitely exciting 👍
You can program the VGA card to generate 15kHz compatible timing in a custom mode. Just use the "divide clock by 2" feature that is usually used to obtain 320 pixel modes, but set the card for 640 pixels. Also set the card for fewer lines to keep in line with NTSC field timing. The VGA BIOS does not support modes like this, but I think there is third-party hardware that converts VGA to SCART by just generating CSYNC from VSYNC and HSYNC, and comes with software to set up the correct video timing.
On the other hand, the video solution inside the IBM PS/2 Model 25 and IBM PS/2 Model 30, called MCGA, has out-of-the-box support to run all CGA modes and the VGA 256-color mode on a 15kHz analog RGB monitor. It likely uses the 14.318 MHz oscillator as pixel clock, just as the original CGA did. There is no composite video generator, though, so no 8088MPH composite color artifacts from the MCGA. Also no way of directly connecting an IBM 5153 CGA display to either the MCGA or the VGA, as that monitor requires a digital TTL input.
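Here's a minimal sketch of the register-level trick mkarcher describes, with port I/O through a stand-in outb() (the register indices are the documented VGA ones; the exact CRTC values for a stable ~262-line frame are left out, since they'd need tuning per monitor):

```python
# Stand-in for real x86 port I/O (e.g. outportb in DOS C compilers).
def outb(port, value):
    print(f"out 0x{port:03X}, 0x{value:02X}")

SEQ_IDX, SEQ_DAT = 0x3C4, 0x3C5   # VGA sequencer
CRT_IDX, CRT_DAT = 0x3D4, 0x3D5   # CRTC (color register base)

# 1. Clocking Mode (sequencer index 01h): bit 3 halves the dot clock, the
#    same divider the BIOS uses for 320-pixel modes. With 640-pixel
#    horizontal timing still programmed, HSync drops to ~15.7 kHz.
outb(SEQ_IDX, 0x01)
outb(SEQ_DAT, 0x01 | 0x08)        # 8-dot characters + dot clock / 2

# 2. Unprotect CRTC registers 0-7 (Vertical Retrace End, index 11h, bit 7).
outb(CRT_IDX, 0x11)
outb(CRT_DAT, 0x00)               # simplified: real code would preserve bits 0-6

# 3. Reprogram Vertical Total / Display End / Retrace (indices 06h, 12h,
#    10h, ..., plus the overflow bits in 07h) down to ~262 total lines so
#    the refresh rate stays near 60 Hz at the halved line rate.
```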
Around 15 years ago, Apolloboy was using a little 8-bit (maybe ISA, but I think 8-bit) card with NTSC composite/S-Video TV-out encoding that relied on a driver to manipulate the VGA card to actually work properly. I'm not sure how much of that might have included messing with 15 kHz VGA timing to minimize the hardware on that board, but I'll have to look at it again some time. (He gave it to me years after he stopped using a TV for his DOS PC display, so I've got it around somewhere.) OTOH, I don't think VGA does interlace, but I also don't remember if the display was interlaced or 200/240 lines.
14.3181818/7.1590909 MHz generated VGA pixel clock and 15.7 kHz output would certainly make it simple to encode; otherwise you'd at least need some ADC + line buffer + DAC weirdness going on. (Though still a lot cheaper and lower-tech going from 400/480-line VGA to interlaced 15 kHz than the other way around: ie flicker-fixers can't just use line buffers, they need full framebuffers, or at least a framebuffer, singular; they could update a single buffer 1 line at a time that way, but you need a framebuffer since you'd only have access to every other scanline per 60 Hz field.)
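The reason that clock makes encoding easy is standard NTSC arithmetic: 14.318 MHz is exactly 4x the color subcarrier, and 910 of those clocks are exactly one scanline:

```python
# Why a 14.318 MHz pixel clock is NTSC-friendly: it's 4x the 3.579545 MHz
# color subcarrier, and 910 such clocks = one line (227.5 subcarrier cycles).
fsc  = 3.579545e6
pclk = 4 * fsc                     # 14.318180 MHz (7.159 MHz for 320/360-wide)
line = 910 / pclk                  # seconds per scanline
print(f"pixel clock : {pclk/1e6:.6f} MHz")
print(f"line rate   : {1/line/1e3:.3f} kHz (NTSC spec: 15.734 kHz)")
```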
Hmm, so VGA, or especially MCGA, was a smarter, more efficient design than sticking a flicker-fixer on an Amiga 2000? And it used VRAM in a cost-cutting context, at that. (So on the non-Commodore end of the Amiga story ... still nothing like what the Ranger chipset was supposed to be, with heavy VRAM usage, not a skimpy 64kB on an 8-bit wide bus; unless there were minimalist, dirt-cheap suggestions for using VRAM that way by Jay Miner as well, which would be really interesting and make CBM look considerably worse than they do for simply not jumping on a more expensive graphics workstation/rich kid/home arcade gaming enthusiast machine prospect, since it was loosely inspired by the x68000.)
Shame IBM didn't support a dual-bank 128kB MCGA implementation ... sure, it might seem like it'd cut in on VGA's market appeal, but also kind of not, if all it did was enable double buffering/page flipping (and possibly extra RAM to use for VRAM-to-VRAM DMA operations): still no EGA compatibility, no 16-color 640/720 x 480/400 VGA modes. And they could've even used that angle to up-sell extra-premium 512kB VGA implementations (and require 512kB on VGA to emulate the 128kB MCGA functionality, ie basically dual chain-4 pages).
On that note, I'd think that, on paper at least, the 512kB (and larger) VGA clones out there could support multi-page flipping for Mode 13h (since each 256kB chunk gets mapped down to 64kB useable in chain-4), and maybe some/all of those technically can do that, but does any software use it? (Unless that's something that got rolled into the VESA umbrella among other things, but it would be separate from SVGA-compatible 256 color modes, since it would strictly be MCGA/13h style fixed, dumb 320x200x8bpp framebuffers flipped via multiple 64kB banks.) Then again, if you're going to support that via 512kB when set to otherwise-100% IBM VGA compatible mode, it would also be cheap/easy to just implement your own alternate memory map allowing the base 256kB to be mapped as 4 chain-4 pages, depending on the methods used for the VGA clone implementation. (Though any VGA chipset that could also work with 64 or 128kB configurations would/should already have that capability via some cheap tweaks to actually make it useable in software: ie if those smaller memory sizes were implemented via jumpers or straps for bank select, adding logic gates to bank-switch in hardware would be trivial, so it was just a matter of being seen as a useful or marketable gimmick.)
Also, obviously, VGA clones didn't implement VGA in the way IBM did (DRAM bank/plane wise) and some definitely made it work with a single bank of 8-bit wide DRAM, probably with heavy use of page-mode and possibly on-chip line buffers (possibly not full-screen-width line buffers), and EGA clones doing the same thing. So the funky pixel addressing was even less relevant/necessary there (and probably actually cost a bit extra to implement), but obviously had to be there for full compatibility. (OTOH, un-implementing it/bypassing it would have been really cheap/simple ... so the incentive for 3rd party standards that eventually fell under the VESA umbrella made even more sense, including for the bottom-end, absolute cheapest economy measure engineering versions of VGA cores ... though I also suppose the 4-byte plane oriented IBM implementation would make 4-byte page mode bursts a fairly obvious starting point for a basic cheap VGA clone)
I'm not sure, but I'd imagine some of the slowest, cheapest VGA cores did a bare minimum of on-chip buffering and just resorted to long, sustained page-mode bursts at the expense of forcing wait states for external processor access. (Or uh ... 14.161 MB/s via 8-bit DRAM ... if using only 4-byte bursts, that's 4 bytes in 282.466 ns, which you could do with 80 ns FPM DRAM, but you could use slower parts if you did longer bursts, or continuous bursts with 70.616 ns page-mode cycles per byte, which most 100 ns FPM DRAM could do, and some 120 ns FPM parts, too, with page-cycle ratings close to it, like 75 ns. But with 8-bit memory, that means 100% bus saturation for the entire active line time, so LOTS of long waits and only hblank/vblank free for access; OTOH any of those same chipsets that offered either 16-bit width or 2-bank interleave with 8-bit would be able to get away with shorter bursts and slots open for external access.)
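Checking that cycle arithmetic:

```python
# Cycle-time check for a 360x8bpp / 720-wide mode fed from one 8-bit bank.
rate = 14.161e6                        # bytes/second of pixel data needed
print(f"per byte        : {1e9/rate:.3f} ns")   # ~70.6 ns
print(f"per 4-byte burst: {4e9/rate:.3f} ns")   # ~282.5 ns
# An 80 ns FPM part can fit a random access plus 3 page-mode cycles in a
# ~282 ns window; sustained ~70.6 ns page cycles need roughly 100 ns-rated
# FPM or better (page-cycle ratings, not the headline access time).
```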
Unless my memory is borked and I am misremembering seeing VGA cards with just 2 256kx4-bit DRAMs onboard.