mkarcher wrote on 2024-04-09, 17:45:Not really. On the PC, there is no "line cache", so each text line needs to be retrieved for each scan line of the character. For example, on CGA, in high-res text mode, a character takes 8 pixels at 14.318MHz pixel clock, which is 1.8 megacharacters/second.......
I primarily wrote about the idea of having the framebuffer bits live in the system memory, in a system that is designed to work that way rather than necessarily abiding by how the actual video standards being emulated work. What I wrote would roughly apply to just the character+attribute/framebuffer part that the programmer etc. directly manipulates, while the character lookup etc. would stay on the "video card" side and not also be stolen from main memory. That would absolutely destroy any possible performance unless the character bits can stay in another bank and leverage page-mode accesses, so there is no need to swap between two unrelated rows on every character and suffer huge DRAM-induced stalls in the process.
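A quick back-of-the-envelope in Python to show why that bank split matters. The timings are illustrative guesses for roughly 80 ns fast-page-mode DRAM (150 ns for a full row-miss cycle, 50 ns for a page-mode hit), not measurements from any specific chip:

```python
# Why mixing framebuffer and font data in ONE DRAM bank hurts.
# Timings below are assumed, plausible FPM-DRAM numbers, not measured.

ROW_OPEN_NS = 150   # full RAS+CAS cycle after a row miss (assumption)
PAGE_HIT_NS = 50    # page-mode CAS cycle within an already-open row (assumption)

# CGA high-res text: 8-pixel characters at a 14.318 MHz pixel clock
CHARS_PER_SEC = 14_318_000 / 8   # ~1.79 million characters/second

# Case A: char+attrib and font glyphs sit in different rows of the SAME
# bank -> every character ping-pongs between two rows (2 row misses/char).
same_bank_ns = 2 * ROW_OPEN_NS

# Case B: font data lives in a separate bank -> the framebuffer reads
# stay in page mode (ideal case: one page hit per character).
split_bank_ns = PAGE_HIT_NS

for label, per_char in (("same bank", same_bank_ns),
                        ("split banks", split_bank_ns)):
    busy = per_char * CHARS_PER_SEC / 1e9   # fraction of the bus consumed
    print(f"{label}: {per_char} ns/char -> memory bus ~{busy:.0%} busy")
```

With these assumed numbers the same-bank case eats over half the memory bus on character fetches alone, while the split-bank case stays under ten percent.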
In the same vein, VGA's double scan wouldn't be an obstacle: the video controller side does that part and spares the CPU side the lost bandwidth. I did forget about the blanking periods, but those are intervals where any such main-memory video controller doesn't have to spend memory bandwidth at all, so they can be left to the CPU side as guaranteed access windows. Of course it requires a line buffer etc. on the video card side, and all the practical implementations do work that way; it gets more critical as CPU speed increases. Every video line the controller has to open at least one new row, depending on mode/resolution and page size (since the CPU/caches/busmasters destroy any previous state anyway), do a page-mode burst within those rows, and surrender the memory bus back to the CPU, which then has to suffer through reopening whatever rows it was previously working in. It is not going to be viable without page-mode accesses. In the end quite a few µs are lost on just these row openings, and on a fast system that can mean many thousands of instructions that couldn't execute, even before counting the time to actually read the video data itself. At higher resolutions the video data reads will take a substantial amount of time away from the CPU, and if the CPU's caches aren't enough, it has to stall until the video fetch finishes, again costing potentially huge performance. The ability to have multiple DRAM banks and distribute things between them is key to any success here.
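Rough numbers for the per-scanline overhead, again with made-up but plausible FPM timings and an assumed 640x480 @ 60 Hz, 8 bpp mode over a 32-bit memory path:

```python
# Estimate of what the per-scanline video fetch costs in a shared-DRAM
# design. All timings and the mode are assumptions for illustration.

ROW_OPEN_NS    = 150   # opening a fresh row after the CPU trashed it (assumption)
PAGE_HIT_NS    = 50    # page-mode read within the open row (assumption)
BYTES_PER_READ = 4     # assumed 32-bit memory path

LINES          = 480   # visible scan lines (640x480)
BYTES_PER_LINE = 640   # 8 bpp
ROWS_PER_LINE  = 1     # at least one fresh row open per scan line
FRAMES_PER_SEC = 60

reads_per_line = BYTES_PER_LINE // BYTES_PER_READ          # 160 reads
line_ns  = ROWS_PER_LINE * ROW_OPEN_NS + reads_per_line * PAGE_HIT_NS
frame_us = LINES * line_ns / 1000
print(f"video fetch: ~{frame_us:.0f} us/frame "
      f"(~{frame_us * FRAMES_PER_SEC / 1e6:.0%} of the memory bus)")

# The row openings alone, expressed as instructions a hypothetical
# 100 MIPS CPU could have executed in that time:
open_us_per_sec = LINES * ROWS_PER_LINE * ROW_OPEN_NS * FRAMES_PER_SEC / 1000
print(f"row opens alone: ~{open_us_per_sec:.0f} us/s "
      f"= ~{open_us_per_sec * 100:,.0f} lost instructions/s at 100 MIPS")
```

Even in this friendly page-mode case the video fetch eats around a quarter of the bus; without page mode (every read a row miss) the same mode would not fit in a frame time at all.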
My experience with integrated video without its own local memory ended at roughly the P3 era, but I do remember that using onboard video vs. an external card of any sort showed a very dramatic performance difference, which should be explainable by all the DRAM-related stalls such a method inevitably produces. One motherboard I had could use local DRAMs for the integrated video, and with those it showed much better CPU performance, similar to a dedicated video card (although the video performance itself sucked, but that's another story 🤣). I expect modern systems to fare a lot better, mostly because of much bigger caches, larger DRAM pages and much faster page-mode accesses; it definitely seems to work well enough for the modern game consoles, despite opening a new row not being much faster, if at all, compared to the old memory chips.
I do very much appreciate the writeups on how the actual CGA, MDA etc. controllers do their business; very detailed and easy to understand! It isn't something I have looked very deeply into myself, but if I ever revive my FPGA VLB+ISA video card, I'll definitely get intimately familiar 🤣. The ISA+VLB means that while it is primarily a VLB card, it connects the entire ISA bus too and is supposed to be usable from just an ISA slot on a 286 or something even older. I planned to have several video BIOSes too, to leverage the capabilities of better CPUs, if there's room (I haven't really looked at what video BIOSes do and how much room there is for 32-bit data manipulation; perhaps there isn't any at all). I did plan to play as fast and loose as possible without trying to reimplement exactly how the standards work, except where that is absolutely necessary to maintain compatibility with existing software. The access interfaces will definitely look the same to the programmer as the real deals, but what goes on behind the scenes, and the timing of things, is going to be something else entirely...
Jo22 wrote on 2024-04-09, 21:20:Um, I was thinking of using a shared memory that's not a bottle neck, rather than ordinary SDRAM/DDRx RAM.
Something like video RAM, static RAM, dual-ported RAM, or something based on a new, experimental technology.
It can work on the olden computers, but it won't really carry on into the future, and quite possibly would kill the design, because such memory just cannot scale to many tens of megabytes, let alone hundreds and beyond... In the end we are stuck with DRAMs (be it vanilla SDRAM, DDR or GDDR flavors). I would love SRAM-based main memory in current times, but we can only do tens of megabytes, and at obscene cost...