Can a faster ISA Graphics card be built?

Reply 20 of 64, by mkarcher

Posted on 2024-04-07, 23:27

Rank: l33t
Posts: 2904
Joined: 2019-01-19, 16:29
Location: Germany

Jo22 wrote on 2024-04-07, 21:48:

mkarcher wrote on 2024-04-07, 21:24:

Jo22 wrote on 2024-04-07, 21:05:

However, backwards compatibility with the older architecture (IRQ, DMA, 8-Bit i/o, ~14 MHz clock pin) and existing PC cards caused performance penalties.

The 14MHz clock pin is not related to bus performance at all. No bus cycles are synchronized to that pin (except, of course, if the system clock happens to be synchronized to that pin, like on the PC/XT).

I mean, sure, speed wise, those ~14 MHz would be good enough for expansion cards.

If an expansion card uses the 14MHz signal for anything that is related to bus cycle speed, the card is plain broken. The ISA bus has two clock signals: The bus clock (which is 4.77MHz on the PC/XT, 6MHz on the original AT and 8MHz on later AT-class computers), and the reference oscillator, which is fixed at 14.318MHz. It is OK, although uncommon, to synchronize ISA signals to the bus clock, but the 14.318MHz signal can not be assumed to be locked to the bus clock. The 14.318 signal is not locked to the bus and system clock in all 8MHz Turbo XTs, 10MHz Turbo XTs and of course the AT and later computers. It most likely is locked to the bus/system clock in the 9.54 MHz Turbo XT systems, which used the 28.636 MHz oscillator, like on the page you linked.

You might want to point out the CGA card as a counter-example, but I don't think it counts. While it is true that the memory clock of the CGA card runs off the 14.318 MHz oscillator, and thus the CGA card provides memory access to the host only in intervals given by the 14.318 MHz clock, this is due to the architecture of that card and the pixel clock required to drive NTSC timings. If there were no 14.318MHz signal on the bus, the CGA card would have had a local 14.318 MHz oscillator. In the end, the CGA designers just decided that the host should get one chance to access a byte of video memory per character clock (and the PC/XT misses two out of three of these chances even on REP STOSW). The 14 MHz signal did not prevent the EGA card from increasing the character clock (by increasing the pixel clock to 16.257 MHz) and giving the host more chances to access video memory. The VGA card increased the pixel clock further to 25/28MHz, and thus allowed way more video memory access, yet with the same 14.318 MHz signal on the bus.

Jo22 wrote on 2024-04-07, 21:48:

The 80286 front side bus pretty much is available via ISA, but some legacy parts of the PC/XT architecture remain.
Things like wait states, recovery times, electrical specs and so on.

While you had one wait state on expansion cards for 16-bit cycles on the AT (just as the AT also used for RAM), RAM cards (and other cards that claim areas of 128KB contiguous memory address space) can remove that wait state for memory cycles by activating the /0WS signal, and thus operate at maximum 286 bus performance.

Jo22 wrote on 2024-04-07, 21:48:

That's one reason as to why later 80286 systems did hide behind a chipset, rather than driving ISA bus natively.
The ISA bus quickly fell behind to what later 80286 CPUs had been capable of.

The reason the ISA bus fell behind is not mainly rooted in the architecture of the AT bus, but simply because the ISA clock was defined to not exceed 8 (or 8.33) MHz. So it's not the PC/XT legacy compatibility that made ISA too slow for later 286 systems, but it is AT compatibility.

Reply 21 of 64, by NightShadowPT

Posted on 2024-04-08, 19:55

Rank: Member
Posts: 117
Joined: 2020-09-14, 14:35

Well, this thread ended up a lot more interesting than I had anticipated 😀

Just for the record, my initial question was to have this "super card" work with existing software (i.e.: as a way to accelerate the existing games). Anything that would require rewriting code would be a nice theoretical exercise, but not very useful.

If I may expand on the question a little bit, what about EISA? My classic PC uses an EISA bus and unfortunately there are not many EISA VGA cards. I have one, the Compaq QVision 1280, but it performs on par with the best ISA cards, not better.

NightShadowPT
----------------
Compaq Deskpro M 486/66 - 64MB Ram - Compaq QVision 1MB - Orpheus II Sound Card - 4GB SCSI HDD + 4GB CF Card - SCSI CD-ROM Plextor PX-32TSi - Adaptec WideSCSI AHA-2740W - 3COM Etherlink III Card

Reply 22 of 64, by bakemono

Posted on 2024-04-08, 21:02

Rank: Oldbie
Posts: 758
Joined: 2018-01-15, 06:56

mkarcher wrote on 2024-04-07, 23:27:

While you had one wait state on expansion cards for 16-bit cycles on the AT (just as the AT also used for RAM), RAM cards (and other cards that claim areas of 128KB contiguous memory address space) can remove that wait state for memory cycles by activating the /0WS signal, and thus operate at maximum 286 bus performance.

Using only two clocks for every word transaction would reach ~8MB/s, which would be better than most if not all existing ISA cards. Maybe there are some cards around that can do it, but I've only seen cards that do ~5MB/s (three clocks per word) at best, and most are slower.
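
For anyone who wants to check those two figures, here is a back-of-the-envelope sketch. It assumes a nominal 8 MHz bus clock, 16-bit transfers, back-to-back cycles with no other overhead, and 1 MB = 10^6 bytes; the function name is just something made up for illustration.

```python
def isa_throughput_mb_s(bus_clock_hz: float, clocks_per_transfer: int,
                        bytes_per_transfer: int = 2) -> float:
    """Peak rate in MB/s (1 MB = 10**6 bytes) for back-to-back bus cycles."""
    return bus_clock_hz / clocks_per_transfer * bytes_per_transfer / 1e6


ISA_CLOCK_HZ = 8_000_000  # assumed nominal ISA clock; some boards run 8.33 MHz

zero_wait_state = isa_throughput_mb_s(ISA_CLOCK_HZ, 2)  # two clocks per word
one_wait_state = isa_throughput_mb_s(ISA_CLOCK_HZ, 3)   # three clocks per word

print(f"2 clocks/word: {zero_wait_state:.2f} MB/s")
print(f"3 clocks/word: {one_wait_state:.2f} MB/s")
```

Real cards will land below these peaks whenever the host cannot issue cycles back to back.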

EISA could be a lot faster than ISA in theory, but I'm guessing it would depend on how well the burst mode is implemented by the motherboard chipset.

BTW, If someone wanted to go nuts, they could design a video card with a ribbon cable connecting it to a 72-pin SIMM daughter card, providing a linear framebuffer at the full system memory bandwidth.

GBAJAM 2024 submission on itch: https://90soft90.itch.io/wreckage

Reply 23 of 64, by BitWrangler

Posted on 2024-04-08, 22:07

Rank: l33t++
Posts: 7685
Joined: 2017-10-11, 00:55
Location: Ontario

bakemono wrote on 2024-04-08, 21:02:

BTW, If someone wanted to go nuts, they could design a video card with a ribbon cable connecting it to a 72-pin SIMM daughter card, providing a linear framebuffer at the full system memory bandwidth.

I was thinking of an ATA100 cable from an interposer between the CPU and socket, and a daughterboard that goes on the feature connector, but it's hardly an ISA card then, more like jury rigged local bus.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 24 of 64, by Tiido

Posted on 2024-04-08, 22:27

Rank: l33t
Posts: 3176
Joined: 2018-01-14, 04:40
Location: Norway (used to be Estonia)

There's the big complication that the memory bus is usually unavailable, always accessed by the CPU so it is actually difficult to use that data, without some sort of a complicated dual-port mechanism that arbitrates the accesses on both sides. Video memory buses are just as busy as any CPU buses are, and moreover they often need exact access patterns to give the DACs pixels to output in time. This is one reason why integrated solutions which steal memory from system RAM cause a significant performance drop.

T-04YBSC, a new YMF71x based sound card & Official VOGONS thread about it
Newly made 4MB 60ns 30pin SIMMs ~
mida sa loed ? nagunii aru ei saa 😜

Reply 25 of 64, by Shagittarius

Posted on 2024-04-08, 22:35

Rank: Oldbie
Posts: 1630
Joined: 2007-12-20, 06:49
Location: California, USA

I'd rather see an ISA card with a HDMI or DP out that does internal upscaling to 4k. Actually I'd like that for a PCI card too.

Maybe you could even do some post processing on the image for retro visuals.

Also make it so you can redefine the amount of RAM it has too for ultimate compatibility reasons. Could it load different profiles to make it different cards like dosbox?

Reply 26 of 64, by wierd_w

Posted on 2024-04-09, 00:54

Rank: Oldbie
Posts: 735
Joined: 2023-07-14, 06:20

Shagittarius wrote on 2024-04-08, 22:35:

I'd rather see an ISA card with a HDMI or DP out that does internal upscaling to 4k. Actually I'd like that for a PCI card too.

Maybe you could even do some post processing on the image for retro visuals.

Also make it so you can redefine the amount of RAM it has too for ultimate compatibility reasons. Could it load different profiles to make it different cards like dosbox?

So, a video card that's a fancy FPGA? Not sure how you would handle a live reprogramming though...

But an FPGA with a new synthesis uploaded would let you change basically anything. Might even be able to integrate sound hardware for digital presentation over HDMI or DP, and thus let you pick between what card you are using that way too.

Reply 27 of 64, by DrAnthony

Posted on 2024-04-09, 00:59

Rank: Newbie
Posts: 99
Joined: 2021-04-16, 22:37

Shagittarius wrote on 2024-04-08, 22:35:

I'd rather see an ISA card with a HDMI or DP out that does internal upscaling to 4k. Actually I'd like that for a PCI card too.

Maybe you could even do some post processing on the image for retro visuals.

Also make it so you can redefine the amount of RAM it has too for ultimate compatibility reasons. Could it load different profiles to make it different cards like dosbox?

You could imagine an FPGA based card like the aforementioned Fury that had a simple SVGA core and an upscaler that applied something like CRT Royale for the digital out. Not to say that any of that would be easy, I'm not really aware of any FPGA implementations of classic video cards, but it's definitely in the realm of possibilities considering the existence of N64 cores for Mister.

Reply 28 of 64, by midicollector

Posted on 2024-04-09, 01:26

Rank: Member
Posts: 225
Joined: 2023-08-27, 08:38

There isn’t any software that would benefit from it. That’s the other side of the equation. Also keep in mind that you have to turn the speed of a lot of early software down on faster machines to get it to run right.

Reply 29 of 64, by Jo22

Posted on 2024-04-09, 02:56

Rank: l33t++
Posts: 10078
Joined: 2009-12-13, 07:06
Location: Europe

Tiido wrote on 2024-04-08, 22:27:

There's the big complication that the memory bus is usually unavailable, always accessed by the CPU so it is actually difficult to use that data, without some sort of a complicated dual-port mechanism that arbitrates the accesses on both sides. Video memory buses are just as busy as any CPU buses are, and moreover they often need exact access patterns to give the DACs pixels to output in time. This is one reason why integrated solutions which steal memory from system RAM cause a significant performance drop.

A shared memory architecture, isn't that something the PC Jr did use once ?
But without fast, dual-ported chips?

Generally speaking, assuming that the CPU and graphics chip would be on the same die or chip, like with Cyrix MediaGX (?) or Mx Macs, couldn't this also be an advantage at some point ?

I remember that a shared memory concept always had been seen as something rather negative,
but if all components could access the same RAM like in an efficient network topology..

Edit: These are just some thoughts (it's night time here), it's more of a rhetorical thing.

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 30 of 64, by BitWrangler

Posted on 2024-04-09, 03:43

Rank: l33t++
Posts: 7685
Joined: 2017-10-11, 00:55
Location: Ontario

Jo22 wrote on 2024-04-09, 02:56:

Generally speaking, assuming that the CPU and graphics chip would be on the same die or chip, like with Cyrix MediaGX (?) or Mx Macs, couldn't this also be an advantage at some point ?

Yeah, write to Intel and AMD, tell them they are doing it wrong, instead of crappy budget GFX/CPU combos, they should try to get 250W of CPU plus 500W of graphics on the same little slice of silicon.

Reply 31 of 64, by darry

Posted on 2024-04-09, 04:13

Rank: l33t++
Posts: 6133
Joined: 2014-01-20, 06:27
Location: Canada

BitWrangler wrote on 2024-04-09, 03:43:

Jo22 wrote on 2024-04-09, 02:56:

Generally speaking, assuming that the CPU and graphics chip would be on the same die or chip, like with Cyrix MediaGX (?) or Mx Macs, couldn't this also be an advantage at some point ?

Yeah, write to Intel and AMD, tell them they are doing it wrong, instead of crappy budget GFX/CPU combos, they should try to get 250W of CPU plus 500W of graphics on the same little slice of silicon.

Have recent modern game consoles CPU+GPU combos (APUs) not essentially been going in that direction ?

I get that some people may want/need a personal computer system that draws nearly a kilowatt or even more under load, but I personally find that both undesirable (as it would apply to me) and potentially unsustainable because of the strain on electrical infrastructure and the challenges of thermal management in warmer climates (unless one enjoys bankrolling and watching a potential deathmatch between one's HVAC and personal computer 😉 ).

I decided a while back that I do not want a GPU with a max power envelope of more than 150 to 185ish watts and, on the CPU front, up to 95ish watts.

This is just my opinion, of course (and quite off-topic, I realize).

Reply 32 of 64, by wierd_w

Posted on 2024-04-09, 04:19

Rank: Oldbie
Posts: 735
Joined: 2023-07-14, 06:20

No need to get so saucy.

Rather, tell them to stop making highly fragile lga sockets. 😁

Semiseriously, a socket designed for vertical installation (not really a slotket, but similarly vertical) would permit heatsinks on both sides, with installable gpu chip on the reverse side of the cpu. That would keep data and signal lines short enough for reliable multi-ghz rate (without doubling as fucking antennas), and the setup mostly simple and configurable.

A pipedream, but I could see that being made to work.

Reply 33 of 64, by mkarcher

Posted on 2024-04-09, 06:52

Rank: l33t
Posts: 2904
Joined: 2019-01-19, 16:29
Location: Germany

Shagittarius wrote on 2024-04-08, 22:35:

I'd rather see an ISA card with a HDMI or DP out that does internal upscaling to 4k. Actually I'd like that for a PCI card too.

Without upscaling, there is the CRT Terminator: CRT Terminator Digital VGA Feature Card ISA DV1000 .

Reply 34 of 64, by mkarcher

Posted on 2024-04-09, 07:27

Rank: l33t
Posts: 2904
Joined: 2019-01-19, 16:29
Location: Germany

bakemono wrote on 2024-04-08, 21:02:

mkarcher wrote on 2024-04-07, 23:27:

While you had one wait state on expansion cards for 16-bit cycles on the AT (just as the AT also used for RAM), RAM cards (and other cards that claim areas of 128KB contiguous memory address space) can remove that wait state for memory cycles by activating the /0WS signal, and thus operate at maximum 286 bus performance.

Using only two clocks for every word transaction would reach ~8MB/s, which would be better than most if not all existing ISA cards. Maybe there are some cards around that can do it, but I've only seen cards that do ~5MB/s (three clocks per word) at best, and most are slower.

That's true indeed, but that's not the fault of the ISA cards. For example, the first ET4000AX card I had (there are other fast ISA cards as well, but these were very common) provided around 6.5MB/s in a Compaq Deskpro 386/20, but provided disappointingly low performance in a 486 board, which could be improved to 5.3MB/s by optimizing the BIOS set-up. The key point was the "LDEV sample point".

One issue with the 16-bit ISA bus is that it is heavily optimized for the 286 architecture. And while that architecture has a throughput of 2 clocks per word on memory cycles, the time from the address appearing on the 286 address pins to the transfer being finished is 2.5 clocks. The 286 drives the address of the next cycle during the last half clock of the previous cycle to allow pipelined address decoding. On the ISA bus, you only get to see the top 7 address bits early, which selects a 128K area. This can be used to select which memory expansion card will serve a cycle, but it won't suffice to test whether the cycle is an FPM hit or miss. Conveniently, the video memory range of the VGA card, A000-BFFF, is a 128K block. To achieve 0WS cycles, according to the ISA specification you have to assert the /0WS signal based on the early address bits. If you decode including the "late" latched address bits, you won't be able to generate the /0WS signal in time to skip the first wait state.
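
As a rough illustration of why the VGA window is convenient here, the sketch below models the early decode as "top 7 bits of a 24-bit address", i.e. the unlatched lines LA17..LA23. The function names are hypothetical and this is only the arithmetic, not a model of the bus timing.

```python
def early_block(address: int) -> int:
    """128K block number visible on the early (unlatched) address lines.

    Assumes a 24-bit address whose top 7 bits (A17..A23) arrive early.
    """
    return (address >> 17) & 0x7F


VGA_FIRST = 0xA0000  # start of the A000-BFFF window
VGA_LAST = 0xBFFFF   # last byte of that window


def in_vga_block(address: int) -> bool:
    """True if the early bits alone place the address in the VGA window."""
    return early_block(address) == early_block(VGA_FIRST)


# The whole window maps to one block, so the early bits are enough to claim it.
print(early_block(VGA_FIRST), early_block(VGA_LAST))
# Neighbouring addresses fall into other blocks.
print(in_vga_block(0x9FFFF), in_vga_block(0xC0000))
# A card that only wants B8000-BFFFF cannot tell it apart from B0000 this way.
print(early_block(0xB0000) == early_block(0xB8000))
```

The last line is the catch: anything finer than a 128K block needs the late address bits, and by then it is too late to skip the first wait state.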

I have yet to see a 486 chipset (especially one with local bus support) that is able to pipeline ISA writes to get the optimal throughput. I mention local bus support specifically, because the usual model on ISA bridges for 486 boards is "subtractive decode", i.e. the ISA bridge only forwards those cycles to the ISA bus that are not claimed by any local bus device, so the ISA bridge commonly waits for a timeout. A VLB card asserts the "/LDEV" signal within 1 clock (meant for up to 33MHz bus clock) or 2 clocks (meant for 40/50 MHz bus clock) after the cycle has been started. And that's why the safe setting "LDEV sample point: late" (or whatever it is called in your BIOS) kills ISA performance. Further note that even if a 486 board does not support VLB or any kind of proprietary local bus, the Weitek coprocessor also is a local bus slave, and unless the chipset specifically decodes the Weitek address range itself, the "one cycle delay until it is decided whether this cycle is to be forwarded to ISA" issue applies to Weitek-capable systems as well. Actually, before we had chipsets specifically designed to support local bus slots, the first local bus 486 boards connected the VL signals to Weitek control pins of the chipset.

So, why doesn't the 386 system I mentioned in the introduction get to 8MB/s? In that case, I was able to use datasheets to calculate why 6.5MB/s was actually optimal performance. While I don't remember the number, it is related to the REP STOSD repetition rate of the 386DX at 20MHz combined with the pseudo-synchronous nature of the ISA bus in that system: The ISA bus in that system is clocked at 8MHz, but if the chipset knows that a cycle is "incoming right now", it can slightly delay one clock to sync it to the 20MHz host clock.

bakemono wrote on 2024-04-08, 21:02:

EISA could be a lot faster than ISA in theory, but I'm guessing it would depend on how well the burst mode is implemented by the motherboard chipset.

Burst mode would need to be implemented by the motherboard chipset (by write combining, most likely), as well as the graphics chips. IIRC we didn't get useful write combining until PCI arrived. The PCI cycle latency is so big that PCI wouldn't have had a chance to compete with VLB if there was no write combining going on, so chipset manufacturers likely had to implement the conversion of linear ascending writes into burst cycles. Also, burst mode would need to be implemented by the specific EISA video card. While I have the S3 928 datasheet at hand, which includes the pinout in EISA mode, I do not have the "86C805/86C928 EISA Bus Configuration Design Guide" at hand - and only that publication specifies how to connect the 928 to the EISA bus. I noticed that the 928 EISA datasheet does not include any pin for burst negotiation, so if that chip supports EISA bursting (I doubt it), it would need to be added using external glue logic.

Reply 35 of 64, by Tiido

Posted on 2024-04-09, 09:55

Rank: l33t
Posts: 3176
Joined: 2018-01-14, 04:40
Location: Norway (used to be Estonia)

Jo22 wrote on 2024-04-09, 02:56:
A shared memory architecture, isn't that something the PC Jr did use once ?
But without fast, dual-ported chips?

Generally speaking, assuming that the CPU and graphics chip would be on the same die or chip, like with Cyrix MediaGX (?) or Mx Macs, couldn't this also be an advantage at some point ?

I remember that a shared memory concept always had been seen as something rather negative,
but if all components could access the same RAM like in an efficient network topology..

Edit: These are just some thoughts (it's night time here), it's more of a rhetorical thing.

A number of retro computers did similar stuff but they were very slow to begin with, and this sort of memory access interleaving they did easily halved actual CPU throughput. Many of the video capability limitations also stemmed from such things, you only had so many accesses per frame into which the entire image had to be able to be composed out of...

Modern systems are largely able to do this stuff due to ginormous caches, main memory has been far too slow for a long time for the CPU to run anything out of without being choked. Much of the operation is done in the caches of the CPU, leaving memory bus free for other stuff such as accesses from GPU and other bus masters.

It takes a lot of bandwidth to scan out a full framebuffer, especially at high resolutions and refresh rates. For example 32bit 1920 x 1080 @ 60Hz requires roughly 475MB/second from the main memory, and in these older systems your memory bandwidth was far lower than that. DOS text mode output (80x25) can be done with just 230kB/sec which isn't that much of a problem, when one ignores that it happens realtime with specific timing requirements (unlike CPU, the video processing cannot be stalled or there will be "image corruption" from lack of data arriving in time). VGA mode13h (320x200@70Hz 8bpp) takes ~4.4MB/sec which isn't so bad and 640x480@60Hz with 4bpp taking ~9MB/sec, which is a significant portion of the rather low memory bandwidths of the hardware of the time. Continuous process that requires opening pages is what kills most of the achievable bandwidth, since it is the slowest thing one can do in a DRAM system, even on modern memories opening a new page can take hundreds of ns and it is a loss that can only be avoided by interleaved memory bank accesses, assuming the memory access patterns permit such. Not impossible, since it has been done successfully, but it has many gotchas and will always come with a significant performance difference.
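
A quick sketch of how figures like these can be derived. It computes the average rate over a whole frame (visible pixels times refresh rate), so it ignores blanking and says nothing about the peak rate during the active part of a scan line; the function name is made up.

```python
def average_scanout_bytes_per_s(width: int, height: int,
                                bits_per_pixel: int, refresh_hz: int) -> float:
    """Average framebuffer read rate: visible pixels only, spread over a frame."""
    return width * height * bits_per_pixel / 8 * refresh_hz


modes = {
    "1920x1080 32bpp @ 60Hz": (1920, 1080, 32, 60),
    "320x200 8bpp @ 70Hz (mode 13h)": (320, 200, 8, 70),
    "640x480 4bpp @ 60Hz": (640, 480, 4, 60),
}

for name, mode in modes.items():
    rate = average_scanout_bytes_per_s(*mode)
    # Show both decimal MB and binary MiB, since the two conventions differ by ~5%.
    print(f"{name}: {rate / 1e6:.2f} MB/s ({rate / 2**20:.2f} MiB/s)")
```

The ~475 figure for 1080p comes out when dividing by 2^20; in decimal megabytes the same mode is closer to 498 MB/s.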

Reply 36 of 64, by mkarcher

Posted on 2024-04-09, 17:45

Rank: l33t
Posts: 2904
Joined: 2019-01-19, 16:29
Location: Germany

Tiido wrote on 2024-04-09, 09:55:

DOS text mode output (80x25) can be done with just 230kB/sec

Not really. On the PC, there is no "line cache", so each text line needs to be retrieved for each scan line of the character. For example, on CGA, in high-res text mode, a character takes 8 pixels at 14.318MHz pixel clock, which is 1.8 megacharacters/second. You need to read a character byte and an attribute byte per character, so the required memory bandwidth for that mode is 3.6MB/s. The CGA card has an 8-bit memory bus (8 16k x 1 chips at U50..U57) on this PCB revision, so this requires a memory cycle every 280ns. The installed RAM chips have a row access time of 120ns, and a read cycle time of 270ns, so obviously, in high-res text mode, the full memory bandwidth is required for scanout. That's the reason why you get snow when the CPU accesses memory during scanout: In that case, a memory cycle that would be required for correct scan-out is replaced by the cycle requested by the CPU, and the scan-out logic latches the CPU data instead of the image data.
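
The arithmetic behind those CGA numbers, as a small sketch. The constants are the ones quoted above (14.318 MHz pixel clock, 8 pixels per character, two bytes per character, 8-bit memory bus, 270 ns DRAM read cycle); the variable names are only for illustration.

```python
PIXEL_CLOCK_HZ = 14_318_180   # ~14.318 MHz pixel clock in high-res text mode
PIXELS_PER_CHAR = 8
BYTES_PER_CHAR = 2            # character code + attribute
BUS_WIDTH_BYTES = 1           # 8-bit video memory bus
DRAM_READ_CYCLE_NS = 270      # read cycle time of the installed chips

chars_per_s = PIXEL_CLOCK_HZ / PIXELS_PER_CHAR
bytes_per_s = chars_per_s * BYTES_PER_CHAR
cycles_per_s = bytes_per_s / BUS_WIDTH_BYTES
cycle_time_ns = 1e9 / cycles_per_s
slack_ns = cycle_time_ns - DRAM_READ_CYCLE_NS

print(f"{chars_per_s / 1e6:.2f} Mchar/s, {bytes_per_s / 1e6:.2f} MB/s")
print(f"one memory cycle every {cycle_time_ns:.1f} ns, "
      f"{slack_ns:.1f} ns to spare per cycle")
```

With less than 10 ns of slack per cycle, there is no room to squeeze a CPU access in between two scanout reads.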

In all other modes, the CGA card only needs a memory cycle every 560ns, and only uses every second cycle slot. Every other cycle slot would be available for CPU use - but the 8088 can't generate bus cycles fast enough to achieve the theoretical CGA bandwidth of 1.8MB/s. There are videos on YouTube with people running CGA cards at highly overclocked ISA buses, and IIRC they managed to get to every second "bus slot" actually used, which is 900KB/s, which is still way faster than the XT could do.
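
Sketching that slot budget, assuming one memory slot per four periods of the 14.318 MHz clock (about 280 ns) and one byte transferred per CPU slot:

```python
OSC_HZ = 14_318_180

slots_per_s = OSC_HZ / 4               # one ~280 ns memory slot per 4 clock periods
cpu_slots_per_s = slots_per_s / 2      # scanout uses every second slot
theoretical_mb_s = cpu_slots_per_s / 1e6       # one byte per CPU slot
half_used_mb_s = theoretical_mb_s / 2          # only every second CPU slot is hit

print(f"theoretical: {theoretical_mb_s:.2f} MB/s, "
      f"every second CPU slot used: {half_used_mb_s:.2f} MB/s")
```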

The MDA also operates at 1.8 megacharacters per second, but with a 9 pixel character box. Yet the MDA has no snow issues. This is not just because it uses static RAM with a cycle time of 200ns, but mainly because it has an internal memory bus width of 16 bits instead of 8 bits, so only 1.8 megacycles per second are required.

With EGA, the bandwidth requirement increased slightly by operating at 2 megacharacters per second in 640x350 mode. On the other hand, the EGA now had software loadable fonts in the same video RAM. Due to very clever design, IBM managed to still get good performance for that time: The EGA card has 32-bit memory access, which (as I understand it) can be split into two independent 16-bit busses. This split is used in text modes, with one 16-bit bus serving characters and attributes, and the other 16-bit bus serving character data. So for EGA, the 2 megacharacters per second require 2 megacycles per second. If I remember correctly, high-res text mode uses a configuration with 80% of the memory bandwidth allocated to scanout and 20% to the ISA bus, so the memory obviously needs to handle 2.5 megacycles per second, which is less demanding than on the CGA.

On the VGA card, bandwidth requirement increased, because the scanout requires 3.2 megacharacters per second, which translates to 6.4MB/s, or 3.2 megacycles per second. As with the EGA, the VGA also allocates 20% of the bandwidth to the bus, so the cycles run at 4 megacycles per second, which is just slightly higher than CGA, but on the VGA, only 800KB/s can theoretically be used on the ISA bus, assuming you can issue cycles fast enough.
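
The 80/20 split for EGA and VGA works out as in the sketch below. It takes the rounded rates quoted above (2 and 3.2 megacycles per second for scanout) and assumes each bus slot moves a single byte over the 8-bit ISA bus; the helper name is invented.

```python
def total_cycle_rate(scanout_cycles_per_s: float, scanout_share: float) -> float:
    """Memory cycle rate needed so that scanout fits into its share of the slots."""
    return scanout_cycles_per_s / scanout_share


SCANOUT_SHARE = 0.8  # 80% of the slots for scanout, 20% for the bus

ega_total = total_cycle_rate(2.0e6, SCANOUT_SHARE)
vga_total = total_cycle_rate(3.2e6, SCANOUT_SHARE)
vga_bus_slots = vga_total - 3.2e6   # slots left over for the ISA bus

print(f"EGA memory: {ega_total / 1e6:.1f} Mcycles/s")
print(f"VGA memory: {vga_total / 1e6:.1f} Mcycles/s, "
      f"bus share: {vga_bus_slots / 1e3:.0f} K slots/s")
```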

A reference claims that VGA allows all memory access time slots to be used by the bus during blanking, so this would allow a whopping 4MB/s - but obviously the 8-bit ISA bus is nowhere near that speed, so it can not claim all the theoretically available access time slots.

Tiido wrote on 2024-04-09, 09:55:

VGA mode13h (320x200@70Hz 8bpp) takes ~4.4MB/sec which isn't so bad and 640x480@60Hz with 4bpp taking ~9MB/sec, which is a significant portion of the rather low memory bandwidths of the hardware of the time.

You missed the fact that VGA mode 13h is double-scanned at 320x400, so the bandwidth requirement is identical to the 640x480 mode. At a dot clock of 25.175MHz and 8 pixels per 32-bit access on the VGA, this is a cycle rate of 3.15 megacycles per second, or a scan-out rate of 12.6 MB/s. The deviation from the 9MB/s is that I calculated the required bandwidth during the active period, while you seem to have calculated the average bandwidth including blanking. The VGA card works perfectly in a PC/XT with an on-board memory bus, and can pull off this high data rate only due to its 32-bit memory bus.
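
Spelled out as a sketch (rates during the active part of a scan line, not averaged over the frame; variable names are mine):

```python
DOT_CLOCK_HZ = 25_175_000
PIXELS_PER_ACCESS = 8     # dot clocks covered by one 32-bit memory access
BYTES_PER_ACCESS = 4

cycles_per_s = DOT_CLOCK_HZ / PIXELS_PER_ACCESS
active_bytes_per_s = cycles_per_s * BYTES_PER_ACCESS

# Same result seen from the pixel side:
# mode 13h shows each 8-bit pixel for two dot clocks,
mode13h_bytes_per_s = DOT_CLOCK_HZ / 2 * 1
# while 640x480 in 16 colors needs 4 bits for every dot clock.
planar16_bytes_per_s = DOT_CLOCK_HZ * 4 / 8

print(f"{cycles_per_s / 1e6:.3f} Mcycles/s, {active_bytes_per_s / 1e6:.2f} MB/s")
print(mode13h_bytes_per_s == planar16_bytes_per_s == active_bytes_per_s)
```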

Tiido wrote on 2024-04-09, 09:55:

Continuous process that requires opening pages is what kills most of the achievable bandwidth, since it is the slowest thing one can do in a DRAM system, even on modern memories opening a new page can take hundreds of ns

Exactly, opening/closing pages is an expensive operation, and that's what (fast) page mode DRAM allows to avoid. All classic video cards I mentioned up to here do not use page mode. Remember that all cards allow interleaving of CPU and scanout cycles, and while you might be able to guarantee page hits during continuous video scanout cycles, every CPU cycle and the first video cycle after that is a potential page miss. When the CGA card was designed, page mode access was not even supported by the common RAM chips of the time. While the MDA used static RAM (and discussion of page mode makes no sense regarding the MDA), the Hercules card is DRAM based and does not use page mode either. The Hercules card has a very peculiar design: Its 64KB DRAM uses an 8-bit bus just as the CGA does, but it has a 2K x 8 SRAM used as mirror RAM for attribute data. This explains why you can get multiple text pages on the CGA, but not on the Hercules, even though it has more RAM than the CGA card.

Tiido wrote on 2024-04-09, 09:55:

and it is a loss that can only be avoided by interleaved memory bank accesses, assuming the memory access patterns permit such. Not impossible, since it has been done successfully, but it has many gotchas and will always come with a significant performance difference.

The best way to avoid bank open/close time is to use fast page mode RAM, and to run "bursts" that mostly hit the page. That's the improvement that made the jump from ~500-800KB/s usable bus bandwidth of early VGA/SVGA cards to multiple MB/s. "Modern" SVGA cards had a FIFO for display data that gets filled using page-mode (burst) reads, and a second FIFO that buffers CPU writes, and can often be drained using page-mode (burst) writes (if the CPU writes consecutive addresses). Due to the FIFOs, the hard real-time requirement on memory access is mitigated, which is the prerequisite to allow burst writes from the bus. Reads from the video memory to the CPU are not prefetched into any FIFO, and can not be posted into a buffer, so the read performance did not get the same boost as the write performance on that generation of cards, which makes screen-to-screen copies via the bus annoyingly slow, so this task is best shifted to an accelerator (mentioning this brings this post back on topic. Yeah!)
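
A toy model of why grouping accesses matters. The hit and miss timings below are invented round numbers, not taken from any datasheet, and the model only tracks a single open DRAM row; it is meant to show the shape of the effect, nothing more.

```python
def total_time_ns(rows, hit_ns=40, miss_ns=120):
    """Sum access times; an access is a page hit if it targets the open row."""
    open_row = None
    total = 0
    for row in rows:
        total += hit_ns if row == open_row else miss_ns
        open_row = row
    return total


SCAN_ROW, CPU_ROW = 0, 1   # scanout and CPU writes land in different DRAM rows
N = 8                      # accesses from each side

# Without FIFOs: scanout and CPU cycles alternate, so every access reopens a row.
interleaved = [SCAN_ROW, CPU_ROW] * N
# With FIFOs: display reads and CPU writes are each grouped into one burst.
batched = [SCAN_ROW] * N + [CPU_ROW] * N

print("interleaved:", total_time_ns(interleaved), "ns")
print("batched:    ", total_time_ns(batched), "ns")
```

With these made-up timings the batched order needs well under half the time for the same sixteen accesses, which is the kind of gain the FIFOs buy.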

A final remark: The IBM MCGA video system managed to pull off the stunt of displaying 320x200 in 256 colors on VGA monitors using the same timing as the VGA card, so it requires the same 12.6MB/s data rate. Yet, the PS/2 model 30 just has 2 memory chips, each 64K x 4, which is an 8-bit bus. This requires a scan-out cycle time of 80ns. How did IBM pull that off? Easy... at the time the MCGA was designed, dual-ported VRAM got affordable, at least at that capacity. While most people associate VRAM with high-performance graphics systems, IBM's entry-level PS/2 graphics solution also relied on VRAM to provide sufficient memory bandwidth to display that colorful mode.
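
The 80ns figure follows directly from the data rate and the bus width. A sketch, assuming one byte per pixel with each pixel held for two 25.175 MHz dot clocks:

```python
DOT_CLOCK_HZ = 25_175_000
scanout_bytes_per_s = DOT_CLOCK_HZ / 2   # 320 pixels per line, one byte each


def cycle_time_ns(bytes_per_s: float, bus_width_bytes: int) -> float:
    """Time available per memory cycle for a given bus width."""
    return 1e9 / (bytes_per_s / bus_width_bytes)


mcga_ns = cycle_time_ns(scanout_bytes_per_s, 1)   # 8-bit bus
vga_ns = cycle_time_ns(scanout_bytes_per_s, 4)    # 32-bit bus

print(f"8-bit bus: {mcga_ns:.1f} ns per cycle, 32-bit bus: {vga_ns:.1f} ns per cycle")
```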

Reply 37 of 64, by Sphere478

Posted on 2024-04-09, 18:58

Rank: l33t++
Posts: 5835
Joined: 2021-01-13, 04:45

DrAnthony wrote on 2024-04-07, 13:33:

I mean there's always room for some improvement, especially decades later, but the primary bottleneck historically was the bus itself. Even designs that had little to no optimization showed massive gains moving to VLB. I doubt there's really all that much left to wring out.

I wonder how a real 3d card would react over isa bus. Because basically all isa gpus were 2d chips if I recall. So high end isa (all isa) systems exist that can modestly power some 3d applications but the graphics hardware was never made for that bus. Granted, probably rightly so. And I assume that basically because of the isa bus it would be useless but I’m curious to build it anyway. As a thought experiment if nothing else.

Like an ISA Voodoo 2 or something might be interesting.

Drivers may be a trick..

Sphere's PCB projects.
-
Sphere’s socket 5/7 cpu collection.
-
SUCCESSFUL K6-2+ to K6-3+ Full Cache Enable Mod
-
Tyan S1564S to S1564D single to dual processor conversion (also s1563 and s1562)

Reply 38 of 64, by BitWrangler

Posted on 2024-04-09, 19:12

Rank: l33t++
Posts: 7685
Joined: 2017-10-11, 00:55
Location: Ontario

Though it should be pointed out that on the fastest 486 systems, properly embussed 3D acceleration is getting close to a max of 2x boost in frame rates on early games that were capable of 3D acceleration and many are only 1.5x boost. So compare that to the frame rates of "software mode" on machines likely to be stuck with ISA only, like 386/486 hybrids, and those are only getting at most 5 frames per second, put the absolute best acceleration for them on ISA and you'll be lucky to gain 2-3fps on most stuff and maybe get to a whole still unplayable 10fps on that one thing they run superb.

Reply 39 of 64, by megatron-uk

Posted on 2024-04-09, 19:21

Rank: Oldbie
Posts: 1832
Joined: 2010-09-07, 10:53
Location: UK

Sphere478 wrote on 2024-04-09, 18:58:

DrAnthony wrote on 2024-04-07, 13:33:

I mean there's always room for some improvement, especially decades later, but the primary bottleneck historically was the bus itself. Even designs that had little to no optimization showed massive gains moving to VLB. I doubt there's really all that much left to wring out.

I wonder how a real 3d card would react over isa bus. Because basically all isa gpus were 2d chips if I recall. So high end isa (all isa) systems exist that can modestly power some 3d applications but the graphics hardware was never made for that bus. Granted, probably rightly so. And I assume that basically because of the isa bus it would be useless but I’m curious to build it anyway. As a thought experiment if nothing else.

Like an ISA Voodoo 2 or something might be interesting.

Drivers may be a trick..

My understanding is a great deal of the early 3D cards still leaned on the host CPU to do a lot of the heavy lifting of calculations - I suspect anything with an ISA bus would be simply far too slow to feed a card with data at the rate needed for acceptable frame rates.

You would probably want a design where you could pass over the rendering in its entirety to the card.

Of course, just like the other design suggestions, none of this would accelerate any existing games!

My collection database and technical wiki:
https://www.target-earth.net
