VOGONS


First post, by Xebec

Rank: Newbie

I’d like to understand a little more about how memory (latency) impacted x86 performance over the early years.

I think in the 8086/8088 era - RAM was relatively fast enough, and the CPU was generally slow enough at execution that DRAM was (always?) fast enough for the 8088/8086.

In the 80286 era, it looks like you could buy RAM fast enough to feed the 286 at 6/8 MHz without any latency. Once the 286 hit 12 MHz, I think DRAM at the time was starting to require some kind of latency to keep up with the CPU’s requests. Is this true? Was “zero wait state RAM” truly zero delay for access?

By the 386 era, I know external cache became a thing (common?), indicating there was some slowdown without cache. Could you buy RAM fast enough to feed a 16 MHz 386 without latency, around the time it launched or a few years later?

Then with the 486, the internal cache became a necessity, as the fully pipelined CPU was very hungry for RAM while running at 25-33 MHz at launch.

For all situations above, how does DRAM refresh affect performance - would that cause occasional stalls for the early CPUs or would they not notice it?

And lastly, do the original 486DX-25 and 486DX-33 technically leave some performance on the table because RAM wasn’t fast enough (hence needing a cache) to feed the CPU?

Reply 1 of 6, by rmay635703

Rank: Oldbie

There were 0WS systems with a 386SX-20.

So it was possible in the vintage era, but the RAM used was somewhat rare/uncommon/expensive.
After two years on the market, 16 MHz was easy to run at 0WS; only very early systems or systems with cheap, obsolete memory needed a wait state.

The 286-20 was about the fastest-clocked CPU with a 0WS option. The 286-25 could occasionally run 0WS, but that generally meant overclocking the RAM; it was the last CPU to truly drive 0WS, which was hard on DRAM.

What is “interesting” is comparing the actual cycle time needed to access RAM for a 10 MHz version of each of the following:
8088 - 286/386 - 386SX - 486
As an example, the 486 is interesting because it can pipeline memory transfers back to back.
Each CPU architecture has differing abilities at actually requesting a memory read; the 8088 needs 4 clocks for every bus cycle, which is vastly slower than a 286, let alone a 486, at the same clock.

So in general…

A 10 MHz 286 could generally get away with 100 ns DRAM for 0WS without issue.

A 10 MHz 8088 could often get away with 200 ns RAM for 0WS (depending on the specific board and the other devices integrated onto it).

A 10 MHz 486 at 0WS (no such thing as true 0WS exists on the 486 and later) would be able to page (Fast Page memory) at 0WS with 100 ns RAM, but the initial read would theoretically need wait state(s). The 486/Pentium/Pentium Pro all use memory at about twice the speed at a given clock compared to a 386DX,
which is why Fast Page and EDO were developed and timing schemes got complicated, so a real 0WS didn’t usually exist. (Selecting 0WS on a 386-40 or a 486 doesn’t actually mean what one would think it should mean.)
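
To put rough numbers on the above, here is a quick back-of-the-envelope sketch in Python. The 30 ns board overhead is purely my assumption, and the bus-clocks-per-access values just follow the 4-clock 8088 and 2-clock 286/386 bus cycles mentioned in this thread:

# Rough 0WS feasibility check: does a DRAM access (plus assumed board
# overhead) fit inside the CPU's bus-cycle window?
def fits_zero_wait_state(clock_mhz, bus_clocks, dram_ns, overhead_ns=30):
    window_ns = bus_clocks * 1000.0 / clock_mhz   # length of one bus cycle in ns
    return dram_ns + overhead_ns <= window_ns

print(fits_zero_wait_state(10, 4, 200))  # 8088-10, 200 ns DRAM: True (400 ns window)
print(fits_zero_wait_state(10, 2, 100))  # 286-10, 100 ns DRAM: True (200 ns window)
print(fits_zero_wait_state(33, 2, 70))   # 386DX-33, 70 ns DRAM: False for a full
                                         # random access (~61 ns window), hence
                                         # page mode and cache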

Cache could make things better, but caching schemes on the 286/386 were pretty pathetic, albeit better than slow DRAM with wait states.

L2 cache was a must on the 486 and later, but even there cache wasn’t great: even at the rather tepid bus speed of 33 MHz, some systems had an initial wait state despite the cache appearing to be fast enough for across-the-board 0/0/0 timing.

Last edited by rmay635703 on 2024-06-28, 22:55. Edited 2 times in total.

Reply 2 of 6, by jakethompson1

Rank: Oldbie

Hello. I agree this is interesting, and there are CompEs on here who will respond and know far more than me, but here are some thoughts and references.

Xebec wrote on 2024-06-28, 22:14:

I’d like to understand a little more about how memory (latency) impacted x86 performance over the early years.

I think in the 8086/8088 era - RAM was relatively fast enough, and the CPU was generally slow enough at execution that DRAM was (always?) fast enough for the 8088/8086.

In addition to the clock rate, remember those two CPUs have multiplexed address and data pins, meaning there is an even longer delay between the address being presented and the data being ready than the clock rate suggests. I also understand that in that era (1981) memory was a more complex IC to manufacture than a CPU was.

Xebec wrote on 2024-06-28, 22:14:

In the 80286 era, it looks like you could buy RAM fast enough to feed the 286 at 6/8 MHz without any latency. Once the 286 hit 12 MHz, I think DRAM at the time was starting to require some kind of latency to keep up with the CPU’s requests. Is this true? Was “zero wait state RAM” truly zero delay for access?

I don't have any such early, non-chipset 286 boards. Checking background reading, the PC/AT has one wait state for DRAM access hardwired into it so that IBM could use slower (cheaper) DRAM. The XT 286 @ 6 MHz, which came later, does not have that wait state, possibly because DRAM prices for the desired ns rating had gone down.
The VLSI 82C235 286 chipset has "zero wait state" operation (https://13.209.45.252/datasheet?id=e1819e1567 … 793da6dae923b27, p. 74) and I believe I have used it successfully at 16 MHz. I do not remember offhand how many cycles the 286 read/write cycle is anyway; perhaps there is enough slack to allow such operation. Others will join in.
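
For what it's worth, a quick calculation gives an idea of the slack, assuming the standard 2-clock 286 bus cycle mentioned later in this thread (the numbers are my own rough figuring, not from the datasheet):

# Assumed zero-wait-state window for a 286 at 16 MHz (2 clocks per bus cycle)
clock_mhz = 16
bus_clocks = 2
window_ns = bus_clocks * 1000.0 / clock_mhz
print(window_ns)  # 125.0 ns: tight once decode/buffer overhead is subtracted,
                  # which is why fast DRAM and page-mode tricks matter at 16 MHz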

Xebec wrote on 2024-06-28, 22:14:

By the 386 era, I know external cache became a thing (common?), indicating there was some slowdown without cache. Could you buy RAM fast enough to feed a 16 MHz 386 without latency, around the time it launched or a few years later?

The first 386 PC came without cache. Instead, it used special (static column) DRAM. You can read about that here: https://books.google.com/books?id=UwLE_FWJ-_0 … epage&q&f=false

Some 486 chipsets have a special "slow mode" with more aggressive timings for a 16 MHz or 20 MHz CPU, which might have been used for cacheless 486SX-16 or 486SX-20 systems, as alluded to here (http://www.bitsavers.org/components/opti/data … _Set_199410.pdf, p. 28). I do not know if there were any true zero wait state (which would mean 2-2-2-2 access, almost like having an untuned external cache) 486 systems.

Xebec wrote on 2024-06-28, 22:14:

For all situations above, how does DRAM refresh affect performance - would that cause occasional stalls for the early CPUs or would they not notice it?

In 8088 systems it's suggested to be an 8% penalty: https://www.jagregory.com/abrash-zen-of-asm/# … -invisible-hand
Later systems have not just cache but "hidden refresh" features to try and help.
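
For a rough sense of where a figure like that comes from (these are my approximations, not numbers from the book): the PC triggers a refresh DMA cycle roughly every 72 CPU clocks at 4.77 MHz, and if each one effectively steals about 6 clocks (the 4-clock DMA transfer plus bus arbitration), the overhead works out to roughly that penalty:

# Approximate DRAM refresh overhead on the original PC (assumed figures)
refresh_interval_clocks = 72   # refresh DMA is triggered roughly every 72 CPU clocks
clocks_lost_per_refresh = 6    # assumed: 4-clock DMA transfer plus arbitration
overhead = clocks_lost_per_refresh / refresh_interval_clocks
print(f"{overhead:.1%}")       # ~8.3%, in the same ballpark as the linked figure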

Xebec wrote on 2024-06-28, 22:14:

And lastly, do the original 486DX-25 and 486DX-33 technically leave some performance on the table because RAM wasn’t fast enough (hence needing a cache) to feed the CPU?

I don't think so; on the contrary, it's the internal cache that makes clock doubling and clock tripling viable.

Reply 3 of 6, by jakethompson1

Rank: Oldbie

Now I'm curious what exactly the difference is between static-column DRAM and fast page mode.

Reply 4 of 6, by rmay635703

Rank: Oldbie

jakethompson1 wrote on 2024-06-28, 22:54:

Now I'm curious what exactly the difference is between static-column DRAM and fast page mode.

They are similar in effect.

Fast page and static column allow you to keep the read buffer full by paging at higher speeds than the NS rating would suggest.

Sadly the initial read speed is usually still pretty meh.

Reply 5 of 6, by bakemono

Rank: Oldbie

A truly random access from DRAM is basically a 3-step process. First is RAS-precharge (TRP), second is RAS, third is CAS. After that you get data. The advertised access time for DRAM didn't include TRP, based on the rationale that TRP can already be over and done with before you even need to know what address you're reading. But once you read something, you need to do TRP before you can open another row. The time requirement for it is 60~80% of the advertised access time, so for example if you have 100ns DRAM then RAS-precharge might need around 80ns.
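
As a worked example of the above (taking the 60~80% figure as given rather than measured):

# Effective back-to-back random-access cycle time for "100 ns" DRAM
access_ns = 100            # advertised access time (RAS to data)
trp_ns = 0.8 * access_ns   # RAS precharge, using the upper end of the 60~80% range
cycle_ns = access_ns + trp_ns
print(cycle_ns)            # 180.0 ns between consecutive truly random accesses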

When you have an 8088 that only accesses memory once every four clocks anyway, even at 8 MHz that is 500 ns, which is plenty of time. You never have to wait for RAM. A 286 can access memory every 2 clocks, so now we're down to 250 ns, which means that 150 ns DRAM is probably too slow once you account for TRP and the setup/hold/propagation times of other components on the motherboard. Of course, the majority of memory reads by a 286 are going to be sequential program code. So if you have page-mode DRAM you can satisfy those by doing consecutive CAS cycles (with a CAS precharge in between), which is much quicker. This kind of setup was generally called 0WS, even though it's only "mostly 0WS": sometimes you have to go to another row/page, and then there are wait state(s). The 386 also uses 2 clocks per memory access, so the "mostly 0WS" continues to work if you have fast enough RAM (70 ns or better to run at 33 MHz).
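
A small sketch of why that "mostly 0WS" works out in practice; the page-hit rate and miss penalty below are just assumptions for illustration, not measurements:

# Average wait states for a 2-clock bus with page-mode DRAM, assuming most
# accesses hit the currently open row (all numbers here are hypothetical)
page_hit_rate = 0.9   # assumed: sequential code mostly stays within one row/page
waits_on_hit = 0      # CAS-only cycle fits in the normal 2-clock bus cycle
waits_on_miss = 2     # assumed penalty to precharge and open a new row

avg_waits = page_hit_rate * waits_on_hit + (1 - page_hit_rate) * waits_on_miss
print(avg_waits)      # 0.2 wait states on average: "mostly 0WS"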

486 and later CPUs mostly do burst reads. The length is one cache line, aligned to a cache line boundary, so the first word might be a page miss, but all subsequent words are in the same page. The 486 is capable of reading every cycle, so if we are running a 4-2-2-2 burst at 33 MHz, which was fine for a 386, the 486 is suffering a bunch of waits. Thankfully, a 486 has 8KB of cache so it doesn't have to access DRAM as often.
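
To quantify that: a cache-line fill is 16 bytes (four 32-bit transfers), so the burst numbers translate directly into clocks and bandwidth. This is just a rough sketch; the 2-1-1-1 case is my example of what a fast L2 cache would aim for, not a figure from any particular board:

# Clocks and approximate bandwidth for a 16-byte cache-line fill at 33 MHz
def line_fill(timing, bus_mhz=33, line_bytes=16):
    clocks = sum(timing)                  # e.g. 4-2-2-2 -> 10 clocks total
    ns = clocks * 1000.0 / bus_mhz
    mb_per_s = line_bytes / ns * 1000.0   # bytes per ns -> MB/s (decimal, approx.)
    return clocks, round(ns), round(mb_per_s)

print(line_fill((4, 2, 2, 2)))  # (10, 303, 53)  - typical DRAM burst
print(line_fill((2, 1, 1, 1)))  # (5, 152, 106)  - what a fast L2 cache targets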

GBAJAM 2024 submission on itch: https://90soft90.itch.io/wreckage

Reply 6 of 6, by Takedasun

Rank: Member

Ultrafast 486 utilizing SRAM memory as RAM?