VOGONS


need help with L2 cache.


First post, by cnpr

Rank Newbie

Hi all, I have an Asus 486sv2gx4 rev 2.0 with an AMD 133 MHz CPU and 32 MB of 60 ns RAM. The BIOS does detect the L2 cache, but for some reason, when I test it in speedsys, it won't read the level 2 cache and I don't get the staircase pattern that I see in other posts. I also tested this in cachechk and it says I'm using one cache. The SRAM cache is 512K at 15 ns and the tag RAM is the same, all 32-pin variants. The pins on the board are set correctly as well. Other than replacing the cache, what can I do to make sure the cache is functioning properly?

Reply 1 of 11, by jakethompson1

Rank Oldbie

1. It could be the turbo button, disabling cache.

2. If you have 512K cache, you only need a 32Kx8 (256Kbit, 28-pin) tag RAM. But if there were address lines floating on the tag RAM, I expect the cache wouldn't be detected or your system would crash.

Reply 2 of 11, by cnpr

Rank Newbie

Sometimes when the floppy drive loads, it will start up for a few seconds, then either freeze or give up. I also had a 28-pin tag RAM in there before, but I noticed no difference. Even when the cache is enabled, it will boot from the floppy and I still get no L2 cache.

Last edited by cnpr on 2025-08-27, 00:59. Edited 1 time in total.

Reply 3 of 11, by jakethompson1

Rank Oldbie

Erratic floppy drive operation with an Am5x86 can indicate issues with the CPU pins related to the write-back internal cache. There should be other threads about this.

Reply 4 of 11, by AncapDude

Rank Newbie

I spotted the same behaviour in speedsys some days ago on my 486 rig. It changes when I change timings and wait states in BIOS. So at least I got the chart I expected for reading but always get a flatline for writing. Still have no clue why.

Reply 5 of 11, by cnpr

Rank Newbie

just how does the turbo button affect L2 cache anyway?

Reply 6 of 11, by jakethompson1

Rank Oldbie
AncapDude wrote on 2025-08-27, 01:00:

I spotted the same behaviour in speedsys some days ago on my 486 rig. It changes when I change timings and wait states in BIOS. So at least I got the chart I expected for reading but always get a flatline for writing. Still have no clue why.

This is discussed somewhere by mkarcher, I believe it's because the write access pattern by speedsys is such that it's composed entirely of cache misses, and because 486 chips and archetypal chipset cache are "write-around" (on a write miss, cache is just skipped and the write goes straight to RAM) rather than "write-allocate" (in which a cache is filled first on a write-miss just like it would be on a read-miss, then the in-cache data are modified in hopes of achieving write-back behavior), all you see is the RAM write performance.
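To make the write-around versus write-allocate difference concrete, here is a minimal sketch. It assumes a hypothetical write-only pattern of 32-bit writes marching through 64 KB, which is not necessarily what speedsys really does, and it models the cache as an unbounded set of resident lines so only the write-miss policy matters.

```python
LINE_SIZE = 16  # bytes per cache line, as on the 486

def count_write_hits(addresses, write_allocate):
    """Count write hits for a write-only pattern on an initially empty cache.

    The cache is an unbounded set of resident line numbers (an assumption of
    this sketch), so capacity and evictions are ignored.
    """
    resident = set()
    hits = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line in resident:
            hits += 1
        elif write_allocate:
            # Write-allocate: fill the line on a write miss, then modify it.
            resident.add(line)
        # Write-around: a write miss goes straight to RAM, nothing is cached.
    return hits

# Hypothetical benchmark pattern: 32-bit writes through a 64 KB buffer.
pattern = range(0, 64 * 1024, 4)
print("write-around hits:  ", count_write_hits(pattern, write_allocate=False))
print("write-allocate hits:", count_write_hits(pattern, write_allocate=True))
```

With write-around, a buffer that is only ever written never enters the cache, so every write is a miss and the benchmark sees plain RAM write speed.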

cnpr wrote on 2025-08-27, 01:02:

just how does the turbo button affect L2 cache anyway?

One way of implementing a turbo button on these late 486 systems where they moved away from directly trying to mess with the clock rate, is to suppress the cache and force maximum RAM wait states in de-turbo mode to slow the system down. This would be confusing as cachechk performance would deteriorate, yet the BIOS (correctly) would say that there is 512K (or whatever) external cache.

Reply 7 of 11, by AncapDude

Rank Newbie
jakethompson1 wrote on 2025-08-27, 01:20:

This is discussed somewhere by mkarcher, I believe it's because the write access pattern by speedsys is such that it's composed entirely of cache misses, and because 486 chips and archetypal chipset cache are "write-around" (on a write miss, cache is just skipped and the write goes straight to RAM) rather than "write-allocate" (in which a cache is filled first on a write-miss just like it would be on a read-miss, then the in-cache data are modified in hopes of achieving write-back behavior), all you see is the RAM write performance.

Thx for the explanation. So I don't need to be worried about the flatline? Otherwise I would give Write-Through a try. I also tried lowering the RAM size but that didn't change anything either.

Reply 8 of 11, by jakethompson1

Rank Oldbie
AncapDude wrote on 2025-08-27, 02:07:
jakethompson1 wrote on 2025-08-27, 01:20:

This is discussed somewhere by mkarcher, I believe it's because the write access pattern by speedsys is such that it's composed entirely of cache misses, and because 486 chips and archetypal chipset cache are "write-around" (on a write miss, cache is just skipped and the write goes straight to RAM) rather than "write-allocate" (in which a cache is filled first on a write-miss just like it would be on a read-miss, then the in-cache data are modified in hopes of achieving write-back behavior), all you see is the RAM write performance.

Thx for the explanation. So I don't need to be worried about the flatline? Otherwise I would give Write-Through a try. I also tried lowering the RAM size but that didn't change anything either.

As I've been back into playing with 486es for about five years now, I can summarize this and hopefully save you a lot of hassle 😁

First remember there are two caches, an 8K (or 16K in later 486es) cache internal to the 486 CPU, and a 0K-1024K external cache managed by the chipset, usually 256K.

Write-through means that cache and RAM are kept synchronized; cache can never become "newer" than RAM or in other words "dirty". The only caveat is if a device other than the CPU writes directly to RAM (like the floppy controller), the cache must also be updated or simply invalidated; either way the next time the CPU reads that location, it will see the new data.
Write-back tries to keep writes affecting only the cache and postpones the update to RAM as long as possible. The hope is that for frequently written locations like a loop counter, many writes to slower RAM can be eliminated entirely. But this means the RAM contents are potentially "older" than cache. So if the floppy controller tries to access RAM (not just writes now but reads) and the corresponding cache line is "dirty" the CPU or cache has to step in and block that access and update the affected area in RAM first, before the floppy controller is allowed to read it.
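The loop counter case can be sketched in a few lines. This is a toy model with a single cached location, not how any particular chipset behaves; `ram_writes` is a name made up for illustration.

```python
def ram_writes(n_updates, write_back):
    """Count writes that reach RAM when one cached location is updated n times."""
    ram_write_count = 0
    dirty = False
    for _ in range(n_updates):
        if write_back:
            dirty = True          # only the cached copy changes
        else:
            ram_write_count += 1  # write-through: RAM is updated on every write
    if dirty:
        ram_write_count += 1      # the dirty line is written out once, on eviction
    return ram_write_count

print("write-through:", ram_writes(1000, write_back=False))
print("write-back:   ", ram_writes(1000, write_back=True))
```

The single deferred write is also why a floppy DMA read has to wait in write-back mode: until that write happens, RAM holds the old value.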

To cut down on bookkeeping overhead and take advantage of burst reads (this is what 2-1-1-1 and 3-1-1-1 and 3-2-2-2 are about) 486 cache only operates in "lines" of 16 bytes (486 internal cache).
In a write-back cache, each line gets a bit of metadata keeping track of whether it is still "clean" (freshly copied from RAM) or "dirty" (the CPU has written to it, even if it changed only 1 bit of the 128-bit cache line). The other piece of metadata is the "tag", which for a direct-mapped external cache just stores a partial memory address in RAM of the data currently stored in the cache; the other bits of that memory address are implied by the position in cache.
It's the cache controller's job to set the dirty bit to a '1' if it isn't already, every time the cache line is written.
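As a rough sketch of the bookkeeping described above, the following splits an address into tag, line position, and byte offset, and sets the dirty bit on a write hit. It assumes a 256K direct-mapped external cache with 16-byte lines; the class and function names are invented for illustration and do not correspond to any real chipset.

```python
LINE_SIZE = 16            # bytes per line
CACHE_SIZE = 256 * 1024   # assumed 256K direct-mapped external cache
NUM_LINES = CACHE_SIZE // LINE_SIZE

def split_address(addr):
    """Return (tag, index, offset) for a physical address."""
    offset = addr % LINE_SIZE                 # byte within the line
    index = (addr // LINE_SIZE) % NUM_LINES   # position in cache (implied bits)
    tag = addr // CACHE_SIZE                  # partial address kept in tag RAM
    return tag, index, offset

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.dirty = [False] * NUM_LINES

    def fill(self, addr):
        """Load a line from RAM: store its tag and mark it clean."""
        tag, index, _ = split_address(addr)
        self.tags[index] = tag
        self.dirty[index] = False

    def write(self, addr):
        """Return True on a write hit; a hit always marks the line dirty."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            self.dirty[index] = True
            return True
        return False

print(split_address(0x123456))
```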

Here is where performance tuning gets complicated. Earlier 486 chipsets require 32 bits of data RAM (or 64 bits for double-bank), 8 bits of tag RAM, and 1 bit of dirty RAM for a properly working write-back cache.
For some reason, many motherboard makers omit the dirty RAM to save the cost of one chip, and instead simply wire the would-be dirty RAM's data line to a resistor so it always reads '1'. I have no idea why this cost cutting made any sense when the same makers went to the trouble of designing in and populating a Weitek 4167 socket, as is common on affected boards. Such a configuration is called "always dirty" and operates as that sounds. On a cache read miss, the 16 bytes being forced out of the cache get written to RAM even if they are still completely identical to RAM, because the cache has no way of tracking whether that is still true. Overall memory performance is decreased accordingly. In write-through, of course, no dirty bit is needed or beneficial.
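The cost of "always dirty" shows up even for a workload that never writes. Here is a small simulation under simplifying assumptions: a 256K direct-mapped cache with 16-byte lines, a read-only stream over 512K, and a valid indicator per line (modelled as a `None` tag) that real always-dirty hardware may not have.

```python
def count_writebacks(read_addresses, has_dirty_ram, num_lines=16384, line_size=16):
    """Count line write-backs to RAM caused by read misses in a write-back cache."""
    tags = [None] * num_lines      # None means the slot holds nothing yet
    dirty = [False] * num_lines    # never set here: the workload only reads
    writebacks = 0
    for addr in read_addresses:
        line = addr // line_size
        index, tag = line % num_lines, line // num_lines
        if tags[index] == tag:
            continue  # read hit, nothing to do
        # Read miss: the old line is evicted. Without dirty RAM the chipset
        # must assume it is dirty and write it back unconditionally.
        if tags[index] is not None and (dirty[index] or not has_dirty_ram):
            writebacks += 1
        tags[index] = tag
        dirty[index] = False
    return writebacks

stream = range(0, 512 * 1024, 16)  # read one address per line across 512K
print("with dirty RAM:", count_writebacks(stream, has_dirty_ram=True))
print("always dirty:  ", count_writebacks(stream, has_dirty_ram=False))
```

Every eviction in the always-dirty case costs an extra 16-byte write to RAM that a working dirty bit would have skipped.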

In the worst case, such as the OPTi 495SLC chipset, the chipset is designed to permanently operate its external cache in write-back "always dirty" mode, or disabled, and you can do nothing about it other than save up for a better board.
Other chipsets, such as the UMC 481, can operate the external cache in write-back mode with a dedicated Dirty RAM, or disabled. While it is common for motherboard makers to omit the Dirty RAM thereby implementing "Always Dirty", fixing that could be as simple as populating an empty socket the board maker left for this RAM, or by hacking the board like so: UM481/UM491 "Always Dirty" modification HOWTO
Other chipsets, such as the SiS 461, can operate their external cache in either write-back mode with a dedicated Dirty RAM, or in write-through mode, or disabled. If the motherboard maker omitted the Dirty RAM, this gives you the choices: use write-back in "Always Dirty" configuration, use write-through, or modify the board as above to achieve write-back with dirty bit.
The final generation of chipsets--basically the last generation VLB ones with all the bells and whistles for P24D/DX4/Am5x86 CPUs, and anything PCI--is most flexible. They can do write-through, write-back with an external dirty bit (which is 99.9% of the time going to mean always-dirty), or what is called write-back in 7+1 mode. In 7+1 mode, one bit of tag RAM is stolen to instead track whether the line is dirty. Stealing this bit halves the cacheable area in main memory. So if you have 256K cache, 64MB is your cacheable limit in always-dirty, working external dirty bit, or write-through mode, and 32MB is your cacheable limit in 7+1 write-back mode. Double those amounts for 512K and double them again for 1024K. Arbitrary chipset limitations could reduce these limits further. Those limits are obviously far more relevant now than when these decisions were made.
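The cacheable limits above follow from simple arithmetic: each tag bit doubles the amount of RAM the cache can cover. A quick sketch, assuming a direct-mapped cache and an 8-bit tag as described (the function name is made up, and real chipsets may impose lower limits):

```python
def cacheable_mb(cache_kb, tag_bits=8, steal_dirty_bit=False):
    """Cacheable main memory in MB for a direct-mapped cache of cache_kb KB."""
    # In 7+1 mode one tag bit is repurposed as the dirty bit.
    address_bits = tag_bits - 1 if steal_dirty_bit else tag_bits
    return cache_kb * (2 ** address_bits) // 1024

for size_kb in (256, 512, 1024):
    print(f"{size_kb}K cache: {cacheable_mb(size_kb)} MB with 8 tag bits, "
          f"{cacheable_mb(size_kb, steal_dirty_bit=True)} MB in 7+1 mode")
```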

Up through the original DX2, the internal cache is always write-through, and the external cache is write-back or nothing in common chipsets, or at least write-back by default with the possibility of switching to write-through.
This seems a little backward to me; especially as the CPU's clock multiplier becomes greater than 1, the opposite configuration makes more sense. "Write buffers" ease the pain of write-through, by allowing the CPU to continue to execute code while prior writes to RAM or external cache finish, and would be most effective at 1X multiplier. Later 486 CPUs are capable of running the internal cache in write-back, while later 486 chipsets are more likely to be able to run the external cache in write-through.

Anyway, I recently got an SiS 496 board to try for the first time so I have benchmarks at hand. In both cases I have the Am5x86-133 running its internal cache in write-back.
doom -timedemo demo3: 2134 gametics in 1596 realtics in 256K write-through; 2134 gametics in 1592 realtics in 256K write-back; bonus: 2134 gametics in 1548 realtics in 512K write-back.
excerpted speedsys overall memory scores: L2 47.37 MB/s in write-through and 49.19 MB/s in write-back; Memory 38.56 MB/s in write-through and 37.74 MB/s in write-back.
It makes sense. Write-back makes L2 read misses more expensive, and makes L2 write hits cheaper, as compared to write-through.
And it makes so little difference in doom, I believe, because the Am5x86's write-back internal cache is providing almost all the benefit of a write-back cache, such that there isn't much harm in making the external cache write-through.
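For anyone who wants frames per second rather than realtics, the usual conversion relies on Doom's fixed 35 tics per second. A small sketch using the three timedemo results above, taking 1592 as the realtics figure for the 256K write-back run:

```python
DOOM_TICS_PER_SECOND = 35  # Doom's fixed game tic rate

def timedemo_fps(gametics, realtics):
    """Convert a Doom timedemo result to average frames per second."""
    return DOOM_TICS_PER_SECOND * gametics / realtics

runs = {
    "256K write-through": 1596,
    "256K write-back": 1592,
    "512K write-back": 1548,
}
for label, realtics in runs.items():
    print(f"{label}: {timedemo_fps(2134, realtics):.2f} fps")
```

The two 256K runs differ by well under 1%, while going to 512K gains about 3%.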

Reply 9 of 11, by cnpr

Rank Newbie

How will I know if my L2 cache is working right? Also, does it matter if it's 32 or 28 pins on the cache? Also, could the L2 be working but just not reporting to speedsys and cachechk? I know there is a BIOS (based on a beta) that fixes the dirty tag; would that BIOS help resolve this?

Reply 10 of 11, by jakethompson1

Rank Oldbie
cnpr wrote on 2025-08-27, 03:22:

How will I know if my L2 cache is working right? Also, does it matter if it's 32 or 28 pins on the cache? Also, could the L2 be working but just not reporting to speedsys and cachechk? I know there is a BIOS (based on a beta) that fixes the dirty tag; would that BIOS help resolve this?

  • You can trust the benchmark in cachechk/speedsys to tell you if the cache is working right. Maybe post your cachechk output?
  • 256K single bank uses four 32-pin data RAMs and one 28-pin tag RAM. 256K double bank uses eight 28-pin data RAMs and one 28-pin tag RAM. 512K uses four (single bank) or eight (double bank) 32-pin data RAMs and one 28-pin tag RAM. 1024K double bank uses eight 32-pin data RAMs and one 32-pin tag RAM. As you can see, only the 1 megabyte cache case needs a tag RAM large enough to be 32-pin. I don't know what will happen with the extra pins in your case, where you are running a 512K configuration that should use a 28-pin 32Kx8 tag RAM but have a larger one installed.
  • The beta BIOS with the chipset register changed in MODBIN will fix your RAM (cache miss) timings as compared to always-dirty.
  • For comparison, a fully tuned SiS 471 with Am5x86 @ 160 MHz (slightly overclocked) shows in cachechk: 7 us/KB, 15 us/KB, and 28 us/KB for reads. What are they in your situation?
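The tag RAM sizes in the second bullet follow from needing one tag entry per cache line. A quick check, assuming 16-byte lines in the external cache (which matches the 32Kx8 figure given for 512K; the function name is made up for illustration):

```python
def tag_ram_entries(cache_kb, line_size=16):
    """Number of tag entries needed: one per cache line."""
    return cache_kb * 1024 // line_size

for size_kb in (256, 512, 1024):
    entries = tag_ram_entries(size_kb)
    print(f"{size_kb}K cache needs {entries // 1024}Kx8 of tag RAM")
```

Only the 1024K case goes beyond 32K entries, which is consistent with it being the only one listed with a 32-pin tag RAM.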

Reply 11 of 11, by cnpr

Rank Newbie

I'm not sure; I'll have to look into that tomorrow. I found that the BIOS chip is a 27c512-20. What would be a good EEPROM replacement for that chip?