VOGONS


First post, by gregorem

Rank: Newbie

I hope this is the proper place for a thread like this.

So, I read an article that explains how cache addressing works:
https://medium.com/@himanshu0525125/breaking- … et-19e3a28e0662
So we have sets, which are selected by the index. Each set is divided into blocks, which are identified by the tag, and the individual bytes within a block are selected by the offset. Clear.

My board's manual says that the maximum cache configuration is 512KB total: 8x 128K*8 data SRAM and a 32K*8 tag SRAM. The cache is direct-mapped.

If I understood it correctly, this means that the tag address is 8 bits wide and covers up to 256 blocks, and that there are up to 32K (32 768) blocks in the tag space. I guess the tag cache covers all tag addresses in all sets, so we have 32 768 sets (a 15-bit index). That leaves 9 bits for the offset (32-bit address bus), covering 512 bytes in each block.
Let's calculate: 32 768 sets, each with 256 blocks of 512 bytes, gives 32 768 x 256 x 512 = 4 294 967 296 bytes (4GB). Nice, but I doubt that a vintage motherboard with an old chipset could cache the entire 32-bit address space, especially with only 512KB of SRAM. So I must have got something wrong.
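
One way to sanity-check the arithmetic above is to derive the field widths from the chip sizes instead. The sketch below assumes that in a direct-mapped cache every set holds exactly one line, and that the 32K*8 tag SRAM stores one 8-bit tag per line. Neither point is stated in the manual excerpt, and `split_address` is a hypothetical helper used only for illustration. Under those assumptions the line size comes out as 512KB / 32K = 16 bytes, and offset, index and tag together span 27 address bits rather than 32.

```python
# Sketch only: field widths are derived from the manual's figures under the
# assumption of one tag entry per cache line (not confirmed by the manual).
CACHE_BYTES = 512 * 1024   # 8 x 128K*8 data SRAM
TAG_ENTRIES = 32 * 1024    # 32K*8 tag SRAM, one entry per line (assumed)
TAG_BITS = 8               # each tag entry is 8 bits wide

LINE_BYTES = CACHE_BYTES // TAG_ENTRIES      # 16 bytes per line
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 4 bits select a byte in a line
INDEX_BITS = TAG_ENTRIES.bit_length() - 1    # 15 bits select a line
CACHEABLE_BYTES = 1 << (OFFSET_BITS + INDEX_BITS + TAG_BITS)  # 2**27


def split_address(addr):
    """Return (tag, index, offset) for a physical address (hypothetical helper)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (TAG_ENTRIES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(LINE_BYTES, OFFSET_BITS, INDEX_BITS)            # 16 4 15
print(CACHEABLE_BYTES // (1024 * 1024), "MB")         # 128 MB
print(split_address(0x12345))                         # (0, 4660, 5)
print(split_address(0x12345 + CACHE_BYTES))           # (1, 4660, 5)
```

Two addresses exactly 512KB apart land on the same index with different tags, which is why a direct-mapped cache can hold only one of them at a time.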

Could someone explain to me how the tag cache works, at least in direct-mapped mode?

Reply 1 of 1, by bertrammatrix

Rank: Member

My knowledge isn't all that great when it comes to exact specifics, but in a nutshell:

When it comes to direct mapped cache:

256KB will let you cache 32MB in write-back mode (WB, faster), or 64MB in write-through mode (WT, slower)

512KB - 64MB in WB or 128MB in WT

1024KB - 128MB in WB.....

Obviously, all of this assumes the appropriate tag RAM size.

If you are trying to calculate things, WB vs. WT probably affects the result drastically, since in WB you effectively have only HALF of the cache space to work with (the second half holds a copy of the first half, or something like that).

Now, when it comes to L2 cache on these boards, the speed difference between WB and WT is pretty small (unlike in L1), and WT works fine and is sufficient for most people. As long as all your RAM is "cached", all is well.

Personally, I have not observed that more cache than necessary improves performance in any way, i.e. 512KB with 64MB RAM will score identically to 1024KB with 64MB, all else being equal. I suspect this is just the nature of direct-mapped cache (or I am wrong).

Some boards also let you change cache policy settings: "always dirty" and "7+1 or 8 bits" are options you may see. All I know about these is that one setting usually yields much better performance than the other. This may play into your calculation somehow.
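
The figures quoted earlier in this reply, and the "7+1 or 8 bits" option, fit a simple model that can be sketched in a few lines. This is an assumption rather than something confirmed for any particular board: the cacheable range of a direct-mapped cache is taken as cache size x 2^(address tag bits), with all 8 tag bits holding address bits in WT mode and one of them given up as a dirty flag in WB mode (7+1). `cacheable_mb` is a hypothetical helper name.

```python
def cacheable_mb(cache_kb, tag_bits=8, dirty_bit=False):
    """Cacheable RAM in MB for a direct-mapped cache (illustrative model only).

    Assumption: in write-back mode one tag bit is used as a dirty flag,
    leaving tag_bits - 1 bits for the address tag.
    """
    address_tag_bits = tag_bits - 1 if dirty_bit else tag_bits
    return cache_kb * 1024 * (1 << address_tag_bits) // (1024 * 1024)


for size_kb in (256, 512, 1024):
    wb = cacheable_mb(size_kb, dirty_bit=True)    # 7 address bits + 1 dirty bit
    wt = cacheable_mb(size_kb, dirty_bit=False)   # all 8 bits hold the tag
    print(f"{size_kb}KB cache: {wb}MB in WB, {wt}MB in WT")
```

With these assumptions the model gives 32MB/64MB for 256KB and 64MB/128MB for 512KB, matching the numbers listed above. On this reading, what WB halves is the cacheable range, while the cache itself stays the same size.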