VOGONS


First post, by mpe

User metadata
Rank Oldbie

I've just got this cool motherboard (Elitegroup SI5PI AIO) and I am on a mission to upgrade the L2 cache to 2048k. The maximum documented size for this board is 1M (in either single- or dual-bank config), but the DIP32 sockets are there and there are reports that this might be possible, so I am trying.

DSC_5292.jpg

Getting 16 or 18 128K x 8 chips wasn't easy, as the (fake?) IS61C1024 chips from eBay have something like a 30-40% defect rate. But I ordered enough spares to end up with 18 working chips, which work fine in 1024k mode and on other boards.
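As a rough sanity check on those numbers, here is a small Python sketch. It assumes the 16 chips are the data SRAMs and the extra 2 are for TAG/DIRTY (the post does not spell that out), and that chip failures are independent; the function names are made up for illustration.

```python
def data_chips(cache_kib: int, chip_kib: int = 128) -> int:
    """Number of 128K x 8 chips needed to hold the cache data."""
    return cache_kib // chip_kib


def chips_to_order(needed: int, defect_pct: int) -> int:
    """Chips to buy so that the expected number of good ones covers `needed`.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-needed * 100 // (100 - defect_pct))


print("data chips for 2048k:", data_chips(2048))
for pct in (30, 40):
    print(f"{pct}% defects -> order about {chips_to_order(18, pct)} chips")
```

This only covers the average case; a real order would want a few more spares on top.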

Now I am trying to figure out the jumpers. There are two sets of jumpers. The first, JP13 and JP14 (plus the unsoldered JP12), seems to configure the tag SRAM size. They are wired directly to the D58, D61 and D62 pins of the CPU.

The second group of jumpers (JP15 and JP16) seems to wire together the high address bits of the data SRAMs. Using a multimeter, I made this drawing.

ecs-pinout.png

My best guess is that the setting for 2MB is JP15: 2-3, 4-5 and JP16: 2-3, 4-5, with JP4 SHORT and JP13 SHORT. I am not quite sure about JP12; it depends on whether I need the 16th address bit of the TAG.

DSC_5293 (1).jpeg

When I put all the chips in, the BIOS indeed reports 2048k, but the system hangs while booting to DOS (or later, when skipping config.sys). So something isn't quite right. I tried opening/shorting JP12, enabling/disabling write-through, and inserting/removing the dirty cache SRAM, but nothing helped.

Now my questions.

1. How big (in address size) a tag and dirty cache chip does a Pentium system with 2M cache need? Is 2 x 32kx8 enough?
2. Any idea if my jumper setup is correct?
3. How do motherboards actually detect cache size? It looks like they must be sampling address bits of the cache chips at some point during boot. I would love to understand how this works.

Blog|NexGen 586|S4

Reply 1 of 37, by Tiido

User metadata
Rank l33t

JP12 needs to be added and connected and you probably need the largest TAG that you can have there for 2MB. Without JP12 half the cache goes nowhere and you get the crashes you observe.

T-04YBSC, a new YMF71x based sound card & Official VOGONS thread about it
Newly made 4MB 60ns 30pin SIMMs ~
What are you reading? You won't understand it anyway 😜

Reply 2 of 37, by Anonymous Coward

User metadata
Rank l33t

Ah, this is great. I've always wanted to see 2MB async cache in action. There are even a few VLB Socket4 boards out there that support 2MB, but I have never been fortunate enough to come across one.

"Will the highways on the internets become more few?" -Gee Dubya
V'Ger XT|Upgraded AT|Ultimate 386|Super VL/EISA 486|SMP VL/EISA Pentium

Reply 3 of 37, by mpe

User metadata
Rank Oldbie
Tiido wrote on 2020-02-03, 08:39:

JP12 needs to be added and connected and you probably need the largest TAG that you can have there for 2MB. Without JP12 half the cache goes nowhere and you get the crashes you observe.

Thanks. JP12 can only have an effect with 64k or 128k chips in the TAG/DIRTY sockets. Smaller chips don't have the A15 pin, as they come in shorter DIP-28 packages.

I tried shorting it momentarily during reset with metal tweezers, but it had no effect. I will try soldering in a proper JP12 header just to be sure.

EDIT: Yes. It looks like the required tag size follows from a simple division of the cache size by the Pentium cache line of 32B:

cache size - tag size
256k - 8K (DIP28 chip)
512k - 16K (DIP28 chip)
1024k - 32K (DIP28 chip)
2048k - 64K (DIP32 chip)

That's for an 8-bit tag. For write-back, a wider tag is needed (hence the second DIRTY chip).
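The table can be reproduced with a short Python sketch. It assumes a direct-mapped cache with one tag entry per 32-byte line, which is what the numbers in the table imply; the function names are hypothetical.

```python
LINE_BYTES = 32  # Pentium cache line size, as stated above


def tag_entries(cache_kib: int, line_bytes: int = LINE_BYTES) -> int:
    """Tag entries needed, assuming one entry per cache line (direct-mapped)."""
    return cache_kib * 1024 // line_bytes


def address_pins(entries: int) -> int:
    """Address lines a tag SRAM needs to index that many entries."""
    return (entries - 1).bit_length()


for size_kib in (256, 512, 1024, 2048):
    n = tag_entries(size_kib)
    top = address_pins(n) - 1
    print(f"{size_kib}k cache -> {n // 1024}K tag entries, address lines A0..A{top}")
```

For 2048k this gives 64K entries and a highest address line of A15, which is consistent with JP12 only mattering once 64k-deep chips are fitted.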


Reply 5 of 37, by mpe

User metadata
Rank Oldbie
Nvm1 wrote on 2020-02-03, 15:42:

Thanks. Yes, the same board indeed. But it is unclear whether the author actually managed to get the 2M working...


Reply 6 of 37, by mpe

User metadata
Rank Oldbie

OK, mission accomplished. I soldered in the extra header, installed 1 Mbit TAG chips, and it is now working! It looks like the jumper needs to be permanently in place.

DSC_5295.jpg

ctcm gets confused by anything >512k. However, cachechk and speedsys confirm the cache. The speedsys graph is a little unusual:

2048.jpg
IMG_4964.jpg

The only DOS app showing some improvement over 1024k is Quake (20.1fps vs 19.8fps). The rest seems to fit comfortably in the 1M cache already (DOOM +1 realtick, pcpbench +0.1fps, ...). I think I will need better benchmarks that use a larger data set, likely in Windows.

I will spend a little more time on this board. Currently I have the "faster" cache setting and 2T write cycles. This is the same as with 256k/512k/1024k, but I still believe that with genuine <15ns SRAM chips I should be able to run this at full speed and beat even a 75 MHz board...


Reply 7 of 37, by H3nrik V!

User metadata
Rank Oldbie

That's pretty awesome! But it's no surprise that the performance gain is pretty negligible, considering that software of that period usually didn't have more than 512k of cache to play with.

Please use the "quote" option if asking questions to what I write - it will really up the chances of me noticing 😀

Reply 8 of 37, by The Serpent Rider

User metadata
Rank l33t++

Asynchronous cache does not benefit the Pentium significantly, because the difference between L2 cache and RAM performance is small. The 66 MHz bus and 64-bit memory access made it somewhat redundant.

The only DOS app showing some improvement over 1024k is Quake (20.1fps vs 19.8fps)

How much does it score without L2 cache?

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 9 of 37, by mpe

User metadata
Rank Oldbie
The Serpent Rider wrote on 2020-02-03, 22:42:

The only DOS app showing some improvement over 1024k is Quake (20.1fps vs 19.8fps)

How much does it score without L2 cache?

About 17.1
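Putting the three Quake figures from this thread side by side, a quick Python sketch of the relative gains (the helper name is made up for illustration):

```python
# Quake frame rates reported in this thread.
QUAKE_FPS = {"no L2": 17.1, "1024k L2": 19.8, "2048k L2": 20.1}


def gain_percent(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


print(f"no L2 -> 1024k: {gain_percent(QUAKE_FPS['no L2'], QUAKE_FPS['1024k L2']):+.1f}%")
print(f"1024k -> 2048k: {gain_percent(QUAKE_FPS['1024k L2'], QUAKE_FPS['2048k L2']):+.1f}%")
```

So having any L2 at all is worth roughly 16% here, while the second megabyte adds about 1.5%.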

Asynchronous cache does not benefit the Pentium significantly, because the difference between L2 cache and RAM performance is small. The 66 MHz bus and 64-bit memory access made it somewhat redundant.

On the other hand, the extra pipeline stages and the same size (or half, if considering late 486 designs) of L1 instruction cache compared to the 486 made the cost of an L1 cache miss comparatively higher. When your pipeline stalls, it does matter whether you can resolve it at 3-2-2-2 (async cache hit, 430LX), at 7-4-4-4 (best-case DRAM page hit, 430LX) or at 14-4-4-4 (worst-case DRAM page miss, 430LX). The bandwidth isn't really that interesting.
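To put those burst timings in perspective, here is a rough Python sketch that adds up the clocks for one 32-byte line fill (four 64-bit transfers). The 66 MHz bus clock is an assumption for illustration, and the names are hypothetical.

```python
BUS_MHZ = 66.0  # assumed bus clock, for illustration only

# Burst timings quoted above for the 430LX (clocks per 64-bit transfer).
TIMINGS = {
    "async L2 hit": (3, 2, 2, 2),
    "DRAM page hit (best case)": (7, 4, 4, 4),
    "DRAM page miss (worst case)": (14, 4, 4, 4),
}


def line_fill_clocks(burst: tuple) -> int:
    """Total bus clocks to transfer a full cache line."""
    return sum(burst)


def clocks_to_ns(clocks: int, bus_mhz: float = BUS_MHZ) -> float:
    """Convert bus clocks to nanoseconds at the given bus frequency."""
    return clocks * 1000.0 / bus_mhz


for name, burst in TIMINGS.items():
    total = line_fill_clocks(burst)
    print(f"{name}: first data after {burst[0]} clocks, "
          f"full line after {total} clocks (~{clocks_to_ns(total):.0f} ns)")
```

The totals are 9, 19 and 26 clocks, but for a stalled pipeline the first number in each burst (3 vs 7 vs 14) is what matters most.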

In the Triton era, Intel experimented with cache-less EDO RAM designs, but that didn't work out.


Reply 10 of 37, by The Serpent Rider

User metadata
Rank l33t++

In the Triton era, Intel experimented with cache-less EDO RAM designs, but that didn't work out.

Well, good ol' Triton can perform slower with async L2 cache enabled in some tests.


Reply 11 of 37, by mpe

User metadata
Rank Oldbie
The Serpent Rider wrote on 2020-02-04, 00:24:

In the Triton era, Intel experimented with cache-less EDO RAM designs, but that didn't work out.

Well, good ol' Triton can perform slower with async L2 cache enabled in some tests.

Interesting. Something wasn't quite right. Perhaps some form of forced invalidation is kicking in on this board? It is also weird how much the RAM bandwidth drops in async mode. Sadly, I don't have any FX board with async cache to reproduce this, but I don't see the same on my boards with LX and NX chipsets, or with SiS and Ali.


Reply 12 of 37, by dionb

User metadata
l33t++

Agreed that asynch L2 was underwhelming on So5/7, but negative impact? I don't have an i430FX system with asynch L2 atm, but I benchmarked one in the past, a Biostar 8500TEC. L2 disabled gave FLOATmem performance of 56.66, L2 enabled 59.44. Small difference, but definitely slower without cache.

Interestingly, the difference was much bigger on other chipsets with asynch, but invariably it was the cache-off score that was slower; the i430FX was fastest overall in both tests. That i430FX RAM controller was a game-changer in terms of performance.

Reply 13 of 37, by Anonymous Coward

User metadata
Rank l33t

It doesn't matter if 2MB is actually better. What matters is that it's working, so you have bragging rights.


Reply 14 of 37, by H3nrik V!

User metadata
Rank Oldbie
Anonymous Coward wrote on 2020-02-04, 08:40:

It doesn't matter if 2MB is actually better. What matters is that it's working, so you have bragging rights.

I totally second that!


Reply 15 of 37, by mpe

User metadata
Rank Oldbie

Thanks guys.

Next step would be to find (or mod) a Socket 4 board to use the burst L2 cache. That would be a killer combo!

A little-known fact is that burst cache wasn't actually an invention of the Intel Triton chipset. The 430LX/NX as well as the SiS 801 support burst SRAMs too. Although those were non-pipelined SRAMs, they should have the same timing improvements over async cache (except for slightly shorter bursts). I bet that would raise the P66 to above P75 performance levels (considering the former is on the 66 MHz bus), although I am not aware of any such board.


Reply 16 of 37, by The Serpent Rider

User metadata
Rank l33t++

A little-known fact is that burst cache wasn't actually an invention of the Intel Triton chipset. The 430LX/NX as well as the SiS 801 support burst SRAMs too.

It probably doesn't matter without a COAST module slot.


Reply 17 of 37, by mpe

User metadata
Rank Oldbie

COAST is IMHO a Triton thing, although I am still looking for the document called Flexible Cache Solution For the 430FX/HX/VX PCIsets (Rev 3.0). Please, someone!

I am pretty sure that some early Pentium OEM systems (based on the Mercury and Neptune chipsets, well before Triton) came with synchronous (non-pipelined) burst cache. There were chips produced by IDT and others, and they are mentioned in datasheets, including wiring and bus cycle diagrams. They also come up in early Pentium reviews. So it was really a thing, although definitely not a mainstream one.

In fact, even Triton supports both pipelined and non-pipelined burst configs. It was only after the 430HX that they were dropped.


Reply 18 of 37, by BastlerMike

User metadata
Rank Member

Good luck finding a Socket 4 / burst SRAM combo. I once saw one on eBay; it was a FIC PM-1000. It seems to be a very early Pentium board. Look at the date codes: they are all from mid '93.

Attachments

  • FIC PM-1000.JPG

Reply 19 of 37, by mpe

User metadata
Rank Oldbie

Yes. That's what I meant. Thanks for sharing that!

These chips have a pinout very similar to async cache. I was wondering if it would be possible to design a daughterboard with surface-mounted burst SRAMs. The board would sit in the DIP sockets, and the extra clock and address strobe signals would be wired from the chipset. Together with a BIOS modification, that would be the ultimate tuning.
