VOGONS


First post, by RayeR

Rank Oldbie

Hi,
I'd like to collect more info about this from various users of various 486 systems/chipsets.
I'm mostly interested in FSB speeds above the standard 33MHz (40/50MHz). What is the best (100% stable) L2 cache timing you can reach, and with what cache chip access time (ns) and chip configuration? What is the timing limit of a specific chipset that cannot be pushed further even with faster cache chips? In other words, up to what point do faster cache chips improve timings, and where do they stop having any further effect? Maybe the answer is determined not just by the chipset and cache chips but also by the MB layout, so I guess there may be slightly different results for different MBs with the same chipset...

As I posted before here: Re: Disappointing experience with Octek Hippo 10 motherboard (Socket 3), I'm mostly interested in the UMC UM8498F chipset, which unfortunately has no datasheet available. On this MB I can select 0/1 write WS and 2-1-1-1/3-1-1-1/3-2-2-2 burst read timings for L2. As long as I run at a safe 33MHz I can reach the fastest timings, 0WS and 2-1-1-1. But I want to run an overclocked CPU at 40MHz FSB. With 256kB L2 I have to relax to 1WS, 2-1-1-1. Later I ordered 15ns 128kB chips from Ali and changed the L2 config to 4*128k; it was very unstable, so I had to relax further to 1WS, 3-2-2-2. I replaced the tag chip with a 12ns one, but it didn't help. I suspected I had got some garbage from Ali, so I made my own 128k cache adapters that use 12ns SOJ32 chips from IDT/Renesas bought from a reliable source (Mouser). The PCB is 4-layer with VCC/GND planes inside and a 100n cap on the bottom side, in the center of the SOJ chip. I expected it to run better, but any timing faster than 1WS, 3-2-2-2 leads to an immediate crash of DOS/4GW, or of Doom very soon after... 🙁

Currently I can't test my cache chips on a better MB. I guess I've hit a chipset limitation rather than a cache chip limitation. I have another VLB MB with the OPTi 82C895 chipset, which is well documented. Its datasheet mentions that the chipset is optimized to use L2 dual-bank interleave to reach high bandwidth with slower (cheaper) cache chips. But it seems the dual-bank config is possible only with 32k chips (256kB total), not with 128k chips (512k total). There's no option to use e.g. 8*64k chips in dual bank (they can't fit, as only one bank has DIL32 sockets). So in the 512k config only one bank is populated, it cannot use interleaving, and it seems it also cannot take advantage of faster cache chips. Maybe the same applies to the UM8498F...?

Gigabyte GA-P67-DS3-B3, Core i7-2600K @4,5GHz, 8GB DDR3, 128GB SSD, GTX970(GF7900GT), SB Audigy + YMF724F + DreamBlaster combo + LPC2ISA

Reply 1 of 13, by pshipkov

Rank l33t

To reach the tightest timings you will need a big pool of L2 cache chips to pick from. In my experience binning is a time-consuming process that requires intuition and close attention, otherwise the outcome will be disappointing.
9x1024 for a 1MB L2 cache buffer is very challenging.
9x256 and 5x512 are a stroll in the park compared to 9x1024.

286, 386 and 486 UMC chipsets are more problematic than SiS silicon when it comes to tight DRAM and cache timings. This will be an obstacle. If you can, consider some known-good SiS boards, or the one or two UMC-based ones that we know can do this.

Chip ratings can easily be misleading. Whether a chip is rated 10ns, 12ns, 15ns or 20ns is hardly a factor. Thorough binning FTW.
The one exception is 8ns SOJ chips. You can immediately tell the difference with them.

retro bits and bytes

Reply 2 of 13, by jakethompson1

Rank Oldbie

I have also been into this for a good while now so let me share some thoughts.

First, you didn't mention them, but let's leave PCI motherboards aside, because there is no 4/5 divider to run PCI at 33 MHz when operating at 40 MHz, and the 2/3 divider on the UM8881 produces an unclean 33 MHz signal for PCI (discussed in another thread).

The double banking is important, especially when running above 33 MHz, because the "idle" bank of cache gets to see the next memory address at least one cycle ahead of time, while the CPU is busy reading from the opposite bank. This relies on 486 burst mode, so that the chipset can predict the next address and feed it into the idle bank. Since writes don't use burst mode, double banking doesn't help there, so it makes sense that you have to add a cache SRAM write W/S as bus speed increases. I believe the write W/S is in addition to the Y in X-Y-Y-Y, so "1 W/S" is actually 2 clocks at 2-1-1-1 and 3 clocks at 3-2-2-2 (the SiS 471 has a good datasheet covering this; as you know, there is nothing for UMC).
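To make the "predict the next address" point concrete, here is a minimal Python sketch of the 486 line-fill burst order. The bank assignment is an assumption for illustration only: it supposes the two banks are interleaved on address bit A2, which nobody in this thread has confirmed for UMC or SiS, and `burst_order`/`bank_of` are made-up helper names. Under that assumption, every transfer in a burst lands in the bank opposite the previous one, which is what gives the idle bank its head start.

```python
# Sketch: i486 line-fill burst order and a HYPOTHETICAL two-bank split.
# Assumption (not confirmed in the thread): banks are interleaved on A2,
# i.e. even dwords live in bank 0 and odd dwords in bank 1.

def burst_order(start_addr: int) -> list[int]:
    """Dword addresses of a 486 16-byte line fill, critical dword first.

    The other transfers use the dword offset XOR 0x4, 0x8, 0xC in the line.
    """
    line_base = start_addr & ~0xF   # 16-byte aligned line address
    offset = start_addr & 0xC       # dword offset within the line
    return [line_base | (offset ^ step) for step in (0x0, 0x4, 0x8, 0xC)]


def bank_of(addr: int) -> int:
    """Hypothetical bank select: address bit A2 picks bank 0 or 1."""
    return (addr >> 2) & 1


if __name__ == "__main__":
    for start in (0x1000, 0x1004, 0x1008, 0x100C):
        seq = burst_order(start)
        print(" ".join(f"{a:#06x}" for a in seq),
              "-> banks", [bank_of(a) for a in seq])
```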

Having 32-pin sockets for one bank and 28-pin sockets for the other is a board layout limitation, not a chipset one. I'm not sure why they did it, as it doesn't save that much space. You can read the MB-8433-UUD thread and feipoa's manual for it for an example of modifying a 28-pin bank to take 32-pin chips. On the other hand, there are other boards, such as the M912 (real cache) and the "Techmedia" board (with onboard SVGA and IDE), that use the UM498 chipset, have all 32-pin sockets and can therefore run with 1MB cache.

I have found the UMC (481, 491 and 498) chipsets the most picky about cache chips, including speed and brand. One UM481 (386, 40 MHz) board I have came with a single bank of Toshiba (20ns) cache chips. I ended up putting UMC chips in one bank and moving the Toshiba chips to the other bank, and it can run on fast timings. If I try replacing the Toshiba chips with 15ns UMC ones, or simply swap the chips between the two banks, it no longer works on the fastest timings.

On another UM491 (486DX2 80 MHz) board, I have to run at 3-2-2-2 1 WS, like you. It came with a mix of 20ns and 15ns UMC cache chips. I tried replacing them with fresh (12 or 15ns, don't remember) IDT chips from DigiKey, and it wouldn't work even at 3-2-2-2. So more pickiness about brand. I had a working theory that some VLB cards might inhibit the fastest cache timings (if they loaded down the bus or something), but after trying out SiS chipsets, abandoned that, and I think it's a UMC chipset issue after all.

On two other 386-40 boards (ALI M1429 and SiS 460), I haven't had issues with fast timings. I do find something nearly universal to 386-40 boards is having to relax DRAM to 1 W/S with cache enabled, but being able to get away with 0 W/S with cache disabled, as if there's some extra slack time when accessing DRAM that disappears when turning on cache. It could be related to all three 386-40 chipsets I've used actually being combo 386/486 chipsets.

I have had the fewest problems with SiS 461 & SiS 471 486 boards, especially when upgrading the 471 board to 1MB cache. Something to look out for on the 471, though, is that its Award BIOS may default to 8+0 (Always Dirty) mode rather than 7+1 (one tag bit used as the dirty bit) mode; for better performance you need MODBIN to flip that bit, as it can't be done from setup. The SiS 461, the peer of the UMC 491 on late 5V-only 486 boards, doesn't support 7+1 but can switch the external cache from write-back to write-through (something the UMC 491 lacks), another way to dodge the "always dirty" issue. The UMC 498 BIOS, on the other hand, does have a 7+1 option and even defaults to it, but the SiS 471 is better once properly tuned.
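The 7+1 vs. 8+0 choice also affects how much DRAM the L2 can cover. A rough Python sketch, assuming a direct-mapped cache whose tag SRAM holds only the address bits above the cache index (real chipsets may cap the range further through their own address pins; `cacheable_bytes` is an illustrative name): each tag address bit doubles the cacheable area, so giving one bit to the dirty flag halves it.

```python
# Sketch: cacheable DRAM range of a direct-mapped L2 for 8+0 vs 7+1 tags.
# Assumption: the tag SRAM stores only the address bits above the cache
# index, so every tag address bit doubles the range the cache can cover.

KB = 1024
MB = 1024 * KB


def cacheable_bytes(cache_bytes: int, tag_address_bits: int) -> int:
    """Largest DRAM range a direct-mapped cache of this size can cover."""
    return cache_bytes << tag_address_bits


if __name__ == "__main__":
    for size_kb in (256, 512, 1024):
        for mode, bits in (("8+0", 8), ("7+1", 7)):
            area = cacheable_bytes(size_kb * KB, bits)
            print(f"{size_kb:4d} KB L2, {mode} tag: {area // MB} MB cacheable")
```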

If you have been tinkering with DRAM refresh settings (CAS-only vs. CAS before RAS, and refresh divider), put them back on the most conservative settings, as disabling the cache seems to mask problems when you've been too aggressive with those, possibly because the more frequent DRAM access with cache off "accidentally" refreshes DRAM if the right pattern of accesses gets made.

I also thought about board layout. Some VLB boards put the cache, CPU, and chipset all next to each other, while others have the cache and CPU on opposite ends of the board, and the traces have to slip underneath the VLB slots. I do not know if that makes any practical difference.

If there is a particular chipset I have mentioned that you are more interested in, I can pull out a board and re-test and tell you exactly what settings I can get away with and with what cache chips (to the extent of doom timedemo, loading up Linux and going into X11 and running a few programs to fill up memory, etc.)

Reply 3 of 13, by RayeR

Rank Oldbie

Hm, what's happening there seems a bit like magic. Can somebody explain it in more detail, or has someone scoped the cache signals to reveal why some chips work together better than others? I guess that since I bought my 12ns chips as a single piece of cut reel, they could be from a single silicon manufacturing batch with closer parameters, but maybe they were mixed due to binning... Do you really think the whole 10-20ns range is the same silicon, just binned to different speed grades?

I have a lot of 32k 15ns chips but only 8 of the 128k chips, so I can't do much combining with these. The Ali purchase was a lot of 5 pcs, but one was defective, so I was left with 4 functional ones. I'm not sure it makes sense to mix those 15ns Winbonds with the new 12ns IDTs...

>pshipkov
It's a lot of combinations to test: many chips with each other, power cycling, booting, testing, removing chips from sockets... Do you use a specific tool for testing L2 cache stability? Maybe it would be nice to design an FPGA-based SRAM test machine that would automatically probe various timings and create logs, so one could then match the best chips according to the measured parameters...

I should note that I'm currently interested in VLB MBs only, and I run this one with just a single VLB VGA card (Madao's S3 765VL). I know that more VLB cards add more bus load and cause other stability problems. This MB has onboard IDE, so I don't need another VLB card...

I don't know the UM8498F's maximum L2 cache size, but I guess it would be similar to the OPTi 82C895, which can do only up to 512k single-bank, so there's probably no option to upgrade it to dual-bank 1MB.

I have an option for tag bits 7+1 / 8+0 in SETUP, and the default is 7+1, which gives better performance.

I tried playing with the Refresh method option, but it has no effect on stability. For this testing I use a single 16MB (8-chip) 60ns FPM SIMM. I noticed that with 2 SIMMs (or a single 32MB SIMM with 16 DRAM chips) stability gets worse, and I had to change DRAM Page Mode from "Fast" to "Normal" or increase DRAM WS from 0 to 1, so there's probably nothing to improve here...

Your experience with the UM491 (486DX2 80 MHz) looks similar, so there's probably no sense in hunting for faster cache chips. Maybe the chipset engineers took into account a specific behavior of slower chips and "compensated" for it in the chipset design, so populating faster chips could break this "compensation" and make things even worse?
I'm thinking about reverting to 256k L2, as the speed difference is near zero. The only advantage would be a larger cacheable RAM area, but since I get a slowdown with more SIMMs...

Reply 4 of 13, by pshipkov

Rank l33t
RayeR wrote on 2025-06-16, 19:23:

Hm, what's happening there seems a bit like magic. Can somebody explain it in more detail, or has someone scoped the cache signals to reveal why some chips work together better than others? I guess that since I bought my 12ns chips as a single piece of cut reel, they could be from a single silicon manufacturing batch with closer parameters, but maybe they were mixed due to binning... Do you really think the whole 10-20ns range is the same silicon, just binned to different speed grades?

It does not work that way, unfortunately.
So far I haven't been able to find a pattern to follow. At all.
The only way is to try enough combinations to find a working set.
I have probed with a scope on multiple occasions. There is nothing obvious that differentiates a stable/working set from an unstable/non-working one.
There must be a difference, but it is subtle.
To give an example: in some situations the L2 cache is technically functional, except that some software produces incorrect results.
A common case that many VOGONS users will recognize is Doom, Quake, etc. timedemos going off the rails.
So no hangs at POST, no crashes, but data corruption that leads to incorrect computation. How you relate this to the signal level and other electrical properties, and how you recognize the patterns, is something I would like to grasp as well. 😀

RayeR wrote on 2025-06-16, 19:23:

I have a lot of 32k 15ns chips but only 8 of the 128k chips, so I can't do much combining with these. The Ali purchase was a lot of 5 pcs, but one was defective, so I was left with 4 functional ones. I'm not sure it makes sense to mix those 15ns Winbonds with the new 12ns IDTs...

Let's quantify "a lot". I think you will need 100-ish chips to get to a really solid "0 WS" set if you are unlucky.
If lucky, maybe 40-50 will get you there.
Having only 8 128k chips = zero chance.

RayeR wrote on 2025-06-16, 19:23:

>pshipkov
It's a lot of combinations to test: many chips with each other, power cycling, booting, testing, removing chips from sockets... Do you use a specific tool for testing L2 cache stability? Maybe it would be nice to design an FPGA-based SRAM test machine that would automatically probe various timings and create logs, so one could then match the best chips according to the measured parameters...

Don't power down. Swap the L2 chips in place while the power is on; just hard reset.
You can do the same for everything: DRAM, expansion cards, CPUs. It saves you precious seconds that quickly accumulate into minutes and hours.
I have described in other threads how I go about testing. I'll repeat it briefly here.
Start with a random set. Then start rotating chips within the set until all combinations are covered.
If this does not work, start swapping in chips from the pool on the side.
You have to do it methodically. Each new chip must be tried in every slot until it passes; if it never passes, discard it.
If this does not get you anywhere, shift the installed chips and repeat the process.
It is not fun. Use a bit of combinatorics here.
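The rotate-then-swap procedure can be written down as a small Python sketch. `find_working_set` and its `passes_test` callback are hypothetical: the callback stands in for "hard reset, boot, run the tests", and the rotation step only tries cyclic shifts, a simplification of the full "all combinations" pass. `math.perm` and `math.comb` show why that simplification is tempting: a single 9-chip set already has 9! orderings.

```python
# Sketch of the rotate-then-swap chip search. HYPOTHETICAL: passes_test()
# stands in for "hard reset, boot, run the test suite"; no real tool is implied.
import math
from typing import Callable, Optional, Sequence


def arrangements(pool_size: int, sockets: int) -> int:
    """Ordered ways to place `sockets` chips chosen from a pool."""
    return math.perm(pool_size, sockets)


def find_working_set(
    installed: list[str],
    pool: Sequence[str],
    passes_test: Callable[[list[str]], bool],
) -> Optional[list[str]]:
    """Return the first chip arrangement that passes, or None."""
    n = len(installed)
    # Step 1: rotate the installed chips through the sockets. Only cyclic
    # shifts are tried here; all n! orderings is impractical by hand.
    for shift in range(n):
        trial = installed[shift:] + installed[:shift]
        if passes_test(trial):
            return trial
    # Step 2: try each spare chip in every socket; a chip that never gives
    # a passing set is effectively discarded.
    for chip in pool:
        if chip in installed:
            continue
        for socket in range(n):
            trial = list(installed)
            trial[socket] = chip
            if passes_test(trial):
                return trial
    # Manual next step: shift the installed chips and repeat.
    return None


if __name__ == "__main__":
    print(arrangements(9, 9))   # orderings of one 9-chip set in 9 sockets
    print(math.comb(40, 9))     # distinct 9-chip sets from a pool of 40
    stub = lambda chips: "bad" not in chips  # stand-in for a real test run
    print(find_working_set(["a", "b", "bad"], ["c", "d"], stub))
```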

First I try to get past POST and boot to DOS.
Then a set of lightweight interactive DOS graphics and other tests: Wolf3D, Doom, local storage testing.
Then on to Windows.
Then on to serious computation tasks.
If it fails at any step, the above process more or less repeats.
Ugh.

Over time you build some intuition for how to approach the whole thing, and that often leads to quick results.
After several years I can wrangle a prickly board+components within a day or two. At the end of the cycle I can say with very high confidence whether the rig is a go or not.

Along the way you will distill a set of trusted components that will give you stable starting point when testing stuff.

RayeR wrote on 2025-06-16, 19:23:

I should note that I'm currently interested in VLB MBs only, and I run this one with just a single VLB VGA card (Madao's S3 765VL). I know that more VLB cards add more bus load and cause other stability problems. This MB has onboard IDE, so I don't need another VLB card...

I hear you about VLB motherboards. They are fun.
Madao's S3s have quite a few virtues of their own but are (very) problematic for overclocking and tight wait states.
If you can, grab a Diamond S3 Trio64 DRAM-T. These take the most punishment before giving up.

RayeR wrote on 2025-06-16, 19:23:

I tried playing with the Refresh method option, but it has no effect on stability. For this testing I use a single 16MB (8-chip) 60ns FPM SIMM. I noticed that with 2 SIMMs (or a single 32MB SIMM with 16 DRAM chips) stability gets worse, and I had to change DRAM Page Mode from "Fast" to "Normal" or increase DRAM WS from 0 to 1, so there's probably nothing to improve here...

More DRAM chips = worse stability. Just stating a known fact.
Sometimes it is possible to trade relaxed DRAM timings for tighter L2 cache ones.
It works, but either way it is a compromise.

RayeR wrote on 2025-06-16, 19:23:

Your experience with the UM491 (486DX2 80 MHz) looks similar, so there's probably no sense in hunting for faster cache chips. Maybe the chipset engineers took into account a specific behavior of slower chips and "compensated" for it in the chipset design, so populating faster chips could break this "compensation" and make things even worse?
I'm thinking about reverting to 256k L2, as the speed difference is near zero. The only advantage would be a larger cacheable RAM area, but since I get a slowdown with more SIMMs...

256KB L2 cache at the tightest settings > a bigger buffer with relaxed timings.
If you can do 256KB, go for it.
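A quick arithmetic sketch in Python of what the burst timing costs per cache hit. It only counts bus clocks for one 16-byte 486 line fill and deliberately ignores hit-rate differences between 256KB and 512KB, write wait states and DRAM, so it shows one side of the trade-off, not a benchmark. `line_fill` is an illustrative name.

```python
# Sketch: bus clocks, time and peak rate for one 16-byte 486 line fill.
# Counts only the burst read itself; ignores misses, writes and DRAM.

def line_fill(timing: str, fsb_mhz: float) -> tuple[int, float, float]:
    """Return (clocks, nanoseconds, MB/s) for an 'X-Y-Y-Y' burst timing."""
    clocks = sum(int(part) for part in timing.split("-"))
    period_ns = 1000.0 / fsb_mhz
    time_ns = clocks * period_ns
    mb_per_s = 16 * 1000.0 / time_ns  # 16 bytes per fill; MB = 10**6 bytes
    return clocks, time_ns, mb_per_s


if __name__ == "__main__":
    for timing in ("2-1-1-1", "3-1-1-1", "3-2-2-2"):
        clocks, ns, rate = line_fill(timing, 40)
        print(f"{timing} @ 40 MHz: {clocks} clocks, {ns:.0f} ns, {rate:.0f} MB/s")
```

At 40 MHz that works out to 5 vs. 9 clocks per line, so a hit at 3-2-2-2 costs almost twice as much as a hit at 2-1-1-1.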

Last edited by pshipkov on 2025-06-17, 06:27. Edited 1 time in total.

Reply 5 of 13, by RayeR

Rank Oldbie

Ah, nice that you already tried to scope it. Did you have a fast enough scope? I'm not sure what to look for, maybe how the data lines get delayed after the CS# or OE# signals, but with randomly changing data it's a mess. I still think there must be measurable differences that a special test device could reveal, showing how the chips differ. Maybe PCB layout also matters, as different cache sockets may have different trace lengths that add minor delays, which combine with the chips' own delays, so the delays might compensate each other in some way if you put the right chips in the right sockets...

BTW, if you find a combination that runs 0WS on the edge but is stable at normal room temperature, did you try heating/freezing the chips to see whether it breaks or remains stable?

I have about 100 32k chips, not sure how many are 15ns and how many 20ns, so maybe a big enough pool to select from.
Swapping chips under power doesn't make much sense to me, as you still need a hard reset. If you use CF/SSD, a power cycle is not a problem; the boot and testing take more time.
Of course I did swap chips under power when necessary, but only the flash ROM when doing a hot flash before I got a programmer 😀

Currently I use Madao's VGA with the 1WS ROM image, as 0WS has some weirdness that nobody has clearly explained to me. I plan to tweak the 1WS ROM image (originally from STB) for better performance; I already found that boosting MCLK or changing to 1-cycle EDO makes it faster...
I don't have a big stock of old HW, and in our country it's now nearly impossible to buy anything for a reasonable price; eBay includes significant shipping costs, especially from the US, so I can't get another VLB VGA. I have only a Trio32 1MB (not expandable) and a Cirrus Logic, so I built the 765VL replica as it was the best I could get...

Last edited by RayeR on 2025-06-16, 23:27. Edited 1 time in total.

Reply 6 of 13, by jakethompson1

Rank Oldbie

The UM498 can work with 1MB (double-bank cache).

You can see the fastest settings I was able to use here, at 40 MHz (UMC Green CPU). It is not optimal, but is still quite fast, even compared to a tuned 2-1-1-1 0WS system operating at 33 MHz.

Before declaring victory, you need to let the system heat up a bit, and run some software that is more demanding of RAM/cache returning accurate results than small DOS utilities from disk. I booted into Linux (Slackware 3.9) and ran a few programs (netscape, gimp, xv), switching back and forth between them to make sure they didn't crash. Compiling something (gcc) would have been even better. An old FAQ talks about this: https://tldp.org/FAQ/sig11/html/index.html

I am using those dodgy 10ns ISSI 1024 cache chips from ebay, and a 32MB, 60ns SIMM here.

Still, I recommend you look out for an SiS 471 based board rather than continuing to get frustrated with the UM498. Maybe you can find one on AmiBay a little easier, since you are in Europe.

Reply 7 of 13, by jakethompson1

Rank Oldbie

The SiS 471 datasheet does have a table in it with needed SRAM speeds for various cache settings. I'm going to translate it here into typical BIOS style (cache burst read settings & cache write W/S).

At 25 MHz,
2-1-1-1 read & 0 W/S write double-bank requires 35ns or better;
2-1-1-1 read & 0 W/S write single-bank requires 25ns or better.

At 33 MHz,
2-1-1-1 read & 0 W/S write double-bank requires 20ns or better;
2-1-1-1 read & 0 W/S write single-bank requires 15ns or better.

At 40 MHz,
2-1-1-1 read & 0 W/S write double-bank requires 12ns or better;
2-1-1-1 read & 1 W/S write double-bank requires 20ns or better;
2-1-1-1 read & 0 W/S write single-bank is impossible;
2-2-2-2 read & 0 W/S write single-bank requires 12ns or better;
2-2-2-2 read & 1 W/S write single-bank requires 20ns or better.

At 50 MHz,
2-1-1-1 double-bank operation is impossible;
3-1-1-1 read & 0 W/S write double-bank operation requires 12ns or better;
3-1-1-1 single-bank operation is impossible;
3-2-2-2 read & 0 W/S write single-bank operation requires 20ns or better.

As you can see, double-bank vs. single-bank makes a big difference at 40 MHz, so you really need an 8-chip (plus tag) configuration, not a 4-chip one.
By SiS standards, 3-1-1-1 & 1 W/S double-bank operation is pretty bad. That is why we say the UMC chipsets don't do well on fast cache timings.
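The translated SiS 471 requirements above fit naturally into a lookup table. A minimal Python sketch: the encoding and `sram_fast_enough` are illustrative, the values are only those listed in the post, and a write W/S of `None` means the post gave no W/S for that row. For reference, the bus clock periods are 40, 30.3, 25 and 20 ns at 25, 33, 40 and 50 MHz, and the required SRAM speed shrinks along with them.

```python
# The SiS 471 SRAM requirements listed above, as a lookup table.
# Each row: (bus MHz, burst read, write W/S or None if the post gave none,
#            banks, slowest allowed SRAM in ns or None = impossible).
from typing import Optional

SIS471_TABLE = [
    (25, "2-1-1-1", 0, 2, 35),
    (25, "2-1-1-1", 0, 1, 25),
    (33, "2-1-1-1", 0, 2, 20),
    (33, "2-1-1-1", 0, 1, 15),
    (40, "2-1-1-1", 0, 2, 12),
    (40, "2-1-1-1", 1, 2, 20),
    (40, "2-1-1-1", 0, 1, None),
    (40, "2-2-2-2", 0, 1, 12),
    (40, "2-2-2-2", 1, 1, 20),
    (50, "2-1-1-1", None, 2, None),
    (50, "3-1-1-1", 0, 2, 12),
    (50, "3-1-1-1", None, 1, None),
    (50, "3-2-2-2", 0, 1, 20),
]


def sram_fast_enough(mhz: int, read: str, write_ws: int, banks: int,
                     sram_ns: int) -> Optional[bool]:
    """True/False per the table; None if the combination is not listed."""
    for t_mhz, t_read, t_ws, t_banks, max_ns in SIS471_TABLE:
        if (t_mhz, t_read, t_banks) == (mhz, read, banks) and t_ws in (None, write_ws):
            return max_ns is not None and sram_ns <= max_ns
    return None


if __name__ == "__main__":
    print(sram_fast_enough(40, "2-1-1-1", 0, 2, 12))  # True
    print(sram_fast_enough(40, "2-1-1-1", 0, 1, 8))   # False: impossible single-bank
```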

Reply 8 of 13, by pshipkov

Rank l33t

I am going to comment for the sake of clarity.

I know you translated some relevant parts of the chipset manual here, but if somebody reads your post without scanning the thread, the information can be very misleading.

We know that 2-1-1-1 at up to 66MHz can be achieved with complete stability. This does not invalidate the tech specifications but goes beyond them.

Reply 9 of 13, by jakethompson1

Rank Oldbie
pshipkov wrote on 2025-06-19, 03:18:

I am going to comment for the sake of clarity.

I know you translated some relevant parts of the chipset manual here, but if somebody reads your post without scanning the thread, the information can be very misleading.

We know that 2-1-1-1 at up to 66MHz can be achieved with complete stability. This does not invalidate the tech specifications but goes beyond them.

No problem. I was aiming to quote what was officially supported by a high-quality chipset, and therefore what no one could claim to be "overclocking." It is thus the baseline that a 486 chipset+board has no excuse not to support. By the way, the SiS 471 manual talks about its XFCLK, which "leads the CPU clock by 3-5 ns" to "increase the margin of cache data RAM access time and the setup time." This would help explain why it is less picky about cache chips, especially if the UM498 does not work like that.
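To put the 3-5 ns lead of XFCLK in perspective, a small Python sketch expressing it as a fraction of one bus clock. The simplifying assumption (not from the datasheet) is that the lead adds directly to the time the SRAM has to respond; `lead_as_fraction` is an illustrative name. At 40 MHz the clock period is 25 ns, so the lead is 12-20% of a clock, about the same size as the step between neighboring speed grades (12, 15, 20 ns).

```python
# Sketch: XFCLK's 3-5 ns lead expressed as a fraction of one bus clock.
# Simplifying assumption (not from the datasheet): the lead adds 1:1 to the
# time the data SRAM has to respond.

def lead_as_fraction(fsb_mhz: float, lead_ns: float) -> float:
    """Clock lead divided by the bus clock period."""
    period_ns = 1000.0 / fsb_mhz
    return lead_ns / period_ns


if __name__ == "__main__":
    for mhz in (33, 40, 50):
        lo, hi = lead_as_fraction(mhz, 3), lead_as_fraction(mhz, 5)
        print(f"{mhz} MHz: period {1000 / mhz:.1f} ns, "
              f"3-5 ns lead = {lo:.0%} to {hi:.0%} of a clock")
```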

Reply 10 of 13, by RayeR

Rank Oldbie

When you refer to the "UM498", do you mean the UM8498F (a shortened name)? There's not much info about either one, so who can tell how similar its cache controller is to the SiS 471, but it probably uses some dual banking too. Maybe the UM8498F supports 1MB L2 (8*128k). There would even be space (socket to socket) on the MB to place DIL32 sockets for bank 1, but for some reason they didn't.
I expect that for a successful cache mod I would at least need to wire the A15 and A16 lines for bank 1 from the chipset to the sockets (if I understand correctly, in dual-bank mode there are 2 independent address and data buses for the cache banks), but I have no idea how to find them. If someone had a MB with the same chipset and 8*DIL32 sockets, it would be easy to beep out the connections between the cache sockets and the chipset pins. But it may also need some special jumper configuration to really use those dual banks as 2*512k. Maybe the tag RAM would also need to increase to 64kB? Hm, currently this looks a bit problematic for a few % gain, so it would be better to revert to dual-bank 256kB... If I'm lucky enough to get a better VLB MB I could try again with 1MB L2, but currently I'm not going to spend more money on this. I'd rather continue tuning the VGA BIOS / S3 settings of the 765VL...

Reply 11 of 13, by pshipkov

Rank l33t

Take a look at these posts for reference
Re: 3 (+3 more) retro battle stations
Re: 3 (+3 more) retro battle stations
Re: 3 (+3 more) retro battle stations
Including some of the following replies.

I cannot find a link to Feipoa’s original pdf describing the mod, but the links above clarify it well.
Edit: found it https://vogonsdrivers.com/getfile.php?fileid=946&menustate=0

Maybe this will help you.

Otherwise, the chipset supports a 1MB L2 cache buffer, as can be seen on the PC-Chips M912 v1.7.

Reply 12 of 13, by jakethompson1

Rank Oldbie
RayeR wrote on 2025-06-22, 13:11:

When you refer to the "UM498", do you mean the UM8498F (a shortened name)? There's not much info about either one, so who can tell how similar its cache controller is to the SiS 471, but it probably uses some dual banking too. Maybe the UM8498F supports 1MB L2 (8*128k). There would even be space (socket to socket) on the MB to place DIL32 sockets for bank 1, but for some reason they didn't.
I expect that for a successful cache mod I would at least need to wire the A15 and A16 lines for bank 1 from the chipset to the sockets (if I understand correctly, in dual-bank mode there are 2 independent address and data buses for the cache banks), but I have no idea how to find them. If someone had a MB with the same chipset and 8*DIL32 sockets, it would be easy to beep out the connections between the cache sockets and the chipset pins. But it may also need some special jumper configuration to really use those dual banks as 2*512k. Maybe the tag RAM would also need to increase to 64kB? Hm, currently this looks a bit problematic for a few % gain, so it would be better to revert to dual-bank 256kB... If I'm lucky enough to get a better VLB MB I could try again with 1MB L2, but currently I'm not going to spend more money on this. I'd rather continue tuning the VGA BIOS / S3 settings of the 765VL...

Yes, they are the same. AMI and Award call it a 498 in their POST strings, possibly because its predecessor was the 491 (82C491).
It indeed is undocumented.

As pshipkov says, if you do all that work to get 1MB but then have to de-tune the cache settings compared to what was possible with 256KB, it might all be for naught as far as performance is concerned.

I suspect that providing 32-pin sockets on some of these boards was not meant to allow expansion beyond 256KB (although 512KB single-bank is possible), but, as interest in 40 and 50 MHz waned, to support 256KB single-bank as a cost-saving measure. Some PCI boards only support a single bank like that anyway.

Reply 13 of 13, by RayeR

Rank Oldbie
pshipkov wrote on 2025-06-22, 14:59:

Edit: found it https://vogonsdrivers.com/getfile.php?fileid=946&menustate=0
Maybe this will help you.

Thanks for pointing out the manual, good work by feipoa.
So it seems that at least the higher addresses are common to both banks. Do only a few lower addresses differ, i.e. only A3 (one bank's address would point to the word next to the other bank's)?

It states:
Wire TAG A15 and CACHE A16 to Um8881 pin 20. Refer to accompanying photos below. This is easily
accomplished by soldering TAG A15 and CACHE A16 to JP6-pin4 and to CPU pin A19.

So does that mean Um8881 pin 20 is connected to CPU pin A19 (and directly to the cache)? Then I wouldn't need to search for an undocumented chipset pin but could simply go for CPU A19?
