VOGONS


Reply 80 of 108, by mkarcher

feipoa wrote on 2023-06-18, 09:03:

Very interesting discovery about cache contents being kept valid while it is disabled.

I will try EDO 3-1-1-1 at FSB60 and 0ws/0ws with L2 removed. Since the margin for Cyrix != margin for AMD, I think this test is worth a shot. Or have you already tried this at 2x60 w/Am5x86?

I'm gonna call the mode that keeps the cache valid "standby" instead of disabled, to differentiate between them. With the Cyrix, I was barely able to get 3-1-1-1 working in the winter with L2 removed at 3*40 MHz. In the summer, I was able to get 3-1-1-1 with cache removed using an AMD DX4 120 at 4*30, but not at 2*60.

feipoa wrote on 2023-06-18, 09:03:
mkarcher wrote on 2023-06-17, 11:10:

I have a good old HiLo ALL-03A EPROM/GAL/Flash programmer. It might be a good idea to add a SOJ-to-DIP adapter to it, to be able to test the SRAM chips.

Do you know where to buy one? I have two different programmers with 20 or so adaptors, but neither has a SOJ 300mil to DIP adaptor.

No idea, I'm sorry. I didn't yet search for one.

feipoa wrote on 2023-06-18, 09:03:
mkarcher wrote on 2023-06-17, 17:20:

But there is a distinct "cache disabled" mode that does not touch the data SRAM, which is different from the "cache in standby" mode I presumed to be the "most disabled" mode that is available.

When I set L2 to disabled in the BIOS (and still have L2 installed), is the L2 cache in "cache disabled mode"?

What mode is the L2 in when I set L2 to disabled in the BIOS but do not have L2 installed?

"cache disabled" in both cases. I'm unsure whether "cache disabled" provides any performance benefits over "cache standby", but it should provide power savings, because the data SRAM chips get their chip select input negated all the time.

BTW: It seems I found the issue with my adapter PCB: Two of the chip select inputs are not properly soldered. I guess unmelted paste made sufficient contact when I measured it directly after assembly, but as the paste dried, the continuity broke down. For short pulses of asserting chip select (aka chip enable), capacitive coupling over the gap suffices, but if you try to keep the chip selected for some time, it starts to get deselected after some time. I'm going to include a test for that behaviour in my "UMC 8881 cache diagnostic utility" and want to verify that I can diagnose that issue before fixing the hardware. If I were running large-scale production of these adapter boards, I would likely keep the broken one as a test subject, but I intend to just fix and use that one instead.

Different tests I made seem to indicate that I will likely not get better than 3-2-2-2 at FSB60 with the cache chips currently soldered to that board, but possibly that gets better once the chip select pins are correctly soldered. So the "10ns" marking might be fake.

Reply 81 of 108, by mkarcher

mkarcher wrote on 2023-06-18, 14:46:

BTW: It seems I found the issue with my adapter PCB: Two of the chip select inputs are not properly soldered.

Indeed, this was the issue. Getting solder to a SOJ pad below a chip already soldered is hard. I tried jamming paste under the pin using my syringe. I tried classic soldering with an iron and a lot of flux. Either I broke the trace, or the pad just didn't want to wet. I ended up adding a bodge wire to connect one of the chip select inputs I couldn't get soldered. Still without A19, i.e. 512KB cache, I get stable operation at FSB50, 2-1-1-1 and FSB60 3-2-2-2. FSB66 (133MHz on the DX4-120) isn't stable, even without cache and maximum RAM wait states.

mkarcher wrote on 2023-06-18, 14:46:

For short pulses of asserting chip select (aka chip enable), capacitive coupling over the gap suffices, but if you try to keep the chip selected for some time, it starts to get deselected after some time.

Don't guess and claim guesswork as facts. Measure! These chips seem to pull down /CS1 if it is open. So the issue wasn't that the L2 cache lost chip select, but that it didn't properly get deselected when it needed to be. The UMC8881 design (which is common to other 486 chipsets) uses shared "byte write enable" signals for both banks of the cache. The bank that receives a write is chosen by deselecting the bank that shouldn't receive the write. Because bytes 2 and 3 of the second bank were permanently selected, writing to SRAM address 2/3 caused a spurious write to SRAM address 6/7. When you fill the cache from start to end (e.g. by writing a test pattern), the spurious write to 6/7 gets overwritten by the intended write to 6/7 directly afterwards. That's why I missed the fault using simple test methods.
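
The masking effect described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual diagnostic utility: the class name TwoBankCache, the address split (4 byte lanes, bank taken from the next address bit) and the test pattern are hypothetical, and only the write side is modelled. It shows why an ascending fill hides a bank whose chip select is stuck asserted, while a descending fill exposes it.

```python
LANES = 4  # byte lanes per bank word (32-bit data bus)


class TwoBankCache:
    """Toy model: two banks share the byte write enables; chip select picks the bank."""

    def __init__(self, words_per_bank, stuck_selected=()):
        # stuck_selected: (bank, lane) pairs whose chip select is always asserted
        self.mem = [[[0] * LANES for _ in range(words_per_bank)] for _ in range(2)]
        self.stuck = set(stuck_selected)

    @staticmethod
    def _split(addr):
        # Assumed layout: addresses 0-3 in bank 0, 4-7 in bank 1, 8-11 in bank 0, ...
        return addr // (2 * LANES), (addr // LANES) % 2, addr % LANES

    def write(self, addr, value):
        word, bank, lane = self._split(addr)
        for b in range(2):
            # A stuck-selected chip sees the shared write enable as well.
            if b == bank or (b, lane) in self.stuck:
                self.mem[b][word][lane] = value

    def read(self, addr):
        word, bank, lane = self._split(addr)
        return self.mem[bank][word][lane]


def pattern(addr):
    # Arbitrary test pattern; differs between addr and addr + 4.
    return (addr * 37 + 11) & 0xFF


def fill_and_verify(cache, size, order):
    """Write the pattern in the given order, then return the mismatching addresses."""
    for addr in order:
        cache.write(addr, pattern(addr))
    return [a for a in range(size) if cache.read(a) != pattern(a)]


faulty = TwoBankCache(words_per_bank=2, stuck_selected={(1, 2), (1, 3)})
print(fill_and_verify(faulty, 16, range(16)))          # [] (fault masked)
print(fill_and_verify(faulty, 16, range(15, -1, -1)))  # [6, 7, 14, 15]
```

In this model, any test that writes the lower bank after the upper bank has been filled (descending order, or a second pass over bank 0 only) reveals the spurious writes.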

mkarcher wrote on 2023-06-18, 14:46:

Different tests I made seem to indicate that I will likely not get better than 3-2-2-2 at FSB60 with the cache chips currently soldered to that board, but possibly that gets better once the chip select pins are correctly soldered. So the "10ns" marking might be fake.

Indeed. As I reported in the first paragraph, I couldn't run the DOOM and Quake benches at any timing faster than 3-2-2-2 at FSB60. This is not nice. I need to check access time using my scope or logic analyzer, as this is as slow as the 15ns UMC chips. Genuine 10ns Cypress chips should perform better, if the bottleneck is actually the response time of the chips.

The speedsys bench shows that there is 512KB of L2:

Attachment: Bio512.png (512KB cache works; license: Public domain)

Populated + bodged PCB installed on the board. The white cable is a PC speaker cable as delivered with the Amazon/AliExpress POST cards, which connects the A19 input to ground.

Attachment: CachePCBInstalled.jpg (PCB installed; license: CC-BY-4.0)

Next step: Attach true A19 to get 1MB of cache.

Reply 82 of 108, by pshipkov


several years ago i strapped 1024kb L2 cache on the UUD board. was able to get it to 2111 at 40 and 50 mhz fsb.
tried very hard to achieve the same at 60/66 but without success.
only 3222 was fully stable.
used trusted chips that do it on SiS based boards, but the UUD guy didn't want to play along.
Hope you succeed.

link (and following conversation)

retro bits and bytes

Reply 83 of 108, by feipoa


For my tests at L2:3-1-1-1 and UMC 15ns SRAM, this is only with 256K. You are trying 512K. I haven't yet determined whether UMC 15ns SRAM is OK at 3-1-1-1 when the size is 512K. pshipkov, did you try your SRAM musical chairs at 512K and 60 MHz?

pshipkov wrote on 2023-06-18, 19:37:

several years ago i strapped 1024kb L2 cache on the UUD board. was able to get it to 2111 at 40 and 50 mhz fsb.
tried very hard to achieve the same at 60/66 but without success.
only 3222 was fully stable.
used trusted chips that do it on SiS based boards, but the UUD guy didn't want to play along.
Hope you succeed.

link (and following conversation)

mkarcher, keep in mind that this was with an Am5x86.

mkarcher, your SOJ 128kx8 chips look just like mine from mouser. Are there any markings on the bottom?

Plan your life wisely, you'll be dead before you know it.

Reply 84 of 108, by rasz_pl


4 layer pcb with solid ground plane, right? I was going to complain about that white cable working as an antenna, but if it's ground it's probably fine.

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 85 of 108, by Disruptor

rasz_pl wrote on 2023-06-18, 20:57:

4 layer pcb with solid ground plane, right? I was going to complain about that white cable working as an antenna, but if it's ground it's probably fine.

yes 4 layer ... 1024k working now
will test an Am5x86-133 with 3x50 soon, with 2-1-1-1

Reply 87 of 108, by mkarcher

feipoa wrote on 2023-06-18, 20:41:

For my tests at L2:3-1-1-1 and UMC 15ns SRAM, this is only with 256K. You are trying 512K. I haven't yet determined whether UMC 15ns SRAM is OK at 3-1-1-1 when the size is 512K. pshipkov, did you try your SRAM musical chairs at 512K and 60 MHz?

I know that we observed that the Cyrix 5x86 didn't like 3*40 at 2-1-1-1 with 1MB on the HOT-433, but (in the winter) it liked 3*40 at 2-1-1-1 with 256KB on the Biostar board. The size of the cache chips is not supposed to influence performance. Possibly the tag comparator speed in the chipset depends on the cache size. Furthermore, it is unknown to me whether some address bits appear later on the FSB than other address bits. I'm going to try the Cyrix at 2*50 next. 2*50 @ 2-1-1-1 should be a nice configuration for the 100GP.

feipoa wrote on 2023-06-18, 20:41:

mkarcher, keep in mind that this was with an Am5x86.

I expect the FSB interface of the WB enhanced DX4 and the Am5x86 to be identical. The advantage of the DX4 is that it supports a *2 multiplier, so I can run it at a core clock that's in spec at FSB60, whereas the lowest core clock I can get on a 5x86 at FSB60 is 180MHz, which is borderline without some overvolting. The Am5x86 at 3*50, 1M L2 at 2-1-1-1 would also be nice.

feipoa wrote on 2023-06-18, 20:41:

mkarcher, your SOJ 128kx8 chips look just like mine from mouser. Are there any markings on the bottom?

Yes, something that looks like a serial number. I took the photo of the bottom side of two of the chips of the same 20-piece part of cut tape I received. The markings are 619704370 and 619704180. I don't remember whether these two chips that were somehow "left over" are from neighbouring pockets of the tape, though. See also the attached picture (quality is low, I know, but should be sufficient to answer this question).

Attachment: SOJ_bottom.jpg (Bottom of CY7C1009D chips; license: Public domain)

I took A19 from the trace running from the CPU to the UM8881 chip by scratching the solder mask and soldering enameled wire to it (as you also do to bridge broken traces), see

Attachment: GettingA19.jpg (Getting A19; license: CC-BY-4.0)

1MB cache is recognized and still working at 2-1-1-1 at 50 MHz. I was using 1WS reads from DRAM at 50 MHz the whole time, though. The speedsys memory graph confirms 1MB; Quake low res and DOOM high from Phil's DOSBENCH collection worked without crashing.

Attachment: Bio1M.png (license: Public domain)

Reply 88 of 108, by feipoa


2-1-1-1 at 3x50 is nice w/1024K. I think pshipkov had to do some musical chairs to get that config working using DIPs.

The markings on my CY7C1009D-10VXI chips are a little different.

Attachment: CY7C1009D-10VXI.JPG (license: CC-BY-4.0)


Reply 89 of 108, by pshipkov


yes, i had to swap a bunch of chips around for 2111 at 40/50. keep in mind i have the dram wait states at 0 at these freqs.

Mkarcher, when can you try 60/66?
really curious if you will be able to hit 2111.


Reply 90 of 108, by mkarcher

pshipkov wrote on 2023-06-19, 01:44:

Mkarcher, when can you try 60/66?
really curious if you will be able to hit 2111.

mkarcher wrote on 2023-06-18, 19:18:

I get stable operation at FSB50, 2-1-1-1 and FSB60 3-2-2-2. FSB66 (133MHz on the DX4-120) isn't stable, even without cache and maximum RAM wait states.

Already did so. No 2-1-1-1 at FSB60. That was with A19 not yet connected, but I don't see how connecting another address bit would improve the situation.

Reply 91 of 108, by pshipkov


Missed that line.
Ok, so you are seeing the same thing. Too bad.
The 60/66MHz 3222 1MB config is actually slower than 256KB at 2111, which kind of defeats the point.
At 50MHz 2111 should be possible with the 1MB buffer. I can do it here. Feels like you got unlucky chips. In general all chips must be carefully curated for 2111, or, as Feipoa calls it, "music chaired" (tm).
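
The comparison of 3-2-2-2 and 2-1-1-1 can be made concrete with a little arithmetic. The following is only a rough sketch: it counts the bus clocks of a single four-transfer L2 burst and converts them to nanoseconds. It assumes nominal FSB values of exactly 40, 50, 60 and 66 MHz, and it ignores hit-rate differences between cache sizes, the CPU core clock and DRAM timing, so it is not a benchmark prediction. The function name burst_ns is made up for this illustration.

```python
def burst_ns(fsb_mhz, timing):
    """Duration in nanoseconds of one burst with the given per-transfer clock counts."""
    return sum(timing) * 1000.0 / fsb_mhz


configs = [
    ("FSB40 2-1-1-1", 40, (2, 1, 1, 1)),
    ("FSB50 2-1-1-1", 50, (2, 1, 1, 1)),
    ("FSB60 3-2-2-2", 60, (3, 2, 2, 2)),
    ("FSB66 3-2-2-2", 66, (3, 2, 2, 2)),
]

for label, fsb, timing in configs:
    print(f"{label}: {burst_ns(fsb, timing):.1f} ns per burst")
```

Under these assumptions a 2-1-1-1 burst at FSB50 takes 100 ns, while a 3-2-2-2 burst at FSB60 takes 150 ns, so the higher bus clock does not make up for the extra wait states on L2 hits.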


Reply 92 of 108, by mkarcher

pshipkov wrote on 2023-06-19, 06:59:

The 60/66MHz 3222 1MB config is actually slower than 256KB at 2111, which kind of defeats the point.
At 50MHz 2111 should be possible with the 1MB buffer. I can do it here. Feels like you got unlucky chips. In general all chips must be carefully curated for 2111, or, as Feipoa calls it, "music chaired" (tm).

I got 2-1-1-1 at FSB50 with 1MB using a random sample of chips labelled CY7C1009D-10 obtained from a possibly unreliable source. Maybe that's a hint that the chips are indeed genuinely marked.

Without curation, I also didn't get faster than 3-2-2-2 at FSB60 with 256K using UM61m256-15 chips.

Reply 93 of 108, by feipoa

pshipkov wrote on 2023-06-19, 06:59:

In general all chips must be carefully curated for 2111, or, as Feipoa calls it, "music chaired" (tm).

With SOJ sockets, we can curate among 8 ns 512K, 8 ns 256K, and 10 ns 1024K. And maybe 8 ns 128kx8 exist, I hope!


Reply 94 of 108, by feipoa


I ran the UUD board with the L2 cache physically removed, in hope of achieving EDO 3-1-1-1, and DRAM 0ws/0ws. Unfortunately, once EDO is set from 4-2-2-2 to 3-1-1-1 and I reboot, the POST process hangs after initialising Plug 'n Play cards. Maybe pshipkov with his magical 64 MB EDO stick can get 3-1-1-1? Attached is the UUD BIOS with the EDO 3-1-1-1/4-2-2-2 setting available.

Attachment: UUD2012.zip (license: Fair use/fair dealing exception)


Reply 95 of 108, by amadeus777999


Lovely thread - MkArcher your cache board looks promising!
I wonder if speeds above 50mhz fsb are feasible on the late era chipsets... whether UMC or SiS, I never got far with optimum timings. I also tried feeding the SRAMs with a slightly higher voltage but this did not provide any benefits. The northbridge on the SiS boards needed cooling beyond 50mhz, although it did not get much hotter when running above that speed.
I got the best speedup by leaving the bus at 50mhz / fastest timings and instead bumping the cpu to 200mhz - which enabled 21.3fps in Quake.

Reply 96 of 108, by feipoa


mkarcher, in case you are still planning on a multi-purpose L2 board for the UUD, I thought I'd let everyone know that these 8 ns 32kx8 modules are still available. These arrived recently:

Attachment: LP61256GS-8_new_order_July_2023.JPG (license: CC-BY-4.0)

They were $2.50 each with $8 shipping.

I couldn't find the 8 ns 64kx8 chips. If anyone is able to find authentic 8 ns 64kx8 or 8 ns 128kx8 chips, could you let me know the source via PM? Many thanks.


Reply 97 of 108, by mkarcher


Thanks for posting the picture.

Something like that is definitely worth a shot. I just booted into Linux on my 8433UUD-A using an Am5x86 at 3*50 with my cache PCB with 1MB, and observed that I needed to use 3-2-2-2 to get the system stable. As other people definitely were able to run way faster timings with selected cache chips, probably the chips I use (or the PCB design?) aren't yet optimal. Before I redesign the PCB, I want to take some scope traces and try to identify the bottlenecks. Possibly I will even be able to see whether reflections might cause the issue (i.e. series termination resistors on some lines are helpful to improve performance).

My favorite FSB/CPU/PCI stability test in Linux is to boot from a CD-ROM or run with "init=/bin/sh", which mounts the root filesystem read-only, and repeatedly run

dd if=/dev/hda bs=1024k count=<XX> | md5sum

where <XX> is replaced with a number slightly bigger than the number of megabytes of RAM installed in the system. This command reads that many megabytes from the hard disk drive and calculates a checksum of them. As Linux caches the disk reads, this will use most of the RAM, and bit errors anywhere in the RAM will show up. Also, as the number of megabytes read from the disk exceeds the memory size, every invocation of this command re-reads the data from the disk, stressing not only the FSB, but (on later mainboards) the PCI arbitration and bus mastering as well. You expect to repeatedly obtain the same number. If you don't, something's unstable in your set-up. As with every hardware test, it is not guaranteed to find all issues, but I have had very good experiences with this test for quickly confirming that system stability issues are hardware-related.
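
The "run it repeatedly and compare the numbers" step can also be automated. The following is a hedged sketch rather than part of the original procedure: it assumes a Python interpreter is available on the test system, and the function names md5_of_prefix, repeated_checksums and is_stable are made up for this illustration. It hashes the first N MiB of a device or file several times and checks that all digests agree, which mirrors what the dd/md5sum pipeline is used for.

```python
import hashlib

MIB = 1024 * 1024


def md5_of_prefix(path, megabytes):
    """Return the MD5 hex digest of the first `megabytes` MiB read from `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as source:
        for _ in range(megabytes):
            block = source.read(MIB)
            if not block:  # source is shorter than requested
                break
            digest.update(block)
    return digest.hexdigest()


def repeated_checksums(path, megabytes, runs=5):
    """Read and hash the same region `runs` times."""
    return [md5_of_prefix(path, megabytes) for _ in range(runs)]


def is_stable(digests):
    """All runs must produce the same digest; any difference points at a bit error."""
    return len(set(digests)) == 1
```

A call such as is_stable(repeated_checksums("/dev/hda", 72)) would correspond to a machine with 64 MB of RAM; the device path and the megabyte count are examples only and need to be adapted, and reading a raw device usually requires root.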

Reply 98 of 108, by mkarcher

mkarcher wrote on 2023-08-04, 15:17:

Something like that is definitely worth a shot. I just booted into Linux on my 8433UUD-A using an Am5x86 at 3*50 with my cache PCB with 1MB, and observed that I needed to use 3-2-2-2 to get the system stable. As other people definitely were able to run way faster timings with selected cache chips, probably the chips I use (or the PCB design?) aren't yet optimal.

I did some tests with my scope. To make bank interleave work, the chip enable signal alternates between the banks. The time between the chip enable signal going below ~1V on a bank, and the data line toggling is around 7 to 7.5ns. On the other hand, the tag chip has its output permanently enabled, and responds directly to the address line toggling. The time from the address line getting valid (below 0.8 / above 2.0V) and the data line being valid (again: below 0.8 / above 2.0V) is around 13.0ns to 13.5ns, with the chips being perhaps 10°C to 15°C above ambient, so 40°C maximum. This does not match the data sheet of the CY7C1009D-10: They specify "/OE low to data valid: max 5ns" and "Address valid to data valid: max 10ns", which should apply even at the maximum permissible operating temperature of 85°C. So https://www.cpu-world.com/forum/viewtopic.php?t=32721 seems to be spot on:

128k x 8 which were "rated" 10ns by the Chinese. They can do 3-2-2-2 timings at 66MHz or 2-1-1-1 at 40MHz which is what you can expect of 15ns async SRAM.
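
To put the scope readings next to the datasheet limit quoted above, here is a small arithmetic sketch. It uses only numbers from this post (13.0 to 13.5 ns measured address-to-data, 10 ns maximum per the datasheet) plus nominal FSB values. Treating a single bus clock as the window an access has to fit into is a simplifying assumption, and the helper names are hypothetical; chipset and CPU delays are not modelled.

```python
SPEC_ADDRESS_TO_DATA_NS = 10.0             # datasheet maximum as quoted in the post
MEASURED_ADDRESS_TO_DATA_NS = (13.0, 13.5)  # scope readings as quoted in the post


def clock_period_ns(fsb_mhz):
    """Length of one bus clock in nanoseconds."""
    return 1000.0 / fsb_mhz


def remaining_budget_ns(fsb_mhz, access_ns):
    """Time left in one bus clock after the SRAM access (assumed single-clock window)."""
    return clock_period_ns(fsb_mhz) - access_ns


def exceeds_spec(measured_ns, spec_max_ns):
    """True if a measured access time is slower than the datasheet maximum."""
    return measured_ns > spec_max_ns


worst = max(MEASURED_ADDRESS_TO_DATA_NS)
print("out of spec:", exceeds_spec(worst, SPEC_ADDRESS_TO_DATA_NS))
for fsb in (40, 50, 60):
    print(f"FSB{fsb}: {remaining_budget_ns(fsb, worst):.2f} ns left with measured chips, "
          f"{remaining_budget_ns(fsb, SPEC_ADDRESS_TO_DATA_NS):.2f} ns with in-spec chips")
```

Under this simplification, the measured chips leave only about 3 ns of a 60 MHz bus clock for everything else, compared with more than 6 ns for a part that meets the 10 ns figure.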

Reply 99 of 108, by rasz_pl


why CY7C1009D and not something like https://eu.mouser.com/ProductDetail/Alliance- … 8C401801-QC166N? 3.3V might be a problem, but I've seen people use 3.3V SRAM in Amigas with success (more than a year with no failures)
