VOGONS


Reply 40 of 108, by mkarcher

Rank l33t
feipoa wrote on 2023-04-04, 20:45:
mkarcher wrote on 2023-04-04, 11:55:

Now, that's funny. I had no issues with 2-1-1-1 at 40MHz on the Biostar board and RAM WS 0/0 with the 5x86. I did need 1/0 on RAM WS to get stable operation on the HOT-433 with an earlier 8881 revision, though.

This is correct: 2-1-1-1, 0ws/0ws, 40 MHz, Cx5x86, and 256K is not an issue, even with the fake Chinese 32K x 8 chips. The issues start after 256K.

Wow, thanks! That's an interesting data point I hadn't considered until now: The HOT-433 has 1MB L2 cache and needs 2-1-1-1, 1WS/0WS, whereas the Biostar only has 256KB (dual bank) and works at 2-1-1-1, 0WS/0WS. Maybe it's not the chipset revision or the board layout, but the cache size that causes this difference. Do you still remember whether using 128K x 8 chips configured as 32K x 8 still exhibits the issue?

Looking at the HOT-433, I find four 16-bit buffer chips (UM8002) and two slow unidirectional 8-bit buffer chips (74LS244). I might check what they are used for. On the Biostar board, I find two slow bidirectional buffer chips (74LS245), which seem to be used for IDE, but nothing fast enough to sit on the 486 FSB.

feipoa wrote on 2023-04-04, 20:45:

Are there any socketable 10 ns SRAM chips you can use on a revised PCB version? I know there is one other user here who was hoping to see this PCB socketed. Once you get to the extreme fringe of stability with overclocking (be it CPU freq or wait states), dozens of [brand name] SRAM modules needed to be swapped around to achieve stability. User pshipkov knows more on this front.

It looks like 0.3" SOJ32 sockets are widely available. Sockets also have the advantage of being easier to solder by hand. You could design an adapter PCB allowing installation of those sockets, but it would require a complete re-layout to make them fit. As I already wrote, the distance between cache chips on the current PCB is 0.4", but the width of a typical SOJ32 socket is 0.55", so they won't fit. On the other hand, there is a conflicting requirement (nothing wrong with that, it's not bad, just conflicting) to get a variant of this board for "more modern pinouts", which likely refers to TSOP. You won't get sockets for that.

Reply 41 of 108, by feipoa

Rank l33t++
mkarcher wrote on 2023-04-05, 07:39:

Wow, thanks! That's an interesting data point I hadn't considered until now: The HOT-433 has 1MB L2 cache and needs 2-1-1-1, 1WS/0WS, whereas the Biostar only has 256KB (dual bank) and works at 2-1-1-1, 0WS/0WS. Maybe it's not the chipset revision or the board layout, but the cache size that causes this difference. Do you still remember whether using 128K x 8 chips configured as 32K x 8 still exhibits the issue?

Looking at the HOT-433, I find four 16-bit buffer chips (UM8002) and two slow unidirectional 8-bit buffer chips (74LS244). I might check what they are used for. On the Biostar board, I find two slow bidirectional buffer chips (74LS245), which seem to be used for IDE, but nothing fast enough to sit on the 486 FSB.

I will send you my Excel sheet from 2012.

What I recall is that each component plays an important part when working on the edge of stability. The components are FSB, L2 size, L2 format (double/single banked), total RAM, how many RAM modules, EDO/FPM, on-motherboard memory buffers, BIOS wait states for L2, BIOS wait states for RAM, and CPU type.

mkarcher wrote on 2023-04-05, 07:39:

It looks like 0.3" SOJ32 sockets are widely available. Sockets also have the advantage of being easier to solder by hand. You could design an adapter PCB allowing installation of those sockets, but it would require a complete re-layout to make them fit. As I already wrote, the distance between cache chips on the current PCB is 0.4", but the width of a typical SOJ32 socket is 0.55", so they won't fit. On the other hand, there is a conflicting requirement (nothing wrong with that, it's not bad, just conflicting) to get a variant of this board for "more modern pinouts", which likely refers to TSOP. You won't get sockets for that.

I am aware that it would require a PCB redesign to make SOJ sockets fit. I was indirectly asking if you would be interested in creating a new PCB to accommodate sockets.

Plan your life wisely, you'll be dead before you know it.

Reply 42 of 108, by mkarcher

Rank l33t
feipoa wrote on 2023-04-05, 09:55:

I will send you my Excel sheet from 2012.

PM received. Thanks. Possibly some data points are useful.

feipoa wrote on 2023-04-05, 09:55:

What I recall is that each component plays an important part when working on the edge of stability. The components are FSB, L2 size, L2 format (double/single banked), total RAM, how many RAM modules, EDO/FPM, on-motherboard memory buffers, and CPU type.

That's not surprising. It's surely operating everything outside of the timing parameters guaranteed in the data sheets. At those limits, there are also external influences like room temperature that can shift timings by one or two nanoseconds and make the difference between "working" and "not working". I noticed that CPU temperature on my Cyrix 5x86 is a big deal. It crashed a lot at 120MHz with the standard glued Cyrix heatsink, but works fine with a fan bodged on. When it was crashing, the heatsink was warm, but not burning my fingers. I guess that's around 40°C to 45°C. I might have issues running 120MHz in summer...

feipoa wrote on 2023-04-05, 09:55:
mkarcher wrote on 2023-04-05, 07:39:

It looks like 0.3" SOJ32 sockets are widely available. Sockets also have the advantage of being easier to solder by hand. You could design an adapter PCB allowing installation of those sockets, but it would require a complete re-layout to make them fit. As I already wrote, the distance between cache chips on the current PCB is 0.4", but the width of a typical SOJ32 socket is 0.55", so they won't fit. On the other hand, there is a conflicting requirement (nothing wrong with that, it's not bad, just conflicting) to get a variant of this board for "more modern pinouts", which likely refers to TSOP. You won't get sockets for that.

I am aware that it would require a PCB redesign to make SOJ sockets fit. I was indirectly asking if you would be interested in creating a new PCB to accommodate sockets.

Yeah, I'm just gathering the requirements right now, so I won't design a variant that slightly misses the mark. So there is demand for socketed SOJ32 300mil (e.g. CY7C1009D)? Or do you prefer socketed SOJ32 400mil (e.g. CY7C109D)?

On the other hand, majestyk asked for a different pinout with Vcc/GND centered, but didn't provide enough detail for me to know what layout is recommended. I failed to find 128k x 8 SOJ32 in a "new pinout" in my first attempt, but I didn't spend a lot of time on that.

Reply 43 of 108, by feipoa

Rank l33t++

I'd prefer the 300 mil width variant of the SOJ-32. In looking thru my bin, I already have 10x CY7C1009D-VXI and 10x sockets for these modules.

I had planned on using the sockets for my NexGen PF110 system; however, I determined that too much notching of the plastic was needed to make the sockets fit well. I was also going to use the CY7C1009D chips on the NexGen PF110 system; however, I decided that upgrading from 256K to 1024K on the fastest NexGen might cause stability issues. So far, the one or two others who had done this mod were using a much slower NexGen CPU, thus a lower FSB. All NexGens were sold with only 256K as far as I could determine.

Reply 44 of 108, by pshipkov

Rank l33t

Socketed SRAM on the adapter will be great.
10ns SOJ chips are not much different from 10ns DIP chips when it comes to tight timings and FSB above 40MHz.
If there are no SOJ sockets, the safe bet is 8ns chips, but I haven't seen 128K x 8 parts at that speed, only 32K x 8, which means 256KB of L2 cache.
For the Biostar UUD board that will not be very interesting, since a set of 32K x 8 DIP chips working at 3x66MHz with the tightest timings can be binned with a bit of time spent on it.
But also, from my experience (and man, I tried extra hard), the board does not have the built-in circuitry to handle 1MB L2 cache at 40MHz FSB or higher with tight timings.
Still, it will be very interesting to try that adapter and see if it improves on the current situation.

Cyrix 5x86 procs are outside my zone of interest, but with Am5x86 and P24T processors, EDO+L2 cache is ALWAYS faster than no L2 cache.

Finally, not sure it is worth discussing further since that's the name of the game on VOGONS, but there are multiple notes throughout the thread about the performance qualities of UM8881, SiS471, SiS469 that are a bit misleading in my view.

retro bits and bytes

Reply 45 of 108, by mkarcher

Rank l33t
pshipkov wrote on 2023-04-07, 05:25:

But also, from my experience (and man, I tried extra hard), the board does not have the built-in circuitry to handle 1MB L2 cache at 40MHz FSB or higher with tight timings.
Still, it will be very interesting to try that adapter and see if it improves on the current situation.

My key idea is that the mainboard isn't supposed to require any specific circuit to support a bigger amount of L2 cache. There might be a quirk in the 8881 north bridge that makes 1MB cache at high speed unreliable, but that would apply to all 8881-based boards. I use 1MB L2 cache with an AMD 5x86 on a HOT-433 without issues, so I don't expect the north bridge to be the (sole) limiting factor. I hope that my adapter PCB, with dedicated decoupling caps, provides better signal integrity and power stability than DIP chips stacked in a socket (especially as the power supply of the second bank is taken from pin 32, which is connected using a bodge wire), but I still can't get rid of the one bodge wire that gets A19 from the CPU to the cache. For signal integrity, it shouldn't be worse than the non-bodge connections getting A5 to A18 to the cache.

pshipkov wrote on 2023-04-07, 05:25:

Cyrix 5x86 procs are outside my zone of interest, but with Am5x86 and P24T processors, EDO+L2 cache is ALWAYS faster than no L2 cache.

While this is likely true for the general case, the clickbait title is derived from actual performance measurements with Quake at 320x200. Quake seems to be an exception to the general rule, exhibiting memory access patterns that are extremely unfriendly to the L2 cache, especially L2WB. You can see a clear performance improvement if some of the memory used by Quake is in uncached memory due to the cacheable area limitation. The high score for EDO without cache in this case is likely caused by the faster memory writes. At FSB60, page-hit writes into the EDO RAM can be performed at 2 clocks per cycle, but as soon as I enable L2 cache (at 3-2-2-2), all writes need a tag lookup. The tag lookup and possible cache update take 3 clocks. Not having L2 avoids the L2 update overhead (which you have to pay even on L2 cache misses).

The low scores of Quake with L2WB cache furthermore seem to show that Quake suffers from write amplification caused by L2WB: When just a single write cycle happened to a line in L2 cache, the whole cache line is considered dirty and will be written back to main memory. This means that an L2 write hit is (possibly) faster while the write is performed, because there is no need to wait for a RAM write cycle, but the total write pressure is increased because in the end, more data is written from L2 to RAM. L2WB only shines when you get multiple writes into the same cache line before the line is flushed, and only if the RAM cycles avoided by the WB scheme would have caused a penalty at all. With 2-cycle page hit writes (as I get with EDO at FSB60), page-hit writes do not seem to slow down processor writes.
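To make the write-amplification argument concrete, here is a minimal Python sketch that counts how many bytes reach DRAM for a list of 32-bit stores under write-through versus write-back. It assumes (none of this is stated in the thread) 16-byte L2 lines, that every store hits L2, and an infinitely large cache where each dirty line is flushed exactly once; the function name `dram_bytes_written` is made up for illustration.

```python
# Hypothetical model: bytes written to DRAM for a sequence of CPU stores
# under write-through (WT) vs write-back (WB) L2 policies.
# Assumptions: 16-byte L2 lines, 4-byte stores, every store hits L2,
# no evictions, and every dirty line is flushed exactly once at the end.

LINE_SIZE = 16   # bytes per L2 line (assumed)
WRITE_SIZE = 4   # bytes per CPU write cycle (one 32-bit store)

def dram_bytes_written(addresses, policy):
    """Return the number of bytes written to DRAM for the given store addresses."""
    if policy == "WT":
        # Write-through: every store is forwarded to DRAM immediately.
        return len(addresses) * WRITE_SIZE
    if policy == "WB":
        # Write-back: each dirtied line is written back once, as a whole line.
        dirty_lines = {addr // LINE_SIZE for addr in addresses}
        return len(dirty_lines) * LINE_SIZE
    raise ValueError(f"unknown policy: {policy}")

# One store per line: the whole line becomes dirty for 4 bytes of real data.
scattered = [i * LINE_SIZE for i in range(100)]
# 400 stores cycling over the same 64 bytes (4 lines).
repeated = [(i % 16) * WRITE_SIZE for i in range(400)]

for name, pattern in (("scattered", scattered), ("repeated", repeated)):
    print(name, "WT:", dram_bytes_written(pattern, "WT"),
          "WB:", dram_bytes_written(pattern, "WB"))
```

In this model, scattered single stores cost four times more DRAM traffic with WB than with WT, while repeated stores into the same few lines are where WB wins.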

pshipkov wrote on 2023-04-07, 05:25:

Finally, not sure it is worth discussing further since that's the name of the game on VOGONS, but there are multiple notes throughout the thread about the performance qualities of UM8881, SiS471, SiS469 that are a bit misleading in my view.

Feel free to share your views in this thread.

Reply 46 of 108, by feipoa

Rank l33t++

I've been casually working on an Am5x86-180 system for several months now. It has been a major headache. The paths of my 180 MHz adventure and this thread have crossed, so I thought I'd share some relevant findings. I first wanted to compare FPM vs. EDO on the Biostar MB-8433UUD board at 180 MHz with 256K double-banked. I unhid the EDO DRAM speed option shown here:

[Attachment: UUD_EDO_read.JPG (CC-BY-4.0)]

The options are 4-2-2-2 or 3-1-1-1. I grabbed my best 64 MB EDO stick and found some SRAM that was happy at 3-1-1-1 (apparently pshipkov has some SRAM which can do 2-1-1-1 @256K & 180 MHz). The best stable DRAM read/write wait-states I could obtain were 1ws / 0ws. I could get 0ws/0ws to boot at 180 MHz, but it wasn't stable. Benchmark scores showed that EDO 4-2-2-2 and 3-1-1-1 were the same. They were:

Cachechk v7
L1 = 186.1 MB/s
L2 = 91.5 MB/s
DRAM read = 50.2 MB/s
DRAM write = 83.4 MB/s

DOS Quake = 19.4 fps
GLQuake = 28.2 fps
Quake II = 11.4 fps

Alternatively, if I use 64 MB FPM, I can also use 3-1-1-1 for SRAM and DRAM at 1ws/0ws (read/write). The system was perfectly stable, and I've been using 64 MB FPM at 1ws/0ws in my IBM 5x86c-133/2x system for many years. Benchmark scores were:

Cachechk v7
L1 = 186.1 MB/s
L2 = 91.5 MB/s
DRAM read = 52.8 MB/s
DRAM write = 83.4 MB/s

DOS Quake = 19.4 fps
GLQuake = 28.3 fps
Quake II = 11.4 fps

There's a slight increase in DRAM read speed with FPM compared to EDO. GLQuake showed a marginal edge in favour of FPM, while DOS Quake and Quake II were unchanged.
.
.
.
On a slightly different topic, running the UUD at 180 MHz required 4.0 V. This is at the limit of the UUD's PCB design. If I use a PSU with a good quality AT connector, I can get 4.01 V to the CPU. On the other hand, if I use an AT PSU with a very worn-out connector (the PSU has been used on a testbed for 25 years), I can only get up to 3.81 V to the CPU. This is with the VRM's trim pot already adjusted to the maximum voltage.

3.81 V wasn't enough for the CPU to be stable. I then remembered I had some of the upgraded Sharp regulators, that is, units with 3 A max output rather than 2 A. This regulator swap raised the maximum Vout slightly, to 3.87 V. The system was almost stable. Next I was going to replace the 1N5400 series diode with a solid wire. However, upon searching through my diode bin, I found some 3 A and 5 A Schottky diodes. I decided to solder on the 5 A diode.

[Attachments: Diode_VRM_swap_1.JPG, Diode_VRM_swap_2.JPG (CC-BY-4.0)]

This upped the CPU's max voltage to 4.16 V, which was more than enough for my 180 MHz tests. I did a few more measurements:

There was a 0.43 V drop across the SB550 Schottky diode.

The voltage at the AT connector was 5.05 V, but the voltage at the AT connector on the PCB side was only 4.81 V. This means that the voltage drop across my worn out AT connector was 0.24 V.

The voltage to the input of the PQ30RV31 was 4.33 V and the maximum output voltage was 4.16 V, thus the PQ30RV31 regulator is dropping only 0.17 V. The PQ30R21 was dropping 0.23 V. If using a PSU with a quality AT connector, I'd expect the max output voltage from the VRM to be around 4.4 V when using an Am5x86 at 180 MHz.
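The voltage chain above can be written down as a small budget. This Python sketch only reuses the drops measured in this post; the roughly 0.05 V "residual" is simply the gap between 4.81 V minus the diode drop and the 4.33 V measured at the regulator input. Treating a good connector as a 0 V drop, and assuming the regulator dropout stays at 0.17 V, are assumptions. With those, it lands on roughly 4.4 V, in line with the estimate above.

```python
# Voltage budget from the PSU to the CPU, using the drops measured in the post.
# Assumptions: a good AT connector drops ~0 V, and the other drops stay the same.

PSU_OUT = 5.05          # V at the PSU side of the AT connector
CONNECTOR_WORN = 0.24   # V lost in the worn AT connector (5.05 - 4.81)
SCHOTTKY = 0.43         # V across the SB550 Schottky diode
REGULATOR = 0.17        # V dropped by the PQ30RV31 at maximum output

# Residual drop between the board-side connector and the regulator input:
# 4.81 V - 0.43 V = 4.38 V expected, 4.33 V measured -> about 0.05 V elsewhere.
RESIDUAL = (PSU_OUT - CONNECTOR_WORN - SCHOTTKY) - 4.33

def max_cpu_voltage(connector_drop):
    """Estimate the maximum VRM output for a given AT-connector voltage drop."""
    return PSU_OUT - connector_drop - SCHOTTKY - RESIDUAL - REGULATOR

print(f"worn connector: {max_cpu_voltage(CONNECTOR_WORN):.2f} V")
print(f"good connector: {max_cpu_voltage(0.0):.2f} V")
```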

As a curious side note, I was able to get the Am5x86 running stable on the UUD board with just a heatsink/fan, while on the LSD board I needed to run a weak peltier (~10 W).

Reply 47 of 108, by mkarcher

Rank l33t
feipoa wrote on 2023-06-07, 10:56:

Benchmark scores showed that EDO 4-2-2-2 and 3-1-1-1 were the same.

That's with L2 enabled. The 3-1-1-1 EDO option possibly only yields a DRAM read performance improvement if L2 is disabled. You also need a processor that executes REP LODSD fast enough to notice the difference (the Cx5x86 does). At FSB60, DRAM read with cacheless 3-1-1-1 EDO is supposed to yield around 114MB/s DRAM read. If you always have L2 enabled, it's not surprising that you don't see any effect.
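For a rough sense of scale, the idealized peak for one 16-byte 486 burst at a given x-y-y-y timing is just the FSB clock times 16 bytes divided by the total number of clocks. The Python sketch below assumes back-to-back bursts with no idle clocks in between, so it is an upper bound only; the ~114MB/s figure quoted for cacheless 3-1-1-1 EDO sits well below it, which suggests real cycles carry overhead this sketch leaves out.

```python
# Idealized peak bandwidth of back-to-back 486 bursts (one 16-byte line each)
# for a given x-y-y-y timing. Idle clocks between bursts are ignored.

LINE_BYTES = 16  # a 486 burst transfers one 16-byte line (4 x 32-bit)

def burst_peak_mb_s(fsb_mhz, timing):
    """Peak MB/s (10^6 bytes/s) for a burst timing string such as "3-1-1-1"."""
    clocks = sum(int(c) for c in timing.split("-"))
    return fsb_mhz * LINE_BYTES / clocks

for t in ("2-1-1-1", "3-1-1-1", "4-2-2-2"):
    print(t, round(burst_peak_mb_s(60, t), 1), "MB/s")
```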

feipoa wrote on 2023-06-07, 10:56:

Alternatively, if I use 64 MB FPM, I can also use 3-1-1-1 for SRAM and DRAM at 1ws/0ws (read/write). The system was perfectly stable, and I've been using 64 MB FPM at 1ws/0ws in my IBM 5x86c-133/2x system for many years. Benchmark scores were:

Yeah, that reminds me: I need to assemble my cache adaptor PCB. Hand soldering SOJ chips with little space in between proved challenging, and I managed to scratch the traces on one of the five prototype PCBs I got. An SMD heat gun has arrived by now, and when I get the extra nozzles I ordered from China, I will try whether that approach works better. One of the nozzles I'm going to receive is specifically designed for 300 mil SOJ. I didn't get -1-1-1 at FSB60 with the DIP chips I have.

When comparing wait state settings, please be aware that DRAM read/write wait states seem to require higher values when there is no L2 cache active. As I recently obtained a logic analyzer with 2ns timing resolution, I can now take a very detailed look at cache/memory timings in different settings.

feipoa wrote on 2023-06-07, 10:56:

There's a slight increase in DRAM read speed with FPM compared to EDO. GLQuake showed a marginal edge in favour of FPM, while DOS Quake and Quake II were unchanged.

This is consistent with the findings in my initial post in this thread.

feipoa wrote on 2023-06-07, 10:56:

Next I was going to replace the 1N5400 series diode with a solid wire.

Yeah, I really need to continue working on my Biostar board, too. The Schottky diode and the 5K trimpot arrived, so I can do the swap (without changing the regulator), and report whether I can get 4.0V and whether I can get my Cx5x86-100 to 133MHz at that voltage.

feipoa wrote on 2023-06-07, 10:56:

The voltage at the AT connector was 5.05 V, but the voltage at the AT connector on the PCB side was only 4.81 V. This means that the voltage drop across my worn out AT connector was 0.24 V.

You might try applying contact cleaner like Deox-IT to the AT connector to reduce the contact resistance.

Reply 48 of 108, by feipoa

Rank l33t++

I must have forgotten about your note concerning disabling L2. I have re-run some tests with the Am5x86-180.

FPM 64MB @ 1ws/0ws - L2 256K @ 3-1-1-1

Cachechk v7
L1 = 186.2 MB/s
L2 = 91.5 MB/s
DRAM read = 52.8 MB/s
DRAM write = 83.4 MB/s

DOOM = 1367 realtics
Quake = 19.5 fps
PCPBench = 27.6 fps
.
.
.
FPM 64MB @ 1ws/0ws - no L2

Cachechk v7
L1 = 182.3 MB/s
L2 = 55.7 MB/s
DRAM read = 55.7 MB/s
DRAM write = 125.0 MB/s

DOOM = 1439 realtics
Quake = 18.8 fps
PCPBench = 26.0 fps
.
.
.
EDO 64MB @ 1ws/0ws @ 4-2-2-2 - L2 256K @ 3-1-1-1

Cachechk v7
L1 = 186.2 MB/s
L2 = 91.5 MB/s
DRAM read = 50.2 MB/s
DRAM write = 83.4 MB/s

DOOM = 1369 realtics
Quake = 19.4 fps
PCPBench = 27.6 fps
.
.
.
EDO 64MB @ 1ws/0ws @ 4-2-2-2 - no L2

Cachechk v7
L1 = 183.6 MB/s
L2 = 66.8 MB/s
DRAM read = 66.8 MB/s
DRAM write = 125.0 MB/s

DOOM = 1400 realtics
Quake = 19.5 fps
PCPBench = 27.4 fps
.
.
.
EDO 64MB @ 1ws/0ws @ 3-1-1-1 - no L2

DOES NOT POST
.
.
.
EDO 64MB @ 0ws/0ws @ 4-2-2-2 - no L2

Cachechk v7
L1 = 183.8 MB/s
L2 = 66.9 MB/s
DRAM read = 66.9 MB/s
DRAM write = 125.2 MB/s

DOOM = 1391 realtics
Quake = 19.8 fps
PCPBench = 28.1 fps

Interesting that when L2 is disabled the memory write speed jumps significantly, for both FPM and EDO, e.g. from 83 MB/s to 125 MB/s.

The most noticeable benefit of these tests was that disabling L2 with EDO allowed for 0ws/0ws (read/write). With L2 enabled, I had to use 1ws/0ws. With 0ws/0ws, EDO without L2 mostly beats FPM with L2, except in DOOM: in Quake, 19.8 fps for EDO-noL2 vs. 19.5 fps for FPM-w/L2; in PCPBench, 28.1 fps for EDO-noL2 vs. 27.6 fps for FPM-w/L2.
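The size of the "EDO mostly beats FPM w/L2" margin can be computed directly from the numbers above (EDO 0ws/0ws without L2 versus FPM 1ws/0ws with 256K L2). A minimal Python sketch; DOOM reports realtics, where lower is better, so its ratio is inverted.

```python
# Relative difference of EDO without L2 (0ws/0ws) against FPM with L2 (1ws/0ws),
# using the Am5x86-180 results above. DOOM realtics: lower is better.

fpm_l2 = {"DOOM": 1367, "Quake": 19.5, "PCPBench": 27.6}
edo_no_l2 = {"DOOM": 1391, "Quake": 19.8, "PCPBench": 28.1}
lower_is_better = {"DOOM"}

def speedup_pct(name):
    """Percent by which EDO-noL2 is faster (+) or slower (-) than FPM-w/L2."""
    fpm, edo = fpm_l2[name], edo_no_l2[name]
    ratio = fpm / edo if name in lower_is_better else edo / fpm
    return (ratio - 1) * 100

for name in fpm_l2:
    print(f"{name}: {speedup_pct(name):+.1f}%")
```

All three differences stay below 2%, in either direction.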

I'm looking forward to your L2 cache adaptor.

mkarcher wrote on 2023-06-07, 18:50:

I didn't get -1-1-1 at FSB60 with the DIP chips I have.

I had the best of luck with UMC branded L2 UM61256FK-15. My 10 ns Chinese Winbond reproductions worked at 3-1-1-1 as well.

I would be interested in reproducing your logic analyser setup to determine SRAM response time. When you have it working well, please share the setup details.

Concerning your attempts to get 4.0 V on a Cyrix 5x86: from my experience with the stock MB-8433UUD setup, you cannot go much over 3.85 V at full load. Based on my analysis above, you should have no problem getting to 4.x volts with the Schottky. However, my feeling is that if it cannot do 133 MHz at 3.85 V, it probably won't at 4.0 V. Looking forward to your outcome here.

Concerning my lousy 25-year-old AT connector: yes, I applied contact cleaner, used a toothbrush, pushed the connector on and off with the cleaner still moist, etc., but I'm still seeing a 0.24 V drop. I think the thin coating that gets applied to the connector (a sort of leaf spring) is worn out. Would some conductive carbon grease help at all? Normally I use that on sliding contacts, though. Something I've done in the past with worn-out AT connectors was to pull the contacts out, solder the crimp, and re-tension the spring. It can be rather time-consuming, though.

Reply 49 of 108, by mkarcher

Rank l33t
feipoa wrote on 2023-06-08, 10:40:

Interesting that when L2 is disabled the memory write speed jumps significantly, for both FPM and EDO, e.g. from 83 MB/s to 125 MB/s.

That's not surprising if you look at the low-level protocol: A 0WS write cycle takes 2 clocks. At FSB60, that is 120MB/s. This is (as long as you don't do write bursting, which might be performed for L1WB cache line write-back) the fastest a 486 can provide. The 125MB/s is likely a skewed measurement and actually indicates that the board hits 120MB/s, i.e. the theoretical maximum. If L2 is enabled, every write cycle needs to perform a tag lookup to check whether the L2 cache needs to be updated (even in WT mode). If you run the cache at 3-1-1-1, the first cache cycle, which includes the tag lookup takes three clocks. At FSB60, the theoretical maximum for write cycles that take 3 clocks (i.e. 1WS from the perspective of the 486 processor) is 80MB/s. Again, the 83MB/s reported by cachechk is likely slightly too high and indicates that the board hits the theoretical maximum for 1WS write cycles with L2 enabled. You would need to go to 2-1-1-1 to not limit RAM write performance, but good luck getting that at FSB60...
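The arithmetic above reduces to FSB clock times 4 bytes divided by the clocks per write cycle. A minimal Python sketch, assuming single (non-burst) 32-bit write cycles issued back to back and MB meaning 10^6 bytes:

```python
# Theoretical upper bound on single (non-burst) 486 write throughput:
# one 32-bit write per cycle, cycle length given in FSB clocks.

def write_peak_mb_s(fsb_mhz, clocks_per_write, bytes_per_write=4):
    """Peak MB/s (10^6 bytes/s) for back-to-back single write cycles."""
    return fsb_mhz * bytes_per_write / clocks_per_write

print(write_peak_mb_s(60, 2))  # 0WS write, no L2 tag lookup -> 120.0
print(write_peak_mb_s(60, 3))  # 3-clock cycle with L2 at 3-1-1-1 -> 80.0
```

Measured values slightly above these bounds (125MB/s, 83MB/s) are, as noted above, most likely measurement skew rather than real headroom.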

feipoa wrote on 2023-06-08, 10:40:

With 0ws/0ws, EDO [w/o L2] mostly beats FPM w/L2, except in DOOM.

Which is exactly why this thread is titled the way it is 😉

feipoa wrote on 2023-06-08, 10:40:

I'm looking forward to your L2 cache adaptor.

I assembled it today (except for the cache chips). With the exception of one misplaced ground pin, which was already known, the pin header positions actually match the board. I can plug the cacheless adapter into the cache sockets. It fits tightly enough that I don't worry about the extra pins that I could have placed for extra mechanical stability. The adapter board definitely doesn't fall off. So I'm just waiting for the delivery of the SOJ hot-air nozzle to continue; everything else looks fine. The cacheless adapter board does not prevent POSTing, so at least it seems not to short address or data lines of the frontside bus.

feipoa wrote on 2023-06-08, 10:40:

I had the best of luck with UMC branded L2 UM61256FK-15. My 10 ns Chinese Winbond reproductions worked at 3-1-1-1 as well.

I might have some of them at hand; I will try them in the next few days. I hope the Cypress 10ns chips I got from China will also reach 3-1-1-1.

feipoa wrote on 2023-06-08, 10:40:

I would be interested in reproducing your logic analyser setup to determine SRAM response time. When you have it working well, please share the setup details.

Well, the most important detail is that you need a logic analyzer with a sufficiently high timing resolution. I got a good offer on a used off-brand analyzer (a GoLogic 72-channel model), but I had to fix some solder joints on it to get the second set of 36 channels working; they were likely bad since production. 2ns timing resolution is likely not good enough to bin cache chips. Possibly my 1 GS/s digital scope can provide better insight, but I have not yet purchased faster probes after removing the software bandwidth limit, so that scope with the probes I currently have is also likely just borderline fit for the job.

feipoa wrote on 2023-06-08, 10:40:

Concerning your attempts to get 4.0 V on a Cyrix 5x86: from my experience with the stock MB-8433UUD setup, you cannot go much over 3.85 V at full load. Based on my analysis above, you should have no problem getting to 4.x volts with the Schottky. However, my feeling is that if it cannot do 133 MHz at 3.85 V, it probably won't at 4.0 V. Looking forward to your outcome here.

I did the Schottky + pot mod today. I currently don't have a system set up with the MB-8433UUD-A, so I just tested whether the system can repeatedly POST for some minutes. At 120MHz, the processor passes multiple POST cycles at 3.70V, but not at 3.67V. At 133MHz, the processor fails to pass repeated POSTs even at 3.87V. I avoided going higher than that. The fact that stepping up by 0.17V does not help to get from 120 to 133 likely indicates that my processor won't work at 133 at all. I'm not complaining about it, as overclocking to 133 is known to rarely succeed on Cyrix-branded 5x86-100 processors.

To test the capability of the board after the Schottky mod, I inserted a 5V 486DX2 &E processor. This processor starts to work in POST at 3.35V, and is obviously able to cope with 5V. It consumes considerably less power than a 5x86, though. With that processor, I was able to get around 4.53 volts at the end of the range. So I consider the Schottky mod to work perfectly for all sensible requirements.

feipoa wrote on 2023-06-08, 10:40:

I think the thin coating which gets applied to the connector (a sort of leaf spring) is worn out. Would some conductive carbon grease help at all?

Sorry, no idea here.

Reply 50 of 108, by pshipkov

Rank l33t

To mkarcher's note about not limiting RAM.
Having L2 cache present at 2-1-1-1 timings is always faster than having L2 cache disabled.
FPM can cope with this up to 4x40MHz or 3x50MHz. Beyond that, 50ns EDO modules are needed.
For reference, a UUD motherboard with Am5x86 at 3x60MHz, L2 cache at 2-1-1-1, 64MB EDO at 1/0WS, and all other BIOS settings at MAX except IBC DEVSEL# DECODING = MEDIUM (mandatory for anything above 33MHz) produces the following results:

DOOM = 72 fps (1037 realtics)
Quake = 20.1 fps
PCPBench = 29.1 fps
and so on.

These numbers cannot be achieved under any configuration with L2 cache disabled.
Again, the differentiating factor is level 2 cache at 2-1-1-1.
With that in place, performance is always better for all relevant FSB/CPU frequencies (3x50/150, 4x40/160, 4x50/200, 3x66/200).
That's why I expressed disagreement earlier in the thread with some of the notions here.

---

Mkarcher, hope the adapter works.
It looks like I am the only one around who sports reliable 2-1-1-1 cache sets for the UUD boards.
Your approach will open an easier path to that, similar to how LuckyStar revision D enables it with its SOJ chips.
That will be neat.

Reply 51 of 108, by feipoa

Rank l33t++

pshipkov:

Since not many people will be able to locate DIP L2 modules (256K) that can work at 2-1-1-1 at 60/66 MHz, and probably won't find a 64 MB FPM module that can do 1ws/0ws at 60/66 MHz either, I think it is a great alternative to use EDO with L2 disabled, provided they have a 64 MB EDO module that can do 0ws/0ws. On the flip side, it feels a little incomplete [and naked] to run a 486 motherboard without any L2 cache.

Even better would be to run the EDO Read at 3-1-1-1 rather than 4-2-2-2, but I doubt this is achievable at 60 MHz and with 64 MB.

You bring up the DEVSEL# DECODING option. How much performance loss is there when going from Fast to Medium to Slow? I haven't tested this and normally leave it on Slow for 2x66 w/cx5x86.

mkarcher:

Could you remind me - were you ever able to find some old stock 8 ns SRAM to fit on your SOJ adaptor, or are we still limited to 10 ns?

Could you provide a photo of the assembled cache adaptor (not a 3D drawing)? What is the diameter of the pins you are pushing into the cache sockets? I think ordinarily, they are 0.63 mm, which seems like it might over stretch the spring in the cache socket. You can get some round machine pin through-hole pins which are 0.45 mm. If they are still available, I recall there being some relatively thin rectangular pins used with the Arduino. I measured one I have here and it is only 0.39 mm. For reference, the thickness of ordinary DIP SRAM pins are 0.26 mm.

Pondering further about your adaptor, I previously mentioned that it ideally would be able to do 512K and 1024K double-banked, however, I wish to augment that statement to include 256K double-banked because there are 8 ns modules available. If your current prototype adaptor is proven successful, are you still planning on a modified PCB revision, ideally with space for 300 mil SOJ32 sockets? If you aren't planning on any revision, and if you have any extra PCBs, I'd be interested to test one out.

Which Cypress 10 ns modules from China? Do you have a link? I've only seen the Winbond and ISSI reproductions.

I agree 2 ns isn't great for filtering SRAM modules, but it can at least provide some initial insight and you can narrow down which SRAM modules to avoid. Pshipkov has some curious procedure for manually selecting hotrod SRAM modules, which is how he achieved 2-1-1-1 at 60/66 MHz on UUD. It is abnormally time consuming when you are starting with a blank slate, and even after several hours, I could not match his success.

I ran a quick test with the modified Schottky and 3A VRM using an IBM 5x86c at 2x66. While sitting at a DOS prompt, the maximum voltage I could set was 4.12 V. This particular chip doesn't do 133 MHz reliably, and running it at 4 V or 4.12 V didn't help the situation. In fact, at 4.12 V, all I could type was the letter 'q' at the DOS prompt before the system froze. At 4 V, I could at least type 'quake' and it would attempt to load, then fail. Too much voltage with the Cyrix 5x86 series is a detriment. From my past testing with the UUD and Cx5x86 chips, 3.73 V seemed to be the magic voltage for 133 MHz. The best chips for 2x/133 MHz were IBM-branded QFP chips from week 51 of 1995 to week 7 of 1996.

Reply 52 of 108, by Disruptor

Rank Oldbie

feipoa:
The module is not mounted completely yet, as mkarcher was playing with a voltage regulator yesterday.
He's playing with the Am486 DX4 SV8B 120 today to push it from 2x60 MHz to 2x66 MHz. The Cx5x86 100 we have won't go over 2x60.
However, the ambient temperature is working against us, as we may hit 30°C today for the very first time this year.
And he's still waiting for the nozzle to finish his solder work.

Reply 53 of 108, by feipoa

Rank l33t++

Which nozzle is he waiting for? When using solder paste, I've found the standard 5 mm nozzle that comes with hot air guns to be sufficient for SOJ work. Can I see a photo of this nozzle specifically designed for 300 mil SOJ?

Reply 54 of 108, by mkarcher

Rank l33t
pshipkov wrote on 2023-06-09, 02:30:

Again, the differentiating factor is level 2 cache at 2-1-1-1.

Agreed. There is no way that 3-1-1-1 EDO can beat 2-1-1-1 L2 cache. If you get 2-1-1-1 L2 cache working at 60 MHz, that's definitely the optimum configuration. The bold "EDO beats cache" claim in the title applies when you compare a three-clock cache lead-off to a two-clock EDO access time. I'm surprised that you can get 2-1-1-1 at 66MHz, though. That's a very tight window for the tag comparator. Possibly the use of an AMD processor is helpful here, as they say that Cyrix processors are more critical regarding cache/memory timings. Based on those observations, I guess that Intel/AMD processors provide a longer setup time on the address lines than Cyrix processors. The tag lookup actually starts as soon as the address appears on the FSB address pins, which is any time between the clock edge with /RDY asserted that terminated the previous cycle and the clock edge with /ADS asserted that initiates the next cycle.
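To put the "very tight window" into numbers, here is a rough arithmetic sketch that only uses the clock counts and FSB frequencies from the thread. It assumes a 486 burst of four 32-bit transfers (one 16-byte line), treats "66 MHz" as the nominal 66.67 MHz, and uses no SRAM or tag-RAM datasheet values. Since the tag lookup can start before /ADS, as described above, the real lookup window is somewhat longer than the printed lead-off time.

```python
def clock_period_ns(fsb_mhz: float) -> float:
    """Length of one FSB clock in nanoseconds."""
    return 1000.0 / fsb_mhz


def burst_time_ns(pattern: tuple[int, ...], fsb_mhz: float) -> float:
    """Total time of a burst such as (2, 1, 1, 1): sum of clocks times the period."""
    return sum(pattern) * clock_period_ns(fsb_mhz)


if __name__ == "__main__":
    for mhz in (40, 60, 66.67):
        for pattern in ((2, 1, 1, 1), (3, 1, 1, 1)):
            name = "-".join(map(str, pattern))
            lead_off = pattern[0] * clock_period_ns(mhz)  # clocks before the first transfer
            print(f"{mhz:6.2f} MHz {name}: {burst_time_ns(pattern, mhz):6.1f} ns total, "
                  f"{lead_off:5.1f} ns lead-off")
```

The lead-off column is where the tag comparison has to fit: going from 60 MHz to 66 MHz cuts two clocks from roughly 33 ns to 30 ns.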

Reply 55 of 108, by mkarcher

Rank l33t
feipoa wrote on 2023-06-09, 09:08:

You bring up the DEVSEL# DECODING option. How much performance loss is there when going from Fast to Medium to Slow? I haven't tested this and normally leave it on Slow for 2x66 w/cx5x86.

While I'm not pshipkov, I guess the DEVSEL# decoding option has no effect on game performance. The PCI standard describes three valid "reaction speeds" for devices on the PCI bus to claim a cycle. A PCI device claims a cycle by asserting the DEVSEL# line in response to seeing an address on the bus that is configured to target said device. Those three speeds (IIRC 1, 2 and 3 PCI clocks after the address is asserted, but it might also be 2, 3 and 4) are called "fast DEVSEL#", "medium DEVSEL#" and "slow DEVSEL#". On most PCI devices, you can't configure how fast they respond, but they indicate their speed in the PCI header, so the BIOS knows how long the slowest device takes to respond. The BIOS could use this to select bus timeouts or fallback handling.

Wow, a whole paragraph and I haven't yet explained what this DEVSEL# setting is about. So let's continue: The ISA bridge doesn't know which address ranges reside on the ISA bus. Instead, it uses "subtractive decode": it waits until the time frame for "slow DEVSEL#" is over, and then claims the cycle and forwards it to the ISA bus as a fallback. This setup option chooses whether the ISA bridge indeed waits for the "slow DEVSEL#" window to be over, or already claims the cycle and forwards it to the ISA bus when the "medium DEVSEL#" window is over (i.e. one PCI clock earlier). So the DEVSEL# setting in the CMOS setup is only relevant for cycles accessing ISA cards. Those should be rare enough in DOOM or Quake not to impact performance (unless you run an ISA VGA card).
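The subtractive decode described above can be modelled in a few lines. This is a simplified sketch, not how any particular chipset implements it: it assumes the 1/2/3-clock DEVSEL# timings recalled in the post, reads a device's advertised speed from the DEVSEL timing field (bits 10:9) of the PCI Status register, and treats two agents driving DEVSEL# in the same clock as an error. The device names and address windows in the usage example are hypothetical.

```python
from dataclasses import dataclass

# DEVSEL timing field of the PCI Status register (bits 10:9); 3 is reserved.
DEVSEL_TIMING = {0: "fast", 1: "medium", 2: "slow"}

# Clocks after the address phase at which a positively decoding target
# asserts DEVSEL# (the 1/2/3 values recalled in the post).
DECODE_CLOCK = {"fast": 1, "medium": 2, "slow": 3}


def speed_from_status(status: int) -> str:
    """Read the advertised DEVSEL# speed from a 16-bit PCI Status value."""
    field = (status >> 9) & 0b11
    if field not in DEVSEL_TIMING:
        raise ValueError("reserved DEVSEL timing value")
    return DEVSEL_TIMING[field]


@dataclass
class Target:
    name: str
    base: int   # first address the device decodes
    size: int   # size of the decoded window
    speed: str  # "fast", "medium" or "slow"

    def decodes(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def who_claims(addr: int, targets: list, bridge_setting: str) -> tuple:
    """Return (claimant, clock) for one cycle.

    The ISA bridge claims by subtractive decode, one clock after the
    DEVSEL# window selected in CMOS setup has passed unanswered.
    """
    bridge_clock = DECODE_CLOCK[bridge_setting] + 1
    for t in targets:
        if t.decodes(addr):
            clock = DECODE_CLOCK[t.speed]
            if clock >= bridge_clock:
                # The bridge would drive DEVSEL# at the same time as the
                # device: the risk of a too-early bridge setting.
                raise RuntimeError(f"{t.name} answers at clock {clock}, "
                                   f"bridge already claims at {bridge_clock}")
            return t.name, clock
    return "ISA bridge", bridge_clock


if __name__ == "__main__":
    cards = [Target("PCI VGA", 0xA0000, 0x20000, speed_from_status(0x0000)),
             Target("slow PCI card", 0xD0000, 0x1000, speed_from_status(0x0400))]
    print(who_claims(0x220, cards, "slow"))    # nobody decodes it: ISA bridge
    print(who_claims(0x220, cards, "medium"))  # same, one clock earlier
```

The only difference between the Slow and Medium settings for an ISA access is one PCI clock of latency, which is why it should not matter for games on a PCI graphics card; the Medium setting only becomes a problem if some PCI device actually answers with slow DEVSEL#.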

feipoa wrote on 2023-06-09, 09:08:

Could you remind me - were you ever able to find some old stock 8 ns SRAM to fit on your SOJ adaptor, or are we still limited to 10 ns?

I can't remember ever seeing serious offers for 8ns SRAM. To be fair, I didn't even know these chips ever existed.

feipoa wrote on 2023-06-09, 09:08:

Could you provide a photo of the assembled cache adaptor (not a 3D drawing)?

Of course, I took a couple of pictures using my mobile phone yesterday when I assembled the board.

FullyAssembled.jpg: Assembled board (without cache chips) [CC-BY-4.0]
PinsForSoldering.jpg: Pins ready for soldering [CC-BY-4.0]
SolderedPinsFront.jpg: Some pins soldered, viewing the pins [CC-BY-4.0]

I then decided to plug the big chunks into the mainboard, and continue soldering on the mainboard, to avoid possible alignment issues. In hindsight, I'm unsure whether this was a good idea, though.

feipoa wrote on 2023-06-09, 09:08:

What is the diameter of the pins you are pushing into the cache sockets? I think ordinarily, they are 0.63 mm, which seems like it might over stretch the spring in the cache socket. You can get some round machine pin through-hole pins which are 0.45 mm.

I am not using the ordinary pin header pins, which are indeed 0.63mm (other sources claim 0.64mm). I explicitly warned against using those in the post containing the 3D rendering (because the rendering showed those pins). Instead, I'm using the product called "IC Adapterleiste" from my German hobbyist/small-business retailer: https://www.reichelt.de/ic-adapterleiste-20-p … 2-20-p4426.html . I'm told those pins are called "IC headers" in English, but international search results were less than stellar. These pins are specified as 0.47mm round, which is quite close to the 0.45mm you mention.

feipoa wrote on 2023-06-09, 09:08:

If they are still available, I recall there being some relatively thin rectangular pins used with the Arduino. I measured one I have here and it is only 0.39 mm.

0.39mm edge length means 0.55mm diagonal. I expect these pins are meant to be pluggable into the precision machined IC sockets that are meant for round pins up to 0.56mm or similar rectangular pins.
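The diagonal figure follows from edge × √2. A small sketch of that check, assuming a square cross-section (as the reply above does) and using the roughly 0.56 mm round-pin limit estimated there for precision machined sockets:

```python
import math

# Round-pin limit of precision machined IC sockets, as estimated in the post.
MACHINED_SOCKET_MAX_MM = 0.56


def square_pin_diagonal(edge_mm: float) -> float:
    """Widest dimension of a square pin: its diagonal, edge * sqrt(2)."""
    return edge_mm * math.sqrt(2)


if __name__ == "__main__":
    for label, edge in (("thin 0.39 mm pin", 0.39), ("standard 0.63 mm header pin", 0.63)):
        diag = square_pin_diagonal(edge)
        print(f"{label}: {diag:.2f} mm diagonal, fits a "
              f"{MACHINED_SOCKET_MAX_MM} mm round-pin socket: {diag <= MACHINED_SOCKET_MAX_MM}")
```

The ordinary 0.63 mm header pin comes out at about 0.89 mm across its diagonal, which is consistent with the warning against using it in sockets meant for thinner pins.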

feipoa wrote on 2023-06-09, 09:08:

Pondering further about your adaptor, I previously mentioned that it ideally would be able to do 512K and 1024K double-banked, however, I wish to augment that statement to include 256K double-banked because there are 8 ns modules available. If your current prototype adaptor is proven successful, are you still planning on a modified PCB revision, ideally with space for 300 mil SOJ32 sockets? If you aren't planning on any revision, and if you have any extra PCBs, I'd be interested to test one out.

The current revision has one ground pin at the wrong location (and thus not soldered). That's the one close to C6. I will re-run that board with the location of that pin fixed, and with the pull-up resistor added that allows easily degrading to 512KB. I need to take a look at how much work adding 256KB support would require. Also, routing for the sockets might prove more challenging. Currently, U2/U3/U4, as well as U6/U7/U8, are located at the identical location relative to their DIP socket. This won't be possible with sockets anymore, but I can imagine a layout that could work well with sockets.

Do you have a link at hand for a datasheet of an 8ns 256KB chip? Is it still SOJ32, would those chips require SOJ28 sockets, or can you just plug SOJ28 into SOJ32 sockets?

feipoa wrote on 2023-06-09, 09:08:

Which Cypress 10 ns modules from China? Do you have a link? I've only seen the Winbond and ISSI reproductions.

A batch of CY7C1009D-10VXI I got on AliExpress around 5 years ago. You can see them on the board I tried to hand-solder. I damaged some traces trying to scratch (instead of wash) away flux residues, and I seem to have damaged further traces while practicing how much solder I need to drag-solder those SOJ chips. That PCB is likely SNAFU, though I hope the cache chips are still fine. Here is a photo of that old board put on pins plugged into the cache sockets of the MB-8433UUD-A:

OldBoardDamagedTraces.jpg: Hand-soldered board with damaged traces [CC-BY-4.0]

I agree 2 ns isn't great for filtering SRAM modules, but it can at least provide some initial insight and you can narrow down which SRAM modules to avoid.

I hope the analog nature of my oscilloscope allows sub-nanosecond relative timing resolution using interpolation. This might be good enough. The limited bandwidth of my probes will distort the signal, but as long as the original signal shape and the kind of distortion are the same over all acquisitions, relative figures should still be useful.
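One way to get the relative timing described above is to place each edge between two readings by linear interpolation and compare only the difference between two traces. This sketch assumes the traces are available as sampled points (for an analog scope, e.g. values read off the graticule or a screen photo); the 1.5 V threshold and the sample values are hypothetical.

```python
def crossing_time(times, volts, threshold, rising=True):
    """Time at which a sampled edge crosses `threshold`.

    The crossing is placed between the two bracketing samples by linear
    interpolation, which gives a finer result than the sample spacing.
    """
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        if (v0 < threshold <= v1) if rising else (v0 > threshold >= v1):
            t0, t1 = times[i - 1], times[i]
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("trace never crosses the threshold")


def relative_delay(times, reference, signal, threshold, rising=True):
    """Delay of `signal` relative to `reference` at the same threshold.

    If both traces suffer the same probe/bandwidth distortion, ideally much
    of that common error cancels in the difference.
    """
    return (crossing_time(times, signal, threshold, rising)
            - crossing_time(times, reference, threshold, rising))


if __name__ == "__main__":
    t_ns = [0.0, 1.0, 2.0, 3.0]   # hypothetical 1 ns sample spacing
    clk = [0.0, 1.0, 3.0, 5.0]    # reference edge, volts
    data = [0.0, 0.0, 2.0, 4.0]   # measured edge, volts
    print(f"delay: {relative_delay(t_ns, clk, data, 1.5):.2f} ns")
```

With samples only 1 ns apart, the interpolated difference here still resolves a 0.5 ns delay, which is the kind of sub-sample resolution hoped for above; accuracy of course depends on how linear the edge is between readings.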

feipoa wrote on 2023-06-09, 09:08:

Too much voltage with the Cyrix 5x86 series is a detriment. From my past testing with the UUD and Cx5x86 chips, 3.73 V seemed to be the magic voltage for 133 MHz.

I guess they are locally overheating inside the package. Anyone going to "delid" them (removing the ceramic) and install direct-die heatsinking? 😉 Your experience confirms my decision to not push higher than 3.87V, because I didn't observe any improvements in stability at 133MHz between 3.75 and 3.87.
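As a rough first-order check on the overheating guess: CMOS dynamic power scales approximately with C·V²·f, so on the same chip at the same 133 MHz only the voltage ratio squared changes. This sketch ignores leakage and temperature effects and uses only voltages mentioned in the thread, with 3.73 V as the reference.

```python
def relative_dynamic_power(volts: float, reference_volts: float = 3.73) -> float:
    """First-order CMOS estimate of dynamic power relative to a reference voltage.

    Assumes P ~ C * V**2 * f with C and f unchanged (same chip, same clock),
    so only (V / V_ref)**2 remains. Leakage is ignored.
    """
    return (volts / reference_volts) ** 2


if __name__ == "__main__":
    for v in (3.75, 3.87, 4.0, 4.12):
        print(f"{v:.2f} V: {relative_dynamic_power(v):.2f}x the dynamic power at 3.73 V")
```

By this estimate, 4.12 V means roughly 22% more switching power than 3.73 V, while going from 3.75 V to 3.87 V adds only about 6%.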

Reply 56 of 108, by mkarcher

Rank l33t
feipoa wrote on 2023-06-09, 10:36:

Which nozzle is he waiting for? When using solder paste, I've found the standard 5 mm nozzle that comes with hot air guns to be sufficient for SOJ work. Can I see a photo of this nozzle specifically designed for 300 mil SOJ?

I might try using a simple round nozzle moving around over the chip edges. I already used that kind of nozzle once successfully to mount the capacitors on the board I showed in the previous post. The special nozzle I'm talking about is a knock-off of the (original) Hakko A1184B nozzle.

Reply 57 of 108, by BitWrangler

Rank l33t++

Oh righty, that does look a bit special. I was gonna say use a tomato paste can rolled to a cone, or thick tinfoil used for pie plates. 🤣 .... though I am now getting ideas 🤣

edit: actually might make something custom for a 40pin SOJ (Video RAM) I wanna do on a board that has far too much else crowded around it for comfort.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 58 of 108, by pshipkov

Rank l33t

Agreed with the relevant comments above.

DEVSEL# DECODING
I brought it up for completeness.
It has no performance impact as far as i can tell.
If set to FAST - POST does not complete past 33MHz FSB.
That's all.

retro bits and bytes

Reply 59 of 108, by feipoa

Rank l33t++
mkarcher wrote on 2023-06-09, 16:57:

I'm surprised that you can get 2-1-1-1 at 66MHz, though. That's a very tight window for the tag comparator. Possibly the use of an AMD processor is helpful here, as they say that Cyrix processors are more critical regarding cache/memory timings. Based on those observations, I guess that Intel/AMD processors provide a longer setup time on the address lines than Cyrix processors.

Thanks for the description of DEVSEL. The concept of some of these less common BIOS settings can be hard to grasp. I think you would like the BIOS Companion book by Phil Croucher. Below is from his book on BIOS settings. I think your description was more informative though.

CPU Mstr DEVSEL# Time-out
When the CPU initiates a master cycle using an address (target) which has not been mapped to PCI/VESA or ISA space, the system will monitor the DEVSEL (device select) pin to see if any device claims the cycle. Here, you can determine how long the system will wait before timing-out. Choices are 3 PCICLK, 4 PCICLK, 5 PCICLK and 6 PCICLK (default).

PCI Mstr DEVSEL# Time-out
As above, for PCI devices.

IBC DEVSEL# Decoding
Sets the decoding used by the ISA Bridge Controller (IBC) to determine which device to select. The longer the decoding cycle, the better chance it has to correctly decode commands. Choices are Fast, Medium and Slow (default). Fast is less stable and may trash a hard disk.

From my experience with an IBM 5x86c-133/2x, I would get occasional hang-ups with Medium and setting this to Slow resolved that. I don't recall any longer which ISA device was the issue, but I think it was sound.

mkarcher wrote on 2023-06-09, 17:53:

I am not using the ordinary pin header pins, which are indeed 0.63mm (other sources claim 0.64mm). I explicitly warned against using those in the post containing the 3D rendering (because the rendering showed those pins). Instead, I'm using the product called "IC Adapterleiste" from my German hobbyist/small-business retailer: https://www.reichelt.de/ic-adapterleiste-20-p … 2-20-p4426.html . I'm told those pins are called "IC headers" in English, but international search results were less than stellar. These pins are specified as 0.47mm round, which is quite close to the 0.45mm you mention.

I call them male-to-male machine pin headers. I have some similar to what you've shown and have used them on the custom SXL2 interposer: download/file.php?id=160497&mode=view

mkarcher wrote on 2023-06-09, 17:53:

0.39mm edge length means 0.55mm diagonal. I expect these pins are meant to be pluggable into the precision machined IC sockets that are meant for round pins up to 0.56mm or similar rectangular pins.

Attached are photos of the thin pin headers I'm referring to. They are approximately 0.39 mm thick and 0.65 mm wide. If I get the chance to assemble one of your cache modules, I'd probably use these. They provide more contact surface area with the sockets compared to round machine pins. I would cut off the female end once soldered in place. Alternatively, I suppose the DIP sockets could be replaced with round female counterparts as well, allowing continued use of the machine pins. My main issue with machine pin sockets is that they wear out after enough insertions. The DIP sockets are spring-loaded and I think should last longer.

Thin_headers_1.JPG [CC-BY-4.0]
Thin_headers_2.JPG [CC-BY-4.0]
Thin_headers_3.JPG [CC-BY-4.0]
Thin_headers_4.JPG [CC-BY-4.0]

In the last photo, I'm holding a regular motherboard pin header next to the thin Arduino header.
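With the measured 0.39 mm × 0.65 mm cross-section, the pin is rectangular rather than square, so the 0.55 mm diagonal estimated earlier for a 0.39 mm square pin does not apply. A small geometry sketch, using only the dimensions given in this thread:

```python
import math


def rect_pin_diagonal(thickness_mm: float, width_mm: float) -> float:
    """Widest dimension of a rectangular pin cross-section."""
    return math.hypot(thickness_mm, width_mm)


thin = rect_pin_diagonal(0.39, 0.65)      # feipoa's measurement of the thin header
standard = rect_pin_diagonal(0.63, 0.63)  # ordinary square motherboard header pin

if __name__ == "__main__":
    print(f"thin Arduino-style header: {thin:.2f} mm diagonal")
    print(f"standard header:           {standard:.2f} mm diagonal")
```

That works out to roughly 0.76 mm for the thin header versus 0.89 mm for a standard header pin, so the thin pins are mainly thinner in one direction.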
