VOGONS


Reply 60 of 108, by feipoa

Rank: l33t++
mkarcher wrote on 2023-06-09, 17:53:

The current revision has one ground pin at the wrong location (and thus not soldered). That's the one close to C6. I will re-run that board with the location of that pin fixed, and with the pull-up resistor added that allows easily degrading to 512KB. I need to take a look at how much work adding 256KB support would require. Also, routing for the sockets might prove more challenging. Currently, U2/U3/U4, as well as U6/U7/U8, are located at the identical position relative to their DIP socket. This won't be possible with sockets anymore, but I can imagine a layout that could work well with sockets.

Do you have a link at hand for a datasheet of an 8ns 256KB chip? Is it still SOJ32, would those chips require SOJ28 sockets, or can you just plug SOJ28 into SOJ32 sockets?

Yes, the whole PCB layout would have to be altered to accommodate the sockets, but I think people would appreciate this. Perhaps ensure that the socketless PCB works well first.

Unfortunately, I don't have a datasheet for the 32kx8-8ns EliteMT LP61256GS-8. I couldn't locate one upon a quick search. But the pinouts should be the same as CY7C199D-10VXI, for which the datasheet is here: https://www.mouser.ca/datasheet/2/196/Infineo … Dat-3161465.pdf

So far, I've only seen 64kx8 in 8 ns as well. I'm not sure if 128kx8-8ns were made. They very well may have been, but I don't have any. Here is a photo of a 64kx8, SOJ32, 300 mil chip that I have: EliteMT LP61L512AS-8.

Attachment: SOJ_300mil_socket_64kx8.JPG (144.7 KiB, CC-BY-4.0)

Yes, SOJ28 SRAM modules would fit in the same SOJ32 sockets. I took a photo of EliteMT LP61256GS-8 placed in the same socket as CY7C1009D-10VXI.

Attachment: SOJ_300mil_socket_32kx8_and_128kx8_A.JPG (427.13 KiB, CC-BY-4.0)
Attachment: SOJ_300mil_socket_32kx8_and_128kx8_B.JPG (490.7 KiB, CC-BY-4.0)
mkarcher wrote on 2023-06-09, 17:58:

I might try using a simple round nozzle moving around over the chip edges. I already used that kind of nozzle successfully to mount the capacitors on the board I showed in the previous post. The special nozzle I'm talking about is a knock-off of the Hakko A1184B (original) nozzle.

I don't know your level of experience with hot air and solder paste, but based on the photos you've shown me, I'd probably just use a regular 5 mm round nozzle. It seems to me like you'd only need the parallel lots of hot air if you're soldering on the SOJ modules with pre-tinned pads.

EDIT: I forgot to mention that, luckily, these 32kx8-8ns modules [EliteMT LP61256GS-8] have ample supply from Chinese IC warehouses. The issue I've run into is the ridiculous shipping fees they want, e.g. $40 USD just for shipping 10 pieces. Yet when you look for ICs on eBay, they have these $1-2 "SpeedPAK" options. I'm not sure why the IC warehouses/distributors cannot use "SpeedPAK".

Plan your life wisely, you'll be dead before you know it.

Reply 61 of 108, by mkarcher

Rank: l33t
feipoa wrote on 2023-06-10, 08:24:

Thanks for the description of DEVSEL. The concept of some of these less common BIOS settings can be hard to grasp. I think you would like the BIOS Companion book by Phil Croucher. Below is from his book on BIOS settings. I think your description was more informative though.

CPU Mstr DEVSEL# Time-out
When the CPU initiates a master cycle using an address (target) which has not been mapped to PCI/VESA or ISA space, the system will monitor the DEVSEL (device select) pin to see if any device claims the cycle. Here, you can determine how long the system will wait before timing-out. Choices are 3 PCICLK, 4 PCICLK, 5 PCICLK and 6 PCICLK (default).

PCI Mstr DEVSEL# Time-out
As above, for PCI devices.

IBC DEVSEL# Decoding
Sets the decoding used by the ISA Bridge Controller (IBC) to determine which device to select. The longer the decoding cycle, the better chance it has to correctly decode commands. Choices are Fast, Medium and Slow (default). Fast is less stable and may trash a hard disk.

From my experience with an IBM 5x86c-133/2x, I would get occasional hang-ups with Medium and setting this to Slow resolved that. I don't recall any longer which ISA device was the issue, but I think it was sound.

I am just talking from theory here; I'm not contesting your experience. I described the setting that's called "IBC DEVSEL# Decoding" in the BIOS Companion. The other settings are even less important: they are for cycles that are not claimed by any PCI device and also not claimed by the ISA bridge. This can happen for memory above 16MB, because those cycles cannot be forwarded to ISA and thus won't be claimed by the ISA bridge. Cycles not claimed by any device should not happen during normal operation, because they are pointless. So choosing the slowest possible setting here is safe and doesn't impact operating performance. It might affect some kinds of memory range scans during hardware detection, but even that use case is likely not significantly affected by the DEVSEL# timeout setting.

Regarding theory: The DEVSEL# stuff is part of the PCI bus, not of the frontside bus. The stable values for the DEVSEL# setting might depend on the PCI clock, but should be independent of the FSB clock. If PCI at FSB33 works with "IBC DEVSEL# Decoding: Medium", I would have expected it to work at FSB66 with PCI:FSB = 1:2 as well. The expected effect of choosing a faster value for IBC DEVSEL# decoding is: faster ISA cycles (might be noticeable on ISA VGA or an ISA IDE CD-ROM), at the risk of forwarding some cycles to ISA that are actually targeting a slow PCI card (network card?). If that happens, the ISA bridge and the slow PCI device both respond to the cycle, which can cause undefined behaviour on the bus, up to and including system lockups.
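The failure mode can be sketched as a toy model: a subtractive-decode ISA bridge claims any cycle that no PCI target has claimed within the bridge's configured window, so a bridge set faster than the slowest PCI target produces two claimants. A minimal sketch, with illustrative clock numbers (not taken from any chipset datasheet):

```python
# Toy model of DEVSEL# timing on PCI with a subtractive-decode ISA bridge.
# Illustrative clock counts only: a target asserts DEVSEL# 1/2/3 clocks
# after the address phase for fast/medium/slow decode.

FAST, MEDIUM, SLOW = 1, 2, 3

def claimants(pci_target_speed, bridge_timeout):
    """Who ends up driving the cycle.

    pci_target_speed: DEVSEL# clock of the addressed PCI device, or None
                      if no PCI device decodes the address.
    bridge_timeout:   the ISA bridge claims any cycle still unclaimed
                      after this many clocks (its Fast/Medium/Slow setting).
    """
    who = set()
    if pci_target_speed is not None:
        who.add("pci-device")          # the addressed target always responds
    if pci_target_speed is None or pci_target_speed > bridge_timeout:
        who.add("isa-bridge")          # bridge saw no DEVSEL# in time
    return who

print(claimants(SLOW, SLOW))     # only the PCI device: safe
print(claimants(None, MEDIUM))   # only the ISA bridge: a normal ISA cycle
print(claimants(SLOW, MEDIUM))   # both claim -> bus contention, lockup risk
```

The conflict case is exactly the "bridge at Medium, card at Slow" scenario: both targets believe they own the cycle.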

feipoa wrote on 2023-06-10, 08:24:

Attached are photos of the thin pin headers I'm referring to. They are approximately 0.39 mm in thickness and 0.65 mm in width. If I get the chance to assemble one of your cache modules, I'd probably use these. They provide more surface-area contact with the sockets compared to round machined pins. I would cut off the female end once soldered in place.

Likely they are also more robust against bending. My machined pins bend quite easily, so inserting the cache board needs to be performed carefully. The dimensions you quote exceed the recommended maximum size for the machined precision sockets, which seems to be around 0.40mm x 0.55mm. They should work fine in the cheap spring-loaded sockets, as they are used on the MB-8433UUD-A, though.
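A quick sanity check of those numbers, assuming (my simplification, not a socket datasheet criterion) that what matters for a round machined contact is the pin's cross-section diagonal:

```python
# Compare the thin header's cross-section diagonal against the quoted
# recommended maximum for machined precision sockets. The "diagonal is
# what matters" criterion is a simplification, not a datasheet spec.
import math

def diagonal_mm(width_mm, thickness_mm):
    return math.hypot(width_mm, thickness_mm)

thin_header = diagonal_mm(0.65, 0.39)     # header from the photos
machined_max = diagonal_mm(0.55, 0.40)    # quoted recommended maximum

print(f"thin header: {thin_header:.3f} mm, machined max: {machined_max:.3f} mm")
# The header's diagonal comes out larger, consistent with the advice to
# use it only in the cheap spring-loaded sockets.
```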

feipoa wrote on 2023-06-10, 08:24:

In the last photo, I'm holding a regular motherboard pin header next to the thin arduino header.

Which clearly shows why you should not put these big square pins into the spring-loaded sockets (it wears out the springs by excessive bending), and also not into the precision machined sockets (these pins can stretch the holes until they tear apart).

feipoa wrote on 2023-06-10, 08:24:

Yes, the whole PCB layout would have to be altered to accommodate the [SOJ-32] sockets, but I think people would appreciate this. Perhaps ensure that the socketless PCB works well first.

Exactly. I'm not gonna invest in adaptations of this board until the concept itself is proven.

feipoa wrote on 2023-06-10, 08:24:

Unfortunately, I don't have a datasheet for the 32kx8-8ns EliteMT LP61256GS-8. I couldn't locate one upon a quick search. But the pinouts should be the same as CY7C199D-10VXI, for which the datasheet is here.
OK, that's sufficient. The SOJ pinout seems to be identical to the DIP pinout, both for 28- and 32-pin chips. I've been burned by the fact that the PLCC32 variants of the 27xxx series EEPROMs have two mutually incompatible pinouts: one is used for chips that are also available in DIP28, and the other is used for chips that are also available in DIP32. The first time I ordered a 32-pin PLCC flash chip, I picked the wrong variant...

Perfect support for 28-pin chips requires a layout change, because they take supply voltage on pin 28 of 28 (aka pin 30 of 32). While that pin is also at +5V for the 32-pin chips, my current layout exploits the fact that pin 30 of 32 is not highly loaded and thus does not require a low-inductance path to Vcc. I have pin 32 properly decoupled, but pin 30 is connected using a long thin trace, which is definitely not best practice for power trace routing. Or to put it more bluntly: the way pin 30 of 32 is connected to Vcc is entirely unsuited for a real Vcc pin, especially on high-speed ICs. The reason is that I route some address lines between pins 30 and 32, so the trace from pin 32 to pin 30 is routed around the whole bank of chips. All the address lines are routed on the component side of the board, because placing vias between the lines of the address bus is at the limit of the manufacturing capabilities of the cheap 2-layer process offered by JLCPCB. I'm sure a stable pin-30-of-32 supply can be done, though.

feipoa wrote on 2023-06-10, 08:24:

Yes, SOJ28 SRAM modules would fit in the same SOJ32 sockets. I took a photo of EliteMT LP61256GS-8 placed in the same socket as CY7C1009D-10VXI.

OK, thanks for confirming. In that case, 256K support is likely possible. I need to make sure to route the address pins in a way that the two pins not used by 32K x 8 chips actually carry the highest address lines. Spoiler alert: currently, that's not the case, because supporting chips smaller than 128K x 8 was not a requirement when I did the layout, so I took the liberty to swap address pins as seemed fit. That's especially interesting for the second bank, which has the chips rotated by 180 degrees. I did that because I only connect the data pins of bank 1 to the mainboard, and having bank 2 rotated by 180 degrees places the bank 2 data pins next to the bank 1 data pins. At the same time, I only connect /WE on bank 2, and pull that signal up to bank 1. Only by limiting the pins I connect to the mainboard in this way was I able to place the SOJ32 footprints between the through-hole mounted connecting pins.
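To make the size/address-line relationship concrete, a small sketch. The chip organisations are the ones named in this thread; the two-bank, four-chips-per-32-bit-bank geometry is my reading of the layout described above, and the pin mapping shown is the requirement being discussed, not the current netlist:

```python
# Module capacity and per-chip address lines for the SRAM organisations
# discussed in the thread, assuming two 32-bit banks of four x8 chips.

CHIPS = {
    "LP61256GS-8  (32K x 8)":  32 * 1024,
    "LP61L512AS-8 (64K x 8)":  64 * 1024,
    "CY7C1009D-10 (128K x 8)": 128 * 1024,
}

CHIPS_PER_BANK = 4   # 4 x 8 bit = one 32-bit-wide bank
BANKS = 2

for name, depth in CHIPS.items():
    module_bytes = depth * CHIPS_PER_BANK * BANKS
    addr_bits = depth.bit_length() - 1   # address lines the chip decodes
    print(f"{name}: A0..A{addr_bits - 1}, module = {module_bytes // 1024} KiB")

# A 32K x 8 chip only decodes A0..A14, so the two extra lines of a
# 128K x 8 footprint must carry the highest address bits for a smaller
# chip (or a pulled-up line) to still select one contiguous region.
```

This reproduces the sizes mentioned earlier: 256 KiB with 32Kx8 chips, 512 KiB with 64Kx8, and 1 MiB with 128Kx8.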

feipoa wrote on 2023-06-10, 08:24:

I don't know your level of experience with hot air and solder paste, but based on the photos you've shown me, I'd probably just use a regular 5 mm round nozzle. It seems to me like you'd only need the parallel lots of hot air if you're soldering on the SOJ modules with pre-tinned pads.

Soldering the caps on that PCB was my first hot-air experience (except for using a home-improvement-store heat gun to bulk-desolder scrap electronics). It's amazing to experience first-hand how surface tension is able to pull the capacitors back into position when the solder melts. The perceived advantage of the wide nozzle is that I get to melt the paste on all pins at approximately the same time, so I expect the effect of locking the chip into the right position to be more pronounced than when using a small nozzle and hovering over all the pins. The last status update on the nozzle order (I ordered a couple of likely useful nozzles) is that they left the international shipping center at Shenzhen a week ago. eBay estimates ten days to delivery. This is using the "SpeedPAK" delivery option you talked about.

feipoa wrote on 2023-06-10, 08:24:

I forgot to mention that, luckily, these 32kx8-8ns modules [EliteMT LP61256GS-8] have ample supply from Chinese IC warehouses. The issue I've run into is the ridiculous shipping fees they want, e.g. $40 USD just for shipping 10 pieces.

My experience with one of the bigger Chinese warehouse sellers is that their claimed stock is sometimes made up. I tried to order a set of 70 chips of 256k x 16 25ns EDO RAM for video memory at one of them, which shall stay utnamed (mind the "typo" 😉 ): they claimed to have 32100 in stock at 1.05€ a piece, and after I placed the order, they contacted me to say they could only fulfill 30 of those, at the same time raising the price to 2.80€ a piece. So they asked me to send them extra money to receive fewer chips. They accepted my request to cancel that position on my order, though.

Reply 62 of 108, by rasz_pl

Rank: l33t
mkarcher wrote on 2023-06-10, 10:32:

The expected effect of choosing a faster value for IBC DEVSEL# decoding is: faster ISA cycles (might be noticeable on ISA VGA or an ISA IDE CD-ROM), at the risk of forwarding some cycles to ISA that are actually targeting a slow PCI card (network card?). If that happens, the ISA bridge and the slow PCI device both respond to the cycle, which can cause undefined behaviour on the bus, up to and including system lockups.

In what circumstances would that happen? Wouldn't some ISA device need to explicitly claim an access targeting a PCI device? Aren't PCI devices usually mapped in PCI address space somewhere far up high in the memory map?

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 63 of 108, by mkarcher

Rank: l33t
rasz_pl wrote on 2023-06-10, 11:34:
mkarcher wrote on 2023-06-10, 10:32:

The expected effect of choosing a faster value for IBC DEVSEL# decoding is: faster ISA cycles (might be noticeable on ISA VGA or an ISA IDE CD-ROM), at the risk of forwarding some cycles to ISA that are actually targeting a slow PCI card (network card?). If that happens, the ISA bridge and the slow PCI device both respond to the cycle, which can cause undefined behaviour on the bus, up to and including system lockups.

In what circumstances would that happen? Wouldn't some ISA device need to explicitly claim an access targeting a PCI device? Aren't PCI devices usually mapped in PCI address space somewhere far up high in the memory map?

ISA has no notion of "claiming cycles". ISA also doesn't have a "ready" signal that needs to be actively driven to terminate a cycle. Instead, ISA cycles are executed at a default speed (usually equivalent to the duration of 4 clocks at 4.77MHz for 8-bit I/O cycles, and somewhat faster for 16-bit cycles), as that is the speed of the IBM PC. If a device needs more time, it can actively extend a cycle, though.

In most PC-compatible computers, the ISA bridge does not use "additive decode" which would mean that the ISA bridge knows which addresses reside on the ISA bus, and actively claims those cycles, but the ISA bridge uses "subtractive decode" instead, and claims all cycles not claimed by any PCI device. If you configure the ISA bridge to claim a cycle when the timeout for "medium DEVSEL#" has expired, it might claim a cycle targeted to a PCI device that uses "slow DEVSEL#".

While you are correct that most PCI devices do not claim any memory addresses in the low 16MB range that can be forwarded to ISA (except for VGA-compatible graphics cards), the I/O space is completely shared between ISA and PCI. Typically, PC BIOSes configure I/O ports on the PCI bus to be located at x0xx, x4xx, x8xx or xCxx, which traditionally (before PCI) decoded to aliases of mainboard components like the IRQ controller, the DMA controller, the timer chip, the keyboard controller and so on, but PCI cards can claim other cycles, as legacy-compatible PCI VGA cards again do.
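The aliasing arithmetic behind those x0xx/x4xx/x8xx/xCxx choices can be sketched as follows, under the classic simplification that ISA cards decode only the low 10 address bits:

```python
# With 10-bit ISA decoding, only the low 10 bits of an I/O port address
# matter, so ports of the form x0xx/x4xx/x8xx/xCxx alias down into
# 0x000-0x0FF, the range traditionally owned by mainboard components
# and therefore not decoded by well-behaved ISA cards.

def isa_10bit_alias(port):
    """The address a 10-bit-decoding ISA card actually sees."""
    return port & 0x3FF

# The NE2000-style example discussed in this thread: ports DC00-DC1F
# all alias into the mainboard range.
assert all(isa_10bit_alias(p) < 0x100 for p in range(0xDC00, 0xDC20))
print(hex(isa_10bit_alias(0xDC00)))   # -> 0x0
```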

My idea is that an NE2000-compatible 10-MBit PCI network card might use "slow DEVSEL#" and respond to ports DC00-DC1F, and if you configure the ISA bridge to "medium DEVSEL#", it might forward the cycle to ISA. Having two cards selected at the same time causes the PCI bus to break in various ways, as every device assumes it is the only selected target. This is most prominently seen on the "TRDY" line, which is to be driven by the one selected target when it is ready to finish the cycle. Having the PCI NE2000 finish the cycle first, and then getting another stray "finish this cycle" pulse from the ISA bridge while the next cycle is already running, will cause trouble. Of course, this idea is nonsense if the ISA bridge has a filter to not claim I/O ranges typically assigned to PCI devices. I don't know how the UMC8886 behaves in this regard.

Reply 64 of 108, by jakethompson1

Rank: Oldbie
mkarcher wrote on 2023-06-10, 14:08:

ISA has no notion of "claiming cycles". ISA also doesn't have a "ready" signal that needs to be actively driven to terminate a cycle. Instead, ISA cycles are executed at a default speed (usually equivalent to the duration of 4 clocks at 4.77MHz for 8-bit I/O cycles, and somewhat faster for 16-bit cycles), as that is the speed of the IBM PC. If a device needs more time, it can actively extend a cycle, though.

In most PC-compatible computers, the ISA bridge does not use "additive decode" which would mean that the ISA bridge knows which addresses reside on the ISA bus, and actively claims those cycles, but the ISA bridge uses "subtractive decode" instead, and claims all cycles not claimed by any PCI device. If you configure the ISA bridge to claim a cycle when the timeout for "medium DEVSEL#" has expired, it might claim a cycle targeted to a PCI device that uses "slow DEVSEL#".

While you are correct that most PCI devices do not claim any memory addresses in the low 16MB range that can be forwarded to ISA (except for VGA-compatible graphics cards), the I/O space is completely shared between ISA and PCI. Typically, PC BIOSes configure I/O ports on the PCI bus to be located at x0xx, x4xx, x8xx or xCxx, which traditionally (before PCI) decoded to aliases of mainboard components like the IRQ controller, the DMA controller, the timer chip, the keyboard controller and so on, but PCI cards can claim other cycles, as legacy-compatible PCI VGA cards again do.

My idea is that an NE2000-compatible 10-MBit PCI network card might use "slow DEVSEL#" and respond to ports DC00-DC1F, and if you configure the ISA bridge to "medium DEVSEL#", it might forward the cycle to ISA. Having two cards selected at the same time causes the PCI bus to break in various ways, as every device assumes it is the only selected target. This is most prominently seen on the "TRDY" line, which is to be driven by the one selected target when it is ready to finish the cycle. Having the PCI NE2000 finish the cycle first, and then getting another stray "finish this cycle" pulse from the ISA bridge while the next cycle is already running, will cause trouble. Of course, this idea is nonsense if the ISA bridge has a filter to not claim I/O ranges typically assigned to PCI devices. I don't know how the UMC8886 behaves in this regard.

How much of this also applies to VLB+PCI systems (which the 8881+8886 can be used to build like the M919)? Is there an additional delay introduced, e.g., such that VGA I/O ports go to the PCI bus first and then get redirected to VLB when they are unclaimed? There have always been claims that VLB+PCI systems are hindered somehow.

Reply 65 of 108, by mkarcher

Rank: l33t
feipoa wrote on 2023-06-08, 10:40:
mkarcher wrote on 2023-06-07, 18:50:

I didn't get -1-1-1 at FSB60 with the DIP chips I have.

I had the best of luck with UMC branded L2 UM61256FK-15. My 10 ns Chinese Winbond reproductions worked at 3-1-1-1 as well.

While I only found 5 UM61256FK-15 chips in the spare SRAM box, I found 9 of UM61m256K-15 (the m, printed in italics, is part of the chip marking...). The key to getting -1-1-1 bursts at FSB60 seems to be not to use a Cyrix processor. With an AMD 486DX4 at 60*2, I got 3-1-1-1 with the UM61m256K-15 chips, while it didn't work well at all with the Cyrix 5x86 at 60*2.

Furthermore, I currently don't get the Cyrix stable at 120MHz with jump prediction enabled at all - but the room temperature is around 10°C (18°F) higher than in the winter when I wrote the initial post. To test stuff at high FSB clocks, I used the AMD 486DX4 in 2x mode instead. All measurements were performed with EDO in 4-2-2-2 mode and WB cache (WB cache causes low "memory move" performance).

CPU    RAM   L2    WS   Doom   Quake  PCP   L1R    L1W    L1M    L2R   L2W   L2M   MR    MW     MM
2*66   EDO   3-2   1/0  crash  14.7   22.2  125.4  85.0   159.6  63.9  84.7  35.3  46.4  84.8   18.9
2*66   EDO   3-1   1/0  crash  crash  22.7  126.2  85.0   161.1  78.7  84.7  39.3  46.4  84.8   20.0
2*66   EDO   off   1/0  crash  14.8   22.2  124.6  127.0  158.0  -     -     -     60.0  127.0  30.0
2*66   EDO   none  0/0  crash  15.0   22.7  124.7  127.2  158.7  -     -     -     60.0  127.2  31.9
2*60   EDO   3-2   1/0  1773   13.2   19.9  113.0  76.6   143.7  57.5  76.3  31.7  41.8  76.4   17.0
2*60   FPM   3-2   1/0  1770   13.2   20.0  113.0  76.5   143.7  57.6  76.3  31.7  43.7  76.4   17.0
2*60   EDO   3-1   1/0  1735   13.6   20.4  113.7  76.5   145.0  70.8  76.3  35.4  41.8  76.4   18.0
2*60   FPM   3-1   1/0  1733   13.5   20.4  113.7  76.5   145.0  70.8  76.3  35.4  43.7  76.4   18.0
2*60   EDO   off   1/0  1805   13.3   20.0  112.3  114.4  142.3  -     -     -     54.0  114.4  27.0
2*60   EDO   none  0/0  1787   13.5   20.4  112.3  114.5  142.9  -     -     -     54.0  114.5  28.7
3*40   EDO*  none  0/0  2228   13.0   19.3  112.3  76.5   140.9  -     -     -     55.5  76.3   24.5
  • EDO: EDO at 4-2-2-2; EDO*: EDO at 3-1-1-1
  • Doom: Dosbench item (b), result in realticks
  • Quake: Dosbench item (c), results in fps
  • PCP: Dosbench item (5), results in fps
  • L1R .. MM: Memory speed measurements reported by Speedsys (dosbench item (n)), when pressing "M" in the "benchmark done" screen.

This set of results shows that cacheless EDO with 0/0 WS (optimal conditions) can beat a cached system only if your only option is to run the cache at 3-2-2-2. I was unable to get 0WS/0WS working with L2 chips inserted. I couldn't find an FPM module that works at 1WS/0WS at FSB66 at the moment. The comparison EDO/FPM at 66MHz should be similar to the comparison at 60MHz.

EDIT: Added 40*3 with the faster EDO burst.
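For context on why the burst timings in the table matter, a back-of-the-envelope calculation of L2 line-fill bandwidth. This is pure clock arithmetic (4 transfers of 32 bits per 486 cache-line fill), ignoring bus turnaround and everything else:

```python
# Rough read bandwidth for a 486 cache-line fill (4 dwords = 16 bytes)
# at the burst timings discussed above.

def fill_bandwidth_mb_s(fsb_mhz, timing):
    clocks = sum(timing)             # e.g. (3, 1, 1, 1) -> 6 clocks total
    bytes_per_fill = 16              # 4 transfers x 32 bits
    return bytes_per_fill * fsb_mhz / clocks   # MB/s, decimal megabytes

for timing in [(3, 1, 1, 1), (3, 2, 2, 2), (4, 2, 2, 2)]:
    print(timing, round(fill_bandwidth_mb_s(60, timing), 1), "MB/s at FSB60")
```

At FSB60, 3-1-1-1 works out to 160 MB/s against roughly 107 MB/s for 3-2-2-2, which lines up with the gap between the cached configurations in the table.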

Reply 66 of 108, by mkarcher

Rank: l33t
jakethompson1 wrote on 2023-06-10, 21:31:

How much of this also applies to VLB+PCI systems (which the 8881+8886 can be used to build like the M919)? Is there an additional delay introduced, e.g., such that VGA I/O ports go to the PCI bus first and then get redirected to VLB when they are unclaimed? There have always been claims that VLB+PCI systems are hindered somehow.

Just like PCI, VL also has a notion of "claiming" a cycle. The signal "LDEV#" on the VL bus works similarly to the signal "DEVSEL#" on the PCI bus. Even on PCI+VLB 486 boards, the VL bus is directly connected to the 486 frontside bus, with no extra delay introduced. On the other hand, all PCI+VLB mainboards I had at hand introduced an extra wait state, probably because the VL bus RDY signal is synchronized with the processor clock instead of being transparently passed through. Also, some PCI+VLB mainboards have issues with 0WS VL cycles, forcing you to use 1WS cycles. Back-to-back 0WS cycles require two clocks each, with a maximum performance of 66MB/s at FSB33. If you lose one clock to RDY synchronisation and another clock because you need to degrade to 1WS before issuing RDY, back-to-back cycles take 4 clocks instead of 2, and the maximum performance is limited to 33MB/s.
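The clock arithmetic in that last sentence, spelled out (a 32-bit transfer moves 4 bytes per cycle):

```python
# Peak back-to-back VL-bus throughput from clocks per cycle: a 0WS cycle
# takes 2 FSB clocks; one clock lost to RDY synchronisation plus one for
# a forced 1WS cycle doubles that to 4 clocks and halves the bandwidth.

def vlb_peak_mb_s(fsb_mhz, clocks_per_cycle):
    bytes_per_transfer = 4           # 32-bit bus
    return bytes_per_transfer * fsb_mhz / clocks_per_cycle

print(vlb_peak_mb_s(33, 2))   # 0WS, transparent RDY -> 66.0 MB/s
print(vlb_peak_mb_s(33, 4))   # +sync clock, +1WS    -> 33.0 MB/s
```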

Reply 67 of 108, by jakethompson1

Rank: Oldbie
mkarcher wrote on 2023-06-10, 22:26:
jakethompson1 wrote on 2023-06-10, 21:31:

How much of this also applies to VLB+PCI systems (which the 8881+8886 can be used to build like the M919)? Is there an additional delay introduced, e.g., such that VGA I/O ports go to the PCI bus first and then get redirected to VLB when they are unclaimed? There have always been claims that VLB+PCI systems are hindered somehow.

Just like PCI, VL also has a notion of "claiming" a cycle. The signal "LDEV#" on the VL bus works similarly to the signal "DEVSEL#" on the PCI bus. Even on PCI+VLB 486 boards, the VL bus is directly connected to the 486 frontside bus, with no extra delay introduced. On the other hand, all PCI+VLB mainboards I had at hand introduced an extra wait state, probably because the VL bus RDY signal is synchronized with the processor clock instead of being transparently passed through. Also, some PCI+VLB mainboards have issues with 0WS VL cycles, forcing you to use 1WS cycles. Back-to-back 0WS cycles require two clocks each, with a maximum performance of 66MB/s at FSB33. If you lose one clock to RDY synchronisation and another clock because you need to degrade to 1WS before issuing RDY, back-to-back cycles take 4 clocks instead of 2, and the maximum performance is limited to 33MB/s.

I see, would that PCI+VLB penalty be controlled in the BIOS or would it be strapped on something like the 8881, such that by moving resistors around and such you might be able to speed up one of the PCI+VLB boards?

Reply 68 of 108, by feipoa

Rank: l33t++
mkarcher wrote on 2023-06-10, 22:05:

While I only found 5 UM61256FK-15 chips in the spare SRAM box, I found 9 of UM61m256K-15 (the m, printed in italics, is part of the chip marking...). The key to getting -1-1-1 bursts at FSB60 seems to be not to use a Cyrix processor. With an AMD 486DX4 at 60*2, I got 3-1-1-1 with the UM61m256K-15 chips, while it didn't work well at all with the Cyrix 5x86 at 60*2.

Furthermore, I currently don't get the Cyrix stable at 120MHz with jump prediction enabled at all - but the room temperature is around 10°C (18°F) higher than in the winter when I wrote the initial post. To test stuff at high FSB clocks, I used the AMD 486DX4 in 2x mode instead. All measurements were performed with EDO in 4-2-2-2 mode and WB cache (WB cache causes low "memory move" performance).

This set of results shows that cacheless EDO with 0/0 WS (optimal conditions) can beat a cached system only if your only option is to run the cache at 3-2-2-2. I was unable to get 0WS/0WS working with L2 chips inserted. I couldn't find an FPM module that works at 1WS/0WS at FSB66 at the moment. The comparison EDO/FPM at 66MHz should be similar to the comparison at 60MHz.

The m on the UMC part number indicates the module is a mixed-mode module, so 5V power, 3.3V I/O. I've attached a UMC datasheet for a mixed-mode module, albeit for 64kx8. I try to save my mixed-mode modules for my Socket 5 boards, which claim to require them. On the other hand, they appear to work fine in 486 motherboards, but will run warmer (according to my finger, anyway). From what I've seen, half of pshipkov's magic SRAM modules which do 2-1-1-1 at 66 MHz on the UUD are of the UMC mixed-mode variety; the others are non-mixed-mode UMC modules.

Attachment: SRAM-MixedMode-UM61M512.pdf (191.94 KiB, Fair use/fair dealing exception)

Your findings parallel mine. If you swap the CPU from Am5x86 to Cx5x86, L2 wait states will need to be slowed down, starting at, I think, 50 MHz on UUD. This is with 256K double-banked. When running more cache, the L2 wait states need to be re-adjusted. Having an adjustable L2 module which can also accept the 8 ns 256K cache chips is ideal.

What is the stepping/revision of your Cx5x86? I think the latest S1R3 I've seen was from week 44, 1995. Ideally you want a 120 MHz labelled Cx5x86 to get branch prediction at 120 MHz operation. If you have a 100 MHz S1R3 Cx5x86, it may work with branch prediction at 100 MHz, but not 120 MHz - yet it may still work at 120 MHz without branch prediction. This was my finding anyway. I vaguely recall another user trying to get LSSER working at 120 MHz on a 100 MHz CPU, which also did not work well. S1R3 120 MHz Cx5x86 chips are out there, but not as common as S0R5.

For Am5x86-180, I find that EDO at 0ws/0ws (no L2) may be faster than FPM at 1ws/0ws (L2 @ 3-1-1-1), depending on the application [at 180 MHz]. For my tests, DOOM preferred the FPM, while PCPBench and Quake preferred the EDO. For your tests, DOOM agreed with my results, however your results indicated that Quake and PCPBench were the same for FPM/EDO with the stated conditions. Your tests were at 2x60 while mine were at 3x60. Perhaps 180 MHz is needed for the EDO results of Quake/PCPBench to move ahead of FPM?

mkarcher wrote on 2023-06-10, 10:32:

Likely they are also more robust against bending. My machined pins bend quite easily, so inserting the cache board needs to be performed carefully. The dimensions you quote exceed the recommended maximum size for the machined precision sockets, which seems to be around 0.40mm x 0.55mm. They should work fine in the cheap spring-loaded sockets, as they are used on the MB-8433UUD-A, though.

Yes, they definitely demonstrate improved elasticity compared to those machine pins, and are also acceptably rigid.

mkarcher wrote on 2023-06-10, 10:32:

Soldering the caps on that PCB was my first hot-air experience (except for using a home-improvement-store heat gun to bulk-desolder scrap electronics). It's amazing to experience first-hand how surface tension is able to pull the capacitors back into position when the solder melts. The perceived advantage of the wide nozzle is that I get to melt the paste on all pins at approximately the same time, so I expect the effect of locking the chip into the right position to be more pronounced than when using a small nozzle and hovering over all the pins.

I've seen surface tension work well, like in some Norwegian guy's YouTube videos, but I've also witnessed it work against me. However, the issue with the latter could very well have been the air flow rate being too fast and the component being too small (0603 capacitor). I'm not an expert on this, but I have learned my own tricks to deal with it. I find it much easier to hand-solder most surface-mount capacitors, including the larger ones. You need a steady hand. I pre-tin one of the capacitor pads, then grab the 0603 with tweezers and put it into place while the other hand melts the pad with the iron. I find that with 0603 capacitors, trying to use paste and hot air blows the caps off course very easily. With larger tantalum caps, I tend to get unmelted paste remaining under the cap.

mkarcher wrote on 2023-06-10, 10:32:

My experience with one of the bigger Chinese warehouse sellers is that their claimed stock is sometimes made up. I tried to order a set of 70 chips of 256k x 16 25ns EDO RAM for video memory at one of them, which shall stay utnamed (mind the "typo" ;) ): they claimed to have 32100 in stock at 1.05€ a piece, and after I placed the order, they contacted me to say they could only fulfill 30 of those, at the same time raising the price to 2.80€ a piece. So they asked me to send them extra money to receive fewer chips. They accepted my request to cancel that position on my order, though.

Yes, they do this all the time. I suspect some of these old ICs are traded like stocks and the price fluctuates often. For the EliteMT LP61256GS-8, I asked the price a few years ago and it was $4.10/pc. I asked a week ago, with the same seller, and the price was $2.80/pc. Another issue is that many of these IC distributors who don't have the ICs in stock will try to find them for you, e.g. from the IC supermarket in Shenzhen, thus adding more mark-ups.

As of last week, LP61256GS-8 chips were in stock, if you don't mind paying $40 USD shipping via DHL/UPS/FedEx. Those companies are the worst criminals when it comes to their brokerage-related fees. It seems like every few years they come up with a new brokerage-type fee to add on. The import taxes themselves are very small in comparison to their mafia-like brokerage fees: I might pay $6 in taxes (goes to Canada) and $45 in brokerage fees (goes to UPS). This is why Canadians don't want US eBay sellers using FedEx or UPS. Canada Post has a fixed $10 brokerage fee (plus the tax).

If you live in the US, none of this is a problem. Most shipments just flow through to the consumer, even $1500 worth of electronics, without duties. Imagine the extra revenue the US government would receive if they taxed all imports like the rest of the world does. They could probably pay down the $31 trillion national debt a bit faster!

Plan your life wisely, you'll be dead before you know it.

Reply 70 of 108, by mkarcher

Rank: l33t
jakethompson1 wrote on 2023-06-10, 22:55:

I see, would that PCI+VLB penalty be controlled in the BIOS or would it be strapped on something like the 8881, such that by moving resistors around and such you might be able to speed up one of the PCI+VLB boards?

I've not yet encountered a board or datasheet that talked about LRDY synchronization being optional on VLB+PCI chipsets. For example, the SiS 496 datasheet says:

SiS496 datasheet wrote:

If the VL target asserts LBD# to claim the cycle, 85C496 releases control and monitors
LRDY#, once LRDY# being sampled active, 85C496 returns RDY# to CPU or ISA bridge at
the next clock and the cycle completes.

It doesn't specify any configuration bit or strapping option; it just states that synchronization happens. The datasheet implies that 0WS cycles should be possible, but my experience with my Virge VL prototype card was poor at 0WS. Whether a cycle is performed at 0WS is not a mainboard option, but a device setting. Many devices don't do 0WS at all. The ET4000/W32 cards I had at hand never went below three clocks per cycle on a board that supports 0WS and transparent LRDY quite well, while the Virge VL prototype board did reach 0WS. For the ET4000/W32i, the extra wait state is not that surprising, because it uses multiplexed address/data pins, which adds some overhead to the cycle.

On the Virge VL, the generation of 0WS cycles is controlled by a configuration bit in the chip, which is initialized by the video BIOS on boot, and is hardcoded in all commercial S3 BIOSes I've seen by now.

Reply 71 of 108, by mkarcher

Rank: l33t
feipoa wrote on 2023-06-10, 23:24:

The m on the UMC part number indicates the module is a mixed-mode module, so 5V power, 3.3V I/O.

Thanks for the information. Running them at 5V power and 5V I/O is in spec according to the data sheet. The "VIH" (data input high) voltage specification in that datasheet is confusing, though. It's fine that they specify that voltages between 2.2V and Vcc+0.3V are considered high (and higher voltages are forbidden). But why is it listed as "recommended operating condition" and not as "DC electrical characteristic"? And what the heck is "typical 3.5V" supposed to mean at all?

feipoa wrote on 2023-06-10, 23:24:

Your findings parallel mine. If you swap the CPU from Am5x86 to Cx5x86, L2 wait states will need to be slowed down, starting at, I think, 50 MHz on UUD. This is with 256K double-banked. When running more cache, the L2 wait states need to be re-adjusted. Having an adjustable L2 module which can also accept the 8 ns 256K cache chips is ideal.

I guessed that the Cyrix processor provides less address setup time to cause this behaviour. This seems to be wrong, though, as less address setup time would make the lead-off slower, but wouldn't have any effect on the burst speed (which no longer uses addresses generated by the 486). So my new hypothesis is that the Cyrix processor requires more data setup time on read cycles. This is likely true for the Cx486 core already, as some BIOSes contain different auto-config tables for Cyrix and non-Cyrix processors.

feipoa wrote on 2023-06-10, 23:24:

What is the stepping/revision of your Cx5x86?

According to the DIR, it's S1R3, and indeed I already ran Windows 2000 with branch prediction enabled, which is claimed to be impossible with S0R5 processors. My processor has the blueish green Cyrix heatsink glued on it. Is there a way to identify the production date without breaking the heatsink off? The numbers on the back don't contain an obvious date stamp, but possibly you can identify a "hidden" date stamp from the data Disruptor provided already.

feipoa wrote on 2023-06-10, 23:24:

Ideally you want a 120 MHz labelled Cx5x86 for branch prediction for 120 MHz operation. If you have a 100 MHz, S1R3, cx5x86, it may work with branch prediction at 100 MHz, but not 120 MHz - yet it may still work at 120 MHz but w/out branch prediction. This was my finding anyway. I vaguely recall another user trying to get LSSER working at 120 MHz on a 100 MHz CPU, which also did not work well at 120 MHz. S1R3 120 MHz cx5x86 chips are out there, but not as common as S0R5.

In the winter, at around 19°C room temperature (66°F), LSSER (I mean "serialization disabled", i.e. the faster mode) + BTB worked at 120MHz, as soon as I added a fan. The benchmarks in the initial post of this thread were performed at 120MHz with optimized settings. Also, I ran some 3DMark benches in Windows 2000 at LSSER + BTB at 120MHz in the winter without any issues. The fact that adding a fan to a heatsink that still wasn't very hot in the winter helped make the system stable also shows that the thermal margin for LSSER+BTB at 120MHz is very low on my CPU. To get more stable operation at 120MHz, something like a Peltier cooler between the CPU and the heatsink might be required, but currently I'm not inclined to go that way.

feipoa wrote on 2023-06-10, 23:24:

For Am5x86-180, I find that EDO at 0ws/0ws (no L2) may be faster than FPM at 1ws/0ws (L2 @ 3-1-1-1), depending on the application [at 180 MHz]. For my tests, DOOM preferred the FPM, while PCPBench and Quake preferred the EDO.

As already noted in this thread, the memory access patterns of Quake seem to interact poorly with L2 cache (at least with 256KB of it). The table in the initial post obtained with the Cx5x86 is very clear in this regard: Quake ran faster in most configurations if more RAM was installed than is cacheable, i.e. having uncached areas sped up Quake. This is just the opposite of the common knowledge that "uncacheable areas make your system very slow". Of course, this is a single data point with (for that period) ridiculously fast RAM and bad cache timings (3-2-2-2). I expect the working set of Quake to be much bigger than that of Doom, so Quake's cache hit rate is considerably lower: you pay the performance overhead of the cache system (i.e. no non-burst cycle can be performed in less than 3 clocks), but that penalty doesn't get compensated by performance improvements from cache hits.

feipoa wrote on 2023-06-10, 23:24:

For your tests, DOOM agreed with my results, however your results indicated that Quake and PCPBench were the same for FPM/EDO with the stated conditions. Your tests were at 2x60 while mine were at 3x60. Perhaps 180 MHz is needed for the EDO results of Quake/PCPBench to move ahead of FPM?

That's quite likely. The higher the multiplier, the more sensitive the processor is to bus bandwidth. (Side note: that's why I think the 1KB L1 cache 486SLC2 was a mostly pointless idea invented by marketing - OTOH, getting the performance of a 486SLC2-33 using a 25MHz mainboard by running the SLC2 at 2*25MHz might be considered a valid use case by some people.) It's quite possible that at 2*60, the primary bottleneck is CPU power (which is not surprising at all when you run Pentium-targeted software like Quake on a 486 processor), so the write performance benefits of EDO can't be exploited at the 2* clock. This makes a lot of sense, because the 486 processor contains a write buffer that can hold up to four memory writes and perform them in the background while already executing the next instructions. You need to issue writes at a very high rate to stall the processor on a full write buffer. Of course, at 3*60 writes can be issued 50% faster than at 2*60, while they get performed at the same rate. The key benefit of the "cacheless EDO" configuration is just fast writes, because a write can be performed without needing to update the L2 cache. Anything that is read-heavy will likely prefer cache over EDO.
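The write-buffer argument can be sketched as a toy queue model. The 4-entry buffer depth comes from the discussion above; the issue and drain intervals below are made-up illustrative numbers, chosen so the memory system keeps up at the 2x clock but falls behind at 3x:

```python
# Toy model of a 4-entry CPU write buffer draining into memory.
# Buffer depth is from the post; the intervals are illustrative assumptions.
def write_buffer_stalls(multiplier, issue_every_core_clocks=5,
                        drain_every_bus_clocks=2, n_writes=1000,
                        buffer_depth=4):
    """Return total core clocks spent stalled on a full write buffer."""
    drain_every = drain_every_bus_clocks * multiplier  # drain time in core clocks
    t = 0.0             # current time, in core clocks
    mem_free = 0.0      # when the memory system finishes its current write
    inflight = []       # completion times of writes sitting in buffer/memory
    stalls = 0.0
    for _ in range(n_writes):
        t += issue_every_core_clocks                 # CPU wants to issue a write
        inflight = [c for c in inflight if c > t]    # retire finished writes
        if len(inflight) >= buffer_depth:            # buffer full -> CPU stalls
            wait = min(inflight) - t
            stalls += wait
            t += wait
            inflight = [c for c in inflight if c > t]
        start = max(t, mem_free)                     # memory serves writes in order
        mem_free = start + drain_every
        inflight.append(mem_free)
    return stalls

print(write_buffer_stalls(2))   # memory keeps up: no stalls
print(write_buffer_stalls(3))   # same bus, 50% faster issue rate: buffer fills
```

Writes complete at the same bus rate in both cases, but at the 3x multiplier the core issues them faster than they drain, so only the 3x configuration ever fills the buffer and stalls.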

feipoa wrote on 2023-06-10, 23:24:
mkarcher wrote on 2023-06-10, 10:32:

Soldering the caps on that PCB was my first hot-air experience (except for using a home-improvement store heat gun to bulk desolder scrap electronics). It's amazing to experience first-hand how surface tension is able to pull the capacitors back into their position when the solder melts. The percieved advantage of the wide nozzle is that I get to melt the paste on all pins approximately at the same time, so I expect the effect of locking the chip into the right position to be more pronounced than when using a small nozzle and hover over all the pins.

I've seen surface tension work well, like in some Norwegian guy's youtube videos, but I've also witnessed it work against me. However, the issue with the latter could very well have been the air flow rate being too fast and the component being too small (0603 capacitor).

Yeah, I had a similar experience. Possibly I bought a suboptimal hot-air station with an excessive minimum air-flow rate. Even at the lowest setting, the 0805 caps on the cache PCB started being blown away when they were not perfectly stuck in paste. But I was able to "blow them back" by pointing the nozzle the right way, and get them to stick at that point due to surface tension. It helps that there are no other pads very close to the caps, so they had no chance of sticking elsewhere. That's also another reason why I expect the SOJ nozzle to be a good idea: the wider the nozzle, the lower the air speed at the same volume flow rate (they included a 2mm nozzle with my station - the air speed emitted by that nozzle is ridiculous!), so the effect of blowing the component away should be non-existent with that nozzle; I'll probably even have to turn up the flow rate setting. As that station has up to 300W heating power, there should be enough power for soldering the whole SOJ at once.

feipoa wrote on 2023-06-10, 23:24:

With larger tantalum caps, I tend to get unmelted paste remaining under the the cap.

Probably that's why professional PCB assembly pre-heats the PCB.

feipoa wrote on 2023-06-10, 23:24:

If you live in the US, none of this is a problem. Most shipments just flow through to the consumer, even $1500 worth of electronics, without duties. Imagine the extra revenue the US government would receive if they taxed all imports like the rest of the world does. They could probably pay down the $31 trillion national debt a bit faster!

Yeah, it's the same in Germany as it is in Canada. You have to pay import taxes for goods you import, just as you have to pay VAT for goods you buy locally. Probably not by coincidence, the import tax rates equals the default VAT rate in Germany. I'm fine with that situation, because I don't think it's healthy either for the environment or for the local economy if there is less tax on imported goods than there is on locally produced goods. Let's not get into politics too much, though, as that is intentionally off-topic on VOGONs, which seems to work quite well to avoid a lot of heated discussion.

Reply 72 of 108, by feipoa

Rank: l33t++

Do you and Disruptor live in the same town? Seems like you are fortunate enough to be able to share hardware.

G5F8542K

This is my understanding of the part number shown on the die cap:

G = made by IBM

F = fab location. I don't know which location is F, but D = Burlington, L = East Fishkill, A = unknown, F = unknown

8 = die run or mask

5 = 1995

42 = week 42

K = die lot

Once you've seen a large enough sample of CPUs, you can tell which die cap markings correspond to 4x. Stepping/Revision needs to be read from the registers.
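The decode table above could be turned into a small script. This is a hypothetical sketch: the field meanings come from the list above, but the exact character positions are my guesses, and the second character of the marking isn't covered by the list, so it's left undecoded.

```python
# Hypothetical decoder for IBM die-cap markings like "G5F8542K".
# Field meanings follow the table above; character positions are assumed.
FAB_LOCATIONS = {"D": "Burlington", "L": "East Fishkill"}  # A, F: unknown

def decode_die_cap(code):
    assert len(code) == 8, "expected an 8-character marking"
    return {
        "maker": "IBM" if code[0] == "G" else "non-IBM?",
        "fab": FAB_LOCATIONS.get(code[2], "unknown (" + code[2] + ")"),
        "mask": code[3],                 # die run / mask
        "year": 1990 + int(code[4]),     # single digit, assumed 1990s
        "week": int(code[5:7]),
        "lot": code[7],
    }

info = decode_die_cap("G5F8542K")
# -> maker IBM, fab "F" (unknown location), mask 8, year 1995, week 42, lot K
```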

mkarcher wrote on 2023-06-11, 08:53:

Thanks for the information. Running them at 5V power and 5V I/O is in spec according to the data sheet. The "VIH" (data input high) voltage specification in that datasheet is confusing, though. It's fine that they specify that voltages between 2.2V and Vcc+0.3V are considered high (and higher voltages are forbidden). But why is it listed as "recommended operating condition" and not as "DC electrical characteristic"? And what the heck is "typical 3.5V" supposed to mean at all?

When I found this datasheet 5 years ago, I remember having a reaction similar to yours. I don't have an answer for you, but I can see why UMC didn't want their datasheets floating around.

Don't remove your Cyrix 5x86 heatsink. No new information can be determined by looking under it.

Using Peltiers is a mess due to condensation and changing conditions once the setup is placed inside an AT case with very little airflow. I'm experimenting with an 8A Peltier running at 5V now, which draws around 2A and 10W. I can get it to avoid condensation on the benchtop fine because it keeps the CPU temp at 2-3 degrees Celsius below ambient room temp, but once I place a cardboard box covering 80% of the setup (to simulate AT case airflow), the CPU can get too far above ambient (+4 C) and thus may not handle some sensitive apps. This is for my Am5x86-180 setup. Another user here, pshipkov, has found that using silicone and vaseline can avoid the condensation. I'm trying to get a setup going in an authentic AT case which will not condense and is long-term stable. I'm getting close, I think. Curiously, I don't need a Peltier on the UUD board for the Am5x86 at 180 MHz and 4V, but the LSD board seems to need the Peltier for stability. Anyway, this is a vastly different topic, but it serves to add another bonus point for the UUD board as far as I'm concerned.

mkarcher wrote on 2023-06-11, 08:53:

That's quite likely. The higher the multiplier, the more sensitive the processor is to bus bandwidth. The key benefit of the "cacheless EDO" configuration is just fast writes, because the write can be performed without needing to update the L2 cache. Anything that is read-heavy will likely prefer cache over EDO.

Good summary and it agrees with the numbers provided previously.

mkarcher wrote on 2023-06-11, 08:53:

Probably that's why professional PCB assembly pre-heats the PCB.

No doubt! Unfortunately, I am out of space to store any more equipment. Already my soldering gear is on my wife's work desk.


Reply 73 of 108, by Disruptor

Rank: Oldbie
feipoa wrote on 2023-06-11, 10:25:

Do you and Disruptor live in the same town? Seems like you are fortunate enough to be able to share hardware.

Once you've seen a large enough sample of CPUs...

No, 20 km.
But we meet very often and we basically share hardware together.

As far as the Cx5x86 is concerned, I have just one piece at all. I handed it over to mkarcher to play with.
Since we got a cheap Biostar UUD lacking some ICs, we got it running.
And when I heard about your overclocking experiments with FSB 60, we tried it with our Am5x86 too but failed to go over 160 MHz.
Just for high-FSB experiments I've obtained an Am486DX4-SVB8-120 and handed it over to mkarcher for play purposes again.

Reply 74 of 108, by mkarcher

Rank: l33t

I have assembled the cache module by now. I did not yet install a way to get A19 from the front side bus, but this shouldn't be a critical issue; it will just limit the cache size to 512K instead of 1M. When I install the module, the BIOS recognizes 512KB cache, but it crashes after printing "EDO RAM installed!". It boots with cache disabled. I started developing a cache test utility for UMC-based boards to diagnose the kind of fault. The tool still has some quirks, and I learned a lot about the L2 cache controller in the UMC chipset. UMC seems to have taken every shortcut imaginable (which makes sense, as the L2 interface is performance critical). Especially, "L2 off" doesn't disable everything about the L2 cache (EDIT: turns out I didn't follow the correct protocol to disable the L2 cache. If you disable it properly, by not only setting the enable bit to 0 but also setting the size bits to "no cache", you can disable the L2 cache!). The mode with "enable" cleared but a valid cache size set can be called "cache in standby". This mode disables just enough of the L2 logic so that it never uses data from the L2 cache, but still keeps the cache hot so it can immediately be re-enabled. Particularly, this means:

  • Every cacheable memory write performs an L2 lookup and updates the L2 on hit(*), even if the L2 enable bit is cleared
  • On reads (only), if the L2 enable bit is cleared, the tag lookup is forced to be "miss to a non-dirty cache line"
  • On every read miss(*), a cache line fill happens, even if the L2 enable bit is cleared. This means in particular that with the L2 cache enable bit cleared, every cacheable read causes a cache line fill.

For cache size determination, there is an "always hit" mode. In that mode, the tag content is ignored, and every cacheable memory cycle is served by the L2 cache. I didn't yet test whether in write-back mode the always-hit mode causes writes to never hit RAM, but I expect it to be that way. If "always hit" mode is enabled, the decisions marked with an asterisk(*) are forced, so in always-hit mode every cacheable write causes a cache update, and no cache line fill ever happens. "Always hit" has higher priority than the forced miss caused by disabling the L2 cache.
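My reading of these rules can be condensed into a small decision function. This is a sketch of my interpretation of the observed behaviour, not anything from a UMC datasheet:

```python
# Decision rules as described above (my interpretation, not a datasheet).
# "Standby" means: valid size bits set, but the enable bit cleared.
def l2_action(op, tag_hit, enabled, always_hit):
    hit = True if always_hit else tag_hit
    if op == "write":
        # writes update the L2 on a (possibly forced) hit, even in standby
        return "update-l2" if hit else "write-around"
    # reads in standby are forced to "miss to a non-dirty line"...
    if not enabled and not always_hit:
        hit = False
    if hit:
        return "serve-from-l2"
    # ...and every read miss triggers a line fill, even in standby
    return "line-fill"

# In standby, a read never serves data from L2, but it still fills the line:
assert l2_action("read", tag_hit=True, enabled=False, always_hit=False) == "line-fill"
# In standby, a cacheable write still updates the L2 on a tag hit:
assert l2_action("write", tag_hit=True, enabled=False, always_hit=False) == "update-l2"
# "Always hit" overrides the forced miss, and no fill ever happens:
assert l2_action("read", tag_hit=False, enabled=False, always_hit=True) == "serve-from-l2"
```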

With my module installed, I get expected behaviour in "always hit" mode, and I can verify that the 512KB addressable with A19 grounded seem to work fine. The preliminary cache test utility I started to develop still shows some strange output, but that output is explainable by unintended cache updates while the L2 cache is supposed to be disabled. I already know how to fix it (and thus could create a generally useful cache testing tool), but I didn't implement it yet. My preliminary cache test utility shows erratic behaviour in normal "L2 enabled" mode. This strongly hints at issues with the tag RAM chip. That chip also gets quite warm, which might be normal, as it has /CE and /OE permanently asserted, but it might also be an indication that this chip is broken. I ordered these chips from an AliExpress seller, so getting one DOA chip in a sample of 9 isn't completely unexpected. I'm going to swap it with a different chip to see if that fixes the issue. I quadruple-checked that the solder joints at the tag chip are OK by verifying that the /WE pin and all data pins make contact with the chipset and that all the address pins make contact with the address pins on the other cache chips.

At least for desoldering, the wide SOJ-type nozzle will be a great help. I intend to do the chip swap tomorrow and will report back whether it helped. Otherwise, I need to start troubleshooting with the scope, as I'm afraid the issues might not be visible on a plain digital logic analyser.

Last edited by mkarcher on 2023-06-17, 17:18. Edited 1 time in total.

Reply 75 of 108, by feipoa

Rank: l33t++

From your comments, I had some questions

1) If setting L2 to off in the BIOS just disables the L2 logic, is it preferable to physically remove the L2 cache when testing, particularly for the fastest speeds, e.g. 3-1-1-1 EDO READ, when not using L2 cache? Perhaps we can even get EDO READ 3-1-1-1 working with 64 MB EDO at 3x60 MHz if the L2 is physically removed?

2) If you've ordered counterfeit cache from China, expect a 10% failure rate. You will want to test them in an SRAM tester, if possible. Most EEPROM programmers have a simple SRAM test option. When I ordered DIP SRAM from China, 10% were bad. However, I ordered my 10 ns SOJ SRAM from Mouser or Digi-Key.

Yes, I can imagine the SOJ nozzle would be most useful for desoldering the SRAM, probably more so than for soldering it on.


Reply 76 of 108, by mkarcher

Rank: l33t
feipoa wrote on 2023-06-17, 08:41:

1) If setting L2 to off in the BIOS just disables the L2 logic, is it preferred to physically remove the L2 cache when testing, particularly for fastest speeds, e.g. 3-1-1-1 EDO READ, when not using L2 cache? Perhaps we can even get EDO READ 3-1-1-1 working with 64 MB EDO at 3x60 Mhz if the L2 is physically removed?

The L2 cache is directly connected to the 486 address and data pins ("local bus", "frontside bus", however you want to call it). Every chip connected to those lines adds some capacitance: when a line toggles from 1 to 0, or from 0 to 1, the capacitance in that chip needs to be (dis)charged. The more capacitance there is, the longer the process takes. So if you are after the last nanosecond of response time, removing L2 definitely helps. In the table in the OP I made in the winter, I was able to get EDO 3-1-1-1 at FSB40 with a Cyrix CPU only with L2 physically removed, so the theoretical point I made seems to be valid in practice, too. In the summer, I wasn't able to get EDO 3-1-1-1 at FSB40 with the Cyrix processor working at all, so the margin I had in the winter was already quite low.
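For a rough feel of the effect, here's a back-of-the-envelope estimate. All component values below are assumptions for illustration, not measurements of this board; the point is only the order of magnitude of the extra settling time per added load:

```python
# Back-of-the-envelope RC loading estimate. Driver impedance and per-pin
# input capacitance are assumed values, chosen only to show the order of
# magnitude of the extra edge settling time added by extra chips on the bus.
R_DRIVER = 30        # ohms: assumed effective driver + trace impedance
C_PER_PIN = 6e-12    # farads: assumed input capacitance per SRAM pin
FSB_PERIOD_NS = 1e9 / 60e6   # one FSB60 clock is about 16.7 ns

for extra_chips in (0, 4, 9):   # e.g. a 512KB bank: 8 data chips + 1 tag chip
    t_rise = 2.2 * R_DRIVER * C_PER_PIN * extra_chips  # 10%-90% RC rise time
    print(f"{extra_chips} extra loads: ~{t_rise * 1e9:.1f} ns slower edges "
          f"({100 * t_rise * 1e9 / FSB_PERIOD_NS:.0f}% of an FSB60 clock)")
```

Even with these guessed values, nine extra loads eat a noticeable fraction of a 16.7 ns FSB60 clock, which is consistent with removing the L2 helping at the tightest timings.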

Thinking more about the UMC chipset behaviour, it might not just be about simplifying the chipset logic (which is a valid goal); it also extends the "always valid" concept. (EDIT: the mode I assumed to be "cache disabled" seems to be specifically designed for "cache in hot standby", i.e. you get the low performance of L2 disabled, but you can re-enable L2 without any need to re-initialize it. There is a different "actually disabled" mode.) Most 386/486/Pentium chipsets do not have a bit indicating whether some part of the cache (a cache line) contains valid data or not. They get away with it by making sure the data is always valid while the cache is operating. This means every write must be mirrored to the cache, whether processor-originated or DMA-originated. You can't just invalidate a cache line that had a write happen to it. The way the UMC8881 implements the caching logic, the cache contents are still kept valid while the cache is "disabled", which allows for just setting the enable bit again without any need to re-initialize the cache.

Having the cache get filled on reads also makes it possible to optimize initializing the cache. The classic way to initialize an "always valid" L2 cache is to read twice the cache size of cacheable memory. After reading once the cache size, you know that the tag RAM for all cache lines contains the address of the data just read. You don't know whether the data RAM is valid, though, because the tag RAM might by coincidence already have held the right value, so the cycle would be treated as a cache hit and the data RAM wouldn't get filled. By reading another chunk of memory that also has the size of the L2 cache, you can be sure that all reads are misses this time (the tags have been initialized to point to the first chunk), and the data RAM will surely get filled. The UMC strategy ensures that with the cache disabled, every read also initializes the cache line, so reading the cache size once before setting the cache enable bit is already enough to have the whole cache contain valid data.
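The initialization argument can be checked with a toy direct-mapped model. This is a sketch of the reasoning above, not of real hardware:

```python
import random

# Toy direct-mapped cache model of the initialization argument above.
# A "classic" always-valid cache fills only on a miss, so during the first
# pass a coincidental tag match leaves stale data behind; the UMC standby
# behaviour forces every read to fill, so a single pass is enough.
LINES = 256

def fully_initialized(force_miss_and_fill, n_passes):
    tags = [random.randrange(4) for _ in range(LINES)]  # garbage at power-on
    data_filled = [False] * LINES
    for p in range(n_passes):
        # pass p reads a cache-sized chunk whose addresses carry tag value p
        for line in range(LINES):
            if force_miss_and_fill or tags[line] != p:
                tags[line] = p            # miss: tag and data both written
                data_filled[line] = True
            # else: treated as a hit, data RAM trusted but never written
    return all(data_filled)

random.seed(1)
print(fully_initialized(False, n_passes=1))  # almost surely False: stale lines
print(fully_initialized(False, n_passes=2))  # True: second pass misses everywhere
print(fully_initialized(True,  n_passes=1))  # True: UMC-style forced fill
```

After the first pass every tag holds the first chunk's value, so the second pass is guaranteed all-miss, which is exactly the "read twice the cache size" rule; the forced-fill variant needs only one pass.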

Of course, the "keep the cache valid even if disabled" strategy only works if the cache timings work. Setting the cache timings to 2-1-1-1 and running at FSB66 might generate some signals to the cache chips, but (unless you have extremely fast chips) is not able to ensure the cache contains valid data. As the cache contents are never used while the cache is disabled, the invalid contents don't disturb operation, just as is the case if the cache is missing or broken.

And finally, this explains why you must not use a 3-x-x-x cache burst together with 0WS DRAM writes (as I noted in one of my early posts in this thread) when the L2 cache is disabled. A DRAM write in that configuration takes just two clocks and starts immediately when the cycle begins, but updating the cache takes 3 clocks, so the cache update is still running when the next cycle might start. My Biostar BIOS (a slightly edited version of your 2014 one, as we found out) doesn't enforce a 2-x-x-x burst with 0WS and L2 disabled, so you need to consider this restriction yourself when you configure an L2-less system. (EDIT: The 3-x-x-x issue also applies in the "cache off" mode, not just in "cache standby" mode. So this definitely is a quirk in the chipset.)
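The clock budget behind that restriction, spelled out with the numbers from the paragraph above:

```python
# Clock budget from the paragraph above: with 0WS writes, a new DRAM write
# cycle can start every 2 clocks, but a 3-x-x-x leadoff means the SRAM update
# for the previous write needs 3 clocks and is still busy when the next
# cycle begins.
DRAM_WRITE_CLOCKS = 2    # 0WS write cycle completes in 2 clocks
CACHE_UPDATE_CLOCKS = 3  # 3-x-x-x leadoff: cache update takes 3 clocks
overlap = CACHE_UPDATE_CLOCKS - DRAM_WRITE_CLOCKS
assert overlap > 0       # hence: pair 0WS writes with a 2-x-x-x burst
print(f"cache update overruns the next write cycle by {overlap} clock")
```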

feipoa wrote on 2023-06-17, 08:41:

2) If you've ordered counterfeit cache from China, expect a 10% failure rate. You will want to test them in an SRAM tester, if possible. Most eeprom programmers have a simple SRAM test option. When I ordered DIP SRAM from china, 10% were bad. However, ordered my 10 ns SOJ SRAM from mouser or digikey,

I have a good old HiLo ALL-03A EPROM/GAL/Flash programmer. It might be a good idea to add a SOJ-to-DIP adapter to it, to be able to test the SRAM chips. I'm unsure whether my chips are factory rejects, new old stock that ended up at a Chinese broker, or relabeled chips from scrap boards. The rate of "one chip out of nine doesn't work" is expected, as I know the 10% failure rate reported by various VOGONS users. I'm not complaining, though, as the missing quality control is part of the reason why they are able to sell the chips that cheaply. Getting all 8 data chips working (I'm quite sure my cache experiments confirmed the data RAM to be working) and the tag chip to be completely useless still surprises me. If exactly one of nine chips is broken, the chance of the one broken chip landing on the tag position is just around 11%.
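For the record, the two probabilities in play here, using the roughly 10% per-chip failure rate mentioned above:

```python
from fractions import Fraction

# With a ~10% per-chip failure rate: the chance that exactly one chip in a
# batch of nine is bad, and, given exactly one bad chip, the chance that it
# ends up in the tag position rather than one of the eight data positions.
p_bad = 0.10
p_exactly_one = 9 * p_bad * (1 - p_bad) ** 8   # binomial: k=1 of n=9
p_tag_given_one_bad = Fraction(1, 9)           # each position equally likely
print(f"P(exactly 1 of 9 bad) = {p_exactly_one:.1%}")                 # ~38.7%
print(f"P(bad one is the tag | 1 bad) = {float(p_tag_given_one_bad):.1%}")  # ~11.1%
```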

I just swapped the tag chip, but it still doesn't work, so the theory of a DOA tag RAM chip just got more unlikely. Time to crank out the scope. Designing a cache adapter board shouldn't be that difficult, so I need to find out what dumb mistake I made...

feipoa wrote on 2023-06-17, 08:41:

Yes, I can image the SOJ-nozzle would be most useful for desoldering the SRAM, probably more so than with soldering on the SRAM.

Desoldering was easy. It works quite well for soldering, too. You have to add weight to the chip, because otherwise the chip will lift off as soon as the paste starts to melt. There is a time window in which the paste isn't sticky enough to hold the chip, but the solder hasn't yet formed a blob to keep the chip in place via surface tension. With a nozzle that blows air below the chip (where you need the heat for SOJ soldering), this inevitably generates some lift.

Last edited by mkarcher on 2023-06-17, 17:23. Edited 1 time in total.

Reply 77 of 108, by mkarcher

Rank: l33t
feipoa wrote on 2023-06-17, 08:41:

1) If setting L2 to off in the BIOS just disables the L2 logic, is it preferred to physically remove the L2 cache when testing, particularly for fastest speeds, e.g. 3-1-1-1 EDO READ, when not using L2 cache? Perhaps we can even get EDO READ 3-1-1-1 working with 64 MB EDO at 3x60 Mhz if the L2 is physically removed?

Update: Whatever I wrote about electrical loading (the most important factor) is still true.

Whatever I wrote about not configuring a 3-cycle leadoff with 0WS writes and L2 disabled is still true.

But there is a distinct "cache disabled" mode that does not touch the data SRAM, which is different from the "cache in standby" mode I presumed to be the "most disabled" mode that is available.

Reply 78 of 108, by jakethompson1

Rank: Oldbie
mkarcher wrote on 2023-06-17, 11:10:

Most 386/486/Pentium chipsets do not have a bit indicating whether some part of the cache (a cache line) contains valid data or not. They get away with it by making sure the data is always valid while the cache is operating. This means every write must be mirrored to the cache, processor-originated or DMA originated. You can't just invalidate a cache line that had a write happen to it. The way the UMC8881 implements the caching logic, the cache contents are still kept valid while the cache is "disabled", which allows for just setting the enable bit again without any need to re-initialize the cache.

Interesting. Avoiding the need to fill the entire cache before enabling it aside, I wondered why there would be any advantage to having a valid bit, but the DMA issue you mention is one. I guess that would crop up a lot more on multiprocessor systems as well. I have also read somewhere that you obviously need valid bits in addition to a tag if you want your cache to support partially filled lines. I don't think such support would make sense on a 486, since it would defeat the point of 2-1-1-1 burst read mode, but I suppose it could be useful on a 386.

Reply 79 of 108, by feipoa

Rank: l33t++

Very interesting discovery about cache contents being kept valid while it is disabled.

I will try EDO 3-1-1-1 at FSB60 and 0ws/0ws with L2 removed. Since the margin for Cyrix != margin for AMD, I think this test is worth a shot. Or have you already tried this at 2x60 w/Am5x86?

mkarcher wrote on 2023-06-17, 11:10:

I have a good old HiLo ALL-03A EPROM/GAL/Flash programmer. It might be a good idea to add a SOJ-to-DIP adapter to it, to be able to test the SRAM chips.

Do you know where to buy one? I have a two different programmers with 20 or so adaptors, but neither had a SOJ 300mil to DIP adaptor.

mkarcher wrote on 2023-06-17, 17:20:

But there is a distinct "cache disabled" mode that does not touch the data SRAM, which is different from the "cache in standby" mode I presumed to be the "most disabled" mode that is available.

When I set L2 to disabled in the BIOS (and still have L2 installed), is the L2 cache in "cache disabled mode"?

What mode is the L2 in when I set L2 to disabled in the BIOS but do not have L2 installed?
