My adventure into 386 upgrades started in the mid-90s when I purchased a second-hand 386 board from a friend. It had an AMD 386DX-40. For a long time, I thought this was the fastest CPU for the ole 386 motherboard. Much later, I realised that there was a Cyrix 486DLC which also ran at 40 MHz but contained 1 KB of on-CPU cache.
The attachment Cyrix_DLC.jpg is no longer available
More time passed and I was informed about the Texas Instruments 486SXL, which is nearly identical to the 486DLC but has an incredible 8 KB of L1 cache. Naturally, I wanted to take the machine to the max, so I upgraded again.
The attachment Texas_Instruments_SXL.jpg is no longer available
More time passed and I was able to source a clock-doubled version of the 486SXL-40, called the 486SXL2-50. I paid the $25 and upgraded yet again. I didn't really like how it ran with a 25 MHz FSB compared to the SXL-40 with its 40 MHz FSB. Obviously, the memory and L2 cache run slower with a 25 MHz FSB. However, recent testing has revealed that this is of little consequence compared to the faster CPU speed of 50 MHz. Some boards really shine with the SXL2-50, while others show only marginal improvement over the SXL-40. I am still working on a large 386 chipset benchmark comparison with all these high-end CPUs, so stay tuned.
Originally, I was running the SXL2-50 with a Cyrix FasMath 83D87 at only 25 MHz. This bothered me. Unfortunately, only Intel i387DX FPUs can run asynchronously. Nonetheless, I sourced the Intel i387DX and ran it at 40 MHz. To my dismay, FPU benchmarks ran slower with the i387DX at 40 MHz than with the Cyrix FasMath at 25 MHz. There exist clock-doubled FPUs from IIT and ULSI. (Side note: IITs are slightly faster than ULSIs clock-for-clock.) I sourced an IIT x2 and ran it with the SXL2-50. Luckily, the FPU results were faster than with the non-doubled Cyrix FasMath.
The attachment Texas_Instruments_SXL2.jpg is no longer available
Wanting more out of my 5 V SXL2, I was able to establish that running it at 55.25 MHz is stable. Unfortunately, the IIT x2 couldn't cut it at 55 MHz, so it was back to the Cyrix FasMath. Luckily, the SXL2-55 combined with the Cyrix FasMath at 27.6 MHz (55.25 / 2) is equal in FPU performance to the IIT x2 at 50 MHz, so no loss.
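For anyone following the clock arithmetic, here is a minimal sketch of how the bus and FPU clocks fall out of the core clock on a clock-doubled chip. It assumes, as described above, that a non-doubled FPU such as the FasMath simply runs at the bus clock; the function name is my own and purely illustrative.

```python
def bus_clock_mhz(core_mhz, multiplier=2):
    """Bus (FSB) clock for a clock-multiplied CPU.

    Assumption: a non-doubled FPU runs at this same bus clock.
    """
    return core_mhz / multiplier


# The two SXL2 speeds discussed in this post.
for core in (50.0, 55.25):
    print(f"core {core} MHz -> bus/FPU {bus_clock_mhz(core):.3f} MHz")
```

The 27.6 MHz figure quoted for the FasMath is just 55.25 / 2 rounded to one decimal place.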
There was also a low-voltage version of the TI486SXL2 which ran at 3.6 V. It came in PGA168 and QFP144 formats. Shown above is the QFP144 variant placed onto a PGA132 upgrade module, which also contains a voltage regulator to drop the motherboard's 5 V down to 3.6 V. It is rated for 66 MHz (clock-doubled), but it also runs well at 2x40, or 80 MHz. The QFP144 to PGA132 upgrade module is difficult to find; however, the PGA168 CPUs are readily available even today. For this reason, it would be nice if there were an upgrade module which converted PGA168 to PGA132, specifically for the 486SXL2-66 PGA168; see the thread "Custom interposer module for TI486SXL2-66 PGA168 to PGA132 - HELP!". Unfortunately, the SXL2-66 would not work in the Transcomputer module.
Some 386 boards have built-in support for maintaining cache coherency between the DLC/SXL's onboard L1 cache and system memory, while others require software or hardware work-arounds, e.g. adding an external NAND FLUSH circuit, using the BARB register to invalidate the cache at every hold, or merely enabling the L1 cache via software and opening up the cacheable region. Thus far, all my motherboards work with the L1 cache of these CPUs using software to enable it.
There was also a Cyrix 486DRx2, which was rated at speeds of 50 MHz and 66 MHz. These chips supposedly have the cache coherency circuit built into the unit. Unfortunately, they only have 1 KB of L1 cache. Is a Cyrix 486DRx2-66 with 1 KB of cache faster than a Texas Instruments 486SXL2-50 with 8 KB of cache? They are pretty darn close, but it depends on the motherboard. On my VIA 481/495-based board, the SXL2-50 is a good bit faster than the DRx2-66; however, on my UMC 481/482 board, the SXL2 needs to be run at 55 MHz to catch up to the DRx2-66. On my SiS 310/320/330 board, the DRx2-66 is faster than even the SXL2-55. More on this chipset vs. CPU comparison in the months to come.
The attachment Cyrix_DRx2.jpg is no longer available
Recently, I acquired an IBM Blue Lightning BL3, which is similar to the DLC/SXL/DRx, but added 16 KB of L1 cache in write-back mode (as opposed to write-through mode on the DLC/SXL/DRx). The IBM BL3 also added dozens of fancy register settings which CTCHIP34 is able to modify. Like the DLC/SXL/DRx chips, the BL3 still contains a 386 core with 486 instructions and requires an external FPU. There is a driver utility from Evergreen Technologies and one from Kingston which can set up these registers for you at boot. For the register settings I use with each chip, refer to the thread "Register settings for various CPUs".
The attachment IBM_Blue_Lightning_BL3.jpg is no longer available
The BL3 contains a variable multiplier of 1x, 2x, and 3x, while the DRx2 and SXL2 contain a 1x and 2x multiplier option. The Buffalo BL3 module would not run at 2x40, but ran at 3x33 MHz. The BL3 module shown in the middle would run at 2x40, depending on which motherboard it was placed in. It would also POST at 120 MHz; however, it was not stable in Windows 3.11. I have sourced 70, 72, and 74.5 MHz crystal oscillators so that I can see if it is stable at 105, 108, and 112 MHz, respectively.
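The oscillator-to-core-speed figures above can be checked with a quick sketch. It assumes the crystal oscillator runs at twice the bus clock (which is what the 70 -> 105 MHz figures imply at 3x) and that the BL3 then multiplies the bus clock; the function and parameter names are hypothetical, chosen only for illustration.

```python
def core_mhz(oscillator_mhz, multiplier, osc_divider=2):
    """Core clock from the crystal oscillator frequency.

    Assumption: bus clock = oscillator / 2, inferred from the numbers
    in this post (e.g. a 70 MHz oscillator giving 105 MHz at 3x).
    """
    bus_mhz = oscillator_mhz / osc_divider
    return bus_mhz * multiplier


# The three oscillators mentioned above, with the BL3 set to 3x.
for osc in (70.0, 72.0, 74.5):
    print(f"{osc} MHz oscillator -> {core_mhz(osc, 3):.2f} MHz core at 3x")
```

Under that assumption the 74.5 MHz oscillator works out to 111.75 MHz, which is the "112 MHz" quoted above after rounding.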
Lastly, there was the Intel RapidCAD, RAPIDCAD-1 SZ624 and RAPIDCAD-2 SZ625. I believe the RapidCAD was essentially a 486DX-33 CPU without any L1 cache. The lack of L1 cache cripples performance significantly.