I am missing any help/any utility to enable/disable the internal CPU cache of the Cyrix Cx486DX2 (in fact it is a clone named "It's ST 486DX2-80") in a reliable way.
Just use any tool that can enable/disable the internal CPU cache of an Intel 486 processor. The algorithm to enable/disable the L1 cache on the Cx486DX / Cx486DX2 is identical to the algorithm used on any other standard 486 processor. It's toggling bits 30 and 29 of CR0. In case you want to toggle the L1 cache of that Cyrix CPU between WB and WT mode, you might want to use https://github.com/karcherm/cx486wb (full disclosure: I'm the author of that tool). The release ZIP file contains the assembled version of the three utilities.
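To make the "bits 30 and 29 of CR0" remark concrete, here is a minimal sketch of the bit arithmetic only. Python cannot touch CR0; a real tool reads and writes the register with privileged MOV instructions and flushes the cache around the change. The helper names are made up for this example. The bit positions are the standard 486 ones: CD (cache disable) is bit 30 and NW (not write-through) is bit 29.

```python
CR0_NW = 1 << 29  # Not Write-through
CR0_CD = 1 << 30  # Cache Disable


def cr0_with_cache_enabled(cr0: int) -> int:
    """Return the CR0 value with CD and NW cleared (L1 cache on)."""
    return cr0 & ~(CR0_CD | CR0_NW) & 0xFFFFFFFF


def cr0_with_cache_disabled(cr0: int) -> int:
    """Return the CR0 value with CD and NW set (L1 cache off)."""
    return (cr0 | CR0_CD | CR0_NW) & 0xFFFFFFFF


def l1_enabled(cr0: int) -> bool:
    """True when neither CD nor NW is set."""
    return (cr0 & (CR0_CD | CR0_NW)) == 0


if __name__ == "__main__":
    cr0 = 0x00000010  # hypothetical starting value, real mode with ET set
    off = cr0_with_cache_disabled(cr0)
    print(f"cache off: {off:#010x}, enabled={l1_enabled(off)}")
    on = cr0_with_cache_enabled(off)
    print(f"cache on:  {on:#010x}, enabled={l1_enabled(on)}")
```

All other CR0 bits are left untouched, which is the point of doing a read-modify-write instead of loading a fixed value.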
I am missing any help/any utility to enable/disable the internal CPU cache of the Cyrix Cx486DX2 (in fact it is a clone named "It's ST 486DX2-80") in a reliable way.
I've tested several 486DLC tools, but they do not work or do not even start (because it's of course not a 486DLC), the 5x86 utilities are not working for the same reasons.
For sure the Cyrix Cx486DX2 has an internal CPU cache and yes, the documentation of the CPU says it can be enabled or disabled (disabled by default unfortunately), but there is no dedicated utility software for exactly this CPU...
In addition to mkarcher's post, after enabling WB cache, please try DMA access, like writing to a floppy disc. Please use a floppy disc with unimportant data.
Please verify the data written to the floppy disc with cache set back to WT.
[snip]
Speedsys has a built-in function to export what you see on the screen to a PCX image: press the R key when the tests are done. You can then convert the saved PCX image to PNG (or GIF) and attach it to the forum.
Oh about that, quite embarrassingly I did export the PCX files, Just that like an idiot I forgot to actually transfer the files from my PM7500 to my PC. I'm working on it atm..
@Disruptor - yes that's planned, I'm just having some issues with transferring files, but normally in a few days, I'll be able to post some results. Although given that it's not really a full PC (it's kind of what a Softmodem is), there might be some unusual things cropping up. (like 5x86.exe causing the whole card to lock up when called using LH in autoexec.bat)
Proud owner of a Shuttle HOT-555A 430VX motherboard and two wonderful retro laptops, namely a Compaq Armada 1700 [nonfunctional] and a HP Omnibook XE3-GC [fully working :p]
I am missing any help/any utility to enable/disable the internal CPU cache of the Cyrix Cx486DX2 (in fact it is a clone named "It's ST 486DX2-80") in a reliable way.
Just use any tool that can enable/disable the internal CPU cache of an Intel 486 processor. The algorithm to enable/disable the L1 cache on the Cx486DX / Cx486DX2 is identical to the algorithm used on any other standard 486 processor. It's toggling bits 30 and 29 of CR0. In case you want to toggle the L1 cache of that Cyrix CPU between WB and WT mode, you might want to use https://github.com/karcherm/cx486wb (full disclosure: I'm the author of that tool). The release ZIP file contains the assembled version of the three utilities.
Mea culpa, but I expressed myself not exact enough... yes, I liked to enable the write back cache, not only the CPU cache. In fact, I already found WBOFF (and WBON of course), and yes, it worked fortunately, thanks.
I am still missing the possibility to enable additional CPU settings, e.g. Burst Write Cycle (BWRT). Or I just didn't understand the Cyrix documents (these documents were related to implement settings in the BIOS).
I am missing any help/any utility to enable/disable the internal CPU cache of the Cyrix Cx486DX2 (in fact it is a clone named "It's ST 486DX2-80") in a reliable way.
I've tested several 486DLC tools, but they do not work or do not even start (because it's of course not a 486DLC), the 5x86 utilities are not working for the same reasons.
For sure the Cyrix Cx486DX2 has an internal CPU cache and yes, the documentation of the CPU says it can be enabled or disabled (disabled by default unfortunately), but there is no dedicated utility software for exactly this CPU...
In addition to mkarcher's post, after enabling WB cache, please try DMA access, like writing to a floppy disc. Please use a floppy disc with unimportant data.
Please verify the data written to the floppy disc with cache set back to WT.
Thanks for the hint, I guess you already had a bad experience with it. Will test it, too.
So, as promised I ran the DOS benchmark suite on the Apple 7in PC card, and while most of the stuff ran pretty well, there's a few outliers that crashed or locked the whole PC up.
Quake: Out of memory error at startup, attempting to start from a fresh MS-DOS bootdisk yields the same thing.
Landmark: Starts up fine but then got an EMM386 crash. From a bootdisk it runs fine and reports that it's running on a 459MHz AT with a 925MHz 287 (LOL)
TOPBENCH: Crash, regardless of using the bootdisk or with or without register enhancements.
DOOM and the other benchmarks ran fine without any weird things, regardless of register enhancements being enabled or not.
If you are using a DMA-capable SCSI controller, such as the Adaptec AHA-1540/1542, and the SXL, SXL2, or DRx2, you will need to use the BARB method to invalidate the cache
Wouldn't the Adaptec also generate /MEMW, so the /FLUSH circuit works as is? Why would you need the BARB fallback for an ISA bus master?
A20M - In Cyrix_DLC_hardware_mods_by_Ernie_van_der_Meer he states that enabling the A20M signal meant he had to disable Chipset Fast A20 Gate support. So the tradeoff is between slow A20 masking versus uncached first 64 KBytes of every Megabyte? How often does A20 masking even occur in normal DOS operation? Isn't it only when someone is explicitly using the high memory area (HMA) in real mode? And is wiring a proper A20M always beneficial?
Did anyone ever run BARB fallback cost benchmarks? For example Doom with Sound Blaster enabled should generate DMA Read commands, with BARB enabled those will flush cache. With proper /FLUSH circuit those will not generate cache flushes. How much of a difference there is between BARB and FLUSH in DOOM with sound?
Hi rasz_pl. Thanks for pointing this out. I have modified my wording to be more general. It is now written as:
If you are using a DMA-capable SCSI controller, such as the Adaptec AHA-1540/1542, and the SXL, SXL2, or DRx2, you may need to use the BARB method to invalidate the cache, cyrix.exe -b. This was my experience using this controller on an AMI Mark V Baby Screamer motherboard w/PGA132 SXL2.
It has been quite some time since I reviewed Van der Meer's document in detail, probably 10 years or more. However, at the time, I remember modifying several 386 motherboards to ensure I was compliant with the modifications in that document. I did not analyse the technical details behind the "whys". You certainly have a better memory than I do about this subject. My work is more experimental in nature, with the results mostly anecdotal and based on these experiments. My memory is as follows:
When using AHA-1542CP and PGA132 SXL2-50, I needed to set BARB to get reliable operation in Windows 3.11. DOS looked like it would work with FLUSH, but not WINDOWS. On a few occasions, I wired up a NAND gate to implement the FLUSH circuit as shown in the SXL2 databook appendix. I don't recall it making much of a difference with the ability to use FLUSH.
Once I found the Evergreen SXL2-66 upgrade adaptor, which contains a PAL with various logic, I was able to use flush reliably w/AHA-1542CP. And the Evergreen upgrade adaptor does not have a header for connecting MEMW# to ISA MEMW#. Unfortunately, I don't recall the outcome of connecting the custom PGA168 interposer for the SXL2-66. That adaptor is too tall to fit under my HDD/floppy mounting cage in this system.
Concerning the loss of performance when using BARB vs. FLUSH, I recall it being about 4% in doom, but I think that was without sound. I recall reading other testers indicating they saw a 10% hit, but I don't recall in what benchmark. This might be worth re-evaluating.
If you were wanting to experiment with this further, I can modify my existing testbed setup to use the AHA-1540CP instead of the AHA-1522B. I have an Evergreen SXL2-66, PGA132 SXL2-50, DRx2-66, the custom PGA168 SXL2-66 adaptor, and the Improve-It PGA168 to PGA132 SXL2 board (5V only). Unfortunately, I don't have another AHA-1542CP w/floppy connector, so to add floppy testing, I'd need to configure an ISA multi-I/O card.
I remember some motherboards were fussy about the FPU they could use w/DRx2 or SXL2. The AMI Mark V Baby Screamer was one such fussy motherboard.
Plan your life wisely, you'll be dead before you know it.
Most of the problems with L1 happen when running Windows.
Instead of not caching 64K at each megabyte boundary, I found it normally sufficient to just not cache the first 64K at the 1 MB boundary.
Looking through my 12 year old notes, seems the van der meer doc recommends:
LOCK# pullup to Vcc (this is usually 5K or 10K pullup)
ADS# pullup to Vcc
A20M# wire A20M# on CPU to KBC gate A20
FLUSH# generally uses a 74F00 NAND gate IC. Pins to connect to the 74F00: MEMW# on ISA slot, HLDA on CPU, and FLUSH# on CPU. Some 386 boards have this connected, some older boards omit this. The 74F00 is needed when using the PGA132 SXL2. PGA168 SXL2 or QFP SXL2 can just wire FLUSH# to MEMW# on the ISA slot. The SXL2 databook mentions that sometimes HLDA should be taken from the chipset itself (serial, rather than parallel, L2 cache)
On motherboards which contain this NAND flush circuit, I did not need to enable FLUSH# using cyrix.exe Does this make sense to you?
When using AHA-1522B on several boards without the NAND flush circuit, it is adequate to enable cyrix.exe FLUSH# with BARB disabled. Does this make sense?
BARB method invalidates cache when CPU is in hold state. What triggers flush to invalidate ? DMA from northbridge, DMA from ISA, CPU hold state...? I wrote out the truth table from the appendix for the flush circuit, but don't really follow the logic. In general, I mostly focused on what works and what doesn't and I found trying to understand it with several motherboards was getting frustrating. Some reasons for why that is are as follows:
Cyrix provides some A20M_TST and DMA_TST DOS utilities to ensure L1 is working properly, however, I found this does not tell the whole story. Your system can pass these tests, yet still fail running Windows 3.1
On my AMI Mark V Baby Screamer, my notes indicate that gate A20 on KBC connects directly to Vcc. It's not left open like on some boards. Other notes indicate "A20M_TST fails unless using AMISETUP to set the Turbo Switch Function to Enabled, then all tests pass... tried later to enable Turbo using AMISETUP and now A20M_TST fails... why?"
Looking at another motherboard, the Peak/DM, I have "DLC & SXL must use BARB w/1540CP in Windows. DRx2 OK to use FLUSH w/1540CP in Windows. 1520B OK to use FLUSH" on other boards, i wrote "must use FLUSH with DRx2"
I have dozens of pages of old notes and my head is starting to hurt understanding what I wrote and writing them in any coherent manner here. It might be best to start over if there's sufficient interest.
Instead of not caching 64K at each megabyte boundary, I found it normally sufficient to just not cache the first 64K at the 1 MB boundary.
This would work if indeed only DOS HMA and some obscure programs ever needed to use A20, on the other hand Intel added /A20M pin to 486 and kept support for it all the way to Haswell (according to wiki). Why would Intel care so much to dedicate whole precious expensive pin instead of just disabling caching of one stupid 64KB block?
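For anyone following along, the wraparound that A20 masking controls can be sketched in a few lines. This is only an illustration of real-mode address arithmetic, not anything taken from the Cyrix documents; the function name is hypothetical. With A20 masked, address bit 20 is forced to 0, so the 64 KB just above 1 MB aliases onto the first 64 KB. That is exactly the range a cache has to be careful about if it never sees the mask.

```python
def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode physical address, optionally with address bit 20 forced low."""
    addr = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        addr &= ~(1 << 20)  # A20 masked: bit 20 never reaches the bus
    return addr


if __name__ == "__main__":
    for a20 in (True, False):
        first = physical_address(0xFFFF, 0x0010, a20)
        last = physical_address(0xFFFF, 0xFFFF, a20)
        print(f"A20 {'on ' if a20 else 'off'}: FFFF:0010 -> {first:#08x}, "
              f"FFFF:FFFF -> {last:#08x}")
```

With A20 on, FFFF:0010 lands at 0x100000 (start of the HMA); with A20 off, the same segment:offset pair lands at 0x000000.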
On motherboards which contain this NAND flush circuit, I did not need to enable FLUSH# using cyrix.exe Does this make sense to you?
Yes, the FLUSH circuit is pretty easy to understand. Pulse the /FLUSH pin every time there is a DMA Write on the ISA bus = CPU drops its Write Thru L1 cache; after the end of DMA the CPU starts refilling L1 from scratch, not losing coherency with fresh data from the floppy.
Makes me wonder why not directly snoop the data bus instead? Maybe some chipsets didn't expose the CPU to data/address bus traffic during DMA, hiding it behind some additional buffers, or had a dedicated ISA-RAM path? Or maybe it's not that easy to know when to sample the bus? All in all, dropping the L1 cache only on DMA Writes is not a big deal; those will only ever happen on FDD activity and that doesn't happen all that often.
When using AHA-1522B on several boards without the NAND flush circuit, it is adequate to enable cyrix.exe FLUSH# with BARB disabled. Does this make sense?
Yes, but only on boards with the FLUSH circuit built in or integrated in the chipset.
BARB method invalidates cache when CPU is in hold state. What triggers flush to invalidate ? DMA from northbridge, DMA from ISA, CPU hold state...?
------------------------------
I always believed that on 386 boards there is only one mechanism for arbitrating DMA - with HOLD signal, but apparently some motherboards do it differently https://www.os2museum.com/wp/386-cache-coherency/ ? I dont even know why and how would one implement that? The only other signals capable of stopping CPU is /READY. For that to work CPU would have to be isolated from ram/chipset bus completely. I dont buy it.
Linked weird Cyrix patent https://patents.google.com/patent/US5724549A/en for flushing cache without HOLD is for:
- multi-master system where interrupts are used to synchronize events between bus masters
- multi-master system where polled I/O is used to synchronize events between bus masters
- multi-master system where polled memory-mapped I/O is used to synchronize events between bus masters
aka SMP system with multiple 386 CPUs on a shared bus :0 WTF Cyrix you crazy bastards why patent this in 1998 EDIT: patent filed in 1993-10-01 that makes much more sense. I remember someone selling a big box with tens of 386 back in the Eighties but cant find it now, what I could find is this 386_Junkie’s duo core 386; The modified Systempro! Compaq SystemPro would definitely require device from the patent to function with Cyrix L1 Cache enabled.
----------------------
Back to the normal world: in normal designs, every time the chipset wants the bus it asks the CPU to disengage with the HOLD pin. Built-in DMA (used by FDD and Sound Blaster), an ISA bus master card (Adaptec) or even the chipset refreshing RAM (on older chipsets lacking hidden refresh) all trigger this. The CPU can't tell the difference, so the answer is YES 😀 In BARB mode any one of those, no matter if Write or Read, or even not doing anything (Turbo is sometimes controlled by pulsing HOLD), will flush the whole L1.
Cyrix documentation says "Invalidate every time the CPU enters a HOLD state" "during each hold acknowledge cycle by setting the BARB bit"
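To put the difference between the two modes in numbers, here is a toy model, assuming a working FLUSH circuit that only fires on DMA writes. The event names and the sample trace are invented for illustration and are not measurements from any board.

```python
# Bus events that make the chipset assert HOLD in this simplified model.
HOLD_EVENTS = {"dma_read", "dma_write", "refresh"}


def count_invalidations(events, mode):
    """Count L1 invalidations for a list of bus events.

    mode "barb":  invalidate on every HOLD, whatever caused it.
    mode "flush": invalidate only when a DMA write to memory happens.
    """
    if mode not in ("barb", "flush"):
        raise ValueError(f"unknown mode: {mode}")
    count = 0
    for event in events:
        if mode == "barb" and event in HOLD_EVENTS:
            count += 1
        elif mode == "flush" and event == "dma_write":
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical trace: sound card DMA reads, one floppy DMA write,
    # a few non-hidden refresh cycles, and ordinary CPU bus cycles.
    trace = ["dma_read"] * 8 + ["dma_write"] + ["refresh"] * 3 + ["cpu"] * 20
    print("BARB :", count_invalidations(trace, "barb"))
    print("FLUSH:", count_invalidations(trace, "flush"))
```

In this made-up trace BARB throws the cache away on every HOLD, while FLUSH does so once, on the single DMA write.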
I wrote out the truth table from the appendix for the flush circuit, but don't really follow the logic.
The logic checks whether these two are true at the same time:
- HOLD is active = Chipset tells CPU to shut up and pause = we are inside a DMA cycle
- MEMWR is active = something on ISA bus is writing to ram
Obvious benefit is Reading with DMA (sound blaster) no longer flushes cache, neither does memory refresh on earlier chipsets (lacking hidden refresh).
Requires two logic gates, either one inverter and one NAND or two NANDs. 1993_TI486_Microprocessor_Reference_Guide.pdf has ~same diagram as Ernie_van_der_Meer document.
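The two-gate circuit described above can be written out as a truth table. This is a sketch under the assumption that the hold condition is taken from HLDA (active high, as in the pin list earlier in the thread), that MEMW# is active low, and that the first NAND is wired as an inverter; the function names are only for this example.

```python
def nand(a: bool, b: bool) -> bool:
    """One gate of a quad NAND package such as the 74F00."""
    return not (a and b)


def flush_n(hlda: bool, memw_n: bool) -> bool:
    """Level on the CPU FLUSH# pin (False means asserted, i.e. flush)."""
    memw_active = nand(memw_n, memw_n)  # NAND used as an inverter
    return nand(hlda, memw_active)      # low only when both are active


if __name__ == "__main__":
    print("HLDA  MEMW#  FLUSH#")
    for hlda in (False, True):
        for memw_n in (False, True):
            print(f"{int(hlda):4d}  {int(memw_n):5d}  {int(flush_n(hlda, memw_n)):6d}")
```

FLUSH# only goes low in the one row where the CPU has given up the bus and something is writing to memory, which is why DMA reads and refresh cycles no longer cost a cache flush.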
Cyrix provides some A20M_TST and DMA_TST DOS utilities to ensure L1 is working properly, however, I found this does not tell the whole story. Your system can pass these tests, yet still fail running Windows 3.1
Going back to "windows makes problems". Are you running in Standard mode with some real mode drivers loaded that could switch the A20 gate? I don't know if Windows switches CPU modes on its own; afaik Win 3.11 Enhanced mode and WfW always run in a protected mode VM and never go to real mode, everything is either full protected or Virtual86. A20 doesn't matter in protected mode at all, so imo A20 should not be a problem in Windows. What is left is flushing. Maybe the Adaptec DOS driver (or rather its BIOS disk interrupt handlers) doesn't use Bus Mastering at all, and only loading Windows with the proper driver switches to it.
On my AMI Mark V Baby Screamer, my notes indicate that gate A20 on KBC connects directly to Vcc. Its not left open like on some boards.
https://theretroweb.com/motherboards/s/ami-ma … by-screamer-s42
I am guessing U33 chip laser etched and repainted to "Megatrends" is VLSI VL82C330. Pin 96 "-BLKA20" is the real A20 output from this chip and should be wired to Cyrix A20M input.
Chipset supports Fast A20 gate option, most likely manufactured left it permanently forced ON to not have to wire keyboard controller A20.
Other notes indicate "A20M_TST fails unless using AMISETUP to set the Turbo Switch Function to Enabled, then all tests pass... tried later to enable Turbo using AMISETUP and now A20M_TST fails... why?"
Maybe one of those runs was with "Enable A20M input" and another without?
"The TI486SLC/E automatically does not cache accesses, to the first 64 KBytes and to 1 MByte + 64 KBytes, if the NC0 bit in the CCR0 is set. This prevents data within the wraparound memory area from residing in the internal cache and thus eliminates the need for masking A20 to the internal cache."
That sounds like "first 64KB and 64KB at the first 1MB boundary", but then the CCR0 description does indeed state:
"Non-cacheable 1-MByte Boundaries If=1: Sets the first 64 KBytes at each 1-MByte boundary as non-cacheable"
says _each_ 1MB boundary as in whole ram, like they didnt bother detecting if A21-... are active.
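The two readings of the documentation differ only above 2 MB, which a quick address check makes visible. This is a sketch of the two interpretations, not of how the CPU decodes addresses internally, and the function names are made up.

```python
MB = 1 << 20
KB64 = 64 * 1024


def nc_each_megabyte(addr: int) -> bool:
    """Reading 1: first 64 KB at every 1 MB boundary is non-cacheable."""
    return (addr % MB) < KB64


def nc_first_two_only(addr: int) -> bool:
    """Reading 2: only 0..64 KB and 1 MB..1 MB+64 KB are non-cacheable."""
    return addr < KB64 or MB <= addr < MB + KB64


if __name__ == "__main__":
    for addr in (0x000000, 0x00FFFF, 0x010000, 0x100000, 0x10FFFF,
                 0x110000, 0x200000, 0x7F0000):
        print(f"{addr:#08x}  each-MB={nc_each_megabyte(addr)!s:5}  "
              f"first-two={nc_first_two_only(addr)}")
```

Under the "each boundary" reading a machine with 32 MB loses 32 x 64 KB = 2 MB of cacheable RAM, while the narrower reading gives up only 128 KB.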
Nevertheless "cyrix.exe -e -b -cd -m- -r" should work flawlessly (albeit in slow super careful mode) on any board lacking Cyrix support.
This however
"however instead of disabling cache of the first 64 KB of each megabyte boundary, you can probably get away with just not caching the first 64KB after the first 1 MB boundary using -x10000,64 instead of -m-"
might fail A20 tests, or worse maybe even pass but work unpredictably. Cyrix is capable of designating 4 non cacheable areas, maybe "-x0,64 -x10000,64" can work better.
Looking at another motherboard, the Peak/DM, I have "DLC & SXL must use BARB w/1540CP in Windows. DRx2 OK to use FLUSH w/1540CP in Windows. 1520B OK to use FLUSH"
Weird. Either Flush works in a mobo or it doesn't, unless the DRx2 is on an interposer with a circuit that cheats by always pulsing Flush on Hold; this would make FLUSH mode act exactly like BARB.
When using AHA-1542CP and PGA132 SXL2-50, I needed to set BARB to get reliable operation in Windows 3.11. DOS looked like it would work with FLUSH, but not WINDOWS. On a few occasions, I wired up a NAND gate to implement the FLUSH circuit as shown in the SXL2 databook appendix. I don't recall it making much of a difference with the ability to use FLUSH.
You need two gates for this to work and motherboard must not connect to CPU pin 30 (FLUSH).
Once I found the Evergreen SXL2-66 upgrade adaptor, which contains a PAL with various logic, I was able to use flush reliably w/AHA-1542CP. And the Evergreen upgrade adaptor does not have a header for connecting MEMW# to ISA MEMW#.
FLUSH can't work without that connection unless it just emulates BARB by flushing on any DMA whatsoever. That defeats the purpose of FLUSH mode, but I can see someone manufacturing a CPU upgrade that does this to be foolproof and not require any software: it will work in any board with or without Cyrix BIOS support and regardless of enabled Cyrix options, all thanks to always working in the slowest, safest mode.
If Im Kingston Technology I will probably make double super sure my CPU upgrades work in IBM computers super idiot proof during the "we need those barely capable machines booting Windows 95 like yesterday" corporate transition period 😀
Concerning the loss of performance when using BARB vs. FLUSH, I recall it being about 4% in doom, but I think that was without sound. I recall reading other testers indicating they saw a 10% hit, but I don't recall in what benchmark. This might be worth re-evaluating.
Performance loss without sound might indicate motherboard with no Hidden Refresh support = up to 60000 short HOLD periods per second when it suspends CPU to refresh ram. 10% is surprisingly low for so many flushes considering Turbo mechanism on some 486 boards is implemented by forcing flush around 10000 times per second.
I have dozens of pages of old notes and my head is starting to hurt understanding what I wrote and writing them in any coherent manner here. It might be best to start over if there's sufficient interest.
If you were wanting to experiment with this further, I can modify my existing testbed setup to use the AHA-1540CP instead of the AHA-1522B. I have an Evergreen SXL2-66, PGA132 SXL2-50, DRx2-66, the custom PGA168 SXL2-66 adaptor, and the Improve-It PGA168 to PGA132 SXL2 board (5V only)
I satisfied my curiosity by finally reading all about this mechanism, but if you want to have another go at implementing/verifying proper operation of Cyrix FLUSH circuit in various motherboards Im ready to assist 😀
I'm normally running the Adaptec DOS driver and ESS sound drivers in DOS, plus HIMEM. Nothing fancy. "Windows makes problems"... well, each motherboard behaves differently, and it has been over a decade, so I cannot really offer specifics without testing everything again. My memory of late isn't great. I swore that in my Adaptec box I had an AHA-1540CP (no floppy). I open the box today to pull out the card, and voila, it is the AHA-1542CP (with floppy).
It is not possible to disable DMA on the AHA-1542CP.
I'm not certain that Megatrends on the AMI Mark V Baby Screamer is the VL82C330. This was a theory, but I couldn't get any MRBIOS for the VLSI 330 series to work on this motherboard. Megatrends may be AMI's own chipset based on the 330. I don't know.
I ran the A20M_TST on my current benchtop and first time it runs, it says PASSED. Next time ran, it said FAILED. I gave up on this test. DMA test passes.
"cyrix.exe -e -b -cd -m- -r" Yes, this usually works universally. But sometimes when you are running a benchmark in DOS, if it is run in the area of memory that isn't being cached, you will get low benchmark results. Thus, for consistency in DOS, I usually run with cyrix -e -cd -f -i1 -i2 -i3 -i4, assuming flush works. More on this in a bit.
"maybe -x0,64 -x10000,64 can work better." I will keep this in mind to test if I run into problems. So far, the -x10000,64 has been sufficient. On my AMI Mark V Baby Screamer, I do get the occasional and not repeatable hang up - about 1 in every 20 times I turn it on to do something. The Mark V has been the most troublesome of all my 386 boards to get working with the SXL2. Maybe I will add the -x0,64. It will take years to test it at my rate of use though.
"You need two gates for this to work and motherboard must not connect to CPU pin 30 (FLUSH)." Yes, the NAND IC contained multiple NAND gates, and I used one NAND as the inverter and another as the NAND gate. More on this below.
"FLUSH cant work without that connection unless it just emulates BARB by flushing on any DMA whatsoever. "
More on this below.
"I can see someone manufacturing CPU upgrade that does this to be foolproof and not require any software - if will work in any board with of without Cyrix BIOS support and regardless of enabled Cyrix options, all thanks to always working in the slowest safest mode."
That logic is sound, however I haven't noticed any speed difference between Evergreen, Improve-It, PGA132, or the custom interposer (no logic) when flush is set. More on this in today's testing.
On the testbed now is my unbranded Symphony 461/362 based motherboard. I did find a sticker on it with a giant "GEM", perhaps that is the brand. I will assume it is for record keeping purposes. This motherboard is:
GEM MB386-40-SYM
256K L2
32 MB 9-chip, 60 ns DRAM
85 MHz crystal oscillator
ISA = 85/2/4 = 10.625 MHz
BIOS is from a DTK PEM-4036Y because it is the best for HaydnII boards
all my DOOM w/sound tests are performed with the mouse enabled
"van der meer" pins on PGA132 socket:
FLUSH# not connected
LOCK# not connected
A20M# not connected
ADS# 10K pullup to 5V
BIOS has an "enable L1 option", which on most boards, merely sets L1 to enabled, but leaves the entire 4GB uncacheable. On this board, that's not the case.
Upon booting to DOS without touching cyrix.exe, cyrix -q indicates this:
A20M input enabled
KEN input disabled
FLUSH input enabled
BARB disabled
L1 enabled via CR0
64k of each 1MB set cacheable
640K-1MB set cacheable

Non-cacheable regions set:
0x000A000 128K
0x000C000 256K
AHA-1522B (non-bus mastering) CPU MEMW# is not connected to ISA MEMW#
The only cyrix command I needed to type was cyrix -cd to clock double. DOOM w/sound = 3689 = 20.25 fps (FLUSH# enabled)
if I disable FLUSH# using cyrix -f-, then DOOM w/sound = 20.25 fps (the same). How? if I test without sound, then DOOM w/out sound = 3370 = 22.16 fps
Next, type cyrix -f- -b to disable flush and enable barb DOOM w/sound = 4420 = 16.90 fps BARB
BARB showed a 20% performance hit w/sound enabled. DOOM w/out sound = 3370 = 22.16 fps. BARB has the same score as FLUSH when sound is disabled.
Since CPU MEMW# is not connected to ISA MEMW#, what is flushing the L1 cache?
Next, I connect CPU MEMW# to ISA MEMW# with a wire. FLUSH# is set to enabled. The performance drops 11% Why? DOOM w/sound = 4091 = 18.26 fps
AHA-1542CP (bus mastering) CPU MEMW# is not connected to ISA MEMW#
I must disable Internal Cache in BIOS to even boot. Trying to enable L1 manually in DOS, the system hangs. For the system to function, I must disable FLUSH# and enable BARB. Upon which:
DOOM w/sound = 4408 = 16.94 fps (BARB), or about the same as AHA-1522B with BARB.
Next, I wire CPU MEMW# to ISA MEMW#. With this wire, I can now use FLUSH# and disable BARB, upon which:
DOOM w/sound = 4090 = 18.26 fps (FLUSH#), or an 8% benefit compared to using BARB [when CPU MEMW# is wired to ISA MEMW#].
Summary
AHA-1542CP needs CPU MEMW# jumpered to ISA MEMW# for FLUSH to function. The AHA-1542CP can work with BARB without this jumper, but suffers an 8% performance hit. When using the AHA-1522B with CPU MEMW# not connected to ISA MEMW#, BARB showed a 20% performance loss compared to FLUSH#. When connecting CPU MEMW# to ISA MEMW#, the FLUSH# results showed an 11% performance hit. The best overall performance is to use the AHA-1522B and not connect CPU MEMW# to ISA MEMW#. Can you make sense of why the SXL2 is able to invalidate the L1 cache when the FLUSH# pin is not connected to anything and BARB isn't used, and why connecting CPU MEMW# to ISA MEMW# reduces DOOM performance?
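For anyone wanting to re-check the numbers, the fps and percentage figures in this post can be reproduced as below. This assumes the scores are DOOM timedemo realtics for a 2134-gametic demo at 35 tics per second, which matches every fps value quoted; the percentages appear to be computed from realtics (time taken), not from fps. Function names are only for this sketch.

```python
GAMETICS = 2134  # assumed demo length in game tics
TICRATE = 35     # DOOM game tics per second


def fps(realtics: int) -> float:
    """Average frames per second for a timedemo that took `realtics`."""
    return GAMETICS * TICRATE / realtics


def slowdown_pct(base_realtics: int, other_realtics: int) -> float:
    """How much longer `other` took than `base`, in percent."""
    return (other_realtics / base_realtics - 1.0) * 100.0


if __name__ == "__main__":
    for label, tics in [("1522B FLUSH#, no MEMW# wire", 3689),
                        ("1522B BARB", 4420),
                        ("1522B FLUSH#, MEMW# wired", 4091),
                        ("1542CP BARB", 4408),
                        ("1542CP FLUSH#, MEMW# wired", 4090),
                        ("no sound", 3370)]:
        print(f"{label:28s} {tics:5d} realtics = {fps(tics):.2f} fps")
    print(f"BARB vs FLUSH# on 1522B:  {slowdown_pct(3689, 4420):.0f}%")
    print(f"MEMW# wire on 1522B:      {slowdown_pct(3689, 4091):.0f}%")
    print(f"BARB vs FLUSH# on 1542CP: {slowdown_pct(4090, 4408):.0f}%")
```

Measured as fps instead of realtics, the same gaps come out a little smaller, which is worth keeping in mind when comparing against other people's results.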
I'm not certain that Megatrends on the AMI Mark V Baby Screamer is the VL82C330. This was a theory, but I couldn't get any MRBIOS for the VLSI 330 series to work on this motherboard. Megatrends may be AMI's own chipset based on the 330. I don't know.
Doubt it, but that can be verified further since thankfully there is 330 datasheet available.
I ran the A20M_TST on my current benchtop and first time it runs, it says PASSED. Next time ran, it said FAILED. I gave up on this test.
Testing cache is like testing RAM. You can't just read RAM once and call it good; there are worse and better RAM tests, and the best ones are long and exhaustive. False negatives (returns "all fine") can happen, false positives almost never. Thus to me this situation means A20 is not working, and you need to disable two 64KB ranges to fully avoid problems.
It doesn't matter when it's not connected: the CPU pulls it high internally, so the CPU will never see A20 masking and can't correctly keep track of which 64KB is being accessed.
!!! There is no CPU MEMW pin! There is a more general W/R pin 25/B10, and it is always handled by the chipset because it corresponds to the CPU bus state, not the overall computer/memory bus state.
There is a memory bus MEMW# somewhere on the board, but it has nothing to do with the CPU and is usually buffered either with a 74LS245 or by the chipset. Edit: was looking at wrong CPU package
DOOM w/sound = 4420 = 16.90 fps BARB
BARB showed a 20% performance hit w/sound enabled
Perfect, Thank you! This was the thing I was most curious about BARB mode. So DMA READs by Sound Blaster which are harmless to L1 Cache coherency cause ~5% ~20% performance drop on CPU heavy workload. Not good not bad, I would even call this acceptable for a fallback mode.
DOOM w/out sound = 3370 = 22.16 fps. BARB has the same score as FLUSH when sound is disabled.
Disabled sound means no DMA activity whatsoever (if mobo does hidden ram refresh, looks like this one does and calls it "Decoupled refresh"). No DMA means BARB doesnt trigger at all. 22.16 fps is the perfect score with CPU never invalidating L1 cache.
Since CPU MEMW# is not connected to ISA MEMW#, what is flushing the L1 cache?
Absolutely nothing in first test :---), CPU /HOLD /HLDA pin when BARB is enabled.
I had idea this morning that Cyrix F-ked up making /FLUSH pin too complicated. It required unreasonable effort from motherboard designers - two additional gates meaning whole TTL chip or using 486 Chipset just to fully support some optional upgrade CPU. Cyrix should have made 3 modes:
- BARB mode as is, perfect for lazy vendors
- FLUSH mode as is, perfect for mobos using 486 chipset as those generate Flush signal for free.
- MEMW mode where motherboard designer has to only wire ISA MEMW signal to /FLUSH pin 30/E13 and the CPU will internally handle combining it with /HOLD /HLDA (two gate circuit from Cyrix documentation)
That third mode would mean almost zero work for mobo manufacturers. EDIT: Spoilers, Cyrix did exactly that in 1994 in bigger packages.
Next, I connect CPU MEMW# to ISA MEMW# with a wire. FLUSH# is set to enabled. The performance drops 11% Why? DOOM w/sound = 4091 = 18.26 fps
CRIME CRIME!!!!! As mentioned above there is no CPU MEMW pin! wiring ISA MEMW to CPU W/R is highly illegal. Straight to Jail! Edit: was looking at wrong CPU package
Im guessing BARB is disabled here.
and the answer is dont know because this is Undefined State! Those signals should never be linked with each other! Edit: was looking at wrong CPU package
Next, I wire CPU MEMW# to ISA MEMW#. With this wire, I can now use FLUSH# and disable BARB, upon which: DOOM w/sound = 4090 = 18.26 fps (FLUSH#), or an 8% benefit compared to using BARB [when CPU MEMW# is wired to ISA MEMW#].
Hmm. Let's think about this undefined state. MEMW linked to CPU W/R might force the CPU to think there is a write to RAM every time there is a write on the ISA bus that somehow lines up with the CPU reading its W/R pin. The x86 CPU bus is synchronous and the W/R pin will only be probed at certain cycles. The only ISA Writes happening during Doom should be to VGA RAM ... and disk access when bus mastering. VGA RAM should not be cacheable anyway, plus it's the CPU itself doing those writes. How Bus Mastered disk accesses leak to the CPU is a big mystery, because during DMA cycles (HOLD/HLDA) the CPU floats, and thus completely disconnects, its address bus and W/R pin. It's all very WTF. Edit: was looking at wrong CPU package
AHA-1542CP needs CPU MEMW# jumpered to ISA MEMW# for FLUSH to function
Nooooo idea how that would even work. From the horses mouth:
" in the hold acknowledge state, the TI486SLC/E microprocessor floats all output and bidirectional signals, except for HLDA and SUSPA. HLDA is asserted as long as the TI486SLC/E CPU remains in the hold acknowledge state and all inputs except HOLD, FLUSH, FLT, SUSP and RESET are ignored."
According to Cyrix "AHA-1542CP needs CPU /FLUSH wired up to ISA /MEMW thru two gate circuit to function" and I would try that instead 😀 Edit: was looking at wrong CPU package
however AHA-1542CP can work with BARB without this jumper, but suffers a 8% performance hit
That 8% is from Doom test with sound? You got 32megs of ram there so Im guessing no disk accesses during the test, that 8% is all on Sound Blaster invalidating cache every time it requests samples.
When using the AHA-1522B, and CPU MEMW# not connected to ISA MEMW#, BARB witnessed a 20% performance loss compared to FLUSH#. When connecting CPU MEMW# to ISA MEMW#, the FLUSH# results showed a 11% performance hit.
You somehow by accident discovered undocumented and totally unsupported Cyrix CPU mode of operation! 😀
Documentation states " in the hold acknowledge state, the TI486SLC/E microprocessor floats all output and bidirectional signals, except for HLDA and SUSPA. HLDA is asserted as long as the TI486SLC/E CPU remains in the hold acknowledge state and all inputs except HOLD, FLUSH, FLT, SUSP and RESET are ignored."
meaning W/R pin is not only floating but also ignored! Looking at detailed Bus timing diagrams there is a slim window of Half a Clock between asserting HOLD and CPU going into Hold Acknowledge State (HLDA).
Still, it's all very impossible 😮 because the AHA-1542CP will not drive the address it wants to bus master until it receives DACK, and DACK will only be set after the CPU has gone into the HLDA state. Edit: was looking at wrong CPU package
Please forget that unholy undocumented unsupported combination! 😀 Its no longer surprising you get occasional crashes 😀
Grab 74LS00, cobble together "van der meer" circuit, connect it according to description to CPU /HOLD + ISA MEMW# = CPU /FLUSH pins and lets go for another round of tests 😀
BARB disabled, FLUSH enabled/disabled, L1 enabled/disabled. Doom with sound enabled/disabled, reading floppy, bus mastering disk controller. Edit: was looking at wrong CPU package
And as a cherry on top you can try BARB mode with Hidden Refresh disabled ("Decoupled refresh"). Wonder how bad it will get, how close to disabled L1 cache case.
Last edited by rasz_pl on 2024-12-23, 16:06. Edited 1 time in total.
Last I checked, read/write floppy access works fine on AHA-1522B on this GEM motherboard without BARB. If you have a particular floppy test combination you wanted me to try, I will try it out.
If the L1 cache doesn't invalidate on this board at all, then wouldn't it be crashing all the time? I've not had any issues with this board. I can read/write floppies in Win3.1 and play audio files, load old webpages, etc.
There is a MEMW# pin on the PGA168 and QFP144 TI SXL2 CPUs. Are you referencing the SXL databook, or the DLC databook? The MEMW# on these CPUs is hooked to an internal flush circuit, the same one that they show in the databook. No need for the external flush circuit when using the MEMW# pin. Just need to connect MEMW# (B16) to ISA MEMW#.
I do not normally use MEMW# because it hasn't been needed on any motherboard I've tested. I do not get occasional crashes on any of these 386 boards, except for the infrequent crash on the AMI Mark V Baby Screamer, which doesn't have MEMW# connected. The Evergreen upgrade adaptor, which installed on the Mark V board, doesn't have a header for the MEMW# pin. I've been wanting to replace the 1540CP on the Mark V with an 1520B to see if my once in 3 year hang goes away. It's usually a hang on shutdown from W31 to DOS, whereby the C: prompt doesn't appear. And using IE5 on the Mark V also causes disk corruption, so I uninstalled it many years ago. I never had any crashes in any other programme. The reason I haven't swapped out the 1540CP on this board is the complexity of my multi-OS setup. NT3.5 and NT4 are particular when it comes to swapping the primary HDD controller.
Plan your life wisely, you'll be dead before you know it.
Last I checked, read/write floppy access works fine on AHA-1522B on this GEM motherboard without BARB.
without MEMW linking? We are only interested in Floppy reading. Even something as simple as executing two different files from FDD in succession has potential of crashing if DOS loads them up in same spot.
There is a MEMW# pin on the PGA168 and QFP144 TI SXL2 CPUs.
I need to dig up the interposer project to see what is going on 😀 ... and now I understand, you are using the "144-Pin QFP or 168-Pin PGA" variant!
I made a fool of myself operating on the assumption it's a TI486SLC 100-pin QFP chip converted to 88-pin PGA.
Checking 1994_TI486SXLC_and_TI486SXL_Microprocessors_Reference_Guide.pdf and everything makes sense.
My shower thought from yesterday, "Cyrix should have made 3 modes:... MEMW mode", came to pass in 1994 after all; instead of switching pin mode they added another dedicated pin. False crime report! I'm going to jail instead 🙁
Next, I connect CPU MEMW# to ISA MEMW# with a wire. FLUSH# is set to enabled. The performance drops 11% Why?
DOOM w/sound = 4091 = 18.26 fps
so it IS disk activity after all! What little Doom is loading during the benchmark still manages to invalidate cache enough for 11% performance drop.
Sound Blaster cost with BARB ~20%
Sound Blaster cost with FLUSH 0%
Bus Mastering disk controller loading level during the benchmark even with 32MB of ram 11%.
The only benchmarks missing for full picture are
- BARB mode with BIOS "Decoupled refresh" disabled.
- L1 cache completely disabled.
It's usually a hang on shutdown from W31 to DOS, whereby the C: prompt doesn't appear.
That does sound like a place where A20 might be fiddled with, or something to do with AHA-1542CP BIOS. Does this happen with different CPUs? Regardless of "-m-" or "-m"?
And using IE5 on the Mark V also causes disk corruption
Weird. Disk corruption would imply something going wrong during writing to disk, but L1 cache is write thru so ram always has the current state. Only happens with L1 enabled? Might be unrelated to Cyrix.
When using the AHA-1522B (no MEMW# connection), I can execute several DOS programmes from floppy in succession without issue (tried up to 4). DMA_TST provided by Cyrix always passes. It uses the floppy drive as the DMA check.
What mechanism is invalidating the L1 cache when the FLUSH# pin on MB is not connected, when MEMW# isn't wired, when BARB isn't used, and when not using an Evergreen interposer with PAL logic?
I've always had A20M_TST issues, on (I think) every motherboard. I ran a few more tests with the GEM board and AHA-1522B. The PASS/FAILS seem almost random. If I hit "toggle A20 Gate" repeatedly, I can get PASS, PASS, PASS, FAIL, FAIL, PASS, FAIL, PASS, PASS, PASS... etc. The faster I hit toggle between tests, the greater the probability that the next test will be a fail, but not always. I ran this test with A20M input enabled/disabled (cyrix.exe), with MEMW# wired/not wired, and with PGA132 A20M# connected/disconnected from KBC A20. None of these options altered the seemingly random PASS/FAIL occurrences with A20M_TST.exe
I've attached those Cyrix tests if you were wanting to see them:
Bus Mastering disk controller loading level during the benchmark even with 32MB of RAM 11%
I also get that 11% drop when using a non-bus mastering SCSI controller (AHA-1522B).
Why would anyone want to wire up MEMW# when using the AHA-1522B, if the system runs faster and well without it?
I should also point out that I can use the AHA-1542CP with FLUSH# enabled (MEMW# not wired) if I am using the Evergreen QFP SXL2 interposer (contains some onboard PAL logic). That's how my AMI Mark V Baby Screamer is configured currently. I think you can see why I want to switch to the AHA-1520B on that system. I also want to try the AHA-1542CP on this GEM board to ensure it can also use FLUSH# without MEMW# wired.
I should scan my Evergreen Revto486 manual as it goes into much detail regarding this cache mess. Unfortunately, the one I have saved as a PDF is too large to attach here. Whoever scanned it did so in greyscale rather than as 1-bit images.
If I disable Decoupled Refresh, and use BARB, the result is 13.57 fps (20% loss compared to using BARB w/decoupled refresh)
If I disable L1 entirely, the result is 11.15 fps.
As I recall it, with the AMI Mark V Baby Screamer, the hang on shutdown (C:\ doesn't appear, but the screen still goes to a black DOS screen) happened regardless of -m-. I used to run the system with -m- to disable caching 64K at every MB boundary. I switched to disabling caching of 64K after first 1 MB only about a year ago. Do you think adding X0,64 will help the hang-up? Did you say that all systems with DLC/SXL need to disable the first 64K starting at 0 MB?
The issue with IE5 on the Baby Screamer... it has been at least 7 years since I played with this, but I recall that when IE5 would hang (about 1 in 3 tries), I was forced to shut down by ctrl-alt-del or reset button. Upon rebooting, the system.ini file or config.sys file would be corrupt, sometimes with garbage characters, sometimes with spaces in the text where there shouldn't be, or with the bottom of the page erased. I don't know if the corruption only occurred with L1 enabled because I didn't try it without L1. The system would run too slowly loading IE5 without L1.
Something curious, though (Baby Screamer still)... the SXL won't clock double if I have more than a 66.6 MHz crystal oscillator installed (AHA-1540CP). Or it could go up to 72 MHz with AHA-1520B installed. That system also has a unique clock-doubled FPU installed, the ULSI DX2-66. I'm not sure if any of this is related to the IE5 hang-ups, but it points to the oddness of this system.
Last edited by feipoa on 2024-12-23, 15:40. Edited 1 time in total.
I can confirm that, as with the Baby Screamer, on the GEM MB386-40-SYM, I can correctly use the Evergreen QFP SXL2-66 and run DOOM w/sound, read floppy, etc, all without BARB or MEMW# wired (AHA-154xCP). I must have the FLUSH# input enabled via cyrix -f. Doom w/sound = 20.13 fps. Similarly, with this Evergreen PGA132 to PGA132 interposer, which just contains a PAL, I can also run without MEMW# wired and use FLUSH#. Result was also 20.13 fps. These two boards are shown below:
I can also use this Improve-It PGA168 to PGA132 interposer w/AHA-1542CP and no MEMW# wire, but the score is not very good. DOOM w/sound = 16.94 fps, which is incidentally the same score achieved when using BARB. However, I confirmed that BARB was not enabled and FLUSH was. Even if I use the AHA-1522B, I still get 16.96 fps using this Improve-It interposer. This interposer is shown here:
When using the AHA-1522B (no MEMW# connection), I can execute several DOS programmes from floppy in succession without issue (tried up to 4). DMA_TST provided by Cyrix always passes. It uses the floppy drive as the DMA check.
What mechanism is invalidating the L1 cache when the FLUSH# pin on MB is not connected, when MEMW# isn't wired, when BARB isn't used, and when not using an Evergreen interposer with PAL logic?
There isn't one, it shouldn't work 😮 I could understand the floppy working because of, for example, Adaptec BIOS routines taking over floppy handling and doing stuff with raw IO, but the Cyrix DMA test should be smarter than that. Unless...
I ran this test with A20M input enabled/disabled (cyrix.exe), with MEMW# wired/not wired, and with PGA132 A20M# connected/disconnected from KBC A20. None of these options altered the seemingly random PASS/FAIL occurrences with A20M_TST.exe
All possible if the mobo BIOS has the "fast A20 Gate" option enabled. In that case one needs to find the appropriate pin, either in the chipset datasheet or by tracing in reverse from address bus pin A20 or even the SIMM socket. On the other hand, I'm now reading Re: Hardware mod for L1 cache support on Cyrix 486DLC/SXL CPUs, and IanB already confirmed your garbage A20M_TST observations in 2017.
I've attached those Cyrix tests if you were wanting to see them:
Perfect. Google claims to me the only occurrence of a string "A20M_TST.exe" on the whole internet is only in this very thread 😐 becoming more worthless by the day 🙁
Bus Mastering disk controller loading level during the benchmark even with 32MB of RAM 11%
I also get that 11% drop when using a non-bus mastering SCSI controller (AHA-1522B).
Now that is weird. To explain this you would have to set up a logic analyzer, or a NAND gate + scope, with MEMW and HLDA as inputs. In theory, in DOOM nothing whatsoever should be pulling those two pins down at the same time!?!?! No floppy, no bus master activity, nothing.
I should also point out that I can use the AHA-1542CP with FLUSH# enabled (MEMW# not wired) if I am using the Evergreen QFP SXL2 interposer (contains some onboard PAL logic).
If I disable Decoupled Refresh, and use BARB, the result is 13.57 fps (20% loss compared to using BARB w/decoupled refresh)
If I disable L1 entirely, the result is 11.15 fps.
Thank you. So ram refresh triggering flush pretty much kills cache gains.
You can confirm if this crash is due to A20 by putting scope on A20 line on the mobo, the real one generated by the chipset, and checking if there is level change when you perform this operation.
I can confirm that, as with the Baby Screamer, on the GEM MB386-40-SYM, I can correctly use the Evergreen QFP SXL2-66 and run DOOM w/sound, read floppy, etc, all without BARB or MEMW# wired. I must have the FLUSH# input enabled via cyrix -f.
... unless all those boards do control FLUSH pin E13 after all!
You did say
"pins on PGA132 socket:
FLUSH# not connected"
and
"AHA-1542CP (bus mastering)
I must disable Internal Cache in BIOS to even boot."
but there is no other explanation 😐 To confirm, solder a wire on the back of the mobo to pin E13, put it on a scope or logic probe and see if it blinks/changes logic level. That would explain most of what has been happening from the start, except
-lower score with MEMW wired
-not booting with AHA-1542 with cache enabled
Doom w/sound = 20.13 fps. Similarly, with this Evergreen PGA132 to PGA132 interposer, which just contains a PAL, I can also run without MEMW# wired and use FLUSH#. Result was also 20.13 fps.
with AHA-1542CP? only explanation would be mobo properly driving FLUSH pin E13. Without AHA-1542CP sure as there is no DMA writes during Doom test.
I can also use this Improve-It PGA168 to PGA132 interposer w/AHA-1542CP and no MEMW# wire, but the score is not very good. DOOM w/sound = 16.94 fps, which is consequently the same score achieved when using BARB. However, I confirmed that BARB was not enabled and FLUSH was.
That sounds exactly like what I described earlier - an idiot-proof interposer doing BARB no matter what to cut down on "it crashes" complaints 😀 btw this contradicts earlier "however I haven't noticed any speed difference between Evergreen, Improve-It, PGA132, or the custom interposer (no logic) when flush is set."
The Improve-It interposer contains the same circuit as commonly found on this SXL2 upgrade
only HLDA and FLUSH# pins are connected, it FLUSHes on every HLDA = emulates BARB.
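To make the difference between the two flush strategies concrete, here is a minimal truth-table sketch. It assumes HLDA is active high and ISA MEMW# is active low, and that the "two gate circuit" uses one NAND as an inverter and a second NAND to combine the signals. That wiring is an assumption for illustration, not something confirmed in this thread or taken from a datasheet; the function names are hypothetical.

```python
def flush_n_qualified(hlda: int, memw_n: int) -> int:
    """FLUSH# (active low) asserted only for memory writes during hold.

    Assumed wiring: gate 1 of a 74LS00 inverts MEMW#, gate 2 NANDs the
    result with HLDA.
    """
    memw = 1 - memw_n            # gate 1: NAND with inputs tied = inverter
    return 1 - (hlda & memw)     # gate 2: NAND(HLDA, MEMW)


def flush_n_hlda_only(hlda: int, memw_n: int) -> int:
    """FLUSH# asserted on every hold acknowledge, i.e. BARB-like behaviour."""
    return 1 - hlda              # MEMW# is ignored entirely


if __name__ == "__main__":
    print("HLDA MEMW# | qualified FLUSH# | HLDA-only FLUSH#")
    for hlda in (0, 1):
        for memw_n in (0, 1):
            print(f"   {hlda}     {memw_n} |        {flush_n_qualified(hlda, memw_n)}"
                  f"         |        {flush_n_hlda_only(hlda, memw_n)}")
```

The only row where the two differ is HLDA = 1 with MEMW# = 1 (a hold with no memory write, such as a refresh or a DMA read from RAM): the HLDA-only version flushes there, which would fit the BARB-like scores seen with the Improve-It interposer.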
New question is one of
- why is floppy working with no FLUSH?
- is FLUSH pin E13 really not active? then why is AHA-1542CP not booting?
1KB L1 is really small (*) and BIOSes are bloated. I could see a situation where the RAM-cached BIOS floppy read routines are so big they fill the whole L1 cache while DMA is taking place, so after a floppy read the CPU doesn't hold any stale data and nothing bad happens. Meanwhile, HDD reading routines expect a much faster medium, so they are small and optimized = CPU will have some stale data after a read = instant crash.
* https://www.youtube.com/watch?v=LQcLhBZY12g 50:40 "Design and development of the Intel 80386 microprocessor" on Computer History Museum channel. September 2, 1988. Pat Gelsinger, now an ex Intel CEO fired just two weeks ago hehe, then 386 design Project Manager talks about design process. During the talk they say "we got rid of 512 BYTES of cache because it was TOO BIG" 😁 :-] 😀
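The "big routine evicts the stale lines by accident" idea can be tried out on a toy model. The sketch below is only an illustration under loud assumptions: a direct-mapped 1 KB cache with 4-byte lines, a hypothetical 512-byte sector buffer at 0x8000, and BIOS activity modelled as a linear walk over memory at 0x20000. None of these values come from the thread or from a datasheet.

```python
LINE = 4      # bytes per cache line (toy value)
LINES = 256   # 256 lines * 4 bytes = 1 KB, direct mapped (toy value)


class ToyCache:
    """Direct-mapped read cache with no snooping and no FLUSH# input."""

    def __init__(self, ram):
        self.ram = ram                      # dict: line address -> value
        self.tags = [None] * LINES
        self.data = [None] * LINES

    def read(self, addr):
        line_addr = addr // LINE
        idx = line_addr % LINES
        if self.tags[idx] != line_addr:     # miss: fill the line from RAM
            self.tags[idx] = line_addr
            self.data[idx] = self.ram.get(line_addr, 0)
        return self.data[idx]


def stale_reads(bytes_touched_during_dma):
    """Count buffer lines the CPU reads stale after an uninvalidated DMA write."""
    ram = {}
    cache = ToyCache(ram)
    buf = range(0x8000, 0x8000 + 512, LINE)  # hypothetical sector buffer
    for a in buf:                             # CPU caches the old buffer contents
        cache.read(a)
    for a in buf:                             # DMA rewrites RAM behind the cache
        ram[a // LINE] = 1
    # Code/data the CPU touches while the transfer is in progress.
    for a in range(0x20000, 0x20000 + bytes_touched_during_dma, LINE):
        cache.read(a)
    return sum(1 for a in buf if cache.read(a) != ram[a // LINE])


if __name__ == "__main__":
    for touched in (0, 64, 1024):
        print(touched, "bytes touched ->", stale_reads(touched), "stale lines")
```

In this model a routine that walks at least the full cache size leaves no stale lines, while a small one leaves most of them in place. A real 8 KB SXL2 cache would need a correspondingly larger footprint to be flushed out this way.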
There isnt one, it shouldnt work :o I could understand floppy working because of for example Adaptec bios routines taking over floppy handling and doing stuff with raw IO...
My understanding was that the Adaptec BIOS doesn't touch floppy mechanisms.
Re: Hardware mod for L1 cache support on Cyrix 486DLC/SXL CPUs and IanB already in 2017 confirmed your garbage A20M_TST observations...[the test is] becoming more worthless by the day
I completely forgot about that thread. Good you found some reference to this A20M_TST.
Now that is weird. To explain this you would have to setup logic analyzer, or a NAND gate + scope, with MEMW and HLDA as inputs. In theory in DOOM nothing whatsoever should be pulling those two pins down at the same time!?!?! No floppy, bus master activity, nothing.
With MEMW# wired from CPU to ISA, did you want me to probe MEMW# and HLDA? HLDA on chipset or PGA132?
So ram refresh triggering flush pretty much kills cache gains.
Well, there's still some benefit with L1 and no hidden refresh, but the results are disappointing. Best to use the AHA-152xB, even if FLUSH# on PGA132 goes nowhere.
You can confirm if this crash is due to A20 by putting scope on A20 line on the mobo, the real one generated by the chipset, and checking if there is level change when you perform this operation.
Are you referring to A20 on the assumed VLSI 330 chipset? My AMI Mark V Baby Screamer is cased. I try not to poke around inside when motherboards are already cased. It will be difficult to jumper A20 from the chipset while it is cased. If I ever pull it from the case, I'll try to remember to check this.
From my limited understanding of 16 bit real mode addressing both starting at 0 and starting at 1MB. Or connect proper A20 signal.
Do you mean jumpering A20M# from PGA132 to Gate A20 on the KBC? On the Baby Screamer, A20 on KBC is tied directly to 5V. Should I still jumper these two together?
... unless all those boards do control FLUSH pin E13 after all!
You did say
"pins on PGA132 socket:
FLUSH# not connected"
and
"AHA-1542CP (bus mastering)
I must disable Internal Cache in BIOS to even boot."
but there is no other explanation :| To confirm, solder a wire on the back of the mobo to pin E13, put it on a scope or logic probe and see if it blinks/changes logic level. That would explain most of what has been happening from the start, except
-lower score with MEMW wired
-not booting with AHA-1542 with cache enabled
I probed the FLUSH# pin on the back of the PGA132 socket with my scope. It stays high no matter what I do in DOS. I don't think this pin is connected to anything. I probed everywhere with my DMM already.
with AHA-1542CP? only explanation would be mobo properly driving FLUSH pin E13. Without AHA-1542CP sure as there is no DMA writes during Doom test.
Yes, with AHA-1542CP. FLUSH# stays high. I probed MEMW# when MEMW# CPU connected to MEMW# ISA and I see several pulses to GND upon running something like Landmark Speed Test v6 or with running the dir command in DOS.
That sounds exactly like what I described earlier - an idiot-proof interposer doing BARB no matter what to cut down on "it crashes" complaints :) btw this contradicts earlier "however I haven't noticed any speed difference between Evergreen, Improve-It, PGA132, or the custom interposer (no logic) when flush is set."
That is because my previous tests with this [piece of junk] Improve-It interposer were with sound disabled. With sound disabled, all the benchmarks read about the same. I have gotten so used to running DOOM with "doom -nosound -nomouse -timedemo demo3" that the thought to enable sound didn't cross my mind. We do all our benchmarking without sound, but in the case of determining cache invalidation and final performance, I can see the value in enabling sound. Maybe all 386 era DOOM benchmarks should be run with sound on from now on.
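For anyone following along, the "4091 = 18.26 fps" style figures quoted in this thread are timedemo realtics converted to frames per second. A minimal sketch of that conversion, assuming the usual demo3 length of 2134 gametics and DOOM's 35 Hz tick rate (both constants are assumptions here, not stated in this thread):

```python
DEMO3_GAMETICS = 2134   # assumed length of demo3 in game tics
TICRATE = 35            # DOOM game tics per second


def timedemo_fps(realtics: int) -> float:
    """Convert the realtics figure printed by -timedemo into fps."""
    return DEMO3_GAMETICS * TICRATE / realtics


if __name__ == "__main__":
    for realtics in (4091, 4090):
        print(realtics, "realtics ->", round(timedemo_fps(realtics), 2), "fps")
```

Both 4091 and 4090 realtics round to 18.26 fps, which matches the figures quoted earlier.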
New question is one of
- why is floppy working with no FLUSH?
I assume this question is 2-fold:
Why is the floppy working with the AHA-1522B (no BARB, no MEMW#, no Evergreen PAL)?
Why is the floppy working with the AHA-1542CP w/FLUSH# w/Evergreen PAL?
New question is one of
- is FLUSH pin E13 really not active? then why is AHA-1542CP not booting?
I assume this question is regarding use of the Alpha 1 custom interposer (not Evergreen PAL).
FLUSH# is not active. Without MEMW# wired, isn't AHA-1542CP not booting because L1 is getting corrupted (no proper cache invalidation)? If I set L1 to disabled in the BIOS, and don't load cyrix.exe, the system boots.
1KB L1 is really small (*) and BIOSes are bloated. I could see a situation where the RAM-cached BIOS floppy read routines are so big they fill the whole L1 cache while DMA is taking place, so after a floppy read the CPU doesn't hold any stale data and nothing bad happens. Meanwhile, HDD reading routines expect a much faster medium, so they are small and optimized = CPU will have some stale data after a read = instant crash.
The SXL2 uses 8 KB cache. It is the DRx2, DLC, and SLC which use only 1 KB. I recall having some issue with my 286 to 486SLC upgrade. I was able to get the SLC (1 KB) working, but not the SXLC2 (8 KB).
* https://www.youtube.com/watch?v=LQcLhBZY12g 50:40 "Design and development of the Intel 80386 microprocessor" on Computer History Museum channel. September 2, 1988. Pat Gelsinger, now an ex Intel CEO fired just two weeks ago hehe, then 386 design Project Manager talks about design process. During the talk they say "we got rid of 512 BYTES of cache because it was TOO BIG" :D :-] :-)
I was just reading about how Gelsinger was let go in early December. He took some wild gambles. Things aren't looking great for Intel.
One thing I forgot to mention: when I first got my 486SXL2 working it would randomly freeze in Windows NT4, always at similar points during bootup or shutdown, but this didn't happen when the cache was disabled. After a lot of experimentation I noticed that the PSU has a slight inductor whistle which changed pitch at times when the freeze occurred. The pitch change is due to varying current load, and the only thing that has significantly varying load like that is the CPU (no HD as I use compact flash), so I theorised that when the cache was enabled and the CPU was running at full speed from code in the cache, it was causing momentary spikes on the supply rails. I added a few 100n capacitors across the +5v line by the CPU and this improved things, but it wasn't fixed until I added a 1000uF capacitor as well. That was probably overkill, but it shows that random freezes may simply be decoupling issues rather than cache coherency, as the motherboards weren't designed for that CPU.
On my AMI Mark V Baby Screamer, I'm using the Evergreen QFP SXL2-66. It only has 4 decoupling capacitors. IanB mentioned he had to add a 1000 uF cap to fix the issue. Maybe I should also add a few extra 100 nF and 1000 nF caps to the PGA132 on the motherboard.
From the SXL2 thread, I've done a little more testing and will copy/paste that information here due to relevance.
.
.
.
From MikeSG's question, I've run a few comparative numbers for the case of:
- no FLUSH#
- no MEMW# wiring
- no slow, hidden, or decoupled refresh BIOS options
- using DMA Bus Mastering SCSI controller (AHA-1542CP)
- using BARB only
Run on a GEM MB386-40-SYM (Symphony 461 / 362) motherboard with SXL2 clocked at 85 MHz.
The readme for DRAM.COM states that it is normally safe to set refresh rate to 1000 uS, but noted that after about 250 uS, the gains are negligible. On my AMI Mark V Baby Screamer, I use DRAM 40, or 40 uS, because I was finding floppy drive access slowed down noticeably when altering this refresh rate.
Notice how the L1 cache read speed doesn't pick up to the levels seen when using hidden refresh, even with DRAM.COM at 250 uS.
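A rough way to see why the refresh interval matters so much with BARB: if every refresh takes the bus via HOLD/HLDA (no hidden refresh) and BARB flushes the cache on each hold, the flush rate is simply the refresh rate. The sketch below works that out; the 15.085 uS default is the usual AT-style refresh period and is an assumption about this board, as is the one-flush-per-refresh model.

```python
DEFAULT_REFRESH_US = 15.085   # assumed AT-style default refresh period


def flushes_per_second(refresh_interval_us: float) -> float:
    """Cache flushes per second if each refresh causes exactly one flush."""
    return 1_000_000 / refresh_interval_us


def clocks_between_flushes(refresh_interval_us: float, cpu_mhz: float) -> float:
    """CPU clocks available to refill and reuse the cache between flushes."""
    return refresh_interval_us * cpu_mhz


if __name__ == "__main__":
    for us in (DEFAULT_REFRESH_US, 40, 250, 1000):
        print(f"{us:>8} uS: {flushes_per_second(us):>8.0f} flushes/s, "
              f"{clocks_between_flushes(us, 85):>8.0f} clocks at 85 MHz")
```

Under these assumptions the default interval leaves only about 1,300 clocks between flushes at 85 MHz, while 250 uS leaves over 21,000, which is in line with the readme's remark that gains beyond roughly 250 uS are small.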
Conclusion
From the DOOM results, all is not lost when using BARB if your system doesn't support hidden/decoupled or slow refresh. We can obtain almost identical results using DRAM.COM, that is 16.74 fps compared to 16.80 fps. However, without slowing down the refresh (no DRAM.COM), and without using hidden refresh, the results are a mere 13.58 fps. If your system supports hidden refresh and you slow the DRAM refresh rates to 250 uS, you can achieve up to 17.01 fps with BARB. Using FLUSH with MEMW#, these results go up to 18.26 fps. Skipping the bus mastering SCSI controller in favour of AHA-1522B, results go up to 20.25 fps. Or you can still use the bus mastering SCSI controller if you are using an Evergreen upgrade with some clever logic in the PAL chip to achieve 20.25 fps. My preferred approach is to skip the bus mastering SCSI controller; I don't find the system noticeably faster.
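To put the conclusion's numbers side by side, here is a small sketch that expresses each DOOM w/sound result as a loss relative to the best case. The fps values are the ones quoted in this thread; the labels are my paraphrase, and pairing 16.74 fps with DRAM.COM and 16.80 fps with hidden refresh is my reading of the sentence above, so treat that ordering as an assumption.

```python
BEST_FPS = 20.25   # AHA-1522B, or AHA-1542CP with the Evergreen PAL logic

RESULTS_FPS = {
    "L1 disabled": 11.15,
    "BARB, no hidden refresh, no DRAM.COM": 13.58,
    "BARB, DRAM.COM only (assumed pairing)": 16.74,
    "BARB, hidden refresh only (assumed pairing)": 16.80,
    "BARB, hidden refresh + 250 uS refresh": 17.01,
    "FLUSH# with MEMW# wired": 18.26,
    "AHA-1522B or Evergreen PAL": 20.25,
}


def loss_percent(fps: float, best: float = BEST_FPS) -> float:
    """Performance loss of a configuration relative to the best result."""
    return 100 * (best - fps) / best


if __name__ == "__main__":
    for label, fps in RESULTS_FPS.items():
        print(f"{label:<45} {fps:5.2f} fps  -{loss_percent(fps):4.1f}%")
```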