VOGONS


Reply 80 of 95, by pshipkov

Rank Oldbie

Thanks for the info. Sounds pretty good. I will link to your post from another thread, for reference.

JohnBourno wrote on 2022-03-12, 00:06:

Phil's Ultimate VGA Benchmark Database and the fastest 386 scores were 25.9 in 3D Bench and 5.9 in PC Player Bench (320x240)

Huh? That database seems to be terribly out of date. 😀

retro bits and bytes

Reply 82 of 95, by aspiringnobody

Rank Member

Sorry for thread necro.

If I’m understanding this correctly, BARB won’t work properly on any board with a look-through cache?

I’m using a late OPTI 495SX motherboard that has both 486 and 386 sockets (SY-024B1/B2). I already have a 486, so I’m building it with the 486DLC 40. With the cache off it is close enough to a 386 DX 40, and when I enable the cache I get some more oomph.

FLUSH doesn’t seem to work, which is interesting because the board would need a flush circuit anyway for the 486. The BIOS claims to support the 486DLC and enables the cache in flush mode if I set it up there.

It works-ish with BARB but is unstable. It could be the 15ns tag RAM chip in the motherboard’s cache; I’ve got some 10ns chips on order. But after reading this thread I’m thinking it’s the BARB issue. The cache is definitely look-through, judging by the huge penalty incurred on cache misses.

I’m going to check the board and see whether the flush and A20 pins on the 386 and 486 sockets have continuity. Is the 486 flush pin also active low? If so, can I just run a jumper from the 486 socket to the 386 socket to enable flush support?

feipoa wrote on 2015-11-20, 21:18:

I plan on modifying my AMI Mark V Baby Screamer (VLSI 330/331/332) and Chaintech 340 (SiS 310/320/330) motherboards to properly support the L1 cache of the Cyrix 486DLC and 486SXL processors using the FLUSH# pin. The hardware mod is quite simple and can be accomplished with a single quad or dual NAND gate package (74F00N).

The problem I am having is that I cannot determine whether the L2 cache on these motherboards uses a look-aside (parallel) or look-through (serial) scheme. Reading the chipset data sheets did not provide a direct answer. With parallel access, both the cache and the main memory are accessed simultaneously. If a cache hit occurs, the access to the main memory is aborted. With serial access, the cache is examined first and, if a miss occurs, the main store is accessed.

Instructions from the Ti486DLC Reference Guide are shown below. For parallel L2 cache, we are instructed to take the HLDA signal from the CPU's HLDA pin. This is straightforward enough. However, if I were to make an educated guess, I would say these motherboards likely use serial L2 cache access. How can I be certain?

For serial L2 cache, however, we are instructed to take the HLDA signal from the motherboard's chipset. Looking at the SiS board's chipset, the HLDA pin is an input that is directly connected to the CPU's HLDA output. So I cannot figure out how to differentiate between the two Ti connection schemes on this motherboard. Perhaps I am supposed to use DMAHLDA from the chipset instead?

Alternatively, the VLSI motherboard has a chipset HLDA pin that is an input and is not directly connected to the CPU's HLDA pin. I am confused as to why it is an input and not an output pin.

[Images: Parallel_L2.png, Serial_L2.png]
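
The difference between the two schemes described in the quoted post can be illustrated with a toy latency model. This is only a sketch: the cycle counts (`CACHE_CYCLES`, `DRAM_CYCLES`) are made-up numbers chosen for illustration, not measurements from any of the boards discussed here.

```python
# Toy model of look-aside (parallel) vs look-through (serial) L2 access.
# CACHE_CYCLES and DRAM_CYCLES are hypothetical values, for illustration only.
CACHE_CYCLES = 2   # time to check the cache (and return data on a hit)
DRAM_CYCLES = 6    # time for a main-memory access


def look_aside_cycles(hit: bool) -> int:
    """Cache and DRAM are started together; a hit aborts the DRAM access."""
    return CACHE_CYCLES if hit else DRAM_CYCLES


def look_through_cycles(hit: bool) -> int:
    """The cache is examined first; DRAM is started only after a miss."""
    return CACHE_CYCLES if hit else CACHE_CYCLES + DRAM_CYCLES


for hit in (True, False):
    print(f"hit={hit}: look-aside={look_aside_cycles(hit)}, "
          f"look-through={look_through_cycles(hit)}")
```

In this simplified model the two schemes cost the same on a hit and differ only on a miss, where the serial scheme pays for the cache lookup on top of the memory access.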

Reply 83 of 95, by mkarcher

Rank l33t
aspiringnobody wrote on 2023-08-08, 17:21:

It works-ish with BARB but is unstable. It could be the 15ns tag RAM chip in the motherboard’s cache; I’ve got some 10ns chips on order.

Actual 10ns DIP SRAMs are rare items. When 10ns access speed was common, DIP SRAM wasn't common anymore. 10ns SOJ SRAM is more common and still in production.

On the other hand, DIP SRAM chips with "-10" printed on them are not that rare on the secondary market, but they often do not perform at that speed. This can also apply to SOJ chips. I recently got bitten by a lot of 20 "CY7C1009D-10" SOJ chips I bought off AliExpress some years ago (and didn't use initially). When I finally put them to use, their performance was mediocre. If my test setup with my digital scope is not completely crooked, they perform like 15ns chips are supposed to, so I'd guess they are relabelled 15ns chips. At least they are still perfectly working 128k * 8 SRAM chips, just not at the advertised speed.

So if the "10ns" chips you ordered don't improve the situation, the cause might still be the tag RAM.

Reply 84 of 95, by aspiringnobody

Rank Member
mkarcher wrote on 2023-08-08, 20:17:

So if the "10ns" chips you ordered don't improve the situation, the reason might still be the tag ram.

I would think the 15ns chips installed currently would be good enough at 3-1-1-1 but apparently not.

However, there are still hard lockups when I disable the external cache, so I'm leaning towards the look-through cache being the problem. It's just weird that this isn't more widely known. I'd venture that the vast majority of 386-era motherboards were look-through rather than look-aside, since it's cheaper and easier to produce. The 486DLC chips must have been very uncommon back in the day, and I'd venture most of those that were sold never lived up to their potential because of the cache coherency issues.

Gonna get my scope out tonight and see if the 486 socket's flush signal is active low. If so I'll bodge it to the 386 socket and hopefully that will let me enable flush.

Reply 85 of 95, by Anonymous Coward

Rank l33t

Did you try enabling cache with software and enabling the exclusion zones?
cyrix.exe -f -m- -xA000,128 -xC000,256

"Will the highways on the internets become more few?" -Gee Dubya
V'Ger XT|Upgraded AT|Ultimate 386|Super VL/EISA 486|SMP VL/EISA Pentium

Reply 86 of 95, by aspiringnobody

Rank Member
aspiringnobody wrote on 2023-08-08, 21:50:

Gonna get my scope out tonight and see if the 486 socket's flush signal is active low. If so I'll bodge it to the 386 socket and hopefully that will let me enable flush.

So, my motherboard has correct A20M and FLUSH implementations. A20M goes to the keyboard controller, pin 22. FLUSH is driven by the output of one gate of a 74F00 quad NAND. The inputs of that gate are tied to the HLDA pin of the CPU and MEMW from the ISA bus (i.e. it's the parallel implementation).
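
The gate described above is an ordinary 2-input NAND. As a hedged sketch, its truth table can be tabulated as follows. The inputs are treated as plain logic levels (1 = high); the post does not say whether the MEMW line is inverted before it reaches the gate, so mapping these levels onto "bus held" and "memory write in progress" is an assumption.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """2-input NAND, as in one gate of a 74F00: low only when both inputs are high."""
    return 0 if (a and b) else 1


# Inputs are raw logic levels at the gate pins. Which bus condition each
# level corresponds to on the real board is an assumption, not a measurement.
print("HLDA MEMW -> FLUSH# level")
for hlda, memw in product((0, 1), repeat=2):
    print(f"   {hlda}    {memw}  ->  {nand(hlda, memw)}")
```

FLUSH# is active low, so in this idealized model a flush is requested only in the row where both gate inputs are high.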

I ran the tests feipoa posted earlier, and both DMA and A20 pass when the CPU is configured by the motherboard's BIOS. The only thing that fails at all is Speedsys. I am able to load Windows 3.11, play MP3 files, and browse the floppy disk without crashes. Doom works, as does Quake. My BIOS configures the 486DLC as follows:

Internal 1kb cache is presently enabled via CR0.
First 64k of each 1Mb boundary set as cacheable.
640k --> 1Mb region set as cacheable.
A20M input enabled.
KEN input disabled.
FLUSH input enabled.
BARB input disabled.
Internal Cache mode: 2 way associative.
SUSP input and SUSPA output disabled.
All low power mode features disabled.

Non-cacheable region 1: Start Segment 0x00A000, size = 128 kb.
Non-cacheable region 2: Start Segment 0x00C000, size =256 kb.
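
For reference, the two non-cacheable regions listed above can be converted to physical address ranges. The sketch below assumes the "Start Segment" values are real-mode segment numbers (physical address = segment × 16) and that sizes are in KB of 1024 bytes; `region_range` is a hypothetical helper name, not part of any tool mentioned in the thread.

```python
def region_range(segment: int, size_kb: int) -> tuple[int, int]:
    """Return (first, last) physical address of a region.

    Assumes `segment` is a real-mode segment (address = segment * 16)
    and `size_kb` counts units of 1024 bytes.
    """
    start = segment * 16
    end = start + size_kb * 1024 - 1
    return start, end


for seg, size in ((0xA000, 128), (0xC000, 256)):
    first, last = region_range(seg, size)
    print(f"segment {seg:#06x}, {size} KB -> {first:#07x}..{last:#07x}")
```

Under these assumptions the first region covers 0xA0000 to 0xBFFFF (the video memory window) and the second covers 0xC0000 to 0xFFFFF (the adapter ROM and system BIOS area), which together span the whole 640 KB to 1 MB range.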

I've concluded that the crash on starting Speedsys is a bug in Speedsys. It must be incompatible with 486DLC CPUs in FLUSH mode. Changing the 640k --> 1M and 1M boundary settings doesn't affect the Speedsys crash. Changing to direct mapped also has no effect. Invoking the SP switch to disable ISA bus detection (speedsys.exe /sp) prevents Speedsys from crashing, and it is able to run its tests normally after that. I'm not sure why that would matter, but when Speedsys probes the bus while the flush circuit is active, it hard locks the system.

Is the source to Speedsys available? I'm interested to see what it's doing when probing the ISA bus.

Reply 87 of 95, by aspiringnobody

Rank Member
Anonymous Coward wrote on 2023-08-09, 00:33:

Did you try enabling cache with software and enabling the exclusion zones?
cyrix.exe -f -m- -xA000,128 -xC000,256

Interestingly, if I disable the internal cache in the BIOS and enable it from DOS using your suggested command:

cyrix.exe -e -f -m- -xA000,128 -xC000,256

I end up with BARB and FLUSH both active, and Speedsys works normally. As far as I can tell, there's no point in having both enabled.

I've got a little batch file that runs on boot and asks me whether I want to enable the cache (y/n), so that I don't have to use the BIOS's cache implementation. I've heard it can be buggy even when the BIOS claims to support the 486DLC.

it does:

cyrix.exe -a- -b- -c- -e- -f- -i1 -i2 -i3 -i4 -k- -m- -r --> first to put the processor in a known state
cyrix.exe -a -c -e -f -m -r- -xA000,128 -xC000,256 --> if I say I want it in flush mode
cyrix.exe -b -c -e -xA000,128 -xC000,256 --> if I say I want it in barb mode

Unless I'm missing something, there's no point to -a when using BARB, since I've got the 1M boundaries off anyway.
(I know that could all be one line, but it made more sense to me to first disable whatever might be in the excluded ranges and then add back what I actually want there.)
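
The boot-time choice described above boils down to picking one of two command sequences. A minimal sketch of that selection logic follows. The command strings are copied verbatim from this post and their flags are not interpreted; the `cache_commands` function and the mode names are hypothetical, and the original is a DOS batch file, not Python.

```python
# Command lines copied verbatim from the post; flag meanings are not
# interpreted here. `cache_commands` and the mode names are illustrative only.
RESET = "cyrix.exe -a- -b- -c- -e- -f- -i1 -i2 -i3 -i4 -k- -m- -r"
MODES = {
    "flush": "cyrix.exe -a -c -e -f -m -r- -xA000,128 -xC000,256",
    "barb": "cyrix.exe -b -c -e -xA000,128 -xC000,256",
}


def cache_commands(mode: str) -> list[str]:
    """Return the commands to run, in order, for the requested mode.

    The reset command always comes first so the CPU starts from a known
    state; "off" runs only the reset command.
    """
    if mode == "off":
        return [RESET]
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [RESET, MODES[mode]]


for line in cache_commands("flush"):
    print(line)
```

Keeping the reset as a separate first step mirrors the reasoning in the post: start from a known state, then add back only the options wanted for the chosen mode.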

Reply 88 of 95, by aspiringnobody

Rank Member

So, new discovery: I changed my BARB setup to include -a, -m, and -r- (to take advantage of the fact that I know the A20 line works). Speedsys doesn't work this way either! So if -m and -r- are both enabled, Speedsys chokes. Interestingly, it seems to remember the ISA scan from its first run: if I start it without -m and -r-, close it, and then enable those two options in FLUSH or BARB mode, it won't crash again until I reboot the PC.

It must do the ISA scan once per boot and keep track of that somehow (or something more complex is going on that's beyond my level of knowledge).

Reply 89 of 95, by aspiringnobody

Rank Member

I purchased some Toshiba TMM2063P-10 chips from eBay. I did verify that at least at one time Toshiba did actually make that part number as a 10ns chip. The seller is in one of the cities right by the port of LA, so it is certainly a Chinese importer. We shall see. Hopefully they are legit.

I also bought a couple of "HY658256S-12" chips. I put that in quotes because I don’t think they were ever made at that speed, so they’re probably 15ns or 20ns chips that have been repainted. I’m hoping they are at least 15ns chips; they are a different brand from my other 15ns chips, so perhaps they will fare differently. They’re less desirable, though, because if I use 32kx8 tag RAM I have to run 256k of cache. Hopefully the 8kx8 chips are real, and then I can run 128k.

Reply 90 of 95, by feipoa

Rank l33t++

I've run into speedsys bugs on DLC/SXL motherboards often and don't give it much thought. Yeah, sometimes the SP switch allows it to run. I find Windows tests more meaningful for testing stability.

Have you read through my post here? Register settings for various CPUs
I continue to update it when I find new information.

This month I added the hardware modification guide from Ernie van der Meer, which I've had all along but didn't scan until a few years ago. Another setting I played with recently that has worked well was:

However, instead of disabling caching of the first 64 KB at each megabyte boundary, you can probably get away with just not caching the first 64 KB above the first 1 MB boundary, using -x10000,64 instead of -m-

I had noticed that some DOS benchmarks ran slower when some of the 1 MB boundaries were not cached, so I looked into this further and found that not caching just the first 1 MB boundary was usually sufficient.
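
Why the first 64 KB above 1 MB is special can be shown with a little address arithmetic. In real mode, segment:offset addresses can reach slightly past 1 MB, and with address line A20 masked they wrap around to the bottom of memory. The sketch below only illustrates that wraparound. How cyrix.exe parses the `-x10000,64` argument is not shown in this thread, so reading `10000` as a segment-style value (address divided by 16) is an assumption.

```python
A20_BIT = 1 << 20  # address line A20 carries the 1 MB bit


def linear(segment: int, offset: int) -> int:
    """Real-mode linear address before any A20 masking."""
    return segment * 16 + offset


def with_a20_masked(addr: int) -> int:
    """Address seen by memory when A20 is forced low."""
    return addr & ~A20_BIT


top = linear(0xFFFF, 0xFFFF)  # highest address reachable in real mode
region_start = 0x10000 * 16   # assumed meaning of "10000" in -x10000,64
region_end = region_start + 64 * 1024 - 1

print(f"highest real-mode address:    {top:#x}")
print(f"same address with A20 masked: {with_a20_masked(top):#x}")
print(f"assumed excluded region:      {region_start:#x}..{region_end:#x}")
```

If the CPU's internal cache and the rest of the system disagree about whether A20 is masked, lines cached from this window can alias the wrong memory, which is presumably why excluding just this one window is often enough.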

By the way, according to the datasheet, TMM2063P-10 is 100 ns, not 10 ns. https://datasheet4u.com/datasheet-pdf/Toshiba … .php?id=1091843

Plan your life wisely, you'll be dead before you know it.

Reply 91 of 95, by aspiringnobody

Rank Member
feipoa wrote on 2023-08-09, 08:07:

By the way, according to the datasheet, TMM2063P-10 is 100 ns, not 10 ns. https://datasheet4u.com/datasheet-pdf/Toshiba … .php?id=1091843

Man why you gotta rain on my parade like that?

Reply 92 of 95, by Deunan

Rank Oldbie
aspiringnobody wrote on 2023-08-09, 01:53:

So, new discovery: I changed my BARB setup to include -a, -m, and -r- (to take advantage of the fact that I know the A20 line works). Speedsys doesn't work this way either! So if -m and -r- are both enabled, Speedsys chokes. Interestingly, it seems to remember the ISA scan from its first run: if I start it without -m and -r-, close it, and then enable those two options in FLUSH or BARB mode, it won't crash again until I reboot the PC.

It must do the ISA scan once per boot and keep track of that somehow (or something more complex is going on that's beyond my level of knowledge).

How do you know for sure that the A20M signal is connected to the CPU? Did you check whether that pin is actually wired to anything at all? Just because some programs seem to work doesn't mean it's all good.
Run DOOM and let it just play the attract loop. It's a pretty darn good test for these CPUs, and for mobo cache on 386/486 systems in general. If DOOM hangs on load, you can suspect your A20M is not actually working; test that by adding -m- and see if it still hangs.
If DOOM starts but eventually hangs at some point (usually after minutes, sometimes more), then your mobo cache timings are too tight, or the chips are not what they claim to be. Note that some mobos will work with tighter timings when a smaller amount of cache memory is installed, but will need more wait states for a bigger cache. The threshold is usually the step from 128k to 256k, but it depends on the mobo and chips.

Frankly, not many chipsets actually support DLC-type CPUs properly. The ALI M1217 does, but it's an SX/SLC-class chipset. The rest often offer some BIOS options, but it turns out these cannot be used, either at all or without HW mods (connecting missing signals).

As for BARB, it will only work properly if the mobo asserts HOLD on the CPU on _ALL_ DMA cycles. Some newer chipsets allow the CPU to run from cache while DMA is taking place in RAM (as long as it's always hitting the cache and doesn't need RAM access, which usually implies the mobo cache works in WB mode, not WT). In that case the on-CPU cache will not be flushed properly. However, this can only happen during DMA cycles, such as reading from floppy or with SCSI HDD cards. If you are running pure PIO IDE and not using the floppy, and BARB still doesn't work properly, then I would assume it's not really about flushing the cache but something else. Some signal timings?

BTW, for BARB on chipsets that always assert HOLD, you need to enable the Hidden RAM Refresh option in the BIOS or suffer a significant performance penalty.
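
The BARB failure mode described above can be sketched with a toy model. Here BARB is modelled simply as "invalidate the whole internal cache whenever the bus is held for DMA"; the class and method names are hypothetical, and real hardware is far more involved.

```python
class ToySystem:
    """Toy model: RAM, a CPU-internal cache, and DMA writes to RAM."""

    def __init__(self) -> None:
        self.ram: dict[int, int] = {}
        self.cache: dict[int, int] = {}

    def cpu_read(self, addr: int) -> int:
        # A hit is served from the internal cache without touching RAM.
        if addr not in self.cache:
            self.cache[addr] = self.ram.get(addr, 0)
        return self.cache[addr]

    def dma_write(self, addr: int, value: int, asserts_hold: bool) -> None:
        # The DMA controller writes RAM directly, bypassing the CPU cache.
        self.ram[addr] = value
        if asserts_hold:
            # BARB-style behaviour: a bus hold flushes the internal cache.
            self.cache.clear()


def run_scenario(asserts_hold: bool) -> tuple[int, int]:
    """Cache a value, overwrite it by DMA, and return (cpu_view, ram_value)."""
    system = ToySystem()
    system.ram[0x1000] = 1
    system.cpu_read(0x1000)                     # line is now cached
    system.dma_write(0x1000, 2, asserts_hold)   # e.g. a floppy transfer
    return system.cpu_read(0x1000), system.ram[0x1000]


for asserts_hold in (True, False):
    seen, actual = run_scenario(asserts_hold)
    status = "fresh" if seen == actual else "STALE"
    print(f"HOLD asserted={asserts_hold}: CPU reads {seen} ({status})")
```

In this model the CPU only sees the DMA-written value when the hold is visible to it, which is the condition the post says BARB depends on.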

Reply 93 of 95, by pshipkov

Rank Oldbie

One way to make Speedsys work with DLC/SXL processors is to enter and exit Windows 3.1 within the same session.
Roughly 50% success ratio, depends on the motherboard/bios.
But not sure why.

retro bits and bytes

Reply 94 of 95, by aspiringnobody

Rank Member
Deunan wrote on 2023-08-09, 13:52:

How do you know for sure that A20M signal is connected to CPU? Did you check if that pin is actually wired to anything at all? Just because some programs seem to work doesn't mean it's all good.

I did trace out the A20 pin. It goes to pin 22 of the keyboard controller, so it’s definitely wired correctly. DOOM runs fine with sound; I let it go for over an hour. I’m also able to play an MP3 in Windows 3.11 while accessing the floppy drive repeatedly, which doesn’t work at all if the cache is intentionally misconfigured. I think it’s a Speedsys problem. Other versions of my motherboard have VLB slots, so it’s a very late board; it has a 486 chipset.