VOGONS


First post, by bloodbath2you

Rank Newbie

Hi, I have an M396F v2.7 board with an AM386SX-40 and an ITT XC87SLC-33 math coprocessor.

https://theretroweb.com/motherboards/s/pcchips-m396f-v2.7

This board had the typical battery leak issue and I had to fix some traces. I also modded it so it could use 3.3 V coin batteries.

The thing is, I have a feeling that it is not performing as it should. With the LM20 benchmark I get 39 MHz AT performance, which seems quite low to me, but I could be wrong.

Right now it uses the stock BIOS, with some tweaks made in AMISETUP, mostly to enable slow refresh. There is little difference from the stock settings anyway.

Any experience with this board?

Reply 1 of 15, by keenmaster486

Rank l33t

Remember that the SX has a 16-bit bus and LM20 is a 16-bit benchmark, and the SX performs like a 286 clock for clock on 16-bit code, so LM20 reading it as a "~40 MHz AT" is exactly what I would expect from a system like that.

World's foremost 486 enjoyer.

Reply 2 of 15, by Thandor

Rank Member

Agree with keenmaster486; think of it as a very fast 286 with the bonus of running EMM386 and such (much easier to configure to have a lot of base RAM free).

I benchmarked my SX40 and found it’s in the ballpark of a 386DX25.

thandor.net - hardware
And the rest of us would be carousing the aisles, stuffing baloney.

Reply 3 of 15, by theelf

Rank Oldbie

I have the same 386 motherboard. Expect it to be a little faster than a 286-25 MHz, not much more.

Reply 4 of 15, by MikeSG

Rank Oldbie

The main bottleneck in a 286/386SX is memory speed. Slow systems may measure around 10 MB/s and fast systems around 20 MB/s in Speedsys, which is included in the DOS Benchmark Pack (PhilsComputerLab).

You can lower the DRAM wait states in the BIOS, fit faster DRAM (60 ns) if the system isn't stable with those settings, or increase the crystal (XTAL) frequency slightly (although it looks like a fixed chip on yours).

On some motherboards the 387 math coprocessor slows down the CPU.
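To put the Speedsys figures and the wait-state advice in context, here is a rough sketch of the theoretical bandwidth ceiling of a 16-bit 386SX bus. It assumes the 2-clocks-per-bus-cycle best case mentioned later in this thread, one extra clock per wait state, and 2 bytes moved per cycle; refresh, page misses and instruction overhead are ignored, and `peak_bandwidth_mb_s` is a hypothetical helper, not part of Speedsys or any other benchmark.

```python
def peak_bandwidth_mb_s(clock_mhz: float, wait_states: int, bus_bytes: int = 2) -> float:
    """Theoretical peak transfer rate in MB/s (1 MB = 10**6 bytes).

    Assumption: a 386-style bus needs 2 clocks per bus cycle plus one
    clock per wait state, and moves bus_bytes per cycle (2 on a 386SX).
    """
    clocks_per_cycle = 2 + wait_states
    cycles_per_second = clock_mhz * 1e6 / clocks_per_cycle
    return cycles_per_second * bus_bytes / 1e6


# Ceiling for a 386SX-40 at 0, 1 and 2 wait states.
for ws in range(3):
    print(f"386SX-40, {ws} WS: {peak_bandwidth_mb_s(40, ws):.1f} MB/s peak")
```

Measured figures will sit well below these ceilings, since real memory traffic includes page misses and refresh cycles; the sketch mainly shows how much each removed wait state is worth.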

Reply 5 of 15, by konc

Rank l33t

I don't know anything about this specific board but what makes SX appear so much slower than a DX is usually the lack of cache

Reply 6 of 15, by CharlieFoxtrot

Rank Oldbie
konc wrote on 2026-02-09, 13:57:

I don't know anything about this specific board but what makes SX appear so much slower than a DX is usually the lack of cache

It is the external 16-bit data bus that doesn't just make the SX appear slower, but actually makes it slower. The 386SX really is roughly the same speed as a 286 clock for clock in normal applications. Some of the fastest 286 chipsets with fast memory could beat most 386SX platforms at the same clock speed. And compared to 386DX the performance hit was roughly 50%, sometimes a bit less depending on how memory- and IO-intensive the software was.

Significantly faster performance wasn't the point of the 386SX. It was to provide a cheap way to get all the good features of the 386DX, such as memory management. The big price advantage of the 386SX came from the fact that chipset and motherboard manufacturers could pretty much take their 286 designs and modify them only slightly for the 386SX. 386DX motherboards and chipsets were much more complex, and thus very expensive, when the 386SX was released.

Reply 7 of 15, by mkarcher

Rank l33t
CharlieFoxtrot wrote on 2026-02-09, 17:41:

And compared to 386DX the performance hit was roughly 50%, sometimes a bit less depending on how memory- and IO-intensive the software was.

This number seems quite high if you run 16-bit software. As the 386 core does not have an internal cache, it performs every data memory cycle exactly as instructed by the application software; that is, even the 386DX does not perform any data memory cycle using 32 bits at once unless instructed to by software. 16-bit software mostly does not instruct a 386DX processor to perform 32-bit cycles. There are a couple of 16-bit instructions that do access 32 bits of data at once (like indirect FAR jumps and calls, LDS, LES and LSS), but these instructions are typically not used often enough for the 32-bit memory bus of the 386DX to make a notable difference.

There are some advantages of the 32-bit bus, though. I deliberately wrote "data memory cycles" in the previous paragraph, because the prefetch queue of the 386DX is four 32-bit words long and is refilled using 32-bit cycles. Instruction fetch bandwidth is thus twice as high on a 386DX system as on a 386SX system. Furthermore, if EMM386 or another virtual memory manager is active (without doubt one of the main selling points even of the 386SX), the processor needs to fetch memory mapping information from the page table. The page table entries are 32 bits wide and aligned, so they can be fetched in 1 cycle on the 386DX, but require 2 cycles on the 386SX.
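The bus cycle counts from the two paragraphs above can be tallied with a small sketch. It assumes naturally aligned transfers (a misaligned operand can cost an extra cycle on either chip) and counts only bus cycles, not wait states; `bus_cycles` is a hypothetical helper.

```python
SX_BUS_BYTES = 2  # 386SX external data bus: 16 bits
DX_BUS_BYTES = 4  # 386DX external data bus: 32 bits


def bus_cycles(transfer_bytes: int, bus_bytes: int) -> int:
    """Bus cycles needed to move an aligned block of transfer_bytes.

    Ceiling division: a partially used bus width still costs a full cycle.
    """
    return -(-transfer_bytes // bus_bytes)


transfers = {
    "16-bit data word (typical 16-bit code)": 2,
    "32-bit page table entry": 4,
    "prefetch refill, four 32-bit words": 16,
}
for name, size in transfers.items():
    print(f"{name}: SX {bus_cycles(size, SX_BUS_BYTES)} cycles, "
          f"DX {bus_cycles(size, DX_BUS_BYTES)} cycles")
```

Only the 32-bit items differ, which matches the argument: 16-bit data accesses cost the same on both chips, while page table lookups and prefetch refills take twice as many bus cycles on the SX.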

There are other processor-initiated operations that make use of the 32-bit bus, like fetching interrupt vectors or hardware task switching. These operations should not dominate processor performance in a sensibly designed system (and if your interrupt rate is high enough for this to matter, you had better not use EMM386 at all: a 286 in real mode will likely beat a 386DX with EMM386 at handling interrupts by a big margin).

I do believe, though, that a typical 386DX-33 system back in the day was 50% faster than a similarly clocked 386SX-33 system, but the key point is that the 386DX platform was higher end and most of its mainboards provide cache, whereas many 386SX mainboards are budget boards without cache. As the 386 bus protocol requires 2 clocks per bus cycle in the optimal case (the 286 protocol does, too), running a 386 at 33 MHz with 0WS requires a data transfer every 60 ns. While this rate is achievable when staying within a DRAM page, uncached 386 systems often didn't run at 0WS back in the day.
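The 60 ns figure follows directly from the clock period. A minimal sketch, assuming the 2-clocks-per-cycle best case quoted above and one extra clock per wait state; `ns_per_bus_cycle` is a hypothetical helper. With a nominal 33 MHz clock it gives about 60.6 ns, and exactly 60 ns at 33.33 MHz.

```python
def ns_per_bus_cycle(clock_mhz: float, wait_states: int = 0) -> float:
    """Duration of one bus cycle in ns: 2 clocks (best case) plus wait states."""
    clock_ns = 1000.0 / clock_mhz
    return (2 + wait_states) * clock_ns


# How often the memory must deliver data at common 386 clock speeds.
for mhz in (25, 33, 40):
    print(f"{mhz} MHz: 0WS {ns_per_bus_cycle(mhz):.1f} ns, "
          f"1WS {ns_per_bus_cycle(mhz, 1):.1f} ns per transfer")
```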

The importance of the cache is confirmed by a recent video by Adrian Black on Adrian's Digital Basement. In that video, Adrian showed a cached 386SX-40 that achieved 61 MHz in Landmark 6.0, which is way higher than the 39 reported in this thread.

Reply 8 of 15, by CharlieFoxtrot

Rank Oldbie
mkarcher wrote on 2026-02-09, 19:49:
CharlieFoxtrot wrote on 2026-02-09, 17:41:

And compared to 386DX the performance hit was roughly 50%, sometimes a bit less depending on how memory- and IO-intensive the software was.

[...]

The importance of the cache is confirmed by a recent video by Adrian Black on Adrian's Digital Basement. In that video, Adrian showed a cached 386SX-40 that achieved 61 MHz in Landmark 6.0, which is way higher than the 39 reported in this thread.

The Landmark speed test is so inaccurate at representing actual performance that it is not even funny. It is a small, simple arithmetic test that depends on clock frequency, and it can practically run from a small L1 if you have one. For example, in my experience with a 486 you get zero to very little difference with L2 cache, because Landmark doesn't actually use the data bus that much and probably runs directly from L1. The graphics performance shown is also useless, as it is based on text modes, something graphics chip manufacturers probably didn't bother to optimize much after the late 80s. Because the benchmark is widely known to be so unreliable, I wouldn't draw big conclusions from Adrian's result vs. what OP has, and it is certainly not an apples-to-apples comparison. My guess is that the small chipset cache gives a huge advantage in that specific test, although in real life the difference is much more modest.

Anyway, it is a fact that the 286 and 386SX are roughly equal in performance, both without a cache. A 386DX, with or without cache, absolutely wipes the floor with a 286 clock for clock, which means that it beats the 386SX too.

The data bus effect is more significant with the 386SX/DX than with the 8088/8086, although the 8086 still performs significantly better than the 8088. Again, in the worst case an 8-bit data bus halves performance, but in practice the penalty is perhaps 25-30%. For example, an 8 MHz 8086 XT is roughly equal to a 10.5-11 MHz 8088.
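The 8086/8088 equivalence above can be turned into a per-clock penalty. This sketch assumes performance scales linearly with clock speed; `per_clock_ratio` is a hypothetical helper and the MHz figures are the ones quoted in the post.

```python
def per_clock_ratio(ref_mhz: float, equivalent_mhz: float) -> float:
    """Per-clock speed of the slower chip relative to the reference chip.

    If a ref_mhz reference chip matches the slower chip running at
    equivalent_mhz, the slower chip does ref_mhz / equivalent_mhz as much
    work per clock (assuming linear scaling with clock).
    """
    return ref_mhz / equivalent_mhz


# Figures from the post: an 8 MHz 8086 is roughly a 10.5-11 MHz 8088.
for mhz_8088 in (10.5, 11.0):
    r = per_clock_ratio(8.0, mhz_8088)
    print(f"8088 at {mhz_8088} MHz: {r:.0%} of 8086 per-clock speed ({1 - r:.0%} slower)")
```

That works out to roughly a 24-27% per-clock penalty, in line with the 25-30% estimate and well short of the 50% worst case.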

Edit: If you go by the Landmark numbers, I'd say that OP's result is a tad low. If my memory doesn't completely fail me, I get something like 24+ MHz on my IBM PS/2 Model 40 (386SX-20), so OP should get closer to 50 MHz in Landmark, if we assume that the test scales correctly and is reliable (which it is not). Then again, Adrian's result seems to be just over the top for a 386SX. I have a 386DX-33 with 64 kB cache on a fast SiS "rabbit" chipset motherboard, and if I remember correctly it got something around 70 MHz in that test, again showing that Landmark doesn't use the data bus much and simply depends on some simple arithmetic that scales well with clock speed and fits in a very small cache.
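The scaling estimate in the edit can be written out as a short sketch. It assumes, as the post itself cautions, that Landmark scores scale linearly with clock speed; since the PS/2 figure is "24+", the expected value is a lower bound. `scaled_score` is a hypothetical helper.

```python
def scaled_score(ref_score: float, ref_mhz: float, target_mhz: float) -> float:
    """Benchmark score scaled linearly from ref_mhz to target_mhz."""
    return ref_score * target_mhz / ref_mhz


# Figures from the post: ~24 on a 386SX-20 (PS/2 Model 40), OP's 39 at 40 MHz.
expected = scaled_score(24, 20, 40)
print(f"expected at 40 MHz: {expected:.0f}")        # 48
print(f"OP's result: {39 / expected:.0%} of that")  # 81%
```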

Reply 10 of 15, by Jo22

Rank l33t++
badmojo wrote on Yesterday, 07:39:

When I want to gauge the performance of my retro PCs I use this: https://docs.google.com/spreadsheets/d/1lvF9n … dit?gid=0#gid=0

Thanks, that's a great list! 🙂

There's one thing that's a bit unclear to me: was PC-Player Benchmark run in VGA mode or not?
That's because PCPBENCH uses VBE mode 100h by default, which is 640x400 pixels in 256 colors.

Performance can be boosted if a linear frame buffer is enabled by the graphics card and VBE BIOS.
The benchmark even mentions and recommends mode 101h (640x480 in 256 colors) as an alternative if the graphics card can't do it (with optional black bars to simulate 640x400 in 256 colors).

I just wonder, because many users use the /VGAMODE switch nowadays to run the benchmark, which was supported, of course, but on the other hand wasn't really the main purpose of the benchmark back in the day anymore.

By ca. 1995, the Pentium processor, protected-mode games (VBE 2 compatible), PCI graphics cards, VR glasses and 3D graphics were the new thing.
Mode 13h (320x200 pels in 256 colors), by contrast, was getting a bit old by that time...

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 11 of 15, by konc

Rank l33t
CharlieFoxtrot wrote on 2026-02-09, 17:41:
konc wrote on 2026-02-09, 13:57:

I don't know anything about this specific board but what makes SX appear so much slower than a DX is usually the lack of cache

It is the external 16-bit data bus that doesn't just make the SX appear slower, but actually makes it slower. The 386SX really is roughly the same speed as a 286 clock for clock in normal applications. Some of the fastest 286 chipsets with fast memory could beat most 386SX platforms at the same clock speed. And compared to 386DX the performance hit was roughly 50%, sometimes a bit less depending on how memory- and IO-intensive the software was.

Significantly faster performance wasn't the point of the 386SX. It was to provide a cheap way to get all the good features of the 386DX, such as memory management. The big price advantage of the 386SX came from the fact that chipset and motherboard manufacturers could pretty much take their 286 designs and modify them only slightly for the 386SX. 386DX motherboards and chipsets were much more complex, and thus very expensive, when the 386SX was released.

What I meant is that often people measure the performance of a motherboard with an SX CPU and find it much slower than another one with a DX, only because the DX had cache. In reality, for example with the cache disabled on the board with the DX, the difference would be smaller.
(I don't agree with the "50%, sometimes a bit less" numbers at all btw, but that's not the point now)

Reply 12 of 15, by bloodbath2you

Rank Newbie

Hi guys, thanks for all the replies, I really appreciate it 😁

mkarcher mentioned Adrian Black; in fact, this video is what made me wonder and start this thread:
https://youtu.be/L7tnAgL4xIs?si=5If97huavgkbe-oh

In the last third of the video he tests the same board I have. However, it has a TI486SLC with L1 cache, so of course performance would be different.

Reply 13 of 15, by mkarcher

Rank l33t
bloodbath2you wrote on Yesterday, 22:52:

Hi guys, thanks for all the replies, I really appreciate it 😁

mkarcher mentioned Adrian Black; in fact, this video is what made me wonder and start this thread:
https://youtu.be/L7tnAgL4xIs?si=5If97huavgkbe-oh

In the last third of the video he tests the same board I have. However, it has a TI486SLC with L1 cache, so of course performance would be different.

Adrian's most common 16-bit PC test board is shown in that video at 4:18. It does have a "proper" 386SX on it, at 40 MHz, and its Macronix chipset has 8 KB of integrated cache. You are talking about the "third mainboard" mentioned in that video, which indeed has a TI-branded Cyrix 486SLC with 1 KB of cache integrated in the processor, which should be even faster than the external cache on Adrian's favorite board. Both processors show similar CPU scores in Landmark.

Anyway, CharlieFoxtrot is correct that the typical simple PC benchmarks of the day (Landmark, Norton SI, a similar one in PC Tools) are very bad at representing actual PC performance, and as they only use very little memory, they overestimate the effect of even the smallest caches. I used to have a laptop with a clock-doubled 486SLC running at 50 MHz. With any serious application payload, that processor is waiting for memory all the time, and the miserable 1 KB of L1 cache does not help very much because the hit rate is too low. On the other hand, while executing a tiny benchmark, you will get a 100% cache hit rate and really great performance numbers.
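Why a tiny benchmark flatters a tiny cache can be sketched with the usual average-access-time model. The 1-clock hit cost, 6-clock miss cost and 60% hit rate below are hypothetical placeholders chosen only to show the shape of the effect, not measurements of any 486SLC system; `avg_access_clocks` is a hypothetical helper.

```python
def avg_access_clocks(hit_rate: float, hit_clocks: float, miss_clocks: float) -> float:
    """Average cost of a memory access in core clocks for a given cache hit rate."""
    return hit_rate * hit_clocks + (1.0 - hit_rate) * miss_clocks


# Hypothetical costs, for illustration only.
HIT_CLOCKS, MISS_CLOCKS = 1.0, 6.0

for label, hit_rate in [("tiny benchmark loop", 1.0), ("larger working set", 0.6)]:
    cost = avg_access_clocks(hit_rate, HIT_CLOCKS, MISS_CLOCKS)
    print(f"{label}: {hit_rate:.0%} hits -> {cost:.1f} clocks per access")
```

With these placeholder numbers the benchmark sees every access at cache speed, while the larger workload averages three times that cost, so the benchmark score says little about real applications.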

I consider putting a clock doubler into the 486SLC with just 1 KB of cache one of the most technologically pointless processor design ideas of the early 90s. I used to describe it as "that processor waits twice as fast for the memory access to complete at the same slow speed", compared to a non-clock-doubled 486SLC. It might have made a lot of sense from a marketing perspective, though, because you could sell your slightly pimped 386SX-25 machines as a "486 running at 50 MHz", which seemed to beat the mainstream 486DX-33 machines.
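The "waits twice as fast" quip can be made concrete with a toy timing model. Everything here is an assumption for illustration: 2 core clocks of work per instruction, a fixed 80 ns bus access (one 2-clock cycle on a 25 MHz bus), and a hypothetical `instr_time_ns` helper; real 486SLC2 timings differ.

```python
def instr_time_ns(core_mhz: float, compute_clocks: int,
                  bus_accesses: int, access_ns: float) -> float:
    """One instruction: core-clocked work plus bus accesses of fixed latency."""
    return compute_clocks * 1000.0 / core_mhz + bus_accesses * access_ns


# Hypothetical: a bus access is one 2-clock cycle on a 25 MHz bus (80 ns),
# unchanged whether the core runs at 25 MHz or is clock-doubled to 50 MHz.
ACCESS_NS = 2 * 1000.0 / 25

for label, accesses in [("all L1 hits", 0), ("one bus access", 1)]:
    t25 = instr_time_ns(25, 2, accesses, ACCESS_NS)
    t50 = instr_time_ns(50, 2, accesses, ACCESS_NS)
    print(f"{label}: {t25:.0f} ns vs {t50:.0f} ns, speedup {t25 / t50:.2f}x")

# The same 80 ns wait costs twice as many core clocks at 50 MHz.
print(ACCESS_NS * 25 / 1000, "vs", ACCESS_NS * 50 / 1000, "core clocks waiting")
```

In this model the doubler gives the full 2x only when everything hits in L1; as soon as an instruction touches the bus, the gain shrinks to about 1.33x, and the core simply burns twice as many clocks on the same wait.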

Reply 14 of 15, by dionb

Rank l33t++
mkarcher wrote on Today, 00:32:

[...]

I consider putting a clock doubler into the 486SLC with just 1 KB of cache one of the most technologically pointless processor design ideas of the early 90s. I used to describe it as "that processor waits twice as fast for the memory access to complete at the same slow speed", compared to a non-clock-doubled 486SLC. It might have made a lot of sense from a marketing perspective, though, because you could sell your slightly pimped 386SX-25 machines as a "486 running at 50 MHz", which seemed to beat the mainstream 486DX-33 machines.

You can go one worse: take a clock-doubled or even tripled 486SLC with internal and (16b) external cache and give it a (16b only) VLB bus 😜

Long live the glorious idiocy of the IBM/Alaris Leopard. Absolutely beautiful board though

Reply 15 of 15, by mkarcher

Rank l33t
dionb wrote on Today, 10:52:

You can go one worse: take a clock-doubled or even tripled 486SLC with internal and (16b) external cache and give it a (16b only) VLB bus 😜

Long live the glorious idiocy of the IBM/Alaris Leopard. Absolutely beautiful board though

Well, that actually seems to make some sense to me.

Granted, a clock-tripled 486SLC would be completely pointless if it had just 1 KB of Cyrix L1. You seem to be talking about the IBM variant with 16 KB of L1 cache, though. Furthermore, I can't find any clock-tripled versions of the IBM SLC-derived chip on the Leopard board. That would be interesting, because Wikipedia currently claims that there are no "real" clock-tripled 486SLC chips in the proper 386SX 100-pin BQFP package, and that the only 486SLC3 chips are actually 486BL3 ("Blue Lightning") chips in a 132-pin PQFP with only 24 address / 16 data lines hooked up to the mainboard. If there is no 132-PQFP version of the Leopard board, a clock-tripled processor would require an interposer. A clock-doubled SLC with 16 KB cache is obviously kind of meh compared to a 486DX2 at the same clock speed, but apart from marketing, the 486SLC2 didn't actually compete against same-clock 486 systems; it competed against entry-level 386DX-40 and 486SX-25 systems.

The 16-bit VL bus makes a lot of sense in my opinion. Given that entry-level VL graphics cards (Cirrus 542x, Tseng ET4000AX) only supported 16-bit transfers on the VL bus anyway, cutting the VL bus to 16 bits did not limit performance with those chips, and providing access to the graphics card at 25 MHz on the local bus instead of 8 MHz on the ISA bus actually did address a pain point on 386SX systems.