486 SX 16 & 20 MHz

Reply 100 of 107, by Am386DX-40

Posted on 2021-03-08, 16:12

Rank: Member
Posts: 139
Joined: 2011-03-27, 02:50

Doom just loves the 486 architecture. The SX-16 is very close to the DX-40 in Doom, but quite behind in the rest of the benchmarks.
Interesting results though, as I was competing with my friend's SX-25 when I was young (I had a DX-40). Mine had 128 KB of external cache and his didn't have any, so they were closer, but the cache-less SX-25 was still a bit ahead.

Last edited by Am386DX-40 on 2021-03-08, 16:13. Edited 1 time in total.

Reply 101 of 107, by mpe

Posted on 2021-03-08, 16:12

Rank: Oldbie
Posts: 1084
Joined: 2018-06-12, 21:19
Location: London, UK

weedeewee wrote on 2021-03-08, 16:04:

mpe wrote on 2021-03-08, 14:03:

...and the 486SX-16 is more or less equal to 486SX-16.

Any comments are welcome.

Just imagine a world where it isn't ... 😉

Edited. Thanks!

Blog|NexGen 586|S4

Reply 102 of 107, by mpe

Posted on 2021-03-08, 16:41

Rank: Oldbie
Posts: 1084
Joined: 2018-06-12, 21:19
Location: London, UK

douglar wrote on 2021-03-08, 16:00:

What video card and storage did you use?
Can you do a comparison of ISA vs VLB from 386 to faster 486s? I was always curious to see evidence of a clear MHz inflection point where VLB really starts to become important rather than the 486-66dx2 rule of thumb.
Anyway you can quantify the % performance change for both the 386 & 486 chips from the OPTi 496XLC chipset compared against better performing chipsets?

I used Tseng ET4000 W32/p based VGA which is a very fast card.

I have yet to quantify the effect of VL-Bus on 386 graphics performance, but I don't expect it to make a big difference. VLB doesn't support burst transfers when deployed with a 386, and bursts are one of the main benefits of VLB.

At the same time, the 486SX-16 doesn't benefit from the faster bus either, as the bandwidth difference isn't that big compared to, let's say, an overclocked 13 MHz ISA bus you can have on a 386. Also, the slow CPU can't generate enough graphics data to saturate the bus like a fast 486DX4 or Pentium can.

Reply 103 of 107, by douglar

Posted on 2021-03-08, 18:27

Rank: Oldbie
Posts: 1373
Joined: 2019-11-04, 15:37

mpe wrote on 2021-03-08, 16:41:

At the same time, the 486SX-16 doesn't benefit from the faster bus either, as the bandwidth difference isn't that big compared to, let's say, an overclocked 13 MHz ISA bus you can have on a 386. Also, the slow CPU can't generate enough graphics data to saturate the bus like a fast 486DX4 or Pentium can.

A 50% overclock on the ISA bus (8 MHz to 12 MHz) will certainly show some improvement. It definitely moves the needle on bus-limited benchmarks.

But you don't have to saturate the bus to see a significant benefit on CPU-bound tasks. DOS games are essentially single-threaded systems with few options for hiding latency.

Going from ISA to VLB should reduce the I/O latency of each I/O call from 375 ns to ~30 ns, and should halve the total number of I/O calls because the bus is 32 bits wide instead of 16.

So, for example, say you have a game running on ISA that spends 40% of its CPU time just pushing pixels and 5% doing other I/O, and the frame rate is CPU-limited because there isn't enough CPU power left over for all the game logic.

If you switch from ISA to VLB, it seems reasonable to me that the CPU overhead of pushing pixels could drop from 40% to less than 10%. That frees up CPU time for game logic, which could make a big difference in frame rates.
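
To put a rough number on that, here is a minimal sketch of the arithmetic in Python. It assumes the frame is purely CPU-bound and that only the pixel-pushing share of the frame time gets cheaper (an Amdahl-style estimate). The function name `frame_speedup` and the 0.25 cost scale (40% of the old frame time shrinking to 10% of it) are illustrative assumptions, not measurements.

```python
def frame_speedup(pixel_share, pixel_cost_scale):
    """Estimate the frame-rate gain when only pixel pushing gets cheaper.

    pixel_share      -- fraction of the old frame time spent pushing pixels
    pixel_cost_scale -- new pixel cost divided by old pixel cost (assumed)
    """
    if not 0.0 <= pixel_share <= 1.0:
        raise ValueError("pixel_share must be between 0 and 1")
    if pixel_cost_scale < 0.0:
        raise ValueError("pixel_cost_scale must not be negative")
    # Everything that is not pixel pushing costs the same as before.
    new_frame_time = (1.0 - pixel_share) + pixel_share * pixel_cost_scale
    return 1.0 / new_frame_time


# 40% of the frame spent on pixels, assumed to drop to a quarter of its cost.
print(f"estimated speedup: {frame_speedup(0.40, 0.25):.2f}x")
```

Under those assumptions the estimate comes out at roughly 1.4x, so the gain is real but capped by the 60% of the frame that the bus does not touch.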

https://www.karbosguide.com/books/pcarchitect … %2D2%20MB%2Fsec
The bus has a theoretical bandwidth of about 8 MB per second. However in practise it never exceeds about 1-2 MB/sec. – partly because it takes 2-3 of the processor’s clock pulses to move a packet (16 bits) of data.
One of the reasons the ISA bus was slow was that it only had 16 data channels. The 486 processor, once it was introduced, worked with 32 bits each clock pulse.
Bus    Time per packet
ISA    375 ns
VLB    30 ns
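
As a rough cross-check of those figures, the per-packet times can be turned into peak transfer rates. This is only a sketch: the helper name `peak_mb_per_s` is made up for illustration, the ISA packet is 16 bits as quoted, and the 32-bit VLB packet width is an assumption based on the bus width mentioned above.

```python
def peak_mb_per_s(bytes_per_packet, ns_per_packet):
    """Peak rate in MB/s (10**6 bytes/s) if one packet moves every ns_per_packet."""
    if ns_per_packet <= 0:
        raise ValueError("ns_per_packet must be positive")
    # bytes per nanosecond * 1000 = bytes per microsecond = MB/s
    return bytes_per_packet / ns_per_packet * 1000.0


isa = peak_mb_per_s(2, 375)  # 16-bit packet every 375 ns (quoted figure)
vlb = peak_mb_per_s(4, 30)   # assumed 32-bit packet every 30 ns
print(f"ISA ~{isa:.1f} MB/s, VLB ~{vlb:.1f} MB/s, ratio ~{vlb / isa:.0f}x")
```

With 375 ns per packet, ISA works out to a bit over 5 MB/s. The quoted "about 8 MB per second" would correspond to a 250 ns packet time, so the two numbers in the quote seem to assume different clock counts per transfer.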

*edit -- tried to clear up my logic--

Reply 104 of 107, by jesolo

Posted on 2021-03-08, 19:30

Rank: l33t
Posts: 2104
Joined: 2014-06-24, 19:04
Location: South Africa

Am386DX-40 wrote on 2021-03-08, 16:12:

Doom just loves the 486 architecture. The SX-16 is very close to the DX-40 in Doom, but quite behind in the rest of the benchmarks.
Interesting results though, as I was competing with my friend's SX-25 when I was young (I had a DX-40). Mine had 128 KB of external cache and his didn't have any, so they were closer, but the cache-less SX-25 was still a bit ahead.

Just remember that an Intel 486 (particularly the one you referred to) had 8KB of L1 internal cache. The 386 had no L1 cache, which is why they later on added external cache to 386 motherboards to increase performance.
If you look at, as an example, how a Cyrix 486DLC performed with its 1KB of L1 cache enabled (versus disabled) then you'll see where I'm coming from.

Reply 105 of 107, by Am386DX-40

Posted on 2021-03-08, 19:48

Rank: Member
Posts: 139
Joined: 2011-03-27, 02:50

jesolo wrote on 2021-03-08, 19:30:

Am386DX-40 wrote on 2021-03-08, 16:12:

Doom just loves the 486 architecture. The SX-16 is very close to the DX-40 in Doom, but quite behind in the rest of the benchmarks.
Interesting results though, as I was competing with my friend's SX-25 when I was young (I had a DX-40). Mine had 128 KB of external cache and his didn't have any, so they were closer, but the cache-less SX-25 was still a bit ahead.

Just remember that an Intel 486 (particularly the one you referred to) had 8KB of L1 internal cache. The 386 had no L1 cache, which is why they later on added external cache to 386 motherboards to increase performance.
If you look at, as an example, how a Cyrix 486DLC performed with its 1KB of L1 cache enabled (versus disabled) then you'll see where I'm coming from.

That's true, too. Anyway, IMHO the "correct" way to benchmark them (the most common configuration at the time) is to use the SX with its L1 cache enabled and no external cache, and the 386 with external cache enabled (at least 64 KB, though I think 128 KB was more common, except for some odd chipsets like the one on the Jaguar V, which has 8 KB of cache inside the chipset itself).

Reply 106 of 107, by mpe

Posted on 2021-03-08, 21:12

Rank: Oldbie
Posts: 1084
Joined: 2018-06-12, 21:19
Location: London, UK

douglar wrote on 2021-03-08, 18:27:

Going from ISA to VLB should reduce the I/O latency of each I/O call from 375 ns to ~30 ns, and should halve the total number of I/O calls because the bus is 32 bits wide instead of 16.

While what you quoted is true, the I/O latency expressed in nanoseconds is just another representation of the bus clock speed (its inverse), and that is pretty much where the speed of VLB comes from.

The majority of DOS software only transfers 16 bits at a time when writing to VGA, and some of the most popular VL-Bus controllers, such as the omnipresent CL-GD542x series, only had a 16-bit host interface anyway. So the 32-bit nature of the bus is wasted unless 32-bit writes are used (likely GUI work).
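
A quick sketch of that relationship, assuming the latency is simply the number of bus clocks per transfer divided by the bus clock. The clock counts used here (3 per ISA transfer, 1 per VLB transfer) and the 33.33 MHz VLB clock are assumptions chosen to reproduce the quoted figures, and `ns_per_transfer` is a hypothetical helper.

```python
def ns_per_transfer(bus_mhz, clocks_per_transfer):
    """Time for one bus transfer in nanoseconds."""
    if bus_mhz <= 0 or clocks_per_transfer <= 0:
        raise ValueError("bus_mhz and clocks_per_transfer must be positive")
    # One clock period is 1000 / MHz nanoseconds.
    return clocks_per_transfer * 1000.0 / bus_mhz


print(f"ISA at 8 MHz, 3 clocks:     {ns_per_transfer(8, 3):.0f} ns")
print(f"ISA at 13 MHz, 3 clocks:    {ns_per_transfer(13, 3):.0f} ns")
print(f"VLB at 33.33 MHz, 1 clock:  {ns_per_transfer(33.33, 1):.0f} ns")
```

Seen this way, the overclocked 13 MHz ISA bus narrows the gap but is still several times slower per transfer than a 33 MHz VLB under these assumptions.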

For the same reason EISA cards are no faster than ISA equivalents in DOS.

The main reasons why VL-Bus video is faster are clock speed and the ability to optionally do burst transfers, which somewhat reduce overhead when supported (essentially, you can write 16 bytes and send the address only once). Thus I don't expect much difference between ISA and VL-Bus at similar clock speeds, but this is something I will try to look into experimentally.
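
The burst saving can be illustrated with a toy cost model. It assumes every address phase and every data phase costs one bus clock, which is a simplification and not a statement about any particular chipset or card. The function name `bus_clocks` is made up for the example.

```python
def bus_clocks(transfers, use_burst, addr_clocks=1, data_clocks=1):
    """Bus clocks needed to move `transfers` 32-bit words (toy model)."""
    if transfers < 0:
        raise ValueError("transfers must not be negative")
    if transfers == 0:
        return 0
    if use_burst:
        # The address is sent once, then the data phases follow back to back.
        return addr_clocks + transfers * data_clocks
    # Without bursts, every transfer pays for its own address phase.
    return transfers * (addr_clocks + data_clocks)


plain = bus_clocks(4, use_burst=False)  # 16 bytes as four separate writes
burst = bus_clocks(4, use_burst=True)   # 16 bytes as one burst
print(f"non-burst: {plain} clocks, burst: {burst} clocks")
```

In this model 16 bytes take 8 clocks without bursts and 5 with them, which is a useful saving but far smaller than the gain from the higher bus clock itself.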

Reply 107 of 107, by kdr

Posted on 2021-03-08, 23:57

Rank: Member
Posts: 215
Joined: 2020-09-20, 12:40
Location: New Zealand

mpe wrote on 2021-03-08, 21:12:

The majority of DOS software only transfers 16 bits at a time when writing to VGA, and some of the most popular VL-Bus controllers, such as the omnipresent CL-GD542x series, only had a 16-bit host interface anyway. So the 32-bit nature of the bus is wasted unless 32-bit writes are used (likely GUI work).

Even though e.g. the CL-GD542x only does 16-bit transfers, I think there's still quite a big performance gain if doing 32-bit writes because the 486 has four write buffers, each 32 bits in size. So a single 32-bit write instruction should be able to fill one of the buffers in a single cycle, letting the CPU move on to other instructions [which should already be in L1 cache] while the write buffer(s) slowly drain out to the bus 16 bits at a time in the background. This wouldn't really help at all for a simple bitblt loop but could be advantageous for something like a 3D renderer.
