VOGONS


Is having a 386 with a 486 chipset bad?


Reply 20 of 36, by MikeSG

Rank Member
BitWrangler wrote on 2024-09-23, 02:37:
Anonymous Coward wrote on 2024-09-23, 00:14:

The good true 386 chipsets had better main memory throughput, while the hybrid ones tended to have better external cache design.
Things kind of even out with regard to overall performance. The hybrid boards have the advantage of almost always supporting the 40 MHz bus, 4 MB SIMMs and Cyrix CPUs.

Yeah, I was reading about how the 386 was supposed to be designed to run in lock step with RAM at zero wait states. I'm not sure how many MHz that's good for or what nanosecond speed of RAM it needs, but it seems to imply you don't need a cache if the chipset does it right... unless it all blows up at 33 or 40 MHz etc., and that's why you see 40 MHz AMDs on 486ey chipsets with a smidgin of cache.

I'm reading a Chips & Tech F82C351 manual. It says it supports 0WS cache reads and 0/1WS cache write-through at 33 MHz. Later versions supported 40 MHz.

If 486s support a minimum of 1WS, then it's better having a 386 chipset... at 33-40 MHz.

Reply 21 of 36, by mkarcher

Rank l33t
MikeSG wrote on 2024-09-23, 15:16:

If 486s support a minimum of 1WS, then it's better having a 386 chipset... at 33-40 MHz.

A lot of later 486 chipsets offer 0WS write-through (or just write in cacheless operation) on page hits. I know for sure that the UMC8881 does, as I got 76.3 MB/s memory read and write with EDO RAM at FSB40 (theoretical maximum at FSB40 is 80 MB/s), and I consistently got 76.4 MB/s write on FPM as well at FSB40. With the cache physically removed and sufficiently fast EDO, I even achieved 114.6 MB/s at FSB60 (theoretical maximum is 120 MB/s), so this obviously indicates 0WS operation is possible on 486 chipsets.

Any chipset that hits around 60 MB/s write performance in speedsys at FSB33, given a sufficiently fast 486 processor (the 486DX-33 is not, it cannot saturate the FSB using the instruction REP STOSD used by speedsys), can perform 0WS writes. The way the speedsys write performance test is designed, it will always miss the cache on 486-class computers, so you will see a straight line for all block sizes, and the rate given by that line is the raw memory performance.
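The arithmetic behind those theoretical maxima can be sketched as below. This is only an illustration, assuming a 32-bit data bus and a non-burst bus cycle that takes 2 clocks at zero wait states plus one clock per wait state; the function name and parameters are made up for the example and do not come from speedsys or any datasheet.

```python
def peak_bandwidth_mb_s(fsb_mhz, wait_states=0, bus_bytes=4, base_clocks=2):
    """Peak rate in MB/s (1 MB = 10**6 bytes) for back-to-back non-burst cycles.

    Assumption: each transfer moves bus_bytes and takes base_clocks clocks,
    plus one extra clock per wait state.
    """
    clocks_per_transfer = base_clocks + wait_states
    return fsb_mhz * bus_bytes / clocks_per_transfer


# Measured figures quoted in the post above: (FSB in MHz, measured MB/s)
for fsb, measured in [(40, 76.3), (60, 114.6)]:
    limit_0ws = peak_bandwidth_mb_s(fsb, wait_states=0)
    limit_1ws = peak_bandwidth_mb_s(fsb, wait_states=1)
    verdict = "must be 0WS" if measured > limit_1ws else "could be 1WS"
    print(f"FSB{fsb}: 0WS limit {limit_0ws:.1f}, 1WS limit {limit_1ws:.1f}, "
          f"measured {measured} -> {verdict}")
```

Under these assumptions a 1WS design tops out at about 53 MB/s at FSB40, so a measured 76 MB/s can only come from 0WS cycles. The same reasoning gives roughly 66.7 MB/s as the 0WS ceiling at FSB33, which fits the "around 60 MB/s" rule of thumb.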

Reply 22 of 36, by BitWrangler

Rank l33t++

So this thread got me thinking a lot about how that might relate to the TOPCAT chipset and the board I unearthed a month or so back, so I started another thread rather than hijack this one: ECS 386L "Topcat" VLSI chipset motherboard, technical discussion, best CPUs? eventual build?

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 23 of 36, by jakethompson1

Rank l33t
mkarcher wrote on 2024-09-22, 09:30:

As the 386 and the 486 front-side busses are quite similar, having a chipset support both isn't that difficult. Obviously, the 386 does not do burst cycles, while the 486 does so if it feels like it (e.g. when filling a line of the L1 cache). It doesn't matter if a chipset supports the 486-type burst if you install a processor that just never does a burst (like a 386). I can think of just one feature of the 386 bus protocol that's not present in the 486 bus protocol: Address pipelining. I wonder whether all 386/486 combo chipsets include support for 386 address pipelining.

Address pipelining is a 386 feature that allows the chip to output the address of the next bus cycle while the data transfer of the previous cycle is still running, enabling address decoding of a subsequent cycle to be parallelized with handling data transfer of the previous cycle. Allowing this kind of parallelism is adding complexity to the chipset, which will never be used by a 486 processor, which might motivate manufacturers of a 486 chipset with 386 compatibility to just leave it out.

Without looking at datasheets again, I recall earlier 386-only chipsets sometimes support cache line sizes other than 16 bytes, while 386/486 ones tend to be hardwired as 16 bytes to match the 80486 internal cache.
Is there any advantage to a smaller cache line size when using an 80386?

Reply 24 of 36, by daniil1909

Rank Newbie
jakethompson1 wrote on 2024-09-24, 00:49:
mkarcher wrote on 2024-09-22, 09:30:

As the 386 and the 486 front-side busses are quite similar, having a chipset support both isn't that difficult. Obviously, the 386 does not do burst cycles, while the 486 does so if it feels like it (e.g. when filling a line of the L1 cache). It doesn't matter if a chipset supports the 486-type burst if you install a processor that just never does a burst (like a 386). I can think of just one feature of the 386 bus protocol that's not present in the 486 bus protocol: Address pipelining. I wonder whether all 386/486 combo chipsets include support for 386 address pipelining.

Address pipelining is a 386 feature that allows the chip to output the address of the next bus cycle while the data transfer of the previous cycle is still running, enabling address decoding of a subsequent cycle to be parallelized with handling data transfer of the previous cycle. Allowing this kind of parallelism is adding complexity to the chipset, which will never be used by a 486 processor, which might motivate manufacturers of a 486 chipset with 386 compatibility to just leave it out.

Without looking at datasheets again, I recall earlier 386-only chipsets sometimes support cache line sizes other than 16 bytes, while 386/486 ones tend to be hardwired as 16 bytes to match the 80486 internal cache.
Is there any advantage to a smaller cache line size when using an 80386?

I think a smaller cache line takes fewer cycles to fill than a 16-byte one.

Reply 25 of 36, by douglar

Rank l33t
Horun wrote on 2024-09-23, 01:41:

Somewhere in my stocks-o-crap have a Opti 495sx based ISA 386/486 board with a 486DX2-66. Never considered putting a 386 on it (nothing soldered in PQFP area, is bare).....

I've been messing around with an Opti 495sx based ISA/VLB 386/486 board. Terrible slot layout, but it works with MR BIOS v1.6 OPTI4BH, which makes everything more pleasant. Doesn't have LBA support, but does allow storage up to 8GB

I put a copy of the BIOS here if you want to check it out on your board: https://theretroweb.com/motherboards/s/edom-w … tech-mv008#bios

Reply 27 of 36, by Horun

Rank l33t++
douglar wrote on 2024-09-24, 15:18:
Horun wrote on 2024-09-23, 01:41:

Somewhere in my stocks-o-crap have a Opti 495sx based ISA 386/486 board with a 486DX2-66. Never considered putting a 386 on it (nothing soldered in PQFP area, is bare).....

I've been messing around with an Opti 495sx based ISA/VLB 386/486 board. Terrible slot layout, but it works with MR BIOS v1.6 OPTI4BH, which makes everything more pleasant. Doesn't have LBA support, but does allow storage up to 8GB

I put a copy of the BIOS here if you want to check it out on your board: https://theretroweb.com/motherboards/s/edom-w … tech-mv008#bios

Thanks. I just discovered I pointed to the wrong TRW pages; I was going from memory. I just pulled it out and it is this one: https://theretroweb.com/motherboards/s/a-tren … lc-vl-bus-3-486
I knew it didn't have a PQFP soldered, but it does have a 386 socket, and it does have VLB. Currently a Cyrix DX2-80 and 4x4 MB RAM.....

Hate posting a reply and then have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. Stuff: https://archive.org/details/@horun

Reply 28 of 36, by Deunan

Rank l33t
jakethompson1 wrote on 2024-09-24, 00:49:

Is there any advantage to a smaller cache line size when using an 80386?

There might be some. This is, as always, very data dependent. The main problem with 386 is it has zero on-chip cache. So even instruction fetching and decoding is bottlenecked by the bus. Both 386 and 486 will stall if the requested code or data is not in cache, and mobo has to inject wait cycles to fetch it. But on 486 the L1 might be still providing data to instruction decoder, and the pipeline is still probably partially busy executing and retiring instructions. So some of the penalty of the pretty long 16 bytes fetch is masked on 486 but not on 386. The original i385 cache controller had 4-byte wide lines, this required longer tag and more tag memory in general but allowed for lower latency stalls.
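The trade-off between line size, tag memory and miss penalty can be put into rough numbers. The sketch below is only an illustration: the 32 KB cache size is a hypothetical example, the bus is assumed to be 32 bits wide with 2-clock non-burst cycles (as on a 386 at zero wait states), and it assumes the whole line is fetched before the CPU continues, which a particular chipset may or may not do.

```python
def cache_lines(cache_bytes, line_bytes):
    """Number of lines, and therefore tag entries, a cache of this size needs."""
    return cache_bytes // line_bytes


def nonburst_fill_clocks(line_bytes, bus_bytes=4, clocks_per_transfer=2):
    """Clocks to fill one line with back-to-back non-burst bus cycles."""
    transfers = line_bytes // bus_bytes
    return transfers * clocks_per_transfer


CACHE_BYTES = 32 * 1024  # hypothetical cache size, chosen only for the example

for line_bytes in (4, 16):
    entries = cache_lines(CACHE_BYTES, line_bytes)
    clocks = nonburst_fill_clocks(line_bytes)
    print(f"{line_bytes:2d}-byte lines: {entries} tag entries, "
          f"{clocks} clocks to fill a line on a miss")
```

With these assumptions the 4-byte line needs four times as many tag entries as the 16-byte line, but a miss costs a single bus cycle instead of four.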

I suppose due to the same reasons (486 being able to overcome some of the stall time) some 386/486 chipsets never bothered with concurrent cache and RAM access - in other words the RAM is only addressed once a cache miss occurs. This adds extra wait cycles for the miss. OPTi chipsets are guilty of this, but not just those. And to add insult to injury the 386 bus protocol can't do bursts and many cache/mobo designs are not good enough to feed the CPU with zero WS even from cache. Sometimes you can overcome it with really fast tag SRAM chips (12ns or less) but in some cases it's a lost cause, the chipset and routing is just not going to keep up without glitching eventually. On 486 you want more cache and if you have to go from 2-1-1-1 to 3-1-1-1 you won't even feel it except in synthetic benchmarks. But on 386 going from 2 latency to 3 negates any benefits from having more than 128K cache. Many combo mobos were meant to be stable with 33MHz 486, not 40MHz 386, so the design is a bit lacking I guess.
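A back-of-the-envelope comparison shows why one extra clock of cache latency hurts the 386 more. This is a sketch, assuming a 486 burst is described by its clock pattern (2-1-1-1 and so on) and a 386 fetches the same 16 bytes as four separate non-burst accesses; the helper names are made up for the example, and it ignores how much of the penalty the 486's L1 hides.

```python
def burst_clocks(pattern):
    """Total clocks for a 486 burst line fill, e.g. (2, 1, 1, 1)."""
    return sum(pattern)


def nonburst_clocks(total_bytes, clocks_per_access, bus_bytes=4):
    """Total clocks for a 386 to fetch total_bytes with separate bus cycles."""
    return (total_bytes // bus_bytes) * clocks_per_access


fast_486 = burst_clocks((2, 1, 1, 1))   # 16 bytes from cache
slow_486 = burst_clocks((3, 1, 1, 1))   # one extra lead-off clock
fast_386 = nonburst_clocks(16, 2)       # four accesses at 2 clocks each
slow_386 = nonburst_clocks(16, 3)       # four accesses at 3 clocks each

print(f"486: {fast_486} -> {slow_486} clocks "
      f"(+{(slow_486 / fast_486 - 1) * 100:.0f}%)")
print(f"386: {fast_386} -> {slow_386} clocks "
      f"(+{(slow_386 / fast_386 - 1) * 100:.0f}%)")
```

Under these assumptions the extra clock is paid once per burst on the 486 but on every single access on the 386, so the 386 loses proportionally much more.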

As for the OP's question, this was already mentioned but a good 386 chipset will perform better than 386/486 combo one. Unless the latter has been optimized for both CPU families and AFAIK that's never the case. These are just early 486 chipsets that still have 386 backward compatibility. The UNIchip U4800-VLX is a pretty decent 386-only design. Not only is it a bit faster than OPTi 386/486 combos (esp. once you run out of cache or the hit rate is so-so), the best performance is possible with 15ns tag and 20ns data SRAMs. But if op needs VLB (doesn't do much for 386 but is fun anyway) or better than just nominal Cyrix DLC support then 386/486 chipsets might be a better pick. It's not always just the pure performance we are after.

Reply 29 of 36, by Jo22

Rank l33t++

What about self-modifying code, as used in the demoscene?
Isn't a fast 386, which doesn't rely on an L1 cache, less affected by cache misses here?

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 31 of 36, by Deunan

Rank l33t
Jo22 wrote on 2024-09-25, 00:31:

What about self-modifying code, as used in the demoscene?
Isn't a fast 386, which doesn't rely on an L1 cache, less affected by cache misses here?

Self-modifying code might be changing just a few bytes ahead (for speed) or the entire sequence of code/data (decryption or unpacking). But even in the latter case it's not much different from a 486 encountering the code for the first time, or after a massive L1 cache spill. Yes, there will be a load penalty, but after each line load the 486 can feed the decoder with full cache lines at once (so 16 bytes and not 4 like on the 386) and both decode faster and execute in 1 cycle for many basic ALU instructions. The 386 minimum execute time is 2 cycles since that's how fast a single bus transaction can be, so there is no point in having a faster core. It's even held back by the decoder, so that would have to be upgraded first.

So in general 486 can overcome its own limitation of stalling during each cache line load and in the end be faster than 386 even in worst case scenarios, and at lower clock. Instruction latency is less predictable unless you also pay attention to memory alignment but it only really affects loops, and even in that case after the first load the data will be in L1 so next pass will more than make up for it, and each pass more will be a pure speed benefit.

And frankly the 386 is not all that happy to deal with self-modifying code either. There are 2 queues in the instruction decoder: the bytes read and the decoded instructions (in the rare cases you encounter a slow instruction like mul or div, that allows the decoder to move ahead of the current instruction pointer). Both need to be flushed if your modified code is just ahead and you want to properly reload it. This is done with a jump instruction, and IIRC even the shortest jump takes 7 cycles on the 386, plus whatever the decoder needs to fully load and process the next instruction, 4 bytes at a time at best. The 486 needs only 3 cycles, plus L1 fetch and decoding, but as I mentioned above it's not as bad as it looks.

The self-modifying or heavily branching code is however a case where 386 would benefit from shorter cache lines.

Reply 32 of 36, by dionb

Rank l33t++
daniil1909 wrote on 2024-09-23, 12:27:

[...]

I'm just wondering if there are some advantages to a true 386 chipset

This sort of reminds me of discussions in the 1990s black metal scene around which band was the most "trve". Usually the ones most likely to win that accolade sounded the most awful, but that was a big part of the point 😉

Apart from some potential corner cases (discussed at length above) the answer is a simple "none". It's like putting tyres rated to 210km/h under a car that can't go faster than 160km/h: if the faster rated tyres were more expensive you might have wasted a bit of money, but otherwise they will function exactly as well as tyres rated for say 175km/h on that car.

In addition 486 chipsets can give you things like VLB slots. Totally not 'trve' (and by the look of it not present on this board), but fun to see how even a 386-16 actually benefits (slightly) from the wider bus compared to an ISA VGA card.

Reply 33 of 36, by douglar

Rank l33t
daniil1909 wrote on 2024-09-25, 01:14:
douglar wrote on 2024-09-24, 15:38:
daniil1909 wrote on 2024-09-22, 08:55:

ALi M1429G

Is it one of these boards? There are more of them than I expected.
https://theretroweb.com/motherboards/?page=1& … Ids%5B0%5D=2643

Seritech SER-386-AD-III

That's a tight little board. M1431 South bridge. No wasted space.

Would have been wild if you found an M1429 board with a M1435 South bridge + 386DX socket, but I think we can all agree that 386 computers with PCI slots are pretty rare.

Does the BIOS offer any 486- or Cyrix-specific options, like on-chip cache?

Reply 34 of 36, by BitWrangler

Rank l33t++

I think I heard of some 386/PCI back in the day, but they were alpha grade, needed configured like EISA, mainly intended to be development systems. Some guy on a newsgroup got hold of one in 1997 and it wouldn't run what we think of as early commercial PCI cards, possibly because the resource setting protocol had changed.

Though I wouldn't be that surprised if there was something a bit more robust that occurred in industrial boards later in the 90s, but not so many use cases where you'd need I/O speed of PCI without having horsepower of Vortex86 100 plus mhz 486/586 clones to deal with it.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 35 of 36, by daniil1909

Rank Newbie
dionb wrote on 2024-09-25, 11:07:
daniil1909 wrote on 2024-09-23, 12:27:

[...]

I'm just wondering if there are some advantages to a true 386 chipset

This sort of reminds me of discussions in the 1990s black metal scene around which band was the most "trve". Usually the ones most likely to win that accolade sounded the most awful, but that was a big part of the point 😉

Apart from some potential corner cases (discussed at length above) the answer is a simple "none". It's like putting tyres rated to 210km/h under a car that can't go faster than 160km/h: if the faster rated tyres were more expensive you might have wasted a bit of money, but otherwise they will function exactly as well as tyres rated for say 175km/h on that car.

In addition 486 chipsets can give you things like VLB slots. Totally not 'trve' (and by the look of it not present on this board), but fun to see how even a 386-16 actually benefits (slightly) from the wider bus compared to an ISA VGA card.

VLB slots not present on my board.

Reply 36 of 36, by daniil1909

Rank Newbie
douglar wrote on 2024-09-25, 15:34:
daniil1909 wrote on 2024-09-25, 01:14:
douglar wrote on 2024-09-24, 15:38:

Is it one of these boards? There are more of them than I expected.
https://theretroweb.com/motherboards/?page=1& … Ids%5B0%5D=2643

Seritech SER-386-AD-III

That's a tight little board. M1431 South bridge. No wasted space.

Would have been wild if you found an M1429 board with a M1435 South bridge + 386DX socket, but I think we can all agree that 386 computers with PCI slots are pretty rare.

Does the BIOS offer any 486- or Cyrix-specific options, like on-chip cache?

No. The BIOS there is kind of cut down: no shadow RAM, no anything.