VOGONS


The World's Fastest 486


Reply 260 of 302, by Horun

Rank Oldbie
kool kitty89 wrote on 2020-04-02, 13:13:

Also, in relation to the argument over what constitutes a 486 or not some pages back in the thread, and the mention of some AM5x86 variants having not just the 'x5' printed on them, but actually AM486 DX5.

Here's one I found a few months ago (listed on ebay as an AMD 486, no 5x86 in the title or description).

Am486 DX5-133W16BGC
9738CPA

Very interesting! I have an AMD-X5-133ADW with the same lower-left part number: 25544.
I haven't been following this topic, but when I saw your CPU I had to go check mine to see if it was the same.

Attachment: Img_0993s.jpg (71.43 KiB, public domain)

Hate posting a reply and have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. 🤣

Reply 261 of 302, by kool kitty89

Rank Member

The other two AMD P75 CPUs (both 5x86 marked and X5-133ADW) are also 25544.

The 5x86 cores down-binned to DX4s should have that code as well (as ph4nt0m said), and I think all the 16kB cache versions of the AM486 are 5x86s.

Interestingly, the date code (or I think it's the date code) seems to be the newest on that "486 DX5" at 9738CPA.
The other two are 9606APD and 9645APA.

It's also possible my particular 486DX5 is a fake/remarked chip, but it's really well done if it is. It's engraved rather than printed, and the 25544 model number is slightly shallower than on either of the others (it's also black like one of the other two; the third example has uncolored engraving there). If it is a fake/remark, the original could've been printed/painted, with those markings scrubbed off and only a little of the ceramic surface removed (just polished), without losing the thickness that the older, fairly common Intel 486SX remarks (SX25 to SX33) required, but I'm not sure.

The font looks really similar to the older engraving used on some AM386 makes, but without the flat-topped capital A. Plus the numerical font doesn't have the LED/LCD/fluorescent-display-style digital matrix look that the 386 markings have.

It's really precise, small, good looking engraving though. (and definitely a functional CPU, one I'll have to try out more when I have a later model 486 board fully working ... and either sorted out with trial and error or a proper manual, as I haven't found one for my Acorp branded SiS 495 based one yet)

If those late model DX4-100s do share that same model number, doing those sorts of remarked chips would be pretty simple and systems/benchmarks would detect them the same.

OTOH there's one thing nagging at me here:

ph4nt0m wrote on 2020-04-04, 17:52:

That's not a surprise. AMD sold many 5x86 chips labelled simply as

Am486DX4-100
A80486DX4-100SV8B
package 25544

They still had 16kB cache and could be overclocked to 160MHz easily. The original 500nm SV8B package 25398 couldn't do that.

I'd thought the 16kB DX4s were marked DX4-100SV16B. That's certainly what I'm finding pictures of right now as well, along with the 25544 numbering.

And DX4-100 and -120 SV8B models seem to be 25398 chips.

http://www.chipdb.org/img-amd-a80486dx4-100sv16b-247.htm

There's also some marked NV8T and 25253, which must be an earlier revision, apparently also using the .5 micron process, but with write-through rather than write-back cache.

http://www.cpu-world.com/CPUs/80486/AMD-A8048 … X4-100NV8T.html

The DX4 120 was apparently made in both the write-back and write through forms as well, and share those part/mask numbers with the 100. (also some DX2 66 and 80 parts share the NV8T or V8T and 25253 numbers, and those parts also might support both 2x and 3x multipliers)

I have a pair of AMD DX2-66 chips that appear to be 5V ones (or don't have 3V written on them) and lack a suffix after the 66. Both are 24361. I'm not sure what process they use, but maybe the same old .7 micron one used for the AM386 and 1x multiplier 486s. (which would make sense to compete with Intel's .8 micron BiCMOS DX2-66)

Those .5 micron parts are also interesting because they might tolerate 5V power. If nothing else, they should handle it better than the .35 micron ones, which are definitely known to have fairly short lifespans at that voltage (as some 200 MHz 5x86 overclocking attempts, successful ones included, have shown).

That and it should be the same process AMD used for the early K5 parts, and makes me wonder if AMD could've gotten away with bumping the voltage a bit on those to help with yields, at least if improved cooling methods were also introduced. (though between voltage regulation and heat, more like 4.0V which was the normal setting used on Nexgen's .5 and .35 micron parts in proprietary boards) I haven't seen any references to Socket 5 boards with overvolting support like that either, to try with actual Socket 5 parts, though given how limited typical heatsink+fan set-ups were up through 1996 that's also not too surprising and probably would've been more interesting for overclocking Intel CPUs anyway. (given they ran cooler and already often had a bit of overclocking headroom at 3.4 or 3.5 volts)

Hmm, though given AMD had trouble even getting 90-100 MHz K5s out early on, and given how warm 3.52V 6x86 PR-150 and 166 (120-133 MHz) chips ran, 4.0V might not have been that strange. Unless they were specifically trying to avoid some of the mixed press Cyrix was getting over those heat and power consumption issues. (Plus they avoided supporting the 75 MHz bus speed that Cyrix jumped to, even though AMD's 1.5 and 1.75 multiplier settings should've been quite appealing there, and I imagine it was at least somewhat popular back then to bump 105 MHz K5 PR-150s up to 112.5 MHz and get performance generally beyond the PR-166.)

Since AMD may have avoided actually marking 5x86 chips as x5 160 (4x40 MHz) to avoid cutting in on K5 sales (and likewise avoided 3x50 ratings), it seems like that conflict would've also been avoided by voltage bumping the K5 into a higher performance bracket. (that and doing additional yield management by releasing a 1x66 MHz model of K5 priced closer to the 5x86, though the faster FSB should put it ahead of the K5 75 and 5x86 P75 parts alike for a fair number of things)
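The clock figures in the last couple of paragraphs all come from core = FSB × multiplier. Here is a minimal Python sketch that tabulates them. The labels are only descriptive, and the 1.75 × 60 MHz setting for a stock 105 MHz K5 PR-150 is an assumption, not something stated above.

```python
# Core clock = FSB x multiplier. Labels are descriptive only; the 1.75 x 60 MHz
# setting for a stock K5 PR-150 is an assumption, not stated in the post.

def core_mhz(multiplier: float, fsb_mhz: float) -> float:
    """Internal core clock in MHz, rounded to one decimal place."""
    return round(multiplier * fsb_mhz, 1)

CONFIGS = [
    ("Am5x86 as sold, 4 x 33.3 MHz", 4, 100 / 3),
    ("5x86 at 3 x 50 MHz", 3, 50),
    ("5x86 'x5 160', 4 x 40 MHz", 4, 40),
    ("K5 PR-150 stock, 1.75 x 60 MHz (assumed)", 1.75, 60),
    ("K5 PR-150 on a 75 MHz bus, 1.5 x 75 MHz", 1.5, 75),
    ("hypothetical K5, 1 x 66.7 MHz", 1, 200 / 3),
]

for label, mult, fsb in CONFIGS:
    print(f"{label}: {core_mhz(mult, fsb)} MHz")
```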

It seems some early-model K5s (at least the two '5k86' 75 MHz examples I have) do have a 1x multiplier setting, mapped to Intel's and Cyrix's 2x jumper configuration. Both also seem to go OK at 1x83 MHz 3.5V, but not 95, and I don't have any 90 MHz capable boards to try. Cyrix's 1x multiplier is more well known on the 6x86, and I'd assume it was an appealing option for some very early adopters with 80 MHz (2x 40) rated models, and maybe even in some situations with the 2x50 MHz ones, especially on boards with larger L2 cache and/or fast DRAM performance. (The 50 MHz case also usually means a 25 MHz PCI bus too, though 40 MHz PCI with 40 MHz FSB would be one plus point for the 80 MHz part ... and the 3x multiplier setting for that matter, if you had one of those unusual boards with 40 MHz FSB + 1x PCI.)

Well that, and I'd think 1x66 MHz socket-4 compatible upgrade modules would also be appealing. Assuming Cyrix and AMD didn't want to actually run their parts at 5V or even bother manufacturing socket 4 packages, they could've gone the soldered-on-board voltage regulator + pin adapter route like some 486 and 5x86 upgrade boards did. (and having slower, but non-buggy FPUs and faster 16-bit code and/or general ALU performance would seem like selling points) Granted, AMD was so late to market that the Cx6x86 is far more relevant there. (Cyrix's early, 1995 production 6x86s would've been still relevant for Socket 4 systems)

I mean ... I could take some of my Socket 5 and 7 CPUs with 1x multiplier settings and pit them against some of the faster 486s here and see if there's even any relevance (compared to people building fast Socket 4 systems back in 1995/96 ... not so much those who actually had Socket 4 boards). Though I'm tempted to see how that ISA bottleneck issue pans out too. I have one or two FX-based Socket 5 or 7 boards (or they're marked socket 7 on the socket, but aren't full socket 7 spec) that might be worth comparing ISA performance on. I don't have any of the early (fast for the time) SiS based Socket 5 boards to compare either.

That supposedly very-slow OPTi Socket 4 pentium chipset with VLB slots would be interesting to compare too, including on the VLB vs ISA performance differences, or if it actually does better with ISA performance than the same cards do on faster, PCI based boards. (though I guess Intel's and SiS's Socket 4 chipsets would also be relevant for comparing there, and ALi's for that matter)

Reply 262 of 302, by CoffeeOne

Rank Member
kool kitty89 wrote on 2020-04-09, 14:22:

I mean ... I could take some of my Socket 5 and 7 CPUs with 1x multiplier settings and pit them against some of the faster 486s here and see
.....

slightly shortened.

There is no 1x multiplier at Socket 5 and Socket 7 cpus.

Reply 264 of 302, by The Serpent Rider

Rank l33t

Very Interesting ! I have a AMD-X5-133ADW with same lower left part number: 25544.

All 350 nm CPUs are labeled with this number.

Get up, come on get down with the sickness
Open up your hate, and let it flow into me

Reply 265 of 302, by kool kitty89

Rank Member

Nearly all my 6x86s will run at 1x multiplier: all the non L ones will and some (maybe all) of the L ones will, but different jumper mappings are used. (I forget if some 686L's have the same mapping as earlier models, but the PR200s I've tried seem to use a different setting)

All of them use the same 2x multiplier setting and I think 3x as well, and there's supposed to be a 4x setting, but I'm not sure how many have it or if the mapping is changed. (the 1x and 4x settings might be swapped on the late model parts) Though there's very little use for 4x. At 50 MHz, only overvolted PR200 L chips would probably work and that'd be more for academic comparison of various 50 MHz FSB parts (or performance difference from 3x66 to 4x50; 2x100 is possible too, but hasn't been completely stable in my attempts and worse than 3x68).

I thought I had it written down, but apparently not.
The normal (informally documented) 1x setting on the 6x86 is the same as P54C 2.5x while 3x is P54 1.5x (or P55C 3.5x). And 2x = 2x of course. (and by process of elimination, Intel 3x should be Cyrix 4x on chips supporting that setting)
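The informal 6x86 jumper mapping above can be written down as a lookup table. This is a minimal Python sketch; the dictionary and function names are hypothetical, it covers the earlier (non-L) 6x86 mapping only since some L parts use different settings, and the 4x entry is just the by-elimination guess from the post, so treat it as unconfirmed.

```python
# Informal mapping from the post: Intel P54C jumper label -> multiplier an
# earlier (non-L) Cyrix 6x86 actually runs at. All names here are hypothetical.
P54C_LABEL_TO_6X86 = {
    "1.5x": 3.0,  # also the P55C 3.5x position
    "2x": 2.0,
    "2.5x": 1.0,
    "3x": 4.0,    # inferred by elimination; only on chips with a 4x setting
}

def cyrix_core_mhz(intel_label: str, fsb_mhz: float) -> float:
    """Core clock of a 6x86 on a board jumpered for the given Intel ratio."""
    return P54C_LABEL_TO_6X86[intel_label] * fsb_mhz

def intel_label_for(cyrix_multiplier: float) -> str:
    """Intel-labelled jumper position that gives the wanted 6x86 multiplier."""
    for label, mult in P54C_LABEL_TO_6X86.items():
        if mult == cyrix_multiplier:
            return label
    raise ValueError(f"no known jumper position for {cyrix_multiplier}x")

print(intel_label_for(1.0))        # 2.5x
print(cyrix_core_mhz("2x", 40))    # 80.0
print(cyrix_core_mhz("1.5x", 50))  # 150.0
```

Going through the Intel silkscreen labels rather than raw jumper positions keeps the table board-independent, since boards differ in how they draw the jumper blocks.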

http://www.pchardwarelinks.com/cpuspeed.htm
This article says Cyrix M1 3x is the same as AMD K6-2 4x (which is a 3-jumper setting), but 1.5/3.5x works in my experience, and I swear it's also silkscreened onto my P5A-B.

And, yes it is, and not just on mine:
Heatsink and fan for ASUS p5A-b
https://i.imgur.com/VUeEnY7.jpg

Note there's no 4x listing on the table's row for 'M1' but there is 1, 2, and 3x listed. (so those were all documented settings by Asus in 1997/1998)

The two AMD 5K86 P75 CPUs I have appear to only recognize 1 jumper and ignore the status of the second, so only the P54 1.5 and 2x settings are relevant. 1.5 maps to 1.5 and 2x (P54) maps to 1x (5K86). This does not happen with any of the K5 CPUs I have, including the older models (P75 and P100) which appear to map to 1.5 and 2x instead. (and thus P54 3x = 5k 1x and P54 2.5x = 5k 1.5x)

It's not just the BIOS misinterpreting the settings as I did benchmark testing that matched up with a 1x speed. (also the floppy disk version of X-Wing was happier with the 1x 5k86 than other CPUs I'd been trying ... my system requires other tweaking to get it working properly with sound with much faster CPUs and FSB speeds: short of disabling the L1 cache, using slower SDRAM timings and higher I/O recovery time settings works though 100 MHz FSB is tricky ... XwingCD is a lot less finicky, though Wing Commander 2 is apparently worse and I don't mean just running too fast, but running at all)

Hmm, though also come to think of it: I wonder why Cyrix didn't include a 1.5x multiplier setting. As far as clock synthesizer complexity goes, it should've been a simple matter of 1/2 of 3x (so 1, 1.5, 2, and 3x settings would be available using 2 jumpers). At least, doing that would make more sense than supporting 4x. I guess supporting just 1, 2, and 3x makes sense at least for avoiding semi-asynchronous I/O buffering. (I assume they bothered to include 3x due to supporting that 40 MHz FSB speed early on; otherwise going even cheaper with just 1x and 2x seems obvious ... unless they simply carried over some of the chip design of the 5x86 that had 2, 3, and 4x PLLs.)

On a Socket 3 related note: I have a Socket 3 Pentium Overdrive that's missing the heatsink+fan and one of the surface mount capacitors or resistors (or diodes?) on top, though it doesn't look desoldered. It appears to run at a 1x multiplier as well (or confuses the BIOSes in the boards I've tested), but I need to do more testing to confirm this. I haven't been able to adjust multiplier jumpers on the one late model Socket 3 board I have (Acorp SiS 495, I think 1996 BIOS date, and it recognizes it as a P24T) and only played around with it briefly at what should have been 1x40 MHz.

Reply 266 of 302, by ph4nt0m

Rank Member
kool kitty89 wrote on 2020-04-13, 08:01:

I have a Socket 3 Pentium Overdrive that's missing the heatsink+fan and one of the surface mount capacitors or resistors (or diodes?) on top, doesn't look desoldered though. It appears to run at a 1x multiplier as well

PODP5V63 and PODP5V83 fall back to 1x if the fan is not present or not working. They have fan rotation-speed sensing logic.

My Active Sales on CPU-World

Reply 267 of 302, by matze79

Rank Oldbie

Hm but a 5x86 is no 486 😉 it just fits the same Socket.

https://dosreloaded.de - The German Retro DOS PC Community
https://www.retroianer.de - under constructing since ever

Co2 - for a endless Summer

Reply 268 of 302, by darry

Rank Oldbie
matze79 wrote on 2020-04-16, 17:23:

Hm but a 5x86 is no 486 😉 it just fits the same Socket.

It's definitely not a Pentium either, so what is it?

Seriously, I am of the opinion that 5x86 chips are 486 chips with a 5x86 marketing name. They have no Pentium-specific features or design traits.

Reply 270 of 302, by ph4nt0m

Rank Member
matze79 wrote on 2020-04-16, 17:23:

Hm but a 5x86 is no 486 😉 it just fits the same Socket.

Here we go again. AMD 5x86 is no different from AMD 486DX4. Cyrix 5x86 is a single pipeline ALU core unlike 6x86 which is dual. Even IDT WinChip is technically a 486 chip for Socket 5 or 7. Single 4-stage ALU pipeline, in order execution. No branch prediction or pipelined FPU either, those appeared in WinChip 2.


Reply 271 of 302, by feipoa

Rank l33t++

Winchip is...a 486 chip for [the] socket 7 [platform].

Never thought of it like this before, but I like it.

Ultimate 486 Benchmark | Ultimate 686 Benchmark | Cyrix 5x86 Enhancements | 486 Overkill Graphics | Worlds Fastest 486

Reply 272 of 302, by The Serpent Rider

Rank l33t

Well, a few pages ago I posted a CPU-Z benchmark with a Winchip comparison, and it really resembles a "486 on steroids". The Winchip is obviously faster though, due to the huge difference in L1 cache sizes.


Reply 273 of 302, by kool kitty89

Rank Member
The Serpent Rider wrote on 2020-04-17, 12:08:

Well, I've posted few pages ago CPU-Z benchmark with Winchip comparison and it really resembles "486 on steroids". Winchip is obviously faster though, due to huge difference in L1 cache sizes.

If you've never seen it, there was the "133 MHz Challenge" discussion/benchmark thread from 11 years ago that resulted in similar remarks. (plus I think some articles and reviews from back in the 90s that made similar comments)

133 MHz Challenge - 5th/6th gen CPU per clock performance

SiSoft Sandra99 gets used heavily as the benchmark there. That thread is part of why I've been using that benchmark suite fairly heavily for all my Win9x related system comparisons, though I've found it's not a good test for system stability. (ie Sandra99 is a lot more tolerant of CPU/FSB/memory running on the edge than some other software is ... Quake I is also on the more tolerant side for that matter.)

Edit:

I see the original images are missing from that thread. Shame. There doesn't seem to be an old enough archive of it on the wayback machine either. (I must remember it from back around 2010-2011)

feipoa wrote on 2020-04-17, 04:24:

Winchip is...a 486 chip for [the] socket 7 [platform].

Never thought of it like this before, but I like it.

Yep, that's part of why I suggested AMD missed out by not hedging their bets on the K5 with a Socket 5 version of their 486 x4/x5 core, probably with a beefier cache (especially given how small the .35 micron 486 was). It would've been a lot easier than trying anything more elaborate on the CISC end, like an actual P5-like superscalar core.

The Winchip's 486-like performance came up way back in the 133 MHz challenge thread linked above.

Same thing for the VIA CIII series of CPUs. I think all of those are single-pipeline designs. I know the first generation ones are, and the per-clock performance shows it. (I think the 686 benchmark comparison showed they often did worse than the late model Cyrix M2 core CPUs at half the clock speed and I'm not sure they even had a performance per watt advantage compared to the 2.2V or Mobile M2s, though the Joshua core with added 256 kB L2 cache and apparently beefier FPU did more poorly per watt)

I assume clock rate vs PR marking concerns were one of the reasons VIA went with the Winchip/Centaur design.

Both Cyrix's and Centaur's designs were direct-execute CISC x86/IA32 cores, not translated into micro-ops for some embedded RISC core, but all of the Winchip-derived CPUs were simpler direct-execute in-order ones more like a 486 (with branch prediction, but not out-of-order execution; the Cx6x86 approach was to allow instructions to be swapped at one of the pipeline stages).

The P5 Pentium didn't allow out-of-order execution either, and relied on compiler scheduling for efficient parallel execution. I don't think any other x86 compatible CPUs of that era opted for that (dual-pipeline, in-order execution), though Intel's first-generation Atom later did, and some RISC CPUs did as well. The PowerPC core in IBM's Cell processor (used in the PlayStation 3) worked like that, as did the Xbox 360's CPU based on it, and from what I understand it made for headaches on the game development end, especially since Microsoft's preproduction development kits had used dual-core (or dual CPU) PowerPC G4 or G5 CPUs without such compiler dependency. (Not to get into Sony's inclusion of the Cell's array of floating-point SIMD coprocessors vs MS's use of 3x of the same PPC cores.)
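To illustrate why an in-order, dual-pipeline design leans so heavily on instruction scheduling, here is a toy Python model. It is a simplification, not the real P5 U/V pairing rules (which also restrict which instruction types may pair and treat flags specially): two adjacent instructions issue in the same cycle only if the second neither reads nor overwrites the first one's result. The `Insn` type and register names are hypothetical.

```python
from typing import NamedTuple

class Insn(NamedTuple):
    dest: str              # register written
    srcs: tuple[str, ...]  # registers read

def can_pair(first: Insn, second: Insn) -> bool:
    """Simplified rule: no read-after-write or write-after-write on first.dest."""
    return first.dest not in second.srcs and first.dest != second.dest

def issue_cycles(program: list[Insn]) -> int:
    """Cycles to issue `program` on an idealised in-order, two-wide machine."""
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2  # both pipes busy this cycle
        else:
            i += 1  # second pipe sits idle
        cycles += 1
    return cycles

# Two independent dependency chains: a -> b and c -> d.
a = Insn("a", ("x", "y"))
b = Insn("b", ("a",))
c = Insn("c", ("z", "w"))
d = Insn("d", ("c",))

naive = [a, b, c, d]
scheduled = [a, c, b, d]  # chains interleaved, as a scheduling compiler might emit
print(issue_cycles(naive), issue_cycles(scheduled))  # 3 2
```

Because the hardware never looks past the next instruction, only the compiler's interleaving gets the four instructions through in 2 cycles instead of 3.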

As for the Cyrix 5x86 being a 486-class CPU or not, I think it is but I'm not sure it literally qualifies as a 486 itself ... or is it basically fair to say it's Cyrix's 16kB cache + feature tweaked upgrade to their 486DX4? (at least in as much as Intel's DX4 was over their DX2, given the clock for clock performance gains that seem to have shown up there compared to the Cyrix and AMD DX2s vs their respective DX4s)

From what I remember, the Cyrix 5x86 got a few FPU performance tweaks that weren't enabled by default (otherwise similar to the older, but quite fast Cyrix FPU).

Cyrix doesn't have any other 16kB cache Socket 3 compatible processor, so they'd be left out of the fastest spots in this sort of comparison without the 5x86 being there, I think.

Cyrix's Media GX (and NS's Geode and some models of AMD Geode) also used variants of the 5x86 core, but I'd think those would count more in the vein of the Winchip and Socket 370 based IDT/Centaur/VIA CPUs. It's hard to compare fairly with CPUs with generous L2 cache ... I'm not sure if there's been back to back comparisons with Socket 5 and 7 boards with L2 disabled/missing, but that would be interesting. (the 686 benchmark comparison results at least give a reasonable outlook on the memory controller performance, which should at least give a general idea of how the integrated memory controller/bandwidth compares to Socket 7 chipsets)

On that note: my Opti 495SX board runs very slow with the DX4 when all cache is disabled, much slower than my 386SX at the same bus speed (let alone core clock) and well under 286 speeds at the same clock rate. (at 33 MHz it's down to about equal to my D60 based 286-20 in 3DBench, Landmark, and Xwing's Detect: over 200 ticks for the latter and the in-game performance matches; and at 25 MHz it's more like a 286-12 as far as X-Wing is concerned)

However, I'm not positive it's just the memory controller being that slow (and having neither page-mode nor bank interleave support) or if there's little to no features included for using faster DRAM access for uncached reads/writes. (ie they cut out arbitrary/variable read/write cycle timing and page-mode access duration for anything but cache related operations)

Also, I'd assumed the Opti 495, 495A, B, SX, DLC, and other variants were virtually the same thing, but maybe there's some significant performance differences there too. (if the 495DLC is any different/better than the 495SX, then that would mean Red Hill's typical Cyrix DLC builds would be faster than mine)

That and I haven't seen what a 386DX does in that board with the cache disabled. There might be some weird 386-mode-specific DRAM functionality.

Reply 274 of 302, by mpe

Rank Oldbie

I think these loose definitions of what 486 or Pentium mean are tricky. They just unnecessarily blur things out.

Are all single-issue in-order CPUs released after 1993 486s? Are all out-of-order x86 CPUs PPros?

Competing CPU designs all have similar and dissimilar attributes and there is no such thing as 486 for Socket 7. I can provide several reasons why Winchip is not a 486.

Perhaps the title of this amazing thread should have been “World’s Fastest Socket 3 system” so that there would be no doubts as to whether 5x86, PODP83 etc are allowed and no-one would be tempted to compete with Winchip.

Last edited by mpe on 2020-04-17, 13:21. Edited 1 time in total.

Blog||486DX-50|NexGen 586|S4

Reply 275 of 302, by feipoa

Rank l33t++
mpe wrote on 2020-04-17, 13:03:

Perhaps the title of this amazing thread should have been “World’s Fastest Socket 3 system” so that there would be no doubts as to whether 5x86, PODP83 etc are allowed and no-one would be tempted to compete with Winchip.

Perhaps it should be, but that title isn't catchy.


Reply 277 of 302, by kool kitty89

Rank Member

Doesn't the AM486 family also have more in common with the i486, due to AMD's former status as an Intel licensee (with the AM386 being more or less a direct copy of the 386), and perhaps also due to the AM486 being somewhat reverse-engineered rather than an all in-house design with similar features?

From what I understand the AM486 is also not like the IBM SLC/DLC CPUs in as far as IBM's chips were pure 386 cores internally coupled with an on-chip cache and clock doubling or tripling.

And Cyrix's CPUs were all totally in-house designs, not clones or reverse-engineered hardware and also avoided the issue of copying microcode that NEC ran into with the V20/30. I believe the Cyrix 486DLC has more i486 like features than IBM's DLC parts (in terms of pipelining, faster IPC rate, greater emphasis on 16-bit instruction acceleration compared to the 386 and IBM DLC).

Also, I was partially wrong about the OPTi 495SX chipset's performance. I think the slow DRAM read speed I was getting (measured with Cachecheck) was due to both BIOS restrictions and write-back L2 cache performance issues killing read speed. Disabling the L2 but leaving the L1 enabled drastically improved DRAM read performance, with reads improving from 326 to 147 ns, though writes stayed at 127 ns with the 40 MHz FSB setting. (At 25 ns per bus clock, that works out to roughly 5 bus clocks for writes and 6 for reads.)

However enabling auto-configuration in the advanced chipset features section of the BIOS (DRAM and cache timing, etc) improved that further to 95 ns effective read speed while still leaving writes at 127 ns, so it seems like page-mode operation is enabled there. (I'm not sure about bank interleave as Cachecheck doesn't seem to notice the difference there where some other benchmarks do, including 3Dbench and Xwing's detect.exe as I discovered with my PCChips M396F, but haven't done full comparison with 1 vs 2 banks of DRAM installed in the OPTi board)

It also doesn't seem to need better than 70 ns DRAM to run at 40 MHz 0 WS mode, and the 80 ns modules I've tried so far have been fine too. Except '0' in this board's BIOS seems to just mean 0 additional waits, and the minimum configuration is for 5 or 6 bus clock DRAM cycles and probably 2 bus clocks for page mode. And given the apparent 6 bus clock read cycle timing I'm seeing in cachecheck, that would fit with the 150 ns RC times of some 80 ns FPM DRAMs (some are closer to 170 ns, and those figures do also depend on the DRAM control logic allowing for optimal pulse sizes for the various timing parameters, but there's also usually some room for error beyond the official tolerance limits, especially if you aren't actually running the RAM close to 70C). That also implies the 50 MHz FSB setting would cut timing close for 60 ns DRAMs at the 0 ws settings, though 1WS would probably allow most 70 ns stuff to work. (assuming 1 ws = 7 bus clock reads so ~140 ns)
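To make the bus-clock arithmetic in the last few paragraphs easy to check, here is a minimal Python sketch that converts between bus clocks and nanoseconds (one clock lasts 1000 / FSB-in-MHz ns, so 25 ns at 40 MHz). It only does unit conversion; real DRAM cycle timing also depends on RAS/CAS pulse widths and precharge, which it ignores. The 7-clock figure for 1 WS is the post's own assumption.

```python
def ns_per_clock(fsb_mhz: float) -> float:
    """Period of one bus clock in nanoseconds."""
    return 1000.0 / fsb_mhz

def cycle_ns(bus_clocks: int, fsb_mhz: float) -> float:
    """Duration of a memory cycle that occupies `bus_clocks` bus clocks."""
    return bus_clocks * ns_per_clock(fsb_mhz)

def clocks_for(latency_ns: float, fsb_mhz: float) -> float:
    """Express a measured latency (e.g. from Cachecheck) in bus clocks."""
    return latency_ns / ns_per_clock(fsb_mhz)

# Cycle lengths discussed above
print(cycle_ns(6, 40))  # 6-clock read at 40 MHz
print(cycle_ns(6, 50))  # the same 0 WS cycle at 50 MHz
print(cycle_ns(7, 50))  # 1 WS read, assumed to be 7 clocks, at 50 MHz

# Measured Cachecheck latencies at 40 MHz, converted to clocks
for label, ns in [("write", 127), ("read, L2 off", 147), ("read, auto-config", 95)]:
    print(f"{label}: {clocks_for(ns, 40):.2f} clocks")
```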

Performance with all cache disabled is also noticeably improved with auto-config enabled, so it probably would be an OK board to have with a 386DX 40 and no cache. (ie cheapest base model configuration)

I noticed a couple threads on vogons dealing with write-back L2 cache performance issues and DRAM read speed getting killed by enabling WB. Write-Through mode usually avoids this, apparently, but I'm not sure how to enable that setting on my board and it's certainly not in the BIOS. (setting the cacheable area to 64MB doesn't seem to force it either and performance doesn't change between that and 32 or 16 MB, or all the way down to the 4MB minimum setting, other than the uncached regions having somewhat faster DRAM times but still much slower than with L2 disabled)

There might be a jumper setting to select cache modes. I don't remember reading that in the manual or on the silkscreened markings on the board, but it might be there.

Dropping to 33 MHz FSB also allowed auto-config to work with L2 enabled, which made things faster still, but it won't work at 40 MHz with the 20 ns SRAMs installed. Read speeds still take a hit compared to L2 disabled, but are much better than with the fastest manual config settings. Cached RAM access is also much faster (I assume it's using 3-1-1-1 or 2-1-1-1 timing) and Cachecheck gets confused as L2 performance is almost as fast as the L1, which is stuck in write-through mode for some reason. (there's a jumper setting specific to cyrix CPUs that might do something, but that's only supposed to be for the 486DLC chips and I'm not sure what it does even there: not having it set still leaves the L1 Cyrix cache enabled when I tried a DLC chip in the board)

I might be able to use a register tweaking utility to alter the Cyrix 486 cache policy, though. I haven't downloaded any of those yet.

Reply 278 of 302, by kool kitty89

Rank Member

With the L2 disabled and 40 MHz FSB with auto-configuration enabled, that OPTi board and 120 MHz Cx486 is getting 85.76 MB/s in the 'memory bandwidth' section of Speedsys.

I'm pretty sure that number in Speedsys is related to the on-chip L1 cache fill speed given it seems dependent on both external memory performance and internal cache speed and seems to match up with the L1 read speed portion of the extended memory test line graph. (the write speed for the first 8kB is approximately 90 MB/s though oddly it starts above 90 and then dips below it near the 7 kB range but before the drop-off at the L1 cache boundary).

Disabling auto-config drops the bandwidth number down to 54.58, but the L1 read speed portion of the graph remains unchanged at that ~90 MB/s range, while the L1 'memory timing' number in the chart goes from 49.8 with auto-config to 49.1 without, so also virtually the same.
Memory throughput goes from 29.11 to 23.39 MB/s. Like in cachecheck, write speed is unchanged in these and read speed suffers when auto-config is disabled.

Speedsys's uncached read speeds on the line graph also look to match up with the Cachecheck ones I get:
43.6 MB/s with autoconfig, 28.6 MB/s without autoconfig.
The write speeds also match up and remain unchanged for both settings.

Nothing in Cachecheck seems to directly correlate to Speedsys's Memory Bandwidth figure, though the L1 cache speed seems to match Speedsys's L1 read speed in the extended memory test. (Cachecheck gets 97.9 or 95.9 MB/s.)

Enabling the board level cache totally kills the Speedsys memory bandwidth figure, dropping it all the way down to 24.1 MB/s.

Maybe 'memory bandwidth' in Speedsys is testing max read/write bandwidth with burst-mode memory accesses (when a processor supports them) from DRAM. Or maybe it's dependent on both that and L2 cache fill performance from DRAM. (if I could switch between write-back and write-through modes this would be easier to sort out)

Oddly, L1 cache speed improves with the L2 enabled, or at least the printed L1 data cache number listed in the graph box.
The 8kB region of the line graph doesn't appear to be different, still just over the 90 MB/s gradation, so maybe it's just a rounding error altered by the steeper 8kB boundary region without L2 than with it.

L2 reads look like 67.5 MB/s on the line graph and write speed is still unchanged from the other settings. (which seems about right for 2-2-2-2 timing)
I can do 2-1-1-1 (or auto-config + L2 enabled) at 33 MHz sometimes, but it's only stable with some of my video cards and crashes after loading the DOS shell at other times. (The Avance Logic VLB card doesn't like it, while the S3 Stealth 24VL seems fine, though it might also have something to do with wait state jumpers on the cards ... it's definitely cache related, since disabling the L2 makes the problems disappear.)
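As a rough cross-check on the "about right for 2-2-2-2" remark, here is a minimal Python sketch of the theoretical ceiling for back-to-back cache-line bursts. It assumes a 16-byte 486 cache line filled as four 32-bit transfers, MB meaning 10^6 bytes, and no idle clocks between bursts, so real measurements should land somewhat below these figures (and if a benchmark's MB is 2^20 bytes, its ceiling reads about 5% lower).

```python
def burst_mb_per_s(timing: tuple[int, ...], fsb_mhz: float, bytes_per_transfer: int = 4) -> float:
    """Ceiling for back-to-back line fills, in MB/s (10**6 bytes per second).

    `timing` uses the usual x-y-y-y notation, e.g. (2, 2, 2, 2): the bus clocks
    taken by each 32-bit transfer of one 16-byte 486 cache line.
    """
    line_bytes = bytes_per_transfer * len(timing)  # 4 transfers x 4 bytes = 16 bytes
    clocks_per_line = sum(timing)
    # (fsb_mhz / clocks_per_line) million line fills per second, line_bytes each
    return line_bytes * fsb_mhz / clocks_per_line

for timing, fsb in [((2, 2, 2, 2), 40), ((2, 1, 1, 1), 33), ((3, 1, 1, 1), 33)]:
    label = "-".join(map(str, timing))
    print(f"{label} @ {fsb} MHz: {burst_mb_per_s(timing, fsb):.1f} MB/s")
```

The measured 67.5 MB/s sits below the 80 MB/s ceiling for 2-2-2-2 at 40 MHz, which fits that timing once gaps between bursts and benchmark overhead are allowed for, though this calculation alone can't rule out other timings.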

I've only got 20 ns ISSI cache and TAG RAM installed currently, so that's going to be a limitation.

The interaction between board level cache and DRAM performance (and different chipset register settings and BIOS revisions) might explain some of the odd behavior that came up in this thread:
LuckyStar LS486E rev.C2 and Cyrix 5x86@133

(unusually slow Speedsys memory bandwidth scores were present for some SiS 496 based boards)

Write-back caching schemes for board-level cache seem to kill DRAM read performance on most (or all) chipsets, so having WB vs WT mode enabled might be at least one of the differences for the Speedsys 'Memory Bandwidth' scores. (DRAM wait states, overall cycle timing parameters, and page-mode support or timing also should affect that figure) This may have changed with Pipeline burst cache and maybe some asynch cache schemes on Socket 5 and 7 boards. (I haven't run many tests with board level cache disabled on my Socket 7 boards though I thought disabling L2 actually hurt the Sandra99 memory benchmark tests on my P5AB)

Mismatched BIOS + Board might also result in CPU speeds being reported erroneously (similar to BIOSes too old to explicitly support the CPU model) so you could also be running the CPU multiplier at a different setting than is reported. Speedsys often gets that wrong, so you'd want to use another utility to check the real speed. (it thinks my 120 MHz 486 is 40 MHz, probably due to info it's getting from the BIOS ... even though the POST table actually reports 66 MHz)

The CHKCPU utility seems to be good at that so long as on-chip cache is enabled (when present) and Landmark 6.0 seems to be fairly consistent at it as well for 386 and older CPUs. Cachecheck can get figures in the ballpark and is better than Speedsys but not accurate like CHKCPU. (it says my 120 MHz Cx486 is 136.9 MHz and it gets similar figures when run in real-mode or V86 mode: it gives a warning when V86 is used, but I haven't seen any difference in benchmark results so far: maybe there's more issues in win9x compared to just having EMM386 running in DOS or it crops up for some faster CPUs)

I also get really weird results for some tests when the de-turbo mode is enabled (turbo switch set closed), and it doesn't seem to be down-clocking the CPU but messing with the system (and I think ISA) bus speeds and wait states. Cache Check also oddly reports the L1 cache as 2 separate 4kB caches at different speeds when this is set. (the 4kB range is faster than the 8kB range) CHKCPU continues to report 120.1 or 119.9 MHz (and 40.0 or 39.9 MHz FSB) so long as the L1 is left enabled.

I suspect the ISA bus is being messed with since the system would refuse to POST at all with turbo switch set and the ISA divider was higher than 3 or 4, depending on FSB speed.

Reply 279 of 302, by ph4nt0m

Rank Member

Many chipsets of early 1990's defaulted to write back cache without a dirty bit in tag SRAM. It allowed for a larger cacheable range, but led to excessive writes because every line in cache was considered dirty. Write through is generally better than write back without a dirty bit.
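A toy simulation makes the dirty-bit point concrete. This minimal Python sketch models a direct-mapped board cache that allocates a line on every miss (an assumption; real write-through designs often skip allocation on writes) and counts write transactions reaching DRAM under three policies. The `cacheable_bytes` helper shows the other side of the trade-off: in a direct-mapped cache the cacheable range is the cache size times 2 to the number of tag bits, so reusing the dirty bit as one more tag bit doubles it. The function names, trace, and sizes are made up for illustration.

```python
# Toy direct-mapped cache model; names, sizes, and the trace are illustrative.

def cacheable_bytes(cache_bytes: int, tag_bits: int) -> int:
    """Address range a direct-mapped cache can cover with `tag_bits` of tag."""
    return cache_bytes * 2 ** tag_bits

def dram_writes(trace, num_lines: int, policy: str) -> int:
    """Count write transactions reaching DRAM for a direct-mapped cache.

    trace:  sequence of (op, line_address) pairs, op being "R" or "W".
    policy: "write-through", "write-back", or "write-back-no-dirty".
    Every access allocates a line on a miss (a simplifying assumption).
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    writes = 0
    for op, addr in trace:
        index, tag = addr % num_lines, addr // num_lines
        if tags[index] != tag:  # miss: the resident line (if any) is replaced
            if tags[index] is not None:
                if policy == "write-back" and dirty[index]:
                    writes += 1  # only a modified victim is copied back
                elif policy == "write-back-no-dirty":
                    writes += 1  # no dirty bit: every victim is assumed modified
            tags[index], dirty[index] = tag, False
        if op == "W":
            if policy == "write-through":
                writes += 1  # every store also goes straight to DRAM
            else:
                dirty[index] = True
    return writes

# Three stores to one line, then a read-heavy sweep over twice the cache size.
trace = [("W", 0)] * 3 + [("R", a) for a in range(1, 8)] + [("R", a) for a in range(8)]

for policy in ("write-through", "write-back", "write-back-no-dirty"):
    print(policy, dram_writes(trace, num_lines=4, policy=policy))

print(cacheable_bytes(256 * 1024, 8) // 2**20, "MB")  # 8 tag bits
print(cacheable_bytes(256 * 1024, 9) // 2**20, "MB")  # dirty bit reused as a 9th tag bit
```

On this read-heavy trace, write-back without a dirty bit produces far more DRAM writes than either alternative, because every replaced line is copied back whether or not it was modified. In this model that copy-back sits in front of each line fill, which is one plausible contributor to the DRAM read slowdowns with write-back L2 reported earlier in the thread.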
