VOGONS


First post, by melbar

Rank Oldbie

I planned to build a retro PC more than half a year ago. I wanted to reactivate my old dual-Celeron system (2x466MHz) on the famous Abit BP6 board, which I used from around 1999 to 2002. The Voodoo3-2000 and the GeForce256 DDR, which were the graphics cards for this system, don't exist anymore because they were scrapped long ago...

But anyway, I was not able to boot this old hardware, the old air coolers were also really loud, and finally I decided to sell it on eBay. As a good replacement, I bought a really cheap Pentium III 866 system + GeForce3 Ti200.

Last Christmas, I found the really interesting videos from Phil's Computer Lab on YouTube about building retro PCs. The video about the '4in1 time machine' was especially inspiring. So I decided to also buy a Super Socket 7 (SS7) board for Pentium and AMD K6 processors.

My purpose for this SS7 board was to also run the PC at the lowest possible speed, so that a 386 or 486 system is not really necessary anymore, for these reasons:
1.) nowadays, depending on board + CPU, this old hardware is really expensive, and
2.) the problems which can occur with the AT form factor, the AT power supply and often with the RTC.

I know that the lowest FSB of an SS7 board is usually 66MHz. That means a Pentium 75MHz or 90MHz can't run. And as it happens, the board that I bought came with a K6-2 500 CPU.
So the first idea was: OK, let's clock this CPU with the minimum multiplier of 1.5x and have fun with 100MHz. In the manual, and also in the table printed on the mainboard, the smallest multiplier is written as '1.5x/3.5x'.
It turns out that the K6-2 takes the higher one... And the 2x multiplier does not work either, because the 'modern' K6-2 CXT interprets the 2x setting as 6x.

Then I really thought that with the resulting clock (2.5x66MHz = 166MHz) the system would be too fast. So I bought (also really cheap 😀) a Pentium 150MHz for a final clock of 100/133 MHz.
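The multiplier arithmetic above can be sketched in a few lines of Python. This is only an illustration: the remap table encodes just the two cases described in this post (1.5x read as 3.5x, 2x read as 6x on the K6-2 CXT), and the names `K6_2_CXT_REMAP` and `core_clock_mhz` are made up for the example.

```python
# Jumper setting on the board -> multiplier the K6-2 (CXT) actually uses.
# Only the two remaps described in the post are listed; every other
# setting is assumed to pass through unchanged.
K6_2_CXT_REMAP = {1.5: 3.5, 2.0: 6.0}


def core_clock_mhz(fsb_mhz, jumper_multiplier, remap=None):
    """Return the core clock for a given FSB and jumper multiplier."""
    effective = (remap or {}).get(jumper_multiplier, jumper_multiplier)
    return fsb_mhz * effective


FSB_MHZ = 66.67  # nominal "66MHz" bus

for mult in (1.5, 2.0, 2.5):
    k6 = core_clock_mhz(FSB_MHZ, mult, K6_2_CXT_REMAP)
    pentium = core_clock_mhz(FSB_MHZ, mult)  # no remap assumed for the Pentium
    print(f"jumper {mult}x: K6-2 ~{k6:.0f} MHz, Pentium ~{pentium:.0f} MHz")
```

With these assumptions, 2.5x is the lowest setting that the K6-2 takes at face value, while the Pentium can use 1.5x and 2x directly.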

This weekend I completed all the tests I need to get a good impression of CPUs from the PI+PII+PIII and AMD K6 /-2 /-3 era at their lowest possible speed. My tested hardware with Windows 95 is:

- AMD K6-2 500
- Shuttle HOT597 V14 (with VIA's 82C598MVP chipset)
- 32MB PC66 or 64MB PC100 SDRAM
- PCI ATI Rage XL 8MB

(During the test, I was only able to set the minimum FSB to 68MHz. Even with the 66MHz setting, the CPU runs with a 68MHz FSB. Therefore the absolute minimum clock is 171MHz. That means all clocks of my K6-2 look unusual due to the roughly 3% overclocked FSB.)
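A quick sanity check of the FSB numbers in the note above, as a Python sketch. It assumes the 2.5x multiplier and the 171MHz figure from this post, and takes 66.67MHz as the nominal bus speed; the function names are hypothetical.

```python
NOMINAL_FSB_MHZ = 66.67    # what the "66MHz" setting should give (assumption)
MULTIPLIER = 2.5           # lowest multiplier the K6-2 accepts on this board
OBSERVED_CORE_MHZ = 171.0  # minimum clock quoted in the post


def implied_fsb(core_mhz, multiplier):
    """Back out the bus clock from the core clock and the multiplier."""
    return core_mhz / multiplier


def overclock_percent(actual_mhz, nominal_mhz):
    """Deviation of the actual bus clock from nominal, in percent."""
    return (actual_mhz / nominal_mhz - 1.0) * 100.0


fsb = implied_fsb(OBSERVED_CORE_MHZ, MULTIPLIER)
print(f"implied FSB: {fsb:.1f} MHz")
print(f"deviation from nominal: {overclock_percent(fsb, NOMINAL_FSB_MHZ):+.1f} %")
```

The result lands a little under the rounded 3% mentioned above, which is consistent with the displayed clocks being rounded themselves.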

- Pentium III 866
- QDI SynactiX 5EP-A Rev2.0 (with Intel's 815EP chipset)
- 256MB PC133 SDRAM
- PCI ATI Rage XL 8MB

In the last days before the tests, I tried to predict (calculate) the 3DBench & PCPBench values of a K6-2 @166MHz and a Pentium @100MHz / 133MHz from the values of Phil's K6-III+ @133MHz and the corresponding benchmark values from cpu-world. Well, when the architecture of the CPUs is similar and the benchmark scales linearly, such interpolated values can match the test values without big gaps. But as my tested CPUs have different architectures, and with the scenarios of disabled caches on top, all this becomes non-linear!
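The kind of prediction described above boils down to scaling a known score by the clock ratio. Here is a minimal Python sketch of that idea. The scores in it are placeholders, not values from these tests or from cpu-world, and `predict_linear` / `relative_error` are hypothetical helper names.

```python
def predict_linear(ref_score, ref_mhz, target_mhz):
    """Scale a benchmark score proportionally with the clock speed.

    Only reasonable if the CPUs share an architecture and the benchmark
    itself scales linearly with clock.
    """
    return ref_score * target_mhz / ref_mhz


def relative_error(predicted, measured):
    """Signed error of the prediction relative to the measured score."""
    return (predicted - measured) / measured


# Placeholder reference point (NOT a real result): score 100 at 133MHz.
ref_score, ref_mhz = 100.0, 133.0

for target_mhz in (100.0, 166.0):
    guess = predict_linear(ref_score, ref_mhz, target_mhz)
    print(f"{target_mhz:.0f} MHz -> predicted score {guess:.1f}")

# Once a real measurement exists, the gap shows how non-linear things are:
print(f"error vs. a made-up measured 60.0: {relative_error(75.0, 60.0):+.0%}")
```

A large relative error after plugging in real measurements is exactly the non-linearity mentioned above, for example when the caches are disabled.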

The only benchmarks I used are the following:
- 3DBench (from Phil's package)
- PCPBench (from Phil's package)
and only one game benchmark (for the live feeling):
- running Wing Commander (like in Phil's videos testing 386 DX-40 / DX-33 / DX-25 speed).

I have attached all the data (tables + charts) from my tests as png files.

Note:
I know that the values at full speed can be higher for the K6-2 and PIII, even for the Coppermine. But in these tests I didn't use modifications like SETK6, MTRRLFBE or FASTVID.

Conclusion:

Contrary to my expectation: with L1 or L1+L2 caches disabled (for the 486 and 386 mode), the K6-2 @171MHz is even slower than the Pentium @137MHz or @103MHz. That means the Pentium is really faster per clock cycle than the K6-2, at least with the caches disabled. This also has something to do with the different design of these CPUs: the Pentium has a CISC core, while the AMD has a RISC core with an x86-to-RISC86 decoder. The AMD design is affected more by disabling the caches.
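"Speed per clock cycle" here simply means benchmark score divided by clock. A small Python sketch of that comparison follows. The scores are placeholders and not the measured values from the attached tables; only the clock speeds are taken from this post, and `score_per_mhz` is a hypothetical helper.

```python
def score_per_mhz(score, clock_mhz):
    """Benchmark points per MHz, a rough per-clock efficiency figure."""
    return score / clock_mhz


# (placeholder score, clock in MHz); the scores are made up for illustration.
runs = {
    "K6-2 @171MHz, L1 off": (20.0, 171.0),
    "Pentium @137MHz, L1 off": (22.0, 137.0),
    "Pentium @103MHz, L1 off": (17.0, 103.0),
}

for name, (score, mhz) in runs.items():
    print(f"{name}: {score_per_mhz(score, mhz):.3f} points/MHz")
```

Dividing by the clock makes CPUs at different speeds comparable, which is why a slower-clocked Pentium can still come out ahead per cycle.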

For the Pentium III @431MHz (66MHz FSB): I am really surprised by the behavior of the Coppermine with the cache disabled when comparing the two benchmarks. First, it doesn't matter whether you disable only the L1 cache or both L1+L2 caches; the results are the same. In 3DBench I got nearly twice the value of the Pentium @137MHz with L1 disabled. But in PCPBench, the Pentium III scores below all the settings of the Pentium & K6-2. Really interesting, I don't know why this happens...

Attachments

#1 K6-2/500, #2 Athlon1200, #3 Celeron1000A, #4 A64-3700, #5 P4HT-3200, #6 P4-2800, #7 Am486DX2-66

Reply 1 of 9, by Tetrium

Rank l33t++

melbar wrote:

Conclusion:

Contrary to my expectation: with L1 or L1+L2 caches disabled (for the 486 and 386 mode), the K6-2 @171MHz is even slower than the Pentium @137MHz or @103MHz. That means the Pentium is really faster per clock cycle than the K6-2, at least with the caches disabled. This also has something to do with the different design of these CPUs: the Pentium has a CISC core, while the AMD has a RISC core with an x86-to-RISC86 decoder. The AMD design is affected more by disabling the caches.

For the Pentium III @431MHz (66MHz FSB): I am really surprised by the behavior of the Coppermine with the cache disabled when comparing the two benchmarks. First, it doesn't matter whether you disable only the L1 cache or both L1+L2 caches; the results are the same. In 3DBench I got nearly twice the value of the Pentium @137MHz with L1 disabled. But in PCPBench, the Pentium III scores below all the settings of the Pentium & K6-2. Really interesting, I don't know why this happens...

I think the Coppermine becomes slower with the caches disabled compared to the S7 stuff (especially Pentium 1) because Intel probably made changes to that CPU so it would clock higher (that was a big selling point in those days), and they might've optimized it more to make it also perform well (they kinda did this the right way, where VIA accepted too many compromises with its C3). But I'd reckon that as Coppermine is more optimized, having one thing out of order (like its caches) will hurt its performance more.

Perhaps it's something along those lines. Coppermine needed to clock higher yet still perform, but apparently it needs its internal cache to still achieve performance. That would be my first guess.

I'll read your other findings in a minute, well done 😀

edit: And I find the vids Phil (and others!) are making very interesting. Not all of them, but many! 😜


Reply 2 of 9, by idspispopd

Rank Oldbie

P2/P3 also disable L2 cache when disabling L1 cache. This is also true for K6-3. For S7/SS7 CPUs without L2 cache the L2 cache is on the mainboard and is not automatically disabled with the L1 cache.
IIRC from results previously posted, K6 CPUs do slow down more than Pentium CPUs when disabling the L1 cache, while Cyrix CPUs slow down less than Pentium CPUs, and P2 CPUs slow down the most (286 level).
K6-2+/3+ are indeed useful because the multiplier can be changed at runtime and down to 2x.

I wouldn't hold any of these points against the CPUs. After all the normal mode of operation is with enabled caches and that's what counts. Disabling caches is for troubleshooting/debugging, and while different CPUs slow down more or less in that case it doesn't make them bad or slow per se, just more or less suited for playing old games of a certain period.

Reply 3 of 9, by PhilsComputerLab

Rank l33t++

Nice!

A lot of data to digest, I'll make myself some hot chocolate 🤣

EDIT:

Well, that is some really nice work you did there, and it matches my own findings. Apart from the Pentium III, but more on that later.

Like you found, the CPU architecture of Socket 7 processors determines "how slow" the CPU gets with caches disabled. While I don't have (m)any exotic processors, this is what I found:

K6-2: Slowest speed with caches disabled
Pentium: Faster than K6-2 with caches enables
Pentium MMX and Cyrix: Fastest with caches disabled. They will give you a solid 386DX-40 equivalent experience with Wing Commander running too fast.

You can also tweak the speed a tiny bit by changing memory type (I have these EDO sticks in SDRAM format, they are slower than normal SDRAM) and you can also change the timings in the BIOS.

Now back to the Pentium III and any Slot 1 processor that I tried.

Firstly, it's easiest to think of L1 BIOS settings as CPU cache. And L2 BIOS settings as motherboard cache.

Slot 1 boards don't have motherboard cache, so the L2 setting doesn't do anything. I'm not sure why the BIOS on such a machine has an L2 option, to be honest.

Anyway, what I found is that the Pentium III is either super fast or super slow with the cache disabled. By super slow I mean 286 speed, so slow that Indiana Jones and the Last Crusade runs slow 😒

So I find that the cache trick works great on 486 and Pentium machines, but on Pentium II and higher I don't use it and just recommend these for newer DOS games.


Reply 4 of 9, by melbar

Rank Oldbie

Tetrium wrote:

I think the Coppermine becomes slower with the caches disabled compared to the S7 stuff (especially Pentium 1) because Intel probably made changes to that CPU so it would clock higher (that was a big selling point in those days), and they might've optimized it more to make it also perform well... But I'd reckon that as Coppermine is more optimized, having one thing out of order (like its caches) will hurt its performance more.
Coppermine needed to clock higher yet still perform, but apparently it needs its internal cache to still achieve performance.

Yes, I agree with that. The PIII was the next step for Intel around the turn of the century. If the data on the wiki is correct, they have the following pipeline depths:
Pentium (P5 gen.) - 5-stage pipeline
Pentium Pro (P6 gen.) - 14-stage pipeline (and I think the PII has the same basis)
Pentium III (P6 gen.) - 10-stage pipeline
It's both the longer pipeline and the smaller fabrication process that make them (PII + PIII) perform better, but when it comes to deactivating the caches, it seems to be even worse for the execution units of the PII & PIII.

idspispopd wrote:

P2/P3 also disable L2 cache when disabling L1 cache.
I wouldn't hold any of these points against the CPUs. After all the normal mode of operation is with enabled caches and that's what counts. Disabling caches is for troubleshooting/debugging, and while different CPUs slow down more or less in that case it doesn't make them bad or slow per se, just more or less suited for playing old games of a certain period.

OK, that means you have no control on the PII and PIII with the BIOS setting "external cache", which implies the L2 cache.
Well, sure, it's not the normal mode when you disable the caches, but for me it's something like the turbo button of the 386s and 486s. I think it's a good way to run 'ancient' games not in software mode but on real hardware, like 'tractor-pulling' 😁
Apart from the additional money you have to spend on old hardware, it's good to have control of the speed in that way. With the K6-2, I now have a range of ~386DX-20 to full 500MHz speed (PII equivalent). With the PIII, I've got a range of 433-866MHz, and the third old companion I used in the olden days (Athlon XP) has a range of 1.33GHz (1500+) to max. 2.16GHz (2700+).

PhilsComputerLab wrote:

Nice!

A lot of data to digest, I'll make myself some hot chocolate 🤣
EDIT:
Pentium: Faster than K6-2 with caches enables

They will give you a solid 386DX-40 equivalent experience with Wing Commander running too fast.

Thanks 😎
Did you mean "K6-2 with caches disabled" instead of enables? Because when I compare my values of the K6-2 @171MHz with the 166MHz Pentiums and MMXs from your ultimate VGA benchmark chart, I get nearly the same.
About the problem with WC, I totally agree. I was worried when I saw your video with the DX-40. During my tests I tried all the scenarios with different clocks + disabled caches to see where the limit is. It's not easy, but when I was a child, I played through this amazing game at the low speed of 7.16MHz of an MC68000 processor 😊

PhilsComputerLab wrote:

Anyway, what I found is that the Pentium III is either super fast or super slow with the cache disabled.

Well, I still don't have a clear idea why, for example, 3DBench runs at 486DX66 speed whereas with the same settings PCPBench is at a really slow 386 / fast 286 level...


Reply 5 of 9, by F2bnp

Rank l33t

Hello! I seem to remember that there used to be a thread a while ago where people had made a comparison table between different 386 and 486 speeds and K6-2/3 CPUs with cache disabled. It was basically an easy way to find out the settings you had to use on your K6 CPU to achieve a specific speed. If I wanted to do 386DX40 for example, I had to disable both caches and use a specific speed.

Is there anything like this available somewhere? I want to "emulate" a 486/DX2 66 using a K6-III+, but I want to be sure I've hit that sort of performance.

Reply 7 of 9, by clueless1

Rank l33t

Well done, melbar. That's a lot of work! And thanks for sharing with us. 😀

The more I learn, the more I realize how much I don't know.
OPL3 FM vs. Roland MT-32 vs. General MIDI DOS Game Comparison
Let's benchmark our systems with cache disabled
DOS PCI Graphics Card Benchmarks

Reply 8 of 9, by clueless1

Rank l33t

melbar wrote:

Well, I still don't have a clear idea why, for example, 3DBench runs at 486DX66 speed whereas with the same settings PCPBench is at a really slow 386 / fast 286 level...

I think what you're seeing here is 3dbench 1.0c not scaling down. In that case, try 3dbench 1.0 (the older version) which will give more accurate results when the cpu gets into the slow 386/fast 286 range.
