VOGONS


First post, by Skyscraper


I use an old socket 7 Soyo Sy-5eas motherboard and a Pentium 166mmx in my retro build.
It uses the Via VPX chipset and can cache 64mb of memory.
The board has a single slot for PC66 sdram which, according to the manual, can handle up to a 32mb module.
I knew for a fact that it can handle a 128mb module since I successfully used a single-sided 64mb module.
I thought that I might as well see how much of a performance hit I would suffer if I installed a larger module.

I searched the net and, much as I thought, the memory selling sites stated that the board could take a double-sided 128mb module.
I also found forum posts written by someone who tried a double-sided 256mb module but only got 128mb, just as you would expect.
I tried a 512mb module and got 256mb, which made me very satisfied. The strange thing is that I didn't notice much of a performance hit at all.
I lost 2% mips in the sisoft sandra cpu test, but the mflops, the multimedia benchmark and the memory benchmark numbers all stayed the same.
All this was with identical bios settings using the bios that was on the board when it was sold in 1997.

But since I am thinking of trying a k6-3 if I can find one, I wanted to update the bios.
I found a bios from 2000 which I presume has support for the k6-2/3, although limited to the chipset's officially supported maximum fsb of 75mhz (83mhz works just fine).
The thing is, the new bios made as much of a performance impact as going over the cacheable range. I lost another 2% mips in sisoft sandra; again the mflops, multimedia and memory benchmarks stayed the same.
I also tried PC Mark 2002 and lost 4% of the CPU score from the memory upgrade and bios update combined. I also lost 4% of the memory score.
I only ran PC Mark with the old bios and 64mb of memory, and with the new bios and 256mb, so I can't be sure if it's the same 2% + 2% or if the performance hit came from one or the other.
Perhaps sisoft sandra and PC Mark aren't the best tools to measure performance in Windows 98?
I also ran speedsys and there you don't see a performance hit at all, neither from the memory upgrade nor from the bios update, but DOS does not fill the memory top down so I can understand that.

Is the performance impact of going over the cacheable range in Windows overrated?
If a bios update can have just as much of a performance impact as going over the cacheable range, then perhaps lots of people run their systems with unnecessarily small amounts of memory?
Running Windows 98 SE with 64mb memory does not leave much memory free at all but with 256mb there is plenty of breathing room.

I would like some input on these matters. What's your experience?

New PC: i9 12900K @5GHz all cores @1.2v. MSI PRO Z690-A. 32GB DDR4 3600 CL14. 3070Ti.
Old PC: Dual Xeon X5690@4.6GHz, EVGA SR-2, 48GB DDR3R@2000MHz, Intel X25-M. GTX 980ti.
Older PC: K6-3+ 400@600MHz, PC-Chips M577, 256MB SDRAM, AWE64, Voodoo Banshee.

Reply 1 of 39, by Mau1wurf1977


Simply get a CPU with on-die L2 cache and you don't have to worry about the cacheable area.

These are the K6-2+ and K6-3+

My website with reviews, demos, drivers, tutorials and more...
My YouTube channel

Reply 2 of 39, by TheMAN


the K6-III has on chip L2, the K6-III+ is just a low power version of the CPU, not a whole lot different
official BIOS support for these CPUs wasn't always great, so the unofficial ones found here tend to be better:
http://web.inter.nl.net/hcc/J.Steunebrink/k6plus.htm

Reply 3 of 39, by noshutdown

Mau1wurf1977 wrote:

Simply get a CPU with on-die L2 cache and you don't have to worry about the cacheable area.

These are the K6-2+ and K6-3+

it really depends on your mainboard's memory performance.
for via boards, disabling the onboard cache causes a 15-20% performance loss even for a k6-2+ with on-die l2 cache, and for a cacheless k6-2 it can be up to 25-30%. this is because via boards have poor memory performance, so the cpu relies on the onboard cache to avoid slow sdram accesses.
for ali boards, disabling the onboard cache has only a 1-2% impact on performance because their memory access is very fast, almost as fast as accessing the onboard cache, which makes the onboard cache rather useless.
here's a comparison of superpi results for a k6-2+ 550 on mvp3 and ali5 boards:
mvp3 with 2mb onboard cache 5:19
mvp3 with onboard disabled 6:45
ali5 with 512kb onboard cache 5:41
ali5 with onboard disabled 5:45
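
The slowdown implied by those four times can be worked out with a few lines of Python. This is only an illustrative sketch: the helper names `to_seconds` and `slowdown_pct` are made up for this post, and the only data used are the superpi times listed above.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'm:ss' superpi time into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_pct(fast: str, slow: str) -> float:
    """Extra run time of `slow` relative to `fast`, in percent."""
    f, s = to_seconds(fast), to_seconds(slow)
    return (s - f) / f * 100.0


# (onboard cache enabled, onboard cache disabled), times from the list above
results = {
    "mvp3": ("5:19", "6:45"),
    "ali5": ("5:41", "5:45"),
}

for board, (cache_on, cache_off) in results.items():
    pct = slowdown_pct(cache_on, cache_off)
    print(f"{board}: {pct:.1f}% slower with onboard cache disabled")
```

With these inputs it comes out at roughly 27% for the mvp3 board and roughly 1% for the ali5 board.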

Reply 4 of 39, by Skyscraper


The bios I used is not on Steunebrink's list but should at least work with a normal k6-2, and the board can be jumpered as low as 2.0v, so that's not an issue.
If the board doesn't have real support for the k6-3 cpus, will the internal cache still work?
I found a German company selling k6-3+ 400s with 1.6v core voltage that are supposed to run at much higher speeds when given 2.0v. They seem to have thousands of them so I'm in no rush.

Is the poor memory performance general for all Via socket 7 chipsets? I don't experience any noticeable performance loss when using the system.
Perhaps it's worse with CPUs that use higher multipliers, since they are faster in comparison to the memory and get bottlenecked? My 166mmx runs with the 3.5 multiplier and 83mhz fsb.
Or perhaps it's only superpi and similar benchmarks that are very cache dependent that show a huge performance loss.
I will try superpi with 64mb and 256mb and report back.


Reply 5 of 39, by noshutdown

Skyscraper wrote:
The bios I used is not on Steunebrink's list but should at least work with a normal k6-2, and the board can be jumpered as low as 2.0v, so that's not an issue.
If the board doesn't have real support for the k6-3 cpus, will the internal cache still work?
I found a German company selling k6-3+ 400s with 1.6v core voltage that are supposed to run at much higher speeds when given 2.0v. They seem to have thousands of them so I'm in no rush.

Is the poor memory performance general for all Via socket 7 chipsets? I don't experience any noticeable performance loss when using the system.
Perhaps it's worse with CPUs that use higher multipliers, since they are faster in comparison to the memory and get bottlenecked? My 166mmx runs with the 3.5 multiplier and 83mhz fsb.
Or perhaps it's only superpi and similar benchmarks that are very cache dependent that show a huge performance loss.
I will try superpi with 64mb and 256mb and report back.

the on-die cache of the k6-3 (and k6-2+) doesn't need any motherboard support, it will always work as long as the board can power on.
and you got your pmmx running at 291? how much voltage do you need? i tried to push a mmx233 to 300 with 3.0 vcore and it's not fully stable. superpi finishes at 6:48 but 3dmark crashes sometimes.

Reply 6 of 39, by Skyscraper


Here are the results

The board is jumpered to 3.5*83=292 mhz. I use 3.3v core and 3.3v I/O.
The bios setting used is turbo memory speed for both 64mb and 256mb.
Read and write pipelines activated for 64mb (only the read pipeline enabled produced a 1% worse result).
Only the read pipeline enabled for 256mb (both pipelines enabled produced a 2% worse result).

With 64mb superpi 1m took 585s.
With 256mb superpi 1m took 626s.
The difference is 41s, so the 256mb run takes about 7% longer.
Both scores are pretty bad considering the clock frequency, but that's how it is with Via chipsets I guess.
An Intel chipset running a Pentium mmx at 3.5*83mhz should do superpi 1m in ~540s from what I have seen on other sites.
I will try 128mb to see if it produces the same result as 256mb.

phgl.jpg

u41o.jpg

Great to hear that the internal L2 cache in the k6-3 doesn't need motherboard support to work.

i tried to push a mmx233 to 300 with 3.0 vcore and it's not fully stable. superpi finishes at 6:48 but 3dmark crashes sometimes.

6:48 is a great superpi 1m score for a Pentium mmx. I bet it's because you have at least 1mb of cache on your motherboard so the cpu doesn't even have to access the ram 😀

[edit] With 128mb superpi 1m took 651s. Can anyone explain why superpi is so much slower with 128mb than with 256mb? [/edit]


Reply 7 of 39, by noshutdown


to me, superpi scores in windows98 seem quite inconsistent: multiple results from the same rig can vary by a few percent. my suggestion is to run games like quake or 3dmark for a few minutes before running superpi to "warm it up".
another well known issue is the "low memory bug": superpi usually scores a few percent better than normal when you have little memory in your rig (probably <=128mb). this is sometimes considered a bit of cheating, and is unrelated to the cacheable ram size limit, as it has been observed on many p6 or newer rigs that don't have this limit at all.
the vpx chipset can cache 128mb with 512k cache in write-through mode, or 64mb in write-back mode, so you would need to find out which mode your board is running. i remember sisoft may have mentioned that somewhere, or you can try using wpcredit to find it.
last, try comparing results with onboard cache enabled and disabled in bios.
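
For anyone wondering where figures like 64mb and 128mb come from, here is a rough sketch of the usual direct-mapped cache arithmetic. It assumes an 8-bit tag ram and that write-back mode gives up one tag bit to track dirty lines. Both are assumptions made for illustration, not values read off this board, and `cacheable_mb` is a made-up helper name.

```python
def cacheable_mb(cache_kb: int, tag_bits: int, write_back: bool) -> int:
    """Cacheable RAM in MB for a direct-mapped L2 cache (simplified model).

    Each tag value selects one cache-sized window of RAM, so the cacheable
    area is cache_size * 2**usable_tag_bits. In write-back mode one tag bit
    is assumed to be spent on the dirty flag (an assumption, see above).
    """
    usable_bits = tag_bits - 1 if write_back else tag_bits
    return cache_kb * (2 ** usable_bits) // 1024


print(cacheable_mb(512, 8, write_back=False))  # 128
print(cacheable_mb(512, 8, write_back=True))   # 64
```

Under those assumptions the model reproduces the 128mb write-through and 64mb write-back figures for a 512k cache.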

Reply 8 of 39, by Skyscraper


The manual states that the board can cache 64mb of memory with its 512kb cache, so that would indicate that the chipset uses write-back mode.
The bad superpi 1m score with 128mb compared to the score with 64mb also supports this. Why is the score with 256mb much better than the score with 128mb? I find this very strange.
The superpi 1m scores with a specific configuration do not seem to differ by more than 1-3s (max 0.5%) on this machine. When I ran superpi on a c2d@4.5 a few years ago I saw several percent difference from run to run, since 0.1s was over 1%.
I will do some runs with the cache disabled, but I think that disabling the L2 cache altogether will have a much higher performance impact than running with more memory than the board can cache.
The memories of a friend's slow cacheless celeron 233 will haunt me forever.


Reply 9 of 39, by noshutdown

Skyscraper wrote:
The manual states that the board can cache 64mb of memory with its 512kb cache, so that would indicate that the chipset uses write-back mode.
The bad superpi 1m score with 128mb compared to the score with 64mb also supports this. Why is the score with 256mb much better than the score with 128mb? I find this very strange.
The superpi 1m scores with a specific configuration do not seem to differ by more than 1-3s (max 0.5%) on this machine. When I ran superpi on a c2d@4.5 a few years ago I saw several percent difference from run to run, since 0.1s was over 1%.
I will do some runs with the cache disabled, but I think that disabling the L2 cache altogether will have a much higher performance impact than running with more memory than the board can cache.
The memories of a friend's slow cacheless celeron 233 will haunt me forever.

it's not uncommon for mainboard manuals to have incorrect specs. for example, the manuals of most mvp3 boards only list support for up to 128mb dimms, because 256mb dimms were very rare when the manuals were printed. they wouldn't have future cpus on the supported list either.
if a program is running in memory outside the cacheable range, it should be the same as running with cache disabled.
the celeron without l2 cache performs horribly because the p6 architecture relies more heavily on ram bandwidth and l2 cache, especially in the 16-256kb range.

Reply 10 of 39, by Skyscraper


Deactivating the l2 cache with 256mb didn't hurt superpi 1m performance much more, so you were right about that.
It did have some impact: the score is ~10s worse than with the cache activated. The 256mb score without cache is still better than the 128mb score with cache.
It seems that the high density memory on the 512mb module has some impact?
I can't see the memory timings in the bios. I can only choose between slow, normal, fast, turbo and spd. Since it's cas3 memory the turbo setting should be the fastest and should not differ between modules?
The difference between 64mb of fully cached memory and no cached memory at all (using the 512mb module seen as 256mb, with the l2 cache disabled in the bios) is 7-8%.
GLQuake timedemo only loses a single fps, 59 vs 58 fps.

I will try some more modules, like a single-sided 256mb which should be identified as 128mb if things are consistent.
It should have equal or better performance compared to the 512mb seen as 256mb, since it's also high density but leaves a greater percentage of the memory cached.
[edit] The scores were identical [/edit]

The cache on a board with the Via VPX chipset has much less impact than I thought.

Last edited by Skyscraper on 2013-09-30, 13:49. Edited 2 times in total.


Reply 11 of 39, by TELVM


Got intrigued and tested Super PI 1.1 on my K6-2+ / GA-5AX rig with on-motherboard "L3" enabled and disabled. Very little difference, 336 vs 338 seconds.

thumb-132380620952497702ef0da.jpg ..... thumb-16995662165249777d36b13.jpg

Let the air flow!

Reply 12 of 39, by Skyscraper


I really have to get a k6-2+ or k6-3(+).
I can get one for around 10 euro on German ebay but the shipping to Sweden costs just as much.
I will wait and see if one turns up locally first.

I did run superpi with the single sided 256mb (identified by the board as 128mb) and it matched the scores I got using the 512mb(256mb) module exactly.
So the high density chips seem to have some sort of impact.


Reply 13 of 39, by noshutdown

TELVM wrote:
Got intrigued and tested Super PI 1.1 on my K6-2+ / GA-5AX rig with on-motherboard "L3" enabled and disabled. Very little difference, 336 vs 338 seconds.

thumb-132380620952497702ef0da.jpg ..... thumb-16995662165249777d36b13.jpg

which revision is your 5ax? if it's 4.x or older then its onboard cache can only cover 128mb of ram, and i see that your rig seems to have 640mb, so better test again with 128mb installed.
my results (5:41 and 5:45) were done on a 5ax rev5.2, which can cache 512mb of ram.

Reply 15 of 39, by Skyscraper


While the performance hit from installing more memory than the motherboard can cache is manageable with a k6-2+, k6-3 or even a Pentium mmx, the same cannot be said for a k6-2.
Here are some numbers that show the catastrophic performance hit the k6-2 suffers with more than 64mb memory on a Soyo Sy-5eas Via VPX board.

Sandra = Sisoft Sandra 99
GLquake runs on a voodoo2
3dmark 99 max also runs on a voodoo2

As a reference

Pentium mmx@292 mhz with 256mb mostly uncached memory
GLQuake timedemo demo1: 58.3 fps
Superpi 1m: 10:26

Pentium mmx@292 mhz with 64mb cached memory
GLQuake timedemo demo1: 59.1 fps
Superpi 1m: 9:45

In Sandra, cached or uncached makes no real impact on the Pentium mmx; the numbers stay within a few points except for the memory score.
CPU bench: cpu dhrystone 666, fpu whetstone 333
Multimedia: integer 680, fpu 232
Memory: cpu 127, fpu 127 with 64mb; cpu 118, fpu 118 with 256mb

3dmark 99 max didn't care if the Pentium mmx ran with 64 or 256mb, the numbers were more or less the same.
Cached: 1568 3dmarks, 1987 cpu marks. Uncached: 1550 3dmarks, 19xx cpu marks (I can't read my handwritten note).

I started off my K6-2 testing at 292 mhz so I could compare against the Pentium mmx at the same speed.

K6-2 450@292 mhz with 256mb mostly uncached memory
GLQuake timedemo demo1: 50.7 fps
Superpi 1m: 15:10, ouch, this doesn't look good.
Sandra CPU bench: cpu dhrystone 770, fpu whetstone 343
Sandra multimedia: integer 768, fpu 575
Sandra memory: cpu 81, fpu 85

The K6-2 looks good in Sandra except for memory performance.
I did not run the k6-2 at 292 mhz with 64mb cached memory.
I increased the speed to 450 mhz, still with 256mb, hoping that would mitigate the performance loss in GLquake and superpi. It didn't.

K6-2 450 @ 5.5*83 = 450 mhz, 256mb mostly uncached memory
Superpi 1m: 13:23, ok some improvement but very slow for a 450 mhz cpu
GLquake: 56.7 fps, still beaten by the Pentium
3dmark 99 max: 2006 3dmarks, 4775 cpu marks, fps is jumping all around in the game tests
Sandra CPU bench: cpu dhrystone 1061, fpu whetstone 550
Sandra multimedia: integer 1211, fpu 907
Sandra memory: cpu 82, fpu 87

K6-2 450 @ 5.5*83 = 450 mhz, 64mb cached memory
Superpi 1m: 10:01, better but still beaten by the Pentium mmx at 292mhz
GLquake: 58 fps, same as above
3dmark 99 max: 2243 3dmarks, 5601 cpu marks, 3DNow! really helps in 3dmark, and it is smoother now with cached memory
Sandra CPU bench: cpu dhrystone 1167, fpu whetstone 550, not a huge difference
Sandra multimedia: integer 1210, fpu 907, lost a single point with cached memory
Sandra memory: cpu 92, fpu 97, gained some

K6-2 450 @ 6*83 = 501 mhz, 64mb cached memory
Superpi 1m: 9:42, finally beats the p166mmx@292mhz by 3s
GLquake: 58.3 fps, still beaten by the Pentium
3dmark 99 max: 2250 3dmarks, 5752 cpu marks, here we have reached a wall.
Sandra CPU bench: cpu dhrystone 1256, fpu whetstone 605, improvement at least
Sandra multimedia: integer 1321, fpu 989, still good scaling
Sandra memory: cpu 86, fpu 90, worse than @450 mhz, much worse than the Pentium mmx with uncached memory

The badly formatted wall of text above shows that a vanilla k6-2 really, really needs the memory to be cached.
The wall of text does not tell the whole story though: the fps in 3dmark was much, much smoother with cached memory, which is hard to show with numbers.
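
To put the k6-2@450 cached vs uncached numbers side by side, here is a small Python sketch. The figures are copied from the two 450 mhz runs above (superpi times converted to seconds); the `loss_pct` helper and the table layout are just something made up for this post, and treating superpi as "speed = 1/time" is my own choice for making timings comparable with scores.

```python
# (metric, cached 64mb, uncached 256mb, higher_is_better)
k6_2_450 = [
    ("Superpi 1m (s)",   601,  803,  False),  # 10:01 vs 13:23
    ("GLquake fps",      58.0, 56.7, True),
    ("3dmark 99 max",    2243, 2006, True),
    ("3dmark cpu marks", 5601, 4775, True),
    ("Sandra dhrystone", 1167, 1061, True),
    ("Sandra memory",    92,   82,   True),
]


def loss_pct(cached, uncached, higher_is_better):
    """Percent of performance lost going from cached to uncached memory."""
    if higher_is_better:
        return (cached - uncached) / cached * 100.0
    # For timings, compare speeds (1/time) so the sign matches the scores.
    return (uncached - cached) / uncached * 100.0


for name, cached, uncached, hib in k6_2_450:
    print(f"{name:18s} {loss_pct(cached, uncached, hib):5.1f}% lost")
```

It only restates the numbers already posted, but it makes it easier to see which benchmarks react most to uncached memory.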

It's strange that the k6-2 loses memory performance with the 6x multiplier.
I wonder if the motherboard changes some timings when it expects a slow cpu with the 2x multiplier (the jumper setting that the k6-2 interprets as 6x) compared to a cpu that uses the 5.5x multiplier.
Or does the motherboard even know what multiplier it is jumpered for, since the jumpers only change a few signals to the cpu so the cpu knows what multiplier to use?


Reply 16 of 39, by noshutdown


the degrading of socket7 memory performance at the 6x multiplier is a mystery: we are not sure why it happens but it has been confirmed by many people, on probably all boards. all i can say is stick to 5.5x for best performance.
besides, there is no doubt that uncached memory degrades more than cached memory, since you don't need to access memory when the data is in cache.

Reply 17 of 39, by Skyscraper


I will back down to the 5.5x multiplier as soon as I get my hands on a super socket 7 baby AT motherboard. For the moment the games still gain from the 6x multiplier since I only have 83mhz fsb.
I also find it strange that the k6-2 memory scores are so bad in Sisoft Sandra even with the 5.5x multiplier and with fully cached memory.

In speedsys the k6-2 beats the pentium mmx without breaking a sweat.


Reply 18 of 39, by elianda


How reproducible are your benchmark scores?
I just ask because with 64 MB cached and 256 MB equipped the memory layout of the running application in physical memory is crucial for the scores. But this is usually not under your control.

Wouldn't it be better just to bench with cache switched off?
What are your memory timings?

Retronn.de - Vintage Hardware Gallery, Drivers, Guides, Videos. Now with file search
Youtube Channel
FTP Server - Driver Archive and more
DVI2PCIe alignment and 2D image quality measurement tool

Reply 19 of 39, by Skyscraper


Some tests I only ran once, so no, I would not write a paper based on my findings.

The tests I did run more than once with the exact same settings always produced results with less than 1% deviation. Superpi with the exact same settings and memory module produced nearly identical results every time.
The difference between cache disabled and enabled with 256mb memory was also never much over 1% in Windows, but I only did those tests with the Pentium mmx. Cache on always produced the higher result, so some caching was going on, just not much.
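
For what it's worth, this is roughly how the "less than 1% deviation" can be checked for a set of repeat runs. It is a minimal sketch: `max_deviation_pct` is a made-up helper, and the run times in the example are hypothetical placeholders, not my actual measurements.

```python
from statistics import mean


def max_deviation_pct(times_s):
    """Largest deviation from the mean run time, as a percentage of the mean."""
    avg = mean(times_s)
    return max(abs(t - avg) for t in times_s) / avg * 100.0


# Hypothetical repeat runs of superpi 1m in seconds (not measured values).
runs = [585, 586, 584]
print(f"max deviation: {max_deviation_pct(runs):.2f}%")
```

Swap in the real times from each configuration to see whether a difference between two setups is bigger than the run-to-run noise.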

I think some low-level parts of the operating system load into the cached first 64mb, and the rest of the operating system and all programs get their memory top down. If that is correct (big if) then with 256mb the benchmarks will pretty much always get uncached memory. With 128mb the chance would be greater that the programs actually get cached memory allocated?
I did some tests with a single-sided 256mb module seen by the system as 128mb. The tests always produced the same result as the double-sided 512mb module seen as 256mb which I used for the 256mb tests. This indicates that I am wrong.
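
Here is the top-down guess written out as a toy model in Python, just to show what it would predict. Everything in it is an assumption: memory is handed out strictly from the top of ram, only addresses below 64mb are cached, and the 48mb footprint is a made-up number. It says nothing about how Windows 98 really allocates memory.

```python
def cached_fraction(total_mb: float, cacheable_mb: float, footprint_mb: float) -> float:
    """Fraction of a top-down allocation that falls below the cacheable limit.

    Toy model: the program occupies the top `footprint_mb` of RAM and only
    addresses below `cacheable_mb` are covered by the L2 cache.
    """
    footprint_mb = min(footprint_mb, total_mb)
    if footprint_mb <= 0:
        return 0.0
    lowest_used = total_mb - footprint_mb  # allocation spans [lowest_used, total_mb)
    cached_part = max(0.0, min(cacheable_mb, total_mb) - lowest_used)
    return cached_part / footprint_mb


# 48 MB is a made-up footprint, purely for illustration.
for total in (64, 128, 256):
    print(total, "mb installed ->", round(cached_fraction(total, 64, 48), 2))
```
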
Perhaps someone who is not talking out of their ass could fill in, since I am only speculating 😁 Except for the last week I have not benched under Windows 98 or even used Windows 98 for a very, very long time.

To answer your question, the reason for running the tests with cache on even with 256mb is that it's the fastest setting and the one I will use if I decide that I need 256mb ram.
