VOGONS


Let's benchmark our systems with caches disabled

Reply 180 of 189, by clueless1

Baoran wrote:

I added some benchmark results to the bottom of the spreadsheet. One reason I started testing this system was that I wanted to know how much the graphics card really matters when the caches are disabled. That is why there are two sets of tests, one for each of 2 different graphics cards.
Basically, that Trident graphics decelerator turned my 486 into a 386 in most games. I think a slower graphics card can be used to fine-tune system speed when trying to run old games.
I used BIOS default settings for the tests.

Thank you! It's great to see actual results comparing VLB vs ISA with caches disabled. I'm a little surprised that there's as much difference as there is at the slowest speeds. Even though the synthetic benchmarks don't really show it, Doom does. Good work, Baoran. When I get a chance, I'll create a separate tab for your results. You're welcome to do it too if you get to it before I do.

The more I learn, the more I realize how much I don't know.
OPL3 FM vs. Roland MT-32 vs. General MIDI DOS Game Comparison
Let's benchmark our systems with cache disabled
DOS PCI Graphics Card Benchmarks

Reply 181 of 189, by Baoran


Since I used the BIOS defaults for those benchmarks, I also ran 2 more sets of benchmarks with the memory timings changed in the BIOS, to see the slowest and the fastest the system can do. The BIOS auto default most likely chooses something between those.

Trident 9000A and slowest memory timings and Turbo+L1+L2 disabled:
3dbench: 1.1 fps
Pcpbench: 0.3 fps
Speedsys: 1.48
Doom: 171879 realticks

Tseng ET4000/W32P and fastest memory timings:
3dbench: 27.7 fps
Pcpbench: 5.9 fps
Speedsys: 12.47
Doom: 5252 realticks

I didn't put them in the spreadsheet because I thought it isn't really about changing BIOS settings to raise or lower the performance. I mainly just wanted to see how close I can get to something like 4.77 MHz performance.
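
For anyone who wants to turn the Doom realticks figures above into frames per second, here is a minimal Python sketch. It assumes the run is `timedemo demo3`, which plays back 2134 game tics at Doom's fixed rate of 35 tics per second. The thread excerpt does not say which demo was used, so treat `DEMO3_GAMETICS` as an assumption and adjust it if a different demo was timed.

```python
# Convert a Doom timedemo result ("realticks") into average frames per second.
# Assumption: the run is `timedemo demo3` (2134 game tics); this is not stated
# in the posts above.
DEMO3_GAMETICS = 2134  # assumed demo length in game tics
TICRATE = 35           # Doom game tics per second


def doom_fps(realticks, gametics=DEMO3_GAMETICS):
    """Average frames per second for a timedemo run."""
    if realticks <= 0:
        raise ValueError("realticks must be positive")
    return gametics * TICRATE / realticks


slowest = doom_fps(171879)  # Trident 9000A, slowest memory timings
fastest = doom_fps(5252)    # Tseng ET4000/W32P, fastest memory timings
print(f"slowest: {slowest:.2f} fps, fastest: {fastest:.2f} fps")
print(f"spread: {fastest / slowest:.1f}x")
```

Under that assumption the two runs come out at roughly 0.43 fps and 14.2 fps, a spread of about 33x on the same machine.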

Reply 182 of 189, by zerker


So I added some entries to this list. I also wanted to benchmark the effects of CPU throttling. I used fdapm for this, but throttle has the same effect. I wasn't sure how you would want this sort of thing marked; I just put an extra artificial multiplier on the clock speed column. I'm pretty happy with the flexibility this technique offers, and I'm sure it will be useful to others.

Yes, I can also combine the two techniques, but that's way too much benchmarking for one morning.

Also, I couldn't extend the Doom FPS column to calculate for my results, since it is protected. I assume you go in and do that after?

EDIT: I couldn't resist, so I also added a "how low can you go" option of 1/8 throttle + both caches disabled. I think I've succeeded. The benchmarks are the slowest on the sheet so far. I did a test run of Ultima 2... and it seems to be running at the correct speed? So I guess approximately 8088 speed?
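
The "artificial multiplier on the clock speed column" can be sketched as below. This is only an illustration: the 100 MHz clock is a hypothetical value (the post does not give the actual CPU speed), the eighth-step throttle levels are assumed from the "1/8 throttle" mentioned above, and real benchmark results will not scale exactly linearly with the duty cycle.

```python
def effective_mhz(clock_mhz, duty):
    """Nominal clock scaled by the throttle duty cycle (1/8 throttle -> 0.125)."""
    if not 0 < duty <= 1:
        raise ValueError("duty must be in (0, 1]")
    return clock_mhz * duty


# Hypothetical 100 MHz CPU; the actual clock is not stated in the thread.
for eighths in range(8, 0, -1):
    mhz = effective_mhz(100.0, eighths / 8)
    print(f"{eighths}/8 throttle -> {mhz:.1f} MHz nominal")
```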

Reply 183 of 189, by clueless1

zerker wrote:

I couldn't extend the Doom FPS column to calculate for my results, since it is protected. I assume you go in and do that after?

EDIT: I couldn't resist, so I also added a "how low can you go" option of 1/8 throttle + both caches disabled. I think I've succeeded. The benchmarks are the slowest on the sheet so far. I did a test run of Ultima 2... and it seems to be running at the correct speed? So I guess approximately 8088 speed?

I took care of the autofill for you. And wow, that does look like 8088 speeds! That's pretty cool 😎 Thank you for adding your results!


Reply 184 of 189, by gdjacobs


So I benchmarked a spare S754 machine to see how it behaves with cache manipulation. As I expected, performance is hit hard with cache disabled, leaving too wide a performance gap (between slow Pentium and fast non-Tualatin P3) to close with clock multipliers. No Doom benchmarks for the moment due to DOS4GW crashes.

[Attached images: jCMzxk8.png, K8_speedsys.png, K8_speedsys_nocache.png]

All hail the Great Capacitor Brand Finder

Reply 185 of 189, by Baoran


What happened to the spreadsheet or is the problem with my web browser? It only shows 486dx2-66 stuff in the main results and I feel like much of the data is missing that used to be there.

Edit: The spreadsheet seems to be back to normal now. No idea what the issue was.

Reply 186 of 189, by clueless1


I have that tab unlocked for all to add results. Someone deleted a bunch of rows on January 12th. I restored the spreadsheet to January 11th. Thanks for pointing it out! That's the chance you take with leaving things open like that. You hope people won't screw things up, but there's always someone...

edit: actually, it looks like the 3dbench column has a bunch of inaccurate results. I don't have time right now to figure out when things got screwed up. I may just have to restore a much older backup if that's the case and lock the whole page. Make it read only and add the results myself as I have time. Sigh...

edit: I ended up restoring all the way back to March of 2019 because that was when the 3dbench column first got screwed up. Sorry if any results added since then were wiped out. We need to have basic spreadsheet skills if we want to contribute, and if something happens that you can't fix, reply here and I'll restore a backup immediately instead of months later.


Reply 187 of 189, by Dochartaigh


Is there a list of which types of motherboards allow the different caches to be turned off? Just looking to learn some rules of thumb here. For example, if I aim for a computer with a 200 MHz or slower processor, will those (almost?) ALWAYS have these different cache options available for me to slow it down?

I'm thinking about a DOS-only build (to supplement my faster Win98 PC, which doesn't play nice with a bunch of DOS games), and I'm having a hard time finding this information...and an even harder time looking up old computers to see what exact MB and/or exact processor they have inside (seems like a lot of this information dates from before the internet was in full bloom...and there are a lot of 404 file not founds on the links I've been following).

Reply 188 of 189, by clueless1

Dochartaigh wrote on 2020-01-30, 23:35:

Is there a list of which types of motherboards allow the different caches to be turned off? Just looking to learn some rules of thumb here. For example, if I aim for a computer with a 200 MHz or slower processor, will those (almost?) ALWAYS have these different cache options available for me to slow it down?

I'm thinking about a DOS-only build (to supplement my faster Win98 PC, which doesn't play nice with a bunch of DOS games), and I'm having a hard time finding this information...and an even harder time looking up old computers to see what exact MB and/or exact processor they have inside (seems like a lot of this information dates from before the internet was in full bloom...and there are a lot of 404 file not founds on the links I've been following).

SetMul is the gold standard for enabling/disabling caches easily on retro systems. Its readme gives a good guideline on which CPUs have which capabilities.

SetMul v1.24 - Multiplier control for VIA C3, AMD K6+7+8 Mobile and Cyrix 5x86
G. Broers 2014..2019 - Free for non-profit use.

DESCRIPTION
The main purpose of this program is to quickly change the multiplier of a select
range of x86 processors in MS-DOS and Windows 9X. In addition it can enable and
disable processor L1 and L2 caches.
SetMul does not stay resident, but the toggled CPU registers will remain as set,
until the next system reset/reboot.

BACKGROUND
Many DOS games and programs have issues with a CPU speed higher than expected,
while other software benefits from increased CPU speed. Having a means to
adjust the processor speed is important for making a system suitable for a
broad range of vintage software.
Originally a classic Pentium processor multiplier was set through jumpers,
without any means to adjust this through software. Around 2000 came a line of
'mobile' processors specifically aimed at laptops, which had a new feature to
preserve battery life when idle:
- Intel called this feature (Enhanced) SpeedStep.
- AMD called this feature PowerNow!, later renamed to Cool'n'Quiet.
- VIA/Centaur called this feature Longhaul, later renamed to PowerSaver.
In all cases it is about temporarily decreasing the processor multiplier by
software. The resulting net processor speed is the Front Side Bus speed times
the selected multiplier. For example: 66 MHz FSB times 5.5 = 366 MHz.
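
The arithmetic above can be checked with a short Python sketch. Note that 66 x 5.5 is 363, so the readme's 366 MHz only works out if the "66 MHz" bus is really about 66.67 MHz. The value 200/3 MHz used below is an assumption about the nominal bus clock, not something the readme states.

```python
from fractions import Fraction


def core_mhz(fsb_mhz, multiplier):
    """Net core speed = Front Side Bus speed times the selected multiplier."""
    return float(Fraction(fsb_mhz) * Fraction(multiplier))


# Assumption: the bus labelled "66 MHz" actually runs at 200/3 = 66.67 MHz.
NOMINAL_66 = Fraction(200, 3)

print(round(core_mhz(66, "5.5"), 1))          # using the rounded label
print(round(core_mhz(NOMINAL_66, "5.5"), 1))  # using the assumed exact bus clock
```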

MULTIPLIER OPTIONS
This table shows the available options for each supported processor:
VIA C3 Samuel 1: 3.0x to 8.0x, 11 choices
VIA C3 Samuel 2 step 0: 3.0x to 8.0x, 11 choices
VIA C3 Samuel 2 step 1+: 3.0x to 12.0x, 16 choices
VIA C3 Ezra: 3.0x to 12.0x, 16 choices
VIA C3 Ezra-T: 3.0x to 16.0x, 27 choices
VIA C3 Nehemiah: 4.0x to 16.0x, 25 choices
AMD K6-2+ / K6-III+: 2.0x to 6.0x, 8 choices (2.5x is excluded)
AMD K7 Mobile (Athlon): 3.0x to 24.0x, 32 choices
AMD K8 (Athlon 64 etc.): 4.0x to 25.0x, 22 choices (integer only)
Cyrix 5x86: 1.0x to jumpered multiplier (2, 3 or 4), 2 choices
A VIA C3 may, or may not work reliably at a total core speed below 250MHz.
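
As a rough illustration of the table and the 250 MHz warning, the Python sketch below lists the C3 Samuel 1 multipliers and flags core speeds under 250 MHz. Two things here are assumptions: that the 11 choices are plain half steps (3.0x to 8.0x in 0.5 steps does give exactly 11), and the 100 MHz bus speed, which is hypothetical. Other rows of the table (for example 3.0x to 12.0x with 16 choices) clearly skip some half steps, so this does not generalise to them.

```python
def half_steps(lo, hi):
    """All multipliers from lo to hi inclusive, in 0.5x steps."""
    count = int(round((hi - lo) * 2))
    return [lo + i * 0.5 for i in range(count + 1)]


samuel1 = half_steps(3.0, 8.0)
assert len(samuel1) == 11  # consistent with "11 choices" in the table

FSB_MHZ = 100.0  # hypothetical bus speed, not taken from the readme
for mult in samuel1:
    mhz = FSB_MHZ * mult
    note = "" if mhz >= 250 else "  (below 250 MHz, may be unreliable)"
    print(f"{mult:>4.1f}x -> {mhz:6.1f} MHz{note}")
```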

BUILT-IN CACHE OPTIONS
Disabling L1 Cache makes a processor very slow, at least half the processing
speed is cut. Disabling L2 Cache also slows the net speed, but has far less
impact.
SetMul allows disabling the L1 cache on any x86 processor from the 486 onwards.
SetMul allows disabling the L2 cache on the K6 Mobile and VIA C3. Note that
the C3 Samuel 1 has no L2 cache.
Contrary to many other cache disabling tools it still works when EMM386 or
Windows 9X are loaded.

PARAMETERS
/? - default help screen.
[Multiplier] - as a single digit like '5', or '5.0', or halves like '5.5'.
L1D - L1 Cache Disable.
L1E - L1 Cache Enable.
L1DX - L1 Cache Disable, exclusively, leaves L2 untouched, for K6-2+/III+.
L1EX - L1 Cache Enable, exclusively, leaves L2 untouched, for K6-2+/III+.
L2D - L2 Cache Disable, but cannot toggle any motherboard L2/L3 cache.
L2E - L2 Cache Enable, but cannot toggle any motherboard L2/L3 cache.
ICD - L1 I-Cache Disable, on VIA C3 or Winchip.
ICE - L1 I-Cache Enable, on VIA C3 or Winchip.
BPD - Branch Prediction Disable, on VIA C3.
BPE - Branch Prediction Enable, on VIA C3.
INFO - Show PowerNow! info of AMD K7 or K8.
CMD - Disable clock speed measurement.

Pentium P54C and MMX test register "TR12" options. Parameters:
BPD - Disable Branch Prediction.
VPD - Disable V Pipeline.
L1DX - Disable L1 cache exclusively.
CCD - Disable L1 code cache.
DCD - Disable L1 data cache.
PFE - Pentium Features Enable; Resets the above TR12 options to default.
The status of register TR12 cannot be read by design.

Multiple commands can be passed at once.
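
Since several parameters can be combined on one line, a batch file can be generated from a small helper like the Python sketch below. It only knows the parameter names listed above and the multiplier forms the readme describes ('5', '5.0', '5.5'). The function name `setmul_command` and the validation rules are illustrative, not part of SetMul, and whether a given parameter actually works still depends on the CPU.

```python
import re

# Parameter names copied from the readme's PARAMETERS and TR12 lists.
KNOWN_PARAMS = {
    "L1D", "L1E", "L1DX", "L1EX", "L2D", "L2E", "ICD", "ICE",
    "BPD", "BPE", "INFO", "CMD", "VPD", "CCD", "DCD", "PFE",
}

# Multipliers as the readme describes them: '5', '5.0' or halves like '5.5'.
MULTIPLIER = re.compile(r"\d+(\.[05])?")


def setmul_command(*args):
    """Build one SETMUL command line, rejecting anything not in the readme."""
    tokens = []
    for arg in args:
        token = arg.upper()
        if token not in KNOWN_PARAMS and not MULTIPLIER.fullmatch(token):
            raise ValueError(f"unknown SetMul parameter: {arg!r}")
        tokens.append(token)
    return " ".join(["SETMUL"] + tokens)


print(setmul_command("ccd", "dcd", "bpd", "vpd"))
print(setmul_command("5.5", "L1D"))
```

The raw bit patterns mentioned below are deliberately rejected by this helper, because the readme warns they are not checked for support.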

Running SetMul on a K6 mobile / VIA C3 without parameters gives the current
speed. It will also give the multiplier range and parameters that apply.

Also supported are 4 or 5 wide raw bit patterns: like '1010b' or '01010b'. But
these values are not checked for support, and allow for faulty register input!

HARDWARE AND OPERATING SYSTEM COMPATIBILITY
- Runs on a 386 or later x86 Processor.
- Compatible with MS-DOS, both with and without EMM386 loaded.
- Compatible with Windows 95, 98 and ME.
- SetMul requires CWSDPMI.exe or a compatible DPMI host.
- SetMul sets up a Ring0 exploit to get privileged access to the CPU registers.
Windows NT/2K/XP/Vista/7/8/10 or later cannot be fooled this way,
these operating systems are not supported.
- SetMul or CWSDPMI may conflict with motherboard chipset Throttling or with
ISA emulation drivers for PCI sound cards.
- Athlon K7 multiplier adjustment requires support from the motherboard
chipset to work. The earliest K7 chipsets do not offer this support.

DISCLAIMER
Use SetMul at your own risk! The author takes no responsibility for loss
of data or damage to hardware through the use of this software.
This program is for vintage hardware hobby use only. It has not been
sufficiently tested to be used while simultaneously working on important data.

ALTERNATIVES
C3Mul for DOS, all functionality retained in SetMul. Relies on CWSDPR0.exe.
WCPUID for Windows (works with Samuel 1, does not work with Ezra-T)
CrystalCPUID for Windows (Does not work with Samuel 1, works with Ezra-T)
K6DOS config.sys Driver for DOS, and K6Speed for windows.
AMD K6 Central Tweaking Unit 'CTU', for Windows.
falcosoft.hu/ has DOS-based multiplier tools for AMD Athlon etc.

VERSION HISTORY

v1.1 of 20-05-2014
- Fixed protection fault when running SetMul on a 486 system
- K6-2+/III+ : Exclusively toggle L1 cache.
- Pentium Pro/2/3 toggle L2 cache.
(Unfortunately Pentium Pro has been reported not to work with this)
- Winchip C6 toggle I-cache.
- Pentium P54C test register "TR12" options.

v1.2 of 08-07-2015
- Cyrix 5x86 support.
It can switch between the jumpered startup multiplier and 1.0x and back.
Some intended functions do not work, despite following the datasheet by the letter.
Like Multiplier readout, and Half-speed enable/disable (Parameter HSE and HSD).
Half-speed may only come to effect in idle mode of the CPU. Regardless,
these options remain in the program for now.
- Recompiled, seems to have fixed protection fault in v1.2A.

v1.21 of 22-02-2017
- Shows the bootup multiplier of the Pentium Pro, II and III. (Cannot change it).

v1.23 of 19-01-2019
- AMD K7 mobile and K8 multiplier adjustment and readback.
- Better checks in case of unexpected parameters.
- Changed CWSDPMI version to 025 of 2006 as suggested by Mercury127.
- Expanded and cleaned up this Document.

v1.24 of 04-02-2019
- Improved accuracy of RDTSC clockspeed measurements.
- Added alternate clockspeed measurement of older processors, like 386/486/Cx5x86.
- More consistent message colors and CPU information display.
- Cleaned up debug output mode, which is triggered with parameter DBG.
- Added option to disable clockspeed measurements with parameter CMD.
- Retain some CPU information functionality within Windows NT based OS.
- Overhauled CPU identification with proper use of extended family and model.

CREDITS
Author: G. Broers members.quicknet.nl/lm.broers/
Thanks to:
- C3MUL source, 2001/4/28, blue.ribbon.to/~als4kmaniac/i2/
- CrystalCPUID sources by hiyohiyo.
- RayeR for the idea on Ring0 access through DJGPP.
- FalcoSoft for testing AMD K8 Support.
- osdev.org for the concept of clockspeed measurement using XOR.
- DJGPP and GCC team.

Hint: It's not so much a function of the motherboard, it's more about the CPU. Although, there are some motherboards that do not have an L2 cache, in which case that would obviously be one less variable that could be controlled.


Reply 189 of 189, by Terracresta


After having lots of trouble reaching low 3dbench2 scores because deactivating the mainboard cache caused the system to hang, I finally had a breakthrough.

In case anyone has similar problems, it might be caused by too much RAM. I read that activating Memory Hole At 15M-16M might limit the RAM to 15MB on some MBs (helpful for some games too) and it indeed does on my MB. Out of curiosity I tried deactivating the MB cache with this option active and finally it didn't hang anymore! *YAY*

Compared to Phil's 136 in 1 PC, mine is faster if all caches are active. Now, with the system set to 120 MHz (SetMul reports 141), Phil's SetMul preset Slowest and a deactivated MB cache, I went down from a minimum of 26.8 to 10.7 (386 SX25 speed) in 3dbench2, beating Phil's lowest score without a slow ISA graphics card like the OAK. Will run all the other benchmarks tomorrow (well, today as it's 1:30 AM).

System is a P233 MMX, GA-586 ATX Rev 3.0 Mainboard (Intel 430TX), 128MB SDRAM, S3 Trio64V2/DX + Voodoo2 and a Yamaha Audician 32 Plus.

P233MMX @ 120 MHz:
SETMUL CCD DCD BPD VPD and L2 (MB) Cache ON: 26,7
SETMUL CCD DCD BPD VPD and L2 (MB) Cache OFF: 10,0
SETMUL L1D (L1 Cache OFF) and L2 (MB) Cache ON: 31,9
SETMUL L1D (L1 Cache OFF) and L2 (MB) Cache OFF: 12,3
P233MMX @ 233 MHz:
SETMUL CCD DCD BPD VPD and L2 (MB) Cache ON: 33,5
SETMUL CCD DCD BPD VPD and L2 (MB) Cache OFF: 11,6
SETMUL L1D (L1 Cache OFF) and L2 (MB) Cache ON: 37,9
SETMUL L1D (L1 Cache OFF) and L2 (MB) Cache OFF: 13,8
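
To put a number on how much the motherboard cache matters in each configuration, here is a small Python sketch that parses the decimal-comma scores above and prints the L2 ON / L2 OFF ratio. It assumes all eight figures are 3dbench2 scores, as in the paragraph before the list; the dictionary layout is just for illustration.

```python
# (clock in MHz, SetMul parameters) -> (score with L2 ON, score with L2 OFF)
# Figures copied from the list above, decimal commas kept as posted.
RESULTS = {
    (120, "CCD DCD BPD VPD"): ("26,7", "10,0"),
    (120, "L1D"): ("31,9", "12,3"),
    (233, "CCD DCD BPD VPD"): ("33,5", "11,6"),
    (233, "L1D"): ("37,9", "13,8"),
}


def parse_score(text):
    """Convert a decimal-comma score such as '26,7' to a float."""
    return float(text.replace(",", "."))


def l2_slowdown(score_on, score_off):
    """How many times faster the run is with the motherboard cache enabled."""
    return parse_score(score_on) / parse_score(score_off)


for (mhz, params), (on, off) in RESULTS.items():
    print(f"{mhz} MHz, SETMUL {params}: {l2_slowdown(on, off):.2f}x")
```

In all four cases turning the motherboard cache off cuts the score by a factor of roughly 2.6 to 2.9.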

I can set my CPU to 8 different clock speeds using the DIP switches: 120, 133, 150, 166, 180, 200, 210 and 233 MHz. So far I have only run the tests (except for full speed with L2 active) at 120 and 233 MHz.

SETMUL Full Speed, L2 ON:
120 @ 115,4 fps
133 @ 128,1
150 @ 128,9
166 @ 143,2
180 @ 140,0 (wonder if all test results would be lower than @ 166 Mhz)
200 @ 155,5
210 @ 148,8 (same as with 180 Mhz, only in comparison to 200 Mhz)
233 @ 165,4
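
The two dips noted above (180 and 210 MHz) can be picked out automatically with a short Python sketch like this one. It only uses the eight scores listed. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the DIP switch settings combine different bus clocks with different multipliers, so a higher core clock can come with a slower bus.

```python
# (clock in MHz, fps at full speed with L2 ON), copied from the list above.
SCORES = [
    (120, 115.4), (133, 128.1), (150, 128.9), (166, 143.2),
    (180, 140.0), (200, 155.5), (210, 148.8), (233, 165.4),
]


def regressions(points):
    """Return (mhz, fps, prev_mhz, prev_fps) wherever a higher clock scored lower."""
    found = []
    for (prev_mhz, prev_fps), (mhz, fps) in zip(points, points[1:]):
        if fps < prev_fps:
            found.append((mhz, fps, prev_mhz, prev_fps))
    return found


for mhz, fps in SCORES:
    print(f"{mhz} MHz: {fps:6.1f} fps, {fps / mhz:.3f} fps per MHz")

for mhz, fps, prev_mhz, prev_fps in regressions(SCORES):
    print(f"{mhz} MHz ({fps}) is slower than {prev_mhz} MHz ({prev_fps})")
```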