VOGONS


Reply 20 of 37, by Jasin Natael


Ok, I'm a little late getting back to this party but here are my results just from DOSBENCH.
Not sure it matters but specs:
System is an Intel Seattle 2 440BX
VIA C3 Nehemiah 1.2@1500 (100x15)
512MB PC133 RAM
Nvidia Geforce 3
Aureal Vortex 2 PCI, (ISA sound card is removed ATM)

Without Fastvid or MTRRLFBE enabled:
3DBench - 305.5
Chris' 3D Bench 640x480 - 182.8
PC Player 640x480 - 96.1
Doom Max detail - 2134 in 764
Quake 640x480 - 61.3

With Fastvid Only:
3DBench - 450.1 (wow!)
Chris' 3D Bench 640x480 - 182.5
PC Player 640x480 - 96.1
Doom Max detail - 2134 in 737
Quake 640x480 - 61.3

With Fastvid & MTRRLFBE
3DBench - 450.1 (wow!)
Chris' 3D Bench 640x480 - 182.9
PC Player 640x480 - 96.1
Doom Max detail - 2134 in 738
Quake 640x480 - 61.2

With MTRRLFBE Only
3DBench - 445.7 (Hmm)
Chris' 3D Bench 640x480 - 182.4
PC Player 640x480 - 95.7
Doom Max detail - 2134 in 740
Quake 640x480 - 61.0

So what do the experts think?
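
As a rough aid for reading the numbers above, here is a minimal Python sketch that converts the Doom timedemo lines (gametics in realtics) into average fps and expresses the 3DBench change as a percentage. It assumes Doom's fixed rate of 35 tics per second; the helper names `doom_fps` and `pct_change` are made up for illustration and are not part of DOSBENCH.

```python
# Doom reports "gametics in realtics". Assumption: the engine runs at
# 35 tics per second, so average fps = 35 * gametics / realtics.
DOOM_TICRATE = 35


def doom_fps(gametics: int, realtics: int) -> float:
    """Average frames per second for a Doom timedemo result."""
    return DOOM_TICRATE * gametics / realtics


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# Realtics from the four runs posted above (gametics is 2134 in every run).
runs = {
    "No Fastvid/MTRRLFBE": 764,
    "Fastvid only": 737,
    "Fastvid & MTRRLFBE": 738,
    "MTRRLFBE only": 740,
}

for name, realtics in runs.items():
    print(f"{name}: {doom_fps(2134, realtics):.1f} fps")

print(f"3DBench, none -> Fastvid: {pct_change(305.5, 450.1):+.1f}%")
```

Read this way, Doom moves by only a few fps between configurations, while 3DBench is the one test that reacts strongly.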

Reply 21 of 37, by sunmax


Hi Jasin, the numbers look promising! Very nice setup btw, GeForce3 is really classy 😀

I would expect Quake numbers to increase with MTRR WC.

Did you enable MTRR via DOSBENCH or by hand ?

Can you please get a snapshot of MTRR on a fresh boot and before any test with:

MTRRLFBE INFO

(not sure if the version included with DOSBENCH supports INFO, you might have to download 1.6)

Can you try to enable it by hand before launching DOSBENCH with:

MTRRLFBE LFB WC

And then re-run Quake ?

My current guess is that LFB WC is already enabled (either by BIOS or something in AUTOEXEC), and the 3DBench boost might be the additional enabling of VGA WC.

Thanks!

Reply 22 of 37, by sunmax

jtchip wrote on 2024-08-30, 23:21:

so the C7 results looked entirely normal to me.

Hi jtchip, yes this C7 is definitely turning MTRR WC on. The J7F2 comes with a Unichrome Pro IGP, which might not be the fastest performer. I think there is a PCI slot, do you have any good PCI video card at hand to see if we can further improve the numbers ?

Reply 23 of 37, by Jasin Natael

sunmax wrote on 2024-08-31, 21:55:

Hi Jasin, the numbers look promising! Very nice setup btw, GeForce3 is really classy 😀

I would expect Quake numbers to increase with MTRR WC.

Did you enable MTRR via DOSBENCH or by hand ?

Can you please get a snapshot of MTRR on a fresh boot and before any test with:

MTRRLFBE INFO

(not sure if the version included with DOSBENCH supports INFO, you might have to download 1.6)

Can you try to enable it by hand before launching DOSBENCH with:

MTRRLFBE LFB WC

And then re-run Quake ?

My current guess is that LFB WC is already enabled (either by BIOS or something in AUTOEXEC), and the 3DBench boost might be the additional enabling of VGA WC.

Thanks!

I enabled by using Dosbench, not by hand.
I downloaded a fresh copy of 1.6

A rush job, so the pics aren't very good.
But here is a clean boot before it's enabled, and the output using 'info'

I did enable it by hand and ran the Quake bench again, still getting around 61fps

Attachments: wc1.jpg, wc2.jpg

Reply 24 of 37, by jtchip

sunmax wrote on 2024-08-31, 22:00:

The J7F2 comes with a Unichrome Pro IGP, which might not be the fastest performer. I think there is a PCI slot, do you have any good PCI video card at hand to see if we can further improve the numbers ?

The best PCI graphics card I have is a Matrox Millennium G200 SD 8MB. TBH I wouldn't expect it to be better, since it's using the card as a framebuffer and is limited by the PCI bandwidth of 133MB/s, compared to the iGPU using host memory at 4.26GB/s (1GB of DDR2-533), unless the UniChrome Pro somehow bottlenecks it.
I'll try to re-run but may not get to it for a few days.
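
The two bandwidth figures quoted here can be reproduced with a quick calculation. This is only a sketch of the theoretical peaks, assuming a 32-bit PCI bus at 33.33 MHz and a single 64-bit DDR2-533 channel (533 MT/s); neither bus detail is stated in the post, and the function name is hypothetical.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * bus_width_bits / 8


# Assumption: conventional 32-bit / 33.33 MHz PCI.
pci_mb_s = peak_bandwidth_mb_s(33.33, 32)

# Assumption: one 64-bit DDR2-533 channel at 533 MT/s.
ddr2_mb_s = peak_bandwidth_mb_s(533, 64)

print(f"PCI:      {pci_mb_s:.0f} MB/s")
print(f"DDR2-533: {ddr2_mb_s / 1000:.2f} GB/s")
print(f"Ratio:    {ddr2_mb_s / pci_mb_s:.0f}x")
```

Real framebuffer writes land well below either peak, so this only bounds the comparison.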

Reply 25 of 37, by jtchip

Jasin Natael wrote on 2024-09-01, 00:34:

I did enable it by hand and ran the Quake bench again, still getting around 61fps

I think your BIOS is enabling WC. On Asus boards (P2B and CUSL2-C at least), there is a "Video Memory Cache mode" setting which defaults to UC (uncached), and USWC enables WC. Not sure what the Intel board would call it.

Reply 26 of 37, by Jasin Natael

jtchip wrote on 2024-09-01, 01:26:
Jasin Natael wrote on 2024-09-01, 00:34:

I did enable it by hand and ran the Quake bench again, still getting around 61fps

I think your BIOS is enabling WC. On Asus boards (P2B and CUSL2-C at least), there is a "Video Memory Cache mode" setting which defaults to UC (uncached), and USWC enables WC. Not sure what the Intel board would call it.

That is possible I suppose. In typical Intel fashion the bios has about zero options, certainly nothing pertaining to performance.

Reply 27 of 37, by sunmax


Hi jtchip, yes agree. Good point on the Unichrome Pro. I got a ProSavage Twister IGP (2D core from Savage 2000 + 3D core from Savage4) which is on the AGP and also uses shared memory, and still some of the top PCI cards (especially Voodoo3) outperform it when MTRR VGA WC is on. With WC off, as expected, the shared memory IGP takes the lead.

E.g. with Ezra @ 1000 and 3DBench2 (VGA WC off -> on):

- ProSavage: 405.7 -> 568.7
- Voodoo3: 353.9 -> 675.1
- GeForce4 MX 440 SE (128-bit): 213.8 -> 606.8

Hi Jasin, can you try to set the FSB @ 133 and multiplier 10.5 or 11? It's possible the numbers will be higher even below 1500 MHz total, thanks to the higher FSB. Thanks
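
To make the 3DBench2 comparison above easier to read, here is a small Python sketch that computes the speedup per card and picks the leader in each mode. It assumes each pair of numbers reads as "WC off -> WC on", which matches the surrounding text but is not labelled explicitly in the list.

```python
# 3DBench2 scores on the Ezra @ 1000, as (WC off, WC on).
# Assumption: the arrow in the list above means "without -> with VGA WC".
scores = {
    "ProSavage": (405.7, 568.7),
    "Voodoo3": (353.9, 675.1),
    "GeForce4 MX 440 SE": (213.8, 606.8),
}

for card, (wc_off, wc_on) in scores.items():
    print(f"{card}: {wc_on / wc_off:.2f}x with WC")

# Which card leads in each mode?
leader_off = max(scores, key=lambda card: scores[card][0])
leader_on = max(scores, key=lambda card: scores[card][1])
print(f"Leader with WC off: {leader_off}")
print(f"Leader with WC on:  {leader_on}")
```

The shared-memory IGP gains the least from WC and the GeForce4 MX the most, which is why the ranking flips.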

Reply 28 of 37, by Jasin Natael

sunmax wrote on 2024-09-01, 16:37:

Hi jtchip, yes agree. Good point on the Unichrome Pro. I got a ProSavage Twister IGP (2D core from Savage 2000 + 3D core from Savage4) which is on the AGP and also uses shared memory, and still some of the top PCI cards (especially Voodoo3) outperform it when MTRR VGA WC is on. With WC off, as expected, the shared memory IGP takes the lead.

E.g. with Ezra @ 1000 and 3DBench2 (VGA WC off -> on):

- ProSavage: 405.7 -> 568.7
- Voodoo3: 353.9 -> 675.1
- GeForce4 MX 440 SE (128-bit): 213.8 -> 606.8

Hi Jasin, can you try to set the FSB @ 133 and multiplier 10.5 or 11? It's possible the numbers will be higher even below 1500 MHz total, thanks to the higher FSB. Thanks

All right, a bit of back story. I've never been able to get this board stable at 133 FSB. It's an Intel board and has absolutely zero OC options in the BIOS, nor does it have any jumpers.
I've tried all the software FSB programs in the past but never had much luck. However, I tried again with CPUFSB and lo and behold it works just fine, both with my VIA C3 and with my 1 GHz Coppermine.
Not sure why I wasn't able to get it to work before... it's rock stable with a single stick of RAM but a bit flaky with two, at least with the kit I have installed now. I don't really want to go dig through my RAM box. 256MB is plenty.

Anyway, I tested with the VIA at 1600 MHz (133x12), which is not 100% stable in Windows but is in DOS. 1533 is rock solid in Windows, however.
The only other change is the aforementioned reduced RAM, from 512MB to 256MB.

Tests with both Fastvid & WC enabled @1600

3DBench - 643.8
Chris' 3D Bench 640x480 - 216.6
PC Player 640x480 - 110.7
Doom Max detail - 2134 in 600
Quake 640x480 - 71.4

That FSB makes a difference for sure. The uplift from 1533 to 1600 is marginal but present as well.
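
For a sense of scale, here is a minimal Python sketch comparing these 1600 MHz (133 FSB) numbers with the earlier "Fastvid & MTRRLFBE" run at 1500 MHz (100 FSB). It assumes the two runs are otherwise comparable, which is not strictly true since the RAM also went from 512MB to 256MB, and it assumes Doom's 35 tics per second to turn realtics into fps.

```python
DOOM_TICRATE = 35  # assumption: Doom's fixed tic rate


def doom_fps(gametics: int, realtics: int) -> float:
    return DOOM_TICRATE * gametics / realtics


# Earlier run: 1500 MHz (100x15), Fastvid & MTRRLFBE.
at_1500 = {
    "3DBench": 450.1,
    "Chris' 3D Bench 640x480": 182.9,
    "PC Player 640x480": 96.1,
    "Doom Max detail (fps)": doom_fps(2134, 738),
    "Quake 640x480": 61.2,
}

# This run: 1600 MHz (133x12), Fastvid & WC.
at_1600 = {
    "3DBench": 643.8,
    "Chris' 3D Bench 640x480": 216.6,
    "PC Player 640x480": 110.7,
    "Doom Max detail (fps)": doom_fps(2134, 600),
    "Quake 640x480": 71.4,
}

gains = {
    name: (at_1600[name] - at_1500[name]) / at_1500[name] * 100
    for name in at_1500
}
clock_gain = (1600 - 1500) / 1500 * 100

print(f"Core clock: {clock_gain:+.1f}%")
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

Every benchmark improves by more than the core clock increase alone, which is consistent with the FSB doing most of the work.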

Reply 29 of 37, by jtchip


I tested the J7F2 (C7-D 1.5GHz) with a Matrox G200 PCI in Quake:

  • Quake VGA: 172.5 -> 185.8
  • Quake SVGA: 27.5 -> 77.0 (2.8X)

This is slightly slower than the results with the CN700 UniChrome Pro iGPU (-5.06% at SVGA with WC). I did have to manually specify the LFB base address as MTRRLFBE couldn't detect it on the G200 (returns 0h).
I also used RayeR's VESATEST to compare the results at 640x480x32 LFB (using the included TEST640.BAT), without and with WC (transfer speed in MB/s):

  • UniChrome Pro iGPU: 46 -> 197 (4.28X)
  • G200 PCI: 47 -> 78 (1.66X)

197 MB/s is comfortably ahead of the PCI bus bandwidth so it's unsurprising that the integrated graphics is faster.
(As an aside, that was the first time I used a graphics card in the system and the G200 just barely fits in the case, a Morex 668, with 2mm to spare from the drive cage. The G200 also felt pretty toasty, it would be uncomfortable to leave your finger on the GPU heatsink for too long.)
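
The ratios in this post can be checked with a short Python sketch, which also backs out the iGPU's Quake SVGA figure from the -5.06% difference. That last step assumes the percentage is measured relative to the iGPU result, and the 133 MB/s PCI peak is the theoretical figure quoted earlier in the thread, not a measurement.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the 'after' result is."""
    return after / before


# Quake on the G200 PCI, without -> with WC.
print(f"Quake VGA:  {speedup(172.5, 185.8):.2f}x")
print(f"Quake SVGA: {speedup(27.5, 77.0):.2f}x")

# VESATEST 640x480x32 LFB transfer speed in MB/s, without -> with WC.
vesatest = {"UniChrome Pro iGPU": (46, 197), "G200 PCI": (47, 78)}
PCI_PEAK_MB_S = 133  # theoretical PCI peak quoted earlier in the thread

for name, (wc_off, wc_on) in vesatest.items():
    print(
        f"{name}: {speedup(wc_off, wc_on):.2f}x, "
        f"{wc_on / PCI_PEAK_MB_S:.0%} of the PCI peak with WC"
    )

# Assumption: "-5.06%" means the G200 is 5.06% slower than the iGPU.
igpu_quake_svga = 77.0 / (1 - 0.0506)
print(f"Implied iGPU Quake SVGA with WC: {igpu_quake_svga:.1f} fps")
```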

Reply 30 of 37, by Jasin Natael

jtchip wrote on 2024-09-02, 00:32:

I tested the J7F2 (C7-D 1.5GHz) with a Matrox G200 PCI in Quake:

  • Quake VGA: 172.5 -> 185.8
  • Quake SVGA: 27.5 -> 77.0 (2.8X)

This is slightly slower than the results with the CN700 UniChrome Pro iGPU (-5.06% at SVGA with WC). I did have to manually specify the LFB base address as MTRRLFBE couldn't detect it on the G200 (returns 0h).
I also used RayeR's VESATEST to compare the results at 640x480x32 LFB (using the included TEST640.BAT), without and with WC (transfer speed in MB/s):

  • UniChrome Pro iGPU: 46 -> 197 (4.28X)
  • G200 PCI: 47 -> 78 (1.66X)

197 MB/s is comfortably ahead of the PCI bus bandwidth so it's unsurprising that the integrated graphics is faster.
(As an aside, that was the first time I used a graphics card in the system and the G200 just barely fits in the case, a Morex 668, with 2mm to spare from the drive cage. The G200 also felt pretty toasty, it would be uncomfortable to leave your finger on the GPU heatsink for too long.)

Interesting results. Might just be me or my config, but it does seem like my build is underperforming for the average results I'm seeing for similarly specc'd machines.

Reply 31 of 37, by jtchip

Jasin Natael wrote on 2024-09-04, 03:18:

Might just be me or my config, but it does seem like my build is underperforming for the average results I'm seeing for similarly specc'd machines.

Comparing your Nehemiah 1500 (15x100) to sunmax's 1400 (10.5x133), it's a 7.1% increase in clock speed but a 10% improvement in Quake SVGA framerate. From this alone, it looks like it's performing better than average, especially considering the FSB is 25% slower.

Compared to my C7 (Esther) 1500 (15x100), the C7 has a 35% increase in framerate but it does have an extra 64K of L2 cache and a 300% increase in FSB (400MHz), plus other internal optimisations.

Anyway, that still leaves the Ezra-T's underperforming write-combining mode. I had a look at the Linux kernel's initialisation of Centaur CPUs and there's no special treatment for Ezra-T. It should be tested on a different chipset to isolate the issue.
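
The percentages in this comparison follow from the nominal clocks. A minimal Python sketch, assuming the "133" FSB is nominally 133.33 MHz and taking the C7's 400 MHz FSB figure at face value:

```python
def pct_change(reference: float, value: float) -> float:
    """Change of value relative to reference, in percent."""
    return (value - reference) / reference * 100


FSB_133 = 400 / 3  # assumption: nominal 133.33 MHz

# Nehemiah 1500 (15x100) vs sunmax's 1400 (10.5x133).
clock_gain = pct_change(10.5 * FSB_133, 15 * 100)
fsb_change = pct_change(FSB_133, 100)

# C7 (Esther) FSB of 400 MHz vs the C3's 100 MHz.
c7_fsb_gain = pct_change(100, 400)

print(f"Core clock, 1400 -> 1500: {clock_gain:+.1f}%")
print(f"FSB, 133 -> 100:          {fsb_change:+.1f}%")
print(f"FSB, 100 -> 400:          {c7_fsb_gain:+.0f}%")
```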

Reply 32 of 37, by Jasin Natael

jtchip wrote on 2024-09-04, 23:40:
Jasin Natael wrote on 2024-09-04, 03:18:

Might just be me or my config, but it does seem like my build is underperforming for the average results I'm seeing for similarly specc'd machines.

Comparing your Nehemiah 1500 (15x100) to sunmax's 1400 (10.5x133), it's a 7.1% increase in clock speed but a 10% improvement in Quake SVGA framerate. From this alone, it looks like it's performing better than average, especially considering the FSB is 25% slower.

Compared to my C7 (Esther) 1500 (15x100), the C7 has a 35% increase in framerate but it does have an extra 64K of L2 cache and a 300% increase in FSB (400MHz), plus other internal optimisations.

Anyway, that still leaves the Ezra-T's underperforming write-combining mode. I had a look at the Linux kernel's initialisation of Centaur CPUs and there's no special treatment for Ezra-T. It should be tested on a different chipset to isolate the issue.

Yeah, I went back and looked at his results and you are right. I guess I read right past those results the first time. It does seem that I'm up against a wall of diminishing returns. My 133 FSB results show a notable uplift, but anything past 1500 MHz, regardless of FSB, seems to be marginal, in Quake at least.

Reply 33 of 37, by mockingbird

jtchip wrote on 2024-09-04, 23:40:

Compared to my C7 (Esther) 1500 (15x100), the C7 has a 35% increase in framerate but it does have an extra 64K of L2 cache and a 300% increase in FSB (400MHz), plus other internal optimisations.

Anyway, that still leaves the Ezra-T's underperforming write-combining mode. I had a look at the Linux kernel's initialisation of Centaur CPUs and there's no special treatment for Ezra-T. It should be tested on a different chipset to isolate the issue.

It's 100 MHz "quad data rate" FSB, so still 100 MHz FSB... CPUSPD can slow down C3/C7 with multiplier changes, disabling CPU features, and disabling L2 cache (I don't like disabling L1/L2; it does slow down the CPU, but not in a linear fashion). When I take my Ezra-T down to 150 MHz (50 MHz FSB x 3 multiplier) and disable branch prediction, i-cache and d-cache, it is exactly the speed of a 486DX/33.

Esther goes down to 4x (400 MHz). Out of curiosity, what do you get with those things disabled (but leaving L2 on)? I ask because there's precious little information on slowing down Esther.


Reply 34 of 37, by jtchip

mockingbird wrote on 2024-09-05, 23:02:

It's 100 MHz "quad data rate" FSB, so still 100 MHz FSB... CPUSPD can slow down C3/C7 with multiplier changes, disabling CPU features, and disabling L2 cache (I don't like disabling L1/L2; it does slow down the CPU, but not in a linear fashion). When I take my Ezra-T down to 150 MHz (50 MHz FSB x 3 multiplier) and disable branch prediction, i-cache and d-cache, it is exactly the speed of a 486DX/33.

Esther goes down to 4x (400 MHz). Out of curiosity, what do you get with those things disabled (but leaving L2 on)? I ask because there's precious little information on slowing down Esther.

The FSB is 400MHz, 100MHz is the BCLK (according to the datasheet). I have a C7-D so it has a fixed multiplier. The only slowdown method I've used is disabling L1 but that's generally to get titles with issues in their sound initialisation to run at all rather than to run speed-sensitive software so I haven't measured how fast it is under those conditions.
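
The two readings of "FSB" in this exchange come down to which number gets multiplied by what. A minimal Python sketch, assuming "quad data rate" means four transfers per bus clock; the function names are hypothetical and the clock values are the ones mentioned in the thread:

```python
def effective_fsb_mt_s(bclk_mhz: float, transfers_per_clock: int) -> float:
    """Effective FSB rate in MT/s: bus clock times transfers per clock."""
    return bclk_mhz * transfers_per_clock


def core_clock_mhz(bclk_mhz: float, multiplier: float) -> float:
    """CPU core clock: bus clock times CPU multiplier."""
    return bclk_mhz * multiplier


# C7 (Esther): 100 MHz BCLK, quad-pumped bus (assumption: 4 transfers/clock).
print(f"C7 effective FSB:  {effective_fsb_mt_s(100, 4):.0f} MT/s")
print(f"C7 core at 15x:    {core_clock_mhz(100, 15):.0f} MHz")
print(f"Esther at 4x:      {core_clock_mhz(100, 4):.0f} MHz")

# Ezra-T slowed down as described above: 50 MHz FSB x 3.
print(f"Ezra-T slowed down: {core_clock_mhz(50, 3):.0f} MHz")
```

Both the 400 MHz FSB and the 400 MHz minimum core clock come out of the same 100 MHz BCLK, which is an easy source of confusion.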

Reply 35 of 37, by mockingbird

jtchip wrote on 2024-09-05, 23:32:

The FSB is 400MHz, 100MHz is the BCLK (according to the datasheet).

I stand corrected... I always associated FSB with memory speed... You're running the memory at 100 MHz on early P4, but your FSB is 400 MHz...

I have a C7-D so it has a fixed multiplier. The only slowdown method I've used is disabling L1 but that's generally to get titles with issues in their sound initialisation to run at all rather than to run speed-sensitive software so I haven't measured how fast it is under those conditions.

The multiplier is adjustable on the C7-D in software with CPUSPD. If you would run SpeedSys at 400 MHz with i-cache, d-cache, and branch prediction disabled ("cpuspd ecd eid ebd") and post the result here, I would greatly appreciate it. I am very curious to see what it compares to at that spec.


Reply 36 of 37, by The Serpent Rider


Some motherboards have an adjustable multiplier option for Esther, like the Wyse Vx0 thin client. The lowest it can go is 4x, so 400 MHz.


Reply 37 of 37, by jtchip

mockingbird wrote on 2024-09-06, 01:38:

The multiplier is adjustable on the C7-D in software with CPUSPD. If you would run SpeedSys at 400 MHz with i-cache, d-cache, and branch prediction disabled ("cpuspd ecd eid ebd") and post the result here, I would greatly appreciate it. I am very curious to see what it compares to at that spec.

Disabling the caches and branch prediction worked, but setting the multiplier (m4) did not; it just reports "Unsupported VIA processor". Anyway, with those disabled it scored 7.44, about a 386DX-40, compared to its usual score of 1051.00, just under a PIII-933.

The Serpent Rider wrote on 2024-09-06, 06:56:

Some motherboards have an adjustable multiplier option for Esther, like the Wyse Vx0 thin client. The lowest it can go is 4x, so 400 MHz.

Not the Jetway J7F2, it's a pretty packed motherboard and only has 4 jumpers for the keyboard and USB power, clear CMOS, and RCA output function (SPDIF or composite video).