VOGONS


P3 Tualatin vs. P4

First post, by 2Mourty

Rank Member

I dug two laptops out of my pile of old stuff the other day. One is a Dell Latitude C810 with a 1.13 GHz Pentium 3 Tualatin processor. The other is a Dell Precision M50 with a 1.8 GHz Pentium 4 processor.

I was bored and ran SPEEDSYS with a bootdisk on both machines. I expected the P4 to win because of its clock rate advantage. Much to my surprise, the P3 chip won by a large margin on the CPU score.

The P4 is a Northwood processor with a 400 MHz bus. Its memory bandwidth kills the P3's because of DDR vs. PC133. I know that clock for clock the P4 is an inferior processor compared to the P3, but these results surprised me. I am uploading the test results; does anybody have any ideas as to why the P4 lost so badly?

Attachments

  • P3133.jpg (77.67 KiB): P3 Tualatin 1.13 GHz
  • P418.jpg (76.22 KiB): P4 Northwood 1.8 GHz

Reply 1 of 20, by archsan

Rank Oldbie

Have you tried to disable HT in the BIOS for the Northwood? Maybe that would help lessen the performance loss for this app.

The cache result for the P4 seems strange to me: L1 at 512K, no L2 cache, writing drops before 128k...

Also, though it effectively runs at 400 MHz QDR (quad data rate), that P4 actually has a 100 MHz bus clock, which makes it more like the first-gen P4 than its later desktop counterparts (200 MHz, 800 MHz effective).
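
To put rough numbers on that bus comparison, here is a minimal sketch. It assumes a 64-bit (8-byte) data path for both the P4 front-side bus and PC133 SDRAM, and it only computes theoretical peak figures, not what a benchmark would actually measure. The function name is made up for illustration.

```python
def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock, bus_width_bytes=8):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes).

    Assumption: a 64-bit (8-byte) wide bus unless stated otherwise.
    """
    return base_clock_mhz * transfers_per_clock * bus_width_bytes


# 100 MHz bus clock, four transfers per clock (the "400 MHz" QDR bus)
p4_fsb = peak_bandwidth_mb_s(100, 4)

# PC133 SDRAM: 133 MHz, one transfer per clock
p3_pc133 = peak_bandwidth_mb_s(133, 1)

print(f"P4 FSB peak:   {p4_fsb} MB/s")
print(f"PC133 peak:    {p3_pc133} MB/s")
print(f"Ratio (P4/P3): {p4_fsb / p3_pc133:.2f}x")
```

On paper the P4's bus has roughly three times the headroom, which is why a P4 memory score below the P3's would point at a configuration or measurement problem rather than the hardware.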

Thanks for sharing this.

Reply 3 of 20, by Old Thrashbarg

Rank Oldbie

archsan wrote:

Have you tried to disable HT in the BIOS for the Northwood?

I don't think a 1.8 Northwood has HT, nor does the 845 chipset support it AFAIK.

In any case, there's something wrong with the benchmark, that's for sure... the P4 is slow, but not that slow. The 1.13 P3 should be roughly on equal footing with a 1.8 P4, and will fall a little behind on certain multimedia tests that take advantage of some of the optimizations in the P4.

Reply 4 of 20, by archsan

Rank Oldbie

Old Thrashbarg wrote:

I don't think a 1.8 Northwood has HT, nor does the 845 chipset support it AFAIK.

😦 Right you are; I don't remember the P4 lineage that well. The lowest-clocked HT P4 was the famous 2.4C, the budget chip known for hitting 3+ GHz, which was also, to me, the only P4 worth buying.

I thought one of the biggest performance losses on the OP's P4 came from that strange cache result (and I assumed too quickly that the caching problem was caused by HT's shared multithreading). Compare that to the P3 cache result chart, which corresponds well to its 32K/512K configuration.

This makes me want to rebuild my P4 2.4C and test SpeedSys on it. Or better yet, let other people who have a P4 ready to boot share their results! 😜

Reply 5 of 20, by swaaye

Rank l33t++

The P4's L2 cache is silly fast, so some apps might think it's the L1. Also, the P4's L1 data cache is an itsy bitsy 8 KB. The other L1 is the trace cache, which holds about 12K micro-ops (so it is not really 12 KB). I'm sure they had to keep the L1 caches small because of their desire for high clock speed over pure performance per clock.

You can't really complain about the performance of Intel's caches starting with Coppermine....

Another thing to remember is that P4 has terrible x87 FPU performance. This test might be measuring that aspect of the CPU.

Reply 7 of 20, by keropi

Rank l33t++

sectoid wrote:

This is not so surprising since P3 was used as the baseline for Core 2 technology and P4 was dumped entirely.

Really? This is the first time I've read this... interesting....

Reply 9 of 20, by Old Thrashbarg

Rank Oldbie

sectoid wrote:

This is not so surprising since P3 was used as the baseline for Core 2 technology and P4 was dumped entirely.

While it's true that the Netburst architecture was a giant flaming ball of poo (and yes, it was dumped in favor of a rework of the old P6 architecture), the fact remains that a P4 shouldn't be that slow.

Here's a link showing some other guy's System Speed Test results with a Celeron 1.7. That chip should be significantly slower than a 1.8 P4, but it's turning in better numbers than OP's. Notice the CPU speed index of 1145, versus the 973 OP has, and also look at the huge gap in memory bandwidth numbers. There's something wrong here.

I did notice, though, when searching for suitably similar speed test results, that the cache seems to be misreported on other P4-based chips as well. I'd guess it's probably just a quirk of the way the program sees it, and maybe swaaye's analysis is correct as to why.

Reply 11 of 20, by gerwin

Rank l33t

Here is another Northwood Pentium 4, the one at work. I brought a bootdisk today so I could benchmark it. It scores quite a bit more than 2Mourty's 1800 MHz Pentium 4, but still less than the AMD-3000+ systems at home (2416.26 points).

Edit: While I am at it, I also ran these diagnostics on another Pentium 4 and on a Core 2 Duo system.

Attachments

  • c2d_cpuz.gif (14.43 KiB): Core 2 Duo CPU-Z
  • c2d_sst.gif (10.13 KiB): Core 2 Duo Speedsys
  • p4_2800ht_sst.gif (12.06 KiB): Pentium 4 2800 MHz Hyperthreading Speedsys
  • p4_2600_cpuz.gif (12.35 KiB): Pentium 4 2600 MHz CPU-Z
  • p4_2600_sst.gif (12.65 KiB): Pentium 4 2600 MHz Speedsys

Reply 12 of 20, by archsan

Rank Oldbie

Thanks! That's the most impressive "Pentium III" result I've seen so far...

Btw, the Core 2 result seems to challenge the previous assumption on the P4 cache (about the L2 cache being so fast it is recognized as L1), as the three levels of caches are all recognized properly.

Reply 13 of 20, by Old Thrashbarg

Rank Oldbie

archsan wrote:

as the three levels of caches are all recognized properly.

Except for the fact that Core 2 chips don't have L3 cache. I'd say it has less to do with the speed of the cache and more to do with the program using a certain limited set of parameters for determining the cache size and layout, which doesn't work properly on newer chips.
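
As an illustration of how a fixed heuristic can misread a cache layout, here is a rough sketch. It is not SpeedSys' actual method (which isn't documented in this thread); it assumes a benchmark that measures throughput at increasing block sizes and declares a cache boundary wherever throughput drops below a fixed fraction of the previous sample. The function name, the threshold, and all the numbers are made up.

```python
def detect_cache_levels(samples, drop_ratio=0.7):
    """Infer cache boundaries from a throughput-vs-block-size curve.

    samples: list of (block_size_kb, mb_per_s), sorted by block size.
    Returns the block sizes after which throughput falls below
    drop_ratio times the previous sample.
    """
    boundaries = []
    for (size, speed), (_, next_speed) in zip(samples, samples[1:]):
        if next_speed < speed * drop_ratio:
            boundaries.append(size)
    return boundaries


# Made-up curve shaped like a chip with a 32 KB L1 and a 512 KB L2.
two_level = [(4, 9000), (8, 9000), (16, 9000), (32, 9000),
             (64, 4000), (128, 4000), (256, 4000), (512, 4000),
             (1024, 600), (2048, 600)]

# Made-up curve where a tiny L1 and a very fast L2 differ by less than
# the threshold, so the first step goes unnoticed.
fast_l2 = [(4, 9000), (8, 9000), (16, 7500), (32, 7500),
           (64, 7500), (128, 7500), (256, 7500), (512, 7500),
           (1024, 900), (2048, 900)]

print(detect_cache_levels(two_level))
print(detect_cache_levels(fast_l2))
```

With the second curve the heuristic finds a single boundary at 512 KB, which is the kind of "L1 at 512K, no L2" reading discussed earlier in the thread.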

Reply 15 of 20, by Old Thrashbarg

Rank Oldbie

by three levels he means

I gathered that he was talking about the System Speed Test results, since that's been the subject of the rest of the thread. Speed Test shows three levels of cache: L1, L2, L3. Core 2 chips don't have L3, nor does that chip have 6 MB of cache in any configuration.

CPU-Z correctly identifies the two levels of cache: L1 32KB+32KB, and L2 4MB.

Reply 16 of 20, by archsan

Rank Oldbie

Old Thrashbarg wrote:

Except for the fact that Core2 chips don't have L3 cache.

I take the word "properly" back, but I was paying attention to their relative speeds (~13000 MB/s vs. ~8000 MB/s), i.e., the Core 2's L2 is even faster than the OP's P4's "L1" cache (as reported by SpeedSys), and the Core 2's L1 is much faster still.

I guess it has more to do with the different architectures of the CPUs (and their caches), and that SpeedSys' cache benchmark only properly recognizes caches on CPU architectures up to the Athlon/P3 class.

After all, it's no big deal; it's just one of many benchmarks. To me, it only encourages choosing a P3 over a P4 for a retro machine: it is efficient not only in energy used but also in terms of real processing.

To be fairer, how about a real-world retro benchmark: Quake or System Shock 1 in real DOS, at the highest resolution available? I have a P4 2.4C, so I can run tests from 1.2 GHz (@100FSB), never tested so not guaranteed, up to at least 3 GHz (@250FSB). That would go up against a Coppermine at 1.2 and a Tualatin at 1.4 (not that much of a difference, so a single Tualatin would do), but I don't have a P3 system over 1 GHz atm.

At least that is something I'm more interested in...
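
The clock range quoted above for the 2.4C follows from core clock = bus clock × multiplier. A quick sketch, assuming the 2.4C's locked 12x multiplier (2400 MHz at its stock 200 MHz bus clock); the names are made up for illustration, and whether a given chip is stable at the high end is a separate question.

```python
# Assumption: the P4 2.4C runs a fixed 12x multiplier (2400 MHz / 200 MHz).
MULTIPLIER_2_4C = 12


def core_clock_mhz(bus_clock_mhz, multiplier=MULTIPLIER_2_4C):
    """Core clock in MHz for a given (non-quad-pumped) bus clock."""
    return bus_clock_mhz * multiplier


for bus in (100, 200, 250):
    print(f"{bus} MHz bus -> {core_clock_mhz(bus)} MHz core")
```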

Reply 18 of 20, by QBiN

Rank Oldbie

This is a fairly surprising result. Although the Tualatin-core P-IIIs used an arguably superior architecture to the NetBurst P-4, the P-4's brute-force advantage in FSB and memory speed (PC-133 vs. DDR-400) should have let it edge out the Tualatin.

It's possible that the BIOS settings on each machine skewed the results. Nonetheless, it goes to show the legs that the i815 B-stepping chipset plus Tualatin CPUs had.

I still have a 1.4 GHz PIII-S, and it's plenty fast and responsive (and a great DirectX 7 and 8 rig, to boot!)

I've posted results from my PIII-S Tualatin 1.4 and P4C Northwood 3.4 for comparison.

Attachments

  • P3S14GHZ.jpg (60.4 KiB): PIII-S Tualatin 1.4GHz w/ i815B Chipset
  • P4C34GHZ.jpg (61.55 KiB): Pentium4 Northwood 3.4GHz w/ i865PE Chipset
Last edited by QBiN on 2009-07-25, 04:33. Edited 1 time in total.

Reply 19 of 20, by prophase_j

Rank Member

A main shortcoming of Netburst is the length of its pipeline. Modern processors use a technique called branch prediction to run efficiently. However, the longer pipeline of the P4 (more than 20 stages, compared to 13 in the P3) means there is a much higher penalty when the prediction fails. Intel developed an advanced prefetch engine to try to deal with the penalty of branch mispredictions, but the end result is that the P4 can really eat up bandwidth. If you compare the same stepping and clock speed but make more bandwidth available, there is always a marked increase in performance. When the P4 first came out, its chipsets supported PC133 and RAMBUS. Later they moved to things like DDR and dual-channel RAMBUS 1066, eventually peaking with the i875 chipset, featuring dual-channel DDR400 and a unique (although not very significant) feature called Performance Acceleration Technology.
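
The pipeline-length point can be put into a back-of-the-envelope model. This is only a sketch under loudly stated assumptions: the misprediction penalty is taken to be roughly the pipeline depth (20 vs. 13 stages, the figures from this post), and the branch frequency, misprediction rate, and base CPI are illustrative guesses, not measurements of either chip.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are added."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Illustrative assumptions, identical for both chips:
BASE_CPI = 1.0          # cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.20  # one instruction in five is a branch
MISPREDICT_RATE = 0.05  # 5% of branches are mispredicted

# Penalty approximated by pipeline depth (figures from the post above).
p4_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
p3_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 13)

print(f"P4-like pipeline: {p4_cpi:.2f} cycles/instruction")
print(f"P3-like pipeline: {p3_cpi:.2f} cycles/instruction")
```

With the same prediction accuracy, the deeper pipeline pays more cycles per miss, so code with many hard-to-predict branches hurts the P4 disproportionately.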

"Retro Rocket"
Athlon XP-M 2200+ // Epox 8KTA3
Radeon 9800xt // Voodoo2 SLI
Diamond MX300 // SB AWE64 Gold