VOGONS


The All-amazing super-de-douper SuperPi thread!

Reply 320 of 330, by underjack

Rank Newbie
Tommaso72 wrote on 2020-06-05, 07:01:
I have a question about Multicore SuperPi. I ran it on a P4 Northwood 3.06 GHz CPU and set it to 32 with two threads for the hyperthreading. It finished in 72 seconds. Then I ran it with one thread and got 111 seconds. This seems normal to me, but then I wanted to compare my P4 Prescott 3.2 GHz with hyperthreading and got the same time as the Northwood, 72 seconds with 2 threads. What confuses me is that I did it again with one thread and it finished in 52 seconds. Why would it be faster with one thread? I thought HT would boost it a bit like it did with the Northwood CPU.

The Prescott system is the newer model with 2 MB of cache, and it is currently running with one 1 GB RAM stick, while the Northwood had 2 sticks of 512 MB each in dual channel. Maybe the longer pipeline of the Prescott doesn't like the single-channel RAM when using 2 threads?

Hope I wrote this clearly enough to make sense, I worked all night and I smoked a fatty. Hope someone can shed some light on the subject, thanks in advance.

Tommaso

A hyperthreaded CPU isn't truly multicore. What's going on is you have one set of execution hardware, and two sets of registers and control hardware, so that two threads of execution can share the ALU. The idea is that when one thread isn't using one or more of the execution units, the other can.

So your speedup will never be 100%, and in this case, since SuperPi has pretty good utilization, it will be much less, since the threads will do a lot of waiting for the other thread to free the ALU resources.

(A CPU is a guy with a calculator, some scratchpads, and a list of instructions. A HT CPU is two guys each with their own instructions and scratchpads, but only one calculator that they have to share.)
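
To put numbers on it, here is a quick sketch using the times from the quoted post. It assumes the one-thread and two-thread runs do the same total amount of work, which I haven't verified for Multicore SuperPi; the `speedup` helper is just a name made up for this example.

```python
def speedup(single_thread_s, two_thread_s):
    """Return how many times faster the two-thread run was.

    A value above 1 means HT helped; below 1 means HT hurt.
    Assumes both runs computed the same amount of work.
    """
    return single_thread_s / two_thread_s


# Times in seconds, taken from the quoted post.
northwood = speedup(111, 72)  # P4 Northwood 3.06 GHz
prescott = speedup(52, 72)    # P4 Prescott 3.2 GHz

print(f"Northwood: {northwood:.2f}x with two threads")
print(f"Prescott:  {prescott:.2f}x with two threads")
```

Anything above 1.0 is a gain from the second thread, and it tops out well short of 2.0 on a single HT core. A result below 1.0, like the Prescott's, means the two threads got in each other's way (or something else slowed the run down).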

Reply 321 of 330, by Tommaso72

Rank Newbie

I understand what you're saying and appreciate your response, but I have found with the multiple systems I have checked that most often HT increases the times by 50 percent, but 2 real threads most often increase them by 100. The comparison with the 3.06 GHz Northwood was like I described, and the P4 640 Prescott actually decreased with HT on by 20 seconds. Why the difference between the Northwood and the Prescott? I thought HT should usually increase the times, even slightly, but something at the least. I find it strange that with HT on both times are similar, with the Prescott being faster in MHz, having faster DDR2 RAM as opposed to the RAM of the Northwood, way more cache, and being much newer. Something seems off, probably something I am doing. I might not have fully comprehended what you wrote, happens occasionally 🤣.

Tommaso

Reply 322 of 330, by underjack

Rank Newbie
Tommaso72 wrote on 2020-06-05, 17:27:

I understand what you're saying and appreciate your response, but I have found with the multiple systems I have checked that most often HT increases the times by 50 percent, but 2 real threads most often increase them by 100. The comparison with the 3.06 GHz Northwood was like I described, and the P4 640 Prescott actually decreased with HT on by 20 seconds. Why the difference between the Northwood and the Prescott? I thought HT should usually increase the times, even slightly, but something at the least. I find it strange that with HT on both times are similar, with the Prescott being faster in MHz, having faster DDR2 RAM as opposed to the RAM of the Northwood, way more cache, and being much newer. Something seems off, probably something I am doing. I might not have fully comprehended what you wrote, happens occasionally 🤣.

Tommaso

I know that the Prescott is a significantly different beast than earlier Pentium 4s. It was the penultimate NetBurst, before Intel went back to the P-Pro roots with the Pentium M and Core.

I'm not sure if the architectural differences are to blame, but two things that come to mind are that Prescott has a 31-stage pipeline (vs. 20 in Northwood), and that Prescotts were running much closer to their thermal limits than earlier P4s. The thermal issues are particularly interesting, because SuperPi is practically a synthetic benchmark in its ability to saturate a processor. Hitting both thread paths in a Prescott that heavily, if your cooling isn't up to snuff, may overstress the processor and cause it to thermal throttle, erasing any benefit of multi-threading with room to spare.
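
One rough way to test the throttling theory is to run the same benchmark several times back to back and see whether later runs get slower as the chip heats up. A minimal sketch, with the caveat that `flag_slowdowns`, the 10% tolerance, and the example times are all made up for illustration and are not measurements from either system:

```python
def flag_slowdowns(run_times_s, tolerance=0.10):
    """Return the indexes of runs slower than the first run by more than `tolerance`.

    The first (coolest) run is used as the baseline. The 10% default
    is an arbitrary assumption, not a known throttling threshold.
    """
    baseline = run_times_s[0]
    limit = baseline * (1 + tolerance)
    return [i for i, t in enumerate(run_times_s) if t > limit]


# Hypothetical back-to-back times in seconds, NOT real results.
example_runs = [50.0, 50.5, 58.0, 66.0]
print(flag_slowdowns(example_runs))
```

If times creep up run after run and recover once the machine has idled for a while, heat is a likely suspect. Steady times would point back at the architecture or the memory setup.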

An interesting idea is to try to find a similar Cedar Mill chip. A Pentium 4 HT 641 is basically your 640 only built on 65 nm instead of 90 nm. They were known to run much cooler than Prescotts...

In any event, edge cases of workloads that run worse overall with HT (or other SMT schemes) are not a mystery. Chip designers have known about them from the beginning, and I believe modern OSes will actually adjust scheduling to mitigate them when they can.

Last edited by underjack on 2020-06-20, 03:41. Edited 1 time in total.

Reply 323 of 330, by underjack

Rank Newbie

I don't know if it's been done, but I just replaced my i5-4460 with an i7-4790K (fastest proc that'll fit in my board).

i7-4790K at stock speed (4.0 GHz boosting to 4.4), Maximus VII Gene, 16 GB DDR3-1866

8.594s

(I don't think I ever posted the speed from the 4460, oh well. Also, I'm using the stock cooler, but a Hyper 212 Evo is on its way from NewEgg. Once it's installed, I might do a mild overclock. I've seen an i9-9900 on here, and it's only about 1 second faster... I wonder how close I can get, and whether it'll match that speed if I get it to 5.0 GHz)

Reply 324 of 330, by Tommaso72

Rank Newbie
underjack wrote on 2020-06-19, 22:06:
I don't know if it's been done, but I just replaced my i5-4460 with an i7-4790K (fastest proc that'll fit in my board).

i7-4790K at stock speed (4.0 GHz boosting to 4.4), Maximus VII Gene, 16 GB DDR3-1866

8.594s

(I don't think I ever posted the speed from the 4460, oh well. Also, I'm using the stock cooler, but a Hyper 212 Evo is on its way from NewEgg. Once it's installed, I might do a mild overclock. I've seen an i9-9900 on here, and it's only about 1 second faster... I wonder how close I can get, and whether it'll match that speed if I get it to 5.0 GHz)

Thanks for the response! I believe you have a good point, thermal throttling could be the reason, I never thought of that. I'm going to look into it and see. I will report back with my findings.

Tommaso

Reply 325 of 330, by overdrive333

Rank Newbie

Celeron Tualatin 1300 MHz @ 133 MHz FSB = 1733 MHz
128 MB 2-2-2-5/7 RAM
Asus CUSL2-C
Windows XP

1 m 32.923 sec
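
For anyone wondering where the 1733 MHz comes from, here is the arithmetic as a small sketch. The 100 MHz stock FSB is my assumption about this Celeron (it is not stated above), and "133 MHz" is taken as the nominal 133.33 MHz.

```python
stock_mhz = 1300
stock_fsb_mhz = 100      # assumption: stock FSB of the Tualatin Celeron 1300
new_fsb_mhz = 400 / 3    # the "133 MHz" FSB, nominally 133.33 MHz

# The multiplier is locked, so the core clock scales with the FSB.
multiplier = stock_mhz / stock_fsb_mhz
overclocked_mhz = multiplier * new_fsb_mhz

print(f"{multiplier:.0f} x {new_fsb_mhz:.2f} MHz = {overclocked_mhz:.0f} MHz")
```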

Reply 326 of 330, by Warlord

Rank l33t
overdrive333 wrote on 2021-03-11, 17:53:
Celeron Tualatin 1300 MHz @ 133 MHz FSB = 1733 MHz
128 MB 2-2-2-5/7 RAM
Asus CUSL2-C
Windows XP

1 m 32.923 sec

Good performance.

Pentium M 1.82 @ 2.4 GHz, CT-479, Asus 865 chipset, 2 GB RAM.
file.php?id=103921&mode=view

Reply 327 of 330, by Shagittarius

Rank Oldbie

i9-12900K stock, w/ 5600 MHz DDR5

The attachment i12900k_5600RAM_Stock.jpg is no longer available

Reply 328 of 330, by serenitatis

Rank Newbie

Here are my results from these PCs.

The attachment tusl.png is no longer available
The attachment p1.png is no longer available
The attachment p4.png is no longer available
The attachment p2b.png is no longer available

Just for fun, my modern PC with a Core i7 6700K.

The attachment modern.png is no longer available

Reply 329 of 330, by GemCookie

Rank Member

MSI MS-5169 VER:2.1, K6-2 350, 128/384 MiB PC100 CL3
Detonator 7.76
Windows 95, 128 MiB: 0h 09m 58.390s
Windows 95, 384 MiB: 0h 13m 06.141s
Detonator 21.81
Windows NT 4.0, 128 MiB: 0h 08m 57.222s
Windows NT 4.0, 384 MiB: 0h 09m 05.324s
Detonator 7.97
Windows 2000, 128 MiB: 0h 08m 57.433s
Windows 2000, 384 MiB: 0h 11m 37.773s
Windows XP, 128 MiB: 0h 08m 42.401s
Windows XP, 384 MiB: 0h 11m 56.330s

Fujitsu Siemens D1215, Pentium III "Coppermine" 866, 256 MiB PC133 CL2
Detonator 30.82
Windows 95: 0h 02m 39.466s
Detonator 43.45
Windows NT 4.0: 0h 02m 21.624s
Detonator 7.97
Windows 2000: 0h 02m 19.721s
Detonator 30.82
Windows XP: 0h 02m 19.080s
nouveau
Arch Linux w/ Wine: 0h 02m 59.138s

Just for fun:
Asus Maximus Extreme, Core 2 Quad Q9550, 8 GiB DDR3-1333 CL9
Nvidia 355.98 with pixel clock patch
Windows XP: 0h 00m 16.312s
Nvidia 368.81
Windows Vista: 0h 00m 16.551s
Nvidia 472.12
Windows 11: 0h 00m 17.125s

All systems ran at stock speeds.

Update:
MSI MS-5169 VER:2.1, K6-2/350 @ 380 (4× 95), 128 MiB PC100 CL2
Windows XP: 0h 08m 20.319s
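
To compare these more easily, the times can be converted to seconds. A minimal sketch: `to_seconds` is a helper name invented here, and the dictionary simply copies the stock K6-2 results listed above (128 MiB first, 384 MiB second).

```python
import re


def to_seconds(stamp):
    """Convert a time such as '0h 09m 58.390s' to seconds."""
    m = re.fullmatch(r"(\d+)h (\d+)m (\d+\.\d+)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    hours, minutes, seconds = m.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# K6-2 350 at stock, (128 MiB, 384 MiB) per OS, copied from the list above.
k6_results = {
    "Windows 95": ("0h 09m 58.390s", "0h 13m 06.141s"),
    "Windows NT 4.0": ("0h 08m 57.222s", "0h 09m 05.324s"),
    "Windows 2000": ("0h 08m 57.433s", "0h 11m 37.773s"),
    "Windows XP": ("0h 08m 42.401s", "0h 11m 56.330s"),
}

for os_name, (small, large) in k6_results.items():
    delta = to_seconds(large) - to_seconds(small)
    print(f"{os_name}: {delta:+.1f} s going from 128 to 384 MiB")
```

Every OS comes out slower with 384 MiB on this board, with Windows NT 4.0 showing by far the smallest penalty.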

Last edited by GemCookie on 2024-11-13, 18:30. Edited 1 time in total.

Gigabyte GA-8I915P Duo Pro | P4 530J | GF 6600 | 2GiB | 120G HDD | 2k/Vista/10
MSI MS-5169 | K6-2/350 | TNT2 M64 | 384MiB | 120G HDD | DR-/MS-DOS/NT/2k/XP/OBSD
Dell Precision M6400 | C2D T9600 | FX 2700M | 16GiB | 128G SSD | 2k/Vista/11/Arch/OBSD

Reply 330 of 330, by Tom..

Rank Newbie

Athlon XP Thoroughbred 2200+, SiS 735 ECS K7S5A

2200-K7-S5-A.jpg

2200-K7-S5-A-1-0.jpg