VOGONS


First post, by bakemono

User metadata
Rank Oldbie

The GeForce GT 430/440, based on the GF108 chip, is specified with 96 shaders, 16 texture units, and 4 ROPs. That last part has always stuck out like a sore thumb to me. Why only 4 ROPs? Is it true?

1) 4 ROPs is a downgrade compared to the previous generation GT 240 and even GT 220. It is less than competing Radeon cards like 4670 or 5570/5670.
2) It is totally out of character compared to the rest of the Fermi lineup, which has a low ratio of texture units to ROPs. For instance: 24/24, 32/16, 56/24, 56/32, 60/48. They mostly have a ratio of 2/1 or less. The GF108 looks strange with its ratio of 4/1.
3) GF108 has a 128-bit memory bus, and not only that but it supports GDDR5! What in the world can it do with all that memory bandwidth with only 4 ROPs?
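
To put numbers on point 2, here is a quick sketch that computes texture units per ROP for each pair quoted above. The pairs are copied straight from the post and are not mapped to specific card names; the helper name `tmu_per_rop` is just made up for illustration.

```python
# Texture unit / ROP pairs as quoted in the post for other Fermi parts,
# plus the GF108 figures from the post (16 texture units, 4 ROPs).
fermi_pairs = [(24, 24), (32, 16), (56, 24), (56, 32), (60, 48)]
gf108 = (16, 4)


def tmu_per_rop(tmus, rops):
    """Return the number of texture units per ROP."""
    return tmus / rops


# Ratio for each quoted pair, keyed by the "TMU/ROP" label, rounded to 2 places.
ratios = {f"{t}/{r}": round(tmu_per_rop(t, r), 2) for t, r in fermi_pairs}
gf108_ratio = tmu_per_rop(*gf108)

for label, ratio in ratios.items():
    print(f"{label:>6}: {ratio:.2f} texture units per ROP")
print(f" 16/4 (GF108): {gf108_ratio:.2f} texture units per ROP")
```

Only 56/24 lands slightly above 2/1, so "mostly 2/1 or less" holds for the quoted pairs, and GF108 sits well outside that range.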

Now, let's look at some benchmarks... An old TechPowerUp review has these 3DMark06 results:
Radeon 5450: 3449
Radeon 5570: 7778
GT 430: 8025
GT 440: 9624
Radeon 5670: 10630
GTS 450: 15617

Here are some of my own results:
Radeon 3850: 8300
Radeon 5550 (Turks LE): 4700
GT 220: 7175
GT 240: 9695

Observations:
1) Turks LE has 4x the shaders and 2x the texture units compared to the 5450, plus a wider memory bus. But it is only somewhat faster, due to having only 4 ROPs. It gets crushed by the 5570, which has the full 8 ROPs. But somehow the GT 440 is way ahead with only 4 ROPs??
2) GTS 450 has 2x the shaders and texture units compared to GT 440, and 16 ROPs. But it can't even double the score??
3) GT 240 has double the shaders and texture units compared to GT 220, but comes out only 35% faster. Once again this suggests that ROPs are important. So how could the GT 440 match the GT 240 with only 4 ROPs??
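
For anyone who wants to check the ratios behind these observations, here is a small sketch using only the scores listed above. Keep in mind it mixes the review's numbers with my own runs, which were presumably not done on the same test system, so the cross-source comparisons are rough. The `speedup` helper is just an illustrative name.

```python
# 3DMark06 scores quoted in the thread (review results plus the poster's own runs).
scores = {
    "Radeon 5450": 3449,
    "Radeon 5550 (Turks LE)": 4700,
    "GT 220": 7175,
    "Radeon 5570": 7778,
    "GT 430": 8025,
    "Radeon 3850": 8300,
    "GT 440": 9624,
    "GT 240": 9695,
    "Radeon 5670": 10630,
    "GTS 450": 15617,
}


def speedup(faster, slower):
    """Score ratio between two cards; 1.35 means 35% faster."""
    return scores[faster] / scores[slower]


# The pairs discussed in the observations.
comparisons = [
    ("Radeon 5550 (Turks LE)", "Radeon 5450"),
    ("Radeon 5570", "Radeon 5550 (Turks LE)"),
    ("GTS 450", "GT 440"),
    ("GT 240", "GT 220"),
]

for a, b in comparisons:
    print(f"{a} vs {b}: {speedup(a, b):.2f}x")
```

The GT 240 vs GT 220 line is where the "only 35% faster" figure comes from, and the GTS 450 vs GT 440 line shows the score falling well short of doubling.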

You might say that if the GT 440 had 8 ROPs it could beat the GT 240 because of its higher clock speed. But in that case, I think the texture units would be holding it back instead. 3DMark06 does not seem to be very shader-heavy, but it benefits from having 3x as many texture units as ROPs. For instance, the Radeon 3850 score is not so hot, due to only 16 texture units. The 5570 nearly matches it by having 20, and the GT 240 is ahead with 32 texture units. But cards with 4x texture units per ROP are not that much better than those with 2x (compare Turks LE to 5450, or GT 240 to GT 220). 3x is the sweet spot.

What do you think?

again another retro game on itch: https://90soft90.itch.io/shmup-salad

Reply 1 of 3, by BitWrangler

User metadata
Rank l33t++

Don't look too long at 440s, your head will explode, there's a GF106 higher spec variant and another dumbed down variant... and it's hard to tell which one you're seeing benchmarks for.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 2 of 3, by bakemono

User metadata
Rank Oldbie

The plot thickens!

Recently I got a GTS 450. It's an EVGA model with default clocks of 882/950. However, the actual memory installed on the card is rated for 1250MHz! I could take the memory clock up to 1150MHz (1200 would crash) and run benchmarks, but it made hardly any difference, only around 2%. The card must not be limited by memory bandwidth, as even the single-texturing fill rate hardly budged. Speaking of which, it looked oddly low at 6400 megapixels/s...
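
As a rough sanity check on the "not limited by memory bandwidth" conclusion, this sketch compares the size of the memory overclock with the benchmark gain. It assumes 950 is the stock memory clock in MHz and treats "around 2%" as exactly 2.0; the variable names are only for illustration.

```python
# Clocks from the post, in MHz. Assumption: the second number in "882/950"
# is the stock memory clock.
stock_mem_mhz = 950
oc_mem_mhz = 1150

# "Around 2%" per the post; taken as exactly 2.0 for this estimate.
observed_gain_pct = 2.0

# How much extra memory clock (and thus bandwidth) the overclock provides.
mem_increase_pct = (oc_mem_mhz / stock_mem_mhz - 1) * 100

# Fraction of the extra bandwidth that showed up as benchmark gain.
scaling = observed_gain_pct / mem_increase_pct

print(f"Memory clock increase: {mem_increase_pct:.1f}%")
print(f"Benchmark gain per 1% of memory clock: {scaling:.2f}%")
```

A memory-bound card would scale much closer to 1:1, so seeing less than a tenth of the extra bandwidth turn into performance fits the conclusion that the bottleneck is elsewhere.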

I found these threads:
https://www.techpowerup.com/forums/threads/pi … d-fermi.155459/
https://www.techpowerup.com/forums/threads/gts450.223854/

Apparently the Fermi architecture is only capable of outputting two pixels per SM per clock. More information on that here:
http://www.ece.lsu.edu/gp/refs/gf100-whitepaper.pdf
https://www.anandtech.com/show/3809/nvidias-g … -the-200-king/2

So the GT 430/440 aren't really out of place for having 4 ROPs, because with two SMs they may not need more than that. It's the other cards in the Fermi lineup that seem to have a disproportionate number of ROPs and very generous memory bandwidth.
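
Here is a back-of-the-envelope sketch of that reasoning. It takes the "two pixels per SM per clock" figure at face value (that is my reading of the links, not a confirmed spec), assumes the GTS 450 has four SMs since it has twice the shaders of the two-SM GT 440, and assumes 882 is the core clock in MHz. The function names are made up for illustration.

```python
# Assumption taken from the post, not a verified specification.
PIXELS_PER_SM_PER_CLOCK = 2


def sm_limited_pixels_per_clock(sm_count):
    """Pixels per clock if output is capped by the SMs rather than the ROPs."""
    return sm_count * PIXELS_PER_SM_PER_CLOCK


def fill_rate_mpix(pixels_per_clock, core_mhz):
    """Theoretical fill rate in megapixels/s."""
    return pixels_per_clock * core_mhz


# GT 430/440: two SMs, 4 ROPs.
gf108_ppc = sm_limited_pixels_per_clock(2)

# GTS 450: assumed four SMs, 16 ROPs, 882 MHz core clock.
gts450_ppc = sm_limited_pixels_per_clock(4)
gts450_sm_limit = fill_rate_mpix(gts450_ppc, 882)
gts450_rop_limit = fill_rate_mpix(16, 882)

measured_mpix = 6400  # single-texturing fill rate reported above

print(f"GT 430/440 SM-limited output: {gf108_ppc} pixels/clock (4 ROPs)")
print(f"GTS 450 SM-limited fill rate: {gts450_sm_limit} Mpix/s")
print(f"GTS 450 ROP-limited fill rate: {gts450_rop_limit} Mpix/s")
print(f"Measured / SM-limited: {measured_mpix / gts450_sm_limit:.2f}")
```

Under these assumptions the measured 6400 megapixels/s is about 90% of the SM-limited figure and less than half of what 16 ROPs could do on paper, which is at least consistent with the per-SM limit.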

It still seems weird that Turks LE gets such a low score in 3DMark06, not even half that of a 5670. Maybe Catalyst 13.4 is a poorly performing driver, or these chips are crippled somehow beyond what is obvious from the specs.


Reply 3 of 3, by Putas

User metadata
Rank Oldbie
bakemono wrote on 2021-10-21, 13:52:

Apparently the Fermi architecture is only capable of outputting two pixels per SM per clock. More information on that here:
http://www.ece.lsu.edu/gp/refs/gf100-whitepaper.pdf
https://www.anandtech.com/show/3809/nvidias-g … -the-200-king/2

Apparently not. Fermi has rasterizers per GPC and decoupled ROPs.