VOGONS


Exploring the performance of the Matrox G550


First post, by vvbee

Rank Oldbie

In Direct3D performance, the G550 is slightly slower than the G400. But in OpenGL, it's generally much faster, despite sharing the G400's OpenGL ICD. Why is this?

Is the G550's pseudo hardware T&L exposed in OpenGL? As far as I know, if a game uses the built-in GL transform and/or lighting functions, the driver will do that work in hardware when the hardware supports it.

Here are a few benchmarks comparing the G400 and the G550 in some OpenGL games across four systems: a Pentium 133, a K6-2 300, an Athlon 64 at 800 MHz, and the same Athlon 64 at 2400 MHz. The last "game" is an OpenGL program I wrote that rotates a model on screen and computes all T&L on the CPU, without using the GL functions. Quake is GLQuake. Everything runs at 800 x 600 in 16-bit color, except the last test, which is 32-bit.

The attachment poerjh.png is no longer available

In Quake 1 and 2, the G550 scales tangibly further than the G400, while in Homeworld and the test app the scaling is generally even or slightly in favor of the G400.

From what I've heard, Quake 1 and 2 should be able to take advantage of hardware transforms when available. I don't know about Homeworld; looking at its code, it at least has a CPU wrapper for the OpenGL T&L functions, but I'm not sure whether in OpenGL mode the wrapper gets fully bypassed in favor of native GL functions.

So, it seems possible that the G550 is doing some transforms in hardware. That said, the numbers don't fully add up that way: I'd at least expect CPU transforms to outperform the G550's hardware at the higher clock speeds.

Last edited by vvbee on 2024-07-27, 00:57. Edited 1 time in total.

Reply 1 of 75, by bakemono

Rank Oldbie

Does your G400 run at AGP 4X? According to Wikipedia, some of them only did AGP 2X. Maybe that could explain the difference on the Athlon 64 system.

Or maybe you could try using vertex buffer objects in the OpenGL program and see if that gives the G550 an edge. (Or I could send you a program that I made. I tried to see how many triangles I could push on a Mobility Radeon 7500, which oddly enough can do 13 million in 3DMark01 but only 4.5 million in OpenGL. I wonder if 3DMark01 uses strips or fans...)

GBAJAM 2024 submission on itch: https://90soft90.itch.io/wreckage

Reply 2 of 75, by vvbee

Rank Oldbie

Both cards are running AGP 4X, or that's what PowerStrip says.

I ran some OpenGL benchmarks on the G400, clocking the Athlon 64 at 800, 1600, and 2400 MHz (its stock clock is 2600):

The attachment greoij.png is no longer available

There appear to be two scaling patterns: (1) scaling continuously through 2400 MHz, and (2) topping out at around 1600 MHz. The games that top out are Quake 2, MOHAA (Medal of Honor: Allied Assault), and Need for Speed 5 (using Verok's OpenGL patch). The games that keep scaling are Unreal Tournament, Bugdom, and 4x4 Evolution.

This is interesting because Quake 2 (and probably MOHAA too) is known to use OpenGL-native transforms, while Unreal Tournament is known not to use them. In other words, the games that could be limited by hardware T&L speed are being limited by something, while the games that do T&L on the CPU continue to scale with CPU speed.

One possibility is that the G400 does T&L in hardware, in OpenGL at least. It's plausible: it was speculated back then that the card's Warp Engine, whatever that was, might have been doing this. It might also explain why it shares an OpenGL ICD with the G550.

Another possibility is that the games that keep scaling do so because they rolled their own T/L instead of using built-in functionality and ended up being less efficient.

Reply 3 of 75, by Putas

Rank Oldbie

This is quite interesting. I wonder whether the G450 behaves the same way. Maybe the implementation took so long that Matrox didn't bother to publicize it and only announced it for the G550.

Reply 4 of 75, by swaaye

Rank l33t++

The G400 T&L rumor was a speculative news post based on its triangle setup engine being somewhat programmable.
https://web.archive.org/web/20040110103309/ht … .pl/gbm/matrox/

There was a prevailing belief back then that the G400 needed a fast CPU to perform competitively. That might have been down to higher driver CPU overhead than the competition's. Though "fast" then meant something like a P3 1000, since this was mostly 1999 and early 2000.

Last edited by swaaye on 2024-07-21, 12:58. Edited 3 times in total.

Reply 5 of 75, by The Serpent Rider

Rank l33t++

Unreal Tournament is a CPU-intensive game, so it's hardly a good benchmark for GPUs. Also, Matrox's Direct3D driver is notoriously sluggish with overhead.
As for Quake 2 and Medal of Honor, to me it looks more like a fillrate limit than anything else.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 6 of 75, by vvbee

Rank Oldbie

Being CPU intensive is the point of doing T&L on the CPU. And fillrate is an abstraction: ultimately it depends on how fast the triangles come in, which depends on how fast they're transformed and lit.
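To put it another way, here's a simple bottleneck model (my own back-of-the-envelope sketch; the function and all numbers are illustrative, not measurements):

```c
/* Simplified frame-rate model: whichever stage saturates first - pixel
   fill or triangle setup/transform - caps the frame rate. If triangles
   arrive slowly (slow T&L), the theoretical fillrate never matters. */
double fps_limit(double fillrate_px_per_s, double px_per_frame,
                 double tri_rate_per_s, double tris_per_frame)
{
    double fill_fps = fillrate_px_per_s / px_per_frame;
    double tri_fps  = tri_rate_per_s / tris_per_frame;
    return fill_fps < tri_fps ? fill_fps : tri_fps;
}
```

Feed it a slow triangle rate and the fillrate term never gets a chance to matter: "fillrate-limited" and "T&L-limited" are two ends of the same pipeline.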

Given how OpenGL hides the pipeline, I'm not sure it's easy to tell whether a card is doing T&L in hardware. Hardware would be faster than software up to a point, but the same would be true if one software T&L implementation were simply more efficient than another. So how do you tell them apart? Not easily, it seems.

If the G550 can do some aspects of T&L in hardware and does so better than the G400, you'd expect the G550 to be faster at a given CPU speed until you cross a threshold going up. But the results here aren't showing that. Instead, the G550 simply scales further than the G400 in certain cases, apparently specifically in games that use OpenGL's built-in T and/or L, and potentially more so the lower the polycount - while using the same OpenGL ICD. What can you conclude from that?

Reply 7 of 75, by The Serpent Rider

Rank l33t++

G550 has two texture mapping units per pixel pipeline. So of course it can scale further.
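Roughly, the arithmetic (a simplified sketch; `passes_needed` is just my illustration, not anything from the drivers): a card with one TMU per pipeline needs two passes for a lightmapped surface (base texture + lightmap), while two TMUs fold that into one pass.

```c
/* Rendering passes needed for a surface with `textures` texture layers
   on hardware with `tmus` texture units per pipeline (ceiling division).
   Illustrative only: 2 layers / 1 TMU = 2 passes; 2 layers / 2 TMUs = 1. */
unsigned passes_needed(unsigned textures, unsigned tmus)
{
    return (textures + tmus - 1) / tmus;
}
```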


Reply 8 of 75, by vvbee

Rank Oldbie

Could be, but it also has a T&L engine, and that raises the question: with unlimited CPU power, why does a second TMU make a bigger difference in id tech 1 than in id tech 2, and in id tech 2 than in id tech 3? Why does it appear to do nothing in Homeworld, Need for Speed 5, etc.?

How's this:

The attachment kifrde.png is no longer available

The game is CPU-limited at 800 x 600 in 16-bit. Overclocking the TNT2 does nothing; the CPU isn't processing fast enough. But the GF2 and GF4, with their hardware T&L, take a load off the CPU and allow for more frames, suggesting the others don't have hardware T&L. Or maybe all of them transform in hardware and the GF2 and GF4 are just better at it. My lower-tier cards aren't available, so that's the most I can test.

Reply 9 of 75, by vvbee

Rank Oldbie

Results for the OpenGL GPUBench, swarm.cz/gpubench/ (the f640low profile, or whatever it was called, 16-bit, showing FPS):

The attachment uiabne.png is no longer available

So multitexturing is the only difference between the G400 and G550 in rasterization performance.

Running GLQuake without multitexturing confirms it:

The attachment jknbfdoi.png is no longer available

So at this point I think it can be said there's no hardware T&L in the G550 that the G400 doesn't also have (per the test against the GF2 and GF4), and that the extra TMU of the G550 lets it run faster in some games, though not in an entirely logical pattern.

This leaves open the question of why the multitexturing benefit tapers off from id tech 1 to 2 to 3, and in general appears to be a non-factor in games where it presumably should be a factor. The cards appear to have the same rasterization throughput otherwise, and despite the halved bus width on the G550 they have the same memory bandwidth.

There's also the question of why the G550 performs worse relative to the G400 in Direct3D, or vice versa why it's so strong in OpenGL. If you look at Phil's Matrox roundup, the G550 beats the G400 MAX in GL but loses to the regular G400 in D3D. The results are basically repeated in reviews from the early 2000s. This could be due to the selection of games though, as OpenGL tests skew toward id tech.

Reply 10 of 75, by NostalgicAslinger

Rank Member

Matrox never released a G550 MAX, so have you tried overclocking the G550 to see if it can reach G400 performance in Direct3D? That would be interesting.

The "Matrox Technical Support Tweak Utility" is the tool you need: https://www.philscomputerlab.com/drivers-for-matrox.html

Reply 11 of 75, by vvbee

Rank Oldbie

Not planning on overclocking. I did notice a forum post from around 1999 that mentioned the tweak utility being able to overclock the Warp Engine specifically, but that feature was apparently pulled and wasn't in the two versions I tried. Either that or I had the wrong GPU in. It would have been interesting to see which aspects of performance it affected.

Reply 13 of 75, by The Serpent Rider

Rank l33t++

Could be an issue with the Direct3D DLL not utilizing multitexturing on the G550, possibly due to the unified driver for the G400/G450/G550 family of chips?


Reply 14 of 75, by vvbee

Rank Oldbie

Need to test a bunch more Direct3D stuff then.

I used the MGATweak tool to modify the Warp clock on the G400. No effect on FPS in GLQuake in the CPU-limited scenario of the K6-2 300, so it doesn't look like T&L. That doesn't prove it isn't, but it doesn't look to be.

In a CPU-unlimited scenario, downclocking the G400 Warp Engine reduces the triangle rate:

The attachment hkiju.png is no longer available

Looks like a triangle setup engine, as expected.

Reply 15 of 75, by vvbee

Rank Oldbie

The Direct3D situation looks like this in 3DMark2000:

The attachment kolsgn.png is no longer available

Most obviously, the G550's multitexturing does work in Direct3D, at least here. Despite the cards having the same memory bandwidth, the G400's single-texturing performance is higher; I'm not sure by what mechanism, since the 20% difference seems large. The high-polycount and game tests seem to be saying the G400 has an edge in triangle strip performance: the first game heavily features a terrain mesh while the second one doesn't, and the high-poly torus model looks fairly decimatable as well. Since the relative speed between the two cards grows as the complexity of the first game's terrain mesh increases, I wonder whether the G550 is failing to share vertices in strips.
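For scale on the vertex-sharing idea (my own arithmetic, not measured data): a strip of n triangles needs only n + 2 vertices if sharing works, versus 3n if each triangle is effectively submitted on its own, so a card that fails to share would push nearly three times the vertex data on dense meshes.

```c
/* Vertices a card must fetch and transform for n triangles, depending
   on whether strip vertex sharing works. Illustration only. */
unsigned verts_independent(unsigned n) { return 3u * n; }
unsigned verts_strip(unsigned n)       { return n ? n + 2u : 0u; }
```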

Here's the same tests with the G400's Warp Engine, aka. triangle setup engine, downclocked:

The attachment nuikbg.png is no longer available

The G550 is now faster at rendering the high-poly test, but still slower at the first game, as if the amount of vertex data were asymmetric. According to GPUBench from earlier, triangle strip performance is identical between the G550 and the G400 in OpenGL, but maybe this is an issue in Direct3D.

So, the working assumption would be that in Direct3D the G550 is at a disadvantage when there are larger textures (or more of them, or something along those lines) and/or dense triangle strips, especially so if not much multitexturing is used. If some of this transfers to OpenGL, it might explain why the G550 is so fast in GLQuake: the textures and meshes are simple and there are plenty of multitextured lightmaps. It would also explain why the speed advantage shrinks from id tech 1 to 2 to 3.

Reply 16 of 75, by bakemono

Rank Oldbie

Doesn't the G550 use 64-bit DDR like the G450, unlike the G400, which uses 128-bit SGRAM? I'd expect better fillrate from the lower latency of SGRAM versus DDR.


Reply 17 of 75, by vvbee

Rank Oldbie

Problem is, the benchmarks here show no difference in fillrate, so it can't be write latency anyway. I'd assume read latency would come into play in the textured fillrate tests, but those results are more or less identical as well.

Reply 18 of 75, by The Serpent Rider

Rank l33t++

G550 is always DDR SGRAM and some G450 variants are also DDR SGRAM.


Reply 19 of 75, by vvbee

Rank Oldbie

With more Direct3D games tested (800 x 600, 16-bit), the relative performance between the G400 and the G550 looks like this:

The attachment ftryoig.png is no longer available

MT Madness 2 is Monster Truck Madness 2.

So, there are some aspects of the G550 that make it faster (multitexturing) and some that make it slower (unknown). You could pick half of these games and it'd look faster in Direct3D; pick the other half and it'd look slower.

As far as explaining the results: Thief 1 and 2 are similar to Quake 1 and 2 - lightmaps and relatively simple geometry - presumably favoring the G550. I assume Rally Trophy is showing the benefit of single-pass multitexturing as well, but the difference is suspiciously big. Not too sure about the rest. Gothic, RealMYST, and Trainz have terrain similar to game #1 in 3DMark2000, where the G550 was handicapped, but so does Rally Trophy. What connects Formula One 97 and Homeworld?

There's no clear correlation between Direct3D version and the results for the G550, ditto for year of release.