VOGONS


Exploring the performance of the Matrox G550


Reply 40 of 75, by swaaye

User metadata
Rank l33t++
vvbee wrote on 2024-07-30, 18:44:

Haven't tried, but PowerStrip's OpenGL vsync toggle doesn't seem to have an effect on the G400 at least.

Yeah, I was playing some games on a G400 a year or so ago and couldn't get vsync to work in OpenGL games, with either the full ICD or TurboGL. It appears they never supported the extension for vsync. There was also an old post on MURC about that, which I can't find now because MURC's search is broken.

Reply 41 of 75, by DosFreak

User metadata
Rank l33t++

Does the Matrox Technical Support Tweak Utility have any options? I haven't used a G400 since 2001/2002.
IIRC, on my 2x P3 933 (in 2001/2002) performance was low, so the refresh rate likely exceeded the fps in 3D games on my 21" CRT at probably 85 Hz. The G400 was a 1999 card, so that would have been even more the case in 1999, heh.

Yeah, the driver version game and mixing and matching files were what we had to do back then with the G400 and VIA drivers *shudder*.

I used this combination on my K63-400 w/G400 AGP w/ likely 15" CRT in FEB 2000...

This is ridiculous....
5.50.010
with 5.30.007 G400.DRV, G400.VXD, G400DD32.DLL
and 5.50.010 G400DD.VXD
and TurboGL

If you want a few laughs search MURC for DOSFreak posts.

How To Ask Questions The Smart Way
Make your games work offline

Reply 42 of 75, by vvbee

User metadata
Rank Oldbie

TurboGL would've been deprecated with the 6.x drivers, from 2000 or so onward. I've found the G400 to be the most compatible card of its era, for Direct3D anyway, when the condition is that you have to stick with one driver version. Same should go for the G550. Matrox were more dedicated to long-term support. The Matrox utility doesn't expose vsync control for OpenGL.

Reply 43 of 75, by swaaye

User metadata
Rank l33t++

As with most AGP cards, there were some problems with Super 7 motherboards. I think I remember 5.30 being said to be more compatible with them. I love that hybrid mix of drivers, DosFreak.

According to me, from 2020
Re: Matrox G400/G450 Quake III Performance

I am playing around with an AMD750 Slot A setup at the moment and busted out the G400 Max. It's stable at AGP2x on there, which is nice. I tried the TurboGL with Sin, Quake 2 and Quake 3. It runs quite well and looks good. I couldn't figure out how to enable Vsync though. Games were configured for Vsync but it was still disabled. Even forcing OpenGL Vsync with the Matrox Technical Support Utility did nothing. The full OpenGL ICD in some of the final driver releases does respond to this though.

I noticed something interesting while trying to enable Vsync. The MTSU says the GL ICD is using block transfers by default, but to even have the option of Vsync you must switch it to page flipping. So that gives me the impression that Matrox was avoiding Vsync support in order to maximize their GL performance.

There are a few versions of the MTSU. Maybe some of them don't have OpenGL vsync because it was only exposed in a later driver.

Also, Matrox suggested setting AGP Aperture to 256MB for proper operation.

Reply 44 of 75, by vvbee

User metadata
Rank Oldbie

Ah, I only had a quick look at the utility and didn't see vsync for OpenGL. I've got AGP aperture set to 64 on the testbench. I can do point sampling on some of these results with 256 later.

The G550 vs G400 in Quake 3 Demo (running demo001), 6.83 drivers, Athlon 64 @ 1.6 GHz:

The attachment nhhwb.png is no longer available

The results for the G400 are close to Anand's numbers from back in the day. This is a much faster CPU with different drivers, but who knows. In any case, the performance lead of the G550 relative to the G400 in 16-bit color is stable at ~10% up to just shy of 1600 x 1200; in 32-bit color they're even past 1024 x 768.

Disabling OpenGL extensions in Quake 3 presumably (ideally) takes away the G550's multitexturing advantage:

The attachment nbxvt.png is no longer available

So presumably having lost that advantage, it loses to the G400 progressively more as the resolution increases:

The attachment ljpikee.png is no longer available

Not sure whether the result for 1600 x 1200 is a mistake (could be a brainfart in flooring rather than rounding one of the FPS numbers, which would make up for the hypothetical difference), but in any case I didn't bother waiting for the 32-bit run of it to complete.

So, assuming that disabling all of the extensions doesn't artificially skew the results, which it might do, it seems plausible that what's going on is stronger multitexturing performance more or less fully masking weaker memory performance.

That said, you'd think texturing performance depends on memory performance, so I'm not sure how it could mask a progressively larger gap in memory performance. I'm still of the opinion/guess that if there's a memory performance bottleneck, it depends more on the pattern of access than raw throughput, like performing worse in games that do lots of small reads vs large blocks, or something like that.

Reply 45 of 75, by swaaye

User metadata
Rank l33t++

I guess we are seeing some inefficiency with the G550 communicating with memory over a 64-bit bus.

I found a review of the GF2MX 64-bit DDR that compares it against the 128-bit SDR version at the same clock speeds. I think it shows behavior comparable to the G400 vs G550.
https://web.archive.org/web/20010208185537/ht … geforce2mx.html

Reply 46 of 75, by The Serpent Rider

User metadata
Rank l33t++

The G550 has 2 SGRAM DDR chips, while a normal G400 card has 8 chips (only 4 SGRAM chips are required for a 128-bit bus). So the former is probably lacking bank interleaving.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 47 of 75, by vvbee

User metadata
Rank Oldbie

If it were interleaving, I'd expect 3DMark2000's texture benchmark to show it, but instead the cards have the same relative performance with 8 MB of texture data as they do with 32 MB. The GF2 MX SDR/DDR situation does look similar, though GF4 MX DDR vs GF2 MX SDR would look different, and Matrox had a year or more to optimize the architecture between the G400 and G550 (or G800, as the G550 may have started out as), so maybe there are some unexpected corners.

GLQuake at various resolutions:

The attachment gftryjh.png is no longer available

The G550 at 32 bits is faster than the G400 at 16 bits. At 16 bits the G550 is progressively faster than the G400 as the resolution increases: 130% at 640, 133% at 800, 137% at 1024, and 142% at 1600.

GLQuake without multitexturing (-nomtex):

The attachment gwstjm.png is no longer available

This might be the sort of difference you'd expect between the GF2 MX SDR/128 and DDR/64.

Without multitexturing, at 32-bit, as you increase the resolution, the G550 draws fewer bits per second relative to the G400; but with multitexturing it's the reverse:

The attachment bmpcjn.png is no longer available

The two modes have been normalized at 640 x 480; they're not identical in absolute terms. In any case, if the G550's memory performance causes it to be progressively slower than the G400 as the resolution increases, why does its performance lead over the G400 increase with resolution when single-pass multitexturing is enabled?

When you compare the ratios of 32-bit performance to 16-bit performance with the G550 and G400 in some other games, you get this pattern:

The attachment bnvmsy.png is no longer available

In other words, compared to the G400, the G550 has a bigger performance drop from 16-bit to 32-bit, except that at 1600 x 1200 it's the G400 that drops more. A similar effect was shown in the GF2 review that swaaye linked to.

Based on some more per-resolution results, I'd predict:
- The G550 is about 10-40% faster in games where multitexturing is used.
- Where multitexturing isn't used, the G400 is about 10% faster in OpenGL and about 10-20% faster in Direct3D (the D3D drivers seem to favor the G400).
- In some games the behavior is different, for unknown reasons.

Reply 48 of 75, by vvbee

User metadata
Rank Oldbie

I don't know how TMUs work exactly, but I'm assuming their job is to read from texture memory + write to the framebuffer + in the middle do a bit of UV and filtering computation. So if you have single-pass multitexturing with n textures, it's n reads from texture memory (maybe a few extra for filtering), one write to the framebuffer, and parallel computations for the n samples. Without knowing anything, I'd assume the computations would be more expensive than the reads and writes, and if that's the case, the process would be more influenced by core than memory performance.
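
To make that guess a bit more concrete, here's a minimal sketch of the kind of per-pixel cost model I have in mind. The function name and the relative costs are made up purely for illustration, not anything measured from the hardware:

# Toy per-pixel cost model for single-pass multitexturing (illustrative only).
# t_read / t_map / t_write are made-up relative costs, not measured values.
def pixel_cost(n_textures, parallel_mapping, t_read=1.0, t_map=3.0, t_write=1.0):
    reads = n_textures * t_read                              # one fetch per texture
    maps = (1 if parallel_mapping else n_textures) * t_map   # UV + filtering work
    return reads + maps + t_write                            # single framebuffer write

# If mapping dominates, doing the n samples in parallel saves more time than
# the extra texture reads cost, so the core matters more than memory:
print(pixel_cost(2, parallel_mapping=False))  # 2 reads + 2 mapping pulses + 1 write
print(pixel_cost(2, parallel_mapping=True))   # 2 reads + 1 mapping pulse + 1 write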

I tested underclocking the G400's core, its memory, and both, then running GLQuake with and without multitexturing enabled (relative performance drop in % on the y-axis):

The attachment fscgyr.png is no longer available

The results seem to be saying that with multitexturing enabled (the G400 doesn't do it single-pass, but still gets some performance advantage), underclocking the memory has noticeably less effect on performance than underclocking the core. With multitexturing disabled, memory performance has more effect, though still a bit more tilted toward core performance.

So since the G550 gains performance relative to the G400 the more multitextured pixels there are, would this not suggest that its core is stronger? Mine is clocked at 133 MHz vs. the G400's 126 or whatever, but that's just 5%. So maybe the G550 has optimizations to compute or caching or something?

Reply 49 of 75, by vvbee

User metadata
Rank Oldbie

MGATweak doesn't support the G550, but I used WPCREDIT to feel around for the clock dividers. I didn't get the memory downclocked in the time I had, but the core did go down (verified in PowerStrip), and I used PowerStrip for the core + mem downclock. The results for the G550, presented like the ones for the G400 above:

The attachment wbgfli.png is no longer available

Assuming the clocking is reliable this way, the way I interpret the results is that memory speed is less important when single-pass multitexturing is enabled, which would somewhat go along with my farted-out theory that compute is more important in the TMU process. That said, the relative change between core and memory isn't very big, although it does seem the core is slightly more dominant with multitexturing enabled. This was in 640 x 480 16-bit, so maybe it's hitting some other limit at that low a resolution; I need to test some other resolutions later. In any case, this may suggest that single-pass multitexturing does help mask deficiencies in memory performance.

Reply 50 of 75, by swaaye

User metadata
Rank l33t++

Is that possibly improved texture locality when multitexturing? So it just becomes more efficient?

Maybe you could also run PowerVR Villagemark and see if the chips behave differently.

Reply 51 of 75, by Putas

User metadata
Rank Oldbie
vvbee wrote on 2024-08-02, 06:37:

The results seem to be saying that with multitexturing enabled (the G400 doesn't do it single-pass, but still gets some performance advantage)...

But the G400 does it single-pass, at least for two textures with a bilinear filter.

Reply 52 of 75, by vvbee

User metadata
Rank Oldbie

If you do a Google image search for VillageMark there's a roundup benchmark where the G550 and G400 are fairly even.

Here's a simple model that predicts the G400 multitextured results from above, i.e. fitted to GLQuake. Two pulses of texture reads, two pulses of texture mapping, one pulse of writing to the frame buffer. A texture read pulse takes 3 time units, a texture mapping pulse 10 time units, and a frame buffer write pulse 3 time units. If you underclock the memory by 40%, it's 4.2 time units per read/write pulse, and the card is predicted to run 12 % slower (measured performance is 14% slower). If you underclock the core by 33%, the predicted performance is -23% (measured -27%). If you underclock the memory by 40% and the core by 33%, the predicted performance is -35% (measured -32%).

If you assume the G550 consumes the same time units but needs only one pulse for the two texture mappings, the model predicts the effect of a 17% core underclock as -9% (measured -9%) and the effect of the same on both the core and memory as -17% (measured -14%). It predicts that the G550 is 35% faster than the G400, which is in the ballpark of the GLQuake results, and that the G550 is 32% slower with multitexturing enabled vs. disabled, which was the case.

In Quake 3 the G550 was 10% faster than the G400. The model predicts that if you increase the cost of compute relative to the cost of memory, the G550 gains on the G400, and vice versa, and this is with the two assumed to have identical memory performance. It predicts a 10% edge over the G400 if the cost of memory is 2.6 times the cost of compute. Well, not knowing Quake 3, I'm not sure that's very realistic. On the other hand, if the cost of memory is 8 vs 10 for compute, which I suppose it could be if there are lots of extra reads per pixel, and if the G550 performs 20% worse in this memory workload, which it might or might not, then it's predicted to be 12% faster overall vs. the measured 10%. If we take away the G550's edge in parallel compute, these weights predict it to be 12% slower than the G400 in Quake 3 when multitexturing is disabled, which is in the ballpark, and 21% slower than its own result with multitexturing enabled, which again is in the ballpark.
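
For what it's worth, a few lines of Python along these lines reproduce the numbers quoted above. The function names are mine, the pulse weights are the fitted guesses from the model rather than hardware specs, and I'm assuming an X% underclock simply stretches the affected pulse times by a factor of (1 + X), with all percentages taken on frame time:

# Pulse model fitted to the GLQuake/Quake 3 results above (guesses, not specs).
def frame_time(t_mem, t_map, single_pass, mem_scale=1.0, core_scale=1.0):
    reads = 2 * t_mem * mem_scale                           # two texture-read pulses
    maps = (1 if single_pass else 2) * t_map * core_scale   # mapping pulses
    write = t_mem * mem_scale                               # one framebuffer-write pulse
    return reads + maps + write

def pct_change(base, other):
    return 100.0 * (other - base) / base

# GLQuake fit: memory pulse = 3, mapping pulse = 10 time units.
g400 = frame_time(3, 10, single_pass=False)   # 29 units
g550 = frame_time(3, 10, single_pass=True)    # 19 units

print(pct_change(g400, frame_time(3, 10, False, mem_scale=1.40)))   # ~12% slower (mem -40%)
print(pct_change(g400, frame_time(3, 10, False, core_scale=1.33)))  # ~23% slower (core -33%)
print(pct_change(g400, frame_time(3, 10, False, 1.40, 1.33)))       # ~35% slower (both)
print(pct_change(g550, frame_time(3, 10, True, core_scale=1.17)))   # ~9% slower (core -17%)
print(pct_change(g550, frame_time(3, 10, True, 1.17, 1.17)))        # ~17% slower (both)
print(-pct_change(g400, g550))                                      # G550 ~35% faster

# Quake 3 reweighting: memory pulse = 8 vs mapping = 10, with the G550's
# memory pulses assumed to run 20% slower than the G400's.
q3_g400 = frame_time(8, 10, single_pass=False)
q3_g550 = frame_time(8, 10, single_pass=True, mem_scale=1.20)
print(-pct_change(q3_g400, q3_g550))                                # ~12% edge for the G550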

So as far as OpenGL goes, which unlike Direct3D seems to have more easily comparable driver performance, the performance of the G550 in this test PC in GLQuake is more or less predictable from the performance of the G400 + parallel multitexturing. The effect where the G550's edge over the G400 is progressively smaller from id tech 1 to 2 to 3 would be explained by the cost ratio of compute to memory shrinking and also the G550 accumulating more latency in this workload.

Reply 53 of 75, by Putas

User metadata
Rank Oldbie

While the model can approximate the overall rendering times well enough, texturing is just one of the things happening at any given moment. The texture cache largely insulates texturing from memory reads. With hardware at the level of the G400, we can also expect an output cache to let the texturing units move immediately on to the next piece of work. The doubling of texturing units on the G550 enables double the pixel fill rate in appropriate scenarios, while the other resources remained roughly the same. It is likely the other workloads that make the card sensitive to memory bandwidth. So I would word it differently, just to make clear what causes what. Also, I find using a word like compute for fixed operations like texturing quite anachronistic. Nothing wrong with "mapping" or "processing".

Given the occasionally lower geometry performance, the parallelized front end of the G400 may have been gutted. Matrox was way ahead with that one.

Reply 54 of 75, by The Serpent Rider

User metadata
Rank l33t++

Does the G550 scale better with trilinear filtering enabled?

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 55 of 75, by vvbee

User metadata
Rank Oldbie

Need to test trilinear filtering at some point. The model abstracts various things, but as far as I know it comes closest to explaining the results. You can speculate about various things, but if you can't isolate it in the numbers then it's further away from being an explanation.

Here's some of the downclocking data for the G550 in GLQuake, showing the effect on performance of downclocking the core as a percentage of the effect of downclocking both the core and memory:

The attachment ulbyyh.png is no longer available
The attachment omuguv.png is no longer available

So core speed plays a bigger part when parallel multitexturing is in use, which fits the simple model, at least according to these numbers. It would've been ideal to downclock just the memory, but I couldn't find a raw value that would've been artifact-free in-game.

Reply 56 of 75, by vvbee

User metadata
Rank Oldbie

Overclocking the G550 core and memory by 20% (160/200) improves performance by about 20% in 3DMark2000:

The attachment bhvbif.png is no longer available

About a 20% improvement in Quake 3 as well, with the G550's 32-bit now faster than the G400's 16-bit:

The attachment pxdfdb.png is no longer available

Where the stock G550 was slower than the G400 in some Direct3D games, the overclocked G550 looks to be generally faster in all of them:

The attachment behfoi.png is no longer available

I have to wonder whether this isn't more a reversed underclock than an overclock. The stock is 133/166 but the memory chips are rated for 200 and there were no heat issues or artifacts at a glance.

Reply 57 of 75, by BitWrangler

User metadata
Rank l33t++

Possibly they are clocked way down for stability in the poorly vented business boxes of the age. Cases were lagging a couple of years behind higher end ventilation demands.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 58 of 75, by The Serpent Rider

User metadata
Rank l33t++

Matrox just bought SGRAM chips which were readily available at the time.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 59 of 75, by Nemo1985

User metadata
Rank Oldbie
vvbee wrote on 2024-08-08, 02:04:

Overclocking the G550 core and memory by 20% (160/200) improves performance by about 20% in 3DMark2000:

What did you use to overclock the G550? PowerStrip?