VOGONS


Truform on ATi GPUs after Radeon 8500/9100

First post, by swaaye

Rank l33t++

I've been playing a bit of Return to Castle Wolfenstein lately on a Radeon 8500. I switched to a Radeon 9700 Pro and saw the frame rate drop badly in some cases. I was playing on an Athlon XP 1800+ with AGP 4x at the time. For those who aren't aware, the Radeon 8500/9100 R200 chip is the only GPU with full Truform processing. The later chips do it at least partially in software and the speed hit can be tremendous, depending on how many models in view are being tessellated. I don't recall anyone testing it.

I found a scene with a number of tessellated NPC models, which causes a major speed hit on the 9700. I collected frame rate numbers across 3 platforms with 4 different GPUs.

-the same install of RTCW was used for all systems
-set to High Quality 1600x1200x32
-varied CPU speed with RMClock and Throttlestop
-Catalyst 5.8 on Windows XP SP3
(fw = AGP fastwrites, tf = truform)
AGP 0x (PCI mode), 1x, 4x, 8x and PCIe x16 tested.
2Yh0AFcR_o.png

Thoughts:

One needs a rather fast CPU and a speedy bus to avoid a bottleneck with software Truform. The Core i5 2500K machine appears to max it out at a bit more than 2.4 GHz.

AGP 4x appears to be sufficient bus bandwidth, at least for the Athlon 64, but anything less than that becomes a bottleneck. Software Truform apparently sends much more data to the GPU. From what I understand, Truform tessellation can be seen as a form of compression: it avoids transferring the extra geometric data over the bus, and you lose that benefit when it runs on the CPU.

Also interesting: the Radeon 8500 appears to tessellate slightly more than software Truform does by default. I suppose they reduced the tessellation factor to keep the performance from being even worse.

Reply 1 of 27, by Scali

Rank l33t
swaaye wrote:

Software Truform apparently sends much more data to the GPU. From what I understand, Truform tessellation can be seen as a form of compression, as a way to avoid added bus transfer of more geometric data and you lose this when it runs on the CPU.

That is not specific to TruForm, but is true for hardware T&L in general.
Namely, when you use hardware T&L, you store the geometry in object space in video ram, where it remains static.
For each frame, you only send the proper transform matrices and light parameters, and the GPU does the rest. So there is virtually no traffic going over the bus at all, eliminating the bottleneck.
If you have hardware TruForm, it's just a special case of hardware T&L, so you still keep the geometry static in vram.
However, when you have software TruForm, you need to process the geometry on the CPU for each frame, and send the updated geometry to the GPU over the bus each frame.
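
To put rough numbers on that difference, here is a minimal Python sketch. The scene size, the 32-byte vertex layout, and the function names are assumptions made up for illustration, not measurements from this thread. The bus figures are theoretical peaks, and light parameters and index data are ignored to keep it short.

```python
# Rough per-frame bus traffic: hardware vs. software TruForm.
# All scene numbers are hypothetical and chosen only for illustration.

BYTES_PER_VERTEX = 32   # assumption: position + normal + one UV set (8 floats)
MATRIX_BYTES = 16 * 4   # one 4x4 matrix of 32-bit floats

# Theoretical peak bus rates in MB/s (real-world throughput is lower).
BUS_PEAK_MB_S = {"PCI": 133, "AGP 1x": 266, "AGP 4x": 1066, "AGP 8x": 2133}


def hardware_bytes_per_frame(num_models: int) -> int:
    """Geometry stays in VRAM; only one transform matrix per model crosses the bus."""
    return num_models * MATRIX_BYTES


def software_bytes_per_frame(num_models: int, verts_per_model: int) -> int:
    """The CPU tessellates, so every output vertex is re-sent each frame."""
    return num_models * verts_per_model * BYTES_PER_VERTEX


def max_fps(bytes_per_frame: int, bus_mb_s: int) -> float:
    """Upper bound on frame rate if the bus were the only limit."""
    return bus_mb_s * 1_000_000 / bytes_per_frame


if __name__ == "__main__":
    models, tessellated_verts = 10, 20_000  # hypothetical scene
    hw = hardware_bytes_per_frame(models)
    sw = software_bytes_per_frame(models, tessellated_verts)
    print(f"hardware: {hw} bytes/frame, software: {sw} bytes/frame")
    for bus, rate in BUS_PEAK_MB_S.items():
        print(f"{bus:7s} bus-limited ceiling: {max_fps(sw, rate):6.1f} fps")
```

With these made-up inputs the static case moves a few hundred bytes per frame while the software case moves megabytes, which is why the slower bus modes can become the limit long before the GPU does.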

In general, TruForm and more advanced forms of tessellation can indeed be seen as a form of compression. But that is more about a different kind of bottleneck: if you want to have extremely detailed geometry, that takes a lot of vram. That breaks down in two ways:
1) You need to fetch more data per display area (because more polygons are used), requiring more vram bandwidth.
2) You eventually run out of vram, and need to page geometry in via system memory through the bus.
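
The "compression" view can be illustrated with simple counting. The sketch below assumes uniform subdivision, where each edge of a source triangle is split into the same number of segments; the function names are hypothetical, and how a game's Truform level maps to a segment count is not stated in this thread.

```python
def subdivided_counts(segments: int) -> tuple[int, int]:
    """Triangles and vertices produced by uniformly subdividing ONE triangle
    so that each edge is split into `segments` segments."""
    if segments < 1:
        raise ValueError("segments must be >= 1")
    triangles = segments * segments
    vertices = (segments + 1) * (segments + 2) // 2
    return triangles, vertices


def expanded_triangles(base_triangles: int, segments: int) -> int:
    """Triangle count of a whole mesh after subdividing every triangle."""
    return base_triangles * subdivided_counts(segments)[0]


if __name__ == "__main__":
    base = 1_500  # hypothetical low-poly character model
    for n in range(1, 6):
        tris, verts = subdivided_counts(n)
        print(f"segments={n}: {tris:2d} tris, {verts:2d} verts per patch, "
              f"{expanded_triangles(base, n)} tris for the model")
```

The stored mesh stays at the base size, and the expanded geometry only has to exist (in VRAM or on the bus) if the tessellation is done somewhere other than on the GPU.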

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 2 of 27, by swaaye

Rank l33t++

Scali, that makes sense. I was digging through Beyond3D for info on Truform on these later cards and came across the compression concept of tessellation. I hadn't thought of it in those terms before.

It's interesting to me that ATI even tried to support it in software because the speed hit is so impractical for a Radeon 9700 era system. I think the added AGP traffic can even cause stability problems on the less than great Socket A chipsets of the day.

Reply 3 of 27, by Scali

Rank l33t
swaaye wrote:

It's interesting to me that ATI even tried to support it in software because the speed hit is so impractical for a Radeon 9700 era system. I think the added AGP traffic can even cause stability problems on the less than great Socket A chipsets of the day.

It may have something to do with the fact that ATi tried to make tessellation their unique selling point (although the Matrox Parhelia also had advanced displacement mapping support).
They pushed TruForm for the 8500 series, so they may have felt obliged to keep supporting it, to save face (it's a standard feature in the DX8/DX9 API, known as N-patches... NV's equivalent is RT-patches).
They introduced a new generation of tessellation in the Xbox 360 GPU. The 9700 era may have just fallen by the wayside there. Perhaps they actually tried to add a new-and-improved tessellation system on that GPU (which they could have made backward compatible with TruForm), but for some reason it got cancelled (they do have render-to-vertexbuffer).

See here for the D3D documentation on these early tessellation methods: https://docs.microsoft.com/en-us/windows/desk … rder-primitives
They were dropped in DX10 and later because there was the geometry shader now, I suppose.

Reply 4 of 27, by an81

Rank Newbie

-

Last edited by an81 on 2020-01-03, 15:38. Edited 1 time in total.

Reply 5 of 27, by swaaye

Rank l33t++

Yeah it is incredibly CPU heavy and also apparently floods the AGP/PCIe bus with traffic. I saw in your link that sireric mentioned it is done with software + hardware which is about as strong a confirmation of that as possible. Seems mostly software to me! I suppose it's possible a 9600 Pro could be even slower at it. I don't have one to mess with unfortunately.

I wonder if the 2500K + X850 XT could handle those other Truform games adequately. Heh.

I messed with Call of Duty a bit. I discovered a new problem with Radeon 8500 this way. 🤣. In the presence of some kinds of pixel shading in OpenGL, mip mapping can break. KOTOR and NWN apparently have the same problem with 8500. A lack of mip mapping of course causes extreme texture aliasing. Not pretty. The game was also kinda flaky with GeForce cards. I think it was exposing some AGP problems of nForce1.

Reply 6 of 27, by an81

Rank Newbie

-

Last edited by an81 on 2020-01-03, 15:38. Edited 1 time in total.

Reply 7 of 27, by an81

Rank Newbie

-

Last edited by an81 on 2020-01-03, 15:38. Edited 3 times in total.

Reply 8 of 27, by swaaye

Rank l33t++

Yeah that's pretty slow! Hopefully the X800 XL and Ivy Bridge shockingly change things. 9600 Pro isn't exactly a speed demon in general anyway. X800 XL has 3-4x the geometry and fill rate, hierarchical Z, etc.

Reply 9 of 27, by Standard Def Steve

Rank Oldbie

I've been doing lots of Haswell benchmarking on my HTPC lately. It's powered by an i5-4670K @ 4.4GHz. One thing I noticed while running memtest86 and AIDA64 is that Haswell's L1 cache is much faster than Sandy/Ivy's. Apparently Intel increased the L1 performance to keep the FPU fed while running AVX2 code.

However I've found the fast L1 to really shine in some non-AVX2 workloads too. Haswell is great for pushing absurdly high frame rates in old XP games. It's also a good bit faster than my 4.2GHz i7-2600K at emulation and benchmarks like 3DMark2000/2001. I'm pretty sure I still have an old X800XT somewhere. Could be interesting to see how much of an effect the greatly increased L1 bandwidth has on software Truform.

"A little sign-in here, a touch of WiFi there..."

Reply 10 of 27, by an81

Rank Newbie

-

Last edited by an81 on 2020-01-03, 15:39. Edited 1 time in total.

Reply 11 of 27, by an81

Rank Newbie

-

Last edited by an81 on 2020-01-03, 15:39. Edited 1 time in total.

Reply 12 of 27, by swaaye

Rank l33t++

BTW I had to manually install Catalyst 5.8 on the 2500K, through Device Manager. The installer didn't seem to like the platform. Catalyst Control Center can be manually installed from the setup directory. There is also a classic control panel version of the drivers, which may be preferable because early CCC is pretty flaky.

5-8_xp-2k_dd_cp_wdm_25203.exe

Reply 13 of 27, by willow

Rank Member
swaaye wrote:
I've been playing a bit of Return to Castle Wolfenstein lately, on Radeon 8500. Switched to Radeon 9700 Pro and saw the frame r […]

Morrowind supports Truform through a Russian utility (FPS Optimizer, I think). With a 9700 Pro, the performance was very bad.

Reply 14 of 27, by Standard Def Steve

Rank Oldbie

Well, I found my X800XT, but it doesn't seem to be feeling too well. The graphics driver is frequently crashing.

I have two other ATI cards, but unfortunately they aren't anywhere near as powerful: a Radeon 9800 Pro and a PCI 9250 with 128-bit memory. I take it the 9250 is capable of hardware Truform?

I'd be testing both cards on this system:
PIII-S at 1628MHz
FSB=155MHz
AGP=77.5MHz, which I guess makes it AGP 5x. 😜 The 9800 Pro doesn't seem to mind at all.
PCI=38.75MHz, which should help the 9250 out a bit.

2GB of DDR RAM at 310MHz, 2-2-2-5 timings
QDI Advance 12T motherboard, VIA Apollo Pro 266T chipset
X-Fi Platinum
XP SP3
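
The overclocked bus figures above can be checked with a little arithmetic. This is a sketch under stated assumptions: the /2 and /4 dividers are inferred from the numbers in the post, the board is assumed to run the card in AGP 4x mode, and the function names are made up for illustration.

```python
NOMINAL_AGP_MHZ = 200 / 3  # standard AGP base clock, about 66.67 MHz
NOMINAL_PCI_MHZ = 100 / 3  # standard PCI clock, about 33.33 MHz


def derived_clocks(fsb_mhz: float, agp_divider: int = 2, pci_divider: int = 4):
    """AGP and PCI clocks for a given FSB (dividers are an assumption)."""
    return fsb_mhz / agp_divider, fsb_mhz / pci_divider


def effective_agp_multiplier(agp_mhz: float, mode: int = 4) -> float:
    """AGP transfer rate expressed as a multiple of the nominal 1x rate."""
    return mode * agp_mhz / NOMINAL_AGP_MHZ


if __name__ == "__main__":
    agp, pci = derived_clocks(155.0)
    print(f"AGP clock: {agp} MHz, PCI clock: {pci} MHz")
    print(f"effective AGP rate: {effective_agp_multiplier(agp):.2f}x")
    print(f"PCI overclock: {pci / NOMINAL_PCI_MHZ - 1:.1%}")
```

Under those assumptions the AGP link works out to roughly 4.65x, so "AGP 5x" is a fair rounding.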

Which scene were you using to test your machines?

Reply 15 of 27, by Munx

Rank Oldbie
Standard Def Steve wrote:

I take it the 9250 is capable of hardware Truform?

It's not. Only the 8500, 8500LE and 9100, which is a rebadged 8500LE.

My builds!
The FireStarter 2.0 - The wooden K5
The Underdog - The budget K6
The Voodoo powerhouse - The power-hungry K7
The troll PC - The Socket 423 Pentium 4

Reply 16 of 27, by an81

Rank Newbie

-

Last edited by an81 on 2020-01-03, 15:39. Edited 1 time in total.

Reply 17 of 27, by an81

Rank Newbie

Got myself a Radeon 9600XT 256MB.

ZBkhoHY.png

ntdUbyP.png

I guess it does indeed need some raw polygon-crunching power on the GPU to handle all the extra polys.

Reply 18 of 27, by lost77

Rank Member

Ran some tests with a Sandy Bridge. It seems Direct3D is much faster and looks the same (to me anyway). Maybe old OpenGL has a lot of overhead.

EaoS2qk.png

Reply 19 of 27, by an81

Rank Newbie

I hadn't realized it also runs in D3D. I just tried mine and it's indeed a lot faster. Also, which one is the Memphis demo?