VOGONS


Last modern PCI VGA speed results


First post, by 386SX

Rank: l33t

Hello,

I'm opening this thread to post something I found and tested a while ago but never looked into closely, about the last PCI video cards and their well-known limitations. I always assumed (as many others probably do) that the PCI bus, and the less discussed bridge IC choice, were the bottleneck of these uncommon, alternative products.
Take, for example, a card I own: the GeForce 210 (and later the GeForce GT 610) in the old PCI version (the two cards use different bridge ICs). Tested under a supposedly supported, correct OS like Win 8.x, I always found similar (though not identical) limitations in old games and benchmarks that weren't written for modern unified shader architectures.

Most of all, the triangles/s rate was always capped at around 15 MT/s on average (lower with the other bridge IC). I was thinking the PCIe-to-PCI-bridge-to-PCIe-x1-GPU translation, along with the PCI bus limitation itself, was the problem. But testing the cards in Linux, using Wine as a launcher for the x86 Windows apps with its D3D-to-OGL translation, I was expecting lower scores; instead the benchmark numbers are incredibly different, and so are the real, visible frame rates. That is impressive, because it's mostly an API-translated setup on a slow CPU like the old Atom.

As an example, 3DMark2001SE scores around 6600 on average in Win 8.1, while under Linux/Wine (D3D→OGL) it scores 8505. Looking at the synthetic numbers it's even more impressive: the triangles/s rate, which used to sit at a fixed ~15 MT/s, becomes a variable 98 MT/s, dropping to 28 MT/s in the 8-lights test... and it's far from a bogus number, the tests are clearly MUCH faster. I'll test 3DMark05 soon, and I'd expect similar results apart from the GPU's pixel shading performance. Random in-game slowdowns are still there, which shows the PCI bridge is still at work, but even with the variable frame rate the difference is impressive. So the PCI bus isn't (at least not entirely) the problem here: the almost 10x faster triangles/s result suggests the 133 MB/s PCI bus of the NM10 chipset should be enough for old games/benchmarks, contrary to the usual results seen in the modern Windows tests these cards were aimed at and usually reviewed with.
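Just to get a rough sense of scale, here is my own back-of-envelope arithmetic (the per-vertex size and the vertex reuse are guesses, not 3DMark's real figures):

Code:
/* Back-of-envelope check, not a measurement: what bandwidth would ~98 MT/s need
 * if the geometry were re-sent over the bus every frame? The 32 bytes/vertex and
 * ~1 vertex per triangle (indexed mesh with heavy reuse) are assumptions. */
#include <stdio.h>

int main(void)
{
    const double pci_bw        = 133e6; /* bytes/s, theoretical 32-bit/33 MHz PCI */
    const double tri_rate      = 98e6;  /* triangles/s seen under Linux/Wine      */
    const double bytes_per_vtx = 32.0;  /* position + normal + one texcoord-ish   */
    const double vtx_per_tri   = 1.0;   /* indexed mesh, heavy vertex reuse       */

    double streamed = tri_rate * vtx_per_tri * bytes_per_vtx;
    printf("Needed if vertices crossed the bus every frame: ~%.0f MB/s\n",
           streamed / 1e6);                                   /* ~3136 MB/s */
    printf("Theoretical PCI bandwidth:                       ~%.0f MB/s\n",
           pci_bw / 1e6);                                     /* ~133 MB/s  */
    return 0;
}

So if those assumptions are even roughly right, a rate like 98 MT/s can only happen if the vertex data is sitting on the card rather than crossing the bus each frame, which fits the idea that the bus itself isn't what these triangle tests are really measuring.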

The only explanation I can think of is that most of these old benchmarks/games aren't really compatible with modern GPUs and their Windows drivers, and the GPUs themselves were never meant to run old games, which in fact usually run quite "slow" on modern Windows (compared to the specs). In Linux, even with the added D3D→OGL translation, Wine probably uses lower-level OpenGL calls that push these GPUs harder for old-style rendering. Modern Windows also has the DWM compositing window manager with its "3D" GPU acceleration always on, plus a much larger number of background processes added to the equation, and I suppose DX7/8/9 rendering goes through some "wrapper" logic too, to keep the 3D-accelerated GUI running in the background, which might cost speed. That would make sense considering how fast modern GPUs are, but it's probably not ideal for the old, isolated Direct3D game logic that expects to use the GPU fully for the running game, I imagine.
Maybe with the Windows compatibility tools things could get better, but it's not easy to find a good balance of tweaks there. In any case I wouldn't expect much difference between the two OSes in heavy modern pixel shader benchmarks like Unigine Valley, where the bottleneck is everything about the card, GPU included, but at least for older games I wonder whether they could run faster.
Any opinions?

Thanks, bye.

Config: 1.9 GHz Atom, 8 GB DDR3, GeForce 210 PCI 512 MB

(1) 3DMark2001SE results with Linux x64, Wine 5.0 and latest NV linux drivers

(2) 3DMark2001SE results with Win 8.1 and latest NV Win drivers

Attachments

  • geforce210pci_linux.jpg
  • geforce210PCI_Win.jpg
Last edited by 386SX on 2022-05-24, 19:26. Edited 8 times in total.

Reply 1 of 25, by Joseph_Joestar

Rank: l33t
386SX wrote on 2022-05-24, 11:52:

I'll compare with the old tests and post more results about this. Any opinions?

Interesting analysis!

BTW, PixelPipes has a video comparing the PCI, PCI-e 1x and PCI-e 16x versions of a GT 520 card. FarCry seems to be one of the games where the differences between these cards become more apparent.

PC#1: Pentium MMX 166 / Soyo SY-5BT / S3 Trio64V+ / Voodoo1 / YMF719 / AWE64 Gold / SC-155
PC#2: AthlonXP 2100+ / ECS K7VTA3 / Voodoo3 / Audigy2 / Vortex2
PC#3: Athlon64 3400+ / Asus K8V-MX / 5900XT / Audigy2
PC#4: i5-3570K / MSI Z77A-G43 / GTX 970 / X-Fi

Reply 2 of 25, by 386SX

Rank: l33t

It's indeed an interesting video. Of course the native PCIe x1 cards remove one big variable from the equation, not needing a bridge IC translating everything through the PCI bus, and if I remember correctly the numbers clearly confirm a big difference, while not solving the software side of the backwards-compatibility problem (I should really use an older OS, but these cards were also aimed at modern OSes, so I'd expect similar results there).

Imho there are several layers of "problems" with these PCI cards, which are usually written off as "just a PCI bus limitation", and imho that isn't the only one. The PCI bus has its limitations for sure; the PCIe<>PCI bridge IC itself is complex hardware doing a complex job, with its own latency/caching/frame-pacing behaviour, so it's a variable too; the WDDM drivers and the DWM-composited GUI logic in Windows are variables; the old games/benchmarks these low-end cards usually get tested with are another variable, since imho they were never really meant to run on modern OSes after XP/Vista; and modern benchmarks/games can't run fast enough simply because the GPU is low-end in the first place.

So I think the bridged-PCI concept might be limiting at some point, but the benchmark numbers above make me think much more about modern OS backwards compatibility than about this hardware PCI limit, at least in older DirectX 7/8/9 games, where the complexity of the OS software layers becomes a big variable. I suppose the same thing still happens nowadays with any modern GPU running the online-store re-releases of those early-2000s games: they run, but I wonder whether they shouldn't run 100x faster than they do, considering the specs, and considering these games ran fast on hardware that is ancient compared to modern cards.

Last edited by 386SX on 2022-05-26, 21:02. Edited 2 times in total.

Reply 3 of 25, by 386SX

Rank: l33t

Update: tested 3DMark05 in Linux (recent x64 kernel, light Qt GUI), Wine 5.0 with the NV proprietary driver. Compared to the score I remembered (EDIT: I also found the old Win 8.1 tests I did), the GeForce 210 PCI 512 MB DDR3 with the dual-core 1.9 GHz Atom scores 3000 points, while in Win 8.1 x64 it scored 2358 points.

Here is the image of the test results; interestingly, the triangles/s score is still quite high. Of course the fps difference is not extremely high, as expected: I suppose the GPU's 16 unified shaders start to become a major bottleneck.

Impressive how different the geometry speed tests are. The CPU score, which isn't that important, I didn't understand at first... EDIT: as suggested below, that's just how the 3DMark05 "CPU" test works, thanks.
Pixel shader SM3.0 performance starts to be higher in Windows, as expected from a very heavy, natively D3D-optimized pixel shader test. And again the 15 MT/s limit shows up in the Windows test. I suppose the next tests should be some real games like Far Cry, as suggested above. 😉

Both are with the same CPU and the same GeForce 210 PCI config:

1) Win 8.1 results

2) Linux results with Wine through OpenGL 3.3 calls. (The score is interesting: similar to the "faster" 3DMark05 GT 520 PCI score in the PixelPipes video mentioned above, if I've read it correctly, but on a slower GeForce 210 PCI card with the Atom CPU... already a good sign, I'd say, for when I test the GT 610 PCI again in Linux.)

Attachments

Last edited by 386SX on 2022-05-24, 23:25. Edited 2 times in total.

Reply 4 of 25, by agent_x007

Rank: Oldbie

A few things to note:
1) The GT 210 is a unified shader design, so it can throw 15 of its shaders at the vertex problem if circumstances align, and all of them run at the shader clock.
So yes, it can be fast.
2) PCI is still really limiting for the GT 210.

3DMark 01 SE.PNG

Here's a GT 520 result for comparison:

3DMark 01 SE.PNG

3) 3DMark 05's CPU test is a render test that relies on the GPU to do the work, so a faster GPU (or a faster driver for it) = faster rendering.
That's why it was ditched in 3DMark 06 (and why I prefer to use 06 instead of 05).
4) You REALLY don't need vertex shader performance on old stuff (DX7 and earlier); actual fill rates are what matter most there (rough numbers sketched below).
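To put some (invented, ballpark) numbers on point 4: the scene size, overdraw and texture layers below are assumptions for a DX7-era game, not measurements from anything specific.

Code:
/* Rough illustration of point 4 above. All inputs are assumed, DX7-era-ish
 * values, not measured from any particular game or benchmark. */
#include <stdio.h>

int main(void)
{
    const double tris_per_frame = 30e3;   /* assumed scene complexity          */
    const double fps            = 60.0;
    const double width = 1024.0, height = 768.0;
    const double overdraw       = 3.0;    /* assumed average overdraw          */
    const double tex_layers     = 2.0;    /* assumed multitexturing layers     */

    printf("Triangle rate needed: %.1f MT/s\n",
           tris_per_frame * fps / 1e6);                          /* ~1.8 MT/s   */
    printf("Pixel fill needed:    %.0f Mpixels/s\n",
           width * height * overdraw * fps / 1e6);               /* ~142 Mpix/s */
    printf("Texel fill needed:    %.0f Mtexels/s\n",
           width * height * overdraw * tex_layers * fps / 1e6);  /* ~283 Mtex/s */
    return 0;
}

Even the "slow" 15 MT/s seen under Windows is roughly an order of magnitude above what a DX7-era scene actually asks for, while the fill-rate side is where the real demand sits.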


Reply 5 of 25, by 386SX

Rank: l33t
agent_x007 wrote on 2022-05-24, 22:06:
A few things to note: 1) The GT 210 is a unified shader design, so it can throw 15 of its shaders at the vertex problem if circumstances align, and […]

Thanks, very interesting. I didn't know that about the 3DMark05 CPU test, but I was starting to suspect something was different (still, it's interesting that the same thing doesn't apply to Win 8.1, where running the card results in that low score... I might test it again to be sure nothing went wrong during the run, but it should come out roughly the same).

The PCIe x16 XP 3DMark2001SE scores are interesting; I'll take them as the fastest reference. Indeed this PCI design, in all its complexity (it's not like the early, proprietary (!) AGP-to-PCIe bridge ICs, of course), is a limit, but I think that limit may sit a bit higher than I thought, at least on the modern OSes these cards were aimed at (WDDM 1.2 and 1.3 drivers), which should actually run their own apps "better" than XP, even if those apps were never intended for these GPUs. Instead, from what I read, XP still seems to be the fastest configuration to use as a reference, and I don't think the Linux results I'm reading can get that close, though it's interesting that they are still faster than the supported Win 8.x. Reading my old tests (I'll redo the test anyway), I can quote my own old results with the GT 610 PCI and the Atom:

"....I write the score to make it fast (GT610 PCI and Linux x64/Wine 5.x vs same GT610 PCI on Win 8.1 x64, both proprietary drivers)

3DMark05 Score: 3535 (vs 2080 of Win 8.x)
GT1 - Return to Proxicon: 15.2 fps (vs 12.0 fps of Win 8.x)
GT2 - Firefly Forest: 7.9 fps (vs 5.6 fps of Win 8.x)
GT3 - Canyon Flight: 23.4 fps (vs 8.5 fps of Win 8.x!!!)

Feature Tests:
Fill Rate Single Texturing: 1885.3 Mtexels/s (vs 1134)
Fill Rate Multi Texturing: 5277 (almost identical)
Pixel Shaders: 137 fps (vs 165 fps this is slower)
Vertex Shaders Simple: 48.6 MVertice/s (vs 6.0MV/s !!!)
Vertex Shaders Complex: 51.3 MVertice/s (vs 10.9 MVertice/s !!!)................."

From these old tests, I suppose those numbers may be closer to the XP scenario with the GT 520/610 PCI (which of course is slower than any OS with the PCIe version of the card).
3DMark2001SE seems to depend on CPU speed much more than I thought, even on a modern OS: even the GT 610 PCI more than doubles the game-test fps when going from the Atom to a Core 2, while the newer benchmark results stay similar. I suspect the way old DX8.1 (and maybe even 9.0) rendering is done on modern Windows "doesn't like" the PCI design of these cards, because the real PCI vertex limit may be closer to the figures above: around 50 MVertices/s in 3DMark05 (or around 90 MT/s in 3DMark2001SE under Linux/Wine). One hint comes from the GT 610 PCI, whose simple and complex vertex numbers are similar, whereas on the GT 210 PCI the complex test clearly takes a hit on speed. The two cards use different bridge ICs (which is a factor to consider), but that ~50 MVertices/s sounds like a plausible real limit of the PCI bridge (beyond what a hypothetical proprietary IC might improve).

These numbers go through Linux/OpenGL calls, but that doesn't change the PCI design, so I wonder whether they would be similar under XP, which I don't know for 3DMark05; if anyone can post individual test results for any GT 210/520/610 PCI card in 3DMark05/06 under XP, that would be interesting too. About vertex shader vs. fill rate performance, that's really true. But still, that triangles/s difference from Win 8.x to Linux (and, I suspect, compared to XP as well) is interesting on this PCI card.

Reply 6 of 25, by Sphere478

Rank: l33t++
Joseph_Joestar wrote on 2022-05-24, 12:24:
Interesting analysis!

BTW, PixelPipes has a video comparing the PCI, PCI-e 1x and PCI-e 16x versions of a GT 520 card. FarCry seems to be one of the games where the differences between these cards become more apparent.

They also have a video showing a 32-bit PCI video card comparison.

It's interesting to see how much further PCI could be pushed, well past the time when PCI was still relevant for video.

Socket 7, before Super Socket 7 and the PII, was probably the last time it was relevant.

The best scores I've been able to get on that platform were with a K6-3+ at 400 and a PCI Radeon 7500, if I recall correctly; my record scores are around 2000, and that's with a video card and processor that aren't period-correct for Socket 7 (they're newer).

Seeing you take the score something like 12 times higher really underlines the fact that a faster bus wasn't actually needed yet back then. It didn't hurt, I mean, but wow.

Sphere's PCB projects.
-
Sphere’s socket 5/7 cpu collection.
-
SUCCESSFUL K6-2+ to K6-3+ Full Cache Enable Mod
-
Tyan S1564S to S1564D single to dual processor conversion (also s1563 and s1562)

Reply 7 of 25, by chiveicrook

Rank: Newbie

While the Linux vs. Windows score difference is interesting, is it possible to confirm that 3DMark via Wine is actually rendering everything correctly? For all I know there might be significant shortcuts taken in Wine's D3D implementation.
It would be best to use an OpenGL benchmark that has both a native Windows and a native Linux port.

Reply 8 of 25, by 386SX

Rank: l33t

I've installed the original DVD version of Far Cry in Linux, patched to 1.40. Now I don't remember: how is the benchmark supposed to be started in the game? Was there a command-line option or something? 😁

Maybe I'll also test its OpenGL renderer, which in Linux might improve the final speed by skipping the D3D→OGL translation. Last time I tried that in Windows it didn't work, I suppose because of the onboard GMA GPU I was testing, not exactly the easiest thing to find compatibility for.

Anyway, the first minutes of the game at 1024x768, High settings, no AA, seem to run well until the player gets out onto the beach, where performance takes quite a hit (I'd say around 15 fps). Definitely faster than the onboard GMA (SGX) GPU at this detail/resolution. I think the average speed under Win 8.1 was similar, but I didn't write down the average FPS, so that's just a feeling. I can only compare the results with the ones online and the YouTube video mentioned above.

Reply 9 of 25, by 386SX

Rank: l33t
chiveicrook wrote on 2022-05-25, 07:40:

While the Linux vs. Windows score difference is interesting, is it possible to confirm that 3DMark via Wine is actually rendering everything correctly? For all I know there might be significant shortcuts taken in Wine's D3D implementation.
It would be best to use an OpenGL benchmark that has both a native Windows and a native Linux port.

I was thinking that too at first, but in the end it doesn't change the logic of the test. If the whole "PCIe, to PCI bridge, to the GPU's PCIe x1 and back" translation is a limit (and it is), that limit is really a set of hardware bottlenecks (bandwidth, bridge translation, latency), and it would be interesting to understand and estimate them. The "shortcuts" under Linux/Wine may well exist imho, but the OS itself is also generally lighter than modern Windows (where backwards compatibility comes at some cost), and in Linux the OpenGL calls may work at a lower level with better process isolation, even though the Direct3D-to-OpenGL translation has a cost in Linux too.

The rendering quality looks "the same", but of course it's not easy to compare, and it wouldn't prove much anyway, since two different APIs are being used to do roughly the same rendering. More interesting are the synthetic scores and their frame rates, which are real; some difference in rendering quality might boost frame rates, but not by that much. If the polygons/s rate were limited "that much" by the PCI design, I wouldn't expect a 10x increase under Linux in any case. Of course I'm not testing rendering accuracy; I'm wondering whether these old programs, the ones usually tested on these cards, simply work better on older OSes, and whether on modern ones other variables drag the final speed down before the PCI limitations even come into play.

The question now would be to get synthetic per-test numbers (not necessarily the final overall score) for these cards on XP, to compare with the Linux ones above.

Reply 10 of 25, by agent_x007

Rank: Oldbie

You don't send triangles/polygons over the PCI port.

You only send point coordinates in space (the so-called "input assembly" in DirectX or "vertex array data" in OpenGL), plus whatever other data is needed for what you are doing with it (for example: textures, shader programs, etc.).
ALL manipulation of that initial data (first as vertices, then polygons/objects, and later pixels) is done internally by the GPU. So the moment you talk about checking "vertices" or triangles/sec performance, you are off the bus and in cache/VRAM territory, which makes the PCI bus irrelevant (up to a point).

Keep in mind: old synthetic benchmarks are made to rely on slot bandwidth (or its latency) as little as possible. After all, their main point was to show GPU performance that ISN'T constrained by the platform you use (PCI/PCIe port included).

An example of a purpose-made PCI bandwidth test is the "PCI Express Feature Test" in 3DMark, but I doubt you simply want to know how fast the PCI port actually is (you are more interested in the performance difference between OSes made possible by their different approaches to the rendering pipeline).

In short:
Don't get too excited about performance differences between OSes in what is basically a lab environment.
Testing in actual games (using timedemos or built-in benchmark tools) is the best approach to see how things scale between OSes. Having "super speed" in vertex shaders is pointless when you are VRAM-bandwidth starved later down the road, or limited by ROP/TMU performance.
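If someone does want a crude, do-it-yourself ballpark of the upload rate (nothing like 3DMark's proper feature test; the buffer size, repeat count and use of glBufferSubData here are arbitrary choices, and the driver may not push the data straight to VRAM), something like this under Linux gives at least an order-of-magnitude figure:

Code:
/* Crude host->card upload timing via OpenGL. Build on Linux with something like:
 *   gcc bw.c -lGLEW -lglfw -lGL
 * Treat the result as a rough app-visible upload rate, not a true bus measurement. */
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const size_t size = 16u * 1024u * 1024u;   /* 16 MiB per upload, arbitrary */
    const int reps = 32;

    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);          /* hidden dummy window    */
    GLFWwindow *win = glfwCreateWindow(64, 64, "bw", NULL, NULL);
    if (!win) return 1;
    glfwMakeContextCurrent(win);
    if (glewInit() != GLEW_OK) return 1;

    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)size, NULL, GL_STREAM_DRAW);

    void *data = calloc(1, size);                      /* dummy payload          */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        glBufferSubData(GL_ARRAY_BUFFER, 0, (GLsizeiptr)size, data);
    glFinish();                                        /* wait for driver/GPU    */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("~%.0f MB/s apparent upload rate\n", (double)size * reps / secs / 1e6);

    free(data);
    glfwTerminate();
    return 0;
}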


Reply 11 of 25, by Joseph_Joestar

Rank: l33t
386SX wrote on 2022-05-25, 08:02:

I've installed in linux the original DVD Far Cry version patched to 1.40. Now I don't remember, how the benchmark should start on the game? There was a command line option or what?

I don't think FarCry has a built-in benchmark.

Back in the day, people used the "HardwareOC FarCry Benchmark" utility at the "Ultra Detail" preset. You can download it from here.

PC#1: Pentium MMX 166 / Soyo SY-5BT / S3 Trio64V+ / Voodoo1 / YMF719 / AWE64 Gold / SC-155
PC#2: AthlonXP 2100+ / ECS K7VTA3 / Voodoo3 / Audigy2 / Vortex2
PC#3: Athlon64 3400+ / Asus K8V-MX / 5900XT / Audigy2
PC#4: i5-3570K / MSI Z77A-G43 / GTX 970 / X-Fi

Reply 12 of 25, by chiveicrook

Rank: Newbie
386SX wrote on 2022-05-25, 09:06:
I was thinking that too at first, but in the end it doesn't change the logic of the test. If the whole "PCIe, to PCI bridge, to […]

I agree that the OS differences are not directly related to the fact that PCI isn't as limiting a factor as one might first think.
However, an apples-to-apples comparison would help in finding out which workloads are most affected by the lower bus throughput of PCI.
Hypothetically, Wine's D3D implementation could be emulating some things that normally use fixed-function hardware with shaders, which could potentially perform better by executing more of the work in place on the GPU (this is pure speculation and likely wrong, but it illustrates my point, I believe). This is somewhat similar to how some Glide/Direct3D games run much better with nGlide than with the original Direct3D renderer on new GPUs, which sometimes struggle with old APIs.
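For illustration only, this is roughly the kind of thing such a translation could generate internally: the old fixed-function "transform + one directional light" setup expressed as a small GLSL vertex shader (a sketch of the idea, not WineD3D's actual code; the attribute/uniform names are made up):

Code:
/* Sketch only: fixed-function-style transform plus one directional diffuse light,
 * written as the kind of GLSL a D3D->GL layer might generate on a unified shader
 * GPU. Names are invented for the example; compile with glShaderSource() etc. */
static const char *vertex_shader_src =
    "#version 330 core\n"
    "layout(location = 0) in vec3 a_position;\n"
    "layout(location = 1) in vec3 a_normal;\n"
    "uniform mat4 u_modelview;\n"
    "uniform mat4 u_projection;\n"
    "uniform mat3 u_normal_matrix;\n"
    "uniform vec3 u_light_dir;        /* normalized, eye space */\n"
    "uniform vec4 u_material_diffuse;\n"
    "out vec4 v_color;\n"
    "void main()\n"
    "{\n"
    "    vec3 n = normalize(u_normal_matrix * a_normal);\n"
    "    float ndotl = max(dot(n, -u_light_dir), 0.0);\n"
    "    v_color = u_material_diffuse * ndotl;  /* what GL_LIGHT0 used to do */\n"
    "    gl_Position = u_projection * u_modelview * vec4(a_position, 1.0);\n"
    "}\n";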

I've found this cool NVIDIA graphic from 2003:
bottlenecks.png

Reply 13 of 25, by 0xCats

Rank: Newbie

Windows runs D3D6 code (3DMark 2001 is largely DirectX 6) natively, as designed, with all its problems and its largely CPU-side TCL logic.
On top of that, under Windows 7/8+ D3D6 code runs via a legacy API translation layer that implements a lot of things in CPU-side software. All GPUs since the unified shader architecture do the same, approximating the old, deprecated DirectX 6 & 7 fixed-function hardware logic in software. XP generally outperforms Windows 7/8/10 in 3DMark 2001 every single time. The 2001SE benchmark is very CPU-bound in general.

Now, Linux has a superior memory allocator and scheduler, and is generally (depending on the distro) lighter on CPU and memory resources, which benefits a lowly Atom core.
Wine also intentionally maps old, slow Win32 API & DDX etc. logic onto faster Linux userspace API equivalents.
On top of that, WineD3D translates old D3D6/7/8/9 logic into modern OpenGL 3.3 equivalents, which removes a lot of the CPU overhead of the otherwise CPU-bound fixed-function logic.

As part of that, a lot of the vertex data is transformed into VBOs, cached in VRAM, and batched to improve the draw call rate (as opposed to native Windows, which re-transfers that geometry over PCI for every drawn frame).
I personally know that one of the Wine D3D developers spent ages working on the legacy-API-to-GL3.3 translation and achieved performance wins above native, since Wine has always been very popular for retro gaming under Linux.
Happy to see it still working well, as demonstrated.
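In rough code terms, the difference between the two submission models looks something like this (a simplified sketch of the pattern, not WineD3D's actual internals; it assumes a working GL 3.3 context with a VAO bound and 'verts' already filled):

Code:
/* Simplified sketch of the two submission models (not actual WineD3D code). */
#include <GL/glew.h>
#include <stddef.h>

void draw_cached(const float *verts, size_t bytes, int vertex_count)
{
    static GLuint vbo = 0;
    if (vbo == 0) {
        /* One-time upload: after this the geometry lives in VRAM, and each
         * frame only costs a draw call, not a trip across the PCI bus. */
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)bytes, verts, GL_STATIC_DRAW);
    }
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void *)0);
    glEnableVertexAttribArray(0);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
}

void draw_resubmitted(const float *verts, size_t bytes, int vertex_count)
{
    /* Re-sending the same data every frame: this is the pattern that actually
     * leans on bus bandwidth, roughly what an old-style path ends up doing. */
    static GLuint vbo = 0;
    if (vbo == 0) glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)bytes, verts, GL_STREAM_DRAW);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void *)0);
    glEnableVertexAttribArray(0);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
}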

Last edited by 0xCats on 2022-05-25, 11:54. Edited 1 time in total.

There are two types of devices: those that know they've been hacked, and those that don't yet know they're going to be hacked.

Reply 14 of 25, by 386SX

Rank: l33t
agent_x007 wrote on 2022-05-25, 10:36:
You don't send triangles/polygons over the PCI port. […]

This is interesting, but I seem to remember a post from a user in the past discussing the PCI bus very quickly becoming a possible vertex/geometry limitation, and based on that I was thinking the 10-15 MT/s speed in Win 8.1 (not the ideal OS for these benchmarks, of course) was the clearest example of that, and of the generally poor speed of these probably-last-ever PCI cards.
But this test (maybe pointless, but still interesting to understand) shows that the triangles/s limitation wasn't that, and maybe, as you said, not much related to the complexity of the PCI concept. Still, it doesn't explain why under Windows the backwards compatibility can't reach the same level of GPU usage. Of course having Windows XP 3DMark05 synthetic numbers would help here, but I wonder whether modern Windows runs old Direct3D rendering through some layers (maybe not a wrapper, but hypothetically something like the ANGLE software used by web browsers), which would also explain why the Compatibility tool exists at all, where it can help.

Having seen the same 10-15 MT/s values in 3DMark2001SE, 03, 05 and 06, I assumed at first the limitation really was about these cards, since 3DMark06 isn't exactly that old, and on the same CPU SoC the onboard GPU (GMA 3600) reaches 24 MT/s in the "T&L" triangles/s test under Win 8.1. Which confused me into thinking again that the PCI translation had something to do with it.
As for the older, DX7 3DMark2000 benchmark, that surely runs in some compatibility mode, since it's limited to 20-30 fps and really needs the DirectDraw API and isolation to speed up (which I tried, and it easily gained 1000 points on the GMA GPU); I understand such old APIs are only supported through compatibility layers. But the more modern ones I don't understand: the CPU doesn't have the speed to take over GPU work in some wrapper fashion (it's an Atom...), so something is going wrong somewhere, and it doesn't show up in the lighter, more isolated OpenGL scenario on Linux.

Maybe, as said above, WineD3D uses modern calls to implement the old acceleration paths even on unified shader architectures, which would explain these numbers. But I don't understand why the same thing doesn't happen in Windows, beyond it no longer being retrogaming-oriented and being a very complex, very different environment.

Reply 15 of 25, by 386SX

Rank: l33t
0xCats wrote on 2022-05-25, 11:41:
Windows runs D3D6 code (3DMark 2001 is largely DirectX 6) natively, as designed, with all its problems and the largely CPU-side TC […]

Thanks for the interesting answer, which seems to confirm what I was thinking above: these cards, released for "modern" Windows, can't perform fast enough in modern games, but can't perform fast enough in older ones either, because of the old API logic/layers. The bridged PCI bus is indeed a limitation, but not necessarily as big a one as I thought at first. Of course the real point is that these cards never had a real gaming target, except maybe on an old OS like XP, while on modern OSes, compatibility improvements aside, the slow GPU is just as limiting as the forced PCI design, which also brings in the 30-watt total PCI bus power limitation (the reason these GPUs were chosen in the first place).
Unfortunately I don't have XP to test this and get numbers more directly comparable to Linux or Win 8.x.

Something I was thinking about and don't understand regarding 3DMark2001SE: it clearly scales a lot with the CPU, but it's not clear why. Last time I tried the GT 610 PCI, going from the Atom to a Core 2, both using "Pure T&L", the speedup with the faster CPU was as large as "usual" (even if that's not what you'd expect from a T&L rendering path); but when I tested the "Software T&L" path, the speed drop was equally big, which I would not expect from such a CPU-demanding benchmark.

Last edited by 386SX on 2022-05-25, 12:09. Edited 2 times in total.

Reply 16 of 25, by chiveicrook

Rank: Newbie
386SX wrote on 2022-05-25, 11:46:

Maybe, as said above, WineD3D uses modern calls to implement the old acceleration paths even on unified shader architectures, which would explain these numbers. But I don't understand why the same thing doesn't happen in Windows, beyond it no longer being retrogaming-oriented and being a very complex, very different environment.

Because it's not commercially viable to put man-hours into performance optimizations for legacy products. It's supposed to run just well enough for marketing to put a "fully backwards compatible" sticker on it 😀
The Windows gaming ecosystem is all about throwing more power at problems. All that matters for the $$$ is "the latest and greatest" anyway.

Reply 18 of 25, by agent_x007

Rank: Oldbie

GT 610 PCI WinXP :

3DMark 2005.PNG
3DMark 2001 SE.PNG


Reply 19 of 25, by 386SX

Rank: l33t
agent_x007 wrote on 2022-05-25, 15:09:

GT 610 PCI WinXP :
3DMark 2005.PNG
3DMark 2001 SE.PNG

Great, thanks! Now... the geometry speed is the same as in the Win 8.1 test, and even the final score with such a powerful CPU is similar to my Atom's, I see. The individual results also look similar to the ones I had with Win 8.1, yet still much slower than Linux with Wine, even compared to the GeForce 210 PCI test I posted above or the old GT 610 PCI numbers I quoted. I wasn't expecting this... 3DMark2001SE of course benefits from the CPU (whatever the reason is), but the 10 MT/s limit is still there, which I wasn't expecting at all. Until I read these numbers I would have thought the Win 8.1 layers discussed above were the main reason for those low numbers, but it seems to be almost the same situation under XP too.

I think these last PCI video cards, using reference drivers designed for PCIe configurations and the OS's generic PCIe-to-PCI driver for the bridge IC, end up working better, at least from a benchmark point of view, in Linux with OpenGL than in Windows with Direct3D 9. I can't see any reason why the Linux speeds I've posted above (or even higher ones) shouldn't be matched on the OS these benchmarks were written for, with XP drivers on such a light OS, especially considering Wine has to run these non-native apps through an API translation and still ends up much faster than a native configuration (?). Of course this may not mean that all games run faster, but it's interesting anyway for a DirectX 8.1/9.0c scenario.

Maybe (just a theory) these PCI-concept cards would have needed some specific, lighter, optimized drivers to work better under Windows, instead of the generic heavy ones, which I suppose are optimized for everything except this kind of PCI design. And maybe the PCI bridge is handled differently by the two OSes. This is really surprising and unexpected, probably only happens with such specific video cards, and makes me wonder whether they could have been optimized a bit more on the software side.
I also tried a native OpenGL benchmark that exists for both Linux and Windows, if anyone wants to test (with the GeForce 210 PCI in my case): Unigine Heaven with the native app and OpenGL calls (no Wine or anything else). At 1024x768, Medium details, no AA I got:

Unigine Heaven Benchmark 4.0
FPS: 5.6
Score: 141
Min FPS: 3.7
Max FPS: 9.7

Settings:
Render: OpenGL
Mode: 1024x768 fullscreen
Preset: Custom
Quality: Medium
Tessellation: Disabled

I also found the old 3DMark05 results I did with the same GT 610 PCI / Atom / Win 8.1 x64, which for some reason are a bit slower than the GeForce 210 PCI. I always put that down to the different bridge IC, but still, the Linux results, even looking only at the in-game test fps, remain high, and incredibly XP doesn't change that.
Of course I'm not saying that these cards are fast in Linux... they still can't even be compared to any PCIe card. But it would be interesting to understand why and how these tests run so differently on such different OSes, and the technical reasons behind it.

Attachments

Last edited by 386SX on 2022-05-26, 07:54. Edited 3 times in total.