VOGONS


Reply 20 of 50, by gdjacobs

User metadata
Rank l33t++

To me, it's the result of an inescapable linguistic association. The implication of calling a video accelerator chip a GPU is that it will have flexibility similar to a CPU's, allowing proper programmability instead of a limited menu of preset configurations. Although you could technically handle the full graphics pipeline on this generation of chip, many titles didn't use hardware TCL and opted to handle it on the CPU, presumably because the way the GeForce was configurable didn't suit what programmers wanted to do.

All hail the Great Capacitor Brand Finder

Reply 21 of 50, by Scali

User metadata
Rank l33t
gdjacobs wrote:

Although you could technically handle the full graphics pipeline on this generation of chip, many titles didn't use hardware TCL and opted to handle it on the CPU, presumably because the way the GeForce was configurable didn't suit what programmers wanted to do.

Say what? The only titles that didn't use hardware T&L were the ones that ran on outdated D3D engines (you needed at least DX7 to make use of it).
In OpenGL you used it 'by default' because the OpenGL API is higher-level than D3D, and the driver would transparently enable T&L, since the OpenGL API allows the driver to manage geometry buffers by itself. So it could create buffers in video memory and configure them for T&L.
This automagically gave even older titles such as Q3A a boost when running on a T&L-enabled card (and added to the myth that OpenGL was faster than D3D).
See also here for example: http://www.anandtech.com/show/391/5
The original GeForce256 already had a very powerful T&L pipeline, which covered all the functionality of the OpenGL pipeline. That was more than the average game required.
If you read the article closely, it mentions certain games that used a custom T&L pipeline *before* hardware T&L was available. The reason was not that OpenGL or D3D couldn't deliver the effects they wanted, but rather that prior to HW T&L you had to optimize your CPU pipeline for best performance.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 22 of 50, by gdjacobs

User metadata
Rank l33t++

They did pretty much map the hardware TCL unit to OpenGL, didn't they? Not a surprise considering where so many of their people came from.

Well, certainly there were cel-shaded titles on console platforms which wouldn't be possible with the Direct3D 7 vertex pipeline. Interestingly, the Flipper chip was one of the graphics chips in question, and it did this with a fixed-function vertex pipeline; it just had the requisite fixed-function configuration options to handle the effect in hardware.

Interestingly, the Q3 engine doesn't use per-vertex lighting. It does accelerate vertex transformations in hardware, though.

All hail the Great Capacitor Brand Finder

Reply 23 of 50, by Scali

User metadata
Rank l33t
gdjacobs wrote:

Well, certainly there were cel-shaded titles on console platforms which wouldn't be possible with the Direct3D 7 vertex pipeline.

Why not? There are various approaches to cel shading, many of which can map just fine to a D3D7 or OGL fixed-function pipeline.

gdjacobs wrote:

Interestingly, the Q3 engine doesn't use per-vertex lighting. It does accelerate vertex transformations in hardware, though.

Yes, you don't necessarily have to use the 'L' part of 'T&L'. T&L with 0 light sources is perfectly valid.
In fact, sometimes you can use a hybrid.
Back in the day I optimized a pipeline for the GeForce2 where I did some of the processing on the CPU, updating some parameters in a vertex buffer, and then passed the data through the T&L pipeline. The GF2 could treat my preprocessed data as constants in its T&L pipeline.
This still gave me much better performance than doing the entire T&L on the CPU.
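Roughly, the idea in fixed-function OpenGL terms (an illustrative sketch, not the original code; the arrays and the per-frame update are placeholders):

#include <GL/gl.h>

#define NUM_VERTS 1024

static GLfloat positions[NUM_VERTS * 3]; /* static geometry                 */
static GLfloat colors[NUM_VERTS * 3];    /* parameters rewritten by the CPU */

void draw_frame(int frame)
{
    /* CPU-side preprocessing: some per-vertex computation the
       fixed-function pipeline can't express (placeholder math). */
    for (int i = 0; i < NUM_VERTS; ++i)
        colors[i * 3] = ((i + frame) & 1) ? 1.0f : 0.5f;

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, positions);
    glColorPointer(3, GL_FLOAT, 0, colors);

    /* One submission: the hardware T&L unit transforms (and lights) the
       whole batch, treating the CPU-written values as per-vertex constants. */
    glDrawArrays(GL_TRIANGLES, 0, NUM_VERTS);

    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}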

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 24 of 50, by gdjacobs

User metadata
Rank l33t++
Scali wrote:
gdjacobs wrote:

Well, certainly there were cel-shaded titles on console platforms which wouldn't be possible with the Direct3D 7 vertex pipeline.

Why not? There are various approaches to cel shading, many of which can map just fine to a D3D7 or OGL fixed-function pipeline.

It was my less-than-perfect understanding that the Direct3D TCL unit implemented Blinn-Phong.

All hail the Great Capacitor Brand Finder

Reply 25 of 50, by Scali

User metadata
Rank l33t
gdjacobs wrote:

It was my less-than-perfect understanding that the Direct3D TCL unit implemented Blinn-Phong.

There's a whole bunch of things you can make the T&L unit do.
Lights are processed with Blinn-Phong to generate per-vertex diffuse and specular components.
Of course you can manipulate the parameters of the light and the per-vertex normals to get entirely different effects than just the smooth shading you would expect.
But there's also the transform itself, and the texcoord generation modes (complete with texture matrices), and then there's what you do with all this data in the pixel stage.
One interesting way to get the extreme banding effect of cel-shading is to use an extreme gamma ramp.

And that's just a single pass. You can of course render multiple passes with slightly modified geometry, and then composite these different passes together in various ways.
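To make one of those approaches concrete: a common fixed-function trick (an illustrative sketch, not code from any engine mentioned here) is to feed the clamped per-vertex N.L into a tiny 1D ramp texture with nearest filtering, collapsing the smooth diffuse term into a few hard bands:

#include <GL/gl.h>

static GLuint ramp_tex;

void init_ramp(void)
{
    GLubyte bands[4] = { 64, 128, 192, 255 };   /* four flat tones */

    glGenTextures(1, &ramp_tex);
    glBindTexture(GL_TEXTURE_1D, ramp_tex);
    /* GL_NEAREST keeps the band edges razor sharp. */
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage1D(GL_TEXTURE_1D, 0, GL_LUMINANCE, 4, 0,
                 GL_LUMINANCE, GL_UNSIGNED_BYTE, bands);
    glEnable(GL_TEXTURE_1D);
}

/* Per vertex (between glBegin/glEnd): clamped N.L becomes the 1D texture
   coordinate, so the usual smooth shading is quantized into the ramp's bands. */
void emit_vertex(const GLfloat p[3], const GLfloat n[3], const GLfloat l[3])
{
    GLfloat s = n[0]*l[0] + n[1]*l[1] + n[2]*l[2];
    glTexCoord1f(s > 0.0f ? s : 0.0f);
    glVertex3fv(p);
}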

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 26 of 50, by kanecvr

User metadata
Rank Oldbie
Scali wrote:

What makes something a 'GPU' is not whether it can do 2D or not, but rather whether it is an actual 'processing unit'. As in, you can feed it a program and it executes it by itself.

That makes sense.

The Voodoo series' ability to do some geometry calculations would make it sort of a "grandfather" of the modern GPU, as it was able to offload some of the work from the CPU, right? I don't remember other contemporary 3D accelerators being able to do the same. For example, running GLQuake @ 640x480 on a really fast 486 (say a 160MHz Am5x86) yields playable framerates when using a Voodoo 2 (~23-25fps average) despite the game's heavy P54 optimizations. Using another contemporary 3D accelerator like a Riva 128ZX or TNT PCI, the game is unplayable on the same machine.

Then again, this might be down to the Glide API and not the card itself? Is this correct?

Reply 27 of 50, by Scali

User metadata
Rank l33t
kanecvr wrote:

The Voodoo series' ability to do some geometry calculations would make it sort of a "grandfather" of the modern GPU, as it was able to offload some of the work from the CPU, right?

As far as I know, the Voodoo cards could do no such thing. They were merely accelerators in 2D space.

kanecvr wrote:

I don't remember other contemporary 3D accelerators being able to do the same.

As far as I know, the Voodoo wasn't particularly advanced at any point in time. It was just fast, really fast.

kanecvr wrote:

Then again, this might be down to the Glide API and not the card itself? Is this correct?

Yes, that's what I think.
They just had a really efficient miniGL driver.
GLQuake is not a very good case for hardware-accelerated T&L in the first place, for 2 reasons:
1) Quake uses extremely low-poly geometry, whereas you generally want to submit larger batches of polygon data to get any mileage out of T&L.
2) Quake uses a BSP tree algorithm, which splits each sector down to single polygons to determine visibility.
This means polygons are submitted to the pipeline one at a time, from system memory. This is a worst-case scenario for hardware T&L. Later BSP-based engines would use 'leafy' BSP trees, which didn't split things down to individual polygons but into small batches of polygons, to increase performance on T&L hardware.
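In purely illustrative immediate-mode GL terms (the structs are placeholders, not Quake's actual data layout), the two submission styles look roughly like this:

#include <GL/gl.h>

typedef struct poly_s {                 /* placeholder data layout */
    int numverts;
    GLfloat (*verts)[3];
    struct poly_s *next;
} poly_t;

typedef struct leaf_s {
    GLfloat (*verts)[3];
    GLushort *indices;
    int numindices;
    struct leaf_s *next;
} leaf_t;

/* Classic BSP walk: every polygon is its own tiny submission. */
void draw_per_polygon(const poly_t *visible)
{
    for (const poly_t *p = visible; p; p = p->next) {
        glBegin(GL_POLYGON);
        for (int i = 0; i < p->numverts; ++i)
            glVertex3fv(p->verts[i]);
        glEnd();    /* full submission overhead for a handful of vertices */
    }
}

/* 'Leafy' style: one draw call covers a whole leaf's batch of polygons. */
void draw_per_leaf(const leaf_t *visible)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    for (const leaf_t *l = visible; l; l = l->next) {
        glVertexPointer(3, GL_FLOAT, 0, l->verts);
        glDrawElements(GL_TRIANGLES, l->numindices,
                       GL_UNSIGNED_SHORT, l->indices);
    }
    glDisableClientState(GL_VERTEX_ARRAY);
}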

A TNT may be slow on a 486 for a different reason: because it's considerably more advanced than a Voodoo, the CPU also has to set up a lot more state for every draw operation. That may not be an issue on the Pentium II+ machines it targeted, but the driver overhead might be too much for a 486. It could also be that, because the driver is much newer than the Voodoo ones, no care was taken to optimize code for the extremely slow FPU of a 486; they just assumed a fast FPU like on a Pentium or better. The software version of Quake is unplayable on 486 systems for the exact same reason.

If you want to see 'grandfathers' of the GPU, I think early SGI systems with custom co-processors for geometry would be where it's at.
Or perhaps even the IBM Professional Graphics Controller: https://en.wikipedia.org/wiki/Professional_Gr … hics_Controller
It was basically a 'PC on a stick': it had its own memory and an 8088 CPU. You could upload programs to it that would draw into the video memory autonomously.
The Amiga would also be a nice candidate: it had the 'copper' co-processor, which could run simple programs, and the blitter chip, which could draw lines and perform flood fill.
The blitter could be programmed to render polygons by first drawing the outline and then flood-filling it. Such a program could be executed by the copper, so you could consider the copper+blitter combination an early form of a GPU: you could batch up graphics tasks with the CPU into copper lists, and they would execute without any intervention from the CPU.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 28 of 50, by spiroyster

User metadata
Rank Oldbie

And then there's GPGPU (or is that, like, really pre-2010? o.0) aka CUDA cores!

[Edit]: Ah, didn't realise GPGPU had been mentioned already... too many Gs followed by Ps followed by Us... and then C looks a lot like G, which is also confusing. Need coffee.

Even before the advent of GPGPU, the standard depth behaviour of rasterized images could be used for a 'picker' to deduce which object is at which point: draw each primitive in a different colour, render to an offscreen buffer, and then query the image. GL even had standard functionality for this, although I think it's been deprecated by now.
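Something like this, roughly (an illustrative sketch, not GL's built-in selection mode; draw_object is a placeholder):

#include <GL/gl.h>

void draw_object(int id);   /* placeholder: renders object 'id' */

/* Returns 0 for background, otherwise the hit object's index + 1. */
unsigned pick(int x, int y, int viewport_h, int num_objects)
{
    glDisable(GL_LIGHTING);     /* the ID colours must reach the buffer */
    glDisable(GL_TEXTURE_2D);   /* completely unmodified                */
    glDisable(GL_DITHER);
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    for (int id = 0; id < num_objects; ++id) {
        unsigned key = (unsigned)id + 1;    /* 0 is reserved for 'no hit' */
        glColor3ub(key & 0xFF, (key >> 8) & 0xFF, (key >> 16) & 0xFF);
        draw_object(id);
    }

    GLubyte rgb[3];
    /* GL's window origin is bottom-left, so flip the mouse y. */
    glReadPixels(x, viewport_h - y - 1, 1, 1,
                 GL_RGB, GL_UNSIGNED_BYTE, rgb);
    return rgb[0] | ((unsigned)rgb[1] << 8) | ((unsigned)rgb[2] << 16);
}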

By texturing geometry with patches and then raycasting against the rasterized image, radiosity could be accelerated (not real-time, though), so a GPU could be used for non-visual/lightmap processing while still not being 'general purpose'.

3dfx was started by three ex-Silicon Graphics (SGI) employees, who took the expertise built for high-paying government-type clients and mass-marketed it for the up-and-coming accelerated 3D games revolution (they timed it just right!).

Reply 29 of 50, by spiroyster

User metadata
Rank Oldbie
gdjacobs wrote:

Well, certainly there were cel-shaded titles on console platforms which wouldn't be possible with the Direct3D 7 vertex pipeline.

I didn't think it was possible until I saw it myself. This effect can be done even with OpenGL 1.2, by clever multi-pass use of the stencil buffer. While not quite the 'toon' effect you get these days, it allowed production of arty, non-photorealistic effects (watercolour/architectural renderings) beyond specular and diffuse.

Reply 30 of 50, by Scali

User metadata
Rank l33t
spiroyster wrote:

Even before the advent of GPGPU, the standard depth behaviour of rasterized images could be used for a 'picker' to deduce which object is at which point: draw each primitive in a different colour, render to an offscreen buffer, and then query the image. GL even had standard functionality for this, although I think it's been deprecated by now.

By texturing geometry with patches and then raycasting against the rasterized image, radiosity could be accelerated (not real-time, though), so a GPU could be used for non-visual/lightmap processing while still not being 'general purpose'.

Another cool trick is that you could do screen-space CSG by making creative use of the stencil and depth buffers, and rendering your geometry in a specific order.
Any of the common operations (union, intersection and difference, and combinations thereof) could be handled entirely by a DX7-class graphics card.
Union is trivial: that's just standard rendering with a z-buffer. Intersection and difference could be done by marking pixels in the stencil buffer (very similar to how shadow volumes were done in Doom3 and related engines).
Here is my proof-of-concept, written on a GeForce2 at the time:
https://youtu.be/PHYF51Asav8
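The heart of the stencil trick, very condensed (an illustrative sketch of just the inside/outside classification for 'A minus B', not the full routine; draw_mesh_A/draw_mesh_B are placeholders):

#include <GL/gl.h>

void draw_mesh_A(void);   /* placeholders for the two CSG operands */
void draw_mesh_B(void);

void classify_A_minus_B(void)
{
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_STENCIL_TEST);
    glClear(GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

    /* Pass 1: lay down A's depth only. */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    draw_mesh_A();

    /* Pass 2: toggle the stencil wherever a surface of B lies in front
       of A's depth. An odd crossing count means that pixel of A sits
       inside the volume of B. */
    glDepthMask(GL_FALSE);
    glDisable(GL_CULL_FACE);              /* need B's front AND back faces */
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_INVERT);
    draw_mesh_B();
    glDepthMask(GL_TRUE);

    /* Pass 3: shade A only where the stencil stayed 0 (outside B). */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glStencilFunc(GL_EQUAL, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glDepthFunc(GL_EQUAL);                /* re-draw exactly A's surface */
    draw_mesh_A();
    glDepthFunc(GL_LESS);
}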

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 31 of 50, by spiroyster

User metadata
Rank Oldbie
Scali wrote:
spiroyster wrote:

Even before the advent of GPGPU, the standard depth behaviour of rasterized images could be used for a 'picker' to deduce which object is at which point: draw each primitive in a different colour, render to an offscreen buffer, and then query the image. GL even had standard functionality for this, although I think it's been deprecated by now.

By texturing geometry with patches and then raycasting against the rasterized image, radiosity could be accelerated (not real-time, though), so a GPU could be used for non-visual/lightmap processing while still not being 'general purpose'.

Another cool trick is that you could do screen-space CSG by making creative use of the stencil and depth buffers, and rendering your geometry in a specific order.
Any of the common operations (union, intersection and difference, and combinations thereof) could be handled entirely by a DX7-class graphics card.
Union is trivial: that's just standard rendering with a z-buffer. Intersection and difference could be done by marking pixels in the stencil buffer (very similar to how shadow volumes were done in Doom3 and related engines).
Here is my proof-of-concept, written on a GeForce2 at the time:
https://youtu.be/PHYF51Asav8

😀 Fair play, that's pretty cool.... Heaven7 HD. 😀

Reply 32 of 50, by Scali

User metadata
Rank l33t

I actually designed that routine for a sort of 'variation' of the 'object picker' you mentioned.
Namely, I was working on a CAD/CAM application, where we wanted an estimation of the amount of flow through drill holes.
For this, we wanted an estimation of the 'area' of the hole when looking straight down it.
Since these drill holes are quite complex (a single drill hole is a cylinder with a cone on top; holes can be 'stepped' by drilling from wide to narrow diameter; and holes can meet up), an analytical solution would be very complex, and basically unsolvable.
So I went looking for some kind of estimation, like cell-classification/FEA.

I figured that if I could render the hole in 2D, I could just count the number of pixels of a particular colour, as a simple 2D classification algorithm.
So I needed CSG to be able to model any kind of drill hole (that's why my test object is a cylinder with cylinders subtracted from it; they're my 'drill holes'). And I found that doing it on the GPU was extremely fast and gave me a stable, pixel-accurate result.
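The counting side of it is about this simple (an illustrative sketch, not the original code; it assumes an orthographic view straight down the hole, with the hole rendered in pure red):

#include <GL/gl.h>
#include <stdlib.h>

/* Returns the estimated cross-section area of the hole, given that the
   hole pixels were rendered in pure red. */
double hole_area(int w, int h, double units_per_pixel)
{
    GLubyte *buf = malloc((size_t)w * h * 3);
    long count = 0;

    glPixelStorei(GL_PACK_ALIGNMENT, 1);   /* tightly packed rows */
    glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, buf);

    for (long i = 0; i < (long)w * h; ++i)
        if (buf[i*3] == 255 && buf[i*3+1] == 0 && buf[i*3+2] == 0)
            ++count;                       /* classified as 'hole' */

    free(buf);

    /* Each pixel covers units_per_pixel^2 of the section plane. */
    return count * units_per_pixel * units_per_pixel;
}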

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 33 of 50, by spiroyster

User metadata
Oldbie
Scali wrote:

I actually designed that routine for a sort of 'variation' of the 'object picker' you mentioned.
Namely, I was working on a CAD/CAM application, where we wanted an estimation of the amount of flow through drill holes.
For this, we wanted an estimation of the 'area' of the hole when looking straight down it.
Since these drill holes are quite complex (a single drill hole is a cylinder with a cone on top; holes can be 'stepped' by drilling from wide to narrow diameter; and holes can meet up), an analytical solution would be very complex, and basically unsolvable.
So I went looking for some kind of estimation, like cell-classification/FEA.

I figured that if I could render the hole in 2D, I could just count the number of pixels of a particular colour, as a simple 2D classification algorithm.
So I needed CSG to be able to model any kind of drill hole (that's why my test object is a cylinder with cylinders subtracted from it; they're my 'drill holes'). And I found that doing it on the GPU was extremely fast and gave me a stable, pixel-accurate result.

This is a really cool use of older hardware to accelerate non-standard workloads. I wonder what kind of detailed triangulations could be present and still get decent FPS. Ironically, I will soon be implementing some CSG tools and workflows (me work in CAD (AEC) too 🤣) and this is giving me some good ideas for the drawing. We support two main display drivers in our framework: GL 4.5 (currently investigating Vulkan) and GL 1.2 (WinXP P4 users 😵). I was contemplating how best to mitigate this obvious performance difference for our 'legacy' users. Got some ideas now, thanks.. 😀 😀 😀

CAD + D3D o.0

Reply 34 of 50, by Scali

User metadata
Rank l33t
spiroyster wrote:

This is a really cool use of older hardware to accelerate non-standard workloads.

Well, at the time I developed the routine, a GF2 was state-of-the-art 😀

spiroyster wrote:

I wonder what kind of detailed triangulations could be present and still get decent FPS.

The high-poly example cylinder you see in the video, with a bunch of cylinders subtracted from it, ran at a few hundred FPS in 640x480 on a GF2 GTS. Around 250-300 fps, if memory serves.
The specular highlights give an idea of just how high-poly the thing is; it's vertex-lit, of course, not per-pixel.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 35 of 50, by spiroyster

User metadata
Oldbie
Scali wrote:
spiroyster wrote:

This is a really cool use of older hardware to accelerate non-standard workloads.

Well, at the time I developed the routine, a GF2 was state-of-the-art 😀

spiroyster wrote:

I wonder what kind of detailed triangulations could be present and still get decent FPS.

The high-poly example cylinder you see in the video, with a bunch of cylinders subtracted from it, ran at a few hundred FPS in 640x480 on a GF2 GTS. Around 250-300 fps, if memory serves.
The specular highlights give an idea of just how high-poly the thing is; it's vertex-lit, of course, not per-pixel.

At some angles, some of that tessellation looks spot on. Are the subtracting cylinders of varying poly detail? Or are they all similar, uniformly distributed meshes?

Reply 36 of 50, by spiroyster

User metadata
Oldbie

P.S. Don't know if it's just me, but the link to the binaries times out?

[EDIT]: Apologies, my spelling really is awful. I blame IntelliSense; it propagates bad spelling, but it is consistent o.0

Last edited by spiroyster on 2017-03-08, 12:58. Edited 1 time in total.

Reply 37 of 50, by Scali

User metadata
Rank l33t
spiroyster wrote:

At some angles, some of that tessellation looks spot on. Are the subtracting cylinders of varying poly detail? Or are they all similar, uniformly distributed meshes?

All cylinders are pre-generated static meshes. I varied the tessellation factors for each one when generating them, so some are really high-poly and round, others are clearly 'faceted'.

spiroyster wrote:

P.S. Don't know if it's just me, but the link to the binaries times out?

Yeah, that server went down a few years ago.
I'll have to see if I can find the files somewhere, and then I can host them on Dropbox.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 38 of 50, by spiroyster

User metadata
Oldbie

Thanks for that, I think I've watched it enough 😐. One more question: the resolution of the offscreen texture, if I'm not mistaken, changes? Or is it constant, and the strange artefacts on the reversed 'surfaces' I can see 'through the hole' are the texture filtering at different angles/projections?

[EDIT]: Looks like it could be filtering, vaguely reminiscent of texel boundary evaluation flipping out o.0

Reply 39 of 50, by Scali

User metadata
l33t
spiroyster wrote:

Thanks for that, I think I've watched it enough 😐. One more question: the resolution of the offscreen texture, if I'm not mistaken, changes? Or is it constant, and the strange artefacts on the reversed 'surfaces' I can see 'through the hole' are the texture filtering at different angles/projections?

[EDIT]: Looks like it could be filtering, vaguely reminiscent of texel boundary evaluation flipping out o.0

I don't use any special offscreen textures, just a standard back buffer the same size as the screen buffer (and a depth/stencil surface to match).
The aliasing could either be YouTube compression, or perhaps a bug I once had in the windowed mode of my engine, where it didn't resize the offscreen buffer to the exact size of the client area.
The rendering itself is as stable as the rasterizer of your GPU allows, and the GF2 rasterizer was very good. The z-buffer was 24-bit here (24S8 was the only option that gave me the 8-bit stencil I needed).
So I don't recall any z-fighting or such, really.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/