Here's something. Rendering a rotating textured mesh in OpenGL vs. Direct3D, with polycount on the x-axis and FPS on the y-axis:
The attachment tgwlpi.png is no longer available
These are my own renderers, so there's no guarantee that they behave like commercial engines or that they don't do something odd. I tried both Direct3D 5 and Direct3D 7 with the 100-poly mesh and got identical FPS, so I did the rest of the tests using Direct3D 7.
In effect, the G400 and the G550 behave almost identically in OpenGL, but in Direct3D the G400 is considerably faster up to somewhere between 1000 and 5000 polys, and after that they perform the same. The same pattern shows up in OpenGL as well, but it is much less pronounced.
What's causing this? The renderer isn't using strips, and I also tested a displaced version of the mesh where no vertices were shared and got the same FPS.
This could explain why low-poly games like Homeworld and Formula 1 97, ones that also don't seem to be using multitexturing, run better on the G400, so long as the CPU is fast enough to push the triangles. The performance difference in Homeworld in Direct3D is about 20% in favor of the G400 but virtually even in OpenGL.
It may also explain the lower texture benchmark result for the G550 in 3DMark2000, since that test renders a tube mesh to display the textures on. No idea what polycount it has.
Since the size of the mesh on the screen stays the same regardless of polycount, the individual triangles get smaller on screen as the polycount goes up. So it could be a matter of polygon size in screen space, whether that has to do with texture sampling, some other kind of interpolation during rasterization, caching, or something else. Or maybe some driver setting has enabled a quality-compromising optimization for Direct3D on the G400.
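To put rough numbers on the polygon-size idea, here is a small sketch of the arithmetic. The 300x300 pixel footprint and the guess that about half the triangles face the camera are placeholders for illustration, not measurements from these tests, and the function name is made up.

```python
def avg_triangle_area_px(covered_pixels, polycount, visible_fraction=0.5):
    """Average on-screen area (in pixels) of one visible triangle.

    covered_pixels   -- screen area the mesh occupies (assumed constant)
    polycount        -- total triangles in the mesh
    visible_fraction -- share of triangles facing the camera (assumed 0.5
                        for a closed, back-face-culled mesh)
    """
    if polycount <= 0:
        raise ValueError("polycount must be positive")
    if not 0 < visible_fraction <= 1:
        raise ValueError("visible_fraction must be in (0, 1]")
    return covered_pixels / (polycount * visible_fraction)


# Hypothetical footprint: the mesh fills roughly 300x300 pixels.
FOOTPRINT = 300 * 300

for polys in (100, 1000, 5000, 20000):
    area = avg_triangle_area_px(FOOTPRINT, polys)
    print(f"{polys:>6} polys -> about {area:.1f} px per visible triangle")
```

Under these assumptions the average triangle shrinks from well over a thousand pixels at 100 polys to a few dozen pixels at 5000, so the range where the two cards diverge is also the range where the triangles are large on screen. That fits the screen-space-size guess but doesn't prove it.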