Putas wrote:That was the norm, 16-bit was never a proper graphics standard. The interesting question is which 3d accelerators, if any, were internally capped to 16 bits only?
The confusion probably stems from the difference between the ALU used for rendering operations and the data formats used for textures and framebuffers.
Just like with CPUs, once you have a 32-bit ALU, performing 32-bit calculations is no slower than performing 16-bit ones.
Therefore, video chips would process textures, lighting etc. at 32-bit internally anyway. The thing is just that loading a 16-bit texture was twice as fast as loading a 32-bit texture, because you only had to send half the data over the same memory interface.
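To make that concrete, here's roughly what the fixed-function 16-bit -> 32-bit expansion on texture fetch amounts to, sketched in C. The bit-replication trick is one common way to map 5/6-bit channels onto the full 0..255 range; actual chips varied in the exact method:

#include <stdint.h>

/* Expand an RGB565 texel to 8:8:8:8, the kind of conversion the
 * texture unit does implicitly on fetch. Replicating the top bits
 * into the low bits maps 0..31 (or 0..63 for green) onto 0..255. */
static uint32_t rgb565_to_rgba8888(uint16_t texel)
{
    uint32_t r5 = (texel >> 11) & 0x1F;
    uint32_t g6 = (texel >>  5) & 0x3F;
    uint32_t b5 =  texel        & 0x1F;

    uint32_t r8 = (r5 << 3) | (r5 >> 2);  /* 5 bits -> 8 bits */
    uint32_t g8 = (g6 << 2) | (g6 >> 4);  /* 6 bits -> 8 bits */
    uint32_t b8 = (b5 << 3) | (b5 >> 2);

    return (0xFFu << 24) | (r8 << 16) | (g8 << 8) | b8;  /* A = 255 */
}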
Likewise, storing data to a 16-bit framebuffer was twice as fast as to a 32-bit one. The dithering was just a fixed-function unit, pipelined into the design, so it didn't result in longer rendering times, just like fetching data from a texture implicitly converted it to 32-bit 'for free'.
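And the other direction, a 32-bit -> 16-bit store with ordered dithering. The 4x4 Bayer matrix and the threshold scaling here are one common choice for illustration, not any particular chip's implementation:

#include <stdint.h>

/* Quantize an 8:8:8 pixel down to RGB565 with a 4x4 ordered-dither
 * matrix, the kind of fixed-function step a 16-bit framebuffer
 * write would pass through. */
static const uint8_t bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

static uint16_t rgb888_to_rgb565_dithered(uint8_t r, uint8_t g, uint8_t b,
                                          int x, int y)
{
    /* Threshold scaled to each channel's quantization step:
     * 8 for the 5-bit channels (256/32), 4 for the 6-bit green. */
    int t = bayer4[y & 3][x & 3];
    int r5 = (r + (t >> 1)) >> 3;  if (r5 > 31) r5 = 31;
    int g6 = (g + (t >> 2)) >> 2;  if (g6 > 63) g6 = 63;
    int b5 = (b + (t >> 1)) >> 3;  if (b5 > 31) b5 = 31;
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}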
Now, all conventional 3D accelerators just load texels, perform the lighting/blending operations on them, then perform the blending with the framebuffer (if any), and store the result, all on a per-pixel basis.
So you get 16-bit -> 32-bit -> 16-bit.
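Putting the two helpers above together, a blended pixel on such a pipeline looks something like this. The function name and signature are made up for illustration, not any real driver API; the point is that the destination gets re-quantized on every framebuffer write, which is exactly where the banding in multi-pass 16-bit rendering came from:

/* One blended pixel on a conventional 16-bit pipeline: expand the
 * destination to 32-bit, blend in 32-bit, dither back down to 16. */
void blend_pixel_16bit(uint16_t *fb, int pitch, int x, int y,
                       uint32_t src32 /* already-shaded XRGB color */,
                       uint32_t alpha /* 0..255 */)
{
    uint32_t dst32 = rgb565_to_rgba8888(fb[y * pitch + x]); /* 16 -> 32 */

    /* Classic src-alpha blend, per 8-bit channel (B, G, R). */
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t s = (src32 >> shift) & 0xFF;
        uint32_t d = (dst32 >> shift) & 0xFF;
        out |= ((s * alpha + d * (255 - alpha)) / 255) << shift;
    }

    fb[y * pitch + x] = rgb888_to_rgb565_dithered(
        (out >> 16) & 0xFF, (out >> 8) & 0xFF, out & 0xFF,
        x, y);                                              /* 32 -> 16 */
}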
PowerVR is the exception to the rule here. Because it is a tile-based renderer, it doesn't actually render to video memory; it renders to the tile cache, which is always a 32-bit buffer. So it only goes 16-bit -> 32-bit (if you use 16-bit textures, that is). The pixels remain 32-bit in the tile cache, and blend operations are also done in 32-bit.
It only does 32-bit -> 16-bit once the tile has finished rendering and is stored to the final framebuffer in video memory.
This only works because the tile cache is much faster than video memory. For conventional renderers this wouldn't make sense: they would have to put the 'temporary' 32-bit framebuffer in video memory as well, in which case they'd run into the same bandwidth bottlenecks as full 32-bit rendering, and would only be slower for it.
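For contrast, here's the tile-based flow sketched the same way, reusing the dither helper from above: all blending stays in the small on-chip 32-bit tile buffer, and the 32-bit -> 16-bit conversion runs exactly once per pixel, when the finished tile is resolved out. The tile size and layout are assumptions for illustration and don't match any particular PowerVR part:

#define TILE_W 32
#define TILE_H 32

static uint32_t tile[TILE_H][TILE_W];  /* stand-in for the tile cache;
                                        * all blends happen here, in 32-bit */

/* Write a finished tile to the 16-bit framebuffer in video memory.
 * This is the only place the 32 -> 16 quantization ever happens. */
void resolve_tile(uint16_t *fb, int pitch, int tile_x, int tile_y)
{
    for (int y = 0; y < TILE_H; y++) {
        for (int x = 0; x < TILE_W; x++) {
            uint32_t p = tile[y][x];   /* still full 32-bit precision */
            int fx = tile_x * TILE_W + x;
            int fy = tile_y * TILE_H + y;
            fb[fy * pitch + fx] = rgb888_to_rgb565_dithered(
                (p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF, fx, fy);
        }
    }
}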