Reply 21760 of 29599, by 386SX
Lately I'm still fighting with my mini-ITX Atom system (I know it's a pointless effort.. 😁), my usual Atom Cedar Trail SoC with its GMA (PowerVR SGX545) GPU, and I'm still trying to understand why its speed is so low/variable.
The more I read old discussions, the official PowerVR dev forum and the public architecture papers explaining this series of low-power, mobile-oriented GPUs, the more I wonder whether back in those years (almost a decade ago) this GPU had such a different architecture from the usual desktop ones that it simply couldn't be compared to other GPUs at first glance.
Many people were blaming the original GPU speed, but what I think I have understood is that this GPU was scalable: the licensing company chose the configuration for what they needed, I suppose depending on cost, power demand and core size at the manufacturing level. So the quite low bench results aren't necessarily a problem of the design itself, but rather of which version was integrated into the SoC, while still carrying the same SGX545 name. Also, this specific architecture seems like it would have needed specific optimization to unleash its performance, I wonder if not necessarily at the driver level but more at game development, which was of course oriented to the usual desktop architectures.

The common "unified shaders" logic seems different here: from GPU-Z and some user discussions it might be counted as 4 "unified shaders" (USSEs), 4 TMUs and 2 "ROPs" (while it has been said they can't really be called that on this GPU), running @ 400MHz and using the 1066MHz DDR3 single-channel shared memory, from a 64MB minimum up to 1.5GB or whatever..
In a few discussions on the official forum, the theoretical numbers given for the SGX545 @ 400MHz were:
"800 Mpixels/sec textured fillrate (2 textured pixels per clock)
6.4 Gflops (4 flops per clock per USSE, 4 USSEs total)
80 M transformed triangles/sec (5 clocks per triangle)"
( https://forums.imgtec.com/t/sgx545-vs-amd-nvidia/1838/2 )
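Just to sanity-check those quoted figures against the configuration above, here's a quick sketch (assuming the 4 USSE / 400MHz numbers from GPU-Z and the forum are right):

```python
# Rough sanity check of the quoted SGX545 theoretical peaks.
# Assumes the reported configuration: 4 USSEs, 400 MHz core clock.
clock_hz = 400e6

fillrate  = 2 * clock_hz        # 2 textured pixels per clock -> 800 Mpixels/s
flops     = 4 * 4 * clock_hz    # 4 flops/clock per USSE, 4 USSEs -> 6.4 GFLOPS
triangles = clock_hz / 5        # 5 clocks per triangle -> 80 Mtriangles/s

print(f"{fillrate/1e6:.0f} Mpix/s, {flops/1e9:.1f} GFLOPS, {triangles/1e6:.0f} Mtri/s")
```

All three quoted numbers fall straight out of the clock and the per-clock rates, so at least the forum figures are internally consistent.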
The whole discussion is really interesting for understanding that it probably wasn't a fault of the GPU itself but of the version it was configured as for a SoC that was more oriented to "light desktop/notebook" use, and at the dawn of Win 8 it probably didn't have enough muscle to render both the newer GUIs and DX10 games at the expected levels.
My numbers on a Win 8.1 x86 config, with 4GB (3GB usable) DDR3 single-channel on a single DIMM, alongside the SoC's dual-core 1.9GHz SSSE3 CPU: 3DMark2001 3900 points (4100 on Win 7), 3DMark03 1930 points, 3DMark05 750 points, 3DMark06 420 points.
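For context, that single-channel DDR3-1066 gives the whole SoC only about 8.5 GB/s of shared bandwidth. A back-of-the-envelope sketch (the 64-bit bus width is just the standard single-channel DDR3 DIMM width, not something I've measured):

```python
# Theoretical peak bandwidth of single-channel DDR3-1066,
# shared between CPU and GPU on this SoC.
transfers_per_s = 1066e6      # DDR3-1066 = 1066 MT/s
bus_width_bytes = 64 // 8     # standard 64-bit single-channel DIMM
peak_bw = transfers_per_s * bus_width_bytes
print(f"{peak_bw/1e9:.1f} GB/s")   # ~8.5 GB/s
```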
The WDDM 1.1 driver doesn't help, imho, on a modern unsupported OS, but it still works. What's interesting is that these low numbers still aren't CPU-limited even with such a low-power CPU, so they're mostly GPU-limited. But the synthetic numbers are more interesting: a strangely low fill rate that reaches 650 Mtexels/s in single texturing and, I can't understand why (and probably a reason for its low speed in old games), a much lower multi-texturing number around 250 Mtexels/s. The triangle rate is 24 Mtriangles/s at best and 10 Mtriangles/s in complex lighting scenes. The theoretical DX10-level compatibility never saw a driver, so it stayed at DX 9.3, even if the hardware probably went beyond that anyway. The H.264/VC-1 decoding engine can easily handle 1080p 60fps test videos.
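Putting my synthetic numbers next to the quoted theoretical peaks gives a rough efficiency picture (just my measurements divided by the forum figures; note I'm comparing the multi-texturing result against the same 800 peak, which may not be the right baseline for this architecture):

```python
# My 3DMark synthetic results vs the quoted theoretical peaks.
peak_fill = 800   # Mpix/s (quoted: 2 textured pixels/clock @ 400 MHz)
peak_tri  = 80    # Mtriangles/s (quoted: 5 clocks/triangle)

meas_fill_single = 650   # Mtexels/s, single texturing
meas_fill_multi  = 250   # Mtexels/s, multi texturing
meas_tri_best    = 24    # Mtriangles/s, best case

print(f"single fill: {100*meas_fill_single/peak_fill:.0f}% of peak")  # ~81%
print(f"multi fill:  {100*meas_fill_multi/peak_fill:.0f}% of peak")   # ~31%
print(f"triangles:   {100*meas_tri_best/peak_tri:.0f}% of peak")      # ~30%
```

So single texturing gets reasonably close to the theoretical peak, while multi-texturing and geometry sit around a third of it, which fits my impression that old desktop-style games hit this GPU exactly where it's weakest.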
What really puts this GPU on a different level is the power demand, as measured from a wall plug meter. It looks like its power demand reaches a maximum of about 3 watts of difference from "idle" to "stress", and at the idle desktop it probably asks for something like 1 watt, maybe less. So I'd guess something like a 3.5 watt GPU in the worst scenarios, which doesn't make much difference even in the heaviest benchmarks. I suppose the biggest problem of this SoC was that the CPU itself, and I suppose the memory controller or whatever, took most of the SoC die area @ 32nm, leaving not much space for the choice of GPU configuration to integrate, which might otherwise have scaled up to many more unified pipelines.
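For fun, those wall-meter guesses expressed as efficiency (all inputs are my own rough estimates from above, not datasheet values):

```python
# Rough efficiency estimate from the wall-meter numbers above
# (all values are my own guesses, not datasheet figures).
gpu_power_w = 3.5     # ~1 W idle + ~3 W idle-to-stress delta, rounded
peak_gflops = 6.4     # quoted theoretical peak for SGX545 @ 400 MHz
print(f"~{peak_gflops / gpu_power_w:.1f} GFLOPS per watt")   # ~1.8
```

Not impressive by modern standards, but for a passively-cooled ~2011 SoC GPU sharing a single-channel memory bus, it's a different design target entirely.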