Greetings Irinikus! 😀
Irinikus wrote on 2023-09-05, 14:11:
I'm definitely going to try a GTX Titan Z in this system when the opportunity to purchase one presents itself! (I'll set it up in the same way as I've set up the GTX 690!)
The only thing that bothers me is that the Titan Z is clocked at a lower frequency than the GTX 690, and if you take a look at the GPU usage that I got in Crysis, the Titan Z may actually offer reduced performance in this case!?
As someone who had a GTX 690 (and still has it in a box somewhere) and upgraded to a Titan Black back in the day, I predict the Titan Z will smoke the 690 (never had a Titan Z though).
The Titan Black was defo an upgrade over the 690 (while I do love my 690). In some specialised conditions the 690 could very well beat a Titan (non-Black version), but that workload would have to make maximum use of SLI, have minimal VRAM requirements (due to the limited, non-pooled VRAM on the 690) and saturate the entire CUDA core capacity (without capitalising on the higher ROP and TMU count of the Titan). While the Titan's clocks might be lower, the extra ROPs and TMUs allow greater raster throughput, and coupled with the higher CUDA core count, the reduced clock speed becomes a poor metric to gauge them against each other, as shader execution throughput can still outperform the higher-clocked, lower-core-count hardware. Not to mention the Titan's wider memory bus too.
The Titan Black was a higher-clocked Titan and so would be even better at beating the 690... which it was (the Titan Black gave noticeably smoother and higher FPS in benchmarks such as "Unigine Heaven", without stutter, than the 690 utilising its SLI).
It's my understanding that the Titan Z is more like dual Titan Blacks than dual Titans (although according to some sources this is a bit grey, and it may even be two slightly hindered Titan cores), so I would assume that even 'half' of the Titan Z would match or outperform the 690 in most (if not all) things.
Also, I think your screenshots clearly show a CPU bottleneck (the GPU is waiting on the CPU), why though I don't know. IIRC Crysis could use a max of 4 cores, but is probably heavily limited by single-core throughput. Can't remember what FPS Crysis gave me with the GTX 690, but it should be a lot higher than 25 😉. I was running it on dual X5690s (X58 system), very playable at 1920 x 1080 (50+ FPS iirc) and even playable at 2560 x 1600 at the time, if memory serves.
Irinikus wrote on 2023-09-05, 17:51:
The Serpent Rider wrote on 2023-09-05, 17:42:
It doesn't. Titan has almost twice the amount of CUDA cores.
How is GPU usage calculated?
That is a very good question!
I think the use of the word 'usage' on these overlays is a bit of a red herring for indicating whether your GPU is at capacity or not; rather, it indicates how much the CPU is waiting on the GPU (so 'usage' in this case means two different things for CPU and GPU). CPU usage shows how much of the total hardware is currently utilised, whereas for the GPU it actually shows how long the GPU is busy over time, regardless of whether the GPU is at full capacity.
e.g. On a dual core, a single-threaded program could fully saturate just a single core, so the usage is presented as 50%, yet it is bottlenecking due to the limit of single-core performance. Using both cores at max would be technically (and presented as) 100%.
For the GPU it's different: a shader that does not fully saturate the GPU architecture (i.e. the NV warps or AMD wavefronts), so hardware usage is not 100%, could still be bottlenecking as the CPU is having to wait for the GPU to finish, even though only a small percentage of total hardware performance is utilised... but this is presented as '100% usage'.
So in both cases there is a bottleneck, and in both cases neither piece of hardware is at full capacity, but they present different values for 'usage'.
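To make that concrete, here's a minimal sketch (in Python) of how the two figures are typically derived. The numbers are hypothetical, just a made-up 1-second sampling window, not pulled from any real overlay:

# CPU usage: busy time summed across all cores, divided by (cores x window).
# A single-threaded game pegging one core of a dual core reads as 50%.
window = 1.0                           # seconds
core_busy = [1.0, 0.0]                 # core 0 fully busy, core 1 idle
cpu_usage = sum(core_busy) / (len(core_busy) * window) * 100
print(f"CPU usage: {cpu_usage:.0f}%")  # -> 50%

# GPU 'usage': fraction of the window during which *any* work was executing,
# regardless of how many shader cores that work actually occupied.
gpu_busy = 1.0                         # something was in flight the whole second
gpu_usage = gpu_busy / window * 100
print(f"GPU usage: {gpu_usage:.0f}%")  # -> 100%, even if most of the chip sat idle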
Also note, I think most of these overlays rely on vendor-specific telemetry for any detailed information about GPU usage, and while it probably isn't lying... it's proprietary, so we don't exactly know what it is reporting wrt usage and capacity, or how it reaches its conclusion. The GPU architecture doesn't fit all usage scenarios: it's highly multi-threaded, so it can only really be saturated with embarrassingly parallel problems, and even then, race conditions/concurrency, overall process design wrt scheduling (driver-side AND client-side, that is) and order of operations throw yet more spanners into the works of being able to fully utilise a GPU's capacity. It's hard to keep a GPU's workload at optimum full capacity all the time and in most cases it's actually impossible, given the way the architecture needs to be used.
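For what it's worth, NVIDIA's NVML library (what nvidia-smi reads, and I assume what most overlays ultimately query, though that part is my assumption) defines its GPU utilisation as the percentage of time over the last sample period during which one or more kernels was executing on the GPU, i.e. busy time, not occupancy. A quick Python sketch using the pynvml bindings, if you want to poll it yourself:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)    # first GPU in the system

# utilization.gpu    = % of the sample period with at least one kernel running
# utilization.memory = % of the sample period the memory controller was busy
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"GPU busy: {util.gpu}%  /  memory controller busy: {util.memory}%")

pynvml.nvmlShutdown()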
TL;DR CPU usage shows 'how much is used', GPU usage shows 'how long it takes'...