Reply 20 of 22, by Dege
Thanks for offering a bounty, but unfortunately I have no idea at the moment how to fix it. I guess it can't be fixed outside of dgVoodoo either, because it's not a "bug". Probably GPUs still supported row-major render targets in the DX8/9 days, so drivers could easily create one, map that piece of VRAM into the CPU address space and return a pointer directly to it. But once row-major RT support was dropped, drivers could do nothing other than copy the content of the RT into a plain row-major placeholder buffer and return a pointer to that instead, and of course copy the plain buffer back to the RT when the GPU is about to draw into it. Moving that large amount of memory back and forth per Lock/Draw is expensive, and I think that's what shows up as 5s on discrete GPUs and even 2s on iGPUs.
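To make the copy round trip concrete, here is a minimal CPU-only sketch of it. Everything in it is an assumption for illustration: the 4x4 tile layout, and the names `tiledOffset`, `lockSurface` and `unlockSurface` are made up. Real GPU layouts are vendor-specific, and this is not dgVoodoo or driver code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tiled layout: the surface is split into TILE x TILE pixel
// blocks and each block is stored contiguously. Real layouts differ per GPU.
constexpr std::size_t TILE = 4;

// Offset of pixel (x, y) inside the hypothetical tiled surface.
std::size_t tiledOffset(std::size_t x, std::size_t y, std::size_t width) {
    assert(width % TILE == 0);  // the sketch only handles tile-aligned widths
    const std::size_t tilesPerRow = width / TILE;
    const std::size_t tileIndex = (y / TILE) * tilesPerRow + (x / TILE);
    const std::size_t inTile = (y % TILE) * TILE + (x % TILE);
    return tileIndex * TILE * TILE + inTile;
}

// "Lock": copy the whole tiled render target into a plain row-major
// placeholder buffer that the application can address as y * width + x.
std::vector<std::uint32_t> lockSurface(const std::vector<std::uint32_t>& tiled,
                                       std::size_t width, std::size_t height) {
    std::vector<std::uint32_t> linear(width * height);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            linear[y * width + x] = tiled[tiledOffset(x, y, width)];
    return linear;
}

// Before the next draw: copy the whole placeholder back into the tiled
// render target, no matter how few pixels the application changed.
void unlockSurface(const std::vector<std::uint32_t>& linear,
                   std::vector<std::uint32_t>& tiled,
                   std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            tiled[tiledOffset(x, y, width)] = linear[y * width + x];
}
```

Even if the application writes a single pixel through the returned pointer, both functions still walk the full surface, which is why the cost scales with the RT size per Lock/Draw pair.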
I have an idea that could help for iGPUs (of course it's not a perfect solution, but it could make it somewhat faster). And I think I could do the same for dGPUs with the GPU upload heap.
But the sad thing is that I can't test the GPU upload heap: I realized that D3D12 is not updated in Win10 anymore, so the Agility SDK DLLs won't get into it over time. I must install Win11, and even then I'd need a GF 3xxx-class video card because resizable BAR is supported starting from that. 🙁
Anyway, I'll try the iGPU version when I have some time (it requires some code refactoring and such).