VOGONS


Reply 20 of 22, by Dege

Rank l33t

Thx for offering a bounty, but unfortunately at the moment I have no idea how to fix it. I guess it cannot be fixed outside of dgVoodoo either, because it's not a "bug". Probably the thing is that GPUs still supported row-major render targets in the DX8/9 days, so drivers could easily create one, map that piece of VRAM into the CPU address space and return a pointer directly to it. But once row-major RT support was dropped, drivers could do nothing other than copy the content of the RT into a plain row-major placeholder buffer and return a pointer to that instead, and of course copy the plain buffer back to the RT when the GPU is about to draw into it. Moving that large amount of memory back and forth per Lock/Draw is expensive, and I think that's what appears as 5s on discrete GPUs and even 2s on iGPUs.

I have an idea that could help for iGPUs (of course it's not a perfect solution, but it could make it somewhat faster). And I think I could do the same for dGPUs with GPU upload heap.
But the sad thing is that I can't test GPU upload heap: I realized that D3D12 is not updated in Win10 anymore, so the Agility SDK DLLs won't get into it over time. I must install Win11, and even then I'd need a GF 3xxx class video card because resizable BAR is supported starting from that. 🙁
Anyway, I'll try the iGPU version when I have some time (requires some code refactoring and such).
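
To make the Lock/Draw round trip described above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not dgVoodoo or driver code: the 4x4 tile size, the `TiledRenderTarget` class and the `tiled_index` helper are all hypothetical, and real GPU memory layouts are vendor-specific. It only shows that each lock has to copy the whole surface into a row-major buffer, and each following draw has to copy it back, no matter how little the CPU actually touched.

```python
TILE = 4  # hypothetical tile edge in pixels; real layouts are vendor-specific


def tiled_index(x, y, width):
    """Offset of pixel (x, y) in a tile-ordered buffer (width is a multiple of TILE)."""
    tiles_per_row = width // TILE
    tile_id = (y // TILE) * tiles_per_row + (x // TILE)
    return tile_id * TILE * TILE + (y % TILE) * TILE + (x % TILE)


class TiledRenderTarget:
    """Toy render target stored in tile order, with a row-major view for the CPU."""

    def __init__(self, width, height):
        assert width % TILE == 0 and height % TILE == 0
        self.width, self.height = width, height
        self.vram = [0] * (width * height)  # tile-ordered "GPU" storage
        self.staging = None                 # row-major placeholder buffer
        self.dirty = False
        self.pixels_copied = 0              # total pixels moved by lock/draw copies

    def lock(self):
        # Detile the whole surface into a plain row-major buffer for the CPU.
        w, h = self.width, self.height
        self.staging = [self.vram[tiled_index(x, y, w)]
                        for y in range(h) for x in range(w)]
        self.pixels_copied += w * h
        return self.staging

    def unlock(self):
        # The CPU may have written anything, so the whole surface counts as dirty.
        self.dirty = True

    def draw(self):
        # Before the GPU draws, the row-major buffer is copied back in tile order.
        if self.dirty:
            w, h = self.width, self.height
            for y in range(h):
                for x in range(w):
                    self.vram[tiled_index(x, y, w)] = self.staging[y * w + x]
            self.pixels_copied += w * h
            self.staging = None
            self.dirty = False
        # (actual GPU rendering would happen here)


rt = TiledRenderTarget(8, 8)
for frame in range(3):
    buf = rt.lock()
    buf[frame] = 255   # the CPU touches a single pixel per frame
    rt.unlock()
    rt.draw()

print(rt.pixels_copied)  # 384: 3 frames * 2 full copies * 64 pixels
```

Even though only one pixel changes per frame, the model moves the full 64-pixel surface twice per frame. Scaled up to a full-resolution render target, that is the kind of traffic the post above suspects is behind the slowdown.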

Reply 21 of 22, by Joshhhuaaa

Rank Newbie
Dege wrote on 2023-05-04, 18:19:
Thx for offering bounty, but unfortunately at the moment I have no idea about how to fix it. I guess it cannot be fixed outside […]

Yeah, looks like we're at a dead end with the hardware, so I don't think we'll ever get a flawless solution for this, like you said. But if the iGPU / GPU upload heap approach gives a significant performance improvement, it could have potential. I should be able to try out your iGPU workarounds if you need any help with that. I'm currently still on an i7-9700K with Intel UHD Graphics 630.

I don't have Resizable BAR support at the moment either, with an Nvidia 2070 Super, but I'm planning to upgrade my hardware next year to get it. I know it requires a 10th gen Intel or Ryzen 5000 series CPU as well as a 30-series Nvidia card.

Reply 22 of 22, by chris.davis925

Rank Newbie
Dege wrote on 2023-05-04, 18:19:
Thx for offering bounty, but unfortunately at the moment I have no idea about how to fix it. I guess it cannot be fixed outside […]

Thanks for your efforts, Dege. Here's to hoping that your iGPU idea can help with this 😁