VOGONS


Reply 20 of 26, by Dege

Rank l33t

Thx for offering bounty, but unfortunately at the moment I have no idea how to fix it. I guess it cannot be fixed outside of dgVoodoo either, because it's not a "bug". Probably the thing is that GPUs still supported row-major render targets in the DX8/9 days, so drivers could easily create one, map that piece of VRAM into the CPU address space and return a pointer directly to it. But once row-major RT support was dropped, drivers could do nothing other than copy the content of the RT into a plain row-major placeholder buffer and return a pointer to that instead. And of course they have to copy the plain buffer back to the RT when the GPU is about to draw into it. Moving that large amount of memory back and forth per Lock/Draw is expensive, and I think that's what appears as 5s on discrete GPUs and even 2s on iGPUs.

I have an idea that could help for iGPUs (of course it's not the perfect solution, but it could make it somewhat faster). And I think I could do the same for dGPUs with the GPU upload heap.
But the sad thing is that I can't test the GPU upload heap: I realized that D3D12 is not updated in Win10 anymore, so the Agility SDK DLLs won't get into it over time. I must install Win11, and even then I'd need a GF 3xxx class video card, because Resizable BAR is supported starting from that. 🙁
Anyway, I'll try the iGPU version when I have some time (it requires some code refactoring and such).
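
To make the copy cost described above concrete, here is a minimal sketch of a Lock/Draw round trip between a non-row-major render target and a plain row-major buffer. The 4x4 tile layout and the names `tiledIndex`, `detile`, `retile` and `roundTripBytes` are assumptions made up for illustration; real GPU layouts are vendor-specific, and this is not dgVoodoo or driver code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tiled layout: the surface is split into 4x4-pixel tiles that
// are stored one after another. Real hardware layouts differ.
constexpr std::size_t kTile = 4;

// Index of pixel (x, y) inside the tiled buffer.
std::size_t tiledIndex(std::size_t x, std::size_t y, std::size_t width) {
    assert(width % kTile == 0);
    const std::size_t tilesPerRow = width / kTile;
    const std::size_t tile = (y / kTile) * tilesPerRow + (x / kTile);
    return tile * kTile * kTile + (y % kTile) * kTile + (x % kTile);
}

// On Lock: copy the tiled render target into a row-major placeholder buffer.
std::vector<std::uint32_t> detile(const std::vector<std::uint32_t>& tiled,
                                  std::size_t width, std::size_t height) {
    std::vector<std::uint32_t> linear(width * height);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            linear[y * width + x] = tiled[tiledIndex(x, y, width)];
    return linear;
}

// Before the next draw: copy the placeholder buffer back into the tiled layout.
std::vector<std::uint32_t> retile(const std::vector<std::uint32_t>& linear,
                                  std::size_t width, std::size_t height) {
    std::vector<std::uint32_t> tiled(width * height);
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            tiled[tiledIndex(x, y, width)] = linear[y * width + x];
    return tiled;
}

// Bytes moved by one Lock + Draw round trip on a 32-bit surface.
constexpr std::size_t roundTripBytes(std::size_t width, std::size_t height) {
    return 2 * width * height * sizeof(std::uint32_t);
}
```

Under these assumptions, a 1920x1080 32-bit target moves `roundTripBytes(1920, 1080)` = 16,588,800 bytes for every Lock followed by a Draw, regardless of how few pixels the application actually touches.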

Reply 21 of 26, by Joshhhuaaa

Rank Newbie
Dege wrote on 2023-05-04, 18:19:
Thx for offering bounty, but unfortunately at the moment I have no idea about how to fix it. I guess it cannot be fixed outside […]

Yeah, looks like we're at a dead end with the hardware, so I don't think we'll ever get a flawless solution for this, like you said. But if the iGPU or GPU upload heap approach gives a significant performance improvement, it could have potential. I should be able to try out your iGPU workarounds if you need any help with that. I'm currently still on an i7-9700K with Intel UHD Graphics 630.

I don't have Resizable BAR support at the moment either, with an Nvidia 2070 Super, but I'm planning to upgrade my hardware next year to get it. I know it requires a 10th gen Intel or Ryzen 5000 series CPU as well as a 30-series Nvidia card.

Reply 22 of 26, by chris.davis925

Rank Newbie
Dege wrote on 2023-05-04, 18:19:
Thx for offering bounty, but unfortunately at the moment I have no idea about how to fix it. I guess it cannot be fixed outside […]

Thanks for your efforts, Dege. Here's to hoping that your iGPU idea can help with this 😁

Reply 23 of 26, by NoFrameLimit

Rank Newbie
Dege wrote on 2023-04-28, 15:58:
Yes, I'll compile a faking-lock version and see how it works.

Of course there is a caching mechanism under the hood for locking surfaces (it's multi-level and quite complicated, especially with 'fast vidmem' mode), so subsequent Lock calls won't cause memory readback all the time. But this is only true if the GPU does not modify the content between two Locks.
But in UnrealEd this is the general pattern:

101782 129.501221 24796 UnrealEd.exe Direct3DSurface8::LockRect (this = 1d471de0, pLockedRect = 5f6310, pRect = 0, Flags = 0)
101783 129.501249 24796 UnrealEd.exe Direct3DSurface8::UnlockRect (this = 1d471de0)
101784 129.501277 24796 UnrealEd.exe Direct3DSurface8::Release (this = 1d471de0)
101785 129.501307 24796 UnrealEd.exe Direct3DVertexBuffer8::Lock (this = 1d802f68, OffsetToLock = 11392, SizeToLock = 768, ppbData = 5f5e48, Flags = 1000)
101786 129.501336 24796 UnrealEd.exe Direct3DVertexBuffer8::Unlock (this = 1d802f68)
101787 129.501371 24796 UnrealEd.exe Direct3D8Device::SetTransform (this = 175b9330, State = D3DTS_WORLD, pMatrix = 1159f964)
101788 129.501400 24796 UnrealEd.exe Direct3D8Device::DrawPrimitive (this = 175b9330, PrimitiveType = D3DPT_LINELIST, StartVertex = 712, PrimitiveCount = 24)
101789 129.501489 24796 UnrealEd.exe Direct3D8Device::GetRenderTarget (this = 175b9330, ppRenderTarget = 5f6314)
101790 129.501518 24796 UnrealEd.exe Direct3DSurface8::QueryInterface (this = 1d471de0)
101791 129.501547 24796 UnrealEd.exe Direct3DSurface8::LockRect (this = 1d471de0, pLockedRect = 5f62fc, pRect = 0, Flags = 0)
101792 129.503337 24796 UnrealEd.exe Direct3DSurface8::UnlockRect (this = 1d471de0)

There are Draw* calls between the surface Locks, so the content must be read back from video memory for both the first and the second Lock (and it must even be written back to video memory for the Draw call, so it's moved back and forth).
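
The effect of that pattern on a lock cache can be sketched with a toy model. This is only an illustration under simplifying assumptions: a single cache level, and a hypothetical `LockableTarget` type that is not taken from dgVoodoo, whose real cache is described above as multi-level.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a render target with one CPU-side cached copy.
struct LockableTarget {
    bool cpuCopyValid = false;   // CPU copy matches video memory
    bool cpuCopyDirty = false;   // CPU copy has changes not yet uploaded
    bool locked = false;
    std::size_t readbacks = 0;   // video memory -> CPU copies
    std::size_t uploads = 0;     // CPU -> video memory copies

    void lock(bool readOnly = false) {
        assert(!locked);
        if (!cpuCopyValid) {     // cache miss: a full readback is needed
            ++readbacks;
            cpuCopyValid = true;
        }
        // Flags = 0 in the trace permits writes, so assume the app wrote.
        if (!readOnly) cpuCopyDirty = true;
        locked = true;
    }

    void unlock() {
        assert(locked);
        locked = false;
    }

    void draw() {
        assert(!locked);
        if (cpuCopyDirty) {      // CPU changes must reach the GPU first
            ++uploads;
            cpuCopyDirty = false;
        }
        cpuCopyValid = false;    // the GPU has now modified the content
    }
};
```

In this model, Lock, Unlock, Lock, Unlock costs one readback, while the traced Lock, Unlock, Draw, Lock, Unlock sequence costs two readbacks and one upload.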

Very old thread, but I thought I'd post here as I solved this issue in the Splinter Cell Chaos Theory Editor:
https://youtu.be/xxnMBnKBSjY

The editor was:

  1. locking the backbuffer
  2. creating a backup of a 5x5 pixel square around the click location
  3. writing a 5x5 pixel solid colour to that square
  4. releasing the backbuffer
  5. drawing primitives
  6. locking the backbuffer again to check if the colour of the 5x5 pixel square had changed (hit detection)
  7. restoring the original 5x5 pixel square
  8. releasing the backbuffer again

Totally nuts, especially when you consider the game was always redrawing the entire scene so the backup/restore was entirely pointless 😁
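
The eight steps above can be sketched on a plain CPU-side framebuffer. This is a reconstruction from the description only, not the editor's actual code: `Framebuffer`, `hitTest`, `kProbe` and the marker colour `kMarker` are all hypothetical names and values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for the backbuffer; every lock() models a full-surface lock.
struct Framebuffer {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
    int fullLocks = 0;

    Framebuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0u) {}

    std::uint32_t* lock() { ++fullLocks; return pixels.data(); }
    void unlock() {}
};

constexpr int kProbe = 5;                        // 5x5 probe square
constexpr std::uint32_t kMarker = 0x00FF00FFu;   // hypothetical solid colour

bool hitTest(Framebuffer& fb, int cx, int cy,
             const std::function<void(Framebuffer&)>& drawPrimitives) {
    const int x0 = cx - kProbe / 2;
    const int y0 = cy - kProbe / 2;
    assert(x0 >= 0 && y0 >= 0 &&
           x0 + kProbe <= fb.width && y0 + kProbe <= fb.height);
    std::uint32_t backup[kProbe * kProbe];

    std::uint32_t* p = fb.lock();                            // step 1
    for (int y = 0; y < kProbe; ++y) {
        for (int x = 0; x < kProbe; ++x) {
            std::uint32_t& px = p[(y0 + y) * fb.width + (x0 + x)];
            backup[y * kProbe + x] = px;                     // step 2
            px = kMarker;                                    // step 3
        }
    }
    fb.unlock();                                             // step 4

    drawPrimitives(fb);                                      // step 5

    p = fb.lock();                                           // step 6
    bool hit = false;
    for (int y = 0; y < kProbe; ++y) {
        for (int x = 0; x < kProbe; ++x) {
            std::uint32_t& px = p[(y0 + y) * fb.width + (x0 + x)];
            if (px != kMarker) hit = true;   // something drew over the marker
            px = backup[y * kProbe + x];                     // step 7
        }
    }
    fb.unlock();                                             // step 8
    return hit;
}
```

Each call performs two full-surface locks with a draw in between, which is the readback-heavy pattern described earlier in the thread.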

Reply 24 of 26, by chris.davis925

Rank Newbie
NoFrameLimit wrote on Today, 17:28:
Very old thread, but I thought I'd post here as I solved this issue in the Splinter Cell Chaos Theory Editor: https://youtu.be/x […]

Neat. Are you able to share the process of **how** you fixed it? A technical deep dive would be great as several older UE2 editors have this same problem.

Reply 25 of 26, by NoFrameLimit

Rank Newbie

I reverse engineered the code and disabled all the logic which depended on writing data to the backbuffer, and then:

Step 1:
Replaced steps 1-4 with a single call to draw a 5x5 rectangle of the correct colour (no locking at all). This doubled the speed.

Step 2:
Replaced steps 6-8: instead of locking the backbuffer, I'm creating a new surface, copying the pixels I care about to it, and locking that instead.

Not how I would do it if I had source code, but I'm happy with the results.
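
A rough sketch of how the two replacement steps fit together, modelled on a CPU-side buffer. Everything here is an assumption for illustration: `Backbuffer`, `fillRect`, `copyRectToSurface`, `hitTestPatched` and the marker colour are hypothetical, and the post does not say which API calls the patch uses. In Direct3D 8 terms, a rectangle fill such as `IDirect3DDevice8::Clear` on a rect and a region copy such as `IDirect3DDevice8::CopyRects` into a small separate surface would be plausible candidates. The restore step is left out on the assumption that it is unnecessary when the scene is fully redrawn.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for the backbuffer. It is never locked as a whole in this flow.
struct Backbuffer {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
    std::size_t pixelsReadBack = 0;   // pixels copied out for CPU access

    Backbuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0u) {}

    // GPU-side fill: stands in for "a single call to draw a 5x5 rectangle".
    void fillRect(int x0, int y0, int w, int h, std::uint32_t colour) {
        assert(x0 >= 0 && y0 >= 0 && x0 + w <= width && y0 + h <= height);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                pixels[(y0 + y) * width + (x0 + x)] = colour;
    }

    // Copies only the region of interest into a small separate surface.
    std::vector<std::uint32_t> copyRectToSurface(int x0, int y0, int w, int h) {
        assert(x0 >= 0 && y0 >= 0 && x0 + w <= width && y0 + h <= height);
        std::vector<std::uint32_t> small;
        small.reserve(static_cast<std::size_t>(w) * h);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                small.push_back(pixels[(y0 + y) * width + (x0 + x)]);
        pixelsReadBack += small.size();
        return small;
    }
};

constexpr int kProbe = 5;                        // 5x5 probe square
constexpr std::uint32_t kMarker = 0x00FF00FFu;   // hypothetical solid colour

bool hitTestPatched(Backbuffer& bb, int cx, int cy,
                    const std::function<void(Backbuffer&)>& drawPrimitives) {
    const int x0 = cx - kProbe / 2;
    const int y0 = cy - kProbe / 2;
    bb.fillRect(x0, y0, kProbe, kProbe, kMarker);   // replaces steps 1-4
    drawPrimitives(bb);                             // step 5, unchanged
    // Replaces steps 6-8: inspect a small copy instead of the backbuffer.
    const std::vector<std::uint32_t> probe =
        bb.copyRectToSurface(x0, y0, kProbe, kProbe);
    for (std::uint32_t px : probe)
        if (px != kMarker) return true;             // marker was overdrawn
    return false;
}
```

In this model, only 25 pixels are read back per hit test instead of the whole surface, which is the point of copying just the pixels of interest to a separate surface.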

Reply 26 of 26, by chris.davis925

Rank Newbie
NoFrameLimit wrote on Today, 20:53:
I reversed engineered the code and disabled all the logic which depended on writing data to the backbuffer and then: […]

Are you willing to make a video of the step-by-step process, or a write-up? Again, I'm thinking about other games, which I'm sure will have different memory addresses, patching locations, etc. Thanks.