VOGONS


First post, by Joshhhuaaa

Rank Newbie

Not sure how familiar everyone here is with early Unreal Engine 2 and its games but just some quick info:

Many early Unreal Engine 2 games shipped with an Unreal level editor, commonly referred to as UnrealEd. UnrealEd 1.0 and 2.0 were both for Unreal Engine 1. Unreal Engine 2 games use UnrealEd 3.0; some people mistake UnrealEd 3.0 for Unreal Engine 3. Many games shipped with a public UnrealEd: Unreal 2, Unreal Tournament 2003/2004, Rainbow Six 3, Postal 2, Thief 3, Splinter Cell: Pandora Tomorrow Versus, Splinter Cell: Chaos Theory Versus, SWAT 4, etc.

All of these games that shipped with UnrealEd share an issue that is widespread on Windows 10/11 with modern graphics cards from Nvidia, AMD, and Intel. It affects many modding communities for these old games, whose members use the level editors and then upgrade to newer hardware. It has been an issue pretty much since the launch of Windows 10 in 2015, with no solutions to be found.

The issue: When selecting an actor (basically any object in a level) in any of the 3D viewports, the application hangs for several seconds, and then finally your actor is selected. This hang time can become very frustrating because the more complex your level gets, the longer you must wait. In a level filled with thousands of actors, it can take 20+ seconds just to select any actor. GPU usage spikes while it's trying to select the item, and UnrealEd is unresponsive as you wait.

The only workaround I have found is disabling your graphics card in Device Manager and running UnrealEd in software mode on your CPU. Actors select nearly instantly, as in the old days, with no more than half a second of waiting. I have been experimenting with other solutions because software mode's performance is obviously terrible, so it's not practical.

So why am I even posting this as a dgVoodoo2 topic? As far as I know, dgVoodoo2 unfortunately hasn't been compatible with UnrealEd for as long as it has existed. Upon opening UnrealEd with dgVoodoo2, all of the 3D viewports and browsers are severely broken.
Picture of dgVoodoo2 running UnrealEd:
dHWyc63.png

Picture of UnrealEd using no wrappers (how it's intended to look):
NPPIxYc.png

So the reason why dgVoodoo2 is involved here: I made a little discovery that can greatly reduce this issue while running UnrealEd on a hardware GPU, using dgVoodoo2's Fast Video Memory Access feature.
aWydY5Z.png
I performed a test where I selected a simple actor on a map filled with thousands upon thousands of actors. I simply left-clicked and measured the time until the UnrealEd window became responsive again and the actor was highlighted green as selected. The results are:
~0.50 seconds selecting (Software CPU)
~3.15 seconds selecting (dgVoodoo2 with fast video memory access enabled)
~12.15 seconds selecting (Hardware Nvidia GPU with no wrapper)
~19.05 seconds selecting (dgVoodoo2 stock settings)
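
To put those four measurements side by side, here is a small Python sketch that works out the ratios between them. The timings are the ones listed above; the dictionary keys and the `ratio` helper are just names made up for this illustration.

```python
# Selection times in seconds, copied from the measurements listed above.
timings = {
    "software_cpu": 0.50,
    "dgvoodoo_fast_vidmem": 3.15,
    "native_gpu_no_wrapper": 12.15,
    "dgvoodoo_stock": 19.05,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` takes than `faster`."""
    return timings[slower] / timings[faster]


if __name__ == "__main__":
    pairs = [
        ("dgvoodoo_stock", "dgvoodoo_fast_vidmem"),
        ("native_gpu_no_wrapper", "dgvoodoo_fast_vidmem"),
        ("dgvoodoo_fast_vidmem", "software_cpu"),
    ]
    for slower, faster in pairs:
        print(f"{slower} takes {ratio(slower, faster):.2f}x as long as {faster}")
```

By these numbers, fast video memory access is roughly 6 times faster than stock dgVoodoo2 and almost 4 times faster than the native GPU path, but still about 6 times slower than software mode.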

As you can see, the fast video memory access feature is very good at trimming the time down. It's not quite perfect like software mode, but it's definitely an improvement. Sadly, this improvement doesn't help much because, as shown above, dgVoodoo2 doesn't even display the 3D viewports correctly. I assume nobody besides Dege can help with this (unsure if dgVoodoo2 is open source and if people know its internal techniques).

This issue is widespread across many games and older communities, so some kind of fix could make many people happy.
So, the questions I have:
- How difficult would it be to make UnrealEd's 3D viewports compatible with dgVoodoo2?
- What exactly does the option "Fast video memory access" do? I know the option is probably self-explanatory, but could you share more details? It would be great if the option could be optimized to the point where it was on par with software mode's selection speed.
- Even if you don't think you could get dgVoodoo2 and UnrealEd to like each other, is there a way you could make a simple fix that literally just enables "Fast video memory access" with nothing else, such as emulating an old video card, so we could experience this speedup on a modern GPU?

Thanks to anyone who took the time to read. I hope we can find a fix for this issue, which has been a problem for quite a while now and has caused some to give up editing levels on newer PCs. You don't exactly need a retro PC, but Windows 10/11 are completely busted. The last PC I had with a working UnrealEd ran Windows 7 with an Nvidia GTX 900 series video card. I've also tried using DX8to9 as well as DXVK, and the issue persists even in Vulkan.

Reply 1 of 22, by Dege

Rank l33t

I gave this a go with the UnrealEd shipped with Postal 2 and indeed, the problems you described exist.

I fixed the viewport problem.
The slowdown problem occurs because the editor locks the viewport GPU backbuffers for CPU access many times. I don't know why it does that, but it does.
The problem is that this cannot be done directly on modern hardware, so the only way to achieve that "lock" is to copy the video memory to system memory (back and forth if needed), which adds a large overhead to the lock operation.
Fast video memory access is a special capability in dgVoodoo. When it is enabled, the wrapper detects which parts of the buffer the application accesses with the CPU and moves only the needed part of the memory content between the CPU and GPU, on demand.
So, the point is that only relatively small parts of the memory are copied, and only when they are detected as needing to be copied, which reduces the overhead of those "lock" operations.
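
A rough way to see why partial copying helps is to count bytes. The Python sketch below is a toy cost model, not dgVoodoo's actual implementation: the surface size, the number of locks, the 4x4 pixel region, and all names are assumptions chosen for illustration, and the thread does not say how dgVoodoo detects the touched regions or at what granularity.

```python
from dataclasses import dataclass

BYTES_PER_PIXEL = 4  # assumption: 32-bit backbuffer format


@dataclass
class Rect:
    """Hypothetical region of the surface that the CPU actually touched."""
    x: int
    y: int
    w: int
    h: int


def full_copy_bytes(width: int, height: int, locks: int) -> int:
    """Bytes moved if every lock copies the whole surface to system memory."""
    return width * height * BYTES_PER_PIXEL * locks


def partial_copy_bytes(touched: list[Rect]) -> int:
    """Bytes moved if each lock copies only the region that was touched."""
    return sum(r.w * r.h * BYTES_PER_PIXEL for r in touched)


if __name__ == "__main__":
    # Assumed workload: 1000 locks on a 1920x1080 viewport,
    # each one reading a 4x4 pixel block.
    locks = 1000
    touched = [Rect(x=100, y=100, w=4, h=4) for _ in range(locks)]
    print("full copy   :", full_copy_bytes(1920, 1080, locks), "bytes")
    print("partial copy:", partial_copy_bytes(touched), "bytes")
```

Under these made-up numbers the full-copy path moves about 8.3 GB while the partial path moves 64 KB, which is the kind of gap that would explain a large difference in selection time.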

Software emulation does not suffer from this problem, so it's much faster in this regard. Also, UnrealEd may run faster on an IGP than a discrete video card because of the unified system/video memory.

Reply 2 of 22, by Joshhhuaaa

Rank Newbie
Dege wrote on 2022-11-21, 18:37:
I gave this a go with the UnrealEd shipped with Postal 2 and indeed, the problems you described exist. […]

Hey, welcome back, and thanks for the info and for taking the time to look into it. If you did fix that viewport problem with UnrealEd and dgVoodoo2, could you possibly send over a version with the fix, or will it be fixed in a later release? The fast video memory access definitely isn't on par with software emulation, but it is a nice improvement if the viewports are working. I have actually tried running this on my Intel iGPU, but I had no luck getting any better results. They could've been very slightly better... I can't recall, but they certainly weren't amazing.

I think whatever is causing the problem is kind of confusing. It used to be blamed on Windows 10/11, but in testing, Windows 7/8.1 also has the problem when using a newer Nvidia RTX 20/30 series card. Previously, 7/8.1 were perfect for UnrealEd with the GPUs that were out at the time, such as the GTX 900 series, which had zero problems, but Nvidia's newer generations suffer from it even on legacy OSes that never used to have the problem. I'm not sure how realistically this could be fixed, as I'm not too smart with this stuff, but a friend and I are trying to get Nvidia to see if they can do anything with their drivers.

Reply 3 of 22, by Dege

Rank l33t

Yes, it's planned to be released in a later version, but ATM I don't have a clue when one will be released. But anyway, I've just released a zip for another fix to test so you can also test it with UnrealEd:

http://dege.fw.hu/temp/dgVoodoo_fixes_to_test_2_79_3.zip

I wonder if NV will be able to fix the problem in any way, because I think this is a limitation of modern hardware (a non-linear GPU memory layout that cannot be accessed directly by the CPU).

Reply 4 of 22, by Joshhhuaaa

Rank Newbie
Dege wrote on 2022-11-23, 16:51:

Yes, it's planned to be released in a later version, but ATM I don't have a clue when one will be released. But anyway, I've just released a zip for another fix to test so you can also test it with UnrealEd:

http://dege.fw.hu/temp/dgVoodoo_fixes_to_test_2_79_3.zip

I wonder if NV will be able to fix the problem in any way, because I think this is a limitation of modern hardware (a non-linear GPU memory layout that cannot be accessed directly by the CPU).

Thanks a lot, no worries about getting it out fast in a stable version. I wasn't expecting the viewport problem to be fixed that quickly, either. Have a good holiday.

Reply 5 of 22, by RC-1266

Rank Newbie
Dege wrote on 2022-11-23, 16:51:

But anyway, I've just released a zip for another fix to test so you can also test it with UnrealEd:

http://dege.fw.hu/temp/dgVoodoo_fixes_to_test_2_79_3.zip

I was encountering the same issues with the UnrealEd.exe that came with Star Wars: Republic Commando.
- Initially (this version of) UnrealEd wouldn't even launch without dgVoodoo due to a C++ error;
- With the DLL files in the zip you provided for fix testing, I could fix the viewports issue;
- The remaining issue I encountered (black flickering within the viewports, random black bars partially covering the viewports) was fixed by selecting Direct3D 11 MS WARP (software) as an Output API in dgVoodoo's General tab. (This does make the game run poorly though, so before you start the game, switch back to Best available one.)

So I'd like to thank you a lot for providing the custom DLL files. And if anyone else 'out there' has the same issue, I hope they will quickly stumble upon this thread.

Reply 6 of 22, by Joshhhuaaa

Rank Newbie
RC-1266 wrote on 2023-01-05, 16:53:
I was encountering the same issues with the UnrealEd.exe that came with Star Wars: Republic Commando. - Initially (this version […]

Ah yeah, I completely forgot to mention Star Wars: Republic Commando, but now I remember that it was also an Unreal Engine 2 game that shipped with UnrealEd. Very nice to see others already benefiting from the small WIP fix. I assume the games all act slightly differently from one another because they have their own little modifications. Splinter Cell Chaos Theory Versus in particular never had a C++ error, only the viewport issue shown in the initial post. SCCT Versus also never had black flickering in the viewports, but I'm glad you found a workaround.

Reply 7 of 22, by Kerouha

Rank Newbie
Dege wrote on 2022-11-23, 16:51:

I've just released a zip for another fix to test so you can also test it with UnrealEd
...

I've tried the fix with the Valve Hammer editor (Left 4 Dead 2 version), since it appears to have the same viewport issues. Hammer crashes as soon as you open any file, though for a moment you can see the 3D viewport being rendered correctly.

Reply 8 of 22, by chris.davis925

Rank Newbie
Dege wrote on 2022-11-21, 18:37:
I gave this a go with the UnrealEd shipped with Postal 2 and indeed, the problems you described exist. […]

Hi @Dege, I created an account just to say THANK YOU for creating this fix. Since I was a young boy, I have often created things in Unreal Editor 2.x, which came with some of my favorite games from childhood. I still like to play around with some of these older games. It has been simply impossible to use Unreal Editor 2.x in Windows 10/11 in recent years because of the viewport locking up, thanks to the LockRect function from DX9, which doesn't work well on modern GPUs.

I have found that when I work with levels that have a larger memory footprint, the Unreal Editor becomes somewhat unusable again, even with the fast video memory access workaround you have created. I presume it's because larger amounts of memory have to travel back and forth. I read an article earlier this week which appears to indicate that a recent DirectX 12 update from Microsoft allows the CPU and GPU to both access VRAM at the same time. Do you think it would be at all possible to use your wrapper to take advantage of this DirectX 12 feature and somehow translate DirectX 9 LockRect to utilize it?
https://www.tomshardware.com/news/dx12-optimi … -simultaneously

Here's to wishing for a way to get these older Unreal Editor games to run smoothly on modern hardware. Let me know if there is anything I can do to encourage development into this. Cheers!

Reply 9 of 22, by Dege

Rank l33t
chris.davis925 wrote on 2023-04-02, 23:41:
Hi @Dege, I created an account just to say THANK YOU for creating this fix. Since a young boy, I would often create things in Un […]

Hi! Thanks!

I'll look into this, but I'm afraid this DX12 update "just" enables the same thing for discrete GPUs as is already available for integrated GPUs.
The problem is more complicated, because even if the GPU resource is mapped into the CPU address space, it either cannot be a rendertarget or does not have a row-major memory layout, so the application cannot be given a pointer to it (it can only be read/written by Read/Write functions through DX12, because only the driver itself knows the actual memory layout of the resource).
(And I guess that's why the 'Lockable' flag was a parameter for rendertarget creation in DX8/9: it provided the row-major layout, with lower GPU rendering performance in return.)
But anyway, I'll look into it; maybe I remember something wrong.
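
To illustrate why the layout matters: with a row-major surface, a base pointer plus a pitch is all the application needs to find any pixel. The Python sketch below contrasts that with a Morton (Z-order) layout, used here only as a stand-in, since real GPU tiling schemes are vendor-specific and are not described in this thread. All function names are made up for the example.

```python
def row_major_offset(x: int, y: int, pitch: int, bytes_per_pixel: int = 4) -> int:
    """Byte offset of pixel (x, y) in a row-major surface with the given pitch."""
    return y * pitch + x * bytes_per_pixel


def morton_offset(x: int, y: int, bytes_per_pixel: int = 4) -> int:
    """Byte offset of pixel (x, y) in a Z-order layout (stand-in for GPU tiling).

    The bits of x and y are interleaved, so neighbouring pixels in a row
    are not a fixed distance apart in memory.
    """
    index = 0
    for bit in range(16):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index * bytes_per_pixel


if __name__ == "__main__":
    pitch = 256  # assumed: 64 pixels per row, 4 bytes per pixel
    for x in range(4):
        print(x, row_major_offset(x, 0, pitch), morton_offset(x, 0))
```

In the row-major column the offset grows by a constant 4 bytes per pixel, which is what an application expects when it walks a locked surface. In the Z-order column the step size changes from pixel to pixel, so a plain pointer and pitch cannot describe it.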

Reply 10 of 22, by chris.davis925

Rank Newbie

Thanks again, @Dege. I have my fingers crossed that something may be possible!

Side note: even if it's not possible to restore the LockRect function to "normal", perhaps there is a way to discard these calls or always return a fake value? Obviously this function is needed for something or it wouldn't be there. Just thought I would ask. 😀

Reply 11 of 22, by Dege

Rank l33t
chris.davis925 wrote on 2023-04-07, 04:03:

Thanks again, @Dege. I have my fingers crossed that something may be possible!

Side note: even if it's not possible to restore the LockRect function to "normal", perhaps there is a way to discard these calls or always return a fake value? Obviously this function is needed for something or it wouldn't be there. Just thought I would ask. 😀

I looked into it and found the following. To give the application a pointer directly to the surface, the surface has to have a row-major memory layout. That is essential because it's the only layout an application can understand. And,

- The GPU must support row-major for rendertarget textures, and that support is optional in the first place. My GF1060 and AMD R7 360 support only copying to/from row-major, but not rendering into it. On the other hand, the Intel HD 530 supports row-major RT's, yeah.

- IGP's don't need GPU_UPLOAD_HEAP because system memory can already be mapped into the address space, so I tried doing that with a row-major rendertarget. It turned out that an RT texture can only be row-major if it is created on a (cross-adapter) shared heap. Ok, and at the same time, shared heaps cannot be mapped (CPU-visible)...

- So, it won't work with GPU_UPLOAD_HEAP either, even if the GPU supports row-major RT's (I don't know if there is such a discrete GPU, anyway)

GPU_UPLOAD_HEAP could still be useful in dgVoodoo because one copy could be optimized out (in most cases). But unfortunately this cannot be combined with the technique used for 'fast video memory access' (which is technically 'partial' video memory access), so I'm not sure at all if it'd help in cases like this, where the texture is locked too many times per frame (it could be even worse). 😐

So, for now I'm unsure what to do with GPU_UPLOAD_HEAP.
Btw, faking the surface lock for UnrealEd might be worth a try. I have no idea why it locks in the first place.
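
For what faking the lock could mean in practice, here is a toy Python model. It is purely hypothetical (the `Surface` and `FakeLockSurface` classes are invented for this sketch and are not dgVoodoo or Direct3D APIs): a real lock reads the surface back so the CPU sees actual pixels, while a faked lock hands out a scratch buffer and copies nothing.

```python
class Surface:
    """Toy render target whose 'video memory' is costly to read back."""

    def __init__(self, width: int, height: int, bytes_per_pixel: int = 4):
        self.pitch = width * bytes_per_pixel
        self.video_memory = bytearray(self.pitch * height)
        self.bytes_copied = 0  # running total of read-back traffic

    def lock(self) -> bytearray:
        # Real lock: copy the whole surface so the CPU sees actual pixels.
        self.bytes_copied += len(self.video_memory)
        return bytearray(self.video_memory)


class FakeLockSurface(Surface):
    """Hypothetical variant that pretends to lock without copying."""

    def lock(self) -> bytearray:
        # Faked lock: return a zeroed scratch buffer of the right size.
        return bytearray(len(self.video_memory))


if __name__ == "__main__":
    real = Surface(64, 64)
    fake = FakeLockSurface(64, 64)
    real.video_memory[0] = 255  # pretend the GPU rendered something
    fake.video_memory[0] = 255
    print("real lock:", real.lock()[0], "bytes copied:", real.bytes_copied)
    print("fake lock:", fake.lock()[0], "bytes copied:", fake.bytes_copied)
```

The faked version costs nothing, but whatever the editor reads from the buffer is no longer the rendered image, so whether this is usable depends on why UnrealEd locks the surface, which nobody in the thread knows yet.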

Reply 12 of 22, by chris.davis925

Rank Newbie
Dege wrote on 2023-04-19, 18:24:
I looked into it and got to the following. To give the application a pointer directly to the surface, it has to have row-major m […]

Hello, thanks again for looking at this. What you said about your GF1060 is interesting because it matches my experience: the last card I owned that worked with Unreal Editor (UE2 based) was my GeForce 1080. Oddly, it also performed faster in Windows 8.1 than it did in Windows 10. I used to dual boot into Windows 8.1 just to play with Unreal Editor. This was no longer possible once I got an RTX 3090, because there are no Windows 8.1 drivers for RTX 30x/40x cards. I opened an Nvidia support ticket to look into this years ago, but it went nowhere.

Is it "easy" for you to test if GPU_UPLOAD_HEAP improves performance with Unreal Editor?
Regarding faking the surface lock for Unreal Editor: is this something that can be attempted with dgVoodoo?

Edit: Also, does this mean that the new Intel discrete GPUs are also likely to support row-major for rendertarget textures, like the integrated Intel GPU does? If so, I assume this means that UnrealEd is likely to work as intended on such a GPU?

Reply 13 of 22, by Dege

Rank l33t
chris.davis925 wrote on 2023-04-25, 05:53:
Hello, thanks again for looking at this. What you said about your GF1060 is interesting because it also matches my experience of […]

No, unfortunately it's not easy:
- Code complexity in dgVoodoo: implementing gpu upload heap "needs development"
- The D3D12 Agility SDK is basically unusable with dgVoodoo because you need to define which version of the D3D12 dll's your application wants and where those dll's are. They can only be defined by exporting 2 symbols from the main executable, which is not viable with dgVoodoo (and you still need driver support for the new features). The only option is my own test app for trying things out, but that's all. So, all I can do is wait until the given version of the D3D12 dll's makes it into the mainstream OS through updates.
(btw, I did a little development for a feature that (re)appeared in the Agility SDK: triangle fans)

Faking the surface lock is not achievable with dgVoodoo as it is, but maybe I could compile such a version for myself and try it.

Since my Intel IGP reported support for row-major rendertargets, I tried UnrealEd with it (native D3D8). It was faster (2s for selecting an object) than with my NV (~5s or worse).
An MS guy told me that in D3D9 it was up to the driver what memory it used and mapped for lockable surfaces. Based on that, I have a feeling that even the Intel driver chooses not to map the rendertarget directly (or it cannot do that in WDDM) and instead copies the surface data into a row-major surface in the Lock call.

So even if a GPU supports row-major RT's (it's a legacy feature, so I don't think discrete Intel GPU's support them, but I don't know), the rendertarget cannot be mapped into the address space through D3D12. A copy to a non-RT is always needed, and that's probably what the current drivers do. I could only do the same in dgVoodoo. Gpu upload heap would only help by keeping the copy inside video memory, which is faster than video-to-system, but even with that I'd expect the same ~2s latency (on my machine), which is slower than the 'fast video memory access' technique already available in dgVoodoo.

Reply 14 of 22, by chris.davis925

Rank Newbie
Dege wrote on 2023-04-26, 18:52:
No, unfortunately it's not easy: - Code complexity in dgVoodoo, implementing gpu upload heap "needs development" - The D3D12 Agi […]

No, unfortunately it's not easy:
- Code complexity in dgVoodoo, implementing gpu upload heap "needs development"
- The D3D12 Agility SDK is basically unusable with dgVoodoo because you need to define what version of D3D12 dll's your application wants and where those dll's are. They can only be defined by exporting 2 symbols from the main executable which is not a viable way with dgVoodoo (and you still need driver support for the new features). The only way is an own test app for this and that but that's all. So, all I can do is wait until the given version of the D3D12 dll's make into the mainstream OS through updates.
(btw, I did a little development for a feature (re)appeared in the Agility SDK: triangle fans)

Faking the surface lock is not achievable with dgVoodoo but maybe I could compile such a version for myself and try it.

Since my Intel IGP reported supported row-major rendertargets, I tried UnrealEd with it (native D3D8). It was faster (2s for selecting an object) than with my NV (~5s or worse).
A MS guy told me that in D3D9 it was up to the driver what memory it used an mapped for lockable surfaces. So based on these, I have a feeling that even the Intel driver chooses not to map the rendertarget directly (or it cannot do that in WDDM) but copy the surface data into a row-major surface in the Lock call instead.

So even if a GPU supports row-major RT's (it's a legacy feature so I don't think discrete Intel GPU's supports them, but I don't know anyway) it cannot be mapped into the address space through D3D12. A copy to a non-RT is always needed and probably that's what the current drivers do. I could only do the same in dgVoodoo. Gpu upload heap would only help it to make the copy inside the video memory, which is faster than video-to-system, but even with that I'd expect the same ~2s latency (on my machine) which is slower than the already available 'fast video memory' access technique in dgVoodoo.

Let me know if you end up compiling a version that fakes the surface lock. It seems like this is really our only hope?

I don't understand this at anywhere near the depth that you or others do, but I was wondering if perhaps another option could be to intercept calls to LockRect and use a custom caching mechanism? Instead of locking the actual surface, store a reference to the locked region and allocate a temporary buffer for the requested data. Likewise, calls to modify or unlock would also simply modify the temp buffer?

Perhaps another option could be processing the LockRect calls in parallel (triple buffering?) or some other way of not stalling the main rendering? It seems that UnrealEd makes several calls to this function even when you are not clicking anything.

Odds are that my ideas are rudimentary and unhelpful, but I thought I would suggest them nonetheless.

Happy to continue this conversation on Discord as a faster means of dialogue.

Reply 15 of 22, by Dege

Rank l33t

Yes, I'll compile a faking-lock version and see how it works.

Of course there is a caching mechanism under the hood for locking surfaces (it's multi-level and quite complicated, especially with 'fast vidmem' mode), so subsequent Lock calls won't cause a memory readback every time. But this is only true if the GPU does not modify the content between two Locks.
But, in UnrealEd this is the general pattern:

101782 129.501221 24796 UnrealEd.exe Direct3DSurface8::LockRect (this = 1d471de0, pLockedRect = 5f6310, pRect = 0, Flags = 0)
101783 129.501249 24796 UnrealEd.exe Direct3DSurface8::UnlockRect (this = 1d471de0)
101784 129.501277 24796 UnrealEd.exe Direct3DSurface8::Release (this = 1d471de0)
101785 129.501307 24796 UnrealEd.exe Direct3DVertexBuffer8::Lock (this = 1d802f68, OffsetToLock = 11392, SizeToLock = 768, ppbData = 5f5e48, Flags = 1000)
101786 129.501336 24796 UnrealEd.exe Direct3DVertexBuffer8::Unlock (this = 1d802f68)
101787 129.501371 24796 UnrealEd.exe Direct3D8Device::SetTransform (this = 175b9330, State = D3DTS_WORLD, pMatrix = 1159f964)
101788 129.501400 24796 UnrealEd.exe Direct3D8Device::DrawPrimitive (this = 175b9330, PrimitiveType = D3DPT_LINELIST, StartVertex = 712, PrimitiveCount = 24)
101789 129.501489 24796 UnrealEd.exe Direct3D8Device::GetRenderTarget (this = 175b9330, ppRenderTarget = 5f6314)
101790 129.501518 24796 UnrealEd.exe Direct3DSurface8::QueryInterface (this = 1d471de0)
101791 129.501547 24796 UnrealEd.exe Direct3DSurface8::LockRect (this = 1d471de0, pLockedRect = 5f62fc, pRect = 0, Flags = 0)
101792 129.503337 24796 UnrealEd.exe Direct3DSurface8::UnlockRect (this = 1d471de0)

There are Draw* calls between the surface Locks, so the surface content must be read back from video memory for both the first and the second Lock (and it even has to be written back to video memory for the Draw call, so it's moved back and forth).
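
To make the cache behaviour above concrete, here is a minimal Python sketch of a lockable surface with a CPU-side shadow copy and two dirty flags. It is a toy model only: `ShadowedSurface`, `lock_only` and `lock_draw_lock` are hypothetical names made up for illustration, and the real mechanism in dgVoodoo is multi-level and more involved. The sketch also assumes that every unlock may have written to the copy, since the locks in the trace use Flags = 0. It simply counts how many copies each access pattern forces.

```python
class ShadowedSurface:
    """Toy model of a lockable rendertarget with a CPU-side shadow copy.

    All names are hypothetical; this is not dgVoodoo or Direct3D code.
    """

    def __init__(self, size: int) -> None:
        self.video = bytearray(size)   # stands in for video memory
        self.shadow = bytearray(size)  # CPU-visible copy handed out by lock()
        self.gpu_dirty = True          # GPU content not yet seen by the CPU
        self.cpu_dirty = False         # CPU may have written through a lock
        self.readbacks = 0             # video -> system copies performed
        self.uploads = 0               # system -> video copies performed

    def lock(self) -> bytearray:
        """Model of LockRect: read back only if the GPU changed the surface."""
        if self.gpu_dirty:
            self.shadow[:] = self.video  # the slow video-to-system copy
            self.readbacks += 1
            self.gpu_dirty = False
        return self.shadow

    def unlock(self) -> None:
        """Model of UnlockRect. Assumption: a writable lock may have modified
        the copy, so it has to be treated as changed by the CPU."""
        self.cpu_dirty = True

    def draw(self, value: int) -> None:
        """Model of a Draw* call that renders into the surface."""
        if self.cpu_dirty:
            self.video[:] = self.shadow  # write the CPU copy back first
            self.uploads += 1
            self.cpu_dirty = False
        self.video[0] = value            # the GPU modifies the surface
        self.gpu_dirty = True


def lock_only(n: int) -> ShadowedSurface:
    """n back-to-back Lock/Unlock pairs with no drawing in between."""
    surface = ShadowedSurface(16)
    for _ in range(n):
        surface.lock()
        surface.unlock()
    return surface


def lock_draw_lock(n: int) -> ShadowedSurface:
    """The pattern from the trace: Lock, Unlock, Draw, repeated, then Lock."""
    surface = ShadowedSurface(16)
    for i in range(n):
        surface.lock()
        surface.unlock()
        surface.draw(i % 256)
    surface.lock()
    surface.unlock()
    return surface


if __name__ == "__main__":
    a = lock_only(100)
    b = lock_draw_lock(100)
    print("lock only:      readbacks =", a.readbacks, "uploads =", a.uploads)
    print("lock/draw/lock: readbacks =", b.readbacks, "uploads =", b.uploads)
```

In this model, 100 back-to-back locks cost a single readback, while 100 rounds of the Lock / Draw / Lock pattern cost 101 readbacks and 100 uploads, so the cache never gets a chance to help.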

Reply 16 of 22, by chris.davis925

Rank Newbie
Dege wrote on 2023-04-28, 15:58:
Yes, I'll compile a faking-lock version and see how it works. […]

I shall eagerly await what FakeRect brings. 😀

Reply 17 of 22, by Dege

Rank l33t
chris.davis925 wrote on 2023-04-28, 16:10:
Dege wrote on 2023-04-28, 15:58:
Yes, I'll compile a faking-lock version and see how it works. […]

I tried a version with faked locks for rendertarget surfaces. Sadly, it does not work: I can't select anything. The data read back seems to be important for the editor. 😐

Reply 18 of 22, by chris.davis925

Rank Newbie
Dege wrote on 2023-04-28, 19:55:
chris.davis925 wrote on 2023-04-28, 16:10:
Dege wrote on 2023-04-28, 15:58:
Yes, I'll compile a faking-lock version and see how it works. […]

I am sad to hear this.

In your professional opinion, are there any other ideas that are left to explore? Is this something that Nvidia or Microsoft could fix?

I am willing to open a bounty to help solve this, but it's not clear to me whether it can be fixed in any way (outside of dgVoodoo).

Reply 19 of 22, by Joshhhuaaa

Rank Newbie
Dege wrote on 2023-04-28, 19:55:
chris.davis925 wrote on 2023-04-28, 16:10:
Dege wrote on 2023-04-28, 15:58:
Yes, I'll compile a faking-lock version and see how it works. […]

Hey Dege, just wanted to say thanks for looking further into the LockRect issue. Your efforts are greatly appreciated.

I'm on board with Chris's suggestion to offer a bounty. Many of us are passionate about using these editors for older UE2-era games, and it would mean the world to have them working again. If there are other options to explore, we're willing to try just about anything to make it possible.