VOGONS


dgVoodoo 2 for DirectX 11

Reply 3600 of 3949, by Dege


Update on Drakan: I've just run it again and I must have made a mistake. Draw distance seems to be normal after all.

UCyborg, could you check it out plz (these files are from your patch)?

http://dege.fw.hu/temp/Drakan_fastvidmem_dept … uffer_patch.zip

So, what I did:

- Added 'poking' code to the surface data reader (Drakan.exe)
- Added code to accept D24S8 and D24X8 depth buffer formats (Drakan.exe)
- Added mask-to-24-bit code for reading back depth values (Dragon.rfl)

Now it runs at 60fps for me in 2560x1440 8x MSAA with lens flares (can't test it at higher resolutions atm).

Reply 3602 of 3949, by UCyborg

Dege wrote:

Update on Drakan: I've just run it again and I must have made a mistake. Draw distance seems to be normal after all.

UCyborg, could you check it out plz (these files are from your patch)?

http://dege.fw.hu/temp/Drakan_fastvidmem_dept … uffer_patch.zip

Thanks, will do! I was just writing the following wall of text. I'm a bit slow. Perhaps some questions will be answered by looking at your patches.

Got some results from the laptop with an Intel Core i3-3110M 2.4 GHz + Intel HD Graphics 4000. Drakan works quite nicely on it (absent lens flares, but more on that later) without dgVoodoo; the lowest recorded FPS number I got was 45 on the Islands level while being bombarded by 4 other dragons. I had the resolution set to the maximum available, 1366x768, and draw distance to 200%.

Things were slower with dgVoodoo. First, the frame rate was quite low (choppy mouse) in the menu when there was a background. In-game FPS was nicely at 60 until I doubled the draw distance from the original maximum of 100%. It was around 25 in that scene at the end of the Wartok Canyons level (the one I uploaded a save of in my thread on page 178).

We could say the codepaths handling Direct3D 10+ aren't as fast in that card's driver, at least for this scenario.

Now some dumb questions and thoughts:

Dege wrote:

The base problem with the cursor bitmap reading is that dgVoodoo doesn't notice that surface content changes. I inserted a little code snippet into the reader function that 'pokes' the mapped surface area, by writing one byte into it before ReadFile gets called.

Do I understand correctly that dgVoodoo doesn't notice only when ReadFile writes into the surface, but doing it manually (memcpy?) would work? OllyDbg catches many PAGE_GUARD_VIOLATION exceptions with enabled fast memory access, which, if I understood correctly, you use to detect changes. But nothing happens when you move the mouse. I think I'm missing something here, ReadFile gets called continuously when moving the mouse over one of the gems in the menu, but not when it's moved anywhere else.

Dege wrote:

Dirty, but works for the cursor.

I recall 2 dirty hacks in my patching attempts as well. The game is quirky and I'm no software engineer.

Dege wrote:

The game can only work with either pure 16 bit or pure 24 bit depth buffers.

Is this the number the callback function checks at the offset 0x0C from the pointer in CPU register at two places?

Dege wrote:

By default it chooses the largest bit depth, so 24 bit in our case. When saying 24 bit, I mean real 24 bit, 3 bytes per pixel.

Is it just me, or is that function a bit flawed, assuming the answer to my previous question is positive? Both natively and with dgVoodoo, the first enumerated buffer depth is 16-bit, but the second is 24-bit with dgVoodoo and 32-bit natively. The engine cancels enumeration in both cases, but only in the latter case do lens flares disappear completely. I need to test this more; the game eventually crashes in Dragon.rfl (out-of-bounds pointer?) when it uses a 32-bit buffer, but not with lower depths. Islands level, "iamgod" cheat and being bombarded by enemy dragons is a good way to get it to crash.

Dege wrote:

either limit Drakan to 16 bit depth buffers (doesn't sound good, I don't know what glitches it brings in)

Lens flares are perfect, but you get wallhack effect in the distance.

Dege wrote:

Then I tried the other way, what if I force the callback function to accept D24S8 depth format (all in all it's 32bit but the depth component is still 24). No more fps drop but lens flare disappeared in return

I guess this happens natively all the time. If I get it to use any format that isn't 32-bit, lens flares show up.

Dege wrote:

so I checked out the code reading back the depth values.
Indeed, it behaved badly, so I patched this one too. The lens flares were back but something went wrong with the game's draw distance...

So to conclude, the ideal solution would be to patch the function you mention somehow. Now I'm curious what the callback function for Z-buffer formats decides to use on Voodoo 2. I'll check it out in the emulator.

lowenz wrote:

What a case of study for a game of 20 years ago 😁

I love these! Always good to get enlightened after wandering in the darkness.

Arthur Schopenhauer wrote:

A man can be himself only so long as he is alone; and if he does not love solitude, he will not love freedom; for it is only when he is alone that he is really free.

Reply 3603 of 3949, by Dege

UCyborg wrote:

Do I understand correctly that dgVoodoo doesn't notice only when ReadFile writes into the surface, but doing it manually (memcpy?) would work? OllyDbg catches many PAGE_GUARD_VIOLATION exceptions with enabled fast memory access, which, if I understood correctly, you use to detect changes.

Yes, exactly. But if the memory is written by ReadFile then that happens in kernel mode so the guardpage exception is not raised.
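The effect of the 'poke' can be illustrated with a toy model in portable C. This is only a sketch under loud assumptions: `TrackedSurface`, `user_write`, `kernel_fill` and `poke_then_fill` are hypothetical names, the `dirty` flag stands in for whatever dgVoodoo does when its guard page fires, and a plain `memcpy` stands in for ReadFile filling the buffer from kernel mode. Nothing here is dgVoodoo's or Drakan's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mapped surface whose changes are detected
   through a guard page: only user-mode accesses trip the guard. */
typedef struct {
    unsigned char pixels[64];
    bool dirty;               /* set when the modelled guard page fires */
} TrackedSurface;

/* User-mode write: models the guard-page exception marking the surface dirty. */
void user_write(TrackedSurface *s, size_t offset, unsigned char value)
{
    assert(offset < sizeof s->pixels);
    s->dirty = true;
    s->pixels[offset] = value;
}

/* Models ReadFile: the copy bypasses the tracked path, so the dirty flag
   is left untouched even though the pixels change. */
void kernel_fill(TrackedSurface *s, const unsigned char *src, size_t len)
{
    assert(len <= sizeof s->pixels);
    memcpy(s->pixels, src, len);
}

/* The 'poke': rewrite the first byte with its own value so the change
   tracker fires before the untracked fill happens. */
void poke_then_fill(TrackedSurface *s, const unsigned char *src, size_t len)
{
    user_write(s, 0, s->pixels[0]);
    kernel_fill(s, src, len);
}
```

In this model, calling `kernel_fill` alone changes the pixels but leaves `dirty` false, while `poke_then_fill` leaves the same pixels with `dirty` set, which is the behaviour the patch relies on.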

UCyborg wrote:

But nothing happens when you move the mouse. I think I'm missing something here, ReadFile gets called continuously when moving the mouse over one of the gems in the menu, but not when it's moved anywhere else.

ReadFile probably reads something other than bitmap data. Or it's filling textures that were already updated but not used since then, so no additional exceptions are raised.

UCyborg wrote:

Is this the number the callback function checks at the offset 0x0C from the pointer in CPU register at two places?

No, I think of the following callback function:

Edit: yes, we're talking about the same. At first I misunderstood it, sorry.

0043CFE0 8B 44 24 04          mov         eax,dword ptr [esp+4]      <--- eax points to a DDPIXELFORMAT descriptor
0043CFE4 56                   push        esi
0043CFE5 57                   push        edi
0043CFE6 81 78 04 00 04 00 00 cmp         dword ptr [eax+4],400h     <--- check if it's a format for a pure z-buffer (only the DDPF_ZBUFFER flag allowed)
0043CFED 75 22                jne         0043D011
0043CFEF 8B 7C 24 10          mov         edi,dword ptr [esp+10h]
0043CFF3 8B 48 0C             mov         ecx,dword ptr [eax+0Ch]    <--- ecx = bitcount member of the structure
0043CFF6 3B 4F 0C             cmp         ecx,dword ptr [edi+0Ch]    <--- if it's less than or equal to the maximum enumerated so far then skip this format
0043CFF9 76 16                jbe         0043D011
0043CFFB B9 08 00 00 00       mov         ecx,8
0043D000 8B F0                mov         esi,eax
0043D002 F3 A5                rep movs    dword ptr es:[edi],dword ptr [esi]  <--- copying the structure
0043D004 83 78 0C 18          cmp         dword ptr [eax+0Ch],18h    <--- if bitcount is >= 24 then stop the enumeration
0043D008 72 07                jb          0043D011
0043D00A 5F                   pop         edi
0043D00B 33 C0                xor         eax,eax
0043D00D 5E                   pop         esi
0043D00E C2 08 00             ret         8
0043D011 5F                   pop         edi
0043D012 B8 01 00 00 00       mov         eax,1
0043D017 5E                   pop         esi
0043D018 C2 08 00             ret         8

Indeed, it can work with 32 bit too, if no 24 bit format is enumerated.
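For readers who prefer C to disassembly, the callback above can be read roughly as follows. This is a hand translation, not recovered source: `ZFormat` is a local mirror of the 32-byte descriptor with illustrative field names (only the offsets are taken from the listing), and `pick_zformat`, `FLAG_ZBUFFER`, `ENUM_CONTINUE` and `ENUM_STOP` are hypothetical names for the values 400h, 1 and 0 seen in the code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 32-byte descriptor the callback receives.
   Field names are illustrative; only the offsets come from the listing. */
typedef struct {
    uint32_t size;            /* +0x00 */
    uint32_t flags;           /* +0x04 */
    uint32_t fourcc;          /* +0x08 */
    uint32_t z_bit_count;     /* +0x0C */
    uint32_t stencil_bits;    /* +0x10 */
    uint32_t z_bit_mask;      /* +0x14 */
    uint32_t stencil_mask;    /* +0x18 */
    uint32_t rgbz_mask;       /* +0x1C */
} ZFormat;

enum { FLAG_ZBUFFER = 0x400, ENUM_CONTINUE = 1, ENUM_STOP = 0 };

/* fmt is the enumerated format, best is the caller's "largest so far". */
int pick_zformat(const ZFormat *fmt, ZFormat *best)
{
    assert(fmt != NULL && best != NULL);
    if (fmt->flags != FLAG_ZBUFFER)               /* cmp [eax+4],400h / jne */
        return ENUM_CONTINUE;
    if (fmt->z_bit_count <= best->z_bit_count)    /* cmp ecx,[edi+0Ch] / jbe */
        return ENUM_CONTINUE;
    memcpy(best, fmt, sizeof *best);              /* rep movs, 8 dwords */
    /* cmp [eax+0Ch],18h / jb: stop as soon as 24 bits or more are found */
    return fmt->z_bit_count >= 24 ? ENUM_STOP : ENUM_CONTINUE;
}
```

Note that the comparison on the flags is an equality test, so a format that also carries a stencil flag is skipped, and that nothing in the function looks at the depth bitmask at offset 14h.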

UCyborg wrote:

Is it just me, or is that function a bit flawed, assuming the answer to my previous question is positive? Both natively and with dgVoodoo, the first enumerated buffer depth is 16-bit, but the second is 24-bit with dgVoodoo and 32-bit natively. The engine cancels enumeration in both cases, but only in the latter case do lens flares disappear completely. I need to test this more; the game eventually crashes in Dragon.rfl (out-of-bounds pointer?) when it uses a 32-bit buffer, but not with lower depths. Islands level, "iamgod" cheat and being bombarded by enemy dragons is a good way to get it to crash.

OK, I think I know what it is: the D24 format (old-fashioned pure 3-byte depth) is not supported natively, only D24S8 and D24X8. But the latter are considered 32 bit formats, and D24X8 passes the filter based on flags (DDPF_ZBUFFER), but the game doesn't check the depth bitmask in the descriptor. So it thinks it's working with a full 32 bit format when in reality only 24 bits hold depth. This is certainly a problem for the code reading back depth pixels.

As for the crash, yes, the code doesn't do bounds checking. It may overread the last scanline.

UCyborg wrote:

I guess this happens natively all the time. If I get it to use any format that isn't 32-bit, lens flares show up.

The game doesn't handle 32 bit z-formats correctly, so I modified the callback above to accept 32 bit formats, but only when the depth mask in the descriptor is 0x00FFFFFF. In addition, the pixel mask calculated from the bitcount number in Dragon.rfl is now always ANDed with 0x00FFFFFF. That way the real D32 format is excluded but all of D24, D24S8 and D24X8 are accepted and handled properly.
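The two rules described in this post can be sketched in C as follows. This is an illustration of the stated rules only, not the patched code: `accept_depth_format` and `depth_read_mask` are hypothetical names, and how the real patch treats the flags field or bit counts other than 16, 24 and 32 is not stated in the thread, so the sketch simply lets them through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEPTH_MASK_24 0x00FFFFFFu

/* Assumed acceptance rule: a 32-bit descriptor is usable only when its
   depth mask says that exactly the low 24 bits carry depth. */
bool accept_depth_format(uint32_t bit_count, uint32_t z_bit_mask)
{
    if (bit_count == 32)
        return z_bit_mask == DEPTH_MASK_24;   /* D24S8, D24X8; rejects real D32 */
    return true;                              /* other bit counts pass as before */
}

/* Mask applied to each value read back from the depth buffer: derived
   from the bit count, then clamped so it is never wider than 24 bits. */
uint32_t depth_read_mask(uint32_t bit_count)
{
    assert(bit_count >= 1 && bit_count <= 32);
    uint32_t from_bits = (bit_count == 32) ? 0xFFFFFFFFu
                                           : ((1u << bit_count) - 1u);
    return from_bits & DEPTH_MASK_24;
}
```

With this clamp, a raw D24S8 value such as 0xAB123456 reads back as depth 0x123456, so the stencil byte no longer pollutes the comparison.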

Reply 3605 of 3949, by Dege


Chaos Legion handles non-power-of-2 textures improperly. How it calculates texture data is a bit subtle, but it doesn't work for non-pow2 ones. It's not a D3D8 bug but one in the application.
I guess video hardware didn't support non-pow2 textures back when the game was released, so that code path remained untested.
Anyway, I modified the code to discard the result of checking for D3DPTEXTURECAPS_POW2, D3DPTEXTURECAPS_CUBEMAP_POW2 and D3DPTEXTURECAPS_VOLUMEMAP_POW2 and always create pow2 sized textures. This works:

http://dege.fw.hu/temp/ChaosLegion_texture_patch.zip

Thanks for the idea to the creator of the D3D8-patch!!
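"Always create pow2 sized textures" comes down to rounding each requested dimension up to the next power of two. A minimal C sketch of that rounding, assuming dimensions between 1 and 2^31; `next_pow2` is a hypothetical helper and not code taken from the Chaos Legion patch:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is greater than or equal to n. */
uint32_t next_pow2(uint32_t n)
{
    assert(n >= 1 && n <= 0x80000000u);
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

For example, a 640x480 request would become 1024x512, while a 256x256 request is left unchanged.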

Blade of Darkness: doesn't fast vidmem access help? Still not tested though.

Reply 3608 of 3949, by UCyborg


Thank you for your explanations, Dege! So the patch you made works great for bringing back the lens flares in Drakan: we get good performance through dgVoodoo, and lens flares work consistently natively.

Just the stability issue with the problematic function in Dragon.rfl remains, the crash in the nested loop a bit further down from the patch point, the first instruction of the inner loop:

CPU Disasm
Address Hex dump Command Comments
100A3DFB |> /8B01 |/MOV EAX,DWORD PTR DS:[ECX] ; May crash
100A3DFD |. |23C5 ||AND EAX,EBP
100A3DFF |. |8B6C24 2C ||MOV EBP,DWORD PTR SS:[ESP+2C]
100A3E03 |. |3BC5 ||CMP EAX,EBP
100A3E05 |. |72 04 ||JB SHORT 100A3E0B
100A3E07 |. |FF4424 18 ||INC DWORD PTR SS:[ESP+18]
100A3E0B |> |8B6C24 14 ||MOV EBP,DWORD PTR SS:[ESP+14]
100A3E0F |. |8D04B6 ||LEA EAX,[ESI*4+ESI]
100A3E12 |. |03C8 ||ADD ECX,EAX
100A3E14 |. |4A ||DEC EDX
100A3E15 |.^\75 E4 |\JNZ SHORT 100A3DFB

I wonder if the data obtained earlier through the call to the Lock method could be used to somehow fix this. The crash frequency depends on the tolerance of accessing out-of-bounds memory of the lower-level systems. I couldn't get it to crash natively on my NVIDIA at all while it went down pretty quickly on Intel. With dgVoodoo, page guard exception is raised without being handled. The code can then continue through debugger, there's just empty memory there.
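One way to picture a fix is a C version of the inner loop with the missing bounds check added. This is a sketch based on one reading of the listing, not a verified reconstruction: it assumes EBP holds the depth mask, [ESP+2C] a threshold, [ESP+18] a counter and ESI*5 a byte stride, and all names (`count_passing_samples`, `buf_size`, `step`) are hypothetical. The buffer size would have to come from the surface description, which the original loop never consults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts samples whose masked value reaches the threshold, reading one
   32-bit value every `step` bytes and stopping at the end of the buffer. */
uint32_t count_passing_samples(const unsigned char *buf, size_t buf_size,
                               size_t offset, size_t step, uint32_t samples,
                               uint32_t mask, uint32_t threshold)
{
    assert(buf != NULL && step > 0);
    uint32_t hits = 0;
    for (uint32_t i = 0; i < samples; ++i, offset += step) {
        /* The bounds check the original loop lacks. */
        if (offset > buf_size || buf_size - offset < sizeof(uint32_t))
            break;
        uint32_t value;
        memcpy(&value, buf + offset, sizeof value);   /* MOV EAX,[ECX] */
        if ((value & mask) >= threshold)              /* AND / CMP / JB */
            ++hits;                                   /* INC [ESP+18] */
    }
    return hits;
}
```

Stopping early changes the count for flares near the surface edge, so whether this is visually acceptable in the game would still need testing.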

One function using 3DNow! instructions has a similar bug: it accesses out-of-bounds memory, though it doesn't do any harm. Application Verifier can reveal it.


Reply 3610 of 3949, by ZellSF

robertmo wrote:
willow wrote:

A DX9 wrapper in dgVoodoo2 could be useful because AMD has introduced bugs in the latest drivers with some old DX9 games and doesn't want to solve these problems
http://www.dsogaming.com/news/amd-will-most-l … renalin-driver/

Use nVidia.

You're, uh, replying to an old post; AMD has already fixed that issue. Also, even if that wasn't the case, there is a D3D9 wrapper you can recommend before going to the extreme of switching GPU: WineD3D. It sadly has no resolution forcing, so there's no way to fix GPU scaling or monitor issues with it, but it should be fine as a compatibility wrapper.

Dege wrote:

Chaos Legion handles non-power-of-2 textures improperly. How it calculates texture data is a bit subtle, but it doesn't work for non-pow2 ones. It's not a D3D8 bug but one in the application.
I guess video hardware didn't support non-pow2 textures back when the game was released, so that code path remained untested.
Anyway, I modified the code to discard the result of checking for D3DPTEXTURECAPS_POW2, D3DPTEXTURECAPS_CUBEMAP_POW2 and D3DPTEXTURECAPS_VOLUMEMAP_POW2 and always create pow2 sized textures. This works:

http://dege.fw.hu/temp/ChaosLegion_texture_patch.zip

Thanks for the idea to the creator of the D3D8-patch!!

Works great (though obviously I had hoped it would be something that made sense to fix on a more general basis):

[Attachment: ChaosLegion 2018-05-19 10-28-47-16.jpg, 554.22 KiB]

Reply 3611 of 3949, by robertmo

ZellSF wrote:

extreme of switching GPU

I call it unavoidable PC modernization (combined with replacing broken part in this case).
The easiest one I can think of.
You sell your old card to someone who doesn't care about old games.
And buy whatever you want new or used.
You also vote for better support 😉
amd and ati were always cheap solution for speed enthusiasts anyway

Reply 3612 of 3949, by ZellSF

robertmo wrote:
ZellSF wrote:

extreme of switching GPU

I call it unavoidable PC modernization (combined with replacing broken part in this case).
The easiest one I can think of.
You sell your old card to someone who doesn't care about old games.
And buy whatever you want new or used.
You also vote for better support 😉
amd and ati were always cheap solution for speed enthusiasts anyway

Wow how much are Nvidia paying you?

Reply 3614 of 3949, by UCyborg


As if NVIDIA's drivers are perfect. You'll see NVIDIA specific problems mentioned just one page back.


Reply 3616 of 3949, by UCyborg


Is it just me or do some of the inventory icons in Drakan get messed up with fast video memory access enabled?

Nope, happens here too. At first it looked like there is a pattern and only happens for items that cannot be obtained using cheat codes. But the sword Sting, found on the Islands, behaves fine.

Guess there is another spot that should be "poked". Edit: Maybe not, conditional breakpoint doesn't reveal any other place from where the game issues Lock method call.

Edit: save file for testing


Reply 3617 of 3949, by Dege

UCyborg wrote:

I wonder if the data obtained earlier through the call to the Lock method could be used to somehow fix this.

How do you mean that?

UCyborg wrote:

The crash frequency depends on the tolerance of accessing out-of-bounds memory of the lower-level systems. I couldn't get it to crash natively on my NVIDIA at all while it went down pretty quickly on Intel. With dgVoodoo, page guard exception is raised without being handled. The code can then continue through debugger, there's just empty memory there.

Yes, good old days. I still remember that writing past the bounds of a mapped vertex buffer or something like that could cause even a BSOD or GPU halt with my 5700 Ultra, because some other GPU-related thing could be sharing the last mapped memory page of the buffer. 😁
In dgVoodoo I added some guarding to the beginning and the end of the mappings because of that kind of overaddressing. It's typical when a game reads, for example, 2x2, 3x3 or 5 pixels to calculate an average or something like that.

UCyborg wrote:

One function using 3DNow! instructions has a similar bug: it accesses out-of-bounds memory, though it doesn't do any harm. Application Verifier can reveal it.

I think it's not a problem by now; 3DNow! is not supported by any CPUs anymore.

For the overaddressing of Drakan, I remember some code checking if the calculated sun position (coordinates) is within the rect of the surface. Maybe patching the code to check against a rect deflated by 1-2 pixels could solve the problem.
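The deflated-rect idea can be sketched in C like this. It is only an illustration of the proposed check, with hypothetical names (`Rect`, `inside_deflated`) and an assumed convention that `right` and `bottom` are exclusive; the actual check inside Dragon.rfl has not been located in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Surface rectangle; right and bottom are exclusive (assumed convention). */
typedef struct { int left, top, right, bottom; } Rect;

/* True when (x, y) lies inside r after shrinking it by `margin` pixels on
   every side, so a small neighbourhood around the point stays in bounds. */
bool inside_deflated(Rect r, int margin, int x, int y)
{
    assert(margin >= 0);
    return x >= r.left + margin && x < r.right - margin &&
           y >= r.top + margin && y < r.bottom - margin;
}
```

On a 1366x768 surface with a margin of 2, a sun position at the bottom row (y = 767) would be rejected instead of letting the sampling code read past the last scanline.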

UCyborg wrote:

Is it just me or do some of the inventory icons in Drakan get messed up with fast video memory access enabled?

Nope, happens here too. At first it looked like there is a pattern and only happens for items that cannot be obtained using cheat codes. But the sword Sting, found on the Islands, behaves fine.

Guess there is another spot that should be "poked". Edit: Maybe not, conditional breakpoint doesn't reveal any other place from where the game issues Lock method call.

Edit: save file for testing

Then I'll have to trace back where the data comes from. I hope it's another ReadFile somewhere. Cannot do it atm. But thanks for the savefile.

Reply 3618 of 3949, by Dege

KainXVIII wrote:

Well, I always used the 480p in-game setting for Severance, but the game still doesn't feel smooth enough (despite the 60 fps counter); the NGlide wrapper is a little smoother for some reason (but colors are washed out)

Doesn't fast vidmem access for Glide ('Force emulating true PCI access' on the CPL) help there too?

Reply 3619 of 3949, by UCyborg

Dege wrote:

How do you mean that?

I don't know, just the lpDDSurfaceDesc parameter of IDirectDrawSurface4::Lock came to mind. That function in Dragon.rfl dealing with lens flares is black magic to me.

Dege wrote:

I think it's not a problem by now; 3DNow! is not supported by any CPUs anymore.

True. I wonder though, could it be mis-detected through CPUID? The original code doesn't check the CPU vendor before checking those bits returned by CPUID function 80000001h, which is historically valid; I know there was one VIA CPU at the time that had that instruction set. Today, Intel's manual says that the exact bit that would indicate 3DNow! on AMD CPUs is reserved and that its value should not be relied on.
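A vendor-gated check could look roughly like the following C sketch. It is an assumption-laden illustration, not the game's detection routine: `has_3dnow` is a hypothetical helper, the vendor string and the EDX value of CPUID function 80000001h are passed in as plain parameters so the sketch stays portable, and the two-entry vendor whitelist is illustrative rather than a complete list of CPUs that ever shipped 3DNow!.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* EDX bit 31 of CPUID function 80000001h signals 3DNow! on AMD CPUs. */
#define EDX_3DNOW_BIT (1u << 31)

/* vendor is the 12-character CPUID vendor string, NUL-terminated. */
bool has_3dnow(const char *vendor, uint32_t ext_edx)
{
    assert(vendor != NULL);
    /* Illustrative whitelist: trust the bit only on vendors assumed to
       define it; on others the bit is treated as meaningless. */
    bool trusted = strcmp(vendor, "AuthenticAMD") == 0 ||
                   strcmp(vendor, "CentaurHauls") == 0;
    return trusted && (ext_edx & EDX_3DNOW_BIT) != 0;
}
```

With such a gate, a set bit 31 reported under the "GenuineIntel" vendor string would be ignored rather than taken as proof of 3DNow! support.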

Interestingly, at the end of the CPU detection routine, the game verifies the presence of SSE instructions (which it doesn't use anywhere) by executing ORPS XMM0,XMM0. If the instruction faults, an exception handler is supposed to run and clear the bit in the variable indicating SSE presence. But the exception handler can never run, because some variable used by the Structured Exception Handling mechanism is not set correctly by earlier code, presumably due to the developers mixing inline assembly with C code and compiler optimizations interfering.

I fixed that code in my patch: I made exception handling work and made that verification sequence use 3DNow!'s FEMMS, which I presume was the intended behavior. So there's definitely less than zero chance of the engine trying to use 3DNow! on CPUs that don't have it, and there's always the -non3dnow command line parameter anyway.

I noticed the bug because I tried the game in a virtual machine with Windows NT 4.0 (the server works on it) and it crashed due to the broken exception handling.

Dege wrote:

For the overaddressing of Drakan, I remember some code checking if the calculated sun position (coordinates) is within the rect of the surface. Maybe patching the code to check against a rect deflated by 1-2 pixels could solve the problem.

Interesting. There indeed seems to be a pattern here. It's really not about how many lens flares are on the screen. When you're flying away from the dragons bombarding you, it's common for some flare to be at the very bottom of the screen and the chances of crashing are very high. But when everything is concentrated at the center, things look better.

Dege wrote:

Then I'll have to trace back where the data comes from. I hope it's another ReadFile somewhere. Cannot do it atm. But thanks for the savefile.

No need to rush with investigation for either problem. Thanks for all you do!

Last edited by UCyborg on 2018-05-21, 13:57. Edited 1 time in total.