In short, the process is the following for shaders:
D3D8 (1.x) shader bytecode (coming from the app) -> HLSL (translated by dgVoodoo) -> D3D11 (4.x) shader bytecode (D3DCompiler) -> GPU-specific code (translated by the display driver)
The most time-consuming part is compiling from HLSL to 4.x shader bytecode with D3DCompiler (independent of the GPU).
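Just to illustrate that step (this is only a sketch with made-up names, not dgVoodoo's actual code), the generated HLSL goes through D3DCompile to get the 4.x bytecode:

#include <windows.h>
#include <d3dcompiler.h>
#include <wrl/client.h>
#pragma comment(lib, "d3dcompiler.lib")

using Microsoft::WRL::ComPtr;

// Compiles translated HLSL into D3D11 (4.x) pixel shader bytecode.
// This D3DCompile call is the time-consuming, purely CPU-side part;
// it does not depend on the GPU at all.
HRESULT CompileTranslatedHlsl(const char* hlslSource, size_t sourceLen,
                              ComPtr<ID3DBlob>& bytecodeOut)
{
    ComPtr<ID3DBlob> errors;
    HRESULT hr = D3DCompile(hlslSource, sourceLen,
                            nullptr,             // source name (only for error messages)
                            nullptr, nullptr,    // no macros / includes
                            "main",              // entry point (assumed name)
                            "ps_4_0",            // 4.x target profile
                            D3DCOMPILE_OPTIMIZATION_LEVEL3, 0,
                            &bytecodeOut, &errors);
    if (FAILED(hr) && errors)
        OutputDebugStringA(static_cast<const char*>(errors->GetBufferPointer()));
    return hr;
}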
Translating to GPU code could also be critical; that's why pre-warming shaders is generally recommended to avoid the first-use performance penalty. However, I haven't really experienced such a (measurable) effect with dgVoodoo, so I think the lack of pre-warming is beside the point here.
For vertex shaders the whole process above can be done at creation time (when the app creates the D3D8 shader), but not for D3D8 pixel shaders.
For those, only an HLSL template can be generated, from which multiple concrete D3D11 shader instances are generated later, during rendering, as needed according to the logical D3D8 pipeline state (deferred creation).
That's why some lagging occurs when entering new areas, looking at certain objects, etc.
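Conceptually it's something like the following sketch; every name here is hypothetical, only the idea of a per-shader cache keyed by the D3D8 pipeline state matches the description above:

#include <windows.h>
#include <d3d11.h>
#include <d3dcompiler.h>
#include <wrl/client.h>
#include <cstdint>
#include <string>
#include <unordered_map>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: bakes the relevant D3D8 pipeline state into the HLSL template.
static std::string SpecializeTemplate(const std::string& hlslTemplate, uint64_t stateKey)
{
    (void)stateKey;          // placeholder; a real implementation would edit the source
    return hlslTemplate;
}

struct D3D8PixelShaderWrapper
{
    std::string hlslTemplate;   // generated once, when the app creates the D3D8 shader

    // Concrete D3D11 instances, keyed by the logical D3D8 pipeline state
    // they were specialized for (the key type is just an example).
    std::unordered_map<uint64_t, ComPtr<ID3D11PixelShader>> instances;

    ID3D11PixelShader* GetOrCreate(ID3D11Device* device, uint64_t stateKey)
    {
        auto it = instances.find(stateKey);
        if (it != instances.end())
            return it->second.Get();                 // already compiled: no hitch

        // First use with this state combination: specialize and compile now,
        // in the middle of rendering; this is where the lag comes from.
        std::string source = SpecializeTemplate(hlslTemplate, stateKey);
        ComPtr<ID3DBlob> blob;
        if (FAILED(D3DCompile(source.c_str(), source.size(), nullptr, nullptr, nullptr,
                              "main", "ps_4_0", 0, 0, &blob, nullptr)))
            return nullptr;

        ComPtr<ID3D11PixelShader> ps;
        if (FAILED(device->CreatePixelShader(blob->GetBufferPointer(),
                                             blob->GetBufferSize(), nullptr, &ps)))
            return nullptr;

        return instances.emplace(stateKey, ps).first->second.Get();
    }

    // When the game destroys the D3D8 shader, the wrapper and its cached
    // instances go away with it, so a level reload starts from scratch.
};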
When the D3D8 shaders are destroyed by the game (e.g. when leaving a game level), the cached instances are destroyed along with them.
So when a new level is loaded and the shaders are recreated, the whole process repeats.
(When returning from Alt-Tab the game may recreate its shaders; I don't know.)
With the Komat fix, only pure D3D8 is used, so compiling with D3DCompiler is skipped entirely; the driver eats the 1.x shader code directly.
I have plans to avoid the compile bottleneck for the fixed-function pipeline and Glide, since dgVoodoo has precompiled shaders, but that won't help with D3D8 shaders, unfortunately.
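For comparison, a precompiled shader only needs the cheap creation call, because the 4.x bytecode already exists; roughly (again just a sketch, with example names):

#include <windows.h>
#include <d3d11.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Creating a D3D11 shader from already-compiled bytecode: no D3DCompile involved,
// so the expensive CPU-side step disappears. Only the driver's own translation
// to GPU code remains.
HRESULT CreateFromPrecompiledBlob(ID3D11Device* device,
                                  const void* bytecode, size_t bytecodeSize,
                                  ComPtr<ID3D11PixelShader>& psOut)
{
    return device->CreatePixelShader(bytecode, bytecodeSize, nullptr, &psOut);
}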