Is this true for all extenders?
The DPMI specification mandates the detection order DPMI, VCPI, XMS, INT 15h (Raw); that is, VCPI *before* XMS, which is what DOS/32A does. There are, however, ways to cheat: for example, if a VCPI host is present but inactive, you could theoretically just ignore it and use XMS/Raw, which are faster. There are some implications, though. First, a VCPI host might reserve memory for EMS, so the DOS Extender would not be able to use all the memory in the system. Then there are compatibility issues: there are lots of different VCPI implementations out there, and the only way to prove the detection code correct is lots of empirical testing, which takes a lot of time.
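To make the order concrete, here is a minimal sketch in C of a first-match selection over DPMI, VCPI, XMS and Raw. It is not the DOS/32A code (that lives in assembly); the names `HostCaps`, `HostType` and `select_host` are made up for illustration, and the actual probing of each interface is assumed to have happened already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the probes found; not DOS/32A's real data. */
typedef struct {
    bool dpmi;  /* a DPMI host answered */
    bool vcpi;  /* a VCPI host answered */
    bool xms;   /* an XMS driver is installed */
} HostCaps;

typedef enum { HOST_DPMI, HOST_VCPI, HOST_XMS, HOST_RAW } HostType;

/* Walk the order DPMI, VCPI, XMS, then raw INT 15h as the last resort. */
HostType select_host(const HostCaps *caps)
{
    assert(caps != NULL);
    if (caps->dpmi) return HOST_DPMI;
    if (caps->vcpi) return HOST_VCPI;  /* VCPI wins over XMS when both exist */
    if (caps->xms)  return HOST_XMS;
    return HOST_RAW;
}
```

With both a VCPI host and an XMS driver present, this picks VCPI, which is exactly the case where the "cheat" described above would pick XMS instead.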
Most of the performance hit when running DOS/32A under VCPI comes from mode switching. Since the DOS Extender creates its own memory heap at startup (that's the reason for the short pause), the actual memory allocation at run time is just about as fast as under XMS/Raw.
The current DOS/32A beta 9.0.2 includes some code to enable the [XMS, VCPI] detection order; however, it is commented out right now. If you feel like rebuilding the DOS Extender, you can uncomment lines 109..121 in kernel/detect.asm and see how it works. The current solution is flawed: if VCPI grabs all the memory, the DOS Extender is left with no memory at all (this is unusual but can happen). I'll need to rework some of the detection/initialization code before the solution is complete.
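One way the flaw could be guarded against is sketched below in C. This is only an assumption about what a rework might look like, not what the commented-out code in kernel/detect.asm does; `MemProbe`, `MemSource` and `select_xms_first` are hypothetical names, and the free-memory figure is assumed to come from an earlier XMS query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe results gathered before choosing a memory source. */
typedef struct {
    bool vcpi;                  /* a VCPI host is present */
    bool xms;                   /* an XMS driver is installed */
    unsigned long xms_free_kb;  /* free extended memory reported via XMS */
} MemProbe;

typedef enum { USE_VCPI, USE_XMS, USE_RAW } MemSource;

/* XMS-first order that falls back to VCPI when XMS has nothing left. */
MemSource select_xms_first(const MemProbe *p)
{
    assert(p != NULL);
    /* Prefer XMS only when it actually has memory left to hand out. */
    if (p->xms && p->xms_free_kb > 0) return USE_XMS;
    /* XMS empty or absent: a VCPI host may be holding the memory. */
    if (p->vcpi) return USE_VCPI;
    /* No VCPI host: keep the usual order, XMS (even if empty) before raw. */
    if (p->xms) return USE_XMS;
    return USE_RAW;
}
```

The point of the check is that the faster XMS path is taken only when it can actually supply memory; when the VCPI host has grabbed everything, the extender ends up on VCPI as it would under the standard order.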