Windows Media Player spends some CPU time avoiding visible tearing; this overhead is proportional to the window height and does not decrease with a faster CPU. It can be significant for full-screen video but should be minimal when the player window is small. The overhead is also higher for DOSBox-recorded videos than for regular videos because of their high frame rate.
For videos with a large frame size, there is also significant additional overhead to convert the video to 16-bit or 32-bit RGB, since the ZMBV decoder only outputs 24-bit RGB. DirectShow's Color Space Converter is terrible at this (the disassembly shows unoptimized scalar code with a partial register stall on every pixel), and on a 640x480 video VTune shows the 24->32 routine in quartz.dll taking as much time as the ZMBV decoder itself. Adding direct 16-bit and 32-bit output to the ZMBV decoder would eliminate this conversion pass and shouldn't be too difficult.
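To make the cost of that extra pass concrete, here is a minimal sketch of scalar 24-to-32-bit and 24-to-16-bit row conversion in C. It is not the quartz.dll routine or code from the ZMBV decoder; the function names are hypothetical, the input is assumed to be rows of 24-bit pixels in Windows DIB byte order (blue, green, red), and the 16-bit target is assumed to be RGB565. However it is written, this pass touches every pixel of every frame again after decoding, which is exactly what direct 16-bit or 32-bit output from the decoder would avoid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from quartz.dll or the ZMBV source.
 * Input rows are assumed to hold 24-bit pixels in DIB byte order: B, G, R. */

/* 24-bit BGR -> 32-bit pixels stored as 0x00RRGGBB words
 * (B, G, R, X byte order on little-endian machines such as x86 Windows). */
static void convert_row_24_to_32(const uint8_t *src, uint32_t *dst, size_t width)
{
    for (size_t x = 0; x < width; ++x) {
        /* Assemble the whole output word with full-width shifts and ORs,
         * then store it once, instead of writing individual bytes. */
        dst[x] = (uint32_t)src[0]
               | ((uint32_t)src[1] << 8)
               | ((uint32_t)src[2] << 16);
        src += 3;
    }
}

/* 24-bit BGR -> 16-bit RGB565 (assumed target format):
 * keep the top 5 bits of red and blue and the top 6 bits of green. */
static void convert_row_24_to_565(const uint8_t *src, uint16_t *dst, size_t width)
{
    for (size_t x = 0; x < width; ++x) {
        uint16_t b = (uint16_t)(src[0] >> 3);
        uint16_t g = (uint16_t)(src[1] >> 2);
        uint16_t r = (uint16_t)(src[2] >> 3);
        dst[x] = (uint16_t)((r << 11) | (g << 5) | b);
        src += 3;
    }
}
```

Because each output pixel is built in full-width integers rather than through byte-sized register writes, compiled code in this style generally avoids the per-pixel partial register stall described above; a production converter would more likely use SIMD.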
Finally, when nothing on screen changes, DOSBox still writes out frames that contain no changes but are each 12 bytes in size. If these were written as true zero-byte null frames instead, video players might be able to bypass the decoder entirely for those frames, improving playback performance whenever the game updates the screen less often than 70 times per second.
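A sketch of what this could look like on the writer side, in C. It assumes the frames are stored in an AVI file as "00dc" chunks (stream 0, compressed video), where a chunk with a size of zero is the convention commonly used for a frame that repeats the previous one. The function names and the screen_changed flag are hypothetical and are not taken from DOSBox's capture code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: serialize one AVI video chunk ("00dc" = stream 0,
 * compressed video) into out and return the number of bytes written.
 * A payload_size of 0 yields a null frame: only the 8-byte chunk header. */
static size_t write_video_chunk(uint8_t *out, const uint8_t *payload, uint32_t payload_size)
{
    size_t pos = 0;
    memcpy(out + pos, "00dc", 4);
    pos += 4;
    /* RIFF chunk sizes are 32-bit little-endian values. */
    out[pos++] = (uint8_t)(payload_size & 0xFF);
    out[pos++] = (uint8_t)((payload_size >> 8) & 0xFF);
    out[pos++] = (uint8_t)((payload_size >> 16) & 0xFF);
    out[pos++] = (uint8_t)((payload_size >> 24) & 0xFF);
    if (payload_size > 0) {
        memcpy(out + pos, payload, payload_size);
        pos += payload_size;
    }
    /* RIFF chunks are padded to an even length; the pad byte is not counted in the size. */
    if (payload_size & 1)
        out[pos++] = 0;
    return pos;
}

/* Hypothetical policy: when nothing changed since the previous frame, discard
 * the encoder's small "no change" payload and write a zero-byte null frame. */
static size_t write_frame(uint8_t *out, int screen_changed,
                          const uint8_t *encoded, uint32_t encoded_size)
{
    if (!screen_changed)
        return write_video_chunk(out, NULL, 0);
    return write_video_chunk(out, encoded, encoded_size);
}
```

Such a frame still costs an 8-byte chunk header and an index entry in the file, so the main benefit is on the playback side, where a player may recognize the empty chunk and skip the decoder call.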