@Raven-05, I've rolled back some of the changes with DMA caching; it's far too unstable.
https://nirvtek.com/downloads/RReady.Alpha.20 … 1.ThreadSafe.7z
MD5: 18867c4e9b26db7185a912b36112e3d8
- Keyboard input handling heavily tweaked: a key is internally held for 0.066s before being released (unless pressed again sooner).
- ThreadSafe should be more stable and should be (hopefully) a bit more playable on your laptop, maybe.
- vQuake fixes.
I think the slowdown in vQuake and ICR2 is because of Dosclient. There's a thread which processes the DMA (or FIFO) stream one 32-bit word at a time. It has to parse the entire DMA stream and re-assemble it bit by bit. I could optimise it if all the data was self-contained, but because it's being streamed there's no guarantee that entire commands and their data are in that one buffer. The next buffer might continue from halfway through a command.
With RRedline, a command buffer is always self-contained; the API and the way its calls work guarantee it. Speedy3D probably guarantees it too, but unfortunately I don't get the Speedy3D calls themselves, only the contents of the command buffers, and there's no end-of-buffer marker in there, just the internals of buffer after buffer chucked in, filling the FIFO queue on a real Rendition board. Because of this, performance is constrained by how fast this parsing can work.
I'm just wondering whether I can use AVX or something in here somehow. The compiler automatically optimises things to use AVX whenever possible, but I've never actually analysed the binary.
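To make the split-across-buffers problem concrete, here's a minimal sketch of a word-at-a-time parser that carries its state from one buffer to the next. The command layout is made up for illustration (a header word whose low 8 bits give the number of data words that follow), and `Command`, `StreamParser`, `feed` and `midCommand` are hypothetical names. None of this is the real Rendition format or the Dosclient code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command layout, for illustration only: one header word whose
// low 8 bits hold the number of 32-bit data words that follow it.
struct Command {
    std::uint32_t header = 0;
    std::vector<std::uint32_t> data;
};

class StreamParser {
public:
    // Feed one chunk of the stream. Completed commands are appended to `out`;
    // a command that is only partly received stays buffered until a later call.
    void feed(const std::uint32_t* words, std::size_t count,
              std::vector<Command>& out) {
        for (std::size_t i = 0; i < count; ++i) {
            const std::uint32_t w = words[i];
            if (!inCommand_) {
                // Start of a new command: read the data length from the header.
                current_.header = w;
                current_.data.clear();
                remaining_ = w & 0xFFu;
                inCommand_ = true;
            } else {
                // Continuation of a command, possibly begun in an earlier chunk.
                current_.data.push_back(w);
                --remaining_;
            }
            if (remaining_ == 0) {
                out.push_back(current_);
                inCommand_ = false;
            }
        }
    }

    // True if the last chunk ended partway through a command.
    bool midCommand() const { return inCommand_; }

private:
    Command current_;
    std::size_t remaining_ = 0;
    bool inCommand_ = false;
};
```

If every buffer were guaranteed to be self-contained, the per-word state check could go and each command's data could be copied in one block, which is the kind of optimisation the streaming case rules out.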
I also have stats. vQuake at 800x600 streams at most 115 MB/s of commands+data with 20-72 fps (no vsync). ICR2 manages only 55 MB/s max with the Houston track (32 ahead/32 behind) at 16 fps. So maybe it isn't Dosclient. The profile said it didn't take up much time, but every little bit helps.
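For a rough sense of scale, those rates can be turned into a per-word time budget for a parser that handles one 32-bit word at a time. This is only back-of-envelope arithmetic, assuming 1 MB means 1,000,000 bytes (it may be 2^20) and using hypothetical helper names:

```cpp
// Assumption: 1 MB = 1,000,000 bytes, and every word is 4 bytes (32 bits).
constexpr double wordsPerSecond(double megabytesPerSecond) {
    return megabytesPerSecond * 1e6 / 4.0;
}

// Average time available per word if the parser has to keep up with the stream.
constexpr double nanosecondsPerWord(double megabytesPerSecond) {
    return 1e9 / wordsPerSecond(megabytesPerSecond);
}
```

At vQuake's peak of 115 MB/s that comes to 28.75 million words per second, or roughly 35 ns per word. At ICR2's 55 MB/s it's 13.75 million words per second, or roughly 73 ns per word.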
Having said that, ICR2 on the Australia track with the default number of cars visible in [ALT+X] mode runs 7 fps faster (48 fps) without the rendering but with the buffer swaps (black screen with framerate). I should probably repeat the test for Houston.
I'm going to try moving it into its own thread again, without DMA buffer queuing onto another thread. I might try using a stringstream, even though it's slow for strings and leaks memory. That's a job for tomorrow.