Yeah, I thought you were thinking that way in our earlier discussion 😀 I guess there are two ways to offload rendering to the gpu:
1. drawing triangles on the GPU and pushing the rest as a bitmap to OpenGL, letting OpenGL composite the final image (what I was thinking about)
2. rendering triangles on the GPU, reading them back and rendering the final image through the DOSBox renderers (what you are thinking about)
No.2 has the advantage of reusing DOSBox's current outputs and basically everything the emulated 2D S3 video does. No.1 would be closer to my glide patch (rendering onto an OpenGL window).
What I'm afraid of is that readbacks are slow, much slower than writes, which are already slow. Readbacks can be done asynchronously, and that would be a must to get a decent framerate. So there would have to be a way to render to an FBO, initiate a readback, do something else (perhaps render the next frame to a second FBO in the meantime), and hopefully by then the readback will have finished. I'm afraid a simple render->readback will be too slow to be usable... I'm pretty fluent in OpenGL (done a whole 3D engine, with FBOs, shaders and all) 😁
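To make the ordering concrete, here is a rough sketch of the two-FBO ping-pong I have in mind. It only simulates the scheduling: `run_pipeline`, `render`, `start_readback` and `finish_readback` are made-up names, and the stand-ins just log what would happen. In real OpenGL I'd expect the "start" step to be a `glReadPixels` into a bound pixel buffer object and the "finish" step to be mapping that buffer later, but none of that is in this sketch.

```python
def run_pipeline(num_frames, render, start_readback, finish_readback):
    """Alternate between two FBOs so that the readback of frame N
    overlaps with the rendering of frame N+1 (one frame of latency)."""
    frames = []
    in_flight = None  # handle of the readback started last iteration
    for n in range(num_frames):
        fbo = n % 2                      # ping-pong between fbo 0 and fbo 1
        render(n, fbo)                   # draw frame n
        handle = start_readback(fbo)     # kick off readback, don't wait for it
        if in_flight is not None:
            # The previous frame's readback had the whole render above to finish.
            frames.append(finish_readback(in_flight))
        in_flight = handle
    if in_flight is not None:
        frames.append(finish_readback(in_flight))  # drain the last frame
    return frames


# Hypothetical stand-ins: they only record the order of operations.
log = []
fbos = [None, None]

def render(n, fbo):
    fbos[fbo] = n
    log.append(f"render frame {n} -> fbo {fbo}")

def start_readback(fbo):
    log.append(f"start readback fbo {fbo}")
    return fbos[fbo]  # pretend the handle is the frame content

def finish_readback(handle):
    log.append(f"finish readback frame {handle}")
    return handle

frames = run_pipeline(3, render, start_readback, finish_readback)
print(frames)
print("\n".join(log))
```

The log shows frame 1 being rendered between "start readback" and "finish readback" of frame 0, which is the overlap that would hide the readback cost. The price is that the image handed to the DOSBox renderer is always one frame behind.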
edit: my ATi mobile pushes about 20Mpix/s when reading from a texture. That's only about 65fps @640x480 or about 40 @800x600 if I did my math right. So doing 30fps @640x480 would eat about 50% of the CPU time if the CPU just sits waiting on the readback... writes are at about 140Mpix/s.
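For anyone who wants to redo the math with their own card's numbers, a quick sketch (the function name `readback_budget` is just mine, and it assumes the readback is fully blocking, so the time share is a worst case):

```python
def readback_budget(width, height, mpix_per_s, target_fps):
    """Return (max fps the readback rate allows,
    fraction of each second spent reading back at target_fps)."""
    pixels = width * height
    max_fps = mpix_per_s * 1e6 / pixels
    busy_fraction = target_fps / max_fps
    return max_fps, busy_fraction


for w, h in ((640, 480), (800, 600)):
    max_fps, busy = readback_budget(w, h, 20.0, 30)
    print(f"{w}x{h}: max {max_fps:.1f} fps, {busy:.0%} of the time at 30 fps")
```

With the 20Mpix/s figure this gives roughly 65 fps and 46% at 640x480, and roughly 42 fps and 72% at 800x600.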