Reply 20 of 38, by jmarsh
Bytes are 8 bits, you're only shifting by 4?
wrote:Bytes are 8 bits, you're only shifting by 4?
Lol that was indeed the error.
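void GraphicsEngine::DrawRectangleASM(int x, int y, int w, int h, int color)
{
    int screenWidth = 320;
    asm("movl $320, %%eax;"      // eax = 320 * y (offset of the first row)
        "movl %1, %%ebx;"
        "mull %%ebx;"            // 32-bit multiply; the w suffix does not match a 32-bit register
        "add %0, %%eax;"         // + x
        "add %5, %%eax;"         // + start of the back buffer
        "movb %4, %%dl;"         // replicate the colour byte into all 4 bytes of edx
        "shl $8, %%edx;"
        "movb %4, %%dl;"
        "shl $8, %%edx;"
        "movb %4, %%dl;"
        "shl $8, %%edx;"
        "movb %4, %%dl;"
        "movl %3, %%ebx;"        // ebx = rows left to draw
        "loop1%=:;"              // %= keeps the labels unique if the asm is emitted more than once
        "  movl %2, %%ecx;"
        "  sub $4, %%ecx;"       // start at the last dword of the row (w - 4), not one past it; needs w >= 4
        "  loop2%=:;"
        "    movl %%edx, (%%eax, %%ecx);"   // store 4 pixels at once
        "    sub $4, %%ecx;"
        "    jae loop2%=;"       // keep going until ecx drops below 0
        "  addl %6, %%eax;"      // advance to the next row
        "  dec %%ebx;"
        "  jnz loop1%=;"
        :
        :"m"(x), "m"(y), "m"(w), "m"(h), "m"(color), "m"(&backBuffer[0]), "m"(screenWidth)
        :"eax", "ebx", "ecx", "edx", "memory");
}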
That's what I came up with.
It seems to work fine too.
(The only downside is that the width has to be divisible by 4, but that's fine.)
It's very fast though.
Drawing the rectangles and flipping takes just about 1 ms now.
Thanks all for helping me.
I'm going to do the same thing to my draw sprite function now.
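For anyone following along, here is a minimal portable sketch of what the movb/shl $8 sequence in the routine above computes: the 8-bit colour index copied into all four bytes of a 32-bit word, so one store paints four pixels. The function names are made up for illustration, and the multiply variant is just an alternative that was not suggested in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Replicate an 8-bit palette index into all four bytes of a 32-bit word,
// mirroring the movb / shl $8 sequence of the inline assembly.
// (Hypothetical helper name, not part of the original engine.)
constexpr std::uint32_t ReplicateByteShift(std::uint8_t color)
{
    std::uint32_t v = color;   // 0x000000cc
    v = (v << 8) | color;      // 0x0000cccc
    v = (v << 8) | color;      // 0x00cccccc
    v = (v << 8) | color;      // 0xcccccccc
    return v;
}

// Same result with a single multiply (an alternative, not from the thread).
constexpr std::uint32_t ReplicateByteMul(std::uint8_t color)
{
    return color * 0x01010101u;
}

// Compile-time check for the colour value 5 used in the test program.
static_assert(ReplicateByteShift(0x05) == 0x05050505u, "byte must fill all four lanes");
```

Shifting by 4 instead of 8 moves each copy by only half a byte, so the copies overlap and the four pixels end up with different, wrong colour indices.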
wrote:Its very fast though.
Drawing the rectangles and Flipping takes just about 1ms now.
Which is 1000 Hz (or 1000 fps), hence why DOSBox is not representative of real hardware.
--/\-[ Stu : Bloody Cactus :: [ https://bloodycactus.com :: http://kråketær.com ]-/\--
wrote:which is 1000 Hz (or 1000 fps). hence why dosbox is not representative of real hardware.
There are two things to note here.
I don't wait for vertical retrace or anything. And I've also set the DOSBox cycles to 100%
in dosbox.conf:
cycles=max
wrote:If you give me a binary, I can test it on my 386 with an ET4000 on ISA bus. That should probably take longer than 1ms...
Sure, we can test that.
I wasn't really trying to say anything about real hardware though. It was just that my original code was very slow, and I thought I did something wrong (which I did 🤣).
It would be interesting nonetheless.
The output is rather crude. It just draws 2 rectangles and prints the time in ms over and over.
On ESC it ends, and you can read the numbers.
That's how it works in my DOSBox anyway.
Edit:
One more thing.
The ASM code is designed for a 486. Maybe it works on a 386 too; I don't know.
Edit2:
Still works with DOSBox set to 386, for whatever that's worth.
wrote:The ASM Code is designed for 486. Maybe it works on a 386 too. don't know.
The only regular instruction that 486 has that 386 doesn't is bswap as far as I can remember.
wrote:wrote:The ASM Code is designed for 486. Maybe it works on a 386 too. don't know.
The only regular instruction that 486 has that 386 doesn't is bswap as far as I can remember.
CMPXCHG, XADD and a bunch of cache instructions are also new.
Ok, I tested it:
Result is 46ms per frame. That's a bit more than 21 FPS.
That's ... not great 🤣
But it's also not terrible.
I suspect the printing of the numbers slows it down a bit.
And if I can implement proper page flipping with VESA, it should easily be 30-40% faster than now.
Of course drawing real sprites is slower, plus all the game logic.
But I think I can get something that will be playable.
wrote:I suspect the printing of the numbers slows it down a bit.
Yup, the observer effect.
Generally a good approach is to just do a simple measurement, and only calculate times/framerates once or twice a second.
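A minimal sketch of that approach, assuming frame times are available in milliseconds: accumulate the per-frame durations and only compute (and print) an average once the reporting interval has elapsed. The class and member names are hypothetical and not taken from the engine in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Accumulates per-frame durations and only yields a result once per
// reporting interval, so the slow printing stays out of most frames.
// Hypothetical helper; durations are assumed to be in milliseconds.
class FrameStats
{
public:
    explicit FrameStats(std::uint32_t reportIntervalMs)
        : intervalMs_(reportIntervalMs)
    {
        assert(reportIntervalMs > 0);
    }

    // Add one frame's duration. Returns true when a report is due; in that
    // case avgFrameMs and fps describe the interval that just finished.
    bool AddFrame(std::uint32_t frameMs, double& avgFrameMs, double& fps)
    {
        accumulatedMs_ += frameMs;
        ++frames_;
        if (accumulatedMs_ < intervalMs_)
            return false;

        avgFrameMs = static_cast<double>(accumulatedMs_) / frames_;
        fps = 1000.0 * frames_ / accumulatedMs_;
        accumulatedMs_ = 0;
        frames_ = 0;
        return true;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t accumulatedMs_ = 0;
    std::uint32_t frames_ = 0;
};
```

In a main loop you would call `AddFrame` every frame and only print when it returns true. With a 500 ms interval and the 46 ms frames measured above, that is one print every 11 frames, reporting roughly 21.7 fps.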
wrote:And if i can implement proper page flipping with VESA it should easily be 30-40% faster than now.
VESA? Mode X I suppose?
At least, when you're targeting 386, it would not be realistic to expect more than regular VGA support. 386 also has its hands full with just 320x200 8 bit colour.
Drawing sprites would be best with a compiled sprite routine, which would be barely slower than what you have now. In fact, it might be somewhat faster because it's effectively an unrolled loop.
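A real compiled sprite is generated machine code (one immediate store per opaque pixel run), which can't be shown portably. The sketch below is only a stand-in that mimics the idea in plain C++: the sprite is flattened once into a list of stores, so the draw step has no transparency test and no nested loop. The struct and function names, and the choice of colour index 0 as transparent, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pre-computed store: byte offset relative to the sprite's top-left
// corner on screen, plus the colour to write there.
struct SpriteStore
{
    std::size_t offset;
    std::uint8_t color;
};

// "Compile" step, done once per sprite. Transparent pixels (index 0 is
// assumed to mean transparent) simply produce no store at all.
std::vector<SpriteStore> CompileSprite(const std::vector<std::uint8_t>& pixels,
                                       int w, int h, int screenWidth)
{
    assert(pixels.size() == static_cast<std::size_t>(w) * h);
    std::vector<SpriteStore> stores;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const std::uint8_t c = pixels[static_cast<std::size_t>(y) * w + x];
            if (c != 0)
                stores.push_back({static_cast<std::size_t>(y) * screenWidth + x, c});
        }
    }
    return stores;
}

// Draw step: a straight run of stores, like the unrolled code of a compiled
// sprite. The caller must make sure the sprite fits inside the buffer.
void DrawCompiledSprite(const std::vector<SpriteStore>& stores, std::uint8_t* dest)
{
    for (const SpriteStore& s : stores)
        dest[s.offset] = s.color;
}
```

`dest` would point at the sprite's top-left pixel in the back buffer, e.g. `backBuffer + y * 320 + x`.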
I think over the ISA bus you won't see much better values. It is limited to 16 bits at a time, clocked at 8MHz. This should allow for 16MiB/s of peak performance if there is nothing else working on the bus and the CPU doesn't do anything else, or you have DMA transfers. The 21 FPS are of course not optimal and show only about 1.3 MiB/s of throughput. Hence it is probably much better if you write directly to VGA RAM and do page flipping. It's important that you don't have to read back from VGA memory, so that turnaround won't be long.
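The arithmetic behind those two figures, as a small sketch. It assumes one 16-bit transfer per 8 MHz bus clock for the peak (real ISA cycles take more than one clock, so this is an upper bound) and one full 320x200 8-bit frame copied per 46 ms for the measured value. The function names are made up for illustration.

```cpp
#include <cassert>

const double kBytesPerMiB = 1024.0 * 1024.0;

// Theoretical ISA peak as described in the post: busBytes bytes moved on
// every bus clock. This is an idealised upper bound.
double IsaPeakBytesPerSecond(double clockHz, int busBytes)
{
    assert(clockHz > 0.0 && busBytes > 0);
    return clockHz * busBytes;
}

// Throughput implied by copying one full 8-bit frame every frameMs milliseconds.
double FrameThroughputBytesPerSecond(int width, int height, double frameMs)
{
    assert(frameMs > 0.0);
    return static_cast<double>(width) * height * (1000.0 / frameMs);
}
```

`IsaPeakBytesPerSecond(8.0e6, 2)` gives 16,000,000 bytes/s, which is about 15.3 MiB/s. `FrameThroughputBytesPerSecond(320, 200, 46.0)` gives about 1,391,000 bytes/s, i.e. roughly 1.3 MiB/s, matching the estimate above.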
You CAN try to render to RAM and 'rep movsl' the buffer (or whatever subset of it that changed) to video memory, which probably will give you the best performance if you are NOT using page flipping. Even with page flipping there are probably optimized ways to copy sprites and characters into video RAM. Using transparency will necessitate some tricks though, as will sprite scaling.
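A rough sketch of the "only copy the subset that changed" idea, using `std::memcpy` per row instead of a hand-written string instruction. It assumes both buffers are linear, 8 bits per pixel, and share the same pitch; the function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy only a changed rectangle from the back buffer to the screen.
// Both buffers are assumed to be linear 8-bit surfaces with `pitch` bytes
// per row; the caller must keep the rectangle inside both buffers.
void CopyDirtyRect(const std::uint8_t* backBuffer, std::uint8_t* screen,
                   int pitch, int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0 && x + w <= pitch);
    for (int row = 0; row < h; ++row) {
        const std::size_t offset = static_cast<std::size_t>(y + row) * pitch + x;
        std::memcpy(screen + offset, backBuffer + offset, static_cast<std::size_t>(w));
    }
}
```

For the test program in this thread, copying just the 110x110 rectangle moves 12,100 bytes instead of the full 64,000 per frame.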
So.
I've done a couple of things.
I wrote a VESA init method, so I can use the VESA modes now, including proper page flipping.
For the regular mode 13h (without page flipping) I've reimplemented the memory copy in assembly.
I also made sure that the time spent printing the milliseconds won't show up in the calculated time.
In my DOSBox it runs too fast for a proper measurement. It's below 1 ms.
The problem with my page flip is that the screen still flickers like crazy.
Something is wrong. The picture should be static like in mode 13h...
void GraphicsEngine::Flip()
{
if(pageFlipping)
{
if(currentPageAddress == screenMemory)
{
currentPageAddress = screenMemory + screenWidth*screenHeight;
SetDisplayStart(0, 0);
}
else
{
currentPageAddress = screenMemory;
SetDisplayStart(screenHeight, 0);
}
}
else
{
unsigned int maxScreenOffset = screenWidth * screenHeight;
asm("movl $0, %%ecx;"
"loop%=:;"
" movl (%%esi, %%ecx), %%eax;"
" movl %%eax, (%%edi, %%ecx);"
" add $4, %%ecx;"
" cmp %0, %%ecx;"
" jb loop%=;" // jb, not jbe: stop once ecx reaches maxScreenOffset, so no extra dword is copied
:
:"m"(maxScreenOffset), "D"(&screenMemory[0]), "S"(backBuffer)
:"eax", "ecx", "memory");
}
}
void GraphicsEngine::SetDisplayStart(int newStartScanline, int newStartPixelOnScanline)
{
__dpmi_regs r;
r.x.ax = 0x4F07;
r.h.bh = 0x00;
r.h.bl = 0x80;
r.x.cx = newStartPixelOnScanline;
r.x.dx = newStartScanline;
__dpmi_int(0x10, &r);
}
If somebody wants to test it, I've attached the exe.
Would be cool to know how it performs on real hardware.
(Use mode 100 in VESA mode. Everything else is untested.)
You need to wait for vertical retrace by polling the vsync flag, then copy/page flip.
wrote:you need to wait for vertical retrace, polling the vsync flag, then copy/pageflip
For my let's code series I put the code up on github. There's also a wait for retrace function that OP can copy:
https://gist.github.com/root42/8e147c5ec2427f … 2ac50a5aba52317
#define INPUT_STATUS 0x3DA
#define VRETRACE_BIT 0x08
void wait_for_retrace()
{
while( inp( INPUT_STATUS ) & VRETRACE_BIT );
while( ! (inp( INPUT_STATUS ) & VRETRACE_BIT) );
}
wrote:you need to wait for vertical retrace, polling the vsync flag, then copy/pageflip
That might depend...
On a 6845, the screen offset register is latched. So you basically 'fire-and-forget', the value will become active for the next frame.
Which means you would actually first perform the pageflip and THEN wait for vertical retrace, to wait for the actual flip to occur, before you start drawing in the new backbuffer (else you're actually drawing into what is still the frontbuffer).
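To make that ordering concrete, here is a toy model (not real hardware access) of a latched start address: writes only become active at the next vertical retrace. The struct names are hypothetical, and the two-page layout simply mirrors the page flip described in this thread.

```cpp
#include <cassert>

// Toy model of a CRTC whose display start address is latched: a write only
// takes effect at the next vertical retrace.
struct LatchedCrtc
{
    int pendingStart = 0;  // value last written by the program
    int activeStart = 0;   // value the display is actually scanning from

    void SetDisplayStart(int start) { pendingStart = start; }
    void VerticalRetrace() { activeStart = pendingStart; }
};

// Two-page flipper following the "flip first, then wait" order.
struct PageFlipper
{
    LatchedCrtc& crtc;
    int pageSize;
    int drawPage = 1;  // page 0 is on screen initially, so draw into page 1

    PageFlipper(LatchedCrtc& c, int size) : crtc(c), pageSize(size)
    {
        assert(size > 0);
    }

    // Ask for the freshly drawn page to be shown and switch the draw target.
    void RequestFlip()
    {
        crtc.SetDisplayStart(drawPage * pageSize);
        drawPage ^= 1;
    }

    // True while the new draw target is still the page being displayed,
    // i.e. drawing now would be visible on screen.
    bool DrawTargetIsVisible() const
    {
        return crtc.activeStart == drawPage * pageSize;
    }
};
```

Right after `RequestFlip()`, `DrawTargetIsVisible()` is true, because the old front buffer is still being scanned out. Only after `VerticalRetrace()` does it become false, which is why the wait belongs between the flip and the next round of drawing.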
FYI, waiting for retrace (e.g. BL=80h when calling INT 10h/AX=4F07h) is implemented in SVN, but in 0.74(-2) you can load UniVBE or so to get the feature.
Ok.
You guys are right. Waiting for retrace fixes the flickering.
But I don't understand why it flickers in the first place. I draw the image to the off buffer, then I swap the pointers, and then I draw to the other buffer (that's now the off buffer).
So how can it display an incomplete image? Shouldn't there always be a complete image in the front buffer?
Also, and that's really weird:
With waiting for retrace enabled, my time function just prints "e" all the time instead of a number?!
But only in VESA mode; in mode 13h (with WaitForRetrace) it still works???
void GraphicsEngine::WaitForRetrace()
{
/* wait until vertical retrace starts (bit 3 of port 0x3DA becomes set) */
while ((inportb(0x03da) & 0x08) != 8) {};
/* wait until vertical retrace ends (bit 3 clears again) */
while ((inportb(0x03da) & 0x08) == 8) {};
}
//Main loop
while(running)
{
engine->time->SetFrameStartTimeStamp();
engine->input->PollKeys();
if(engine->input->KeyDown(1))
{
running = false;
}
engine->graphics->DrawRectangleASM(0,0,320,200,5);
engine->graphics->DrawRectangleASM(10,10,110,110,7);
engine->graphics->WaitForRetrace();
engine->graphics->Flip();
time = engine->time->GetTicksSinceFrameStart();
cout<<engine->time->TicksToMilliSeconds(time)<<" ";
}
wrote:But i dont understand why it flickers in the first place. i draw the image to the off buffer, then i swap the pointers, and then i draw to the other buffer (thats now the off buffer)
so how can it display an incomplete image. shouldn't there always be a complete image in the front buffer ?
Because you're drawing at a speed that's different from the screen's, i.e. the screen refreshes at, say, 60 Hz or 70 Hz, but you're drawing at 80 or 50 or something.
A real CRT and an LCD will give different results.
Waiting for retrace syncs up your drawing with the actual refresh rate of the screen.
wrote:so how can it display an incomplete image. shouldn't there always be a complete image in the front buffer ?
As long as there are two different images in the two buffers, and you switch between them somewhere outside the vertical blank area, then you can get flicker, because part of the current displayed frame comes from one buffer, and part comes from another.
Drawing a screen takes time. The old CRT is quite intuitive that way: the cathode ray actually traces the screen one scanline at a time. So the output of the video card is always a single pixel at a time, and that is whatever the ray draws at that moment. Proper timing in the video card circuitry makes sure that pixels are switched at the correct time, scanlines are switched at the correct time, and eventually a vertical blank is inserted, so the ray can return to the top of the screen again.
Modern digital screens are not quite that 'direct', but the general idea still holds: The video memory is scanned from left to right, top to bottom, as it is sent to the internal framebuffer inside your flatscreen. So the concepts of vsync and tearing still apply.
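A small simulation of that effect, as a sketch rather than anything hardware-accurate: the display is scanned out line by line, each line is read from whichever page is the front buffer at that moment, and the start address may be switched partway down the frame. The function names and the line-granular model are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scan out one frame of `height` lines and record which page each line was
// read from. If flipAtLine lies inside the frame, the front buffer changes
// at that line; a negative value means no flip during the visible area.
std::vector<int> ScanOutFrame(int height, int startPage, int flipAtLine)
{
    assert(height > 0);
    std::vector<int> sourcePagePerLine;
    int page = startPage;
    for (int line = 0; line < height; ++line) {
        if (line == flipAtLine)
            page ^= 1;  // start address changed while the frame is being drawn
        sourcePagePerLine.push_back(page);
    }
    return sourcePagePerLine;
}

// A frame is torn if its lines did not all come from the same page.
bool IsTorn(const std::vector<int>& sourcePagePerLine)
{
    for (std::size_t i = 1; i < sourcePagePerLine.size(); ++i) {
        if (sourcePagePerLine[i] != sourcePagePerLine[0])
            return true;
    }
    return false;
}
```

Flipping at line 120 of a 200-line frame shows lines 0 to 119 from one page and the rest from the other. Flipping before line 0, which is what waiting for the vertical blank achieves, gives a frame that comes entirely from one page.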