VOGONS


very slow drawing with DJGPP


Reply 21 of 38, by thrawn235

jmarsh wrote:

Bytes are 8 bits, you're only shifting by 4?

Lol that was indeed the error.

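void GraphicsEngine::DrawRectangleASM(int x, int y, int w, int h, int color)
{
    // Fills a w x h rectangle in the 320-pixel-wide back buffer.
    // w must be a positive multiple of 4 and h must be greater than 0.
    int screenWidth = 320;

    asm("movl $320, %%eax;"
        "movl %1, %%ebx;"
        "mull %%ebx;"                        // edx:eax = 320 * y (32-bit multiply)
        "addl %0, %%eax;"                    // + x
        "addl %5, %%eax;"                    // + back buffer start = first pixel of the rectangle

        // Copy the low byte of color into all four bytes of edx.
        "movb %4, %%dl;"
        "shll $8, %%edx;"
        "movb %4, %%dl;"
        "shll $8, %%edx;"
        "movb %4, %%dl;"
        "shll $8, %%edx;"
        "movb %4, %%dl;"

        "movl %3, %%ebx;"                    // ebx = rows left to draw
        "loop1%=:;"
        "  movl %2, %%ecx;"                  // ecx = byte offset just past the end of the row
        "  loop2%=:;"
        "    movl %%edx, -4(%%eax, %%ecx);"  // store 4 pixels, starting at offset w - 4
        "    subl $4, %%ecx;"
        "    ja loop2%=;"                    // repeat while the offset is still above 0
        "  addl %6, %%eax;"                  // advance to the next row
        "  decl %%ebx;"
        "  jnz loop1%=;"
        :
        : "m"(x), "m"(y), "m"(w), "m"(h), "m"(color), "m"(&backBuffer[0]), "m"(screenWidth)
        : "eax", "ebx", "ecx", "edx", "cc", "memory");
}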

That's what I came up with.
It seems to work fine too.
(The only downside is that the width has to be divisible by 4, but that's fine.)
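
For comparison, here is a minimal portable C++ sketch of the same fill without the divisible-by-4 restriction. It is not the code from this post: the names `kScreenWidth`, `ReplicateByte` and `FillRect` are made up for illustration, and it assumes a linear 8-bit buffer that is 320 pixels wide, as in the thread. `ReplicateByte` produces the same four-byte color pattern that the movb/shl sequence builds in edx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constant: row pitch of the 320x200 8-bit back buffer.
constexpr int kScreenWidth = 320;

// Copies the low byte of `color` into all four bytes of a dword,
// the same value the movb/shl sequence builds in edx.
inline std::uint32_t ReplicateByte(int color)
{
    return static_cast<std::uint32_t>(color & 0xFF) * 0x01010101u;
}

// Fills a w x h rectangle at (x, y). The caller must keep the
// rectangle inside the buffer; no clipping is done here.
inline void FillRect(std::uint8_t* buffer, int x, int y, int w, int h, int color)
{
    assert(buffer != nullptr && x >= 0 && y >= 0 && w >= 0 && h >= 0);
    assert(x + w <= kScreenWidth);

    std::uint8_t* row = buffer + static_cast<std::size_t>(y) * kScreenWidth + x;
    for (int line = 0; line < h; ++line)
    {
        // memset handles any width, including ones not divisible by 4.
        std::memset(row, color & 0xFF, static_cast<std::size_t>(w));
        row += kScreenWidth;
    }
}
```

Whether this is as fast as the hand-written loop depends on the compiler and C library, so it would need to be measured rather than assumed.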

It's very fast though.
Drawing the rectangles and flipping takes just about 1 ms now.

Thanks all for helping me.

I'm going to do the same thing to my draw sprite function now.

Reply 22 of 38, by BloodyCactus

thrawn235 wrote:

It's very fast though.
Drawing the rectangles and flipping takes just about 1 ms now.

Which is 1000 Hz (or 1000 fps), hence why DOSBox is not representative of real hardware.

--/\-[ Stu : Bloody Cactus :: [ https://bloodycactus.com :: http://kråketær.com ]-/\--

Reply 23 of 38, by root42


If you give me a binary, I can test it on my 386 with an ET4000 on ISA bus. That should probably take longer than 1ms...

YouTube and Bonus
80486DX@33 MHz, 16 MiB RAM, Tseng ET4000 1 MiB, SnarkBarker & GUSar Lite, PC MIDI Card+X2+SC55+MT32, OSSC

Reply 24 of 38, by thrawn235

BloodyCactus wrote:

Which is 1000 Hz (or 1000 fps), hence why DOSBox is not representative of real hardware.

There are two things to note here.
I don't wait for vertical retrace or anything. And also I've set the DOSBox cycles to 100%
in dosbox.conf:

cycles=max
root42 wrote:

If you give me a binary, I can test it on my 386 with an ET4000 on ISA bus. That should probably take longer than 1ms...

Sure, we can test that.
I wasn't really trying to say anything about real hardware though; it was just that my original code was very slow, and I thought I did something wrong (which I did 🤣).

It would be interesting nonetheless.

The output is rather crude. It just draws 2 rectangles and prints the time in ms over and over.
On ESC it ends, and you can read the numbers.
That's how it works in my DOSBox anyway.

Edit:
One more thing.
The ASM Code is designed for 486. Maybe it works on a 386 too. don't know.

Edit2:
Still works with DOSBox set to 386, for whatever that's worth.

Attachments

  • CWSDPMI.EXE (20.83 KiB)
  • mode13.exe (1.3 MiB)

Reply 25 of 38, by Scali

thrawn235 wrote:

The ASM Code is designed for 486. Maybe it works on a 386 too. don't know.

The only regular instruction that 486 has that 386 doesn't is bswap as far as I can remember.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 26 of 38, by root42

Scali wrote:
thrawn235 wrote:

The ASM Code is designed for 486. Maybe it works on a 386 too. don't know.

The only regular instruction that 486 has that 386 doesn't is bswap as far as I can remember.

CMPXCHG, XADD and a bunch of cache instructions are also new.


Reply 27 of 38, by root42


Ok, I tested it:

https://youtu.be/8-7N-BWnCcg

Result is 46ms per frame. That's a bit more than 21 FPS.


Reply 28 of 38, by thrawn235


That's ... not great 🤣

But it's also not terrible.

I suspect the printing of the numbers slows it down a bit.
And if I can implement proper page flipping with VESA, it should easily be 30-40% faster than now.

Of course drawing real sprites is slower, plus all the game logic.
But I think I can get something that will be playable.

Reply 29 of 38, by Scali

thrawn235 wrote:

I suspect the printing of the numbers slows it down a bit.

Yup, the observer effect.
Generally a good approach is to just do a simple measurement, and only calculate times/framerates once or twice a second.
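
A rough sketch of that idea: count frames and only compute a rate once a fixed interval has passed, so nothing is printed or calculated per frame. The class name `FrameRateCounter` and its interface are hypothetical, and the millisecond timestamps are assumed to come from whatever timer the program already uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts frames and produces a rate only once per interval.
class FrameRateCounter
{
public:
    explicit FrameRateCounter(std::uint32_t intervalMs)
        : intervalMs_(intervalMs), windowStartMs_(0), frames_(0), fps_(0.0), started_(false)
    {
        assert(intervalMs > 0);
    }

    // Call once per frame with the current time in milliseconds.
    // Returns true when a fresh value is available from Fps().
    bool Tick(std::uint32_t nowMs)
    {
        if (!started_)
        {
            // The first call only marks the start of the measuring window.
            started_ = true;
            windowStartMs_ = nowMs;
            return false;
        }
        ++frames_;
        const std::uint32_t elapsedMs = nowMs - windowStartMs_;
        if (elapsedMs < intervalMs_)
            return false;

        fps_ = frames_ * 1000.0 / elapsedMs;
        frames_ = 0;
        windowStartMs_ = nowMs;
        return true;
    }

    double Fps() const { return fps_; }

private:
    std::uint32_t intervalMs_;
    std::uint32_t windowStartMs_;
    std::uint32_t frames_;
    double fps_;
    bool started_;
};
```

In a main loop this would be something like `if (counter.Tick(nowMs)) { /* print counter.Fps() */ }`, so the printing cost is paid about twice a second instead of every frame.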

thrawn235 wrote:

And if I can implement proper page flipping with VESA, it should easily be 30-40% faster than now.

VESA? Mode X I suppose?
At least, when you're targeting 386, it would not be realistic to expect more than regular VGA support. 386 also has its hands full with just 320x200 8 bit colour.

Drawing sprites would be best with a compiled sprite routine, which would be barely slower than what you have now. In fact, it might be somewhat faster because it's effectively an unrolled loop.


Reply 30 of 38, by root42


I think over the ISA bus you won't see much better values. It is limited to 16 bits at a time, clocked at 8 MHz. In theory that allows about 16 MB/s of peak performance (8 MHz x 2 bytes) if there is nothing else working on the bus and the CPU doesn't do anything else, or you have DMA transfers; a real ISA transfer takes more than one bus clock, so the practical ceiling is lower. The 21 FPS are of course not optimal and show only about 1.3 MiB/s of throughput (64,000 bytes per frame at 46 ms per frame). Hence it is probably much better if you write directly to VGA RAM and do page flipping. It's important that you don't have to read back from VGA memory, so that turnaround won't be long.
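
The arithmetic behind those figures, as a small sketch. The function names are hypothetical, and `clocksPerTransfer` is an assumption to plug in yourself (1 gives the theoretical ceiling; real ISA cycles take more). It also only counts the bytes of one full-screen copy per frame, not the drawing into the back buffer.

```cpp
#include <cassert>

// Bytes per second moved if one full frame of width x height 8-bit
// pixels is copied every frameMs milliseconds.
inline double FrameCopyBytesPerSecond(int width, int height, double frameMs)
{
    assert(width > 0 && height > 0 && frameMs > 0.0);
    return static_cast<double>(width) * height * 1000.0 / frameMs;
}

// Peak bus throughput for a given clock, transfer width in bytes and
// number of bus clocks per transfer (assumed values, see the text above).
inline double BusPeakBytesPerSecond(double clockHz, int bytesPerTransfer, int clocksPerTransfer)
{
    assert(clockHz > 0.0 && bytesPerTransfer > 0 && clocksPerTransfer > 0);
    return clockHz * bytesPerTransfer / clocksPerTransfer;
}

inline double ToMiB(double bytes)
{
    return bytes / (1024.0 * 1024.0);
}
```

With 320x200 and 46 ms per frame this gives about 1.39 million bytes per second, which is roughly 1.33 MiB/s.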

You CAN try to render to RAM and 'rep movsl' the buffer (or whatever subset of it that changed) to video memory, which probably will give you the best performance, if you are NOT using page flipping. Even with page flipping there are probably optimized ways to copy sprites and characters into video RAM. Using transparency will necessitate some tricks though, as will sprite scaling.
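
A minimal sketch of the "copy only what changed" idea, using `std::memcpy` in place of a hand-written `rep movsl`. The names `DirtyRows` and `CopyDirtyRows` are hypothetical, and the sketch assumes both buffers are directly addressable through ordinary pointers, as in the code posted earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical dirty-region descriptor: rows firstRow..lastRow (inclusive) changed.
struct DirtyRows
{
    int firstRow;
    int lastRow;
};

// Copies only the changed rows from the back buffer to the destination.
// Returns the number of bytes copied.
inline std::size_t CopyDirtyRows(std::uint8_t* dest, const std::uint8_t* src,
                                 int pitch, int height, DirtyRows dirty)
{
    assert(dest != nullptr && src != nullptr && pitch > 0);
    assert(dirty.firstRow >= 0 && dirty.firstRow <= dirty.lastRow && dirty.lastRow < height);

    const std::size_t offset = static_cast<std::size_t>(dirty.firstRow) * pitch;
    const std::size_t count =
        static_cast<std::size_t>(dirty.lastRow - dirty.firstRow + 1) * pitch;

    // Whole rows are contiguous in a linear buffer, so one copy is enough.
    std::memcpy(dest + offset, src + offset, count);
    return count;
}
```

Tracking whole rows keeps the copy to a single contiguous block; tracking rectangles would move fewer bytes but needs one copy per row.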


Reply 31 of 38, by thrawn235


So.
I've done a couple of things.
I wrote a VESA init method, so I can use the VESA modes now, including proper page flipping.
For the regular mode 13h (without page flipping) I've reimplemented the memcpy in assembly.
I also made sure that the time spent printing the milliseconds won't show up in the calculated time.

In my DOSBox it runs too fast for a proper measurement. It's below 1 ms.

The problem with my page flip is that the screen still flickers like crazy.
Something is wrong. The picture should be static like in mode 13h...

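void GraphicsEngine::Flip()
{
    if (pageFlipping)
    {
        // Assuming currentPageAddress is the page being drawn into:
        if (currentPageAddress == screenMemory)
        {
            // Page 0 was just drawn: show it and draw into page 1 next.
            currentPageAddress = screenMemory + screenWidth * screenHeight;
            SetDisplayStart(0, 0);
        }
        else
        {
            // Page 1 was just drawn: show it and draw into page 0 next.
            currentPageAddress = screenMemory;
            SetDisplayStart(screenHeight, 0);
        }
    }
    else
    {
        // No page flipping: copy the back buffer to the screen, one dword at a time.
        unsigned int maxScreenOffset = screenWidth * screenHeight;

        asm("movl $0, %%ecx;"
            "loop%=:;"
            "  movl (%%esi, %%ecx), %%eax;"
            "  movl %%eax, (%%edi, %%ecx);"
            "  addl $4, %%ecx;"
            "  cmpl %0, %%ecx;"
            "  jb loop%=;"              // stop once ecx reaches the buffer size
            :
            : "m"(maxScreenOffset), "D"(&screenMemory[0]), "S"(backBuffer)
            : "eax", "ecx", "cc", "memory");
    }
}

void GraphicsEngine::SetDisplayStart(int newStartScanline, int newStartPixelOnScanline)
{
    __dpmi_regs r;
    r.x.ax = 0x4F07;                    // VBE function 07h: set/get display start
    r.h.bh = 0x00;                      // reserved, must be 0
    r.h.bl = 0x80;                      // set display start during vertical retrace
    r.x.cx = newStartPixelOnScanline;   // first displayed pixel in the scanline
    r.x.dx = newStartScanline;          // first displayed scanline

    __dpmi_int(0x10, &r);
}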

If somebody wants to test it, I've attached the exe.
Would be cool to know how it performs on real hardware.
(Use mode 100 in VESA mode. Everything else is untested.)

Attachments

  • mode13.exe (1.3 MiB)

Reply 32 of 38, by BloodyCactus


you need to wait for vertical retrace, polling the vsync flag, then copy/pageflip


Reply 33 of 38, by root42

BloodyCactus wrote:

you need to wait for vertical retrace, polling the vsync flag, then copy/pageflip

For my let's code series I put the code up on github. There's also a wait for retrace function that OP can copy:

https://gist.github.com/root42/8e147c5ec2427f … 2ac50a5aba52317

#define INPUT_STATUS 0x3DA
#define VRETRACE_BIT 0x08

void wait_for_retrace()
{
while( inp( INPUT_STATUS ) & VRETRACE_BIT );
while( ! (inp( INPUT_STATUS ) & VRETRACE_BIT) );
}


Reply 34 of 38, by Scali

BloodyCactus wrote:

you need to wait for vertical retrace, polling the vsync flag, then copy/pageflip

That might depend...
On a 6845, the screen offset register is latched. So you basically 'fire-and-forget', the value will become active for the next frame.
Which means you would actually first perform the pageflip and THEN wait for vertical retrace, to wait for the actual flip to occur, before you start drawing in the new backbuffer (else you're actually drawing into what is still the frontbuffer).
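
To make the ordering concrete, here is a toy model of a latched start address. It is a sketch only, not real hardware access: `LatchedCrtc` and `SafeDrawPage` are hypothetical names, and the model assumes exactly two pages and a latch that is picked up at vertical retrace.

```cpp
#include <cassert>

// Hypothetical model of a CRTC whose start address is latched: a new
// value is stored at once but only takes effect at the next retrace.
struct LatchedCrtc
{
    int visiblePage = 0;   // page currently being scanned out
    int pendingPage = 0;   // page requested by the last flip

    // "Fire and forget": only the latch changes here.
    void SetDisplayStart(int page)
    {
        assert(page == 0 || page == 1);
        pendingPage = page;
    }

    // Called when the vertical retrace happens: the latch becomes active.
    void VerticalRetrace()
    {
        visiblePage = pendingPage;
    }
};

// The page that is not being scanned out right now.
inline int SafeDrawPage(const LatchedCrtc& crtc)
{
    return 1 - crtc.visiblePage;
}
```

Right after `SetDisplayStart(1)` the visible page is still 0, so drawing the next frame into page 0 at that point would draw into what is still the front buffer. Only after the retrace does page 0 become safe, which is why the wait goes after the flip in this model.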


Reply 36 of 38, by thrawn235


Ok.
You guys are right. Waiting for retrace fixes the flickering.

But I don't understand why it flickers in the first place. I draw the image to the off buffer, then I swap the pointers, and then I draw to the other buffer (that's now the off buffer).

So how can it display an incomplete image? Shouldn't there always be a complete image in the front buffer?

Also, and that's really weird:
With waiting for retrace enabled, my time function just prints "e" all the time instead of a number?!
But only in VESA mode; in mode 13h (with WaitForRetrace) it still works???

void GraphicsEngine::WaitForRetrace()
{
    /* wait until the vertical retrace starts (bit 3 of port 0x3DA set) */
    while ((inportb(0x03da) & 0x08) != 8) {};
    /* wait until the vertical retrace is over */
    while ((inportb(0x03da) & 0x08) == 8) {};
}
//Main loop
while(running)
{
    engine->time->SetFrameStartTimeStamp();

    engine->input->PollKeys();
    if(engine->input->KeyDown(1))
    {
        running = false;
    }

    engine->graphics->DrawRectangleASM(0,0,320,200,5);
    engine->graphics->DrawRectangleASM(10,10,110,110,7);

    engine->graphics->WaitForRetrace();
    engine->graphics->Flip();

    time = engine->time->GetTicksSinceFrameStart();
    cout<<engine->time->TicksToMilliSeconds(time)<<" ";
}

Reply 37 of 38, by BloodyCactus

thrawn235 wrote:

But I don't understand why it flickers in the first place. I draw the image to the off buffer, then I swap the pointers, and then I draw to the other buffer (that's now the off buffer).

So how can it display an incomplete image? Shouldn't there always be a complete image in the front buffer?

Because you're drawing at a speed that's different from the screen's, i.e. the screen refreshes at, say, 60 Hz or 70 Hz, but you're drawing at 80 or 50 or something.

A real CRT and an LCD will give different results.

Waiting for retrace syncs up your drawing with the actual refresh rate of the screen.


Reply 38 of 38, by Scali

thrawn235 wrote:

So how can it display an incomplete image? Shouldn't there always be a complete image in the front buffer?

As long as there are two different images in the two buffers, and you switch between them somewhere outside the vertical blank area, then you can get flicker, because part of the current displayed frame comes from one buffer, and part comes from another.
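
A toy simulation of that effect, under the assumption that the switch takes effect immediately, in the middle of a refresh. The function names are made up for illustration; the sketch just records which page each scanline is read from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates one refresh: scanlines are read top to bottom, and the display
// switches from page 0 to page 1 just before line `flipLine` is read.
// Returns, for every scanline, the page it was read from.
inline std::vector<int> ScanOutFrame(int height, int flipLine)
{
    assert(height > 0 && flipLine >= 0);
    std::vector<int> source(static_cast<std::size_t>(height));
    for (int line = 0; line < height; ++line)
    {
        source[static_cast<std::size_t>(line)] = (line >= flipLine) ? 1 : 0;
    }
    return source;
}

// A frame is torn when its first and last scanline come from different pages.
inline bool IsTorn(const std::vector<int>& source)
{
    return !source.empty() && source.front() != source.back();
}
```

Switching at line 0 (that is, during the vertical blank) gives a frame read entirely from page 1. Switching at line 120 gives 120 lines from page 0 and 80 from page 1, which shows up as tearing or flicker when the two pages hold different images.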

Drawing a screen takes time. The old CRT is quite intuitive that way: the cathode ray actually traces the screen one scanline at a time. So the output of the video card is always a single pixel at a time, and that is whatever the ray draws at that moment. Proper timing in the video card circuitry makes sure that pixels are switched at the correct time, scanlines are switched at the correct time, and eventually a vertical blank is inserted, so the ray can return to the top of the screen again.

Modern digital screens are not quite that 'direct', but the general idea still holds: The video memory is scanned from left to right, top to bottom, as it is sent to the internal framebuffer inside your flatscreen. So the concepts of vsync and tearing still apply.
