VOGONS


very slow drawing with DJGPP

First post, by thrawn235

Rank Newbie

Hello,

It's my first post and I don't quite know whether this is the right place to post this.
Sorry if it's not.

I started programming a little game using DJGPP.
I'm compiling on Linux using the djgpp-linux64-gcc720 package.

I'm running my code in DOSBox 0.74.

The program uses simple mode 13h.
(I have since learned the screen is just 200px high, but that shouldn't have anything to do with my problems, I think.)

Now I have two problems that are potentially connected.

Problem one:
The program is incredibly slow, especially my drawing.

This is my main loop.
It just draws two rectangles, copies the backbuffer to video memory and checks for ESC.
I use time.h to measure it.
Just that little bit takes 14ms!!

I don't know whether it's my code or something with DOSBox, but that's incredibly slow.

    while(running)
    {
        engine->time->SetFrameStartTimeStamp();

        engine->graphics->DrawRectangleFast(0,0,320,240,5);
        engine->graphics->DrawRectangleFast(10,10,110,110,6);

        engine->graphics->Flip();

        engine->input->PollKeys();
        if(engine->input->KeyDown(1))
        {
            running = false;
        }

        cout<<engine->time->TicksToMilliSeconds(engine->time->GetTicksSinceFrameStart())<<" ";
    }

That's my DrawRect function:

    void GraphicsEngine::DrawRectangleFast(int x, int y, int w, int h, int color)
    {
        int i, j;
        for (j=y; j<y+h; j++)
        {
            int row = j*320;
            for (i=x; i<x+w; i++)
            {
                backBuffer[row+i] = color;
            }
        }
    }

And Flip:

    void GraphicsEngine::Flip()
    {
        char *screen = (char*)0xa0000 + __djgpp_conventional_base;
        int maxScreenOffset = 320*240;
        for(int i = 0; i < maxScreenOffset; i++)
        {
            screen[i] = backBuffer[i];
        }
    }

The other question I have: I print the frame time every frame, and I would expect it to be constant.
But the frames take much longer at first.
It starts at almost 50ms and then gets faster: every few frames I gain 1ms, and after about 50 frames I'm at 14ms.

I don't do a lot of DOS programming. It is the first time I've used DJGPP at all.

I'd appreciate any hints you guys can give me.

Reply 1 of 38, by retardware

Rank Oldbie

It's been decades since I did things with djgpp, so I don't know its current state.

You should check out whether it still uses the BIOS drawing function.
If it does, this won't be really fast.
Because that means a BIOS call for each individual pixel.

Reply 2 of 38, by BloodyCactus

Rank Oldbie

Your flip is terrible. You're copying byte by byte when you want a memmove or 32-bit loads/stores, not 8-bit ones.

But 320x240 implies Mode X, and that needs different pixel manipulation than your flip.

(320*240 = 75 KB, which exceeds VGA's 64 KB memory window, unless you're in Mode X.)
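
The arithmetic behind that remark, written as a small check. This is only a sketch: the constant and function names are made up for illustration, and the 64 KB figure is the size of the VGA memory window mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; the sizes are the ones discussed in this thread.
constexpr std::size_t kVgaWindowBytes = 64 * 1024;  // window at segment A000h
constexpr std::size_t kMode13hBytes   = 320 * 200;  // one byte per pixel
constexpr std::size_t kBytes320x240   = 320 * 240;  // what the posted Flip() copies

static_assert(kMode13hBytes <= kVgaWindowBytes, "a mode 13h frame fits");
static_assert(kBytes320x240 > kVgaWindowBytes, "a linear 320x240 frame does not");

// Returns true if a linear 8-bit frame of this size fits into the window.
inline bool FitsInVgaWindow(int width, int height)
{
    assert(width > 0 && height > 0);
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height)
           <= kVgaWindowBytes;
}
```

So a loop that copies 320*240 bytes runs 11,264 bytes past the end of the window, and past the end of a 64,000-byte backbuffer if that is how large it is.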

--/\-[ Stu : Bloody Cactus :: [ https://bloodycactus.com :: http://kråketær.com ]-/\--

Reply 3 of 38, by Scali

Rank l33t
BloodyCactus wrote:

Your flip is terrible. You're copying byte by byte when you want a memmove or 32-bit loads/stores, not 8-bit ones.

And even then, at 320x200 in mode 13h, you will need a localbus card and a fast 486 or better to get a decent framerate that way.
A simple 286 or 386 won't be fast enough to copy the 64k of data from main memory to video memory over an ISA bus at the full 70 Hz.
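
To put a number on that: a full mode 13h frame is 320 x 200 = 64,000 bytes, so copying it on every one of the 70 refreshes per second needs a sustained transfer rate of roughly 4.5 MB/s, before any drawing is done. A tiny sketch of the calculation, with a made-up function name:

```cpp
#include <cassert>

// Bytes per second needed to copy one full 8-bit frame on every refresh.
inline long long RequiredBytesPerSecond(int width, int height, int refreshHz)
{
    assert(width > 0 && height > 0 && refreshHz > 0);
    return static_cast<long long>(width) * height * refreshHz;
}
```

RequiredBytesPerSecond(320, 200, 70) evaluates to 4,480,000.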

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 4 of 38, by root42

Rank l33t

If you use Mode X you can simply do double buffering on the card. That will be way faster, except that the draw routines will become a bit more complicated. You need to keep track of what parts of the screen need to be restored, so you avoid redrawing or copying the full screen in every frame.

Other than that, do what the others said. A 32-bit memmove will be much faster.
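
A rough sketch of the "keep track of what needs to be restored" bookkeeping. Note the assumptions: it is applied here to a plain linear backbuffer in system RAM rather than to Mode X pages, `DirtyRect` and `DirtyTracker` are invented names, and overlapping regions are simply copied twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper types; nothing here comes from the poster's engine.
struct DirtyRect { int x, y, w, h; };

class DirtyTracker
{
public:
    DirtyTracker(int screenW, int screenH) : w_(screenW), h_(screenH) {}

    // Remember a region that was drawn to, clipped to the screen.
    void Mark(int x, int y, int w, int h)
    {
        int x0 = x < 0 ? 0 : x;
        int y0 = y < 0 ? 0 : y;
        int x1 = x + w > w_ ? w_ : x + w;
        int y1 = y + h > h_ ? h_ : y + h;
        if (x1 > x0 && y1 > y0)
            rects_.push_back({x0, y0, x1 - x0, y1 - y0});
    }

    // Copy only the marked regions from src to dst, then forget them.
    // Returns the number of bytes copied.
    std::size_t Flush(std::uint8_t* dst, const std::uint8_t* src)
    {
        assert(dst != nullptr && src != nullptr);
        std::size_t copied = 0;
        for (const DirtyRect& r : rects_)
        {
            for (int row = r.y; row < r.y + r.h; ++row)
            {
                std::size_t offset = static_cast<std::size_t>(row) * w_ + r.x;
                std::memcpy(dst + offset, src + offset,
                            static_cast<std::size_t>(r.w));
                copied += static_cast<std::size_t>(r.w);
            }
        }
        rects_.clear();
        return copied;
    }

private:
    int w_, h_;
    std::vector<DirtyRect> rects_;
};
```

Each draw call would be followed by a matching Mark(), and the flip becomes a single Flush() that touches only those rows.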

YouTube and Bonus
80486DX@33 MHz, 16 MiB RAM, Tseng ET4000 1 MiB, SnarkBarker & GUSar Lite, PC MIDI Card+X2+SC55+MT32, OSSC

Reply 5 of 38, by thrawn235

Rank Newbie

Thanks for the replies.

retardware wrote:

You should check out whether it still uses the BIOS drawing function.
If it does, this won't be really fast.
Because that means a BIOS call for each individual pixel.

I'm not using any BIOS calls to draw. I write directly to memory.

BloodyCactus wrote:

Your flip is terrible. You're copying byte by byte when you want a memmove or 32-bit loads/stores, not 8-bit ones.

I'm forced to agree with you there. 🤣
Do you guys know how to access an array of char with an int or long pointer? With that I could move words instead of bytes.

BloodyCactus wrote:

But 320x240 implies Mode X, and that needs different pixel manipulation than your flip.

That was an error on my part. It's not Mode X, just simple mode 13h.

If I have to, I could use Mode X or, better yet, the VESA BIOS extensions to get page flipping. But even just drawing the simple rect to my backbuffer (without flipping at all) takes 6ms.
And that's not even using real sprites or any game logic at all.

Scali wrote:

And even then, at 320x200 in mode 13h, you will need a localbus card and a fast 486 or better to get a decent framerate that way.
A simple 286 or 386 won't be fast enough to copy the 64k of data from main memory to video memory over an ISA bus at the full 70 Hz.

So you think that's perfectly normal with DOSBox?
It seems rather slow to me. I have the svga_s3 option enabled in my .conf file. There has to be a way to do that faster, no?

root42 wrote:

If you use Mode X you can simply do double buffering on the card. That will be way faster, except that the draw routines will become a bit more complicated. You need to keep track of what parts of the screen need to be restored, so you avoid redrawing or copying the full screen in every frame.

I was thinking of implementing some kind of adaptive tile refresh. That wouldn't be too hard. But in mode 13h one can't scroll the screen in hardware, if I'm informed correctly. So if I wanted scrolling (which I do), I would have to redraw every frame anyway.

There has to be a way to do that, though. Games like Doom have to redraw the whole buffer every frame, and they run great in DOSBox.
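
On the cost of the rectangle itself: one option, sketched here under assumptions, is to clip the rectangle to the buffer and fill each row with a single memset instead of a per-pixel loop. `FillRectClipped` is a made-up free function, not part of the engine shown earlier, and the buffer is passed in explicitly so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill a rectangle in an 8-bit linear buffer, one memset per row.
// bufW/bufH describe the buffer; the rectangle is clipped to it, so a
// 320x240 request on a 320x200 buffer only touches 200 rows.
inline void FillRectClipped(std::uint8_t* buf, int bufW, int bufH,
                            int x, int y, int w, int h, std::uint8_t color)
{
    assert(buf != nullptr && bufW > 0 && bufH > 0);
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > bufW ? bufW : x + w;
    int y1 = y + h > bufH ? bufH : y + h;
    if (x1 <= x0 || y1 <= y0)
        return;  // nothing visible

    const std::size_t rowBytes = static_cast<std::size_t>(x1 - x0);
    for (int row = y0; row < y1; ++row)
    {
        std::uint8_t* dst = buf + static_cast<std::size_t>(row) * bufW + x0;
        std::memset(dst, color, rowBytes);
    }
}
```

The clipping also covers the 240-row call in the main loop above, which would otherwise write past a 320x200 buffer.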

Reply 6 of 38, by Scali

Rank l33t
thrawn235 wrote:

So you think that's perfectly normal with DOSBox?

Depends on your cycle-count.
I believe the default is 3000 cycles, which is indeed a reasonably slow system.
Although, I believe it switches to maximum when you run a 32-bit DOS extender, so that may not be an issue for you.

Other reasons why your code may be slow include compiling in debug mode, and compiling without optimizations.

thrawn235 wrote:

games like doom have to redraw the whole buffer every frame.

Redraw, yes, copy no.
DOOM uses Mode X and renders to a backbuffer in VRAM. Then it performs a real flip by just switching the framebuffer address on the CRTC.
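
The "real flip" described here comes down to reprogramming the CRTC start address. A minimal sketch, assuming the standard VGA CRTC registers (index port 0x3D4, data port 0x3D5, Start Address High and Low at indices 0x0C and 0x0D). `PortWrite` and `SetDisplayStart` are made-up names, and the callback stands in for a real port write (under DJGPP that would be `outportb` from `<pc.h>`) so the sketch can run without hardware. Waiting for vertical retrace, which real code would normally do to avoid tearing, is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// VGA CRT controller ports and register indices (colour modes).
constexpr std::uint16_t kCrtcIndex     = 0x3D4;
constexpr std::uint16_t kCrtcData      = 0x3D5;
constexpr std::uint8_t  kStartAddrHigh = 0x0C;
constexpr std::uint8_t  kStartAddrLow  = 0x0D;

// Hypothetical callback standing in for a real port write.
using PortWrite = std::function<void(std::uint16_t port, std::uint8_t value)>;

// Point the CRTC at a different page in video memory.
inline void SetDisplayStart(const PortWrite& out, std::uint16_t offset)
{
    assert(static_cast<bool>(out));
    out(kCrtcIndex, kStartAddrHigh);
    out(kCrtcData, static_cast<std::uint8_t>(offset >> 8));
    out(kCrtcIndex, kStartAddrLow);
    out(kCrtcData, static_cast<std::uint8_t>(offset & 0xFF));
}
```

In unchained 320x240, one page occupies 80 x 240 = 19,200 addresses, so a second page would start at offset 0x4B00, and flipping alternates the start address between 0 and 0x4B00.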

Reply 7 of 38, by BloodyCactus

Rank Oldbie

DOSBox also does not have a penalty for transferring from RAM to video RAM over an ISA bus like real hardware does. In DOSBox it's just an in-memory copy.

Reply 8 of 38, by Qbix

Rank DOSBox Author

A real memcpy call instead of that loop when flipping will be much faster, and you might be running in debug mode as well.
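
A minimal sketch of what the memcpy version could look like. This is not the poster's actual code: `FlipFrame` and the constants are illustrative names, and the destination is passed in as a parameter so the example runs on any machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kScreenW = 320;
constexpr int kScreenH = 200;  // mode 13h is 200 rows high, not 240

// Copy one full mode 13h frame in a single call. In the program from the
// first post, `screen` would be (char*)0xa0000 + __djgpp_conventional_base.
inline void FlipFrame(std::uint8_t* screen, const std::uint8_t* backBuffer)
{
    assert(screen != nullptr && backBuffer != nullptr);
    std::memcpy(screen, backBuffer, kScreenW * kScreenH);
}
```

The library routine is free to move data in whatever word size is fastest, which is the point of the advice above.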

Water flows down the stream
How to ask questions the smart way!

Reply 10 of 38, by root42

Rank l33t

How do you compile your code? E.g. this is debug:

    gcc -g -W -Wall foo.c -o foo.exe

This would be with optimizations:

    gcc -O3 -W -Wall foo.c -o foo.exe

Reply 11 of 38, by thrawn235

Rank Newbie

I use this makefile to compile:

all: mode13

objects.o: objects.cpp objects.h
	i586-pc-msdosdjgpp-g++ -c objects.cpp
engine.o: engine.cpp engine.h
	i586-pc-msdosdjgpp-g++ -c engine.cpp
main.o: main.cpp
	i586-pc-msdosdjgpp-g++ -c main.cpp
mode13: objects.o engine.o main.o
	i586-pc-msdosdjgpp-g++ main.o engine.o objects.o -o mode13.exe -Wall -Os

clean:
	rm *.o
	rm mode13.exe

Reply 12 of 38, by root42

Rank l33t

Try replacing -Os with -O3. The first optimizes for code size, the latter more for speed.

Also you need to put that in every compile target. During linking where you are using it, it's worthless.

Also get some documentation on make. You should use generic targets, so you can easily add and compile source files.

Reply 13 of 38, by thrawn235

Rank Newbie
root42 wrote:

Also you need to put that in every compile target. During linking where you are using it, it's worthless.

Geez thanks.
That makes perfect sense.
Quite amazing what one can learn by posting to a forum 😀

Just using -O3 correctly brought the combined drawing and flipping time down to 4ms.
That's much more manageable than the 14ms from before.

Thanks!

If I could figure out how to use long pointers instead of char, I could probably cut that in half again.

Reply 14 of 38, by gerwin

Rank l33t

Doom MBF in my signature compiles with DJGPP. You can look at it for examples. This 2.04 version does not use external libraries for the drawing routines anymore. It has different modes of drawing: Mode 13h 320x200, Mode X 320x200 (for page flipping only), VESA mode 320x200 or 640x400.

--> ISA Soundcard Overview // Doom MBF 2.04 // SetMul

Reply 15 of 38, by root42

Rank l33t

For that, DJGPP has _movedatal, which does 32-bit transfers:

http://www.delorie.com/djgpp/doc/libc/libc_585.html

Beware this is not portable, and restricted to djgpp.

Reply 16 of 38, by Scali

Rank l33t
thrawn235 wrote:

Just using -O3 correctly brought the combined drawing and flipping time down to 4ms.
That's much more manageable than the 14ms from before.

Bear in mind that DOSBox performance is completely unrepresentative of actual hardware.
In some cases, you may get gains in DOSBox that aren't there on real hardware. In other cases, code that may run faster on real hardware shows no gains in DOSBox.
Don't ever try to use DOSBox to optimize your code.

Reply 17 of 38, by root42

Rank l33t

Untested, but this is a more generic Makefile. Note that it will recompile more often than necessary, since every source file depends on ALL header files. You can add a bit more to compute the header dependencies automatically; I leave that as an exercise.

If you want to add a new source file, just add it to the SOURCES line.

CXX=i586-pc-msdosdjgpp-g++
CXXFLAGS=-O3 -W -Wall
.PHONY: all clean

SOURCES=objects.cpp engine.cpp main.cpp
# Every header in the directory. Not derived from SOURCES, because the
# original makefile lists no main.h.
HEADERS=$(wildcard *.h)
OBJECTS=$(SOURCES:.cpp=.o)
EXECUTABLE=mode13.exe

all: $(EXECUTABLE)

# Pattern rule; a suffix rule (.cpp.o) cannot carry prerequisites.
%.o: %.cpp $(HEADERS)
	$(CXX) $(CXXFLAGS) -c $< -o $@

$(EXECUTABLE): $(OBJECTS)
	$(CXX) $(OBJECTS) -o $@

clean:
	rm -f $(OBJECTS) $(EXECUTABLE)

Reply 18 of 38, by thrawn235

Rank Newbie
gerwin wrote:

Doom MBF

Thanks, I'll check that out.

Scali wrote:

Bear in mind that DOSBox performance is completely unrepresentative of actual hardware.

I know. I'm just comparing relatively, from one implementation to the other (both in DOSBox). I'm still a long way from actually testing it on real hardware.

root42 wrote:

Untested, but this is a more generic Makefile

Thanks. I'll have to read up on makefiles. I intentionally created the simplest makefile I could, just to get a grasp of how it all works.

I did finally figure out how to use long pointers for my flip function.
It really isn't that hard once one knows how...

    void GraphicsEngine::Flip()
    {
        char *screen = (char*)0xa0000 + __djgpp_conventional_base;
        int maxScreenOffset = (320*200)/4;   // number of 4-byte words per frame
        long* accessPtrScreen;
        accessPtrScreen = (long*)screen;
        long* accessPtrBackBuffer;
        accessPtrBackBuffer = (long*)backBuffer;
        for(int i = 0; i < maxScreenOffset; i++)
        {
            accessPtrScreen[i] = accessPtrBackBuffer[i];   // 4 pixels per store
        }
    }

That brought it down to just 2ms per frame.

I've also tried quite a bit of inline assembly, without much success though.

I started to implement my drawRect function in asm.
The first thing I tried was to put four copies of the color into one 32-bit register.
This is what I came up with. It doesn't seem to work.
It just draws the first pixel, but it should draw 4 pixels.
edx is supposed to hold four copies of the color, and &backBuffer[0] is the first address of my backbuffer.

    asm("movb %4, %%dl;"
        "shl $4, %%edx;"
        "movb %4, %%dl;"
        "shl $4, %%edx;"
        "movb %4, %%dl;"
        "shl $4, %%edx;"
        "movb %4, %%dl;"

        "movl %%edx, (%5);"
        :
        :"m"(x), "m"(y), "m"(w), "m"(h), "m"(color), "D"(&backBuffer[0]), "m"(screenWidth)
        :"memory");

Do you guys know what's wrong with it?

Reply 19 of 38, by root42

Rank l33t

First, I think you should add edx to the clobber list of the asm part. Second, instead of &backBuffer[0], simply backBuffer should be enough.

Third, did you try to fill edx with a static value, e.g. 0xffffffff, to see if it sets four pixels? Start by making your code simpler, then add functionality.
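
One more thing worth checking in the posted asm, offered as an observation rather than a confirmed diagnosis: each pixel is one byte, so making room for the next copy of the color takes a shift by 8 bits, while `shl $4` shifts by only 4. With a small color value that leaves just the lowest byte set, which would be consistent with only the first pixel showing. The same idea in plain C++, with made-up function names, looks like this:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Replicate an 8-bit palette index into all four bytes of a 32-bit word.
// Equivalent to c | c << 8 | c << 16 | c << 24.
inline std::uint32_t ReplicateColor(std::uint8_t color)
{
    return color * 0x01010101u;
}

// Store four pixels at once. memcpy keeps the store free of alignment and
// aliasing problems; compilers normally turn it into one 32-bit move.
inline void PutFourPixels(std::uint8_t* dst, std::uint8_t color)
{
    assert(dst != nullptr);
    const std::uint32_t packed = ReplicateColor(color);
    std::memcpy(dst, &packed, sizeof packed);
}
```

ReplicateColor(6) gives 0x06060606, which is the value edx is meant to hold before the `movl`.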
