VOGONS


blit patch


First post, by ih8registrations

Rank: Oldbie

Memory, scaler, and DMA blitting.

Attachments

  • blits.diff (11.39 KiB, 344 downloads; file license: fair use/fair dealing exception)

Reply 4 of 10, by wd

Rank: DOSBox Author

> It gains some

Not relevant amounts, so why bother.

> it adds more code

Uglifies places that are quite straightforward and readable.

> and I'm missing where the string copy triggers a pagefault.

mem_strlen, not the copy.

Reply 6 of 10, by ih8registrations

Rank: Oldbie

Profiling shows that usage is spread out, so things are going to be improved by many small optimizations rather than by something like threading. I have another patch here, optimized with 64-bit decoding and gcc_unlikely path optimizations, which brings another small bump.

Reply 7 of 10, by wd

Rank: DOSBox Author

> which brings another small bump

Well, the problem at this stage is that adding complexity for very small speed
gains makes other optimizations/rewrites/changes harder, so they're not
useful imo: they are not noticeable on regular PCs, and on low-powered
devices you have pretty different problems anyway.
But that's only my humble opinion, of course.

Reply 8 of 10, by wd

Rank: DOSBox Author

The scaler BituMove might be interesting and is easy enough (not sure if
the 16-byte alignment is fine, though). Did you profile anything with that,
especially the default modes (320x200 games with the normal2x scaler)?

Reply 9 of 10, by ih8registrations

Rank: Oldbie

Yeah, long ago. BTW, I'm on an Athlon XP with only 256 KB of L2 cache. In my testing, some games cycle between the three conditions (<8, >=8, and >=8 with a tmp remainder), but most hit one condition exclusively or most of the time. To reduce code size for CPUs with even less cache, the <8 case could be changed to a repeated byte copy rather than the if dword, if word, if byte sequence.
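To make the two options for the <8 case concrete, here is a rough standalone sketch in plain C. It is not taken from blits.diff: the function names copy_tail_split and copy_tail_bytes are hypothetical, and ordinary memory copies stand in for DOSBox's physical-memory reads. Both variants assume n is less than 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant 1: copy a tail of n < 8 bytes as
 * "if dword, if word, if byte". More code, at most three copies. */
void copy_tail_split(uint8_t *dst, const uint8_t *src, size_t n) {
    if (n & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
    if (n & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
    if (n & 1) { *dst = *src; }
}

/* Hypothetical variant 2: the same tail as a plain byte loop.
 * Less code, up to seven iterations. */
void copy_tail_bytes(uint8_t *dst, const uint8_t *src, size_t n) {
    while (n--) *dst++ = *src++;
}
```

The byte loop trades a few extra iterations for a smaller instruction footprint, which is the trade-off being weighed for small-cache CPUs.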

It could be further shrunk by having the remainder fall through, using the <8 loop, like so:

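static void DMA_BlockRead(PhysPt pt,void * data,Bitu size) {
    Bit32u page=pt>>12;
    Bit32u * pagemap;
    Bit32u mask;
    /* below LINK_START: look pages up in a map chosen by where the transfer ends */
    if (page < LINK_START) { Bit32u pageend=(pt+size)>>12; pagemap=pmap[pageend < EMM_PAGEFRAME4K]; mask=~0; }
    /* otherwise: mask=0 forces index 0, so the page computed above is used as-is */
    else { pagemap=&page; mask=0; }

    Bit64u * writeq=(Bit64u *) data;
    if (size>=8) {
        Bit8u tmp=size&0x07;    /* remainder bytes left after the qword loop */
        size>>=3;               /* number of whole qwords */
        do {
            *writeq++=phys_readq(pagemap[(pt>>12)&mask]*4096 + (pt & 4095));
            size--; pt+=8;
        } while (size);
        if (!tmp) return;
        size=tmp;               /* fall through: the byte loop copies the remainder */
    }
    Bit8u * write=(Bit8u *) writeq;
    /* byte loop: serves both size<8 and the remainder; assumes size is nonzero here */
    do {
        *write++=phys_readb(pagemap[(pt>>12)&mask]*4096 + (pt & 4095));
        pt++;
    } while (--size);
}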
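For anyone who wants to try the fall-through shape outside DOSBox, the following is a minimal sketch in plain C. It rests on some assumptions: block_copy_fallthrough is a hypothetical name, ordinary memory stands in for the paged physical reads, and memcpy of 8 bytes stands in for phys_readq. Unlike the do/while above, the byte loop here also tolerates a size of zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the read loop: copies `size` bytes from src to dst. */
void block_copy_fallthrough(void *dst, const void *src, size_t size) {
    uint8_t *d = dst;
    const uint8_t *s = src;

    if (size >= 8) {
        size_t rem = size & 7;              /* bytes left after whole qwords */
        for (size_t q = size >> 3; q; q--) {
            uint64_t v;
            memcpy(&v, s, 8);               /* 8-byte load, alignment-safe */
            memcpy(d, &v, 8);               /* 8-byte store */
            s += 8;
            d += 8;
        }
        size = rem;                         /* fall through with the remainder */
    }

    /* One byte loop serves both the size < 8 case and the remainder. */
    while (size--) *d++ = *s++;
}
```

Sharing the byte loop removes a separate remainder path, at the cost of up to seven single-byte iterations after the qword loop.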

Reply 10 of 10, by ih8registrations

Rank: Oldbie

Alternately, could do something like this:

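#define optimize 1
/* read a block from physical memory */
static void DMA_BlockRead(PhysPt pt,void * data,Bitu size) {
    Bit32u page=pt>>12;
    Bit32u * pagemap;
    Bit32u mask;
    /* below LINK_START: look pages up in a map chosen by where the transfer ends */
    if (page < LINK_START) { Bit32u pageend=(pt+size)>>12; pagemap=pmap[pageend < EMM_PAGEFRAME4K]; mask=~0; }
    /* otherwise: mask=0 forces index 0, so the page computed above is used as-is */
    else { pagemap=&page; mask=0; }

#ifdef optimize
    if (size>=8) {
        Bit64u * writeq=(Bit64u *) data;
        Bit8u tmp=size&0x07;    /* remainder bytes left after the qword loop */
        size>>=3;               /* number of whole qwords */
        do {
            *writeq++=phys_readq(pagemap[(pt>>12)&mask]*4096 + (pt & 4095));
            size--; pt+=8;
        } while (size);

        if (tmp) {
            /* step back so one last qword ends exactly at the end of the block;
               this re-reads 8-tmp bytes that were already copied */
            tmp=8-tmp; pt-=tmp; writeq=(Bit64u *)((Bit8u *)writeq-tmp);
            *writeq=phys_readq(pagemap[(pt>>12)&mask]*4096 + (pt & 4095));
        }
        return;
    }
#endif
    /* byte loop for size<8 (or for everything when optimize is not defined);
       assumes size is nonzero */
    Bit8u * write=(Bit8u *) data;
    do {
        *write++=phys_readb(pagemap[(pt>>12)&mask]*4096 + (pt & 4095));
        pt++;
    } while (--size);
}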
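The overlapping final read can be tried in isolation with a sketch like the one below. It is an illustration under assumptions, not the patch itself: block_copy_overlap is a hypothetical name, and plain memory with an 8-byte memcpy stands in for the paged phys_readq calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: copies `size` bytes, finishing a non-multiple-of-8
 * length with one overlapping 8-byte copy instead of a byte loop. */
void block_copy_overlap(void *dst, const void *src, size_t size) {
    uint8_t *d = dst;
    const uint8_t *s = src;

    if (size >= 8) {
        size_t rem = size & 7;              /* bytes left after whole qwords */
        for (size_t q = size >> 3; q; q--) {
            memcpy(d, s, 8);
            s += 8;
            d += 8;
        }
        if (rem) {
            size_t back = 8 - rem;          /* bytes to step back */
            memcpy(d - back, s - back, 8);  /* last qword ends at the block end */
        }
        return;
    }

    /* Fewer than 8 bytes: no room for an overlapping qword, copy bytewise. */
    while (size--) *d++ = *s++;
}
```

Copying a few bytes twice is harmless for ordinary memory, but it would not be equivalent for a source where reads have side effects, so that is worth checking before using the trick on anything other than RAM.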