VOGONS


2003-7-26 Release

First post, by canadacow

Rank Member

I swear, if I had real employment there wouldn't be nearly as many updates as there have been recently. Anyway, after a serious bout of insomnia, I've added quite a few changes to this version. This includes an all-new envelope management system and some optimizations courtesy of ih8registrations. I'm still not quite there yet, but again, I think everyone will agree it's another good step closer. Strangely, in this version I've noticed some instruments disappearing on occasion (particularly in the Monkey Island 1 theme). I'm trying to track it down. I'm sure it's some careless bug in the new envelope code.

As always, get it from http://www.artworxinn.com/alex

Reply 1 of 23, by ih8registrations

Rank Oldbie

You know what's coming, more optimization :)

INLINE Bit16s MidiChannel::getPitchEnvelope(dpoly::partialStatus *pStat, dpoly *poly, bool inDecay) {
Bit32u sampoff;
patchCache *tcache = &pcache[pStat->partNum];

Bit16s tc;

pStat->pitchsustain = false;

if(inDecay) {
if(pStat->isDecayed || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {
tc = tcache->pitchEnv.level[4];
pStat->prevlevel[PITCHENV] = tc;
return tc;
}
} else {

if(pStat->envstat[PITCHENV]==2) {
tc =tcache->pitchEnv.level[3];

if(tcache->sustain)
pStat->pitchsustain = true;
else
StartDecay(PITCHENV, tcache->pitchEnv.level[3], pStat, poly);

pStat->prevlevel[PITCHENV] = tc;
return tc;
} else {

if((pStat->envstat[PITCHENV]==-1) || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {
pStat->envbase[PITCHENV] = tcache->pitchEnv.level[pStat->envstat[PITCHENV]+1];
pStat->envstat[PITCHENV]++;

pStat->envpos[PITCHENV] = 0;
pStat->envsize[PITCHENV] = (envtimetable[tcache->pitchEnv.time[pStat->envstat[PITCHENV]]] * fildeptable[tcache->pitchEnv.timekeyfollow][poly->freqnum]) >> 8;
pStat->envsize[PITCHENV]++;
pStat->envdist[PITCHENV] = tcache->pitchEnv.level[pStat->envstat[PITCHENV]+1] - pStat->envbase[PITCHENV];
}

}

}
tc = pStat->envbase[PITCHENV];
tc = (tc + ((pStat->envdist[PITCHENV] * pStat->envpos[PITCHENV]) / pStat->envsize[PITCHENV]));

pStat->prevlevel[PITCHENV] = tc;
return tc;

}

Two extra
'pStat->prevlevel[PITCHENV] = tc;
return tc; '

are still shorter than one extra
'tc = pStat->envbase[PITCHENV];
tc = (tc + ((pStat->envdist[PITCHENV] * pStat->envpos[PITCHENV]) / pStat->envsize[PITCHENV]));'

Size optimization: pStat->pitchsustain is assigned false by default, since there is only one case where it's true. The cost of the default assignment is offset by the jump saved through the immediate return of tc, and because that happens more than once, the updated function is still faster overall.
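As a rough sketch of the shape being described (the struct Env and the function envLevel are made-up stand-ins for illustration, not names from the emulator source): the finished case returns immediately, and every other path shares a single copy of the linear interpolation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for one envelope inside dpoly::partialStatus.
struct Env {
    int32_t base;      // level at the start of the current segment
    int32_t dist;      // signed distance to the segment's target level
    int32_t pos;       // samples elapsed in the current segment
    int32_t size;      // segment length in samples
    int32_t prevlevel; // last level handed back to the caller
};

// Early return for the finished case; one shared interpolation for the rest.
int16_t envLevel(Env &e, bool finished, int16_t finalLevel) {
    int16_t tc;
    if (finished) {
        tc = finalLevel;
        e.prevlevel = tc;
        return tc;
    }
    // The posted code increments envsize after computing it, so it is nonzero here.
    assert(e.size > 0);
    tc = static_cast<int16_t>(e.base + (e.dist * e.pos) / e.size);
    e.prevlevel = tc;
    return tc;
}
```

Whether the two short store-and-return tails really come out smaller than a second copy of the divide depends on the compiler, so treat this as an illustration of the control flow only.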

Reply 2 of 23, by canadacow

Rank Member

At least in Visual C, immediate returns are no more optimized than just letting the code run to the end of the routine. The second division was there to calculate the decaying side of the envelope. Once decayed, the code should not be allowed into the standard block, because once the envelope position (envpos) extends past the envelope size (envsize) the code moves on to the next part of the envelope. The final decay, of course, is the end of the line. This is still needed because even though the pitch envelope could have completed its decay, the other two envelopes (amplitude and filter) could still be far from complete. For informational purposes, here's Visual C's generated assembly code for this routine:

; 1284 : INLINE Bit16s MidiChannel::getPitchEnvelope(dpoly::partialStatus *pStat, dpoly *poly, bool inDecay) {

push ebx
push esi

; 1285 : Bit32u sampoff;
; 1286 : patchCache *tcache = &pcache[pStat->partNum];

mov esi, DWORD PTR _pStat$[esp+4]
mov eax, DWORD PTR [esi+284]
imul eax, 4008 ; 00000fa8H

; 1287 :
; 1288 : Bit16s tc;
; 1289 : pStat->pitchsustain = false;

xor ebx, ebx

; 1290 : if(inDecay) {

cmp BYTE PTR _inDecay$[esp+4], bl
push edi
lea eax, DWORD PTR [eax+ecx+262248]
mov BYTE PTR [esi+280], bl
je SHORT $L68822

; 1291 :
; 1292 : if((pStat->isDecayed) || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {

cmp BYTE PTR [esi+144], bl
jne SHORT $L68824
mov ecx, DWORD PTR [esi+20]
mov edi, DWORD PTR [esi+84]
cmp ecx, edi
jge SHORT $L68824

; 1294 : } else {
; 1295 : tc = pStat->envbase[PITCHENV];
; 1296 : tc = (tc + ((pStat->envdist[PITCHENV] * pStat->envpos[PITCHENV]) / pStat->envsize[PITCHENV]));

mov eax, DWORD PTR [esi+68]
imul eax, ecx
cdq
idiv edi
mov edi, eax
add di, WORD PTR [esi+52]
jmp $L68826
$L68824:

; 1293 : tc = tcache->pitchEnv.level[4];

movsx di, BYTE PTR [eax+119]

; 1297 : }
; 1298 : } else {

jmp $L68826
$L68822:

; 1299 :
; 1300 : 		if(pStat->envstat[PITCHENV]==2) {

mov edx, DWORD PTR [esi+36]
cmp edx, 2
jne SHORT $L68827

; 1301 : if(tcache->sustain) {

cmp BYTE PTR [eax+24], bl
je SHORT $L68828

; 1302 : tc =tcache->pitchEnv.level[3];

movsx di, BYTE PTR [eax+118]

; 1303 : pStat->prevlevel[PITCHENV] = tc;

movsx eax, di
mov DWORD PTR [esi+136], eax

; 1304 : pStat->pitchsustain = true;

mov BYTE PTR [esi+280], 1

; 1305 : } else {

jmp SHORT $L68830
$L68828:

; 1306 : tc =tcache->pitchEnv.level[3];

mov al, BYTE PTR [eax+118]

; 1307 : StartDecay(PITCHENV, tcache->pitchEnv.level[3], pStat, poly);

push DWORD PTR _poly$[esp+8]
movsx di, al
movsx eax, al
push esi
push eax
push 2
call ?StartDecay@MidiChannel@@QAEXHJPAUpartialStatus@dpoly@@PAU3@@Z ; MidiChannel::StartDecay

; 1308 : }
; 1309 :
; 1310 : } else {

jmp SHORT $L68830
$L68827:

; 1311 :
; 1312 : if((pStat->envstat[PITCHENV]==-1) || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {

cmp edx, -1
je SHORT $L68832
mov ecx, DWORD PTR [esi+20]
cmp ecx, DWORD PTR [esi+84]
jl SHORT $L68831
$L68832:

; 1313 : pStat->envbase[PITCHENV] = tcache->pitchEnv.level[pStat->envstat[PITCHENV]+1];

movsx ecx, BYTE PTR [edx+eax+116]
mov DWORD PTR [esi+52], ecx

; 1314 : pStat->envstat[PITCHENV]++;

lea ecx, DWORD PTR [edx+1]

; 1315 :
; 1316 : pStat->envpos[PITCHENV] = 0;
; 1317 : pStat->envsize[PITCHENV] = (envtimetable[tcache->pitchEnv.time[pStat->envstat[PITCHENV]]] * fildeptable[tcache->pitchEnv.timekeyfollow][poly->freqnum]) >> 8;
; 1318 : pStat->envsize[PITCHENV]++;

mov edx, DWORD PTR _poly$[esp+8]
mov DWORD PTR [esi+36], ecx
add ecx, eax
mov DWORD PTR [esi+20], ebx
movsx eax, BYTE PTR [eax+110]
shl eax, 7
add eax, DWORD PTR [edx+12]
movsx edx, BYTE PTR [ecx+111]
mov eax, DWORD PTR _fildeptable[eax*4]
imul eax, DWORD PTR _envtimetable[edx*4]
sar eax, 8
inc eax
mov DWORD PTR [esi+84], eax

; 1319 : pStat->envdist[PITCHENV] = tcache->pitchEnv.level[pStat->envstat[PITCHENV]+1] - pStat->envbase[PITCHENV];

movsx eax, BYTE PTR [ecx+116]
sub eax, DWORD PTR [esi+52]
mov DWORD PTR [esi+68], eax
$L68831:

; 1320 : }
; 1321 :
; 1322 : tc = pStat->envbase[PITCHENV];
; 1323 : tc = (tc + ((pStat->envdist[PITCHENV] * pStat->envpos[PITCHENV]) / pStat->envsize[PITCHENV]));

mov eax, DWORD PTR [esi+68]
imul eax, DWORD PTR [esi+20]
cdq
idiv DWORD PTR [esi+84]
mov edi, eax
add di, WORD PTR [esi+52]
$L68830:

; 1324 :
; 1325 : }
; 1326 : pStat->prevlevel[PITCHENV] = tc;

movsx eax, di
mov DWORD PTR [esi+136], eax
$L68826:

; 1327 :
; 1328 :
; 1329 : }
; 1330 :
; 1331 : return tc;

mov ax, di
pop edi
pop esi
pop ebx

; 1332 :
; 1333 : }

ret 12 ; 0000000cH

Reply 3 of 23, by ih8registrations

Rank Oldbie

Fair enough. As for the second calculation, it sounds like you think I only applied the divide for the first case? Barring a bug, the rewrite is functionally equivalent. The handling of both cases was moved outside of the if/else structure and became the general case. The other cases should return without hitting it.

Is the asm readout of my rewrite? It looks like a mix of old and new: the duplicate divide from the old code is in there.

and for my code snippet here:

	if(inDecay) {
if(pStat->isDecayed || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {
tc = tcache->pitchEnv.level[4];
pStat->prevlevel[PITCHENV] = tc;
return tc;
}

it looks like it's not setting pStat->prevlevel for this case: the jump at 1296 goes to L68826. To match what I wrote it would need to jump to L68830. 1329 is part of the problem, as 1326 should not be inside the else.

If that's Visual C's interpretation of my code I'm not impressed: it goes against what I told it to do by reversing my size optimization, and it introduces a bug :P

C does give you the power to tell the compiler strictly what to do by way of the goto statement. It's frowned upon in polite society, but as you can see it's what the compiler's doing anyway, and it's the most direct way to specify forward jumps in C.

INLINE Bit16s MidiChannel::getPitchEnvelope(dpoly::partialStatus *pStat, dpoly *poly, bool inDecay) {
Bit32u sampoff;
patchCache *tcache = &pcache[pStat->partNum];

Bit16s tc;

pStat->pitchsustain = false;

if(inDecay) {
if(pStat->isDecayed || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {
tc = tcache->pitchEnv.level[4];
goto dowhatisay;
}
} else {

if(pStat->envstat[PITCHENV]==2) {
tc =tcache->pitchEnv.level[3];

if(tcache->sustain)
pStat->pitchsustain = true;
else
StartDecay(PITCHENV, tcache->pitchEnv.level[3], pStat, poly);
goto dowhatisay;
} else {

if((pStat->envstat[PITCHENV]==-1) || (pStat->envpos[PITCHENV] >= pStat->envsize[PITCHENV])) {
pStat->envbase[PITCHENV] = tcache->pitchEnv.level[pStat->envstat[PITCHENV]+1];
pStat->envstat[PITCHENV]++;

pStat->envpos[PITCHENV] = 0;
pStat->envsize[PITCHENV] = (envtimetable[tcache->pitchEnv.time[pStat->envstat[PITCHENV]]] * fildeptable[tcache->pitchEnv.timekeyfollow][poly->freqnum]) >> 8;
pStat->envsize[PITCHENV]++;
pStat->envdist[PITCHENV] = tcache->pitchEnv.level[pStat->envstat[PITCHENV]+1] - pStat->envbase[PITCHENV];
}

}

}
tc = pStat->envbase[PITCHENV];
tc = (tc + ((pStat->envdist[PITCHENV] * pStat->envpos[PITCHENV]) / pStat->envsize[PITCHENV]));
dowhatisay:
pStat->prevlevel[PITCHENV] = tc;
return tc;

}

How to tell Visual C to do the ending tc calculation just once, and that I really mean it this time, I'm unsure. It should be doing what I tell it to as is.

P.S. You may have noticed that all the indexed references of the form pStat->envXXX[idx] use four-byte addressing + a four-byte base + a one-byte immediate, and when stored are put into a four-byte register. For size optimization, if there are three or more references to XXX without modifying it, copying it to a temp variable before use will save space. If it is modified, six or more references are needed before it saves anything. stat qualifies, but only just; it would save a whole two bytes :)
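A small sketch of the temp-variable idea, under my own assumptions: Status and advanceStage are hypothetical names, and the struct is cut down to two arrays. The indexed member is loaded once into a local, used several times, and written back once. Whether this actually saves bytes depends on the compiler and its settings; a modern optimizer may well do it on its own.

```cpp
#include <cstdint>

// Hypothetical, cut-down version of the per-partial state.
struct Status {
    int32_t envstat[4];
    int32_t envbase[4];
};

// Advances one envelope to its next stage and returns the distance to the
// new target level. envstat[idx] is read once and stored once.
int32_t advanceStage(Status &s, int idx, const int8_t *levels) {
    int32_t stat = s.envstat[idx];      // single load into a local
    s.envbase[idx] = levels[stat + 1];  // level the new segment starts from
    ++stat;
    s.envstat[idx] = stat;              // single store back
    return levels[stat + 1] - s.envbase[idx];
}
```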

Last edited by ih8registrations on 2003-07-27, 11:22. Edited 1 time in total.

Reply 4 of 23, by ih8registrations

Rank Oldbie

In InitTables from the MidiHandler_mt32 class, merging the following loops saves about 31k cycles. The first listing is the original pair of loops; the second is the merged version.

                for(dep=0;dep<=100;dep++) {
for(velt=0;velt<128;velt++) {
float fdep = ((float)dep / 100.0) * 256;
float fv = (velt - 64.0) / 64.0;
tempdep = 256.0 + (fdep * fv);
filveltable[velt][dep] = (int)tempdep;
//LOG_MSG("Filvel dep %d velt %d = %x", dep, velt, filveltable[velt][dep]);
}
}

float lfp, depf, finalval;
int depat, pval;

for(lf=0;lf<=100;lf++) {
// I believe the depth is cubed or something
lfp = pow(((float)lf / 100.0),3);
// Maybe its not
// lfp = (float)lf / 100.0;

for(depat=0;depat<=100;depat++) {
depf = ((float)depat - 50.0) / 50.0;
finalval = pow(2, lfp * depf * .25);
pval = (int)(finalval * 256);

lfoptable[lf][depat] = pval;

//LOG_MSG("lf %d depat %d pval %x", lf,depat,pval);

}
}

                float lfp, depf, finalval;
int depat, pval;
for(lf=0;lf<=100;lf++) {
// I believe the depth is cubed or something
lfp = pow(((float)lf / 100.0),3);
// Maybe its not
// lfp = (float)lf / 100.0;

for(depat=0;depat<=100;depat++) {
depf = ((float)depat - 50.0) / 50.0;
finalval = pow(2, lfp * depf * .25);
pval = (int)(finalval * 256);

lfoptable[lf][depat] = pval;

//LOG_MSG("lf %d depat %d pval %x", lf,depat,pval);

float fdep = ((float)lf / 100.0) * 256;
float fv = (depat - 64.0) / 64.0;
tempdep = 256.0 + (fdep * fv);
filveltable[depat][lf] = (int)tempdep;
//LOG_MSG("Filvel dep %d velt %d = %x", dep, velt, filveltable[velt][dep]);
}
for(velt=101;velt<128;velt++) {
float fdep = ((float)lf / 100.0) * 256;
float fv = (velt - 64.0) / 64.0;
tempdep = 256.0 + (fdep * fv);
filveltable[velt][lf] = (int)tempdep;
//LOG_MSG("Filvel dep %d velt %d = %x", dep, velt, filveltable[velt][dep]);
}
}

Merging the outer loop saves 100 cmp, inc and jmp instructions, as well as probably a mov, since there's probably enough going on to need reloading the counter. That's the 1k. The 30k comes from merging the two inner loops for 100 iterations, again saving a cmp, inc and jmp (probably not a mov) * 100 inner * 100 outer. If we lowball and say they all take only one cycle, which is possible if we ignore potential stalls and the like, then it's 3*100*100 + the outer 1k = 31k. There are several other outer loops in InitTables that can be merged, and some other tweaks, for about another 5k or so that I saw, but this is the biggest saving to be had. The cost is the four lines of duplicated code, but for 31k cycles I can live with that :)
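A quick way to convince yourself the merge doesn't change the table is to build it both ways and compare. The sketch below does that for filveltable only; the names filvelSplit, filvelFused, filvel and tablesMatch are made up for the check, and the lfoptable work is left out.

```cpp
#include <cstring>

static int filvelSplit[128][101];
static int filvelFused[128][101];

// Same formula as the posted loops, factored out for the comparison.
static int filvel(int velt, int dep) {
    float fdep = (static_cast<float>(dep) / 100.0f) * 256.0f;
    float fv = (velt - 64.0f) / 64.0f;
    return static_cast<int>(256.0f + fdep * fv);
}

// Original shape: a dedicated dep/velt loop nest.
void buildSplit() {
    for (int dep = 0; dep <= 100; ++dep)
        for (int velt = 0; velt < 128; ++velt)
            filvelSplit[velt][dep] = filvel(velt, dep);
}

// Merged shape: ride along in the lf/depat nest for indices 0..100,
// then finish velocities 101..127 in a short tail loop.
void buildFused() {
    for (int lf = 0; lf <= 100; ++lf) {
        for (int depat = 0; depat <= 100; ++depat) {
            // lfoptable[lf][depat] would be filled here in the real code.
            filvelFused[depat][lf] = filvel(depat, lf);
        }
        for (int velt = 101; velt < 128; ++velt)
            filvelFused[velt][lf] = filvel(velt, lf);
    }
}

// True when both constructions produce byte-identical tables.
bool tablesMatch() {
    buildSplit();
    buildFused();
    return std::memcmp(filvelSplit, filvelFused, sizeof(filvelSplit)) == 0;
}
```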

Reply 5 of 23, by ih8registrations

Rank Oldbie

I didn't look closely enough; here's nearly 7k easily saved right here:

                int period = 256;
float angdelt = (360 / (float)period) * (PI / 180);

float angval = 0;
for(int ang=0;ang<period;ang++) {

int halfang = (period / 2);
int quartang = (period / 4);
int angval = ang % quartang;
float tval = (float)angval / ((float)quartang);
if(ang<=quartang) sintable[ang] = (int)(tval * 256);
else if ((ang<=halfang) && (ang>quartang)) sintable[ang] = (int)((1.0-tval) * 256);
else if ((ang>halfang) && (ang<=(quartang+halfang))) sintable[ang] = (int)(tval * -256);
else if (ang>(quartang+halfang)) sintable[ang] = (int)((1.0-tval) * -256);
sintable[period/4] = 256;
sintable[period/2] = 0;
sintable[(period*3)/4] = -256;

//LOG_MSG("Lfo ang %d = value %d", ang, sintable[ang]);


sintable[period] *= 50;

}
// for(ang=0;ang<period;ang++) sintable[period] *= 50;
int velt, dep;
float tempdep;

for(velt=0;velt<128;velt++) {
veltkeytable[0][velt] = 256;
for(dep=1;dep<5;dep++) {
// if(dep>0) {
float ff = ((float)f) / (5 - dep) ;

tempdep = 256.0 - (ff);
veltkeytable[dep][velt] = (int)tempdep;
// Crap... parameters not right yet
//veltkeytable[dep][velt] = 256;
// } else {
// veltkeytable[dep][velt] = 256;
// }
}
}

Added elses: 3.5k. Removed the dep>0 check: 2.5k. Single-line loop: 768. There's easily another 2k that could be trimmed nearby, rounding the cycles saved up to 40k.
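To illustrate the "added elses" point: with an else-if chain, each angle stops testing as soon as one branch matches. The sketch below is a variation, not the posted code. It uses strict < comparisons so the quarter points come out right without the three fix-up assignments, and it leaves out the *50 scaling. kPeriod and buildTriangle are names I made up.

```cpp
constexpr int kPeriod = 256;

// Triangle approximation of a sine, scaled to +/-256.
void buildTriangle(int (&table)[kPeriod]) {
    const int quarter = kPeriod / 4;
    const int half = kPeriod / 2;
    for (int ang = 0; ang < kPeriod; ++ang) {
        // Position within the current quarter, in the range 0.0 to just under 1.0.
        float t = static_cast<float>(ang % quarter) / quarter;
        if (ang < quarter)
            table[ang] = static_cast<int>(t * 256);            // rising, 0 up to 256
        else if (ang < half)
            table[ang] = static_cast<int>((1.0f - t) * 256);   // falling, 256 down to 0
        else if (ang < half + quarter)
            table[ang] = static_cast<int>(t * -256);           // falling, 0 down to -256
        else
            table[ang] = static_cast<int>((1.0f - t) * -256);  // rising, -256 up to 0
    }
}
```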

Last edited by ih8registrations on 2003-07-27, 11:50. Edited 1 time in total.

Reply 6 of 23, by ih8registrations

Rank Oldbie

Changing the beginning of PlayMsg to what's below saves 10 cycles when chan>8.

        void PlayMsg(Bit32u msg) {
int chan = msg & 0xf;
isEnabled= true;
//if(chan!=0x9) {
// if(chan==12) return;
// chan = chan & 0x7;
//
//} else {
// chan = 8;
//}
//if (chan==0) return;
//int prechan = chan;
//if(code!=0xf0) LOG_MSG("Playing chan %d, code 0x%x note: 0x%x", chan, code, note);

chan = chantable[chan];
if(chan>8) return;
//LOG_MSG("Play msg on unreg chan: %d = %d", chan, msg & 0xf);
if(chan<0) {
//LOG_MSG("Play msg on unreg chan: %d = %d", chan, msg & 0xf);
return;

}
int h;
int code = msg & 0xf0;
int note = (msg & 0xff00) >> 8;
int velocity = (msg & 0xff0000) >> 16;

As well, for case 0xc0: of PlayMsg, remove 'if((chan>=0) && (chan<8))' as it's unnecessary.
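For anyone following along, here is a minimal sketch of the lookup-then-reject pattern at the top of PlayMsg. The table contents, the Decoded struct and the decode function are illustrative assumptions, not the emulator's real mapping or API.

```cpp
#include <cstdint>

// Hypothetical mapping from MIDI channel to internal part; -1 = unregistered.
// The values are illustrative only.
static const int kChanTable[16] = {-1, 0, 1, 2, 3, 4, 5, 6,
                                    7, 8, -1, -1, -1, -1, -1, -1};

struct Decoded {
    bool valid;
    int part, code, note, velocity;
};

// Look the channel up first and bail out before decoding anything else.
Decoded decode(uint32_t msg) {
    int part = kChanTable[msg & 0xf];
    if (part < 0 || part > 8) return {false, 0, 0, 0, 0};
    return {true, part,
            static_cast<int>(msg & 0xf0),
            static_cast<int>((msg >> 8) & 0xff),
            static_cast<int>((msg >> 16) & 0xff)};
}
```

Once the part number has passed this range check, later per-case range checks on it are redundant, which is the reason given for dropping the one in case 0xc0.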

Reply 8 of 23, by canadacow

Rank Member

Wow... thanks for all the updates. I'm having trouble keeping up. As for the pitch envelope, it needs those duplicate divs because one manages the attack form of the envelope while the other manages the decay form. Thanks again for your changes. I'm not too worried about the table generation. On my Celeron 1333MHz it takes about half a second to generate all the tables, with most of this being consumed by the table generation for the lowpass filter. The real area of concern is the main processing area, the getSample routine. It's in that subroutine where optimizations will be most valuable.

Reply 9 of 23, by ih8registrations

Rank Oldbie

OK, here goes:

INLINE short MidiChannel::getSample(short *lspecial, short *rspecial) {

int t, m, c, loc, pcm, partplay;
dpoly *tmppoly;

//if(!isRy) return 0;
//if ((this->channum<2) || (this->channum>8)) return 0;
//if(this->channum!=1) return 0;

if(isRy)
partplay = DRUMPOLY;
else partplay = DPOLY;

for (m=0;m<partplay;m++) {
Bit16s envval;
Bit16s ampval;
tmppoly = &notepoly[m];
if(tmppoly->isPlaying || tmppoly->isDecay) {
int ptemp[5];
memset(ptemp,0,sizeof(ptemp));
bool isDone = true;

for(t=0;t<4;t++) {
patchCache *tcache = &pcache[t];
dpoly::partialStatus *partCache = &tmppoly->pStatus[t];
if(isRy) tcache = &drumCache[tmppoly->pcmnum][t];

if((tcache->playPartial) && (!partCache->isDecayed)) {
isDone = false;
// Calculate TVA envelope
ampval = getAmpEnvelope(partCache,tmppoly,partCache->decaying[AMPENV]);
ampval = amptable[ampval];
int tmpvel = tmppoly->vel;
if(tcache->ampenvdir==1) tmpvel = 127-tmpvel;
ampval = (ampval * ampveltable[tmpvel][tcache->ampEnv.velosens]) >> 8;

// Calculate Pitch envelope
envval = getPitchEnvelope(partCache,tmppoly,partCache->decaying[PITCHENV]);
//if(envval<-50) envval=-50;
//if(envval>50) envval=50;
//envval += 50;
int pdep = penvtable[tcache->pitchEnv.depth][envval];

// Calculate LFO position
// LFO does not kick in until pitch envelope sustains
int lfoat;
if((tcache->lfodepth>0) && (partCache->pitchsustain)) {
if(partCache->lfopos>=tcache->lfoperiod)
partCache->lfopos = 0;
else partCache->lfopos++;

lfoat = (partCache->lfopos << 8) / tcache->lfoperiod;
lfoat = lfoptable[tcache->lfodepth][((sintable[lfoat]) >> 8)+50];
//LOG_MSG("lfodepth %d, lfoatr %d, lfoat %x period %d pos %d",tcache->lfodepth,lfoatr, lfoat, tcache->lfoperiod, tmppoly->lfopos);
} else lfoat = 0x100;

// Get waveform - either PCM or synthesized sawtooth or square
soundaddr *pOff = &partCache->partialOff;
int delta = 0x10000, noteval = partCache->noteval;
if (tcache->PCMPartial) {
						// PCM partial
if(tcache->rawPCM>53) {
if(tcache->rawPCM>=74) {
if (partCache->PCMDone) {
pOff->pcmabs =0;
partCache->PCMDone = false;
}
pcm = PCMReassign[tcache->rawPCM - 74];
} else pcm = PCMReassign[tcache->rawPCM - 54];
} else pcm = tcache->convPCM;

delta = wavtabler[pcm][noteval];

if (!partCache->PCMDone) {
int ra, rb, addr = PCM[pcm].addr;
if(delta<0x10000) {
// Linear sound interpolation
ra = romfile[addr + pOff->pcmoffs.pcmplace];
rb = romfile[addr + pOff->pcmoffs.pcmplace+1];
ptemp[t] = (ra + (((rb-ra) * pOff->pcmoffs.pcmoffset) >>16));
} else
ptemp[t] = romfile[addr + pOff->pcmoffs.pcmplace];

if ((pOff->pcmoffs.pcmplace) >=PCM[pcm].len) {
if(PCM[pcm].loop)
pOff->pcmabs = 0;
else partCache->PCMDone = true;
}
}
} else {
// Synthesis partial
int divis, ofs3, toff, wf;

toff = pOff->pcmoffs.pcmplace;
divis = divtable[noteval]>>15;

if(pOff->pcmoffs.pcmplace>=divis) pOff->pcmabs = (pOff->pcmoffs.pcmoffset % divis);

if(tcache->waveform == 0) {
// Square waveform. Made by combining two pregenerated bandlimited
// sawtooth waveforms
int divmark = divtable[noteval]>>8;

ofs3 = (toff + ((divmark*pulsetable[tcache->pulsewidth])>>16)) % (divis >> 1);

ptemp[t] = waveforms[0][noteval][toff % (divis >> 1)] + waveforms[1][noteval][ofs3];
} else {
// Sawtooth. Made by combining the full cosine and half cosine according
// to how the MT-32 does it. This is identical to the MT-32's operation
wf = 2;
if(toff >= sawtable[noteval][tcache->pulsewidth]) wf++;
ptemp[t] = waveforms[wf][noteval][toff];
}
ptemp[t] = getFiltEnvelope(ptemp[t],partCache,tmppoly,partCache->decaying[FILTENV]);
}
// Build delta for position of next sample
delta = (delta * finetable[tcache->fineshift])>>8;
delta = (delta * pdep)>>8;
delta = (delta * lfoat)>>8;

// Add calculated delta to our waveform offset
pOff->pcmabs+=delta;

// Put volume envelope over generated sample
ptemp[t] = (ptemp[t] * (int)ampval * (int)volume) >> 14;

for(int envnum=0;envnum<3;envnum++) partCache->envpos[envnum]++;
}
}
if(isDone) {
tmppoly->isPlaying = false;
tmppoly->isDecay = false;
}
// Post process partials and bring them together
int temps, s1, s2, i = 0;
*lspecial = *rspecial = 0;
for(int z=0;z<2;z++) {
if(z==0) {
temps = mt32ram.params.patch[patch].common.pstruct12;
s1=0;
s2=1;
} else {
temps = mt32ram.params.patch[patch].common.pstruct34;
s1=2;
s2=3;
}
if(!pcache[s1].playPartial) s1=4;
if(!pcache[s2].playPartial) s2=4;
//LOG_MSG("z %d ps %d, s1 %d s2 %d", z, temps, s1, s2);

temps = PartMixStruct[temps];

switch(temps) {
case 0:
// Standard sound mix
i+=ptemp[s1] + ptemp[s2];
break;
case 1:
// Ring modulation with sound mix
i+=(((ptemp[s1] * ptemp[s2])>>WGAMP) + ptemp[s1]);
break;
case 2:
// Ring modulation alone
i+=((ptemp[s1] * ptemp[s2])>>WGAMP);
break;
case 3:
// Stereo mixing. One partial to one channel, one to another.
*lspecial += ptemp[s1];
*rspecial += ptemp[s2];
break;
default:
i+=ptemp[s1] + ptemp[s2];
break;
}
}
if (!isRy) {
// Mix standard timbre
c += i;
} else {
c = 0;
// Drums have their special, built in panpot locations
*lspecial += ((i * drumPan[tmppoly->pcmnum][0]) >> 8);
*rspecial += ((i * drumPan[tmppoly->pcmnum][1]) >> 8);
}
//tmppoly->pcmoff.pcmabs +=tmppoly->pcmdelta;
}
}
return c;
}

/*
got rid of linefeeds used for whitespace, indentation suffices; easier to trace with more on a page
partplay = DPOLY made part of the if/else rather than setting it and then overriding it if isRy
moved int i, r init to where they are used
moved *lspecial = *rspecial = 0 to where they are used, same place as int i;
removed int x, shitguard, unused
removed Bit32u tmpoff, unused
removed bool playwav = true, unused
removed int v & v = volume, unused
init c to 0 moved to bottom of function into conditional
cleaned up calculate lfo position
removed unnecessary temp var pd
cleaned up pcm partial
cleaned up synthesis partial
*/

Again, I think you're misunderstanding my code change, or I'm not understanding what you're saying. My change still does the div for both cases; it just doesn't have two copies of the calculation. It's a space-saving optimization.

Next up to optimize for getSample are the functions it calls.

Reply 10 of 23, by ih8registrations

Rank Oldbie

Called by getSample.

INLINE Bit16s MidiChannel::getAmpEnvelope(dpoly::partialStatus *pStat, dpoly *poly, bool inDecay) {
Bit16s tc;
patchCache *tcache = &pcache[pStat->partNum];

if(inDecay) {
if(!pStat->isDecayed) {
if(pStat->envpos[AMPENV] >= pStat->envsize[AMPENV]) pStat->isDecayed = true;
tc = (pStat->envbase[AMPENV] + ((pStat->envdist[AMPENV] * pStat->envpos[AMPENV]) / pStat->envsize[AMPENV]));
} else tc = 0;
} else {
if(pStat->envstat[AMPENV]==4) {
tc = tcache->ampEnv.envlevel[3];
if(!tcache->sustain)
StartDecay(AMPENV, tc, pStat, poly);
} else {
if((pStat->envstat[AMPENV]==-1) || (pStat->envpos[AMPENV] >= pStat->envsize[AMPENV])) {
if(pStat->envstat[AMPENV]==-1)
pStat->envbase[AMPENV] = 0;
else pStat->envbase[AMPENV] = tcache->ampEnv.envlevel[pStat->envstat[AMPENV]];

pStat->envstat[AMPENV]++;
pStat->envpos[AMPENV] = 0;

if(pStat->envstat[AMPENV]==3)
pStat->envsize[AMPENV] = (decaytimetable[tcache->ampEnv.envtime[pStat->envstat[AMPENV]]] * fildeptable[tcache->ampEnv.envtkf][poly->freqnum]) >> 8;
else pStat->envsize[AMPENV] = (envtimetable[tcache->ampEnv.envtime[pStat->envstat[AMPENV]]] * fildeptable[tcache->ampEnv.envtkf][poly->freqnum]) >> 8;

//Spot for velocity time follow
//Just a wild guess. This is hard to measure.
pStat->envsize[AMPENV] = ((pStat->envsize[AMPENV] * veltkeytable[tcache->ampEnv.envvkf][poly->vel]) >> 8)+1;
pStat->envdist[AMPENV] = tcache->ampEnv.envlevel[pStat->envstat[AMPENV]] - pStat->envbase[AMPENV];
}
tc = (pStat->envbase[AMPENV] + ((pStat->envdist[AMPENV] * pStat->envpos[AMPENV]) / pStat->envsize[AMPENV]));
}
tc = (tc * (int)tcache->ampEnv.level) >> 7;
}
pStat->prevlevel[AMPENV] = tc;

//Bias level crap stuff now
int bias, max;
for(int bt=0;bt<2;bt++) {
if(tcache->ampblevel[bt]!=0) {
bias = tcache->ampbias[bt];
max = 0;
if(tcache->ampdir[bt]==0) {
// < Bias
if(poly->freqnum < bias) {
max = bias - 33;
bias -= poly->freqnum;
}
} else {
// > Bias
if(poly->freqnum > bias) {
max = 96 - bias;
bias = poly->freqnum - bias;
}
}
if(max!=0) {
bias = (((bias << 8) / max) * tcache->ampblevel[bt]) >> 8;
if(bias>12) bias=12;
                                //LOG_MSG("bias %d freq %d pos %d lev %d dir %d", bias,poly->freqnum,pos,tcache->ampblevel[bt],tcache->ampdir[bt]);
tc = (biastable[bias] * tc) >> 8;
}
}
}
return tc;
}

Reply 11 of 23, by ih8registrations

Rank Oldbie

Called by getSample.

INLINE Bit16s MidiChannel::getFiltEnvelope(Bit16s wg, dpoly::partialStatus *pStat, dpoly *poly, bool inDecay) {
// // unused
// Bit32u sampoff;
// float out, out2;
// float specialfreq,conv;
// int usefreq;
// realfol;
// envCache *myenv = &pcache[pStat->partNum].fEnvCache;

patchCache *tcache = &pcache[pStat->partNum];
float *hist = pStat->history;
int reshigh, filt, cutoff, depth;
int keyfollow = pStat->filtval;
int realfollow = pStat->realval;
int fr = poly->freqnum;
int wf = tcache->waveform;

if(inDecay) {
if(pStat->isDecayed || (pStat->envpos[FILTENV] >= pStat->envsize[FILTENV]))
reshigh = 0;
else reshigh = (pStat->envbase[FILTENV] + ((pStat->envdist[FILTENV] * pStat->envpos[FILTENV]) / pStat->envsize[FILTENV]));
} else {
if(pStat->envstat[FILTENV]==4) {
reshigh = tcache->filtEnv.envlevel[3];
if(!tcache->sustain)
StartDecay(FILTENV, reshigh, pStat, poly);
} else {
if((pStat->envstat[FILTENV]==-1) || (pStat->envpos[FILTENV] >= pStat->envsize[FILTENV])) {
if(pStat->envstat[FILTENV]==-1)
pStat->envbase[FILTENV] = 0;
else pStat->envbase[FILTENV] = tcache->filtEnv.envlevel[pStat->envstat[FILTENV]];

pStat->envstat[FILTENV]++;
pStat->envpos[FILTENV] = 0;

if(pStat->envstat[FILTENV]==3)
pStat->envsize[FILTENV] = (decaytimetable[tcache->filtEnv.envtime[pStat->envstat[FILTENV]]] * fildeptable[tcache->filtEnv.envtkf][poly->freqnum]) >> 8;
else pStat->envsize[FILTENV] = (envtimetable[tcache->filtEnv.envtime[pStat->envstat[FILTENV]]] * fildeptable[tcache->filtEnv.envtkf][poly->freqnum]) >> 8;

pStat->envsize[FILTENV]++;
pStat->envdist[FILTENV] = tcache->filtEnv.envlevel[pStat->envstat[FILTENV]] - pStat->envbase[FILTENV];
}
reshigh = (pStat->envbase[FILTENV] + ((pStat->envdist[FILTENV] * pStat->envpos[FILTENV]) / pStat->envsize[FILTENV]));
}
pStat->prevlevel[FILTENV] = reshigh;
}
cutoff = (tcache->filtEnv.cutoff);
depth = (tcache->filtEnv.envdepth);

//int sensedep = (depth * 127-tcache->filtEnv.envsense) >> 7;
depth = (depth * filveltable[poly->vel][tcache->filtEnv.envsense]) >> 8;

int max, bias = tcache->tvfbias;
if(bias!=0) {
//LOG_MSG("Cutoff before %d", cutoff);
if(tcache->tvfdir == 0) {
if(fr < bias) {
max = bias;
if(max!=0) {
                                        bias = ((((bias - fr) << 16) / max) * (tcache->tvfblevel))>>16;
cutoff = (cutoff * fbiastable[bias+7]) >> 8;
}
}
} else {
// > Bias
if(fr > bias) {
max = 108-bias;
if(max!=0) {
bias = ((((fr - bias) << 8) / max) * (tcache->tvfblevel))>>8;
cutoff = (cutoff * fbiastable[bias+7]) >> 8;
}
}
}
//LOG_MSG("Cutoff after %d", cutoff);
}
reshigh = (reshigh * depth)>>7;
filt = ((cutoff + reshigh) * keyfollow) / realfollow;
filt = (filt * fildeptable[tcache->tvfdepth][fr]) >> 8;

if(filt>200) filt = 200;
int usefilt = filttable[wf][fr][filt];

/*
if(usefilt==0) {
memset(hist,0,sizeof(hist));
return 0;
}*/

// Lowpass

return (int)iir_filter((float)wg,hist,filtcoeff[usefilt][tcache->filtEnv.resonance]);

/*
int res = tcache->filtEnv.resonance;

float in = (float)wg/32767.0;
float res_lp = (float)(res) / 31.0;
res_lp = res_lp * res_lp;
float cut_lp = usefilt;
float n1, n2, n3, n4, fb_lp,fb_lp2;

n1 = hist[0];
n2 = hist[1];

fb_lp = res_lp+res_lp/(1-cut_lp);
n1=n1+cut_lp*(in-n1+fb_lp*(n1-n2));
n2=n2+cut_lp*(n1-n2);

hist[0] = n1;
hist[1] = n2;

return (int)(n2*32767.0);*/
}

Reply 13 of 23, by ih8registrations

Rank Oldbie

from struct dpoly:

	struct partialStatus {
// Keyfollowed note values
int noteval;

// Keyfollowed filter values
int realval;
int filtval;

Bit32s envpos[4];
Bit32s envstat[4];
Bit32s envbase[4];
Bit32s envdist[4];
Bit32s envsize[4];

Bit32u lfopos;
soundaddr partialOff;
// soundaddr wgOff;

bool decaying[4];
// bool notdecayed[4];
// Bit32u decay[4];
Bit32s prevlevel[4];
bool isDecayed;
bool PCMDone;
float history[32];
// float pastfilt;
bool pitchsustain;

int partNum;
} pStatus[4];

Commented out the unused variables; this avoids copying up to 5k around in getSample.

Another 8k of copying could be avoided if pStatus were pulled out of dpoly, with dpoly holding a pointer to an external pStatus instead.

Last edited by ih8registrations on 2003-07-30, 16:39. Edited 1 time in total.

Reply 14 of 23, by ih8registrations

Rank Oldbie

Surprised I missed this one at first:

			for(t=0;t<4;t++) { 
patchCache *tcache = &pcache[t];
dpoly::partialStatus *partCache = &tmppoly->pStatus[t];
if(isRy) tcache = &drumCache[tmppoly->pcmnum][t];

patchCache is a big structure and the code does a default assignment of it. If the instrument is a drum, it does another load of this big structure. Ugh.

			for(t=0;t<4;t++) { 
patchCache *tcache;
dpoly::partialStatus *partCache = &tmppoly->pStatus[t];

if(isRy) tcache = &drumCache[tmppoly->pcmnum][t];
else tcache = &pcache[t];

This is an optimization for when the playPartial && !isDecayed check is not satisfied, at the cost(?) of doing the isDecayed check by referencing tmppoly->pStatus[t] directly.

			for(t=0;t<4;t++) { 
patchCache *tcache;

if(isRy) tcache = &drumCache[tmppoly->pcmnum][t];
else tcache = &pcache[t];

if((tcache->playPartial) && (!tmppoly->pStatus[t].isDecayed)) {
dpoly::partialStatus *partCache = &tmppoly->pStatus[t];

The same could be done for tcache.

			for(t=0;t<4;t++) { 
if((isRy ? drumCache[tmppoly->pcmnum][t].playPartial : pcache[t].playPartial) && (!tmppoly->pStatus[t].isDecayed)) {
patchCache *tcache;
dpoly::partialStatus *partCache = &tmppoly->pStatus[t];

if(isRy) tcache = &drumCache[tmppoly->pcmnum][t];
else tcache = &pcache[t];

Last edited by ih8registrations on 2003-07-30, 18:21. Edited 1 time in total.

Reply 15 of 23, by canadacow

Rank Member

This doesn't really work as an optimization. Look again at this code:

			for(t=0;t<4;t++) { 
patchCache *tcache = &pcache[t];
dpoly::partialStatus *partCache = &tmppoly->pStatus[t];
if(isRy) tcache = &drumCache[tmppoly->pcmnum][t];

There are no memory moves here. tcache and partCache are pointer variables, not the actual structures in memory. As such, no memory is copied. The structures could be 1 byte in size or 256MB in size, and this code would execute equally fast. If I used actual structure variables rather than pointer variables, such a consideration would be an optimization. But again, these are pointers.
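A tiny sketch of that point, with made-up types (Small, Big, pcacheDemo, drumCacheDemo and pickCache are illustrative names only): selecting between two caches through a pointer only chooses an address, regardless of how large the pointed-to structure is.

```cpp
#include <cstddef>

struct Small { char b; };
struct Big { char bytes[4096]; };

// On mainstream platforms a pointer to a tiny struct and a pointer to a
// large one occupy the same number of bytes.
static_assert(sizeof(Small *) == sizeof(Big *), "struct pointers share one size");

static Big pcacheDemo[2];
static Big drumCacheDemo[2];

// Same pattern as the start of the getSample loop: a default pointer
// assignment, optionally overridden. No Big object is ever copied.
const Big *pickCache(bool isRy, int t) {
    const Big *tcache = &pcacheDemo[t];
    if (isRy) tcache = &drumCacheDemo[t];
    return tcache;
}
```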

Have you read Michael Abrash's Zen of Code Optimization? In it, he goes through the ways one could "count cycles" and so forth. His ultimate conclusion, though, is that counting cycles can only go so far. The best form of optimization that Abrash suggests is a complete rethinking of the algorithm. A good example was the change from the envelope caches to the envelope timer in my code. Not only was it more precise, it's also a good deal faster. This is the kind of code optimization I'm looking for. If there is a faster, more precise way of lowpass filtering that matches the MT-32's output, that's what I need. I need an efficient reverb algorithm. I feel that I could better generate pulse width modified square waves without combining two bandlimited square waves. These are the places where the greatest speed benefit will be seen.

Reply 16 of 23, by ih8registrations

Rank Oldbie

Well, that would explain not seeing it before:)

btw, then I would assume the tcache variable is only for code readability since it's not doing a copy & it's pointing to the same address as pcache.

Aaand, yes, I know all about algorithm vs cycle count. Speaking of focusing on the main problem areas, have you been profiling your code?

Last edited by ih8registrations on 2003-07-30, 19:51. Edited 1 time in total.

Reply 17 of 23, by canadacow

Rank Member

I have profiled the code, but I've found that getting reliable, clear results is very difficult. This is because the music varies in its demand on certain parts of the emulator. PCM samples are easier to play than the analogue synthesis. Likewise, sawtooths are easier to synthesize than square waves. As such, music that's biased in one of these areas will skew results.

Reply 18 of 23, by ih8registrations

Rank Oldbie

Sounds like individual test cases are needed for each code path. One way to do that which comes to mind is to use a MIDI sequencer. The MIDI sequencers I've played with all allowed you to turn off channels. Combine that with one or a few MIDI files that use the various types (PCM, synth, drums, xyz effects) and you'll have playback that isolates them.
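A complementary in-code approach, which is my own assumption rather than something proposed in the thread, is to time each path in isolation with a small harness. In the sketch below, pcmPath and synthPath are trivial placeholders standing in for the real PCM and synthesis code, and timeNs is a hypothetical helper built on std::chrono.

```cpp
#include <chrono>
#include <cstdint>

// Runs fn `iterations` times and returns the elapsed wall time in nanoseconds.
template <typename F>
int64_t timeNs(F &&fn, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// volatile keeps the compiler from optimizing the placeholder work away.
static volatile int sink = 0;

// Placeholder for a cheap path (for example PCM playback).
void pcmPath() { sink = sink + 1; }

// Placeholder for a more expensive path (for example square-wave synthesis).
void synthPath() {
    for (int i = 0; i < 8; ++i) sink = sink + i;
}
```

Calling timeNs(pcmPath, N) and timeNs(synthPath, N) separately gives per-path numbers that don't depend on what a particular piece of music happens to use.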

Last edited by ih8registrations on 2003-07-30, 22:24. Edited 1 time in total.