VOGONS


Dynamic core optimization

Reply 20 of 26, by awgamer

Rank Oldbie

Thar she blows, 64 bit decoder.

For some reason this doesn't work:
// cache_addq(0xc48300000000e851|((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4)<<16);
By my understanding it should work, and it gets past gcc without complaint, but dosbox crashes out. Saving ((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4) to a temp var and then passing the temp var doesn't work either. Everything else is happy being passed to the 64 bit cache_adds, but not this guy. Other than that, everything works. I imagine it speeds things up in some cases, but it seems some of the swapping to 64 bit slows things down. I blew through writing this up without checking, so to find out I'll have to go through it change by change to verify which ones give a speed improvement.

Attachments

  • m4.diff (165.87 KiB)

Reply 22 of 26, by awgamer

Rank Oldbie

I tried u but no change, so I did some logging:

Bit32u tmp=((Bit32u)&mem_readb_checked_dcx86) - (Bit32u)cache.pos-4;
LOG_MSG("32bit tmp = %x",tmp);
Bit64u tmp2=0xc48300000000e851|(Bit64u)tmp<<16;
LOG_MSG("64bit tmp2 = %llx",tmp2);
cache_addq(0xc48300000000e851|(Bit64u)tmp<<16);

INLINE void cache_addq(Bit64u val) {
    LOG_MSG("64bit val = %llx",val);
    *(Bit64u*)cache.pos=val;
    cache.pos+=8;
}

That outputted:

32bit tmp = f63925d6
64bit tmp2 = c483f63925d6e851
64bit val = c483f63925d6e851

i.e. it's working as it should and the value is what it should be, except.. it then crashes. It uses up the same cache space as the non 64 bit adds, ends at the same cache pointer, and its input value is the same. The 64 bit moves also work when the input isn't something like ((Bit32u)&mem_readb_checked_dcx86) - (Bit32u)cache.pos-4, even though, again, after processing it's the same result. Head scratcher. Edit: for good measure I logged *(Bit64u*)cache.pos, and it matches val.
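
For readers following along, it can help to see which bytes that logged value turns into once it is stored. The sketch below is only an illustration: emit_q and rel32_field are hypothetical stand-ins (not DOSBox functions), and the bytes are written out least significant first by hand to mirror what a 64 bit store does on x86.

```c
#include <stdint.h>

/* Hypothetical stand-in for cache_addq: write val least significant byte
   first (the x86 memory layout) and return the advanced position. */
static uint8_t *emit_q(uint8_t *pos, uint64_t val) {
    for (int i = 0; i < 8; i++)
        pos[i] = (uint8_t)(val >> (8 * i));
    return pos + 8;
}

/* Read back the 32 bit field that occupies bytes 2..5 of the 8 byte group. */
static uint32_t rel32_field(const uint8_t *group) {
    return (uint32_t)group[2]
         | ((uint32_t)group[3] << 8)
         | ((uint32_t)group[4] << 16)
         | ((uint32_t)group[5] << 24);
}
```

Fed the logged value 0xc483f63925d6e851, this produces the bytes 51 e8 d6 25 39 f6 83 c4: two leading bytes ending in 0xe8 (the x86 near call opcode), the 32 bit field f63925d6, then 83 c4. So the log can look right while the meaning of the field still depends on where in memory the group starts.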

Reply 23 of 26, by M-HT

Rank Newbie
awgamer wrote:
Thar she blows, 64 bit decoder. […]

The problem is cache.pos. When you use "cache_addd((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4);", then cache.pos refers to the position after byte 0xe8. When you use cache_addq then cache.pos refers to another position - you need to change "-4" to something else.
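
A small sketch of the arithmetic behind this, under stated assumptions: addresses are modelled as plain 32 bit integers, the function names are hypothetical, and it is assumed that in the non-64 bit version the two leading bytes (ending in 0xe8) were emitted before the cache_addd. The displacement of a 0xe8 call is relative to the address of the instruction following the call, which is what the correction has to reproduce.

```c
#include <stdint.h>

/* rel32 of an x86 near call: target minus the address right after the
   4 byte displacement that follows the 0xe8 opcode. */
static uint32_t call_rel32(uint32_t target, uint32_t addr_after_e8) {
    return target - (addr_after_e8 + 4);
}

/* Assumed original sequence: pos already points just past the 0xe8 byte. */
static uint32_t rel32_pos_after_e8(uint32_t target, uint32_t pos) {
    return target - pos - 4;
}

/* Merged 8 byte store: pos points at the first byte of the group, so the
   0xe8 byte is at pos+1 and the displacement starts at pos+2. */
static uint32_t rel32_pos_at_group_start(uint32_t target, uint32_t pos) {
    return target - pos - 6;
}
```

With a group starting at address s, rel32_pos_after_e8(target, s + 2) and rel32_pos_at_group_start(target, s) give the same displacement. Keeping "-4" while pos still points at s yields a value that is off by 2, so the call lands two bytes past the intended target.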

Reply 24 of 26, by awgamer

Rank Oldbie

That's it. With cache_addd the write starts right after the 0xe8 byte, so -4 is the right correction. cache_addq starts its 8 bytes a word before that spot, so it needs -6 to refer to the same point. To get everything on one line: it didn't like Bit64u for the math, so I used Bit32u for that and then Bit64u for the shift.
cache_addq(0xc48300000000e851|(Bit64u)((Bit32u)&mem_readb_checked_dcx86 - (Bit32u)cache.pos-6)<<16);
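
One plausible reason the Bit64u math misbehaves (an assumption, not something confirmed in the thread): when the target lies below cache.pos, a 64 bit subtraction wraps to 0xffffffffxxxxxxxx, and after the shift those extra set bits are ORed over the top two bytes of the template. A minimal sketch with hypothetical helper names and made-up addresses:

```c
#include <stdint.h>

#define CALL_TEMPLATE UINT64_C(0xc48300000000e851)

/* Difference taken in 64 bits: a negative result keeps its upper 32 bits
   set, and << 16 moves some of them onto the template's top two bytes. */
static uint64_t pack_wide(uint64_t target, uint64_t pos) {
    return CALL_TEMPLATE | ((target - pos - 6) << 16);
}

/* Difference truncated to 32 bits first, then widened only for the shift,
   so it can touch nothing but bytes 2..5 of the group. */
static uint64_t pack_narrow(uint32_t target, uint32_t pos) {
    return CALL_TEMPLATE | ((uint64_t)(uint32_t)(target - pos - 6) << 16);
}
```

For target 0x1000 and pos 0x5000, pack_narrow gives 0xc483ffffbffae851, while pack_wide gives 0xffffffffbffae851: the trailing 83 c4 bytes of the template are overwritten with ff ff.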

The next use needs a move to -8, though I thought it would need to stay at -4 since it was added to the back end of the addq, so I'm still not fully following along. Little/big endian business? Anyway, it's running. The doom bench still seems slower to me, so with testing some changes would be dropped to get the best speed, but at least we can go this route if it suits.

Attachments

  • decoder.diff (27.24 KiB)

Reply 25 of 26, by awgamer

Rank Oldbie

Incorporated some of the gen create functions, which allows more cache_adds to be merged.

Attachments

  • decoder2.diff (31.11 KiB)

Reply 26 of 26, by awgamer

Rank Oldbie

Side note, at the end of CreateCacheBlock there's a conditional to add code for debugging but..
..
    goto finish_block;
#if (C_DEBUG)
    dyn_set_eip_last();
    dyn_reduce_cycles();
    dyn_save_critical_regs();
    gen_return(BR_OpcodeFull);
    dyn_closeblock();
    goto finish_block;
#endif
finish_block:
..

How is this ever going to be executed? The prior goto always skips over it and there's no jump label to enter at. Either this is an oops or there's some magic coding fu that I'm unaware of.
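
The shape of the question can be reproduced in isolation. This is a reduced sketch with hypothetical names (run_block, other_exit, debug_hits), not the DOSBox source: statements placed after an unconditional goto, with no label of their own, have no path leading into them.

```c
/* Returns how many times the guarded statement ran. */
static int run_block(int take_other_exit) {
    int debug_hits = 0;
    if (take_other_exit)
        goto other_exit;
    goto finish_block;
other_exit:
    goto finish_block;
    /* No label here: nothing can jump in, and the goto above never
       falls through, so this statement cannot be reached. */
    debug_hits++;
    goto finish_block;
finish_block:
    return debug_hits;
}
```

Whichever path is taken, the counter stays at zero, which matches the reading that the block can only run if some label or fall-through path exists outside the quoted excerpt.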