Dynamic core optimization


Re: Dynamic core optimization

Postby awgamer » 2018-8-19 @ 23:38

Thar she blows, 64 bit decoder.

For some reason this doesn't work:
// cache_addq(0xc48300000000e851|((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4)<<16);
By my understanding it should, and it gets past gcc without complaint, but DOSBox crashes. Saving ((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4) to a temp var and then passing the temp var doesn't work either. Everything else is happy being passed to the 64-bit cache_adds, just not this one. Other than that, everything works. I imagine it speeds things up in some cases, but some of the swaps to 64 bit seem to slow things down. I blew through writing this without benchmarking each step, so I'll have to go through it change by change to verify which ones give a speed improvement.
Attachments
m4.diff (165.87 KiB)
awgamer
Oldbie
 
Posts: 567
Joined: 2014-7-26 @ 07:42

Re: Dynamic core optimization

Postby Qbix » 2018-8-21 @ 09:47

Wouldn't long numbers need a specific suffix?
Qbix
DOSBox Author
 
Posts: 10893
Joined: 2002-11-27 @ 14:50
Location: Fryslan

Re: Dynamic core optimization

Postby awgamer » 2018-8-21 @ 11:46

I tried the u suffix but there was no change, so I did some logging:

Bit32u tmp=((Bit32u)&mem_readb_checked_dcx86) - (Bit32u)cache.pos-4;
LOG_MSG("32bit tmp = %x",tmp);
Bit64u tmp2=0xc48300000000e851|(Bit64u)tmp<<16;
LOG_MSG("64bit tmp2 = %llx",tmp2);
cache_addq(0xc48300000000e851|(Bit64u)tmp<<16);

INLINE void cache_addq(Bit64u val) {
LOG_MSG("64bit val = %llx",val);
*(Bit64u*)cache.pos=val;
cache.pos+=8;
}

That outputted:

32bit tmp = f63925d6
64bit tmp2 = c483f63925d6e851
64bit val = c483f63925d6e851

i.e. it works as it should and the value is what it should be, except that it then crashes. It uses up the same cache space as the non-64-bit adds, leaves the cache pointer at the same place, and its input value is the same. The 64-bit writes also work elsewhere, when the input isn't of the form (Bit32u)&mem_readb_checked_dcx86 - (Bit32u)cache.pos-4, even though, again, the assembled result looks the same. Head scratcher. Edit: for good measure I logged *(Bit64u*)cache.pos, and it matches val.
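For readers following the logged values, here is a minimal C++ sketch of how the 64-bit word is assembled. The names kCallTemplate and splice_rel32 are hypothetical and not from the DOSBox source. Reading the constant in little-endian byte order gives 51 e8 00 00 00 00 83 c4, which decodes as push ecx, call rel32 with a zeroed displacement, and the first two bytes of add esp, imm8.

```cpp
#include <cassert>
#include <cstdint>

// Opcode template as it lands in memory on a little-endian host:
// 51 | e8 00 00 00 00 | 83 c4
// push ecx | call rel32 (displacement zeroed) | add esp, imm8 (imm8 follows later)
constexpr std::uint64_t kCallTemplate = 0xc48300000000e851ULL;

// Splice a 32-bit displacement into bytes 2..5 of the template.
// The cast binds tighter than <<, and << binds tighter than |.
constexpr std::uint64_t splice_rel32(std::uint32_t rel32) {
    return kCallTemplate | (static_cast<std::uint64_t>(rel32) << 16);
}

// An unsuffixed hex literal this large already has a 64-bit unsigned type,
// so a missing suffix is not what breaks the expression.
static_assert(sizeof(0xc48300000000e851) == 8, "literal is 64 bits wide");

// Reproduces the values from the log output above.
static_assert(splice_rel32(0xf63925d6u) == 0xc483f63925d6e851ULL,
              "matches the logged 64-bit value");
```

The value itself is assembled correctly, which matches the log; whether the displacement inside it is the right one is a separate question.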

Re: Dynamic core optimization

Postby M-HT » 2018-8-21 @ 14:09

awgamer wrote:Thar she blows, 64 bit decoder.

For some reason this doesn't work:
// cache_addq(0xc48300000000e851|((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4)<<16);
By my understanding it should, and it gets past gcc without complaint, but DOSBox crashes. Saving ((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4) to a temp var and then passing the temp var doesn't work either. Everything else is happy being passed to the 64-bit cache_adds, just not this one. Other than that, everything works. I imagine it speeds things up in some cases, but some of the swaps to 64 bit seem to slow things down. I blew through writing this without benchmarking each step, so I'll have to go through it change by change to verify which ones give a speed improvement.

The problem is cache.pos. When you use "cache_addd((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4);", cache.pos refers to the position just after the 0xe8 byte. When you use cache_addq, cache.pos refers to a different position, so you need to change "-4" to something else.
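The point can be checked with a small stand-alone C++ sketch. Everything here is a hypothetical stand-in for the DOSBox code cache: the Cache struct and add_b/add_d/add_q only mimic cache_addb/cache_addd/cache_addq, and buffer offsets replace real addresses. An x86 call rel32 displacement is measured from the end of the call instruction, so the constant to subtract is the distance from the write cursor to that end at the moment the displacement is computed. A little-endian host is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the code cache: a byte buffer plus a write cursor.
struct Cache {
    std::uint8_t buf[16] = {};
    std::uint32_t pos = 0;   // offset of the next byte to emit
};

void add_b(Cache& c, std::uint8_t v)  { c.buf[c.pos] = v; c.pos += 1; }
void add_d(Cache& c, std::uint32_t v) { std::memcpy(c.buf + c.pos, &v, 4); c.pos += 4; }
void add_q(Cache& c, std::uint64_t v) { std::memcpy(c.buf + c.pos, &v, 8); c.pos += 8; }

// Byte-wise emission: the cursor is already past 0xe8 when rel32 is computed,
// so the call instruction ends 4 bytes past the cursor.
void emit_bytewise(Cache& c, std::uint32_t target) {
    add_b(c, 0x51);                   // push ecx
    add_b(c, 0xe8);                   // call rel32 opcode
    add_d(c, target - c.pos - 4);     // displacement
    add_b(c, 0x83);                   // add esp, imm8 (imm8 emitted elsewhere)
    add_b(c, 0xc4);
}

// Merged emission: the cursor still points at the 0x51 byte, two bytes earlier,
// so the call instruction ends 6 bytes past the cursor.
void emit_merged(Cache& c, std::uint32_t target) {
    std::uint32_t rel = target - c.pos - 6;
    add_q(c, 0xc48300000000e851ULL | (static_cast<std::uint64_t>(rel) << 16));
}
```

Starting both emitters from the same cursor and target should leave identical bytes in the buffer, which is a convenient way to check any merged write against the byte-wise sequence it replaces.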
M-HT
Newbie
 
Posts: 67
Joined: 2008-9-01 @ 12:55
Location: Bratislava

Re: Dynamic core optimization

Postby awgamer » 2018-8-21 @ 16:41

That's it. With cache_addd, cache.pos is already just past the 0xe8 byte, so the displacement ends 4 bytes on, hence -4. With cache_addq the 8-byte write starts a word earlier (at the 0x51), so the displacement ends 6 bytes on and it needs -6. To get everything on one line: it didn't like Bit64u for the math, so I do the math in Bit32u and then cast to Bit64u for the shift.
cache_addq(0xc48300000000e851|(Bit64u)((Bit32u)&mem_readb_checked_dcx86 - (Bit32u)cache.pos-6)<<16);
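One plausible reason the 64-bit arithmetic misbehaved, sketched in C++ with plain integers standing in for the addresses: when the target lies below the cache position, a subtraction done in 64 bits wraps to a value with all upper 32 bits set, and after the shift those bits overwrite the 0x83 0xc4 bytes of the template. This is an assumption about the build (32-bit addresses, backward call), and splice_wide and splice_narrow are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kTemplate = 0xc48300000000e851ULL;

// Subtraction in 64 bits: a backward call wraps to 0xffffffffXXXXXXXX,
// and the shifted high bits clobber the top two bytes of the template.
constexpr std::uint64_t splice_wide(std::uint64_t target, std::uint64_t pos) {
    return kTemplate | ((target - pos - 6) << 16);
}

// Subtraction in 32 bits, widened afterwards: only bytes 2..5 are touched.
constexpr std::uint64_t splice_narrow(std::uint32_t target, std::uint32_t pos) {
    return kTemplate | (static_cast<std::uint64_t>(target - pos - 6) << 16);
}

// Backward call (target below pos): the two versions disagree in the top bytes.
static_assert((splice_narrow(0x1000u, 0x2000u) >> 48) == 0xc483, "opcode bytes intact");
static_assert((splice_wide(0x1000u, 0x2000u) >> 48) == 0xffff, "opcode bytes clobbered");
```

For a forward call the two versions agree, so a bug of this kind would only show up for targets located before the cache.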

The next use needs -8, though I thought it would stay at -4 since the displacement was added at the back end of the addq, so I'm still not following along fully. Little/big endian business? Anyway, it's running. The Doom benchmark still seems slower to me, so after testing, changes would be dropped wherever they cost speed, but at least we can go this route if it suits.
Attachments
decoder.diff (27.24 KiB)

Re: Dynamic core optimization

Postby awgamer » 2018-8-21 @ 21:44

Incorporated some of the gen create functions, which allows merging more cache_adds.
Attachments
decoder2.diff (31.11 KiB)

Re: Dynamic core optimization

Postby awgamer » 2018-8-23 @ 16:30

Side note: at the end of CreateCacheBlock there's a conditional that adds code for debugging, but..
..
goto finish_block;
#if (C_DEBUG)
dyn_set_eip_last();
dyn_reduce_cycles();
dyn_save_critical_regs();
gen_return(BR_OpcodeFull);
dyn_closeblock();
goto finish_block;
#endif
finish_block:
..

How is this ever going to be executed? The preceding goto always skips over it and there's no label to jump in at. Either this is an oops or there's some coding magic that I'm unaware of.
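The control flow in question can be reduced to a few lines of C++. This is only an illustration with made-up names (run_block, debug_hits), not the actual CreateCacheBlock code: statements that follow an unconditional goto and carry no label of their own cannot be reached, whether or not a preprocessor conditional compiles them in.

```cpp
#include <cassert>

// Returns how many times the unlabeled stanza ran. Names are illustrative only.
int run_block(bool illegal) {
    int debug_hits = 0;
    if (illegal) goto illegalopcode;
    goto finish_block;
illegalopcode:
    goto finish_block;
    // Stands in for the stanza inside the debug conditional: it follows an
    // unconditional goto and has no label, so control never arrives here.
    ++debug_hits;
    goto finish_block;
finish_block:
    return debug_hits;
}
```

Compilers often flag such a stanza with an unreachable-code warning when that diagnostic is enabled, which is one way to confirm the suspicion.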
