Dynamic core optimization

Reply 20 of 26, by awgamer

Posted on 2018-08-19, 23:38

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

Thar she blows, 64 bit decoder.

For some reason this doesn't work:
// cache_addq(0xc48300000000e851|((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4)<<16);
By my understanding it should work, and it gets past gcc without complaint, but DOSBox crashes. Saving ((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4) to a temp variable and passing that doesn't work either. Everything else is happy being passed to the 64-bit cache_adds, but not this one. Other than that, everything works. I imagine it speeds things up in some cases, but some of the swaps to 64-bit seem to slow things down. I blew through writing this without benchmarking as I went, so I'll have to go through it change by change to verify which ones give a speed improvement.

Reply 21 of 26, by Qbix

Posted on 2018-08-21, 09:47

Rank: DOSBox Author
Posts: 11323
Joined: 2002-11-27, 14:50
Location: Fryslan

Wouldn't long numbers need a specific suffix?
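
On the suffix question: in C++11 and later, an unsuffixed hexadecimal literal takes the first type out of int, unsigned int, long, unsigned long, long long and unsigned long long that can hold its value, so a constant of this size is not truncated when the suffix is left off. The sketch below checks that at compile time. It assumes a C++11 compiler (older dialects rely on compiler extensions for long long), and the function name is made up for illustration.

```cpp
#include <cstdint>

// Compile-time check: the unsuffixed literal already has a 64-bit type.
static_assert(sizeof(0xc48300000000e851) == 8,
              "unsuffixed hex literal should be 64 bits wide");

// Hypothetical helper: compares the unsuffixed and ULL-suffixed spellings.
bool literal_matches_suffixed() {
    const std::uint64_t plain = 0xc48300000000e851;
    const std::uint64_t suffixed = 0xc48300000000e851ULL;
    return plain == suffixed;
}
```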


Reply 22 of 26, by awgamer

Posted on 2018-08-21, 11:46

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

I tried the u suffix but there was no change, so I did some logging:

Bit32u tmp=((Bit32u)&mem_readb_checked_dcx86) - (Bit32u)cache.pos-4;
LOG_MSG("32bit tmp = %x",tmp);
Bit64u tmp2=0xc48300000000e851|(Bit64u)tmp<<16;
LOG_MSG("64bit tmp2 = %llx",tmp2);
cache_addq(0xc48300000000e851|(Bit64u)tmp<<16);

INLINE void cache_addq(Bit64u val) {
LOG_MSG("64bit val = %llx",val);
*(Bit64u*)cache.pos=val;
cache.pos+=8;
}

That outputted:

32bit tmp = f63925d6
64bit tmp2 = c483f63925d6e851
64bit val = c483f63925d6e851

i.e. it's working as it should and the value is what it should be, except... it then crashes. It uses the same amount of cache space as the non-64-bit adds, ends up at the same cache pointer, and gets the same input value. The 64-bit moves also work when the input isn't something like ((Bit32u)&mem_readb_checked_dcx86) - (Bit32u)cache.pos-4, even though, again, the result after processing is the same. Head scratcher. Edit: for good measure I logged *(Bit64u*)cache.pos, and it matches val.
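
To see what that logged value looks like once it is stored, the 8-byte write can be unpacked byte by byte. This is only a sketch: it assumes a little-endian host (as on x86), and byte_at and field32_at are hypothetical helpers, not DOSBox functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the byte that ends up at cache.pos + index after the
// 8-byte store, on a little-endian host.
std::uint8_t byte_at(std::uint64_t val, std::size_t index) {
    assert(index < sizeof val);
    std::uint8_t bytes[sizeof val];
    std::memcpy(bytes, &val, sizeof val);
    return bytes[index];
}

// Hypothetical helper: the 32-bit field stored at byte offset 'offset'.
std::uint32_t field32_at(std::uint64_t val, std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= sizeof val);
    std::uint8_t bytes[sizeof val];
    std::memcpy(bytes, &val, sizeof val);
    std::uint32_t field = 0;
    std::memcpy(&field, bytes + offset, sizeof field);
    return field;
}
```

For the logged value 0xc483f63925d6e851 the bytes come out as 51 e8 d6 25 39 f6 83 c4, and field32_at(val, 2) gives back 0xf63925d6. So the store itself places tmp where it was meant to go, which matches the log. It says nothing about whether tmp is the right offset.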

Reply 23 of 26, by M-HT

Posted on 2018-08-21, 14:09

Rank: Member
Posts: 102
Joined: 2008-09-01, 12:55
Location: Bratislava

awgamer wrote:
Thar she blows, 64 bit decoder.

For some reason this doesn't work:
// cache_addq(0xc48300000000e851|((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4)<<16);
By my understanding it should work, and it gets past gcc without complaint, but DOSBox crashes. Saving ((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4) to a temp variable and passing that doesn't work either. Everything else is happy being passed to the 64-bit cache_adds, but not this one. Other than that, everything works. I imagine it speeds things up in some cases, but some of the swaps to 64-bit seem to slow things down. I blew through writing this without benchmarking as I went, so I'll have to go through it change by change to verify which ones give a speed improvement.

The problem is cache.pos. When you use "cache_addd((Bit64u)(&mem_readb_checked_dcx86) - (Bit64u)cache.pos-4);", then cache.pos refers to the position after byte 0xe8. When you use cache_addq then cache.pos refers to another position - you need to change "-4" to something else.
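
This can be checked with a small model. An x86 near call (opcode 0xe8) stores its target relative to the address of the byte that follows the 4-byte offset field, so the constant to subtract depends on where cache.pos sits relative to that field when the expression is evaluated. The sketch below is not DOSBox code: rel32_for and its parameters are hypothetical names, and addresses are modeled as plain 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: 'pos' is the write position when the expression is
// evaluated, and 'field_offset' is how many bytes after 'pos' the 4-byte
// call offset will be stored. The call is relative to the end of that field.
std::uint32_t rel32_for(std::uint32_t target, std::uint32_t pos,
                        std::uint32_t field_offset) {
    assert(field_offset <= 4);  // the field must fit inside an 8-byte chunk
    const std::uint32_t next_insn = pos + field_offset + 4;
    return target - next_insn;  // unsigned wraparound gives two's complement
}
```

When the pieces are emitted separately, the 0x51 and 0xe8 bytes are already written, so field_offset is 0 and the expression is target - pos - 4. With one 8-byte write that starts at the 0x51 byte, field_offset is 2, which gives target - pos - 6. A field in the last four bytes of a chunk (field_offset 4) would need -8.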

Reply 24 of 26, by awgamer

Posted on 2018-08-21, 16:41

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

That's it. With cache_addd, cache.pos sits right at the 4-byte offset, so the -4 accounts for the 4 bytes it adds. With cache_addq, cache.pos is a word (2 bytes) before that spot and the write adds 8, so it needs -6 to land on the same point. To get everything on one line: it didn't like Bit64u for the math, so I do the math in Bit32u and then cast to Bit64u for the shift.
cache_addq(0xc48300000000e851|(Bit64u)((Bit32u)&mem_readb_checked_dcx86 - (Bit32u)cache.pos-6)<<16);

The next use needs a move to -8, though I thought it would stay at -4 since the offset was added at the back end of the addq, so I'm still not fully following along. Little/big endian business? Anyway, it's running. The Doom bench still seems slower to me, so after testing, the changes that don't help would be dropped to get the best speed, but at least we can go this route if it suits.
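
On the Bit64u math: one plausible reason the all-64-bit version misbehaves (an assumption, the thread doesn't confirm it) is that when the target lies below cache.pos, the 64-bit difference wraps to a value whose upper 32 bits are all ones. Shifted left by 16 and OR-ed in, those bits overwrite the 0xc483 at the top of the template. Truncating to 32 bits first keeps the offset inside its 4-byte field. A sketch with made-up addresses; pack_wide, pack_narrow and kTemplate are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kTemplate = 0xc48300000000e851ULL;

// Difference computed in 64 bits: a negative result has all upper bits set,
// and the shift pushes them into the top two bytes of the template.
std::uint64_t pack_wide(std::uint64_t target, std::uint64_t pos) {
    return kTemplate | ((target - pos - 6) << 16);
}

// Difference truncated to 32 bits before widening and shifting.
std::uint64_t pack_narrow(std::uint64_t target, std::uint64_t pos) {
    const std::uint32_t rel = static_cast<std::uint32_t>(target - pos - 6);
    const std::uint64_t packed =
        kTemplate | (static_cast<std::uint64_t>(rel) << 16);
    assert((packed >> 48) == (kTemplate >> 48));  // top bytes left intact
    return packed;
}
```

With a target of 0x1000 and a position of 0x2000, pack_narrow yields 0xc483ffffeffae851 while pack_wide yields 0xffffffffeffae851. For a target above the position the two agree.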

Reply 25 of 26, by awgamer

Posted on 2018-08-21, 21:44

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

Incorporated some of the gen create functions, which allows merging more cache_adds.

Reply 26 of 26, by awgamer

Posted on 2018-08-23, 16:30

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

Side note, at the end of CreateCacheBlock there's a conditional to add code for debugging but..
..
goto finish_block;
#if (C_DEBUG)
dyn_set_eip_last();
dyn_reduce_cycles();
dyn_save_critical_regs();
gen_return(BR_OpcodeFull);
dyn_closeblock();
goto finish_block;
#endif
finish_block:
..

How is this ever going to be executed? The prior goto always skips over it and there's no label to jump in at. Either this is an oops or there's some magic coding fu that I'm unaware of.
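
The control flow in question can be reduced to a few lines. In the sketch below (a hypothetical stand-in built from the snippet as posted, not the DOSBox source), the statement between the unconditional goto and the label has no label of its own, so nothing can transfer control to it.

```cpp
#include <string>

// Hypothetical reduction of the pattern: records which parts actually run.
std::string trace_block() {
    std::string trace = "start;";
    goto finish_block;
    // No label precedes this statement, so it can never be reached.
    trace += "debug;";
finish_block:
    trace += "finish;";
    return trace;
}
```

Calling trace_block() returns "start;finish;", so the "debug;" step is compiled (and may draw an unreachable-code warning) but never runs.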
