Dynamic core optimization

Postby awgamer » 2018-3-29 @ 01:19

I'm still blocked by that issue. Anyway, the point was to optimize the dynamic recompiler (risc_x86.h) so that it is less bloated in cache and processes fewer instructions; see attached. Optimizations here would diminish CPU spikes, smooth things out, and help in possible thrashing corner cases. Some spots in decoder.h could be tightened up in the same way, such as the dyn_read/write_x functions, which get touched a lot.
Attachments
dosboxcodoptimization.txt
(9.45 KiB) Downloaded 62 times
awgamer
Member
 
Posts: 481
Joined: 2014-7-26 @ 07:42

Re: DOSBox Compilation Guides

Postby Qbix » 2018-3-29 @ 13:51

Interesting. I would have assumed that the compiler did some of these on its own (given the inlining and the bigger picture),
but I checked the dynrec core on x64 and noticed that it is "smart" for the smaller functions, while for the more complex things (gen_function_raw and such, which is itself inlined) it really does start writing one byte at a time and increasing the pointer through a move, increase, move-back operation.
Water flows down the stream
How to ask questions the smart way!
User avatar
Qbix
DOSBox Author
 
Posts: 10698
Joined: 2002-11-27 @ 14:50
Location: Fryslan

Re: DOSBox Compilation Guides

Postby awgamer » 2018-3-29 @ 14:34

Yeah, my intent was (and is) to use the gcc option that spits out its assembly, to see whether the generated code actually differs and know for sure at that point. Sounds like this is how you checked?

Re: DOSBox Compilation Guides

Postby Qbix » 2018-3-29 @ 15:31

yeah, I used
Code: Select all
objdump -Mintel -dS core_dynrec.o > test.asm


But it is a bit messy to read due to the optimized code.
I could have used that gcc option to output it directly, but this is easier given that the object files are in my tree normally

Re: DOSBox Compilation Guides

Postby awgamer » 2018-3-29 @ 16:19

Yeah, gcc asm output is cryptic, but a before-and-after comparison is mostly enough for me to follow along. Hopefully I'll work out this annoying permissions issue so I can play with this myself. Speaking of cryptic, I find some of my changes more readable, less spaghetti, and shorter than the original, but maybe that's just me :)

Re: DOSBox Compilation Guides

Postby Qbix » 2018-3-29 @ 16:41

It's easier to read with -Mintel, but the interleaved source (the -S) is sometimes a bit off.

Re: DOSBox Compilation Guides

Postby awgamer » 2018-3-29 @ 17:24

Another tweak: the early-out "if (!dsr2 && (ddr==dsr1) && !imm_size) return;" can be pulled into the "if (!imm && (gsr1->index!=0x5))" path, since that is the only path where imm_size is 0. There is no need to do the check when imm_size is 1 or 4.

Code: Select all
static void gen_lea(DynReg * ddr,DynReg * dsr1,DynReg * dsr2,Bitu scale,Bits imm) {
   GenReg * gdr=FindDynReg(ddr);
   Bitu imm_size;
   Bit8u rm_base=(gdr->index << 3);         //destination goes in the reg field
   Bit8u index;
   if (dsr1) {
      GenReg * gsr1=FindDynReg(dsr1);
      if (!imm && (gsr1->index!=0x5)) {
         if (!dsr2 && (ddr==dsr1)) return;  //LEA reg,[reg] would be a no-op
         imm_size=0;rm_base+=0x0;           //no imm
      } else if ((imm>=-128 && imm<=127)) {
         imm_size=1;rm_base+=0x40;          //Signed byte imm
      } else {
         imm_size=4;rm_base+=0x80;          //Signed dword imm
      }
      index=gsr1->index;
   } else {
      imm_size=4;                           //no base register: dword imm only
      index=5;
   }
   if (dsr2) {
      GenReg * gsr2=FindDynReg(dsr2);
      cache_addw(0x8d|(rm_base+0x4)<<8);    //0x8d=LEA | The sib indicator
      Bit8u sib=(index+(gsr2->index<<3)+(scale<<6));
      cache_addb(sib);
   } else {
      cache_addw(0x8d|(rm_base+index)<<8);  //0x8d=LEA | modrm without sib
   }
   switch (imm_size) {
   case 0:break;
   case 1:cache_addb(imm);break;
   case 4:cache_addd(imm);break;
   }
   ddr->flags|=DYNFLG_CHANGED;
}
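
The displacement handling in gen_lea follows the usual x86 ModRM rules, and it can be looked at on its own. The following is a minimal sketch, not DOSBox code: pick_mod, ModChoice and kNoBase are made-up names for the illustration. Register index 5 is EBP, which cannot be used as a base without a displacement, which is why that case falls through to the byte form.

```cpp
#include <cstdint>

// Hypothetical helper, not part of DOSBox: the ModRM "mod" bits (already
// shifted into bits 7..6) plus the displacement size in bytes.
struct ModChoice {
    std::uint8_t mod_bits;   // 0x00, 0x40 or 0x80
    unsigned     disp_size;  // 0, 1 or 4
};

constexpr int kNoBase = -1;  // assumption: marks "no base register"

inline ModChoice pick_mod(int base_index, std::int32_t imm) {
    if (base_index == kNoBase)          return {0x00, 4};  // absolute disp32
    if (imm == 0 && base_index != 5)    return {0x00, 0};  // [reg]
    if (imm >= -128 && imm <= 127)      return {0x40, 1};  // [reg+disp8]
    return {0x80, 4};                                      // [reg+disp32]
}
```

Only the first register case yields a displacement size of 0, which is the reason the early return only needs to be tested there.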

Re: DOSBox Compilation Guides

Postby awgamer » 2018-3-29 @ 21:30

Cinched up decoder.h as well.
Attachments
decoderoptimization.txt
(12.92 KiB) Downloaded 46 times

Re: Dynamic core optimization

Postby awgamer » 2018-8-08 @ 15:12

I can compile now; it took a reinstall, since W7 was borked. The tweaks work, with a touch-up here and there. The performance change is negligible, though the binary is about 1K smaller and memory usage dropped by ~100k (it varies; I'm just tracking it with Task Manager), savings which I've been trading for inlining xyz. I still need to get asm output going (objdump isn't working for me currently) and profiling, to see what's going on.

Re: Dynamic core optimization

Postby Qbix » 2018-8-08 @ 15:17

I'll be interested to see what you come up with.
I did something similar to what you did, for the dynrec core, and it did indeed get a lot smaller, but I noticed no performance change (which isn't too surprising, as the asm that DOSBox executes is unchanged).

Re: Dynamic core optimization

Postby awgamer » 2018-8-08 @ 15:26

Refresh my memory on invoking diff to output the correct format and I can upload what I have now, warts and all. I used the svn tar.gz from here: https://www.dosbox.com/wiki/Building_DOSBox_with_MinGW

Re: Dynamic core optimization

Postby Qbix » 2018-8-08 @ 15:49

Extract the source a second time (folder name dosbox-org),
and then run this in the folder that contains both yoursource and the original source:
Code: Select all
diff -u dosbox-org/src/cpu/core_dynamic/decoder.h yoursource/src/cpu/core_dynamic/decoder.h > mypatch.txt

Re: Dynamic core optimization

Postby awgamer » 2018-8-08 @ 16:13

The changes touch more than just decoder.h, but they are confined to the core_dyn_x86 dir.
Attachments
mypatch.txt
(40.51 KiB) Downloaded 8 times

Re: Dynamic core optimization

Postby awgamer » 2018-8-08 @ 17:30

Did another round of tightening on the guys in decoder, changing calls like this:

cache_addd(0x52|(0x50+genreg->index)<<8|0xe850<<16);
to
cache_addd(0xe8505052+(genreg->index<<8));

which gets rid of two ORs and a shift. Chris's 3D bench seems to like it: 1001 vs 957. Margin of error? Like I said, I need to get asm output and profiling going.
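
As a sanity check on that rewrite, the two forms can be compared for every 32-bit register index. This is a standalone sketch, not DOSBox code: packed_or, packed_add and forms_agree are names made up here, and unsigned arithmetic is used so that shifting 0xe850 into the top half is well defined. It needs C++14 for the loop inside a constexpr function.

```cpp
#include <cstdint>

// Original form: three fields OR-ed together.
constexpr std::uint32_t packed_or(std::uint32_t index) {
    return 0x52u | ((0x50u + index) << 8) | (0xe850u << 16);
}

// Tightened form: constants folded into one literal, index added in place.
constexpr std::uint32_t packed_add(std::uint32_t index) {
    return 0xe8505052u + (index << 8);
}

// True when both forms agree for every register index 0..7.
constexpr bool forms_agree() {
    for (std::uint32_t i = 0; i < 8; ++i)
        if (packed_or(i) != packed_add(i)) return false;
    return true;
}

static_assert(forms_agree(), "encodings must match for index 0..7");
```

The addition is safe because 0x50 plus an index of at most 7 never carries out of its byte. Stored little-endian, the dword comes out as the bytes 52, 50+index, 50, e8, which are PUSH EDX, PUSH reg, PUSH EAX and the CALL rel32 opcode.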

Re: Dynamic core optimization

Postby awgamer » 2018-8-12 @ 19:55

The binary is now 8k smaller, mostly from replacing ifs and switches that repeated the same function/method call with a single call whose result is kept in a local var. Applied the optimized bounds checking from the mixer to the mouse handler. The optimization in gen_call_function that I had commented out is working now.
Attachments
my.diff
(117.62 KiB) Downloaded 5 times
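
The diff itself isn't reproduced here, so the following is only a generic sketch of the pattern described: a call that was repeated in the condition and in each case is made once and its result kept in a local. The lookup function and its call counter are invented for the illustration, and the rewrite is only valid when the call has no side effects that the branches depend on.

```cpp
static int g_lookups = 0;   // counts calls, for illustration only

// Hypothetical stand-in for a helper that gets called repeatedly.
inline int lookup(int key) { ++g_lookups; return key * 2; }

// Before: the same call appears in the condition and in each case.
inline int classify_repeated(int key) {
    if (lookup(key) < 0) return -1;
    switch (lookup(key) & 3) {
    case 0:  return lookup(key);
    case 2:  return lookup(key) + 1;
    default: return 0;
    }
}

// After: one call, with the result held in a local variable.
inline int classify_once(int key) {
    const int v = lookup(key);
    if (v < 0) return -1;
    switch (v & 3) {
    case 0:  return v;
    case 2:  return v + 1;
    default: return 0;
    }
}
```

Both versions return the same value for the same key; the second simply makes one call where the first makes up to three.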

Re: Dynamic core optimization

Postby awgamer » 2018-8-14 @ 07:09

Judging from the asm gcc is generating, the changes are faster as well, not just a size reduction of the binary. So yeah, fewer cache_addx calls is better.
Code: Select all
static void gen_return(BlockReturn retcode) {
   gen_protectflags();
   if (retcode==0) {
      cache_addd(0xc3c03359);      //POP ECX (the flags), XOR EAX,EAX, RET in one dword
//      cache_addw(0xc033);        //was: XOR EAX,EAX
//      cache_addb(0xc3);          //was: RET
   } else {
      cache_addw(0xb859);          //POP ECX (the flags), then the MOV EAX,imm32 opcode
//      cache_addb(0xb8);          //was: MOV EAX, retcode
      cache_addd(retcode);         //the imm32 operand of the MOV
      cache_addb(0xc3);            //RET
   }
}

new:
__ZL10gen_return11BlockReturn.part.5:
  movl   __ZL5cache+16, %eax
   movl   $-1010814119, (%eax)   
   addl   $4, %eax
   movl   %eax, __ZL5cache+16
   ret
__ZL10gen_return11BlockReturn:   
   subl   $4, %esp
   cmpb   $0, __ZL6x86gen
   jne   L247
L244:  // -1xmov,2xlea
   testl   %eax, %eax
   je   L248
   movl   __ZL5cache+16, %edx
   movl   $-18343, %ecx
   movl   %eax, 2(%edx)
   leal   7(%edx), %eax
   movw   %cx, (%edx)
   movl   %eax, __ZL5cache+16
   movb   $-61, 6(%edx)
   addl   $4, %esp
   ret
L248:   // +1xadd,1xjmp, -2xmov,1xlea,1xadd
   addl   $4, %esp
   jmp   __ZL10gen_return11BlockReturn.part.5
L247:   
  movl   %eax, (%esp)
   call   __ZL16gen_protectflagsv.part.2
   movl   (%esp), %eax
   jmp   L244

old:
__ZL10gen_return11BlockReturn:
   .cfi_startproc
   subl   $4, %esp
   cmpb   $0, __ZL6x86gen
   jne   L266
L262:   
   movl   __ZL5cache+16, %edx
   testl   %eax, %eax
   leal   1(%edx), %ecx
   movl   %ecx, __ZL5cache+16
   movb   $89, (%edx)
   je   L267
   movl   __ZL5cache+16, %edx
   leal   1(%edx), %ecx
   movl   %ecx, __ZL5cache+16
   movb   $-72, (%edx)
   movl   __ZL5cache+16, %edx
   movl   %eax, (%edx)
   leal   4(%edx), %eax
   leal   1(%eax), %edx
   movl   %edx, __ZL5cache+16
   movb   $-61, (%eax)
   addl   $4, %esp
   ret
L267:   
   movl   __ZL5cache+16, %eax
   movl   $-16333, %edx
   movw   %dx, (%eax)
   addl   $2, %eax
   leal   1(%eax), %edx
   movl   %edx, __ZL5cache+16
   movb   $-61, (%eax)
   addl   $4, %esp
   ret
L266:   
   movl   %eax, (%esp)
   call   __ZL16gen_protectflagsv.part.2
   movl   (%esp), %eax
   jmp   L262
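
To confirm that the merged emits produce the same bytes as the separate ones, the two shapes of gen_return can be run against a toy cache. This is a sketch under stated assumptions: Emitter, return_old and return_new are made-up names, the old sequence is reconstructed from the old listing above, gen_protectflags is left out, and a little-endian host (as on x86) is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal stand-in for the code cache; the real DOSBox cache differs.
struct Emitter {
    std::vector<std::uint8_t> bytes;
    void addb(std::uint8_t v)  { put(&v, 1); }
    void addw(std::uint16_t v) { put(&v, 2); }
    void addd(std::uint32_t v) { put(&v, 4); }
private:
    // Appends the value's bytes in host order (little-endian assumed).
    void put(const void* p, std::size_t n) {
        const auto* b = static_cast<const std::uint8_t*>(p);
        bytes.insert(bytes.end(), b, b + n);
    }
};

// Old shape: one emit per instruction piece.
inline std::vector<std::uint8_t> return_old(std::uint32_t retcode) {
    Emitter e;
    e.addb(0x59);               // POP ECX
    if (retcode == 0) {
        e.addw(0xc033);         // XOR EAX,EAX
    } else {
        e.addb(0xb8);           // MOV EAX,imm32 opcode
        e.addd(retcode);        // imm32
    }
    e.addb(0xc3);               // RET
    return e.bytes;
}

// New shape: neighbouring pieces merged into wider emits.
inline std::vector<std::uint8_t> return_new(std::uint32_t retcode) {
    Emitter e;
    if (retcode == 0) {
        e.addd(0xc3c03359);     // POP ECX, XOR EAX,EAX, RET
    } else {
        e.addw(0xb859);         // POP ECX, MOV EAX,imm32 opcode
        e.addd(retcode);        // imm32
        e.addb(0xc3);           // RET
    }
    return e.bytes;
}
```

The emitted bytes are identical in both cases, so the generated code DOSBox executes does not change; only the work done while emitting it does.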

Re: Dynamic core optimization

Postby awgamer » 2018-8-15 @ 18:00

The binary is 10k smaller now.
Attachments
m2.diff
(159.74 KiB) Downloaded 3 times

Re: Dynamic core optimization

Postby awgamer » 2018-8-19 @ 13:38

Added a cache_add3, which adds three bytes to the cache by doing a dword move and incrementing the pointer by 3, replacing the addb + addw pairs in those cases.
Attachments
m3.diff
(157.91 KiB) Downloaded 5 times
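
The diff isn't shown inline, so here is a minimal sketch of what a three-byte emit of that kind could look like. CodeCache and this cache_add3 signature are assumptions for the illustration, not the code from the attachment; it also assumes a little-endian host and at least four writable bytes at the current position.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the emitter state; not the real DOSBox structure.
struct CodeCache {
    std::uint8_t  buf[64] = {};
    std::uint8_t* pos = buf;
};

// Emits the low three bytes of val with a single 32-bit store, then advances
// the position by 3. The fourth byte written is scratch and gets overwritten
// by whatever is emitted next.
inline void cache_add3(CodeCache& c, std::uint32_t val) {
    std::memcpy(c.pos, &val, sizeof val);  // typically one dword move on x86
    c.pos += 3;
}
```

The trade-off is that the store touches one byte beyond what is logically emitted, so the buffer needs a byte of slack at its end.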

Re: Dynamic core optimization

Postby Qbix » 2018-8-19 @ 14:07

I did that in my own tree as well. Guess we had similar thoughts :)

Re: Dynamic core optimization

Postby awgamer » 2018-8-19 @ 14:34

Oh yeah? Well I've got cache_addq & cache_add7 working preliminarily and currently implementing the optimizations :happy:

edit: Well, they work, but it seems like doom bench is getting slower as I add 64 bit moves.
Last edited by awgamer on 2018-8-19 @ 23:59, edited 2 times in total.
