VOGONS


First post, by KPAH

Rank Newbie

Greetings everybody!

Recently I ran DOSBox on my PSP. Well, I was really impressed! Now I intend to improve x86 emulation performance, at least on the PSP. I hope I will have some time to spend on it, too... I have C, C++ and x86 assembler experience (I have used them professionally for the last 8 years).

All I need is some details about the dynamic core. I have some ideas in my head; the question is whether they are worth pursuing.

1) Most important: is the dynamic core available/used in the PSP build?

2) As far as I understand, the dynamic core takes a piece of emulated x86 code and "compiles" it into a function in native code (x86, ARM, MIPS etc.). DOSBox then simply runs the native function instead of interpreting the emulated code, right?

3) How do you decide which part to "compile"? Are there barriers such as memory accesses, I/O operations, jumps etc.?

4) Do you just "compile" all suitable code, or do you automatically detect hotspots and process only those? If automatically, how?

5) Any issues with self-modifying code?

6) What kind of assembly code is generated? Load-execute-store for each x86 instruction, or at least store/load optimized? I am talking about code like this:
------------------ initial x86 code
add ax, bx
add ax, cx
------------------ load-execute-store
mov r1, [RAX]
add r1, [RBX]
mov [RAX], r1
mov r1, [RAX]
add r1, [RCX]
mov [RAX], r1
------------------ load-execute-store with store/load optimization
mov r1, [RAX]
add r1, [RBX]
add r1, [RCX]
mov [RAX], r1
------------------
Other optimizations that are fairly simple to implement are possible, up to eliminating the temporary writes of register values to memory altogether, which is like virtually expanding the number of x86 registers.
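
To make the store/load optimization concrete, here is a minimal Python sketch. It is not DOSBox code: the tuple-based mini-IR, the single host register "r1" and both function names are assumptions made purely for illustration. It first emits the naive load-execute-store sequence and then drops every store that is immediately reloaded into the same host register.

```python
# Hypothetical mini-IR (not a DOSBox structure):
#   ("load",  host_reg,  guest_reg)   host_reg  <- guest_reg
#   ("add",   host_reg,  guest_reg)   host_reg  += guest_reg
#   ("store", guest_reg, host_reg)    guest_reg <- host_reg

def naive_translate(guest_code):
    """Translate each 'add dst, src' into a load-execute-store triple."""
    out = []
    for op, dst, src in guest_code:
        assert op == "add"  # only the opcode from the example is modelled
        out.append(("load", "r1", dst))
        out.append(("add", "r1", src))
        out.append(("store", dst, "r1"))
    return out

def drop_store_load_pairs(code):
    """Remove a store that is immediately reloaded into the same host register."""
    out = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (nxt is not None and cur[0] == "store" and nxt[0] == "load"
                and cur[1] == nxt[2] and cur[2] == nxt[1]):
            i += 2          # skip both the store and the redundant reload
            continue
        out.append(cur)
        i += 1
    return out
```

Note that dropping the intermediate store is only safe if nothing needs to observe the guest register between the two instructions, which is exactly the design question raised here.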

As far as I can see right now, it is fairly simple to implement "compilation" to native code, for almost any platform, of those parts of x86 code that contain no memory accesses (or only accesses that involve no paging etc.), jumps, port I/O or floating-point operations.

Thanks in advance

Reply 1 of 11, by MiniMax

Rank Moderator

Review this thread:

ARM (Thumb) Dynamic Core Code

After that, try contacting some of the people there. I hope they know everything you want to know.

DOSBox 60 seconds guide | How to ask questions
_________________
Lenovo M58p | Core 2 Quad Q8400 @ 2.66 GHz | Radeon R7 240 | LG HL-DT-ST DVDRAM GH40N | Fedora 32

Reply 2 of 11, by wd

Rank DOSBox Author

1) Most important: is the dynamic core available/used in the PSP build?

Yes (mips32le backend).

DOSBox then simply runs the native function instead of interpreting the emulated code, right?

Yes, when first encountering new code it is recompiled into native code
(using function calls for the core of several instructions) which is run afterwards.

3) How do you decide which part to "compile"?

All code is recompiled; if unhandled instructions are found (mostly very complex
opcodes), the block is closed and the non-recompiling core handles the instruction.

Are there barriers such as memory accesses, I/O operations, jumps etc.?

Depends on what you mean by barriers. All memory access is handled by
special functions (mainly as pmode memory access can be very complex),
and jumps finish a basic block (conditional/unconditional jumps lead to basic
block chaining).
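
The block-forming rule described in this reply (a block ends at a jump, or is closed early when an unhandled instruction turns up) can be sketched roughly like this in Python. The opcode names, the two sets and the function build_block are hypothetical stand-ins, not taken from the DOSBox sources.

```python
JUMPS = {"jmp", "jz", "jnz", "call", "ret"}   # hypothetical set of block-ending opcodes
UNHANDLED = {"complex_op"}                    # stand-in for "very complex opcodes"

def build_block(code, start):
    """Collect instructions from `start` until a jump or an unhandled opcode.

    Returns (instructions, reason, next_pc). With reason "fallback", the
    instruction at next_pc is left to the non-recompiling core.
    """
    block = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op in UNHANDLED:
            return block, "fallback", pc   # close the block before this opcode
        block.append(op)
        pc += 1
        if op in JUMPS:
            return block, "jump", pc       # a jump finishes the basic block
    return block, "end", pc
```

For the opcode list ["mov", "add", "jnz", "mov", "complex_op", "add"], the first block ends after the jnz, and the block started at index 3 is closed before complex_op.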

5) Any issues with self-modifying code?

Page handlers detect modification of blocks and invalidate them, writes into
the same block are a bit more tricky (needs almost-immediate exit of the
running block), and speed issues due to smc are a lengthy topic, too.
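
As a rough illustration of invalidation through page handlers, the following Python sketch keeps a per-page list of translated blocks and drops every block that a guest write lands in. The class CodeCache, its methods and the 4 KiB page size are assumptions for this example only. It does not model the trickier case of a block writing into itself while it is running.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages

class CodeCache:
    def __init__(self):
        self.blocks = {}   # start address -> (end address, translated code)
        self.pages = {}    # page number -> set of block start addresses

    def _pages_of(self, start, end):
        return range(start // PAGE_SIZE, (end - 1) // PAGE_SIZE + 1)

    def add_block(self, start, end, code):
        """Register a translated block covering guest addresses [start, end)."""
        self.blocks[start] = (end, code)
        for page in self._pages_of(start, end):
            self.pages.setdefault(page, set()).add(start)

    def on_write(self, addr):
        """Called by the (hypothetical) page handler for every guest write."""
        hit = [start for start in sorted(self.pages.get(addr // PAGE_SIZE, ()))
               if start <= addr < self.blocks[start][0]]
        for start in hit:
            end, _ = self.blocks.pop(start)
            for page in self._pages_of(start, end):
                self.pages[page].discard(start)
        return hit   # start addresses of the invalidated blocks
```

A write that touches a page containing code, but not the bytes of any translated block, leaves the cache untouched in this model.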

Check crazyc's patch (forum thread) at
http://forums.ps2dev.org/viewtopic.php?t=9564
as well as the dosbox sources in /cpu/dynrec/

Reply 3 of 11, by gulikoza

Rank Oldbie

Dynrec core is the portable one. Dynamic core is too x86 specific, so dynrec was written by wd to make dynamic recompilation more portable. It's near dynamic core speed (about 70% on x86 I believe).

All you need to do is write a suitable backend for PSP arch (risc_XXX.h files). Sounds easy 😀
Most of the stuff (decoding, self-modifying code issues) is handled by C routines, so you just need to write the risc_xxx.h file which writes the appropriate opcodes. See the X86 file for an example; it should be quite easy to follow if you know x86 asm.

Doesn't PSP use MIPS architecture? MIPS32 (little endian) backend is already written....

http://www.si-gamer.net/gulikoza

Reply 5 of 11, by KPAH

Rank Newbie

wd

Jumps are the "barriers" I mentioned...

all memory access is handled by special functions (mainly as pmode memory access can be very complex)

OK, I see. But does real-mode code also do memory accesses through special functions?

I looked through the ARM code before creating this topic and found a header with a lot of functions emitting platform-specific opcodes. What role does dynrec play in the "compilation" process? Does dynrec do some optimization, or does it just slice the code into parts, decode the opcodes and send the decoded opcodes to the platform-specific backend module?

gulikoza

Doesn't PSP use MIPS architecture? MIPS32 (little endian) backend is already written....

Yes, PSP is MIPS R4000 32bit based.

PS. By the way, where can I get the DOSBox sources?

Reply 6 of 11, by MiniMax

Rank Moderator

http://dosbox.linuxsecured.net/dosboxcvs.tgz
DOSBox SVN Builds


Reply 7 of 11, by wd

Rank DOSBox Author

But does real-mode code also do memory accesses through special functions?

Yes. Mainly to handle selfmodification of the running block.

found a header with a lot of functions emitting platform-specific opcodes.

Which?

What role does dynrec play in the "compilation" process?

Well it does the recompilation. dosbox uses several emulation cores (an
interpretation-only one, "normal core", an x86-to-x86 recompiler etc.)
and each of them can handle client-code execution (through interpretation
or recompilation). On a timeslice basis (cycle setting) the core is left to
handle external events and stuff.
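
The timeslice idea can be pictured with a small Python sketch: a core runs translated blocks until its cycle budget is used up, then returns so that external events can be handled. The function names, the callbacks and the notion that a block reports the cycles it consumed are assumptions for illustration, not the actual DOSBox interfaces.

```python
def run_timeslice(run_block, cycles):
    """Run blocks until the cycle budget is used up; return the leftover (<= 0)."""
    while cycles > 0:
        cycles -= run_block()   # run_block returns the cycles it consumed
    return cycles

def emulate(run_block, handle_events, slices, cycles_per_slice):
    """Alternate between running the core and handling external events."""
    for _ in range(slices):
        run_timeslice(run_block, cycles_per_slice)
        handle_events()         # e.g. timers, interrupts, video, input
```

With a budget of 7 cycles and blocks that cost 3 cycles each, three blocks run per slice before the events are handled, so a block can overshoot the budget slightly in this model.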

By the way, where can I get the DOSBox sources?

Check the download section of the dosbox project page at sf.net/projects/dosbox
or the cvs section thereof.

Reply 8 of 11, by KPAH

Rank Newbie

wd wrote:

Yes. Mainly to handle selfmodification of the running block.

Is it worth inserting a trivial check inside the "compiled" code? I think it might eliminate a lot of function calls.

wd wrote:

Which?

risc_armv4le-thumb-new5.h from http://members.chello.sk/apauer/dosbox2/dosbox2.html page

I will certainly study the sources.

One more idea:
Apparently jumps aren't a real "break" condition for interpreted code as long as they are forward jumps and there are no backward jumps between the jump address and the destination address.

For compiling loops (i.e. code with backward jumps) the following approach seems to fit. Three years ago I was working on porting computationally intensive software from x86 to Texas Instruments DSPs. These DSPs have a feature called a hardware pipeline buffer, or something like that: the DSP has an internal command buffer which you can fill with the opcodes of a loop and then run with a special opcode. The DSP executes the loop without reading commands from memory, does hardware register renaming etc. The problem is that such a loop is unbreakable: you must explicitly specify how many iterations it should run, so there is no reaction to interrupts until the loop ends. In real-time systems reaction time is almost always crucial, so the compiler has a special option for the maximum number of iterations any hardware pipeline may run. Long loops are broken into short ones and you get an acceptable guaranteed worst-case interrupt reaction time.

Such an approach would be hard to implement, but it might improve performance a lot, especially for loops with a short body, just by eliminating load/store operations (except before entering the loop body and after leaving it) and by dramatically reducing the number of calls into "compiled" code. It could also be much simpler to call, from the "compiled" code, the function which runs, as you said, "on a timeslice basis to handle external events and stuff".

Reply 9 of 11, by wd

Rank DOSBox Author

Is it worth inserting a trivial check inside the "compiled" code? I think it might eliminate a lot of function calls.

You can't avoid the calls, the memory addresses are not known at the time
the block is recompiled thus you have to deal with various memory types
even in real mode (ram, video memory). Also realmode code is not "fixed",
as a v86 switch can happen any time so you have to deal with that (clearing
the recompiler cache is not an option due to speed).
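
Why a call is needed can be seen in a small Python model: the address is only known at run time, and the handler that must service it depends on where the address falls. The classes and method names here are hypothetical; the only hardware fact used is that the legacy VGA window on a PC sits at 0xA0000-0xBFFFF.

```python
class RamHandler:
    """Plain memory: bytes are simply stored and read back."""
    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value & 0xFF

class VideoHandler(RamHandler):
    """Video memory: a write also has a side effect (here: a dirty flag)."""
    def __init__(self):
        super().__init__()
        self.dirty = False

    def write(self, addr, value):
        super().write(addr, value)
        self.dirty = True

class Memory:
    def __init__(self):
        self.ram = RamHandler()
        self.video = VideoHandler()

    def handler_for(self, addr):
        # The memory type depends on the run-time address, which the
        # recompiler does not know when it translates the block.
        return self.video if 0xA0000 <= addr < 0xC0000 else self.ram

    def read_byte(self, addr):
        return self.handler_for(addr).read(addr)

    def write_byte(self, addr, value):
        self.handler_for(addr).write(addr, value)
```

The same translated store instruction may hit ordinary RAM on one execution and video memory on the next, so the dispatch has to happen on every access in this model.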

Apparently jumps aren't a real "break" condition for interpreted code as long as they are forward jumps
and there are no backward jumps between the jump address and the destination address.

The recompiler handles all types of code segment relative jumps (that is also
forward jumps) by block chaining. The second time such a jump is taken
(which means most of the time only relevant branches are evaluated) the
target address is filled in, which is a direct jump to the connected block.
Endless loops are handled by the cycles checks.
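
A simplified Python model of block chaining is given here. All names are hypothetical, and the linking policy is an assumption: an exit is linked as soon as the dispatcher finds the target block already in the cache, which may differ in detail from the real core. The first time a jump is taken the target still has to be translated; once it exists, the exit is patched and later executions bypass the lookup.

```python
class Block:
    def __init__(self, addr, next_addr):
        self.addr = addr
        self.next_addr = next_addr   # guest address this block jumps to
        self.link = None             # filled in once the target block exists

class Dispatcher:
    def __init__(self, program):
        self.program = program       # guest address -> next guest address
        self.cache = {}              # guest address -> translated Block
        self.lookups = 0             # how often the slow path was taken

    def lookup(self, addr):
        """Slow path: find the block for addr, 'translating' it if needed."""
        self.lookups += 1
        if addr not in self.cache:
            self.cache[addr] = Block(addr, self.program[addr])
        return self.cache[addr]

    def run(self, addr, steps):
        """Execute `steps` blocks starting at addr; return the visited addresses."""
        block = self.lookup(addr)
        trace = []
        for _ in range(steps):
            trace.append(block.addr)
            if block.link is None:
                target_known = block.next_addr in self.cache
                target = self.lookup(block.next_addr)
                if target_known:
                    block.link = target   # patch in the direct jump
                block = target
            else:
                block = block.link        # chained: no lookup needed
        return trace
```

For a two-block loop, only the first few iterations go through the lookup; after that both exits are linked and the loop runs without returning to the dispatcher, which is why a separate cycle check is needed to get out of endless loops.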

Reply 10 of 11, by KPAH

Rank Newbie

wd wrote:

...by block chaining. The second time such a jump is taken
(which means most of the time only relevant branches are evaluated) the
target address is filled in, which is a direct jump to the connected block.

But doesn't that require the CPU state to be written to memory before calling any block? For example, this pseudocode:

operation1;
if (condition)
{
    operation2;
}
operation3;

will be split into 3 blocks, right? Then the block doing operation3 must either exist in 2 variants, depending on where it is entered from, or both the operation1 and operation2 blocks must save the x86 register values (which could be kept in HW registers up to the end of the block) back to memory, and the operation3 block must read these values from memory.

Reply 11 of 11, by wd

Rank DOSBox Author

The x86-to-x86 recompiler (core=dynamic) uses dynamic register assignment
and thus saves the register contents back at the end of a block.
The other recompiler does only use host registers on a per-instruction basis
so the state is consistent after each emulated instruction anyways.
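
The difference between the two strategies can be illustrated with a toy Python model that counts accesses to the guest register file. Both functions, the 16-bit add-only instruction set and the access counting are assumptions made for this sketch and are not how either DOSBox core is actually written.

```python
def run_per_instruction(regs, code):
    """Each 'add dst, src' works directly on the guest state in memory."""
    accesses = 0
    for dst, src in code:
        regs[dst] = (regs[dst] + regs[src]) & 0xFFFF
        accesses += 3            # two loads and one store per instruction
    return accesses

def run_with_writeback(regs, code):
    """Guest registers are cached in 'host registers' for the whole block."""
    host, dirty, accesses = {}, set(), 0
    for dst, src in code:
        for name in (dst, src):
            if name not in host:
                host[name] = regs[name]   # first use: load from guest state
                accesses += 1
        host[dst] = (host[dst] + host[src]) & 0xFFFF
        dirty.add(dst)
    for name in dirty:                    # block end: save modified values back
        regs[name] = host[name]
        accesses += 1
    return accesses
```

For "add ax, bx; add ax, cx" both variants end with the same register values, but the first touches the guest state six times and the second four times. The price of the second variant is that the guest state in memory is only consistent at block boundaries.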

which could be kept in HW registers up to the end of the block

This is not possible with the current design of the x86-to-nonx86 recompiler.