VOGONS


Ideas about speeding up the dynrec


Reply 20 of 28, by wd

Rank DOSBox Author

I save call to the original function, exception check

Exception checks may be worth keeping an eye on (InitPageHandler tests for
pagefaults for example) as they may occur at page sweeps, raising the counter
check to 4096 would be better then.

When doing word and dword accesses I also save the check for page boundary. In this case the new function checks for page boundary and if the access crosses the boundary, the counter is decreased.

So for word/dword accesses you increase on regular accesses and decrease
on page boundary crossing accesses?

Reply 21 of 28, by M-HT

Rank Newbie
wd wrote:

I save call to the original function, exception check

Exception checks may be worth keeping an eye on (InitPageHandler tests for
pagefaults for example) as they may occur at page sweeps, raising the counter
check to 4096 would be better then.

I haven't studied DOSBox enough to know how it works.
I'm skipping the exception check when the code is rewritten to direct memory access, because direct memory access doesn't generate an exception (in mem_readb_checked_drc).
And raising the threshold to 4096 is no problem.

wd wrote:

When doing word and dword accesses I also save the check for page boundary. In this case the new function checks for page boundary and if the access crosses the boundary, the counter is decreased.

So for word/dword accesses you increase on regular accesses and decrease
on page boundary crossing accesses?

Yes, this handles the case when a global variable crosses a page boundary (or similar).

Also, when the code is rewritten to direct memory access, I skip the page boundary check, because I assume that the next page is also directly accessible and has the same base address.

In total 4 cases can occur:
1) the memory address is accessed through a handler -> the code is rewritten to call the old function (to skip counter handling) - the speed is the same as with the old method
2) most accesses cross the page boundary (negative threshold is reached) -> the code is rewritten to call the old function (to skip counter handling) - the speed is the same as with the old method
3) most accesses are direct (positive threshold is reached) -> the code is rewritten to access the memory directly - here is the speed increase
4) the counter oscillates between the positive and negative threshold - here's a slight speed decrease, but I don't think this case happens in real programs
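
To make the four cases concrete, here is a minimal C++ sketch of the per-site counter logic described above. It is not DOSBox code: `AccessSite`, `record_access`, `crosses_page` and the immediate switch to the old function on the first handler access are assumptions made for illustration, and 4096 is simply the threshold suggested earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// All names here are hypothetical; this is not DOSBox code.
enum class AccessMode { Counting, CallOldFunction, DirectAccess };

struct AccessSite {
    AccessMode mode = AccessMode::Counting;
    int counter = 0;
};

constexpr int kThreshold = 4096;             // value suggested in the thread
constexpr std::uint32_t kPageMask = 0xFFF;   // 4k pages

// True if an access of `size` bytes starting at `addr` spans two pages.
inline bool crosses_page(std::uint32_t addr, std::uint32_t size) {
    assert(size >= 1 && size <= 4);
    return (addr & kPageMask) + size > kPageMask + 1;
}

// Update the state of one access site after it has executed once.
inline void record_access(AccessSite& site, std::uint32_t addr,
                          std::uint32_t size, bool page_has_handler) {
    if (site.mode != AccessMode::Counting) return;  // already rewritten
    if (page_has_handler) {                         // case 1
        site.mode = AccessMode::CallOldFunction;
        return;
    }
    site.counter += crosses_page(addr, size) ? -1 : 1;
    if (site.counter >= kThreshold) {
        site.mode = AccessMode::DirectAccess;       // case 3
    } else if (site.counter <= -kThreshold) {
        site.mode = AccessMode::CallOldFunction;    // case 2
    }
    // Otherwise the site keeps counting (case 4 if it never settles).
}
```

In this sketch a site that only sees aligned dword reads ends up in `DirectAccess` after 4096 executions, while a site that keeps reading a dword at page offset 0xFFE ends up in `CallOldFunction`.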

Reply 22 of 28, by wd

User metadata
Rank DOSBox Author

I haven't studied DOSBox enough to know how it works.

In pmode with paging enabled, memory access is controlled by flags that
specify how a 4k page can be accessed. If an access is denied, a page fault
happens (telling that a page doesn't exist, is read-only etc.) and the pmode
extender/OS software part handles the information accordingly (like reading
the page from disk and changing the flags, then returning to user code).
The stuff that cares about the checks and that triggers the page faults is
in the InitPage handlers.
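
As a rough illustration of the flag check described above, here is a small C++ sketch. The present and writable bit positions are the standard x86 page-table entry bits, but `check_access` and `Result` are made-up names, the user/supervisor bit and CR0.WP are deliberately ignored, and this is not how DOSBox's InitPage handlers are actually written.

```cpp
#include <cassert>
#include <cstdint>

// Low bits of an x86 page-table entry.
constexpr std::uint32_t kPresent  = 1u << 0;  // page is mapped
constexpr std::uint32_t kWritable = 1u << 1;  // writes are allowed

// Hypothetical result type for this sketch.
enum class Result { Ok, NotPresent, WriteProtected };

// Simplified check: ignores the user/supervisor bit and CR0.WP.
inline Result check_access(std::uint32_t pte, bool is_write) {
    if (!(pte & kPresent)) return Result::NotPresent;  // page fault
    if (is_write && !(pte & kWritable)) {
        return Result::WriteProtected;                 // page fault
    }
    return Result::Ok;
}

// Usage example: a read-only page accepts reads but faults on writes.
inline void example_check_access() {
    assert(check_access(kPresent, false) == Result::Ok);
    assert(check_access(kPresent, true) == Result::WriteProtected);
    assert(check_access(0, false) == Result::NotPresent);
}
```

Any result other than `Ok` corresponds to the page fault that the extender/OS part then has to handle.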

Simple example that may occur (that is not a memory sweep) is a stack that
is supposed to occupy only as few pages as needed, but may grow (almost)
infinitely. In that case the "next page for the stack" is marked as non-present
and if the stack hits this page, the extender/OS part allocates memory for
that page, thus the stack code works fine. The amount of memory reads
may be very large until such a page is hit.
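
The growing-stack case can be modelled with a toy page table. This is only a sketch under stated assumptions: `ToyPaging`, `touch` and `grow_stack` are hypothetical names, the "fault handler" just marks the page present, and nothing here reflects real DOSBox or extender code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kPageSize = 4096;
constexpr std::size_t kPages = 8;

// Hypothetical toy model of demand-allocated pages.
struct ToyPaging {
    std::array<bool, kPages> present{};  // all pages start non-present
    int faults = 0;

    // Stand-in for the extender/OS fault handler: allocate on demand.
    void page_fault(std::size_t page) {
        ++faults;
        present[page] = true;
    }

    // Checked access: every access tests the page flag first.
    void touch(std::uint32_t addr) {
        const std::size_t page = addr / kPageSize;
        assert(page < kPages);
        if (!present[page]) page_fault(page);
    }
};

// Push `bytes` onto a downward-growing stack that starts at `top`,
// one dword at a time. Returns the number of page faults taken.
inline int grow_stack(ToyPaging& mem, std::uint32_t top, std::uint32_t bytes) {
    const int before = mem.faults;
    for (std::uint32_t off = 4; off <= bytes; off += 4) {
        mem.touch(top - off);
    }
    return mem.faults - before;
}
```

With only the top page present, pushing one page of data takes no fault at all (1024 checked accesses that all pass), and the first fault only happens once the stack reaches the next page down.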

I don't know if games use page faults like this; things to check would be
Win3.x (which surely uses them) and some heavy games (Descent, Borland extender games).

So the rewrites that remove handler checks (the

3) most accesses are direct (positive threshold is reached) -> the code is rewritten to access the memory directly - here is the speed increase

part) will introduce problems for certain games, but I don't have any clue
if a large number of games would be affected (I wouldn't expect that).

Reply 23 of 28, by M-HT

Rank Newbie
wd wrote:

In pmode with paging enabled, memory access is controlled by flags that
specify how a 4k page can be accessed. If an access is denied, a page fault
happens (telling that a page doesn't exist, is read-only etc.) and the pmode
extender/OS software part handles the information accordingly (like reading
the page from disk and changing the flags, then returning to user code).
The stuff that cares about the checks and that triggers the page faults is
in the InitPage handlers.

Simple example that may occur (that is not a memory sweep) is a stack that
is supposed to occupy only as few pages as needed, but may grow (almost)
infinitely. In that case the "next page for the stack" is marked as non-present
and if the stack hits this page, the extender/OS part allocates memory for
that page, thus the stack code works fine. The amount of memory reads
may be very large until such a page is hit.

I don't know if games use page faults like this; things to check would be
Win3.x (which surely uses them) and some heavy games (Descent, Borland extender games).

So the rewrites that remove handler checks (the

3) most accesses are direct (positive threshold is reached) -> the code is rewritten to access the memory directly - here is the speed increase

part) will introduce problems for certain games, but I don't have any clue
if a large number of games would be affected (I wouldn't expect that).

Now I understand what you mean.

Your example with the growing stack - Windows programs work like that, but I'm not sure about DOS programs.
I had to increase the stack size when compiling my test program, because I was getting a stack overflow otherwise.
I used the Open Watcom compiler and the DOS/32A extender.

Seems like testing is in order now. I'll try to make an implementation for x86 dynrec and test if it brings some speedup (more than memory inlining) and how it works for some games.

Reply 25 of 28, by M-HT

Rank Newbie
Qbix wrote:

Maybe reset the handler stuff if the paging handler changes?

I don't think this would help.
In the growing stack example, the handler is set and doesn't change. While the stack is large enough, the code is rewritten for direct access. When the stack grows beyond the limit, the handler for the page should be called. But it won't be, because the code was rewritten for direct access and no paging handler change occurred.

And I think this would be difficult to implement.

Reply 26 of 28, by wd

Rank DOSBox Author

Maybe reset the handler stuff if the paging handler changes?

Paging handlers don't change; the read address changes and advances
into a page with a handler. You'd only notice that when checking each
memory access (as currently done), but not if the heuristics removed
this check for performance reasons.
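
The failure mode can be sketched in a few lines of C++. This is a deliberately simplified model with made-up names (`Site`, `access`), assuming a site is rewritten after 4096 direct accesses as discussed earlier in the thread; it only shows why a later access into a handler page goes unnoticed.

```cpp
#include <cassert>

constexpr int kThreshold = 4096;  // threshold discussed in the thread

// Hypothetical per-site state for this sketch.
struct Site {
    int direct_hits = 0;
    bool rewritten = false;  // true once the check has been removed
};

// Performs one access at this site. Returns true if the access was
// handled correctly, false if a handler page was silently bypassed.
inline bool access(Site& s, bool page_has_handler, int& handler_calls) {
    if (s.rewritten) {
        // Direct access: no per-access check is left, so a page with a
        // handler is treated like plain memory.
        return !page_has_handler;
    }
    if (page_has_handler) {
        ++handler_calls;  // checked path: the handler is reached
        return true;
    }
    if (++s.direct_hits >= kThreshold) s.rewritten = true;
    return true;
}

// Usage example: the handler is missed only after the rewrite.
inline void example_missed_handler() {
    Site s;
    int calls = 0;
    for (int i = 0; i < kThreshold; ++i) assert(access(s, false, calls));
    assert(s.rewritten);
    assert(!access(s, true, calls));  // the handler page is bypassed
    assert(calls == 0);
}
```

Nothing about the paging handlers changed between the last good access and the bad one; only the address did, which is why resetting on handler changes would not catch it.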

Reply 27 of 28, by M-HT

Rank Newbie

Looks like the last idea was a bust. I tested 15 different games and all of them crashed with my test implementation. The only thing that didn't crash was my test benchmarking program, and the measured speedup was quite small. So I'm not going to work on this idea anymore.

On the bright side, since I learned how the existing code works, I optimized the inlined 32-bit accesses a bit. The code is in the attachment.
The strange thing is that when I also inlined 16-bit accesses, the result was actually slower (which is probably the reason they aren't inlined in the first place).

Attachments

  • decoder.h (81.67 KiB, 367 downloads; file license: Fair use/fair dealing exception)

Reply 28 of 28, by wd

Rank DOSBox Author

The problem with the inlining is simply that it bloats the code (checks and stuff),
and initially I was expecting more speedup from it than there was (32-bit memory
access, that is).