VOGONS


First post, by peterferrie

User metadata
Rank Oldbie

I looked today at how emm386.exe does DMA transfers, to see why it's not affected by wraparounds. What happens is that it has an internal 64kb buffer, and copies to/from there prior to performing the transfer. The internal buffer is required for DMA transfer because the underlying physical pages might not be contiguous in the requester's buffer. The copy is also done using flat mode so there's no wraparound in the requester's buffer, either.
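
To illustrate the bounce-buffer idea, here is a minimal sketch of gathering a request from non-contiguous physical pages into one contiguous buffer. It is not EMM386's actual code: `GatherToBounce`, `page_map`, and the flat `phys` vector are hypothetical stand-ins, and 4 KB pages are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names throughout; this only models the idea described above.
constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kBounceSize = 64 * 1024;

// Gather `size` bytes starting at linear address `linear` into one
// contiguous bounce buffer. page_map[i] is the physical page backing
// linear page i; neighbouring linear pages need not be adjacent in `phys`.
std::vector<std::uint8_t> GatherToBounce(const std::vector<std::uint8_t>& phys,
                                         const std::vector<std::size_t>& page_map,
                                         std::size_t linear, std::size_t size) {
    assert(size <= kBounceSize);
    std::vector<std::uint8_t> bounce(size);
    std::size_t done = 0;
    while (done < size) {
        const std::size_t page = (linear + done) / kPageSize;
        const std::size_t in_page = (linear + done) % kPageSize;
        assert(page < page_map.size());
        // Copy up to the end of this page, or less if the request ends first.
        std::size_t chunk = kPageSize - in_page;
        if (chunk > size - done) chunk = size - done;
        std::memcpy(bounce.data() + done,
                    phys.data() + page_map[page] * kPageSize + in_page, chunk);
        done += chunk;
    }
    return bounce;
}
```

Because the copy walks linear addresses and the device only ever sees the contiguous buffer, neither side of the transfer has a 64 KB boundary to wrap at.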

That leads to a question regarding this code:
for ( ; size ; size--, offset++) {
    if (offset>(dma_wrapping<<dma16)) {
        LOG_MSG(...
    }
    offset &= dma_wrap;
Why not check for overflow before the copy starts?

Checking after every byte incurs quite a lot of overhead. Moving the check outside of the 'for' loop would also allow switching to an internal buffer to complete the transfer, and thus avoiding the wraparound completely. The &dma_wrap could then be removed, too.
That would improve the compatibility for several games.
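
As a rough sketch of what a single up-front check could look like: the helper below tests once whether the transfer crosses the end of the 64 KB page and then does at most two block copies. Everything here is hypothetical and simplified (8-bit channel only, so no `dma16` shift, and a flat 64 KB vector instead of paged memory). It keeps the wrapping behaviour; switching to an internal buffer would replace the second copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not DOSBox code.
constexpr std::size_t kDmaPageSize = 0x10000;  // one 64 KB DMA page

// Read `size` bytes starting at `offset`, wrapping to the start of the
// page at most once. Returns true if the transfer wrapped.
bool ReadWithSingleWrapCheck(const std::vector<std::uint8_t>& dma_page,
                             std::size_t offset, std::size_t size,
                             std::uint8_t* out) {
    assert(dma_page.size() == kDmaPageSize);
    assert(offset < kDmaPageSize && size <= kDmaPageSize);
    // One check before the copy replaces the per-byte test in the loop.
    const bool wraps = offset + size > kDmaPageSize;
    const std::size_t first = wraps ? kDmaPageSize - offset : size;
    std::memcpy(out, dma_page.data() + offset, first);
    if (wraps) {
        // The remainder continues from offset 0 of the same page.
        std::memcpy(out + first, dma_page.data(), size - first);
    }
    return wraps;
}
```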

Reply 1 of 6, by peterferrie

Here's the code for my idea.

Reply 2 of 6, by ripsaw8080

User metadata
Rank DOSBox Author

The wrap check is for a log message, so it doesn't really have to be there. I think the check only needs to be inside the loop if more than one wrap per block is possible. If there can be only one wrap, it's easy to move the check outside:

  size <<= dma16;
  offset <<= dma16;
  Bit32u dma_wrap = ((0xffff<<dma16)+dma16) | dma_wrapping;
+ if ((offset+size)>(dma_wrapping<<dma16)) {
+     LOG_MSG("DMA segbound wrapping (read): %x:%x size %x [%x] wrap %x",spage,(dma_wrapping<<dma16)+1,(offset+size)-((dma_wrapping<<dma16)+1),dma16,dma_wrapping);
+ }
  for ( ; size ; size--, offset++) {
-     if (offset>(dma_wrapping<<dma16)) {
-         LOG_MSG("DMA segbound wrapping (read): %x:%x size %x [%x] wrap %x",spage,offset,size,dma16,dma_wrapping);
-     }
      offset &= dma_wrap;
      Bitu page = highpart_addr_page+(offset >> 12);

I tried using both messages with games that wrap, and so far have only seen one wrap per block, and both messages report the same numbers.

DOSBox now has a setting for ems=emm386 in SVN. Would it make sense to hook the EMM386-specific behavior to that?

Seems like memory allocation and copying will have some performance hit, so what is the gain? Specifically, which games work better?

Reply 3 of 6, by peterferrie

It can't wrap more than once, because it's limited to 64kb.
The emm386 option actually disables the wrapping, so the behaviour is the same as what I have, and without changing that routine. However, my code avoids recalculating the page for every byte. If nothing else, the 'for' loop should be written as the 'while' loop that I have. I align to a page and then copy an entire page at a time. That's much faster.
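
For anyone reading without the attachment, this is roughly the difference between the two loop shapes. It is only a sketch under assumptions: `PagedMemory` and both functions are hypothetical stand-ins (not the attached patch and not DOSBox's memory API), with a lookup counter added so the cost can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kPage = 4096;

// Hypothetical stand-in for paged guest memory; counts page lookups.
struct PagedMemory {
    std::vector<std::uint8_t> bytes;
    std::size_t lookups = 0;
    const std::uint8_t* PageBase(std::size_t page) {
        ++lookups;
        assert((page + 1) * kPage <= bytes.size());
        return bytes.data() + page * kPage;
    }
};

// Byte-wise 'for' loop: resolves the page again for every byte.
void ReadByteWise(PagedMemory& mem, std::size_t addr, std::size_t size,
                  std::uint8_t* out) {
    for (; size; --size, ++addr) {
        *out++ = mem.PageBase(addr / kPage)[addr % kPage];
    }
}

// Page-wise 'while' loop: one lookup per page touched, then a block copy.
void ReadPageWise(PagedMemory& mem, std::size_t addr, std::size_t size,
                  std::uint8_t* out) {
    while (size) {
        const std::size_t in_page = addr % kPage;
        std::size_t chunk = kPage - in_page;  // bytes left in this page
        if (chunk > size) chunk = size;
        std::memcpy(out, mem.PageBase(addr / kPage) + in_page, chunk);
        out += chunk;
        addr += chunk;
        size -= chunk;
    }
}
```

The first chunk runs from the start address to the next page boundary, so every later chunk starts page-aligned and covers a whole page until the tail.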

Reply 4 of 6, by peterferrie

Okay, here's the original code, but rewritten to use the fast 'while' loop.

Reply 5 of 6, by ripsaw8080

I've only seen DmaChannel::Read() calling the block read for a few bytes at a time (the "want" amount), so it seems like MEM_BlockRead() would be more effective if the sizes were larger.

DOSBox does not currently support decrementing transfers, but if such support were to be added, the existing block read loop is easy to change because of its byte-wise approach. I suppose MEM_BlockRead could have a decrementing variant, though. DOS games apparently tend not to use decrementing transfers, but there are at least a couple of demoscene intros that do (and don't work correctly in DOSBox because of it).

Reply 6 of 6, by peterferrie

Supporting that would be easy enough - an 'if' to see if the bit is set, and then a variation of my code, but using a 'for' loop instead of the block copy, if it is.
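
Something along these lines, as a minimal sketch. `DmaRead` and its `decrement` flag are hypothetical (the flag stands in for the channel's decrement bit), memory is a flat vector, and page wrapping is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not DOSBox code. Reads `size` bytes from `mem`
// starting at `addr`. With `decrement` set, the address steps downward
// after each byte, so `out` receives the bytes in reverse memory order.
void DmaRead(const std::vector<std::uint8_t>& mem, std::size_t addr,
             std::size_t size, bool decrement, std::uint8_t* out) {
    if (decrement) {
        // Byte-wise 'for' loop: a block copy cannot run backwards.
        assert(size == 0 || (addr < mem.size() && size <= addr + 1));
        for (std::size_t i = 0; i < size; ++i) out[i] = mem[addr - i];
    } else {
        // Incrementing case keeps the fast block copy.
        assert(addr + size <= mem.size());
        std::memcpy(out, mem.data() + addr, size);
    }
}
```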