VOGONS


First post, by kjliew

Looking at DOSBox's implementation of PageFaultCore() in paging.cpp, it sounds like on the very first page fault it pushes the state onto a queue and switches to the full core to decode instructions. After some time the full core decoder returns, and a check is performed to see whether execution has reached the instruction to restart after the page fault. If a match is found, it pops the state and switches back to the dyn_x86 core. All is good.

Now the problem. If the match is never found, it keeps decoding instructions with the full core, and one or more later page faults can push the queue beyond its limit, at which point DOSBox crashes. If it never hits the queue limit, it stays stuck on the full core for instruction decoding, and DOSBox slows to a crawl in demanding games, which typically need the dyn_x86 core for full enjoyment. So increasing PF_QUEUESIZE can avoid the DOSBox crash, but the performance hit remains.
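To make the two failure modes concrete, here is a minimal sketch of the mechanism as I understand it from reading the code. This is not the DOSBox source: apart from PF_QUEUESIZE, every name below (FaultEntry, FaultQueueModel, on_page_fault, on_core_return) is made up for illustration, and the queue size of 16 is an assumed value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model of the behaviour described above.
// Only the name PF_QUEUESIZE comes from DOSBox; its value here is assumed.
constexpr std::size_t PF_QUEUESIZE = 16;

struct FaultEntry {
    std::uint16_t cs;
    std::uint32_t eip;
};

enum class Core { Dynamic, Full };

struct FaultQueueModel {
    FaultEntry entries[PF_QUEUESIZE];
    std::size_t used = 0;

    // Called when a page fault is raised. Returns false on overflow,
    // which is the point where the emulator would give up and exit.
    bool on_page_fault(std::uint16_t cs, std::uint32_t eip) {
        assert(used <= PF_QUEUESIZE);
        if (used == PF_QUEUESIZE) return false;
        entries[used++] = FaultEntry{cs, eip};
        return true;
    }

    // Called each time the interpreting core returns. The newest entry
    // is popped only if execution is back at the address that faulted.
    bool on_core_return(std::uint16_t cs, std::uint32_t eip) {
        if (used == 0) return false;
        const FaultEntry& top = entries[used - 1];
        if (top.cs != cs || top.eip != eip) return false;
        --used;
        return true;
    }

    // While any entry is still pending, the slow core stays active.
    Core active_core() const {
        return used == 0 ? Core::Dynamic : Core::Full;
    }
};
```

In this model a single entry that never matches keeps active_core() at Core::Full forever, and PF_QUEUESIZE unmatched faults in a row make on_page_fault() fail. A bigger queue only postpones the second case and does nothing about the first.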

In theory, this can happen whenever paging is employed. But for typical DOS games, which use a single-threaded DOS extender stub lacking virtual memory support, I can understand that the likelihood is close to nil.

However, this is the major roadblock to getting Win9x to work under DOSBox. I would even suggest that running WFW 3.11 with 32-bit disk/file access and native display drivers (not standard VGA) could also expose the problem. In fact, if a DOS game used a multi-threaded DOS extender with virtual memory support, such a game would either fail to run in DOSBox or slow to a crawl because the dyn_x86 core unintentionally becomes unavailable.

Thanks go to h-a-l-9000: his paging patch partly resolves the problem and is undoubtedly the key to getting a stable, working Win9x under DOSBox. However, h-a-l-9000's paging patch does not seem to solve all the page fault issues. It seems able to identify which page faults have a matching instruction re-entry point for restarting and which ones don't (I still don't quite understand how he pulled off such a marvellous trick). But sometimes it also misses some, and pf_queue is never fully flushed. The whole thing is non-deterministic, so I had to enable some logging in DOSBox to monitor pf_queue. Whenever I boot to the Win9x desktop without pf_queue flushed, I just restart. Once I am at the Win9x desktop with pf_queue flushed, I never have any problems going forward. The MechWarrior 2 Titanium series and Half-Life both run well with gulikoza's Glide pass-through patch. An optimized build (-O2 -fomit-frame-pointer) is less likely to end up with an un-flushed pf_queue than a debug build (--disable-core-inline -O0 -gdwarf-2). Similarly, running on a more powerful CPU is less likely to leave pf_queue un-flushed (Core 2 Duo vs Core i5).

I would plead with the DOSBox devs to take another look at the existing page fault handling and find a way to solve this elegantly. This is low-hanging fruit for DOSBox Win9x support. I fully understand the complexity of supporting a Win9x guest in DOSBox, but it looks like DOSBox already has the necessary pieces to be the best platform for emulating a Win9x guest for gaming purposes, once the page fault handling is resolved.

IMHO this is equally important for DOS games. While DOSBox may not crash when pf_queue is left un-flushed, it will slow to a crawl, as most games designed to use DOS extenders are very CPU demanding.

Reply 1 of 4, by Joey_sw

For games like Elder Scrolls: Battlespire (which uses the CauseWay extender), the official updated version/patch is also said to be able to take advantage of Win9x virtual memory, but that also makes the game less compatible with the current state of DOSBox.

The game performs well with the dynamic core but also crashes more often when it is not supposed to. With the simple/normal core there is, as the OP said, a performance hit, but it is also less likely to crash.

-fffuuu

Reply 2 of 4, by dreamlayers

The whole approach of running CPU exceptions in a function that runs a nested invocation of the CPU emulator is wrong. It conceptually matches how exceptions are sometimes used, but it does not match the way the CPU works. Exceptions should instead just change processor state and continue execution at the exception handler code.

A key difference is that the faulting instruction is aborted, and then executed again if the exception handler returns to its CS:EIP. There is no need for the emulator to do anything special here. Emulated code does all the work. With a page fault, the second execution will probably trigger a TLB load and succeed. Currently, DOSBox instead continues the original execution of the instruction which caused the fault.
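As a rough illustration of that abort-and-restart behaviour, here is a toy model. It is not emulator code: ToyCpu and all of its members are invented for this sketch, each "instruction" touches exactly one page, and saved_eip stands in for the return frame the real CPU pushes on the stack.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy machine, entirely hypothetical. It only illustrates that a fault
// aborts the instruction and the handler's return re-executes it.
struct ToyCpu {
    std::uint32_t eip = 0;
    std::uint32_t cr2 = 0;            // faulting address, as on x86
    std::uint32_t saved_eip = 0;      // stands in for the stacked return frame
    bool in_handler = false;
    std::set<std::uint32_t> present;  // pages currently mapped

    // Execute one guest instruction that accesses `page`.
    // Returns true if it completed, false if a fault aborted it.
    bool step(std::uint32_t page) {
        if (present.count(page) == 0) {
            // Fault: nothing is retired. Record state, redirect to the
            // handler, and let normal execution carry on from there.
            cr2 = page;
            saved_eip = eip;          // address of the faulting instruction
            in_handler = true;
            return false;
        }
        ++eip;                        // instruction retires normally
        return true;
    }

    // What a guest handler followed by IRET amounts to in this toy.
    void handler_maps_page_and_returns() {
        present.insert(cr2);
        eip = saved_eip;
        in_handler = false;
    }
};
```

Note that step() never calls into a nested run loop. After the handler returns, the caller simply executes the instruction at the restored EIP again, and this time the page is present.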

A multitasking operating system does not have to return from page faults in last in, first out order. First in, first out makes more sense when page faults involve disk access. The current design cannot handle that. PageFaultCore() could be altered to check against all CS:EIP values in pf_queue, but it has no way to return from anything but the most recent page fault. (If the page fault handler returns, it will continue execution of the instruction that triggered the most recent page fault.)
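A small sketch of the ordering problem, using invented names (Pending, match_newest, match_any) rather than anything from paging.cpp. It compares a check against only the newest pending fault with a scan over all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of a page fault that has not been returned from yet.
struct Pending {
    std::uint16_t cs;
    std::uint32_t eip;
};

// LIFO-only check: index of the newest entry if it matches, otherwise -1.
int match_newest(const std::vector<Pending>& q,
                 std::uint16_t cs, std::uint32_t eip) {
    if (q.empty()) return -1;
    const Pending& top = q.back();
    return (top.cs == cs && top.eip == eip)
               ? static_cast<int>(q.size() - 1)
               : -1;
}

// Scan every pending entry, newest first; index of the match or -1.
int match_any(const std::vector<Pending>& q,
              std::uint16_t cs, std::uint32_t eip) {
    for (std::size_t i = q.size(); i-- > 0;) {
        if (q[i].cs == cs && q[i].eip == eip) return static_cast<int>(i);
    }
    return -1;
}
```

With two pending faults, a guest that resumes the older one first is missed by match_newest() but found by match_any(). Finding the entry is only half of the job, though: with nested core invocations there is still no way to unwind to anything except the innermost one.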

Never returning to the location that caused the page fault is hopefully non-existent when everything works properly. It can certainly happen when an operating system detects an error. If a page fault kills a process (SIGSEGV/SIGBUS) there won't be a return. Linux also rewrites some page fault return addresses in the kernel.

The best way to see what's going on is to have PageFaultCore() print out some information. Printing page fault entry addresses and only transitions between kernel and user space in PageFaultCore() has helped me understand what is going on with Linux. Collecting information this way may provide ideas on how to make things a bit better. Unfortunately, actually properly fixing the paging code is a lot of work.
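For the "only transitions" part, something along these lines is enough. This is a standalone sketch, not a patch: TransitionLogger is a made-up helper, it treats CPL 3 as user space and everything else as kernel, and it collects lines in a vector where a real hook would call the emulator's logging function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: records a line only when the privilege level
// seen at the hook flips between kernel and user.
struct TransitionLogger {
    bool have_last = false;
    bool last_user = false;
    std::vector<std::string> lines;

    void observe(int cpl, std::uint32_t eip) {
        const bool now_user = (cpl == 3);
        if (have_last && now_user != last_user) {
            char buf[64];
            std::snprintf(buf, sizeof buf, "%s -> %s at EIP=%08lx",
                          last_user ? "user" : "kernel",
                          now_user ? "user" : "kernel",
                          static_cast<unsigned long>(eip));
            lines.push_back(buf);
        }
        last_user = now_user;
        have_last = true;
    }
};
```

Calling observe() on every pass through the page fault loop keeps the log short enough to read, because repeated passes at the same privilege level produce no output.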

The paging code also has another problem: it fails to re-read page table entries after some page faults. When InitPage() sets accessed and dirty flags, it can overwrite changes made in the earlier page fault. This triggers BUG_ON() assertions in Linux paging code. Fortunately, this is easy to fix.
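The overwrite can be shown in isolation. The flag bit positions below are the standard x86 ones (present = bit 0, accessed = bit 5, dirty = bit 6); the function names are invented, and how closely buggy_update() mirrors the actual InitPage() code is an assumption on my part.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t PTE_PRESENT  = 1u << 0;
constexpr std::uint32_t PTE_ACCESSED = 1u << 5;
constexpr std::uint32_t PTE_DIRTY    = 1u << 6;

// Add the accessed flag, and the dirty flag for writes.
std::uint32_t with_flags(std::uint32_t pte, bool is_write) {
    return pte | PTE_ACCESSED | (is_write ? PTE_DIRTY : 0u);
}

// Problem pattern (hypothetical): the flags are added to a copy that
// was read before the guest's fault handler ran, then written back.
void buggy_update(std::uint32_t& pte_in_memory,
                  std::uint32_t cached_before_fault, bool is_write) {
    pte_in_memory = with_flags(cached_before_fault, is_write);
}

// Safer pattern: re-read the entry after the fault, then add the flags.
void fixed_update(std::uint32_t& pte_in_memory, bool is_write) {
    const std::uint32_t fresh = pte_in_memory;
    pte_in_memory = with_flags(fresh, is_write);
}
```

If the entry was empty when it was cached and the guest handler then installs a present mapping, buggy_update() writes back an entry with only the accessed/dirty bits set and loses the mapping. fixed_update() keeps whatever the handler wrote.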

Reply 4 of 4, by danoon

I did what dreamlayers mentioned in the Java port: I don't match PFs and just return to normal execution. The reason DOSBox doesn't do this is that if a PF happens during an emulated DOS or BIOS call, we can't simply pop out of that function, because we won't know the state when we re-enter. In jDosbox I just detect whether we are in an emulated BIOS or DOS call, and if so I use the old PF matching method before returning to the function; otherwise I return immediately from the PF function (no nested core). I load the Bochs BIOS when booting Win95, and at that point there are no emulated BIOS or DOS calls, so PFs never run the nested core looking for a PF match that sometimes won't happen.
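The decision itself is small. Here it is sketched in C++ rather than Java, to match DOSBox; the names (EmulatedCallTracker, EmulatedCallScope, choose_fault_path) are invented for this post and are not the jDosbox ones, and the depth counter is just one assumed way to know that an emulated call is in progress.

```cpp
#include <cassert>

// Hypothetical bookkeeping: how many emulated BIOS/DOS handlers are
// currently on the host call stack.
struct EmulatedCallTracker {
    int depth = 0;
    bool inside() const { return depth > 0; }
};

// RAII guard placed at the top of each emulated BIOS/DOS handler.
class EmulatedCallScope {
public:
    explicit EmulatedCallScope(EmulatedCallTracker& t) : tracker_(t) {
        ++tracker_.depth;
    }
    ~EmulatedCallScope() { --tracker_.depth; }
    EmulatedCallScope(const EmulatedCallScope&) = delete;
    EmulatedCallScope& operator=(const EmulatedCallScope&) = delete;

private:
    EmulatedCallTracker& tracker_;
};

enum class FaultPath { NestedMatch, ReturnImmediately };

// Use the old nested matching only when host-side state must be kept.
FaultPath choose_fault_path(const EmulatedCallTracker& t) {
    return t.inside() ? FaultPath::NestedMatch
                      : FaultPath::ReturnImmediately;
}
```

Once the guest runs on a real BIOS image and its own DOS, the tracker never reports an emulated call in progress, so every fault takes the ReturnImmediately path.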

http://www.boxedwine.org/