Reply 1740 of 2419, by hail-to-the-ryzen
I'm building a binary now. Thanks for the tip.
I added fpu_control.h from your web link. It is not included with mingw32, at least not in any of the usual directories.
Hm... perhaps I should just copy the macro from the header then. MinGW Windows is one of the build targets I make for DOSBox-X.
I also commented out the include of features.h in fpu_control.h. It seems to build OK, and I didn't see any reference to definitions outside fpu_control.h.
It's built and the log shows the correct fpu header file was included:
FPU:FPU core: long double FPU
FPU:FPU32 selftest passed
FPU:FPU64 selftest passed
FPU:FPU80 selftest passed
I tested the Explora demo and got the same result you reported. I will test Quake next.
Quake seems to work fine. Compared to the previous non-x86 fpu core in speed, it seems the cost of long doubles is low, perhaps ~5%.
That is a good plan. It may work to only adapt a small set of fpu instructions for long double precision. Testing with the x86 fpu code shows the rendering artifacts are reproduced by reduced precision in the following two functions (tested against the beginning of the Explora demo):
diff -rupN dosbox-Orig//src/fpu/fpu_instructions_x86.h dosbox//src/fpu/fpu_instructions_x86.h
--- dosbox-Orig//src/fpu/fpu_instructions_x86.h
+++ dosbox//src/fpu/fpu_instructions_x86.h
@@ -1121,11 +1121,15 @@ static void FPU_FDIVR_EA(Bitu op1){
}
static void FPU_FMUL(Bitu op1, Bitu op2){
+ FPU_SetCW(0x37F);
FPUD_ARITH1(fmulp)
+ FPU_SetCW(0x3FF);
}
static void FPU_FMUL_EA(Bitu op1){
+ FPU_SetCW(0x37F);
FPUD_ARITH1_EA(fmulp)
+ FPU_SetCW(0x3FF);
}
static void FPU_FSUB(Bitu op1, Bitu op2){
I have some questions about those values.
According to this source: http://home.agh.edu.pl/~amrozek/x87.pdf
Bits 6-7 are not defined (0x7F and 0xFF), and bits 9-8 define the precision (0x300), so how does that affect the demo exactly?
It should have changed bit 9 to 1 from 0, so bit 8 and 9 are 11 for extended precision.
Edit: 0x37f should reflect doubles and 0x3ff flipping the bit for long doubles, at least from my binary to hex calculator. :}
The difference between 0x37F and 0x3FF is bit 7 (0x80), which is not listed to contain anything.
The latest commit adds code to update the FPU control word for ADD, SUB, MUL, DIV, which in the long double FPU code seems to fix the Explora glitches.
Adding the same to the non long-double non-x86 FPU code didn't fix anything. How can it when FPU operations in that code always truncate to 53-bit mantissa (64-bit) precision?
EDIT: However the control word code is probably not going to compile or work properly on my Raspberry Pi (arm7) so further changes will need to be done.
My calculation must have been off, so it must have changed another fpu parameter, such as rounding. 🙁
It does suggest, however, that the multiply is the main factor, but it would have to be confirmed (although not needed now).
Thank you for the commit! I'm building a binary now to test.
Note I added the 0x37F and 0x3FF to the x86 FPU code to test, not the non-x86 code. I was trying to cause the rendering artifact - is that what you asked?
When using the x86 FPU code, the only test to fail in Intel's i387 program is the transcendental test (SIN, COS, TAN, etc if I remember correctly).
EDIT: In the comprehensive test, the only failure is 35 cases of the "Scale" test (FSCALE instruction?)
That's interesting on the trigonometry functions! I also verified that I counted bit 7 incorrectly. So, I caused the fpu to reset to doubles both before and after instead of just before. However, it showed that the fmul is the reason, or at least one of the causes of the artifacts.
Does PCem show the same problem with the Scale test?
I'll try it. Can you point me to the software? Also, verified that Explora runs well. 😀
Edit: I found it.
wrote:wrote:The problem may not be related to just the difference in precision, fpu_instructions.h is a very minimal x87 implementation that doesn't always take FP exceptions when it should or set flags correctly for many instructions. Using "long double" won't fix that.
Intel made a utility to test for correct x87 operation: https://winworldpc.com/product/386sx-math-coprocess/10
Dosbox fails every test unless using fpu_instructions_x86.h, which is obviously limited to (32-bit) x86 only as it uses real x87 instructions.
That's a good find. I actually have an old 386SX system I acquired some time ago in which someone had installed an i387; I should try that utility on it.
Link in the comment.
Custom builds of both pcem and dosbox-x (interpreter cpu cores) are showing many errors in all categories.
Edit: core=dynamic is not generating those errors. In fact no errors that way.
It doesn't make sense that it is correct. Perhaps there is other software to use for testing?
Then are the Scale test errors due to a 64-bit versus 32-bit build? Is that possible?