I'm just asking to be sure: to the CPU, does a 2-operand opcode 69/6B(IMUL r16,imm8/16) even exist? Or does it always decode to 3 operands, with r/m and immediate being multiplied and stored into the reg operand?
Didn't we discuss this previously? The AX register as destination parameter is implicit in the documentation, but always present in reality and decodes in the usual way.
Huh? So the r16 in opcodes 69/6B is forced to be AX or EAX, depending on operand size? So the instruction only uses r/m16/32 and imm8/16/32 and always stores its result in (E)AX? Then what is the reg part of ModR/M used for? That's entirely different from what hottobar said a few posts back (reg16/32 is the destination of the result).
Or do you mean the normal GRP3a/b instructions, which always store in (E)AX?
For some reason the logs keep erroring out on the opcode 69/6B IMUL8/16/32 instructions?
@peterferrie: So, if I understand this correctly:
- If opcodes 69/6B r16 decodes as AX/EAX, then it disassembles as the two-operand version.
- If opcodes 69/6B r16 decodes as other registers, then it disassembles as the three-operand version.
In both cases, behaviour is as the three-operand version?
Is that correct? Or is the two-operand disassembly of opcodes 69/6B an error in the 80386 manuals?
Edit: Btw, what do you mean with "two-byte" and "three-byte"? Two-byte=F7/F8/67/6B and Three-byte=0F opcode variant? Or do you actually mean two-operand and three-operand?
@hottobar: I've just tried to single-step the DIV0 interrupt that happens at the first DIV instruction of the test suite. I see it doing its work with ESI (saving it), calling the printStr function to display the error, modifying the return address, and executing IRET to the return address. Then the RETD at the bottom of the handler returns to address 0010:00000000 instead of the original location of the loop that processes the OPs table. So something is going wrong at some point between the table dispatching the DIV entry (the call ESI instruction) and the return from the IRET handler?
Unfortunately, searching the log isn't that easy: it's a 19.2GB text file containing all the debugger information that's logged (the current common log format with memory logging enabled, using the latter, shorthand method mentioned in my earlier posts on the common log thread):
Be warned that it requires at least 256MB of memory to extract, as I've increased it from the default 64MB to improve the compression ratio (even though it's already at the Ultra compression level). I figured every bit of extra compression is useful with such a huge log file.
Strangely enough, using a simple large file text editor doesn't show to find any div instructions when searching the log for some odd reason?
Thinking about the problem, there must be a problem related to the stack somehow.
The stages should be as follows(assuming no stack changes):
call idivloc(esi)
*pushes eip*
idivloc: div causing int 0
*pushes flags*
*pushes cs*
*pushes eip*
calls log "#DE " and returns
modifies eip at [ESP] (confirmed in disassembly, which is missing from the log somehow?)
iret
*pops modified eip*
*pops original cs*
*pops original eflags*
retd
*pops original eip of test dispatch routine for op table* (reads 0x00000000 EIP from the stack)
All of these should be working correctly: calld/retd because of earlier tests and div(int)/iret because of earlier faults(bounds x2, paging fault).
Is there something different about the div fault handling compared to the other faults? Privilege changes or the like?
superfury wrote:
@peterferrie: So, if I understand this correctly:
- If opcodes 69/6B r16 decodes as AX/EAX, then it disassembles as the two-operand version.
- If opcodes 69/6B r16 decodes as other registers, then it disassembles as the three-operand version.
In both cases, behaviour is as the three-operand version?
Yes, that's correct.
superfury wrote:
Edit: Btw, what do you mean with "two-byte" and "three-byte"? Two-byte=F7/F8/67/6B and Three-byte=0F opcode variant? Or do you actually mean two-operand and three-operand?
Yes, I meant two-operand, not two-byte. My mistake.
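To make the confirmed behaviour concrete, here is a minimal sketch of the 16-bit register-direct case: the reg field of ModR/M always names the destination and the r/m field the first source, so the "two-operand" spelling is simply the encoding where both fields select the same register. The `Cpu16` structure, the `cf_of` field and the function names are hypothetical and are not UniPCemu's; memory operands and the 32-bit form are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: AX,CX,DX,BX,SP,BP,SI,DI in ModR/M order. */
typedef struct {
    uint16_t reg[8];
    int cf_of; /* IMUL sets CF and OF together */
} Cpu16;

/* Sign-extend a 16-bit pattern without out-of-range conversions. */
static int32_t sext16(uint16_t v)
{
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Opcode 6B /r ib (imm already sign-extended) or 69 /r iw, mod == 3 only. */
void imul_r16_rm16_imm(Cpu16 *cpu, uint8_t modrm, int16_t imm)
{
    assert((modrm >> 6) == 3);         /* memory operands are out of scope */
    unsigned dst = (modrm >> 3) & 7;   /* reg field: destination */
    unsigned src = modrm & 7;          /* r/m field: first source */
    int32_t product = sext16(cpu->reg[src]) * (int32_t)imm;
    cpu->reg[dst] = (uint16_t)product; /* only the low 16 bits are stored */
    cpu->cf_of = (product < INT16_MIN || product > INT16_MAX);
}
```

With ModR/M byte 0xC3 (reg=AX, r/m=BX) this behaves as a three-operand multiply into AX; with 0xDB (reg=BX, r/m=BX) the same code path gives what a disassembler would print in the two-operand form.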
superfury wrote:
Strangely enough, searching the log for div instructions with a simple large-file text editor doesn't find any, for some odd reason?
Thinking about the problem, there must be a problem related to the stack somehow.
The stages should be as follows(assuming no stack changes):
call idivloc(esi)
*pushes eip*
idivloc: div causing int 0
*pushes flags*
*pushes cs*
*pushes eip*
calls log "#DE " and returns
modifies eip at [ESP] (confirmed in disassembly, which is missing from the log somehow?)
iret
*pops modified eip*
*pops original cs*
*pops original eflags*
retd
*pops original eip of test dispatch routine for op table* (reads 0x00000000 EIP from the stack)
All of these should be working correctly: calld/retd because of earlier tests and div(int)/iret because of earlier faults(bounds x2, paging fault).
Is there something different about the div fault handling compared to the other faults? Privilege changes or the like?
The first question - is your stack pointer returned to the value that it should have when the call was made?
I suspect that's where the problem lies. Perhaps something is being left on the stack, like an error code or similar?
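One way to check the stack-pointer question is to model the sequence from the quoted post on a toy stack and confirm ESP ends where it started. The sketch below assumes a same-privilege 32-bit gate and that #DE pushes no error code; the `Stack` type and all function names are hypothetical, not taken from UniPCemu or the test suite.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 256 };

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp;
} Stack;

static void push32(Stack *s, uint32_t v)
{
    assert(s->esp >= 4);
    s->esp -= 4;                        /* always exactly 4 for a dword push */
    memcpy(&s->mem[s->esp], &v, 4);
}

static uint32_t pop32(Stack *s)
{
    uint32_t v;
    assert(s->esp + 4 <= STACK_BYTES);
    memcpy(&v, &s->mem[s->esp], 4);
    s->esp += 4;
    return v;
}

/* Runs CALL -> #DE -> handler -> IRET -> RETD; returns the EIP popped by RETD. */
uint32_t run_div_fault_sequence(Stack *s, uint32_t caller_eip, uint32_t fault_eip,
                                uint32_t resume_eip, uint32_t cs, uint32_t eflags)
{
    push32(s, caller_eip);                    /* call esi */
    push32(s, eflags);                        /* #DE entry: EFLAGS, CS, EIP */
    push32(s, cs);
    push32(s, fault_eip);
    memcpy(&s->mem[s->esp], &resume_eip, 4);  /* handler patches EIP at [ESP] */
    uint32_t eip = pop32(s);                  /* iret: EIP, CS, EFLAGS */
    uint32_t popped_cs = pop32(s);
    uint32_t popped_flags = pop32(s);
    assert(eip == resume_eip && popped_cs == cs && popped_flags == eflags);
    return pop32(s);                          /* retd */
}
```

If any push or pop moves ESP by the wrong amount, the final RETD reads a slot that was never written, which is consistent with an EIP of 0x00000000 coming off the stack.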
I've just looked at the DIV0 interrupt firing: I see some strange stack pushes there(according to Visual Studio debugger):
fbf8=original CS before INT0
fbf4=EFLAGS
f9f4=CS
f9f0=EIP
That is a crazy jump between EFLAGS and CS?
Edit: The cause is the 32-bit operand-size bit (0x8) being used directly as the shift count for "2 shifted left by the flag", where the flag is supposed to be the usual 0/1 value (word/dword). With the raw bit value the shift becomes 2<<8, moving ESP down by no less than 512 bytes :S That happened when pushing CS on the stack: ESP was decreased by that amount before CS was written to memory :S
Edit: Having fixed this, it now continues on towards the other tests:D
Edit: After stepping through, I see it properly finishing up now and entering the final HLT at the end of the program (after the return to real mode, loading all registers, clearing the interrupt flag and executing HLT at POST FF). 😁
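The ESP jump described in the first edit can be reproduced in a few lines. This is only an illustration of the class of bug: the flag name `OPSIZE32_FLAG` and both function names are made up here, and the real emulator code may look quite different.

```c
#include <stdint.h>

#define OPSIZE32_FLAG 0x8u /* hypothetical flag bit: set for 32-bit operands */

/* Buggy: the masked flag (0 or 8) is used as the shift count. */
uint32_t push_size_buggy(uint32_t flags)
{
    return 2u << (flags & OPSIZE32_FLAG);        /* 2 or 512 */
}

/* Fixed: normalise the flag to 0 or 1 before shifting. */
uint32_t push_size_fixed(uint32_t flags)
{
    return 2u << ((flags & OPSIZE32_FLAG) != 0); /* 2 or 4 */
}
```

The buggy variant yields 512 for a 32-bit push, which matches the gap between the logged addresses: 0xfbf4 - 0x200 = 0xf9f4.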
This is the EE log UniPCemu's generating:
The attachment porte9.log is no longer available
It does seem to give lots of errors from the IMUL8 onwards still? What's going wrong with those instructions? The same applies to the DIV instructions?
Edit: Looking at a simple diff from http://prettydiff.com/ , I see the following instructions failing:
- All IMUL8/16/32 instructions.
- DIVDX W regarding #DE(not faulting).
- DIVEDX W/DIVEDX D regarding #DE(not faulting).
- DIVAX W regarding #DE(not faulting).
- DIVEAX D regarding #DE(not faulting).
- IDIVDL B not always faulting #DE.
- IDIVEDX D not always faulting #DE.
- IDIVEDX D EAX = 80000001 EDX = FFFFFFFF PS = 0000 wrong result?
etc.
- Shift/rotate carry flags problems? :
- SALr B/W/D flags problems?
- SHRr B/W/D flags problems?
- ROLr B/W/D flags problems?
- RORr B/W/D flags problems?
- RCLr B/W/D problems?
- RCRr B/W/D problems?
So, simply said:
- flags problems on the shift/rotate instructions(mostly carry flag as far as I can see?)
- full output problems on IMUL8/16/32 instructions?
- #DE problems with DIV/IDIV instructions?
I know, but I assume that if the data size is constant (e.g. 64-bit integer, 32-bit integer, 16-bit integer, 8-bit integer), it shouldn't be a problem (e.g. uint_64 to int_64 and vice versa)? The only thing that effectively changes is the sign bit becoming a negative value of itself, assuming C/C++ compatibility with IEEE standards?
The problem is that compilers exploit undefined behaviours to do optimizations. A compiler can change the way it treats a specific UB without notice and your code could break in unpredictable ways.
See for example how GCC uses the signed overflow UB.
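As a small illustration of the point about signed overflow (a sketch with hypothetical function names, not code from either project): the first function relies on wrap-around that the C standard leaves undefined, so an optimising compiler is allowed to treat it as always true; the second tests against the limits before doing any arithmetic and stays well-defined.

```c
#include <limits.h>
#include <stdbool.h>

/* Undefined when a == INT_MAX: an optimiser may fold this to "true". */
bool next_is_larger_ub(int a)
{
    return a + 1 > a;
}

/* Well-defined: compare against the limits before adding. */
bool add_overflows(int a, int b)
{
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}
```

For emulator code, doing the arithmetic in unsigned types and deriving the flags from the bit patterns avoids the issue altogether, since unsigned arithmetic is defined to wrap.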
Aren't signed/unsigned integer values defined as an IEEE standard, thus never changing? If they were to change, lots of files would simply break due to invalid contents? So it should be safe to assume that using a union to convert between signed/unsigned variables won't pose a problem?
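For context on this question: integer representation and conversion rules come from the C and C++ language standards rather than from IEEE (IEEE 754 covers floating point). In C, converting an out-of-range unsigned value to a signed type is implementation-defined, and reading a union member other than the one last written is permitted in C but not guaranteed in C++. A conversion that avoids both can be sketched as follows; the function names are hypothetical.

```c
#include <stdint.h>

/* Reinterpret a 32-bit pattern as two's complement without any
   out-of-range signed conversion. */
int32_t as_signed32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX) return (int32_t)u;
    /* UINT32_MAX - u is in 0..INT32_MAX here, so the cast is in range. */
    return -(int32_t)(UINT32_MAX - u) - 1;
}

/* Signed to unsigned conversion is modular and always defined. */
uint32_t as_unsigned32(int32_t s)
{
    return (uint32_t)s;
}
```

In practice every mainstream compiler produces the expected two's complement result for a plain cast as well; the explicit form just removes the dependency on that.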
I've improved the overflow detection a bit, but it still doesn't catch all or in some cases seems to catch too many overflows?
/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
    uint_64 maxpositive,maxbit,errorrange;
    if (convertedtopositive) unsignedval = (~unsignedval)+1; //Convert to negative, if needed!
    maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
    maxbit = (1ULL<<(bits-1)); //The highest value we cannot set!
    errorrange = ((1ULL<<bits)-1)-maxpositive; //Lower roof of invalid range!
    if (unlikely((unsignedval>maxpositive) && ((unsignedval<maxbit)||(unsignedval>errorrange)))) //Signed underflow/overflow on unsinged conversion?
    {
        return 1; //Underflow/overflow detected!
    }
    return 0; //OK!
}
32-bit (I)DIV:
//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
    val: The value to divide
    divisor: The value to divide by
    quotient: Quotient result container
    remainder: Remainder result container
    error: 1 on error(DIV0), 0 when valid.
    resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
    SHLcycle: The amount of cycles for each SHL.
    ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
    issigned: Signed division?
    quotientnegative: Quotient is signed negative result?
    remaindernegative: Remainder is signed negative result?

*/
void CPU80386_internal_DIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
    uint_64 temp, temp2, currentquotient; //Remaining value and current divisor!
    uint_64 resultquotient;
    byte shift; //The shift to apply! No match on 0 shift is done!
    temp = val; //Load the value to divide!
    *applycycles = 1; //Default: apply the cycles normally!
    if (divisor==0) //Not able to divide?
    {
        *quotient = 0;
        *remainder = temp; //Unable to comply!
        *error = 1; //Divide by 0 error!
        return; //Abort: division by 0!
    }

    if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
    {
        SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
        *applycycles = 0; //Don't apply the cycles anymore!
    }

    temp = val; //Load the remainder to use!
    resultquotient = 0; //Default: we have nothing after division!
    nextstep:
    //First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
    temp2 = divisor; //Load the default divisor for x1!
    if (temp2>temp) //Not enough to divide? We're done!
    {
        goto gotresult; //We've gotten a result!
    }
    currentquotient = 1; //We're starting with x1 factor!
    for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
    {
        if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
        {
            CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
            break; //We've found our shift!
        }
        temp2 <<= 1; //Shift to the next position!
        currentquotient <<= 1; //Shift to the next result!
        CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
    }
    if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    //Second step: substract divisor<<n from remainder and increase result with 1<<n.
    temp -= temp2; //Substract divisor<<n from remainder!
    resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
    CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
    goto nextstep; //Start the next step!
    //Finished when remainder<divisor or remainder==0.
    gotresult: //We've gotten a result!
    if ((uint_64)temp>((1ULL<<resultbits)-1)) //Modulo overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if ((uint_64)resultquotient>((1ULL<<resultbits)-1)) //Quotient overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (issigned) //Check for signed overflow as well?
    {
        /*
        if (checkSignedOverflow((uint_64)temp,64,resultbits,remaindernegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
        */
        if (checkSignedOverflow((uint_64)resultquotient,64,resultbits,quotientnegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
    }
    *quotient = resultquotient; //Quotient calculated!
    *remainder = temp; //Give the modulo! The result is already calculated!
    *error = 0; //We're having a valid result!
}

void CPU80386_internal_IDIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
    byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
    quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
    if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
    {
        quotientnegative = 1; //We're to toggle the result sign if not zero!
    }
    if (val&0x80000000) //Negative value to divide?
    {
        val = ((~val)+1); //Convert the negative value to be positive!
        remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
    }
    if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
    {
        divisor = ((~divisor)+1); //Convert the divisor to be positive!
    }
    CPU80386_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
    if (*error==0) //No error has occurred? Do post-processing of the results!
    {
        if (quotientnegative) //The result is negative?
        {
            *quotient = (~*quotient)+1; //Apply the new sign to the result!
        }
        if (remaindernegative) //The remainder is negative?
        {
            *remainder = (~*remainder)+1; //Apply the new sign to the remainder!
        }
    }
}
16-bit (I)DIV:
//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
    val: The value to divide
    divisor: The value to divide by
    quotient: Quotient result container
    remainder: Remainder result container
    error: 1 on error(DIV0), 0 when valid.
    resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
    SHLcycle: The amount of cycles for each SHL.
    ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
    issigned: Signed division?
    quotientnegative: Quotient is signed negative result?
    remaindernegative: Remainder is signed negative result?

*/
void CPU8086_internal_DIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
    uint_32 temp, temp2, currentquotient; //Remaining value and current divisor!
    uint_32 resultquotient;
    byte shift; //The shift to apply! No match on 0 shift is done!
    temp = val; //Load the value to divide!
    *applycycles = 1; //Default: apply the cycles normally!
    if (divisor==0) //Not able to divide?
    {
        *quotient = 0;
        *remainder = temp; //Unable to comply!
        *error = 1; //Divide by 0 error!
        return; //Abort: division by 0!
    }

    if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
    {
        SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
        *applycycles = 0; //Don't apply the cycles anymore!
    }

    temp = val; //Load the remainder to use!
    resultquotient = 0; //Default: we have nothing after division!
    nextstep:
    //First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
    temp2 = divisor; //Load the default divisor for x1!
    if (temp2>temp) //Not enough to divide? We're done!
    {
        goto gotresult; //We've gotten a result!
    }
    currentquotient = 1; //We're starting with x1 factor!
    for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
    {
        if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
        {
            CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
            break; //We've found our shift!
        }
        temp2 <<= 1; //Shift to the next position!
        currentquotient <<= 1; //Shift to the next result!
        CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
    }
    if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    //Second step: substract divisor<<n from remainder and increase result with 1<<n.
    temp -= temp2; //Substract divisor<<n from remainder!
    resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
    CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
    goto nextstep; //Start the next step!
    //Finished when remainder<divisor or remainder==0.
    gotresult: //We've gotten a result!
    if (temp>((1<<resultbits)-1)) //Modulo overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (resultquotient>((1<<resultbits)-1)) //Quotient overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (issigned) //Check for signed overflow as well?
    {
        /*
        if (checkSignedOverflow(temp,32,resultbits,remaindernegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
        */
        if (checkSignedOverflow(resultquotient,32,resultbits,quotientnegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
    }
    *quotient = resultquotient; //Quotient calculated!
    *remainder = temp; //Give the modulo! The result is already calculated!
    *error = 0; //We're having a valid result!
}

void CPU8086_internal_IDIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
    byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
    quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
    if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
    {
        quotientnegative = 1; //We're to toggle the result sign if not zero!
    }
    if (val&0x80000000) //Negative value to divide?
    {
        val = ((~val)+1); //Convert the negative value to be positive!
        remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
    }
    if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
    {
        divisor = ((~divisor)+1); //Convert the divisor to be positive!
    }
    CPU8086_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
    if (*error==0) //No error has occurred? Do post-processing of the results!
    {
        if (quotientnegative) //The result is negative?
        {
            *quotient = (~*quotient)+1; //Apply the new sign to the result!
        }
        if (remaindernegative) //The remainder is negative?
        {
            *remainder = (~*remainder)+1; //Apply the new sign to the remainder!
        }
    }
}
8-bit DIV to AH/AL:
    CPU8086_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,8,2,6,&applycycles,0,0,0); //Execute the unsigned division! 8-bits result and modulo!
8-bit IDIV to AH/AL:
    valdivd = valdiv;
    divisorw = divisor;
    if (valdiv&0x8000) valdivd |= 0xFFFF0000; //Sign extend to 32-bits!
    if (divisor&0x80) divisorw |= 0xFF00; //Sign extend to 16-bits!
    CPU8086_internal_IDIV(valdivd,divisorw,&quotient,&remainder,&error,8,2,6,&applycycles); //Execute the signed division! 8-bits result and modulo!
16-bit DIV to AX/DX:
    CPU8086_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,16,2,6,&applycycles,0,0,0); //Execute the unsigned division! 16-bits result and modulo!
16-bit IDIV to AX/DX:
    CPU8086_internal_IDIV(valdiv,divisor,&quotient,&remainder,&error,16,2,6,&applycycles); //Execute the signed division! 16-bits result and modulo!
32-bit DIV to EAX/EDX:
    CPU80386_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,32,2,6,&applycycles,0,0,0); //Execute the unsigned division! 32-bits result and modulo!
32-bit IDIV to EAX/EDX:
    CPU80386_internal_IDIV(valdiv,divisor,&quotient,&remainder,&error,32,2,6,&applycycles); //Execute the signed division! 32-bits result and modulo!
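For comparing results against the routines above, a compact reference version of unsigned shift-and-subtract division can be handy. The sketch below is not the cycle-counting algorithm from this post: it drops the timing bookkeeping, walks the shifts from high to low, and detects quotient overflow up front. The function name and signature are hypothetical; it assumes the dividend fits in 2*bits bits and the divisor in bits bits, with bits being 8, 16 or 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned restoring division. Returns false where the CPU would raise #DE. */
bool div_shift_subtract(uint64_t dividend, uint32_t divisor, unsigned bits,
                        uint32_t *quotient, uint32_t *remainder)
{
    uint64_t q = 0, rem = dividend;
    if (divisor == 0) return false;                  /* divide by zero */
    if ((dividend >> bits) >= divisor) return false; /* quotient needs more than 'bits' bits */
    for (int shift = (int)bits - 1; shift >= 0; --shift) {
        uint64_t d = (uint64_t)divisor << shift;
        if (d <= rem) {      /* divisor<<shift still fits in the remainder */
            rem -= d;
            q |= 1ull << shift;
        }
    }
    *quotient = (uint32_t)q;
    *remainder = (uint32_t)rem; /* rem < divisor at this point */
    return true;
}
```

The up-front test works because the quotient fits in `bits` bits exactly when the high half of the dividend is smaller than the divisor.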
Just asking: what exactly IS the meaning of the different values that are logged in the EE log? What are the values logged with EAX/EDX and PS values? Is PS simply a direct dump of the lower 16 bits of the EFLAGS register(masked to only contain (un)defined bits)? What about the two sets of EAX/EDX values? Are they the EAX/EDX registers before/after the instruction, or something else?
I've managed to improve the under/overflow algorithm to detect better, but somehow your EE log reports #DE when it shouldn't be, according to pure logic?
/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
    uint_64 maxpositive,maxnegative;
    maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
    maxnegative = (1ULL<<(bits-1)); //The highest value we cannot set and get past when negative!
    if (unlikely(((unsignedval>maxpositive) && (convertedtopositive==0)) && ((unsignedval>maxnegative) && (convertedtopositive)))) //Signed underflow/overflow on unsinged conversion?
    {
        return 1; //Underflow/overflow detected!
    }
    return 0; //OK!
}
Example faulting line:
Expected result (according to the reference file, at row 25054):
IDIVDL B EAX=00000080 EDX=00000001 PS=0000 #DE EAX=00000080 EDX=00000001 PS=0000
UniPCemu's result:
IDIVDL B EAX=00000080 EDX=00000001 PS=0000 EAX=00000080 EDX=00000001 PS=0000
UniPCemu doesn't detect this correctly somehow? So it's +80h/1=+80h (+128), which won't fit in the 8-bit signed result, causing a fault?
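The expected fault in that log line can be checked with a small reference model of 8-bit IDIV (AX divided by an 8-bit operand, quotient to AL, remainder to AH). This is a sketch under the assumption that the quotient must lie in -128..127; the helper and function names are hypothetical and native C division is used instead of the emulator's own algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

static int32_t sext16(uint16_t v) { return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v; }
static int32_t sext8(uint8_t v)   { return (v & 0x80u)   ? (int32_t)v - 0x100   : (int32_t)v; }

/* Returns false when #DE should be raised; otherwise fills AL and AH. */
bool idiv8(uint16_t ax, uint8_t divisor, uint8_t *al, uint8_t *ah)
{
    int32_t n = sext16(ax), d = sext8(divisor);
    if (d == 0) return false;
    int32_t q = n / d; /* C99 division truncates toward zero, like IDIV */
    int32_t r = n % d; /* remainder takes the sign of the dividend */
    if (q < -128 || q > 127) return false; /* does not fit in signed 8 bits */
    *al = (uint8_t)q;
    *ah = (uint8_t)r;
    return true;
}
```

For AX=0080h and DL=01h the quotient is +128, one above the largest signed 8-bit value, so the model reports a fault just as the reference log does.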
Managed to fix the IDIV instructions too now, with all variants (as well as normal DIV) functioning properly:
The new and improved x86 (with sign support properly added and working) division algorithms:
Overflow check support for determining Division error:
/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
    uint_64 maxpositive,maxnegative;
    maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
    maxnegative = (1ULL<<(bits-1)); //The highest value we cannot set and get past when negative!
    if (unlikely(((unsignedval>maxpositive) && (convertedtopositive==0)) || ((unsignedval>maxnegative) && (convertedtopositive)))) //Signed underflow/overflow on unsinged conversion?
    {
        return 1; //Underflow/overflow detected!
    }
    return 0; //OK!
}
16-bit:
//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
    val: The value to divide
    divisor: The value to divide by
    quotient: Quotient result container
    remainder: Remainder result container
    error: 1 on error(DIV0), 0 when valid.
    resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
    SHLcycle: The amount of cycles for each SHL.
    ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
    issigned: Signed division?
    quotientnegative: Quotient is signed negative result?
    remaindernegative: Remainder is signed negative result?

*/
void CPU8086_internal_DIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
    uint_32 temp, temp2, currentquotient; //Remaining value and current divisor!
    uint_32 resultquotient;
    byte shift; //The shift to apply! No match on 0 shift is done!
    temp = val; //Load the value to divide!
    *applycycles = 1; //Default: apply the cycles normally!
    if (divisor==0) //Not able to divide?
    {
        *quotient = 0;
        *remainder = temp; //Unable to comply!
        *error = 1; //Divide by 0 error!
        return; //Abort: division by 0!
    }

    if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
    {
        SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
        *applycycles = 0; //Don't apply the cycles anymore!
    }

    temp = val; //Load the remainder to use!
    resultquotient = 0; //Default: we have nothing after division!
    nextstep:
    //First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
    temp2 = divisor; //Load the default divisor for x1!
    if (temp2>temp) //Not enough to divide? We're done!
    {
        goto gotresult; //We've gotten a result!
    }
    currentquotient = 1; //We're starting with x1 factor!
    for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
    {
        if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
        {
            CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
            break; //We've found our shift!
        }
        temp2 <<= 1; //Shift to the next position!
        currentquotient <<= 1; //Shift to the next result!
        CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
    }
    if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    //Second step: substract divisor<<n from remainder and increase result with 1<<n.
    temp -= temp2; //Substract divisor<<n from remainder!
    resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
    CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
    goto nextstep; //Start the next step!
    //Finished when remainder<divisor or remainder==0.
    gotresult: //We've gotten a result!
    if (temp>((1<<resultbits)-1)) //Modulo overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (resultquotient>((1<<resultbits)-1)) //Quotient overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (issigned) //Check for signed overflow as well?
    {
        /*
        if (checkSignedOverflow(temp,32,resultbits,remaindernegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
        */
        if (checkSignedOverflow(resultquotient,32,resultbits,quotientnegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
    }
    *quotient = resultquotient; //Quotient calculated!
    *remainder = temp; //Give the modulo! The result is already calculated!
    *error = 0; //We're having a valid result!
}

void CPU8086_internal_IDIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
    byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
    quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
    if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
    {
        quotientnegative = 1; //We're to toggle the result sign if not zero!
    }
    if (val&0x80000000) //Negative value to divide?
    {
        val = ((~val)+1); //Convert the negative value to be positive!
        remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
    }
    if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
    {
        divisor = ((~divisor)+1); //Convert the divisor to be positive!
    }
    CPU8086_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
    if (*error==0) //No error has occurred? Do post-processing of the results!
    {
        if (quotientnegative) //The result is negative?
        {
            *quotient = (~*quotient)+1; //Apply the new sign to the result!
        }
        if (remaindernegative) //The remainder is negative?
        {
            *remainder = (~*remainder)+1; //Apply the new sign to the remainder!
        }
    }
}
32-bit:
//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
    val: The value to divide
    divisor: The value to divide by
    quotient: Quotient result container
    remainder: Remainder result container
    error: 1 on error(DIV0), 0 when valid.
    resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
    SHLcycle: The amount of cycles for each SHL.
    ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
    issigned: Signed division?
    quotientnegative: Quotient is signed negative result?
    remaindernegative: Remainder is signed negative result?

*/
void CPU80386_internal_DIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
    uint_64 temp, temp2, currentquotient; //Remaining value and current divisor!
    uint_64 resultquotient;
    byte shift; //The shift to apply! No match on 0 shift is done!
    temp = val; //Load the value to divide!
    *applycycles = 1; //Default: apply the cycles normally!
    if (divisor==0) //Not able to divide?
    {
        *quotient = 0;
        *remainder = temp; //Unable to comply!
        *error = 1; //Divide by 0 error!
        return; //Abort: division by 0!
    }

    if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
    {
        SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
        *applycycles = 0; //Don't apply the cycles anymore!
    }

    temp = val; //Load the remainder to use!
    resultquotient = 0; //Default: we have nothing after division!
    nextstep:
    //First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
    temp2 = divisor; //Load the default divisor for x1!
    if (temp2>temp) //Not enough to divide? We're done!
    {
        goto gotresult; //We've gotten a result!
    }
    currentquotient = 1; //We're starting with x1 factor!
    for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
    {
        if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
        {
            CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
            break; //We've found our shift!
        }
        temp2 <<= 1; //Shift to the next position!
        currentquotient <<= 1; //Shift to the next result!
        CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
    }
    if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    //Second step: substract divisor<<n from remainder and increase result with 1<<n.
    temp -= temp2; //Substract divisor<<n from remainder!
    resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
    CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
    goto nextstep; //Start the next step!
    //Finished when remainder<divisor or remainder==0.
    gotresult: //We've gotten a result!
    if (temp>((1ULL<<resultbits)-1)) //Modulo overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (resultquotient>((1ULL<<resultbits)-1ULL)) //Quotient overflow?
    {
        *error = 1; //Raise divide by 0 error due to overflow!
        return; //Abort!
    }
    if (issigned) //Check for signed overflow as well?
    {
        /*
        if (checkSignedOverflow(temp,64,resultbits,remaindernegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
        */
        if (checkSignedOverflow(resultquotient,64,resultbits,quotientnegative))
        {
            *error = 1; //Raise divide by 0 error due to overflow!
            return; //Abort!
        }
    }
    *quotient = resultquotient; //Quotient calculated!
    *remainder = temp; //Give the modulo! The result is already calculated!
    *error = 0; //We're having a valid result!
}

void CPU80386_internal_IDIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
    byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
    quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
    if (((val>>63)!=(divisor>>31))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
    {
        quotientnegative = 1; //We're to toggle the result sign if not zero!
    }
    if (val&0x8000000000000000ULL) //Negative value to divide?
    {
        val = ((~val)+1); //Convert the negative value to be positive!
        remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
    }
    if (divisor&0x80000000) //Negative divisor? Convert to a positive divisor!
    {
        divisor = ((~divisor)+1); //Convert the divisor to be positive!
    }
    CPU80386_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
    if (*error==0) //No error has occurred? Do post-processing of the results!
    {
        if (quotientnegative) //The result is negative?
        {
            *quotient = (~*quotient)+1; //Apply the new sign to the result!
        }
        if (remaindernegative) //The remainder is negative?
        {
            *remainder = (~*remainder)+1; //Apply the new sign to the remainder!
        }
    }
}
Both are functioning without problems now :D
The only things still giving problems are the shift/rotate instructions (SALr etc.) and the IMUL instructions.
I just took a look at the 80386 manual at http://x86.renejeschke.de/ and implemented the wrapping of the shift/rotate counts etc. I ran the testsuite again, and after comparing the output with the EE reference I saw the following:
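For cross-checking the signed results, here is a minimal reference sketch that leans on the host's native 64-bit division instead of the shift/subtract loop. The name `idiv64_32_ref` and its return convention are made up for illustration and are not part of the emulator. It assumes the usual x86 IDIV behaviour: the quotient truncates toward zero, the remainder takes the sign of the dividend, and a quotient that doesn't fit the signed 32-bit range is an error.

```c
#include <stdint.h>

/* Hypothetical reference helper (not emulator code).
   Returns 1 on a divide error (divisor 0 or quotient out of range), 0 otherwise. */
static int idiv64_32_ref(int64_t dividend, int32_t divisor, int32_t *quot, int32_t *rem)
{
    int64_t q;
    if (divisor == 0) return 1;                            /* divide by zero */
    if (dividend == INT64_MIN && divisor == -1) return 1;  /* avoid undefined behaviour in C; the quotient cannot fit anyway */
    q = dividend / divisor;                                /* C99 division truncates toward zero */
    if (q > INT32_MAX || q < INT32_MIN) return 1;          /* quotient doesn't fit in 32 bits */
    *quot = (int32_t)q;
    *rem = (int32_t)(dividend % divisor);                  /* sign follows the dividend */
    return 0;
}
```

Feeding the same EDX:EAX and divisor pairs to both this and the emulated IDIV should give identical quotient, remainder and error results if the assumptions above hold.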
SAL1=OK
SALi=OK
SALr=Flags problems
SAR1=OK
SARi=OK
SARr=OK
SHR1=OK
SHRi=OK
SHRr=Flags problems
ROL1=OK
ROLi=Flags problems(overflow flag not set?)
ROLr=Carry flag problems
ROR1=OK
RORi=OK
RORr(b/w)=Carry flag problems
RORr(d)=OK
RCL1=OK
RCLi=Overflow flag problems(not set) at word variant only.
RCLr=OK
RCR1=OK
RCRi=Overflow flag problems(not set).
RCRr=Carry flag problems(set/not set).
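A minimal sketch of the count wrapping as I understand it from the manual, to compare against the results above. The names `rot_kind`, `masked_count` and `rotate_steps` are hypothetical and only for illustration. The assumption is that on the 80386 the raw count is first masked to 5 bits, and that this masked count (not the wrapped one) decides whether the flags are touched at all; the number of steps that actually move bits is then wrapped by the operand size for ROL/ROR and by the operand size plus one for RCL/RCR.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef enum { ROT_PLAIN, ROT_THROUGH_CARRY } rot_kind;

/* 80386+: only the low 5 bits of the count are used. */
static unsigned masked_count(uint8_t raw)
{
    return raw & 0x1Fu;
}

/* Number of single-bit steps that actually change the operand.
   bits is the operand size (8, 16 or 32). */
static unsigned rotate_steps(rot_kind kind, unsigned bits, uint8_t raw)
{
    unsigned masked = masked_count(raw);
    if (kind == ROT_THROUGH_CARRY)
        return masked % (bits + 1); /* RCL/RCR rotate through CF: ring of bits+1 */
    return masked % bits;           /* ROL/ROR: ring of exactly bits */
}
```

Note the case where `masked_count()` is nonzero but `rotate_steps()` is 0, such as a byte ROL/ROR by 8: the operand comes out unchanged, yet under this reading the carry flag is still updated.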
This is the general handler that is actually executed for all of those rotate/shift instructions:
8/16-bit(8086+):
byte op_grp2_8(byte cnt, byte varshift) {
    //word d,
    INLINEREGISTER word s, shift, tempCF, msb;
    INLINEREGISTER byte numcnt;
    //word backup;
    //if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
    numcnt = cnt; //Save count!
    s = oper1b;
    switch (thereg) {
    case 0: //ROL r/m8
        if (EMULATED_CPU>=CPU_80386) numcnt &= 7; //Operand size wrap!
        else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = ((s&0x80)>>7); //Save MSB!
            s = (s << 1)|tempCF;
        }
        FLAGW_CF(s&1); //Set carry flag!
        if (cnt==1) FLAGW_OF(((s >> 7) & 1)^FLAG_CF);
        break;

    case 1: //ROR r/m8
        if (EMULATED_CPU>=CPU_80386) numcnt &= 7; //Operand size wrap!
        else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = (s&1); //Save LSB!
            s = (s >> 1) | (tempCF << 7);
            FLAGW_CF(tempCF); //Set carry flag!
        }
        if (cnt==1) FLAGW_OF((s >> 7) ^ ((s >> 6) & 1));
        break;

    case 2: //RCL r/m8
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (EMULATED_CPU>=CPU_80386) numcnt %= 9; //Operand size wrap!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = ((s&0x80)>>7); //Save MSB!
            s = (s << 1)|FLAG_CF; //Shift and set CF!
            FLAGW_CF(tempCF); //Set CF!
        }
        if (cnt==1) FLAGW_OF(((s >> 7) & 1)^FLAG_CF); //OF=MSB^CF
        break;

    case 3: //RCR r/m8
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (EMULATED_CPU>=CPU_80386) numcnt %= 9; //Operand size wrap!
        if (cnt==1) FLAGW_OF((s >> 7) ^ FLAG_CF);
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = (s&1); //Save LSB!
            s = (s >> 1) | (FLAG_CF << 7);
            FLAGW_CF(tempCF);
        }
        break;

    case 4: case 6: //SHL r/m8
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            if (s & 0x80) FLAGW_CF(1); else FLAGW_CF(0);
            //if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
            s = (s << 1) & 0xFF;
        }
        if (numcnt==1) { if (FLAG_CF==(s>>7)) FLAGW_OF(0); else FLAGW_OF(1); }
        flag_szp8((uint8_t)(s&0xFF)); break;

    case 5: //SHR r/m8
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (numcnt==1) { if (s&0x80) FLAGW_OF(1); else FLAGW_OF(0); }
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            FLAGW_CF(s & 1);
            //backup = s; //Save backup!
            s = s >> 1;
            //if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
        }
        flag_szp8((uint8_t)(s & 0xFF)); break;

    case 7: //SAR r/m8
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        msb = s & 0x80;
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            FLAGW_CF(s & 1);
            //backup = s; //Save backup!
            s = (s >> 1) | msb;
            //if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
        }
        byte tempSF;
        tempSF = FLAG_SF; //Save the SF!
        /*flag_szp8((uint8_t)(s & 0xFF));*/
        //http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
        if (!numcnt) //Nothing done?
        {
            FLAGW_SF(tempSF); //We don't update when nothing's done!
        }
        else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
        {
            flag_szp8(s); //Affect sign as well!
            FLAGW_OF(0); //Cleared!
        }
        else if (numcnt) //Anything shifted at all?
        {
            flag_szp8(s); //Affect sign as well!
            if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
            {
                FLAGW_OF(0); //Cleared with count as well?
            }
        }
        break;
    }
    op_grp2_cycles(numcnt, varshift);
    return(s & 0xFF);
}

word op_grp2_16(byte cnt, byte varshift) {
    //word d,
    INLINEREGISTER uint_32 s, shift, tempCF, msb;
    INLINEREGISTER byte numcnt;
    //word backup;
    //if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
    numcnt = cnt; //Save count!
    s = oper1;
    switch (thereg) {
    case 0: //ROL r/m16
        if (EMULATED_CPU>=CPU_80386) numcnt &= 0xF; //Operand size wrap!
        else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = ((s&0x8000)>>15); //Save MSB!
            s = (s << 1)|tempCF;
        }
        FLAGW_CF(s&1); //Set carry flag!
        if (cnt==1) FLAGW_OF(((s >> 15) & 1)^FLAG_CF);
        break;

    case 1: //ROR r/m16
        if (EMULATED_CPU>=CPU_80386) numcnt &= 0xF; //Operand size wrap!
        else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = (s&1); //Save LSB!
            s = (s >> 1) | (tempCF << 15);
            FLAGW_CF(tempCF); //Set carry flag!
        }
        if (cnt==1) FLAGW_OF((s >> 15) ^ ((s >> 14) & 1));
        break;

    case 2: //RCL r/m16
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (EMULATED_CPU>=CPU_80386) numcnt %= 17; //Operand size wrap!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = ((s&0x8000)>>15); //Save MSB!
            s = (s << 1)|FLAG_CF; //Shift and set CF!
            FLAGW_CF(tempCF); //Set CF!
        }
        if (cnt==1) FLAGW_OF(((s >> 15) & 1)^FLAG_CF); //OF=MSB^CF
        break;

    case 3: //RCR r/m16
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (EMULATED_CPU>=CPU_80386) numcnt %= 17; //Operand size wrap!
        if (cnt==1) FLAGW_OF((s >> 15) ^ FLAG_CF);
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = (s&1); //Save LSB!
            s = (s >> 1) | (FLAG_CF << 15);
            FLAGW_CF(tempCF);
        }
        break;

    case 4: case 6: //SHL r/m16
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            if (s & 0x8000) FLAGW_CF(1); else FLAGW_CF(0);
            //if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
            s = (s << 1) & 0xFFFF;
        }
        if (numcnt==1) { if (FLAG_CF==(s>>15)) FLAGW_OF(0); else FLAGW_OF(1); }
        flag_szp16((uint16_t)(s&0xFFFF)); break;

    case 5: //SHR r/m16
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (numcnt==1) { if (s&0x8000) FLAGW_OF(1); else FLAGW_OF(0); }
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            FLAGW_CF(s & 1);
            //backup = s; //Save backup!
            s = s >> 1;
            //if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
        }
        flag_szp16((uint16_t)(s & 0xFFFF)); break;

    case 7: //SAR r/m16
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        msb = s & 0x8000;
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            FLAGW_CF(s & 1);
            //backup = s; //Save backup!
            s = (s >> 1) | msb;
            //if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
        }
        byte tempSF;
        tempSF = FLAG_SF; //Save the SF!
        /*flag_szp8((uint8_t)(s & 0xFF));*/
        //http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
        if (!numcnt) //Nothing done?
        {
            FLAGW_SF(tempSF); //We don't update when nothing's done!
        }
        else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
        {
            flag_szp16(s); //Affect sign as well!
            FLAGW_OF(0); //Cleared!
        }
        else if (numcnt) //Anything shifted at all?
        {
            flag_szp16(s); //Affect sign as well!
            if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
            {
                FLAGW_OF(0); //Cleared with count as well?
            }
        }
        break;
    }
    op_grp2_cycles(numcnt, varshift);
    return(s & 0xFFFF);
}
32-bit:
uint_32 op_grp2_32(byte cnt, byte varshift) {
    //word d,
    INLINEREGISTER uint_64 s, shift, tempCF, msb;
    INLINEREGISTER byte numcnt;
    //word backup;
    //if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
    numcnt = cnt; //Save count!
    s = oper1d;
    switch (thereg) {
    case 0: //ROL r/m32
        if (EMULATED_CPU>=CPU_80386) numcnt &= 0x1F; //Operand size wrap!
        else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = ((s&0x80000000)>>31); //Save MSB!
            s = (s << 1)|tempCF;
        }
        FLAGW_CF(s&1); //Set carry flag!
        if (cnt==1) FLAGW_OF(((s >> 31) & 1)^FLAG_CF);
        break;

    case 1: //ROR r/m32
        if (EMULATED_CPU>=CPU_80386) numcnt &= 0x1F; //Operand size wrap!
        else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = (s&1); //Save LSB!
            s = (s >> 1) | (tempCF << 31);
            FLAGW_CF(tempCF); //Set carry flag!
        }
        if (cnt==1) FLAGW_OF((s >> 31) ^ ((s >> 30) & 1));
        break;

    case 2: //RCL r/m32
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = ((s&0x80000000)>>31); //Save MSB!
            s = (s << 1)|FLAG_CF; //Shift and set CF!
            FLAGW_CF(tempCF); //Set CF!
        }
        if (cnt==1) FLAGW_OF(((s >> 31) & 1)^FLAG_CF); //OF=MSB^CF
        break;

    case 3: //RCR r/m32
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (cnt==1) FLAGW_OF((s >> 31) ^ FLAG_CF);
        for (shift = 1; shift <= numcnt; shift++) {
            tempCF = (s&1); //Save LSB!
            s = (s >> 1) | (FLAG_CF << 31);
            FLAGW_CF(tempCF);
        }
        break;

    case 4: case 6: //SHL r/m32
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            if (s & 0x80000000) FLAGW_CF(1); else FLAGW_CF(0);
            //if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
            s = (s << 1) & 0xFFFFFFFF;
        }
        if (numcnt==1) { if (FLAG_CF==(s>>31)) FLAGW_OF(0); else FLAGW_OF(1); }
        flag_szp32((uint32_t)(s&0xFFFFFFFF)); break;

    case 5: //SHR r/m32
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        if (numcnt==1) { if (s&0x80000000) FLAGW_OF(1); else FLAGW_OF(0); }
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            FLAGW_CF(s & 1);
            //backup = s; //Save backup!
            s = s >> 1;
            //if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
        }
        flag_szp32((uint32_t)(s & 0xFFFFFFFF)); break;

    case 7: //SAR r/m32
        if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
        msb = s & 0x80000000;
        //FLAGW_AF(0);
        for (shift = 1; shift <= numcnt; shift++) {
            FLAGW_CF(s & 1);
            //backup = s; //Save backup!
            s = (s >> 1) | msb;
            //if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
        }
        byte tempSF;
        tempSF = FLAG_SF; //Save the SF!
        /*flag_szp8((uint8_t)(s & 0xFF));*/
        //http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
        if (!numcnt) //Nothing done?
        {
            FLAGW_SF(tempSF); //We don't update when nothing's done!
        }
        else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
        {
            flag_szp32(s); //Affect sign as well!
            FLAGW_OF(0); //Cleared!
        }
        else if (numcnt) //Anything shifted at all?
        {
            flag_szp32(s); //Affect sign as well!
            if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
            {
                FLAGW_OF(0); //Cleared with count as well?
            }
        }
        break;
    }
    op_grp2_cycles32(numcnt, varshift);
    return(s & 0xFFFFFFFF);
}
Can anyone see what's going wrong with those instructions?
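For comparison, here is a minimal standalone sketch of ROL r/m8 with a variable count, following my reading of the manual's pseudo-code rather than the handler above. `rot8_result` and `rol8_ref` are hypothetical names, not emulator code. The assumptions: the count is masked to 5 bits on the 80386; if the masked count is 0 no flag is written; otherwise CF becomes the low bit of the result even when the rotation wraps to a no-op; OF is only defined when the masked count is exactly 1.

```c
#include <stdint.h>

/* Hypothetical result container, for illustration only. */
typedef struct {
    uint8_t value;      /* rotated operand */
    int flags_touched;  /* 0 when the masked count is 0: CF/OF keep their old values */
    int cf;             /* valid when flags_touched is 1 */
    int of_defined;     /* 1 only for a masked count of exactly 1 */
    int of;             /* valid when of_defined is 1 */
} rot8_result;

static rot8_result rol8_ref(uint8_t value, uint8_t raw_count)
{
    rot8_result r = { value, 0, 0, 0, 0 };
    unsigned masked = raw_count & 0x1Fu; /* 80386+: low 5 bits only */
    unsigned i;

    if (masked == 0) return r;           /* nothing happens, flags untouched */

    for (i = 0; i < (masked % 8u); ++i)  /* rotation wraps at the operand size */
        r.value = (uint8_t)((r.value << 1) | (r.value >> 7));

    r.flags_touched = 1;
    r.cf = r.value & 1;                  /* CF = bit rotated into the LSB */
    if (masked == 1) {                   /* OF = MSB xor CF for 1-bit rotates */
        r.of_defined = 1;
        r.of = ((r.value >> 7) & 1) ^ r.cf;
    }
    return r;
}
```

Under these assumptions, `rol8_ref(0x81, 8)` leaves the value at 0x81 but still reports CF=1, while a raw count of 0x20 leaves value and flags alone. The handler above decides OF from the unmasked `cnt` and writes CF for ROL even when nothing was rotated, so those are the cases I would compare first.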