VOGONS


test386.asm CPU tester

Reply 60 of 178, by peterferrie

superfury wrote:

I'm just asking to be sure: to the CPU, does a 2-operand opcode 69/6B(IMUL r16,imm8/16) even exist? Or does it always decode to 3 operands, with r/m and immediate being multiplied and stored into the reg operand?

Didn't we discuss this previously? The AX register as destination parameter is implicit in the documentation, but always present in reality and decodes in the usual way.

Reply 61 of 178, by superfury

peterferrie wrote:
superfury wrote:

I'm just asking to be sure: to the CPU, does a 2-operand opcode 69/6B(IMUL r16,imm8/16) even exist? Or does it always decode to 3 operands, with r/m and immediate being multiplied and stored into the reg operand?

Didn't we discuss this previously? The AX register as destination parameter is implicit in the documentation, but always present in reality and decodes in the usual way.

Huh? So the r16 in opcodes 69/6B is forced to be AX or EAX, depending on operand size? So the instruction only uses r/m16/32 and imm8/16/32 and always stores its result in (E)AX? Then what is the reg part of modr/m used for? That's entirely different from what hottobar said a few posts back (reg16/32 is the destination of the result)?

Or do you mean the normal GRP3a/b instructions, which always store in (E)AX?

For some reason the logs keep erroring out on the opcode 69/6B IMUL8/16/32 instructions?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 63 of 178, by superfury


@peterferrie: So, if I understand this correctly:
- If opcodes 69/6B r16 decodes as AX/EAX, then it disassembles as the two-operand version.
- If opcodes 69/6B r16 decodes as other registers, then it disassembles as the three-operand version.

In both cases, behaviour is as the three-operand version?

Is that correct? Or is the two-operand disassembly of opcodes 69/6B an error in the 80386 manuals?

Edit: Btw, what do you mean with "two-byte" and "three-byte"? Two-byte=F7/F8/67/6B and Three-byte=0F opcode variant? Or do you actually mean two-operand and three-operand?


Reply 64 of 178, by superfury


@hottobar: I've just tried to single-step the DIV0 interrupt that happens at the first DIV instruction of the testsuite. I see it doing its work with ESI (saving it), calling the printStr function to display the error, modifying the return address, and executing IRET to the return address. Then the RETD at the bottom of the handler returns to address 0010:00000000 instead of the original location of the loop that processes the OPs table. So something is going wrong somewhere between the table starting to handle the DIV entry (the call ESI instruction) and the return from the interrupt handler via IRET?

Unfortunately, searching the log isn't easy: it's a 19.2GB text file containing all logged debugger information (the current common log format with memory logging enabled, using the latter, shorthand method mentioned in my earlier posts on the common log thread):

https://www.dropbox.com/s/00byfb9r02r8z7l/deb … 02_1036.7z?dl=0

Be warned that it requires at least 256MB memory to extract, as I've increased it from the default 64MB to increase the compression ratio(even though it's already at Ultra compression level), but I thought that every increase in compression is usable, due to the huge log file.


Reply 65 of 178, by superfury


Strangely enough, searching the log for div instructions with a simple large-file text editor doesn't find any, for some odd reason?

Thinking about the problem, there must be a problem related to the stack somehow.
The stages should be as follows(assuming no stack changes):
call idivloc(esi)
*pushes eip*
idivloc: div causing int 0
*pushes flags*
*pushes cs*
*pushes eip*
calls log "#DE " and returns
modifies eip at [ESP] (confirmed in disassembly, which is missing from the log somehow?)
iret
*pops modified eip*
*pops original cs*
*pops original eflags*
retd
*pops original eip of test dispatch routine for op table* (reads 0x00000000 EIP from the stack)

All of these should be working correctly: calld/retd because of earlier tests and div(int)/iret because of earlier faults(bounds x2, paging fault).

Is there something different in the div fault handling compared to the other faults? Privilege changes or the like?
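
The stack sequence listed above can be walked through in a small C model to see where the stack pointer should end up. This is only a sketch under stated assumptions: the `Stack` type and the `push32`, `pop32` and `run_div_fault_sequence` names are hypothetical, the fault is taken through a 32-bit gate with no privilege change, and #DE pushes no error code. The EFLAGS and CS values are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DWORDS 64

/* Hypothetical model of a 32-bit stack: one array slot per dword. */
typedef struct {
    uint32_t mem[STACK_DWORDS];
    uint32_t sp;                /* index of the current top-of-stack slot */
} Stack;

void push32(Stack *s, uint32_t v)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = v;
}

uint32_t pop32(Stack *s)
{
    assert(s->sp < STACK_DWORDS);
    return s->mem[s->sp++];
}

/* Walks the call / #DE / IRET / RETD sequence.
   Returns the EIP popped by the final RETD and stores the net change
   of the stack pointer (in dwords) in *sp_delta. */
uint32_t run_div_fault_sequence(uint32_t return_eip, uint32_t fault_eip,
                                uint32_t resume_eip, int32_t *sp_delta)
{
    Stack s = { {0}, STACK_DWORDS };
    uint32_t start = s.sp;

    push32(&s, return_eip);     /* call esi: pushes the return EIP */
    push32(&s, 0x00000002);     /* #DE entry: pushes EFLAGS (placeholder value) */
    push32(&s, 0x0010);         /* ... then CS (placeholder selector) */
    push32(&s, fault_eip);      /* ... then the EIP of the faulting DIV */

    s.mem[s.sp] = resume_eip;   /* handler rewrites the EIP at [ESP] */

    uint32_t eip = pop32(&s);   /* IRET pops the modified EIP */
    (void)pop32(&s);            /* ... the original CS */
    (void)pop32(&s);            /* ... and the original EFLAGS */
    assert(eip == resume_eip);

    uint32_t ret = pop32(&s);   /* RETD pops the EIP pushed by the call */
    *sp_delta = (int32_t)(s.sp - start);
    return ret;
}
```

If every push is matched by a pop of the same size, the final RETD sees the original return EIP and the stack pointer is back where it started. Any other outcome points at a push or pop that used the wrong size.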


Reply 66 of 178, by peterferrie

superfury wrote:

@peterferrie: So, if I understand this correctly:
- If opcodes 69/6B r16 decodes as AX/EAX, then it disassembles as the two-operand version.
- If opcodes 69/6B r16 decodes as other registers, then it disassembles as the three-operand version.

In both cases, behaviour is as the three-operand version?

Yes, that's correct.

superfury wrote:

Edit: Btw, what do you mean with "two-byte" and "three-byte"? Two-byte=F7/F8/67/6B and Three-byte=0F opcode variant? Or do you actually mean two-operand and three-operand?

Yes, I meant two-operand, not two-byte. My mistake.
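
The three-operand behaviour confirmed above can be sketched in C for opcode 6B with 16-bit operands. This is a minimal illustration, not UniPCemu's code: the `Regs16` type and the `imul_r16_rm16_imm8` name are hypothetical, only the register form (mod=11) is handled, and the CF/OF updates are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: eight 16-bit general-purpose registers,
   indexed by their 3-bit register number (0=AX, 1=CX, 2=DX, 3=BX, ...). */
typedef struct { uint16_t r[8]; } Regs16;

/* Sketch of opcode 6B (IMUL r16, r/m16, imm8), register form only.
   The destination is always the ModR/M reg field; AX is not special. */
void imul_r16_rm16_imm8(Regs16 *regs, uint8_t modrm, uint8_t imm8)
{
    assert((modrm >> 6) == 3);            /* this sketch handles mod=11 only */
    unsigned dest = (modrm >> 3) & 7;     /* reg field: destination register */
    unsigned src  = modrm & 7;            /* r/m field: first source operand */
    int32_t a = (int16_t)regs->r[src];    /* sign-extend the r/m operand */
    int32_t b = (int8_t)imm8;             /* sign-extend the immediate */
    regs->r[dest] = (uint16_t)(a * b);    /* keep the low 16 bits of the product */
}
```

With ModR/M byte C3 (reg=AX, r/m=BX) the result lands in AX, which a disassembler may show in two-operand form. With CB (reg=CX, r/m=BX) it lands in CX. The execution path is the same in both cases.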

Reply 67 of 178, by peterferrie

superfury wrote:

Strangely enough, searching the log for div instructions with a simple large-file text editor doesn't find any, for some odd reason?

Thinking about the problem, there must be a problem related to the stack somehow.
The stages should be as follows(assuming no stack changes):
call idivloc(esi)
*pushes eip*
idivloc: div causing int 0
*pushes flags*
*pushes cs*
*pushes eip*
calls log "#DE " and returns
modifies eip at [ESP] (confirmed in disassembly, which is missing from the log somehow?)
iret
*pops modified eip*
*pops original cs*
*pops original eflags*
retd
*pops original eip of test dispatch routine for op table* (reads 0x00000000 EIP from the stack)

All of these should be working correctly: calld/retd because of earlier tests and div(int)/iret because of earlier faults(bounds x2, paging fault).

Is there something different in the div fault handling compared to the other faults? Privilege changes or the like?

The first question - is your stack pointer returned to the value that it should have when the call was made?
I suspect that's where the problem lies. Perhaps something is being left on the stack, like an error code or similar?

Reply 68 of 178, by superfury


There's no error code with a Divide Error exception, so that can't be it. I'll look if I see any stack pointer errors when I have the time.


Reply 69 of 178, by superfury


I've just looked at the DIV0 interrupt firing: I see some strange stack pushes there(according to Visual Studio debugger):
fbf8=original CS before INT0
fbf4=EFLAGS
f9f4=CS
f9f0=EIP

That is a crazy jump between EFLAGS and CS?

Edit: The cause is the 32-bit extension bit (0x8) being used directly as a shift count, instead of first being reduced to the usual 0/1 value (word/dword). Shifting 2 left by 8 is an invalid shift when applied to the stack: it moves ESP down by no less than 2<<8=512 bytes instead of 2 or 4. That happened when pushing CS on the stack, where ESP is decreased before CS is written to memory.

Edit: Having fixed this, it now continues on towards the other tests:D

Edit: After stepping through, I see it properly finishing up now and entering the final HLT at the end of the program (after the return to real mode and loading all registers, clearing the interrupt flag and executing HLT at POST FF). 😁
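
The shift bug described in the first edit above can be reproduced in a few lines of C. This is only a sketch of the mechanism, not UniPCemu's code: `OPSIZE32_FLAG`, `push_size_buggy`, `push_size_fixed` and `esp_after_push` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define OPSIZE32_FLAG 0x8u  /* hypothetical flag bit: set for 32-bit pushes */

/* Buggy variant: the raw flag (0 or 8) is used as the shift count. */
uint32_t push_size_buggy(uint32_t flags)
{
    return 2u << (flags & OPSIZE32_FLAG);              /* 2 or 2<<8 = 512 */
}

/* Fixed variant: the flag is reduced to 0 or 1 before shifting. */
uint32_t push_size_fixed(uint32_t flags)
{
    return 2u << ((flags & OPSIZE32_FLAG) ? 1u : 0u);  /* 2 or 4 */
}

/* Decrements the stack pointer by the size of one push. */
uint32_t esp_after_push(uint32_t esp, uint32_t size)
{
    assert(esp >= size);
    return esp - size;
}
```

Starting from ESP=fbf4 (just after EFLAGS was pushed), the buggy size gives f9f4 for the CS push, which matches the jump seen in the debugger. The fixed size gives fbf0.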

This is the EE log UniPCemu's generating:

Attachment: porte9.log (2.56 MiB): Port E9 output, prestripped dates.

It does seem to give lots of errors from the IMUL8 onwards still? What's going wrong with those instructions? The same applies to the DIV instructions?

Edit: Looking at a simple diff from http://prettydiff.com/ , I see the following instructions failing:

- All IMUL8/16/32 instructions.
- DIVDX W regarding #DE(not faulting).
- DIVEDX W/DIVEDX D regarding #DE(not faulting).
- DIVAX W regarding #DE(not faulting).
- DIVEAX D regarding #DE(not faulting).
- IDIVDL B not always faulting #DE.
- IDIVEDX D not always faulting #DE.
- IDIVEDX D EAX = 80000001 EDX = FFFFFFFF PS = 0000 wrong result?
etc.
- Shift/rotate carry flags problems? :
- SALr B/W/D flags problems?
- SHRr B/W/D flags problems?
- ROLr B/W/D flags problems?
- RORr B/W/D flags problems?
- RCLr B/W/D problems?
- RCRr B/W/D problems?

So, simply said:
- flags problems on the shift/rotate instructions(mostly carry flag as far as I can see?)
- full output problems on IMUL8/16/32 instructions?
- #DE problems with DIV/IDIV instructions?

8086+ opcodes: https://bitbucket.org/superfury/unipcemu/src/ … 086.c?at=master
NEC V30+ opcodes(0F IMUL instruction): https://bitbucket.org/superfury/unipcemu/src/ … V30.c?at=master
32-bit operand size of all 80286- instructions: https://bitbucket.org/superfury/unipcemu/src/ … 386.c?at=master
80386+ 0F instructions: https://bitbucket.org/superfury/unipcemu/src/ … 386.c?at=master

Can you see what's exactly going wrong? Search for OP0FXX or OPXX for the opcode emulation itself(OP6B, OP69, OPF6, OPF7 and OPD*).


Reply 70 of 178, by hottobar

superfury wrote:

Edit: Having fixed this, it now continues on towards the other tests:D

Congratulations! 😁

superfury wrote:

Can you see what's exactly going wrong? Search for OP0FXX or OPXX for the opcode emulation itself(OP6B, OP69, OPF6, OPF7 and OPD*).

I quickly read CPU_OP6B(). I didn't find the bug, but here are a couple of things that can prevent future bugs.

You can sign extend by doing a type cast, so instead of:

temp2.val64 = (uint_64)immb; //Read unsigned parameter!
if (temp2.val64&0x80ULL) temp2.val64 |= 0xFFFFFFFFFFFFFF00ULL; //Sign extend to 64 bits!

you can simply do:

temp2.val64 = (int_8)immb;

(assuming you've declared int_8 as a signed 8 bit type, like int8_t)
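
The equivalence of the two approaches above can be checked exhaustively for 8-bit immediates. The sketch below uses the standard `<stdint.h>` types in place of the emulator's `uint_64`/`int_8` typedefs, and the function names are hypothetical. It assumes the usual two's complement behaviour when converting an out-of-range `uint8_t` to `int8_t`, which is implementation-defined before C23 but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Manual sign extension with a mask, as in the original snippet. */
uint64_t sign_extend8_manual(uint8_t imm)
{
    uint64_t v = imm;
    if (v & 0x80ULL) v |= 0xFFFFFFFFFFFFFF00ULL;
    return v;
}

/* Sign extension by casting through a signed 8-bit type. */
uint64_t sign_extend8_cast(uint8_t imm)
{
    return (uint64_t)(int64_t)(int8_t)imm;
}

/* Asserts that both methods agree for every possible 8-bit input. */
void check_sign_extend8(void)
{
    for (unsigned i = 0; i < 256; ++i)
        assert(sign_extend8_manual((uint8_t)i) == sign_extend8_cast((uint8_t)i));
}
```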

I couldn't find the declaration of VAL64Splitter, but I assume it's a union used for type conversion.

temp3.val64s = temp1.val64s * temp2.val64s;

Beware that using unions to reinterpret data this way is undefined behavior in C++. C99 and later permit it, but the result then depends on the object representation.

Reply 71 of 178, by superfury


I know, but I assume that if the data size is constant (e.g. 64-bit integer, 32-bit integer, 16-bit integer, 8-bit integer), it shouldn't be a problem (e.g. uint_64 to int_64 and vice versa)? The only thing that effectively changes is the sign bit becoming a negative value of itself, assuming C/C++ compatibility with IEEE standards?


Reply 72 of 178, by hottobar

superfury wrote:

I know, but I assume that if the data size is constant (e.g. 64-bit integer, 32-bit integer, 16-bit integer, 8-bit integer), it shouldn't be a problem (e.g. uint_64 to int_64 and vice versa)? The only thing that effectively changes is the sign bit becoming a negative value of itself, assuming C/C++ compatibility with IEEE standards?

The problem is that compilers exploit undefined behaviours to do optimizations. A compiler can change the way it treats a specific UB without notice and your code could break in unpredictable ways.
See for example how GCC uses the signed overflow UB.

Reply 73 of 178, by superfury


Aren't signed/unsigned integer values defined as an IEEE standard, thus never changing? If they were to change, lots of files would simply break due to invalid contents? So it should be safe to assume that using a union to convert between signed/unsigned variables won't pose a problem?


Reply 74 of 178, by superfury


I've improved the DIV/IDIV algorithm a bit. Now the DIV instructions check out correctly. But the IDIV instructions still seem to fail somehow?

Changes: https://bitbucket.org/superfury/unipcemu/comm … 82fa270f471bb10

Although it should improve the signed overflow detection, it somehow doesn't fully fix it? Can you see what's going wrong?

Attachment: porte9.log (2.57 MiB): Latest EE log from UniPCemu, showing errors in IDIV and shift/rotate arithmetic.

(Ignore the enc_temp_folder file; it seems to have been committed with the rest accidentally and is removed in a newer commit.)


Reply 75 of 178, by superfury


I've improved the overflow detection a bit, but it still doesn't catch all or in some cases seems to catch too many overflows?

/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
uint_64 maxpositive,maxbit,errorrange;
if (convertedtopositive) unsignedval = (~unsignedval)+1; //Convert to negative, if needed!
maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
maxbit = (1ULL<<(bits-1)); //The highest value we cannot set!
errorrange = ((1ULL<<bits)-1)-maxpositive; //Lower roof of invalid range!
if (unlikely((unsignedval>maxpositive) && ((unsignedval<maxbit)||(unsignedval>errorrange)))) //Signed underflow/overflow on unsinged conversion?
{
return 1; //Underflow/overflow detected!
}
return 0; //OK!
}

32-bit (I)DIV:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
val: The value to divide
divisor: The value to divide by
quotient: Quotient result container
remainder: Remainder result container
error: 1 on error(DIV0), 0 when valid.
resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
SHLcycle: The amount of cycles for each SHL.
ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
issigned: Signed division?
quotientnegative: Quotient is signed negative result?
remaindernegative: Remainder is signed negative result?

*/
void CPU80386_internal_DIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
uint_64 temp, temp2, currentquotient; //Remaining value and current divisor!
uint_64 resultquotient;
byte shift; //The shift to apply! No match on 0 shift is done!
temp = val; //Load the value to divide!
*applycycles = 1; //Default: apply the cycles normally!
if (divisor==0) //Not able to divide?
{
*quotient = 0;
*remainder = temp; //Unable to comply!
*error = 1; //Divide by 0 error!
return; //Abort: division by 0!
}

if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
{
SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
*applycycles = 0; //Don't apply the cycles anymore!
}

temp = val; //Load the remainder to use!
resultquotient = 0; //Default: we have nothing after division!
nextstep:
//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
temp2 = divisor; //Load the default divisor for x1!
if (temp2>temp) //Not enough to divide? We're done!
{
goto gotresult; //We've gotten a result!
}
currentquotient = 1; //We're starting with x1 factor!
for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
{
if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
{
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
break; //We've found our shift!
}
temp2 <<= 1; //Shift to the next position!
currentquotient <<= 1; //Shift to the next result!
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
}
if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
//Second step: substract divisor<<n from remainder and increase result with 1<<n.
temp -= temp2; //Substract divisor<<n from remainder!
resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
goto nextstep; //Start the next step!
//Finished when remainder<divisor or remainder==0.
gotresult: //We've gotten a result!
if ((uint_64)temp>((1ULL<<resultbits)-1)) //Modulo overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if ((uint_64)resultquotient>((1ULL<<resultbits)-1)) //Quotient overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (issigned) //Check for signed overflow as well?
{
/*
if (checkSignedOverflow((uint_64)temp,64,resultbits,remaindernegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
*/
if (checkSignedOverflow((uint_64)resultquotient,64,resultbits,quotientnegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
}
*quotient = resultquotient; //Quotient calculated!
*remainder = temp; //Give the modulo! The result is already calculated!
*error = 0; //We're having a valid result!
}

void CPU80386_internal_IDIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
{
quotientnegative = 1; //We're to toggle the result sign if not zero!
}
if (val&0x80000000) //Negative value to divide?
{
val = ((~val)+1); //Convert the negative value to be positive!
remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
}
if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
{
divisor = ((~divisor)+1); //Convert the divisor to be positive!
}
CPU80386_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
if (*error==0) //No error has occurred? Do post-processing of the results!
{
if (quotientnegative) //The result is negative?
{
*quotient = (~*quotient)+1; //Apply the new sign to the result!
}
if (remaindernegative) //The remainder is negative?
{
*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
}
}
}
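
One thing worth checking in CPU80386_internal_IDIV above: the sign tests (val>>31, divisor>>15, 0x80000000 and 0x8000) are the same as in the 16-bit version, while for a 64-bit dividend and a 32-bit divisor the sign bits would be bits 63 and 31. As a point of comparison, here is a minimal reference sketch that leans on the host's 64-bit signed arithmetic. The `idiv32_reference` name is hypothetical, cycle counting is ignored, and the accepted quotient range (-2^31 to 2^31-1) follows Intel's documentation for doubleword IDIV.

```c
#include <assert.h>
#include <stdint.h>

/* Reference sketch of 32-bit IDIV: EDX:EAX (dividend) / r/m32 (divisor).
   Returns 1 when #DE should be raised, otherwise 0 with results stored. */
int idiv32_reference(uint64_t dividend, uint32_t divisor,
                     uint32_t *quotient, uint32_t *remainder)
{
    assert(quotient && remainder);
    int64_t n = (int64_t)dividend;          /* sign is bit 63 of EDX:EAX */
    int64_t d = (int32_t)divisor;           /* sign is bit 31 of the divisor */

    if (d == 0) return 1;                   /* divide by zero */
    if (n == INT64_MIN && d == -1) return 1; /* would overflow on the host, and cannot fit anyway */

    int64_t q = n / d;                      /* C99 truncates toward zero, like IDIV */
    int64_t r = n % d;                      /* remainder takes the sign of the dividend */

    if (q > INT32_MAX || q < INT32_MIN) return 1; /* quotient does not fit in 32 bits */

    *quotient = (uint32_t)q;
    *remainder = (uint32_t)r;
    return 0;
}
```

A function like this can serve as an oracle when comparing against the shift-and-subtract implementation for individual failing lines from the EE log.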

16-bit (I)DIV:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
val: The value to divide
divisor: The value to divide by
quotient: Quotient result container
remainder: Remainder result container
error: 1 on error(DIV0), 0 when valid.
resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
SHLcycle: The amount of cycles for each SHL.
ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
issigned: Signed division?
quotientnegative: Quotient is signed negative result?
remaindernegative: Remainder is signed negative result?

*/
void CPU8086_internal_DIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
uint_32 temp, temp2, currentquotient; //Remaining value and current divisor!
uint_32 resultquotient;
byte shift; //The shift to apply! No match on 0 shift is done!
temp = val; //Load the value to divide!
*applycycles = 1; //Default: apply the cycles normally!
if (divisor==0) //Not able to divide?
{
*quotient = 0;
*remainder = temp; //Unable to comply!
*error = 1; //Divide by 0 error!
return; //Abort: division by 0!
}

if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
{
SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
*applycycles = 0; //Don't apply the cycles anymore!
}

temp = val; //Load the remainder to use!
resultquotient = 0; //Default: we have nothing after division!
nextstep:
//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
temp2 = divisor; //Load the default divisor for x1!
if (temp2>temp) //Not enough to divide? We're done!
{
goto gotresult; //We've gotten a result!
}
currentquotient = 1; //We're starting with x1 factor!
for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
{
if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
{
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
break; //We've found our shift!
}
temp2 <<= 1; //Shift to the next position!
currentquotient <<= 1; //Shift to the next result!
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
}
if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
//Second step: substract divisor<<n from remainder and increase result with 1<<n.
temp -= temp2; //Substract divisor<<n from remainder!
resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
goto nextstep; //Start the next step!
//Finished when remainder<divisor or remainder==0.
gotresult: //We've gotten a result!
if (temp>((1<<resultbits)-1)) //Modulo overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (resultquotient>((1<<resultbits)-1)) //Quotient overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (issigned) //Check for signed overflow as well?
{
/*
if (checkSignedOverflow(temp,32,resultbits,remaindernegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
*/
if (checkSignedOverflow(resultquotient,32,resultbits,quotientnegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
}
*quotient = resultquotient; //Quotient calculated!
*remainder = temp; //Give the modulo! The result is already calculated!
*error = 0; //We're having a valid result!
}

void CPU8086_internal_IDIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
{
quotientnegative = 1; //We're to toggle the result sign if not zero!
}
if (val&0x80000000) //Negative value to divide?
{
val = ((~val)+1); //Convert the negative value to be positive!
remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
}
if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
{
divisor = ((~divisor)+1); //Convert the divisor to be positive!
}
CPU8086_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
if (*error==0) //No error has occurred? Do post-processing of the results!
{
if (quotientnegative) //The result is negative?
{
*quotient = (~*quotient)+1; //Apply the new sign to the result!
}
if (remaindernegative) //The remainder is negative?
{
*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
}
}
}

AAM:

	CPU8086_internal_DIV(REG_AL,data,&quotient,&remainder,&error,8,2,6,&applycycles,0,0,0);

8-bit DIV to AH/AL:

	CPU8086_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,8,2,6,&applycycles,0,0,0); //Execute the unsigned division! 8-bits result and modulo!

8-bit IDIV to AH/AL:

	valdivd = valdiv;
divisorw = divisor;
if (valdiv&0x8000) valdivd |= 0xFFFF0000; //Sign extend to 32-bits!
if (divisor&0x80) divisorw |= 0xFF00; //Sign extend to 16-bits!
CPU8086_internal_IDIV(valdivd,divisorw,&quotient,&remainder,&error,8,2,6,&applycycles); //Execute the signed division! 8-bits result and modulo!

16-bit DIV to AX/DX:

	CPU8086_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,16,2,6,&applycycles,0,0,0); //Execute the unsigned division! 16-bits result and modulo!

16-bit IDIV to AX/DX:

	CPU8086_internal_IDIV(valdiv,divisor,&quotient,&remainder,&error,16,2,6,&applycycles); //Execute the signed division! 16-bits result and modulo!

32-bit DIV to EAX/EDX:

	CPU80386_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,32,2,6,&applycycles,0,0,0); //Execute the unsigned division! 32-bits result and modulo!

32-bit IDIV to EAX/EDX:

	CPU80386_internal_IDIV(valdiv,divisor,&quotient,&remainder,&error,32,2,6,&applycycles); //Execute the signed division! 32-bits result and modulo!

Attachment: porte9.log (2.57 MiB): POST EE log generated by UniPCemu.


Reply 76 of 178, by superfury


Just asking: what exactly IS the meaning of the different values that are logged in the EE log? What are the values logged with EAX/EDX and PS values? Is PS simply a direct dump of the lower 16 bits of the EFLAGS register(masked to only contain (un)defined bits)? What about the two sets of EAX/EDX values? Are they the EAX/EDX registers before/after the instruction, or something else?


Reply 77 of 178, by superfury


I've managed to improve the under/overflow algorithm to detect better, but somehow your EE log reports #DE when it shouldn't be, according to pure logic?

/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
uint_64 maxpositive,maxnegative;
maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
maxnegative = (1ULL<<(bits-1)); //The highest value we cannot set and get past when negative!
if (unlikely(((unsignedval>maxpositive) && (convertedtopositive==0)) && ((unsignedval>maxnegative) && (convertedtopositive)))) //Signed underflow/overflow on unsinged conversion?
{
return 1; //Underflow/overflow detected!
}
return 0; //OK!
}

Example failing line.
Expected result (according to the reference file, at row 25054):

IDIVDL B EAX=00000080 EDX=00000001 PS=0000 #DE EAX=00000080 EDX=00000001 PS=0000 

UniPCemu's result:

IDIVDL B EAX=00000080 EDX=00000001 PS=0000 EAX=00000080 EDX=00000001 PS=0000 

UniPCemu doesn't detect this correctly somehow? So it's +80/1=+80, which won't fit in the 8-bit signed result, causing a fault?


Reply 78 of 178, by superfury


Managed to fix the IDIV instructions too now, with all variants (as well as normal DIV) functioning properly:

The new and improved x86 (with sign support properly added and working) division algorithms:

Overflow check support for determining Division error:

/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
uint_64 maxpositive,maxnegative;
maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
maxnegative = (1ULL<<(bits-1)); //The highest value we cannot set and get past when negative!
if (unlikely(((unsignedval>maxpositive) && (convertedtopositive==0)) || ((unsignedval>maxnegative) && (convertedtopositive)))) //Signed underflow/overflow on unsinged conversion?
{
return 1; //Underflow/overflow detected!
}
return 0; //OK!
}
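
The difference from the previous attempt is the || between the two cases: with &&, the condition required convertedtopositive to be both zero and non-zero, so it could never report an overflow. The corrected check can be exercised on its own with the IDIVDL example from the earlier post (+80h divided by 1). The sketch below restates the same logic with standard types; the `signed_overflow` name is hypothetical and the unused calculatedbits parameter is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Restatement of the corrected overflow check.
   magnitude: absolute value of the quotient
   bits:      width of the signed destination (8, 16 or 32)
   negative:  non-zero if the quotient will be negated afterwards
   Returns 1 when the signed result does not fit in 'bits' bits. */
int signed_overflow(uint64_t magnitude, unsigned bits, int negative)
{
    assert(bits >= 1 && bits <= 63);
    uint64_t maxpositive = (1ULL << (bits - 1)) - 1; /* e.g. 7Fh for 8 bits */
    uint64_t maxnegative = 1ULL << (bits - 1);       /* e.g. 80h for 8 bits */
    return negative ? (magnitude > maxnegative) : (magnitude > maxpositive);
}
```

For 8 bits, a magnitude of 80h overflows when the quotient is positive (+128 does not fit) but is accepted when it is negative (-128 fits), which is why the example line has to raise #DE.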

16-bit:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
val: The value to divide
divisor: The value to divide by
quotient: Quotient result container
remainder: Remainder result container
error: 1 on error(DIV0), 0 when valid.
resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
SHLcycle: The amount of cycles for each SHL.
ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
issigned: Signed division?
quotientnegative: Quotient is signed negative result?
remaindernegative: Remainder is signed negative result?

*/
void CPU8086_internal_DIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
uint_32 temp, temp2, currentquotient; //Remaining value and current divisor!
uint_32 resultquotient;
byte shift; //The shift to apply! No match on 0 shift is done!
temp = val; //Load the value to divide!
*applycycles = 1; //Default: apply the cycles normally!
if (divisor==0) //Not able to divide?
{
*quotient = 0;
*remainder = temp; //Unable to comply!
*error = 1; //Divide by 0 error!
return; //Abort: division by 0!
}

if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
{
SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
*applycycles = 0; //Don't apply the cycles anymore!
}

temp = val; //Load the remainder to use!
resultquotient = 0; //Default: we have nothing after division!
nextstep:
//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
temp2 = divisor; //Load the default divisor for x1!
if (temp2>temp) //Not enough to divide? We're done!
{
goto gotresult; //We've gotten a result!
}
currentquotient = 1; //We're starting with x1 factor!
for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
{
if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
{
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
break; //We've found our shift!
}
temp2 <<= 1; //Shift to the next position!
currentquotient <<= 1; //Shift to the next result!
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
}
if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
//Second step: subtract divisor<<n from remainder and increase result with 1<<n.
temp -= temp2; //Subtract divisor<<n from remainder!
resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 subtract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
goto nextstep; //Start the next step!
//Finished when remainder<divisor or remainder==0.
gotresult: //We've gotten a result!
if (temp>((1<<resultbits)-1)) //Modulo overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (resultquotient>((1<<resultbits)-1)) //Quotient overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (issigned) //Check for signed overflow as well?
{
/*
if (checkSignedOverflow(temp,32,resultbits,remaindernegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
*/
if (checkSignedOverflow(resultquotient,32,resultbits,quotientnegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
}
*quotient = resultquotient; //Quotient calculated!
*remainder = temp; //Give the modulo! The result is already calculated!
*error = 0; //We're having a valid result!
}
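
The shift-and-subtract loop above can be condensed into a sketch like the following. This is not the emulator's code: cycle accounting is left out, the names `div16_result` and `div_shift_subtract` are hypothetical, and a 64-bit temporary is assumed so that `divisor << shift` cannot overflow.

```c
#include <stdint.h>

struct div16_result {
    int error;          /* 1 = divide error, 0 = valid */
    uint16_t quotient;
    uint16_t remainder;
};

/* Hypothetical sketch: unsigned 32/16 division by repeated shift-and-subtract. */
static struct div16_result div_shift_subtract(uint32_t val, uint16_t divisor)
{
    struct div16_result r = {1, 0, 0};
    uint64_t rem = val;
    uint64_t quot = 0;
    if (divisor == 0) return r; /* division by zero */
    while (rem >= divisor) {
        uint64_t d = divisor; /* divisor << shift */
        uint64_t q = 1;       /* 1 << shift */
        /* Find the largest shift for which (divisor << shift) <= rem. */
        while ((d << 1) <= rem) {
            d <<= 1;
            q <<= 1;
        }
        rem -= d;  /* subtract divisor << shift from the remainder */
        quot += q; /* add 1 << shift to the quotient */
    }
    if (quot > 0xFFFF) return r; /* quotient does not fit in 16 bits */
    r.error = 0;
    r.quotient = (uint16_t)quot;
    r.remainder = (uint16_t)rem; /* rem < divisor here, so it always fits */
    return r;
}
```

Because the loop only stops once the remainder is smaller than the divisor, only the quotient needs a range check in this form.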

void CPU8086_internal_IDIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
quotientnegative = remaindernegative = 0; //Default: don't toggle the result nor the remainder!
if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
{
quotientnegative = 1; //We're to toggle the result sign if not zero!
}
if (val&0x80000000) //Negative value to divide?
{
val = ((~val)+1); //Convert the negative value to be positive!
remaindernegative = 1; //We're to toggle the remainder if any, because the value to divide is negative!
}
if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
{
divisor = ((~divisor)+1); //Convert the divisor to be positive!
}
CPU8086_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
if (*error==0) //No error has occurred? Do post-processing of the results!
{
if (quotientnegative) //The result is negative?
{
*quotient = (~*quotient)+1; //Apply the new sign to the result!
}
if (remaindernegative) //The remainder is negative?
{
*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
}
}
}
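
For comparing results against the wrapper above: x86 IDIV truncates the quotient toward zero and gives the remainder the sign of the dividend, which is also what C99's `/` and `%` do for signed integers. Below is a minimal reference sketch for the 32/16 case. It assumes the documented 80386 quotient range of -32768 to 32767, and the names `idiv16_result` and `idiv16_reference` are hypothetical (this is not UniPCemu code).

```c
#include <stdint.h>

struct idiv16_result {
    int error;          /* 1 = divide error, 0 = valid */
    int16_t quotient;
    int16_t remainder;
};

/* Hypothetical reference: signed 32/16 division using the host's C99 operators. */
static struct idiv16_result idiv16_reference(int32_t dividend, int16_t divisor)
{
    struct idiv16_result r = {1, 0, 0};
    int64_t q, m;
    if (divisor == 0) return r;            /* division by zero */
    /* 64-bit math avoids host overflow for INT32_MIN / -1. */
    q = (int64_t)dividend / divisor;       /* truncates toward zero */
    m = (int64_t)dividend % divisor;       /* sign follows the dividend */
    if (q > 32767 || q < -32768) return r; /* quotient out of range (assumed 80386 range) */
    r.error = 0;
    r.quotient = (int16_t)q;
    r.remainder = (int16_t)m;
    return r;
}
```

For example, -7 / 2 gives quotient -3 and remainder -1, while 7 / -2 gives quotient -3 and remainder 1.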

32-bit:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
val: The value to divide
divisor: The value to divide by
quotient: Quotient result container
remainder: Remainder result container
error: 1 on error(DIV0), 0 when valid.
resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
SHLcycle: The amount of cycles for each SHL.
ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
issigned: Signed division?
quotientnegative: Quotient is signed negative result?
remaindernegative: Remainder is signed negative result?

*/
void CPU80386_internal_DIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
uint_64 temp, temp2, currentquotient; //Remaining value and current divisor!
uint_64 resultquotient;
byte shift; //The shift to apply! No match on 0 shift is done!
temp = val; //Load the value to divide!
*applycycles = 1; //Default: apply the cycles normally!
if (divisor==0) //Not able to divide?
{
*quotient = 0;
*remainder = temp; //Unable to comply!
*error = 1; //Divide by 0 error!
return; //Abort: division by 0!
}

if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
{
SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
*applycycles = 0; //Don't apply the cycles anymore!
}

temp = val; //Load the remainder to use!
resultquotient = 0; //Default: we have nothing after division!
nextstep:
//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
temp2 = divisor; //Load the default divisor for x1!
if (temp2>temp) //Not enough to divide? We're done!
{
goto gotresult; //We've gotten a result!
}
currentquotient = 1; //We're starting with x1 factor!
for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
{
if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
{
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
break; //We've found our shift!
}
temp2 <<= 1; //Shift to the next position!
currentquotient <<= 1; //Shift to the next result!
CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
}
if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
//Second step: subtract divisor<<n from remainder and increase result with 1<<n.
temp -= temp2; //Subtract divisor<<n from remainder!
resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 subtract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
goto nextstep; //Start the next step!
//Finished when remainder<divisor or remainder==0.
gotresult: //We've gotten a result!
if (temp>((1ULL<<resultbits)-1)) //Modulo overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (resultquotient>((1ULL<<resultbits)-1ULL)) //Quotient overflow?
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
if (issigned) //Check for signed overflow as well?
{
/*
if (checkSignedOverflow(temp,64,resultbits,remaindernegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
*/
if (checkSignedOverflow(resultquotient,64,resultbits,quotientnegative))
{
*error = 1; //Raise divide by 0 error due to overflow!
return; //Abort!
}
}
*quotient = resultquotient; //Quotient calculated!
*remainder = temp; //Give the modulo! The result is already calculated!
*error = 0; //We're having a valid result!
}

void CPU80386_internal_IDIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
quotientnegative = remaindernegative = 0; //Default: don't toggle the result nor the remainder!
if (((val>>63)!=(divisor>>31))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
{
quotientnegative = 1; //We're to toggle the result sign if not zero!
}
if (val&0x8000000000000000ULL) //Negative value to divide?
{
val = ((~val)+1); //Convert the negative value to be positive!
remaindernegative = 1; //We're to toggle the remainder if any, because the value to divide is negative!
}
if (divisor&0x80000000) //Negative divisor? Convert to a positive divisor!
{
divisor = ((~divisor)+1); //Convert the divisor to be positive!
}
CPU80386_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
if (*error==0) //No error has occurred? Do post-processing of the results!
{
if (quotientnegative) //The result is negative?
{
*quotient = (~*quotient)+1; //Apply the new sign to the result!
}
if (remaindernegative) //The remainder is negative?
{
*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
}
}
}

Both are functioning without problems now :D

The only things still giving problems now are the arithmetic (SALr etc.) instructions and the IMUL instructions.


Reply 79 of 178, by superfury


Just took a look at the 80386 manual at http://x86.renejeschke.de/ and implemented the wrapping of the counts etc. I've run the test suite again, and after comparing with the EE reference I saw the following:

SAL1=OK
SALi=OK
SALr=Flags problems
SAR1=OK
SARi=OK
SARr=OK
SHR1=OK
SHRi=OK
SHRr=Flags problems
ROL1=OK
ROLi=Flags problems(overflow flag not set?)
ROLr=Carry flag problems
ROR1=OK
RORi=OK
RORr(b/w)=Carry flag problems
RORr(d)=OK
RCL1=OK
RCLi=Overflow flag problems(not set) at word variant only.
RCLr=OK
RCR1=OK
RCRi=Overflow flag problems(not set).
RCRr=Carry flag problems(set/not set).

This is the actual general instruction executed for all of those rotate/shift instructions:

8/16-bit(8086+):

byte op_grp2_8(byte cnt, byte varshift) {
//word d,
INLINEREGISTER word s, shift, tempCF, msb;
INLINEREGISTER byte numcnt;
//word backup;
//if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
numcnt = cnt; //Save count!
s = oper1b;
switch (thereg) {
case 0: //ROL r/m8
if (EMULATED_CPU>=CPU_80386) numcnt &= 7; //Operand size wrap!
else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = ((s&0x80)>>7); //Save MSB!
s = (s << 1)|tempCF;
}
FLAGW_CF(s&1); //Set carry flag!
if (cnt==1) FLAGW_OF(((s >> 7) & 1)^FLAG_CF);
break;

case 1: //ROR r/m8
if (EMULATED_CPU>=CPU_80386) numcnt &= 7; //Operand size wrap!
else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = (s&1); //Save LSB!
s = (s >> 1) | (tempCF << 7);
FLAGW_CF(tempCF); //Set carry flag!
}
if (cnt==1) FLAGW_OF((s >> 7) ^ ((s >> 6) & 1));
break;

case 2: //RCL r/m8
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (EMULATED_CPU>=CPU_80386) numcnt %= 9; //Operand size wrap!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = ((s&0x80)>>7); //Save MSB!
s = (s << 1)|FLAG_CF; //Shift and set CF!
FLAGW_CF(tempCF); //Set CF!
}
if (cnt==1) FLAGW_OF(((s >> 7) & 1)^FLAG_CF); //OF=MSB^CF
break;

case 3: //RCR r/m8
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (EMULATED_CPU>=CPU_80386) numcnt %= 9; //Operand size wrap!
if (cnt==1) FLAGW_OF((s >> 7) ^ FLAG_CF);
for (shift = 1; shift <= numcnt; shift++) {
tempCF = (s&1); //Save LSB!
s = (s >> 1) | (FLAG_CF << 7);
FLAGW_CF(tempCF);
}
break;

case 4: case 6: //SHL r/m8
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
if (s & 0x80) FLAGW_CF(1); else FLAGW_CF(0);
//if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
s = (s << 1) & 0xFF;
		}
if (numcnt==1) { if (FLAG_CF==(s>>7)) FLAGW_OF(0); else FLAGW_OF(1); }
flag_szp8((uint8_t)(s&0xFF)); break;

case 5: //SHR r/m8
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (numcnt==1) { if (s&0x80) FLAGW_OF(1); else FLAGW_OF(0); }
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
FLAGW_CF(s & 1);
//backup = s; //Save backup!
s = s >> 1;
//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
}
flag_szp8((uint8_t)(s & 0xFF)); break;

case 7: //SAR r/m8
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
msb = s & 0x80;
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
FLAGW_CF(s & 1);
//backup = s; //Save backup!
s = (s >> 1) | msb;
//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
}
byte tempSF;
tempSF = FLAG_SF; //Save the SF!
/*flag_szp8((uint8_t)(s & 0xFF));*/
//http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
if (!numcnt) //Nothing done?
{
FLAGW_SF(tempSF); //We don't update when nothing's done!
}
else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
{
flag_szp8(s); //Affect sign as well!
FLAGW_OF(0); //Cleared!
}
else if (numcnt) //Anything shifted at all?
{
flag_szp8(s); //Affect sign as well!
if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
{
FLAGW_OF(0); //Cleared with count as well?
}
}
break;
}
op_grp2_cycles(numcnt, varshift);
return(s & 0xFF);
}
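
One detail that may be worth comparing against the reference log is the zero-count case: Intel's documentation states that a masked count of 0 leaves all flags unchanged, whereas the SHL/SHR paths above call `flag_szp8` regardless of the count. The following is a minimal sketch of SHL r/m8 along the lines of the documented pseudocode, not a confirmed fix. The names `shl8_result` and `shl8_sketch` are hypothetical, and the flags are passed in and out explicitly instead of through the emulator's `FLAGW_*` macros.

```c
#include <stdint.h>

struct shl8_result {
    uint8_t value;
    int cf, of;       /* carry and overflow after the instruction */
    int szp_updated;  /* 1 if SF/ZF/PF should be recomputed from value */
};

/* Hypothetical sketch of SHL r/m8 with a 5-bit count mask (80386 behaviour assumed).
   cf_in/of_in are the flag values before the instruction. */
static struct shl8_result shl8_sketch(uint8_t dest, uint8_t count, int cf_in, int of_in)
{
    struct shl8_result r = {dest, cf_in, of_in, 0};
    unsigned n = count & 0x1F; /* 5-bit count mask */
    unsigned i;
    if (n == 0) return r;      /* masked count 0: value and all flags unchanged */
    for (i = 0; i < n; ++i) {
        r.cf = (r.value >> 7) & 1;         /* bit shifted out of the MSB */
        r.value = (uint8_t)(r.value << 1);
    }
    if (n == 1) r.of = ((r.value >> 7) & 1) ^ r.cf; /* OF = MSB(result) XOR CF */
    /* For n > 1, OF is documented as undefined; this sketch leaves it unchanged. */
    r.szp_updated = 1;
    return r;
}
```

What a real 80386 does with OF for counts above 1 is exactly the kind of thing the test suite log would have to settle; the sketch makes no claim there.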

word op_grp2_16(byte cnt, byte varshift) {
//word d,
INLINEREGISTER uint_32 s, shift, tempCF, msb;
INLINEREGISTER byte numcnt;
//word backup;
//if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
numcnt = cnt; //Save count!
s = oper1;
switch (thereg) {
case 0: //ROL r/m16
if (EMULATED_CPU>=CPU_80386) numcnt &= 0xF; //Operand size wrap!
else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = ((s&0x8000)>>15); //Save MSB!
s = (s << 1)|tempCF;
}
FLAGW_CF(s&1); //Set carry flag!
if (cnt==1) FLAGW_OF(((s >> 15) & 1)^FLAG_CF);
break;

case 1: //ROR r/m16
if (EMULATED_CPU>=CPU_80386) numcnt &= 0xF; //Operand size wrap!
else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = (s&1); //Save LSB!
s = (s >> 1) | (tempCF << 15);
FLAGW_CF(tempCF); //Set carry flag!
}
if (cnt==1) FLAGW_OF((s >> 15) ^ ((s >> 14) & 1));
break;

case 2: //RCL r/m16
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (EMULATED_CPU>=CPU_80386) numcnt %= 17; //Operand size wrap!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = ((s&0x8000)>>15); //Save MSB!
s = (s << 1)|FLAG_CF; //Shift and set CF!
FLAGW_CF(tempCF); //Set CF!
}
if (cnt==1) FLAGW_OF(((s >> 15) & 1)^FLAG_CF); //OF=MSB^CF
break;

case 3: //RCR r/m16
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (EMULATED_CPU>=CPU_80386) numcnt %= 17; //Operand size wrap!
if (cnt==1) FLAGW_OF((s >> 15) ^ FLAG_CF);
for (shift = 1; shift <= numcnt; shift++) {
tempCF = (s&1); //Save LSB!
s = (s >> 1) | (FLAG_CF << 15);
FLAGW_CF(tempCF);
}
break;

case 4: case 6: //SHL r/m16
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
if (s & 0x8000) FLAGW_CF(1); else FLAGW_CF(0);
//if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
s = (s << 1) & 0xFFFF;
}
if (numcnt==1) { if (FLAG_CF==(s>>15)) FLAGW_OF(0); else FLAGW_OF(1); }
flag_szp16((uint16_t)(s&0xFFFF)); break;

case 5: //SHR r/m16
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (numcnt==1) { if (s&0x8000) FLAGW_OF(1); else FLAGW_OF(0); }
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
FLAGW_CF(s & 1);
//backup = s; //Save backup!
s = s >> 1;
//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
}
flag_szp16((uint16_t)(s & 0xFFFF)); break;

case 7: //SAR r/m16
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
msb = s & 0x8000;
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
FLAGW_CF(s & 1);
//backup = s; //Save backup!
s = (s >> 1) | msb;
//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
}
byte tempSF;
tempSF = FLAG_SF; //Save the SF!
/*flag_szp8((uint8_t)(s & 0xFF));*/
//http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
if (!numcnt) //Nothing done?
{
FLAGW_SF(tempSF); //We don't update when nothing's done!
}
else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
{
flag_szp16(s); //Affect sign as well!
FLAGW_OF(0); //Cleared!
}
else if (numcnt) //Anything shifted at all?
{
flag_szp16(s); //Affect sign as well!
if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
{
FLAGW_OF(0); //Cleared with count as well?
}
}
break;
}
op_grp2_cycles(numcnt, varshift);
return(s & 0xFFFF);
}

32-bit:

uint_32 op_grp2_32(byte cnt, byte varshift) {
//word d,
INLINEREGISTER uint_64 s, shift, tempCF, msb;
INLINEREGISTER byte numcnt;
//word backup;
//if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
numcnt = cnt; //Save count!
s = oper1d;
switch (thereg) {
case 0: //ROL r/m32
if (EMULATED_CPU>=CPU_80386) numcnt &= 0x1F; //Operand size wrap!
else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = ((s&0x80000000)>>31); //Save MSB!
s = (s << 1)|tempCF;
}
FLAGW_CF(s&1); //Set carry flag!
if (cnt==1) FLAGW_OF(((s >> 31) & 1)^FLAG_CF);
break;

case 1: //ROR r/m32
if (EMULATED_CPU>=CPU_80386) numcnt &= 0x1F; //Operand size wrap!
else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = (s&1); //Save LSB!
s = (s >> 1) | (tempCF << 31);
FLAGW_CF(tempCF); //Set carry flag!
}
if (cnt==1) FLAGW_OF((s >> 31) ^ ((s >> 30) & 1));
break;

case 2: //RCL r/m32
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
for (shift = 1; shift <= numcnt; shift++) {
tempCF = ((s&0x80000000)>>31); //Save MSB!
s = (s << 1)|FLAG_CF; //Shift and set CF!
FLAGW_CF(tempCF); //Set CF!
}
if (cnt==1) FLAGW_OF(((s >> 31) & 1)^FLAG_CF); //OF=MSB^CF
break;

case 3: //RCR r/m32
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (cnt==1) FLAGW_OF((s >> 31) ^ FLAG_CF);
for (shift = 1; shift <= numcnt; shift++) {
tempCF = (s&1); //Save LSB!
s = (s >> 1) | (FLAG_CF << 31);
FLAGW_CF(tempCF);
}
break;

case 4: case 6: //SHL r/m32
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
if (s & 0x80000000) FLAGW_CF(1); else FLAGW_CF(0);
//if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
s = (s << 1) & 0xFFFFFFFF;
}
if (numcnt==1) { if (FLAG_CF==(s>>31)) FLAGW_OF(0); else FLAGW_OF(1); }
		flag_szp32((uint32_t)(s&0xFFFFFFFF)); break;

case 5: //SHR r/m32
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
if (numcnt==1) { if (s&0x80000000) FLAGW_OF(1); else FLAGW_OF(0); }
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
FLAGW_CF(s & 1);
//backup = s; //Save backup!
s = s >> 1;
//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
}
flag_szp32((uint32_t)(s & 0xFFFFFFFF)); break;

case 7: //SAR r/m32
if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
msb = s & 0x80000000;
//FLAGW_AF(0);
for (shift = 1; shift <= numcnt; shift++) {
FLAGW_CF(s & 1);
//backup = s; //Save backup!
s = (s >> 1) | msb;
//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
}
byte tempSF;
tempSF = FLAG_SF; //Save the SF!
/*flag_szp8((uint8_t)(s & 0xFF));*/
//http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
if (!numcnt) //Nothing done?
{
FLAGW_SF(tempSF); //We don't update when nothing's done!
}
else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
{
flag_szp32(s); //Affect sign as well!
FLAGW_OF(0); //Cleared!
}
else if (numcnt) //Anything shifted at all?
{
flag_szp32(s); //Affect sign as well!
if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
{
FLAGW_OF(0); //Cleared with count as well?
}
}
break;
}
op_grp2_cycles32(numcnt, varshift);
return(s & 0xFFFFFFFF);
}
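
For the rotate cases, here is a minimal sketch of ROL r/m8 that follows the pseudocode in current Intel documentation: the count is masked to 5 bits, the rotation itself uses that masked count modulo 8, CF is written whenever the masked count is non-zero, and OF is only defined for a masked count of 1. The names `rol8_result` and `rol8_sketch` are hypothetical, and whether an actual 80386 matches this for counts of 8 and above is an assumption the test suite log would have to confirm.

```c
#include <stdint.h>

struct rol8_result {
    uint8_t value;
    int cf, of; /* carry and overflow after the instruction */
};

/* Hypothetical sketch of ROL r/m8; cf_in/of_in are the flags before the instruction. */
static struct rol8_result rol8_sketch(uint8_t dest, uint8_t count, int cf_in, int of_in)
{
    struct rol8_result r = {dest, cf_in, of_in};
    unsigned masked = count & 0x1F; /* 5-bit count mask */
    unsigned n = masked % 8;        /* effective rotate count for 8-bit operands */
    unsigned i;
    for (i = 0; i < n; ++i) {
        unsigned msb = (r.value >> 7) & 1;
        r.value = (uint8_t)((r.value << 1) | msb); /* rotate left by one */
    }
    if (masked != 0) r.cf = r.value & 1;                 /* CF = LSB(result) */
    if (masked == 1) r.of = ((r.value >> 7) & 1) ^ r.cf; /* OF = MSB(result) XOR CF */
    /* For masked > 1, OF is documented as undefined; left unchanged here. */
    return r;
}
```

Note the two flag conditions test the 5-bit masked count, not the count after the modulo: a count of 8 leaves the byte unchanged but still updates CF, while a count of 0 touches nothing.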

Can anyone see what's going wrong with those instructions?

Edit: Managed to fix everything up to "ROLi B".
