test386.asm CPU tester

Reply 60 of 178, by peterferrie

Posted on 2017-10-30, 19:48

Rank: Oldbie
Posts: 649
Joined: 2008-05-08, 21:54

superfury wrote:
I'm just asking to be sure: to the CPU, does a 2-operand opcode 69/6B(IMUL r16,imm8/16) even exist? Or does it always decode to 3 operands, with r/m and immediate being multiplied and stored into the reg operand?

Didn't we discuss this previously? The AX register as destination parameter is implicit in the documentation, but always present in reality and decodes in the usual way.

Reply 61 of 178, by superfury

Posted on 2017-10-30, 21:06

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

peterferrie wrote:
superfury wrote:
I'm just asking to be sure: to the CPU, does a 2-operand opcode 69/6B(IMUL r16,imm8/16) even exist? Or does it always decode to 3 operands, with r/m and immediate being multiplied and stored into the reg operand?

Didn't we discuss this previously? The AX register as destination parameter is implicit in the documentation, but always present in reality and decodes in the usual way.

Huh? So the r16 in opcodes 69/6B is forced to be AX or EAX, depending on operand size? So the instruction only uses r/m16/32 and imm8/16/32 and always stores its result in (E)AX? Then what is the reg part of ModR/M used for? That's entirely different from what hottobar said a few posts back (reg16/32 is the destination of the result).

Or do you mean the normal GRP3a/b instructions, which always store in (E)AX?

For some reason the logs keep erroring out on the opcode 69/6B IMUL8/16/32 instructions?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 62 of 178, by peterferrie

Posted on 2017-11-01, 22:56

Rank: Oldbie
Posts: 649
Joined: 2008-05-08, 21:54

I meant that the "two-byte" version always uses (E)AX.
The three-byte version uses whatever register you specify.

Reply 63 of 178, by superfury

Posted on 2017-11-02, 05:42

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

@peterferrie: So, if I understand this correctly:
- If opcodes 69/6B r16 decodes as AX/EAX, then it disassembles as the two-operand version.
- If opcodes 69/6B r16 decodes as other registers, then it disassembles as the three-operand version.

In both cases, behaviour is as the three-operand version?

Is that correct? Or is the two-operand disassembly of opcodes 69/6B an error in the 80386 manuals?

Edit: Btw, what do you mean with "two-byte" and "three-byte"? Two-byte=F7/F8/67/6B and Three-byte=0F opcode variant? Or do you actually mean two-operand and three-operand?

Reply 64 of 178, by superfury

Posted on 2017-11-02, 10:33

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

@hottobar: I've just tried to single-step the DIV0 interrupt that happens at the first DIV instruction of the test suite. I see it doing its stuff with ESI (saving it), calling the printStr function to display the error, modifying the return address, and IRETing to the return address. Then the RETD at the bottom of the handler returns to address 0010:00000000 instead of the original location of the loop that processes the OPs table. So something is going wrong at some point between the table starting to handle the DIV entry (the call ESI instruction) and the return from the IRET handler?

Unfortunately, searching the log isn't that easy: it's a 19.2GB text file containing all debugger information that's logged (the current common log format with memory logging enabled, using the latter (shorthand) method mentioned in my earlier posts on the common log thread):

https://www.dropbox.com/s/00byfb9r02r8z7l/deb … 02_1036.7z?dl=0

Be warned that it requires at least 256MB memory to extract, as I've increased it from the default 64MB to increase the compression ratio(even though it's already at Ultra compression level), but I thought that every increase in compression is usable, due to the huge log file.

Reply 65 of 178, by superfury

Posted on 2017-11-02, 21:15

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

Strangely enough, searching the log with a simple large-file text editor doesn't find any div instructions, for some odd reason?

Thinking about the problem, there must be a problem related to the stack somehow.
The stages should be as follows(assuming no stack changes):
call idivloc(esi)
*pushes eip*
idivloc: div causing int 0
*pushes flags*
*pushes cs*
*pushes eip*
calls log "#DE " and returns
modifies eip at [ESP] (confirmed in disassembly, which is missing from the log somehow?)
iret
*pops modified eip*
*pops original cs*
*pops original eflags*
retd
*pops original eip of test dispatch routine for op table* (reads 0x00000000 EIP from the stack)

All of these should be working correctly: calld/retd because of earlier tests and div(int)/iret because of earlier faults(bounds x2, paging fault).

Is there something different about the div fault handling compared to the other faults? Privilege changes or the like?

Reply 66 of 178, by peterferrie

Posted on 2017-11-02, 23:41

Rank: Oldbie
Posts: 649
Joined: 2008-05-08, 21:54

superfury wrote:
@peterferrie: So, if I understand this correctly:
- If opcodes 69/6B r16 decodes as AX/EAX, then it disassembles as the two-operand version.
- If opcodes 69/6B r16 decodes as other registers, then it disassembles as the three-operand version.

In both cases, behaviour is as the three-operand version?

Yes, that's correct.

superfury wrote:
Edit: Btw, what do you mean with "two-byte" and "three-byte"? Two-byte=F7/F8/67/6B and Three-byte=0F opcode variant? Or do you actually mean two-operand and three-operand?

Yes, I meant two-operand, not two-byte. My mistake.
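
The point confirmed above (opcodes 69/6B always behave as the three-operand form, with the ModR/M reg field naming the destination) can be sketched roughly as follows. This is only an illustration for 16-bit operand size: `Cpu16`, `modrm_reg` and `imul16_rm_imm8` are hypothetical names, not UniPCemu functions, and the r/m value is assumed to have been fetched already.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: index 0 = AX, 1 = CX, 2 = DX, 3 = BX, ... */
typedef struct {
    uint16_t regs[8];
    int carry_overflow; /* IMUL sets CF and OF together */
} Cpu16;

/* Extract the reg field (bits 5..3) of a ModR/M byte. */
static unsigned modrm_reg(uint8_t modrm)
{
    return (modrm >> 3) & 7u;
}

/* Sketch of opcode 6B /r ib with 16-bit operand size:
   reg = r/m16 * sign-extended imm8.
   The narrowing casts rely on two's complement wraparound, which is
   implementation-defined in older C standards but universal in practice. */
static void imul16_rm_imm8(Cpu16 *cpu, uint8_t modrm, uint16_t rm_value, uint8_t imm8)
{
    assert(cpu != 0);
    int32_t product = (int32_t)(int16_t)rm_value * (int32_t)(int8_t)imm8;
    uint16_t truncated = (uint16_t)product;
    cpu->regs[modrm_reg(modrm)] = truncated; /* destination is the reg field, not fixed to AX */
    cpu->carry_overflow = (product != (int32_t)(int16_t)truncated);
}
```

With reg = 0 the destination happens to be AX, which is the case a disassembler may choose to print in two-operand form; the execution path is the same either way.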

Reply 67 of 178, by peterferrie

Posted on 2017-11-02, 23:44

Rank: Oldbie
Posts: 649
Joined: 2008-05-08, 21:54

superfury wrote:
Strangely enough, searching the log with a simple large-file text editor doesn't find any div instructions, for some odd reason?

Thinking about the problem, there must be a problem related to the stack somehow.
The stages should be as follows(assuming no stack changes):
call idivloc(esi)
*pushes eip*
idivloc: div causing int 0
*pushes flags*
*pushes cs*
*pushes eip*
calls log "#DE " and returns
modifies eip at [ESP] (confirmed in disassembly, which is missing from the log somehow?)
iret
*pops modified eip*
*pops original cs*
*pops original eflags*
retd
*pops original eip of test dispatch routine for op table* (reads 0x00000000 EIP from the stack)

All of these should be working correctly: calld/retd because of earlier tests and div(int)/iret because of earlier faults(bounds x2, paging fault).

Is there something different about the div fault handling compared to the other faults? Privilege changes or the like?

The first question - is your stack pointer returned to the value that it should have when the call was made?
I suspect that's where the problem lies. Perhaps something is being left on the stack, like an error code or similar?
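
The stack-balance question can be checked on paper with a toy model of the sequence quoted above. This is a minimal sketch, assuming 32-bit pushes throughout and no privilege change; `ToyStack`, `push32`, `pop32` and `run_div_fault_sequence` are made-up names, and the EFLAGS and CS values are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS]; /* toy stack memory, one slot per dword */
    uint32_t esp;              /* byte offset, grows downwards */
} ToyStack;

static void push32(ToyStack *s, uint32_t v)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

static uint32_t pop32(ToyStack *s)
{
    assert(s->esp + 4 <= STACK_WORDS * 4);
    uint32_t v = s->mem[s->esp / 4];
    s->esp += 4;
    return v;
}

/* Walks CALL -> #DE -> IRET -> RET and returns the EIP that RET pops. */
static uint32_t run_div_fault_sequence(ToyStack *s, uint32_t caller_eip,
                                       uint32_t fault_eip, uint32_t resume_eip)
{
    push32(s, caller_eip);           /* CALL ESI pushes the return address   */
    push32(s, 0x00000002u);          /* #DE pushes EFLAGS (placeholder)      */
    push32(s, 0x00000010u);          /* #DE pushes CS (placeholder selector) */
    push32(s, fault_eip);            /* #DE pushes EIP of the faulting DIV   */
    s->mem[s->esp / 4] = resume_eip; /* handler patches the saved EIP        */
    (void)pop32(s);                  /* IRET pops EIP                        */
    (void)pop32(s);                  /* IRET pops CS                         */
    (void)pop32(s);                  /* IRET pops EFLAGS                     */
    return pop32(s);                 /* RET pops the caller's EIP            */
}
```

If every push and pop moves ESP by the same amount, ESP ends where it started and RET sees the caller's EIP; any push that moves ESP by a different amount makes the final pop read from the wrong slot.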

Reply 68 of 178, by superfury

Posted on 2017-11-03, 01:36

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

There's no error code with a Divide Error exception, so that can't be it. I'll look if I see any stack pointer errors when I have the time.

Reply 69 of 178, by superfury

Posted on 2017-11-03, 11:09

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

I've just looked at the DIV0 interrupt firing: I see some strange stack pushes there(according to Visual Studio debugger):
fbf8=original CS before INT0
fbf4=EFLAGS
f9f4=CS
f9f0=EIP

That is a crazy jump between EFLAGS and CS?

Edit: It's just the 32-bit extension bit (0x8) being used directly as the shift count for 2, instead of first being reduced to the usual 0/1 value (word/dword). That becomes an invalid shift when applied to the stack, moving ESP down by no less than 2<<8=512 bytes :S That happened when pushing CS on the stack, decreasing ESP before writing CS to memory :S
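
A rough sketch of that kind of mistake, assuming the operand-size flag is stored as 0x8 (the names `OPSIZE32_FLAG`, `push_size_buggy`, `push_size_fixed` and `esp_after_push` are invented for illustration and are not taken from UniPCemu):

```c
#include <assert.h>
#include <stdint.h>

#define OPSIZE32_FLAG 0x8u /* assumed encoding of the 32-bit extension bit */

/* Buggy variant: uses the raw flag value as the shift count. */
static uint32_t push_size_buggy(uint32_t flags)
{
    return 2u << (flags & OPSIZE32_FLAG); /* 2 << 8 = 512 when the bit is set */
}

/* Fixed variant: reduce the flag to 0 or 1 before shifting. */
static uint32_t push_size_fixed(uint32_t flags)
{
    uint32_t is32 = (flags & OPSIZE32_FLAG) ? 1u : 0u;
    return 2u << is32; /* 2 bytes for a word push, 4 for a dword push */
}

/* ESP after one push using the fixed size calculation. */
static uint32_t esp_after_push(uint32_t esp, uint32_t flags)
{
    uint32_t size = push_size_fixed(flags);
    assert(esp >= size);
    return esp - size;
}
```

The 512-byte step matches the addresses listed above: 0xFBF4 - 0x200 = 0xF9F4.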

Edit: Having fixed this, it now continues on towards the other tests:D

Edit: After stepping through, I see it properly finishing up now and entering the final HLT at the end of the program (after the return to real mode and loading all registers, clearing the interrupt flag and executing HLT at POST FF). 😁

This is the EE log UniPCemu's generating:

The attachment porte9.log is no longer available

It does seem to give lots of errors from the IMUL8 onwards still? What's going wrong with those instructions? The same applies to the DIV instructions?

Edit: Looking at a simple diff from http://prettydiff.com/ , I see the following instructions failing:

- All IMUL8/16/32 instructions.
- DIVDX W regarding #DE(not faulting).
- DIVEDX W/DIVEDX D regarding #DE(not faulting).
- DIVAX W regarding #DE(not faulting).
- DIVEAX D regarding #DE(not faulting).
- IDIVDL B not always faulting #DE.
- IDIVEDX D not always faulting #DE.
- IDIVEDX D EAX = 80000001 EDX = FFFFFFFF PS = 0000 wrong result?
etc.
- Shift/rotate carry flags problems? :
- SALr B/W/D flags problems?
- SHRr B/W/D flags problems?
- ROLr B/W/D flags problems?
- RORr B/W/D flags problems?
- RCLr B/W/D problems?
- RCRr B/W/D problems?

So, simply said:
- flags problems on the shift/rotate instructions(mostly carry flag as far as I can see?)
- full output problems on IMUL8/16/32 instructions?
- #DE problems with DIV/IDIV instructions?

8086+ opcodes: https://bitbucket.org/superfury/unipcemu/src/ … 086.c?at=master
NEC V30+ opcodes(0F IMUL instruction): https://bitbucket.org/superfury/unipcemu/src/ … V30.c?at=master
32-bit operand size of all 80286- instructions: https://bitbucket.org/superfury/unipcemu/src/ … 386.c?at=master
80386+ 0F instructions: https://bitbucket.org/superfury/unipcemu/src/ … 386.c?at=master

Can you see what's exactly going wrong? Search for OP0FXX or OPXX for the opcode emulation itself(OP6B, OP69, OPF6, OPF7 and OPD*).

Reply 70 of 178, by hottobar

Posted on 2017-11-03, 17:48

Rank: Newbie
Posts: 50
Joined: 2014-04-21, 17:00

superfury wrote:

Edit: Having fixed this, it now continues on towards the other tests:D

Congratulations! 😁

superfury wrote:

Can you see what's exactly going wrong? Search for OP0FXX or OPXX for the opcode emulation itself(OP6B, OP69, OPF6, OPF7 and OPD*).

I quickly read CPU_OP6B(), I didn't find the bug but here's a couple of things that can prevent future bugs.

You can sign extend by doing a type cast, so instead of:

temp2.val64 = (uint_64)immb; //Read unsigned parameter!
if (temp2.val64&0x80ULL) temp2.val64 |= 0xFFFFFFFFFFFFFF00ULL; //Sign extend to 64 bits!

you can simply do:

temp2.val64 = (int_8)immb;

(assuming you've declared int_8 as a signed 8 bit type, like int8_t)
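
For comparison, here is a small self-contained sketch of both approaches using the standard `<stdint.h>` types instead of the emulator's own typedefs. The function names are hypothetical. The `uint8_t` to `int8_t` conversion is implementation-defined in older C standards, though two's complement wraparound is what mainstream compilers do.

```c
#include <assert.h>
#include <stdint.h>

/* Manual sign extension of an 8-bit immediate to 64 bits. */
static uint64_t sext8_manual(uint8_t imm)
{
    uint64_t v = imm;
    if (v & 0x80u) v |= 0xFFFFFFFFFFFFFF00ull;
    return v;
}

/* Cast-based sign extension: uint8_t -> int8_t -> int64_t -> uint64_t. */
static uint64_t sext8_cast(uint8_t imm)
{
    return (uint64_t)(int64_t)(int8_t)imm;
}

/* Asserts that both variants agree for all 256 possible inputs. */
static void check_sext8_variants(void)
{
    for (unsigned i = 0; i < 256; ++i) {
        assert(sext8_manual((uint8_t)i) == sext8_cast((uint8_t)i));
    }
}
```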

I couldn't find the declaration of VAL64Splitter, but I assume it's a union used for type conversion.

temp3.val64s = temp1.val64s * temp2.val64s;

Beware that the use of unions to do a reinterpret_cast of data is undefined behavior.
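
Whether reading the inactive member of a union is well defined differs between C and C++, so one way to sidestep the question in both is to copy the bytes with `memcpy`, and to do the multiplication in the unsigned domain where wraparound is defined. A minimal sketch with hypothetical helper names (not the emulator's VAL64Splitter):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of an unsigned 64-bit value as a signed one. */
static int64_t as_signed64(uint64_t u)
{
    int64_t s;
    assert(sizeof s == sizeof u);
    memcpy(&s, &u, sizeof s); /* int64_t is two's complement without padding */
    return s;
}

/* Signed -> unsigned conversion is fully defined (reduction modulo 2^64). */
static uint64_t as_unsigned64(int64_t s)
{
    return (uint64_t)s;
}

/* Multiply with wraparound instead of signed-overflow undefined behaviour. */
static int64_t mul_wrap64(int64_t a, int64_t b)
{
    return as_signed64(as_unsigned64(a) * as_unsigned64(b));
}
```

Compilers typically reduce the `memcpy` to a plain register move, so this should not cost anything at run time.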

Reply 71 of 178, by superfury

Posted on 2017-11-03, 18:05

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

I know, but I assume that if the data size is constant (e.g. 64-bit integer, 32-bit integer, 16-bit integer, 8-bit integer), it shouldn't be a problem (e.g. uint_64 to int_64 and vice versa)? The only thing that effectively changes is the sign bit becoming a negative value of itself, assuming C/C++ compatibility on IEEE standards?

Reply 72 of 178, by hottobar

Posted on 2017-11-03, 18:36

Rank: Newbie
Posts: 50
Joined: 2014-04-21, 17:00

superfury wrote:
I know, but I assume that if the data size is constant (e.g. 64-bit integer, 32-bit integer, 16-bit integer, 8-bit integer), it shouldn't be a problem (e.g. uint_64 to int_64 and vice versa)? The only thing that effectively changes is the sign bit becoming a negative value of itself, assuming C/C++ compatibility on IEEE standards?

The problem is that compilers exploit undefined behaviours to do optimizations. A compiler can change the way it treats a specific UB without notice and your code could break in unpredictable ways.
See for example how GCC uses the signed overflow UB.

Reply 73 of 178, by superfury

Posted on 2017-11-03, 19:02

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

Aren't signed/unsigned integer values defined as an IEEE standard, thus never changing? If they were to change, lots of files would simply break due to invalid contents? So it should be safe to assume that using a union to convert between signed/unsigned variables won't pose a problem?

Reply 74 of 178, by superfury

Posted on 2017-11-03, 19:35

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

I've improved the DIV/IDIV algorithm a bit. Now the DIV instructions check out correctly. But the IDIV instruction seem to fail somehow?

Changes: https://bitbucket.org/superfury/unipcemu/comm … 82fa270f471bb10

Although it should improve the signed overflow detection, it somehow doesn't fully fix it? Can you see what's going wrong?

The attachment porte9.log is no longer available

(Ignore the enc_temp_folder file; it seems to have been committed with the rest accidentally and is removed in a newer commit.)

Reply 75 of 178, by superfury

Posted on 2017-11-03, 20:50

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

I've improved the overflow detection a bit, but it still doesn't catch all or in some cases seems to catch too many overflows?

/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
	uint_64 maxpositive,maxbit,errorrange;
	if (convertedtopositive) unsignedval = (~unsignedval)+1; //Convert to negative, if needed!
	maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
	maxbit = (1ULL<<(bits-1)); //The highest value we cannot set!
	errorrange = ((1ULL<<bits)-1)-maxpositive; //Lower roof of invalid range!
	if (unlikely((unsignedval>maxpositive) && ((unsignedval<maxbit)||(unsignedval>errorrange)))) //Signed underflow/overflow on unsinged conversion?
	{
		return 1; //Underflow/overflow detected!
	}
	return 0; //OK!
}

32-bit (I)DIV:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
	val: The value to divide
	divisor: The value to divide by
	quotient: Quotient result container
	remainder: Remainder result container
	error: 1 on error(DIV0), 0 when valid.
	resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
	SHLcycle: The amount of cycles for each SHL.
	ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
	issigned: Signed division?
	quotientnegative: Quotient is signed negative result?
	remaindernegative: Remainder is signed negative result?

*/
void CPU80386_internal_DIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
	uint_64 temp, temp2, currentquotient; //Remaining value and current divisor!
	uint_64 resultquotient;
	byte shift; //The shift to apply! No match on 0 shift is done!
	temp = val; //Load the value to divide!
	*applycycles = 1; //Default: apply the cycles normally!
	if (divisor==0) //Not able to divide?
	{
		*quotient = 0;
		*remainder = temp; //Unable to comply!
		*error = 1; //Divide by 0 error!
		return; //Abort: division by 0!
	}

	if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
	{
		SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
		*applycycles = 0; //Don't apply the cycles anymore!
	}

	temp = val; //Load the remainder to use!
	resultquotient = 0; //Default: we have nothing after division! 
	nextstep:
	//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
	temp2 = divisor; //Load the default divisor for x1!
	if (temp2>temp) //Not enough to divide? We're done!
	{
		goto gotresult; //We've gotten a result!
	}
	currentquotient = 1; //We're starting with x1 factor!
	for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
	{
		if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
		{
			CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
			break; //We've found our shift!
		}
		temp2 <<= 1; //Shift to the next position!
		currentquotient <<= 1; //Shift to the next result!
		CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
	}
	if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	//Second step: substract divisor<<n from remainder and increase result with 1<<n.
	temp -= temp2; //Substract divisor<<n from remainder!
	resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
	CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
	goto nextstep; //Start the next step!
	//Finished when remainder<divisor or remainder==0.
	gotresult: //We've gotten a result!
	if ((uint_64)temp>((1ULL<<resultbits)-1)) //Modulo overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	if ((uint_64)resultquotient>((1ULL<<resultbits)-1)) //Quotient overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	if (issigned) //Check for signed overflow as well?
	{
		/*
		if (checkSignedOverflow((uint_64)temp,64,resultbits,remaindernegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
		*/
		if (checkSignedOverflow((uint_64)resultquotient,64,resultbits,quotientnegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
	}
	*quotient = resultquotient; //Quotient calculated!
	*remainder = temp; //Give the modulo! The result is already calculated!
	*error = 0; //We're having a valid result!
}

void CPU80386_internal_IDIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
	byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
	quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
	if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
	{
		quotientnegative = 1; //We're to toggle the result sign if not zero!
	}
	if (val&0x80000000) //Negative value to divide?
	{
		val = ((~val)+1); //Convert the negative value to be positive!
		remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
	}
	if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
	{
		divisor = ((~divisor)+1); //Convert the divisor to be positive!
	}
	CPU80386_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
	if (*error==0) //No error has occurred? Do post-processing of the results!
	{
		if (quotientnegative) //The result is negative?
		{
			*quotient = (~*quotient)+1; //Apply the new sign to the result!
		}
		if (remaindernegative) //The remainder is negative?
		{
			*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
		}
	}
}

16-bit (I)DIV:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
	val: The value to divide
	divisor: The value to divide by
	quotient: Quotient result container
	remainder: Remainder result container
	error: 1 on error(DIV0), 0 when valid.
	resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
	SHLcycle: The amount of cycles for each SHL.
	ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
	issigned: Signed division?
	quotientnegative: Quotient is signed negative result?
	remaindernegative: Remainder is signed negative result?

*/
void CPU8086_internal_DIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
	uint_32 temp, temp2, currentquotient; //Remaining value and current divisor!
	uint_32 resultquotient;
	byte shift; //The shift to apply! No match on 0 shift is done!
	temp = val; //Load the value to divide!
	*applycycles = 1; //Default: apply the cycles normally!
	if (divisor==0) //Not able to divide?
	{
		*quotient = 0;
		*remainder = temp; //Unable to comply!
		*error = 1; //Divide by 0 error!
		return; //Abort: division by 0!
	}

	if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
	{
		SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
		*applycycles = 0; //Don't apply the cycles anymore!
	}

	temp = val; //Load the remainder to use!
	resultquotient = 0; //Default: we have nothing after division! 
	nextstep:
	//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
	temp2 = divisor; //Load the default divisor for x1!
	if (temp2>temp) //Not enough to divide? We're done!
	{
		goto gotresult; //We've gotten a result!
	}
	currentquotient = 1; //We're starting with x1 factor!
	for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
	{
		if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
		{
			CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
			break; //We've found our shift!
		}
		temp2 <<= 1; //Shift to the next position!
		currentquotient <<= 1; //Shift to the next result!
		CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
	}
	if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	//Second step: substract divisor<<n from remainder and increase result with 1<<n.
	temp -= temp2; //Substract divisor<<n from remainder!
	resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
	CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 substract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
	goto nextstep; //Start the next step!
	//Finished when remainder<divisor or remainder==0.
	gotresult: //We've gotten a result!
	if (temp>((1<<resultbits)-1)) //Modulo overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	if (resultquotient>((1<<resultbits)-1)) //Quotient overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	if (issigned) //Check for signed overflow as well?
	{
		/*
		if (checkSignedOverflow(temp,32,resultbits,remaindernegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
		*/
		if (checkSignedOverflow(resultquotient,32,resultbits,quotientnegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
	}
	*quotient = resultquotient; //Quotient calculated!
	*remainder = temp; //Give the modulo! The result is already calculated!
	*error = 0; //We're having a valid result!
}

void CPU8086_internal_IDIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
	byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
	quotientnegative = remaindernegative = 0; //Default: don't toggle the result not remainder!
	if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
	{
		quotientnegative = 1; //We're to toggle the result sign if not zero!
	}
	if (val&0x80000000) //Negative value to divide?
	{
		val = ((~val)+1); //Convert the negative value to be positive!
		remaindernegative = 1; //We're to toggle the remainder is any, because the value to divide is negative!
	}
	if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
	{
		divisor = ((~divisor)+1); //Convert the divisor to be positive!
	}
	CPU8086_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
	if (*error==0) //No error has occurred? Do post-processing of the results!
	{
		if (quotientnegative) //The result is negative?
		{
			*quotient = (~*quotient)+1; //Apply the new sign to the result!
		}
		if (remaindernegative) //The remainder is negative?
		{
			*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
		}
	}
}

AAM:

	CPU8086_internal_DIV(REG_AL,data,&quotient,&remainder,&error,8,2,6,&applycycles,0,0,0);

8-bit DIV to AH/AL:

	CPU8086_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,8,2,6,&applycycles,0,0,0); //Execute the unsigned division! 8-bits result and modulo!

8-bit IDIV to AH/AL:

	valdivd = valdiv;
	divisorw = divisor;
	if (valdiv&0x8000) valdivd |= 0xFFFF0000; //Sign extend to 32-bits!
	if (divisor&0x80) divisorw |= 0xFF00; //Sign extend to 16-bits!
	CPU8086_internal_IDIV(valdivd,divisorw,&quotient,&remainder,&error,8,2,6,&applycycles); //Execute the unsigned division! 8-bits result and modulo!

16-bit DIV to AX/DX:

	CPU8086_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,16,2,6,&applycycles,0,0,0); //Execute the unsigned division! 8-bits result and modulo!

16-bit IDIV to AX/DX:

	CPU8086_internal_IDIV(valdiv,divisor,&quotient,&remainder,&error,16,2,6,&applycycles); //Execute the unsigned division! 8-bits result and modulo!

32-bit DIV to EAX/EDX:

	CPU80386_internal_DIV(valdiv,divisor,&quotient,&remainder,&error,32,2,6,&applycycles,0,0,0); //Execute the unsigned division! 8-bits result and modulo!

32-bit IDIV to EAX/EDX:

	CPU80386_internal_IDIV(valdiv,divisor,&quotient,&remainder,&error,32,2,6,&applycycles); //Execute the unsigned division! 8-bits result and modulo!
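
One thing that may be worth double-checking in the listings above is where the sign bits are read: for a given result width the dividend is twice as wide, so its sign sits at bit 2*bits-1 and the divisor's at bit bits-1. The sketch below derives both from the operand width. It is only an illustration under that assumption, with hypothetical names (`SignedSplit`, `split_signs`), and may or may not match how the emulator ends up handling it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t dividend_mag;  /* |dividend| */
    uint64_t divisor_mag;   /* |divisor|  */
    int quotient_negative;  /* signs of dividend and divisor differ */
    int remainder_negative; /* remainder follows the dividend's sign */
} SignedSplit;

/* bits = 8, 16 or 32: width of divisor and quotient; the dividend is 2*bits wide. */
static SignedSplit split_signs(uint64_t dividend, uint32_t divisor, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    unsigned dbits = 2 * bits;
    uint64_t dmask = (dbits == 64) ? ~0ull : ((1ull << dbits) - 1);
    uint64_t smask = (1ull << bits) - 1;
    uint64_t dv = dividend & dmask;
    uint64_t ds = divisor & smask;
    int dividend_neg = (int)((dv >> (dbits - 1)) & 1);
    int divisor_neg = (int)((ds >> (bits - 1)) & 1);
    SignedSplit r;
    r.dividend_mag = dividend_neg ? ((~dv + 1) & dmask) : dv; /* two's complement negate */
    r.divisor_mag = divisor_neg ? ((~ds + 1) & smask) : ds;
    r.quotient_negative = (dividend_neg != divisor_neg);
    r.remainder_negative = dividend_neg;
    return r;
}
```

The magnitudes can then be fed to an unsigned divide, with the two flags applied to the results afterwards, as the IDIV wrappers above already do.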

Reply 76 of 178, by superfury

Posted on 2017-11-04, 12:22

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

Just asking: what exactly IS the meaning of the different values that are logged in the EE log? What are the values logged with EAX/EDX and PS values? Is PS simply a direct dump of the lower 16 bits of the EFLAGS register(masked to only contain (un)defined bits)? What about the two sets of EAX/EDX values? Are they the EAX/EDX registers before/after the instruction, or something else?

Reply 77 of 178, by superfury

Posted on 2017-11-04, 14:22

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

I've managed to improve the under/overflow algorithm to detect better, but somehow your EE log reports #DE when it shouldn't be, according to pure logic?

/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
	uint_64 maxpositive,maxnegative;
	maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
	maxnegative = (1ULL<<(bits-1)); //The highest value we cannot set and get past when negative!
	if (unlikely(((unsignedval>maxpositive) && (convertedtopositive==0)) && ((unsignedval>maxnegative) && (convertedtopositive)))) //Signed underflow/overflow on unsinged conversion?
	{
		return 1; //Underflow/overflow detected!
	}
	return 0; //OK!
}

Example faulting line:
Actual result it should be(according to the reference file, at row 25054):

IDIVDL B EAX=00000080 EDX=00000001 PS=0000 #DE EAX=00000080 EDX=00000001 PS=0000

UniPCemu's result:

IDIVDL B EAX=00000080 EDX=00000001 PS=0000 EAX=00000080 EDX=00000001 PS=0000

UniPCemu doesn't detect this correctly somehow? So it's +0x80/1=+0x80 (128), which won't fit in the 8-bit signed result (maximum +127), causing a fault?
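
As a cross-check for cases like this one, the 8-bit signed divide can be modelled with the host's native signed division plus a range check. This is a minimal sketch, assuming the documented 80386 quotient range of -128 to +127; `idiv8` is a hypothetical helper and the narrowing casts rely on two's complement wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if IDIV r/m8 would raise #DE; otherwise returns 0 and
   stores the quotient in *al and the remainder in *ah. */
static int idiv8(uint16_t ax, uint8_t divisor, uint8_t *al, uint8_t *ah)
{
    assert(al != 0 && ah != 0);
    int32_t dividend = (int16_t)ax;   /* AX as a signed 16-bit value */
    int32_t d = (int8_t)divisor;      /* divisor as a signed 8-bit value */
    if (d == 0) return 1;             /* divide by zero */
    int32_t q = dividend / d;         /* C99 and later truncate toward zero, like IDIV */
    int32_t r = dividend % d;         /* remainder takes the sign of the dividend */
    if (q < -128 || q > 127) return 1; /* quotient does not fit in AL */
    *al = (uint8_t)q;
    *ah = (uint8_t)r;
    return 0;
}
```

For the reference line above, `idiv8(0x0080, 0x01, ...)` reports a fault because +128 exceeds +127, whereas AX=FF80 (-128) divided by 1 fits.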

Reply 78 of 178, by superfury

Posted on 2017-11-04, 21:44

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

Managed to fix the IDIV instructions too now, with all variants (as well as normal DIV) functioning properly:

The new and improved x86 (with sign support properly added and working) division algorithms:

Overflow check support for determining Division error:

/*

checkSignedOverflow: Checks if a signed overflow occurs trying to store the data.
unsignedval: The unsigned, positive value
calculatedbits: The amount of bits that's stored in unsignedval.
bits: The amount of bits to store in.
convertedtopositive: The unsignedval is a positive conversion from a negative result, so needs to be converted back.

*/

//Based on http://www.ragestorm.net/blogs/?p=34

byte checkSignedOverflow(uint_64 unsignedval, byte calculatedbits, byte bits, byte convertedtopositive)
{
	uint_64 maxpositive,maxnegative;
	maxpositive = ((1ULL<<(bits-1))-1); //Maximum positive value we can have!
	maxnegative = (1ULL<<(bits-1)); //The highest value we cannot set and get past when negative!
	if (unlikely(((unsignedval>maxpositive) && (convertedtopositive==0)) || ((unsignedval>maxnegative) && (convertedtopositive)))) //Signed underflow/overflow on unsigned conversion?
	{
		return 1; //Underflow/overflow detected!
	}
	return 0; //OK!
}
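The magnitude check can also be tried in isolation. The following is only a sketch with standard C types; signed_magnitude_overflows is a hypothetical name, and it drops the unused calculatedbits parameter:

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the magnitude check: `magnitude` is the
   absolute value of the quotient, `negative` says whether the real quotient
   is negative, `bits` is the destination width (8, 16 or 32). */
static int signed_magnitude_overflows(uint64_t magnitude, unsigned bits, int negative)
{
    uint64_t max_positive = (1ULL << (bits - 1)) - 1; /* e.g. 0x7F for 8 bits */
    uint64_t max_negative = (1ULL << (bits - 1));     /* e.g. 0x80 for 8 bits */
    return negative ? (magnitude > max_negative) : (magnitude > max_positive);
}
```

A magnitude of 0x80 overflows an 8-bit result when the quotient is positive (+128), but not when it is negative (-128).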

16-bit:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
	val: The value to divide
	divisor: The value to divide by
	quotient: Quotient result container
	remainder: Remainder result container
	error: 1 on error(DIV0), 0 when valid.
	resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
	SHLcycle: The amount of cycles for each SHL.
	ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
	issigned: Signed division?
	quotientnegative: Quotient is signed negative result?
	remaindernegative: Remainder is signed negative result?

*/
void CPU8086_internal_DIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
	uint_32 temp, temp2, currentquotient; //Remaining value and current divisor!
	uint_32 resultquotient;
	byte shift; //The shift to apply! No match on 0 shift is done!
	temp = val; //Load the value to divide!
	*applycycles = 1; //Default: apply the cycles normally!
	if (divisor==0) //Not able to divide?
	{
		*quotient = 0;
		*remainder = temp; //Unable to comply!
		*error = 1; //Divide by 0 error!
		return; //Abort: division by 0!
	}

	if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
	{
		SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
		*applycycles = 0; //Don't apply the cycles anymore!
	}

	temp = val; //Load the remainder to use!
	resultquotient = 0; //Default: we have nothing after division! 
	nextstep:
	//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
	temp2 = divisor; //Load the default divisor for x1!
	if (temp2>temp) //Not enough to divide? We're done!
	{
		goto gotresult; //We've gotten a result!
	}
	currentquotient = 1; //We're starting with x1 factor!
	for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
	{
		if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
		{
			CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
			break; //We've found our shift!
		}
		temp2 <<= 1; //Shift to the next position!
		currentquotient <<= 1; //Shift to the next result!
		CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
	}
	if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	//Second step: subtract divisor<<n from remainder and increase result with 1<<n.
	temp -= temp2; //Subtract divisor<<n from remainder!
	resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
	CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 subtract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
	goto nextstep; //Start the next step!
	//Finished when remainder<divisor or remainder==0.
	gotresult: //We've gotten a result!
	if (temp>((1<<resultbits)-1)) //Modulo overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!		
	}
	if (resultquotient>((1<<resultbits)-1)) //Quotient overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!		
	}
	if (issigned) //Check for signed overflow as well?
	{
		/*
		if (checkSignedOverflow(temp,32,resultbits,remaindernegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
		*/
		if (checkSignedOverflow(resultquotient,32,resultbits,quotientnegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
	}
	*quotient = resultquotient; //Quotient calculated!
	*remainder = temp; //Give the modulo! The result is already calculated!
	*error = 0; //We're having a valid result!
}

void CPU8086_internal_IDIV(uint_32 val, word divisor, word *quotient, word *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
	byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
	quotientnegative = remaindernegative = 0; //Default: don't toggle the result nor remainder!
	if (((val>>31)!=(divisor>>15))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
	{
		quotientnegative = 1; //We're to toggle the result sign if not zero!
	}
	if (val&0x80000000) //Negative value to divide?
	{
		val = ((~val)+1); //Convert the negative value to be positive!
		remaindernegative = 1; //We're to toggle the remainder if any, because the value to divide is negative!
	}
	if (divisor&0x8000) //Negative divisor? Convert to a positive divisor!
	{
		divisor = ((~divisor)+1); //Convert the divisor to be positive!
	}
	CPU8086_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
	if (*error==0) //No error has occurred? Do post-processing of the results!
	{
		if (quotientnegative) //The result is negative?
		{
			*quotient = (~*quotient)+1; //Apply the new sign to the result!
		}
		if (remaindernegative) //The remainder is negative?
		{
			*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
		}
	}
}
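Stripped of the cycle accounting, the shift-and-subtract idea in the DIV routine can be sketched on its own. This is a simplified illustration with standard C types, not the emulator's routine; shift_subtract_div and its parameters are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical sketch: unsigned division by repeated shift-and-subtract.
   Returns 0 on success, 1 on divide-by-zero or when the quotient does not
   fit in result_bits (the cases that would raise #DE). */
static int shift_subtract_div(uint32_t value, uint16_t divisor, unsigned result_bits,
                              uint32_t *quotient, uint32_t *remainder)
{
    uint64_t rem = value; /* running remainder */
    uint64_t q = 0;       /* accumulated quotient */
    if (divisor == 0) return 1;
    while (rem >= divisor) {
        uint64_t d = divisor; /* divisor << n */
        uint64_t bit = 1;     /* 1 << n */
        /* Find the largest n with (divisor << n) <= rem. */
        while ((d << 1) <= rem) { d <<= 1; bit <<= 1; }
        rem -= d;  /* subtract divisor << n from the remainder */
        q += bit;  /* add 1 << n to the quotient */
    }
    if (q > ((1ULL << result_bits) - 1)) return 1; /* quotient overflow */
    *quotient = (uint32_t)q;
    *remainder = (uint32_t)rem;
    return 0;
}
```

Using 64-bit temporaries keeps divisor << n from wrapping when the dividend uses all 32 bits.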

32-bit:

//Universal DIV instruction for x86 DIV instructions!
/*

Parameters:
	val: The value to divide
	divisor: The value to divide by
	quotient: Quotient result container
	remainder: Remainder result container
	error: 1 on error(DIV0), 0 when valid.
	resultbits: The amount of bits the result contains(16 or 8 on 8086) of quotient and remainder.
	SHLcycle: The amount of cycles for each SHL.
	ADDSUBcycle: The amount of cycles for ADD&SUB instruction to execute.
	issigned: Signed division?
	quotientnegative: Quotient is signed negative result?
	remaindernegative: Remainder is signed negative result?

*/
void CPU80386_internal_DIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles, byte issigned, byte quotientnegative, byte remaindernegative)
{
	uint_64 temp, temp2, currentquotient; //Remaining value and current divisor!
	uint_64 resultquotient;
	byte shift; //The shift to apply! No match on 0 shift is done!
	temp = val; //Load the value to divide!
	*applycycles = 1; //Default: apply the cycles normally!
	if (divisor==0) //Not able to divide?
	{
		*quotient = 0;
		*remainder = temp; //Unable to comply!
		*error = 1; //Divide by 0 error!
		return; //Abort: division by 0!
	}

	if (CPU_apply286cycles()) /* No 80286+ cycles instead? */
	{
		SHLcycle = ADDSUBcycle = 0; //Don't apply the cycle counts for this instruction!
		*applycycles = 0; //Don't apply the cycles anymore!
	}

	temp = val; //Load the remainder to use!
	resultquotient = 0; //Default: we have nothing after division! 
	nextstep:
	//First step: calculate shift so that (divisor<<shift)<=remainder and ((divisor<<(shift+1))>remainder)
	temp2 = divisor; //Load the default divisor for x1!
	if (temp2>temp) //Not enough to divide? We're done!
	{
		goto gotresult; //We've gotten a result!
	}
	currentquotient = 1; //We're starting with x1 factor!
	for (shift=0;shift<(resultbits+1);++shift) //Check for the biggest factor to apply(we're going from bit 0 to maxbit)!
	{
		if ((temp2<=temp) && ((temp2<<1)>temp)) //Found our value to divide?
		{
			CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 more SHL cycle for this!
			break; //We've found our shift!
		}
		temp2 <<= 1; //Shift to the next position!
		currentquotient <<= 1; //Shift to the next result!
		CPU[activeCPU].cycles_OP += SHLcycle; //We're taking 1 SHL cycle for this! Assuming parallel shifting!
	}
	if (shift==(resultbits+1)) //We've overflown? We're too large to divide!
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!
	}
	//Second step: subtract divisor<<n from remainder and increase result with 1<<n.
	temp -= temp2; //Subtract divisor<<n from remainder!
	resultquotient += currentquotient; //Increase result(divided value) with the found power of 2 (1<<n).
	CPU[activeCPU].cycles_OP += ADDSUBcycle; //We're taking 1 subtract and 1 addition cycle for this(ADD/SUB register take 3 cycles)!
	goto nextstep; //Start the next step!
	//Finished when remainder<divisor or remainder==0.
	gotresult: //We've gotten a result!
	if (temp>((1ULL<<resultbits)-1)) //Modulo overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!		
	}
	if (resultquotient>((1ULL<<resultbits)-1ULL)) //Quotient overflow?
	{
		*error = 1; //Raise divide by 0 error due to overflow!
		return; //Abort!		
	}
	if (issigned) //Check for signed overflow as well?
	{
		/*
		if (checkSignedOverflow(temp,64,resultbits,remaindernegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
		*/
		if (checkSignedOverflow(resultquotient,64,resultbits,quotientnegative))
		{
			*error = 1; //Raise divide by 0 error due to overflow!
			return; //Abort!
		}
	}
	*quotient = resultquotient; //Quotient calculated!
	*remainder = temp; //Give the modulo! The result is already calculated!
	*error = 0; //We're having a valid result!
}

void CPU80386_internal_IDIV(uint_64 val, uint_32 divisor, uint_32 *quotient, uint_32 *remainder, byte *error, byte resultbits, byte SHLcycle, byte ADDSUBcycle, byte *applycycles)
{
	byte quotientnegative, remaindernegative; //To toggle the result and apply sign after and before?
	quotientnegative = remaindernegative = 0; //Default: don't toggle the result nor remainder!
	if (((val>>63)!=(divisor>>31))) //Are we to change signs on the result? The result is negative instead! (We're a +/- or -/+ division)
	{
		quotientnegative = 1; //We're to toggle the result sign if not zero!
	}
	if (val&0x8000000000000000ULL) //Negative value to divide?
	{
		val = ((~val)+1); //Convert the negative value to be positive!
		remaindernegative = 1; //We're to toggle the remainder if any, because the value to divide is negative!
	}
	if (divisor&0x80000000) //Negative divisor? Convert to a positive divisor!
	{
		divisor = ((~divisor)+1); //Convert the divisor to be positive!
	}
	CPU80386_internal_DIV(val,divisor,quotient,remainder,error,resultbits,SHLcycle,ADDSUBcycle,applycycles,1,quotientnegative,remaindernegative); //Execute the division as an unsigned division!
	if (*error==0) //No error has occurred? Do post-processing of the results!
	{
		if (quotientnegative) //The result is negative?
		{
			*quotient = (~*quotient)+1; //Apply the new sign to the result!
		}
		if (remaindernegative) //The remainder is negative?
		{
			*remainder = (~*remainder)+1; //Apply the new sign to the remainder!
		}
	}
}
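The sign handling in the IDIV wrappers boils down to two rules: the quotient is negative when the operand signs differ, and the remainder takes the sign of the dividend. A small sketch of just that part, using the host's unsigned division and leaving out the range check (signed_div_via_unsigned is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical sketch: signed division done on magnitudes, with the signs
   applied afterwards. Returns 1 on divide-by-zero, 0 otherwise. The results
   are widened to 64 bits so no range check is needed here. */
static int signed_div_via_unsigned(int32_t dividend, int16_t divisor,
                                   int64_t *quotient, int64_t *remainder)
{
    int quotient_negative = (dividend < 0) != (divisor < 0); /* signs differ */
    int remainder_negative = (dividend < 0);                 /* follows the dividend */
    uint32_t n = (dividend < 0) ? (0u - (uint32_t)dividend) : (uint32_t)dividend;
    uint32_t d = (divisor < 0) ? (uint32_t)(-(int32_t)divisor) : (uint32_t)divisor;
    uint32_t q, r;
    if (d == 0) return 1;
    q = n / d;
    r = n % d;
    *quotient = quotient_negative ? -(int64_t)q : (int64_t)q;
    *remainder = remainder_negative ? -(int64_t)r : (int64_t)r;
    return 0;
}
```

For example, -7/2 gives quotient -3 and remainder -1, while 7/-2 gives quotient -3 and remainder +1.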

Both are functioning without problems now :D

The only things still giving problems are the shift/rotate (SALr etc.) instructions and the IMUL instructions.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 79 of 178, by superfury

Posted on 2017-11-05, 14:00

superfury Offline

Rank: l33t++
Posts: 5858
Joined: 2014-03-08, 11:25
Location: Netherlands

Just took a look at the 80386 manual at http://x86.renejeschke.de/ and implemented the wrapping of the counts etc. I've run the test suite again, and after comparing with the EE reference I saw the following:

SAL1=OK
SALi=OK
SALr=Flags problems
SAR1=OK
SARi=OK
SARr=OK
SHR1=OK
SHRi=OK
SHRr=Flags problems
ROL1=OK
ROLi=Flags problems(overflow flag not set?)
ROLr=Carry flag problems
ROR1=OK
RORi=OK
RORr(b/w)=Carry flag problems
RORr(d)=OK
RCL1=OK
RCLi=Overflow flag problems(not set) at word variant only.
RCLr=OK
RCR1=OK
RCRi=Overflow flag problems(not set).
RCRr=Carry flag problems(set/not set).
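For reference, the count wrapping mentioned above can be sketched as one small helper. This follows my reading of the pseudo-code in the later Intel manuals (an assumption; the names grp2_kind and effective_count are hypothetical): the count is first masked to 5 bits, then rotates reduce it modulo the operand size and rotates through carry modulo the operand size plus one.

```c
#include <stdint.h>

/* Hypothetical classification of the GRP2 operations. */
enum grp2_kind { GRP2_ROTATE, GRP2_ROTATE_CARRY, GRP2_SHIFT };

/* Number of bit positions actually moved on a 386-style CPU (assumed rules):
   ROL/ROR: (count & 0x1F) % size, RCL/RCR: (count & 0x1F) % (size + 1),
   SHL/SHR/SAR: count & 0x1F. */
static unsigned effective_count(enum grp2_kind kind, unsigned size_bits, uint8_t count)
{
    unsigned masked = count & 0x1F; /* 5-bit mask applied first */
    switch (kind) {
    case GRP2_ROTATE:       return masked % size_bits;
    case GRP2_ROTATE_CARRY: return masked % (size_bits + 1);
    default:                return masked;
    }
}
```

Note that, under the same assumption, whether the flags are touched at all depends on the masked 5-bit count, not on this reduced count.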

This is the general handler that's executed for all of those rotate/shift instructions:

8/16-bit(8086+):

byte op_grp2_8(byte cnt, byte varshift) {
	//word d,
	INLINEREGISTER word s, shift, tempCF, msb;
	INLINEREGISTER byte numcnt;
	//word backup;
	//if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
	numcnt = cnt; //Save count!
	s = oper1b;
	switch (thereg) {
	case 0: //ROL r/m8
		if (EMULATED_CPU>=CPU_80386) numcnt &= 7; //Operand size wrap!
		else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = ((s&0x80)>>7); //Save MSB!
			s = (s << 1)|tempCF;
		}
		FLAGW_CF(s&1); //Set carry flag!
		if (cnt==1) FLAGW_OF(((s >> 7) & 1)^FLAG_CF);
		break;

	case 1: //ROR r/m8
		if (EMULATED_CPU>=CPU_80386) numcnt &= 7; //Operand size wrap!
		else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = (s&1); //Save LSB!
			s = (s >> 1) | (tempCF << 7);
			FLAGW_CF(tempCF); //Set carry flag!
		}
		if (cnt==1) FLAGW_OF((s >> 7) ^ ((s >> 6) & 1));
		break;

	case 2: //RCL r/m8
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (EMULATED_CPU>=CPU_80386) numcnt %= 9; //Operand size wrap!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = ((s&0x80)>>7); //Save MSB!
			s = (s << 1)|FLAG_CF; //Shift and set CF!
			FLAGW_CF(tempCF); //Set CF!
		}
		if (cnt==1) FLAGW_OF(((s >> 7) & 1)^FLAG_CF); //OF=MSB^CF
		break;

	case 3: //RCR r/m8
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (EMULATED_CPU>=CPU_80386) numcnt %= 9; //Operand size wrap!
		if (cnt==1) FLAGW_OF((s >> 7) ^ FLAG_CF);
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = (s&1); //Save LSB!
			s = (s >> 1) | (FLAG_CF << 7);
			FLAGW_CF(tempCF);
		}
		break;

	case 4: case 6: //SHL r/m8
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			if (s & 0x80) FLAGW_CF(1); else FLAGW_CF(0);
			//if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
			s = (s << 1) & 0xFF;
		}
		if (numcnt==1) { if (FLAG_CF==(s>>7)) FLAGW_OF(0); else FLAGW_OF(1); }
		flag_szp8((uint8_t)(s&0xFF)); break;

	case 5: //SHR r/m8
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (numcnt==1) { if (s&0x80) FLAGW_OF(1); else FLAGW_OF(0); }
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			FLAGW_CF(s & 1);
			//backup = s; //Save backup!
			s = s >> 1;
			//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
		}
		flag_szp8((uint8_t)(s & 0xFF)); break;

	case 7: //SAR r/m8
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		msb = s & 0x80;
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			FLAGW_CF(s & 1);
			//backup = s; //Save backup!
			s = (s >> 1) | msb;
			//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
		}
		byte tempSF;
		tempSF = FLAG_SF; //Save the SF!
		/*flag_szp8((uint8_t)(s & 0xFF));*/
		//http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
		if (!numcnt) //Nothing done?
		{
			FLAGW_SF(tempSF); //We don't update when nothing's done!
		}
		else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
		{
			flag_szp8(s); //Affect sign as well!
			FLAGW_OF(0); //Cleared!
		}
		else if (numcnt) //Anything shifted at all?
		{
			flag_szp8(s); //Affect sign as well!
			if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
			{
				FLAGW_OF(0); //Cleared with count as well?
			}
		}
		break;
	}
	op_grp2_cycles(numcnt, varshift);
	return(s & 0xFF);
}

word op_grp2_16(byte cnt, byte varshift) {
	//word d,
	INLINEREGISTER uint_32 s, shift, tempCF, msb;
	INLINEREGISTER byte numcnt;
	//word backup;
	//if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
	numcnt = cnt; //Save count!
	s = oper1;
	switch (thereg) {
	case 0: //ROL r/m16
		if (EMULATED_CPU>=CPU_80386) numcnt &= 0xF; //Operand size wrap!
		else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = ((s&0x8000)>>15); //Save MSB!
			s = (s << 1)|tempCF;
		}
		FLAGW_CF(s&1); //Set carry flag!
		if (cnt==1) FLAGW_OF(((s >> 15) & 1)^FLAG_CF);
		break;

	case 1: //ROR r/m16
		if (EMULATED_CPU>=CPU_80386) numcnt &= 0xF; //Operand size wrap!
		else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = (s&1); //Save LSB!
			s = (s >> 1) | (tempCF << 15);
			FLAGW_CF(tempCF); //Set carry flag!
		}
		if (cnt==1) FLAGW_OF((s >> 15) ^ ((s >> 14) & 1));
		break;

	case 2: //RCL r/m16
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (EMULATED_CPU>=CPU_80386) numcnt %= 17; //Operand size wrap!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = ((s&0x8000)>>15); //Save MSB!
			s = (s << 1)|FLAG_CF; //Shift and set CF!
			FLAGW_CF(tempCF); //Set CF!
		}
		if (cnt==1) FLAGW_OF(((s >> 15) & 1)^FLAG_CF); //OF=MSB^CF
		break;

	case 3: //RCR r/m16
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (EMULATED_CPU>=CPU_80386) numcnt %= 17; //Operand size wrap!
		if (cnt==1) FLAGW_OF((s >> 15) ^ FLAG_CF);
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = (s&1); //Save LSB!
			s = (s >> 1) | (FLAG_CF << 15);
			FLAGW_CF(tempCF);
		}
		break;

	case 4: case 6: //SHL r/m16
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			if (s & 0x8000) FLAGW_CF(1); else FLAGW_CF(0);
			//if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
			s = (s << 1) & 0xFFFF;
		}
		if (numcnt==1) { if (FLAG_CF==(s>>15)) FLAGW_OF(0); else FLAGW_OF(1); }
		flag_szp16((uint16_t)(s&0xFFFF)); break;

	case 5: //SHR r/m16
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (numcnt==1) { if (s&0x8000) FLAGW_OF(1); else FLAGW_OF(0); }
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			FLAGW_CF(s & 1);
			//backup = s; //Save backup!
			s = s >> 1;
			//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
		}
		flag_szp16((uint16_t)(s & 0xFFFF)); break;

	case 7: //SAR r/m16
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		msb = s & 0x8000;
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			FLAGW_CF(s & 1);
			//backup = s; //Save backup!
			s = (s >> 1) | msb;
			//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
		}
		byte tempSF;
		tempSF = FLAG_SF; //Save the SF!
		/*flag_szp8((uint8_t)(s & 0xFF));*/
		//http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
		if (!numcnt) //Nothing done?
		{
			FLAGW_SF(tempSF); //We don't update when nothing's done!
		}
		else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
		{
			flag_szp16(s); //Affect sign as well!
			FLAGW_OF(0); //Cleared!
		}
		else if (numcnt) //Anything shifted at all?
		{
			flag_szp16(s); //Affect sign as well!
			if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
			{
				FLAGW_OF(0); //Cleared with count as well?
			}
		}
		break;
	}
	op_grp2_cycles(numcnt, varshift);
	return(s & 0xFFFF);
}

32-bit:

uint_32 op_grp2_32(byte cnt, byte varshift) {
	//word d,
	INLINEREGISTER uint_64 s, shift, tempCF, msb;
	INLINEREGISTER byte numcnt;
	//word backup;
	//if (cnt>0x8) return(oper1b); //NEC V20/V30+ limits shift count
	numcnt = cnt; //Save count!
	s = oper1d;
	switch (thereg) {
	case 0: //ROL r/m32
		if (EMULATED_CPU>=CPU_80386) numcnt &= 0x1F; //Operand size wrap!
		else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = ((s&0x80000000)>>31); //Save MSB!
			s = (s << 1)|tempCF;
		}
		FLAGW_CF(s&1); //Set carry flag!
		if (cnt==1) FLAGW_OF(((s >> 31) & 1)^FLAG_CF);
		break;

	case 1: //ROR r/m32
		if (EMULATED_CPU>=CPU_80386) numcnt &= 0x1F; //Operand size wrap!
		else if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = (s&1); //Save LSB!
			s = (s >> 1) | (tempCF << 31);
			FLAGW_CF(tempCF); //Set carry flag!
		}
		if (cnt==1) FLAGW_OF((s >> 31) ^ ((s >> 30) & 1));
		break;

	case 2: //RCL r/m32
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = ((s&0x80000000)>>31); //Save MSB!
			s = (s << 1)|FLAG_CF; //Shift and set CF!
			FLAGW_CF(tempCF); //Set CF!
		}
		if (cnt==1) FLAGW_OF(((s >> 31) & 1)^FLAG_CF); //OF=MSB^CF
		break;

	case 3: //RCR r/m32
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (cnt==1) FLAGW_OF((s >> 31) ^ FLAG_CF);
		for (shift = 1; shift <= numcnt; shift++) {
			tempCF = (s&1); //Save LSB!
			s = (s >> 1) | (FLAG_CF << 31);
			FLAGW_CF(tempCF);
		}
		break;

	case 4: case 6: //SHL r/m32
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			if (s & 0x80000000) FLAGW_CF(1); else FLAGW_CF(0);
			//if (s & 0x8) FLAGW_AF(1); //Auxiliary carry?
			s = (s << 1) & 0xFFFFFFFF;
		}
		if (numcnt==1) { if (FLAG_CF==(s>>31)) FLAGW_OF(0); else FLAGW_OF(1); }
		flag_szp32((uint32_t)(s&0xFFFFFFFF)); break;

	case 5: //SHR r/m32
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		if (numcnt==1) { if (s&0x80000000) FLAGW_OF(1); else FLAGW_OF(0); }
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			FLAGW_CF(s & 1);
			//backup = s; //Save backup!
			s = s >> 1;
			//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
		}
		flag_szp32((uint32_t)(s & 0xFFFFFFFF)); break;

	case 7: //SAR r/m32
		if (EMULATED_CPU >= CPU_NECV30) numcnt &= 0x1F; //Clear the upper 3 bits to become a NEC V20/V30+!
		msb = s & 0x80000000;
		//FLAGW_AF(0);
		for (shift = 1; shift <= numcnt; shift++) {
			FLAGW_CF(s & 1);
			//backup = s; //Save backup!
			s = (s >> 1) | msb;
			//if (((backup^s)&0x10)) FLAGW_AF(1); //Auxiliary carry?
		}
		byte tempSF;
		tempSF = FLAG_SF; //Save the SF!
		/*flag_szp8((uint8_t)(s & 0xFF));*/
		//http://www.electronics.dit.ie/staff/tscarff/8086_instruction_set/8086_instruction_set.html#SAR says only C and O flags!
		if (!numcnt) //Nothing done?
		{
			FLAGW_SF(tempSF); //We don't update when nothing's done!
		}
		else if (numcnt==1) //Overflow is cleared on all 1-bit shifts!
		{
			flag_szp32(s); //Affect sign as well!
			FLAGW_OF(0); //Cleared!
		}
		else if (numcnt) //Anything shifted at all?
		{
			flag_szp32(s); //Affect sign as well!
			if (EMULATED_CPU<=CPU_NECV30) //Valid to update OF?
			{
				FLAGW_OF(0); //Cleared with count as well?
			}
		}
		break;
	}
	op_grp2_cycles32(numcnt, varshift);
	return(s & 0xFFFFFFFF);
}

Can anyone see what's going wrong with those instructions?

Edit: Managed to fix until "ROLi B".
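For comparison, here is a small stand-alone sketch of how I read the ROL r/m8 flag rules in the later Intel manuals. This is an assumption, not verified against the EE reference or real 80386 hardware, and rol8 and rol8_result are hypothetical names. The points that differ from the listing above: the 5-bit masked count (not the raw count) decides whether CF and OF are touched, CF is only written when that masked count is non-zero, and it is still written when the count reduces to zero modulo 8.

```c
#include <stdint.h>

/* Hypothetical result container: rotated value plus the CF/OF flags. */
typedef struct { uint8_t result; int cf; int of; } rol8_result;

/* Sketch of ROL r/m8 under the assumed rules:
   - count is masked to 5 bits, then reduced modulo 8 for the rotation itself
   - CF = LSB of the result whenever the masked count is non-zero
   - OF = MSB(result) XOR CF only for a masked count of 1; it is left
     unchanged otherwise, since the manuals call it undefined there */
static rol8_result rol8(uint8_t value, uint8_t count, int cf_in, int of_in)
{
    rol8_result r = { value, cf_in, of_in };
    unsigned masked = count & 0x1F;
    unsigned rot = masked % 8;
    if (rot) r.result = (uint8_t)((value << rot) | (value >> (8 - rot)));
    if (masked) {
        r.cf = r.result & 1;
        if (masked == 1) r.of = ((r.result >> 7) & 1) ^ r.cf;
    }
    return r;
}
```

With this reading, rotating 0x01 by 8 leaves the value unchanged but still sets CF, while a count of 32 masks to 0 and leaves both value and flags alone.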

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io
