VOGONS


First post, by superfury

Rank l33t++

The documentation states (80386 Programmer's Reference Manual: https://pdos.csail.mit.edu/6.828/2008/readings/i386/SHRD.htm):

IF ShiftAmt >= OperandSize
THEN (* Bad parameters *)
r/m := UNDEFINED;
CF, OF, SF, ZF, AF, PF := UNDEFINED;

So, with 16-bit operands, this will happen (32-bit operands are unaffected: because the count is masked with 1Fh, it can never reach 32):
Does that mean that shifts of 16 (cnt=0x10) and up are NOPs? Or shifts of 17 (0x11) and up? I'd assume OperandSize==16 in this case?
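Read literally, and assuming the count is first masked to 5 bits (count mod 32, as stated later in the thread), the "ShiftAmt >= OperandSize" condition can only trigger for 16-bit operands, for masked counts 16 through 31 (0x10 to 0x1F). A minimal C sketch of that check follows; the function name is hypothetical, and note that the manual only calls the result undefined, which is not the same as a NOP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a SHLD/SHRD count fall into the manual's
 * "ShiftAmt >= OperandSize" (undefined result) range?
 * Assumes the count is masked to 5 bits before the comparison. */
static bool shxd_count_undefined(uint8_t cnt, unsigned operand_size)
{
    unsigned shift_amt = cnt & 0x1Fu;  /* 0..31 after masking */
    return shift_amt >= operand_size;  /* can only be true for 16-bit operands */
}
```

With this reading, 0x10 and 0x11 are both in the undefined range for 16-bit operands, while 0x20 masks down to 0.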

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 15, by BloodyCactus

Rank Oldbie

Did you test it? It's super easy to test... but of course not, you never do real tests, you just ask us.

The answer might surprise you. I'll give you a clue: it's not a no-op...

--/\-[ Stu : Bloody Cactus :: [ https://bloodycactus.com :: http://kråketær.com ]-/\--

Reply 2 of 15, by superfury

Rank l33t++

Well, the most difficult part is that I don't have a real 80386/80486 to test that on, so that's kind of impossible.

All I have is documentation, emulators (fake86, Bochs, Dosbox, qemu, PCem etc.) and these forums as a reference.

Dosbox doesn't seem to do anything special, as far as I can see.

Bochs, in its Pentium-based documentation, apparently reloads the second parameter with its original value and keeps shifting normally (thus effectively shifting param1::param2::param2). Or some kind of bit index is used: a counter counting down selects which bit of that value is shifted into the first parameter on each shift. And that counter wraps around at 16 bits?
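The bit-index reading above can be sketched in C. This is a hypothetical model (the function name is made up, and it covers only the destination result, not the flags): a counter picks the next bit of src, counting down from bit 15 and wrapping back to 15 after bit 0, which is equivalent to reloading src after 16 shifts.

```c
#include <stdint.h>

/* Hypothetical bit-index model of 16-bit SHLD (destination result only).
 * A counter selects which src bit is shifted into dest on each step,
 * counting down from bit 15 and wrapping from bit 0 back to bit 15. */
static uint16_t shld16_wrapping_index(uint16_t dest, uint16_t src, unsigned count)
{
    unsigned bit = 15;              /* next src bit to shift in */
    count &= 0x1Fu;                 /* 5-bit count mask */
    while (count--) {
        dest = (uint16_t)((dest << 1) | ((src >> bit) & 1u));
        bit = (bit - 1u) & 15u;     /* count down, wrapping modulo 16 */
    }
    return dest;
}
```

Under this model, a count of 17 first copies all of src into dest, then shifts in bit 15 of src once more.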

PCem doesn't add much...
Edit: PCem says the top n bits (when shifting past 16) get filled with zeroes?

Last edited by superfury on 2019-01-18, 01:15. Edited 1 time in total.

Reply 4 of 15, by superfury

Rank l33t++

So after 16 bits have been shifted in, it starts shifting in zeroes at the LSb (SHLD) or MSb (SHRD), due to the emptied src?

Reply 5 of 15, by BloodyCactus

Rank Oldbie

On real hardware:

mov eax,0x12345678
mov ebx,0x87654321
shld eax,ebx,4
==
eax=0x23456788
ebx=0x87654321

mov eax,0x12345678
mov ebx,0x87654321
shld eax,ebx,0x96
==
eax=0x9E21D950
ebx=0x87654321


mov eax,1
shl eax,0x33
==
eax=0x00080000


mov eax,1
shl eax,0x20
==
eax=0x00000001


mov eax,1
shl eax,0x21
==
eax=0x00000002

mov eax,-1
shl eax,0x24
==
eax=0xFFFFFFF0

As you can see, the shift count acts as a modulo (only its low 5 bits are used, i.e. count mod 32). DOSBox also gives matching results.
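A minimal C sketch of the count masking these results demonstrate, assuming 32-bit operands (the function names shl32 and shld32 are hypothetical; flags are not modelled). It reproduces the register values listed above.

```c
#include <stdint.h>

/* Hypothetical 32-bit SHL: only the low 5 bits of the count are used. */
static uint32_t shl32(uint32_t v, uint8_t cnt)
{
    cnt &= 0x1F;
    return v << cnt;
}

/* Hypothetical 32-bit SHLD: dest is shifted left and the vacated low bits
 * are filled from the top bits of src. A masked count of 0 changes nothing. */
static uint32_t shld32(uint32_t dest, uint32_t src, uint8_t cnt)
{
    cnt &= 0x1F;
    if (cnt == 0)
        return dest;
    return (dest << cnt) | (src >> (32 - cnt));
}
```

For example, 0x96 masks to 0x16 (22), so shld32(0x12345678, 0x87654321, 0x96) keeps the low 10 bits of EAX at the top and fills the rest with the high 22 bits of EBX.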

Reply 6 of 15, by superfury

Rank l33t++

Well, I know the cnt (third parameter, either CL or an immediate byte) is taken modulo 32 (as is documented). But then, when shifting past a count of 16, what happens? Does it merrily start shifting in zeroes from the now-emptied second operand (what UniPCemu does)?

Current implementation of the 16-bit SHLD/SHRD:

extern byte BST_cnt; //How many of bit scan/test (forward) times are taken?

//0F AA is RSM FLAGS on 386++

//SHL/RD instructions.

word tempSHLRDW;

void CPU80386_SHLD_16(word *dest, word src, byte cnt)
{
	byte shift;
	cnt &= 0x1F;
	BST_cnt = 0; //Count!
	if (cnt && (cnt<16)) //To actually shift? Only counts 1..15 are shifted; 16..31 leave dest and flags untouched.
	{
		if (!dest) { if (CPU8086_internal_stepreadmodrmw(0,&tempSHLRDW,MODRM_src0)) return; } //Read source if needed!
		else if (CPU[activeCPU].internalinstructionstep==0) tempSHLRDW = *dest;
		if (CPU[activeCPU].internalinstructionstep==0) //Execution step?
		{
			BST_cnt = cnt; //Count!
			for (shift = 1; shift <= cnt; shift++)
			{
				if (tempSHLRDW & 0x8000) FLAGW_CF(1); else FLAGW_CF(0);
				tempSHLRDW = ((tempSHLRDW << 1) & 0xFFFF)|((src>>15)&1);
				src <<= 1; //Next bit to shift in!
			}
			if (cnt==1) { if (FLAG_CF == (tempSHLRDW >> 15)) FLAGW_OF(0); else FLAGW_OF(1); }
			flag_szp16(tempSHLRDW);
			++CPU[activeCPU].internalinstructionstep;
			CPU_apply286cycles(); /* Apply cycles */
			if (dest==NULL)
			{
				CPU[activeCPU].executed = 0; //Still running!
				return;
			}
		}
		if (dest)
		{
			*dest = tempSHLRDW;
		}
		else
		{
			if (CPU8086_internal_stepwritemodrmw(2,tempSHLRDW,MODRM_src0,0)) return;
		}
	}
}

void CPU80386_SHRD_16(word *dest, word src, byte cnt)
{
	byte shift;
	cnt &= 0x1F;
	BST_cnt = 0; //Count!
	if (cnt && (cnt<16)) //To actually shift? Only counts 1..15 are shifted; 16..31 leave dest and flags untouched.
	{
		if (!dest) { if (CPU8086_internal_stepreadmodrmw(0,&tempSHLRDW,MODRM_src0)) return; } //Read source if needed!
		else if (CPU[activeCPU].internalinstructionstep==0) tempSHLRDW = *dest;
		if (CPU[activeCPU].internalinstructionstep==0) //Execution step?
		{
			BST_cnt = cnt; //Count!
			if (cnt == 1) FLAGW_OF(((tempSHLRDW & 0x8000) ^ ((src & 1) << 15)) ? 1 : 0);
			for (shift = 1; shift <= cnt; shift++)
			{
				FLAGW_CF(tempSHLRDW & 1);
				tempSHLRDW = ((tempSHLRDW >> 1)|((src&1)<<15));
				src >>= 1; //Next bit to shift in!
			}
			flag_szp16(tempSHLRDW);
			++CPU[activeCPU].internalinstructionstep;
			CPU_apply286cycles(); /* Apply cycles */
			if (dest==NULL)
			{
				CPU[activeCPU].executed = 0; //Still running!
				return;
			}
		}
		if (dest)
		{
			*dest = tempSHLRDW;
		}
		else
		{
			if (CPU8086_internal_stepwritemodrmw(2,tempSHLRDW,MODRM_src0,0)) return;
		}
	}
}

Reply 7 of 15, by BloodyCactus

Rank Oldbie

superfury wrote:

But then, while shifting past count 16, what happens?

I did another quick SHL + SHLD test:

mov ax,1
shl ax,0x11
==
ax=0x00

mov ax,1
shl ax,0x21
==
ax=0x02

mov ax,1
mov bx,0xffff
shld ax,bx,0x21
==
ax = 3
bx = 0xffff


mov ax,1
mov bx,0xffff
shld ax,bx,0x20
==
ax = 1
bx = 0xffff


mov ax,1
mov bx,0xffff
shld ax,bx,0x11
==
ax = 0xffff
bx = 0xffff

Reply 8 of 15, by superfury

Rank l33t++

The issue is with that final one (all the others act as documented, due to the count being masked with 1Fh):

mov ax,1
mov bx,0xffff
shld ax,bx,0x11
==
ax = 0xffff
bx = 0xffff

How did it arrive at FFFFh in AX? Shifting by 10h will result in AX=FFFFh, but what happens after that? The temporary source register (which I assume it uses) should be 0 (what remains of the temporary copy of BX being shifted out), so after that a 0 will be shifted into AX, giving FFFEh. But it results in FFFFh instead? Does the temporary get reloaded with BX after 16 shifts, thus shifting AX::BXcopy::BXcopy2 left?
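To compare the two hypotheses against the measured value, here is a minimal C sketch (the function name shld16_model and the refill parameter are hypothetical, not UniPCemu code; only the AX result is modelled, not the flags). AX is treated as the top 16 bits of a 48-bit stream dest::src::refill, where refill is what gets shifted in once the 16 bits of src are used up.

```c
#include <stdint.h>

/* Hypothetical model of 16-bit SHLD for comparing candidate behaviours.
 *   refill = 0   -> zero-fill hypothesis (dest::src::0)
 *   refill = src -> reload hypothesis    (dest::src::src) */
static uint16_t shld16_model(uint16_t dest, uint16_t src, uint16_t refill,
                             unsigned count)
{
    uint64_t stream = ((uint64_t)dest << 32) | ((uint64_t)src << 16) | refill;
    count &= 0x1Fu;            /* only the low 5 bits of the count are used */
    stream <<= count;          /* only bits 32..47 are read back below */
    return (uint16_t)(stream >> 32);
}
```

For AX=1, BX=FFFFh and a count of 11h, the zero-fill variant gives FFFEh, while the reload variant gives FFFFh, which is the value that was reported.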

Reply 10 of 15, by ripsaw8080

Rank DOSBox Author

An issue with shifts greater than 16 bits was recently brought up for Fast Tracker II 2.09 using GUS in DOSBox. According to the Bochs source code, the behavior of SHRD/SHLD changed starting with Pentium Pro, and Bochs now uses the changed behavior. In DOSBox you get the Pentium and earlier behavior with normal or dynrec cores, but the (supposed) PPro and later behavior with the dynx86 core because the instruction executes on the host CPU.

For reference, the attached patch changes the normal core in DOSBox SVN to use the later behavior, which amounts to a different choice of operand.

Attachment: DSH_w.diff (832 bytes)

Reply 11 of 15, by superfury

Rank l33t++

So, if I understand it correctly, shifting more than 16 bits on a 386/486 will shift in 0-bits, not the second operand again? So essentially shifting dest::src::0?

Won't that cause compatibility issues on newer processors?

Reply 12 of 15, by BloodyCactus

Rank Oldbie

peterferrie wrote:

It's likely a typo. The result should indeed be AX=0xFFFE, i.e. shifting in zeroes.

Well, it's NOT a typo. It's real hardware, thanks (K6-2+).

ax=0,bx=0xffff, shld ax,bx,0x11 -> ax=0xffff, bx=0xffff
ax=0,bx=0xAAAA, shld ax,bx,0x11 -> ax=0x5555, bx=0xAAAA
ax=0,bx=1, shld ax,bx,0x11 -> ax=2, bx=1

Reply 13 of 15, by superfury

Rank l33t++

BloodyCactus wrote:
peterferrie wrote:

It's likely a typo. The result should indeed be AX=0xFFFE, i.e. shifting in zeroes.

Well, it's NOT a typo. It's real hardware, thanks (K6-2+).

ax=0,bx=0xffff, shld ax,bx,0x11 -> ax=0xffff, bx=0xffff
ax=0,bx=0xAAAA, shld ax,bx,0x11 -> ax=0x5555, bx=0xAAAA
ax=0,bx=1, shld ax,bx,0x11 -> ax=2, bx=1

So what happens in that first case on an 80386? If the BX temporary is reloaded after 16 shifts (shifting non-zeroes in), wouldn't that be an 'undocumented' incompatibility with the Pentium (or 80486/80386) right there in the instruction behaviour?

Also, you're saying K6-2+, so are you talking about even more modern processors? I'm trying to figure out its behaviour on an 80386 (and 80486SX), not a modern CPU (which I don't emulate).

Reply 14 of 15, by BloodyCactus

Rank Oldbie

superfury wrote:

So what happens in that first case on an 80386? If the BX temporary is reloaded after 16 shifts (shifting non-zeroes in), wouldn't that be an 'undocumented' incompatibility with the Pentium (or 80486/80386) right there in the instruction behaviour?

Also, you're saying K6-2+, so are you talking about even more modern processors? I'm trying to figure out its behaviour on an 80386 (and 80486SX), not a modern CPU (which I don't emulate).

OK, I'll let you dig out a 386 or 486 (which, you know, maybe you should have on hand: the hardware you're trying to emulate, so you can, you know... test it?). I have no interest in breaking out my 486-DX/100 to test for you.

DOSBox (built from source about a week ago) with a 386 CPU + normal core gives the same behaviour. 386 + simple core gives me the same behaviour. pentium_slow + dynamic core on my i7-4970k also gives me the same behaviour as my K6-2+... so I don't know when ripsaw8080 changed DOSBox's behaviour, but mine does not follow the rules he stated. (I'm on Linux, which maybe has some bearing on the dynarec core?)

Reply 15 of 15, by ripsaw8080

Rank DOSBox Author

DOSBox has not been changed because the Pentium-and-earlier behavior is desired. I only show the changes in the diff for reference of what would be changed to use the PPro-and-later behavior.

In DOSBox you will only get the later behavior with the 32-bit dynamic core (meaning dynx86, and NOT dynrec) because the instruction executes on the host CPU.