VOGONS


First post, by superfury

Rank l33t++

When loading (jumping to) a different address, IP or EIP is loaded depending on the operand size. But when the operand size is 16 bits, the high 16 bits of EIP are masked to zero.
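As a minimal sketch of that masking rule (the function name `load_eip` is made up for illustration and is not taken from UniPCemu or any CPU manual):

```python
def load_eip(target: int, operand_size: int) -> int:
    """Return the EIP value after a jump, given the operand size in bits."""
    if operand_size == 16:
        # 16-bit operand size: only IP is loaded, the upper half reads as zero.
        return target & 0xFFFF
    if operand_size == 32:
        return target & 0xFFFFFFFF
    raise ValueError("operand size must be 16 or 32")


print(hex(load_eip(0x12345678, 16)))  # 0x5678
print(hex(load_eip(0x12345678, 32)))  # 0x12345678
```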

So even though software might use IP in 16-bit mode, the CPU might only have EIP (with its 'IP' part being written to memory when stored for procedure calls/interrupts in 16-bit mode).

So any overflow past the 16-bit limit will cause EIP's low 16 bits to be written, and EIP (having advanced to 0x10000 for the next instruction) will resume at the #GP handler. When said handler returns, IP is loaded from the stack and stored in EIP as 16 bits, clearing the upper 16 bits of EIP once more. So from a software perspective, this EIP wrapping (when not in the middle of an instruction) is provided for free (assuming a #GP handler returns normally)? Only instructions that wrap IP in the middle of an instruction will hang the CPU this way (since they return to the instruction itself, which still starts before the wrap).

So thinking like this, in essence IP doesn't exist on 32-bit x86 CPUs? The lower 16 bits are stored in memory during any procedure call (INT, CALL), but loads load the full EIP register regardless of the operand size? So IP only exists on the 80286 and earlier (although the 80286 might have 17-bit registers to facilitate the overflow exception when going past offset 0xFFFF on any memory operand, including IP)?
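The scenario above can be played through in a few lines. This is only a sketch of the hypothesis in this post, not confirmed hardware behaviour: the names `fetch_check`, `push_ip_16` and `ret_16` are invented, and a 64 KiB code segment limit is assumed.

```python
CS_LIMIT = 0xFFFF  # assumption: real-mode style 64 KiB code segment limit


class GeneralProtectionFault(Exception):
    """Stands in for #GP; carries the EIP of the faulting instruction."""


def fetch_check(eip: int, length: int, limit: int = CS_LIMIT) -> int:
    """Return EIP after an instruction of `length` bytes, or raise #GP."""
    if eip + length - 1 > limit:
        raise GeneralProtectionFault(eip)
    return eip + length  # deliberately not masked: may become 0x10000


def push_ip_16(eip: int) -> int:
    """16-bit operand size: only the low 16 bits reach the stack."""
    return eip & 0xFFFF


def ret_16(stored_ip: int) -> int:
    """16-bit return: IP is reloaded with the upper half of EIP cleared."""
    return stored_ip & 0xFFFF


# Case 1: a 2-byte instruction ends exactly at 0xFFFF, so EIP becomes 0x10000.
eip = fetch_check(0xFFFE, 2)
try:
    fetch_check(eip, 1)
except GeneralProtectionFault as fault:
    resume = ret_16(push_ip_16(fault.args[0]))
    print(hex(resume))  # 0x0

# Case 2: the instruction itself straddles the limit, so the saved IP still
# points at it and a handler that just returns would fault again forever.
try:
    fetch_check(0xFFFF, 2)
except GeneralProtectionFault as fault:
    print(hex(ret_16(push_ip_16(fault.args[0]))))  # 0xffff
```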

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 5, by Jo22

Rank l33t++

Hi superfury, I'd really like to help you with that but I'm by far not competent enough, I'm afraid. 😅
However, I can give a little tip. At OS2Museum, there are smart people who know the ins and outs of x86.
More often than not, they discuss such complicated things: http://www.os2museum.com

If memory serves, I also once read a posting where some similar "286 vs 386 issues" were described.
It was related to Win32s and how some modern CPU docs fail to describe the correct behavior of storing 24 bits out of 32 bits.
Sounds confusing? Well, I'm not good at describing. See http://www.os2museum.com/wp/sgdtsidt-fiction-and-reality/

Anyway, that's all I can say.
Hope you find out how to solve your issue.

Best regards,
Jo22

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 2 of 5, by superfury

Rank l33t++

So if I understand that article correctly, LGDT/LIDT work as documented (loading the high 8 bits of the base as zeroes when using a 16-bit operand size), but SGDT/SIDT actually always store a 32-bit base, with the high 8 bits being zeroed on the 80286 but stored normally on the 80386+ (required for Win32s not to come crashing down with its 80000000h+ kernel memory)?

Edit: Luckily this is already implemented that way in UniPCemu.
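A rough sketch of that asymmetry for the 80386+ case only (the 80286 is left out on purpose, since what it puts in the top byte is exactly the point in question). The helper names `lgdt` and `sgdt_386` are made up for this example and are not UniPCemu functions.

```python
import struct


def lgdt(mem6: bytes, operand_size: int):
    """Load (limit, base) from the 6-byte memory operand.

    With a 16-bit operand size only 24 bits of the base are used.
    """
    limit, base = struct.unpack("<HI", mem6)
    if operand_size == 16:
        base &= 0x00FFFFFF
    return limit, base


def sgdt_386(limit: int, base: int) -> bytes:
    """Store the register as described for 80386+: all 32 base bits,
    regardless of operand size."""
    return struct.pack("<HI", limit & 0xFFFF, base & 0xFFFFFFFF)


stored = sgdt_386(0x07FF, 0x80001000)
print(stored.hex())               # ff0700100080
print(hex(lgdt(stored, 16)[1]))   # 0x1000
print(hex(lgdt(stored, 32)[1]))   # 0x80001000
```

The same layout applies to LIDT/SIDT; only the register being loaded or stored differs.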


Reply 3 of 5, by superfury

Rank l33t++

Hmmm... Thinking about large (big/huge) segments: what would happen on an x64 CPU when using a 4GB segment in 32-bit unreal/protected mode, with the segment's base address set to non-zero and an offset that makes the linear address overflow (when using 32/64-bit linear addresses for paging)? Will the linear address wrap around into 0-4GB or will it reach beyond the 4GB address space? Or will it trigger a #GP fault, even though there's no segment limit violation? For example, with DS.base=0x80000000, DS.limit=4GB and offset 0x80000002: will it access linear address 0x100000002 or 0x2?
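For reference, the two candidate results can be computed like this. The sketch only does the arithmetic for both outcomes and does not claim which one the hardware picks; `linear_candidates` is a hypothetical helper name.

```python
def linear_candidates(base: int, offset: int):
    """Return (unwrapped, wrapped-to-32-bit) linear addresses."""
    unwrapped = base + offset
    wrapped = unwrapped & 0xFFFFFFFF
    return unwrapped, wrapped


unwrapped, wrapped = linear_candidates(0x80000000, 0x80000002)
print(hex(unwrapped))  # 0x100000002
print(hex(wrapped))    # 0x2
```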


Reply 4 of 5, by reenigne

Rank Oldbie

superfury wrote:

So thinking like this, in essence IP doesn't exist on 32-bit x86 CPU's? The lower 16-bits are stored in memory during any procedure call(int,call), but loads load the full EIP register regardless of the operand size? So IP only exists on the 80286 and earlier(although the 80286 might have 17-bit registers to facilitate the overflow exception when going past offset 0xFFFF on any memory operand(including IP))?

I would expect that inside a CPU there are not separate sets of transistors for representing IP and EIP, if that's what you're asking (IP would just be the lower bits of EIP, like AX is the lower bits of EAX). Similarly with the 80286 (IP is actually 24 bits). Doesn't unreal mode depend on not having any wrapping of IP? It's just a different set of segment limits (you've got to be careful about interrupts if the high bits of EIP are significant, though).

Fun fact: on the 8088/8086 there is actually no proper IP register at all - the CPU just keeps track of the current prefetch address and the number of bytes in the prefetch queue. So to generate IP (e.g. for a relative jump or to save on the stack for a CALL instruction or interrupt) the CPU subtracts the prefetch queue count from the prefetch address. I wouldn't be surprised if later CPUs did similar tricks.
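In code, that subtraction looks roughly like the following. This is a sketch of the idea only; the name `ip_from_prefetch` and the 16-bit wrap of the result are assumptions made for the example, not a description of the actual 8088 circuitry.

```python
def ip_from_prefetch(prefetch_addr: int, queue_count: int) -> int:
    """Derive the architectural IP from the prefetch pointer.

    prefetch_addr: offset of the next byte the prefetcher will read.
    queue_count:   bytes sitting in the queue that are not executed yet.
    """
    return (prefetch_addr - queue_count) & 0xFFFF  # assumed 16-bit wrap


print(hex(ip_from_prefetch(0x0104, 4)))  # 0x100
print(hex(ip_from_prefetch(0x0002, 4)))  # 0xfffe
```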

Reply 5 of 5, by superfury

Rank l33t++

Well, UniPCemu currently keeps two EIP registers (both are loaded when flushing the PIQ). The first is incremented on each fetch from the PIQ (one byte at a time, during the fetch phase on the EU); it is the currently executing EIP and is also used for debugging and jumps.
The other, a 'shadow EIP', is used by the 'prefetch unit' (the BIU itself), combined with the CS segment descriptor cache and the Paging TLB, to generate the physical addresses from which instructions are fetched into the PIQ. This hidden register is incremented in parallel with the EU's idea of EIP to generate addresses for fetching dword/word/byte instruction data into the PIQ. Each byte's address is translated from logical to linear (without wrapping; it's a NOP at segment limits) and then through the Paging TLB cache to generate a physical byte address (TLB loads are done by the EU exclusively; a TLB miss halts the prefetching until the EU has loaded the entry, just like a segment limit or any other violation). Only when each of those steps succeeds does the prefetch execute and read from memory into the PIQ.
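A heavily simplified model of the two pointers might look like this. It is not UniPCemu's code: the class and method names are invented, segmentation and paging are left out entirely, and a plain bounds check on a flat byte array stands in for the limit/TLB checks that stall prefetching.

```python
from collections import deque


class Frontend:
    """Toy model: one EIP for the EU, one shadow EIP for the prefetcher."""

    def __init__(self, memory: bytes, piq_size: int = 16):
        self.memory = memory
        self.piq = deque()
        self.piq_size = piq_size
        self.eip = 0           # what the EU, debugger and jumps see
        self.prefetch_eip = 0  # hidden pointer used by the BIU

    def flush(self, target: int) -> None:
        """A jump empties the queue and reloads both pointers."""
        self.piq.clear()
        self.eip = target
        self.prefetch_eip = target

    def biu_tick(self) -> None:
        """Prefetch one byte if there is room and the address is valid."""
        if len(self.piq) < self.piq_size and self.prefetch_eip < len(self.memory):
            self.piq.append(self.memory[self.prefetch_eip])
            self.prefetch_eip += 1

    def eu_fetch_byte(self):
        """The EU consumes one byte; returns None when it has to wait."""
        if not self.piq:
            return None
        self.eip += 1
        return self.piq.popleft()


fe = Frontend(bytes([0x90, 0xEB, 0xFE, 0xF4]), piq_size=2)
fe.flush(1)
fe.biu_tick()
fe.biu_tick()
print(hex(fe.eu_fetch_byte()))              # 0xeb
print(fe.prefetch_eip - fe.eip == len(fe.piq))  # True
```

In this model the difference between the two pointers always equals the number of queued bytes, which is the same relationship as in the 8088 trick mentioned in the previous reply.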

This method also allows the BIU to remain unmodified while the EU handles all faults, like demand paging (instructions running over a page border that page fault), as well as filling the Paging TLB (currently not done through the BIU yet, but by directly accessing RAM, bypassing the BIU part of the access).

The BIU tries to keep reading a whole dword (32-bit bus), word (16-bit bus) or byte (8-bit bus) into the prefetch queue each memory access cycle (1 (80486), 2 (80286/80386) or 4 (80(1)8X) cycles). When only part of a word/dword can be read, only that part is read and the rest is discarded (80386-style), in the same way as the 80386+ memory access mask.
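One possible reading of the partial-read rule, as a sketch: the assumption here (not stated above) is that each access covers one naturally aligned bus-width unit, so a fetch starting in the middle of a unit, or with too little room left in the PIQ, only keeps part of it. `bytes_this_access` is a hypothetical helper, not a UniPCemu function.

```python
def bytes_this_access(addr: int, bus_bytes: int, piq_free: int) -> int:
    """How many bytes one bus access adds to the PIQ.

    addr:      physical address of the next byte to prefetch
    bus_bytes: bus width in bytes (1, 2 or 4)
    piq_free:  free byte slots left in the prefetch queue
    """
    if bus_bytes not in (1, 2, 4):
        raise ValueError("bus width must be 1, 2 or 4 bytes")
    # Bytes from addr up to the end of the aligned unit that contains it.
    in_unit = bus_bytes - (addr % bus_bytes)
    return max(0, min(in_unit, piq_free))


print(bytes_this_access(0x1000, 4, 16))  # 4
print(bytes_this_access(0x1001, 4, 16))  # 3
print(bytes_this_access(0x1000, 4, 2))   # 2
```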
