VOGONS


First post, by superfury

User metadata
Rank l33t++

During an x86 stack switch, such as a far call through a gate or a return to another stack, how does the CPU know whether to load SP or ESP? And what size does it push or pop on the stack? Does it use the new B-bit or the original B-bit (from the stack the instruction was using before it started loading SS)?

Edit: Currently I have the following implemented:
- Stack switch to higher privilege: the TSS size (16-bit or 32-bit TSS) determines whether a 16-bit or 32-bit value is read for the stack pointer loaded from the TSS. The B-bit of the resulting stack segment descriptor determines whether that value is loaded into ESP (set) or SP (cleared), zero-extended (16-bit value) or truncated (32-bit value) if needed.
- Stack return to lower privilege level: the operand size determines whether a 32-bit or 16-bit operand is popped from the stack. The stack segment descriptor that's loaded afterwards determines whether SP (cleared) or ESP (set) is loaded with that value, zero-extended (16->32) or truncated (32->16) if needed. This applies to both RETF and IRET.
- Call gate to higher privilege: the stack switch occurs as described above. The call gate size (16-bit or 32-bit) determines whether SS is pushed as a 16-bit or 32-bit value, and also whether 16-bit SP or 32-bit ESP is pushed on the stack (with the stack pointer decreased by 2 or 4 accordingly). Extra parameters are pushed in the same way (on the destination stack). The same is true for the return address.
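
The stack-switch rule in the first bullet can be sketched in C as follows. This is only an illustration of the behaviour as described in this post (which is exactly what is being questioned), not verified CPU behaviour. `load_switched_stack_pointer` and its parameters are hypothetical names, and keeping the upper half of ESP when the B-bit is clear is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from UniPCemu or any other emulator.
 * tss_field_bytes: 2 for a 16-bit TSS, 4 for a 32-bit TSS.
 * ss_b_bit: B-bit of the newly loaded stack segment descriptor. */
uint32_t load_switched_stack_pointer(uint32_t old_esp, uint32_t tss_value,
                                     unsigned tss_field_bytes, bool ss_b_bit)
{
    assert(tss_field_bytes == 2 || tss_field_bytes == 4);

    /* A 16-bit TSS field is zero-extended to 32 bits. */
    if (tss_field_bytes == 2)
        tss_value &= 0xFFFFu;

    /* B=1: the whole of ESP is written. */
    if (ss_b_bit)
        return tss_value;

    /* B=0: only SP is written (value truncated to 16 bits).
     * ASSUMPTION: the upper half of ESP is left unchanged. */
    return (old_esp & 0xFFFF0000u) | (tss_value & 0xFFFFu);
}
```

For example, with a 32-bit TSS value of 0xABCD9000 and a 16-bit destination stack, only the low word 0x9000 ends up in SP under this model.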

Does anyone know if this behaviour is correct?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 7, by Baron von Riedesel

User metadata
Rank Member

> Does anyone know if this behaviour is correct?

Not quite. IIRC the current operand size (set by the D bit of CS, optionally overridden by the operand-size prefix 0x66) determines whether SP or ESP is loaded; if SP is loaded, hiword(ESP) stays unmodified. OTOH, the B/D bit of the SS descriptor (optionally modified by the address-size prefix 0x67?) just controls whether (indirect) references to the stack (instructions CALL, RET, PUSH, POP, ENTER, LEAVE) use SP or ESP.
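
A rough C sketch of the two separate roles described in this reply: the operand size selects how much of the stack pointer is written, and the SS B-bit selects how much of it is used for addressing. The function names are hypothetical, and the rule itself is stated "IIRC" above, so treat this as a model of the claim rather than confirmed behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: write a new stack pointer using the current
 * operand size. With a 16-bit operand size only SP changes and
 * hiword(ESP) stays unmodified, as claimed in the reply. */
uint32_t write_stack_pointer(uint32_t old_esp, uint32_t value,
                             unsigned opsize_bytes)
{
    assert(opsize_bytes == 2 || opsize_bytes == 4);
    if (opsize_bytes == 4)
        return value;
    return (old_esp & 0xFFFF0000u) | (value & 0xFFFFu);
}

/* Hypothetical helper: offset within SS used by implicit stack
 * references (PUSH, POP, CALL, RET, ...). B=1 uses ESP, B=0 uses SP. */
uint32_t stack_offset(uint32_t esp, bool ss_b_bit)
{
    return ss_b_bit ? esp : (esp & 0xFFFFu);
}
```

Keeping the two decisions in separate functions makes it harder to mix up which bit applies at which step.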

Reply 2 of 7, by superfury

User metadata
Rank l33t++
Baron von Riedesel wrote on 2025-08-31, 14:20:

>> Anyone knows if this behaviour is correct?
>
> Not quite. IIRC the current operand size ( set by the D bit of CS, optionally modified by operand prefix 0x66) determines if SP or ESP is loaded - if SP is loaded, hiword(ESP) stays unmodified. OTOH, the B/D bit of the SS descriptor (optionally modified by address prefix 0x67?) just controls whether (indirect) references to the stack ( instructions CALL, RET, PUSH, POP, ENTER, LEAVE ) use SP or ESP.

Then how do the cases I mentioned use the stack size or operand size?
The cases being:
- Stack switch to higher privilege (loading SP or ESP from the TSS): what determines whether ESP or SP is loaded? And is the full TSS stack pointer read in both cases (depending on the TSS size)?
- Stack return to lower privilege level (loading SP or ESP from either a 16-bit or 32-bit value popped off the stack): how does it know whether to pop a 32-bit or 16-bit value off the stack for the new stack pointer? Is the resulting pointer loaded into ESP or SP (and how does it know which one to use)?
https://www.felixcloutier.com/x86/ret simply describes a 16-bit/32-bit operand-size based pop when loading tempSS? Then further down it simply loads ESP in all cases, never just SP?
- Call gate to higher privilege: how does it know what size of stack pointer to push on the stack? Is a 16-bit pointer zero-extended when pushed as 32 bits on the stack? All other values are moved based on the call gate size (16-bit or 32-bit call gate) from what I know. What about the stack pointer and SS? Are they based on the call gate size or the instruction operand size (I highly doubt it's the latter, as the kernel has no control over it)?
Looking at the CALL instruction's description (https://www.felixcloutier.com/x86/call) I get

Push(oldSS:oldESP); (* From calling procedure *)

Which is the same as the 32-bit call gate version. Is oldCS:oldEIP mentioned in the same way for both call gate sizes?

The issue is that all of these sizes are implicit, as the instruction itself doesn't specify them. So are they all based on the B-bit of the SS descriptor of the original instruction's stack, or on that of the destination stack (for a higher privilege level, after the stack switch)?


Reply 3 of 7, by Baron von Riedesel

User metadata
Rank Member
superfury wrote on 2025-08-31, 21:12:

> - Stack return to lower privilege level (loading SP or ESP from either a 16-bit or 32-bit value popped off the stack)? How does it know to pop a 32-bit or 16-bit value off the stack for the new stack pointer? Is the resulting pointer loaded into ESP or SP (and how does it know which one to use)?

This solely depends on the current operand size: it's either RETW/IRETW or RETD/IRETD. The first case will always try to load SS:SP, the latter SS:ESP.
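
To make the operand-size dependence concrete, here is a small C sketch that decodes the frame a far return to an outer privilege level pops, for both sizes. It assumes a plain RETF without an immediate (so no parameters to skip) and a little-endian byte buffer standing in for stack memory; IRET would have an additional FLAGS/EFLAGS slot after CS. `FarReturnFrame`, `read_le` and `parse_outer_return_frame` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ip;   /* IP (RETW) or EIP (RETD) */
    uint16_t cs;
    uint32_t sp;   /* SP (RETW) or ESP (RETD) */
    uint16_t ss;
} FarReturnFrame;

/* Read a little-endian value of `size` bytes (2 or 4). */
uint32_t read_le(const uint8_t *p, unsigned size)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < size; i++)
        v |= (uint32_t)p[i] << (8u * i);
    return v;
}

/* Hypothetical helper: decode IP/CS/SP/SS starting at the top of stack.
 * Every slot is 2 bytes wide for a 16-bit operand size and 4 bytes wide
 * for a 32-bit one; selectors only use the low 16 bits of their slot. */
FarReturnFrame parse_outer_return_frame(const uint8_t *top, bool opsize32)
{
    assert(top != NULL);
    unsigned slot = opsize32 ? 4u : 2u;
    FarReturnFrame f;
    f.ip = read_le(top, slot);
    f.cs = (uint16_t)read_le(top + slot, 2);
    f.sp = read_le(top + 2u * slot, slot);
    f.ss = (uint16_t)read_le(top + 3u * slot, 2);
    return f;
}
```

Whether the decoded `sp` value then replaces all of ESP or only SP is the open question in this thread, so the sketch deliberately stops at decoding.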

Reply 4 of 7, by superfury

User metadata
Rank l33t++

And what about stack switching to a higher privilege level? I know that the TSS determines whether it's a 16-bit or 32-bit field, but what about the loading of the SP or ESP register in that case? How is that determined? (I currently base it upon the B-bit of the loaded stack segment descriptor.)

And call gates? What combination of SP, ESP and data size (truncated or zero-extended if needed) does it push? How does it determine that? (I currently base it on the call gate size instead of any operand size)

Edit: Based on Bochs, call gates are a bit complicated:
https://sourceforge.net/p/bochs/code/HEAD/tre … cpu/wide_int.cc

It looks like it's taking the lower 16 bits of the caller's ESP based on the B-bit of its stack segment descriptor.
The B-bit determines which part of the stack register to affect (SP vs ESP).

And stack switching seems as simple as I described it (applying to all the stack switching cases). The B-bit is handled normally for all subsequent pushes (affecting SP vs ESP decreasing).
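
A minimal C sketch of this reading of the Bochs behaviour, under the assumption that the description above is accurate (it has not been checked against the Bochs source here). `gate_pushed_stack_pointer` and `stack_decrement` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value pushed for the caller's stack pointer.
 * The caller's B-bit decides whether SP or ESP is the "old" pointer;
 * a 16-bit call gate then only stores the low 16 bits of it. */
uint32_t gate_pushed_stack_pointer(uint32_t caller_esp, bool caller_ss_b,
                                   bool gate_is_32bit)
{
    uint32_t value = caller_ss_b ? caller_esp : (caller_esp & 0xFFFFu);
    if (!gate_is_32bit)
        value &= 0xFFFFu;
    return value;
}

/* Hypothetical helper: decrement the stack pointer for one push of
 * `size` bytes. B=1 decrements ESP; B=0 decrements SP with 16-bit
 * wraparound and leaves the upper half of ESP untouched. */
uint32_t stack_decrement(uint32_t esp, unsigned size, bool ss_b_bit)
{
    assert(size == 2 || size == 4);
    if (ss_b_bit)
        return esp - size;
    return (esp & 0xFFFF0000u) | ((esp - size) & 0xFFFFu);
}
```

Note that `stack_decrement` takes the B-bit of the stack being pushed to (the new stack after the switch), while `gate_pushed_stack_pointer` takes the B-bit of the caller's stack.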


Reply 5 of 7, by superfury

User metadata
Rank l33t++

Hmmmm... Apparently Windows (32-bit only?) uses call gates in one case:
Interrupt 2Fh function 1684h.
It's supposed to return a call gate in ES and an offset in DI (so ES:DI is the call gate plus an offset, which is to be discarded?).

But you can't load a call gate into a segment register other than CS (so MOV ES,xxxxh can't be used for call gates)?
Unless it's in virtual 8086 mode or real mode of course. But that makes call gates unusable anyways (since they don't use the GDT or LDT).


Reply 6 of 7, by superfury

User metadata
Rank l33t++

How do gate types select between 16/32-bit pushes and SP/ESP loads? Are they always based on the gate size (call gate or interrupt gate) or on the SS descriptor's B-bit? What about switching between stack sizes (to a larger or smaller word size)?


Reply 7 of 7, by Enis

User metadata
Rank Newbie

From what I’ve played around with in x86 emulation, your approach looks mostly right. When the CPU switches stacks on a privilege change, it really does follow the B-bit of the new SS descriptor (loaded via the TSS) to decide whether to load SP or ESP and whether pushes and pops address the stack through SP or ESP. For returns like RETF or IRET, the instruction’s operand size determines how many bytes to pop, but the final SS and SP/ESP come from the segment descriptor, with zero-extension or truncation if needed. Call gates to higher privilege are a bit tricky, but basically the extra parameters and return addresses get pushed according to the destination stack’s width, while the gate size only affects the far pointer itself. Overall, it matches my experience tinkering with CPU emulation, and the key thing is keeping straight which B-bit and operand size applies at each step.