Stack overflow/underflow on x86? \ VOGONS

First post, by superfury

Posted on 2018-06-19, 16:08

Rank: l33t++
Posts: 5456
Joined: 2014-03-08, 11:25
Location: Netherlands

I understand that an unaligned access that overflows the offset space (a word at 0xFFFF or 0xFFFFFFFF, or a dword at 0xFFFD or 0xFFFFFFFD and up) or that exceeds the segment limit can raise a stack fault. The same can be implemented using paging in 4K blocks. But does the CPU raise any exception when the stack underflows or overflows without a limit violation, i.e. by popping past 0x(FFFF)FFFE or pushing below 0? For example, a word push at offset 1, or a dword push at offsets 0-2: will that trigger an exception in any way?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 7, by peterferrie

Posted on 2018-06-19, 16:29

Rank: Oldbie
Posts: 649
Joined: 2008-05-08, 21:54

A push with SP=1 (for example) will fault in the same way, because the stack pointer will be reduced to FFFF, and then the write will be attempted, causing an address overflow.
The same thing happens for a dword push with SP=1-3. SP=0 is fine - it will subtract to FFFC and then write as usual.
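
The behaviour described above can be sketched in C as follows. This is only an illustration of the arithmetic for a 16-bit stack; `push16_overflows` is a hypothetical helper name, not code from any emulator, and it only answers whether the write would run past offset 0xFFFF, not which exception is raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models a push of `size` bytes (2 or 4) on a 16-bit
   stack. Stores the decremented SP in *new_sp and returns true when the
   write starting there would run past offset 0xFFFF. */
static bool push16_overflows(uint16_t sp, unsigned size, uint16_t *new_sp)
{
    assert(size == 2 || size == 4);
    *new_sp = (uint16_t)(sp - size);                 /* SP wraps modulo 64K */
    uint32_t last = (uint32_t)*new_sp + (size - 1);  /* 17-bit end offset   */
    return last > 0xFFFFu;
}
```

With SP=1 a word push ends up at 0xFFFF and the second byte would land at 0x10000, so the helper reports an overflow. With SP=0 a dword push ends up at 0xFFFC and fits exactly.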

Reply 2 of 7, by superfury

Posted on 2018-06-19, 20:23

So if I understand it correctly, the well-known stack overflow/underflow 'fault' is achieved either with the help of paging, by using unaligned (E)SP stack addresses so that an under/overflow triggers the fault, or by adding checks before/after a push/pop is executed? No special cases exist compared with the other descriptors?

Reply 3 of 7, by superfury

Posted on 2018-06-19, 22:12

Just been thinking: are offsets 33 bits, or 32 bits plus carry, on 32-bit x86 processors? Otherwise, an overflow during 32-bit accesses wouldn't be detected, as it would silently wrap back to offsets 0-2. The 80286 faulted on word accesses past 0xFFFF, so it had to have at least 17 bits to detect that. It might only use 16 (286+) or 32 (386+ addressing modes on the 80386+) address bits when calculating the logical address, but it has to use 17/33 bits to detect the overflowing offset on word/dword accesses?

Also, what happens when a table entry (GDT/IDT/LDT/paging entries) isn't aligned when the alignment check flag is set on a 486+?

Edit: Is the alignment check performed on virtual or logical addresses (paging doesn't affect alignment, since pages are always 4K-aligned on the i386/i486)?

Reply 4 of 7, by peterferrie

Posted on 2018-06-20, 16:25

superfury wrote:
Just been thinking: are offsets 33 bits, or 32 bits plus carry, on 32-bit x86 processors? Otherwise, an overflow during 32-bit accesses wouldn't be detected, as it would silently wrap back to offsets 0-2. The 80286 faulted on word accesses past 0xFFFF, so it had to have at least 17 bits to detect that. It might only use 16 (286+) or 32 (386+ addressing modes on the 80386+) address bits when calculating the logical address, but it has to use 17/33 bits to detect the overflowing offset on word/dword accesses?

I have no idea about that. I suppose that the result is identical in either case, so we don't need to concern ourselves with it.
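
To illustrate why the two views give the same answer from an emulator's point of view, here is a minimal sketch. Both function names are hypothetical, and this says nothing about how the silicon actually does it; it only shows that "widen to 33 bits and compare" and "stay in 32 bits and look at the carry" agree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Variant A: widen to 64 bits (only 33 are needed) and compare. */
static bool overflows_wide(uint32_t offset, unsigned size)
{
    assert(size >= 1 && size <= 4);
    return (uint64_t)offset + (size - 1) > 0xFFFFFFFFull;
}

/* Variant B: stay in 32 bits and detect the carry out of the addition. */
static bool overflows_carry(uint32_t offset, unsigned size)
{
    assert(size >= 1 && size <= 4);
    uint32_t last = offset + (uint32_t)(size - 1);  /* wraps modulo 2^32 */
    return last < offset;                           /* wrapped => carry  */
}
```

For a dword at 0xFFFFFFFD both variants report an overflow, and for a dword at 0xFFFFFFFC neither does.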

superfury wrote:
Also, what happens when a table entry (GDT/IDT/LDT/paging entries) isn't aligned when the alignment check flag is set on a 486+?

Edit: Is the alignment check performed on virtual or logical addresses (paging doesn't affect alignment, since pages are always 4K-aligned on the i386/i486)?

Alignment checks are performed on the virtual address, regardless of the alignment of the descriptor tables.

Reply 5 of 7, by superfury

Posted on 2018-06-20, 19:36

What I mean is, is the alignment flag's alignment applied to the offset of the segment:offset address or is it actually applied to the logical address(after adding the segment base address to the offset)?

You say the virtual address, so it's applied to the offset only (the first byte of a memory word/dword access)?
If that's true, it means that Bochs' implementation is incorrect, since it raises the alignment check exception when the logical address isn't aligned instead?
UniPCemu would then be doing this correctly: it checks the offset of a word/dword access, before translating it through segmentation.
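
The two interpretations only disagree when the segment base itself is not a multiple of the access size. A minimal sketch of both, with hypothetical function names; it does not claim which one real hardware, Bochs or UniPCemu implements, and it leaves out the other conditions for the alignment check exception (CR0.AM, EFLAGS.AC and CPL 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when `address` is not a multiple of `size` (1, 2 or 4). */
static bool misaligned(uint32_t address, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    return (address & (size - 1)) != 0;
}

/* Interpretation 1: check the offset within the segment only. */
static bool ac_on_offset(uint32_t base, uint32_t offset, unsigned size)
{
    (void)base;  /* the segment base is deliberately ignored */
    return misaligned(offset, size);
}

/* Interpretation 2: check the address after adding the segment base. */
static bool ac_on_linear(uint32_t base, uint32_t offset, unsigned size)
{
    return misaligned(base + offset, size);
}
```

With a base of 0x1001 and a dword access at offset 4, the first interpretation sees an aligned access while the second sees address 0x1005 and reports a misalignment. With a base of 0x1000 both agree.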

Reply 6 of 7, by superfury

Posted on 2018-06-20, 19:57

peterferrie wrote:
superfury wrote:
Just been thinking: are offsets 33 bits, or 32 bits plus carry, on 32-bit x86 processors? Otherwise, an overflow during 32-bit accesses wouldn't be detected, as it would silently wrap back to offsets 0-2. The 80286 faulted on word accesses past 0xFFFF, so it had to have at least 17 bits to detect that. It might only use 16 (286+) or 32 (386+ addressing modes on the 80386+) address bits when calculating the logical address, but it has to use 17/33 bits to detect the overflowing offset on word/dword accesses?

I have no idea about that. I suppose that the result is identical in either case, so we don't need to concern ourselves with it.

Thinking about it, maybe it's a combination of the two? The offset is calculated from the ModR/M byte (or taken directly, for memory operands that don't use ModR/M). During that calculation the offset is wrapped to 16 or 32 bits, depending on the address size. This is the base offset of the segmented operand (16/32 bits). The resulting offset is then checked for overflow using a 17/33-bit addition of the remaining bytes of the operand (so +0 to +1 for a word, +0 to +3 for a dword). If any of those bytes overflows (>0x(FFFF)FFFF), a #GP fault occurs, as for limit violations?

So both of these apply?

So with a limit of 0xFFFF:
BX+SI with BX=0, SI=0xFFFF faults on a word access(second half of word from 0x10000).
BX+SI with BX=1, SI=0xFFFF doesn't fault: resulting addresses are 0 and 1.
With a limit of 0xFFFFFFFF the same principle applies: no fault, as long as the offset after wrapping isn't 0xFFFFFFFF (word) or 0xFFFFFFFD-0xFFFFFFFF (dword)?

So in short: calculate the offset from the ModR/M byte or an immediate/register. Then wrap it to the address size. Then check n to n+(size-1) against the limit using a variable of at least address size + 1 bits (in C/C++ we need uint64_t for the 33-bit case)? The 8086 always wraps to 16 bits. The 80186 doesn't wrap when using the resulting offset (it just uses base + 17-bit offset) with base offset 0xFFFF. Otherwise, wrap to 16 bits.
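
The three steps above can be sketched as follows. This is a minimal illustration, not UniPCemu's actual code: `access_faults` and its parameters are hypothetical names, only expand-up segments are considered, and the kind of fault raised is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. `raw_offset` is the unwrapped sum produced by the
   ModR/M calculation (e.g. BX+SI), `addr32` selects the address size,
   `size` is the operand size in bytes and `limit` is the segment limit. */
static bool access_faults(uint32_t raw_offset, bool addr32,
                          unsigned size, uint32_t limit)
{
    assert(size == 1 || size == 2 || size == 4);

    /* Steps 1 and 2: wrap the calculated offset to the address size. */
    uint32_t offset = addr32 ? raw_offset : (raw_offset & 0xFFFFu);

    /* Step 3: compute the last byte in a wider type so it cannot wrap. */
    uint64_t last = (uint64_t)offset + (size - 1);
    return last > limit;
}
```

With a limit of 0xFFFF, BX=0 and SI=0xFFFF give a raw offset of 0xFFFF, and a word access faults because its second byte is at 0x10000. BX=1 and SI=0xFFFF give 0x10000, which wraps to 0 before the check, so the word at offsets 0 and 1 passes.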

Reply 7 of 7, by superfury

Posted on 2018-06-21, 09:25

I've just adjusted memory protection as follows:
First, the base offset of an operand is calculated from the ModR/M byte or an immediate operand.
Next, said offset is wrapped to the address size (0xFFFF or 0xFFFFFFFF).

At this point, we have the base offset of the memory operand to read from or write to (as a byte/word/dword access).

Then, the resulting address (16 or 32 bits) is checked against both layers of protection/translation. These checks are performed on 64-bit addresses (of which only 17 or 33 bits are actually used, depending on the address size).

For a byte access, only that address is used, which is always within range.
For a word access, the low byte is always within range. The high byte is checked at address+1 without any wrapping (so offset 0xFFFF+1=0x10000 and 0xFFFFFFFF+1=0x100000000, which will fault against the limit).
For a dword access, the low byte is always within range. The upper bytes are checked at address+1 to address+3 without any wrapping (so base offsets 0xFFFFFFFD and up result in offsets 0x100000000 and up, which will fault against the limit).

The limit is either 20 bits (byte granularity) or 32 bits (4K granularity). So anything past 0xFFFF or the limit in 16-bit mode, and/or past 0xFFFFFFFF or the limit in 32-bit mode, now faults with a General Protection fault.
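
For reference, the 20-bit limit field of a descriptor expands to the limit that is actually compared as sketched below. The function name is hypothetical. With 4K granularity the field is shifted left by 12 and the low 12 bits are filled with ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: expands the raw 20-bit descriptor limit into the
   byte limit used for the comparison. */
static uint32_t effective_limit(uint32_t raw_limit, bool granularity_4k)
{
    assert(raw_limit <= 0xFFFFFu);  /* the field is only 20 bits wide */
    return granularity_4k ? ((raw_limit << 12) | 0xFFFu) : raw_limit;
}
```

A raw limit of 0xFFFFF with 4K granularity therefore covers the full 4 GB (0xFFFFFFFF), while the same value with byte granularity covers 1 MB.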

Compared to the earlier method of using only a 32-bit offset throughout the process, an access that would wrap past the top of the offset space now actually faults with 16-bit or 32-bit offsets on word/dword accesses, since the upper bytes are past the segment limit. Earlier, it wouldn't fault, because the offsets of the rest of the operand (n+1, or n+1 to n+3) would already have wrapped to low addresses (so an access at 0xFFFFFFFD would wrap its last byte to 0x00000000 without faulting).
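
The difference between the two methods for the 32-bit case can be sketched like this. It is a reconstruction from the description above with hypothetical function names, not the emulator's real code: the only change between the two is the width of the per-byte offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Earlier method: each byte offset is held in 32 bits, so it wraps to a
   low offset before being compared against the limit. */
static bool faults_old(uint32_t offset, unsigned size, uint32_t limit)
{
    assert(size >= 1 && size <= 4);
    for (unsigned i = 0; i < size; i++) {
        uint32_t byte_offset = offset + i;           /* wraps modulo 2^32 */
        if (byte_offset > limit)
            return true;
    }
    return false;
}

/* Adjusted method: each byte offset is held in 64 bits, so it cannot wrap. */
static bool faults_new(uint32_t offset, unsigned size, uint32_t limit)
{
    assert(size >= 1 && size <= 4);
    for (unsigned i = 0; i < size; i++) {
        uint64_t byte_offset = (uint64_t)offset + i; /* up to 33 bits used */
        if (byte_offset > limit)
            return true;
    }
    return false;
}
```

For a dword at 0xFFFFFFFD with a limit of 0xFFFFFFFF, the earlier variant sees offsets 0xFFFFFFFD, 0xFFFFFFFE, 0xFFFFFFFF and 0x00000000 and lets the access through, while the adjusted variant sees 0x100000000 for the last byte and faults.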

So this has a big effect on offset validation as a whole: it now faults where it didn't fault properly before (due to the wrapping of 16-bit or 32-bit addresses). The wrapping is now only applied to the actual memory access itself, which only happens when the protection checks have passed for the offset, or when the CPU doesn't check the offset at all (80186 or older CPU emulation, since those have no protection; the 80186 actually accesses offset 0x10000, whereas the 8086 only uses 16 bits with wrapping).
