VOGONS


First post, by vladstamate

Rank Oldbie

Hoping to get people to post quirks they have found in the x86 platform.

Here is one from me. This instruction

db 036h
db 0f3h
div word[2]

is actually legal (the processor ignores the REPZ) and it looks something like this:

REPZ
DIV WORD[SS:2]

Now the weird part: if it causes a divide overflow (because of a divide by 0, let's say), the CS:IP that is pushed on the stack is that of the REPZ prefix (to tell the interrupt 0 handler which instruction faulted), because to the x86 processor prefixes like REPZ and SS: are part of the instruction and do not stand by themselves.

I discovered this because my emulator was not pushing the correct CS:IP on the stack in this extreme case.
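
For emulator authors, here is a minimal Python sketch of the idea, assuming the fault pushes the address of the first byte of the whole instruction, prefixes included. The names `PREFIXES` and `fault_ip` are made up for illustration, and the prefix table is only a subset of the real one.

```python
# Prefix bytes recognised by this sketch (a subset of the real x86 set).
PREFIXES = {0x26: "ES:", 0x2E: "CS:", 0x36: "SS:", 0x3E: "DS:",
            0xF2: "REPNZ", 0xF3: "REPZ"}


def fault_ip(code, ip):
    """Return (pushed_ip, prefixes, opcode_ip) for the instruction at ip.

    pushed_ip is what the fault handler would find on the stack: the
    address of the first prefix byte, not the address of the opcode.
    """
    start = ip                      # remember where the instruction begins
    prefixes = []
    while code[ip] in PREFIXES:     # consume every prefix byte
        prefixes.append(PREFIXES[code[ip]])
        ip += 1
    return start, prefixes, ip


# 36 F3 F7 36 02 00 = the byte sequence from the post, preceded by one NOP.
code = bytes([0x90, 0x36, 0xF3, 0xF7, 0x36, 0x02, 0x00])
pushed, prefixes, opcode_at = fault_ip(code, 1)
print(pushed, prefixes, opcode_at)  # prints: 1 ['SS:', 'REPZ'] 3
```

The point of the sketch is only that `start` is captured before any prefix is consumed, so a later fault can report it instead of the opcode address.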

Does anyone have other strange discoveries?

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 1 of 7, by peterferrie

Rank Oldbie

This is well-known, but it's not truly legal - Intel could change at any point what the instruction means when the prefix is there.
Consider that the SSE instructions took over the string prefixes from MMX.

See http://pferrie.host22.com/misc/lowlevel2.htm
for some things.

For multiple segment prefixes (e.g. cs: ss: ds: mov ax,[1234]), the last one has priority.
Prior to the Pentium III, for multiple string prefixes (e.g. repz repnz lodsb), the first one had priority. Now it's the last one that has priority.
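
As a rough illustration of these two priority rules, a decoder could resolve duplicate prefixes along the following lines. This is a Python sketch under the rules as stated here; `effective_prefixes` and the `rep_last_wins` switch are hypothetical names, not taken from any real emulator.

```python
SEGMENT = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}
REPEAT = {0xF2: "REPNZ", 0xF3: "REPZ"}


def effective_prefixes(prefix_bytes, rep_last_wins=True):
    """Resolve duplicate prefixes using the rules described above.

    rep_last_wins=False models the older 'first string prefix wins' rule.
    Returns (segment_override, repeat_prefix), either of which may be None.
    """
    segment = None
    repeat = None
    for b in prefix_bytes:
        if b in SEGMENT:
            segment = SEGMENT[b]            # last segment override wins
        elif b in REPEAT:
            if rep_last_wins or repeat is None:
                repeat = REPEAT[b]          # keep first or last, per mode
    return segment, repeat
```

For example, `effective_prefixes([0x2E, 0x36, 0x3E])` yields the DS override, and `effective_prefixes([0xF3, 0xF2], rep_last_wins=False)` keeps REPZ while the default keeps REPNZ.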

AAA uses a single 16-bit add, not two 8-bit adds, as was documented originally.
AAS uses a single 16-bit sub, not two 8-bit subs, as was documented originally.
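
To see where the two readings of AAA differ, here is a small Python sketch that models both. It assumes the single 16-bit add is an add of 0x106 to AX; the function names are invented for the comparison. AAS would be the mirror image with a subtraction.

```python
def aaa_16bit(ax, af):
    """AAA modelled as one 16-bit add of 0x106 (the behaviour claimed above)."""
    if (ax & 0x0F) > 9 or af:
        ax = (ax + 0x106) & 0xFFFF  # a carry out of AL propagates into AH
        af = cf = 1
    else:
        af = cf = 0
    return ax & 0xFF0F, af, cf      # clear the high nibble of AL


def aaa_two_8bit(ax, af):
    """AAA modelled as two independent 8-bit adds (AL += 6, AH += 1)."""
    al, ah = ax & 0xFF, ax >> 8
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF        # a carry out of AL is discarded
        ah = (ah + 1) & 0xFF
        af = cf = 1
    else:
        af = cf = 0
    return (ah << 8) | (al & 0x0F), af, cf
```

The two agree for ordinary BCD input (AX = 000Ah gives 0100h either way) and only diverge when AL + 6 overflows 8 bits: for AX = 00FFh the 16-bit model gives 0205h and the 8-bit model gives 0105h.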

I have a collection of other bits, but they are specific to the generation of the CPU (mostly Pentium III and earlier, because Pentium 4 fixed many things).

Reply 2 of 7, by vladstamate

Rank Oldbie
peterferrie wrote:

Wow, wait what?!

In my emulator, which does cycle-by-cycle emulation, I do not decode the ModR/M byte until after the opcode is decoded. Not only that, but I only put in a request to the BIU for a prefetch value (to get the ModR/M byte) after I have decoded the opcode and decided that one is required. So in my case I would fire the undefined instruction interrupt before the page fault.

I would not have expected that kind of behavior....

So I guess the undefined instruction exception has to happen in the cycle when execution happens, after the ModR/M byte, the associated EA calculation bytes, and any immediates have all been retrieved from the prefetch queue.

Reply 3 of 7, by Jepael

Rank Oldbie

These may be common knowledge nowadays, but here goes:

At least on the 8088/8086, [0Fh] POP CS was a legal instruction. As far as I know, it did not clear the prefetch queue, so there is a danger that after CS changes the prefetch queue holds incomplete instructions, and then of course they won't execute properly.

Also, if an interrupt happens during execution of an instruction that has multiple prefixes, the return address points to the last prefix only, so that is where control resumes.
So basically, moving 64 kilobytes with REP MOVSW is safe for copying from DS:SI to ES:DI, but it is not safe to use REP ES:MOVSW to copy from ES:SI to ES:DI. This may apply to the 8088/8086 only.
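
A toy Python simulation of the effect, for anyone who wants to see the numbers. It assumes the interrupt return lands on `ES: MOVSW` with the REP lost, and that the unrepeated MOVSW copies one word without touching CX. The retry loop is a commonly described workaround that is not from this thread, and both function names are hypothetical.

```python
def rep_es_movsw(cx, interrupt_at=None, buggy=True):
    """Simulate one pass over 'REP ES: MOVSW'; return (words_copied, cx_left).

    interrupt_at is the iteration count after which an interrupt arrives
    (None means no interrupt during the copy).
    """
    copied = 0
    while cx > 0:
        copied += 1                 # one MOVSW iteration
        cx -= 1
        if copied == interrupt_at and cx > 0 and buggy:
            # The return address points at 'ES: MOVSW': one unrepeated
            # copy that leaves CX alone, then execution falls through.
            copied += 1
            break
    return copied, cx


def copy_with_retry(cx, interrupt_at=None):
    """Workaround sketch: re-enter the instruction while CX is non-zero."""
    total, cx_left = rep_es_movsw(cx, interrupt_at)
    while cx_left > 0:
        cx_left -= 1                # account for the unrepeated copy
        if cx_left == 0:
            break
        copied, cx_left = rep_es_movsw(cx_left)
        total += copied
    return total
```

With a count of 10 and an interrupt after the third word, `rep_es_movsw(10, interrupt_at=3)` stops after 4 words with CX still at 7, while `copy_with_retry(10, interrupt_at=3)` copies all 10.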

And some RETF bug which does not restore interrupt flag properly.

Man, being used to these kinds of opcodes just working on a modern system makes retro 808x programming challenging (modern in the sense that I started assembly programming on a 386...)

Reply 4 of 7, by reenigne

Rank Oldbie

I haven't found a use for them yet, but there are a couple of hidden registers on the 8088/8086. One holds the result of the most recent EA computation. You can access it with an LEA instruction that uses a register instead of a memory operand. The other holds the result of the most recent bus access (excluding instruction fetches), and can be accessed with an LDS or LES instruction that uses a register instead of a memory operand.

Reply 5 of 7, by peterferrie

Rank Oldbie
vladstamate wrote:

In my emulator, which does cycle-by-cycle emulation, I do not decode the ModR/M byte until after the opcode is decoded. Not only that, but I only put in a request to the BIU for a prefetch value (to get the ModR/M byte) after I have decoded the opcode and decided that one is required. So in my case I would fire the undefined instruction interrupt before the page fault.

I would not have expected that kind of behavior....

So I guess the undefined instruction exception has to happen in the cycle when execution happens, after the ModR/M byte, the associated EA calculation bytes, and any immediates have all been retrieved from the prefetch queue.

Yes, that's correct. The instruction is fetched entirely (which is where the page fault might occur - there is no "this is a NOP, don't bother to fetch the RM parts"), before it is evaluated for validity.
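
In emulator terms, the ordering could be sketched like this in Python. Everything here is a simplified assumption: `decode` is handed the instruction length and a validity callback directly, whereas a real decoder works the length out from the opcode and ModR/M bytes, and the exception classes merely stand in for the page fault and undefined instruction exceptions.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    pass


class UndefinedOpcode(Exception):
    pass


def fetch(memory_pages, addr):
    """Read one byte, raising PageFault if its page is not mapped."""
    page = addr // PAGE_SIZE
    if page not in memory_pages:
        raise PageFault(addr)
    return memory_pages[page][addr % PAGE_SIZE]


def decode(memory_pages, ip, length, is_valid):
    """Fetch all `length` bytes first, and only then check validity."""
    raw = bytes(fetch(memory_pages, ip + i) for i in range(length))
    if not is_valid(raw):
        raise UndefinedOpcode(raw)
    return raw
```

With only page 0 mapped, decoding a 4-byte invalid instruction that starts 2 bytes before the end of the page raises `PageFault`, not `UndefinedOpcode`, because the fetch of the trailing bytes fails before validity is ever looked at.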

Oh, and for CPUs prior to Core, overlapping rep instructions (e.g. "rep stosb" or "rep movsb" where cx is non-zero and di will hit the "rep" as a result) would cache the rep and complete the operation until cx is zero (or a page fault occurred), even though the original instruction is destroyed. It was the remaining prefetch cache bug that was supposed to have been fixed with the Pentium (i.e. the 486 "mov [next instruction], cc" trick which would trap a debugger).
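
The prefetch trick can be pictured with a toy Python model. It assumes a CPU that never snoops writes against bytes it has already prefetched, and it treats single-stepping as a prefetch depth of 0; `run` and its parameters are hypothetical, and the first byte of the program is just a placeholder for the storing instruction.

```python
def run(memory, ip, prefetch_depth):
    """Return the opcode byte that executes after a self-modifying store.

    The toy instruction at `ip` is one byte long and stores 0xCC over the
    byte that follows it. prefetch_depth is how many bytes past `ip` were
    already fetched when the store executes (0 models a flushed queue).
    """
    memory = bytearray(memory)
    # Snapshot of the bytes already sitting in the prefetch queue.
    queue = bytes(memory[ip + 1: ip + 1 + prefetch_depth])
    memory[ip + 1] = 0xCC           # the store lands in memory only
    return queue[0] if queue else memory[ip + 1]
```

Run freely with a non-empty queue, the model executes the stale original byte; with the queue emptied between instructions, as it would be under a debugger's single step, it executes the freshly written 0xCC instead.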

Reply 6 of 7, by BloodyCactus

Rank Oldbie

I remember having lots of prefetch buffer tricks in my old cracking code and the encryption/compressor tools I wrote that were totally fubared when the Pentium came along and rewrote the prefetch rules they had in the 386/486.

--/\-[ Stu : Bloody Cactus :: [ https://bloodycactus.com :: http://kråketær.com ]-/\--

Reply 7 of 7, by vladstamate

Rank Oldbie

I realized one more thing. On the 8088 I always thought the 4th cycle (T4) is when the BIU is able to perform data bus activity: read memory, write memory, in/out. However, it turns out that is also the cycle when the CPU acknowledges interrupts. So if an IRQ comes in, the CPU will only acknowledge it in this 4th cycle, and then it won't be able to do any other bus operations. Somehow I (wrongly) thought that whenever an IRQ comes in, the CPU deals with it immediately once it is ready to decode the next instruction. Turns out interrupt acknowledge is also a bus operation.

For the 286 that is every 2 cycles, not 4.