VOGONS


First post, by GloriousCow

I have a few questions about the behavior of an 8088 with REP-prefixed string instructions.

What is the state of IP when executing a REP STOSB, for example? Does it point at the REP prefix or the STOSB opcode itself?

The Intel docs say:

A repeating string operation can be suspended by an exception or interrupt. When this happens, the state of the registers is preserved to allow the string operation to be resumed upon a return from the exception or interrupt handler. The source and destination registers point to the next string elements to be operated on, the EIP register points to the string instruction, and the ECX register has the value it held following the last successful iteration of the instruction. This mechanism allows long string operations to proceed without affecting the interrupt response time of the system.

The phrase "the state of the registers is preserved" implies to me, at least, that there's some mechanism saving these values to be restored on return from interrupt rather than simply relying on the ISR not to modify them, but looking at the source of some other emulators, it doesn't appear to be handled with any special cases.

Are these values "saved" somewhere and "restored" on IRET or am I just reading too much into the word "preserved"? Can an ISR in theory overwrite them and cause the string instruction to fail to resume properly? (I know that would be an ill-behaved ISR...)

When the interrupt is finished and IP returns to the string instruction, if IP is pointing to the string instruction and not the REP prefix, how does the CPU "remember" it was in a REP?

I know there's a 'bug' with the 8088 where a REP + a segment override prefix won't properly restore the segment override prefix, and I wonder how that fits into the picture as well.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 1 of 4, by Ringding

The ISR needs to preserve the registers, otherwise it is just broken. And yes, of course it can modify the running state by changing registers. The wording might be a bit unfortunate, but EIP definitely needs to point at the REP prefix. The term "the string instruction" must definitely be meant to include the prefix.

Reply 2 of 4, by mkarcher

GloriousCow wrote on 2022-11-17, 16:38:

I know there's a 'bug' with the 8088 where a REP + a segment override prefix won't properly restore the segment override prefix, and I wonder how that fits into the picture as well.

That fits perfectly well into this picture. The 8088 has its internal IP at the string instruction, and the microcode for repeated string instructions just pushes "IP-1", expecting the REP prefix there, which allows a clean restart. If you have "CS: REP MOVSW", the segment override gets lost after an interrupt. People claim that "REP CS: MOVSW" can just be restarted if CX is non-zero, but I'm unsure whether CX would need to be adjusted in this case, as a single "CS: MOVSW" would have been executed after returning from the interrupt.

Reply 3 of 4, by reenigne

The microcode actually pushes "IP-2" as the internal IP points to the instruction after the currently running one (hence the encoding of relative jumps). But yes, the 8088/8086 microcode can't handle multiple prefixes in this situation. Later CPUs fixed this problem. You can't use "REP CS: MOVSW" and correct for it after the fact as there's an ambiguous case. If an interrupt occurs on the last iteration, you'll get an extra word copied and you can't tell from looking at CX whether this has happened or not.
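
To make the prefix loss concrete, here is a toy Python model of the behaviour described above. It is not actual microcode: it simply assumes the internal IP points just past the one-byte string opcode and that the interrupt pushes that value minus 2. The function name `resume_stream` and the byte-string representation are made up for illustration; only the opcode bytes (F3 = REP, 2E = CS:, A5 = MOVSW) are the standard encodings.

```python
# Standard 8086 encodings.
REP = 0xF3
CS_OVERRIDE = 0x2E
MOVSW = 0xA5
PREFIXES = {REP, CS_OVERRIDE}


def resume_stream(code, start=0):
    """Return the bytes re-executed after an interrupt suspends the
    string instruction that begins at code[start].

    Assumption (hypothetical model): the return address is always
    'internal IP - 2', regardless of how many prefixes were decoded.
    """
    ip = start
    while code[ip] in PREFIXES:   # decode past any prefix bytes
        ip += 1
    internal_ip = ip + 1          # points at the byte after the opcode
    pushed_ip = internal_ip - 2   # only room for ONE prefix byte
    return code[pushed_ip:internal_ip]


single = resume_stream(bytes([REP, MOVSW]))
override_first = resume_stream(bytes([CS_OVERRIDE, REP, MOVSW]))
rep_first = resume_stream(bytes([REP, CS_OVERRIDE, MOVSW]))

print(single.hex())          # REP MOVSW: restarts cleanly
print(override_first.hex())  # CS: REP MOVSW: the CS: override is dropped
print(rep_first.hex())       # REP CS: MOVSW: the REP prefix is dropped
```

In this model whichever prefix sits immediately before the opcode survives, and anything in front of it is lost on resume.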

Reply 4 of 4, by mkarcher

reenigne wrote on 2022-11-21, 08:33:

You can't use "REP CS: MOVSW" and correct for it after the fact as there's an ambiguous case. If an interrupt occurs on the last iteration, you'll get an extra word copied and you can't tell from looking at CX whether this has happened or not.

Thanks for the input! The workaround needs to be more complicated, but the extra copy is not impossible to detect. You may not be able to trust CX, but you can trust SI/DI. Needing to store the expected final SI or DI value and stash away the target word that might get overwritten gets complicated enough that just re-setting DS for the repeated string instruction is likely the easier approach.
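
A rough Python sketch of why CX is ambiguous while SI is not. This is a simplified model under loud assumptions, not a description of the real microcode: the interrupt is taken after CX has been decremented, the resumed code is a bare "CS: MOVSW" (REP lost), SI starts at 0, and the direction flag is clear. The names `run_rep_cs_movsw` and `words_remaining` are hypothetical.

```python
def run_rep_cs_movsw(count, irq_after=None):
    """Model 'REP CS: MOVSW' and return (cx, si, words_copied).

    irq_after: number of completed iterations after which an interrupt
    is taken (None means no interrupt). Assumed behaviour: on return
    from the interrupt a single non-repeated MOVSW runs, which moves
    one word and leaves CX untouched.
    """
    cx, si, copied = count, 0, 0
    while cx != 0:
        si += 2        # MOVSW advances SI by one word
        copied += 1
        cx -= 1
        if irq_after is not None and copied == irq_after:
            si += 2    # resumed bare 'CS: MOVSW'
            copied += 1
            break
    return cx, si, copied


def words_remaining(expected_final_si, si):
    """Words still to copy, judged by SI; negative means overshoot."""
    return (expected_final_si - si) // 2


clean = run_rep_cs_movsw(4)                   # no interrupt
last = run_rep_cs_movsw(4, irq_after=4)       # interrupt on last iteration
middle = run_rep_cs_movsw(4, irq_after=2)     # interrupt mid-copy

print(clean, last, middle)
print(words_remaining(8, last[1]), words_remaining(8, middle[1]))
```

In this model the clean run and the last-iteration interrupt both finish with CX = 0, so CX alone cannot distinguish them, while SI ends one word past the expected value in the interrupted case. For the mid-copy interrupt, CX overstates the remaining work by one word, and the SI difference gives the correct count.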