REP + String instructions behavior \ VOGONS


First post, by GloriousCow

Posted on 2022-11-17, 16:38


Rank: Member
Posts: 488
Joined: 2022-09-12, 20:00

I have a few questions about the behavior of an 8088 with REP-prefixed string instructions.

What is the state of IP when executing a REP STOSB, for example? Does it point at the REP prefix or the STOSB opcode itself?

The Intel docs say:

A repeating string operation can be suspended by an exception or interrupt. When this happens, the state of the
registers is preserved to allow the string operation to be resumed upon a return from the exception or interrupt
handler. The source and destination registers point to the next string elements to be operated on, the EIP register
points to the string instruction, and the ECX register has the value it held following the last successful iteration of
the instruction. This mechanism allows long string operations to proceed without affecting the interrupt response
time of the system.

The phrase "the state of the registers is preserved" implies to me, at least, that there's some mechanism saving these values to be restored on return from interrupt rather than simply relying on the ISR not to modify them, but looking at the source of some other emulators, it doesn't appear to be handled with any special cases.

Are these values "saved" somewhere and "restored" on IRET or am I just reading too much into the word "preserved"? Can an ISR in theory overwrite them and cause the string instruction to fail to resume properly? (I know that would be an ill-behaved ISR...)

When the interrupt is finished and IP returns to the string instruction, if IP is pointing to the string instruction and not the REP prefix, how does the CPU "remember" it was in a REP?

I know there's a 'bug' with the 8088 where a REP + a segment override prefix won't properly restore the segment override prefix, and I wonder how that fits into the picture as well.

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 1 of 4, by Ringding

Posted on 2022-11-17, 16:58


Rank: Member
Posts: 211
Joined: 2016-01-05, 21:02
Location: Wien

The ISR needs to preserve the registers, otherwise it is just broken. And yes, of course it can modify the running state by changing registers. The wording might be a bit unfortunate, but EIP definitely needs to point at the REP prefix. The term "the string instruction" must definitely be meant to include the prefix.

Reply 2 of 4, by mkarcher

Posted on 2022-11-17, 17:51


Rank: l33t
Posts: 3318
Joined: 2019-01-19, 16:29
Location: Germany

GloriousCow wrote on 2022-11-17, 16:38:

I know there's a 'bug' with the 8088 where a REP + a segment override prefix won't properly restore the segment override prefix, and I wonder how that fits into the picture as well.

That fits perfectly well into this picture. The 8088 has its internal IP at the string instruction, and the microcode for repeated string instructions just pushes "IP-1", expecting the REP prefix there, which allows a clean restart. If you have "CS: REP MOVSW", the segment override gets lost after an interrupt. People claim that "REP CS: MOVSW" can just be restarted if CX is non-zero, but I'm unsure whether CX would need to be adjusted in this case, as a single "CS: MOVSW" would have been executed after returning from the interrupt.
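
To make the restart behaviour concrete, here is a small Python sketch. It is a toy model, not real 8088 microcode: the `decode` and `resume_ip` helpers and the two-entry prefix table are made up for illustration, and the only assumption is the one stated above, namely that the return address pushed on interrupt lands one byte before the string opcode.

```python
# Toy model (assumption, not real microcode): when a repeated string
# instruction is interrupted, the pushed return address is the byte
# immediately before the string opcode, so only ONE prefix is re-fetched.

PREFIXES = {0xF3: "REP", 0x2E: "CS:"}   # only the two prefixes discussed here
OPCODES = {0xA5: "MOVSW"}


def decode(code, ip):
    """Decode any prefixes plus one opcode starting at ip."""
    parts = []
    while code[ip] in PREFIXES:
        parts.append(PREFIXES[code[ip]])
        ip += 1
    parts.append(OPCODES[code[ip]])
    return parts


def resume_ip(code, start):
    """Return address pushed if the instruction at `start` is interrupted."""
    ip = start
    while code[ip] in PREFIXES:
        ip += 1
    return ip - 1   # one byte before the string opcode (hypothetical model)


SEQUENCES = {
    "REP MOVSW": bytes([0xF3, 0xA5]),
    "CS: REP MOVSW": bytes([0x2E, 0xF3, 0xA5]),
    "REP CS: MOVSW": bytes([0xF3, 0x2E, 0xA5]),
}

for name, code in SEQUENCES.items():
    resumed = decode(code, resume_ip(code, 0))
    print(f"{name:14} resumes as {' '.join(resumed)}")
```

In this model the plain "REP MOVSW" resumes intact, "CS: REP MOVSW" resumes as "REP MOVSW" (override lost), and "REP CS: MOVSW" resumes as a single "CS: MOVSW" (REP lost).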

Reply 3 of 4, by reenigne

Posted on 2022-11-21, 08:33


Rank: Oldbie
Posts: 649
Joined: 2006-11-30, 05:13
Location: Cornwall, UK

The microcode actually pushes "IP-2" as the internal IP points to the instruction after the currently running one (hence the encoding of relative jumps). But yes, the 8088/8086 microcode can't handle multiple prefixes in this situation. Later CPUs fixed this problem. You can't use "REP CS: MOVSW" and correct for it after the fact as there's an ambiguous case. If an interrupt occurs on the last iteration, you'll get an extra word copied and you can't tell from looking at CX whether this has happened or not.
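
The ambiguous case can be sketched with a few lines of Python. This is a simplified model rather than the actual microcode: the function name `run_rep_cs_movsw` and the `irq_after` parameter are invented for the example, and it assumes the interrupt is recognised after an iteration has completed and CX has been decremented, including after the final iteration, which is what the description above implies.

```python
# Simplified model of "REP CS: MOVSW" on an 8088 (an assumption-laden
# sketch, not real microcode). We only track CX and the number of words
# copied by the time execution moves past the instruction.

def run_rep_cs_movsw(count, irq_after=None):
    """count: initial CX.
    irq_after: 1-based iteration after which an interrupt is taken, or None.
    Returns (final_cx, words_copied).
    """
    cx, copied = count, 0
    while cx != 0:
        copied += 1      # one MOVSW iteration
        cx -= 1
        if copied == irq_after:
            # Return address points at "CS:", so the REP prefix is lost.
            # After IRET a single, un-repeated "CS: MOVSW" runs.
            # It copies one more word and does not touch CX.
            copied += 1
            break
    return cx, copied


clean = run_rep_cs_movsw(4)                  # never interrupted
middle = run_rep_cs_movsw(4, irq_after=2)    # interrupted part-way
last = run_rep_cs_movsw(4, irq_after=4)      # interrupted on last iteration

print("clean :", clean)
print("middle:", middle)
print("last  :", last)
```

Under these assumptions the part-way case leaves CX one higher than the number of words still outstanding, so a restart would need CX adjusted. The clean case and the last-iteration case both end with CX equal to 0 but differ by one copied word, which is why CX alone cannot tell them apart.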

Reply 4 of 4, by mkarcher

Posted on 2022-11-21, 14:37


Rank: l33t
Posts: 3318
Joined: 2019-01-19, 16:29
Location: Germany

reenigne wrote on 2022-11-21, 08:33:

You can't use "REP CS: MOVSW" and correct for it after the fact as there's an ambiguous case. If an interrupt occurs on the last iteration, you'll get an extra word copied and you can't tell from looking at CX whether this has happened or not.

Thanks for the input! The workaround needs to be more complicated, but the extra copy is not impossible to detect. You may not be able to trust CX, but you can trust SI/DI. Storing the expected final SI or DI value and stashing away the target word that possibly gets overwritten gets complicated enough that just re-setting DS for the repeated string instruction is likely the easier approach.
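
A rough Python sketch of that SI/DI-based recovery, to show how much bookkeeping it takes. Everything here is hypothetical: `rep_cs_movsw` is a toy model of the 8088 behaviour discussed in this thread (REP lost on return from interrupt, one un-repeated MOVSW executed), `safe_copy` is an invented wrapper, and SI/DI are counted in words rather than bytes to keep the example short.

```python
# Toy model plus recovery wrapper (hypothetical names, word-indexed SI/DI).

def rep_cs_movsw(src, dst, si, di, cx, irq_after=None):
    """Model of "REP CS: MOVSW"; irq_after is the 1-based iteration after
    which an interrupt is taken (None means no interrupt)."""
    done = 0
    while cx != 0:
        dst[di] = src[si]
        si, di, cx, done = si + 1, di + 1, cx - 1, done + 1
        if done == irq_after:
            # REP is lost on return: one plain "CS: MOVSW", CX untouched.
            dst[di] = src[si]
            si, di = si + 1, di + 1
            break
    return si, di, cx


def safe_copy(src, dst, si, di, count, irq_after=None):
    end_si = si + count          # expected final SI, saved up front
    guard = dst[di + count]      # word just past the target, may be clobbered
    si, di, _ = rep_cs_movsw(src, dst, si, di, count, irq_after)
    while si < end_si:
        # Interrupted early: restart with a count derived from SI, not CX.
        si, di, _ = rep_cs_movsw(src, dst, si, di, end_si - si)
    if si > end_si:
        # Interrupt hit the last iteration: undo the one extra word.
        dst[di - 1] = guard
    return dst


source = [1, 2, 3, 4, 99]        # 99 is the word just past the source block
for irq in (None, 1, 2, 3, 4):
    target = [0, 0, 0, 0, 7]     # 7 must survive the copy
    print(irq, safe_copy(source, target, 0, 0, 4, irq))
```

In this model every interrupt position ends with the same destination contents and the guard word intact, but only because the wrapper saves the expected final SI and the word past the target before starting.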
