superfury wrote: Edit: What about the other (un)conditional jump instructions and call instructions? Do they delay the BIU 6 cycles as well?

Taken conditional jumps and LOOPs (including JCXZ): 6 cycles.

Near/short JMP: 6 cycles.

Indirect JMP (i.e. "JMP CX"): 3 cycles.

Indirect CALL (i.e. "CALL CX") and near CALL: 10 cycles, of which the last 4 are the prefetch of the instruction at the destination.

Far JMP: 4 cycles.

MOV [iw],accum: 2 cycles.

Far CALL: 5 cycles before the first stack store, 9 cycles before the second stack store (note that the prefetch of the destination instruction takes the last 4 of these 9 cycles).

OUT DX,accum and IN accum,DX: no delay except for the 1 cycle wait state.

PUSH rw, PUSH segreg, PUSHF: 2 cycles before stack operation.

MOVSB, MOVSW: 3 cycles between load and store.

REP MOVSB, REP MOVSW: same, also 6 cycles between each load/store pair (0 between halves of a word load/store).

REP STOSB, REP STOSW: 6 cycles between each store (0 between halves of a word store).

REP LODSB, REP LODSW: 9 cycles between each load (0 between halves of a word load).

RET: 3 cycles between stack store and first prefetch at destination.

RET iw: 2 cycles before stack store, 4 cycles between stack store and first prefetch at destination.

XLATB: 2 cycles before load, 2 cycles after.

ADD B[SI],AL: 2 cycles before read, 3 cycles before write.

ADD AL,B[SI]: 2 cycles before read.

CMP [SI],accum: 2 cycles before read.

I've attached a file showing sniffer logs of all these and more (but not an exhaustive list). Note that some may be different for other bus states.
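
For anyone wanting to plug these numbers into an emulator, here is a minimal Python sketch of how a few of the delays above could be tabulated. The dictionary keys and the function name are made up for this example, and the arithmetic counts only the gaps quoted in the list: it leaves out the bus cycles themselves, instruction startup and wait states, and it assumes the bus state the measurements were taken in.

```python
# Delays (in CPU cycles) copied from the list above.
# The key names are hypothetical labels chosen for this example.
JUMP_DELAYS = {
    "jcc_taken_or_loop": 6,  # taken conditional jumps and LOOPs, incl. JCXZ
    "jmp_near_short": 6,     # near/short JMP
    "jmp_indirect": 3,       # indirect JMP, i.e. "JMP CX"
    "jmp_far": 4,            # far JMP
}

# REP string instructions: (gap inside one iteration, gap between iterations).
# Byte and word forms share an entry because the list gives 0 cycles between
# the halves of a word access.
REP_GAPS = {
    "movs": (3, 6),  # 3 between load and store, 6 between load/store pairs
    "stos": (0, 6),  # 6 between each store
    "lods": (0, 9),  # 9 between each load
}


def rep_gap_cycles(op, count):
    """Sum of the quoted gaps for `count` iterations of a REP string op.

    Only the gaps between bus accesses are added up; the bus accesses
    themselves are not included.
    """
    if count <= 0:
        return 0
    inner, between = REP_GAPS[op]
    # One inner gap per iteration, one "between" gap per adjacent pair.
    return count * inner + (count - 1) * between


if __name__ == "__main__":
    for op in ("movs", "stos", "lods"):
        print(op, rep_gap_cycles(op, 4))
```

With CX = 4 this gives 30 gap cycles for REP MOVSB (4 × 3 + 3 × 6), 18 for REP STOSB and 27 for REP LODSB. Treat those as a lower bound on the time between the first and last bus access, not as full instruction timings.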