VOGONS


First post, by GloriousCow

Rank Member

In the ongoing pursuit of emulator cycle accuracy, I'm investigating the cycle timing for interrupts. Specifically, waking from a HALT.

I have an 8088 on a microcontroller and what I have discovered is that there is a somewhat significant delay between INTR being asserted and the first INTA bus cycle.

Here I assert the INTR while halted and step the CPU...

2023-05-20T15:26:34Z TRACE cpu_client::remote_cpu Setting INTR high to recover from halt...
00000070 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000071 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000072 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000073 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000074 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000075 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000076 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000077 A:[00195] M:... I:... Q:.. IRQA T1 | 1 [90 ]

I see either a 7 or 8 cycle delay. The additional cycle delay is less frequent, but frequent enough to see.
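
A minimal sketch of how that delay can be read off a trace like the one above, assuming each log line is one CPU cycle and that the bus status column reads PASV until the first INTA cycle (logged as IRQA here). The function name and the parsing are illustrative assumptions, not the actual cpu_client code.

```python
# Trace copied from the log above: one line per CPU cycle after INTR goes high.
TRACE = """\
00000070 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000071 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000072 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000073 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000074 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000075 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000076 [60104] M:... I:... Q:.. PASV T1 | 1 [90 ]
00000077 A:[00195] M:... I:... Q:.. IRQA T1 | 1 [90 ]
"""


def cycles_before_inta(trace: str) -> int:
    """Count the passive cycles logged before the first INTA (IRQA) cycle."""
    count = 0
    for line in trace.splitlines():
        fields = line.split()
        if "IRQA" in fields:
            return count
        if "PASV" in fields:
            count += 1
    raise ValueError("no INTA cycle found in trace")


print(cycles_before_inta(TRACE))
```

For the trace shown this counts the seven passive cycles, 70 through 76, that precede the INTA at cycle 77.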

I confirmed this with a scope, see attached. Yellow is the PIT channel 0 output. It's interesting to see it has a slow enough rise time to effectively delay INTR a half cycle.
I stuck a probe on HOLDA just to make sure the delay wasn't caused by DMA. My Arduino-controlled 8088 has no DMA controller, so I had pretty much ruled that out anyway.

Anyone know the underlying rules or logic for interrupt acknowledgement on the 8088?

Attachments

  • wake_from_halt_timing.png
    File size
    24.17 KiB
    File comment
    oscilloscope measurement of 8088 interrupt processing
    File license
    Public domain

MartyPC: A cycle-accurate IBM PC/XT emulator | https://github.com/dbalsom/martypc

Reply 1 of 15, by kdr

Rank Member

I thought it might be due to a microcode loop (the similar WAIT instruction is explicitly documented as checking the /TEST pin every 5 clocks) but apparently not; reenigne's disassembly of the 8086 microcode page has this to say:

There is no microcode for the segment override prefixes (CS:, SS:, DS: and ES:). Nor for the other prefixes (REP, REPNE and LOCK), nor the instructions CLC, STC, CLI, STI, CLD, STD, CMC, and HLT. The "group" opcodes 0xf6, 0xf7, 0xfe and 0xff do not have top level microcode instructions. So none of the instructions with 0xf in the high nybble of the opcode are initially handled by the microcode. Most of these instructions are very simple and probably better done by random logic. HLT is a little surprising - I really thought I'd find a microcode loop for that one since it only seems to check for interrupts every other cycle.

If you haven't already, Ken's very comprehensive reverse-engineering of the 8086 interrupt hardware page might give you some ideas.

Reply 2 of 15, by reenigne

Rank Oldbie

I no longer think it's checking every other cycle. Here's the logic I use in my microcode-based emulator: https://github.com/reenigne/reenigne/blob/mas … rocode.h#LL2522 . This seems to work in my tests, but the cause is very mysterious! I asked Ken about it and he couldn't see any circuitry that might be responsible for it, so it might be 8088-specific. I'm hoping it will become clear once I work out the logic for interrupt timing for non-HLT, non-WAIT scenarios.

Reply 3 of 15, by GloriousCow

Rank Member

I was able to disprove any sort of two-cycle phase as I can delay 10, 11, or 12 cycles after halt before asserting INTR and see the same delay value. Interesting from your code that it appears to be related to the last bus transfer type?
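
To make that argument concrete, here is a minimal sketch of what a two-cycle sampling phase would predict. The function `polled_delay` and the base value of 7 are illustrative assumptions, not measurements: if INTR were only sampled every other cycle, asserting it 10, 11 or 12 cycles after the halt would have to give delays that differ with parity.

```python
def polled_delay(assert_cycle: int, period: int, base: int) -> int:
    """Delay if INTR were only sampled on cycles that are multiples of `period`.

    Hypothetical model: `base` is the fixed latency after a sample point,
    and the remainder is the wait until the next sample point.
    """
    wait_for_sample = -assert_cycle % period
    return base + wait_for_sample


BASE = 7  # illustrative value only, not a measured constant

predicted = [polled_delay(t, period=2, base=BASE) for t in (10, 11, 12)]
print(predicted)

# A two-cycle phase predicts at least two distinct delays across these offsets.
print(len(set(predicted)) > 1)
```

Seeing the same delay for all three offsets is what rules this kind of model out.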

Do you ever get the feeling that the logic for prefetching, bus delays, stalls, aborts, etc., is a bit too complicated for such an old chip? I was hoping one of Ken's blogs might uncover some evidence for simpler unifying rules. I have a pet theory that would simplify things greatly but I've yet to be able to shoehorn it into every delay scenario.

Reply 4 of 15, by reenigne

Rank Oldbie

Yes, the last bus transfer type does seem to be related - at least that's my best guess. But there must be some bit of state (a flip-flop somewhere) which keeps track of whether to do that extra cycle of delay during the HLT, as its output is used long after that bus transfer is complete. It being 8088-specific hints at possibly being related to breaking up a 16-bit transfer into two 8-bit transfers.

With my first attempt at XTCE I definitely felt that the logic I had come up with was far too complicated to be implemented in 29,000 transistors, and that there was a simpler underlying rule. With the microcode version, almost all of it makes sense. One big mystery was cleared up by this blog post of Ken's: http://www.righto.com/2023/01/inside-8086-pro … nstruction.html (see my comment on there). The _extraHaltDelay is the most mysterious puzzle piece remaining.

I'm interested to hear about your pet theory!

Reply 5 of 15, by GloriousCow

Rank Member

I was working on a blog post to try to describe the prefetching algorithm, at least my mental model of it. It would be interesting to compare notes. Although every time I think I have all the rules fully understood, some new bus delay scenario seems to crop up. The last ones were uncovered in my work to finally emulate the Lake effect in Area 5150.

I did see your comment on Ken's blog (I've tried to comment on his blog and just get a browser error, frustrating). I had some further observations about that delay; there's a bit more to it than Ken went into. It's not as simple as "if queue_len == 3 then delay", as you've probably found. I actually handle two different states when queue length is 3, one for code fetches and one for all other bus transfers, as they seem to vary by a cycle in length, and I believe they somehow keep track of whether the last queue operation was a read or write. But that implies a flip-flop I don't think Ken has ever mentioned...

EDIT 10/30/2023: This ended up being incorrect.

Last edited by GloriousCow on 2023-10-30, 13:19. Edited 1 time in total.

Reply 6 of 15, by reenigne

Rank Oldbie

Yes, I think you're right about it being more complicated than "queue_len == 3". The logic I have for it is https://github.com/reenigne/reenigne/blob/mas … rocode.h#LL3185 - that doesn't take into account whether the last queue operation was a read or a write (directly) but does take into account whether the last IO was a prefetch or not (which may amount to the same thing). I'd be interested to see if that makes your logic any simpler than mine!

Reply 7 of 15, by GloriousCow

Rank Member
reenigne wrote on 2023-05-21, 15:18:

Yes, I think you're right about it being more complicated than "queue_len == 3". The logic I have for it is https://github.com/reenigne/reenigne/blob/mas … rocode.h#LL3185 - that doesn't take into account whether the last queue operation was a read or a write (directly) but does take into account whether the last IO was a prefetch or not (which may amount to the same thing). I'd be interested to see if that makes your logic any simpler than mine!

Did you ever figure this out? Because I think I may have. We could compare notes 😁

Reply 8 of 15, by reenigne

Rank Oldbie

Well, if you're talking about the "3 bytes in queue" delay, I thought I had it figured out with Ken's blog post. But I'd be very interested to hear if you've figured out something I haven't!

The _extraHaltDelay is still a mystery, though.

Reply 9 of 15, by superfury

Rank l33t++
reenigne wrote on 2023-10-30, 11:18:

Well, if you're talking about the "3 bytes in queue" delay, I thought I had it figured out with Ken's blog post. But I'd be very interested to hear if you've figured out something I haven't!

The _extraHaltDelay is still a mystery, though.

Perhaps a hidden RNI of sorts?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 10 of 15, by reenigne

Rank Oldbie
superfury wrote on 2023-10-30, 11:43:
reenigne wrote on 2023-10-30, 11:18:

The _extraHaltDelay is still a mystery, though.

Perhaps a hidden RNI of sorts?

I wondered that too, but it doesn't seem to fit what I'm seeing. It does seem to depend on whether the previous IO was a code fetch or not for one thing. I suppose it's possible that it depends on the microcode of the previous instruction and the condition I worked out just happens to fit for the cases I've been able to find. I do wonder if it'll become clear once I do more experiments involving interrupts interrupting instruction sequences other than HLT and WAIT.

Reply 11 of 15, by GloriousCow

Rank Member
reenigne wrote on 2023-10-30, 11:18:

Well, if you're talking about the "3 bytes in queue" delay, I thought I had it figured out with Ken's blog post. But I'd be very interested to hear if you've figured out something I haven't!

The _extraHaltDelay is still a mystery, though.

Nope, I meant extraHaltDelay. I may not have a deep silicon explanation for you, we'd have to beg Ken for that, but it actually fits into my existing BIU model I presented in my blog post: https://martypc.blogspot.com/2023/08/the-8088 … -algorithm.html

The key is that HALT is not a true BIU/bus state, but something hacked in for one clock cycle by special logic, and that logic sometimes even fails to produce a HALT state even after HALT. I noticed that if INTR is high going into HLT, we don't see a HALT state appear on the status lines. I thought at first this was intentional, and maybe it is, but the behavior seems more incidental than deliberate when you look at why it occurs.

Basically, the short explanation is that the BIU state is unchanged by HALT. The BIU is just put in sort of suspended animation - so when we wake back up from the halt state via an interrupt, the BIU remains in whatever state it was going into the halt.

I'll use the two examples you posted on Ken's blog:

Example 1: https://www.reenigne.org/misc/F4_0.txt

60AFF .p....  10A9C FF 00  FD  73 .......                                             |     | fetch ( idle-> pf)
60AFF Ip.... 10A9C FF 00 FD 73 ....... I F4 HLT | | idle -> pf 2
60AFD .C.... 10A9C FF 00 FD 70 ....... | | idle -> pf 3
10A9D .C.... 10A9D FF 00 FD 71 ....... T1 | | code fetch
60A9D .C.... 10A9D FF 00 FD 72 ..r.... T2 | | halt-not-hold -> suppress prefetch decision
60AFF .p.... 10A9D FF 00 FD 72 ..r.... T3 FF <-f [ 10A9D] | pf | halt wait for bus
60AFF .p.... 10A9D FF 00 FD 73 ....... T4 | pf | halt wait for bus
20AFF .H.... 10A9D FF 00 FD 70 ....... | pf | inject T1
60A9D .H.... 60A9D FF 00 FD 71 ....... | pf | HALT

Here our BIU starts out IDLE due to the full queue after MUL; I am assuming that halt has to wait on the BIU transition back to PF; and then wait for T2 of the prefetch itself.
The 'halt-not-hold' signal 'suppresses' the prefetcher - I'm proposing this is subtly different than a SUSPEND of the prefetcher. The BIU state does not go to idle; but remains in the 'B_pf' state.
Evidence that this is different from SUSPEND is that SUSPEND will wait for the current bus transfer to complete, and this is immediate.

Example 2: https://www.reenigne.org/misc/F4_1.txt

60AFF .p....  10A9C FF 00  FD  72 .......                                             |     | fetch ( idle -> pf)
60AFF Ip.... 10A9C FF 00 FD 72 ....... I 90 NOP | | idle -> pf 2
60AF0 .C.... 10A9C FF 00 FD 71 ....... | | idle -> pf 3
10A9D .C.... 10A9D FF 00 FD 70 ....... T1 | | code fetch
60A9D IC.... 10A9D FF 00 FD 73 ..r.... T2 I F4 HLT | | fc -> make prefetch decision
60ACD .p.... 10A9D CD 00 FD 73 ..r.... T3 CD <-f [ 10A9D] | | halt-not-hold
60ACD .p.... 10A9D CD 00 FD 72 ....... T4 | | halt wait for bus
20ACD .H.... 10A9D CD 00 FD 71 ....... | >i | inject T1
60A9D .H.... 60A9D CD 00 FD 70 ....... | i | HALT

Ken showed us that the halt-not-hold signal is generated on T2 - HLT has just been decoded on T2 so the earliest this can take effect is on T3 - after the prefetcher has made a prefetch decision. There are two things at play here: suppression of prefetch decisions, and suppression of new bus transfers. The latter is still in effect, so the scheduled prefetch attempts to occur after T4, and can't. So what happens? The BIU goes into the idle state.

In my blog I described leaving the idle state as taking 3 cycles; but really, it's more like leaving the idle state takes 1 cycle + the normal bus startup cost. If it takes 2 cycles to start a normal bus transfer, then idle -> PF takes 3 cycles, and idle -> EU takes 3 as well.
I'm expanding my BIU model a bit, so that we have a new B_ia state for interrupt acknowledge. This state takes 4 cycles to enter normally - and when idle, it takes 5.
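
The transition costs just described can be sketched as a small lookup. This is only a restatement of the theoretical model, not anything confirmed from silicon; the enum, the constant names and `transition_cost` are hypothetical names chosen for illustration.

```python
from enum import Enum


class Biu(Enum):
    IDLE = "idle"
    PF = "pf"  # prefetching
    EU = "eu"  # bus transfer requested by the EU
    IA = "ia"  # interrupt acknowledge (the new B_ia state)


BUS_STARTUP = 2   # cycles to prepare a normal bus transfer (per the model)
INTA_STARTUP = 4  # cycles to enter INTA from a non-idle state (per the model)
IDLE_EXIT = 1     # extra cycle paid when leaving the idle state


def transition_cost(src: Biu, dst: Biu) -> int:
    """Cycles needed to move from `src` to a bus-active state `dst`."""
    if dst is Biu.IDLE:
        raise ValueError("this sketch only models transitions out to bus activity")
    cost = INTA_STARTUP if dst is Biu.IA else BUS_STARTUP
    if src is Biu.IDLE:
        cost += IDLE_EXIT
    return cost


for src, dst in [(Biu.IDLE, Biu.PF), (Biu.IDLE, Biu.EU),
                 (Biu.PF, Biu.IA), (Biu.IDLE, Biu.IA)]:
    print(f"{src.value} -> {dst.value}: {transition_cost(src, dst)} cycles")
```

The point of writing it this way is that the idle penalty is a single additive cycle rather than a separate per-transition constant.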

When we wake back up in example #1 , we are still in the B_pf state, so it takes 4 cycles to enter INTA:

60A9D .p....  60A9D FF 00  FDI 70 .......                                  |      | INTR high
60A9D .p.... 60A9D FF 00 FDI 71 ....... | | IL set -> loader INTR routine
60A9D .p.... 60A9D FF 00 FDI 72 ....... |pf->ia| 1 INTR: INTA bus request - halt signal low
60A9D .p.... 60A9D FF 00 FDI 73 ....... |pf->ia| 2
60A9D .p.... 60A9D FF 00 FDI 73 ....... |pf->ia| 3
60A9D .A.... 60A9D FF 00 FDI 70 ....... |pf->ia| 4
00A9D .A.... 00A9D FF 00 FDI 71 ....... T1 | | INTA

But when we wake back up in example #2, we are in the B_idle state, so it takes 5:

60A9D .p....  60A9D FF 00  FDI 71 .......                                  |     | INTR high                        
60A9D .p.... 60A9D FF 00 FDI 70 ....... | | IL set -> loader INTR routine
60A9D .p.... 60A9D FF 00 FDI 73 ....... |i->ia| 1 INTR: INTA bus request - halt signal low
60A9D .p.... 60A9D FF 00 FDI 72 ....... |i->ia| 2
60A9D .p.... 60A9D FF 00 FDI 72 ....... |i->ia| 3
50AAE .p.... 60A9D FF 00 FDI 71 ....... |i->ia| 4
50AAE .A.... 60A9D FF 00 FDI 70 ....... |i->ia| 5
00AAE .A.... 00AAE FF 00 FDI 73 ....... T1 | | INTA

If you aren't really sold on my BIU state machine theory, you can basically just summarize it as - is the halt-not-hold signal generated on or after T2?
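
As a rough sketch of that summary, assuming the only input that matters is the T-state of the in-flight code fetch on which halt-not-hold becomes active (T2 in example 1, T3 in example 2). The function names are hypothetical and the rule is just this model, not verified hardware behaviour.

```python
def biu_state_in_halt(halt_not_hold_tstate: int) -> str:
    """BIU state frozen during HALT, per the model in this post.

    `halt_not_hold_tstate` is the T-state (1-4) of the in-flight code fetch
    on which halt-not-hold becomes active.
    """
    if not 1 <= halt_not_hold_tstate <= 4:
        raise ValueError("T-state must be in 1..4")
    # Active by T2: the prefetch decision is suppressed and the BIU stays in pf.
    # Active from T3: the decision was already made, the scheduled fetch cannot
    # start after T4, and the BIU falls back to idle.
    return "pf" if halt_not_hold_tstate <= 2 else "idle"


def cycles_to_inta(halt_not_hold_tstate: int) -> int:
    """Cycles from the INTA bus request to the INTA T1, per the model."""
    return 4 if biu_state_in_halt(halt_not_hold_tstate) == "pf" else 5


print(cycles_to_inta(2))  # example 1: halt-not-hold on T2
print(cycles_to_inta(3))  # example 2: halt-not-hold on T3
```

An emulator could latch this state when HLT is decoded and consult it on wake-up, which would play the role of the extra-delay flag without tracking the last bus transfer type directly.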

What's interesting is that the HALT logic and the INTA BIU transition can even overlap in the right window.

00000029   [00020] CS M:... I:... P:.. PASV T4        F[        ] q-> F4 | HLT @ [F0105]
00000030 A:[F0106] M:... I:... P:.. CODE T1 [ ] |
00000031 [F0106] CS M:R.. I:... P:.. CODE T2 r-> 90 [ ] | halt-not-hold
00000032 [F0106] CS M:R.. I:... P:R. PASV T3 r-> 90 [ ] | Setting INTR high
00000033 [F0106] CS M:... I:... P:R. PASV T4 [ ] | > IL set T4 loader: INTR
00000034 [F0106] M:... I:... P:R. PASV T1 [90 ] | > IA INTR bus request push T1 flip-flop / halt-not-hold deactivates
00000035 A:[60106] M:... I:... P:R. HALT T1 [90 ] | > IA HALT
00000036 [60106] M:... I:... P:R. PASV T1 [90 ] | > IA
00000037 [60106] M:... I:... P:R. PASV T1 [90 ] | > IA
00000038 A:[001B6] M:... I:... P:R. INTA T1 [90 ] |

Here we see the HALT state forced even while the BIU is transitioning to INTA - without some sort of model for this behavior I just assumed I'd always see 7-8 cycles between HALT and INTA. But here there are only two!

There are some additional details and interesting captures I'll be covering in a blog post, but I think this model explains stuff pretty well.

I've been sort of musing on an explanation for BIU state transition costs; it would really be nice to get a silicon explanation for what is essentially a theoretical model. I think it makes sense that it takes two cycles to prepare a normal bus transfer, as we have to load a segment and offset into the adder, but what internal flip-flop creates this "idle" state and corresponding cycle penalty - no idea. What INTA needs 4 cycles for, no idea either.

Reply 12 of 15, by GloriousCow

Rank Member
superfury wrote on 2023-10-30, 11:43:
reenigne wrote on 2023-10-30, 11:18:

Well, if you're talking about the "3 bytes in queue" delay, I thought I had it figured out with Ken's blog post. But I'd be very interested to hear if you've figured out something I haven't!

The _extraHaltDelay is still a mystery, though.

Perhaps a hidden RNI of sorts?

You're not actually too far off the mark, I think, but it has more to do with the loader. HLT creates a special condition where the interrupt latch will trigger the loader with the interrupt microcode routine pre-set.

That's another interesting detail behind interrupts. Intel's documentation states that they are processed at the end of the instruction, but I'm going to be so bold as to say that's a little fib. I think they actually occur when the loader is triggered.

Think about a NOP - the 2nd microcode instruction of a NOP is flagged NXT; so when that occurs, the loader is prompted to read the next instruction byte out of the queue and decode it. If the interrupt latch was set due to INTR being active a cycle prior, then we don't want to do this, we want to go right into the interrupt routine. If we really waited for the end of the instruction, we'd be processing the interrupt latch at RNI, which would mean we'd have loaded something only to try to back it back out and load the interrupt routine instead. That's no good...

You can experimentally test this: compare NOP to INTO (with the overflow flag cleared). INTO in its no-interrupt form is the same length as NOP - 3 microcode instructions ending in RNI - but does not have an NXT bit set on its second instruction. You can set INTR on the first cycle of NOP, and an interrupt will follow. But set INTR on the second cycle of NOP, and another NOP will be fetched instead. Set INTR on the second cycle of INTO, however, and an interrupt occurs. The only explanation is NXT.
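
The NOP vs INTO experiment can be sketched as below. This is a toy model under stated assumptions: micro-ops are reduced to their NXT/RNI flags (the real microcode contents are not reproduced), the loader is triggered by the first micro-op carrying either flag, and INTR must be active at least one cycle before that for the interrupt latch to be seen. All names are hypothetical.

```python
from typing import NamedTuple, Sequence


class MicroOp(NamedTuple):
    nxt: bool = False  # NXT: next micro-op is the last, start the loader now
    rni: bool = False  # RNI: run next instruction


# Three micro-ops each, as described above; only the flags are modelled.
NOP = [MicroOp(), MicroOp(nxt=True), MicroOp(rni=True)]
INTO_NOT_TAKEN = [MicroOp(), MicroOp(), MicroOp(rni=True)]


def loader_cycle(ucode: Sequence[MicroOp]) -> int:
    """1-based cycle on which the loader is first triggered (NXT or RNI)."""
    for cycle, op in enumerate(ucode, start=1):
        if op.nxt or op.rni:
            return cycle
    raise ValueError("microcode never triggers the loader")


def interrupt_taken(ucode: Sequence[MicroOp], intr_cycle: int) -> bool:
    """True if INTR raised on `intr_cycle` pre-empts the next instruction."""
    # Assumption: the latch needs INTR active a cycle before the loader fires.
    return intr_cycle < loader_cycle(ucode)


print(interrupt_taken(NOP, 1))             # INTR on first cycle of NOP
print(interrupt_taken(NOP, 2))             # INTR on second cycle of NOP
print(interrupt_taken(INTO_NOT_TAKEN, 2))  # INTR on second cycle of INTO
```

Under these assumptions the model reproduces the three observations: the interrupt follows in the first and third cases, and another instruction is loaded in the second.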

By some fluke, MartyPC already gets this correct, since I do not execute any cycles flagged with NXT or RNI - meaning MartyPC goes right into the 'loader', at which point interrupts are checked. Wish I could take credit for that foresight...

Reply 13 of 15, by superfury

Rank l33t++
GloriousCow wrote on 2023-10-30, 13:13:

You're not actually too far off the mark, I think, but it has more to do with the loader. HLT creates a special condition where the interrupt latch will trigger the loader with the interrupt microcode routine pre-set.

That's another interesting detail behind interrupts. Intel's documentation states that they are processed at the end of the instruction, but I'm going to be so bold as to say that's a little fib. I think they actually occur when the loader is triggered.

Think about a NOP - the 2nd microcode instruction of a NOP is flagged NXT; so when that occurs, the loader is prompted to read the next instruction byte out of the queue and decode it. If the interrupt latch was set due to INTR being active a cycle prior, then we don't want to do this, we want to go right into the interrupt routine. If we really waited for the end of the instruction, we'd be processing the interrupt latch at RNI, which would mean we'd have loaded something only to try to back it back out and load the interrupt routine instead. That's no good...

You can experimentally test this: compare NOP to INTO (with the overflow flag cleared). INTO in its no-interrupt form is the same length as NOP - 3 microcode instructions ending in RNI - but does not have an NXT bit set on its second instruction. You can set INTR on the first cycle of NOP, and an interrupt will follow. But set INTR on the second cycle of NOP, and another NOP will be fetched instead. Set INTR on the second cycle of INTO, however, and an interrupt occurs. The only explanation is NXT.

By some fluke, MartyPC already gets this correct, since I do not execute any cycles flagged with NXT or RNI - meaning MartyPC goes right into the 'loader', at which point interrupts are checked. Wish I could take credit for that foresight...

Well, if by 'loader' you mean the process of clearing execution state for a new instruction (error handling and stepping state for the EU instruction handler) and fetching the new instruction before firing up the EU again, then yes, UniPCemu already handles it that way (and essentially always has, as far as I can remember).
Basically, the whole interrupt handling is executed immediately (it is in fact preset from the global emulator core rather than the CPU core itself). This triggers the CPU to start the INTA execution (two cycles on the 8088, with the 8088 discarding the second one, since its data isn't used) from the latched interrupt vector (the latching is already done by the emulator core for the specified CPU core). That in turn starts a normal INT (with flags set for hardware rather than software interrupt handling), which runs according to the timings I mentioned in an earlier post on this forum when discussing it with you guys. That includes the special behaviour of the first (and perhaps also the second) cycle set, if I remember correctly: the special INT0/2/3 case, I think? I don't remember which of INT0/1/2/3 were affected, but I do remember that it affected three of them.
None of those special timings apply to INTA-based interrupts, though. UniPCemu just handles it as INTA(INTnr), INTA(discard and acknowledge), INTx(actual interrupt handling in the EU).

Reply 14 of 15, by GloriousCow

Rank Member
superfury wrote on 2023-10-30, 16:10:

Well, if by 'loader' you mean the process of clearing execution state for a new instruction (error handling and stepping state for the EU instruction handler) and fetching the new instruction before firing up the EU again, then yes, UniPCemu already handles it that way (and essentially always has, as far as I can remember).
Basically, the whole interrupt handling is executed immediately (it is in fact preset from the global emulator core rather than the CPU core itself). This triggers the CPU to start the INTA execution (two cycles on the 8088, with the 8088 discarding the second one, since its data isn't used) from the latched interrupt vector (the latching is already done by the emulator core for the specified CPU core). That in turn starts a normal INT (with flags set for hardware rather than software interrupt handling), which runs according to the timings I mentioned in an earlier post on this forum when discussing it with you guys. That includes the special behaviour of the first (and perhaps also the second) cycle set, if I remember correctly: the special INT0/2/3 case, I think? I don't remember which of INT0/1/2/3 were affected, but I do remember that it affected three of them.
None of those special timings apply to INTA-based interrupts, though. UniPCemu just handles it as INTA(INTnr), INTA(discard and acknowledge), INTx(actual interrupt handling in the EU).

very minor point, but the 8088 ignores the first INTA, and uses the second.

here's a timer interrupt. You can see there's nonsense on the data bus for the first INTA and then we have 8 on the second.

inta_sniff.PNG
File size
17.69 KiB
File comment
bus sniff of INTA cycles
File license
Public domain

of course if you're latching the vector internally to your emulator, it doesn't really matter.

Reply 15 of 15, by superfury

Rank l33t++
GloriousCow wrote on 2023-10-30, 16:44:

very minor point, but the 8088 ignores the first INTA, and uses the second.

here's a timer interrupt. You can see there's nonsense on the data bus for the first INTA and then we have 8 on the second.
inta_sniff.PNG

of course if you're latching the vector internally to your emulator, it doesn't really matter.

As a quick question: why does the 808x perform 2 INTA cycles for that? What is the purpose of that first (discarded) INTA?

Edit: Found a bit about that first 'byte' INTA result:
https://rakmaya.tripod.com/Chip8259.htm
So basically, on the first INTA the 8259 acknowledges the IRQ (updating the IRR and ISR) and sets up the vector to be read on the second INTA pulse. It doesn't drive the data bus during the first INTA, hence the 'garbage'?
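
A minimal sketch of that two-pulse sequence, assuming 8086/8088 mode and ignoring masking (IMR), priority rotation and spurious interrupts. The class and method names are hypothetical; the base of 0x08 is the value the PC BIOS programs into the PIC, which is why the timer (IRQ0) shows up as 8 on the second INTA.

```python
from typing import Optional


class Pic8259Sketch:
    """Toy model of the 8259A INTA handshake (not a full PIC emulation)."""

    def __init__(self, vector_base: int) -> None:
        self.vector_base = vector_base & 0xF8  # upper five bits of the vector
        self.irr = 0  # interrupt request register
        self.isr = 0  # in-service register
        self._acked: Optional[int] = None  # line frozen by the first INTA

    def raise_irq(self, line: int) -> None:
        self.irr |= 1 << line

    def inta(self) -> Optional[int]:
        """Handle one INTA pulse; return the byte driven on the bus, if any."""
        if self._acked is None:
            # First pulse: resolve priority (lowest line number wins here),
            # move the request from IRR to ISR, and leave the bus undriven.
            line = next((i for i in range(8) if self.irr & (1 << i)), None)
            if line is None:
                raise ValueError("no pending request (spurious case not modelled)")
            self.irr &= ~(1 << line)
            self.isr |= 1 << line
            self._acked = line
            return None
        # Second pulse: drive the vector number onto the data bus.
        vector = self.vector_base | self._acked
        self._acked = None
        return vector


pic = Pic8259Sketch(vector_base=0x08)
pic.raise_irq(0)   # timer interrupt
first = pic.inta()
second = pic.inta()
print(first, second)
```

Returning `None` for the first pulse models an undriven bus, so whatever the CPU samples there is simply floating data.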
