By tools I meant some other testing software, maybe, in case this one makes us chase phantom errors. Can't really help you there though, I don't know any.
A scope would be great too. With an XT machine a cheap 20MHz one is good enough. Even a very old 5MHz analog scope would be useful, but only if you have some experience using one and understand the bandwidth limitations. The XT is really just another 8-bit machine with 4 more address lines - you can look at the schematics and tell how it works.
These cold/hot issues are the worst. It can be anything: a chip, a bad solder joint, a cracked trace, via corrosion, you name it. Maybe I'm obsessing about these interrupts too much, so here's my reasoning behind it. The "keyboard controller" is really the LS322, which works as a serial-in, parallel-out shift register. Both the clock and data lines from the keyboard are input-only on the PC side; data goes directly to the LS322, and the clock is routed to the LS322 through a synchronizer/delay flip-flop pair in the LS175. Bits are shifted in and, eventually, the first one (which should be 1) gets shifted out, goes to the LS74 flip-flop and activates the IRQ1 line (8 more bits are now stored in the LS322). This event also disables any further shifts until the CPU reads the LS322 data through the 8255 chip and clears the interrupt (which also clears the LS322 state). As a side note, in the clear mode the LS322 is taken off the bus and the SW1 configuration switches can be read through the same 8255 interface.
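If it helps to see that sequence spelled out, here is a rough Python model of the shift-in / IRQ1 / clear cycle described above. It's only a sketch of the logic, not of the real timing: the class and function names are made up for illustration, and I'm assuming the data bits arrive least significant bit first after the leading 1.

```python
class KeyboardShiftRegister:
    """Toy model of the LS322 + LS74 behaviour described in the post."""

    def __init__(self):
        self.register = [0] * 8   # 8 stored bits, oldest first
        self.irq1 = False         # LS74 interrupt latch

    def clock_in(self, bit):
        """Shift one bit in per keyboard clock; shifts are blocked while IRQ1 is pending."""
        if self.irq1:
            return
        shifted_out = self.register.pop(0)
        self.register.append(bit)
        if shifted_out == 1:
            # The leading 1 fell out of the register: latch the interrupt.
            self.irq1 = True

    def read(self):
        """What the CPU would see via the 8255 (assumption: LSB arrived first)."""
        return sum(b << i for i, b in enumerate(self.register))

    def clear(self):
        """Clearing the interrupt also clears the shift register state."""
        self.register = [0] * 8
        self.irq1 = False


def send_scancode(reg, code):
    """Hypothetical helper: clock in a leading 1 followed by 8 data bits."""
    for bit in [1] + [(code >> i) & 1 for i in range(8)]:
        reg.clock_in(bit)


reg = KeyboardShiftRegister()
send_scancode(reg, 0x1E)
print(reg.irq1, hex(reg.read()))
```

The point of the model: a second code sent before `clear()` is simply ignored, and any stray 1 that gets clocked in far enough will eventually raise IRQ1 on its own, which is why noise on the clock line could show up as spurious interrupts.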
There are two LS125 gates that can drive the keyboard clock and data lines low and hold them there. One is used to "disable" the data line on an IRQ1 event, the other - I don't remember. This works because the keyboard interface uses only open-collector outputs and there are two 4k7 pull-up resistors on the PC side.
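The open-collector idea boils down to this: the pull-up keeps the line high unless at least one device pulls it low. A tiny Python sketch of that rule (the function name is made up, and it ignores all the analog side of things):

```python
def line_level(*pulling_low):
    """Return 1 if the pulled-up line floats high, 0 if any driver pulls it low."""
    return 0 if any(pulling_low) else 1


# Arguments are (keyboard pulls low, PC-side LS125 pulls low)
print(line_level(False, False))  # nobody pulls: line is high
print(line_level(True, False))   # keyboard sends a 0
print(line_level(False, True))   # PC holds the line low, keyboard can't override
```

So nobody can force the line high; a device can only let go of it. A gate stuck in the "pulling low" state would therefore silence the keyboard completely.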
Anyway, this seems to work, because any fault in any of these chips or connections would make the keyboard unresponsive or glitchy. I mean, there's a chance that you are getting weird keyboard codes that are silently dropped as invalid, so IRQ1 happens but you don't see it. But that can't really be tested without a scope - you'd have to monitor this line, and it should only fire on keypresses and not at other times. Since the time from interrupt request to clear is probably just a few microseconds, you can't just use an LED to test; the blinks might be too short for the eye to register. But you can try with some bright blue LED I guess.
So I'm thinking, since IRQ0 seems to fail sometimes as well, and then the floppy dies too eventually, perhaps there is some random interrupt arriving at the 8259 that interferes with the correct ones. If not that, then the address decoding logic might have a weak chip and that's going to be a merry chase...
One more thing though. An often overlooked issue is RAM, and these chips do die on their own. Using software to detect issues is only going to work properly if the RAM works, and since every time you load a program it ends up in the same spot (after a reboot), chances are a glitching RAM chip can affect it in a similar way. Mark the RAM chips with numbers, pull them out and put them back in a different order. See what that does. Or - since you have 2 mobos - swap them around.