Reply 20 of 25, by RussD
So my next step was trying to figure out what event or clock the memory controller (DBX) wasn't receiving to advance through this delay. I pulled up the pinout for the DBX again, expecting to find some low-speed clock it was using, or some event signal from the PMC. But nothing. Just a single clock, HCLKIN. For what it's worth, I did continuity-test the signals between the DBX and PMC to see if there was an issue there, but found nothing wrong.
Anyway, HCLKIN is the host clock, in this case the 60MHz front side bus clock. The clock that does pretty much everything. The system is executing code so the clock is clearly working. So what's up? Guess I could check and make sure it's connected to the DBX, and check where it goes.
It connects up to U6 on the lower right. This appears to be clock central for the motherboard. W4/W5 up on the upper left select the bus clock (60MHz or 66MHz), and the jumpers starting with W6 down on the lower left select the multiplier. The silver can is the main crystal. The actual datasheet for the clock chip seems to be lost to the ages, but it looks like the pin-out has been re-used across multiple Pentium Pro clock chips. One of them is shown here.
Probing shows the HCLKIN clock headed to the DBX connecting to pin 7, which is PCLK1. Given the jumper settings in the motherboard manual, that should be giving me 60MHz. I'm too lazy to pull out the scope right now, so let's hook it up to the logic analyzer, set the rate to 200MHz, and see what kind of sloppy signal we can get.
oh...oh. That's not 60MHz, like not even close. It is, suspiciously, half the frequency of the 14.318MHz crystal. Ok, I think A) we've gotten a lot closer to finding the root problem, B) once again proven the old adage of check your voltages *and* clocks first, and C) oh geez, that chip is kinda hard to find. I guess we could start by probing the other pins, to see if they have sane values. Moving around the chip, the other PCLKn outputs are also outputting the same frequency, and the FS1/FS0 pins....are both 3.3V. That's not right. In the table, that indicates "Test mode". Well, maybe test mode outputs half the crystal frequency. That bolsters my hope a bit that the clock chip is ok.
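For anyone who wants to sanity-check a logic analyzer capture the same way, here is a minimal Python sketch of the arithmetic. It assumes the capture has been exported as a plain list of 0/1 samples; the function names, the `synth_square` test signal, and the candidate list are made up for illustration and are not from any analyzer's software. It estimates the frequency from the spacing of rising edges and reports which expected clock it is closest to.

```python
# Hypothetical helper names; assumes a capture exported as a list of 0/1 samples.
CRYSTAL_HZ = 14.318e6  # main crystal on the board

# Frequencies we might plausibly see on PCLK1 (values from the post).
CANDIDATES_HZ = {
    "60MHz bus clock": 60e6,
    "66MHz bus clock": 66e6,
    "crystal / 2": CRYSTAL_HZ / 2,
}


def estimate_frequency_hz(samples, sample_rate_hz):
    """Estimate frequency from the average spacing between rising edges."""
    rising = [
        i for i in range(1, len(samples))
        if samples[i - 1] == 0 and samples[i] == 1
    ]
    if len(rising) < 2:
        return None  # not enough edges to measure a period
    periods = len(rising) - 1
    span_in_samples = rising[-1] - rising[0]
    return sample_rate_hz * periods / span_in_samples


def nearest_candidate(freq_hz, candidates):
    """Return the name of the candidate with the smallest relative error."""
    return min(
        candidates,
        key=lambda name: abs(candidates[name] - freq_hz) / candidates[name],
    )


def synth_square(freq_hz, sample_rate_hz, n):
    """Ideal 50% duty square wave, only used to exercise the estimator."""
    return [
        1 if (i * freq_hz / sample_rate_hz) % 1.0 < 0.5 else 0
        for i in range(n)
    ]


if __name__ == "__main__":
    capture = synth_square(CRYSTAL_HZ / 2, 200e6, 20000)
    freq = estimate_frequency_hz(capture, 200e6)
    print(f"{freq / 1e6:.3f} MHz, closest to: "
          f"{nearest_candidate(freq, CANDIDATES_HZ)}")
```

At a 200MHz sample rate a 60MHz clock only gets about 3.3 samples per period, so the waveform looks ugly, but the edge count still separates 60MHz from roughly 7.16MHz without any ambiguity.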
Things are getting even closer to the root cause. Time to check where those hook up. Jumpers W4 and W5, the left pin of each...odd. Where does the right pin of those jumpers go...ground. Something is not right here. With the jumper in place, W4 is shorted to ground. Well, they also hook up to pins 15 and 17 of U7 on the upper left as well as a couple of 8k pull-up resistors.
15 and 17 are *inputs*, so how is this signal being shorted to ground and yet somehow being driven to 3.3V? Maybe there's a broken trace? No. I already probed it. It's fine. What if I move the jumper to the other position? I should get the 66MHz setting. Power up, probe. WTF. Both lines are still reading 3.3V. OK, I'll probe the damn bridge on the top of the jumper. 0V. I'm getting to the point of insanity here. Clearly I've driven myself mad. What if I probe the little pin sticking up out of the jumper?
3.3V. No. no no no no. Time to pull the jumper out and look at it. It looks fine. How about we test it?
I give you the jumper sent from hell to torment me. It's got just a tiny bit of extra plastic combined with some loose contacts. The extra plastic is pushing the pin just enough to not make contact. If I wiggle it, it makes contact, but otherwise nothing.
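To spell out why a flaky jumper reads as a solid 3.3V, here is a rough Python sketch of the voltage divider formed by the 8k pull-up and the jumper contact to ground. It assumes the FS input itself draws essentially no current, and the contact resistance values are hypothetical numbers picked for illustration, not measurements from this board.

```python
import math

V_PULLUP = 3.3        # volts, rail the pull-ups go to
R_PULLUP_OHMS = 8000  # the 8k pull-up resistors mentioned above


def pin_voltage(v_pullup, r_pullup_ohms, r_contact_ohms):
    """Voltage at the FS pin: pull-up to the rail, jumper contact to ground.

    Assumes the input pin draws no current (an idealization).
    """
    if math.isinf(r_contact_ohms):
        return v_pullup  # open jumper: nothing pulls the pin down
    return v_pullup * r_contact_ohms / (r_pullup_ohms + r_contact_ohms)


if __name__ == "__main__":
    # Hypothetical contact resistances: good jumper vs. no contact at all.
    for label, r_contact in [("good contact", 0.05), ("no contact", math.inf)]:
        volts = pin_voltage(V_PULLUP, R_PULLUP_OHMS, r_contact)
        print(f"{label}: {volts:.4f} V")
```

With a good contact the pin sits at essentially 0V. With no contact the pull-up wins and the pin floats up to 3.3V, and with both FS pins high the clock chip lands in the "Test mode" row of the table.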