VOGONS


Debugging a Micronics W6-LI POST failure


Reply 20 of 25, by RussD

Rank Newbie

So my next step was trying to figure out what event or clock the memory controller (DBX) wasn't receiving to advance through this delay. I pulled up the pinout for the DBX again, expecting to find some low speed clock it was using, or some event signal from the PMC. But nothing. Just a single clock, HCLKIN. For what it's worth, I did continuity test the signals between the DBX and PMC to see if there was an issue there, but nothing.

Anyway, HCLKIN is the host clock, in this case the 60MHz front side bus clock. The clock that does pretty much everything. The system is executing code so the clock is clearly working. So what's up? Guess I could check and make sure it's connected to the DBX, and check where it goes.

The attachment PXL_20220725_191050667.jpg is no longer available

It connects up to U6 on the lower right. This appears to be clocking central for the motherboard. W4/W5 on the upper left select the bus clock (60MHz or 66MHz), and the jumpers starting with W6 on the lower left select the multiplier. The silver can is the main crystal. The actual datasheet for the clock chip seems to be lost to the ages, but it looks like the pinout has been reused across multiple Pentium Pro clock chips. One of them is shown here:

The attachment gnome-shell-screenshot-hl3dce.png is no longer available

Probing shows the HCLKIN clock headed to the DBX connects to pin 7, which is PCLK1. Given the jumper settings in the motherboard manual, that should be giving me 60MHz. I'm too lazy to pull out the scope right now, so let's hook it up to the logic analyzer, set the sample rate to 200MHz, and see what kind of sloppy signal we can get.

The attachment gnome-shell-screenshot-fq2jef.png is no longer available

Oh...oh. That's not 60MHz, not even close. It is suspiciously close to half the frequency of the 14.318MHz crystal, which would be about 7.16MHz. OK, I think A) we've gotten a lot closer to finding the root problem, B) once again proven the old adage of check your voltages *and* clocks first, and C) oh geez, that chip is kinda hard to find. I guess we could start by probing the other pins to see if they have sane values. Moving around the chip, the other PCLKn outputs are putting out the same frequency, and the FS1/FS0 pins...are both at 3.3V. That's not right. In the table, that combination indicates "Test mode". Well, maybe test mode outputs half the crystal frequency. That bolsters my hope a bit that the clock chip is OK.
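For anyone wanting to reproduce that kind of reading from a raw capture, here is a minimal Python sketch of estimating a clock frequency by timing rising edges. It assumes the capture is just a list of 0/1 samples taken at a fixed rate. The function names and the synthetic square wave are illustrative stand-ins, not output from any real analyzer software.

```python
def estimate_frequency_hz(samples, sample_rate_hz):
    """Estimate a clock's frequency from 1-bit samples by timing rising edges."""
    # Sample indices where the signal goes from 0 to 1.
    edges = [i for i in range(1, len(samples))
             if samples[i - 1] == 0 and samples[i] == 1]
    if len(edges) < 2:
        raise ValueError("need at least two rising edges")
    # Average period, in samples, over every full period in the capture.
    periods = len(edges) - 1
    samples_per_period = (edges[-1] - edges[0]) / periods
    return sample_rate_hz / samples_per_period


def synthetic_clock(freq_hz, sample_rate_hz, n_samples):
    """Ideal 50% duty square wave, sampled (a stand-in for a real capture)."""
    return [1 if (i * freq_hz / sample_rate_hz) % 1.0 < 0.5 else 0
            for i in range(n_samples)]


SAMPLE_RATE_HZ = 200e6   # analyzer sample rate used in the post
CRYSTAL_HZ = 14.318e6    # main crystal
EXPECTED_HZ = 60e6       # what PCLK1 should have been

# Hypothetical capture of the faulty clock at half the crystal frequency.
capture = synthetic_clock(CRYSTAL_HZ / 2, SAMPLE_RATE_HZ, 20000)
measured = estimate_frequency_hz(capture, SAMPLE_RATE_HZ)
print(f"measured {measured / 1e6:.3f} MHz, expected {EXPECTED_HZ / 1e6:.0f} MHz")
```

At a 200MHz sample rate a 60MHz clock only gets about 3.3 samples per period, so individual edges jitter, but averaging over many periods still lands close to the real value. A clock near 7.16MHz gets roughly 28 samples per period, so there is no mistaking it for 60MHz.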

Things are getting even closer to the root cause. Time to check where those hook up. Jumpers W4 and W5, the left pin of each...odd. Where does the right pin of those jumpers go...ground. Something is not right here. With the jumper in place, W4 is shorted to ground. Well, they also hook up to pins 15 and 17 of U7 on the upper left as well as a couple of 8k pull-up resistors.

The attachment gnome-shell-screenshot-4qb2m9.png is no longer available

15 and 17 are *inputs*, so how is this signal being shorted to ground and yet somehow being driven to 3.3V? Maybe there's a broken trace? No. I already probed it. It's fine. What if I move the jumper to the other position? I should get the 66MHz setting. Power up, probe. WTF. Both lines are still reading 3.3V. OK, I'll probe the damn bridge on the top of the jumper. 0V. I'm getting to the point of insanity here. Clearly I've driven myself mad. What if I probe the little pin sticking up out of the jumper?

3.3V. No. No no no no. Time to pull the jumper out and look at it. It looks fine. How about we test it?

The attachment PXL_20220805_161427870.jpg is no longer available

I give you the jumper sent from hell to torment me. It's got just a tiny bit of extra plastic combined with some loose contacts. The extra plastic is pushing the pin just enough to not make contact. If I wiggle it, it makes contact, but otherwise nothing.
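The failure mode can be sketched as a tiny model in Python. This is only an illustration under stated assumptions: the thread does not say which of FS1/FS0 goes to W4 and which to W5, so the straps are named after the jumpers instead, and the only decode entries used are the ones the posts support (jumper on W4 for 60MHz, jumper on W5 for 66MHz, both straps high for test mode). The function names are made up for the example.

```python
def strap_level(jumper_fitted, jumper_makes_contact=True):
    """Logic level on a strap pin: pulled up to 3.3V unless the jumper grounds it."""
    grounded = jumper_fitted and jumper_makes_contact
    return 0 if grounded else 1


def clock_mode(w4_level, w5_level):
    """Decode the two strap levels into the clock chip's mode (partial table)."""
    if (w4_level, w5_level) == (0, 1):
        return "60MHz"      # jumper on W4 pulls that strap low
    if (w4_level, w5_level) == (1, 0):
        return "66MHz"      # jumper on W5 pulls that strap low
    if (w4_level, w5_level) == (1, 1):
        return "test mode"  # nothing grounded, pull-ups win
    return "unknown"        # both grounded: not covered in the thread


# Good jumper on W4, nothing on W5.
good = clock_mode(strap_level(True), strap_level(False))

# Same setup, but the jumper looks fitted and does not actually make contact.
bad = clock_mode(strap_level(True, jumper_makes_contact=False), strap_level(False))

print(good, "/", bad)
```

The point of the model is that a fitted-but-open jumper is electrically identical to no jumper at all, so the pull-ups hold both straps high no matter which position the jumper sits in.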

Last edited by RussD on 2022-08-18, 00:17. Edited 1 time in total.

Reply 21 of 25, by RussD

Rank Newbie

Well, there's only one thing left to do. Grab a new jumper out of the parts bin and try it out.

The attachment PXL_20220818_000044080.jpg is no longer available

Well, there you go. And as for that jumper:

The attachment PXL_20220817_235749522.jpg is no longer available

TL;DR: a jumper was not making contact, which left the clock chip strapped into test mode, putting out a bus clock of about 7MHz instead of 60MHz. This caused a timing issue in the memory controller where a timeout during a memory test never triggered, hanging the system during DRAM auto-probing.

Since I worked this issue off and on for a few weeks, a couple of fun side projects came out of it that I hope to share soon: one to deal with the excessive current draw by the VRM on the 5V rail, and the other to provide additional debugging options for P6 platforms. If you have a system failing to boot at a stage where caches are not enabled and memory is not initialized, probing the ISA bus can be really useful. Of course, all of that can be avoided by first checking voltages and clocks, and not making assumptions.

Reply 22 of 25, by pentiumspeed

Rank l33t

You are welcome. Wow, a bad jumper!? I usually search around and buy a bag of quality jumper blocks.

I was glad I suggested checking the clocking.

Cheers,

Great Northern aka Canada.

Reply 23 of 25, by rasz_pl

Rank l33t
RussD wrote on 2022-08-17, 22:40:

Given the 32 byte size and alignment, I'm pretty sure it's related to cachelines. I can hook things up again to double check, but it looks like it's reading an entire cacheline every time it reads a single byte of instruction memory from BIOS.

Really fascinating. Probably some memory controller design shortcut; it should be called "Ignore cache" 😀. I wonder if Intel stuck with it, and if other vendors copied this behavior. It might explain why disabling L1 cache slows even very fast CPUs beyond the expected theoretical rate.

AT&T Globalyst/FIC 486-GAC-2 Cache Module reproduction
Zenith Data Systems (ZDS) ZBIOS 'MFM-300 Monitor' reverse engineering

Reply 24 of 25, by Thermalwrong

Rank Oldbie

That's wonderful you've got it working again. I've been following along, although a lot of it is over my head. But the diagnostic process is something I need to try with a couple of motherboards that just aren't working.
The fact that it was just a jumper and not something more complex is great, because it's simple to resolve. And again, I'm really glad you have that system working again 😀
It does show how even such a simple thing can cause such significant failures, especially with older PCs that rely on jumpers. I've had 486 motherboards in a non-functional state until all the jumper settings were 'just right', but a Pentium Pro taken down by just one damaged jumper on the PLL, well, it's certainly educational.

Reply 25 of 25, by amijim

Rank Member

Well, I run my W6-LI at 233MHz with Win2k. For some reason the IDE CD-ROM does not work and does not autoboot Win2k (Win98 works okay), so I had to install an Adaptec SCSI controller, boot from a SCSI CD-ROM, and install to a SCSI HD, and now it works like a dream. I have had this machine since 2001, when I was in my first year at university. I wrote a lot of info on the SysOpt forums and the 2CPU forums.

Iwill ZmaxDP
Arima4way
Tyan s2885
Iwill MPX2
Gigabyte GA-7DPXDW+
Compaq SP700
Compaq ml350