First post, by wsy2220
TLDR: Replace the 2SD1802 transistor near the CPU slot with an N-channel MOSFET.
I recently built a 1GHz Coppermine machine with a P2B v1.10 motherboard. The motherboard has an HIP6019BCB controller, so it can supply the 1.75V needed by Coppermine. I use a '370 CPU CARD REV 3.2' adapter to install the CPU. Since P2Bs are relatively cheap in my area, I acquired 3 of them.
It worked flawlessly until I tried to run memtest86+ to test a pair of new memory modules. Every time it entered the modulo 20 test (#9 in 6.00, #10 in 5.x), the system froze within a few seconds. Sometimes it could run for a few minutes, but that was rare.
When it freezes, the symptoms are very strange. First the display goes blank, then the cooling fans spin down. Less than a second later, the screen comes back with garbled text, and the fans spin up again.
I tried a bunch of things to pinpoint the problem.
To check whether memtest86+ was writing to unwanted memory addresses such as video RAM or the BIOS area, I limited the target address range to 32M-64M. It still froze. However, I noticed that if I chose a smaller range like 32M-36M, the screen went blank and never came back, and the fans stopped. Even the reset button didn't work. A power cycle was needed to boot it again.
To check whether the memory modules were defective, I switched to a 500MHz Katmai and ran the test again. It survived a whole night of complete memtest86+ runs. This narrowed down the problem quite a bit:
- Memtest86+ is indeed working without writing to unwanted addresses.
- My memory modules and chipset are not defective. Both my Katmai and Coppermine run at a 100MHz base frequency, and memtest86+ showed exactly the same memory benchmark numbers, so the Katmai is enough to saturate the memory bus and the Coppermine can't put extra pressure on the chipset or memory.
To check whether the Coppermine CPU was defective, I replaced it with another 1GHz Coppermine and still got the freeze. I also ran Prime95 under Linux to confirm the CPU is stable under extreme load.
So it had to be the motherboard or the slotket. At that time I didn't consider the possibility of a slotket issue and just assumed it was the motherboard's problem. I switched to another P2B and got the same freeze. Neither board has visibly bad caps. I also measured the CPU Vcc with an oscilloscope. The voltage is stable and the ripple is pretty good. So there must be a flaw in the motherboard's design.
But I had no clue where to find it. At this point I realized I hadn't tried searching for the problem. I googled 'p2b memtest', and the first result was an old usenet thread about exactly the same issue.
In that thread someone mentioned the 'Photoshop bug'. I looked into it, and it looked very similar to my problem. According to the description of the fixes, it is caused by a lack of filtering caps between Vtt and ground. I hooked up my oscilloscope to the Vtt line to verify. Indeed, when modulo 20 is running, Vtt frequently drops from 1.5V to 1.3V. I tried adding some capacitors between Vtt and ground according to the recommended fix, but it didn't work. At least I had narrowed down the problem to the Vtt power supply.
Someone in the usenet thread mentioned that the HIP6019 has a fault pin. I looked into the datasheet: the fault pin goes high when over-current protection is tripped 3 times in a row, and only power-cycling the chip restores its operation. I measured the fault pin after triggering the unresettable freeze, and it was indeed high.
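To make the latch behavior concrete, here is a minimal toy model of what the paragraph above describes. The class name `FaultLatch` and its methods are made up for illustration, and the rule that a clean sample resets the trip counter is my assumption about what "in a row" means. It is not the chip's actual internal logic.

```python
class FaultLatch:
    """Toy model of a fault pin that latches after 3 over-current trips in a row.

    Hypothetical illustration only; the names and the reset-on-clean-sample
    rule are assumptions, not taken from the HIP6019 datasheet.
    """

    TRIP_LIMIT = 3

    def __init__(self):
        self.consecutive_trips = 0
        self.fault = False

    def sample(self, over_current):
        """Feed one over-current observation and return the fault pin state."""
        if self.fault:
            return True  # latched: stays high until a power cycle
        if over_current:
            self.consecutive_trips += 1
            if self.consecutive_trips >= self.TRIP_LIMIT:
                self.fault = True
        else:
            self.consecutive_trips = 0  # assumed: a clean sample breaks the streak
        return self.fault

    def power_cycle(self):
        """Only removing power clears the latch."""
        self.consecutive_trips = 0
        self.fault = False


if __name__ == "__main__":
    latch = FaultLatch()
    for event in (True, True, False, True, True, True, False):
        print(event, latch.sample(event))
```

In this model, one or two trips are survivable and three in a row are not, which may correspond to the two kinds of freeze I saw (one recovers by itself, one needs a power cycle), but that mapping is a guess.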
So the picture is clear now: the Coppermine trips the Vtt over-current protection of the HIP6019BCB under heavy load. According to Intel's Pentium III Processor Power Distribution Guidelines, Vtt current can go up to 5.38A.
I noticed something interesting in the HIP6019BCB datasheet. In the reference design, the 1.5V linear controller output GATE3 is connected to a HUF75307D3S MOSFET. I traced the pins on my P2B, and GATE3 is connected to a 2SD1802 bipolar junction transistor (BJT). The difference between a BJT and a MOSFET is that a BJT is a current-driven device: a higher emitter current requires a higher base current. If GATE3 can't drive enough current into the BJT, the output voltage will drop and trigger over-current protection. A MOSFET, on the other hand, only needs enough voltage on the gate, with a very low current requirement.
Then I realized my third P2B has a US3007CW, which might be able to drive more current into the BJT. I tested again and it passed the modulo 20 test without an issue. The Vtt voltage is stable at 1.5V without hiccups. According to the datasheet, the US3007CW can drive 50mA through GATE3. The HIP6019BCB datasheet doesn't provide this value, but I found that the HIP6020A has a drive current of only 40mA. This is consistent with my speculation. According to the 2SD1802 datasheet, the collector current can only reach 3.5A when the base current is 40mA, much lower than the required 5.38A.
So I ordered some IRFR1205 MOSFETs in TO-252 package and replaced the BJT with one. It survived a whole night of the modulo 20 test. Mystery solved after 18 years!
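The drive-current argument can be checked with some quick arithmetic. This is only a rough sketch using the figures quoted above. It assumes the transistor's current gain stays constant between operating points, which real BJTs don't do, and the function names are my own.

```python
# Figures quoted in the post
I_C_REF = 3.5      # A, 2SD1802 collector current at the reference base current
I_B_REF = 0.040    # A, reference base current (40 mA)
I_VTT_MAX = 5.38   # A, worst-case Vtt current from Intel's guidelines


def implied_gain(i_c, i_b):
    """Current gain implied by one (collector current, base current) point."""
    return i_c / i_b


def max_collector_current(i_b, gain):
    """Collector current a given base drive supports (ASSUMES constant gain)."""
    return i_b * gain


gain = implied_gain(I_C_REF, I_B_REF)
needed_base_current = I_VTT_MAX / gain

if __name__ == "__main__":
    print(f"implied gain: {gain:.1f}")
    print(f"base current needed for {I_VTT_MAX} A: {needed_base_current * 1000:.1f} mA")
    for label, drive in (("40 mA drive", 0.040), ("50 mA drive", 0.050)):
        print(f"{label}: supports about {max_collector_current(drive, gain):.2f} A")
```

Under this crude model, 50mA supports about 4.4A, which is still below the 5.38A worst-case figure. So the model only shows that the US3007CW has more headroom than a 40mA driver, not that it covers the worst case.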
Here are some tips if you want to do the same modification:
- Desoldering power transistors isn't easy. I watched a video to learn how to do it. I also cut the legs first to make it easier.
- The Rds rating of the MOSFET is not important. In a linear power supply the pass transistor generates the same amount of heat regardless of its Rds. But you may need the same threshold voltage of 2-4V as the reference HUF75307D3S.
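Here is a small sketch of why Rds barely matters in a linear regulator: the pass transistor has to drop the difference between input and output voltage whatever its on-resistance is. The 3.3V input rail is an ASSUMPTION for illustration (I didn't state above which rail feeds the Vtt regulator), and the function names are made up.

```python
V_IN = 3.3      # V, ASSUMED input rail feeding the Vtt regulator
V_OUT = 1.5     # V, regulated Vtt
I_LOAD = 5.38   # A, worst-case Vtt current quoted above


def pass_element_dissipation(v_in, v_out, i_load):
    """Heat in the pass transistor of a linear regulator, in watts.

    Rds(on) does not appear: the regulator loop adjusts the transistor
    so that it drops exactly v_in - v_out at the load current.
    """
    return (v_in - v_out) * i_load


def max_on_resistance(v_in, v_out, i_load):
    """Largest fully-on resistance (ohms) that still allows regulation."""
    return (v_in - v_out) / i_load


if __name__ == "__main__":
    print(f"dissipation: {pass_element_dissipation(V_IN, V_OUT, I_LOAD):.2f} W")
    print(f"Rds(on) must stay below {max_on_resistance(V_IN, V_OUT, I_LOAD):.3f} ohm")
```

So Rds only matters as an upper bound: as long as the fully-on resistance is well below that limit, a lower Rds doesn't make the part run any cooler in this circuit.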
Someone in that usenet thread also suspected the hardware monitor chip AS97127F (a W83781D clone). This can be ruled out now. The fans spin down because another BJT near the monitor chip, which supplies the fan voltage, loses its base current when the over-current protection is tripped. I also found that someone in the kernel documentation mentioned fans stopping under heavy load and blamed the Asus chip 😀
I'm just a software dev with no formal training in electronics. So please correct me if I made any mistake. I've learned a lot in this journey.
Edit: 2SB1202->2SD1802