VOGONS


First post, by quicknick

Rank Member

I thought for a while about whether to post this, and decided it's better to do so as a reminder that things can go wrong when you least expect it.

So today I was in the middle of putting together my second 'lockdown project', a system built around an Abit KT7 (non-A) that has a particular significance to me: it is the actual board that I bought in December '00 and used for about 4-5 years, and which I considered lost without a trace until last week, when we were reunited (but that's another long story by itself). The board has the socket mod from back in the day, so it can run higher multipliers; it was last used with a Thoroughbred-B at 2200MHz (100x22).

Not having any PSU available with a fat +5V rail, I decided to use my best Socket A CPU, an AXMH2400FQQ4C, further underclocked/undervolted to 1500MHz@1.40V. At such low power, I saw fit to use a good-looking, all-copper, low-profile cooler that I had picked up at the flea market on one of my last visits before the lockdown.

I was in the last stages of the Win98 setup when I felt the need to check the heatsink temperature with my finger. But reaching in blindly, and with the cooler being so flat, I missed my intended target and ended up with my finger in the fan instead of on the copper part. I got chopped pretty badly, with some blood from under the fingernail, uttered some sweet words and carried on. The Windows setup completed, I was preparing to copy the install kit from the CD to the SSD, and the room was slowly filling with that familiar "new PC" smell. I thought it came from the PSU, as it's a NOS Enermax Liberty. But then explorer.exe crashed, bluescreens were cascading, chaos was taking over. I hit Ctrl-Alt-Del repeatedly until the system rebooted (I heard the distinctive sound from the optical drive), but it wouldn't POST. I cycled power at the PSU: nothing! The smell was getting more intense and seemed to come from the PSU, and at this point I was thinking that maybe something inside it had shorted, or at least a cap had blown.

So I turned the chassis to horizontal to begin troubleshooting, then I noticed. The horror. CPU fan wasn't spinning. Turns out, my finger wasn't the sole victim - the blade that wounded me was snapped and completely blocked the fan:

brokenfan.jpg

I quickly pulled the plug and again tried to feel the temperature. But there was no need to touch it: I could feel from a distance that the heatsink was scorching hot. At this point I was expecting the worst: not only had I ruined my best Socket A CPU, but probably my beloved board as well. After letting it cool down, I noticed that it had gotten so hot that some of the solder holding the fins to the heatsink's base plate had melted and formed a lot of solder balls:

solderballs.jpg

Also, the sticker on the bottom of the fan crumpled and shrank from the intense heat:

wrinkedfan.jpg

Not seen in this picture, but the fins left partially melted imprints of themselves on the fan's wires where they touched them. I recovered some of the solder balls and, using my soldering station, found that they have a melting point of about 160°C.
Seeing that the CPU substrate had become noticeably darker under the die, I was pretty sure I had killed it, and I was only hoping that my board had survived. But I decided to give it a try nevertheless and, to my extreme surprise, after replacing the cooler, everything works! I will do more tests in the following days, check whether the CPU can still hit its rated frequency, and maybe leave it under a stress test for a few hours, but so far it's looking incredibly good (though I wish it hadn't happened in the first place).

So... that concludes my story for now. Stay safe!

Reply 1 of 13, by kjliew

Rank Oldbie

Haven't you seen the Pentium 4 promo video Intel put out back when they were being hit hard by the Athlon? The moment the fan was abruptly stopped, the Athlon just burst into flames. This really gave AMD a hard time, and all motherboard vendors started implementing CPU fan monitoring and alerting in their BIOS to reduce the potential fire-hazard liability. Since AMD was the underdog, no one actually sued them. Intel, on the other hand, knew the problem very well and invested heavily in engineering research to make sure their CPUs wouldn't kill themselves in case of fan failure.

Fan monitoring and alerting was extremely crucial for AMD CPUs for safe computing back then. I hope AMD has made progress in this area, but I will always make sure the motherboard has working fan monitoring and alerting.

Reply 2 of 13, by The Serpent Rider

Rank l33t

kjliew wrote:

Haven't you seen the Pentium 4 promo video Intel put out back when they were being hit hard by the Athlon? The moment the fan was abruptly stopped, the Athlon just burst into flames.

Nope. THG (Tom's Hardware Guide) staged a very unlikely scenario, in which they just yanked the whole cooler off the CPU. It's very risky for any CPU without an IHS (integrated heat spreader), including the Pentium 3 Coppermine.

The ABIT KT7 already has thermal shutdown protection, which will suffice for a damaged-fan scenario, but you need to enable it in the BIOS first.

Get up, come on get down with the sickness
Open up your hate, and let it flow into me

Reply 3 of 13, by quicknick

Rank Member

kjliew wrote on 2020-04-04, 00:25:

I hope AMD has made progress in this area, but I will always make sure the motherboard has working fan monitoring and alerting.

I'm sure they made progress, as I had another incident ~12 years ago, when my Arctic Cooling Freezer 64 Pro fell onto my graphics card after the retainer broke under the heatsink's weight while my computer ran unattended, and my X2 4800+ survived unscathed.

The Serpent Rider wrote:

The ABIT KT7 already has thermal shutdown protection, which will suffice for a damaged-fan scenario, but you need to enable it in the BIOS first.

No thermal shutdown, at least in the current BIOS. There is only one option, CPU protect for CPUFan Off, which would have spared me this whole affair had it been Enabled...
Ironically, I cannot use it now because the new cooler (AC Copper Lite3) is too slow (around 2000rpm, I think), and the board is buggy and sees that as 0rpm (it only registers "screamer" fans). So if I enable this option now, the system doesn't boot anymore (or rather, it shuts itself down within seconds, after giving an alarm beep).
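The "slow fan reads as 0rpm" behavior is consistent with how many Super I/O monitoring chips of this era measured fan speed: they count a fixed reference clock during one tach period, so a slow fan can overflow the counter and be reported as stopped. The sketch below is a hypothetical Python model of that, assuming Winbond-style constants (an 8-bit count of a 22.5 kHz clock, RPM = 1,350,000 / (count × divisor)) and a made-up "N consecutive 0rpm readings" shutdown rule. The KT7's actual monitoring chip, divisor setting and BIOS logic are not stated in the thread.

```python
"""Hypothetical model of tach-based fan monitoring with a fan-off cutoff."""
from typing import Optional, Sequence

# Assumed Winbond-style constants (not confirmed for the KT7):
RPM_CONSTANT = 1_350_000  # 22.5 kHz reference clock * 60 s/min
COUNT_MAX = 255           # 8-bit count register; 255 = overflow / "no fan"


def fan_count(rpm: float, divisor: int) -> int:
    """Count the chip would latch for a fan spinning at `rpm`."""
    if rpm <= 0:
        return COUNT_MAX                      # stopped fan never ends a period
    count = int(RPM_CONSTANT / (rpm * divisor))
    return max(1, min(count, COUNT_MAX))      # clamp to the 8-bit range


def reported_rpm(count: int, divisor: int) -> int:
    """RPM shown by the BIOS; an overflowed count reads as 0."""
    if count >= COUNT_MAX:
        return 0
    return round(RPM_CONSTANT / (count * divisor))


def fan_off_trip(readings_rpm: Sequence[float], divisor: int,
                 grace_samples: int = 3) -> Optional[int]:
    """Index of the sample where a hypothetical fan-off cutoff would shut
    the system down (after `grace_samples` consecutive 0rpm readings),
    or None if it never trips."""
    zeros = 0
    for i, rpm in enumerate(readings_rpm):
        if reported_rpm(fan_count(rpm, divisor), divisor) == 0:
            zeros += 1
            if zeros >= grace_samples:
                return i
        else:
            zeros = 0
    return None


if __name__ == "__main__":
    # ~2000rpm cooler: with divisor 2 the raw count (337) overflows, so it reads 0.
    print(reported_rpm(fan_count(2000, 2), 2))    # 0
    # Same fan with divisor 4: count 168 fits, reads about 2009rpm.
    print(reported_rpm(fan_count(2000, 4), 4))    # 2009
    # A fast "screamer" fan is measured fine even with divisor 2.
    print(reported_rpm(fan_count(5000, 2), 2))    # 5000
    # The cutoff trips on the slow-but-spinning fan, and on a seized one.
    print(fan_off_trip([2000] * 5, divisor=2))    # 2
    print(fan_off_trip([0] * 5, divisor=2))       # 2
    print(fan_off_trip([5000] * 5, divisor=2))    # None
```

In this model a larger divisor brings a ~2000rpm fan back into range, at the cost of coarser readings; whether the KT7 BIOS lets you change the divisor is not covered here.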

Reply 5 of 13, by The Serpent Rider

Rank l33t

BTW, you can push the Athlon XP-M much lower. Here's my result with a regular Barton on a KT7A:

Athlon XP Barton low voltage.png

With a beefy enough heatsink, it can work without active cooling.

Reply 7 of 13, by kjliew

Rank Oldbie

The Serpent Rider wrote on 2020-04-04, 09:03:

It's very risky for any CPU without an IHS (integrated heat spreader), including the Pentium 3 Coppermine.

Not for Intel CPUs, though. As I just said, Intel had been investing heavily in engineering research to protect their CPUs from thermal damage. I had a 1GHz PIII Coppermine in 2002, but I used the wrong heatsink, and its contact surface was floating above the CPU die (with some thermal paste in between). I could only run the CPU at 733MHz; at 900MHz it would crash after about 30 minutes. When I finally got the motherboard's Super I/O health monitoring working, I found that the CPU temperature was hovering between 75 and 80°C. That was how I figured out the heatsink was wrong. I had been using the system at a reduced clock speed for over 2 years. Once the heatsink problem was corrected, I was able to overclock it to 1.33GHz, and it worked through the rest of its useful life until I upgraded to a Core 2 Duo.

I am not a typical AMD basher, but this was one area in which Intel truly excelled and took the problem into their own hands, while AMD struggled and left the problem to be addressed by board/firmware designs.

Reply 8 of 13, by Unknown_K

Rank Oldbie

The tab on the plastic bracket my AM2 heatsink attaches to broke off, and the heatsink fell off while the system was running, but nothing was damaged.

Intel probably invested in CPU thermal monitoring because the high-end P4 chips ran hot.

Collector of old computers, hardware, and software

Reply 9 of 13, by Horun

Rank Oldbie

quicknick wrote on 2020-04-04, 00:03:

So I turned the chassis to horizontal to begin troubleshooting, then I noticed. The horror. CPU fan wasn't spinning. Turns out, my finger wasn't the sole victim - the blade that wounded me was snapped and completely blocked the fan:

Wow, bummer, sorry to hear about your CPU. Socket A Athlons and Socket 370 P3s both die quickly if the heatsink+fan isn't doing its job. You got lucky!

kjliew wrote on 2020-04-04, 21:15:

Not for Intel CPUs, though. As I just said, Intel had been investing heavily in engineering research to protect their CPUs from thermal damage. I had a 1GHz PIII Coppermine in 2002, but I used the wrong heatsink ...

When I finally got the motherboard's Super I/O health monitoring working, I found that the CPU temperature was hovering between 75 and 80°C.

Oh, I killed a Socket 370 P3 due to a bad heatsink; you also got lucky and found it in time. I bet the CPU would have been dead if it had kept running for a few more hours before you noticed something was wrong....

Hate posting a reply and have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. 🤣

Reply 10 of 13, by imi

Rank Oldbie

kjliew wrote on 2020-04-04, 21:15:

while AMD struggled and left the problem to be addressed by board/firmware designs.

And most did ^^ This really is far less of an issue than people make it out to be... but yeah, you have to be a bit more careful.

Reply 11 of 13, by The Serpent Rider

Rank l33t

kjliew wrote:

As I just said, Intel had been investing heavily in engineering research to protect their CPUs from thermal damage. I had a 1GHz PIII Coppermine in 2002, but I used the wrong heatsink, and its contact surface was floating above the CPU die (with some thermal paste in between). I could only run the CPU at 733MHz; at 900MHz it would crash after about 30 minutes

That's all fine and dandy, but it's not that hard to kill a 1GHz Coppermine by removing the heatsink, like THG did. If the board's protection reacts too slowly, it will just fry.

Reply 12 of 13, by cde

Rank Member

Hi quicknick,

I'm sorry to hear about your unfortunate cooler meltdown 🙁 By the way, I'm sure you're already aware, but these Socket A coolers only have one correct orientation (otherwise there will be little or no contact with the CPU).

The good news is that the CPU and board still work! I suppose that's quite possible since 130nm is large enough to reduce the effects of electromigration, especially as the event didn't last too long and you reacted quickly.

My best Athlon XP-M is an AXMH2500FQQ4C. It goes down to 1.2V at 1866MHz, which is pretty good. I wouldn't be surprised to see yours go down to 1.3V.

Looking forward to the story of how you found your KT7 😀

Reply 13 of 13, by quicknick

Rank Member

I'm down to 1.2 volts at the moment, still rock solid. There is potential to go lower, but for now I feel like I need a break from all hands-on retro activities for a week or two. I might be more active here, though 😀
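How much the undervolt saves can be roughly estimated with the usual first-order CMOS dynamic-power approximation, P ≈ C·V²·f. A minimal Python sketch, assuming the clock stayed at 1500MHz (the thread doesn't say) and ignoring leakage, which this model leaves out entirely:

```python
def relative_dynamic_power(f_old_mhz: float, v_old: float,
                           f_new_mhz: float, v_new: float) -> float:
    """Ratio P_new / P_old under the first-order model P ~ C * V**2 * f.

    Switched capacitance C is assumed unchanged, so it cancels out.
    Static (leakage) power is not modelled.
    """
    return (f_new_mhz / f_old_mhz) * (v_new / v_old) ** 2


if __name__ == "__main__":
    # Assumption: same 1500 MHz clock, core voltage lowered from 1.40 V to 1.20 V.
    ratio = relative_dynamic_power(1500, 1.40, 1500, 1.20)
    print(f"dynamic power is about {ratio:.0%} of the 1.40 V figure")
```

Under this model the 1.40V to 1.20V drop alone cuts dynamic power by roughly a quarter, since power scales with the square of the voltage; real savings will differ once leakage and the actual load are taken into account.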