VOGONS


First post, by cloppy007

User metadata
Rank Newbie

Hi Vogons!

I travelled with my Radeon 9700Pro inside its anti-static bag, covered in bubble wrap and suddenly I'm getting this weird output both in DVI and VGA:

Attachment: radeon_9700_mal.jpg (196.66 KiB). A photo of the video output. License: CC-BY-4.0

And a short video: https://youtu.be/CVpZqoC-1lM

It happens in two of my test systems: a DFI nF4 using an AGP-to-PCIe adapter and a Gigabyte SiS 648 board. I had never tested the card in these machines before. I can boot into Windows with another graphics card, and Windows does detect the card and even lets me extend the desktop to it, but it outputs the same crap. I was even able to dump the BIOS (and its checksum matches the one in the TPU database).

Previously it was working (as in I could game) in an Abit NF7-S and also in an Asus A8N32-SLI (using the same AGP2PCIe adapter).

I don't think there has been physical damage to it. Could that be the case? There are no fallen caps or resistors in the static bag and none seem to be missing on a first inspection with a magnifying glass.

Before I pulled it from the working system (the Abit board) it did BSOD in a few scenarios and I grew suspicious of the electrolytic caps, but they seem fine and I got the same behaviour with an unrelated 9800Pro and a 9600SE.

What could possibly be wrong with this card?

Things I've checked:

  • Temps (I've always run it with a modded crystal orb that makes good contact)
  • Power cable (checked OK with a multimeter)
  • VGA/DVI + AGP contacts (all cleaned with IPA)

Reply 1 of 18, by The Serpent Rider

User metadata
Rank l33t++

Well, it is physical damage. Most likely the GPU or RAM has lost contact with the PCB. Just because it worked when you checked it before doesn't guarantee that it will work again, because a BGA contact can crack during the cooling phase.

I've always run it with a modded crystal orb that makes good contact

Unless the protective frame on the GPU was removed or a thick thermal pad was applied, you've basically just cooked it to death. And based on what I see on the internet, the Crystal Orb is a fairly mediocre chipset cooler, which won't be sufficient for a Radeon 9700/9800 core.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 2 of 18, by bloodem

User metadata
Rank Oldbie

That behavior is usually indicative of broken solder joints (either on the GPU, or its memory ICs).
A damaged GPU core is also a possibility, but that usually results in having no display output at all.

While the system is running, try and press / gently bend the card in various areas (and press on each memory IC individually) to see if the display pattern changes.

1 x PLCC-68 / 2 x PGA132 / 5 x Skt 3 / 9 x Skt 7 / 12 x SS7 / 1 x Skt 8 / 14 x Slot 1 / 5 x Slot A
5 x Skt 370 / 8 x Skt A / 2 x Skt 478 / 2 x Skt 754 / 3 x Skt 939 / 7 x LGA775 / 1 x LGA1155
Current PC: Ryzen 7 5800X3D
Backup PC: Core i7 7700k

Reply 4 of 18, by cloppy007

User metadata
Rank Newbie

First of all, thank you all for your replies.

The Serpent Rider wrote on 2023-09-12, 08:19:

Well, it is physical damage. Most likely the GPU or RAM has lost contact with the PCB. Just because it worked when you checked it before doesn't guarantee that it will work again, because a BGA contact can crack during the cooling phase.

I've always run it with a modded crystal orb that makes good contact

Unless the protective frame on the GPU was removed or a thick thermal pad was applied, you've basically just cooked it to death. And based on what I see on the internet, the Crystal Orb is a fairly mediocre chipset cooler, which won't be sufficient for a Radeon 9700/9800 core.

Regarding cooking it, perhaps I have not explained myself clearly. When I bought this card back in 2002 the first thing I did was remove the cooler to apply good quality paste. Then I saw what ATi/Hercules had done and immediately bought a new cooler. I sanded/carved the edges so that it would make perfect contact with the GPU die without removing the shim. There was a big difference that I could measure using a thermistor and also by touching the heatsink on the back (thanks for that, Hercules), which was more than 10 °C cooler. I don't think the core ever reached 50 °C under load. I used this card for 2-3 years and then put it into storage. Then last month I tested it and it ran fine for two weeks. I highly doubt that the core decided to die from temperature stress right after I sealed it in an antistatic bag.

The contact patch of the cooler looks very nice and indeed it's very hard to remove it after a clean application of TIM. Gravity alone won't take it out:

Attachment: photo_2023-09-12_22-54-23.jpg (189.16 KiB). License: CC-BY-4.0

TL;DR: I did not buy this card used, it's mine and has always been properly cooled.

bloodem wrote on 2023-09-12, 08:52:

That behavior is usually indicative of broken solder joints (either on the GPU, or its memory ICs).
A damaged GPU core is also a possibility, but that usually results in having no display output at all.

While the system is running, try and press / gently bend the card in various areas (and press on each memory IC individually) to see if the display pattern changes.

If the core was damaged, do you think Windows would be able to load the right driver for it and also list the supported resolutions of the monitor plugged into it?

I did try applying a bit of force on each memory IC, the core itself and also bending the card, to no avail. There is no change in the displayed pattern.

acl wrote on 2023-09-12, 18:59:

Unfortunately it's a common sight on 9700 Pro.
I've had 4 of them... 3 were like yours.

Were all of them your cards or somebody else's? I'd consider that possibility if the card had died during operation, but that's not this case at all.

That said, I think I have a lead. I tried to read documentation on where to measure voltages and found this TechPowerUp guide. I've measured the following values:

  • VDDQ: 0.78 V (what the...?)
  • VGPU: 1.47 V (a bit low, but OK)
  • VDD: 0.0 V
  • VREF: 0.0 V (seems to be related to VDDQ)

I tested the VDD measurement point with an oscilloscope (ZEEWEII dso1511g) and there was something fishy: it couldn't determine the frequency of the signal, which was fluctuating a lot. A multimeter just reads 0.

Then I checked some ICs for shorts and found that the second pin of U96 (qs3257S1) is grounded. Reading through the datasheet seems to indicate this is the (negated) enable input. I'm not good at this so it could mean nothing and this could all be a red herring.

I'll recheck the board under the microscope to check for damaged/burned components and try to guess how VDD and VREF are generated. Such a low VDDQ would indeed trigger memory issues, I guess. I might also remove one of the electrolytic caps and run it through the component tester.

Once again, thanks a lot!

Reply 5 of 18, by The Serpent Rider

User metadata
Rank l33t++

Interesting. Well, looks like your card survived long enough to experience VRM/caps failure.


Reply 6 of 18, by darry

User metadata
Rank l33t++

IMHO, one of the best ways to find a working Radeon 9700 variant is to get an OEM-branded one from the likes of Dell.

Those were clocked lower from the factory and stand a greater chance of having survived the anemic stock cooling AND the GPU's core shim. My experience is shared here [1].

[1]
[SOLVED, fixed by forcing AGP 1X] De-shimmed Radeon 9700 on 440BX crashes Windows 98 SE running 3DMARK 2003

Reply 7 of 18, by bloodem

User metadata
Rank Oldbie

cloppy007 wrote on 2023-09-12, 21:00:

Regarding cooking it, perhaps I have not explained myself clearly. When I bought this card back in 2002 the first thing I did was remove the cooler to apply good quality paste. Then I saw what ATi/Hercules had done and immediately bought a new cooler. I sanded/carved the edges so that it would make perfect contact with the GPU die without removing the shim. There was a big difference that I could measure using a thermistor and also by touching the heatsink on the back (thanks for that, Hercules), which was more than 10 °C cooler. I don't think the core ever reached 50 °C under load. I used this card for 2-3 years and then put it into storage. Then last month I tested it and it ran fine for two weeks. I highly doubt that the core decided to die from temperature stress right after I sealed it in an antistatic bag.

Indeed, especially based on what you wrote, I don't think you are dealing with a dead core. Broken solder joints could still have been an issue, though (powering on the card after so many years and heating it up could cause some very tiny cracks that finally gave in during the subsequent transport), although, in your particular case, this doesn't seem to be the issue either.

cloppy007 wrote on 2023-09-12, 21:00:

The contact patch of the cooler looks very nice and indeed it's very hard to remove it after a clean application of TIM. Gravity alone won't take it out:
photo_2023-09-12_22-54-23.jpg
TL;DR: I did not buy this card used, it's mine and has always been properly cooled.

Yeah, that looks like a very good contact, so nothing to worry about in terms of temperature.

cloppy007 wrote on 2023-09-12, 21:00:

If the core was damaged, do you think Windows would be able to load the right driver for it and also list the supported resolutions of the monitor plugged into it?

No. As I said, the expected behavior for a dead core would be a black screen, but I've had my fair share of weird behaviors in the past (such as a dead Coppermine CPU that at first glance worked perfectly, but refused to run games, movies and music). So nothing surprises me anymore. 😀

cloppy007 wrote on 2023-09-12, 21:00:

That said, I think I have a lead. I tried to read documentation on where to measure voltages and found this TechPowerUp guide. I've measured the following values:

  • VDDQ: 0.78 V (what the...?)
  • VGPU: 1.47 V (a bit low, but OK)
  • VDD: 0.0 V
  • VREF: 0.0 V (seems to be related to VDDQ)

I tested the VDD measurement point with an oscilloscope (ZEEWEII dso1511g) and there was something fishy: it couldn't determine the frequency of the signal, which was fluctuating a lot. A multimeter just reads 0.

Then I checked some ICs for shorts and found that the second pin of U96 (qs3257S1) is grounded. Reading through the datasheet seems to indicate this is the (negated) enable input. I'm not good at this so it could mean nothing and this could all be a red herring.

Very hard to tell without a schematic, but that does indeed sound strange. It might be an issue in a completely different location that causes a missing enable signal.
In situations like these, it really does help to have another identical working card, that you can use to compare voltages and resistances.
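
A minimal sketch of that comparison in Python, assuming the readings from both cards are noted down by hand. The function name `compare_rails` and the 5% threshold are made up for illustration, and the reference figures are placeholders only, not measurements or specifications of a real 9700 Pro. The suspect figures are the ones quoted above.

```python
def compare_rails(suspect, reference, tolerance=0.05):
    """Return the rails where the suspect card differs from the reference card.

    Both arguments map a test-point name to a voltage in volts. A rail is
    flagged when it differs from the reference by more than `tolerance`
    (a fraction of the reference value).
    """
    flagged = {}
    for rail, ref_v in reference.items():
        if rail not in suspect:
            continue  # no reading taken on the suspect card for this point
        sus_v = suspect[rail]
        if ref_v == 0:
            # Avoid dividing by zero: any non-zero reading counts as a mismatch.
            off = sus_v != 0
        else:
            off = abs(sus_v - ref_v) / abs(ref_v) > tolerance
        if off:
            flagged[rail] = (sus_v, ref_v)
    return flagged


# Readings reported in the thread for the faulty card.
suspect = {"VDDQ": 0.78, "VGPU": 1.47, "VDD": 0.0, "VREF": 0.0}

# PLACEHOLDER values standing in for a known-good card. Replace them with
# real measurements taken at the same test points on a working card.
reference = {"VDDQ": 2.5, "VGPU": 1.5, "VDD": 2.5, "VREF": 1.25}

flagged = compare_rails(suspect, reference)
for rail, (sus_v, ref_v) in flagged.items():
    print(f"{rail}: suspect {sus_v:.2f} V vs reference {ref_v:.2f} V")
```

The same approach works for resistance-to-ground readings taken with the card unpowered. Only the unit in the printout changes.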


Reply 9 of 18, by bloodem

User metadata
Rank Oldbie

stef80 wrote on 2023-09-13, 10:42:

It's dead, Jim.
Use known-to-work aftermarket coolers (Zalman VF700 / VF900 / FC-ZV9 or Arctic ATI silencer 1) or remove the shim.

Read the whole thread, Jim! 😉


Reply 10 of 18, by stef80

User metadata
Rank Member

Just did 😉. I guess I stopped reading after seeing garbled output pattern. BTW, reference design R300 cards were built like military-grade compared to everything else on the market at the time. I'm really interested to see what the issue is.

Reply 11 of 18, by cloppy007

User metadata
Rank Newbie

The Serpent Rider wrote on 2023-09-13, 02:19:

Interesting. Well, looks like your card survived long enough to experience VRM/caps failure.

What a relief! I laughed out loud. Now I just have to find out what exactly is failing!

darry wrote on 2023-09-13, 06:21:

IMHO, one of the best ways to find a working Radeon 9700 variant is to get an OEM-branded one from the likes of Dell.

Those were clocked lower from the factory and stand a greater chance of having survived the anemic stock cooling AND the GPU's core shim. My experience is shared here [1].

[1]
[SOLVED, fixed by forcing AGP 1X] De-shimmed Radeon 9700 on 440BX crashes Windows 98 SE running 3DMARK 2003

Thanks for the tip! I must confess I think I already read it some months ago whilst looking for experience on 440BX boards.

bloodem wrote on 2023-09-13, 08:45:

Indeed, especially based on what you wrote, I don't think you are dealing with a dead core. Broken solder joints could still have been an issue, though (powering on the card after so many years and heating it up could cause some very tiny cracks that finally gave in during the subsequent transport), although, in your particular case, this doesn't seem to be the issue either.

Sounds reasonable.

bloodem wrote on 2023-09-13, 08:45:

No. As I said, the expected behavior for a dead core would be a black screen, but I've had my fair share of weird behaviors in the past (such as a dead Coppermine CPU that at first glance worked perfectly, but refused to run games, movies and music). So nothing surprises me anymore. 😀

Yep, old hardware can be like that. Although back in the day I saw some very strange issues with new hardware. We had a brand new Packard Bell P166MMX that became unstable under what seemed random circumstances and made me miss our former 486DX4.

bloodem wrote on 2023-09-13, 08:45:

Very hard to tell without a schematic, but that does indeed sound strange. It might be an issue in a completely different location that causes a missing enable signal.
In situations like these, it really does help to have another identical working card, that you can use to compare voltages and resistances.

The closest card I have is a Hercules 9800Pro, which looks similar but is not quite the same. I think they're quite expensive right now, but I'll see if I can source a working one.

I've yet to check everything under the microscope. Perhaps it's just a component leg that has come unsoldered, so an enable signal is now missing... I hope I can find some time soon.

stef80 wrote on 2023-09-13, 11:28:

Just did 😉. I guess I stopped reading after seeing garbled output pattern. BTW, reference design R300 cards were built like military-grade compared to everything else on the market at the time. I'm really interested to see what the issue is.

That's right: it seems to be using top of the line drivers and some polymer caps. Interestingly, modern AMD GPUs usually come with over the top power delivery (at least according to Buildzoid). I've had to replace caps of same-era nVidias or throw them away.

I don't think it's relevant but IIRC the card does draw something resembling the Windows mouse pointer. Perhaps with garbled characters but it's very recognisable with a bounding box around it.

Anyway, rest assured I'll let you all know once I find the root cause of the issue, even if stef80 had been right all along 😉

I must say I'm overwhelmed by so many people replying to this thread. I've always loved this community even though I had nothing interesting to share. Thanks a lot to all of you!

Reply 13 of 18, by stef80

User metadata
Rank Member

Not sure how you could check that, since BGA chips are used. Reference Pro comes with very hot Samsung 2.8ns BGA chips.
Some vendors (Sapphire, FIC) also used Etrontech 2.8ns.

I have one 9700Pro with a memory chip broken in half. It shows broken textures in 3D, but no pattern like this, and no immediate artefacts in 2D.

Last edited by stef80 on 2023-09-17, 03:48. Edited 1 time in total.

Reply 14 of 18, by Roman555

User metadata
Rank Oldbie

Hi
I'd like to share a software utility that supposedly can test VRAM on Radeon R3xx/RV3xx graphic cards.
I've never had a chance to use it in practice, though, so run it at your own risk.
Good luck!

P.S. I googled a little bit and found a command to run the utility
R3MEMID –NOCFG –GENREF –LOG

Attachment: R3MEMID.RAR (306.66 KiB). VRAM test utility. License: Fair use/fair dealing exception
Last edited by Roman555 on 2023-09-17, 17:41. Edited 1 time in total.

[ MS6168/PII-350/YMF754/98SE ]
[ 775i65G/E5500/9800Pro/Vortex2/ME ]

Reply 15 of 18, by cloppy007

User metadata
Rank Newbie

Roman555 wrote on 2023-09-15, 10:55:

Hi
I'd like to share a software utility that supposedly can test VRAM on Radeon R3xx/RV3xx graphic cards.
I've never had a chance to use it in practice, though, so run it at your own risk.
Good luck!

Thanks a lot! I was really looking forward to finding something like that. I'd first like to sort out the power delivery issue, as I'm rather confident that such a test will fail if the RAM is not properly powered.

I've now probed most of the controllers and it seems to me that VRef has an issue with one of its MOSFETs.

stef80 wrote on 2023-09-15, 07:16:

Not sure how you could check that, since BGA chips are used. Reference Pro comes with very hot Samsung 2.8ns BGA chips.
Some vendors (Sapphire, FIC) also used Etrontech 2.8ns.

I have one 9700Pro with a memory chip broken in half. It shows broken textures in 3D, but no pattern like this, and no immediate artefacts in 2D.

Stef80's right, these are BGA chips.

Reply 16 of 18, by cloppy007

User metadata
Rank Newbie

Well, I've got an update, and you're going to love it.

Apparently, it was the power supply. As it was the only component I hadn't replaced in my current test setup, I swapped the PSU for a known bad PSU that gives low voltages (on non-ATX12V 2.0 motherboards). The 9700 Pro has worked with no issues so far.

I also cross-checked by swapping the 9700Pro for a GF 6200 PCI that had suddenly refused to boot (in fact, it had kept the whole system from booting). With the replacement PSU it also booted and worked just fine.

I had no reason to suspect my PSU: both the 12 V and 5 V lines were pretty OK, if a bit on the low side (11.80 V and 4.90 V, respectively). But it's had a long life. The "bad" one is a cheap 550 W unit that I don't trust and have only used to bleed water cooling systems.
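
For reference, here is a small Python sketch of why those readings raised no flag. It assumes the ±5% tolerance that the ATX specification allows on the positive rails; the function name `rail_status` is made up for illustration. Both measured values sit comfortably inside the window, so a static multimeter reading alone would not have pointed at the PSU.

```python
# Nominal voltages of the positive ATX rails and the allowed tolerance (+/-5%).
ATX_TOLERANCE = 0.05
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}


def rail_status(rail, measured):
    """Compare one measured rail voltage against its ATX tolerance window."""
    nominal = NOMINAL_RAILS[rail]
    low = nominal * (1 - ATX_TOLERANCE)
    high = nominal * (1 + ATX_TOLERANCE)
    return {
        "rail": rail,
        "measured": measured,
        "low": low,
        "high": high,
        "deviation_pct": (measured - nominal) / nominal * 100,
        "in_spec": low <= measured <= high,
    }


# Multimeter readings from the original PSU, as reported in this post.
readings = {"+12V": 11.80, "+5V": 4.90}
results = {rail: rail_status(rail, volts) for rail, volts in readings.items()}

for r in results.values():
    verdict = "within spec" if r["in_spec"] else "OUT OF SPEC"
    print(f'{r["rail"]}: {r["measured"]:.2f} V ({r["deviation_pct"]:+.1f}%), '
          f'allowed {r["low"]:.2f} to {r["high"]:.2f} V, {verdict}')
```

A multimeter only shows a slow average, so a check like this says nothing about ripple or sag under load, which an ageing PSU may well suffer from.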

I will still run the memory test program, however. And give the card a comprehensive stability evaluation.

Reply 18 of 18, by cloppy007

User metadata
Rank Newbie

Roman555 wrote on 2023-09-15, 10:55:

Hi
I'd like to share a software utility that supposedly can test VRAM on Radeon R3xx/RV3xx graphic cards.
I've never had a chance to use it in practice, though, so run it at your own risk.
Good luck!

P.S. I googled a little bit and found a command to run the utility
R3MEMID –NOCFG –GENREF –LOG

Thanks a lot, Roman555! As the card passed the test, I recorded it in case it proves useful to other people: https://www.youtube.com/watch?v=24EeNXhCYpg