VOGONS


First post, by tehsiggi

Rank Newbie

Dear Vogons Members,

I've been lurking on this forum for a long time now, and it's time to start contributing back. I hope that my participation and my knowledge in some fields might help others.

Not too long ago, I spotted a Hercules Radeon 9700 Pro on ebay that was listed as "defective". While Radeon 9700 Pros are increasingly hard to come by, the Hercules variant itself always felt like a unicorn to me. To be honest, back in the day this would have been my go-to variant, if only I'd had the money. I went on with a measly Radeon 9200SE from HIS, but that's another story.

So I decided to buy that card and check it out, to see what I could make of it. In any case it would look pretty in my small "collection".

### First Impressions ###

The card itself looked relatively tidy and clean. The French seller appears to have specialized in selling retro hardware, so that condition was to be expected. The fan is missing one blade and makes an awful noise. Without the blade, it's very unbalanced and the bearing apparently didn't like that either (who would have thought). It appears to be a TTC-CUV2AB from Titan, with a Hercules sticker on it. However, the fan is my least concern.

The seller mentioned that the GPU produces artifacts and provided pictures of that as well.
It turned out that he was right and furthermore, the GPU is not very keen on a) getting drivers installed and running and b) running demanding 3D games.

Here is what the artifacts in a 2D environment look like:

errors.jpg

Screenshot-from-2025-04-15-11-52-12.png

### Diagnosis - Part I ###

Coming from an engineering and software development background, I started doing my usual "poking". What do things look like on that graphics card?

First of all, the artifact pattern appears to be repeating and consistent. It does not vary with temperature or time. Checking the card for obvious damage, both under the microscope and with the naked eye, did not result in any significant finding. The card looks to be in good shape.

The PCB is the reference board design that AIBs received from ATI, so everything on it can be compared to any other reference-design Radeon 9700 Pro (or even a 256-bit Radeon 9500 non-Pro). This is a big plus, as the layout and its schematic are well documented on the web. The only difference is the cooling. Hercules applied heatsinks to the memory as well as to the backside of the GPU (or ASIC, as ATI appears to have called it at the time), which would imply two things. First, they expected components to run hot. From what I've read and experienced, DDR1 memory at 300+ MHz tends to do so. Second, they noticed heat issues and wanted to mitigate them.

While there could be a third reason: Looks cool and sells better, I assume this was mostly due to the fact that these chips tend to run hot.

Since the PCB inspection did not yield any significant results, I checked the important voltages like VDDC, MVDDC, MVDDQ etc. All of them were in spec, both in their values and with regard to ripple on the oscilloscope - so apparently a dead end.

Then I got physical: What happens if I apply pressure anywhere? Since the pattern looks like a memory issue to me, I pressed firmly on the memory heatsinks, hoping that this might reveal bad solder joints under those BGA memory ICs.

It appeared as if I was out of luck: no changes. Pressure on the GPU did not change a thing either. For kicks I tried cold and hot air as well (below zero °C to 100°C) - no change.

So what could I conclude so far:

  • There is no obvious damage to the PCB and / or components that could cause these artifacts
  • Voltages and regulation appear to be within expected ranges
  • There is no change in behavior if pressure is applied or the component temperature is significantly changed
  • The behavior is consistent, every boot up has the same pattern, artifacts are visible even on the POST screen, indicating this is not a driver issue

This now left me in Limbo. Something appears to be physically wrong with this card, which immediately brings up memories of all the "dead R300" stories from the past.

I decided to not give up yet and do some more research..

### Diagnosis - Part II ###

There is another Thread on Vogons which - interestingly - also has a Hercules Radeon 9700 Pro showing artifacts. The severity is completely different however and the issue there was the Power Supply. Since by design, the Radeon 9700 Pro derives its GPU voltage from the 5V and 3.3V rail, it is very sensitive to having a proper Power Supply. Since I know my Power Supply is fine and I validated that with other GPUs like a Radeon 9500, 9550, 9600 Pro etc. plus the readings from the voltage rails on the PCB, I ruled out that possibility pretty quickly.

However, the user in this thread mentions a tool called "R3MEMID" - which is basically a DOS utility provided by ATI to validate the memory on R300 based video cards. Similar tools for later generations, like "R5MEMID", exist and can be found on the web.

I used this tool and ran the tests, which - obviously - failed. However the resulting log file shows an interesting discovery:

R3MEMID version 1.07, (c) Copyright ATI Technologies Inc, 2003
Log file generation enabled to .\R3MEMID.LOG ...
Reference data file (RDF) loading disabled ...
[1 ] Fill : FAIL
Error ID 0VB001
1024 x 768 - 32 bpp ( 60 Hz): TEST FAILURE
failing bit : MDA0 32 33 34 35 36 37 38 39 ...

[1 ] Fill : FAIL
Error ID 0VB001
1024 x 768 - 32 bpp ( 60 Hz): TEST FAILURE
failing bit : MDA0 32 33 34 35 36 37 38 39 ...

I did not attach the full log, since it's basically only repeating the above output.

It indicates a memory error on MDA0, starting from bit 32 onward. Now this is really interesting and - somewhat - promising. It does not start at 0, which indicates to me that not the whole of MDA0 is affected. Furthermore, 32 is a power of 2, which suggests that the starting point isn't a coincidence.

To figure out, what this means for repairing the graphics card, we have to take a look at the memory controller design of the R300.

mem-schema.png

The R300 houses four memory channels, which are often referred to as Channel A to D in schematics.
Each channel has a width of 64 bits, which is achieved by attaching two 32-bit memory chips to it. When you do the math (4 x 64 bits), this brings the R300 to a 256-bit memory interface, which made it so great back in the day.
Reading the log output and checking the schematics, I came to the conclusion that the program simply translates the memory channels to numbers, meaning A=0, B=1, C=2 and D=3. The only "MDA0" on the schematics is a single connection that goes to the memory, which does not make sense in that context: how would that translate to bit 32 onward?
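The bus-width arithmetic described here can be written out as a short Python sketch. The constant names are made up for illustration; the figures (four channels, two 32-bit chips per channel) are the ones given in the post.

```python
# Figures taken from the post: four channels (A-D), two 32-bit chips per channel.
CHANNELS = ("A", "B", "C", "D")
CHIPS_PER_CHANNEL = 2
BITS_PER_CHIP = 32

channel_width = CHIPS_PER_CHANNEL * BITS_PER_CHIP  # width of one channel in bits
bus_width = len(CHANNELS) * channel_width          # width of the whole interface
chip_count = len(CHANNELS) * CHIPS_PER_CHANNEL     # memory chips implied by this layout

print(channel_width, bus_width, chip_count)
```

With these numbers a single bad chip accounts for exactly one 32-bit slice of one channel, which is why a failure that starts at a 32-bit boundary is a useful hint.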

So my working assumption of the log output is the following:
There appears to be an issue on MDA0, which is memory channel A, with bits 32 and onward.

Checking with the schematic this makes sense: Channel A has two memory chips, of which one is connected to data bits 0-31 and the other to data bits 32-63. So it would be that second chip that is our issue.

Locating this chip on the board is easy, since we know its designator from the schematic to be U53. A quick look at the PCB reveals it to be the memory chip in the bottom corner next to the AGP port on the backside of the PCB.
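As a rough illustration of this reasoning, here is a minimal Python sketch that parses a "failing bit" line from the log and works out which half of the channel the listed bits belong to. The function and type names are hypothetical, the line format is inferred only from the log excerpt above, and the meaning of the trailing digit in "MDA0" is left open. The 0-31 / 32-63 split per chip is the one read from the schematic.

```python
import re
from typing import NamedTuple


class FailingBits(NamedTuple):
    channel: str      # "A".."D", taken from the "MDx" prefix
    suffix: int       # trailing digit of the label; its meaning is not confirmed
    bits: list[int]   # bit numbers actually listed on the line
    truncated: bool   # True if the line ended in "..."


def parse_failing_bits(line: str) -> FailingBits:
    """Parse a line such as 'failing bit : MDA0 32 33 34 ...'."""
    match = re.match(r"\s*failing bit\s*:\s*MD([A-D])(\d)\s*(.*)", line)
    if match is None:
        raise ValueError(f"not a failing-bit line: {line!r}")
    channel, suffix, rest = match.groups()
    tokens = rest.split()
    bits = [int(tok) for tok in tokens if tok.isdigit()]
    return FailingBits(channel, int(suffix), bits, "..." in tokens)


def chip_halves(bits: list[int]) -> set[str]:
    """Map channel-relative data bits to the chip that carries them.

    Assumes bits 0-31 go to one chip and bits 32-63 to the other.
    """
    halves = set()
    for bit in bits:
        if not 0 <= bit <= 63:
            raise ValueError(f"bit {bit} is outside a 64-bit channel")
        halves.add("0-31" if bit < 32 else "32-63")
    return halves


result = parse_failing_bits("failing bit : MDA0 32 33 34 35 36 37 38 39 ...")
print(result.channel, chip_halves(result.bits))
```

Which reference designator sits behind each half still has to be looked up in the schematic; the sketch only narrows the fault down to a channel and a 32-bit half.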

ic.jpg

### Finding a replacement ###

Looking around the interwebs, it appears hard to find a replacement matching the same part number.
After carefully removing the memory heatsink on both the defective memory IC and its counterpart on the other side of the PCB, the chip is revealed to be a K4D26323RA-GC2A.

Those chips were manufactured by Samsung and are highly obsolete by now. It is easier to find chips with the K4D26323RA-GC2B marking, however. I bought mine on ebay from a Chinese seller. As always, those deals have to be taken with a grain of salt. You never know if what you get is what you want.

I checked beforehand and there appear to be at least some GeForce4 Ti 4600 variants with that chip. Samsung never published a datasheet for it nor listed this part number on their page, but seeing it on real cards leads me to believe that the designation is legit.

I was lucky and a couple days later the memory chips arrived: K4D26323RA-GC2B.

### The repair itself ###

The first step, even before replacing anything, was to remove the heatsinks on the memory modules. They are stuck on with adhesive, which is easier to get off when warmed up. So I set my hot air station to 150°C, nothing too hot, and gave the heatsinks some heat for a couple of seconds. After that it was easy to remove the heatsinks with a slight twisting motion and a pair of tweezers.

old-mem.jpg

To remove the memory chip itself I applied some flux around it and used my hot air gun at around 380°C with medium throughput. I taped off the surrounding components with kapton tape to reduce thermal stress on them.

It did not take too long and I had the chip removed, revealing the BGA pads below:

bga-pads-dirty.jpg

After that, I removed all excess solder and flux:

bga-pads-clean.jpg

For the new chip, some small amount of solder was applied to all pads, as well as a fresh amount of flux. Then I aligned the chip carefully.

To re-solder, I used the same temperature but lower air throughput. This ensures I don't "blow away" the chip out of position.

A couple of seconds later the chip moved into position and the solder balls melted onto their respective pads (always an oddly satisfying view).

new-mem.jpg

### Testing ###

The moment of truth - did it work?

YES!

The card is alive and happy. The artifacts are gone and 3D games work just fine again. I quickly ran 3DMark 2001SE and the result appears to be within expected range.
The card was then put through a burn-in test using S.T.A.L.K.E.R. to see if it remains stable, which it did.

result.jpg

dut.jpg

### Last issue, the cooler ###

The cooler appears to be a lost cause. The bearing is broken and very noisy. A fan blade is missing as well and even one of the mounting stands for the fan is broken off.

cooler.jpg

I'll see how to find a replacement. I want to restore the full original look, meaning I'm looking for that Titan cooler. In the meantime, I'll run the card with a cheap 3rd party cooler that is easy to get via aliexpress etc. - but not too excessive.

card-with-cooler.jpg

AGP Power monitor - diagnostic hardware tool
Graphics card repair collection

Reply 1 of 14, by bloodem

Rank Oldbie

Great detective work and excellent repair! Congrats!
I know this card, I saw it on eBay ~ 6 months ago, being sold by CrocoRetro (a French seller). I wanted to buy it, but I was too busy to try and fix it at the time. And now I'm glad that someone like you bought it and gave it another chance at life! 😀

2 x PLCC-68 / 4 x PGA132 / 5 x Skt 3 / 1 x Skt 4 / 9 x Skt 7 / 12 x SS7 / 1 x Skt 8 / 14 x Slot 1 / 6 x Slot A
5 x Skt 370 / 8 x Skt A / 2 x Skt 478 / 2 x Skt 754 / 3 x Skt 939 / 7 x LGA775 / 1 x LGA1155
Current PC: Ryzen 7 9800X3D
Backup: Ryzen 7 5800X3D

Reply 3 of 14, by bloodem

Rank Oldbie

Yeah, I remember when, in my first year of college, a friend of a friend bought a brand new PC that came with a Barton 2500+ and a Radeon 9700 PRO. We were all at his house, watching in awe as 3DMark03 was running on the screen! 😁
I was also poor, so I only had a GeForce 3 Ti 200 at the time, which I used until 2008, when I finally upgraded to a GeForce 8800GT.

Reply 4 of 14, by tehsiggi

Rank Newbie

That sounds awfully familiar: I used my Radeon 9200SE for a long time, until my dad had his Hercules Radeon 9000 as a leftover. Though the RV250 and RV280 are pretty similar, the striking difference of 128bit vs. 64 bit memory interfaces was a thing.
I then moved to a Radeon X700 AGP in 2006, which remained my card until I moved to a GeForce 9600GT in 2008.

The Radeon 9XX0 Series holds a special place in my heart. All of them to be honest. More to come..

Reply 5 of 14, by bloodem

Rank Oldbie
tehsiggi wrote on 2025-04-15, 16:29:

That sounds awfully familiar: I used my Radeon 9200SE for a long time, until my dad had his Hercules Radeon 9000 as a leftover. Though the RV250 and RV280 are pretty similar, the striking difference of 128bit vs. 64 bit memory interfaces was a thing.

Oh, for sure. 64 bit vs 128 bit was night and day!

tehsiggi wrote on 2025-04-15, 16:29:

The Radeon 9XX0 Series holds a special place in my heart. All of them to be honest. More to come..

Looking forward to any other repairs you might do in the future! 😁

Reply 6 of 14, by tehsiggi

Rank Newbie
bloodem wrote on 2025-04-15, 17:00:

Looking forward to any other repairs you might do in the future! 😁

There are a couple of Radeon 9800 Pros waiting, a Radeon 9000 Pro with a reference layout that needs some love, a re-cap of a 9200SE and some other fun projects all around the 9XX0 Series coming up. I've been preparing a lot!

Reply 7 of 14, by fix_metal

Rank Newbie

Super nice repair!!!

Ah, the 9700Pro. I remember it was such an object of desire for me back then. I then got a 9700 for cheap (my teenage wallet was way small), patched it to a 9700pro, and replaced the tiny cooler with a huge 3rd party one with an even bigger fan. It didn't last though.
After a couple of years I threw it away as it died abruptly - literally sudden death. I was so disappointed...

Reply 8 of 14, by tehsiggi

Rank Newbie

Unfortunately the 9700 Pros all appear to have had issues with heat/cooling due to their shim design.

But even 9800s with their R350s appear to be not completely safe from that. I've got one patient here which will probably require a new ASIC donor.
See the bulges on the green parts of the GPU:

r350bulge.jpg

A good one looks like this
r350healthy.jpg

And this happens even though the coolers have a proper offset for the die, to make sure the shim does not collide:

r9800pro-cooler.jpg

Reply 9 of 14, by The Serpent Rider

Rank l33t++

From my observation, most of them die from bad BGA memory.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 10 of 14, by tehsiggi

Rank Newbie

I had a couple of them where it appeared that the ASIC was not too happy anymore. But yes, memory appears to be a widespread issue too. Luckily, it's easier to source the memory chips than an ASIC replacement.

I currently have a 9800Pro lying around which appears to have an issue on MDB bit 60 - I'll see if that's really the issue or a dead end.

One correction to R3MEMID:
The enumeration is MDA, MDB, MDC and MDD apparently, with a 0 suffixed.
I suspect the suffix might be related to the memory rank. The R300 (and R350/R360) only use 1 rank per memory channel, whereas other GPUs like RV350 or RV360 use two ranks. I suspect they'd have MDA0 and MDA1, referencing the two ranks.

I'll know more once I get my hands on a faulty card whose ASIC does not have a 256-bit memory interface.

Furthermore I've noticed during some of my work that R3MEMID apparently requires a GPU with an architecture that is at least Rage6 (R100) - I suspect it might also work with R200 (Rage7) based cards. I'll test with a 9000, 9100 and 9200 tomorrow. Perhaps it'll work and come in handy to save some cards of that architecture as well.

Reply 11 of 14, by stef80

Rank Member

@tehsiggi
I have a Gigabyte Maya II 9700 Pro with a cracked BGA chip (Samsung). The GPU is ok. I was keeping it for spare parts, but since then I got quite a few R300 cards. If you are interested ... free of charge, just shipping.

Reply 12 of 14, by tehsiggi

Rank Newbie
stef80 wrote on 2025-04-16, 19:08:

@tehsiggi
I have a Gigabyte Maya II 9700 Pro with a cracked BGA chip (Samsung). The GPU is ok. I was keeping it for spare parts, but since then I got quite a few R300 cards. If you are interested ... free of charge, just shipping.

I'd be delighted to take a look at this card. Unfortunately, since my account is very fresh, it appears I can not send personal messages yet. But perhaps you can drop me a message with a way to contact you elsewhere?

That'd be great. I'm happy with any R300 that gets a second life.

Reply 13 of 14, by stef80

Rank Member

I've sent you a contact email via private message.

Reply 14 of 14, by tehsiggi

Rank Newbie
tehsiggi wrote on 2025-04-16, 18:42:

Furthermore I've noticed during some of my work that R3MEMID apparently requires a GPU with an architecture that is at least Rage6 (R100) - I suspect it might also work with R200 (Rage7) based cards. I'll test with a 9000, 9100 and 9200 tomorrow. Perhaps it'll work and come in handy to save some cards of that architecture as well.

So I gave that a shot:
When booting with an S3 Trio64, it reports that it needs at least Rage6 as the architecture.
When booting with a Radeon 9200 (RV280), it reports that no R300, R350 or RV350 has been detected.

Testing with the following cards worked:
Radeon 9500 (R300 - 256bit / 128MB)
Radeon 9700Pro (R300 - 256bit / 128MB)
Radeon 9800Pro (R360 - 256bit / 128MB)
Radeon 9550 (RV350 - 128bit / 256MB)
Radeon 9600Pro (RV350 - 128bit / 256MB DDR1)
Radeon 9600Pro (RV360 - 128bit / 256MB DDR2)

I also tested with a Radeon X700 AGP (RV410) and that one does not work. I get the same "Rage6 required" message.

So it looks like R3MEMID contains logic that detects older GPUs like R100 and R200 based ones and then complains that there is no R300, but it doesn't know a thing about anything else. The Rage6 reference implies to me that this program was, in some form, developed from an existing program for those previous generations.
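For anyone who wants to keep track of such results, the observations above can be collected in a small lookup table. This is only a Python sketch of the test results reported in this post, with paraphrased messages and a made-up helper name. It says nothing about how R3MEMID actually performs its detection.

```python
# Observed R3MEMID behaviour per card, as reported in this post.
# The outcome strings are paraphrases, not the tool's exact wording.
OBSERVATIONS = {
    "S3 Trio64": "needs at least Rage6",
    "Radeon 9200 (RV280)": "no R300, R350 or RV350 detected",
    "Radeon 9500 (R300)": "works",
    "Radeon 9700Pro (R300)": "works",
    "Radeon 9800Pro (R360)": "works",
    "Radeon 9550 (RV350)": "works",
    "Radeon 9600Pro (RV350)": "works",
    "Radeon 9600Pro (RV360)": "works",
    "Radeon X700 AGP (RV410)": "needs at least Rage6",
}


def cards_with(outcome: str) -> list[str]:
    """Return the tested cards that showed the given outcome, sorted by name."""
    return sorted(card for card, seen in OBSERVATIONS.items() if seen == outcome)


for outcome in sorted(set(OBSERVATIONS.values())):
    print(f"{outcome}: {', '.join(cards_with(outcome))}")
```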
