VOGONS


First post, by majestyk


I have six 72-pin 128 MB Fast Page Mode SIMMs with parity (IBM-branded, all chips are Samsung).
I use these sticks mainly for testing Intel 430HX mainboards, when I check the total cacheable area and the RAM population must be maxed out.

Recently, instabilities occurred during these tests. I first attributed them to other hardware problems, but finally I tested all six sticks with Memtest and found that two of them cause errors.

fpm128_60.JPG

The contacts are clean, I couldn't find any loose pins, and the errors always occur in the same memory ranges (79 MB, 95 MB).

fpm128_60err.JPG

How can I find the defective memory chips, and is it worth replacing them? (There's also very little space for soldering.)
And how can I tell if a memory or parity chip is the culprit?

Reply 2 of 24, by rasz_pl


Hot air / hot plate. It's so old I would bake it at 100 °C for a while first.
Can you disable parity in the BIOS and test? That would rule out the parity chips.
I guess "bits in error mask" tells us which bits are bad? It's a question of mapping that number to the actual data bus. Desoldering one RAM chip should give you a clue whether the motherboard was routed 1:1.

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 3 of 24, by whitepawn


I had a similar issue with my 4 MB FPM SIMM with parity: it fails the memory test. I removed the parity chips and tested it again, but no luck. Then I searched online for ways to test chips one by one with an Arduino and found that some people check SIMM modules that way.

I will try to code something like this. I think desoldering the chips one by one and testing them individually is the only option. Any ideas are welcome.

Here are some links:

https://github.com/zrafa/30pin-simm-ram-arduino

https://github.com/AlexandreRouma/SIMM

Reply 4 of 24, by weedeewee


Just guessing here. Going on the photo of the memtest error.

Bits in error mask : 00000200
would indicate the third nibble, since the SIMM uses 4-bit-wide chips,
or better said, the errors are on bit D9, which would be pin 51 on the SIMM and would likely trace to, I'm guessing, the third chip from the right, i.e. the one underneath the part of the label that says "removed", or the one on the other side of the board.

Although it could be the one on the other side of the SIMM 😀 or some totally different chip.
So pin 51 on the SIMM should connect to... looking at the datasheet, pin 2 of the third chip.
If it doesn't, check pins 2, 3, 30 & 31 of all the other chips to see which pin is connected to the 51st pad on the SIMM.
There should be two chips connected. Now which one is the one with Alzheimer's will depend on how the address bus is connected.
I'll leave that for someone else to possibly figure out.

All of this is, of course, assuming the mainboard has data bit 9 connected to pad 51 of the SIMM sockets.

Edit: it's also entirely possible that I misinterpreted the error mask and it's actually the tenth bit... sigh. In that case it would be chip pin 3 and SIMM pad 53.
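
As a side note, the mask arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming the error mask is a 32-bit value with bit 0 as the least significant data bit and that the module uses 4-bit-wide chips. decode_error_mask is a made-up helper name, and nothing here says which SIMM pad or chip a bit actually lands on.

```python
def decode_error_mask(mask, chip_width=4):
    """List (bit, nibble) for every set bit in a 32-bit memtest error mask.

    Assumption: bit 0 is the least significant data bit, and each
    chip_width-bit group belongs to one DRAM chip (hypothetical mapping).
    """
    hits = []
    for bit in range(32):
        if (mask >> bit) & 1:
            hits.append((bit, bit // chip_width))
    return hits


for bit, nibble in decode_error_mask(0x00000200):
    # Both indices are zero-based: nibble index 2 is the third nibble.
    print(f"data bit D{bit}, nibble index {nibble}")
```

With zero-based numbering, 00000200 comes out as bit 9 (D9) in the third nibble. Counted from one, that same bit is the tenth bit.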

Right to repair is fundamental. You own it, you're allowed to fix it.
How To Ask Questions The Smart Way
Do not ask Why !
https://www.vogonswiki.com/index.php/Serial_port

Reply 6 of 24, by majestyk


The next release of Memtest is supposed to determine single defective RAM chips - but for DDR5 only...
https://allinfo.space/2022/08/11/memtest86-ne … and-even-chips/

At the moment I'm re-running the tests more carefully, after it turned out that only one stick is faulty. The second one only showed errors when combined with the first.
I'm testing with parity enabled and disabled first.

Situation 1:
Parity enabled in BIOS, faulty stick in socket "SIMM1", another healthy stick in socket "SIMM2".
-> Test freezes after 9 minutes during "TEST #6" with this "unexpected interrupt - halting CPU0" message:

test_no_1.JPG

Situation 2:
Parity enabled in BIOS, faulty stick in socket "SIMM2", another healthy stick in socket "SIMM1".
-> Test stops after 9 minutes during "TEST #5" with errors at 95MB like this:

test_no_2.JPG

Situation 3:
Parity disabled in BIOS, faulty stick in socket "SIMM1", another healthy stick in socket "SIMM2".
-> Test finishes after 49 minutes with errors at 95MB and 222.5MB:

test_no_3.JPG

Situation 4:
Parity disabled in BIOS, faulty stick in socket "SIMM2", another healthy stick in socket "SIMM1".
-> Test finishes after 49 minutes with errors at 95MB and 222.5MB like before:

test_no_4.JPG

Note that with parity disabled, Memtest doesn't freeze and there are no "unexpected interrupt - halting CPU0" messages.

Last edited by majestyk on 2022-09-01, 11:30. Edited 4 times in total.

Reply 7 of 24, by Sphere478


You could use hot air to provoke errors in individual chips.

Record the memory range each chip errors at. When you find one (maybe two) whose range is close to the failing one, that is your chip.

As for removing them: tweezers and an oven.

Sphere's PCB projects.
-
Sphere’s socket 5/7 cpu collection.
-
SUCCESSFUL K6-2+ to K6-3+ Full Cache Enable Mod
-
Tyan S1564S to S1564D single to dual processor conversion (also s1563 and s1562)

Reply 9 of 24, by majestyk

konc wrote on 2022-08-31, 09:48:

I assume it's not tight memory timing on a specific PC causing the errors and you've already thought about it.

No, I did not think about that. Memory timing (there's only one setting in the BIOS) is set to 70 ns; the chips are 60 ns.
Why would the test always fail for one single stick out of six if the issue were PC/mainboard related?

Instabilities occurred with these sticks on several mainboards, as I explained in the first post. That was the reason for examining them further and running the tests in the first place.

Reply 10 of 24, by konc

majestyk wrote on 2022-08-31, 10:11:
konc wrote on 2022-08-31, 09:48:

I assume it's not tight memory timing on a specific PC causing the errors and you've already thought about it.

No, I did not think about that. Memory timing (there's only one setting in the BIOS) is set to 70 ns; the chips are 60 ns.
Why would the test always fail for one single stick out of six if the issue were PC/mainboard related?

Instabilities occurred with these sticks on several mainboards, as I explained in the first post. That was the reason for examining them further and running the tests in the first place.

Well, if you haven't manually tightened the timings or set a memory setting to "super ultra lightning fast" or similar, then of course it's not that.
Maybe running memtest one last time with the fail-safe defaults in the BIOS will completely eliminate this possibility.

But this is exactly what happens when you run memory out of spec / too fast: one or more chips can't handle it and behave as if they were bad, which is hard to distinguish from a genuinely faulty chip, if it can be distinguished at all.

Reply 11 of 24, by rasz_pl

majestyk wrote on 2022-08-31, 06:47:

The next release of Memtest is supposed to determine single defective RAM chips - but for DDR5 only...
https://allinfo.space/2022/08/11/memtest86-ne … and-even-chips/

Yeah, nah. For this to work, PassMark needs to individually certify and curate a list of mappings for supported motherboards (chipset and address/data bus pins) and RAM modules (again, address/data to individual chip). Even the tweet says it's for a narrow set of supported hardware.


Reply 12 of 24, by weedeewee

majestyk wrote on 2022-08-31, 06:47:

The next release of Memtest is supposed to determine single defective RAM chips - but for DDR5 only...
https://allinfo.space/2022/08/11/memtest86-ne … and-even-chips/

At the moment I'm re-running the tests more carefully, after it turned out that only one stick is faulty. The second one only showed errors when combined with the first.
I'm testing with parity enabled and disabled first.

Situation 1:
Parity enabled in BIOS, faulty stick in socket "SIMM1", another healthy stick in socket "SIMM2".
-> Test freezes after 9 minutes during "TEST #6" with this "unexpected interrupt - halting CPU0" message:

test_no_1.JPG

Situation 2:
Parity enabled in BIOS, faulty stick in socket "SIMM2", another healthy stick in socket "SIMM1".
-> Test stops after 9 minutes during "TEST #5" with errors at 95MB like this:

test_no_2.JPG

Situation 3:
Parity disabled in BIOS, faulty stick in socket "SIMM1", another healthy stick in socket "SIMM2".
-> Test finishes after 49 minutes with errors at 95MB and 222.5MB:

test_no_3.JPG

Situation 4:
Parity disabled in BIOS, faulty stick in socket "SIMM2", another healthy stick in socket "SIMM1".
-> Test finishes after 49 minutes with errors at 95MB and 222.5MB like before:

test_no_4.JPG

Note that with parity disabled, Memtest doesn't freeze and there are no "unexpected interrupt - halting CPU0" messages.

These tests make me think you have a problem with SIMM socket 2. Any chance of running the test with good SIMMs in sockets 1, 2 & 3 and the faulty one in socket 4?
Also, your L1 cache speed is suspiciously low, though maybe that's just the software.


Reply 13 of 24, by konc

weedeewee wrote on 2022-09-01, 10:21:

Also your L1 cache speed is suspiciously low, though maybe that's just the software

...but not in the first photo, which is what made me discuss different machines and timings

Reply 14 of 24, by majestyk


This morning I moved the two RAM sticks and the CPU to a different mainboard (Elitegroup P5HX-A), and I'm currently repeating the tests.

"Situation 1":
Parity enabled in BIOS, faulty stick in socket "SIMM1", another healthy stick in socket "SIMM2".

test_no_1_1.JPG

"Situation 2":

Parity enabled in BIOS, faulty stick in socket "SIMM2", another healthy stick in socket "SIMM1".

test_no_2_1.JPG

"Situation 3":
Parity disabled in BIOS, faulty stick in socket "SIMM1", another healthy stick in socket "SIMM2".

test_no_3_1.JPG

And here's the result of "Situation 4":
Parity disabled in BIOS, faulty stick in socket "SIMM2", another healthy stick in socket "SIMM1".

test_no_4_1.JPG

The L1 cache speed seems more realistic now, but it is still too slow. For now I assume this is some Memtest flaw.
Here are the current BIOS settings:

test_RAM_sett.JPG

Reply 15 of 24, by majestyk

weedeewee wrote on 2022-08-30, 18:42:

Just guessing here. Going on the photo of the memtest error.

Bits in error mask : 00000200
would indicate the third nibble, since the SIMM uses 4-bit-wide chips,
or better said, the errors are on bit D9, which would be pin 51 on the SIMM and would likely trace to, I'm guessing, the third chip from the right, i.e. the one underneath the part of the label that says "removed", or the one on the other side of the board.

Although it could be the one on the other side of the SIMM 😀 or some totally different chip.
So pin 51 on the SIMM should connect to... looking at the datasheet, pin 2 of the third chip.
If it doesn't, check pins 2, 3, 30 & 31 of all the other chips to see which pin is connected to the 51st pad on the SIMM.
There should be two chips connected. Now which one is the one with Alzheimer's will depend on how the address bus is connected.
I'll leave that for someone else to possibly figure out.

All of this is, of course, assuming the mainboard has data bit 9 connected to pad 51 of the SIMM sockets.

Edit: it's also entirely possible that I misinterpreted the error mask and it's actually the tenth bit... sigh. In that case it would be chip pin 3 and SIMM pad 53.

I just tested this. Pin 51 of the SIMM socket goes to pin 30 of chips #13 and #14.

fpm128_60a.jpg

Reply 16 of 24, by rasz_pl


Situation 3 and Situation 4 give the same 'error mask 1', which means memtest86 is now useless for anything more than detection. I distinctly remember memtest back in the day showing the expected and actual data for every encountered error, so you could at least see how many bits got flipped.


Reply 17 of 24, by weedeewee

rasz_pl wrote on 2022-09-01, 15:49:

Situation 3 and Situation 4 give the same 'error mask 1', which means memtest86 is now useless for anything more than detection. I distinctly remember memtest back in the day showing the expected and actual data for every encountered error, so you could at least see how many bits got flipped.

Yeah, it is annoying not to have that data anymore.
I thought the error mask represents which bit was actually in error.
Maybe OP can try an older version of memtest, although the error mask changed between the first photo and the later ones while the memtest version remained the same.

The SIMM pad to chip pin connection confuses me a little at the moment.

Good morning!


Reply 18 of 24, by mkarcher

weedeewee wrote on 2022-09-02, 06:19:
rasz_pl wrote on 2022-09-01, 15:49:

Situation 3 and Situation 4 give the same 'error mask 1', which means memtest86 is now useless for anything more than detection. I distinctly remember memtest back in the day showing the expected and actual data for every encountered error, so you could at least see how many bits got flipped.

Yeah, it is annoying not to have that data anymore.
I thought the error mask represents which bit was actually in error.
Maybe OP can try an older version of memtest, although the error mask changed between the first photo and the later ones while the memtest version remained the same.

You can configure the error reporting mode in memtest using the "(c)onfiguration" hot key. I hope the classic error reporting mode is still available as an option. The summary report detailing which test had how many errors is nice in its way, but it doesn't help in locating bad chips. Also, continue testing without parity. The "unexpected interrupt" dumps obscure the actual error summary, and those interrupts are likely caused by the board detecting parity errors. If you can't get memtest86 4.3 to print classic error messages, switch to memtest86+ v4.10, which is known to work on classic computers down to 386 machines and does the classic error reporting.
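
For what it's worth, once the classic report shows the expected and the actual word for an error, working out the flipped bits is just an XOR. A minimal Python sketch, assuming 32-bit words. flipped_bits is a hypothetical helper, and the two values below are invented for illustration, not taken from the screenshots:

```python
def flipped_bits(expected, actual):
    """Return the zero-based positions of the bits that differ in two 32-bit words."""
    diff = (expected ^ actual) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


# Invented example values: an all-ones pattern read back with one bit cleared.
expected = 0xFFFFFFFF
actual = 0xFFFFFDFF
print(f"error mask: {expected ^ actual:08x}")  # error mask: 00000200
print("flipped bits:", flipped_bits(expected, actual))  # flipped bits: [9]
```

If the same bit position keeps showing up across the failing addresses, that would suggest one data line (and the chips hanging on it) rather than a general timing problem, but that is a rule of thumb and not a proof.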

Reply 19 of 24, by majestyk


Thanks for all the ideas and input so far!

I have tested the sticks on three different mainboards in the meantime, always with the same results. So we can safely rule out any mainboard-related issues like SIMM sockets, layout or BIOS bugs, etc.
I will continue testing tomorrow without parity and will try to select the classic reporting.