VOGONS


First post, by 80386SX

User metadata
Rank Newbie

Dear All,

I'm posting here because I'm really stuck and need your help (as usual ;p).
I have an issue with this motherboard: CDTEK MINI-AT286 G2 MAIN BOARD

At random, I get this message on screen:

ON BOARD PARITY ERROR
ADDR (HEX) = (XXXX:XXXX)
SYSTEM HALTED

Initially, this mainboard had 1MB of on-board DRAM.

So, to find where the problem is, I did the following:

- Visual inspection, and one tantalum capacitor replaced -> OK
- RAM chips tested one by one with a dedicated external tester -> test OK
- Voltage check -> OK
- Continuity check -> OK
- Power supply used for testing replaced by another one (also good quality)
- Checked each RAM pin with a logic probe -> test OK
- CPU frequency checked under SI and NSSI (and compared with benchmarks) -> OK
- Disabled "turbo" mode
- Booted from multiple DOS floppies with basic tools or games -> KO, the issue still persists
- Replaced the DRAM chips with SIPP RAM (NEC) -> works, but the problem still persists with this new RAM
- Replaced the DRAM chips with SIPP RAM (Samsung) -> works, but the problem still persists with this new RAM
- Tested multiple RAM mode selections, 512KB to 4MB -> works, but the problem still persists
- Replaced the MFM controller card and HDD with an IDE card and flash card -> works, but the problem still persists
- Did multiple HDD low-level formats, with power-off/power-on cycles to rule out a potential virus in RAM, and reinstalled DOS (from write-protected floppies made from verified images)
- Booted multiple DOS versions -> the problem still persists
- RAM chips tested under Checkit! 3.0 -> most of the time the check is OK, but sometimes a random error is detected without the BIOS halting the system (seen in the report, or as a computer freeze)
- Replaced the keyboard to rule out any ghost/stuck key presses (momentary system bus saturation or something like that)
- Replaced the VGA card

This is a heavily condensed summary; I've been at it for 3 full days ;p

The problem even happened once or twice during the AMI BIOS diagnostics (a small on-board AMI test tool that lets you run some rudimentary tests).
Now I'm starting to run out of ideas...
I'm beginning to despair; this board is in good visual condition and I would love to save it...

If anyone here has an idea, many thanks in advance ;D


Reply 1 of 10, by rasz_pl

User metadata
Rank l33t

Try a hair dryer & cold spray; maybe it's temperature related: either a marginal chip between RAM and CPU, or a partially cracked trace going open once everything warms up.

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 2 of 10, by Horun

User metadata
Rank l33t++

Does it work OK if you remove jumper JP3? That is the parity check enable/disable.

Hate posting a reply and then have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. Stuff: https://archive.org/details/@horun

Reply 3 of 10, by 80386SX

User metadata
Rank Newbie
rasz_pl wrote on 2023-06-13, 21:27:

Try a hair dryer & cold spray; maybe it's temperature related: either a marginal chip between RAM and CPU, or a partially cracked trace going open once everything warms up.

How should I proceed?
I'd see this as a last resort; I'm afraid of damaging another "good" component as a side effect. We'll see.

Last edited by 80386SX on 2023-06-14, 05:11. Edited 1 time in total.

Reply 4 of 10, by 80386SX

User metadata
Rank Newbie
Horun wrote on 2023-06-14, 00:39:

Does it work OK if you remove jumper JP3? That is the parity check enable/disable.

Oh, good point. It wasn't listed, but I already tested that -> I still get a system halt, without the full-screen message, and a glitchy display with something like a ghost system prompt at the bottom left, with a single system beep, and everything is frozen.

If there is anything else to try..

Reply 5 of 10, by rasz_pl

User metadata
Rank l33t
80386SX wrote on 2023-06-14, 05:06:
rasz_pl wrote on 2023-06-13, 21:27:

Try a hair dryer & cold spray; maybe it's temperature related: either a marginal chip between RAM and CPU, or a partially cracked trace going open once everything warms up.

How should I proceed?
I'd see this as a last resort; I'm afraid of damaging another "good" component as a side effect. We'll see.

A hair dryer is less than 100C, and cold spray (for example compressed air for cleaning keyboards) isn't that cold either. If anything dies at those temps, it was already damaged and on its way out.
You run something intensive (Wolfenstein 3D?) and selectively heat up components on the board looking for a crash, then same deal with cooling.
But before that I would just resolder all pins on the Headland chips and CPU. Sadly there are no datasheets for those chips, so you can't do more detailed debugging without reverse engineering.

80386SX wrote on 2023-06-14, 05:10:
Horun wrote on 2023-06-14, 00:39:

Does it work OK if you remove jumper JP3? That is the parity check enable/disable.

Oh, good point. It wasn't listed, but I already tested that -> I still get a system halt, without the full-screen message, and a glitchy display with something like a ghost system prompt at the bottom left, with a single system beep, and everything is frozen.

If there is anything else to try..

So it's not a parity problem but memory/data corruption.


Reply 6 of 10, by 80386SX

User metadata
Rank Newbie

Many, many thanks for your advice. I spent my evening doing this, plus resoldering all the Headland chips and logic probe testing, but without any result 🙁
I'm afraid that one of the RAM controllers (Headland HT102/B1A4000, I guess) simply has an internal fault...

I also did a new check: I made an EEPROM dump of the Low and High BIOS to compare with the same dumps available on the web, to check whether they were corrupted.. but the SHA1 hashes are the same:
- L : CD5629E839B531D2DB01A297DEAE65D9E34B09A7
- H : 5CF6BE081FC23328B0C14D1F1744350CEB99AC9A
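
For anyone wanting to repeat this dump check, here is a minimal sketch in Python. It hashes two ROM dump files and compares them against the SHA-1 values quoted above. The file names `bios_low.bin` and `bios_high.bin` and the helper names are placeholders for illustration, not something from this thread.

```python
import hashlib
from pathlib import Path

# SHA-1 digests quoted in the post above (Low and High BIOS dumps).
EXPECTED = {
    "L": "CD5629E839B531D2DB01A297DEAE65D9E34B09A7",
    "H": "5CF6BE081FC23328B0C14D1F1744350CEB99AC9A",
}

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as uppercase hex."""
    return hashlib.sha1(data).hexdigest().upper()

def dump_matches(data: bytes, expected_hex: str) -> bool:
    """True if the dump's SHA-1 equals the expected digest (case-insensitive)."""
    return sha1_hex(data) == expected_hex.strip().upper()

def check_files(paths: dict) -> dict:
    """Map each label ('L', 'H') to True/False for the given dump files."""
    return {label: dump_matches(Path(p).read_bytes(), EXPECTED[label])
            for label, p in paths.items()}

if __name__ == "__main__":
    # Hypothetical file names; replace with your own dumps.
    files = {"L": "bios_low.bin", "H": "bios_high.bin"}
    existing = {k: v for k, v in files.items() if Path(v).exists()}
    for label, ok in check_files(existing).items():
        print(label, "OK" if ok else "MISMATCH")
```

A matching hash only shows that the ROM contents are intact; it says nothing about the RAM or chipset side.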

This motherboard is driving me crazy, I just want to cry 🤣.. or maybe an internal copper trace of the PCB has cracked.. I don't know what to think anymore..

On the reproduction side, I just noticed that swapdos systematically triggers the RAM crash (probably because it reserves and loads RAM all at once; I could make you a video of this).


Reply 7 of 10, by Deunan

User metadata
Rank Oldbie

FYI the way parity error is usually handled on these mobos is via NMI. Could be the interrupt is somehow triggered randomly but not because of actual parity error, and that would be why the address seems to be different every time.
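
To illustrate the idea (a sketch only, not the actual Headland chipset logic): each byte in RAM is stored with a ninth parity bit, and on every read the hardware recomputes it; a mismatch is what raises the NMI. The Python below models that check, assuming odd parity, which is what PC/AT-class boards commonly use. All function names are made up for illustration.

```python
def odd_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1 bits (data + parity) odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1

def write_byte(byte: int) -> tuple:
    """Model a RAM write: store the 8 data bits plus the generated parity bit."""
    return (byte & 0xFF, odd_parity_bit(byte))

def read_raises_nmi(stored: tuple) -> bool:
    """Model a RAM read: True if the stored parity no longer matches the data."""
    data, parity = stored
    return odd_parity_bit(data) != parity

good = write_byte(0x5A)
flipped = (good[0] ^ 0x01, good[1])   # a single data bit flipped in storage
print(read_raises_nmi(good))      # False
print(read_raises_nmi(flipped))   # True
```

A spurious NMI, as suggested here, would correspond to the NMI line being asserted even though this check passes.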

NMI on PCs is actually maskable, via an external circuit. The way it's done is by accessing the CMOS NVRAM address register: bit 7 is the mask and not part of the address (so there can be only 128 bytes stored in CMOS, less because of RTC registers taking some space, but there are RTC chips with banked NVRAM). If you can, try writing a small COM program that will disable NMI after boot, and see if it cures your system (NMI is not usually needed). If you don't have the tools I might have some time during next weekend.

EDIT: Here's the program, didn't have time to test it though. Run it "NMI 0" to enable NMI or "NMI 1" to disable it. Try without any memory managers first, even HIMEM, since going through CPU reset or BIOS code might change NMI state.
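
For reference, a small sketch of the register layout described above. It is Python, so it only computes the byte a DOS program would write to the CMOS/RTC address port (0x70 on AT-class machines); it does not touch hardware, and it is not the source of the attached NMI.COM. The function names are hypothetical.

```python
CMOS_ADDRESS_PORT = 0x70   # CMOS/RTC index port on AT-class PCs
NMI_MASK_BIT = 0x80        # bit 7: 1 = NMI masked (disabled), 0 = NMI enabled

def cmos_address_byte(index: int, nmi_disabled: bool) -> int:
    """Byte to write to port 0x70: 7-bit CMOS index plus the NMI mask in bit 7."""
    if not 0 <= index <= 0x7F:
        raise ValueError("CMOS index must fit in 7 bits (0..127)")
    return (NMI_MASK_BIT if nmi_disabled else 0) | index

def split_address_byte(value: int) -> tuple:
    """Decode a port 0x70 value into (cmos_index, nmi_disabled)."""
    return (value & 0x7F, bool(value & NMI_MASK_BIT))

print(hex(cmos_address_byte(0x0D, True)))    # 0x8d
print(hex(cmos_address_byte(0x0D, False)))   # 0xd
```

On real hardware the equivalent is a single OUT to port 0x70, commonly followed by a dummy read of port 0x71, which is small enough to fit in a tiny COM file.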

Attachments

  • NMI.zip (229 Bytes)

Reply 8 of 10, by 80386SX

User metadata
Rank Newbie
Deunan wrote on 2023-06-14, 23:17:

FYI the way parity error is usually handled on these mobos is via NMI. Could be the interrupt is somehow triggered randomly but not because of actual parity error, and that would be why the address seems to be different every time.

NMI on PCs is actually maskable, via an external circuit. The way it's done is by accessing the CMOS NVRAM address register: bit 7 is the mask and not part of the address (so there can be only 128 bytes stored in CMOS, less because of RTC registers taking some space, but there are RTC chips with banked NVRAM). If you can, try writing a small COM program that will disable NMI after boot, and see if it cures your system (NMI is not usually needed). If you don't have the tools I might have some time during next weekend.

EDIT: Here's the program, didn't have time to test it though. Run it "NMI 0" to enable NMI or "NMI 1" to disable it. Try without any memory managers first, even HIMEM, since going through CPU reset or BIOS code might change NMI state.

Oh man, that's so kind of you.
It really is about the NMI: with your COM program, the machine is much more stable!
You've revived me a bit too 😀
I've attached a demo of your program (zip) to show you the results.

In the end I had to go back to the 1MB of RAM in chips, because 2MB or 4MB was poorly supported: I still had some freezes.
I'm pretty sure there is a problem with one of the memory controller chips.
On this board there are two chips dedicated to this purpose, and when I run a RAM scan under SI9, one "half" of the RAM intermittently shows problems.

But going from there to replacing a chip, or rebuilding the motherboard, is something I don't have the courage for at the moment, especially since I don't have the professional equipment for it at all. ;D

So for now, and so that I don't get caught out in the future if I forget to load your COM file after a reinstall, I made a "hard patch" 🤣: I disabled the NMI pin of the CPU (see photo below).
It's not super clean, but I guess it's the most elegant fix for now, in the sense that it's either that or this board goes into the scrap box in my garage.

So there it is: with the 1MB on board, I ran benchmarks in a loop for several hours and zero worries, the machine is stable! Zero crashes and zero freezes! 😀
Which was not the case before, whatever the amount or the mode of memory used (512KB to 4MB, by SIPP or on board).

All of this is for a restoration project of a complete 286 PC that I found outside, on the ground at the edge of a forest...
I'll post some before/after pics 😀
Old computer restoration is a real passion, let's not be ashamed of it 🤣

Maaaaaany, many thanks again to ALL of you for your help!! <3

Cheers!


Last edited by 80386SX on 2023-06-19, 11:02. Edited 2 times in total.

Reply 9 of 10, by 80386SX

User metadata
Rank Newbie

Some pictures of the restoration.
I only have two from the beginning; this PC was rusty all over and in bad shape.
For example, on the top of the can here it's gray and you can't see it because it dried, but there was some green forest moss. And inside, various bugs (spiders, etc).

Luckily it was rescued 😀

Cheers.
