VOGONS


First post, by scruit

Working on "Uncle Sherman", my Mitsubishi MP-3200 386SX-16.

The current issue, which has resurfaced, is that it reliably crashes when certain software tries to put it into protected mode.

CheckIT V1: Crashes on "(2) Test EXTENDED (Protected mode) Memory only". Specifically, it crashes at address 0x100000, which is the first byte above 1024 KB, IIRC. It shows the testing page, then reboots immediately.

CheckIT V1: Crashes on CPU test immediately after displaying "Testing Protected mode" (Crashes even if all extended memory is removed, leaving only base memory)

CheckIt V3: Seems happy, reports protected mode test is successful. Tests all extended memory (with occasional "parity errors detected" note)

NSSI 0.60.45: Crashes on startup, right after telling me that April 2022 is an invalid date. Sequence of messages in the bottom left is:

- "Please wait while determining system contents..."

- "Determining computer type..."

- "Determining presence of PCI bus..."

- "Determining processor type. checking for Cyrix..."

- "Determining processor speed..."

- "Resetting CPU..."

(Crashes here with every character of the 80x25 text mode screen displaying a "mouse cursor", begins to POST, then crashes again with a blank screen and boots normally)

NSSI 0.60.45: I can start it in safe mode ("nssi /safe") and it won't immediately crash, but it crashes on certain CPU/memory tests (I will have to go to the workshop and refer to my notebook, bear with me on this)

Adding HIMEM.SYS to CONFIG.SYS in DOS 6.2 causes the PC to halt with a "MAIN RAM FAIL" message. DOS 6.2 without HIMEM is fine, and I have played Prince of Persia and Indy 500 just fine on this PC.

All the memory chips above base memory are socketed, and I removed them all and tested them in a dramarduino. They all came back fine. I cleaned them with DeoxIT and reseated them, and everything lines up correctly. Having said that, it crashes in CheckIT 1 with all memory (above base memory) removed, so that suggests maybe it is not an individual RAM chip issue (or maybe it also crashes if it finds no extended memory..?)

Questions:

I've done some reading on protected mode and I know what it's for (running more than one program at a time) and that reaching into memory owned by another app will cause the CPU to reset the system. What I don't currently understand is how to break down and test the components of that.

- Is the 32kb cache used by, and only by, protected mode?

- Is there a DOS program that will let me test the 32kb cache?

- Is bad cache RAM a possible cause of crashing on entering protected mode? I'm trying to figure out if it is worth desoldering and testing the cache RAM chips.

Testing extended memory in checkit3 often gives me a parity error note, but still lists the test as passed. My next approach will be to get collections of chips and install only 512K of extended memory and see if I can pass multiple memory tests without parity errors on any chips at all. If not, there is a more fundamental issue versus individual bad RAM chips. If I test a memory range that is not populated with memory I get a hard fault/failure from the test, so I know the test is doing SOMETHING. Maybe going into protected mode is seeing the same thing that is causing checkit3 to detect parity errors in the memory, and the CPU crashes when the process tries to go outside of its protected memory?

Or maybe checkit3 demands that I have himem.sys running? And not running it is the cause of the errors going into protected mode.

Reply 1 of 10, by Horun

How much total ram do you have ?
If Himem.sys gives an error you could have bad ram OR you need to use the /machine:xx option with it. http://www.manmrk.net/tutorials/DOS/help/himem.sys.htm
Neither of those games use any extended memory so do not need Himem.sys to run.
You could try Cachechk v7 (https://www.sac.sk/files.php?d=13&p=4) to see if the cache is working....use Memtest86 to test the ram, you must use a version below 4.2 for 386 cpu IIRC.

Hate posting a reply and then have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. Stuff: https://archive.org/details/@horun

Reply 2 of 10, by scruit

Horun wrote on 2022-05-01, 15:46:

How much total ram do you have ?
If Himem.sys gives an error you could have bad ram OR you need to use the /machine:xx option with it. http://www.manmrk.net/tutorials/DOS/help/himem.sys.htm
Neither of those games use any extended memory so do not need Himem.sys to run.
You could try Cachechk v7 (https://www.sac.sk/files.php?d=13&p=4) to see if the cache is working....use Memtest86 to test the ram, you must use a version below 4.2 for 386 cpu IIRC.

Thank you for your response.

The total system RAM is... unclear. I believe it is 3.5Mb total, spread across 640K base, the range up to 1024K reserved by the system, then another 2.5Mb. Some of that memory is on a daughterboard, and some is on an "Intel AboveBoard PS AT". There is 2Mb on the daughterboard and 1.5Mb on the AboveBoard.

I will take a look at the /machine option in himem.sys.

Have not heard of Cachechk - will check it out.

I had tried to use Memtest86 but got a message that it was not compatible. I wasn't aware of the version limitation, and I will seek out an older version.
Thank you!

More detail on my memory math...
- The memory is in what I am calling "rows" of 9x m5m4257-12 DRAM chips. These are 256Kbit chips, 120ns. Each row is 8 data bits and a parity chip. So, I interpret that as each row being 256Kbytes.
- The daughterboard has 4 "rows" soldered directly to the board, and 4 rows that are socketed/populated with identical m5m4257-12 chips. Total 2Mb, of which 1Mb is base/reserved and 1Mb is extended.
- There is an additional 16bit ISA "AboveBoard" memory card with 6 rows (1.5mb), socketed and populated with an eclectic mix of m5m4257-compatible memory chips across about 4 different manufacturers and speeds of 120-150ns. The AboveBoard says "150ns or faster is ok". I will be removing this board for the next set of tests to help reduce variables.

- Originally the memory test in CheckIT 3 (the one that doesn't crash accessing upper memory) would show a hard error on every other address starting at 0x300000 (3Mb). This, I believe, was because the AboveBoard was configured via dip switches to "start" its memory at 1.5Mb. So the last 512K of the DB and the first 512K of the AB were essentially overlapping. I changed this to start at 2Mb and go to 3.5Mb, and the memory test now succeeds from 0x300000 to 0x37FFFF and fails starting from 0x380000 (because there's no memory there!)
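
The memory math above can be checked with a few lines of arithmetic. This is only a sketch in Python (not something run on the 386 itself), and it assumes the daughterboard decodes linearly from address 0, which the thread does not confirm. The names `card_range` and `overlap_bytes` are made up for illustration.

```python
KB = 1024
MB = 1024 * KB

# One "row" = 8 data chips + 1 parity chip, each 256 Kbit x 1.
# Only the 8 data chips count toward capacity: 256K addresses x 8 bits = 256 KB.
ROW_BYTES = 256 * KB

daughterboard = (4 + 4) * ROW_BYTES   # 4 soldered rows + 4 socketed rows
aboveboard = 6 * ROW_BYTES            # 6 socketed rows


def card_range(start, size):
    """Return the inclusive (first, last) byte address of a card mapped at `start`."""
    return start, start + size - 1


def overlap_bytes(a, b):
    """Number of addresses shared by two inclusive (first, last) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Assumption: the daughterboard occupies addresses 0 .. 2 MB - 1.
db = card_range(0, daughterboard)
ab_old = card_range(3 * MB // 2, aboveboard)   # DIP switches set to start at 1.5 MB
ab_new = card_range(2 * MB, aboveboard)        # DIP switches set to start at 2 MB

print(f"daughterboard: {daughterboard // KB} KB, AboveBoard: {aboveboard // KB} KB")
for name, ab in (("start at 1.5 MB", ab_old), ("start at 2 MB", ab_new)):
    print(f"{name}: {ab[0]:#08x}-{ab[1]:#08x}, "
          f"overlap with daughterboard {overlap_bytes(db, ab) // KB} KB, "
          f"first unpopulated address {ab[1] + 1:#08x}")
```

Under that assumption, the 1.5 MB start overlaps the daughterboard by 512 KB and leaves nothing at 0x300000, while the 2 MB start has no overlap and runs out at 0x380000, which lines up with the before and after test results described above.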

(Side note: these m5m4257 chips are pin-compatible with 4256 chips from the newer Commodore machines, so the DramArduino was able to test them. It's just a single pass and a very simple test that doesn't really put the chip through its paces, but at least none of these chips are DOA. I fear, though, that one or more may be 'marginal'. It passes the "What is your name" and "what is your favorite color" tests, but doesn't ask the tough question.)

Reply 3 of 10, by Deunan

Cache is always used, not just in protected mode, but the access patterns are different and that might be enough to expose a problem with marginal SRAM chips. It has to do with the registers being, by default, only 16-bit wide in DOS and 32-bit in protected mode. I too found that 16-bit/DOS code tends to be more resistant to timing issues on RAM and cache. Does your BIOS offer any cache settings at all? It might not; at 16MHz anything 25ns or better would work fine.

I would assume it's cache issues since this is what usually causes problems like these, but perhaps it's some quirk of your BIOS. Chances are programs are dying because of a bug in int 15h code, and most are probably using the same DOS extender or code sequence that triggers it. If you could find a way to load HIMEM properly then most well-behaved extenders will not use int 15h anymore but go via the XMS manager interface to allocate memory over 1MB. So if that worked, it would be a confirmation it's probably a BIOS issue.

Is that an Intel 386SX? I'm not aware of any CPU bugs that would break PM on the SX series but who knows. Could be a degraded chip too. But I'd place my money on cache and RAM first, then broken BIOS, and the CPU is last thing to suspect.

Reply 4 of 10, by Horun

Since the important memory is on the 16bit Mitsubishi daughter card, you need to test with just that and no Intel above board.
The 3200 came with 1Mb upgradeable to 2Mb which yours has.
Also make sure : External memory select 2MB - SW2/switch 4 - On
And: MEMORY MODE CONFIGURATION
Mode - Size - SW1/switch 3 - SW1/switch 5
C - 640KB (base) - Off - Off
If you still cannot load Himem.sys try turning off the cache:
Cache enabled - SW2/switch 1 - On
Cache disabled - SW2/switch 1 - Off

Reply 5 of 10, by scruit

Horun wrote on 2022-05-03, 22:09:

Since the important memory is on the 16bit Mitsubishi daughter card, you need to test with just that and no Intel above board.
The 3200 came with 1Mb upgradeable to 2Mb which yours has.
Also make sure : External memory select 2MB - SW2/switch 4 - On
And: MEMORY MODE CONFIGURATION
Mode - Size - SW1/switch 3 - SW1/switch 5
C - 640KB (base) - Off - Off
If you still cannot load Himem.sys try turning off the cache:
Cache enabled - SW2/switch 1 - On
Cache disabled - SW2/switch 1 - Off

Thank you for your response.

I have been testing without the AboveBoard installed right now, fewer variables.

Current positions of the switches on the front of the case (unchanged since I obtained the PC, which is itself unchanged except for removing the AboveBoard and the AccuLogic sIDE3+ card)

- External Memory Select - SW2.4 = On.
- Base Memory - SW1.3 and SW1.5 = Off/Off.

I'll try turning the cache off and loading himem. I have been trying to think of a way of testing the cache ram chips without desoldering them and running them through the chip tester... I'm a little embarrassed I didn't think of just turning the cache off. 😀 I have documented those switch purposes and positions too, so I know that setting is there...

Here's something weird I noted while testing... If I only install 512K of the extended memory (columns 5 and 6 on the DB) then I get explicit failures while testing the extended memory space in CheckIt3. These errors appear at every 4th byte for the first 64 bytes of every 64K block in extended memory.

0x100000
0x100004
0x100008
..etc
0x10003c
0x200000
0x200004
...etc
0x300000
0x300004

...and the "base memory only" test passes with no issues.

If I populate that 512K in columns 7 and 8 instead, skipping 5 & 6, then I get an explicit failure listing with a similar pattern, offset by two bytes:

0x100002
0x100006
0x10000a
..
0x10003e
0x200002
0x200006
..
0x300002
0x300006
..etc

...and the "base memory only" test passes with no issues.
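
To compare the two failure listings side by side, the addresses can be grouped by their offset within a 4-byte group. This is a rough Python sketch for offline analysis. It uses only the addresses actually listed above (the "..etc" gaps are not filled in), and `summarize` is a made-up helper name.

```python
from collections import Counter

# Failing addresses exactly as listed in the post; elided entries are left out.
fail_cols_5_6 = [0x100000, 0x100004, 0x100008, 0x10003C,
                 0x200000, 0x200004, 0x300000, 0x300004]
fail_cols_7_8 = [0x100002, 0x100006, 0x10000A, 0x10003E,
                 0x200002, 0x200006, 0x300002, 0x300006]


def summarize(addresses):
    """Count failures by (address mod 4) and list the 64K block bases they fall in."""
    lanes = Counter(a % 4 for a in addresses)
    bases = sorted({a & ~0xFFFF for a in addresses})
    return dict(lanes), [hex(b) for b in bases]


print(summarize(fail_cols_5_6))
print(summarize(fail_cols_7_8))
```

Every listed failure lands on offset 0 (columns 5 and 6 populated) or offset 2 (columns 7 and 8 populated) within a 4-byte group. Note that the block bases in the listings are 1 MB apart (0x100000, 0x200000, 0x300000); whether the 64K blocks in between also fail is not visible from the addresses quoted here.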

If I populate all 4 columns of extended memory, 2Mb total, then the "base memory only" test still "passes" but I get a "parity issues detected, run a full test" note. (Those are full tests, not quick tests.) I apparently didn't log the result of the 1Mb extended memory test, so I am retesting that now.

If I change the order of chips within the columns then the error locations don't change. Makes me think that any problem would be at the board level (socket/trace/74 supporting logic) rather than individual chips. - OR maybe 0x100000 means the first byte, stored in column 5, and 0x100002 is stored in column 7. (Column 5=00, 6=01, 7=02, 8=03) So when the chips are populated in 5 and 6, the memory addresses for the beginning of each 2-byte word are 00/04/08 on column 5, and 02/06/0a in column 7. Hmmm...

My next thoughts on testing come from recognizing that the chip layout is two columns of 9 chips (8 bits and 1 parity each), which collectively make up a 16-bit memory word, in minimum blocks of 512Kb... So, if any chip fails then it could cause all memory addresses to read bad, correct? Because all chips contribute to all memory locations. (If the chip in location C5 fails and gives out all 0's, then any 16-bit word that stored a 1 in the bit held by C5 would report as failed.)
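
The "one dead chip breaks every word that needs a 1 in that bit" idea can be sketched as a toy model. This is a Python illustration only: the stuck bit position is hypothetical, parity is ignored, and nothing here reflects how CheckIt actually tests memory.

```python
import random

STUCK_BIT = 5  # hypothetical: data bit 5 of the 16-bit word always reads 0


def read_back(written, stuck_bit=STUCK_BIT):
    """Model a 16-bit read where one data chip always outputs 0."""
    return written & ~(1 << stuck_bit) & 0xFFFF


def fails(written, stuck_bit=STUCK_BIT):
    """A word miscompares only if the test pattern had a 1 in the stuck bit."""
    return read_back(written, stuck_bit) != written


random.seed(0)
patterns = [random.getrandbits(16) for _ in range(10_000)]
bad = sum(fails(p) for p in patterns)
print(f"{bad} of {len(patterns)} random words miscompare")

# All-zeros survives, all-ones does not; only the stuck bit matters.
print(fails(0x0000), fails(0xFFFF), fails(0x0020), fails(0xFFDF))
```

In this simple model the failure does not depend on the address at all, so a fully dead data chip would be expected to show errors across the whole bank rather than only at every 4th byte of the first 64 bytes. That is at least consistent with suspecting something at the board level, though a marginal (rather than dead) chip could behave differently.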

Reply 6 of 10, by Horun

Yes, it could be one memory chip that is marginal, or perhaps an upper address line is bad. Can you post a good picture of the daughter board? I have never seen one from a 3200.
Is there also a BIOS setting you need to set ? I have an older board that I have to set jumpers for memory size AND go into the BIOS and tell it also that I have X amount of ram and how to use it.

Reply 7 of 10, by scruit

Stand by, I have captured a ton of pics and I'll post them up.

Also, I went to test the cache disable and the PC didn't fire up. I had noticed sometimes it takes a few seconds after hitting the power switch for it to actually power up. A couple nights ago I turned it on and it took 30 seconds to react.

I opened up the PSU and found that 1/3 of the capacitors are bulging. So I have a side-quest to replace those before I continue testing the PC. New caps on order.

Reply 9 of 10, by Horun

Thanks! Interesting. Wish I had something to add to your issue.

Reply 10 of 10, by scruit

I spent several hours yesterday mapping out the connections on the daughterboard so I can understand how memory is accessed. I got a few hints and I'm working on analyzing the other stuff.

Locations are coordinates that follow the labels columns 1-13 horizontally and rows A-J (skipping I) vertically.

- The bottom row (J) of the dram chips (columns 1-8) is the parity.
- There are four 74F280s (9-bit parity chips).
--- The parity chip at H10 handles columns 1 and 5 (each parity bit input connects to the data in and data out of a different row, between A-H)
--- (Note that this does double-duty on the first column of base memory AND first column of extended memory)
--- The parity chip at J10 handles columns 2 and 6
--- The parity chip at J09 handles columns 3 and 7
--- The parity chip at H09 handles columns 4 and 8
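
For reference, here is a small Python model of what a 9-input parity generator/checker like the 74F280 computes for one byte plus its parity bit. It assumes the board stores odd parity (the usual PC convention, but not confirmed for this daughterboard), and it does not model how one chip is shared between a base column and an extended column. The function names are made up for illustration.

```python
def parity_280(inputs):
    """Model the two outputs of a 9-input parity generator/checker.

    Returns (even, odd): `even` is 1 when an even number of inputs are high,
    `odd` is 1 when an odd number of inputs are high.
    """
    if len(inputs) != 9:
        raise ValueError("expected 9 input bits")
    ones = sum(1 for b in inputs if b)
    odd = ones & 1
    return 1 - odd, odd


def byte_bits(value):
    """Split a byte into its 8 bits, least significant first."""
    return [(value >> i) & 1 for i in range(8)]


def make_parity_bit(value):
    """Parity bit to store so the 9 stored bits contain an odd number of ones (assumed)."""
    _, odd = parity_280(byte_bits(value) + [0])
    return 1 - odd


def check(value, stored_parity):
    """True if the byte plus its stored parity bit still has odd parity."""
    _, odd = parity_280(byte_bits(value) + [stored_parity])
    return odd == 1


p = make_parity_bit(0xA5)
print(check(0xA5, p))          # byte read back intact
print(check(0xA5 ^ 0x10, p))   # one flipped data bit is caught
print(check(0xA5 ^ 0x18, p))   # two flipped bits cancel out and go unnoticed
```

The last line is the reason a parity check can pass while the data is still wrong: it only catches an odd number of flipped bits per byte.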

I had noted earlier that base memory tests fine until I add extended memory. When I add half of the extended memory I get hard errors for every 4th byte of the first 64 bytes of each 64k block of extended memory, and if I populate all 1Mb extended then I suddenly start getting parity errors in base memory.

This map I am making is already giving me some clues. Now I know the parity chips span base/extended memory I can see how an upper memory problem would cause a base memory parity error.

I think I'm zeroing in on it. The analysis continues...