SY-5EH revisions v1.2 vs v1.3

Reply 20 of 23, by Chkcpu

Posted on 2025-07-18, 20:07

Rank: Oldbie
Posts: 577
Joined: 2021-05-13, 18:42
Location: The Netherlands

mkarcher wrote on 2025-07-18, 06:13:

Chkcpu wrote on 2025-07-16, 10:06:

If you still want to experiment with this, I can help with writing a “L2WB” Macro for CTCHIPZ. This will take some time though as I need to experiment to find what works, without crashing the system. 😉

The CTCHIPZ version I used some time ago had a bug in its "load memory block" routine, affecting large cache sizes. The standard method to initialize an "always valid" cache is to read a memory block of twice the L2 size (+L1 size, if you don't disable L1 during that time), and CTCHIPZ has a macro-callable function to run REP LODSD (or something like that) for this purpose. In case the load amount doesn't fit into conventional memory (it doesn't for 1MB or 2MB blocks), it uses a different method that loads extended memory, but there is an arithmetic error: CTCHIPZ will load 1KB or 2KB instead of 1MB or 2MB in that case. Somewhere I likely still have a patched CTCHIPZ around that fixes this bug. I'm gonna hunt for it if you are interested.

Hi mkarcher,

Yes, this CTCHIPZ KB/MB bug would certainly derail the L2 cache WB attempts with 512KB or more cache. I now recall you mentioned this before, so thanks for reminding me about this bug.

Instead of the usual V3.4, I’m using Andreas Stiller’s CTCHIPZ.EXE V3.7 which has the infamous “Runtime error 200” bug on faster CPUs fixed. But I don’t know if this version fixes the KB/MB bug as well, so if you can locate the patched CTCHIPZ version, I would be very interested in a copy.

Thanks, Jan

CPU Identification utility
The Unofficial K6-2+ / K6-III+ page

Reply 21 of 23, by mkarcher

Posted on 2025-07-26, 14:34

Rank: l33t
Posts: 3395
Joined: 2019-01-19, 16:29
Location: Germany

Chkcpu wrote on 2025-07-18, 20:07:

Yes, this CTCHIPZ KB/MB bug would certainly derail the L2 cache WB attempts with 512KB or more cache. I now recall you mentioned this before, so thanks for reminding me about this bug.

Instead of the usual V3.4, I’m using Andreas Stiller’s CTCHIPZ.EXE V3.7 which has the infamous “Runtime error 200” bug on faster CPUs fixed.

While I didn't immediately find the patched version, I still found the disassembly. Relevant locations in V3.4:

If a cache size above 256KB is configured, 0000:17CD is called (I give segment numbers as Turbo Pascal would, i.e. 0000 is the first segment in the EXE file. If you happen to use IDA and load the EXE file with default settings, add 1000 to the segment). That function checks whether the processor is in protected mode (i.e. V86 submode of protected mode), and errors out in that case, otherwise it prepares a flat real mode with 4GB segment limits. Furthermore, it sets a flag that flat mode has been entered at 085A:009A. Also, this function receives a 16-bit stack parameter that is supposed to indicate the load size used for cache flushing in units of 64KB. This size is stored at 085A:00BC. This global variable is initialized by default to 10 (640KB).

Later, when the cache is supposed to be flushed, the function at 0000:0505 is called, which again gets the load size as a stack parameter; typically it is taken from the global variable at 085A:00BC. If flat mode is not initialized, the flush function starts at physical address 0 and does as many 16-bit 64KB LODSW invocations as needed to load the requested number of 64KB blocks. OTOH, if flat mode is initialized, the segment count is directly loaded into ECX for a REP LODSD. This is obviously wrong, and not just a MB/KB confusion: it is off by a factor of 16,384 (2^14), because each 64KB block corresponds to 16,384 dwords.
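
To make the size of the error concrete, here is a minimal Python sketch of the arithmetic described above. It is not taken from the CTCHIPZ code; the function names are made up for illustration, and the segment counts 16 and 32 are assumed to correspond to the 1MB and 2MB load sizes mentioned earlier in the thread.

```python
SEGMENT_BYTES = 64 * 1024  # one 64KB block, the unit of the load-size parameter
DWORD_BYTES = 4            # REP LODSD reads 4 bytes per ECX count


def intended_bytes(segments: int) -> int:
    """Bytes that should be read for a given number of 64KB blocks."""
    return segments * SEGMENT_BYTES


def buggy_flat_mode_bytes(segments: int) -> int:
    """Bytes actually read when the block count is used unscaled as ECX."""
    return segments * DWORD_BYTES


# 10 is the default mentioned above (640KB); 16 and 32 are assumed
# to be the values for 1MB and 2MB loads.
for segments in (10, 16, 32):
    want = intended_bytes(segments)
    got = buggy_flat_mode_bytes(segments)
    print(f"{segments} blocks: intended {want} bytes, "
          f"flat mode reads {got} bytes (factor {want // got})")
```

Under these assumptions a 1MB request would read only 64 bytes in flat mode, which is far too little to touch every line of a 512KB cache.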

This means you somehow need to bodge in an "SHL ECX, 14" instruction in the flat mode path, which requires 4 extra bytes. I guess you can obtain those bytes by unifying the two code paths as much as possible (both paths contain a dedicated PUSH DS, an XOR r16,r16 (AX in one path, DX in the other) and a POP DS, which can likely be unified, yielding 4 bytes of space). I assume that's what I did back when I troubleshot the issue the first time.
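
As a rough check of the proposed fix, the following Python sketch models the effect of the shift and the size of the instruction. The byte sequence is my understanding of the standard x86 encoding of SHL ECX, 14 in a 16-bit code segment (operand-size prefix, opcode C1 /4, ModRM for ECX, immediate); it is an assumption, not something read out of the CTCHIPZ disassembly, and the function name is hypothetical.

```python
DWORD_BYTES = 4  # REP LODSD reads 4 bytes per ECX count

# Assumed encoding of "SHL ECX, 14" in 16-bit code:
# 66 = operand-size prefix, C1 /4 ib = SHL r/m32, imm8, E1 = ModRM for ECX
SHL_ECX_14 = bytes([0x66, 0xC1, 0xE1, 0x0E])


def patched_ecx(segments: int) -> int:
    """ECX after the shift: number of dwords in `segments` 64KB blocks."""
    return (segments << 14) & 0xFFFFFFFF  # ECX is a 32-bit register


for segments in (16, 32):
    dwords = patched_ecx(segments)
    print(f"{segments} blocks -> ECX = {dwords} dwords "
          f"= {dwords * DWORD_BYTES} bytes")

print(f"instruction length: {len(SHL_ECX_14)} bytes")
```

With the shift in place, 16 blocks give a REP LODSD count that covers exactly 1MB, and the instruction length matches the 4 extra bytes mentioned above.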

Reply 22 of 23, by Chkcpu

Posted on 2025-07-26, 18:06

Rank: Oldbie
Posts: 577
Joined: 2021-05-13, 18:42
Location: The Netherlands

mkarcher wrote on 2025-07-26, 14:34:

Chkcpu wrote on 2025-07-18, 20:07:

Yes, this CTCHIPZ KB/MB bug would certainly derail the L2 cache WB attempts with 512KB or more cache. I now recall you mentioned this before, so thanks for reminding me about this bug.

Instead of the usual V3.4, I’m using Andreas Stiller’s CTCHIPZ.EXE V3.7 which has the infamous “Runtime error 200” bug on faster CPUs fixed.

While I didn't immediately find the patched version, I still found the disassembly. Relevant locations in V3.4:

If a cache size above 256KB is configured, 0000:17CD is called (I give segment numbers as Turbo Pascal would, i.e. 0000 is the first segment in the EXE file. If you happen to use IDA and load the EXE file with default settings, add 1000 to the segment). That function checks whether the processor is in protected mode (i.e. V86 submode of protected mode), and errors out in that case, otherwise it prepares a flat real mode with 4GB segment limits. Furthermore, it sets a flag that flat mode has been entered at 085A:009A. Also, this function receives a 16-bit stack parameter that is supposed to indicate the load size used for cache flushing in units of 64KB. This size is stored at 085A:00BC. This global variable is initialized by default to 10 (640KB).

Later, when the cache is supposed to be flushed, the function at 0000:0505 is called, which again gets the load size as a stack parameter; typically it is taken from the global variable at 085A:00BC. If flat mode is not initialized, the flush function starts at physical address 0 and does as many 16-bit 64KB LODSW invocations as needed to load the requested number of 64KB blocks. OTOH, if flat mode is initialized, the segment count is directly loaded into ECX for a REP LODSD. This is obviously wrong, and not just a MB/KB confusion: it is off by a factor of 16,384 (2^14), because each 64KB block corresponds to 16,384 dwords.

This means you somehow need to bodge in an "SHL ECX, 14" instruction in the flat mode path, which requires 4 extra bytes. I guess you can obtain those bytes by unifying the two code paths as much as possible (both paths contain a dedicated PUSH DS, an XOR r16,r16 (AX in one path, DX in the other) and a POP DS, which can likely be unified, yielding 4 bytes of space). I assume that's what I did back when I troubleshot the issue the first time.

mkarcher, thanks for the detailed analysis of the FLUSH function in CTCHIPZ V3.4 and the required patch to correct the KB/MB bug. Amazing that the bug now appears to flush a 16 times smaller datablock than previously assumed!

I will run CTCHIPZ.EXE through my disassembler and look at the details you specified.
I'll let you know when I’m successful in patching this bug.

Jan


Reply 23 of 23, by Living

Posted on 2025-07-26, 18:39

Rank: Member
Posts: 216
Joined: 2014-01-26, 20:01
Location: Buenos Aires, Argentina.

I think the 1.3 is the only one with 1MB of cache.

I didn't notice any other difference when they were new (I used a ton of these models, ranging from 1.0 to 1.3, with K6-2 400 to 550 CPUs, to build computers between 1999 and 2000).

The vast majority I came across were the 1.2 version with 512KB, paired with a 450MHz K6-2. Since we sold them with 64 or 128MB of RAM, we never saw any difference compared to the 1MB version.

The only warning I'll give you is about the USB header: it has a different pinout, and it managed to kill a couple of USB drives when a cable other than the original was used.
