VOGONS


Reply 60 of 110, by kixs

User metadata
Rank l33t

I really don't know the technical details... but more cache should always be faster, for the reasons I gave in my previous post (cache stores what the CPU is requesting from main memory at the time; it doesn't store some random data from some random memory allocation). This is also confirmed by my testing with 256, 512 and 1024 KB of cache on a UMC8881 board.

If what you're saying is correct, then I could test with 8MB, 16MB, 32MB and 64MB (my max FPM module is 16MBx4 = 64MB). 8MB should be the fastest and 64MB the slowest. I really doubt that.

256KB-WT & 64 MB should provide the same results as 1024KB-WT & 256 MB, however 1024KB-WT & 8/16 MB should be faster than 256KB-WT & 8/16 MB.

Also doubt that.

Requests are also possible... /msg kixs

Reply 61 of 110, by feipoa

User metadata
Rank l33t++

This is only a working hypothesis and I will not be offended if it turns out to be false; however, I would like to understand the physical reasoning behind the truth regardless. There is an overwhelming quantity of online literature concerning caching schemes, and I believe what I said should theoretically be true, although it is heavily dependent on the benchmarking program. Naturally, a smaller program would thrash between blocks a lot less.

If you have some technical information supporting your stance on this, I would be willing to read through it. How does it make sense that a program which only uses 8 MB would benefit from 1024 KB when your system has 256 MB? Those deeper cache lines are only authorised to use deeper blocks of memory, and not the first 8 MB. Now if you take out that unneeded RAM, the deep cache lines get spread out over the existing RAM. It might work differently if the cache line assignment to memory blocks does not occur consecutively, or if the chipset can dynamically reassign these cache lines to new blocks; however, that would no longer be direct-mapped cache.
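
For reference, here is a minimal Python sketch of how a textbook direct-mapped cache picks a line. The 16-byte line size and the helper name `dm_slot` are assumptions for illustration and are not taken from any chipset datasheet. In this model the line index comes from the low-order address bits, so each cache line is shared by every RAM address with the same index, and the tag only records which of them is currently stored.

```python
LINE_SIZE = 16  # bytes per cache line (assumed for illustration)
KB = 1024

def dm_slot(addr, cache_bytes, line_size=LINE_SIZE):
    """Return (index, tag) for a physical address in a direct-mapped cache."""
    num_lines = cache_bytes // line_size
    block = addr // line_size   # which line-sized block of RAM this is
    index = block % num_lines   # the one cache line this block may use
    tag = block // num_lines    # tells apart the blocks sharing that line
    return index, tag

# Two addresses inside the first 8 MB, exactly 256 KB apart (hypothetical).
a = 0x100000
b = a + 256 * KB

print(dm_slot(a, 256 * KB), dm_slot(b, 256 * KB))    # same index, different tags
print(dm_slot(a, 1024 * KB), dm_slot(b, 1024 * KB))  # different indexes
```

Under these assumptions, the two addresses collide in a 256 KB cache but get separate lines in a 1024 KB cache, even though both sit in the first 8 MB.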

Plan your life wisely, you'll be dead before you know it.

Reply 62 of 110, by kixs

Not sure if the 486 uses direct-mapped cache, as some documents say it's 4-way associative.

But enough about theory 😉

I've done a quick test with my Gigabyte 486AM/S UMC 8881 board, Intel 486DX4-100 SK096 WB, 1024KB WB cache, Tseng ET4000/w32p PCI. I've only tested write-back L1 & L2. Not sure if WT would be any different in this comparison.

Tested with 3DBench 1.0c, PCPBench, Doom and Quake (Phil's pack) with the following memory configurations: 1x16MB, 1x4MB, 2x4MB, 2x16MB, 2x4MB+2x16MB.

Every memory configuration gave the same results. The only variation was in Doom with 1x4MB, where the result was 1 point faster (1641). Also, Quake wasn't run with 4MB, as its minimum is 8MB.

3DBench... 71.1
PCPBench... 24.5
Doom... 1642
Quake... 11.1

The way I see it, cache is just a super fast mini memory. The larger it is, the more data it can hold. The benefits largely depend on the application itself: how large its most-used code/data is. The cache implementation only determines how efficiently the cache is used: direct-mapped cache has the most cache misses and isn't as effective as n-way associative. More cache usually means better performance. But it's true that some motherboards (maybe better said: chipsets) don't like large caches, as you have to increase wait states, and this usually means the benefits of a larger cache are diminished.
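
To make the direct-mapped versus n-way comparison concrete, here is a small Python sketch that counts misses for a toy access pattern. The function name `count_misses`, the 16-byte line size, the LRU replacement policy and the access trace are all assumptions for illustration; real chipsets may behave differently.

```python
from collections import OrderedDict

KB = 1024

def count_misses(addresses, cache_bytes, ways, line_size=16):
    """Count misses in an LRU set-associative cache (ways=1 is direct-mapped)."""
    num_sets = cache_bytes // (line_size * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses

# Hypothetical trace: alternate between two addresses 256 KB apart, 100 times.
trace = [0x10000, 0x10000 + 256 * KB] * 100

print("256 KB direct-mapped:", count_misses(trace, 256 * KB, ways=1))
print("256 KB 2-way:        ", count_misses(trace, 256 * KB, ways=2))
print("512 KB direct-mapped:", count_misses(trace, 512 * KB, ways=1))
```

With this deliberately bad trace the 256 KB direct-mapped cache misses on every access, while either adding a second way or doubling the size leaves only the two initial misses. Real programs fall somewhere in between, which fits the small gains measured in this thread.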

I just remembered I did this cache size comparison a few years ago on a 386 VLB board with a 486DLC-40, going from 128KB to 256KB with the same memory configuration (8x1MB). Here are the results from my benchmark notes:

Test        128KB    256KB
3DBench     26.3     26.6
PCPBench    5.5      5.8
Doom        6191     5905

Not much, but it's faster overall.

Reply 63 of 110, by feipoa

Which PCI-based 486 boards use set-associative L2 cache? I have only seen this on some 386 boards - CHIPS Peak/DM 82C311 uses 2-way set associative and the SiS Rabbit (85C310) only uses 2-way set associative when the on-chipset 128 bytes of SRAM is used. If external cache is installed, then direct-mapping is enabled.

I know these 486 chipsets use direct-mapped L2 cache:

SiS 85C471 (VLB)
UMC 8881 (PCI)
SiS 496 (PCI)
UMC 82C481 (ISA)
Intel 82420EX (PCI/VLB)

Unfortunately, the FinALI M1487/M1489 does not list what type of L2 cache controller scheme is implemented.

Reply 64 of 110, by kixs

To be honest I never really cared about cache schemes. It's what it is. You can't change it - only cache size.

I guess 486 L1 cache is 4-way associative.

I might look into it, as it is an interesting topic anyway 😉

Reply 65 of 110, by feipoa

Actually, you can use CCR0, bit 6 to alter the caching scheme of the Cyrix/Ti 486DLC between direct-mapped and 2-way set-associative.

Reply 66 of 110, by feipoa

I ran some tests on an MSI MS-4144 socket 3 motherboard and compared 256K, 512K, and 1024K double-banked cache performance using the Ziff-Davis CPUMark99 stand-alone app. This program seems to be the most sensitive to cache adjustments. I looked at results for a system with 16 MB, 32 MB, 64 MB, and 112 MB of RAM. The short answer is that I, too, did not notice any performance increase when reducing the RAM quantity. I suspect the test program has too small a memory footprint to make a difference. Nonetheless, the results are as follows.

1024K - Write-back (Write-through)
16 MB = 5.67 (5.47)
32 MB = 5.82 (5.61)
64 MB = 5.79 (5.62)
112 MB = 5.83 (5.61)

32 MB (write-back)
1024K = 5.83
512K = 5.52
256K = 5.33

FYI, cacheable limits
1024K (write-back) = 128 MB
1024K (write-thru) = 256 MB
512K (write-back) = 64 MB
512K (write-thru) = 128 MB
256K (write-back) = 32 MB
256K (write-thru) = 64 MB
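
The cacheable limits listed above are consistent with a simple model: cacheable range = cache size x 2^(tag bits), with 8 usable tag bits in write-through mode and 7 in write-back mode (one tag RAM bit given up for the dirty flag). The 8-bit tag width is an assumption inferred from the numbers, not something stated in the post, and `cacheable_limit_mb` is a hypothetical helper name.

```python
def cacheable_limit_mb(cache_kb, write_back, tag_ram_bits=8):
    """Cacheable RAM in MB for a direct-mapped L2 under the assumed tag width."""
    # Assumption: in write-back mode one tag RAM bit stores the dirty flag.
    tag_bits = tag_ram_bits - 1 if write_back else tag_ram_bits
    return cache_kb * (2 ** tag_bits) // 1024

for cache_kb in (1024, 512, 256):
    wb = cacheable_limit_mb(cache_kb, write_back=True)
    wt = cacheable_limit_mb(cache_kb, write_back=False)
    print(f"{cache_kb}K: write-back = {wb} MB, write-thru = {wt} MB")
```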

Interesting how CPUMark99 shows a 10% improvement when the cache jumps from 256K to 1024K. I think the real-world improvement is around 2%. Shows how sensitive this benchmark is in regard to cache. Similarly, the benefit of write-back over write-through cache was 4%.
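
The percentages can be rechecked from the scores posted above with a few lines of Python; `pct_gain` is just a hypothetical helper name.

```python
def pct_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1) * 100

# 32 MB, write-back: 1024K versus 256K cache.
print(f"1024K vs 256K: {pct_gain(5.83, 5.33):.1f}%")

# 1024K cache: write-back versus write-through at each RAM size.
ram_mb = [16, 32, 64, 112]
wb_scores = [5.67, 5.82, 5.79, 5.83]
wt_scores = [5.47, 5.61, 5.62, 5.61]
for mb, wb, wt in zip(ram_mb, wb_scores, wt_scores):
    print(f"{mb} MB, WB vs WT: {pct_gain(wb, wt):.1f}%")
```

This works out to a little over 9% for the cache size jump and roughly 3 to 4% for write-back over write-through, in line with the rounded figures quoted.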

Reply 67 of 110, by rad

User metadata
Rank Newbie

Thanks for those tests. I've ordered some 10ns cache chips from 2 different Asian dealers, because no single one had all the chips I want for my system. One of the packages has already arrived (the chip I'm going to use for TAG RAM) and the other one is still on its way to me.

TAG RAM (IS61C256AH-15N) already replaced with W24257AK-10.
4 Cache chips (IS61C512-15N) will be replaced with IS61C1024-10N when they arrive.

The good news is that after replacing the TAG chip, the system no longer hangs with L2 set to WB mode. I repeated some tests under DOS and Windows and it seems stable so far. With the old (15ns) chip it was impossible to even load the OS successfully. I'll run complete, longer tests once I replace the other chips as well, but in terms of speed there's little to no difference (WB vs WT).

Board doesn't have a setting for 512KB cache, but as far as I can see there is one soldered jumper which controls exactly that. Will confirm that once I've got the other chips.

The goal obviously is to upgrade the system with the fastest and largest possible cache this board allows.

Reply 68 of 110, by feipoa

I wonder if any of your 15 ns cache was defective. Did you try other 15 ns cache? If all your 15 ns cache doesn't work and your 10 ns cache works, then these Asian-sourced chips may indeed be 10 or 12 ns chips instead of relabeled 15 ns chips.

Reply 69 of 110, by rad

This was only a problem with a 66MHz bus and the tag RAM marked as 15ns (IS61C256AH-15N) in WB mode. With a lower FSB speed it wasn't a problem. The tag RAM chip is actually the bottleneck of the cache subsystem: the chipset first has to check whether there is a cache hit and, if there is, address it in the cache store, and all those operations have to be performed in a single cycle. This means that the tag RAM should be a little faster than the rest of the cache chips (the actual storage).

And given that one cycle at 66MHz FSB is exactly 15ns, and there is additional overhead for checking cache hits, a 15ns chip is not fast enough for that speed. WT mode is a little more relaxed, maybe because it has no additional dirty bit, and that would explain why the old 15ns tag RAM was working fine in WT mode and not in WB mode. I'm also surprised that even WT mode worked at 66MHz.
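
The timing argument can be put into numbers with a short Python sketch. It only converts bus frequency to cycle time and subtracts the SRAM access time, following the post's own single-cycle assumption; the helper names are hypothetical and the real chipset overhead for the tag compare is not known here.

```python
def cycle_ns(bus_mhz):
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / bus_mhz

def margin_ns(bus_mhz, sram_ns):
    """Time left in one cycle after the SRAM access itself (can be ~0)."""
    return cycle_ns(bus_mhz) - sram_ns

for mhz in (33.33, 40.0, 50.0, 66.67):
    print(f"{mhz:6.2f} MHz: cycle {cycle_ns(mhz):5.1f} ns, "
          f"margin with 15 ns SRAM {margin_ns(mhz, 15):5.1f} ns, "
          f"with 10 ns SRAM {margin_ns(mhz, 10):5.1f} ns")
```

At 66 MHz a 15 ns part leaves essentially no margin for the hit check, while a 10 ns part leaves about 5 ns, which matches the behaviour described above.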

Now with the new 10ns tag RAM and the old 4x 15ns cache storage chips, the system is working fine at 66MHz FSB in WB mode. I wonder if I need 10ns chips for the rest of the cache at all (to achieve tighter timings) when everything else is working fine.

Reply 70 of 110, by feipoa

I guess I have always used 10 ns cache (TAG & Data) at 66 MHz, so I did not run into this problem. I did drop my system RAM down from 128 MB to 64 MB so that I can run the system with a 1 ws RAM read time instead of 2 ws.

What did you ultimately end up with using for your hard drive host controller?

Reply 73 of 110, by rad

Yes, I thought about that and even bought a PCI Promise Ultra100 TX2 card. However, when plugged in with my CF card reader, it is recognized properly by its own BIOS but refuses to boot afterwards. I haven't played with it any further, but I presume it's only a configuration issue.

This weekend I was able to hardware-mod the mainboard to support more than 256KB of L2 cache. The hard-wired jumper has to be desoldered and an ordinary jumper soldered in its place. Here are some pictures and the final results:

[Attachments: IMG_1097.JPG, IMG_1099.JPG, IMG_1098.JPG]

Reply 74 of 110, by feipoa

Glad to see you got 512K working. Is it stable with Quake, DOOM, and Windows?

I noticed that the Promise Ultra100 TX2 won't boot past its own BIOS if you soft reset. This is with the Biostar MB-8433. It will work with a CF-Card adapter, but it is fussy about which item is master/slave, which is primary, secondary. Sometimes it only likes the 40-pin cables for UDMA33.

Reply 75 of 110, by feipoa

rad, any progress on the Promise Ultra100 TX2? Did you try a regular magnetic disk drive?

Did you try a Voodoo3 in your system? I'm interested to know your GLQuake scores at 800x600 with these settings: -nosound -nocdaudio -nonet -nomouse -nojoy

Reply 76 of 110, by rad

Apologies for not replying earlier... Well, the Chinese seller sent me the wrong 512K chips... they were 15ns instead of 10ns, as I had feared... so running the system with the same settings as 256K in WB mode was not stable. In WT mode it was a little better: everything ran fine under DOS, but when I start the Quake demo it just hangs every time. Now I'm waiting for the 10ns chips to finally arrive to see if I can stabilize the board with 512K cache.

Unfortunately I don't have any mechanical IDE drive to test the Ultra100 TX2 with, and I still haven't found a PCI Voodoo3 card at a reasonable price. But I will update the thread once I've found one.

Regarding the issue with the ISA bus not working well at 66 MHz FSB speed... As you remember, the BIOS doesn't have any option for setting DMA wait states and I/O recovery times. However, the chipset supports setting all those parameters via PCI configuration registers (for the 496 configuration) and I/O index/data ports (for the 497 configuration). There's one program (RW-everything) which is able to do that (read and write on those ports), but it refuses to start on Win9x machines. I'm trying to find an older version which should presumably work on such a configuration.

Anyway, do you know of an easy tool to read/write those registers and send data to the I/O data port?

Reply 77 of 110, by feipoa

Keep in mind that I experience a 10% failure rate on Chinese-sourced SRAM. If the issue is not related to the speed grade, you may have a bad module in there. Do you have a means to test them, for example with an SRAM tester? Or you could put them in another motherboard and run HIMEM and MemTest.

Also keep in mind that the larger the size of your L2 cache, the more likely it is that the fastest timings will not work. This is especially true of single-banked cache. I do hope that the issue is strictly related to the SRAM speed rating though.

Does your motherboard BIOS allow you to at least set the ISA bus clock speed relative to the PCI clock? Do you know if your ISA I/O Recovery time is on 2 BCLKS or 4 BCLKS? After some time, I came to realise that 4 BCLKS was needed for long-term stability.

It might be easier to find a SiS496-based BIOS which does have the I/O recovery time option. Unfortunately, the 4DPS doesn't have this feature either.

MODBIN allows for altering the chipset registers. CTChip does as well, but I don't see the config file for the SiS496, only for the UM8881.

Another idea is to use a PCI sound card with an XR385.

Reply 78 of 110, by rad

Yes I know there's a workaround for the sound, that's why I'm currently using ES1938s sound card with wavetable header, so that I can attach daughterboards like the Roland SCB55, which is working perfectly fine. Also DOS support and SB Pro/16 compatibility is great with the ESS PCI card.

However, knowing that such an issue exists keeps me pushing to find the reason behind it and a long-term fix, not just a workaround. I also have a bunch of old ISA sound cards which I would like to be able to plug in and have work reliably.

I think I've found a very simple way to look into the southbridge registers (available via CPU I/O ports 22h and 23h) using the DOS DEBUG program:

C:\>debug
- o 22 71
- i 23
01
- o 22 70
- i 23
40

You can use the same program to change values and bits in those registers. I've already tried that on register 71h (used for the I/O recovery time settings) and apparently it worked: if I later read the same register back, the change is preserved and it returns the updated value. Now the problem is that, as seen from the default values in the MODBIN PCI configuration registers, register 71h has the value 01h. The same is returned via DEBUG. This means that the default values are the most conservative ones:

Bits 7:6 16-Bit I/O Cycle Command Recovery Time Selection
00: 5 BUSCLK
01: 4 BUSCLK
10: 3 BUSCLK
11: 2 BUSCLK
Bits 5:4 8-Bit I/O Cycle Command Recovery Time Selection
00: 8 BUSCLK
01: 5 BUSCLK
10: 4 BUSCLK
11: 3 BUSCLK
Bit 3 Reserved
Bit 2 16-Bit Memory, I/O Wait State Selection
0: 2 wait states
1: 1 wait states
Bit 1 8-Bit Memory, I/O Wait State Selection
0: 5 wait states
1: 4 wait states
Bit 0 Reserved
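
As a cross-check of the bit fields listed above, here is a minimal Python sketch that decodes a register 71h value and computes a new value with one field changed. The function names are hypothetical, the field tables are copied from the list above, and the sketch only does the arithmetic; it does not touch any hardware.

```python
# Field tables copied from the register 71h description above (BUSCLK counts).
RECOVERY_16BIT = {0b00: 5, 0b01: 4, 0b10: 3, 0b11: 2}  # bits 7:6
RECOVERY_8BIT = {0b00: 8, 0b01: 5, 0b10: 4, 0b11: 3}   # bits 5:4

def decode_reg71(value):
    """Decode the documented fields of register 71h into a dict."""
    return {
        "io16_recovery_busclk": RECOVERY_16BIT[(value >> 6) & 0b11],
        "io8_recovery_busclk": RECOVERY_8BIT[(value >> 4) & 0b11],
        "wait_states_16bit": 1 if value & 0b100 else 2,  # bit 2
        "wait_states_8bit": 4 if value & 0b010 else 5,   # bit 1
    }

def set_field(value, shift, width, field):
    """Return value with the bit field at shift/width replaced by field."""
    mask = ((1 << width) - 1) << shift
    return (value & ~mask) | ((field << shift) & mask)

print(decode_reg71(0x01))  # the default value read back in the post

# Example change: select 4 BUSCLK for 16-bit I/O recovery (bits 7:6 = 01).
new_value = set_field(0x01, shift=6, width=2, field=0b01)
print(f"{new_value:02X}h", decode_reg71(new_value))
```

Decoding 01h gives 5 and 8 BUSCLK recovery times with 2 and 5 wait states, i.e. the slowest option in every field, which agrees with the conclusion that the defaults are the most conservative ones.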

I've compared the value of register 70h, which controls the ISA bus clock selection, and I can confirm it is correct, i.e. the register holds the same setting that is selected in the BIOS:
Bits 7:6 Bus Clock Frequency Selection
00: BUSCLK = 7.159 MHz
01: BUSCLK = 1/4 PCICLK
10: BUSCLK = 1/3 PCICLK
11: Reserved
Bits 5:0 Reserved
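
The 40h value read from register 70h in the DEBUG session can be decoded the same way. In this Python sketch the function name is hypothetical and the 33.3 MHz PCI clock is an assumed figure for the example, since the post does not state the actual PCI clock.

```python
def isa_busclk_mhz(reg70, pciclk_mhz):
    """Return the ISA bus clock in MHz selected by bits 7:6 of register 70h."""
    select = (reg70 >> 6) & 0b11
    if select == 0b00:
        return 7.159            # fixed 7.159 MHz
    if select == 0b01:
        return pciclk_mhz / 4   # 1/4 PCICLK
    if select == 0b10:
        return pciclk_mhz / 3   # 1/3 PCICLK
    raise ValueError("bits 7:6 = 11 is reserved")

ASSUMED_PCICLK_MHZ = 33.3  # assumption for illustration only

# 40h has bits 7:6 = 01, so BUSCLK = 1/4 PCICLK.
print(f"{isa_busclk_mhz(0x40, ASSUMED_PCICLK_MHZ):.2f} MHz")
```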

This leads me to think that the reason for ISA sound card lock up is not about the I/O recovery time, but something else...

In Windows, there's one great utility which can view/change PCI Configuration registers - it is called WPCREDIT. Using that I've found something else quite disturbing. All other registers are set correctly (DRAM, Cache settings, etc....) but not the main CPU type configuration register - 40h:
Bit 7 Reserved
Bit 6 CPU Burst Write Enable (P24T/P24D/M7/Cx 5x86/Enhanced Am486)
0 = Disable
1 = Enable
For the other CPU types, this bit must be disabled.
Bit 5 CPU Internal Cache Write Back Mode Enable
0 = Disable
1 = Enable
Bits 4:2 Host Processor Type Selection (the three bits must be set immediately after powerup and match the setting in Reg 81[4:2])
000 = Intel i486DX/DX2/AMD Am486DX/DX2/DX4
001 = Intel i486 SL-Enhanced/ i486 DX4
010 = Intel P24D/P24T/AMD Enhanced Am486 DX2/DX4/Cyrix Cx 5x86
011 = AMD Am486DXL
101 = Cyrix Cx486DX / Cx486DX2
Others = Reserved
Bits 1:0 DRAM Speed
00 = Slowest (50MHz)
01 = Slower (40MHz)
10 = Faster (33MHz)
11 = Fastest (25MHz)

The bold lines show the current configuration. Although I've set the jumpers as shown in the first post in this thread, which should match the Cyrix Cx 5x86 with just one jumper changed for a 2x multiplier instead of 3x, the system is setting those registers for a Cx486DX configuration and CPU burst write is disabled. I've set CPU burst write to enabled via another program (TweakBIOS, which has a page for setting various Cyrix features) and verified via the registers that it is set to ON, and everything continues to work normally.
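
For anyone wanting to check their own readout of register 40h, here is a hedged Python sketch of the decoding. The actual byte read on this board is not given in the post, so 14h below is a hypothetical value chosen only because it decodes to the Cx486DX type with burst write disabled; the function name is also made up for the example.

```python
# CPU type table copied from the register 40h description (bits 4:2).
CPU_TYPES = {
    0b000: "Intel i486DX/DX2/AMD Am486DX/DX2/DX4",
    0b001: "Intel i486 SL-Enhanced/i486 DX4",
    0b010: "Intel P24D/P24T/AMD Enhanced Am486 DX2/DX4/Cyrix Cx 5x86",
    0b011: "AMD Am486DXL",
    0b101: "Cyrix Cx486DX/Cx486DX2",
}
BURST_WRITE_BIT = 0x40  # bit 6
L1_WB_BIT = 0x20        # bit 5

def decode_reg40(value):
    """Decode the documented fields of register 40h into a dict."""
    return {
        "burst_write": bool(value & BURST_WRITE_BIT),
        "l1_write_back": bool(value & L1_WB_BIT),
        "cpu_type": CPU_TYPES.get((value >> 2) & 0b111, "Reserved"),
        "dram_speed_bits": value & 0b11,
    }

hypothetical = 0x14  # assumed example: type 101, burst write off
print(decode_reg40(hypothetical))

# Setting bit 6 mirrors enabling CPU burst write, as was done with TweakBIOS.
with_burst = hypothetical | BURST_WRITE_BIT
print(f"{with_burst:02X}h", decode_reg40(with_burst))
```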

So maybe something is not set properly for my configuration. Any other ideas what could be causing the ISA card instability, since I/O recovery time seems not to be the problem?

Reply 79 of 110, by feipoa

Could you remind me what you are using to determine that ISA sound is not stable? And which sound cards have you tried?

On my particular motherboard (UM8881), I determined that I could only use 4BCLK for the I/O Recovery time. Not 2, nor 6, not 8. I could only use 4. So it might be worth trying all other I/O Recovery times on your system.

Any idea what effect it has on performance/stability if the wrong host processor type is selected? Is that just used to display the correct identity string before POST?
