VOGONS


Cyrix DX2-80 slower than DX2-66?


Reply 20 of 28, by mkarcher

Rank l33t
Disruptor wrote on 2020-07-12, 08:29:
jakethompson1 wrote on 2020-07-11, 18:43:

It's in write-through mode. The write graph in Speedsys is a straight horizontal line about as fast as an L2-cached read, which seems consistent with that. There are no jumpers on the board for write-back. The board is also 5V-only, so it may be too old. There is a BIOS setting for L2 write-back, though, and the performance seems consistent with it being on.

I don't give a penny for Speedsys' write line on a 486. On this architecture it does not seem to be affected by the write-back cache at all.
I have also noticed Speedsys reporting 27 MHz for my Cx486-DX2/80. That is definitely wrong for this CPU.

Speedsys uses an access pattern for write tests that causes a nearly 100% miss ratio on the write-back cache on most 486 and Pentium systems, so you always measure the cache-miss performance, even with just 4KB (which would fit in L1) or 128KB (which would fit in L2). I explained it in this post: Re: Pentium MMX 450MHz. Unfold the quoted text there to get the context required to understand that post.

Reply 21 of 28, by mkarcher

Rank l33t
jakethompson1 wrote on 2020-07-11, 18:49:
mkarcher wrote on 2020-07-11, 17:23:

My money is on "chipset autoconfiguration". Most BIOSes have an option to auto-configure RAM and cache timings depending on the bus clock. On some BIOSes, auto-config is the only option (and no timing configuration is displayed in the advanced chipset setup). On most BIOSes, you can set auto-config to "disabled" and configure the timings manually.

I replied to an old thread so I'm the recent one here, but what you're saying sounds like my suspicion. Would that also be consistent with Speedsys displaying 27 MHz?

The only performance-related options in the BIOS that I don't already understand are one about waiting 1T or none for FPU (I set it to none), and 1T or 2T for ELBA# (won't boot on 1T, default is 2T).

I think I have the jumpers set to max out all the VLB stuff now, and my issue is on some cold boots it locks up at the VGA BIOS copyright screen, and when it does boot, the memory graph in Speedsys beyond 256K is pretty unimpressive. Raw CPU performance, VESA Memory throughput, L1 and L2 cached memory, and disk I/O all look better though.

I know how to write asm to do i/o port reads and writes and such if I were curious about settings that don't show in the BIOS, but there seems to be very little documentation at that depth about the UMC 82C491F.

Yeah, good luck on finding UMC chipset data sheets. If your BIOS is AMI, you might want to try AMISETUP to get access to hidden configuration options. If your BIOS is AWARD, you might want to try MODBIN to create a modified BIOS for your board that exposes more configuration options. MODBIN is a tool aimed at mainboard manufacturers which lets them choose which of the options provided by the chipset manufacturer should actually be exposed to the user. A deep understanding of mainboard operation is helpful for using MODBIN; assembly language knowledge is not required, as MODBIN only edits configuration data in the BIOS and does not touch any code.

I guess the ELBA# pin means "external local bus access". This signal is driven by a VL card to indicate that the cycle is handled by a VL card and should thus not be handled by the ISA bridge or the memory controller, although I am unsure whether having VL memory space on top of mainboard memory is actually a supported configuration on most 486 chipsets. The VL slots have a pin called "LDEV#" (local bus device) that is to be driven by a VL card if it wants to respond to a cycle. The LDEV# signals of all VL slots (and onboard local-bus devices, if any) are typically ANDed together using a fast AND gate chip (remember, LDEV# is active low, so a physical "AND" gate actually means "OR") and presented to the chipset. It seems UMC calls this OR of all LDEV# lines "ELBA#".
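A minimal Python sketch of the combination described above, modelling each line by its electrical level (0 = low, which means asserted for an active-low signal). The function names are hypothetical; the only point is that a physical AND of active-low LDEV# lines gives an active-low "some VL device claims this cycle" signal, which is what UMC apparently calls ELBA#.

```python
from typing import Sequence


def elba_n_level(ldev_n_levels: Sequence[int]) -> int:
    """Electrical level of ELBA#, assumed to be a plain AND of all LDEV# lines.

    Levels: 0 = low (asserted, since LDEV# is active low), 1 = high (idle).
    """
    level = 1
    for ldev_n in ldev_n_levels:
        level &= ldev_n  # AND gate: output goes low as soon as any input is low
    return level


def vl_device_claims_cycle(ldev_n_levels: Sequence[int]) -> bool:
    """True if at least one VL device asserted LDEV#, i.e. ELBA# is low."""
    return elba_n_level(ldev_n_levels) == 0


# Three VL slots, only the second card claims the cycle: ELBA# goes low.
print(vl_device_claims_cycle([1, 0, 1]))  # True
# No card claims the cycle: ELBA# stays high, so the chipset handles it.
print(vl_device_claims_cycle([1, 1, 1]))  # False
```

In active-high terms this is De Morgan's law: NOT(a AND b) = (NOT a) OR (NOT b), which is exactly why the AND gate "actually means OR".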

The setup option is about when the chipset is going to check its ELBA# input pin. It must not start an ISA cycle for cycles handled by VL cards, so it cannot forward a 486 cycle to the ISA bus until the VL/non-VL decision is known; it needs to wait for the ELBA# result before forwarding. The fast option is T1, which means that ELBA# (the combined LDEV# lines) is sampled at the end of the first clock of a 486 bus cycle. The chipset loses one clock before it may assert any read/write command line on the ISA bus. This might not be an issue, because address setup time requirements on the ISA bus might require a command delay anyway. The slower option is T2, which gives VL cards a whole extra cycle to assert LDEV#, but also delays forwarding of cycles to the ISA bus by an extra FSB cycle.

Disregarding the delay of the AND gate that computes ELBA# from LDEV#, VESA recommends sampling LDEV# at the end of T1 if the clock speed is 33MHz or lower, and at the end of T2 if it is above 33MHz. This is what the VL>33MHz jumper on the mainboard is for. It has no effect on mainboard operation itself; its only function is to drive the ID3 pin on the VL bus, which informs cards about the bus speed. Setting this jumper to >33MHz tells the VL cards that it is OK to take an extra cycle to decode the address, because LDEV# is sampled at the end of T2. Setting this jumper to >33MHz while configuring ELBA# sampling to T1 is a non-conforming configuration. The second VL jumper is about write wait states. It also does not influence mainboard operation at all (on the mainboards I know), but it is directly connected to the ID2 pin. This pin tells VL cards that they can assume there are no 0WS write cycles on the VL bus, so taking an extra cycle to pick up the data on write cycles is OK, even if the card signals LRDY# (to indicate readiness, i.e. terminate the cycle) very early. Again, the VESA recommendation is that 0WS write cycles are forbidden at clock frequencies above 33MHz.
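As a rough illustration, the rules in the post can be written down as a small consistency check. This is a simplified reading of the post, not the text of the VESA specification, and all parameter names are hypothetical labels for the jumpers and the setup option.

```python
from typing import List


def recommended_elba_sample(fsb_mhz: int) -> str:
    """Guidance as described in the post: T1 up to 33 MHz, T2 above (nominal MHz)."""
    return "T1" if fsb_mhz <= 33 else "T2"


def check_vl_config(fsb_mhz: int, id3_gt33: bool, elba_sample: str,
                    id2_no_0ws_writes: bool) -> List[str]:
    """Return conflicts with the guidance in the post (empty list = looks conforming).

    fsb_mhz: nominal bus clock (33 for 33.33 MHz)
    id3_gt33: VL>33MHz jumper (drives ID3) set to ">33MHz"
    elba_sample: "T1" or "T2", the chipset setup option
    id2_no_0ws_writes: write wait state jumper (drives ID2) set to "no 0WS writes"
    """
    issues = []
    if recommended_elba_sample(fsb_mhz) == "T2" and elba_sample == "T1":
        issues.append("ELBA# sampled at T1 although the bus runs above 33 MHz")
    if id3_gt33 and elba_sample == "T1":
        issues.append("ID3 announces >33 MHz, but ELBA# is sampled at T1")
    if fsb_mhz > 33 and not id2_no_0ws_writes:
        issues.append("0WS VL write cycles not ruled out above 33 MHz")
    return issues


def extra_isa_delay_ns(fsb_mhz: float, elba_sample: str) -> float:
    """Extra delay before an ISA cycle can start: one FSB clock when sampling at T2."""
    return 1000.0 / fsb_mhz if elba_sample == "T2" else 0.0


print(check_vl_config(40, True, "T1", True))  # two issues
print(check_vl_config(40, True, "T2", True))  # []
print(extra_isa_delay_ns(40.0, "T2"))         # 25.0
```

The last function puts a number on the ISA penalty discussed next: at 40MHz, T2 sampling costs an extra 25ns before every ISA cycle can be forwarded.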

Sampling ELBA# at T2 is a killer for ISA performance, and if you need that configuration at 40MHz (or even at 33MHz), you should consider using a different mainboard, or place everything performance-sensitive on the VL bus (mostly the hard disk controller and graphics card) to avoid paying the ISA performance penalty during normal operation.

Reply 22 of 28, by jakethompson1

Rank Oldbie
mkarcher wrote on 2020-07-12, 08:50:

The memory graph seems to indicate that I was indeed right about "chipset autoconfiguration" being the cause of the problem, but I explained it in terms of L2 timings, whereas the problem is in fact main memory timing. (Nearly?) all 486 consumer boards derive the memory timing (how long to wait after the falling edge on /RAS before switching to the column address, how long to wait after that before the falling edge on /CAS, how long to wait after /CAS is pulled low until the data output is valid, how long /CAS needs to stay low, and how long /CAS needs to be high after that before the next fast-page-mode access may occur) from the main oscillator, sometimes allowing specification in half-cycles. This means the resolution for setting these times is 30/15ns with a 33MHz FSB, and 25/12.5ns with a 40MHz FSB. To compensate for the faster clock, the BIOS is likely increasing the number of clocks or half-clocks to stay within the memory timing requirements.
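A small sketch of that quantization: each timing parameter has to be rounded up to a whole number of clocks (or half-clocks), so the same nanosecond requirement can need more clocks at 40MHz than at 33MHz. The 55ns requirement below is a made-up figure for illustration, not a datasheet value.

```python
import math
from typing import Tuple


def quantize_timing(min_ns: float, fsb_mhz: float,
                    half_clocks: bool = False) -> Tuple[int, float]:
    """Smallest number of clock (or half-clock) steps that covers min_ns.

    Returns (steps, resulting_ns). The 1e-9 tolerance keeps float noise from
    rounding an exact multiple up by a whole step.
    """
    step_ns = 1000.0 / fsb_mhz
    if half_clocks:
        step_ns /= 2
    steps = math.ceil(min_ns / step_ns - 1e-9)
    return steps, steps * step_ns


# Hypothetical 55 ns requirement, chosen only for illustration.
print(quantize_timing(55, 100 / 3))   # 2 clocks, about 60 ns at 33 MHz
print(quantize_timing(55, 40))        # (3, 75.0): one more clock at 40 MHz
print(quantize_timing(55, 40, True))  # (5, 62.5): half-clock steps waste less
```

The 40MHz result (75ns) is actually slower in absolute time than the 33MHz one (60ns), which is the kind of effect that can make a faster bus clock produce a slower memory benchmark.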

As far as I know, BIOS auto-configuration on 486 boards usually tunes memory access times to be suitable for 80ns or 70ns RAM. If you have 60ns RAM installed, it is likely that the cycle counts you use at 33MHz would also work at 40MHz.

Finally, some thoughts about the numbers (with 16-byte cache lines): 19MB/s is 1.19 cache lines per microsecond, or 840ns per cache line. This is around 28 cycles per cache line at 33MHz. 15MB/s is 0.94 cache lines per microsecond, or 1070ns per cache line. This is around 43 cycles per cache line at 40MHz. The raw cycle count will be a bit lower, as the memory is also busy doing other things like refresh, but those numbers seem excessively slow for reading. Are you sure you did not mix up read timings and move timings? For moving, data is transferred twice over the bus, so speeds between 15MB/s and 20MB/s make much more sense.
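The arithmetic in that paragraph can be reproduced with a few lines, assuming 16-byte 486 cache lines and 1MB = 10^6 bytes (so MB/s is numerically bytes per microsecond).

```python
LINE_BYTES = 16  # a 486 cache line is 16 bytes


def cycles_per_line(mb_per_s: float, fsb_mhz: float):
    """Convert a read rate into (lines per microsecond, ns per line, bus clocks per line).

    Uses 1 MB = 10**6 bytes, so MB/s is numerically bytes per microsecond.
    """
    lines_per_us = mb_per_s / LINE_BYTES
    ns_per_line = 1000.0 / lines_per_us
    clocks = ns_per_line * fsb_mhz / 1000.0
    return lines_per_us, ns_per_line, clocks


# Roughly 28 clocks at 33 MHz and 43 clocks at 40 MHz, matching the estimates above.
for rate, fsb in ((19, 100 / 3), (15, 40)):
    lines, ns, clocks = cycles_per_line(rate, fsb)
    print(f"{rate} MB/s @ {fsb:.0f} MHz: {lines:.2f} lines/us, "
          f"{ns:.0f} ns/line, {clocks:.0f} clocks")
```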

I tried the tests again, this time keeping a careful log-book.
First things first, I was ultimately able to reproduce the intermittent hang on boot even at a bus speed of 33 MHz and the Intel 486DX2-66, so that doesn't seem to be related. I've made a ton of other changes recently including different SIMMs and an external CMOS battery, so I won't worry about that for now.

So I hit R for each Speedsys test and got these numbers from the SSTREP*.txt files. Speedsys seems to not distinguish between L1 & L2 cache likely for lack of CPUID, which is why I'm going into the .txt file and using the table - row "Memory", column "Reading."
Cx486DX2-80, bus speed 40 MHz, all other jumpers aggressive (<= 33 MHz) - CPU score 30.24, uncached Memory read 10.83 MB/s, disk linear read 5904 KB/s.
I tried setting JP18 (VLB speed > 33) to the conservative setting and it didn't affect the numbers.

Next, I set both JP18 (VLB speed > 33) and JP7 (CPUCLK >= 40 or 50). The CPU score worsened to 28.4, the memory read worsened to 10.66 MB/s, and the disk linear read worsened to 5634 KB/s.

At that point, I switched from VLB to ISA VGA to eliminate that as a factor. It didn't seem to change the scores, except worsen the disk linear read speed a little.

Next, I underclocked the Cx486DX2-80 to 66 MHz by lowering the bus speed to 33. I returned all other jumpers to aggressive settings.
Cx486DX2-80@66 - CPU score declined to 26.69, uncached Memory read improved to 17.17 MB/s, linear disk read worsened to 5703 KB/s.

Next, I switched from the Cx486DX2-80@66 back to an Intel 486DX2-66.
i486DX2-66 - CPU score declined to 25.17, uncached Memory read declined to 14.78 MB/s, linear disk read worsened to 5691 KB/s.
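Applying the same conversion as in the quoted estimate to the three "Memory / Reading" results in the log gives a rough bus-clock cost per cache line. This sketch assumes Speedsys streams whole 16-byte lines and ignores refresh and benchmark overhead, so the figures are only indicative.

```python
LINE_BYTES = 16  # 486 cache line size


def bus_clocks_per_line(mb_per_s: float, fsb_mhz: float) -> float:
    """Bus clocks per 16-byte line implied by a read rate (1 MB = 10**6 bytes)."""
    ns_per_line = LINE_BYTES * 1000.0 / mb_per_s
    return ns_per_line * fsb_mhz / 1000.0


# Speedsys "Memory / Reading" results from the log above.
runs = [
    ("Cx486DX2-80, 40 MHz bus", 10.83, 40),
    ("Cx486DX2-80@66, 33 MHz bus", 17.17, 100 / 3),
    ("i486DX2-66, 33 MHz bus", 14.78, 100 / 3),
]
for name, rate, fsb in runs:
    print(f"{name}: {bus_clocks_per_line(rate, fsb):.1f} bus clocks per line")
```

This works out to roughly 59 clocks per line at 40MHz versus roughly 31 at 33MHz with the same Cyrix CPU, which hints that the chipset is running much slower DRAM timings at 40MHz rather than the CPU itself being slow.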

Stay tuned for another post in a bit though - AMISETUP seems to change things.

Reply 23 of 28, by jakethompson1

Rank Oldbie
mkarcher wrote on 2020-07-12, 09:34:

Yeah, good luck on finding UMC chipset data sheets. If your BIOS is AMI, you might want to try AMISETUP to get access to hidden configuration options. If your BIOS is AWARD, you might want to try MODBIN to create a modified BIOS for your board that exposes more configuration options. MODBIN is a tool aimed at mainboard manufacturers which lets them choose which of the options provided by the chipset manufacturer should actually be exposed to the user. A deep understanding of mainboard operation is helpful for using MODBIN; assembly language knowledge is not required, as MODBIN only edits configuration data in the BIOS and does not touch any code.

First I'll get ELBA# out of the way. Let me clarify the hang was at the AMI WAIT... prompt. I tried flipping C: disk type to Not Installed, and also changing advanced IDE features (block read, LBA, 32-bit) to disabled. This avoided the hang. Verdict: my UMC8672 VLB IO card is what's forcing the ELBA# to be on T2.

Thanks for pointing me to AMISETUP as that seems key! Here are some of the unavailable settings:
Auto Config: Enabled
Cache Read: 3-2-2-2
Cache Write: 1 WS
DRAM Type: Fast Page
DRAM wait states: 1 WS
Keyboard Clock: 9.5 MHz
AT Clock: CPUCLK/4
IO Ready Time: 5/3 BCLK
Hold PD Bus: 2~3 T

Those do seem quite conservative. As an example, I did nothing but change auto-config to Disabled and DRAM wait states to 0. This alone improved memory read from 14.78MB/s to 18.82MB/s, with the Intel 486DX2-66 back in at a bus speed of 33. Sounds like I have lots of things to play with. Unfortunately I have no CMOS reset jumper, so I'm stuck disconnecting the battery and waiting for it to clear if I make a mistake. My cache chips are 15ns and the tag RAM 5ns, if that's a factor; it seems they leave plenty of room for tighter settings. I'd better set it back to auto-config and put the Cx486DX2-80 back in before I tinker more.
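To put the hidden settings into numbers, here is a small sketch that totals an L2 burst pattern such as 3-2-2-2 (the four transfers of a 16-byte line fill) and computes the ISA clock for "AT Clock: CPUCLK/4". It assumes CPUCLK is the 486's external (bus) clock rather than the doubled core clock, and it ignores all other overhead, so the MB/s values are theoretical peaks.

```python
def burst_clocks(pattern: str) -> int:
    """Total bus clocks for a burst pattern such as "3-2-2-2"."""
    return sum(int(part) for part in pattern.split("-"))


def burst_mb_per_s(pattern: str, fsb_mhz: float, line_bytes: int = 16) -> float:
    """Peak line-fill rate for a burst pattern, ignoring all other overhead."""
    ns_per_line = burst_clocks(pattern) * 1000.0 / fsb_mhz
    return line_bytes * 1000.0 / ns_per_line  # bytes per us == MB/s (10**6 bytes)


def at_clock_mhz(cpuclk_mhz: float, divisor: int = 4) -> float:
    """ISA (AT) bus clock for an "AT Clock: CPUCLK/n" setting."""
    return cpuclk_mhz / divisor


for pattern in ("3-2-2-2", "2-1-1-1"):
    print(pattern, burst_clocks(pattern), "clocks,",
          round(burst_mb_per_s(pattern, 100 / 3), 1), "MB/s at 33 MHz")
print(round(at_clock_mhz(100 / 3), 2), "MHz ISA clock at 33 MHz,",
      at_clock_mhz(40), "MHz at 40 MHz")
```

2-1-1-1 is the fastest burst the 486 bus allows, so 3-2-2-2 costs 9 clocks per line instead of 5.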

By the way, I bought some EPROM chips and a programmer expecting to have to update the BIOS for LBA support. It came updated already. You think there is a bit I can flip to put these settings in the normal BIOS setup? I ran strings on the bios image and those are in there so hopefully the code to support them is there and just disabled.

Reply 24 of 28, by mkarcher

Rank l33t
jakethompson1 wrote on 2020-07-12, 22:04:

First I'll get ELBA# out of the way. Let me clarify the hang was at the AMI WAIT... prompt. I tried flipping C: disk type to Not Installed, and also changing advanced IDE features (block read, LBA, 32-bit) to disabled. This avoided the hang. Verdict: my UMC8672 VLB IO card is what's forcing the ELBA# to be on T2.

So that card is obviously too slow in pulling LDEV# low for ELBA# to be sampled at T1. If you own another VLB IDE interface card, try it.

jakethompson1 wrote on 2020-07-12, 22:04:

Thanks for pointing me to AMISETUP as that seems key! Here are some of the unavailable settings:
Auto Config: Enabled
Cache Read: 3-2-2-2
Cache Write: 1 WS
DRAM Type: Fast Page
DRAM wait states: 1 WS
Keyboard Clock: 9.5 MHz
AT Clock: CPUCLK/4
IO Ready Time: 5/3 BCLK
Hold PD Bus: 2~3 T

Those do seem quite conservative.

While auto-config is often quite conservative, keep in mind that if auto-config is enabled, auto-config overrides at least Cache Read, Cache Write and DRAM wait states, so the awfully slow values printed there are most likely not in effect unless you run FSB50.

jakethompson1 wrote on 2020-07-12, 22:04:

I better set it back to autoconfig and put the Cx486DX2-80 back in before I tinker more.

By the way, Award BIOSes (I know them better than AMI) can use different auto-config tables for Cyrix and non-Cyrix CPUs. Speed differences between Cyrix and Intel processors might be caused by auto-config choosing a different chipset configuration as well as by actual core performance differences. I guess there are technical reasons (like different signal timing specifications) that require these differences, though.
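A hypothetical sketch of how such per-vendor auto-configuration might behave: with auto-config enabled, the timings come from a table keyed by CPU vendor and bus clock, and the manual values shown in setup are ignored. The table entries are placeholders loosely modelled on values reported in this thread (identical for both vendors here); a real BIOS stores its own chipset-specific tables, and none of these names come from an actual BIOS.

```python
from typing import Dict, NamedTuple, Tuple


class ChipsetTimings(NamedTuple):
    cache_read: str      # L2 burst pattern, e.g. "3-2-2-2"
    cache_write_ws: int  # L2 write wait states
    dram_ws: int         # DRAM wait states


# HYPOTHETICAL auto-config table keyed by (CPU vendor, nominal FSB MHz).
# Placeholder values only; a real BIOS may use different entries per vendor.
AUTO_CONFIG_TABLE: Dict[Tuple[str, int], ChipsetTimings] = {
    ("intel", 33): ChipsetTimings("3-2-2-2", 1, 1),
    ("cyrix", 33): ChipsetTimings("3-2-2-2", 1, 1),
    ("intel", 40): ChipsetTimings("3-2-2-2", 2, 2),
    ("cyrix", 40): ChipsetTimings("3-2-2-2", 2, 2),
}


def effective_timings(vendor: str, fsb_mhz: int, auto_config: bool,
                      manual: ChipsetTimings) -> ChipsetTimings:
    """Timings actually programmed into the chipset.

    With auto-config enabled, the manual values shown in setup are ignored and
    the table entry for this CPU vendor and bus clock wins.
    """
    if auto_config:
        return AUTO_CONFIG_TABLE[(vendor, fsb_mhz)]
    return manual


manual = ChipsetTimings("2-1-1-1", 0, 0)
print(effective_timings("cyrix", 40, True, manual))   # table entry, manual ignored
print(effective_timings("cyrix", 40, False, manual))  # manual values used
```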

jakethompson1 wrote on 2020-07-12, 22:04:

By the way, I bought some EPROM chips and a programmer expecting to have to update the BIOS for LBA support. It came updated already. You think there is a bit I can flip to put these settings in the normal BIOS setup? I ran strings on the bios image and those are in there so hopefully the code to support them is there and just disabled.

They are there and are just disabled. AMISETUP does not know anything about your board or chipset by itself. It just collects both enabled and disabled settings from the ROM BIOS and presents all of them to you. It's likely just a matter of flipping a single bit to make the settings user-accessible (but you still need to adjust the checksum too, and if your BIOS is of the newer self-decompressing variant, you also need a tool to unpack and repack it). I don't know about the tooling available for AMI BIOS ROMs, but I guess there must be some OEM BIOS adjustment tool for AMI BIOSes, just as there is MODBIN for Award BIOSes. I hope someone else with AMI experience can chime in here...
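A minimal sketch of "flip a bit, then fix the checksum", assuming a simple 8-bit additive checksum where all bytes of the image must sum to zero modulo 256 (the scheme used by PC option ROMs). Where AMI keeps its checksum and exactly which bytes it covers is not known here, so the offsets below are made up, and a compressed BIOS would have to be unpacked first.

```python
def byte_sum(image: bytes) -> int:
    """8-bit additive checksum: sum of all bytes modulo 256."""
    return sum(image) & 0xFF


def flip_bit(image: bytearray, offset: int, bit: int) -> None:
    """Toggle one bit, e.g. a hypothetical 'show this option in setup' flag."""
    image[offset] ^= 1 << bit


def fix_checksum(image: bytearray, checksum_offset: int) -> None:
    """Rewrite the checksum byte so that all bytes again sum to 0 modulo 256."""
    image[checksum_offset] = 0
    image[checksum_offset] = (-sum(image)) & 0xFF


# Toy 8-byte "ROM"; the flag and checksum offsets are made up for illustration.
rom = bytearray(b"\x55\xaa\x10\x20\x30\x40\x00\x00")
fix_checksum(rom, 7)
flip_bit(rom, 2, 0)        # hypothetical option-visibility bit
print(byte_sum(rom) == 0)  # False: the image no longer checks out
fix_checksum(rom, 7)
print(byte_sum(rom) == 0)  # True
```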

Reply 25 of 28, by jakethompson1

Rank Oldbie
mkarcher wrote on 2020-07-12, 22:38:

While auto-config is often quite conservative, keep in mind that if auto-config is enabled, auto-config overrides at least Cache Read, Cache Write and DRAM wait states, so the awfully slow values printed there are most likely not in effect unless you run FSB50.

It was actually overwriting them in addition to overriding. When I shut down, put the DX2-80 back in, and booted, I found they were even more conservative. Cache write and memory access were both 2 WS, etc.

This Speedsys result is with some tuning of the hidden settings. Looks more like it!

I can't get cache timings below 3-2-2-2 nor can I get cache write below 1 WS. Anything I'd be doing wrong? Again cache sram is 15ns and tag ram is 5ns.

Attachment: sstimg08.png (10.21 KiB, public domain)

Reply 26 of 28, by jakethompson1

Rank Oldbie
mkarcher wrote on 2020-07-12, 22:38:

While auto-config is often quite conservative, keep in mind that if auto-config is enabled, auto-config overrides at least Cache Read, Cache Write and DRAM wait states, so the awfully slow values printed there are most likely not in effect unless you run FSB50.

Hey, I just realized the tag RAM, an HM3-65756F-5, is a 20ns chip, not a 5ns one, which doesn't exist (sneaky!).
The F means 20ns and the -5 means commercial grade.
This is probably why I can't get cache reads faster than 3-2-2-2 and writes faster than 1 WS, right?
Looks like I can replace all 9 chips with 12ns ones for $3 apiece, might do that just for fun just to see what happens.

Reply 28 of 28, by Anonymous Coward

Rank l33t

The tag RAM is normally recommended to be one speed grade faster than the rest of the cache. In theory, if you have 20ns cache, the tag should be 15ns, and if you have 15ns cache, the tag should be 12ns. In practice, however, this often isn't the case.
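The rule of thumb can be expressed as a lookup over SRAM speed grades. The 20, 15 and 12ns grades come from the posts; 25 and 10ns are assumed additional grades, and the helper names are hypothetical.

```python
# Common asynchronous SRAM speed grades in ns, slowest to fastest.
# 20, 15 and 12 ns come from the posts; 25 and 10 ns are assumed extra grades.
SPEED_GRADES_NS = [25, 20, 15, 12, 10]


def recommended_tag_ns(data_sram_ns: int) -> int:
    """Tag RAM one speed grade faster than the data SRAM."""
    i = SPEED_GRADES_NS.index(data_sram_ns)  # ValueError for grades not in the list
    if i + 1 == len(SPEED_GRADES_NS):
        raise ValueError("no faster grade in this table")
    return SPEED_GRADES_NS[i + 1]


def tag_meets_rule(data_sram_ns: int, tag_ns: int) -> bool:
    """True if the tag RAM is at least one grade faster than the data SRAM."""
    return tag_ns <= recommended_tag_ns(data_sram_ns)


print(recommended_tag_ns(20), recommended_tag_ns(15))  # 15 12
print(tag_meets_rule(15, 20))  # False: a 20 ns tag next to 15 ns data SRAM
```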