VOGONS


First post, by mkarcher

I just reverse engineered a lot of how the UM8672 VLB IDE interface chip works. I used a PT-627B just as shown in "Re: List of VLB IDE Controllers", but the findings should apply to any VL IDE interface card with that chip. As discussed in the linked thread, there is strong evidence that "UM8672" is the new name for the chip originally designed as "UM82C872".

The chip can be configured using I/O ports 108 (index port) and 109 (data port). The configuration mechanism gets unlocked by writing 0x5A to the index port (immediate effect without access to the data port), and locked by writing 0xA5 to the index port (again, immediate effect). Actual configuration access is performed using index values D0..DD. At least on the (possibly erratic) EXP4044 mainboard, performing memory access while the configuration is unlocked may corrupt other parts of the memory, so the tool I used for experimenting first loaded everything required into registers, then disabled interrupts, fetched the subsequent code (using REP LODSD) to have it in L1 cache, and then performed an unlock-index-data-lock access sequence without any intervening FSB cycles. This was the only way I could prevent system crashes when I started my tool from the Borland IDE.
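The access sequence described above can be sketched roughly as follows. This is only an illustration: `outb` is a hypothetical callback standing in for a real port write, the function name is made up, and the port numbers are taken as hex. On real hardware the four writes would additionally have to run with interrupts disabled and without intervening memory access, which Python cannot guarantee.

```python
INDEX_PORT = 0x108   # index port (108 in the post, assumed to be hex)
DATA_PORT = 0x109    # data port
UNLOCK = 0x5A        # written to the index port, takes effect immediately
LOCK = 0xA5          # written to the index port, takes effect immediately


def write_config(outb, index, value):
    """Unlock, select a register, write it, and lock again.

    outb(port, byte) is a hypothetical port-write callback.
    """
    if not 0xD0 <= index <= 0xDD:
        raise ValueError("configuration indices are D0..DD")
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    outb(INDEX_PORT, UNLOCK)
    outb(INDEX_PORT, index)
    outb(DATA_PORT, value)
    outb(INDEX_PORT, LOCK)


# Dry run: record the port writes instead of touching hardware.
log = []
write_config(lambda port, byte: log.append((port, byte)), 0xD6, 0x11)
```

After the dry run, `log` holds the four writes in unlock-index-data-lock order.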

The BIOS for the UM8672 posted in the other thread contains logic that alternately writes the index value to the index register and to the data register, starting and ending with a write to the index register, for a total of 511 writes (256 to the index register and 255 to the data register). This code path is only invoked if normal access to read from the UM8672 configuration space fails (it doesn't fail on my controller). This code path is only used before reading, not before writing, and it may or may not overwrite writeable bits in the register accessed by it.
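To make the write pattern concrete, here is a small sketch that only generates the list of writes; it does not touch hardware. The function name and the default port numbers (taken as hex) are assumptions, and nothing here says anything about whether the chip is unlocked while the BIOS does this.

```python
def fallback_writes(index, index_port=0x108, data_port=0x109):
    """Return the 511 (port, value) writes: index, data, index, ..., index.

    Every write carries the index value, as described in the post.
    """
    writes = []
    for i in range(511):
        # Even positions go to the index port, odd positions to the data port.
        port = index_port if i % 2 == 0 else data_port
        writes.append((port, index))
    return writes


seq = fallback_writes(0xD5)
index_writes = sum(1 for port, _ in seq if port == 0x108)
data_writes = sum(1 for port, _ in seq if port == 0x109)
```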

Let's start with discussing registers D6 and D7: Register D6 configures the "set-up time" for accesses to the primary channel, and D7 does the same job for accesses to the secondary channel. The setup time I measured is the time between the start of the VL cycle (the rising LCLK with /ADS asserted) and the activation of /DIOW or /DIOR on the IDE interface. I had my logic analyzer set to sample only on rising edges of LCLK, so the timings given in this "specification" may be slightly distorted due to quantization. Each of the registers D6 and D7 is split into two 4-bit halves (nibbles), with the least significant nibble configuring the setup time when the first drive (commonly known as "master") is selected, and the most significant nibble configuring the setup time when the second drive (commonly known as "slave") is selected. The setup time is 4 bus clocks plus the value programmed in the corresponding nibble. If a write to 1F6/176 changes the active drive, the set-up time for the drive that gets deselected is used.
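As a minimal sketch of the nibble layout just described (function and constant names are made up for illustration):

```python
SETUP_BASE_CLOCKS = 4  # setup time is 4 bus clocks plus the programmed nibble


def setup_clocks(reg_value, drive):
    """Setup time in LCLK clocks for a D6/D7 register value.

    drive 0 = first drive ("master", low nibble),
    drive 1 = second drive ("slave", high nibble).
    """
    if drive not in (0, 1):
        raise ValueError("drive must be 0 or 1")
    nibble = (reg_value >> 4) & 0xF if drive else reg_value & 0xF
    return SETUP_BASE_CLOCKS + nibble


# Example value: high nibble 2 for the slave, low nibble F for the master.
master = setup_clocks(0x2F, 0)
slave = setup_clocks(0x2F, 1)
```

With the example value 0x2F, the master gets 4 + 15 clocks and the slave 4 + 2 clocks.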

In case the register accessed is not the data register 1F0/170, the command active time (the time /DIOW or /DIOR is asserted on the IDE interface) is 3 clocks more than the low nibble of configuration register at index DD. This is not specific for each drive, but globally configured for all drives. This parameter is not critical for performance, as the time spent to set up an IDE command is usually negligible compared to the time spent transferring data. The VL cycle is completed by asserting LRDY on the VL side during the last clock of the command active time.

In case the access is a read access to the data register 1F0/170, the command active time is taken from the high nibble of register D0..D3 (D0 for the first drive on the primary channel till D3 for the second drive on the secondary channel). The active time is again 3 clocks higher than the programmed value. In case the access is a 32-bit access, there will be two cycles on the IDE side, which are separated by the recovery time, which is located in the low nibble of register D0..D3 (depending on the target drive). The recovery time is 4 clocks longer than the programmed value. A 32-bit VL cycle is completed by asserting LRDY during the last clock of the active time of the second IDE cycle.

In case the access is a write access to the data register, everything works the same way as for a read access, but the timings are taken from registers D8..DB instead of D0..D3.
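The register selection and nibble decoding for data-port accesses could be sketched like this. The names are hypothetical, and the ordering of the two middle registers (D1 = second drive on the primary channel, D2 = first drive on the secondary channel) is an assumption; the post only names the two endpoints.

```python
def data_timing_index(channel, drive, write):
    """Index of the data-port timing register for a drive.

    channel: 0 = primary, 1 = secondary; drive: 0 = first, 1 = second.
    Reads use D0..D3, writes use D8..DB. The ordering of the middle
    registers is assumed, not confirmed by the post.
    """
    if channel not in (0, 1) or drive not in (0, 1):
        raise ValueError("channel and drive must be 0 or 1")
    base = 0xD8 if write else 0xD0
    return base + 2 * channel + drive


def decode_data_timing(reg_value):
    """Return (active, recovery) in clocks for a D0..D3 / D8..DB value."""
    active = 3 + ((reg_value >> 4) & 0xF)   # high nibble, plus 3 clocks
    recovery = 4 + (reg_value & 0xF)        # low nibble, plus 4 clocks
    return active, recovery


example = decode_data_timing(0x58)
```

For the example value 0x58 this gives 8 clocks active time and 12 clocks recovery time.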

There also is a register at index DC which contains a two-bit field per drive (second drive of secondary channel in the top two bits, first drive of the primary channel in the bottom two bits), which does not have an effect I could observe with my experiments. Just like the other timing parameters, this register is programmed with lower values for faster speeds.

Finally, there is a read-only register at index D5 which reads A0, which is used to check for the presence of the UM8672 chip, and the bits 4 and 5 of index DD contain the jumper settings, which are sampled during reset only. As I did no pinout reverse engineering, I don't know for sure, but I expect the jumpers to be implemented as pullup/pulldown on pins used as output pins during normal operation, which prevents live sampling of the jumper setting.
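A rough sketch of the presence check and of splitting register DD into its two documented parts follows. `read_config` is a hypothetical callback that returns the byte at a configuration index; the meaning of the individual jumper values is not covered by the post, so the raw two-bit field is returned as is.

```python
def chip_present(read_config):
    """True if the read-only register at index D5 reads A0."""
    return read_config(0xD5) == 0xA0


def decode_dd(value):
    """Split register DD into the non-data active time and the jumper bits."""
    return {
        # Low nibble: active time for non-data registers, plus 3 clocks.
        "non_data_active_clocks": 3 + (value & 0x0F),
        # Bits 4 and 5: jumper settings, sampled during reset only.
        "jumpers": (value >> 4) & 0x3,
    }


# Dry run with a fake register file instead of real hardware.
fake_regs = {0xD5: 0xA0, 0xDD: 0x2B}
present = chip_present(fake_regs.get)
dd = decode_dd(fake_regs[0xDD])
```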

The BIOS contains FSB-specific mapping on two layers: First, it has a table of known non-PIO3 drives which get mapped to varying speeds. The table in the BIOS is a superset of the table shown in the printed manual at https://theretroweb.com/expansioncard/documen … f0529457776.pdf . The values in the BIOS seem to be more conservative for some drives than the values in the manual. I compared the first five drives:

Drive     BIOS      Manual
CP3044    3/2/1     3/2/1    (identical)
CP30084E  10/8/5    10/8/5   (identical)
CP30104   8/4/0     8/5/3    (slower in BIOS)
CP30104H  8/4/0     8/4/3    (slower in BIOS)
CP30174E  10/10/8   10/10/8  (identical)

The first set of three numbers is the set of "speed values" for FSB25/FSB33/FSB50 as configured by the BIOS, while the second set of three numbers is the set of "speed values" as printed in the manual. So you see that the speed values highly depend on the FSB clock, and it is thus expected that a certain "speed value" indicates certain fixed parameters for setup, active and dword-recovery time. Yet there are three tables mapping speed values to these parameters, one for each "supported" FSB clock (but note that the FSB25 setting is never used by that BIOS).

The speed settings for the data port are as follows, given as setup+active+recovery+active for a 32-bit VL cycle:

 0 = 19+18+19+18 (Jumper "<40MB HDD")
 1 = 15+15+15+15 (Jumper "40/50 MHz default")
 2 = 4+15+4+15 (Jumper "25/33 MHz default")
 3 = 6+8+12+8 for read; 6+15+7+15 for write
 4 = 6+6+10+6 for read; 6+15+7+15 for write
 5 = 5+6+7+5 for read; 5+15+7+15 for write
 6 = 5+5+7+5 for read; 5+15+7+15 for write
 7 = 4+9+4+9 (Jumper "CPUCLK <20MHz")
 8 = 5+5+5+5 for read; 5+9+8+9 for write
 9 = 5+4+5+4 for read; 5+9+8+9 for write
10 = 5+4+4+4 for read; 5+9+8+9 for write
11 = 5+3+4+3 for read; 5+9+8+9 for write

This is the 33MHz table. The 25MHz table (unused) is the same for reads, but uses 6/5+13+7+13 instead of 6/5+15+7+15 and 5+8+7+8 instead of 5+9+8+9 for writes. You might notice that the jumperable configurations take a different approach to timing than the other speeds: their setup and inter-word recovery times are equal, and so are their read and write timings, while the other speeds tend to use a longer recovery time than setup time, except for the fastest settings. I guess that the four jumperable timings might have been taken over from the earlier UM82C871 chip and kept the same for compatibility, or they might be what the chip does without BIOS assistance given those jumper settings. Note a mistake in the manual: The manual lists "active time" and "recovery time", but the values printed in the manual actually are the "active time" and the "cycle time", where the "cycle time" is the sum of the "active time" and the "recovery time".
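Combining the table with the register offsets given earlier (setup = 4 + nibble, active = 3 + nibble, recovery = 4 + nibble), one can work backwards from clock counts to register contents. The sketch below does that; the function name is made up, and the resulting bytes are computed from those formulas, not read out of the BIOS.

```python
def encode_read_speed(setup, active, recovery):
    """Map clock counts to (setup nibble, timing register byte).

    The setup nibble belongs in D6/D7, the byte in D0..D3 (high nibble
    active time, low nibble recovery time). Values are derived from the
    offsets described in the post, not taken from the BIOS tables.
    """
    s, a, r = setup - 4, active - 3, recovery - 4
    for name, nibble in (("setup", s), ("active", a), ("recovery", r)):
        if not 0 <= nibble <= 15:
            raise ValueError(f"{name} time does not fit a 4-bit field")
    return s, (a << 4) | r


speed3 = encode_read_speed(6, 8, 12)     # speed 3 read: 6+8+12+8
speed0 = encode_read_speed(19, 18, 19)   # speed 0: 19+18+19+18
```

Speed 0 comes out as all nibbles at their maximum of F, which fits its role as the slowest setting.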

It seems surprising that the inter-word recovery time for 32-bit access (e.g. 12 clocks for reads at speed 3) is bigger than the setup time (e.g. 6 clocks for reads at speed 3), but as the card is designed for the classic 486DX (without L1WB), you can be sure that there will be a cycle writing a 32-bit value to memory between two VL cycles accessing the IDE interface card. On the EXP4044 board with a non-clock-doubled 486 processor, there is a 6-clock gap between the VL cycles in REP INSD, so the actual burst timing is 12+8+12+8 at speed 3. On the other hand, REP OUTSD, if it hits L1 cache, may perform back-to-back cycles, so it stays at 6+15+7+15. This is an effective cycle time of 20 clocks per 16-bit word for reading and 21.5 clocks per 16-bit word for writing. Writing is noticeably slower than reading especially at higher speeds, which may be either for data integrity concerns (an overly fast read corrupts data once; an overly fast write corrupts data forever), or because writes are less common than reads, or because writes may be performed back-to-back. Possibly experimentation showed that writing is required to be slower for common drives than reading.
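The arithmetic behind the 20 and 21.5 clock figures can be written out as a short sketch (the function name is made up; the gap is simply added to the setup time, as in the 12+8+12+8 example):

```python
def clocks_per_word_32bit(setup, active, recovery, gap=0):
    """Average clocks per 16-bit word for a 32-bit VL cycle.

    gap is the idle time between two VL cycles and is added to the
    setup time; each 32-bit cycle moves two 16-bit words.
    """
    total = gap + setup + active + recovery + active
    return total / 2


read_speed3 = clocks_per_word_32bit(6, 8, 12, gap=6)    # REP INSD, 6-clock gap
write_speed3 = clocks_per_word_32bit(6, 15, 7, gap=0)   # REP OUTSD, back-to-back
```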

The 50MHz table has slight variations in read performance for speeds 5 and 6, and generally slower writes (15+9+15 instead of 15+7+15; 10+9+10 instead of 9+8+9).

Finally, non-data transfers use the same setup time as given in the data speed table, but an active time of 12/14/15 clocks for FSB25/33/50.

The speed determination of the BIOS assumes CPU clock equals FSB clock, so a clock-doubled processor at 50MHz will cause the FSB50 tables to be used, although that processor runs at FSB25. Furthermore, it does not take the different cycle counts of the Cyrix core into account. The Cx486DX2-80 scores an "Intel score" of 90.7MHz, so a Cx486DX-33 is supposed to score around 37 "Intel MHz", which will already suffice to also shift into the FSB50 configuration.

With not too old hard drives, using the FSB50 tables is not that bad. While you get the slowest command rate (active time 15 clocks), all hard drives that claim to support PIO3 with IORDY are configured to use "speed 10" if there is no non-PIO3 hard drive in the system, and "speed 9" otherwise - independent of FSB clock!

Any non-harddisk device, and any hard disk that doesn't claim to support PIO3 with IORDY is configured to the jumpered speed.
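The selection rule from the last two paragraphs can be summarised in a small sketch. The function name and the tuple layout are made up, and the sketch deliberately ignores the table of known non-PIO3 drives discussed earlier.

```python
def assign_speeds(drives, jumper_speed):
    """Pick a speed value per device.

    drives: list of (is_hard_disk, claims_pio3_iordy) tuples.
    PIO3-capable hard disks get speed 10, or speed 9 if any hard disk
    in the system does not claim PIO3 with IORDY. Everything else gets
    the jumpered speed. The known-drive table is not modelled here.
    """
    any_slow_hdd = any(hd and not pio3 for hd, pio3 in drives)
    fast_speed = 9 if any_slow_hdd else 10
    return [fast_speed if (hd and pio3) else jumper_speed
            for hd, pio3 in drives]


# A PIO3 hard disk next to a CD-ROM, then next to an older hard disk.
with_cdrom = assign_speeds([(True, True), (False, False)], jumper_speed=2)
with_old_hdd = assign_speeds([(True, True), (True, False)], jumper_speed=2)
```

In the first case the CD-ROM does not count as a non-PIO3 hard drive, so the hard disk keeps speed 10; in the second it drops to speed 9.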

So, what do the rates mean in practice? Assuming an overhead of 6 clocks between REP INSW / REP INSD iterations (as measured with the Intel 486DX in the EXP4044 board), the read burst data rates during data transfer are at FSB 25/33/40/50:

Speed  16-bit (REP INSW)                32-bit (REP INSD)
 0:    1.16 / 1.53 / 1.86 / 2.33 MB/s   1.25 / 1.65 / 2.00 / 2.50 MB/s
 1:    1.39 / 1.83 / 2.22 / 2.78 MB/s   1.52 / 2.00 / 2.42 / 3.03 MB/s
 2:    2.00 / 2.64 / 3.20 / 4.00 MB/s   2.27 / 3.00 / 3.64 / 4.55 MB/s
 3:    2.50 / 3.30 / 4.00 / 5.00 MB/s   2.50 / 3.30 / 4.00 / 5.00 MB/s (16bit == 32 bit in this case)
 4:    2.78 / 3.67 / 4.44 / 5.56 MB/s   2.94 / 3.88 / 4.70 / 5.88 MB/s
 5:    2.94 / 3.88 / 4.70 / 5.88 MB/s   3.45 / 4.55 / 5.52 / 6.90 MB/s
 6:    3.13 / 4.13 / 5.00 / 6.25 MB/s   3.57 / 4.71 / 5.72 / 7.14 MB/s
 7:    2.63 / 3.47 / 4.21 / 5.26 MB/s   3.13 / 4.13 / 5.00 / 6.25 MB/s (yeah, this one is "too slow" for this rank)
 8:    3.13 / 4.13 / 5.00 / 6.25 MB/s   3.84 / 5.08 / 6.15 / 7.69 MB/s
 9:    3.33 / 4.40 / 5.33 / 6.66 MB/s   4.16 / 5.50 / 6.66 / 8.33 MB/s
10:    3.33 / 4.40 / 5.33 / 6.66 MB/s   4.35 / 5.74 / 6.96 / 8.70 MB/s
11:    3.57 / 4.71 / 5.72 / 7.14 MB/s   4.76 / 6.29 / 7.62 / 9.52 MB/s

Please note that these are theoretical values considering just the data transfer phase, ignoring the command overhead. Also, these rates ignore timer interrupts, RAM refresh and so on, so you will not see anything quite close to those in actual benchmarks. I've seen "buffered reads" of 5.0MB/s at speed 10 @ FSB33 (theoretical burst rate: 5.7) and 5.7MB/s @ FSB40 (theoretical burst rate 7.0 MB/s). I might have added RAM wait states in my FSB40 benchmark, which might explain why the practical rate scales worse than the theoretical rate. Also note that none of these values actually reach the PIO3 rate of 11MB/s, even at FSB50.
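For anyone who wants to recompute the theoretical rates, here is a sketch of the arithmetic. It assumes the 33MHz read timings for every FSB column and a rate of FSB clock times bytes per iteration divided by clocks per iteration (setup + active, plus recovery + active for 32-bit, plus the 6-clock gap); that model reproduces the posted figures to within rounding. The names are made up.

```python
# Read timings (setup, active, recovery, active) from the 33MHz table.
READ_TIMINGS = {
    0: (19, 18, 19, 18), 1: (15, 15, 15, 15), 2: (4, 15, 4, 15),
    3: (6, 8, 12, 8), 4: (6, 6, 10, 6), 5: (5, 6, 7, 5),
    6: (5, 5, 7, 5), 7: (4, 9, 4, 9), 8: (5, 5, 5, 5),
    9: (5, 4, 5, 4), 10: (5, 4, 4, 4), 11: (5, 3, 4, 3),
}
GAP_CLOCKS = 6  # overhead between REP INSW / REP INSD iterations


def burst_rate(speed, fsb_mhz, dword):
    """Theoretical read burst rate in MB/s (10**6 bytes per second)."""
    setup, active1, recovery, active2 = READ_TIMINGS[speed]
    clocks = GAP_CLOCKS + setup + active1
    nbytes = 2
    if dword:
        # A 32-bit VL cycle adds a second IDE cycle after the recovery time.
        clocks += recovery + active2
        nbytes = 4
    return fsb_mhz * nbytes / clocks


for speed in sorted(READ_TIMINGS):
    row16 = " / ".join(f"{burst_rate(speed, f, False):.2f}" for f in (25, 33, 40, 50))
    row32 = " / ".join(f"{burst_rate(speed, f, True):.2f}" for f in (25, 33, 40, 50))
    print(f"{speed:2d}: {row16} MB/s   {row32} MB/s")
```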

Reply 1 of 3, by jakethompson1

You may have seen that the drivers have special case code to reprogram one of the chipset registers while accessing the configuration registers if the machine has a UMC 491 or 498 chipset. It's easiest to see in the NT 3.x driver as all the functions have export symbols - Delay_1T_498() and Restore_498() and same for 491. I couldn't figure out exactly why but it may be related to the case where the chip detection fails (which I have actually run into before - the Linux driver can fail to see the chip). Obviously this is a huge breakdown in abstraction since the machine may not even have a UMC chipset. I don't know if this has any bearing on your findings and the lock-up symptom without priming L1 cache.

You've found this chip has no readahead feature other than 32 to 16 bit translation, right?

Reply 2 of 3, by mkarcher

jakethompson1 wrote on 2025-11-23, 00:31:

You may have seen that the drivers have special case code to reprogram one of the chipset registers while accessing the configuration registers if the machine has a UMC 491 or 498 chipset. It's easiest to see in the NT 3.x driver as all the functions have export symbols - Delay_1T_498() and Restore_498() and same for 491.

I did not look at any drivers (yet), I just reverse engineered the BIOS and experimented with register settings and a logic analyzer. I had the sample clock taken from LCLK and data probes on /ADS, /LRDY (all of these signals taken from an empty VL slot), as well as /CS1Fx, /CS3Fx, /DIOR and /DIOW on the secondary IDE port.

jakethompson1 wrote on 2025-11-23, 00:31:

I don't know if this has any bearing on your findings and the lock-up symptom without priming L1 cache.

Possibly it does. If it configures the chipset to add extra wait states to VL cycles, this may work around FSB corruption issues.

jakethompson1 wrote on 2025-11-23, 00:31:

You've found this chip has no readahead feature other than 32 to 16 bit translation, right?

That's entirely correct. The chip (at least with the programming as performed by the BIOS) performs neither write posting nor readahead. The missing write posting will cause a performance hit as soon as the data to be written is not in L1 cache. I did not calculate theoretical write burst rates, but writes seem to be slow anyway as programmed by the BIOS. The Linux driver on the other hand programs write timings to be equal to the read timings.

Reply 3 of 3, by mkarcher

Well, don't get me started about that Windows NT 3.1 driver. It seems to be "works on my machine, let's ship it" quality.

  • As you already noticed, it has support to configure two contemporary UMC chipsets in a way that configuration of the chip is possible without encountering "issues", but it doesn't check whether the target chipset is actually present. Given the IO addresses 28/2A for the UM82C498 and C022/C024 for the UM82C491, this seems to be benign, as these addresses are uncommon chipset configuration addresses, so that code is unlikely to mess up other chipsets. Nevertheless, on the DataExpert EXP4044 with its Opti 802 chipset, some workaround seems required, as I experienced, which this driver does not provide.
  • Compared to the BIOS, they improved the bus speed detection code. It still measures Intel-Equivalent MHz (thus getting wrong results on Cyrix CPUs), but now it also times REP OUTSB to port 1F2, to actually get an impression of the bus speed, and then divides the CPU speed by 2 or 3 if appropriate. The algorithm is "if the CPU clock is below estimated bus clock + 15MHz, assume x1. Otherwise, if half the CPU clock is below estimated bus clock + 8MHz, assume x2. Otherwise, assume x3". The thresholds for 25/33/50 MHz are 27MHz and 36MHz. This code does not support x4 (AMD 5x86), but it has more issues. It issues REP OUTSB without setting up ESI, which obviously just happens to point to readable memory at that point. While I still didn't measure it, I assume a Cyrix 486DX-33 to score around 37 to 38 "Intel-equivalent MHz" in that test, pushing it into the FSB50 category. But worst of all, the code sets "Speed 1" before this measurement (which is a prudent thing to do), which sets all the drive-specific timing parameters to "Speed 1", but it does not configure the global non-data active time setting. As it is REP OUTSBing to a non-data port, that setting is relevant for the benchmark it does. Depending on FSB clock, the BIOS I had at hand configures this value to 14 or 15 clocks, so in that case, the assumption of 30 clocks cycle time (or so) may still hold, as the drive-specific setup time at speed 1 is 15 clocks.
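The multiplier guess quoted in the last bullet can be sketched as below. The function names are made up, and two details are assumptions: that the comparisons are strict, and that the 27MHz and 36MHz thresholds are applied to the CPU score divided by the guessed multiplier.

```python
def guess_multiplier(cpu_mhz, est_bus_mhz):
    """Clock multiplier guess following the algorithm quoted above."""
    if cpu_mhz < est_bus_mhz + 15:
        return 1
    if cpu_mhz / 2 < est_bus_mhz + 8:
        return 2
    return 3  # x4 parts such as the AMD 5x86 are not handled


def fsb_category(fsb_mhz):
    """Map an FSB estimate to 25/33/50 (thresholds 27MHz and 36MHz)."""
    if fsb_mhz < 27:
        return 25
    if fsb_mhz < 36:
        return 33
    return 50


# Intel DX2-66 on a 33MHz bus.
dx2 = guess_multiplier(66.0, 33.0)

# Cyrix 486DX-33: score scaled from the 90.7 "Intel MHz" of the Cx486DX2-80.
cyrix_score = 90.7 / 80 * 33
cyrix_mult = guess_multiplier(cyrix_score, 33.0)
cyrix_fsb = fsb_category(cyrix_score / cyrix_mult)
```

Under these assumptions the scaled Cyrix score of roughly 37 is treated as x1 and lands in the FSB50 category, matching the concern raised in the post.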