VOGONS


Worth it to try to push ISA bus past 12MHz?


Reply 20 of 35, by mkarcher

Rank l33t

Anonymous Coward wrote on 2019-08-18, 12:52:

Personally, I think a busmastering SCSI card over 12MHz is not a good idea. I don't run mine faster than 10MHz. If you're going to do it, at least use a 1542CF or CP. The 1542B is pretty ancient.

ISA is a strange thing. Most cycles on the ISA bus are not synchronous to the bus clock. For example, if you want to write an 8-bit value to an I/O port, you place the data value and the port address (A0..A15) onto the bus, and some time later (after the "setup time"), you assert /IOW for some minimum amount of time. If the device isn't able to take the value within the default minimum time, it can request extra time by pulling IOCHRDY low, which asks the bus owner to keep the address, data and /IOW signals active for as long as IOCHRDY stays low. As soon as IOCHRDY is no longer low and the minimum /IOW time has expired, the bus owner may release /IOW and the address and data lines. While this explanation is slightly simplified (e.g. not talking about AEN), it captures the philosophy of operation on the ISA bus. Note that I never wrote "clock" in the description of an ISA cycle, and that's not because I purposefully left it out, but because the standard ISA cycles are defined without referring to the CLK signal!
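To make the rule concrete, here is a toy model of when the bus owner may end the /IOW strobe. It is only a sketch of the description above: the function name and the nanosecond figures are made up for illustration and are not taken from the ISA specification.

```python
def iow_pulse_ns(min_pulse_ns: float, iochrdy_low_until_ns: float = 0.0) -> float:
    """Length of the /IOW pulse in a simplified ISA I/O write.

    The strobe may end only when BOTH conditions hold:
      - the minimum /IOW time has expired, and
      - the device no longer holds IOCHRDY low.
    Note that no bus clock appears anywhere in this model.
    """
    return max(min_pulse_ns, iochrdy_low_until_ns)


# Hypothetical numbers, for illustration only.
fast_device = iow_pulse_ns(min_pulse_ns=100.0)                              # never pulls IOCHRDY low
slow_device = iow_pulse_ns(min_pulse_ns=100.0, iochrdy_low_until_ns=250.0)  # asks for extra time
print(fast_device, slow_device)
```

A device that never touches IOCHRDY simply gets the default minimum pulse, while a slow device stretches the cycle to whatever it needs.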

If an Adaptec 1542 card is busmastering, it uses its own time source to drive the bus, and doesn't respect the ISA clock at all. For the 1542CF, which can go up to 10MB/s, the time /MEMW or /MEMR is active is quantized on a 50ns grid, i.e. derived from a 20MHz (or 10MHz using both edges) clock. The bus mastering aspect on the 1542 is not at all dependent on the bus clock. On the other hand, you are perfectly correct that the 1542 is a quite old card, and might not be able to cope with ISA cycles that are "too fast", but this does not relate to the bus master cycles, but to the BIOS memory reads and the I/O access to the ports 330-333. I don't know whether the Adaptec 1542 uses IOCHRDY to slow down the cycles if the board is "too fast", or just relies on a minimum time (which is specified in the ISA specification).

Actually, there are 3 ways of timing an ISA cycle: The default one is to not drive IOCHRDY, and rely on the default cycle time. Faster bus clocks can be compensated by adding extra wait states, so the cycle time stays the same. That's why many 386/486 boards add more wait states to 8-bit cycles (might target cards designed for 4.77MHz) than to 16-bit cycles (those cards should be designed for 6MHz at least, as this was the first AT clock frequency). I already explained the second way, which is slowing down a cycle using IOCHRDY. The third way is speeding up a cycle (AT only) by asserting /0WS (aka /NOWS), which instructs the mainboard to ignore the minimum cycle time and perform the bus cycle as fast as possible. Actually, zero-wait-state cycles are the only kind of cycles that do have a reference to the bus clock signal in the ISA specification. For the AHA1542, I don't expect it to use 0WS cycles at all, as the BIOS chip is "slow" anyway, and I/O access to the card is rare. So it should work perfectly at any ISA clock frequency, as long as the mainboard adds a sufficient amount of wait states to "default" cycles that the cycle time stays within spec.
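The idea of compensating a faster bus clock with extra wait states can be sketched like this. The 500ns target cycle time and the assumption that a cycle without wait states takes 2 bus clocks are illustrative placeholders, not values from the ISA specification or from any chipset data sheet.

```python
import math


def wait_states_needed(target_cycle_ns: float, bus_clock_mhz: float,
                       base_clocks: int = 2) -> int:
    """Wait states a board must add so that a default cycle lasts at
    least target_cycle_ns at the given bus clock.

    base_clocks is the assumed cycle length (in bus clocks) without any
    wait states; each wait state adds one more bus clock.
    """
    # Bus clocks needed to cover the target time; the small epsilon
    # guards against floating-point noise on exact multiples.
    total_clocks = math.ceil(target_cycle_ns * bus_clock_mhz / 1000.0 - 1e-9)
    return max(0, total_clocks - base_clocks)


# Hypothetical 500ns target for a "slow" card.
for mhz in (8, 12, 16):
    print(mhz, "MHz ->", wait_states_needed(500.0, mhz), "wait states")
```

Under these assumptions, doubling the bus clock from 8MHz to 16MHz raises the required wait-state count from 2 to 6, while the cycle time seen by the card stays the same.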

On the other hand, all "fast" graphics cards, like the CL-GD542x, the ET4000 or the TVGA8900D, do use the /0WS signal and suppress the default wait states. Running an ISA system at 16MHz thus just requires enough default wait states that "slow cards" still work; only the "fast cards" that use 0WS need to be 16MHz-capable. How the default wait states are configured depends on the chipset (if there is a chipset at all). Some 286 chipsets have the default wait-state count as hardware strap options, which are fixed on the board.

Reply 21 of 35, by Marco

Rank Oldbie

Thanks for the good insights !

1) VLSI SCAMP 311 | 386SX25@TI486SXLC2-50@63 | 16MB | CL-GD5434 | CT2830| SCC-1 | MT32 | WDC160GB/7200/8MB | Fast-SCSI AHA 1542CF + BlueSCSI v2/15k U320
2) SIS486 | 486DX/2 66(@80) | 32MB | TGUI9440 | LAPC-I

Reply 22 of 35, by Marco


I experienced an additional very "interesting" behaviour on my side:

- when using an oscillator of 36 or 40 MHz (which would normally lead to 18MHz or 20MHz ISA), the system switches to a /6 divider on its own, leading to a very low bus speed in the end (6MHz / 6,66MHz)
- when running e.g. a 27MHz oscillator, the standard divider of /2 is used
- there is no BIOS setting to manage this (at least none visible to me), neither in the BIOS setup nor in the AMISETUP tool

I will check the chipset documentation to find out more, and I will try to find the MHz threshold where the board starts to switch to an "emergency divider".

Update: 35MHz = 17,5MHz ISA works fine with the standard divider.
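For reference, the observed ISA clocks follow directly from oscillator frequency divided by the divider. A tiny sketch (the function name is just for illustration; the frequencies and dividers are the ones observed above):

```python
def isa_clock_mhz(osc_mhz: float, divider: int) -> float:
    """ISA bus clock derived from the oscillator by a fixed divider."""
    return osc_mhz / divider


# Oscillator / divider combinations from the observations above.
for osc, div in ((27, 2), (35, 2), (36, 6), (40, 6)):
    print(f"{osc}MHz / {div} = {isa_clock_mhz(osc, div):.2f}MHz ISA")
```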

Last edited by Marco on 2025-04-24, 16:32. Edited 1 time in total.


Reply 23 of 35, by Marco


Results: Apart from some synthetic benchmarks, I see no difference between 13,5 and 17,5 MHz ISA in the following benchmarks:

Doom low: same
PCPBench VGA: same
3DBench: goes from 31,2 to 32,x, but I sometimes think 3DBench has a fixed set of possible results

That’s all very strange. The only setting I had to adjust was increasing the 16-bit I/O WS by +1 due to HDD errors.

The GD5428 shows significant pixel errors. I can minimize them by overclocking the memory clock by +10%.


Reply 24 of 35, by mkarcher

Marco wrote on 2025-04-24, 16:30:

Results: Apart from some synthetic benchmarks, I see no difference between 13,5 and 17,5 MHz ISA in the following benchmarks:

Doom low: same
PCPBench VGA: same
3DBench: goes from 31,2 to 32,x, but I sometimes think 3DBench has a fixed set of possible results

That’s all very strange. The only setting I had to adjust was increasing the 16-bit I/O WS by +1 due to HDD errors.

This seems to prove that the bottleneck in your system is not the ISA bus. I don't see you mention your system in this thread. You mention the graphics chip, the GD5428. While it has mediocre (though not really bad) VL performance, it is quite good on the ISA bus, coping with 0WS. Nevertheless, there might be limits imposed by the memory bandwidth between the GD5428 and the graphics memory. As you say you increased the memory clock of your Cirrus chip and the scores still didn't change, it looks like the bottleneck of your system is most likely the CPU itself, or the main memory access.

Marco wrote on 2025-04-24, 16:30:

The GD5428 shows significant pixel errors. I can minimize them by overclocking the memory clock by +10%.

The Cirrus data sheet actually calls for a minimum memory clock depending on the write cycle duration: in table 7-3 ("Memory Write Timing (ISA Bus)"), the specification asks for the duration of the /SMEMW pulse for writing to be at least 3 times the memory clock period, and the recovery time between two writes to be 3 times the memory clock period as well, so you get 6 memory clocks per ISA write. An ISA write at 0WS takes two bus clocks, so the minimum memory clock is 3 times the ISA clock, assuming you get a 50:50 duty cycle on /SMEMW. This means the minimum memory clock for 12.5MHz ISA at 0WS is 37.5MHz, whereas at 17.5MHz, a memory clock of 52.5MHz would be required. This perfectly fits your observation that the default 50MHz memory clock of CL-GD5428 cards is insufficient to keep up with a 17.5MHz ISA clock.
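The arithmetic behind those numbers, as a short sketch. The function and parameter names are mine; the 3-MCLK pulse and recovery requirements, the 2-clock 0WS write and the 50:50 duty cycle are the figures quoted above.

```python
def min_mclk_mhz(isa_mhz: float, duty: float = 0.5, clocks_per_write: int = 2,
                 mclks_pulse: int = 3, mclks_recovery: int = 3) -> float:
    """Minimum memory clock so that both the /SMEMW pulse and the
    recovery time span the required number of MCLK periods."""
    cycle_us = clocks_per_write / isa_mhz   # duration of one 0WS ISA write
    pulse_us = duty * cycle_us              # /SMEMW active time
    recovery_us = (1.0 - duty) * cycle_us   # time between two writes
    return max(mclks_pulse / pulse_us, mclks_recovery / recovery_us)


for isa in (12.5, 17.5):
    print(f"ISA {isa}MHz -> MCLK >= {min_mclk_mhz(isa):.1f}MHz")
```

With a duty cycle other than 50:50, the shorter of the two phases dictates the requirement, so the minimum memory clock only gets higher.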

Reply 25 of 35, by Marco


Great conclusion esp. with the gd5428. Thx. My system is the one in my signature with the ti486sxlc2 cpu. Thus cpu bottleneck should hopefully be unlikely.


Reply 26 of 35, by mkarcher

Marco wrote on 2025-04-24, 18:08:

My system is the one in my signature with the ti486sxlc2 cpu. Thus cpu bottleneck should hopefully be unlikely.

In that case, I suspect the access to main memory through the FSB limits the DOOM performance. While the SXLC2 has a sensible amount of L1 cache, the amount of data moved in rendering DOOM frames clearly exceeds the cache size by a lot.

Reply 27 of 35, by Marco


Possible. During all tests the „FSB“ remained at 30MHz, as the MB chipset won’t run reliably at higher speeds (I assume limits of the old caps etc., as the board sometimes runs at 33MHz and sometimes not; if it runs, it runs flawlessly).

Update remark: in „vidspeed *“ I reach about 25MB/s read and 12-13MB/s write for 32-bit.


Reply 28 of 35, by mkarcher

Marco wrote on 2025-04-24, 20:40:

Possible. During all tests the „FSB“ remained at 30MHz, as the MB chipset won’t run reliably at higher speeds (I assume limits of the old caps etc., as the board sometimes runs at 33MHz and sometimes not; if it runs, it runs flawlessly).

Update remark: in „vidspeed *“ I reach about 25MB/s read and 12-13MB/s write for 32-bit.

25MB/s read from video memory over ISA is impossible. The theoretical maximum for ISA@17.5MHz is 17.5MB/s, and you are not going to get really close to that limit as long as the ISA bus also carries the memory refresh cycles that are mandatory according to the spec. Furthermore, you only get the good write performance on "modern" ISA VGA cards because they buffer writes and merge them internally into a bunch of 32-bit writes that are performed in fast page mode to the video RAM.

This approach doesn't work for reading in the same way as it works for writing: a card may claim "write finished" and perform a part of the actual write operation in the background ("posted write"), while a read operation needs to be finished (i.e. the data needs to be retrieved from video RAM) before the processor may move on to the next instruction. The mirror of "posted writes" would be "read ahead", in which a single (16-bit) memory read causes the card to fill a larger read-ahead buffer so consecutive reads can be fulfilled from that buffer without requiring video memory access. I don't know of any video card implementing read-ahead for ISA/VL memory reads (although "modern" VGA cards do implement read-ahead for the video scanout; they call it "FIFO"), so read performance is way worse than write performance. Did you misread the value by a decimal digit? 2.5MB/s sounds like a quite good, yet still plausible value.
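The theoretical maximum mentioned above is just one 16-bit transfer per 2-clock 0WS cycle. As a sketch (function name is illustrative; refresh overhead is deliberately ignored, so real numbers end up lower):

```python
def isa_peak_mb_per_s(isa_mhz: float, bytes_per_cycle: int = 2,
                      clocks_per_cycle: int = 2) -> float:
    """Peak ISA throughput: one 16-bit transfer per 0WS cycle of 2 bus
    clocks. MHz times bytes per clock gives MB/s (1 MB = 10^6 bytes)."""
    return isa_mhz * bytes_per_cycle / clocks_per_cycle


for isa in (12.5, 15.0, 17.5):
    print(f"ISA {isa}MHz -> at most {isa_peak_mb_per_s(isa):.1f}MB/s")
```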

12-13MB/s write performance is around what you would expect; it might be slightly higher, up to 15MB/s at 17.5MHz ISA clock. On the other hand, there will be synchronization losses between the 30MHz FSB clock and the 17.5MHz ISA clock, so it wouldn't surprise me if 17.5MHz ISA doesn't provide any advantage over 15MHz ISA (i.e. half the FSB clock). 12-13MB/s is the practical limit at 15MHz ISA clock, mainly due to refresh overhead.

If the front-side bus is blocked during the whole ISA write cycle on VGA memory writes, you should be able to identify a performance difference between ISA@12.5MHz (160ns per 2-clock write cycle, which would get rounded up to around 166ns, i.e. 5 FSB clocks @ 30MHz) and ISA@17.5MHz (114ns per 2-clock cycle, which could hit 133ns, i.e. 4 FSB clocks @ 30MHz). If DOOM is starved on FSB (which is IMHO an entirely valid assumption on any kind of 486SLC processor with clock doubling), the different duration of blocking the FSB would directly affect the DOOM benchmark score. So I expect your board to contain an ISA write buffer, taking the write cycle from the host processor and acknowledging it within 2 FSB clocks (3 FSB clocks if the board doesn't use pipelined cycles), and then finishing the ISA cycle while the FSB is already free for other tasks.

You only notice a difference between ISA@12.5 and ISA@17.5 if you are able to issue an FSB write cycle to ISA more often than every 5 FSB clocks (every 10 processor clocks). A REP MOVSW from a pre-rendered screen buffer to video memory should be able to hit a 4 FSB clock rate, so a performance difference should be visible if DOOM pre-renders into main memory and then copies to video memory, provided you manage to run your memory at 0WS (no idea whether this is possible at 30MHz FSB). At 1WS, the FSB actions of REP MOVSW will take 5 FSB clocks (3 for the read, 2 for the posted ISA write), so you are already topped out at 12.5MHz.

REP MOVSD wouldn't make any difference on a 486SLC as soon as the FSB is the bottleneck, as the FSB is just 16 bits wide, and the processor itself is fast enough to hit the FSB limit even with MOVSW (any Cx486SLC-derived CPU, even without clock doubling, should be able to hit the FSB limit on REP MOVSW if the source is not in L1; any clock-doubled CPU should even hit the FSB limit if the source is in L1). I don't know whether classic DOOM does pre-render into main memory, but I do know that FASTDOOM does.
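The rounding of an ISA write cycle up to whole FSB clocks can be sketched as below. This is a simplification that assumes the only synchronization cost is rounding up to the next FSB clock edge; the function names are mine.

```python
import math


def fsb_clocks_per_isa_write(isa_mhz: float, fsb_mhz: float,
                             isa_clocks: int = 2) -> int:
    """Whole FSB clocks covered by one ISA write of isa_clocks bus clocks.
    The small epsilon guards against floating-point noise on exact ratios."""
    return math.ceil(isa_clocks * fsb_mhz / isa_mhz - 1e-9)


def blocked_ns(isa_mhz: float, fsb_mhz: float) -> float:
    """Time the FSB would be held if it were blocked for the whole ISA write."""
    return fsb_clocks_per_isa_write(isa_mhz, fsb_mhz) * 1000.0 / fsb_mhz


for isa in (12.5, 17.5):
    print(f"ISA {isa}MHz: {fsb_clocks_per_isa_write(isa, 30.0)} FSB clocks, "
          f"{blocked_ns(isa, 30.0):.0f}ns")
```

At 30MHz FSB this gives 5 clocks for 12.5MHz ISA and 4 clocks for 17.5MHz ISA, which is why a loop that can only issue a write every 5 FSB clocks cannot benefit from the faster ISA clock.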

If DOOM instead renders textures directly from texture memory into video memory, software scaling is included in the loop, and it likely writes vertical lines, which would prevent the video card from merging writes into page-mode bursts, so the 12-13MB/s rate wouldn't be reached at all. OTOH, if you were running into that limit, increasing the Cirrus MCLK should be directly visible as a performance improvement.

Reply 29 of 35, by mkarcher

mkarcher wrote on 2025-04-25, 08:56:
Marco wrote on 2025-04-24, 20:40:

Possible. During all tests the „FSB“ remained at 30MHz, as the MB chipset won’t run reliably at higher speeds (I assume limits of the old caps etc., as the board sometimes runs at 33MHz and sometimes not; if it runs, it runs flawlessly).

Update remark: in „vidspeed *“ I reach about 25MB/s read and 12-13MB/s write for 32-bit.

25MB/s read from video memory over ISA is impossible.

Oops, I just found out that "vidspeed *" is supposed to measure main memory, not video memory. 25MB/s read at FSB30 from main memory sounds OK, although maybe that's just "from L1", not "from memory". For L1, REP LODSD is limited to 1 DWORD / 5 clock cycles (assuming a Cyrix execution core), which is 48MB/s, so I assume the block size exceeds the L1 cache size.
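The 48MB/s figure works out as follows, assuming the clock-doubled core runs at 60MHz (2 x 30MHz FSB); that core clock is my reading of the setup described in this thread, and the function name is just for illustration.

```python
def rep_lods_mb_per_s(cpu_mhz: float, bytes_per_iter: int = 4,
                      clocks_per_iter: int = 5) -> float:
    """Upper bound for REP LODSD out of L1: one DWORD (4 bytes) per
    5 core clocks. MHz times bytes per clock gives MB/s."""
    return cpu_mhz * bytes_per_iter / clocks_per_iter


print(rep_lods_mb_per_s(60.0))  # assumed 60MHz core clock
```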

12MB/s write is disappointing, though.

Reply 30 of 35, by Marco


Thanks. I re-ran the test. The results are slightly better than what I wrote before, and they are the same no matter whether the ISA clock is 13,75 or 17,5 MHz.
DRAM WS=0


Reply 31 of 35, by mkarcher

Marco wrote on 2025-04-25, 10:12:

Thanks. I re-ran the test. The results are slightly better than what I wrote before, and they are the same no matter whether the ISA clock is 13,75 or 17,5 MHz.
DRAM WS=0

"vidspeed *" tests the main memory, not the video memory. You shouldn't get different results depending on the ISA clock in that benchmark. And it's 30MB/s write performance now (FSB limit!) and 17MB/s read performance. That's not optimal, but it might be the fastest you can get on that board. You clearly see that 32 bit reads are faster than 16 bit reads, so obviously 16-bit reads hit a CPU limit. The bus limit is the same for both 16 and 32 bit reads, as the CPU itself splits 32-bit reads into 16-bit reads.

Reply 32 of 35, by Marco


Thanks a lot for your help and detailed insights. For an SX board w/o L2 cache, the results aren’t that bad. I also understand that it’s mainly an FSB limitation. Thanks for that.

So next objective will be how to increase the FSB.


Reply 33 of 35, by MikeSG

Rank Member
Marco wrote on 2025-04-24, 16:30:

Results: Apart from some synthetic benchmarks, I see no difference between 13,5 and 17,5 MHz ISA in the following benchmarks:

Doom low: same
PCPBench VGA: same
3DBench: goes from 31,2 to 32,x, but I sometimes think 3DBench has a fixed set of possible results

That’s all very strange. The only setting I had to adjust was increasing the 16-bit I/O WS by +1 due to HDD errors.

The GD5428 shows significant pixel errors. I can minimize them by overclocking the memory clock by +10%.

I get the same with my 386sx and a GD5429. 3DBench score increases a little, everything else the same.

It struck me that memory (system RAM) was the bottleneck.

I know you talked about video memory speed, but I've overclocked that on the GD5429 from 50MHz to 80MHz and 3DBench only increases by 0.1FPS.

The bottlenecked system RAM is shared by both system & video card, so the more the L1 cache is used (3dBench) the more the high ISA clock can be utilised.

The Chips 82C836 chipset I use has an 8-bit fast video memory mode, which makes me think that if the video card uses less RAM to begin with, it might use the system RAM less, but 8-bit cards aren't all that fast.

Reply 34 of 35, by Marco


Thanks for the confirmation. And wow, from 50 to 80 is a lot. You will quite probably notice differences in GDI acceleration performance when benching with Wintach.


Reply 35 of 35, by Marco


I made a rare comparison of 16MHz ISA asynchronous vs 15 MHz ISA synchronous Bus Speed based on the 386 board in my

Although the synchronous ISA bus clock is 1MHz lower, it makes some difference in synthetic I/O benchmarks for video and especially IDE.

No game changer but maybe interesting 😀

Sorry for the layout. Via mobile.

Another remark: overclocking the MCLK of the GD5428 has a direct impact on the read results of vidspeed L. Increasing the clock from 50MHz to 53,5MHz raised the vidspeed L result to 9,95/4,2.

Update: (while writing this I noticed that I had to increase the 16-bit I/O WS by +1 when running at 16MHz, so maybe that’s the reason 😀) Deleted, as the additional WS has nearly no impact.
