VOGONS


Reply 20 of 48, by megatron-uk

Been playing around with this setup a bit more - I really like it.

I managed to work out how to shadow the XT-IDE BIOS, so IDE transfer rates have more than doubled (to ~2.7MBytes/sec). I also installed "The Last Byte" memory manager, which has specific support for the Symphony SL82C461 chipset and releases up to 256KB of upper memory.

The only trouble I've still got is the need to replace the Dallas RTC (it's now socketed, so I need to either butcher a 1287 or fit a 'new' one), and some bizarre behaviour from speedsys: it locks up with the 486DLC chip fitted, and with the 386DX fitted it hangs at the extended memory test. Other than that, everything seems good. I'll probably not bother tracking down any further 386 boards; it's pretty clear that even if you find one of the mythical, working 386DX/486 Forex boards, the difference is only single percentage points, if anything.

One more weird thing, and it's not related to the motherboard as such, but I cannot get a 64GB Sandisk Extreme CF card working - it partitions, formats and sys's fine via a Win98SE boot floppy, but on booting via the XT-IDE boot menu I consistently get 'missing operating system'. Switch to an equivalent Sandisk Extreme 16GB card and the same procedure works perfectly.

Nearly completed all the testing I want to do (DX vs DLC, cache enabled vs disabled, 8/10/13MHz ISA clock comparison, Trident vs Tseng vs Cirrus) and will get the data up here once I've finished.

Benchmarking/progress page on my wiki: https://www.target-earth.net/wiki/doku.php?id … 86_shootout_fic

My collection database and technical wiki:
https://www.target-earth.net

Reply 21 of 48, by megatron-uk

Full set of results from testing.

The configurations are:

Config 0
386DX-40
16MB 60ns RAM
256KB motherboard cache at 15ns
VGA, System & XTIDE ROM shadowing enabled
0ws added
ISA Clock @ 10MHz

Config 1
386DX-40
16MB 60ns RAM
256KB motherboard cache at 15ns
VGA, System & XTIDE ROM shadowing enabled
0ws added
ISA Clock @ 13MHz

Config 2
486DLC-40
16MB 60ns RAM
1KB CPU cache DISABLED
256KB motherboard cache at 15ns
VGA, System & XTIDE ROM shadowing enabled
0ws added
ISA Clock @ 10MHz

Config 3
486DLC-40
16MB 60ns RAM
1KB CPU cache ENABLED
256KB motherboard cache at 15ns
VGA, System & XTIDE ROM shadowing enabled
0ws added
ISA Clock @ 10MHz

Config 4
486DLC-40
16MB 60ns RAM
1KB CPU cache ENABLED
256KB motherboard cache at 15ns
VGA, System & XTIDE ROM shadowing enabled
0ws added
ISA Clock @ 13MHz

In addition, each configuration tests a Trident 9000, Tseng Labs ET4000AX and a Cirrus Logic GD5428 as VGA #1, #2 and #3, respectively.

Disk IO is via a Sandisk Ultra 16GB 50MB/sec compact flash card connected as primary master on an Acer M5105 multi-IO card.

FIC386SC_results.png

Charts up next...

Last edited by megatron-uk on 2021-05-15, 22:21. Edited 1 time in total.

Reply 22 of 48, by megatron-uk

Disk throughput with XT-IDE ROM shadowed to RAM:

FIC386SC_diskio.png

You absolutely must try to enable shadowing for the XT-IDE BIOS. On my previous motherboards where I have used it the option wasn't available, but on this board I was able to activate shadowing for the region where the XT-IDE BIOS is located, and the disk throughput scores improved by well over 100%: hovering just over the 1MB/sec level with a non-shadowed ROM, and up to 2.7MB/sec when shadowed. Increasing the ISA clock also makes a significant improvement to the throughput, whether using the shadowed ROM or not.
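
To put the "well over 100%" figure in context, here is a quick worked calculation. The unshadowed value is rounded down to 1.0MB/sec for illustration (the post only says "just over 1MB/sec"), so the result is approximate.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


unshadowed_mb_s = 1.0   # assumption: "just over 1MB/sec", rounded down
shadowed_mb_s = 2.7     # figure quoted in the post

print(f"Shadowing gain: roughly {percent_gain(unshadowed_mb_s, shadowed_mb_s):.0f}%")
```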

Synthetic CPU/FPU scores from Landmark:

FIC386SC_cpu.png

Synthetic Dhrystone/Whetstone scores:

FIC386SC_dhry.png
FIC386SC_whet.png

VGA Performance metrics:

FIC386SC_vga.png
Last edited by megatron-uk on 2021-05-15, 22:14. Edited 1 time in total.

Reply 23 of 48, by megatron-uk

Main RAM throughput results - this deserves some explanation. You would think that with the speed to memory being set the same during each test (same number of wait states, same processor frequency) the throughput would be the same. Well, it's not as simple as that - adding in the on-CPU cache effectively introduces another bottleneck to accessing main RAM (albeit a very fast one). Each level of cache that you need to read from or write to in order to reach the memory location you want in DRAM is another element which slows down your effective main RAM access.

It's counter-intuitive at first, and looking at the memory throughput figures only tells half the story - you also need to consider that at each stage you've got that small, incredibly fast SRAM cache able to serve a certain percentage of your memory requests before having to hit DRAM. The overall effect is to speed up the majority of memory requests, increasing overall memory bandwidth, but increasing latency for those requests not served from the cache:

FIC386SC_ram.png
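
One way to picture the argument above is a simple average-access-time model, sketched below. It assumes each cache level is checked in turn and that a miss adds that level's lookup time to the request; whether this board really behaves that way is not established here. All hit rates and latencies are made-up placeholders, not measurements.

```python
def average_access_ns(levels, dram_ns):
    """Average access time when each level is checked in turn.

    levels: list of (hit_rate, lookup_ns) tuples, nearest-to-CPU first.
    """
    total = 0.0
    reach = 1.0  # fraction of requests that get as far as this level
    for hit_rate, lookup_ns in levels:
        total += reach * lookup_ns   # every request reaching this level pays the lookup
        reach *= 1.0 - hit_rate      # only the misses carry on outward
    return total + reach * dram_ns


def full_miss_ns(levels, dram_ns):
    """Latency of a request that misses every cache and ends up in DRAM."""
    return sum(lookup_ns for _, lookup_ns in levels) + dram_ns


# Placeholder figures, NOT measured on this board.
DRAM_NS = 150.0
L2 = (0.90, 50.0)   # hypothetical motherboard cache: 90% hits, 50ns lookup
L1 = (0.60, 25.0)   # hypothetical on-CPU cache: 60% hits, 25ns lookup

for name, levels in [("DRAM only", []), ("L2 + DRAM", [L2]), ("L1 + L2 + DRAM", [L1, L2])]:
    print(f"{name:15s} average {average_access_ns(levels, DRAM_NS):6.1f}ns, "
          f"full miss {full_miss_ns(levels, DRAM_NS):6.1f}ns")
```

With these placeholder numbers the average access time falls as caches are added, while the time for a request that misses everything rises, which is the trade-off described in the post.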

Wolfenstein 3D benchmark results:

FIC386SC_wolf3d.png

Formula 1 GP benchmark results - the lower bound figure is obtained on the start grid, the upper figure after all cars start moving from the grid:

FIC386SC_f1gp.png

3DBench results - all best results were achieved with the Cirrus Logic CL-GD5428. At worst it matched the ET4000AX, and at best it was just over 1fps faster. The Trident card trailed both the ET4000 and 5428 by ~2-3fps in every run, and by even more in Wolfenstein 3D:

FIC386SC_3dbench.png

Reply 24 of 48, by megatron-uk

Here's the composite chart of the memory bandwidth available in each of the configurations:

FIC386SC_ram2.png

You can see the impact that the increased latency of accessing main memory has on its throughput, but obviously that is offset by the increasing amounts of cache that you put in front of that (comparatively) slower DRAM transfer.

What's interesting is that the DLC, even with its internal cache disabled, takes a slight performance hit accessing main memory compared to the DX; it's small, but measurable.

Reply 25 of 48, by Deunan

Your 3DBench results are a bit better but in general on par with mine (Trident TVGA8900D with 1M of VRAM, that gives me 32-bit internal bus, and 0WS jumper is also present) and I'm not overclocking ISA bus.
The Landmark results though are not that impressive. For 386DX-40 and CX-83D87-40 I get 70 MHz AT, 144 MHz 80287. CPU result is OK, but your FPU seems kinda low. I wonder if that Cyrix of yours is slower than FASMATH variants?
And then there's the DLC, you get 134 for CPU with cache enabled, I get 168 (and 164 for NPU). Seeing these results and those regressions in memory throughput (your explanation doesn't quite convince me) I do wonder if your CPU cache settings are correct? This almost looks like you get some penalty for too frequent cache flushing (and yes, that can apply too even if the internal cache is disabled via CR0 bit).

If you let the BIOS handle these settings you should investigate if they are correct. The BIOS in the mobo I got my results on offers FLUSH as an option but in reality that pin is not connected on the CPU, so even if the chipset does support it, I can't use it. I have to stick with BARB and fortunately I have a hidden refresh option that works.

Reply 26 of 48, by pshipkov

what is the reason to test lower isa bus frequencies ?
obviously the higher the better.
finding and sharing best perf is what matters.
not a complaint really, just saying.

otherwise, another good Symphony board.
too bad it does not go past 40, or does it ?

retro bits and bytes

Reply 27 of 48, by megatron-uk

pshipkov wrote on 2021-05-16, 01:57:
what is the reason to test lower isa bus frequencies ?
obviously the higher the better.
finding and sharing best perf is what matters.
not a complaint really, just saying.

otherwise, another good Symphony board.
too bad it does not go past 40, or does it ?

The board defaults to 8MHz ISA clock, I picked 10MHz as a baseline but wanted to also see if the various cards were stable at 13MHz (and what the performance delta is) - other than one benchmark (CheckIt2) with one card (the CL-GD5428) it's perfectly stable at 13MHz. Going up to the next speed increment (16MHz) I get memory errors from HIMEM about not being able to control the A20 gate, so 16MHz was a no-go.

I've not yet tried anything above 40MHz as I need some 80-something MHz oscillators to do so; I haven't got any at the moment - just the one 80MHz part.

Reply 28 of 48, by megatron-uk

Deunan wrote on 2021-05-15, 23:51:
Your 3DBench results are a bit better but in general on par with mine (Trident TVGA8900D with 1M of VRAM, that gives me 32-bit internal bus, and 0WS jumper is also present) and I'm not overclocking ISA bus.
The Landmark results though are not that impressive. For 386DX-40 and CX-83D87-40 I get 70 MHz AT, 144 MHz 80287. CPU result is OK, but your FPU seems kinda low. I wonder if that Cyrix of yours is slower than FASMATH variants?
And then there's the DLC, you get 134 for CPU with cache enabled, I get 168 (and 164 for NPU). Seeing these results and those regressions in memory throughput (your explanation doesn't quite convince me) I do wonder if your CPU cache settings are correct? This almost looks like you get some penalty for too frequent cache flushing (and yes, that can apply too even if the internal cache is disabled via CR0 bit).

If you let the BIOS handle these settings you should investigate if they are correct. The BIOS in the mobo I got my results on offers FLUSH as an option but in reality that pin is not connected on the CPU, so even if the chipset does support it, I can't use it. I have to stick with BARB and fortunately I have a hidden refresh option that works.

Yes, using the cyrix.exe utility to control cache behaviour is one area that I have not yet investigated - by all accounts most motherboard support for the DLC/SXL on-chip cache is brain dead, so this is something I am eager to try next.

Reply 29 of 48, by megatron-uk

Deunan wrote on 2021-05-15, 23:51:
Your 3DBench results are a bit better but in general on par with mine (Trident TVGA8900D with 1M of VRAM, that gives me 32-bit internal bus, and 0WS jumper is also present) and I'm not overclocking ISA bus.
The Landmark results though are not that impressive. For 386DX-40 and CX-83D87-40 I get 70 MHz AT, 144 MHz 80287. CPU result is OK, but your FPU seems kinda low. I wonder if that Cyrix of yours is slower than FASMATH variants?
And then there's the DLC, you get 134 for CPU with cache enabled, I get 168 (and 164 for NPU). Seeing these results and those regressions in memory throughput (your explanation doesn't quite convince me) I do wonder if your CPU cache settings are correct? This almost looks like you get some penalty for too frequent cache flushing (and yes, that can apply too even if the internal cache is disabled via CR0 bit).

If you let the BIOS handle these settings you should investigate if they are correct. The BIOS in the mobo I got my results on offers FLUSH as an option but in reality that pin is not connected on the CPU, so even if the chipset does support it, I can't use it. I have to stick with BARB and fortunately I have a hidden refresh option that works.

Just running cyrix.exe with the MR BIOS '486 CPU Cache: on' set:

IMG_2054.JPG

Reply 30 of 48, by megatron-uk

Ok, trying the following:

cyrix.exe -e -b- -f -m- -xA000,128 -xC000,256
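
For anyone reading along, the sketch below shows which physical address ranges the two -x arguments in the command above would cover. It ASSUMES that -x takes a real-mode segment followed by a size in KB and marks that region as non-cacheable; that reading is not confirmed anywhere in this thread, so check the utility's own documentation before relying on it.

```python
def region_bounds(segment_hex, size_kb):
    """Return (start, end) physical addresses for a segment,size-in-KB pair.

    Assumption: the argument format is <real-mode segment>,<size in KB>.
    """
    start = int(segment_hex, 16) << 4        # real-mode segment -> physical address
    end = start + size_kb * 1024 - 1         # inclusive end address
    return start, end


for seg, kb in [("A000", 128), ("C000", 256)]:
    start, end = region_bounds(seg, kb)
    print(f"{seg},{kb} -> {start:05X}h-{end:05X}h")
```

Under that assumption the first range is the standard VGA memory window (A0000h-BFFFFh) and the second is the adapter ROM and system BIOS area (C0000h-FFFFFh).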

IMG20210516104700.jpg

Landmark CPU pretty much matches what @Deunan was saying - though FPU is the same (I thought the last, cx87DLC FPU was the best?), video throughput is ~200 chars/sec better too. At the moment I'm re-imaging my CF card, so haven't had a chance to look at any memory bandwidth tests yet.

I need to test that FLUSH works properly though - there's a couple of threads that I need to re-read through with some test software to ensure that cache coherency is maintained and thus whether the motherboard FLUSH support is correct, or if BARB is needed instead. Either way it looks like the default MR BIOS "Enable 486 cache" option doesn't set things up effectively.

Reply 31 of 48, by megatron-uk

Have run a few more tests with the cyrix utility and I'm seeing a very modest increase in memory bandwidth (CPU cache increases in the order of +1MB/sec, motherboard cache of +~400KB/sec and DRAM of about the same). No earth-shattering improvements, but improvements nonetheless.

I'll see if any of the more real-world benchmarks show noticeable improvements.

Reply 32 of 48, by megatron-uk

Actually, scratch all that - I can't get that 167MHz Landmark result to run consistently when a memory manager is loaded. If himem.sys or one of the TLB (The Last Byte) drivers is loaded then I inconsistently flip between that improved 167MHz score and the ~130MHz rating I was getting previously.

If I run without any memory managers loaded to remove that variable, neither FLUSH nor BARB mode appear to make any difference.

I can do "cyrix.exe -f- -b" to enable BARB mode, run landmark or comptest, then flip to "cyrix.exe -b- -f" and run them both again and there is no difference in the cpu rating or memory throughput scores.

Reply 33 of 48, by megatron-uk

Hmm... the more I look at actual real-world performance numbers (Wolf 3D, 3DBench, PCPlayer Benchmark) the less I'm convinced that I'm actually losing any real performance.

This thing is pulling 40fps in Wolf3d, 23.8fps in 3DBench and 6.7fps in PCP - all of which put it right at the top, or almost the top of equivalent 40MHz DLC scores in Phil's benchmark database.

I've not got any Doom numbers until now - as I was trying to limit the range of tests to those I had already run for my range of 286 boards... but it won't hurt to do some one-off tests to see if the real-world performance holds true for that title as well.

Reply 34 of 48, by Deunan

I always test without memory managers, not even HIMEM loaded - although that one shouldn't really matter. But then again Landmark has some issues every now and then on CPUs faster than 386DX and I noticed having HIMEM.SYS loaded actually helps it to hang/report garbage less. Difference in loading address?

On DLC chips FLUSH often seems to work OK but that's only because the 1KiB L1 is so small it'll get evicted before it has a chance to cause collision. You'll get more issues with SXL(C) chips that have 8KiB, usually the first thing is corruption on data from floppy. So if BIOS sets BARB, stick to that unless you can see a performance difference (no hidden refresh). The only real issue (with FLUSH) would be DMA that writes to memory, so unless you are doing sampling with your sound card it's only the FDC that will be affected.
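
The DMA problem described above can be shown with a toy model. This is a deliberately simplified sketch, not a model of the Cyrix hardware: the class and function names are invented for illustration, and "flush on DMA" just stands in for whatever mechanism (FLUSH pin or BARB) invalidates the cache when another bus master writes to memory.

```python
class ToyCache:
    """Tiny read cache in front of a dict that stands in for DRAM."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def cpu_read(self, addr):
        if addr not in self.lines:            # miss: fetch from DRAM and keep a copy
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]               # hit: served from the cached copy

    def flush(self):
        self.lines.clear()


def dma_write(memory, addr, value, cache=None, flush_on_dma=False):
    """Device writes straight to DRAM, bypassing the CPU cache."""
    memory[addr] = value
    if flush_on_dma and cache is not None:
        cache.flush()                         # invalidation keeps the cache coherent


for flush_on_dma in (False, True):
    dram = {0x1000: "old sector"}
    cache = ToyCache(dram)
    cache.cpu_read(0x1000)                    # CPU caches the old contents
    dma_write(dram, 0x1000, "new sector", cache, flush_on_dma)
    print(f"flush_on_dma={flush_on_dma}: CPU sees {cache.cpu_read(0x1000)!r}")
```

Without invalidation the CPU keeps reading the stale copy, which is the kind of floppy data corruption mentioned above; a small cache only hides this because the stale line tends to be evicted first.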

And try both disabling A20M input as well as not prohibiting the first 64k segments from being cached. There are really not many programs that would try to wrap around the 20-bit address space, so it's pretty safe to assume all accesses beyond 1MiB are with A20 unmasked. Also there might be some more HDD performance to be found if you only excluded the VRAM from caching regions, but not the BIOS and extensions - exactly because you use XTIDE.
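
For reference, the wrap-around in question can be sketched as below. It shows how a real-mode segment:offset pair just past 1MiB either lands in the first 64k (A20 masked) or in the start of extended memory (A20 unmasked). The function name is illustrative only.

```python
A20_BIT = 1 << 20


def physical_address(segment, offset, a20_enabled):
    """Real-mode address calculation with optional A20 masking."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        linear &= ~A20_BIT    # address line 20 forced low: wraps back to the bottom
    return linear


# FFFF:0010 is the first byte past the 1MiB boundary.
print(f"A20 masked:   {physical_address(0xFFFF, 0x0010, False):06X}h")
print(f"A20 unmasked: {physical_address(0xFFFF, 0x0010, True):06X}h")
```

With A20 masked the access aliases onto address 0, which is why the first 64k of a megabyte is the region that matters for cache coherency when the mask state can change.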

Reply 35 of 48, by megatron-uk

I've got the first 64k of each MB now being cached again and am seeing minor performance gains. One bonus is that Landmark scores are now consistent whether himem is loaded or not.

It does feel like I'm shaving a Yak now though - chasing smaller and smaller gains.

In addition to a Wolf3d score of 40.7 fps, Landmark scores of 167MHz cpu / 161MHz fpu, 3dBench of 24.8fps I am seeing 6.8fps in PCPBench. These all appear to match well configured DLC40 and 486DX33 systems, so it's pretty much on the money in terms of real world speed.

One problem I'm coming across is an inability to launch Doom though; whether it's Phil's benchmark pack or my registered Doom SE, they both hang at "initialising Doom refresh daemon"... so there's something amiss there. Plenty of other protected mode stuff I've tried does work though.

Speedsys still doesn't run, either: after detecting all the devices it starts to redraw the screen, prints "Processor: Cyrix", then hangs without drawing any further UI elements.

Reply 36 of 48, by pshipkov

Speedsys hanging with 486 dlc is actually a common problem.
Recently this was discussed in another thread.
Barb/flush/i1/i2 flags combo can unclog it eventually, as well as relaxed wait states.
But in general - it is not indicative of system instability.
Doom on the other hand is 😀

Reply 37 of 48, by megatron-uk

Ok, playing around with memory sticks now - swapped out my 4x 4MB 60ns 'known working' set and fitted the 8x 70ns 1MB parts that came with the board. It's not happy at 0ws any more due to the slower parts, but at 1ws it runs fine, the memory throughput is ~1MB/sec faster (some sort of interleaving between banks?) and the Doom test runs.

It's pointing to my 'trusty' 4MB memory being the issue for that particular problem.

According to Phil's DOS benchmark pack, the 'low detail' Doom timedemo returns: 2134 gametics, 1627 realtics
The 'high detail' timedemo returns: 2134 gametics, 5835 realtics

If true, that's the best Doom realtics score from Phil's benchmarking spreadsheet for a DLC40; I calculate the high detail run to be 12.8fps.
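
For anyone wanting to check the conversion, the usual way to turn timedemo output into a frame rate is sketched below. It relies on Doom's game clock running at 35 tics per second; the function name is just for illustration.

```python
DOOM_TICS_PER_SECOND = 35


def timedemo_fps(gametics, realtics):
    """Average frames per second from Doom's timedemo counters."""
    return gametics * DOOM_TICS_PER_SECOND / realtics


print(f"low detail:  {timedemo_fps(2134, 1627):.1f}fps")
print(f"high detail: {timedemo_fps(2134, 5835):.1f}fps")
```

The high detail figure works out to the 12.8fps quoted above.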

Reply 38 of 48, by megatron-uk

The tweaks to the level 1 cache setting using the cyrix.exe tool have netted an approximate 1.5MBytes/sec improvement to the main memory throughput versus the default configuration as enabled via the MR BIOS "enabled 486 cpu cache" option.

The synthetic results from Landmark are now consistently >167MHz CPU and >160 FPU.

The very last set of results in each test was taken with the 8x 1MB 70ns modules at 1ws in the BIOS, versus 4x 4MB 60ns modules at 0ws. The use of those 8x modules improved main memory throughput by another 1MBytes/sec, for an overall improvement of ~2.5MBytes/sec. If I can get a set of 8 tested, higher performance modules that work at 0ws, this should improve things again.

3DBench scores improved by another 0.5fps and HDD transfers by ~300KBytes/sec compared to enabling the CPU cache without cyrix.exe.

I need to go back through the previous tests to see what impact all of the changes had on the Doom metrics, but for the final configuration I get low detail framerates for timedemo 3 of 46.2fps and high detail rates of 12.8fps.

I'm quite pleased with the way it is performing now.

FIC386SC_ram2.png
FIC386SC_cpu.png
FIC386SC_wolf3d.png
FIC386SC_3dbench.png

Reply 39 of 48, by megatron-uk

Disk IO metrics:

FIC386SC_diskio.png
