VOGONS


First post, by Disruptor

Rank Member

I'd like to present a fact:

I have a 486/160 with a HighPoint IDE controller and an 80 GB HDD that is definitely too fast for this computer.

In my test, I use the command line (command.com or cmd.exe) to copy a 130 MB file to the NUL device:

copy /b file NUL

Under plain DOS I get over 30 MB/s
With smartdrv loaded, performance drops below 20 MB/s (tested with a smaller file)
Under Windows 98 SE I get ~ 12 MB/s
Under Windows 2000 I get below 10 MB/s
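
For anyone who wants to repeat this kind of measurement on a system where Python is available, here is a minimal sketch of the same idea as `copy /b file NUL`: read the whole file, throw the data away, and divide bytes by elapsed time. The function name and the 64 KB chunk size are my own assumptions, not part of the original test. Note that `buffering=0` only turns off Python's own buffer; the OS disk cache still applies.

```python
import time

def read_throughput_mb_s(path, chunk_size=64 * 1024):
    """Read a file start to finish, discard the data, return MB/s (1 MB = 10**6 bytes)."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 disables Python's userspace buffer only, not the OS cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return total_bytes / elapsed / 1e6 if elapsed > 0 else float("inf")
```

Running it twice on the same file gives a rough feel for how much the OS cache contributes on the second pass.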

The reason is that copying from the PCI bus using a busmaster transfer is about as fast as memory transfers themselves.
So putting file contents into the (kernel-mode) cache and then copying them to the (user-space) application drops performance.
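
The reasoning above can be turned into a rough back-of-the-envelope model. This is only a sketch under stated assumptions: every byte arrives once by DMA, each extra memory copy happens serially after that, and the memory copy speed is set equal to the disk speed (30 MB/s here, picked to match the "about as fast" claim, not measured). The number of extra copies per OS is also an assumption.

```python
def effective_rate(disk_mb_s, copy_mb_s, extra_copies):
    """Throughput when each MB is transferred once from disk and then
    memory-copied `extra_copies` times, with no overlap between the steps."""
    time_per_mb = 1.0 / disk_mb_s + extra_copies / copy_mb_s
    return 1.0 / time_per_mb

# Assumed figures: 30 MB/s from disk, 30 MB/s per memory copy.
for copies in (0, 1, 2):
    print(copies, round(effective_rate(30.0, 30.0, copies), 1))
# prints:
# 0 30.0
# 1 15.0
# 2 10.0
```

The results land in the same ballpark as the figures listed above, but that alone does not prove the model; system load and other overhead are not included.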

Please draw your own conclusions.

Last edited by Disruptor on 2020-01-31, 21:03. Edited 1 time in total.

Reply 1 of 11, by konc

Rank Oldbie

Yes, it's pretty well known by now that with a hard disk that is too modern/too fast for an older system, smartdrive has the opposite of the intended effect. There's also the large cache of the drives themselves.

Reply 2 of 11, by derSammler

Rank l33t
Disruptor wrote on 2020-01-30, 20:51:

The reason is that copying from the PCI bus using a busmaster transfer is about as fast as memory transfers themselves.

No, that's not the reason. The cache has nothing to do with it. The file is read into RAM in any case.

In DOS, everything runs exclusively, single-tasked. If you do a file copy, nothing else happens. DOS spends 100% of its time "copying" the file, so reading a single big file is fast.

Win98 and 2000 are multitasking OSes. A file copy is no longer done exclusively; there is a lot of other stuff going on in the background that also accesses the hard disk. The performance drop in Win98 comes from that and from the fact that DOS in Win98 runs in a VM (a DOS session has no direct access to the hard disk in Win98; it must go through the OS). A 486 also doesn't help; that's already a bit too weak for Win98. If you test at all, copy the file using Explorer and not in a DOS shell.

And as for Windows 2000, seriously? Windows 2000 needs a Pentium II and 128 MB RAM or more to run at good speed. You're not even hitting the bare minimum, which is a 133 MHz Pentium. It's so slow because your 486 simply can't even handle the core OS, let alone any additional tasks you try to do.

http://retro-net.de/blog.html

Reply 3 of 11, by Disruptor

Rank Member
derSammler wrote on 2020-01-31, 08:12:
Disruptor wrote on 2020-01-30, 20:51:

The reason is that copying from the PCI bus using a busmaster transfer is about as fast as memory transfers themselves.

No, that's not the reason. The cache has nothing to do with it. The file is read into RAM in any case.

In DOS, everything runs exclusively, single-tasked. If you do a file copy, nothing else happens. DOS spends 100% of its time "copying" the file, so reading a single big file is fast.

Win98 and 2000 are multitasking OSes. A file copy is no longer done exclusively; there is a lot of other stuff going on in the background that also accesses the hard disk. The performance drop in Win98 comes from that and from the fact that DOS in Win98 runs in a VM (a DOS session has no direct access to the hard disk in Win98; it must go through the OS). A 486 also doesn't help; that's already a bit too weak for Win98. If you test at all, copy the file using Explorer and not in a DOS shell.

And as for Windows 2000, seriously? Windows 2000 needs a Pentium II and 128 MB RAM or more to run at good speed. You're not even hitting the bare minimum, which is a 133 MHz Pentium. It's so slow because your 486 simply can't even handle the core OS, let alone any additional tasks you try to do.

This is exactly what I explained in the second sentence, which you ignored:

Disruptor wrote on 2020-01-30, 20:51:

So putting file contents into the (kernel-mode) cache and then copying them to the (user-space) application drops performance.

Reply 4 of 11, by Falcosoft

Rank Oldbie

Hi,
I have the feeling that the title of this thread is a little misleading. Generally it's true that "caching drops bulk COPY performance". That is true not only of disk caches but of caches in general.
Caching relies on the principle of locality (temporal/spatial). Caching works well when the cached data can be reused later. When copying/moving large files this never happens. So for large/bulk copies even processors provide ways to bypass the cache to achieve better performance (e.g. look at the non-temporal write instructions in x86 such as MOVNTQ, MOVNTPS, etc.).
https://vgatherps.github.io/2018-09-02-nontemporal/
So when copying large amounts of data, caching carries an unnecessary performance penalty/overhead, even in processors.
Yet no one would argue that caching in general "drops CPU performance massively"...
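
To make the locality argument concrete, here is a small simulation sketch. It assumes a plain LRU block cache, which is my own simplification and not a description of how smartdrive or any particular OS cache actually works; the block counts are arbitrary.

```python
from collections import OrderedDict

def lru_hit_rate(accesses, capacity):
    """Simulate an LRU block cache; return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    count = 0
    for block in accesses:
        count += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used block
    return hits / count if count else 0.0

# Bulk copy: 10,000 blocks, each touched exactly once.
streaming = range(10_000)
# Compile-like workload: the same 50 blocks re-read 200 times.
reuse = [b for _ in range(200) for b in range(50)]

print(lru_hit_rate(streaming, 512))  # 0.0
print(lru_hit_rate(reuse, 512))      # 0.995
```

In the streaming case the cache does all the bookkeeping and copying work but never gets a single hit, which is the overhead described above.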

Also, for copying large files most OSes provide ways to bypass disk caches. E.g. in Win32 there is the FILE_FLAG_NO_BUFFERING flag you can use.
https://docs.microsoft.com/en-us/windows/win3 … /file-buffering
So it's rather common knowledge that caching is NOT meant to speed up large/bulk copies; that is the worst-case scenario for caching in general.
Most professional copy tools provide ways to bypass disk caches to speed up large copy operations. Robocopy has a /J command line option recommended for copying large files.
If you use Total Commander in Windows you can bypass disk caches by enabling "big file copy mode".
In DOS you can temporarily disable caching for a drive by using smartdrv drive- (for example, smartdrv c-).

But even with DOS and modern hardware, disk caches like smartdrive can speed up disk I/O. Copying large files is just not the right test. Instead, try working with compilers under DOS (such as Turbo C/Pascal, Watcom C, Free Pascal etc.). You will see how much faster compiling is with smartdrive loaded (since during compilation many header and library files can be reused from the cache). Also try deleting e.g. the Windows folder (or any other folder with thousands of small files) with and without smartdrive loaded.
Even with TB-sized disks that have a 64 MB internal buffer, the above operations will be much faster with smartdrive.

Website, Facebook, Youtube
Falcosoft Midi Player + Munt VSTi + BassMidi VSTi topic

Reply 6 of 11, by Falcosoft

Rank Oldbie
Disruptor wrote on 2020-01-31, 15:09:

No, it isn't.
After copying 2 times to NUL, the file is completely in disk cache.
But the performance is still horrible.

Yes, it is.
Re-read what I have written. Disk cache performance is not (only) about bulk file copies...
Anyway, how could a 130 MB file be 'completely in disk cache' under DOS with smartdrive? Smartdrive's default cache size is only 2 MB, and even with manual tuning you cannot get a bigger cache than 25/35.7 MB...


Reply 7 of 11, by Disruptor

Rank Member
Falcosoft wrote on 2020-01-31, 17:52:

Yes, it is.
Re-read what I have written. Disk cache performance is not (only) about bulk file copies...
Anyway, how could a 130 MB file be 'completely in disk cache' under DOS with smartdrive? Smartdrive's default cache size is only 2 MB, and even with manual tuning you cannot get a bigger cache than 25/35.7 MB...

With optimal techniques it is still possible to do just one read into memory, and to do it with busmaster DMA. Hint: paging and page alignment.

Oh, for this extra test with smartdrive in DOS, I used a 30 MB file. Sorry for not mentioning that, but it makes almost no difference.
MOVSW / MOVSD is not really fast on a 486.

Reply 8 of 11, by AvalonH

Rank Member

Most IDE controllers I have tried over the years don't enable DMA from their BIOS extension in real-mode DOS, which makes a big difference. They might show DMA as enabled on screen, but it is not enabled under DOS. The Promise S150 TX2 is one exception. I tested a 256 GB SSD connected to it to rule out any speed bottlenecks. In plain DOS without smartdrive it transfers at 76 MB/s (this is on a P90 with a 430FX chipset, basically maxing out the PCI bus). With Smartdrv loaded at default settings it transfers at 29 MB/s.
And as for DOS being a single-tasking OS: yes, but running the same controller forced into the fastest non-DMA mode, PIO4, the maximum transfer rate is 8.3 MB/s, the CPU is 100% used, and it can't even reach PIO4's theoretical maximum of roughly 16.6 MB/s. This is generally what happens when you boot to DOS using any Intel onboard IDE controller, even on modern motherboards (DOS DMA device drivers don't support newer motherboard IDE controllers).
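
For reference, the PIO ceiling follows directly from the ATA cycle timing: one 16-bit word per cycle, with a minimum cycle time of 120 ns in PIO mode 4. A small sketch of that arithmetic (the function and table names are mine; the cycle times are the standard minimum values for PIO modes 0 to 4):

```python
# Minimum cycle time per PIO mode in nanoseconds.
PIO_CYCLE_NS = {0: 600, 1: 383, 2: 240, 3: 180, 4: 120}
BYTES_PER_CYCLE = 2  # the ATA data register is 16 bits wide

def pio_peak_mb_s(mode):
    """Theoretical peak transfer rate of a PIO mode in MB/s (1 MB = 10**6 bytes)."""
    cycle_seconds = PIO_CYCLE_NS[mode] * 1e-9
    return BYTES_PER_CYCLE / cycle_seconds / 1e6

peak = pio_peak_mb_s(4)
print(round(peak, 1))        # 16.7
print(round(8.3 / peak, 2))  # 0.5, i.e. about half of the PIO4 ceiling
```

So the 8.3 MB/s measured with the CPU fully loaded is roughly half of what the mode allows on paper.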

One piece of software I would like is a modern, reliable (cache-aware) hard disk benchmark tool that runs under real-mode DOS. Speedsys 4.78 is not accurate: with this Promise controller it shows a 1.2 GB/s transfer speed in its disk benchmark.

Reply 9 of 11, by mrau

Rank Oldbie
Disruptor wrote on 2020-01-30, 20:51:

Under plain DOS I get over 30 MB/s

So DMA must already be enabled?
The lower performance in Windows will come from copying the content, which is not required in DOS; the additional loss of performance comes from system load. Maybe you can confirm that by comparing the performance of a simple loop in all 3 environments?
It looks like the Windows approach is flawed, and not just a bit.
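
The simple-loop comparison suggested above could look something like the sketch below. It is written in Python purely for illustration; on a real 486 under DOS the same idea would more likely be a few lines of C or assembly. The function name and iteration count are assumptions.

```python
import time

def time_simple_loop(iterations=1_000_000):
    """Run a fixed CPU-only loop and return (checksum, elapsed seconds).
    No disk access is involved, so any slowdown between environments
    points at system load rather than at disk caching."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i
    elapsed = time.perf_counter() - start
    return total, elapsed

checksum, seconds = time_simple_loop()
print(checksum, seconds)
```

If the loop takes about the same time in all environments, system load is probably not the main factor behind the lower copy speed.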

Reply 10 of 11, by mrau

Rank Oldbie
AvalonH wrote on 2020-02-01, 11:43:

Speedsys 4.78 is not accurate: with this Promise controller it shows a 1.2 GB/s transfer speed in its disk benchmark.

Maybe that is the cache speed? That is not a real-world value, I believe.

why can't i edit my old post to add this there?

Reply 11 of 11, by AvalonH

Rank Member
mrau wrote on 2020-02-03, 00:46:
AvalonH wrote on 2020-02-01, 11:43:

Speedsys 4.78 is not accurate: with this Promise controller it shows a 1.2 GB/s transfer speed in its disk benchmark.

Maybe that is the cache speed? That is not a real-world value, I believe.

why can't i edit my old post to add this there?

It's not the cache speed; that is the figure it gives for the sustained transfer rate across all 256 GB of the SSD. Obviously Speedsys is wrong: any figure above 133 MB/s would be wrong, as it's faster than the total bandwidth of the PCI bus. The 430FX motherboard I'm using will top out at around 100 MB/s over the PCI bus.
It would be handy to have a utility in DOS that checks whether DMA is enabled on a disk and also gives an accurate read/write/access time benchmark for faster hardware.
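
The 133 MB/s ceiling mentioned above is just the nominal 33 MHz clock times the 4-byte bus width of standard PCI. A benchmark tool could use it as a sanity check along these lines; this is a sketch with made-up function names, not something Speedsys or any existing utility does:

```python
PCI_CLOCK_HZ = 33_333_333  # nominal clock of standard 33 MHz PCI
PCI_BUS_BYTES = 4          # 32-bit bus width

def pci_peak_mb_s():
    """Theoretical peak of 32-bit/33 MHz PCI in MB/s (1 MB = 10**6 bytes)."""
    return PCI_CLOCK_HZ * PCI_BUS_BYTES / 1e6

def is_plausible(reported_mb_s, ceiling_mb_s=None):
    """Return False when a reported disk rate exceeds what the bus can carry."""
    if ceiling_mb_s is None:
        ceiling_mb_s = pci_peak_mb_s()
    return reported_mb_s <= ceiling_mb_s

print(round(pci_peak_mb_s(), 1))  # 133.3
print(is_plausible(76))           # True
print(is_plausible(1200))         # False
```

Passing a lower ceiling, such as the roughly 100 MB/s practical limit quoted for the 430FX, makes the check stricter.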