VOGONS

Reply 40 of 47, by mbbrutman

Grzyb,

In 2019 I added code to improve the flow control on noisy/lossy connections. That code shrinks and grows the TCP receive window size depending on how many packets are being lost. Without this code, connections would often just freeze and die because the two sides would get so far apart that they could not re-sync. FreeBSD servers were especially prone to dropped connections because of the way they retransmit lost packets.
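The general idea of shrinking and growing a receive window based on observed loss can be sketched as follows. This is only an illustration, not the mTCP implementation: the class name, the halve-on-loss and grow-by-one-segment policy, the 1% threshold, and the segment limits are all assumptions made up for the example.

```python
class AdaptiveReceiveWindow:
    """Hypothetical receive-window sizing driven by the observed loss rate."""

    def __init__(self, mss=536, min_segments=1, max_segments=16,
                 loss_threshold=0.01):
        # All defaults are assumptions; 536 is the payload size that
        # goes with an MTU of 576 when no IP or TCP options are used.
        self.mss = mss
        self.min_segments = min_segments
        self.max_segments = max_segments
        self.loss_threshold = loss_threshold
        self.segments = max_segments  # start with the window fully open

    @property
    def window_bytes(self):
        return self.segments * self.mss

    def update(self, received, out_of_order):
        """Adjust the window after one measurement interval."""
        if received <= 0:
            return self.window_bytes  # nothing measured, leave it alone
        loss = out_of_order / received
        if loss > self.loss_threshold:
            # Lossy interval: halve the window, but never below the floor.
            self.segments = max(self.min_segments, self.segments // 2)
        else:
            # Clean interval: reopen the window one segment at a time.
            self.segments = min(self.max_segments, self.segments + 1)
        return self.window_bytes


window = AdaptiveReceiveWindow()
after_first_loss = window.update(received=100, out_of_order=13)
after_second_loss = window.update(received=100, out_of_order=13)
after_clean = window.update(received=100, out_of_order=0)
print(after_first_loss, after_second_loss, after_clean)
```

With a policy like this, a link that keeps losing packets pins the window near its minimum, which limits how much data the sender can have in flight.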

Your trace shows that code is tripping constantly. The DOS machine keeps missing packets sent from the FTP server, so it gets packets that are out of order and has to send an empty packet to tell the FTP server to resend the missing ones. The trace shows 22570 packets being received and 2992 cases where we had to ask for a retransmit. That is a 13% missed/lost packet rate, which is abysmal, and it's amazing it keeps the connection going at all. (A 1% loss rate would be high.)
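The 13% figure follows directly from the two counts in the trace. A small check, where the function name is just a label chosen for this example:

```python
def retransmit_request_rate(packets_received, retransmit_requests):
    """Fraction of received packets that triggered a retransmit request."""
    if packets_received <= 0:
        raise ValueError("packets_received must be positive")
    return retransmit_requests / packets_received


# Counts quoted from the trace in this thread.
rate = retransmit_request_rate(22570, 2992)
print(f"{rate:.1%}")  # prints 13.3%
```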

You have two layers of Ethernet device drivers going on - the packet driver and the ODI shim. I don't have any visibility into what the ODI shim is doing, or what its view of the world is. I suspect it is dropping packets but I have no way of proving that. Your machine does have quite a bit more overhead to receive those packets, and a 100 Mb/s card might have some fairly large buffers, so I'm pretty sure it's not a great card for that machine.

To see if the new flow control code is the problem can you rerun the trace but use the 2015 code? I'm going to bet that you have the same number of SEQ/ACK errors but it works faster because we're not shrinking the TCP receive window when things seem crappy. If that works I'll give you new code that disables just that feature to prove it is the problem, and then from there we'll figure it out.

Reply 41 of 47, by Grzyb

mbbrutman wrote on 2021-02-17, 02:36:

To see if the new flow control code is the problem can you rerun the trace but use the 2015 code?

See the attachment.

Also, I've tried with the FTPSRV.
Again, transferring a 10 MB file in BINARY mode.
The modern Linux box is now the client.

Jul 5 2015

PUT to RAMdrive: 561 KB/s
GET from RAMdrive: 581 KB/s

Mar 7 2020

PUT to RAMdrive: 116 KB/s
GET from RAMdrive: 561 KB/s

Attachment: 2015.LOG (3.07 KiB, public domain)

Reply 43 of 47, by mbbrutman

Grzyb,

I started digging through the code and I noticed a bug that might be responsible for what you are seeing. Try this, and if it works, then I'll tell you what I screwed up. (And then fixed ...)

http://www.brutman.com/mTCP/ftp_flowcontrolfix.exe

(I can't click on the link to start the download, but I can copy the link to a new tab and get it to work. Weird ...)

Please use the debug trace commands too and send me the results.

-Mike

Reply 44 of 47, by Grzyb

Measurements without debugging:

download to RAMdrive: 113 KB/s
upload from RAMdrive: 650 KB/s

With debugging - see the attachment.

Also, I wanted to see how much a 386 can score in an environment that's saner than ODI+ODIPKT, so I've installed Linux...

Red Hat Linux 5.2
Kernel 2.0.36

time wget -O /dev/null -q ftp://...
(yes, "-q" is important, displaying the progress slows it down measurably...)

685 KB/s

put /proc/kcore

710 KB/s

Attachment: FCF.LOG (401.66 KiB, public domain)

Reply 45 of 47, by mbbrutman

Ok, well clearly that bug isn't the cause of your pain. (The debug log showed the same excessive number of missed packets.)

There were not many changes to the TCP layer and below between 2015 and 2020. The window resizing code was added and was pretty extensively tested, but even then it still has the bug I think I found. But otherwise, the code is not that much different.

If the card were in my possession I'd just start "bisecting" the problem and trying to figure out which exact commit to the library causes the problem with receive speed. I could do that with you, but it would be tedious for you to keep running experiments. If you are up to it let me know; with about 10 variations it should be possible to narrow down where it broke. While it's easy to blame the specific environment, I'd like to find the root cause. There might be a lesson to be learned that makes the code more robust for everybody. (If we do that we should move to email.)
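The reason about 10 experiments can be enough is that each test halves the range of candidate builds, so ten tests can separate roughly a thousand candidates. Here is a minimal sketch of that search. The build labels and the `is_bad` check are hypothetical stand-ins for "run the transfer with this build and see whether receive speed is slow"; the thread does not say how many commits are actually involved.

```python
from typing import Callable, Sequence, Tuple


def first_bad(builds: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first bad build and how many tests it took.

    Assumes builds are ordered oldest to newest, builds[0] is known
    good, builds[-1] is known bad, and every build after the first
    bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(builds[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return builds[hi], tests


# Hypothetical history of 1000 builds where build 637 introduced the problem.
builds = [f"b{i}" for i in range(1000)]
culprit, tests_run = first_bad(builds, lambda b: int(b[1:]) >= 637)
print(culprit, tests_run)
```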

Reply 46 of 47, by mbbrutman

Just a quick update ...

Grzyb and I are still debugging, but one of the problems I found is his MTU size. He was using the default MTU which is 576, when 1500 is optimal for Ethernet. MTU is specified in the mTCP CFG file, so if you have not set it then you are using the default of 576. Larger MTU means fewer packets and interrupts for the same size of data transfer.
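To put a rough number on "fewer packets", the sketch below counts the full-size TCP segments needed for the 10 MB test file used earlier in this thread. It assumes 40 bytes of IP and TCP headers per packet (no options), takes 10 MB to mean 10 * 1024 * 1024 bytes, and ignores ACKs and retransmits, so treat the result as an estimate only.

```python
IP_TCP_HEADER_BYTES = 40  # assumption: 20-byte IP + 20-byte TCP header, no options


def segments_needed(payload_bytes, mtu):
    """Number of TCP segments needed to carry payload_bytes at a given MTU."""
    mss = mtu - IP_TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("MTU too small to carry any payload")
    return (payload_bytes + mss - 1) // mss  # integer ceiling division


FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means binary megabytes

small_mtu_segments = segments_needed(FILE_BYTES, 576)
large_mtu_segments = segments_needed(FILE_BYTES, 1500)
print(small_mtu_segments, large_mtu_segments)
print(f"{small_mtu_segments / large_mtu_segments:.1f}x")
```

Under these assumptions the MTU of 576 needs about 2.7 times as many data packets as the MTU of 1500 for the same file, and each of those packets has to pass through the packet driver and the ODI shim.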

That being said, the code should work with MTU set to 576. What Grzyb is seeing on his machine is still abnormal. But with MTU set to 1500 his machine flies ...

Reply 47 of 47, by mbbrutman

Wrapping this up ...

I had a small bug, but fixing it had no effect because it wasn't the root of the problem.

If you are running a shim to make an ODI driver look like a packet driver you should specify an MTU of 1500. With the default MTU of 576 many more packets were being generated, and it was causing the ODI layer to "drop packets on the floor" silently, making it look like lost packets to mTCP.

The other way to work around that problem is to specify more buffers for the ODI layer in the NET.CFG file. There are web pages recommending 8 buffers for the ODI layer. 12 or 16 is definitely better, and at 16 Grzyb noticed no transfer errors at full speed, even with the MTU set to 576. With MTU at 1500 he was getting over 870 KB/sec using FTP.