Reply 40 of 47, by mbbrutman
Grzyb,
In 2019 I added code to improve the flow control on noisy/lossy connections. That code shrinks and grows the TCP receive window depending on how many packets are being lost. Without this code, connections would often just freeze and die because the two sides would get so far apart that they could not re-sync. FreeBSD servers were especially prone to dropped connections because of the way they retransmit lost packets.
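As a rough illustration of the idea only (this is not the actual mTCP code), a receive window that shrinks on loss and grows back after a clean run could be sketched like this. The class name, the method names, the segment size, and the halve-then-grow-slowly policy are all assumptions made for the example.

```python
MSS = 1460  # assumed segment size in bytes (hypothetical value)


class AdaptiveReceiveWindow:
    """Hypothetical sketch: shrink the advertised window on loss, grow it on success."""

    def __init__(self, max_window=8 * MSS, min_window=MSS, grow_after=16):
        self.max_window = max_window
        self.min_window = min_window
        self.grow_after = grow_after  # in-order packets needed before growing
        self.window = max_window      # start fully open
        self.in_order_run = 0

    def on_in_order_packet(self):
        # Count clean packets; after a long enough run, open the window by one segment.
        self.in_order_run += 1
        if self.in_order_run >= self.grow_after:
            self.window = min(self.max_window, self.window + MSS)
            self.in_order_run = 0

    def on_out_of_order_packet(self):
        # A gap means something was lost: halve the window, but never go below the floor.
        self.window = max(self.min_window, self.window // 2)
        self.in_order_run = 0


w = AdaptiveReceiveWindow()
w.on_out_of_order_packet()
print(w.window)  # half of the starting window
```

With a policy like this, a connection that keeps losing packets spends most of its time with a small window, which keeps it alive but limits throughput.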
Your trace shows that code is tripping constantly. The DOS machine keeps missing packets sent from the FTP server, so it gets packets that are out of order and has to send an empty packet to tell the FTP server to resend the missing ones. The trace shows 22570 packets being received and 2992 cases where we had to ask for a retransmit. That is a 13% missed/lost packet rate, which is abysmal, and it's amazing it keeps the connection going at all. (A 1% loss rate would be high.)
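For reference, the 13% figure comes straight from the two counts in the trace. The variable names here are just for illustration:

```python
packets_received = 22570      # packets received, from the trace
retransmit_requests = 2992    # times a retransmit had to be requested

loss_rate = retransmit_requests / packets_received
print(f"{loss_rate:.1%}")  # roughly 13%
```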
You have two layers of Ethernet device drivers going on - the packet driver and the ODI shim. I don't have any visibility into what the ODI shim is doing, or what its view of the world is. I suspect it is dropping packets but I have no way of proving that. Your machine does have quite a bit more overhead to receive those packets, and a 100 Mb/s card might have some fairly large buffers, so I'm pretty sure it's not a great card for that machine.
To see if the new flow control code is the problem, can you rerun the trace but use the 2015 code? I'm going to bet that you have the same number of SEQ/ACK errors but it works faster because we're not shrinking the TCP receive window when things seem crappy. If that works I'll give you new code that disables just that feature to prove it is the problem, and then we'll figure it out from there.