From kragen@dnaco.net Sat Aug 22 00:11:08 1998
Date: Sat, 22 Aug 1998 00:11:06 -0400 (EDT)
From: Kragen <kragen@dnaco.net>
To: James Weirich <james.weirich@sdrc.com>
cc: clug-user@clug.org
Subject: Re: Strange FTP Download Behavior
In-Reply-To: <199808212018.QAA48521@sgipd497>
Message-ID: <Pine.SUN.3.96.980821235856.21889G-100000@picard.dnaco.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 21 Aug 1998, James Weirich wrote:
> >>>>> "Brent" == Brent Foster <brent@one.net> writes:
> 
>     >> remote site:      One.net   Other ftp server
>     >> Local machine:            --------------------------
>     >> Linux          |          breaks  | works fine
>     >> Other          |      works fine  | works fine
> 
>     Brent> Your matrix is incorrect.  Sopan said the other ftp server
>     Brent> running Linux breaks too.
> 
> I agree.  I also got stalls when FTPing directly to the
> cayennesoft.com site, which looks like it is running a generic Unix
> System Vr4 system.

"generic Unix System Vr4 system" usually means "Solaris 2.x system".
Several versions of Solaris have a severe TCP bug that only really
shows up when sending data over a dialup link.  Part of TCP's
retransmission logic is to estimate the maximum reasonable round-trip
time from how soon you receive acknowledgements of nonretransmitted
data segments.
Then, the delay before retransmission of an unacknowledged data segment
is based on this estimate -- the longer the round-trip time, the longer
the sender waits before retransmitting.
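
To make the estimate concrete, here is a rough Python sketch in the
style of Jacobson's smoothed round-trip estimator.  The class name, the
3-second initial timeout, and the gain constants (1/8 and 1/4) are my
assumptions for illustration; I am not claiming this is what any
particular Solaris or Linux kernel does, and the minimum and maximum
bounds are left out on purpose.

```python
class RttEstimator:
    """Smoothed RTT estimator (illustrative sketch, not a kernel's code)."""

    def __init__(self, initial_rto=3.0):
        self.srtt = None          # smoothed round-trip time, in seconds
        self.rttvar = None        # smoothed mean deviation, in seconds
        self.initial_rto = initial_rto

    def sample(self, rtt):
        """Feed one RTT measured from a nonretransmitted segment."""
        if self.srtt is None:
            # First measurement: seed both values from it.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the deviation first, using the old smoothed RTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt

    def rto(self):
        """Retransmission timeout: longer RTT means a longer wait."""
        if self.srtt is None:
            return self.initial_rto
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
for measured in (1.0, 1.0, 4.0):   # hypothetical RTT samples, in seconds
    est.sample(measured)
    print(f"rtt={measured:.1f}s  rto={est.rto():.3f}s")
```

Samples from retransmitted segments are skipped because you cannot tell
which copy the acknowledgement belongs to.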

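This estimate has minimum and maximum values set on it to prevent
unreasonable behavior in unreasonable network conditions.  The maximum
was far too low in certain versions of Solaris.  On a slow dialup link
the real round-trip time can exceed that cap, so the timer expires
before the acknowledgement could have arrived.  This leads to
retransmission of data segments that are already being received quite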
nicely, thank you, which leads to multiple acknowledgements of those
segments, which leads to multiple retransmissions of following
segments, and the problem multiplies.  I've had 30 seconds' worth of
redundant TCP packets piled up on the far end of a 14.4kbps line
because of this.
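
A small Python sketch of why a low cap hurts.  It counts how many times
the retransmission timer expires before the acknowledgement for one
segment can get back.  The function name, the doubling backoff, and all
of the numbers are hypothetical; I don't know the actual values in the
affected Solaris versions.

```python
def spurious_retransmits(actual_rtt, rto, max_rto):
    """Count timer expiries that happen before the ACK can arrive.

    Assumes the timeout doubles after each expiry but never exceeds
    max_rto.  Every expiry counted here resends data that was not lost.
    """
    timeout = min(rto, max_rto)
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    elapsed = 0.0
    count = 0
    while elapsed + timeout < actual_rtt:
        elapsed += timeout
        count += 1
        timeout = min(timeout * 2, max_rto)
    return count


# Hypothetical dialup link with 5 seconds of queued data in front of us.
print(spurious_retransmits(actual_rtt=5.0, rto=6.0, max_rto=60.0))  # 0
print(spurious_retransmits(actual_rtt=5.0, rto=6.0, max_rto=1.0))   # 4
```

With a sane cap the timer never fires early.  With the cap below the
real round-trip time, the backoff cannot grow either, so every segment
gets resent several times and the queue on the slow link only gets
longer.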

This is SEPARATE and DISTINCT from the "stalling" problem being
described earlier -- the most obvious difference is that your modem
lights stay on all the time, because you're receiving numerous copies
of the same data.

>     Kragen> Has anyone run tcpdump during these funky downloads to see
>     Kragen> what's happening?
> 
>     Brent> Not on the server end.
>  
> I can try it tonight.  I've never run tcpdump.  Is it a standard
> Redhat utility or do I need to grab it from somewhere?  What kind of
> information does it provide and what should I look for?

It's a standard Solaris utility (hacked a little and renamed "snoop"),
and is available from Lawrence Berkeley Labs, or from the net-tools (or
is it netkit?  I'm not sure) package for Linux.

It prints a one-line summary of each packet visible on a particular
network interface (by default, it puts the interface in promiscuous
mode).  For all packets, the summary includes the time the packet was
seen, its source and destination addresses, and any unusual IP flags
(like TOS, don't fragment, or a TTL of 1).  For TCP packets, it also
includes the source and destination ports, any TCP flags, the
sequence numbers (by default, relative to the first packet of that
connection seen during that run of tcpdump), the acknowledgement
number, and the window size.

You should look for things that might cause the sender to stop
transmitting, like incorrect acknowledgements or windows of 0.
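
If the capture is long, a few lines of Python can do the first pass for
you.  This is only a sketch: it assumes each summary line contains
"win N" and "ack N" fields and a "source > destination" pair, and the
sample lines are made up rather than real tcpdump output.  The function
name is mine.

```python
import re
from collections import Counter

WIN_RE = re.compile(r"\bwin (\d+)")
ACK_RE = re.compile(r"\back (\d+)")


def scan_tcpdump(lines):
    """Return (lines advertising a zero window, repeated (source, ack) pairs)."""
    zero_windows = []
    ack_counts = Counter()
    for line in lines:
        win = WIN_RE.search(line)
        if win and int(win.group(1)) == 0:
            zero_windows.append(line.rstrip("\n"))
        ack = ACK_RE.search(line)
        if ack:
            # The source is the last token before " > ".
            src = line.split(" > ")[0].split()[-1]
            ack_counts[(src, int(ack.group(1)))] += 1
    repeated = {key: n for key, n in ack_counts.items() if n > 1}
    return zero_windows, repeated


# Made-up sample lines, for illustration only.
sample = [
    "00:11:06.10 server.20 > client.1025: P 1:513(512) ack 1 win 8760",
    "00:11:06.40 client.1025 > server.20: . ack 513 win 0",
    "00:11:07.40 client.1025 > server.20: . ack 513 win 0",
]
zero, repeated = scan_tcpdump(sample)
print(len(zero), "zero-window lines")
print(repeated)
```

A repeated ack number is only a hint, not proof of trouble: a side that
is only sending data repeats the same ack number on every segment.  Look
at which direction the repeats are going before drawing conclusions.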

Hope this helps.

Kragen


