2002-11-04 00:20:59

by Andries E. Brouwer

Subject: [PATCH] tcp hang solved

Recent 2.5 kernels were unusable since rlogin sessions would
inexplicably hang for many seconds. tcpdump reveals that
the remote host sends packets A,B,C, the local host acks A,
remote host sends D, local host acks A, sacks C, D, and,
for example 47 seconds later, the remote host retransmits B.

Until 2.5.34 the scenario would be the same, but the retransmission
would occur after less than a second.

If I am not mistaken, the cause of this change in behaviour is
the removal of tcp_store_ts_recent() from one place and its addition
in three other places. At first sight the three new places seem to
cover the old call, but inspection shows that the conditions
changed. So, the following patch should be appropriate.

Andries


--- linux-2.5.45/linux/net/ipv4/tcp_input.c Thu Oct 31 14:14:51 2002
+++ linux-2.5.45a/linux/net/ipv4/tcp_input.c Mon Nov 4 01:10:27 2002
@@ -3433,6 +3433,7 @@
(sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
tp->rcv_nxt == tp->rcv_wup)
tcp_store_ts_recent(tp);
+
/* We know that such packets are checksummed
* on entry.
*/
@@ -3454,10 +3455,6 @@
__set_current_state(TASK_RUNNING);

if (!tcp_copy_to_iovec(sk, skb, tcp_header_len)) {
- __skb_pull(skb, tcp_header_len);
- tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
- NET_INC_STATS_BH(TCPHPHitsToUser);
- eaten = 1;
/* Predicted packet is in window by definition.
* seq == rcv_nxt and rcv_wup <= rcv_nxt.
* Hence, check seq<=rcv_wup reduces to:
@@ -3467,6 +3464,11 @@
TCPOLEN_TSTAMP_ALIGNED) &&
tp->rcv_nxt == tp->rcv_wup)
tcp_store_ts_recent(tp);
+
+ __skb_pull(skb, tcp_header_len);
+ tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
+ NET_INC_STATS_BH(TCPHPHitsToUser);
+ eaten = 1;
}
}
if (!eaten) {

[now that this code+comment occurs three times, an inline function
is perhaps nicer]
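
A minimal sketch of what such an inline helper could look like, and of why the order of operations in the second hunk matters. This is an illustration only, not kernel code: `struct sketch_tp`, the `SKETCH_*` constants and the `sketch_*` functions are hypothetical stand-ins, and the helper body assumes that storing the timestamp amounts to copying `rcv_tsval` into `ts_recent`. The point it tries to show is that the test `rcv_nxt == rcv_wup` only has a chance to hold if it runs before `rcv_nxt` is advanced to `end_seq`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions; not the real ones. */
#define SKETCH_TCPHDR_LEN     20  /* sizeof(struct tcphdr) without options */
#define SKETCH_TSTAMP_ALIGNED 12  /* value of TCPOLEN_TSTAMP_ALIGNED */

struct sketch_tp {
    uint32_t rcv_nxt;    /* next sequence number expected */
    uint32_t rcv_wup;    /* rcv_nxt at the time of the last window update */
    uint32_t rcv_tsval;  /* timestamp value carried by the current segment */
    uint32_t ts_recent;  /* timestamp to echo back to the peer */
};

/* The repeated test from the patch, folded into one helper.
 * Returns 1 if ts_recent was updated, 0 otherwise. */
static inline int sketch_fast_path_store_ts(struct sketch_tp *tp,
                                            int tcp_header_len)
{
    assert(tcp_header_len >= SKETCH_TCPHDR_LEN);
    if (tcp_header_len == SKETCH_TCPHDR_LEN + SKETCH_TSTAMP_ALIGNED &&
        tp->rcv_nxt == tp->rcv_wup) {
        tp->ts_recent = tp->rcv_tsval;  /* assumed effect of the store */
        return 1;
    }
    return 0;
}

/* Order after the patch: test first, then advance rcv_nxt. */
static inline int sketch_receive_patched(struct sketch_tp *tp,
                                         int tcp_header_len,
                                         uint32_t end_seq)
{
    int stored = sketch_fast_path_store_ts(tp, tcp_header_len);
    tp->rcv_nxt = end_seq;
    return stored;
}

/* Order before the patch: rcv_nxt is advanced first, so for a segment
 * carrying data the test compares the new rcv_nxt against rcv_wup. */
static inline int sketch_receive_unpatched(struct sketch_tp *tp,
                                           int tcp_header_len,
                                           uint32_t end_seq)
{
    tp->rcv_nxt = end_seq;
    return sketch_fast_path_store_ts(tp, tcp_header_len);
}
```

With `rcv_nxt == rcv_wup == 100` and a 48-byte segment whose header carries only the aligned timestamp option (header length 32), the patched order updates `ts_recent` while the unpatched order leaves it stale.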


2002-11-04 10:40:05

by David Miller

Subject: Re: [PATCH] tcp hang solved

From: [email protected]
Date: Mon, 4 Nov 2002 01:27:26 +0100 (MET)

So, the following patch should be appropriate.

Thanks for this fix, I will apply it. I have no idea why I didn't
spot this while 'porting' this patch from 2.4.x to 2.5.x :)

2002-11-04 11:31:59

by Sebastian Benoit

Subject: Re: [PATCH] tcp hang solved

David S. Miller([email protected])@2002.11.04 02:36:20 +0000:
> From: [email protected]
> Date: Mon, 4 Nov 2002 01:27:26 +0100 (MET)
>
> So, the following patch should be appropriate.
>
> Thanks for this fix, I will apply it. I have no idea why I didn't
> spot this while 'porting' this patch from 2.4.x to 2.5.x :)

This does _not_ fix the tcp hang i reported on netdev 2 weeks ago,
i can still reproduce this with 2.5.45-latest-bk + this patch:

hostA: ./socket -s 3000 > /dev/null
hostA: ssh hostC
hostB: cat largefile | ./socket hostA 3000

Now typing stuff in the ssh-connection to hostC will
cause this connection to hang. [*]
The connection between B and A is not affected.

This bug was introduced in 2.5.43-bk1, previous versions are ok.
I think it might be
ChangeSet 1.781.1.68 2002/10/15 19:01:33 [email protected]
but i'm not sure.

[*] tcpdump output of the SSH connection at the moment it stops working:
the first packets are ok. 80.76.225.240 = hostA, 80.76.224.45 = hostC

12:27:06.846266 80.76.225.240.32921 > 80.76.224.45.ssh: P 3072:3120(48) ack 897 win 63712 <nop,nop,timestamp 2925577 602150234> (DF) [tos 0x10]
12:27:06.895276 80.76.224.45.ssh > 80.76.225.240.32921: P 897:945(48) ack 3120 win 15928 <nop,nop,timestamp 602151959 2925577> (DF) [tos 0x10]
12:27:06.895316 80.76.225.240.32921 > 80.76.224.45.ssh: . ack 945 win 63712 <nop,nop,timestamp 2925626 602151959> (DF) [tos 0x10]
12:27:07.095430 80.76.225.240.32921 > 80.76.224.45.ssh: P 3120:3168(48) ack 945 win 63712 <nop,nop,timestamp 2925826 602151959> (DF) [tos 0x10]
12:27:07.107633 80.76.224.45.ssh > 80.76.225.240.32921: P 897:945(48) ack 3120 win 15928 <nop,nop,timestamp 602151981 2925577> (DF) [tos 0x10]
12:27:07.107668 80.76.225.240.32921 > 80.76.224.45.ssh: . ack 945 win 63712 <nop,nop,timestamp 2925839 602151981,nop,nop,sack sack 1 {897:945} > (DF) [tos 0x10]
12:27:07.124619 80.76.225.240.32921 > 80.76.224.45.ssh: P 3168:3216(48) ack 945 win 63712 <nop,nop,timestamp 2925856 602151981> (DF) [tos 0x10]
12:27:07.304553 80.76.225.240.32921 > 80.76.224.45.ssh: P 3120:3168(48) ack 945 win 63712 <nop,nop,timestamp 2926036 602151981> (DF) [tos 0x10]
12:27:07.328656 80.76.224.45.ssh > 80.76.225.240.32921: P 945:993(48) ack 3168 win 15928 <nop,nop,timestamp 602152003 2926036> (DF) [tos 0x10]
[ssh stops working around here]
12:27:07.328689 80.76.225.240.32921 > 80.76.224.45.ssh: . ack 993 win 63712 <nop,nop,timestamp 2926060 602152003> (DF) [tos 0x10]
12:27:07.724482 80.76.225.240.32921 > 80.76.224.45.ssh: P ack 993 win 63712 <nop,nop,timestamp 2926456 602152003> (DF) [tos 0x10]
12:27:08.564354 80.76.225.240.32921 > 80.76.224.45.ssh: P ack 993 win 63712 <nop,nop,timestamp 2927296 602152003> (DF) [tos 0x10]
12:27:10.244202 80.76.225.240.32921 > 80.76.224.45.ssh: P ack 993 win 63712 <nop,nop,timestamp 2928976 602152003> (DF) [tos 0x10]
12:27:13.603809 80.76.225.240.32921 > 80.76.224.45.ssh: P ack 993 win 63712 <nop,nop,timestamp 2932336 602152003> (DF) [tos 0x10]
12:27:20.323005 80.76.225.240.32921 > 80.76.224.45.ssh: P ack 993 win 63712 <nop,nop,timestamp 2939056 602152003> (DF) [tos 0x10]
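
The spacing of the last five segments in the capture above can be checked with a small sketch. It assumes those five "P ack 993" lines are repeated sends of the same unanswered data, which the capture suggests but does not prove; the `sketch_*` names are hypothetical and the timestamps are copied from the tcpdump lines (seconds past 12:27:00). Each gap comes out at roughly twice the previous one, which looks consistent with a retransmission timer that doubles on every timeout.

```c
#include <stddef.h>

/* Send times, in seconds past 12:27:00, of the five trailing
 * "P ack 993" segments from the capture. */
static const double sketch_send_times[] = {
    7.724482, 8.564354, 10.244202, 13.603809, 20.323005
};
#define SKETCH_N (sizeof sketch_send_times / sizeof sketch_send_times[0])

/* Gap in seconds between send i and send i + 1. */
static double sketch_gap(size_t i)
{
    return sketch_send_times[i + 1] - sketch_send_times[i];
}

/* Returns 1 if every gap is within tol of twice the previous gap. */
static int sketch_gaps_double(double tol)
{
    for (size_t i = 0; i + 2 < SKETCH_N; i++) {
        double ratio = sketch_gap(i + 1) / sketch_gap(i);
        if (ratio < 2.0 - tol || ratio > 2.0 + tol)
            return 0;
    }
    return 1;
}
```

The gaps are about 0.84, 1.68, 3.36 and 6.72 seconds, so `sketch_gaps_double(0.01)` holds for this capture.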

/B.
--
Sebastian Benoit <[email protected]>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00



2002-11-04 13:00:32

by David Miller

Subject: Re: [PATCH] tcp hang solved

From: Sebastian Benoit <[email protected]>
Date: Mon, 4 Nov 2002 12:38:25 +0100

This bug was introduced in 2.5.43-bk1, previous versions are ok.
I think it might be
ChangeSet 1.781.1.68 2002/10/15 19:01:33 [email protected]
but i'm not sure.

It's not possible, if the functions modified here did not work you
would not be able to make any connections at all.

Can you try reverting the networking changesets one by one until
the problem goes away?

2002-11-04 13:12:26

by Sebastian Benoit

Subject: Re: [PATCH] tcp hang solved

David S. Miller([email protected])@2002.11.04 05:59:51 +0000:
> From: Sebastian Benoit <[email protected]>
> Date: Mon, 4 Nov 2002 12:38:25 +0100
>
> This bug was introduced in 2.5.43-bk1, previous versions are ok.
> I think it might be
> ChangeSet 1.781.1.68 2002/10/15 19:01:33 [email protected]
> but i'm not sure.
>
> It's not possible, if the functions modified here did not work you
> would not be able to make any connections at all.
>
> Can you try reverting the networking changesets one by one until
> the problem goes away?

I removed parts of 2.5.43-bk1 and that solved my problem, see attached mail
below. When I did this (2 weeks ago) I removed all changes that touched
net/* and associated includes. I was not able to pinpoint the exact change
that causes the problem, the changes there are a bit too much for my knowledge
of the kernel, sorry.

/B.

=================================================================================

Date: Fri, 18 Oct 2002 12:47:17 +0200
From: Sebastian Benoit <[email protected]>
To: [email protected]
Subject: network connection gets stuck with 2.5.43-bk1/mm2
Message-ID: <[email protected]>

Hi,

i posted the message below to linux-mm yesterday, Andrew Morton told me that
it might be one of the changes in networking in -bk1.

removing these changes solved the problem:

 include/linux/ip.h       |  16
 include/linux/tcp.h      |   2
 include/linux/udp.h      |  31 -
 include/net/dst.h        |  56 --
 include/net/ip.h         |  16
 include/net/sock.h       |   2
 include/net/tcp.h        |   2
 include/net/udp.h        |   2
 net/core/dst.c           |  25 -
 net/ipv4/af_inet.c       |  17
 net/ipv4/icmp.c          |   4
 net/ipv4/ip_output.c     | 880 +++++++---------------------------------------
 net/ipv4/ip_proc.c       |  74 ---
 net/ipv4/ip_sockglue.c   |   4
 net/ipv4/raw.c           |   7
 net/ipv4/tcp.c           |  49 --
 net/ipv4/tcp_ipv4.c      |   6
 net/ipv4/tcp_minisocks.c |  10
 net/ipv4/udp.c           | 296 ---------------
 net/ipv6/tcp_ipv6.c      |   5
 net/netsyms.c            |   1

(this is the diffstat of the reverse patch)

Is this fixed already?
/B.



----------------- quote -----------------
Hi,

funny problem w. 2.5.43-mm2:

i'm running 2.5.43-mm2 on my workstation. Normal workload, X-windows, a few
xterms, editor, mozilla, etc. (host A)

I have a NFS/SAMBA-mount (both show the problem) to host B. Host B runs
2.4.19rc5aa1.

I can get a xterm, in which i have a ssh-connection to a third host C
'stuck' by simply cat'ing a large file from the NFS/SAMBA server to
/dev/null.

The xterm/ssh seems stuck, that is no key i press is received on the other
end, but output of the program running on host C is updated in the xterm. I
checked with tcpdump: the keypress does not generate a packet, my host only
sends ACK's on that ssh connection to host C.

The ssh-connection is not unstuck by stopping the data transfer from host B.

I checked that plain 2.5.42 and 2.5.43-mm1 do not have this problem: here my
input goes through to C. At least for small amounts of input, i did not test
anything beyond typing a few hundred chars.

recap:

"mount /mnt/hostB"
"ssh hostC" -> type random stuff in that connection
at the same time do "cat /mnt/hostB/bigfile > /dev/null"
ssh gets stuck.

hardware: PIII/600, 3c905B on 10baseT half-duplex

I'm sorry i can't do any further checks until Friday afternoon (MET).

/B.
--------------- quote ends ---------------


--
Sebastian Benoit <[email protected]>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00

Every gun that is made, every warship launched, every rocket fired,
signifies in the final sense a theft from those who hunger and are not fed,
those who are cold and are not clothed.
-- Dwight D. Eisenhower, U.S. President, 1953



2002-11-04 13:40:08

by David Miller

Subject: Re: [PATCH] tcp hang solved

From: Sebastian Benoit <[email protected]>
Date: Mon, 4 Nov 2002 14:18:53 +0100

I removed parts of 2.5.43-bk1 and that solved my problem, see attached mail
below.

That's just a list of files, can you send me the actual precise patch
you reverted?

2002-11-04 16:04:03

by Sebastian Benoit

Subject: Re: [PATCH] tcp hang solved

David S. Miller([email protected])@2002.11.04 06:39:28 +0000:
> From: Sebastian Benoit <[email protected]>
> Date: Mon, 4 Nov 2002 14:18:53 +0100
>
> I removed parts of 2.5.43-bk1 and that solved my problem, see attached mail
> below.
>
> That's just a list of files, can you send me the actual precise patch
> you reverted?

Okay, I redid that patch against plain 2.5.44. This patch removes most
networking changes that were done between 2.5.43 and 2.5.44. I rechecked
that 2.5.44 without this patch has the tcp-hang problem, with this patch it
works again.

Since it's 24kB compressed, I put it on http://turing.fb12.de/people/benoit/2.5.44-rn.gz

/B.
--
Sebastian Benoit <[email protected]>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540 C44B 4EC4 E1BE 5BA2 2F00

Astronomers do it with sextants.

