2001-12-10 18:17:19

by Mika Liljeberg

Subject: TCP LAST-ACK state broken in 2.4.17-pre2

Hi,

I came across the following behavior (sorry, no tcpdump but this should
be easy to reproduce with the right tools):

hostA                  hostB
--------FIN----------->
<-----data+FIN---------
--------ACK-------X (packet lost)
<-----data+FIN--------- (retransmit)
<-----data+FIN--------- (retransmit)
<-----data+FIN--------- (retransmit)
....
<-----data+FIN--------- (retransmit)
--------RST----------->

HostA is running Linux 2.4.17-pre2. HostB is running Symbian OS. All the
sequence numbers pan out.

Either LAST-ACK is completely broken or Linux just cannot handle a
FIN-ACK that is piggybacked on a data segment, when received in LAST-ACK
state. It should be acked as an out-of-window segment, as usual.
Instead, the LAST-ACK state eventually times out and Linux responds to
the FIN segment with an RST.

Cheers,

MikaL
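
The behaviour expected above can be sketched as a small decision function.
This is only an illustration of the RFC 793 rule for a synchronized state,
not kernel code; the names last_ack_input, toy_reply and seq_before are
made up for this sketch, and seg_len is assumed to count the payload bytes
plus one for the FIN flag.

```c
#include <stdint.h>

/* Hypothetical reply codes for this sketch only. */
enum toy_reply { REPLY_NONE, REPLY_ACK, REPLY_CLOSE };

/* Serial-number comparison: true if a lies before b modulo 2^32. */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * Toy decision for a segment arriving in LAST-ACK.
 * rcv_nxt: next sequence number we expect from the peer
 * snd_nxt: sequence number just past our own FIN
 * seg_len: payload bytes plus one if the segment carries a FIN
 */
static enum toy_reply last_ack_input(uint32_t rcv_nxt, uint32_t snd_nxt,
                                     uint32_t seg_seq, uint32_t seg_len,
                                     uint32_t seg_ack)
{
    uint32_t seg_end = seg_seq + seg_len;

    /* A segment that ends at or before rcv_nxt is a pure duplicate
     * (for example a retransmitted data+FIN): answer with an ACK. */
    if (seg_len > 0 && !seq_before(rcv_nxt, seg_end))
        return REPLY_ACK;

    /* An ACK covering our FIN finishes the connection. */
    if (seg_ack == snd_nxt)
        return REPLY_CLOSE;

    return REPLY_NONE;
}
```

With the sequence numbers from the diagram, every retransmitted data+FIN
would take the REPLY_ACK branch, so the peer would stop retransmitting
instead of eventually being answered with an RST.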


2001-12-10 18:35:38

by Alexey Kuznetsov

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Hello!

> Either LAST-ACK is completely broken or Linux just cannot handle a
> FIN-ACK that is piggybacked on a data segment, when received in LAST-ACK
> state.

It cannot even handle a pure FIN in this state. :-( My apologies,
it is my fault. Thank you.

Well, you can just add one line to tcp_input.c to repair this.

 		}
 		/* Fall through */
+	case TCP_LAST_ACK:
 	case TCP_ESTABLISHED:
 		tcp_data_queue(sk, skb);


Dave, "official" patch will follow later. I must think about
some marginal effect in TCP_CLOSE_WAIT and TCP_CLOSING, which can break
out of switch too. Duh, do specs say something about segments with seqs
above fin? I do not remember.

Alexey
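
Why one extra case label is enough can be shown with a toy switch. The
sketch below is not the code from tcp_input.c; the enum and both function
names are hypothetical, and the real switch contains further checks that
are left out here. It only demonstrates that a state without a case label
falls out of the switch and is never handed to the data-queue path.

```c
/* Hypothetical state names for this sketch only. */
enum toy_state {
    TOY_ESTABLISHED,
    TOY_FIN_WAIT1,
    TOY_FIN_WAIT2,
    TOY_LAST_ACK
};

/* Before the change: LAST_ACK has no case label, so the segment is
 * never queued (returns 0). */
static int queued_before_patch(enum toy_state st)
{
    switch (st) {
    case TOY_FIN_WAIT1:
    case TOY_FIN_WAIT2:
        /* Fall through */
    case TOY_ESTABLISHED:
        return 1;   /* stands in for tcp_data_queue(sk, skb) */
    default:
        return 0;   /* segment silently ignored */
    }
}

/* After the change: LAST_ACK shares the ESTABLISHED branch. */
static int queued_after_patch(enum toy_state st)
{
    switch (st) {
    case TOY_FIN_WAIT1:
    case TOY_FIN_WAIT2:
        /* Fall through */
    case TOY_LAST_ACK:
    case TOY_ESTABLISHED:
        return 1;
    default:
        return 0;
    }
}
```

In this toy model queued_before_patch(TOY_LAST_ACK) returns 0 while
queued_after_patch(TOY_LAST_ACK) returns 1; all other states behave the
same in both versions.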

2001-12-10 18:55:40

by Mika Liljeberg

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Alexey Kuznetsov wrote:
> Well, you can just add one line to tcp_input.c to repair this.

Thanks, that was quick! :)

> Duh, do specs say something about segments with seqs
> above fin? I do not remember.

I don't think they do, aside from that LAST-ACK is a synchronized state.
I.e., if you set RCV.WND to zero after receiving a FIN, any subsequent
out-of-window (below or above) segment will be acked. However, I don't
think it matters much, since above-window packets would in this case
always be caused by a bug in the sender.

> Alexey

MikaL
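
The acceptability test from RFC 793 that this argument relies on can be
written down compactly. The following is a sketch of the four cases in the
RFC's table (zero or non-zero segment length, zero or non-zero receive
window); the function names are chosen for this example and do not come
from the kernel. A segment that fails the test is answered with an ACK and
then dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number comparisons modulo 2^32. */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/* True if rcv_nxt <= seq < rcv_nxt + rcv_wnd. */
static bool in_window(uint32_t rcv_nxt, uint32_t rcv_wnd, uint32_t seq)
{
    return seq_leq(rcv_nxt, seq) && seq_lt(seq, rcv_nxt + rcv_wnd);
}

/* RFC 793 segment acceptability test; seg_len counts SYN and FIN. */
static bool segment_acceptable(uint32_t rcv_nxt, uint32_t rcv_wnd,
                               uint32_t seg_seq, uint32_t seg_len)
{
    if (seg_len == 0) {
        if (rcv_wnd == 0)
            return seg_seq == rcv_nxt;
        return in_window(rcv_nxt, rcv_wnd, seg_seq);
    }
    if (rcv_wnd == 0)
        return false;   /* no data is acceptable with a closed window */
    return in_window(rcv_nxt, rcv_wnd, seg_seq) ||
           in_window(rcv_nxt, rcv_wnd, seg_seq + seg_len - 1);
}
```

With RCV.WND set to zero after the FIN, any segment that carries data or a
FIN fails this test, whether it lies below or above RCV.NXT, so it is
acked and discarded in both cases.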

2001-12-11 00:14:25

by David Miller

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

From: [email protected]
Date: Mon, 10 Dec 2001 21:34:47 +0300 (MSK)

> Dave, "official" patch will follow later. I must think about
> some marginal effect in TCP_CLOSE_WAIT and TCP_CLOSING, which can break
> out of switch too. Duh, do specs say something about segments with seqs
> above fin? I do not remember.

A socket in a synchronized state is required to enforce legal sequence
numbers, is it not?

2001-12-11 17:25:12

by Alexey Kuznetsov

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Hello!

> A socket in a synchronized state is required to enforce legal sequence
> numbers, is it not?

It is. :-)

Well, assuming that this is really illegal, we could just add the
missing LAST_ACK case next to its relatives CLOSING and CLOSE_WAIT
(where it was accidentally forgotten in the old days, I think).
It is a minimal change, and this is good.

But I look at the problem on our side: if we receive such a packet anyway,
what should we do? Earlier we sent an ACK and dropped the
bad segment, or aborted the connection. Now we just blackhole them,
and the bug with the missing case LAST_ACK merely let us see
that we changed behaviour, which is not good. :-)

Alexey

2001-12-11 17:51:48

by Alexey Kuznetsov

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Hello!

> Thanks, that was quick! :)

If everyone were "quick" in this manner, the Linux kernel would not even boot. :-)

Alexey

2001-12-12 20:32:37

by Mika Liljeberg

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Looks like there are still problems after applying your quick patch.
Back at the lab we observed a case where the FIN-ACK packet is dropped
and Linux fails to retransmit it. See the attached dump for the details
(Linux is 10.0.5.11). The action ends there, with Linux timing out to
CLOSED state and the remote stuck in FIN-WAIT-2.

11:11:57.149389 10.0.5.3.3071 > 10.0.5.11.1327: P 254033:255481(1448) ack 1 win 7300 <timestamp 3704210538 8515686,eol> (DF) (ttl 63, id 860, len 1500)
11:11:57.149451 10.0.5.11.1327 > 10.0.5.3.3071: . [tcp sum ok] 1:1(0) ack 255481 win 65160 <nop,nop,timestamp 8544990 3704210538> (DF) (ttl 64, id 30696, len 52)
11:11:57.661595 10.0.5.3.3071 > 10.0.5.11.1327: FP 255481:256001(520) ack 1 win 7300 <timestamp 3705679288 8515686,eol> (DF) (ttl 63, id 861, len 572)
11:11:57.661660 10.0.5.11.1327 > 10.0.5.3.3071: F [tcp sum ok] 1:1(0) ack 256002 win 65160 <nop,nop,timestamp 8545041 3705679288> (DF) (ttl 64, id 30697, len 52)
11:12:11.340666 10.0.5.3.3071 > 10.0.5.11.1327: FP 255481:256001(520) ack 1 win 7300 <timestamp 3727069913 8515686,eol> (DF) (ttl 63, id 863, len 572)
11:12:11.340698 10.0.5.11.1327 > 10.0.5.3.3071: . [tcp sum ok] 2:2(0) ack 256002 win 65160 <nop,nop,timestamp 8546409 3727069913,nop,nop,sack sack 1 {255481:256002} > (DF) (ttl 64, id 30698, len 64)


Attachments:
last-ack.txt (977.00 B)

2001-12-13 10:20:26

by Pasi Sarolahti

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Mika wrote:
> Looks like there are still problems after applying your quick patch.
> Back at the lab we observed a case where the FIN-ACK packet is dropped
> and Linux fails to retransmit it. See the attached dump for the details
> (Linux is 10.0.5.11). The action ends there, with Linux timing out to
> CLOSED state and the remote stuck in FIN-WAIT-2.

I think the following might happen: when the receiver gets a FIN and acks it, it
should be in CLOSE_WAIT or LAST_ACK state depending on the situation,
right? In tcp_rcv_state_process() the receiver calls ack_snd_check, which
has the following test:

	if (!tcp_ack_scheduled(tp)) {
		/* We sent a data segment already. */
		return;
	}
	__tcp_ack_snd_check(sk, 1);

I think in this situation it may be possible that ack_scheduled is false,
which would mean that the receiver never acks the further FIN segments if
the first FIN-ack is lost. Maybe something like the following might work,
although it looks pretty ugly :-)

	if (!tcp_ack_scheduled(tp) &&
	    (sk->state == TCP_ESTABLISHED ||
	     sk->state == TCP_FIN_WAIT1)) {
		/* We sent a data segment already. */
		return;
	}

(Btw, I'm not on the lkml, so I would like to be cc'd on any further
discussion in this thread)

- - Pasi

- --
http://www.cs.helsinki.fi/u/sarolaht/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8GIDRoNa7NH1G2csRAvoLAKC5JbdYF524KMGKOG7X7jObLIkifgCffIbG
tA/Cr4FqSeWhEArt/mPlHGY=
=KD8M
-----END PGP SIGNATURE-----
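
The difference between the existing test and the variant proposed in this
message can be seen in a small stand-alone model. The struct, enum and
function names below are hypothetical and only mirror the two conditions
quoted above; whether ack_scheduled can really be false at that point in
the kernel is the open question of the message, not something this sketch
settles.

```c
#include <stdbool.h>

/* Hypothetical socket model for this sketch only. */
enum toy_state { TOY_ESTABLISHED, TOY_FIN_WAIT1, TOY_CLOSE_WAIT, TOY_LAST_ACK };

struct toy_sock {
    enum toy_state state;
    bool ack_scheduled;
};

/* Existing test: without a scheduled ACK the function returns early. */
static bool original_check_reaches_ack(const struct toy_sock *sk)
{
    if (!sk->ack_scheduled)
        return false;
    return true;    /* would go on to __tcp_ack_snd_check() */
}

/* Proposed test: the early return applies only in ESTABLISHED and
 * FIN_WAIT1, so other states always reach the ACK path. */
static bool proposed_check_reaches_ack(const struct toy_sock *sk)
{
    if (!sk->ack_scheduled &&
        (sk->state == TOY_ESTABLISHED || sk->state == TOY_FIN_WAIT1))
        return false;
    return true;
}
```

For a socket in LAST_ACK with no ACK scheduled, the first function returns
false and the second returns true; in ESTABLISHED both behave identically.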

2001-12-13 18:00:19

by Alexey Kuznetsov

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Hello!

> Looks like there are still problems

This is not related to that problem.


> 11:11:57.149389 10.0.5.3.3071 > 10.0.5.11.1327: P 254033:255481(1448) ack 1 win 7300 <timestamp 3704210538 8515686,eol> (DF) (ttl 63, id 860, len 1500)
> 11:11:57.149451 10.0.5.11.1327 > 10.0.5.3.3071: . [tcp sum ok] 1:1(0) ack 255481 win 65160 <nop,nop,timestamp 8544990 3704210538> (DF) (ttl 64, id 30696, len 52)
> 11:11:57.661595 10.0.5.3.3071 > 10.0.5.11.1327: FP 255481:256001(520) ack 1 win 7300 <timestamp 3705679288 8515686,eol> (DF) (ttl 63, id 861, len 572)
> 11:11:57.661660 10.0.5.11.1327 > 10.0.5.3.3071: F [tcp sum ok] 1:1(0) ack 256002 win 65160 <nop,nop,timestamp 8545041 3705679288> (DF) (ttl 64, id 30697, len 52)
> 11:12:11.340666 10.0.5.3.3071 > 10.0.5.11.1327: FP 255481:256001(520) ack 1 win 7300 <timestamp 3727069913 8515686,eol> (DF) (ttl 63, id 863, len 572)
> 11:12:11.340698 10.0.5.11.1327 > 10.0.5.3.3071: . [tcp sum ok] 2:2(0) ack 256002 win 65160 <nop,nop,timestamp 8546409 3727069913,nop,nop,sack sack 1 {255481:256002} > (DF) (ttl 64, id 30698, len 64)

Please, make cat /proc/net/tcp at this point. To be honest I do not believe
that tcpdump finishes _here_. When will retransmit timer expire? Taking
into account that 10.0.5.3 has rto of 14 seconds (distance between retransmits
of its FIN :-)), linux can have even more. In the case of such bad connection
closing fin-wait-2 via abort is pretty normal.

Alexey
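
For reading the requested /proc/net/tcp output, a small parser sketch may
help. It assumes the usual column order (sl, local_address, rem_address,
st, tx_queue:rx_queue, tr:tm->when, retrnsmt, ...) with hexadecimal values,
and the usual state numbering in which 09 is LAST_ACK; both should be
checked against the kernel actually in use. The struct and function names
are invented for this example.

```c
#include <stdio.h>

/* Fields of interest from one /proc/net/tcp row (sketch only). */
struct tcp_row {
    unsigned state;                 /* "st" column */
    unsigned timer;                 /* "tr": 1 = retransmit timer pending */
    unsigned long expires_jiffies;  /* "tm->when": jiffies until it fires */
    unsigned long retrans;          /* "retrnsmt": retransmit count */
};

/* Map the numeric state to a name; unknown values map to "?". */
static const char *tcp_state_name(unsigned st)
{
    static const char *const names[] = {
        "?", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1",
        "FIN_WAIT2", "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK",
        "LISTEN", "CLOSING"
    };
    return st < sizeof(names) / sizeof(names[0]) ? names[st] : "?";
}

/* Parse one data row; returns 0 on success, -1 otherwise (the header
 * line of the file fails to parse and is rejected this way). */
static int parse_proc_net_tcp_line(const char *line, struct tcp_row *row)
{
    int n = sscanf(line, " %*d: %*x:%*x %*x:%*x %x %*x:%*x %x:%lx %lx",
                   &row->state, &row->timer,
                   &row->expires_jiffies, &row->retrans);
    return n == 4 ? 0 : -1;
}
```

A row whose state decodes to LAST_ACK with timer 1 would show how long the
kernel still intends to wait before retransmitting its FIN, which is the
value the question about the retransmit timer is after.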

2001-12-13 19:31:31

by Mika Liljeberg

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Alexey Kuznetsov wrote:
> This is not related to that problem.

I believe you. Nevertheless, it appears to be a problem that happens in
the LAST-ACK state.

> Please, make cat /proc/net/tcp at this point.

I'll do that if it happens again.

> To be honest I do not believe
> that tcpdump finishes _here_. When will retransmit timer expire? Taking
> into account that 10.0.5.3 has rto of 14 seconds (distance between retransmits
> of its FIN :-)), linux can have even more. In the case of such bad connection
> closing fin-wait-2 via abort is pretty normal.

I'm afraid it did end there. :( The data transfer was unidirectional
from the remote towards the Linux machine. During the SYN exchange the
RTT is less than one second. The rest is queuing delay. So Linux should
have a fairly low RTO. There were no FIN retransmissions, I'm sorry to
say.

> Alexey

Cheers,

MikaL

2001-12-13 19:39:11

by Alexey Kuznetsov

Subject: Re: TCP LAST-ACK state broken in 2.4.17-pre2

Hello!

> have a fairly low RTO. There were no FIN retransmissions, I'm sorry to
> say.

I believe, believe. :-)

It is possible _only_ if the rto is at 120 seconds. It is the only case
in which retransmissions do not happen, and this would be normal behaviour.

For now it is the only hypothesis, and it will be clear from /proc/net/tcp
whether this is right or not.

Alexey
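
The 120-second hypothesis follows from exponential backoff with an upper
bound. The sketch below assumes plain doubling per retransmission and a
cap of 120 seconds (the value Linux uses for its maximum RTO); the function
name and the millisecond units are choices made for this example, not
kernel interfaces.

```c
/* Assumed upper bound on the retransmission timeout, in milliseconds. */
#define TOY_RTO_MAX_MS 120000u

/* RTO after a number of consecutive retransmissions, doubling each
 * time and clamped to TOY_RTO_MAX_MS. */
static unsigned backed_off_rto_ms(unsigned base_rto_ms, unsigned retransmits)
{
    unsigned rto = base_rto_ms;

    while (retransmits-- > 0 && rto < TOY_RTO_MAX_MS) {
        rto *= 2;
        if (rto > TOY_RTO_MAX_MS)
            rto = TOY_RTO_MAX_MS;
    }
    return rto > TOY_RTO_MAX_MS ? TOY_RTO_MAX_MS : rto;
}
```

Starting from a base RTO of one second, seven doublings are enough to hit
the cap. A capture that covers less than two minutes after the last FIN
could therefore miss a retransmission that is still pending.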