2006-11-07 06:36:44

by NeilBrown

Subject: TCP stack sometimes loses ACKs ... or something


I upgraded my notebook from 2.6.16 to 2.6.18 recently and noticed that
I couldn't talk to my VOIP device (which has a web interface).
Watching traffic I see the three-way-handshake working perfectly, and
then the first data packet is sent (a partial HTTP request:
GET / HTTP/1.1 ....) and an ACK comes back from the device.
Then the next data packet (remainder of the HTTP request) is sent, but
tcpdump never sees the ACK, nor does the TCP stack. So the data gets
resent repeatedly. No ACK. Ever.

With 2.6.16, the ACK comes back just fine and the connection proceeds
as you would expect.

As it was a very reproducible problem I decided to try "git bisect"
and found

bad: [7b4f4b5ebceab67ce440a61081a69f0265e17c2a] [TCP]: Set default max buffers from memory pool size

I double checked as this seemed a fairly unlikely patch to cause the
problem, but this definitely is it.
The net effect of this patch is to change the last of the three
numbers in
cat /proc/sys/net/ipv4/tcp_[rw]mem
from well below 2^20 to well above. 2^20 seems to be a significant
number. I set tcp_wmem to that and the ACK was lost. I set it to
one less and the first ACK (at least) was accepted.
I ended up setting both r and w to 100000 and everything is fine.

Exploring more deeply, and comparing:
- a failing connection (to VOIP box, [rw]mem large)
- a working connection to VOIP box ([rw]mem small)
- a working connection to another machine ([rw]mem irrelevant).
I find:

The VOIP box returns MSS=1360 in the SYN/ACK packet. The other machine
returns MSS=1460.

The ack that is getting lost contains data as well as the
ACK. i.e. the same packet that ACKs at the TCP level includes the
HTTP level reply.
The matching ACK from the other machine (some Linux 2.6.8 I think)
is a data-less ACK followed very quickly by the HTTP reply in
a separate packet.

The 'Timestamps' option coming back from the VOIP box is a little
odd. The Timestamp in the SYN/ACK is the same as the timestamp in
the next ACK (the ack for the first partial HTTP request).
The Timestamp in the next packet, which is the one that gets lost, has
exactly the same TSval as previous packets, and TSecr is one more
than in the previous packet.

I assume that one (or more) of these differences combined with the
large tcp_[rw]mem value cause the packet loss, but I have no idea
which.

Help?

I can make the tcp traces available if needed, but these are really
the only non-trivial differences.

I'm willing to test patches.

NeilBrown


2006-11-07 06:48:18

by Stephen Hemminger

Subject: Re: TCP stack sometimes loses ACKs ... or something


You almost certainly have a window-scale-corrupting firewall in your path.
See http://lwn.net/Articles/92727/

2.6.18 increased the maximum window size, so it aggravated a pre-existing
condition in your network. You can turn off window scaling globally
(with sysctl) or set a per-route congestion window limit.
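
A rough sketch of why 2^20 could act as a threshold for the buffer maximum.
It assumes the window scale is chosen as the smallest shift that brings the
maximum buffer size down to 65535 or less, capped at 14 (the RFC 1323 limit).
The function names are illustrative only, not kernel APIs, and which buffer
limit actually feeds this calculation in a given kernel is an assumption here.

```python
def window_scale_for(buffer_max, max_shift=14):
    """Smallest shift such that (buffer_max >> shift) fits in a 16-bit window.

    Assumption: the stack picks its window scale this way from the maximum
    buffer size; the cap of 14 comes from RFC 1323.
    """
    shift = 0
    space = buffer_max
    while space > 65535 and shift < max_shift:
        space >>= 1
        shift += 1
    return shift


def parse_tcp_mem(line):
    """Parse the 'min default max' triple printed by tcp_rmem / tcp_wmem."""
    low, default, high = (int(field) for field in line.split())
    return low, default, high


if __name__ == "__main__":
    # The three values discussed in the report: 100000, 2^20 - 1 and 2^20.
    for size in (100000, 2**20 - 1, 2**20):
        print(size, "-> window scale", window_scale_for(size))
```

Under that assumption the scale steps from 4 to 5 exactly at 2^20, and a
maximum of 100000 yields a scale of only 1, so any device that ignores the
scale sees a much less distorted window.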

It could also be that the VOIP application is getting aggravated by TCP ABC
(Appropriate Byte Counting).
That can be turned off with sysctl (net.ipv4.tcp_abc=0).
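
To see which of these knobs are currently set, something like the following
read-only sketch may help. It assumes sysctl names map onto files under
/proc/sys by replacing dots with slashes; `read_sysctl` is a hypothetical
helper, and a sysctl that a given kernel does not provide simply reads back
as None.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def read_sysctl(name, root=PROC_SYS):
    """Return a dotted sysctl's current value as a string, or None if absent."""
    path = Path(root).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # Missing file, unreadable file, or not running on Linux at all.
        return None


if __name__ == "__main__":
    for name in (
        "net.ipv4.tcp_window_scaling",
        "net.ipv4.tcp_abc",
        "net.ipv4.tcp_rmem",
        "net.ipv4.tcp_wmem",
    ):
        print(name, "=", read_sysctl(name))
```

Changing a value requires root and is deliberately left out of the sketch.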




2006-11-07 07:07:53

by David Miller

Subject: Re: TCP stack sometimes loses ACKs ... or something


Window scaling... there is some intermediate device which is
trying to prevent "out of window" segments from passing through,
but it is not taking the negotiated window scale into account.
So it thinks that segments are outside of the window, when they
are not.
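
A minimal sketch of that failure mode, with made-up numbers. The endpoint
applies the negotiated shift to the 16-bit window field, while the
hypothetical intermediate device applies none, so the same segment is inside
the window for one and outside it for the other. Sequence-number wraparound
is ignored to keep the example short.

```python
def window_limit(ack, raw_window, shift):
    """First sequence number past the window, for a given view of the scale."""
    return ack + (raw_window << shift)


def segment_in_window(seq, length, ack, raw_window, shift):
    """True if the segment [seq, seq + length) lies inside the window."""
    return ack <= seq and seq + length <= window_limit(ack, raw_window, shift)


# Hypothetical values, chosen only for illustration.
ack = 1000              # next byte the receiver expects
raw_window = 183        # 16-bit window field as it appears on the wire
negotiated_shift = 5    # window scale agreed during the handshake
seq, length = 1400, 1360

# The endpoint honours the scale: the window is 183 << 5 = 5856 bytes.
endpoint_ok = segment_in_window(seq, length, ack, raw_window, negotiated_shift)

# A device that ignores the scale sees a window of only 183 bytes.
middlebox_ok = segment_in_window(seq, length, ack, raw_window, 0)

if __name__ == "__main__":
    print("endpoint accepts segment: ", endpoint_ok)
    print("middlebox accepts segment:", middlebox_ok)
```

The larger the negotiated shift, the smaller the raw field for the same real
window, and the more segments such a device would wrongly treat as out of
window.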