2001-04-09 14:44:04

by Eugene B. Berdnikov

Subject: Bug report: tcp stalled when send-q != 0, timers == 0.

Hi all.

In brief: a stale state of the tcp send queue was observed on 2.2.17
while the send-q counter and the connection window sizes were non-zero:

% netstat -n -eot | grep 1018
tcp 0 13064 194.190.166.31:22 194.190.161.106:1018 ESTABLISHED 0 11964 off (0.00/0/0)

Host 194.190.166.31: 2.2.17 on a PPro-180 (HP NetServer E40),
compiled with CONFIG_M686=y and running an sshd1 server.

SYMPTOMS

1. When data is sent to 194.190.166.31, it is ack'ed with a non-zero
window size, but no data comes back (even though send-q != 0).
strace shows that the server receives data via read(2) and replies
with write(2) on the same descriptor. The send-q counter is
incremented by the number of bytes written, but the timers remain zero.
No data is returned to the network.

2. When data is sent to the controlling pty, sshd also steps
through the read(2) and write(2), with the same result
(send-q is incremented, the timers do not change, no traffic).

3. Keepalive is enabled in the ssh client and the server (on both sides).
The curious thing is that once every keepalive interval (2h by default)
host 194.190.166.31 sends exactly one packet of ethernet
MTU size:

17:50:30.391347 > 194.190.166.31.ssh > 194.190.161.106.1018:
P 1:1449(1448) ack 1 win 32640
<nop,nop,timestamp 72137447 1733874370> (DF) [tos 0x10]
17:50:31.102567 < 194.190.161.106.1018 > 194.190.166.31.ssh:
. 1:1(0) ack 1449 win 32120
<nop,nop,timestamp 1734601828 72137447> (DF) [tos 0x10]

The send-q value is properly decremented on every such transmission.

4. This connection was traced for a long time, and when the send-q counter
reached zero (thanks to the "keepalive" exchange), it got out of its
stale state. It now behaves as a normal connection.
I still keep it open for investigation (if any :).

INFO

Here is some supplementary information on the network configuration of
this machine. I believe it is not directly related to the bug discussed,
but include it here for completeness.

The host has an Intel EtherExpress-100 ethernet card (standard driver from
2.2.17) and a SkyMedia-200 DVB satellite receiver running the "sm200_lnx"
driver from Telemann. The stalled connection passes over ethernet only.

# lsmod
Module Size Used by
sm200_lnx 18800 1
eepro100 16180 1 (autoclean)

# ip -s l l
1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped overrun mcast
349139839 1395237 0 0 0 0
TX: bytes packets errors dropped carrier collsns
349139839 1395237 0 0 0 0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:a0:c9:9e:c9:7d brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
2338597780 7637867 0 0 0 0
TX: bytes packets errors dropped carrier collsns
3016099313 7378871 0 0 0 293385
58: sm200: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:90:bc:01:1a:da brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
603326460 896671 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0

# ip -s r l
192.168.30.31 dev eth0 scope link
194.190.166.31 dev eth0 scope link src 194.190.166.31
192.168.30.0/24 dev eth0 proto kernel scope link src 192.168.30.31
127.0.0.0/8 dev lo scope link
default via 192.168.30.34 dev eth0 src 194.190.166.31

Here is the line from /proc/net/tcp for this connection when it was stale:

128: 1FA6BEC2:0016 6AA1BEC2:03FA 01 00003394:00000000 00:00000000 00000000 0 0 11964
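
Decoding that line against the standard /proc/net/tcp field layout:
1FA6BEC2:0016 is 194.190.166.31:22 and 6AA1BEC2:03FA is
194.190.161.106:1018 (addresses in little-endian hex), st 01 is
ESTABLISHED, tx_queue:rx_queue = 00003394:00000000 means 0x3394 = 13204
bytes are waiting to be sent and nothing to be read, tr:tm->when =
00:00000000 confirms that no timer is pending, and inode 11964 matches
the netstat output above.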

That's all I consider interesting.

I was told by ANK that it might be helpful to find the socket in memory
and dump its contents while it was stalled. However, that chance was lost...

If anybody tells me what additional information could be extracted for
diagnosis, I'll try to get it. In any case, I plan to run something through
this connection in the hope of reproducing this state.
--
Eugene Berdnikov


2001-04-10 17:39:14

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> In brief: a stale state of the tcp send queue was observed for 2.2.17
> while send-q counter and connection window sizes are not zero:

I think I have pinned this down. The patch is appended.


> diagnostic, I'll try to get it. In any case, I plan to run something through
> this connection in hope to reproduce this state again.

If my guess is right, you can easily put this socket into a funny state
just by catting a large file and kill -STOP'ing ssh. ssh will close the window,
but sshd will not send zero-window probes. Any socket with keepalives enabled
enters this state after the first keepalive is sent.
[ Note that it is not Butenko's problem; that one is still to be discovered. 8) ]

I think you will not be able to reproduce the full problem: the socket will
revive after the first received ACK. That is another bug, and its probability
is astronomically low.

Alexey
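
[ Reading the patch: the key hunk is the tcp_input.c change that clears
tp->pending once an ACK answers a keepalive, so the probe0 timer can be
armed again; the remaining hunks keep fackets_out/retrans_out consistent
when the retransmit timer is cleared and when queued frames are collapsed. ]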


--- linux/net/ipv4/tcp_input.c.orig Mon Apr 9 22:46:56 2001
+++ linux/net/ipv4/tcp_input.c Tue Apr 10 21:23:33 2001
@@ -733,8 +733,6 @@
if (tp->retransmits) {
if (tp->packets_out == 0) {
tp->retransmits = 0;
- tp->fackets_out = 0;
- tp->retrans_out = 0;
tp->backoff = 0;
tcp_set_rto(tp);
} else {
@@ -781,8 +779,10 @@
if(sk->zapped)
return(1); /* Dead, can't ack any more so why bother */

- if (tp->pending == TIME_KEEPOPEN)
+ if (tp->pending == TIME_KEEPOPEN) {
tp->probes_out = 0;
+ tp->pending = 0;
+ }

tp->rcv_tstamp = tcp_time_stamp;

@@ -850,8 +850,6 @@
if (tp->retransmits) {
if (tp->packets_out == 0) {
tp->retransmits = 0;
- tp->fackets_out = 0;
- tp->retrans_out = 0;
}
} else {
/* We don't have a timestamp. Can only use
@@ -878,6 +876,8 @@
tcp_ack_packets_out(sk, tp);
} else {
tcp_clear_xmit_timer(sk, TIME_RETRANS);
+ tp->fackets_out = 0;
+ tp->retrans_out = 0;
}

flag &= (FLAG_DATA | FLAG_WIN_UPDATE);
--- linux/net/ipv4/tcp_output.c.orig Mon Apr 9 22:47:06 2001
+++ linux/net/ipv4/tcp_output.c Tue Apr 10 21:23:33 2001
@@ -546,6 +546,8 @@
*/
kfree_skb(next_skb);
sk->tp_pinfo.af_tcp.packets_out--;
+ if (sk->tp_pinfo.af_tcp.fackets_out)
+ sk->tp_pinfo.af_tcp.fackets_out--;
}
}

2001-04-10 21:19:52

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

On Tue, Apr 10, 2001 at 09:38:43PM +0400, [email protected] wrote:
> If my guess is right, you can easily put this socket to funny state
> just catting a large file and kill -STOP'ing ssh. ssh will close window,
> but sshd will not send zero probes.

[1] I have checked your statement on 2 different machines running 2.2.17.
I could not confirm it. But this is much funnier than it sounds. :)

The thing is that one machine (which ran the ssh client in my bug report)
does send ACKs when ssh is SIGSTOP'ed. The other one does not send ACKs,
but what is much more curious is that it does not send ACKs even when the
input buffer is full and the client is NOT stopped! :))) Hence the connection
dies due to retransmission timeout on the server side.

I did not believe my own eyes and tried this test several times, with
ssh1 and openssh, copying ssh configs, but the results were always the same.

Both hosts are running 2.2.17 on K6 processors, compiled with egcs-1.1.2,
with minor differences in the kernel configuration. If you really check
your statements before writing, then you surely have a 2.2.17 which behaves
in yet another way, which I can't reproduce. Isn't it funny? :)))

I can send the configs (and even binary kernels with modules) for verification.
If this is not entirely my mistake, we have a very, very sad situation, where
tcp core behaviour depends on secondary configuration options.
I have no other idea of how it could be explained.

[2] Your second statement is that sshd with keepalive enabled does not send
zero-window probes when the input window is closed. Rest assured, in my case it does:

01:04:05.025715 194.190.166.31.22 > 194.190.161.106.1006: . ack 1 win 32120 <nop,nop,timestamp 117938386 1780393243> (DF) [tos 0x10]
01:04:05.025816 194.190.161.106.1006 > 194.190.166.31.22: . ack 17376 win 0 <nop,nop,timestamp 1780405324 117898941> (DF) [tos 0x10]
01:06:05.953026 194.190.166.31.22 > 194.190.161.106.1006: . ack 1 win 32120 <nop,nop,timestamp 117950477 1780405324> (DF) [tos 0x10]
01:06:05.953122 194.190.161.106.1006 > 194.190.166.31.22: . ack 17376 win 0 <nop,nop,timestamp 1780417417 117898941> (DF) [tos 0x10]

BTW, I can firmly rule out the possibility that my ssh client was stopped
when I encountered the reported bug.

> Any socket with keepalives enabled
> enters this state after the first keepalive is sent.

I do not understand how a connection with a closed window can wait until the
first keepalive - it should send zero-window probes instead.

> [ Note, that it is not Butenko's problem, it is still to be discovered. 8) ]
>
> I think you will not able to reproduce full problem: socket will revive
> after the first received ACK. It is another bug and its probability is
> astronomically low.

Hmm... I observed this bug on a host which never handles more
than 10 conn/sec and has a peak loadavg of ~0.15.
--
Eugene Berdnikov

2001-04-11 10:17:19

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

I'd like to add a few comments to my previous message.

On Wed, Apr 11, 2001 at 01:19:01AM +0400, Eugene B. Berdnikov wrote:
> The thing is that one machine (which ran the ssh client in my bug report)
> does send ACKs when ssh is SIGSTOP'ed. The other one does not send ACKs,
> but what is much more curious is that it does not send ACKs even when the
> input buffer is full and the client is NOT stopped! :))) Hence the connection
> dies due to retransmission timeout on the server side.
[...]
> Both hosts are running 2.2.17 on K6 processors, compiled with egcs-1.1.2,
> with minor differences in the kernel configuration.

My observation of the "buggy" 2.2.17 was on a host connected via
modem and running ppp-2.4.0b2 with MTU=256. Today I took this kernel and
its modules and ran them on another machine with a Cel-450 and a 3c590B-TX
ethernet card. It also exhibits the loss of ACKs. My study shows that
it depends upon the MTU and the keepalive flag:

mtu 382 + keepalive yes -> loss
mtu 382 + keepalive no -> ok
mtu 383 + any keepalive -> ok

I tested several MTU values above and below 382, and it seems to me that
382 is the boundary between normal and erroneous behaviour.

Then I tested kernel 2.2.14-5.0 from the RedHat-6.2 distribution on
the same machine (Cel-450 + 3c59x driver). It also shows loss of ACKs,
but with a different MTU boundary, and independently of keepalive:

mtu <= 420 + any keepalive -> loss
mtu >= 421 + any keepalive -> ok

Finally, I tried several MTUs on a third computer running the "right" 2.2.17,
and could not find any conditions under which a loss of ACKs could be detected.

So the conclusion is that this loss depends upon the kernel version, the
configuration options and the MTU of the interface. I suspect this is
another bug, accidentally found in the course of discussion. :)

I conclude my statements with an illustrative dump. The commands were like:

ifconfig ppp0 mtu 256
ssh -o 'keepalive yes' 194.190.166.31 \
    'while true ; do cat /etc/passwd ; done' 2>&1 | less
tcpdump -nl host 194.190.166.31

[...]

10:20:11.196983 > 172.16.42.57.1023 > 194.190.166.31.ssh: . 655:655(0) ack 30120 win 15708 <nop,nop,timestamp 8830012 121274899> (DF) [tos 0x10]
10:20:11.266845 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121274900 8829912> (DF) [tos 0x10]
10:20:11.356837 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30324:30528(204) ack 655 win 32616 <nop,nop,timestamp 121274900 8829912> (DF) [tos 0x10]
10:20:11.426832 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30528:30732(204) ack 655 win 32616 <nop,nop,timestamp 121274902 8829919> (DF) [tos 0x10]
10:20:11.476844 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30732:30936(204) ack 655 win 32616 <nop,nop,timestamp 121274902 8829919> (DF) [tos 0x10]
10:20:11.546843 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30936:31140(204) ack 655 win 32616 <nop,nop,timestamp 121274962 8829928> (DF) [tos 0x10]
10:20:11.636840 < 194.190.166.31.ssh > 172.16.42.57.1023: P 31140:31344(204) ack 655 win 32616 <nop,nop,timestamp 121274962 8829928> (DF) [tos 0x10]
10:20:11.706843 < 194.190.166.31.ssh > 172.16.42.57.1023: P 31344:31548(204) ack 655 win 32616 <nop,nop,timestamp 121274963 8829935> (DF) [tos 0x10]
10:20:11.766854 < 194.190.166.31.ssh > 172.16.42.57.1023: P 31548:31752(204) ack 655 win 32616 <nop,nop,timestamp 121274963 8829935> (DF) [tos 0x10]
10:20:11.866832 < 194.190.166.31.ssh > 172.16.42.57.1023: P 31752:31956(204) ack 655 win 32616 <nop,nop,timestamp 121274964 8829939> (DF) [tos 0x10]
10:20:11.926839 < 194.190.166.31.ssh > 172.16.42.57.1023: P 31956:32160(204) ack 655 win 32616 <nop,nop,timestamp 121274964 8829939> (DF) [tos 0x10]
10:20:11.996837 < 194.190.166.31.ssh > 172.16.42.57.1023: P 32160:32364(204) ack 655 win 32616 <nop,nop,timestamp 121274984 8829949> (DF) [tos 0x10]
10:20:12.066835 < 194.190.166.31.ssh > 172.16.42.57.1023: P 32364:32568(204) ack 655 win 32616 <nop,nop,timestamp 121274984 8829949> (DF) [tos 0x10]
10:20:12.126850 < 194.190.166.31.ssh > 172.16.42.57.1023: P 32568:32772(204) ack 655 win 32616 <nop,nop,timestamp 121274985 8829956> (DF) [tos 0x10]
10:20:12.216832 < 194.190.166.31.ssh > 172.16.42.57.1023: P 32772:32976(204) ack 655 win 32616 <nop,nop,timestamp 121274985 8829956> (DF) [tos 0x10]
10:20:12.286854 < 194.190.166.31.ssh > 172.16.42.57.1023: P 32976:33180(204) ack 655 win 32616 <nop,nop,timestamp 121274986 8829962> (DF) [tos 0x10]
10:20:12.356846 < 194.190.166.31.ssh > 172.16.42.57.1023: P 33180:33384(204) ack 655 win 32616 <nop,nop,timestamp 121274986 8829962> (DF) [tos 0x10]
10:20:12.426838 < 194.190.166.31.ssh > 172.16.42.57.1023: P 33384:33588(204) ack 655 win 32616 <nop,nop,timestamp 121275007 8829969> (DF) [tos 0x10]
10:20:12.516835 < 194.190.166.31.ssh > 172.16.42.57.1023: P 33588:33792(204) ack 655 win 32616 <nop,nop,timestamp 121275007 8829969> (DF) [tos 0x10]
10:20:12.576830 < 194.190.166.31.ssh > 172.16.42.57.1023: P 33792:33996(204) ack 655 win 32616 <nop,nop,timestamp 121275008 8829976> (DF) [tos 0x10]
10:20:12.646843 < 194.190.166.31.ssh > 172.16.42.57.1023: P 33996:34200(204) ack 655 win 32616 <nop,nop,timestamp 121275008 8829976> (DF) [tos 0x10]
10:20:12.706842 < 194.190.166.31.ssh > 172.16.42.57.1023: P 34200:34404(204) ack 655 win 32616 <nop,nop,timestamp 121275009 8829982> (DF) [tos 0x10]
10:20:12.776850 < 194.190.166.31.ssh > 172.16.42.57.1023: P 34404:34608(204) ack 655 win 32616 <nop,nop,timestamp 121275009 8829982> (DF) [tos 0x10]
10:20:12.846842 < 194.190.166.31.ssh > 172.16.42.57.1023: P 34608:34812(204) ack 655 win 32616 <nop,nop,timestamp 121275029 8829990> (DF) [tos 0x10]
10:20:12.936834 < 194.190.166.31.ssh > 172.16.42.57.1023: P 34812:35016(204) ack 655 win 32616 <nop,nop,timestamp 121275030 8829999> (DF) [tos 0x10]
10:20:13.006840 < 194.190.166.31.ssh > 172.16.42.57.1023: P 35016:35220(204) ack 655 win 32616 <nop,nop,timestamp 121275031 8830006> (DF) [tos 0x10]
10:20:13.046850 < 194.190.166.31.ssh > 172.16.42.57.1023: P 35220:35424(204) ack 655 win 32616 <nop,nop,timestamp 121275032 8830012> (DF) [tos 0x10]
10:20:21.376855 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121275972 8830012> (DF) [tos 0x10]

And from here on, 194.190.166.31 retransmits until timeout:

10:20:40.146846 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121277852 8830012> (DF) [tos 0x10]
10:21:17.746854 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121281612 8830012> (DF) [tos 0x10]
10:22:32.956845 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121289132 8830012> (DF) [tos 0x10]
10:24:32.966837 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121301132 8830012> (DF) [tos 0x10]
10:26:32.986843 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121313132 8830012> (DF) [tos 0x10]
10:28:32.966854 < 194.190.166.31.ssh > 172.16.42.57.1023: P 30120:30324(204) ack 655 win 32616 <nop,nop,timestamp 121325132 8830012> (DF) [tos 0x10]
--
Eugene Berdnikov

2001-04-11 16:36:30

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> > If my guess is right, you can easily put this socket into a funny state
> > just by catting a large file and kill -STOP'ing ssh. ssh will close the window,
> > but sshd will not send zero-window probes.
>
> [1] I have checked your statement on 2 different machines running 2.2.17.
> I could not confirm it. But this is much funnier than it sounds. :)

_That_ socket which was stuck must show this behaviour.

To get this on a new socket you should leave the session idle for >2 hours
until the first keepalive. After this it will never probe under
any circumstances. The bug was that keepalive corrupts the state of the timer,
and the probe0 timer is not started after this.
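
For illustration, a minimal sketch of the fixed ACK path (my reading of the
tcp_input.c hunk in the patch; it assumes 2.2 multiplexes its TCP timers
through the single tp->pending slot, and everything marked _sketch is an
invented name; only the fields and TIME_KEEPOPEN come from the patch itself):

  #define TIME_KEEPOPEN 3                 /* assumed value, illustration only */

  struct tcp_opt_sketch {
          int pending;                    /* which timer owns the single slot */
          int probes_out;                 /* unanswered probes/keepalives */
  };

  static void tcp_ack_keepalive_sketch(struct tcp_opt_sketch *tp)
  {
          if (tp->pending == TIME_KEEPOPEN) {
                  tp->probes_out = 0;     /* the peer answered the keepalive */
                  tp->pending = 0;        /* the fix: release the timer slot.
                                           * If pending stays TIME_KEEPOPEN,
                                           * the slot looks busy forever and
                                           * the zero-window probe timer is
                                           * never armed again - the stall
                                           * reported above. */
          }
  }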


> input buffer is full and the client is NOT stopped! :))) Hence the connection
> dies due to retransmission timeout on the server side.

It is a known linuxism. If the ratio connection_mss/link_mtu is less than ~1/4,
or the connection is flooded with tiny packets, then after rcvbuf is full linux
enters a memory-paranoia mode, pretending that all the packets are lost.
Ugly, unpleasant, but luckily harmless under any normal circumstances.

One workaround is to set rx_copybreak on the ethernet drivers to 400-500.
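
To see what that ratio means in bytes, here is a back-of-envelope user-space
calculation (illustrative only, not kernel code; the 1536-byte per-frame cost
and the 64k rcvbuf are assumed round numbers):

  #include <stdio.h>

  /* Each received full-sized buffer costs roughly frame_mem bytes of
   * socket memory no matter how few payload bytes the segment carries. */
  int main(void)
  {
          const int rcvbuf    = 65536;              /* assumed receive buffer */
          const int frame_mem = 1536;               /* assumed cost per frame */
          const int mss[]     = { 1460, 342, 216 }; /* mtu-40 for 1500/382/256 */

          for (int i = 0; i < 3; i++) {
                  int frames  = rcvbuf / frame_mem; /* frames until rcvbuf is full */
                  int payload = frames * mss[i];    /* payload actually buffered */
                  printf("mss %4d: full after %d frames, %5d payload bytes (%d%% of rcvbuf)\n",
                         mss[i], frames, payload, 100 * payload / rcvbuf);
          }
          return 0;
  }

With mss 342 (the mtu 382 case) the buffer is exhausted while holding only
about a fifth of what the advertised window promised, in line with the ~1/4
ratio above; raising rx_copybreak helps because frames shorter than that
threshold are copied into right-sized buffers, restoring the ratio.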

The bug is really difficult. It is not cured even in current 2.4
(only with the zerocopy patch).


> I do not understand how a connection with a closed window can wait until the
> first keepalive - it should send zero-window probes instead.

If a socket has ever sent a keepalive, it will not be able to send zero-window
probes after that.


> Hmm... I observed this bug on a host which never handles more
> than 10 conn/sec and has a peak loadavg of ~0.15.

8)8)8) Probability is probability.

Alexey

2001-04-11 16:57:11

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> Finally, I tried several MTUs on a third computer running the "right" 2.2.17,
> and could not find any conditions under which a loss of ACKs could be detected.

8)8)8)

ppp is also prone to the mss/mtu bug: it allocates too-large buffers
and never splits them. The difference between kernels looks funny, but
I think it is explained by the differences between their mss/mtu values.

Alexey

[ I will be away for some time starting tomorrow. ]

2001-04-11 18:36:11

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

On Wed, Apr 11, 2001 at 08:56:41PM +0400, [email protected] wrote:
> ppp is also prone to the mss/mtu bug: it allocates too-large buffers
> and never splits them. The difference between kernels looks funny, but
> I think it is explained by the differences between their mss/mtu values.

In my experiments linux simply sets mss = mtu - 40 (the 20-byte IP header
plus the 20-byte TCP header) at the start of ethernet connections. I do not
know why, but believe it's ok. How can the kernel version and configuration
options affect the mss later?
--
Eugene Berdnikov

2001-04-11 18:51:44

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

On Wed, Apr 11, 2001 at 08:35:51PM +0400, [email protected] wrote:
> To get this on a new socket you should leave the session idle for >2 hours
> until the first keepalive. After this it will never probe under
> any circumstances. The bug was that keepalive corrupts the state of the timer,
> and the probe0 timer is not started after this.

Maybe. However, I did not understand whether you have any reasonable
explanation of how I could get such a socket. Indeed, I was dealing with an
active connection: I was tracing a squid redirector at a peak time of user
web activity. Several lines of log per second. That's why I was surprised
when the window became frozen.

If your model does not cover such a situation, please keep it in mind. :)
--
Eugene Berdnikov

2001-04-11 19:05:36

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> In my experiments linux simply sets mss = mtu - 40 at the start of ethernet
> connections. I do not know why, but believe it's ok. How can the kernel
> version and configuration options affect the mss later?

You can figure this out yourself. In fact, you have measured it.

With mss=1460 the problem does not exist.

The problem begins, e.g., when the mss is smaller and a packet arrives on
ethernet. It eats the same 1.5k of memory, but carries only ~mss bytes of tcp
payload. See? We do not know this in advance, advertise a large window, do not
have enough rcvbuf to get it filled, and cannot do anything but drop new packets.

ppp is more difficult. Actually, I do not know exactly how it works now.
At least, ppp in 2.4 trims the skb if it has too much unused space.

Alexey

2001-04-11 19:10:06

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> If your model does not cover such a situation, please keep it in mind. :)

Taken.

Is the machine UP? The only other known dubious place is SMP-specific...

BTW, if that cursed socket is still alive, try the experiment of filling
the window on it. It must get stuck, or my theory is completely wrong.

Alexey

2001-04-11 19:19:08

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

On Wed, Apr 11, 2001 at 11:09:35PM +0400, [email protected] wrote:
> Is the machine UP? The only other known dubious place is SMP-specific...

It is an HP NetServer E40 with a single PPro-180. SMP is turned off in .config.

> BTW, if that cursed socket is still alive, try the experiment of filling
> the window on it. It must get stuck, or my theory is completely wrong.

OK. I'll try tomorrow when I return to my workplace. Both peer hosts are
on UPSes, so the chance of losing this connection is low. :)
--
Eugene Berdnikov

2001-04-11 19:29:29

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

On Wed, Apr 11, 2001 at 11:04:04PM +0400, [email protected] wrote:
> > In my experiments linux simply sets mss = mtu - 40 at the start of ethernet
> > connections. I do not know why, but believe it's ok. How can the kernel
> > version and configuration options affect the mss later?
[...]
> The problem begins, e.g., when the mss is smaller and a packet arrives on
> ethernet. It eats the same 1.5k of memory, but carries only ~mss bytes of tcp
> payload. See? We do not know this in advance, advertise a large window, do not
> have enough rcvbuf to get it filled, and cannot do anything but drop new packets.

However, I can't understand the dependency on the kernel version, etc...

Let me stick with this question a little longer. In my experiments I found a
dependency on the keepalive setting for a connection on 2.2.17:

mtu 382 + keepalive yes -> loss
mtu 382 + keepalive no -> ok

I made two tries for each setting. Does your model of the "mss/mtu bug" cover
such a picture? If the answer is "yes", I am almost satisfied. :-)

If this behaviour is not deterministic and is driven by probability, does it
mean that I might get different results with a large number of tests?
--
Eugene Berdnikov

2001-04-11 19:38:00

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> mtu 382 + keepalive yes -> loss
> mtu 382 + keepalive no -> ok

Well, I ignored this because it looked like complete nonsense. Sorry. 8)

> such a picture? If the answer is "yes", I am almost satisfied. :-)

No, the answer is a strict "no". Until the keepalive is triggered for the
first time, it cannot affect the connection in _any_ way.


... sorry, I have to run. Let's defer further investigation.

Alexey

2001-04-13 08:55:04

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

On Wed, Apr 11, 2001 at 11:09:35PM +0400, [email protected] wrote:
> BTW, if that cursed socket is still alive, try the experiment of filling
> the window on it. It must get stuck, or my theory is completely wrong.

Filling the socket by writing to the pty (controlled by sshd), I found a
state which seems very similar to the one I reported:

# netstat -n -eot | grep 1018
tcp 0 37684 194.190.166.31:22 194.190.161.106:1018 ESTABLISHED 0 11964 off (0.00/0/0)

You see, the timers are zero and send-q is not. Zero-window probes were NOT
observed, exactly as you predicted, but the keepalive works correctly.

However, this is _not_ a stalled state. When I resume ssh on 194.190.166.31,
the buffer empties and the connection behaves normally. I performed this
experiment both waiting for keepalive packets from both sides and resuming
ssh before the keepalives. In neither case did the connection become stale.

So my conclusion is that your statement about the zero-probe breakdown due
to keepalives is right, and, I hope, your patch also does the right thing.
However, this does not answer the question of how such a stale connection
could arise, and the predicted mechanism for getting it "stuck" does not work.

[I hope we will continue this discussion later.]
--
Eugene Berdnikov

2001-04-21 15:45:27

by Eugene B. Berdnikov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello.

On Wed, Apr 18, 2001 at 11:28:43PM +0400, [email protected] wrote:
> > However, this is _not_ a stalled state. When I resume ssh on 194.190.166.31,
> > the buffer empties and the connection behaves normally. I performed this
> > experiment both waiting for keepalive packets from both sides and resuming
> > ssh before the keepalives. In neither case did the connection become stale.
>
> Yes, I have said that it is practically impossible to reproduce this.
> My guess was that it is due to inaccurate counting of sacks when path
> mtu discovery happens or when segments are fragmented due to SWS avoidance
> override.

In my case P-MTU discovery and fragmentation can be ruled out, but
sacks are really frequent: my hosts are connected via a poor leased line.

> Actually, the most dubious place is your statement that this connection
> was not idle for 2 hours. That is a _necessary_ condition
> for my scenario to work...

I only wrote that it was active when it got stuck. It may have been idle
before - I do not remember, but I have a habit of keeping connections open
for weeks. :)

As my experiments show, any connection that enters keepalive once
loses its ability to send zero-window probes - forever.

> > [I hope we will continue this discussion later.]
>
> I am ready.

OK. Let us return to the "mss/mtu bug". The most mystifying thing for
me is the dependence of the MTU threshold on the kernel version, etc.

I also wrote that it depends on the keepalive flag. It seems that was a
mistake. My additional experiments show that there is no distinct MTU
threshold: trying the same value many times, I observed a loss of ACKs in
some cases and not in others. So the MTU boundary is not strict. Well, so be it.

But the question is: what is the minimum "reliable" MTU? There are lots of
situations where data comes rapidly in small packets (say, monitoring logs).
Is there a danger of losing such connections on a heavily loaded host?
--
Eugene Berdnikov

2001-04-21 17:05:59

by Alexey Kuznetsov

Subject: Re: Bug report: tcp stalled when send-q != 0, timers == 0.

Hello!

> In my case P-MTU discovery

Sorry, I lied. Not pmtu discovery but exactly the opposite effect
is important here: the collapsing of small frames into larger ones.
Each such merge results in the loss of one "sack" in 2.2.
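
[ This is the case the tcp_output.c hunk of the patch addresses: at the
point where the collapse does packets_out--, the fix now decrements
fackets_out as well, so the sack accounting survives the merge. ]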

> I only wrote that it was active when it got stuck. It may have been idle
> before - I do not remember, but I have a habit of keeping connections open
> for weeks. :)

Good. 8)

> As my experiments show, any connection that enters keepalive once
> loses its ability to send zero-window probes - forever.

Exactly.


> OK. Let us return to the "mss/mtu bug". The most mystifying thing for
> me is the dependence of the MTU threshold on the kernel version, etc.

Well, you can reinvestigate this to get more reliable results...

Actually, this problem is so difficult that the study would be purely
academic; there is no hope of fixing it in 2.2. It was partially
repaired during 2.3 and completely resolved only in 2.4.4.


> But the question is: what is the minimum "reliable" MTU? There are lots of
> situations where data comes rapidly in small packets (say, monitoring logs).
> Is there a danger of losing such connections on a heavily loaded host?

There is no real danger. Bad things can happen only when the receiver does
not read data for a very long time; in that case the connection times out
without receiving any ACKs.

As for a minimum/maximum mtu... it does not exist. E.g., if the sender floods
1-byte frames in TCP_NODELAY mode and the receiver does not read them, 2.2 will
fail regardless of the mtu. See? Even the 40 bytes of IP+TCP headers (not
counting additional overhead) guarantee that memory will be exhausted an order
of magnitude earlier than the receiver can close the window.
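
To put rough (assumed) numbers on it: a 32120-byte advertised window can
absorb 32120 one-byte segments; at 41 bytes each on the wire, plus per-skb
bookkeeping, that is 32120 * 41 ~ 1.3 MB of memory against a receive buffer
of a few tens of kilobytes - a 20-40x overshoot before the window accounting
alone would ever close it.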

Alexey