2001-02-21 13:09:47

by Markus Germeier

Subject: Problem with 2.2.19pre9 (Connection closed.)

Hello,

after upgrading to 2.2.19pre9 (+ 2 NFS-patches, IPv6 enabled) idle
connections tend to shut down without a visible reason:

client->ssh server
Last login: Mon Feb 19 2001 18:01:12 from client.domain
Sun Microsystems Inc. SunOS 5.8 Generic February 2000
You have mail.
server->Disconnected; connection lost (Connection closed.).
Connection to server closed.
client->uname -a
Linux client 2.2.19pre9 #6 Thu Feb 15 09:26:46 MET 2001 i686 unknown

Here is the relevant part of a tcpdump from this session: (The whole
dump can be found here: http://www.tzi.de/~mager/linux.tcpdump )

[...]
10:15:17.336759 eth0 < server.domain.ssh > client.domain.afbackup: P 2242:2294(52) ack 1790 win 24616 <nop,nop,timestamp 232542776 43462051> (DF)
10:15:17.350719 eth0 > client.domain.afbackup > server.domain.ssh: . 1790:1790(0) ack 2294 win 31856 <nop,nop,timestamp 43462055 232542776> (DF)
[...]
12:15:17.158963 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233262778 43462055> (DF)
12:15:18.506350 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233262913 43462055> (DF)
12:15:21.216267 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233263184 43462055> (DF)
12:15:26.636121 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233263726 43462055> (DF)
12:15:37.475848 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233264810 43462055> (DF)
12:15:59.155272 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233266978 43462055> (DF)
12:16:42.514137 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233271314 43462055> (DF)
12:17:42.512562 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233277314 43462055> (DF)
12:18:42.511239 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233283314 43462055> (DF)
12:19:42.509606 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233289314 43462055> (DF)
12:20:42.508124 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233295314 43462055> (DF)
12:21:42.506655 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233301314 43462055> (DF)
12:22:42.505090 eth0 < server.domain.ssh > client.domain.afbackup: P 2293:2294(1) ack 1790 win 24616 <nop,nop,timestamp 233307314 43462055> (DF)
[...]
14:23:10.527642 eth0 > client.domain.afbackup > server.domain.ssh: . 1789:1789(0) ack 2294 win 31856 <nop,nop,timestamp 44949377 233307314> (DF)
14:23:10.528676 eth0 < server.domain.ssh > client.domain.afbackup: R 2292303093:2292303093(0) win 0 (DF)
[...]
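
The spacing of the repeated 1-byte segment (2293:2294) in the dump is
easier to judge as numbers. A minimal sketch, assuming the timestamps are
copied by hand from the lines above; the names STAMPS and gaps_seconds are
made up for illustration and do not belong to any tool:

```python
from datetime import datetime

# Arrival times of the repeated 1-byte segment (2293:2294),
# copied by hand from the tcpdump excerpt above.
STAMPS = [
    "12:15:17.158963",
    "12:15:18.506350",
    "12:15:21.216267",
    "12:15:26.636121",
    "12:15:37.475848",
    "12:15:59.155272",
    "12:16:42.514137",
    "12:17:42.512562",
]


def gaps_seconds(stamps):
    """Return the gaps, in seconds, between consecutive HH:MM:SS.ffffff stamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = gaps_seconds(STAMPS)
for gap in gaps:
    print(f"{gap:.2f}")
```

The gaps come out at roughly 1.35, 2.71, 5.42, 10.84, 21.68 and 43.36
seconds, followed by 60 seconds: each retry waits about twice as long as
the previous one until the interval levels off at one minute. The excerpt
shows no packet from the client between these segments.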

Any ideas?

Thanks a lot for your help!!

Regards
Markus

--
Markus Germeier
[email protected]


2001-02-21 13:18:37

by Alan

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

> after upgrading to 2.2.19pre9 (+ 2 NFS-patches, IPv6 enabled) idle
> connections tend to shut down without a visible reason:

Yes I've seen this too. It seems that the tcp changes broke the keepalive
handling somewhere when I leave a non Linux target idle.

Dave - any ideas, shall we back it out and work on it for 2.2.20 ?

2001-02-21 13:34:20

by Jes Sorensen

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

>>>>> "Alan" == Alan Cox <[email protected]> writes:

>> after upgrading to 2.2.19pre9 (+ 2 NFS-patches, IPv6 enabled) idle
>> connections tend to shut down without a visible reason:

Alan> Yes I've seen this too. It seems that the tcp changes broke the
Alan> keepalive handling somewhere when I leave a non Linux target
Alan> idle.

I reported this on netdev last week as well.

I only see this for connections with incoming traffic where I don't
send something out (like irc), whereas unused ssh connections seem to
survive fine.

Hopefully another hint that helps nail down the bug.

Jes

2001-02-21 13:56:33

by Markus Germeier

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

Jes Sorensen <[email protected]> writes:

> I only see this for connections with incoming traffic where I don't
> send something out (like irc), whereas unused ssh connections seem to
> survive fine.

Just for the record: My example was an idle ssh connection!

I believe Alan is correct. I can't remember having this problem with
another linux box. I'll try to reproduce this with a linux box.

Thanks for your quick responses. Hopefully we can resolve this before
2.2.19 comes out!

Regards
Markus

--
Markus Germeier
[email protected]

2001-02-21 15:49:29

by Jes Sorensen

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

>>>>> "Markus" == Markus Germeier <[email protected]> writes:

Markus> Jes Sorensen <[email protected]> writes:
>> I only see this for connections with incoming traffic where I don't
>> send something out (like irc), whereas unused ssh connections seem
>> to survive fine.

Markus> Just for the record: My example was an idle ssh connection!

Markus> I believe Alan is correct. I can't remember having this
Markus> problem with another linux box. I'll try to reproduce this
Markus> with a linux box.

Hmmm I am seeing this with Linux boxes at the other end. Haven't tried
talking to non Linux.

Jes

2001-02-21 21:57:15

by David Miller

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)


Alan Cox writes:
> Dave - any ideas, shall we back it out and work on it for 2.2.20 ?

The one change which is probably causing this is non-critical,
so let me study things quickly tonight and if I come up with
nothing I'll show you what you can revert safely.

Later,
David S. Miller
[email protected]

2001-02-22 13:13:45

by Markus Germeier

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

Hi all,

I did some further investigation and found the following:

It seems to me that this is a linux <-> solaris problem. I have no
problems with AIX 4.1.4, IRIX 6.5 or WIN2K. However all solaris boxes
I have access to (2.6, 7, 8, sparc and intel) give me a "connection
closed" after 2h, which is (at least I believe so ;-) the TCP timer
for keepalive.

Tell me if I can provide you with further data to nail down this bug.

Jes: I thought about your information that ssh connections do not show
this problem. I believe you are using ssh 2.3 or 2.4 from ssh.com,
right? 2.3 introduced a rekeying-feature which exchanges new keys
every 60 minutes, so the TCP keepalive is never triggered. (Due to a
bug which is still present in 2.4, we can't use these versions at my
site.)
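
For reference, keepalive is something an application opts into per
socket, and the 2h figure matches the usual system-wide default idle time
of 7200 seconds. A minimal sketch of how a program on a current Linux
system might enable it and shorten the timers; the helper name
enable_keepalive and the timer values are made up for illustration, the
TCP_KEEP* options are not available on every platform (hence the check),
and none of this works around the kernel problem discussed in this thread:

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive and, where the platform allows, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific, so only set
    # the ones this Python build actually exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),      # idle time before first probe
                        ("TCP_KEEPINTVL", interval_s),  # time between probes
                        ("TCP_KEEPCNT", probes)):       # unanswered probes before giving up
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


# Illustrative use on an unconnected socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```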

HTH.

Regards,
Markus

--
Markus Germeier
[email protected]

2001-02-22 13:25:17

by Jes Sorensen

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

>>>>> "Markus" == Markus Germeier <[email protected]> writes:

Markus> Tell me if I can provide you with further data to nail down
Markus> this bug.

Alan forwarded a patch to me from DaveM which fixed it for me.

Markus> Jes: I thought about your information that ssh connections do
Markus> not show this problem. I believe you are using ssh 2.3 or 2.4
Markus> from ssh.com, right? 2.3 introduced a rekeying-feature which
Markus> exchanges new keys every 60 minutes, so the TCP keepalive is
Markus> never triggered. (Due to a bug which is still present in 2.4,
Markus> we can't use these versions at my site.)

No way, I don't use software from those slimeballs, I use OpenSSH.

The problems I was seeing were much more frequent than every 2 hrs, more
like every 10-15 mins. Anyway it seems it got fixed.

Jes

Subject: Re: Problem with 2.2.19pre9 (Connection closed.)

On Thu, 22 Feb 2001, Jes Sorensen wrote:
> Alan forwarded a patch to me from DaveM which fixed it for me.

Jes, could you forward it here as well?

> The problems I was seeing were much more frequent than every 2 hrs, more
> like every 10-15 mins. Anyway it seems it got fixed.

I've seen IRC sessions getting dropped every 10-15 minutes as well, and
about 70% outgoing http connections (to a FreeBSD proxy) dropping in less
than 1 second. It was so bad here I had to downgrade to 2.2.18 (which fixed
all the issues, so I'm pretty sure it was kernel trouble and not network
trouble).

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh