2007-10-13 18:18:18

by Peter Volkov

Subject: regression(?): starting with 2.6.21 sending packets became broken.

Hello, all on the list.

Please CC me on replies, as I'm not subscribed. If this is the wrong
list, please tell me which one is correct.

Starting with kernel 2.6.21 (or maybe 2.6.20, which I have not tried),
most TCP-based services freeze at some point during operation. I first
noticed this with ssh, but then found that at least one other service
behaves the same way. The problem sits somewhere in the kernel: I
compiled 2.6.19, 2.6.21, and 2.6.22 with similar .config options (not
identical, since some options do not exist in every kernel, but the
enabled options appear to be the same), and only 2.6.21 and 2.6.22 show
the problem. I have tried to debug it a bit, but not much, as this is a
production box working as a Linux-based firewall/router.

First I took a tcpdump. An ssh connection to the router is not always
possible, as it often hangs before I get in, but after some attempts a
connection was established. On the client computer I started tcpdump
and worked until the session hung. The tcpdump output showed that when
I press keys, packets are sent to the server and proper ACKs are
received. Later I found that all commands I enter blindly are executed
on the router, but I receive no reply packets carrying data (only pure
ACKs). That is why nothing happens on the screen and the session looks
hung.

Next I went to the router and started an ssh connection from the router
to another server. It hung too. I attached strace and found that ssh
receives the key presses (read() calls in the output) and writes them
on to the kernel (write() calls), but tcpdump on the router shows no
packets. So the packets enter the kernel and are lost somewhere inside.
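
As a rough sketch, the two observations above correspond to commands
along these lines. The helper names, the interface name eth0, port 22
and the PID are assumptions for illustration, not values from this
report; the helpers only print the command so it can be reviewed before
being run as root.

```shell
# Sketch only: capture_cmd and trace_cmd are hypothetical helpers that
# print a command line instead of executing it.

capture_cmd() {
    iface="$1"
    port="${2:-22}"
    # -n: no name resolution, -i: capture interface,
    # followed by a pcap filter matching the TCP port.
    printf 'tcpdump -n -i %s tcp port %s\n' "$iface" "$port"
}

trace_cmd() {
    pid="$1"
    # -p attaches to a running process; -e trace= limits the output to
    # the read() and write() system calls.
    printf 'strace -p %s -e trace=read,write\n' "$pid"
}
```

For example, `capture_cmd eth0` prints the capture command for ssh
traffic, and `trace_cmd 1234` prints the strace command for an ssh
client with the (assumed) PID 1234.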

Now some information about my system. It is a Pentium 4 system with
hyper-threading enabled; cpuinfo and lspci output are attached. The
kernel was built with "gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)" and
binutils version 2.17. The .config files for all the kernels I have
mentioned are available here:

http://theor.ran.gpi.ru/linux-2.6.19-gentoo-r5-config (works)
http://theor.ran.gpi.ru/linux-2.6.21-gentoo-r4-config (does not work)
http://theor.ran.gpi.ru/linux-2.6.22-gentoo-r8-config (does not work)

Besides the standard Gentoo patchsets, all kernels have the IMQ and
IPSET patches.

Does anybody have any idea what is going on with the latest kernels?
How can I debug this further?

--
Peter.


Attachments:
router-lspci.txt (1.15 kB)
routers-cpuinfo.txt (1.36 kB)
signature.asc (189.00 B)
This part of the message is digitally signed

2007-10-13 19:32:17

by David R

Subject: Re: regression(?): starting with 2.6.21 sending packets became broken.

Peter Volkov wrote:
> Hello, all on the list.
>
> Please CC me on replies, as I'm not subscribed. If this is the wrong
> list, please tell me which one is correct.
>
> Starting with kernel 2.6.21 (or maybe 2.6.20, which I have not tried),
> most TCP-based services freeze at some point during operation. I first
> noticed this with ssh, but then found that at least one other service
> behaves the same way. The problem sits somewhere in the kernel: I
> compiled 2.6.19, 2.6.21, and 2.6.22 with similar .config options (not
> identical, since some options do not exist in every kernel, but the
> enabled options appear to be the same), and only 2.6.21 and 2.6.22 show
> the problem. I have tried to debug it a bit, but not much, as this is a
> production box working as a Linux-based firewall/router.
>
Try

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

I bet you have broken router(s) between your machine and the problem
site(s).

Cheers
David
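
A minimal sketch of trying this reversibly, assuming the standard /proc
path above. The function names and the location of the saved-value file
are hypothetical. Writing to /proc requires root, and the setting does
not persist across reboots.

```shell
# Sketch only: toggle TCP window scaling while remembering the old value.
# SYSCTL_FILE, SAVED_VALUE_FILE and both function names are assumptions
# made for illustration.
SYSCTL_FILE="${SYSCTL_FILE:-/proc/sys/net/ipv4/tcp_window_scaling}"
SAVED_VALUE_FILE="${SAVED_VALUE_FILE:-/tmp/tcp_window_scaling.saved}"

disable_window_scaling() {
    # Save the current setting first, then write 0 to switch scaling off.
    cat "$SYSCTL_FILE" > "$SAVED_VALUE_FILE" || return 1
    echo 0 > "$SYSCTL_FILE"
}

restore_window_scaling() {
    # Write back whatever value was active before the test.
    [ -f "$SAVED_VALUE_FILE" ] || return 1
    cat "$SAVED_VALUE_FILE" > "$SYSCTL_FILE"
}
```

Call disable_window_scaling, retry the failing connection, then call
restore_window_scaling once the test is done.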

2007-10-13 20:35:36

by Jan Engelhardt

Subject: Re: regression(?): starting with 2.6.21 sending packets became broken.


On Oct 13 2007 19:59, David wrote:
>Try
>
>echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
>
>I bet you have broken router(s) between your machine and the problem
>site(s).

There is an xt_TCPOPTSTRIP module in the works that allows you to strip
Window Scaling only on the connections you want (rather than globally);
it seems to be slated for 2.6.24 at the earliest, though there is also
a standalone patch.

2007-10-13 23:24:31

by Stephen Hemminger

Subject: Re: regression(?): starting with 2.6.21 sending packets became broken.

On Sat, 13 Oct 2007 22:35:25 +0200 (CEST)
Jan Engelhardt <[email protected]> wrote:

>
> On Oct 13 2007 19:59, David wrote:
> >Try
> >
> >echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
> >
> >I bet you have broken router(s) between your machine and the problem
> >site(s).
>
> There is an xt_TCPOPTSTRIP module in the works that allows you to strip
> Window Scaling only on the connections you want (rather than globally);
> it seems to be slated for 2.6.24 at the earliest, though there is also
> a standalone patch.

You can also do it on a per-route basis, which is easier than bothering
with filtering rules: just enforce a window size limit.

ip route add {broken_dst}/32 via {gateway} window 65535


Long description at:
http://lwn.net/Articles/92727/
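
A minimal sketch of filling in the template above. build_window_route
is a hypothetical helper that only prints the command; the addresses in
the usage line are placeholders from the documentation ranges. 65535 is
the largest window that fits the 16-bit TCP window field without
scaling.

```shell
# Sketch only: print (do not run) a host route with a clamped TCP window.
# build_window_route is a hypothetical helper name.
build_window_route() {
    dst="$1"
    gateway="$2"
    window="${3:-65535}"   # default: largest unscaled TCP window
    printf 'ip route add %s/32 via %s window %s\n' "$dst" "$gateway" "$window"
}
```

For example, `build_window_route 192.0.2.10 198.51.100.1` prints the
route command for one broken destination; review it, then run it as
root.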

--
Stephen Hemminger <[email protected]>

2007-10-28 08:34:19

by Peter Volkov

Subject: Re: regression(?): starting with 2.6.21 sending packets became broken.

Hello, David, Jan.

On Sat, 13/10/2007 at 19:59 +0100, David wrote:
> Try
>
> echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
>
> I bet you have broken router(s) between your machine and the problem
> site(s).

Thank you for your help, but it turns out that the problem comes from
the IMQ patch. I reported it on the imqlinux mailing list, but that
list seems to be closed to those who do not have a Yahoo ID, so I am
copying my report from there, and the answer, here.

===========my investigation of the problem=================

This bug was reported in our bugzilla ( bugs.gentoo.org/195731 ), but
since I found that IMQ is the root of the problem, I wanted to share my
experience here.

Starting with kernel 2.6.21 (or maybe 2.6.20, which I have not tried),
most TCP-based services freeze at some point during operation. I first
noticed this with ssh, but then found that at least one other service
behaves the same way. The problem sits somewhere in the kernel: I
compiled 2.6.19, 2.6.21, and 2.6.22 with similar .config options (not
identical, since some options do not exist in every kernel, but the
enabled options appear to be the same), and only 2.6.21 and 2.6.22 show
the problem. I have tried to debug it a bit, but not much, as this is a
production box working as a Linux-based firewall/router.

First I took a tcpdump. An ssh connection to the router is not always
possible, as it often hangs before I get in, but after some attempts a
connection was established. On the client computer I started tcpdump
and worked until the session hung. The tcpdump output showed that when
I press keys, packets are sent to the server and proper ACKs are
received. Later I found that all commands I enter blindly are executed
on the router, but I receive no reply packets carrying data (only pure
ACKs). That is why nothing happens on the screen and the session looks
hung.

Next I went to the router and started an ssh connection from the router
to another server. It hung too. I attached strace and found that ssh
receives the key presses (read() calls in the output) and writes them
on to the kernel (write() calls), but tcpdump on the router shows no
packets. So the packets enter the kernel and are lost somewhere inside.

This problem was reproduced both on a single-core amd64 system and on
an x86 system with hyper-threading, so I suspect anybody could
reproduce it. Just start `yes`, which produces a lot of output, and
then press Ctrl+C to interrupt it. Here the session hung at about that
moment.

The suggestion "echo 0 > /proc/sys/net/ipv4/tcp_window_scaling" did not
help here.

The end of the story is that I localized the problem to this IMQ patch:
http://www.linuximq.net/patchs/linux-2.6.21-img2.diff
With clean Gentoo sources the connection does not freeze; with this
patch applied I have problems. BTW, this patch
http://www.actusa.net/~linuximq/linux-2.6.23-imq.diff does not show the
problem here either.

Thank you for your attention. Maybe it would be a good idea to mark
that patch as questionable on the site?

===========================================================


======== vlad031's answer on the linuximq mailing list ====

Re: regression(?): starting with 2.6.21 sending packets became broken

We already know that ... almost everybody does ... that's why we have
uploaded the good patches here:
http://www.actusa.net/~linuximq/

Andree doesn't seem to take care of this project anymore...

Please note that the 2.6.23 kernel has a lot of bugs and we don't
recommend using it yet, as we had some IMQ-related problems with it too
(however, they don't seem to come from the IMQ patch)

Cheers!

===========================================================

--
Peter.


Attachments:
signature.asc (189.00 B)
This part of the message is digitally signed