2014-06-11 18:46:52

by Luis Henriques

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Hi Willy,

On Mon, May 12, 2014 at 02:32:59AM +0200, Willy Tarreau wrote:
> 2.6.32-longterm review patch. If anyone has any objections, please let me know.
>

During Ubuntu Lucid kernel regression testing, after the merge of
2.6.32.62, we found problems with the following patches

[ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary
(Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da)

[ 065/143] net: check net.core.somaxconn sysctl values
(Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509)

The following two stack traces were found in kernel logs:

[ 0.199908] sysctl table check failed: /net/core/somaxconn .3.1.18 Missing strategy
[ 0.201100] Pid: 1, comm: swapper Not tainted 2.6.32-02063262-generic #201405200837
[ 0.202173] Call Trace:
[ 0.202523] [<ffffffff8108e419>] set_fail+0x59/0x60
[ 0.203213] [<ffffffff8108e74b>] sysctl_check_table+0x16b/0x4b0
[ 0.204065] [<ffffffff8108e75c>] sysctl_check_table+0x17c/0x4b0
[ 0.204879] [<ffffffff8108e75c>] sysctl_check_table+0x17c/0x4b0
[ 0.205697] [<ffffffff810712dd>] __register_sysctl_paths+0x11d/0x360
[ 0.206709] [<ffffffff8108e75c>] ? sysctl_check_table+0x17c/0x4b0
[ 0.207552] [<ffffffff81528af1>] register_net_sysctl_table+0x61/0x70
[ 0.208425] [<ffffffff814566d5>] sysctl_core_net_init+0x45/0xb0
[ 0.209297] [<ffffffff81455af8>] register_pernet_operations+0x48/0x100
[ 0.210119] [<ffffffff8187b6ee>] ? sysctl_core_init+0x0/0x38
[ 0.210867] [<ffffffff81455c5c>] register_pernet_subsys+0x2c/0x50
[ 0.211699] [<ffffffff8187b724>] sysctl_core_init+0x36/0x38
[ 0.212448] [<ffffffff8100a04c>] do_one_initcall+0x3c/0x1a0
[ 0.213324] [<ffffffff818446d1>] do_basic_setup+0x54/0x66
[ 0.214563] [<ffffffff818447f1>] kernel_init+0x10e/0x156
[ 0.215766] [<ffffffff810131ea>] child_rip+0xa/0x20
[ 0.216882] [<ffffffff818446e3>] ? kernel_init+0x0/0x156
[ 0.218099] [<ffffffff810131e0>] ? child_rip+0x0/0x20

and

[ 0.398433] sysctl table check failed: /net/ipv4/ip_no_pmtu_disc .3.5.39 Missing strategy
[ 0.398437] Pid: 1, comm: swapper Not tainted 2.6.32-02063262-generic #201405200837
[ 0.398438] Call Trace:
[ 0.398444] [<ffffffff8108e419>] set_fail+0x59/0x60
[ 0.398446] [<ffffffff8108e74b>] sysctl_check_table+0x16b/0x4b0
[ 0.398447] [<ffffffff8108e75c>] sysctl_check_table+0x17c/0x4b0
[ 0.398449] [<ffffffff8108e75c>] sysctl_check_table+0x17c/0x4b0
[ 0.398452] [<ffffffff810712dd>] __register_sysctl_paths+0x11d/0x360
[ 0.398455] [<ffffffff811a21d8>] ? __proc_create+0xd8/0x130
[ 0.398459] [<ffffffff8187d106>] ? sysctl_ipv4_init+0x0/0x4e
[ 0.398461] [<ffffffff8107154b>] register_sysctl_paths+0x2b/0x30
[ 0.398463] [<ffffffff8187d122>] sysctl_ipv4_init+0x1c/0x4e
[ 0.398466] [<ffffffff8100a04c>] do_one_initcall+0x3c/0x1a0
[ 0.398469] [<ffffffff818446d1>] do_basic_setup+0x54/0x66
[ 0.398470] [<ffffffff818447f1>] kernel_init+0x10e/0x156
[ 0.398473] [<ffffffff810131ea>] child_rip+0xa/0x20
[ 0.398474] [<ffffffff818446e3>] ? kernel_init+0x0/0x156
[ 0.398476] [<ffffffff810131e0>] ? child_rip+0x0/0x20

and here's a bug link:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326473

For the Ubuntu Lucid kernel, we ended up reverting the offending
commits. Since I was able to reproduce this problem with a vanilla
2.6.32.62, you may want to take a similar action for the next 2.6.32
release.

Cheers,
--
Lu?s

> ------------------
>
> From: Michal Tesar <[email protected]>
>
> [ Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da ]
>
> Limit the min/max value passed to the
> /proc/sys/net/ipv4/tcp_syn_retries.
>
> Signed-off-by: Michal Tesar <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Willy Tarreau <[email protected]>
> ---
> net/ipv4/sysctl_net_ipv4.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 2dcf04d..910fa54 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -23,6 +23,8 @@
>
> static int zero;
> static int tcp_retr1_max = 255;
> +static int tcp_syn_retries_min = 1;
> +static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
> static int ip_local_port_range_min[] = { 1, 1 };
> static int ip_local_port_range_max[] = { 65535, 65535 };
>
> @@ -237,7 +239,9 @@ static struct ctl_table ipv4_table[] = {
> .data = &ipv4_config.no_pmtu_disc,
> .maxlen = sizeof(int),
> .mode = 0644,
> - .proc_handler = proc_dointvec
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = &tcp_syn_retries_min,
> + .extra2 = &tcp_syn_retries_max
> },
> {
> .ctl_name = NET_IPV4_NONLOCAL_BIND,
> --
> 1.7.12.2.21.g234cd45.dirty
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2014-06-11 19:47:10

by Willy Tarreau

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Hi Luis,

On Wed, Jun 11, 2014 at 07:46:44PM +0100, Luis Henriques wrote:
> Hi Willy,
>
> On Mon, May 12, 2014 at 02:32:59AM +0200, Willy Tarreau wrote:
> > 2.6.32-longterm review patch. If anyone has any objections, please let me know.
> >
>
> During Ubuntu Lucid kernel regression testing, after the merge of
> 2.6.32.62, we found problems with the following patches
>
> [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary
> (Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da)
>
> [ 065/143] net: check net.core.somaxconn sysctl values
> (Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509)
>
> The following two stack traces were found in kernel logs:

Aie :-/

> [ 0.199908] sysctl table check failed: /net/core/somaxconn .3.1.18 Missing strategy
> [ 0.201100] Pid: 1, comm: swapper Not tainted 2.6.32-02063262-generic #201405200837
> [ 0.202173] Call Trace:
(...)
> and here's a bug link:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326473

I think that Tyler's suggest is the right approach.

> For the Ubuntu Lucid kernel, we ended up reverting the offending
> commits. Since I was able to reproduce this problem with a vanilla
> 2.6.32.62, you may want to take a similar action for the next 2.6.32
> release.

The initial bug is hard to debug on live systems. I've been hit myself
and it took me a lot of time to find the root cause. The problem is that
the backlog is stored on an unsigned short while the sysctl is stored
on an int, and the value is naturally truncated, so when you use an
somaxconn of N*65536 + just a few, you end up with just a few and drop
a lot of SYNs even under moderate loads. Worse, the only people who
touch these values are those who run under high loads and who are the
most likely to face the issue.

Thus if there's a quick way to check that Tyler's fix reliably addresses
the issue, I think we should take it instead. Of course I understand that
in the mean time the revert is better for you!

Regards,
Willy

2014-06-12 12:56:00

by Luis Henriques

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

(Adding Tyler to the thread, as I should have done in the first place)

Willy Tarreau <[email protected]> writes:

> Hi Luis,
>
> On Wed, Jun 11, 2014 at 07:46:44PM +0100, Luis Henriques wrote:
>> Hi Willy,
>>
>> On Mon, May 12, 2014 at 02:32:59AM +0200, Willy Tarreau wrote:
>> > 2.6.32-longterm review patch. If anyone has any objections, please let me know.
>> >
>>
>> During Ubuntu Lucid kernel regression testing, after the merge of
>> 2.6.32.62, we found problems with the following patches
>>
>> [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary
>> (Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da)
>>
>> [ 065/143] net: check net.core.somaxconn sysctl values
>> (Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509)
>>
>> The following two stack traces were found in kernel logs:
>
> Aie :-/
>
>> [ 0.199908] sysctl table check failed: /net/core/somaxconn .3.1.18 Missing strategy
>> [ 0.201100] Pid: 1, comm: swapper Not tainted 2.6.32-02063262-generic #201405200837
>> [ 0.202173] Call Trace:
> (...)
>> and here's a bug link:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326473
>
> I think that Tyler's suggest is the right approach.
>
>> For the Ubuntu Lucid kernel, we ended up reverting the offending
>> commits. Since I was able to reproduce this problem with a vanilla
>> 2.6.32.62, you may want to take a similar action for the next 2.6.32
>> release.
>
> The initial bug is hard to debug on live systems. I've been hit myself
> and it took me a lot of time to find the root cause. The problem is that
> the backlog is stored on an unsigned short while the sysctl is stored
> on an int, and the value is naturally truncated, so when you use an
> somaxconn of N*65536 + just a few, you end up with just a few and drop
> a lot of SYNs even under moderate loads. Worse, the only people who
> touch these values are those who run under high loads and who are the
> most likely to face the issue.
>
> Thus if there's a quick way to check that Tyler's fix reliably addresses
> the issue, I think we should take it instead. Of course I understand that
> in the mean time the revert is better for you!
>
> Regards,
> Willy
>

I was finally able to spend some more time with this and tried (a
modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
Although I haven't done any extended testing, I don't see the two
stack traces and the /proc/sys/net/ipv4/ directory seems to be
correctly populated.

I'm attaching the patch I've used, based on Tyler's.

Cheers,
--
Luís

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index e2eaf29..e6bf72c 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -121,7 +121,8 @@ static struct ctl_table netns_core_table[] = {
.mode = 0644,
.extra1 = &zero,
.extra2 = &ushort_max,
- .proc_handler = proc_dointvec_minmax
+ .proc_handler = proc_dointvec_minmax,
+ .strategy = &sysctl_intvec
},
{ .ctl_name = 0 }
};
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 910fa54..d957371 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -241,7 +241,8 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &tcp_syn_retries_min,
- .extra2 = &tcp_syn_retries_max
+ .extra2 = &tcp_syn_retries_max,
+ .strategy = &sysctl_intvec
},
{
.ctl_name = NET_IPV4_NONLOCAL_BIND,

2014-06-12 13:02:56

by Willy Tarreau

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Hi Luis,

On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
> (Adding Tyler to the thread, as I should have done in the first place)
>
> Willy Tarreau <[email protected]> writes:
>
> > Hi Luis,
> >
> > On Wed, Jun 11, 2014 at 07:46:44PM +0100, Luis Henriques wrote:
> >> Hi Willy,
> >>
> >> On Mon, May 12, 2014 at 02:32:59AM +0200, Willy Tarreau wrote:
> >> > 2.6.32-longterm review patch. If anyone has any objections, please let me know.
> >> >
> >>
> >> During Ubuntu Lucid kernel regression testing, after the merge of
> >> 2.6.32.62, we found problems with the following patches
> >>
> >> [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary
> >> (Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da)
> >>
> >> [ 065/143] net: check net.core.somaxconn sysctl values
> >> (Upstream commit 5f671d6b4ec3e6d66c2a868738af2cdea09e7509)
> >>
> >> The following two stack traces were found in kernel logs:
> >
> > Aie :-/
> >
> >> [ 0.199908] sysctl table check failed: /net/core/somaxconn .3.1.18 Missing strategy
> >> [ 0.201100] Pid: 1, comm: swapper Not tainted 2.6.32-02063262-generic #201405200837
> >> [ 0.202173] Call Trace:
> > (...)
> >> and here's a bug link:
> >>
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1326473
> >
> > I think that Tyler's suggest is the right approach.
> >
> >> For the Ubuntu Lucid kernel, we ended up reverting the offending
> >> commits. Since I was able to reproduce this problem with a vanilla
> >> 2.6.32.62, you may want to take a similar action for the next 2.6.32
> >> release.
> >
> > The initial bug is hard to debug on live systems. I've been hit myself
> > and it took me a lot of time to find the root cause. The problem is that
> > the backlog is stored on an unsigned short while the sysctl is stored
> > on an int, and the value is naturally truncated, so when you use an
> > somaxconn of N*65536 + just a few, you end up with just a few and drop
> > a lot of SYNs even under moderate loads. Worse, the only people who
> > touch these values are those who run under high loads and who are the
> > most likely to face the issue.
> >
> > Thus if there's a quick way to check that Tyler's fix reliably addresses
> > the issue, I think we should take it instead. Of course I understand that
> > in the mean time the revert is better for you!
> >
> > Regards,
> > Willy
> >
>
> I was finally able to spend some more time with this and tried (a
> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
> Although I haven't done any extended testing, I don't see the two
> stack traces and the /proc/sys/net/ipv4/ directory seems to be
> correctly populated.

OK so that's confirmed now. I remember we had to do the same for
another patch during the 32.y cycle last year, so I'm not surprized.

> I'm attaching the patch I've used, based on Tyler's.

Great, thank you guys. I'll queue it up for .63. I just have to check
if there is anything else pending for a release.

Cheers,
Willy

2014-06-14 17:50:59

by Willy Tarreau

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Hi Luis,

On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
> I was finally able to spend some more time with this and tried (a
> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
> Although I haven't done any extended testing, I don't see the two
> stack traces and the /proc/sys/net/ipv4/ directory seems to be
> correctly populated.
>
> I'm attaching the patch I've used, based on Tyler's.

Would any of you or Tyler please kindly pass me a signed-off-by with
a commit message ? That would be great. Alternately I'd do it myself
and mention you authored them.

> Cheers,
> --
> Lu?s

Thanks,
Willy

> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index e2eaf29..e6bf72c 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -121,7 +121,8 @@ static struct ctl_table netns_core_table[] = {
> .mode = 0644,
> .extra1 = &zero,
> .extra2 = &ushort_max,
> - .proc_handler = proc_dointvec_minmax
> + .proc_handler = proc_dointvec_minmax,
> + .strategy = &sysctl_intvec
> },
> { .ctl_name = 0 }
> };
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 910fa54..d957371 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -241,7 +241,8 @@ static struct ctl_table ipv4_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec_minmax,
> .extra1 = &tcp_syn_retries_min,
> - .extra2 = &tcp_syn_retries_max
> + .extra2 = &tcp_syn_retries_max,
> + .strategy = &sysctl_intvec
> },
> {
> .ctl_name = NET_IPV4_NONLOCAL_BIND,

2014-06-20 22:17:26

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Willy Tarreau <[email protected]> writes:

> Hi Luis,
>
> On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
>> I was finally able to spend some more time with this and tried (a
>> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
>> Although I haven't done any extended testing, I don't see the two
>> stack traces and the /proc/sys/net/ipv4/ directory seems to be
>> correctly populated.
>>
>> I'm attaching the patch I've used, based on Tyler's.
>
> Would any of you or Tyler please kindly pass me a signed-off-by with
> a commit message ? That would be great. Alternately I'd do it myself
> and mention you authored them.

If my memory serves it is possibe in 2.6.32 to set
.ctl_name = CTL_UNNEEDED

and not need to implement a .strategy routine at all.

Given the fact that most people got the strategy routines
slightly wrong and that sys_sysctl is effectively unused
a strategy where you don't implement code that no-one
will use in a backport I would be preferable.

Since you have mentioned this has come up a couple of times if something
else this will be something to think about for next time.

I am puzzled why .ctl_name was populated in a backport at all.

Eric

2014-06-20 22:58:38

by Willy Tarreau

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Hi Eric,

On Fri, Jun 20, 2014 at 03:16:07PM -0700, Eric W. Biederman wrote:
> Willy Tarreau <[email protected]> writes:
>
> > Hi Luis,
> >
> > On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
> >> I was finally able to spend some more time with this and tried (a
> >> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
> >> Although I haven't done any extended testing, I don't see the two
> >> stack traces and the /proc/sys/net/ipv4/ directory seems to be
> >> correctly populated.
> >>
> >> I'm attaching the patch I've used, based on Tyler's.
> >
> > Would any of you or Tyler please kindly pass me a signed-off-by with
> > a commit message ? That would be great. Alternately I'd do it myself
> > and mention you authored them.
>
> If my memory serves it is possibe in 2.6.32 to set
> .ctl_name = CTL_UNNEEDED
>
> and not need to implement a .strategy routine at all.

Ah that's quite interesting, thanks for the tip!

> Given the fact that most people got the strategy routines
> slightly wrong and that sys_sysctl is effectively unused
> a strategy where you don't implement code that no-one
> will use in a backport I would be preferable.

OK.

> Since you have mentioned this has come up a couple of times if something
> else this will be something to think about for next time.

I'm keeping your e-mail where I manage patches, hoping to recognize
this case next time.

> I am puzzled why .ctl_name was populated in a backport at all.

Oh it's simply because I didn't know it did not have to be there,
and among the few reviewers, I guess that it's not common to know
what version uses what semantics.

Thank you for the exaplanation, it's really helpful. We're not used
to backport sysctl changes but here I got caught a few times and have
found some sysctl.conf with bogus values in field a few times, so it
was really important to backport this one.

Best regards,
Willy

2014-06-21 00:20:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

Willy Tarreau <[email protected]> writes:

> Hi Eric,
>
> On Fri, Jun 20, 2014 at 03:16:07PM -0700, Eric W. Biederman wrote:
>> Willy Tarreau <[email protected]> writes:
>>
>> > Hi Luis,
>> >
>> > On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
>> >> I was finally able to spend some more time with this and tried (a
>> >> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
>> >> Although I haven't done any extended testing, I don't see the two
>> >> stack traces and the /proc/sys/net/ipv4/ directory seems to be
>> >> correctly populated.
>> >>
>> >> I'm attaching the patch I've used, based on Tyler's.
>> >
>> > Would any of you or Tyler please kindly pass me a signed-off-by with
>> > a commit message ? That would be great. Alternately I'd do it myself
>> > and mention you authored them.
>>
>> If my memory serves it is possibe in 2.6.32 to set
>> .ctl_name = CTL_UNNEEDED
>>
>> and not need to implement a .strategy routine at all.
>
> Ah that's quite interesting, thanks for the tip!
>
>> Given the fact that most people got the strategy routines
>> slightly wrong and that sys_sysctl is effectively unused
>> a strategy where you don't implement code that no-one
>> will use in a backport I would be preferable.
>
> OK.
>
>> Since you have mentioned this has come up a couple of times if something
>> else this will be something to think about for next time.
>
> I'm keeping your e-mail where I manage patches, hoping to recognize
> this case next time.
>
>> I am puzzled why .ctl_name was populated in a backport at all.
>
> Oh it's simply because I didn't know it did not have to be there,
> and among the few reviewers, I guess that it's not common to know
> what version uses what semantics.

I guess what I meant is that the field .ctl_name does not even exist
anymore for the same reasons .strategy does not exist anymore. So I
was just suprirsed that someone picked a randomish number and stuck
it in there.

If anyone actually were to use those randomish numbers in the binary
sys_sysctl call their applications would break when they eventually
moved to a more recent kernel.

Which is one of the motivations it was decided there would be no more
binary sysctls allocated around the 2.6.32 timeframe.

> Thank you for the exaplanation, it's really helpful. We're not used
> to backport sysctl changes but here I got caught a few times and have
> found some sysctl.conf with bogus values in field a few times, so it
> was really important to backport this one.

Sysctl do have their uses, and at least 2.6.32 has runtime sysctl checks
to keep the insanity to a dull roar.

Eric