2022-11-23 10:07:32

by Firo Yang

Subject: [PATCH 1/1] sctp: sysctl: referring the correct net namespace

Recently, a customer reported that from their container whose
net namespace is different to the host's init_net, they can't set
the container's net.sctp.rto_max to any value smaller than
init_net.sctp.rto_min.

For instance,
Host:
sudo sysctl net.sctp.rto_min
net.sctp.rto_min = 1000

Container:
echo 100 > /mnt/proc-net/sctp/rto_min
echo 400 > /mnt/proc-net/sctp/rto_max
echo: write error: Invalid argument

This is caused by the check added in commit 4f3fdf3bc59c
("sctp: add check rto_min and rto_max in sysctl").
When validating the input value, it always refers to the boundary
value set for the init_net namespace.

Having the container's rto_max smaller than the host's
init_net.sctp.rto_min does make sense, considering that the RTO between
two containers on the same host is very likely smaller than that between
two hosts.

So to fix this problem, referring to the boundary value from the net
namespace the new input value came from should be enough.

Signed-off-by: Firo Yang <[email protected]>
---
net/sctp/sysctl.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index b46a416787ec..e167df4dc60b 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -429,6 +429,9 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
 	else
 		tbl.data = &net->sctp.rto_min;

+	if (net != &init_net)
+		max = net->sctp.rto_max;
+
 	ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
 	if (write && ret == 0) {
 		if (new_value > max || new_value < min)
@@ -457,6 +460,9 @@ static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write,
 	else
 		tbl.data = &net->sctp.rto_max;

+	if (net != &init_net)
+		min = net->sctp.rto_min;
+
 	ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
 	if (write && ret == 0) {
 		if (new_value > max || new_value < min)
--
2.26.2
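
The failing write in the example above can be modelled in user space. The following is only a minimal sketch: struct sctp_ns and both helper functions are hypothetical stand-ins, not kernel code, and the fixed upper cap the real handler also applies is left out. It contrasts a lower bound taken from the host namespace with one taken from the writer's own namespace.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-namespace SCTP settings. */
struct sctp_ns {
	int rto_min;	/* milliseconds */
	int rto_max;	/* milliseconds */
};

/*
 * Behaviour described in the report: a write to rto_max is checked
 * against the host's rto_min, whatever namespace the write came from.
 */
static bool rto_max_ok_host_bound(const struct sctp_ns *host,
				  const struct sctp_ns *ns, int new_value)
{
	(void)ns;	/* the writer's namespace is never consulted */
	return new_value >= host->rto_min;
}

/*
 * Intended behaviour: the lower bound is the rto_min of the namespace
 * the write came from.
 */
static bool rto_max_ok_own_bound(const struct sctp_ns *host,
				 const struct sctp_ns *ns, int new_value)
{
	(void)host;
	return new_value >= ns->rto_min;
}
```

With a host rto_min of 1000 and a container rto_min of 100, writing 400 to the container's rto_max is rejected by the first helper and accepted by the second, which matches the "Invalid argument" error shown above.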


2022-11-23 13:29:51

by Marcelo Ricardo Leitner

Subject: Re: [PATCH 1/1] sctp: sysctl: referring the correct net namespace

On Wed, Nov 23, 2022 at 05:44:06PM +0800, Firo Yang wrote:
> Recently, a customer reported that from their container whose
> net namespace is different to the host's init_net, they can't set
> the container's net.sctp.rto_max to any value smaller than
> init_net.sctp.rto_min.
>
> For instance,
> Host:
> sudo sysctl net.sctp.rto_min
> net.sctp.rto_min = 1000
>
> Container:
> echo 100 > /mnt/proc-net/sctp/rto_min
> echo 400 > /mnt/proc-net/sctp/rto_max
> echo: write error: Invalid argument
>
> This is caused by the check added in commit 4f3fdf3bc59c
> ("sctp: add check rto_min and rto_max in sysctl").
> When validating the input value, it always refers to the boundary
> value set for the init_net namespace.
>
> Having the container's rto_max smaller than the host's
> init_net.sctp.rto_min does make sense, considering that the RTO between
> two containers on the same host is very likely smaller than that between
> two hosts.

Makes sense. And also, here, it is not using the init_net as
boundaries for the values themselves. I mean, rto_min in init_net
won't be the minimum allowed for rto_min in other netns. Ditto for
rto_max.

More below.

>
> So to fix this problem, referring to the boundary value from the net
> namespace the new input value came from should be enough.
>
> Signed-off-by: Firo Yang <[email protected]>
> ---
> net/sctp/sysctl.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> index b46a416787ec..e167df4dc60b 100644
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -429,6 +429,9 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
> else
> tbl.data = &net->sctp.rto_min;
>
> + if (net != &init_net)
> + max = net->sctp.rto_max;

This also affects other sysctls:

$ grep -e procname -e extra sysctl.c | grep -B1 extra.*init_net
.extra1 = SYSCTL_ONE,
.extra2 = &init_net.sctp.rto_max
.procname = "rto_max",
.extra1 = &init_net.sctp.rto_min,
--
.extra1 = SYSCTL_ZERO,
.extra2 = &init_net.sctp.ps_retrans,
.procname = "ps_retrans",
.extra1 = &init_net.sctp.pf_retrans,

And apparently, SCTP is the only one doing such dynamic limits. At
least in networking.

While the issue you reported is fixable this way, the ps/pf_retrans
one is not: those entries use proc_dointvec_minmax(), which simply
dereferences extra1/extra2 as they are (with no netns translation).

So what about patching sctp_sysctl_net_register() instead, to update
these pointers during netns creation? Right after where it updates the
'data' pointer:

	for (i = 0; table[i].data; i++)
		table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;

Thanks,
Marcelo
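
To illustrate why rebasing only 'data' is not enough, here is a minimal user-space sketch of that pattern. Everything in it is a hypothetical stand-in (struct sctp_ns, struct entry, clone_table(), the 1000/60000 values); NULL takes the place of the fixed bounds in the real table. It copies a template table and moves each 'data' pointer into the new namespace, leaving extra1/extra2 untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the per-netns settings and a sysctl entry. */
struct sctp_ns {
	int rto_min;
	int rto_max;
};

struct entry {
	const char *procname;
	void *data;	/* value being read or written */
	void *extra1;	/* lower bound, or NULL if none */
	void *extra2;	/* upper bound, or NULL if none */
};

static struct sctp_ns init_ns = { .rto_min = 1000, .rto_max = 60000 };

/* Template whose pointers all refer to the initial namespace. */
static const struct entry template_table[] = {
	{ "rto_min", &init_ns.rto_min, NULL, &init_ns.rto_max },
	{ "rto_max", &init_ns.rto_max, &init_ns.rto_min, NULL },
	{ NULL, NULL, NULL, NULL },
};

/*
 * Copy the template and rebase only .data onto 'ns', mirroring the
 * loop quoted above. extra1/extra2 are deliberately left alone.
 */
static struct entry *clone_table(struct sctp_ns *ns)
{
	struct entry *t = malloc(sizeof(template_table));
	size_t i;

	if (!t)
		return NULL;
	memcpy(t, template_table, sizeof(template_table));
	for (i = 0; t[i].data; i++) {
		/* offset of the field inside init_ns, reapplied to ns */
		size_t off = (size_t)((char *)t[i].data - (char *)&init_ns);

		t[i].data = (char *)ns + off;
	}
	return t;
}
```

After clone_table(&ctr), t[1].data points at ctr.rto_max, but t[1].extra1 still points at init_ns.rto_min, so any min/max check that dereferences extra1 compares against the initial namespace.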


2022-11-24 07:23:20

by Firo Yang

Subject: Re: [PATCH 1/1] sctp: sysctl: referring the correct net namespace

The 11/23/2022 10:00, Marcelo Ricardo Leitner wrote:
> On Wed, Nov 23, 2022 at 05:44:06PM +0800, Firo Yang wrote:
> > @@ -429,6 +429,9 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
> > else
> > tbl.data = &net->sctp.rto_min;
> >
> > + if (net != &init_net)
> > + max = net->sctp.rto_max;
>
> This also affects other sysctls:
>
> $ grep -e procname -e extra sysctl.c | grep -B1 extra.*init_net
> .extra1 = SYSCTL_ONE,
> .extra2 = &init_net.sctp.rto_max
> .procname = "rto_max",
> .extra1 = &init_net.sctp.rto_min,
> --
> .extra1 = SYSCTL_ZERO,
> .extra2 = &init_net.sctp.ps_retrans,
> .procname = "ps_retrans",
> .extra1 = &init_net.sctp.pf_retrans,
>
> And apparently, SCTP is the only one doing such dynamic limits. At
> least in networking.
>
> While the issue you reported is fixable this way, for ps/pf_retrans,
> it is not, as it is using proc_dointvec_minmax() and it will simply
> consume those values (with no netns translation).
>
> So what about patching sctp_sysctl_net_register() instead, to update
> these pointers during netns creation? Right after where it update the
> 'data' one in there:
>
> for (i = 0; table[i].data; i++)
> table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;

Thanks, Marcelo. That's better. So you mean something like the following?

--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -586,6 +586,11 @@ int sctp_sysctl_net_register(struct net *net)
 	for (i = 0; table[i].data; i++)
 		table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;

+#define SCTP_RTO_MIN_IDX 1
+#define SCTP_RTO_MAX_IDX 2
+	table[SCTP_RTO_MIN_IDX].extra2 = &net->sctp.rto_max;
+	table[SCTP_RTO_MAX_IDX].extra1 = &net->sctp.rto_min;
+
 	net->sctp.sysctl_header = register_net_sysctl(net, "net/sctp", table);
 	if (net->sctp.sysctl_header == NULL) {
 		kfree(table);



2022-11-24 19:04:05

by Marcelo Ricardo Leitner

Subject: Re: [PATCH 1/1] sctp: sysctl: referring the correct net namespace

On Thu, Nov 24, 2022 at 02:29:38PM +0800, Firo Yang wrote:
> Thanks Marcelo. It's better. So you mean something like the following?

Yes,

>
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -586,6 +586,11 @@ int sctp_sysctl_net_register(struct net *net)
> for (i = 0; table[i].data; i++)
> table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;
>
> +#define SCTP_RTO_MIN_IDX 1
> +#define SCTP_RTO_MAX_IDX 2

But these should live next to the sysctl table definition, so we
don't forget to update them later on if the table changes.

> + table[SCTP_RTO_MIN_IDX].extra2 = &net->sctp.rto_max;
> + table[SCTP_RTO_MAX_IDX].extra1 = &net->sctp.rto_min;

And also the ps/pf_retrans. :-)

> +
> net->sctp.sysctl_header = register_net_sysctl(net, "net/sctp", table);
> if (net->sctp.sysctl_header == NULL) {
> kfree(table);
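
One possible shape for the two review comments above, as a user-space sketch only. The enum names, struct entry, table_for_ns() and in_bounds() are hypothetical, the initial values are illustrative, and NULL replaces the fixed bounds of the real table. The indices sit next to the table they describe, and all four cross-referencing bounds (rto_min/rto_max and pf_retrans/ps_retrans) are repointed at the new namespace.

```c
#include <stdlib.h>
#include <string.h>

struct sctp_ns {
	int rto_min, rto_max, pf_retrans, ps_retrans;
};

struct entry {
	const char *procname;
	void *data;
	void *extra1;	/* lower bound, or NULL */
	void *extra2;	/* upper bound, or NULL */
};

static struct sctp_ns init_ns = { 1000, 60000, 0, 65535 };

/* Indices are declared right beside the table so they stay in sync. */
enum { IDX_RTO_MIN, IDX_RTO_MAX, IDX_PF_RETRANS, IDX_PS_RETRANS };

static const struct entry template_table[] = {
	[IDX_RTO_MIN]    = { "rto_min", &init_ns.rto_min, NULL, &init_ns.rto_max },
	[IDX_RTO_MAX]    = { "rto_max", &init_ns.rto_max, &init_ns.rto_min, NULL },
	[IDX_PF_RETRANS] = { "pf_retrans", &init_ns.pf_retrans, NULL, &init_ns.ps_retrans },
	[IDX_PS_RETRANS] = { "ps_retrans", &init_ns.ps_retrans, &init_ns.pf_retrans, NULL },
	{ NULL, NULL, NULL, NULL },
};

/* Copy the template, rebase .data, then repoint the dynamic bounds. */
static struct entry *table_for_ns(struct sctp_ns *ns)
{
	struct entry *t = malloc(sizeof(template_table));
	size_t i;

	if (!t)
		return NULL;
	memcpy(t, template_table, sizeof(template_table));
	for (i = 0; t[i].data; i++) {
		size_t off = (size_t)((char *)t[i].data - (char *)&init_ns);

		t[i].data = (char *)ns + off;
	}
	t[IDX_RTO_MIN].extra2 = &ns->rto_max;
	t[IDX_RTO_MAX].extra1 = &ns->rto_min;
	t[IDX_PF_RETRANS].extra2 = &ns->ps_retrans;
	t[IDX_PS_RETRANS].extra1 = &ns->pf_retrans;
	return t;
}

/* Min/max check that dereferences extra1/extra2 exactly as stored. */
static int in_bounds(const struct entry *e, int v)
{
	if (e->extra1 && v < *(int *)e->extra1)
		return 0;
	if (e->extra2 && v > *(int *)e->extra2)
		return 0;
	return 1;
}
```

With a namespace whose rto_min is 100, in_bounds(&t[IDX_RTO_MAX], 400) now succeeds even though init_ns.rto_min is 1000, because the bound is read from the namespace's own field.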

2022-11-25 06:07:20

by Firo Yang

Subject: Re: [PATCH 1/1] sctp: sysctl: referring the correct net namespace

The 11/24/2022 14:57, Marcelo Ricardo Leitner wrote:
> On Thu, Nov 24, 2022 at 02:29:38PM +0800, Firo Yang wrote:
> > +#define SCTP_RTO_MIN_IDX 1
> > +#define SCTP_RTO_MAX_IDX 2
>
> But these should be together with the sysctl table definition, so we
> don't forget to update it later on if needed.
>
> > + table[SCTP_RTO_MIN_IDX].extra2 = &net->sctp.rto_max;
> > + table[SCTP_RTO_MAX_IDX].extra1 = &net->sctp.rto_min;
>
> And also the ps/pf_retrans. :-)

Sure. I will send a v2.

Thanks,
// Firo
