2021-05-31 17:22:31

by Changbin Du

[permalink] [raw]
Subject: [PATCH] nsfs: fix oops when ns->ops is not provided

We should not create inode for disabled namespace. A disabled namespace
sets its ns->ops to NULL. Kernel could panic if we try to create a inode
for such namespace.

Here is an example oops in socket ioctl cmd SIOCGSKNS when NET_NS is
disabled. Kernel panicked wherever nsfs trys to access ns->ops since the
proc_ns_operations is not implemented in this case.

[7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[7.670268] pgd = 32b54000
[7.670544] [00000010] *pgd=00000000
[7.671861] Internal error: Oops: 5 [#1] SMP ARM
[7.672315] Modules linked in:
[7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16
[7.673309] Hardware name: Generic DT based system
[7.673642] PC is at nsfs_evict+0x24/0x30
[7.674486] LR is at clear_inode+0x20/0x9c

So let's reject such request for disabled namespace.

Signed-off-by: Changbin Du <[email protected]>
Cc: <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: David Laight <[email protected]>
---
fs/nsfs.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 800c1d0eb0d0..6c055eb7757b 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
struct inode *inode;
unsigned long d;

+ /* In case the namespace is not actually enabled. */
+ if (!ns->ops)
+ return -EOPNOTSUPP;
+
rcu_read_lock();
d = atomic_long_read(&ns->stashed);
if (!d)
--
2.30.2


2021-06-01 05:04:40

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Mon, 31 May 2021 23:34:10 +0800 Changbin Du wrote:
> We should not create inode for disabled namespace. A disabled namespace
> sets its ns->ops to NULL. Kernel could panic if we try to create a inode
> for such namespace.
>
> Here is an example oops in socket ioctl cmd SIOCGSKNS when NET_NS is
> disabled. Kernel panicked wherever nsfs trys to access ns->ops since the
> proc_ns_operations is not implemented in this case.
>
> [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
> [7.670268] pgd = 32b54000
> [7.670544] [00000010] *pgd=00000000
> [7.671861] Internal error: Oops: 5 [#1] SMP ARM
> [7.672315] Modules linked in:
> [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16
> [7.673309] Hardware name: Generic DT based system
> [7.673642] PC is at nsfs_evict+0x24/0x30
> [7.674486] LR is at clear_inode+0x20/0x9c
>
> So let's reject such request for disabled namespace.
>
> Signed-off-by: Changbin Du <[email protected]>
> Cc: <[email protected]>
> Cc: Cong Wang <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: David Laight <[email protected]>
> ---
> fs/nsfs.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 800c1d0eb0d0..6c055eb7757b 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
> struct inode *inode;
> unsigned long d;
>
> + /* In case the namespace is not actually enabled. */
> + if (!ns->ops)
> + return -EOPNOTSUPP;
> +
> rcu_read_lock();
> d = atomic_long_read(&ns->stashed);
> if (!d)

I'm not sure why we'd pick runtime checks for something that can be
perfectly easily solved at compilation time. Networking should not
be asking for FDs for objects which don't exist.

2021-06-01 08:09:48

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Mon, May 31, 2021 at 10:01:28PM -0700, Jakub Kicinski wrote:
> On Mon, 31 May 2021 23:34:10 +0800 Changbin Du wrote:
> > We should not create inode for disabled namespace. A disabled namespace
> > sets its ns->ops to NULL. Kernel could panic if we try to create a inode
> > for such namespace.
> >
> > Here is an example oops in socket ioctl cmd SIOCGSKNS when NET_NS is
> > disabled. Kernel panicked wherever nsfs trys to access ns->ops since the
> > proc_ns_operations is not implemented in this case.
> >
> > [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
> > [7.670268] pgd = 32b54000
> > [7.670544] [00000010] *pgd=00000000
> > [7.671861] Internal error: Oops: 5 [#1] SMP ARM
> > [7.672315] Modules linked in:
> > [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16
> > [7.673309] Hardware name: Generic DT based system
> > [7.673642] PC is at nsfs_evict+0x24/0x30
> > [7.674486] LR is at clear_inode+0x20/0x9c
> >
> > So let's reject such request for disabled namespace.
> >
> > Signed-off-by: Changbin Du <[email protected]>
> > Cc: <[email protected]>
> > Cc: Cong Wang <[email protected]>
> > Cc: Jakub Kicinski <[email protected]>
> > Cc: David Laight <[email protected]>
> > ---
> > fs/nsfs.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/nsfs.c b/fs/nsfs.c
> > index 800c1d0eb0d0..6c055eb7757b 100644
> > --- a/fs/nsfs.c
> > +++ b/fs/nsfs.c
> > @@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
> > struct inode *inode;
> > unsigned long d;
> >
> > + /* In case the namespace is not actually enabled. */
> > + if (!ns->ops)
> > + return -EOPNOTSUPP;
> > +
> > rcu_read_lock();
> > d = atomic_long_read(&ns->stashed);
> > if (!d)
>
> I'm not sure why we'd pick runtime checks for something that can be
> perfectly easily solved at compilation time. Networking should not
> be asking for FDs for objects which don't exist.

Agreed!
This should be fixable by sm like:

diff --git a/net/socket.c b/net/socket.c
index 27e3e7d53f8e..2484466d96ad 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1150,10 +1150,12 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
break;
case SIOCGSKNS:
err = -EPERM;
+#ifdef CONFIG_NET_NS
if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
break;

err = open_related_ns(&net->ns, get_net_ns);
+#endif
break;
case SIOCGSTAMP_OLD:
case SIOCGSTAMPNS_OLD:

2021-06-01 19:54:16

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Mon, May 31, 2021 at 10:01 PM Jakub Kicinski <[email protected]> wrote:
>
> On Mon, 31 May 2021 23:34:10 +0800 Changbin Du wrote:
> > We should not create inode for disabled namespace. A disabled namespace
> > sets its ns->ops to NULL. Kernel could panic if we try to create a inode
> > for such namespace.
> >
> > Here is an example oops in socket ioctl cmd SIOCGSKNS when NET_NS is
> > disabled. Kernel panicked wherever nsfs trys to access ns->ops since the
> > proc_ns_operations is not implemented in this case.
> >
> > [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
> > [7.670268] pgd = 32b54000
> > [7.670544] [00000010] *pgd=00000000
> > [7.671861] Internal error: Oops: 5 [#1] SMP ARM
> > [7.672315] Modules linked in:
> > [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16
> > [7.673309] Hardware name: Generic DT based system
> > [7.673642] PC is at nsfs_evict+0x24/0x30
> > [7.674486] LR is at clear_inode+0x20/0x9c
> >
> > So let's reject such request for disabled namespace.
> >
> > Signed-off-by: Changbin Du <[email protected]>
> > Cc: <[email protected]>
> > Cc: Cong Wang <[email protected]>
> > Cc: Jakub Kicinski <[email protected]>
> > Cc: David Laight <[email protected]>
> > ---
> > fs/nsfs.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/nsfs.c b/fs/nsfs.c
> > index 800c1d0eb0d0..6c055eb7757b 100644
> > --- a/fs/nsfs.c
> > +++ b/fs/nsfs.c
> > @@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
> > struct inode *inode;
> > unsigned long d;
> >
> > + /* In case the namespace is not actually enabled. */
> > + if (!ns->ops)
> > + return -EOPNOTSUPP;
> > +
> > rcu_read_lock();
> > d = atomic_long_read(&ns->stashed);
> > if (!d)
>
> I'm not sure why we'd pick runtime checks for something that can be
> perfectly easily solved at compilation time. Networking should not
> be asking for FDs for objects which don't exist.

Four reasons:

1) ioctl() is not a hot path, so performance is not a problem here.

2) There are 3 different places (tun has two more) that need the same
fix.

3) init_net always exits, except it does not have an ops when
CONFIG_NET_NS is disabled:

static __net_init int net_ns_net_init(struct net *net)
{
#ifdef CONFIG_NET_NS
net->ns.ops = &netns_operations;
#endif
return ns_alloc_inum(&net->ns);
}

4) *I think* other namespaces need this fix too, for instance
init_ipc_ns:

struct ipc_namespace init_ipc_ns = {
.ns.count = REFCOUNT_INIT(1),
.user_ns = &init_user_ns,
.ns.inum = PROC_IPC_INIT_INO,
#ifdef CONFIG_IPC_NS
.ns.ops = &ipcns_operations,
#endif
};

whose ns->ops is NULL too if disabled.

Thanks.

2021-06-01 20:27:32

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Tue, 1 Jun 2021 10:06:54 +0200 Christian Brauner wrote:
> > I'm not sure why we'd pick runtime checks for something that can be
> > perfectly easily solved at compilation time. Networking should not
> > be asking for FDs for objects which don't exist.
>
> Agreed!
> This should be fixable by sm like:
>
> diff --git a/net/socket.c b/net/socket.c
> index 27e3e7d53f8e..2484466d96ad 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1150,10 +1150,12 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> break;
> case SIOCGSKNS:
> err = -EPERM;
> +#ifdef CONFIG_NET_NS
> if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> break;
>
> err = open_related_ns(&net->ns, get_net_ns);
> +#endif
> break;
> case SIOCGSTAMP_OLD:
> case SIOCGSTAMPNS_OLD:

Thanks! You weren't CCed on v1, so FWIW I was suggesting
checking in get_net_ns(), to catch other callers:

diff --git a/net/socket.c b/net/socket.c
index 27e3e7d53f8e..3b44f2700e0c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1081,6 +1081,8 @@ static long sock_do_ioctl(struct net *net, struct socket *sock,

struct ns_common *get_net_ns(struct ns_common *ns)
{
+ if (!IS_ENABLED(CONFIG_NET_NS))
+ return ERR_PTR(-EOPNOTSUPP);
return &get_net(container_of(ns, struct net, ns))->ns;
}
EXPORT_SYMBOL_GPL(get_net_ns);

2021-06-02 09:57:33

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Tue, Jun 01, 2021 at 12:51:51PM -0700, Cong Wang wrote:
> On Mon, May 31, 2021 at 10:01 PM Jakub Kicinski <[email protected]> wrote:
> >
> > On Mon, 31 May 2021 23:34:10 +0800 Changbin Du wrote:
> > > We should not create inode for disabled namespace. A disabled namespace
> > > sets its ns->ops to NULL. Kernel could panic if we try to create a inode
> > > for such namespace.
> > >
> > > Here is an example oops in socket ioctl cmd SIOCGSKNS when NET_NS is
> > > disabled. Kernel panicked wherever nsfs trys to access ns->ops since the
> > > proc_ns_operations is not implemented in this case.
> > >
> > > [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
> > > [7.670268] pgd = 32b54000
> > > [7.670544] [00000010] *pgd=00000000
> > > [7.671861] Internal error: Oops: 5 [#1] SMP ARM
> > > [7.672315] Modules linked in:
> > > [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2da49 #16
> > > [7.673309] Hardware name: Generic DT based system
> > > [7.673642] PC is at nsfs_evict+0x24/0x30
> > > [7.674486] LR is at clear_inode+0x20/0x9c
> > >
> > > So let's reject such request for disabled namespace.
> > >
> > > Signed-off-by: Changbin Du <[email protected]>
> > > Cc: <[email protected]>
> > > Cc: Cong Wang <[email protected]>
> > > Cc: Jakub Kicinski <[email protected]>
> > > Cc: David Laight <[email protected]>
> > > ---
> > > fs/nsfs.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/fs/nsfs.c b/fs/nsfs.c
> > > index 800c1d0eb0d0..6c055eb7757b 100644
> > > --- a/fs/nsfs.c
> > > +++ b/fs/nsfs.c
> > > @@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
> > > struct inode *inode;
> > > unsigned long d;
> > >
> > > + /* In case the namespace is not actually enabled. */
> > > + if (!ns->ops)
> > > + return -EOPNOTSUPP;
> > > +
> > > rcu_read_lock();
> > > d = atomic_long_read(&ns->stashed);
> > > if (!d)
> >
> > I'm not sure why we'd pick runtime checks for something that can be
> > perfectly easily solved at compilation time. Networking should not
> > be asking for FDs for objects which don't exist.
>
> Four reasons:
>
> 1) ioctl() is not a hot path, so performance is not a problem here.

Hm, I think a compile time check is better than a runtime check
independent of performance benefits.

>
> 2) There are 3 different places (tun has two more) that need the same
> fix.


>
> 3) init_net always exits, except it does not have an ops when
> CONFIG_NET_NS is disabled:

Which is true for every namespace.

>
> static __net_init int net_ns_net_init(struct net *net)
> {
> #ifdef CONFIG_NET_NS
> net->ns.ops = &netns_operations;
> #endif
> return ns_alloc_inum(&net->ns);
> }
>
> 4) *I think* other namespaces need this fix too, for instance
> init_ipc_ns:

None of them should have paths to trigger ->ops.

>
> struct ipc_namespace init_ipc_ns = {
> .ns.count = REFCOUNT_INIT(1),
> .user_ns = &init_user_ns,
> .ns.inum = PROC_IPC_INIT_INO,
> #ifdef CONFIG_IPC_NS
> .ns.ops = &ipcns_operations,
> #endif
> };
>
> whose ns->ops is NULL too if disabled.

But the point is that ns->ops should never be accessed when that
namespace type is disabled. Or in other words, the bug is that something
in netns makes use of namespace features when they are disabled. If we
handle ->ops being NULL we might be tapering over a real bug somewhere.

Jakub's proposal in the other mail makes sense and falls in line with
how the rest of the netns getters are implemented. For example
get_net_ns_fd_fd():

#ifdef CONFIG_NET_NS

[...]

struct net *get_net_ns_by_fd(int fd)
{
struct file *file;
struct ns_common *ns;
struct net *net;

file = proc_ns_fget(fd);
if (IS_ERR(file))
return ERR_CAST(file);

ns = get_proc_ns(file_inode(file));
if (ns->ops == &netns_operations)
net = get_net(container_of(ns, struct net, ns));
else
net = ERR_PTR(-EINVAL);

fput(file);
return net;
}

#else
struct net *get_net_ns_by_fd(int fd)
{
return ERR_PTR(-EINVAL);
}
#endif
EXPORT_SYMBOL_GPL(get_net_ns_by_fd);

(It seems that "get_net_ns()" could also be moved into the same file as
get_net_ns_by_fd() btw.)

Christian

2021-06-02 09:59:23

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Tue, Jun 01, 2021 at 01:26:02PM -0700, Jakub Kicinski wrote:
> On Tue, 1 Jun 2021 10:06:54 +0200 Christian Brauner wrote:
> > > I'm not sure why we'd pick runtime checks for something that can be
> > > perfectly easily solved at compilation time. Networking should not
> > > be asking for FDs for objects which don't exist.
> >
> > Agreed!
> > This should be fixable by sm like:
> >
> > diff --git a/net/socket.c b/net/socket.c
> > index 27e3e7d53f8e..2484466d96ad 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -1150,10 +1150,12 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> > break;
> > case SIOCGSKNS:
> > err = -EPERM;
> > +#ifdef CONFIG_NET_NS
> > if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> > break;
> >
> > err = open_related_ns(&net->ns, get_net_ns);
> > +#endif
> > break;
> > case SIOCGSTAMP_OLD:
> > case SIOCGSTAMPNS_OLD:
>
> Thanks! You weren't CCed on v1, so FWIW I was suggesting
> checking in get_net_ns(), to catch other callers:
>
> diff --git a/net/socket.c b/net/socket.c
> index 27e3e7d53f8e..3b44f2700e0c 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1081,6 +1081,8 @@ static long sock_do_ioctl(struct net *net, struct socket *sock,
>
> struct ns_common *get_net_ns(struct ns_common *ns)
> {
> + if (!IS_ENABLED(CONFIG_NET_NS))
> + return ERR_PTR(-EOPNOTSUPP);
> return &get_net(container_of(ns, struct net, ns))->ns;
> }
> EXPORT_SYMBOL_GPL(get_net_ns);

Yeah, that's better than my hack. :) Maybe this function should simply
move over to net/core/net_namespace.c with the other netns getters, e.g.
get_net_ns_by_fd()?

#ifdef CONFIG_NET_NS

[...]

struct net *get_net_ns_by_fd(int fd)
{
struct file *file;
struct ns_common *ns;
struct net *net;

file = proc_ns_fget(fd);
if (IS_ERR(file))
return ERR_CAST(file);

ns = get_proc_ns(file_inode(file));
if (ns->ops == &netns_operations)
net = get_net(container_of(ns, struct net, ns));
else
net = ERR_PTR(-EINVAL);

fput(file);
return net;
}

#else
struct net *get_net_ns_by_fd(int fd)
{
return ERR_PTR(-EINVAL);
}
#endif
EXPORT_SYMBOL_GPL(get_net_ns_by_fd);

Christian

2021-06-02 10:46:24

by David Laight

[permalink] [raw]
Subject: RE: [PATCH] nsfs: fix oops when ns->ops is not provided

From: Christian Brauner
> Sent: 02 June 2021 10:15
...
> Hm, I think a compile time check is better than a runtime check
> independent of performance benefits.
>
> >
> > 2) There are 3 different places (tun has two more) that need the same
> > fix.
>
>
> >
> > 3) init_net always exits, except it does not have an ops when
> > CONFIG_NET_NS is disabled:
>
> Which is true for every namespace.
>
> >
> > static __net_init int net_ns_net_init(struct net *net)
> > {
> > #ifdef CONFIG_NET_NS
> > net->ns.ops = &netns_operations;
> > #endif
> > return ns_alloc_inum(&net->ns);
> > }
> >
> > 4) *I think* other namespaces need this fix too, for instance
> > init_ipc_ns:
>
> None of them should have paths to trigger ->ops.
>
> >
> > struct ipc_namespace init_ipc_ns = {
> > .ns.count = REFCOUNT_INIT(1),
> > .user_ns = &init_user_ns,
> > .ns.inum = PROC_IPC_INIT_INO,
> > #ifdef CONFIG_IPC_NS
> > .ns.ops = &ipcns_operations,
> > #endif
> > };
> >
> > whose ns->ops is NULL too if disabled.
>
> But the point is that ns->ops should never be accessed when that
> namespace type is disabled. Or in other words, the bug is that something
> in netns makes use of namespace features when they are disabled. If we
> handle ->ops being NULL we might be tapering over a real bug somewhere.
>
> Jakub's proposal in the other mail makes sense and falls in line with
> how the rest of the netns getters are implemented. For example
> get_net_ns_fd_fd():
>
> #ifdef CONFIG_NET_NS
>
> [...]
>
> struct net *get_net_ns_by_fd(int fd)
> {
> struct file *file;
> struct ns_common *ns;
> struct net *net;
>
> file = proc_ns_fget(fd);
> if (IS_ERR(file))
> return ERR_CAST(file);
>
> ns = get_proc_ns(file_inode(file));
> if (ns->ops == &netns_operations)
> net = get_net(container_of(ns, struct net, ns));
> else
> net = ERR_PTR(-EINVAL);
>
> fput(file);
> return net;
> }
>
> #else
> struct net *get_net_ns_by_fd(int fd)
> {
> return ERR_PTR(-EINVAL);
> }
> #endif
> EXPORT_SYMBOL_GPL(get_net_ns_by_fd);
>
> (It seems that "get_net_ns()" could also be moved into the same file as
> get_net_ns_by_fd() btw.)

The default implementation ought to be in the .h file.
So it gets inlined by the compiler.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-06-02 16:40:13

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Wed, 2 Jun 2021 11:16:32 +0200 Christian Brauner wrote:
> > diff --git a/net/socket.c b/net/socket.c
> > index 27e3e7d53f8e..3b44f2700e0c 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -1081,6 +1081,8 @@ static long sock_do_ioctl(struct net *net, struct socket *sock,
> >
> > struct ns_common *get_net_ns(struct ns_common *ns)
> > {
> > + if (!IS_ENABLED(CONFIG_NET_NS))
> > + return ERR_PTR(-EOPNOTSUPP);
> > return &get_net(container_of(ns, struct net, ns))->ns;
> > }
> > EXPORT_SYMBOL_GPL(get_net_ns);
>
> Yeah, that's better than my hack. :) Maybe this function should simply
> move over to net/core/net_namespace.c with the other netns getters, e.g.
> get_net_ns_by_fd()?

SGTM!

2021-06-03 22:56:28

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
<[email protected]> wrote:
> But the point is that ns->ops should never be accessed when that
> namespace type is disabled. Or in other words, the bug is that something
> in netns makes use of namespace features when they are disabled. If we
> handle ->ops being NULL we might be tapering over a real bug somewhere.

It is merely a protocol between fs/nsfs.c and other namespace users,
so there is certainly no right or wrong here, the only question is which
one is better.

>
> Jakub's proposal in the other mail makes sense and falls in line with
> how the rest of the netns getters are implemented. For example
> get_net_ns_fd_fd():

It does not make any sense to me. get_net_ns() merely increases
the netns refcount, which is certainly fine for init_net too, no matter
CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
there is literally saying we do not support increasing init_net refcount,
which is of course false.

> struct net *get_net_ns_by_fd(int fd)
> {
> return ERR_PTR(-EINVAL);
> }

There is a huge difference between just increasing netns refcount
and retrieving it by fd, right? I have no idea why you bring this up,
calling them getters is missing their difference.

Thanks.

2021-06-04 09:57:06

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> <[email protected]> wrote:
> > But the point is that ns->ops should never be accessed when that
> > namespace type is disabled. Or in other words, the bug is that something
> > in netns makes use of namespace features when they are disabled. If we
> > handle ->ops being NULL we might be tapering over a real bug somewhere.
>
> It is merely a protocol between fs/nsfs.c and other namespace users,
> so there is certainly no right or wrong here, the only question is which
> one is better.
>
> >
> > Jakub's proposal in the other mail makes sense and falls in line with
> > how the rest of the netns getters are implemented. For example
> > get_net_ns_fd_fd():
>
> It does not make any sense to me. get_net_ns() merely increases
> the netns refcount, which is certainly fine for init_net too, no matter
> CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> there is literally saying we do not support increasing init_net refcount,
> which is of course false.
>
> > struct net *get_net_ns_by_fd(int fd)
> > {
> > return ERR_PTR(-EINVAL);
> > }
>
> There is a huge difference between just increasing netns refcount
> and retrieving it by fd, right? I have no idea why you bring this up,
> calling them getters is missing their difference.

This argument doesn't hold up. All netns helpers ultimately increase the
reference count of the net namespace they find. And if any of them
perform operations where they are called in environments wherey they
need CONFIG_NET_NS they handle this case at compile time.

(Pluse they are defined in a central place in net/net_namespace.{c,h}.
That includes the low-level get_net() function and all the others.
get_net_ns() is the only one that's defined out of band. So get_net_ns()
currently is arguably also misplaced.)

The problem I have with fixing this in nsfs is that it gives the
impression that this is a bug in nsfs whereas it isn't and it
potentially helps tapering over other bugs.

get_net_ns() is only called for codepaths that call into nsfs via
open_related_ns() and it's the only namespace that does this. But
open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
set. For example, none of the procfs namespace f_ops will be set for
!CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
it doesn't account for !CONFIG_NET_NS and it should be fixed.

Plus your fix leaks references to init netns without fixing get_net_ns()
too.
You succeed to increase the refcount of init netns in get_net_ns() but
then you return in __ns_get_path() because ns->ops aren't set before
ns->ops->put() can be called. But you also _can't_ call it since it's
not set because !CONFIG_NET_NS. So everytime you call any of those
ioctls you increas the refcount of init net ns without decrementing it
on failure. So the fix is buggy as it is too and would suggest you to
fixup get_net_ns() too.

Cc: <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: David Laight <[email protected]>
Signed-off-by: Changbin Du <[email protected]>
---
fs/nsfs.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 800c1d0eb0d0..6c055eb7757b 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
struct inode *inode;
unsigned long d;

+ /* In case the namespace is not actually enabled. */
+ if (!ns->ops)
+ return -EOPNOTSUPP;
+
rcu_read_lock();
d = atomic_long_read(&ns->stashed);
if (!d)
--
2.30.2


2021-06-06 22:45:47

by Changbin Du

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Fri, Jun 04, 2021 at 11:54:51AM +0200, Christian Brauner wrote:
> On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > <[email protected]> wrote:
> > > But the point is that ns->ops should never be accessed when that
> > > namespace type is disabled. Or in other words, the bug is that something
> > > in netns makes use of namespace features when they are disabled. If we
> > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> >
> > It is merely a protocol between fs/nsfs.c and other namespace users,
> > so there is certainly no right or wrong here, the only question is which
> > one is better.
> >
> > >
> > > Jakub's proposal in the other mail makes sense and falls in line with
> > > how the rest of the netns getters are implemented. For example
> > > get_net_ns_fd_fd():
> >
> > It does not make any sense to me. get_net_ns() merely increases
> > the netns refcount, which is certainly fine for init_net too, no matter
> > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > there is literally saying we do not support increasing init_net refcount,
> > which is of course false.
> >
> > > struct net *get_net_ns_by_fd(int fd)
> > > {
> > > return ERR_PTR(-EINVAL);
> > > }
> >
> > There is a huge difference between just increasing netns refcount
> > and retrieving it by fd, right? I have no idea why you bring this up,
> > calling them getters is missing their difference.
>
> This argument doesn't hold up. All netns helpers ultimately increase the
> reference count of the net namespace they find. And if any of them
> perform operations where they are called in environments wherey they
> need CONFIG_NET_NS they handle this case at compile time.
>
> (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> That includes the low-level get_net() function and all the others.
> get_net_ns() is the only one that's defined out of band. So get_net_ns()
> currently is arguably also misplaced.)
>
Ihe get_net_ns() was a static helper function and then sb made it exported
but didn't move it. See commit d8d211a2a0 ('net: Make extern and export get_net_ns()').

> The problem I have with fixing this in nsfs is that it gives the
> impression that this is a bug in nsfs whereas it isn't and it
> potentially helps tapering over other bugs.
>
> get_net_ns() is only called for codepaths that call into nsfs via
> open_related_ns() and it's the only namespace that does this. But
> open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> set. For example, none of the procfs namespace f_ops will be set for
> !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> it doesn't account for !CONFIG_NET_NS and it should be fixed.
I agree with Cong that a pure getter returns a generic error is a bit weird.
And get_net_ns() is to get the ns_common which always exists indepent of
CONFIG_NET_NS. For get_net_ns_by_fd(), I think it is a 'findder + getter'.

So maybe we can rollback to patch V1 to fix all code called into
open_related_ns()?
https://lore.kernel.org/netdev/CAM_iQpWwApLVg39rUkyXxnhsiP0SZf=0ft6vsq=VxFtJ2SumAQ@mail.gmail.com/T/

--- a/net/socket.c
+++ b/net/socket.c
@@ -1149,11 +1149,15 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
mutex_unlock(&vlan_ioctl_mutex);
break;
case SIOCGSKNS:
+#ifdef CONFIG_NET_NS
err = -EPERM;
if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
break;

err = open_related_ns(&net->ns, get_net_ns);
+#else
+ err = -ENOTSUPP;
+#endif

>
> Plus your fix leaks references to init netns without fixing get_net_ns()
> too.
> You succeed to increase the refcount of init netns in get_net_ns() but
> then you return in __ns_get_path() because ns->ops aren't set before
> ns->ops->put() can be called. But you also _can't_ call it since it's
> not set because !CONFIG_NET_NS. So everytime you call any of those
> ioctls you increas the refcount of init net ns without decrementing it
> on failure. So the fix is buggy as it is too and would suggest you to
> fixup get_net_ns() too.
Yes, it is a problem. Can be put a BUG_ON() in nsfs so that such bug (calling
into nsfs without ops) can be catched early?

>
> Cc: <[email protected]>
> Cc: Cong Wang <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: David Laight <[email protected]>
> Signed-off-by: Changbin Du <[email protected]>
> ---
> fs/nsfs.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 800c1d0eb0d0..6c055eb7757b 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
> struct inode *inode;
> unsigned long d;
>
> + /* In case the namespace is not actually enabled. */
> + if (!ns->ops)
> + return -EOPNOTSUPP;
> +
> rcu_read_lock();
> d = atomic_long_read(&ns->stashed);
> if (!d)
> --
> 2.30.2
>
>

--
Cheers,
Changbin Du

2021-06-07 00:49:29

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Fri, Jun 4, 2021 at 2:54 AM Christian Brauner
<[email protected]> wrote:
>
> On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > <[email protected]> wrote:
> > > But the point is that ns->ops should never be accessed when that
> > > namespace type is disabled. Or in other words, the bug is that something
> > > in netns makes use of namespace features when they are disabled. If we
> > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> >
> > It is merely a protocol between fs/nsfs.c and other namespace users,
> > so there is certainly no right or wrong here, the only question is which
> > one is better.
> >
> > >
> > > Jakub's proposal in the other mail makes sense and falls in line with
> > > how the rest of the netns getters are implemented. For example
> > > get_net_ns_fd_fd():
> >
> > It does not make any sense to me. get_net_ns() merely increases
> > the netns refcount, which is certainly fine for init_net too, no matter
> > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > there is literally saying we do not support increasing init_net refcount,
> > which is of course false.
> >
> > > struct net *get_net_ns_by_fd(int fd)
> > > {
> > > return ERR_PTR(-EINVAL);
> > > }
> >
> > There is a huge difference between just increasing netns refcount
> > and retrieving it by fd, right? I have no idea why you bring this up,
> > calling them getters is missing their difference.
>
> This argument doesn't hold up. All netns helpers ultimately increase the
> reference count of the net namespace they find. And if any of them
> perform operations where they are called in environments wherey they
> need CONFIG_NET_NS they handle this case at compile time.

Let me explain it in this more straight way: what is the protocol here
for indication of !CONFIG_XXX_NS? Clearly it must be ns->ops==NULL,
because all namespaces use the following similar pattern:

#ifdef CONFIG_NET_NS
net->ns.ops = &netns_operations;
#endif

Now you are arguing the protocol is not this, but it is the getter of
open_related_ns() returns an error pointer.

>
> (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> That includes the low-level get_net() function and all the others.
> get_net_ns() is the only one that's defined out of band. So get_net_ns()
> currently is arguably also misplaced.)

Of course they do, only struct ns_common is generic. What's your
point? Each ns.ops is defined by each namespace too.

>
> The problem I have with fixing this in nsfs is that it gives the
> impression that this is a bug in nsfs whereas it isn't and it
> potentially helps tapering over other bugs.

Like I keep saying, this is just a protocol, there is no right or
wrong here. If the protocol is just ops==NULL, then there is nothing
wrong use it.

(BTW, we have a lot of places that use ops==NULL as a protocol,
they work really well.)

>
> get_net_ns() is only called for codepaths that call into nsfs via
> open_related_ns() and it's the only namespace that does this. But

I am pretty sure userns does the same:

197 case NS_GET_USERNS:
198 return open_related_ns(ns, ns_get_owner);


> open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> set. For example, none of the procfs namespace f_ops will be set for
> !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> it doesn't account for !CONFIG_NET_NS and it should be fixed.

If the protocol is just ops==NULL, then the core part should just check
ops==NULL. Pure and simple. I have no idea why you do not admit the
fact that every namespace intentionally leaves ops as NULL when its
config is disabled.

>
> Plus your fix leaks references to init netns without fixing get_net_ns()
> too.

I thought it is 100% clear that this patch is not from me?

Plus, the PoC patch from me actually suggests to change
open_related_ns(), not __ns_get_path(). I have no idea why you
both miss it.

Thanks.

2021-06-07 09:12:02

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Sun, Jun 06, 2021 at 05:37:40PM -0700, Cong Wang wrote:
> On Fri, Jun 4, 2021 at 2:54 AM Christian Brauner
> <[email protected]> wrote:
> >
> > On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > > <[email protected]> wrote:
> > > > But the point is that ns->ops should never be accessed when that
> > > > namespace type is disabled. Or in other words, the bug is that something
> > > > in netns makes use of namespace features when they are disabled. If we
> > > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> > >
> > > It is merely a protocol between fs/nsfs.c and other namespace users,
> > > so there is certainly no right or wrong here, the only question is which
> > > one is better.
> > >
> > > >
> > > > Jakub's proposal in the other mail makes sense and falls in line with
> > > > how the rest of the netns getters are implemented. For example
> > > > get_net_ns_fd_fd():
> > >
> > > It does not make any sense to me. get_net_ns() merely increases
> > > the netns refcount, which is certainly fine for init_net too, no matter
> > > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > > there is literally saying we do not support increasing init_net refcount,
> > > which is of course false.
> > >
> > > > struct net *get_net_ns_by_fd(int fd)
> > > > {
> > > > return ERR_PTR(-EINVAL);
> > > > }
> > >
> > > There is a huge difference between just increasing netns refcount
> > > and retrieving it by fd, right? I have no idea why you bring this up,
> > > calling them getters is missing their difference.
> >
> > This argument doesn't hold up. All netns helpers ultimately increase the
> > reference count of the net namespace they find. And if any of them
> > perform operations where they are called in environments wherey they
> > need CONFIG_NET_NS they handle this case at compile time.
>
> Let me explain it in this more straight way: what is the protocol here
> for indication of !CONFIG_XXX_NS? Clearly it must be ns->ops==NULL,
> because all namespaces use the following similar pattern:
>
> #ifdef CONFIG_NET_NS
> net->ns.ops = &netns_operations;
> #endif
>
> Now you are arguing the protocol is not this, but it is the getter of
> open_related_ns() returns an error pointer.

I don't understand what this is supposed to tell me.

>
> >
> > (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> > That includes the low-level get_net() function and all the others.
> > get_net_ns() is the only one that's defined out of band. So get_net_ns()
> > currently is arguably also misplaced.)
>
> Of course they do, only struct ns_common is generic. What's your
> point? Each ns.ops is defined by each namespace too.

All netns helpers should arguably be located in a central place
including get_net_ns(). There's no need to spread such helpers
everywhere. This is completely orthogonaly to struct ns_common.

>
> >
> > The problem I have with fixing this in nsfs is that it gives the
> > impression that this is a bug in nsfs whereas it isn't and it
> > potentially helps tapering over other bugs.
>
> Like I keep saying, this is just a protocol, there is no right or
> wrong here. If the protocol is just ops==NULL, then there is nothing
> wrong use it.
>
> (BTW, we have a lot of places that use ops==NULL as a protocol,
> they work really well.)
>
> >
> > get_net_ns() is only called for codepaths that call into nsfs via
> > open_related_ns() and it's the only namespace that does this. But
>
> I am pretty sure userns does the same:
>
> 197 case NS_GET_USERNS:
> 198 return open_related_ns(ns, ns_get_owner);

Maybe I wasn't clear enough, open_related_ns() is the only namespace
that calls into nsfs via open_related_ns() __outside__ of fs/nsfs.c I
thought that was pretty clear.

But also...

#ifdef CONFIG_USER_NS
struct ns_common *ns_get_owner(struct ns_common *ns);
#else
static inline struct ns_common *ns_get_owner(struct ns_common *ns)
{
return ERR_PTR(-EPERM);
}
#endif

So ns_get_owner() returns -EPERM when !CONFIG_USER_NS so the callback
handles the !CONFIG_USER_NS case. And that's what we were saying
get_net_ns() should do.

>
>
> > open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> > set. For example, none of the procfs namespace f_ops will be set for
> > !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> > it doesn't account for !CONFIG_NET_NS and it should be fixed.
>
> If the protocol is just ops==NULL, then the core part should just check
> ops==NULL. Pure and simple. I have no idea why you do not admit the
> fact that every namespace intentionally leaves ops as NULL when its
> config is disabled.

I'm just going to quote myself:

> > set. For example, none of the procfs namespace f_ops will be set for
> > !CONFIG_NET_NS.

If a given namespace type isn't selected then it will never appear in
/proc/<pid>/ns/* which is why the proc_ns_operations aren't defined in
fs/proc/namespaces.c.

In other words, you can't get a file descriptor for a given namespace
through proc or rather the nsfs part of proc when that namespace type
isn't selected.

The open_related_ns() function is a function that is just there to give
you a namespace fd and it assumes that when it is called that the
namespace type is selected for or that the callback you're passing it
handles that case.

For example, see you're own example about ns_get_owner() above.

>
> >
> > Plus your fix leaks references to init netns without fixing get_net_ns()
> > too.
>
> I thought it is 100% clear that this patch is not from me?
>
> Plus, the PoC patch from me actually suggests to change
> open_related_ns(), not __ns_get_path(). I have no idea why you
> both miss it.

Turning this around, I'm not sure what your resistance to just doing it
like ns_get_owner() is doing it is.

Christian

2021-06-07 09:19:08

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Mon, Jun 07, 2021 at 06:43:41AM +0800, Changbin Du wrote:
> On Fri, Jun 04, 2021 at 11:54:51AM +0200, Christian Brauner wrote:
> > On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > > <[email protected]> wrote:
> > > > But the point is that ns->ops should never be accessed when that
> > > > namespace type is disabled. Or in other words, the bug is that something
> > > > in netns makes use of namespace features when they are disabled. If we
> > > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> > >
> > > It is merely a protocol between fs/nsfs.c and other namespace users,
> > > so there is certainly no right or wrong here, the only question is which
> > > one is better.
> > >
> > > >
> > > > Jakub's proposal in the other mail makes sense and falls in line with
> > > > how the rest of the netns getters are implemented. For example
> > > > get_net_ns_fd_fd():
> > >
> > > It does not make any sense to me. get_net_ns() merely increases
> > > the netns refcount, which is certainly fine for init_net too, no matter
> > > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > > there is literally saying we do not support increasing init_net refcount,
> > > which is of course false.
> > >
> > > > struct net *get_net_ns_by_fd(int fd)
> > > > {
> > > > return ERR_PTR(-EINVAL);
> > > > }
> > >
> > > There is a huge difference between just increasing netns refcount
> > > and retrieving it by fd, right? I have no idea why you bring this up,
> > > calling them getters is missing their difference.
> >
> > This argument doesn't hold up. All netns helpers ultimately increase the
> > reference count of the net namespace they find. And if any of them
> > perform operations where they are called in environments wherey they
> > need CONFIG_NET_NS they handle this case at compile time.
> >
> > (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> > That includes the low-level get_net() function and all the others.
> > get_net_ns() is the only one that's defined out of band. So get_net_ns()
> > currently is arguably also misplaced.)
> >
> Ihe get_net_ns() was a static helper function and then sb made it exported
> but didn't move it. See commit d8d211a2a0 ('net: Make extern and export get_net_ns()').
>
> > The problem I have with fixing this in nsfs is that it gives the
> > impression that this is a bug in nsfs whereas it isn't and it
> > potentially helps tapering over other bugs.
> >
> > get_net_ns() is only called for codepaths that call into nsfs via
> > open_related_ns() and it's the only namespace that does this. But
> > open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> > set. For example, none of the procfs namespace f_ops will be set for
> > !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> > it doesn't account for !CONFIG_NET_NS and it should be fixed.
> I agree with Cong that a pure getter returns a generic error is a bit weird.
> And get_net_ns() is to get the ns_common which always exists indepent of
> CONFIG_NET_NS. For get_net_ns_by_fd(), I think it is a 'findder + getter'.
>
> So maybe we can rollback to patch V1 to fix all code called into
> open_related_ns()?
> https://lore.kernel.org/netdev/CAM_iQpWwApLVg39rUkyXxnhsiP0SZf=0ft6vsq=VxFtJ2SumAQ@mail.gmail.com/T/
>
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1149,11 +1149,15 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> mutex_unlock(&vlan_ioctl_mutex);
> break;
> case SIOCGSKNS:
> +#ifdef CONFIG_NET_NS
> err = -EPERM;
> if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> break;
>
> err = open_related_ns(&net->ns, get_net_ns);
> +#else
> + err = -ENOTSUPP;
> +#endif

If Jakub is fine with it I don't really care much but then you need to
fix the other places in tun.c as well.
I'm just not sure what's so special about get_net_ns() that it can't
simply get an ifdef !CONFIG_NET_NS section.
But it seems that there's magic semantics deeply hidden in this helper
that btw only exists for open_related_ns() that makes this necessary:

drivers/net/tun.c: return open_related_ns(&net->ns, get_net_ns);
drivers/net/tun.c: ret = open_related_ns(&net->ns, get_net_ns);
include/linux/socket.h:extern struct ns_common *get_net_ns(struct ns_common *ns);
net/socket.c: * get_net_ns - increment the refcount of the network namespace
net/socket.c:struct ns_common *get_net_ns(struct ns_common *ns)
net/socket.c:EXPORT_SYMBOL_GPL(get_net_ns);
net/socket.c: err = open_related_ns(&net->ns, get_net_ns);
tools/perf/trace/beauty/include/linux/socket.h:extern struct ns_common *get_net_ns(struct ns_common *ns);

>
> >
> > Plus your fix leaks references to init netns without fixing get_net_ns()
> > too.
> > You succeed to increase the refcount of init netns in get_net_ns() but
> > then you return in __ns_get_path() because ns->ops aren't set before
> > ns->ops->put() can be called. But you also _can't_ call it since it's
> > not set because !CONFIG_NET_NS. So everytime you call any of those
> > ioctls you increas the refcount of init net ns without decrementing it
> > on failure. So the fix is buggy as it is too and would suggest you to
> > fixup get_net_ns() too.
> Yes, it is a problem. Can be put a BUG_ON() in nsfs so that such bug (calling
> into nsfs without ops) can be catched early?

Maybe place a WARN_ON() in there.

Christian

2021-06-07 23:34:05

by Changbin Du

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Mon, Jun 07, 2021 at 11:16:47AM +0200, Christian Brauner wrote:
> On Mon, Jun 07, 2021 at 06:43:41AM +0800, Changbin Du wrote:
> > On Fri, Jun 04, 2021 at 11:54:51AM +0200, Christian Brauner wrote:
> > > On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > > > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > > > <[email protected]> wrote:
> > > > > But the point is that ns->ops should never be accessed when that
> > > > > namespace type is disabled. Or in other words, the bug is that something
> > > > > in netns makes use of namespace features when they are disabled. If we
> > > > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> > > >
> > > > It is merely a protocol between fs/nsfs.c and other namespace users,
> > > > so there is certainly no right or wrong here, the only question is which
> > > > one is better.
> > > >
> > > > >
> > > > > Jakub's proposal in the other mail makes sense and falls in line with
> > > > > how the rest of the netns getters are implemented. For example
> > > > > get_net_ns_fd_fd():
> > > >
> > > > It does not make any sense to me. get_net_ns() merely increases
> > > > the netns refcount, which is certainly fine for init_net too, no matter
> > > > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > > > there is literally saying we do not support increasing init_net refcount,
> > > > which is of course false.
> > > >
> > > > > struct net *get_net_ns_by_fd(int fd)
> > > > > {
> > > > > return ERR_PTR(-EINVAL);
> > > > > }
> > > >
> > > > There is a huge difference between just increasing netns refcount
> > > > and retrieving it by fd, right? I have no idea why you bring this up,
> > > > calling them getters is missing their difference.
> > >
> > > This argument doesn't hold up. All netns helpers ultimately increase the
> > > reference count of the net namespace they find. And if any of them
> > > perform operations where they are called in environments wherey they
> > > need CONFIG_NET_NS they handle this case at compile time.
> > >
> > > (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> > > That includes the low-level get_net() function and all the others.
> > > get_net_ns() is the only one that's defined out of band. So get_net_ns()
> > > currently is arguably also misplaced.)
> > >
> > Ihe get_net_ns() was a static helper function and then sb made it exported
> > but didn't move it. See commit d8d211a2a0 ('net: Make extern and export get_net_ns()').
> >
> > > The problem I have with fixing this in nsfs is that it gives the
> > > impression that this is a bug in nsfs whereas it isn't and it
> > > potentially helps tapering over other bugs.
> > >
> > > get_net_ns() is only called for codepaths that call into nsfs via
> > > open_related_ns() and it's the only namespace that does this. But
> > > open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> > > set. For example, none of the procfs namespace f_ops will be set for
> > > !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> > > it doesn't account for !CONFIG_NET_NS and it should be fixed.
> > I agree with Cong that a pure getter returns a generic error is a bit weird.
> > And get_net_ns() is to get the ns_common which always exists indepent of
> > CONFIG_NET_NS. For get_net_ns_by_fd(), I think it is a 'findder + getter'.
> >
> > So maybe we can rollback to patch V1 to fix all code called into
> > open_related_ns()?
> > https://lore.kernel.org/netdev/CAM_iQpWwApLVg39rUkyXxnhsiP0SZf=0ft6vsq=VxFtJ2SumAQ@mail.gmail.com/T/
> >
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -1149,11 +1149,15 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> > mutex_unlock(&vlan_ioctl_mutex);
> > break;
> > case SIOCGSKNS:
> > +#ifdef CONFIG_NET_NS
> > err = -EPERM;
> > if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> > break;
> >
> > err = open_related_ns(&net->ns, get_net_ns);
> > +#else
> > + err = -ENOTSUPP;
> > +#endif
>
> If Jakub is fine with it I don't really care much but then you need to
> fix the other places in tun.c as well.
> I'm just not sure what's so special about get_net_ns() that it can't
> simply get an ifdef !CONFIG_NET_NS section.
> But it seems that there's magic semantics deeply hidden in this helper
> that btw only exists for open_related_ns() that makes this necessary:
>
> drivers/net/tun.c: return open_related_ns(&net->ns, get_net_ns);
> drivers/net/tun.c: ret = open_related_ns(&net->ns, get_net_ns);
> include/linux/socket.h:extern struct ns_common *get_net_ns(struct ns_common *ns);
> net/socket.c: * get_net_ns - increment the refcount of the network namespace
> net/socket.c:struct ns_common *get_net_ns(struct ns_common *ns)
> net/socket.c:EXPORT_SYMBOL_GPL(get_net_ns);
> net/socket.c: err = open_related_ns(&net->ns, get_net_ns);
> tools/perf/trace/beauty/include/linux/socket.h:extern struct ns_common *get_net_ns(struct ns_common *ns);
>
Yes, all these must be fixed, too. I can do that if Jakub agrees.

> >
> > >
> > > Plus your fix leaks references to init netns without fixing get_net_ns()
> > > too.
> > > You succeed to increase the refcount of init netns in get_net_ns() but
> > > then you return in __ns_get_path() because ns->ops aren't set before
> > > ns->ops->put() can be called. But you also _can't_ call it since it's
> > > not set because !CONFIG_NET_NS. So everytime you call any of those
> > > ioctls you increas the refcount of init net ns without decrementing it
> > > on failure. So the fix is buggy as it is too and would suggest you to
> > > fixup get_net_ns() too.
> > Yes, it is a problem. Can be put a BUG_ON() in nsfs so that such bug (calling
> > into nsfs without ops) can be catched early?
>
> Maybe place a WARN_ON() in there.
>
How about below change? We need to avoid oops if we can.
-- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -62,6 +62,10 @@ static int __ns_get_path(struct path *path, struct ns_common *ns)
struct inode *inode;
unsigned long d;

+ /* In case the namespace is not actually enabled. */
+ if (WARN_ON(!ns->ops))
+ return -EINVAL;
+
rcu_read_lock();

> Christian

--
Cheers,
Changbin Du

2021-06-11 00:16:58

by Cong Wang

[permalink] [raw]
Subject: Re: [PATCH] nsfs: fix oops when ns->ops is not provided

On Mon, Jun 7, 2021 at 2:08 AM Christian Brauner
<[email protected]> wrote:
>
> On Sun, Jun 06, 2021 at 05:37:40PM -0700, Cong Wang wrote:
> > On Fri, Jun 4, 2021 at 2:54 AM Christian Brauner
> > <[email protected]> wrote:
> > >
> > > On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > > > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > > > <[email protected]> wrote:
> > > > > But the point is that ns->ops should never be accessed when that
> > > > > namespace type is disabled. Or in other words, the bug is that something
> > > > > in netns makes use of namespace features when they are disabled. If we
> > > > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> > > >
> > > > It is merely a protocol between fs/nsfs.c and other namespace users,
> > > > so there is certainly no right or wrong here, the only question is which
> > > > one is better.
> > > >
> > > > >
> > > > > Jakub's proposal in the other mail makes sense and falls in line with
> > > > > how the rest of the netns getters are implemented. For example
> > > > > get_net_ns_fd_fd():
> > > >
> > > > It does not make any sense to me. get_net_ns() merely increases
> > > > the netns refcount, which is certainly fine for init_net too, no matter
> > > > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > > > there is literally saying we do not support increasing init_net refcount,
> > > > which is of course false.
> > > >
> > > > > struct net *get_net_ns_by_fd(int fd)
> > > > > {
> > > > > return ERR_PTR(-EINVAL);
> > > > > }
> > > >
> > > > There is a huge difference between just increasing netns refcount
> > > > and retrieving it by fd, right? I have no idea why you bring this up,
> > > > calling them getters is missing their difference.
> > >
> > > This argument doesn't hold up. All netns helpers ultimately increase the
> > > reference count of the net namespace they find. And if any of them
> > > perform operations where they are called in environments wherey they
> > > need CONFIG_NET_NS they handle this case at compile time.
> >
> > Let me explain it in this more straight way: what is the protocol here
> > for indication of !CONFIG_XXX_NS? Clearly it must be ns->ops==NULL,
> > because all namespaces use the following similar pattern:
> >
> > #ifdef CONFIG_NET_NS
> > net->ns.ops = &netns_operations;
> > #endif
> >
> > Now you are arguing the protocol is not this, but it is the getter of
> > open_related_ns() returns an error pointer.
>
> I don't understand what this is supposed to tell me.

This tells you whatever you called a bug, it is just a protocol.

You are trying to justify it as bug by interpreting is as a getter
like get_net_ns_by_fd(). None of them makes sense, neither
is this bug, nor it is any similar to get_net_ns_by_fd().


>
> >
> > >
> > > (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> > > That includes the low-level get_net() function and all the others.
> > > get_net_ns() is the only one that's defined out of band. So get_net_ns()
> > > currently is arguably also misplaced.)
> >
> > Of course they do, only struct ns_common is generic. What's your
> > point? Each ns.ops is defined by each namespace too.
>
> All netns helpers should arguably be located in a central place
> including get_net_ns(). There's no need to spread such helpers
> everywhere. This is completely orthogonaly to struct ns_common.

I have no idea why you want to argue on something I don't disagree
with. Actually, the proposal from me only changes fs/nsfs.c, so you
do not even need to worry about file locations at all.

>
> >
> > >
> > > The problem I have with fixing this in nsfs is that it gives the
> > > impression that this is a bug in nsfs whereas it isn't and it
> > > potentially helps tapering over other bugs.
> >
> > Like I keep saying, this is just a protocol, there is no right or
> > wrong here. If the protocol is just ops==NULL, then there is nothing
> > wrong use it.
> >
> > (BTW, we have a lot of places that use ops==NULL as a protocol,
> > they work really well.)
> >
> > >
> > > get_net_ns() is only called for codepaths that call into nsfs via
> > > open_related_ns() and it's the only namespace that does this. But
> >
> > I am pretty sure userns does the same:
> >
> > 197 case NS_GET_USERNS:
> > 198 return open_related_ns(ns, ns_get_owner);
>
> Maybe I wasn't clear enough, open_related_ns() is the only namespace
> that calls into nsfs via open_related_ns() __outside__ of fs/nsfs.c I
> thought that was pretty clear.

Why it matter which calls open_related_ns() here? The point is
ns_get_owner() and get_net_ns() are defined by each ns, IOW,
outside of fs/nsfs.c.

>
> But also...
>
> #ifdef CONFIG_USER_NS
> struct ns_common *ns_get_owner(struct ns_common *ns);
> #else
> static inline struct ns_common *ns_get_owner(struct ns_common *ns)
> {
> return ERR_PTR(-EPERM);
> }
> #endif
>
> So ns_get_owner() returns -EPERM when !CONFIG_USER_NS so the callback
> handles the !CONFIG_USER_NS case. And that's what we were saying
> get_net_ns() should do.

Sure, thanks for pointing this out. This is unnecessary too, like I said,
if the protocol is simply ns.ops==NULL. No one says the current code
is perfect, all code can be improved, so using existing code can't justify
it.

>
> >
> >
> > > open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> > > set. For example, none of the procfs namespace f_ops will be set for
> > > !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> > > it doesn't account for !CONFIG_NET_NS and it should be fixed.
> >
> > If the protocol is just ops==NULL, then the core part should just check
> > ops==NULL. Pure and simple. I have no idea why you do not admit the
> > fact that every namespace intentionally leaves ops as NULL when its
> > config is disabled.
>
> I'm just going to quote myself:
>
> > > set. For example, none of the procfs namespace f_ops will be set for
> > > !CONFIG_NET_NS.
>
> If a given namespace type isn't selected then it will never appear in
> /proc/<pid>/ns/* which is why the proc_ns_operations aren't defined in
> fs/proc/namespaces.c.
>
> In other words, you can't get a file descriptor for a given namespace
> through proc or rather the nsfs part of proc when that namespace type
> isn't selected.

Who said open_related_ns() should return a fd in such a case?
Clearly it must return an error here.

>
> The open_related_ns() function is a function that is just there to give
> you a namespace fd and it assumes that when it is called that the
> namespace type is selected for or that the callback you're passing it
> handles that case.

Once again, this is just a protocol. Let me compare your protocol
with mine:

1. You want to use the getter as a protocol for indicating a ns is
disabled;

2. I prefer to use ns.ops==NULL as a protocol here.

And let me explain why 2) is better than 1):

a) The final code is less with 2), because all those ugly #ifdef's
are all gone. The getter would not even be called if ns.ops==NULL.

b) The code is more readable. The point of the getter is to increase
refcnt of a specific ns. There is nothing wrong to at least literally
increase the refcnt of init_net. With your approach, it simply says
getting init_net is not supported if !CONFIG_NET_NS, this is just
false.

c) It is slightly faster to error out. open_related_ns() would return
an error even before get_unused_fd_flags(). With your approach,
it defers to the getter, that is, after get_unused_fd_flags().


>
> For example, see you're own example about ns_get_owner() above.
>
> >
> > >
> > > Plus your fix leaks references to init netns without fixing get_net_ns()
> > > too.
> >
> > I thought it is 100% clear that this patch is not from me?
> >
> > Plus, the PoC patch from me actually suggests to change
> > open_related_ns(), not __ns_get_path(). I have no idea why you
> > both miss it.
>
> Turning this around, I'm not sure what your resistance to just doing it
> like ns_get_owner() is doing it is.

I have the same doubt. See above on why.

Thanks.