2020-03-06 17:25:02

by Corey Minyard

Subject: [PATCH v2] pid: Fix error return value in some cases

From: Corey Minyard <[email protected]>

Recent changes to alloc_pid() allow the caller of clone3() to request
specific pid numbers via the set_tid array. If set_tid_size is set, then
the code scanning the levels will hard-set retval to -EPERM, overriding
its previous -ENOMEM value.

After the code scanning the levels, there are error returns that do not
set retval, assuming it is still set to -ENOMEM.

So set retval back to -ENOMEM after scanning the levels.

Fixes: 49cb2fc42ce4 "fork: extend clone3() to support setting a PID"
Signed-off-by: Corey Minyard <[email protected]>
Cc: <[email protected]> # 5.5
Cc: Adrian Reber <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Andrei Vagin <[email protected]>
---

Changes from v1:
Just set retval to -ENOMEM before the gotos that would use it.

I do think that the second instance:

	if (!(ns->pid_allocated & PIDNS_ADDING))
		goto out_unlock;

is returning the wrong error value, but that's probably not a big
deal, and fixing it would probably need to be a separate change.

In the first instance, the error return values are almost all -ENOMEM,
anyway.

kernel/pid.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/pid.c b/kernel/pid.c
index 0f4ecb57214c..19645b25b77c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -247,6 +247,8 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		tmp = tmp->parent;
 	}
 
+	retval = -ENOMEM;
+
 	if (unlikely(is_child_reaper(pid))) {
 		if (pid_ns_prepare_proc(ns))
 			goto out_free;
--
2.17.1


2020-03-07 11:00:43

by Christian Brauner

Subject: Re: [PATCH v2] pid: Fix error return value in some cases

On Fri, Mar 06, 2020 at 11:23:14AM -0600, [email protected] wrote:
> From: Corey Minyard <[email protected]>
>
> Recent changes to alloc_pid() allow the pid number to be specified on
> the command line. If set_tid_size is set, then the code scanning the
> levels will hard-set retval to -EPERM, overriding it's previous -ENOMEM
> value.
>
> After the code scanning the levels, there are error returns that do not
> set retval, assuming it is still set to -ENOMEM.
>
> So set retval back to -ENOMEM after scanning the levels.
>
> Fixes: 49cb2fc42ce4 "fork: extend clone3() to support setting a PID"
> Signed-off-by: Corey Minyard <[email protected]>
> Cc: <[email protected]> # 5.5
> Cc: Adrian Reber <[email protected]>
> Cc: Christian Brauner <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Dmitry Safonov <[email protected]>
> Cc: Andrei Vagin <[email protected]>

Thanks! I've pulled the patch now and applied.

I think that restores the old behavior. If you don't mind, I'll add a
comment on top of it saying something like:
"ENOMEM is not the most obvious choice but it's what we've been
exposing to userspace for a long time and it's also documented
behavior. So we can't easily change it to something more sensible."

Acked-by: Christian Brauner <[email protected]>

2020-03-07 13:13:01

by Corey Minyard

Subject: Re: [PATCH v2] pid: Fix error return value in some cases

On Sat, Mar 07, 2020 at 12:00:07PM +0100, Christian Brauner wrote:
> On Fri, Mar 06, 2020 at 11:23:14AM -0600, [email protected] wrote:
> > [...]
>
> Thanks! I've pulled the patch now and applied.
>
> I think that restores the old behavior. If you don't mind, I'll add a
> comment on top of it saying something like:
> "ENOMEM is not the most obvious choice but it's the what we've been
> exposing to userspace for a long time and it's also documented
> behavior. So we can't easily change it to something more sensible."

That's great. I was just looking through the code for another reason
and noticed the issue. Every little thing counts for quality.

-corey

>
> Acked-by: Christian Brauner <[email protected]>

2020-03-08 17:08:41

by Christian Brauner

Subject: Re: [PATCH v2] pid: Fix error return value in some cases

On Sat, Mar 07, 2020 at 07:11:36AM -0600, Corey Minyard wrote:
> On Sat, Mar 07, 2020 at 12:00:07PM +0100, Christian Brauner wrote:
> > On Fri, Mar 06, 2020 at 11:23:14AM -0600, [email protected] wrote:
> > > [...]
> >
> > Thanks! I've pulled the patch now and applied.

Applied as:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/commit/?h=fixes&id=b26ebfe12f34f372cf041c6f801fa49c3fb382c5

Should show up in -next around Monday and I'll target it for rc6. Should
then be backported to v5.5 rather soon!

Thanks!

2020-03-08 17:11:13

by Christian Brauner

Subject: [PATCH] pid: make ENOMEM return value more obvious

The alloc_pid() codepath used to be simpler. With the introduction of the
ability to choose specific pids in 49cb2fc42ce4 ("fork: extend clone3() to
support setting a PID") it got more complex. It hasn't been super obvious
that ENOMEM is returned when the init process/child subreaper of the pid
namespace has died, as multiple attempts to improve this show (see e.g. [1]
and, most recently, [2]).
We regressed returning ENOMEM in [3] and [2] restored it. Let's add a
comment on top explaining that this is historic and documented behavior and
cannot easily be changed.
The unconditional initialization of retval when declaring it can be removed
since it is initialized on every failure path in the loop and unconditionally
set to ENOMEM right after it.

[1]: 35f71bc0a09a ("fork: report pid reservation failure properly")
[2]: b26ebfe12f34 ("pid: Fix error return value in some cases")
[3]: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID")
Signed-off-by: Christian Brauner <[email protected]>
---
kernel/pid.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 19645b25b77c..be43122eb876 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -165,7 +165,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	int i, nr;
 	struct pid_namespace *tmp;
 	struct upid *upid;
-	int retval = -ENOMEM;
+	int retval;
 
 	/*
 	 * set_tid_size contains the size of the set_tid array. Starting at
@@ -247,6 +247,14 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 		tmp = tmp->parent;
 	}
 
+	/*
+	 * ENOMEM is not the most obvious choice especially for the case
+	 * where the child subreaper has already exited and the pid
+	 * namespace denies the creation of any new processes. But ENOMEM
+	 * is what we have exposed to userspace for a long time and it is
+	 * documented behavior for pid namespaces. So we can't easily
+	 * change it even if there were an error code better suited.
+	 */
 	retval = -ENOMEM;
 
 	if (unlikely(is_child_reaper(pid))) {

base-commit: b26ebfe12f34f372cf041c6f801fa49c3fb382c5
--
2.25.1

2020-03-08 17:17:26

by Christian Brauner

Subject: [PATCH] selftests: add pid namespace ENOMEM regression test

We recently regressed (cf. [1] and its corresponding fix in [2]) returning
ENOMEM when trying to create a process in a pid namespace whose init
process/child subreaper has already died. This has caused confusion at
least once before that (cf. [3]). Let's add a simple regression test to
catch this in the future.

[1]: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID")
[2]: b26ebfe12f34 ("pid: Fix error return value in some cases")
[3]: 35f71bc0a09a ("fork: report pid reservation failure properly")
Cc: Corey Minyard <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Adrian Reber <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Andrei Vagin <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../selftests/pid_namespace/.gitignore | 1 +
.../testing/selftests/pid_namespace/Makefile | 8 ++++
tools/testing/selftests/pid_namespace/config | 2 +
.../pid_namespace/regression_enomem.c | 45 +++++++++++++++++++
tools/testing/selftests/pidfd/pidfd.h | 2 +
7 files changed, 60 insertions(+)
create mode 100644 tools/testing/selftests/pid_namespace/.gitignore
create mode 100644 tools/testing/selftests/pid_namespace/Makefile
create mode 100644 tools/testing/selftests/pid_namespace/config
create mode 100644 tools/testing/selftests/pid_namespace/regression_enomem.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6158a143a13e..e3a83c739ff3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13159,6 +13159,7 @@ S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git
F: samples/pidfd/
F: tools/testing/selftests/pidfd/
+F: tools/testing/selftests/pid_namespace/
F: tools/testing/selftests/clone3/
K: (?i)pidfd
K: (?i)clone3
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6ec503912bea..5fc587b7136f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -38,6 +38,7 @@ TARGETS += netfilter
TARGETS += networking/timestamping
TARGETS += nsfs
TARGETS += pidfd
+TARGETS += pid_namespace
TARGETS += powerpc
TARGETS += proc
TARGETS += pstore
diff --git a/tools/testing/selftests/pid_namespace/.gitignore b/tools/testing/selftests/pid_namespace/.gitignore
new file mode 100644
index 000000000000..93ab9d7e5b7e
--- /dev/null
+++ b/tools/testing/selftests/pid_namespace/.gitignore
@@ -0,0 +1 @@
+regression_enomem
diff --git a/tools/testing/selftests/pid_namespace/Makefile b/tools/testing/selftests/pid_namespace/Makefile
new file mode 100644
index 000000000000..dcaefa224ca0
--- /dev/null
+++ b/tools/testing/selftests/pid_namespace/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -g -I../../../../usr/include/
+
+TEST_GEN_PROGS := regression_enomem
+
+include ../lib.mk
+
+$(OUTPUT)/regression_enomem: regression_enomem.c ../pidfd/pidfd.h
diff --git a/tools/testing/selftests/pid_namespace/config b/tools/testing/selftests/pid_namespace/config
new file mode 100644
index 000000000000..26cdb27e7dbb
--- /dev/null
+++ b/tools/testing/selftests/pid_namespace/config
@@ -0,0 +1,2 @@
+CONFIG_PID_NS=y
+CONFIG_USER_NS=y
diff --git a/tools/testing/selftests/pid_namespace/regression_enomem.c b/tools/testing/selftests/pid_namespace/regression_enomem.c
new file mode 100644
index 000000000000..73d532556d17
--- /dev/null
+++ b/tools/testing/selftests/pid_namespace/regression_enomem.c
@@ -0,0 +1,45 @@
+#define _GNU_SOURCE
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/types.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/wait.h>
+
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+#include "../pidfd/pidfd.h"
+
+/*
+ * Regression test for:
+ * 35f71bc0a09a ("fork: report pid reservation failure properly")
+ * b26ebfe12f34 ("pid: Fix error return value in some cases")
+ */
+TEST(regression_enomem)
+{
+	pid_t pid;
+
+	if (geteuid())
+		EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+
+	EXPECT_EQ(0, unshare(CLONE_NEWPID));
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0)
+		exit(EXIT_SUCCESS);
+
+	EXPECT_EQ(0, wait_for_pid(pid));
+
+	pid = fork();
+	ASSERT_LT(pid, 0);
+	ASSERT_EQ(errno, ENOMEM);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index d482515604db..c1921a53dbed 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -13,6 +13,8 @@
#include <string.h>
#include <syscall.h>
#include <sys/mount.h>
+#include <sys/types.h>
+#include <sys/wait.h>

#include "../kselftest.h"


base-commit: 8deb24dcb89cb390110d2ccac830a84d1ab5cee4
--
2.25.1