2019-10-08 13:38:15

by Christian Kellner

[permalink] [raw]
Subject: [PATCH] pidfd: show pids for nested pid namespaces in fdinfo

From: Christian Kellner <[email protected]>

The fdinfo file for a process file descriptor already contains the
pid of the process in the callers namespaces. Additionally, if pid
namespaces are configured, show the process ids of the process in
all nested namespaces in the same format as in the procfs status
file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
of the processes in nested namespaces.

Signed-off-by: Christian Kellner <[email protected]>
---
kernel/fork.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index 5a0fd518e04e..801793789f33 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1685,8 +1685,18 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
{
struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
struct pid *pid = f->private_data;
+#ifdef CONFIG_PID_NS
+ int i;
+#endif

seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+#ifdef CONFIG_PID_NS
+ seq_puts(m, "\nNSpid:");
+ for (i = ns->level; i <= pid->level; i++) {
+ ns = pid_nr_ns(pid, pid->numbers[i].ns);
+ seq_put_decimal_ull(m, "\t", ns);
+ }
+#endif
seq_putc(m, '\n');
}
#endif
--
2.21.0


2019-10-08 13:54:07

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] pidfd: show pids for nested pid namespaces in fdinfo

On Tue, Oct 08, 2019 at 03:36:37PM +0200, Christian Kellner wrote:
> From: Christian Kellner <[email protected]>
>
> The fdinfo file for a process file descriptor already contains the
> pid of the process in the callers namespaces. Additionally, if pid
> namespaces are configured, show the process ids of the process in
> all nested namespaces in the same format as in the procfs status
> file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> of the processes in nested namespaces.
>
> Signed-off-by: Christian Kellner <[email protected]>

Yeah, makes sense to me.
Note that if you send the pidfd to a sibling pid namespace NSpid won't
show you anything useful. But that's what I'd expect security wise. You
should only be able to snoop on descendant pid namespaces.

Please add a test for this to verify that this all works correctly and
then resend. The tests live in tools/testing/selftests/pidfd/ and should
already have most of the infrastructure there. The fdinfo parsing code
should be in samples/pidfd/ which

For the patch itself:

Reviewed-by: Christian Brauner <[email protected]>

You can resend with my Reviewed-by retained if you don't change
anything. Before I see tests I'll hold off on merging this. ;)

Thanks!
Christian

2019-10-08 14:02:46

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] pidfd: show pids for nested pid namespaces in fdinfo

On Tue 08-10-19 15:52:59, Christian Brauner wrote:
> On Tue, Oct 08, 2019 at 03:36:37PM +0200, Christian Kellner wrote:
> > From: Christian Kellner <[email protected]>
> >
> > The fdinfo file for a process file descriptor already contains the
> > pid of the process in the callers namespaces. Additionally, if pid
> > namespaces are configured, show the process ids of the process in
> > all nested namespaces in the same format as in the procfs status
> > file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> > of the processes in nested namespaces.
> >
> > Signed-off-by: Christian Kellner <[email protected]>
>
> Yeah, makes sense to me.
> Note that if you send the pidfd to a sibling pid namespace NSpid won't
> show you anything useful. But that's what I'd expect security wise. You
> should only be able to snoop on descendant pid namespaces.
>
> Please add a test for this to verify that this all works correctly and
> then resend. The tests live in tools/testing/selftests/pidfd/ and should
> already have most of the infrastructure there. The fdinfo parsing code
> should be in samples/pidfd/ which
>
> For the patch itself:
>
> Reviewed-by: Christian Brauner <[email protected]>
>
> You can resend with my Reviewed-by retained if you don't change
> anything. Before I see tests I'll hold off on merging this. ;)

This is also forming a new user visible "api" right? So the make sure
that linux-api is on the Cc list.

And one minore note. The ifdefery is just ugly, could you just make it a
separate function with ifdef hidden inside?
--
Michal Hocko
SUSE Labs

2019-10-09 16:11:07

by Christian Kellner

[permalink] [raw]
Subject: [PATCH v2 1/2] pidfd: show pids for nested pid namespaces in fdinfo

From: Christian Kellner <[email protected]>

The fdinfo file for a process file descriptor already contains the
pid of the process in the callers namespaces. Additionally, if pid
namespaces are configured, show the process ids of the process in
all nested namespaces in the same format as in the procfs status
file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
of the processes in nested namespaces.

Signed-off-by: Christian Kellner <[email protected]>
---

Changes in v2:
- Moved into separate function to avoid multiple ifdefs as suggested
by Michal Hocko

kernel/fork.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index 5a0fd518e04e..f7a59ef046e9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1681,12 +1681,27 @@ static int pidfd_release(struct inode *inode, struct file *file)
}

#ifdef CONFIG_PROC_FS
+static void pidfd_nspid(struct seq_file *m, struct pid *pid)
+{
+#ifdef CONFIG_PID_NS
+ struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
+ int i;
+
+ seq_puts(m, "\nNSpid:");
+ for (i = ns->level; i <= pid->level; i++) {
+ ns = pid->numbers[i].ns;
+ seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
+ }
+#endif
+}
+
static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
{
struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
struct pid *pid = f->private_data;

seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+ pidfd_nspid(m, pid);
seq_putc(m, '\n');
}
#endif
--
2.21.0

2019-10-09 16:13:38

by Christian Kellner

[permalink] [raw]
Subject: [PATCH v2 2/2] pidfd: add tests for NSpid info in fdinfo

From: Christian Kellner <[email protected]>

Add tests that check that if pid namespaces are configured the fdinfo
file of a pidfd contains an NSpid: entry containing the process id
in the current and additionally all nested namespaces.

Signed-off-by: Christian Kellner <[email protected]>
---
tools/testing/selftests/pidfd/Makefile | 2 +-
tools/testing/selftests/pidfd/pidfd.h | 12 +++
.../selftests/pidfd/pidfd_fdinfo_test.c | 98 +++++++++++++++++++
tools/testing/selftests/pidfd/pidfd_test.c | 12 ---
4 files changed, 111 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/pidfd/pidfd_fdinfo_test.c

diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
index 464c9b76148f..b7784dc488b8 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -g -I../../../../usr/include/ -lpthread

-TEST_GEN_PROGS := pidfd_test pidfd_open_test pidfd_poll_test pidfd_wait
+TEST_GEN_PROGS := pidfd_test pdfd_fdinfo_test pidfd_open_test pidfd_poll_test pidfd_wait

include ../lib.mk

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index c6bc68329f4b..2946d788645b 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -84,4 +84,16 @@ static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

+static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
+{
+ size_t stack_size = 1024;
+ char *stack[1024] = { 0 };
+
+#ifdef __ia64__
+ return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
+#else
+ return clone(fn, stack + stack_size, flags | SIGCHLD, NULL, pidfd);
+#endif
+}
+
#endif /* __PIDFD_H */
diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
new file mode 100644
index 000000000000..fbae502ad8ad
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/types.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/wait.h>
+
+#include "pidfd.h"
+#include "../kselftest.h"
+
+static int child_fdinfo_nspid_test(void *args)
+{
+ ksft_print_msg("Child: pid %d\n", getpid());
+ return 0;
+}
+
+static int compare_fdinfo_nspid(int pidfd, char *expect, size_t len)
+{
+ char path[512];
+ FILE *f;
+ size_t n = 0;
+ ssize_t k;
+ char *line = NULL;
+ int r = -1;
+
+ snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
+
+ f = fopen(path, "re");
+ if (!f)
+ return -1;
+
+ while ((k = getline(&line, &n, f)) != -1) {
+ if (strncmp(line, "NSpid:", 6))
+ continue;
+
+ line[k - 1] = '\0';
+ ksft_print_msg("Child: fdinfo NSpid line: '%s'.\n", line);
+ r = strncmp(line + 6, expect, len);
+ break;
+ }
+
+ free(line);
+ fclose(f);
+
+ return r;
+}
+
+static void test_pidfd_fdinfo_nspid(void)
+{
+ char expect[512];
+ int pid, pidfd = 0;
+ int n, r;
+ const char *test_name = "pidfd check for NSpid information in fdinfo";
+
+ pid = pidfd_clone(CLONE_PIDFD | CLONE_NEWPID | CLONE_NEWUSER, &pidfd,
+ child_fdinfo_nspid_test);
+ if (pid < 0)
+ ksft_exit_fail_msg(
+ "%s test: pidfd_clone failed (ret %d, errno %d)\n",
+ test_name, pid, errno);
+
+ ksft_print_msg("Parent: child-pid: %d\n", pid);
+
+ /* The child will have pid 1 in the new pid namespace,
+ * so the line must be 'NSPid:\t<pid>\t1'
+ */
+ n = snprintf(expect, sizeof(expect), "\t%d\t%d", pid, 1);
+ r = compare_fdinfo_nspid(pidfd, expect, n);
+
+ (void)close(pidfd);
+
+ if (wait_for_pid(pid))
+ ksft_exit_fail_msg(
+ "%s test: waitpid failed (ret %d, errno %d)\n",
+ test_name, r, errno);
+
+ if (r != 0)
+ ksft_exit_fail_msg("%s test: Failed\n", test_name);
+ else
+ ksft_test_result_pass("%s test: Passed\n", test_name);
+}
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(1);
+
+ test_pidfd_fdinfo_nspid();
+
+ return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
index 7aff2d3b42c0..9cf0b6b3e389 100644
--- a/tools/testing/selftests/pidfd/pidfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_test.c
@@ -27,18 +27,6 @@

#define MAX_EVENTS 5

-static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
-{
- size_t stack_size = 1024;
- char *stack[1024] = { 0 };
-
-#ifdef __ia64__
- return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
-#else
- return clone(fn, stack + stack_size, flags | SIGCHLD, NULL, pidfd);
-#endif
-}
-
static int signal_received;

static void set_signal_received_on_sigusr1(int sig)
--
2.21.0

2019-10-09 17:31:30

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Wed, Oct 09, 2019 at 06:05:30PM +0200, Christian Kellner wrote:
> From: Christian Kellner <[email protected]>
>
> The fdinfo file for a process file descriptor already contains the
> pid of the process in the callers namespaces. Additionally, if pid
> namespaces are configured, show the process ids of the process in
> all nested namespaces in the same format as in the procfs status
> file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> of the processes in nested namespaces.
>
> Signed-off-by: Christian Kellner <[email protected]>
> ---
>
> Changes in v2:
> - Moved into separate function to avoid multiple ifdefs as suggested
> by Michal Hocko
>
> kernel/fork.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 5a0fd518e04e..f7a59ef046e9 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1681,12 +1681,27 @@ static int pidfd_release(struct inode *inode, struct file *file)
> }
>
> #ifdef CONFIG_PROC_FS
> +static void pidfd_nspid(struct seq_file *m, struct pid *pid)

If it has to be a separate helper then please make it:

static inline void print_pidfd_nspid(struct seq_file *m, struct pid_namespace *ns, struct pid *pid)
{
#ifdef CONFIG_PID_NS
int i;

seq_puts(m, "\nNSpid:");
for (i = ns->level; i <= pid->level; i++) {
ns = pid->numbers[i].ns;
seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
}
#endif
}

It's called nowhere else and we've already retrieved the pid_namespace
in pidfd_show_fdinfo().

> +{
> +#ifdef CONFIG_PID_NS
> + struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
> + int i;
> +
> + seq_puts(m, "\nNSpid:");
> + for (i = ns->level; i <= pid->level; i++) {
> + ns = pid->numbers[i].ns;
> + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> + }
> +#endif
> +}
> +
> static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
> {
> struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
> struct pid *pid = f->private_data;
>
> seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
> + pidfd_nspid(m, pid);
> seq_putc(m, '\n');
> }
> #endif
> --
> 2.21.0
>

2019-10-11 12:24:25

by Christian Kellner

[permalink] [raw]
Subject: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

From: Christian Kellner <[email protected]>

The fdinfo file for a process file descriptor already contains the
pid of the process in the callers namespaces. Additionally, if pid
namespaces are configured, show the process ids of the process in
all nested namespaces in the same format as in the procfs status
file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
of the processes in nested namespaces.

Signed-off-by: Christian Kellner <[email protected]>
---

Changes in v2:
- Moved into separate function to avoid multiple ifdefs as suggested
by Michal Hocko
Changes in v3:
- Helper function takes struct pid_namespace *ns param and got a new
name

kernel/fork.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/kernel/fork.c b/kernel/fork.c
index bcdf53125210..183950aad82b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1695,12 +1695,27 @@ static int pidfd_release(struct inode *inode, struct file *file)
}

#ifdef CONFIG_PROC_FS
+static inline void print_pidfd_nspid(struct seq_file *m, struct pid *pid,
+ struct pid_namespace *ns)
+{
+#ifdef CONFIG_PID_NS
+ int i;
+
+ seq_puts(m, "\nNSpid:");
+ for (i = ns->level; i <= pid->level; i++) {
+ ns = pid->numbers[i].ns;
+ seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
+ }
+#endif
+}
+
static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
{
struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
struct pid *pid = f->private_data;

seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+ print_pidfd_nspid(m, pid, ns);
seq_putc(m, '\n');
}
#endif
--
2.21.0

2019-10-11 12:24:56

by Christian Kellner

[permalink] [raw]
Subject: [PATCH v3 2/2] pidfd: add tests for NSpid info in fdinfo

From: Christian Kellner <[email protected]>

Add tests that check that if pid namespaces are configured the fdinfo
file of a pidfd contains an NSpid: entry containing the process id
in the current and additionally all nested namespaces.

Signed-off-by: Christian Kellner <[email protected]>
---
tools/testing/selftests/pidfd/Makefile | 2 +-
tools/testing/selftests/pidfd/pidfd.h | 12 +++
.../selftests/pidfd/pidfd_fdinfo_test.c | 98 +++++++++++++++++++
tools/testing/selftests/pidfd/pidfd_test.c | 12 ---
4 files changed, 111 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/pidfd/pidfd_fdinfo_test.c

diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
index 7550f08822a3..43db1b98e845 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -g -I../../../../usr/include/ -pthread

-TEST_GEN_PROGS := pidfd_test pidfd_open_test pidfd_poll_test pidfd_wait
+TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test pidfd_poll_test pidfd_wait

include ../lib.mk

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index c6bc68329f4b..2946d788645b 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -84,4 +84,16 @@ static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
}

+static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
+{
+ size_t stack_size = 1024;
+ char *stack[1024] = { 0 };
+
+#ifdef __ia64__
+ return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
+#else
+ return clone(fn, stack + stack_size, flags | SIGCHLD, NULL, pidfd);
+#endif
+}
+
#endif /* __PIDFD_H */
diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
new file mode 100644
index 000000000000..fbae502ad8ad
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/types.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/wait.h>
+
+#include "pidfd.h"
+#include "../kselftest.h"
+
+static int child_fdinfo_nspid_test(void *args)
+{
+ ksft_print_msg("Child: pid %d\n", getpid());
+ return 0;
+}
+
+static int compare_fdinfo_nspid(int pidfd, char *expect, size_t len)
+{
+ char path[512];
+ FILE *f;
+ size_t n = 0;
+ ssize_t k;
+ char *line = NULL;
+ int r = -1;
+
+ snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
+
+ f = fopen(path, "re");
+ if (!f)
+ return -1;
+
+ while ((k = getline(&line, &n, f)) != -1) {
+ if (strncmp(line, "NSpid:", 6))
+ continue;
+
+ line[k - 1] = '\0';
+ ksft_print_msg("Child: fdinfo NSpid line: '%s'.\n", line);
+ r = strncmp(line + 6, expect, len);
+ break;
+ }
+
+ free(line);
+ fclose(f);
+
+ return r;
+}
+
+static void test_pidfd_fdinfo_nspid(void)
+{
+ char expect[512];
+ int pid, pidfd = 0;
+ int n, r;
+ const char *test_name = "pidfd check for NSpid information in fdinfo";
+
+ pid = pidfd_clone(CLONE_PIDFD | CLONE_NEWPID | CLONE_NEWUSER, &pidfd,
+ child_fdinfo_nspid_test);
+ if (pid < 0)
+ ksft_exit_fail_msg(
+ "%s test: pidfd_clone failed (ret %d, errno %d)\n",
+ test_name, pid, errno);
+
+ ksft_print_msg("Parent: child-pid: %d\n", pid);
+
+ /* The child will have pid 1 in the new pid namespace,
+ * so the line must be 'NSPid:\t<pid>\t1'
+ */
+ n = snprintf(expect, sizeof(expect), "\t%d\t%d", pid, 1);
+ r = compare_fdinfo_nspid(pidfd, expect, n);
+
+ (void)close(pidfd);
+
+ if (wait_for_pid(pid))
+ ksft_exit_fail_msg(
+ "%s test: waitpid failed (ret %d, errno %d)\n",
+ test_name, r, errno);
+
+ if (r != 0)
+ ksft_exit_fail_msg("%s test: Failed\n", test_name);
+ else
+ ksft_test_result_pass("%s test: Passed\n", test_name);
+}
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(1);
+
+ test_pidfd_fdinfo_nspid();
+
+ return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
index 7aff2d3b42c0..9cf0b6b3e389 100644
--- a/tools/testing/selftests/pidfd/pidfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_test.c
@@ -27,18 +27,6 @@

#define MAX_EVENTS 5

-static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
-{
- size_t stack_size = 1024;
- char *stack[1024] = { 0 };
-
-#ifdef __ia64__
- return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
-#else
- return clone(fn, stack + stack_size, flags | SIGCHLD, NULL, pidfd);
-#endif
-}
-
static int signal_received;

static void set_signal_received_on_sigusr1(int sig)
--
2.21.0

2019-10-11 13:17:45

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Fri, Oct 11, 2019 at 02:23:20PM +0200, Christian Kellner wrote:
> From: Christian Kellner <[email protected]>
>
> The fdinfo file for a process file descriptor already contains the
> pid of the process in the callers namespaces. Additionally, if pid
> namespaces are configured, show the process ids of the process in
> all nested namespaces in the same format as in the procfs status
> file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> of the processes in nested namespaces.
>
> Signed-off-by: Christian Kellner <[email protected]>

Reviewed-by: Christian Brauner <[email protected]>

If I hear no technical objections I'll pick this up targeting the 5.5
merge window.

Thanks!
Christian

2019-10-11 13:21:39

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] pidfd: add tests for NSpid info in fdinfo

On Fri, Oct 11, 2019 at 02:23:21PM +0200, Christian Kellner wrote:
> From: Christian Kellner <[email protected]>
>
> Add tests that check that if pid namespaces are configured the fdinfo
> file of a pidfd contains an NSpid: entry containing the process id
> in the current and additionally all nested namespaces.
>
> Signed-off-by: Christian Kellner <[email protected]>

Acked-by: Christian Brauner <[email protected]>

Shuah, can I get an Ack for you from this. If you have no objections I'd
queue up this patchset for the 5.5 merge window.

Thanks!
Christian

2019-10-11 14:59:19

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Fri, Oct 11, 2019 at 2:23 PM Christian Kellner <[email protected]> wrote:
> The fdinfo file for a process file descriptor already contains the
> pid of the process in the callers namespaces. Additionally, if pid
> namespaces are configured, show the process ids of the process in
> all nested namespaces in the same format as in the procfs status
> file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> of the processes in nested namespaces.
[...]
> #ifdef CONFIG_PROC_FS
> +static inline void print_pidfd_nspid(struct seq_file *m, struct pid *pid,
> + struct pid_namespace *ns)

`ns` is the namespace of the PID namespace of the procfs instance
through which the file descriptor is being viewed.

> +{
> +#ifdef CONFIG_PID_NS
> + int i;
> +
> + seq_puts(m, "\nNSpid:");
> + for (i = ns->level; i <= pid->level; i++) {

ns->level is the level of the PID namespace associated with the procfs
instance through which the file descriptor is being viewed. pid->level
is the level of the PID associated with the pidfd.

> + ns = pid->numbers[i].ns;
> + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> + }
> +#endif
> +}

I think you assumed that `ns` is always going to contain `pid`.
However, that's not the case. Consider the following scenario:

- the init_pid_ns has two child PID namespaces, A and B (each with
its own mount namespace and procfs instance)
- process P1 lives in A
- process P2 lives in B
- P1 opens a pidfd for itself
- P1 passes the pidfd to P2 (e.g. via a unix domain socket)
- P2 reads /proc/self/fdinfo/$pidfd

Now the loop will print the ID of P1 in A. I don't think that's what
you intended? You might want to bail out if "pid_nr_ns(pid, ns) == 0",
or something like that.

2019-10-11 15:10:43

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] pidfd: add tests for NSpid info in fdinfo

On Wed, Oct 9, 2019 at 6:10 PM Christian Kellner <[email protected]> wrote:
> Add tests that check that if pid namespaces are configured the fdinfo
> file of a pidfd contains an NSpid: entry containing the process id
> in the current and additionally all nested namespaces.
[...]
> +static int compare_fdinfo_nspid(int pidfd, char *expect, size_t len)
> +{
> + char path[512];
> + FILE *f;
> + size_t n = 0;
> + ssize_t k;
> + char *line = NULL;
> + int r = -1;
> +
> + snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);

(Maybe at some point the selftests code should add some more concise
alternative to snprintf() calls on separate lines. A macro or
something like that so that you can write stuff like `f =
fopen(tprintf("/proc/self/fdinfo/%d", pidfd), "re")`.)

> + f = fopen(path, "re");
> + if (!f)
> + return -1;
> +
> + while ((k = getline(&line, &n, f)) != -1) {
> + if (strncmp(line, "NSpid:", 6))
> + continue;
> +
> + line[k - 1] = '\0';
> + ksft_print_msg("Child: fdinfo NSpid line: '%s'.\n", line);
> + r = strncmp(line + 6, expect, len);

Wouldn't it be better to get rid of the nullbyte assignment and change
the strncmp() into a strcmp() here...

[...]
> + /* The child will have pid 1 in the new pid namespace,
> + * so the line must be 'NSPid:\t<pid>\t1'
> + */
> + n = snprintf(expect, sizeof(expect), "\t%d\t%d", pid, 1);

... and add a "\n" to the format string? It's shorter and doesn't
silently ignore it if the line doesn't end at that point.

2019-10-11 15:18:00

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Fri, Oct 11, 2019 at 04:55:59PM +0200, Jann Horn wrote:
> On Fri, Oct 11, 2019 at 2:23 PM Christian Kellner <[email protected]> wrote:
> > The fdinfo file for a process file descriptor already contains the
> > pid of the process in the callers namespaces. Additionally, if pid
> > namespaces are configured, show the process ids of the process in
> > all nested namespaces in the same format as in the procfs status
> > file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> > of the processes in nested namespaces.
> [...]
> > #ifdef CONFIG_PROC_FS
> > +static inline void print_pidfd_nspid(struct seq_file *m, struct pid *pid,
> > + struct pid_namespace *ns)
>
> `ns` is the namespace of the PID namespace of the procfs instance
> through which the file descriptor is being viewed.
>
> > +{
> > +#ifdef CONFIG_PID_NS
> > + int i;
> > +
> > + seq_puts(m, "\nNSpid:");
> > + for (i = ns->level; i <= pid->level; i++) {
>
> ns->level is the level of the PID namespace associated with the procfs
> instance through which the file descriptor is being viewed. pid->level
> is the level of the PID associated with the pidfd.
>
> > + ns = pid->numbers[i].ns;
> > + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> > + }
> > +#endif
> > +}
>
> I think you assumed that `ns` is always going to contain `pid`.
> However, that's not the case. Consider the following scenario:
>
> - the init_pid_ns has two child PID namespaces, A and B (each with
> its own mount namespace and procfs instance)
> - process P1 lives in A
> - process P2 lives in B
> - P1 opens a pidfd for itself
> - P1 passes the pidfd to P2 (e.g. via a unix domain socket)
> - P2 reads /proc/self/fdinfo/$pidfd
>
> Now the loop will print the ID of P1 in A. I don't think that's what
> you intended? You might want to bail out if "pid_nr_ns(pid, ns) == 0",
> or something like that.

I assumed the same thing happens when you pass around an fd for
/proc/self/status and that's why I didn't object to this behavior.

Christian

2019-10-11 15:31:48

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Fri, Oct 11, 2019 at 5:17 PM Christian Brauner
<[email protected]> wrote:
>
> On Fri, Oct 11, 2019 at 04:55:59PM +0200, Jann Horn wrote:
> > On Fri, Oct 11, 2019 at 2:23 PM Christian Kellner <[email protected]> wrote:
> > > The fdinfo file for a process file descriptor already contains the
> > > pid of the process in the callers namespaces. Additionally, if pid
> > > namespaces are configured, show the process ids of the process in
> > > all nested namespaces in the same format as in the procfs status
> > > file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> > > of the processes in nested namespaces.
> > [...]
> > > #ifdef CONFIG_PROC_FS
> > > +static inline void print_pidfd_nspid(struct seq_file *m, struct pid *pid,
> > > + struct pid_namespace *ns)
> >
> > `ns` is the namespace of the PID namespace of the procfs instance
> > through which the file descriptor is being viewed.
> >
> > > +{
> > > +#ifdef CONFIG_PID_NS
> > > + int i;
> > > +
> > > + seq_puts(m, "\nNSpid:");
> > > + for (i = ns->level; i <= pid->level; i++) {
> >
> > ns->level is the level of the PID namespace associated with the procfs
> > instance through which the file descriptor is being viewed. pid->level
> > is the level of the PID associated with the pidfd.
> >
> > > + ns = pid->numbers[i].ns;
> > > + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> > > + }
> > > +#endif
> > > +}
> >
> > I think you assumed that `ns` is always going to contain `pid`.
> > However, that's not the case. Consider the following scenario:
> >
> > - the init_pid_ns has two child PID namespaces, A and B (each with
> > its own mount namespace and procfs instance)
> > - process P1 lives in A
> > - process P2 lives in B
> > - P1 opens a pidfd for itself
> > - P1 passes the pidfd to P2 (e.g. via a unix domain socket)
> > - P2 reads /proc/self/fdinfo/$pidfd
> >
> > Now the loop will print the ID of P1 in A. I don't think that's what
> > you intended? You might want to bail out if "pid_nr_ns(pid, ns) == 0",
> > or something like that.
>
> I assumed the same thing happens when you pass around an fd for
> /proc/self/status and that's why I didn't object to this behavior.

I don't see how /proc/$pid/status is relevant. In the
/proc/$pid/status case, the output is the list of PIDs starting at the
PID namespace the procfs is associated with; and the process is always
contained in that namespace, which also means that the first PID
listed is the one in the PID namespace of the procfs instance. In the
pidfd case, the process is not necessarily contained in that
namespace, and the output doesn't make sense.

2019-10-11 16:59:46

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Fri, Oct 11, 2019 at 05:30:03PM +0200, Jann Horn wrote:
> On Fri, Oct 11, 2019 at 5:17 PM Christian Brauner
> <[email protected]> wrote:
> >
> > On Fri, Oct 11, 2019 at 04:55:59PM +0200, Jann Horn wrote:
> > > On Fri, Oct 11, 2019 at 2:23 PM Christian Kellner <[email protected]> wrote:
> > > > The fdinfo file for a process file descriptor already contains the
> > > > pid of the process in the callers namespaces. Additionally, if pid
> > > > namespaces are configured, show the process ids of the process in
> > > > all nested namespaces in the same format as in the procfs status
> > > > file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> > > > of the processes in nested namespaces.
> > > [...]
> > > > #ifdef CONFIG_PROC_FS
> > > > +static inline void print_pidfd_nspid(struct seq_file *m, struct pid *pid,
> > > > + struct pid_namespace *ns)
> > >
> > > `ns` is the namespace of the PID namespace of the procfs instance
> > > through which the file descriptor is being viewed.
> > >
> > > > +{
> > > > +#ifdef CONFIG_PID_NS
> > > > + int i;
> > > > +
> > > > + seq_puts(m, "\nNSpid:");
> > > > + for (i = ns->level; i <= pid->level; i++) {
> > >
> > > ns->level is the level of the PID namespace associated with the procfs
> > > instance through which the file descriptor is being viewed. pid->level
> > > is the level of the PID associated with the pidfd.
> > >
> > > > + ns = pid->numbers[i].ns;
> > > > + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> > > > + }
> > > > +#endif
> > > > +}
> > >
> > > I think you assumed that `ns` is always going to contain `pid`.
> > > However, that's not the case. Consider the following scenario:
> > >
> > > - the init_pid_ns has two child PID namespaces, A and B (each with
> > > its own mount namespace and procfs instance)
> > > - process P1 lives in A
> > > - process P2 lives in B
> > > - P1 opens a pidfd for itself
> > > - P1 passes the pidfd to P2 (e.g. via a unix domain socket)
> > > - P2 reads /proc/self/fdinfo/$pidfd
> > >
> > > Now the loop will print the ID of P1 in A. I don't think that's what
> > > you intended? You might want to bail out if "pid_nr_ns(pid, ns) == 0",
> > > or something like that.
> >
> > I assumed the same thing happens when you pass around an fd for
> > /proc/self/status and that's why I didn't object to this behavior.
>
> I don't see how /proc/$pid/status is relevant. In the
> /proc/$pid/status case, the output is the list of PIDs starting at the
> PID namespace the procfs is associated with; and the process is always
> contained in that namespace, which also means that the first PID
> listed is the one in the PID namespace of the procfs instance. In the
> pidfd case, the process is not necessarily contained in that
> namespace, and the output doesn't make sense.

I might be misreading what you're saying.
(Maybe I'm doing something obviously wrong.)
If I compile the following two programs:
b2: https://paste.ubuntu.com/p/xthMsCXy3s/
c2: https://paste.ubuntu.com/p/y5HSzyMQJr/

Then in shell1
sudo unshare --mount --pid --fork --mount-proc

and in shell2
sudo unshare --mount --pid --fork --mount-proc

and run b2 in shell1 and c2 in shell2 which sends around an fd for
/proc/b2/status to c2. Now c2 reads b2's status file via the fd it
received. The c2 will see the pid of b2 in b2's pid namespace even
though the process is not contained in the pid namespace of c2.

Christian

2019-10-11 17:09:35

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] pidfd: add tests for NSpid info in fdinfo

On Fri, Oct 11, 2019 at 05:09:29PM +0200, Jann Horn wrote:
> On Wed, Oct 9, 2019 at 6:10 PM Christian Kellner <[email protected]> wrote:
> > Add tests that check that if pid namespaces are configured the fdinfo
> > file of a pidfd contains an NSpid: entry containing the process id
> > in the current and additionally all nested namespaces.
> [...]
> > +static int compare_fdinfo_nspid(int pidfd, char *expect, size_t len)
> > +{
> > + char path[512];
> > + FILE *f;
> > + size_t n = 0;
> > + ssize_t k;
> > + char *line = NULL;
> > + int r = -1;
> > +
> > + snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
>
> (Maybe at some point the selftests code should add some more concise
> alternative to snprintf() calls on separate lines. A macro or
> something like that so that you can write stuff like `f =
> fopen(tprintf("/proc/self/fdinfo/%d", pidfd), "re")`.)
>
> > + f = fopen(path, "re");
> > + if (!f)
> > + return -1;
> > +
> > + while ((k = getline(&line, &n, f)) != -1) {
> > + if (strncmp(line, "NSpid:", 6))
> > + continue;
> > +
> > + line[k - 1] = '\0';
> > + ksft_print_msg("Child: fdinfo NSpid line: '%s'.\n", line);
> > + r = strncmp(line + 6, expect, len);
>
> Wouldn't it be better to get rid of the nullbyte assignment and change
> the strncmp() into a strcmp() here...
>
> [...]
> > + /* The child will have pid 1 in the new pid namespace,
> > + * so the line must be 'NSPid:\t<pid>\t1'
> > + */
> > + n = snprintf(expect, sizeof(expect), "\t%d\t%d", pid, 1);
>
> ... and add a "\n" to the format string? It's shorter and doesn't
> silently ignore it if the line doesn't end at that point.

Also, what Christian just told me and what I wanted to suggest is that
we add tests for sending around pidfds and reading fdinfo too.

2019-10-11 18:22:00

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] pidfd: show pids for nested pid namespaces in fdinfo

On Fri, Oct 11, 2019 at 6:58 PM Christian Brauner
<[email protected]> wrote:
> On Fri, Oct 11, 2019 at 05:30:03PM +0200, Jann Horn wrote:
> > On Fri, Oct 11, 2019 at 5:17 PM Christian Brauner
> > <[email protected]> wrote:
> > >
> > > On Fri, Oct 11, 2019 at 04:55:59PM +0200, Jann Horn wrote:
> > > > On Fri, Oct 11, 2019 at 2:23 PM Christian Kellner <[email protected]> wrote:
> > > > > The fdinfo file for a process file descriptor already contains the
> > > > > pid of the process in the callers namespaces. Additionally, if pid
> > > > > namespaces are configured, show the process ids of the process in
> > > > > all nested namespaces in the same format as in the procfs status
> > > > > file, i.e. "NSPid:\t%d\%d...". This allows the easy identification
> > > > > of the processes in nested namespaces.
> > > > [...]
> > > > > #ifdef CONFIG_PROC_FS
> > > > > +static inline void print_pidfd_nspid(struct seq_file *m, struct pid *pid,
> > > > > + struct pid_namespace *ns)
> > > >
> > > > `ns` is the namespace of the PID namespace of the procfs instance
> > > > through which the file descriptor is being viewed.
> > > >
> > > > > +{
> > > > > +#ifdef CONFIG_PID_NS
> > > > > + int i;
> > > > > +
> > > > > + seq_puts(m, "\nNSpid:");
> > > > > + for (i = ns->level; i <= pid->level; i++) {
> > > >
> > > > ns->level is the level of the PID namespace associated with the procfs
> > > > instance through which the file descriptor is being viewed. pid->level
> > > > is the level of the PID associated with the pidfd.
> > > >
> > > > > + ns = pid->numbers[i].ns;
> > > > > + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> > > > > + }
> > > > > +#endif
> > > > > +}
> > > >
> > > > I think you assumed that `ns` is always going to contain `pid`.
> > > > However, that's not the case. Consider the following scenario:
> > > >
> > > > - the init_pid_ns has two child PID namespaces, A and B (each with
> > > > its own mount namespace and procfs instance)
> > > > - process P1 lives in A
> > > > - process P2 lives in B
> > > > - P1 opens a pidfd for itself
> > > > - P1 passes the pidfd to P2 (e.g. via a unix domain socket)
> > > > - P2 reads /proc/self/fdinfo/$pidfd
> > > >
> > > > Now the loop will print the ID of P1 in A. I don't think that's what
> > > > you intended? You might want to bail out if "pid_nr_ns(pid, ns) == 0",
> > > > or something like that.
> > >
> > > I assumed the same thing happens when you pass around an fd for
> > > /proc/self/status and that's why I didn't object to this behavior.
> >
> > I don't see how /proc/$pid/status is relevant. In the
> > /proc/$pid/status case, the output is the list of PIDs starting at the
> > PID namespace the procfs is associated with; and the process is always
> > contained in that namespace, which also means that the first PID
> > listed is the one in the PID namespace of the procfs instance. In the
> > pidfd case, the process is not necessarily contained in that
> > namespace, and the output doesn't make sense.
>
> I might be misreading what you're saying.
> (Maybe I'm doing something obviously wrong.)
> If I compile the following two programs:
> b2: https://paste.ubuntu.com/p/xthMsCXy3s/
> c2: https://paste.ubuntu.com/p/y5HSzyMQJr/
>
> Then in shell1
> sudo unshare --mount --pid --fork --mount-proc
>
> and in shell2
> sudo unshare --mount --pid --fork --mount-proc
>
> and run b2 in shell1 and c2 in shell2 which sends around an fd for
> /proc/b2/status to c2. Now c2 reads b2's status file via the fd it
> received. The c2 will see the pid of b2 in b2's pid namespace even
> though the process is not contained in the pid namespace of c2.

Because the reader doesn't matter; the perspective you have on the
system is defined by which pidns the procfs instance you're looking
through is associated with, and here you're looking through shell1's
procfs. It's normal that when you look through another procfs, you see
PIDs differently.

2019-10-12 10:22:04

by Christian Brauner

[permalink] [raw]
Subject: [PATCH] pidfd: add NSpid entries to fdinfo

Currently, the fdinfo file of contains the field Pid:
It contains the pid a given pidfd refers to in the pid namespace of the
opener's procfs instance.
If the pid namespace of the process is not a descendant of the pid
namespace of the procfs instance 0 will be shown as its pid. This is
similar to calling getppid() on a process who's parent is out of it's
pid namespace (e.g. when moving a process into a sibling pid namespace
via setns()).

Add an NSpid field for easy retrieval of the pid in all descendant pid
namespaces:
If pid namespaces are supported this field will contain the pid a given
pidfd refers to for all descendant pid namespaces starting from the
current pid namespace of the opener's procfs instance, i.e. the first
pid entry for Pid and NSpid will be identical.
If the pid namespace of the process is not a descendant of the pid
namespace of the procfs instance 0 will be shown as its first NSpid and
no other NSpid entries will be shown.
Note that this differs from the Pid and NSpid fields in
/proc/<pid>/status where Pid and NSpid are always shown relative to the
pid namespace of the opener's procfs instace. The difference becomes
obvious when sending around a pidfd between pid namespaces from
different trees, i.e. where no ancestoral relation is present between
the pid namespaces:
1. sending around pidfd:
- create two new pid namespaces ns1 and ns2 in the initial pid namespace
(Also take care to create new mount namespaces in the new pid
namespace and mount procfs.)
- create a process with a pidfd in ns1
- send pidfd from ns1 to ns2
- read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
are 0
- create a process with a pidfd in
- open a pidfd for a process in the initial pid namespace
2. sending around /proc/<pid>/status fd:
- create two new pid namespaces ns1 and ns2 in the initial pid namespace
(Also take care to create new mount namespaces in the new pid
namespace and mount procfs.)
- create a process in ns1
- open /proc/<pid>/status in the initial pid namespace for the process
you created in ns1
- send statusfd from initial pid namespace to ns2
- read statusfd and observe:
- that Pid will contain the pid of the process as seen from the init
- that NSpid will contain the pids of the process for all descendant
pid namespaces starting from the initial pid namespace

Cc: Jann Horn <[email protected]>
Cc: [email protected]
Co-Developed-by: Christian Kellner <[email protected]>
Signed-off-by: Christian Kellner <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
kernel/fork.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 1f6c45f6a734..b155bad92d9c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1695,12 +1695,83 @@ static int pidfd_release(struct inode *inode, struct file *file)
}

#ifdef CONFIG_PROC_FS
+/**
+ * pidfd_show_fdinfo - print information about a pidfd
+ * @m: proc fdinfo file
+ * @f: file referencing a pidfd
+ *
+ * Pid:
+ * This function will print the pid a given pidfd refers to in the pid
+ * namespace of the opener's procfs instance.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance 0 will be shown as its pid. This is
+ * similar to calling getppid() on a process who's parent is out of it's
+ * pid namespace (e.g. when moving a process into a sibling pid namespace
+ * via setns()).
+ *
+ * NSpid:
+ * If pid namespaces are supported then this function will also print the
+ * pid a given pidfd refers to for all descendant pid namespaces starting
+ * from the current pid namespace of the opener's procfs instance, i.e. the
+ * first pid entry for Pid and NSpid will be identical.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance 0 will be shown as its first NSpid and
+ * no other NSpid entries will be shown.
+ * Note that this differs from the Pid and NSpid fields in
+ * /proc/<pid>/status where Pid and NSpid are always shown relative to the
+ * pid namespace of the opener's procfs instace. The difference becomes
+ * obvious when sending around a pidfd between pid namespaces from
+ * different trees, i.e. where no ancestoral relation is present between
+ * the pid namespaces:
+ * 1. sending around pidfd:
+ * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
+ * (Also take care to create new mount namespaces in the new pid
+ * namespace and mount procfs.)
+ * - create a process with a pidfd in ns1
+ * - send pidfd from ns1 to ns2
+ * - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
+ * are 0
+ * - create a process with a pidfd in
+ * - open a pidfd for a process in the initial pid namespace
+ * 2. sending around /proc/<pid>/status fd:
+ * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
+ * (Also take care to create new mount namespaces in the new pid
+ * namespace and mount procfs.)
+ * - create a process in ns1
+ * - open /proc/<pid>/status in the initial pid namespace for the process
+ * you created in ns1
+ * - send statusfd from initial pid namespace to ns2
+ * - read statusfd and observe:
+ * - that Pid will contain the pid of the process as seen from the init
+ * - that NSpid will contain the pids of the process for all descendant
+ * pid namespaces starting from the initial pid namespace
+ */
static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
{
struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
struct pid *pid = f->private_data;
+ pid_t nr = pid_nr_ns(pid, ns);
+
+ seq_put_decimal_ull(m, "Pid:\t", nr);
+
+#ifdef CONFIG_PID_NS
+ seq_puts(m, "\nNSpid:");
+ if (nr == 0) {
+ /*
+ * If nr is zero the pid namespace of the procfs and the
+ * pid namespace of the pidfd are neither the same pid
+ * namespace nor are they ancestors. Since NSpid and Pid
+ * are always identical in their first entry shortcut it
+ * and simply print 0.
+ */
+ seq_put_decimal_ull(m, "\t", nr);
+ } else {
+ int i;
+ for (i = ns->level; i <= pid->level; i++)
+ seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, pid->numbers[i].ns));
+ }
+#endif

- seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
seq_putc(m, '\n');
}
#endif
--
2.23.0

2019-10-12 10:26:03

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On Sat, Oct 12, 2019 at 12:19:22PM +0200, Christian Brauner wrote:
> Currently, the fdinfo file of contains the field Pid:
> It contains the pid a given pidfd refers to in the pid namespace of the
> opener's procfs instance.
> If the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance 0 will be shown as its pid. This is
> similar to calling getppid() on a process who's parent is out of it's
> pid namespace (e.g. when moving a process into a sibling pid namespace
> via setns()).
>
> Add an NSpid field for easy retrieval of the pid in all descendant pid
> namespaces:
> If pid namespaces are supported this field will contain the pid a given
> pidfd refers to for all descendant pid namespaces starting from the
> current pid namespace of the opener's procfs instance, i.e. the first
> pid entry for Pid and NSpid will be identical.
> If the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance 0 will be shown as its first NSpid and
> no other NSpid entries will be shown.
> Note that this differs from the Pid and NSpid fields in
> /proc/<pid>/status where Pid and NSpid are always shown relative to the
> pid namespace of the opener's procfs instace. The difference becomes
> obvious when sending around a pidfd between pid namespaces from
> different trees, i.e. where no ancestoral relation is present between
> the pid namespaces:
> 1. sending around pidfd:
> - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> (Also take care to create new mount namespaces in the new pid
> namespace and mount procfs.)
> - create a process with a pidfd in ns1
> - send pidfd from ns1 to ns2
> - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
> are 0
> - create a process with a pidfd in
> - open a pidfd for a process in the initial pid namespace
> 2. sending around /proc/<pid>/status fd:
> - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> (Also take care to create new mount namespaces in the new pid
> namespace and mount procfs.)
> - create a process in ns1
> - open /proc/<pid>/status in the initial pid namespace for the process
> you created in ns1
> - send statusfd from initial pid namespace to ns2
> - read statusfd and observe:
> - that Pid will contain the pid of the process as seen from the init
> - that NSpid will contain the pids of the process for all descendant
> pid namespaces starting from the initial pid namespace
>
> Cc: Jann Horn <[email protected]>
> Cc: [email protected]
> Co-Developed-by: Christian Kellner <[email protected]>
> Signed-off-by: Christian Kellner <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>

I think this might be more what we want.
I tried to think of cases where the first entry of Pid is not identical
to the first entry of NSpid but I came up with none. Maybe you do, Jann?

Christian, this is just a quick stab I took. Feel free to pick this up
as a template.

Thanks!
Christian

> ---
> kernel/fork.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 72 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 1f6c45f6a734..b155bad92d9c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1695,12 +1695,83 @@ static int pidfd_release(struct inode *inode, struct file *file)
> }
>
> #ifdef CONFIG_PROC_FS
> +/**
> + * pidfd_show_fdinfo - print information about a pidfd
> + * @m: proc fdinfo file
> + * @f: file referencing a pidfd
> + *
> + * Pid:
> + * This function will print the pid a given pidfd refers to in the pid
> + * namespace of the opener's procfs instance.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance 0 will be shown as its pid. This is
> + * similar to calling getppid() on a process who's parent is out of it's
> + * pid namespace (e.g. when moving a process into a sibling pid namespace
> + * via setns()).
> + *
> + * NSpid:
> + * If pid namespaces are supported then this function will also print the
> + * pid a given pidfd refers to for all descendant pid namespaces starting
> + * from the current pid namespace of the opener's procfs instance, i.e. the
> + * first pid entry for Pid and NSpid will be identical.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance 0 will be shown as its first NSpid and
> + * no other NSpid entries will be shown.
> + * Note that this differs from the Pid and NSpid fields in
> + * /proc/<pid>/status where Pid and NSpid are always shown relative to the
> + * pid namespace of the opener's procfs instace. The difference becomes
> + * obvious when sending around a pidfd between pid namespaces from
> + * different trees, i.e. where no ancestoral relation is present between
> + * the pid namespaces:
> + * 1. sending around pidfd:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> + * (Also take care to create new mount namespaces in the new pid
> + * namespace and mount procfs.)
> + * - create a process with a pidfd in ns1
> + * - send pidfd from ns1 to ns2
> + * - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
> + * are 0
> + * - create a process with a pidfd in
> + * - open a pidfd for a process in the initial pid namespace
> + * 2. sending around /proc/<pid>/status fd:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> + * (Also take care to create new mount namespaces in the new pid
> + * namespace and mount procfs.)
> + * - create a process in ns1
> + * - open /proc/<pid>/status in the initial pid namespace for the process
> + * you created in ns1
> + * - send statusfd from initial pid namespace to ns2
> + * - read statusfd and observe:
> + * - that Pid will contain the pid of the process as seen from the init
> + * - that NSpid will contain the pids of the process for all descendant
> + * pid namespaces starting from the initial pid namespace
> + */
> static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
> {
> struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
> struct pid *pid = f->private_data;
> + pid_t nr = pid_nr_ns(pid, ns);
> +
> + seq_put_decimal_ull(m, "Pid:\t", nr);
> +
> +#ifdef CONFIG_PID_NS
> + seq_puts(m, "\nNSpid:");
> + if (nr == 0) {
> + /*
> + * If nr is zero the pid namespace of the procfs and the
> + * pid namespace of the pidfd are neither the same pid
> + * namespace nor are they ancestors. Since NSpid and Pid
> + * are always identical in their first entry shortcut it
> + * and simply print 0.
> + */
> + seq_put_decimal_ull(m, "\t", nr);
> + } else {
> + int i;
> + for (i = ns->level; i <= pid->level; i++)
> + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, pid->numbers[i].ns));
> + }
> +#endif
>
> - seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
> seq_putc(m, '\n');
> }
> #endif
> --
> 2.23.0
>

2019-10-14 09:51:31

by Christian Kellner

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On Sat, 2019-10-12 at 12:21 +0200, Christian Brauner wrote:
> I think this might be more what we want.
Yep, indeed.

> I tried to think of cases where the first entry of Pid is not
> identical
> to the first entry of NSpid but I came up with none. Maybe you do,
> Jann?
Yeah, I don't think that can be the case. By looking at the source of
'pid_nr_ns(pid, ns)' a non-zero return means that a) 'pid' valid, ie.
non-null and b) 'ns' is in the pid namespace hierarchy of 'pid' (at
pid->level, i.e. "pid->numbers[ns->level].ns == ns").

> Christian, this is just a quick stab I took. Feel free to pick this
> up as a template.
Thanks! I slightly re-worked it, with the reasoning above in mind, to
get rid of one of the branches:

+#ifdef CONFIG_PID_NS
+ seq_put_decimal_ull(m, "\nNSpid:\t", nr);
+ if (nr) {
+ int i;
+
+ /* If nr is non-zero it means that 'pid' is valid and that
+ * ns, i.e. the pid namespace associated with the procfs
+ * instance, is in the pid namespace hierarchy of pid.
+ * Start at one level below and print all descending pids.
+ */
+ for (i = ns->level + 1; i <= pid->level; i++) {
+ ns = pid->numbers[i].ns;
+ seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
+ }
+ }
+#endif

But I now just realized that with the very same reasoning, if nr is
non-zero, we don't need to redo all the checks and can just do:

for (i = ns->level + 1; i <= pid->level; i++)
seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);

If this sounds good to you I resend the patches with the change above.

Thanks,
Christian

2019-10-14 10:33:20

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On Mon, Oct 14, 2019 at 11:43:01AM +0200, Christian Kellner wrote:
> On Sat, 2019-10-12 at 12:21 +0200, Christian Brauner wrote:
> > I think this might be more what we want.
> Yep, indeed.
>
> > I tried to think of cases where the first entry of Pid is not
> > identical
> > to the first entry of NSpid but I came up with none. Maybe you do,
> > Jann?
> Yeah, I don't think that can be the case. By looking at the source of
> 'pid_nr_ns(pid, ns)' a non-zero return means that a) 'pid' valid, ie.
> non-null and b) 'ns' is in the pid namespace hierarchy of 'pid' (at
> pid->level, i.e. "pid->numbers[ns->level].ns == ns").
>
> > Christian, this is just a quick stab I took. Feel free to pick this
> > up as a template.
> Thanks! I slightly re-worked it, with the reasoning above in mind, to
> get rid of one of the branches:

Thanks!

>
> +#ifdef CONFIG_PID_NS
> + seq_put_decimal_ull(m, "\nNSpid:\t", nr);
> + if (nr) {
> + int i;
> +
> + /* If nr is non-zero it means that 'pid' is valid and that
> + * ns, i.e. the pid namespace associated with the procfs
> + * instance, is in the pid namespace hierarchy of pid.
> + * Start at one level below and print all descending pids.
> + */
> + for (i = ns->level + 1; i <= pid->level; i++) {
> + ns = pid->numbers[i].ns;

I'm not a fan of overriding the "ns" pointer. It's not a huge deal but
it's rather subtle.

> + seq_put_decimal_ull(m, "\t", pid_nr_ns(pid, ns));
> + }
> + }
> +#endif
>
> But I now just realized that with the very same reasoning, if nr is
> non-zero, we don't need to redo all the checks and can just do:
>
> for (i = ns->level + 1; i <= pid->level; i++)
> seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
>
> If this sounds good to you I resend the patches with the change above.

You could probably do:

#ifdef CONFIG_PID_NS
seq_put_decimal_ull(m, "\nNSpid:\t", nr);
for (i = ns->level + 1; i <= pid->level && nr; i++)
seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
#endif

Christian

2019-10-14 15:10:58

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On Sat, Oct 12, 2019 at 12:19 PM Christian Brauner
<[email protected]> wrote:
> Currently, the fdinfo file of contains the field Pid:

nit: something missing after "of"?

> It contains the pid a given pidfd refers to in the pid namespace of the
> opener's procfs instance.

s/opener's // here and in various places below? "the opener's" is
kinda misleading since it sounds as if it has something to do with
task identity.

> If the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance 0 will be shown as its pid. This is
> similar to calling getppid() on a process who's parent is out of it's

nit: s/who's/whose/

nit: s/it's/its/

> pid namespace (e.g. when moving a process into a sibling pid namespace
> via setns()).

You can't move a process into a PID namespace via setns(), you can
only set the PID namespace for its children; and even then, you can't
set that to a sibling PID namespace, you can only move into descendant
PID namespaces.

> Add an NSpid field for easy retrieval of the pid in all descendant pid
> namespaces:
> If pid namespaces are supported this field will contain the pid a given
> pidfd refers to for all descendant pid namespaces starting from the
> current pid namespace of the opener's procfs instance, i.e. the first

s/current // - neither tasks nor procfs instances can change which pid
namespace they're associated with

> pid entry for Pid and NSpid will be identical.

*the Pid field and the first entry in the NSpid field?

> If the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance 0 will be shown as its first NSpid and
> no other NSpid entries will be shown.
> Note that this differs from the Pid and NSpid fields in
> /proc/<pid>/status where Pid and NSpid are always shown relative to the
> pid namespace of the opener's procfs instace. The difference becomes
> obvious when sending around a pidfd between pid namespaces from
> different trees, i.e. where no ancestoral relation is present between
> the pid namespaces:
> 1. sending around pidfd:
> - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> (Also take care to create new mount namespaces in the new pid
> namespace and mount procfs.)
> - create a process with a pidfd in ns1
> - send pidfd from ns1 to ns2
> - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
> are 0
> - create a process with a pidfd in

truncated phrase?

> - open a pidfd for a process in the initial pid namespace
> 2. sending around /proc/<pid>/status fd:
> - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> (Also take care to create new mount namespaces in the new pid
> namespace and mount procfs.)
> - create a process in ns1
> - open /proc/<pid>/status in the initial pid namespace for the process
> you created in ns1
> - send statusfd from initial pid namespace to ns2
> - read statusfd and observe:
> - that Pid will contain the pid of the process as seen from the init
> - that NSpid will contain the pids of the process for all descendant
> pid namespaces starting from the initial pid namespace

You don't need fd passing for case 2, you can just look at the procfs
instance that's mounted in the initial mount namespace. Using fd
passing in this example kinda obfuscates what's going on, in my
opinion.


> +/**
> + * pidfd_show_fdinfo - print information about a pidfd
> + * @m: proc fdinfo file
> + * @f: file referencing a pidfd
> + *
> + * Pid:
> + * This function will print the pid a given pidfd refers to in the pid
> + * namespace of the opener's procfs instance.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance 0 will be shown as its pid. This is
> + * similar to calling getppid() on a process who's parent is out of it's
> + * pid namespace (e.g. when moving a process into a sibling pid namespace
> + * via setns()).
> + * NSpid:
> + * If pid namespaces are supported then this function will also print the
> + * pid a given pidfd refers to for all descendant pid namespaces starting
> + * from the current pid namespace of the opener's procfs instance, i.e. the
> + * first pid entry for Pid and NSpid will be identical.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance 0 will be shown as its first NSpid and
> + * no other NSpid entries will be shown.
> + * Note that this differs from the Pid and NSpid fields in
> + * /proc/<pid>/status where Pid and NSpid are always shown relative to the
> + * pid namespace of the opener's procfs instace. The difference becomes
> + * obvious when sending around a pidfd between pid namespaces from
> + * different trees, i.e. where no ancestoral relation is present between
> + * the pid namespaces:
> + * 1. sending around pidfd:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> + * (Also take care to create new mount namespaces in the new pid
> + * namespace and mount procfs.)
> + * - create a process with a pidfd in ns1
> + * - send pidfd from ns1 to ns2
> + * - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
> + * are 0
> + * - create a process with a pidfd in
> + * - open a pidfd for a process in the initial pid namespace
> + * 2. sending around /proc/<pid>/status fd:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace
> + * (Also take care to create new mount namespaces in the new pid
> + * namespace and mount procfs.)
> + * - create a process in ns1
> + * - open /proc/<pid>/status in the initial pid namespace for the process
> + * you created in ns1
> + * - send statusfd from initial pid namespace to ns2
> + * - read statusfd and observe:
> + * - that Pid will contain the pid of the process as seen from the init
> + * - that NSpid will contain the pids of the process for all descendant
> + * pid namespaces starting from the initial pid namespace
> + */

See above, same problems as in the commit message. Actually, since you
have a big comment block with this text, there's no reason to repeat
it in the commit message, right?

(By the way, only slightly related to this patch: At the moment, if
you have a pidfd to a task that has already been reaped, and the PID
has been reallocated to another task, the pidfd will still show up in
/proc/*/fdinfo/* as if it referred to a non-dead process, right? Might
be worth considering whether you want to probe ->tasks[PIDTYPE_PID] or
->tasks[PIDTYPE_TGID] for NULL-ness when printing fdinfo, or something
like that.)

2019-10-14 15:12:40

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On Mon, Oct 14, 2019 at 12:32 PM Christian Brauner
<[email protected]> wrote:
> On Mon, Oct 14, 2019 at 11:43:01AM +0200, Christian Kellner wrote:
> > On Sat, 2019-10-12 at 12:21 +0200, Christian Brauner wrote:
> > > I tried to think of cases where the first entry of Pid is not
> > > identical
> > > to the first entry of NSpid but I came up with none. Maybe you do,
> > > Jann?
> > Yeah, I don't think that can be the case. By looking at the source of
> > 'pid_nr_ns(pid, ns)' a non-zero return means that a) 'pid' valid, ie.
> > non-null and b) 'ns' is in the pid namespace hierarchy of 'pid' (at
> > pid->level, i.e. "pid->numbers[ns->level].ns == ns").

Agreed.

> You could probably do:
>
> #ifdef CONFIG_PID_NS
> seq_put_decimal_ull(m, "\nNSpid:\t", nr);
> for (i = ns->level + 1; i <= pid->level && nr; i++)
> seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
> #endif

Personally, I dislike hiding the precondition for running the loop in
the loop statement like that. While it makes the code more concise, it
somewhat obfuscates the high-level logic at a first glance.

2019-10-14 15:22:32

by Christian Kellner

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On Mon, 2019-10-14 at 17:10 +0200, Jann Horn wrote:
> On Mon, Oct 14, 2019 at 12:32 PM Christian Brauner wrote:
> > On Mon, Oct 14, 2019 at 11:43:01AM +0200, Christian Kellner wrote:
> > You could probably do:
> >
> > #ifdef CONFIG_PID_NS
> > seq_put_decimal_ull(m, "\nNSpid:\t", nr);
> > for (i = ns->level + 1; i <= pid->level && nr; i++)
> > seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
> > #endif
>
> Personally, I dislike hiding the precondition for running the loop in
> the loop statement like that. While it makes the code more concise,
> it somewhat obfuscates the high-level logic at a first glance.
I agree and it has the side-effect of needing another #ifdef at the end
of the variable block for "i". I think I will go with:

if (nr) {
int i;

/* If nr is non-zero it means that 'pid' is valid and that
* ns, i.e. the pid namespace associated with the procfs
* instance, is in the pid namespace hierarchy of pid.
* Start at one below the already printed level.
*/
for (i = ns->level + 1; i <= pid->level; i++)
seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
}

I will re-work the comment block and then send a new version of
the patch.

Thanks,
Christian

2019-10-14 16:54:39

by Christian Kellner

[permalink] [raw]
Subject: [PATCH v4 1/2] pidfd: add NSpid entries to fdinfo

From: Christian Kellner <[email protected]>

Currently, the fdinfo file contains the Pid field which shows the
pid a given pidfd refers to in the pid namespace of the procfs
instance. If pid namespaces are configured, also show an NSpid field
for easy retrieval of the pid in all descendant pid namespaces. If
the pid namespace of the process is not a descendant of the pid
namespace of the procfs instance 0 will be shown as its first NSpid
entry and no other entries will be shown. Add a block comment to
pidfd_show_fdinfo with a detailed explanation of Pid and NSpid fields.

Co-developed-by: Christian Brauner <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Christian Kellner <[email protected]>
---
Changes in v4:
- Reworked to properly handle the case where the pidfd is from a
different branch in the pid namespace hierarchy; also add block
comment with an in-depth explanation (Christian Brauner)

kernel/fork.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index bcdf53125210..782986962d47 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1695,12 +1695,63 @@ static int pidfd_release(struct inode *inode, struct file *file)
}

#ifdef CONFIG_PROC_FS
+/**
+ * pidfd_show_fdinfo - print information about a pidfd
+ * @m: proc fdinfo file
+ * @f: file referencing a pidfd
+ *
+ * Pid:
+ * This function will print the pid that a given pidfd refers to in the
+ * pid namespace of the procfs instance.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance 0 will be shown as its pid. This is
+ * similar to calling getppid() on a process whose parent is outside of
+ * its pid namespace.
+ *
+ * NSpid:
+ * If pid namespaces are supported then this function will also print
+ * the pid of a given pidfd refers to for all descendant pid namespaces
+ * starting from the current pid namespace of the instance, i.e. the
+ * Pid field and the first entry in the NSpid field will be identical.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance 0 will be shown as its first NSpid
+ * entry and no others will be shown.
+ * Note that this differs from the Pid and NSpid fields in
+ * /proc/<pid>/status where Pid and NSpid are always shown relative to
+ * the pid namespace of the procfs instance. The difference becomes
+ * obvious when sending around a pidfd between pid namespaces from a
+ * different branch of the tree, i.e. where no ancestoral relation is
+ * present between the pid namespaces:
+ * - create two new pid namespaces ns1 and ns2 in the initial pid
+ * namespace (also take care to create new mount namespaces in the
+ * new pid namespace and mount procfs)
+ * - create a process with a pidfd in ns1
+ * - send pidfd from ns1 to ns2
+ * - read /proc/self/fdinfo/<pidfd> and observe that both Pid and NSpid
+ * have exactly one entry, which is 0
+ */
static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
{
struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
struct pid *pid = f->private_data;
+ pid_t nr = pid_nr_ns(pid, ns);
+
+ seq_put_decimal_ull(m, "Pid:\t", nr);

- seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
+#ifdef CONFIG_PID_NS
+ seq_put_decimal_ull(m, "\nNSpid:\t", nr);
+ if (nr) {
+ int i;
+
+ /* If nr is non-zero it means that 'pid' is valid and that
+ * ns, i.e. the pid namespace associated with the procfs
+ * instance, is in the pid namespace hierarchy of pid.
+ * Start at one below the already printed level.
+ */
+ for (i = ns->level + 1; i <= pid->level; i++)
+ seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
+ }
+#endif
seq_putc(m, '\n');
}
#endif
--
2.21.0

2019-10-14 17:17:55

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo

On October 14, 2019 5:09:58 PM GMT+02:00, Jann Horn <[email protected]> wrote:
>On Sat, Oct 12, 2019 at 12:19 PM Christian Brauner
><[email protected]> wrote:
>> Currently, the fdinfo file of contains the field Pid:
>
>nit: something missing after "of"?
>
>> It contains the pid a given pidfd refers to in the pid namespace of
>the
>> opener's procfs instance.
>
>s/opener's // here and in various places below? "the opener's" is
>kinda misleading since it sounds as if it has something to do with
>task identity.
>
>> If the pid namespace of the process is not a descendant of the pid
>> namespace of the procfs instance 0 will be shown as its pid. This is
>> similar to calling getppid() on a process who's parent is out of it's
>
>nit: s/who's/whose/
>
>nit: s/it's/its/
>
>> pid namespace (e.g. when moving a process into a sibling pid
>namespace
>> via setns()).
>
>You can't move a process into a PID namespace via setns(), you can
>only set the PID namespace for its children; and even then, you can't
>set that to a sibling PID namespace, you can only move into descendant
>PID namespaces.

Yes, I know. This was sloppy changelogging on my part.

>
>> Add an NSpid field for easy retrieval of the pid in all descendant
>pid
>> namespaces:
>> If pid namespaces are supported this field will contain the pid a
>given
>> pidfd refers to for all descendant pid namespaces starting from the
>> current pid namespace of the opener's procfs instance, i.e. the first
>
>s/current // - neither tasks nor procfs instances can change which pid
>namespace they're associated with

Yes.

>
>> pid entry for Pid and NSpid will be identical.
>
>*the Pid field and the first entry in the NSpid field?

Yes.

>
>> If the pid namespace of the process is not a descendant of the pid
>> namespace of the procfs instance 0 will be shown as its first NSpid
>and
>> no other NSpid entries will be shown.
>> Note that this differs from the Pid and NSpid fields in
>> /proc/<pid>/status where Pid and NSpid are always shown relative to
>the
>> pid namespace of the opener's procfs instace. The difference becomes
>> obvious when sending around a pidfd between pid namespaces from
>> different trees, i.e. where no ancestoral relation is present between
>> the pid namespaces:
>> 1. sending around pidfd:
>> - create two new pid namespaces ns1 and ns2 in the initial pid
>namespace
>> (Also take care to create new mount namespaces in the new pid
>> namespace and mount procfs.)
>> - create a process with a pidfd in ns1
>> - send pidfd from ns1 to ns2
>> - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid entry
>> are 0
>> - create a process with a pidfd in
>
>truncated phrase?

Yeah, as I said this was really just a rough draft for Christian (the other one :)).

>
>> - open a pidfd for a process in the initial pid namespace
>> 2. sending around /proc/<pid>/status fd:
>> - create two new pid namespaces ns1 and ns2 in the initial pid
>namespace
>> (Also take care to create new mount namespaces in the new pid
>> namespace and mount procfs.)
>> - create a process in ns1
>> - open /proc/<pid>/status in the initial pid namespace for the
>process
>> you created in ns1
>> - send statusfd from initial pid namespace to ns2
>> - read statusfd and observe:
>> - that Pid will contain the pid of the process as seen from the
>init
>> - that NSpid will contain the pids of the process for all
>descendant
>> pid namespaces starting from the initial pid namespace
>
>You don't need fd passing for case 2, you can just look at the procfs
>instance that's mounted in the initial mount namespace. Using fd
>passing in this example kinda obfuscates what's going on, in my
>opinion.

My goal was to illustrate the same mechanism leading to different results.
But I don't care much about how this is described as long as I illustrates the difference.

>
>
>> +/**
>> + * pidfd_show_fdinfo - print information about a pidfd
>> + * @m: proc fdinfo file
>> + * @f: file referencing a pidfd
>> + *
>> + * Pid:
>> + * This function will print the pid a given pidfd refers to in the
>pid
>> + * namespace of the opener's procfs instance.
>> + * If the pid namespace of the process is not a descendant of the
>pid
>> + * namespace of the procfs instance 0 will be shown as its pid. This
>is
>> + * similar to calling getppid() on a process who's parent is out of
>it's
>> + * pid namespace (e.g. when moving a process into a sibling pid
>namespace
>> + * via setns()).
>> + * NSpid:
>> + * If pid namespaces are supported then this function will also
>print the
>> + * pid a given pidfd refers to for all descendant pid namespaces
>starting
>> + * from the current pid namespace of the opener's procfs instance,
>i.e. the
>> + * first pid entry for Pid and NSpid will be identical.
>> + * If the pid namespace of the process is not a descendant of the
>pid
>> + * namespace of the procfs instance 0 will be shown as its first
>NSpid and
>> + * no other NSpid entries will be shown.
>> + * Note that this differs from the Pid and NSpid fields in
>> + * /proc/<pid>/status where Pid and NSpid are always shown relative
>to the
>> + * pid namespace of the opener's procfs instace. The difference
>becomes
>> + * obvious when sending around a pidfd between pid namespaces from
>> + * different trees, i.e. where no ancestoral relation is present
>between
>> + * the pid namespaces:
>> + * 1. sending around pidfd:
>> + * - create two new pid namespaces ns1 and ns2 in the initial pid
>namespace
>> + * (Also take care to create new mount namespaces in the new pid
>> + * namespace and mount procfs.)
>> + * - create a process with a pidfd in ns1
>> + * - send pidfd from ns1 to ns2
>> + * - read /proc/self/fdinfo/<pidfd> and observe that Pid and NSpid
>entry
>> + * are 0
>> + * - create a process with a pidfd in
>> + * - open a pidfd for a process in the initial pid namespace
>> + * 2. sending around /proc/<pid>/status fd:
>> + * - create two new pid namespaces ns1 and ns2 in the initial pid
>namespace
>> + * (Also take care to create new mount namespaces in the new pid
>> + * namespace and mount procfs.)
>> + * - create a process in ns1
>> + * - open /proc/<pid>/status in the initial pid namespace for the
>process
>> + * you created in ns1
>> + * - send statusfd from initial pid namespace to ns2
>> + * - read statusfd and observe:
>> + * - that Pid will contain the pid of the process as seen from the
>init
>> + * - that NSpid will contain the pids of the process for all
>descendant
>> + * pid namespaces starting from the initial pid namespace
>> + */
>
>See above, same problems as in the commit message. Actually, since you
>have a big comment block with this text, there's no reason to repeat
>it in the commit message, right?

If the comment gets modified or the logic changes I'd still like to have the actual context recorded in the changelog too.
I think that's good practice.

>
>(By the way, only slightly related to this patch: At the moment, if
>you have a pidfd to a task that has already been reaped, and the PID
>has been reallocated to another task, the pidfd will still show up in
>/proc/*/fdinfo/* as if it referred to a non-dead process, right? Might
>be worth considering whether you want to probe ->tasks[PIDTYPE_PID] or
>->tasks[PIDTYPE_TGID] for NULL-ness when printing fdinfo, or something
>like that.)

Not a big deal but yes, let me put this on my list.
Or do you feel in the mood for a patch? ;)

Christian

2019-10-15 11:20:06

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] pidfd: add NSpid entries to fdinfo

On Mon, Oct 14, 2019 at 06:20:32PM +0200, Christian Kellner wrote:
> From: Christian Kellner <[email protected]>
>
> Currently, the fdinfo file contains the Pid field which shows the
> pid a given pidfd refers to in the pid namespace of the procfs
> instance. If pid namespaces are configured, also show an NSpid field
> for easy retrieval of the pid in all descendant pid namespaces. If
> the pid namespace of the process is not a descendant of the pid
> namespace of the procfs instance 0 will be shown as its first NSpid
> entry and no other entries will be shown. Add a block comment to
> pidfd_show_fdinfo with a detailed explanation of Pid and NSpid fields.
>
> Co-developed-by: Christian Brauner <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>
> Signed-off-by: Christian Kellner <[email protected]>

Thanks!
Reviewed-by: Christian Brauner <[email protected]>

> ---
> Changes in v4:
> - Reworked to properly handle the case where the pidfd is from a
> different branch in the pid namespace hierarchy; also add block
> comment with an in-depth explanation (Christian Brauner)
>
> kernel/fork.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bcdf53125210..782986962d47 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1695,12 +1695,63 @@ static int pidfd_release(struct inode *inode, struct file *file)
> }
>
> #ifdef CONFIG_PROC_FS
> +/**
> + * pidfd_show_fdinfo - print information about a pidfd
> + * @m: proc fdinfo file
> + * @f: file referencing a pidfd
> + *
> + * Pid:
> + * This function will print the pid that a given pidfd refers to in the
> + * pid namespace of the procfs instance.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance 0 will be shown as its pid. This is
> + * similar to calling getppid() on a process whose parent is outside of
> + * its pid namespace.
> + *
> + * NSpid:
> + * If pid namespaces are supported then this function will also print
> + * the pid of a given pidfd refers to for all descendant pid namespaces
> + * starting from the current pid namespace of the instance, i.e. the
> + * Pid field and the first entry in the NSpid field will be identical.
> + * If the pid namespace of the process is not a descendant of the pid
> + * namespace of the procfs instance 0 will be shown as its first NSpid
> + * entry and no others will be shown.
> + * Note that this differs from the Pid and NSpid fields in
> + * /proc/<pid>/status where Pid and NSpid are always shown relative to
> + * the pid namespace of the procfs instance. The difference becomes
> + * obvious when sending around a pidfd between pid namespaces from a
> + * different branch of the tree, i.e. where no ancestoral relation is
> + * present between the pid namespaces:
> + * - create two new pid namespaces ns1 and ns2 in the initial pid
> + * namespace (also take care to create new mount namespaces in the
> + * new pid namespace and mount procfs)
> + * - create a process with a pidfd in ns1
> + * - send pidfd from ns1 to ns2
> + * - read /proc/self/fdinfo/<pidfd> and observe that both Pid and NSpid
> + * have exactly one entry, which is 0
> + */
> static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
> {
> struct pid_namespace *ns = proc_pid_ns(file_inode(m->file));
> struct pid *pid = f->private_data;
> + pid_t nr = pid_nr_ns(pid, ns);
> +
> + seq_put_decimal_ull(m, "Pid:\t", nr);
>
> - seq_put_decimal_ull(m, "Pid:\t", pid_nr_ns(pid, ns));
> +#ifdef CONFIG_PID_NS
> + seq_put_decimal_ull(m, "\nNSpid:\t", nr);
> + if (nr) {
> + int i;
> +
> + /* If nr is non-zero it means that 'pid' is valid and that

Nit: multiline kernel comment style is usually

/*
* bla
* bla
*/

but I'll just fix this up when applying. No need to resend.

> + * ns, i.e. the pid namespace associated with the procfs
> + * instance, is in the pid namespace hierarchy of pid.
> + * Start at one below the already printed level.
> + */
> + for (i = ns->level + 1; i <= pid->level; i++)
> + seq_put_decimal_ull(m, "\t", pid->numbers[i].nr);
> + }
> +#endif
> seq_putc(m, '\n');
> }
> #endif
> --
> 2.21.0
>