2019-07-27 08:54:13

by Christian Brauner

Subject: [PATCH v2 0/2] pidfd: waiting on processes through pidfds

Hey everyone,

/* v2 */
This adds the ability to wait on processes using pidfds. This is one of
the few missing pieces to make it possible to manage processes using
only pidfds.

No major changes have occurred since v1. The only change is that all
find_get_pid() calls were moved into the switch statement to avoid
checking the type argument twice, as suggested by Linus.

The core patch for waitid is pleasantly small. The largest change is
caused by adding proper tests for waitid(P_PIDFD).

/* v1 */
Link: https://lore.kernel.org/lkml/[email protected]/

/* v0 */
Link: https://lore.kernel.org/lkml/[email protected]

Christian

Christian Brauner (2):
pidfd: add P_PIDFD to waitid()
pidfd: add pidfd_wait tests

include/linux/pid.h | 4 +
include/uapi/linux/wait.h | 1 +
kernel/exit.c | 29 ++-
kernel/fork.c | 8 +
kernel/signal.c | 7 +-
tools/testing/selftests/pidfd/pidfd.h | 25 +++
tools/testing/selftests/pidfd/pidfd_test.c | 14 --
tools/testing/selftests/pidfd/pidfd_wait.c | 245 +++++++++++++++++++++
8 files changed, 313 insertions(+), 20 deletions(-)
create mode 100644 tools/testing/selftests/pidfd/pidfd_wait.c

--
2.22.0



2019-07-27 08:54:14

by Christian Brauner

Subject: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

This adds the P_PIDFD type to waitid().
One of the last remaining bits for the pidfd api is to make it possible
to wait on pidfds. With P_PIDFD added to waitid(), the parts of
userspace that want to manage processes exclusively through the pidfd
api can now do so.
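As a rough illustration of what "managing a process through a pidfd" looks like from userspace, here is a minimal sketch. It assumes a kernel that has both P_PIDFD and pidfd_open() (a separate syscall, not part of this series; the selftests below get their pidfd from clone3() with CLONE_PIDFD instead), and it assumes syscall number 434 as a fallback, which holds for the unified syscall table but not for alpha. reap_child_via_pidfd() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers. 434 assumes the unified syscall table
 * used by most architectures (alpha differs). */
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef P_PIDFD
#define P_PIDFD 3
#endif

/* Fork a child that exits with @code, then reap it using only a pidfd.
 * Returns the child's exit code, or -1 on any error. */
static int reap_child_via_pidfd(int code)
{
	siginfo_t info;
	pid_t pid;
	int pidfd;
	long ret;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);

	pidfd = syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0) {
		waitpid(pid, NULL, 0); /* don't leave a zombie behind */
		return -1;
	}

	/* Raw syscall, like the selftests, so libc needn't know P_PIDFD. */
	memset(&info, 0, sizeof(info));
	ret = syscall(SYS_waitid, P_PIDFD, pidfd, &info, WEXITED, NULL);
	close(pidfd);
	if (ret < 0) {
		waitpid(pid, NULL, 0);
		return -1;
	}

	/* For CLD_EXITED, si_status is the exit code itself, not a wait status. */
	if (info.si_code != CLD_EXITED || info.si_pid != pid)
		return -1;
	return info.si_status;
}
```

Calling reap_child_via_pidfd(7) should return 7 on a kernel with P_PIDFD support; nothing in the wait path refers to the numeric pid.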

One of the things this will unblock in the future is retrieving the
exit status via waitid(P_PIDFD) for non-parent processes when handed a
_suitable_ pidfd that has this feature set. This is similar to what you
can do on FreeBSD with kqueue(). It might even end up being possible to
wait on a process as a non-parent if an appropriate property is enabled
on the pidfd.

With P_PIDFD, no scoping of the process identified by the pidfd is
possible, i.e. it explicitly blocks things such as wait4(-1), wait4(0),
waitid(P_ALL), waitid(P_PGID) etc. It only allows semantics equivalent
to wait4(pid) and waitid(P_PID). Users that need scoping should
rely on pid-based wait*() syscalls for now.
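The "no scoping" point can be checked from userspace: waitid(P_PIDFD) only ever considers the single process behind the pidfd, even when another child is already a zombie. A minimal sketch, under the same assumptions as any pidfd userspace code here (a kernel with P_PIDFD and pidfd_open(), fallback syscall number 434 from the unified table); wait_second_of_two() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434 /* assumes the unified syscall table */
#endif
#ifndef P_PIDFD
#define P_PIDFD 3
#endif

/*
 * Fork two children and wait on the second one through its pidfd while the
 * first is already a zombie. Returns the pid waitid(P_PIDFD) reported, or -1
 * on error; *first_out and *second_out receive the children's pids.
 */
static pid_t wait_second_of_two(pid_t *first_out, pid_t *second_out)
{
	siginfo_t info;
	pid_t first, second;
	int pidfd;
	long ret;

	first = fork();
	if (first < 0)
		return -1;
	if (first == 0)
		_exit(1);
	/* Block until the first child has exited, but leave it unreaped. */
	if (waitid(P_PID, first, &info, WEXITED | WNOWAIT) < 0) {
		waitpid(first, NULL, 0);
		return -1;
	}

	second = fork();
	if (second < 0) {
		waitpid(first, NULL, 0);
		return -1;
	}
	if (second == 0)
		_exit(2);

	pidfd = syscall(__NR_pidfd_open, second, 0);
	if (pidfd < 0) {
		waitpid(first, NULL, 0);
		waitpid(second, NULL, 0);
		return -1;
	}

	/* Only the process behind the pidfd is eligible, like waitid(P_PID). */
	memset(&info, 0, sizeof(info));
	ret = syscall(SYS_waitid, P_PIDFD, pidfd, &info, WEXITED, NULL);
	close(pidfd);
	if (ret < 0)
		waitpid(second, NULL, 0);

	/* The first child must still be waitable: P_PIDFD did not touch it. */
	if (waitpid(first, NULL, WNOHANG) != first || ret < 0)
		return -1;

	*first_out = first;
	*second_out = second;
	return info.si_pid;
}
```

Anything that wants "whichever child exits first" still has to use waitid(P_ALL) or wait4(-1), as the commit message says.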

Signed-off-by: Christian Brauner <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
---
v1:
- Linus Torvalds <[email protected]>:
- use flag as discussed before, not a dedicated pidfd_wait() syscall
- Oleg Nesterov <[email protected]>:
- use flag as discussed before, not a dedicated pidfd_wait() syscall

v2:
- Linus Torvalds <[email protected]>:
- move find_get_pid() calls into switch statements to avoid checking
the type argument twice
---
include/linux/pid.h | 4 ++++
include/uapi/linux/wait.h | 1 +
kernel/exit.c | 29 +++++++++++++++++++++++++----
kernel/fork.c | 8 ++++++++
kernel/signal.c | 7 +++++--
5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 2a83e434db9d..9645b1194c98 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -72,6 +72,10 @@ extern struct pid init_struct_pid;

extern const struct file_operations pidfd_fops;

+struct file;
+
+extern struct pid *pidfd_pid(const struct file *file);
+
static inline struct pid *get_pid(struct pid *pid)
{
if (pid)
diff --git a/include/uapi/linux/wait.h b/include/uapi/linux/wait.h
index ac49a220cf2a..85b809fc9f11 100644
--- a/include/uapi/linux/wait.h
+++ b/include/uapi/linux/wait.h
@@ -17,6 +17,7 @@
#define P_ALL 0
#define P_PID 1
#define P_PGID 2
+#define P_PIDFD 3


#endif /* _UAPI_LINUX_WAIT_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index a75b6a7f458a..207f7a37b2d0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1555,6 +1555,7 @@ static long do_wait(struct wait_opts *wo)
static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
int options, struct rusage *ru)
{
+ struct fd f;
struct wait_opts wo;
struct pid *pid = NULL;
enum pid_type type;
@@ -1574,19 +1575,35 @@ static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
type = PIDTYPE_PID;
if (upid <= 0)
return -EINVAL;
+
+ pid = find_get_pid(upid);
break;
case P_PGID:
type = PIDTYPE_PGID;
if (upid <= 0)
return -EINVAL;
+
+ pid = find_get_pid(upid);
+ break;
+ case P_PIDFD:
+ type = PIDTYPE_PID;
+ if (upid < 0)
+ return -EINVAL;
+
+ f = fdget(upid);
+ if (!f.file)
+ return -EBADF;
+
+ pid = pidfd_pid(f.file);
+ if (IS_ERR(pid)) {
+ fdput(f);
+ return PTR_ERR(pid);
+ }
break;
default:
return -EINVAL;
}

- if (type < PIDTYPE_MAX)
- pid = find_get_pid(upid);
-
wo.wo_type = type;
wo.wo_pid = pid;
wo.wo_flags = options;
@@ -1594,7 +1611,11 @@ static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
wo.wo_rusage = ru;
ret = do_wait(&wo);

- put_pid(pid);
+ if (which == P_PIDFD)
+ fdput(f);
+ else
+ put_pid(pid);
+
return ret;
}

diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..b169e2ca2d84 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1690,6 +1690,14 @@ static inline void rcu_copy_process(struct task_struct *p)
#endif /* #ifdef CONFIG_TASKS_RCU */
}

+struct pid *pidfd_pid(const struct file *file)
+{
+ if (file->f_op == &pidfd_fops)
+ return file->private_data;
+
+ return ERR_PTR(-EBADF);
+}
+
static int pidfd_release(struct inode *inode, struct file *file)
{
struct pid *pid = file->private_data;
diff --git a/kernel/signal.c b/kernel/signal.c
index 91b789dd6e72..2e567f64812f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3672,8 +3672,11 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)

static struct pid *pidfd_to_pid(const struct file *file)
{
- if (file->f_op == &pidfd_fops)
- return file->private_data;
+ struct pid *pid;
+
+ pid = pidfd_pid(file);
+ if (!IS_ERR(pid))
+ return pid;

return tgid_pidfd_to_pid(file);
}
--
2.22.0
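The error paths in the kernel_waitid() hunk above can be probed from userspace. A minimal sketch, assuming a kernel that has P_PIDFD: a negative fd should fail with EINVAL (the upid < 0 check), and an fd that is closed or does not refer to a pidfd should fail with EBADF (fdget() failing, or pidfd_pid() returning ERR_PTR(-EBADF)). waitid_pidfd_errno() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#define P_PIDFD 3
#endif

/* Return 0 if waitid(P_PIDFD, fd) succeeded, otherwise the errno it set.
 * WNOHANG keeps the call from blocking if fd happens to be a real pidfd. */
static int waitid_pidfd_errno(int fd)
{
	siginfo_t info;

	memset(&info, 0, sizeof(info));
	if (syscall(SYS_waitid, P_PIDFD, fd, &info, WEXITED | WNOHANG, NULL) == 0)
		return 0;
	return errno;
}
```

For example, waitid_pidfd_errno(-1) should yield EINVAL, and passing the read end of a pipe should yield EBADF.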


2019-07-27 08:55:57

by Christian Brauner

Subject: [PATCH v2 2/2] pidfd: add pidfd_wait tests

Add tests for waitid(P_PIDFD):
- test that waitid(P_PIDFD) can wait on a pidfd
- test that waitid(P_PIDFD) can wait on a pidfd and return siginfo_t
- test that waitid(P_PIDFD) works with WEXITED
- test that waitid(P_PIDFD) works with WSTOPPED
- test that waitid(P_PIDFD) works with WUNTRACED
- test that waitid(P_PIDFD) works with WCONTINUED
- test that waitid(P_PIDFD) works with WNOWAIT
- test that waitid(P_PIDFD) works with WNOHANG

Signed-off-by: Christian Brauner <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
---
v1:
- Christian Brauner <[email protected]>:
- adapt tests to new P_PIDFD concept

v2: unchanged
---
tools/testing/selftests/pidfd/pidfd.h | 25 +++
tools/testing/selftests/pidfd/pidfd_test.c | 14 --
tools/testing/selftests/pidfd/pidfd_wait.c | 245 +++++++++++++++++++++
3 files changed, 270 insertions(+), 14 deletions(-)
create mode 100644 tools/testing/selftests/pidfd/pidfd_wait.c

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index 8452e910463f..7d7d0ca05e0b 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -16,6 +16,26 @@

#include "../kselftest.h"

+#ifndef P_PIDFD
+#define P_PIDFD 3
+#endif
+
+#ifndef CLONE_PIDFD
+#define CLONE_PIDFD 0x00001000
+#endif
+
+#ifndef __NR_pidfd_open
+#define __NR_pidfd_open -1
+#endif
+
+#ifndef __NR_pidfd_send_signal
+#define __NR_pidfd_send_signal -1
+#endif
+
+#ifndef __NR_clone3
+#define __NR_clone3 -1
+#endif
+
/*
* The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
* That means, when it wraps around any pid < 300 will be skipped.
@@ -53,5 +73,10 @@ int wait_for_pid(pid_t pid)
return WEXITSTATUS(status);
}

+static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
+ unsigned int flags)
+{
+ return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
+}

#endif /* __PIDFD_H */
diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
index 7eaa8a3de262..42e3eb494d72 100644
--- a/tools/testing/selftests/pidfd/pidfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_test.c
@@ -21,20 +21,12 @@
#include "pidfd.h"
#include "../kselftest.h"

-#ifndef __NR_pidfd_send_signal
-#define __NR_pidfd_send_signal -1
-#endif
-
#define str(s) _str(s)
#define _str(s) #s
#define CHILD_THREAD_MIN_WAIT 3 /* seconds */

#define MAX_EVENTS 5

-#ifndef CLONE_PIDFD
-#define CLONE_PIDFD 0x00001000
-#endif
-
static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
{
size_t stack_size = 1024;
@@ -47,12 +39,6 @@ static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
#endif
}

-static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
- unsigned int flags)
-{
- return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
-}
-
static int signal_received;

static void set_signal_received_on_sigusr1(int sig)
diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
new file mode 100644
index 000000000000..018d806032c0
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_wait.c
@@ -0,0 +1,245 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "pidfd.h"
+#include "../kselftest.h"
+
+#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))
+
+static pid_t sys_clone3(struct clone_args *args)
+{
+ return syscall(__NR_clone3, args, sizeof(struct clone_args));
+}
+
+static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
+ struct rusage *ru)
+{
+ return syscall(__NR_waitid, which, pid, info, options, ru);
+}
+
+static int test_pidfd_wait_simple(void)
+{
+ const char *test_name = "pidfd wait siginfo";
+ int pidfd = -1, status = 0;
+ pid_t parent_tid = -1;
+ struct clone_args args = {
+ .parent_tid = ptr_to_u64(&parent_tid),
+ .pidfd = ptr_to_u64(&pidfd),
+ .flags = CLONE_PIDFD | CLONE_PARENT_SETTID,
+ .exit_signal = SIGCHLD,
+ };
+ int ret;
+ pid_t pid;
+ siginfo_t info = {
+ .si_signo = 0,
+ };
+
+ pid = sys_clone3(&args);
+ if (pid < 0)
+ ksft_exit_fail_msg("%s test: failed to create new process %s\n",
+ test_name, strerror(errno));
+
+ if (pid == 0)
+ exit(EXIT_SUCCESS);
+
+ pid = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
+ if (pid < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to wait on process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ if (!WIFEXITED(info.si_status) || WEXITSTATUS(info.si_status))
+ ksft_exit_fail_msg(
+ "%s test: unexpected status received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+ close(pidfd);
+
+ if (info.si_signo != SIGCHLD)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_signo, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_code != CLD_EXITED)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_code, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_pid != parent_tid)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_pid, parent_tid, pidfd,
+ strerror(errno));
+
+ ksft_test_result_pass("%s test: Passed\n", test_name);
+ return 0;
+}
+
+static int test_pidfd_wait_states(void)
+{
+ const char *test_name = "pidfd wait states";
+ int pidfd = -1, status = 0;
+ pid_t parent_tid = -1;
+ struct clone_args args = {
+ .parent_tid = ptr_to_u64(&parent_tid),
+ .pidfd = ptr_to_u64(&pidfd),
+ .flags = CLONE_PIDFD | CLONE_PARENT_SETTID,
+ .exit_signal = SIGCHLD,
+ };
+ int ret;
+ pid_t pid;
+ siginfo_t info = {
+ .si_signo = 0,
+ };
+
+ pid = sys_clone3(&args);
+ if (pid < 0)
+ ksft_exit_fail_msg("%s test: failed to create new process %s\n",
+ test_name, strerror(errno));
+
+ if (pid == 0) {
+ kill(getpid(), SIGSTOP);
+ kill(getpid(), SIGSTOP);
+ exit(EXIT_SUCCESS);
+ }
+
+ ret = sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL);
+ if (ret < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to wait on process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ if (info.si_signo != SIGCHLD)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_signo, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_code != CLD_STOPPED)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_code, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_pid != parent_tid)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_pid, parent_tid, pidfd,
+ strerror(errno));
+
+ ret = sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0);
+ if (ret < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to send signal to process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ ret = sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL);
+ if (ret < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to wait on process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ if (info.si_signo != SIGCHLD)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_signo, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_code != CLD_CONTINUED)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_code, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_pid != parent_tid)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_pid, parent_tid, pidfd,
+ strerror(errno));
+
+ ret = sys_waitid(P_PIDFD, pidfd, &info, WUNTRACED, NULL);
+ if (ret < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to wait on process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ if (info.si_signo != SIGCHLD)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_signo, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_code != CLD_STOPPED)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_code, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_pid != parent_tid)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_pid, parent_tid, pidfd,
+ strerror(errno));
+
+ ret = sys_pidfd_send_signal(pidfd, SIGKILL, NULL, 0);
+ if (ret < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to send signal to process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ ret = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
+ if (ret < 0)
+ ksft_exit_fail_msg(
+ "%s test: failed to wait on process with pid %d and pidfd %d: %s\n",
+ test_name, parent_tid, pidfd, strerror(errno));
+
+ if (info.si_signo != SIGCHLD)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_signo, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_code != CLD_KILLED)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_code, parent_tid, pidfd,
+ strerror(errno));
+
+ if (info.si_pid != parent_tid)
+ ksft_exit_fail_msg(
+ "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
+ test_name, info.si_pid, parent_tid, pidfd,
+ strerror(errno));
+
+ close(pidfd);
+
+ ksft_test_result_pass("%s test: Passed\n", test_name);
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ ksft_print_header();
+ ksft_set_plan(2);
+
+ test_pidfd_wait_simple();
+ test_pidfd_wait_states();
+
+ return ksft_exit_pass();
+}
--
2.22.0
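The list in the commit message also mentions WNOHANG and WNOWAIT, which the two test functions in pidfd_wait.c do not exercise directly. A sketch of how such a check could look, assuming a kernel with P_PIDFD, pidfd_open() and pidfd_send_signal() (fallback syscall numbers 434 and 424 assume the unified syscall table). pidfd_waitid() and check_wnohang_wnowait() are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks; the numbers assume the unified syscall table (alpha differs). */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef P_PIDFD
#define P_PIDFD 3
#endif

static long pidfd_waitid(int pidfd, siginfo_t *info, int options)
{
	memset(info, 0, sizeof(*info)); /* si_pid stays 0 if WNOHANG finds nothing */
	return syscall(SYS_waitid, P_PIDFD, pidfd, info, options, NULL);
}

/* Returns 0 if WNOHANG and WNOWAIT behave as expected on a pidfd, else -1. */
static int check_wnohang_wnowait(void)
{
	siginfo_t info;
	pid_t pid;
	int pidfd, ok = -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		pause(); /* stay alive until killed */
		_exit(0);
	}

	pidfd = syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
		return -1;
	}

	/* Child still running: WNOHANG returns 0 without reporting a child. */
	if (pidfd_waitid(pidfd, &info, WEXITED | WNOHANG) != 0 || info.si_pid != 0)
		goto out;

	if (syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) != 0)
		goto out;

	/* WNOWAIT: report the death but leave the child waitable. */
	if (pidfd_waitid(pidfd, &info, WEXITED | WNOWAIT) != 0 ||
	    info.si_code != CLD_KILLED || info.si_status != SIGKILL)
		goto out;

	/* Because of WNOWAIT the child can still be reaped, this time for real. */
	if (pidfd_waitid(pidfd, &info, WEXITED) != 0 || info.si_pid != pid)
		goto out;
	ok = 0;
out:
	if (ok != 0) {
		/* Best-effort cleanup; signalling via the pidfd can't hit a recycled pid. */
		syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, 0);
		waitpid(pid, NULL, 0);
	}
	close(pidfd);
	return ok;
}
```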


2019-07-27 16:37:32

by Linus Torvalds

Subject: Re: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

Sorry to keep pestering about the patch series, but with the addition
of P_PIDFD, I react once again..

On Sat, Jul 27, 2019 at 1:53 AM Christian Brauner <[email protected]> wrote:
>
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1555,6 +1555,7 @@ static long do_wait(struct wait_opts *wo)
> static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
> int options, struct rusage *ru)
> {
> + struct fd f;

Please don't do 'struct fd' at this level. That results in this ugly code later:

> - put_pid(pid);
> + if (which == P_PIDFD)
> + fdput(f);
> + else
> + put_pid(pid);

which just looks nasty.

Instead, do all the 'file descriptor to pid' games here:

> + case P_PIDFD:
> + type = PIDTYPE_PID;
> + if (upid < 0)
> + return -EINVAL;
> +
> + f = fdget(upid);
> + if (!f.file)
> + return -EBADF;
> +
> + pid = pidfd_pid(f.file);
> + if (IS_ERR(pid)) {
> + fdput(f);
> + return PTR_ERR(pid);
> + }
> break;

and make this just do something like

	pid = get_pid_from_fd(upid);
	if (IS_ERR(pid))
		return PTR_ERR(pid);

and now do that "fd to pid" in that helper function, and get the
reference to 'struct pid *' there instead.

Which you can actually do efficiently and lightly without even getting
a ref to the 'struct file'. Something like

	struct pid *fd_to_pid(unsigned int fd)
	{
		struct fd f;
		struct pid *pid;

		f = fdget(fd);
		if (!f.file)
			return ERR_PTR(-EBADF);
		pid = pidfd_pid(f.file);
		if (!IS_ERR(pid))
			get_pid(pid);
		fdput(f);
		return pid;
	}

is the stupid and straightforward thing, but if you want to be
*clever* you can actually avoid getting a reference to the 'struct
file *" entirely, and do the fd->pid lookup under rcu_read_lock()
instead. It's slightly more complex, but it avoids the fdget/fdput
reference count games entirely.

And then all that kernel_waitid() ever worries about is "struct pid
*", and the ending goes back to just that simple

	put_pid(pid);
	return ret;

instead.

This was kind of my point of doing all the "find_get_pid()" games in
the "switch()" statement - the different cases have different ways to
look up what the "struct pid *" pointer should be, but they should all
just look up a pid pointer, and then nothing else needs to care about
'type' any more. See?

Hmm?

Linus
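To make the ownership argument concrete, here is a userspace model, not kernel code: struct pid, struct file, fd_table, find_get_pid(), fd_to_pid() and the ERR_PTR helpers below are simplified hypothetical stand-ins with plain integer refcounts, and do_wait() is reduced to returning the pid number. It shows each switch case producing a counted struct pid *, fd_to_pid() dropping its file reference before returning, and a single put_pid() on the way out. It does not model the RCU variant.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; refcounts are plain ints (no locking, no RCU). */
struct pid  { int nr; int count; };
struct file { int is_pidfd; struct pid *pid; int count; };

enum { MODEL_P_PID = 1, MODEL_P_PIDFD = 3, MODEL_MAX_FDS = 8 };

static struct file *fd_table[MODEL_MAX_FDS];
static struct pid model_pids[] = { {1, 1}, {2, 1}, {3, 1} };

/* Local re-implementations of the kernel's error-pointer helpers. */
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

static void get_pid(struct pid *pid) { pid->count++; }
static void put_pid(struct pid *pid) { if (pid) pid->count--; } /* NULL-safe */

static struct pid *find_get_pid(int nr)
{
	for (size_t i = 0; i < sizeof(model_pids) / sizeof(model_pids[0]); i++)
		if (model_pids[i].nr == nr) {
			get_pid(&model_pids[i]);
			return &model_pids[i];
		}
	return NULL;
}

/* Shape of the suggested helper: take a pid ref, drop the file ref. */
static struct pid *fd_to_pid(unsigned int fd)
{
	struct file *f;
	struct pid *pid;

	if (fd >= MODEL_MAX_FDS || !(f = fd_table[fd]))
		return ERR_PTR(-EBADF);                        /* fdget() failed */
	f->count++;                                            /* fdget() */
	pid = f->is_pidfd ? f->pid : ERR_PTR(-EBADF);          /* pidfd_pid() */
	if (!IS_ERR(pid))
		get_pid(pid);
	f->count--;                                            /* fdput() */
	return pid;
}

static long model_waitid(int which, int upid)
{
	struct pid *pid;
	long ret;

	switch (which) {
	case MODEL_P_PID:
		if (upid <= 0)
			return -EINVAL;
		pid = find_get_pid(upid);
		break;
	case MODEL_P_PIDFD:
		if (upid < 0)
			return -EINVAL;
		pid = fd_to_pid(upid);
		if (IS_ERR(pid))
			return PTR_ERR(pid);
		break;
	default:
		return -EINVAL;
	}

	ret = pid ? pid->nr : -ECHILD; /* stand-in for do_wait() */
	put_pid(pid);                  /* one cleanup path, whatever 'which' was */
	return ret;
}
```

After any call, the pid and file refcounts are back where they started, which is the property the single put_pid() ending relies on.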

2019-07-27 16:45:45

by Linus Torvalds

Subject: Re: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

On Sat, Jul 27, 2019 at 9:28 AM Linus Torvalds
<[email protected]> wrote:
>
> Something like
>
> struct pid *fd_to_pid(unsigned int fd)
> {
> struct fd f;
> struct pid *pid;
...

I forgot to put my usual disclaimer about TOTALLY UNTESTED GARBAGE in
that email. I want to make that part clear: that code snippet was
meant as a rough guide of direction, not as a "this works".

Hopefully that was clear.

Also note again that one of the reasons I would prefer that
"fd_to_pid()" interface is that you _can_ do it cleverly with RCU
lookup, but that requires a lot of care.

In particular, I think all of our _existing_
"proc_pid(file_inode(file))" users are done while you actually hold a
reference to "struct file *", so they don't have to worry about races
with another thread doing the final ->release(). So the "clever" thing
is possible, but might need a _lot_ of care to make sure the 'struct
pid *' associated with the file still exists.

The example code sequence was not doing the clever thing, obviously.
So it was untested _and_ simple-stupid. But it has the interface that
I'd prefer.

Linus

2019-07-27 16:52:13

by Al Viro

Subject: Re: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

On Sat, Jul 27, 2019 at 09:28:40AM -0700, Linus Torvalds wrote:

> is the stupid and straightforward thing, but if you want to be
> *clever* you can actually avoid getting a reference to the 'struct
> file *" entirely, and do the fd->pid lookup under rcu_read_lock()
> instead. It's slightly more complex, but it avoids the fdget/fdput
> reference count games entirely.

Yecchhh... Please, don't do the last part - at least not unless
we really see that in profiles.

2019-07-27 19:48:50

by Christian Brauner

Subject: Re: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

On Sat, Jul 27, 2019 at 09:28:40AM -0700, Linus Torvalds wrote:
> Sorry to keep pestering about the patch series, but with the addition
> of P_PIDFD, I react once again..

That's fine. I don't at all mind being particular about how something
has to be done as long as the result is functional. In this case it
seems we'll end up with something cleaner overall, so sure.

I'll rework the snippets into the actual patch and resend. I'll leave
out the rcu-cleverness you suggested in the other mail though.

Christian

2019-07-27 19:48:50

by Christian Brauner

Subject: Re: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

On Sat, Jul 27, 2019 at 05:49:32PM +0100, Al Viro wrote:
> On Sat, Jul 27, 2019 at 09:28:40AM -0700, Linus Torvalds wrote:
>
> > is the stupid and straightforward thing, but if you want to be
> > *clever* you can actually avoid getting a reference to the 'struct
> > file *" entirely, and do the fd->pid lookup under rcu_read_lock()
> > instead. It's slightly more complex, but it avoids the fdget/fdput
> > reference count games entirely.
>
> Yecchhh... Please, don't do the last part - at least not unless
> we really see that in profiles.

Yeah, I will leave this out for now.

Christian

2019-07-27 19:48:50

by Christian Brauner

Subject: Re: [PATCH v2 1/2] pidfd: add P_PIDFD to waitid()

On Sat, Jul 27, 2019 at 09:41:25AM -0700, Linus Torvalds wrote:
> On Sat, Jul 27, 2019 at 9:28 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > Something like
> >
> > struct pid *fd_to_pid(unsigned int fd)
> > {
> > struct fd f;
> > struct pid *pid;
> ...
>
> I forgot to put my usual disclaimer about TOTALLY UNTESTED GARBAGE in
> that email. I want to make that part clear: that code snippet was
> meant as a rough guide of direction, not as a "this works".
>
> Hopefully that was clear.

Yeah. I don't take code someone else has written without verifying or
testing into my own code. And I hope people do the same with mine. :)

Christian