2020-09-02 10:24:33

by Christian Brauner

Subject: [PATCH v2 0/4] Support non-blocking pidfds

Hi,

Passing a non-blocking pidfd to waitid() currently has no effect, i.e.
it is not supported. Some users would like to call waitid() on a mix of
O_NONBLOCK pidfds and blocking pidfds.
The expected behavior is for waitid() to return -EAGAIN for non-blocking
pidfds and to block for blocking pidfds, without the caller having to
check the flags set on the pidfd before passing it to waitid().
A non-blocking pidfd makes waitid() return EAGAIN when no child process
is ready yet, which makes life easier for event loops that handle EAGAIN
specially.

It also makes the API more consistent and uniform. In essence, waitid()
is treated like a read on a non-blocking pidfd or a recvmsg() on a
non-blocking socket.
With the addition of support for non-blocking pidfds we support the same
functionality that sockets do. For sockets, recvmsg() supports
MSG_DONTWAIT; for pidfds, waitid() supports WNOHANG. Both flags are
per-call options. In contrast, non-blocking pidfds and non-blocking
sockets are a setting on an open file description, affecting all threads
in the calling process as well as other processes that hold file
descriptors referring to the same open file description. Both behaviors,
per call and per open file description, have genuine use-cases.

A concrete use-case that was brought up on-list (see [1]) is Josh's async
pidfd library. Ever since the introduction of pidfds and more advanced
async io, various programming languages such as Rust have grown support
for async event libraries. These libraries help build epoll-based event
loops around file descriptors. A common pattern is to automatically set
O_NONBLOCK on all file descriptors they manage.

Such libraries treat the EAGAIN error code specially. When a function
returns EAGAIN, it isn't called again until the event loop indicates
that the file descriptor is ready. Supporting EAGAIN when waiting on
pidfds makes such libraries just work with little effort.
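
For illustration only (this sketch is not part of the series), the way
an event loop might consume the EAGAIN behavior could look roughly like
the following. The names classify_wait() and reap_pidfd() are
hypothetical, P_PIDFD is defined by hand in case the libc headers lack
it, and the EAGAIN path assumes a kernel that carries this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#define P_PIDFD 3 /* idtype for pidfds; absent from older libc headers */
#endif

/* Hypothetical names: what an event loop does with one waitid() result. */
enum wait_step { WAIT_DONE, WAIT_RETRY_WHEN_READY, WAIT_FAILED };

/* ret and err are the return value and errno of a waitid() call. */
enum wait_step classify_wait(int ret, int err)
{
	if (ret == 0)
		return WAIT_DONE;
	if (err == EAGAIN)
		return WAIT_RETRY_WHEN_READY;
	return WAIT_FAILED;
}

/*
 * Reap the process behind pidfd. A blocking pidfd simply blocks in
 * waitid(); a non-blocking one yields EAGAIN, after which we sleep in
 * poll() until the pidfd becomes readable and then retry.
 * Returns 0 on success, -1 with errno set on failure.
 */
int reap_pidfd(int pidfd, siginfo_t *info)
{
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };

	assert(pidfd >= 0);

	for (;;) {
		int ret = syscall(SYS_waitid, P_PIDFD, pidfd, info, WEXITED, NULL);

		switch (classify_wait(ret, errno)) {
		case WAIT_DONE:
			return 0;
		case WAIT_RETRY_WHEN_READY:
			if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
				return -1;
			break;
		default:
			return -1;
		}
	}
}
```

The caller does not need to inspect the flags on the pidfd: the same
reap_pidfd() call works for blocking and non-blocking pidfds, which is
the mixing scenario described above.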

Thanks!
Christian

[1]: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/

Christian Brauner (4):
pidfd: support PIDFD_NONBLOCK in pidfd_open()
exit: support non-blocking pidfds
tests: port pidfd_wait to kselftest harness
tests: add waitid() tests for non-blocking pidfds

include/uapi/linux/pidfd.h | 12 +
kernel/exit.c | 15 +-
kernel/pid.c | 12 +-
tools/testing/selftests/pidfd/pidfd.h | 4 +
tools/testing/selftests/pidfd/pidfd_wait.c | 298 +++++++++------------
5 files changed, 157 insertions(+), 184 deletions(-)
create mode 100644 include/uapi/linux/pidfd.h


base-commit: d012a7190fc1fd72ed48911e77ca97ba4521bccd
--
2.28.0


2020-09-02 10:24:43

by Christian Brauner

Subject: [PATCH v2 1/4] pidfd: support PIDFD_NONBLOCK in pidfd_open()

Introduce PIDFD_NONBLOCK to support non-blocking pidfd file descriptors.

Ever since the introduction of pidfds and more advanced async io, various
programming languages such as Rust have grown support for async event
libraries. These libraries help build epoll-based event loops around file
descriptors. A common pattern is to automatically set O_NONBLOCK on all file
descriptors they manage.

Such libraries treat the EAGAIN error code specially. When a function returns
EAGAIN, it isn't called again until the event loop indicates that the file
descriptor is ready. Supporting EAGAIN when waiting on pidfds makes such
libraries just work with little effort. In the following patch we will extend
waitid() internally to support non-blocking pidfds.

This introduces a new flag PIDFD_NONBLOCK that is equivalent to O_NONBLOCK.
This follows the same patterns we have for other (anon inode) file descriptors
such as EFD_NONBLOCK, IN_NONBLOCK, SFD_NONBLOCK, TFD_NONBLOCK and the same for
close-on-exec flags.
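
As a rough userspace illustration (not part of this patch), opening a
non-blocking pidfd could be sketched as below. pidfd_open_flags_valid()
and pidfd_open_nonblock() are hypothetical helper names, and the
fallback syscall number 434 is an assumption that holds on most
architectures. Note that on a kernel without this series the fcntl()
fallback only sets the flag; waitid() will still block there.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PIDFD_NONBLOCK
#define PIDFD_NONBLOCK O_NONBLOCK /* same definition as in the patch */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: value on most architectures */
#endif

/* Userspace mirror of the patched check: only PIDFD_NONBLOCK is accepted. */
int pidfd_open_flags_valid(unsigned int flags)
{
	return (flags & ~(unsigned int)PIDFD_NONBLOCK) == 0;
}

/* Returns a non-blocking pidfd for pid, or -1 with errno set. */
int pidfd_open_nonblock(pid_t pid)
{
	int fd, fl;

	assert(pid > 0);

	fd = syscall(SYS_pidfd_open, pid, PIDFD_NONBLOCK);
	if (fd >= 0 || errno != EINVAL)
		return fd;

	/* Flag rejected: open without flags, then set O_NONBLOCK by hand. */
	fd = syscall(SYS_pidfd_open, pid, 0);
	if (fd < 0)
		return -1;

	fl = fcntl(fd, F_GETFL);
	if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```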

Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
Link: https://github.com/joshtriplett/async-pidfd
Cc: Kees Cook <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
/* v2 */
- Christian Brauner <[email protected]>:
- Improve commit message.
---
include/uapi/linux/pidfd.h | 12 ++++++++++++
kernel/pid.c | 12 +++++++-----
2 files changed, 19 insertions(+), 5 deletions(-)
create mode 100644 include/uapi/linux/pidfd.h

diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
new file mode 100644
index 000000000000..5406fbc13074
--- /dev/null
+++ b/include/uapi/linux/pidfd.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_PIDFD_H
+#define _UAPI_LINUX_PIDFD_H
+
+#include <linux/types.h>
+#include <linux/fcntl.h>
+
+/* Flags for pidfd_open(). */
+#define PIDFD_NONBLOCK O_NONBLOCK
+
+#endif /* _UAPI_LINUX_PIDFD_H */
diff --git a/kernel/pid.c b/kernel/pid.c
index b2562a7ce525..74ddbff1a6ba 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -43,6 +43,7 @@
#include <linux/sched/task.h>
#include <linux/idr.h>
#include <net/sock.h>
+#include <uapi/linux/pidfd.h>

struct pid init_struct_pid = {
.count = REFCOUNT_INIT(1),
@@ -522,7 +523,8 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
/**
* pidfd_create() - Create a new pid file descriptor.
*
- * @pid: struct pid that the pidfd will reference
+ * @pid: struct pid that the pidfd will reference
+ * @flags: flags to pass
*
* This creates a new pid file descriptor with the O_CLOEXEC flag set.
*
@@ -532,12 +534,12 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
-static int pidfd_create(struct pid *pid)
+static int pidfd_create(struct pid *pid, unsigned int flags)
{
int fd;

fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
- O_RDWR | O_CLOEXEC);
+ flags | O_RDWR | O_CLOEXEC);
if (fd < 0)
put_pid(pid);

@@ -565,7 +567,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
int fd;
struct pid *p;

- if (flags)
+ if (flags & ~PIDFD_NONBLOCK)
return -EINVAL;

if (pid <= 0)
@@ -576,7 +578,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
return -ESRCH;

if (pid_has_task(p, PIDTYPE_TGID))
- fd = pidfd_create(p);
+ fd = pidfd_create(p, flags);
else
fd = -EINVAL;

--
2.28.0

2020-09-02 10:25:00

by Christian Brauner

Subject: [PATCH v2 2/4] exit: support non-blocking pidfds

Passing a non-blocking pidfd to waitid() currently has no effect, i.e. it is
not supported. Some users would like to call waitid() on a mix of O_NONBLOCK
pidfds and blocking pidfds.
The expected behavior is for waitid() to return -EAGAIN for non-blocking
pidfds and to block for blocking pidfds, without the caller having to check
the flags set on the pidfd before passing it to waitid().
A non-blocking pidfd makes waitid() return EAGAIN when no child process is
ready yet, which makes life easier for event loops that handle EAGAIN
specially.

It also makes the API more consistent and uniform. In essence, waitid() is
treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
socket.
With the addition of support for non-blocking pidfds we support the same
functionality that sockets do. For sockets, recvmsg() supports MSG_DONTWAIT;
for pidfds, waitid() supports WNOHANG. Both flags are per-call options. In
contrast, non-blocking pidfds and non-blocking sockets are a setting on an
open file description, affecting all threads in the calling process as well as
other processes that hold file descriptors referring to the same open file
description. Both behaviors, per call and per open file description, have
genuine use-cases.

The implementation is straightforward: we raise the WNOHANG flag when a
non-blocking pidfd is passed, and when do_wait() returns without finding an
eligible task and the pidfd is non-blocking we return EAGAIN. If no child
process exists, non-blocking pidfd users will continue to see ECHILD, but if
child processes exist and have not yet exited, users will see EAGAIN.
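
The mapping described above can be modelled in plain userspace C. This
is only a sketch for reasoning about the return values, not kernel
code: effective_wait_options(), map_wait_result() and model_examples()
are hypothetical names, and do_wait_ret is assumed to follow the
convention visible in the diff (0 when nothing eligible was found under
WNOHANG, a negative errno on error, a positive value otherwise).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>

/* Model of "raise WNOHANG when a non-blocking pidfd is passed". */
int effective_wait_options(unsigned int f_flags, int options)
{
	if (f_flags & O_NONBLOCK)
		options |= WNOHANG;
	return options;
}

/* Model of the result mapping applied after do_wait() returns. */
long map_wait_result(unsigned int f_flags, long do_wait_ret)
{
	if (!do_wait_ret && (f_flags & O_NONBLOCK))
		return -EAGAIN;
	return do_wait_ret;
}

/* The cases spelled out in the commit message, as assertions. */
void model_examples(void)
{
	/* Non-blocking pidfd: WNOHANG is added, "nothing yet" is EAGAIN. */
	assert(effective_wait_options(O_NONBLOCK, WEXITED) == (WEXITED | WNOHANG));
	assert(map_wait_result(O_NONBLOCK, 0) == -EAGAIN);

	/* No child at all: ECHILD is passed through unchanged. */
	assert(map_wait_result(O_NONBLOCK, -ECHILD) == -ECHILD);

	/* A found task is reported as before. */
	assert(map_wait_result(O_NONBLOCK, 1234) == 1234);

	/* Blocking pidfd: options and results are untouched. */
	assert(effective_wait_options(0, WEXITED) == WEXITED);
	assert(map_wait_result(0, 0) == 0);
}
```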

A concrete use-case that was brought up on-list is Josh's async pidfd library.
Ever since the introduction of pidfds and more advanced async io, various
programming languages such as Rust have grown support for async event
libraries. These libraries help build epoll-based event loops around file
descriptors. A common pattern is to automatically set O_NONBLOCK on all file
descriptors they manage.

Such libraries treat the EAGAIN error code specially. When a function returns
EAGAIN, it isn't called again until the event loop indicates that the file
descriptor is ready. Supporting EAGAIN when waiting on pidfds makes such
libraries just work with little effort.

Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
Link: https://github.com/joshtriplett/async-pidfd
Cc: Kees Cook <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
/* v2 */
- Oleg Nesterov <[email protected]>:
- Remove the eagain_error and simply set EAGAIN in kernel_waitid() if the
pidfd is non-blocking and no child process has exited yet.
---
kernel/exit.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 733e80f334e7..254ea3efe954 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1474,7 +1474,7 @@ static long do_wait(struct wait_opts *wo)
return retval;
}

-static struct pid *pidfd_get_pid(unsigned int fd)
+static struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
{
struct fd f;
struct pid *pid;
@@ -1484,8 +1484,10 @@ static struct pid *pidfd_get_pid(unsigned int fd)
return ERR_PTR(-EBADF);

pid = pidfd_pid(f.file);
- if (!IS_ERR(pid))
+ if (!IS_ERR(pid)) {
get_pid(pid);
+ *flags = f.file->f_flags;
+ }

fdput(f);
return pid;
@@ -1498,6 +1500,7 @@ static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
struct pid *pid = NULL;
enum pid_type type;
long ret;
+ unsigned int f_flags = 0;

if (options & ~(WNOHANG|WNOWAIT|WEXITED|WSTOPPED|WCONTINUED|
__WNOTHREAD|__WCLONE|__WALL))
@@ -1531,9 +1534,10 @@ static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
if (upid < 0)
return -EINVAL;

- pid = pidfd_get_pid(upid);
+ pid = pidfd_get_pid(upid, &f_flags);
if (IS_ERR(pid))
return PTR_ERR(pid);
+
break;
default:
return -EINVAL;
@@ -1544,7 +1548,12 @@ static long kernel_waitid(int which, pid_t upid, struct waitid_info *infop,
wo.wo_flags = options;
wo.wo_info = infop;
wo.wo_rusage = ru;
+ if (f_flags & O_NONBLOCK)
+ wo.wo_flags |= WNOHANG;
+
ret = do_wait(&wo);
+ if (!ret && (f_flags & O_NONBLOCK))
+ ret = -EAGAIN;

put_pid(pid);
return ret;
--
2.28.0

2020-09-02 10:25:08

by Christian Brauner

Subject: [PATCH v2 3/4] tests: port pidfd_wait to kselftest harness

All of the new pidfd selftests already use the new kselftest harness
infrastructure. It makes for clearer output, makes the code easier to
understand, and makes adding new tests way simpler.

Cc: Shuah Khan <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
---
/* v2 */
unchanged
---
tools/testing/selftests/pidfd/pidfd_wait.c | 213 ++++-----------------
1 file changed, 39 insertions(+), 174 deletions(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
index 7079f8eef792..075c716f6fb8 100644
--- a/tools/testing/selftests/pidfd/pidfd_wait.c
+++ b/tools/testing/selftests/pidfd/pidfd_wait.c
@@ -17,7 +17,7 @@
#include <unistd.h>

#include "pidfd.h"
-#include "../kselftest.h"
+#include "../kselftest_harness.h"

#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))

@@ -32,9 +32,8 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
return syscall(__NR_waitid, which, pid, info, options, ru);
}

-static int test_pidfd_wait_simple(void)
+TEST(wait_simple)
{
- const char *test_name = "pidfd wait simple";
int pidfd = -1, status = 0;
pid_t parent_tid = -1;
struct clone_args args = {
@@ -50,76 +49,40 @@ static int test_pidfd_wait_simple(void)
};

pidfd = open("/proc/self", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
- if (pidfd < 0)
- ksft_exit_fail_msg("%s test: failed to open /proc/self %s\n",
- test_name, strerror(errno));
+ ASSERT_GE(pidfd, 0);

pid = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
- if (pid == 0)
- ksft_exit_fail_msg(
- "%s test: succeeded to wait on invalid pidfd %s\n",
- test_name, strerror(errno));
- close(pidfd);
+ ASSERT_NE(pid, 0);
+ EXPECT_EQ(close(pidfd), 0);
pidfd = -1;

pidfd = open("/dev/null", O_RDONLY | O_CLOEXEC);
- if (pidfd == 0)
- ksft_exit_fail_msg("%s test: failed to open /dev/null %s\n",
- test_name, strerror(errno));
+ ASSERT_GE(pidfd, 0);

pid = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
- if (pid == 0)
- ksft_exit_fail_msg(
- "%s test: succeeded to wait on invalid pidfd %s\n",
- test_name, strerror(errno));
- close(pidfd);
+ ASSERT_NE(pid, 0);
+ EXPECT_EQ(close(pidfd), 0);
pidfd = -1;

pid = sys_clone3(&args);
- if (pid < 0)
- ksft_exit_fail_msg("%s test: failed to create new process %s\n",
- test_name, strerror(errno));
+ ASSERT_GE(pid, 1);

if (pid == 0)
exit(EXIT_SUCCESS);

pid = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
- if (pid < 0)
- ksft_exit_fail_msg(
- "%s test: failed to wait on process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
-
- if (!WIFEXITED(info.si_status) || WEXITSTATUS(info.si_status))
- ksft_exit_fail_msg(
- "%s test: unexpected status received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
- close(pidfd);
-
- if (info.si_signo != SIGCHLD)
- ksft_exit_fail_msg(
- "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_signo, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_code != CLD_EXITED)
- ksft_exit_fail_msg(
- "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_code, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_pid != parent_tid)
- ksft_exit_fail_msg(
- "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_pid, parent_tid, pidfd,
- strerror(errno));
-
- ksft_test_result_pass("%s test: Passed\n", test_name);
- return 0;
+ ASSERT_GE(pid, 0);
+ ASSERT_EQ(WIFEXITED(info.si_status), true);
+ ASSERT_EQ(WEXITSTATUS(info.si_status), 0);
+ EXPECT_EQ(close(pidfd), 0);
+
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_EXITED);
+ ASSERT_EQ(info.si_pid, parent_tid);
}

-static int test_pidfd_wait_states(void)
+TEST(wait_states)
{
- const char *test_name = "pidfd wait states";
int pidfd = -1, status = 0;
pid_t parent_tid = -1;
struct clone_args args = {
@@ -135,9 +98,7 @@ static int test_pidfd_wait_states(void)
};

pid = sys_clone3(&args);
- if (pid < 0)
- ksft_exit_fail_msg("%s test: failed to create new process %s\n",
- test_name, strerror(errno));
+ ASSERT_GE(pid, 0);

if (pid == 0) {
kill(getpid(), SIGSTOP);
@@ -145,127 +106,31 @@ static int test_pidfd_wait_states(void)
exit(EXIT_SUCCESS);
}

- ret = sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL);
- if (ret < 0)
- ksft_exit_fail_msg(
- "%s test: failed to wait on WSTOPPED process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
-
- if (info.si_signo != SIGCHLD)
- ksft_exit_fail_msg(
- "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_signo, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_code != CLD_STOPPED)
- ksft_exit_fail_msg(
- "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_code, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_pid != parent_tid)
- ksft_exit_fail_msg(
- "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_pid, parent_tid, pidfd,
- strerror(errno));
-
- ret = sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0);
- if (ret < 0)
- ksft_exit_fail_msg(
- "%s test: failed to send signal to process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
-
- ret = sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL);
- if (ret < 0)
- ksft_exit_fail_msg(
- "%s test: failed to wait WCONTINUED on process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
+ ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL), 0);
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_STOPPED);
+ ASSERT_EQ(info.si_pid, parent_tid);

- if (info.si_signo != SIGCHLD)
- ksft_exit_fail_msg(
- "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_signo, parent_tid, pidfd,
- strerror(errno));
+ ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0), 0);

- if (info.si_code != CLD_CONTINUED)
- ksft_exit_fail_msg(
- "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_code, parent_tid, pidfd,
- strerror(errno));
+ ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_CONTINUED);
+ ASSERT_EQ(info.si_pid, parent_tid);

- if (info.si_pid != parent_tid)
- ksft_exit_fail_msg(
- "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_pid, parent_tid, pidfd,
- strerror(errno));
+ ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WUNTRACED, NULL), 0);
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_STOPPED);
+ ASSERT_EQ(info.si_pid, parent_tid);

- ret = sys_waitid(P_PIDFD, pidfd, &info, WUNTRACED, NULL);
- if (ret < 0)
- ksft_exit_fail_msg(
- "%s test: failed to wait on WUNTRACED process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
+ ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGKILL, NULL, 0), 0);

- if (info.si_signo != SIGCHLD)
- ksft_exit_fail_msg(
- "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_signo, parent_tid, pidfd,
- strerror(errno));
+ ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL), 0);
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_KILLED);
+ ASSERT_EQ(info.si_pid, parent_tid);

- if (info.si_code != CLD_STOPPED)
- ksft_exit_fail_msg(
- "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_code, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_pid != parent_tid)
- ksft_exit_fail_msg(
- "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_pid, parent_tid, pidfd,
- strerror(errno));
-
- ret = sys_pidfd_send_signal(pidfd, SIGKILL, NULL, 0);
- if (ret < 0)
- ksft_exit_fail_msg(
- "%s test: failed to send SIGKILL to process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
-
- ret = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
- if (ret < 0)
- ksft_exit_fail_msg(
- "%s test: failed to wait on WEXITED process with pid %d and pidfd %d: %s\n",
- test_name, parent_tid, pidfd, strerror(errno));
-
- if (info.si_signo != SIGCHLD)
- ksft_exit_fail_msg(
- "%s test: unexpected si_signo value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_signo, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_code != CLD_KILLED)
- ksft_exit_fail_msg(
- "%s test: unexpected si_code value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_code, parent_tid, pidfd,
- strerror(errno));
-
- if (info.si_pid != parent_tid)
- ksft_exit_fail_msg(
- "%s test: unexpected si_pid value %d received after waiting on process with pid %d and pidfd %d: %s\n",
- test_name, info.si_pid, parent_tid, pidfd,
- strerror(errno));
-
- close(pidfd);
-
- ksft_test_result_pass("%s test: Passed\n", test_name);
- return 0;
+ EXPECT_EQ(close(pidfd), 0);
}

-int main(int argc, char **argv)
-{
- ksft_print_header();
- ksft_set_plan(2);
-
- test_pidfd_wait_simple();
- test_pidfd_wait_states();
-
- return ksft_exit_pass();
-}
+TEST_HARNESS_MAIN
--
2.28.0

2020-09-02 10:27:49

by Christian Brauner

Subject: [PATCH v2 4/4] tests: add waitid() tests for non-blocking pidfds

Verify that the PIDFD_NONBLOCK flag works with pidfd_open() and that
waitid() with a non-blocking pidfd returns EAGAIN:

TAP version 13
1..3
# Starting 3 tests from 1 test cases.
# RUN global.wait_simple ...
# OK global.wait_simple
ok 1 global.wait_simple
# RUN global.wait_states ...
# OK global.wait_states
ok 2 global.wait_states
# RUN global.wait_nonblock ...
# OK global.wait_nonblock
ok 3 global.wait_nonblock
# PASSED: 3 / 3 tests passed.
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
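
Outside the kselftest harness, the core of the non-blocking check can be
sketched as a standalone function. This is an illustrative
approximation, not the selftest itself: check_nonblock_wait() and
pidfd_waitid() are hypothetical names, the fallback syscall number 434
and the P_PIDFD value 3 are assumptions for older headers, and kernels
without the required support are reported as "skipped" rather than
failed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#define P_PIDFD 3
#endif
#ifndef PIDFD_NONBLOCK
#define PIDFD_NONBLOCK O_NONBLOCK
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: value on most architectures */
#endif

static int pidfd_waitid(int pidfd, siginfo_t *info, int options)
{
	return syscall(SYS_waitid, P_PIDFD, pidfd, info, options, NULL);
}

/* Returns 0 if the behavior matches, 1 if skipped, -1 on a mismatch. */
int check_nonblock_wait(void)
{
	siginfo_t info = { 0 };
	struct pollfd pfd = { .events = POLLIN };
	int pidfd, ret, result = -1, reaped = 0;
	pid_t pid = fork();

	if (pid < 0)
		return 1;
	if (pid == 0)
		for (;;)
			pause(); /* child: stay alive until killed */

	pidfd = syscall(SYS_pidfd_open, pid, PIDFD_NONBLOCK);
	if (pidfd < 0) {
		result = 1; /* pidfd_open() or PIDFD_NONBLOCK unavailable */
		goto out;
	}

	/* The child exists but has not exited: expect EAGAIN, not a hang. */
	ret = pidfd_waitid(pidfd, &info, WEXITED);
	if (ret < 0 && errno == EINVAL)
		result = 1; /* waitid() does not know P_PIDFD */
	if (ret == 0 || errno != EAGAIN)
		goto out_close;

	/* Once the child is killed the pidfd becomes readable. */
	kill(pid, SIGKILL);
	pfd.fd = pidfd;
	if (poll(&pfd, 1, 5000) != 1)
		goto out_close;

	ret = pidfd_waitid(pidfd, &info, WEXITED);
	if (ret == 0) {
		reaped = 1;
		if (info.si_pid == pid && info.si_code == CLD_KILLED)
			result = 0;
	}

out_close:
	close(pidfd);
out:
	if (!reaped) {
		kill(pid, SIGKILL);
		waitpid(pid, NULL, 0);
	}
	return result;
}

/* Example use: treat "unsupported" as a skip, anything else must pass. */
void require_nonblock_wait(void)
{
	assert(check_nonblock_wait() >= 0);
}
```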

Cc: Shuah Khan <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
---
/* v2 */
unchanged
---
tools/testing/selftests/pidfd/pidfd.h | 4 ++
tools/testing/selftests/pidfd/pidfd_wait.c | 83 +++++++++++++++++++++-
2 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h
index a2c80914e3dc..01f8d3c0cf2c 100644
--- a/tools/testing/selftests/pidfd/pidfd.h
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -46,6 +46,10 @@
#define __NR_pidfd_getfd -1
#endif

+#ifndef PIDFD_NONBLOCK
+#define PIDFD_NONBLOCK O_NONBLOCK
+#endif
+
/*
* The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
* That means, when it wraps around any pid < 300 will be skipped.
diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
index 075c716f6fb8..cefce4d3d2f6 100644
--- a/tools/testing/selftests/pidfd/pidfd_wait.c
+++ b/tools/testing/selftests/pidfd/pidfd_wait.c
@@ -21,6 +21,11 @@

#define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))

+/* Attempt to de-conflict with the selftests tree. */
+#ifndef SKIP
+#define SKIP(s, ...) XFAIL(s, ##__VA_ARGS__)
+#endif
+
static pid_t sys_clone3(struct clone_args *args)
{
return syscall(__NR_clone3, args, sizeof(struct clone_args));
@@ -65,7 +70,7 @@ TEST(wait_simple)
pidfd = -1;

pid = sys_clone3(&args);
- ASSERT_GE(pid, 1);
+ ASSERT_GE(pid, 0);

if (pid == 0)
exit(EXIT_SUCCESS);
@@ -133,4 +138,80 @@ TEST(wait_states)
EXPECT_EQ(close(pidfd), 0);
}

+TEST(wait_nonblock)
+{
+ int pidfd, status = 0;
+ unsigned int flags = 0;
+ pid_t parent_tid = -1;
+ struct clone_args args = {
+ .parent_tid = ptr_to_u64(&parent_tid),
+ .flags = CLONE_PARENT_SETTID,
+ .exit_signal = SIGCHLD,
+ };
+ int ret;
+ pid_t pid;
+ siginfo_t info = {
+ .si_signo = 0,
+ };
+
+ /*
+ * Callers need to see ECHILD with non-blocking pidfds when no child
+ * processes exist.
+ */
+ pidfd = sys_pidfd_open(getpid(), PIDFD_NONBLOCK);
+ EXPECT_GE(pidfd, 0) {
+ /* pidfd_open() doesn't support PIDFD_NONBLOCK. */
+ ASSERT_EQ(errno, EINVAL);
+ SKIP(return, "Skipping PIDFD_NONBLOCK test");
+ }
+
+ pid = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
+ ASSERT_LT(pid, 0);
+ ASSERT_EQ(errno, ECHILD);
+ EXPECT_EQ(close(pidfd), 0);
+
+ pid = sys_clone3(&args);
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ kill(getpid(), SIGSTOP);
+ exit(EXIT_SUCCESS);
+ }
+
+ pidfd = sys_pidfd_open(pid, PIDFD_NONBLOCK);
+ EXPECT_GE(pidfd, 0) {
+ /* pidfd_open() doesn't support PIDFD_NONBLOCK. */
+ ASSERT_EQ(errno, EINVAL);
+ SKIP(return, "Skipping PIDFD_NONBLOCK test");
+ }
+
+ flags = fcntl(pidfd, F_GETFL, 0);
+ ASSERT_GT(flags, 0);
+ ASSERT_GT((flags & O_NONBLOCK), 0);
+
+ /*
+ * Callers need to see EAGAIN/EWOULDBLOCK with non-blocking pidfd when
+ * child processes exist but none have exited.
+ */
+ pid = sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL);
+ ASSERT_LT(pid, 0);
+ ASSERT_EQ(errno, EAGAIN);
+
+ ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL), 0);
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_STOPPED);
+ ASSERT_EQ(info.si_pid, parent_tid);
+
+ ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0), 0);
+
+ ASSERT_EQ(fcntl(pidfd, F_SETFL, (flags & ~O_NONBLOCK)), 0);
+
+ ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WEXITED, NULL), 0);
+ ASSERT_EQ(info.si_signo, SIGCHLD);
+ ASSERT_EQ(info.si_code, CLD_EXITED);
+ ASSERT_EQ(info.si_pid, parent_tid);
+
+ EXPECT_EQ(close(pidfd), 0);
+}
+
TEST_HARNESS_MAIN
--
2.28.0

2020-09-03 14:37:19

by Oleg Nesterov

Subject: Re: [PATCH v2 1/4] pidfd: support PIDFD_NONBLOCK in pidfd_open()

On 09/02, Christian Brauner wrote:
>
> -static int pidfd_create(struct pid *pid)
> +static int pidfd_create(struct pid *pid, unsigned int flags)
> {
> int fd;
>
> fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
> - O_RDWR | O_CLOEXEC);
> + flags | O_RDWR | O_CLOEXEC);
> if (fd < 0)
> put_pid(pid);
>
> @@ -565,7 +567,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> int fd;
> struct pid *p;
>
> - if (flags)
> + if (flags & ~PIDFD_NONBLOCK)
> return -EINVAL;
>
> if (pid <= 0)
> @@ -576,7 +578,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> return -ESRCH;
>
> if (pid_has_task(p, PIDTYPE_TGID))
> - fd = pidfd_create(p);
> + fd = pidfd_create(p, flags);
> else
> fd = -EINVAL;
>

Reviewed-by: Oleg Nesterov <[email protected]>

2020-09-03 14:49:27

by Oleg Nesterov

Subject: Re: [PATCH v2 2/4] exit: support non-blocking pidfds

On 09/02, Christian Brauner wrote:
>
> It also makes the API more consistent and uniform. In essence, waitid() is
> treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
> socket.
> With the addition of support for non-blocking pidfds we support the same
> functionality that sockets do. For sockets, recvmsg() supports MSG_DONTWAIT;
> for pidfds, waitid() supports WNOHANG.

What I personally do not like is that waitid(WNOHANG) returns zero or EAGAIN
depending on f_flags & O_NONBLOCK... This doesn't match recvmsg(MSG_DONTWAIT)
and doesn't look consistent to me.

Nevermind, the patch looks correct and if you think this can really help
user-space I won't argue.

Reviewed-by: Oleg Nesterov <[email protected]>

2020-09-03 14:59:50

by Oleg Nesterov

Subject: Re: [PATCH v2 1/4] pidfd: support PIDFD_NONBLOCK in pidfd_open()

Christian, off-topic question...

On 09/02, Christian Brauner wrote:
>
> -static int pidfd_create(struct pid *pid)
> +static int pidfd_create(struct pid *pid, unsigned int flags)
> {
> int fd;
>
> fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
> - O_RDWR | O_CLOEXEC);
> + flags | O_RDWR | O_CLOEXEC);

I just noticed this comment above pidfd_create:

* Note, that this function can only be called after the fd table has
* been unshared to avoid leaking the pidfd to the new process.

what does it mean?

Of course, if fd table is shared then pidfd can "leak" to another process,
but this is true for any file and sys_pidfd_open() doesn't do any check?



In fact I think this helper buys nothing but adds the unnecessary get/put_pid,
we can kill it and change pidfd_open() to do

SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
{
int fd;
struct pid *p;

if (flags & ~PIDFD_NONBLOCK)
return -EINVAL;

if (pid <= 0)
return -EINVAL;

p = find_get_pid(pid);
if (!p)
return -ESRCH;

fd = -EINVAL;
if (pid_has_task(p, PIDTYPE_TGID)) {
fd = anon_inode_getfd("[pidfd]", &pidfd_fops, p,
flags | O_RDWR | O_CLOEXEC);
}
if (fd < 0)
put_pid(p);
return fd;
}

but this is cosmetic and off-topic too.

Oleg.

2020-09-03 15:28:48

by Christian Brauner

Subject: Re: [PATCH v2 1/4] pidfd: support PIDFD_NONBLOCK in pidfd_open()

On Thu, Sep 03, 2020 at 04:58:09PM +0200, Oleg Nesterov wrote:
> Christian, off-topic question...
>
> On 09/02, Christian Brauner wrote:
> >
> > -static int pidfd_create(struct pid *pid)
> > +static int pidfd_create(struct pid *pid, unsigned int flags)
> > {
> > int fd;
> >
> > fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
> > - O_RDWR | O_CLOEXEC);
> > + flags | O_RDWR | O_CLOEXEC);
>
> I just noticed this comment above pidfd_create:
>
> * Note, that this function can only be called after the fd table has
> * been unshared to avoid leaking the pidfd to the new process.
>
> what does it mean?
>
> Of course, if fd table is shared then pidfd can "leak" to another process,
> but this is true for any file and sys_pidfd_open() doesn't do any check?

It's the same comment we added in kernel/fork.c to make callers aware
that they can leak a pidfd to another process unintentionally. Sure,
this is true of any fd, but since pidfds were a new type of handle, and
a handle on another process at that, we felt that this was important to
spell out. The "can only" should've arguably been "should probably".

>
>
>
> In fact I think this helper buys nothing but adds the unnecessary get/put_pid,
> we can kill it and change pidfd_open() to do
>
> SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> {
> int fd;
> struct pid *p;
>
> if (flags & ~PIDFD_NONBLOCK)
> return -EINVAL;
>
> if (pid <= 0)
> return -EINVAL;
>
> p = find_get_pid(pid);
> if (!p)
> return -ESRCH;
>
> fd = -EINVAL;
> if (pid_has_task(p, PIDTYPE_TGID)) {
> fd = anon_inode_getfd("[pidfd]", &pidfd_fops, p,
> flags | O_RDWR | O_CLOEXEC);
> }
> if (fd < 0)
> put_pid(p);
> return fd;
> }

Sure, I'd totally take a patch like that!

>
> but this is cosmetic and off-topic too.

No, much appreciated. Good-looking code is important. :)

Christian

2020-09-03 15:39:58

by Christian Brauner

Subject: Re: [PATCH v2 2/4] exit: support non-blocking pidfds

On Thu, Sep 03, 2020 at 04:22:42PM +0200, Oleg Nesterov wrote:
> On 09/02, Christian Brauner wrote:
> >
> > It also makes the API more consistent and uniform. In essence, waitid() is
> > treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
> > socket.
> > With the addition of support for non-blocking pidfds we support the same
> > functionality that sockets do. For sockets, recvmsg() supports MSG_DONTWAIT;
> > for pidfds, waitid() supports WNOHANG.
>
> What I personally do not like is that waitid(WNOHANG) returns zero or EAGAIN
> depending on f_flags & O_NONBLOCK... This doesn't match recvmsg(MSG_DONTWAIT)
> and doesn't look consistent to me.

It's not my favorite thing either but async event loops are usually
modeled around EAGAIN so I think this has benefits. Josh can speak more
to that.

Christian

2020-09-03 23:51:33

by Josh Triplett

Subject: Re: [PATCH v2 1/4] pidfd: support PIDFD_NONBLOCK in pidfd_open()

On Wed, Sep 02, 2020 at 12:21:27PM +0200, Christian Brauner wrote:
> Introduce PIDFD_NONBLOCK to support non-blocking pidfd file descriptors.
>
> Ever since the introduction of pidfds and more advanced async io various
> programming languages such as Rust have grown support for async event
> libraries. These libraries are created to help build epoll-based event loops
> around file descriptors. A common pattern is to automatically make all file
> descriptors they manage to O_NONBLOCK.
>
> For such libraries the EAGAIN error code is treated specially. When a function
> is called that returns EAGAIN the function isn't called again until the event
> loop indicates the file descriptor is ready. Supporting EAGAIN when
> waiting on pidfds makes such libraries just work with little effort. In the
> following patch we will extend waitid() internally to support non-blocking
> pidfds.
>
> This introduces a new flag PIDFD_NONBLOCK that is equivalent to O_NONBLOCK.
> This follows the same patterns we have for other (anon inode) file descriptors
> such as EFD_NONBLOCK, IN_NONBLOCK, SFD_NONBLOCK, TFD_NONBLOCK and the same for
> close-on-exec flags.
>
> Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
> Link: https://github.com/joshtriplett/async-pidfd
> Cc: Kees Cook <[email protected]>
> Cc: Sargun Dhillon <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Suggested-by: Josh Triplett <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>

Reviewed-by: Josh Triplett <[email protected]>

2020-09-03 23:58:05

by Josh Triplett

Subject: Re: [PATCH v2 2/4] exit: support non-blocking pidfds

On Wed, Sep 02, 2020 at 12:21:28PM +0200, Christian Brauner wrote:
> Passing a non-blocking pidfd to waitid() currently has no effect, i.e. is not
> supported. There are users which would like to use waitid() on pidfds that are
> O_NONBLOCK and mix it with pidfds that are blocking and both pass them to
> waitid().
> The expected behavior is to have waitid() return -EAGAIN for non-blocking
> pidfds and to block for blocking pidfds without needing to perform any
> additional checks for flags set on the pidfd before passing it to waitid().
> Non-blocking pidfds will return EAGAIN from waitid() when no child process is
> ready yet. Returning -EAGAIN for non-blocking pidfds makes it easier for event
> loops that handle EAGAIN specially.
>
> It also makes the API more consistent and uniform. In essence, waitid() is
> treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
> socket.
> With the addition of support for non-blocking pidfds we support the same
> functionality that sockets do. For sockets, recvmsg() supports MSG_DONTWAIT;
> for pidfds, waitid() supports WNOHANG. Both flags are per-call options. In
> contrast non-blocking pidfds and non-blocking sockets are a setting on an open
> file description affecting all threads in the calling process as well as other
> processes that hold file descriptors referring to the same open file
> description. Both behaviors, per call and per open file description, have
> genuine use-cases.
>
> The implementation should be straightforward, we simply raise the WNOHANG flag
> when a non-blocking pidfd is passed and when do_wait() returns without finding
> an eligible task and the pidfd is non-blocking we set EAGAIN. If no child
> process exists non-blocking pidfd users will continue to see ECHILD but if
> child processes exist but have not yet exited users will see EAGAIN.
>
> A concrete use-case that was brought on-list was Josh's async pidfd library.
> Ever since the introduction of pidfds and more advanced async io various
> programming languages such as Rust have grown support for async event
> libraries. These libraries are created to help build epoll-based event loops
> around file descriptors. A common pattern is to automatically set
> O_NONBLOCK on all file descriptors they manage.
>
> For such libraries the EAGAIN error code is treated specially. When a function
> is called that returns EAGAIN the function isn't called again until the event
> loop indicates the file descriptor is ready. Supporting EAGAIN when
> waiting on pidfds makes such libraries just work with little effort.
>
> Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
> Link: https://github.com/joshtriplett/async-pidfd
> Cc: Kees Cook <[email protected]>
> Cc: Sargun Dhillon <[email protected]>
> Cc: Jann Horn <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: "Peter Zijlstra (Intel)" <[email protected]>
> Suggested-by: Josh Triplett <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>

With or without the discussed change to WNOHANG behavior for
compatibility:
Reviewed-by: Josh Triplett <[email protected]>

Also, I think you should flip the order of patches 1 and 2, so that
there isn't a one-patch window in kernel history where you can create an
O_NONBLOCK pidfd with pidfd_open but it has no effect. I'd expect
userspace to use pidfd_open accepting or EINVAL-ing the flag as an
indication of whether it'll work.
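
A minimal userspace sketch of that probing, assuming the flag lands as
PIDFD_NONBLOCK with the same value as O_NONBLOCK (as patch 1/4 describes).
The helper name open_pidfd_prefer_nonblock() is hypothetical, and the
fallback #defines are assumptions for toolchains whose headers predate
pidfd_open or the new flag:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* assumption: generic syscall number, for old headers */
#endif
#ifndef PIDFD_NONBLOCK
#define PIDFD_NONBLOCK O_NONBLOCK /* per patch 1/4: equivalent to O_NONBLOCK */
#endif

/*
 * Hypothetical helper: try to open a non-blocking pidfd first. If the
 * kernel rejects the flag with EINVAL, fall back to a blocking pidfd and
 * report that non-blocking waitid() semantics are not available.
 * Returns the fd, or -1 with errno set.
 */
static int open_pidfd_prefer_nonblock(pid_t pid, bool *nonblocking)
{
	int fd = (int)syscall(SYS_pidfd_open, pid, PIDFD_NONBLOCK);

	if (fd >= 0) {
		*nonblocking = true;
		return fd;
	}

	*nonblocking = false;
	if (errno != EINVAL)
		return -1; /* e.g. ESRCH, ENOSYS: not a flag problem */

	/* Flag rejected: retry without it and let the caller block or poll. */
	return (int)syscall(SYS_pidfd_open, pid, 0);
}
```

A caller would check *nonblocking before relying on EAGAIN from waitid();
on the fallback path it would have to poll the pidfd before waiting.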

2020-09-03 23:58:22

by Josh Triplett

Subject: Re: [PATCH v2 2/4] exit: support non-blocking pidfds

On Thu, Sep 03, 2020 at 05:38:47PM +0200, Christian Brauner wrote:
> On Thu, Sep 03, 2020 at 04:22:42PM +0200, Oleg Nesterov wrote:
> > On 09/02, Christian Brauner wrote:
> > >
> > > It also makes the API more consistent and uniform. In essence, waitid() is
> > > treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
> > > socket.
> > > With the addition of support for non-blocking pidfds we support the same
> > > functionality that sockets do. For sockets, recvmsg() supports MSG_DONTWAIT;
> > > for pidfds, waitid() supports WNOHANG.
> >
> > What I personally do not like is that waitid(WNOHANG) returns zero or EAGAIN
> > depending on f_flags & O_NONBLOCK... This doesn't match recvmsg(MSG_DONTWAIT)
> > and doesn't look consistent to me.
>
> It's not my favorite thing either but async event loops are usually
> modeled around EAGAIN so I think this has benefits. Josh can speak more
> to that.

I wouldn't expect the same application to use both WNOHANG and
O_NONBLOCK, since the latter makes the former unnecessary. I'd have no
objection if WNOHANG continued to have the same "return 0 and you have
to check the structure to figure out what that means" behavior
regardless of the fd flags, for compatibility with an application or
library that expects that behavior with WNOHANG and didn't expect the
return value to change with a non-blocking fd. waitid could just return
EAGAIN on a non-blocking fd if *not* passed WNOHANG, which would make
pidfd Just Work in non-blocking event loops.
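
As a rough userspace illustration of the two conventions, here is a sketch
of how a caller might classify the result of waitid() on a pidfd. The enum
and the function classify_waitid() are hypothetical names, not part of any
existing API. It assumes the caller zeroed si_pid before the call, which
is how the WNOHANG "return 0 and check the structure" case is normally
told apart from a reaped child:

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome type for one waitid() attempt. */
enum wait_outcome {
	WAIT_REAPED,    /* a child changed state; siginfo is filled in */
	WAIT_NOT_READY, /* nothing to report yet; wait for readiness */
	WAIT_NO_CHILD,  /* ECHILD: nothing to wait for */
	WAIT_FAILED,    /* any other error */
};

/*
 * rc     - return value of waitid()
 * err    - errno captured right after the call
 * si_pid - info.si_pid after the call (zeroed by the caller beforehand)
 */
static enum wait_outcome classify_waitid(int rc, int err, pid_t si_pid)
{
	if (rc == 0) {
		/* WNOHANG convention: success with si_pid still 0 means "not yet". */
		return si_pid != 0 ? WAIT_REAPED : WAIT_NOT_READY;
	}
	if (err == EAGAIN)
		return WAIT_NOT_READY; /* non-blocking pidfd, no WNOHANG passed */
	if (err == ECHILD)
		return WAIT_NO_CHILD;
	return WAIT_FAILED;
}
```

An event loop would re-arm its poll on WAIT_NOT_READY in either case, so
code written against the WNOHANG convention keeps working unchanged.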

2020-09-04 00:02:29

by Josh Triplett

Subject: Re: [PATCH v2 0/4] Support non-blocking pidfds

On Wed, Sep 02, 2020 at 12:21:26PM +0200, Christian Brauner wrote:
> Hi,
>
> Passing a non-blocking pidfd to waitid() currently has no effect, i.e.
> is not supported. There are users which would like to use waitid() on
> pidfds that are O_NONBLOCK and mix it with pidfds that are blocking and
> both pass them to waitid().
> The expected behavior is to have waitid() return -EAGAIN for
> non-blocking pidfds and to block for blocking pidfds without needing to
> perform any additional checks for flags set on the pidfd before passing
> it to waitid().
> Non-blocking pidfds will return EAGAIN from waitid() when no child
> process is ready yet. Returning -EAGAIN for non-blocking pidfds makes it
> easier for event loops that handle EAGAIN specially.
>
> It also makes the API more consistent and uniform. In essence, waitid()
> is treated like a read on a non-blocking pidfd or a recvmsg() on a
> non-blocking socket.
> With the addition of support for non-blocking pidfds we support the same
> functionality that sockets do. For sockets, recvmsg() supports
> MSG_DONTWAIT; for pidfds, waitid() supports WNOHANG. Both flags are
> per-call options. In contrast non-blocking pidfds and non-blocking
> sockets are a setting on an open file description affecting all threads
> in the calling process as well as other processes that hold file
> descriptors referring to the same open file description. Both behaviors,
> per call and per open file description, have genuine use-cases.
>
> A concrete use-case that was brought on-list (see [1]) was Josh's async
> pidfd library. Ever since the introduction of pidfds and more advanced
> async io various programming languages such as Rust have grown support
> for async event libraries. These libraries are created to help build
> epoll-based event loops around file descriptors. A common pattern is to
> automatically set O_NONBLOCK on all file descriptors they manage.
>
> For such libraries the EAGAIN error code is treated specially. When a
> function is called that returns EAGAIN the function isn't called again
> until the event loop indicates the file descriptor is ready.
> Supporting EAGAIN when waiting on pidfds makes such libraries just work
> with little effort.

Thanks for the patch series, Christian!

This will make it much easier to use pidfd in non-blocking event loops.

Reviewed-by: Josh Triplett <[email protected]>

- Josh Triplett

2020-09-04 10:30:27

by Christian Brauner

Subject: Re: [PATCH v2 2/4] exit: support non-blocking pidfds

On Thu, Sep 03, 2020 at 04:56:59PM -0700, Josh Triplett wrote:
> On Wed, Sep 02, 2020 at 12:21:28PM +0200, Christian Brauner wrote:
> > Passing a non-blocking pidfd to waitid() currently has no effect, i.e. is not
> > supported. There are users which would like to use waitid() on pidfds that are
> > O_NONBLOCK and mix it with pidfds that are blocking and both pass them to
> > waitid().
> > The expected behavior is to have waitid() return -EAGAIN for non-blocking
> > pidfds and to block for blocking pidfds without needing to perform any
> > additional checks for flags set on the pidfd before passing it to waitid().
> > Non-blocking pidfds will return EAGAIN from waitid() when no child process is
> > ready yet. Returning -EAGAIN for non-blocking pidfds makes it easier for event
> > loops that handle EAGAIN specially.
> >
> > It also makes the API more consistent and uniform. In essence, waitid() is
> > treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking
> > socket.
> > With the addition of support for non-blocking pidfds we support the same
> > > functionality that sockets do. For sockets, recvmsg() supports MSG_DONTWAIT;
> > > for pidfds, waitid() supports WNOHANG. Both flags are per-call options. In
> > contrast non-blocking pidfds and non-blocking sockets are a setting on an open
> > file description affecting all threads in the calling process as well as other
> > processes that hold file descriptors referring to the same open file
> > description. Both behaviors, per call and per open file description, have
> > genuine use-cases.
> >
> > The implementation should be straightforward, we simply raise the WNOHANG flag
> > when a non-blocking pidfd is passed and when do_wait() returns without finding
> > an eligible task and the pidfd is non-blocking we set EAGAIN. If no child
> > process exists non-blocking pidfd users will continue to see ECHILD but if
> > child processes exist but have not yet exited users will see EAGAIN.
> >
> > A concrete use-case that was brought on-list was Josh's async pidfd library.
> > Ever since the introduction of pidfds and more advanced async io various
> > programming languages such as Rust have grown support for async event
> > libraries. These libraries are created to help build epoll-based event loops
> > > around file descriptors. A common pattern is to automatically set
> > > O_NONBLOCK on all file descriptors they manage.
> >
> > For such libraries the EAGAIN error code is treated specially. When a function
> > is called that returns EAGAIN the function isn't called again until the event
> > > loop indicates the file descriptor is ready. Supporting EAGAIN when
> > waiting on pidfds makes such libraries just work with little effort.
> >
> > Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
> > Link: https://github.com/joshtriplett/async-pidfd
> > Cc: Kees Cook <[email protected]>
> > Cc: Sargun Dhillon <[email protected]>
> > Cc: Jann Horn <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: "Peter Zijlstra (Intel)" <[email protected]>
> > Suggested-by: Josh Triplett <[email protected]>
> > Signed-off-by: Christian Brauner <[email protected]>
>
> With or without the discussed change to WNOHANG behavior for
> compatibility:
> Reviewed-by: Josh Triplett <[email protected]>

I think that WNOHANG compatibility change might be a good idea. So I've
changed this to:

	ret = do_wait(&wo);
	if (!ret && !(options & WNOHANG) && (f_flags & O_NONBLOCK))
		ret = -EAGAIN;
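
To make the resulting behavior easier to see, the condition can be
restated as a small standalone function that compiles in userspace. This
is only a model of the check, not kernel code: nonblock_wait_result() is
a hypothetical name, and ret stands in for whatever do_wait() returned:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>

/*
 * Model of the check: a zero return from the wait (no eligible task
 * found) is turned into -EAGAIN only when the pidfd is O_NONBLOCK and
 * the caller did not ask for WNOHANG itself. Everything else passes
 * through untouched.
 */
static int nonblock_wait_result(int ret, int options, unsigned int f_flags)
{
	if (!ret && !(options & WNOHANG) && (f_flags & O_NONBLOCK))
		return -EAGAIN;
	return ret;
}
```

With this shape, an explicit WNOHANG keeps returning 0 regardless of the
fd flags, and errors such as -ECHILD are reported as before.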

>
> Also, I think you should flip the order of patches 1 and 2, so that
> there isn't a one-patch window in kernel history where you can create an
> O_NONBLOCK pidfd with pidfd_open but it has no effect. I'd expect
> userspace to use pidfd_open accepting or EINVAL-ing the flag as an
> indication of whether it'll work.

Good point! I've changed the order now.

Thanks!
Christian

2020-09-04 10:38:35

by Christian Brauner

Subject: Re: [PATCH v2 0/4] Support non-blocking pidfds

On Thu, Sep 03, 2020 at 04:58:55PM -0700, Josh Triplett wrote:
> On Wed, Sep 02, 2020 at 12:21:26PM +0200, Christian Brauner wrote:
> > Hi,
> >
> > Passing a non-blocking pidfd to waitid() currently has no effect, i.e.
> > is not supported. There are users which would like to use waitid() on
> > pidfds that are O_NONBLOCK and mix it with pidfds that are blocking and
> > both pass them to waitid().
> > The expected behavior is to have waitid() return -EAGAIN for
> > non-blocking pidfds and to block for blocking pidfds without needing to
> > perform any additional checks for flags set on the pidfd before passing
> > it to waitid().
> > Non-blocking pidfds will return EAGAIN from waitid() when no child
> > process is ready yet. Returning -EAGAIN for non-blocking pidfds makes it
> > easier for event loops that handle EAGAIN specially.
> >
> > It also makes the API more consistent and uniform. In essence, waitid()
> > is treated like a read on a non-blocking pidfd or a recvmsg() on a
> > non-blocking socket.
> > With the addition of support for non-blocking pidfds we support the same
> > functionality that sockets do. For sockets, recvmsg() supports
> > MSG_DONTWAIT; for pidfds, waitid() supports WNOHANG. Both flags are
> > per-call options. In contrast non-blocking pidfds and non-blocking
> > sockets are a setting on an open file description affecting all threads
> > in the calling process as well as other processes that hold file
> > descriptors referring to the same open file description. Both behaviors,
> > per call and per open file description, have genuine use-cases.
> >
> > A concrete use-case that was brought on-list (see [1]) was Josh's async
> > pidfd library. Ever since the introduction of pidfds and more advanced
> > async io various programming languages such as Rust have grown support
> > for async event libraries. These libraries are created to help build
> > epoll-based event loops around file descriptors. A common pattern is to
> > automatically set O_NONBLOCK on all file descriptors they manage.
> >
> > For such libraries the EAGAIN error code is treated specially. When a
> > function is called that returns EAGAIN the function isn't called again
> > until the event loop indicates the file descriptor is ready.
> > Supporting EAGAIN when waiting on pidfds makes such libraries just work
> > with little effort.
>
> Thanks for the patch series, Christian!
>
> This will make it much easier to use pidfd in non-blocking event loops.
>
> Reviewed-by: Josh Triplett <[email protected]>

Thank you and thanks for your input on a bunch of other stuff as well. :)

Christian