From: Christian Brauner
To: linux-kernel@vger.kernel.org
Cc: Christian Brauner, "Peter Zijlstra (Intel)", Ingo Molnar,
 Thomas Gleixner, Oleg Nesterov, "Eric W. Biederman", Kees Cook,
 Sargun Dhillon, Aleksa Sarai, linux-kselftest@vger.kernel.org,
 Josh Triplett, Jens Axboe, linux-api@vger.kernel.org, Christian Brauner
Subject: [PATCH v2 1/4] pidfd: support PIDFD_NONBLOCK in pidfd_open()
Date: Wed, 2 Sep 2020 12:21:27 +0200
Message-Id: <20200902102130.147672-2-christian.brauner@ubuntu.com>
In-Reply-To: <20200902102130.147672-1-christian.brauner@ubuntu.com>
References: <20200902102130.147672-1-christian.brauner@ubuntu.com>

Introduce PIDFD_NONBLOCK to support non-blocking pidfd file
descriptors.

Ever since the introduction of pidfds and more advanced async I/O,
various programming languages such as Rust have grown support for
async event libraries. These libraries are created to help build
epoll-based event loops around file descriptors. A common pattern is
to automatically put every file descriptor they manage into
O_NONBLOCK mode.

Such libraries treat the EAGAIN error code specially: when a function
returns EAGAIN, it isn't called again until the event loop indicates
that the file descriptor is ready. Supporting EAGAIN when waiting on
pidfds lets such libraries just work with little effort. In the
following patch we will extend waitid() internally to support
non-blocking pidfds.

This introduces a new flag PIDFD_NONBLOCK that is equivalent to
O_NONBLOCK. This follows the same pattern we have for other (anon
inode) file descriptors such as EFD_NONBLOCK, IN_NONBLOCK,
SFD_NONBLOCK and TFD_NONBLOCK, as well as for the corresponding
close-on-exec flags.
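The try-first, wait-on-EAGAIN convention described above can be sketched in userspace C. This is a minimal sketch under stated assumptions: an ordinary pipe stands in for a pidfd (a pidfd reports readable once the process it refers to exits, and non-blocking waitid() support only arrives with the following patch), and the helpers set_nonblocking(), try_read_once() and wait_readable() are hypothetical names used for illustration, not part of any kernel or library API.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of a single attempt made from inside an event loop. */
enum attempt {
	ATTEMPT_DONE,        /* the call completed; *out is valid */
	ATTEMPT_WOULD_BLOCK, /* EAGAIN: park the fd until it is ready */
	ATTEMPT_ERROR        /* a real error; errno describes it */
};

/* Hypothetical helper: what async runtimes typically do to every fd they manage. */
static int set_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return -1;
	return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: try the operation once; never blocks on an O_NONBLOCK fd. */
static enum attempt try_read_once(int fd, char *buf, size_t len, ssize_t *out)
{
	ssize_t n = read(fd, buf, len);

	if (n >= 0) {
		*out = n;
		return ATTEMPT_DONE;
	}
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return ATTEMPT_WOULD_BLOCK;
	return ATTEMPT_ERROR;
}

/* Hypothetical helper for the event loop's side: did the fd become readable? */
static bool wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN);
}
```

On an empty non-blocking pipe, try_read_once() reports ATTEMPT_WOULD_BLOCK instead of blocking; once the other end is written to, wait_readable() returns true and the retried read completes. An event loop driving a pidfd would follow the same shape, with readiness meaning "the process has exited".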
Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/
Link: https://github.com/joshtriplett/async-pidfd
Cc: Kees Cook
Cc: Sargun Dhillon
Cc: Oleg Nesterov
Suggested-by: Josh Triplett
Signed-off-by: Christian Brauner
---
/* v2 */
- Christian Brauner:
  - Improve commit message.
---
 include/uapi/linux/pidfd.h | 12 ++++++++++++
 kernel/pid.c               | 12 +++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)
 create mode 100644 include/uapi/linux/pidfd.h

diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
new file mode 100644
index 000000000000..5406fbc13074
--- /dev/null
+++ b/include/uapi/linux/pidfd.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_LINUX_PIDFD_H
+#define _UAPI_LINUX_PIDFD_H
+
+#include
+#include
+
+/* Flags for pidfd_open(). */
+#define PIDFD_NONBLOCK O_NONBLOCK
+
+#endif /* _UAPI_LINUX_PIDFD_H */
diff --git a/kernel/pid.c b/kernel/pid.c
index b2562a7ce525..74ddbff1a6ba 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 
 struct pid init_struct_pid = {
 	.count = REFCOUNT_INIT(1),
@@ -522,7 +523,8 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 /**
  * pidfd_create() - Create a new pid file descriptor.
  *
- * @pid:  struct pid that the pidfd will reference
+ * @pid:   struct pid that the pidfd will reference
+ * @flags: flags to pass
  *
  * This creates a new pid file descriptor with the O_CLOEXEC flag set.
  *
@@ -532,12 +534,12 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
  * Return: On success, a cloexec pidfd is returned.
  *         On error, a negative errno number will be returned.
  */
-static int pidfd_create(struct pid *pid)
+static int pidfd_create(struct pid *pid, unsigned int flags)
 {
 	int fd;
 
 	fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid),
-			      O_RDWR | O_CLOEXEC);
+			      flags | O_RDWR | O_CLOEXEC);
 	if (fd < 0)
 		put_pid(pid);
 
@@ -565,7 +567,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
 	int fd;
 	struct pid *p;
 
-	if (flags)
+	if (flags & ~PIDFD_NONBLOCK)
 		return -EINVAL;
 
 	if (pid <= 0)
@@ -576,7 +578,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
 		return -ESRCH;
 
 	if (pid_has_task(p, PIDTYPE_TGID))
-		fd = pidfd_create(p);
+		fd = pidfd_create(p, flags);
 	else
 		fd = -EINVAL;
 
-- 
2.28.0