From: Christian Brauner
To: linux-kernel@vger.kernel.org, oleg@redhat.com
Cc: arnd@arndb.de, ebiederm@xmission.com, keescook@chromium.org, joel@joelfernandes.org, tglx@linutronix.de, tj@kernel.org, dhowells@redhat.com, jannh@google.com, luto@kernel.org, akpm@linux-foundation.org, cyphar@cyphar.com, torvalds@linux-foundation.org, viro@zeniv.linux.org.uk, kernel-team@android.com, Christian Brauner, linux-api@vger.kernel.org
Subject: [PATCH 2/5] pidfd: add pidfd_wait()
Date: Wed, 24 Jul 2019 16:46:48 +0200
Message-Id: <20190724144651.28272-3-christian@brauner.io>
In-Reply-To: <20190724144651.28272-1-christian@brauner.io>

This adds the pidfd_wait() syscall.

One of the last remaining bits for the pidfd API is the ability to wait on pidfds. With this syscall implemented, the parts of userspace that want to use this API can finally switch to managing processes completely through pidfds if they so desire (cf. [1]).
The pidfd_wait() syscall does not allow scoping of the process identified by the pidfd, i.e. it explicitly does not try to mirror the behavior of wait4(-1), wait4(0), waitid(P_ALL), waitid(P_PGID), etc. It only allows for semantics equivalent to wait4(pid) and waitid(P_PID). Users that need scoping should rely on the pid-based wait*() syscalls for now.

pidfd_wait() lets the caller specify which state changes to wait for. The states can be or-ed together and are passed in the states argument:

WEXITED      Wait for children that have terminated.
WSTOPPED     Wait for children that have been stopped by delivery of a signal.
WCONTINUED   Wait for (previously stopped) children that have been resumed by delivery of SIGCONT.
WUNTRACED    Return if a child has stopped.

The behavior of pidfd_wait() can be further modified by specifying the following or-able options in the flags argument:

__WCLONE     Only wait for a process that delivers no signal, or a signal other than SIGCHLD, to the parent on termination.
__WALL       Wait for all children, independent of whether they deliver SIGCHLD, another signal, or no signal at all to the parent on termination.
__WNOTHREAD  Do not wait for children of other threads in the same thread group.
WNOHANG      Return immediately if no child has exited.
WNOWAIT      Leave the child in a waitable state.

pidfd_wait() takes an additional siginfo_t argument. If it is non-NULL, pidfd_wait() will fill in si_pid, si_uid, si_signo, si_status, and si_code. The si_code field will be set to one of CLD_EXITED, CLD_KILLED, CLD_DUMPED, CLD_STOPPED, CLD_TRAPPED, or CLD_CONTINUED.

Information about the resource usage of the process in question is returned in the struct rusage argument of pidfd_wait().

On success, pidfd_wait() will return the pid of the process the pidfd referred to. If WNOHANG is set and the process has not changed state yet, 0 is returned. On failure, a negative error code will be returned.

/* Prior approach */

The first implementation was based on a flag WPIDFD which was added to the wait*() system calls.
However, that involved passing the pidfd through the pid_t pid argument and doing in-kernel type switching based on the flag, which felt like a really unclean solution and overall like a mishmash of two APIs. This is something we have luckily avoided so far, and I think we're better off in the long run if we keep it that way.

/* References */
[1]: https://github.com/systemd/systemd/issues/13101

Signed-off-by: Christian Brauner
Cc: Arnd Bergmann
Cc: "Eric W. Biederman"
Cc: Kees Cook
Cc: Joel Fernandes (Google)
Cc: Thomas Gleixner
Cc: Tejun Heo
Cc: David Howells
Cc: Jann Horn
Cc: Andy Lutomirski
Cc: Andrew Morton
Cc: Oleg Nesterov
Cc: Aleksa Sarai
Cc: Linus Torvalds
Cc: Al Viro
Cc: linux-api@vger.kernel.org
---
 include/linux/pid.h |  5 +++
 kernel/exit.c       | 87 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c       |  8 +++++
 kernel/signal.c     |  7 ++--
 4 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 2a83e434db9d..443cd4108943 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -72,6 +72,11 @@ extern struct pid init_struct_pid;
 
 extern const struct file_operations pidfd_fops;
 
+struct file;
+
+extern struct pid *pidfd_pid(const struct file *file);
+
+
 static inline struct pid *get_pid(struct pid *pid)
 {
 	if (pid)
diff --git a/kernel/exit.c b/kernel/exit.c
index 73392a455b72..8086c76e1959 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1738,3 +1738,90 @@ __weak void abort(void)
 	panic("Oops failed to kill thread");
 }
 EXPORT_SYMBOL(abort);
+
+static int copy_rusage_to_user_any(struct rusage *kru, struct rusage __user *ru)
+{
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		return put_compat_rusage(kru, (struct compat_rusage __user *)ru);
+#endif
+	return copy_to_user(ru, kru, sizeof(*kru));
+}
+
+static int copy_siginfo_to_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)
+{
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		return copy_siginfo_to_user32(
+			(struct compat_siginfo __user *)info, kinfo);
+#endif
+	return copy_siginfo_to_user(info, kinfo);
+}
+
+SYSCALL_DEFINE6(pidfd_wait, int, pidfd, int __user *, stat_addr,
+		siginfo_t __user *, info, struct rusage __user *, ru,
+		unsigned int, states, unsigned int, flags)
+{
+	long ret;
+	struct fd f;
+	struct pid *pid;
+	struct wait_opts wo;
+	struct rusage kru = {};
+	kernel_siginfo_t kinfo = {
+		.si_signo = 0,
+	};
+
+	if (pidfd < 0)
+		return -EINVAL;
+
+	if (states & ~(WEXITED | WSTOPPED | WCONTINUED | WUNTRACED))
+		return -EINVAL;
+
+	if (!(states & (WEXITED | WSTOPPED | WCONTINUED | WUNTRACED)))
+		return -EINVAL;
+
+	if (flags & ~(__WNOTHREAD | __WCLONE | __WALL | WNOWAIT | WNOHANG))
+		return -EINVAL;
+
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
+
+	pid = pidfd_pid(f.file);
+	if (IS_ERR(pid)) {
+		ret = PTR_ERR(pid);
+		goto out_fdput;
+	}
+
+	wo = (struct wait_opts){
+		.wo_type = PIDTYPE_PID,
+		.wo_pid = pid,
+		.wo_flags = states | flags,
+		.wo_info = info ? &kinfo : NULL,
+		.wo_rusage = ru ? &kru : NULL,
+	};
+
+	ret = do_wait(&wo);
+	if (ret > 0) {
+		kinfo.si_signo = SIGCHLD;
+
+		if (stat_addr && put_user(wo.wo_stat, stat_addr)) {
+			ret = -EFAULT;
+			goto out_fdput;
+		}
+
+		if (ru && copy_rusage_to_user_any(&kru, ru)) {
+			ret = -EFAULT;
+			goto out_fdput;
+		}
+	} else {
+		kinfo.si_signo = 0;
+	}
+
+	if (info && copy_siginfo_to_user_any(&kinfo, info))
+		ret = -EFAULT;
+
+out_fdput:
+	fdput(f);
+	return ret;
+}
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..baaff6570517 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1743,6 +1743,14 @@ const struct file_operations pidfd_fops = {
 #endif
 };
 
+struct pid *pidfd_pid(const struct file *file)
+{
+	if (file->f_op == &pidfd_fops)
+		return file->private_data;
+
+	return ERR_PTR(-EBADF);
+}
+
 static void __delayed_free_task(struct rcu_head *rhp)
 {
 	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
diff --git a/kernel/signal.c b/kernel/signal.c
index 91b789dd6e72..2e567f64812f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3672,8 +3672,11 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)
 
 static struct pid *pidfd_to_pid(const struct file *file)
 {
-	if (file->f_op == &pidfd_fops)
-		return file->private_data;
+	struct pid *pid;
+
+	pid = pidfd_pid(file);
+	if (!IS_ERR(pid))
+		return pid;
 
 	return tgid_pidfd_to_pid(file);
 }
-- 
2.22.0
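The argument checks at the top of the pidfd_wait() implementation in this patch can be exercised outside the kernel. The following is a minimal userspace sketch and is not part of the patch. pidfd_wait_check_args() and the PW_* constants are hypothetical names chosen for this example. The values are copied from the kernel's include/uapi/linux/wait.h so that the sketch does not depend on which libc headers expose the __W* flags. It returns -EINVAL in the same cases the syscall would, before the syscall ever looks up the file descriptor.

```c
#include <assert.h>
#include <errno.h>

/* Values mirror include/uapi/linux/wait.h; PW_ prefix is this sketch's own. */
#define PW_WNOHANG     0x00000001u
#define PW_WUNTRACED   0x00000002u
#define PW_WSTOPPED    PW_WUNTRACED /* same bit as WUNTRACED */
#define PW_WEXITED     0x00000004u
#define PW_WCONTINUED  0x00000008u
#define PW_WNOWAIT     0x01000000u
#define PW_WNOTHREAD   0x20000000u /* __WNOTHREAD */
#define PW_WALL        0x40000000u /* __WALL */
#define PW_WCLONE      0x80000000u /* __WCLONE */

/* Bits accepted in the states argument. */
#define PW_STATES_MASK \
	(PW_WEXITED | PW_WSTOPPED | PW_WCONTINUED | PW_WUNTRACED)

/* Bits accepted in the flags argument. */
#define PW_FLAGS_MASK \
	(PW_WNOTHREAD | PW_WCLONE | PW_WALL | PW_WNOWAIT | PW_WNOHANG)

/*
 * Hypothetical helper: returns 0 if (pidfd, states, flags) would pass the
 * early validation in pidfd_wait(), or -EINVAL otherwise.
 */
static int pidfd_wait_check_args(int pidfd, unsigned int states,
				 unsigned int flags)
{
	/* A negative descriptor is rejected before fdget(). */
	if (pidfd < 0)
		return -EINVAL;

	/* Unknown bits in states are rejected. */
	if (states & ~PW_STATES_MASK)
		return -EINVAL;

	/* At least one state change must be requested. */
	if (!(states & PW_STATES_MASK))
		return -EINVAL;

	/* Unknown bits in flags are rejected. */
	if (flags & ~PW_FLAGS_MASK)
		return -EINVAL;

	return 0;
}
```

Because WSTOPPED and WUNTRACED share a bit in the uapi header, listing both in the states mask is redundant but harmless; the sketch simply follows the patch. Splitting states and flags into two arguments also means that, for example, WNOHANG passed in states or WEXITED passed in flags is rejected rather than silently accepted.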
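The commit message says pidfd_wait() only offers semantics equivalent to wait4(pid) and waitid(P_PID). As a rough illustration of the siginfo fields it promises to fill in, here is a sketch that uses the existing pid-based waitid(P_PID, ...) on a forked child. reap_child_by_pid() is a hypothetical helper. The expectation that pidfd_wait() would report the same si_signo, si_code, and si_status values is an assumption drawn from the description above; this code does not call pidfd_wait() at all.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with exit_code, then reap
 * exactly that child with waitid(P_PID, ...) and store the result in *info.
 * Returns 0 on success, -1 on failure (errno is set by fork or waitid).
 */
static int reap_child_by_pid(int exit_code, siginfo_t *info)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(exit_code); /* child: terminate immediately */

	memset(info, 0, sizeof(*info));

	/* Wait only for this pid to terminate, like wait4(pid). */
	if (waitid(P_PID, (id_t)pid, info, WEXITED) < 0)
		return -1;

	return 0;
}
```

For a child that exits normally, waitid() reports si_signo as SIGCHLD, si_code as CLD_EXITED, and si_status as the exit code; a child killed by a signal would instead show CLD_KILLED or CLD_DUMPED with the signal number in si_status.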