From: Christian Brauner
To: torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
    jannh@google.com, dhowells@redhat.com, linux-api@vger.kernel.org,
    linux-kernel@vger.kernel.org
Cc: serge@hallyn.com, luto@kernel.org, arnd@arndb.de,
    ebiederm@xmission.com, keescook@chromium.org, tglx@linutronix.de,
    mtk.manpages@gmail.com, akpm@linux-foundation.org, oleg@redhat.com,
    cyphar@cyphar.com, joel@joelfernandes.org, dancol@google.com,
    Christian Brauner
Subject: [PATCH 0/4] clone: add CLONE_PIDFD
Date: Sun, 14 Apr 2019 22:14:32 +0200
Message-Id: <20190414201436.19502-1-christian@brauner.io>
X-Mailing-List: linux-kernel@vger.kernel.org

Hey Linus,

This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call, as previously discussed.

As decided last week [1], Jann and I have refined the implementation of
pidfds as anonymous inodes. Based on last week's RFC we have only
tweaked documentation and naming, and made the sample program, which
shows how to get easy metadata access from a pidfd, a little cleaner
and more paranoid when checking for errors.
The sample program can also serve as a test for the patchset.

When clone() is called with CLONE_PIDFD, a pidfd instead of a pid will
be returned. To make it possible for users of CLONE_PIDFD to apply
standard error checking that is common all across userspace, file
descriptor numbering for pidfds starts at 1 and not 0. This has the
major advantage that users can do:

        int pidfd = clone(CLONE_PIDFD);
        if (pidfd < 0) {
                /* handle error */
                exit(EXIT_FAILURE);
        }

        if (pidfd == 0) {
                /* child */
                exit(EXIT_SUCCESS);
        }

        /* parent */
        exit(EXIT_SUCCESS);

We have also taken care that pidfds are created *after* the fd table
has been unshared to not leak pidfds into child processes. pidfd
creation during clone is split into two distinct steps:
1. preparing both an fd and a file referencing struct pid for
   fd_install()
2. fd_install()ing the pidfd

Step 1 is performed before clone's point of no return and especially
before write_lock_irq(&tasklist_lock) is taken. Performing step 1
before clone's point of no return ensures that we don't need to fail a
process that is already visible to userspace when pidfd creation fails.
Step 2 happens after attach_pid() is performed and the process is
visible to userspace.

Technically, we could have also performed steps 1 and 2 together before
clone's point of no return and then called close() on the file
descriptor on failure. This would slightly increase code locality, but
it is semantically more correct and clean to bring the pidfd into
existence once the process is fully attached and not before.

The actual code for CLONE_PIDFD in patch 2 is completely confined to
fork.c (apart from the CLONE_PIDFD definition of course) and is rather
small and hopefully easy to review.

The additional changes listed under David's name in the diffstat below
are here to make anon_inodes available unconditionally. They are needed
for the new mount API and thus for core VFS code in addition to pidfds.
David knows this and he has informed Al that this patch is being sent
out here.
The changes themselves are rather automatic.

As promised I have also contacted Joel who has sent a patchset to make
pidfds pollable. He has been informed and is happy to port his patchset
once we have moved forward [2].

Jann and I currently plan to target this patchset for inclusion in the
5.2 merge window.

Thanks!
Jann & Christian

[1]: https://lore.kernel.org/lkml/CAHk-=wifyY+XGNW=ZC4MyTHD14w81F8JjQNH-GaGAm2RxZ_S8Q@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/20190411200059.GA75190@google.com/

Christian Brauner (3):
  clone: add CLONE_PIDFD
  signal: support CLONE_PIDFD with pidfd_send_signal
  samples: show race-free pidfd metadata access

David Howells (1):
  Make anon_inodes unconditional

 arch/arm/kvm/Kconfig           |   1 -
 arch/arm64/kvm/Kconfig         |   1 -
 arch/mips/kvm/Kconfig          |   1 -
 arch/powerpc/kvm/Kconfig       |   1 -
 arch/s390/kvm/Kconfig          |   1 -
 arch/x86/Kconfig               |   1 -
 arch/x86/kvm/Kconfig           |   1 -
 drivers/base/Kconfig           |   1 -
 drivers/char/tpm/Kconfig       |   1 -
 drivers/dma-buf/Kconfig        |   1 -
 drivers/gpio/Kconfig           |   1 -
 drivers/iio/Kconfig            |   1 -
 drivers/infiniband/Kconfig     |   1 -
 drivers/vfio/Kconfig           |   1 -
 fs/Makefile                    |   2 +-
 fs/notify/fanotify/Kconfig     |   1 -
 fs/notify/inotify/Kconfig      |   1 -
 include/linux/pid.h            |   2 +
 include/uapi/linux/sched.h     |   1 +
 init/Kconfig                   |  10 --
 kernel/fork.c                  | 117 +++++++++++++++++++++-
 kernel/signal.c                |  14 ++-
 kernel/sys_ni.c                |   3 -
 samples/Makefile               |   2 +-
 samples/pidfd/Makefile         |   6 ++
 samples/pidfd/pidfd-metadata.c | 172 +++++++++++++++++++++++++++++++++
 26 files changed, 305 insertions(+), 40 deletions(-)
 create mode 100644 samples/pidfd/Makefile
 create mode 100644 samples/pidfd/pidfd-metadata.c

-- 
2.21.0
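
For readers who want to see the two-step pidfd creation described above in
isolation, here is a toy userspace model of the idea: reserve the fd number
while failure is still harmless, and only publish it once nothing can fail
anymore. This is a sketch under stated assumptions, not the kernel code from
the patchset. The fixed-size table and every toy_* name are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 4

/* Hypothetical toy fd table; slot 0 is never handed out, mirroring
 * the "pidfd numbering starts at 1" rule from the cover letter. */
struct toy_fdtable {
        bool reserved[TOY_MAX_FDS];         /* number taken, nothing visible yet */
        const char *installed[TOY_MAX_FDS]; /* object reachable through the fd */
};

/* Step 1: reserve an fd number. This can fail, so it has to run
 * before the point of no return. */
int toy_prepare_fd(struct toy_fdtable *t)
{
        for (int fd = 1; fd < TOY_MAX_FDS; fd++) {
                if (!t->reserved[fd]) {
                        t->reserved[fd] = true;
                        return fd;
                }
        }
        return -1; /* table full */
}

/* Undo step 1 on an error path taken before the point of no return. */
void toy_cancel_fd(struct toy_fdtable *t, int fd)
{
        assert(fd > 0 && fd < TOY_MAX_FDS && t->reserved[fd]);
        t->reserved[fd] = false;
}

/* Step 2: publish the object. This cannot fail, so it is safe to run
 * after the point of no return. */
void toy_install_fd(struct toy_fdtable *t, int fd, const char *obj)
{
        assert(fd > 0 && fd < TOY_MAX_FDS && t->reserved[fd]);
        t->installed[fd] = obj;
}

/* Returns the new fd (>= 1) or -1. On failure nothing was ever
 * made visible through the table. */
int toy_clone_pidfd(struct toy_fdtable *t, bool fail_before_commit)
{
        int fd = toy_prepare_fd(t);

        if (fd < 0)
                return -1;

        if (fail_before_commit) {
                /* stands in for any later pre-commit step that fails */
                toy_cancel_fd(t, fd);
                return -1;
        }

        /* point of no return: the new "process" is visible from here on */
        toy_install_fd(t, fd, "pid");
        return fd;
}
```

Calling toy_clone_pidfd() three times on a zero-initialized table, with the
second call failing before the commit, returns 1, -1 and 2: the number reserved
by the failed call is handed out again and nothing was ever installed for it.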