From: Tycho Andersen
To:
 Christian Brauner
Cc: Oleg Nesterov, "Eric W . Biederman", linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org, Tycho Andersen, Tycho Andersen
Subject: [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders
Date: Thu, 30 Nov 2023 09:39:44 -0700
Message-Id: <20231130163946.277502-1-tycho@tycho.pizza>

From: Tycho Andersen

We are using the pidfd family of syscalls with the seccomp userspace
notifier. When a thread triggers a seccomp notification, we want to operate
on its context: munge its fd table via pidfd_getfd(), perhaps write to its
memory, and so on. However, a thread created with ~CLONE_FILES or ~CLONE_VM
(i.e. without those flags) has its own fd table or mm, distinct from the
thread-group leader's. Since pidfd_open() only accepts thread-group leaders,
we can't use the pidfd family of syscalls to reach such a thread.

In this patch, we relax that restriction for pidfd_open(). To avoid leaving
poll() users waiting on a pidfd whose thread has already exited, we need to
notify pidfd waiters when individual threads die. Once we do that, the rest
of the machinery seems to work, judging by the tests, although there are
probably more cases than just this one that need similar handling.

Another oddity is that the wakeup is open-coded here rather than done by
exporting and calling do_notify_pidfd(). This location runs after
__exit_signal(), which calls __unhash_process(), which in turn clears
->thread_pid. So we need to use the local copy of the pid, whereas
do_notify_pidfd() looks it up via task_pid().
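The dangling poll() problem described above concerns callers that block until a pidfd becomes readable. A minimal userspace sketch of such a poller follows, assuming Linux 5.3+ for pidfd_open() and falling back to syscall number 434 when the headers lack SYS_pidfd_open (an assumption that holds on x86_64 and asm-generic architectures, not on all). The helpers open_pidfd() and wait_for_exit() are hypothetical names for illustration. On unpatched kernels this only works for whole processes; the exit.c change in this patch is what aims to wake the same loop when a non-leader thread's pidfd target exits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed fallback: pidfd_open is 434 on x86_64 and asm-generic arches. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical helper: open a pidfd for @pid; returns the fd or -errno. */
static int open_pidfd(pid_t pid)
{
	long fd = syscall(SYS_pidfd_open, pid, 0);

	return fd < 0 ? -errno : (int)fd;
}

/*
 * Hypothetical helper: block until the task behind @pidfd has exited or
 * @timeout_ms elapses. A pidfd reports POLLIN once its target has exited.
 * Returns 1 on exit, 0 on timeout, -errno on error. An EINTR retry restarts
 * the full timeout, which is acceptable for a sketch.
 */
static int wait_for_exit(int pidfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	return (ret > 0 && (pfd.revents & (POLLIN | POLLHUP))) ? 1 : 0;
}
```

A seccomp supervisor would pass the TID from the pid field of struct seccomp_notif rather than a process ID; without this patch, opening a pidfd for a non-leader TID fails with EINVAL in pidfd_prepare().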
Perhaps this suggests that the notification should live somewhere in
__exit_signal() instead? I put it here only because this code already tests
whether the exiting task is the thread-group leader.

Signed-off-by: Tycho Andersen
---
 kernel/exit.c | 29 +++++++++++++++++++----------
 kernel/fork.c |  4 +---
 kernel/pid.c  | 11 +----------
 3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index ee9f43bed49a..34eeefc7ee21 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -263,16 +263,25 @@ void release_task(struct task_struct *p)
 	 */
 	zap_leader = 0;
 	leader = p->group_leader;
-	if (leader != p && thread_group_empty(leader)
-			&& leader->exit_state == EXIT_ZOMBIE) {
-		/*
-		 * If we were the last child thread and the leader has
-		 * exited already, and the leader's parent ignores SIGCHLD,
-		 * then we are the one who should release the leader.
-		 */
-		zap_leader = do_notify_parent(leader, leader->exit_signal);
-		if (zap_leader)
-			leader->exit_state = EXIT_DEAD;
+	if (leader != p) {
+		if (thread_group_empty(leader)
+				&& leader->exit_state == EXIT_ZOMBIE) {
+			/*
+			 * If we were the last child thread and the leader has
+			 * exited already, and the leader's parent ignores SIGCHLD,
+			 * then we are the one who should release the leader.
+			 */
+			zap_leader = do_notify_parent(leader,
+						      leader->exit_signal);
+			if (zap_leader)
+				leader->exit_state = EXIT_DEAD;
+		} else {
+			/*
+			 * wake up pidfd pollers anyway, they want to know this
+			 * thread is dying.
+			 */
+			wake_up_all(&thread_pid->wait_pidfd);
+		}
 	}
 
 	write_unlock_irq(&tasklist_lock);
diff --git a/kernel/fork.c b/kernel/fork.c
index 10917c3e1f03..eef15c93f6cf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2163,8 +2163,6 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  * Allocate a new file that stashes @pid and reserve a new pidfd number in the
  * caller's file descriptor table. The pidfd is reserved but not installed yet.
  *
- * The helper verifies that @pid is used as a thread group leader.
- *
  * If this function returns successfully the caller is responsible to either
  * call fd_install() passing the returned pidfd and pidfd file as arguments in
  * order to install the pidfd into its file descriptor table or they must use
@@ -2182,7 +2180,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+	if (!pid)
 		return -EINVAL;
 
 	return __pidfd_prepare(pid, flags, ret);
diff --git a/kernel/pid.c b/kernel/pid.c
index 6500ef956f2f..4806798022d9 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -552,11 +552,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
  * Return the task associated with @pidfd. The function takes a reference on
  * the returned task. The caller is responsible for releasing that reference.
  *
- * Currently, the process identified by @pidfd is always a thread-group leader.
- * This restriction currently exists for all aspects of pidfds including pidfd
- * creation (CLONE_PIDFD cannot be used with CLONE_THREAD) and pidfd polling
- * (only supports thread group leaders).
- *
  * Return: On success, the task_struct associated with the pidfd.
  *	   On error, a negative errno number will be returned.
  */
@@ -615,11 +610,7 @@ int pidfd_create(struct pid *pid, unsigned int flags)
  * @flags: flags to pass
  *
  * This creates a new pid file descriptor with the O_CLOEXEC flag set for
- * the process identified by @pid. Currently, the process identified by
- * @pid must be a thread-group leader. This restriction currently exists
- * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
- * be used with CLONE_THREAD) and pidfd polling (only supports thread group
- * leaders).
+ * the process identified by @pid.
  *
  * Return: On success, a cloexec pidfd is returned.
  *	   On error, a negative errno number will be returned.
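With the PIDTYPE_TGID check dropped from pidfd_prepare(), a seccomp supervisor can open a pidfd directly from a thread ID and pull a file descriptor out of that thread's fd table. The sketch below shows that sequence, assuming Linux 5.6+ for pidfd_getfd() and falling back to syscall numbers 434 and 438 when the headers lack SYS_pidfd_open or SYS_pidfd_getfd (an assumption valid on x86_64 and asm-generic architectures, not all). grab_fd() is a hypothetical helper; pidfd_getfd() also requires ptrace-attach (PTRACE_MODE_ATTACH_REALCREDS) permission over the target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed fallbacks: numbers used on x86_64 and asm-generic arches. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Hypothetical helper: duplicate @targetfd from the fd table of task @tid
 * into ours. With this patch @tid may name a non-leader thread (e.g. one
 * created without CLONE_FILES); unpatched kernels reject such a TID with
 * EINVAL. Returns the new fd (which has FD_CLOEXEC set) or -errno.
 */
static int grab_fd(pid_t tid, int targetfd)
{
	long pidfd, fd;
	int err;

	pidfd = syscall(SYS_pidfd_open, tid, 0);
	if (pidfd < 0)
		return -errno;

	/* The flags argument of pidfd_getfd() must be 0. */
	fd = syscall(SYS_pidfd_getfd, (int)pidfd, targetfd, 0);
	err = errno;		/* save before close() can clobber it */
	close((int)pidfd);
	return fd < 0 ? -err : (int)fd;
}
```

Calling grab_fd() with the caller's own PID also works on unpatched kernels, since that PID names a thread-group leader; a supervisor would instead pass the TID reported in the seccomp notification.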
base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
--
2.34.1