Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp10054419ybi; Wed, 24 Jul 2019 15:03:29 -0700 (PDT) X-Google-Smtp-Source: APXvYqy4kL3HkBN5cBQHW6eNLeaPuOk8gxmwTtRbcpMUi/9i+qO0SlqSBUjOFpvkWjMHjJSd3zcA X-Received: by 2002:a65:47c1:: with SMTP id f1mr81897814pgs.169.1564005809022; Wed, 24 Jul 2019 15:03:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564005809; cv=none; d=google.com; s=arc-20160816; b=a9/7Bm2RURc9E17jUYerV5I5PqWPFk3itnvrfa7qR4hKd0/arjuBEZ3nIwpq6qJjb7 K/x2sYoYVDZfts4e8RRbv8uYjnIxqrQVQDAyylZohWjmuhQIurLpDKLWTmIkbHesJsn6 irTbnaz8iD8U6O5+4B66zYN3HbS4Rt9ijzFwqmgAYS6tjUvCch0KN1TxRwILaPBpPm9p b/ZtecEoJ1g9kd06E0iObUtqJIJtsF6ttfbpOUsSmstepH6llc6CXXxCwZErkqjInpEr +CI3EEZ0DFm3WQ5W0fAzX7jbxG1AWcMJ76wj+bMECOLkACWotmtEAZvKmdo7yiWLh0qy CDoQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=BJ2NVnXSi8ebOI8TCeX+KEiumjRr37taJEiDBtLAzIw=; b=hoX49225pDxRIwmZsXJ67+Ayu33aH74ZGnHMxHmbVdOn7BTnioZm69wBRncyOSezap dA0/UkYyCpPGE+lSwssCOFpJZErqUH/VPm/AvxP3nOmo9hnXmyFkkwx/SAb7dHVvbNSE mDIgnZpL4WhKnuX1Rbue7Lp+6Vqz18QfbS8kWzqC/0IozaG38SDs8t436oIV/wF3xuJY dz4inxIjK0Xb62GXyVR33gnBnvS0SBKGwdfX1YS27lbzDnoAkw0F1dcgi1SPOdlBe70B X9zIqp3vX3Wa9LxtcZGKLra7fc4vjSSdiwrfWtOnT8fE9XOksbPjG0ub96Wegphqvkhb JKQA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=ay7nQDJt; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m71si14093182pjb.26.2019.07.24.15.03.14; Wed, 24 Jul 2019 15:03:29 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=ay7nQDJt; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728573AbfGXTIW (ORCPT + 99 others); Wed, 24 Jul 2019 15:08:22 -0400 Received: from mail-oi1-f196.google.com ([209.85.167.196]:42907 "EHLO mail-oi1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726942AbfGXTIV (ORCPT ); Wed, 24 Jul 2019 15:08:21 -0400 Received: by mail-oi1-f196.google.com with SMTP id s184so35821115oie.9 for ; Wed, 24 Jul 2019 12:08:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=BJ2NVnXSi8ebOI8TCeX+KEiumjRr37taJEiDBtLAzIw=; b=ay7nQDJto1uXe3PyisCCQ4jh0O1tS2wQekngxDHnkudc+mm443mhyxwIylnVi04ikq p+0MRA6sMLXHIiw/hlRdeEzcIyX0AQkLLglriGAU52YI7GUxDDL3uP7O8MbdaG5x324/ s1IQjOXptoKEECbr2VtO00HQSZl+DWrRVKCPcaWpsiz6AkLZW3DHJTI1mI76g4GCPhrP uohpgu8+vHc5tGBhon5LNnF9mroUP4HVW7iavecvv1zAtnAu+zBPOqE1k1qppbUtbE9V CNbTUQ1m1b1++BkjeJrUj6Gkwei+gseZ/u8ExpXPnBE+5vHE2NJCJ2+XGnA32KgepmdE /ALA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BJ2NVnXSi8ebOI8TCeX+KEiumjRr37taJEiDBtLAzIw=; b=n3pNg8rYOP248QAp3svugXO5PjO/elQWzL9efNZI0eYQvfPwoJY0VRhCcp19j89JDz Dq08pm8TNBeV0MTkFc+M7+Nyb/nxigPtcFnU8GhePoO8wfVYMFBwbBlB2R8vqwyvOihg /HujxARbQk7c7DLDbtGteqU85VlWrGlKNggR/YujZD9CJNGP+Nc5rgbbnUJwW0zMluj+ 00pgO77FuhKbeGsuca1Swmx2xWfwbdIXbCIXnd8ilMjQAgzOYOiyAISAkQFJxngdm3FO YwJ4N6rK9PgbtQAYrL30cljCzVzqfkazjeUORiG6f9g4NHN4NeVw+RS+MAgDBJ1y0IQp d6hA== X-Gm-Message-State: APjAAAV29t5zYQP4cNxTvkaPb4y5zHM6Mha2lAW2asXuaC/KO1EbReJD SJ4i/9t4MTHwHwihRpqfXGtSEt3t0rY47I8v3+ab8w== X-Received: by 2002:aca:3dd7:: with SMTP id k206mr37653038oia.47.1563995300490; Wed, 24 Jul 2019 12:08:20 -0700 (PDT) MIME-Version: 1.0 References: <20190724144651.28272-1-christian@brauner.io> <20190724144651.28272-5-christian@brauner.io> In-Reply-To: From: Jann Horn Date: Wed, 24 Jul 2019 21:07:54 +0200 Message-ID: Subject: Re: [PATCH 4/5] pidfd: add CLONE_WAIT_PID To: Christian Brauner Cc: kernel list , Oleg Nesterov , Arnd Bergmann , "Eric W. Biederman" , Kees Cook , "Joel Fernandes (Google)" , Thomas Gleixner , Tejun Heo , David Howells , Andy Lutomirski , Andrew Morton , Aleksa Sarai , Linus Torvalds , Al Viro , kernel-team , Ingo Molnar , Peter Zijlstra , Linux API Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 24, 2019 at 8:27 PM Christian Brauner wrote: > On July 24, 2019 8:14:26 PM GMT+02:00, Jann Horn wrote: > >On Wed, Jul 24, 2019 at 4:48 PM Christian Brauner > > wrote: > >> If CLONE_WAIT_PID is set the newly created process will not be > >> considered by process wait requests that wait generically on children > >> such as: > >> > >> syscall(__NR_wait4, -1, wstatus, options, rusage) > >> syscall(__NR_waitpid, -1, wstatus, options) > >> syscall(__NR_waitid, P_ALL, -1, siginfo, options, rusage) > >> syscall(__NR_waitid, P_PGID, -1, siginfo, options, rusage) > >> syscall(__NR_waitpid, -pid, wstatus, options) > >> syscall(__NR_wait4, -pid, wstatus, options, rusage) > >> > >> A process created with CLONE_WAIT_PID can only be waited upon with a > >> focussed wait call. This ensures that processes can be reaped even if > >> all file descriptors referring to it are closed. > >[...] > >> diff --git a/kernel/fork.c b/kernel/fork.c > >> index baaff6570517..a067f3876e2e 100644 > >> --- a/kernel/fork.c > >> +++ b/kernel/fork.c > >> @@ -1910,6 +1910,8 @@ static __latent_entropy struct task_struct > >*copy_process( > >> delayacct_tsk_init(p); /* Must remain after > >dup_task_struct() */ > >> p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE); > >> p->flags |= PF_FORKNOEXEC; > >> + if (clone_flags & CLONE_WAIT_PID) > >> + p->flags |= PF_WAIT_PID; > >> INIT_LIST_HEAD(&p->children); > >> INIT_LIST_HEAD(&p->sibling); > >> rcu_copy_process(p); > > > >This means that if a process with PF_WAIT_PID forks, the child > >inherits the flag, right? That seems unintended? You might have to add > >something like "if (clone_flags & CLONE_THREAD == 0) p->flags &= > >~PF_WAIT_PID;" before this. (I think threads do have to inherit the > >flag so that the case where a non-leader thread of the child goes > >through execve and steals the leader's identity is handled properly.) > >Or you could cram it somewhere into signal_struct instead of on the > >task - that might be a more logical place for it? > > Hm, CLONE_WAIT_PID is only useable with CLONE_PIDFD which in turn is > not useable with CLONE_THREAD. > But we should probably make that explicit for CLONE_WAIT_PID too. To clarify: This code looks buggy to me because p->flags is inherited from the parent, with the exception of flags that are explicitly stripped out. Since PF_WAIT_PID is not stripped out, this means that if task A creates a child B with clone(CLONE_WAIT_PID), and then task B uses fork() to create a child C, then B will not be able to use wait(&status) to wait for C since C inherited PF_WAIT_PID from B. The obvious way to fix that would be to always strip out PF_WAIT_PID; but that would also be wrong, because if task B creates a thread C, and then C calls execve(), the task_struct of B goes away and B's TGID is taken over by C. When C eventually exits, it should still obey the CLONE_WAIT_PID (since to A, it's all the same process). Therefore, if p->flags is used to track whether the task was created with CLONE_WAIT_PID, PF_WAIT_PID must be inherited if CLONE_THREAD is set. So: diff --git a/kernel/fork.c b/kernel/fork.c index d8ae0f1b4148..b32e1e9a6c9c 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1902,6 +1902,10 @@ static __latent_entropy struct task_struct *copy_process( delayacct_tsk_init(p); /* Must remain after dup_task_struct() */ p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER | PF_IDLE); p->flags |= PF_FORKNOEXEC; + if (!(clone_flags & CLONE_THREAD)) + p->flags &= ~PF_PF_WAIT_PID; + if (clone_flags & CLONE_WAIT_PID) + p->flags |= PF_PF_WAIT_PID; INIT_LIST_HEAD(&p->children); INIT_LIST_HEAD(&p->sibling); rcu_copy_process(p); An alternative would be to not use p->flags at all, but instead make this a property of the signal_struct - since the property is shared by all threads, that might make more sense?