Date: Mon, 2 Aug 2021 14:19:35 +0200
From: Christian Brauner
To: Al Viro
Cc: John Ericson, LKML, David Laight, Andy Lutomirski, "Jason A. Donenfeld", Kernel Hardening, Jann Horn, Christian Brauner
Subject: Re: Leveraging pidfs for process creation without fork
Message-ID: <20210802121935.dkiw627twcrxbh54@wittgenstein>

On Sat, Jul 31, 2021 at 10:42:16PM +0000, Al Viro wrote:
> On Sat, Jul 31, 2021 at 03:11:03PM -0700, John Ericson wrote:
> > Do you mind pointing out one of those examples? I'm new to this, but if they follow a pattern I should be able to find the other examples based off it. I'm certainly curious to take a look :).
> >
> > I hope these issues aren't too deep.
> > Ideally there's a nice decoupling so the creating process is just manipulating "inert" data structures for the embryo that the scheduler doesn't even need to see, and then after the embryonic process is submitted, when the context switches to it for the first time that's a completely normal process without special cases.
> >
> > The place complexity is hardest to avoid I think would be cleaning up the yet-unborn embryonic processes orphaned by exited parent(s), because that will have to handle all the semi-initialized states those could be in (as opposed to real processes).
>
> It's more on the exit/exec/coredump side, actually.  For
> exit we want to be sure that no new live threads will appear in a
> group once the last live thread has entered do_exit().  For
> exec (de_thread(), for starters) you want to have all threads
> except for the one that does execve() to be killed and your
> thread to take over as group leader.  Look for the machinery there
> and in do_exit()/release_task() involved in that.  For coredump
> you want all threads except for the dumper to be brought into do_exit()
> and stopped there, for the dumping one to be able to access their state.
>
> Then there's fun with ->sighand treatment - the whole thing
> critically relies upon ->sighand being shared for the entire thread
> group; look at the ->sighand->siglock uses.
>
> The whole area is full of rather subtle places.  Again, the
> real headache comes from the exit and execve.  Embryonic threads are
> passive; it's the ones already running that can (and do) cause PITA.

IIUC, you're talking about adding a thread into a thread-group tg1 from
a thread in another thread-group tg2. I don't think that's a very
pressing use-case and I agree that that sounds rather nasty right now.
Unless I'm missing something, a simple API to create something like a
process configuration context doesn't require this.
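
To make the "configuration context" idea a bit more concrete, here is a
rough user-space analogy. It is not the interface under discussion and
says nothing about how a kernel-side version would look; it only uses
the existing posix_spawn() family, which already follows a "fill in an
inert description, then submit it" pattern. The helper name
spawn_configured() and the choice of /bin/sh and /dev/null are
assumptions made purely for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not from the thread): describe the child up front,
 * then create it in a single step. Runs "sh -c <shell_cmd>" and returns
 * the child's exit status, or -1 on any failure.
 */
int spawn_configured(const char *shell_cmd)
{
	posix_spawn_file_actions_t fa;
	posix_spawnattr_t attr;
	char *argv[] = { "sh", "-c", (char *)shell_cmd, NULL };
	pid_t pid;
	int status, err;

	/* Configuration phase: only plain data is touched, no child exists. */
	if (posix_spawn_file_actions_init(&fa))
		return -1;
	if (posix_spawnattr_init(&attr)) {
		posix_spawn_file_actions_destroy(&fa);
		return -1;
	}

	/* Record that the child's stdin should be /dev/null. */
	err = posix_spawn_file_actions_addopen(&fa, STDIN_FILENO,
					       "/dev/null", O_RDONLY, 0);

	/* Submission phase: the child is created from the description. */
	if (!err)
		err = posix_spawn(&pid, "/bin/sh", &fa, &attr, argv, environ);

	posix_spawnattr_destroy(&attr);
	posix_spawn_file_actions_destroy(&fa);
	if (err)
		return -1;

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling spawn_configured("exit 7") should hand back the child's exit
status. The point of the sketch is only that everything before
posix_spawn() operates on data owned by the caller, so nothing in it
needs to add a thread to somebody else's thread-group.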