Date: Sat, 31 Jul 2021 22:42:16 +0000
From: Al Viro
To: John Ericson
Cc: Christian Brauner, LKML, David Laight, Andy Lutomirski, "Jason A. Donenfeld", Kernel Hardening, Jann Horn, Christian Brauner
Subject: Re: Leveraging pidfs for process creation without fork
References: <20210729142415.qovpzky537zkg3dp@wittgenstein> <1468d75c-57ae-42aa-85ce-2bee8d403763@www.fastmail.com>

On Sat, Jul 31, 2021 at 03:11:03PM -0700, John Ericson wrote:
> Do you mind pointing out one of those examples? I'm new to this, but if
> they follow a pattern I should be able to find the other examples based
> off it. I'm certainly curious to take a look :).
>
> I hope these issues aren't too deep.
> Ideally there's a nice decoupling so the creating process is just
> manipulating "inert" data structures for the embryo that the scheduler
> doesn't even need to see, and then after the embryonic process is
> submitted, when the context switches to it for the first time that's a
> completely normal process without special cases.
>
> The place complexity is hardest to avoid I think would be cleaning up
> the yet-unborn embryonic processes orphaned by exited parent(s),
> because that will have to handle all the semi-initialized states those
> could be in (as opposed to real processes).

It's more on the exit/exec/coredump side, actually.

For exit we want to be sure that no new live threads will appear in a
group once the last live thread has entered do_exit(). For exec
(de_thread(), for starters) you want all threads except the one that
does execve() to be killed and your thread to take over as group
leader. Look for the machinery there and in do_exit()/release_task()
involved in that. For coredump you want all threads except the dumper
to be brought into do_exit() and stopped there, so that the dumping
thread can access their state.

Then there's fun with ->sighand treatment - the whole thing critically
relies upon ->sighand being shared for the entire thread group; look at
the ->sighand->siglock uses.

The whole area is full of rather subtle places. Again, the real
headache comes from exit and execve. Embryonic threads are passive;
it's the ones already running that can (and do) cause PITA.

What do you want that for, BTW?
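
For readers who want to see the execve() takeover described above from
userspace, here is a minimal sketch. It is not kernel code and not part
of the original message. It assumes Linux and Python 3.8 or later (for
threading.get_native_id()); the names CHILD, AFTER and run_demo are made
up for this illustration. A child process starts a second thread, that
thread calls execve(), and the new program image reports its PID and
thread ID.

```python
import os
import subprocess
import sys
import textwrap

# Program run in a child process. Its second thread (not the group
# leader) calls execve(); the main thread just sleeps and is expected
# to be killed as part of the exec.
CHILD = textwrap.dedent("""
    import os, sys, threading, time

    AFTER = ("import os, threading; "
             "print('after', os.getpid(), threading.get_native_id(), flush=True)")

    def worker():
        # In a non-leader thread the TID differs from the PID (the TGID).
        print('before', os.getpid(), threading.get_native_id(), flush=True)
        os.execv(sys.executable, [sys.executable, '-c', AFTER])

    threading.Thread(target=worker).start()
    time.sleep(30)
""")


def run_demo():
    """Return {'before': (pid, tid), 'after': (pid, tid)} for the child."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, timeout=20, check=True,
    ).stdout
    result = {}
    for line in out.splitlines():
        tag, pid, tid = line.split()
        result[tag] = (int(pid), int(tid))
    return result


if __name__ == "__main__":
    print(run_demo())
```

On Linux the PID should be the same before and after the exec, the
thread ID should differ from the PID before it, and the two should be
equal afterwards, which is the "take over as group leader" step seen
from outside. The sleeping main thread never finishes its sleep.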