Received: by 2002:a05:6a10:c7c6:0:0:0:0 with SMTP id h6csp2185030pxy; Mon, 2 Aug 2021 23:01:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx4V8Ku/uTlGATGEufx1WjxaiBrFWLej+H2YaBkY7OHHXZQ0gBT+PBHffllZFefsWQz+kTp X-Received: by 2002:a92:db4b:: with SMTP id w11mr1037262ilq.297.1627970517891; Mon, 02 Aug 2021 23:01:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627970517; cv=none; d=google.com; s=arc-20160816; b=OiZdTwgiAiA9KiAqgltMkiy/TEqsP9x6jN1qAbN3UHdn2OlFWso+/vQ3K/nnxHsuMJ GAiybGpzeEyk6hG4lMBO+eRwuOhFwOSfgEwZjxPf9sRtm9qN/b5CHvGXpFVuvS3HEp6B uZ6DUziK9J+yrjaalQgP61BozphFn9ZeV5rBM69Sknx6U4oNV8R2ahglHbZjWfpP+tyN cLZCMGrsazW4wFAGTiANR0a9z/S/Cof5AQkCXAnAyndSrXSdmiK7sMTIfpjhNSgbq3O9 fNwJe9u4/ogPMMP+qXC/3V4rn4gjCOHRXYU2DEuduQRv2Lc2+/OOBaOVlJDYz9+PsoDn fG6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:references:cc :to:subject:from:dkim-signature:dkim-signature; bh=1t6DwB38Ya6t7Tu/rVm9U8PBSU3BOMT1sByp+5Idu5I=; b=V1NiDcBEOC8J3eqsq9YqFN9C698EK2rTjFPqGC6USMcKnukO51GmVP6hd/IMZfNJ6+ drOl/y+om9P5B7ra1Yd5CrkJx6M2lOEfJtPy78u/tYDQImY/gsjjem6O+OOhvK6O5bvq 43CuUXhS2aoeC0hfdiQ1E91D2LNipd6OXH0dVBEb5BnXwIoJitCAYjSWWrm26e3nffEO m1gkhD3Cagu56hLm6AGuzRwjOnVxRyyPUDzyOaSygqWUH8q0i3Jugq2XhQtOMkxA1yFo mj7PYh0r5csZGKlkH+6fyxf4PJxp/E3fjN+pEpZ1CSDUi013OOn7trLBpii5qnUC7HOm 3E8w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@johnericson.me header.s=fm1 header.b=J+HlGiCY; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=QPMN3YMO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id d12si16725818ilg.4.2021.08.02.23.01.43; Mon, 02 Aug 2021 23:01:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@johnericson.me header.s=fm1 header.b=J+HlGiCY; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=QPMN3YMO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233758AbhHCGBM (ORCPT + 99 others); Tue, 3 Aug 2021 02:01:12 -0400 Received: from out2-smtp.messagingengine.com ([66.111.4.26]:52411 "EHLO out2-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230096AbhHCGBL (ORCPT ); Tue, 3 Aug 2021 02:01:11 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 833E95C0097; Tue, 3 Aug 2021 02:01:00 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Tue, 03 Aug 2021 02:01:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=johnericson.me; h=from:subject:to:cc:references:message-id:date:mime-version :in-reply-to:content-type:content-transfer-encoding; s=fm1; bh=1 t6DwB38Ya6t7Tu/rVm9U8PBSU3BOMT1sByp+5Idu5I=; b=J+HlGiCYm0yMRxz7e vpYUQmVQ047hXM//JDP+47cq1vopYq99mq4xn0vzIIcOPjJ9XT5mvnQZbhHIfXYO QoHKnY1ARruZ7kkgYGLbk4cEPYTs54RJLLi7Ok/WsLgG65eoM3SKUgADuJ/ZB1x6 QhwLlL5a8g+dhh3n6JWgj6xUE7/pg8Q163ObAZ3BwwVoV8okpi9VAmcsbtLf8NKq 07Tyn/E8bqg6dn+sebouIU9ooBlNBbB5HtTwb+EAPPkbIOEtghTADbTn9Z678fNk 1lYPQdSPE5NDqkzXAKf4gsfKEchg70TfvO/m/sYIJoACeTDrnC7Lm9Qb9zECWDpg ajjxg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm3; bh=1t6DwB38Ya6t7Tu/rVm9U8PBSU3BOMT1sByp+5Idu 5I=; b=QPMN3YMOT9ijdsl5WznkgQ//2o/aibY1ZiVaNMsxDheJZUu0mDvCteSAl 9bRpVk8UFe3L1HtrdW9H+wVH2qe9s4wvx56b6q5mFbe6+SeXnlkWY0xAqxVkgwb1 70/oGtXTw8xRU90e3Z9+wPSiunHJ0j3BKpiimtEU2wO+ayvHaMYMJOWb96Iwnc8/ oZ9pFiUEAM4VzUJrJ2XSBN82gHAmyj3qHhNYqWMcXfFB/tyuENDIybwfIz6olMoK uPulHqINWNYMtQVxcc7zOLXum6xrDwQj3/MxFOsp3qmm1x1iaF2QZFZRlBDa+J8s qHxwqwRUgvjCZJ+BIXsfPv+N8Gdjg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvtddrieefgdelvdcutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefhuffvfhfkffgfgggjtgfgsehtjeertddtfeehnecuhfhrohhmpeflohhhnhcu vehothhtohhnucfgrhhitghsohhnuceomhgrihhlseflohhhnhfgrhhitghsohhnrdhmvg eqnecuggftrfgrthhtvghrnhepteelveettdeuhfeuhfffkeevieejfffgvdffhfefvdeg vefgfffgfffggfevffeinecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrg hilhhfrhhomhepmhgrihhlseflohhhnhfgrhhitghsohhnrdhmvg X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 3 Aug 2021 02:00:58 -0400 (EDT) From: John Cotton Ericson Subject: Re: Leveraging pidfs for process creation without fork To: Christian Brauner , Al Viro Cc: LKML , David Laight , Andy Lutomirski , "Jason A. Donenfeld" , Kernel Hardening , Jann Horn , Christian Brauner References: <20210729142415.qovpzky537zkg3dp@wittgenstein> <1468d75c-57ae-42aa-85ce-2bee8d403763@www.fastmail.com> <20210802121935.dkiw627twcrxbh54@wittgenstein> Message-ID: <01e72780-e328-23b5-c114-9c35125da7c6@JohnEricson.me> Date: Tue, 3 Aug 2021 02:00:57 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <20210802121935.dkiw627twcrxbh54@wittgenstein> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 2, 2021, at 8:19 AM, Christian Brauner wrote: > On Sat, Jul 31, 2021 at 10:42:16PM +0000, Al Viro wrote: > > > > It's more on the exit/exec/coredump side, actually. For > > exit we want to be sure that no new live threads will appear in a > > group once the last live thread has entered do_exit(). For > > exec (de_thread(), for starters) you want to have all threads > > except for the one that does execve() to be killed and your > > thread to take over as group leader. Look for the machinery there > > and in do_exit()/release_task() involved into that. For coredump > > you want all threads except for dumper to be brought into do_exit() > > and stopped there, for dumping one to be able to access their state. > > > > Then there's fun with ->sighand treatment - the whole thing > > critically relies upon ->sighand being shared for the entire thread > > group; look at the ->sighand->siglock uses. > > > > The whole area is full of rather subtle places. Again, the > > real headache comes from the exit and execve. Embryonic threads are > > passive; it's the ones already running that can (and do) cause PITA. I took a look at de_thread and begin_new_exec. It does seems whatever trouble there is stems from a bit of mixing concerns as I thought. Most of begin_new_exec seems about wiping clean the current process's state, including the de_thread, unsharing various things. But then operations like that first bprm_creds_from_file call (of perhaps more recent vintage [1]) is about initializing new state from binprm argument. It is interesting to me to note that some of the "unsharing" happens at clone time (the namespaces), and some happens (also) at exec time (file table, signal handlers). This to me is more good concrete evidence fork + exec is awkward and strews concerns. There perhaps will be some subtleties about in which order state can be set up on the embryonic process, but I don't think any de_thread will be needed because there will never be threads from a "previous" state lying around. Indeed there is no "previous" anything, just the current everything-inert embryonic process. I would propose trying to rip up begin_new_exec so the unsharing, de_thread-ing etc. is just done in the traditional exec path, and just the bprm bits with a non-current fresh embryonic task_sched are done in the new one. [1]: 56305aa9b6fab91a5555a45796b79c1b0a6353d1 > Iiuc, you're talking about adding a thread into a thread-group tg1 from > a thread in another thread-group tg2. I don't think that's a very > pressing use-case and I agree that that sounds rather nasty right now. > Unless I'm missing something, a simple api to create something like a > processes configuration context doesn't require this. Agreed. I did mention embryonic processes with multiple threads, but was just a shower thought and not something I really care about. Also, since that would entail adding a thread to an inert thread group the creator has full powers over (it's "on the operating table") I don't think it would be so bad. (To keep this new servery metaphor going, exec would be self-surgery, and adding a thread to *live* thread group would be surgery without anesthesia.) > a processes configuration context This phrase stuck to me, Christian. Not to rush you on your concrete proposal, but sounds like you are envisioning building up a separate struct with instructions on how to produce a process, rather than mutating unscheduled but otherwise genuine `task_struct`s? > > What do you want that for, BTW? Those security + ergonomic things I mentioned in my original email are the main goal. I have a personal *long*-term goal to see something like CloudABI resurrected. I think it got most of the interfaces right, but not process management, and now that there are pidfds, we have a chance to better. I'm in no rush, so happy to just see very linux-specific interfaces evolve in a good direction for now. Writing a personality or some other shim is not the interesting part, to say the least, so I'm happy to wait ages before doing that while the internals marinate. John