Date: Wed, 15 Jun 2022 09:52:19 +0200
From: Christian Brauner
To: Kees Cook
Cc: Andrei Vagin, linux-kernel@vger.kernel.org, Dmitry Safonov <0x7f454c46@gmail.com>, Florian Weimer, linux-mm@kvack.org, Eric Biederman
Subject: Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on vfork+exec

On Tue, Jun 14, 2022 at 02:14:35PM -0700, Kees Cook wrote:
> On Sun, Jun 12, 2022 at 11:07:22PM -0700, Andrei Vagin wrote:
> > Right now, a new process can't be forked in another time namespace
> > if it shares mm with its parent. It is prohibited, because each time
> > namespace has its own vvar page that is mapped into a process address
> > space.
> >
> > When a process calls exec, it gets a new mm and so it could be "legal"
> > to switch time namespace in that case. This was not implemented and
> > now if we want to do this, we need to add another clone flag to not
> > break backward compatibility.
> >
> > We don't have any user requests to switch times on exec except the
> > vfork+exec combination, so there is no reason to add a new clone flag.
> > As for vfork+exec, this should be safe to allow switching timens with
> > the current clone flag.
> > Right now, vfork (CLONE_VFORK | CLONE_VM) fails
> > if a child is forked into another time namespace. With this change,
> > vfork creates a new process in parent's timens, and the following exec
> > does the actual switch to the target time namespace.
>
> This seems like a very special case. None of the other namespaces do
> this, do they?
>
> How is CLONE_NEWTIME supposed to be used today?

Time namespaces are similar to pid namespaces. If a process calls
unshare(CLONE_NEWTIME), it will not change into a new time namespace.
Only the children of the process will.

You can also see this via /proc//ns/time and
/proc//ns/time_for_children. After an unshare(CLONE_NEWTIME),
/proc//ns/time will be unchanged while /proc//ns/time_for_children
will reference a new time namespace. So if the process now calls
fork(), the child will be placed in the new time namespace.

As Andrei correctly points out in the commit message, each time
namespace gets its own vvar page mapped into the process address space.
Consequently, calls to clone*() with CLONE_VM need to fail, because
switching the child to the new namespace's vvar page would alter the
parent's mm as well. That includes vfork(), which is roughly just
CLONE_VM | CLONE_VFORK. fork() remains unaffected.

So anything that implements a process launcher using vfork() needs to
implement a fork() fallback after a vfork() failure in case the
original process has unshared a new time namespace. As posix_spawn() is
implemented using vfork(), we would force glibc to implement a fork()
fallback and introduce a lot of complexity to work around this.

I think the proposal here makes sense and allows us to avoid
introducing yet another clone flag. For vfork() it also makes sense
because the calling process is suspended until exec or exit, so the
semantics of vfork() are sufficiently distinct from fork() and
clone*() to justify this difference. IOW, vfork() is already
specifically meant to be followed by an exec*() call.

Christian
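For illustration, a minimal sketch of the vfork()-then-fork() fallback a process launcher would have to carry without this change. This is not glibc's posix_spawn() code and not part of the patch; spawn_with_fallback() and wait_exit_status() are hypothetical helper names. The sketch falls back on any vfork() failure instead of matching a particular errno, so it does not depend on which error code the kernel reports for the time namespace case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): run argv[0], looked up in
 * PATH, in a child process. vfork() is tried first because it does not
 * copy the parent's address space. If it fails (for example because the
 * caller unshared a time namespace on a kernel without this change), we
 * retry with fork(), which gives the child its own copy of the mm.
 * Returns the child's PID, or -1 if both vfork() and fork() failed.
 */
static pid_t spawn_with_fallback(char *const argv[])
{
	pid_t pid = vfork();

	if (pid == 0) {
		/* vfork() child shares the parent's memory: only exec or _exit. */
		execvp(argv[0], argv);
		_exit(127); /* exec failed, e.g. command not found */
	}
	if (pid > 0)
		return pid;

	/* vfork() failed: fall back to a regular fork(). */
	pid = fork();
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);
	}
	return pid; /* child PID, or -1 if fork() failed too */
}

/* Wait for pid; return its exit status, or -1 on error or abnormal exit. */
static int wait_exit_status(pid_t pid)
{
	int status;

	if (pid < 0)
		return -1;
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The fork() branch is exactly the extra path every vfork()-based launcher would need; with the proposed change, the vfork() branch keeps working and the switch to the target time namespace happens at exec instead.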