Date: Tue, 18 Feb 2020 00:03:31 +0100
From: Christian Brauner
To: "Michael Kerrisk (man-pages)"
Cc: Dmitry Safonov, Andrei Vagin, Linux Kernel,
    Dmitry Safonov <0x7f454c46@gmail.com>, Adrian Reber, Andy Lutomirski,
    Arnd Bergmann, Cyrill Gorcunov, "Eric W. Biederman", "H. Peter Anvin",
    Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov,
    Shuah Khan, Thomas Gleixner, Vincenzo Frascino, containers,
    criu@openvz.org, Linux API, x86@kernel.org, Andrei Vagin
Subject: Re: Time Namespaces: CLONE_NEWTIME and clone3()?
Message-ID: <20200217230331.he6p5bs766zp6smx@wittgenstein>
References: <20191112012724.250792-1-dima@arista.com>
    <20191112012724.250792-4-dima@arista.com>
    <20200217145908.7epzz5nescccwvzv@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 17, 2020 at 10:47:53PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
>
> On Mon, 17 Feb 2020 at 16:15, Christian Brauner wrote:
> >
> > On Mon, Feb 17, 2020 at 03:20:55PM +0100, Michael Kerrisk wrote:
> > > Hello Dmitry, Andrei,
> > >
> > > Is the CLONE_NEWTIME flag intended to be usable with clone3()? The
> > > mail quoted below implies (in my reading) that this should be possible
> > > once clone3() is available, which it is by now. (See also [1].)
> > >
> > > If the answer is yes, CLONE_NEWTIME should be usable with clone3(),
> > > then I have a bug report and a question.
> > >
> > > I successfully used CLONE_NEWTIME with unshare(). But if I try to use
> > > CLONE_NEWSIGNAL with clone3(), it errors out with EINVAL, because of
> >
> > s/CLONE_NEWSIGNAL/CLONE_NEWTIME/
> >
> > > the following check in clone3_args_valid():
> > >
> > >         /*
> > >          * - make the CLONE_DETACHED bit reuseable for clone3
> > >          * - make the CSIGNAL bits reuseable for clone3
> > >          */
> > >         if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
> > >                 return false;
> > >
> > > The problem is that CLONE_NEWTIME matches one of the bits in the
> > > CSIGNAL mask. If the intention is to allow CLONE_NEWTIME with
> > > clone3(), then either the bit needs to be redefined, or the error
> > > checking in clone3_args_valid() needs to be reworked.
> >
> > If this is intended to be useable with clone3() the check should be
> > adapted to allow for CLONE_NEWTIME. (I asked about this a while ago I
> > think.)
> > But below rather sounds like it should simply be an unshare() flag. The
> > code seems to set frozen_offsets to true right after copy_namespaces()
> > in timens_on_fork(new_ns, tsk) and so the offsets can't be changed
> > anymore unless I'm reading this wrong.
> > Alternatives seem to either make timens_offsets writable once after fork
> > and before exec, I guess - though that's probably not going to work
> > with the vdso judging from timens_on_fork().
> >
> > The other alternative is that Andrei and Dmitry send me a patch to
> > enable CLONE_NEWTIME with clone3() by exposing struct timens_offsets (or
> > a version of it) in the uapi and extend struct clone_args to include a
> > pointer to a struct timens_offsets that is _only_ set when CLONE_NEWTIME
> > is set.
> > Though the unshare() way sounds way less invasive and simpler.
>
> Actually, I think the alternative you propose just here is better. I
> imagine there are times when one will want to create multiple
> namespaces with a single call to clone3(), including a time namespace.
> I think this should be allowed by the API. And, otherwise, clone3()
> becomes something of a second-class citizen for creating namespaces.
> (I don't really get the "less invasive" argument. Implementing this is
> just a piece of kernel code to make user-space's life a bit simpler
> and more consistent.)

I don't particularly mind either way. If there are actual users that need
to set it at clone3() time then we can extend it. So I'd like to hear
what Adrian, Dmitry, and Thomas think since they are well-versed in how
this will be used in the wild. I'm wary of exposing a whole new uapi
struct and extending clone3() without any real use-case but I'm happy to
if there is!

Christian
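
For readers following the thread, here is a minimal userspace sketch of the bit collision described above, plus one possible way the check could be reworked. The flag values are copied from include/uapi/linux/sched.h. The DEMO_ prefix and both helper function names are made up for this illustration, and the reworked mask is only an assumption about how "adapt the check to allow for CLONE_NEWTIME" might look, not a statement of what the kernel does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Flag values copied from include/uapi/linux/sched.h. The DEMO_ prefix
 * is illustrative only, so the names cannot clash with system headers.
 */
#define DEMO_CSIGNAL        0x000000ffULL /* signal mask to be sent at exit */
#define DEMO_CLONE_NEWTIME  0x00000080ULL /* new time namespace */
#define DEMO_CLONE_DETACHED 0x00400000ULL /* unused, ignored by clone() */
#define DEMO_CLONE_NEWPID   0x20000000ULL /* new pid namespace */

/* The collision itself: CLONE_NEWTIME sits inside the CSIGNAL mask. */
_Static_assert((DEMO_CLONE_NEWTIME & DEMO_CSIGNAL) != 0,
	       "CLONE_NEWTIME overlaps CSIGNAL");

/*
 * Userspace stand-in for the check quoted in the thread:
 * any flag bit inside CSIGNAL (or CLONE_DETACHED) is rejected.
 */
static bool flags_ok_quoted(uint64_t flags)
{
	return !(flags & (DEMO_CLONE_DETACHED | DEMO_CSIGNAL));
}

/*
 * Hypothetical rework: keep rejecting the other CSIGNAL bits,
 * but carve CLONE_NEWTIME out of the rejected mask.
 */
static bool flags_ok_reworked(uint64_t flags)
{
	return !(flags & (DEMO_CLONE_DETACHED |
			  (DEMO_CSIGNAL & ~DEMO_CLONE_NEWTIME)));
}
```

With these helpers, flags_ok_quoted(DEMO_CLONE_NEWTIME) is false, which corresponds to the EINVAL reported in the thread, while flags_ok_reworked(DEMO_CLONE_NEWTIME | DEMO_CLONE_NEWPID) is true. A signal number such as 0x11 in the low bits is still rejected by both. Note that this only covers flag validation; it does not address the frozen-offsets question raised above.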