Date: Tue, 23 Jun 2020 13:55:21 +0200
From: Christian Brauner
To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, x86@kernel.org, Dmitry Safonov, Andrei Vagin
Cc: Will Deacon, Vincenzo Frascino, Thomas Gleixner, Serge Hallyn, Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland, adrian@lisas.de
Subject: Re: [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
Message-ID: <20200623115521.hk3xlhixrt2zrgkn@wittgenstein>
References: <20200619153559.724863-1-christian.brauner@ubuntu.com> <20200619153559.724863-4-christian.brauner@ubuntu.com>
In-Reply-To: <20200619153559.724863-4-christian.brauner@ubuntu.com>

On Fri, Jun 19, 2020 at 05:35:59PM +0200, Christian
Brauner wrote:
> So far setns() was missing time namespace support. This was partially due
> to it simply not being implemented but also because vdso_join_timens()
> could still fail which made switching to multiple namespaces atomically
> problematic. This is now fixed so support CLONE_NEWTIME with setns()
>
> Cc: Thomas Gleixner
> Cc: Michael Kerrisk
> Cc: Serge Hallyn
> Cc: Dmitry Safonov
> Cc: Andrei Vagin
> Signed-off-by: Christian Brauner
> ---

Andrei, Dmitry,

A little off-topic since it's not related to the patch here, but I've been
going through the current time namespace semantics and I just want to
confirm something with you:

Afaict, unshare(CLONE_NEWTIME) currently works similarly to
unshare(CLONE_NEWPID) in that it only changes {pid,time}_for_children but
does _not_ change the {pid,time} namespace of the caller itself. For pid
namespaces that makes a lot of sense, but I'm not completely clear why
you're doing this for time namespaces, especially since the setns()
behavior for CLONE_NEWPID and CLONE_NEWTIME is very different: Similar to
unshare(CLONE_NEWPID), setns(CLONE_NEWPID) doesn't change the pid namespace
of the caller itself; it only changes it for its children by setting up
pid_for_children. _But_ for setns(CLONE_NEWTIME) both the caller's and the
children's time namespaces are changed, i.e. unshare(CLONE_NEWTIME) behaves
differently from setns(CLONE_NEWTIME). Why?

This also has the consequence that the unshare(CLONE_NEWTIME) +
setns(CLONE_NEWTIME) sequence can be used to change the caller's time
namespace. Is this intended?
Here's some code where you can verify this (please excuse the awful code
I'm using to illustrate this):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#ifndef CLONE_NEWTIME
	#define CLONE_NEWTIME 0x00000080
	#endif

	int main(int argc, char *argv[])
	{
		/* Zero-initialized because readlink() does not NUL-terminate. */
		char buf1[4096] = "", buf2[4096] = "";

		if (unshare(CLONE_NEWTIME))
			exit(1);

		/* Still refers to the caller's current time namespace. */
		int fd = open("/proc/self/ns/time", O_RDONLY);
		if (fd < 0)
			exit(2);

		readlink("/proc/self/ns/time", buf1, sizeof(buf1) - 1);
		readlink("/proc/self/ns/time_for_children", buf2, sizeof(buf2) - 1);
		printf("unshare(CLONE_NEWTIME): time(%s) ~= time_for_children(%s)\n", buf1, buf2);

		if (setns(fd, CLONE_NEWTIME))
			exit(3);

		readlink("/proc/self/ns/time", buf1, sizeof(buf1) - 1);
		readlink("/proc/self/ns/time_for_children", buf2, sizeof(buf2) - 1);
		printf("setns(self, CLONE_NEWTIME): time(%s) == time_for_children(%s)\n", buf1, buf2);

		exit(EXIT_SUCCESS);
	}

which gives:

	root@f2-vm:/# ./test
	unshare(CLONE_NEWTIME): time(time:[4026531834]) ~= time_for_children(time:[4026532366])
	setns(self, CLONE_NEWTIME): time(time:[4026531834]) == time_for_children(time:[4026531834])

Why is unshare(CLONE_NEWTIME) blocked from changing the caller's time
namespace when setns(CLONE_NEWTIME) is allowed to do this?

Christian