Date: Mon, 17 Aug 2020 19:47:45 +0200
From: Christian Brauner
To: "Eric W.
 Biederman"
Cc: Kirill Tkhai, Andrei Vagin, adobriyan@gmail.com,
 viro@zeniv.linux.org.uk, davem@davemloft.net, akpm@linux-foundation.org,
 areber@redhat.com, serge@hallyn.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Pavel Tikhomirov
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Message-ID: <20200817174745.jssxjdcwoqxeg5pu@wittgenstein>
In-Reply-To: <87d03pb7f2.fsf@x220.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 17, 2020 at 10:48:01AM -0500, Eric W. Biederman wrote:
>
> Creating names in the kernel for namespaces is very difficult and
> problematic. I have not seen anything that looks like all of the
> problems have been solved with restoring these new names.
>
> When your filter for your list of namespaces is user namespace creating
> a new directory in proc is highly questionable.
>
> As everyone uses proc placing this functionality in proc also amplifies
> the problem of creating names.
>
>
> Rather than proc having a way to mount a namespace filesystem filter by
> the user namespace of the mounter likely to have many many fewer
> problems. Especially as we are limiting/not allow new non-process
> things and ideally finding a way to remove the non-process things.
>
>
> Kirill you have a good point that taking the case where a pid namespace
> does not exist in a user namespace is likely quite unrealistic.
>
> Kirill mentioned upthread that the list of namespaces are the list that
> can appear in a container. Except by discipline in creating containers
> it is not possible to know which namespaces may appear in attached to a
> process. It is possible to be very creative with setns, and violate any
> constraint you may have. Which means your filtered list of namespaces
> may not contain all of the namespaces used by a set of processes. This

Indeed. We use setns() quite creatively when intercepting syscalls and
when attaching to a container.

> further argues that attaching the list of namespaces to proc does not
> make sense.
>
> Andrei has a good point that placing the names in a hierarchy by
> user namespace has the potential to create more freedom when
> assigning names to namespaces, as it means the names for namespaces
> do not need to be globally unique, and while still allowing the names
> to stay the same.
>
>
> To recap the possibilities for names for namespaces that I have seen
> mentioned in this thread are:
> - Names per mount
> - Names per user namespace
>
> I personally suspect that names per mount are likely to be so flexibly
> they are confusing, while names per user namespace are likely to be
> rigid, possibly too rigid to use.
>
> It all depends upon how everything is used. I have yet to see a
> complete story of how these names will be generated and used. So I can
> not really judge.

I haven't fully understood what the motivation for this patchset is
either. I can only speak to the use-case I had when I started
prototyping something similar: we needed a view of all namespaces that
exist on the system because we wanted to do namespace debugging on a
live system. This interface could've easily lived in debugfs. The main
point was that it should contain all namespaces.
Note that it wasn't supposed to be a hierarchical format; it was only
meant to list all namespaces and be accessible to real root.

The interface here is way more flexible/complex and I haven't yet
figured out what exactly it is supposed to be used for.

>
>
> Let me add another take on this idea that might give this work a path
> forward. If I were solving this I would explore giving nsfs directories
> per user namespace, and a way to mount it that exposed the directory of
> the mounters current user namespace (something like btrfs snapshots).
>
> Hmm. For the user namespace directory I think I would give it a file
> "ns" that can be opened to get a file handle on the user namespace.
> Plus a set of subdirectories "cgroup", "ipc", "mnt", "net", "pid",
> "user", "uts") for each type of namespace. In each directory I think
> I would just have a 64bit counter and each new entry I would assign the
> next number from that counter.
>
> The restore could either have the ability to rename files or simply the
> ability to bump the counter (like we do with pids) so the names of the
> namespaces can be restored.
>
> That winds up making a user namespace the namespace of namespaces, so
> I am not 100% about the idea.

I think you're right that we need to understand better what the use-case
is. If I understand your suggestion correctly, it wouldn't allow showing
nested user namespaces if the nsfs mount is per-user namespace.

Let me throw in a crazy idea: couldn't we just make the ioctl_ns() walk
a namespace hierarchy? For example, you could pass in a user namespace
fd and then you'd get back a struct with fds for the namespaces owned by
that user namespace, and then you could use NS_GET_USERNS/NS_GET_PARENT
to walk upwards from the user namespace fd passed in initially and so
on? Or something similar/simpler. This would also decouple this from
procfs somewhat.

Christian
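
For illustration only, the per-user-namespace naming scheme quoted above
(one subdirectory per namespace type, a 64bit counter per subdirectory,
and a restore path that bumps the counter) could be modelled in plain
user-space C roughly as follows. Nothing here is kernel code or an
existing interface: struct userns_dir, ns_name_alloc(),
ns_name_restore() and ns_name_path() are hypothetical names made up for
this sketch, and setting the counter directly is just one possible
reading of "bump the counter".

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Namespace types that each get a subdirectory, as listed in the mail. */
enum ns_type {
	NS_CGROUP, NS_IPC, NS_MNT, NS_NET, NS_PID, NS_USER, NS_UTS,
	NS_TYPE_COUNT
};

static const char *const ns_type_name[NS_TYPE_COUNT] = {
	"cgroup", "ipc", "mnt", "net", "pid", "user", "uts"
};

/* Hypothetical model of one user namespace directory: a counter per type. */
struct userns_dir {
	uint64_t next_id[NS_TYPE_COUNT];
};

/* Hand out the next name in the given type subdirectory. */
static uint64_t ns_name_alloc(struct userns_dir *dir, enum ns_type type)
{
	assert((unsigned)type < (unsigned)NS_TYPE_COUNT);
	return dir->next_id[type]++;
}

/*
 * Restore path (assumption): bump the counter so the new entry gets the
 * name "wanted". Returns 0 on success, -1 if the counter has already
 * moved past that name.
 */
static int ns_name_restore(struct userns_dir *dir, enum ns_type type,
			   uint64_t wanted, uint64_t *out)
{
	assert((unsigned)type < (unsigned)NS_TYPE_COUNT);
	if (wanted < dir->next_id[type])
		return -1;
	dir->next_id[type] = wanted;
	*out = ns_name_alloc(dir, type);
	return 0;
}

/* Format an entry's path relative to the user namespace directory. */
static int ns_name_path(char *buf, size_t len, enum ns_type type, uint64_t id)
{
	assert((unsigned)type < (unsigned)NS_TYPE_COUNT);
	return snprintf(buf, len, "%s/%" PRIu64, ns_type_name[type], id);
}
```

In this model names only need to be unique within one user namespace
directory and one type, which matches the point about not needing
globally unique names. A restore that bumps the counter can only move
forward, so entries would have to be restored in increasing order of
their names.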