Date: Thu, 4 Jun 2020 23:32:20 +0200
From: Christian Brauner
To: "Eric W. Biederman"
Cc: Alexey Gladkov, Kees Cook, Linux Containers, LKML, Alexander Viro, Andy Lutomirski, Linux FS Devel, Alexey Gladkov, Alexey Dobriyan, Stéphane Graber
Subject: Re: [PATCH 0/2] proc: use subset option to hide some top-level procfs entries
Message-ID: <20200604213220.grcaldlxz54jyd3o@wittgenstein>
References: <20200604200413.587896-1-gladkov.alexey@gmail.com> <87ftbah8q2.fsf@x220.int.ebiederm.org>
In-Reply-To: <87ftbah8q2.fsf@x220.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 04, 2020 at 03:33:25PM -0500, Eric W. Biederman wrote:
> Alexey Gladkov writes:
>
> > Greetings!
> >
> > Preface
> > -------
> > This patch set can be applied over:
> >
> > git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git d35bec8a5788
>
> I am not going to seriously look at this for merging until after the
> merge window closes.
>
> Have you thought about the possibility of relaxing the permission checks
> to mount proc such that we don't need to verify there is an existing
> mount of proc? With just the subset pids I think this is feasible. It
> might not be worth it at this point, but it is definitely worth asking
> the question. As one of the benefits early proponents of the idea of a
> subset of proc touted was that they would not be as restricted as they
> are with today's proc.
>
> I ask because this has a bearing on the other options you are playing
> with.
>
> Do we want to find a way to have the benefit of relaxed permission
> checks while still including a few more files.
>
> > Overview
> > --------
> > Directories and files can be created and deleted by dynamically loaded modules.
> > Not all of these files are virtualized and safe inside the container.
> >
> > However, subset=pid is not enough because many containers want to have
> > /proc/meminfo, /proc/cpuinfo, etc. We need a way to limit the visibility of
> > files per procfs mountpoint.
>
> Is it desirable to have meminfo and cpuinfo as they are today or do
> people want them to reflect the ``container'' context. So that
> applications like the JVM don't allocate too many cpus or don't try
> and consume too much memory, or run on nodes that cgroups currently make
> unavailable.
>
> Are there any users or planned users of this functionality yet?
>
> I am concerned that you might be adding functionality that no one will
> ever use that will just add code to the kernel that no one cares about,
> that will then accumulate bugs. Having had to work through a few of
> those cases to make each mount of proc have its own super block I am
> not a great fan of adding another one.
>
> If the runc, lxc and other container runtime folks can productively use
> such an option to do useful things and they are sensible things to do I
> don't have any fundamental objection. But I do want to be certain this
> is a feature that is going to be used.

I'm not sure Alexey is introducing virtualized meminfo and cpuinfo (but
I haven't had time to look at this patchset). In any case, we are
currently virtualizing:

/proc/cpuinfo
/proc/diskstats
/proc/loadavg
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime

for each container with a tiny in-userspace filesystem LXCFS
( https://github.com/lxc/lxcfs )
and have been doing that for years.

Having meminfo and cpuinfo virtualized in procfs is something we have
been wanting for a long time, and there were patches by other people
(from Siteground, I believe) to achieve this a few years back, but they
were disregarded.

I think meminfo and cpuinfo would already be great. And if we're
virtualizing cpuinfo we also need to virtualize the cpu bits exposed in
/proc/stat. It would also be great to virtualize /proc/uptime. Right now
we're achieving this essentially by subtracting the time the init
process of the pid namespace has started since system boot time, minus
the time when the system started, to get the actual reaper age (it's a
bit more involved but that's the gist).

This is all on the topic list for this year's virtual containers
microconference at Plumbers and I would suggest we try to discuss the
various requirements for something like this there. (I'm about to send
the CFP out.)

Christian
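
For illustration only, the uptime calculation described above can be
sketched roughly as follows. This is not the LXCFS code, which is more
involved; it is one reading of "reaper age" as system uptime minus the
start time of the pid namespace's init process. The function names are
made up for this sketch. It assumes the documented proc(5) layout:
the first value in /proc/uptime is seconds since boot, and field 22 of
/proc/<pid>/stat is the process start time in clock ticks since boot.

```python
import os


def parse_uptime(uptime_text: str) -> float:
    """Return system uptime in seconds from the contents of /proc/uptime."""
    return float(uptime_text.split()[0])


def parse_starttime_ticks(stat_text: str) -> int:
    """Return field 22 (starttime, in clock ticks) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split only after the last ')'.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def reaper_age(uptime_text: str, stat_text: str, ticks_per_second: int) -> float:
    """Seconds since the reaper (init of the pid namespace) started."""
    started_after_boot = parse_starttime_ticks(stat_text) / ticks_per_second
    # Clamp at zero in case the two files were sampled at slightly
    # different moments.
    return max(0.0, parse_uptime(uptime_text) - started_after_boot)


def reaper_age_from_proc(pid: int) -> float:
    """Hypothetical helper: same calculation, reading the live /proc (Linux only)."""
    with open("/proc/uptime") as f:
        uptime_text = f.read()
    with open(f"/proc/{pid}/stat") as f:
        stat_text = f.read()
    return reaper_age(uptime_text, stat_text, os.sysconf("SC_CLK_TCK"))
```

Inside a container, reaper_age_from_proc(1) would then give an
approximate per-container uptime, assuming pid 1 as seen from there is
the init process of that pid namespace.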