From: ebiederm@xmission.com (Eric W. Biederman)
To: "Serge E. Hallyn"
Cc: Tejun Heo, linux-api@vger.kernel.org, adityakali@google.com, Linux Containers, cgroups@vger.kernel.org, lkml
Subject: Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field
Date: Thu, 14 Apr 2016 11:12:42 -0500
Message-ID: <877fg06uf9.fsf@x220.int.ebiederm.org>
In-Reply-To: <20160414152747.GA12700@mail.hallyn.com> (Serge E. Hallyn's message of "Thu, 14 Apr 2016 10:27:47 -0500")
References: <20160321234133.GA22463@mail.hallyn.com> <20160413175736.GC3676@htj.duckdns.org> <20160414040436.GA3739@mail.hallyn.com> <87oa9c6ymf.fsf@x220.int.ebiederm.org> <20160414152747.GA12700@mail.hallyn.com>

"Serge E. Hallyn" writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> "Serge E. Hallyn" writes:
>>
>> > This is so that userspace can distinguish a mount made in a cgroup
>> > namespace from a bind mount from a cgroup subdirectory.
>>
>> To do that do you need to print the path, or is an extra option that
>> reveals nothing except that it was a cgroup mount sufficient?
>>
>> Is there any practical difference between a mount in a namespace and a
>> bind mount?
>>
>> Given the way the conversation has been going I think it would be good
>> to see the answers to these questions. Perhaps I missed it but I
>> haven't seen the answers to those questions.
>
> Yup, I tried to answer those in my last email, let me try again.
>
> Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts
> freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1,
> and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1).
> In that container, I start another container x1, not using cgroup
> namespaces. It also wants a cgroup mount, and a common way to handle
> that (to prevent the container rewriting its limits) is to mount a tmpfs
> at /sys/fs/cgroup, create /sys/fs/cgroup/lxc/x1, and bind mount
> /sys/fs/cgroup/lxc/x1 from the parent container onto
> /sys/fs/cgroup/lxc/x1 in the child container.
> Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1,
> with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task
> in that container will show '/lxc/x1'. Unless it has been moved into
> /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)...
> Every time I've thought "maybe we can just..." I've found a case where it
> wouldn't work.
>
> At first in lxc we simply said if /proc/self/ns/cgroup exists assume that
> the cgroupfs mounts are not bind mounts. However, old userspace (and
> container drivers) on new kernels is certainly possible, especially an
> older distro in a container on a newer distro on the host. That completely
> breaks with this approach.
>
> I also personally think there *is* value in letting a task know its
> place on the system, so hiding the full cgroup path is imo not only not
> a valid goal, it's counter-productive. Part of making for better
> virtualization is to give userspace all the info it needs about its
> current limits. Consider that with the unified hierarchy, you cannot
> have tasks in a cgroup that also has child cgroups - except for the
> root. Cgroup namespaces do not make an exception for this, so knowing
> that you are not in the absolute cgroup root actually can prevent you
> from trying something that cannot work. Or, I suppose, at least
> understanding why you're unable to do what you're trying to do (namely
> your container manager messed up).
> I point this out because finding a way to only show the namespaced
> root in field 3 of mountinfo would fix the base problem, but at the
> cost of hiding useful information from a container.

It is just the superblock show_path method. And regardless of the
usefulness of the rest of your mount option, implementing show_path
appears to be fundamentally the right thing in this context, as that
field appears to have the same issue as /proc/self/cgroup.

Eric
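
For illustration only, and not part of the patch under discussion: a
minimal userspace sketch in Python of the comparison described in the
quoted text, assuming the /proc/[pid]/mountinfo layout documented in
proc(5). What the thread calls "field 3" is taken here to be the root
field, index 3 when counting from zero. The names CgroupMount,
parse_cgroup_mount and task_cgroup_path, and both sample lines, are
hypothetical and only mirror the nested /lxc/x1 example.

```python
from typing import NamedTuple, Optional


class CgroupMount(NamedTuple):
    root: str         # mountinfo root field (index 3, counting from zero)
    mount_point: str  # where the mount is attached (index 4)
    fstype: str       # "cgroup" or "cgroup2"


def parse_cgroup_mount(line: str) -> Optional[CgroupMount]:
    """Parse one mountinfo line; return None unless it is a cgroup mount."""
    fields = line.split()
    try:
        # Optional fields start at index 6 and end at a lone "-" separator.
        sep = fields.index("-", 6)
    except ValueError:
        return None
    if sep + 1 >= len(fields):
        return None
    fstype = fields[sep + 1]
    if fstype not in ("cgroup", "cgroup2"):
        return None
    return CgroupMount(fields[3], fields[4], fstype)


def task_cgroup_path(line: str) -> str:
    """Return the path from one 'id:controllers:path' /proc/self/cgroup line."""
    return line.rstrip("\n").split(":", 2)[2]


# Hypothetical sample lines for the nested container in the example above.
SAMPLE_MOUNTINFO = (
    "100 90 0:25 /lxc/x1/lxc/x1 /sys/fs/cgroup/lxc/x1 "
    "rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer"
)
SAMPLE_PROC_CGROUP = "5:freezer:/lxc/x1"

if __name__ == "__main__":
    mount = parse_cgroup_mount(SAMPLE_MOUNTINFO)
    path = task_cgroup_path(SAMPLE_PROC_CGROUP)
    # The two views disagree, and nothing in either line says whether the
    # root is relative to a cgroup namespace or is a bind-mounted subtree.
    print(mount.root, mount.mount_point, path)
```

With only these two pieces of information, userspace has to guess how
the root field relates to the path in /proc/self/cgroup, which is the
ambiguity the nsroot= field is meant to remove.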