From: ebiederm@xmission.com (Eric W. Biederman)
To: "Serge E. Hallyn"
Cc: lkml, Seth Forshee, Stéphane Graber, serge@hallyn.com, Andy Lutomirski
Subject: Re: user namespace and fully visible proc and sys mounts
Date: Sun, 06 Mar 2016 15:53:40 -0600
Message-ID: <87oaar2ryz.fsf@x220.int.ebiederm.org>
In-Reply-To: <20160306082820.GA1917@mail.hallyn.com>
References: <20160306082820.GA1917@mail.hallyn.com>
X-Mailing-List: linux-kernel@vger.kernel.org

"Serge E. Hallyn" writes:

> Hi,
>
> So we've been over this many times... but unfortunately there is more
> breakage to report. Regular privileged and unprivileged containers
> work all right for us. But running an unprivileged container inside a
> privileged container is blocked.
>
> When creating privileged containers, lxc by default does a few things:
> it mounts some fuse.lxcfs files over procfiles include /proc/meminfo and
> /proc/uptime. It mounts proc rw but /proc/sysrq-trigger ro as well as
> moves /proc/sys/net out of the way, bind-mounts /proc/sys readonly
> (because this container is not in a user namespace) then moves
> /proc/sys/net back. Finally it mounts sys ro but bind-mounts
> /sys/devices/virtual/net as writeable.
>
> If any of these are left enabled, unprivileged containers can't be
> started. If all are disabled, then they can be.
>
> Can we find a way to make these not block remounts in child user
> namespaces? A boot flag, a procfs and sysfs mount option, a sysctl?

Are any of these overmounts done for the purpose of security? It
appears that /proc/sys and /sys are being made read-only for that
purpose.

If none of the mounts are for security, the easy solution that works
today is to also mount /proc and /sys somewhere else in your container
so that the permission check for mounting a new copy passes.

That said, /proc/sys appears to be a show stopper in this scheme. As
the root of your privileged container can enter your unprivileged
container, it can bypass your read-only /proc/sys by mounting a new copy
of proc if we allow the relaxation you are requesting.

Therefore the only choice on the table (and I don't have a clue how
realistic it is) is to have a variant of proc with just files describing
processes. Call it processfs. That would not need the current
restrictions.

As for sysfs, I am drawing a blank about what might be possible.

Eric