From: ebiederm@xmission.com (Eric W. Biederman)
To: Steve Grubb
Cc: Paul Moore, Richard Guy Briggs, containers@lists.linux-foundation.org,
    linux-kernel@vger.kernel.org, linux-audit@redhat.com,
    eparis@parisplace.org, arozansk@redhat.com, serge@hallyn.com,
    zohar@linux.vnet.ibm.com, viro@zeniv.linux.org.uk,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    netdev@vger.kernel.org
Subject: Re: [PATCH V6 05/10] audit: log creation and deletion of namespace instances
Date: Fri, 15 May 2015 09:51:09 -0500
Message-ID: <87iobtyjvm.fsf@x220.int.ebiederm.org>
In-Reply-To: <2519397.QJshNan19e@x2> (Steve Grubb's message of "Fri, 15 May 2015 09:17:24 -0400")
References: <12675437.ssZNCck7zG@sifl> <87bnhmbp8e.fsf@x220.int.ebiederm.org> <2519397.QJshNan19e@x2>

Steve Grubb writes:

> On Thursday, May 14, 2015 08:31:45 PM Eric W. Biederman wrote:
>> Paul Moore writes:
>> > As Eric, and others, have stated, the container concept is a userspace
>> > idea, not a kernel idea; the kernel only knows, and cares about,
>> > namespaces. This is unlikely to change.
>> >
>> > However, as Steve points out, there is precedent for the kernel to record
>> > userspace tokens for the sake of audit. Personally I'm not a big fan of
>> > this in general, but I do recognize that it does satisfy a legitimate
>> > need. Think of things like auid and the sessionid as necessary evils;
>> > audit is already chock full of evilness, and I doubt one more will doom
>> > us all to hell.
>> >
>> > Moving forward, I'd like to see the following:
>> >
>> > * Create a container ID token (unsigned 32-bit integer?), similar to
>> > auid/sessionid, that is set by userspace and carried by the kernel to be
>> > used in audit records. I'd like to see some discussion on how we manage
>> > this, e.g. how do we handle container ID inheritance, how do we handle
>> > nested containers (setting the containerid when it is already set), do we
>> > care if multiple different containers share the same namespace config,
>> > etc.?
>> >
>> > Can we all live with this? If not, please suggest some alternate ideas;
>> > simply shouting "IT'S ALL CRAP!" isn't helpful for anyone ... it may be
>> > true, but it doesn't help us solve the problem ;)
>>
>> Without stopping and defining what someone means by container I think it
>> is pretty much nonsense.
>
> Maybe this is what's hanging everyone up? It's easy to get lost when your view
> is down at the syscall level and what is happening in the kernel. Starting a
> container is akin to the idea of login. Not every call to setresuid is a
> login. It could be a setuid program starting or a daemon dropping privileges.
> The idea of a container is a higher level concept than starting a namespace.
> I think comparing a login with a container is a useful analogy because both
> are higher level concepts but employ low level ideas. A login is a collection
> of chdir, setuid, setgid, allocating a tty, associating the first 3 file
> descriptors, setting a process group, and starting a specific executable.
> None of these low level concepts is special by itself.

Except login and setresuid are privileged operations.

CREATING A CONTAINER IS NOT A PRIVILEGED OPERATION.

Your analogy fails rather badly with respect to that fact.

> A container is what we need auditing events around, not creation of namespaces.
> If we want creation of namespaces, we can audit the clone/unshare/setns
> syscalls.
> The container is when a managing program such as docker, lxc, or
> sometimes systemd creates a special operating environment for the express
> purpose of running programs disassociated in some way from the parent
> namespaces, cgroups, and security assumptions. It's this orchestration, just as
> sshd orchestrates a login, that makes it different.

What do you define as a container? From what I can tell we share a
similar understanding of the term, and running lxc is not a privileged
operation. Running sandstorm.io is not a privileged operation.

>> Should every vsftp connection get a container? Every chrome tab?
>
> No. Also, note that not every program that grants a user session constitutes a
> login.

>> At some of the connections per second numbers I have seen we might
>> exhaust a 32bit number in an hour or two. Will any of that make sense
>> to someone reading the audit logs?
>
> I would agree if we were auditing creation of namespaces. But going back to
> the concept of login, these could occur at a high rate. This is a bruteforce
> login attack. We put countermeasures in place to prevent it. But it is
> possible for the session id to wrap. But in our case, things like lxc or
> docker don't start hundreds of these a minute.

Except there are reasonable situations where container creation does
happen at fast rates. Outside of a container per network connection
(which is likely to happen at some point) I have seen builds fire up
more containers than I can count as part of automated testing.

>> Without considering that container creation is an unprivileged
>> operation I think it is pretty much nonsense. Do I get to say I am any
>> container I want? That would seem to invalidate the concept of
>> userspace setting a container id.
>
> It would need to be a privileged operation just as setuid is.

CONTAINER CREATION IS NOT A PRIVILEGED OPERATION.

That is today. That is talking about lxc.

CONTAINER CREATION IS NOT A PRIVILEGED OPERATION.
And ultimately we don't want it to be, as if you can safely create a
container without privilege your system is safer.

>> How does any of this interact with setns? AKA entering a container?
>
> We have to audit this. For the moment, auditing the setns syscall may be
> enough. I'd have to look at the lifecycle of the application that's doing this
> to determine if we need more.

Frequently it will be sysadmins for some arbitrary reason calling
nsenter or a similar program that is more aware of their favorite
container flavor.

>> I will go as far as looking at patches. If someone comes up with
>> a mission statement about what they are actually trying to achieve and a
>> mechanism that actually achieves that, and that allows for containers to
>> nest we can talk about doing something like that.
>
> Auditing wouldn't impose any restrictions on this. We just need a way to
> observe actions within and associate them as needed to investigate violations
> of security policy.

*Rolls eyes*

But the rest of the container tool kit in the kernel will impose
limitations on those identifiers.

>> But for right now I just hear proposals for things that make no sense
>> and can not possibly work. Not least because it will require modifying
>> every program that creates a container and who knows how many of them
>> there are.
>
> We only care about a couple programs doing the orchestration. They will need
> to have the right support added to them. I'm hoping the analogy of a login
> helps demonstrate what we are after.

All I see is that
(a) you have not defined what you see a container as
(b) you have failed to acknowledge I can create a container without
    privilege (which breaks your analogy with login).

But I think I am with Andy. If you only care about privileged events
and privileged containers, it is unlikely you need to do anything in the
kernel and you can perform whatever logging you see fit in your
privileged userspace applications.
Of course in the long run I don't see what good that will do you, as I
expect increasingly there will not need to be any special permissions to
create containers.

Eric