From: Tim Hockin
Date: Fri, 28 Jun 2013 12:48:48 -0700
Subject: Re: cgroup access daemon
To: Serge Hallyn
Cc: Mike Galbraith, Tejun Heo, linux-kernel@vger.kernel.org, Containers,
    Kay Sievers, lpoetter, workman-devel, jpoimboe, dhaval.giani, Cgroups,
    vrigo, vmarmol
In-Reply-To: <20130628192117.GA4553@sergelap>
References: <20130627181108.GA26334@sergelap>
    <20130628163154.GA4989@sergelap> <20130628192117.GA4553@sergelap>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 28, 2013 at 12:21 PM, Serge Hallyn wrote:
> Quoting Tim Hockin (thockin@hockin.org):
>> On Fri, Jun 28, 2013 at 9:31 AM, Serge Hallyn wrote:
>> > Quoting Tim Hockin (thockin@hockin.org):
>> >> On Thu, Jun 27, 2013 at 11:11 AM, Serge Hallyn wrote:
>> >> > Quoting Tim Hockin (thockin@hockin.org):
>> > Could you give examples?
>> >
>> > If you have a white/academic paper I should go read, that'd be great.
>>
>> We don't have anything on this, but examples may help.
>>
>> Someone running as root should be able to connect to the "native"
>> daemon and read or write any cgroup file they want, right? You could
>> argue that root should be able to do this to a child-daemon, too, but
>> let's ignore that.
>>
>> But inside a container, I don't want the users to be able to write to
>> anything in their own container. I do want them to be able to make
>> sub-cgroups, but only 5 levels deep.
>> For sub-cgroups, they should be able to write to memory.limit_in_bytes,
>> to read but not write memory.soft_limit_in_bytes, and not be able to
>> read memory.stat.
>>
>> To get even fancier, a user should be able to create a sub-cgroup and
>> then designate that sub-cgroup as "final" - no further sub-sub-cgroups
>> allowed under it. They should also be able to designate that a
>> sub-cgroup is "one-way" - once a process enters it, it can not leave.
>>
>> These are real(ish) examples based on what people want to do today.
>> In particular, the last couple are things that we want to do, but
>> don't do today.
>>
>> The particular policy can differ per-container. Production jobs might
>> be allowed to create sub-cgroups, but batch jobs are not. Some user
>> jobs are designated "trusted" in one facet or another and get more
>> (but still not full) access.
>
> Interesting, thanks.
>
> I'll think a bit on how to best address these.
>
>> > At the moment I'm going on the naive belief that proper hierarchy
>> > controls will be enforced (eventually) by the kernel - i.e. if
>> > a task in cgroup /lxc/c1 is not allowed to mknod /dev/sda1, then it
>> > won't be possible for /lxc/c1/lxc/c2 to take that access.
>> >
>> > The native cgroup manager (the one using cgroupfs) will be checking
>> > the credentials of the requesting child manager for access(2) to
>> > the cgroup files.
>>
>> This might be sufficient, or the basis for a sufficient access control
>> system for users. The problem is that we have multiple jobs on a
>> single machine running as the same user. We need to ensure that the
>> jobs can not modify each other.
>
> Would running them each in user namespaces with different mappings (all
> jobs running as uid 1000, but uid 1000 mapped to different host uids
> for each job) be (long-term) feasible?

Possibly.
It's a largish imposition to make on the caller (we don't use user
namespaces today, though we are evaluating how to start using them),
but perhaps not terrible.

>> > It is a named socket.
>>
>> So anyone can connect? Even with SO_PEERCRED, how do you know which
>> branches of the cgroup tree I am allowed to modify if the same user
>> owns more than one?
>
> I was assuming that any process requesting management of
> /c1/c2/c3 would have to be in one of its ancestor cgroups (i.e. /c1).
>
> So if you have two jobs running as uid 1000, one under /c1 and one
> under /c2, and one as uid 1001 running under /c3 (with the uids owning
> the cgroups), then the file permissions will prevent 1000 and 1001
> from walking over each other, while the cgroup manager will not allow
> a process (child manager or otherwise) under /c1 to manage cgroups
> under /c2 and vice versa.
>
>> >> Do you have a design spec, or a requirements list, or even a prototype
>> >> that we can look at?
>> >
>> > The readme at https://github.com/hallyn/cgroup-mgr/blob/master/README
>> > shows what I have in mind. It (and the sloppy code next to it)
>> > represents a few hours' work over the last few days while waiting
>> > for compiles and in between emails...
>>
>> Awesome. Do you mind if we look?
>
> No, but it might not be worth it (other than the readme) :) - so far
> it's only served to help me think through what I want and need from
> the mgr.
>
>> > But again, it is completely predicated on my goal to have libvirt
>> > and lxc (and other cgroup users) be able to use the same library
>> > or API to make their requests whether they are on host or in a
>> > container, and regardless of the distro they're running under.
>>
>> I think that is a good goal. We'd like to not be different, if
>> possible. Obviously, we can't impose our needs on you if you don't
>> want to handle them. It sounds like what you are building is the
>> bottom layer in a stack - we (Google) should use that same bottom
>> layer.
>> But that can only happen if you're open to hearing our
>> requirements. Otherwise we have to strike out on our own or build
>> more layers in-between.
>
> I'm definitely open to your requirements - whether providing what
> you need for another layer on top, or building it right in.

Great. That's a good place to start :)
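For illustration, the access rules discussed in this thread (Serge's rule that only a process in an ancestor cgroup may manage a cgroup, plus Tim's example container policy) could be sketched as a small policy check. This is a hypothetical sketch, not code from cgroup-mgr or any existing daemon: ContainerPolicy, within(), may_create(), may_access() and may_move() are invented names. It assumes the manager already knows the requester's cgroup path, presumably by taking the peer pid from SO_PEERCRED and looking it up in /proc/<pid>/cgroup, and it only models the permission decisions, not the cgroupfs writes themselves.

```python
import posixpath
from dataclasses import dataclass, field


def _parts(path):
    """Split a cgroup path into components, resolving '.' and '..' first."""
    return [p for p in posixpath.normpath(path).split("/") if p]


def _norm(path):
    """Canonical absolute form of a cgroup path, e.g. '/c1//a/' -> '/c1/a'."""
    return "/" + "/".join(_parts(path))


def within(ancestor, path):
    """True if `path` is `ancestor` itself or lies somewhere below it."""
    a, p = _parts(ancestor), _parts(path)
    return p[:len(a)] == a


@dataclass
class ContainerPolicy:
    """Hypothetical per-container policy modeled on the examples in this thread."""
    root: str                                  # the container's own cgroup
    max_depth: int = 5                         # sub-cgroup levels allowed below root
    writable: frozenset = frozenset({"memory.limit_in_bytes"})
    readable: frozenset = frozenset({"memory.limit_in_bytes",
                                     "memory.soft_limit_in_bytes"})
    final: set = field(default_factory=set)    # cgroups that may not get children
    one_way: set = field(default_factory=set)  # cgroups a task may enter but not leave

    def may_create(self, requester, new_cgroup):
        """A task in cgroup `requester` asks to create `new_cgroup`."""
        parts = _parts(new_cgroup)
        parent = "/" + "/".join(parts[:-1])
        depth = len(parts) - len(_parts(self.root))
        return (within(requester, parent)      # requester is an ancestor of new_cgroup
                and within(self.root, parent)  # never outside this container
                and 1 <= depth <= self.max_depth
                and parent not in {_norm(f) for f in self.final})

    def may_access(self, requester, cgroup, filename, write):
        """A task in `requester` asks to read (write=False) or write a control file."""
        if not (within(self.root, cgroup) and within(requester, cgroup)):
            return False
        if write:
            # Writes need a strict ancestor and never touch the container's own
            # cgroup ("nothing in their own container").
            target = _norm(cgroup)
            return (target != _norm(requester) and target != _norm(self.root)
                    and filename in self.writable)
        # Assumption: reads are allowed on the requester's own cgroup too.
        return filename in self.readable

    def may_move(self, current, dest):
        """A task in cgroup `current` asks to move to `dest`."""
        if not within(self.root, dest):
            return False
        # Assumption: inside a one-way cgroup, moves deeper into its subtree are fine.
        return all(within(ow, dest) for ow in self.one_way if within(ow, current))
```

Paths are compared component by component after normalization, so /c1/../c2/x cannot be used to escape the container and /c10 is not mistaken for a child of /c1. Allowing reads of a task's own cgroup and moves deeper inside a one-way cgroup are choices the thread leaves open.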