Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754150AbcJETJQ (ORCPT ); Wed, 5 Oct 2016 15:09:16 -0400 Received: from mail-pa0-f67.google.com ([209.85.220.67]:36021 "EHLO mail-pa0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752265AbcJETJN (ORCPT ); Wed, 5 Oct 2016 15:09:13 -0400 Date: Wed, 5 Oct 2016 12:09:09 -0700 From: Dmitry Torokhov To: John Stultz Cc: lkml , Colin Cross , Tejun Heo , Li Zefan , Jonathan Corbet , cgroups@vger.kernel.org, Android Kernel Team , Rom Lemarchand , Dmitry Shmidt , Todd Kjos , Christian Poetzsch , Amit Pundir , Ricky Zhou Subject: Re: [PATCH 1/2] cgroup: Add generic cgroup subsystem permission checks Message-ID: <20161005190909.GA31873@dtor-ws> References: <1475556090-6278-1-git-send-email-john.stultz@linaro.org> <1475556090-6278-2-git-send-email-john.stultz@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1475556090-6278-2-git-send-email-john.stultz@linaro.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5769 Lines: 158 [ Some comments are form Ricky Zhou , some from myself ] On Mon, Oct 03, 2016 at 09:41:29PM -0700, John Stultz wrote: > From: Colin Cross > > Rather than using explicit euid == 0 checks when trying to move > tasks into a cgroup, move permission checks into each specific > cgroup subsystem. If a subsystem does not specify a 'allow_attach' > handler, then we fall back to doing the checks the old way. > > This patch adds a 'allow_attach' handler instead of reusing the > 'can_attach' handler, since if the 'can_attach' handler was > reused, a new cgroup that implements 'can_attach' but not the > permission checks could end up with no permission checks at all. > > This also includes folded down fixes from: > Christian Poetzsch > Amit Pundir > > Cc: Tejun Heo > Cc: Li Zefan > Cc: Jonathan Corbet > Cc: cgroups@vger.kernel.org > Cc: Android Kernel Team > Cc: Rom Lemarchand > Cc: Colin Cross > Cc: Dmitry Shmidt > Cc: Todd Kjos > Cc: Christian Poetzsch > Cc: Amit Pundir > Original-Author: San Mehat > Signed-off-by: Colin Cross > [jstultz: Rewording of commit message, folded down fixes] > Signed-off-by: John Stultz > --- > Documentation/cgroup-v1/cgroups.txt | 9 +++++++++ > include/linux/cgroup-defs.h | 1 + > kernel/cgroup.c | 40 +++++++++++++++++++++++++++++++++++-- > 3 files changed, 48 insertions(+), 2 deletions(-) > > diff --git a/Documentation/cgroup-v1/cgroups.txt b/Documentation/cgroup-v1/cgroups.txt > index 308e5ff..295f026 100644 > --- a/Documentation/cgroup-v1/cgroups.txt > +++ b/Documentation/cgroup-v1/cgroups.txt > @@ -578,6 +578,15 @@ is completely unused; @cgrp->parent is still valid. (Note - can also > be called for a newly-created cgroup if an error occurs after this > subsystem's create() method has been called for the new cgroup). > > +int allow_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) > +(cgroup_mutex held by caller) > + > +Called prior to moving a task into a cgroup; if the subsystem > +returns an error, this will abort the attach operation. Used > +to extend the permission checks - if all subsystems in a cgroup > +return 0, the attach will be allowed to proceed, even if the > +default permission check (root or same user) fails. > + > int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) > (cgroup_mutex held by caller) > > diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h > index 5b17de6..0f4548c 100644 > --- a/include/linux/cgroup-defs.h > +++ b/include/linux/cgroup-defs.h > @@ -441,6 +441,7 @@ struct cgroup_subsys { > void (*css_free)(struct cgroup_subsys_state *css); > void (*css_reset)(struct cgroup_subsys_state *css); > > + int (*allow_attach)(struct cgroup_taskset *tset); > int (*can_attach)(struct cgroup_taskset *tset); > void (*cancel_attach)(struct cgroup_taskset *tset); > void (*attach)(struct cgroup_taskset *tset); > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index d6b729b..e6afe2d 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -2833,6 +2833,25 @@ static int cgroup_attach_task(struct cgroup *dst_cgrp, > return ret; > } > > +static int cgroup_allow_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) > +{ > + struct cgroup_subsys_state *css; > + int i; > + int ret; > + > + for_each_css(css, i, cgrp) { > + if (css->ss->allow_attach) { > + ret = css->ss->allow_attach(tset); > + if (ret) > + return ret; > + } else { > + return -EACCES; > + } > + } > + > + return 0; > +} > + > static int cgroup_procs_write_permission(struct task_struct *task, > struct cgroup *dst_cgrp, > struct kernfs_open_file *of) > @@ -2847,8 +2866,25 @@ static int cgroup_procs_write_permission(struct task_struct *task, > */ > if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) && > !uid_eq(cred->euid, tcred->uid) && > - !uid_eq(cred->euid, tcred->suid)) > - ret = -EACCES; > + !uid_eq(cred->euid, tcred->suid)) { > + /* > + * if the default permission check fails, give each > + * cgroup a chance to extend the permission check > + */ > + struct cgroup_taskset tset = { > + .src_csets = LIST_HEAD_INIT(tset.src_csets), > + .dst_csets = LIST_HEAD_INIT(tset.dst_csets), > + .csets = &tset.src_csets, > + }; > + struct css_set *cset; > + > + cset = task_css_set(task); Do we need to take css_set_lock here? If not, why? > + list_add(&cset->mg_node, &tset.src_csets); > + ret = cgroup_allow_attach(dst_cgrp, &tset); > + list_del(&tset.src_csets); This should be list_del_init(&cset->mg_node); since you are deleting task's cset from the tset list, not other way around. It only happen to work because there is exactly 1 member in tset.src_csets and list_del done on it is exactly list_del_init on the node, so you are not leaving with uncorrupted mg_node in task's cset. > + if (ret) > + ret = -EACCES; > + } > > if (!ret && cgroup_on_dfl(dst_cgrp)) { > struct super_block *sb = of->file->f_path.dentry->d_sb; Isn't this, generally speaking, racy? We take current task's cset and check if we have rights to move it over. But we do not have any locking between check and actual move, so can the cset change between these 2 operations? And if cset can't really change and it is only 1 task, then why do we bother with forming taskset at all? Can we make allow_attach take just the target task argument? Thanks. -- Dmitry