Date: Tue, 4 Oct 2016 19:38:33 -0500
From: "Serge E. Hallyn"
To: John Stultz
Cc: lkml, Tejun Heo, Li Zefan, Jonathan Corbet, cgroups@vger.kernel.org, Android Kernel Team, Rom Lemarchand, Colin Cross, Dmitry Shmidt, Todd Kjos, Christian Poetzsch, Amit Pundir, "Serge E. Hallyn"
Subject: Re: [RFC][PATCH] cgroup: Add new capability to allow a process to migrate other tasks between cgroups
Message-ID: <20161005003833.GA29239@mail.hallyn.com>
References: <1475626874-22949-1-git-send-email-john.stultz@linaro.org>
In-Reply-To: <1475626874-22949-1-git-send-email-john.stultz@linaro.org>

Quoting John Stultz (john.stultz@linaro.org):
> This patch adds CAP_CGROUP_MIGRATE_TASK and logic to allow a process
> to migrate other tasks between cgroups.
>
> In Android (where this feature originated), the ActivityManager tracks
> various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
> etc.), and then as applications change states, the SchedPolicy logic
> will migrate the application tasks between different cgroups used
> to control the different application states (for example, there is a
> background cpuset cgroup which can limit background tasks to stay
> on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
> then further limit those background tasks to a small percentage of
> that one cpu's cpu time).
>
> However, for security reasons, Android doesn't want to make the
> system_server (the process that runs the ActivityManager and
> SchedPolicy logic) run as root. So in the Android common.git
> kernel, they have some logic to allow cgroups to loosen their
> permissions so CAP_SYS_NICE tasks can migrate other tasks between
> cgroups.
>
> The approach taken there overloads CAP_SYS_NICE a bit much, and
> is maybe more complicated than needed.
>
> So this patch, as suggested by Tejun, simply adds a new process
> capability flag (CAP_CGROUP_MIGRATE_TASK), and uses it when checking

So realistically, what can a holder of this capability do? Freeze
tasks, change cpu/memory limits, change network and disk throughput,
forbid forking, and (most importantly) forbid access to certain
devices. I think that's all OK. (And we still separately check for
inode write permissions.)

If anything, I'd say the GLOBAL_ROOT_UID check could be taken out,
since otherwise a host-root task effectively cannot drop this
capability: it passes the euid check whether or not it holds
CAP_CGROUP_MIGRATE_TASK.

> if a task can migrate other tasks between cgroups.
>
> I've tested this with AOSP master (though it's a bit hacked in as I
> still need to properly get the selinux bits aware of the new
> capability bit) with selinux set to permissive and it seems to be
> working well.
>
> Thoughts and feedback would be appreciated!
>
> Cc: Tejun Heo
> Cc: Li Zefan
> Cc: Jonathan Corbet
> Cc: cgroups@vger.kernel.org
> Cc: Android Kernel Team
> Cc: Rom Lemarchand
> Cc: Colin Cross
> Cc: Dmitry Shmidt
> Cc: Todd Kjos
> Cc: Christian Poetzsch
> Cc: Amit Pundir
> Cc: Serge E. Hallyn

Acked-by: Serge Hallyn

> Signed-off-by: John Stultz
> ---
>  include/uapi/linux/capability.h | 5 ++++-
>  kernel/cgroup.c                 | 3 ++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 49bc062..e199ea0 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -349,8 +349,11 @@ struct vfs_cap_data {
>
>  #define CAP_AUDIT_READ 37
>
> +/* Allow migrating tasks between cgroups */
>
> -#define CAP_LAST_CAP CAP_AUDIT_READ
> +#define CAP_CGROUP_MIGRATE_TASK 38
> +
> +#define CAP_LAST_CAP CAP_CGROUP_MIGRATE_TASK
>
>  #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 9ba28310..a318956 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2847,7 +2847,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
>  	 */
>  	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
>  	    !uid_eq(cred->euid, tcred->uid) &&
> -	    !uid_eq(cred->euid, tcred->suid))
> +	    !uid_eq(cred->euid, tcred->suid) &&
> +	    !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE_TASK))
>  		ret = -EACCES;
>
>  	if (!ret && cgroup_on_dfl(dst_cgrp)) {
> --
> 1.9.1
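To make the effect of the one-line change in cgroup_procs_write_permission() concrete, here is a minimal userspace sketch of the patched uid/capability test. It is a model, not kernel code. Plain uid_t comparisons stand in for uid_eq(). The boolean has_migrate_cap stands in for ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE_TASK), meaning "the writing task holds the new capability in the target's user namespace". The function, macro, and parameter names are hypothetical, and the uids in the examples are illustrative. The model also shows the GLOBAL_ROOT_UID point above: a writer with euid 0 passes whether or not it holds the capability.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical userspace model of the uid/capability test in
 * cgroup_procs_write_permission() after this patch. Not kernel code:
 * uid_eq() becomes ==, and has_migrate_cap stands in for
 * ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE_TASK) evaluated for
 * the writing task. Later checks (e.g. the cgroup_on_dfl() branch)
 * are not modelled.
 */
#define MODEL_GLOBAL_ROOT_UID ((uid_t)0)

static int model_procs_write_permission(uid_t writer_euid,
					uid_t target_uid,
					uid_t target_suid,
					bool has_migrate_cap)
{
	/* Deny only if every alternative fails, mirroring the && chain. */
	if (writer_euid != MODEL_GLOBAL_ROOT_UID &&
	    writer_euid != target_uid &&
	    writer_euid != target_suid &&
	    !has_migrate_cap)
		return -EACCES;
	return 0;
}

/* Illustrative cases; the uids are arbitrary example values. */
static void model_examples(void)
{
	/* Non-root writer, unrelated target, no capability: denied. */
	assert(model_procs_write_permission(1000, 10050, 10050, false) == -EACCES);
	/* Same writer holding the new capability: allowed. */
	assert(model_procs_write_permission(1000, 10050, 10050, true) == 0);
	/* Writer's euid matches the target's saved uid: allowed. */
	assert(model_procs_write_permission(1000, 10050, 1000, false) == 0);
	/* euid 0 passes even without the capability, so a root task
	 * cannot give up this permission by dropping the capability. */
	assert(model_procs_write_permission(0, 10050, 10050, false) == 0);
}
```

In this model, removing the GLOBAL_ROOT_UID clause would make the last example return -EACCES unless has_migrate_cap is true, which is the behaviour suggested above.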