Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754552AbcJUGmd (ORCPT); Fri, 21 Oct 2016 02:42:33 -0400
Received: from mail-lf0-f66.google.com ([209.85.215.66]:36466 "EHLO
        mail-lf0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1754420AbcJUGma (ORCPT); Fri, 21 Oct 2016 02:42:30 -0400
Subject: Re: [RFC][PATCH v3] cgroup: Use CAP_SYS_RESOURCE to allow a process
        to migrate other tasks between cgroups
To: John Stultz, lkml
References: <1477013079-23692-1-git-send-email-john.stultz@linaro.org>
Cc: mtk.manpages@gmail.com, Tejun Heo, Li Zefan, Jonathan Corbet,
        cgroups@vger.kernel.org, Android Kernel Team, Rom Lemarchand,
        Colin Cross, Dmitry Shmidt, Todd Kjos, Christian Poetzsch,
        Amit Pundir, Dmitry Torokhov, Kees Cook, "Serge E . Hallyn",
        linux-api@vger.kernel.org
From: "Michael Kerrisk (man-pages)"
Message-ID: <937568dd-61b8-ac09-3cdf-789ff14f5423@gmail.com>
Date: Fri, 21 Oct 2016 08:42:22 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
        Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <1477013079-23692-1-git-send-email-john.stultz@linaro.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3358
Lines: 88

Hi John,

On 10/21/2016 03:24 AM, John Stultz wrote:
> This patch adds logic to allow a process to migrate other tasks
> between cgroups if they have CAP_SYS_RESOURCE.

This appears to be a patch against your previous patch, rather than
against mainline. Was that intended?

Cheers,

Michael

> In Android (where this feature originated), the ActivityManager tracks
> various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
> etc), and then as applications change states, the SchedPolicy logic
> will migrate the application tasks between different cgroups used
> to control the different application states (for example, there is a
> background cpuset cgroup which can limit background tasks to stay
> on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
> then further limit those background tasks to a small percentage of
> that one cpu's cpu time).
>
> However, for security reasons, Android doesn't want to make the
> system_server (the process that runs the ActivityManager and
> SchedPolicy logic), run as root. So in the Android common.git
> kernel, they have some logic to allow cgroups to loosen their
> permissions so CAP_SYS_NICE tasks can migrate other tasks between
> cgroups.
>
> I feel the approach taken there overloads CAP_SYS_NICE a bit much,
> and is maybe more complicated than needed.
>
> So this patch, as suggested by Michael Kerrisk, simply adds a
> check for CAP_SYS_RESOURCE.
>
> I've tested this with AOSP master, and this seems to work well
> as Zygote and system_server already use CAP_SYS_RESOURCE. I've
> also submitted patches against the android-4.4 kernel to change
> it to use CAP_SYS_RESOURCE, and the Android developers have
> seemed ok with this change.
>
> Thoughts and feedback would be appreciated!
>
> Cc: Tejun Heo
> Cc: Li Zefan
> Cc: Jonathan Corbet
> Cc: cgroups@vger.kernel.org
> Cc: Android Kernel Team
> Cc: Rom Lemarchand
> Cc: Colin Cross
> Cc: Dmitry Shmidt
> Cc: Todd Kjos
> Cc: Christian Poetzsch
> Cc: Amit Pundir
> Cc: Dmitry Torokhov
> Cc: Kees Cook
> Cc: Serge E. Hallyn
> Cc: linux-api@vger.kernel.org
> Signed-off-by: John Stultz
> ---
> v2: Renamed to just CAP_CGROUP_MIGRATE as recommended by Tejun
> v3: Switched to just using CAP_SYS_RESOURCE as suggested by Michael
> ---
>  kernel/cgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 09f84d2..866059a 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2857,7 +2857,7 @@ static int cgroup_procs_write_permission(struct task_struct *task,
>  	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
>  	    !uid_eq(cred->euid, tcred->uid) &&
>  	    !uid_eq(cred->euid, tcred->suid) &&
> -	    !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE))
> +	    !ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
>  		ret = -EACCES;
>
>  	if (!ret && cgroup_on_dfl(dst_cgrp)) {
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/