Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934582AbcLMQua (ORCPT ); Tue, 13 Dec 2016 11:50:30 -0500 Received: from mail-oi0-f52.google.com ([209.85.218.52]:35227 "EHLO mail-oi0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934006AbcLMQtE (ORCPT ); Tue, 13 Dec 2016 11:49:04 -0500 MIME-Version: 1.0 In-Reply-To: <221e80bd-3d99-6c35-dcd3-b2547f0abb11@schaufler-ca.com> References: <1481593143-18756-1-git-send-email-john.stultz@linaro.org> <221e80bd-3d99-6c35-dcd3-b2547f0abb11@schaufler-ca.com> From: John Stultz Date: Tue, 13 Dec 2016 08:49:03 -0800 Message-ID: Subject: Re: [PATCH v5] cgroup: Add new capability to allow a process to migrate other tasks between cgroups To: Casey Schaufler Cc: Michael Kerrisk , lkml , Tejun Heo , Li Zefan , Jonathan Corbet , "open list:CONTROL GROUP (CGROUP)" , Android Kernel Team , Rom Lemarchand , Colin Cross , Dmitry Shmidt , Todd Kjos , Christian Poetzsch , Amit Pundir , Dmitry Torokhov , Kees Cook , "Serge E . Hallyn" , Andy Lutomirski , Linux API Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3396 Lines: 69 On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler wrote: > On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote: >> Hi John, >> >> On 13 December 2016 at 02:39, John Stultz wrote: >>> This patch adds CAP_GROUP_MIGRATE and logic to allows a process >> s/CAP_GROUP_MIGRATE/CAP_CGROUP_MIGRATE/ >> >>> to migrate other tasks between cgroups. >>> >>> In Android (where this feature originated), the ActivityManager >>> tracks various application states (TOP_APP, FOREGROUND, >>> BACKGROUND, SYSTEM, etc), and then as applications change >>> states, the SchedPolicy logic will migrate the application tasks >>> between different cgroups used to control the different >>> application states (for example, there is a background cpuset >>> cgroup which can limit background tasks to stay on one low-power >>> cpu, and the bg_non_interactive cpuctrl cgroup can then further >>> limit those background tasks to a small percentage of that one >>> cpu's cpu time). >>> >>> However, for security reasons, Android doesn't want to make the >>> system_server (the process that runs the ActivityManager and >>> SchedPolicy logic), run as root. So in the Android common.git >>> kernel, they have some logic to allow cgroups to loosen their >>> permissions so CAP_SYS_NICE tasks can migrate other tasks between >>> cgroups. >>> >>> I feel the approach taken there overloads CAP_SYS_NICE a bit much >>> for non-android environments. Efforts to re-use CAP_SYS_RESOURCE >>> for this purpose (which Android has since adopted) was also >>> stymied by concerns about risks from future cgroups that could be >>> considered "dangerous" by how they might change system semantics. >>> >>> So to avoid overlapping usage, this patch adds a brand new >>> process capability flag (CAP_CGROUP_MIGRATE), and uses it when >>> checking if a task can migrate other tasks between cgroups. >>> >>> I've tested this with AOSP master (though its a bit hacked in as >>> I still need to properly get the selinux bits aware of the new >>> capability bit) with selinux set to permissive and it seems to be >>> working well. >>> >>> Thoughts and feedback would be appreciated! >> So, back to the discussion of silos. I understand the argument for >> wanting a new silo. But, in that case can we at least try not to make >> it a single-use silo? >> >> How about CAP_CGROUP_CONTROL or some such, with the idea that this >> might be a capability that allows the holder to step outside usual >> cgroup rules? At the moment, that capability would allow only one such >> step, but maybe there would be others in the future. > > I agree, but want to put it more strongly. The granularity of > capabilities can never be fine enough for some people, and this > is an example of a case where you're going a bit too far. If the > use case is Android as you say, you don't need this. As my friends > on the far side of the aisle would say, "just write SELinux policy" > to correctly control access as required. So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive. It seems one must first loosen things up before we can tighten it with selinux rules. Or are you suggesting the system_server run as root + further selinux limitations? I worry, the Android developers may still be hesitant to do that. thanks -john