Date: Thu, 19 Mar 2009 19:23:21 -0500
From: "Serge E. Hallyn"
To: Matt Helsley
Cc: lkml, Dhaval Giani, mingo@elte.hu, Bharata B Rao, peterz@infradead.org, Linux Containers
Subject: Re: [PATCH 1/1] introduce user_ns inheritance in user-sched
Message-ID: <20090320002321.GA26056@us.ibm.com>
References: <20090319211615.GA18383@us.ibm.com> <20090319235503.GA15844@us.ibm.com>
In-Reply-To: <20090319235503.GA15844@us.ibm.com>

Quoting Matt Helsley (matthltc@us.ibm.com):
> Shouldn't this put_user_ns(new->user_ns) be removed? It looks like two
> references to new->user_ns are being dropped if anything fails
> after sched_create_user(new) succeeds yet as far as I can tell the
> patch does not introduce any new references to new->user_ns.
>
> Otherwise looks good to me.

Here is the new version.  Thanks again.

From 55c264b27cb1f6f91007ae2aeda2d4f6067bb2eb Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn
Date: Wed, 18 Mar 2009 13:29:32 -0700
Subject: [PATCH] introduce user_ns inheritance in user-sched (v2)

In a kernel compiled with CONFIG_USER_SCHED=y, cpu shares are allocated
according to uid.
Shares are specifiable under /sys/kernel/uids/.

In a kernel compiled with CONFIG_USER_NS=y, clone(2) with the
CLONE_NEWUSER flag creates a new user namespace, and the newly cloned
task will belong to uid 0 in the new user namespace.

Without this patch, if uid 500 calls clone(CLONE_NEWUSER) (which is
possible using a program with the cap_sys_admin,cap_setuid,cap_setgid=pe
file capabilities), then the new task will get the cpu shares of uid 0.

After this patch, if uid 500 calls clone(CLONE_NEWUSER), then even
though the new task is uid 0 in the new user namespace, it will be
restricted to the cpu shares of uid 500, because its task group is
created as a child of uid 500's task group.

Currently there is no way to set shares for uids in user namespaces
other than the initial one.  That will be trivial to add when sysfs
tagging (or its functional equivalent, also needed to expose network
devices in network namespaces other than init) becomes available.

Until cross-user-namespace file accesses are enforced, nothing stops
uid 0 in a child namespace from simply writing new values into
/sys/kernel/uids/500.

Here are results of some testing with and without the patch.  Cpu
shares are initialized as follows:

	user root:   2048
	user hallyn: 1024
	user serge:   512

Results are the 'real' part of "time make -j4 > o 2>&1", each time
after a "make clean".
=================================================================
UNPATCHED
User 1: user serge creates a child user_ns and runs as user root
User 2: hallyn runs as user hallyn
=================================================================
                User 1        User 2
    run 1:    2m58.834s      3m0.609s
    run 2:    2m59.248s     2m59.457s

=============================================================
PATCHED
User 1: user serge
User 2: user hallyn
=============================================================
                User 1        User 2
    run 1:     3m6.337s     2m22.681s
    run 2:     3m6.323s     2m21.855s

=============================================================
PATCHED
User 1: user serge setuid to user root
User 2: hallyn
=============================================================
                User 1        User 2
    run 1:    2m17.782s      3m3.947s
    run 2:    2m18.497s      3m7.961s

==========================================================
PATCHED
User 1: user root inside userns created by userid serge
User 2: hallyn
==========================================================
                User 1        User 2
    run 1:     3m9.876s      2m8.428s
    run 2:     3m8.539s      2m6.356s

Changelog:
	Mar 19: Matt Helsley pointed out there were two calls to
		put_user_ns() in the alloc_uid() error path.

Signed-off-by: Serge E. Hallyn
Signed-off-by: Dhaval Giani
Cc: mingo@elte.hu
Cc: Bharata B Rao
Cc: peterz@infradead.org
---
 kernel/user.c           |   13 +++++++++----
 kernel/user_namespace.c |    2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 850e0ba..8ae4bf8 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -101,7 +101,12 @@ static int sched_create_user(struct user_struct *up)
 {
 	int rc = 0;
 
-	up->tg = sched_create_group(&root_task_group);
+	struct task_group *parent = &root_task_group;
+
+	if (up->user_ns != &init_user_ns)
+		parent = up->user_ns->creator->tg;
+
+	up->tg = sched_create_group(parent);
 	if (IS_ERR(up->tg))
 		rc = -ENOMEM;
 
@@ -434,11 +439,11 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 	new->uid = uid;
 	atomic_set(&new->__count, 1);
 
+	new->user_ns = get_user_ns(ns);
+
 	if (sched_create_user(new) < 0)
 		goto out_free_user;
 
-	new->user_ns = get_user_ns(ns);
-
 	if (uids_user_create(new))
 		goto out_destoy_sched;
 
@@ -470,8 +475,8 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 
 out_destoy_sched:
 	sched_destroy_user(new);
-	put_user_ns(new->user_ns);
 out_free_user:
+	put_user_ns(new->user_ns);
 	kmem_cache_free(uid_cachep, new);
 out_unlock:
 	uids_mutex_unlock();
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 076c7c8..a99d3c7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -35,6 +35,7 @@ int create_user_ns(struct cred *new)
 		INIT_HLIST_HEAD(ns->uidhash_table + n);
 
 	/* Alloc new root user.  */
+	ns->creator = new->user;
 	root_user = alloc_uid(ns, 0);
 	if (!root_user) {
 		kfree(ns);
@@ -42,7 +43,6 @@ int create_user_ns(struct cred *new)
 	}
 
 	/* set the new root user in the credentials under preparation */
-	ns->creator = new->user;
 	new->user = root_user;
 	new->uid = new->euid = new->suid = new->fsuid = 0;
 	new->gid = new->egid = new->sgid = new->fsgid = 0;
-- 
1.5.4.3