2008-12-03 19:17:35

by Serge E. Hallyn

Subject: [PATCH 1/2] user namespaces: let user_ns be cloned with fairsched

(These two patches are in the next-unacked branch of
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/userns-2.6.
If they get some ACKs, then I hope to feed this into security-next.
After these two, I think we're ready to tackle userns+capabilities)

Fairsched creates a per-uid directory under /sys/kernel/uids/.
So when you clone(CLONE_NEWUSER), it tries to create
/sys/kernel/uids/0, which already exists, and you get back
-ENOMEM.

This was supposed to be fixed by sysfs tagging, but that
was postponed (ok, rejected until sysfs locking is fixed).
So, just as with network namespaces, we just don't create
those directories for user namespaces other than the init.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
kernel/user.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 97202cb..6c924bc 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -246,6 +246,8 @@ static int uids_user_create(struct user_struct *up)
 	int error;

 	memset(kobj, 0, sizeof(struct kobject));
+	if (up->user_ns != &init_user_ns)
+		return 0;
 	kobj->kset = uids_kset;
 	error = kobject_init_and_add(kobj, &uids_ktype, NULL, "%d", up->uid);
 	if (error) {
@@ -281,6 +283,8 @@ static void remove_user_sysfs_dir(struct work_struct *w)
 	unsigned long flags;
 	int remove_user = 0;

+	if (up->user_ns != &init_user_ns)
+		return;
 	/* Make uid_hash_remove() + sysfs_remove_file() + kobject_del()
 	 * atomic.
 	 */
--
1.5.4.3
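
To illustrate the collision described in patch 1, here is a minimal
userspace sketch in C. It is not kernel code: toy_ns, toy_user,
toy_dir_add() and toy_uids_user_create() are hypothetical names, and the
flat table only stands in for /sys/kernel/uids/. The toy table reports a
duplicate name as -EEXIST; per the description above, the failure reaches
the clone() caller as -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's namespace and user structs. */
struct toy_ns { int unused; };
struct toy_user {
	const struct toy_ns *ns;
	unsigned int uid;
};

struct toy_ns toy_init_ns;	/* plays the role of init_user_ns */

/* One flat table of uid "directories", like a single /sys/kernel/uids/. */
#define TOY_MAX_DIRS 16
static unsigned int toy_dirs[TOY_MAX_DIRS];
static size_t toy_ndirs;

/* Add a directory named after uid; fail if that name is already taken. */
int toy_dir_add(unsigned int uid)
{
	for (size_t i = 0; i < toy_ndirs; i++)
		if (toy_dirs[i] == uid)
			return -EEXIST;
	if (toy_ndirs == TOY_MAX_DIRS)
		return -ENOSPC;
	toy_dirs[toy_ndirs++] = uid;
	return 0;
}

/*
 * Model of the create path. With skip_non_init set, users outside the
 * initial namespace get no directory at all, mirroring the early return
 * that the patch adds.
 */
int toy_uids_user_create(const struct toy_user *up, bool skip_non_init)
{
	assert(up != NULL);
	if (skip_non_init && up->ns != &toy_init_ns)
		return 0;
	return toy_dir_add(up->uid);
}
```

Creating uid 0 for the initial namespace succeeds. A second uid 0 from a
child namespace collides unless skip_non_init is true, in which case it is
silently skipped.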


2008-12-03 19:17:51

by Serge E. Hallyn

Subject: [PATCH 2/2] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

While ideally CLONE_NEWUSER will eventually require no
privilege, the required permission checks are currently
not there. As a result, CLONE_NEWUSER has the same effect
as a setuid(0)+setgroups(1,"0"). While we already require
CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems
appropriate.

Signed-off-by: Serge E. Hallyn <[email protected]>
---
kernel/fork.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 1dd8945..e3a85b3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1344,7 +1344,8 @@ long do_fork(unsigned long clone_flags,
 		/* hopefully this check will go away when userns support is
 		 * complete
 		 */
-		if (!capable(CAP_SYS_ADMIN))
+		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
+				!capable(CAP_SETGID))
 			return -EPERM;
 	}

--
1.5.4.3
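
The condition in patch 2 can be modelled in userspace as below. This is
only a sketch: the TOY_CAP_* bit flags and both function names are
hypothetical, the flag values are not the kernel's CAP_* numbering, and
toy_capable() merely stands in for capable().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit flags for this sketch; not the kernel's CAP_* values. */
enum toy_cap {
	TOY_CAP_SYS_ADMIN = 1 << 0,
	TOY_CAP_SETUID    = 1 << 1,
	TOY_CAP_SETGID    = 1 << 2,
};

/* Stand-in for capable(): does the caller's set contain this capability? */
bool toy_capable(unsigned int caps, enum toy_cap cap)
{
	return (caps & (unsigned int)cap) != 0;
}

/* Mirrors the patched condition: any one missing capability yields -EPERM. */
int toy_may_clone_newuser(unsigned int caps)
{
	if (!toy_capable(caps, TOY_CAP_SYS_ADMIN) ||
	    !toy_capable(caps, TOY_CAP_SETUID) ||
	    !toy_capable(caps, TOY_CAP_SETGID))
		return -EPERM;
	return 0;
}
```

A caller holding only TOY_CAP_SYS_ADMIN passed the old check but is
refused by this one; all three bits are needed for a return value of 0.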

2008-12-05 10:08:42

by James Morris

Subject: Re: [PATCH 1/2] user namespaces: let user_ns be cloned with fairsched

On Wed, 3 Dec 2008, Serge E. Hallyn wrote:

> (These two patches are in the next-unacked branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/userns-2.6.
> If they get some ACKs, then I hope to feed this into security-next.
> After these two, I think we're ready to tackle userns+capabilities)

These look ok to me, but no acks so far. Any reason not to apply them?


- James
--
James Morris
<[email protected]>

2008-12-05 14:46:10

by Serge E. Hallyn

Subject: Re: [PATCH 1/2] user namespaces: let user_ns be cloned with fairsched

Quoting James Morris ([email protected]):
> On Wed, 3 Dec 2008, Serge E. Hallyn wrote:
>
> > (These two patches are in the next-unacked branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/userns-2.6.
> > If they get some ACKs, then I hope to feed this into security-next.
> > After these two, I think we're ready to tackle userns+capabilities)
>
> These look ok to me, but no acks so far. Any reason not to apply them?

Thanks for taking a look, James. Yes, there were specific acks I
was looking for.

Dhaval, could you take a look at the first one and tell me if it
is a problem for fairsched?

Eric, could you take a look at the second one? Actually, Daniel,
you've played with file capabilities for liblxc - would
http://lkml.org/lkml/2008/12/3/277 be a problem for you?

thanks,
-serge

2008-12-05 16:26:38

by Eric W. Biederman

Subject: Re: [PATCH 2/2] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

"Serge E. Hallyn" <[email protected]> writes:

> While ideally CLONE_NEWUSER will eventually require no
> privilege, the required permission checks are currently
> not there. As a result, CLONE_NEWUSER has the same effect
> as a setuid(0)+setgroups(1,"0"). While we already require
> CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems
> appropriate.

This looks reasonable. For the short term we will need a greater
set of caps to be able to do all of the interesting things.

Personally the user namespace only becomes interesting when we
start to be able to move in the other direction and remove the
set of capabilities required to create it.

Eric

2008-12-05 16:46:22

by Serge E. Hallyn

Subject: Re: [PATCH 2/2] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

Quoting Eric W. Biederman ([email protected]):
> "Serge E. Hallyn" <[email protected]> writes:
>
> > While ideally CLONE_NEWUSER will eventually require no
> > privilege, the required permission checks are currently
> > not there. As a result, CLONE_NEWUSER has the same effect
> > as a setuid(0)+setgroups(1,"0"). While we already require
> > CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems
> > appropriate.
>
> This looks reasonable. For the short term we will need a greater
> set of caps to be able to do all of the interesting things.

Could you ack the patch? Stephen explicitly doesn't want patches
in linux-next which haven't been acked, and security-next feeds
into linux-next, so I don't want to ask James to take the patch
without an ack :)

> Personally the user namespace only becomes interesting when we
> start to be able to move in the other direction and remove the
> set of capabilities required to create it.
>
> Eric

Agreed. Now the thing is I don't think we need full userns
support to get there. We just need the targeted capabilities
and the basic dummy fs support - that is, init_user_ns owns
all vfsmounts, and anyone not in init_user_ns only gets
"other" (world) access to files under those mounts.

Of course complete support for targeted caps will in itself
be a huge effort :)

So my roadmap is: next address the per-user keyring, then
the targeted caps.

-serge

2008-12-05 17:16:42

by Eric W. Biederman

Subject: Re: [PATCH 2/2] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

"Serge E. Hallyn" <[email protected]> writes:

> While ideally CLONE_NEWUSER will eventually require no
> privilege, the required permission checks are currently
> not there. As a result, CLONE_NEWUSER has the same effect
> as a setuid(0)+setgroups(1,"0"). While we already require
> CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems
> appropriate.

Acked-by: "Eric W. Biederman" <[email protected]>

The patch looks good, and we are likely to need more caps to
actually use it.

>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> ---
> kernel/fork.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 1dd8945..e3a85b3 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1344,7 +1344,8 @@ long do_fork(unsigned long clone_flags,
>  		/* hopefully this check will go away when userns support is
>  		 * complete
>  		 */
> -		if (!capable(CAP_SYS_ADMIN))
> +		if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
> +				!capable(CAP_SETGID))
>  			return -EPERM;
>  	}
>
> --
> 1.5.4.3

2008-12-05 17:26:36

by Eric W. Biederman

Subject: Re: [PATCH 2/2] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

"Serge E. Hallyn" <[email protected]> writes:

>> Personally the user namespace only becomes interesting when we
>> start to be able to move in the other direction and remove the
>> set of capabilities required to create it.
>>
>> Eric
>
> Agreed. Now the thing is I don't think we need full userns
> support to get there. We just need the targeted capabilities
> and the basic dummy fs support - that is, init_user_ns owns
> all vfsmounts, and anyone not in init_user_ns only gets
> "other" (world) access to files under those mounts.

Right.

> Of course complete support for targeted caps will in itself
> be a huge effort :)
>
> So my roadmap is: next address the per-user keyring, then
> the targeted caps.

Sounds good.

I expect this means we will pass through a period where the user
namespace is less useful than it is today. But as it will be on
a much firmer foundation that is fine.

Eric

2008-12-07 22:51:44

by James Morris

Subject: Re: [PATCH 1/2] user namespaces: let user_ns be cloned with fairsched

On Wed, 3 Dec 2008, Serge E. Hallyn wrote:

> (These two patches are in the next-unacked branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/userns-2.6.
> If they get some ACKs, then I hope to feed this into security-next.
> After these two, I think we're ready to tackle userns+capabilities)
>
> Fairsched creates a per-uid directory under /sys/kernel/uids/.
> So when you clone(CLONE_NEWUSER), it tries to create
> /sys/kernel/uids/0, which already exists, and you get back
> -ENOMEM.
>
> This was supposed to be fixed by sysfs tagging, but that
> was postponed (ok, rejected until sysfs locking is fixed).
> So, just as with network namespaces, we just don't create
> those directories for user namespaces other than the init.
>
> Signed-off-by: Serge E. Hallyn <[email protected]>

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next


--
James Morris
<[email protected]>

2008-12-07 22:51:58

by James Morris

Subject: Re: [PATCH 2/2] user namespaces: require cap_set{ug}id for CLONE_NEWUSER

On Fri, 5 Dec 2008, Eric W. Biederman wrote:

> "Serge E. Hallyn" <[email protected]> writes:
>
> > While ideally CLONE_NEWUSER will eventually require no
> > privilege, the required permission checks are currently
> > not there. As a result, CLONE_NEWUSER has the same effect
> > as a setuid(0)+setgroups(1,"0"). While we already require
> > CAP_SYS_ADMIN, requiring CAP_SETUID and CAP_SETGID seems
> > appropriate.
>
> Acked-by: "Eric W. Biederman" <[email protected]>
>
> The patch looks good, and we are likely to need more caps to
> actually use it.

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next


--
James Morris
<[email protected]>

2008-12-08 16:15:18

by Serge E. Hallyn

Subject: Re: [PATCH 1/2] user namespaces: let user_ns be cloned with fairsched

Quoting James Morris ([email protected]):
> On Wed, 3 Dec 2008, Serge E. Hallyn wrote:
>
> > (These two patches are in the next-unacked branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/userns-2.6.
> > If they get some ACKs, then I hope to feed this into security-next.
> > After these two, I think we're ready to tackle userns+capabilities)
> >
> > Fairsched creates a per-uid directory under /sys/kernel/uids/.
> > So when you clone(CLONE_NEWUSER), it tries to create
> > /sys/kernel/uids/0, which already exists, and you get back
> > -ENOMEM.
> >
> > This was supposed to be fixed by sysfs tagging, but that
> > was postponed (ok, rejected until sysfs locking is fixed).
> > So, just as with network namespaces, we just don't create
> > those directories for user namespaces other than the init.
> >
> > Signed-off-by: Serge E. Hallyn <[email protected]>
>
> Applied to
> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next

Thanks, James. I talked about patch 1 with Dhaval, and while he's ok
with the patch he (rightfully) thought there should be some extra
documentation. If it's not too much trouble would you mind swapping
out patch 1 for the following? (Otherwise I can send a new patch on
top of the original)

thanks,
-serge

From 047b66fff5e014ac0eb995b8a60ff396abe2e8b2 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <[email protected]>
Date: Mon, 8 Dec 2008 07:24:33 -0800
Subject: [PATCH 1/1] user namespaces: let user_ns be cloned with fairsched

fairsched creates a per-uid directory under /sys/kernel/uids/.
So when you clone(CLONE_NEWUSER), it tries to create
/sys/kernel/uids/0, which already exists, and you get back
-ENOMEM.

This was supposed to be fixed by sysfs tagging, but that
was postponed (ok, rejected until sysfs locking is fixed).
So, just as with network namespaces, we just don't create
those directories for user namespaces other than the init.

Changelog:
Dec 8 2008: Documented the currently bogus state of
support for user groups with user namespaces. In
particular, all users in a user namespace should be
children of the user which created the user namespace.
This is yet to be implemented.

Signed-off-by: Serge E. Hallyn <[email protected]>
Acked-by: Dhaval Giani <[email protected]>
---
Documentation/scheduler/sched-design-CFS.txt | 21 +++++++++++++++++++++
kernel/user.c | 12 +++++++++++-
2 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
index eb471c7..8398ca4 100644
--- a/Documentation/scheduler/sched-design-CFS.txt
+++ b/Documentation/scheduler/sched-design-CFS.txt
@@ -273,3 +273,24 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem.

# #Launch gmplayer (or your favourite movie player)
# echo <movie_player_pid> > multimedia/tasks
+
+8. Implementation note: user namespaces
+
+User namespaces are intended to be hierarchical. But they are currently
+only partially implemented. Each of those has ramifications for CFS.
+
+First, since user namespaces are hierarchical, the /sys/kernel/uids
+presentation is inadequate. Eventually we will likely want to use sysfs
+tagging to provide private views of /sys/kernel/uids within each user
+namespace.
+
+Second, the hierarchical nature is intended to support completely
+unprivileged use of user namespaces. So if using user groups, then
+we want the users in a user namespace to be children of the user
+who created it.
+
+That is currently unimplemented. So instead, every user in a new
+user namespace will receive 1024 shares just like any user in the
+initial user namespace. Note that at the moment creation of a new
+user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and
+CAP_SETGID.
diff --git a/kernel/user.c b/kernel/user.c
index 97202cb..6608a3d 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -239,13 +239,21 @@ static struct kobj_type uids_ktype = {
 	.release = uids_release,
 };

-/* create /sys/kernel/uids/<uid>/cpu_share file for this user */
+/*
+ * Create /sys/kernel/uids/<uid>/cpu_share file for this user
+ * We do not create this file for users in a user namespace (until
+ * sysfs tagging is implemented).
+ *
+ * See Documentation/scheduler/sched-design-CFS.txt for ramifications.
+ */
 static int uids_user_create(struct user_struct *up)
 {
 	struct kobject *kobj = &up->kobj;
 	int error;

 	memset(kobj, 0, sizeof(struct kobject));
+	if (up->user_ns != &init_user_ns)
+		return 0;
 	kobj->kset = uids_kset;
 	error = kobject_init_and_add(kobj, &uids_ktype, NULL, "%d", up->uid);
 	if (error) {
@@ -281,6 +289,8 @@ static void remove_user_sysfs_dir(struct work_struct *w)
 	unsigned long flags;
 	int remove_user = 0;

+	if (up->user_ns != &init_user_ns)
+		return;
 	/* Make uid_hash_remove() + sysfs_remove_file() + kobject_del()
 	 * atomic.
 	 */
--
1.5.4.3
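
A small worked example of what the documentation note implies, assuming
that CPU time under full contention is divided in proportion to shares and
that every user is continuously runnable. The function names and the
TOY_DEFAULT_SHARES macro are hypothetical; only the 1024 figure comes from
the note itself.

```c
#include <assert.h>

#define TOY_DEFAULT_SHARES 1024UL	/* per-user default named in the note */

/* Percent of CPU one group gets out of the total shares in contention. */
double toy_share_percent(unsigned long shares, unsigned long total_shares)
{
	assert(total_shares > 0);
	return 100.0 * (double)shares / (double)total_shares;
}

/*
 * Flat model (the current behaviour described in the note): host users
 * and namespaced users are all siblings with 1024 shares each.
 */
double toy_flat_percent(unsigned long host_users, unsigned long ns_users)
{
	unsigned long total = (host_users + ns_users) * TOY_DEFAULT_SHARES;

	return toy_share_percent(TOY_DEFAULT_SHARES, total);
}
```

With two host users and a namespace of three users created by one of them,
toy_flat_percent(2, 3) gives 20 percent each, so the creator plus its
namespace collectively hold 80 percent. Under the hierarchical design the
note describes, the namespaced users would be children of their creator
and would instead divide up that creator's 50 percent.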

2008-12-08 21:15:34

by James Morris

Subject: Re: [PATCH 1/2] user namespaces: let user_ns be cloned with fairsched

On Mon, 8 Dec 2008, Serge E. Hallyn wrote:

> > Applied to
> > git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6#next
>
> Thanks, James. I talked about patch 1 with Dhaval, and while he's ok
> with the patch he (rightfully) thought there should be some extra
> documentation. If it's not too much trouble would you mind swapping
> out patch 1 for the following? (Otherwise I can send a new patch on
> top of the original)

It was applied to a public tree, so an update patch is needed.

>
> thanks,
> -serge
>
> From 047b66fff5e014ac0eb995b8a60ff396abe2e8b2 Mon Sep 17 00:00:00 2001
> From: Serge E. Hallyn <[email protected]>
> Date: Mon, 8 Dec 2008 07:24:33 -0800
> Subject: [PATCH 1/1] user namespaces: let user_ns be cloned with fairsched
>
> fairsched creates a per-uid directory under /sys/kernel/uids/.
> So when you clone(CLONE_NEWUSER), it tries to create
> /sys/kernel/uids/0, which already exists, and you get back
> -ENOMEM.
>
> This was supposed to be fixed by sysfs tagging, but that
> was postponed (ok, rejected until sysfs locking is fixed).
> So, just as with network namespaces, we just don't create
> those directories for user namespaces other than the init.
>
> Changelog:
> Dec 8 2008: Documented the currently bogus state of
> support for user groups with user namespaces. In
> particular, all users in a user namespace should be
> children of the user which created the user namespace.
> This is yet to be implemented.
>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> Acked-by: Dhaval Giani <[email protected]>
> ---
> Documentation/scheduler/sched-design-CFS.txt | 21 +++++++++++++++++++++
> kernel/user.c | 12 +++++++++++-
> 2 files changed, 32 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
> index eb471c7..8398ca4 100644
> --- a/Documentation/scheduler/sched-design-CFS.txt
> +++ b/Documentation/scheduler/sched-design-CFS.txt
> @@ -273,3 +273,24 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem.
>
> # #Launch gmplayer (or your favourite movie player)
> # echo <movie_player_pid> > multimedia/tasks
> +
> +8. Implementation note: user namespaces
> +
> +User namespaces are intended to be hierarchical. But they are currently
> +only partially implemented. Each of those has ramifications for CFS.
> +
> +First, since user namespaces are hierarchical, the /sys/kernel/uids
> +presentation is inadequate. Eventually we will likely want to use sysfs
> +tagging to provide private views of /sys/kernel/uids within each user
> +namespace.
> +
> +Second, the hierarchical nature is intended to support completely
> +unprivileged use of user namespaces. So if using user groups, then
> +we want the users in a user namespace to be children of the user
> +who created it.
> +
> +That is currently unimplemented. So instead, every user in a new
> +user namespace will receive 1024 shares just like any user in the
> +initial user namespace. Note that at the moment creation of a new
> +user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and
> +CAP_SETGID.
> diff --git a/kernel/user.c b/kernel/user.c
> index 97202cb..6608a3d 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -239,13 +239,21 @@ static struct kobj_type uids_ktype = {
>  	.release = uids_release,
>  };
>
> -/* create /sys/kernel/uids/<uid>/cpu_share file for this user */
> +/*
> + * Create /sys/kernel/uids/<uid>/cpu_share file for this user
> + * We do not create this file for users in a user namespace (until
> + * sysfs tagging is implemented).
> + *
> + * See Documentation/scheduler/sched-design-CFS.txt for ramifications.
> + */
>  static int uids_user_create(struct user_struct *up)
>  {
>  	struct kobject *kobj = &up->kobj;
>  	int error;
>
>  	memset(kobj, 0, sizeof(struct kobject));
> +	if (up->user_ns != &init_user_ns)
> +		return 0;
>  	kobj->kset = uids_kset;
>  	error = kobject_init_and_add(kobj, &uids_ktype, NULL, "%d", up->uid);
>  	if (error) {
> @@ -281,6 +289,8 @@ static void remove_user_sysfs_dir(struct work_struct *w)
>  	unsigned long flags;
>  	int remove_user = 0;
>
> +	if (up->user_ns != &init_user_ns)
> +		return;
>  	/* Make uid_hash_remove() + sysfs_remove_file() + kobject_del()
>  	 * atomic.
>  	 */
> --
> 1.5.4.3

--
James Morris
<[email protected]>