2022-02-11 03:22:36

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC

In [1] Michal Koutný <[email protected]> reported that it does not make
sense to unconditionally exempt the INIT_USER during fork and exec
from RLIMIT_NPROC and then to impose a limit on that same user with
is_ucounts_overlimit. So I looked into why the exeception was added.

commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits (RLIMIT_NPROC)") says:
> If a setuid process swaps it's real and effective uids and then
> forks, the fork fails if the new realuid has more processes than
> the original process was limited to. This is particularly a
> problem if a user with a process limit (e.g. 256) runs a
> setuid-root program which does setuid() + fork() (e.g. lprng) while
> root already has more than 256 process (which is quite possible).
>
> The root problem here is that a limit which should be a per-user
> limit is being implemented as a per-process limit with per-process
> (e.g. CAP_SYS_RESOURCE) controls. Being a per-user limit, it
> should be that the root-user can over-ride it, not just some
> process with CAP_SYS_RESOURCE.
>
> This patch adds a test to ignore process limits if the real user is root.

With the moving of the limits from per-user to per-user-per-user_ns it
is clear that testing a user_struct is no longer the proper test and
the test should be a test against the ucounts struct to match the
rest of the logic change that was made.

With RLIMIT_NPROC not being enforced for the global root user anywhere
else should it be enforced in is_ucounts_overlimit for a user
namespace created by the global root user?

Since this is limited to just the global root user, and RLIMIT_NPROC
is already ignored for that user I am going to vote no.

This change does imply that it becomes possible to limit all users in
a user namespace but to not enforce the rlimits on the root user or
anyone with CAP_SYS_ADMIN and CAP_SYS_RESOURCE in the user namespace.
It is not clear to me why any of those exceptions exist so I figure
we should until this is actually a problem for someone before
we relax the permission checks here.

Cc: [email protected]
Reported-by: Michal Koutný <[email protected]>
[1] https://lkml.kernel.org/r/[email protected]
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/exec.c | 2 +-
kernel/fork.c | 2 +-
kernel/user_namespace.c | 2 ++
3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 1e7f757cbc2c..01c8c7bae9b4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
*/
if ((current->flags & PF_NPROC_CHECK) &&
is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
- (current_user() != INIT_USER) &&
+ (current_ucounts() != &init_ucounts) &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
retval = -EAGAIN;
goto out_ret;
diff --git a/kernel/fork.c b/kernel/fork.c
index 2b6a28a86325..6f62d37f3650 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(

retval = -EAGAIN;
if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
- if (p->real_cred->user != INIT_USER &&
+ if ((task_ucounts(p) != &init_ucounts) &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
goto bad_fork_cleanup_count;
}
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6b2e3ca7ee99..f0c04073403d 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
ns->ucount_max[i] = INT_MAX;
}
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
+ if (new->ucounts == &init_ucounts)
+ set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
--
2.29.2



2022-02-15 14:42:35

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC

On Thu, Feb 10, 2022 at 08:13:20PM -0600, "Eric W. Biederman" <[email protected]> wrote:
> @@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
[...]
> - (current_user() != INIT_USER) &&
> + (current_ucounts() != &init_ucounts) &&
[...]
> @@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
[...]
> - if (p->real_cred->user != INIT_USER &&
> + if ((task_ucounts(p) != &init_ucounts) &&

These substitutions make sense to me.

> !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> goto bad_fork_cleanup_count;
> }
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 6b2e3ca7ee99..f0c04073403d 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
> ns->ucount_max[i] = INT_MAX;
> }
> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
> + if (new->ucounts == &init_ucounts)
> + set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));

First, I wanted to object this double fork_init() but I realized it's
relevant for newly created user_ns.

Second, I think new->ucounts would be correct at this point and the
check should be

> if (ucounts == &init_ucounts)

i.e. before set_cred_ucounts() new->ucounts may not be correct.

I'd suggest also a comment in the create_user_ns() explaining the
reason is to exempt global root from RLIMINT_NRPOC also indirectly via
descendant user_nss.

Thanks,
Michal

2022-02-16 16:40:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 4/8] ucounts: Only except the root user in init_user_ns from RLIMIT_NPROC

Michal Koutný <[email protected]> writes:

> On Thu, Feb 10, 2022 at 08:13:20PM -0600, "Eric W. Biederman" <[email protected]> wrote:
>> @@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
> [...]
>> - (current_user() != INIT_USER) &&
>> + (current_ucounts() != &init_ucounts) &&
> [...]
>> @@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
> [...]
>> - if (p->real_cred->user != INIT_USER &&
>> + if ((task_ucounts(p) != &init_ucounts) &&
>
> These substitutions make sense to me.
>
>> !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>> goto bad_fork_cleanup_count;
>> }
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 6b2e3ca7ee99..f0c04073403d 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
>> ns->ucount_max[i] = INT_MAX;
>> }
>> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
>> + if (new->ucounts == &init_ucounts)
>> + set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
>> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
>> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
>> set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
>
> First, I wanted to object this double fork_init() but I realized it's
> relevant for newly created user_ns.
>
> Second, I think new->ucounts would be correct at this point and the
> check should be
>
>> if (ucounts == &init_ucounts)
>
> i.e. before set_cred_ucounts() new->ucounts may not be correct.
>
> I'd suggest also a comment in the create_user_ns() explaining the
> reason is to exempt global root from RLIMINT_NRPOC also indirectly via
> descendant user_nss.

Yes.

This one got culled from my next version of the patchset as it is not
conservative enough. I think it is probably the right general
direction.

On further reflection I am not convinced that it makes sense to test
user or ucounts. They are really not fields designed to support
permission checks.

I think if we want to exempt the root user's children from the root
users rlimit using the second set_rlimit_ucount_max is the way to go.

Someone filed a bug that strongly suggests that we want the second
set_rlimit_ucount_max:
https://bugzilla.kernel.org/show_bug.cgi?id=215596

I am still trying to understand that case.

Eric