2022-02-11 02:33:22

by Eric W. Biederman

Subject: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

Move inc_rlimit_ucounts from copy_creds into copy_process, immediately
after copy_creds, where it can be called exactly once. Test for and
handle the case where inc_rlimit_ucounts returns LONG_MAX, indicating
that the count has wrapped.

This is good hygiene and fixes a theoretical bug. In practice
PID_MAX_LIMIT is at most 2^22, so there is no chance the number of
processes would ever wrap, even on an architecture with a 32-bit long.

Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/cred.c | 2 --
kernel/fork.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 229cff081167..96d5fd6ff26f 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -358,7 +358,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 		kdebug("share_creds(%p{%d,%d})",
 		       p->cred, atomic_read(&p->cred->usage),
 		       read_cred_subscribers(p->cred));
-		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 		return 0;
 	}

@@ -395,7 +394,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 #endif

 	p->cred = p->real_cred = get_cred(new);
-	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 	alter_cred_subscribers(new, 2);
 	validate_creds(new);
 	return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 6f62d37f3650..69333078259c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2026,6 +2026,8 @@ static __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_free;

 	retval = -EAGAIN;
+	if (inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1) == LONG_MAX)
+		goto bad_fork_cleanup_count;
 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
 		if ((task_ucounts(p) != &init_ucounts) &&
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
--
2.29.2
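
For context: at this point in the series inc_rlimit_ucounts() only detects
wrap-around; the comparison against the rlimit is done separately by
is_ucounts_overlimit(). The following is a simplified paraphrase of the
counting loop, based on kernel/ucount.c around v5.16 -- a sketch for
reference, not the verbatim kernel source:

/*
 * Sketch of inc_rlimit_ucounts() (paraphrased from kernel/ucount.c,
 * ~v5.16): charge v at every level of the user-namespace chain and
 * return the new count at the bottom level, or LONG_MAX if any level
 * wrapped negative.
 */
long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
{
	struct ucounts *iter;
	long ret = 0;

	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
		long new = atomic_long_add_return(v, &iter->ucount[type]);
		if (new < 0)			/* wrapped past LONG_MAX */
			ret = LONG_MAX;
		else if (iter == ucounts)
			ret = new;		/* count at the first level */
	}
	return ret;
}

The fork.c hunk above relies on exactly this contract: a LONG_MAX return
can only mean that some level of the count wrapped.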
2022-02-11 18:24:03

by Alexey Gladkov

Subject: Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

On Thu, Feb 10, 2022 at 08:13:22PM -0600, Eric W. Biederman wrote:
> [...]
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 6f62d37f3650..69333078259c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2026,6 +2026,8 @@ static __latent_entropy struct task_struct *copy_process(
>  		goto bad_fork_free;
>
>  	retval = -EAGAIN;
> +	if (inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1) == LONG_MAX)
> +		goto bad_fork_cleanup_count;
>  	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
>  		if ((task_ucounts(p) != &init_ucounts) &&
>  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))

It might make sense to do something like:

if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
	if ((task_ucounts(p) != &init_ucounts) &&
	    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))

and the new function:

long inc_rlimit_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, long v, unsigned long rlimit)
{
	struct ucounts *iter;
	long ret = 0;
	long max = rlimit;

	if (rlimit > LONG_MAX)
		max = LONG_MAX;
	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
		long new = atomic_long_add_return(v, &iter->ucount[type]);
		if (new < 0 || new > max)
			ret = LONG_MAX;
		else if (iter == ucounts)
			ret = new;
		max = READ_ONCE(iter->ns->ucount_max[type]);
	}
	return ret;
}

This will avoid double checking the same userns tree.

Or even modify inc_rlimit_ucounts. This function is used elsewhere like
this:


msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {


memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {


In all cases, we have a max value for comparison.

--
Rgrds, legion
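
For reference on the double-checking point: is_ucounts_overlimit() walks
the same userns chain that inc_rlimit_ucounts() has just walked, which is
the second traversal the proposed helper folds away. A simplified
paraphrase, again based on kernel/ucount.c around v5.16 and with the
value-read helper inlined -- a sketch, not the verbatim source:

/*
 * Sketch of is_ucounts_overlimit() (paraphrased from kernel/ucount.c,
 * ~v5.16): re-walk the chain and compare each level's count against
 * that level's max, seeded with the caller's rlimit.
 */
bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long rlimit)
{
	struct ucounts *iter;
	long max = rlimit;

	if (rlimit > LONG_MAX)
		max = LONG_MAX;
	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
		long val = atomic_long_read(&iter->ucount[type]);
		if (val < 0 || val > max)
			return true;
		max = READ_ONCE(iter->ns->ucount_max[type]);
	}
	return false;
}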

2022-02-12 13:06:34

by Alexey Gladkov

Subject: Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

On Fri, Feb 11, 2022 at 11:50:46AM -0600, Eric W. Biederman wrote:
> Alexey Gladkov <[email protected]> writes:
>
> > [...]
> >
> > In all cases, we have a max value for comparison.
>
> Good point. The downside is that it means we can't use the same code
> in exec. The upside is that the code is more idiomatic.

My suggestion was made before I saw the 8/8 patch :)

We could do something like:

static inline bool is_nproc_overlimit(struct task_struct *task)
{
	return (task_ucounts(task) != &init_ucounts) &&
	       !has_capability(task, CAP_SYS_RESOURCE) &&
	       !has_capability(task, CAP_SYS_ADMIN);
}

In copy_process:

if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
	if (is_nproc_overlimit(p))
		goto bad_fork_cleanup_count;
}

In do_execveat_common:

if ((current->flags & PF_NPROC_CHECK) &&
    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
    is_nproc_overlimit(current)) {
	retval = -EAGAIN;
	goto out_ret;
}

--
Rgrds, legion

2022-02-12 23:30:42

by Eric W. Biederman

Subject: Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

Alexey Gladkov <[email protected]> writes:

> On Thu, Feb 10, 2022 at 08:13:22PM -0600, Eric W. Biederman wrote:
>> [...]
>
> It might make sense to do something like:
>
> if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
> 	if ((task_ucounts(p) != &init_ucounts) &&
> 	    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>
> and the new function:
>
> long inc_rlimit_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, long v, unsigned long rlimit)
> {
> 	struct ucounts *iter;
> 	long ret = 0;
> 	long max = rlimit;
>
> 	if (rlimit > LONG_MAX)
> 		max = LONG_MAX;
> 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> 		long new = atomic_long_add_return(v, &iter->ucount[type]);
> 		if (new < 0 || new > max)
> 			ret = LONG_MAX;
> 		else if (iter == ucounts)
> 			ret = new;
> 		max = READ_ONCE(iter->ns->ucount_max[type]);
> 	}
> 	return ret;
> }
>
> This will avoid double checking the same userns tree.
>
> Or even modify inc_rlimit_ucounts. This function is used elsewhere like
> this:
>
>
> msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
> if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
>
>
> memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
> if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
>
>
> In all cases, we have a max value for comparison.

Good point. The downside is that it means we can't use the same code
in exec. The upside is that the code is more idiomatic.

Eric

2022-02-14 06:11:28

by Shuah Khan

Subject: Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

On 2/11/22 10:50 AM, Eric W. Biederman wrote:
> Alexey Gladkov <[email protected]> writes:
>
>> [...]
>>
>> Or even modify inc_rlimit_ucounts. This function is used elsewhere like
>> this:
>>
>>
>> msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
>> if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
>>
>>
>> memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
>> if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
>>
>>
>> In all cases, we have a max value for comparison.
>
> Good point. The downside is that it means we can't use the same code
> in exec. The upside is that the code is more idiomatic.
>

Looking a bit more at the other callers of inc_rlimit_ucounts(),
we might have another issue:

1. mqueue_get_inode() does:

	spin_lock(&mq_lock);
	msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
	if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
		dec_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
		spin_unlock(&mq_lock);


2. user_shm_lock() does, under shmlock_user_lock:

	spin_lock(&shmlock_user_lock);
	memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);

	if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
		dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);

3. user_namespace_sysctl_init() doesn't check for the max value.

4. copy_creds() doesn't check for the max value in its 3 calls to inc_rlimit_ucounts().

You can see that in each of these instances the caller undoes the charge
with dec_rlimit_ucounts(), and they hold different locks. So do we have
a window where the count could wrap past LONG_MAX and go unnoticed?

thanks,
-- Shuah
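
The call sites Shuah lists share one shape: charge with
inc_rlimit_ucounts(), check the result against a limit, and undo with
dec_rlimit_ucounts() on failure, each under its own lock. A minimal
standalone sketch of that shape in plain C11 -- hypothetical demo_* names,
not kernel code. The point it illustrates: the counter update is a single
atomic read-modify-write, so a wrap is reported to the caller that caused
it; each subsystem's lock serializes only its own check-and-undo.

#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One stand-in counter slot (hypothetical demo, not kernel code). */
static atomic_long counter;

/* Charge v and report LONG_MAX on wrap, like inc_rlimit_ucounts(). */
static long demo_inc(long v)
{
	/* fetch-add returns the old value; the unsigned casts keep the
	 * deliberate wraparound well defined when computing the new one. */
	long new = (long)((unsigned long)atomic_fetch_add(&counter, v) +
			  (unsigned long)v);

	return (new < 0) ? LONG_MAX : new;
}

/* The charge-then-check-then-undo shape used by the callers above. */
static bool demo_charge(long v, long limit)
{
	long new = demo_inc(v);

	if (new == LONG_MAX || new > limit) {
		atomic_fetch_sub(&counter, v);	/* undo, like dec_rlimit_ucounts() */
		return false;
	}
	return true;
}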

2022-02-14 12:55:07

by Eric W. Biederman

Subject: Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

Alexey Gladkov <[email protected]> writes:

> On Fri, Feb 11, 2022 at 11:50:46AM -0600, Eric W. Biederman wrote:
>> Alexey Gladkov <[email protected]> writes:
>>
>> > [...]
>>
>> Good point. The downside is that it means we can't use the same code
>> in exec. The upside is that the code is more idiomatic.
>
> My suggestion was made before I saw the 8/8 patch :)
>
> We could do something like:
>
> static inline bool is_nproc_overlimit(struct task_struct *task)
> {
> 	return (task_ucounts(task) != &init_ucounts) &&
> 	       !has_capability(task, CAP_SYS_RESOURCE) &&
> 	       !has_capability(task, CAP_SYS_ADMIN);
> }
>
> In copy_process:
>
> if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
> 	if (is_nproc_overlimit(p))
> 		goto bad_fork_cleanup_count;
> }
>
> In do_execveat_common:
>
> if ((current->flags & PF_NPROC_CHECK) &&
>     is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
>     is_nproc_overlimit(current)) {
> 	retval = -EAGAIN;
> 	goto out_ret;
> }


The more I think about it, the more I suspect 8/8 is the wrong way to go.

The report is that the capability calls added in kernel/sys.c, which I
moved into execve, broke apache. As the change was about removing
inconsistencies, I expect I should just start with the revert and keep
the difference between the two code paths.

My gut feeling is that both the capable() checks and the magic exception
for one user are wrong. If I am wrong, people can report a bug and the
code can get fixed.

But a bug-fix branch is definitely the wrong place to be expanding what
is allowed without it clearly being a bug.

Eric