2021-07-28 07:31:16

by CGEL

Subject: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

From: Ran Xiaokai <[email protected]>

in copy_process(): non root users but with capability CAP_SYS_RESOURCE
or CAP_SYS_ADMIN will clean PF_NPROC_EXCEEDED flag even
rlimit(RLIMIT_NPROC) exceeds. Add the same capability check logic here.

Signed-off-by: Ran Xiaokai <[email protected]>
---
kernel/sys.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index ef1a78f5d71c..72c7639e3c98 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -480,7 +480,8 @@ static int set_user(struct cred *new)
 	 * failure to the execve() stage.
 	 */
 	if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
-			new_user != INIT_USER)
+			new_user != INIT_USER &&
+			!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
 		current->flags |= PF_NPROC_EXCEEDED;
 	else
 		current->flags &= ~PF_NPROC_EXCEEDED;
--
2.25.1



2021-07-28 12:01:55

by Christian Brauner

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

[Ccing a few people that did the PF_NPROC_EXCEEDED changes]

On Wed, Jul 28, 2021 at 12:26:29AM -0700, [email protected] wrote:
> From: Ran Xiaokai <[email protected]>
>
> in copy_process(): non root users but with capability CAP_SYS_RESOURCE
> or CAP_SYS_ADMIN will clean PF_NPROC_EXCEEDED flag even
> rlimit(RLIMIT_NPROC) exceeds. Add the same capability check logic here.
>
> Signed-off-by: Ran Xiaokai <[email protected]>
> ---
> kernel/sys.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ef1a78f5d71c..72c7639e3c98 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -480,7 +480,8 @@ static int set_user(struct cred *new)
>  	 * failure to the execve() stage.
>  	 */
>  	if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
> -			new_user != INIT_USER)
> +			new_user != INIT_USER &&
> +			!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>  		current->flags |= PF_NPROC_EXCEEDED;
>  	else
>  		current->flags &= ~PF_NPROC_EXCEEDED;

Hey Cgel,
Hey Ran,

The gist seems to me that this code wants to make sure that a program
can't successfully exec if it has gone through a set*id() transition
while exceeding its RLIMIT_NPROC.

But I agree that the semantics here are a bit strange.

IIUC, a capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set
during a set*id() transition wouldn't be able to exec right away if they
still exceed their RLIMIT_NPROC at the time of exec. So their exec would
fail in fs/exec.c:

	if ((current->flags & PF_NPROC_EXCEEDED) &&
	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
		retval = -EAGAIN;
		goto out_ret;
	}

However, if the caller were to fork() right after the set*id()
transition but before the exec while still exceeding their RLIMIT_NPROC
then they would get PF_NPROC_EXCEEDED cleared (while the child would
inherit it):

	retval = -EAGAIN;
	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
		if (p->real_cred->user != INIT_USER &&
		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
			goto bad_fork_free;
	}
	current->flags &= ~PF_NPROC_EXCEEDED;

which means a subsequent exec by the capable caller would now succeed
even though they could still exceed their RLIMIT_NPROC limit.

So at first glance, it seems that set_user() should probably get the
same check as it can be circumvented today unless I misunderstand the
original motivation.
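
To make the two paths concrete, here is a toy userspace model of the three checks quoted above. It is only a sketch, not kernel code: `struct task_model`, `over_limit()` and the `model_*()` helpers are hypothetical names, the process count is assumed not to change, and the exact comparison inside `is_ucounts_overlimit()` is simplified.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model: names echo the kernel's, but none of this is kernel code. */
#define PF_NPROC_EXCEEDED 0x1u

struct task_model {
	unsigned int flags;
	bool init_user;	/* stands in for "user == INIT_USER" */
	bool capable;	/* stands in for CAP_SYS_RESOURCE or CAP_SYS_ADMIN */
	long nproc;	/* processes charged to the user (assumed fixed here) */
	long limit;	/* stands in for rlimit(RLIMIT_NPROC) */
};

/* Simplified stand-in for is_ucounts_overlimit(); the comparison is an assumption. */
static bool over_limit(const struct task_model *t)
{
	return t->nproc > t->limit;
}

/* set_user() as it was before the patch: only INIT_USER is exempt. */
static void model_set_user(struct task_model *t)
{
	if (over_limit(t) && !t->init_user)
		t->flags |= PF_NPROC_EXCEEDED;
	else
		t->flags &= ~PF_NPROC_EXCEEDED;
}

/* exec-time check: fails only if the flag is set and the limit is still exceeded. */
static int model_exec(const struct task_model *t)
{
	if ((t->flags & PF_NPROC_EXCEEDED) && over_limit(t))
		return -EAGAIN;
	return 0;
}

/* fork-time check: a capable caller passes, and the caller's flag is cleared. */
static int model_fork(struct task_model *t)
{
	if (over_limit(t) && !t->init_user && !t->capable)
		return -EAGAIN;
	t->flags &= ~PF_NPROC_EXCEEDED;
	return 0;
}
```

In this model, a capable, non-INIT_USER task that is over its limit gets -EAGAIN from `model_set_user()` followed by `model_exec()`, but gets 0 from `model_exec()` once `model_fork()` is called in between.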

Christian

2021-07-30 08:24:38

by CGEL

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

On Wed, Jul 28, 2021 at 01:59:30PM +0200, Christian Brauner wrote:
> [Ccing a few people that did the PF_NPROC_EXCEEDED changes]
>
>
> Hey Cgel,
> Hey Ran,
>
> The gist seems to me that this code wants to make sure that a program
> can't successfully exec if it has gone through a set*id() transition
> while exceeding its RLIMIT_NPROC.
>
> But I agree that the semantics here are a bit strange.
>
> IIUC, a capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set
> during a set*id() transition wouldn't be able to exec right away if they
> still exceed their RLIMIT_NPROC at the time of exec. So their exec would
> fail in fs/exec.c:
>
> 	if ((current->flags & PF_NPROC_EXCEEDED) &&
> 	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> 		retval = -EAGAIN;
> 		goto out_ret;
> 	}
>
> However, if the caller were to fork() right after the set*id()
> transition but before the exec while still exceeding their RLIMIT_NPROC
> then they would get PF_NPROC_EXCEEDED cleared (while the child would
> inherit it):
>
> 	retval = -EAGAIN;
> 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> 		if (p->real_cred->user != INIT_USER &&
> 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> 			goto bad_fork_free;
> 	}
> 	current->flags &= ~PF_NPROC_EXCEEDED;
>
> which means a subsequent exec by the capable caller would now succeed
> even though they could still exceed their RLIMIT_NPROC limit.
>
> So at first glance, it seems that set_user() should probably get the
> same check as it can be circumvented today unless I misunderstand the
> original motivation.
>
> Christian

Hi Christian,

I think I didn't give enough information in the commit message.
When a process switches to a capable but non-INIT_USER user while the
RLIMIT_NPROC limit is already exceeded and then calls these functions,
the results differ:
1. set_xxuid()->exec()
---> fail
2. set_xxuid()->fork()->exec()
---> success
The kernel should behave consistently toward user space in both cases.
Also, I think a non-INIT_USER user CAN exceed the limit when it has the
proper capability, so with the patch set_user() clears the
PF_NPROC_EXCEEDED flag if capable() returns true.
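
The two call sequences can be sketched as a small userspace model. This is an illustration only, not kernel code: `struct caller` and the `sketch_*()` / `seq_*()` helpers are hypothetical names, the task is assumed to be a non-INIT_USER task that stays over RLIMIT_NPROC throughout, and the capability is assumed to be held at every step.

```c
#include <errno.h>
#include <stdbool.h>

#define PF_NPROC_EXCEEDED 0x1u

/* Hypothetical stand-in for a non-INIT_USER task that is over RLIMIT_NPROC. */
struct caller {
	unsigned int flags;
	bool capable;	/* CAP_SYS_RESOURCE or CAP_SYS_ADMIN, held at every step */
};

/* set_user(): with patched == true, a capable caller is exempt from the flag. */
static void sketch_set_user(struct caller *c, bool patched)
{
	if (patched && c->capable)
		c->flags &= ~PF_NPROC_EXCEEDED;
	else
		c->flags |= PF_NPROC_EXCEEDED;
}

/* fork(): refused for a non-capable caller, otherwise clears the flag. */
static int sketch_fork(struct caller *c)
{
	if (!c->capable)
		return -EAGAIN;
	c->flags &= ~PF_NPROC_EXCEEDED;
	return 0;
}

/* exec(): refused while the flag is set (the limit is still exceeded here). */
static int sketch_exec(const struct caller *c)
{
	return (c->flags & PF_NPROC_EXCEEDED) ? -EAGAIN : 0;
}

/* 1. set_xxuid()->exec() */
static int seq_setuid_exec(bool capable, bool patched)
{
	struct caller c = { 0, capable };

	sketch_set_user(&c, patched);
	return sketch_exec(&c);
}

/* 2. set_xxuid()->fork()->exec() */
static int seq_setuid_fork_exec(bool capable, bool patched)
{
	struct caller c = { 0, capable };
	int ret;

	sketch_set_user(&c, patched);
	ret = sketch_fork(&c);
	if (ret)
		return ret;
	return sketch_exec(&c);
}
```

Without the patch, a capable caller gets -EAGAIN from sequence 1 and 0 from sequence 2; with the patch, both sequences return 0 for a capable caller, and both still return -EAGAIN for a non-capable one.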

2021-08-03 10:07:05

by CGEL

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

On Fri, Jul 30, 2021 at 01:23:31AM -0700, CGEL wrote:
> On Wed, Jul 28, 2021 at 01:59:30PM +0200, Christian Brauner wrote:
> > [Ccing a few people that did the PF_NPROC_EXCEEDED changes]
> >
> >
> > Hey Cgel,
> > Hey Ran,
> >
> > The gist seems to me that this code wants to make sure that a program
> > can't successfully exec if it has gone through a set*id() transition
> > while exceeding its RLIMIT_NPROC.
> >
> > But I agree that the semantics here are a bit strange.
> >
> > IIUC, a capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set
> > during a set*id() transition wouldn't be able to exec right away if they
> > still exceed their RLIMIT_NPROC at the time of exec. So their exec would
> > fail in fs/exec.c:
> >
> > 	if ((current->flags & PF_NPROC_EXCEEDED) &&
> > 	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> > 		retval = -EAGAIN;
> > 		goto out_ret;
> > 	}
> >
> > However, if the caller were to fork() right after the set*id()
> > transition but before the exec while still exceeding their RLIMIT_NPROC
> > then they would get PF_NPROC_EXCEEDED cleared (while the child would
> > inherit it):
> >
> > 	retval = -EAGAIN;
> > 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> > 		if (p->real_cred->user != INIT_USER &&
> > 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> > 			goto bad_fork_free;
> > 	}
> > 	current->flags &= ~PF_NPROC_EXCEEDED;
> >
> > which means a subsequent exec by the capable caller would now succeed
> > even though they could still exceed their RLIMIT_NPROC limit.
> >
> > So at first glance, it seems that set_user() should probably get the
> > same check as it can be circumvented today unless I misunderstand the
> > original motivation.
> >
> > Christian
>
> Hi Christian,
>
> I think I didn't give enough information in the commit message.
> When a process switches to a capable but non-INIT_USER user while the
> RLIMIT_NPROC limit is already exceeded and then calls these functions,
> the results differ:
> 1. set_xxuid()->exec()
> ---> fail
> 2. set_xxuid()->fork()->exec()
> ---> success
> The kernel should behave consistently toward user space in both cases.
> Also, I think a non-INIT_USER user CAN exceed the limit when it has the
> proper capability, so with the patch set_user() clears the
> PF_NPROC_EXCEEDED flag if capable() returns true.

Hi, Christian

Do you have any further comments on this patch?
Is there anything I did not give enough information about?

2021-08-03 14:08:05

by Christian Brauner

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

On Tue, Aug 03, 2021 at 03:03:54AM -0700, CGEL wrote:
> On Fri, Jul 30, 2021 at 01:23:31AM -0700, CGEL wrote:
> > On Wed, Jul 28, 2021 at 01:59:30PM +0200, Christian Brauner wrote:
> > > [Ccing a few people that did the PF_NPROC_EXCEEDED changes]
> > >
> > >
> > > Hey Cgel,
> > > Hey Ran,
> > >
> > > The gist seems to me that this code wants to make sure that a program
> > > can't successfully exec if it has gone through a set*id() transition
> > > while exceeding its RLIMIT_NPROC.
> > >
> > > But I agree that the semantics here are a bit strange.
> > >
> > > IIUC, a capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set
> > > during a set*id() transition wouldn't be able to exec right away if they
> > > still exceed their RLIMIT_NPROC at the time of exec. So their exec would
> > > fail in fs/exec.c:
> > >
> > > 	if ((current->flags & PF_NPROC_EXCEEDED) &&
> > > 	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> > > 		retval = -EAGAIN;
> > > 		goto out_ret;
> > > 	}
> > >
> > > However, if the caller were to fork() right after the set*id()
> > > transition but before the exec while still exceeding their RLIMIT_NPROC
> > > then they would get PF_NPROC_EXCEEDED cleared (while the child would
> > > inherit it):
> > >
> > > 	retval = -EAGAIN;
> > > 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> > > 		if (p->real_cred->user != INIT_USER &&
> > > 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> > > 			goto bad_fork_free;
> > > 	}
> > > 	current->flags &= ~PF_NPROC_EXCEEDED;
> > >
> > > which means a subsequent exec by the capable caller would now succeed
> > > even though they could still exceed their RLIMIT_NPROC limit.
> > >
> > > So at first glance, it seems that set_user() should probably get the
> > > same check as it can be circumvented today unless I misunderstand the
> > > original motivation.
> > >
> > > Christian
> >
> > Hi Christian,
> >
> > I think I didn't give enough information in the commit message.
> > When a process switches to a capable but non-INIT_USER user while the
> > RLIMIT_NPROC limit is already exceeded and then calls these functions,
> > the results differ:
> > 1. set_xxuid()->exec()
> > ---> fail
> > 2. set_xxuid()->fork()->exec()
> > ---> success
> > The kernel should behave consistently toward user space in both cases.
> > Also, I think a non-INIT_USER user CAN exceed the limit when it has the
> > proper capability, so with the patch set_user() clears the
> > PF_NPROC_EXCEEDED flag if capable() returns true.
>
> Hi, Christian
>
> Do you have any further comments on this patch?
> Is there anything I did not give enough information about?

Yeah, this is fine and how I understood it too. I don't see anything
obviously wrong with it and the weird detour workaround via fork() seems
inconsistent. So if I don't hear anyone come up with a good reason the
current behavior makes sense I'll pick this up.

Christian

2021-09-07 21:40:28

by Solar Designer

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

Hi all,

Brad Spengler brought this to my attention on Twitter, and Christian
Brauner agreed I should follow up. So here goes, below the quote:

On Tue, Aug 03, 2021 at 04:07:02PM +0200, Christian Brauner wrote:
> On Tue, Aug 03, 2021 at 03:03:54AM -0700, CGEL wrote:
> > On Fri, Jul 30, 2021 at 01:23:31AM -0700, CGEL wrote:
> > > On Wed, Jul 28, 2021 at 01:59:30PM +0200, Christian Brauner wrote:
> > > > [Ccing a few people that did the PF_NPROC_EXCEEDED changes]
> > > >
> > > >
> > > > Hey Cgel,
> > > > Hey Ran,
> > > >
> > > > The gist seems to me that this code wants to make sure that a program
> > > > can't successfully exec if it has gone through a set*id() transition
> > > > while exceeding its RLIMIT_NPROC.
> > > >
> > > > But I agree that the semantics here are a bit strange.
> > > >
> > > > IIUC, a capable but non-INIT_USER caller getting PF_NPROC_EXCEEDED set
> > > > during a set*id() transition wouldn't be able to exec right away if they
> > > > still exceed their RLIMIT_NPROC at the time of exec. So their exec would
> > > > fail in fs/exec.c:
> > > >
> > > > 	if ((current->flags & PF_NPROC_EXCEEDED) &&
> > > > 	    is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> > > > 		retval = -EAGAIN;
> > > > 		goto out_ret;
> > > > 	}
> > > >
> > > > However, if the caller were to fork() right after the set*id()
> > > > transition but before the exec while still exceeding their RLIMIT_NPROC
> > > > then they would get PF_NPROC_EXCEEDED cleared (while the child would
> > > > inherit it):
> > > >
> > > > 	retval = -EAGAIN;
> > > > 	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> > > > 		if (p->real_cred->user != INIT_USER &&
> > > > 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> > > > 			goto bad_fork_free;
> > > > 	}
> > > > 	current->flags &= ~PF_NPROC_EXCEEDED;
> > > >
> > > > which means a subsequent exec by the capable caller would now succeed
> > > > even though they could still exceed their RLIMIT_NPROC limit.
> > > >
> > > > So at first glance, it seems that set_user() should probably get the
> > > > same check as it can be circumvented today unless I misunderstand the
> > > > original motivation.
> > > >
> > > > Christian
> > >
> > > Hi Christian,
> > >
> > > I think I didn't give enough information in the commit message.
> > > When a process switches to a capable but non-INIT_USER user while the
> > > RLIMIT_NPROC limit is already exceeded and then calls these functions,
> > > the results differ:
> > > 1. set_xxuid()->exec()
> > > ---> fail
> > > 2. set_xxuid()->fork()->exec()
> > > ---> success
> > > The kernel should behave consistently toward user space in both cases.
> > > Also, I think a non-INIT_USER user CAN exceed the limit when it has the
> > > proper capability, so with the patch set_user() clears the
> > > PF_NPROC_EXCEEDED flag if capable() returns true.
> >
> > Hi, Christian
> >
> > Do you have any further comments on this patch?
> > Is there anything I did not give enough information about?
>
> Yeah, this is fine and how I understood it too. I don't see anything
> obviously wrong with it and the weird detour workaround via fork() seems
> inconsistent. So if I don't hear anyone come up with a good reason the
> current behavior makes sense I'll pick this up.
>
> Christian

As I understand, the resulting commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2863643fb8b92291a7e97ba46e342f1163595fa8

broke RLIMIT_NPROC support for Apache httpd suexec and likely similar.

Yes, I can see how having a detour via fork() was inconsistent, but
since the privileged process can be assumed non-malicious it was no big
deal. suexec just doesn't have fork() in there.

Historically, the resetting on fork() appears to have been due to my
suggestion here:

https://www.openwall.com/lists/kernel-hardening/2011/07/25/4

"Perhaps also reset the flag on fork() because we have an RLIMIT_NPROC
check on fork() anyway."

Looks like I didn't consider the inconsistency for capable() processes
(or maybe that exception wasn't yet in there?)

Anyway, now I suggest that 2863643fb8b92291a7e97ba46e342f1163595fa8 be
reverted, and if there's any reason to make any change (what reason?
mere consistency or any real issue?) then I suggest that the flag
resetting on fork() be made conditional. Something like this:

 	if (atomic_read(&p->real_cred->user->processes) >=
 			task_rlimit(p, RLIMIT_NPROC)) {
 		if (p->real_cred->user != INIT_USER &&
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
 			goto bad_fork_free;
-	}
-	current->flags &= ~PF_NPROC_EXCEEDED;
+	} else
+		current->flags &= ~PF_NPROC_EXCEEDED;
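
The effect of making the reset conditional can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not kernel code: `struct forker` and both `fork_*()` helpers are hypothetical names, and the INIT_USER and capability exemptions are collapsed into a single `exempt` field.

```c
#include <errno.h>
#include <stdbool.h>

#define PF_NPROC_EXCEEDED 0x1u

/* Hypothetical stand-in for the task calling fork(). */
struct forker {
	unsigned int flags;
	bool over_limit;	/* the user is at or over RLIMIT_NPROC */
	bool exempt;		/* INIT_USER, or CAP_SYS_RESOURCE / CAP_SYS_ADMIN */
};

/* Existing behaviour: the flag is cleared whenever fork is allowed to proceed. */
static int fork_unconditional_reset(struct forker *f)
{
	if (f->over_limit && !f->exempt)
		return -EAGAIN;
	f->flags &= ~PF_NPROC_EXCEEDED;
	return 0;
}

/* Suggested behaviour: the flag is cleared only when the limit is not exceeded. */
static int fork_conditional_reset(struct forker *f)
{
	if (f->over_limit) {
		if (!f->exempt)
			return -EAGAIN;
	} else {
		f->flags &= ~PF_NPROC_EXCEEDED;
	}
	return 0;
}
```

In this model an exempt task that is over the limit keeps PF_NPROC_EXCEEDED across `fork_conditional_reset()`, so the fork() detour no longer clears it; a task under the limit has the flag cleared by either variant.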

Alexander

2021-09-08 12:53:15

by Solar Designer

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

Here's a further observation:

On Tue, Sep 07, 2021 at 11:30:42PM +0200, Solar Designer wrote:
> As I understand, the resulting commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2863643fb8b92291a7e97ba46e342f1163595fa8
>
> broke RLIMIT_NPROC support for Apache httpd suexec and likely similar.

The commit above tries to make things consistent by duplicating the
check from copy_process() also in set_user(). However, the check isn't
actually the same because set_user(new) is called _before_
security_task_fix_setuid(new, ...), whereas in the described detour via
fork() its check would be reached already as the new user. So those
capable() calls just look the same, but are actually very different, and
that's the problem. My current understanding is the commit actually
increases inconsistency.

The commit message starts with:

"in copy_process(): non root users but with capability CAP_SYS_RESOURCE
or CAP_SYS_ADMIN will clean PF_NPROC_EXCEEDED flag even
rlimit(RLIMIT_NPROC) exceeds. Add the same capability check logic here."

It talks about the obscure case of "non root users but with capability".
However, the capable() calls added by the commit actually also apply to
root, such as in suexec.
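
The ordering problem can be shown with a toy userspace model. It is only a sketch of the argument above, not kernel code: `struct cred_model`, `struct proc_model` and the `model_*()` helpers are hypothetical names, and assigning `p->cred` stands in for the credential switch that happens after set_user().

```c
#include <errno.h>
#include <stdbool.h>

#define PF_NPROC_EXCEEDED 0x1u

struct cred_model {
	bool init_user;
	bool cap_sys_resource;
	bool cap_sys_admin;
};

struct proc_model {
	struct cred_model cred;	/* credentials a capable() check would consult */
	unsigned int flags;
};

static bool model_capable(const struct proc_model *p)
{
	return p->cred.cap_sys_resource || p->cred.cap_sys_admin;
}

/*
 * setuid-like transition. The NPROC check runs first, while p->cred still
 * holds the OLD credentials; the new ones are installed only afterwards.
 */
static void model_setuid(struct proc_model *p, struct cred_model new_cred,
			 bool new_user_over_limit, bool with_capable_check)
{
	bool exceeded = new_user_over_limit && !new_cred.init_user;

	if (with_capable_check && model_capable(p))
		exceeded = false;	/* decided by the caller's old capabilities */

	if (exceeded)
		p->flags |= PF_NPROC_EXCEEDED;
	else
		p->flags &= ~PF_NPROC_EXCEEDED;

	p->cred = new_cred;
}

/* exec-time check: fails only if the flag is set and the limit is still exceeded. */
static int model_exec(const struct proc_model *p, bool still_over_limit)
{
	if ((p->flags & PF_NPROC_EXCEEDED) && still_over_limit)
		return -EAGAIN;
	return 0;
}
```

For a suexec-like caller (root with both capabilities switching to an unprivileged user that is over its limit), `model_exec()` returns -EAGAIN when `with_capable_check` is false and 0 when it is true, even though `model_capable()` is false once the switch has completed.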

> Anyway, now I suggest that 2863643fb8b92291a7e97ba46e342f1163595fa8 be
> reverted, and if there's any reason to make any change (what reason?
> mere consistency or any real issue?) then I suggest that the flag
> resetting on fork() be made conditional. Something like this:
>
>  	if (atomic_read(&p->real_cred->user->processes) >=
>  			task_rlimit(p, RLIMIT_NPROC)) {
>  		if (p->real_cred->user != INIT_USER &&
>  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>  			goto bad_fork_free;
> -	}
> -	current->flags &= ~PF_NPROC_EXCEEDED;
> +	} else
> +		current->flags &= ~PF_NPROC_EXCEEDED;

Alternatively, we could postpone the set_user() calls until we're
running with the new user's capabilities, but that's an invasive change
that's likely to create its own issues. So my suggestion above holds.

Alexander

2021-09-13 10:04:06

by Christian Brauner

Subject: Re: [PATCH] set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds

On Wed, Sep 08, 2021 at 12:24:00PM +0200, Solar Designer wrote:
> Here's a further observation:
>
> On Tue, Sep 07, 2021 at 11:30:42PM +0200, Solar Designer wrote:
> > As I understand, the resulting commit:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2863643fb8b92291a7e97ba46e342f1163595fa8
> >
> > broke RLIMIT_NPROC support for Apache httpd suexec and likely similar.
>
> The commit above tries to make things consistent by duplicating the
> check from copy_process() also in set_user(). However, the check isn't
> actually the same because set_user(new) is called _before_
> security_task_fix_setuid(new, ...), whereas in the described detour via
> fork() its check would be reached already as the new user. So those
> capable() calls just look the same, but are actually very different, and
> that's the problem. My current understanding is the commit actually
> increases inconsistency.
>
> The commit message starts with:
>
> "in copy_process(): non root users but with capability CAP_SYS_RESOURCE
> or CAP_SYS_ADMIN will clean PF_NPROC_EXCEEDED flag even
> rlimit(RLIMIT_NPROC) exceeds. Add the same capability check logic here."
>
> It talks about the obscure case of "non root users but with capability".
> However, the capable() calls added by the commit actually also apply to
> root, such as in suexec.
>
> > Anyway, now I suggest that 2863643fb8b92291a7e97ba46e342f1163595fa8 be
> > reverted, and if there's any reason to make any change (what reason?
> > mere consistency or any real issue?) then I suggest that the flag
> > resetting on fork() be made conditional. Something like this:
> >
> >  	if (atomic_read(&p->real_cred->user->processes) >=
> >  			task_rlimit(p, RLIMIT_NPROC)) {
> >  		if (p->real_cred->user != INIT_USER &&
> >  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> >  			goto bad_fork_free;
> > -	}
> > -	current->flags &= ~PF_NPROC_EXCEEDED;
> > +	} else
> > +		current->flags &= ~PF_NPROC_EXCEEDED;
>
> Alternatively, we could postpone the set_user() calls until we're
> running with the new user's capabilities, but that's an invasive change
> that's likely to create its own issues. So my suggestion above holds.

Thanks for taking a look at this. We can surely revert this. Fwiw,
given how non-obvious this whole thing turned out to be, a few comments
in the code would've been helpful. I'll try to send a revert by the end
of this week with your explanations added in the revert message.

Christian