2012-05-07 10:06:38

by Maciej Żenczykowski

[permalink] [raw]
Subject: setuid and RLIMIT_NPROC and 3.1+

Commit 72fa59970f8698023045ab0713d66f3f4f96945c
Author: Vasiliy Kulikov <[email protected]>
Date: Mon Aug 8 19:02:04 2011 +0400

move RLIMIT_NPROC check from set_user() to do_execve_common()

intentionally 'breaks' error return codes from setuid and friends in
the presence of RLIMIT_NPROC.

3.0.30:
setresuid(0, 0, 0) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=1, rlim_max=1}) = 0
clone(Process 20070 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x1344b30) = 20070 [fork() succeeds only because we're
root]
[pid 20069] setuid(65534) = 0
[pid 20070] setuid(65534) = -1 EAGAIN (Resource temporarily unavailable)

3.1:
setresuid(0, 0, 0) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=1, rlim_max=1}) = 0
clone(Process 13507 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x135cb30) = 13507 [fork() succeeds only because we're
root]
[pid 13506] setuid(65534) = 0
[pid 13507] setuid(65534) = 0

Having read the commit in question I get what it is trying to prevent,
but perhaps the setuid call should still be returning an error code
[E2BIG? EBUSY? EOVERFLOW? ENAVAIL?] for those programs that do bother
to check, even though it would 'succeed' in changing uid?

(In my case there is no exec following the setuid...)

- Maciej


2012-05-07 15:23:00

by Linus Torvalds

[permalink] [raw]
Subject: Re: setuid and RLIMIT_NPROC and 3.1+

On Mon, May 7, 2012 at 3:06 AM, Maciej Żenczykowski
<[email protected]> wrote:
>
> Having read the commit in question I get what it is trying to prevent,
> but perhaps the setuid call should still be returning an error code
> [E2BIG? EBUSY? EOVERFLOW? ENAVAIL?] for those programs that do bother
> to check, even though it would 'succeed' in changing uid?
>
> (In my case there is no exec following the setuid...)

But that means that in your case the setuid() did actually succeed.

So you didn't get an error return, and the setuid() actually worked.
Only a later exec() would cause an error, but since you didn't do an
exec(), everything should be perfectly fine.

So returning an error would actually be confusing: it would make the
caller think that the setuid() failed and that you're still running
with the old uid. But it isn't.

So can you explain what the problem is for your workload? As far as I
can tell, it should all work perfectly well.

Linus

2012-05-09 19:04:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: setuid and RLIMIT_NPROC and 3.1+

On Mon, May 7, 2012 at 1:13 PM, Maciej Żenczykowski
<[email protected]> wrote:
>
> The application was relying on setuid failing in order to do resource limiting
> (the man page for setresuid documents EAGAIN as the error you'll get if you
> can't switch to the new uid because of RLIMIT_NPROC being exceeded).
> It would detect the error condition and slow down.
> Now it doesn't get an error back and can grow out of control.

Ok. Maybe we could change the logic in set_user() to simply just check
both the soft and the hard limit.

At the hard limit, we just fail it. At the soft limit, we mark the
next execve() for failure.

That would seem to be a very natural model, and it would mean that you
could get the old behavior by simply making the hard limit the same as
the soft limit.

Would that work ok for your use case?

Trivial (but TOTALLY UNTESTED - so maybe it doesn't work) patch attached.

Linus


Attachments:
patch.diff (956.00 B)

2012-05-09 23:50:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: setuid and RLIMIT_NPROC and 3.1+

On Wed, May 9, 2012 at 4:46 PM, Maciej Żenczykowski
<[email protected]> wrote:
>
> Doesn't that just reintroduce the security 'hole' in buggy userspace apps
> that the original patch was attempting to fix?

Well, those bggy apps have to be really *odd* buggy apps now. IOW,
they need to do setuid(), and then not execve(). At that point, they
really do have to check the error return, since there is no execve()
for them to check.

In the end, we can do only so much to counter buggy apps. I think my
patch is a reasonable "we can try to give the error at execve() time,
but if somebody does tons of setuid() without ever doing the execve(),
at some point they do have to check the error return of setuid()
itself".

I suspect most users of setuid() are good and check the error return.

Linus

2012-05-10 00:49:31

by Maciej Żenczykowski

[permalink] [raw]
Subject: Re: setuid and RLIMIT_NPROC and 3.1+

> Well, those bggy apps have to be really *odd* buggy apps now. IOW,
> they need to do setuid(), and then not execve(). At that point, they
> really do have to check the error return, since there is no execve()
> for them to check.
>
> In the end, we can do only so much to counter buggy apps. I think my
> patch is a reasonable "we can try to give the error at execve() time,
> but if somebody does tons of setuid() without ever doing the execve(),
> at some point they do have to check the error return of setuid()
> itself".
>
> I suspect most users of setuid() are good and check the error return.

If you return EAGAIN from setuid, because we've run into the hard
limit, and userspace doesn't check it, then the exec will still
succeed, and we'll end up running the exec'ed code as root.
Which re-introduces the same bug this was trying to fix. At that
point we might as well just revert the 3.1 commit.

(what if soft and hard nproc limits are the same? then it's always
EAGAIN, so the bug is always reintroduced)

In my opinion the buggy apps should be fixed and the 3.1 commit should
be reverted both in upcoming 3.4 and in stable 3.1/3.2/3.3.

2012-05-10 01:13:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: setuid and RLIMIT_NPROC and 3.1+

Oh christ, I also cannot be bothered to continue arguing with you
since you seemingly randomly drop other people from the discussion.

So don't expect any more replies from me.

Linus

On Wed, May 9, 2012 at 6:10 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, May 9, 2012 at 5:47 PM, Maciej Żenczykowski
> <[email protected]> wrote:
>>
>> If you return EAGAIN from setuid, because we've run into the hard
>> limit, and userspace doesn't check it, then the exec will still
>> succeed, and we'll end up running the exec'ed code as root.
>
> But how would that happen in practice, and why would we care?
>
> If the application is buggy, it won't check the setuid() error return,
> but if it does the exec, it will fail.
>
> Only a buggy *and*malicious* application that *first* does non-exec
> cases up until it hits the hard limit and *then* tries to do a buggy
> setuid() - without testing the error value - and exec() would care.
>
> But we don't care about applications that *try* to be actively buggy.
> That simply isn't the interesting case. The point of the whole "return
> error at execve()" was not to discourage the actively buggy usage
> case, but the *accidentally* buggy case. And there's no sane way the
> accidentally buggy case can trigger what you describe.
>
> So the whole point wasn't that it "fixes" buggy applications (they
> are, after all, buggy), but that it makes it less likely to have
> accidental problems. It's a "softer" edge to some bugs.
>
> That said, I don't really care that much about the original patch
> either. For all I care, we could revert it. I just simply don't
> understand the point of your objections.
>
>                          Linus

2012-05-10 06:22:41

by Maciej Żenczykowski

[permalink] [raw]
Subject: Re: setuid and RLIMIT_NPROC and 3.1+

On Wed, May 9, 2012 at 6:13 PM, Linus Torvalds
<[email protected]> wrote:
> Oh christ, I also cannot be bothered to continue arguing with you
> since you seemingly randomly drop other people from the discussion.

Sorry, that was a mail client misconfiguration, I was used to 'reply
all' instead of 'reply' being the default - something must have
changed recently.
I noticed and fixed it, but I had failed to notice that this email
also went out only to you.
Once again very sorry.

>>> If you return EAGAIN from setuid, because we've run into the hard
>>> limit, and userspace doesn't check it, then the exec will still
>>> succeed, and we'll end up running the exec'ed code as root.
>>
>> But how would that happen in practice, and why would we care?
>>
>> If the application is buggy, it won't check the setuid() error return,
>> but if it does the exec, it will fail.
>>
>> Only a buggy *and*malicious* application that *first* does non-exec
>> cases up until it hits the hard limit and *then* tries to do a buggy
>> setuid() - without testing the error value - and exec() would care.

There is no guarantee that the soft and hard walls aren't identical
(indeed setting them both to 1 or some other small number seems very
normal) - perhaps the soft limit check in your proposed patch should
be first.
They may also simply be very close to each other [multiple "fork,
setuid, execve" happening in parallel], or that the processes get used
up by someone else using that uid (whose rlimit nproc is higher, this
is a per-process not system-wide setting after all), or that an app
has cases were it setuid's without execing, and other places where it
setuid's with execing.

I agree that with the exception of hard==soft nproc rlimit, the rest
are probably relatively unlikely.
Userspace can increase soft limit to hard, so soft isn't really a
limit, as such anyone that wants to truly limit, will set
soft==hard==minimum value that works.

>> But we don't care about applications that *try* to be actively buggy.
>>
>> That simply isn't the interesting case. The point of the whole "return
>> error at execve()" was not to discourage the actively buggy usage
>> case, but the *accidentally* buggy case. And there's no sane way the
>> accidentally buggy case can trigger what you describe.

It doesn't discourage, because the setuid can no longer fail, so there
is no longer a point to checking it.
It almost actively encourages you to not check the setuid return code
(if you are root), since there's now even less of a point to checking
it.

>> So the whole point wasn't that it "fixes" buggy applications (they
>> are, after all, buggy), but that it makes it less likely to have
>> accidental problems. It's a "softer" edge to some bugs.

Yes, I agree with the sentiment. It would be nice to both be correct
and somehow work-around/fix buggy apps.

Other possible approaches:
(a) fail the setuid system call, but set a magic flag (which clears on
a successful setuid), flag causes exec to fail. This fixes the
fork/setuid/exec case, while not breaking those that expect setuid to
return the failure, while still being potentially insecure (for buggy
apps) in the fork/setuid/no-exec case.

This would basically be:
if (atomic_read(&new_user->processes) >= rlimit(RLIMIT_NPROC) &&
new_user != INIT_USER) {
current->flags |= PF_NPROC_EXCEEDED;
free_uid(new_user);
return -EAGAIN;
} else {
current->flags &= ~PF_NPROC_EXCEEDED;
}

(and exec would *NOT* recheck the rlimit, since the uid wasn't
changed, and rechecking the limit would now be outright wrong)

(b) have the setuid call change uid, set a a magic flag, but still
return failure, flag causes exec to fail

This also fixes buggy setuid/no-exec apps, but has funky syntax, and
could result in apps that do check setuid to be broken (since setuid
could now change the uid while appearing to fail and thus you could
lose privileges when not expecting to)

>> That said, I don't really care that much about the original patch
>> either. For all I care, we could revert it. I just simply don't
>> understand the point of your objections.

Would you accept a patch to do (a) from above?