Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18;
From:   "Eric W. Biederman" <ebiederm@xmission.com>
To:     Linus Torvalds <linus@torvalds.org>
Cc:     Linux API <linux-api@vger.kernel.org>,
        Etienne Dechamps <etienne@edechamps.fr>,
        Alexey Gladkov <legion@kernel.org>,
        Kees Cook <keescook@chromium.org>,
        Shuah Khan <shuah@kernel.org>,
        Christian Brauner <brauner@kernel.org>,
        Solar Designer <solar@openwall.com>,
        Ran Xiaokai <ran.xiaokai@zte.com.cn>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@vger.kernel.org>,
        Linux Containers <containers@lists.linux-foundation.org>,
        Michal =?utf-8?Q?Koutn=C3=BD?= <mkoutny@suse.com>,
        Security Officers <security@kernel.org>,
        Neil Brown <neilb@cse.unsw.edu.au>, NeilBrown <neilb@suse.de>,
        "Serge E. Hallyn" <serge@hallyn.com>, Jann Horn <jannh@google.com>,
        Andy Lutomirski <luto@kernel.org>, Willy Tarreau <w@1wt.eu>
References: <20220207121800.5079-1-mkoutny@suse.com>
        <e9589141-cfeb-90cd-2d0e-83a62787239a@edechamps.fr>
        <20220215101150.GD21589@blackbody.suse.cz>
        <87zgmi5rhm.fsf@email.froward.int.ebiederm.org>
        <87fso91n0v.fsf_-_@email.froward.int.ebiederm.org>
        <CAHk-=wjX3VK8QRMDUWwigCTKdHJt0ESXh0Hy5HNaXf7YkEdCAA@mail.gmail.com>
        <878ru1qcos.fsf@email.froward.int.ebiederm.org>
        <CAHk-=wgW8+vmqhx4t+uFiZL==8Ac5VWTqCm_oshA0e47B73qPw@mail.gmail.com>
Date:   Wed, 23 Feb 2022 20:12:06 -0600
In-Reply-To: <CAHk-=wgW8+vmqhx4t+uFiZL==8Ac5VWTqCm_oshA0e47B73qPw@mail.gmail.com>
        (Linus Torvalds's message of "Wed, 23 Feb 2022 17:41:41 -0800")
Message-ID: <87tucpko7d.fsf@email.froward.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: How should rlimits, suid exec, and capabilities interact?
Precedence: bulk

Linus Torvalds <linus@torvalds.org> writes:

> On Wed, Feb 23, 2022 at 5:24 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Question: Running a suid program today charges the activity of that
>> program to the user who ran that program, not to the user the program
>> runs as.  Does anyone see a problem with charging the user the program
>> runs as?
>
> So I think that there's actually two independent issues with limits
> when you have situations like this where the actual user might be
> ambiguous.
>
>  - the "who to charge" question
>
>  - the "how do we *check* the limit" question
>
> and honestly, I think that when it comes to suid binaries, the first
> question is fundamentally ambiguous, because it almost certainly
> depends on the user.
>
> Which to me implies that there probably isn't an answer that is always
> right, and that what you should look at is that second option.
>
> So I would actually suggest that the "execute a suid binary" should
> charge the real user, but *because* it is suid, it should then not
> check the limit (or, perhaps, should check the hard limit?).
>
> You have to charge somebody, but at that point it's a bit ambiguous
> whether it should be allowed.
>
> Exactly so that if you're over a process limit (or something similar -
> think "too many files open" or whatever because you screwed up and
> opened everything) you could still log in as yourself (ssh/login
> charges some admin thing, which probably has high limits or is
> unlimited), and hopefully get shell access, and then be able to "exec
> sudo" to actually get admin access that should be disabled from the
> network.
>
> The above is just one (traditional) example of a fork/open bomb case
> where a user isn't really able to no longer function as himself, but
> wants to fix things (maybe the user has another terminal open, but
> then he can hopefully use a shell-builtin 'kill' instead).
>
> And I'm not saying it's "the thing that needs to work". I'm more
> making up an example.
>
> So I'm only saying that the above actually has two examples to the two
> sides of the coin: "login" lowering privileges to a user that may be
> over some limit - and succeeding despite that - and 'suid' succeeding
> despite the original user perhaps being over-committed.
>
> So it's intended exactly as an example of "picking the new or the old
> user would be wrong in either case if you check limits at the
> transition point".
>
> Hmm?

That doesn't really clarify anything for me.  We have two checks, one in
fork and one in exec, and you seem to be talking about the check in exec.

The check I have problems with for a suid executable is the check in
fork.  If the new process is accounted to the previous (real) user but
we check permissions using the effective user, that does not make
sense to me.
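
To make the mismatch concrete, here is a minimal userspace sketch.  It is
not the kernel's code: the struct names, model_capable() and
model_fork_allowed() are all hypothetical, and I am assuming for
illustration that only euid 0 holds the overriding capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct user_acct {
    unsigned int uid;
    long nproc;                  /* processes currently charged to this user */
};

struct task_model {
    struct user_acct *real_user; /* who the process is charged to */
    unsigned int euid;           /* effective uid, used for the privilege test */
    long rlimit_nproc;           /* the task's RLIMIT_NPROC value */
};

/* Assumption for this model: only euid 0 holds the overriding capability. */
static bool model_capable(const struct task_model *t)
{
    return t->euid == 0;
}

/*
 * The count comes from the real user, but the override is decided by the
 * effective user.  Returns true if fork would be allowed in this model.
 */
static bool model_fork_allowed(const struct task_model *t)
{
    assert(t != NULL && t->real_user != NULL);
    if (t->real_user->nproc >= t->rlimit_nproc && !model_capable(t))
        return false;
    return true;
}
```

In this model a user who is already at the limit cannot fork from an
ordinary process, but the same user running a suid root program can,
even though the new process is still charged to that user.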

If we can sort out the check in fork, I think I have clarity about
the other cases.


The check in exec, while clumsy and in need of cleanup, seems to make
sense to me.  We have a transition that starts with fork and ends with
exec and has operations like setuid in between.  If something like
setuid() is called before exec, we check in exec.

The case the check in exec is aimed at supporting is processes spawned
from a parent that run as a different user (than the parent) and will
never call fork again.  Those processes would be fundamentally immune
to RLIMIT_NPROC if we didn't check somewhere besides fork.  There is
existing code in apache that uses RLIMIT_NPROC this way.
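
A rough userspace model of that fork, setuid, exec sequence, under my own
simplifying assumptions: model_setuid(), model_exec() and the
nproc_exceeded field are hypothetical names, and the exact comparison
against the limit is chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct user_acct {
    unsigned int uid;
    long nproc;              /* processes currently charged to this user */
};

struct task_model {
    struct user_acct *user;  /* who the process is charged to */
    long rlimit_nproc;       /* the task's RLIMIT_NPROC value */
    bool nproc_exceeded;     /* remembered from setuid until exec */
};

/* Move the task's charge to new_user and note whether that user is now over. */
static void model_setuid(struct task_model *t, struct user_acct *new_user)
{
    assert(t != NULL && t->user != NULL && new_user != NULL);
    t->user->nproc--;
    new_user->nproc++;
    t->user = new_user;
    t->nproc_exceeded = new_user->nproc > t->rlimit_nproc;
}

/* Returns 0 on success, -1 if exec is refused because the limit is exceeded. */
static int model_exec(struct task_model *t)
{
    assert(t != NULL && t->user != NULL);
    /* Refuse only if setuid saw the problem and it still holds at exec. */
    if (t->nproc_exceeded && t->user->nproc > t->rlimit_nproc)
        return -1;
    t->nproc_exceeded = false;
    return 0;
}
```

The child never calls fork as the new user, so a check in fork alone
would never see it; here the limit is enforced at exec instead, and
setuid itself still succeeds.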


For your login case I have no problems with it in principle.  In
practice I think you have to log in as root to deal with a fork bomb that
hits RLIMIT_NPROC and does not die gracefully.

What I don't see about your login example is how it is practically
different from the apache cgi script case, which the code has supported
for 20 years, and which it would be a regression to stop supporting.

If we want to stop supporting that case we can just remove all of the
RLIMIT_NPROC tests everywhere except in fork, which would be a nice
cleanup.


That still leaves me with mismatched effective vs real uid checks in
fork when the effective and real uids don't match.  Which means testing
for root with "capable(CAP_SYS_ADMIN)" does not work.  Which today
makes the code a bit of a challenge to understand and work with.

Eric