Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
From:   "Eric W. Biederman" <ebiederm@xmission.com>
To:     Kees Cook <keescook@chromium.org>
Cc:     linux-kernel@vger.kernel.org,
        Linux API <linux-api@vger.kernel.org>,
        Etienne Dechamps <etienne@edechamps.fr>,
        Alexey Gladkov <legion@kernel.org>,
        Shuah Khan <shuah@kernel.org>,
        Christian Brauner <brauner@kernel.org>,
        Solar Designer <solar@openwall.com>,
        Ran Xiaokai <ran.xiaokai@zte.com.cn>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@vger.kernel.org>,
        Linux Containers <containers@lists.linux-foundation.org>,
        Michal Koutný <mkoutny@suse.com>,
        Security Officers <security@kernel.org>,
        Neil Brown <neilb@cse.unsw.edu.au>, NeilBrown <neilb@suse.de>,
        "Serge E. Hallyn" <serge@hallyn.com>, Jann Horn <jannh@google.com>,
        Andy Lutomirski <luto@kernel.org>, Willy Tarreau <w@1wt.eu>,
        Linus Torvalds <linus@torvalds.org>
References: <20220207121800.5079-1-mkoutny@suse.com>
        <e9589141-cfeb-90cd-2d0e-83a62787239a@edechamps.fr>
        <20220215101150.GD21589@blackbody.suse.cz>
        <87zgmi5rhm.fsf@email.froward.int.ebiederm.org>
        <87fso91n0v.fsf_-_@email.froward.int.ebiederm.org>
        <CAHk-=wjX3VK8QRMDUWwigCTKdHJt0ESXh0Hy5HNaXf7YkEdCAA@mail.gmail.com>
        <878ru1qcos.fsf@email.froward.int.ebiederm.org>
        <CAHk-=wgW8+vmqhx4t+uFiZL==8Ac5VWTqCm_oshA0e47B73qPw@mail.gmail.com>
        <87tucpko7d.fsf@email.froward.int.ebiederm.org>
        <87sfs8jmpz.fsf_-_@email.froward.int.ebiederm.org>
        <202202240826.E31BADF@keescook>
Date:   Thu, 24 Feb 2022 18:29:24 -0600
In-Reply-To: <202202240826.E31BADF@keescook> (Kees Cook's message of "Thu, 24
        Feb 2022 08:28:41 -0800")
Message-ID: <87y21zhjq3.fsf@email.froward.int.ebiederm.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
Subject: Re: [PATCH] ucounts: Fix systemd LimigtNPROC with private users
 regression
Precedence: bulk

Kees Cook <keescook@chromium.org> writes:

> typo: Subject's LimigtNPROC -> LimitNPROC
>
> On Thu, Feb 24, 2022 at 09:41:44AM -0600, Eric W. Biederman wrote:
>> 
>> Long story short recursively enforcing RLIMIT_NPROC when it is not
>> enforced on the process that creates a new user namespace, causes
>> currently working code to fail.  There is no reason to enforce
>> RLIMIT_NPROC recursively when we don't enforce it normally so update
>> the code to detect this case.
>> 
>> I would like to simply use capable(CAP_SYS_RESOURCE) to detect when
>> RLIMIT_NPROC is not enforced upon the caller.  Unfortunately because
>> RLIMIT_NPROC is charged and checked for enforcement based upon the
>> real uid, using capable() wich is euid based is inconsistent with reality.
>
> typo: wich -> which

Ahh... Typos.

>> Come as close as possible to testing for capable(CAP_SYS_RESOURCE) by
>> testing for when the real uid would match the conditions when
>> CAP_SYS_RESOURCE would be present if the real uid was the effective
>> uid.
>> 
>> Reported-by: Etienne Dechamps <etienne@edechamps.fr>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215596
>> Link: https://lkml.kernel.org/r/e9589141-cfeb-90cd-2d0e-83a62787239a@edechamps.fr
>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> The previous conversation has given me enough clarity that I can see
>> which tests I am comfortable using for this pending regression fix.
>> 
>> I have tested this and it works for me.  Does anyone have any concerns
>> with this change?
>
> I'd really love some kind of selftest that exercises the edge cases; do
> you have your tests in some form that could be converted?
>
> But otherwise, yes, this looks like the best option here.

Let's start with Michal Koutný's tests.  I keep forgetting to look at
them.  This cold has really been kicking my butt.

For this issue the test case was a systemd unit file, which is simple
and demonstrates the real-world regression but is not really minimal in
the way a kernel selftest should be.
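
As a possible starting point for enumerating the edge cases, here is a
rough user-space model of the check that enforced_nproc_rlimit() makes
in the patch quoted below.  It is only a sketch and not kernel code:
struct model_creator, MODEL_RLIM_INFINITY and the model_* functions are
hypothetical names made up for illustration, with real_uid standing in
for current_uid() and in_init_user_ns standing in for the
current_user_ns() == &init_user_ns comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's RLIM_INFINITY. */
#define MODEL_RLIM_INFINITY UINT64_MAX

/* Hypothetical snapshot of the task that calls create_user_ns(). */
struct model_creator {
	unsigned int real_uid;	/* models current_uid() as a global uid */
	bool in_init_user_ns;	/* models current_user_ns() == &init_user_ns */
	uint64_t nproc_rlimit;	/* models rlimit(RLIMIT_NPROC) */
};

/*
 * Mirrors the condition in the patch: the rlimit is propagated to the
 * new user namespace unless the creator is global root in the initial
 * user namespace, in which case no limit is imposed.
 */
static uint64_t model_enforced_nproc_rlimit(const struct model_creator *c)
{
	uint64_t limit = MODEL_RLIM_INFINITY;

	if (c->real_uid != 0 || !c->in_init_user_ns)
		limit = c->nproc_rlimit;

	return limit;
}

/* Walks the four combinations of (root or not, init_user_ns or not). */
static void model_self_check(void)
{
	struct model_creator root_init  = { 0,    true,  100 };
	struct model_creator root_child = { 0,    false, 100 };
	struct model_creator user_init  = { 1000, true,  100 };
	struct model_creator user_child = { 1000, false, 100 };

	assert(model_enforced_nproc_rlimit(&root_init) == MODEL_RLIM_INFINITY);
	assert(model_enforced_nproc_rlimit(&root_child) == 100);
	assert(model_enforced_nproc_rlimit(&user_init) == 100);
	assert(model_enforced_nproc_rlimit(&user_child) == 100);
}
```

In this model only a creator that is global root in the initial user
namespace gets the infinite limit, whatever its rlimit is set to.  The
other three combinations pass the rlimit through unchanged.  A real
selftest would still have to exercise fork() inside the new namespace,
which this sketch does not attempt.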

> Reviewed-by: Kees Cook <keescook@chromium.org>
>
>> 
>>  kernel/user_namespace.c | 14 +++++++++++++-
>>  1 file changed, 13 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 6b2e3ca7ee99..5481ba44a8d6 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -58,6 +58,18 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
>>  	cred->user_ns = user_ns;
>>  }
>>  
>> +static unsigned long enforced_nproc_rlimit(void)
>> +{
>> +	unsigned long limit = RLIM_INFINITY;
>> +
>> +	/* Is RLIMIT_NPROC currently enforced? */
>> +	if (!uid_eq(current_uid(), GLOBAL_ROOT_UID) ||
>> +	    (current_user_ns() != &init_user_ns))
>> +		limit = rlimit(RLIMIT_NPROC);
>> +
>> +	return limit;
>> +}
>> +
>>  /*
>>   * Create a new user namespace, deriving the creator from the user in the
>>   * passed credentials, and replacing that user with the new root user for the
>> @@ -122,7 +134,7 @@ int create_user_ns(struct cred *new)
>>  	for (i = 0; i < MAX_PER_NAMESPACE_UCOUNTS; i++) {
>>  		ns->ucount_max[i] = INT_MAX;
>>  	}
>> -	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
>> +	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, enforced_nproc_rlimit());
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
>>  	set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
>> -- 
>> 2.29.2
>> 

Eric