Received: by 2002:a05:6a10:1a4d:0:0:0:0 with SMTP id nk13csp2367589pxb; Wed, 9 Feb 2022 17:22:47 -0800 (PST) X-Google-Smtp-Source: ABdhPJxPsolKy6fQYNMhsNTdTRweF7HTUBMJJddZ6GBn3WG7vbAxDG8na0O7DcpHGGSQ7EDux9pz X-Received: by 2002:a63:85c1:: with SMTP id u184mr1355028pgd.337.1644456166954; Wed, 09 Feb 2022 17:22:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1644456166; cv=none; d=google.com; s=arc-20160816; b=uLFI+fC3WhAfE5Jf0qw8EmrjfIA2Oa2fCzYQjaE9RpG+MZgF15sNjRK3lFIavW4zio iVRg3ks26+1Ym0P9Kid52+cv779DkL8FwNWfxLVnEtaDqXK/3lODFTc6/rBqBdwQHJdM Qoi/PUu7TPM2POzDIr53Jkq1EqxnC5LJUo14OqfRRo38mfRdATkZD7kCxQOmLrm5xdKN rMBsz+ojjl40ufYAqy2mBSem30gY/oU52CxjQlGVUIvktNW9hv438BE1eMmZnN9z6t+B e3Fd6bzHGBj/b0KUGoaY9bd3+4k3H1ak7KyStL7xiQr3968sKuiqSffZ5+AySYeny9aN tZiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=5+tRnd8Xg9x7mgUqVVo/liOxPoi3ep4CbmXvpW+HjwM=; b=CjFI9/cmsaVasme9ylAf3pbeFidOKRS03BRR2scwqUQ4hylUERt2lCCXQD5JbtZxp1 LZN/BdNmiVkfAE1r6VS8e5c7kY/dCLJlLIzomy45EDcmg8QOQAjBJuoHrTuNN/1Z5npY ht82XRkyMkyIPr1yIzyCWGDIUWvWRU/LDOIa7C0x8GXU6f+QmRZyRG8nHCJvxMczpXG1 HDeBDL3ewtIbd2f0qa2H+1wTJ2Bc3LKYkKmFmqhq5N9vaVfFppAm5Qj8h7GHqg3bqII7 ziQlwLXqQuRT56Cv+gWeRG3Y23ybL7wdGpxyDDVzH9AaztHtPZFuN9j9ad9dFiCajcfx q9qQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=google header.b=Ss+PDDj7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id d15si4720085pgw.137.2022.02.09.17.22.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Feb 2022 17:22:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=google header.b=Ss+PDDj7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 810231EC73; Wed, 9 Feb 2022 17:22:36 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231574AbiBJBWZ (ORCPT + 99 others); Wed, 9 Feb 2022 20:22:25 -0500 Received: from gmail-smtp-in.l.google.com ([23.128.96.19]:41298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231518AbiBJBWT (ORCPT ); Wed, 9 Feb 2022 20:22:19 -0500 Received: from mail-io1-xd36.google.com (mail-io1-xd36.google.com [IPv6:2607:f8b0:4864:20::d36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C19081EC67 for ; Wed, 9 Feb 2022 17:22:21 -0800 (PST) Received: by mail-io1-xd36.google.com with SMTP id 9so5559107iou.2 for ; Wed, 09 Feb 2022 17:22:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linuxfoundation.org; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=5+tRnd8Xg9x7mgUqVVo/liOxPoi3ep4CbmXvpW+HjwM=; b=Ss+PDDj7oCnQ2KGPqPfYltHXga74jpfEV9BYRBNQLvQeUZiGNb42fWTGaxTwOMbW8i S5DV/SFS/4P68yIivGidmgElaG8YYaIWt85HeKtSQ3N5sSuKOdAihDva6swjbgLOgXR3 8b4t6WXF7CD1MmibvwSV+6SKpy/0viLlZC4Tk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=5+tRnd8Xg9x7mgUqVVo/liOxPoi3ep4CbmXvpW+HjwM=; b=c9tRGMG8ROb4ucmLvRffVA/SSHMVauiCHWHmJCOE3hC/hfEz+bgtuRPY4gdlTSSVP0 Ha9xD13S8TjwFYuiH2SZPYxQrBGdUXwvbiw425oOPav3ZEiD/gAElKFNMGvYZrCbgByw ut+p/sOPn+jpggkj5M7fCLKp9H/AnZ+hWjlqPO4Wr/o3rXvPYkbdi1mk1RXw6yANXNCq F/uliiI5FrH85o9P3IfbCsJk6XImGTZKIdKh44zQGRlILLt7mdJIbe4Qcu+vXfXoAmCa +0iiMlz7+6mukkUE1jbgld49KI1mtfgxIa8quvoTLaOGkKxcge/4pQAqlDTYRruM43s9 KH9w== X-Gm-Message-State: AOAM531qTAIVRED2FQ1f6Wlc7wr9g/All5EFNAWthQtB4BM7ErTG2/jf Blx0sJWJuqiOU0sAgAVomoZTXg== X-Received: by 2002:a05:6638:22ce:: with SMTP id j14mr2844023jat.225.1644456139851; Wed, 09 Feb 2022 17:22:19 -0800 (PST) Received: from [192.168.1.128] ([71.205.29.0]) by smtp.gmail.com with ESMTPSA id ay35sm10631712iob.3.2022.02.09.17.22.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 09 Feb 2022 17:22:19 -0800 (PST) Subject: Re: [RFC PATCH 5/6] selftests: Challenge RLIMIT_NPROC in user namespaces To: =?UTF-8?Q?Michal_Koutn=c3=bd?= , Eric Biederman , Alexey Gladkov Cc: Kees Cook , Shuah Khan , Christian Brauner , Solar Designer , Ran Xiaokai , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Linux Containers , Shuah Khan References: <20220207121800.5079-1-mkoutny@suse.com> <20220207121800.5079-6-mkoutny@suse.com> From: Shuah Khan Message-ID: Date: Wed, 9 Feb 2022 18:22:18 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <20220207121800.5079-6-mkoutny@suse.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2/7/22 5:17 AM, Michal Koutný wrote: > The services are started in descendant user namepaces, each of them > should honor the RLIMIT_NPROC that's passed during user namespace > creation. > > main [user_ns_0] > ` service [user_ns_1] > ` worker 1 > ` worker 2 > ... > ` worker k > ... > ` service [user_ns_n] > ` worker 1 > ` worker 2 > ... > ` worker k > > Test uses explicit synchronization, to make sure original parent's limit > does not interfere with descendants. > Thank you for updating the test with the kernel updates. Please see comments below. A bit of a concern with how long this test will run. Did you time it? > Signed-off-by: Michal Koutný > --- > .../selftests/rlimits/rlimits-per-userns.c | 154 ++++++++++++++---- > 1 file changed, 125 insertions(+), 29 deletions(-) > > diff --git a/tools/testing/selftests/rlimits/rlimits-per-userns.c b/tools/testing/selftests/rlimits/rlimits-per-userns.c > index 26dc949e93ea..54c1b345e42b 100644 > --- a/tools/testing/selftests/rlimits/rlimits-per-userns.c > +++ b/tools/testing/selftests/rlimits/rlimits-per-userns.c > @@ -9,7 +9,9 @@ > #include > #include > #include > +#include > > +#include > #include > #include > #include > @@ -21,38 +23,74 @@ > #include > #include > > -#define NR_CHILDS 2 > +#define THE_LIMIT 4 > +#define NR_CHILDREN 5 > + > +static_assert(NR_CHILDREN >= THE_LIMIT-1, "Need slots for limit-1 children."); > > static char *service_prog; > static uid_t user = 60000; > static uid_t group = 60000; > +static struct rlimit saved_limit; > + > +/* Two uses: main and service */ > +static pid_t child[NR_CHILDREN]; > +static pid_t pid; > > static void setrlimit_nproc(rlim_t n) > { > - pid_t pid = getpid(); > struct rlimit limit = { > .rlim_cur = n, > .rlim_max = n > }; > - > - warnx("(pid=%d): Setting RLIMIT_NPROC=%ld", pid, n); > + if (getrlimit(RLIMIT_NPROC, &saved_limit) < 0) > + err(EXIT_FAILURE, "(pid=%d): getrlimit(RLIMIT_NPROC)", pid); > > if (setrlimit(RLIMIT_NPROC, &limit) < 0) > err(EXIT_FAILURE, "(pid=%d): setrlimit(RLIMIT_NPROC)", pid); > + > + warnx("(pid=%d): Set RLIMIT_NPROC=%ld", pid, n); > +} > + > +static void restore_rlimit_nproc(void) > +{ > + if (setrlimit(RLIMIT_NPROC, &saved_limit) < 0) > + err(EXIT_FAILURE, "(pid=%d): setrlimit(RLIMIT_NPROC, saved)", pid); > + warnx("(pid=%d) Restored RLIMIT_NPROC", pid); > } > > -static pid_t fork_child(void) > +enum msg_sync { > + UNSHARE, > + RLIMIT_RESTORE, > +}; > + > +static void sync_notify(int fd, enum msg_sync m) > { > - pid_t pid = fork(); > + char tmp = m; > + > + if (write(fd, &tmp, 1) < 0) > + warnx("(pid=%d): failed sync-write", pid); > +} > > - if (pid < 0) > +static void sync_wait(int fd, enum msg_sync m) > +{ > + char tmp; > + > + if (read(fd, &tmp, 1) < 0) > + warnx("(pid=%d): failed sync-read", pid); > +} > + > +static pid_t fork_child(int control_fd) > +{ > + pid_t new_pid = fork(); > + > + if (new_pid < 0) > err(EXIT_FAILURE, "fork"); > > - if (pid > 0) > - return pid; > + if (new_pid > 0) > + return new_pid; > > pid = getpid(); > - > warnx("(pid=%d): New process starting ...", pid); > > if (prctl(PR_SET_PDEATHSIG, SIGKILL) < 0) > @@ -73,6 +111,9 @@ static pid_t fork_child(void) > if (unshare(CLONE_NEWUSER) < 0) > err(EXIT_FAILURE, "unshare(CLONE_NEWUSER)"); > > + sync_notify(control_fd, UNSHARE); > + sync_wait(control_fd, RLIMIT_RESTORE); > + > char *const argv[] = { "service", NULL }; > char *const envp[] = { "I_AM_SERVICE=1", NULL }; > > @@ -82,37 +123,92 @@ static pid_t fork_child(void) > err(EXIT_FAILURE, "(pid=%d): execve", pid); > } > > +static void run_service(void) > +{ > + size_t i; > + int ret = EXIT_SUCCESS; > + struct rlimit limit; > + char user_ns[PATH_MAX]; > + > + if (getrlimit(RLIMIT_NPROC, &limit) < 0) > + err(EXIT_FAILURE, "(pid=%d) failed getrlimit", pid); > + if (readlink("/proc/self/ns/user", user_ns, PATH_MAX) < 0) > + err(EXIT_FAILURE, "(pid=%d) failed readlink", pid); > + > + warnx("(pid=%d) Service instance attempts %i children, limit %lu:%lu, ns=%s", > + pid, THE_LIMIT, limit.rlim_cur, limit.rlim_max, user_ns); > + > + /* test rlimit inside the service, effectively THE_LIMIT-1 becaue of service itself */ > + for (i = 0; i < THE_LIMIT; i++) { > + child[i] = fork(); > + if (child[i] == 0) { > + /* service child */ > + pause(); > + exit(EXIT_SUCCESS); > + } > + if (child[i] < 0) { > + warnx("(pid=%d) service fork %lu failed, errno = %i", pid, i+1, errno); > + if (!(i == THE_LIMIT-1 && errno == EAGAIN)) > + ret = EXIT_FAILURE; > + } else if (i == THE_LIMIT-1) { > + warnx("(pid=%d) RLIMIT_NPROC not honored", pid); > + ret = EXIT_FAILURE; > + } > + } > + > + /* service cleanup */ > + for (i = 0; i < THE_LIMIT; i++) > + if (child[i] > 0) > + kill(child[i], SIGUSR1); > + > + for (i = 0; i < THE_LIMIT; i++) > + if (child[i] > 0) > + waitpid(child[i], NULL, WNOHANG); > + > + if (ret) > + exit(ret); > + pause(); > +} > + > int main(int argc, char **argv) > { > size_t i; > - pid_t child[NR_CHILDS]; > - int wstatus[NR_CHILDS]; > - int childs = NR_CHILDS; > - pid_t pid; > + int control_fd[NR_CHILDREN]; > + int wstatus[NR_CHILDREN]; > + int children = NR_CHILDREN; > + int sockets[2]; > + > + pid = getpid(); > > if (getenv("I_AM_SERVICE")) { > - pause(); > - exit(EXIT_SUCCESS); > + run_service(); > + exit(EXIT_FAILURE); Why is this a failure unconditionally? > } > > service_prog = argv[0]; > - pid = getpid(); > > warnx("(pid=%d) Starting testcase", pid); > > - /* > - * This rlimit is not a problem for root because it can be exceeded. > - */ > - setrlimit_nproc(1); > - > - for (i = 0; i < NR_CHILDS; i++) { > - child[i] = fork_child(); > + setrlimit_nproc(THE_LIMIT); > + for (i = 0; i < NR_CHILDREN; i++) { > + if (socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, sockets) < 0) > + err(EXIT_FAILURE, "(pid=%d) socketpair failed", pid); > + control_fd[i] = sockets[0]; > + child[i] = fork_child(sockets[1]); > wstatus[i] = 0; > + } > + > + for (i = 0; i < NR_CHILDREN; i++) > + sync_wait(control_fd[i], UNSHARE); > + restore_rlimit_nproc(); > + > + for (i = 0; i < NR_CHILDREN; i++) { > + sync_notify(control_fd[i], RLIMIT_RESTORE); > usleep(250000); How long does this test now run for with this loop? > } > > while (1) { > - for (i = 0; i < NR_CHILDS; i++) { > + for (i = 0; i < NR_CHILDREN; i++) { > if (child[i] <= 0) > continue; > > @@ -126,22 +222,22 @@ int main(int argc, char **argv) > warn("(pid=%d): waitpid(%d)", pid, child[i]); > > child[i] *= -1; > - childs -= 1; > + children -= 1; > } > > - if (!childs) > + if (!children) > break; > > usleep(250000); > > - for (i = 0; i < NR_CHILDS; i++) { > + for (i = 0; i < NR_CHILDREN; i++) { > if (child[i] <= 0) > continue; > kill(child[i], SIGUSR1); > } > } > > - for (i = 0; i < NR_CHILDS; i++) { > + for (i = 0; i < NR_CHILDREN; i++) { > if (WIFEXITED(wstatus[i])) > warnx("(pid=%d): pid %d exited, status=%d", > pid, -child[i], WEXITSTATUS(wstatus[i])); > Please a add few more comments in the code path. thanks, -- Shuah