Received: by 2002:a05:6a10:c7c6:0:0:0:0 with SMTP id h6csp2457637pxy; Tue, 3 Aug 2021 07:03:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwiEfnzqo51rKpX//RtGdmmcPrnilRMybx2ZlnbJAv9HievSDgJdctjqttmu3E+UQB4oQ5C X-Received: by 2002:a5d:9ccb:: with SMTP id w11mr909905iow.174.1627999406862; Tue, 03 Aug 2021 07:03:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627999406; cv=none; d=google.com; s=arc-20160816; b=Stwpv07S9d/y1rGJ3J+rZtxWpPm0Npq2DNASoZsdoS5TGyLWR9xEnzIBB3iKn+1axS 5FtUmAeWt5BiyUvcQaw9/+fNal8aDteRVi7junpmTBZKLsEg98wANnHyKoIEc4IdvwfW vTV5IRcVsVqsW75KS1isBIYTnHlokITvaNXljClcjMUfEUq99k7Tyzx+DKJWfdjZUuws ej9KgDprDbiT00O7u6UA5M+B8QbsfUUyP57llA34UZFUW9mFyREwO6zCIkzSjKymdxAy NrF8oG7Pw1/z7s1EVA4WpInvXsh+4TwuvqGHTpXsa3wFSeagTk4sqKzj9Mf9X5Rp9mYS m1Ag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=seIpyff8AYVsUOztInHnTDzOI4xUiqyuL3K/YzTII/Y=; b=oNE+TmV+XQyC7u1s1d7zmyWlHR/gWFcKnHNQtCISYm4W97d4+T5yRuAyjNp0iXxlMJ Al56PJsuAwu+tyBjOZujZ3oVw64hqZkSzwmxnML2uYD6idAvKlUVwhfzax5n/A5QW+/S WWq+vmniSIFNjL2JQqc1aKKKpklydwVl3XR5joVH+qorlsZMrYDELc7xjtsGgnFnZhPb HT3rhYiEiZmSKTwBDWaVXOiOgDRzzYLXti55RuXUe2iPAkhCE5VVNpuEkQ9IvI+o1C7R xwlOiQ2aPbpAc6JvNn5EaYb5IdUKgyiCwKZAZcviwqqcal0B/NRPLDutrMXzm6E+dMik dmCA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b24si4901318jar.119.2021.08.03.07.03.15; Tue, 03 Aug 2021 07:03:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236473AbhHCOCK (ORCPT + 99 others); Tue, 3 Aug 2021 10:02:10 -0400 Received: from mail.kernel.org ([198.145.29.99]:44424 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236483AbhHCOBv (ORCPT ); Tue, 3 Aug 2021 10:01:51 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id E95C860EC0; Tue, 3 Aug 2021 14:01:36 +0000 (UTC) Date: Tue, 3 Aug 2021 16:01:33 +0200 From: Christian Brauner To: CGEL Cc: keescook@chromium.org, ktkhai@virtuozzo.com, jamorris@linux.microsoft.com, varad.gautam@suse.com, legion@kernel.org, dbueso@suse.de, linux-kernel@vger.kernel.org, Ran Xiaokai Subject: Re: [PATCH] ipc: add set_ownership() and permissions() callbacks for posix mqueue sysctl Message-ID: <20210803140133.vksebmgqhlbqipla@wittgenstein> References: <20210729030651.536326-1-ran.xiaokai@zte.com.cn> <20210729145348.hvvhu6lmlcn5js4y@wittgenstein> <20210803103150.GA607784@www> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20210803103150.GA607784@www> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 03, 2021 at 03:31:50AM -0700, CGEL wrote: > O Thu, Jul 29, 2021 at 04:53:48PM +0200, Christian Brauner wrote: > > > > Yeah, we did that work specifically for the network namespace but knew > > there were quite a few places that would need fix up. This makes sense > > to me. > > > > Please add tests for this patch though. Also make sure to run them in a > > tight loop on a kernel with memory and log debugging enabled. > For now i have rebuilt the kernel turning on the config items you > suggested and some other kernel hacking and locking debug configs. > I tested this by a shell script concurrently writing the mqueue sysctl > files and checking the value. Do you mean that i should add some test code in this patch? > Can you give some examples for this tests code? Yep, I'd prefer to see tests with this. This should be fairly easy to test. There are a bunch of ways. A really trivial skeleton for a test in tools/testing/selftstes/mqueue/ could be: - Create a new mount and ipc namespace and mount mqueue in there. Read and remember the /proc/sys/fs/mqueue/queues_max value. - Now create a new user + mount namespace pair in a child process. - Mount mqueue filesystem in there. - Set /proc/sys/fs/mqueue/queues_max to 1. - Call mq_open with O_CREAT in the child process the first time and expect success keeping the fd open. - Call mq_open with O_CREAT in the child process a second time and expect failure because of: if (ipc_ns->mq_queues_count >= ipc_ns->mq_queues_max && !capable(CAP_SYS_RESOURCE)) { error = -ENOSPC; goto out_unlock; } ipc_ns->mq_queues_count++; spin_unlock(&mq_lock); - Reap the child in the parent expecting success. - Verify that the /proc/sys/fs/mqueue/queues_max value in the parent is identical to the value you read before creating the child. This should be a very rough test that at least the fundamentals of this change are sane. This doesn't have to be complex. Here's the code from a test I've written for mount_setattr() to create an unpriv mount + userns which might be the annoying part: #include "../../kselftest_harness.h" #include "../pidfd.h" /* for wait_for_pid() */ static ssize_t write_nointr(int fd, const void *buf, size_t count) { ssize_t ret; do { ret = write(fd, buf, count); } while (ret < 0 && errno == EINTR); return ret; } static int write_file(const char *path, const void *buf, size_t count) { int fd; ssize_t ret; fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW); if (fd < 0) return -1; ret = write_nointr(fd, buf, count); close(fd); if (ret < 0 || (size_t)ret != count) return -1; return 0; } static int create_and_enter_userns(void) { uid_t uid; gid_t gid; char map[100]; uid = getuid(); gid = getgid(); if (unshare(CLONE_NEWUSER)) return -1; if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) && errno != ENOENT) return -1; snprintf(map, sizeof(map), "0 %d 1", uid); if (write_file("/proc/self/uid_map", map, strlen(map))) return -1; snprintf(map, sizeof(map), "0 %d 1", gid); if (write_file("/proc/self/gid_map", map, strlen(map))) return -1; if (setgid(0)) return -1; if (setuid(0)) return -1; return 0; } static int prepare_unpriv_mountns(void) { if (create_and_enter_userns()) return -1; if (unshare(CLONE_NEWNS)) return -1; if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0)) return -1; if (unshare(CLONE_NEWIPC)) return -1; return 0; } TEST(mqueue_sysctls) { int ret; pid_t pid; if (unshare(CLONE_NEWNS)) return -1; if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0)) return -1; if (unshare(CLONE_NEWIPC)) return -1; /* Mount new mqueue instance, read and stash sysctl value. */ pid = fork(); ASSERT_GE(pid, 0) { TH_LOG("%s - Failed to fork", strerror(errno)); } if (pid == 0) { ASSERT_EQ(prepare_unpriv_mountns(), 0); /* mount new mqueue instance, setup sysctls and perform mq_open() tests. */ exit(EXIT_SUCCESS); } ASSERT_EQ(wait_for_pid(pid), 0); /* Read sysctl value again making sure it's still the same as before. */ } TEST_HARNESS_MAIN > > > The whole sysctl retire stuff can't be called from rcu contexts and that's easy to > > miss. > for this patch, retire_mq_sysctls() is called by free_ipc_ns(), and free_ipc_ns() is > triggered by schedule_work(&free_ipc_work) in kthread context. > Can you give some comments on the chance this code running on rcu > context? I really just prefer to see sysctl namespacing changes compiled with all manner of debugging options enabled as they can quite easily uncover bugs where the sysctl cleanup helpers are called from invalid contexts. That has happened to me when I worked on some sysctl changes a while back and thought I had followed all codepaths diligently to make sure that nothing calls the sysctl cleanup code from invalid contexts. I only caught the codepath that did because I had all of the options turned on.