From: Alexander Mikhalitsyn
Date: Mon, 14 Aug 2023 17:24:33 +0200
Subject: Re: [PATCH RFC 4/4] fs: allow mknod in non-initial userns using cgroup device guard
To: Michael Weiß
Cc: Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
    Quentin Monnet, Alexander Viro, bpf@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    gyroidos@aisec.fraunhofer.de, stgraber@ubuntu.com
References: <20230814-devcg_guard-v1-0-654971ab88b1@aisec.fraunhofer.de>
    <20230814-devcg_guard-v1-4-654971ab88b1@aisec.fraunhofer.de>
In-Reply-To: <20230814-devcg_guard-v1-4-654971ab88b1@aisec.fraunhofer.de>

+CC Stéphane Graber

On Mon, Aug 14, 2023 at 4:26 PM Michael Weiß wrote:
>
> If a container manager restricts its unprivileged (user namespaced)
> children by a device cgroup, it is not necessary to deny mknod
> anymore. Thus, user space applications may map devices on different
> locations in the file system by using mknod() inside the container.
>
> A use case for this, we also use in GyroidOS, is to run virsh for
> VMs inside an unprivileged container. virsh creates device nodes,
> e.g., "/var/run/libvirt/qemu/11-fgfg.dev/null" which currently fails
> in a non-initial userns, even if a cgroup device white list with the
> corresponding major, minor of /dev/null exists. Thus, in this case
> the usual bind mounts or pre-populated device nodes under /dev are
> not sufficient.
>
> To circumvent this limitation, we allow mknod() in fs/namei.c if a
> bpf cgroup device guard is enabled for the current task using
> devcgroup_task_is_guarded() and check CAP_MKNOD for the current user
> namespace by ns_capable() instead of the global CAP_MKNOD.
>
> To avoid unusable device nodes on file systems mounted in
> non-initial user namespace, may_open_dev() ignores the SB_I_NODEV
> for cgroup device guarded tasks.
>
> Signed-off-by: Michael Weiß
> ---
>  fs/namei.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index e56ff39a79bc..ef4f22b9575c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3221,6 +3221,9 @@ EXPORT_SYMBOL(vfs_mkobj);
>
>  bool may_open_dev(const struct path *path)
>  {
> +       if (devcgroup_task_is_guarded(current))
> +               return !(path->mnt->mnt_flags & MNT_NODEV);
> +
>         return !(path->mnt->mnt_flags & MNT_NODEV) &&
>                 !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV);
>  }
> @@ -3976,9 +3979,19 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
>         if (error)
>                 return error;
>
> -       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout &&
> -           !capable(CAP_MKNOD))
> -               return -EPERM;
> +       /*
> +        * In case of a device cgroup restriction allow mknod in user
> +        * namespace. Otherwise just check global capability; thus,
> +        * mknod is also disabled for user namespace other than the
> +        * initial one.
> +        */
> +       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout) {
> +               if (devcgroup_task_is_guarded(current)) {
> +                       if (!ns_capable(current_user_ns(), CAP_MKNOD))
> +                               return -EPERM;
> +               } else if (!capable(CAP_MKNOD))
> +                       return -EPERM;
> +       }
>
>         if (!dir->i_op->mknod)
>                 return -EPERM;
>
> --
> 2.30.2
>
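
To make the decision logic of the two quoted hunks easier to follow, here
is a user-space sketch of it. This is only a model under stated
assumptions, not kernel code: struct task_model, mknod_dev_check(),
may_open_dev_model() and model_self_check() are hypothetical names, and
the boolean inputs stand in for the results of
devcgroup_task_is_guarded(current), capable(CAP_MKNOD),
ns_capable(current_user_ns(), CAP_MKNOD) and the MNT_NODEV / SB_I_NODEV
flag tests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state the patch consults. */
struct task_model {
	bool guarded;          /* models devcgroup_task_is_guarded(current) */
	bool cap_mknod_global; /* models capable(CAP_MKNOD) */
	bool cap_mknod_userns; /* models ns_capable(current_user_ns(), CAP_MKNOD) */
};

/* Models the capability check in the quoted vfs_mknod() hunk. */
int mknod_dev_check(const struct task_model *t, bool is_dev, bool is_whiteout)
{
	/* Only char/block device nodes that are not whiteouts are checked. */
	if (!is_dev || is_whiteout)
		return 0;
	/* Guarded task: CAP_MKNOD in the current user namespace suffices. */
	if (t->guarded)
		return t->cap_mknod_userns ? 0 : -EPERM;
	/* Unguarded task: fall back to the global capability check. */
	return t->cap_mknod_global ? 0 : -EPERM;
}

/* Models the quoted may_open_dev() hunk. */
bool may_open_dev_model(bool guarded, bool mnt_nodev, bool sb_i_nodev)
{
	/* Guarded task: only the per-mount nodev flag is honoured. */
	if (guarded)
		return !mnt_nodev;
	return !mnt_nodev && !sb_i_nodev;
}

/* A few representative cases, expressed as assertions. */
void model_self_check(void)
{
	struct task_model container = {
		.guarded = true,
		.cap_mknod_global = false,
		.cap_mknod_userns = true,
	};
	struct task_model plain_userns = {
		.guarded = false,
		.cap_mknod_global = false,
		.cap_mknod_userns = true,
	};

	/* Guarded container task may create the device node. */
	assert(mknod_dev_check(&container, true, false) == 0);
	/* Same capabilities without a guard: still denied. */
	assert(mknod_dev_check(&plain_userns, true, false) == -EPERM);
	/* SB_I_NODEV is ignored only for guarded tasks. */
	assert(may_open_dev_model(true, false, true));
	assert(!may_open_dev_model(false, false, true));
}
```

In this model, the only behavioural change for unguarded tasks is none at
all: both functions reduce to the pre-patch checks when guarded is false.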