Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp1324785rwl; Wed, 5 Apr 2023 15:33:10 -0700 (PDT) X-Google-Smtp-Source: AKy350bgWVostOUZHvIXN3gVYhQHMY12vPLa3b6pWzf1hYb8Mrvwz9nEYHnm98yOjNha0zDcyr8N X-Received: by 2002:a17:907:bb8c:b0:946:e908:3af0 with SMTP id xo12-20020a170907bb8c00b00946e9083af0mr3821081ejc.26.1680733990119; Wed, 05 Apr 2023 15:33:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680733990; cv=none; d=google.com; s=arc-20160816; b=ofUsZQmi8WrscbRdE3S/eHNFXSXmoI+tS4a3eHwNbebHEK0PJRBuOgzjj7erRWPdW5 paRMRonthzgi6Dd1ublmRSNtmmRnvQNjEepZJ0s06B+iRpE5dmTzRA96Nv55o96hQnnE JEjYRtX9gTJaRpMgbxVDtwQEwLUaONMFq+UUJXoWcuf47x29+VGfks3d7kuYzM+OxRyL vu3weqSkoJbMnnhVCbcMH0LY/rAVJJG9O3sXvcO3vk6BdYdIMNQHd8EEfiThwTjai81a qe/AIPDsg7wLOQk63pxs0oLO3+QKo2oPP8ECANM+6eUHECb8uRhzId3q7NgSfxwYKdDq gr9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version :in-reply-to:date:dkim-signature; bh=GWR9MlnovPb4azeMHr07bRFmjnPJymKEQvz5exk2wDc=; b=lhUkNttX67NWiPjIkx/qFoCkkYQ0NthTDrgStkz9LAarZI7IUIFpjfDAkR0lqm/HBQ T0GMvkcSrTScKunJ+gIeT8KVjWpX5XqSPv4/iVp6BzNgtA7o55va/v6k6xko1hXI+4qY iabdLYNIYKxap4cB+Ps44WGxd92NJ0jKHlCpD3+NDmIrBUb9yftyO2ET5RGDmgQR6vRK GVceLx3IbL6ccRJywnBFI5cOvoJGOU4VKOPXaz7VbcIwnmyUEoZTKZVpDu12XGRcUJK3 cO/a/xllCpX3PUO+am1nsh12UP5gaKDZHt2keSa/HUZHRNM475CyeBhToOUUyCEPoqgy I3og== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VmY8UlSK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n3-20020a17090695c300b0093e3a33d7easi4909489ejy.451.2023.04.05.15.32.42; Wed, 05 Apr 2023 15:33:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VmY8UlSK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232221AbjDEW3F (ORCPT + 99 others); Wed, 5 Apr 2023 18:29:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229461AbjDEW3D (ORCPT ); Wed, 5 Apr 2023 18:29:03 -0400 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CB6D2719 for ; Wed, 5 Apr 2023 15:29:02 -0700 (PDT) Received: by mail-pg1-x549.google.com with SMTP id 140-20020a630792000000b0050be9465db8so10967431pgh.2 for ; Wed, 05 Apr 2023 15:29:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680733742; h=cc:to:from:subject:message-id:mime-version:in-reply-to:date:from:to :cc:subject:date:message-id:reply-to; bh=GWR9MlnovPb4azeMHr07bRFmjnPJymKEQvz5exk2wDc=; b=VmY8UlSKwzQO0K5HFS4LZyNPsx2yz03z8VFeH0jzD3xNKquWByhUadcUB64clw/ukK PS0lE7ms2N4zjfi8yBjDa838Jv9P/Ebe1rmQaDpZapGQHrRUF7T36fqfDV4nrOsV5AMd SGz0kry10Itl5RDl9lqCNIJymFOfJXu3eyUU9TTKRLkH+oIyMJ/lPP7ew2v5fMpzLKgu h2uPt2YYNrcvek8x7z9xu1hq0h+WSNdpNf1iKUK96UPq7N13/VzzDBzL4pGp1LrBOcqG 9eHu/EI4kCb1GuSPO4OgPgpAKgvx41nOj6p+bBeCRlZUmYgF2jGC97zqN3Q2IAWdiWYG FlOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680733742; h=cc:to:from:subject:message-id:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=GWR9MlnovPb4azeMHr07bRFmjnPJymKEQvz5exk2wDc=; b=QJ9ns2lh0tm9zRlMbqZFvivEQwpILruMuDtaz2e0oGvGAPUm8d+NBlshKXvrRU36ms Ox/gGecU9JZj+y/Ib34SbYz1My7FkBHITH82ftY8acA4ufwhDfqkPAEzn0DJndWMWtq3 AgH6g1Pn18hiY7NdXn3tb3jX3LRSFLX22qBF8W1ucxAET/fmsYfp+lgZvuhPzMHjiE1P o5bb3+y9fYZk1hw0DbQXYHuGNEYSsep5hGctZcv7tJEz2R2+lzB2DDU8ix3U+JO8/NrI kW/d8HPamut6ycC1yhcSM7TFCZuIKYR4tKj6qCKB01JTXb89L4eDtOchEMjKdkESz65T QhYA== X-Gm-Message-State: AAQBX9dmLW0W3+L2W2KG5Tgnkw35a6brpPXzJxUkbfozm9c4UiB6jRrR /xEFcVvomoam4J7/B46BQnooaEw28qKRFF00JA== X-Received: from ackerleytng-cloudtop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1f5f]) (user=ackerleytng job=sendgmr) by 2002:a17:90a:7308:b0:23f:1caa:233a with SMTP id m8-20020a17090a730800b0023f1caa233amr1734038pjk.1.1680733741709; Wed, 05 Apr 2023 15:29:01 -0700 (PDT) Date: Wed, 05 Apr 2023 22:29:00 +0000 In-Reply-To: (message from David Hildenbrand on Mon, 3 Apr 2023 10:21:48 +0200) Mime-Version: 1.0 Message-ID: Subject: Re: [RFC PATCH v3 1/2] mm: restrictedmem: Allow userspace to specify mount for memfd_restricted From: Ackerley Tng To: David Hildenbrand Cc: kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, aarcange@redhat.com, ak@linux.intel.com, akpm@linux-foundation.org, arnd@arndb.de, bfields@fieldses.org, bp@alien8.de, chao.p.peng@linux.intel.com, corbet@lwn.net, dave.hansen@intel.com, ddutile@redhat.com, dhildenb@redhat.com, hpa@zytor.com, hughd@google.com, jlayton@kernel.org, jmattson@google.com, joro@8bytes.org, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, linmiaohe@huawei.com, luto@kernel.org, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, mingo@redhat.com, naoya.horiguchi@nec.com, pbonzini@redhat.com, qperret@google.com, rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com, tabba@google.com, tglx@linutronix.de, vannapurve@google.com, vbabka@suse.cz, vkuznets@redhat.com, wanpengli@tencent.com, wei.w.wang@intel.com, x86@kernel.org, yu.c.zhang@linux.intel.com Content-Type: text/plain; charset="UTF-8"; format=flowed; delsp=yes X-Spam-Status: No, score=-7.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Thanks for your review! David Hildenbrand writes: > On 01.04.23 01:50, Ackerley Tng wrote: >> ... >> diff --git a/include/uapi/linux/restrictedmem.h >> b/include/uapi/linux/restrictedmem.h >> new file mode 100644 >> index 000000000000..22d6f2285f6d >> --- /dev/null >> +++ b/include/uapi/linux/restrictedmem.h >> @@ -0,0 +1,8 @@ >> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ >> +#ifndef _UAPI_LINUX_RESTRICTEDMEM_H >> +#define _UAPI_LINUX_RESTRICTEDMEM_H >> + >> +/* flags for memfd_restricted */ >> +#define RMFD_USERMNT 0x0001U > I wonder if we can come up with a more expressive prefix than RMFD. > Sounds more like "rm fd" ;) Maybe it should better match the > "memfd_restricted" syscall name, like "MEMFD_RSTD_USERMNT". RMFD did actually sound vulgar, I'm good with MEMFD_RSTD_USERMNT! >> + >> +#endif /* _UAPI_LINUX_RESTRICTEDMEM_H */ >> diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c >> index c5d869d8c2d8..f7b62364a31a 100644 >> --- a/mm/restrictedmem.c >> +++ b/mm/restrictedmem.c >> @@ -1,11 +1,12 @@ >> // SPDX-License-Identifier: GPL-2.0 >> -#include "linux/sbitmap.h" > Looks like an unrelated change? Will remove this in the next revision. >> +#include >> #include >> #include >> #include >> #include >> #include >> #include >> +#include >> #include >> struct restrictedmem { >> @@ -189,19 +190,20 @@ static struct file >> *restrictedmem_file_create(struct file *memfd) >> return file; >> } >> -SYSCALL_DEFINE1(memfd_restricted, unsigned int, flags) >> +static int restrictedmem_create(struct vfsmount *mount) >> { >> struct file *file, *restricted_file; >> int fd, err; >> - if (flags) >> - return -EINVAL; >> - >> fd = get_unused_fd_flags(0); >> if (fd < 0) >> return fd; >> - file = shmem_file_setup("memfd:restrictedmem", 0, VM_NORESERVE); >> + if (mount) >> + file = shmem_file_setup_with_mnt(mount, "memfd:restrictedmem", 0, >> VM_NORESERVE); >> + else >> + file = shmem_file_setup("memfd:restrictedmem", 0, VM_NORESERVE); >> + >> if (IS_ERR(file)) { >> err = PTR_ERR(file); >> goto err_fd; >> @@ -223,6 +225,66 @@ SYSCALL_DEFINE1(memfd_restricted, unsigned int, >> flags) >> return err; >> } >> +static bool is_shmem_mount(struct vfsmount *mnt) >> +{ >> + return mnt && mnt->mnt_sb && mnt->mnt_sb->s_magic == TMPFS_MAGIC; >> +} >> + >> +static bool is_mount_root(struct file *file) >> +{ >> + return file->f_path.dentry == file->f_path.mnt->mnt_root; >> +} > I'd inline at least that function, pretty self-explaining. Will inline this in the next revision. >> + >> +static int restrictedmem_create_on_user_mount(int mount_fd) >> +{ >> + int ret; >> + struct fd f; >> + struct vfsmount *mnt; >> + >> + f = fdget_raw(mount_fd); >> + if (!f.file) >> + return -EBADF; >> + >> + ret = -EINVAL; >> + if (!is_mount_root(f.file)) >> + goto out; >> + >> + mnt = f.file->f_path.mnt; >> + if (!is_shmem_mount(mnt)) >> + goto out; >> + >> + ret = file_permission(f.file, MAY_WRITE | MAY_EXEC); >> + if (ret) >> + goto out; >> + >> + ret = mnt_want_write(mnt); >> + if (unlikely(ret)) >> + goto out; >> + >> + ret = restrictedmem_create(mnt); >> + >> + mnt_drop_write(mnt); >> +out: >> + fdput(f); >> + >> + return ret; >> +} >> + >> +SYSCALL_DEFINE2(memfd_restricted, unsigned int, flags, int, mount_fd) >> +{ >> + if (flags & ~RMFD_USERMNT) >> + return -EINVAL; >> + >> + if (flags == RMFD_USERMNT) { >> + if (mount_fd < 0) >> + return -EINVAL; >> + >> + return restrictedmem_create_on_user_mount(mount_fd); >> + } else { >> + return restrictedmem_create(NULL); >> + } > You can drop the else case: > if (flags == RMFD_USERMNT) { > ... > return restrictedmem_create_on_user_mount(mount_fd); > } > return restrictedmem_create(NULL); I'll be refactoring this to adopt Kirill's suggestion of using a single restrictedmem_create(mnt) call. > I do wonder if you want to properly check for a flag instead of > comparing values. Results in a more natural way to deal with flags: > if (flags & RMFD_USERMNT) { > } Will use this in the next revision. >> +} >> + >> int restrictedmem_bind(struct file *file, pgoff_t start, pgoff_t end, >> struct restrictedmem_notifier *notifier, bool exclusive) >> { > The "memfd_restricted" vs. "restrictedmem" terminology is a bit > unfortunate, but not your fault here. > I'm not a FS person, but it does look good to me.