References: <20220719195628.3415852-1-axelrasmussen@google.com> <7EF50BE4-84EA-4D57-B58C-6697F1B74904@vmware.com>
From: Axel Rasmussen
Date: Mon, 1 Aug 2022 15:50:55 -0700
Subject: Re: [PATCH v4 0/5] userfaultfd: add /dev/userfaultfd for fine grained access control
To: Nadav Amit
Cc: "Schaufler, Casey", Alexander Viro, Andrew Morton, Dave Hansen,
    "Dmitry V. Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara,
    Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Peter Xu,
    Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi,
    "linux-doc@vger.kernel.org", "linux-fsdevel@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
    "linux-kselftest@vger.kernel.org", Andrea Arcangeli
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 1, 2022 at 12:53 PM Nadav Amit wrote:
>
> On Aug 1, 2022, at 10:13 AM, Axel Rasmussen wrote:
>
> > I finished up some other work and got around to writing a v5 today,
> > but I ran into a problem with /proc/[pid]/userfaultfd.
> >
> > Files in /proc/[pid]/* are owned by the user/group which started the
> > process, and they don't support being chmod'ed.
> >
> > For the userfaultfd device, I think we want the following semantics:
> > - For UFFDs created via the device, we want to always allow handling
> >   kernel mode faults
> > - For security, the device should be owned by root:root by default, so
> >   unprivileged users don't have default access to handle kernel faults
> > - But, the system administrator should be able to chown/chmod it, to
> >   grant access to handling kernel faults for this process more widely.
> >
> > It could be made to work like that but I think it would involve at least:
> >
> > - Special casing userfaultfd in proc_pid_make_inode
> > - Updating setattr/getattr for /proc/[pid] to meaningfully store and
> >   then retrieve uid/gid different from the task's, again probably
> >   special cased for userfaultfd since we don't want this behavior for
> >   other files
> >
> > It seems to me such a change might raise eyebrows among procfs folks.
> > Before I spend the time to write this up, does this seem like
> > something that would obviously be nack'ed?
>
> [ Please avoid top-posting in the future ]

I will remember this. Gmail's default behavior is annoying. :/

>
> I have no interest in making your life harder than it should be. If you
> cannot find a suitable alternative, I will not fight against it.
>
> How about this alternative: how about following KVM usage-model?
>
> IOW: You open /dev/userfaultfd, but this is not the file-descriptor that you
> use for most operations. Instead you first issue an ioctl - similarly to
> KVM_CREATE_VM - to get a file-descriptor for your specific process. You then
> use this new file-descriptor to perform your operations (read/ioctl/etc).
>
> This would make the fact that ioctls/reads from different processes refer to
> different contexts (i.e., file-descriptors) much more natural.
>
> Does it sound better?

Ah, that I think is more or less what my series already proposes, if I
understand you correctly. The usage is:

    fd = open(/dev/userfaultfd) /* This FD is only useful for creating new userfaultfds */
    uffd = ioctl(fd, USERFAULTFD_IOC_NEW) /* Now you get a real uffd */
    close(fd); /* No longer needed now that we have a real uffd */

    /* Use uffd to register, COPY, CONTINUE, whatever */

One thing we could do now or in the future is extend
USERFAULTFD_IOC_NEW to take a pid as an argument, to support creating
uffds for remote processes.

And then we get the benefit of permissions for /dev nodes working very naturally - they default to root, but can be configured by the sysadmin via chown/chmod, or udev rules, or whatever.