References: <20220601210951.3916598-1-axelrasmussen@google.com> <20220601210951.3916598-3-axelrasmussen@google.com> <20220613145540.1c9f7750092911bae1332b92@linux-foundation.org>
From: Axel Rasmussen
Date: Tue, 14 Jun 2022 17:55:55 -0700
Subject: Re: [PATCH v3 2/6] userfaultfd: add /dev/userfaultfd for fine grained access control
To: Nadav Amit
Cc: Peter Xu, Andrew Morton, Alexander Viro, Charan Teja Reddy, Dave Hansen, "Dmitry V .
Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi, "linux-doc@vger.kernel.org", linux-fsdevel, LKML, Linux MM, Linuxkselftest
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 13, 2022 at 5:10 PM Nadav Amit wrote:
>
> On Jun 13, 2022, at 3:38 PM, Axel Rasmussen wrote:
>
> > On Mon, Jun 13, 2022 at 3:29 PM Peter Xu wrote:
> >> On Mon, Jun 13, 2022 at 02:55:40PM -0700, Andrew Morton wrote:
> >>> On Wed, 1 Jun 2022 14:09:47 -0700 Axel Rasmussen wrote:
> >>>
> >>>> To achieve this, add a /dev/userfaultfd misc device. This device
> >>>> provides an alternative to the userfaultfd(2) syscall for the creation
> >>>> of new userfaultfds. The idea is, any userfaultfds created this way will
> >>>> be able to handle kernel faults, without the caller having any special
> >>>> capabilities. Access to this mechanism is instead restricted using e.g.
> >>>> standard filesystem permissions.
> >>>
> >>> The use of a /dev node isn't pretty. Why can't this be done by
> >>> tweaking sys_userfaultfd() or by adding a sys_userfaultfd2()?
> >
> > I think for any approach involving syscalls, we need to be able to
> > control access to who can call a syscall. Maybe there's another way
> > I'm not aware of, but I think today the only mechanism to do this is
> > capabilities. I proposed adding a CAP_USERFAULTFD for this purpose,
> > but that approach was rejected [1]. So, I'm not sure of another way
> > besides using a device node.
> >
> > One thing that could potentially make this cleaner is, as one LWN
> > commenter pointed out, we could have open() on /dev/userfaultfd just
> > return a new userfaultfd directly, instead of this multi-step process
> > of open /dev/userfaultfd, NEW ioctl, then you get a userfaultfd. When
> > I wrote this originally it wasn't clear to me how to get that to
> > happen - open() doesn't directly return the result of our custom open
> > function pointer, as far as I can tell - but it could be investigated.
>
> If this direction is pursued, I think that it would be better to set it as
> /proc/[pid]/userfaultfd, which would allow remote monitors (processes) to
> hook into userfaultfd of remote processes. I have a patch for that which
> extends userfaultfd syscall, but /proc/[pid]/userfaultfd may be cleaner.

Hmm, one thing I'm unsure about - If a process is able to control
another process' memory like this, then this seems like exactly what
CAP_SYS_PTRACE is intended to deal with, right? So I'm not sure this
case is directly related to the one I'm trying to address.

This also seems distinct to me versus the existing way you'd do this,
which is open a userfaultfd and register a shared memory region, and
then fork(). Now you can control your child's memory with userfaultfd.
But, attaching to some other, previously-unrelated process with
/proc/[pid]/userfaultfd seems like a clear case for CAP_SYS_PTRACE.

>