Subject: Re: [PATCH v2 4/6] userfaultfd: update documentation to describe /dev/userfaultfd
To: Axel Rasmussen, Alexander Viro, Andrew Morton, Charan Teja Reddy,
 Dave Hansen, "Dmitry V . Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins,
 Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport,
 Nadav Amit, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka,
 zhangyi
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Shuah Khan
References: <20220422212945.2227722-1-axelrasmussen@google.com>
 <20220422212945.2227722-5-axelrasmussen@google.com>
From: Shuah Khan
Date: Tue, 26 Apr 2022 10:46:52 -0600
In-Reply-To: <20220422212945.2227722-5-axelrasmussen@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/22/22 3:29 PM, Axel Rasmussen wrote:
> Explain the different ways to create a new userfaultfd, and how access
> control works for each way.
>
> Signed-off-by: Axel Rasmussen
> ---
>  Documentation/admin-guide/mm/userfaultfd.rst | 38 ++++++++++++++++++--
>  Documentation/admin-guide/sysctl/vm.rst      |  3 ++
>  2 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> index 6528036093e1..4c079b5377d4 100644
> --- a/Documentation/admin-guide/mm/userfaultfd.rst
> +++ b/Documentation/admin-guide/mm/userfaultfd.rst
> @@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
>  Design
>  ======
>
> -Userfaults are delivered and resolved through the ``userfaultfd`` syscall.

Please keep this sentence in there and rephrase it to indicate how it
was done in the past. Also explain here why this new approach is better
than the syscall approach before getting into the details below.

> +Userspace creates a new userfaultfd, initializes it, and registers one or more
> +regions of virtual memory with it. Then, any page faults which occur within the
> +region(s) result in a message being delivered to the userfaultfd, notifying
> +userspace of the fault.
>
>  The ``userfaultfd`` (aside from registering and unregistering virtual
>  memory ranges) provides two primary functionalities:
> @@ -39,7 +42,7 @@ Vmas are not suitable for page- (or hugepage) granular fault tracking
>  when dealing with virtual address spaces that could span
>  Terabytes. Too many vmas would be needed for that.
>
> -The ``userfaultfd`` once opened by invoking the syscall, can also be
> +The ``userfaultfd``, once created, can also be

This sentence is too short and would look odd. Combine the sentences
so it renders well in the generated doc.

>  passed using unix domain sockets to a manager process, so the same
>  manager process could handle the userfaults of a multitude of
>  different processes without them being aware about what is going on
> @@ -50,6 +53,37 @@ is a corner case that would currently return ``-EBUSY``).
>  API
>  ===
>
> +Creating a userfaultfd
> +----------------------
> +
> +There are two mechanisms to create a userfaultfd. There are various ways to
> +restrict this too, since userfaultfds which handle kernel page faults have
> +historically been a useful tool for exploiting the kernel.
> +
> +The first is the userfaultfd(2) syscall. Access to this is controlled in several
> +ways:
> +
> +- By default, the userfaultfd will be able to handle kernel page faults. This
> +  can be disabled by passing in UFFD_USER_MODE_ONLY.
> +
> +- If vm.unprivileged_userfaultfd is 0, then the caller must *either* have
> +  CAP_SYS_PTRACE, or pass in UFFD_USER_MODE_ONLY.
> +
> +- If vm.unprivileged_userfaultfd is 1, then no particular privilege is needed to
> +  use this syscall, even if UFFD_USER_MODE_ONLY is *not* set.
> +
> +Alternatively, userfaultfds can be created by opening /dev/userfaultfd, and
> +issuing a USERFAULTFD_IOC_NEW ioctl to this device. Access to this device is

New ioctl? I thought we are moving away from using ioctls?

> +controlled via normal filesystem permissions (user/group/mode for example) - no
> +additional permission (capability/sysctl) is needed to be able to handle kernel
> +faults this way. This is useful because it allows e.g. a specific user or group
> +to be able to create kernel-fault-handling userfaultfds, without allowing it
> +more broadly, or granting more privileges in addition to that particular ability
> +(CAP_SYS_PTRACE). In other words, it allows permissions to be minimized.
> +
> +Initializing up a userfaultfd
> +------------------------
> +

This will very likely generate a doc warning - extend the dashes to the
entire length of the subtitle.

>  When first opened the ``userfaultfd`` must be enabled invoking the
>  ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
>  a later API version) which will specify the ``read/POLLIN`` protocol
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index f4804ce37c58..8682d5fbc8ea 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -880,6 +880,9 @@ calls without any restrictions.
>
>  The default value is 0.
>
> +An alternative to this sysctl / the userfaultfd(2) syscall is to create
> +userfaultfds via /dev/userfaultfd. See
> +Documentation/admin-guide/mm/userfaultfd.rst.
>
>  user_reserve_kbytes
>  ===================
>

thanks,
-- Shuah
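
For reference, the access rules that the quoted patch text gives for the
userfaultfd(2) syscall path can be sketched as a small decision function.
This is only a toy model of the three bullets above, not kernel code: the
function name and its parameters are hypothetical, it never calls into the
kernel, and it assumes the sysctl takes only the values 0 and 1 as
described in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code) of the userfaultfd(2) access rules
 * described in the quoted documentation. All names are hypothetical.
 *
 *   sysctl_value        value of vm.unprivileged_userfaultfd (0 or 1)
 *   has_cap_sys_ptrace  whether the caller holds CAP_SYS_PTRACE
 *   user_mode_only      whether the caller passed UFFD_USER_MODE_ONLY
 */
static inline bool uffd_syscall_allowed(int sysctl_value,
                                        bool has_cap_sys_ptrace,
                                        bool user_mode_only)
{
    /* Assumption: only the two values named in the quoted text occur. */
    assert(sysctl_value == 0 || sysctl_value == 1);

    /* sysctl == 1: no particular privilege is needed. */
    if (sysctl_value == 1)
        return true;

    /* sysctl == 0: need either the capability or UFFD_USER_MODE_ONLY. */
    return has_cap_sys_ptrace || user_mode_only;
}
```

The /dev/userfaultfd path is deliberately left out of this model: per the
quoted text, access there depends only on the filesystem permissions of
the device node, with no capability or sysctl check.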