Date: Thu, 18 Feb 2021 16:48:23 -0800
In-Reply-To: <20210219004824.2899045-1-axelrasmussen@google.com>
Message-Id: <20210219004824.2899045-6-axelrasmussen@google.com>
References: <20210219004824.2899045-1-axelrasmussen@google.com>
Subject: [PATCH v7 5/6] userfaultfd: update documentation to describe minor fault handling
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton, Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying, Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra, "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný, Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin, Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price, Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews, "Dr .
David Alan Gilbert", David Rientjes, Mina Almasry, Oliver Upton

Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to
intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE
are used for MISSING faults, whereas CONTINUE is used for MINOR faults.

Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 107 ++++++++++++-------
 1 file changed, 66 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 65eefa66c0ba..3aa38e8b8361 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,36 +63,36 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+  areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
+  resolved by either providing a new page (``UFFDIO_COPY``), or mapping
+  the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
+  the zero page for a missing fault. With userfaultfd, userspace can
+  decide what content to provide before the faulting thread continues.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
+  the page cache). Userspace has the option of modifying the page's
+  contents before resolving the fault. Once the contents are correct
+  (modified or not), userspace asks the kernel to map the page and let the
+  faulting thread continue with ``UFFDIO_CONTINUE``.
 
 Notes:
 
-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.
 
 - None of the page-delivering ioctls default to the range that you
   registered with. You must fill in all fields for the appropriate
@@ -122,9 +147,9 @@ Notes:
 
 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd. You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.
 
 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
-- 
2.30.0.617.g56c4b15f3c-goog
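
For readers of the patch, the fault-type dispatch that the new documentation text describes can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the series: the names ``pick_resolution``, ``page_align_down``, ``resolve_minor_fault`` and the ``resolve_action`` enum are hypothetical, the caller is assumed to know the (huge) page size of the registered range, and the ``UFFD_PAGEFAULT_FLAG_MINOR`` fallback definition is only there so the sketch builds against headers that predate minor fault support.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Assumption: headers older than this series lack the flag; the fallback
 * value mirrors the upstream definition and should be checked locally. */
#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

/* Hypothetical names, not part of the kernel API. */
enum resolve_action {
    RESOLVE_COPY_OR_ZEROPAGE, /* MISSING fault: supply a page */
    RESOLVE_CONTINUE,         /* MINOR fault: map the existing page */
};

/* Choose an ioctl family from uffd_msg.arg.pagefault.flags. */
enum resolve_action pick_resolution(uint64_t pagefault_flags)
{
    if (pagefault_flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return RESOLVE_CONTINUE;
    return RESOLVE_COPY_OR_ZEROPAGE;
}

/* Round a faulting address down to the start of its page. */
uint64_t page_align_down(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

#ifdef UFFDIO_CONTINUE
/* Resolve a minor fault on one page; returns the ioctl() result. */
int resolve_minor_fault(int uffd, uint64_t fault_addr, uint64_t page_size)
{
    struct uffdio_continue cont = {
        .range = {
            .start = page_align_down(fault_addr, page_size),
            .len = page_size,
        },
        .mode = 0, /* 0 wakes the faulting thread; DONTWAKE defers it */
    };

    return ioctl(uffd, UFFDIO_CONTINUE, &cont);
}
#endif
```

A monitor thread would read a ``struct uffd_msg`` from the uffd, pass ``msg.arg.pagefault.flags`` to ``pick_resolution()``, and then either call something like ``resolve_minor_fault()`` or fall back to ``UFFDIO_COPY`` / ``UFFDIO_ZEROPAGE``. As the Notes say, the range must be filled in explicitly and every ioctl return value should be checked.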