Date: Mon, 3 May 2021 11:07:27 -0700
Message-Id: <20210503180737.2487560-1-axelrasmussen@google.com>
Subject: [PATCH v6 00/10] userfaultfd: add minor fault handling for shmem
From: Axel Rasmussen
To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
    Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz, Mike Rapoport,
    Peter Xu, Shaohua Li, Shuah Khan, Stephen Rothwell, Wang Qing
Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-mm@kvack.org, Axel Rasmussen, Brian Geffon,
    "Dr . David Alan Gilbert", Mina Almasry, Oliver Upton

Base
====

This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc8-mmots-2021-04-21-23-08", with the following applied first:

1. Peter's selftest cleanup series:
   https://lore.kernel.org/patchwork/cover/1412450/

2. My patch to fix a pre-existing BUG_ON in an edge case:
   https://lore.kernel.org/patchwork/patch/1419758/

Changelog
=========

v5->v6:
- Picked up {Reviewed,Acked}-by's.
- Rebased onto v5.12-rc8-mmots-2021-04-21-23-08.
- Put mistakenly removed delete_from_page_cache() back in the error path in
  shmem_mfill_atomic_pte(). [Hugh]
- Keep shmem_mfill_atomic_pte() naming, instead of shmem_mcopy_... Likewise,
  rename our new helper to mfill_atomic_install_pte(). [Hugh]
- Return directly instead of "goto out" in shmem_mfill_atomic_pte(), saving a
  couple of lines. [Peter]

v4->v5:
- Picked up {Reviewed,Acked}-by's.
- Fix cleanup in error path in shmem_mcopy_atomic_pte(). [Hugh, Peter]
- Mention switching to lru_cache_add() in the commit message of 9/10. [Hugh]
- Split + reorder commits, so now we 1) implement the faulting path, 2)
  implement the CONTINUE ioctl, and 3) advertise the feature. Squash the
  documentation update into step (3). [Hugh, Peter]
- Reorder install_pte() cleanup to come before selftest changes. [Hugh]

v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
  assumed that !vma_is_anonymous() was equivalent to "the page will be in the
  page cache". But, in this case we have an optimization where we allocate a new
  *anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
  page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
  just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
  use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]

v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Clean up context management in selftest (make clear implicit, remove unneeded
  return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]

v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
  swapped in, and just immediately fire the minor fault. Let a future CONTINUE
  deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]

Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
  easier, as we no longer have to sift through deltas undoing what we had done
  before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
  helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
  for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
  of some parameters, simplify labels/gotos, ...). [Hugh, Peter]

Overview
========

See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.

This series is structured as follows:

- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commit 5 advertises that the feature is now available, since at this point
  it's fully implemented.
- Commit 6 is a final cleanup, modifying an existing code path to re-use a new
  helper we've introduced.
- Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.

Additionally, Android folks (Lokesh Gidra) hope to optimize the Android Runtime
garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
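
To make the "shared-mapped at another location" setup concrete, here is a
minimal userspace sketch of the alias-mapping arrangement, assuming a memfd as
the shmem backing. It is not taken from the series or its selftest:
alias_roundtrip() and the memfd name are hypothetical, and the sketch
deliberately makes no userfaultfd calls. It only shows that data written
through one MAP_SHARED mapping lands in the page cache and is visible through
a second mapping of the same file. With the first mapping registered for minor
faults, the first access through it would be reported to userspace rather than
resolved by the kernel, and UFFDIO_CONTINUE would then map the existing page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not kernel or selftest code): write msg through an
 * "alias" mapping of a memfd and read it back through a second, "primary"
 * mapping of the same file. Returns 0 on success, -1 on any failure.
 */
int alias_roundtrip(const char *msg, char *out, size_t outlen)
{
	assert(msg != NULL && out != NULL);

	long page = sysconf(_SC_PAGESIZE);
	size_t len = strlen(msg) + 1;

	if (page <= 0 || len > (size_t)page || len > outlen)
		return -1;

	/* Raw syscall, since older glibc has no memfd_create() wrapper. */
	int fd = (int)syscall(SYS_memfd_create, "uffd-minor-sketch", 0u);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}

	/* Two independent shared mappings of the same shmem page. */
	char *primary = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
			     MAP_SHARED, fd, 0);
	char *alias = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);
	close(fd);

	int ret = -1;
	if (primary != MAP_FAILED && alias != MAP_FAILED) {
		/* Populate the page cache through the alias mapping only. */
		memcpy(alias, msg, len);
		/*
		 * First touch through the primary mapping: the page is in
		 * the page cache but primary has no PTE for it yet. Here the
		 * kernel resolves that silently. Under minor-fault
		 * registration this access is what would be reported.
		 */
		memcpy(out, primary, len);
		ret = 0;
	}

	if (primary != MAP_FAILED)
		munmap(primary, (size_t)page);
	if (alias != MAP_FAILED)
		munmap(alias, (size_t)page);
	return ret;
}
```

Calling alias_roundtrip("hello", buf, sizeof(buf)) with a sufficiently large
buf should return 0 and leave "hello" in buf. A too-small buffer returns -1
before any mapping is created.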

[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t

Axel Rasmussen (10):
  userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
  userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
  userfaultfd/shmem: support minor fault registration for shmem
  userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  userfaultfd/shmem: advertise shmem minor fault support
  userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
  userfaultfd/selftests: use memfd_create for shmem test type
  userfaultfd/selftests: create alias mappings in the shmem test
  userfaultfd/selftests: reinitialize test context in each test
  userfaultfd/selftests: exercise minor fault handling shmem support

 Documentation/admin-guide/mm/userfaultfd.rst |   3 +-
 fs/userfaultfd.c                             |   6 +-
 include/linux/hugetlb.h                      |   2 +-
 include/linux/shmem_fs.h                     |  19 +-
 include/linux/userfaultfd_k.h                |   5 +
 include/uapi/linux/userfaultfd.h             |   7 +-
 mm/hugetlb.c                                 |   1 +
 mm/memory.c                                  |   8 +-
 mm/shmem.c                                   | 120 +++-----
 mm/userfaultfd.c                             | 175 ++++++++----
 tools/testing/selftests/vm/userfaultfd.c     | 274 ++++++++++++-------
 11 files changed, 364 insertions(+), 256 deletions(-)

-- 
2.31.1.527.g47e6f16901-goog