2021-05-03 18:16:31

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v6 00/10] userfaultfd: add minor fault handling for shmem

Base
====

This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc8-mmots-2021-04-21-23-08", with the following applied first:

1. Peter's selftest cleanup series:
https://lore.kernel.org/patchwork/cover/1412450/

2. My patch to fix a pre-existing BUG_ON in an edge case:
https://lore.kernel.org/patchwork/patch/1419758/

Changelog
=========

v5->v6:
- Picked up {Reviewed,Acked}-by's.
- Rebased onto v5.12-rc8-mmots-2021-04-21-23-08.
- Put mistakenly removed delete_from_page_cache() back in the error path in
shmem_mfill_atomic_pte(). [Hugh]
- Keep shmem_mfill_atomic_pte() naming, instead of shmem_mcopy_... Likewise,
rename our new helper to mfill_atomic_install_pte(). [Hugh]
- Return directly instead of "goto out" in shmem_mfill_atomic_pte(), saving a
couple of lines. [Peter]

v4->v5:
- Picked up {Reviewed,Acked}-by's.
- Fix cleanup in error path in shmem_mcopy_atomic_pte(). [Hugh, Peter]
- Mention switching to lru_cache_add() in the commit message of 9/10. [Hugh]
- Split + reorder commits, so now we 1) implement the faulting path, 2)
implement the CONTINUE ioctl, and 3) advertise the feature. Squash the
documentation update into step (3). [Hugh, Peter]
- Reorder install_pte() cleanup to come before selftest changes. [Hugh]

v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
assumed that !vma_is_anonymous() was equivalent to "the page will be in the
page cache". But, in this case we have an optimization where we allocate a new
*anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]

v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]

v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]

Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to lookup the existing page in
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]

Overview
========

See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.

This series is structured as follows:

- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commit 5 advertises that the feature is now available since at this point it's
fully implemented.
- Commit 6 is a final cleanup, modifying an existing code path to re-use a new
helper we've introduced.
- Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.

Additionally, Android folks (Lokesh Gidra <[email protected]>) hope to
optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."

[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/[email protected]/T/#t

Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: advertise shmem minor fault support
userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support

Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 2 +-
include/linux/shmem_fs.h | 19 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 120 +++-----
mm/userfaultfd.c | 175 ++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 364 insertions(+), 256 deletions(-)

--
2.31.1.527.g47e6f16901-goog


2021-05-03 18:16:40

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v6 04/10] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

With this change, userspace can resolve a minor fault within a
shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
match those for hugetlbfs - we look up the existing page in the page
cache, and install a PTE for it.

This commit introduces a new helper: mfill_atomic_install_pte.

Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
shmem.c? The existing userfault implementation only relies on shmem.c
for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
shmem in one place, regardless of shared/private (to reduce code
duplication).

Why add a new mfill_atomic_install_pte helper? A problem we have with
continue is that shmem_mfill_atomic_pte() and mcopy_atomic_pte() are
*close* to what we want, but not exactly. We do want to setup the PTEs
in a CONTINUE operation, but we don't want to e.g. allocate a new page,
charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
we have the problem stated above: shmem_mfill_atomic_pte() and
mcopy_atomic_pte() both handle one-half of the problem (shared /
private) continue cares about. So, introduce mcontinue_atomic_pte(), to
handle all of the shmem continue cases. Introduce the helper so it
doesn't duplicate code with mcopy_atomic_pte().

In a future commit, shmem_mfill_atomic_pte() will also be modified to
use this new helper. However, since this is a bigger refactor, it seems
most clear to do it as a separate change.

Acked-by: Hugh Dickins <[email protected]>
Acked-by: Peter Xu <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
mm/userfaultfd.c | 172 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 127 insertions(+), 45 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 242b7a04c16d..d1ac73a0d2a9 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -48,6 +48,83 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
return dst_vma;
}

+/*
+ * Install PTEs, to map dst_addr (within dst_vma) to page.
+ *
+ * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),
+ * whether or not dst_vma is VM_SHARED. It also handles the more general
+ * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file
+ * backed, or not).
+ *
+ * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
+ * shmem_mcopy_atomic_pte instead.
+ */
+static int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr, struct page *page,
+ bool newly_allocated, bool wp_copy)
+{
+ int ret;
+ pte_t _dst_pte, *dst_pte;
+ bool writable = dst_vma->vm_flags & VM_WRITE;
+ bool vm_shared = dst_vma->vm_flags & VM_SHARED;
+ bool page_in_cache = page->mapping;
+ spinlock_t *ptl;
+ struct inode *inode;
+ pgoff_t offset, max_off;
+
+ _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
+ if (page_in_cache && !vm_shared)
+ writable = false;
+ if (writable || !page_in_cache)
+ _dst_pte = pte_mkdirty(_dst_pte);
+ if (writable) {
+ if (wp_copy)
+ _dst_pte = pte_mkuffd_wp(_dst_pte);
+ else
+ _dst_pte = pte_mkwrite(_dst_pte);
+ }
+
+ dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+
+ if (vma_is_shmem(dst_vma)) {
+ /* serialize against truncate with the page table lock */
+ inode = dst_vma->vm_file->f_inode;
+ offset = linear_page_index(dst_vma, dst_addr);
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ ret = -EFAULT;
+ if (unlikely(offset >= max_off))
+ goto out_unlock;
+ }
+
+ ret = -EEXIST;
+ if (!pte_none(*dst_pte))
+ goto out_unlock;
+
+ if (page_in_cache)
+ page_add_file_rmap(page, false);
+ else
+ page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
+
+ /*
+ * Must happen after rmap, as mm_counter() checks mapping (via
+ * PageAnon()), which is set by __page_set_anon_rmap().
+ */
+ inc_mm_counter(dst_mm, mm_counter(page));
+
+ if (newly_allocated)
+ lru_cache_add_inactive_or_unevictable(page, dst_vma);
+
+ set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+
+ /* No need to invalidate - it was non-present before */
+ update_mmu_cache(dst_vma, dst_addr, dst_pte);
+ ret = 0;
+out_unlock:
+ pte_unmap_unlock(dst_pte, ptl);
+ return ret;
+}
+
static int mcopy_atomic_pte(struct mm_struct *dst_mm,
pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
@@ -56,13 +133,9 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
struct page **pagep,
bool wp_copy)
{
- pte_t _dst_pte, *dst_pte;
- spinlock_t *ptl;
void *page_kaddr;
int ret;
struct page *page;
- pgoff_t offset, max_off;
- struct inode *inode;

if (!*pagep) {
ret = -ENOMEM;
@@ -99,43 +172,12 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL))
goto out_release;

- _dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
- if (dst_vma->vm_flags & VM_WRITE) {
- if (wp_copy)
- _dst_pte = pte_mkuffd_wp(_dst_pte);
- else
- _dst_pte = pte_mkwrite(_dst_pte);
- }
-
- dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
- if (dst_vma->vm_file) {
- /* the shmem MAP_PRIVATE case requires checking the i_size */
- inode = dst_vma->vm_file->f_inode;
- offset = linear_page_index(dst_vma, dst_addr);
- max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- ret = -EFAULT;
- if (unlikely(offset >= max_off))
- goto out_release_uncharge_unlock;
- }
- ret = -EEXIST;
- if (!pte_none(*dst_pte))
- goto out_release_uncharge_unlock;
-
- inc_mm_counter(dst_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
- lru_cache_add_inactive_or_unevictable(page, dst_vma);
-
- set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
-
- /* No need to invalidate - it was non-present before */
- update_mmu_cache(dst_vma, dst_addr, dst_pte);
-
- pte_unmap_unlock(dst_pte, ptl);
- ret = 0;
+ ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ page, true, wp_copy);
+ if (ret)
+ goto out_release;
out:
return ret;
-out_release_uncharge_unlock:
- pte_unmap_unlock(dst_pte, ptl);
out_release:
put_page(page);
goto out;
@@ -176,6 +218,41 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
return ret;
}

+/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
+static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
+ pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ bool wp_copy)
+{
+ struct inode *inode = file_inode(dst_vma->vm_file);
+ pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
+ struct page *page;
+ int ret;
+
+ ret = shmem_getpage(inode, pgoff, &page, SGP_READ);
+ if (ret)
+ goto out;
+ if (!page) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ page, false, wp_copy);
+ if (ret)
+ goto out_release;
+
+ unlock_page(page);
+ ret = 0;
+out:
+ return ret;
+out_release:
+ unlock_page(page);
+ put_page(page);
+ goto out;
+}
+
static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
{
pgd_t *pgd;
@@ -415,11 +492,16 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
unsigned long dst_addr,
unsigned long src_addr,
struct page **page,
- bool zeropage,
+ enum mcopy_atomic_mode mode,
bool wp_copy)
{
ssize_t err;

+ if (mode == MCOPY_ATOMIC_CONTINUE) {
+ return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+ wp_copy);
+ }
+
/*
* The normal page fault path for a shmem will invoke the
* fault, fill the hole in the file and COW it right away. The
@@ -431,7 +513,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
* and not in the radix tree.
*/
if (!(dst_vma->vm_flags & VM_SHARED)) {
- if (!zeropage)
+ if (mode == MCOPY_ATOMIC_NORMAL)
err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
dst_addr, src_addr, page,
wp_copy);
@@ -441,7 +523,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
} else {
VM_WARN_ON_ONCE(wp_copy);
err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
- dst_addr, src_addr, zeropage,
+ dst_addr, src_addr,
+ mode != MCOPY_ATOMIC_NORMAL,
page);
}

@@ -463,7 +546,6 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
long copied;
struct page *page;
bool wp_copy;
- bool zeropage = (mcopy_mode == MCOPY_ATOMIC_ZEROPAGE);

/*
* Sanitize the command parameters:
@@ -526,7 +608,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,

if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
- if (mcopy_mode == MCOPY_ATOMIC_CONTINUE)
+ if (!vma_is_shmem(dst_vma) && mcopy_mode == MCOPY_ATOMIC_CONTINUE)
goto out_unlock;

/*
@@ -574,7 +656,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
BUG_ON(pmd_trans_huge(*dst_pmd));

err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
- src_addr, &page, zeropage, wp_copy);
+ src_addr, &page, mcopy_mode, wp_copy);
cond_resched();

if (unlikely(err == -ENOENT)) {
--
2.31.1.527.g47e6f16901-goog

2021-05-03 18:16:44

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v6 07/10] userfaultfd/selftests: use memfd_create for shmem test type

This is a preparatory commit. In the future, we want to be able to setup
alias mappings for area_src and area_dst in the shmem test, like we do
in the hugetlb_shared test. With a VMA obtained via
mmap(MAP_ANONYMOUS | MAP_SHARED), it isn't clear how to do this.

So, mmap() with an fd, so we can create alias mappings. Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since
it's more convenient / simpler to run, and works just as well.

Future commits will:

1. Setup the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
userfaultfd behavior being introduced in this series.

Also, a small fix in the area we're changing: when the hugetlb setup
fails in main(), pass in the right argv[] so we actually print out the
hugetlb file path.

Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
tools/testing/selftests/vm/userfaultfd.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 6339aeaeeff8..fc40831f818f 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -85,6 +85,7 @@ static bool test_uffdio_wp = false;
static bool test_uffdio_minor = false;

static bool map_shared;
+static int shm_fd;
static int huge_fd;
static char *huge_fd_off0;
static unsigned long long *count_verify;
@@ -277,8 +278,11 @@ static void shmem_release_pages(char *rel_area)

static void shmem_allocate_area(void **alloc_area)
{
+ unsigned long offset =
+ alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
- MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+ MAP_SHARED, shm_fd, offset);
if (*alloc_area == MAP_FAILED)
err("mmap of memfd failed");
}
@@ -1448,6 +1452,16 @@ int main(int argc, char **argv)
err("Open of %s failed", argv[4]);
if (ftruncate(huge_fd, 0))
err("ftruncate %s to size 0 failed", argv[4]);
+ } else if (test_type == TEST_SHMEM) {
+ shm_fd = memfd_create(argv[0], 0);
+ if (shm_fd < 0)
+ err("memfd_create");
+ if (ftruncate(shm_fd, nr_pages * page_size * 2))
+ err("ftruncate");
+ if (fallocate(shm_fd,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
+ nr_pages * page_size * 2))
+ err("fallocate");
}
printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
nr_pages, nr_pages_per_cpu);
--
2.31.1.527.g47e6f16901-goog

2021-05-03 18:16:45

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v6 08/10] userfaultfd/selftests: create alias mappings in the shmem test

Previously, we just allocated two shm areas: area_src and area_dst. With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.

area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs. In a future commit in this
series, we'll leverage this setup to exercise minor fault handling
support for shmem, just like we do in the hugetlb_shared test.

Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
tools/testing/selftests/vm/userfaultfd.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index fc40831f818f..1f65c4ab7994 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -278,13 +278,29 @@ static void shmem_release_pages(char *rel_area)

static void shmem_allocate_area(void **alloc_area)
{
- unsigned long offset =
- alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+ void *area_alias = NULL;
+ bool is_src = alloc_area == (void **)&area_src;
+ unsigned long offset = is_src ? 0 : nr_pages * page_size;

*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
MAP_SHARED, shm_fd, offset);
if (*alloc_area == MAP_FAILED)
err("mmap of memfd failed");
+
+ area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, shm_fd, offset);
+ if (area_alias == MAP_FAILED)
+ err("mmap of memfd alias failed");
+
+ if (is_src)
+ area_src_alias = area_alias;
+ else
+ area_dst_alias = area_alias;
+}
+
+static void shmem_alias_mapping(__u64 *start, size_t len, unsigned long offset)
+{
+ *start = (unsigned long)area_dst_alias + offset;
}

struct uffd_test_ops {
@@ -314,7 +330,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
.expected_ioctls = SHMEM_EXPECTED_IOCTLS,
.allocate_area = shmem_allocate_area,
.release_pages = shmem_release_pages,
- .alias_mapping = noop_alias_mapping,
+ .alias_mapping = shmem_alias_mapping,
};

static struct uffd_test_ops hugetlb_uffd_test_ops = {
--
2.31.1.527.g47e6f16901-goog

2021-05-03 18:18:01

by Axel Rasmussen

[permalink] [raw]
Subject: [PATCH v6 03/10] userfaultfd/shmem: support minor fault registration for shmem

This patch allows shmem-backed VMAs to be registered for minor faults.
Minor faults are appropriately relayed to userspace in the fault path,
for VMAs with the relevant flag.

This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
minor faults, though, so userspace doesn't yet have a way to resolve
such faults.

Because of this, we also don't yet advertise this as a supported
feature. That will be done in a separate commit when the feature is
fully implemented.

Acked-by: Peter Xu <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Axel Rasmussen <[email protected]>
---
fs/userfaultfd.c | 3 +--
mm/memory.c | 8 +++++---
mm/shmem.c | 12 +++++++++++-
3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 14f92285d04f..468556fb04a9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
}

if (vm_flags & VM_UFFD_MINOR) {
- /* FIXME: Add minor fault interception for shmem. */
- if (!is_vm_hugetlb_page(vma))
+ if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
return false;
}

diff --git a/mm/memory.c b/mm/memory.c
index 86ba6c1f6821..9a536cfde7c8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3972,9 +3972,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
* something).
*/
if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
- ret = do_fault_around(vmf);
- if (ret)
- return ret;
+ if (likely(!userfaultfd_minor(vmf->vma))) {
+ ret = do_fault_around(vmf);
+ if (ret)
+ return ret;
+ }
}

ret = __do_fault(vmf);
diff --git a/mm/shmem.c b/mm/shmem.c
index 04de845b50b3..e361f1d81c8d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1785,7 +1785,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
* vm. If we swap it in we mark it dirty since we also free the swap
* entry since a page cannot live in both the swap and page cache.
*
- * vmf and fault_type are only supplied by shmem_fault:
+ * vma, vmf, and fault_type are only supplied by shmem_fault:
* otherwise they are NULL.
*/
static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
@@ -1820,6 +1820,16 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,

page = pagecache_get_page(mapping, index,
FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
+
+ if (page && vma && userfaultfd_minor(vma)) {
+ if (!xa_is_value(page)) {
+ unlock_page(page);
+ put_page(page);
+ }
+ *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+ return 0;
+ }
+
if (xa_is_value(page)) {
error = shmem_swapin_page(inode, index, &page,
sgp, gfp, vma, fault_type);
--
2.31.1.527.g47e6f16901-goog