2022-04-05 03:43:12

by Peter Xu

Subject: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

This is v8 of the series to add shmem+hugetlbfs support for userfaultfd
write protection. It is based on v5.17-mmots-2022-03-31-20-40.

I touched up two small details after the rebase, namely:

- Let UFFDIO_REGISTER fail gracefully if CONFIG_PTE_MARKER_UFFD_WP is not
set, in "mm/uffd: Enable write protection for shmem & hugetlbfs".

- Tweaked the patch "mm: Enable PTE markers by default" to make sure
it'll really be auto-enabled on x86_64 by kconfig where appropriate.

During testing of recent versions, I grew another unit test specifically
for uffd-wp (uffd-test [0]; the name is not important.. though). The
current vm/userfaultfd test doesn't verify which message we expect to
receive, so the simple new test can catch errors where e.g. one page was
wr-protected but then wrongly written without the fault-resolving thread
noticing, hence data corruption.

I used to find such issues only with umapsort, and MISSING mode won't
have those data loss issues. But now many of them can also be found with
uffd-test [0]. I plan to port it into the linux repo after this series
lands.

The whole tree can be found here for testing:

https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs

Previous versions:

RFC: https://lore.kernel.org/lkml/[email protected]
v1: https://lore.kernel.org/lkml/[email protected]
v2: https://lore.kernel.org/lkml/[email protected]
v3: https://lore.kernel.org/lkml/[email protected]
v4: https://lore.kernel.org/lkml/[email protected]
v5: https://lore.kernel.org/lkml/[email protected]
v6: https://lore.kernel.org/lkml/[email protected]
v7: https://lore.kernel.org/lkml/[email protected]

Overview
========

Userfaultfd-wp anonymous support was merged two years ago, and quite a
few applications have started to leverage this capability, either to take
snapshots of user-app memory or for fully user-controlled swapping.

This series tries to complete the uffd-wp feature so that it covers all
the RAM-based memory types. So far uffd-wp is the only mode missing
file-backed support; the other modes (uffd-missing & uffd-minor) already
have it.

One major reason to do so is that anonymous pages sometimes cannot
satisfy the needs of applications, and there are growing users of shmem
and hugetlbfs, either for sharing purposes (e.g., sharing guest memory
between the hypervisor process and a device emulation process, or shmem
local live migration for upgrades), or for better TLB hit performance.

All this means that if a uffd-wp app wants to switch to any of these
memory types, it'll stop working. I think it's worthwhile to have the
kernel cover all these aspects.
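
For context, the userspace flow this series enables is the same as the
existing anonymous uffd-wp flow, just on a shmem or hugetlbfs mapping.
A minimal sketch (error handling omitted; "len" and "memfd" are assumed
to come from the app, e.g. via memfd_create()):

  int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
  struct uffdio_api api = { .api = UFFD_API };

  ioctl(uffd, UFFDIO_API, &api);

  void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, memfd, 0);        /* shmem, not anon */

  struct uffdio_register reg = {
      .range = { .start = (unsigned long)mem, .len = len },
      .mode  = UFFDIO_REGISTER_MODE_WP,
  };
  ioctl(uffd, UFFDIO_REGISTER, &reg);

  struct uffdio_writeprotect wp = {
      .range = { .start = (unsigned long)mem, .len = len },
      .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
  };
  ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);         /* arm wr-protection */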

This series chose to protect pages at the pte level, not the page level.

One major reason is safety. I have no idea how we could make it safe if
any uffd-privileged app could wr-protect a page that any other
application can use: it would mean such an app could block any process,
potentially for as long as it wants.

The other reason is that it aligns very well not only with the anonymous
uffd-wp solution, but with uffd as a whole. For example, userfaultfd is
fundamentally implemented on top of VMAs: we set flags on VMAs showing
the status of uffd tracking. A per-page protection solution would cross
that VMA-based foundation line, and it could simply end up too far from
what's called userfaultfd.

PTE markers
===========

The patchset is based on an idea called PTE markers. It was discussed
in one of the mm alignment sessions and proposed starting from v6; this
is the 2nd version using the PTE marker idea.

A PTE marker is a new type of swap entry that is only applicable to
file-backed memory like shmem and hugetlbfs. It's used to persist some
pte-level information even after the original present ptes in the
pgtable have been zapped.

Logically pte markers can store more than uffd-wp information, but so
far only one bit is used, for the uffd-wp purpose. When a pte marker is
installed with the uffd-wp bit set, it means this pte is wr-protected
by uffd.

It solves the problem of, e.g., file-backed memory ptes getting zapped
for whatever reason (thp split, swap-out, ...): we can still keep the
wr-protect information in the ptes. Then when the page fault triggers
again, we'll know this pte is wr-protected, so we can treat it the same
as a normal uffd wr-protected pte.

The extra information is encoded into the swap entry, or the swp_offset
to be explicit, with the swp_type being PTE_MARKER. So far uffd-wp only
uses one bit out of the swap entry; the rest of the swp_offset bits are
still reserved for other purposes.
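
To illustrate, using the helpers added in patch 1 (this is a condensed
restatement of the kernel helpers, not a new API):

  pte_t pte = make_pte_marker(PTE_MARKER_UFFD_WP);
  /* == swp_entry_to_pte(swp_entry(SWP_PTE_MARKER, PTE_MARKER_UFFD_WP)) */

  if (is_pte_marker(pte)) {
          pte_marker marker = pte_marker_get(pte_to_swp_entry(pte));
          /* marker & PTE_MARKER_UFFD_WP: this pte was wr-protected */
  }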

There're two configs to enable/disable PTE markers:

CONFIG_PTE_MARKER
CONFIG_PTE_MARKER_UFFD_WP

We can set !PTE_MARKER to completely disable all the PTE markers, along
with uffd-wp support. I made two configs so we can also enable PTE
markers while disabling file-backed uffd-wp, for other purposes. At the
end of the series I enable CONFIG_PTE_MARKER by default; that patch is
standalone, though, so if anyone worries about having it on by default
we can turn it off again by simply dropping that oneliner patch. So far
I don't see a huge risk in doing so, so I kept that patch.

In most cases, PTE markers should be treated as none ptes. That is
because, unlike most other swap entry types, there's no PFN or block
offset information encoded into a PTE marker, just some well-defined
extra bits showing the status of the pte. These bits should only be
used as extra data when servicing an upcoming page fault; after that we
behave as if it's a none pte.

I did spend a lot of time going over all the pte_none() users this
time. It is indeed a challenge because there are a lot of them, and I
hope I didn't miss any place where pte markers need to be taken care
of. Luckily, they don't need to be considered in many cases, for
example: boot code, arch code (especially non-x86), kernel-only page
handling (e.g. CPA), or device driver code dealing with pure PFN
mappings.

I introduced pte_none_mostly() in this series for the places where we
need to handle pte markers the same as none ptes; the "mostly" is
another way to write "either a none pte or a pte marker".

I didn't replace pte_none() itself to cover pte markers, for the reasons below:

- Only very few pte_none() callers will ever handle pte markers. E.g.,
all the kernel pages require no knowledge of pte markers, so we don't
pollute the major use cases.

- Unconditionally changing pte_none() semantics could confuse people,
because pte_none() has existed for such a long time.

- Unconditionally changing pte_none() semantics could make pte_none()
slower, even in the many cases where pte markers cannot exist.

- There are cases where we'd like to handle pte markers differently
from pte_none(), so a full replacement is impossible anyway. E.g.
khugepaged should still treat pte markers as normal swap ptes rather
than none ptes, because pte markers always need a fault-in to merge the
marker with a valid pte. And the smaps code needs to parse PTE markers,
not none ptes.

Patch Layout
============

Introducing PTE marker and uffd-wp bit in PTE marker:

mm: Introduce PTE_MARKER swap entry
mm: Teach core mm about pte markers
mm: Check against orig_pte for finish_fault()
mm/uffd: PTE_MARKER_UFFD_WP

Adding support for shmem uffd-wp:

mm/shmem: Take care of UFFDIO_COPY_MODE_WP
mm/shmem: Handle uffd-wp special pte in page fault handler
mm/shmem: Persist uffd-wp bit across zapping for file-backed
mm/shmem: Allow uffd wr-protect none pte for file-backed mem
mm/shmem: Allows file-back mem to be uffd wr-protected on thps
mm/shmem: Handle uffd-wp during fork()

Adding support for hugetlbfs uffd-wp:

mm/hugetlb: Introduce huge pte version of uffd-wp helpers
mm/hugetlb: Hook page faults for uffd write protection
mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
mm/hugetlb: Handle UFFDIO_WRITEPROTECT
mm/hugetlb: Handle pte markers in page faults
mm/hugetlb: Allow uffd wr-protect none ptes
mm/hugetlb: Only drop uffd-wp special pte if required
mm/hugetlb: Handle uffd-wp during fork()

Misc handling on the rest mm for uffd-wp file-backed:

mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs

Enabling of uffd-wp on file-backed memory:

mm/uffd: Enable write protection for shmem & hugetlbfs
mm: Enable PTE markers by default
selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

- Compile test on x86_64 and aarch64 on different configs
- Kernel selftests (see the example invocation below)
- uffd-test [0]
- Umapsort [1,2] test for shmem/hugetlb, with swap on/off
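
As an example of the selftest runs above (a sketch; the selftest takes
<test type> <MiB> <bounces>, and the hugetlb runs assume hugepages are
reserved and, for hugetlb_shared, that a hugetlbfs file path is given):

  cd tools/testing/selftests/vm
  make userfaultfd
  ./userfaultfd shmem 128 32
  ./userfaultfd hugetlb_shared 256 32 /dev/hugepages/uffd-test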

Please review, thanks.

[0] https://github.com/xzpeter/clibs/tree/master/uffd-test
[1] https://github.com/xzpeter/umap-apps/tree/peter
[2] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (23):
mm: Introduce PTE_MARKER swap entry
mm: Teach core mm about pte markers
mm: Check against orig_pte for finish_fault()
mm/uffd: PTE_MARKER_UFFD_WP
mm/shmem: Take care of UFFDIO_COPY_MODE_WP
mm/shmem: Handle uffd-wp special pte in page fault handler
mm/shmem: Persist uffd-wp bit across zapping for file-backed
mm/shmem: Allow uffd wr-protect none pte for file-backed mem
mm/shmem: Allows file-back mem to be uffd wr-protected on thps
mm/shmem: Handle uffd-wp during fork()
mm/hugetlb: Introduce huge pte version of uffd-wp helpers
mm/hugetlb: Hook page faults for uffd write protection
mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
mm/hugetlb: Handle UFFDIO_WRITEPROTECT
mm/hugetlb: Handle pte markers in page faults
mm/hugetlb: Allow uffd wr-protect none ptes
mm/hugetlb: Only drop uffd-wp special pte if required
mm/hugetlb: Handle uffd-wp during fork()
mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
mm/uffd: Enable write protection for shmem & hugetlbfs
mm: Enable PTE markers by default
selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

arch/s390/include/asm/hugetlb.h | 15 ++
fs/hugetlbfs/inode.c | 15 +-
fs/proc/task_mmu.c | 7 +
fs/userfaultfd.c | 31 ++--
include/asm-generic/hugetlb.h | 24 +++
include/linux/hugetlb.h | 27 ++--
include/linux/mm.h | 10 ++
include/linux/mm_inline.h | 43 +++++
include/linux/shmem_fs.h | 4 +-
include/linux/swap.h | 15 +-
include/linux/swapops.h | 79 +++++++++
include/linux/userfaultfd_k.h | 80 +++++++++
include/uapi/linux/userfaultfd.h | 10 +-
mm/Kconfig | 17 ++
mm/filemap.c | 5 +
mm/hmm.c | 2 +-
mm/hugetlb.c | 183 ++++++++++++++++-----
mm/khugepaged.c | 14 +-
mm/memcontrol.c | 8 +-
mm/memory.c | 196 ++++++++++++++++++++---
mm/mincore.c | 3 +-
mm/mprotect.c | 75 ++++++++-
mm/rmap.c | 8 +
mm/shmem.c | 4 +-
mm/userfaultfd.c | 54 +++++--
tools/testing/selftests/vm/userfaultfd.c | 4 +-
26 files changed, 807 insertions(+), 126 deletions(-)

--
2.32.0


2022-04-05 03:43:19

by Peter Xu

Subject: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

This patch introduces a new swap entry type called PTE_MARKER. It can
be installed for any pte that maps file-backed memory when the pte is
temporarily zapped, so as to maintain per-pte information.

The information kept in the pte is called a "marker". Here we define
the marker as "unsigned long" just to match pgoff_t, but it only works
as long as it fits in swp_offset(), which is e.g. currently 58 bits on
x86_64.

A new config CONFIG_PTE_MARKER is introduced too; it's off by default.
A bunch of helpers are defined along with it to service the rest of the
pte marker code.
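
As a quick illustration of how the new helpers compose (a sketch; the
marker bit below is hypothetical, since no real marker bit is defined
until a later patch, so PTE_MARKER_MASK is still 0 here):

  pte_marker marker = BIT(0);                 /* hypothetical bit */
  pte_t pte = make_pte_marker(marker);

  is_pte_marker(pte);                         /* true */
  pte_marker_get(pte_to_swp_entry(pte));      /* marker & PTE_MARKER_MASK */
  pte_none_mostly(pte);                       /* true: treated as none */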

Signed-off-by: Peter Xu <[email protected]>
---
include/asm-generic/hugetlb.h | 9 ++++
include/linux/swap.h | 15 ++++++-
include/linux/swapops.h | 78 +++++++++++++++++++++++++++++++++++
mm/Kconfig | 6 +++
4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 8e1e6244a89d..f39cad20ffc6 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -2,6 +2,9 @@
#ifndef _ASM_GENERIC_HUGETLB_H
#define _ASM_GENERIC_HUGETLB_H

+#include <linux/swap.h>
+#include <linux/swapops.h>
+
static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
{
return mk_pte(page, pgprot);
@@ -80,6 +83,12 @@ static inline int huge_pte_none(pte_t pte)
}
#endif

+/* Please refer to comments above pte_none_mostly() for the usage */
+static inline int huge_pte_none_mostly(pte_t pte)
+{
+ return huge_pte_none(pte) || is_pte_marker(pte);
+}
+
#ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
static inline pte_t huge_pte_wrprotect(pte_t pte)
{
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7daae5a4b3e1..5553189d0215 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,19 @@ static inline int current_is_kswapd(void)
* actions on faults.
*/

+/*
+ * PTE markers are used to persist information onto PTEs that are mapped with
+ * file-backed memories. As its name "PTE" hints, it should only be applied to
+ * the leaves of pgtables.
+ */
+#ifdef CONFIG_PTE_MARKER
+#define SWP_PTE_MARKER_NUM 1
+#define SWP_PTE_MARKER (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+ SWP_MIGRATION_NUM + SWP_DEVICE_NUM)
+#else
+#define SWP_PTE_MARKER_NUM 0
+#endif
+
/*
* Unaddressable device memory support. See include/linux/hmm.h and
* Documentation/vm/hmm.rst. Short description is we need struct pages for
@@ -107,7 +120,7 @@ static inline int current_is_kswapd(void)

#define MAX_SWAPFILES \
((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
- SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
+ SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)

/*
* Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 32d517a28969..7a00627845f0 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -274,6 +274,84 @@ static inline int is_readable_migration_entry(swp_entry_t entry)

#endif

+typedef unsigned long pte_marker;
+
+#define PTE_MARKER_MASK (0)
+
+#ifdef CONFIG_PTE_MARKER
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+ return swp_entry(SWP_PTE_MARKER, marker);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_PTE_MARKER;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+ return swp_offset(entry) & PTE_MARKER_MASK;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+ return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
+}
+
+#else /* CONFIG_PTE_MARKER */
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+ /* This should never be called if !CONFIG_PTE_MARKER */
+ WARN_ON_ONCE(1);
+ return swp_entry(0, 0);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+ return false;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+ return 0;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+ return false;
+}
+
+#endif /* CONFIG_PTE_MARKER */
+
+static inline pte_t make_pte_marker(pte_marker marker)
+{
+ return swp_entry_to_pte(make_pte_marker_entry(marker));
+}
+
+/*
+ * This is a special version to check pte_none() just to cover the case when
+ * the pte is a pte marker. It existed because in many cases the pte marker
+ * should be seen as a none pte; it's just that we have stored some information
+ * onto the none pte so it becomes not-none any more.
+ *
+ * It should be used when the pte is file-backed, ram-based and backing
+ * userspace pages, like shmem. It is not needed upon pgtables that do not
+ * support pte markers at all. For example, it's not needed on anonymous
+ * memory, kernel-only memory (including when the system is during-boot),
+ * non-ram based generic file-system. It's fine to be used even there, but the
+ * extra pte marker check will be pure overhead.
+ *
+ * For systems configured with !CONFIG_PTE_MARKER this will be automatically
+ * optimized to pte_none().
+ */
+static inline int pte_none_mostly(pte_t pte)
+{
+ return pte_none(pte) || is_pte_marker(pte);
+}
+
static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
{
struct page *p = pfn_to_page(swp_offset(entry));
diff --git a/mm/Kconfig b/mm/Kconfig
index 034d87953600..a1688b9314b2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -909,6 +909,12 @@ config ANON_VMA_NAME
area from being merged with adjacent virtual memory areas due to the
difference in their name.

+config PTE_MARKER
+ bool "Marker PTEs support"
+
+ help
+ Allows to create marker PTEs for file-backed memory.
+
source "mm/damon/Kconfig"

endmenu
--
2.32.0

2022-04-05 03:43:20

by Peter Xu

Subject: [PATCH v8 16/23] mm/hugetlb: Allow uffd wr-protect none ptes

Teach the hugetlbfs code to wr-protect none ptes, in case page cache
exists for that pte. Meanwhile we also need to be able to recognize a
uffd-wp marker pte and remove it for uffd_wp_resolve.

While at it, introduce a variable "psize" to replace all references to
the huge page size fetcher.
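
The two new behaviors in hugetlb_change_protection() can be condensed
as follows (a restatement of the diff below, not extra code):

  /* uffd-wp marker pte + resolve: recover it into a none pte */
  if (pte_marker_uffd_wp(pte) && uffd_wp_resolve)
          huge_pte_clear(mm, address, ptep, psize);

  /* none pte + wr-protect: install a marker to trap the next fault */
  if (huge_pte_none(pte) && uffd_wp)
          set_huge_pte_at(mm, address, ptep,
                          make_pte_marker(PTE_MARKER_UFFD_WP));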

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/hugetlb.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9317b790161d..578c48ef931a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6225,7 +6225,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
pte_t *ptep;
pte_t pte;
struct hstate *h = hstate_vma(vma);
- unsigned long pages = 0;
+ unsigned long pages = 0, psize = huge_page_size(h);
bool shared_pmd = false;
struct mmu_notifier_range range;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -6245,13 +6245,19 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,

mmu_notifier_invalidate_range_start(&range);
i_mmap_lock_write(vma->vm_file->f_mapping);
- for (; address < end; address += huge_page_size(h)) {
+ for (; address < end; address += psize) {
spinlock_t *ptl;
- ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ ptep = huge_pte_offset(mm, address, psize);
if (!ptep)
continue;
ptl = huge_pte_lock(h, mm, ptep);
if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+ /*
+ * When uffd-wp is enabled on the vma, unshare
+ * shouldn't happen at all. Warn about it if it
+ * happened due to some reason.
+ */
+ WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
pages++;
spin_unlock(ptl);
shared_pmd = true;
@@ -6281,12 +6287,20 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
else if (uffd_wp_resolve)
newpte = pte_swp_clear_uffd_wp(newpte);
set_huge_swap_pte_at(mm, address, ptep,
- newpte, huge_page_size(h));
+ newpte, psize);
pages++;
}
spin_unlock(ptl);
continue;
}
+ if (unlikely(pte_marker_uffd_wp(pte))) {
+ /*
+ * This is changing a non-present pte into a none pte,
+ * no need for huge_ptep_modify_prot_start/commit().
+ */
+ if (uffd_wp_resolve)
+ huge_pte_clear(mm, address, ptep, psize);
+ }
if (!huge_pte_none(pte)) {
pte_t old_pte;
unsigned int shift = huge_page_shift(hstate_vma(vma));
@@ -6300,6 +6314,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
pte = huge_pte_clear_uffd_wp(pte);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
+ } else {
+ /* None pte */
+ if (unlikely(uffd_wp))
+ /* Safe to modify directly (none->non-present). */
+ set_huge_pte_at(mm, address, ptep,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
}
spin_unlock(ptl);
}
--
2.32.0

2022-04-05 03:43:27

by Peter Xu

Subject: [PATCH v8 13/23] mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP

Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout
the stack. Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is set for the
UFFDIO_COPY.

Hugetlb pages are only managed by hugetlbfs, so we're safe even without
setting the dirty bit in the huge pte if the page is installed
read-only. However we'd better still keep the dirty bit set for a
read-only UFFDIO_COPY pte (when the UFFDIO_COPY_MODE_WP bit is set),
not only to match what we do with shmem, but also because the page does
contain dirty data that the kernel just copied from userspace.
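
From userspace, a write-protected atomic copy onto a registered
hugetlbfs range then looks like this sketch (names are illustrative;
dst/src/len must be aligned to the huge page size):

  struct uffdio_copy copy = {
      .dst  = dst_addr,                  /* huge-page aligned */
      .src  = (unsigned long)src_buf,
      .len  = huge_page_size,            /* e.g. 2MB */
      .mode = UFFDIO_COPY_MODE_WP,
  };
  ioctl(uffd, UFFDIO_COPY, &copy);       /* installs a wr-protected pte */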

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/hugetlb.h | 6 ++++--
mm/hugetlb.c | 29 +++++++++++++++++++++++------
mm/userfaultfd.c | 14 +++++++++-----
3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 53c1b6082a4c..6347298778b6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
unsigned long dst_addr,
unsigned long src_addr,
enum mcopy_atomic_mode mode,
- struct page **pagep);
+ struct page **pagep,
+ bool wp_copy);
#endif /* CONFIG_USERFAULTFD */
bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
@@ -355,7 +356,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
unsigned long dst_addr,
unsigned long src_addr,
enum mcopy_atomic_mode mode,
- struct page **pagep)
+ struct page **pagep,
+ bool wp_copy)
{
BUG();
return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 82df0fcfedf9..c94deead22b2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5795,7 +5795,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
unsigned long dst_addr,
unsigned long src_addr,
enum mcopy_atomic_mode mode,
- struct page **pagep)
+ struct page **pagep,
+ bool wp_copy)
{
bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
struct hstate *h = hstate_vma(dst_vma);
@@ -5925,7 +5926,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out_release_unlock;

ret = -EEXIST;
- if (!huge_pte_none(huge_ptep_get(dst_pte)))
+ /*
+ * We allow to overwrite a pte marker: consider when both MISSING|WP
+ * registered, we firstly wr-protect a none pte which has no page cache
+ * page backing it, then access the page.
+ */
+ if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
goto out_release_unlock;

if (vm_shared) {
@@ -5935,17 +5941,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
}

- /* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
- if (is_continue && !vm_shared)
+ /*
+ * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+ * with wp flag set, don't set pte write bit.
+ */
+ if (wp_copy || (is_continue && !vm_shared))
writable = 0;
else
writable = dst_vma->vm_flags & VM_WRITE;

_dst_pte = make_huge_pte(dst_vma, page, writable);
- if (writable)
- _dst_pte = huge_pte_mkdirty(_dst_pte);
+ /*
+ * Always mark UFFDIO_COPY page dirty; note that this may not be
+ * extremely important for hugetlbfs for now since swapping is not
+ * supported, but we should still be clear in that this page cannot be
+ * thrown away at will, even if write bit not set.
+ */
+ _dst_pte = huge_pte_mkdirty(_dst_pte);
_dst_pte = pte_mkyoung(_dst_pte);

+ if (wp_copy)
+ _dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b1c875b77fbb..da0b3ed2a6b5 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -304,7 +304,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mode)
+ enum mcopy_atomic_mode mode,
+ bool wp_copy)
{
int vm_shared = dst_vma->vm_flags & VM_SHARED;
ssize_t err;
@@ -392,7 +393,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
}

if (mode != MCOPY_ATOMIC_CONTINUE &&
- !huge_pte_none(huge_ptep_get(dst_pte))) {
+ !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
err = -EEXIST;
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
i_mmap_unlock_read(mapping);
@@ -400,7 +401,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
}

err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
- dst_addr, src_addr, mode, &page);
+ dst_addr, src_addr, mode, &page,
+ wp_copy);

mutex_unlock(&hugetlb_fault_mutex_table[hash]);
i_mmap_unlock_read(mapping);
@@ -455,7 +457,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mode);
+ enum mcopy_atomic_mode mode,
+ bool wp_copy);
#endif /* CONFIG_HUGETLB_PAGE */

static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -575,7 +578,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
*/
if (is_vm_hugetlb_page(dst_vma))
return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
- src_start, len, mcopy_mode);
+ src_start, len, mcopy_mode,
+ wp_copy);

if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
--
2.32.0

2022-04-05 03:43:30

by Peter Xu

Subject: [PATCH v8 08/23] mm/shmem: Allow uffd wr-protect none pte for file-backed mem

File-backed memory differs from anonymous memory in that even if the
pte is missing, the data could still reside either in the file or in
the page/swap cache. So when wr-protecting a pte, we need to consider
none ptes too.

We do that by installing the uffd-wp pte markers when necessary. Then
when there's a future write to the pte, the fault handler will take the
special path to first fault in the page read-only, and then report to
the userfaultfd server with the wr-protect message.

On the other hand, when unprotecting a page, it's also possible that
the pte got unmapped but replaced by the special uffd-wp marker. Then
we'll need to be able to recover a uffd-wp pte marker back into a none
pte, so that the next access to the page will fault in correctly as
usual.

Special care needs to be taken throughout the change_protection_range()
process. Since we now allow the user to wr-protect a none pte, we need
to be able to pre-populate the page table entries when we see
(!anonymous && MM_CP_UFFD_WP) requests, otherwise
change_protection_range() will simply skip wherever the pgtable entry
does not exist.

For example, the pgtable can be missing for a whole 2M pmd chunk while
the page cache exists for the whole 2M range. When we want to
wr-protect one 4K page within that 2M pmd range, we need to
pre-populate the pgtable and install the pte marker showing that we
want to get a message and block the thread when the page cache of that
4K page is written to. Without pre-populating the pmd,
change_protection() would simply skip the whole pmd.

Note that this patch only covers the small pages (pte level), not any
of the transparent huge pages yet. That will be done later, and this
patch is also a preparation for it.
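
Seen from userspace, the 2M pmd example above becomes possible with a
plain wr-protect ioctl (a hypothetical snippet; "base" is a shmem
mapping registered with UFFDIO_REGISTER_MODE_WP and never touched):

  struct uffdio_writeprotect wp = {
      .range = { .start = (unsigned long)base + 4096, .len = 4096 },
      .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
  };
  /* Installs a uffd-wp pte marker even though the pte was none; a
   * later write to base+4096 faults in read-only first, and then
   * delivers the wr-protect message. */
  ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);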

Signed-off-by: Peter Xu <[email protected]>
---
mm/mprotect.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 709a6f73b764..bd62d5938c6c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -30,6 +30,7 @@
#include <linux/mm_inline.h>
#include <linux/pgtable.h>
#include <linux/sched/sysctl.h>
+#include <linux/userfaultfd_k.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -188,8 +189,16 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
newpte = pte_swp_mksoft_dirty(newpte);
if (pte_swp_uffd_wp(oldpte))
newpte = pte_swp_mkuffd_wp(newpte);
- } else if (is_pte_marker_entry(entry)) {
- /* Skip it, the same as none pte */
+ } else if (pte_marker_entry_uffd_wp(entry)) {
+ /*
+ * If this is uffd-wp pte marker and we'd like
+ * to unprotect it, drop it; the next page
+ * fault will trigger without uffd trapping.
+ */
+ if (uffd_wp_resolve) {
+ pte_clear(vma->vm_mm, addr, pte);
+ pages++;
+ }
continue;
} else {
newpte = oldpte;
@@ -204,6 +213,20 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
set_pte_at(vma->vm_mm, addr, pte, newpte);
pages++;
}
+ } else {
+ /* It must be a none pte, or what else?.. */
+ WARN_ON_ONCE(!pte_none(oldpte));
+ if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+ /*
+ * For file-backed mem, we need to be able to
+ * wr-protect a none pte, because even if the
+ * pte is none, the page/swap cache could
+ * exist. Do that by installing a marker.
+ */
+ set_pte_at(vma->vm_mm, addr, pte,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
+ pages++;
+ }
}
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
@@ -237,6 +260,39 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
return 0;
}

+/* Return true if we're uffd wr-protecting file-backed memory, or false */
+static inline bool
+uffd_wp_protect_file(struct vm_area_struct *vma, unsigned long cp_flags)
+{
+ return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma);
+}
+
+/*
+ * If wr-protecting the range for file-backed, populate pgtable for the case
+ * when pgtable is empty but page cache exists. When {pte|pmd|...}_alloc()
+ * failed it means no memory, we don't have a better option but stop.
+ */
+#define change_pmd_prepare(vma, pmd, cp_flags) \
+ do { \
+ if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \
+ if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd))) \
+ break; \
+ } \
+ } while (0)
+/*
+ * This is the general pud/p4d/pgd version of change_pmd_prepare(). We need to
+ * have separate change_pmd_prepare() because pte_alloc() returns 0 on success,
+ * while {pmd|pud|p4d}_alloc() returns the valid pointer on success.
+ */
+#define change_prepare(vma, high, low, addr, cp_flags) \
+ do { \
+ if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \
+ low##_t *p = low##_alloc(vma->vm_mm, high, addr); \
+ if (WARN_ON_ONCE(p == NULL)) \
+ break; \
+ } \
+ } while (0)
+
static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
pud_t *pud, unsigned long addr, unsigned long end,
pgprot_t newprot, unsigned long cp_flags)
@@ -255,6 +311,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,

next = pmd_addr_end(addr, end);

+ change_pmd_prepare(vma, pmd, cp_flags);
/*
* Automatic NUMA balancing walks the tables with mmap_lock
* held for read. It's possible a parallel update to occur
@@ -320,6 +377,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma,
pud = pud_offset(p4d, addr);
do {
next = pud_addr_end(addr, end);
+ change_prepare(vma, pud, pmd, addr, cp_flags);
if (pud_none_or_clear_bad(pud))
continue;
pages += change_pmd_range(vma, pud, addr, next, newprot,
@@ -340,6 +398,7 @@ static inline unsigned long change_p4d_range(struct vm_area_struct *vma,
p4d = p4d_offset(pgd, addr);
do {
next = p4d_addr_end(addr, end);
+ change_prepare(vma, p4d, pud, addr, cp_flags);
if (p4d_none_or_clear_bad(p4d))
continue;
pages += change_pud_range(vma, p4d, addr, next, newprot,
@@ -365,6 +424,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
inc_tlb_flush_pending(mm);
do {
next = pgd_addr_end(addr, end);
+ change_prepare(vma, pgd, p4d, addr, cp_flags);
if (pgd_none_or_clear_bad(pgd))
continue;
pages += change_p4d_range(vma, pgd, addr, next, newprot,
--
2.32.0

2022-04-05 03:43:43

by Peter Xu

Subject: [PATCH v8 05/23] mm/shmem: Take care of UFFDIO_COPY_MODE_WP

Pass wp_copy into shmem_mfill_atomic_pte() through the stack, then
apply the UFFD_WP bit properly when the UFFDIO_COPY on shmem is
requested with UFFDIO_COPY_MODE_WP. wp_copy finally lands in
mfill_atomic_install_pte().

Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(), as
mk_pte() could return a writable pte (e.g., when VM_SHARED on a shmem file).
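
One intended use case is registering shmem with MISSING|WP at the same
time, then resolving a missing fault while keeping the page
wr-protected. A sketch (names illustrative):

  struct uffdio_register reg = {
      .range = { .start = (unsigned long)base, .len = len },
      .mode  = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP,
  };
  ioctl(uffd, UFFDIO_REGISTER, &reg);

  /* in the fault handler: fill the page but keep it wr-protected */
  struct uffdio_copy copy = {
      .dst  = fault_addr & ~(page_size - 1),
      .src  = (unsigned long)page_buf,
      .len  = page_size,
      .mode = UFFDIO_COPY_MODE_WP,
  };
  ioctl(uffd, UFFDIO_COPY, &copy);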

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/shmem_fs.h | 4 ++--
mm/shmem.c | 4 ++--
mm/userfaultfd.c | 23 ++++++++++++++++++-----
3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 3e915cc550bc..a68f982f22d1 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -145,11 +145,11 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- bool zeropage,
+ bool zeropage, bool wp_copy,
struct page **pagep);
#else /* !CONFIG_SHMEM */
#define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
- src_addr, zeropage, pagep) ({ BUG(); 0; })
+ src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
#endif /* CONFIG_SHMEM */
#endif /* CONFIG_USERFAULTFD */

diff --git a/mm/shmem.c b/mm/shmem.c
index 7004c7f55716..9efb8a96d75e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2319,7 +2319,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
- bool zeropage,
+ bool zeropage, bool wp_copy,
struct page **pagep)
{
struct inode *inode = file_inode(dst_vma->vm_file);
@@ -2392,7 +2392,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
goto out_release;

ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
- page, true, false);
+ page, true, wp_copy);
if (ret)
goto out_delete_from_cache;

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index dae25d985d15..b1c875b77fbb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -77,10 +77,19 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
* Always mark a PTE as write-protected when needed, regardless of
* VM_WRITE, which the user might change.
*/
- if (wp_copy)
+ if (wp_copy) {
_dst_pte = pte_mkuffd_wp(_dst_pte);
- else if (writable)
+ writable = false;
+ }
+
+ if (writable)
_dst_pte = pte_mkwrite(_dst_pte);
+ else
+ /*
+ * We need this to make sure the write bit is removed, as
+ * mk_pte() could return a pte with the write bit set.
+ */
+ _dst_pte = pte_wrprotect(_dst_pte);

dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);

@@ -95,7 +104,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
}

ret = -EEXIST;
- if (!pte_none(*dst_pte))
+ /*
+ * We allow to overwrite a pte marker: consider when both MISSING|WP
+ * registered, we firstly wr-protect a none pte which has no page cache
+ * page backing it, then access the page.
+ */
+ if (!pte_none_mostly(*dst_pte))
goto out_unlock;

if (page_in_cache) {
@@ -479,11 +493,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
err = mfill_zeropage_pte(dst_mm, dst_pmd,
dst_vma, dst_addr);
} else {
- VM_WARN_ON_ONCE(wp_copy);
err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
dst_addr, src_addr,
mode != MCOPY_ATOMIC_NORMAL,
- page);
+ wp_copy, page);
}

return err;
--
2.32.0

2022-04-05 03:43:54

by Peter Xu

Subject: [PATCH v8 10/23] mm/shmem: Handle uffd-wp during fork()

Normally we skip copying pages during fork() for VM_SHARED shmem, but
we can't skip it anymore if uffd-wp is enabled on the dst vma. This
should only happen when the src uffd has UFFD_FEATURE_EVENT_FORK
enabled on a uffd-wp shmem vma, so that VM_UFFD_WP will be propagated
onto the dst vma too; then we should copy the pgtables with the uffd-wp
bit and pte markers, because this information would otherwise be lost.

Since the condition checks for deciding "whether a vma needs to copy
the pgtable during fork()" will become even more complicated, introduce
a helper vma_needs_copy() for it, so everything will be clearer.
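
For reference, the userspace setup that makes this path relevant looks
like the following sketch (assuming the monitor already holds a uffd
for the parent):

  struct uffdio_api api = {
      .api      = UFFD_API,
      .features = UFFD_FEATURE_EVENT_FORK,
  };
  ioctl(uffd, UFFDIO_API, &api);

  /* after the tracked process fork()s, the monitor reads an event: */
  struct uffd_msg msg;
  read(uffd, &msg, sizeof(msg));
  if (msg.event == UFFD_EVENT_FORK)
          child_uffd = msg.arg.fork.ufd;   /* tracks the child's mm */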

Signed-off-by: Peter Xu <[email protected]>
---
mm/memory.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1144845ff734..8ba1bb196095 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -867,6 +867,14 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
if (try_restore_exclusive_pte(src_pte, src_vma, addr))
return -EBUSY;
return -ENOENT;
+ } else if (is_pte_marker_entry(entry)) {
+ /*
+ * If we're copying this pgtable, it should only be because
+ * dst_vma has uffd-wp enabled; do a sanity check.
+ */
+ WARN_ON_ONCE(!userfaultfd_wp(dst_vma));
+ set_pte_at(dst_mm, addr, dst_pte, pte);
+ return 0;
}
if (!userfaultfd_wp(dst_vma))
pte = pte_swp_clear_uffd_wp(pte);
@@ -1221,6 +1229,38 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
return 0;
}

+/*
+ * Return true if the vma needs to copy the pgtable during this fork(). Return
+ * false when we can speed up fork() by allowing lazy page faults later until
+ * when the child accesses the memory range.
+ */
+bool
+vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
+{
+ /*
+ * Always copy pgtables when dst_vma has uffd-wp enabled even if it's
+ * file-backed (e.g. shmem). Because when uffd-wp is enabled, pgtable
+ * contains uffd-wp protection information, that's something we can't
+ * retrieve from page cache, and skip copying will lose those info.
+ */
+ if (userfaultfd_wp(dst_vma))
+ return true;
+
+ if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP))
+ return true;
+
+ if (src_vma->anon_vma)
+ return true;
+
+ /*
+ * Don't copy ptes where a page fault will fill them correctly. Fork
+ * becomes much lighter when there are big shared or private readonly
+ * mappings. The tradeoff is that copy_page_range is more efficient
+ * than faulting.
+ */
+ return false;
+}
+
int
copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
{
@@ -1234,14 +1274,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
bool is_cow;
int ret;

- /*
- * Don't copy ptes where a page fault will fill them correctly.
- * Fork becomes much lighter when there are big shared or private
- * readonly mappings. The tradeoff is that copy_page_range is more
- * efficient than faulting.
- */
- if (!(src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) &&
- !src_vma->anon_vma)
+ if (!vma_needs_copy(dst_vma, src_vma))
return 0;

if (is_vm_hugetlb_page(src_vma))
--
2.32.0

2022-04-05 03:43:56

by Peter Xu

Subject: [PATCH v8 02/23] mm: Teach core mm about pte markers

This patch still does not use pte markers in any way, but it teaches
the core mm about the pte marker idea.

For example, handle_pte_marker() is introduced; it will parse and
handle all the pte marker faults.

Many of the changes are just comments, so that we know where pte
markers can possibly show up and why no special code is needed for
those cases.

Signed-off-by: Peter Xu <[email protected]>
---
fs/userfaultfd.c | 10 ++++++----
mm/filemap.c | 5 +++++
mm/hmm.c | 2 +-
mm/memcontrol.c | 8 ++++++--
mm/memory.c | 23 +++++++++++++++++++++++
mm/mincore.c | 3 ++-
mm/mprotect.c | 3 +++
7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aa0c47cb0d16..8b4a94f5a238 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -249,9 +249,10 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,

/*
* Lockless access: we're in a wait_event so it's ok if it
- * changes under us.
+ * changes under us. PTE markers should be handled the same as none
+ * ptes here.
*/
- if (huge_pte_none(pte))
+ if (huge_pte_none_mostly(pte))
ret = true;
if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
ret = true;
@@ -330,9 +331,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
pte = pte_offset_map(pmd, address);
/*
* Lockless access: we're in a wait_event so it's ok if it
- * changes under us.
+ * changes under us. PTE markers should be handled the same as none
+ * ptes here.
*/
- if (pte_none(*pte))
+ if (pte_none_mostly(*pte))
ret = true;
if (!pte_write(*pte) && (reason & VM_UFFD_WP))
ret = true;
diff --git a/mm/filemap.c b/mm/filemap.c
index 3a5ffb5587cd..ef77dae8c28d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3382,6 +3382,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
vmf->pte += xas.xa_index - last_pgoff;
last_pgoff = xas.xa_index;

+ /*
+ * NOTE: If there're PTE markers, we'll leave them to be
+ * handled in the specific fault path, and it'll prohibit the
+ * fault-around logic.
+ */
if (!pte_none(*vmf->pte))
goto unlock;

diff --git a/mm/hmm.c b/mm/hmm.c
index af71aac3140e..3fd3242c5e50 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
pte_t pte = *ptep;
uint64_t pfn_req_flags = *hmm_pfn;

- if (pte_none(pte)) {
+ if (pte_none_mostly(pte)) {
required_fault =
hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
if (required_fault)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a08737bac4b..08af97c73f0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5644,10 +5644,14 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,

if (pte_present(ptent))
page = mc_handle_present_pte(vma, addr, ptent);
+ else if (pte_none_mostly(ptent))
+ /*
+ * PTE markers should be treated as a none pte here, separated
+ * from other swap handling below.
+ */
+ page = mc_handle_file_pte(vma, addr, ptent);
else if (is_swap_pte(ptent))
page = mc_handle_swap_pte(vma, ptent, &ent);
- else if (pte_none(ptent))
- page = mc_handle_file_pte(vma, addr, ptent);

if (!page && !ent.val)
return ret;
diff --git a/mm/memory.c b/mm/memory.c
index 2c5d1bb4694f..3f396241a7db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -100,6 +100,8 @@ struct page *mem_map;
EXPORT_SYMBOL(mem_map);
#endif

+static vm_fault_t do_fault(struct vm_fault *vmf);
+
/*
* A number of key systems in x86 including ioremap() rely on the assumption
* that high_memory defines the upper bound on direct map memory, then end
@@ -1415,6 +1417,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (!should_zap_page(details, page))
continue;
rss[mm_counter(page)]--;
+ } else if (is_pte_marker_entry(entry)) {
+ /* By default, simply drop all pte markers when zap */
} else if (is_hwpoison_entry(entry)) {
if (!should_zap_cows(details))
continue;
@@ -3555,6 +3559,23 @@ static inline bool should_try_to_free_swap(struct page *page,
page_count(page) == 2;
}

+static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
+{
+ swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
+ unsigned long marker = pte_marker_get(entry);
+
+ /*
+ * PTE markers should always be with file-backed memories, and the
+ * marker should never be empty. If anything weird happened, the best
+ * thing to do is to kill the process along with its mm.
+ */
+ if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
+ return VM_FAULT_SIGBUS;
+
+ /* TODO: handle pte markers */
+ return 0;
+}
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -3592,6 +3613,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
} else if (is_hwpoison_entry(entry)) {
ret = VM_FAULT_HWPOISON;
+ } else if (is_pte_marker_entry(entry)) {
+ ret = handle_pte_marker(vmf);
} else {
print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
ret = VM_FAULT_SIGBUS;
diff --git a/mm/mincore.c b/mm/mincore.c
index f4f627325e12..fa200c14185f 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -122,7 +122,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
for (; addr != end; ptep++, addr += PAGE_SIZE) {
pte_t pte = *ptep;

- if (pte_none(pte))
+ /* We need to do cache lookup too for pte markers */
+ if (pte_none_mostly(pte))
__mincore_unmapped_range(addr, addr + PAGE_SIZE,
vma, vec);
else if (pte_present(pte))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 56060acdabd3..709a6f73b764 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -188,6 +188,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
newpte = pte_swp_mksoft_dirty(newpte);
if (pte_swp_uffd_wp(oldpte))
newpte = pte_swp_mkuffd_wp(newpte);
+ } else if (is_pte_marker_entry(entry)) {
+ /* Skip it, the same as none pte */
+ continue;
} else {
newpte = oldpte;
}
--
2.32.0

2022-04-05 03:44:00

by Peter Xu

Subject: [PATCH v8 11/23] mm/hugetlb: Introduce huge pte version of uffd-wp helpers

They will be used in the follow-up patches to check/set/clear the
uffd-wp bit of a huge pte.

So far they reuse all the small pte helpers. Archs can override these
versions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in
the future.

Signed-off-by: Peter Xu <[email protected]>
---
arch/s390/include/asm/hugetlb.h | 15 +++++++++++++++
include/asm-generic/hugetlb.h | 15 +++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index bea47e7cc6a0..be99eda87f4d 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -115,6 +115,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
}

+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+ return pte;
+}
+
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+ return pte;
+}
+
+static inline int huge_pte_uffd_wp(pte_t pte)
+{
+ return 0;
+}
+
static inline bool gigantic_page_runtime_supported(void)
{
return true;
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index f39cad20ffc6..896f341f614d 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -35,6 +35,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
}

+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+ return pte_mkuffd_wp(pte);
+}
+
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+ return pte_clear_uffd_wp(pte);
+}
+
+static inline int huge_pte_uffd_wp(pte_t pte)
+{
+ return pte_uffd_wp(pte);
+}
+
#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz)
--
2.32.0

2022-04-05 03:44:02

by Peter Xu

Subject: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

We used to check against the none pte in finish_fault(), with the
assumption that orig_pte is always the none pte.

This change prepares us to be able to call do_fault() on !none ptes.
For example, we should allow that to happen for pte markers, so that we
can restore information out of the pte markers.

Let's change the "pte_none" check into detecting changes since we
fetched orig_pte. One trivial thing to take care of here is that when
pmd==NULL for the pgtable, we may not initialize orig_pte at all in
handle_pte_fault().

By default orig_pte will be all zeros; the problem is that not all
architectures use all-zeros for a none pte. pte_clear() is the right
thing to use here so that we'll always have a valid orig_pte value for
the whole handle_pte_fault() call.

Signed-off-by: Peter Xu <[email protected]>
---
mm/memory.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3f396241a7db..b1af996b09ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4241,7 +4241,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
vmf->address, &vmf->ptl);
ret = 0;
/* Re-check under ptl */
- if (likely(pte_none(*vmf->pte)))
+ if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
do_set_pte(vmf, page, vmf->address);
else
ret = VM_FAULT_NOPAGE;
@@ -4709,6 +4709,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
* concurrent faults and from rmap lookups.
*/
vmf->pte = NULL;
+ /*
+ * Always initialize orig_pte. This matches with below
+ * code to have orig_pte to be the none pte if pte==NULL.
+ * This makes the rest code to be always safe to reference
+ * it, e.g. in finish_fault() we'll detect pte changes.
+ */
+ pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
} else {
/*
* If a huge pmd materialized under us just retry later. Use
--
2.32.0

2022-04-05 03:44:07

by Peter Xu

Subject: [PATCH v8 04/23] mm/uffd: PTE_MARKER_UFFD_WP

This patch introduces the 1st user of pte marker: the uffd-wp marker.

When the pte marker is installed with the uffd-wp bit set, it means this pte
was wr-protected by uffd.

We will use this special pte to arm the ptes that got either unmapped
or swapped out in a file-backed region that was previously
wr-protected. This special pte can trigger a page fault just like swap
entries.

This idea is greatly inspired by Hugh and Andrea in the discussion, which is
referenced in the links below.

Some helpers are introduced to detect whether a swap pte is uffd
wr-protected. With pte markers introduced, a swap pte can be
wr-protected in two forms: either it is a normal swap pte with
_PAGE_SWP_UFFD_WP set, or it's a pte marker that has PTE_MARKER_UFFD_WP
set.
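
In short, the two forms and the helper covering both are (condensed
from the code below):

  pte_swp_uffd_wp(pte)      /* normal swap pte, _PAGE_SWP_UFFD_WP set */
  pte_marker_uffd_wp(pte)   /* pte marker with PTE_MARKER_UFFD_WP set */
  pte_swp_uffd_wp_any(pte)  /* true if either of the two applies */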

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Andrea Arcangeli <[email protected]>
Suggested-by: Hugh Dickins <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
include/linux/swapops.h | 3 ++-
include/linux/userfaultfd_k.h | 43 +++++++++++++++++++++++++++++++++++
mm/Kconfig | 9 ++++++++
3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 7a00627845f0..fffbba0036f6 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -276,7 +276,8 @@ static inline int is_readable_migration_entry(swp_entry_t entry)

typedef unsigned long pte_marker;

-#define PTE_MARKER_MASK (0)
+#define PTE_MARKER_UFFD_WP BIT(0)
+#define PTE_MARKER_MASK (PTE_MARKER_UFFD_WP)

#ifdef CONFIG_PTE_MARKER

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 33cea484d1ad..bd09c3c89b59 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -15,6 +15,8 @@

#include <linux/fcntl.h>
#include <linux/mm.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
#include <asm-generic/pgtable_uffd.h>

/* The set of all possible UFFD-related VM flags. */
@@ -236,4 +238,45 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,

#endif /* CONFIG_USERFAULTFD */

+static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
+{
+ return is_pte_marker_entry(entry) &&
+ (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
+}
+
+static inline bool pte_marker_uffd_wp(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+ swp_entry_t entry;
+
+ if (!is_swap_pte(pte))
+ return false;
+
+ entry = pte_to_swp_entry(pte);
+
+ return pte_marker_entry_uffd_wp(entry);
+#else
+ return false;
+#endif
+}
+
+/*
+ * Returns true if this is a swap pte and was uffd-wp wr-protected in either
+ * forms (pte marker or a normal swap pte), false otherwise.
+ */
+static inline bool pte_swp_uffd_wp_any(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+ if (!is_swap_pte(pte))
+ return false;
+
+ if (pte_swp_uffd_wp(pte))
+ return true;
+
+ if (pte_marker_uffd_wp(pte))
+ return true;
+#endif
+ return false;
+}
+
#endif /* _LINUX_USERFAULTFD_K_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index a1688b9314b2..6e7c2d59fa96 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -915,6 +915,15 @@ config PTE_MARKER
help
Allows to create marker PTEs for file-backed memory.

+config PTE_MARKER_UFFD_WP
+ bool "Marker PTEs support for userfaultfd write protection"
+ depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
+
+ help
+ Allows to create marker PTEs for userfaultfd write protection
+ purposes. It is required to enable userfaultfd write protection on
+ file-backed memory types like shmem and hugetlbfs.
+
source "mm/damon/Kconfig"

endmenu
--
2.32.0

2022-04-05 03:44:09

by Peter Xu

Subject: [PATCH v8 06/23] mm/shmem: Handle uffd-wp special pte in page fault handler

File-backed memory is prone to unmap/swap, so its ptes are always
unstable: the pages can easily be faulted back in later using the page
cache. This could lead to uffd-wp getting lost when unmapping or
swapping out such memory. One example is shmem. PTE markers are needed
to store that information.

This patch prepares for that by handling uffd-wp pte markers in the
page fault paths before the markers are installed elsewhere, so that
the page fault handler can already recognize uffd-wp pte markers.

The handling of a uffd-wp pte marker is similar to a missing fault;
it's just that we handle this "missing fault" when we see the pte
marker, and meanwhile we need to make sure the marker information is
kept while processing the fault.

This is a slow path of uffd-wp handling, because zapping of wr-protected shmem
ptes should be rare. So far it should only trigger in two conditions:

(1) When trying to punch holes in shmem_fallocate(), there is an optimization
to zap the pgtables before evicting the page.

(2) When swapping out shmem pages.

Because of this, the page fault handling is simplified too: we don't
send the wr-protect message on the 1st page fault; instead the page is
installed read-only, so the uffd-wp message will be generated on the
next fault, which triggers the do_wp_page() path of the general uffd-wp
handling.

Disable fault-around for all uffd-wp registered ranges for extra
safety, just like for uffd-minor faults, and clean the code up.
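
From the monitor's perspective the message flow is the usual uffd-wp
one; a sketch of the resolving side (names illustrative, "psize" is the
page size):

  struct uffd_msg msg;
  read(uffd, &msg, sizeof(msg));
  if (msg.event == UFFD_EVENT_PAGEFAULT &&
      (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
          struct uffdio_writeprotect wp = {
              .range = {
                  .start = msg.arg.pagefault.address & ~(psize - 1),
                  .len   = psize,
              },
              .mode  = 0,                /* un-protect, i.e. resolve */
          };
          ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);  /* wakes the faulter */
  }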

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/userfaultfd_k.h | 17 +++++++++
mm/memory.c | 67 ++++++++++++++++++++++++++++++-----
2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index bd09c3c89b59..827e38b7be65 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -96,6 +96,18 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
}

+/*
+ * Don't do fault around for either WP or MINOR registered uffd range. For
+ * MINOR registered range, fault around will be a total disaster and ptes can
+ * be installed without notifications; for WP it should mostly be fine as long
+ * as the fault around checks for pte_none() before the installation, however
+ * to be super safe we just forbid it.
+ */
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+ return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
+}
+
static inline bool userfaultfd_missing(struct vm_area_struct *vma)
{
return vma->vm_flags & VM_UFFD_MISSING;
@@ -236,6 +248,11 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
{
}

+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+ return false;
+}
+
#endif /* CONFIG_USERFAULTFD */

static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
diff --git a/mm/memory.c b/mm/memory.c
index b1af996b09ca..21abb8a30553 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3559,6 +3559,39 @@ static inline bool should_try_to_free_swap(struct page *page,
page_count(page) == 2;
}

+static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
+{
+ vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
+ vmf->address, &vmf->ptl);
+ /*
+ * Be careful so that we will only recover a special uffd-wp pte into a
+ * none pte. Otherwise it means the pte could have changed, so retry.
+ */
+ if (is_pte_marker(*vmf->pte))
+ pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ return 0;
+}
+
+/*
+ * This is actually a page-missing access, but with uffd-wp special pte
+ * installed. It means this pte was wr-protected before being unmapped.
+ */
+static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
+{
+ /*
+ * Just in case there're leftover special ptes even after the region
+ * got unregistered - we can simply clear them. We can also do that
+ * proactively when e.g. when we do UFFDIO_UNREGISTER upon some uffd-wp
+ * ranges, but it should be more efficient to be done lazily here.
+ */
+ if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
+ return pte_marker_clear(vmf);
+
+ /* do_fault() can handle pte markers too like none pte */
+ return do_fault(vmf);
+}
+
static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
{
swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
@@ -3572,8 +3605,11 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
return VM_FAULT_SIGBUS;

- /* TODO: handle pte markers */
- return 0;
+ if (pte_marker_entry_uffd_wp(entry))
+ return pte_marker_handle_uffd_wp(vmf);
+
+ /* This is an unknown pte marker */
+ return VM_FAULT_SIGBUS;
}

/*
@@ -4157,6 +4193,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
{
struct vm_area_struct *vma = vmf->vma;
+ bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
bool write = vmf->flags & FAULT_FLAG_WRITE;
bool prefault = vmf->address != addr;
pte_t entry;
@@ -4171,6 +4208,8 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)

if (write)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (unlikely(uffd_wp))
+ entry = pte_mkuffd_wp(pte_wrprotect(entry));
/* copy-on-write page */
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
@@ -4344,9 +4383,21 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
}

+/* Return true if we should do read fault-around, false otherwise */
+static inline bool should_fault_around(struct vm_fault *vmf)
+{
+ /* No ->map_pages? No way to fault around... */
+ if (!vmf->vma->vm_ops->map_pages)
+ return false;
+
+ if (uffd_disable_fault_around(vmf->vma))
+ return false;
+
+ return fault_around_bytes >> PAGE_SHIFT > 1;
+}
+
static vm_fault_t do_read_fault(struct vm_fault *vmf)
{
- struct vm_area_struct *vma = vmf->vma;
vm_fault_t ret = 0;

/*
@@ -4354,12 +4405,10 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
* if page by the offset is not ready to be mapped (cold cache or
* something).
*/
- if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
- if (likely(!userfaultfd_minor(vmf->vma))) {
- ret = do_fault_around(vmf);
- if (ret)
- return ret;
- }
+ if (should_fault_around(vmf)) {
+ ret = do_fault_around(vmf);
+ if (ret)
+ return ret;
}

ret = __do_fault(vmf);
--
2.32.0
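
For reference, the userspace side of resolving these faults looks roughly like
the sketch below: a write to a wr-protected page (whether backed by a present
pte or by a pte marker) shows up as a uffd message with UFFD_PAGEFAULT_FLAG_WP
set, and is resolved by un-protecting the page. Helper name is made up for
illustration; error handling omitted; assumes the range was registered with
UFFDIO_REGISTER_MODE_WP:

    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    /* Resolve uffd-wp faults by dropping the protection on the faulting page */
    static void handle_wp_faults(int uffd, unsigned long page_size)
    {
        struct uffd_msg msg;

        while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
            if (msg.event != UFFD_EVENT_PAGEFAULT ||
                !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP))
                continue;

            struct uffdio_writeprotect wp = {
                .range = {
                    .start = msg.arg.pagefault.address & ~(page_size - 1),
                    .len = page_size,
                },
                .mode = 0,  /* un-protect, and wake the faulting thread */
            };
            ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
        }
    }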

2022-04-05 03:44:18

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 07/23] mm/shmem: Persist uffd-wp bit across zapping for file-backed

File-backed memory is prone to being unmapped at any time. It means all
information in the pte will be dropped, including the uffd-wp flag.

To persist the uffd-wp flag, we'll use the pte markers. This patch teaches the
zap code to understand uffd-wp and know when to keep or drop the uffd-wp bit.

Add a new flag ZAP_FLAG_DROP_MARKER and set it in zap_details when we don't
want to persist such information, for example, when destroying the whole
vma, or punching a hole in a shmem file. For all other cases we should never
drop the uffd-wp bit, or the wr-protect information will get lost.

The new ZAP_FLAG_DROP_MARKER needs to be put into mm.h rather than memory.c
because it'll be further referenced in hugetlb files later.

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/mm.h | 10 ++++++++
include/linux/mm_inline.h | 43 ++++++++++++++++++++++++++++++++++
mm/memory.c | 49 ++++++++++++++++++++++++++++++++++++---
mm/rmap.c | 8 +++++++
4 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 26428ff262fc..857bc8f7af45 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3422,4 +3422,14 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
}
#endif

+typedef unsigned int __bitwise zap_flags_t;
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory. This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma. By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+
#endif /* _LINUX_MM_H */
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index ac32125745ab..7b25b53c474a 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -6,6 +6,8 @@
#include <linux/huge_mm.h>
#include <linux/swap.h>
#include <linux/string.h>
+#include <linux/userfaultfd_k.h>
+#include <linux/swapops.h>

/**
* folio_is_file_lru - Should the folio be on a file LRU or anon LRU?
@@ -316,5 +318,46 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
return atomic_read(&mm->tlb_flush_pending) > 1;
}

+/*
+ * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
+ * replace a none pte. NOTE! This should only be called when *pte is already
+ * cleared so we will never accidentally replace something valuable. Meanwhile
+ * none pte also means we are not demoting the pte so tlb flushed is not needed.
+ * E.g., when pte cleared the caller should have taken care of the tlb flush.
+ *
+ * Must be called with pgtable lock held so that no thread will see the none
+ * pte, and if they see it, they'll fault and serialize at the pgtable lock.
+ *
+ * This function is a no-op if PTE_MARKER_UFFD_WP is not enabled.
+ */
+static inline void
+pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *pte, pte_t pteval)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+ bool arm_uffd_pte = false;
+
+ /* The current status of the pte should be "cleared" before calling */
+ WARN_ON_ONCE(!pte_none(*pte));
+
+ if (vma_is_anonymous(vma) || !userfaultfd_wp(vma))
+ return;
+
+ /* A uffd-wp wr-protected normal pte */
+ if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
+ arm_uffd_pte = true;
+
+ /*
+ * A uffd-wp wr-protected swap pte. Note: this should even cover an
+ * existing pte marker with uffd-wp bit set.
+ */
+ if (unlikely(pte_swp_uffd_wp_any(pteval)))
+ arm_uffd_pte = true;
+
+ if (unlikely(arm_uffd_pte))
+ set_pte_at(vma->vm_mm, addr, pte,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
+#endif
+}

#endif
diff --git a/mm/memory.c b/mm/memory.c
index 21abb8a30553..1144845ff734 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -74,6 +74,7 @@
#include <linux/perf_event.h>
#include <linux/ptrace.h>
#include <linux/vmalloc.h>
+#include <linux/mm_inline.h>

#include <trace/events/kmem.h>

@@ -1306,6 +1307,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
struct zap_details {
struct folio *single_folio; /* Locked folio to be unmapped */
bool even_cows; /* Zap COWed private pages too? */
+ zap_flags_t zap_flags; /* Extra flags for zapping */
};

/* Whether we should zap all COWed (private) pages too */
@@ -1334,6 +1336,29 @@ static inline bool should_zap_page(struct zap_details *details, struct page *pag
return !PageAnon(page);
}

+static inline bool zap_drop_file_uffd_wp(struct zap_details *details)
+{
+ if (!details)
+ return false;
+
+ return details->zap_flags & ZAP_FLAG_DROP_MARKER;
+}
+
+/*
+ * This function makes sure that we'll replace the none pte with an uffd-wp
+ * swap special pte marker when necessary. Must be with the pgtable lock held.
+ */
+static inline void
+zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *pte,
+ struct zap_details *details, pte_t pteval)
+{
+ if (zap_drop_file_uffd_wp(details))
+ return;
+
+ pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
+}
+
static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
@@ -1371,6 +1396,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
tlb_remove_tlb_entry(tlb, pte, addr);
+ zap_install_uffd_wp_if_needed(vma, addr, pte, details,
+ ptent);
if (unlikely(!page))
continue;

@@ -1401,6 +1428,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
page = pfn_swap_entry_to_page(entry);
if (unlikely(!should_zap_page(details, page)))
continue;
+ /*
+ * Both device private/exclusive mappings should only
+ * work with anonymous page so far, so we don't need to
+ * consider uffd-wp bit when zap. For more information,
+ * see zap_install_uffd_wp_if_needed().
+ */
+ WARN_ON_ONCE(!vma_is_anonymous(vma));
rss[mm_counter(page)]--;
if (is_device_private_entry(entry))
page_remove_rmap(page, vma, false);
@@ -1417,8 +1451,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (!should_zap_page(details, page))
continue;
rss[mm_counter(page)]--;
- } else if (is_pte_marker_entry(entry)) {
- /* By default, simply drop all pte markers when zap */
+ } else if (pte_marker_entry_uffd_wp(entry)) {
+ /* Only drop the uffd-wp marker if explicitly requested */
+ if (!zap_drop_file_uffd_wp(details))
+ continue;
} else if (is_hwpoison_entry(entry)) {
if (!should_zap_cows(details))
continue;
@@ -1427,6 +1463,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
WARN_ON_ONCE(1);
}
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+ zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
} while (pte++, addr += PAGE_SIZE, addr != end);

add_mm_rss_vec(mm, rss);
@@ -1637,12 +1674,17 @@ void unmap_vmas(struct mmu_gather *tlb,
unsigned long end_addr)
{
struct mmu_notifier_range range;
+ struct zap_details details = {
+ .zap_flags = ZAP_FLAG_DROP_MARKER,
+ /* Careful - we need to zap private pages too! */
+ .even_cows = true,
+ };

mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
start_addr, end_addr);
mmu_notifier_invalidate_range_start(&range);
for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
- unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
+ unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
mmu_notifier_invalidate_range_end(&range);
}

@@ -3438,6 +3480,7 @@ void unmap_mapping_folio(struct folio *folio)

details.even_cows = false;
details.single_folio = folio;
+ details.zap_flags = ZAP_FLAG_DROP_MARKER;

i_mmap_lock_read(mapping);
if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
diff --git a/mm/rmap.c b/mm/rmap.c
index 208b2c683cec..69416072b1a6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -73,6 +73,7 @@
#include <linux/page_idle.h>
#include <linux/memremap.h>
#include <linux/userfaultfd_k.h>
+#include <linux/mm_inline.h>

#include <asm/tlbflush.h>

@@ -1538,6 +1539,13 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
pteval = ptep_clear_flush(vma, address, pvmw.pte);
}

+ /*
+ * Now the pte is cleared. If this pte was uffd-wp armed,
+ * we may want to replace a none pte with a marker pte if
+ * it's file-backed, so we don't lose the tracking info.
+ */
+ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pteval))
folio_mark_dirty(folio);
--
2.32.0
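
To see the effect from userspace: a wr-protected page on shmem keeps trapping
writes even after its pte has been zapped, e.g. by MADV_DONTNEED, which does
not set ZAP_FLAG_DROP_MARKER. A rough sketch (helper name made up, error
handling omitted; assumes uffd has already gone through UFFDIO_API and a
monitor thread is consuming the messages):

    #define _GNU_SOURCE
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    static void wp_survives_zap(int uffd, unsigned long psz)
    {
        int fd = memfd_create("wp-test", 0);    /* shmem-backed */
        char *p;

        ftruncate(fd, psz);
        p = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        struct uffdio_register reg = {
            .range = { .start = (unsigned long)p, .len = psz },
            .mode = UFFDIO_REGISTER_MODE_WP,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);

        p[0] = 1;                               /* populate the pte */

        struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)p, .len = psz },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

        /* Zap the pte; the uffd-wp marker should be left in place */
        madvise(p, psz, MADV_DONTNEED);

        p[0] = 2;                               /* should still trap as WP */
    }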

2022-04-05 03:44:23

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 09/23] mm/shmem: Allow file-backed mem to be uffd wr-protected on thps

We don't have a "huge" version of pte markers; instead, when necessary we split
the thp.

However, splitting the thp is not enough, because file-backed thps are handled
totally differently from anonymous thps: rather than doing a real split, the
thp pmd will simply get cleared in __split_huge_pmd_locked().

That is a problem when, e.g., a thp covers the range [0, 2M) but we want to
wr-protect a small page residing in the [4K, 8K) range, because after
__split_huge_pmd() returns there will be a none pmd, and change_pmd_range()
will just skip it right after the split.

Here we leverage the previously introduced change_pmd_prepare() macro so that
we'll populate the pmd with a pgtable page after the pmd split (in which
process the pmd will be cleared for cases like shmem). Then change_pte_range()
will do all the rest for us by installing the uffd-wp pte marker at any none
pte that we'd like to wr-protect.

Signed-off-by: Peter Xu <[email protected]>
---
mm/mprotect.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index bd62d5938c6c..e0a567b66d07 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -333,8 +333,15 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
}

if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
- if (next - addr != HPAGE_PMD_SIZE) {
+ if ((next - addr != HPAGE_PMD_SIZE) ||
+ uffd_wp_protect_file(vma, cp_flags)) {
__split_huge_pmd(vma, pmd, addr, false, NULL);
+ /*
+ * For file-backed, the pmd could have been
+ * cleared; make sure pmd populated if
+ * necessary, then fall-through to pte level.
+ */
+ change_pmd_prepare(vma, pmd, cp_flags);
} else {
int nr_ptes = change_huge_pmd(vma, pmd, addr,
newprot, cp_flags);
--
2.32.0
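
From the uffd API there is nothing new to do here: wr-protecting a 4K subrange
of a THP-backed shmem mapping just works, with the pmd split (and re-populated)
happening underneath. A hedged sketch; map_2m is assumed to be a 2M-aligned
shmem mapping already registered with UFFDIO_REGISTER_MODE_WP, and
MADV_HUGEPAGE is only a hint, so a thp may or may not have been mapped:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/userfaultfd.h>

    #define SZ_2M (2UL << 20)

    static void wp_one_page_in_thp(int uffd, char *map_2m, unsigned long psz)
    {
        madvise(map_2m, SZ_2M, MADV_HUGEPAGE);  /* hint: try to use a thp */
        memset(map_2m, 0, SZ_2M);               /* populate the range */

        /* Protect only [4K, 8K); the kernel splits the pmd if a thp is mapped */
        struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)map_2m + psz, .len = psz },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
    }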

2022-04-05 03:44:24

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 12/23] mm/hugetlb: Hook page faults for uffd write protection

Hook up hugetlb_fault() with the capability to handle userfaultfd-wp faults.

We do this slightly earlier than hugetlb_wp() so that we can avoid taking some
extra locks that we definitely don't need.

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/hugetlb.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd642cfc538b..82df0fcfedf9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5711,6 +5711,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
goto out_ptl;

+ /* Handle userfault-wp first, before trying to lock more pages */
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+ (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+ struct vm_fault vmf = {
+ .vma = vma,
+ .address = haddr,
+ .real_address = address,
+ .flags = flags,
+ };
+
+ spin_unlock(ptl);
+ if (pagecache_page) {
+ unlock_page(pagecache_page);
+ put_page(pagecache_page);
+ }
+ mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+ return handle_userfault(&vmf, VM_UFFD_WP);
+ }
+
/*
* hugetlb_wp() requires page locks of pte_page(entry) and
* pagecache_page, so here we need take the former one
--
2.32.0
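
On the userspace side, arming wr-protection on hugetlb looks the same as for
anonymous memory; only the mapping differs. A minimal sketch (helper name made
up, error handling omitted; assumes 2M huge pages have been reserved, e.g. via
/proc/sys/vm/nr_hugepages):

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/userfaultfd.h>

    #define HPAGE_2M (2UL << 20)

    static char *map_and_register_hugetlb(int uffd)
    {
        char *p = mmap(NULL, HPAGE_2M, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

        struct uffdio_register reg = {
            .range = { .start = (unsigned long)p, .len = HPAGE_2M },
            .mode = UFFDIO_REGISTER_MODE_WP,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);
        return p;  /* write faults after UFFDIO_WRITEPROTECT get reported */
    }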

2022-04-05 03:44:30

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 14/23] mm/hugetlb: Handle UFFDIO_WRITEPROTECT

This starts by passing cp_flags into hugetlb_change_protection() so that
hugetlb will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests.

huge_pte_clear_uffd_wp() is introduced to handle the case where
UFFDIO_WRITEPROTECT is requested upon migration entries of huge pages.

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
include/linux/hugetlb.h | 6 ++++--
mm/hugetlb.c | 13 ++++++++++++-
mm/mprotect.c | 3 ++-
mm/userfaultfd.c | 8 ++++++++
4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6347298778b6..38c5ac28b787 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -210,7 +210,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
int pmd_huge(pmd_t pmd);
int pud_huge(pud_t pud);
unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
- unsigned long address, unsigned long end, pgprot_t newprot);
+ unsigned long address, unsigned long end, pgprot_t newprot,
+ unsigned long cp_flags);

bool is_hugetlb_entry_migration(pte_t pte);
void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -391,7 +392,8 @@ static inline void move_hugetlb_state(struct page *oldpage,

static inline unsigned long hugetlb_change_protection(
struct vm_area_struct *vma, unsigned long address,
- unsigned long end, pgprot_t newprot)
+ unsigned long end, pgprot_t newprot,
+ unsigned long cp_flags)
{
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c94deead22b2..2401dd5997b7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6207,7 +6207,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
}

unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
- unsigned long address, unsigned long end, pgprot_t newprot)
+ unsigned long address, unsigned long end,
+ pgprot_t newprot, unsigned long cp_flags)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long start = address;
@@ -6217,6 +6218,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long pages = 0;
bool shared_pmd = false;
struct mmu_notifier_range range;
+ bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+ bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;

/*
* In the case of shared PMDs, the area to flush could be beyond
@@ -6263,6 +6266,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
entry = make_readable_migration_entry(
swp_offset(entry));
newpte = swp_entry_to_pte(entry);
+ if (uffd_wp)
+ newpte = pte_swp_mkuffd_wp(newpte);
+ else if (uffd_wp_resolve)
+ newpte = pte_swp_clear_uffd_wp(newpte);
set_huge_swap_pte_at(mm, address, ptep,
newpte, huge_page_size(h));
pages++;
@@ -6277,6 +6284,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
pte = huge_pte_modify(old_pte, newprot);
pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+ if (uffd_wp)
+ pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+ else if (uffd_wp_resolve)
+ pte = huge_pte_clear_uffd_wp(pte);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e0a567b66d07..6b0e8c213508 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -455,7 +455,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL);

if (is_vm_hugetlb_page(vma))
- pages = hugetlb_change_protection(vma, start, end, newprot);
+ pages = hugetlb_change_protection(vma, start, end, newprot,
+ cp_flags);
else
pages = change_protection_range(vma, start, end, newprot,
cp_flags);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index da0b3ed2a6b5..58d67f2bf980 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -704,6 +704,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
atomic_t *mmap_changing)
{
struct vm_area_struct *dst_vma;
+ unsigned long page_mask;
pgprot_t newprot;
int err;

@@ -740,6 +741,13 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
if (!vma_is_anonymous(dst_vma))
goto out_unlock;

+ if (is_vm_hugetlb_page(dst_vma)) {
+ err = -EINVAL;
+ page_mask = vma_kernel_pagesize(dst_vma) - 1;
+ if ((start & page_mask) || (len & page_mask))
+ goto out_unlock;
+ }
+
if (enable_wp)
newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
else
--
2.32.0
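
Worth noting for users: the hunk in mwriteprotect_range() above makes
UFFDIO_WRITEPROTECT fail with -EINVAL on hugetlb vmas unless both start and
len are aligned to the vma's huge page size. A sketch of the same check from
the caller's side (illustrative helper name):

    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    /* For hugetlb vmas, start/len must be multiples of the huge page size */
    static int wp_hugetlb_range(int uffd, void *start, unsigned long len,
                                unsigned long hpage_sz)
    {
        if (((unsigned long)start | len) & (hpage_sz - 1))
            return -1;  /* the kernel would reject this with -EINVAL anyway */

        struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)start, .len = len },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
    }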

2022-04-05 03:44:45

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 18/23] mm/hugetlb: Handle uffd-wp during fork()

Firstly, we'll need to pass dst_vma into copy_hugetlb_page_range(), because
for uffd-wp it's the dst vma that matters when deciding how we should treat
uffd-wp protected ptes.

We should recognize pte markers during fork and do the pte copy if needed.

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/hugetlb.h | 7 +++++--
mm/hugetlb.c | 42 +++++++++++++++++++++++++++--------------
mm/memory.c | 2 +-
3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ab48b3bbb0e6..6df51d23b7ee 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -137,7 +137,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
struct vm_area_struct *new_vma,
unsigned long old_addr, unsigned long new_addr,
unsigned long len);
-int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
+int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
+ struct vm_area_struct *, struct vm_area_struct *);
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, struct vm_area_struct **,
unsigned long *, unsigned long *, long, unsigned int,
@@ -268,7 +269,9 @@ static inline struct page *follow_huge_addr(struct mm_struct *mm,
}

static inline int copy_hugetlb_page_range(struct mm_struct *dst,
- struct mm_struct *src, struct vm_area_struct *vma)
+ struct mm_struct *src,
+ struct vm_area_struct *dst_vma,
+ struct vm_area_struct *src_vma)
{
BUG();
return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e4af8b357b90..e1571179698a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4706,23 +4706,24 @@ hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr
}

int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
- struct vm_area_struct *vma)
+ struct vm_area_struct *dst_vma,
+ struct vm_area_struct *src_vma)
{
pte_t *src_pte, *dst_pte, entry, dst_entry;
struct page *ptepage;
unsigned long addr;
- bool cow = is_cow_mapping(vma->vm_flags);
- struct hstate *h = hstate_vma(vma);
+ bool cow = is_cow_mapping(src_vma->vm_flags);
+ struct hstate *h = hstate_vma(src_vma);
unsigned long sz = huge_page_size(h);
unsigned long npages = pages_per_huge_page(h);
- struct address_space *mapping = vma->vm_file->f_mapping;
+ struct address_space *mapping = src_vma->vm_file->f_mapping;
struct mmu_notifier_range range;
int ret = 0;

if (cow) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src,
- vma->vm_start,
- vma->vm_end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma, src,
+ src_vma->vm_start,
+ src_vma->vm_end);
mmu_notifier_invalidate_range_start(&range);
mmap_assert_write_locked(src);
raw_write_seqcount_begin(&src->write_protect_seq);
@@ -4736,12 +4737,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
i_mmap_lock_read(mapping);
}

- for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
+ for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
spinlock_t *src_ptl, *dst_ptl;
src_pte = huge_pte_offset(src, addr, sz);
if (!src_pte)
continue;
- dst_pte = huge_pte_alloc(dst, vma, addr, sz);
+ dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
if (!dst_pte) {
ret = -ENOMEM;
break;
@@ -4776,6 +4777,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
} else if (unlikely(is_hugetlb_entry_migration(entry) ||
is_hugetlb_entry_hwpoisoned(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
+ bool uffd_wp = huge_pte_uffd_wp(entry);

if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -4785,10 +4787,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
swp_entry = make_readable_migration_entry(
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
+ if (userfaultfd_wp(src_vma) && uffd_wp)
+ entry = huge_pte_mkuffd_wp(entry);
set_huge_swap_pte_at(src, addr, src_pte,
entry, sz);
}
+ if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ entry = huge_pte_clear_uffd_wp(entry);
set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+ } else if (unlikely(is_pte_marker(entry))) {
+ /*
+ * We copy the pte marker only if the dst vma has
+ * uffd-wp enabled.
+ */
+ if (userfaultfd_wp(dst_vma))
+ set_huge_pte_at(dst, addr, dst_pte, entry);
} else {
entry = huge_ptep_get(src_pte);
ptepage = pte_page(entry);
@@ -4806,20 +4819,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
if (!PageAnon(ptepage)) {
page_dup_file_rmap(ptepage, true);
- } else if (page_try_dup_anon_rmap(ptepage, true, vma)) {
+ } else if (page_try_dup_anon_rmap(ptepage, true,
+ src_vma)) {
pte_t src_pte_old = entry;
struct page *new;

spin_unlock(src_ptl);
spin_unlock(dst_ptl);
/* Do not use reserve as it's private owned */
- new = alloc_huge_page(vma, addr, 1);
+ new = alloc_huge_page(dst_vma, addr, 1);
if (IS_ERR(new)) {
put_page(ptepage);
ret = PTR_ERR(new);
break;
}
- copy_user_huge_page(new, ptepage, addr, vma,
+ copy_user_huge_page(new, ptepage, addr, dst_vma,
npages);
put_page(ptepage);

@@ -4829,13 +4843,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
entry = huge_ptep_get(src_pte);
if (!pte_same(src_pte_old, entry)) {
- restore_reserve_on_error(h, vma, addr,
+ restore_reserve_on_error(h, dst_vma, addr,
new);
put_page(new);
/* dst_entry won't change as in child */
goto again;
}
- hugetlb_install_page(vma, dst_pte, addr, new);
+ hugetlb_install_page(dst_vma, dst_pte, addr, new);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
diff --git a/mm/memory.c b/mm/memory.c
index 9808edfe18d4..d1e9c2517dfb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1278,7 +1278,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
return 0;

if (is_vm_hugetlb_page(src_vma))
- return copy_hugetlb_page_range(dst_mm, src_mm, src_vma);
+ return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);

if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
/*
--
2.32.0
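
One user-visible consequence of the dst-vma rule: a plain fork() leaves the
child without any uffd context on its vmas, so the wp bits and markers are
deliberately not copied. A monitor that wants the protection to follow the
child needs UFFD_FEATURE_EVENT_FORK; a rough sketch of that handshake
(illustrative helper names, error handling omitted):

    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    /* Handshake: ask for fork events so wp tracking can survive fork() */
    static void enable_fork_events(int uffd)
    {
        struct uffdio_api api = {
            .api = UFFD_API,
            .features = UFFD_FEATURE_EVENT_FORK,
        };
        ioctl(uffd, UFFDIO_API, &api);
    }

    /* In the monitor loop: a fork event carries a new uffd for the child mm */
    static int maybe_child_uffd(struct uffd_msg *msg)
    {
        if (msg->event == UFFD_EVENT_FORK)
            return (int)msg->arg.fork.ufd;  /* child vmas kept VM_UFFD_WP */
        return -1;
    }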

2022-04-05 03:44:46

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 19/23] mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered

When we're trying to collapse a 2M huge shmem page, don't retract the pgtable
pmd page if it's registered with uffd-wp, because that pgtable could have pte
markers installed. Recycling that pgtable means we'll lose the pte markers,
which could cause data loss for an uffd-wp enabled application on shmem.

Instead of disabling khugepaged on these files, simply skip retracting these
special VMAs; then the page cache can still be merged into a huge thp, and
other mms/vmas can still map the range of the file with a huge thp when
appropriate.

Note that checking VM_UFFD_WP needs to be done with mmap_sem held for write,
which avoids races like:

        khugepaged                    user thread
        ==========                    ===========
    check VM_UFFD_WP, not set
                                  UFFDIO_REGISTER with uffd-wp on shmem
                                  wr-protect some pages (install markers)
    take mmap_sem write lock
    erase pmd and free pmd page
      --> pte markers are dropped unnoticed!

Signed-off-by: Peter Xu <[email protected]>
---
mm/khugepaged.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 04a972259136..d7c5bb9fd1fb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1464,6 +1464,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
return;

+ /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+ if (userfaultfd_wp(vma))
+ return;
+
hpage = find_lock_page(vma->vm_file->f_mapping,
linear_page_index(vma, haddr));
if (!hpage)
@@ -1599,7 +1603,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* reverse order. Trylock is a way to avoid deadlock.
*/
if (mmap_write_trylock(mm)) {
- if (!khugepaged_test_exit(mm))
+ /*
+ * When a vma is registered with uffd-wp, we can't
+ * recycle the pmd pgtable because there can be pte
+ * markers installed. Skip it only, so the rest mm/vma
+ * can still have the same file mapped hugely, however
+ * it'll always mapped in small page size for uffd-wp
+ * registered ranges.
+ */
+ if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma))
collapse_and_free_pmd(mm, vma, addr, pmd);
mmap_write_unlock(mm);
} else {
--
2.32.0

2022-04-05 03:44:48

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 21/23] mm/uffd: Enable write protection for shmem & hugetlbfs

We've had all the necessary changes ready for both shmem and hugetlbfs. Turn
on all the shmem/hugetlbfs switches for userfaultfd-wp.

We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too because
all existing types now support write protection mode.

Since vma_can_userfault() will be used elsewhere, move it into userfaultfd_k.h.

Signed-off-by: Peter Xu <[email protected]>
---
fs/userfaultfd.c | 21 +++------------------
include/linux/userfaultfd_k.h | 20 ++++++++++++++++++++
include/uapi/linux/userfaultfd.h | 10 ++++++++--
mm/userfaultfd.c | 9 +++------
4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8b4a94f5a238..fb45522a2b44 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1257,24 +1257,6 @@ static __always_inline int validate_range(struct mm_struct *mm,
return 0;
}

-static inline bool vma_can_userfault(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
- /* FIXME: add WP support to hugetlbfs and shmem */
- if (vm_flags & VM_UFFD_WP) {
- if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
- return false;
- }
-
- if (vm_flags & VM_UFFD_MINOR) {
- if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
- return false;
- }
-
- return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
- vma_is_shmem(vma);
-}
-
static int userfaultfd_register(struct userfaultfd_ctx *ctx,
unsigned long arg)
{
@@ -1955,6 +1937,9 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
#endif
#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+#endif
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+ uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
#endif
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 827e38b7be65..ea11bed9bb7e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -18,6 +18,7 @@
#include <linux/swap.h>
#include <linux/swapops.h>
#include <asm-generic/pgtable_uffd.h>
+#include <linux/hugetlb_inline.h>

/* The set of all possible UFFD-related VM flags. */
#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
@@ -140,6 +141,25 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
return vma->vm_flags & __VM_UFFD_FLAGS;
}

+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+ unsigned long vm_flags)
+{
+ if (vm_flags & VM_UFFD_MINOR)
+ return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
+
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+ /*
+ * If user requested uffd-wp but not enabled pte markers for
+ * uffd-wp, then shmem & hugetlbfs are not supported but only
+ * anonymous.
+ */
+ if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
+ return false;
+#endif
+ return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+ vma_is_shmem(vma);
+}
+
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);

diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index ef739054cb1c..7d32b1e797fb 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -33,7 +33,8 @@
UFFD_FEATURE_THREAD_ID | \
UFFD_FEATURE_MINOR_HUGETLBFS | \
UFFD_FEATURE_MINOR_SHMEM | \
- UFFD_FEATURE_EXACT_ADDRESS)
+ UFFD_FEATURE_EXACT_ADDRESS | \
+ UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
#define UFFD_API_IOCTLS \
((__u64)1 << _UFFDIO_REGISTER | \
(__u64)1 << _UFFDIO_UNREGISTER | \
@@ -47,7 +48,8 @@
#define UFFD_API_RANGE_IOCTLS_BASIC \
((__u64)1 << _UFFDIO_WAKE | \
(__u64)1 << _UFFDIO_COPY | \
- (__u64)1 << _UFFDIO_CONTINUE)
+ (__u64)1 << _UFFDIO_CONTINUE | \
+ (__u64)1 << _UFFDIO_WRITEPROTECT)

/*
* Valid ioctl command number range with this API is from 0x00 to
@@ -194,6 +196,9 @@ struct uffdio_api {
* UFFD_FEATURE_EXACT_ADDRESS indicates that the exact address of page
* faults would be provided and the offset within the page would not be
* masked.
+ *
+ * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
+ * write-protection mode is supported on both shmem and hugetlbfs.
*/
#define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
#define UFFD_FEATURE_EVENT_FORK (1<<1)
@@ -207,6 +212,7 @@ struct uffdio_api {
#define UFFD_FEATURE_MINOR_HUGETLBFS (1<<9)
#define UFFD_FEATURE_MINOR_SHMEM (1<<10)
#define UFFD_FEATURE_EXACT_ADDRESS (1<<11)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12)
__u64 features;

__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 58d67f2bf980..156e9bdf9f23 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -730,15 +730,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,

err = -ENOENT;
dst_vma = find_dst_vma(dst_mm, start, len);
- /*
- * Make sure the vma is not shared, that the dst range is
- * both valid and fully within a single existing vma.
- */
- if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+
+ if (!dst_vma)
goto out_unlock;
if (!userfaultfd_wp(dst_vma))
goto out_unlock;
- if (!vma_is_anonymous(dst_vma))
+ if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
goto out_unlock;

if (is_vm_hugetlb_page(dst_vma)) {
--
2.32.0
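
With the feature bit in place, applications can probe at runtime whether the
running kernel supports uffd-wp on shmem/hugetlbfs before relying on it. A
minimal probe sketch (illustrative helper name, error handling trimmed):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    static int have_wp_shmem_hugetlb(void)
    {
        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
        struct uffdio_api api = { .api = UFFD_API };
        int ret = 0;

        if (uffd >= 0 && !ioctl(uffd, UFFDIO_API, &api))
            /* Cleared by the kernel when CONFIG_PTE_MARKER_UFFD_WP is unset */
            ret = !!(api.features & UFFD_FEATURE_WP_HUGETLBFS_SHMEM);
        if (uffd >= 0)
            close(uffd);
        return ret;
    }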

2022-04-05 03:45:02

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 15/23] mm/hugetlb: Handle pte markers in page faults

Allow the hugetlb code to handle pte markers just like none ptes. It's mostly
there; we just need to make sure we don't assume hugetlb_no_page() only handles
none ptes, so when detecting a pte change we should use pte_same() rather than
pte_none(). We need to pass in the old_pte to do the comparison.

Check the original pte to see whether it's a pte marker; if it is, we should
recover the uffd-wp bit on the new pte to be installed, so that the next write
will be trapped by uffd.

Signed-off-by: Peter Xu <[email protected]>
---
mm/hugetlb.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2401dd5997b7..9317b790161d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5412,7 +5412,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
struct vm_area_struct *vma,
struct address_space *mapping, pgoff_t idx,
- unsigned long address, pte_t *ptep, unsigned int flags)
+ unsigned long address, pte_t *ptep,
+ pte_t old_pte, unsigned int flags)
{
struct hstate *h = hstate_vma(vma);
vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -5539,7 +5540,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,

ptl = huge_pte_lock(h, mm, ptep);
ret = 0;
- if (!huge_pte_none(huge_ptep_get(ptep)))
+ /* If pte changed from under us, retry */
+ if (!pte_same(huge_ptep_get(ptep), old_pte))
goto backout;

if (anon_rmap) {
@@ -5549,6 +5551,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
page_dup_file_rmap(page, true);
new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
+ /*
+ * If this pte was previously wr-protected, keep it wr-protected even
+ * if populated.
+ */
+ if (unlikely(pte_marker_uffd_wp(old_pte)))
+ new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
set_huge_pte_at(mm, haddr, ptep, new_pte);

hugetlb_count_add(pages_per_huge_page(h), mm);
@@ -5666,8 +5674,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
mutex_lock(&hugetlb_fault_mutex_table[hash]);

entry = huge_ptep_get(ptep);
- if (huge_pte_none(entry)) {
- ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+ /* PTE markers should be handled the same way as none pte */
+ if (huge_pte_none_mostly(entry)) {
+ ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+ entry, flags);
goto out_mutex;
}

--
2.32.0

2022-04-05 03:45:03

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 22/23] mm: Enable PTE markers by default

Enable PTE markers by default. On x86_64 it means it'll auto-enable
PTE_MARKER_UFFD_WP as well.

Signed-off-by: Peter Xu <[email protected]>
---
mm/Kconfig | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 6e7c2d59fa96..3eca34c864c5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -911,12 +911,14 @@ config ANON_VMA_NAME

config PTE_MARKER
bool "Marker PTEs support"
+ default y

help
Allows to create marker PTEs for file-backed memory.

config PTE_MARKER_UFFD_WP
bool "Marker PTEs support for userfaultfd write protection"
+ default y
depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP

help
--
2.32.0
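
For a quick sanity check on a built kernel, something like
"grep PTE_MARKER /boot/config-$(uname -r)" (or zgrep on /proc/config.gz where
available) should show both CONFIG_PTE_MARKER=y and CONFIG_PTE_MARKER_UFFD_WP=y
on x86_64 with the defaults above.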

2022-04-05 03:45:10

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 17/23] mm/hugetlb: Only drop uffd-wp special pte if required

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if
unmapping an entire vma or when synchronized such that faults cannot race with
the unmap operation. This requires passing zap_flags all the way to the lowest
level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code will pass the
ZAP_FLAG_DROP_MARKER flag, as synchronization is in place to prevent faults.
The exception is hole punch, which will first unmap without any synchronization.
Later, when hole punch actually removes the page from the file, it will check to
see if there was a subsequent fault and, if so, take the hugetlb fault mutex
while unmapping again. This second unmap will pass in ZAP_FLAG_DROP_MARKER.

The justification for "whether to apply the ZAP_FLAG_DROP_MARKER flag when
unmapping a hugetlb range" is (IMHO): we should never reach a state where a
page fault could erroneously fault in a writable page-cache page that was
wr-protected, even for an extremely short period. That could happen if e.g. we
passed ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole() calls
hugetlb_vmdelete_list(), because if a page fault happens after that call and
before remove_inode_hugepages() is executed, the page cache can be mapped
writable again in the small racy window, which can cause unexpected data to be
overwritten.

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
fs/hugetlbfs/inode.c | 15 +++++++++------
include/linux/hugetlb.h | 8 +++++---
mm/hugetlb.c | 33 +++++++++++++++++++++++++--------
mm/memory.c | 5 ++++-
4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 99c7477cee5c..8b5b9df2be7d 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
}

static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+ unsigned long zap_flags)
{
struct vm_area_struct *vma;

@@ -438,7 +439,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
}

unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
- NULL);
+ NULL, zap_flags);
}
}

@@ -516,7 +517,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
mutex_lock(&hugetlb_fault_mutex_table[hash]);
hugetlb_vmdelete_list(&mapping->i_mmap,
index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
+ (index + 1) * pages_per_huge_page(h),
+ ZAP_FLAG_DROP_MARKER);
i_mmap_unlock_write(mapping);
}

@@ -582,7 +584,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
i_mmap_lock_write(mapping);
i_size_write(inode, offset);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
- hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+ hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+ ZAP_FLAG_DROP_MARKER);
i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
}
@@ -615,8 +618,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap,
- hole_start >> PAGE_SHIFT,
- hole_end >> PAGE_SHIFT);
+ hole_start >> PAGE_SHIFT,
+ hole_end >> PAGE_SHIFT, 0);
i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 38c5ac28b787..ab48b3bbb0e6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -143,11 +143,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
unsigned long *, unsigned long *, long, unsigned int,
int *);
void unmap_hugepage_range(struct vm_area_struct *,
- unsigned long, unsigned long, struct page *);
+ unsigned long, unsigned long, struct page *,
+ unsigned long);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
- struct page *ref_page);
+ struct page *ref_page, unsigned long zap_flags);
void hugetlb_report_meminfo(struct seq_file *);
int hugetlb_report_node_meminfo(char *buf, int len, int nid);
void hugetlb_show_meminfo(void);
@@ -400,7 +401,8 @@ static inline unsigned long hugetlb_change_protection(

static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+ unsigned long end, struct page *ref_page,
+ unsigned long zap_flags)
{
BUG();
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 578c48ef931a..e4af8b357b90 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4947,7 +4947,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,

static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long start, unsigned long end,
- struct page *ref_page)
+ struct page *ref_page, unsigned long zap_flags)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -5003,7 +5003,18 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
* unmapped and its refcount is dropped, so just clear pte here.
*/
if (unlikely(!pte_present(pte))) {
- huge_pte_clear(mm, address, ptep, sz);
+ /*
+ * If the pte was wr-protected by uffd-wp in any of the
+ * swap forms, meanwhile the caller does not want to
+ * drop the uffd-wp bit in this zap, then replace the
+ * pte with a marker.
+ */
+ if (pte_swp_uffd_wp_any(pte) &&
+ !(zap_flags & ZAP_FLAG_DROP_MARKER))
+ set_huge_pte_at(mm, address, ptep,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
+ else
+ huge_pte_clear(mm, address, ptep, sz);
spin_unlock(ptl);
continue;
}
@@ -5031,7 +5042,11 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
if (huge_pte_dirty(pte))
set_page_dirty(page);
-
+ /* Leave a uffd-wp pte marker if needed */
+ if (huge_pte_uffd_wp(pte) &&
+ !(zap_flags & ZAP_FLAG_DROP_MARKER))
+ set_huge_pte_at(mm, address, ptep,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
hugetlb_count_sub(pages_per_huge_page(h), mm);
page_remove_rmap(page, vma, true);

@@ -5065,9 +5080,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct

void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+ unsigned long end, struct page *ref_page,
+ unsigned long zap_flags)
{
- __unmap_hugepage_range(tlb, vma, start, end, ref_page);
+ __unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);

/*
* Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -5083,12 +5099,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
}

void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+ unsigned long end, struct page *ref_page,
+ unsigned long zap_flags)
{
struct mmu_gather tlb;

tlb_gather_mmu(&tlb, vma->vm_mm);
- __unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+ __unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
tlb_finish_mmu(&tlb);
}

@@ -5143,7 +5160,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
*/
if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
unmap_hugepage_range(iter_vma, address,
- address + huge_page_size(h), page);
+ address + huge_page_size(h), page, 0);
}
i_mmap_unlock_write(mapping);
}
diff --git a/mm/memory.c b/mm/memory.c
index 8ba1bb196095..9808edfe18d4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1675,8 +1675,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
* safe to do nothing in this case.
*/
if (vma->vm_file) {
+ unsigned long zap_flags = details ?
+ details->zap_flags : 0;
i_mmap_lock_write(vma->vm_file->f_mapping);
- __unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+ __unmap_hugepage_range_final(tlb, vma, start, end,
+ NULL, zap_flags);
i_mmap_unlock_write(vma->vm_file->f_mapping);
}
} else
--
2.32.0
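
The user-visible rule that falls out of this: truncation and hole punching
drop the wr-protection together with the page, so after a hole punch the next
write faults in a fresh page rather than raising a WP message (assuming the
range is registered with WP mode only, not MISSING). A sketch with an
illustrative helper name:

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* fd: hugetlbfs file mapped at p, with uffd-wp armed on the first page */
    static void punch_drops_wp(int fd, char *p, unsigned long hpage_sz)
    {
        /* The final unmap passes ZAP_FLAG_DROP_MARKER: the marker goes too */
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, hpage_sz);

        /* Faults in a brand new page; not reported as a WP fault */
        p[0] = 1;
    }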

2022-04-05 03:45:14

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 23/23] selftests/uffd: Enable uffd-wp for shmem/hugetlbfs

Now that we have added support for shmem and hugetlbfs, we can always turn the
uffd-wp test on.

Signed-off-by: Peter Xu <[email protected]>
---
tools/testing/selftests/vm/userfaultfd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92a4516f8f0d..bbc4a6d8cf7b 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -82,7 +82,7 @@ static int test_type;
static volatile bool test_uffdio_copy_eexist = true;
static volatile bool test_uffdio_zeropage_eexist = true;
/* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
+static bool test_uffdio_wp = true;
/* Whether to test uffd minor faults */
static bool test_uffdio_minor = false;

@@ -1594,8 +1594,6 @@ static void set_test_type(const char *type)
if (!strcmp(type, "anon")) {
test_type = TEST_ANON;
uffd_test_ops = &anon_uffd_test_ops;
- /* Only enable write-protect test for anonymous test */
- test_uffdio_wp = true;
} else if (!strcmp(type, "hugetlb")) {
test_type = TEST_HUGETLB;
uffd_test_ops = &hugetlb_uffd_test_ops;
--
2.32.0
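
With this, the wp rounds run for all three memory types; the invocations look
roughly like the following (sizes and bounce counts are just examples, and the
hugetlb run assumes enough huge pages are reserved beforehand):

    ./userfaultfd anon 20 16
    ./userfaultfd shmem 20 16
    ./userfaultfd hugetlb 20 16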

2022-04-05 03:45:24

by Peter Xu

[permalink] [raw]
Subject: [PATCH v8 20/23] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs

This requires the pagemap code to be able to recognize the newly introduced
swap special pte for uffd-wp, as well as the general hugetlb case that we
recently started to support. It should make pagemap uffd-wp support complete.

Signed-off-by: Peter Xu <[email protected]>
---
fs/proc/task_mmu.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f46060eb91b5..194dfd7abf2b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1421,6 +1421,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
migration = is_migration_entry(entry);
if (is_pfn_swap_entry(entry))
page = pfn_swap_entry_to_page(entry);
+ if (pte_marker_entry_uffd_wp(entry))
+ flags |= PM_UFFD_WP;
}

if (page && !PageAnon(page))
@@ -1556,10 +1558,15 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
if (page_mapcount(page) == 1)
flags |= PM_MMAP_EXCLUSIVE;

+ if (huge_pte_uffd_wp(pte))
+ flags |= PM_UFFD_WP;
+
flags |= PM_PRESENT;
if (pm->show_pfn)
frame = pte_pfn(pte) +
((addr & ~hmask) >> PAGE_SHIFT);
+ } else if (pte_swp_uffd_wp_any(pte)) {
+ flags |= PM_UFFD_WP;
}

for (; addr != end; addr += PAGE_SIZE) {
--
2.32.0
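
With this, userspace can double-check the protection state of any address by
reading its pagemap entry; the uffd-wp state is reported in bit 57 of the
64-bit entry (see Documentation/admin-guide/mm/pagemap.rst). A small sketch
(illustrative helper name, error handling trimmed):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define PM_UFFD_WP_BIT (1ULL << 57)

    static int addr_is_uffd_wp(void *addr, unsigned long page_size)
    {
        uint64_t ent = 0;
        int fd = open("/proc/self/pagemap", O_RDONLY);
        off_t off = ((uintptr_t)addr / page_size) * sizeof(ent);

        if (fd >= 0) {
            pread(fd, &ent, sizeof(ent), off);
            close(fd);
        }
        return !!(ent & PM_UFFD_WP_BIT);
    }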

2022-04-06 08:52:36

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

On Tue, Apr 05, 2022 at 03:49:12PM -0700, Andrew Morton wrote:
> On Tue, 5 Apr 2022 18:42:43 -0400 Peter Xu <[email protected]> wrote:
>
> > On Tue, Apr 05, 2022 at 03:16:16PM -0700, Andrew Morton wrote:
> > > On Mon, 4 Apr 2022 21:46:23 -0400 Peter Xu <[email protected]> wrote:
> > >
> > > > This is v8 of the series to add shmem+hugetlbfs support for userfaultfd
> > > > write protection.
> > >
> > > Various compilation catastrophes with x86_64 allnoconfig. I poked at
> > > the include ordering for a while but other things quickly became more
> > > attractive ;)
> >
> > Sorry about that. I still don't know what's the problem, but I'll give it
> > a shot soon.
> >
> > I think I only tried out with the new configs but not all the rest configs.
> > I thought there're some bot looking after that one, from which I used to
> > receive build reports. And IIRC I fixed some build issues in early versions
> > from those reports. Maybe I was wrong..
> >
> > Any more hints on the latter?
>
> `make allnoconfig'?
>

Ah! I thought when you mentioned "other things" you meant there're other
more severe issues... :)

For the allnoconfig, could you try with the attached quick fixup (upon
patch "mm/uffd: PTE_MARKER_UFFD_WP")?

That works for me on x86/arm, but I'm still trying out some other configs.

Thanks,

--
Peter Xu


Attachments:
0001-fixup-mm-uffd-PTE_MARKER_UFFD_WP.patch (921 bytes)

2022-04-06 11:43:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

On Tue, 5 Apr 2022 18:42:43 -0400 Peter Xu <[email protected]> wrote:

> On Tue, Apr 05, 2022 at 03:16:16PM -0700, Andrew Morton wrote:
> > On Mon, 4 Apr 2022 21:46:23 -0400 Peter Xu <[email protected]> wrote:
> >
> > > This is v8 of the series to add shmem+hugetlbfs support for userfaultfd
> > > write protection.
> >
> > Various compilation catastrophes with x86_64 allnoconfig. I poked at
> > the include ordering for a while but other things quickly became more
> > attractive ;)
>
> Sorry about that. I still don't know what's the problem, but I'll give it
> a shot soon.
>
> I think I only tried out with the new configs but not all the rest configs.
> I thought there're some bot looking after that one, from which I used to
> receive build reports. And IIRC I fixed some build issues in early versions
> from those reports. Maybe I was wrong..
>
> Any more hints on the latter?

`make allnoconfig'?

2022-04-06 11:51:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

On Mon, 4 Apr 2022 21:46:23 -0400 Peter Xu <[email protected]> wrote:

> This is v8 of the series to add shmem+hugetlbfs support for userfaultfd
> write protection.

Various compilation catastrophes with x86_64 allnoconfig. I poked at
the include ordering for a while but other things quickly became more
attractive ;)

2022-04-06 12:08:13

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

On Tue, Apr 05, 2022 at 03:16:16PM -0700, Andrew Morton wrote:
> On Mon, 4 Apr 2022 21:46:23 -0400 Peter Xu <[email protected]> wrote:
>
> > This is v8 of the series to add shmem+hugetlbfs support for userfaultfd
> > write protection.
>
> Various compilation catastrophes with x86_64 allnoconfig. I poked at
> the include ordering for a while but other things quickly became more
> attractive ;)

Sorry about that. I still don't know what's the problem, but I'll give it
a shot soon.

I think I only tried out with the new configs but not all the rest configs.
I thought there're some bot looking after that one, from which I used to
receive build reports. And IIRC I fixed some build issues in early versions
from those reports. Maybe I was wrong..

Any more hints on the latter?

Thanks,

--
Peter Xu

2022-04-06 12:15:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

On Tue, 5 Apr 2022 19:02:32 -0400 Peter Xu <[email protected]> wrote:

> For the allnoconfig, could you try with the attached quick fixup (upon
> patch "mm/uffd: PTE_MARKER_UFFD_WP")?

Works for me, thanks.

2022-04-06 14:48:33

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 10/23] mm/shmem: Handle uffd-wp during fork()

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on hnaz-mm/master]
[cannot apply to arnd-asm-generic/master linus/master linux/master v5.18-rc1 next-20220405]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
base: https://github.com/hnaz/linux-mm master
config: ia64-buildonly-randconfig-r005-20220405 (https://download.01.org/0day-ci/archive/20220406/[email protected]/config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/49e56558a1f453907d2813e1ba94d91f9d102e73
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
git checkout 49e56558a1f453907d2813e1ba94d91f9d102e73
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from arch/ia64/include/asm/pgtable.h:153,
from include/linux/pgtable.h:6,
from arch/ia64/include/asm/uaccess.h:40,
from include/linux/uaccess.h:11,
from arch/ia64/include/asm/sections.h:11,
from include/linux/interrupt.h:21,
from include/linux/kernel_stat.h:9,
from mm/memory.c:42:
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
| ^~~~~~~
In file included from include/linux/mm_inline.h:9,
from mm/memory.c:44:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
include/linux/userfaultfd_k.h:260:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
260 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:261:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
261 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:261:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
261 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:261:38: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/mm_inline.h:10,
from mm/memory.c:44:
include/linux/swapops.h: At top level:
include/linux/swapops.h:289:20: error: conflicting types for 'is_pte_marker_entry'; have 'bool(swp_entry_t)' {aka '_Bool(swp_entry_t)'}
289 | static inline bool is_pte_marker_entry(swp_entry_t entry)
| ^~~~~~~~~~~~~~~~~~~
In file included from include/linux/mm_inline.h:9,
from mm/memory.c:44:
include/linux/userfaultfd_k.h:260:16: note: previous implicit declaration of 'is_pte_marker_entry' with type 'int()'
260 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
In file included from include/linux/mm_inline.h:10,
from mm/memory.c:44:
include/linux/swapops.h:294:26: error: conflicting types for 'pte_marker_get'; have 'pte_marker(swp_entry_t)' {aka 'long unsigned int(swp_entry_t)'}
294 | static inline pte_marker pte_marker_get(swp_entry_t entry)
| ^~~~~~~~~~~~~~
In file included from include/linux/mm_inline.h:9,
from mm/memory.c:44:
include/linux/userfaultfd_k.h:261:14: note: previous implicit declaration of 'pte_marker_get' with type 'int()'
261 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
>> mm/memory.c:1238:1: warning: no previous prototype for 'vma_needs_copy' [-Wmissing-prototypes]
1238 | vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
| ^~~~~~~~~~~~~~
In file included from include/linux/mm_inline.h:9,
from mm/memory.c:44:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
include/linux/userfaultfd_k.h:262:1: error: control reaches end of non-void function [-Werror=return-type]
262 | }
| ^
cc1: some warnings being treated as errors


vim +/vma_needs_copy +1238 mm/memory.c

  1231
  1232  /*
  1233   * Return true if the vma needs to copy the pgtable during this fork(). Return
  1234   * false when we can speed up fork() by allowing lazy page faults later until
  1235   * when the child accesses the memory range.
  1236   */
  1237  bool
> 1238  vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
  1239  {
  1240          /*
  1241           * Always copy pgtables when dst_vma has uffd-wp enabled even if it's
  1242           * file-backed (e.g. shmem). Because when uffd-wp is enabled, pgtable
  1243           * contains uffd-wp protection information, that's something we can't
  1244           * retrieve from page cache, and skip copying will lose those info.
  1245           */
  1246          if (userfaultfd_wp(dst_vma))
  1247                  return true;
  1248
  1249          if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP))
  1250                  return true;
  1251
  1252          if (src_vma->anon_vma)
  1253                  return true;
  1254
  1255          /*
  1256           * Don't copy ptes where a page fault will fill them correctly. Fork
  1257           * becomes much lighter when there are big shared or private readonly
  1258           * mappings. The tradeoff is that copy_page_range is more efficient
  1259           * than faulting.
  1260           */
  1261          return false;
  1262  }
  1263

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-04-06 16:19:42

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 10/23] mm/shmem: Handle uffd-wp during fork()

I assume the report below is the same issue as the allnoconfig one Andrew
reported; IOW, after the fixup is squashed, this report along with the other
one on patch 4 should go away. Let me know otherwise..

Thanks,

On Wed, Apr 06, 2022 at 02:16:56PM +0800, kernel test robot wrote:
> All warnings (new ones prefixed by >>):
>
> In file included from arch/ia64/include/asm/pgtable.h:153,
> from include/linux/pgtable.h:6,
> from arch/ia64/include/asm/uaccess.h:40,
> from include/linux/uaccess.h:11,
> from arch/ia64/include/asm/sections.h:11,
> from include/linux/interrupt.h:21,
> from include/linux/kernel_stat.h:9,
> from mm/memory.c:42:
> arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
> arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
> 127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
> | ^~~~~~~
> In file included from include/linux/mm_inline.h:9,
> from mm/memory.c:44:
> include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
> include/linux/userfaultfd_k.h:260:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
> 260 | return is_pte_marker_entry(entry) &&
> | ^~~~~~~~~~~~~~~~~~~
> include/linux/userfaultfd_k.h:261:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
> 261 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> | ^~~~~~~~~~~~~~
> include/linux/userfaultfd_k.h:261:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
> 261 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> | ^~~~~~~~~~~~~~~~~~
> include/linux/userfaultfd_k.h:261:38: note: each undeclared identifier is reported only once for each function it appears in
> In file included from include/linux/mm_inline.h:10,
> from mm/memory.c:44:
> include/linux/swapops.h: At top level:
> include/linux/swapops.h:289:20: error: conflicting types for 'is_pte_marker_entry'; have 'bool(swp_entry_t)' {aka '_Bool(swp_entry_t)'}
> 289 | static inline bool is_pte_marker_entry(swp_entry_t entry)
> | ^~~~~~~~~~~~~~~~~~~
> In file included from include/linux/mm_inline.h:9,
> from mm/memory.c:44:
> include/linux/userfaultfd_k.h:260:16: note: previous implicit declaration of 'is_pte_marker_entry' with type 'int()'
> 260 | return is_pte_marker_entry(entry) &&
> | ^~~~~~~~~~~~~~~~~~~
> In file included from include/linux/mm_inline.h:10,
> from mm/memory.c:44:
> include/linux/swapops.h:294:26: error: conflicting types for 'pte_marker_get'; have 'pte_marker(swp_entry_t)' {aka 'long unsigned int(swp_entry_t)'}
> 294 | static inline pte_marker pte_marker_get(swp_entry_t entry)
> | ^~~~~~~~~~~~~~
> In file included from include/linux/mm_inline.h:9,
> from mm/memory.c:44:
> include/linux/userfaultfd_k.h:261:14: note: previous implicit declaration of 'pte_marker_get' with type 'int()'
> 261 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> | ^~~~~~~~~~~~~~

--
Peter Xu

2022-04-06 16:41:04

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 04/23] mm/uffd: PTE_MARKER_UFFD_WP

Hi Peter,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]
[also build test ERROR on arnd-asm-generic/master linus/master v5.18-rc1 next-20220405]
[cannot apply to linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
base: https://github.com/hnaz/linux-mm master
config: ia64-buildonly-randconfig-r005-20220405 (https://download.01.org/0day-ci/archive/20220406/[email protected]/config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/5baea0f03d347e5b13fff03af297858f1247d51a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
git checkout 5baea0f03d347e5b13fff03af297858f1247d51a
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from arch/ia64/include/asm/pgtable.h:153,
from include/linux/pgtable.h:6,
from arch/ia64/include/asm/uaccess.h:40,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:33,
from arch/ia64/kernel/sys_ia64.c:10:
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
| ^~~~~~~
In file included from include/linux/hugetlb.h:14,
from arch/ia64/kernel/sys_ia64.c:21:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
>> include/linux/userfaultfd_k.h:243:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
243 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:244:38: note: each undeclared identifier is reported only once for each function it appears in
arch/ia64/kernel/sys_ia64.c: At top level:
arch/ia64/kernel/sys_ia64.c:71:1: warning: no previous prototype for 'ia64_getpriority' [-Wmissing-prototypes]
71 | ia64_getpriority (int which, int who)
| ^~~~~~~~~~~~~~~~
arch/ia64/kernel/sys_ia64.c:85:1: warning: no previous prototype for 'sys_getpagesize' [-Wmissing-prototypes]
85 | sys_getpagesize (void)
| ^~~~~~~~~~~~~~~
arch/ia64/kernel/sys_ia64.c:91:1: warning: no previous prototype for 'ia64_brk' [-Wmissing-prototypes]
91 | ia64_brk (unsigned long brk)
| ^~~~~~~~
arch/ia64/kernel/sys_ia64.c:161:1: warning: no previous prototype for 'ia64_mremap' [-Wmissing-prototypes]
161 | ia64_mremap (unsigned long addr, unsigned long old_len, unsigned long new_len, unsigned long flags,
| ^~~~~~~~~~~
cc1: some warnings being treated as errors
--
In file included from arch/ia64/include/asm/pgtable.h:153,
from include/linux/pgtable.h:6,
from arch/ia64/include/asm/uaccess.h:40,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from kernel/fork.c:23:
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
| ^~~~~~~
In file included from include/linux/hugetlb.h:14,
from kernel/fork.c:52:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
>> include/linux/userfaultfd_k.h:243:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
243 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:244:38: note: each undeclared identifier is reported only once for each function it appears in
kernel/fork.c: At top level:
kernel/fork.c:163:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes]
163 | void __weak arch_release_task_struct(struct task_struct *tsk)
| ^~~~~~~~~~~~~~~~~~~~~~~~
kernel/fork.c:853:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes]
853 | void __init __weak arch_task_cache_init(void) { }
| ^~~~~~~~~~~~~~~~~~~~
kernel/fork.c:948:12: warning: no previous prototype for 'arch_dup_task_struct' [-Wmissing-prototypes]
948 | int __weak arch_dup_task_struct(struct task_struct *dst,
| ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
--
In file included from arch/ia64/include/asm/pgtable.h:153,
from include/linux/pgtable.h:6,
from include/linux/mm.h:29,
from kernel/sysctl.c:23:
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
| ^~~~~~~
In file included from include/linux/hugetlb.h:14,
from kernel/sysctl.c:46:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
>> include/linux/userfaultfd_k.h:243:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
243 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:244:38: note: each undeclared identifier is reported only once for each function it appears in
cc1: some warnings being treated as errors
--
In file included from arch/ia64/include/asm/pgtable.h:153,
from include/linux/pgtable.h:6,
from include/linux/mm.h:29,
from mm/vmscan.c:15:
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
| ^~~~~~~
In file included from include/linux/hugetlb.h:14,
from include/linux/migrate.h:8,
from mm/vmscan.c:44:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
>> include/linux/userfaultfd_k.h:243:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
243 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:244:38: note: each undeclared identifier is reported only once for each function it appears in
In file included from mm/vmscan.c:57:
include/linux/swapops.h: At top level:
>> include/linux/swapops.h:289:20: error: conflicting types for 'is_pte_marker_entry'; have 'bool(swp_entry_t)' {aka '_Bool(swp_entry_t)'}
289 | static inline bool is_pte_marker_entry(swp_entry_t entry)
| ^~~~~~~~~~~~~~~~~~~
In file included from include/linux/hugetlb.h:14,
from include/linux/migrate.h:8,
from mm/vmscan.c:44:
include/linux/userfaultfd_k.h:243:16: note: previous implicit declaration of 'is_pte_marker_entry' with type 'int()'
243 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
In file included from mm/vmscan.c:57:
>> include/linux/swapops.h:294:26: error: conflicting types for 'pte_marker_get'; have 'pte_marker(swp_entry_t)' {aka 'long unsigned int(swp_entry_t)'}
294 | static inline pte_marker pte_marker_get(swp_entry_t entry)
| ^~~~~~~~~~~~~~
In file included from include/linux/hugetlb.h:14,
from include/linux/migrate.h:8,
from mm/vmscan.c:44:
include/linux/userfaultfd_k.h:244:14: note: previous implicit declaration of 'pte_marker_get' with type 'int()'
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
cc1: some warnings being treated as errors
--
In file included from arch/ia64/include/asm/pgtable.h:153,
from include/linux/pgtable.h:6,
from arch/ia64/include/asm/uaccess.h:40,
from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:33,
from fs/proc/meminfo.c:2:
arch/ia64/include/asm/mmu_context.h: In function 'reload_context':
arch/ia64/include/asm/mmu_context.h:127:48: warning: variable 'old_rr4' set but not used [-Wunused-but-set-variable]
127 | unsigned long rr0, rr1, rr2, rr3, rr4, old_rr4;
| ^~~~~~~
In file included from include/linux/hugetlb.h:14,
from fs/proc/meminfo.c:6:
include/linux/userfaultfd_k.h: In function 'pte_marker_entry_uffd_wp':
>> include/linux/userfaultfd_k.h:243:16: error: implicit declaration of function 'is_pte_marker_entry' [-Werror=implicit-function-declaration]
243 | return is_pte_marker_entry(entry) &&
| ^~~~~~~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:14: error: implicit declaration of function 'pte_marker_get' [-Werror=implicit-function-declaration]
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~
>> include/linux/userfaultfd_k.h:244:38: error: 'PTE_MARKER_UFFD_WP' undeclared (first use in this function)
244 | (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
| ^~~~~~~~~~~~~~~~~~
include/linux/userfaultfd_k.h:244:38: note: each undeclared identifier is reported only once for each function it appears in
fs/proc/meminfo.c: At top level:
fs/proc/meminfo.c:22:28: warning: no previous prototype for 'arch_report_meminfo' [-Wmissing-prototypes]
22 | void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
| ^~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors


vim +/is_pte_marker_entry +243 include/linux/userfaultfd_k.h

240
241 static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
242 {
> 243 return is_pte_marker_entry(entry) &&
> 244 (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
245 }
246

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-04-06 16:54:01

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 15/23] mm/hugetlb: Handle pte markers in page faults

Hi Peter,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]
[cannot apply to arnd-asm-generic/master linus/master linux/master v5.18-rc1 next-20220406]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
base: https://github.com/hnaz/linux-mm master
config: s390-randconfig-r044-20220406 (https://download.01.org/0day-ci/archive/20220406/[email protected]/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/e7e7aaec811e2817cd169f0cc1d8f81bdf1f05c3
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
git checkout e7e7aaec811e2817cd169f0cc1d8f81bdf1f05c3
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

mm/hugetlb.c: In function 'hugetlb_fault':
>> mm/hugetlb.c:5678:13: error: implicit declaration of function 'huge_pte_none_mostly'; did you mean 'pte_none_mostly'? [-Werror=implicit-function-declaration]
5678 | if (huge_pte_none_mostly(entry)) {
| ^~~~~~~~~~~~~~~~~~~~
| pte_none_mostly
cc1: some warnings being treated as errors


vim +5678 mm/hugetlb.c

5616
5617 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
5618 unsigned long address, unsigned int flags)
5619 {
5620 pte_t *ptep, entry;
5621 spinlock_t *ptl;
5622 vm_fault_t ret;
5623 u32 hash;
5624 pgoff_t idx;
5625 struct page *page = NULL;
5626 struct page *pagecache_page = NULL;
5627 struct hstate *h = hstate_vma(vma);
5628 struct address_space *mapping;
5629 int need_wait_lock = 0;
5630 unsigned long haddr = address & huge_page_mask(h);
5631
5632 ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
5633 if (ptep) {
5634 /*
5635 * Since we hold no locks, ptep could be stale. That is
5636 * OK as we are only making decisions based on content and
5637 * not actually modifying content here.
5638 */
5639 entry = huge_ptep_get(ptep);
5640 if (unlikely(is_hugetlb_entry_migration(entry))) {
5641 migration_entry_wait_huge(vma, mm, ptep);
5642 return 0;
5643 } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
5644 return VM_FAULT_HWPOISON_LARGE |
5645 VM_FAULT_SET_HINDEX(hstate_index(h));
5646 }
5647
5648 /*
5649 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
5650 * until finished with ptep. This serves two purposes:
5651 * 1) It prevents huge_pmd_unshare from being called elsewhere
5652 * and making the ptep no longer valid.
5653 * 2) It synchronizes us with i_size modifications during truncation.
5654 *
5655 * ptep could have already be assigned via huge_pte_offset. That
5656 * is OK, as huge_pte_alloc will return the same value unless
5657 * something has changed.
5658 */
5659 mapping = vma->vm_file->f_mapping;
5660 i_mmap_lock_read(mapping);
5661 ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
5662 if (!ptep) {
5663 i_mmap_unlock_read(mapping);
5664 return VM_FAULT_OOM;
5665 }
5666
5667 /*
5668 * Serialize hugepage allocation and instantiation, so that we don't
5669 * get spurious allocation failures if two CPUs race to instantiate
5670 * the same page in the page cache.
5671 */
5672 idx = vma_hugecache_offset(h, vma, haddr);
5673 hash = hugetlb_fault_mutex_hash(mapping, idx);
5674 mutex_lock(&hugetlb_fault_mutex_table[hash]);
5675
5676 entry = huge_ptep_get(ptep);
5677 /* PTE markers should be handled the same way as none pte */
> 5678 if (huge_pte_none_mostly(entry)) {
5679 ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
5680 entry, flags);
5681 goto out_mutex;
5682 }
5683
5684 ret = 0;
5685
5686 /*
5687 * entry could be a migration/hwpoison entry at this point, so this
5688 * check prevents the kernel from going below assuming that we have
5689 * an active hugepage in pagecache. This goto expects the 2nd page
5690 * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
5691 * properly handle it.
5692 */
5693 if (!pte_present(entry))
5694 goto out_mutex;
5695
5696 /*
5697 * If we are going to COW/unshare the mapping later, we examine the
5698 * pending reservations for this page now. This will ensure that any
5699 * allocations necessary to record that reservation occur outside the
5700 * spinlock. For private mappings, we also lookup the pagecache
5701 * page now as it is used to determine if a reservation has been
5702 * consumed.
5703 */
5704 if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
5705 !huge_pte_write(entry)) {
5706 if (vma_needs_reservation(h, vma, haddr) < 0) {
5707 ret = VM_FAULT_OOM;
5708 goto out_mutex;
5709 }
5710 /* Just decrements count, does not deallocate */
5711 vma_end_reservation(h, vma, haddr);
5712
5713 if (!(vma->vm_flags & VM_MAYSHARE))
5714 pagecache_page = hugetlbfs_pagecache_page(h,
5715 vma, haddr);
5716 }
5717
5718 ptl = huge_pte_lock(h, mm, ptep);
5719
5720 /* Check for a racing update before calling hugetlb_wp() */
5721 if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
5722 goto out_ptl;
5723
5724 /* Handle userfault-wp first, before trying to lock more pages */
5725 if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
5726 (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
5727 struct vm_fault vmf = {
5728 .vma = vma,
5729 .address = haddr,
5730 .real_address = address,
5731 .flags = flags,
5732 };
5733
5734 spin_unlock(ptl);
5735 if (pagecache_page) {
5736 unlock_page(pagecache_page);
5737 put_page(pagecache_page);
5738 }
5739 mutex_unlock(&hugetlb_fault_mutex_table[hash]);
5740 i_mmap_unlock_read(mapping);
5741 return handle_userfault(&vmf, VM_UFFD_WP);
5742 }
5743
5744 /*
5745 * hugetlb_wp() requires page locks of pte_page(entry) and
5746 * pagecache_page, so here we need take the former one
5747 * when page != pagecache_page or !pagecache_page.
5748 */
5749 page = pte_page(entry);
5750 if (page != pagecache_page)
5751 if (!trylock_page(page)) {
5752 need_wait_lock = 1;
5753 goto out_ptl;
5754 }
5755
5756 get_page(page);
5757
5758 if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
5759 if (!huge_pte_write(entry)) {
5760 ret = hugetlb_wp(mm, vma, address, ptep, flags,
5761 pagecache_page, ptl);
5762 goto out_put_page;
5763 } else if (likely(flags & FAULT_FLAG_WRITE)) {
5764 entry = huge_pte_mkdirty(entry);
5765 }
5766 }
5767 entry = pte_mkyoung(entry);
5768 if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
5769 flags & FAULT_FLAG_WRITE))
5770 update_mmu_cache(vma, haddr, ptep);
5771 out_put_page:
5772 if (page != pagecache_page)
5773 unlock_page(page);
5774 put_page(page);
5775 out_ptl:
5776 spin_unlock(ptl);
5777
5778 if (pagecache_page) {
5779 unlock_page(pagecache_page);
5780 put_page(pagecache_page);
5781 }
5782 out_mutex:
5783 mutex_unlock(&hugetlb_fault_mutex_table[hash]);
5784 i_mmap_unlock_read(mapping);
5785 /*
5786 * Generally it's safe to hold refcount during waiting page lock. But
5787 * here we just wait to defer the next page fault to avoid busy loop and
5788 * the page is not used after unlocked before returning from the current
5789 * page fault. So we are safe from accessing freed page, even if we wait
5790 * here without taking refcount.
5791 */
5792 if (need_wait_lock)
5793 wait_on_page_locked(page);
5794 return ret;
5795 }
5796

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-04-06 17:01:53

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 15/23] mm/hugetlb: Handle pte markers in page faults

On Wed, Apr 06, 2022 at 09:37:00PM +0800, kernel test robot wrote:
> Hi Peter,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on hnaz-mm/master]
> [cannot apply to arnd-asm-generic/master linus/master linux/master v5.18-rc1 next-20220406]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
> base: https://github.com/hnaz/linux-mm master
> config: s390-randconfig-r044-20220406 (https://download.01.org/0day-ci/archive/20220406/[email protected]/config)
> compiler: s390-linux-gcc (GCC) 11.2.0
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/intel-lab-lkp/linux/commit/e7e7aaec811e2817cd169f0cc1d8f81bdf1f05c3
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20220405-100136
> git checkout e7e7aaec811e2817cd169f0cc1d8f81bdf1f05c3
> # save the config file to linux build tree
> mkdir build_dir
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All errors (new ones prefixed by >>):
>
> mm/hugetlb.c: In function 'hugetlb_fault':
> >> mm/hugetlb.c:5678:13: error: implicit declaration of function 'huge_pte_none_mostly'; did you mean 'pte_none_mostly'? [-Werror=implicit-function-declaration]
> 5678 | if (huge_pte_none_mostly(entry)) {
> | ^~~~~~~~~~~~~~~~~~~~
> | pte_none_mostly
> cc1: some warnings being treated as errors

Ah, the s390 stub was forgotten again, sorry. I hope someday s390 will
start to include asm-generic/hugetlb.h like all the other archs, because my
gut feeling is that's how it should work.. or the dir should be renamed to
asm-generic-without-s390/. :(

The expected fix patch is attached (to be squashed into patch "mm: Introduce
PTE_MARKER swap entry").

Thanks,

--
Peter Xu


Attachments:
0001-fixup-mm-Introduce-PTE_MARKER-swap-entry.patch (817.00 B)

2022-04-12 13:56:26

by Alistair Popple

[permalink] [raw]
Subject: Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

Hi Peter,

I noticed this while reviewing the next patch in the series. I think you need to
add CONFIG_PTE_MARKER to the below as well:

#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
defined(CONFIG_DEVICE_PRIVATE)
static inline int non_swap_entry(swp_entry_t entry)
{
return swp_type(entry) >= MAX_SWAPFILES;
}
#else
static inline int non_swap_entry(swp_entry_t entry)
{
return 0;
}
#endif

Otherwise marker entries will be treated as swap entries, which is wrong, for
example, in swapin_walk_pmd_entry(), since marker entries are no longer
considered pte_none().
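
Concretely, I mean something like the below (a sketch, not a tested patch):

#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
	defined(CONFIG_DEVICE_PRIVATE) || defined(CONFIG_PTE_MARKER)
static inline int non_swap_entry(swp_entry_t entry)
{
	return swp_type(entry) >= MAX_SWAPFILES;
}
#else
static inline int non_swap_entry(swp_entry_t entry)
{
	return 0;
}
#endif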

- Alistair

Peter Xu <[email protected]> writes:

> This patch introduces a new swap entry type called PTE_MARKER. It can be
> installed for any pte that maps a file-backed memory when the pte is
> temporarily zapped, so as to maintain per-pte information.
>
> The information kept in the pte is called a "marker". Here we define the
> marker as "unsigned long" just to match pgoff_t, however it will only work if
> it still fits in swp_offset(), which is e.g. currently 58 bits on x86_64.
>
> A new config CONFIG_PTE_MARKER is introduced too; it's off by default. A bunch
> of helpers are defined altogether to service the rest of the pte marker code.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> include/asm-generic/hugetlb.h | 9 ++++
> include/linux/swap.h | 15 ++++++-
> include/linux/swapops.h | 78 +++++++++++++++++++++++++++++++++++
> mm/Kconfig | 6 +++
> 4 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 8e1e6244a89d..f39cad20ffc6 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -2,6 +2,9 @@
> #ifndef _ASM_GENERIC_HUGETLB_H
> #define _ASM_GENERIC_HUGETLB_H
>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +
> static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
> {
> return mk_pte(page, pgprot);
> @@ -80,6 +83,12 @@ static inline int huge_pte_none(pte_t pte)
> }
> #endif
>
> +/* Please refer to comments above pte_none_mostly() for the usage */
> +static inline int huge_pte_none_mostly(pte_t pte)
> +{
> + return huge_pte_none(pte) || is_pte_marker(pte);
> +}
> +
> #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
> static inline pte_t huge_pte_wrprotect(pte_t pte)
> {
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7daae5a4b3e1..5553189d0215 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -55,6 +55,19 @@ static inline int current_is_kswapd(void)
> * actions on faults.
> */
>
> +/*
> + * PTE markers are used to persist information onto PTEs that are mapped with
> + * file-backed memories. As its name "PTE" hints, it should only be applied to
> + * the leaves of pgtables.
> + */
> +#ifdef CONFIG_PTE_MARKER
> +#define SWP_PTE_MARKER_NUM 1
> +#define SWP_PTE_MARKER (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
> + SWP_MIGRATION_NUM + SWP_DEVICE_NUM)
> +#else
> +#define SWP_PTE_MARKER_NUM 0
> +#endif
> +
> /*
> * Unaddressable device memory support. See include/linux/hmm.h and
> * Documentation/vm/hmm.rst. Short description is we need struct pages for
> @@ -107,7 +120,7 @@ static inline int current_is_kswapd(void)
>
> #define MAX_SWAPFILES \
> ((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
> - SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
> + SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
>
> /*
> * Magic header for a swap area. The first part of the union is
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index 32d517a28969..7a00627845f0 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -274,6 +274,84 @@ static inline int is_readable_migration_entry(swp_entry_t entry)
>
> #endif
>
> +typedef unsigned long pte_marker;
> +
> +#define PTE_MARKER_MASK (0)
> +
> +#ifdef CONFIG_PTE_MARKER
> +
> +static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
> +{
> + return swp_entry(SWP_PTE_MARKER, marker);
> +}
> +
> +static inline bool is_pte_marker_entry(swp_entry_t entry)
> +{
> + return swp_type(entry) == SWP_PTE_MARKER;
> +}
> +
> +static inline pte_marker pte_marker_get(swp_entry_t entry)
> +{
> + return swp_offset(entry) & PTE_MARKER_MASK;
> +}
> +
> +static inline bool is_pte_marker(pte_t pte)
> +{
> + return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
> +}
> +
> +#else /* CONFIG_PTE_MARKER */
> +
> +static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
> +{
> + /* This should never be called if !CONFIG_PTE_MARKER */
> + WARN_ON_ONCE(1);
> + return swp_entry(0, 0);
> +}
> +
> +static inline bool is_pte_marker_entry(swp_entry_t entry)
> +{
> + return false;
> +}
> +
> +static inline pte_marker pte_marker_get(swp_entry_t entry)
> +{
> + return 0;
> +}
> +
> +static inline bool is_pte_marker(pte_t pte)
> +{
> + return false;
> +}
> +
> +#endif /* CONFIG_PTE_MARKER */
> +
> +static inline pte_t make_pte_marker(pte_marker marker)
> +{
> + return swp_entry_to_pte(make_pte_marker_entry(marker));
> +}
> +
> +/*
> + * This is a special version to check pte_none() just to cover the case when
> + * the pte is a pte marker. It existed because in many cases the pte marker
> + * should be seen as a none pte; it's just that we have stored some information
> + * onto the none pte so it becomes not-none any more.
> + *
> + * It should be used when the pte is file-backed, ram-based and backing
> + * userspace pages, like shmem. It is not needed upon pgtables that do not
> + * support pte markers at all. For example, it's not needed on anonymous
> + * memory, kernel-only memory (including when the system is during-boot),
> + * non-ram based generic file-system. It's fine to be used even there, but the
> + * extra pte marker check will be pure overhead.
> + *
> + * For systems configured with !CONFIG_PTE_MARKER this will be automatically
> + * optimized to pte_none().
> + */
> +static inline int pte_none_mostly(pte_t pte)
> +{
> + return pte_none(pte) || is_pte_marker(pte);
> +}
> +
> static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
> {
> struct page *p = pfn_to_page(swp_offset(entry));
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 034d87953600..a1688b9314b2 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -909,6 +909,12 @@ config ANON_VMA_NAME
> area from being merged with adjacent virtual memory areas due to the
> difference in their name.
>
> +config PTE_MARKER
> + bool "Marker PTEs support"
> +
> + help
> + Allows to create marker PTEs for file-backed memory.
> +
> source "mm/damon/Kconfig"
>
> endmenu



2022-04-12 19:52:03

by Alistair Popple

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

Looks ok to me now and should work for all architectures.

Reviewed-by: Alistair Popple <[email protected]>

Peter Xu <[email protected]> writes:

> We used to check against none pte in finish_fault(), with the assumption
> that the orig_pte is always none pte.
>
> This change prepares us to be able to call do_fault() on !none ptes. For
> example, we should allow that to happen for pte marker so that we can restore
> information out of the pte markers.
>
> Let's change the "pte_none" check into detecting changes since we fetched
> orig_pte. One trivial thing to take care of here is that, when pmd==NULL
> for the pgtable, we may not initialize orig_pte at all in handle_pte_fault().
>
> By default orig_pte will be all zeros however the problem is not all
> architectures are using all-zeros for a none pte. pte_clear() will be the
> right thing to use here so that we'll always have a valid orig_pte value
> for the whole handle_pte_fault() call.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/memory.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3f396241a7db..b1af996b09ca 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,7 +4241,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> vmf->address, &vmf->ptl);
> ret = 0;
> /* Re-check under ptl */
> - if (likely(pte_none(*vmf->pte)))
> + if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
> do_set_pte(vmf, page, vmf->address);
> else
> ret = VM_FAULT_NOPAGE;
> @@ -4709,6 +4709,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> * concurrent faults and from rmap lookups.
> */
> vmf->pte = NULL;
> + /*
> + * Always initialize orig_pte. This matches with below
> + * code to have orig_pte to be the none pte if pte==NULL.
> + * This makes the rest code to be always safe to reference
> + * it, e.g. in finish_fault() we'll detect pte changes.
> + */
> + pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
> } else {
> /*
> * If a huge pmd materialized under us just retry later. Use



2022-04-12 23:17:07

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 02/23] mm: Teach core mm about pte markers

Hi, Alistair,

On Tue, Apr 12, 2022 at 11:22:01AM +1000, Alistair Popple wrote:
> I've been reviewing existing pte_none() call sites and noticed the following:
>
> From khugepaged_scan_pmd():
>
> pte_t pteval = *_pte;
> if (is_swap_pte(pteval)) {
> if (++unmapped <= khugepaged_max_ptes_swap) {
> /*
> * Always be strict with uffd-wp
> * enabled swap entries. Please see
> * comment below for pte_uffd_wp().
> */
> if (pte_swp_uffd_wp(pteval)) {
> result = SCAN_PTE_UFFD_WP;
> goto out_unmap;
> }
> continue;
> } else {
> result = SCAN_EXCEED_SWAP_PTE;
> count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> goto out_unmap;
> }
> }
> if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> if (!userfaultfd_armed(vma) &&
> ++none_or_zero <= khugepaged_max_ptes_none) {
> continue;
> } else {
> result = SCAN_EXCEED_NONE_PTE;
> count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> goto out_unmap;
> }
> }
>
> I think the above could encounter a marker PTE, right? So the behaviour would
> be wrong in that case. As I understand things the is_swap_pte() path will be
> taken rather than pte_none(), but in the absence of any special handling
> shouldn't marker PTEs mostly be treated as pte_none() here?
>
> I think you need to s/pte_none/pte_none_mostly/ here and swap the order of
> conditionals around.

Isn't khugepaged_scan_pmd() only for anonymous?

The shmem side is covered by khugepaged_scan_file(), imho. We definitely
don't want to collapse shmem vma ranges that have uffd-wp registered, and
that's actually handled explicitly in "mm/khugepaged: Don't recycle vma
pgtable if uffd-wp registered". Please feel free to have a look.

Thanks,

--
Peter Xu

2022-04-12 23:17:18

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

On Tue, Apr 12, 2022 at 11:07:56AM +1000, Alistair Popple wrote:
> Hi Peter,

Hi, Alistair,

>
> I noticed this while reviewing the next patch in the series. I think you need to
> add CONFIG_PTE_MARKER to the below as well:
>
> #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
> defined(CONFIG_DEVICE_PRIVATE)
> static inline int non_swap_entry(swp_entry_t entry)
> {
> return swp_type(entry) >= MAX_SWAPFILES;
> }
> #else
> static inline int non_swap_entry(swp_entry_t entry)
> {
> return 0;
> }
> #endif
>
> Otherwise marker entries will be treated as swap entries, which is wrong, for
> example, in swapin_walk_pmd_entry(), since marker entries are no longer
> considered pte_none().

Thanks for the comment, that makes sense.

Instead of adding PTE_MARKER into this equation, I'm taking a step back and
wondering why we need to bother with non_swap_entry() at all, given that
MAX_SWAPFILES is already defined with proper knowledge of all these bits.

#define MAX_SWAPFILES \
((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)

So, I agree with your analysis, but instead of adding PTE_MARKER, what do
you think about dropping that complexity altogether (possibly with a
standalone patch)?

---8<---
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d356ab4047f7..5af852b68805 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -387,18 +387,10 @@ static inline void num_poisoned_pages_inc(void)
}
#endif

-#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
- defined(CONFIG_DEVICE_PRIVATE)
static inline int non_swap_entry(swp_entry_t entry)
{
return swp_type(entry) >= MAX_SWAPFILES;
}
-#else
-static inline int non_swap_entry(swp_entry_t entry)
-{
- return 0;
-}
-#endif

#endif /* CONFIG_MMU */
#endif /* _LINUX_SWAPOPS_H */
---8<---

Thanks,

--
Peter Xu

2022-04-12 23:17:30

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

On Tue, Apr 12, 2022 at 12:05:41PM +1000, Alistair Popple wrote:
> Looks ok to me now and should work for all architectures.
>
> Reviewed-by: Alistair Popple <[email protected]>

Thanks again for taking a look. The series is already queued in -mm, but
I'll pick up these R-bs if there is a new version.

--
Peter Xu

2022-04-12 23:30:32

by Alistair Popple

[permalink] [raw]
Subject: Re: [PATCH v8 02/23] mm: Teach core mm about pte markers

I've been reviewing existing pte_none() call sites and noticed the following:

From khugepaged_scan_pmd():

pte_t pteval = *_pte;
if (is_swap_pte(pteval)) {
if (++unmapped <= khugepaged_max_ptes_swap) {
/*
* Always be strict with uffd-wp
* enabled swap entries. Please see
* comment below for pte_uffd_wp().
*/
if (pte_swp_uffd_wp(pteval)) {
result = SCAN_PTE_UFFD_WP;
goto out_unmap;
}
continue;
} else {
result = SCAN_EXCEED_SWAP_PTE;
count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
goto out_unmap;
}
}
if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
if (!userfaultfd_armed(vma) &&
++none_or_zero <= khugepaged_max_ptes_none) {
continue;
} else {
result = SCAN_EXCEED_NONE_PTE;
count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
goto out_unmap;
}
}

I think the above could encounter a marker PTE, right? So the behaviour would
be wrong in that case. As I understand things the is_swap_pte() path will be
taken rather than pte_none(), but in the absence of any special handling
shouldn't marker PTEs mostly be treated as pte_none() here?

I think you need to s/pte_none/pte_none_mostly/ here and swap the order of
conditionals around.
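
Roughly like this (a sketch of what I mean, untested):

	if (pte_none_mostly(pteval) || is_zero_pfn(pte_pfn(pteval))) {
		/* marker PTEs now take the "none" path */
		if (!userfaultfd_armed(vma) &&
		    ++none_or_zero <= khugepaged_max_ptes_none)
			continue;
		result = SCAN_EXCEED_NONE_PTE;
		count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
		goto out_unmap;
	} else if (is_swap_pte(pteval)) {
		/* real swap entries keep the existing handling */
		if (++unmapped <= khugepaged_max_ptes_swap) {
			if (pte_swp_uffd_wp(pteval)) {
				result = SCAN_PTE_UFFD_WP;
				goto out_unmap;
			}
			continue;
		}
		result = SCAN_EXCEED_SWAP_PTE;
		count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
		goto out_unmap;
	}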

- Alistair

Peter Xu <[email protected]> writes:

> This patch still does not use pte markers in any way; however, it teaches
> the core mm about the pte marker idea.
>
> For example, handle_pte_marker() is introduced that will parse and handle all
> the pte marker faults.
>
> Many of the places are more about commenting things up - so that we know
> there's the possibility of a pte marker showing up, and why we don't need
> special code for those cases.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> fs/userfaultfd.c | 10 ++++++----
> mm/filemap.c | 5 +++++
> mm/hmm.c | 2 +-
> mm/memcontrol.c | 8 ++++++--
> mm/memory.c | 23 +++++++++++++++++++++++
> mm/mincore.c | 3 ++-
> mm/mprotect.c | 3 +++
> 7 files changed, 46 insertions(+), 8 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index aa0c47cb0d16..8b4a94f5a238 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -249,9 +249,10 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
>
> /*
> * Lockless access: we're in a wait_event so it's ok if it
> - * changes under us.
> + * changes under us. PTE markers should be handled the same as none
> + * ptes here.
> */
> - if (huge_pte_none(pte))
> + if (huge_pte_none_mostly(pte))
> ret = true;
> if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
> ret = true;
> @@ -330,9 +331,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
> pte = pte_offset_map(pmd, address);
> /*
> * Lockless access: we're in a wait_event so it's ok if it
> - * changes under us.
> + * changes under us. PTE markers should be handled the same as none
> + * ptes here.
> */
> - if (pte_none(*pte))
> + if (pte_none_mostly(*pte))
> ret = true;
> if (!pte_write(*pte) && (reason & VM_UFFD_WP))
> ret = true;
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 3a5ffb5587cd..ef77dae8c28d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3382,6 +3382,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> vmf->pte += xas.xa_index - last_pgoff;
> last_pgoff = xas.xa_index;
>
> + /*
> + * NOTE: If there're PTE markers, we'll leave them to be
> + * handled in the specific fault path, and it'll prohibit the
> + * fault-around logic.
> + */
> if (!pte_none(*vmf->pte))
> goto unlock;
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index af71aac3140e..3fd3242c5e50 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> pte_t pte = *ptep;
> uint64_t pfn_req_flags = *hmm_pfn;
>
> - if (pte_none(pte)) {
> + if (pte_none_mostly(pte)) {
> required_fault =
> hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
> if (required_fault)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7a08737bac4b..08af97c73f0f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5644,10 +5644,14 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
>
> if (pte_present(ptent))
> page = mc_handle_present_pte(vma, addr, ptent);
> + else if (pte_none_mostly(ptent))
> + /*
> + * PTE markers should be treated as a none pte here, separated
> + * from other swap handling below.
> + */
> + page = mc_handle_file_pte(vma, addr, ptent);
> else if (is_swap_pte(ptent))
> page = mc_handle_swap_pte(vma, ptent, &ent);
> - else if (pte_none(ptent))
> - page = mc_handle_file_pte(vma, addr, ptent);
>
> if (!page && !ent.val)
> return ret;
> diff --git a/mm/memory.c b/mm/memory.c
> index 2c5d1bb4694f..3f396241a7db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -100,6 +100,8 @@ struct page *mem_map;
> EXPORT_SYMBOL(mem_map);
> #endif
>
> +static vm_fault_t do_fault(struct vm_fault *vmf);
> +
> /*
> * A number of key systems in x86 including ioremap() rely on the assumption
> * that high_memory defines the upper bound on direct map memory, then end
> @@ -1415,6 +1417,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> if (!should_zap_page(details, page))
> continue;
> rss[mm_counter(page)]--;
> + } else if (is_pte_marker_entry(entry)) {
> + /* By default, simply drop all pte markers when zap */
> } else if (is_hwpoison_entry(entry)) {
> if (!should_zap_cows(details))
> continue;
> @@ -3555,6 +3559,23 @@ static inline bool should_try_to_free_swap(struct page *page,
> page_count(page) == 2;
> }
>
> +static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
> +{
> + swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
> + unsigned long marker = pte_marker_get(entry);
> +
> + /*
> + * PTE markers should always be with file-backed memories, and the
> + * marker should never be empty. If anything weird happened, the best
> + * thing to do is to kill the process along with its mm.
> + */
> + if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
> + return VM_FAULT_SIGBUS;
> +
> + /* TODO: handle pte markers */
> + return 0;
> +}
> +
> /*
> * We enter with non-exclusive mmap_lock (to exclude vma changes,
> * but allow concurrent faults), and pte mapped but not yet locked.
> @@ -3592,6 +3613,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
> } else if (is_hwpoison_entry(entry)) {
> ret = VM_FAULT_HWPOISON;
> + } else if (is_pte_marker_entry(entry)) {
> + ret = handle_pte_marker(vmf);
> } else {
> print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
> ret = VM_FAULT_SIGBUS;
> diff --git a/mm/mincore.c b/mm/mincore.c
> index f4f627325e12..fa200c14185f 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -122,7 +122,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> for (; addr != end; ptep++, addr += PAGE_SIZE) {
> pte_t pte = *ptep;
>
> - if (pte_none(pte))
> + /* We need to do cache lookup too for pte markers */
> + if (pte_none_mostly(pte))
> __mincore_unmapped_range(addr, addr + PAGE_SIZE,
> vma, vec);
> else if (pte_present(pte))
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 56060acdabd3..709a6f73b764 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -188,6 +188,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> newpte = pte_swp_mksoft_dirty(newpte);
> if (pte_swp_uffd_wp(oldpte))
> newpte = pte_swp_mkuffd_wp(newpte);
> + } else if (is_pte_marker_entry(entry)) {
> + /* Skip it, the same as none pte */
> + continue;
> } else {
> newpte = oldpte;
> }



2022-04-13 02:33:07

by Alistair Popple

[permalink] [raw]
Subject: Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

Peter Xu <[email protected]> writes:

> On Tue, Apr 12, 2022 at 11:07:56AM +1000, Alistair Popple wrote:
>> Hi Peter,
>
> Hi, Alistair,
>
>>
>> I noticed this while reviewing the next patch in the series. I think you need to
>> add CONFIG_PTE_MARKER to the below as well:
>>
>> #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
>> defined(CONFIG_DEVICE_PRIVATE)
>> static inline int non_swap_entry(swp_entry_t entry)
>> {
>> return swp_type(entry) >= MAX_SWAPFILES;
>> }
>> #else
>> static inline int non_swap_entry(swp_entry_t entry)
>> {
>> return 0;
>> }
>> #endif
>>
>> Otherwise marker entries will be treated as swap entries, which is wrong, for
>> example, in swapin_walk_pmd_entry(), since marker entries are no longer
>> considered pte_none().
>
> Thanks for the comment, that makes sense.
>
> Instead of adding PTE_MARKER into this equation, I'm taking a step back and
> wondering why we need to bother with non_swap_entry() at all, given that
> MAX_SWAPFILES is already defined with proper knowledge of all these bits.

I was going to suggest it was to help the compiler optimise the non-swap
entry code away. But I just tested and it makes no difference to the .text
section size either way, so I think your suggestion is good, unless that
isn't true for other architecture/compiler combinations (I only tried
gcc-10.2.1 on x86_64).

That's a possibility, because the optimisation isn't obvious to me at least.

non_swap_entry() is equivalent to:

(entry.val >> SWP_TYPE_SHIFT) >= MAX_SWAPFILES;
(entry.val >> (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)) >= (1<<5);
(entry.val >> (BITS_PER_LONG - 1 - 5)) >= (1<<5);
(entry.val >> 58) >= (1<<5);

Where entry.val is a long. So from that alone it's not obvious this could be
optimised away, because nothing there implies entry.val != (1<<63), which
would make the conditional true. But there's a lot of inlining going on in
the creation of swap entries which I didn't trace, so something must end up
implying entry.val < (1<<63).
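
As a toy userspace check of that arithmetic (my own sketch; it assumes
64-bit longs, MAX_SWAPFILES_SHIFT == 5, and approximates MAX_SWAPFILES as
(1<<5) the same way as above):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	const int bits_per_long = 64;
	const int bits_per_xa_value = bits_per_long - 1;	/* 63 */
	const int max_swapfiles_shift = 5;
	const int swp_type_shift = bits_per_xa_value - max_swapfiles_shift;

	assert(swp_type_shift == 58);

	/*
	 * Any entry built with type < 32 keeps (val >> 58) below 32, so the
	 * non_swap_entry() comparison is false for all real swap entries.
	 */
	unsigned long type = 31;
	unsigned long offset = (1UL << swp_type_shift) - 1;
	unsigned long val = (type << swp_type_shift) | offset;

	printf("type=%lu non_swap=%d\n", val >> swp_type_shift,
	       (int)((val >> swp_type_shift) >= (1UL << max_swapfiles_shift)));
	return 0;
}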

>
> #define MAX_SWAPFILES \
> ((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
> SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
>
> So, I agree with your analysis, but instead of adding PTE_MARKER, what do
> you think about dropping that complexity altogether (possibly with a
> standalone patch)?
>
> ---8<---
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index d356ab4047f7..5af852b68805 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -387,18 +387,10 @@ static inline void num_poisoned_pages_inc(void)
> }
> #endif
>
> -#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
> - defined(CONFIG_DEVICE_PRIVATE)
> static inline int non_swap_entry(swp_entry_t entry)
> {
> return swp_type(entry) >= MAX_SWAPFILES;
> }
> -#else
> -static inline int non_swap_entry(swp_entry_t entry)
> -{
> - return 0;
> -}
> -#endif
>
> #endif /* CONFIG_MMU */
> #endif /* _LINUX_SWAPOPS_H */
> ---8<---
>
> Thanks,



2022-04-13 17:51:05

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

Hi,

On 05.04.2022 03:48, Peter Xu wrote:
> We used to check against none pte in finish_fault(), with the assumption
> that the orig_pte is always none pte.
>
> This change prepares us to be able to call do_fault() on !none ptes. For
> example, we should allow that to happen for pte marker so that we can restore
> information out of the pte markers.
>
> Let's change the "pte_none" check into detecting changes since we fetched
> orig_pte. One trivial thing to take care of here is that, when pmd==NULL
> for the pgtable, we may not initialize orig_pte at all in handle_pte_fault().
>
> By default orig_pte will be all zeros however the problem is not all
> architectures are using all-zeros for a none pte. pte_clear() will be the
> right thing to use here so that we'll always have a valid orig_pte value
> for the whole handle_pte_fault() call.
>
> Signed-off-by: Peter Xu <[email protected]>

This patch landed in today's linux-next (next-20220413) as commit
fa6009949163 ("mm: check against orig_pte for finish_fault()").
Unfortunately it causes serious system instability on some ARM 32bit
machines. I've observed it on all tested boards (various Samsung Exynos
based, Raspberry Pi 3b and 4b, even QEMU's virt 32bit machine) when the
kernel was compiled with multi_v7_defconfig.

Here is a crash log from QEMU's ARM 32bit virt machine:

8<--- cut here ---
Unable to handle kernel paging request at virtual address e093263c
[e093263c] *pgd=42083811, *pte=00000000, *ppte=00000000
Internal error: Oops: 807 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 37 Comm: kworker/u4:0 Not tainted
5.18.0-rc2-00176-gfa6009949163 #11684
Hardware name: Generic DT based system
PC is at cpu_ca15_set_pte_ext+0x4c/0x58
LR is at handle_mm_fault+0x46c/0xbb0
pc : [<c031bdec>]    lr : [<c0478144>]    psr: 40000013
sp : e0931df8  ip : e0931e54  fp : c26a8000
r10: 00000081  r9 : c2230880  r8 : 00000000
r7 : 00000081  r6 : beffffed  r5 : c267f000  r4 : c2230880
r3 : 00000000  r2 : 00000000  r1 : 00000040  r0 : e0931e3c
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4020406a  DAC: 00000051
Register r0 information: 2-page vmalloc region starting at 0xe0930000
allocated at kernel_clone+0x8c/0x3a8
Register r1 information: non-paged memory
Register r2 information: NULL pointer
Register r3 information: NULL pointer
Register r4 information: slab task_struct start c2230880 pointer offset 0
Register r5 information: slab vm_area_struct start c267f000 pointer offset 0
Register r6 information: non-paged memory
Register r7 information: non-paged memory
Register r8 information: NULL pointer
Register r9 information: slab task_struct start c2230880 pointer offset 0
Register r10 information: non-paged memory
Register r11 information: slab mm_struct start c26a8000 pointer offset 0
size 168
Register r12 information: 2-page vmalloc region starting at 0xe0930000
allocated at kernel_clone+0x8c/0x3a8
Process kworker/u4:0 (pid: 37, stack limit = 0x(ptrval))
Stack: (0xe0931df8 to 0xe0932000)
...
---[ end trace 0000000000000000 ]---
CAN device driver interface
bgmac_bcma: Broadcom 47xx GBit MAC driver loaded
e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
igb: Intel(R) Gigabit Ethernet Network Driver
igb: Copyright (c) 2007-2014 Intel Corporation.
pegasus: Pegasus/Pegasus II USB Ethernet driver
usbcore: registered new interface driver pegasus
usbcore: registered new interface driver asix
usbcore: registered new interface driver ax88179_178a
usbcore: registered new interface driver cdc_ether
usbcore: registered new interface driver smsc75xx
usbcore: registered new interface driver smsc95xx
usbcore: registered new interface driver net1080
usbcore: registered new interface driver cdc_subset
usbcore: registered new interface driver zaurus
usbcore: registered new interface driver cdc_ncm
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
ehci-platform: EHCI generic platform driver
ehci-omap: OMAP-EHCI Host Controller driver
ehci-orion: EHCI orion driver
SPEAr-ehci: EHCI SPEAr driver
ehci-st: EHCI STMicroelectronics driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-pci: OHCI PCI platform driver
ohci-platform: OHCI generic platform driver
SPEAr-ohci: OHCI SPEAr driver
ohci-st: OHCI STMicroelectronics driver
usbcore: registered new interface driver usb-storage
rtc-pl031 9010000.pl031: registered as rtc0
rtc-pl031 9010000.pl031: setting system clock to 2022-04-13T13:49:19 UTC
(1649857759)
i2c_dev: i2c /dev entries driver
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
Synopsys Designware Multimedia Card Interface Driver
sdhci-pltfm: SDHCI platform and OF driver helper
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
NET: Registered PF_INET6 protocol family
Segment Routing with IPv6
In-situ OAM (IOAM) with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered PF_PACKET protocol family
can: controller area network core
NET: Registered PF_CAN protocol family
can: raw protocol
can: broadcast manager protocol
can: netlink gateway - max_hops=1
Key type dns_resolver registered
ThumbEE CPU extension supported.
Registering SWP/SWPB emulation handler
Loading compiled-in X.509 certificates
input: gpio-keys as /devices/platform/gpio-keys/input/input0
uart-pl011 9000000.pl011: no DMA platform data
EXT4-fs (vda): mounted filesystem with ordered data mode. Quota mode:
disabled.
VFS: Mounted root (ext4 filesystem) readonly on device 254:0.
devtmpfs: mounted
Freeing unused kernel image (initmem) memory: 2048K
Run /sbin/init as init process
  with arguments:
    /sbin/init
  with environment:
    HOME=/
    TERM=linux
8<--- cut here ---
Unable to handle kernel paging request at virtual address e082662c
[e082662c] *pgd=42083811, *pte=00000000, *ppte=00000000
Internal error: Oops: 807 [#2] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D
5.18.0-rc2-00176-gfa6009949163 #11684
Hardware name: Generic DT based system
PC is at cpu_ca15_set_pte_ext+0x4c/0x58
LR is at handle_mm_fault+0x46c/0xbb0
pc : [<c031bdec>]    lr : [<c0478144>]    psr: 40000013
sp : e0825de8  ip : e0825e44  fp : c213e000
r10: 00000081  r9 : c20e0000  r8 : 00000000
r7 : 00000081  r6 : befffff1  r5 : c2695000  r4 : c20e0000
r3 : 00000000  r2 : 00000000  r1 : 00000040  r0 : e0825e2c
Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4020406a  DAC: 00000051
Register r0 information: 2-page vmalloc region starting at 0xe0824000
allocated at kernel_clone+0x8c/0x3a8
Register r1 information: non-paged memory
Register r2 information: NULL pointer
Register r3 information: NULL pointer
Register r4 information: slab task_struct start c20e0000 pointer offset 0
Register r5 information: slab vm_area_struct start c2695000 pointer offset 0
Register r6 information: non-paged memory
Register r7 information: non-paged memory
Register r8 information: NULL pointer
Register r9 information: slab task_struct start c20e0000 pointer offset 0
Register r10 information: non-paged memory
Register r11 information: slab mm_struct start c213e000 pointer offset 0
size 168
Register r12 information: 2-page vmalloc region starting at 0xe0824000
allocated at kernel_clone+0x8c/0x3a8
Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
Stack: (0xe0825de8 to 0xe0826000)
...
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D
5.18.0-rc2-00176-gfa6009949163 #11684
Hardware name: Generic DT based system
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x40/0x4c
 dump_stack_lvl from do_handle_IPI+0x2c4/0x2fc
 do_handle_IPI from ipi_handler+0x18/0x20
 ipi_handler from handle_percpu_devid_irq+0x8c/0x1e0
 handle_percpu_devid_irq from generic_handle_domain_irq+0x40/0x84
 generic_handle_domain_irq from gic_handle_irq+0x88/0xa8
 gic_handle_irq from generic_handle_arch_irq+0x34/0x44
 generic_handle_arch_irq from call_with_stack+0x18/0x20
 call_with_stack from __irq_svc+0x98/0xb0
Exception stack(0xe0869f50 to 0xe0869f98)
9f40:                                     00009ddc 00000000 00000001
c031be20
9f60: c20e5d80 c1b48f20 c1904d10 c1904d6c c183e9e8 c1b47971 00000000
00000000
9f80: c1904e24 e0869fa0 c0307b74 c0307b78 60000113 ffffffff
 __irq_svc from arch_cpu_idle+0x38/0x3c
 arch_cpu_idle from default_idle_call+0x3c/0xb8
 default_idle_call from do_idle+0x1f8/0x298
 do_idle from cpu_startup_entry+0x18/0x1c
 cpu_startup_entry from 0x40301780
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---


> ---
> mm/memory.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 3f396241a7db..b1af996b09ca 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,7 +4241,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  					       vmf->address, &vmf->ptl);
>  	ret = 0;
>  	/* Re-check under ptl */
> -	if (likely(pte_none(*vmf->pte)))
> +	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
>  		do_set_pte(vmf, page, vmf->address);
>  	else
>  		ret = VM_FAULT_NOPAGE;
> @@ -4709,6 +4709,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  		 * concurrent faults and from rmap lookups.
>  		 */
>  		vmf->pte = NULL;
> +		/*
> +		 * Always initialize orig_pte. This matches with below
> +		 * code to have orig_pte to be the none pte if pte==NULL.
> +		 * This makes the rest code to be always safe to reference
> +		 * it, e.g. in finish_fault() we'll detect pte changes.
> +		 */
> +		pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
>  	} else {
>  		/*
>  		 * If a huge pmd materialized under us just retry later. Use

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-04-14 09:08:14

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

On Wed, Apr 13, 2022 at 10:30:33AM +1000, Alistair Popple wrote:
> Peter Xu <[email protected]> writes:
>
> > On Tue, Apr 12, 2022 at 11:07:56AM +1000, Alistair Popple wrote:
> >> Hi Peter,
> >
> > Hi, Alistair,
> >
> >>
> >> I noticed this while reviewing the next patch in the series. I think you need to
> >> add CONFIG_PTE_MARKER to the below as well:
> >>
> >> #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
> >>     defined(CONFIG_DEVICE_PRIVATE)
> >> static inline int non_swap_entry(swp_entry_t entry)
> >> {
> >> 	return swp_type(entry) >= MAX_SWAPFILES;
> >> }
> >> #else
> >> static inline int non_swap_entry(swp_entry_t entry)
> >> {
> >> 	return 0;
> >> }
> >> #endif
> >>
> >> Otherwise marker entries will be treated as swap entries, which is wrong for
> >> example in swapin_walk_pmd_entry() as marker entries are no longer considered
> >> pte_none().
> >
> > Thanks for the comment, that makes sense.
> >
> > Instead of adding PTE_MARKER into this equation, I'm going backward and
> > wondering why we need to bother with non_swap_entry() at all if
> > MAX_SWAPFILES is already defined with proper knowledge of all these bits.
>
> I was going to suggest it was to help the compiler optimise the non-swap entry
> code away. But I just tested and it makes no difference in .text section size
> either way so I think your suggestion is good unless that isn't true for other
> architecture/compiler combinations (I only tried gcc-10.2.1 and x86_64).
>
> That's a possibility because the optimisation isn't obvious to me at least.
>
> non_swap_entry() is equivalent to:
>
> (entry.val >> SWP_TYPE_SHIFT) >= MAX_SWAPFILES;
> (entry.val >> (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)) >= (1<<5);
> (entry.val >> (BITS_PER_LONG - 1 - 5)) >= (1<<5);
> (entry.val >> 58) >= (1<<5);
>
> Where entry.val is a long. So from that alone it's not obvious this could be
> optimised away, because nothing there rules out entry.val >= (1<<63), which
> would make the conditional true. But there's a lot of inlining going on in the
> creation of swap entries which I didn't trace, so something must end up
> implying entry.val < (1<<63).

I think my point was that we check non_swap_entry() with the pre-assumption
that it's a swap entry, which means we're already on the slow path, after
we've parsed the pte entry and found it's not present.

So I'm doubting how much the optimization (even if ultimately applicable)
could help in reality, not to mention that it'll only have an effect when
none of the configs are set, so it's a micro-optimization on a slow path
in a rare config setup.

For any sane modern host I'd expect CONFIG_MIGRATION to be set at the very
least.. and that invalidates any potential optimization we're discussing here.
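
Roughly, the simplification I have in mind is the below (a sketch only to
show the direction, not the actual patch I'll post):

static inline int non_swap_entry(swp_entry_t entry)
{
	/*
	 * MAX_SWAPFILES already subtracts whichever special entry types
	 * (migration, hwpoison, device, ...) are compiled in, so this
	 * comparison stays correct for every config combination.
	 */
	return swp_type(entry) >= MAX_SWAPFILES;
}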

Let me post a patch for this and move the discussion there. Thanks a lot
for looking into it.

--
Peter Xu

2022-04-14 15:45:41

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

On Wed, Apr 13, 2022 at 04:03:28PM +0200, Marek Szyprowski wrote:
> Hi,

Hi, Marek,

>
> On 05.04.2022 03:48, Peter Xu wrote:
> > We used to check against none pte in finish_fault(), with the assumption
> > that the orig_pte is always none pte.
> >
> > This change prepares us to be able to call do_fault() on !none ptes. For
> > example, we should allow that to happen for pte marker so that we can restore
> > information out of the pte markers.
> >
> > Let's change the "pte_none" check into detecting changes since we fetched
> > orig_pte. One trivial thing to take care of here is, when pmd==NULL for
> > the pgtable we may not initialize orig_pte at all in handle_pte_fault().
> >
> > By default orig_pte will be all zeros however the problem is not all
> > architectures are using all-zeros for a none pte. pte_clear() will be the
> > right thing to use here so that we'll always have a valid orig_pte value
> > for the whole handle_pte_fault() call.
> >
> > Signed-off-by: Peter Xu <[email protected]>
>
> This patch landed in today's linux-next (next-20220413) as commit fa6009949163
> ("mm: check against orig_pte for finish_fault()"). Unfortunately it
> causes serious system instability on some ARM 32bit machines. I've
> observed it on all tested boards (various Samsung Exynos based,
> Raspberry Pi 3b and 4b, even QEMU's virt 32bit machine) when the kernel
> was compiled from multi_v7_defconfig.

Thanks for the report.

>
> Here is a crash log from QEMU's ARM 32bit virt machine:
>
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address e093263c
> [e093263c] *pgd=42083811, *pte=00000000, *ppte=00000000
> Internal error: Oops: 807 [#1] SMP ARM
> Modules linked in:
> CPU: 1 PID: 37 Comm: kworker/u4:0 Not tainted
> 5.18.0-rc2-00176-gfa6009949163 #11684
> Hardware name: Generic DT based system
> PC is at cpu_ca15_set_pte_ext+0x4c/0x58
> LR is at handle_mm_fault+0x46c/0xbb0

I have a feeling that for some reason pte_clear() isn't working right
there when it's applied to a kernel stack variable on arm32. I'm a total
newbie to arm32, so what I'm reading is this:

https://people.kernel.org/linusw/arm32-page-tables

Especially:

https://dflund.se/~triad/images/classic-mmu-page-table.jpg

That matches what I read in arm32's proc-v7-2level.S, in the comment
above cpu_v7_set_pte_ext:

* - ptep - pointer to level 2 translation table entry
* (hardware version is stored at +2048 bytes) <----------

So it seems to me that arm32 needs to store some metadata at offset 0x800
from any pte_t* pointer passed to pte_clear(), which means the pointer must
point into a real pgtable or it'll write to random places in the kernel, am
I correct?

Does that mean all pte_*() operations upon a kernel stack variable will be
wrong? I thought it could easily happen in the rest of mm too, but I haven't
checked much yet. Apparently the current code just happens to work fine on
arm32 and no such violation has occurred so far.

That does sound a bit tricky, IMHO, and I don't have an immediate solution
to make it less tricky.. though I do have a workaround in mind: simply
not calling pte_clear() on the stack variable.
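
To illustrate the hazard with a sketch (assuming the +2048 hardware-copy
layout described above; the snippet is illustrative, not from any patch):

	pte_t orig_pte;			/* lives on the kernel stack */

	/* BAD on arm32: set_pte_ext() also writes the hardware copy at
	 * (&orig_pte + 2048), i.e. a random stack address, which matches
	 * the Oops 807 at cpu_ca15_set_pte_ext above. */
	pte_clear(mm, addr, &orig_pte);

	/* OK everywhere: a plain assignment only touches the variable. */
	orig_pte = __pte(0);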

Would you try the attached patch as a replacement for the problematic one?
That is, we need to revert commit fa6009949163 and apply the new one.
Please let me know whether it solves the problem; so far I've only
compile-tested it, but I'll run some more tests to make sure the uffd-wp
scenarios keep working with the new version.

Thanks,

--
Peter Xu


Attachments:
(No filename) (3.54 kB)
0001-mm-Check-against-orig_pte-for-finish_fault-when-prop.patch (4.10 kB)

2022-04-15 00:03:14

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

Hi Peter,

On 13.04.2022 18:43, Peter Xu wrote:
> [...]
>
> Would you try the attached patch as a replacement for the problematic one?
> That is, we need to revert commit fa6009949163 and apply the new one.
> Please let me know whether it solves the problem; so far I've only
> compile-tested it, but I'll run some more tests to make sure the uffd-wp
> scenarios keep working with the new version.

I've reverted fa6009949163 and applied the attached patch on top of
linux-next (next-20220413). The ARM 32bit issues went away. :)

Feel free to add:

Reported-by: Marek Szyprowski <[email protected]>

Tested-by: Marek Szyprowski <[email protected]>

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-04-16 00:46:56

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

On Thu, Apr 14, 2022 at 09:51:01AM +0200, Marek Szyprowski wrote:
> Hi Peter,
>
> On 13.04.2022 18:43, Peter Xu wrote:
> > [...]
>
> I've reverted fa6009949163 and applied the attached patch on top of
> linux-next (next-20220413). The ARM 32bit issues went away. :)
>
> Feel free to add:
>
> Reported-by: Marek Szyprowski <[email protected]>
>
> Tested-by: Marek Szyprowski <[email protected]>

Thanks, Marek, for the fast feedback!

I've also verified it for the uffd-wp case: the whole series keeps
running as usual and nothing else shows up with the new patch swapped in.

Andrew, any suggestion on how we proceed with the replacement patch?
E.g. do you want me to post it separately to the list?

Please let me know your preference, thanks.

--
Peter Xu

2022-04-16 01:16:30

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

Hi,

On Mon, Apr 04, 2022 at 09:48:36PM -0400, Peter Xu wrote:
> We used to check against none pte in finish_fault(), with the assumption
> that the orig_pte is always none pte.
>
> This change prepares us to be able to call do_fault() on !none ptes. For
> example, we should allow that to happen for pte marker so that we can restore
> information out of the pte markers.
>
> Let's change the "pte_none" check into detecting changes since we fetched
> orig_pte. One trivial thing to take care of here is, when pmd==NULL for
> the pgtable we may not initialize orig_pte at all in handle_pte_fault().
>
> By default orig_pte will be all zeros however the problem is not all
> architectures are using all-zeros for a none pte. pte_clear() will be the
> right thing to use here so that we'll always have a valid orig_pte value
> for the whole handle_pte_fault() call.
>
> Signed-off-by: Peter Xu <[email protected]>

This patch crashes pretty much all arm images in linux-next. Reverting it
fixes the problem. Sample crash log and bisect results attached.

Guenter

---
[ 11.232343] 8<--- cut here ---
[ 11.232564] Unable to handle kernel paging request at virtual address 88016664
[ 11.232735] [88016664] *pgd=41cfd811, *pte=00000000, *ppte=00000000
[ 11.233128] Internal error: Oops: 807 [#1] ARM
[ 11.233385] CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0-rc2-next-20220414 #1
[ 11.233564] Hardware name: Generic DT based system
[ 11.233695] PC is at cpu_arm926_set_pte_ext+0x2c/0x40
[ 11.233863] LR is at handle_mm_fault+0x4b0/0x11a8
[ 11.233963] pc : [<8010e60c>] lr : [<802944ec>] psr: 00000113
[ 11.234080] sp : 88015e20 ip : 88015e7c fp : 00000492
[ 11.234179] r10: 00000000 r9 : 00000000 r8 : 81167e50
[ 11.234280] r7 : 00000000 r6 : 00000081 r5 : 7efffff1 r4 : 83034690
[ 11.234402] r3 : 00000043 r2 : 00000000 r1 : 00000000 r0 : 88016664
[ 11.234549] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 11.234691] Control: 00093177 Table: 40004000 DAC: 00000053
[ 11.234816] Register r0 information: non-paged memory
[ 11.235031] Register r1 information: NULL pointer
[ 11.235127] Register r2 information: NULL pointer
[ 11.235219] Register r3 information: non-paged memory
[ 11.235316] Register r4 information: slab vm_area_struct start 83034688 data offset 8 pointer offset 0 allocated at vm_area_alloc+0x20/0x5c
[ 11.235825] kmem_cache_alloc+0x1fc/0x21c
[ 11.235926] vm_area_alloc+0x20/0x5c
[ 11.236007] alloc_bprm+0xd0/0x298
[ 11.236082] kernel_execve+0x34/0x194
[ 11.236159] kernel_init+0x6c/0x138
[ 11.236235] ret_from_fork+0x14/0x3c
[ 11.236330] Register r5 information: non-paged memory
[ 11.236432] Register r6 information: non-paged memory
[ 11.236529] Register r7 information: NULL pointer
[ 11.236620] Register r8 information: non-slab/vmalloc memory
[ 11.236741] Register r9 information: NULL pointer
[ 11.236833] Register r10 information: NULL pointer
[ 11.236926] Register r11 information: non-paged memory
[ 11.237023] Register r12 information: 2-page vmalloc region starting at 0x88014000 allocated at kernel_clone+0xa0/0x440
[ 11.237253] Process swapper (pid: 1, stack limit = 0x88014000)
[ 11.237388] Stack: (0x88015e20 to 0x88016000)
[ 11.237518] 5e20: ffffffff fffffffe 81d29be0 00000000 a0000193 00000000 81d2a1e8 00007f7e
[ 11.237670] 5e40: 816580a8 83034690 00000cc0 0007efff 7efff000 7efffff1 00000081 83199fb8
[ 11.237814] 5e60: 83199fb8 00000000 00000000 00000000 00000000 00000000 00000000 0a363e34
[ 11.237957] 5e80: 88015ea4 83034690 7efffff1 00002017 00000081 81f4dd00 00001fb8 00000000
[ 11.238100] 5ea0: 00000492 8028d160 00000000 81d29be0 00000001 00002017 80deedcc 81d29be0
[ 11.238241] 5ec0: 00000000 81f4dd00 7efffff1 88015f38 81f4dd60 00002017 00000000 8028d64c
[ 11.238383] 5ee0: 88015f38 00000000 00000000 7efffff1 81f4dd00 00000000 00000001 00000000
[ 11.238524] 5f00: 00000011 82d80800 00000001 7efffff1 81f4dd00 00000011 7efffff1 0000000b
[ 11.238666] 5f20: 82d80800 802ca218 88015f38 00000000 00000000 000001d3 80e0b43c 0a363e34
[ 11.238808] 5f40: 00000ffc 82d80800 81d73140 81d29be0 0000000b 802cb390 81d7315b 802ca0bc
[ 11.238950] 5f60: 8110c940 0000000c 82d80800 81d73140 8110c8b0 8110c93c 00000000 00000000
[ 11.239091] 5f80: 00000000 802cbf44 81107820 8110c8b0 00000000 00000000 00000000 80b05400
[ 11.239234] 5fa0: 00000000 80b05394 00000000 801000f8 00000000 00000000 00000000 00000000
[ 11.239376] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 11.239518] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 11.239770] Code: e31300c0 03822e55 e3130003 13a02000 (e5802000)
[ 11.240097] ---[ end trace 0000000000000000 ]---
[ 11.240307] Kernel panic - not syncing: Fatal exception

--
# bad: [40354149f4d738dc3492d9998e45b3f02950369a] Add linux-next specific files for 20220414
# good: [ce522ba9ef7e2d9fb22a39eb3371c0c64e2a433e] Linux 5.18-rc2
git bisect start 'HEAD' 'v5.18-rc2'
# good: [0f52e407eccb0f7ed62fdd8907b0042f4195159e] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
git bisect good 0f52e407eccb0f7ed62fdd8907b0042f4195159e
# good: [22b1b3a579c91a6afa945711eac72ab740b8f8e4] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
git bisect good 22b1b3a579c91a6afa945711eac72ab740b8f8e4
# good: [cbb5c08b3182cb498f67fa547392191a1d5622dd] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git
git bisect good cbb5c08b3182cb498f67fa547392191a1d5622dd
# good: [2acd94b759428825f0e8835fa24ad22c7b5c0e2c] Merge branch 'for-next/kspp' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
git bisect good 2acd94b759428825f0e8835fa24ad22c7b5c0e2c
# bad: [d2d293faec99124d95590e88030ae3c8382fac7f] mm/shmem: persist uffd-wp bit across zapping for file-backed
git bisect bad d2d293faec99124d95590e88030ae3c8382fac7f
# good: [8cbcc910aec560e78e879cf82ed17e7e72d8a7d4] doc: update documentation for swap_activate and swap_rw
git bisect good 8cbcc910aec560e78e879cf82ed17e7e72d8a7d4
# good: [8c55a1ed1f9b95520b0307ba0ac6ff7f1aadfe9d] mm/page_alloc: simplify update of pgdat in wake_all_kswapds
git bisect good 8c55a1ed1f9b95520b0307ba0ac6ff7f1aadfe9d
# good: [3e68e467590511e2cf7f47194464a5512583f641] mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*
git bisect good 3e68e467590511e2cf7f47194464a5512583f641
# good: [3fb21f4e38824f4d8a183ffcccc03b357ad836d4] mm: mmap: register suitable readonly file vmas for khugepaged
git bisect good 3fb21f4e38824f4d8a183ffcccc03b357ad836d4
# bad: [fa600994916318341cf53e18769be547aa5975d2] mm: check against orig_pte for finish_fault()
git bisect bad fa600994916318341cf53e18769be547aa5975d2
# good: [1112411b72b5e9774897538260028a677d616779] fixup! mm: Introduce PTE_MARKER swap entry
git bisect good 1112411b72b5e9774897538260028a677d616779
# good: [1ae034d98f81a6cf8896b37c3dee9e099daeb3e7] mm: teach core mm about pte markers
git bisect good 1ae034d98f81a6cf8896b37c3dee9e099daeb3e7
# first bad commit: [fa600994916318341cf53e18769be547aa5975d2] mm: check against orig_pte for finish_fault()

2022-04-16 01:56:18

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

On Fri, Apr 15, 2022 at 07:21:12AM -0700, Guenter Roeck wrote:
> Hi,

Hi, Guenter,

>
> On Mon, Apr 04, 2022 at 09:48:36PM -0400, Peter Xu wrote:
> > [...]
>
> This patch crashes pretty much all arm images in linux-next. Reverting it
> fixes the problem. Sample crash log and bisect results attached.

Sorry for the issue, and thanks for reporting and bisecting.

It was already reported by Marek, and this problematic patch will be
replaced by this one (already updated in -mm; it may land in -next a bit
later, I think):

https://lore.kernel.org/all/Ylb9rXJyPm8%[email protected]/

Thanks,

>
> Guenter
>
> [...]

--
Peter Xu

2022-04-16 02:30:16

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

On Thu, Apr 14, 2022 at 01:57:40PM -0700, Andrew Morton wrote:
> On Thu, 14 Apr 2022 12:30:06 -0400 Peter Xu <[email protected]> wrote:
>
> > > Reported-by: Marek Szyprowski <[email protected]>
> > >
> > > Tested-by: Marek Szyprowski <[email protected]>
> >
> > Thanks, Marek, for the fast feedback!
>
> Certainly.
>
> > I've also verified it for the uffd-wp case: the whole series keeps
> > running as usual and nothing else shows up with the new patch swapped in.
> >
> > Andrew, any suggestion on how we proceed with the replacement patch?
> > E.g. do you want me to post it separately to the list?
>
> I turned it into an incremental diff and queued it against [03/23]:
>
> [...]

I verified the diff; it matches what I got. Thanks, Andrew.

--
Peter Xu

2022-04-16 02:35:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v8 03/23] mm: Check against orig_pte for finish_fault()

On Thu, 14 Apr 2022 12:30:06 -0400 Peter Xu <[email protected]> wrote:

> > Reported-by: Marek Szyprowski <[email protected]>
> >
> > Tested-by: Marek Szyprowski <[email protected]>
>
> Thanks, Marek, for the fast feedback!

Certainly.

> I've also verified it for the uffd-wp case: the whole series keeps
> running as usual and nothing else shows up with the new patch swapped in.
>
> Andrew, any suggestion on how we proceed with the replacement patch?
> E.g. do you want me to post it separately to the list?

I turned it into an incremental diff and queued it against [03/23]:

--- a/include/linux/mm_types.h~mm-check-against-orig_pte-for-finish_fault-fix
+++ a/include/linux/mm_types.h
@@ -814,6 +814,8 @@ typedef struct {
  * @FAULT_FLAG_UNSHARE: The fault is an unsharing request to unshare (and mark
  *			exclusive) a possibly shared anonymous page that is
  *			mapped R/O.
+ * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached.
+ *				We should only access orig_pte if this flag set.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -850,6 +852,7 @@ enum fault_flag {
 	FAULT_FLAG_INSTRUCTION = 1 << 8,
 	FAULT_FLAG_INTERRUPTIBLE = 1 << 9,
 	FAULT_FLAG_UNSHARE = 1 << 10,
+	FAULT_FLAG_ORIG_PTE_VALID = 1 << 11,
 };
 
 #endif /* _LINUX_MM_TYPES_H */
--- a/mm/memory.c~mm-check-against-orig_pte-for-finish_fault-fix
+++ a/mm/memory.c
@@ -4194,6 +4194,15 @@ void do_set_pte(struct vm_fault *vmf, st
 	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
 }
 
+static bool vmf_pte_changed(struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_ORIG_PTE_VALID) {
+		return !pte_same(*vmf->pte, vmf->orig_pte);
+	}
+
+	return !pte_none(*vmf->pte);
+}
+
 /**
  * finish_fault - finish page fault once we have prepared the page to fault
  *
@@ -4252,7 +4261,7 @@ vm_fault_t finish_fault(struct vm_fault
 					       vmf->address, &vmf->ptl);
 	ret = 0;
 	/* Re-check under ptl */
-	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
+	if (likely(!vmf_pte_changed(vmf)))
 		do_set_pte(vmf, page, vmf->address);
 	else
 		ret = VM_FAULT_NOPAGE;
@@ -4720,13 +4729,7 @@ static vm_fault_t handle_pte_fault(struc
 		 * concurrent faults and from rmap lookups.
 		 */
 		vmf->pte = NULL;
-		/*
-		 * Always initialize orig_pte. This matches with below
-		 * code to have orig_pte to be the none pte if pte==NULL.
-		 * This makes the rest code to be always safe to reference
-		 * it, e.g. in finish_fault() we'll detect pte changes.
-		 */
-		pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte);
+		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
 	} else {
 		/*
 		 * If a huge pmd materialized under us just retry later. Use
@@ -4750,6 +4753,7 @@ static vm_fault_t handle_pte_fault(struc
 		 */
 		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
 		vmf->orig_pte = *vmf->pte;
+		vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
 
 		/*
 		 * some architectures can have larger ptes than wordsize,
_

2022-04-19 18:11:39

by Alistair Popple

[permalink] [raw]
Subject: Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

Peter Xu <[email protected]> writes:

> This patch introduces a new swap entry type called PTE_MARKER. It can be
> installed for any pte that maps a file-backed memory when the pte is
> temporarily zapped, so as to maintain per-pte information.

Hi Peter,

Is there something I have missed that means PTE markers can only be used with
file-backed memory? Obviously that's all you care about for this patch series,
but if we needed to mark some anonymous PTE for special processing, is there
anything that would prevent us from using a PTE marker? Specifically I was thinking
about it in relation to this series:
<https://lore.kernel.org/linux-mm/[email protected]/>

> The information kept in the pte is called a "marker". Here we define the
> marker as "unsigned long" just to match pgoff_t; however, it will only work
> if it still fits in swp_offset(), which is e.g. currently 58 bits on x86_64.
>
> A new config CONFIG_PTE_MARKER is introduced too; it's by default off. A bunch
> of helpers are defined altogether to service the rest of the pte marker code.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> include/asm-generic/hugetlb.h | 9 ++++
> include/linux/swap.h | 15 ++++++-
> include/linux/swapops.h | 78 +++++++++++++++++++++++++++++++++++
> mm/Kconfig | 6 +++
> 4 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 8e1e6244a89d..f39cad20ffc6 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -2,6 +2,9 @@
>  #ifndef _ASM_GENERIC_HUGETLB_H
>  #define _ASM_GENERIC_HUGETLB_H
>
> +#include <linux/swap.h>
> +#include <linux/swapops.h>
> +
>  static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
>  {
>  	return mk_pte(page, pgprot);
> @@ -80,6 +83,12 @@ static inline int huge_pte_none(pte_t pte)
>  }
>  #endif
>
> +/* Please refer to comments above pte_none_mostly() for the usage */
> +static inline int huge_pte_none_mostly(pte_t pte)
> +{
> +	return huge_pte_none(pte) || is_pte_marker(pte);
> +}
> +
>  #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
>  static inline pte_t huge_pte_wrprotect(pte_t pte)
>  {
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7daae5a4b3e1..5553189d0215 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -55,6 +55,19 @@ static inline int current_is_kswapd(void)
>   * actions on faults.
>   */
>
> +/*
> + * PTE markers are used to persist information onto PTEs that are mapped with
> + * file-backed memories. As its name "PTE" hints, it should only be applied to
> + * the leaves of pgtables.
> + */
> +#ifdef CONFIG_PTE_MARKER
> +#define SWP_PTE_MARKER_NUM 1
> +#define SWP_PTE_MARKER     (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
> +			    SWP_MIGRATION_NUM + SWP_DEVICE_NUM)
> +#else
> +#define SWP_PTE_MARKER_NUM 0
> +#endif
> +
>  /*
>   * Unaddressable device memory support. See include/linux/hmm.h and
>   * Documentation/vm/hmm.rst. Short description is we need struct pages for
> @@ -107,7 +120,7 @@ static inline int current_is_kswapd(void)
>
>  #define MAX_SWAPFILES \
>  	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
> -	 SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
> +	 SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
>
>  /*
>   * Magic header for a swap area. The first part of the union is
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index 32d517a28969..7a00627845f0 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -274,6 +274,84 @@ static inline int is_readable_migration_entry(swp_entry_t entry)
>
>  #endif
>
> +typedef unsigned long pte_marker;
> +
> +#define PTE_MARKER_MASK (0)
> +
> +#ifdef CONFIG_PTE_MARKER
> +
> +static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
> +{
> +	return swp_entry(SWP_PTE_MARKER, marker);
> +}
> +
> +static inline bool is_pte_marker_entry(swp_entry_t entry)
> +{
> +	return swp_type(entry) == SWP_PTE_MARKER;
> +}
> +
> +static inline pte_marker pte_marker_get(swp_entry_t entry)
> +{
> +	return swp_offset(entry) & PTE_MARKER_MASK;
> +}
> +
> +static inline bool is_pte_marker(pte_t pte)
> +{
> +	return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
> +}
> +
> +#else /* CONFIG_PTE_MARKER */
> +
> +static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
> +{
> +	/* This should never be called if !CONFIG_PTE_MARKER */
> +	WARN_ON_ONCE(1);
> +	return swp_entry(0, 0);
> +}
> +
> +static inline bool is_pte_marker_entry(swp_entry_t entry)
> +{
> +	return false;
> +}
> +
> +static inline pte_marker pte_marker_get(swp_entry_t entry)
> +{
> +	return 0;
> +}
> +
> +static inline bool is_pte_marker(pte_t pte)
> +{
> +	return false;
> +}
> +
> +#endif /* CONFIG_PTE_MARKER */
> +
> +static inline pte_t make_pte_marker(pte_marker marker)
> +{
> +	return swp_entry_to_pte(make_pte_marker_entry(marker));
> +}
> +
> +/*
> + * This is a special version to check pte_none() just to cover the case when
> + * the pte is a pte marker. It existed because in many cases the pte marker
> + * should be seen as a none pte; it's just that we have stored some information
> + * onto the none pte so it becomes not-none any more.
> + *
> + * It should be used when the pte is file-backed, ram-based and backing
> + * userspace pages, like shmem. It is not needed upon pgtables that do not
> + * support pte markers at all. For example, it's not needed on anonymous
> + * memory, kernel-only memory (including when the system is during-boot),
> + * non-ram based generic file-system. It's fine to be used even there, but the
> + * extra pte marker check will be pure overhead.
> + *
> + * For systems configured with !CONFIG_PTE_MARKER this will be automatically
> + * optimized to pte_none().
> + */
> +static inline int pte_none_mostly(pte_t pte)
> +{
> +	return pte_none(pte) || is_pte_marker(pte);
> +}
> +
>  static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
>  {
>  	struct page *p = pfn_to_page(swp_offset(entry));
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 034d87953600..a1688b9314b2 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -909,6 +909,12 @@ config ANON_VMA_NAME
>  	  area from being merged with adjacent virtual memory areas due to the
>  	  difference in their name.
>
> +config PTE_MARKER
> +	bool "Marker PTEs support"
> +
> +	help
> +	  Allows to create marker PTEs for file-backed memory.
> +
>  source "mm/damon/Kconfig"
>
>  endmenu
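
To make the intended usage concrete, a minimal sketch of a hypothetical
caller (example_file_fault() and do_install_page() are made up for
illustration; only the pte marker helpers come from the patch above):

static vm_fault_t example_file_fault(struct vm_fault *vmf, pte_t pte)
{
	pte_marker marker = 0;

	/* First read out whatever information the marker carries... */
	if (is_pte_marker(pte))
		marker = pte_marker_get(pte_to_swp_entry(pte));

	/* ...then treat the marker pte just like an empty pte. */
	if (pte_none_mostly(pte))
		return do_install_page(vmf, marker);	/* hypothetical */

	return VM_FAULT_NOPAGE;
}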


Attachments:
(No filename) (6.55 kB)

2022-04-19 22:59:43

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

Hi Peter,

On Mon, Apr 04, 2022 at 09:49:29PM -0400, Peter Xu wrote:
> Enable PTE markers by default. On x86_64 it means it'll auto-enable
> PTE_MARKER_UFFD_WP as well.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
> mm/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6e7c2d59fa96..3eca34c864c5 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -911,12 +911,14 @@ config ANON_VMA_NAME
>
>  config PTE_MARKER
>  	bool "Marker PTEs support"
> +	default y
>
>  	help
>  	  Allows to create marker PTEs for file-backed memory.

make oldconfig just prompted me on these:

---
Marker PTEs support (PTE_MARKER) [Y/n/?] (NEW) ?

CONFIG_PTE_MARKER:

Allows to create marker PTEs for file-backed memory.

Symbol: PTE_MARKER [=y]
Type : bool
Defined at mm/Kconfig:1046
Prompt: Marker PTEs support
Location:
Main menu
-> Memory Management options
---

>  config PTE_MARKER_UFFD_WP
>  	bool "Marker PTEs support for userfaultfd write protection"
> +	default y
>  	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP

It's not possible to answer them without looking at the code.

But after looking at the code, I'm still not sure why it asks
me. Isn't this infrastructure code?

Wouldn't it make more sense to remove the prompt string and have
userfaultfd simply select those?

If this is too experimental to enable by default, a more reasonable
question for the user would be a "userfaultfd file support" option or
something, and have *that* select the marker code.

Thanks!

2022-04-20 11:26:13

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

On Tue, Apr 19, 2022 at 05:24:28PM -0400, Johannes Weiner wrote:
> On Tue, Apr 19, 2022 at 04:28:16PM -0400, Peter Xu wrote:
> > On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote:
> > > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote:
> > > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME
> > > Btw, this doesn't do much without userfaultfd being enabled in
> > > general, right?
> >
> > So far yes, but I'm thinking there can be other potential users of
> > PTE_MARKERS in the mm world. The closest discussion is on the swap read
> > failures and this patch proposed by Miaohe:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > So I hope we can still keep them around here under mm/ if possible; my
> > gut feeling is that's really where they should live..
>
> Agreed, mm/ seems a good fit for PTE_MARKER.
>
> If it's invisible and gets selected as needed, it's less of a concern,
> IMO. I'm somewhat worried about when and how the user-visible options
> show up right now, though...
>
> > > Would it make sense to have it next to 'config USERFAULTFD' as a
> > > sub-option?
> >
> > Yes another good question. :)
> >
> > IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new
> > syscall. The same goes for the rest of the uffd bits added since then, namely:
> >
> > - USERFAULTFD_WP
> > - USERFAULTFD_MINOR
> >
> > What I am thinking now is the other way round from your suggestion: whether
> > we should move most of them out, at least the _WP and _MINOR configs, into
> > mm/? Because IMHO they are really pure mm ideas and irrelevant to
> > syscalls and init.
>
> I'm thinking the MM submenu would probably be a better fit for all
> user-visible userfaultfd options, including the syscall. Like you say,
> it's an MM concept.
>
> But if moving the syscall knob out from init isn't popular, IMO it
> would be better to add the new WP option to init as well. This ensures
> that when somebody selects userfaultfd, they also see the relevant
> suboptions and don't have to chase them down across multiple submenus.
>
> Conversely, they should also have the necessary depend clauses so that
> suboptions aren't visible without the main feature. E.g. it asked me
> > for userfaultfd options even though I have CONFIG_USERFAULTFD=n.

Hmm, this is a bit weird... since we do have that dependency chain for
PTE_MARKER_UFFD_WP -> HAVE_ARCH_USERFAULTFD_WP -> USERFAULTFD:

in arch/x86/Kconfig:
config X86
	...
	select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD

in mm/Kconfig (with/without the "mm/uffd: Hide PTE_MARKER" patch applied):

config PTE_MARKER_UFFD_WP
	...
	depends on HAVE_ARCH_USERFAULTFD_WP

So logically if !USERFAULTFD we shouldn't see PTE_MARKER_UFFD_WP at all?

That's also what I got when I tried it out, for either !USERFAULTFD on x86
or any non-x86 platform (because there we always have
!HAVE_ARCH_USERFAULTFD_WP, regardless of USERFAULTFD). Though I could have
missed something..

>
> What do you think?

I don't have a strong preference here; it's okay with me if the preference
is to only put user-visible configs into mm/Kconfig. It's just that I see
we already have tons of user-invisible configs in mm/Kconfig, to list some:

config ARCH_HAS_HUGEPD
config MAPPING_DIRTY_HELPERS
config KMAP_LOCAL
config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY

But I'm not sure whether there's a rule of thumb about this somewhere.

In the meantime, I also looked at whether syscall configs are always and
only put under init/, and funnily enough I got:

$ find . -name Kconfig | xargs grep --color -E "\".*syscall.*\""
./init/Kconfig: bool "Enable process_vm_readv/writev syscalls"
./init/Kconfig: bool "uselib syscall"
./init/Kconfig: bool "sgetmask/ssetmask syscalls support" if EXPERT
./init/Kconfig: bool "Sysfs syscall support" if EXPERT
./init/Kconfig: bool "open by fhandle syscalls" if EXPERT
./init/Kconfig: bool "Enable madvise/fadvise syscalls" if EXPERT
./arch/xtensa/Kconfig: bool "Enable fast atomic syscalls"
./arch/xtensa/Kconfig: bool "Enable spill registers syscall"
./arch/powerpc/Kconfig: bool "Support setting protections for 4k subpages (subpage_prot syscall)"
./arch/powerpc/Kconfig: bool "Enable filtering of RTAS syscalls"
./arch/Kconfig: bool "Support for randomizing kernel stack offset on syscall entry" if EXPERT
./arch/s390/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall"
./arch/sh/mm/Kconfig: bool "Support vsyscall page"
./arch/x86/Kconfig: bool "Enable vsyscall emulation" if EXPERT
./arch/x86/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall"
./arch/x86/Kconfig: bool "Require a valid signature in kexec_file_load() syscall"
./arch/x86/Kconfig: prompt "vsyscall table for legacy applications"
./arch/arm64/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall"
./arch/arm64/Kconfig: bool "Enable the tagged user addresses syscall ABI"
./kernel/trace/Kconfig: bool "Trace syscalls"
./kernel/trace/Kconfig: bool "Run selftest on syscall events"

So, putting the arch-specific lines aside, ftrace does have FTRACE_SYSCALLS
living in the kernel/trace/ dir.. I'm not sure whether we could move
USERFAULTFD and all the rest into mm/ as well? Or perhaps that's just a
bad example? :)

Thanks,

--
Peter Xu

2022-04-20 15:11:38

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote:
> Hi Peter,
>
> On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote:
> > @@ -910,16 +910,16 @@ config ANON_VMA_NAME
> >  	  difference in their name.
> >
> >  config PTE_MARKER
> > -	bool "Marker PTEs support"
> > -	default y
> > +	bool
> >
> >  	help
> >  	  Allows to create marker PTEs for file-backed memory.
> >
> >  config PTE_MARKER_UFFD_WP
> > -	bool "Marker PTEs support for userfaultfd write protection"
> > +	bool "Userfaultfd write protection support for shmem/hugetlbfs"
> >  	default y
> > -	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
> > +	depends on HAVE_ARCH_USERFAULTFD_WP
> > +	select PTE_MARKER
>
> This is much easier to understand, thanks!

Cool! Sent as a formal patch just now.

>
> Btw, this doesn't do much without userfaultfd being enabled in
> general, right?

So far yes, but I'm thinking there can be other potential users of
PTE_MARKERS in the mm world. The closest discussion is on the swap read
failures and this patch proposed by Miaohe:

https://lore.kernel.org/lkml/[email protected]/

So I hope we can still keep them around here under mm/ if possible; my
gut feeling is that's really where they should live..

> Would it make sense to have it next to 'config USERFAULTFD' as a
> sub-option?

Yes another good question. :)

IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new
syscall. The same goes for the rest of the uffd bits added since then, namely:

- USERFAULTFD_WP
- USERFAULTFD_MINOR

What I am thinking now is the other way round from your suggestion: whether
we should move most of them out, at least the _WP and _MINOR configs, into
mm/? Because IMHO they are really pure mm ideas and irrelevant to
syscalls and init.

Any thoughts?

--
Peter Xu

2022-04-21 01:13:39

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

Hi Peter,

On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote:
> @@ -910,16 +910,16 @@ config ANON_VMA_NAME
>  	  difference in their name.
>
>  config PTE_MARKER
> -	bool "Marker PTEs support"
> -	default y
> +	bool
>
>  	help
>  	  Allows to create marker PTEs for file-backed memory.
>
>  config PTE_MARKER_UFFD_WP
> -	bool "Marker PTEs support for userfaultfd write protection"
> +	bool "Userfaultfd write protection support for shmem/hugetlbfs"
>  	default y
> -	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
> +	depends on HAVE_ARCH_USERFAULTFD_WP
> +	select PTE_MARKER

This is much easier to understand, thanks!

Btw, this doesn't do much without userfaultfd being enabled in
general, right? Would it make sense to have it next to 'config
USERFAULTFD' as a sub-option?

2022-04-21 19:44:16

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

On Tue, Apr 19, 2022 at 06:25:31PM +1000, Alistair Popple wrote:
> Hi Peter,

Hi, Alistair,

>
> Is there something I have missed that means PTE markers can only be used with
> file-backed memory? Obviously that's all you care about for this patch series,
> but if we needed to mark some anonymous PTE for special processing, is there
> anything that would prevent us from using a PTE marker? Specifically I was thinking
> about it in relation to this series:
> <https://lore.kernel.org/linux-mm/[email protected]/>

It's not necessarily restricted to file-backed memory. All the file-backed
checks in this series are just for safety and nothing else.

I think that swap-read-error case is a very good example where a pte marker
sounds like a great fit, but let's see whether people would still prefer to
stick with hwpoison, which makes some sense too. Let's keep the discussion
in that thread.

Thanks for the pointer!

--
Peter Xu

2022-04-22 12:10:56

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

On Wed, Apr 20, 2022 at 09:46:07AM -0400, Johannes Weiner wrote:
> My point was simply that for the user it might be easiest and most
> intuitive if userfaultfd and its related suboptions are 1) grouped
> together and 2) in the MM submenu.

Very reasonable request.

> Yeah it looks like there is a healthy mix ;) To add to the list:
>
> - mm/Kconfig has CONFIG_SWAP for the swapon/swapoff syscalls.
> - fs/Kconfig has CONFIG_FILE_LOCKING, which adds the flock() syscall.
> - Interestingly, fs/Kconfig has CONFIG_MEMFD_CREATE for memfd_create()
> which is implemented in mm/memfd.c.
>
> It seems reasonable to me to move the userfaultfd stuff to mm as well,
> especially when it's becoming more than just a single binary question
> on whether you want a syscall or not, and has MM-specific suboptions.

Thanks for the extra information!

Obviously, as you said, it's growing a little bit. I'll give it a shot
later today.

--
Peter Xu

2022-04-22 20:26:08

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

On Tue, Apr 19, 2022 at 06:01:14PM -0400, Peter Xu wrote:
> On Tue, Apr 19, 2022 at 05:24:28PM -0400, Johannes Weiner wrote:
> > On Tue, Apr 19, 2022 at 04:28:16PM -0400, Peter Xu wrote:
> > > On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote:
> > > > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote:
> > > > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME
> > > > Btw, this doesn't do much without userfaultfd being enabled in
> > > > general, right?
> > >
> > > So far yes, but I'm thinking there can be other potential users of
> > > PTE_MARKER in the mm world. The closest discussion is on the swap read
> > > failures and this patch proposed by Miaohe:
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > So I hope we can still keep them around here under mm/ if possible, and
> > > my gut feeling is they really should be..
> >
> > Agreed, mm/ seems a good fit for PTE_MARKER.
> >
> > If it's invisible and gets selected as needed, it's less of a concern,
> > IMO. I'm somewhat worried about when and how the user-visible options
> > show up right now, though...
> >
> > > > Would it make sense to have it next to 'config USERFAULTFD' as a
> > > > sub-option?
> > >
> > > Yes another good question. :)
> > >
> > > IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new
> > > syscall. The same goes for the rest of the uffd bits added since then, namely:
> > >
> > > - USERFAULTFD_WP
> > > - USERFAULTFD_MINOR
> > >
> > > What I am thinking now is the reverse of your suggestion: whether we should
> > > move most of them out, at least the _WP and _MINOR configs, into mm/?
> > > Because IMHO they are really pure mm ideas and have nothing to do with
> > > syscalls or init.
> >
> > I'm thinking the MM submenu would probably be a better fit for all
> > user-visible userfaultfd options, including the syscall. Like you say,
> > it's an MM concept.
> >
> > But if moving the syscall knob out from init isn't popular, IMO it
> > would be better to add the new WP option to init as well. This ensures
> > that when somebody selects userfaultfd, they also see the relevant
> > suboptions and don't have to chase them down across multiple submenus.
> >
> > Conversely, they should also have the necessary 'depends on' clauses so
> > that suboptions aren't visible without the main feature. E.g. it asked me
> > for userfaultfd options even though I have CONFIG_USERFAULTFD=n.
>
> Hmm, this is a bit weird... since we do have that dependency chain for
> PTE_MARKER_UFFD_WP -> HAVE_ARCH_USERFAULTFD_WP -> USERFAULTFD:
>
> in arch/x86/Kconfig:
> config X86
> ...
> select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD
>
> in mm/Kconfig (with/without the "mm/uffd: Hide PTE_MARKER" patch applied):
> config PTE_MARKER_UFFD_WP
> ...
> depends on HAVE_ARCH_USERFAULTFD_WP
>
> So logically if !USERFAULTFD we shouldn't see PTE_MARKER_UFFD_WP at all?
>
> That's also what I got when I tried it out, for either !USERFAULTFD on x86
> or any non-x86 platform (because there !HAVE_ARCH_USERFAULTFD_WP holds
> regardless of USERFAULTFD). Though I could have missed something..

Sorry, it asked me about PTE_MARKER, and that conclusion got stuck in
my head. Indeed, once that symbol is invisible we should be good.

> > What do you think?
>
> I don't have a strong preference here; it's okay with me if the preference
> is to only put user-visible configs into mm/Kconfig. It's just that I see
> we already have tons of user-invisible configs in mm/Kconfig, to list
> some:
>
> config ARCH_HAS_HUGEPD
> config MAPPING_DIRTY_HELPERS
> config KMAP_LOCAL
> config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY
>
> But I'm not sure whether it's a rule of thumb somewhere else.

I wasn't objecting to invisible symbols in mm/.

My point was simply that for the user it might be easiest and most
intuitive if userfaultfd and its related suboptions are 1) grouped
together and 2) in the MM submenu.

> Meanwhile, I also looked at whether syscall configs are always and only
> put under init/, and funnily enough I got:
>
> $ find . -name Kconfig | xargs grep --color -E "\".*syscall.*\""
> ./init/Kconfig: bool "Enable process_vm_readv/writev syscalls"
> ./init/Kconfig: bool "uselib syscall"
> ./init/Kconfig: bool "sgetmask/ssetmask syscalls support" if EXPERT
> ./init/Kconfig: bool "Sysfs syscall support" if EXPERT
> ./init/Kconfig: bool "open by fhandle syscalls" if EXPERT
> ./init/Kconfig: bool "Enable madvise/fadvise syscalls" if EXPERT
> ./arch/xtensa/Kconfig: bool "Enable fast atomic syscalls"
> ./arch/xtensa/Kconfig: bool "Enable spill registers syscall"
> ./arch/powerpc/Kconfig: bool "Support setting protections for 4k subpages (subpage_prot syscall)"
> ./arch/powerpc/Kconfig: bool "Enable filtering of RTAS syscalls"
> ./arch/Kconfig: bool "Support for randomizing kernel stack offset on syscall entry" if EXPERT
> ./arch/s390/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall"
> ./arch/sh/mm/Kconfig: bool "Support vsyscall page"
> ./arch/x86/Kconfig: bool "Enable vsyscall emulation" if EXPERT
> ./arch/x86/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall"
> ./arch/x86/Kconfig: bool "Require a valid signature in kexec_file_load() syscall"
> ./arch/x86/Kconfig: prompt "vsyscall table for legacy applications"
> ./arch/arm64/Kconfig: bool "Verify kernel signature during kexec_file_load() syscall"
> ./arch/arm64/Kconfig: bool "Enable the tagged user addresses syscall ABI"
> ./kernel/trace/Kconfig: bool "Trace syscalls"
> ./kernel/trace/Kconfig: bool "Run selftest on syscall events"
>
> So, putting aside the arch-specific lines, ftrace does have FTRACE_SYSCALLS
> living in the kernel/trace/ dir.. not sure whether we could move
> USERFAULTFD and all the rest into mm/ as well? Or perhaps that's just a
> bad example? :)

Yeah it looks like there is a healthy mix ;) To add to the list:

- mm/Kconfig has CONFIG_SWAP for the swapon/swapoff syscalls.
- fs/Kconfig has CONFIG_FILE_LOCKING, which adds the flock() syscall.
- Interestingly, fs/Kconfig has CONFIG_MEMFD_CREATE for memfd_create()
which is implemented in mm/memfd.c.

It seems reasonable to me to move the userfaultfd stuff to mm as well,
especially when it's becoming more than just a single binary question
on whether you want a syscall or not, and has MM-specific suboptions.

2022-04-22 20:51:20

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

On Tue, Apr 19, 2022 at 11:13:48AM -0400, Johannes Weiner wrote:
> Hi Peter,

Hi, Johannes,

>
> On Mon, Apr 04, 2022 at 09:49:29PM -0400, Peter Xu wrote:
> > Enable PTE markers by default. On x86_64 it means it'll auto-enable
> > PTE_MARKER_UFFD_WP as well.
> >
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > mm/Kconfig | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 6e7c2d59fa96..3eca34c864c5 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -911,12 +911,14 @@ config ANON_VMA_NAME
> >
> > config PTE_MARKER
> > bool "Marker PTEs support"
> > + default y
> >
> > help
> > Allows to create marker PTEs for file-backed memory.
>
> make oldconfig just prompted me on these:
>
> ---
> Marker PTEs support (PTE_MARKER) [Y/n/?] (NEW) ?
>
> CONFIG_PTE_MARKER:
>
> Allows to create marker PTEs for file-backed memory.
>
> Symbol: PTE_MARKER [=y]
> Type : bool
> Defined at mm/Kconfig:1046
> Prompt: Marker PTEs support
> Location:
> Main menu
> -> Memory Management options
> ---
>
> > config PTE_MARKER_UFFD_WP
> > bool "Marker PTEs support for userfaultfd write protection"
> > + default y
> > depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
>
> It's not possible to answer them without looking at the code.
>
> But after looking at the code, I'm still not sure why it asks
> me. Isn't this infrastructure code?
>
> Wouldn't it make more sense to remove the prompt string and have
> userfaultfd simply select those?
>
> If this is too experimental to enable per default, a more reasonable
> question for the user would be a "userfaultfd file support" option or
> something, and have *that* select the marker code.

Thanks for raising this question.

Actually it's enabled by default right now, so I kept the options just to
make sure we can always explicitly disable them when we want. That's
mainly why I kept this patch as a standalone one, so we can even drop it
if we want.

That said, I fully agree with you that having two options seems to be
overkill, especially since the PTE_MARKER option will be too challenging
to understand correctly for anyone not familiar with it.

So after a second thought, here is a refined list of what I want to
achieve with the kbuild system for this new feature:

- On supported systems (x86_64), it should default to y with "make
olddefconfig", but be possible to turn off using "make oldconfig" etc.,
so the user has a choice when they want one.

- On unsupported systems (non-x86_64), it should always be n without
any prompt asked; the user won't even see this entry.

- The PTE_MARKER option should always be hidden, for all archs.

The patch I plan to post is attached (I also reworded the entry to not
mention pte markers). Does that look acceptable to you?
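
Based on the reworked hunk quoted elsewhere in this thread, the resulting
mm/Kconfig entries should look roughly like this:

	config PTE_MARKER
		bool

	config PTE_MARKER_UFFD_WP
		bool "Userfaultfd write protection support for shmem/hugetlbfs"
		default y
		depends on HAVE_ARCH_USERFAULTFD_WP
		select PTE_MARKER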

Thanks,

--
Peter Xu


Attachments:
0001-mm-uffd-Hide-PTE_MARKER-option.patch (1.51 kB)

2022-04-22 21:02:58

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v8 22/23] mm: Enable PTE markers by default

On Tue, Apr 19, 2022 at 04:28:16PM -0400, Peter Xu wrote:
> On Tue, Apr 19, 2022 at 04:14:11PM -0400, Johannes Weiner wrote:
> > On Tue, Apr 19, 2022 at 03:59:21PM -0400, Peter Xu wrote:
> > > @@ -910,16 +910,16 @@ config ANON_VMA_NAME
> > Btw, this doesn't do much without userfaultfd being enabled in
> > general, right?
>
> So far yes, but I'm thinking there can be other potential users of
> PTE_MARKER in the mm world. The closest discussion is on the swap read
> failures and this patch proposed by Miaohe:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> So I hope we can still keep them around here under mm/ if possible, and
> my gut feeling is they really should be..

Agreed, mm/ seems a good fit for PTE_MARKER.

If it's invisible and gets selected as needed, it's less of a concern,
IMO. I'm somewhat worried about when and how the user-visible options
show up right now, though...

> > Would it make sense to have it next to 'config USERFAULTFD' as a
> > sub-option?
>
> Yes another good question. :)
>
> IIUC CONFIG_USERFAULTFD resides in init/Kconfig because it introduces a new
> syscall. The same goes for the rest of the uffd bits added since then, namely:
>
> - USERFAULTFD_WP
> - USERFAULTFD_MINOR
>
> What I am thinking now is the reverse of your suggestion: whether we should
> move most of them out, at least the _WP and _MINOR configs, into mm/?
> Because IMHO they are really pure mm ideas and have nothing to do with
> syscalls or init.

I'm thinking the MM submenu would probably be a better fit for all
user-visible userfaultfd options, including the syscall. Like you say,
it's an MM concept.

But if moving the syscall knob out from init isn't popular, IMO it
would be better to add the new WP option to init as well. This ensures
that when somebody selects userfaultfd, they also see the relevant
suboptions and don't have to chase them down across multiple submenus.

Conversely, they should also have the necessary 'depends on' clauses so
that suboptions aren't visible without the main feature. E.g. it asked me
for userfaultfd options even though I have CONFIG_USERFAULTFD=n.

What do you think?

2022-05-11 08:13:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v8 00/23] userfaultfd-wp: Support shmem and hugetlbfs

On Mon, 4 Apr 2022 21:46:23 -0400 Peter Xu <[email protected]> wrote:

> This is v8 of the series to add shmem+hugetlbfs support for userfaultfd
> write protection.

I think we've finished tossing this patchset around, so I plan to feed
it into mm-stable later this week.

A few -fixes were added:

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-introduce-pte_marker-swap-entry-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-teach-core-mm-about-pte-markers-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-check-against-orig_pte-for-finish_fault-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-check-against-orig_pte-for-finish_fault-fix-checkpatch-fixes.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-uffd-pte_marker_uffd_wp-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-hugetlb-only-drop-uffd-wp-special-pte-if-required-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-hugetlb-only-drop-uffd-wp-special-pte-if-required-fix-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-shmem-handle-uffd-wp-during-fork-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-enable-pte-markers-by-default-fix.patch


2022-05-12 02:21:37

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v8 06/23] mm/shmem: Handle uffd-wp special pte in page fault handler

> +/*
> + * This is actually a page-missing access, but with uffd-wp special pte
> + * installed. It means this pte was wr-protected before being unmapped.
> + */
> +static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
> +{
> + /*
> + * Just in case there're leftover special ptes even after the region
> + * got unregistered - we can simply clear them. We can also do that
> + * proactively when e.g. when we do UFFDIO_UNREGISTER upon some uffd-wp
> + * ranges, but it should be more efficient to be done lazily here.
> + */
> + if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
> + return pte_marker_clear(vmf);

What would happen if we do an unregister followed by a register? IMHO we
should start with a clean uffd-wp slate then. Your comment makes me
assume that we could receive stale WP events, which would be wrong?

--
Thanks,

David / dhildenb


2022-05-14 01:53:53

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v8 06/23] mm/shmem: Handle uffd-wp special pte in page fault handler

On Wed, May 11, 2022 at 06:30:59PM +0200, David Hildenbrand wrote:
> > +/*
> > + * This is actually a page-missing access, but with uffd-wp special pte
> > + * installed. It means this pte was wr-protected before being unmapped.
> > + */
> > +static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
> > +{
> > + /*
> > + * Just in case there're leftover special ptes even after the region
> > + * got unregistered - we can simply clear them. We can also do that
> > + * proactively when e.g. when we do UFFDIO_UNREGISTER upon some uffd-wp
> > + * ranges, but it should be more efficient to be done lazily here.
> > + */
> > + if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
> > + return pte_marker_clear(vmf);
>
> What would happen if we do an unregister followed by a register? IMHO we
> should start with a clean uffd-wp slate then. Your comment makes me
> assume that we could receive stale WP events, which would be wrong?

I'd say it's not wrong; it's true and actually expected.

Firstly, userfaultfd (by design) always allows false positives (getting
the same message multiple times) but has no tolerance for false negatives
(a missed event, which means data corruption).

The latter should be obvious. For the former, the simplest example is when
two threads access the same missing page at the same time: two identical
messages will be generated. The same applies to wr-protect faults. It
would be non-trivial (or, IMHO, impossible) to avoid those.

In this specific case, it's about when to drop the uffd-wp bits on
unregister. Two obvious options: (1) during unregister, or (2) lazily.

Here I chose the lazy way, because clearing during unregister could slow
it down, and unregister typically happens when the program quits. In
short, with the current approach we quit fast. We could have leftovers,
but we'll take care of them when needed.
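
For reference, the lazy path is the pte_marker_clear() that the quoted hunk
falls back to; slightly simplified from the series here (the real code also
re-checks that the pte didn't change under us before dropping it):

	static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
	{
		vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
					       vmf->address, &vmf->ptl);
		/* Only recover a marker pte into a none pte */
		if (is_pte_marker(*vmf->pte))
			pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		return 0;
	}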

One important thing is that leftover ptes should not be common with the
normal register -> wr-protect -> unprotect -> unregister sequence.
Normally the process won't unregister until it quits, so the leftovers do
no harm to anyone.

Meanwhile, any user who wants to avoid the lazy way can simply do a
whole-range unprotect before unregister, as in the sketch below. So we
leave more choice to the user, and by default we make sure no syscall is
easily slowed down.
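
As a minimal sketch of that whole-range unprotect-before-unregister (error
handling elided; 'uffd', 'addr' and 'len' are assumed to come from an
earlier UFFDIO_API/UFFDIO_REGISTER setup):

	#include <stddef.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	static int unprotect_then_unregister(int uffd, void *addr, size_t len)
	{
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)addr, .len = len },
			/* UFFDIO_WRITEPROTECT_MODE_WP not set: un-protect */
			.mode  = 0,
		};
		struct uffdio_range range = {
			.start = (unsigned long)addr,
			.len   = len,
		};

		if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp))
			return -1;
		return ioctl(uffd, UFFDIO_UNREGISTER, &range);
	}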

Hope that answers, thanks!

--
Peter Xu