2021-07-14 23:51:32

by Peter Xu

Subject: [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs

This is v4 of uffd-wp shmem & hugetlbfs support, which completes uffd-wp as a
full feature. It's based on v5.14-rc1.

The whole series can also be found online [1]. Nothing big has really changed
from the previous version.

One thing worth mentioning is that commit 22061a1ffabd (from Hugh) added a new
single_page parameter to zap_details, so a few zap-related patches needed some
rebasing around it. There's also the newly introduced unmap_mapping_page(); in
this series it will start to run with the zap flag ZAP_FLAG_DROP_FILE_UFFD_WP,
as this function is used as the last phase to unmap shmem mappings when a page
is e.g. truncated. It actually even simplified things a bit, as I can drop the
patch "mm: Pass zap_flags into unmap_mapping_pages()" now.

Another thing to mention is that I further modified (trivially) the test
program umap-apps [4] to allow the backend storage to be a shmem file (it used
to be e.g. an XFS file; however, as I noticed that disk latency is a major
bottleneck of the umapsort program, especially on an HDD, a shmem backend is
much faster). This should stress the kernel a bit more than before.

Full changelog listed below.

v4:
- Rebased to v5.14-rc1
- Collected r-b from Alistair
- Patch "mm/userfaultfd: Introduce special pte for unmapped file-backed mem"
  - Make pte_swp_uffd_wp_special() return false for !HAVE_ARCH_USERFAULTFD_WP
    [Alistair]
- Patch "mm: Introduce zap_details.zap_flags"
  - Rename zap_check_mapping_skip to zap_skip_check_mapping [Alistair]
- Patch "mm: Pass zap_flags into unmap_mapping_pages()"
  - Dropped the patch because after commit 22061a1ffabd it's not needed anymore
- Patch "mm/userfaultfd: Enable write protection for shmem & hugetlbfs"
  - Drop UFFD_FEATURE_WP_HUGETLBFS_SHMEM too if !CONFIG_HAVE_ARCH_USERFAULTFD_WP
- Patch "shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed":
  - Convert zap_pte_range() of device private to WARN_ON_ONCE() because it must
    not be a file-backed pte [Alistair]
  - Coordinate with the new commit 22061a1ffabd ("mm/thp: unmap_mapping_page()
    to fix THP truncate_cleanup_page()"), adding ZAP_FLAG_CHECK_MAPPING for the
    new function unmap_mapping_page().

v3:
- Rebase to v5.13-rc3-mmots-2021-05-25-20-12
- Fix commit message and comment for patch "shmem/userfaultfd: Handle uffd-wp
  special pte in page fault handler", dropping all references to
  FAULT_FLAG_UFFD_WP
- Reworked patch "shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP" after
  Axel's refactoring on uffdio-copy/continue
- Added patch "mm/hugetlb: Introduce huge pte version of uffd-wp helpers", so
  that the huge pte helpers are introduced in one patch. Also add the
  huge_pte_uffd_wp helper, which was missing previously
- Added patch "mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs", to make
  the pagemap uffd-wp bit work for shmem/hugetlbfs
- Added patch "mm/shmem: Unconditionally set pte dirty in
  mfill_atomic_install_pte", to clean up the dirty bit handling in uffdio-copy

v2:
- Add R-bs
- Added patch "mm/hugetlb: Drop __unmap_hugepage_range definition from
  hugetlb.h" as noticed/suggested by Mike Kravetz
- Fix commit message of patch "hugetlb/userfaultfd: Only drop uffd-wp special
  pte if required" [MikeK]
- Removed comments for fields in zap_details since they're either incorrect or
  not helpful [Matthew]
- Rephrase commit message in patch "hugetlb/userfaultfd: Take care of
  UFFDIO_COPY_MODE_WP" to better explain why the dirty bit is set for
  UFFDIO_COPY in hugetlbfs [MikeK]
- Don't emulate READ for uffd-wp-special on both shmem & hugetlbfs
- Drop the FAULT_FLAG_UFFD_WP flag, by checking vmf->orig_pte directly against
  pte_swp_uffd_wp_special()
- Fix race condition of page fault handling on uffd-wp-special [Mike]

About Swap Special PTE
======================

In short, the so-called "swap special pte" in this patchset is a new type of
pte that didn't exist in the past; this series initially uses it for
file-backed memories. It is used to persist information even if the ptes get
dropped while the page cache still exists. For example, when splitting a
file-backed huge pmd, we could simply drop the pmd entry and wait for another
fault to come. That was okay in the past since all information in the pte can
be recovered from the page cache when the next page fault triggers. However,
in this case uffd-wp is per-pte information which cannot be kept in the page
cache, so that information still needs to be maintained somehow in the pgtable
entry, even if the pgtable entry is going to be dropped. Here, instead of
replacing it with a none entry, we use the "swap special pte". Then when the
next page fault triggers, we can observe orig_pte to retain this information.

I'm copy-pasting some commit message from the patch "mm/swap: Introduce the
idea of special swap ptes", which tries to explain this pte from another angle:

We used to have special swap entries, like migration entries, hw-poison
entries, device private entries, etc.

Those "special swap entries" reside in the range that they need to be at least
swap entries first, and their types are decided by swp_type(entry).

This patch introduces another idea called "special swap ptes".

It's very easy to confuse them with "special swap entries", but a special swap
pte should never contain a swap entry at all. That means it's illegal to call
pte_to_swp_entry() upon a special swap pte.

Make the uffd-wp special pte the first special swap pte.

Before this patch, is_swap_pte()==true means one of the below:

  (a.1) The pte has a normal swap entry (non_swap_entry()==false). For
        example, when an anonymous page got swapped out.

  (a.2) The pte has a special swap entry (non_swap_entry()==true). For
        example, a migration entry, a hw-poison entry, etc.

After this patch, is_swap_pte()==true means one of the below, where case (b) is
added:

  (a) The pte contains a swap entry.

    (a.1) The pte has a normal swap entry (non_swap_entry()==false). For
          example, when an anonymous page got swapped out.

    (a.2) The pte has a special swap entry (non_swap_entry()==true). For
          example, a migration entry, a hw-poison entry, etc.

  (b) The pte does not contain a swap entry at all (so it cannot be passed
      into pte_to_swp_entry()). For example, the uffd-wp special swap pte.
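
As a minimal sketch (not taken from any patch in this series, and assuming the
pte_swp_uffd_wp_special() helper introduced later in the series), classifying
a pte along these cases looks roughly like:

  if (!is_swap_pte(pte)) {
          /* Present or none pte: handled elsewhere. */
  } else if (pte_swp_uffd_wp_special(pte)) {
          /* Case (b): no swap entry inside, never call pte_to_swp_entry(). */
  } else {
          swp_entry_t entry = pte_to_swp_entry(pte);

          if (non_swap_entry(entry))
                  ;  /* Case (a.2): migration, hw-poison, device private, ... */
          else
                  ;  /* Case (a.1): a real swap entry, the page was swapped out. */
  }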



Hugetlbfs needs a similar thing because it's also file-backed. I directly
reused the same special pte there, though the shmem and hugetlb changes for
supporting this new pte are different, since they don't share much code path.



Patch layout
============

Part (1): Shmem support, this is where the special swap pte is introduced.
Some zap rework is needed within the process:

  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

Part (2): Hugetlb support. The patches to disable huge pmd sharing for uffd-wp
have already been merged. The rest are the changes required to teach hugetlbfs
to understand the special swap pte too, which is introduced with the uffd-wp
change:

  mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required

Part (3): Enable both features in code and test (plus pagemap support)

  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

I've tested it both with the userfaultfd kselftest program and with
umapsort [2], which should be even stricter. Page swapping in/out was also
tested during umapsort.

If anyone would like to try umapsort, you'll need an extremely hacked version
of the umap library [3], because by default umap only supports anonymous
memory. So to test it we need to build [3] first, then [2].

Any comment would be greatly welcomed. Thanks,

[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/xzpeter/umap-apps/tree/peter
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs
[4] https://github.com/xzpeter/umap-apps/commit/b0c2c7b1cd9dcb6835e7c59d02ece1f6b7f1ea01

Peter Xu (26):
  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  mm/userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

arch/arm64/kernel/mte.c | 2 +-
arch/x86/include/asm/pgtable.h | 28 +++
fs/hugetlbfs/inode.c | 15 +-
fs/proc/task_mmu.c | 21 +-
fs/userfaultfd.c | 41 ++--
include/asm-generic/hugetlb.h | 15 ++
include/asm-generic/pgtable_uffd.h | 3 +
include/linux/hugetlb.h | 30 ++-
include/linux/mm.h | 44 +++-
include/linux/mm_inline.h | 42 ++++
include/linux/shmem_fs.h | 4 +-
include/linux/swapops.h | 39 +++-
include/linux/userfaultfd_k.h | 49 +++++
include/uapi/linux/userfaultfd.h | 10 +-
mm/gup.c | 2 +-
mm/hmm.c | 2 +-
mm/hugetlb.c | 160 ++++++++++++---
mm/khugepaged.c | 11 +-
mm/madvise.c | 4 +-
mm/memcontrol.c | 2 +-
mm/memory.c | 244 +++++++++++++++++------
mm/migrate.c | 4 +-
mm/mincore.c | 2 +-
mm/mprotect.c | 63 +++++-
mm/mremap.c | 2 +-
mm/page_vma_mapped.c | 6 +-
mm/rmap.c | 8 +
mm/shmem.c | 5 +-
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 73 +++++--
tools/testing/selftests/vm/userfaultfd.c | 9 +-
31 files changed, 756 insertions(+), 186 deletions(-)

--
2.31.1





2021-07-14 23:51:34

by Peter Xu

Subject: [PATCH v4 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem

This patch introduces a very special swap-like pte for file-backed memories.

Currently it's only defined for x86_64, but as long as an arch can properly
define the UFFD_WP_SWP_PTE_SPECIAL value as requested, it should conceptually
work there too.

We will use this special pte to arm the ptes that got either unmapped or
swapped out for a file-backed region that was previously wr-protected. This
special pte can trigger a page fault just like a swap entry, since it
satisfies pte_none()==false && pte_present()==false.

Then we can revive the special pte into a normal pte backed by the page cache.

This idea is greatly inspired by Hugh and Andrea in the discussion, which is
referenced in the links below.

The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 as the
special pte. The current solution (as pointed out by Andrea) is slightly
preferred in that we don't even need swp_entry_t knowledge at all in trapping
these accesses. Meanwhile, we also reuse _PAGE_SWP_UFFD_WP from the anonymous
swp entries.

This patch only introduces the special pte and its operators. It is not yet
used anywhere, so there is no functional change yet.
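
As a rough usage sketch (later patches do the real wiring; the two condition
names below are hypothetical placeholders, not code from this series), the
helpers are meant to be used like:

  /* When zapping a wr-protected, file-backed pte: leave a marker behind. */
  if (zapping_file_backed_pte && pte_was_uffd_wp)  /* hypothetical conditions */
          set_pte_at(mm, addr, ptep, pte_swp_mkuffd_wp_special(vma));

  /* In the fault path: recognize the marker via the saved orig_pte. */
  if (pte_swp_uffd_wp_special(vmf->orig_pte))
          ;  /* report a uffd-wp fault, or re-install from the page cache */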

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Andrea Arcangeli <[email protected]>
Suggested-by: Hugh Dickins <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
arch/x86/include/asm/pgtable.h | 28 ++++++++++++++++++++++++++++
include/asm-generic/pgtable_uffd.h | 3 +++
include/linux/userfaultfd_k.h | 25 +++++++++++++++++++++++++
3 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..71b1e73d5b26 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1300,6 +1300,34 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
#endif

#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+/*
+ * This is a very special swap-like pte that marks this pte as "wr-protected"
+ * by userfaultfd-wp. It should only exist for file-backed memory where the
+ * page (previously got wr-protected) has been unmapped or swapped out.
+ *
+ * For anonymous memories, the userfaultfd-wp _PAGE_SWP_UFFD_WP bit is kept
+ * along with a real swp entry instead.
+ *
+ * Let's make some rules for this special pte:
+ *
+ * (1) pte_none()==false, so that it'll not trigger a missing page fault.
+ *
+ * (2) pte_present()==false, so that it's recognized as swap (is_swap_pte).
+ *
+ * (3) pte_swp_uffd_wp()==true, so it can be tested just like a swap pte that
+ * contains a valid swap entry, so that we can check a swap pte always
+ * using "is_swap_pte() && pte_swp_uffd_wp()" without caring about whether
+ * there's one swap entry inside of the pte.
+ *
+ * (4) It should not be a valid swap pte anywhere, so that when we see this pte
+ * we know it does not contain a swap entry.
+ *
+ * For x86, the simplest special pte which satisfies all of above should be the
+ * pte with only _PAGE_SWP_UFFD_WP bit set (where swp_type==swp_offset==0).
+ */
+#define UFFD_WP_SWP_PTE_SPECIAL __pte(_PAGE_SWP_UFFD_WP)
+
static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
{
return pte_set_flags(pte, _PAGE_SWP_UFFD_WP);
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 828966d4c281..95e9811ce9d1 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -2,6 +2,9 @@
#define _ASM_GENERIC_PGTABLE_UFFD_H

#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+#define UFFD_WP_SWP_PTE_SPECIAL __pte(0)
+
static __always_inline int pte_uffd_wp(pte_t pte)
{
return 0;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 331d2ccf0bcc..bb5a72a2b07a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -145,6 +145,21 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct *vma,
extern void userfaultfd_unmap_complete(struct mm_struct *mm,
struct list_head *uf);

+static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma)
+{
+ WARN_ON_ONCE(vma_is_anonymous(vma));
+ return UFFD_WP_SWP_PTE_SPECIAL;
+}
+
+static inline bool pte_swp_uffd_wp_special(pte_t pte)
+{
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+ return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL);
+#else
+ return false;
+#endif
+}
+
#else /* CONFIG_USERFAULTFD */

/* mm helpers */
@@ -234,6 +249,16 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
{
}

+static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma)
+{
+ return __pte(0);
+}
+
+static inline bool pte_swp_uffd_wp_special(pte_t pte)
+{
+ return false;
+}
+
#endif /* CONFIG_USERFAULTFD */

#endif /* _LINUX_USERFAULTFD_K_H */
--
2.31.1

2021-07-15 00:09:18

by Peter Xu

Subject: [PATCH v4 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP

Firstly, the comment in zap_pte_range() is misleading: it mentions
check_mapping while the code actually checks the details pointer, so the
comment doesn't match what the code does.

Meanwhile, it's also confusing in that it doesn't explain why passing in the
details pointer means skipping all swap entries. A new user of zap_details
could very possibly miss this fact if they don't read all the way down to
zap_pte_range(), because there's no comment in zap_details mentioning it at
all, so swap entries could be erroneously skipped without being noticed.

This partly reverts 3e8715fdc03e ("mm: drop zap_details::check_swap_entries"),
but introduces a ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
previous "details" parameter: the caller should explicitly set this flag to
skip swap entries, otherwise swap entries will always be considered (which is
still the major case here).
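
Put differently (this just restates the zap_pte_range() hunk below, it is not
new behaviour), the swap entry handling changes from an implicit to an
explicit opt-in:

  /* Before: any non-NULL details pointer silently skipped swap entries. */
  if (unlikely(details))
          continue;

  /* After: skipping must be requested explicitly by the caller. */
  if (unlikely(zap_skip_swap(details)))
          continue;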

Cc: Kirill A. Shutemov <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
include/linux/mm.h | 12 ++++++++++++
mm/memory.c | 8 +++++---
2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 26088174daab..62a75e4414e3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1717,6 +1717,8 @@ extern void user_shm_unlock(size_t, struct ucounts *);

/* Whether to check page->mapping when zapping */
#define ZAP_FLAG_CHECK_MAPPING BIT(0)
+/* Whether to skip zapping swap entries */
+#define ZAP_FLAG_SKIP_SWAP BIT(1)

/*
* Parameter block passed down to zap_pte_range in exceptional cases.
@@ -1740,6 +1742,16 @@ zap_skip_check_mapping(struct zap_details *details, struct page *page)
return details->zap_mapping != page_rmapping(page);
}

+/* Return true if skip swap entries, false otherwise */
+static inline bool
+zap_skip_swap(struct zap_details *details)
+{
+ if (!details)
+ return false;
+
+ return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
+}
+
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
pte_t pte);
struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 2a5a6650f069..d6b1adbf29e4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1379,8 +1379,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
continue;
}

- /* If details->check_mapping, we leave swap entries. */
- if (unlikely(details))
+ if (unlikely(zap_skip_swap(details)))
continue;

if (!non_swap_entry(entry))
@@ -3379,7 +3378,10 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
pgoff_t nr, bool even_cows)
{
pgoff_t first_index = start, last_index = start + nr - 1;
- struct zap_details details = { .zap_mapping = mapping };
+ struct zap_details details = {
+ .zap_mapping = mapping,
+ .zap_flags = ZAP_FLAG_SKIP_SWAP,
+ };

if (!even_cows)
details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;
--
2.31.1

2021-07-15 00:10:19

by Peter Xu

Subject: [PATCH v4 08/26] mm: Introduce zap_details.zap_flags

Instead of trying to introduce one variable for every new zap_details field,
let's introduce a flags field that can start to encode true/false information.

Let's start by using this flag to clean up the existing check_mapping variable.
Firstly, the name "check_mapping" implies it is a boolean, but it actually
stores the mapping itself, just in a way that it won't be set if we don't want
to check the mapping.

To make things clearer, introduce the first zap flag ZAP_FLAG_CHECK_MAPPING, so
that we only check against the mapping if this bit is set. At the same time,
rename check_mapping into zap_mapping and set it always.

While at it, introduce another helper zap_skip_check_mapping() and use it in
zap_pte_range() properly.

Some old comments have been removed in zap_pte_range() because they're
duplicated, and since we now have the ZAP_FLAG_CHECK_MAPPING flag it'll be very
easy to find this information by simply grepping for the flag.

It'll also make life easier when we want to e.g. pass zap_flags into callers
like unmap_mapping_pages() (instead of adding new booleans besides the
even_cows parameter).
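
In other words (a condensed restatement of the unmap_mapping_pages() hunk
below, shown as a before/after sketch rather than compilable code):

  /* Before: whether the mapping pointer is set doubles as the boolean. */
  struct zap_details details = { .check_mapping = even_cows ? NULL : mapping };

  /* After: the boolean lives in zap_flags, and zap_mapping is always set. */
  struct zap_details details = { .zap_mapping = mapping };
  if (!even_cows)
          details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;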

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/mm.h | 19 ++++++++++++++++++-
mm/memory.c | 34 ++++++++++------------------------
2 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 795b3cd643ca..26088174daab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1715,14 +1715,31 @@ static inline bool can_do_mlock(void) { return false; }
extern int user_shm_lock(size_t, struct ucounts *);
extern void user_shm_unlock(size_t, struct ucounts *);

+/* Whether to check page->mapping when zapping */
+#define ZAP_FLAG_CHECK_MAPPING BIT(0)
+
/*
* Parameter block passed down to zap_pte_range in exceptional cases.
*/
struct zap_details {
- struct address_space *check_mapping; /* Check page->mapping if set */
+ struct address_space *zap_mapping;
struct page *single_page; /* Locked page to be unmapped */
+ unsigned long zap_flags;
};

+/* Return true if skip zapping this page, false otherwise */
+static inline bool
+zap_skip_check_mapping(struct zap_details *details, struct page *page)
+{
+ if (!details || !page)
+ return false;
+
+ if (!(details->zap_flags & ZAP_FLAG_CHECK_MAPPING))
+ return false;
+
+ return details->zap_mapping != page_rmapping(page);
+}
+
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
pte_t pte);
struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 4c269d7b3d83..2a5a6650f069 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1333,16 +1333,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
struct page *page;

page = vm_normal_page(vma, addr, ptent);
- if (unlikely(details) && page) {
- /*
- * unmap_shared_mapping_pages() wants to
- * invalidate cache without truncating:
- * unmap shared but keep private pages.
- */
- if (details->check_mapping &&
- details->check_mapping != page_rmapping(page))
- continue;
- }
+ if (unlikely(zap_skip_check_mapping(details, page)))
+ continue;
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
tlb_remove_tlb_entry(tlb, pte, addr);
@@ -1375,17 +1367,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
is_device_exclusive_entry(entry)) {
struct page *page = pfn_swap_entry_to_page(entry);

- if (unlikely(details && details->check_mapping)) {
- /*
- * unmap_shared_mapping_pages() wants to
- * invalidate cache without truncating:
- * unmap shared but keep private pages.
- */
- if (details->check_mapping !=
- page_rmapping(page))
- continue;
- }
-
+ if (unlikely(zap_skip_check_mapping(details, page)))
+ continue;
pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
rss[mm_counter(page)]--;

@@ -3369,8 +3352,9 @@ void unmap_mapping_page(struct page *page)
first_index = page->index;
last_index = page->index + thp_nr_pages(page) - 1;

- details.check_mapping = mapping;
+ details.zap_mapping = mapping;
details.single_page = page;
+ details.zap_flags = ZAP_FLAG_CHECK_MAPPING;

i_mmap_lock_write(mapping);
if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
@@ -3395,9 +3379,11 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
pgoff_t nr, bool even_cows)
{
pgoff_t first_index = start, last_index = start + nr - 1;
- struct zap_details details = { };
+ struct zap_details details = { .zap_mapping = mapping };
+
+ if (!even_cows)
+ details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;

- details.check_mapping = even_cows ? NULL : mapping;
if (last_index < first_index)
last_index = ULONG_MAX;

--
2.31.1

2021-07-15 00:17:26

by Peter Xu

Subject: [PATCH v4 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection

Hook up hugetlbfs_fault() with the capability to handle userfaultfd-wp faults.

We do this slightly earlier than hugetlb_cow() so that we can avoid taking some
extra locks that we definitely don't need.
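
For context (not part of this kernel patch; a minimal userspace sketch with
uffd, page_size and all error handling assumed to be set up elsewhere), the
monitor resolves such a write-protect fault roughly like this:

  struct uffd_msg msg;

  read(uffd, &msg, sizeof(msg));          /* blocking read of one uffd event */
  if (msg.event == UFFD_EVENT_PAGEFAULT &&
      (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
          struct uffdio_writeprotect wp = {
                  .range.start = msg.arg.pagefault.address & ~(page_size - 1),
                  .range.len   = page_size,
                  .mode        = 0,       /* resolve: remove write protection */
          };

          ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);  /* also wakes the faulting thread */
  }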

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/hugetlb.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 924553aa8f78..8559b8bb7fa5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5059,6 +5059,25 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
goto out_ptl;

+ /* Handle userfault-wp first, before trying to lock more pages */
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+ (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+ struct vm_fault vmf = {
+ .vma = vma,
+ .address = haddr,
+ .flags = flags,
+ };
+
+ spin_unlock(ptl);
+ if (pagecache_page) {
+ unlock_page(pagecache_page);
+ put_page(pagecache_page);
+ }
+ mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+ return handle_userfault(&vmf, VM_UFFD_WP);
+ }
+
/*
* hugetlb_cow() requires page locks of pte_page(entry) and
* pagecache_page, so here we need take the former one
--
2.31.1

2021-07-15 00:17:32

by Peter Xu

Subject: [PATCH v4 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP

Firstly, pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout
the stack. Then, apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is used with
UFFDIO_COPY. Introduce huge_pte_mkuffd_wp() for it.

Hugetlb pages are only managed by hugetlbfs, so we're safe even without setting
dirty bit in the huge pte if the page is installed as read-only. However we'd
better still keep the dirty bit set for a read-only UFFDIO_COPY pte (when
UFFDIO_COPY_MODE_WP bit is set), not only to match what we do with shmem, but
also because the page does contain dirty data that the kernel just copied from
the userspace.
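
For context, from userspace this path is driven by UFFDIO_COPY with the wp
mode bit set; a minimal sketch (uffd, dst_addr, src_buf and huge_page_size are
assumed to be set up elsewhere, error handling omitted):

  struct uffdio_copy copy = {
          .dst  = (unsigned long)dst_addr,   /* hugepage-aligned fault address */
          .src  = (unsigned long)src_buf,    /* data used to resolve the fault */
          .len  = huge_page_size,
          .mode = UFFDIO_COPY_MODE_WP,       /* install the pte write-protected */
  };

  if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
          ;  /* copy.copy bytes installed; the range is now uffd-wp protected */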

Signed-off-by: Peter Xu <[email protected]>
---
include/linux/hugetlb.h | 6 ++++--
mm/hugetlb.c | 22 +++++++++++++++++-----
mm/userfaultfd.c | 12 ++++++++----
3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c30f39815e13..fcdbf9f46d85 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -155,7 +155,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
unsigned long dst_addr,
unsigned long src_addr,
enum mcopy_atomic_mode mode,
- struct page **pagep);
+ struct page **pagep,
+ bool wp_copy);
#endif /* CONFIG_USERFAULTFD */
bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
@@ -336,7 +337,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
unsigned long dst_addr,
unsigned long src_addr,
enum mcopy_atomic_mode mode,
- struct page **pagep)
+ struct page **pagep,
+ bool wp_copy)
{
BUG();
return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8559b8bb7fa5..f4efcb8c6214 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5141,7 +5141,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
unsigned long dst_addr,
unsigned long src_addr,
enum mcopy_atomic_mode mode,
- struct page **pagep)
+ struct page **pagep,
+ bool wp_copy)
{
bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
struct hstate *h = hstate_vma(dst_vma);
@@ -5277,17 +5278,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
}

- /* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
- if (is_continue && !vm_shared)
+ /*
+ * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+ * with wp flag set, don't set pte write bit.
+ */
+ if (wp_copy || (is_continue && !vm_shared))
writable = 0;
else
writable = dst_vma->vm_flags & VM_WRITE;

_dst_pte = make_huge_pte(dst_vma, page, writable);
- if (writable)
- _dst_pte = huge_pte_mkdirty(_dst_pte);
+ /*
+ * Always mark UFFDIO_COPY page dirty; note that this may not be
+ * extremely important for hugetlbfs for now since swapping is not
+ * supported, but we should still be clear in that this page cannot be
+ * thrown away at will, even if write bit not set.
+ */
+ _dst_pte = huge_pte_mkdirty(_dst_pte);
_dst_pte = pte_mkyoung(_dst_pte);

+ if (wp_copy)
+ _dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0c7212dfb95d..501d6b9f7a5a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -297,7 +297,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mode)
+ enum mcopy_atomic_mode mode,
+ bool wp_copy)
{
int vm_shared = dst_vma->vm_flags & VM_SHARED;
ssize_t err;
@@ -393,7 +394,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
}

err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
- dst_addr, src_addr, mode, &page);
+ dst_addr, src_addr, mode, &page,
+ wp_copy);

mutex_unlock(&hugetlb_fault_mutex_table[hash]);
i_mmap_unlock_read(mapping);
@@ -448,7 +450,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
- enum mcopy_atomic_mode mode);
+ enum mcopy_atomic_mode mode,
+ bool wp_copy);
#endif /* CONFIG_HUGETLB_PAGE */

static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -568,7 +571,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
*/
if (is_vm_hugetlb_page(dst_vma))
return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
- src_start, len, mcopy_mode);
+ src_start, len, mcopy_mode,
+ wp_copy);

if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
--
2.31.1

2021-07-15 00:21:05

by Peter Xu

Subject: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h

Drop it in the header since it's only used in hugetlb.c.

Suggested-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
include/linux/hugetlb.h | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a3870ea..c30f39815e13 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -143,9 +143,6 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
struct page *ref_page);
-void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- struct page *ref_page);
void hugetlb_report_meminfo(struct seq_file *);
int hugetlb_report_node_meminfo(char *buf, int len, int nid);
void hugetlb_show_meminfo(void);
@@ -385,13 +382,6 @@ static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
BUG();
}

-static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
- struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
-{
- BUG();
-}
-
static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
unsigned int flags)
--
2.31.1

2021-07-15 00:21:07

by Peter Xu

Subject: [PATCH v4 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes

Teach hugetlbfs code to wr-protect none ptes just in case the page cache
existed for that pte. Meanwhile we also need to be able to recognize a uffd-wp
marker pte and remove it for uffd_wp_resolve.

While at it, introduce a variable "psize" to replace all references to the huge
page size fetcher.
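
For reference, the wr-protect and un-protect operations this code serves are
issued from userspace roughly as below (a sketch; uffd, addr and len are
assumed to be registered and set up already):

  struct uffdio_writeprotect wp = {
          .range = { .start = addr, .len = len },
          .mode  = UFFDIO_WRITEPROTECT_MODE_WP,  /* wr-protect; ptes may be none */
  };

  ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);         /* installs uffd-wp (marker) ptes */

  /* Later, to resolve (un-protect) the same range: */
  wp.mode = 0;
  ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);         /* clears uffd-wp, drops the markers */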

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
mm/hugetlb.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 15e5de480cf0..6ae911185554 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5561,7 +5561,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
pte_t *ptep;
pte_t pte;
struct hstate *h = hstate_vma(vma);
- unsigned long pages = 0;
+ unsigned long pages = 0, psize = huge_page_size(h);
bool shared_pmd = false;
struct mmu_notifier_range range;
bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -5581,13 +5581,19 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,

mmu_notifier_invalidate_range_start(&range);
i_mmap_lock_write(vma->vm_file->f_mapping);
- for (; address < end; address += huge_page_size(h)) {
+ for (; address < end; address += psize) {
spinlock_t *ptl;
- ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ ptep = huge_pte_offset(mm, address, psize);
if (!ptep)
continue;
ptl = huge_pte_lock(h, mm, ptep);
if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+ /*
+ * When uffd-wp is enabled on the vma, unshare
+ * shouldn't happen at all. Warn about it if it
+ * happened due to some reason.
+ */
+ WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
pages++;
spin_unlock(ptl);
shared_pmd = true;
@@ -5612,12 +5618,21 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
else if (uffd_wp_resolve)
newpte = pte_swp_clear_uffd_wp(newpte);
set_huge_swap_pte_at(mm, address, ptep,
- newpte, huge_page_size(h));
+ newpte, psize);
pages++;
}
spin_unlock(ptl);
continue;
}
+ if (unlikely(is_swap_special_pte(pte))) {
+ WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+ /*
+ * This is changing a non-present pte into a none pte,
+ * no need for huge_ptep_modify_prot_start/commit().
+ */
+ if (uffd_wp_resolve)
+ huge_pte_clear(mm, address, ptep, psize);
+ }
if (!huge_pte_none(pte)) {
pte_t old_pte;
unsigned int shift = huge_page_shift(hstate_vma(vma));
@@ -5631,6 +5646,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
pte = huge_pte_clear_uffd_wp(pte);
huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
pages++;
+ } else {
+ /* None pte */
+ if (unlikely(uffd_wp))
+ /* Safe to modify directly (none->non-present). */
+ set_huge_pte_at(mm, address, ptep,
+ pte_swp_mkuffd_wp_special(vma));
}
spin_unlock(ptl);
}
--
2.31.1

2021-07-15 00:21:18

by Peter Xu

Subject: [PATCH v4 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if
unmapping an entire vma or synchronized such that faults can not race with the
unmap operation. This requires passing zap_flags all the way to the lowest
level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originated in hugetlbfs code will pass the
ZAP_FLAG_DROP_FILE_UFFD_WP flag as synchronization is in place to prevent
faults. The exception is hole punch which will first unmap without any
synchronization. Later when hole punch actually removes the page from the
file, it will check to see if there was a subsequent fault and if so take the
hugetlb fault mutex while unmapping again. This second unmap will pass in
ZAP_FLAG_DROP_FILE_UFFD_WP.

The core justification for "whether to apply the ZAP_FLAG_DROP_FILE_UFFD_WP
flag when unmapping a hugetlb range" is (IMHO): we should never reach a state
where a page fault could erroneously fault in a page-cache page, as writable,
that was wr-protected, even for an extremely short period. That could happen
if e.g. we passed ZAP_FLAG_DROP_FILE_UFFD_WP in hugetlbfs_punch_hole() when
calling hugetlb_vmdelete_list(), because if a page fault triggers after that
call and before the remove_inode_hugepages() right after it, the page cache
could be mapped writable again in that small window, which can cause data
corruption.
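
Schematically (a condensed restatement of the hugetlbfs hunks below, not new
behaviour), the hole punch path therefore becomes:

  /* hugetlbfs_punch_hole(): not yet synchronized against faults; keep markers. */
  hugetlb_vmdelete_list(&mapping->i_mmap, hole_start >> PAGE_SHIFT,
                        hole_end >> PAGE_SHIFT, 0);

  /*
   * remove_inode_hugepages(): runs with the hugetlb fault mutex held, so the
   * second unmap here is finally safe to drop the uffd-wp special ptes.
   */
  hugetlb_vmdelete_list(&mapping->i_mmap, index * pages_per_huge_page(h),
                        (index + 1) * pages_per_huge_page(h),
                        ZAP_FLAG_DROP_FILE_UFFD_WP);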

Reviewed-by: Mike Kravetz <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
---
fs/hugetlbfs/inode.c | 15 +++++++++------
include/linux/hugetlb.h | 8 +++++---
mm/hugetlb.c | 27 +++++++++++++++++++++------
mm/memory.c | 5 ++++-
4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 926eeb9bf4eb..fdbb972b781b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
}

static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+ unsigned long zap_flags)
{
struct vm_area_struct *vma;

@@ -437,7 +438,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
}

unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
- NULL);
+ NULL, zap_flags);
}
}

@@ -515,7 +516,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
mutex_lock(&hugetlb_fault_mutex_table[hash]);
hugetlb_vmdelete_list(&mapping->i_mmap,
index * pages_per_huge_page(h),
- (index + 1) * pages_per_huge_page(h));
+ (index + 1) * pages_per_huge_page(h),
+ ZAP_FLAG_DROP_FILE_UFFD_WP);
i_mmap_unlock_write(mapping);
}

@@ -581,7 +583,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
i_mmap_lock_write(mapping);
i_size_write(inode, offset);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
- hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+ hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+ ZAP_FLAG_DROP_FILE_UFFD_WP);
i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, offset, LLONG_MAX);
}
@@ -614,8 +617,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
i_mmap_lock_write(mapping);
if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
hugetlb_vmdelete_list(&mapping->i_mmap,
- hole_start >> PAGE_SHIFT,
- hole_end >> PAGE_SHIFT);
+ hole_start >> PAGE_SHIFT,
+ hole_end >> PAGE_SHIFT, 0);
i_mmap_unlock_write(mapping);
remove_inode_hugepages(inode, hole_start, hole_end);
inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e19ca363803d..809bb63ecf9e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -138,11 +138,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
unsigned long *, unsigned long *, long, unsigned int,
int *);
void unmap_hugepage_range(struct vm_area_struct *,
- unsigned long, unsigned long, struct page *);
+ unsigned long, unsigned long, struct page *,
+ unsigned long);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
- struct page *ref_page);
+ struct page *ref_page, unsigned long zap_flags);
void hugetlb_report_meminfo(struct seq_file *);
int hugetlb_report_node_meminfo(char *buf, int len, int nid);
void hugetlb_show_meminfo(void);
@@ -381,7 +382,8 @@ static inline unsigned long hugetlb_change_protection(

static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+ unsigned long end, struct page *ref_page,
+ unsigned long zap_flags)
{
BUG();
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6ae911185554..cc5616d78f35 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4353,7 +4353,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,

void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
unsigned long start, unsigned long end,
- struct page *ref_page)
+ struct page *ref_page, unsigned long zap_flags)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
@@ -4405,6 +4405,19 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
continue;
}

+ if (unlikely(is_swap_special_pte(pte))) {
+ WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+ /*
+ * Only drop the special swap uffd-wp pte if
+ * e.g. unmapping a vma or punching a hole (with proper
+ * lock held so that concurrent page fault won't happen).
+ */
+ if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
+ huge_pte_clear(mm, address, ptep, sz);
+ spin_unlock(ptl);
+ continue;
+ }
+
/*
* Migrating hugepage or HWPoisoned hugepage is already
* unmapped and its refcount is dropped, so just clear pte here.
@@ -4456,9 +4469,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,

void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+ unsigned long end, struct page *ref_page,
+ unsigned long zap_flags)
{
- __unmap_hugepage_range(tlb, vma, start, end, ref_page);
+ __unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);

/*
* Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -4474,12 +4488,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
}

void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, struct page *ref_page)
+ unsigned long end, struct page *ref_page,
+ unsigned long zap_flags)
{
struct mmu_gather tlb;

tlb_gather_mmu(&tlb, vma->vm_mm);
- __unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+ __unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
tlb_finish_mmu(&tlb);
}

@@ -4534,7 +4549,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
*/
if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
unmap_hugepage_range(iter_vma, address,
- address + huge_page_size(h), page);
+ address + huge_page_size(h), page, 0);
}
i_mmap_unlock_write(mapping);
}
diff --git a/mm/memory.c b/mm/memory.c
index af91bee934c7..c4a80f45e48f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1626,8 +1626,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
* safe to do nothing in this case.
*/
if (vma->vm_file) {
+ unsigned long zap_flags = details ?
+ details->zap_flags : 0;
i_mmap_lock_write(vma->vm_file->f_mapping);
- __unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+ __unmap_hugepage_range_final(tlb, vma, start, end,
+ NULL, zap_flags);
i_mmap_unlock_write(vma->vm_file->f_mapping);
}
} else
--
2.31.1

2021-07-15 01:12:37

by Peter Xu

Subject: [PATCH v4 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps

We don't have "huge" version of PTE_SWP_UFFD_WP_SPECIAL, instead when necessary
we split the thp if the huge page is uffd wr-protected previously.

However split the thp is not enough, because file-backed thp is handled totally
differently comparing to anonymous thps - rather than doing a real split, the
thp pmd will simply got dropped in __split_huge_pmd_locked().

That is definitely not enough if e.g. when there is a thp covers range [0, 2M)
but we want to wr-protect small page resides in [4K, 8K) range, because after
__split_huge_pmd() returns, there will be a none pmd.

Here we leverage the previously introduced change_protection_prepare() macro so
that we'll populate the pmd with a pgtable page. Then change_pte_range() will
do all the rest for us, e.g., install the uffd-wp swap special pte marker at
any pte that we'd like to wr-protect, under the protection of pgtable lock.

Signed-off-by: Peter Xu <[email protected]>
---
mm/mprotect.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8ec85b276975..3fcb87b59696 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -306,8 +306,16 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
}

if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
- if (next - addr != HPAGE_PMD_SIZE) {
+ if (next - addr != HPAGE_PMD_SIZE ||
+ /* Uffd wr-protecting a file-backed memory range */
+ unlikely(!vma_is_anonymous(vma) &&
+ (cp_flags & MM_CP_UFFD_WP))) {
__split_huge_pmd(vma, pmd, addr, false, NULL);
+ /*
+ * For file-backed, the pmd could have been
+ * gone; still provide a pte pgtable if needed.
+ */
+ change_protection_prepare(vma, pmd, addr, cp_flags);
} else {
int nr_ptes = change_huge_pmd(vma, pmd, addr,
newprot, cp_flags);
--
2.31.1

2021-07-15 05:54:28

by kernel test robot

Subject: Re: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc1 next-20210714]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: powerpc64-randconfig-r032-20210714 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/f8dd355edbfe948f84c8aaa10a5173656aa2778c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
git checkout f8dd355edbfe948f84c8aaa10a5173656aa2778c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> mm/hugetlb.c:4334:6: warning: no previous prototype for '__unmap_hugepage_range' [-Wmissing-prototypes]
4334 | void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
| ^~~~~~~~~~~~~~~~~~~~~~


vim +/__unmap_hugepage_range +4334 mm/hugetlb.c

63551ae0feaaa2 David Gibson 2005-06-21 4333
24669e58477e27 Aneesh Kumar K.V 2012-07-31 @4334 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4335 unsigned long start, unsigned long end,
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4336 struct page *ref_page)
63551ae0feaaa2 David Gibson 2005-06-21 4337 {
63551ae0feaaa2 David Gibson 2005-06-21 4338 struct mm_struct *mm = vma->vm_mm;
63551ae0feaaa2 David Gibson 2005-06-21 4339 unsigned long address;
c7546f8f03f5a4 David Gibson 2005-08-05 4340 pte_t *ptep;
63551ae0feaaa2 David Gibson 2005-06-21 4341 pte_t pte;
cb900f41215447 Kirill A. Shutemov 2013-11-14 4342 spinlock_t *ptl;
63551ae0feaaa2 David Gibson 2005-06-21 4343 struct page *page;
a5516438959d90 Andi Kleen 2008-07-23 4344 struct hstate *h = hstate_vma(vma);
a5516438959d90 Andi Kleen 2008-07-23 4345 unsigned long sz = huge_page_size(h);
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4346 struct mmu_notifier_range range;
a5516438959d90 Andi Kleen 2008-07-23 4347
63551ae0feaaa2 David Gibson 2005-06-21 4348 WARN_ON(!is_vm_hugetlb_page(vma));
a5516438959d90 Andi Kleen 2008-07-23 4349 BUG_ON(start & ~huge_page_mask(h));
a5516438959d90 Andi Kleen 2008-07-23 4350 BUG_ON(end & ~huge_page_mask(h));
63551ae0feaaa2 David Gibson 2005-06-21 4351
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4352 /*
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4353 * This is a hugetlb vma, all the pte entries should point
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4354 * to huge page.
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4355 */
ed6a79352cad00 Peter Zijlstra 2018-08-31 4356 tlb_change_page_size(tlb, sz);
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4357 tlb_start_vma(tlb, vma);
dff11abe280b47 Mike Kravetz 2018-10-05 4358
dff11abe280b47 Mike Kravetz 2018-10-05 4359 /*
dff11abe280b47 Mike Kravetz 2018-10-05 4360 * If sharing possible, alert mmu notifiers of worst case.
dff11abe280b47 Mike Kravetz 2018-10-05 4361 */
6f4f13e8d9e27c Jérôme Glisse 2019-05-13 4362 mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
6f4f13e8d9e27c Jérôme Glisse 2019-05-13 4363 end);
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4364 adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
ac46d4f3c43241 J?r?me Glisse 2018-12-28 4365 mmu_notifier_invalidate_range_start(&range);
569f48b85813f0 Hillf Danton 2014-12-10 4366 address = start;
569f48b85813f0 Hillf Danton 2014-12-10 4367 for (; address < end; address += sz) {
7868a2087ec13e Punit Agrawal 2017-07-06 4368 ptep = huge_pte_offset(mm, address, sz);
c7546f8f03f5a4 David Gibson 2005-08-05 4369 if (!ptep)
c7546f8f03f5a4 David Gibson 2005-08-05 4370 continue;
c7546f8f03f5a4 David Gibson 2005-08-05 4371
cb900f41215447 Kirill A. Shutemov 2013-11-14 4372 ptl = huge_pte_lock(h, mm, ptep);
34ae204f18519f Mike Kravetz 2020-08-11 4373 if (huge_pmd_unshare(mm, vma, &address, ptep)) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4374 spin_unlock(ptl);
dff11abe280b47 Mike Kravetz 2018-10-05 4375 /*
dff11abe280b47 Mike Kravetz 2018-10-05 4376 * We just unmapped a page of PMDs by clearing a PUD.
dff11abe280b47 Mike Kravetz 2018-10-05 4377 * The caller's TLB flush range should cover this area.
dff11abe280b47 Mike Kravetz 2018-10-05 4378 */
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4379 continue;
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4380 }
39dde65c9940c9 Kenneth W Chen 2006-12-06 4381
6629326b89b6e6 Hillf Danton 2012-03-23 4382 pte = huge_ptep_get(ptep);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4383 if (huge_pte_none(pte)) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4384 spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4385 continue;
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4386 }
6629326b89b6e6 Hillf Danton 2012-03-23 4387
6629326b89b6e6 Hillf Danton 2012-03-23 4388 /*
9fbc1f635fd0bd Naoya Horiguchi 2015-02-11 4389 * Migrating hugepage or HWPoisoned hugepage is already
9fbc1f635fd0bd Naoya Horiguchi 2015-02-11 4390 * unmapped and its refcount is dropped, so just clear pte here.
6629326b89b6e6 Hillf Danton 2012-03-23 4391 */
9fbc1f635fd0bd Naoya Horiguchi 2015-02-11 4392 if (unlikely(!pte_present(pte))) {
9386fac34c7cbe Punit Agrawal 2017-07-06 4393 huge_pte_clear(mm, address, ptep, sz);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4394 spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4395 continue;
8c4894c6bc790d Naoya Horiguchi 2012-12-12 4396 }
6629326b89b6e6 Hillf Danton 2012-03-23 4397
6629326b89b6e6 Hillf Danton 2012-03-23 4398 page = pte_page(pte);
04f2cbe35699d2 Mel Gorman 2008-07-23 4399 /*
04f2cbe35699d2 Mel Gorman 2008-07-23 4400 * If a reference page is supplied, it is because a specific
04f2cbe35699d2 Mel Gorman 2008-07-23 4401 * page is being unmapped, not a range. Ensure the page we
04f2cbe35699d2 Mel Gorman 2008-07-23 4402 * are about to unmap is the actual page of interest.
04f2cbe35699d2 Mel Gorman 2008-07-23 4403 */
04f2cbe35699d2 Mel Gorman 2008-07-23 4404 if (ref_page) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4405 if (page != ref_page) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4406 spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4407 continue;
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4408 }
04f2cbe35699d2 Mel Gorman 2008-07-23 4409 /*
04f2cbe35699d2 Mel Gorman 2008-07-23 4410 * Mark the VMA as having unmapped its page so that
04f2cbe35699d2 Mel Gorman 2008-07-23 4411 * future faults in this VMA will fail rather than
04f2cbe35699d2 Mel Gorman 2008-07-23 4412 * looking like data was lost
04f2cbe35699d2 Mel Gorman 2008-07-23 4413 */
04f2cbe35699d2 Mel Gorman 2008-07-23 4414 set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
04f2cbe35699d2 Mel Gorman 2008-07-23 4415 }
04f2cbe35699d2 Mel Gorman 2008-07-23 4416
c7546f8f03f5a4 David Gibson 2005-08-05 4417 pte = huge_ptep_get_and_clear(mm, address, ptep);
b528e4b6405b9f Aneesh Kumar K.V 2016-12-12 4418 tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
106c992a5ebef2 Gerald Schaefer 2013-04-29 4419 if (huge_pte_dirty(pte))
6649a3863232eb Ken Chen 2007-02-08 4420 set_page_dirty(page);
9e81130b7ce230 Hillf Danton 2012-03-21 4421
5d317b2b653659 Naoya Horiguchi 2015-11-05 4422 hugetlb_count_sub(pages_per_huge_page(h), mm);
d281ee61451835 Kirill A. Shutemov 2016-01-15 4423 page_remove_rmap(page, true);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4424
cb900f41215447 Kirill A. Shutemov 2013-11-14 4425 spin_unlock(ptl);
e77b0852b551ff Aneesh Kumar K.V 2016-07-26 4426 tlb_remove_page_size(tlb, page, huge_page_size(h));
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4427 /*
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4428 * Bail out after unmapping reference page if supplied
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4429 */
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4430 if (ref_page)
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4431 break;
fe1668ae5bf014 Kenneth W Chen 2006-10-04 4432 }
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4433 mmu_notifier_invalidate_range_end(&range);
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4434 tlb_end_vma(tlb, vma);
^1da177e4c3f41 Linus Torvalds 2005-04-16 4435 }
63551ae0feaaa2 David Gibson 2005-06-21 4436

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-07-15 10:47:12

by kernel test robot

Subject: Re: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc1 next-20210715]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: powerpc-randconfig-r012-20210714 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 0e49c54a8cbd3e779e5526a5888c683c01cc3c50)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc cross compiling tool for clang build
# apt-get install binutils-powerpc-linux-gnu
# https://github.com/0day-ci/linux/commit/f8dd355edbfe948f84c8aaa10a5173656aa2778c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
git checkout f8dd355edbfe948f84c8aaa10a5173656aa2778c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

__do_insb
^
arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb'
#define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:126:1: note: expanded from here
__do_insw
^
arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw'
#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:128:1: note: expanded from here
__do_insl
^
arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl'
#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:130:1: note: expanded from here
__do_outsb
^
arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb'
#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:132:1: note: expanded from here
__do_outsw
^
arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw'
#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
In file included from mm/hugetlb.c:11:
In file included from include/linux/highmem.h:10:
In file included from include/linux/hardirq.h:11:
In file included from arch/powerpc/include/asm/hardirq.h:6:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/powerpc/include/asm/io.h:619:
arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
__do_##name al; \
^~~~~~~~~~~~~~
<scratch space>:134:1: note: expanded from here
__do_outsl
^
arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl'
#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
~~~~~~~~~~~~~~~~~~~~~^
>> mm/hugetlb.c:4334:6: warning: no previous prototype for function '__unmap_hugepage_range' [-Wmissing-prototypes]
void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
^
mm/hugetlb.c:4334:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
^
static
7 warnings generated.
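
The only new warning above (prefixed by >>) is the one triggered by this patch: once the declaration is dropped from hugetlb.h, __unmap_hugepage_range() is defined with external linkage but no previous prototype in scope, which is what the W=1 build (-Wmissing-prototypes) flags. A minimal standalone sketch of the pattern and the two usual fixes, not taken from the series (the file name and helpers below are made up for illustration):

	/* demo.c -- build with:  cc -Wmissing-prototypes -c demo.c */

	/*
	 * External linkage, but no prior declaration in any header:
	 * "warning: no previous prototype for function 'helper'"
	 */
	int helper(int x)
	{
		return x * 2;
	}

	/*
	 * Fix (a): give the function internal linkage, as the robot's
	 * follow-up patch does for __unmap_hugepage_range().
	 */
	static int helper_fixed(int x)
	{
		return x * 2;
	}

	/*
	 * Fix (b): alternatively, keep a prototype in a shared header
	 * if other translation units still need to call the function.
	 */

Fix (a) presumably applies here because, with the hugetlb.h declaration gone, the remaining callers of __unmap_hugepage_range() live in mm/hugetlb.c itself.
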


vim +/__unmap_hugepage_range +4334 mm/hugetlb.c

63551ae0feaaa2 David Gibson 2005-06-21 4333
24669e58477e27 Aneesh Kumar K.V 2012-07-31 @4334 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4335 unsigned long start, unsigned long end,
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4336 struct page *ref_page)
63551ae0feaaa2 David Gibson 2005-06-21 4337 {
63551ae0feaaa2 David Gibson 2005-06-21 4338 struct mm_struct *mm = vma->vm_mm;
63551ae0feaaa2 David Gibson 2005-06-21 4339 unsigned long address;
c7546f8f03f5a4 David Gibson 2005-08-05 4340 pte_t *ptep;
63551ae0feaaa2 David Gibson 2005-06-21 4341 pte_t pte;
cb900f41215447 Kirill A. Shutemov 2013-11-14 4342 spinlock_t *ptl;
63551ae0feaaa2 David Gibson 2005-06-21 4343 struct page *page;
a5516438959d90 Andi Kleen 2008-07-23 4344 struct hstate *h = hstate_vma(vma);
a5516438959d90 Andi Kleen 2008-07-23 4345 unsigned long sz = huge_page_size(h);
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4346 struct mmu_notifier_range range;
a5516438959d90 Andi Kleen 2008-07-23 4347
63551ae0feaaa2 David Gibson 2005-06-21 4348 WARN_ON(!is_vm_hugetlb_page(vma));
a5516438959d90 Andi Kleen 2008-07-23 4349 BUG_ON(start & ~huge_page_mask(h));
a5516438959d90 Andi Kleen 2008-07-23 4350 BUG_ON(end & ~huge_page_mask(h));
63551ae0feaaa2 David Gibson 2005-06-21 4351
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4352 /*
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4353 * This is a hugetlb vma, all the pte entries should point
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4354 * to huge page.
07e326610e5634 Aneesh Kumar K.V 2016-12-12 4355 */
ed6a79352cad00 Peter Zijlstra 2018-08-31 4356 tlb_change_page_size(tlb, sz);
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4357 tlb_start_vma(tlb, vma);
dff11abe280b47 Mike Kravetz 2018-10-05 4358
dff11abe280b47 Mike Kravetz 2018-10-05 4359 /*
dff11abe280b47 Mike Kravetz 2018-10-05 4360 * If sharing possible, alert mmu notifiers of worst case.
dff11abe280b47 Mike Kravetz 2018-10-05 4361 */
6f4f13e8d9e27c Jérôme Glisse 2019-05-13 4362 mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
6f4f13e8d9e27c Jérôme Glisse 2019-05-13 4363 end);
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4364 adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4365 mmu_notifier_invalidate_range_start(&range);
569f48b85813f0 Hillf Danton 2014-12-10 4366 address = start;
569f48b85813f0 Hillf Danton 2014-12-10 4367 for (; address < end; address += sz) {
7868a2087ec13e Punit Agrawal 2017-07-06 4368 ptep = huge_pte_offset(mm, address, sz);
c7546f8f03f5a4 David Gibson 2005-08-05 4369 if (!ptep)
c7546f8f03f5a4 David Gibson 2005-08-05 4370 continue;
c7546f8f03f5a4 David Gibson 2005-08-05 4371
cb900f41215447 Kirill A. Shutemov 2013-11-14 4372 ptl = huge_pte_lock(h, mm, ptep);
34ae204f18519f Mike Kravetz 2020-08-11 4373 if (huge_pmd_unshare(mm, vma, &address, ptep)) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4374 spin_unlock(ptl);
dff11abe280b47 Mike Kravetz 2018-10-05 4375 /*
dff11abe280b47 Mike Kravetz 2018-10-05 4376 * We just unmapped a page of PMDs by clearing a PUD.
dff11abe280b47 Mike Kravetz 2018-10-05 4377 * The caller's TLB flush range should cover this area.
dff11abe280b47 Mike Kravetz 2018-10-05 4378 */
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4379 continue;
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4380 }
39dde65c9940c9 Kenneth W Chen 2006-12-06 4381
6629326b89b6e6 Hillf Danton 2012-03-23 4382 pte = huge_ptep_get(ptep);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4383 if (huge_pte_none(pte)) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4384 spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4385 continue;
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4386 }
6629326b89b6e6 Hillf Danton 2012-03-23 4387
6629326b89b6e6 Hillf Danton 2012-03-23 4388 /*
9fbc1f635fd0bd Naoya Horiguchi 2015-02-11 4389 * Migrating hugepage or HWPoisoned hugepage is already
9fbc1f635fd0bd Naoya Horiguchi 2015-02-11 4390 * unmapped and its refcount is dropped, so just clear pte here.
6629326b89b6e6 Hillf Danton 2012-03-23 4391 */
9fbc1f635fd0bd Naoya Horiguchi 2015-02-11 4392 if (unlikely(!pte_present(pte))) {
9386fac34c7cbe Punit Agrawal 2017-07-06 4393 huge_pte_clear(mm, address, ptep, sz);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4394 spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4395 continue;
8c4894c6bc790d Naoya Horiguchi 2012-12-12 4396 }
6629326b89b6e6 Hillf Danton 2012-03-23 4397
6629326b89b6e6 Hillf Danton 2012-03-23 4398 page = pte_page(pte);
04f2cbe35699d2 Mel Gorman 2008-07-23 4399 /*
04f2cbe35699d2 Mel Gorman 2008-07-23 4400 * If a reference page is supplied, it is because a specific
04f2cbe35699d2 Mel Gorman 2008-07-23 4401 * page is being unmapped, not a range. Ensure the page we
04f2cbe35699d2 Mel Gorman 2008-07-23 4402 * are about to unmap is the actual page of interest.
04f2cbe35699d2 Mel Gorman 2008-07-23 4403 */
04f2cbe35699d2 Mel Gorman 2008-07-23 4404 if (ref_page) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4405 if (page != ref_page) {
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4406 spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4407 continue;
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4408 }
04f2cbe35699d2 Mel Gorman 2008-07-23 4409 /*
04f2cbe35699d2 Mel Gorman 2008-07-23 4410 * Mark the VMA as having unmapped its page so that
04f2cbe35699d2 Mel Gorman 2008-07-23 4411 * future faults in this VMA will fail rather than
04f2cbe35699d2 Mel Gorman 2008-07-23 4412 * looking like data was lost
04f2cbe35699d2 Mel Gorman 2008-07-23 4413 */
04f2cbe35699d2 Mel Gorman 2008-07-23 4414 set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
04f2cbe35699d2 Mel Gorman 2008-07-23 4415 }
04f2cbe35699d2 Mel Gorman 2008-07-23 4416
c7546f8f03f5a4 David Gibson 2005-08-05 4417 pte = huge_ptep_get_and_clear(mm, address, ptep);
b528e4b6405b9f Aneesh Kumar K.V 2016-12-12 4418 tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
106c992a5ebef2 Gerald Schaefer 2013-04-29 4419 if (huge_pte_dirty(pte))
6649a3863232eb Ken Chen 2007-02-08 4420 set_page_dirty(page);
9e81130b7ce230 Hillf Danton 2012-03-21 4421
5d317b2b653659 Naoya Horiguchi 2015-11-05 4422 hugetlb_count_sub(pages_per_huge_page(h), mm);
d281ee61451835 Kirill A. Shutemov 2016-01-15 4423 page_remove_rmap(page, true);
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4424
cb900f41215447 Kirill A. Shutemov 2013-11-14 4425 spin_unlock(ptl);
e77b0852b551ff Aneesh Kumar K.V 2016-07-26 4426 tlb_remove_page_size(tlb, page, huge_page_size(h));
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4427 /*
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4428 * Bail out after unmapping reference page if supplied
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4429 */
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4430 if (ref_page)
31d49da5ad0172 Aneesh Kumar K.V 2016-07-26 4431 break;
fe1668ae5bf014 Kenneth W Chen 2006-10-04 4432 }
ac46d4f3c43241 Jérôme Glisse 2018-12-28 4433 mmu_notifier_invalidate_range_end(&range);
24669e58477e27 Aneesh Kumar K.V 2012-07-31 4434 tlb_end_vma(tlb, vma);
^1da177e4c3f41 Linus Torvalds 2005-04-16 4435 }
63551ae0feaaa2 David Gibson 2005-06-21 4436

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-07-15 17:26:44

by kernel test robot

[permalink] [raw]
Subject: [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static

mm/hugetlb.c:4334:6: warning: symbol '__unmap_hugepage_range' was not declared. Should it be static?

Reported-by: kernel test robot <[email protected]>
Signed-off-by: kernel test robot <[email protected]>
---
hugetlb.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 924553aa8f789ad..4bdd637b0c29a95 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4331,9 +4331,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
return ret;
}

-void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- struct page *ref_page)
+static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct page *ref_page)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long address;

2021-07-15 18:24:14

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc1 next-20210715]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: i386-randconfig-s002-20210714 (attached as .config)
compiler: gcc-10 (Debian 10.2.1-6) 10.2.1 20210110
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/f8dd355edbfe948f84c8aaa10a5173656aa2778c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
git checkout f8dd355edbfe948f84c8aaa10a5173656aa2778c
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
>> mm/hugetlb.c:4334:6: sparse: sparse: symbol '__unmap_hugepage_range' was not declared. Should it be static?
mm/hugetlb.c:444:12: sparse: sparse: context imbalance in 'allocate_file_region_entries' - wrong count at exit
mm/hugetlb.c:517:13: sparse: sparse: context imbalance in 'region_add' - wrong count at exit
mm/hugetlb.c:584:13: sparse: sparse: context imbalance in 'region_chg' - wrong count at exit
mm/hugetlb.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/mm.h):
include/linux/page-flags.h:183:29: sparse: sparse: context imbalance in 'hugetlb_cow' - unexpected unlock
mm/hugetlb.c:5386:25: sparse: sparse: context imbalance in 'follow_hugetlb_page' - different lock contexts for basic block

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-07-15 19:05:10

by Peter Xu

[permalink] [raw]
Subject: Re: [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static

On Fri, Jul 16, 2021 at 01:05:24AM +0800, kernel test robot wrote:
> mm/hugetlb.c:4334:6: warning: symbol '__unmap_hugepage_range' was not declared. Should it be static?
>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: kernel test robot <[email protected]>
> ---
> hugetlb.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 924553aa8f789ad..4bdd637b0c29a95 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4331,9 +4331,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> return ret;
> }
>
> -void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end,
> - struct page *ref_page)
> +static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> + unsigned long start, unsigned long end,
> + struct page *ref_page)
> {
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address;

Will squash this change into the patch. Thanks.

--
Peter Xu