2023-08-01 14:05:13

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT

Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
gup/gup_fast") from 2012 documented as the primary reason why we would want
to handle NUMA hinting faults from GUP:

KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.

That is still the case today, and relevant KVM code has been converted to
manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
FOLL_HONOR_NUMA_FAULT for all GUP users and cross fingers that not that
many other ones that really require such handling for autonuma remain.

Possible interaction with MMU notifiers:

Assume a driver obtains a page using get_user_pages() to map it into
a secondary MMU, and uses the MMU notifier framework to get notified on
changes.

Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
mapped into a secondary MMU. Once user space would turn that mapping
inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
might not change. If the MMU notifier would be smart and optimize for that
case "why notify if the PTE didn't change", that could be problematic.

At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
unconditional mmu_notifier_invalidate_range_start() ->
mmu_notifier_invalidate_range_end() and should be fine.

Note that even if a PTE in an accessible VMA is pte_protnone(), the
underlying page might be accessed by a secondary MMU that does not set
FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/gup.c | 6 ------
1 file changed, 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index f463d3004ddc..ee4fc15ce88e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2244,12 +2244,6 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
gup_flags |= FOLL_UNLOCKABLE;
}

- /*
- * For now, always trigger NUMA hinting faults. Some GUP users like
- * KVM really require it to benefit from autonuma.
- */
- gup_flags |= FOLL_HONOR_NUMA_FAULT;
-
/* FOLL_GET and FOLL_PIN are mutually exclusive. */
if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
(FOLL_PIN | FOLL_GET)))
--
2.41.0



2023-08-02 16:52:44

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT

On Tue, Aug 01, 2023 at 02:48:40PM +0200, David Hildenbrand wrote:
> Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
> gup/gup_fast") from 2012 documented as the primary reason why we would want
> to handle NUMA hinting faults from GUP:
>
> KVM secondary MMU page faults will trigger the NUMA hinting page
> faults through gup_fast -> get_user_pages -> follow_page ->
> handle_mm_fault.
>
> That is still the case today, and relevant KVM code has been converted to
> manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
> FOLL_HONOR_NUMA_FAULT for all GUP users and cross fingers that not that
> many other ones that really require such handling for autonuma remain.
>
> Possible interaction with MMU notifiers:
>
> Assume a driver obtains a page using get_user_pages() to map it into
> a secondary MMU, and uses the MMU notifier framework to get notified on
> changes.
>
> Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
> FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
> mapped into a secondary MMU. Once user space would turn that mapping
> inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
> might not change. If the MMU notifier would be smart and optimize for that
> case "why notify if the PTE didn't change", that could be problematic.
>
> At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
> unconditional mmu_notifier_invalidate_range_start() ->
> mmu_notifier_invalidate_range_end() and should be fine.
>
> Note that even if a PTE in an accessible VMA is pte_protnone(), the
> underlying page might be accessed by a secondary MMU that does not set
> FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".
>
> Signed-off-by: David Hildenbrand <[email protected]>

Also seems sane but a large portion of its correctness also depends on
patch 3 being correct.

--
Mel Gorman
SUSE Labs