v5: * Removed transparent_hugepage_active() for !THP, per Zach.
Patch 4/7 and 5/7 were updated accordingly.
* Collected review tags.
v4: * Consolidated the transhuge_vma_size_ok() helper proposed in the
earlier versions into transhuge_vma_suitable(), per Zach.
* Fixed the regression introduced by patch 3/7, per Zach and Miaohe.
* Reworded the comment for transhuge_vma_suitable(), per Zach.
* Removed khugepaged_enter() per Miaohe.
* More comments for hugepage_vma_check(), per Zach.
* Squashed patch 4/7 (mm: khugepaged: use transhuge_vma_suitable replace open-code)
in the earlier version into patch 2/7 of this version.
* Minor correction to the doc about THPeligible (patch 7/7), so the
total number of patches is kept at 7.
v3: * Fixed the comment from Willy
v2: * Rebased to the latest mm-unstable
* Fixed potential regression for smaps's THPeligible
This series is the follow-up to the discussion about cleaning up the
transhuge_xxx helpers at
https://lore.kernel.org/linux-mm/[email protected]/.
THP has a bunch of helpers that do VMA sanity checks for different paths.
They do similar checks for most callsites and carry a lot of duplicated
code, and it is confusing which helper should be used under which
conditions.
This series reorganizes and cleans up the code so that all the checks are
consolidated into hugepage_vma_check().
The transhuge_vma_enabled(), transparent_hugepage_active() and
__transparent_hugepage_enabled() are killed by this series.
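For reference, after the series every path converges on a single predicate.
A sketch of how the three classes of callers end up invoking it, taken from
the diffs below (the annotations are illustrative, not from the patches):

	/* smaps: report THPeligible even before anon_vma exists */
	hugepage_vma_check(vma, vma->vm_flags, /* smaps */ true, /* in_pf */ false);

	/* khugepaged scan/collapse paths */
	hugepage_vma_check(vma, vma->vm_flags, false, false);

	/* anonymous page fault, replacing __transparent_hugepage_enabled() */
	hugepage_vma_check(vma, vma->vm_flags, false, /* in_pf */ true);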
Yang Shi (7):
mm: khugepaged: check THP flag in hugepage_vma_check()
mm: thp: consolidate vma size check to transhuge_vma_suitable
mm: khugepaged: better comments for anon vma check in hugepage_vma_revalidate
mm: thp: kill transparent_hugepage_active()
mm: thp: kill __transhuge_page_enabled()
mm: khugepaged: reorg some khugepaged helpers
doc: proc: fix the description to THPeligible
Documentation/filesystems/proc.rst | 4 ++-
fs/proc/task_mmu.c | 2 +-
include/linux/huge_mm.h | 80 +++++++++++++++++++----------------------------------------
include/linux/khugepaged.h | 30 ----------------------
mm/huge_memory.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
mm/khugepaged.c | 84 +++++++++++++++++++-------------------------------------------
mm/memory.c | 7 ++++--
7 files changed, 130 insertions(+), 158 deletions(-)
hugepage_vma_revalidate() needs to check whether the vma is still an
anonymous vma, since the address may be unmapped and then remapped to a
file before khugepaged reacquires the mmap_lock.
The old comment is not quite helpful; elaborate on this with a better
comment.
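To make the race concrete, a minimal userspace sketch (not from the patch;
addr, fd and hpage_size stand in for an existing anon mapping, an open file
and HPAGE_PMD_SIZE):

	#include <sys/mman.h>

	static void race_window(void *addr, int fd, size_t hpage_size)
	{
		/* khugepaged picked this anon range, then dropped mmap_lock */
		munmap(addr, hpage_size);
		mmap(addr, hpage_size, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
		/*
		 * When khugepaged reacquires mmap_lock, the VMA covering addr
		 * is file-backed: vma_is_anonymous() is false and
		 * hugepage_vma_revalidate() must return SCAN_VMA_CHECK.
		 */
	}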
Reviewed-by: Zach O'Keefe <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
mm/khugepaged.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index db0b334a7d1f..5baa394e34c8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -960,7 +960,13 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
return SCAN_ADDRESS_RANGE;
if (!hugepage_vma_check(vma, vma->vm_flags))
return SCAN_VMA_CHECK;
- /* Anon VMA expected */
+ /*
+ * Anon VMA expected, the address may be unmapped then
+ * remapped to file after khugepaged reacquired the mmap_lock.
+ *
+ * hugepage_vma_check may return true for qualified file
+ * vmas.
+ */
if (!vma->anon_vma || !vma_is_anonymous(vma))
return SCAN_VMA_CHECK;
return 0;
--
2.26.3
The THPeligible bit shows 1 if and only if the VMA is eligible for
allocating THP and the THP is also PMD mappable. Some misaligned file
VMAs may be eligible for allocating THP but the THP can't be mapped by
PMD. Make this more explicit to avoid ambiguity.
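To make the distinction concrete, a small standalone sketch of the file-vma
alignment rule (assuming 4KiB pages and a 2MiB PMD; the helper mirrors the
file-vma part of transhuge_vma_suitable()):

	#include <stdbool.h>
	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define HPAGE_PMD_NR	512	/* 2MiB / 4KiB */

	/*
	 * PMD-aligned addresses in the VMA must map to PMD-aligned offsets
	 * within the file, else the huge page cannot be mapped by PMD.
	 */
	static bool pmd_mappable(unsigned long vm_start, unsigned long vm_pgoff)
	{
		return (((vm_start >> PAGE_SHIFT) - vm_pgoff) % HPAGE_PMD_NR) == 0;
	}

	int main(void)
	{
		/* 2MiB-aligned address at a 2MiB-aligned file offset */
		printf("%d\n", pmd_mappable(0x200000, 0));	/* 1 */
		/* same mapping shifted by one page: a THP may still be
		 * allocated for the page cache, but can't be PMD mapped */
		printf("%d\n", pmd_mappable(0x201000, 0));	/* 0 */
		return 0;
	}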
Reviewed-by: Zach O'Keefe <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
Documentation/filesystems/proc.rst | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 1bc91fb8c321..a5e41e636a1a 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -514,8 +514,10 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
"SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
does not take into account swapped out page of underlying shmem objects.
"Locked" indicates whether the mapping is locked in memory or not.
+
"THPeligible" indicates whether the mapping is eligible for allocating THP
-pages - 1 if true, 0 otherwise. It just shows the current status.
+pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
+It just shows the current status.
"VmFlags" field deserves a separate description. This member represents the
kernel flags associated with the particular virtual memory area in two letter
--
2.26.3
transparent_hugepage_active() was introduced to show the THP eligibility
bit in smaps in proc; smaps is its only user. But it actually does a
similar check to hugepage_vma_check(), which is used by khugepaged. We
definitely don't have to maintain two similar checks, so kill
transparent_hugepage_active().
This patch also fixes the wrong behavior for VM_NO_KHUGEPAGED vmas.
Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it
is not only for khugepaged anymore.
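A note on the VM_NO_KHUGEPAGED part, assuming the usual definition in
include/linux/mm.h:

	/*
	 * #define VM_NO_KHUGEPAGED	(VM_SPECIAL | VM_HUGETLB)
	 *
	 * transparent_hugepage_active() never tested these bits, so smaps
	 * could report "THPeligible: 1" for special mappings (e.g.
	 * VM_PFNMAP); the consolidated hugepage_vma_check() rejects them
	 * up front.
	 */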
Reviewed-by: Zach O'Keefe <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
fs/proc/task_mmu.c | 2 +-
include/linux/huge_mm.h | 16 +++++++-----
include/linux/khugepaged.h | 2 --
mm/huge_memory.c | 50 +++++++++++++++++++++++++++++++-------
mm/khugepaged.c | 48 +++---------------------------------
5 files changed, 56 insertions(+), 62 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 37ccb5c9f4f8..39a40ec181e7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -863,7 +863,7 @@ static int show_smap(struct seq_file *m, void *v)
__show_smap(m, &mss, false);
seq_printf(m, "THPeligible: %d\n",
- transparent_hugepage_active(vma));
+ hugepage_vma_check(vma, vma->vm_flags, true));
if (arch_pkeys_enabled())
seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8a5a8bfce0f5..64487bcd0c7b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -202,7 +202,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
!inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}
-bool transparent_hugepage_active(struct vm_area_struct *vma);
+bool hugepage_vma_check(struct vm_area_struct *vma,
+ unsigned long vm_flags,
+ bool smaps);
#define transparent_hugepage_use_zero_page() \
(transparent_hugepage_flags & \
@@ -351,11 +353,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
return false;
}
-static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
-{
- return false;
-}
-
static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
unsigned long addr)
{
@@ -368,6 +365,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
return false;
}
+static inline bool hugepage_vma_check(struct vm_area_struct *vma,
+ unsigned long vm_flags,
+ bool smaps)
+{
+ return false;
+}
+
static inline void prep_transhuge_page(struct page *page) {}
#define transparent_hugepage_flags 0UL
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 31ca8a7f78f4..ea5fd4c398f7 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group;
extern int khugepaged_init(void);
extern void khugepaged_destroy(void);
extern int start_stop_khugepaged(void);
-extern bool hugepage_vma_check(struct vm_area_struct *vma,
- unsigned long vm_flags);
extern void __khugepaged_enter(struct mm_struct *mm);
extern void __khugepaged_exit(struct mm_struct *mm);
extern void khugepaged_enter_vma(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b530462c4493..a28c6100b491 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -69,21 +69,53 @@ static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
unsigned long huge_zero_pfn __read_mostly = ~0UL;
-bool transparent_hugepage_active(struct vm_area_struct *vma)
+bool hugepage_vma_check(struct vm_area_struct *vma,
+ unsigned long vm_flags,
+ bool smaps)
{
- /* The addr is used to check if the vma size fits */
- unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
+ if (!transhuge_vma_enabled(vma, vm_flags))
+ return false;
+
+ if (vm_flags & VM_NO_KHUGEPAGED)
+ return false;
+
+ /* Don't run khugepaged against DAX vma */
+ if (vma_is_dax(vma))
+ return false;
- if (!transhuge_vma_suitable(vma, addr))
+ /* Check alignment for file vma and size for both file and anon vma */
+ if (!transhuge_vma_suitable(vma, (vma->vm_end - HPAGE_PMD_SIZE)))
return false;
- if (vma_is_anonymous(vma))
- return __transparent_hugepage_enabled(vma);
- if (vma_is_shmem(vma))
+
+ /* Enabled via shmem mount options or sysfs settings. */
+ if (shmem_file(vma->vm_file))
return shmem_huge_enabled(vma);
- if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
+
+ if (!khugepaged_enabled())
+ return false;
+
+ /* THP settings require madvise. */
+ if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
+ return false;
+
+ /* Only regular file is valid */
+ if (file_thp_enabled(vma))
return true;
- return false;
+ if (!vma_is_anonymous(vma))
+ return false;
+
+ if (vma_is_temporary_stack(vma))
+ return false;
+
+ /*
+ * THPeligible bit of smaps should show 1 for proper VMAs even
+ * though anon_vma is not initialized yet.
+ */
+ if (!vma->anon_vma)
+ return smaps;
+
+ return true;
}
static bool get_huge_zero_page(void)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5baa394e34c8..3afd87f8c0b1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -437,46 +437,6 @@ static inline int khugepaged_test_exit(struct mm_struct *mm)
return atomic_read(&mm->mm_users) == 0;
}
-bool hugepage_vma_check(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
- if (!transhuge_vma_enabled(vma, vm_flags))
- return false;
-
- if (vm_flags & VM_NO_KHUGEPAGED)
- return false;
-
- /* Don't run khugepaged against DAX vma */
- if (vma_is_dax(vma))
- return false;
-
- /* Check alignment for file vma and size for both file and anon vma */
- if (!transhuge_vma_suitable(vma, (vma->vm_end - HPAGE_PMD_SIZE)))
- return false;
-
- /* Enabled via shmem mount options or sysfs settings. */
- if (shmem_file(vma->vm_file))
- return shmem_huge_enabled(vma);
-
- if (!khugepaged_enabled())
- return false;
-
- /* THP settings require madvise. */
- if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
- return false;
-
- /* Only regular file is valid */
- if (file_thp_enabled(vma))
- return true;
-
- if (!vma->anon_vma || !vma_is_anonymous(vma))
- return false;
- if (vma_is_temporary_stack(vma))
- return false;
-
- return true;
-}
-
void __khugepaged_enter(struct mm_struct *mm)
{
struct mm_slot *mm_slot;
@@ -513,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
khugepaged_enabled()) {
- if (hugepage_vma_check(vma, vm_flags))
+ if (hugepage_vma_check(vma, vm_flags, false))
__khugepaged_enter(vma->vm_mm);
}
}
@@ -958,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
if (!transhuge_vma_suitable(vma, address))
return SCAN_ADDRESS_RANGE;
- if (!hugepage_vma_check(vma, vma->vm_flags))
+ if (!hugepage_vma_check(vma, vma->vm_flags, false))
return SCAN_VMA_CHECK;
/*
* Anon VMA expected, the address may be unmapped then
@@ -1448,7 +1408,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
* the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
* will not fail the vma for missing VM_HUGEPAGE
*/
- if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
+ if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
return;
/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2143,7 +2103,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
progress++;
break;
}
- if (!hugepage_vma_check(vma, vma->vm_flags)) {
+ if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
skip:
progress++;
continue;
--
2.26.3
The khugepaged_{enabled|always|req_madv} macros are not khugepaged-only
anymore; move them to huge_mm.h, rename them to hugepage_flags_xxx, and
remove khugepaged_req_madv since it has no users.
Also move khugepaged_defrag() to khugepaged.c since its only caller is in
that file; it doesn't have to be in a header.
Reviewed-by: Zach O'Keefe <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
include/linux/huge_mm.h | 8 ++++++++
include/linux/khugepaged.h | 14 --------------
mm/huge_memory.c | 4 ++--
mm/khugepaged.c | 18 +++++++++++-------
4 files changed, 21 insertions(+), 23 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index cd8a6c5d9fe5..ae3d8e2fd9e2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -116,6 +116,14 @@ extern struct kobj_attribute shmem_enabled_attr;
extern unsigned long transparent_hugepage_flags;
+#define hugepage_flags_enabled() \
+ (transparent_hugepage_flags & \
+ ((1<<TRANSPARENT_HUGEPAGE_FLAG) | \
+ (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)))
+#define hugepage_flags_always() \
+ (transparent_hugepage_flags & \
+ (1<<TRANSPARENT_HUGEPAGE_FLAG))
+
/*
* Do the below checks:
* - For file vma, check if the linear page offset of vma is
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index ea5fd4c398f7..384f034ae947 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -24,20 +24,6 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
}
#endif
-#define khugepaged_enabled() \
- (transparent_hugepage_flags & \
- ((1<<TRANSPARENT_HUGEPAGE_FLAG) | \
- (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)))
-#define khugepaged_always() \
- (transparent_hugepage_flags & \
- (1<<TRANSPARENT_HUGEPAGE_FLAG))
-#define khugepaged_req_madv() \
- (transparent_hugepage_flags & \
- (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
-#define khugepaged_defrag() \
- (transparent_hugepage_flags & \
- (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
-
static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d0c37d99917b..0f2cce2d7041 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -117,11 +117,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
if (!in_pf && shmem_file(vma->vm_file))
return shmem_huge_enabled(vma);
- if (!khugepaged_enabled())
+ if (!hugepage_flags_enabled())
return false;
/* THP settings require madvise. */
- if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
+ if (!(vm_flags & VM_HUGEPAGE) && !hugepage_flags_always())
return false;
/* Only regular file is valid */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2a676f37c921..d8ebb60aae36 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -472,7 +472,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
unsigned long vm_flags)
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
- khugepaged_enabled()) {
+ hugepage_flags_enabled()) {
if (hugepage_vma_check(vma, vm_flags, false, false))
__khugepaged_enter(vma->vm_mm);
}
@@ -763,6 +763,10 @@ static bool khugepaged_scan_abort(int nid)
return false;
}
+#define khugepaged_defrag() \
+ (transparent_hugepage_flags & \
+ (1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
+
/* Defrag for khugepaged will enter direct reclaim/compaction if necessary */
static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
{
@@ -860,7 +864,7 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
khugepaged_alloc_sleep();
} else
count_vm_event(THP_COLLAPSE_ALLOC);
- } while (unlikely(!hpage) && likely(khugepaged_enabled()));
+ } while (unlikely(!hpage) && likely(hugepage_flags_enabled()));
return hpage;
}
@@ -2186,7 +2190,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
static int khugepaged_has_work(void)
{
return !list_empty(&khugepaged_scan.mm_head) &&
- khugepaged_enabled();
+ hugepage_flags_enabled();
}
static int khugepaged_wait_event(void)
@@ -2251,7 +2255,7 @@ static void khugepaged_wait_work(void)
return;
}
- if (khugepaged_enabled())
+ if (hugepage_flags_enabled())
wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
}
@@ -2282,7 +2286,7 @@ static void set_recommended_min_free_kbytes(void)
int nr_zones = 0;
unsigned long recommended_min;
- if (!khugepaged_enabled()) {
+ if (!hugepage_flags_enabled()) {
calculate_min_free_kbytes();
goto update_wmarks;
}
@@ -2332,7 +2336,7 @@ int start_stop_khugepaged(void)
int err = 0;
mutex_lock(&khugepaged_mutex);
- if (khugepaged_enabled()) {
+ if (hugepage_flags_enabled()) {
if (!khugepaged_thread)
khugepaged_thread = kthread_run(khugepaged, NULL,
"khugepaged");
@@ -2358,7 +2362,7 @@ int start_stop_khugepaged(void)
void khugepaged_min_free_kbytes_update(void)
{
mutex_lock(&khugepaged_mutex);
- if (khugepaged_enabled() && khugepaged_thread)
+ if (hugepage_flags_enabled() && khugepaged_thread)
set_recommended_min_free_kbytes();
mutex_unlock(&khugepaged_mutex);
}
--
2.26.3
The page fault path checks THP eligibility with
__transparent_hugepage_enabled(), which does a similar check to
hugepage_vma_check(), so use hugepage_vma_check() instead.
However, the page fault path allows DAX and !anon_vma cases, so add a new
flag, in_pf, to hugepage_vma_check() to make the page fault path work
correctly.
The in_pf flag is also used to skip shmem and file THP for page fault,
since shmem handles THP in its own shmem_fault() and file THP allocation
on fault is not supported yet.
Also remove transhuge_vma_enabled() since hugepage_vma_check() is its only
caller now; it is not necessary to keep a separate helper.
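The net effect of the new flag, summarized from the hunks below (the
summary itself is not part of the patch):

	/*
	 * in_pf handling in hugepage_vma_check():
	 *   vma_is_dax(vma)        -> return in_pf;   fault-only
	 *   !in_pf && shmem_file() -> shmem path;     skipped at fault
	 *   !in_pf && file THP     -> return true;    skipped at fault
	 *   !vma->anon_vma         -> smaps || in_pf; anon_vma may not exist
	 *                                             until the first fault
	 * The size/alignment check is also skipped for in_pf since the huge
	 * fault handlers do their own check and it is not suitable for huge
	 * PUD faults.
	 */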
Reviewed-by: Zach O'Keefe <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
fs/proc/task_mmu.c | 2 +-
include/linux/huge_mm.h | 57 ++---------------------------------------
mm/huge_memory.c | 51 ++++++++++++++++++++++++++++--------
mm/khugepaged.c | 8 +++---
mm/memory.c | 7 +++--
5 files changed, 52 insertions(+), 73 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 39a40ec181e7..cef72e49acc5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -863,7 +863,7 @@ static int show_smap(struct seq_file *m, void *v)
__show_smap(m, &mss, false);
seq_printf(m, "THPeligible: %d\n",
- hugepage_vma_check(vma, vma->vm_flags, true));
+ hugepage_vma_check(vma, vma->vm_flags, true, false));
if (arch_pkeys_enabled())
seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 64487bcd0c7b..cd8a6c5d9fe5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -146,48 +146,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
return true;
}
-static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
- /* Explicitly disabled through madvise. */
- if ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
- return false;
- return true;
-}
-
-/*
- * to be used on vmas which are known to support THP.
- * Use transparent_hugepage_active otherwise
- */
-static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
-{
-
- /*
- * If the hardware/firmware marked hugepage support disabled.
- */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
- return false;
-
- if (!transhuge_vma_enabled(vma, vma->vm_flags))
- return false;
-
- if (vma_is_temporary_stack(vma))
- return false;
-
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
- return true;
-
- if (vma_is_dax(vma))
- return true;
-
- if (transparent_hugepage_flags &
- (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
- return !!(vma->vm_flags & VM_HUGEPAGE);
-
- return false;
-}
-
static inline bool file_thp_enabled(struct vm_area_struct *vma)
{
struct inode *inode;
@@ -204,7 +162,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
bool hugepage_vma_check(struct vm_area_struct *vma,
unsigned long vm_flags,
- bool smaps);
+ bool smaps, bool in_pf);
#define transparent_hugepage_use_zero_page() \
(transparent_hugepage_flags & \
@@ -348,26 +306,15 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
return false;
}
-static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
-{
- return false;
-}
-
static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
unsigned long addr)
{
return false;
}
-static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
- return false;
-}
-
static inline bool hugepage_vma_check(struct vm_area_struct *vma,
unsigned long vm_flags,
- bool smaps)
+ bool smaps, bool in_pf)
{
return false;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a28c6100b491..d0c37d99917b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -71,24 +71,50 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
bool hugepage_vma_check(struct vm_area_struct *vma,
unsigned long vm_flags,
- bool smaps)
+ bool smaps, bool in_pf)
{
- if (!transhuge_vma_enabled(vma, vm_flags))
+ /*
+ * Explicitly disabled through madvise or prctl, or some
+ * architectures may disable THP for some mappings, for
+ * example, s390 kvm.
+ */
+ if ((vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
return false;
-
- if (vm_flags & VM_NO_KHUGEPAGED)
+ /*
+ * If the hardware/firmware marked hugepage support disabled.
+ */
+ if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
return false;
- /* Don't run khugepaged against DAX vma */
+ /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
if (vma_is_dax(vma))
+ return in_pf;
+
+ /*
+ * Special VMA and hugetlb VMA.
+ * Must be checked after dax since some dax mappings may have
+ * VM_MIXEDMAP set.
+ */
+ if (vm_flags & VM_NO_KHUGEPAGED)
return false;
- /* Check alignment for file vma and size for both file and anon vma */
- if (!transhuge_vma_suitable(vma, (vma->vm_end - HPAGE_PMD_SIZE)))
+ /*
+ * Check alignment for file vma and size for both file and anon vma.
+ *
+ * Skip the check for page fault. Huge fault does the check in fault
+ * handlers. And this check is not suitable for huge PUD fault.
+ */
+ if (!in_pf &&
+ !transhuge_vma_suitable(vma, (vma->vm_end - HPAGE_PMD_SIZE)))
return false;
- /* Enabled via shmem mount options or sysfs settings. */
- if (shmem_file(vma->vm_file))
+ /*
+ * Enabled via shmem mount options or sysfs settings.
+ * Must be done before hugepage flags check since shmem has its
+ * own flags.
+ */
+ if (!in_pf && shmem_file(vma->vm_file))
return shmem_huge_enabled(vma);
if (!khugepaged_enabled())
@@ -99,7 +125,7 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
return false;
/* Only regular file is valid */
- if (file_thp_enabled(vma))
+ if (!in_pf && file_thp_enabled(vma))
return true;
if (!vma_is_anonymous(vma))
@@ -111,9 +137,12 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
/*
* THPeligible bit of smaps should show 1 for proper VMAs even
* though anon_vma is not initialized yet.
+ *
+ * Allow page fault since anon_vma may be not initialized until
+ * the first page fault.
*/
if (!vma->anon_vma)
- return smaps;
+ return (smaps || in_pf);
return true;
}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3afd87f8c0b1..2a676f37c921 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -473,7 +473,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
khugepaged_enabled()) {
- if (hugepage_vma_check(vma, vm_flags, false))
+ if (hugepage_vma_check(vma, vm_flags, false, false))
__khugepaged_enter(vma->vm_mm);
}
}
@@ -918,7 +918,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
if (!transhuge_vma_suitable(vma, address))
return SCAN_ADDRESS_RANGE;
- if (!hugepage_vma_check(vma, vma->vm_flags, false))
+ if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
return SCAN_VMA_CHECK;
/*
* Anon VMA expected, the address may be unmapped then
@@ -1408,7 +1408,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
* the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
* will not fail the vma for missing VM_HUGEPAGE
*/
- if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false))
+ if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
return;
/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
@@ -2103,7 +2103,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
progress++;
break;
}
- if (!hugepage_vma_check(vma, vma->vm_flags, false)) {
+ if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
skip:
progress++;
continue;
diff --git a/mm/memory.c b/mm/memory.c
index be724238a9d3..fee2884481f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4985,6 +4985,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
.gfp_mask = __get_fault_gfp_mask(vma),
};
struct mm_struct *mm = vma->vm_mm;
+ unsigned long vm_flags = vma->vm_flags;
pgd_t *pgd;
p4d_t *p4d;
vm_fault_t ret;
@@ -4998,7 +4999,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
if (!vmf.pud)
return VM_FAULT_OOM;
retry_pud:
- if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
+ if (pud_none(*vmf.pud) &&
+ hugepage_vma_check(vma, vm_flags, false, true)) {
ret = create_huge_pud(&vmf);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
@@ -5031,7 +5033,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
if (pud_trans_unstable(vmf.pud))
goto retry_pud;
- if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
+ if (pmd_none(*vmf.pmd) &&
+ hugepage_vma_check(vma, vm_flags, false, true)) {
ret = create_huge_pmd(&vmf);
if (!(ret & VM_FAULT_FALLBACK))
return ret;
--
2.26.3
Currently the THP flag check in hugepage_vma_check() will fall through if
the flag is NEVER and VM_HUGEPAGE is set. This is not a problem for now
since all the callers have the flag checked beforehand or can't be invoked
if the flag is NEVER.
However, a later patch will call hugepage_vma_check() in more places, for
example, the page fault path, so this flag must be checked in
hugepage_vma_check() itself.
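To spell out the fallthrough (a sketch of the pre-patch logic):

	/*
	 * With THP set to "never", khugepaged_enabled() is false, but
	 * before this patch the only remaining gate in
	 * hugepage_vma_check() was:
	 *
	 *	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
	 *		return false;
	 *
	 * For a MADV_HUGEPAGE vma, VM_HUGEPAGE is set, so the condition is
	 * false and the vma falls through even though THP is globally
	 * disabled.
	 */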
Reviewed-by: Zach O'Keefe <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
mm/khugepaged.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 476d79360101..b1dab94c0f1e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -458,6 +458,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
if (shmem_file(vma->vm_file))
return shmem_huge_enabled(vma);
+ if (!khugepaged_enabled())
+ return false;
+
/* THP settings require madvise. */
if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
return false;
--
2.26.3
There are a couple of places that check whether the vma size is OK for
THP or whether the address fits; they are open coded and duplicated. Use
transhuge_vma_suitable() to do the job by passing in (vma->vm_end -
HPAGE_PMD_SIZE).
Move the vma size check into hugepage_vma_check(). This makes
khugepaged_enter() the same as khugepaged_enter_vma(). There is just one
caller of khugepaged_enter(); replace it with khugepaged_enter_vma() and
remove khugepaged_enter().
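For reference, the open-coded mask arithmetic and the
round_up()/round_down() form used in the last hunk are equivalent; a small
standalone check (assuming a 2MiB PMD):

	#include <stdio.h>

	#define HPAGE_PMD_SIZE	0x200000UL
	#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))

	int main(void)
	{
		unsigned long vm_start = 0x7f0000001000UL;
		unsigned long vm_end   = 0x7f0000400000UL;

		/* old open-coded form */
		unsigned long hstart = (vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
		unsigned long hend   = vm_end & HPAGE_PMD_MASK;

		/* round_up()/round_down() equivalent */
		unsigned long hstart2 = (vm_start + HPAGE_PMD_SIZE - 1) & HPAGE_PMD_MASK;
		unsigned long hend2   = vm_end & HPAGE_PMD_MASK;

		/* prints: 7f0000200000 7f0000400000 1 */
		printf("%lx %lx %d\n", hstart, hend,
		       hstart == hstart2 && hend == hend2);
		return 0;
	}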
Reviewed-by: Zach O'Keefe <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
include/linux/huge_mm.h | 11 +++++++++++
include/linux/khugepaged.h | 14 --------------
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 19 ++++++-------------
4 files changed, 18 insertions(+), 28 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 648cb3ce7099..8a5a8bfce0f5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -116,6 +116,17 @@ extern struct kobj_attribute shmem_enabled_attr;
extern unsigned long transparent_hugepage_flags;
+/*
+ * Do the below checks:
+ * - For file vma, check if the linear page offset of vma is
+ * HPAGE_PMD_NR aligned within the file. The hugepage is
+ * guaranteed to be hugepage-aligned within the file, but we must
+ * check that the PMD-aligned addresses in the VMA map to
+ * PMD-aligned offsets within the file, else the hugepage will
+ * not be PMD-mappable.
+ * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE
+ * area.
+ */
static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
unsigned long addr)
{
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 392d34c3c59a..31ca8a7f78f4 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -51,16 +51,6 @@ static inline void khugepaged_exit(struct mm_struct *mm)
if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
__khugepaged_exit(mm);
}
-
-static inline void khugepaged_enter(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
- if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
- khugepaged_enabled()) {
- if (hugepage_vma_check(vma, vm_flags))
- __khugepaged_enter(vma->vm_mm);
- }
-}
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
@@ -68,10 +58,6 @@ static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm
static inline void khugepaged_exit(struct mm_struct *mm)
{
}
-static inline void khugepaged_enter(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
-}
static inline void khugepaged_enter_vma(struct vm_area_struct *vma,
unsigned long vm_flags)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4f9bbb4eab23..b530462c4493 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -726,7 +726,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
return VM_FAULT_FALLBACK;
if (unlikely(anon_vma_prepare(vma)))
return VM_FAULT_OOM;
- khugepaged_enter(vma, vma->vm_flags);
+ khugepaged_enter_vma(vma, vma->vm_flags);
if (!(vmf->flags & FAULT_FLAG_WRITE) &&
!mm_forbids_zeropage(vma->vm_mm) &&
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b1dab94c0f1e..db0b334a7d1f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -450,8 +450,8 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
if (vma_is_dax(vma))
return false;
- if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
- vma->vm_pgoff, HPAGE_PMD_NR))
+ /* Check alignment for file vma and size for both file and anon vma */
+ if (!transhuge_vma_suitable(vma, (vma->vm_end - HPAGE_PMD_SIZE)))
return false;
/* Enabled via shmem mount options or sysfs settings. */
@@ -512,9 +512,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
unsigned long vm_flags)
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
- khugepaged_enabled() &&
- (((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
- (vma->vm_end & HPAGE_PMD_MASK))) {
+ khugepaged_enabled()) {
if (hugepage_vma_check(vma, vm_flags))
__khugepaged_enter(vma->vm_mm);
}
@@ -950,7 +948,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
struct vm_area_struct **vmap)
{
struct vm_area_struct *vma;
- unsigned long hstart, hend;
if (unlikely(khugepaged_test_exit(mm)))
return SCAN_ANY_PROCESS;
@@ -959,9 +956,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
if (!vma)
return SCAN_VMA_NULL;
- hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
- hend = vma->vm_end & HPAGE_PMD_MASK;
- if (address < hstart || address + HPAGE_PMD_SIZE > hend)
+ if (!transhuge_vma_suitable(vma, address))
return SCAN_ADDRESS_RANGE;
if (!hugepage_vma_check(vma, vma->vm_flags))
return SCAN_VMA_CHECK;
@@ -2147,10 +2142,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
progress++;
continue;
}
- hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
- hend = vma->vm_end & HPAGE_PMD_MASK;
- if (hstart >= hend)
- goto skip;
+ hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
+ hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
if (khugepaged_scan.address > hend)
goto skip;
if (khugepaged_scan.address < hstart)
--
2.26.3
On 16 Jun 10:48, Yang Shi wrote:
>
> [...]
>
Series LGTM. Thanks for these cleanups, Yang!
Best,
Zach
On Thu, Jun 16, 2022 at 4:08 PM Zach O'Keefe <[email protected]> wrote:
>
> On 16 Jun 10:48, Yang Shi wrote:
> > [...]
>
> Series LGTM. Thanks for these cleanups, Yang!
Thank you so much for reviewing the patches.
>
> Best,
> Zach
On 16 Jun 10:48, Yang Shi wrote:
> [...]
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b530462c4493..a28c6100b491 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -69,21 +69,53 @@ static atomic_t huge_zero_refcount;
> struct page *huge_zero_page __read_mostly;
> unsigned long huge_zero_pfn __read_mostly = ~0UL;
>
> -bool transparent_hugepage_active(struct vm_area_struct *vma)
> +bool hugepage_vma_check(struct vm_area_struct *vma,
> + unsigned long vm_flags,
> + bool smaps)
> {
> - /* The addr is used to check if the vma size fits */
> - unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> + if (!transhuge_vma_enabled(vma, vm_flags))
> + return false;
> +
During testing my work on top this patch, I found a small bug here.
Namely, transhuge_vma_enabled() will check vma->vm_mm->flags (to see if
MMF_DISABLE_THP is set); however, for vDSO vmas, vma->vm_mm is NULL.
Previously, transparent_hugepage_active() in smaps path would check
transhuge_vma_suitable() before checking these flags, which would fail for vDSO
vma since we'd take the !vma_is_anonymous() branch and find the vma (most
likely) wasn't suitably aligned (by chance ?).
Anyways, I think we need to check vma->vm_mm.
> [...]
On 06/16/22 10:48, Yang Shi wrote:
> [...]
Running libhugetlbfs tests on next-20220621 produces the following:
[ 77.436038] BUG: kernel NULL pointer dereference, address: 0000000000000378
[ 77.437278] #PF: supervisor read access in kernel mode
[ 77.438211] #PF: error_code(0x0000) - not-present page
[ 77.439097] PGD 800000017a1a6067 P4D 800000017a1a6067 PUD 17f3b9067 PMD 0
[ 77.440021] Oops: 0000 [#7] PREEMPT SMP PTI
[ 77.440635] CPU: 1 PID: 2720 Comm: get_huge_pages Tainted: G D 5.19.0-rc3-next-20220621+ #22
[ 77.441973] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
[ 77.443115] RIP: 0010:hugepage_vma_check+0x15/0x170
[ 77.444021] Code: 01 e9 84 fd ff ff 48 89 d8 e9 14 ff ff ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 f7 c6 00 00 00 40 0f 85 fe 00 00 00 48 8b 47 10 <48> 8b 80 78 03 00 00 48 c1 e8 18 83 e0 01 0f 85 e6 00 00 00 4c 8b
[ 77.447327] RSP: 0018:ffffc900039dfd20 EFLAGS: 00010246
[ 77.448317] RAX: 0000000000000000 RBX: ffff88817e4b27a0 RCX: 0000000000000000
[ 77.449681] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff823f2000
[ 77.451040] RBP: ffffffff823f2000 R08: 0000000000000008 R09: ffff8881a3c042e6
[ 77.452353] R10: 00007ffe8b341000 R11: ffff8881a3c04526 R12: ffff88817e4b27a0
[ 77.453677] R13: ffffc900039dfd28 R14: ffff88817e4b27c8 R15: ffffffff823f2000
[ 77.455046] FS: 00007f0edc9880c0(0000) GS:ffff888277d00000(0000) knlGS:0000000000000000
[ 77.456625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 77.457745] CR2: 0000000000000378 CR3: 0000000179394006 CR4: 0000000000370ee0
[ 77.459936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 77.461308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 77.462477] Call Trace:
[ 77.462950] <TASK>
[ 77.463402] show_smap+0xed/0x1c0
[ 77.464019] seq_read_iter+0x2af/0x480
[ 77.464674] seq_read+0xeb/0x120
[ 77.465286] vfs_read+0x97/0x190
[ 77.465880] ksys_read+0x5f/0xe0
[ 77.466488] do_syscall_64+0x3b/0x90
[ 77.467155] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 77.468023] RIP: 0033:0x7f0edca7ade2
[ 77.468609] Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 77.471492] RSP: 002b:00007ffe8b324c28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 77.472845] RAX: ffffffffffffffda RBX: 00000000022f82a0 RCX: 00007f0edca7ade2
[ 77.474017] RDX: 0000000000000400 RSI: 00000000022f8480 RDI: 0000000000000003
[ 77.475184] RBP: 00007f0edcb4f320 R08: 0000000000000003 R09: 0000000000000000
[ 77.476379] R10: 00007f0edcaffac0 R11: 0000000000000246 R12: 00000000022f82a0
[ 77.477517] R13: 0000000000000d68 R14: 00007f0edcb4e720 R15: 0000000000000d68
[ 77.478590] </TASK>
[ 77.479012] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm joydev snd_timer 9p netfs 9pnet_virtio snd soundcore virtio_balloon 9pnet virtio_console virtio_blk virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_ring virtio_pci_legacy_dev virtio_pci_modern_dev fuse
[ 77.484573] CR2: 0000000000000378
[ 77.485123] ---[ end trace 0000000000000000 ]---
Looks to be related to this series. I'll start debugging unless someone
knows what the issue may be.
--
Mike Kravetz
On 21 Jun 14:07, Mike Kravetz wrote:
> On 06/16/22 10:48, Yang Shi wrote:
> > [...]
>
> Running libhugetlbfs tests on next-20220621 produces the following:
>
> [...]
>
>
> Looks to be related to this series. I'll start debugging unless someone
> knows what the issue may be.
> --
> Mike Kravetz
Thanks Mike,
I posted the issue in response to '[v5 PATCH 4/7] mm: thp: kill
transparent_hugepage_active()' of this series[1], where the issue was
introduced.
TLDR: we need to check vma->vm_mm because for vDSO vma, this is NULL.
[1] https://lore.kernel.org/linux-mm/[email protected]/
On Tue, 21 Jun 2022 11:58:34 -0700 "Zach O'Keefe" <[email protected]> wrote:
> > -bool transparent_hugepage_active(struct vm_area_struct *vma)
> > +bool hugepage_vma_check(struct vm_area_struct *vma,
> > + unsigned long vm_flags,
> > + bool smaps)
> > {
> > - /* The addr is used to check if the vma size fits */
> > - unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> > + if (!transhuge_vma_enabled(vma, vm_flags))
> > + return false;
> > +
>
> During testing my work on top this patch, I found a small bug here.
>
> Namely, transhuge_vma_enabled() will check vma->vm_mm->flags (to see if
> MMF_DISABLE_THP is set); however, for vDSO vmas, vma->vm_mm is NULL.
>
> Previously, transparent_hugepage_active() in smaps path would check
> transhuge_vma_suitable() before checking these flags, which would fail for vDSO
> vma since we'd take the !vma_is_anonymous() branch and find the vma (most
> likely) wasn't suitably aligned (by chance ?).
>
> Anyways, I think we need to check vma->vm_mm.
Like this?
--- a/mm/huge_memory.c~mm-thp-kill-transparent_hugepage_active-fix
+++ a/mm/huge_memory.c
@@ -73,6 +73,9 @@ bool hugepage_vma_check(struct vm_area_s
unsigned long vm_flags,
bool smaps)
{
+ if (!vma->vm_mm)
+ return false;
+
if (!transhuge_vma_enabled(vma, vm_flags))
return false;
_
On 21 Jun 17:54, Andrew Morton wrote:
> On Tue, 21 Jun 2022 11:58:34 -0700 "Zach O'Keefe" <[email protected]> wrote:
>
> > [...]
>
> Like this?
>
> --- a/mm/huge_memory.c~mm-thp-kill-transparent_hugepage_active-fix
> +++ a/mm/huge_memory.c
> @@ -73,6 +73,9 @@ bool hugepage_vma_check(struct vm_area_s
> unsigned long vm_flags,
> bool smaps)
> {
> + if (!vma->vm_mm)
> + return false;
> +
> if (!transhuge_vma_enabled(vma, vm_flags))
> return false;
>
> _
>
Hey Andrew,
In principle, yes that would fix this. I don't know precisely how this fix will
be applied, but note that the subsequent patch "mm: thp: kill
__transhuge_page_enabled()" won't apply on top of this automatically. Also, I
wonder if we should add a comment for future travellers who wonder what kind of
vmas don't have an associated mm (it was news to me); though, I'm not sure if
vDSO is the only such case (though show_map_vma() seems to think so), or if this is
just asking for stale comments down the road. Maybe it's fine as is.
Thanks,
Zach
On Thu, 16 Jun 2022 10:48:33 -0700 Yang Shi <[email protected]> wrote:
> [...]
I plan to move this into mm-stable later this week, along with two fixups:
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-thp-kill-transparent_hugepage_active-fix.patch
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-thp-kill-transparent_hugepage_active-fix-fix.patch
On Sun, Jul 3, 2022 at 4:14 PM Andrew Morton <[email protected]> wrote:
>
> On Thu, 16 Jun 2022 10:48:33 -0700 Yang Shi <[email protected]> wrote:
>
> > [...]
>
> I plan to move this into mm-stable later this week, along with two fixups:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-thp-kill-transparent_hugepage_active-fix.patch
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-thp-kill-transparent_hugepage_active-fix-fix.patch
Looks good to me. Thanks for the heads up.