2022-05-11 09:36:52

by Yang Shi

Subject: [mm-unstable v4 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent


Changelog
v4: * Incorporated Vlastimil's comments for patch 6/8.
* Reworked the commit log of patch 8/8 to make it clearer what the
  series fixes.
* Rebased onto mm-unstable tree.
* Collected the acks from Vlastimil.
v3: * Register mm to khugepaged in common mmap path instead of touching
filesystem code (patch 8/8).
* New patch 7/8 cleaned up and renamed khugepaged_enter_vma_merge()
to khugepaged_enter_vma().
* Collected acked-by tags from Song Liu for patches 1 ~ 6.
* Rebased on top of 5.18-rc1.
v2: * Collected reviewed-by tags from Miaohe Lin.
* Fixed build error for patch 4/8.

Readonly FS THP relies on khugepaged to collapse THP for suitable
vmas, but the behavior is inconsistent in "always" mode (https://lore.kernel.org/linux-mm/[email protected]/).

The "always" mode means THP allocation should be tried all the time and
khugepaged should try to collapse THP all the time. Of course the
allocation and collapse may fail due to other factors and conditions.

Currently, however, file THP may not be collapsed by khugepaged even
though all the conditions are met. That breaks the semantics of
"always" mode.

So make sure readonly FS vmas are registered with khugepaged to fix
the breakage.

Registering suitable vmas in the common mmap path covers both readonly
FS vmas and shmem vmas, so the open-coded khugepaged calls in shmem.c
can be removed.
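
A minimal sketch of the idea, assuming the helper introduced in patch
7/8; the exact hunk and its placement are in patch 8/8, so treat this
as illustrative rather than the literal diff:

	/*
	 * In the common mmap path (e.g. near the end of mmap_region()),
	 * once the vma is fully set up, let the helper decide whether
	 * the vma is khugepaged-suitable and register the mm if so.
	 */
	khugepaged_enter_vma(vma, vma->vm_flags);

Since the suitability checks live inside the helper, no filesystem
code needs to call khugepaged directly.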

Patches 1 ~ 7 are minor bug fixes, cleanups and preparation; patch 8
is the real meat.


Tested with the khugepaged test in selftests and with the testcase
provided by Vlastimil Babka in
https://lore.kernel.org/lkml/[email protected]/,
with the MADV_HUGEPAGE call commented out.


Yang Shi (8):
sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE
mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED
mm: khugepaged: skip DAX vma
mm: thp: only regular file could be THP eligible
mm: khugepaged: make khugepaged_enter() void function
mm: khugepaged: make hugepage_vma_check() non-static
mm: khugepaged: introduce khugepaged_enter_vma() helper
mm: mmap: register suitable readonly file vmas for khugepaged

 include/linux/huge_mm.h        | 14 ++++++++++++++
 include/linux/khugepaged.h     | 44 ++++++++++++++++++--------------------------
 include/linux/sched/coredump.h |  3 ++-
 kernel/fork.c                  |  4 +---
 mm/huge_memory.c               | 15 ++++-----------
 mm/khugepaged.c                | 61 ++++++++++++++++++++++++++-----------------------------------
 mm/mmap.c                      | 18 ++++++++++++------
 mm/shmem.c                     | 12 ------------
 8 files changed, 77 insertions(+), 94 deletions(-)



2022-05-11 09:44:32

by Yang Shi

Subject: [v4 PATCH 3/8] mm: khugepaged: skip DAX vma

A DAX vma may be seen by khugepaged when the mm has other
khugepaged-suitable vmas, so khugepaged may try to collapse THP for
the DAX vma. The attempt will fail in the page sanity checks, however,
for example because the page is not on the LRU.

So it is not harmful, but it is definitely pointless, to run
khugepaged against a DAX vma. Skip such vmas in the early check.
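
For reference, a sketch of the kind of sanity check that trips the
collapse for DAX pages; a check along these lines lives in the
collapse path (e.g. __collapse_huge_page_isolate()), though the exact
code may differ:

	/* DAX pages are never on the LRU, so the collapse bails out */
	if (!PageLRU(page)) {
		result = SCAN_PAGE_LRU;
		goto out;
	}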

Reviewed-by: Miaohe Lin <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
mm/khugepaged.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index dc8849d9dde4..a2380d88c3ea 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -447,6 +447,10 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
if (vm_flags & VM_NO_KHUGEPAGED)
return false;

+ /* Don't run khugepaged against DAX vma */
+ if (vma_is_dax(vma))
+ return false;
+
if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
vma->vm_pgoff, HPAGE_PMD_NR))
return false;
--
2.26.3


2022-05-11 10:02:55

by Yang Shi

Subject: [v4 PATCH 4/8] mm: thp: only regular file could be THP eligible

Since commit a4aeaa06d45e ("mm: khugepaged: skip huge page collapse for
special files"), khugepaged only collapses THP for regular files, which
is the intended use case for readonly FS THP. Accordingly, only show
regular files as THP eligible.

Also make file_thp_enabled() available to khugepaged in order to
remove the duplicated code.
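
For reference, an annotated copy of the new helper (the comments are
editorial, not part of the patch):

	static inline bool file_thp_enabled(struct vm_area_struct *vma)
	{
		struct inode *inode;

		/* Anonymous and special mappings have no file to collapse */
		if (!vma->vm_file)
			return false;

		inode = vma->vm_file->f_inode;

		/*
		 * Readonly FS THP applies when support is compiled in, the
		 * mapping is executable, the file is not open for write
		 * anywhere, and the file is a regular file.
		 */
		return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) &&
		       (vma->vm_flags & VM_EXEC) &&
		       !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
	}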

Acked-by: Song Liu <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
 include/linux/huge_mm.h | 14 ++++++++++++++
 mm/huge_memory.c        | 11 ++---------
 mm/khugepaged.c         |  9 ++-------
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fbf36bb1be22..de29821231c9 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -173,6 +173,20 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
return false;
}

+static inline bool file_thp_enabled(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+
+ inode = vma->vm_file->f_inode;
+
+ return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) &&
+ (vma->vm_flags & VM_EXEC) &&
+ !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+}
+
bool transparent_hugepage_active(struct vm_area_struct *vma);

#define transparent_hugepage_use_zero_page() \
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d0c26a3b3b17..82434a9d4499 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -69,13 +69,6 @@ static atomic_t huge_zero_refcount;
struct page *huge_zero_page __read_mostly;
unsigned long huge_zero_pfn __read_mostly = ~0UL;

-static inline bool file_thp_enabled(struct vm_area_struct *vma)
-{
- return transhuge_vma_enabled(vma, vma->vm_flags) && vma->vm_file &&
- !inode_is_open_for_write(vma->vm_file->f_inode) &&
- (vma->vm_flags & VM_EXEC);
-}
-
bool transparent_hugepage_active(struct vm_area_struct *vma)
{
/* The addr is used to check if the vma size fits */
@@ -87,8 +80,8 @@ bool transparent_hugepage_active(struct vm_area_struct *vma)
return __transparent_hugepage_enabled(vma);
if (vma_is_shmem(vma))
return shmem_huge_enabled(vma);
- if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
- return file_thp_enabled(vma);
+ if (transhuge_vma_enabled(vma, vma->vm_flags) && file_thp_enabled(vma))
+ return true;

return false;
}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a2380d88c3ea..c0d3215008ba 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -464,13 +464,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
return false;

/* Only regular file is valid */
- if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
- (vm_flags & VM_EXEC)) {
- struct inode *inode = vma->vm_file->f_inode;
-
- return !inode_is_open_for_write(inode) &&
- S_ISREG(inode->i_mode);
- }
+ if (file_thp_enabled(vma))
+ return true;

if (!vma->anon_vma || !vma_is_anonymous(vma))
return false;
--
2.26.3


2022-05-11 11:32:33

by Yang Shi

Subject: [v4 PATCH 7/8] mm: khugepaged: introduce khugepaged_enter_vma() helper

khugepaged_enter_vma_merge() actually does the same thing as the
open-coded khugepaged_enter() calls in shmem_mmap() and
shmem_zero_setup(), so consolidate them into one helper and rename it
to khugepaged_enter_vma().
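
Roughly, the consolidated helper behaves like the sketch below; this
is illustrative (the real version in the mm/khugepaged.c hunk folds
the conditions into a single test and may check more, e.g. that the
vma is large enough to hold a PMD-sized page):

	void khugepaged_enter_vma(struct vm_area_struct *vma,
				  unsigned long vm_flags)
	{
		/* The mm is already registered with khugepaged */
		if (test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
			return;
		if (!khugepaged_enabled())
			return;
		/* Run the common suitability check, then register */
		if (hugepage_vma_check(vma, vm_flags))
			__khugepaged_enter(vma->vm_mm);
	}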

Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
 include/linux/khugepaged.h | 8 ++++----
 mm/khugepaged.c            | 6 +++---
 mm/mmap.c                  | 12 ++++++------
 mm/shmem.c                 | 12 ++----------
 4 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index c340f6bb39d6..392d34c3c59a 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -14,8 +14,8 @@ extern bool hugepage_vma_check(struct vm_area_struct *vma,
unsigned long vm_flags);
extern void __khugepaged_enter(struct mm_struct *mm);
extern void __khugepaged_exit(struct mm_struct *mm);
-extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
- unsigned long vm_flags);
+extern void khugepaged_enter_vma(struct vm_area_struct *vma,
+ unsigned long vm_flags);
extern void khugepaged_min_free_kbytes_update(void);
#ifdef CONFIG_SHMEM
extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
@@ -72,8 +72,8 @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
unsigned long vm_flags)
{
}
-static inline void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
- unsigned long vm_flags)
+static inline void khugepaged_enter_vma(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
}
static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index dec449339964..32db587c5224 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -365,7 +365,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
* register it here without waiting a page fault that
* may not happen any time soon.
*/
- khugepaged_enter_vma_merge(vma, *vm_flags);
+ khugepaged_enter_vma(vma, *vm_flags);
break;
case MADV_NOHUGEPAGE:
*vm_flags &= ~VM_HUGEPAGE;
@@ -505,8 +505,8 @@ void __khugepaged_enter(struct mm_struct *mm)
wake_up_interruptible(&khugepaged_wait);
}

-void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
- unsigned long vm_flags)
+void khugepaged_enter_vma(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
khugepaged_enabled() &&
diff --git a/mm/mmap.c b/mm/mmap.c
index 3445a8c304af..34ff1600426c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1122,7 +1122,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
end, prev->vm_pgoff, NULL, prev);
if (err)
return NULL;
- khugepaged_enter_vma_merge(prev, vm_flags);
+ khugepaged_enter_vma(prev, vm_flags);
return prev;
}

@@ -1149,7 +1149,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
}
if (err)
return NULL;
- khugepaged_enter_vma_merge(area, vm_flags);
+ khugepaged_enter_vma(area, vm_flags);
return area;
}

@@ -2046,7 +2046,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
anon_vma_unlock_write(vma->anon_vma);
- khugepaged_enter_vma_merge(vma, vma->vm_flags);
+ khugepaged_enter_vma(vma, vma->vm_flags);
return error;
}
#endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
@@ -2127,7 +2127,7 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
}
}
anon_vma_unlock_write(vma->anon_vma);
- khugepaged_enter_vma_merge(vma, vma->vm_flags);
+ khugepaged_enter_vma(vma, vma->vm_flags);
return error;
}

@@ -2635,7 +2635,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/* Actually expand, if possible */
if (vma &&
!vma_expand(&mas, vma, merge_start, merge_end, vm_pgoff, next)) {
- khugepaged_enter_vma_merge(vma, vm_flags);
+ khugepaged_enter_vma(vma, vm_flags);
goto expanded;
}

@@ -3051,7 +3051,7 @@ static int do_brk_flags(struct ma_state *mas, struct vm_area_struct *vma,
anon_vma_interval_tree_post_update_vma(vma);
anon_vma_unlock_write(vma->anon_vma);
}
- khugepaged_enter_vma_merge(vma, flags);
+ khugepaged_enter_vma(vma, flags);
goto out;
}

diff --git a/mm/shmem.c b/mm/shmem.c
index 29701be579f8..89f6f4fec3f9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2232,11 +2232,7 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)

file_accessed(file);
vma->vm_ops = &shmem_vm_ops;
- if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
- ((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
- (vma->vm_end & HPAGE_PMD_MASK)) {
- khugepaged_enter(vma, vma->vm_flags);
- }
+ khugepaged_enter_vma(vma, vma->vm_flags);
return 0;
}

@@ -4137,11 +4133,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
vma->vm_file = file;
vma->vm_ops = &shmem_vm_ops;

- if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
- ((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
- (vma->vm_end & HPAGE_PMD_MASK)) {
- khugepaged_enter(vma, vma->vm_flags);
- }
+ khugepaged_enter_vma(vma, vma->vm_flags);

return 0;
}
--
2.26.3