2022-03-01 00:12:32

by Yang Shi

Subject: [PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent


The readonly FS THP relies on khugepaged to collapse THP for suitable
vmas. But it is kind of "random luck" for khugepaged to see the
readonly FS vmas (see report:
https://lore.kernel.org/linux-mm/[email protected]/)
since currently the vmas are only registered to khugepaged when:
- Anon huge pmd page fault
- VMA merge
- MADV_HUGEPAGE
- Shmem mmap

If none of the above conditions is met, khugepaged won't see readonly FS
vmas at all, even though khugepaged is enabled. MADV_HUGEPAGE could be
specified explicitly to tell khugepaged to collapse such an area, but when
the khugepaged mode is "always" it should scan all suitable vmas as long
as VM_NOHUGEPAGE is not set.

So make sure readonly FS vmas are registered to khugepaged to make the
behavior more consistent.

Registering the vmas in the mmap path seems preferable from a performance
point of view since the page fault path is definitely a hot path.
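
As an illustration, here is a minimal sketch of what the per-filesystem
hook ends up looking like (the function name and vm_ops are made up for
the example; khugepaged_enter_file() is the helper added by this series,
and patch 8 has the real ext4/xfs changes):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/khugepaged.h>

/* Hypothetical ->mmap handler; ext4/xfs do the equivalent in patch 8. */
static int example_file_mmap(struct file *file, struct vm_area_struct *vma)
{
	file_accessed(file);
	vma->vm_ops = &generic_file_vm_ops;
	/*
	 * Register the mm with khugepaged at mmap time instead of relying
	 * on a later anon page fault, VMA merge or MADV_HUGEPAGE.
	 */
	khugepaged_enter_file(vma, vma->vm_flags);
	return 0;
}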


Patches 1 ~ 7 are minor bug fixes, cleanups and preparation patches.
Patch 8 converts ext4 and xfs. We may need to convert more filesystems,
but I'd like to hear some comments before doing that.


Tested with the khugepaged test in selftests and the testcase provided by
Vlastimil Babka in https://lore.kernel.org/lkml/[email protected]/
with the MADV_HUGEPAGE call commented out.


b/fs/ext4/file.c | 4 +++
b/fs/xfs/xfs_file.c | 4 +++
b/include/linux/huge_mm.h | 9 +++++++
b/include/linux/khugepaged.h | 69 +++++++++++++++++++++----------------------------------------
b/include/linux/sched/coredump.h | 3 +-
b/kernel/fork.c | 4 ---
b/mm/huge_memory.c | 15 +++----------
b/mm/khugepaged.c | 71 ++++++++++++++++++++++++++++++++++++++++++++-------------------
b/mm/shmem.c | 14 +++---------
9 files changed, 102 insertions(+), 91 deletions(-)



2022-03-01 00:25:40

by Yang Shi

Subject: [PATCH 1/8] sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE

MMF_VM_HUGEPAGE is set by khugepaged_enter() as long as the mm is
available for khugepaged, not only when VM_HUGEPAGE is set on a vma.
Correct the comment to avoid confusion.
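
For context, the flag is just a fast-path gate for mms that have been
hooked by khugepaged; e.g. khugepaged_exit() (shown here condensed) only
does real work when the bit is set:

void khugepaged_exit(struct mm_struct *mm)
{
	/* Only mms registered via __khugepaged_enter() have the bit set. */
	if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
		__khugepaged_exit(mm);
}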

Signed-off-by: Yang Shi <[email protected]>
---
include/linux/sched/coredump.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 4d9e3a656875..4d0a5be28b70 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -57,7 +57,8 @@ static inline int get_dumpable(struct mm_struct *mm)
#endif
/* leave room for more dump flags */
#define MMF_VM_MERGEABLE 16 /* KSM may merge identical pages */
-#define MMF_VM_HUGEPAGE 17 /* set when VM_HUGEPAGE is set on vma */
+#define MMF_VM_HUGEPAGE 17 /* set when mm is available for
+ khugepaged */
/*
* This one-shot flag is dropped due to necessity of changing exe once again
* on NFS restore
--
2.26.3

2022-03-01 00:27:48

by Yang Shi

Subject: [PATCH 6/8] mm: khugepaged: move some khugepaged_* functions to khugepaged.c

This move also makes the following patches easier. Those patches will
call khugepaged_enter() for regular filesystems to make readonly FS THP
collapse more consistent. They would need some macros defined in
huge_mm.h, for example HPAGE_PMD_*, but it seems undesirable to pollute
filesystem code with unnecessary header files. With this move the
filesystem code just needs to include khugepaged.h, which is quite small
and quite specific, in order to call khugepaged_enter() and hook the mm
up with khugepaged.

The khugepaged_* functions are really just wrappers around some
non-inline functions, so keeping them inline does not buy much.

This also helps reuse hugepage_vma_check() for khugepaged_enter() so
that some duplicate checks can be removed.
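
After the move, khugepaged.h boils down to roughly the following shape
(a condensed sketch, not the literal header):

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm);
extern void khugepaged_exit(struct mm_struct *mm);
extern void khugepaged_enter(struct vm_area_struct *vma,
			     unsigned long vm_flags);
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
}
static inline void khugepaged_exit(struct mm_struct *mm)
{
}
static inline void khugepaged_enter(struct vm_area_struct *vma,
				    unsigned long vm_flags)
{
}
#endif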

Signed-off-by: Yang Shi <[email protected]>
---
include/linux/khugepaged.h | 33 ++++++---------------------------
mm/khugepaged.c | 20 ++++++++++++++++++++
2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 0423d3619f26..54e169116d49 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -16,6 +16,12 @@ extern void __khugepaged_enter(struct mm_struct *mm);
extern void __khugepaged_exit(struct mm_struct *mm);
extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma,
unsigned long vm_flags);
+extern void khugepaged_fork(struct mm_struct *mm,
+ struct mm_struct *oldmm);
+extern void khugepaged_exit(struct mm_struct *mm);
+extern void khugepaged_enter(struct vm_area_struct *vma,
+ unsigned long vm_flags);
+
extern void khugepaged_min_free_kbytes_update(void);
#ifdef CONFIG_SHMEM
extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
@@ -33,36 +39,9 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
#define khugepaged_always() \
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_FLAG))
-#define khugepaged_req_madv() \
- (transparent_hugepage_flags & \
- (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
#define khugepaged_defrag() \
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG))
-
-static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
-{
- if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
- __khugepaged_enter(mm);
-}
-
-static inline void khugepaged_exit(struct mm_struct *mm)
-{
- if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
- __khugepaged_exit(mm);
-}
-
-static inline void khugepaged_enter(struct vm_area_struct *vma,
- unsigned long vm_flags)
-{
- if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
- if ((khugepaged_always() ||
- (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
- (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
- !(vm_flags & VM_NOHUGEPAGE) &&
- !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
- __khugepaged_enter(vma->vm_mm);
-}
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b87af297e652..4cb4379ecf25 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -557,6 +557,26 @@ void __khugepaged_exit(struct mm_struct *mm)
}
}

+void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+{
+ if (test_bit(MMF_VM_HUGEPAGE, &oldmm->flags))
+ __khugepaged_enter(mm);
+}
+
+void khugepaged_exit(struct mm_struct *mm)
+{
+ if (test_bit(MMF_VM_HUGEPAGE, &mm->flags))
+ __khugepaged_exit(mm);
+}
+
+void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags)
+{
+ if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
+ khugepaged_enabled())
+ if (hugepage_vma_check(vma, vm_flags))
+ __khugepaged_enter(vma->vm_mm);
+}
+
static void release_pte_page(struct page *page)
{
mod_node_page_state(page_pgdat(page),
--
2.26.3

2022-03-01 00:30:56

by Yang Shi

Subject: [PATCH 8/8] fs: register suitable readonly vmas for khugepaged

The readonly FS THP relies on khugepaged to collapse THP for suitable
vmas. But it is kind of "random luck" for khugepaged to see the
readonly FS vmas (https://lore.kernel.org/linux-mm/[email protected]/)
since currently the vmas are only registered to khugepaged when:
- Anon huge pmd page fault
- VMA merge
- MADV_HUGEPAGE
- Shmem mmap

If none of the above conditions is met, khugepaged won't see readonly FS
vmas at all, even though khugepaged is enabled. MADV_HUGEPAGE could be
specified explicitly to tell khugepaged to collapse such an area, but when
the khugepaged mode is "always" it should scan all suitable vmas as long
as VM_NOHUGEPAGE is not set.

So make sure readonly FS vmas are registered to khugepaged to make the
behavior more consistent.

Registering the vmas in the mmap path seems preferable from a performance
point of view since the page fault path is definitely a hot path.
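
For reference, the readonly-file test these vmas ultimately have to pass
is the CONFIG_READ_ONLY_THP_FOR_FS branch of hugepage_vma_check() (an
excerpt, not a complete function):

/*
 * Only an executable mapping of a regular file that nobody has open
 * for write qualifies for readonly FS THP collapse.
 */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
    (vm_flags & VM_EXEC)) {
	struct inode *inode = vma->vm_file->f_inode;

	return !inode_is_open_for_write(inode) &&
		S_ISREG(inode->i_mode);
}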

Reported-by: Vlastimil Babka <[email protected]>
Signed-off-by: Yang Shi <[email protected]>
---
fs/ext4/file.c | 4 ++++
fs/xfs/xfs_file.c | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8cc11715518a..b894cd5aff44 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -30,6 +30,7 @@
#include <linux/uio.h>
#include <linux/mman.h>
#include <linux/backing-dev.h>
+#include <linux/khugepaged.h>
#include "ext4.h"
#include "ext4_jbd2.h"
#include "xattr.h"
@@ -782,6 +783,9 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
} else {
vma->vm_ops = &ext4_file_vm_ops;
}
+
+ khugepaged_enter_file(vma, vma->vm_flags);
+
return 0;
}

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5bddb1e9e0b3..d94144b1fb0f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -30,6 +30,7 @@
#include <linux/mman.h>
#include <linux/fadvise.h>
#include <linux/mount.h>
+#include <linux/khugepaged.h>

static const struct vm_operations_struct xfs_file_vm_ops;

@@ -1407,6 +1408,9 @@ xfs_file_mmap(
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(inode))
vma->vm_flags |= VM_HUGEPAGE;
+
+ khugepaged_enter_file(vma, vma->vm_flags);
+
return 0;
}

--
2.26.3

2022-03-01 00:46:08

by Yang Shi

Subject: [PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED

The hugepage_vma_check() called by khugepaged_enter_vma_merge() already
checks VM_NO_KHUGEPAGED. Remove the check from the caller and move the
check in hugepage_vma_check() up.

A few more checks may be run for VM_NO_KHUGEPAGED vmas, but
MADV_HUGEPAGE is definitely not a hot path, so the cleaner code
outweighs the cost.

Signed-off-by: Yang Shi <[email protected]>
---
mm/khugepaged.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 131492fd1148..82c71c6da9ce 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -366,8 +366,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
* register it here without waiting a page fault that
* may not happen any time soon.
*/
- if (!(*vm_flags & VM_NO_KHUGEPAGED) &&
- khugepaged_enter_vma_merge(vma, *vm_flags))
+ if (khugepaged_enter_vma_merge(vma, *vm_flags))
return -ENOMEM;
break;
case MADV_NOHUGEPAGE:
@@ -446,6 +445,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
if (!transhuge_vma_enabled(vma, vm_flags))
return false;

+ if (vm_flags & VM_NO_KHUGEPAGED)
+ return false;
+
if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
vma->vm_pgoff, HPAGE_PMD_NR))
return false;
@@ -471,7 +473,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
return false;
if (vma_is_temporary_stack(vma))
return false;
- return !(vm_flags & VM_NO_KHUGEPAGED);
+
+ return true;
}

int __khugepaged_enter(struct mm_struct *mm)
--
2.26.3

2022-03-03 00:04:20

by Yang Shi

Subject: Re: [PATCH 2/8] mm: khugepaged: remove redundant check for VM_NO_KHUGEPAGED

On Tue, Mar 1, 2022 at 1:07 AM Miaohe Lin <[email protected]> wrote:
>
> On 2022/3/1 7:57, Yang Shi wrote:
> > The hugepage_vma_check() called by khugepaged_enter_vma_merge() already
> > checks VM_NO_KHUGEPAGED. Remove the check from the caller and move the
> > check in hugepage_vma_check() up.
> >
> > A few more checks may be run for VM_NO_KHUGEPAGED vmas, but
> > MADV_HUGEPAGE is definitely not a hot path, so the cleaner code
> > outweighs the cost.
> >
> > Signed-off-by: Yang Shi <[email protected]>
> > ---
> > mm/khugepaged.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 131492fd1148..82c71c6da9ce 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -366,8 +366,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
> > * register it here without waiting a page fault that
> > * may not happen any time soon.
> > */
> > - if (!(*vm_flags & VM_NO_KHUGEPAGED) &&
> > - khugepaged_enter_vma_merge(vma, *vm_flags))
> > + if (khugepaged_enter_vma_merge(vma, *vm_flags))
> > return -ENOMEM;
> > break;
> > case MADV_NOHUGEPAGE:
> > @@ -446,6 +445,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
> > if (!transhuge_vma_enabled(vma, vm_flags))
> > return false;
> >
> > + if (vm_flags & VM_NO_KHUGEPAGED)
> > + return false;
> > +
>
> This patch does improve the readability. But I have a question.
> It seems VM_NO_KHUGEPAGED is not checked in the below if-condition:
>
> /* Only regular file is valid */
> if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
> (vm_flags & VM_EXEC)) {
> struct inode *inode = vma->vm_file->f_inode;
>
> return !inode_is_open_for_write(inode) &&
> S_ISREG(inode->i_mode);
> }
>
> If we return false due to VM_NO_KHUGEPAGED here, it seems it will affect the
> return value of this CONFIG_READ_ONLY_THP_FOR_FS condition check.
> Or am I missing something?

Yes, it will return false instead of true if that file THP check is
true, but wasn't the old behavior actually problematic? Khugepaged
definitely can't collapse VM_NO_KHUGEPAGED vmas even though they
satisfy all the readonly file THP checks. With the old behavior
khugepaged may scan an exec file hugetlb vma IIUC, although it will
fail later due to other page sanity checks, i.e. the page compound check.
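
To make that concrete, a condensed view of the two orderings in
hugepage_vma_check() (snippets trimmed for brevity):

/* Old order: the readonly-file branch returns before VM_NO_KHUGEPAGED
 * is ever looked at, so such a vma is reported as suitable. */
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && vma->vm_file &&
    (vm_flags & VM_EXEC)) {
	struct inode *inode = vma->vm_file->f_inode;

	return !inode_is_open_for_write(inode) &&
		S_ISREG(inode->i_mode);
}
/* ... */
return !(vm_flags & VM_NO_KHUGEPAGED);

/* New order: VM_NO_KHUGEPAGED is rejected up front, before that branch
 * can return true. */
if (!transhuge_vma_enabled(vma, vm_flags))
	return false;
if (vm_flags & VM_NO_KHUGEPAGED)
	return false;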

>
> Thanks.
>
> > if (vma->vm_file && !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) -
> > vma->vm_pgoff, HPAGE_PMD_NR))
> > return false;
> > @@ -471,7 +473,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
> > return false;
> > if (vma_is_temporary_stack(vma))
> > return false;
> > - return !(vm_flags & VM_NO_KHUGEPAGED);
> > +
> > + return true;
> > }
> >
> > int __khugepaged_enter(struct mm_struct *mm)
> >
>
>