2022-06-11 09:30:08

by Miaohe Lin

Subject: [PATCH 0/7] A few cleanup patches for khugepaged

Hi everyone,
This series contains a few cleanup patches to remove an unneeded return
value, use a helper macro, fix typos and so on. More details can be
found in the respective changelogs. Thanks!

Miaohe Lin (7):
mm/khugepaged: remove unneeded shmem_huge_enabled() check
mm/khugepaged: stop swapping in page when VM_FAULT_RETRY occurs
mm/khugepaged: trivial typo and codestyle cleanup
mm/khugepaged: minor cleanup for collapse_file
mm/khugepaged: use helper macro __ATTR_RW
mm/khugepaged: remove unneeded return value of
khugepaged_add_pte_mapped_thp()
mm/khugepaged: try to free transhuge swapcache when possible

include/linux/swap.h | 5 +++
mm/khugepaged.c | 85 +++++++++++++++++++-------------------------
mm/swap.h | 5 ---
3 files changed, 41 insertions(+), 54 deletions(-)

--
2.23.0


2022-06-11 09:33:23

by Miaohe Lin

Subject: [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file

nr_none is always 0 for non-shmem case because the page can be read from
the backend store. So when nr_none != 0, it must be in is_shmem case.
Also only adjust the nrpages and uncharge shmem when nr_none != 0 to save
cpu cycles.

Signed-off-by: Miaohe Lin <[email protected]>
---
mm/khugepaged.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1b5dd3820eac..8e6fad7c7bd9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,

if (nr_none) {
__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
- if (is_shmem)
- __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
+ __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
}

/* Join all the small entries into a single multi-index entry */
@@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,

/* Something went wrong: roll back page cache changes */
xas_lock_irq(&xas);
- mapping->nrpages -= nr_none;
-
- if (is_shmem)
+ if (nr_none) {
+ mapping->nrpages -= nr_none;
shmem_uncharge(mapping->host, nr_none);
+ }

xas_set(&xas, start);
xas_for_each(&xas, page, end - 1) {
--
2.23.0

2022-06-11 09:42:58

by Miaohe Lin

Subject: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check

If we reach here, hugepage_vma_check() has already made sure that hugepage
is enabled for shmem. Remove this duplicated check.

Signed-off-by: Miaohe Lin <[email protected]>
---
mm/khugepaged.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 476d79360101..73570dfffcec 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2153,8 +2153,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
if (khugepaged_scan.address < hstart)
khugepaged_scan.address = hstart;
VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
- if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
- goto skip;

while (khugepaged_scan.address < hend) {
int ret;
--
2.23.0

2022-06-11 09:45:19

by Miaohe Lin

Subject: [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup

Fix some typos and tweak the code to meet codestyle. No functional
change intended.

Signed-off-by: Miaohe Lin <[email protected]>
---
mm/khugepaged.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a8adb2d1e9c6..1b5dd3820eac 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -260,7 +260,7 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
unsigned long max_ptes_none;

err = kstrtoul(buf, 10, &max_ptes_none);
- if (err || max_ptes_none > HPAGE_PMD_NR-1)
+ if (err || max_ptes_none > HPAGE_PMD_NR - 1)
return -EINVAL;

khugepaged_max_ptes_none = max_ptes_none;
@@ -286,7 +286,7 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
unsigned long max_ptes_swap;

err = kstrtoul(buf, 10, &max_ptes_swap);
- if (err || max_ptes_swap > HPAGE_PMD_NR-1)
+ if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
return -EINVAL;

khugepaged_max_ptes_swap = max_ptes_swap;
@@ -313,7 +313,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
unsigned long max_ptes_shared;

err = kstrtoul(buf, 10, &max_ptes_shared);
- if (err || max_ptes_shared > HPAGE_PMD_NR-1)
+ if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
return -EINVAL;

khugepaged_max_ptes_shared = max_ptes_shared;
@@ -599,7 +599,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
bool writable = false;

- for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
+ for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, address += PAGE_SIZE) {
pte_t pteval = *_pte;
if (pte_none(pteval) || (pte_present(pteval) &&
@@ -1216,7 +1216,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
pte = pte_offset_map_lock(mm, pmd, address, &ptl);
- for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
+ for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
_pte++, _address += PAGE_SIZE) {
pte_t pteval = *_pte;
if (is_swap_pte(pteval)) {
@@ -1306,7 +1306,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
/*
* Check if the page has any GUP (or other external) pins.
*
- * Here the check is racy it may see totmal_mapcount > refcount
+ * Here the check is racy it may see total_mapcount > refcount
* in some cases.
* For example, one process with one forked child process.
* The parent has the PMD split due to MADV_DONTNEED, then
@@ -1557,7 +1557,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* mmap_write_lock(mm) as PMD-mapping is likely to be split
* later.
*
- * Not that vma->anon_vma check is racy: it can be set up after
+ * Note that vma->anon_vma check is racy: it can be set up after
* the check but before we took mmap_lock by the fault path.
* But page lock would prevent establishing any new ptes of the
* page, so we are safe.
--
2.23.0

2022-06-11 09:59:19

by Miaohe Lin

Subject: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible

Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
It's because release_pte_page() is not called for these pages and
thus free_page_and_swap_cache can't grab the page lock. These pages
won't be freed from swap cache even if we are the only user until
next time reclaim. It shouldn't hurt indeed, but we could try to
free these pages to save more memory for system.

Signed-off-by: Miaohe Lin <[email protected]>
---
include/linux/swap.h | 5 +++++
mm/khugepaged.c | 1 +
mm/swap.h | 5 -----
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8672a7123ccd..ccb83b12b724 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
return global_node_page_state(NR_SWAPCACHE);
}

+extern void free_swap_cache(struct page *page);
extern void free_page_and_swap_cache(struct page *);
extern void free_pages_and_swap_cache(struct page **, int);
/* linux/mm/swapfile.c */
@@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
/* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
#define free_swap_and_cache(e) is_pfn_swap_entry(e)

+static inline void free_swap_cache(struct page *page)
+{
+}
+
static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
{
return 0;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ee0a719c8be9..52109ad13f78 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
list_del(&src_page->lru);
release_pte_page(src_page);
+ free_swap_cache(src_page);
}
}

diff --git a/mm/swap.h b/mm/swap.h
index 0193797b0c92..863f6086c916 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
void delete_from_swap_cache(struct page *page);
void clear_shadow_from_swap_cache(int type, unsigned long begin,
unsigned long end);
-void free_swap_cache(struct page *page);
struct page *lookup_swap_cache(swp_entry_t entry,
struct vm_area_struct *vma,
unsigned long addr);
@@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
return NULL;
}

-static inline void free_swap_cache(struct page *page)
-{
-}
-
static inline void show_swap_cache_info(void)
{
}
--
2.23.0

2022-06-12 03:26:11

by Andrew Morton

Subject: Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check

On Sat, 11 Jun 2022 16:47:25 +0800 Miaohe Lin <[email protected]> wrote:

> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem. Remove this duplicated check.

I updated this to

If we reach here, hugepage_vma_check() has already made sure that hugepage
is enabled for shmem, via its call to hugepage_vma_check(). Remove this
duplicated check.

2022-06-13 03:26:50

by Miaohe Lin

Subject: Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check

On 2022/6/12 4:33, Andrew Morton wrote:
> On Sat, 11 Jun 2022 16:47:25 +0800 Miaohe Lin <[email protected]> wrote:
>
>> If we reach here, hugepage_vma_check() has already made sure that hugepage
>> is enabled for shmem. Remove this duplicated check.
>
> I updated this to
>
> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem, via its call to hugepage_vma_check(). Remove this
> duplicated check.

Do you mean "khugepaged_scan_mm_slot() has already made sure that hugepage is
enabled for shmem, via its call to hugepage_vma_check()"?

Thanks!

>
> .
>

2022-06-13 20:09:19

by Andrew Morton

Subject: Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check

On Mon, 13 Jun 2022 09:48:27 +0800 Miaohe Lin <[email protected]> wrote:

> > I updated this to
> >
> > If we reach here, hugepage_vma_check() has already made sure that hugepage
> > is enabled for shmem, via its call to hugepage_vma_check(). Remove this
> > duplicated check.
>
> Do you mean "khugepaged_scan_mm_slot() has already made sure that hugepage is
> enabled for shmem, via its call to hugepage_vma_check()"?

yup, thanks.

2022-06-15 01:03:08

by Zach O'Keefe

Subject: Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check

On 11 Jun 16:47, Miaohe Lin wrote:
> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem. Remove this duplicated check.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/khugepaged.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 476d79360101..73570dfffcec 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2153,8 +2153,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> if (khugepaged_scan.address < hstart)
> khugepaged_scan.address = hstart;
> VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
> - if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
> - goto skip;
>
> while (khugepaged_scan.address < hend) {
> int ret;
> --
> 2.23.0
>
>

Thanks for these cleanups, Miaohe.

Reviewed-by: Zach O'Keefe <[email protected]>

2022-06-15 01:05:14

by Zach O'Keefe

Subject: Re: [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup

On 11 Jun 16:47, Miaohe Lin wrote:
> Fix some typos and tweak the code to meet codestyle. No functional
> change intended.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/khugepaged.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a8adb2d1e9c6..1b5dd3820eac 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -260,7 +260,7 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
> unsigned long max_ptes_none;
>
> err = kstrtoul(buf, 10, &max_ptes_none);
> - if (err || max_ptes_none > HPAGE_PMD_NR-1)
> + if (err || max_ptes_none > HPAGE_PMD_NR - 1)
> return -EINVAL;
>
> khugepaged_max_ptes_none = max_ptes_none;
> @@ -286,7 +286,7 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
> unsigned long max_ptes_swap;
>
> err = kstrtoul(buf, 10, &max_ptes_swap);
> - if (err || max_ptes_swap > HPAGE_PMD_NR-1)
> + if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
> return -EINVAL;
>
> khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -313,7 +313,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
> unsigned long max_ptes_shared;
>
> err = kstrtoul(buf, 10, &max_ptes_shared);
> - if (err || max_ptes_shared > HPAGE_PMD_NR-1)
> + if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
> return -EINVAL;
>
> khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -599,7 +599,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
> bool writable = false;
>
> - for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
> + for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (pte_none(pteval) || (pte_present(pteval) &&
> @@ -1216,7 +1216,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>
> memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
> pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> - for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
> + for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, _address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (is_swap_pte(pteval)) {
> @@ -1306,7 +1306,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
> /*
> * Check if the page has any GUP (or other external) pins.
> *
> - * Here the check is racy it may see totmal_mapcount > refcount
> + * Here the check is racy it may see total_mapcount > refcount
> * in some cases.
> * For example, one process with one forked child process.
> * The parent has the PMD split due to MADV_DONTNEED, then
> @@ -1557,7 +1557,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> * mmap_write_lock(mm) as PMD-mapping is likely to be split
> * later.
> *
> - * Not that vma->anon_vma check is racy: it can be set up after
> + * Note that vma->anon_vma check is racy: it can be set up after
> * the check but before we took mmap_lock by the fault path.
> * But page lock would prevent establishing any new ptes of the
> * page, so we are safe.
> --
> 2.23.0
>
>

Reviewed-by: Zach O'Keefe <[email protected]>

2022-06-15 16:07:50

by Zach O'Keefe

Subject: Re: [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file

On 11 Jun 16:47, Miaohe Lin wrote:
> nr_none is always 0 for non-shmem case because the page can be read from
> the backend store. So when nr_none != 0, it must be in is_shmem case.
> Also only adjust the nrpages and uncharge shmem when nr_none != 0 to save
> cpu cycles.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/khugepaged.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1b5dd3820eac..8e6fad7c7bd9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,
>
> if (nr_none) {
> __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
> - if (is_shmem)
> - __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> + __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> }


Might be worth a small comment here - even though folks can see in the above code
that this is only incremented in the shmem path, it might be nice to say why it's
always 0 for non-shmem (or conversely, why it can only be non-zero for shmem).
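
Something like the below, say (just a rough sketch of the idea, exact wording
up to you):

	if (nr_none) {
		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
		/*
		 * nr_none can only be non-zero for shmem: for regular file
		 * mappings a missing page is read in from the backing store
		 * instead of being counted as a hole, so nr_none stays 0.
		 */
		__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
	}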

>
> /* Join all the small entries into a single multi-index entry */
> @@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,
>
> /* Something went wrong: roll back page cache changes */
> xas_lock_irq(&xas);
> - mapping->nrpages -= nr_none;
> -
> - if (is_shmem)
> + if (nr_none) {
> + mapping->nrpages -= nr_none;
> shmem_uncharge(mapping->host, nr_none);
> + }
>
> xas_set(&xas, start);
> xas_for_each(&xas, page, end - 1) {
> --
> 2.23.0
>
>

Otherwise,

Reviewed-by: Zach O'Keefe <[email protected]>

2022-06-15 17:32:38

by Zach O'Keefe

Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible

On 11 Jun 16:47, Miaohe Lin wrote:
> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> It's because release_pte_page() is not called for these pages and
> thus free_page_and_swap_cache can't grab the page lock. These pages
> won't be freed from swap cache even if we are the only user until
> next time reclaim. It shouldn't hurt indeed, but we could try to
> free these pages to save more memory for system.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> include/linux/swap.h | 5 +++++
> mm/khugepaged.c | 1 +
> mm/swap.h | 5 -----
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8672a7123ccd..ccb83b12b724 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
> return global_node_page_state(NR_SWAPCACHE);
> }
>
> +extern void free_swap_cache(struct page *page);
> extern void free_page_and_swap_cache(struct page *);
> extern void free_pages_and_swap_cache(struct page **, int);
> /* linux/mm/swapfile.c */
> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
> #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>
> +static inline void free_swap_cache(struct page *page)
> +{
> +}
> +
> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> {
> return 0;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ee0a719c8be9..52109ad13f78 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> list_del(&src_page->lru);
> release_pte_page(src_page);
> + free_swap_cache(src_page);
> }
> }

Aside: in __collapse_huge_page_isolate() (and also here) why can't we just check
PageCompound(page) && page == compound_head(page) to only act on compound pages
once? AFAIK this would alleviate this compound_pagelist business..

Anyways, as-is, free_page_and_swap_cache() won't be able to do
try_to_free_swap(), since it can't grab the page lock, but it will call put_page().
I think (?) the last page ref might be dropped in release_pte_page(), so should
free_swap_cache() come before it?
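
(For reference -- quoting roughly from mm/swap_state.c of this era, so please
double-check against your tree:

	void free_swap_cache(struct page *page)
	{
		if (PageSwapCache(page) && !page_mapped(page) && trylock_page(page)) {
			try_to_free_swap(page);
			unlock_page(page);
		}
	}

	void free_page_and_swap_cache(struct page *page)
	{
		free_swap_cache(page);
		if (!is_huge_zero_page(page))
			put_page(page);
	}

i.e. free_swap_cache() is a best-effort trylock + try_to_free_swap(), and the
only extra thing free_page_and_swap_cache() does is the put_page().)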

>
> diff --git a/mm/swap.h b/mm/swap.h
> index 0193797b0c92..863f6086c916 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
> void delete_from_swap_cache(struct page *page);
> void clear_shadow_from_swap_cache(int type, unsigned long begin,
> unsigned long end);
> -void free_swap_cache(struct page *page);
> struct page *lookup_swap_cache(swp_entry_t entry,
> struct vm_area_struct *vma,
> unsigned long addr);
> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
> return NULL;
> }
>
> -static inline void free_swap_cache(struct page *page)
> -{
> -}
> -
> static inline void show_swap_cache_info(void)
> {
> }
> --
> 2.23.0
>
>

2022-06-15 19:05:11

by Yang Shi

Subject: Re: [PATCH 1/7] mm/khugepaged: remove unneeded shmem_huge_enabled() check

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <[email protected]> wrote:
>
> If we reach here, hugepage_vma_check() has already made sure that hugepage
> is enabled for shmem. Remove this duplicated check.
>
> Signed-off-by: Miaohe Lin <[email protected]>

Reviewed-by: Yang Shi <[email protected]>

> ---
> mm/khugepaged.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 476d79360101..73570dfffcec 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2153,8 +2153,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
> if (khugepaged_scan.address < hstart)
> khugepaged_scan.address = hstart;
> VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
> - if (shmem_file(vma->vm_file) && !shmem_huge_enabled(vma))
> - goto skip;
>
> while (khugepaged_scan.address < hend) {
> int ret;
> --
> 2.23.0
>
>

2022-06-15 19:29:11

by Yang Shi

Subject: Re: [PATCH 3/7] mm/khugepaged: trivial typo and codestyle cleanup

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <[email protected]> wrote:
>
> Fix some typos and tweak the code to meet codestyle. No functional
> change intended.
>
> Signed-off-by: Miaohe Lin <[email protected]>

Reviewed-by: Yang Shi <[email protected]>

> ---
> mm/khugepaged.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a8adb2d1e9c6..1b5dd3820eac 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -260,7 +260,7 @@ static ssize_t khugepaged_max_ptes_none_store(struct kobject *kobj,
> unsigned long max_ptes_none;
>
> err = kstrtoul(buf, 10, &max_ptes_none);
> - if (err || max_ptes_none > HPAGE_PMD_NR-1)
> + if (err || max_ptes_none > HPAGE_PMD_NR - 1)
> return -EINVAL;
>
> khugepaged_max_ptes_none = max_ptes_none;
> @@ -286,7 +286,7 @@ static ssize_t khugepaged_max_ptes_swap_store(struct kobject *kobj,
> unsigned long max_ptes_swap;
>
> err = kstrtoul(buf, 10, &max_ptes_swap);
> - if (err || max_ptes_swap > HPAGE_PMD_NR-1)
> + if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
> return -EINVAL;
>
> khugepaged_max_ptes_swap = max_ptes_swap;
> @@ -313,7 +313,7 @@ static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
> unsigned long max_ptes_shared;
>
> err = kstrtoul(buf, 10, &max_ptes_shared);
> - if (err || max_ptes_shared > HPAGE_PMD_NR-1)
> + if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
> return -EINVAL;
>
> khugepaged_max_ptes_shared = max_ptes_shared;
> @@ -599,7 +599,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
> bool writable = false;
>
> - for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
> + for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (pte_none(pteval) || (pte_present(pteval) &&
> @@ -1216,7 +1216,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>
> memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
> pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> - for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
> + for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> _pte++, _address += PAGE_SIZE) {
> pte_t pteval = *_pte;
> if (is_swap_pte(pteval)) {
> @@ -1306,7 +1306,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
> /*
> * Check if the page has any GUP (or other external) pins.
> *
> - * Here the check is racy it may see totmal_mapcount > refcount
> + * Here the check is racy it may see total_mapcount > refcount
> * in some cases.
> * For example, one process with one forked child process.
> * The parent has the PMD split due to MADV_DONTNEED, then
> @@ -1557,7 +1557,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> * mmap_write_lock(mm) as PMD-mapping is likely to be split
> * later.
> *
> - * Not that vma->anon_vma check is racy: it can be set up after
> + * Note that vma->anon_vma check is racy: it can be set up after
> * the check but before we took mmap_lock by the fault path.
> * But page lock would prevent establishing any new ptes of the
> * page, so we are safe.
> --
> 2.23.0
>
>

2022-06-15 19:57:25

by Yang Shi

Subject: Re: [PATCH 4/7] mm/khugepaged: minor cleanup for collapse_file

On Wed, Jun 15, 2022 at 8:55 AM Zach O'Keefe <[email protected]> wrote:
>
> On 11 Jun 16:47, Miaohe Lin wrote:
> > nr_none is always 0 for non-shmem case because the page can be read from
> > the backend store. So when nr_none != 0, it must be in is_shmem case.
> > Also only adjust the nrpages and uncharge shmem when nr_none != 0 to save
> > cpu cycles.
> >
> > Signed-off-by: Miaohe Lin <[email protected]>
> > ---
> > mm/khugepaged.c | 9 ++++-----
> > 1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 1b5dd3820eac..8e6fad7c7bd9 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1885,8 +1885,7 @@ static void collapse_file(struct mm_struct *mm,
> >
> > if (nr_none) {
> > __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
> > - if (is_shmem)
> > - __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> > + __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
> > }
>
>
> Might be worth a small comment here - even though folks can see in above code
> that this is only incremented in shmem path, might be nice to say why it's
> always 0 for non-shmem (or conversely, why it's only possible to be non 0 on
> shmem).

Agreed, better to have some comments in the code.

>
> >
> > /* Join all the small entries into a single multi-index entry */
> > @@ -1950,10 +1949,10 @@ static void collapse_file(struct mm_struct *mm,
> >
> > /* Something went wrong: roll back page cache changes */
> > xas_lock_irq(&xas);
> > - mapping->nrpages -= nr_none;
> > -
> > - if (is_shmem)
> > + if (nr_none) {
> > + mapping->nrpages -= nr_none;
> > shmem_uncharge(mapping->host, nr_none);
> > + }
> >
> > xas_set(&xas, start);
> > xas_for_each(&xas, page, end - 1) {
> > --
> > 2.23.0
> >
> >
>
> Otherwise,
>
> Reviewed-by: Zach O'Keefe <[email protected]>
>

2022-06-16 00:34:05

by Yang Shi

Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible

On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <[email protected]> wrote:
>
> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> It's because release_pte_page() is not called for these pages and
> thus free_page_and_swap_cache can't grab the page lock. These pages
> won't be freed from swap cache even if we are the only user until
> next time reclaim. It shouldn't hurt indeed, but we could try to
> free these pages to save more memory for system.


>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> include/linux/swap.h | 5 +++++
> mm/khugepaged.c | 1 +
> mm/swap.h | 5 -----
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8672a7123ccd..ccb83b12b724 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
> return global_node_page_state(NR_SWAPCACHE);
> }
>
> +extern void free_swap_cache(struct page *page);
> extern void free_page_and_swap_cache(struct page *);
> extern void free_pages_and_swap_cache(struct page **, int);
> /* linux/mm/swapfile.c */
> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
> #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>
> +static inline void free_swap_cache(struct page *page)
> +{
> +}
> +
> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> {
> return 0;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ee0a719c8be9..52109ad13f78 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> list_del(&src_page->lru);
> release_pte_page(src_page);
> + free_swap_cache(src_page);

Will this really work? The free_swap_cache() will just dec refcounts
without putting the page back to buddy. So the hugepage is not
actually freed at all. Am I missing something?

> }
> }
>
> diff --git a/mm/swap.h b/mm/swap.h
> index 0193797b0c92..863f6086c916 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
> void delete_from_swap_cache(struct page *page);
> void clear_shadow_from_swap_cache(int type, unsigned long begin,
> unsigned long end);
> -void free_swap_cache(struct page *page);
> struct page *lookup_swap_cache(swp_entry_t entry,
> struct vm_area_struct *vma,
> unsigned long addr);
> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
> return NULL;
> }
>
> -static inline void free_swap_cache(struct page *page)
> -{
> -}
> -
> static inline void show_swap_cache_info(void)
> {
> }
> --
> 2.23.0
>
>

2022-06-16 04:13:27

by Mika Penttilä

Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible


On 15.6.2022 20.13, Zach O'Keefe wrote:
> On 11 Jun 16:47, Miaohe Lin wrote:
>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>> It's because release_pte_page() is not called for these pages and
>> thus free_page_and_swap_cache can't grab the page lock. These pages
>> won't be freed from swap cache even if we are the only user until
>> next time reclaim. It shouldn't hurt indeed, but we could try to
>> free these pages to save more memory for system.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> include/linux/swap.h | 5 +++++
>> mm/khugepaged.c | 1 +
>> mm/swap.h | 5 -----
>> 3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 8672a7123ccd..ccb83b12b724 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>> return global_node_page_state(NR_SWAPCACHE);
>> }
>>
>> +extern void free_swap_cache(struct page *page);
>> extern void free_page_and_swap_cache(struct page *);
>> extern void free_pages_and_swap_cache(struct page **, int);
>> /* linux/mm/swapfile.c */
>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>> #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>
>> +static inline void free_swap_cache(struct page *page)
>> +{
>> +}
>> +
>> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>> {
>> return 0;
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index ee0a719c8be9..52109ad13f78 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>> list_del(&src_page->lru);
>> release_pte_page(src_page);
>> + free_swap_cache(src_page);
>> }
>> }
>
> Aside: in __collapse_huge_page_isolate() (and also here) why can't we just check
> PageCompound(page) && page == compound_head(page) to only act on compound pages
> once? AFAIK this would alleviate this compound_pagelist business..



It is for pte-mapped compound pages. Things like lock/unlock page and
isolate/putback lru have to operate on head pages, and you have to do
operations like copy from tail pages before releasing heads. So while
maybe you could add head/tail tests here and there, work the pages
backwards, etc., it easily gets messier than the current solution with
compound_pagelist.
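
Roughly (an illustrative sketch only, not the actual khugepaged code):

	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte++, address += PAGE_SIZE) {
		struct page *src = pte_page(*_pte);        /* may be a tail page */
		struct page *head = compound_head(src);    /* lock/LRU/isolation live here */

		/* the copy itself uses the exact subpage the PTE maps */
		copy_user_highpage(dst, src, address, vma);

		/*
		 * head may still be mapped by later PTEs in this loop, so it
		 * cannot be unlocked or put back on the LRU yet; it is kept on
		 * compound_pagelist and released exactly once, after the loop.
		 */
	}

where dst stands for the matching subpage of the new huge page.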



>
> Anyways, as-is, free_page_and_swap_cache() won't be able to do
> try_to_free_swap(), since it can't grab the page lock, but it will call put_page().
> I think (?) the last page ref might be dropped in release_pte_page(), so should
> free_swap_cache() come before it?
>
>>
>> diff --git a/mm/swap.h b/mm/swap.h
>> index 0193797b0c92..863f6086c916 100644
>> --- a/mm/swap.h
>> +++ b/mm/swap.h
>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>> void delete_from_swap_cache(struct page *page);
>> void clear_shadow_from_swap_cache(int type, unsigned long begin,
>> unsigned long end);
>> -void free_swap_cache(struct page *page);
>> struct page *lookup_swap_cache(swp_entry_t entry,
>> struct vm_area_struct *vma,
>> unsigned long addr);
>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>> return NULL;
>> }
>>
>> -static inline void free_swap_cache(struct page *page)
>> -{
>> -}
>> -
>> static inline void show_swap_cache_info(void)
>> {
>> }
>> --
>> 2.23.0
>>
>>
>

--Mika

2022-06-16 07:35:06

by Miaohe Lin

Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible

On 2022/6/16 1:13, Zach O'Keefe wrote:
> On 11 Jun 16:47, Miaohe Lin wrote:
>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>> It's because release_pte_page() is not called for these pages and
>> thus free_page_and_swap_cache can't grab the page lock. These pages
>> won't be freed from swap cache even if we are the only user until
>> next time reclaim. It shouldn't hurt indeed, but we could try to
>> free these pages to save more memory for system.
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> include/linux/swap.h | 5 +++++
>> mm/khugepaged.c | 1 +
>> mm/swap.h | 5 -----
>> 3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 8672a7123ccd..ccb83b12b724 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>> return global_node_page_state(NR_SWAPCACHE);
>> }
>>
>> +extern void free_swap_cache(struct page *page);
>> extern void free_page_and_swap_cache(struct page *);
>> extern void free_pages_and_swap_cache(struct page **, int);
>> /* linux/mm/swapfile.c */
>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>> #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>
>> +static inline void free_swap_cache(struct page *page)
>> +{
>> +}
>> +
>> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>> {
>> return 0;
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index ee0a719c8be9..52109ad13f78 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>> list_del(&src_page->lru);
>> release_pte_page(src_page);
>> + free_swap_cache(src_page);
>> }
>> }
>
> Aside: in __collapse_huge_page_isolate() (and also here) why can't we just check
> PageCompound(page) && page == compound_head(page) to only act on compound pages
> once? AFAIK this would alleviate this compound_pagelist business..
>
> Anyways, as-is, free_page_and_swap_cache() won't be able to do
> try_to_free_swap(), since it can't grab the page lock, but it will call put_page().
> I think (?) the last page ref might be dropped in release_pte_page(), so should
> free_swap_cache() come before it?

Thanks for catching this! If the page is not in the swapcache, the last page ref
might be dropped, so it's bad to call free_swap_cache() after it. Thanks!

>
>>
>> diff --git a/mm/swap.h b/mm/swap.h
>> index 0193797b0c92..863f6086c916 100644
>> --- a/mm/swap.h
>> +++ b/mm/swap.h
>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>> void delete_from_swap_cache(struct page *page);
>> void clear_shadow_from_swap_cache(int type, unsigned long begin,
>> unsigned long end);
>> -void free_swap_cache(struct page *page);
>> struct page *lookup_swap_cache(swp_entry_t entry,
>> struct vm_area_struct *vma,
>> unsigned long addr);
>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>> return NULL;
>> }
>>
>> -static inline void free_swap_cache(struct page *page)
>> -{
>> -}
>> -
>> static inline void show_swap_cache_info(void)
>> {
>> }
>> --
>> 2.23.0
>>
>>
> .
>

2022-06-16 16:03:24

by Yang Shi

Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible

On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <[email protected]> wrote:
>
> On 2022/6/16 7:58, Yang Shi wrote:
> > On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <[email protected]> wrote:
> >>
> >> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> >> It's because release_pte_page() is not called for these pages and
> >> thus free_page_and_swap_cache can't grab the page lock. These pages
> >> won't be freed from swap cache even if we are the only user until
> >> next time reclaim. It shouldn't hurt indeed, but we could try to
> >> free these pages to save more memory for system.
> >
> >
> >>
> >> Signed-off-by: Miaohe Lin <[email protected]>
> >> ---
> >> include/linux/swap.h | 5 +++++
> >> mm/khugepaged.c | 1 +
> >> mm/swap.h | 5 -----
> >> 3 files changed, 6 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> index 8672a7123ccd..ccb83b12b724 100644
> >> --- a/include/linux/swap.h
> >> +++ b/include/linux/swap.h
> >> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
> >> return global_node_page_state(NR_SWAPCACHE);
> >> }
> >>
> >> +extern void free_swap_cache(struct page *page);
> >> extern void free_page_and_swap_cache(struct page *);
> >> extern void free_pages_and_swap_cache(struct page **, int);
> >> /* linux/mm/swapfile.c */
> >> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
> >> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
> >> #define free_swap_and_cache(e) is_pfn_swap_entry(e)
> >>
> >> +static inline void free_swap_cache(struct page *page)
> >> +{
> >> +}
> >> +
> >> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> >> {
> >> return 0;
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index ee0a719c8be9..52109ad13f78 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> >> list_del(&src_page->lru);
> >> release_pte_page(src_page);
> >> + free_swap_cache(src_page);
> >
> > Will this really work? The free_swap_cache() will just dec refcounts
> > without putting the page back to buddy. So the hugepage is not
> > actually freed at all. Am I missing something?
>
> Thanks for catching this! If the page is on the percpu lru_pvecs cache, it will
> be released when the lru_pvecs are drained. But if not, free_swap_cache() won't
> free the page as it assumes the caller has a reference on the page and thus
> only does page_ref_sub(). Does the below change make sense to you?

THP gets drained immediately so they won't stay in pagevecs.
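
(If memory serves, that's because of the PageCompound() check in the pagevec
add path, something like:

	static bool pagevec_add_and_need_flush(struct pagevec *pvec, struct page *page)
	{
		bool ret = false;

		if (!pagevec_add(pvec, page) || PageCompound(page) ||
				lru_cache_disabled())
			ret = true;

		return ret;
	}

i.e. adding a compound page forces an immediate drain of the per-cpu pagevec.)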

>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 52109ad13f78..b8c96e33591d 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>
> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> list_del(&src_page->lru);
> - release_pte_page(src_page);
> + mod_node_page_state(page_pgdat(src_page),
> + NR_ISOLATED_ANON + page_is_file_lru(src_page),
> + -compound_nr(src_page));
> + unlock_page(src_page);
> free_swap_cache(src_page);
> + putback_lru_page(src_page);

I'm not sure if it is worth it or not for a rare corner case since THP
should not stay in swapcache unless try_to_unmap() in vmscan fails
IIUC. And it is not guaranteed that free_swap_cache() will get the
page lock.

> }
> }
>
> Thanks!
>
> >
> >> }
> >> }
> >>
> >> diff --git a/mm/swap.h b/mm/swap.h
> >> index 0193797b0c92..863f6086c916 100644
> >> --- a/mm/swap.h
> >> +++ b/mm/swap.h
> >> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
> >> void delete_from_swap_cache(struct page *page);
> >> void clear_shadow_from_swap_cache(int type, unsigned long begin,
> >> unsigned long end);
> >> -void free_swap_cache(struct page *page);
> >> struct page *lookup_swap_cache(swp_entry_t entry,
> >> struct vm_area_struct *vma,
> >> unsigned long addr);
> >> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
> >> return NULL;
> >> }
> >>
> >> -static inline void free_swap_cache(struct page *page)
> >> -{
> >> -}
> >> -
> >> static inline void show_swap_cache_info(void)
> >> {
> >> }
> >> --
> >> 2.23.0
> >>
> >>
> > .
> >
>

2022-06-17 16:47:22

by Yang Shi

Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible

On Thu, Jun 16, 2022 at 7:27 PM Miaohe Lin <[email protected]> wrote:
>
> On 2022/6/16 23:53, Yang Shi wrote:
> > On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <[email protected]> wrote:
> >>
> >> On 2022/6/16 7:58, Yang Shi wrote:
> >>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <[email protected]> wrote:
> >>>>
> >>>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
> >>>> It's because release_pte_page() is not called for these pages and
> >>>> thus free_page_and_swap_cache can't grab the page lock. These pages
> >>>> won't be freed from swap cache even if we are the only user until
> >>>> next time reclaim. It shouldn't hurt indeed, but we could try to
> >>>> free these pages to save more memory for system.
> >>>
> >>>
> >>>>
> >>>> Signed-off-by: Miaohe Lin <[email protected]>
> >>>> ---
> >>>> include/linux/swap.h | 5 +++++
> >>>> mm/khugepaged.c | 1 +
> >>>> mm/swap.h | 5 -----
> >>>> 3 files changed, 6 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >>>> index 8672a7123ccd..ccb83b12b724 100644
> >>>> --- a/include/linux/swap.h
> >>>> +++ b/include/linux/swap.h
> >>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
> >>>> return global_node_page_state(NR_SWAPCACHE);
> >>>> }
> >>>>
> >>>> +extern void free_swap_cache(struct page *page);
> >>>> extern void free_page_and_swap_cache(struct page *);
> >>>> extern void free_pages_and_swap_cache(struct page **, int);
> >>>> /* linux/mm/swapfile.c */
> >>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
> >>>> /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
> >>>> #define free_swap_and_cache(e) is_pfn_swap_entry(e)
> >>>>
> >>>> +static inline void free_swap_cache(struct page *page)
> >>>> +{
> >>>> +}
> >>>> +
> >>>> static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
> >>>> {
> >>>> return 0;
> >>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>>> index ee0a719c8be9..52109ad13f78 100644
> >>>> --- a/mm/khugepaged.c
> >>>> +++ b/mm/khugepaged.c
> >>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >>>> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> >>>> list_del(&src_page->lru);
> >>>> release_pte_page(src_page);
> >>>> + free_swap_cache(src_page);
> >>>
> >>> Will this really work? The free_swap_cache() will just dec refcounts
> >>> without putting the page back to buddy. So the hugepage is not
> >>> actually freed at all. Am I missing something?
> >>
> >> Thanks for catching this! If the page is on the percpu lru_pvecs cache, it will
> >> be released when the lru_pvecs are drained. But if not, free_swap_cache() won't
> >> free the page as it assumes the caller has a reference on the page and thus
> >> only does page_ref_sub(). Does the below change make sense to you?
> >
> > THP gets drained immediately so they won't stay in pagevecs.
>
> Yes, you're right. I missed this.
>
> >
> >>
> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> index 52109ad13f78..b8c96e33591d 100644
> >> --- a/mm/khugepaged.c
> >> +++ b/mm/khugepaged.c
> >> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
> >>
> >> list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
> >> list_del(&src_page->lru);
> >> - release_pte_page(src_page);
> >> + mod_node_page_state(page_pgdat(src_page),
> >> + NR_ISOLATED_ANON + page_is_file_lru(src_page),
> >> + -compound_nr(src_page));
> >> + unlock_page(src_page);
> >> free_swap_cache(src_page);
> >> + putback_lru_page(src_page);
> >
> > I'm not sure if it is worth it or not for a rare corner case since THP
> > should not stay in swapcache unless try_to_unmap() in vmscan fails
>
> IIUC, even if try_to_unmap() in vmscan succeeds, THP might still be in the
> swapcache if shrink_page_list is not called for this THP again after writeback
> is done, e.g. when shrink_page_list is called from madvise, so there might be

I don't get it; doesn't __remove_mapping() delete the page from the swap cache?

> no memory pressure, or do_swap_page puts the THP into the page table again. Also THP

do_swap_page() just swaps in a base page, never a THP.

> might not be split when deferred_split_shrinker is not called, e.g. due to

I don't see how deferred split is related to this.

> not lacking memory. Even if there is memory pressure, the THP will stay in
> the swapcache until the next round of page reclaim for this THP is done. So there
> should be a non-negligible window in which the THP stays in the swapcache.
> Or am I missing something?

I guess you may have misunderstood what I meant. This patch is trying to
optimize freeing THPs that are in the swapcache. But it should be very rare
that khugepaged sees a THP from the swap cache. The only case I could think of
is when try_to_unmap() in vmscan fails. That might leave the THP in the swap
cache so that khugepaged could see it.


>
> > IIUC. And it is not guaranteed that free_swap_cache() will get the
> > page lock.
>
> IMHO, we're not guaranteed that free_swap_cache() will get the page lock for the normal
> page anyway.
>
> Thanks!
>
> >
> >> }
> >> }
> >>
> >> Thanks!
> >>
> >>>
> >>>> }
> >>>> }
> >>>>
> >>>> diff --git a/mm/swap.h b/mm/swap.h
> >>>> index 0193797b0c92..863f6086c916 100644
> >>>> --- a/mm/swap.h
> >>>> +++ b/mm/swap.h
> >>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
> >>>> void delete_from_swap_cache(struct page *page);
> >>>> void clear_shadow_from_swap_cache(int type, unsigned long begin,
> >>>> unsigned long end);
> >>>> -void free_swap_cache(struct page *page);
> >>>> struct page *lookup_swap_cache(swp_entry_t entry,
> >>>> struct vm_area_struct *vma,
> >>>> unsigned long addr);
> >>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
> >>>> return NULL;
> >>>> }
> >>>>
> >>>> -static inline void free_swap_cache(struct page *page)
> >>>> -{
> >>>> -}
> >>>> -
> >>>> static inline void show_swap_cache_info(void)
> >>>> {
> >>>> }
> >>>> --
> >>>> 2.23.0
> >>>>
> >>>>
> >>> .
> >>>
> >>
> > .
> >
>