2022-05-28 03:10:36

by Minchan Kim

Subject: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention

On Thu, May 12, 2022 at 12:55:16PM -0700, Minchan Kim wrote:
> On Wed, May 11, 2022 at 07:05:23PM -0700, Andrew Morton wrote:
> > On Wed, 11 May 2022 15:57:09 -0700 Minchan Kim <[email protected]> wrote:
> >
> > > >
> > > > Could we burn much CPU time pointlessly churning through the LRU? Could
> > > > it mess up aging decisions enough to be performance-affecting in any
> > > > workload?
> > >
> > > Yes, correct. However, we are already churning the LRUs in several
> > > ways. For example, isolation from and putback to the LRU list for
> > > page migration from several sources (a typical example is
> > > compaction), and trylock_page and sc->gfp_mask not allowing a page
> > > to be reclaimed in shrink_page_list.
> >
> > Well. "we're already doing a risky thing so it's OK to do more of that
> > thing"?
>
> I meant the aging is not rocket science.
>
>
> >
> > > >
> > > > Something else?
> > >
> > > One thing I am worried about is the granularity of the churning.
> > > The examples above churn at page granularity, so they might be
> > > excusable, but this one churns at address-space granularity,
> > > especially for the file LRU (i_mmap_rwsem), which might cause too
> > > much rotation and a live-lock in the end (we keep rotating in a
> > > small LRU under heavy memory pressure).
> > >
> > > If it could be a problem, maybe we can use sc->priority to stop
> > > the skipping at a certain level of memory pressure.
> > >
> > > Any thoughts? Do we really need it?
> >
> > Are we able to think of a test which might demonstrate any worst case?
> > Whip that up and see what the numbers say?
>
> Yeah, let me create a worst-case test to see how it goes.
>
> One thread keeps reading a file-backed vma of a 2xRAM file while other
> threads keep changing other vmas mapped to the same file, causing heavy
> i_mmap_rwsem contention in the aging path.

Forking a new thread.

I checked what happens in the worst case. I am not sure how realistic
the worst case is, but it would be great to have a safety net.

From 5ccc8b170af5496f803243732e96b131419d7462 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Thu, 19 May 2022 19:48:12 -0700
Subject: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention

Under heavy contention on the rmap lock (e.g., i_mmap_rwsem), the VM can
keep skipping LRU pages, so reclaim efficiency (steal/scan) drops
from 48% to 27%, and the working set is reclaimed faster than before,
so the workingset_refault rate rises to 240% of the vanilla value.

We need a safety net to throttle the skipping of LRU pages. This patch
throttles the skipping policy using the (DEF_PRIORITY - 2) magic value
that the VM has used to indicate non-light memory pressure.
IOW, let's skip rmap_lock-contended pages only when
sc->priority >= (DEF_PRIORITY - 2).
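
A minimal user-space sketch of the gating rule described above, for
illustration only. rmap_should_try_lock() is a hypothetical helper name
that does not exist in the patch, and DEF_PRIORITY is assumed to be 12
as in mainline; reclaim starts at DEF_PRIORITY and counts down toward 0
as pressure grows.

```c
#include <stdbool.h>

/* Assumed to match the kernel's definition; reclaim starts at this
 * priority and counts down toward 0 as memory pressure grows. */
#define DEF_PRIORITY 12

/* Hypothetical helper: true when reclaim may skip folios whose rmap
 * lock is contended (try-lock), false when it should wait for the lock. */
static bool rmap_should_try_lock(int priority)
{
	return priority >= DEF_PRIORITY - 2;
}
```

Under that assumption, only priorities 12, 11 and 10 would use the
try-lock path; from priority 9 down, reclaim would wait for the rmap lock.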

The test scenario to see the worst case:

1. Thread A mmaps a big file (e.g., 2x the size of RAM) and keeps
touching the address space, up to three times.
2. Thread B keeps doing mmap/munmap on the same file to cause
heavy lock contention on i_mmap_rwsem until thread A finishes
the job.
3. Measure vmstat and thread A's elapsed time.
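
The test program itself is not included in this mail. The steps above
could be sketched roughly as follows; every name here (struct scenario,
thread_a, thread_b, run_scenario) is hypothetical, the file size is left
to the caller rather than fixed at 2x RAM, and nothing is measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical harness state; none of these names come from the patch. */
struct scenario {
	int fd;			/* file mapped by both threads */
	size_t len;		/* mapping length in bytes */
	int passes;		/* times thread A walks the mapping */
	atomic_bool done;	/* set by thread A when it finishes */
	unsigned long sum;	/* keeps the page reads observable */
	int failed;		/* set if thread A could not map the file */
};

/* Thread A: map the file and read one byte per page, `passes` times. */
static void *thread_a(void *arg)
{
	struct scenario *s = arg;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *map = mmap(NULL, s->len, PROT_READ, MAP_SHARED, s->fd, 0);

	if (map == MAP_FAILED) {
		s->failed = 1;
	} else {
		const volatile unsigned char *p = map;

		for (int i = 0; i < s->passes; i++)
			for (size_t off = 0; off < s->len; off += page)
				s->sum += p[off];
		munmap(map, s->len);
	}
	atomic_store(&s->done, true);
	return NULL;
}

/* Thread B: map and unmap the same file until thread A is done. */
static void *thread_b(void *arg)
{
	struct scenario *s = arg;

	while (!atomic_load(&s->done)) {
		void *map = mmap(NULL, s->len, PROT_READ, MAP_SHARED, s->fd, 0);

		if (map != MAP_FAILED)
			munmap(map, s->len);
	}
	return NULL;
}

/* Run both threads against `path`; returns 0 on success, -1 on failure. */
static int run_scenario(const char *path, size_t len, int passes,
			unsigned long *sum)
{
	struct scenario s = { .len = len, .passes = passes };
	pthread_t a, b;
	int have_b;

	s.fd = open(path, O_RDONLY);
	if (s.fd < 0)
		return -1;
	atomic_init(&s.done, false);
	if (pthread_create(&a, NULL, thread_a, &s) != 0) {
		close(s.fd);
		return -1;
	}
	have_b = pthread_create(&b, NULL, thread_b, &s) == 0;
	pthread_join(a, NULL);
	if (have_b)
		pthread_join(b, NULL);
	close(s.fd);
	*sum = s.sum;
	return (have_b && !s.failed) ? 0 : -1;
}
```

To approximate the scenario, one would call run_scenario() with passes
set to 3 on a file larger than RAM, time the call, and compare
/proc/vmstat before and after.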

Thread A's elapsed time:

1. vanilla
24.64 sec (5.04%)

2. rmap_skip (i.e., mm-dont-be-stuck-to-rmap-lock-on-reclaim-path.patch)
25.20 sec (4.16%)

3. priority (2 + this patch)
23.62 sec (6.61%)

Vmstat Comparison:

                           vanilla   rmap_skip    priority
allocstall_movable             582        9772       14643
pgactivate                     232       25865        4906
pgdeactivate                    78       17265         651
pgmajfault                      58       10639        1376
pgsteal_kswapd            15947857    15133195    15095445
pgsteal_direct              105439      583092      943195
pgscan_kswapd             24647536    52768898    28103170
pgscan_direct              8398139     3767100     7966353
workingset_refault_file   12582926    12248353    12565934
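
For reference, the reclaim efficiency quoted in the changelog can be
recomputed from the table above. The sketch below assumes that
"steal/scan" means the kswapd and direct counters summed on each side;
reclaim_efficiency_pct() is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: reclaim efficiency as a percentage, assuming
 * (pgsteal_kswapd + pgsteal_direct) / (pgscan_kswapd + pgscan_direct). */
static double reclaim_efficiency_pct(uint64_t steal_kswapd,
				     uint64_t steal_direct,
				     uint64_t scan_kswapd,
				     uint64_t scan_direct)
{
	uint64_t scanned = scan_kswapd + scan_direct;

	if (scanned == 0)
		return 0.0;
	return 100.0 * (double)(steal_kswapd + steal_direct) / (double)scanned;
}
```

With the numbers above this gives roughly 48.6% for vanilla, 27.8% for
rmap_skip and 44.5% for priority, which is consistent with the 48% and
27% figures in the changelog.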

Test scenario B:

1. Thread A mmaps a big file (e.g., 2x the size of RAM) and keeps
touching the address space, up to three times.
2. Thread B keeps doing mmap/munmap on the same file to cause
heavy lock contention on i_mmap_rwsem until thread A finishes
the job.
3. Thread C keeps reading another big file using the read(2) syscall.
4. Measure vmstat and thread A's elapsed time.

Thread A's elapsed time:

1. vanilla
27.24 sec (5.29%)

2. rmap_skip
33.54 sec (3.20%)

3. priority
28.68 sec (1.26%)

Vmstat Comparison:

                           vanilla   rmap_skip    priority
allocstall_movable           15262       81258       21644
pgactivate                 3042004     3086906     3502959
pgdeactivate               2307849     8959162     3605768
pgmajfault                     566        1059         557
pgsteal_kswapd            17557735    30861283    18385674
pgsteal_direct              955389     6353527     1233605
pgscan_kswapd             31622695    59670433    35372575
pgscan_direct              4924052    13939254     4310247
workingset_refault_file   13466538    32193161    14588019
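
The 240% refault figure in the changelog appears to correspond to this
second scenario. A small sketch of the arithmetic, with
pct_of_baseline() as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: express `value` as a percentage of `baseline`. */
static double pct_of_baseline(uint64_t value, uint64_t baseline)
{
	if (baseline == 0)
		return 0.0;
	return 100.0 * (double)value / (double)baseline;
}
```

Applied to workingset_refault_file above, rmap_skip comes out at about
239% of vanilla, while priority stays at about 108%.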

Signed-off-by: Minchan Kim <[email protected]>
---
include/linux/rmap.h | 5 +++--
mm/rmap.c | 6 ++++--
mm/vmscan.c | 6 ++++--
3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9ec23138e410..2893da3f1cd3 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -296,7 +296,8 @@ static inline int page_try_share_anon_rmap(struct page *page)
* Called from mm/vmscan.c to handle paging out
*/
int folio_referenced(struct folio *, int is_locked,
- struct mem_cgroup *memcg, unsigned long *vm_flags);
+ struct mem_cgroup *memcg, unsigned long *vm_flags,
+ bool rmap_try_lock);

void try_to_migrate(struct folio *folio, enum ttu_flags flags);
void try_to_unmap(struct folio *, enum ttu_flags flags);
@@ -418,7 +419,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma);

static inline int folio_referenced(struct folio *folio, int is_locked,
struct mem_cgroup *memcg,
- unsigned long *vm_flags)
+ unsigned long *vm_flags, bool rmap_try_lock)
{
*vm_flags = 0;
return 0;
diff --git a/mm/rmap.c b/mm/rmap.c
index d4cf3ea1b616..a75c7f7a0392 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -888,6 +888,7 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
* @is_locked: Caller holds lock on the folio.
* @memcg: target memory cgroup
* @vm_flags: A combination of all the vma->vm_flags which referenced the folio.
+ * @rmap_try_lock: bail out if the rmap lock is contended
*
* Quick test_and_clear_referenced for all mappings of a folio,
*
@@ -895,7 +896,8 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
* the function bailed out due to rmap lock contention.
*/
int folio_referenced(struct folio *folio, int is_locked,
- struct mem_cgroup *memcg, unsigned long *vm_flags)
+ struct mem_cgroup *memcg, unsigned long *vm_flags,
+ bool rmap_try_lock)
{
int we_locked = 0;
struct folio_referenced_arg pra = {
@@ -906,7 +908,7 @@ int folio_referenced(struct folio *folio, int is_locked,
.rmap_one = folio_referenced_one,
.arg = (void *)&pra,
.anon_lock = folio_lock_anon_vma_read,
- .try_lock = true,
+ .try_lock = rmap_try_lock,
};

*vm_flags = 0;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ac168f4b0492..f0987e027aba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1381,7 +1381,8 @@ static enum page_references folio_check_references(struct folio *folio,
unsigned long vm_flags;

referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
- &vm_flags);
+ &vm_flags,
+ sc->priority >= DEF_PRIORITY - 2);
referenced_folio = folio_test_clear_referenced(folio);

/*
@@ -2497,7 +2498,8 @@ static void shrink_active_list(unsigned long nr_to_scan,

/* Referenced or rmap lock contention: rotate */
if (folio_referenced(folio, 0, sc->target_mem_cgroup,
- &vm_flags) != 0) {
+ &vm_flags,
+ sc->priority >= DEF_PRIORITY - 2) != 0) {
/*
* Identify referenced, file-backed active pages and
* give them one more trip around the active list. So
--
2.36.1.124.g0e6072fb45-goog
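
For readers unfamiliar with the try_lock behaviour that the new
rmap_try_lock argument controls, here is a rough user-space analogy
using a POSIX rwlock. It is not the kernel code:
referenced_or_contended() is a hypothetical name, and returning -1 on
contention is an assumption modelled on the "bailed out due to rmap lock
contention" comment in the mm/rmap.c hunk.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical analog of the locking choice: with try_lock set, give up
 * and return -1 if the lock is contended; otherwise wait for the lock.
 * On success, return the reference count read under the lock. */
static int referenced_or_contended(pthread_rwlock_t *lock, bool try_lock,
				   const int *referenced)
{
	int ret;

	if (try_lock) {
		if (pthread_rwlock_tryrdlock(lock) != 0)
			return -1;	/* contended: caller skips the page */
	} else if (pthread_rwlock_rdlock(lock) != 0) {
		return -1;		/* locking error, treated as a skip */
	}

	ret = *referenced;
	pthread_rwlock_unlock(lock);
	return ret;
}
```

In this analogy, the patch amounts to passing try_lock as true only
while sc->priority >= DEF_PRIORITY - 2 and false once pressure is higher.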



2022-06-01 20:31:26

by Minchan Kim

Subject: Re: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention

Bump up.

On Thu, May 26, 2022 at 10:08:44AM -0700, Minchan Kim wrote: