2024-04-08 08:06:00

by 刘海龙(LaoLiu)

Subject: [PATCH] mm: vmscan: do not skip CMA while LRU is full of CMA folios

From: liuhailong <[email protected]>

Commit 5da226dbfce3 ("mm: skip CMA pages when they are not available")
skips CMA folios during direct reclaim when the allocation is not
movable. This generally speeds up the reclamation of non-movable
folios. However, when system memory is tight and most of the
reclaimable folios on the LRU come from CMA, this can result in
prolonged idle loops in which the system reclaims nothing but consumes
CPU.

I traced a thread entering direct reclaim and printed the relevant
fields of its scan_control (sc) at each priority level:
__alloc_pages_direct_reclaim start
sc->priority:9 sc->nr_skipped_cma:32208 sc->nr_scanned:36 sc->nr_reclaimed:3
sc->priority:8 sc->nr_skipped_cma:32199 sc->nr_scanned:69 sc->nr_reclaimed:3
sc->priority:7 sc->nr_skipped_cma:198405 sc->nr_scanned:121 sc->nr_reclaimed:3
sc->priority:6 sc->nr_skipped_cma:236713 sc->nr_scanned:147 sc->nr_reclaimed:3
sc->priority:5 sc->nr_skipped_cma:708209 sc->nr_scanned:379 sc->nr_reclaimed:3
sc->priority:4 sc->nr_skipped_cma:785537 sc->nr_scanned:646 sc->nr_reclaimed:3
__alloc_pages_direct_reclaim end duration 3356ms
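
(For reference, output like the above can be produced with debug
instrumentation along the lines of the sketch below, printed once per
iteration of the priority loop in do_try_to_free_pages(). The printk is
not part of this patch, and the nr_skipped_cma field assumes the counter
added below.)

	pr_info("sc->priority:%d sc->nr_skipped_cma:%lu sc->nr_scanned:%lu sc->nr_reclaimed:%lu\n",
		sc->priority, sc->nr_skipped_cma, sc->nr_scanned,
		sc->nr_reclaimed);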

Continuously skipping CMA even when the LRU is filled with CMA
folios can also result in lmkd failing to terminate processes. The
duration of psi_memstall (measured from the entry to the exit of
__alloc_pages_direct_reclaim) becomes excessively long, lasting, for
example, a couple of seconds. Consequently, lmkd fails to wake up and
terminate processes promptly.

This patch introduces no_skip_cma and sets it to true when the number
of skipped CMA folios grows excessively high. It offers two benefits:
rather than wasting time in idle loops, the reclaiming thread assists
other threads by actually reclaiming some folios; and it shortens the
duration of psi_memstall, ensuring lmkd is woken up within a few
milliseconds.

Signed-off-by: liuhailong <[email protected]>
---
mm/vmscan.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fa321c125099..2c74c1c94d88 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -114,6 +114,9 @@ struct scan_control {
 	/* Proactive reclaim invoked by userspace through memory.reclaim */
 	unsigned int proactive:1;
 
+	/* Can reclaim skip cma pages */
+	unsigned int no_skip_cma:1;
+
 	/*
 	 * Cgroup memory below memory.low is protected as long as we
 	 * don't threaten to OOM. If any cgroup is reclaimed at
@@ -157,6 +160,9 @@ struct scan_control {
 	/* Number of pages freed so far during a call to shrink_zones() */
 	unsigned long nr_reclaimed;
 
+	/* Number of cma-pages skipped so far during a call to shrink_zones() */
+	unsigned long nr_skipped_cma;
+
 	struct {
 		unsigned int dirty;
 		unsigned int unqueued_dirty;
@@ -1572,9 +1578,13 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
  */
 static bool skip_cma(struct folio *folio, struct scan_control *sc)
 {
-	return !current_is_kswapd() &&
+	bool ret = !current_is_kswapd() && !sc->no_skip_cma &&
 		gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
 		folio_migratetype(folio) == MIGRATE_CMA;
+
+	if (ret)
+		sc->nr_skipped_cma += folio_nr_pages(folio);
+	return ret;
 }
 #else
 static bool skip_cma(struct folio *folio, struct scan_control *sc)
@@ -6188,6 +6198,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		vmpressure_prio(sc->gfp_mask, sc->target_mem_cgroup,
 				sc->priority);
 		sc->nr_scanned = 0;
+		sc->nr_skipped_cma = 0;
 		shrink_zones(zonelist, sc);
 
 		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
@@ -6202,6 +6213,16 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		 */
 		if (sc->priority < DEF_PRIORITY - 2)
 			sc->may_writepage = 1;
+
+		/*
+		 * If we're getting trouble reclaiming non-cma pages and
+		 * currently a substantial number of CMA pages on LRU,
+		 * start reclaiming cma pages to alleviate other threads
+		 * and decrease lru size.
+		 */
+		if (sc->priority < DEF_PRIORITY - 2 &&
+		    sc->nr_scanned < (sc->nr_skipped_cma >> 3))
+			sc->no_skip_cma = 1;
 	} while (--sc->priority >= 0);
 
 	last_pgdat = NULL;
--
2.36.1



2024-04-08 08:19:48

by Barry Song

Subject: Re: [PATCH] mm: vmscan: do not skip CMA while LRU is full of CMA folios

On Mon, Apr 8, 2024 at 8:06 PM <[email protected]> wrote:
>
> From: liuhailong <[email protected]>
>
> Commit 5da226dbfce3 ("mm: skip CMA pages when they are not available")
> skips CMA folios during direct reclaim when the allocation is not
> movable. This generally speeds up the reclamation of non-movable
> folios. However, when system memory is tight and most of the
> reclaimable folios on the LRU come from CMA, this can result in
> prolonged idle loops in which the system reclaims nothing but consumes
> CPU.
>
> I traced a thread entering direct reclaim and printed the relevant
> fields of its scan_control (sc) at each priority level:
> __alloc_pages_direct_reclaim start
> sc->priority:9 sc->nr_skipped_cma:32208 sc->nr_scanned:36 sc->nr_reclaimed:3
> sc->priority:8 sc->nr_skipped_cma:32199 sc->nr_scanned:69 sc->nr_reclaimed:3
> sc->priority:7 sc->nr_skipped_cma:198405 sc->nr_scanned:121 sc->nr_reclaimed:3
> sc->priority:6 sc->nr_skipped_cma:236713 sc->nr_scanned:147 sc->nr_reclaimed:3
> sc->priority:5 sc->nr_skipped_cma:708209 sc->nr_scanned:379 sc->nr_reclaimed:3
> sc->priority:4 sc->nr_skipped_cma:785537 sc->nr_scanned:646 sc->nr_reclaimed:3
> __alloc_pages_direct_reclaim end duration 3356ms
>
> Continuously skipping CMA even when the LRU is filled with CMA
> folios can also result in lmkd failing to terminate processes. The
> duration of psi_memstall (measured from the entry to the exit of
> __alloc_pages_direct_reclaim) becomes excessively long, lasting, for
> example, a couple of seconds. Consequently, lmkd fails to wake up and
> terminate processes promptly.

If you're planning to send a newer version, it's best to include
the reproducer here.

>
> This patch introduces no_skip_cma and sets it to true when the number
> of skipped CMA folios grows excessively high. It offers two benefits:
> rather than wasting time in idle loops, the reclaiming thread assists
> other threads by actually reclaiming some folios; and it shortens the
> duration of psi_memstall, ensuring lmkd is woken up within a few
> milliseconds.
>
> Signed-off-by: liuhailong <[email protected]>

I believe this patch tackles a niche scenario where CMA is configured
to be large.

Acked-by: Barry Song <[email protected]>

> ---

This is evidently version 3 of your previous patch titled
'[PATCH v2] Revert "mm: skip CMA pages when they are not available"'. [1]

It's advisable to provide a brief explanation of the changes made from v2
and include a link to v2 here.

[1] https://lore.kernel.org/linux-mm/CAGsJ_4wD-NiquhnhK_6TGEF4reTGO9pVNGyYBnYZ1inVFc40WQ@mail.gmail.com/

> mm/vmscan.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fa321c125099..2c74c1c94d88 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -114,6 +114,9 @@ struct scan_control {
> /* Proactive reclaim invoked by userspace through memory.reclaim */
> unsigned int proactive:1;
>
> + /* Can reclaim skip cma pages */
> + unsigned int no_skip_cma:1;
> +
> /*
> * Cgroup memory below memory.low is protected as long as we
> * don't threaten to OOM. If any cgroup is reclaimed at
> @@ -157,6 +160,9 @@ struct scan_control {
> /* Number of pages freed so far during a call to shrink_zones() */
> unsigned long nr_reclaimed;
>
> + /* Number of cma-pages skipped so far during a call to shrink_zones() */
> + unsigned long nr_skipped_cma;
> +
> struct {
> unsigned int dirty;
> unsigned int unqueued_dirty;
> @@ -1572,9 +1578,13 @@ static __always_inline void update_lru_sizes(struct lruvec *lruvec,
> */
> static bool skip_cma(struct folio *folio, struct scan_control *sc)
> {
> - return !current_is_kswapd() &&
> + bool ret = !current_is_kswapd() && !sc->no_skip_cma &&
> gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
> folio_migratetype(folio) == MIGRATE_CMA;
> +
> + if (ret)
> + sc->nr_skipped_cma += folio_nr_pages(folio);
> + return ret;
> }
> #else
> static bool skip_cma(struct folio *folio, struct scan_control *sc)
> @@ -6188,6 +6198,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> vmpressure_prio(sc->gfp_mask, sc->target_mem_cgroup,
> sc->priority);
> sc->nr_scanned = 0;
> + sc->nr_skipped_cma = 0;
> shrink_zones(zonelist, sc);
>
> if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> @@ -6202,6 +6213,16 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> */
> if (sc->priority < DEF_PRIORITY - 2)
> sc->may_writepage = 1;
> +
> + /*
> + * If we're getting trouble reclaiming non-cma pages and
> + * currently a substantial number of CMA pages on LRU,
> + * start reclaiming cma pages to alleviate other threads
> + * and decrease lru size.
> + */
> + if (sc->priority < DEF_PRIORITY - 2 &&
> + sc->nr_scanned < (sc->nr_skipped_cma >> 3))
> + sc->no_skip_cma = 1;
> } while (--sc->priority >= 0);
>
> last_pgdat = NULL;
> --
> 2.36.1
>

Thanks
Barry

2024-04-08 08:38:57

by Oscar Salvador

Subject: Re: [PATCH] mm: vmscan: do not skip CMA while LRU is full of CMA folios

On Mon, Apr 08, 2024 at 04:05:39PM +0800, [email protected] wrote:
> From: liuhailong <[email protected]>
> @@ -6202,6 +6213,16 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> */
> if (sc->priority < DEF_PRIORITY - 2)
> sc->may_writepage = 1;
> +
> + /*
> + * If we're getting trouble reclaiming non-cma pages and
> + * currently a substantial number of CMA pages on LRU,
"sit on LRU" ?

> + * start reclaiming cma pages to alleviate other threads
> + * and decrease lru size.
> + */
> + if (sc->priority < DEF_PRIORITY - 2 &&
> + sc->nr_scanned < (sc->nr_skipped_cma >> 3))

Why "sc->nr_skipped_cma >> 3"? It feels a bit hardcoded.
Maybe the comment or the changelog should contain a reference about why
this "/8" was a good choice.


--
Oscar Salvador
SUSE Labs

2024-04-09 04:54:31

by 刘海龙(LaoLiu)

Subject: Re: [PATCH] mm: vmscan: do not skip CMA while LRU is full of CMA folios

On 2024/4/8 16:38, Oscar Salvador wrote:
> On Mon, Apr 08, 2024 at 04:05:39PM +0800, [email protected] wrote:
>> From: liuhailong <[email protected]>
>> @@ -6202,6 +6213,16 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>> */
>> if (sc->priority < DEF_PRIORITY - 2)
>> sc->may_writepage = 1;
>> +
>> + /*
>> + * If we're getting trouble reclaiming non-cma pages and
>> + * currently a substantial number of CMA pages on LRU,
> "sit on LRU" ?
Got it, Thanks
>
>> + * start reclaiming cma pages to alleviate other threads
>> + * and decrease lru size.
>> + */
>> + if (sc->priority < DEF_PRIORITY - 2 &&
>> + sc->nr_scanned < (sc->nr_skipped_cma >> 3))
>
> Why "sc->nr_skipped_cma >> 3"? It feels a bit hardcoded.
> Maybe the comment or the changelog should contain a reference about why
> this "/8" was a good choice.

When the number of skipped CMA pages exceeds eight times the number of
scanned pages, it indicates that CMA pages make up the vast majority of
the LRU. Setting the threshold too low may result in premature
reclamation of CMA pages, which cannot be used for non-movable
allocations anyway. Conversely, setting it too high may delay detection
of the problem until much later, wasting CPU time in idle loops.
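
For illustration, a worked example using the priority-4 numbers from the
trace in the changelog (the second figure below is made up purely for
contrast): with nr_skipped_cma = 785537 and nr_scanned = 646,
785537 >> 3 = 98192 > 646, so the check

	if (sc->priority < DEF_PRIORITY - 2 &&
	    sc->nr_scanned < (sc->nr_skipped_cma >> 3))
		sc->no_skip_cma = 1;

trips and CMA folios stop being skipped. With a more balanced LRU, say
nr_skipped_cma = 4000 for the same nr_scanned, 4000 >> 3 = 500 < 646 and
the behaviour is unchanged.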

Brs,
Hailong.