2021-07-26 15:03:35

by Johannes Weiner

Subject: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

Dan Carpenter reports:

The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
29, 2021, leads to the following static checker warning:

kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
warn: sleeping in atomic context

mm/memcontrol.c
  3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
  3573  {
  3574          unsigned long val;
  3575
  3576          if (mem_cgroup_is_root(memcg)) {
  3577                  cgroup_rstat_flush(memcg->css.cgroup);
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is from static analysis and potentially a false positive. The
problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
can sleep.

  3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
  3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
  3580                  if (swap)
  3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
  3582          } else {
  3583                  if (!swap)
  3584                          val = page_counter_read(&memcg->memory);
  3585                  else
  3586                          val = page_counter_read(&memcg->memsw);
  3587          }
  3588          return val;
  3589  }

__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well. If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.

Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.

Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
---
mm/memcontrol.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0cb581..eb8e87c4833f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
 	unsigned long val;

 	if (mem_cgroup_is_root(memcg)) {
-		cgroup_rstat_flush(memcg->css.cgroup);
+		/* mem_cgroup_threshold() calls here from irqsafe context */
+		cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
--
2.32.0
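
The context rule behind this patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: every `model_*` name and `struct ctx_model` is hypothetical, and the assumption that the blocking flush reaches one possible sleep point per CPU is a simplification of the lock breaking described in the changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct ctx_model {
	int rcu_depth;        /* nesting depth of a modeled rcu_read_lock() */
	bool irqs_disabled;   /* modeled local interrupt state */
	int sleep_violations; /* possible sleeps reached while atomic */
};

/* A context is atomic if it is inside an RCU read section or has irqs off. */
static bool model_in_atomic(const struct ctx_model *c)
{
	return c->rcu_depth > 0 || c->irqs_disabled;
}

/* Stands in for any point where the real code might schedule. */
static void model_might_sleep(struct ctx_model *c)
{
	if (model_in_atomic(c))
		c->sleep_violations++;
}

/* Models the blocking flush: assumed to allow a sleep once per CPU. */
static void model_flush(struct ctx_model *c, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		model_might_sleep(c);
}

/* Models the irqsafe flush: same walk over CPUs, but never a sleep point. */
static void model_flush_irqsafe(struct ctx_model *c, int nr_cpus)
{
	(void)c;
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		; /* the real variant would keep holding its lock here */
}

/* Models the threshold path: rcu read lock held and irqs disabled. */
static int model_threshold_path(bool use_irqsafe)
{
	struct ctx_model c = { 0 };

	c.rcu_depth++;
	c.irqs_disabled = true;
	if (use_irqsafe)
		model_flush_irqsafe(&c, 4);
	else
		model_flush(&c, 4);
	c.irqs_disabled = false;
	c.rcu_depth--;

	return c.sleep_violations;
}
```

In this model, `model_threshold_path(false)` counts one violation per modeled CPU, while `model_threshold_path(true)` counts none. The cost of the irqsafe choice is that the caller keeps spinning instead of yielding.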


2021-07-26 15:11:33

by Chris Down

Subject: Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

Johannes Weiner writes:
>Dan Carpenter reports:
>
> The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
> 29, 2021, leads to the following static checker warning:
>
> kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
> warn: sleeping in atomic context
>
> mm/memcontrol.c
> 3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> 3573 {
> 3574 unsigned long val;
> 3575
> 3576 if (mem_cgroup_is_root(memcg)) {
> 3577 cgroup_rstat_flush(memcg->css.cgroup);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is from static analysis and potentially a false positive. The
> problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
> which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
> can sleep.
>
> 3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
> 3579 memcg_page_state(memcg, NR_ANON_MAPPED);
> 3580 if (swap)
> 3581 val += memcg_page_state(memcg, MEMCG_SWAP);
> 3582 } else {
> 3583 if (!swap)
> 3584 val = page_counter_read(&memcg->memory);
> 3585 else
> 3586 val = page_counter_read(&memcg->memsw);
> 3587 }
> 3588 return val;
> 3589 }
>
>__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
>thresholding code is invoked during stat changes, and those contexts
>have irqs disabled as well. If the lock breaking occurs inside the
>flush function, it will result in a sleep from an atomic context.
>
>Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
>Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
>Cc: <[email protected]>
>Reported-by: Dan Carpenter <[email protected]>
>Signed-off-by: Johannes Weiner <[email protected]>

Thanks, looks good.

Acked-by: Chris Down <[email protected]>

>---
> mm/memcontrol.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index ae1f5d0cb581..eb8e87c4833f 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> unsigned long val;
>
> if (mem_cgroup_is_root(memcg)) {
>- cgroup_rstat_flush(memcg->css.cgroup);
>+ /* mem_cgroup_threshold() calls here from irqsafe context */
>+ cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
> val = memcg_page_state(memcg, NR_FILE_PAGES) +
> memcg_page_state(memcg, NR_ANON_MAPPED);
> if (swap)
>--
>2.32.0
>
>

2021-07-26 15:19:04

by Rik van Riel

Subject: Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this

While this fix is necessary, in the long term I think we may
want some sort of redesign here, to make sure the irq safe
version does not spin for a long time trying to get the statistics
off some other CPU.

I have seen a number of soft (IIRC) lockups deep inside the
bowels of cgroup_rstat_flush_irqsafe, with the function taking
multiple seconds to complete.

Reviewed-by: Rik van Riel <[email protected]>

2021-07-26 20:37:04

by Michal Hocko

Subject: Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

On Mon 26-07-21 11:00:19, Johannes Weiner wrote:
> Dan Carpenter reports:
>
> The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
> 29, 2021, leads to the following static checker warning:
>
> kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
> warn: sleeping in atomic context
>
> mm/memcontrol.c
> 3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> 3573 {
> 3574 unsigned long val;
> 3575
> 3576 if (mem_cgroup_is_root(memcg)) {
> 3577 cgroup_rstat_flush(memcg->css.cgroup);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is from static analysis and potentially a false positive. The
> problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
> which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
> can sleep.
>
> 3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
> 3579 memcg_page_state(memcg, NR_ANON_MAPPED);
> 3580 if (swap)
> 3581 val += memcg_page_state(memcg, MEMCG_SWAP);
> 3582 } else {
> 3583 if (!swap)
> 3584 val = page_counter_read(&memcg->memory);
> 3585 else
> 3586 val = page_counter_read(&memcg->memsw);
> 3587 }
> 3588 return val;
> 3589 }
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
> Cc: <[email protected]>
> Reported-by: Dan Carpenter <[email protected]>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Michal Hocko <[email protected]>

Thanks!

> ---
> mm/memcontrol.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ae1f5d0cb581..eb8e87c4833f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> unsigned long val;
>
> if (mem_cgroup_is_root(memcg)) {
> - cgroup_rstat_flush(memcg->css.cgroup);
> + /* mem_cgroup_threshold() calls here from irqsafe context */
> + cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
> val = memcg_page_state(memcg, NR_FILE_PAGES) +
> memcg_page_state(memcg, NR_ANON_MAPPED);
> if (swap)
> --
> 2.32.0

--
Michal Hocko
SUSE Labs

2021-07-27 16:53:02

by Shakeel Butt

Subject: Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel <[email protected]> wrote:
>
> On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> >
> > __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> > thresholding code is invoked during stat changes, and those contexts
> > have irqs disabled as well. If the lock breaking occurs inside the
> > flush function, it will result in a sleep from an atomic context.
> >
> > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this
>
> While this fix is necessary, in the long term I think we may
> want some sort of redesign here, to make sure the irq safe
> version does not spin long times trying to get the statistics
> off some other CPU.
>
> I have seen a number of soft (IIRC) lockups deep inside the
> bowels of cgroup_rstat_flush_irqsafe, with the function taking
> multiple seconds to complete.

Can you please share a bit more detail on this lockup? I am wondering
whether it was due to flushes not happening often enough, leaving a
large update tree, or to too many concurrent flushes happening.

>
> Reviewed-by: Rik van Riel <[email protected]>

2021-07-27 17:03:00

by Shakeel Butt

Subject: Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

On Mon, Jul 26, 2021 at 8:01 AM Johannes Weiner <[email protected]> wrote:
>
> Dan Carpenter reports:
>
> The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
> 29, 2021, leads to the following static checker warning:
>
> kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
> warn: sleeping in atomic context
>
> mm/memcontrol.c
> 3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
> 3573 {
> 3574 unsigned long val;
> 3575
> 3576 if (mem_cgroup_is_root(memcg)) {
> 3577 cgroup_rstat_flush(memcg->css.cgroup);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is from static analysis and potentially a false positive. The
> problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
> which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
> can sleep.
>
> 3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
> 3579 memcg_page_state(memcg, NR_ANON_MAPPED);
> 3580 if (swap)
> 3581 val += memcg_page_state(memcg, MEMCG_SWAP);
> 3582 } else {
> 3583 if (!swap)
> 3584 val = page_counter_read(&memcg->memory);
> 3585 else
> 3586 val = page_counter_read(&memcg->memsw);
> 3587 }
> 3588 return val;
> 3589 }
>
> __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
> thresholding code is invoked during stat changes, and those contexts
> have irqs disabled as well. If the lock breaking occurs inside the
> flush function, it will result in a sleep from an atomic context.
>
> Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
>
> Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
> Cc: <[email protected]>
> Reported-by: Dan Carpenter <[email protected]>
> Signed-off-by: Johannes Weiner <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

BTW what do you think of removing stat flushes from the read side
(kernel and userspace) completely once periodic flushing and async
flushing from the update side are in place? Basically with "memcg:
infrastructure to flush memcg stats".
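
The idea of flush-free readers can be sketched with a single-threaded user-space model. This is an illustration under stated assumptions, not the design of the series named above: `struct stat_model`, the `model_*` functions, and the threshold of 64 are all hypothetical, and real code would need per-CPU data and locking that this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4
#define MODEL_FLUSH_THRESHOLD 64 /* hypothetical trigger for an async flush */

/* Hypothetical model of one statistic; not a kernel data structure. */
struct stat_model {
	long percpu_delta[MODEL_NR_CPUS]; /* updates not yet folded in */
	long aggregated;                  /* value visible to readers */
	long pending;                     /* magnitude of unflushed updates */
	bool flush_requested;             /* set when the updater wants a flush */
};

/* Update side: record the delta locally, request a flush if too much piled up. */
static void model_update(struct stat_model *s, int cpu, long delta)
{
	s->percpu_delta[cpu] += delta;
	s->pending += delta < 0 ? -delta : delta;
	if (s->pending >= MODEL_FLUSH_THRESHOLD)
		s->flush_requested = true;
}

/* Flusher: would run periodically or on request, in a context that may sleep. */
static void model_flush_worker(struct stat_model *s)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		s->aggregated += s->percpu_delta[cpu];
		s->percpu_delta[cpu] = 0;
	}
	s->pending = 0;
	s->flush_requested = false;
}

/* Read side: never flushes, so it is cheap but may return a stale value. */
static long model_read(const struct stat_model *s)
{
	return s->aggregated;
}
```

The trade-off in this model is visible in `model_read()`: it does no work proportional to the number of CPUs, so it is safe from any context, but its result lags behind by whatever has accumulated since the last run of `model_flush_worker()`.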

2021-08-03 14:35:50

by Rik van Riel

Subject: Re: [PATCH] mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code

On Tue, 2021-07-27 at 09:51 -0700, Shakeel Butt wrote:
> On Mon, Jul 26, 2021 at 8:19 AM Rik van Riel <[email protected]> wrote:
> >
> > On Mon, 2021-07-26 at 11:00 -0400, Johannes Weiner wrote:
> > >
> > > __mem_cgroup_threshold() indeed holds the rcu lock. In addition,
> > > the
> > > thresholding code is invoked during stat changes, and those
> > > contexts
> > > have irqs disabled as well. If the lock breaking occurs inside
> > > the
> > > flush function, it will result in a sleep from an atomic context.
> > >
> > > Use the irqsafe flushing variant in mem_cgroup_usage() to fix this
> >
> > While this fix is necessary, in the long term I think we may
> > want some sort of redesign here, to make sure the irq safe
> > version does not spin long times trying to get the statistics
> > off some other CPU.
> >
> > I have seen a number of soft (IIRC) lockups deep inside the
> > bowels of cgroup_rstat_flush_irqsafe, with the function taking
> > multiple seconds to complete.
>
> Can you please share a bit more detail on this lockup? I am wondering
> if this was due to the flush not happening more often and thus the
> update tree is large or if there are too many concurrent flushes
> happening.

I was not logged into any system while it happened; I only
found it later in the logs.

I suspect your explanation is the reason why it happened,
though.