We've hit a soft lockup with "CONFIG_PREEMPT_NONE=y" when
the target memcg doesn't have any reclaimable memory.
It can be easily reproduced as below:
watchdog: BUG: soft lockup - CPU#0 stuck for 111s! [memcg_test:2204]
CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
Call Trace:
shrink_lruvec+0x49f/0x640
shrink_node+0x2a6/0x6f0
do_try_to_free_pages+0xe9/0x3e0
try_to_free_mem_cgroup_pages+0xef/0x1f0
try_charge+0x2c1/0x750
mem_cgroup_charge+0xd7/0x240
__add_to_page_cache_locked+0x2fd/0x370
add_to_page_cache_lru+0x4a/0xc0
pagecache_get_page+0x10b/0x2f0
filemap_fault+0x661/0xad0
ext4_filemap_fault+0x2c/0x40
__do_fault+0x4d/0xf9
handle_mm_fault+0x1080/0x1790
It only happens on our 1-vcpu instances, because there's no chance
for the oom reaper to run and reclaim the memory of the to-be-killed process.
Add cond_resched() in the upper-level shrink_node_memcgs() to solve this
issue, and any other possible issue like memory.min protection.
Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Xunlei Pang <[email protected]>
---
mm/vmscan.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 99e1796..bbdc38b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2617,6 +2617,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
mem_cgroup_calculate_protection(target_memcg, memcg);
+ cond_resched();
+
if (mem_cgroup_below_min(memcg)) {
/*
* Hard protection.
--
1.8.3.1
Hi Xunlei,
Xunlei Pang writes:
>Add cond_resched() in the upper-level shrink_node_memcgs() to solve this
>issue, and any other possible issue like memory.min protection.
I understand the general intent of the patch, but could you clarify your
concern around memory protection?
On Wed 26-08-20 21:47:02, Xunlei Pang wrote:
> We've hit a soft lockup with "CONFIG_PREEMPT_NONE=y" when
> the target memcg doesn't have any reclaimable memory.
>
> [...]
>
> It only happens on our 1-vcpu instances, because there's no chance
> for the oom reaper to run and reclaim the memory of the to-be-killed process.
>
> Add cond_resched() in the upper-level shrink_node_memcgs() to solve this
> issue, and any other possible issue like memory.min protection.
I would just add
"
This will mean that we will get a scheduling point for each memcg in the
reclaimed hierarchy without any dependency on the reclaimable memory in
that memcg, thus making it more predictable.
"
> Suggested-by: Michal Hocko <[email protected]>
> Signed-off-by: Xunlei Pang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Thanks!
--
Michal Hocko
SUSE Labs
On Wed, Aug 26, 2020 at 09:47:02PM +0800, Xunlei Pang wrote:
> We've hit a soft lockup with "CONFIG_PREEMPT_NONE=y" when
> the target memcg doesn't have any reclaimable memory.
>
> [...]
>
> Add cond_resched() in the upper-level shrink_node_memcgs() to solve this
> issue, and any other possible issue like memory.min protection.
>
> Suggested-by: Michal Hocko <[email protected]>
> Signed-off-by: Xunlei Pang <[email protected]>
This generally makes sense to me but really should have a comment:
/*
* This loop can become CPU-bound when there are thousands
* of cgroups that aren't eligible for reclaim - either
* because they don't have any pages, or because their
* memory is explicitly protected. Avoid soft lockups.
*/
cond_resched();
The placement in the middle of the multi-part protection checks is a
bit odd too. It would be better to have it either at the top of the
loop, or at the end, by replacing the continues with goto next.
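For illustration, the end-of-loop variant could look something like this
(rough sketch only, most of the loop body elided; the "next" label is new):

	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
	do {
		mem_cgroup_calculate_protection(target_memcg, memcg);

		if (mem_cgroup_below_min(memcg)) {
			/* Hard protection. */
			goto next;
		}

		/* ... memory.low checks, shrink_lruvec() etc. ... */
next:
		/*
		 * This loop can become CPU-bound when there are thousands
		 * of cgroups that aren't eligible for reclaim - either
		 * because they don't have any pages, or because their
		 * memory is explicitly protected. Avoid soft lockups.
		 */
		cond_resched();
	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));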
On Wed 26-08-20 12:43:32, Johannes Weiner wrote:
> On Wed, Aug 26, 2020 at 09:47:02PM +0800, Xunlei Pang wrote:
> > [...]
>
> This generally makes sense to me but really should have a comment:
>
> /*
> * This loop can become CPU-bound when there are thousands
> * of cgroups that aren't eligible for reclaim - either
> * because they don't have any pages, or because their
> * memory is explicitly protected. Avoid soft lockups.
> */
> cond_resched();
>
> The placement in the middle of the multi-part protection checks is a
> bit odd too. It would be better to have it either at the top of the
> loop, or at the end, by replacing the continues with goto next.
Yes, makes sense. I would stick it at the beginning of the loop to make it
stand out and make it obvious wrt the code flow.
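I.e. something like this (sketch only, rest of the loop body elided):

	memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
	do {
		/*
		 * This loop can become CPU-bound when there are thousands
		 * of cgroups that aren't eligible for reclaim - either
		 * because they don't have any pages, or because their
		 * memory is explicitly protected. Avoid soft lockups.
		 */
		cond_resched();

		mem_cgroup_calculate_protection(target_memcg, memcg);

		/* ... protection checks, shrink_lruvec() etc. ... */
	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));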
--
Michal Hocko
SUSE Labs
Chris Down writes:
>Xunlei Pang writes:
>>Add cond_resched() in the upper-level shrink_node_memcgs() to solve this
>>issue, and any other possible issue like memory.min protection.
>
>I understand the general intent of the patch, but could you clarify
>your concern around memory protection?
Oh, I see, your concern is just preemption in general rather than fixing
anything on the memory protection side. In that case, go for it, but I agree
with Michal that it would be nice to send a v3 with a clarifying comment.
Acked-by: Chris Down <[email protected]>