2020-09-08 19:00:08

by Julius Hemanth Pitti

Subject: [PATCH] mm: memcg: yield cpu when we fail to charge pages

For a non-root cgroup, try_charge() keeps retrying
the charge until it succeeds. On a non-preemptive
kernel, when the cgroup is OOM, this results in
holding the CPU forever.

On SMP systems this is not a big problem, because
oom_reaper gets a chance to kill the victim and
free some pages. However, on a single-core CPU (or
when oom_reaper is pinned to the same CPU on which
try_charge() is executing), oom_reaper never gets
scheduled and we stay in try_charge() forever.

Steps to reproduce this on non-SMP:
1. mount -t tmpfs none /sys/fs/cgroup
2. mkdir /sys/fs/cgroup/memory
3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
4. mkdir /sys/fs/cgroup/memory/0
5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
6. echo $$ > /sys/fs/cgroup/memory/0/tasks
7. stress -m 5 --vm-bytes 10M --vm-hang 0

Signed-off-by: Julius Hemanth Pitti <[email protected]>
---
mm/memcontrol.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d6f3ea86738..4620d70267cb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (fatal_signal_pending(current))
 		goto force;
 
+	cond_resched();
+
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
--
2.17.1
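
The starvation described in the patch above can be illustrated with a
small userspace toy model. This is only a sketch under stated
assumptions, not kernel code: struct toy_cpu, toy_yield() and
toy_try_charge() are hypothetical names invented for illustration,
toy_yield() stands in for cond_resched() letting oom_reaper run, and
the retry bound exists only so that the "hang" terminates in the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a single non-preemptive CPU. Hypothetical, not kernel code. */
struct toy_cpu {
	size_t free_pages;     /* pages that can be charged right now */
	size_t reapable_pages; /* pages the reaper would free if it ever ran */
};

/*
 * Stand-in for cond_resched(): the only point at which the "reaper"
 * gets CPU time in this model. It moves all reapable pages to the
 * free pool.
 */
static void toy_yield(struct toy_cpu *cpu)
{
	cpu->free_pages += cpu->reapable_pages;
	cpu->reapable_pages = 0;
}

/*
 * Retry a charge of nr_pages at most max_retries times.
 * Returns the attempt number on success, or -1 if the bound is hit,
 * which models staying in the retry loop forever.
 */
static long toy_try_charge(struct toy_cpu *cpu, size_t nr_pages,
			   bool yield_on_retry, long max_retries)
{
	assert(cpu != NULL && max_retries > 0);

	for (long attempt = 1; attempt <= max_retries; attempt++) {
		if (cpu->free_pages >= nr_pages) {
			cpu->free_pages -= nr_pages;
			return attempt;
		}
		/* Charge failed. Without a yield nothing else can run. */
		if (yield_on_retry)
			toy_yield(cpu);
	}
	return -1;
}
```

In this model, a CPU that starts with 0 free pages and 8 reapable pages
never satisfies a 4-page charge when yield_on_retry is false (the
function returns -1), and satisfies it on the second attempt when
yield_on_retry is true.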


2020-09-08 19:27:01

by Roman Gushchin

Subject: Re: [PATCH] mm: memcg: yield cpu when we fail to charge pages

On Tue, Sep 08, 2020 at 11:50:51AM -0700, Julius Hemanth Pitti wrote:
> For a non-root cgroup, try_charge() keeps retrying
> the charge until it succeeds. On a non-preemptive
> kernel, when the cgroup is OOM, this results in
> holding the CPU forever.
>
> On SMP systems this is not a big problem, because
> oom_reaper gets a chance to kill the victim and
> free some pages. However, on a single-core CPU (or
> when oom_reaper is pinned to the same CPU on which
> try_charge() is executing), oom_reaper never gets
> scheduled and we stay in try_charge() forever.
>
> Steps to reproduce this on non-SMP:
> 1. mount -t tmpfs none /sys/fs/cgroup
> 2. mkdir /sys/fs/cgroup/memory
> 3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
> 4. mkdir /sys/fs/cgroup/memory/0
> 5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
> 6. echo $$ > /sys/fs/cgroup/memory/0/tasks
> 7. stress -m 5 --vm-bytes 10M --vm-hang 0
>
> Signed-off-by: Julius Hemanth Pitti <[email protected]>
> ---
> mm/memcontrol.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d6f3ea86738..4620d70267cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> if (fatal_signal_pending(current))
> goto force;
>
> + cond_resched();
> +

Can you please add a short comment here?
Something like "give oom_reaper a chance on a non-SMP system"?

> /*
> * keep retrying as long as the memcg oom killer is able to make
> * a forward progress or bypass the charge if the oom killer
> --
> 2.17.1
>

The patch makes total sense to me. Please feel free to add
Acked-by: Roman Gushchin <[email protected]> after adding a comment.

Thank you!

2020-09-08 19:31:43

by Julius Hemanth Pitti

Subject: Re: [PATCH] mm: memcg: yield cpu when we fail to charge pages

On Tue, 2020-09-08 at 12:21 -0700, Roman Gushchin wrote:
> On Tue, Sep 08, 2020 at 11:50:51AM -0700, Julius Hemanth Pitti wrote:
> > For a non-root cgroup, try_charge() keeps retrying
> > the charge until it succeeds. On a non-preemptive
> > kernel, when the cgroup is OOM, this results in
> > holding the CPU forever.
> >
> > On SMP systems this is not a big problem, because
> > oom_reaper gets a chance to kill the victim and
> > free some pages. However, on a single-core CPU (or
> > when oom_reaper is pinned to the same CPU on which
> > try_charge() is executing), oom_reaper never gets
> > scheduled and we stay in try_charge() forever.
> >
> > Steps to reproduce this on non-SMP:
> > 1. mount -t tmpfs none /sys/fs/cgroup
> > 2. mkdir /sys/fs/cgroup/memory
> > 3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
> > 4. mkdir /sys/fs/cgroup/memory/0
> > 5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
> > 6. echo $$ > /sys/fs/cgroup/memory/0/tasks
> > 7. stress -m 5 --vm-bytes 10M --vm-hang 0
> >
> > Signed-off-by: Julius Hemanth Pitti <[email protected]>
> > ---
> > mm/memcontrol.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 0d6f3ea86738..4620d70267cb 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > if (fatal_signal_pending(current))
> > goto force;
> >
> > + cond_resched();
> > +
>
> Can you please add a short comment here?
> Something like "give oom_reaper a chance on a non-SMP system"?
Sure.

>
> > /*
> > * keep retrying as long as the memcg oom killer is able to make
> > * a forward progress or bypass the charge if the oom killer
> > --
> > 2.17.1
> >
>
> The patch makes total sense to me. Please feel free to add
> Acked-by: Roman Gushchin <[email protected]> after adding a comment.
Thanks, I will add it.

>
> Thank you!

2020-09-14 04:16:30

by Xunlei Pang

Subject: Re: [PATCH] mm: memcg: yield cpu when we fail to charge pages

On 2020/9/9 AM2:50, Julius Hemanth Pitti wrote:
> For a non-root cgroup, try_charge() keeps retrying
> the charge until it succeeds. On a non-preemptive
> kernel, when the cgroup is OOM, this results in
> holding the CPU forever.
>
> On SMP systems this is not a big problem, because
> oom_reaper gets a chance to kill the victim and
> free some pages. However, on a single-core CPU (or
> when oom_reaper is pinned to the same CPU on which
> try_charge() is executing), oom_reaper never gets
> scheduled and we stay in try_charge() forever.
>
> Steps to reproduce this on non-SMP:
> 1. mount -t tmpfs none /sys/fs/cgroup
> 2. mkdir /sys/fs/cgroup/memory
> 3. mount -t cgroup none /sys/fs/cgroup/memory -o memory
> 4. mkdir /sys/fs/cgroup/memory/0
> 5. echo 40M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
> 6. echo $$ > /sys/fs/cgroup/memory/0/tasks
> 7. stress -m 5 --vm-bytes 10M --vm-hang 0
>
> Signed-off-by: Julius Hemanth Pitti <[email protected]>
> ---
> mm/memcontrol.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d6f3ea86738..4620d70267cb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2652,6 +2652,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> if (fatal_signal_pending(current))
> goto force;
>
> + cond_resched();
> +
> /*
> * keep retrying as long as the memcg oom killer is able to make
> * a forward progress or bypass the charge if the oom killer
>

This should be fixed by:
https://lkml.org/lkml/2020/8/26/1440

Thanks,
Xunlei