2021-02-26 02:16:49

by Yang Shi

Subject: [PATCH] doc: memcontrol: add description for oom_kill

When debugging an OOM issue, I found the oom_kill counter of memcg
confusing. At first glance, without checking the documentation, I
thought it counted only memcg OOM kills, but it turns out it counts
both global and memcg OOM kills.

The cgroup v2 documentation describes it, but the description is missing
for cgroup v1.

Signed-off-by: Yang Shi <[email protected]>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0936412e044e..44d5429636e2 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -851,6 +851,9 @@ At reading, current status of OOM is shown.
(if 1, oom-killer is disabled)
- under_oom 0 or 1
(if 1, the memory cgroup is under OOM, tasks may be stopped.)
+ - oom_kill integer counter
+ The number of processes belonging to this cgroup killed by any
+ kind of OOM killer.

11. Memory Pressure
===================
--
2.26.2
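
For readers who want to consume the field this patch documents, here is a
minimal sketch of parsing the contents of memory.oom_control. It assumes the
"name value" per-line layout shown in the documentation hunk above; the
function name parse_oom_control and the sample numbers are illustrative
only and do not come from the patch.

```python
def parse_oom_control(text):
    """Parse cgroup v1 memory.oom_control contents into a dict of integers."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Each expected line has the form "<name> <integer>"; skip anything else.
        if len(parts) == 2:
            key, value = parts
            fields[key] = int(value)
    return fields


# Made-up sample in the documented layout, not output from a real system.
sample = "oom_kill_disable 0\nunder_oom 0\noom_kill 3\n"
stats = parse_oom_control(sample)
print(stats["oom_kill"])
```

On a live system the text would be read from the memory.oom_control file of
the cgroup in question; the mount point of the v1 memory controller varies
between setups, so no path is hard-coded here.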


2021-02-26 07:33:54

by Michal Hocko

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Thu 25-02-21 18:12:54, Yang Shi wrote:
> When debugging an oom issue, I found the oom_kill counter of memcg is
> confusing. At the first glance without checking document, I thought it
> just counts for memcg oom, but it turns out it counts both global and
> memcg oom.

Yes, this is the case indeed. The point of the counter was to count oom
victims from the memcg rather than matching that to the source of the
oom. Remember that this could have been a memcg oom up in the
hierarchy as well. Counting victims on the oom origin could be equally
confusing because in many cases there would be no victim counted for the
above mentioned memcg ooms.

> The cgroup v2 documents it, but the description is missed for cgroup v1.
>
> Signed-off-by: Yang Shi <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 0936412e044e..44d5429636e2 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -851,6 +851,9 @@ At reading, current status of OOM is shown.
> (if 1, oom-killer is disabled)
> - under_oom 0 or 1
> (if 1, the memory cgroup is under OOM, tasks may be stopped.)
> + - oom_kill integer counter
> + The number of processes belonging to this cgroup killed by any
> + kind of OOM killer.
>
> 11. Memory Pressure
> ===================
> --
> 2.26.2
>

--
Michal Hocko
SUSE Labs

2021-02-26 14:28:35

by Shakeel Butt

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Thu, Feb 25, 2021 at 6:12 PM Yang Shi <[email protected]> wrote:
>
> When debugging an oom issue, I found the oom_kill counter of memcg is
> confusing. At the first glance without checking document, I thought it
> just counts for memcg oom, but it turns out it counts both global and
> memcg oom.
>
> The cgroup v2 documents it, but the description is missed for cgroup v1.
>
> Signed-off-by: Yang Shi <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2021-02-26 14:38:21

by Chris Down

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

Yang Shi writes:
>When debugging an oom issue, I found the oom_kill counter of memcg is
>confusing. At the first glance without checking document, I thought it
>just counts for memcg oom, but it turns out it counts both global and
>memcg oom.
>
>The cgroup v2 documents it, but the description is missed for cgroup v1.
>
>Signed-off-by: Yang Shi <[email protected]>

Thanks.

Acked-by: Chris Down <[email protected]>

>---
> Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
>index 0936412e044e..44d5429636e2 100644
>--- a/Documentation/admin-guide/cgroup-v1/memory.rst
>+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
>@@ -851,6 +851,9 @@ At reading, current status of OOM is shown.
> (if 1, oom-killer is disabled)
> - under_oom 0 or 1
> (if 1, the memory cgroup is under OOM, tasks may be stopped.)
>+ - oom_kill integer counter
>+ The number of processes belonging to this cgroup killed by any
>+ kind of OOM killer.
>
> 11. Memory Pressure
> ===================
>--
>2.26.2
>
>

2021-02-26 16:45:03

by Yang Shi

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Thu, Feb 25, 2021 at 11:30 PM Michal Hocko <[email protected]> wrote:
>
> On Thu 25-02-21 18:12:54, Yang Shi wrote:
> > When debugging an oom issue, I found the oom_kill counter of memcg is
> > confusing. At the first glance without checking document, I thought it
> > just counts for memcg oom, but it turns out it counts both global and
> > memcg oom.
>
> Yes, this is the case indeed. The point of the counter was to count oom
> victims from the memcg rather than matching that to the source of the
> oom. Remember that this could have been a memcg oom up in the
> hierarchy as well. Counting victims on the oom origin could be equally

Yes, it is updated hierarchically on v2, but not on v1. I suppose
this is because v1 may work in non-hierarchical mode? If this is the
only reason, we may be able to remove this difference to align with v2,
since non-hierarchical mode is no longer supported.

> confusing because in many cases there would be no victim counted for the
> above mentioned memcg ooms.
>
> > The cgroup v2 documents it, but the description is missed for cgroup v1.
> >
> > Signed-off-by: Yang Shi <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>
>
> > ---
> > Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> > index 0936412e044e..44d5429636e2 100644
> > --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> > +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> > @@ -851,6 +851,9 @@ At reading, current status of OOM is shown.
> > (if 1, oom-killer is disabled)
> > - under_oom 0 or 1
> > (if 1, the memory cgroup is under OOM, tasks may be stopped.)
> > + - oom_kill integer counter
> > + The number of processes belonging to this cgroup killed by any
> > + kind of OOM killer.
> >
> > 11. Memory Pressure
> > ===================
> > --
> > 2.26.2
> >
>
> --
> Michal Hocko
> SUSE Labs

2021-02-26 19:22:36

by Yang Shi

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Fri, Feb 26, 2021 at 8:42 AM Yang Shi <[email protected]> wrote:
>
> On Thu, Feb 25, 2021 at 11:30 PM Michal Hocko <[email protected]> wrote:
> >
> > On Thu 25-02-21 18:12:54, Yang Shi wrote:
> > > When debugging an oom issue, I found the oom_kill counter of memcg is
> > > confusing. At the first glance without checking document, I thought it
> > > just counts for memcg oom, but it turns out it counts both global and
> > > memcg oom.
> >
> > Yes, this is the case indeed. The point of the counter was to count oom
> > victims from the memcg rather than matching that to the source of the
> > oom. Remember that this could have been a memcg oom up in the
> > hierarchy as well. Counting victims on the oom origin could be equally
>
> Yes, it is updated hierarchically on v2, but not on v1. I suppose
> this is because v1 may work in non-hierarchical mode? If this is the
> only reason, we may be able to remove this difference to align with v2,
> since non-hierarchical mode is no longer supported.

BTW, having the counter recorded hierarchically may help with one of
our use cases. We want to monitor oom_kill for some services, but
systemd wipes out the cgroup when the service is OOM-killed and then
restarts the service from scratch (that is, it creates a brand-new
cgroup with the same name). So this systemd behavior makes the counter
useless if it is not recorded hierarchically.
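
A rough sketch of what a monitor can and cannot do against a counter that
starts over whenever the cgroup is recreated. It assumes the monitor polls
the per-cgroup oom_kill value periodically; accumulate_kills and the sample
series are hypothetical and only illustrate reset handling.

```python
def accumulate_kills(samples):
    """Sum counter increases across polls, treating any drop as a reset.

    A drop means the cgroup was torn down and recreated, so the new value
    is counted from zero instead of being compared with the old one.
    """
    total = 0
    previous = 0
    for value in samples:
        if value >= previous:
            total += value - previous
        else:
            total += value
        previous = value
    return total


# Made-up polling series: one kill seen, cgroup recreated, then two more kills.
print(accumulate_kills([0, 1, 0, 0, 2]))
```

This only recovers kills that were visible in at least one poll. A kill
followed immediately by teardown of the cgroup never shows up in any sample,
which is exactly the gap described above.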

>
> > confusing because in many cases there would be no victim counted for the
> > above mentioned memcg ooms.
> >
> > > The cgroup v2 documents it, but the description is missed for cgroup v1.
> > >
> > > Signed-off-by: Yang Shi <[email protected]>
> >
> > Acked-by: Michal Hocko <[email protected]>
> >
> > > ---
> > > Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> > > index 0936412e044e..44d5429636e2 100644
> > > --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> > > +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> > > @@ -851,6 +851,9 @@ At reading, current status of OOM is shown.
> > > (if 1, oom-killer is disabled)
> > > - under_oom 0 or 1
> > > (if 1, the memory cgroup is under OOM, tasks may be stopped.)
> > > + - oom_kill integer counter
> > > + The number of processes belonging to this cgroup killed by any
> > > + kind of OOM killer.
> > >
> > > 11. Memory Pressure
> > > ===================
> > > --
> > > 2.26.2
> > >
> >
> > --
> > Michal Hocko
> > SUSE Labs

2021-03-01 12:19:48

by Michal Hocko

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Fri 26-02-21 08:42:29, Yang Shi wrote:
> On Thu, Feb 25, 2021 at 11:30 PM Michal Hocko <[email protected]> wrote:
> >
> > On Thu 25-02-21 18:12:54, Yang Shi wrote:
> > > When debugging an oom issue, I found the oom_kill counter of memcg is
> > > confusing. At the first glance without checking document, I thought it
> > > just counts for memcg oom, but it turns out it counts both global and
> > > memcg oom.
> >
> > Yes, this is the case indeed. The point of the counter was to count oom
> > victims from the memcg rather than matching that to the source of the
> > oom. Remember that this could have been a memcg oom up in the
> > hierarchy as well. Counting victims on the oom origin could be equally
>
> Yes, it is updated hierarchically on v2, but not on v1. I suppose
> this is because v1 may work in non-hierarchical mode? If this is the
> only reason, we may be able to remove this difference to align with v2,
> since non-hierarchical mode is no longer supported.

I believe the reason is that v1 can have tasks in the intermediate
(non-leaf) memcgs. So you wouldn't have a way to tell whether the oom
kill has happened in such a memcg or somewhere down the hierarchy.
--
Michal Hocko
SUSE Labs
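
The ambiguity described in the message above can be shown with a toy model.
This is not kernel code: the Memcg class, its two counters, and parent_view
are invented for illustration. It assumes a parent group that may hold tasks
directly, as cgroup v1 allows.

```python
class Memcg:
    """Toy stand-in for a memory cgroup; not kernel code."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local_kills = 0         # victims attached directly to this group
        self.hierarchical_kills = 0  # victims in this group or any descendant

    def record_oom_kill(self):
        """Count one victim here and propagate the count to every ancestor."""
        self.local_kills += 1
        node = self
        while node is not None:
            node.hierarchical_kills += 1
            node = node.parent


def parent_view(victim_in_child):
    """Return (hierarchical, local) counts seen at the parent for one kill."""
    parent = Memcg("parent")
    child = Memcg("child", parent)
    victim_group = child if victim_in_child else parent
    victim_group.record_oom_kill()
    return parent.hierarchical_kills, parent.local_kills


print(parent_view(victim_in_child=True))
print(parent_view(victim_in_child=False))
```

In both scenarios the parent's hierarchical count is 1, so a purely
hierarchical counter cannot say where the victim lived. Only the local count
differs between the two cases.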

2021-03-01 12:29:14

by Michal Hocko

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Fri 26-02-21 11:19:51, Yang Shi wrote:
> On Fri, Feb 26, 2021 at 8:42 AM Yang Shi <[email protected]> wrote:
> >
> > On Thu, Feb 25, 2021 at 11:30 PM Michal Hocko <[email protected]> wrote:
> > >
> > > On Thu 25-02-21 18:12:54, Yang Shi wrote:
> > > > When debugging an oom issue, I found the oom_kill counter of memcg is
> > > > confusing. At the first glance without checking document, I thought it
> > > > just counts for memcg oom, but it turns out it counts both global and
> > > > memcg oom.
> > >
> > > Yes, this is the case indeed. The point of the counter was to count oom
> > > victims from the memcg rather than matching that to the source of the
> > > oom. Remember that this could have been a memcg oom up in the
> > > hierarchy as well. Counting victims on the oom origin could be equally
> >
> > Yes, it is updated hierarchically on v2, but not on v1. I suppose
> > this is because v1 may work in non-hierarchical mode? If this is the
> > only reason, we may be able to remove this difference to align with v2,
> > since non-hierarchical mode is no longer supported.
>
> BTW, having the counter recorded hierarchically may help with one of
> our use cases. We want to monitor oom_kill for some services, but
> systemd wipes out the cgroup when the service is OOM-killed and then
> restarts the service from scratch (that is, it creates a brand-new
> cgroup with the same name). So this systemd behavior makes the counter
> useless if it is not recorded hierarchically.

Just to make sure I understand correctly. You have a setup where memcg
for a service has a hard limit configured and it is destroyed when oom
happens inside that memcg. A new instance is created at the same place
of the hierarchy with a new memcg. Your problem is that the oom killed
memcg will not be recorded in its parent oom event and the information
will get lost with the torn down memcg. Correct?

If yes, then how do you tell which of the child cgroups was killed from
the parent counter? Or is there only a single child?

Anyway, cgroup v2 will offer the hierarchical behavior. Do you have any
strong reasons that you cannot use v2?
--
Michal Hocko
SUSE Labs
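
As a sketch of the hierarchical behavior mentioned above for cgroup v2: the
v2 memory controller exposes memory.events (hierarchical) and, on kernels
that provide it, memory.events.local (this cgroup only), both as "name value"
lines. The helper names and the sample contents below are assumptions made
for illustration.

```python
def parse_events(text):
    """Parse "name value" lines from a cgroup v2 memory.events-style file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def descendant_oom_kills(events_text, local_text):
    """OOM kills counted in descendants: hierarchical count minus local count."""
    hierarchical = parse_events(events_text).get("oom_kill", 0)
    local = parse_events(local_text).get("oom_kill", 0)
    return hierarchical - local


# Made-up file contents for a parent cgroup, not taken from a real system.
events = "low 0\nhigh 0\nmax 4\noom 2\noom_kill 2\n"
events_local = "low 0\nhigh 0\nmax 0\noom 0\noom_kill 0\n"
print(descendant_oom_kills(events, events_local))
```

Because the hierarchical count lives in the parent, it survives a child
cgroup being destroyed and recreated, which is the property the service
monitoring use case in this thread needs.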

2021-03-02 17:16:16

by Jonathan Corbet

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

Yang Shi <[email protected]> writes:

> When debugging an oom issue, I found the oom_kill counter of memcg is
> confusing. At the first glance without checking document, I thought it
> just counts for memcg oom, but it turns out it counts both global and
> memcg oom.
>
> The cgroup v2 documents it, but the description is missed for cgroup v1.
>
> Signed-off-by: Yang Shi <[email protected]>
> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 0936412e044e..44d5429636e2 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -851,6 +851,9 @@ At reading, current status of OOM is shown.
> (if 1, oom-killer is disabled)
> - under_oom 0 or 1
> (if 1, the memory cgroup is under OOM, tasks may be stopped.)
> + - oom_kill integer counter
> + The number of processes belonging to this cgroup killed by any
> + kind of OOM killer.

Applied, thanks.

jon

2021-03-03 14:07:08

by Yang Shi

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Mon, Mar 1, 2021 at 4:15 AM Michal Hocko <[email protected]> wrote:
>
> On Fri 26-02-21 08:42:29, Yang Shi wrote:
> > On Thu, Feb 25, 2021 at 11:30 PM Michal Hocko <[email protected]> wrote:
> > >
> > > On Thu 25-02-21 18:12:54, Yang Shi wrote:
> > > > When debugging an oom issue, I found the oom_kill counter of memcg is
> > > > confusing. At the first glance without checking document, I thought it
> > > > just counts for memcg oom, but it turns out it counts both global and
> > > > memcg oom.
> > >
> > > Yes, this is the case indeed. The point of the counter was to count oom
> > > victims from the memcg rather than matching that to the source of the
> > > oom. Remember that this could have been a memcg oom up in the
> > > hierarchy as well. Counting victims on the oom origin could be equally
> >
> > Yes, it is updated hierarchically on v2, but not on v1. I suppose
> > this is because v1 may work in non-hierarchical mode? If this is the
> > only reason, we may be able to remove this difference to align with v2,
> > since non-hierarchical mode is no longer supported.
>
> I believe the reason is that v1 can have tasks in the intermediate
> (non-leaf) memcgs. So you wouldn't have a way to tell whether the oom
> kill has happened in such a memcg or somewhere down the hierarchy.

Aha, I forgot about that, that's bad. Although we don't have tasks in
intermediate nodes in practice, I do understand that it is not forbidden
on v1, as it is in cgroup v2.

> --
> Michal Hocko
> SUSE Labs

2021-03-03 14:32:55

by Yang Shi

Subject: Re: [PATCH] doc: memcontrol: add description for oom_kill

On Mon, Mar 1, 2021 at 4:24 AM Michal Hocko <[email protected]> wrote:
>
> On Fri 26-02-21 11:19:51, Yang Shi wrote:
> > On Fri, Feb 26, 2021 at 8:42 AM Yang Shi <[email protected]> wrote:
> > >
> > > On Thu, Feb 25, 2021 at 11:30 PM Michal Hocko <[email protected]> wrote:
> > > >
> > > > On Thu 25-02-21 18:12:54, Yang Shi wrote:
> > > > > When debugging an oom issue, I found the oom_kill counter of memcg is
> > > > > confusing. At the first glance without checking document, I thought it
> > > > > just counts for memcg oom, but it turns out it counts both global and
> > > > > memcg oom.
> > > >
> > > > Yes, this is the case indeed. The point of the counter was to count oom
> > > > victims from the memcg rather than matching that to the source of the
> > > > oom. Remember that this could have been a memcg oom up in the
> > > > hierarchy as well. Counting victims on the oom origin could be equally
> > >
> > > Yes, it is updated hierarchically on v2, but not on v1. I suppose
> > > this is because v1 may work in non-hierarchical mode? If this is the
> > > only reason, we may be able to remove this difference to align with v2,
> > > since non-hierarchical mode is no longer supported.
> >
> > BTW, having the counter recorded hierarchically may help with one of
> > our use cases. We want to monitor oom_kill for some services, but
> > systemd wipes out the cgroup when the service is OOM-killed and then
> > restarts the service from scratch (that is, it creates a brand-new
> > cgroup with the same name). So this systemd behavior makes the counter
> > useless if it is not recorded hierarchically.
>
> Just to make sure I understand correctly. You have a setup where memcg
> for a service has a hard limit configured and it is destroyed when oom
> happens inside that memcg. A new instance is created at the same place
> of the hierarchy with a new memcg. Your problem is that the oom killed
> memcg will not be recorded in its parent oom event and the information
> will get lost with the torn down memcg. Correct?

Yes, but with global OOM instead of memcg OOM.

>
> If yes, then how do you tell which of the child cgroups was killed from
> the parent counter? Or is there only a single child?

There is more than a single child, but in our case the OOM-killed child
consumes 90% of memory, so the global OOM killer would kill it. This
definitely doesn't prevent OOM kills in other children from being
counted, but we don't need a very accurate counter, and in our case we
can tell that 99% of OOM kills happen in that specific memcg.

>
> Anyway, cgroup v2 will offer the hierarchical behavior. Do you have any
> strong reasons that you cannot use v2?

Personally, I would prefer to migrate to cgroup v2. But it incurs
significant work for orchestration tools, infrastructure
configuration, monitoring tools, etc., which are out of my control.

> --
> Michal Hocko
> SUSE Labs