2020-05-08 14:23:20

by Konstantin Khlebnikov

Subject: [PATCH] doc: cgroup: update note about conditions when oom killer is invoked

Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
back to the charge path") cgroup oom killer is no longer invoked only from
page faults. Now it implements the same semantics as global OOM killer:
allocation context invokes OOM killer and keeps retrying until success.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bcc80269bb6a..1bb9a8f6ebe1 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
Under certain circumstances, the usage may go over the limit
temporarily.

+ In default configuration regular 0-order allocation always
+ succeed unless OOM killer choose current task as a victim.
+
+ Some kinds of allocations don't invoke the OOM killer.
+ Caller could retry them differently, return into userspace
+ as -ENOMEM or silently ignore in cases like disk readahead.
+
This is the ultimate protection mechanism. As long as the
high limit is used and monitored properly, this limit's
utility is limited to providing the final safety net.
@@ -1228,17 +1235,9 @@ PAGE_SIZE multiple when read back.
The number of time the cgroup's memory usage was
reached the limit and allocation was about to fail.

- Depending on context result could be invocation of OOM
- killer and retrying allocation or failing allocation.
-
- Failed allocation in its turn could be returned into
- userspace as -ENOMEM or silently ignored in cases like
- disk readahead. For now OOM in memory cgroup kills
- tasks iff shortage has happened inside page fault.
-
This event is not raised if the OOM killer is not
considered as an option, e.g. for failed high-order
- allocations.
+ allocations or if caller asked to not retry attempts.

oom_kill
The number of processes belonging to this cgroup


2020-05-08 16:05:28

by Randy Dunlap

Subject: Re: [PATCH] doc: cgroup: update note about conditions when oom killer is invoked

Hi,

On 5/8/20 7:16 AM, Konstantin Khlebnikov wrote:
> Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
> back to the charge path") cgroup oom killer is no longer invoked only from
> page faults. Now it implements the same semantics as global OOM killer:
> allocation context invokes OOM killer and keeps retrying until success.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index bcc80269bb6a..1bb9a8f6ebe1 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
> Under certain circumstances, the usage may go over the limit
> temporarily.
>
> + In default configuration regular 0-order allocation always

allocations

> + succeed unless OOM killer choose current task as a victim.

chooses

> +
> + Some kinds of allocations don't invoke the OOM killer.
> + Caller could retry them differently, return into userspace
> + as -ENOMEM or silently ignore in cases like disk readahead.
> +
> This is the ultimate protection mechanism. As long as the
> high limit is used and monitored properly, this limit's
> utility is limited to providing the final safety net.
> @@ -1228,17 +1235,9 @@ PAGE_SIZE multiple when read back.
> The number of time the cgroup's memory usage was
> reached the limit and allocation was about to fail.
>
> - Depending on context result could be invocation of OOM
> - killer and retrying allocation or failing allocation.
> -
> - Failed allocation in its turn could be returned into
> - userspace as -ENOMEM or silently ignored in cases like
> - disk readahead. For now OOM in memory cgroup kills
> - tasks iff shortage has happened inside page fault.
> -
> This event is not raised if the OOM killer is not
> considered as an option, e.g. for failed high-order
> - allocations.
> + allocations or if caller asked to not retry attempts.
>
> oom_kill
> The number of processes belonging to this cgroup
>


Thanks for updating the docs.
--
~Randy

2020-05-11 08:42:53

by Michal Hocko

Subject: Re: [PATCH] doc: cgroup: update note about conditions when oom killer is invoked

On Fri 08-05-20 17:16:29, Konstantin Khlebnikov wrote:
> Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
> back to the charge path") cgroup oom killer is no longer invoked only from
> page faults. Now it implements the same semantics as global OOM killer:
> allocation context invokes OOM killer and keeps retrying until success.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> Documentation/admin-guide/cgroup-v2.rst | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index bcc80269bb6a..1bb9a8f6ebe1 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
> Under certain circumstances, the usage may go over the limit
> temporarily.
>
> + In default configuration regular 0-order allocation always
> + succeed unless OOM killer choose current task as a victim.
> +
> + Some kinds of allocations don't invoke the OOM killer.
> + Caller could retry them differently, return into userspace
> + as -ENOMEM or silently ignore in cases like disk readahead.

I would probably add -EFAULT, but the fewer error codes we document, the
better.

> +
> This is the ultimate protection mechanism. As long as the
> high limit is used and monitored properly, this limit's
> utility is limited to providing the final safety net.
> @@ -1228,17 +1235,9 @@ PAGE_SIZE multiple when read back.
> The number of time the cgroup's memory usage was
> reached the limit and allocation was about to fail.
>
> - Depending on context result could be invocation of OOM
> - killer and retrying allocation or failing allocation.
> -
> - Failed allocation in its turn could be returned into
> - userspace as -ENOMEM or silently ignored in cases like
> - disk readahead. For now OOM in memory cgroup kills
> - tasks iff shortage has happened inside page fault.
> -
> This event is not raised if the OOM killer is not
> considered as an option, e.g. for failed high-order
> - allocations.
> + allocations or if caller asked to not retry attempts.
>
> oom_kill
> The number of processes belonging to this cgroup

--
Michal Hocko
SUSE Labs

2020-05-11 09:38:23

by Konstantin Khlebnikov

Subject: Re: [PATCH] doc: cgroup: update note about conditions when oom killer is invoked



On 11/05/2020 11.39, Michal Hocko wrote:
> On Fri 08-05-20 17:16:29, Konstantin Khlebnikov wrote:
>> Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
>> back to the charge path") cgroup oom killer is no longer invoked only from
>> page faults. Now it implements the same semantics as global OOM killer:
>> allocation context invokes OOM killer and keeps retrying until success.
>>
>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>
>
>> ---
>> Documentation/admin-guide/cgroup-v2.rst | 17 ++++++++---------
>> 1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index bcc80269bb6a..1bb9a8f6ebe1 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
>> Under certain circumstances, the usage may go over the limit
>> temporarily.
>>
>> + In default configuration regular 0-order allocation always
>> + succeed unless OOM killer choose current task as a victim.
>> +
>> + Some kinds of allocations don't invoke the OOM killer.
>> + Caller could retry them differently, return into userspace
>> + as -ENOMEM or silently ignore in cases like disk readahead.
>
> I would probably add -EFAULT, but the fewer error codes we document, the
> better.

Yeah, EFAULT was the most obscure result of a memory shortage.
Fortunately, with the new behaviour this shouldn't happen often.

Actually, where is it still possible? THP always falls back to 0-order.
I mean, EFAULT could appear inside the kernel only if the task is killed,
so nobody would see it.

>
>> +
>> This is the ultimate protection mechanism. As long as the
>> high limit is used and monitored properly, this limit's
>> utility is limited to providing the final safety net.
>> @@ -1228,17 +1235,9 @@ PAGE_SIZE multiple when read back.
>> The number of time the cgroup's memory usage was
>> reached the limit and allocation was about to fail.
>>
>> - Depending on context result could be invocation of OOM
>> - killer and retrying allocation or failing allocation.
>> -
>> - Failed allocation in its turn could be returned into
>> - userspace as -ENOMEM or silently ignored in cases like
>> - disk readahead. For now OOM in memory cgroup kills
>> - tasks iff shortage has happened inside page fault.
>> -
>> This event is not raised if the OOM killer is not
>> considered as an option, e.g. for failed high-order
>> - allocations.
>> + allocations or if caller asked to not retry attempts.
>>
>> oom_kill
>> The number of processes belonging to this cgroup
>

2020-05-11 10:15:42

by Michal Hocko

Subject: Re: [PATCH] doc: cgroup: update note about conditions when oom killer is invoked

On Mon 11-05-20 12:34:00, Konstantin Khlebnikov wrote:
>
>
> On 11/05/2020 11.39, Michal Hocko wrote:
> > On Fri 08-05-20 17:16:29, Konstantin Khlebnikov wrote:
> > > Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
> > > back to the charge path") cgroup oom killer is no longer invoked only from
> > > page faults. Now it implements the same semantics as global OOM killer:
> > > allocation context invokes OOM killer and keeps retrying until success.
> > >
> > > Signed-off-by: Konstantin Khlebnikov <[email protected]>
> >
> > Acked-by: Michal Hocko <[email protected]>
> >
> > > ---
> > > Documentation/admin-guide/cgroup-v2.rst | 17 ++++++++---------
> > > 1 file changed, 8 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index bcc80269bb6a..1bb9a8f6ebe1 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
> > > Under certain circumstances, the usage may go over the limit
> > > temporarily.
> > > + In default configuration regular 0-order allocation always
> > > + succeed unless OOM killer choose current task as a victim.
> > > +
> > > + Some kinds of allocations don't invoke the OOM killer.
> > > + Caller could retry them differently, return into userspace
> > > + as -ENOMEM or silently ignore in cases like disk readahead.
> >
> > > I would probably add -EFAULT, but the fewer error codes we document, the
> > > better.
>
> Yeah, EFAULT was the most obscure result of a memory shortage.
> Fortunately, with the new behaviour this shouldn't happen often.

Yes, it shouldn't really happen very often. GUP (get_user_pages) was the
most prominent example, but that one should now be taken care of by
triggering the OOM killer. Still, I wouldn't bet my hat that there are no
potential cases left.

> Actually, where is it still possible? THP always falls back to 0-order.
> I mean, EFAULT could appear inside the kernel only if the task is killed,
> so nobody would see it.

Yes, the fatal_signal_pending paths are OK. And no, I do not have any
specific examples. But as you've said, EFAULT was a real surprise, so I
thought it would be nice to keep a reference to it around, even if it is
unlikely.

--
Michal Hocko
SUSE Labs
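
For readers who want to observe the counters this thread documents, the following is a minimal Python sketch that parses the flat-keyed contents of a cgroup v2 memory.events file and picks out the "oom" and "oom_kill" entries. The helper names, the sample text, and the cgroup directory mentioned afterwards are illustrative assumptions and are not part of the patch.

```python
from pathlib import Path


def parse_memory_events(text):
    """Parse 'name value' lines (flat-keyed format) into a dict of ints.

    Lines that do not look like 'name <non-negative integer>' are skipped.
    """
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        events[parts[0]] = int(parts[1])
    return events


def read_memory_events(cgroup_dir):
    """Read and parse memory.events from a cgroup directory (path is an assumption)."""
    return parse_memory_events((Path(cgroup_dir) / "memory.events").read_text())


# Hypothetical sample contents, made up for illustration only.
sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
events = parse_memory_events(sample)

# "oom" counts how often the limit was hit with an allocation about to fail;
# "oom_kill" counts processes of this cgroup killed by any kind of OOM killer.
print("oom:", events.get("oom", 0), "oom_kill:", events.get("oom_kill", 0))
```

On a live system, read_memory_events() would be pointed at a cgroup directory under the cgroup2 mount, for example a hypothetical /sys/fs/cgroup/mygroup. If "oom" grows while "oom_kill" stays flat, that would suggest allocations failing without a kill, which is the case the second paragraph added to the memory.max description covers.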