LinuxLists.cc - OOM killer not nearly agressive enough?

2020-01-07 20:47:33

Subject: OOM killer not nearly agressive enough?

Hi!

I updated my userspace to x86-64, and now chromium likes to eat all
the memory and bring the system to a standstill.

Unfortunately, the OOM killer does not react:

I'm now running "ps aux", and it prints one line every 20 seconds or
more. Do we agree that this is an "unusable" system? I attempted to run
kill from another session.

Do we agree that OOM killer should have reacted way sooner?

Is there something I can tweak to make it behave more reasonably?

Best regards,
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-01-09 14:47:12

by Michal Hocko

Subject: Re: OOM killer not nearly agressive enough?

On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> Hi!
>
> I updated my userspace to x86-64, and now chromium likes to eat all
> the memory and bring the system to standstill.
>
> Unfortunately, OOM killer does not react:
>
> I'm now running "ps aux", and it prints one line every 20 seconds or
> more. Do we agree that is "unusable" system? I attempted to do kill
> from other session.

Does sysrq+f help?
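
For reference, sysrq+f asks the kernel to run the OOM killer once by hand. The same request can be made without a keyboard by writing the character f to /proc/sysrq-trigger as root. A minimal sketch, assuming the kernel was built with magic SysRq support; the function name and the path parameter are illustrative only:

```python
def trigger_manual_oom_kill(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to run the OOM killer once (same request as sysrq+f).

    Needs root when pointed at the real trigger file. The trigger_path
    parameter is hypothetical and exists only so that the function can be
    aimed at a scratch file for a dry run.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("f")
```

On a machine that is already crawling, starting an interpreter may itself take a long time, so the keyboard combination is likely the more practical route.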

> Do we agree that OOM killer should have reacted way sooner?

This is impossible to answer without knowing what was going on at the
time. Was the system thrashing over page cache/swap? In other words, is
the system completely out of memory or refaulting the working set all
the time because it doesn't fit into memory?

> Is there something I can tweak to make it behave more reasonably?

PSI based early OOM killing might help. See https://github.com/facebookincubator/oomd
--
Michal Hocko
SUSE Labs

2020-01-09 21:05:44

by Pavel Machek

Subject: Re: OOM killer not nearly agressive enough?

On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > Hi!
> >
> > I updated my userspace to x86-64, and now chromium likes to eat all
> > the memory and bring the system to standstill.
> >
> > Unfortunately, OOM killer does not react:
> >
> > I'm now running "ps aux", and it prints one line every 20 seconds or
> > more. Do we agree that is "unusable" system? I attempted to do kill
> > from other session.
>
> Does sysrq+f help?

May try that next time.

> > Do we agree that OOM killer should have reacted way sooner?
>
> This is impossible to answer without knowing what was going on at the
> time. Was the system thrashing over page cache/swap? In other words, is
> the system completely out of memory or refaulting the working set all
> the time because it doesn't fit into memory?

Swap was full, so "completely out of memory", I guess. Chromium does
that fairly often :-(.

> > Is there something I can tweak to make it behave more reasonably?
>
> PSI based early OOM killing might help. See https://github.com/facebookincubator/oomd

Um. Before doing that... is there some knob somewhere saying "hey
oomkiller, one hour to recover the machine is a bit too much, can you
please react sooner"? PSI is a completely different system, but I guess
I should attempt to tweak the existing one first...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-01-09 21:06:45

by Pavel Machek

Subject: Re: OOM killer not nearly agressive enough?

On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > Hi!
> >
> > I updated my userspace to x86-64, and now chromium likes to eat all
> > the memory and bring the system to standstill.
> >
> > Unfortunately, OOM killer does not react:
> >
> > I'm now running "ps aux", and it prints one line every 20 seconds or
> > more. Do we agree that is "unusable" system? I attempted to do kill
> > from other session.
>
> Does sysrq+f help?
>
> > Do we agree that OOM killer should have reacted way sooner?
>
> This is impossible to answer without knowing what was going on at the
> time. Was the system thrashing over page cache/swap? In other words, is
> the system completely out of memory or refaulting the working set all
> the time because it doesn't fit into memory?

What statistics are best to collect? Would the memory lines from top
do the trick? I normally have gkrellm running, but I found its results
hard to interpret.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-01-09 21:26:47

by Michal Hocko

Subject: Re: OOM killer not nearly agressive enough?

On Thu 09-01-20 22:03:07, Pavel Machek wrote:
> On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> > On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > > Hi!
> > >
> > > I updated my userspace to x86-64, and now chromium likes to eat all
> > > the memory and bring the system to standstill.
> > >
> > > Unfortunately, OOM killer does not react:
> > >
> > > I'm now running "ps aux", and it prints one line every 20 seconds or
> > > more. Do we agree that is "unusable" system? I attempted to do kill
> > > from other session.
> >
> > Does sysrq+f help?
>
> May try that next time.
>
> > > Do we agree that OOM killer should have reacted way sooner?
> >
> > This is impossible to answer without knowing what was going on at the
> > time. Was the system thrashing over page cache/swap? In other words, is
> > the system completely out of memory or refaulting the working set all
> > the time because it doesn't fit into memory?
>
> Swap was full, so "completely out of memory", I guess. Chromium does
> that fairly often :-(.

The OOM heuristic is based on reclaim failure. If reclaim makes some
progress, the OOM killer is not invoked. Have a look at
should_reclaim_retry() for more details.

> > > Is there something I can tweak to make it behave more reasonably?
> >
> > PSI based early OOM killing might help. See https://github.com/facebookincubator/oomd
>
> Um. Before doing that... is there some knob somewhere saying "hey
> oomkiller, one hour to recover machine is a bit too much, can you
> please react sooner"?

No, there is nothing like that.

> PSI is completely different system, but I guess
> I should attempt to tweak the existing one first...

PSI is measuring the cost of the allocation (among other things) and
that can give you some idea on how much time is spent to get memory.
Userspace can implement a policy based on that and act. The kernel oom
killer is the last resort when there is really no memory to allocate.
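
As a rough illustration of such a userspace policy: on kernels built with PSI (4.20 and later), /proc/pressure/memory holds a "some" line and a "full" line, each with avg10, avg60, avg300 and total fields. The sketch below parses that text and applies a simple threshold. The function names and the 20% threshold are assumptions made for the example, not recommended settings and not how oomd works.

```python
def parse_psi(text):
    """Parse the contents of /proc/pressure/memory into nested dicts.

    Returns e.g. {"some": {"avg10": 1.5, ...}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]  # kind is "some" or "full"
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def memory_pressure_exceeded(text, threshold_pct=20.0):
    """True if the 10-second 'full' stall average is above threshold_pct.

    threshold_pct is an arbitrary illustrative value.
    """
    return parse_psi(text)["full"]["avg10"] > threshold_pct
```

A monitoring loop would read the file every few seconds, pass its contents to memory_pressure_exceeded(), and decide what to kill when it returns True; choosing the victim is the hard part and is left out here.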
--
Michal Hocko
SUSE Labs

2020-01-09 21:30:32

by Michal Hocko

Subject: Re: OOM killer not nearly agressive enough?

On Thu 09-01-20 22:05:36, Pavel Machek wrote:
> On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> > On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > > Hi!
> > >
> > > I updated my userspace to x86-64, and now chromium likes to eat all
> > > the memory and bring the system to standstill.
> > >
> > > Unfortunately, OOM killer does not react:
> > >
> > > I'm now running "ps aux", and it prints one line every 20 seconds or
> > > more. Do we agree that is "unusable" system? I attempted to do kill
> > > from other session.
> >
> > Does sysrq+f help?
> >
> > > Do we agree that OOM killer should have reacted way sooner?
> >
> > This is impossible to answer without knowing what was going on at the
> > time. Was the system thrashing over page cache/swap? In other words, is
> > the system completely out of memory or refaulting the working set all
> > the time because it doesn't fit into memory?
>
> What statistics are best to collect? Would the memory lines from top
> do the trick? I normally have gkrellm running, but I found its results
> hard to interpret.

/proc/vmstat (collected periodically) gives the most comprehensive
picture of the state of MM. Interpreting the numbers is far from
trivial, though. It usually requires analyzing multiple snapshots to
see how the situation evolves.
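
A minimal sketch of comparing two such snapshots, assuming only that /proc/vmstat consists of "name value" lines. The helper names are made up for the example, and which counters matter is left to the reader:

```python
def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def vmstat_delta(before, after):
    """Return the entries that changed between two snapshots (after - before)."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

In practice one would read /proc/vmstat every few seconds, keep the timestamped snapshots, and look at how entries such as pswpin/pswpout or the pgscan_*/pgsteal_* counters move over time. Exact counter names vary between kernel versions.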

--
Michal Hocko
SUSE Labs

2020-01-09 21:53:58

by Vito Caputo

Subject: Re: OOM killer not nearly agressive enough?

On Thu, Jan 09, 2020 at 10:03:07PM +0100, Pavel Machek wrote:
> On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> > On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > > Hi!
> > >
> > > I updated my userspace to x86-64, and now chromium likes to eat all
> > > the memory and bring the system to standstill.
> > >
> > > Unfortunately, OOM killer does not react:
> > >
> > > I'm now running "ps aux", and it prints one line every 20 seconds or
> > > more. Do we agree that is "unusable" system? I attempted to do kill
> > > from other session.
> >
> > Does sysrq+f help?
>
> May try that next time.
>
> > > Do we agree that OOM killer should have reacted way sooner?
> >
> > This is impossible to answer without knowing what was going on at the
> > time. Was the system thrashing over page cache/swap? In other words, is
> > the system completely out of memory or refaulting the working set all
> > the time because it doesn't fit into memory?
>
> Swap was full, so "completely out of memory", I guess. Chromium does
> that fairly often :-(.
>

Have you considered restricting its memory limits a la `ulimit -m`?

I've taken to running browsers in nspawn containers for general
isolation improvements, but this also makes it easy to set cgroup
resource limits like memcg. i.e. --property MemoryMax=2G

This prevents the browser from bogging down the entire system, but it
doesn't prevent thrashing before FF OOMs within its control group.

I do feel there's a problem with the kernel's reclaim algorithm; it
seems far too willing to evict file-backed pages that were recently in
use. But at least with memcg this behavior is isolated to the cgroup,
though it still generates a crapload of disk reads from all the
thrashing.

Regards,
Vito Caputo

2020-01-09 22:00:13

by Michal Hocko

Subject: Re: OOM killer not nearly agressive enough?

On Thu 09-01-20 13:46:04, Vito Caputo wrote:
> On Thu, Jan 09, 2020 at 10:03:07PM +0100, Pavel Machek wrote:
> > On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> > > On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > > > Hi!
> > > >
> > > > I updated my userspace to x86-64, and now chromium likes to eat all
> > > > the memory and bring the system to standstill.
> > > >
> > > > Unfortunately, OOM killer does not react:
> > > >
> > > > I'm now running "ps aux", and it prints one line every 20 seconds or
> > > > more. Do we agree that is "unusable" system? I attempted to do kill
> > > > from other session.
> > >
> > > Does sysrq+f help?
> >
> > May try that next time.
> >
> > > > Do we agree that OOM killer should have reacted way sooner?
> > >
> > > This is impossible to answer without knowing what was going on at the
> > > time. Was the system thrashing over page cache/swap? In other words, is
> > > the system completely out of memory or refaulting the working set all
> > > the time because it doesn't fit into memory?
> >
> > Swap was full, so "completely out of memory", I guess. Chromium does
> > that fairly often :-(.
> >
>
> Have you considered restricting its memory limits a la `ulimit -m`?

The kernel ignores RLIMIT_RSS. Unless the browser itself takes it into
consideration, I do not see how that would help.

> I've taken to running browsers in nspawn containers for general
> isolation improvements, but this also makes it easy to set cgroup
> resource limits like memcg. i.e. --property MemoryMax=2G

Yes, this should help to isolate the problem.

> This prevents the browser from bogging down the entire system, but it
> doesn't prevent thrashing before FF OOMs within its control group.
>
> I do feel there's a problem with the kernel's reclaim algorithm, it
> seems far too willing to evict file-backed pages that are recently in
> use.

It is true that memory reclaim is quite biased towards page cache
reclaim unless there is only a very small amount of page cache. Page
cache refaults are considered during reclaim, but I am afraid that there
are still corner cases where the workload might end up thrashing, be it
on the page cache or on anonymous memory, depending on the workload.
Anyway, getting data from real workloads is always good so that we can
think about improving the existing heuristics.

--
Michal Hocko
SUSE Labs

2020-01-09 22:51:27

by Pavel Machek

Subject: Re: OOM killer not nearly agressive enough?

Hi!

> > > > Do we agree that OOM killer should have reacted way sooner?
> > >
> > > This is impossible to answer without knowing what was going on at the
> > > time. Was the system thrashing over page cache/swap? In other words, is
> > > the system completely out of memory or refaulting the working set all
> > > the time because it doesn't fit into memory?
> >
> > Swap was full, so "completely out of memory", I guess. Chromium does
> > that fairly often :-(.
>
> The oom heuristic is based on the reclaim failure. If the reclaim makes
> some progress then the oom killer is not hit. Have a look at
> should_reclaim_retry for more details.

Thanks for the pointer.

I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd
recommend? :-).

> > PSI is completely different system, but I guess
> > I should attempt to tweak the existing one first...
>
> PSI is measuring the cost of the allocation (among other things) and
> that can give you some idea on how much time is spent to get memory.
> Userspace can implement a policy based on that and act. The kernel oom
> killer is the last resort when there is really no memory to
> allocate.

So what I'm seeing is a system that is unresponsive, easily for an hour.

Sometimes, I'm able to log in. When I could do that, the system was
absurdly slow, like ps printing at more than 10 seconds per line.
ps on my system normally takes 300 msec; my estimate for the slow case
would be 2000 seconds, which is a slowdown by a factor of more than
6000x. That would be an X terminal opening in like two hours... that's
not really usable.

DRAM is in the 100 nsec range, disk is in the 10 msec range; so the
worst-case slowdown is somewhere in the 100000x range. (Actually, in the
worst case userland will make no progress at all, since you can need 4+
pages in a single CPU instruction, right?)
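
Spelling out the arithmetic behind those two estimates, using only the rough figures quoted above (nothing here is measured beyond them; the variable names are just for illustration):

```python
# Figures from the message: ps normally takes about 0.3 s, and the
# estimate for the slow case is about 2000 s.
normal_ps_seconds = 0.3
slow_ps_seconds = 2000
observed_slowdown = slow_ps_seconds / normal_ps_seconds

# Order-of-magnitude latencies in nanoseconds: DRAM ~100 ns, disk ~10 ms.
dram_latency_ns = 100
disk_latency_ns = 10_000_000
worst_case_slowdown = disk_latency_ns // dram_latency_ns

print(f"observed slowdown:   ~{observed_slowdown:.0f}x")
print(f"worst-case slowdown: ~{worst_case_slowdown}x")
```

The observed figure sits more than an order of magnitude below the DRAM-versus-disk ratio, which fits a machine that is still making some progress while spending most of its time waiting on the disk.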

But the kernel is happy; the system is unusable and will stay unusable
for an hour or more, and there's not much the user can do. (Besides
sysrq, thanks for the hint).

Can we do better? This is the equivalent of a system crash, and it is
_way_ too easy to trigger. Should we do better by default?

Dunno. If the user moved the mouse, and the cursor did not move for 10
seconds, perhaps it is time for an OOM kill?

Or should I add more swap? Is it terrible to place swap on SSD?

Best regards,

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-01-10 01:25:32

by Shakeel Butt

Subject: Re: OOM killer not nearly agressive enough?

On Thu, Jan 9, 2020 at 2:49 PM Pavel Machek wrote:
>
> Hi!
>
> > > > > Do we agree that OOM killer should have reacted way sooner?
> > > >
> > > > This is impossible to answer without knowing what was going on at the
> > > > time. Was the system thrashing over page cache/swap? In other words, is
> > > > the system completely out of memory or refaulting the working set all
> > > > the time because it doesn't fit into memory?
> > >
> > > Swap was full, so "completely out of memory", I guess. Chromium does
> > > that fairly often :-(.
> >
> > The oom heuristic is based on the reclaim failure. If the reclaim makes
> > some progress then the oom killer is not hit. Have a look at
> > should_reclaim_retry for more details.
>
> Thanks for pointer.
>
> I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd
> recommend? :-).
>
> > > PSI is completely different system, but I guess
> > > I should attempt to tweak the existing one first...
> >
> > PSI is measuring the cost of the allocation (among other things) and
> > that can give you some idea on how much time is spent to get memory.
> > Userspace can implement a policy based on that and act. The kernel oom
> > killer is the last resort when there is really no memory to
> > allocate.
>
> So what I'm seeing is system that is unresponsive, easily for an hour.
>
> Sometimes, I'm able to log in. When I could do that, system was
> absurdly slow, like ps printing at more than 10 seconds per line.
> ps on my system takes 300msec, estimate in the slow case would be 2000
> seconds, that is slowdown by factor of 6000x. That would be X terminal
> opening in like two hours... that's not really usable.
>
> DRAM is in 100nsec range, disk is in 10msec range; so worst case
> slowdown is somewhere in 100000x range. (Actually, in the worst case
> userland will do no progress at all, since you can need at 4+ pages in
> single CPU instruction, right?)
>
> But kernel is happy; system is unusable and will stay unusable for
> hour or more, and there's not much user can do. (Besides sysrq, thanks
> for the hint).
>
> Can we do better? This is equivalent of system crash, and it is _way_
> too easy to trigger. Should we do better by default?
>
> Dunno. If user moved the mouse, and cursor did not move for 10
> seconds, perhaps it is time for oom kill?
>
> Or should I add more swap? Is it terrible to place swap on SSD?
>

What's the kernel version? How much memory is anon and file pages?
What's your swap to DRAM ratio? Are you using in-memory compression
based swap? Have you tried to disable swap completely?
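
Most of these numbers can be read from /proc/meminfo (the kernel version comes from uname -r). A minimal sketch, assuming the usual "Name: value kB" layout and the field names AnonPages, Active(file), Inactive(file), SwapTotal and MemTotal; the helper names are invented for the example:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of ints (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def memory_summary(text):
    """Summarize anon pages, file pages and the swap-to-DRAM ratio."""
    info = parse_meminfo(text)
    return {
        "anon_kb": info["AnonPages"],
        "file_kb": info["Active(file)"] + info["Inactive(file)"],
        "swap_to_dram": info["SwapTotal"] / info["MemTotal"],
    }
```

Calling memory_summary() on the contents of /proc/meminfo while the machine is still responsive, and again while it is struggling, would answer the anon/file and swap-ratio questions in one go.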

Shakeel

2020-01-10 06:33:02

by Michal Hocko

Subject: Re: OOM killer not nearly agressive enough?

On Thu 09-01-20 23:48:45, Pavel Machek wrote:
> Hi!
>
> > > > > Do we agree that OOM killer should have reacted way sooner?
> > > >
> > > > This is impossible to answer without knowing what was going on at the
> > > > time. Was the system thrashing over page cache/swap? In other words, is
> > > > the system completely out of memory or refaulting the working set all
> > > > the time because it doesn't fit into memory?
> > >
> > > Swap was full, so "completely out of memory", I guess. Chromium does
> > > that fairly often :-(.
> >
> > The oom heuristic is based on the reclaim failure. If the reclaim makes
> > some progress then the oom killer is not hit. Have a look at
> > should_reclaim_retry for more details.
>
> Thanks for pointer.
>
> I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd
> recommend? :-).

You can certainly play with that. I am not overly optimistic that it
would help, though, because a symptom of a thrashing system is that we
do not even reach this point. Pages are simply recycled, but they evict
other parts of the hot working set. But I am only guessing what the
problem is in your case. Anyway, a low MAX_RECLAIM_RETRIES would tend
to be more timing sensitive in general. If reclaim progress cannot be
made because of IO latencies or other resource depletion, then OOM would
be declared too early. The current MAX_RECLAIM_RETRIES is not something
we have tuned in any sense. I remember that changing it didn't make much
difference unless the number was really high, which would be a signal
that the reclaim is not throttled very well.
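
To make the "we do not even reach this point" remark concrete, here is a toy model of the retry counter only. It is not the kernel's should_reclaim_retry(): watermark checks, costly allocation orders and everything else are left out, and the function name is invented. The one behaviour it models is that any reclaim progress resets the no-progress counter, and OOM is declared only once the counter exceeds MAX_RECLAIM_RETRIES (assumed to be 16, as in mainline kernels of this period).

```python
MAX_RECLAIM_RETRIES = 16  # assumed value, see the note above


def rounds_until_oom(progress_per_round, max_retries=MAX_RECLAIM_RETRIES):
    """Toy model: return the 1-based round at which OOM would be declared.

    progress_per_round is an iterable of booleans, one per reclaim attempt,
    True when that attempt reclaimed at least something. Returns None if
    the sequence ends before the retry limit is exceeded.
    """
    no_progress_loops = 0
    for round_number, made_progress in enumerate(progress_per_round, start=1):
        if made_progress:
            no_progress_loops = 0  # any progress resets the counter
        else:
            no_progress_loops += 1
        if no_progress_loops > max_retries:
            return round_number
    return None
```

In this model a thrashing workload looks like a long run of True values: rounds_until_oom([True] * 100000) returns None whatever the limit is, even with max_retries=1, while rounds_until_oom([False] * 20) returns 17.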

> > > PSI is completely different system, but I guess
> > > I should attempt to tweak the existing one first...
> >
> > PSI is measuring the cost of the allocation (among other things) and
> > that can give you some idea on how much time is spent to get memory.
> > Userspace can implement a policy based on that and act. The kernel oom
> > killer is the last resort when there is really no memory to
> > allocate.
>
> So what I'm seeing is system that is unresponsive, easily for an hour.
>
> Sometimes, I'm able to log in. When I could do that, system was
> absurdly slow, like ps printing at more than 10 seconds per line.
> ps on my system takes 300msec, estimate in the slow case would be 2000
> seconds, that is slowdown by factor of 6000x. That would be X terminal
> opening in like two hours... that's not really usable.

It would be great to find out what the bottleneck is. Is the allocator
stuck in memory reclaim? Waiting on some lock? Reclaiming pages
which are then stolen by other contending processes?

--
Michal Hocko
SUSE Labs