2006-10-03 22:07:52

by Manish Neema

[permalink] [raw]
Subject: System hang problem.

Sorry, I've lost my patience with RedHat, so I'm posting here....

We see this problem frequently on RHEL3.0 U5 and U7. The system
completely hangs upon memory shortage. The only option left is a
power-cycle (or 'sysrq + b'). The hang occurs with any of the following
three overcommit settings:

- default (heuristic) overcommit (overcommit_memory=0)
- no overcommit handling by kernel (overcommit_memory=1)
- restrictive overcommit with ratio=100% (overcommit_memory=2;
overcommit_ratio=100)
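
For reference, these correspond to the usual /proc knobs; a sketch,
assuming the standard paths:

echo 0 > /proc/sys/vm/overcommit_memory    # heuristic (default)
echo 1 > /proc/sys/vm/overcommit_memory    # never refuse an allocation
echo 2 > /proc/sys/vm/overcommit_memory    # strict accounting
echo 100 > /proc/sys/vm/overcommit_ratio   # % of RAM counted in mode 2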

RHEL3.0 U3 would generate an OOM kill "each and every time" it sensed a
system hang, but due to other bugs we had to move away from it. RedHat
calls the timely (at least for us) OOM invocation in U3 a buggy
implementation and the delayed OOM kill in U5 and U7 the right
implementation (which we rarely get to see, resulting in at least 5
systems hanging daily!)

Changing overcommit to 2 (and the ratio to anywhere from 1 to 99) would
result in certain OS processes (the automount daemon, for example)
getting killed when all the allowed memory is committed. What is the
point in reserving some memory if a random root process gets killed,
leaving the system in a totally unknown state?

Any suggestions on how we can prevent the system hang and keep
automount (and any other root process) from dying?

TIA,
-Manish Neema

P.S. Sorry, we cannot move away from RHEL3.0 U7 for a while.


2006-10-03 23:35:28

by Keith Mannthey

[permalink] [raw]
Subject: Re: System hang problem.

On 10/3/06, Manish Neema <[email protected]> wrote:
> Sorry, I've lost my patience with RedHat, so I'm posting here....
>
> We see this problem frequently on RHEL3.0 U5 and U7. The system
> completely hangs upon memory shortage. The only option left is a
> power-cycle (or 'sysrq + b'). The hang occurs with any of the following
> three overcommit settings:

In general, RHEL3 isn't too good at recovering from a "memory shortage";
once the OOM killer comes out, I would consider the system trashed. You
need to manage things better so you don't overcommit the system's
resources.

> - default (heuristic) overcommit (overcommit_memory=0)
> - no overcommit handling by kernel (overcommit_memory=1)
> - restrictive overcommit with ratio=100% (overcommit_memory=2;
> overcommit_ratio=100)
>
> RHEL3.0 U3 would generate an OOM kill "each and every time" it sensed a
> system hang, but due to other bugs we had to move away from it. RedHat
> calls the timely (at least for us) OOM invocation in U3 a buggy
> implementation and the delayed OOM kill in U5 and U7 the right
> implementation (which we rarely get to see, resulting in at least 5
> systems hanging daily!)

If you are using too much memory, you are using too much memory. RHEL3
may not be recovering well from this (by your standards); perhaps no
kernel would, for that matter.

> Changing overcommit to 2 (and the ratio to anywhere from 1 to 99) would
> result in certain OS processes (the automount daemon, for example)
> getting killed when all the allowed memory is committed. What is the
> point in reserving some memory if a random root process gets killed,
> leaving the system in a totally unknown state?

Choosing whom to kill is a hard decision. Stay away from the kernel
OOM killer; it is a wily beast.

> Any suggestions on how we can prevent the system hang and keep
> automount (and any other root process) from dying?

There are a whole bunch of /proc knobs (more than just the overcommit
ones) in RHEL3 that you may dive into. You would need to get your RHEL
support to help you out with that. It is really hard to say what the
deal is, but if your application is using too much memory, OOM is
inevitable.

As far as having an OOM killer that meets your standards... you will
have to talk to Redhat about that.

At a high level, I would say add more memory to the box (or reduce the
amount used by the applications) if you can't find the right /proc bit
to make it work. Sounds like something has to give.

good luck,
Keith

2006-10-04 00:07:45

by Alan

[permalink] [raw]
Subject: Re: System hang problem.

On Tue, 2006-10-03 at 15:07 -0700, Manish Neema wrote:
> RHEL3.0 U3 would generate an OOM kill "each and every time" it sensed a
> system hang, but due to other bugs we had to move away from it.

And often when it didn't need to, which for many users' workloads is bad.

> Changing overcommit to 2 (and the ratio to anywhere from 1 to 99) would
> result in certain OS processes (the automount daemon, for example)
> getting killed when all the allowed memory is committed. What is the

Killed and logging an OOM message? That indicates a bug (well, for a
ratio <= about 50% anyway). Being killed because there is no memory and
a memory allocation fails is expected.
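
As a rough sketch of the mode 2 arithmetic (assuming RHEL3 accounts the
way mainline does, which is worth verifying):

    CommitLimit = swap + RAM * overcommit_ratio / 100

    e.g. 2 GB RAM + 2 GB swap, ratio=50:  2 GB + 1 GB = 3 GB committable
         2 GB RAM + 2 GB swap, ratio=100: 2 GB + 2 GB = 4 GB committable

At ratio=100, userspace may commit everything, leaving the kernel no
headroom of its own.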

> point in reserving some memory if a random root process gets killed,
> leaving the system in a totally unknown state?

If you run out of memory and someone asks for more, something has to
give. A properly configured system really shouldn't be running out of
memory anyway for most sane workloads.

It's like putting water in a bottle: at the point you have more water
than bottle, something has to spill; if it doesn't, the box hangs.

Alan

2006-10-04 00:07:28

by Manish Neema

[permalink] [raw]
Subject: RE: System hang problem.

Thanks Keith for the response.

My earlier explanation was not clear. The "automount" process dying
under restrictive overcommit settings is not because of an OOM kill. It
looks like some bug in the "automount" binary itself causes it to exit
when it cannot service a new request.

"cd /remote/something" when the system is out of (allocate'able) memory
causes the below events (obtained from /var/log/messages)

Oct 3 13:35:32 gentoo036 automount[2060]: handle_packet_missing: fork:
Cannot allocate memory
Oct 3 13:35:34 gentoo036 automount[2060]: can't unmount /remote

And then the automount process for the /remote mount disappears, which
should not happen.
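
For illustration only (this is not automount's actual code): a daemon
that wanted to survive the failure shown in the log above could treat
fork() failing with ENOMEM as transient and fail just that one request,
along these lines:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: retry fork() when the kernel is out of memory,
 * failing only the current request instead of the whole daemon. */
pid_t fork_with_retry(int attempts)
{
        while (attempts-- > 0) {
                pid_t pid = fork();

                if (pid >= 0)
                        return pid;     /* 0 in child, child pid in parent */
                if (errno != ENOMEM && errno != EAGAIN)
                        break;          /* real error, don't retry */
                sleep(1);               /* back off, hope memory frees up */
        }
        perror("fork");
        return -1;                      /* caller drops this request only */
}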

Thanks anyway, I'll try to take it up with RedHat again...

-Manish


2006-10-04 00:23:43

by Alan

[permalink] [raw]
Subject: RE: System hang problem.

On Tue, 2006-10-03 at 17:07 -0700, Manish Neema wrote:
> Thanks Keith for the response.
>
> My earlier explanation was not clear. The "automount" process dying
> under restrictive overcommit settings is not because of an OOM kill. It
> looks like some bug in the "automount" binary itself causes it to exit
> when it cannot service a new request.
>
> "cd /remote/something" when the system is out of (allocate'able) memory
> causes the below events (obtained from /var/log/messages)
>
> Oct 3 13:35:32 gentoo036 automount[2060]: handle_packet_missing: fork:
> Cannot allocate memory
> Oct 3 13:35:34 gentoo036 automount[2060]: can't unmount /remote

Your kernel is behaving correctly if it does this in mode 2. You need
more memory, or to set better resource limits on what is running.

Alan

2006-10-04 04:01:29

by Willy Tarreau

[permalink] [raw]
Subject: Re: System hang problem.

On Tue, Oct 03, 2006 at 05:07:25PM -0700, Manish Neema wrote:
> Thanks Keith for the response.
>
> My earlier explanation was not clear. The "automount" process dying
> under restrictive overcommit settings is not because of an OOM kill. It
> looks like some bug in the "automount" binary itself causes it to exit
> when it cannot service a new request.

Well, what would you expect there?

You configured your system to avoid killing processes and to return
NULL from malloc() when there's no memory left. That's the basic
semantics of any malloc() call on any system. Automount might not be
able to recover from such a condition (which, for a daemon, generally
indicates poor design anyway).

What you can often do, if you have one application using a lot of
memory, is to limit *this application's* memory usage with ulimit. If
the application correctly handles malloc() == NULL, then at least your
system will behave stably.
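
As a minimal sketch of what "correctly handles malloc() == NULL" means
in C (the helper below is hypothetical, not from automount or any real
daemon):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy an incoming request, and degrade
 * gracefully when malloc() returns NULL instead of exiting. */
char *dup_request(const char *req)
{
        char *copy = malloc(strlen(req) + 1);

        if (copy == NULL) {
                /* Out of memory: refuse this one request and keep
                 * the daemon alive rather than dying. */
                fprintf(stderr, "request dropped: out of memory\n");
                return NULL;
        }
        strcpy(copy, req);
        return copy;
}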

To better understand how the overcommit works, imagine that you have
several people who need to put eggs in the same box. By default, they
are blind and simply put their eggs there. But they don't see if the
box is overloaded or not. So once the box is full and they push new
eggs, they start to break other ones (sometimes theirs), but breaking
eggs also leaves them some room to add new ones.

Now, when you set the overcommit ratio, you basically define the fill
limit for the box, and the people stop filling the box with their eggs
when the limit is reached. Moreover, respecting the limit leaves room
to add other components to the box (e.g. packing bubbles to protect
the eggs). On the system, this room might be used by tmpfs, socket
buffers, etc...

But what will the people do with their new eggs when the box is full?
Those who know how to refuse new eggs will simply say "no thanks" and
stop pushing new ones, while others, for lack of prior instructions,
will simply drop them on the floor. Your automount, it seems, has not
been instructed how to proceed with new eggs.

Now, if you set a limit on each process, it means you tell each worker
not to accept more than XXX eggs. If you correctly set the limit for
each worker, you can ensure that the box will never be full. It won't
stop dumb workers from dropping new eggs on the floor, but at least no
worker will make the other ones stop their job. So if you can instruct
the smart workers to stop accepting eggs at some limit, knowing that
the dumb ones will not receive too many of them, then you might avoid
sweeping the floor at the end of the day.

If you cannot instruct any of your workers not to drop anything on the
floor, then the only solution is a larger box that can hold all the
eggs, which in your situation means more memory.

I hope it's clear now.

Regards,
Willy

2006-10-04 05:44:07

by Manish Neema

[permalink] [raw]
Subject: RE: System hang problem.

> What you can often do, if you have one application using a lot of
> memory, is to limit *this application's* memory usage with ulimit. If
> the application correctly handles malloc() == NULL, then at least your
> system will behave stably.

The problem is that it's a different application and a different user
each time (a typical large R&D environment). /etc/security/limits.conf
allows setting the max resident set size. Is there a way to limit based
on the total virtual size?

-Manish

2006-10-04 09:49:14

by Jarek Poplawski

[permalink] [raw]
Subject: Re: System hang problem.

On 04-10-2006 05:26, Willy Tarreau wrote:
...
> To better understand how the overcommit works, imagine that you have
> several people who need to put eggs in the same box. By default, they
> are blind and simply put their eggs there. But they don't see if the
> box is overloaded or not. So once the box is full and they push new
> eggs, they start to break other ones (sometimes theirs), but breaking
> eggs also leaves them some room to add new ones.
...

I think this explanation should be the model for Linux Documentation!
If every problem could be described so pictorially (preferably with
eggs), learning would be a pleasure.

> I hope it's clear now.

Yes! (except the floor...)

Best regards,

Jarek P.

2006-10-04 14:22:04

by Al Boldi

[permalink] [raw]
Subject: RE: System hang problem.

Manish Neema wrote:
> > What you can often do, if you have one application using a lot of
> > memory, is to limit *this application's* memory usage with ulimit. If
> > the application correctly handles malloc() == NULL, then at least your
> > system will behave stably.
>
> The problem is that it's a different application and a different user
> each time (a typical large R&D environment). /etc/security/limits.conf
> allows setting the max resident set size. Is there a way to limit based
> on the total virtual size?

You mean like: ulimit -v [total VMsize/runqueue]

I suppose that this could easily be calculated dynamically by the
kernel, for a tremendously inhibiting OOM-killer effect.
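
For what it's worth, "ulimit -v" corresponds to the RLIMIT_AS resource
limit (total virtual address space). A minimal sketch of a process
capping itself before doing any real work (the 512 MB figure is an
arbitrary example):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        /* Cap total virtual address space at an arbitrary 512 MB;
         * past this point malloc() in this process returns NULL
         * instead of eating into the rest of the box. */
        struct rlimit rl = { 512UL << 20, 512UL << 20 };

        if (setrlimit(RLIMIT_AS, &rl) != 0) {
                perror("setrlimit");
                return 1;
        }
        return 0;
}

A small wrapper could do this and then exec the real job, giving
per-process limits without touching the applications themselves.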


Thanks!

--
Al