Hello everyone,
I think this is my first post to this list. I have something which I think is
an improved OOM killer. It has been tested fairly well and I have not found
any problems.
Why did I write it?
I had a problem on one server where mysql kept forking a new child, which
consumed all the memory and was eventually killed by the OOM killer. The
problem was that this never ended, because the parent was left alone and
simply forked again.
What does it do?
It keeps statistics on how many times a child of a given parent has been
killed. When a threshold is reached, the parent itself is killed. If a
blacklisted parent process has not had any of its children killed for X
seconds, it is removed from the list. Both values are configurable with
sysctl.
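
To illustrate the bookkeeping described above, here is a minimal userspace
sketch in Python. It is not the patch itself. The names OomParentTracker,
kill_threshold and expire_seconds are made up for illustration, and the sketch
assumes that the threshold is reached when the kill count equals it and that
the X-second timer restarts every time a child is killed.

```python
from dataclasses import dataclass


@dataclass
class ParentRecord:
    """Per-parent statistics (hypothetical layout, for illustration only)."""
    kills: int = 0          # children of this parent killed so far
    last_kill: float = 0.0  # time of the most recent child kill, in seconds


class OomParentTracker:
    """Userspace model of the per-parent kill counting described above."""

    def __init__(self, kill_threshold=3, expire_seconds=60.0):
        # Stand-ins for the two sysctl-configurable values.
        self.kill_threshold = kill_threshold
        self.expire_seconds = expire_seconds
        self.records = {}  # parent pid -> ParentRecord

    def expire(self, now):
        """Forget parents whose children were not killed for expire_seconds."""
        stale = [pid for pid, rec in self.records.items()
                 if now - rec.last_kill >= self.expire_seconds]
        for pid in stale:
            del self.records[pid]

    def child_killed(self, parent_pid, now):
        """Record one killed child; return True if the parent should be killed."""
        self.expire(now)
        rec = self.records.setdefault(parent_pid, ParentRecord())
        rec.kills += 1
        rec.last_kill = now
        if rec.kills >= self.kill_threshold:
            # Assumption: the parent's record is dropped once it is killed.
            del self.records[parent_pid]
            return True
        return False
```

With kill_threshold=3, the third call to child_killed() for the same parent
within the expiry window returns True. A parent that stays quiet for
expire_seconds starts again from zero.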
The situation described earlier was effectively a kind of DoS attack. The only
solution was for an admin to log in and kill mysql manually, but logging in
takes around 15 minutes while the server is in such a state. With this patch,
the admin only has to log in to restart the killed application, and can do so
without waiting. That way server downtime is minimized, and only the attacked
service is affected by the DoS attack.
I am curious what comments this patch will receive, so please feel free to
comment.
Also, I am not subscribed to the list, so please use 'Reply All' (i.e. CC me)
when replying.
Thanks!