2023-01-12 05:21:33

by Arseniy Lesin

Subject: [RESEND RFC] SIGOOM Proposal


1. Introduction
===================

AFAIK, the majority of Linux machines are configured to overcommit
memory, so the memory requests of every process are always satisfied.
However, in an Out-Of-Memory condition we pay for this cruelly: by
_killing_ the most memory-hungry process.

Dealing with OOM has always been questionable. The kernel-space
OOM-killer often works only as a last resort: the system can hang for a
pretty long time (especially when using swap) before it resolves the OOM
condition.

User-space OOM-killers solve this problem only _partially_: they can
_kill_ such processes preventively, or even display a nice GUI prompt to
the user.

However, the key problem persists: we can only _kill_ an unaware process,
possibly causing loss of valuable data. There is no way to tell a process:
"You are causing a system OOM; release memory or you will be terminated
forcefully"!

2. Proposal
==================

2.1. The SIGOOM Signal
------------------

I propose the addition of a new signal: SIGOOM (Out-Of-Memory SIGnal).

This signal is intended to be sent to the most memory-hungry process(es)
to give them a chance to release memory used for non-valuable data (for
example, a browser can unload tabs that are not currently in use,
assuming tabs are not separate processes) or to write out valuable data
and exit gracefully (for example, a graphical editor).

Some applications could even poll for the OOM event by using signalfd.
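
To make the signalfd idea concrete, here is a minimal sketch in C. SIGOOM
does not exist, so SIGUSR1 stands in for it; the names OOM_NOTIFY_SIGNAL,
oom_notify_fd() and oom_notify_wait() are made up for this illustration.
The signalfd(2), sigprocmask(2) and read(2) calls are the existing Linux
interfaces.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Assumption: SIGOOM does not exist, so SIGUSR1 stands in for it here. */
#define OOM_NOTIFY_SIGNAL SIGUSR1

/*
 * Block the stand-in signal and return a file descriptor that becomes
 * readable whenever that signal is pending. Returns -1 on failure.
 */
int oom_notify_fd(void)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, OOM_NOTIFY_SIGNAL);

    /* The signal must be blocked, or its normal disposition runs instead. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
        return -1;

    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/*
 * Read one notification from the descriptor. Returns the number of the
 * signal that was consumed, or -1 on a short read or error.
 */
int oom_notify_wait(int fd)
{
    struct signalfd_siginfo info;

    if (read(fd, &info, sizeof(info)) != (ssize_t)sizeof(info))
        return -1;

    return (int)info.ssi_signo;
}
```

The descriptor returned by oom_notify_fd() can be added to an existing
poll()/epoll loop, so an application could treat a memory-pressure
notification like any other event source and release caches from its
normal event handling code rather than from a signal handler.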

Default action: IGNORE
Proposed senders: kernel- and user-space OOM-killers

The technical detail of this addition is a bit unpleasant: there is
actually no room for new signals!

Numbers 1-31 are already assigned, and every signal with number >=
SIGRTMIN (currently 32 in the kernel) is considered realtime and queued
accordingly.

Adding SIGOOM as signal #32 by shifting SIGRTMIN to 33 could do the
trick, but this will almost certainly break compatibility (namely, with
glibc threading).

I propose adding SIGOOM as signal #65 (after SIGRTMAX), but we would need
to adjust some checks in kernel/signal.c (and possibly in other places
where a signal number is tested for being realtime), and possibly add a
macro such as:

#define SIG_IS_REALTIME(signum) (((signum) >= SIGRTMIN) && ((signum) <= SIGRTMAX))

I am very much looking forward to your comments on this topic; thanks in
advance.

2.2. Adjusting kernel oom-killer to use SIGOOM
----------------------------------------------

Since we now have a way to inform a process about its memory utilization,
we can first try to send the process a SIGOOM signal (if the process has
set up a handler or polls for it) and only then kill it.
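
The "signal first, kill later" ordering can be sketched from user space
for a child process. This is only an analogue under stated assumptions,
not the kernel implementation: warn_then_kill() and spawn_sleeper() are
made-up names, the warning signal is whatever the caller passes in (SIGOOM
does not exist), and the grace period with 10 ms polling is an arbitrary
choice for the sketch.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Send warn_sig to a child, give it grace_ms to exit on its own, and only
 * then fall back to SIGKILL. Returns 0 if the child went away within the
 * grace period, 1 if it had to be killed, and -1 on error.
 */
int warn_then_kill(pid_t child, int warn_sig, int grace_ms)
{
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    int waited_ms;

    if (kill(child, warn_sig) == -1)
        return -1;

    /* Poll every 10 ms until the child is gone or the grace period ends. */
    for (waited_ms = 0; waited_ms < grace_ms; waited_ms += 10) {
        pid_t r = waitpid(child, NULL, WNOHANG);

        if (r == child)
            return 0;
        if (r == -1)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Grace period expired: take the forceful path and reap the child. */
    if (kill(child, SIGKILL) == -1)
        return -1;
    if (waitpid(child, NULL, 0) == -1)
        return -1;
    return 1;
}

/*
 * Demo helper: fork a child that sleeps forever. The disposition of
 * warn_sig is set before fork() so the child inherits it without a race:
 * ignored if ignore_warning is nonzero, default action otherwise.
 */
pid_t spawn_sleeper(int warn_sig, int ignore_warning)
{
    void (*old)(int) = signal(warn_sig, ignore_warning ? SIG_IGN : SIG_DFL);
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }

    signal(warn_sig, old); /* restore the parent's disposition */
    return pid;
}
```

With SIGUSR1 as the stand-in, a child created by spawn_sleeper(SIGUSR1, 0)
goes away as soon as it is warned (the default action terminates it, which
here plays the role of a cooperative exit), while one created by
spawn_sleeper(SIGUSR1, 1) ignores the warning and is killed once the grace
period runs out.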

=============

I will try to prepare a kernel patchset in the next couple of weeks.
I am also going to create some patches for user-space OOM-killers
(systemd-oomd, Meta's oomd) and propose a new system call for those (it
is not as important, so I decided not to include it in this RFC).

I invite everyone interested to discuss this RFC here on the list, or you
can catch me on the #linux IRC channel (Libera network) tonight (nick:
emptiedsoul).

Thanks for reading, and again, thanks in advance for your comments.


2023-01-12 05:48:15

by Willy Tarreau

Subject: Re: [RESEND RFC] SIGOOM Proposal

Hello,

On Thu, Jan 12, 2023 at 07:51:45AM +0300, Arseniy Lesin wrote:
> 2.1. The SIGOOM Signal
> ------------------
>
> I propose the addition of new signal: SIGOOM (Out-Of-Memory SIGnal)
>
> This signal is intended to be sent to the most memory-hungry process(es)
> in order to give process a chance to release memory used for
> non-valuable data (for example, browser can unload tabs, that are
> currently not in use, assuming tabs are not separate processes) or to
> write down valuable data and exit gracefully (for example, some
> graphical editor).
>
> Some applications can even set up a poll for OOM event by using signalfd
>
> Default action: IGNORE
> Proposed senders: kernel- and user-space OOM-killers
>
> The technical detail of this addition is a bit unpleasant: there is
> actually no room for new signals!

Do this more simply: let userspace configure the signal it wants to
receive for this via a new prctl(PR_SET_OOMSIG). This would allow
each process to voluntarily declare this intended behavior and the
associated signal at the same time. There are already other comparable
mechanisms (the signal to receive on the parent's death, or on
memory error, for example).
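
For illustration, the existing parent-death mechanism looks like this from
C. PR_SET_PDEATHSIG and PR_GET_PDEATHSIG are real prctl(2) options;
PR_SET_OOMSIG is only the name suggested above and does not exist, so it
appears solely in a comment. The two helper function names are made up for
this sketch.

```c
#include <signal.h>
#include <sys/prctl.h>

/*
 * Assumption: PR_SET_OOMSIG is only a suggestion in this thread and is not
 * a real option. PR_SET_PDEATHSIG is the existing option with the same
 * shape: the process opts in and names the signal it wants in one call.
 */

/* Ask to receive signo when the parent dies. Returns 0 on success. */
int set_parent_death_signal(int signo)
{
    return prctl(PR_SET_PDEATHSIG, (unsigned long)signo, 0UL, 0UL, 0UL);
}

/* Read back the configured signal: 0 if unset, -1 on error. */
int get_parent_death_signal(void)
{
    int signo = 0;

    if (prctl(PR_GET_PDEATHSIG, &signo, 0UL, 0UL, 0UL) == -1)
        return -1;
    return signo;
}
```

A process that never makes the call is unaffected, which is what makes
this shape attractive for an OOM notification: no new signal number is
needed, and the choice of signal stays with the application.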

Willy

2023-01-12 07:16:00

by Theodore Ts'o

Subject: Re: [RESEND RFC] SIGOOM Proposal

On Thu, Jan 12, 2023 at 07:51:45AM +0300, Arseniy Lesin wrote:
>
> 2. Proposal
> ==================
>
> 2.1. The SIGOOM Signal
> ------------------
>
> I propose the addition of new signal: SIGOOM (Out-Of-Memory SIGnal)

AIX had a similar SIGDANGER signal which was sent to all processes
when memory was low. By default, it was ignored, but processes that
were aware of it could use this as an opportunity to shrink their
memory footprint.

> The technical detail of this addition is a bit unpleasant: there is
> actually no room for new signals!
>
> Numbers 1-31 are already assigned, every signal with number > SIGRTMIN
> (currently 32) is considered realtime and queued accordingly.
>
> Adding SIGOOM as signal #32 by shifting SIGRTMIN to 33 can do a trick,
> but this will almost certainly break compatibility (namely, with glibc
> threading)
>
> I propose adding SIGOOM as signal #65 (after SIGRTMAX), but we should
> clarify some checks in kernel/signal.c (possibly in other places too,
> where signal number is tested against being realtime) and possibly add a
> such-like macro:
>
> #define SIG_IS_REALTIME(signum) (((signum) > SIGRTMIN) && ((signum) < SIGRTMAX))

It's actually worse than this. The problem is space in the signal
mask. From the signal(7) man page:

    Signal mask and pending signals

        A signal may be blocked, which means that it will not be
        delivered until it is later unblocked. Between the time when
        it is generated and when it is delivered a signal is said to
        be pending.

        Each thread in a process has an independent signal mask, which
        indicates the set of signals that the thread is currently
        blocking. A thread can manipulate its signal mask using
        pthread_sigmask(3). In a traditional single-threaded
        application, sigprocmask(2) can be used to manipulate the
        signal mask.

The signal mask is stored in the signal set structure (sigset_t /
kernel_sigset_t). Later in that same man page:

        The addition of real-time signals required the widening of the
        signal set structure (sigset_t) from 32 to 64 bits.
        Consequently, various system calls were superseded by new
        system calls that supported the larger signal sets. The old
        and new system calls are as follows:

        Linux 2.0 and earlier   Linux 2.2 and later
        sigaction(2)            rt_sigaction(2)
        sigpending(2)           rt_sigpending(2)
        sigprocmask(2)          rt_sigprocmask(2)
        sigreturn(2)            rt_sigreturn(2)
        sigsuspend(2)           rt_sigsuspend(2)
        sigtimedwait(2)         rt_sigtimedwait(2)

This is why adding a new signal is _hard_, whether it's
SIGDANGER/SIGOOM, or the SIGINFO from the people who want BSD-style
control-T support.

- Ted

2023-01-12 20:56:41

by Arseniy Lesin

Subject: Re: [RESEND RFC] SIGOOM Proposal

> It's actually worse than this. The problem is space in the signal
> mask.
Yeah, I realized it right after sending my letter the first time.

I am going to use Willy's approach: use prctl() to let a process
choose a signal for the OOM event.

> AIX had a similar SIGDANGER signal which was sent to all processes
> when memory was low. By default, it was ignored, but processes that
> were aware of it could use this as an opportunity to shrink their
> memory footprint.
Now, should we go the same way and send SIGOOM to all receiving
processes, or keep it targeted? Or make it configurable?

Thanks.