Hi Everyone,
In my continuing work to perfect posix timers, I have started thinking
about the problem of managing signal queuing resources.
The sigqueue structures used to queue siginfo_t structures are
allocated from a slab cache. The system limits this allocation to a
maximum of 1024 queued signals. This global limit means that
a process may fail at the whim of an unrelated process.
The attached test program easily consumes all of the available
sigqueue entries. It blocks SIGRTMIN and sends itself 2000 SIGRTMIN
signals. Running this program along with a posix timers test will
cause the posix timers to fail. Using realtime signals makes it easy
to consume this resource, but the failure could also occur with many
processes each having only a few pending signals.
Changing from a system-wide limit to a per-process limit would solve
most of this problem. It would not, however, protect against the
allocation from the slab cache failing. Posix timers are required to fail
timer_create with EAGAIN if "the system lacks sufficient signal queuing
resources to honor the request." The current Linux posix-timers
implementation doesn't do this.
I'm contemplating changes to kernel/signal.c to allow reserved or
pre-allocated sigqueue structures to be used. The idea is to do the
allocation in the system call context so the failure can be returned to
the application.
In the pre-allocated approach, the timer code would allocate a sigqueue
structure as part of timer_create. I would add new send_sigqueue() and
send_group_sigqueue() functions which would accept a pointer to the pre-allocated
sigqueue structure rather than a siginfo pointer. There would also be changes
to the code which dequeues the siginfo structure so that it recognizes these
preallocated sigqueue structures. For Posix timers, using a
preallocated sigqueue entry also makes handling overruns easier: if the timer
code finds that its sigqueue structure is still queued, it can simply increment
the overrun count.
The reservation approach would keep a pre-allocated pool of sigqueue
structures and a reservation count. The timer_create would reserve
a sigqueue entry which would be placed in the pool until it is needed.
I wonder if anyone else is interested in this problem.
Jim Houston - Concurrent Computer Corp.
--
/*
 * Example program which consumes all of the available sigqueue
 * structures.
 */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <signal.h>

void handler(int sig, siginfo_t *info, void *context)
{
	/* printf() is not async-signal-safe; tolerable in a test program. */
	printf("handler called\n");
	printf("si_signo=%d si_code=%d si_errno=%d\n",
	       info->si_signo, info->si_code, info->si_errno);
}

int main(int argc, char **argv)
{
	struct sigaction sa;
	sigset_t s;
	int i, ret = 0;

	/* Install an SA_SIGINFO handler for SIGRTMIN. */
	sa.sa_sigaction = &handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGRTMIN, &sa, 0) != 0) {
		perror("sigaction");
		exit(1);
	}

	/* Block SIGRTMIN so the signals sent below stay queued. */
	sigemptyset(&s);
	sigaddset(&s, SIGRTMIN);
	if (sigprocmask(SIG_SETMASK, &s, NULL) != 0) {
		perror("sigprocmask");
		exit(1);
	}

	/* Send ourselves 2000 SIGRTMIN signals while they are blocked. */
	for (i = 0; i < 2000; i++)
		if ((ret = kill(getpid(), SIGRTMIN)))
			break;
	if (ret)
		perror("kill");

	/* Keep the entries queued for a while, then unblock to deliver them. */
	sleep(5);
	sigemptyset(&s);
	if (sigprocmask(SIG_SETMASK, &s, NULL) != 0) {
		perror("sigprocmask");
		exit(1);
	}
	sleep(1);
	return(0);
}
> Posix timers are required to fail the
> timer_create with EAGAIN if "the system lacks sufficient signal queuing
> resources to honor the request." The current Linux posix-timers
> implementation doesn't do this.
That's not really how you can interpret this. At the time timer_create
is called, it is not known when the timer will expire or whether it's a repeating
timer. Therefore it is not correct to assume that if timer_create
succeeds, the resources to always deliver the signal are available.
The shall-error in the standard only covers the case where there is really
no way to make this work. For instance, an implementation might allocate
N signal slots to each process using timer_create, while the whole system
has only N * M slots. Once M processes hold their slots, the next
timer_create has to fail.
Because there is no fixed limit (or, better said, no guaranteed minimum
number of signal slots) in Linux, this error doesn't apply at all.
> I'm contemplating changes to kernel/signal.c to allow reserved or
> pre-allocated sigqueue structures to be used. The idea is to do the
> allocation in the system call context so the failure can be returned to
> the application.
Allocate how and what for?
If you mean allocating signal slots for the process, this is wrong. It
should never be the case that a process accumulates many outstanding
signals. Every limit is reached at some point, and then all further
signals are ignored. This is problematic because there is no guarantee
whatsoever that all signals come from the same timer, which is the only
case in which later signals could be ignored with some degree of safety.
You might lose the one event a long-running timer generates.
Allocating signal slots to the timer object does make sense, but the
number must be small. This indeed makes the signal handling simpler and
more robust. Especially with respect to the example just mentioned, a frequently
expiring timer won't flood the signal system so that the events of other
timers are lost.
What I don't understand is why you bring up this issue of 2000 outstanding
signals. It never makes sense to handle this, especially not when the
events come from one system. If a really high number of signals is used,
the system is specialized and it is not a problem to require a kernel
recompilation with new limits.
In summary: I think allocating, say, 16 signal slots per timer is a good
idea. It means all timers are treated fairly, which is important. But
it should never be a goal to optimize the system for hundreds or even
thousands of outstanding signals.
--
Ulrich Drepper, Red Hat
444 Castro Street, Mountain View, CA 94041 USA
drepper at redhat.com
Ulrich Drepper wrote:
>
> > Posix timers are required to fail the
> > timer_create with EAGAIN if "the system lacks sufficient signal queuing
> > resources to honor the request." The current Linux posix-timers
> > implementation doesn't do this.
>
> That's not really how you can interpret this. At the time timer_create
> is called, it is not known when the timer will expire or whether it's a repeating
> timer. Therefore it is not correct to assume that if timer_create
> succeeds the resources to always deliver the signal are available.
Hi Ulrich,
My intention is to allocate a sigqueue structure in the kernel as part
of the timer_create call. Later, at timer completion, I will use this
preallocated sigqueue entry to queue the siginfo_t for the timer completion.
The Posix timers specification guarantees that there can only be a single
signal pending for a given timer, so preallocating one sigqueue structure
for each timer is sufficient.
Perhaps I should have tracked down the appropriate "shall deliver the
signal on timer completion" requirement and quoted it instead. The point is that
the current Linux implementation will fail to deliver the signal at the
whim of other programs which might consume the limited signal queue
resource.
I saw your post on posixtest-discuss which complained about the test suite
trying to force this condition. I agree that this is a bad test. Once I preallocate
this resource, I may happily wait for a sigqueue entry to become available
and never return this failure.
Have a look at section B.14.2.2, "Create a Per-Process Timer", in the
Rationale and Notes. It is fairly clear that it expects signal queuing
resources to be allocated at timer_create time and that failing to deliver
a completion signal is never acceptable.
Jim Houston - Concurrent Computer Corp.
On Wed, May 28, 2003 at 02:56:15PM -0400, Jim Houston wrote:
> In the pre-allocated approach, the timer code would allocate a
> sigqueue structure as part of the timer_create. I would add new
> send_sigqueue() and send_group_sigqueue() which would accept the
> pointer to the pre-allocated sigqueue structure rather than a siginfo
> pointer. There would also be changes to the code which dequeues the
> siginfo structure to recognize these preallocated sigqueue structures.
> In the case of Posix timers using a preallocated sigqueue entry also
> makes handling overruns easier. If the timer code finds that its
> sigqueue structure is still queued, it can simply increment the
> overrun count.
> The reservation approach would keep a pre-allocated pool of sigqueue
> structures and a reservation count. The timer_create would reserve
> a sigqueue entry which would be placed in the pool until it is needed.
> I wonder if anyone else is interested in this problem.
Well, I've never run into it and it sounds really obscure, but I agree
in principle that it's better to return an explicit error to userspace
than to silently fail, at least when it's feasible (obviously the kernel
can be beaten to death with events faster than it can deliver them, so
it won't always be feasible).
-- wli
On Wed, May 28, 2003 at 03:01:17PM -0700, Ulrich Drepper wrote:
> That's not really how you can interpret this. At the time timer_create
> is called, it is not known when the timer will expire or whether it's a repeating
> timer. Therefore it is not correct to assume that if timer_create
> succeeds the resources to always deliver the signal are available.
> The shall-error in the standard just covers the case if there is really
> no way this can be made working. For instance, some implementation
> might allocate to each process using timer_create N signal slots. The
> whole system could have only N * M slots.
> Because there is no fixed limit (or better said: no guaranteed minimal
> number of signal slots) in Linux this error doesn't apply at all.
Because the standard doesn't specify that minimum, the inability to
prevent events from coming in faster than one can process them does
create an escape hatch: one doesn't have to handle this case. Perhaps
the criterion for merging should be whether some application is
negatively affected?
But I'm not convinced this would harm anything if merged beforehand.
It's also nice to exert explicit control over competition for memory.
-- wli
William Lee Irwin III writes:
> On Wed, May 28, 2003 at 02:56:15PM -0400, Jim Houston wrote:
>> In the pre-allocated approach, the timer code would allocate a
>> sigqueue structure as part of the timer_create. I would add new
>> send_sigqueue() and send_group_sigqueue() which would accept the
>> pointer to the pre-allocated sigqueue structure rather than a siginfo
>> pointer. There would also be changes to the code which dequeues the
>> siginfo structure to recognize these preallocated sigqueue structures.
>> In the case of Posix timers using a preallocated sigqueue entry also
>> makes handling overruns easier. If the timer code finds that its
>> sigqueue structure is still queued, it can simply increment the
>> overrun count.
>> The reservation approach would keep a pre-allocated pool of sigqueue
>> structures and a reservation count. The timer_create would reserve
>> a sigqueue entry which would be placed in the pool until it is needed.
>> I wonder if anyone else is interested in this problem.
>
> Well, I've never run into it and it sounds really obscure, but I agree
> in principle that it's better to return an explicit error to userspace
> than to silently fail, at least when it's feasible (obviously the kernel
> can be beaten to death with events faster than it can deliver them, so
> it won't always be feasible).
Why couldn't this be a configurable per-user thing like RSS rlimits?
--
--Ed L Cashin PGP public key: http://noserose.net/e/pgp/
Ed L Cashin wrote:
> William Lee Irwin III writes:
>
>
>>
>>Well, I've never run into it and it sounds really obscure, but I agree
>>in principle that it's better to return an explicit error to userspace
>>than to silently fail, at least when it's feasible (obviously the kernel
>>can be beaten to death with events faster than it can deliver them, so
>>it won't always be feasible).
>
>
> Why couldn't this be a configurable per-user thing like RSS rlimits?
>
Pardon me for butting in...
It seems to me that returning an error on unrecoverable failure is
ALWAYS the right thing to do.
We're not doing that right now, and that's okay. We can simply admit
that we're not quite doing the right thing and get around to fixing it
later.
But once the fix has been made, why would anyone want it to be optional?
Is the event so rare that avoiding the catastrophe which might occur if we
don't properly return an error isn't worth the performance hit?