(Please CC me, I'm not subscribed.)
Attached is a test program that tests the order in which synchronously
triggered sigsegvs and pthread_kill() generated signals get delivered.
The test program triggers sigsegvs from a thread and tests whether the
sigsegv handler is invoked while the sigusr1 handler is already
running. If that happens, it exits with exit code 27. It seems that if
the signal is sent by pthread_kill() and its signum is lower than that
of sigsegv then it can be delivered before sigsegv. A normal kill does
not seem to do this.
I would expect that no asynchronously generated signal (and here I
include those sent by pthread_kill()) can overtake a sigsegv even if
its signum is lower.
Reaching this diagnosis led me to an identical-looking issue documented
at: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4355769 .
Cheers,
Gabor Melis
PS: tested on x86 linux Debian Lenny, kernel: 2.6.26-1-686, glibc:
2.7-18
On 03/14, Gábor Melis wrote:
>
> The test program triggers sigsegvs from a thread and tests whether
> the sigsegv handler is invoked while the sigusr1 handler is already
> running.
No, this is not what happens with this test case. SIGSEGV can't be
generated when we run SIGUSR1 handler.
void sigsegv_handler(int signal, siginfo_t *info, void *context)
{
	/* The test signal is blocked only in test_handler. */
The comment is not right. We can't be here (in SIGSEGV handler) if
we are in test_handler.
	if (is_signal_blocked(test_signal)) {
		_exit(27);
	}
If test_signal (SIGUSR1) is blocked, this means it is already delivered,
and the handler will be invoked when we return from sigsegv_handler(),
please see below.
> A normal kill does
> not seem to do this.
Yes. Because the task dequeues the private signals first (sent by tkill,
or generated by the kernel when the test case does "*page_address = 1"),
then it dequeues the shared signals (sent by kill).
But please note that it is still possible to hit is_signal_blocked()
even with test_with_kill(), but the probability is very low.
> I would expect that no asynchronously generated signal (and here I
> include those sent by pthread_kill()) can overtake a sigsegv even if
> its signum is lower.
Signum doesn't matter. Any unblocked signal can preempt the task, whether
it runs inside a signal handler or not. This is correct; you can use
sigaction->sa_mask to specify which signals should be blocked during
execution of the signal handler.
OK, let's do a simple test:
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int is_blocked(int sig)
{
	sigset_t set;
	sigprocmask(SIG_BLOCK, NULL, &set);
	return sigismember(&set, sig);
}

void sig_1(int sig)
{
	printf("%d %d\n", sig, is_blocked(2));
}

void sig_2(int sig)
{
	printf("%d %d\n", sig, is_blocked(1));
}

int main(void)
{
	sigset_t set;

	signal(1, sig_1);	/* 1 == SIGHUP */
	signal(2, sig_2);	/* 2 == SIGINT */

	sigemptyset(&set);
	sigaddset(&set, 1);
	sigaddset(&set, 2);
	sigprocmask(SIG_BLOCK, &set, NULL);

	kill(getpid(), 1);
	kill(getpid(), 2);

	sigprocmask(SIG_UNBLOCK, &set, NULL);

	return 0;
}
output is:
2 1
1 0
When sigprocmask(SIG_UNBLOCK) returns, both signals are delivered.
The kernel dequeues 1 first, then 2. This means that the handler for
"2" will be called first.
But if you change kill(getpid(), 2) to tkill(getpid(), 2), then the
output should be:
1 1
2 0
So, what happens with test_with_pthread_kill() is: the sub-thread
likely dequeues both signals, SIGSEGV=11 and SIGUSR1=10, and starts
the handler for SIGSEGV.
With test_with_kill(), the child dequeues both signals too, but
runs the handler for SIGUSR1 first, because it was sent by kill(),
not tkill().
If you modify your test-case so that test_signal == SIGIO, then
I bet test_with_pthread_kill() won't hit is_signal_blocked() either.
Or you can modify test_with_kill() to use tkill(), in that case
you should see the same behaviour as with test_with_pthread_kill().
Please don't hesitate to ask more questions.
Oleg.
On Sunday 15 March 2009, Oleg Nesterov wrote:
> On 03/14, Gábor Melis wrote:
> > The test program triggers sigsegvs from a thread and tests whether
> > the the sigsegv handler is invoked when the sigusr1 handler is
> > already running.
>
> No, this is not what happens with this test case. SIGSEGV can't be
> generated when we run SIGUSR1 handler.
>
> void sigsegv_handler(int signal, siginfo_t *info, void *context)
> {
> /* The test signal is blocked only in test_handler. */
>
> The comment is not right. We can't be here (in SIGSEGV handler) if
> we are in test_handler.
> if (is_signal_blocked(test_signal)) {
> _exit(27);
> }
>
> If test_signal (SIGUSR1) is blocked, this means it is already
> delivered, and the handler will be invoked when we return from
> sigsegv_handler(), please see below.
SIGUSR1 is delivered, its sigmask is added to the current mask but the
handler is not yet invoked and in this instant synchronous sigsegv is
delivered, its handler invoked?
> > A normal kill does
> > not seem to do this.
>
> Yes. Because the task dequeues the private signals first (sent by
> tkill, or generated by the kernel when the test case does "*page_address =
> 1"), then it dequeues the shared signals (sent by kill).
>
> But please note that it is still possible to hit is_signal_blocked()
> even with test_with_kill(), but the probability is very low.
>
> > I would expect that no asynchronously generated signal (and here I
> > include those sent by pthread_kill()) can overtake a sigsegv even
> > if its signum is lower.
>
> Signum doesn't matter. Any unblocked signal can preempt the task,
> whether it runs inside a signal handler or not. This is correct; you
> can use sigaction->sa_mask to specify which signals should be
> blocked during execution of the signal handler.
>
> OK, let's do a simple test:
>
> int is_blocked(int sig)
> {
> 	sigset_t set;
> 	sigprocmask(SIG_BLOCK, NULL, &set);
> 	return sigismember(&set, sig);
> }
>
> void sig_1(int sig)
> {
> 	printf("%d %d\n", sig, is_blocked(2));
> }
>
> void sig_2(int sig)
> {
> 	printf("%d %d\n", sig, is_blocked(1));
> }
>
> int main(void)
> {
> 	sigset_t set;
>
> 	signal(1, sig_1);
> 	signal(2, sig_2);
>
> 	sigemptyset(&set);
> 	sigaddset(&set, 1);
> 	sigaddset(&set, 2);
> 	sigprocmask(SIG_BLOCK, &set, NULL);
>
> 	kill(getpid(), 1);
> 	kill(getpid(), 2);
>
> 	sigprocmask(SIG_UNBLOCK, &set, NULL);
>
> 	return 0;
> }
>
> output is:
>
> 2 1
> 1 0
>
> When sigprocmask(SIG_UNBLOCK) returns, both signals are delivered.
> The kernel dequeues 1 first, then 2. This means that the handler for
> "2" will be called first.
My mental model that matches what I quickly glean from the sources (from
kernel/signal.c, arch/x86/kernel/signal_32.c) goes like this:
- signal 1 and signal 2 are generated and made pending
- they are unblocked by sigprocmask
- signal 1 is delivered: signals in its mask (only itself here) are
blocked, its handler is invoked
- barely into sig_1 (the printf didn't get a chance to run), signal 2 is
delivered, its sigmask is added to the current one that already includes
signal 1, sig_2 is invoked
- sig_2 prints "2 1" and returns
- sig_1 reaches the printf, printing "1 0"
The output matches this model.
I can't find how 'handler for "2" will be called first'. Furthermore, if
it's indeed sig_2 that's invoked first, is sig_1's sigmask added to
the current mask upon dequeueing?
> But if you change kill(getpid(), 2) to tkill(getpid(), 2)) then the
> output should be:
>
> 1 1
> 2 0
Right.
> So, what happens with test_with_pthread_kill() is: the sub-thread
> likely dequeues both signals, SIGSEGV=11 and SIGUSR1=10, and starts
> the handler for SIGSEGV.
>
> With test_with_kill(), the child dequeues both signals too, but
> runs the handler for SIGUSR1 first, because it was sent by kill(),
> not tkill().
>
> If you modify your test-case so that test_signal == SIGIO, then
> I bet test_with_pthread_kill() won't hit is_signal_blocked() either.
> Or you can modify test_with_kill() to use tkill(), in that case
> you should see the same behaviour as with test_with_pthread_kill().
>
> Please don't hesitate to ask more questions.
Thank you,
Gabor
> Oleg.
On 03/15, Gábor Melis wrote:
>
> On Sunday 15 March 2009, Oleg Nesterov wrote:
> >
> > If test_signal (SIGUSR1) is blocked, this means it is already
> > delivered, and the handler will be invoked when we return from
> > sigsegv_handler(), please see below.
>
> SIGUSR1 is delivered, its sigmask is added to the current mask but the
> handler is not yet invoked and in this instant synchronous sigsegv is
> delivered, its handler invoked?
Can't understand the question. Could you reiterate?
> > When sigprocmask(SIG_UNBLOCK) returns, both signals are delivered.
> > The kernel dequeues 1 first, then 2. This means that the handler for
> > "2" will be called first.
>
> My mental model that matches what I quickly glean from the sources (from
> kernel/signal.c, arch/x86/kernel/signal_32.c) goes like this:
>
> - signal 1 and signal 2 are generated and made pending
> - they are unblocked by sigprocmask
> - signal 1 is delivered: signals in its mask (only itself here) are
> blocked
yes.
the kernel changes ip (instruction pointer) to sig_1.
> its handler is invoked
no.
We never return to user-space with a pending signal. We dequeue signal 2
too, and change ip to sig_2.
Now, since there are no more pending signals, we return to the user space,
and start sig_2().
> I can't find how 'handler for "2" will be called first'.
see above,
> Furthermore, if
> it's indeed sig_2 that's invoked first then sig_1's sigmask is added to
> the current mask upon dequeueing???
sig_1's sigmask was added to ->blocked when we dequeued signal 1.
Oleg.
On Sunday 15 March 2009, Oleg Nesterov wrote:
> On 03/15, Gábor Melis wrote:
> > On Sunday 15 March 2009, Oleg Nesterov wrote:
> > > If test_signal (SIGUSR1) is blocked, this means it is already
> > > delivered, and the handler will be invoked when we return from
> > > sigsegv_handler(), please see below.
> >
> > SIGUSR1 is delivered, its sigmask is added to the current mask but
> > the handler is not yet invoked and in this instant synchronous
> > sigsegv is delivered, its handler invoked?
>
> Can't understand the question. Could you reiterate?
No need, my question was answered below.
> > > When sigprocmask(SIG_UNBLOCK) returns, both signals are
> > > delivered. The kernel dequeues 1 first, then 2. This means that the
> > > handler for "2" will be called first.
> >
> > My mental model that matches what I quickly glean from the sources
> > (from kernel/signal.c, arch/x86/kernel/signal_32.c) goes like this:
> >
> > - signal 1 and signal 2 are generated and made pending
> > - they are unblocked by sigprocmask
> > - signal 1 is delivered: signals in its mask (only itself here) are
> > blocked
>
> yes.
>
> the kernel changes ip (instruction pointer) to sig_1.
>
> > its handler is invoked
>
> no.
>
> We never return to user-space with a pending signal. We dequeue
> signal 2 too, and change ip to sig_2.
>
> Now, since there are no more pending signals, we return to the user
> space, and start sig_2().
I see. I guess in addition to changing the ip, the stack frobbing magic
arranges that sig_2 returns to sig_1 or some code that calls sig_1.
From the point of view of sig_2 it seems that sig_1 is already invoked
because it has its sigmask in effect and the ip in the ucontext of
sig_2 points to sig_1 as the attached signal-test.c shows:
sig_1=80485a7
sig_2=80485ed
2 1
eip: 80485a7
1 0
eip: b7fab424
The revised signal-delivery-order.c (also attached) outputs:
test_handler=8048727
sigsegv_handler=804872c
eip: 8048727
esp: b7d94cb8
which shows that sigsegv_handler also has incorrect eip in the context.
On 03/15, Gábor Melis wrote:
>
> On Sunday 15 March 2009, Oleg Nesterov wrote:
> > >
> > Now, since there are no more pending signals, we return to the user
> > space, and start sig_2().
>
> I see. I guess in addition to changing the ip, the stack frobbing magic
> arranges that sig_2 returns to sig_1 or some code that calls sig_1.
yes. "some code" == rt_sigreturn.
> The revised signal-delivery-order.c (also attached) outputs:
>
> test_handler=8048727
> sigsegv_handler=804872c
> eip: 8048727
> esp: b7d94cb8
>
> which shows that sigsegv_handler also has incorrect eip in the context.
Why do you think it is not correct?
I didn't try your test-case, but I can't see where "esp: b7d94cb8"
comes from. But "eip: 8048727" looks exactly right, this is the
address of test_handler.
Oleg.
On Monday 16 March 2009, Oleg Nesterov wrote:
> On 03/15, Gábor Melis wrote:
> > On Sunday 15 March 2009, Oleg Nesterov wrote:
> > > Now, since there are no more pending signals, we return to the
> > > user space, and start sig_2().
> >
> > I see. I guess in addition to changing the ip, the stack frobbing
> > magic arranges that sig_2 returns to sig_1 or some code that calls
> > sig_1.
>
> yes. "some code" == rt_sigreturn,
>
> > The revised signal-delivery-order.c (also attached) outputs:
> >
> > test_handler=8048727
> > sigsegv_handler=804872c
> > eip: 8048727
> > esp: b7d94cb8
> >
> > which shows that sigsegv_handler also has incorrect eip in the
> > context.
>
> Why do you think it is not correct?
>
> I didn't try your test-case, but I can't see where "esp: b7d94cb8"
> comes from. But "eip: 8048727" looks exactly right, this is the
> address of test_handler.
Sorry, I removed the printing of esp from the code as it was not
relevant to my point but pasted the output of a previous run.
Anyway, I think eip is incorrect in sigsegv because it's not pointing to
the instruction that caused the sigsegv. In general the ucontext is
wrong, because it's as if sigsegv_handler were invoked within
test_handler.
This is problematic if the sigsegv handler wants to do something with
the context. The real life sigsegv handler that's been failing does
this:
- skip the offending instruction by incrementing eip
- take esp from the context and frob the control stack so that some
function is called on return from the handler (the handler itself is on
the altstack). This is not unlike what the kernel does, it seems.
Now, "some function" cannot be called with SIGUSR1 blocked because that
would potentially lead to deadlocks. (SIGUSR1 is sent to a thread when
the garbage collector wants to stop it, and some function does
allocations.)
So the context in the sigsegv handler pointing to the handler of SIGUSR1
loses because it finds an unexpected sigmask: SIGUSR1 is blocked. It
loses because the eip is not pointing to the right instruction, it
loses because the SIGUSR1 handler won't finish until "some function"
returns ...
It seems to me that the same problem could be triggered by
pthread_kill()ing a thread that's sigtrapping, if the signum of the
signal sent is lower than that of sigtrap, say SIGINT.
In a nutshell, the context argument is wrong.
> Oleg.
On 03/16, Gábor Melis wrote:
>
> In a nutshell, the context argument is wrong.
I strongly disagree. This all is correct and works as expected.
Yes, it doesn't match your expectations/needs, but this doesn't
mean it is wrong.
> On Monday 16 March 2009, Oleg Nesterov wrote:
> > On 03/15, Gábor Melis wrote:
> >
> > > The revised signal-delivery-order.c (also attached) outputs:
> > >
> > > test_handler=8048727
> > > sigsegv_handler=804872c
> > > eip: 8048727
> > > esp: b7d94cb8
> > >
> > > which shows that sigsegv_handler also has incorrect eip in the
> > > context.
> >
> > Why do you think it is not correct?
> >
> > I didn't try your test-case, but I can't see where "esp: b7d94cb8"
> > comes from. But "eip: 8048727" looks exactly right, this is the
> > address of test_handler.
>
> Sorry, I removed the printing of esp from the code as it was not
> relevant to my point but pasted the output of a previous run.
>
> Anyway, I think eip is incorrect in sigsegv because it's not pointing to
> the instruction that caused the sigsegv. In general the ucontext is
> wrong, because it's as if sigsegv_handler were invoked within
> test_handler.
>
> This is problematic if the sigsegv handler wants to do something with
> the context. The real life sigsegv handler that's been failing does
> this:
> - skip the offending instruction by incrementing eip
I don't know how to solve your problem cleanly. Please let me know
if you find a solution ;)
As a workaround, perhaps sigsegv_handler() can just return if
uc_mcontext->ip points to another signal handler (assuming that
the first instruction of the signal handler can't cause the fault).
In this case the task will repeat the faulting instruction, and
sigsegv_handler() will be called again.
Agreed, this is not nice and the task should track sigaction()
calls, or sigsegv_handler() can do sigaction(signr, NULL, &oldact)
in a loop to see which handler we have.
Can the kernel help?
Perhaps we can add si_ip to _sigfault, in this case we could just
check uc_mcontext->ip != info->si_ip and return. Or we can unwind
the stack and find the "correct" rt_sigframe which matches the
page fault.
Or, we can change force_sig_info_fault() to block all signals
except si_signo and use set_restore_sigmask() logic. This means
no other signal can be dequeued until sigsegv_handler() returns.
I dunno. I am not sure your problem is common enough to justify these
changes, but I can't judge.
Oleg.
Oleg Nesterov wrote:
> On 03/16, Gábor Melis wrote:
>> In a nutshell, the context argument is wrong.
>
> I strongly disagree. This all is correct and works as expected.
> Yes, it doesn't match your expectations/needs, but this doesn't
> mean it is wrong.
It would appear that the standard may allow this, depending on interpretation.
From SUSv3, regarding sigaction():
"...the third argument can be cast to a pointer to an object of type
ucontext_t to refer to the receiving thread's context that was
interrupted when the signal was delivered."
From the "signal concepts" section, "A signal is said to be
``delivered'' to a process when the appropriate action for the process
and signal is taken."
Given that the SIGSEGV isn't "delivered" until its handler runs, it
could possibly be valid for the instruction pointer in the SIGSEGV
handler to reference test_handler, if the system was set up in such a
way that the context was set to test_handler() at the time the SIGSEGV
handler was run.
However, there are quality-of-implementation issues here--being able to
handle SIGSEGV is pretty useless if the instruction pointer being passed
to the handler doesn't actually match the instruction that caused the
segfault.
> I dunno. I am not sure your problem is common enough to do these
> changes, but I can't judge.
What he's trying to do isn't all that unusual. Certainly any
application wanting to do something smart (log the instruction pointer,
for instance) on a segfault could run into the same problem.
Chris
(see http://marc.info/?t=123704955800002)
First of all, perhaps I missed something and this is solvable without
kernel changes, but I can't see how.
To simplify, suppose that the application wants to log the faulting
instruction before exit, so it installs the handler for SIGSEGV
void sigsegv_handler(int sig, siginfo_t *info, struct ucontext *context)
{
	fprintf(stderr, "bug at %p\n", context->uc_mcontext->ip);
	exit(1);
}
But this doesn't work. It is possible that, before the task dequeues SIGSEGV,
another private signal, say, SIGHUP (note that SIGHUP < SIGSEGV) is sent to
this app.
In this case the task dequeues both signals, SIGHUP and then SIGSEGV before
return to user-space. This is correct, but now uc_mcontext->ip points to
sighup_handler, not to the faulted instruction.
Can/should we change force_sig_info_fault() to help?
We can add siginfo_t->_sigfault.ip to solve this problem. This shouldn't
enlarge the size of siginfo_t. With this change sigsegv_handler() always
knows the address of the instruction which caused the fault.
But this doesn't allow us to change uc_mcontext->ip to, say, skip the offending
instruction or call another function which does a fixup. Actually, ->si_ip
helps; we can do
void sigsegv_handler(int sig, siginfo_t *info, struct ucontext *context)
{
	if (info->si_ip != context->uc_mcontext->ip)
		/*
		 * The offending instruction will be repeated, and
		 * we will have the "same" SIGSEGV again.
		 */
		return;

	context->uc_mcontext->ip = fixup;
	...
}
But this doesn't look very nice. So, perhaps we can do another change?
--- arch/x86/mm/fault.c
+++ arch/x86/mm/fault.c
@@ -177,6 +177,13 @@ static void force_sig_info_fault(int si_
 {
 	siginfo_t info;

+	current->saved_sigmask = current->blocked;
+	spin_lock_irq(&current->sighand->siglock);
+	siginitsetinv(&current->blocked, sigmask(si_signo) |
+		      sigmask(SIGKILL) | sigmask(SIGSTOP));
+	spin_unlock_irq(&current->sighand->siglock);
+	set_restore_sigmask();
+
 	info.si_signo = si_signo;
 	info.si_errno = 0;
 	info.si_code = si_code;
But this is a user-visible change, all signals will be blocked until
sigsegv_handler() returns. But with this change sigsegv_handler()
always has the "correct" rt_sigframe.
Comments?
And once again, I have a nasty feeling I missed something and we don't
need to change the kernel.
Oleg.
Sorry for the noise, forgot to mention:
On 03/17, Oleg Nesterov wrote:
>
> --- arch/x86/mm/fault.c
> +++ arch/x86/mm/fault.c
> @@ -177,6 +177,13 @@ static void force_sig_info_fault(int si_
> {
> siginfo_t info;
>
> + current->saved_sigmask = current->blocked;
> + spin_lock_irq(&current->sighand->siglock);
> + siginitsetinv(&current->blocked, sigmask(si_signo) |
> + sigmask(SIGKILL) | sigmask(SIGSTOP));
> + spin_unlock_irq(&current->sighand->siglock);
> + set_restore_sigmask();
> +
Of course, this change is wrong; it is just for illustration.
We shouldn't unblock si_signo if it was blocked; force_sig_info()
sets SIG_DFL in this case.
Oleg.
On Tuesday 17 March 2009, Oleg Nesterov wrote:
> (see http://marc.info/?t=123704955800002)
>
> First of all, perhaps I missed something and this is solvable
> without kernel changes, but I can't see how.
>
> To simplify, suppose that the application wants to log the faulting
> instruction before exit, so it installs the handler for SIGSEGV
>
> void sigsegv_handler(int sig, siginfo_t *info, struct ucontext
> *context) {
> fprintf(stderr, "bug at %p\n", context->uc_mcontext->ip);
> exit(1);
> }
>
> But this doesn't work. It is possible that, before the task dequeues
> SIGSEGV, another private signal, say, SIGHUP (note that SIGHUP <
> SIGSEGV) is sent to this app.
>
> In this case the task dequeues both signals, SIGHUP and then SIGSEGV
> before return to user-space. This is correct, but now uc_mcontext->ip
> points to sighup_handler, not to the faulted instruction.
>
>
> Can/should we change force_sig_info_fault() to help?
>
> We can add siginfo_t->_sigfault.ip to solve this problem. This
> shouldn't enlarge the size of siginfo_t. With this change
> sigsegv_handler() always knows the address of the instruction which
> caused the fault.
>
>
> But this doesn't allow us to change uc_mcontext->ip to, say, skip the
> offending instruction or call another function which does a fixup.
> Actually, ->si_ip helps; we can do
>
> void sigsegv_handler(int sig, siginfo_t *info, struct ucontext
> *context) {
> if (info->si_ip != context->uc_mcontext->ip)
> /*
> * The offending instruction will be repeated, and
> * we will have the "same" SIGSEGV again.
> */
> return;
>
> context->uc_mcontext->ip = fixup;
> ...
> }
>
> But this doesn't look very nice. So, perhaps we can do another
> change?
>
> --- arch/x86/mm/fault.c
> +++ arch/x86/mm/fault.c
> @@ -177,6 +177,13 @@ static void force_sig_info_fault(int si_
> {
> siginfo_t info;
>
> + current->saved_sigmask = current->blocked;
> + spin_lock_irq(&current->sighand->siglock);
> + siginitsetinv(&current->blocked, sigmask(si_signo) |
> + sigmask(SIGKILL) | sigmask(SIGSTOP));
> + spin_unlock_irq(&current->sighand->siglock);
> + set_restore_sigmask();
> +
> info.si_signo = si_signo;
> info.si_errno = 0;
> info.si_code = si_code;
>
> But this is a user-visible change, all signals will be blocked until
> sigsegv_handler() returns. But with this change sigsegv_handler()
> always has the "correct" rt_sigframe.
>
>
> Comments?
>
> And once again, I have a nasty feeling I missed something and we
> don't need to change the kernel.
>
> Oleg.
As an application developer, what I'd like to have is this: synchronously
generated signals are delivered before asynchronously generated ones.
That is, if a number of signals are generated but not yet delivered,
then the synchronously generated ones are delivered first. I guess, in
the kernel, this would mean that the private/non-private distinction is
not enough.
I think Chris is right in that the standard allows the current behaviour. I
would have quoted it already if I had thought otherwise ... He's also
right about quality of implementation. The standard doesn't say much
about synchronously and asynchronously generated signals, and it even
goes further:
"The order in which multiple, simultaneously pending signals outside the
range SIGRTMIN to SIGRTMAX are delivered to or accepted by a process is
unspecified."
Bleh. It also says that the context argument represents the context at
the time of delivery. For a handler for sigsegv or sigtrap (to name just
the most likely suspects), the context - in order for it to be useful
at all - must be from the time of generation. There is an obvious
contradiction here, and the only way I see to resolve it is to deliver
synchronously generated signals while the two contexts are the same,
which is exactly what allowing other signals to 'overtake' violates. Of
course, if sigsegv is blocked then this is impossible to do, but that's
fine.
For the application I'm working on this whole issue is kind of moot,
since kernels with the current behaviour will be around for ages to
come and I have the workaround of not pthread_kill()ing with a signal
whose signum is lower than the signum of any of the synchronously
generated signals. In practice, it only means that I'll use SIGUSR2
instead of SIGUSR1, because that's greater than SIGSEGV, and pay
attention not to pthread_kill() with SIGINT. The only thing that
worries me is this remark from Oleg
(http://marc.info/?l=linux-kernel&m=123711058421913&w=2):
"But please note that it is still possible to hit is_signal_blocked()
even with test_with_kill(), but the probability is very low."
I still can't see how that's possible, but if it is, then it defeats the
workaround and I have even bigger problems than I thought. Not only me,
I guess, most applications with a sigtrap, sigsegv handler that use the
context will be broken by a C-c.
Gabor
On 03/17, Gábor Melis wrote:
>
> On Tuesday 17 March 2009, Oleg Nesterov wrote:
> >
> > But this doesn't look very nice. So, perhaps we can do another
> > change?
> >
> > --- arch/x86/mm/fault.c
> > +++ arch/x86/mm/fault.c
> > @@ -177,6 +177,13 @@ static void force_sig_info_fault(int si_
> > {
> > siginfo_t info;
> >
> > + current->saved_sigmask = current->blocked;
> > + spin_lock_irq(&current->sighand->siglock);
> > + siginitsetinv(&current->blocked, sigmask(si_signo) |
> > + sigmask(SIGKILL) | sigmask(SIGSTOP));
> > + spin_unlock_irq(&current->sighand->siglock);
> > + set_restore_sigmask();
> > +
> > info.si_signo = si_signo;
> > info.si_errno = 0;
> > info.si_code = si_code;
> >
> > But this is a user-visible change, all signals will be blocked until
> > sigsegv_handler() returns. But with this change sigsegv_handler()
> > always has the "correct" rt_sigframe.
>
> As an application developer what I'd like to have is this: synchronously
> generated signals are delivered before asynchronously generated ones.
> That is, if a number of signals are generated but not yet delivered
> then the synchronously generated ones are delivered first. I guess, in
> the kernel this would mean that the private/non-private distinction is
> not enough.
With the change like above, no other signal (except SIGKILL) can be
delivered until the signal handler returns.
Probably it is better to just change force_sig_info(), in this case
SIGFPE/etc will have the same behaviour.
> The only thing that
> worries me is this remark from Oleg
> (http://marc.info/?l=linux-kernel&m=123711058421913&w=2):
>
> "But please note that it is still possible to hit is_signal_blocked()
> even with test_with_kill(), but the probability is very low."
Sorry for the confusion. Initially I misread the test_with_kill() case,
and then forgot to remove this part. I think this is not possible.
Oleg.
On Tuesday 17 March 2009, Oleg Nesterov wrote:
> On 03/17, Gábor Melis wrote:
> > On Tuesday 17 March 2009, Oleg Nesterov wrote:
> > > But this doesn't look very nice. So, perhaps we can do another
> > > change?
> > >
> > > --- arch/x86/mm/fault.c
> > > +++ arch/x86/mm/fault.c
> > > @@ -177,6 +177,13 @@ static void force_sig_info_fault(int si_
> > > {
> > > siginfo_t info;
> > >
> > > + current->saved_sigmask = current->blocked;
> > > + spin_lock_irq(&current->sighand->siglock);
> > > + siginitsetinv(&current->blocked, sigmask(si_signo) |
> > > + sigmask(SIGKILL) | sigmask(SIGSTOP));
> > > + spin_unlock_irq(&current->sighand->siglock);
> > > + set_restore_sigmask();
> > > +
> > > info.si_signo = si_signo;
> > > info.si_errno = 0;
> > > info.si_code = si_code;
> > >
> > > But this is a user-visible change, all signals will be blocked
> > > until sigsegv_handler() returns. But with this change
> > > sigsegv_handler() always has the "correct" rt_sigframe.
> >
> > As an application developer what I'd like to have is this:
> > synchronously generated signals are delivered before asynchronously
> > generated ones. That is, if a number of signals are generated but
> > not yet delivered then the synchronously generated ones are
> > delivered first. I guess, in the kernel this would mean that the
> > private/non-private distinction is not enough.
>
> With the change like above, no other signal (except SIGKILL) can be
> delivered until the signal handler returns.
Surely, you don't mean the above literally: it would violate the
standard to prevent all other signals from being delivered until the
sigsegv handler returns.
> Probably it is better to just change force_sig_info(), in this case
> SIGFPE/etc will have the same behaviour.
Indeed, uniformity seems preferable to me.
While we are at it, an interesting case is when a synchronously
generated signal and an asynchronously generated signal - one that is
also of the type that can be synchronously generated - are to be delivered.
Say we have a fault and a sigsegv generated, but some misguided soul
pthread_kill()s with sigtrap. In this case the sigsegv should be
delivered first, and the async sigtrap later.
> > The only thing that
> > worries me is this remark from Oleg
> > (http://marc.info/?l=linux-kernel&m=123711058421913&w=2):
> >
> > "But please note that it is still possible to hit
> > is_signal_blocked() even with test_with_kill(), but the probability
> > is very low."
>
> Sorry for confusion. Initially I misread test_with_kill() case, and
> then forgot to remove this part. I think this is not possible.
Thanks for the clarification.
> Oleg.
On 03/17, Gábor Melis wrote:
>
> On Tuesday 17 March 2009, Oleg Nesterov wrote:
> > On 03/17, Gábor Melis wrote:
> > > On Tuesday 17 March 2009, Oleg Nesterov wrote:
> > > > But this doesn't look very nice. So, perhaps we can do another
> > > > change?
> > > >
> > > > --- arch/x86/mm/fault.c
> > > > +++ arch/x86/mm/fault.c
> > > > @@ -177,6 +177,13 @@ static void force_sig_info_fault(int si_
> > > > {
> > > > siginfo_t info;
> > > >
> > > > + current->saved_sigmask = current->blocked;
> > > > + spin_lock_irq(&current->sighand->siglock);
> > > > + siginitsetinv(&current->blocked, sigmask(si_signo) |
> > > > + sigmask(SIGKILL) | sigmask(SIGSTOP));
> > > > + spin_unlock_irq(&current->sighand->siglock);
> > > > + set_restore_sigmask();
> > > > +
> > > > info.si_signo = si_signo;
> > > > info.si_errno = 0;
> > > > info.si_code = si_code;
> > > >
> > > > But this is a user-visible change, all signals will be blocked
> > > > until sigsegv_handler() returns. But with this change
> > > > sigsegv_handler() always has the "correct" rt_sigframe.
> > >
> > > As an application developer what I'd like to have is this:
> > > synchronously generated signals are delivered before asynchronously
> > > generated ones. That is, if a number of signals are generated but
> > > not yet delivered then the synchronously generated ones are
> > > delivered first. I guess, in the kernel this would mean that the
> > > private/non-private distinction is not enough.
> >
> > With the change like above, no other signal (except SIGKILL) can be
> > delivered until the signal handler returns.
>
> Surely, you don't mean the above literally:
I literally meant the above ;)
> it would violate the
> standard to prevent all other signals from being delivered until the
> sigsegv handler returns.
Yes. That is why I didn't send the patch but asked the question.
But, just in case, this will not happen if the signal was sent from
user-space via kill/tkill.
> While we are at it, an interesting case is when a synchronously
> generated signal and an asynchronously generated signal - that is also
> of the type that can be synchronously generated - are to be delivered.
> Say we have a fault and a sigsegv generated but some misguided soul
> pthread_kill()s with sigtrap. In this case the sigsegv shall be
> delivered first, and the async sigtrap later.
In this case we can do nothing. The second signal will be lost.
But this is not the problem. If sigsegv_handler() wants to play
with context, it should check SI_FROMKERNEL() first. If we lose
the SIGSEGV from the fault, it will be re-generated.
Oleg.
On Tue, 17 Mar 2009, Gábor Melis wrote:
>
> As an application developer what I'd like to have is this: synchronously
> generated signals are delivered before asynchronously generated ones.
I agree that it would be nice, but quite frankly, it's simply not how
signals work. It would be a reasonably invasive change, and you wouldn't
really be able to rely on it anyway since most kernels don't work that
way.
What you might be able to do instead is to walk signal frames backwards by
hand. IOW, accept the fact that sometimes signals end up being nested, but
then you could try to find the right frame by just looking at them.
And your trick of comparing 'info->si_ip' with 'context->uc_mcontext->ip'
is pretty good, and lets the code itself walk the signal frames by just
depending on the fault happening again.
Linus
From: Linus Torvalds <[email protected]>
Date: Tue, 17 Mar 2009 08:56:34 -0700 (PDT)
> What you might be able to do instead is to walk signal frames backwards by
> hand. IOW, accept the fact that sometimes signals end up being nested, but
> then you could try to find the right frame by just looking at them.
FWIW, GCC's dwarf2 unwind handlers already know how to walk through
Linux signal frames and identify them.
> First of all, perhaps I missed something and this is solvable without
> kernel changes, but I can't see how.
It depends what kind of "solved" you mean.
Signals pending for the thread are always delivered before signals pending
for the process. POSIX does not guarantee this to the application, but it
has always been so in Linux and it's fine enough to rely on that. Truly
externally-generated and asynchronous signals go to the process, so it's
really only pthread_kill use within your own program that raises the issue.
Among signals pending for the thread, signals < SIGRTMIN are always
delivered before ones >= SIGRTMIN. POSIX does not guarantee this to the
application, but it has always been so in Linux and it's fine enough to
rely on that. The most sensible thing to use with pthread_kill is some
SIGRTMIN+n signal anyway, since they are never confused with any other use.
If your program is doing that, you don't have a problem.
So on the one hand it seems pretty reasonable to say it's "solved" by
accepting it when we say, "Welcome to Unix, these things should have
stopped surprising you in the 1980s." It's a strange pitfall of how
everything fits together, granted. But you do sort of have to make an
effort to do things screwily before you can fall into it.
All that said, it's actually probably a pretty easy hack to arrange that
the signal posted by force_sig_info is the first one dequeued in all but
the most utterly strange situations.
Thanks,
Roland
On Wednesday 18 March 2009, Roland McGrath wrote:
> > First of all, perhaps I missed something and this is solvable
> > without kernel changes, but I can't see how.
>
> It depends what kind of "solved" you mean.
>
> Signals pending for the thread are always delivered before signals
> pending for the process. POSIX does not guarantee this to the
> application, but it has always been so in Linux and it's fine enough
> to rely on that. Truly externally-generated and asynchronous signals
> go to the process, so it's really only pthread_kill use within your
> own program that raises the issue.
>
> Among signals pending for the thread, signals < SIGRTMIN are always
> delivered before ones >= SIGRTMIN. POSIX does not guarantee this to
> the application, but it has always been so in Linux and it's fine
> enough to rely on that. The most sensible thing to use with
> pthread_kill is some SIGRTMIN+n signal anyway, since they are never
> confused with any other use. If your program is doing that, you don't
> have a problem.
It was just a month or so ago that I finally made the change to use a
non-real-time signal for signalling stop-for-gc. It was motivated by
the fact that even with rt signals there needs to be a fallback
mechanism for when the rt signal queue overflows. Another reason was
that _different processes_ could interfere with each other: if one
filled the queue the other processes would hang too (there was no
fallback mechanism implemented). From this behaviour, it seemed that
the rt signal queue was global. Attached is a test program that
reproduces this.
$ gcc -lpthread rt-signal-queue-overflow.c
$ (./a.out &); sleep 1; ./a.out
pthread_kill returned EAGAIN, errno=0, count=24566
pthread_kill returned EAGAIN, errno=0, count=0
There are two notable things here. The first is that pthread_kill
returns EAGAIN, which is not mentioned in the man page, and does not
set errno. The other is that the first process filled the rt signal
queue and the second one could not send a single signal successfully.
Granted, without a fallback mechanism my app deserved to lose. However,
it seemed to me that there were other programs lacking in this regard
on my desktop as I managed to hang a few of them.
Even though within my app I could have guaranteed that the number of
pending rt signals stayed below a reasonable limit, there was no way to
defend against other processes filling up the queue, so I had to
implement a fallback mechanism that used non-rt signals (changing a few
other things as well), and when that was done, there was no reason to
keep the rt signal based one around.
Consider this another quality-of-implementation report.
> So on the one hand it seems pretty reasonable to say it's "solved" by
> accepting it when we say, "Welcome to Unix, these things should have
> stopped surprising you in the 1980s." It's a strange pitfall of how
> everything fits together, granted. But you do sort of have to make
> an effort to do things screwily before you can fall into it.
>
> All that said, it's actually probably a pretty easy hack to arrange
> that the signal posted by force_sig_info is the first one dequeued in
> all but the most utterly strange situations.
>
>
> Thanks,
> Roland
On Tuesday 17 March 2009, Linus Torvalds wrote:
> On Tue, 17 Mar 2009, Gábor Melis wrote:
> > As an application developer what I'd like to have is this:
> > synchronously generated signals are delivered before asynchronously
> > generated ones.
>
> I agree that it would be nice, but quite frankly, it's simply not how
> signals work. It would be a reasonably invasive change, and you
> wouldn't really be able to rely on it anyway since most kernels don't
> work that way.
True. I won't be able to rely on this and I'll stick to the SIGUSR2
workaround that's confirmed by Oleg.
> What you might be able to do instead is to walk signal frames
> backwards by hand. IOW, accept the fact that sometimes signals end up
> being nested, but then you could try to find the right frame by just
> looking at them.
Walking the frames is not enough, because even if the right one is
found, I can't put a new frame on it if it's not at the top ...
> And your trick of comparing 'info->si_ip' with
> 'context->uc_mcontext->ip' is pretty good, and lets the code itself
> walk the signal frames by just depending on the fault happening
> again.
Another example. Suppose there is a stack with a mprotected guard page
at the end. The app's stack grows into the guard page, sigsegv is
generated, its handler would be invoked, but a pthread_kill'ed SIGUSR1
gets delivered first. Now the SIGUSR1 handler accesses the stack and
triggers another fault, the sigsegv handler sees that si_ip ==
uc_mcontext->ip, so it unprotects the page and puts a frame on the
stack, arranging for a function to be called. Then the function
deadlocks because it waits for a signal that's blocked in the SIGUSR1
handler.
I think it would be a definite improvement to prevent all these
headaches from occurring and deliver asynchronously generated, thread
private signals after the synchronous ones.
> Linus
On Wed, 18 Mar 2009, Gábor Melis wrote:
>
> It was just a month or so ago that I finally made the change to use a
> non-real-time signal for signalling stop-for-gc.
Ahh, and that is when you noticed the issue with SIGSEGV?
One thing you might try is to still use a non-real-time signal, but simply
depend on the fact that signals are basically peeled off the signal
bitmask in a signal number order (with the exception that fatal signals
are handled specially).
IOW, on x86, just about the _only_ normal user-generated signal that can
happen before SIGSEGV is SIGUSR1, because SIGSEGV is 11, and SIGUSR1 is
10.
If you were to use SIGUSR2 (signal #12) you'd probably never see this.
Of course, there are other signals with numbers < 11, but they are
generally meant for other synchronous traps (ie signals like
SIGTRAP/SIGABRT/SIGFPE/SIGBUS), or for killing the process (signals
SIGHUP/SIGINT/SIGQUIT).
So maybe you'd be happy with just picking another signal? It's not a
_pretty_ solution, but it should work across most kernel versions.
Linus
On Wednesday 18 March 2009, Linus Torvalds wrote:
> On Wed, 18 Mar 2009, Gábor Melis wrote:
> > It was just a month or so ago that I finally made the change to use
> > a non-real-time signal for signalling stop-for-gc.
>
> Ahh, and that is when you noticed the issue with SIGSEGV?
Unfortunately not immediately, because another change was needed to
make context-frobbing sigsegvs frequent enough for this to be noticed ...
> One thing you might try is to still use a non-real-time signal, but
> simply depend on the fact that signals are basically peeled off the
> signal bitmask in a signal number order (with the exception that
> fatal signals are handled specially).
>
> IOW, on x86, just about the _only_ normal user-generated signal that
> can happen before SIGSEGV is SIGUSR1, because SIGSEGV is 11, and
> SIGUSR1 is 10.
>
> If you were to use SIGUSR2 (signal #12) you'd probably never see
> this.
>
> Of course, there are other signals with numbers < 11, but they are
> generally meant for other synchronous traps (ie signals like
> SIGTRAP/SIGABRT/SIGFPE/SIGBUS), or for killing the process (signals
> SIGHUP/SIGINT/SIGQUIT).
>
> So maybe you'd be happy with just picking another signal? It's not a
> _pretty_ solution, but it should work across most kernel versions.
So I did already. This is the workaround that I referred to before:
"... I'll stick to the SIGUSR2 workaround that's confirmed by Oleg."
So I'm happy enough that it's fixed in my app, but considering how hard
it was to figure out what's going on, I thought the kernel people may
be interested.
> Linus