cf http://bugzilla.kernel.org/show_bug.cgi?id=9347
While a signal is blocked, it must be posted even if its action is
SIG_IGN or is SIG_DFL with the default action to ignore. This works
right most of the time, but is broken when a sigwait (rt_sigtimedwait)
is in progress. This changes the early-discard check to respect
real_blocked. ~blocked is the set to check for "should wake up now",
but ~(blocked|real_blocked) is the set for "blocked" semantics as
defined by POSIX.
Signed-off-by: Roland McGrath <[email protected]>
---
kernel/signal.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 0ac614a..9b22790 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -55,7 +55,7 @@ static int sig_ignored(struct task_struct *t, int sig)
* signal handler may change by the time it is
* unblocked.
*/
- if (sigismember(&t->blocked, sig))
+ if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
return 0;
/* Is it explicitly or implicitly ignored? */
On 11/12, Roland McGrath wrote:
>
> cf http://bugzilla.kernel.org/show_bug.cgi?id=9347
>
> While a signal is blocked, it must be posted even if its action is
> SIG_IGN or is SIG_DFL with the default action to ignore. This works
> right most of the time, but is broken when a sigwait (rt_sigtimedwait)
> is in progress. This changes the early-discard check to respect
> real_blocked. ~blocked is the set to check for "should wake up now",
> but ~(blocked|real_blocked) is the set for "blocked" semantics as
> defined by POSIX.
>
> Signed-off-by: Roland McGrath <[email protected]>
> ---
> kernel/signal.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 0ac614a..9b22790 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -55,7 +55,7 @@ static int sig_ignored(struct task_struct *t, int sig)
> * signal handler may change by the time it is
> * unblocked.
> */
> - if (sigismember(&t->blocked, sig))
> + if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
> return 0;
I believe the patch is correct and fixes the 9347 bug.
But I suspect we have other issues here. Let's suppose we have threads T1 (main)
and T2. T2 blocks SIGCHLD and does sigwait(SIGCHLD).
Now, we send SIGCHLD to the thread group. The signal is lost again because
sig_ignored() returns true on T1's side.
Is this OK? For example, let's modify the test-case:
void *tfunc(void *arg)
{
sleep(1);
if (!fork())
exit(0);
for (;;)
pause();
}
int main(void)
{
pthread_t pt;
sigset_t set;
int signum;
pthread_create(&pt, 0, tfunc, 0);
sigemptyset(&set);
sigaddset(&set, SIGCHLD);
sigprocmask(SIG_BLOCK, &set, NULL);
sigwait(&set, &signum);
return 0;
}
I bet sigwait() will hang again.
I think __group_send_sig_info() shouldn't use sig_ignored() at all. Instead,
we should split __group_complete_signal() into 2 functions. The first one, say,
choose_target() shoud be used instead. The second one, handle_fatal_signal()
can also be used by tkill() and friends.
What do you think?
Actually, I was going to do this cleanup a long ago, but can't find the time
to try to do this.
Oleg.
> But I suspect we have other issues here. Let's suppose we have threads T1
> (main) and T2. T2 blocks SIGCHLD and does sigwait(SIGCHLD).
>
> Now, we send SIGCHLD to the thread group. The signal is lost again because
> sig_ignored() returns true on T1's side.
>
> Is this OK? [...]
Yes, it's OK if T1 has SIGCHLD unblocked. When there are multiple threads
that either don't block the signal or are in sigwait for it, then it can go
to any of them and there are no guarantees at all about which. So we
simply say that the signal went to the thread not in sigwait that has that
signal unblocked (T1). When it got there, it was ignored. The user
semantics are equivalent even if that thread never actually woke and
dequeued the signal to ignore it.
> I think __group_send_sig_info() shouldn't use sig_ignored() at
> all. Instead, we should split __group_complete_signal() into 2
> functions. The first one, say, choose_target() shoud be used instead. The
> second one, handle_fatal_signal() can also be used by tkill() and
> friends.
I will probably like any reorganization of the code that makes it cleaner.
But we don't need to change the behavior in this regard. The most common
path is __group_send_sig_info (from sys_kill), so it would be a shame to
remove the short-circuit optimization there. In fact, it is probably wrong
semantics to do so (for what applications expect, and possibly for POSIX
conformance). That is, an unblocked, ignored (or default-ignore) signal
should not set TIF_SIGPENDING and interrupt a system call in progress or
just about to start. Obviously this is not a huge problem, since that's
what happens under any debugger. But still, an undesireable change of
semantics as well as de-optimization, I'd say.
Thanks,
Roland
On 11/13, Roland McGrath wrote:
>
> > But I suspect we have other issues here. Let's suppose we have threads T1
> > (main) and T2. T2 blocks SIGCHLD and does sigwait(SIGCHLD).
> >
> > Now, we send SIGCHLD to the thread group. The signal is lost again because
> > sig_ignored() returns true on T1's side.
> >
> > Is this OK? [...]
>
> Yes, it's OK if T1 has SIGCHLD unblocked. When there are multiple threads
> that either don't block the signal or are in sigwait for it, then it can go
> to any of them and there are no guarantees at all about which. So we
> simply say that the signal went to the thread not in sigwait that has that
> signal unblocked (T1). When it got there, it was ignored. The user
> semantics are equivalent even if that thread never actually woke and
> dequeued the signal to ignore it.
Yes.
I misunderstood the required semantics for sigwait(), thanks Roland.
Oleg.