2009-06-02 12:55:02

by Jiri Slaby

Subject: Re: [PATCH 1/1] signal: make group kill signal fatal

On 05/26/2009 12:51 AM, Oleg Nesterov wrote:
> Heh. In this case you have another (long-standing) issue, please note
> the "if (p->flags & PF_EXITING)" check in wants_signal().
>
> There is no guarantee the signal will wake up the exiting task.
> Even SIGKILL, even if you use wait_event_interruptible() instead of
> _killable.

One last question: doesn't wait_event_interruptible() return immediately in
this case? signal_pending() returns true because of the uncaught signal that
killed the application, and that kill is why we are in .release under these
special circumstances. I don't think this is expected behavior, is
it? Shouldn't that signal be dequeued/cleared instead?


2009-06-02 14:55:44

by Oleg Nesterov

Subject: Re: [PATCH 1/1] signal: make group kill signal fatal

On 06/02, Jiri Slaby wrote:
>
> On 05/26/2009 12:51 AM, Oleg Nesterov wrote:
> > Heh. In this case you have another (long-standing) issue, please note
> > the "if (p->flags & PF_EXITING)" check in wants_signal().
> >
> > There is no guarantee the signal will wake up the exiting task.
> > Even SIGKILL, even if you use wait_event_interruptible() instead of
> > _killable.
>
> One last question: doesn't wait_event_interruptible() return immediately in
> this case? signal_pending() returns true

Yes, if a thread exits with a signal pending, then of course an interruptible
wait doesn't work.

> because of the uncaught signal that
> killed the application

This signal may already have been dequeued, but not necessarily. Most probably,
when the process is killed by a group-wide fatal signal, each thread will
exit with the shared signal pending.

> I don't think this is expected behavior, is
> it?

Well. I don't know. I'd say this is expected, but I don't think this
was specifically designed ;)

> Shouldn't that signal be dequeued/cleared instead?

We can't. Think about a multithreaded program. Some thread exits,
and we have a shared signal. We must not dequeue it.
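The point about shared signals can be illustrated with a small user-space model. This is only a sketch and not kernel code: struct toy_group, toy_signal_visible() and toy_dequeue_shared() are hypothetical names, and pending sets are reduced to plain bitmasks. It shows that once an exiting thread removes a signal from the shared set, no other thread in the group can see it any more.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

#define TOY_NTHREADS 2

/* Toy model (assumption): one shared pending set per thread group,
 * plus one private pending set per thread. */
struct toy_group {
	unsigned long shared_pending;
	unsigned long private_pending[TOY_NTHREADS];
};

static unsigned long toy_sigmask(int sig)
{
	return 1UL << (sig - 1);
}

/* A thread sees a signal if it is in its private set or in the shared set. */
static bool toy_signal_visible(const struct toy_group *g, int thread, int sig)
{
	unsigned long pending = g->shared_pending | g->private_pending[thread];

	return (pending & toy_sigmask(sig)) != 0;
}

/* What an exiting thread would do if it "cleaned up" the shared signal. */
static void toy_dequeue_shared(struct toy_group *g, int sig)
{
	g->shared_pending &= ~toy_sigmask(sig);
}
```

In this model, if thread 0 exits and calls toy_dequeue_shared(), thread 1 loses a signal that was addressed to the whole group, which is why the exiting thread must leave the shared queue alone.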

We can clear TIF_SIGPENDING, and we can change recalc_sigpending_xxx()
to take PF_EXITING into account (or change their callers), but this
needs changes. And I am not sure this would be right.

Oleg.

2009-06-03 02:18:17

by Roland McGrath

Subject: Re: [PATCH 1/1] signal: make group kill signal fatal

> > > Heh. In this case you have another (long-standing) issue, please note
> > > the "if (p->flags & PF_EXITING)" check in wants_signal().

Hmm. wants_signal():

	if (p->flags & PF_EXITING)
		return 0;
	if (sig == SIGKILL)
		return 1;

Perhaps we should reverse the order of those two?
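To make the ordering question concrete, here is a minimal user-space sketch, not the kernel function. The toy_* names and flag values are hypothetical, and every other check in the real wants_signal() is reduced to a single "stopped" test as an assumption. It only shows how swapping the two checks changes the answer for an exiting task.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Illustrative flag values only; they are not the kernel's. */
#define TOY_PF_EXITING 0x1u
#define TOY_STOPPED    0x2u

struct toy_task {
	unsigned int flags;
};

/* Order as quoted: the PF_EXITING check comes first, so an exiting
 * task never wants the signal, not even SIGKILL. */
static bool toy_wants_signal_current(int sig, const struct toy_task *p)
{
	if (p->flags & TOY_PF_EXITING)
		return false;
	if (sig == SIGKILL)
		return true;
	return !(p->flags & TOY_STOPPED);
}

/* Reversed order: SIGKILL is accepted even by an exiting task. */
static bool toy_wants_signal_reordered(int sig, const struct toy_task *p)
{
	if (sig == SIGKILL)
		return true;
	if (p->flags & TOY_PF_EXITING)
		return false;
	return !(p->flags & TOY_STOPPED);
}
```

For a task with only TOY_PF_EXITING set, the first variant rejects SIGKILL and the second accepts it. Both variants agree for every other signal and for non-exiting tasks.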

But also I'm now reminded that complete_signal() short-circuits for the
single-threaded case and never does the sig_fatal() case.

This means a single-threaded process will have SIGKILL in shared_pending
but not in its own pending so __fatal_signal_pending() will be false, no?

I'm also now wondering if in some of our recent signals discussions we have
been assuming that SIGNAL_GROUP_EXIT is set when a fatal signal is pending.
We might be leaving some other unintended hole since that's not really true.

Probably we should just fiddle complete_signal() to do that stuff for the
single-threaded case too. (That obviates the wants_signal change above.)

> Yes, if a thread exits with a signal pending, then of course an interruptible
> wait doesn't work.

Along the same lines of the recent core dump discussion, I think it would
be proper to fix this so TIF_SIGPENDING isn't left set (nor is newly set)
by a signal that won't affect it later.

> We can clear TIF_SIGPENDING, and we can change recalc_sigpending_xxx()
> to take PF_EXITING into account (or change their callers), but this
> needs changes. And I am not sure this would be right.

I think we want recalc_sigpending_tsk to be consistent with wants_signal
and the other conditions controlling signal_wake_up calls. But indeed we
need to think through any ramifications carefully.


Thanks,
Roland

2009-06-04 02:32:32

by Oleg Nesterov

Subject: Re: [PATCH 1/1] signal: make group kill signal fatal

On 06/02, Roland McGrath wrote:
>
> > > > Heh. In this case you have another (long-standing) issue, please note
> > > > the "if (p->flags & PF_EXITING)" check in wants_signal().
>
> Hmm. wants_signal():
>
> 	if (p->flags & PF_EXITING)
> 		return 0;
> 	if (sig == SIGKILL)
> 		return 1;
>
> Perhaps we should reverse the order of those two?

Yes perhaps. But afaics this is not enough.

First of all, we should decide what we really want wrt exiting process/thread
&& signals. (see also the end of message).

Let's suppose the killed/exiting process hangs somewhere in close_files(),
and the user wants to SIGKILL it via kill(1).

If this process is multithreaded, how can we find the right thread to
wake up? Or should we assume the user has to find the offending thread
and use tkill()? In that case, what if this thread still has a pending
private SIGKILL?

Of course, there is the same problem with a pending shared SIGKILL: it is
never dequeued, so the next group-wide SIGKILL has no effect.

> But also I'm now reminded that complete_signal() short-circuits for the
> single-threaded case and never does the sig_fatal() case.
>
> This means a single-threaded process will have SIGKILL in shared_pending
> but not in its own pending so __fatal_signal_pending() will be false, no?

Hmm, afaics no. Or I misunderstood. Or I missed something.

Yes, it is possible that we add SIGKILL to shared_pending and do not add
it to ->pending, but this can only happen if all threads have PF_EXITING
(so "single-threaded" above doesn't matter).

> I'm also now wondering if in some of our recent signals discussions we have
> been assuming that SIGNAL_GROUP_EXIT is set when a fatal signal is pending.

Yes. SIGNAL_GROUP_EXIT == all threads have the pending private SIGKILL.
Except that in the do_exit() path it may already have been dequeued.

> > We can clear TIF_SIGPENDING, and we can change recalc_sigpending_xxx()
> > to take PF_EXITING into account (or change their callers), but this
> > needs changes. And I am not sure this would be right.
>
> I think we want recalc_sigpending_tsk to be consistent with wants_signal
> and the other conditions controlling signal_wake_up calls.

Well, perhaps. But let's look at it from a different angle. If the task was
already SIGKILL'ed, it looks a bit insane that we need another SIGKILL to
really kill it if it hangs in do_exit().

Perhaps we need another flag, SIGNAL_GROUP_KILLED or whatever which is
set along with SIGNAL_GROUP_EXIT by complete_signal() when the task is
killed. It is not set by zap_other_threads/etc.

Now, exit_signals() should do something like

	if (SIGNAL_GROUP_KILLED) {
		// make sure interruptible/killable sleep is not
		// possible, we are already killed
		set_thread_flag(TIF_SIGPENDING);
	} else {
		// OK, we still respect SIGKILL
		clear_thread_flag(TIF_SIGPENDING);
	}

Of course we need other changes. complete_signal() should check
SIGNAL_GROUP_KILLED, not SIGNAL_GROUP_EXIT, and wake up all threads.
recalc_sigpending_tsk() needs changes, __fatal_signal_pending()
should be consistent with SIGNAL_GROUP_KILLED on exiting, etc.
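The intended behaviour of this proposal can be sketched as a user-space model. Everything here is an assumption for illustration: SIGNAL_GROUP_KILLED is only the flag suggested in this message, and the toy_* types and functions are hypothetical stand-ins, not kernel code. The model distinguishes a group that was killed by a fatal signal from a group that is exiting for another reason.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; TOY_SIGNAL_GROUP_KILLED models the
 * hypothetical flag proposed above. */
#define TOY_SIGNAL_GROUP_EXIT   0x1u
#define TOY_SIGNAL_GROUP_KILLED 0x2u

struct toy_signal {
	unsigned int flags;
};

struct toy_thread {
	struct toy_signal *signal;
	bool sigpending;	/* stands in for TIF_SIGPENDING */
};

/* Fatal signal delivered: both flags are set, as proposed for
 * complete_signal(). */
static void toy_complete_signal_fatal(struct toy_signal *s)
{
	s->flags |= TOY_SIGNAL_GROUP_EXIT | TOY_SIGNAL_GROUP_KILLED;
}

/* Group exit that was not caused by a kill: only GROUP_EXIT is set. */
static void toy_group_exit_voluntary(struct toy_signal *s)
{
	s->flags |= TOY_SIGNAL_GROUP_EXIT;
}

/* Proposed exit_signals() step: a killed task keeps the pending flag so
 * it cannot sleep interruptibly; otherwise the flag is cleared so a
 * later SIGKILL can still wake the task. */
static void toy_exit_signals(struct toy_thread *t)
{
	t->sigpending = (t->signal->flags & TOY_SIGNAL_GROUP_KILLED) != 0;
}
```

In this model a killed task leaves toy_exit_signals() with sigpending set, while a task from a voluntarily exiting group leaves with it cleared, regardless of its previous value.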

Note also that complete_signal() does signal_wake_up(t, sig == SIGKILL)
even if SIGNAL_GROUP_EXIT is set, so we should be careful.

> But indeed we
> need to think through any ramifications carefully.

Agreed. And yes, this is connected to the coredump discussion.

Oleg.