2021-11-01 03:46:11

by Kyle Huey

[permalink] [raw]
Subject: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

rr, a userspace record and replay debugger[0], uses the recorded register
state at PTRACE_EVENT_EXIT to find the point in time at which to cease
executing the program during replay.

If a SIGKILL races with processing another signal in get_signal, it is
possible for the kernel to decline to notify the tracer of the original
signal. But if the original signal had a handler, the kernel proceeds
with setting up a signal handler frame as if the tracer had chosen to
deliver the signal unmodified to the tracee. When the kernel goes to
execute the signal handler that it has now modified the stack and registers
for, it will discover the pending SIGKILL, and terminate the tracee
without executing the handler. When PTRACE_EVENT_EXIT is delivered to
the tracer, however, the effects of handler setup will be visible to
the tracer.

Because rr (the tracer) was never notified of the signal, it is not aware
that a signal handler frame was set up and expects the state of the program
at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
by allowing the program to execute from the last event. When that fails
to happen during replay, rr will assert and die.

The following patches add an explicit check for a newly pending SIGKILL
after the ptracer has been notified and the siglock has been reacquired.
If this happens, we stop processing the current signal and proceed
immediately to handling the SIGKILL. This makes the state reported at
PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
work to set up a signal handler frame that will never be used.

This issue was originally reported by the credited rr user.

[0] https://rr-project.org/



2021-11-02 14:12:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

Kyle Huey <[email protected]> writes:

> rr, a userspace record and replay debugger[0], uses the recorded register
> state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> executing the program during replay.
>
> If a SIGKILL races with processing another signal in get_signal, it is
> possible for the kernel to decline to notify the tracer of the original
> signal. But if the original signal had a handler, the kernel proceeds
> with setting up a signal handler frame as if the tracer had chosen to
> deliver the signal unmodified to the tracee. When the kernel goes to
> execute the signal handler that it has now modified the stack and registers
> for, it will discover the pending SIGKILL, and terminate the tracee
> without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> the tracer, however, the effects of handler setup will be visible to
> the tracer.
>
> Because rr (the tracer) was never notified of the signal, it is not aware
> that a signal handler frame was set up and expects the state of the program
> at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> by allowing the program to execute from the last event. When that fails
> to happen during replay, rr will assert and die.
>
> The following patches add an explicit check for a newly pending SIGKILL
> after the ptracer has been notified and the siglock has been reacquired.
> If this happens, we stop processing the current signal and proceed
> immediately to handling the SIGKILL. This makes the state reported at
> PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> work to set up a signal handler frame that will never be used.
>
> This issue was originally reported by the credited rr user.
>
> [0] https://rr-project.org/

If I read this correctly the problem is not precisely that the rr
debugger is never notified about the signal, but rather that the program
is killed with SIGKILL before rr can read the notification and see which
signal it is.

This definitely sounds like a quality of implementation issue.

The solution that is proposed in your patches simply drops the signal
when SIGKILL is pending.

I think we can have a slightly better of quality of implementation
than that (as well as a simpler implementation) by requeuing the
signal instead of simply dropping it. Something like the below.

Can you test that and see if it works for you?

Eric

diff --git a/kernel/signal.c b/kernel/signal.c
index 056a107e3cbc..0dff366b9129 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
}

/* If the (new) signal is now blocked, requeue it. */
- if (sigismember(&current->blocked, signr)) {
+ if (sigismember(&current->blocked, signr) ||
+ signal_group_exit(current->signal)) {
send_signal(signr, info, current, PIDTYPE_PID);
signr = 0;
}

2021-11-02 16:18:17

by Kyle Huey

[permalink] [raw]
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman <[email protected]> wrote:
>
> Kyle Huey <[email protected]> writes:
>
> > rr, a userspace record and replay debugger[0], uses the recorded register
> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> > executing the program during replay.
> >
> > If a SIGKILL races with processing another signal in get_signal, it is
> > possible for the kernel to decline to notify the tracer of the original
> > signal. But if the original signal had a handler, the kernel proceeds
> > with setting up a signal handler frame as if the tracer had chosen to
> > deliver the signal unmodified to the tracee. When the kernel goes to
> > execute the signal handler that it has now modified the stack and registers
> > for, it will discover the pending SIGKILL, and terminate the tracee
> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> > the tracer, however, the effects of handler setup will be visible to
> > the tracer.
> >
> > Because rr (the tracer) was never notified of the signal, it is not aware
> > that a signal handler frame was set up and expects the state of the program
> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> > by allowing the program to execute from the last event. When that fails
> > to happen during replay, rr will assert and die.
> >
> > The following patches add an explicit check for a newly pending SIGKILL
> > after the ptracer has been notified and the siglock has been reacquired.
> > If this happens, we stop processing the current signal and proceed
> > immediately to handling the SIGKILL. This makes the state reported at
> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> > work to set up a signal handler frame that will never be used.
> >
> > This issue was originally reported by the credited rr user.
> >
> > [0] https://rr-project.org/
>
> If I read this correctly the problem is not precisely that the rr
> debugger is never notified about the signal, but rather that the program
> is killed with SIGKILL before rr can read the notification and see which
> signal it is.

The precise problem is that the kernel made a modification to the
tracee state (setting up the signal handler frame) without telling the
tracer about it (delivering the ptrace notification for the pending
non-SIGKILL signal). That can be fixed either by not modifying the
tracee state here or by telling the tracer about the signal (that will
never actually run). I suspect we'll all agree that the former seems
preferable.

> This definitely sounds like a quality of implementation issue.
>
> The solution that is proposed in your patches simply drops the signal
> when SIGKILL is pending.

That's right.

> I think we can have a slightly better of quality of implementation
> than that (as well as a simpler implementation) by requeuing the
> signal instead of simply dropping it. Something like the below.

What is the benefit of requeueing the signal? All pending signals will
be dropped when the SIGKILL is processed, no?

> Can you test that and see if it works for you?

It does not work. This triggers an infinite loop in get_signal, as we
dequeue the signal, attempt to notify the ptracer, see the pending
sigkill, requeue the signal, go around the loop, dequeue the original
signal ...

- Kyle

> Eric
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 056a107e3cbc..0dff366b9129 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
> }
>
> /* If the (new) signal is now blocked, requeue it. */
> - if (sigismember(&current->blocked, signr)) {
> + if (sigismember(&current->blocked, signr) ||
> + signal_group_exit(current->signal)) {
> send_signal(signr, info, current, PIDTYPE_PID);
> signr = 0;
> }
>

2021-11-02 18:09:25

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

Kyle Huey <[email protected]> writes:

> On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman <[email protected]> wrote:
>>
>> Kyle Huey <[email protected]> writes:
>>
>> > rr, a userspace record and replay debugger[0], uses the recorded register
>> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
>> > executing the program during replay.
>> >
>> > If a SIGKILL races with processing another signal in get_signal, it is
>> > possible for the kernel to decline to notify the tracer of the original
>> > signal. But if the original signal had a handler, the kernel proceeds
>> > with setting up a signal handler frame as if the tracer had chosen to
>> > deliver the signal unmodified to the tracee. When the kernel goes to
>> > execute the signal handler that it has now modified the stack and registers
>> > for, it will discover the pending SIGKILL, and terminate the tracee
>> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
>> > the tracer, however, the effects of handler setup will be visible to
>> > the tracer.
>> >
>> > Because rr (the tracer) was never notified of the signal, it is not aware
>> > that a signal handler frame was set up and expects the state of the program
>> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
>> > by allowing the program to execute from the last event. When that fails
>> > to happen during replay, rr will assert and die.
>> >
>> > The following patches add an explicit check for a newly pending SIGKILL
>> > after the ptracer has been notified and the siglock has been reacquired.
>> > If this happens, we stop processing the current signal and proceed
>> > immediately to handling the SIGKILL. This makes the state reported at
>> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
>> > work to set up a signal handler frame that will never be used.
>> >
>> > This issue was originally reported by the credited rr user.
>> >
>> > [0] https://rr-project.org/
>>
>> If I read this correctly the problem is not precisely that the rr
>> debugger is never notified about the signal, but rather that the program
>> is killed with SIGKILL before rr can read the notification and see which
>> signal it is.
>
> The precise problem is that the kernel made a modification to the
> tracee state (setting up the signal handler frame) without telling the
> tracer about it (delivering the ptrace notification for the pending
> non-SIGKILL signal).

Except the kernel did make it to ptrace_stop. The stop just did not
fully happen because of SIGKILL. I expect SIGCHLD was sent to the
tracer as part of that stop that never fully happened.

> That can be fixed either by not modifying the
> tracee state here or by telling the tracer about the signal (that will
> never actually run). I suspect we'll all agree that the former seems
> preferable.
>
>> This definitely sounds like a quality of implementation issue.
>>
>> The solution that is proposed in your patches simply drops the signal
>> when SIGKILL is pending.
>
> That's right.
>
>> I think we can have a slightly better of quality of implementation
>> than that (as well as a simpler implementation) by requeuing the
>> signal instead of simply dropping it. Something like the below.
>
> What is the benefit of requeueing the signal? All pending signals will
> be dropped when the SIGKILL is processed, no?

Not before PTRACE_EVENT_EXIT. In fact the pending signals are not
actually flushed until the thread or the entire process is reaped.

Further the coredump code makes some attempt to write out the
pending signals. The code appears to predate siginfo support
in the kernel so it misses a lot but it is there.

The real advantage is that it keeps the logic of dealing with weird
ptrace_stop logic in ptrace_signal where it belongs. It also allows the
common (and missing in this case) idiom of goto relock to be used.

So I think changing ptrace_signal will be much more maintainable.

>> Can you test that and see if it works for you?
>
> It does not work. This triggers an infinite loop in get_signal, as we
> dequeue the signal, attempt to notify the ptracer, see the pending
> sigkill, requeue the signal, go around the loop, dequeue the original
> signal ...

Apologies I made a bit of a thinko. That change also needs to change
the handling of if (signr == 0) after ptrace_signal.

Which means it would need to be something like the below.

diff --git a/kernel/signal.c b/kernel/signal.c
index 056a107e3cbc..eddb745b34a7 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
}

/* If the (new) signal is now blocked, requeue it. */
- if (sigismember(&current->blocked, signr)) {
+ if (sigismember(&current->blocked, signr) ||
+ signal_group_exit(current->signal)) {
send_signal(signr, info, current, PIDTYPE_PID);
signr = 0;
}
@@ -2764,8 +2765,10 @@ bool get_signal(struct ksignal *ksig)
if (unlikely(current->ptrace) && (signr != SIGKILL) &&
!(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
signr = ptrace_signal(signr, &ksig->info);
- if (!signr)
- continue;
+ if (!signr) {
+ spin_unlock_irq(&sighand->siglock);
+ goto relock;
+ }
}

ka = &sighand->action[signr-1];

Eric

2021-11-02 19:15:10

by Kyle Huey

[permalink] [raw]
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

On Tue, Nov 2, 2021 at 11:07 AM Eric W. Biederman <[email protected]> wrote:
>
> Kyle Huey <[email protected]> writes:
>
> > On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman <[email protected]> wrote:
> >>
> >> Kyle Huey <[email protected]> writes:
> >>
> >> > rr, a userspace record and replay debugger[0], uses the recorded register
> >> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> >> > executing the program during replay.
> >> >
> >> > If a SIGKILL races with processing another signal in get_signal, it is
> >> > possible for the kernel to decline to notify the tracer of the original
> >> > signal. But if the original signal had a handler, the kernel proceeds
> >> > with setting up a signal handler frame as if the tracer had chosen to
> >> > deliver the signal unmodified to the tracee. When the kernel goes to
> >> > execute the signal handler that it has now modified the stack and registers
> >> > for, it will discover the pending SIGKILL, and terminate the tracee
> >> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> >> > the tracer, however, the effects of handler setup will be visible to
> >> > the tracer.
> >> >
> >> > Because rr (the tracer) was never notified of the signal, it is not aware
> >> > that a signal handler frame was set up and expects the state of the program
> >> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> >> > by allowing the program to execute from the last event. When that fails
> >> > to happen during replay, rr will assert and die.
> >> >
> >> > The following patches add an explicit check for a newly pending SIGKILL
> >> > after the ptracer has been notified and the siglock has been reacquired.
> >> > If this happens, we stop processing the current signal and proceed
> >> > immediately to handling the SIGKILL. This makes the state reported at
> >> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> >> > work to set up a signal handler frame that will never be used.
> >> >
> >> > This issue was originally reported by the credited rr user.
> >> >
> >> > [0] https://rr-project.org/
> >>
> >> If I read this correctly the problem is not precisely that the rr
> >> debugger is never notified about the signal, but rather that the program
> >> is killed with SIGKILL before rr can read the notification and see which
> >> signal it is.
> >
> > The precise problem is that the kernel made a modification to the
> > tracee state (setting up the signal handler frame) without telling the
> > tracer about it (delivering the ptrace notification for the pending
> > non-SIGKILL signal).
>
> Except the kernel did make it to ptrace_stop. The stop just did not
> fully happen because of SIGKILL. I expect SIGCHLD was sent to the
> tracer as part of that stop that never fully happened.

I don't know whether SIGCHLD was sent to the tracer (rr doesn't use it
directly) but waiting on the process does not produce a wait status
corresponding to the signal delivery stop for the original signal.
Waiting on the tracee skips immediately from whatever the preceding
ptrace event was to the PTRACE_EVENT_EXIT.

(In our particular case, if it had been notified of the signal, we
would have chosen to suppress the signal, because the signal in
question is a SIGSEGV from an rdtsc instruction that has been disabled
via prctl(PR_SET_TSC, PR_TSC_SIGSEGV) and we emulate it in the tracer
due to its non-deterministic behavior. So we really don't expect to
see the tracee signal handler.)

> > That can be fixed either by not modifying the
> > tracee state here or by telling the tracer about the signal (that will
> > never actually run). I suspect we'll all agree that the former seems
> > preferable.
> >
> >> This definitely sounds like a quality of implementation issue.
> >>
> >> The solution that is proposed in your patches simply drops the signal
> >> when SIGKILL is pending.
> >
> > That's right.
> >
> >> I think we can have a slightly better of quality of implementation
> >> than that (as well as a simpler implementation) by requeuing the
> >> signal instead of simply dropping it. Something like the below.
> >
> > What is the benefit of requeueing the signal? All pending signals will
> > be dropped when the SIGKILL is processed, no?
>
> Not before PTRACE_EVENT_EXIT. In fact the pending signals are not
> actually flushed until the thread or the entire process is reaped.
>
> Further the coredump code makes some attempt to write out the
> pending signals. The code appears to predate siginfo support
> in the kernel so it misses a lot but it is there.
>
> The real advantage is that it keeps the logic of dealing with weird
> ptrace_stop logic in ptrace_signal where it belongs. It also allows the
> common (and missing in this case) idiom of goto relock to be used.
>
> So I think changing ptrace_signal will be much more maintainable.

Ok.

> >> Can you test that and see if it works for you?
> >
> > It does not work. This triggers an infinite loop in get_signal, as we
> > dequeue the signal, attempt to notify the ptracer, see the pending
> > sigkill, requeue the signal, go around the loop, dequeue the original
> > signal ...
>
> Apologies I made a bit of a thinko. That change also needs to change
> the handling of if (signr == 0) after ptrace_signal.
>
> Which means it would need to be something like the below.
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 056a107e3cbc..eddb745b34a7 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
> }
>
> /* If the (new) signal is now blocked, requeue it. */
> - if (sigismember(&current->blocked, signr)) {
> + if (sigismember(&current->blocked, signr) ||
> + signal_group_exit(current->signal)) {
> send_signal(signr, info, current, PIDTYPE_PID);
> signr = 0;
> }
> @@ -2764,8 +2765,10 @@ bool get_signal(struct ksignal *ksig)
> if (unlikely(current->ptrace) && (signr != SIGKILL) &&
> !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
> signr = ptrace_signal(signr, &ksig->info);
> - if (!signr)
> - continue;
> + if (!signr) {
> + spin_unlock_irq(&sighand->siglock);
> + goto relock;
> + }
> }
>
> ka = &sighand->action[signr-1];
>
> Eric

Yeah that appears to fix the issue.

- Kyle

2021-11-09 06:23:17

by Kyle Huey

[permalink] [raw]
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

On Tue, Nov 2, 2021 at 12:09 PM Kyle Huey <[email protected]> wrote:
>
> On Tue, Nov 2, 2021 at 11:07 AM Eric W. Biederman <[email protected]> wrote:
> >
> > Kyle Huey <[email protected]> writes:
> >
> > > On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman <[email protected]> wrote:
> > >>
> > >> Kyle Huey <[email protected]> writes:
> > >>
> > >> > rr, a userspace record and replay debugger[0], uses the recorded register
> > >> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> > >> > executing the program during replay.
> > >> >
> > >> > If a SIGKILL races with processing another signal in get_signal, it is
> > >> > possible for the kernel to decline to notify the tracer of the original
> > >> > signal. But if the original signal had a handler, the kernel proceeds
> > >> > with setting up a signal handler frame as if the tracer had chosen to
> > >> > deliver the signal unmodified to the tracee. When the kernel goes to
> > >> > execute the signal handler that it has now modified the stack and registers
> > >> > for, it will discover the pending SIGKILL, and terminate the tracee
> > >> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> > >> > the tracer, however, the effects of handler setup will be visible to
> > >> > the tracer.
> > >> >
> > >> > Because rr (the tracer) was never notified of the signal, it is not aware
> > >> > that a signal handler frame was set up and expects the state of the program
> > >> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> > >> > by allowing the program to execute from the last event. When that fails
> > >> > to happen during replay, rr will assert and die.
> > >> >
> > >> > The following patches add an explicit check for a newly pending SIGKILL
> > >> > after the ptracer has been notified and the siglock has been reacquired.
> > >> > If this happens, we stop processing the current signal and proceed
> > >> > immediately to handling the SIGKILL. This makes the state reported at
> > >> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> > >> > work to set up a signal handler frame that will never be used.
> > >> >
> > >> > This issue was originally reported by the credited rr user.
> > >> >
> > >> > [0] https://rr-project.org/
> > >>
> > >> If I read this correctly the problem is not precisely that the rr
> > >> debugger is never notified about the signal, but rather that the program
> > >> is killed with SIGKILL before rr can read the notification and see which
> > >> signal it is.
> > >
> > > The precise problem is that the kernel made a modification to the
> > > tracee state (setting up the signal handler frame) without telling the
> > > tracer about it (delivering the ptrace notification for the pending
> > > non-SIGKILL signal).
> >
> > Except the kernel did make it to ptrace_stop. The stop just did not
> > fully happen because of SIGKILL. I expect SIGCHLD was sent to the
> > tracer as part of that stop that never fully happened.
>
> I don't know whether SIGCHLD was sent to the tracer (rr doesn't use it
> directly) but waiting on the process does not produce a wait status
> corresponding to the signal delivery stop for the original signal.
> Waiting on the tracee skips immediately from whatever the preceding
> ptrace event was to the PTRACE_EVENT_EXIT.
>
> (In our particular case, if it had been notified of the signal, we
> would have chosen to suppress the signal, because the signal in
> question is a SIGSEGV from an rdtsc instruction that has been disabled
> via prctl(PR_SET_TSC, PR_TSC_SIGSEGV) and we emulate it in the tracer
> due to its non-deterministic behavior. So we really don't expect to
> see the tracee signal handler.)
>
> > > That can be fixed either by not modifying the
> > > tracee state here or by telling the tracer about the signal (that will
> > > never actually run). I suspect we'll all agree that the former seems
> > > preferable.
> > >
> > >> This definitely sounds like a quality of implementation issue.
> > >>
> > >> The solution that is proposed in your patches simply drops the signal
> > >> when SIGKILL is pending.
> > >
> > > That's right.
> > >
> > >> I think we can have a slightly better of quality of implementation
> > >> than that (as well as a simpler implementation) by requeuing the
> > >> signal instead of simply dropping it. Something like the below.
> > >
> > > What is the benefit of requeueing the signal? All pending signals will
> > > be dropped when the SIGKILL is processed, no?
> >
> > Not before PTRACE_EVENT_EXIT. In fact the pending signals are not
> > actually flushed until the thread or the entire process is reaped.
> >
> > Further the coredump code makes some attempt to write out the
> > pending signals. The code appears to predate siginfo support
> > in the kernel so it misses a lot but it is there.
> >
> > The real advantage is that it keeps the logic of dealing with weird
> > ptrace_stop logic in ptrace_signal where it belongs. It also allows the
> > common (and missing in this case) idiom of goto relock to be used.
> >
> > So I think changing ptrace_signal will be much more maintainable.
>
> Ok.
>
> > >> Can you test that and see if it works for you?
> > >
> > > It does not work. This triggers an infinite loop in get_signal, as we
> > > dequeue the signal, attempt to notify the ptracer, see the pending
> > > sigkill, requeue the signal, go around the loop, dequeue the original
> > > signal ...
> >
> > Apologies I made a bit of a thinko. That change also needs to change
> > the handling of if (signr == 0) after ptrace_signal.
> >
> > Which means it would need to be something like the below.
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 056a107e3cbc..eddb745b34a7 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
> > }
> >
> > /* If the (new) signal is now blocked, requeue it. */
> > - if (sigismember(&current->blocked, signr)) {
> > + if (sigismember(&current->blocked, signr) ||
> > + signal_group_exit(current->signal)) {
> > send_signal(signr, info, current, PIDTYPE_PID);
> > signr = 0;
> > }
> > @@ -2764,8 +2765,10 @@ bool get_signal(struct ksignal *ksig)
> > if (unlikely(current->ptrace) && (signr != SIGKILL) &&
> > !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
> > signr = ptrace_signal(signr, &ksig->info);
> > - if (!signr)
> > - continue;
> > + if (!signr) {
> > + spin_unlock_irq(&sighand->siglock);
> > + goto relock;
> > + }
> > }
> >
> > ka = &sighand->action[signr-1];
> >
> > Eric
>
> Yeah that appears to fix the issue.
>
> - Kyle

Is there anything else I need to do here or are you going to take it from here?

- Kyle

2021-11-14 17:19:34

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification

Kyle Huey <[email protected]> writes:

> On Tue, Nov 2, 2021 at 12:09 PM Kyle Huey <[email protected]> wrote:
>>
>> On Tue, Nov 2, 2021 at 11:07 AM Eric W. Biederman <[email protected]> wrote:
>> >
>> > Kyle Huey <[email protected]> writes:
>> >
>> > > On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman <[email protected]> wrote:
>> > >>
>> > >> Kyle Huey <[email protected]> writes:
>> > >>
>> > >> > rr, a userspace record and replay debugger[0], uses the recorded register
>> > >> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
>> > >> > executing the program during replay.
>> > >> >
>> > >> > If a SIGKILL races with processing another signal in get_signal, it is
>> > >> > possible for the kernel to decline to notify the tracer of the original
>> > >> > signal. But if the original signal had a handler, the kernel proceeds
>> > >> > with setting up a signal handler frame as if the tracer had chosen to
>> > >> > deliver the signal unmodified to the tracee. When the kernel goes to
>> > >> > execute the signal handler that it has now modified the stack and registers
>> > >> > for, it will discover the pending SIGKILL, and terminate the tracee
>> > >> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
>> > >> > the tracer, however, the effects of handler setup will be visible to
>> > >> > the tracer.
>> > >> >
>> > >> > Because rr (the tracer) was never notified of the signal, it is not aware
>> > >> > that a signal handler frame was set up and expects the state of the program
>> > >> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
>> > >> > by allowing the program to execute from the last event. When that fails
>> > >> > to happen during replay, rr will assert and die.
>> > >> >
>> > >> > The following patches add an explicit check for a newly pending SIGKILL
>> > >> > after the ptracer has been notified and the siglock has been reacquired.
>> > >> > If this happens, we stop processing the current signal and proceed
>> > >> > immediately to handling the SIGKILL. This makes the state reported at
>> > >> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
>> > >> > work to set up a signal handler frame that will never be used.
>> > >> >
>> > >> > This issue was originally reported by the credited rr user.
>> > >> >
>> > >> > [0] https://rr-project.org/
>> > >>
>> > >> If I read this correctly the problem is not precisely that the rr
>> > >> debugger is never notified about the signal, but rather that the program
>> > >> is killed with SIGKILL before rr can read the notification and see which
>> > >> signal it is.
>> > >
>> > > The precise problem is that the kernel made a modification to the
>> > > tracee state (setting up the signal handler frame) without telling the
>> > > tracer about it (delivering the ptrace notification for the pending
>> > > non-SIGKILL signal).
>> >
>> > Except the kernel did make it to ptrace_stop. The stop just did not
>> > fully happen because of SIGKILL. I expect SIGCHLD was sent to the
>> > tracer as part of that stop that never fully happened.
>>
>> I don't know whether SIGCHLD was sent to the tracer (rr doesn't use it
>> directly) but waiting on the process does not produce a wait status
>> corresponding to the signal delivery stop for the original signal.
>> Waiting on the tracee skips immediately from whatever the preceding
>> ptrace event was to the PTRACE_EVENT_EXIT.
>>
>> (In our particular case, if it had been notified of the signal, we
>> would have chosen to suppress the signal, because the signal in
>> question is a SIGSEGV from an rdtsc instruction that has been disabled
>> via prctl(PR_SET_TSC, PR_TSC_SIGSEGV) and we emulate it in the tracer
>> due to its non-deterministic behavior. So we really don't expect to
>> see the tracee signal handler.)
>>
>> > > That can be fixed either by not modifying the
>> > > tracee state here or by telling the tracer about the signal (that will
>> > > never actually run). I suspect we'll all agree that the former seems
>> > > preferable.
>> > >
>> > >> This definitely sounds like a quality of implementation issue.
>> > >>
>> > >> The solution that is proposed in your patches simply drops the signal
>> > >> when SIGKILL is pending.
>> > >
>> > > That's right.
>> > >
>> > >> I think we can have a slightly better of quality of implementation
>> > >> than that (as well as a simpler implementation) by requeuing the
>> > >> signal instead of simply dropping it. Something like the below.
>> > >
>> > > What is the benefit of requeueing the signal? All pending signals will
>> > > be dropped when the SIGKILL is processed, no?
>> >
>> > Not before PTRACE_EVENT_EXIT. In fact the pending signals are not
>> > actually flushed until the thread or the entire process is reaped.
>> >
>> > Further the coredump code makes some attempt to write out the
>> > pending signals. The code appears to predate siginfo support
>> > in the kernel so it misses a lot but it is there.
>> >
>> > The real advantage is that it keeps the logic of dealing with weird
>> > ptrace_stop logic in ptrace_signal where it belongs. It also allows the
>> > common (and missing in this case) idiom of goto relock to be used.
>> >
>> > So I think changing ptrace_signal will be much more maintainable.
>>
>> Ok.
>>
>> > >> Can you test that and see if it works for you?
>> > >
>> > > It does not work. This triggers an infinite loop in get_signal, as we
>> > > dequeue the signal, attempt to notify the ptracer, see the pending
>> > > sigkill, requeue the signal, go around the loop, dequeue the original
>> > > signal ...
>> >
>> > Apologies I made a bit of a thinko. That change also needs to change
>> > the handling of if (signr == 0) after ptrace_signal.
>> >
>> > Which means it would need to be something like the below.
>> >
>> > diff --git a/kernel/signal.c b/kernel/signal.c
>> > index 056a107e3cbc..eddb745b34a7 100644
>> > --- a/kernel/signal.c
>> > +++ b/kernel/signal.c
>> > @@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
>> > }
>> >
>> > /* If the (new) signal is now blocked, requeue it. */
>> > - if (sigismember(&current->blocked, signr)) {
>> > + if (sigismember(&current->blocked, signr) ||
>> > + signal_group_exit(current->signal)) {
>> > send_signal(signr, info, current, PIDTYPE_PID);
>> > signr = 0;
>> > }
>> > @@ -2764,8 +2765,10 @@ bool get_signal(struct ksignal *ksig)
>> > if (unlikely(current->ptrace) && (signr != SIGKILL) &&
>> > !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
>> > signr = ptrace_signal(signr, &ksig->info);
>> > - if (!signr)
>> > - continue;
>> > + if (!signr) {
>> > + spin_unlock_irq(&sighand->siglock);
>> > + goto relock;
>> > + }
>> > }
>> >
>> > ka = &sighand->action[signr-1];
>> >
>> > Eric
>>
>> Yeah that appears to fix the issue.
>>
>> - Kyle
>
> Is there anything else I need to do here or are you going to take it
> from here?

No pinging me is the helpful thing to do. I sometimes get distracted.

The merge window closes sometime later today and then v5.16-rc1 will be
out. Then I will have a good base to work against.

Until then I can't really merge anything.

Eric

2021-11-16 05:31:52

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 0/3] signal: requeuing undeliverable signals


Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
ptrace_signal from delivering a signal, as the kernel setups up a signal
frame for a signal that rr did not have a chance to observe with ptrace.

In looking into it I found a couple of bugs and a quality of
implementation issue.

- The test for signal_group_exit should be inside the for loop in get_signal.
- Signals should be requeued on the same queue they were dequeued from.
- When a fatal signal is pending ptrace_signal should not return another
signal for delivery.

Kyle Huey has verified[2] an earlier version of this change.

I have reworked things one more time to completely fix the issues
raised, and to keep the code maintainable long term.

I have smoke tested this code and combined with a careful review I
expect this code to work fine. Kyle if you can double check that
my last round of changes still works for rr I would appreciate it.

Eric W. Biederman (3):
signal: In get_signal test for signal_group_exit every time through the loop
signal: Requeue signals in the appropriate queue
signal: Requeue ptrace signals

fs/signalfd.c | 5 +++--
include/linux/sched/signal.h | 7 ++++---
kernel/signal.c | 44 ++++++++++++++++++++++++++------------------
3 files changed, 33 insertions(+), 23 deletions(-)

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com

Eric

2021-11-16 05:33:24

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop


Recently while investigating a problem with rr and signals I noticed
that siglock is dropped in ptrace_signal and get_signal does not jump
to relock.

Looking farther to see if the problem is anywhere else I see that
do_signal_stop also returns if signal_group_exit is true. I believe
that test can now never be true, but it is a bit hard to trace
through and be certain.

Testing signal_group_exit is not expensive, so move the test for
signal_group_exit into the for loop inside of get_signal to ensure
the test is never skipped improperly.

This has been a potential since I added the test for signal_group_exit
was added.

Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7c4b7ae714d4..986fa69c15c5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2662,19 +2662,19 @@ bool get_signal(struct ksignal *ksig)
goto relock;
}

- /* Has this task already been marked for death? */
- if (signal_group_exit(signal)) {
- ksig->info.si_signo = signr = SIGKILL;
- sigdelset(&current->pending.signal, SIGKILL);
- trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
- &sighand->action[SIGKILL - 1]);
- recalc_sigpending();
- goto fatal;
- }
-
for (;;) {
struct k_sigaction *ka;

+ /* Has this task already been marked for death? */
+ if (signal_group_exit(signal)) {
+ ksig->info.si_signo = signr = SIGKILL;
+ sigdelset(&current->pending.signal, SIGKILL);
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
+ &sighand->action[SIGKILL - 1]);
+ recalc_sigpending();
+ goto fatal;
+ }
+
if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
do_signal_stop(0))
goto relock;
--
2.20.1


2021-11-16 05:33:57

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 2/3] signal: Requeue signals in the appropriate queue


In the event that a tracer changes which signal needs to be delivered
and that signal is currently blocked then the signal needs to be
requeued for later delivery.

With the advent of CLONE_THREAD the kernel has 2 signal queues per
task. The per process queue and the per task queue. Update the code
so that if the signal is removed from the per process queue it is
requeued on the per process queue. This is necessary to make it
appear the signal was never dequeued.

The rr debugger reasonably believes that the state of the process from
the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated
by simply letting a process run. If a SIGKILL interrupts a ptrace_stop
this is not true today.

So return signals to their original queue in ptrace_signal so that
signals that are not delivered appear like they were never dequeued.

Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/signalfd.c | 5 +++--
include/linux/sched/signal.h | 7 ++++---
kernel/signal.c | 21 ++++++++++++++-------
3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 040e1cf90528..74f134cd1ff6 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -165,11 +165,12 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
int nonblock)
{
+ enum pid_type type;
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);

spin_lock_irq(&current->sighand->siglock);
- ret = dequeue_signal(current, &ctx->sigmask, info);
+ ret = dequeue_signal(current, &ctx->sigmask, info, &type);
switch (ret) {
case 0:
if (!nonblock)
@@ -184,7 +185,7 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
add_wait_queue(&current->sighand->signalfd_wqh, &wait);
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
- ret = dequeue_signal(current, &ctx->sigmask, info);
+ ret = dequeue_signal(current, &ctx->sigmask, info, &type);
if (ret != 0)
break;
if (signal_pending(current)) {
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 23505394ef70..167995d471da 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -286,17 +286,18 @@ static inline int signal_group_exit(const struct signal_struct *sig)
extern void flush_signals(struct task_struct *);
extern void ignore_signals(struct task_struct *);
extern void flush_signal_handlers(struct task_struct *, int force_default);
-extern int dequeue_signal(struct task_struct *task,
- sigset_t *mask, kernel_siginfo_t *info);
+extern int dequeue_signal(struct task_struct *task, sigset_t *mask,
+ kernel_siginfo_t *info, enum pid_type *type);

static inline int kernel_dequeue_signal(void)
{
struct task_struct *task = current;
kernel_siginfo_t __info;
+ enum pid_type __type;
int ret;

spin_lock_irq(&task->sighand->siglock);
- ret = dequeue_signal(task, &task->blocked, &__info);
+ ret = dequeue_signal(task, &task->blocked, &__info, &__type);
spin_unlock_irq(&task->sighand->siglock);

return ret;
diff --git a/kernel/signal.c b/kernel/signal.c
index 986fa69c15c5..43e8b7e362b0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -626,7 +626,8 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
*
* All callers have to hold the siglock.
*/
-int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *info)
+int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
+ kernel_siginfo_t *info, enum pid_type *type)
{
bool resched_timer = false;
int signr;
@@ -634,8 +635,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *in
/* We only dequeue private signals from ourselves, we don't let
* signalfd steal them
*/
+ *type = PIDTYPE_PID;
signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer);
if (!signr) {
+ *type = PIDTYPE_TGID;
signr = __dequeue_signal(&tsk->signal->shared_pending,
mask, info, &resched_timer);
#ifdef CONFIG_POSIX_TIMERS
@@ -2522,7 +2525,7 @@ static void do_freezer_trap(void)
freezable_schedule();
}

-static int ptrace_signal(int signr, kernel_siginfo_t *info)
+static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
{
/*
* We do not check sig_kernel_stop(signr) but set this marker
@@ -2563,7 +2566,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)

/* If the (new) signal is now blocked, requeue it. */
if (sigismember(&current->blocked, signr)) {
- send_signal(signr, info, current, PIDTYPE_PID);
+ send_signal(signr, info, current, type);
signr = 0;
}

@@ -2664,6 +2667,7 @@ bool get_signal(struct ksignal *ksig)

for (;;) {
struct k_sigaction *ka;
+ enum pid_type type;

/* Has this task already been marked for death? */
if (signal_group_exit(signal)) {
@@ -2706,16 +2710,18 @@ bool get_signal(struct ksignal *ksig)
* so that the instruction pointer in the signal stack
* frame points to the faulting instruction.
*/
+ type = PIDTYPE_PID;
signr = dequeue_synchronous_signal(&ksig->info);
if (!signr)
- signr = dequeue_signal(current, &current->blocked, &ksig->info);
+ signr = dequeue_signal(current, &current->blocked,
+ &ksig->info, &type);

if (!signr)
break; /* will return 0 */

if (unlikely(current->ptrace) && (signr != SIGKILL) &&
!(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
- signr = ptrace_signal(signr, &ksig->info);
+ signr = ptrace_signal(signr, &ksig->info, type);
if (!signr)
continue;
}
@@ -3540,6 +3546,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
ktime_t *to = NULL, timeout = KTIME_MAX;
struct task_struct *tsk = current;
sigset_t mask = *which;
+ enum pid_type type;
int sig, ret = 0;

if (ts) {
@@ -3556,7 +3563,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
signotset(&mask);

spin_lock_irq(&tsk->sighand->siglock);
- sig = dequeue_signal(tsk, &mask, info);
+ sig = dequeue_signal(tsk, &mask, info, &type);
if (!sig && timeout) {
/*
* None ready, temporarily unblock those we're interested
@@ -3575,7 +3582,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
spin_lock_irq(&tsk->sighand->siglock);
__set_task_blocked(tsk, &tsk->real_blocked);
sigemptyset(&tsk->real_blocked);
- sig = dequeue_signal(tsk, &mask, info);
+ sig = dequeue_signal(tsk, &mask, info, &type);
}
spin_unlock_irq(&tsk->sighand->siglock);

--
2.20.1


2021-11-16 05:35:31

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 3/3] signal: Requeue ptrace signals


Kyle Huey <[email protected]> writes:

> rr, a userspace record and replay debugger[0], uses the recorded register
> state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> executing the program during replay.
>
> If a SIGKILL races with processing another signal in get_signal, it is
> possible for the kernel to decline to notify the tracer of the original
> signal. But if the original signal had a handler, the kernel proceeds
> with setting up a signal handler frame as if the tracer had chosen to
> deliver the signal unmodified to the tracee. When the kernel goes to
> execute the signal handler that it has now modified the stack and registers
> for, it will discover the pending SIGKILL, and terminate the tracee
> without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> the tracer, however, the effects of handler setup will be visible to
> the tracer.
>
> Because rr (the tracer) was never notified of the signal, it is not aware
> that a signal handler frame was set up and expects the state of the program
> at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> by allowing the program to execute from the last event. When that fails
> to happen during replay, rr will assert and die.
>
> The following patches add an explicit check for a newly pending SIGKILL
> after the ptracer has been notified and the siglock has been reacquired.
> If this happens, we stop processing the current signal and proceed
> immediately to handling the SIGKILL. This makes the state reported at
> PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> work to set up a signal handler frame that will never be used.
>
> [0] https://rr-project.org/

The problem is that while the traced process makes it into ptrace_stop,
the tracee is killed before the tracer manages to wait for the
tracee and discover which signal was about to be delivered.

More generally the problem is that while siglock was dropped a signal
with process wide effect is short cirucit delivered to the entire
process killing it, but the process continues to try and deliver another
signal.

In general it impossible to avoid all cases where work is performed
after the process has been killed. In particular if the process is
killed after get_signal returns the code will simply not know it has
been killed until after delivering the signal frame to userspace.

On the other hand when the code has already discovered the process
has been killed and taken user space visible action that shows
the kernel knows the process has been killed, it is just silly
to then write the signal frame to the user space stack.

Instead of being silly detect the process has been killed
in ptrace_signal and requeue the signal so the code can pretend
it was simply never dequeued for delivery.

To test the process has been killed I use fatal_signal_pending rather
than signal_group_exit to match the test in signal_pending_state which
is used in schedule which is where ptrace_stop detects the process has
been killed.

Requeuing the signal so the code can pretend it was simply never
dequeued improves the user space visible behavior that has been
present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4").

Kyle Huey verified that this change in behavior and makes rr happy.

Reported-by: Kyle Huey <[email protected]>
Reported-by: Marko Mäkelä <[email protected]>
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 43e8b7e362b0..621401550f0f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2565,7 +2565,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
}

/* If the (new) signal is now blocked, requeue it. */
- if (sigismember(&current->blocked, signr)) {
+ if (sigismember(&current->blocked, signr) ||
+ fatal_signal_pending(current)) {
send_signal(signr, info, current, type);
signr = 0;
}
--
2.20.1


2021-11-16 18:23:58

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop

On Mon, Nov 15, 2021 at 11:32:50PM -0600, Eric W. Biederman wrote:
>
> Recently while investigating a problem with rr and signals I noticed
> that siglock is dropped in ptrace_signal and get_signal does not jump
> to relock.
>
> Looking farther to see if the problem is anywhere else I see that
> do_signal_stop also returns if signal_group_exit is true. I believe
> that test can now never be true, but it is a bit hard to trace
> through and be certain.
>
> Testing signal_group_exit is not expensive, so move the test for
> signal_group_exit into the for loop inside of get_signal to ensure
> the test is never skipped improperly.

Seems reasonable.

>
> This has been a potential since I added the test for signal_group_exit

nit: missing word after "potential"? "bug", "problem"?

> was added.
>
> Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Reviewed-by: Kees Cook <[email protected]>

> ---
> kernel/signal.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 7c4b7ae714d4..986fa69c15c5 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2662,19 +2662,19 @@ bool get_signal(struct ksignal *ksig)
> goto relock;
> }
>
> - /* Has this task already been marked for death? */
> - if (signal_group_exit(signal)) {
> - ksig->info.si_signo = signr = SIGKILL;
> - sigdelset(&current->pending.signal, SIGKILL);
> - trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
> - &sighand->action[SIGKILL - 1]);
> - recalc_sigpending();
> - goto fatal;
> - }
> -
> for (;;) {
> struct k_sigaction *ka;
>
> + /* Has this task already been marked for death? */
> + if (signal_group_exit(signal)) {
> + ksig->info.si_signo = signr = SIGKILL;
> + sigdelset(&current->pending.signal, SIGKILL);
> + trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
> + &sighand->action[SIGKILL - 1]);
> + recalc_sigpending();
> + goto fatal;
> + }
> +
> if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
> do_signal_stop(0))
> goto relock;
> --
> 2.20.1
>

--
Kees Cook

2021-11-16 18:30:53

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/3] signal: Requeue signals in the appropriate queue

On Mon, Nov 15, 2021 at 11:33:19PM -0600, Eric W. Biederman wrote:
>
> In the event that a tracer changes which signal needs to be delivered
> and that signal is currently blocked then the signal needs to be
> requeued for later delivery.
>
> With the advent of CLONE_THREAD the kernel has 2 signal queues per
> task. The per process queue and the per task queue. Update the code
> so that if the signal is removed from the per process queue it is
> requeued on the per process queue. This is necessary to make it
> appear the signal was never dequeued.
>
> The rr debugger reasonably believes that the state of the process from
> the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated
> by simply letting a process run. If a SIGKILL interrupts a ptrace_stop
> this is not true today.
>
> So return signals to their original queue in ptrace_signal so that
> signals that are not delivered appear like they were never dequeued.

The only comment I have on this is that it seems like many callers
totally ignore the result store in "type" (signalfd_dequeue,
kernel_dequeue_signal, do_sigtimedwait), which would imply a different
API might be desirable. That said, it's also not a big deal.

>
> Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6")
> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Reviewed-by: Kees Cook <[email protected]>


> ---
> fs/signalfd.c | 5 +++--
> include/linux/sched/signal.h | 7 ++++---
> kernel/signal.c | 21 ++++++++++++++-------
> 3 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/fs/signalfd.c b/fs/signalfd.c
> index 040e1cf90528..74f134cd1ff6 100644
> --- a/fs/signalfd.c
> +++ b/fs/signalfd.c
> @@ -165,11 +165,12 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
> static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
> int nonblock)
> {
> + enum pid_type type;
> ssize_t ret;
> DECLARE_WAITQUEUE(wait, current);
>
> spin_lock_irq(&current->sighand->siglock);
> - ret = dequeue_signal(current, &ctx->sigmask, info);
> + ret = dequeue_signal(current, &ctx->sigmask, info, &type);
> switch (ret) {
> case 0:
> if (!nonblock)
> @@ -184,7 +185,7 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
> add_wait_queue(&current->sighand->signalfd_wqh, &wait);
> for (;;) {
> set_current_state(TASK_INTERRUPTIBLE);
> - ret = dequeue_signal(current, &ctx->sigmask, info);
> + ret = dequeue_signal(current, &ctx->sigmask, info, &type);
> if (ret != 0)
> break;
> if (signal_pending(current)) {
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 23505394ef70..167995d471da 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -286,17 +286,18 @@ static inline int signal_group_exit(const struct signal_struct *sig)
> extern void flush_signals(struct task_struct *);
> extern void ignore_signals(struct task_struct *);
> extern void flush_signal_handlers(struct task_struct *, int force_default);
> -extern int dequeue_signal(struct task_struct *task,
> - sigset_t *mask, kernel_siginfo_t *info);
> +extern int dequeue_signal(struct task_struct *task, sigset_t *mask,
> + kernel_siginfo_t *info, enum pid_type *type);
>
> static inline int kernel_dequeue_signal(void)
> {
> struct task_struct *task = current;
> kernel_siginfo_t __info;
> + enum pid_type __type;
> int ret;
>
> spin_lock_irq(&task->sighand->siglock);
> - ret = dequeue_signal(task, &task->blocked, &__info);
> + ret = dequeue_signal(task, &task->blocked, &__info, &__type);
> spin_unlock_irq(&task->sighand->siglock);
>
> return ret;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 986fa69c15c5..43e8b7e362b0 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -626,7 +626,8 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
> *
> * All callers have to hold the siglock.
> */
> -int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *info)
> +int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
> + kernel_siginfo_t *info, enum pid_type *type)
> {
> bool resched_timer = false;
> int signr;
> @@ -634,8 +635,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *in
> /* We only dequeue private signals from ourselves, we don't let
> * signalfd steal them
> */
> + *type = PIDTYPE_PID;
> signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer);
> if (!signr) {
> + *type = PIDTYPE_TGID;
> signr = __dequeue_signal(&tsk->signal->shared_pending,
> mask, info, &resched_timer);
> #ifdef CONFIG_POSIX_TIMERS
> @@ -2522,7 +2525,7 @@ static void do_freezer_trap(void)
> freezable_schedule();
> }
>
> -static int ptrace_signal(int signr, kernel_siginfo_t *info)
> +static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
> {
> /*
> * We do not check sig_kernel_stop(signr) but set this marker
> @@ -2563,7 +2566,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
>
> /* If the (new) signal is now blocked, requeue it. */
> if (sigismember(&current->blocked, signr)) {
> - send_signal(signr, info, current, PIDTYPE_PID);
> + send_signal(signr, info, current, type);
> signr = 0;
> }
>
> @@ -2664,6 +2667,7 @@ bool get_signal(struct ksignal *ksig)
>
> for (;;) {
> struct k_sigaction *ka;
> + enum pid_type type;
>
> /* Has this task already been marked for death? */
> if (signal_group_exit(signal)) {
> @@ -2706,16 +2710,18 @@ bool get_signal(struct ksignal *ksig)
> * so that the instruction pointer in the signal stack
> * frame points to the faulting instruction.
> */
> + type = PIDTYPE_PID;
> signr = dequeue_synchronous_signal(&ksig->info);
> if (!signr)
> - signr = dequeue_signal(current, &current->blocked, &ksig->info);
> + signr = dequeue_signal(current, &current->blocked,
> + &ksig->info, &type);
>
> if (!signr)
> break; /* will return 0 */
>
> if (unlikely(current->ptrace) && (signr != SIGKILL) &&
> !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
> - signr = ptrace_signal(signr, &ksig->info);
> + signr = ptrace_signal(signr, &ksig->info, type);
> if (!signr)
> continue;
> }
> @@ -3540,6 +3546,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
> ktime_t *to = NULL, timeout = KTIME_MAX;
> struct task_struct *tsk = current;
> sigset_t mask = *which;
> + enum pid_type type;
> int sig, ret = 0;
>
> if (ts) {
> @@ -3556,7 +3563,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
> signotset(&mask);
>
> spin_lock_irq(&tsk->sighand->siglock);
> - sig = dequeue_signal(tsk, &mask, info);
> + sig = dequeue_signal(tsk, &mask, info, &type);
> if (!sig && timeout) {
> /*
> * None ready, temporarily unblock those we're interested
> @@ -3575,7 +3582,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
> spin_lock_irq(&tsk->sighand->siglock);
> __set_task_blocked(tsk, &tsk->real_blocked);
> sigemptyset(&tsk->real_blocked);
> - sig = dequeue_signal(tsk, &mask, info);
> + sig = dequeue_signal(tsk, &mask, info, &type);
> }
> spin_unlock_irq(&tsk->sighand->siglock);
>
> --
> 2.20.1
>

--
Kees Cook

2021-11-16 18:31:22

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 3/3] signal: Requeue ptrace signals

On Mon, Nov 15, 2021 at 11:34:33PM -0600, Eric W. Biederman wrote:
>
> Kyle Huey <[email protected]> writes:
>
> > rr, a userspace record and replay debugger[0], uses the recorded register
> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
> > executing the program during replay.
> >
> > If a SIGKILL races with processing another signal in get_signal, it is
> > possible for the kernel to decline to notify the tracer of the original
> > signal. But if the original signal had a handler, the kernel proceeds
> > with setting up a signal handler frame as if the tracer had chosen to
> > deliver the signal unmodified to the tracee. When the kernel goes to
> > execute the signal handler that it has now modified the stack and registers
> > for, it will discover the pending SIGKILL, and terminate the tracee
> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
> > the tracer, however, the effects of handler setup will be visible to
> > the tracer.
> >
> > Because rr (the tracer) was never notified of the signal, it is not aware
> > that a signal handler frame was set up and expects the state of the program
> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
> > by allowing the program to execute from the last event. When that fails
> > to happen during replay, rr will assert and die.
> >
> > The following patches add an explicit check for a newly pending SIGKILL
> > after the ptracer has been notified and the siglock has been reacquired.
> > If this happens, we stop processing the current signal and proceed
> > immediately to handling the SIGKILL. This makes the state reported at
> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
> > work to set up a signal handler frame that will never be used.
> >
> > [0] https://rr-project.org/
>
> The problem is that while the traced process makes it into ptrace_stop,
> the tracee is killed before the tracer manages to wait for the
> tracee and discover which signal was about to be delivered.
>
> More generally the problem is that while siglock was dropped a signal
> with process wide effect is short cirucit delivered to the entire
> process killing it, but the process continues to try and deliver another
> signal.
>
> In general it impossible to avoid all cases where work is performed
> after the process has been killed. In particular if the process is
> killed after get_signal returns the code will simply not know it has
> been killed until after delivering the signal frame to userspace.
>
> On the other hand when the code has already discovered the process
> has been killed and taken user space visible action that shows
> the kernel knows the process has been killed, it is just silly
> to then write the signal frame to the user space stack.
>
> Instead of being silly detect the process has been killed
> in ptrace_signal and requeue the signal so the code can pretend
> it was simply never dequeued for delivery.
>
> To test the process has been killed I use fatal_signal_pending rather
> than signal_group_exit to match the test in signal_pending_state which
> is used in schedule which is where ptrace_stop detects the process has
> been killed.
>
> Requeuing the signal so the code can pretend it was simply never
> dequeued improves the user space visible behavior that has been
> present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4").
>
> Kyle Huey verified that this change in behavior and makes rr happy.
>
> Reported-by: Kyle Huey <[email protected]>
> Reported-by: Marko M?kel? <[email protected]>
> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Yay pre-git-history! :)

Reviewed-by: Kees Cook <[email protected]>

> ---
> kernel/signal.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 43e8b7e362b0..621401550f0f 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2565,7 +2565,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
> }
>
> /* If the (new) signal is now blocked, requeue it. */
> - if (sigismember(&current->blocked, signr)) {
> + if (sigismember(&current->blocked, signr) ||
> + fatal_signal_pending(current)) {
> send_signal(signr, info, current, type);
> signr = 0;
> }
> --
> 2.20.1
>

--
Kees Cook

2021-11-17 16:24:47

by Kyle Huey

[permalink] [raw]
Subject: Re: [PATCH 0/3] signal: requeuing undeliverable signals

On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <[email protected]> wrote:
>
>
> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
> ptrace_signal from delivering a signal, as the kernel setups up a signal
> frame for a signal that rr did not have a chance to observe with ptrace.
>
> In looking into it I found a couple of bugs and a quality of
> implementation issue.
>
> - The test for signal_group_exit should be inside the for loop in get_signal.
> - Signals should be requeued on the same queue they were dequeued from.
> - When a fatal signal is pending ptrace_signal should not return another
> signal for delivery.
>
> Kyle Huey has verified[2] an earlier version of this change.
>
> I have reworked things one more time to completely fix the issues
> raised, and to keep the code maintainable long term.
>
> I have smoke tested this code and combined with a careful review I
> expect this code to work fine. Kyle if you can double check that
> my last round of changes still works for rr I would appreciate it.

This still fixes the race we reported.

Tested-by: Kyle Huey <[email protected]>

- Kyle

> Eric W. Biederman (3):
> signal: In get_signal test for signal_group_exit every time through the loop
> signal: Requeue signals in the appropriate queue
> signal: Requeue ptrace signals
>
> fs/signalfd.c | 5 +++--
> include/linux/sched/signal.h | 7 ++++---
> kernel/signal.c | 44 ++++++++++++++++++++++++++------------------
> 3 files changed, 33 insertions(+), 23 deletions(-)
>
> [1] https://lkml.kernel.org/r/[email protected]
> [2] https://lkml.kernel.org/r/CAP045ApAX725ZfujaK-jJNkfCo5s+oVFpBvNfPJk+DKY8K7d=Q@mail.gmail.com
>
> Eric

2021-11-17 16:31:38

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1/3] signal: In get_signal test for signal_group_exit every time through the loop

Kees Cook <[email protected]> writes:

> On Mon, Nov 15, 2021 at 11:32:50PM -0600, Eric W. Biederman wrote:
>>
>> Recently while investigating a problem with rr and signals I noticed
>> that siglock is dropped in ptrace_signal and get_signal does not jump
>> to relock.
>>
>> Looking farther to see if the problem is anywhere else I see that
>> do_signal_stop also returns if signal_group_exit is true. I believe
>> that test can now never be true, but it is a bit hard to trace
>> through and be certain.
>>
>> Testing signal_group_exit is not expensive, so move the test for
>> signal_group_exit into the for loop inside of get_signal to ensure
>> the test is never skipped improperly.
>
> Seems reasonable.
>
>>
>> This has been a potential since I added the test for signal_group_exit
>
> nit: missing word after "potential"? "bug", "problem"?

Yes. Potential problem. I will update the comment.

>> was added.
>>
>> Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>
> Reviewed-by: Kees Cook <[email protected]>
>
>> ---
>> kernel/signal.c | 20 ++++++++++----------
>> 1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 7c4b7ae714d4..986fa69c15c5 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2662,19 +2662,19 @@ bool get_signal(struct ksignal *ksig)
>> goto relock;
>> }
>>
>> - /* Has this task already been marked for death? */
>> - if (signal_group_exit(signal)) {
>> - ksig->info.si_signo = signr = SIGKILL;
>> - sigdelset(&current->pending.signal, SIGKILL);
>> - trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
>> - &sighand->action[SIGKILL - 1]);
>> - recalc_sigpending();
>> - goto fatal;
>> - }
>> -
>> for (;;) {
>> struct k_sigaction *ka;
>>
>> + /* Has this task already been marked for death? */
>> + if (signal_group_exit(signal)) {
>> + ksig->info.si_signo = signr = SIGKILL;
>> + sigdelset(&current->pending.signal, SIGKILL);
>> + trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
>> + &sighand->action[SIGKILL - 1]);
>> + recalc_sigpending();
>> + goto fatal;
>> + }
>> +
>> if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
>> do_signal_stop(0))
>> goto relock;
>> --
>> 2.20.1
>>

2021-11-17 16:42:45

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/3] signal: Requeue signals in the appropriate queue

Kees Cook <[email protected]> writes:

> On Mon, Nov 15, 2021 at 11:33:19PM -0600, Eric W. Biederman wrote:
>>
>> In the event that a tracer changes which signal needs to be delivered
>> and that signal is currently blocked then the signal needs to be
>> requeued for later delivery.
>>
>> With the advent of CLONE_THREAD the kernel has 2 signal queues per
>> task. The per process queue and the per task queue. Update the code
>> so that if the signal is removed from the per process queue it is
>> requeued on the per process queue. This is necessary to make it
>> appear the signal was never dequeued.
>>
>> The rr debugger reasonably believes that the state of the process from
>> the last ptrace_stop it observed until PTRACE_EVENT_EXIT can be recreated
>> by simply letting a process run. If a SIGKILL interrupts a ptrace_stop
>> this is not true today.
>>
>> So return signals to their original queue in ptrace_signal so that
>> signals that are not delivered appear like they were never dequeued.
>
> The only comment I have on this is that it seems like many callers
> totally ignore the result store in "type" (signalfd_dequeue,
> kernel_dequeue_signal, do_sigtimedwait), which would imply a different
> API might be desirable. That said, it's also not a big deal.

I thought about it. The primary user of dequeue_signal get_signal does
care which queue you can from. Always returning the value makes it
possible for the other callers to ask the question do they care, when
they are updated.

My conclusion was that in this case it is more useful for the callers of
dequeue_signal to ask if they care. My sense is that if we have the
information right in people's faces and they care they will realize they
care. If the information is not trivially available people are unlikely
to remember the kernel supports two queues, even when it makes a
difference.

Eric

>> Fixes: 794aa320b79d ("[PATCH] sigfix-2.5.40-D6")
>> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>
> Reviewed-by: Kees Cook <[email protected]>
>
>
>> ---
>> fs/signalfd.c | 5 +++--
>> include/linux/sched/signal.h | 7 ++++---
>> kernel/signal.c | 21 ++++++++++++++-------
>> 3 files changed, 21 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/signalfd.c b/fs/signalfd.c
>> index 040e1cf90528..74f134cd1ff6 100644
>> --- a/fs/signalfd.c
>> +++ b/fs/signalfd.c
>> @@ -165,11 +165,12 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
>> static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info,
>> int nonblock)
>> {
>> + enum pid_type type;
>> ssize_t ret;
>> DECLARE_WAITQUEUE(wait, current);
>>
>> spin_lock_irq(&current->sighand->siglock);
>> - ret = dequeue_signal(current, &ctx->sigmask, info);
>> + ret = dequeue_signal(current, &ctx->sigmask, info, &type);
>> switch (ret) {
>> case 0:
>> if (!nonblock)
>> @@ -184,7 +185,7 @@ static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, kernel_siginfo_t *info
>> add_wait_queue(&current->sighand->signalfd_wqh, &wait);
>> for (;;) {
>> set_current_state(TASK_INTERRUPTIBLE);
>> - ret = dequeue_signal(current, &ctx->sigmask, info);
>> + ret = dequeue_signal(current, &ctx->sigmask, info, &type);
>> if (ret != 0)
>> break;
>> if (signal_pending(current)) {
>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index 23505394ef70..167995d471da 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -286,17 +286,18 @@ static inline int signal_group_exit(const struct signal_struct *sig)
>> extern void flush_signals(struct task_struct *);
>> extern void ignore_signals(struct task_struct *);
>> extern void flush_signal_handlers(struct task_struct *, int force_default);
>> -extern int dequeue_signal(struct task_struct *task,
>> - sigset_t *mask, kernel_siginfo_t *info);
>> +extern int dequeue_signal(struct task_struct *task, sigset_t *mask,
>> + kernel_siginfo_t *info, enum pid_type *type);
>>
>> static inline int kernel_dequeue_signal(void)
>> {
>> struct task_struct *task = current;
>> kernel_siginfo_t __info;
>> + enum pid_type __type;
>> int ret;
>>
>> spin_lock_irq(&task->sighand->siglock);
>> - ret = dequeue_signal(task, &task->blocked, &__info);
>> + ret = dequeue_signal(task, &task->blocked, &__info, &__type);
>> spin_unlock_irq(&task->sighand->siglock);
>>
>> return ret;
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 986fa69c15c5..43e8b7e362b0 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -626,7 +626,8 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
>> *
>> * All callers have to hold the siglock.
>> */
>> -int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *info)
>> +int dequeue_signal(struct task_struct *tsk, sigset_t *mask,
>> + kernel_siginfo_t *info, enum pid_type *type)
>> {
>> bool resched_timer = false;
>> int signr;
>> @@ -634,8 +635,10 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, kernel_siginfo_t *in
>> /* We only dequeue private signals from ourselves, we don't let
>> * signalfd steal them
>> */
>> + *type = PIDTYPE_PID;
>> signr = __dequeue_signal(&tsk->pending, mask, info, &resched_timer);
>> if (!signr) {
>> + *type = PIDTYPE_TGID;
>> signr = __dequeue_signal(&tsk->signal->shared_pending,
>> mask, info, &resched_timer);
>> #ifdef CONFIG_POSIX_TIMERS
>> @@ -2522,7 +2525,7 @@ static void do_freezer_trap(void)
>> freezable_schedule();
>> }
>>
>> -static int ptrace_signal(int signr, kernel_siginfo_t *info)
>> +static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
>> {
>> /*
>> * We do not check sig_kernel_stop(signr) but set this marker
>> @@ -2563,7 +2566,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info)
>>
>> /* If the (new) signal is now blocked, requeue it. */
>> if (sigismember(&current->blocked, signr)) {
>> - send_signal(signr, info, current, PIDTYPE_PID);
>> + send_signal(signr, info, current, type);
>> signr = 0;
>> }
>>
>> @@ -2664,6 +2667,7 @@ bool get_signal(struct ksignal *ksig)
>>
>> for (;;) {
>> struct k_sigaction *ka;
>> + enum pid_type type;
>>
>> /* Has this task already been marked for death? */
>> if (signal_group_exit(signal)) {
>> @@ -2706,16 +2710,18 @@ bool get_signal(struct ksignal *ksig)
>> * so that the instruction pointer in the signal stack
>> * frame points to the faulting instruction.
>> */
>> + type = PIDTYPE_PID;
>> signr = dequeue_synchronous_signal(&ksig->info);
>> if (!signr)
>> - signr = dequeue_signal(current, &current->blocked, &ksig->info);
>> + signr = dequeue_signal(current, &current->blocked,
>> + &ksig->info, &type);
>>
>> if (!signr)
>> break; /* will return 0 */
>>
>> if (unlikely(current->ptrace) && (signr != SIGKILL) &&
>> !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) {
>> - signr = ptrace_signal(signr, &ksig->info);
>> + signr = ptrace_signal(signr, &ksig->info, type);
>> if (!signr)
>> continue;
>> }
>> @@ -3540,6 +3546,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>> ktime_t *to = NULL, timeout = KTIME_MAX;
>> struct task_struct *tsk = current;
>> sigset_t mask = *which;
>> + enum pid_type type;
>> int sig, ret = 0;
>>
>> if (ts) {
>> @@ -3556,7 +3563,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>> signotset(&mask);
>>
>> spin_lock_irq(&tsk->sighand->siglock);
>> - sig = dequeue_signal(tsk, &mask, info);
>> + sig = dequeue_signal(tsk, &mask, info, &type);
>> if (!sig && timeout) {
>> /*
>> * None ready, temporarily unblock those we're interested
>> @@ -3575,7 +3582,7 @@ static int do_sigtimedwait(const sigset_t *which, kernel_siginfo_t *info,
>> spin_lock_irq(&tsk->sighand->siglock);
>> __set_task_blocked(tsk, &tsk->real_blocked);
>> sigemptyset(&tsk->real_blocked);
>> - sig = dequeue_signal(tsk, &mask, info);
>> + sig = dequeue_signal(tsk, &mask, info, &type);
>> }
>> spin_unlock_irq(&tsk->sighand->siglock);
>>
>> --
>> 2.20.1
>>

2021-11-17 16:45:10

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 3/3] signal: Requeue ptrace signals

Kees Cook <[email protected]> writes:

> On Mon, Nov 15, 2021 at 11:34:33PM -0600, Eric W. Biederman wrote:
>>
>> Kyle Huey <[email protected]> writes:
>>
>> > rr, a userspace record and replay debugger[0], uses the recorded register
>> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease
>> > executing the program during replay.
>> >
>> > If a SIGKILL races with processing another signal in get_signal, it is
>> > possible for the kernel to decline to notify the tracer of the original
>> > signal. But if the original signal had a handler, the kernel proceeds
>> > with setting up a signal handler frame as if the tracer had chosen to
>> > deliver the signal unmodified to the tracee. When the kernel goes to
>> > execute the signal handler that it has now modified the stack and registers
>> > for, it will discover the pending SIGKILL, and terminate the tracee
>> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to
>> > the tracer, however, the effects of handler setup will be visible to
>> > the tracer.
>> >
>> > Because rr (the tracer) was never notified of the signal, it is not aware
>> > that a signal handler frame was set up and expects the state of the program
>> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally
>> > by allowing the program to execute from the last event. When that fails
>> > to happen during replay, rr will assert and die.
>> >
>> > The following patches add an explicit check for a newly pending SIGKILL
>> > after the ptracer has been notified and the siglock has been reacquired.
>> > If this happens, we stop processing the current signal and proceed
>> > immediately to handling the SIGKILL. This makes the state reported at
>> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the
>> > work to set up a signal handler frame that will never be used.
>> >
>> > [0] https://rr-project.org/
>>
>> The problem is that while the traced process makes it into ptrace_stop,
>> the tracee is killed before the tracer manages to wait for the
>> tracee and discover which signal was about to be delivered.
>>
>> More generally the problem is that while siglock was dropped a signal
>> with process wide effect is short cirucit delivered to the entire
>> process killing it, but the process continues to try and deliver another
>> signal.
>>
>> In general it impossible to avoid all cases where work is performed
>> after the process has been killed. In particular if the process is
>> killed after get_signal returns the code will simply not know it has
>> been killed until after delivering the signal frame to userspace.
>>
>> On the other hand when the code has already discovered the process
>> has been killed and taken user space visible action that shows
>> the kernel knows the process has been killed, it is just silly
>> to then write the signal frame to the user space stack.
>>
>> Instead of being silly detect the process has been killed
>> in ptrace_signal and requeue the signal so the code can pretend
>> it was simply never dequeued for delivery.
>>
>> To test the process has been killed I use fatal_signal_pending rather
>> than signal_group_exit to match the test in signal_pending_state which
>> is used in schedule which is where ptrace_stop detects the process has
>> been killed.
>>
>> Requeuing the signal so the code can pretend it was simply never
>> dequeued improves the user space visible behavior that has been
>> present since ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4").
>>
>> Kyle Huey verified that this change in behavior and makes rr happy.
>>
>> Reported-by: Kyle Huey <[email protected]>
>> Reported-by: Marko Mäkelä <[email protected]>
>> History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gi
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>
> Yay pre-git-history! :)

One of these days we might finish removing the rough edges and
fixing the corner case bugs in the original linux pthreads support.

Eric


> Reviewed-by: Kees Cook <[email protected]>
>
>> ---
>> kernel/signal.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 43e8b7e362b0..621401550f0f 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2565,7 +2565,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
>> }
>>
>> /* If the (new) signal is now blocked, requeue it. */
>> - if (sigismember(&current->blocked, signr)) {
>> + if (sigismember(&current->blocked, signr) ||
>> + fatal_signal_pending(current)) {
>> send_signal(signr, info, current, type);
>> signr = 0;
>> }
>> --
>> 2.20.1
>>

2021-11-17 16:51:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/3] signal: requeuing undeliverable signals

Kyle Huey <[email protected]> writes:

> On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <[email protected]> wrote:
>>
>>
>> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
>> ptrace_signal from delivering a signal, as the kernel setups up a signal
>> frame for a signal that rr did not have a chance to observe with ptrace.
>>
>> In looking into it I found a couple of bugs and a quality of
>> implementation issue.
>>
>> - The test for signal_group_exit should be inside the for loop in get_signal.
>> - Signals should be requeued on the same queue they were dequeued from.
>> - When a fatal signal is pending ptrace_signal should not return another
>> signal for delivery.
>>
>> Kyle Huey has verified[2] an earlier version of this change.
>>
>> I have reworked things one more time to completely fix the issues
>> raised, and to keep the code maintainable long term.
>>
>> I have smoke tested this code and combined with a careful review I
>> expect this code to work fine. Kyle if you can double check that
>> my last round of changes still works for rr I would appreciate it.
>
> This still fixes the race we reported.

>
> Tested-by: Kyle Huey <[email protected]>

Thank you very much for retesting.

Eric

2021-11-18 06:12:39

by Marko Mäkelä

[permalink] [raw]
Subject: Re: [PATCH 0/3] signal: requeuing undeliverable signals

On Wed, Nov 17, 2021 at 6:51 PM Eric W. Biederman <[email protected]> wrote:
>
> Kyle Huey <[email protected]> writes:
>
> > On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <[email protected]> wrote:
> >>
> >>
> >> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
> >> ptrace_signal from delivering a signal, as the kernel setups up a signal
> >> frame for a signal that rr did not have a chance to observe with ptrace.
> >>
> >> In looking into it I found a couple of bugs and a quality of
> >> implementation issue.
> >>
> >> - The test for signal_group_exit should be inside the for loop in get_signal.
> >> - Signals should be requeued on the same queue they were dequeued from.
> >> - When a fatal signal is pending ptrace_signal should not return another
> >> signal for delivery.
> >>
> >> Kyle Huey has verified[2] an earlier version of this change.
> >>
> >> I have reworked things one more time to completely fix the issues
> >> raised, and to keep the code maintainable long term.
> >>
> >> I have smoke tested this code and combined with a careful review I
> >> expect this code to work fine. Kyle if you can double check that
> >> my last round of changes still works for rr I would appreciate it.
> >
> > This still fixes the race we reported.
>
> >
> > Tested-by: Kyle Huey <[email protected]>
>
> Thank you very much for retesting.
>
> Eric

Thank you, Kyle and Eric, for reporting and fixing the root cause of this race.

Meanwhile, I followed Kyle's suggestion and will disable the crash
handlers in the tracee whenever it is being traced.

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation