Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60D76C433EF for ; Sun, 14 Nov 2021 17:19:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 37A9860F45 for ; Sun, 14 Nov 2021 17:19:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236146AbhKNRW0 (ORCPT ); Sun, 14 Nov 2021 12:22:26 -0500 Received: from out03.mta.xmission.com ([166.70.13.233]:33776 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236142AbhKNRWG (ORCPT ); Sun, 14 Nov 2021 12:22:06 -0500 Received: from in02.mta.xmission.com ([166.70.13.52]:57094) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1mmJ9q-007faX-VQ; Sun, 14 Nov 2021 10:19:11 -0700 Received: from ip68-227-160-95.om.om.cox.net ([68.227.160.95]:44950 helo=email.froward.int.ebiederm.org.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1mmJ9p-00G2wk-Cs; Sun, 14 Nov 2021 10:19:10 -0700 From: ebiederm@xmission.com (Eric W. Biederman) To: Kyle Huey Cc: Jens Axboe , Peter Zijlstra , Marco Elver , Oleg Nesterov , Thomas Gleixner , Peter Collingbourne , Alexey Gladkov , "Robert O'Callahan" , Marko =?utf-8?B?TcOka2Vsw6Q=?= , open list References: <20211101034147.6203-1-khuey@kylehuey.com> <877ddqabvs.fsf@disp2133> <87fsse8maf.fsf@disp2133> Date: Sun, 14 Nov 2021 11:19:03 -0600 In-Reply-To: (Kyle Huey's message of "Mon, 8 Nov 2021 15:58:21 -0800") Message-ID: <87czn2k648.fsf@email.froward.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1mmJ9p-00G2wk-Cs;;;mid=<87czn2k648.fsf@email.froward.int.ebiederm.org>;;;hst=in02.mta.xmission.com;;;ip=68.227.160.95;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX19mfFWN13wU0r5Lw9cLz95XrBEFQ8aNTGk= X-SA-Exim-Connect-IP: 68.227.160.95 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [PATCH] signal: SIGKILL can cause signal effects to appear at PTRACE_EVENT_EXIT without tracer notification X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Kyle Huey writes: > On Tue, Nov 2, 2021 at 12:09 PM Kyle Huey wrote: >> >> On Tue, Nov 2, 2021 at 11:07 AM Eric W. Biederman wrote: >> > >> > Kyle Huey writes: >> > >> > > On Tue, Nov 2, 2021 at 7:09 AM Eric W. Biederman wrote: >> > >> >> > >> Kyle Huey writes: >> > >> >> > >> > rr, a userspace record and replay debugger[0], uses the recorded register >> > >> > state at PTRACE_EVENT_EXIT to find the point in time at which to cease >> > >> > executing the program during replay. >> > >> > >> > >> > If a SIGKILL races with processing another signal in get_signal, it is >> > >> > possible for the kernel to decline to notify the tracer of the original >> > >> > signal. But if the original signal had a handler, the kernel proceeds >> > >> > with setting up a signal handler frame as if the tracer had chosen to >> > >> > deliver the signal unmodified to the tracee. When the kernel goes to >> > >> > execute the signal handler that it has now modified the stack and registers >> > >> > for, it will discover the pending SIGKILL, and terminate the tracee >> > >> > without executing the handler. When PTRACE_EVENT_EXIT is delivered to >> > >> > the tracer, however, the effects of handler setup will be visible to >> > >> > the tracer. >> > >> > >> > >> > Because rr (the tracer) was never notified of the signal, it is not aware >> > >> > that a signal handler frame was set up and expects the state of the program >> > >> > at PTRACE_EVENT_EXIT to be a state that will be reconstructed naturally >> > >> > by allowing the program to execute from the last event. When that fails >> > >> > to happen during replay, rr will assert and die. >> > >> > >> > >> > The following patches add an explicit check for a newly pending SIGKILL >> > >> > after the ptracer has been notified and the siglock has been reacquired. >> > >> > If this happens, we stop processing the current signal and proceed >> > >> > immediately to handling the SIGKILL. This makes the state reported at >> > >> > PTRACE_EVENT_EXIT the unmodified state of the program, and also avoids the >> > >> > work to set up a signal handler frame that will never be used. >> > >> > >> > >> > This issue was originally reported by the credited rr user. >> > >> > >> > >> > [0] https://rr-project.org/ >> > >> >> > >> If I read this correctly the problem is not precisely that the rr >> > >> debugger is never notified about the signal, but rather that the program >> > >> is killed with SIGKILL before rr can read the notification and see which >> > >> signal it is. >> > > >> > > The precise problem is that the kernel made a modification to the >> > > tracee state (setting up the signal handler frame) without telling the >> > > tracer about it (delivering the ptrace notification for the pending >> > > non-SIGKILL signal). >> > >> > Except the kernel did make it to ptrace_stop. The stop just did not >> > fully happen because of SIGKILL. I expect SIGCHLD was sent to the >> > tracer as part of that stop that never fully happened. >> >> I don't know whether SIGCHLD was sent to the tracer (rr doesn't use it >> directly) but waiting on the process does not produce a wait status >> corresponding to the signal delivery stop for the original signal. >> Waiting on the tracee skips immediately from whatever the preceding >> ptrace event was to the PTRACE_EVENT_EXIT. >> >> (In our particular case, if it had been notified of the signal, we >> would have chosen to suppress the signal, because the signal in >> question is a SIGSEGV from an rdtsc instruction that has been disabled >> via prctl(PR_SET_TSC, PR_TSC_SIGSEGV) and we emulate it in the tracer >> due to its non-deterministic behavior. So we really don't expect to >> see the tracee signal handler.) >> >> > > That can be fixed either by not modifying the >> > > tracee state here or by telling the tracer about the signal (that will >> > > never actually run). I suspect we'll all agree that the former seems >> > > preferable. >> > > >> > >> This definitely sounds like a quality of implementation issue. >> > >> >> > >> The solution that is proposed in your patches simply drops the signal >> > >> when SIGKILL is pending. >> > > >> > > That's right. >> > > >> > >> I think we can have a slightly better of quality of implementation >> > >> than that (as well as a simpler implementation) by requeuing the >> > >> signal instead of simply dropping it. Something like the below. >> > > >> > > What is the benefit of requeueing the signal? All pending signals will >> > > be dropped when the SIGKILL is processed, no? >> > >> > Not before PTRACE_EVENT_EXIT. In fact the pending signals are not >> > actually flushed until the thread or the entire process is reaped. >> > >> > Further the coredump code makes some attempt to write out the >> > pending signals. The code appears to predate siginfo support >> > in the kernel so it misses a lot but it is there. >> > >> > The real advantage is that it keeps the logic of dealing with weird >> > ptrace_stop logic in ptrace_signal where it belongs. It also allows the >> > common (and missing in this case) idiom of goto relock to be used. >> > >> > So I think changing ptrace_signal will be much more maintainable. >> >> Ok. >> >> > >> Can you test that and see if it works for you? >> > > >> > > It does not work. This triggers an infinite loop in get_signal, as we >> > > dequeue the signal, attempt to notify the ptracer, see the pending >> > > sigkill, requeue the signal, go around the loop, dequeue the original >> > > signal ... >> > >> > Apologies I made a bit of a thinko. That change also needs to change >> > the handling of if (signr == 0) after ptrace_signal. >> > >> > Which means it would need to be something like the below. >> > >> > diff --git a/kernel/signal.c b/kernel/signal.c >> > index 056a107e3cbc..eddb745b34a7 100644 >> > --- a/kernel/signal.c >> > +++ b/kernel/signal.c >> > @@ -2610,7 +2610,8 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info) >> > } >> > >> > /* If the (new) signal is now blocked, requeue it. */ >> > - if (sigismember(¤t->blocked, signr)) { >> > + if (sigismember(¤t->blocked, signr) || >> > + signal_group_exit(current->signal)) { >> > send_signal(signr, info, current, PIDTYPE_PID); >> > signr = 0; >> > } >> > @@ -2764,8 +2765,10 @@ bool get_signal(struct ksignal *ksig) >> > if (unlikely(current->ptrace) && (signr != SIGKILL) && >> > !(sighand->action[signr -1].sa.sa_flags & SA_IMMUTABLE)) { >> > signr = ptrace_signal(signr, &ksig->info); >> > - if (!signr) >> > - continue; >> > + if (!signr) { >> > + spin_unlock_irq(&sighand->siglock); >> > + goto relock; >> > + } >> > } >> > >> > ka = &sighand->action[signr-1]; >> > >> > Eric >> >> Yeah that appears to fix the issue. >> >> - Kyle > > Is there anything else I need to do here or are you going to take it > from here? No pinging me is the helpful thing to do. I sometimes get distracted. The merge window closes sometime later today and then v5.16-rc1 will be out. Then I will have a good base to work against. Until then I can't really merge anything. Eric