Date: Wed, 20 Jul 2022 10:03:28 -0500
From: "Serge E. Hallyn"
To: Tycho Andersen
Cc: "Eric W.
 Biederman", Miklos Szeredi, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING
Message-ID: <20220720150328.GA30749@mail.hallyn.com>
References: <20220713175305.1327649-1-tycho@tycho.pizza>
In-Reply-To: <20220713175305.1327649-1-tycho@tycho.pizza>

On Wed, Jul 13, 2022 at 11:53:05AM -0600, Tycho Andersen wrote:
> The wait_* code uses signal_pending_state() to test whether a thread has
> been interrupted, which ultimately uses __fatal_signal_pending() to detect
> if there is a fatal signal.
>
> When a pid ns dies, it does:
>
>     group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
>
> for all the tasks in the pid ns. That calls through:
>
>     group_send_sig_info() ->
>       do_send_sig_info() ->
>         send_signal_locked() ->
>           __send_signal_locked()
>
> which does:
>
>     pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
>
> which puts sigkill in the set of shared signals, but not the individual
> pending ones. When complete_signal() is called at the end of
> __send_signal_locked(), if the task already had PF_EXITING (i.e. was
> already waiting on something in its fd closing path like a fuse flush),
> complete_signal() will not wake up the thread, since wants_signal() checks
> PF_EXITING before testing for SIGKILL.
>
> If tasks are stuck in a killable wait (e.g. a fuse flush operation), they
> won't see this shared signal, and will hang forever, since TIF_SIGPENDING
> is set, but the fatal signal can't be detected. So, let's also look for
> PF_EXITING in __fatal_signal_pending().
>
> Signed-off-by: Tycho Andersen

Cool, thanks for nailing this down!  I assume you've been running this
on some boxes with no weird effects?

> ---
>  include/linux/sched/signal.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index cafbe03eed01..c20b7e1d89ef 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -402,7 +402,8 @@ static inline int signal_pending(struct task_struct *p)
>
>  static inline int __fatal_signal_pending(struct task_struct *p)
>  {
> -	return unlikely(sigismember(&p->pending.signal, SIGKILL));
> +	return unlikely(sigismember(&p->pending.signal, SIGKILL) ||
> +			p->flags & PF_EXITING);

Looking around at the callers this does seem safe, but the name does
now seem misleading.  Should this be renamed to something like
exiting_or_fatal_signal_pending()?

>  }
>
>  static inline int fatal_signal_pending(struct task_struct *p)
>
> base-commit: 32346491ddf24599decca06190ebca03ff9de7f8
> --
> 2.34.1
>
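
For anyone following along, the failure mode in the quoted commit message
can be modelled in user space. This is only a sketch under stated
assumptions: every model_* name, the bitmask representation of the pending
sets, and the collapsing of complete_signal() into a single step are
hypothetical simplifications, not kernel code. What it takes from the patch
is the ordering (PF_EXITING is checked before SIGKILL in wants_signal()) and
the before/after forms of the __fatal_signal_pending() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's SIGKILL and PF_EXITING. */
#define MODEL_SIGKILL    9
#define MODEL_PF_EXITING 0x4u

/* Toy task: one flags word plus per-thread and shared pending sets. */
struct model_task {
	unsigned int flags;
	uint64_t pending;        /* models t->pending.signal */
	uint64_t shared_pending; /* models t->signal->shared_pending.signal */
};

/* Signals are numbered from 1, so signal n lives in bit n - 1. */
static uint64_t sig_bit(int sig)
{
	assert(sig >= 1 && sig <= 64);
	return UINT64_C(1) << (sig - 1);
}

/* Same ordering as the commit message: PF_EXITING is tested first. */
static bool model_wants_signal(int sig, const struct model_task *t)
{
	if (t->flags & MODEL_PF_EXITING)
		return false;
	return sig == MODEL_SIGKILL;
}

/*
 * Group-wide SIGKILL: queue on the shared set, then (simplified) let the
 * completion step copy it to the per-thread set only if the task wants it.
 */
static void model_group_send_sigkill(struct model_task *t)
{
	t->shared_pending |= sig_bit(MODEL_SIGKILL);
	if (!model_wants_signal(MODEL_SIGKILL, t))
		return; /* exiting task: never woken, per-thread set untouched */
	t->pending |= sig_bit(MODEL_SIGKILL);
}

/* Check before the patch: only the per-thread pending set is consulted. */
static bool fatal_pending_before(const struct model_task *t)
{
	return (t->pending & sig_bit(MODEL_SIGKILL)) != 0;
}

/* Check after the patch: an exiting task also counts as fatally signalled. */
static bool fatal_pending_after(const struct model_task *t)
{
	return (t->pending & sig_bit(MODEL_SIGKILL)) != 0 ||
	       (t->flags & MODEL_PF_EXITING) != 0;
}
```

Sending the group SIGKILL to a model task that already has MODEL_PF_EXITING
set leaves the signal in shared_pending only, so fatal_pending_before()
stays false (the hang), while fatal_pending_after() returns true. A task
without the flag is reported as fatally signalled by both checks.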