Date: Wed, 20 Jul 2022 20:54:59 -0500
From: "Serge E. Hallyn"
To: Tycho Andersen
Cc: "Serge E. Hallyn", "Eric W. Biederman", Miklos Szeredi, linux-kernel@vger.kernel.org, Oleg Nesterov
Subject: Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING
Message-ID: <20220721015459.GA4297@mail.hallyn.com>
References: <20220713175305.1327649-1-tycho@tycho.pizza> <20220720150328.GA30749@mail.hallyn.com>

On Wed, Jul 20, 2022 at 02:58:42PM -0600, Tycho Andersen wrote:
> On Wed, Jul 20, 2022 at 10:03:28AM -0500, Serge E. Hallyn wrote:
> > On Wed, Jul 13, 2022 at 11:53:05AM -0600, Tycho Andersen wrote:
> > > The wait_* code uses signal_pending_state() to test whether a thread has
> > > been interrupted, which ultimately uses __fatal_signal_pending() to detect
> > > if there is a fatal signal.
> > >
> > > When a pid ns dies, it does:
> > >
> > >     group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
> > >
> > > for all the tasks in the pid ns. That calls through:
> > >
> > >     group_send_sig_info() ->
> > >       do_send_sig_info() ->
> > >         send_signal_locked() ->
> > >           __send_signal_locked()
> > >
> > > which does:
> > >
> > >     pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
> > >
> > > which puts sigkill in the set of shared signals, but not the individual
> > > pending ones. When complete_signal() is called at the end of
> > > __send_signal_locked(), if the task already had PF_EXITING (i.e. was
> > > already waiting on something in its fd closing path like a fuse flush),
> > > complete_signal() will not wake up the thread, since wants_signal() checks
> > > PF_EXITING before testing for SIGKILL.
> > >
> > > If tasks are stuck in a killable wait (e.g. a fuse flush operation), they
> > > won't see this shared signal, and will hang forever, since TIF_SIGPENDING
> > > is set, but the fatal signal can't be detected. So, let's also look for
> > > PF_EXITING in __fatal_signal_pending().
> > >
> > > Signed-off-by: Tycho Andersen
> >
> > Cool, thanks for nailing this down!
> >
> > I assume you've been running this on some boxes with no weird effects?
>
> Yes, but I haven't tested all the paths.
>
> > > ---
> > >  include/linux/sched/signal.h | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > > index cafbe03eed01..c20b7e1d89ef 100644
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -402,7 +402,8 @@ static inline int signal_pending(struct task_struct *p)
> > >
> > >  static inline int __fatal_signal_pending(struct task_struct *p)
> > >  {
> > > -	return unlikely(sigismember(&p->pending.signal, SIGKILL));
> > > +	return unlikely(sigismember(&p->pending.signal, SIGKILL) ||
> > > +			p->flags & PF_EXITING);
> >
> > Looking around at the callers this does seem safe, but the name does
> > now seem misleading. Should this be renamed to something like
> > exiting_or_fatal_signal_pending()?
>
> This is why I like my original patch better: it is just expanding the
> set of signals to include the shared signals, which are indeed still
> fatal pending signals for the task. I don't really understand Eric's
> argument about kernel threads ignoring SIGKILL, since kernel threads

Oh - I didn't either - checking the sigkill in shared signals *seems*
legit if they can be put there - but since you posted the new patch I
assumed his reasoning was clear to you. I know Eric's busy, cc:ing Oleg
for his interpretation too.

> can still ignore SIGKILL just fine after this patch.
>
> But yes, assuming Eric is ok with this version. I can send a v2 with
> the name change as you suggest.
>
> Thanks for looking.
>
> Tycho
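
For readers following the thread, the difference between the three checks
discussed above can be sketched in a userspace toy model. This is not kernel
code: every toy_* name and TOY_* constant below is hypothetical, the signal
sets are plain bitmasks, the thread group has a single thread, and
toy_send_group_sigkill() only approximates what the thread says about
__send_signal_locked() and complete_signal() (the signal is queued on the
shared set, and an exiting task is not given a private copy or a wake-up).

```c
#include <stdbool.h>

/* Hypothetical stand-ins; values are illustrative, not the kernel's ABI. */
#define TOY_SIGKILL    9
#define TOY_PF_EXITING 0x4u

struct toy_task {
	unsigned long pending;         /* models t->pending.signal */
	unsigned long *shared_pending; /* models t->signal->shared_pending.signal */
	unsigned int flags;            /* models t->flags */
};

static inline bool toy_sigismember(unsigned long set, int sig)
{
	return (set >> (sig - 1)) & 1UL;
}

static inline void toy_sigaddset(unsigned long *set, int sig)
{
	*set |= 1UL << (sig - 1);
}

/*
 * Simplified model of a group-wide SIGKILL: the signal always lands in
 * the shared set. Only a task that is not exiting "wants" the signal
 * and gets SIGKILL copied into its private set.
 */
static inline void toy_send_group_sigkill(struct toy_task *t)
{
	toy_sigaddset(t->shared_pending, TOY_SIGKILL);
	if (!(t->flags & TOY_PF_EXITING))
		toy_sigaddset(&t->pending, TOY_SIGKILL);
}

/* Check before the patch: private set only. */
static inline bool toy_fatal_pending_old(const struct toy_task *t)
{
	return toy_sigismember(t->pending, TOY_SIGKILL);
}

/* Idea of the original patch: also consult the shared set. */
static inline bool toy_fatal_pending_shared(const struct toy_task *t)
{
	return toy_sigismember(t->pending, TOY_SIGKILL) ||
	       toy_sigismember(*t->shared_pending, TOY_SIGKILL);
}

/* Idea of the patch quoted above: also treat PF_EXITING as fatal. */
static inline bool toy_fatal_pending_exiting(const struct toy_task *t)
{
	return toy_sigismember(t->pending, TOY_SIGKILL) ||
	       (t->flags & TOY_PF_EXITING) != 0;
}
```

In this model, calling toy_send_group_sigkill() on a task whose flags already
include TOY_PF_EXITING leaves toy_fatal_pending_old() false, which corresponds
to the hang described in the commit message, while both
toy_fatal_pending_shared() and toy_fatal_pending_exiting() return true. The
two variants differ for an exiting task that has never been sent SIGKILL: only
the PF_EXITING variant reports true, which is the naming concern raised above.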