Received: by 2002:a05:6512:2355:0:0:0:0 with SMTP id p21csp5517005lfu; Mon, 28 Mar 2022 15:49:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyMm6i/29LShsuYh3olWpxOLmklu0dYzu9lM549dWrYW7PDp4TdaLek0Sode70oifwDRxv4 X-Received: by 2002:a65:57ca:0:b0:381:ea8d:4d1f with SMTP id q10-20020a6557ca000000b00381ea8d4d1fmr11888676pgr.143.1648507766686; Mon, 28 Mar 2022 15:49:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1648507766; cv=none; d=google.com; s=arc-20160816; b=WJG7W1px09/nxZPh1s5Ag+VvCDP1gmty4CzIr/wLKxqdqhPuc8s35ClNzmJEUItfMa Ua5NmkZBr+5NUp4u+YaP7Gf2840SQdvsn2M322Fm/OrBOkY+V2muhBscLFMbGbigYfiN 7IuxT4BHVXG5LSx60cyjwZIJg1K9UzvwmySHCkQABwaSOQD5q+ErMjVcpl4RU3FpoPZK ES3eb6WzGv3eG9wiCExnK+58nPJzrL8SuKoqaKnctWz8M/bN4mUm6Q6v1wpoDhGvBIWJ cRUvuytyqWO2wxD1kVS/p2eUYNRNrixbMUshGpejyxNpAEF+S7z3tRKpXPkNQBDs+/Ld T3ew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:subject:mime-version:user-agent:message-id :in-reply-to:date:references:cc:to:from; bh=GmpRoagLIkpu2q689h+cR1pPiBRbMMyiA0KdlQ/diAw=; b=kCb+FZ8Z4JZC8mRgctcoOcWixAjQFpHL7fOXZo7yfuCuwPRNZ4sU0ztBZiO5PYcfLL 5o4tKMvwgbk+rkRMoKCM0cWrgYpvJpP+CMpp/gmOfTNXacdEvRTvnb5Yleg2txff7cFf I2SfStBXvOChEaSsNnPO9lYNNmawWivFw7wa3WmYoyxE8wChgLscqepwuSdi2iPGUTx1 Z8lDfu556PxV17Gc92uPznQS5JNpUcXuPI2E5yTOPwUSUfTsbeOl8hmJBZE8dSoVIXKQ 4CFu0B7jH4TX1qxDkvzc/3MXvsuPj8LOrRdvl6yxGzcTpptCIYeR6a3aGZJY7q7UbDFN ZaVA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=xmission.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id p3-20020a170902e74300b00153b2d164c3si16497061plf.203.2022.03.28.15.49.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 28 Mar 2022 15:49:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=xmission.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 276D6261327; Mon, 28 Mar 2022 14:56:33 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241694AbiC1O1R (ORCPT + 99 others); Mon, 28 Mar 2022 10:27:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50114 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230240AbiC1O1P (ORCPT ); Mon, 28 Mar 2022 10:27:15 -0400 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B6451EECF for ; Mon, 28 Mar 2022 07:25:34 -0700 (PDT) Received: from in02.mta.xmission.com ([166.70.13.52]:36834) by out03.mta.xmission.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1nYqJG-003OQK-D2; Mon, 28 Mar 2022 08:25:30 -0600 Received: from ip68-227-174-4.om.om.cox.net ([68.227.174.4]:41434 helo=email.froward.int.ebiederm.org.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1nYqJE-002YCW-Uq; Mon, 28 Mar 2022 08:25:30 -0600 From: "Eric W. Biederman" To: Sebastian Andrzej Siewior Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Oleg Nesterov , "H. Peter Anvin" , Andy Lutomirski , Ben Segall , Borislav Petkov , Daniel Bristot de Oliveira , Dave Hansen , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Mel Gorman , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Vincent Guittot References: Date: Mon, 28 Mar 2022 09:25:06 -0500 In-Reply-To: (Sebastian Andrzej Siewior's message of "Mon, 14 Feb 2022 21:19:52 +0100") Message-ID: <8735j2xigt.fsf@email.froward.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1nYqJE-002YCW-Uq;;;mid=<8735j2xigt.fsf@email.froward.int.ebiederm.org>;;;hst=in02.mta.xmission.com;;;ip=68.227.174.4;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1+DSbZxTMSwJ8KUAlJYxFOHmS2cVk8VvCo= X-SA-Exim-Connect-IP: 68.227.174.4 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-DCC: XMission; sa06 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Sebastian Andrzej Siewior X-Spam-Relay-Country: X-Spam-Timing: total 876 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 11 (1.2%), b_tie_ro: 9 (1.1%), parse: 1.17 (0.1%), extract_message_metadata: 18 (2.1%), get_uri_detail_list: 3.8 (0.4%), tests_pri_-1000: 9 (1.0%), tests_pri_-950: 1.30 (0.1%), tests_pri_-900: 1.10 (0.1%), tests_pri_-90: 153 (17.5%), check_bayes: 139 (15.9%), b_tokenize: 15 (1.7%), b_tok_get_all: 13 (1.5%), b_comp_prob: 3.6 (0.4%), b_tok_touch_all: 102 (11.7%), b_finish: 1.04 (0.1%), tests_pri_0: 659 (75.2%), check_dkim_signature: 0.97 (0.1%), check_dkim_adsp: 8 (0.9%), poll_dns_idle: 2.3 (0.3%), tests_pri_10: 2.7 (0.3%), tests_pri_500: 16 (1.8%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH] signal/x86: Delay calling signals in atomic X-SA-Exim-Version: 4.2.1 (built Sat, 08 Feb 2020 21:53:50 +0000) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Sebastian Andrzej Siewior writes: > From: Oleg Nesterov > Date: Tue, 14 Jul 2015 14:26:34 +0200 > > On x86_64 we must disable preemption before we enable interrupts > for stack faults, int3 and debugging, because the current task is using > a per CPU debug stack defined by the IST. If we schedule out, another task > can come in and use the same stack and cause the stack to be corrupted > and crash the kernel on return. > > When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and > one of these is the spin lock used in signal handling. > > Some of the debug code (int3) causes do_trap() to send a signal. > This function calls a spinlock_t lock that has been converted to a > sleeping lock. If this happens, the above issues with the corrupted > stack is possible. > > Instead of calling the signal right away, for PREEMPT_RT and x86, > the signal information is stored on the stacks task_struct and > TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume > code will send the signal when preemption is enabled. Folks I really would have appreciated being copied on a signal handling patch like this. It is too late to nack, but I think this buggy patch deserved one. Can we please fix PREEMPT_RT instead? As far as I can tell this violates all of rules from implementing/maintaining the RT kernel. Instead of coming up with new abstractions that makes sense and can use by everyone this introduces a hack only for PREEMPT_RT and a pretty horrible one at that. This talks about int3, but the code looks for in_atomic(). Which means that essentially every call of force_sig will take this path as they almost all come from exception handlers. It is the nature of signals that report on faults. An exception is raised and the kernel reports it to userspace with a fault signal (aka force_sig_xxx). Further this code is buggy. TIF_NOTIFY_RESUME is not the correct flag to set to enter into exit_to_usermode_loop. TIF_NOTIFY_RESUME is about that happens after signal handling. This very much needs to be TIF_SIGPENDING with recalc_sigpending and friends updated to know about "task->force_info". Does someone own this problem? Can that person please fix this properly? I really don't think it is going to be maintainable for PREEMPT_RT to maintain a separate signal delivery path for faults from the rest of linux. Eric > [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to > ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] > [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ] > > Signed-off-by: Oleg Nesterov > Signed-off-by: Steven Rostedt > Signed-off-by: Thomas Gleixner > Signed-off-by: Sebastian Andrzej Siewior > --- > arch/x86/include/asm/signal.h | 13 +++++++++++++ > include/linux/sched.h | 3 +++ > kernel/entry/common.c | 9 +++++++++ > kernel/signal.c | 28 ++++++++++++++++++++++++++++ > 4 files changed, 53 insertions(+) > > diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h > index 2dfb5fea13aff..fc03f4f7ed84c 100644 > --- a/arch/x86/include/asm/signal.h > +++ b/arch/x86/include/asm/signal.h > @@ -28,6 +28,19 @@ typedef struct { > #define SA_IA32_ABI 0x02000000u > #define SA_X32_ABI 0x01000000u > > +/* > + * Because some traps use the IST stack, we must keep preemption > + * disabled while calling do_trap(), but do_trap() may call > + * force_sig_info() which will grab the signal spin_locks for the > + * task, which in PREEMPT_RT are mutexes. By defining > + * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set > + * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the > + * trap. > + */ > +#if defined(CONFIG_PREEMPT_RT) > +#define ARCH_RT_DELAYS_SIGNAL_SEND > +#endif > + > #ifndef CONFIG_COMPAT > #define compat_sigset_t compat_sigset_t > typedef sigset_t compat_sigset_t; > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 75ba8aa60248b..0514237cee3fc 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1087,6 +1087,9 @@ struct task_struct { > /* Restored if set_restore_sigmask() was used: */ > sigset_t saved_sigmask; > struct sigpending pending; > +#ifdef CONFIG_PREEMPT_RT > + struct kernel_siginfo forced_info; > +#endif > unsigned long sas_ss_sp; > size_t sas_ss_size; > unsigned int sas_ss_flags; > diff --git a/kernel/entry/common.c b/kernel/entry/common.c > index bad713684c2e3..216dbf46e05f5 100644 > --- a/kernel/entry/common.c > +++ b/kernel/entry/common.c > @@ -162,6 +162,15 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs, > if (ti_work & _TIF_NEED_RESCHED) > schedule(); > > +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND > + if (unlikely(current->forced_info.si_signo)) { > + struct task_struct *t = current; > + > + force_sig_info(&t->forced_info); > + t->forced_info.si_signo = 0; > + } > +#endif > + > if (ti_work & _TIF_UPROBE) > uprobe_notify_resume(regs); > > diff --git a/kernel/signal.c b/kernel/signal.c > index 9b04631acde8f..cb2b28c17c0a5 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1327,6 +1327,34 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, > struct k_sigaction *action; > int sig = info->si_signo; > > + /* > + * On some archs, PREEMPT_RT has to delay sending a signal from a trap > + * since it can not enable preemption, and the signal code's spin_locks > + * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will > + * send the signal on exit of the trap. > + */ > +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND > + if (in_atomic()) { > + struct task_struct *t = current; > + > + if (WARN_ON_ONCE(t->forced_info.si_signo)) > + return 0; > + > + if (is_si_special(info)) { > + WARN_ON_ONCE(info != SEND_SIG_PRIV); > + t->forced_info.si_signo = info->si_signo; > + t->forced_info.si_errno = 0; > + t->forced_info.si_code = SI_KERNEL; > + t->forced_info.si_pid = 0; > + t->forced_info.si_uid = 0; > + } else { > + t->forced_info = *info; > + } > + > + set_tsk_thread_flag(t, TIF_NOTIFY_RESUME); > + return 0; > + } > +#endif > spin_lock_irqsave(&t->sighand->siglock, flags); > action = &t->sighand->action[sig-1]; > ignored = action->sa.sa_handler == SIG_IGN;