From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Oleg Nesterov, Steven Rostedt, Thomas Gleixner,
    Sebastian Andrzej Siewior, Sasha Levin, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, x86@kernel.org, peterz@infradead.org,
    juri.lelli@redhat.com, vincent.guittot@linaro.org, luto@kernel.org,
    frederic@kernel.org, mark.rutland@arm.com, valentin.schneider@arm.com,
    ebiederm@xmission.com, keescook@chromium.org, elver@google.com,
    legion@kernel.org
Subject: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels
Date: Mon, 28 Mar 2022 07:18:13 -0400
Message-Id: <20220328111828.1554086-29-sashal@kernel.org>
In-Reply-To: <20220328111828.1554086-1-sashal@kernel.org>
References: <20220328111828.1554086-1-sashal@kernel.org>
X-stable: review

From: Oleg Nesterov

[ Upstream commit bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 ]

On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST.
If we schedule out, another task can come in and use the same stack and
cause the stack to be corrupted and crash the kernel on return.

When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
one of these is the spin lock used in signal handling.

Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spinlock_t lock that has been converted to a
sleeping lock. If this happens, the above issue with the corrupted
stack is possible.

Instead of calling the signal right away, for PREEMPT_RT and x86,
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
  ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
[bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
[ tglx: Use a config option ]

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
Signed-off-by: Sasha Levin
---
 arch/x86/Kconfig       |  1 +
 include/linux/sched.h  |  3 +++
 kernel/Kconfig.preempt | 12 +++++++++++-
 kernel/entry/common.c  | 14 ++++++++++++++
 kernel/signal.c        | 40 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9f5bd41bf660..d557ac29b6cd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -120,6 +120,7 @@ config X86
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANT_HUGE_PMD_SHARE
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANTS_RT_DELAYED_SIGNALS
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..098e37fd770a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1087,6 +1087,9 @@ struct task_struct {
 	/* Restored if set_restore_sigmask() was used: */
 	sigset_t			saved_sigmask;
 	struct sigpending		pending;
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+	struct kernel_siginfo		forced_info;
+#endif
 	unsigned long			sas_ss_sp;
 	size_t				sas_ss_size;
 	unsigned int			sas_ss_flags;
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index ce77f0265660..5644abd5f8a8 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -132,4 +132,14 @@ config SCHED_CORE
 	  which is the likely usage by Linux distributions, there should
 	  be no measurable impact on performance.
 
-
+config ARCH_WANTS_RT_DELAYED_SIGNALS
+	bool
+	help
+	  This option is selected by architectures where raising signals
+	  can happen in atomic contexts on PREEMPT_RT enabled kernels. This
+	  option delays raising the signal until the return to user space
+	  loop where it is also delivered. X86 requires this to deliver
+	  signals from trap handlers which run on IST stacks.
+
+config RT_DELAYED_SIGNALS
+	def_bool PREEMPT_RT && ARCH_WANTS_RT_DELAYED_SIGNALS
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad713684c2e..0543a2c92f20 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -148,6 +148,18 @@ static void handle_signal_work(struct pt_regs *regs, unsigned long ti_work)
 	arch_do_signal_or_restart(regs, ti_work & _TIF_SIGPENDING);
 }
 
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+static inline void raise_delayed_signal(void)
+{
+	if (unlikely(current->forced_info.si_signo)) {
+		force_sig_info(&current->forced_info);
+		current->forced_info.si_signo = 0;
+	}
+}
+#else
+static inline void raise_delayed_signal(void) { }
+#endif
+
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
 {
@@ -162,6 +174,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
+		raise_delayed_signal();
+
 		if (ti_work & _TIF_UPROBE)
 			uprobe_notify_resume(regs);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 9b04631acde8..e93de6daa188 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1307,6 +1307,43 @@ enum sig_handler {
 	HANDLER_EXIT,	 /* Only visible as the process exit code */
 };
 
+/*
+ * On some architectures, PREEMPT_RT has to delay sending a signal from a
+ * trap since it cannot enable preemption, and the signal code's
+ * spin_locks turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME
+ * which will send the signal on exit of the trap.
+ */
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+static inline bool force_sig_delayed(struct kernel_siginfo *info,
+				     struct task_struct *t)
+{
+	if (!in_atomic())
+		return false;
+
+	if (WARN_ON_ONCE(t->forced_info.si_signo))
+		return true;
+
+	if (is_si_special(info)) {
+		WARN_ON_ONCE(info != SEND_SIG_PRIV);
+		t->forced_info.si_signo = info->si_signo;
+		t->forced_info.si_errno = 0;
+		t->forced_info.si_code = SI_KERNEL;
+		t->forced_info.si_pid = 0;
+		t->forced_info.si_uid = 0;
+	} else {
+		t->forced_info = *info;
+	}
+	set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+	return true;
+}
+#else
+static inline bool force_sig_delayed(struct kernel_siginfo *info,
+				     struct task_struct *t)
+{
+	return false;
+}
+#endif
+
 /*
  * Force a signal that the process can't ignore: if necessary
  * we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1327,6 +1364,9 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	struct k_sigaction *action;
 	int sig = info->si_signo;
 
+	if (force_sig_delayed(info, t))
+		return 0;
+
 	spin_lock_irqsave(&t->sighand->siglock, flags);
 	action = &t->sighand->action[sig-1];
 	ignored = action->sa.sa_handler == SIG_IGN;
-- 
2.34.1
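
For readers who want to trace the control flow outside the kernel, the
deferral described in the commit message can be modelled in user space.
This is only a sketch and not kernel code: struct model_task,
struct model_siginfo and every model_*() function are hypothetical
stand-ins invented here. The `atomic` field plays the role of
in_atomic(), `notify_resume` the role of TIF_NOTIFY_RESUME, and
model_deliver() the role of the siglock-protected delivery path.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kernel_siginfo (illustration only). */
struct model_siginfo {
	int si_signo;	/* 0 means "no signal parked" */
	int si_code;
};

/* Hypothetical stand-in for the relevant parts of struct task_struct. */
struct model_task {
	struct model_siginfo forced_info;	/* parked signal */
	bool notify_resume;			/* models TIF_NOTIFY_RESUME */
	bool atomic;				/* models in_atomic() */
	int delivered;				/* last delivered signal, 0 if none */
};

/* Models the locked delivery path, which may sleep on PREEMPT_RT. */
static void model_deliver(struct model_task *t, const struct model_siginfo *info)
{
	assert(!t->atomic);	/* sleeping locks are not allowed here */
	t->delivered = info->si_signo;
}

/* Park the signal if we are in atomic context; return true if parked. */
static bool model_force_sig_delayed(struct model_task *t,
				    const struct model_siginfo *info)
{
	if (!t->atomic)
		return false;
	if (t->forced_info.si_signo)
		return true;	/* one is already parked; the patch warns once */
	t->forced_info = *info;
	t->notify_resume = true;
	return true;
}

/* Entry point: defer when atomic, otherwise deliver immediately. */
static void model_force_sig(struct model_task *t, const struct model_siginfo *info)
{
	if (model_force_sig_delayed(t, info))
		return;
	model_deliver(t, info);
}

/* Models the exit-to-user loop: preemption is enabled again here. */
static void model_exit_to_user(struct model_task *t)
{
	t->atomic = false;
	if (t->forced_info.si_signo) {
		model_force_sig(t, &t->forced_info);
		t->forced_info.si_signo = 0;
	}
	t->notify_resume = false;
}
```

Calling model_force_sig() with `atomic` set parks the signal in
forced_info and leaves `delivered` untouched. A later
model_exit_to_user() call delivers it and clears the slot, which is the
same ordering the patch sets up between force_sig_delayed() and
raise_delayed_signal().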