From:   Tony Luck <tony.luck@intel.com>
To:     Thomas Gleixner <tglx@linutronix.de>
Cc:     Fenghua Yu <fenghua.yu@intel.com>, x86@kernel.org,
        linux-kernel@vger.kernel.org, patches@lists.linux.dev,
        Tony Luck <tony.luck@intel.com>
Subject: [PATCH] x86/split_lock: Make life miserable for split lockers
Date:   Wed, 16 Feb 2022 17:27:21 -0800
Message-Id: <20220217012721.9694-1-tony.luck@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

In https://lore.kernel.org/all/87y22uujkm.ffs@tglx/ Thomas
said:

  It's simply wishful thinking that stuff gets fixed because of a
  WARN_ONCE(). This has never worked. The only thing which works is to
  make stuff fail hard or slow it down in a way which makes it annoying
  enough to users to complain.

He was talking about WBINVD. But it made me think about how we
use the split lock detection feature in Linux.

Existing code has three options for applications:
1) Don't enable split lock detection (allow arbitrary split locks)
2) Warn once when a process uses split lock, but let the process
   keep running with split lock detection disabled
3) Kill processes that use split locks

Option 2 falls into the "wishful thinking" territory that Thomas
warns does nothing. But option 3 might not be viable in a situation
with legacy applications that need to run.

Hence a new option to "slow it down in a way which makes it annoying".

The primary reason for this change is to provide better quality of
service to the rest of the applications running on the system. Internal
testing shows that even with many processes splitting locks, the rest
of the system remains much more responsive.

Add a new choice, "sequential", to the existing "split_lock_detect"
boot parameter. In this mode split lock detection is enabled. When an
application tries to execute a bus lock, the #AC handler:

1) Blocks (interruptibly) until it can get the semaphore
	If interrupted, just return. Assume the signal will either
	kill the task, or direct execution away from the instruction
	that is trying to get the bus lock.
2) Disables split lock detection for the current core
3) Schedules a work queue to re-enable split lock detection in 2 jiffies
4) Returns

The work queue that re-enables split lock detection also releases the
semaphore.
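
As a rough illustration only (not part of the patch), the sequence
above can be modeled in user space. This is a single-threaded sketch:
buslock_count stands in for buslock_sem, detect_enabled[] stands in for
the per-core detection MSR bit, and NR_CORES and all model_* names are
hypothetical. Where the patch blocks in down_interruptible(), the model
returns -1 instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CORES 4	/* hypothetical core count for the model */

/* Stand-in for buslock_sem: 1 means free, 0 means held. */
static int buslock_count;
/* Stand-in for the per-core split lock detection MSR bit. */
static bool detect_enabled[NR_CORES];

void model_init(void)
{
	buslock_count = 1;
	for (int i = 0; i < NR_CORES; i++)
		detect_enabled[i] = true;
}

/*
 * Model of the #AC handler in "sequential" mode. The patch blocks in
 * down_interruptible(); this single-threaded model returns -1 instead
 * of blocking when another core already holds the semaphore.
 */
int model_split_lock_trap(int core)
{
	assert(core >= 0 && core < NR_CORES);
	if (buslock_count == 0)
		return -1;
	buslock_count--;		/* step 1: take the semaphore */
	detect_enabled[core] = false;	/* step 2: disable detection */
	return 0;			/* steps 3-4: work is queued, return */
}

/* Model of the delayed work: re-enable detection, release semaphore. */
void model_reenable(int core)
{
	assert(core >= 0 && core < NR_CORES);
	detect_enabled[core] = true;
	buslock_count++;
}

/* Number of cores currently running with detection disabled. */
int model_cores_disabled(void)
{
	int n = 0;

	for (int i = 0; i < NR_CORES; i++)
		if (!detect_enabled[i])
			n++;
	return n;
}
```

Calling model_init(), then model_split_lock_trap(0) followed by
model_split_lock_trap(1), shows the intended property: the second call
fails until model_reenable(0) has run, so at most one core has detection
disabled at any time.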

There is a corner case where a CPU may be taken offline while
split lock detection is disabled. A CPU hotplug handler handles
this case.

Questions for this RFC:

1) Does this need to be a new option? Maybe just update the
   existing "warn" mode to add this level of extra pain.
2) Under what circumstances will a work function scheduled with
   schedule_delayed_work() run on a different CPU? I've covered
   the obvious case of the CPU being taken offline before the
   work is run. But are there other cases?
3) Should I add even more pain with an msleep() before even trying
   to get the semaphore?

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 67 ++++++++++++++++++++++++++++++++-----
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..a331c4a71847 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -7,10 +7,13 @@
 #include <linux/smp.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
+#include <linux/semaphore.h>
 #include <linux/thread_info.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
+#include <linux/workqueue.h>
 #include <linux/delay.h>
+#include <linux/cpuhotplug.h>
 
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -43,6 +46,7 @@ enum split_lock_detect_state {
 	sld_warn,
 	sld_fatal,
 	sld_ratelimit,
+	sld_sequential,
 };
 
 /*
@@ -1002,10 +1006,13 @@ static const struct {
 	{ "warn",	sld_warn  },
 	{ "fatal",	sld_fatal },
 	{ "ratelimit:", sld_ratelimit },
+	{ "sequential", sld_sequential },
 };
 
 static struct ratelimit_state bld_ratelimit;
 
+static DEFINE_SEMAPHORE(buslock_sem);
+
 static inline bool match_option(const char *arg, int arglen, const char *opt)
 {
 	int len = strlen(opt), ratelimit;
@@ -1045,7 +1052,7 @@ static bool split_lock_verify_msr(bool on)
 
 static void __init sld_state_setup(void)
 {
-	enum split_lock_detect_state state = sld_warn;
+	enum split_lock_detect_state state = sld_sequential;
 	char arg[20];
 	int i, ret;
 
@@ -1116,23 +1123,60 @@ static void split_lock_init(void)
 		split_lock_verify_msr(sld_state != sld_off);
 }
 
+static void __split_lock_reenable(struct work_struct *work)
+{
+	sld_update_msr(true);
+	up(&buslock_sem);
+}
+
+/*
+ * If a CPU goes offline with pending delayed work to
+ * re-enable split lock detection then the delayed work
+ * will be executed on some other CPU. That handles releasing
+ * the buslock_sem, but because it executes on a different
+ * CPU probably won't re-enable split lock detection. This
+ * is a problem on HT systems since the sibling CPU on the
+ * same core may then be left running with split lock
+ * detection disabled.
+ *
+ * Unconditionally re-enable detection here.
+ */
+static int splitlock_cpu_offline(unsigned int cpu)
+{
+	sld_update_msr(true);
+
+	return 0;
+}
+
+static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable);
+
 static void split_lock_warn(unsigned long ip)
 {
 	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
 			    current->comm, current->pid, ip);
 
-	/*
-	 * Disable the split lock detection for this task so it can make
-	 * progress and set TIF_SLD so the detection is re-enabled via
-	 * switch_to_sld() when the task is scheduled out.
-	 */
+	switch (sld_state) {
+	case sld_warn:
+		/* This task will keep running with split lock disabled */
+		set_tsk_thread_flag(current, TIF_SLD);
+		break;
+	case sld_sequential:
+		/* Only allow one buslocked disabled core at a time */
+		if (down_interruptible(&buslock_sem) == -EINTR)
+			return;
+		schedule_delayed_work(&split_lock_reenable, 2);
+		break;
+	default:
+		break;
+	}
+
+	/* Disable split lock detection to make progress */
 	sld_update_msr(false);
-	set_tsk_thread_flag(current, TIF_SLD);
 }
 
 bool handle_guest_split_lock(unsigned long ip)
 {
-	if (sld_state == sld_warn) {
+	if (sld_state == sld_warn || sld_state == sld_sequential) {
 		split_lock_warn(ip);
 		return true;
 	}
@@ -1191,6 +1235,7 @@ void handle_bus_lock(struct pt_regs *regs)
 		/* Warn on the bus lock. */
 		fallthrough;
 	case sld_warn:
+	case sld_sequential:
 		pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
 				    current->comm, current->pid, regs->ip);
 		break;
@@ -1299,6 +1344,12 @@ static void sld_state_show(void)
 		if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
 			pr_info("#DB: setting system wide bus lock rate limit to %u/sec\n", bld_ratelimit.burst);
 		break;
+	case sld_sequential:
+		pr_info("#AC: crashing the kernel on kernel split_locks and forcing sequential access for user-space split locks\n");
+		if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				      "x86/splitlock", NULL, splitlock_cpu_offline) < 0)
+			pr_warn("No splitlock CPU offline handler\n");
+		break;
 	}
 }
 
-- 
2.35.1