2018-05-27 15:51:18

by Fenghua Yu

Subject: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

==Introduction==

A split lock is any atomic operation whose operand crosses two cache
lines. Since the operand spans two cache lines and the operation must
be atomic, the system locks the bus while the CPU accesses the two cache
lines.

While the bus is locked, requests from other CPUs or bus agents for control
of the bus are blocked. Blocking bus access from other CPUs, plus the
overhead of configuring the bus locking protocol, degrades not only the
performance of the one CPU but overall system performance.

If the operand is cacheable and completely contained in one cache line, the
atomic operation is optimized with less expensive cache locking on Intel P6
and later processors. If a split lock is detected and the operand is
realigned to fit in a single cache line, cache locking instead of the more
expensive bus locking is used for the atomic operation. Removing split
locks can therefore improve overall performance.

Instructions that may cause a split lock include lock add, lock btc,
xchg, lsl, far call, ltr, etc.
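
For example, a split locked access can be produced from user space with a
short C program like the one below (a minimal sketch, mirroring the selftest
added in patch 16 of this series): a 4-byte operand is placed 61 bytes into
a 64-byte aligned buffer so it straddles a cache line boundary, and a
lock-prefixed read-modify-write on it forces a bus lock.

#include <stdlib.h>

int main(void)
{
        /* 64-byte aligned buffer; offset 61 leaves only 3 bytes in the line. */
        char *buf = aligned_alloc(64, 128);
        int *p;

        if (!buf)
                return 1;

        /* The 4-byte operand spans two cache lines. */
        p = (int *)(buf + 61);
        *p = 0;

        /* Locked read-modify-write on a split operand: a split locked access. */
        asm volatile("lock addl $1, %0" : "+m" (*p));

        free(buf);
        return 0;
}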

More information about split lock, bus locking, and cache locking can be
found in the latest Intel 64 and IA-32 Architecture Software Developer's
Manual.

==#AC for split lock==

Currently we can trace the split lock event counter for debug purposes. But
for systems deployed in the field, an after-the-fact event counter is
insufficient. We need a mechanism that allows the system to ensure that a
bus lock is never incurred due to a split lock.

Intel introduces a mechanism to detect split locks via the alignment
check exception in Tremont and other future processors. If the split lock
comes from a user process, the #AC handler can kill the process or
re-execute the faulting instruction, depending on configuration. If the
split lock comes from the kernel, the handler can panic the kernel or
re-execute the faulting instruction, depending on configuration.

This capability is critical for real time system designers who build
consolidated real time systems. These systems run hard real time
code on some cores and run "untrusted" user processes on some
other cores. To date the designers have been unable to deploy these
solutions because they have had no way to prevent the "untrusted" user code
from generating split locks and bus locks that block the hard real time
code from accessing memory while the bus is locked.

This capability may also find use in the cloud. A user process running in
one guest that generates a split lock can block other cores from accessing
shared memory during its split locked memory access, degrading overall
system performance.

Split locks may also open a security hole: malicious user code can slow
down the overall system by executing split locked instructions.

==Detect Split Lock==

To detect split locks, a new control bit (bit 29) in the per-core TEST_CTL
MSR 0x33 will be introduced in future x86 processors. When bit 29 is set,
the processor raises an #AC exception for split locked accesses at all
CPLs.

The bit 29 specification in MSR TEST_CTL is published in the latest
Intel Architecture Instruction Set Extensions and Future Features
Programming Reference.
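
Condensed, the enumeration added in patch 1 amounts to probing whether the
bit is writable (a sketch only; the MSR constants are the ones this series
defines):

void detect_split_lock_ac(void)
{
        u64 val;

        /* The MSR may not exist on older processors. */
        if (rdmsrl_safe(MSR_TEST_CTL, &val))
                return;

        /* If bit 29 is reserved, the write faults and we bail out. */
        if (wrmsrl_safe(MSR_TEST_CTL, val | MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK))
                return;

        setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_AC);

        /* Restore whatever firmware had programmed. */
        wrmsrl(MSR_TEST_CTL, val);
}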

==Handle Split Lock==

BIOS or hardware may set or clear the control bit depending on the
platform. To avoid disturbing the BIOS/hardware setting, the kernel by
default inherits the BIOS split lock setting with
CONFIG_SPLIT_LOCK_AC_ENABLE_DEFAULT=2.

The kernel can override the BIOS setting by explicitly enabling or
disabling the feature with CONFIG_SPLIT_LOCK_AC_ENABLE_DEFAULT=0 (disable)
or 1 (enable).

When an instruction makes a split locked access and triggers an #AC
exception, the faulting instruction is handled as follows (a sketch of this
dispatch follows the list):
- If the faulting instruction is from the kernel, it is re-executed by
default. If configured, a split lock can instead cause a kernel panic.
- If the faulting instruction is from a user process, the process gets a
SIGBUS signal by default. If configured, this behavior can be changed to
re-execute the faulting user instruction.
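
Schematically the dispatch looks like the sketch below (illustrative only,
not the actual handler code; the user mode re-execute choice is passed in
as a flag here, while the real patches read it from a debugfs-controlled
setting):

static bool split_lock_reexecute(struct pt_regs *regs, bool user_reexecute)
{
        if (!user_mode(regs)) {
                if (IS_ENABLED(CONFIG_SPLIT_LOCK_AC_PANIC_ON_KERNEL))
                        panic("split lock #AC in kernel mode");
                return true;            /* kernel: re-execute by default */
        }

        /* User mode: SIGBUS by default, re-execute only if configured. */
        return user_reexecute;
}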

We do see the #AC exception being triggered and causing a system hang in
BIOS paths (e.g. during system reboot) after the kernel enables the
feature. Instead of debugging potential system hangs due to split locked
accesses in various buggy BIOSes, the kernel only keeps the feature enabled
within the kernel domain. Once execution leaves the kernel domain (i.e. S3,
S4, S5, EFI runtime services, kexec, kdump, CPU offline, etc.), the kernel
restores the BIOS setting. When returning from the BIOS, the kernel
restores the kernel setting.

In cases where the user does want to detect and fix split lock issues in
the BIOS (e.g. in hard real time systems), the user can enable #AC for
split lock in firmware using the debugfs interface
/sys/kernel/debug/x86/split_lock/firmware.

Since the kernel doesn't know when an SMI arrives, it's impossible for the
kernel to disable #AC for split lock before entering SMM. So the SMI
handler may inherit the kernel's split lock setting, and a kernel tester
may end up debugging split lock issues in SMM.

==Tests==

- /sys/kernel/debug/x86/split_lock/test_kernel (in patch 15) tests kernel
space split lock.
- selftest (in patch 16) tests user space split lock.
- perf traces the sq_misc.split_lock event.
- S3, S4, S5, CPU hotplug, and kexec tests with split lock enabled.

==Changelog==
In this version:
Comments from Dave Hansen:
- Enumerate feature in X86_FEATURE_SPLIT_LOCK_AC
- Separate #AC handler from do_error_trap
- Use CONFIG to configure whether to inherit the BIOS setting, enable, or
disable split lock. Remove the kernel parameter "split_lock_ac="
- Change config interface to debugfs from sysfs
- Fix a few bisectable issues
- Other changes.

Comment from Tony Luck and Dave Hansen:
- Dump right information in #AC handler

Comment from Alan Cox and Dave Hansen:
- Description of split lock in patch 0

Others:
- Remove tracing because we can trace split lock in existing
sq_misc.split_lock.
- Add CONFIG to configure either panic or re-execute faulting instruction
for split lock in kernel.
- other minor changes.

Fenghua Yu (16):
x86/split_lock: Add CONFIG and enumerate #AC exception for split
locked access feature
x86/split_lock: Handle #AC exception for split lock in kernel mode
x86/split_lock: Set up #AC exception for split locked accesses on all
CPUs
x86/split_lock: Use non locked bit set instruction in set_cpu_cap
x86/split_lock: Use non atomic set and clear bit instructions in
clear_cpufeature()
x86/split_lock: Save #AC setting for split lock in firmware in boot
time and restore the setting in reboot
x86/split_lock: Handle suspend/hibernate and resume
x86/split_lock: Set split lock during EFI runtime service
x86/split_lock: Add CONFIG to control #AC for split lock at boot time
x86/split_lock: Add a debugfs interface to allow user to enable or
disable #AC for split lock during run time
x86/split_lock: Add CONFIG to control #AC for split lock from kernel
at boot time
x86/split_lock: Add a debugfs interface to allow user to change how to
handle split lock in kernel mode during run time
x86/split_lock: Add debugfs interface to control user mode behavior
x86/split_lock: Add debugfs interface to show and control firmware
setting for split lock
x86/split_lock: Add CONFIG and debugfs interface for testing #AC for
split lock in kernel mode
x86/split_lock: Add user space split lock test in selftest

arch/x86/Kconfig | 49 ++
arch/x86/include/asm/cpu.h | 18 +
arch/x86/include/asm/cpufeature.h | 3 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/efi.h | 5 +
arch/x86/include/asm/msr-index.h | 4 +
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/cpuid-deps.c | 10 +-
arch/x86/kernel/cpu/test_ctl.c | 665 +++++++++++++++++++++
arch/x86/kernel/setup.c | 2 +
arch/x86/kernel/traps.c | 30 +-
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/split_lock_user_test.c | 207 +++++++
14 files changed, 992 insertions(+), 8 deletions(-)
create mode 100644 arch/x86/kernel/cpu/test_ctl.c
create mode 100644 tools/testing/selftests/x86/split_lock_user_test.c

--
2.5.0



2018-05-27 15:47:08

by Fenghua Yu

Subject: [RFC PATCH 03/16] x86/split_lock: Set up #AC exception for split locked accesses on all CPUs

#AC for split lock is set up when each CPU is brought up. By default, the
kernel disables the feature (i.e. clears bit 29 in the TEST_CTL MSR) on
each CPU.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 3 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/cpu/test_ctl.c | 14 ++++++++++++++
3 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 083ef6d05c45..00f453fd44ac 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -43,6 +43,7 @@ unsigned int x86_stepping(unsigned int sig);
#ifdef CONFIG_SPLIT_LOCK_AC
void detect_split_lock_ac(void);
bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code);
+void setup_split_lock(void);
#else /* CONFIG_SPLIT_LOCK_AC */
static inline void detect_split_lock_ac(void) {}
static inline bool
@@ -50,5 +51,7 @@ do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
{
return false;
}
+
+static inline void setup_split_lock(void) {}
#endif /* CONFIG_SPLIT_LOCK_AC */
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b02d90320ec6..af3f23170503 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1382,6 +1382,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Init Machine Check Exception if available. */
mcheck_cpu_init(c);

+ setup_split_lock();
+
select_idle_routine(c);

#ifdef CONFIG_NUMA
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 82f110759662..c72c0517c6ab 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -28,6 +28,8 @@ static DEFINE_PER_CPU(struct delayed_work, reenable_delayed_work);
static unsigned long disable_split_lock_jiffies;
static DEFINE_MUTEX(reexecute_split_lock_mutex);

+static int split_lock_ac_kernel = DISABLE_SPLIT_LOCK_AC;
+
/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
void detect_split_lock_ac(void)
{
@@ -84,6 +86,18 @@ static void _setup_split_lock(int split_lock_ac_val)
wrmsrl(MSR_TEST_CTL, val);
}

+void setup_split_lock(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_AC))
+ return;
+
+ _setup_split_lock(split_lock_ac_kernel);
+
+ pr_info_once("#AC execption for split lock is %sd\n",
+ split_lock_ac_kernel == ENABLE_SPLIT_LOCK_AC ? "enable"
+ : "disable");
+}
+
static void wait_for_reexecution(void)
{
while (time_before(jiffies, disable_split_lock_jiffies +
--
2.5.0


2018-05-27 15:47:08

by Fenghua Yu

Subject: [RFC PATCH 12/16] x86/split_lock: Add a debugfs interface to allow user to change how to handle split lock in kernel mode during run time

CONFIG_SPLIT_LOCK_AC_PANIC_ON_KERNEL defines how to handle a split lock in
kernel mode by default. But sometimes the user wants to change the default
behavior at run time.

The interface /sys/kernel/debug/x86/split_lock/kernel_mode is added
to allow the user to do so.

Reading the interface shows the available behaviors with the current one in
brackets. For example, "[re-execute] panic" means the kernel faulting
instruction is re-executed after #AC (the default), and "panic" is the
alternative behavior.

The user can change the behavior by writing "re-execute" or "panic"
to the interface.
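
For illustration, user space could select the behavior with a small helper
like the one below (a sketch only, assuming the debugfs path above; it is
not part of this patch):

#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write "re-execute" or "panic" to the kernel_mode debugfs file. */
static int set_kernel_mode(const char *mode)
{
        ssize_t len = strlen(mode);
        int fd, ret;

        fd = open("/sys/kernel/debug/x86/split_lock/kernel_mode", O_WRONLY);
        if (fd < 0)
                return -1;

        ret = (write(fd, mode, len) == len) ? 0 : -1;
        close(fd);

        return ret;
}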

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/test_ctl.c | 89 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 89 insertions(+)

diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 6784f68af26a..0212a7979a14 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -18,6 +18,7 @@
#include <linux/reboot.h>
#include <linux/syscore_ops.h>
#include <linux/debugfs.h>
+#include <linux/uaccess.h>
#include <asm/msr.h>

#define DISABLE_SPLIT_LOCK_AC 0
@@ -49,6 +50,11 @@ enum {
KERNEL_MODE_LAST
};

+static const char * const kernel_modes[KERNEL_MODE_LAST] = {
+ [KERNEL_MODE_RE_EXECUTE] = "re-execute",
+ [KERNEL_MODE_PANIC] = "panic",
+};
+
static int kernel_mode_reaction = KERNEL_MODE_RE_EXECUTE;

/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
@@ -361,10 +367,93 @@ static int enable_store(void *data, u64 val)

DEFINE_DEBUGFS_ATTRIBUTE(enable_ops, enable_show, enable_store, "%llx\n");

+static ssize_t
+mode_show(char __user *user_buf, const char * const *modes, int start_reaction,
+ int last_reaction, int mode_reaction, size_t count, loff_t *ppos)
+{
+ char buf[32], *s = buf;
+ int reaction, len;
+
+ mutex_lock(&split_lock_mutex);
+ for (reaction = start_reaction; reaction < last_reaction; reaction++) {
+ if (reaction == mode_reaction)
+ s += sprintf(s, "[%s] ", modes[reaction]);
+ else
+ s += sprintf(s, "%s ", modes[reaction]);
+ }
+
+ if (s != buf)
+ /* convert the last space to a newline */
+ *(s - 1) = '\n';
+ mutex_unlock(&split_lock_mutex);
+
+ len = strlen(buf);
+
+ return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t
+mode_store(const char __user *user_buf, size_t count, const char * const *modes,
+ int start_reaction, int last_reaction, int *mode_reaction)
+{
+ int reaction, len, ret = -EINVAL;
+ const char * const *s, *p;
+ char buf[32];
+
+ len = min(count, sizeof(buf) - 1);
+ if (copy_from_user(buf, user_buf, len))
+ return -EFAULT;
+
+ buf[len] = '\0';
+
+ mutex_lock(&split_lock_mutex);
+ p = memchr(buf, '\n', count);
+ len = p ? p - buf : count;
+
+ reaction = start_reaction;
+ for (s = &modes[reaction]; reaction < last_reaction; s++, reaction++) {
+ if (*s && len == strlen(*s) && !strncmp(buf, *s, len)) {
+ *mode_reaction = reaction;
+ ret = 0;
+ break;
+ }
+ }
+ mutex_unlock(&split_lock_mutex);
+
+ return ret;
+}
+
+static ssize_t kernel_mode_show(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ return mode_show(user_buf, kernel_modes, KERNEL_MODE_RE_EXECUTE,
+ KERNEL_MODE_LAST, kernel_mode_reaction, count, ppos);
+}
+
+static ssize_t kernel_mode_store(struct file *file, const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ int ret;
+
+ ret = mode_store(user_buf, count, kernel_modes, KERNEL_MODE_RE_EXECUTE,
+ KERNEL_MODE_LAST, &kernel_mode_reaction);
+ if (ret)
+ return ret;
+
+ return count;
+}
+
+static const struct file_operations kernel_mode_ops = {
+ .read = kernel_mode_show,
+ .write = kernel_mode_store,
+ .llseek = default_llseek,
+};
+
static int __init debugfs_setup_split_lock(void)
{
struct debugfs_file debugfs_files[] = {
{"enable", 0600, &enable_ops},
+ {"kernel_mode", 0600, &kernel_mode_ops },
};
struct dentry *split_lock_dir, *fd;
int i;
--
2.5.0


2018-05-27 15:48:04

by Fenghua Yu

Subject: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

Firmware may contain split locked instructions, and the #AC handler in
firmware may treat a split lock as a fatal fault and stop execution. If the
kernel enables the #AC exception for split locked accesses and then returns
to firmware during reboot, the firmware reboot code may hit an #AC
exception and block the reboot. This issue has been observed in practice.

Instead of debugging the buggy firmware, the #AC setting for split lock is
restored to the original firmware setting on reboot to hide the potential
firmware issue and allow the reboot to succeed.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 2 ++
arch/x86/kernel/cpu/test_ctl.c | 45 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 00f453fd44ac..45fec729c470 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -44,6 +44,7 @@ unsigned int x86_stepping(unsigned int sig);
void detect_split_lock_ac(void);
bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code);
void setup_split_lock(void);
+void restore_split_lock_ac_firmware(void);
#else /* CONFIG_SPLIT_LOCK_AC */
static inline void detect_split_lock_ac(void) {}
static inline bool
@@ -53,5 +54,6 @@ do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
}

static inline void setup_split_lock(void) {}
+static inline void restore_split_lock_ac_firmware(void) {}
#endif /* CONFIG_SPLIT_LOCK_AC */
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index c72c0517c6ab..9e47f8174a47 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -15,6 +15,7 @@
#include <linux/workqueue.h>
#include <linux/cpu.h>
#include <linux/mm.h>
+#include <linux/reboot.h>
#include <asm/msr.h>

#define DISABLE_SPLIT_LOCK_AC 0
@@ -29,6 +30,7 @@ static unsigned long disable_split_lock_jiffies;
static DEFINE_MUTEX(reexecute_split_lock_mutex);

static int split_lock_ac_kernel = DISABLE_SPLIT_LOCK_AC;
+static int split_lock_ac_firmware = DISABLE_SPLIT_LOCK_AC;

/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
void detect_split_lock_ac(void)
@@ -62,6 +64,12 @@ void detect_split_lock_ac(void)
* before leaving.
*/
wrmsrl(MSR_TEST_CTL, orig_val);
+
+ /* Get previous firmware setting. */
+ if (orig_val & MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK)
+ split_lock_ac_firmware = ENABLE_SPLIT_LOCK_AC;
+ else
+ split_lock_ac_firmware = DISABLE_SPLIT_LOCK_AC;
}

static void _setup_split_lock(int split_lock_ac_val)
@@ -86,6 +94,41 @@ static void _setup_split_lock(int split_lock_ac_val)
wrmsrl(MSR_TEST_CTL, val);
}

+static void restore_split_lock_ac(int split_lock_ac_val)
+{
+ _setup_split_lock(split_lock_ac_val);
+}
+
+/* Restore firmware setting for #AC exception for split lock. */
+void restore_split_lock_ac_firmware(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_AC))
+ return;
+
+ /* Don't restore the firmware setting if kernel didn't change it. */
+ if (split_lock_ac_kernel == split_lock_ac_firmware)
+ return;
+
+ restore_split_lock_ac(split_lock_ac_firmware);
+}
+
+static void split_lock_cpu_reboot(void *unused)
+{
+ restore_split_lock_ac_firmware();
+}
+
+static int split_lock_reboot_notify(struct notifier_block *nb,
+ unsigned long code, void *unused)
+{
+ on_each_cpu_mask(cpu_online_mask, split_lock_cpu_reboot, NULL, 1);
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block split_lock_reboot_nb = {
+ .notifier_call = split_lock_reboot_notify,
+};
+
void setup_split_lock(void)
{
if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_AC))
@@ -221,6 +264,8 @@ static int __init split_lock_init(void)
if (ret < 0)
return ret;

+ register_reboot_notifier(&split_lock_reboot_nb);
+
return 0;
}

--
2.5.0


2018-05-27 15:48:33

by Fenghua Yu

Subject: [RFC PATCH 16/16] x86/split_lock: Add user space split lock test in selftest

The user may use this selftest to test how a user space split lock is
handled.

If the #AC exception for split lock is enabled, the test generates a split
locked access from user space and tests how the access is handled in two
ways:

1. A SIGBUS is delivered to the test process. This is selected by
writing "sigbus" to /sys/kernel/debug/x86/split_lock/user_mode.

2. The faulting instruction is re-executed. This is selected by
writing "re-execute" to /sys/kernel/debug/x86/split_lock/user_mode.

Signed-off-by: Fenghua Yu <[email protected]>
---
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/split_lock_user_test.c | 207 +++++++++++++++++++++
2 files changed, 209 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/split_lock_user_test.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 39f66bc29b82..e4bcb72c0ae8 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,7 +11,8 @@ CAN_BUILD_X86_64 := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c)

TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
check_initial_reg_state sigreturn iopl mpx-mini-test ioperm \
- protection_keys test_vdso test_vsyscall mov_ss_trap
+ protection_keys test_vdso test_vsyscall mov_ss_trap \
+ split_lock_user_test
TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
diff --git a/tools/testing/selftests/x86/split_lock_user_test.c b/tools/testing/selftests/x86/split_lock_user_test.c
new file mode 100644
index 000000000000..9ce5fc1af154
--- /dev/null
+++ b/tools/testing/selftests/x86/split_lock_user_test.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Intel Corporation
+ * Author: Fenghua Yu <[email protected]>
+ *
+ * Prerequisites:
+ * Kernel is built with CONFIG_SPLIT_LOCK_AC=y.
+ * Split lock is enabled. If not, enable it by:
+ * #echo 1 >/sys/kernel/debug/x86/split_lock/enable
+ *
+ * Usage:
+ * Run the test alone and it should show:
+ * TEST PASS: locked instruction is re-executed.
+ * TEST PASS: Caught SIGBUS/#AC due to split locked access
+ *
+ * Or launch the test from perf and watch the sq_misc.split_lock event count:
+ * # perf stat -e sq_misc.split_lock /root/split_lock_user_test_64
+ * TEST PASS: locked instruction is re-executed.
+ * TEST PASS: Caught SIGBUS/#AC due to split locked access
+ *
+ * Performance counter stats for 'tools/testing/selftests/x86/
+ * split_lock_user_test_64':
+ * 2 sq_misc.split_lock
+ *
+ * 1.001507372 seconds time elapsed
+ */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+
+int split_lock_exception;
+
+void catch_sigbus(int sig)
+{
+ split_lock_exception = 1;
+ printf("TEST PASS: Caught SIGBUS/#AC due to split locked access\n");
+
+ exit(-1);
+}
+
+int split_lock_ac_enabled(void)
+{
+ int fd, enable, ret;
+ char buf[16];
+
+ fd = open("/sys/kernel/debug/x86/split_lock/enable", O_RDONLY);
+ if (fd < 0)
+ return 0;
+
+ if (read(fd, buf, sizeof(int)) < 0) {
+ ret = 0;
+ goto out;
+ }
+
+ enable = atoi(buf);
+ if (enable == 1)
+ ret = 1;
+ else
+ ret = 0;
+
+out:
+ close(fd);
+
+ return ret;
+}
+
+int setup_user_mode(char *user_mode_reaction)
+{
+ ssize_t count;
+ int fd;
+
+ if (strcmp(user_mode_reaction, "sigbus") &&
+ strcmp(user_mode_reaction, "re-execute"))
+ return -1;
+
+ fd = open("/sys/kernel/debug/x86/split_lock/user_mode", O_RDWR);
+ if (fd < 0)
+ return -1;
+
+ count = write(fd, user_mode_reaction, strlen(user_mode_reaction));
+ if (count != strlen(user_mode_reaction))
+ return -1;
+
+ close(fd);
+
+ return 0;
+}
+
+/*
+ * Atomically add 1 to *iptr using "lock addl" instruction.
+ * Since *iptr crosses two cache lines, #AC is generated by the split lock.
+ */
+void do_split_locked_inst(int *iptr)
+{
+ /*
+ * The distance between iptr and next cache line is 3 bytes.
+ * Operand size in "addl" is 4 bytes. So iptr will span two cache
+ * lines. "lock addl" instruction will trigger #AC in hardware
+ * and kernel either delivers SIGBUS to this process or re-execute
+ * the instruction depending on
+ * /sys/kernel/debug/x86/split_lock/user_mode setting.
+ */
+ asm volatile ("lock addl $1, %0\n\t"
+ : "=m" (*iptr));
+}
+
+/*
+ * Test re-executing a locked instruction after it generates #AC for split lock
+ * operand *iptr.
+ */
+void test_re_execute(int *iptr)
+{
+ setup_user_mode("re-execute");
+
+ /* Initialize *iptr. */
+ *iptr = 0;
+
+ /* The locked instruction triggers #AC and then it's re-executed. */
+ do_split_locked_inst(iptr);
+
+ /* If executed successfully, *iptr should be 1 now. */
+ if (*iptr == 1) {
+ printf("TEST PASS: locked instruction is re-executed.\n");
+ } else {
+ printf("TEST FAIL: No #AC exception is caught and ");
+ printf("instruction is not executed correctly.\n");
+ }
+}
+
+/*
+ * Test SIGBUS delivered after a lock instruction generates #AC for split lock
+ * operand *iptr.
+ */
+void test_sigbus(int *iptr)
+{
+ pid_t pid;
+
+ setup_user_mode("sigbus");
+
+ pid = fork();
+ if (pid) {
+ waitpid(pid, NULL, 0);
+ return;
+ }
+
+ /*
+ * The locked instruction will trigger #AC and kernel will deliver
+ * SIGBUS to this process. The SIGBUS handler in this process will
+ * verify that the signal is delivered and the process is killed then.
+ */
+ do_split_locked_inst(iptr);
+}
+
+int main(int argc, char **argv)
+{
+ int *iptr;
+ char *cptr;
+
+ if (!split_lock_ac_enabled()) {
+ printf("#AC exception for split lock is NOT enabled!!\n");
+ printf("Before test, please make sure:\n");
+ printf("split lock feature is supported on this platform,\n");
+ printf("CONFIG_SPLIT_LOCK_AC is turned on,\n");
+ printf("and /sys/kernel/debug/x86/split_lock/enable is 1.\n");
+
+ return 0;
+ }
+
+ signal(SIGBUS, catch_sigbus);
+
+ /*
+ * Enable Alignment Checking on x86_64.
+ * This will generate alignment check on not only split lock but also
+ * on any misalignment.
+ * Turn on this for reference only.
+ */
+ /* __asm__("pushf\norl $0x40000,(%rsp)\npopf"); */
+
+ /* aligned_alloc() provides 64-byte aligned memory */
+ cptr = (char *)aligned_alloc(64, 128);
+
+ /*
+ * Increment the pointer by 61, making it 3 bytes away from the next
+ * cache line, so the 4-byte *iptr spans two cache lines.
+ */
+ iptr = (int *)(cptr + 61);
+
+ test_re_execute(iptr);
+ /*
+ * The split lock is disabled after the last locked instruction is
+ * re-executed.
+ *
+ * Wait for the split lock is re-enabled again before next test.
+ */
+ sleep(1);
+ test_sigbus(iptr);
+
+ free(cptr);
+
+ return 0;
+}
--
2.5.0


2018-05-27 15:48:50

by Fenghua Yu

Subject: [RFC PATCH 01/16] x86/split_lock: Add CONFIG and enumerate #AC exception for split locked access feature

#AC for split lock is supported on Tremont and future processors. We
need to enumerate the feature on processors.

Add CONFIG_SPLIT_LOCK_AC (default: y, dependent on X86 and CPU_SUP_INTEL)
to control inclusion of the feature.

Bit 29 in MSR TEST_CTL 0x33 can only be set on processors that support
the feature. On processors not supporting the feature, the bit is reserved
(i.e. cannot be set to one) or the MSR doesn't exist.

To detect the feature, attempt to set the bit in the MSR. If the writing
succeeds, the feature is available. Otherwise, the feature is not
supported on this platform.

The enumeration happens before SMP bring-up so that all processors can use
the enumerated result when SMP boots.

test_ctl.c is created to contain the majority of the split lock code.
Hopefully more features related to MSR_TEST_CTL will be added to this file
and share some code with split lock in the future.

More information on the bit 29 and MSR TEST_CTL can be found in the latest
Intel Architecture Instruction Set Extensions and Future Features
Programming Reference.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/Kconfig | 12 ++++++++++
arch/x86/include/asm/cpu.h | 5 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 4 ++++
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/test_ctl.c | 49 ++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/setup.c | 2 ++
7 files changed, 74 insertions(+)
create mode 100644 arch/x86/kernel/cpu/test_ctl.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6ca22706cd64..043cde9a9b08 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -449,6 +449,18 @@ config INTEL_RDT

Say N if unsure.

+config SPLIT_LOCK_AC
+ bool "#AC exception for split locked accesses support"
+ default y
+ depends on X86 && CPU_SUP_INTEL
+ help
+ Select to support #AC exception for split locked accesses. More
+ detailed information about the feature can be found in
+ Intel Architecture Instruction Set Extensions and Future Features
+ Programming Reference.
+
+ Say N if unsure.
+
if X86_32
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..8e224956e3e2 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,9 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_SPLIT_LOCK_AC
+void detect_split_lock_ac(void);
+#else /* CONFIG_SPLIT_LOCK_AC */
+static inline void detect_split_lock_ac(void) {}
+#endif /* CONFIG_SPLIT_LOCK_AC */
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index fb00a2fca990..8278d2ced4ea 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_SPLIT_LOCK_AC ( 7*32+29) /* #AC exception for split locked access */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fda2114197b3..3b0fe0f55a61 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -39,6 +39,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTL 0x00000033
+#define MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK_SHIFT 29
+#define MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK BIT(29)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP (1 << 1) /* Single Thread Indirect Branch Predictors */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 7a40196967cb..228654485b3f 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_MICROCODE) += microcode/
obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o

obj-$(CONFIG_HYPERVISOR_GUEST) += vmware.o hypervisor.o mshyperv.o
+obj-$(CONFIG_SPLIT_LOCK_AC) += test_ctl.o

ifdef CONFIG_X86_FEATURE_NAMES
quiet_cmd_mkcapflags = MKCAP $@
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
new file mode 100644
index 000000000000..46fa8e21f9f6
--- /dev/null
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Enable #AC exception for split locked accesses in TEST_CTL MSR
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Author:
+ * Fenghua Yu <[email protected]>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/printk.h>
+#include <linux/cpufeature.h>
+#include <asm/msr.h>
+
+/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
+void detect_split_lock_ac(void)
+{
+ u64 val, orig_val;
+ int ret;
+
+ /* Attempt to read the MSR. If the MSR doesn't exist, reading fails. */
+ ret = rdmsrl_safe(MSR_TEST_CTL, &val);
+ if (ret)
+ return;
+
+ orig_val = val;
+
+ /* Turn on the split lock bit */
+ val |= MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
+
+ /*
+ * Attempt to set bit 29 in the MSR. The bit is set successfully
+ * only on processors that support #AC for split lock.
+ */
+ ret = wrmsrl_safe(MSR_TEST_CTL, val);
+ if (ret)
+ return;
+
+ /* The feature is supported on CPU. */
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_AC);
+
+ /*
+ * Need to restore split lock setting to original firmware setting
+ * before leaving.
+ */
+ wrmsrl(MSR_TEST_CTL, orig_val);
+}
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5c623dfe39d1..4deb1ad5b442 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1272,6 +1272,8 @@ void __init setup_arch(char **cmdline_p)

mcheck_init();

+ detect_split_lock_ac();
+
arch_init_ideal_nops();

register_refined_jiffies(CLOCK_TICK_RATE);
--
2.5.0


2018-05-27 15:49:02

by Fenghua Yu

Subject: [RFC PATCH 07/16] x86/split_lock: Handle suspend/hibernate and resume

During suspend or hibernation, the system enters firmware. To avoid a
potential firmware issue that generates an #AC exception for split locked
accesses, handles the #AC as a fatal exception, and blocks suspend or
hibernation, the #AC setting for split lock is restored to the firmware
setting during suspend or hibernation. When resuming from suspend or
hibernation, the split lock setting is restored to the kernel setting.

Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/include/asm/cpu.h | 2 ++
arch/x86/kernel/cpu/test_ctl.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 45fec729c470..f2ca84deccc6 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -45,6 +45,7 @@ void detect_split_lock_ac(void);
bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code);
void setup_split_lock(void);
void restore_split_lock_ac_firmware(void);
+void restore_split_lock_ac_kernel(void);
#else /* CONFIG_SPLIT_LOCK_AC */
static inline void detect_split_lock_ac(void) {}
static inline bool
@@ -55,5 +56,6 @@ do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)

static inline void setup_split_lock(void) {}
static inline void restore_split_lock_ac_firmware(void) {}
+static inline void restore_split_lock_ac_kernel(void) {}
#endif /* CONFIG_SPLIT_LOCK_AC */
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 9e47f8174a47..82440740b2b9 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -16,6 +16,7 @@
#include <linux/cpu.h>
#include <linux/mm.h>
#include <linux/reboot.h>
+#include <linux/syscore_ops.h>
#include <asm/msr.h>

#define DISABLE_SPLIT_LOCK_AC 0
@@ -112,6 +113,15 @@ void restore_split_lock_ac_firmware(void)
restore_split_lock_ac(split_lock_ac_firmware);
}

+/* Restore kernel setting for #AC enable bit for split lock. */
+void restore_split_lock_ac_kernel(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_AC))
+ return;
+
+ restore_split_lock_ac(split_lock_ac_kernel);
+}
+
static void split_lock_cpu_reboot(void *unused)
{
restore_split_lock_ac_firmware();
@@ -247,11 +257,29 @@ static int split_lock_online(unsigned int cpu)

static int split_lock_offline(unsigned int cpu)
{
+ restore_split_lock_ac_firmware();
cancel_delayed_work(&per_cpu(reenable_delayed_work, cpu));

return 0;
}

+static int split_lock_bsp_suspend(void)
+{
+ restore_split_lock_ac_firmware();
+
+ return 0;
+}
+
+static void split_lock_bsp_resume(void)
+{
+ restore_split_lock_ac_kernel();
+}
+
+static struct syscore_ops split_lock_syscore_ops = {
+ .suspend = split_lock_bsp_suspend,
+ .resume = split_lock_bsp_resume,
+};
+
static int __init split_lock_init(void)
{
int ret;
@@ -264,6 +292,8 @@ static int __init split_lock_init(void)
if (ret < 0)
return ret;

+ register_syscore_ops(&split_lock_syscore_ops);
+
register_reboot_notifier(&split_lock_reboot_nb);

return 0;
--
2.5.0


2018-05-27 15:49:26

by Fenghua Yu

Subject: [RFC PATCH 15/16] x86/split_lock: Add CONFIG and debugfs interface for testing #AC for split lock in kernel mode

Sometimes the user wants to test how a split lock in kernel mode is
handled.

The debugfs interface /sys/kernel/debug/x86/split_lock/test_kernel is
provided to do the test. The interface is enabled by
CONFIG_SPLIT_LOCK_AC_TEST.

Writing 1 to the interface file triggers a split locked access in the
kernel and exercises the split lock handling path.

The file is not readable.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/Kconfig | 10 +++++++
arch/x86/kernel/cpu/test_ctl.c | 61 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d42d90abd644..5d44cc86aecf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -488,6 +488,16 @@ config SPLIT_LOCK_AC_PANIC_ON_KERNEL

Say N if unsure.

+config SPLIT_LOCK_AC_TEST
+ bool "Test #AC exception for split locked accesses"
+ default n
+ depends on SPLIT_LOCK_AC
+ help
+ Select to enable testing of the #AC exception for split locked accesses.
+ This adds the interface /sys/kernel/debug/x86/split_lock/test_kernel
+ to allow the user to trigger a split locked access in the kernel and
+ test the split lock handling.
+
if X86_32
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 8bdc01067be9..910ff19c2a3e 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -541,6 +541,64 @@ static int firmware_store(void *data, u64 val)

DEFINE_DEBUGFS_ATTRIBUTE(firmware_ops, firmware_show, firmware_store, "%llx\n");

+#ifdef CONFIG_SPLIT_LOCK_AC_TEST
+/* Execute locked btsl instruction with split lock operand. */
+static void split_lock_test_kernel(void)
+{
+ char cptr[128] __aligned(64);
+ int *iptr;
+
+ /*
+ * Change the pointer, making it 3-byte away from the next cache
+ * line.
+ */
+ iptr = (int *)(cptr + 61);
+
+ /* Initial value 0 in iptr */
+ *iptr = 0;
+
+ pr_info("split lock test: split lock address=0x%lx\n",
+ (unsigned long)iptr);
+
+ /*
+ * The distance between iptr and next cache line is 3 bytes.
+ * Operand size in "btsl" is 4 bytes. So iptr will span two cache
+ * lines. "lock btsl" instruction will trigger #AC in hardware
+ * and kernel will either re-execute the instruction or go to panic
+ * depending on user configuration in
+ * /sys/kernel/debug/x86/split_lock/kernel_mode.
+ */
+ asm volatile ("lock btsl $0, %0\n\t"
+ : "=m" (*iptr));
+
+ if (*iptr == 1)
+ pr_info("split lock kernel test passes\n");
+ else
+ pr_info("split lock kernel test fails\n");
+}
+
+/*
+ * Writing 1 to /sys/kernel/debug/x86/split_lock/test_kernel triggers
+ * split locked access in kernel mode.
+ */
+static int test_kernel_store(void *data, u64 val)
+{
+ if (split_lock_ac_kernel == DISABLE_SPLIT_LOCK_AC)
+ return -ENODEV;
+
+ if (val != 1)
+ return -EINVAL;
+
+ mutex_lock(&split_lock_mutex);
+ split_lock_test_kernel();
+ mutex_unlock(&split_lock_mutex);
+
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(test_kernel_ops, NULL, test_kernel_store, "%llx\n");
+#endif /* CONFIG_SPLIT_LOCK_AC_TEST */
+
static int __init debugfs_setup_split_lock(void)
{
struct debugfs_file debugfs_files[] = {
@@ -548,6 +606,9 @@ static int __init debugfs_setup_split_lock(void)
{"kernel_mode", 0600, &kernel_mode_ops },
{"user_mode", 0600, &user_mode_ops },
{"firmware", 0600, &firmware_ops },
+#ifdef CONFIG_SPLIT_LOCK_AC_TEST
+ {"test_kernel", 0200, &test_kernel_ops },
+#endif
};
struct dentry *split_lock_dir, *fd;
int i;
--
2.5.0


2018-05-27 15:49:30

by Fenghua Yu

Subject: [RFC PATCH 13/16] x86/split_lock: Add debugfs interface to control user mode behavior

Sometimes the user wants to change how user mode split locks are handled at
run time.

Add the interface /sys/kernel/debug/x86/split_lock/user_mode to allow the
user to choose between generating a SIGBUS (default) when hitting a split
lock in user mode, or re-executing the faulting user instruction without
generating a SIGBUS signal.

By default, a user process is killed by SIGBUS after hitting a split lock.
This ensures no user process can generate a bus lock that blocks memory
accesses from other CPUs.

If the user just wants to log the split lock and let the user process
continue, the user can select "re-execute" in the interface.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/test_ctl.c | 68 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)

diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 0212a7979a14..d774485a5ca4 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -57,6 +57,19 @@ static const char * const kernel_modes[KERNEL_MODE_LAST] = {

static int kernel_mode_reaction = KERNEL_MODE_RE_EXECUTE;

+enum {
+ USER_MODE_SIGBUS,
+ USER_MODE_RE_EXECUTE,
+ USER_MODE_LAST
+};
+
+static const char * const user_modes[USER_MODE_LAST] = {
+ [USER_MODE_SIGBUS] = "sigbus",
+ [USER_MODE_RE_EXECUTE] = "re-execute",
+};
+
+static int user_mode_reaction = USER_MODE_SIGBUS;
+
/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
void detect_split_lock_ac(void)
{
@@ -220,6 +233,16 @@ static void delayed_reenable_split_lock(struct work_struct *w)
mutex_unlock(&reexecute_split_lock_mutex);
}

+static unsigned long eflags_ac(struct pt_regs *regs)
+{
+ return regs->flags & X86_EFLAGS_AC;
+}
+
+static unsigned long cr0_am(struct pt_regs *regs)
+{
+ return read_cr0() & X86_CR0_AM;
+}
+
/* Will the faulting instruction be re-executed? */
static bool re_execute(struct pt_regs *regs)
{
@@ -230,6 +253,24 @@ static bool re_execute(struct pt_regs *regs)
if (!user_mode(regs))
return true;

+ /*
+ * Now check if the user faulting instruction can be re-executed.
+ *
+ * If both CR0.AM (Alignment Mask) and EFLAGS.AC (Alignment Check)
+ * are set in user space, any misalignment including split lock
+ * can trigger #AC. In this case, we just issue SIGBUS as standard
+ * #AC handler to the user process because split lock is not the
+ * definite reason for triggering this #AC.
+ *
+ * If either CR0.AM or EFLAGS.AC is zero, the only reason for
+ * triggering this #AC is split lock. So the faulting instruction
+ * can be re-executed if required by user.
+ */
+ if (cr0_am(regs) == 0 || eflags_ac(regs) == 0)
+ /* User faulting instruction will be re-executed if required. */
+ if (user_mode_reaction == USER_MODE_RE_EXECUTE)
+ return true;
+
return false;
}

@@ -449,11 +490,38 @@ static const struct file_operations kernel_mode_ops = {
.llseek = default_llseek,
};

+static ssize_t user_mode_show(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ return mode_show(user_buf, user_modes, USER_MODE_SIGBUS,
+ USER_MODE_LAST, user_mode_reaction, count, ppos);
+}
+
+static ssize_t user_mode_store(struct file *file, const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ int ret;
+
+ ret = mode_store(user_buf, count, user_modes, USER_MODE_SIGBUS,
+ USER_MODE_LAST, &user_mode_reaction);
+ if (ret)
+ return ret;
+
+ return count;
+}
+
+static const struct file_operations user_mode_ops = {
+ .read = user_mode_show,
+ .write = user_mode_store,
+ .llseek = default_llseek,
+};
+
static int __init debugfs_setup_split_lock(void)
{
struct debugfs_file debugfs_files[] = {
{"enable", 0600, &enable_ops},
{"kernel_mode", 0600, &kernel_mode_ops },
+ {"user_mode", 0600, &user_mode_ops },
};
struct dentry *split_lock_dir, *fd;
int i;
--
2.5.0


2018-05-27 15:49:42

by Fenghua Yu

Subject: [RFC PATCH 14/16] x86/split_lock: Add debugfs interface to show and control firmware setting for split lock

By default, the firmware split lock setting is inherited from whatever the
firmware programmed before the kernel boots.

In cases like hard real time, the user wants to identify split lock issues
in firmware even when #AC for split lock is not enabled in firmware before
the kernel boots. The user may explicitly enable #AC for split lock for
firmware. Hitting #AC whenever there is a split lock in firmware helps
identify and fix the firmware split lock issue.

The debugfs interface /sys/kernel/debug/x86/split_lock/firmware shows the
firmware split lock setting: 0 for disabled and 1 for enabled.

The user can override the firmware setting by writing 1 to enable #AC for
split lock in firmware or writing 0 to disable it.

When control flow goes to firmware (e.g. in S3, S4, S5, and EFI runtime
services), the kernel applies the firmware setting for split lock. The
kernel restores the kernel setting for split lock after coming back to the
kernel.

Please note: System Management Mode (SMM) is out of the kernel's control,
so this interface cannot control split lock in SMM.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/test_ctl.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index d774485a5ca4..8bdc01067be9 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -516,12 +516,38 @@ static const struct file_operations user_mode_ops = {
.llseek = default_llseek,
};

+static int firmware_show(void *data, u64 *val)
+{
+ *val = split_lock_ac_firmware;
+
+ return 0;
+}
+
+static int firmware_store(void *data, u64 val)
+{
+ if (val != DISABLE_SPLIT_LOCK_AC && val != ENABLE_SPLIT_LOCK_AC)
+ return -EINVAL;
+
+ /* No need to update setting if new setting is the same as old one. */
+ if (val == split_lock_ac_firmware)
+ return 0;
+
+ mutex_lock(&split_lock_mutex);
+ split_lock_ac_firmware = val;
+ mutex_unlock(&split_lock_mutex);
+
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(firmware_ops, firmware_show, firmware_store, "%llx\n");
+
static int __init debugfs_setup_split_lock(void)
{
struct debugfs_file debugfs_files[] = {
{"enable", 0600, &enable_ops},
{"kernel_mode", 0600, &kernel_mode_ops },
{"user_mode", 0600, &user_mode_ops },
+ {"firmware", 0600, &firmware_ops },
};
struct dentry *split_lock_dir, *fd;
int i;
--
2.5.0


2018-05-27 15:50:18

by Fenghua Yu

Subject: [RFC PATCH 11/16] x86/split_lock: Add CONFIG to control #AC for split lock from kernel at boot time

By default, a kernel instruction that generates #AC due to a split locked
access is re-executed and doesn't block the system.

But in cases where the user doesn't tolerate any split lock (e.g. in a hard
real time system), CONFIG_SPLIT_LOCK_AC_PANIC_ON_KERNEL is added to opt in
to panicking when #AC for split lock is triggered from the kernel.

If it's configured as N (default), the faulting kernel instruction is
logged in the #AC handler and re-executed; the split lock is not treated as
a fatal fault.

If configured as Y, the kernel panics if the #AC triggered by a split lock
comes from a kernel instruction.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/Kconfig | 13 +++++++++++++
arch/x86/kernel/cpu/test_ctl.c | 15 +++++++++++++++
2 files changed, 28 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d0dcd6fa69a..d42d90abd644 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -475,6 +475,19 @@ config SPLIT_LOCK_AC_ENABLE_DEFAULT

Leave this to the default value of 2 if you are unsure.

+config SPLIT_LOCK_AC_PANIC_ON_KERNEL
+ bool "Panic on #AC for split lock in kernel mode"
+ default n
+ depends on SPLIT_LOCK_AC
+ help
+ If you enable this option, the kernel panics when hitting
+ #AC for split lock from the kernel.
+
+ If you disable this option, the kernel faulting instruction
+ is re-executed after hitting #AC for split lock.
+
+ Say N if unsure.
+
if X86_32
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index e8b3032f3db0..6784f68af26a 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -43,6 +43,14 @@ struct debugfs_file {
const struct file_operations *fops;
};

+enum {
+ KERNEL_MODE_RE_EXECUTE,
+ KERNEL_MODE_PANIC,
+ KERNEL_MODE_LAST
+};
+
+static int kernel_mode_reaction = KERNEL_MODE_RE_EXECUTE;
+
/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
void detect_split_lock_ac(void)
{
@@ -238,6 +246,10 @@ bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
struct task_struct *tsk = current;
int cpu = task_cpu(tsk);

+ /* If configured as panic for split lock in kernel mode, panic. */
+ if (kernel_mode_reaction == KERNEL_MODE_PANIC && !user_mode(regs))
+ panic("Alignment Check exception for split lock in kernel.");
+
if (!re_execute(regs))
return false;

@@ -391,6 +403,9 @@ static int __init split_lock_init(void)
if (ret)
pr_warn("debugfs for #AC for split lock cannot be set up\n");

+ if (IS_ENABLED(CONFIG_SPLIT_LOCK_AC_PANIC_ON_KERNEL))
+ kernel_mode_reaction = KERNEL_MODE_PANIC;
+
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/split_lock:online",
split_lock_online, split_lock_offline);
if (ret < 0)
--
2.5.0


2018-05-27 15:50:28

by Fenghua Yu

Subject: [RFC PATCH 08/16] x86/split_lock: Set split lock during EFI runtime service

If the kernel enables #AC for split locked accesses, an EFI runtime service
may hit an #AC exception, treat it as a fatal fault, and cause a system
hang.

To avoid debugging potentially buggy EFI runtime service code in firmware,
restore the firmware split lock setting before entering an EFI runtime
service and restore the kernel split lock setting after exiting it.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/efi.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..a5604e5a56c1 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -8,6 +8,7 @@
#include <asm/tlb.h>
#include <asm/nospec-branch.h>
#include <asm/mmu_context.h>
+#include <asm/cpu.h>

/*
* We map the EFI regions needed for runtime services non-contiguously,
@@ -41,12 +42,14 @@ extern asmlinkage unsigned long efi_call_phys(void *, ...);
#define arch_efi_call_virt_setup() \
({ \
kernel_fpu_begin(); \
+ restore_split_lock_ac_firmware(); \
firmware_restrict_branch_speculation_start(); \
})

#define arch_efi_call_virt_teardown() \
({ \
firmware_restrict_branch_speculation_end(); \
+ restore_split_lock_ac_kernel(); \
kernel_fpu_end(); \
})

@@ -84,6 +87,7 @@ struct efi_scratch {
efi_sync_low_kernel_mappings(); \
preempt_disable(); \
__kernel_fpu_begin(); \
+ restore_split_lock_ac_firmware(); \
firmware_restrict_branch_speculation_start(); \
\
if (!efi_enabled(EFI_OLD_MEMMAP)) \
@@ -99,6 +103,7 @@ struct efi_scratch {
efi_switch_mm(efi_scratch.prev_mm); \
\
firmware_restrict_branch_speculation_end(); \
+ restore_split_lock_ac_kernel(); \
__kernel_fpu_end(); \
preempt_enable(); \
})
--
2.5.0


2018-05-27 15:50:33

by Fenghua Yu

Subject: [RFC PATCH 09/16] x86/split_lock: Add CONFIG to control #AC for split lock at boot time

The user may want to specify how #AC for split lock is set up at boot time.

CONFIG_SPLIT_LOCK_AC_ENABLE_DEFAULT is added to control the split lock
setting at boot time.

The default value is 2: don't explicitly enable or disable #AC for split
lock; inherit the #AC for split lock setting from firmware.

Value 0 explicitly disables #AC for split lock at boot time.

Value 1 explicitly enables #AC for split lock at boot time.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/Kconfig | 14 ++++++++++++++
arch/x86/kernel/cpu/test_ctl.c | 12 ++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 043cde9a9b08..1d0dcd6fa69a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -461,6 +461,20 @@ config SPLIT_LOCK_AC

Say N if unsure.

+config SPLIT_LOCK_AC_ENABLE_DEFAULT
+ int "#AC for split lock enable value (0-2) at boot time"
+ range 0 2
+ default "2"
+ depends on SPLIT_LOCK_AC
+ help
+ Set #AC for split lock enable default value at boot time
+ 0: Explicitly disable #AC for split lock at boot time.
+ 1: Explicitly enable #AC for split lock at boot time.
+ 2: Don't explicitly enable or disable #AC for split lock.
+ Inherit setting of #AC for split lock from firmware.
+
+ Leave this to the default value of 2 if you are unsure.
+
if X86_32
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 82440740b2b9..a2f84fcd4da1 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -21,6 +21,7 @@

#define DISABLE_SPLIT_LOCK_AC 0
#define ENABLE_SPLIT_LOCK_AC 1
+#define INHERIT_SPLIT_LOCK_AC_FIRMWARE 2

/* After disabling #AC for split lock in handler, re-enable it 1 msec later. */
#define reenable_split_lock_delay msecs_to_jiffies(1)
@@ -71,6 +72,17 @@ void detect_split_lock_ac(void)
split_lock_ac_firmware = ENABLE_SPLIT_LOCK_AC;
else
split_lock_ac_firmware = DISABLE_SPLIT_LOCK_AC;
+
+ /*
+ * By default configuration, kernel inherits firmware split lock
+ * setting. Kernel can be configured to explicitly enable or disable
+ * #AC for split lock to override firmware setting.
+ */
+ if (CONFIG_SPLIT_LOCK_AC_ENABLE_DEFAULT ==
+ INHERIT_SPLIT_LOCK_AC_FIRMWARE)
+ split_lock_ac_kernel = split_lock_ac_firmware;
+ else
+ split_lock_ac_kernel = CONFIG_SPLIT_LOCK_AC_ENABLE_DEFAULT;
}

static void _setup_split_lock(int split_lock_ac_val)
--
2.5.0


2018-05-27 15:50:55

by Fenghua Yu

Subject: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

set_bit(), called by set_cpu_cap(), is a locked bit set instruction for
atomic operation.

Since c->x86_capability can span two cache lines depending on kernel
configuration and build environment, the locked bit set instruction may
cause an #AC exception when the #AC exception for split lock is enabled.

But set_cpu_cap() is only called in the early init phase on one CPU, so
there is no contention on set_cpu_cap() and it doesn't need to be an atomic
operation.

Using __set_bit(), which is not a locked instruction, is faster than the
locked instruction and fixes this kernel split lock issue.

Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Dave Hansen <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index aced6c9290d6..0793ec95d18b 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -128,7 +128,8 @@ extern const char * const x86_bug_flags[NBUGINTS*32];

#define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)

-#define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability))
+#define set_cpu_cap(c, bit) \
+ __set_bit(bit, (unsigned long *)((c)->x86_capability))

extern void setup_clear_cpu_cap(unsigned int bit);
extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
--
2.5.0


2018-05-27 15:51:07

by Fenghua Yu

Subject: [RFC PATCH 05/16] x86/split_lock: Use non atomic set and clear bit instructions in clear_cpufeature()

x86_capability can span two cache lines depending on kernel configuration
and build environment. When the #AC exception is enabled for split locked
accesses, clear_cpufeature() may generate an #AC exception because it
atomically sets or clears bits in x86_capability.

But the kernel clears cpufeatures only while a CPU is booting up.
Therefore, there is no race condition when clear_cpufeature() is called and
no need to atomically clear or set bits in x86_capability.

To avoid the #AC exception caused by split lock, call the non-atomic
__set_bit() and __clear_bit(). They are also faster than the atomic
set_bit() and clear_bit().

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/cpuid-deps.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..b2c2a004c769 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -65,15 +65,15 @@ static const struct cpuid_dep cpuid_deps[] = {
static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature)
{
/*
- * Note: This could use the non atomic __*_bit() variants, but the
- * rest of the cpufeature code uses atomics as well, so keep it for
- * consistency. Cleanup all of it separately.
+ * Because this code is only called during boot time and there
+ * is no need to be atomic, use non atomic __*_bit() for better
+ * performance and to avoid #AC exception for split locked access.
*/
if (!c) {
clear_cpu_cap(&boot_cpu_data, feature);
- set_bit(feature, (unsigned long *)cpu_caps_cleared);
+ __set_bit(feature, (unsigned long *)cpu_caps_cleared);
} else {
- clear_bit(feature, (unsigned long *)c->x86_capability);
+ __clear_bit(feature, (unsigned long *)c->x86_capability);
}
}

--
2.5.0


2018-05-27 15:51:08

by Fenghua Yu

Subject: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

When the #AC exception for split lock is triggered by a kernel instruction,
by default we don't want the exception to panic the whole system. Instead,
the exception handler should log the fault information for debugging and
continue execution of the instruction.

The CPU generates an #AC exception if a split lock operand is detected
before it executes a locked instruction. When returning from the #AC
handler, the instruction pointer still points to the faulting instruction.
So to re-execute the instruction, the #AC handler needs to disable #AC
before returning to the faulting instruction, to avoid triggering another
#AC for the same split lock. To capture future split locks, #AC for split
lock is re-enabled after 1 msec.

During the period between disabling and re-enabling #AC exception for
split lock, some split locked accesses may not be captured. And since
the MSR_TEST_CTL is per core, disabling #AC exception for split
lock on one thread disables the feature on all threads in the
same core.

Although it's not an accurate way, the delayed re-enabling code is
simpler and cleaner than another possible method which disables #AC
for split lock in the handler, sets single step execution to execute
the faulting instruction, and re-enables #AC for split lock
in debug trap triggered by the next instruction after the faulting
instruction. The delayed re-enabling code can prevent flood of #AC
for split lock caused by a lot of split locks in short time (e.g.
the faulting instruction in a loop). And there is no missing split
lock because the following few blocked split locks will show up once
the first split lock issue is fixed.

Define helper re_execute() to check if the faulting instruction
can be re-executed. Currently it only checks kernel faulting
instruction. Checking user faulting instruction will be added later.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/include/asm/cpu.h | 6 ++
arch/x86/kernel/cpu/test_ctl.c | 164 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 30 +++++++-
3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 8e224956e3e2..083ef6d05c45 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -42,7 +42,13 @@ unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
#ifdef CONFIG_SPLIT_LOCK_AC
void detect_split_lock_ac(void);
+bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code);
#else /* CONFIG_SPLIT_LOCK_AC */
static inline void detect_split_lock_ac(void) {}
+static inline bool
+do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
+{
+ return false;
+}
#endif /* CONFIG_SPLIT_LOCK_AC */
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index 46fa8e21f9f6..82f110759662 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -12,8 +12,22 @@

#include <linux/printk.h>
#include <linux/cpufeature.h>
+#include <linux/workqueue.h>
+#include <linux/cpu.h>
+#include <linux/mm.h>
#include <asm/msr.h>

+#define DISABLE_SPLIT_LOCK_AC 0
+#define ENABLE_SPLIT_LOCK_AC 1
+
+/* After disabling #AC for split lock in handler, re-enable it 1 msec later. */
+#define reenable_split_lock_delay msecs_to_jiffies(1)
+
+static void delayed_reenable_split_lock(struct work_struct *w);
+static DEFINE_PER_CPU(struct delayed_work, reenable_delayed_work);
+static unsigned long disable_split_lock_jiffies;
+static DEFINE_MUTEX(reexecute_split_lock_mutex);
+
/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
void detect_split_lock_ac(void)
{
@@ -47,3 +61,153 @@ void detect_split_lock_ac(void)
*/
wrmsrl(MSR_TEST_CTL, orig_val);
}
+
+static void _setup_split_lock(int split_lock_ac_val)
+{
+ u64 val;
+
+ rdmsrl(MSR_TEST_CTL, val);
+
+ /* No need to update MSR if same value. */
+ if ((val >> MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK_SHIFT & 0x1) ==
+ split_lock_ac_val)
+ return;
+
+ if (split_lock_ac_val == ENABLE_SPLIT_LOCK_AC) {
+ /* Set the split lock bit to enable the feature. */
+ val |= MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
+ } else {
+ /* Clear the split lock bit to disable the feature. */
+ val &= ~MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
+ }
+
+ wrmsrl(MSR_TEST_CTL, val);
+}
+
+static void wait_for_reexecution(void)
+{
+ while (time_before(jiffies, disable_split_lock_jiffies +
+ reenable_split_lock_delay))
+ cpu_relax();
+}
+
+/*
+ * TEST_CTL MSR is shared among threads on the same core. To simplify
+ * situation, disable_split_lock_jiffies is global instead of per core.
+ *
+ * Multiple threads may generate #AC for split lock at the same time.
+ * disable_split_lock_jiffies is updated by those threads. This may
+ * postpone re-enabling split lock on this thread. But that's OK
+ * because we need to make sure all threads on the same core re-execute
+ * their faulting instructions before re-enabling split lock on the core.
+ *
+ * We want to avoid the situation when split lock is disabled on one
+ * thread (thus on the whole core), then split lock is re-enabled on
+ * another thread (thus on the whole core), and the faulting instruction
+ * generates another #AC on the first thread.
+ *
+ * Before re-enabling split lock, wait until there is no re-executed
+ * split lock instruction which may only exist before
+ * disable_split_lock_jiffies + reenable_split_lock_delay.
+ */
+static void delayed_reenable_split_lock(struct work_struct *w)
+{
+ mutex_lock(&reexecute_split_lock_mutex);
+ wait_for_reexecution();
+ _setup_split_lock(ENABLE_SPLIT_LOCK_AC);
+ mutex_unlock(&reexecute_split_lock_mutex);
+}
+
+/* Will the faulting instruction be re-executed? */
+static bool re_execute(struct pt_regs *regs)
+{
+ /*
+ * The only reason for generating #AC from kernel is because of
+ * split lock. The kernel faulting instruction will be re-executed.
+ */
+ if (!user_mode(regs))
+ return true;
+
+ return false;
+}
+
+static void disable_split_lock(void *unused)
+{
+ _setup_split_lock(DISABLE_SPLIT_LOCK_AC);
+}
+
+/*
+ * #AC handler for split lock is called by generic #AC handler.
+ *
+ * Disable #AC for split lock on the CPU that the current task runs on
+ * in order for the faulting instruction to get executed. The #AC for split
+ * lock is re-enabled later.
+ */
+bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
+{
+ static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 5);
+ char str[] = "alignment check for split lock";
+ struct task_struct *tsk = current;
+ int cpu = task_cpu(tsk);
+
+ if (!re_execute(regs))
+ return false;
+
+ /* Pace logging with jiffies. */
+ if (__ratelimit(&ratelimit)) {
+ pr_info("%s[%d] %s ip:%lx sp:%lx error:%lx",
+ tsk->comm, tsk->pid, str,
+ regs->ip, regs->sp, error_code);
+ print_vma_addr(KERN_CONT " in ", regs->ip);
+ pr_cont("\n");
+ }
+
+ mutex_lock(&reexecute_split_lock_mutex);
+ smp_call_function_single(cpu, disable_split_lock, NULL, 1);
+ /*
+ * Mark the time when split lock is disabled for re-executing the
+ * faulting instruction.
+ */
+ disable_split_lock_jiffies = jiffies;
+ mutex_unlock(&reexecute_split_lock_mutex);
+
+ /* The faulting instruction will be re-executed when
+ * split lock is re-enabled 1 HZ later.
+ */
+ schedule_delayed_work_on(cpu, &per_cpu(reenable_delayed_work, cpu),
+ reenable_split_lock_delay);
+
+ return true;
+}
+
+static int split_lock_online(unsigned int cpu)
+{
+ INIT_DELAYED_WORK(&per_cpu(reenable_delayed_work, cpu),
+ delayed_reenable_split_lock);
+
+ return 0;
+}
+
+static int split_lock_offline(unsigned int cpu)
+{
+ cancel_delayed_work(&per_cpu(reenable_delayed_work, cpu));
+
+ return 0;
+}
+
+static int __init split_lock_init(void)
+{
+ int ret;
+
+ if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_AC))
+ return -ENODEV;
+
+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/split_lock:online",
+ split_lock_online, split_lock_offline);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+late_initcall(split_lock_init);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 03f3d7695dac..971664134094 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/mpx.h>
#include <asm/vm86.h>
#include <asm/umip.h>
+#include <asm/cpu.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -317,7 +318,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun",coprocessor_seg
DO_ERROR(X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, "alignment check", alignment_check)
+
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+ siginfo_t info;
+ int ret;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
+ NOTIFY_STOP) {
+ cond_local_irq_enable(regs);
+
+ /* #AC exception could be handled by split lock handler. */
+ ret = do_split_lock_exception(regs, error_code);
+ if (ret)
+ return;
+
+ /*
+ * If not processed by split lock handler, go to generic
+ * #AC handler.
+ */
+ do_trap(trapnr, signr, str, regs, error_code,
+ fill_trap_info(regs, signr, trapnr, &info));
+ }
+}

#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
--
2.5.0


2018-05-27 15:51:15

by Fenghua Yu

[permalink] [raw]
Subject: [RFC PATCH 10/16] x86/split_lock: Add a debugfs interface to allow user to enable or disable #AC for split lock during run time

Users may want to enable or disable #AC for split lock at run time.

Add the interface /sys/kernel/debug/x86/split_lock/enable to allow the
user to control #AC for split lock and to show its current status at
run time.

Writing 1 to the file enables #AC for split lock and writing 0 disables
it.

Reading the file shows the current enabled/disabled status of #AC for
split lock:
0: disabled, 1: enabled.

Signed-off-by: Fenghua Yu <[email protected]>
---
arch/x86/kernel/cpu/test_ctl.c | 92 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 92 insertions(+)

diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
index a2f84fcd4da1..e8b3032f3db0 100644
--- a/arch/x86/kernel/cpu/test_ctl.c
+++ b/arch/x86/kernel/cpu/test_ctl.c
@@ -17,6 +17,7 @@
#include <linux/mm.h>
#include <linux/reboot.h>
#include <linux/syscore_ops.h>
+#include <linux/debugfs.h>
#include <asm/msr.h>

#define DISABLE_SPLIT_LOCK_AC 0
@@ -34,6 +35,14 @@ static DEFINE_MUTEX(reexecute_split_lock_mutex);
static int split_lock_ac_kernel = DISABLE_SPLIT_LOCK_AC;
static int split_lock_ac_firmware = DISABLE_SPLIT_LOCK_AC;

+static DEFINE_MUTEX(split_lock_mutex);
+
+struct debugfs_file {
+ char name[32];
+ int mode;
+ const struct file_operations *fops;
+};
+
/* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
void detect_split_lock_ac(void)
{
@@ -292,6 +301,85 @@ static struct syscore_ops split_lock_syscore_ops = {
.resume = split_lock_bsp_resume,
};

+static int enable_show(void *data, u64 *val)
+{
+ *val = split_lock_ac_kernel;
+
+ return 0;
+}
+
+static int enable_store(void *data, u64 val)
+{
+ u64 msr_val;
+ int cpu;
+
+ if (val != DISABLE_SPLIT_LOCK_AC && val != ENABLE_SPLIT_LOCK_AC)
+ return -EINVAL;
+
+ /* No need to update MSR if new setting is the same as old one. */
+ if (val == split_lock_ac_kernel)
+ return 0;
+
+ mutex_lock(&split_lock_mutex);
+ mutex_lock(&reexecute_split_lock_mutex);
+
+ /*
+ * Wait until it's out of any re-executed split lock instruction
+ * window.
+ */
+ wait_for_reexecution();
+
+ split_lock_ac_kernel = val;
+ /* Read split lock setting on the current CPU. */
+ rdmsrl(MSR_TEST_CTL, msr_val);
+ /* Change the split lock setting. */
+ if (split_lock_ac_kernel == DISABLE_SPLIT_LOCK_AC)
+ msr_val &= ~MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
+ else
+ msr_val |= MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
+ /* Update the split lock setting on all online CPUs. */
+ for_each_online_cpu(cpu)
+ wrmsrl_on_cpu(cpu, MSR_TEST_CTL, msr_val);
+
+ mutex_unlock(&reexecute_split_lock_mutex);
+ mutex_unlock(&split_lock_mutex);
+
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(enable_ops, enable_show, enable_store, "%llx\n");
+
+static int __init debugfs_setup_split_lock(void)
+{
+ struct debugfs_file debugfs_files[] = {
+ {"enable", 0600, &enable_ops},
+ };
+ struct dentry *split_lock_dir, *fd;
+ int i;
+
+ split_lock_dir = debugfs_create_dir("split_lock", arch_debugfs_dir);
+ if (!split_lock_dir)
+ goto out;
+
+ /* Create files under split_lock_dir. */
+ for (i = 0; i < ARRAY_SIZE(debugfs_files); i++) {
+ fd = debugfs_create_file(debugfs_files[i].name,
+ debugfs_files[i].mode,
+ split_lock_dir, NULL,
+ debugfs_files[i].fops);
+ if (!fd)
+ goto out_cleanup;
+ }
+
+ return 0;
+
+out_cleanup:
+ debugfs_remove_recursive(split_lock_dir);
+out:
+
+ return -ENOMEM;
+}
+
static int __init split_lock_init(void)
{
int ret;
@@ -299,6 +387,10 @@ static int __init split_lock_init(void)
if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_AC))
return -ENODEV;

+ ret = debugfs_setup_split_lock();
+ if (ret)
+ pr_warn("debugfs for #AC for split lock cannot be set up\n");
+
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/split_lock:online",
split_lock_online, split_lock_offline);
if (ret < 0)
--
2.5.0


2018-05-29 16:40:00

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On 05/27/2018 08:45 AM, Fenghua Yu wrote:
> ==Detect Split Lock==
>
> To detect split lock, a new control bit (bit 29) in per-core TEST_CTL
> MSR 0x33 will be introduced in future x86 processors. When the bit 29
> is set, the processor causes #AC exception for split locked accesses at
> all CPL.
>
> The bit 29 specification in MSR TEST_CTL is published in the latest
> Intel Architecture Instruction Set Extensions and Future Features
> Programming Reference.

The fact that this is per-core is pretty important, right? Where does
that get mentioned?

2018-05-29 16:41:59

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On 05/27/2018 08:45 AM, Fenghua Yu wrote:
> A split lock is any atomic operation whose operand crosses two cache
> lines. Since the operand spans two cache lines and the operation must
> be atomic, the system locks the bus while the CPU accesses the two cache
> lines.

Could you give us a bit of an idea why this is RFC? What needs review?
What needs commenting-on? What is left before you think this can be
merged? Where would you like Thomas, Ingo, and other reviewers to focus
their attention?

One thing we from Intel can be horrible about is launching into the
details of the hardware implementation instead of focusing on the
software implications. For instance, you launch into the details of bus
locks without mentioning key things like:

Split-lock-detection is a new hardware feature that generates
alignment-check (#AC) faults to help detect when badly-aligned atomic
instructions might impact whole-system performance. These patches are
primarily targeted at application-level issues, but we can also detect
the same issues in the kernel. There is a significant interaction
between this feature and firmware because firmware may or may not be
prepared for this feature to be enabled.

2018-05-29 17:27:15

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Tue, May 29, 2018 at 09:39:01AM -0700, Dave Hansen wrote:
> On 05/27/2018 08:45 AM, Fenghua Yu wrote:
> > ==Detect Split Lock==
> >
> > To detect split lock, a new control bit (bit 29) in per-core TEST_CTL
> > MSR 0x33 will be introduced in future x86 processors. When the bit 29
> > is set, the processor causes #AC exception for split locked accesses at
> > all CPL.
> >
> > The bit 29 specification in MSR TEST_CTL is published in the latest
> > Intel Architecture Instruction Set Extensions and Future Features
> > Programming Reference.
>
> The fact that this is per-core is pretty important, right? Where does
> that get mentioned?

The fact that TEST_CTL is per-core is NOT mentioned in the Instruction
Set Extensions or SDM.

(By the way, the ISE and SDM don't mention other important details, e.g.
the fact that the operand is fetched into the cache before the split lock
check. Without that fact, it's hard to explain why only a split lock
generates a bus lock in chapter 8.1.4 of SDM vol. 3.)

I was told that TEST_CTL is per-core by the split lock hardware designer.
And I did verify that the MSR is per-core on the only processor that
currently implements split lock detection.

As you can see, handling the per-core TEST_CTL requires some care with
locking (see patch 2).

But the code is supposed to work even if TEST_CTL is per-thread (or even
per-socket) in future.

Maybe I can add "Current TEST_CTL implementation is per-core. The patches
are supposed to work even when TEST_CTL is per-thread (or even per-
socket) in future as well."?

Thanks.

-Fenghua

2018-05-29 17:30:11

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On 05/29/2018 10:25 AM, Fenghua Yu wrote:
> (By the way, ISE and SDM don't mention other important details, e.g.
> the fact that operand is fetched to cache before split lock is
> checked. Without the fact, it's hard to explain that only split lock
> generates bus lock in chapter 8.1.4 in SDM vol3.)

That's a bummer. You're working with the folks that own those documents
to get that fixed up, right?

> Maybe I can add "Current TEST_CTL implementation is per-core. The patches
> are supposed to work even when TEST_CTL is per-thread (or even per-
> socket) in future as well."?

Yes, that would be a very important part of the software implementation
to mention.

2018-05-29 17:38:53

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Tue, May 29, 2018 at 09:40:09AM -0700, Dave Hansen wrote:
> On 05/27/2018 08:45 AM, Fenghua Yu wrote:
> > A split lock is any atomic operation whose operand crosses two cache
> > lines. Since the operand spans two cache lines and the operation must
> > be atomic, the system locks the bus while the CPU accesses the two cache
> > lines.
>
> Could you give us a bit of an idea why this is RFC? What needs review?
> What needs commenting-on? What is left before you think this can be
> merged? Where would you like Thomas, Ingo, and other reviewers to focus
> their attention?
>
> One thing we from Intel can be horrible about is launching in to the
> details of the hardware implementation instead of focusing on the
> software implications. For instance, you launch into the details of bus
> locks without mentioning key things like:
>
> Split-lock-detection is a new hardware feature that generates
> alignment-check (#AC) faults to help detect when badly-aligned atomic
> instructions might impact whole-system performance. These patches are
> primarily targeted at application-level issues, but we can also detect
> the same issues in the kernel. There is a significant interaction
> between this feature and firmware because firmware may or may not be
> prepared for this feature to be enabled.

Hi, Thomas, Ingo, and other reviewers,

So far I have only received feedback from Intel people (Dave, Alan, Tony,
Ashok, etc.) and haven't seen any feedback from you.

Does the patch set handle the #AC for split lock feature correctly, e.g.
in areas like enumeration and interaction with firmware? Does it handle
the usage cases correctly, e.g. the configuration and the debugfs
interface? Is any coding improvement needed?

Thank you very much!

-Fenghua

2018-05-29 17:41:33

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Tue, May 29, 2018 at 10:28:11AM -0700, Dave Hansen wrote:
> On 05/29/2018 10:25 AM, Fenghua Yu wrote:
> > (By the way, ISE and SDM don't mention other important details, e.g.
> > the fact that operand is fetched to cache before split lock is
> > checked. Without the fact, it's hard to explain that only split lock
> > generates bus lock in chapter 8.1.4 in SDM vol3.)
>
> That's a bummer. You're working with the folks that own those documents
> to get that fixed up, right?

Yes, I already asked them to add some missing parts in future ISE and SDM.

>
> > Maybe I can add "Current TEST_CTL implementation is per-core. The patches
> > are supposed to work even when TEST_CTL is per-thread (or even per-
> > socket) in future as well."?
>
> Yes, that would be a very important part of the software implementation
> to mention.

I'll do that in the next version.

Thanks.

-Fenghua

2018-05-29 18:45:05

by Greg KH

[permalink] [raw]
Subject: Re: [RFC PATCH 10/16] x86/split_lock: Add a debugfs interface to allow user to enable or disable #AC for split lock during run time

On Sun, May 27, 2018 at 08:45:59AM -0700, Fenghua Yu wrote:
> User wants to enable or disable #AC for split lock during run time.
>
> The interface /sys/kernel/debug/x86/split_lock/enable is added to allow
> user to control #AC for split lock and show current split lock status
> during run time.
>
> Writing 1 to the file enables #AC for split lock and writing 0 disables
> #AC for split lock.
>
> Reading the file shows current enabled/disabled status of #AC for split
> lock:
> 0: disabled and 1: enabled.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> arch/x86/kernel/cpu/test_ctl.c | 92 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 92 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/test_ctl.c b/arch/x86/kernel/cpu/test_ctl.c
> index a2f84fcd4da1..e8b3032f3db0 100644
> --- a/arch/x86/kernel/cpu/test_ctl.c
> +++ b/arch/x86/kernel/cpu/test_ctl.c
> @@ -17,6 +17,7 @@
> #include <linux/mm.h>
> #include <linux/reboot.h>
> #include <linux/syscore_ops.h>
> +#include <linux/debugfs.h>
> #include <asm/msr.h>
>
> #define DISABLE_SPLIT_LOCK_AC 0
> @@ -34,6 +35,14 @@ static DEFINE_MUTEX(reexecute_split_lock_mutex);
> static int split_lock_ac_kernel = DISABLE_SPLIT_LOCK_AC;
> static int split_lock_ac_firmware = DISABLE_SPLIT_LOCK_AC;
>
> +static DEFINE_MUTEX(split_lock_mutex);
> +
> +struct debugfs_file {
> + char name[32];
> + int mode;
> + const struct file_operations *fops;
> +};
> +
> /* Detect feature of #AC for split lock by probing bit 29 in MSR_TEST_CTL. */
> void detect_split_lock_ac(void)
> {
> @@ -292,6 +301,85 @@ static struct syscore_ops split_lock_syscore_ops = {
> .resume = split_lock_bsp_resume,
> };
>
> +static int enable_show(void *data, u64 *val)
> +{
> + *val = split_lock_ac_kernel;
> +
> + return 0;
> +}
> +
> +static int enable_store(void *data, u64 val)
> +{
> + u64 msr_val;
> + int cpu;
> +
> + if (val != DISABLE_SPLIT_LOCK_AC && val != ENABLE_SPLIT_LOCK_AC)
> + return -EINVAL;
> +
> + /* No need to update MSR if new setting is the same as old one. */
> + if (val == split_lock_ac_kernel)
> + return 0;
> +
> + mutex_lock(&split_lock_mutex);
> + mutex_lock(&reexecute_split_lock_mutex);
> +
> + /*
> + * Wait until it's out of any re-executed split lock instruction
> + * window.
> + */
> + wait_for_reexecution();
> +
> + split_lock_ac_kernel = val;
> + /* Read split lock setting on the current CPU. */
> + rdmsrl(MSR_TEST_CTL, msr_val);
> + /* Change the split lock setting. */
> + if (split_lock_ac_kernel == DISABLE_SPLIT_LOCK_AC)
> + msr_val &= ~MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
> + else
> + msr_val |= MSR_TEST_CTL_ENABLE_AC_SPLIT_LOCK;
> + /* Update the split lock setting on all online CPUs. */
> + for_each_online_cpu(cpu)
> + wrmsrl_on_cpu(cpu, MSR_TEST_CTL, msr_val);
> +
> + mutex_unlock(&reexecute_split_lock_mutex);
> + mutex_unlock(&split_lock_mutex);
> +
> + return 0;
> +}
> +
> +DEFINE_DEBUGFS_ATTRIBUTE(enable_ops, enable_show, enable_store, "%llx\n");
> +
> +static int __init debugfs_setup_split_lock(void)
> +{
> + struct debugfs_file debugfs_files[] = {
> + {"enable", 0600, &enable_ops},
> + };
> + struct dentry *split_lock_dir, *fd;
> + int i;
> +
> + split_lock_dir = debugfs_create_dir("split_lock", arch_debugfs_dir);
> + if (!split_lock_dir)
> + goto out;
> +

No need to test this, just keep moving on. You should never need to
test the result of any debugfs call at all.


> + /* Create files under split_lock_dir. */
> + for (i = 0; i < ARRAY_SIZE(debugfs_files); i++) {
> + fd = debugfs_create_file(debugfs_files[i].name,
> + debugfs_files[i].mode,
> + split_lock_dir, NULL,
> + debugfs_files[i].fops);
> + if (!fd)
> + goto out_cleanup;

Same here, no need to test anything.

> + }
> +
> + return 0;

This function can not fail, might as well make it return void :)

thanks,

greg k-h

2018-06-21 19:40:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Sun, May 27, 2018 at 08:45:49AM -0700, Fenghua Yu wrote:
> Currently we can trace split lock event counter for debug purpose. But

How? A while ago I actually tried that, but I could not find a suitable
perf event.

> Intel introduces mechanism to detect split lock via alignment
> check exception in Tremont and other future processors. If split lock is
> from user process, #AC handler can kill the process or re-execute faulting
> instruction depending on configuration.

Ideally it would #AC any unaligned (implied) LOCK prefix instruction,
not just across lines.

> To detect split lock, a new control bit (bit 29) in per-core TEST_CTL
> MSR 0x33 will be introduced in future x86 processors. When the bit 29
> is set, the processor causes #AC exception for split locked accesses at
> all CPL.

Per-Core is really unfortunate, but better than nothing.

2018-06-21 19:49:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sun, May 27, 2018 at 08:45:51AM -0700, Fenghua Yu wrote:
> When #AC exception for split lock is triggered from a kernel instruction,
> by default we don't want the exception to panic the whole system. Instead,
> the exception handler should log the fault information for debug and
> continue the instruction execution.

This should not happen today, did you find any unaligned LOCK insns?

This is a lot of code to 'handle' something that really should not ever
happen. Maybe just WARN and clear the MSR bit, nothing complicated.

Also, fix your "disgusting drug-induced crap" comments.

2018-06-21 19:57:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

On Sun, May 27, 2018 at 08:45:53AM -0700, Fenghua Yu wrote:
> set_bit() called by set_cpu_cap() is a locked bit set instruction for
> atomic operation.
>
> Since the c->x86_capability can span two cache lines depending on kernel
> configuration and building environment, the locked bit set instruction may
> cause #AC exception when #AC exception for split lock is enabled.

That doesn't make sense. Sure the bitmap may be longer, but depending on
if the argument is an immediate or not we either use a byte instruction
(which can never cross a cacheline boundary) or a 'word' aligned BTS.
And the bitmap really _should_ be 'unsigned long' aligned.

If it is not aligned, fix that too.

/me looks at cpuinfo_x86 and finds x86_capability is in fact a __u32
array.. see that's broken and needs fixing first.
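
A sketch of the kind of alignment fix implied here (illustrative only, not
a committed patch; NCAPINTS/NBUGINTS as in the existing cpufeature code):

	/* in struct cpuinfo_x86: force unsigned long alignment so the
	 * aligned BTS/BTR issued by the atomic bitops never operates on
	 * a word that straddles a cache line */
	__u32	x86_capability[NCAPINTS + NBUGINTS]
			__aligned(sizeof(unsigned long));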

2018-06-21 19:59:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

On Sun, May 27, 2018 at 08:45:55AM -0700, Fenghua Yu wrote:
> Firmware may contain split locked instructions.

I think that's the wrong attitude. You should mandate in your BIOS
development guide that Firmware _MUST_NOT_ contain unaligned LOCK
prefixed instructions.


2018-06-21 20:08:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Tue, May 29, 2018 at 09:40:09AM -0700, Dave Hansen wrote:
> Split-lock-detection is a new hardware feature that generates
> alignment-check (#AC) faults to help detect when badly-aligned atomic
> instructions might impact whole-system performance. These patches are
> primarily targeted at application-level issues, but we can also detect
> the same issues in the kernel.

Which is nice; Thomas and I spent quite a while a few months back searching
for a perf event that could tell us about unaligned LOCK insns and could
not find anything useful.

> There is a significant interaction
> between this feature and firmware because firmware may or may not be
> prepared for this feature to be enabled.

And allowing firmware to muddle through, instead of drawing a hard line
in the sand means we'll be stuck with crap firmware for at least another
decade :/

Just say no.

2018-06-21 20:21:07

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 09:37:38PM +0200, Peter Zijlstra wrote:
> On Sun, May 27, 2018 at 08:45:49AM -0700, Fenghua Yu wrote:
> > Currently we can trace split lock event counter for debug purpose. But
>
> How? A while ago I actually tried that, but I could not find a suitable
> perf event.

The event name is called sq_misc.split_lock. It's been supported in perf
already.

>
> > Intel introduces mechanism to detect split lock via alignment
> > check exception in Tremont and other future processors. If split lock is
> > from user process, #AC handler can kill the process or re-execute faulting
> > instruction depending on configuration.
>
> Ideally it would #AC any unaligned (implied) LOCK prefix instruction,
> not just across lines.

This feature only triggers #AC for an access that crosses a cache line
boundary, not for other unaligned accesses (4-byte, 8-byte, etc.). This is
not explicitly stated in the ISE. I can add this info in the next version
of the patches.

>
> > To detect split lock, a new control bit (bit 29) in per-core TEST_CTL
> > MSR 0x33 will be introduced in future x86 processors. When the bit 29
> > is set, the processor causes #AC exception for split locked accesses at
> > all CPL.
>
> Per-Core is really unfortunate, but better than nothing.

Agree with you! Per-core is at least the current hardware implementation.
The code and its locking are supposed to also work on a potential future
per-thread or even per-socket implementation.

Thanks.

-Fenghua

2018-06-21 20:34:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, 21 Jun 2018, Fenghua Yu wrote:
> On Thu, Jun 21, 2018 at 09:37:38PM +0200, Peter Zijlstra wrote:
> > On Sun, May 27, 2018 at 08:45:49AM -0700, Fenghua Yu wrote:
> > > Currently we can trace split lock event counter for debug purpose. But
> >
> > How? A while ago I actually tried that, but I could not find a suitable
> > perf event.
>
> The event name is called sq_misc.split_lock. It's been supported in perf
> already.

So the obvious question is why not simply use that counter and capture the
IP which triggers the event?

I can see that this won't cover the early boot process, but there it's
enough to catch #AC once, yell loudly and then disable the thing. I'm not
seeing the value of adding 1000 lines of code with lots of control knobs.

I might be missing something though and am happy to be educated.

Thanks,

tglx



2018-06-21 20:37:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 01:18:52PM -0700, Fenghua Yu wrote:
> On Thu, Jun 21, 2018 at 09:37:38PM +0200, Peter Zijlstra wrote:
> > On Sun, May 27, 2018 at 08:45:49AM -0700, Fenghua Yu wrote:
> > > Currently we can trace split lock event counter for debug purpose. But
> >
> > How? A while ago I actually tried that, but I could not find a suitable
> > perf event.
>
> The event name is called sq_misc.split_lock. It's been supported in perf
> already.

Thanks.

> > > Intel introduces mechanism to detect split lock via alignment
> > > check exception in Tremont and other future processors. If split lock is
> > > from user process, #AC handler can kill the process or re-execute faulting
> > > instruction depending on configuration.
> >
> > Ideally it would #AC any unaligned (implied) LOCK prefix instruction,
> > not just across lines.
>
> This feature only triggers #AC for unaligned cache line access, not for
> other aligned (4 bytes, 8 bytes, etc). This is not explicitly said in
> ISE. I can add this info in next version of patches.

It was clear; what I'm saying is I'd like #AC to happen for any actual
unaligned LOCK access, not just across lines.


2018-06-21 20:39:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

On Thu, 21 Jun 2018, Peter Zijlstra wrote:

> On Sun, May 27, 2018 at 08:45:53AM -0700, Fenghua Yu wrote:
> > set_bit() called by set_cpu_cap() is a locked bit set instruction for
> > atomic operation.
> >
> > Since the c->x86_capability can span two cache lines depending on kernel
> > configuration and building environment, the locked bit set instruction may
> > cause #AC exception when #AC exception for split lock is enabled.
>
> That doesn't make sense. Sure the bitmap may be longer, but depending on
> if the argument is an immediate or not we either use a byte instruction
> (which can never cross a cacheline boundary) or a 'word' aligned BTS.
> And the bitmap really _should_ be 'unsigned long' aligned.
>
> If it is not aligned, fix that too.
>
> /me looks at cpuinfo_x86 and finds x86_capability is in fact a __u32
> array.. see that's broken and needs fixing first.

Ditto for the next patch. ...


2018-06-21 22:03:50

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 10:32:57PM +0200, Thomas Gleixner wrote:
> On Thu, 21 Jun 2018, Fenghua Yu wrote:
> > On Thu, Jun 21, 2018 at 09:37:38PM +0200, Peter Zijlstra wrote:
> > > On Sun, May 27, 2018 at 08:45:49AM -0700, Fenghua Yu wrote:
> > > > Currently we can trace split lock event counter for debug purpose. But
> > >
> > > How? A while ago I actually tried that, but I could not find a suitable
> > > perf event.
> >
> > The event name is called sq_misc.split_lock. It's been supported in perf
> > already.
>
> So the obvious question is why not simply use that counter and capture the
> IP which triggers the event?
>

The sq_misc.split_lock event counts AFTER the fact and is insufficient to
catch a split lock before the instruction is executed on a system deployed
in the field. #AC for split lock is triggered BEFORE the instruction is
executed.

For example, on a consolidated real-time machine, some cores run hard
real-time workloads while the rest run "untrusted" user processes. One
untrusted user process may execute an instruction whose operand is split
locked and cause bus locking on the whole machine, blocking the real-time
workloads from accessing memory. In this case, capturing the split lock
perf event won't immediately help the real-time workloads. With #AC for
split lock, the split lock is detected before the instruction hurts the
hard real-time workloads, and the untrusted process can be killed or the
sysadmin warned, depending on policy. Without the #AC for split lock
feature, such a consolidated real-time design is impossible.

Another example: in a public cloud deployed in the field, a user process
in a guest can execute a split locked instruction to slow down the overall
performance of the other guests and the host. This process could be badly
designed or malware. #AC for split lock can kill the process or issue a
warning before harm is done. The perf event, on the other hand, requires
perf to be running to monitor the event on the public cloud and doesn't
actually prevent the split lock from hurting system performance.
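
As a concrete illustration (a stand-alone sketch, not part of this patch
set), a user process can generate such a split locked access with just a
few lines of C:

	#define _GNU_SOURCE
	#include <stdint.h>
	#include <stdlib.h>

	int main(void)
	{
		char *buf;
		volatile uint32_t *p;

		/* 128 bytes, aligned to the start of a 64-byte cache line */
		if (posix_memalign((void **)&buf, 64, 128))
			return 1;

		/*
		 * 4-byte operand starting 2 bytes before the line boundary,
		 * so it covers bytes 62..65 and spans two cache lines.
		 */
		p = (volatile uint32_t *)(buf + 62);

		/* LOCK-prefixed add on the split operand: a bus lock today,
		 * an #AC once the feature is enabled */
		for (;;)
			__sync_fetch_and_add(p, 1);
	}

Built with gcc and left running, this loops on a split locked add and,
without #AC for split lock, keeps asserting bus locks that every other
core (and guest) on the machine has to wait for.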

And perf cannot count split lock events in firmware. On a real-time
system, even a split lock in firmware (reboot, run-time services, etc.)
may not be tolerable and needs to be detected and prevented.

Do the examples make sense?

> I can see that this wont cover the early boot process, but there it's
> enough to catch #AC once, yell loudly and then disable the thing. I'm not
> seing the value of adding 1000 lines of code with lots of control knobs.
>
> I might be missing something though and am happy to be educated.
>

Right. The code won't cover the early boot process. It only covers the
boot process after the feature is enabled.

After disabling #AC for split lock upon catching the first #AC, future
split locks will not generate #AC any more.

The control knobs allow the sysadmin to handle #AC for split lock in
different scenarios and usages.

The control knob for the kernel chooses between re-executing the faulting
instruction (default) and kernel panic. Kernel panic may be useful in hard
real time, which is less tolerant of bad performance.

The control knob for user space chooses between killing the process
(default) and re-executing the faulting instruction without blocking the
process. Re-executing the instruction may be useful on platforms that run
well controlled apps with few split locks.

The control knob for firmware chooses between continuing firmware
execution by disabling #AC for split lock (default) and stopping firmware
execution by enabling #AC for split lock. Stopping firmware execution may
be useful on a hard real-time system to identify any split lock issue on
the platform.

So the control knobs may be useful for different scenarios, right?

Thanks.

-Fenghua



2018-06-21 22:08:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> And perf cannot count split lock events in firmware. In real time, even
> split lock in firmware (reboot, run time services, etc) may not be tolerant
> and need to be detected and prevented.

And yet you want to allow firmware to use them by excluding it in these
here patches. Make up your mind.

2018-06-21 22:09:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> The control knob for kernel is to choose re-executing the faulting
> instruction (default) or kernel panic. Kernel panic may be useful in hard
> real time which has less tolerant to bad performance.

The kernel should never trigger this, ever. Any #AC is a plain bug and
should be fixed. So unconditional WARN+disable is sufficient.

2018-06-21 22:11:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> The control knob for firmware is to choose continuing firmware execution
> by disabling #AC split lock (default) or stopping firmware execution
> by enabling #AC for split lock. Stopping firmware execution may be useful
> in hard real time system to identify any split lock issue on the platform.

Having the option only allows broken firmware to continue to exist.
Limiting people in how they can use their machines.

2018-06-21 22:12:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> The control knob for user is to choose killing the process (default) or
> re-executing the faulting instruction without blocking the process.
> Re-executing the instruction maybe be useful in platforms that run
> well controlled apps with less split locks.

Just fix the applications. I doubt there will be many; many other
platforms already SIGBUS on unaligned access, atomic or not.

2018-06-21 22:15:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> For example, on a consolidated real-time machine, some cores are running

> Another example, in a public cloud deployed in the field, a user process

In either case a single split-lock shouldn't be a real problem, if you
program the event with a count of 1 and have the NMI handler kill the
offending task, you should be good.

Not saying the #AC isn't nicer, just saying the PMU based thing can
still work.

2018-06-21 22:19:16

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On 06/21/2018 03:08 PM, Peter Zijlstra wrote:
> On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
>> The control knob for kernel is to choose re-executing the faulting
>> instruction (default) or kernel panic. Kernel panic may be useful in hard
>> real time which has less tolerant to bad performance.
> The kernel should never trigger this, ever. Any #AC is a plain bug and
> should be fixed. So unconditional WARN+disable is sufficient.

It sounds like we need to start with that functionality and then if
Fenghua wants to do more, we have add-on patches that can be evaluated
by themselves.

2018-06-21 22:44:53

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:18:19PM -0700, Dave Hansen wrote:
> On 06/21/2018 03:08 PM, Peter Zijlstra wrote:
> > On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> >> The control knob for kernel is to choose re-executing the faulting
> >> instruction (default) or kernel panic. Kernel panic may be useful in hard
> >> real time which has less tolerant to bad performance.
> > The kernel should never trigger this, ever. Any #AC is a plain bug and
> > should be fixed. So unconditional WARN+disable is sufficient.
>
> It sounds like we need to start with that functionality and then if
> Fenghua wants to do more, we have add-on patches that can be evaluated
> by themselves.

For a split lock in the kernel, by default I just issue a warning, disable
the feature in the #AC handler, continue executing the faulting
instruction, and re-enable #AC for split lock after a delay. This is the
default behavior in the current patches.

By "WARN+disable", do you mean I should just keep this default behavior
and remove the control knob?

Thanks.

-Fenghua

2018-06-21 22:46:38

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Fri, Jun 22, 2018 at 12:11:14AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> > The control knob for user is to choose killing the process (default) or
> > re-executing the faulting instruction without blocking the process.
> > Re-executing the instruction maybe be useful in platforms that run
> > well controlled apps with less split locks.
>
> Just fix the applications. I doubt there will be many, many other
> platforms will already SIGBUS on unaligned access, atomic or not.

OK. I will just kill the process with SIGBUS and remove the control knob
in the next version.

Thanks.

-Fenghua

2018-06-21 23:06:27

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Fri, Jun 22, 2018 at 12:10:06AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> > The control knob for firmware is to choose continuing firmware execution
> > by disabling #AC split lock (default) or stopping firmware execution
> > by enabling #AC for split lock. Stopping firmware execution may be useful
> > in hard real time system to identify any split lock issue on the platform.
>
> Having the option only allows broken firmware to continue to exist.
> Limiting people in how they can use their machines.

But in a real case, when I enable #AC for split lock in the kernel, reboot
hits #AC because of a split lock in firmware code, and the firmware
handles #AC as a fatal error and stops running.

It will take a long time (or forever) for firmware to fix its split lock
issues. Until the firmware is fixed, reboot or S4 cannot run if the
feature is enabled by the kernel.

And if we are unlucky, I'm afraid the patch set has no chance of being
merged upstream if a maintainer's test machine has a firmware split lock
issue and simply cannot reboot or go to S4 with the feature enabled.

For those reasons, the current patches just don't trust firmware: they
disable #AC for split lock for firmware by default and allow the sysadmin
to enable it for firmware via the control knob.

So is it OK to still keep the control knob and disable #AC for split lock
for firmware by default?

Thanks.

-Fenghua

2018-06-22 08:13:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 03:42:03PM -0700, Fenghua Yu wrote:
> On Thu, Jun 21, 2018 at 03:18:19PM -0700, Dave Hansen wrote:
> > On 06/21/2018 03:08 PM, Peter Zijlstra wrote:
> > > On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> > >> The control knob for kernel is to choose re-executing the faulting
> > >> instruction (default) or kernel panic. Kernel panic may be useful in hard
> > >> real time which has less tolerant to bad performance.
> > > The kernel should never trigger this, ever. Any #AC is a plain bug and
> > > should be fixed. So unconditional WARN+disable is sufficient.
> >
> > It sounds like we need to start with that functionality and then if
> > Fenghua wants to do more, we have add-on patches that can be evaluated
> > by themselves.
>
> For split lock in kernel, by default, I just do warning, disable feature
> in #AC handler, continue to execute the faulting instruction, and re-enable
> #AC for split lock after a delay. This is the default behavior in the
> current patches.
>
> By "WAR+disable", you mean I just do this default behavior and remove the
> control knob?

No, remove all that broken delay crap, remove the knob. Basically remove
everything.

The handler really should be:

do_handle_ac_trap()
{
	if (user_mode(regs)) {
		send_signal(SIGBUS);
		return;
	}

	WARN_ON(1);
	disable_ac_on_all_cpus();
}

done. No broken delay code, no nothing.

2018-06-22 08:20:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, Jun 21, 2018 at 04:05:02PM -0700, Fenghua Yu wrote:

> But in a real case, when I enable #AC for split lock in kernel, reboot
> hits #AC because of split lock in firmware code and firmware handles #AC
> as fatal error and stops continuing to run.
>
> It will take long time/forever for firmware to fix the split lock issue.
> Before the firmware issue is fixed, reboot or S4 cannot run if the feature
> is enabled by kernel.

Then it will take a long time / forever for people to use their machines
as they need. Don't you see that is a problem?

Either you take this real-time stuff seriously and force the BIOS monkeys
to get with the act; this includes no #AC but also very much forcing them
into abandoning SMM as a fail^Wfeature-platform.

Or it's all a joke and we'll continue to complain and shame..

> And if unlucky, I'm afraid the patch set even has no chance to be merged to
> upstream if maintainer's test machine has firmware split lock issue and the
> machine simply cannot reboot or go to S4 if the feature is enabled.

Well, then you'd better get those BIOS monkeys on board and fixing their
crap.

2018-06-22 09:19:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, 21 Jun 2018, Fenghua Yu wrote:
> Right. Code won't cover the early boot process. It only covers boot process
> after the feature is enabled.

That's broken. #AC wants to be enabled as early as any other exception.
The reason why you cannot enable it is that your exception handler is a
total trainwreck, but it's even a trainwreck later on.

> After disabling #AC split lock after catching #AC once, any future split
> lock will not generate #AC any more.

That's perfectly fine for the kernel. Issue a prominent warning and stuff
gets fixed.

Thanks,

tglx

2018-06-22 09:23:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, 21 Jun 2018, Fenghua Yu wrote:
> The control knobs allow sysadmin to handle #AC for split lock in different
> scenarios and usages.
>
> The control knob for kernel is to choose re-executing the faulting
> instruction (default) or kernel panic. Kernel panic may be useful in hard
> real time which has less tolerant to bad performance.

That's nonsense, really.

1) The re-executing mechanism is broken and totally useless

2) Panicking a real-time system just due to a single #AC is total
overkill. Real-Time systems care very much about proper safe state
transitioning. Panic is surely a safe state, but so is power off. But
neither of them qualifies as proper state transitioning.

Thanks,

tglx

2018-06-22 09:27:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, 21 Jun 2018, Fenghua Yu wrote:
> The control knob for user is to choose killing the process (default) or
> re-executing the faulting instruction without blocking the process.
> Re-executing the instruction maybe be useful in platforms that run
> well controlled apps with less split locks.

Again, that's totally overengineered ballast. If you care, kill the thing
and if not, then disable #AC. There are other architectures which trap on
any unaligned access, so it's nothing new.

Thanks,

tglx

2018-06-22 09:30:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, 21 Jun 2018, Fenghua Yu wrote:
> The control knob for firmware is to choose continuing firmware execution
> by disabling #AC split lock (default) or stopping firmware execution
> by enabling #AC for split lock. Stopping firmware execution may be useful
> in hard real time system to identify any split lock issue on the platform.

There is no maybe on a real-time system. Either the thing is sane or it's
not. There is no middle ground. The default has to be to fail, otherwise
firmware won't ever be fixed.

> So the control knobs may be useful for different scenarios, right?

The control knobs are just helping to proliferate crap, so they are not
useful, they are outright counterproductive.

Thanks,

tglx

2018-06-22 09:32:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Thu, 21 Jun 2018, Fenghua Yu wrote:
> On Thu, Jun 21, 2018 at 03:18:19PM -0700, Dave Hansen wrote:
> > On 06/21/2018 03:08 PM, Peter Zijlstra wrote:
> > > On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> > >> The control knob for kernel is to choose re-executing the faulting
> > >> instruction (default) or kernel panic. Kernel panic may be useful in hard
> > >> real time which has less tolerant to bad performance.
> > > The kernel should never trigger this, ever. Any #AC is a plain bug and
> > > should be fixed. So unconditional WARN+disable is sufficient.
> >
> > It sounds like we need to start with that functionality and then if
> > Fenghua wants to do more, we have add-on patches that can be evaluated
> > by themselves.
>
> For split lock in kernel, by default, I just do warning, disable feature
> in #AC handler, continue to execute the faulting instruction, and re-enable
> #AC for split lock after a delay. This is the default behavior in the
> current patches.

That default behaviour is broken beyond repair. I'll explain that when
replying to 2/N.

Plus the broken design prevents early #AC usage, which is completely
wrong to begin with.

Thanks,

tglx

2018-06-22 10:50:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sun, 27 May 2018, Fenghua Yu wrote:
> +static void wait_for_reexecution(void)
> +{
> + while (time_before(jiffies, disable_split_lock_jiffies +
> + reenable_split_lock_delay))
> + cpu_relax();
> +}
> +
> +/*
> + * TEST_CTL MSR is shared among threads on the same core. To simplify
> + * situation, disable_split_lock_jiffies is global instead of per core.

Simplify?

> + * Multiple threads may generate #AC for split lock at the same time.
> + * disable_split_lock_jiffies is updated by those threads. This may
> + * postpone re-enabling split lock on this thread. But that's OK
> + * because we need to make sure all threads on the same core re-execute
> + * their faulting instructions before re-enabling split lock on the core.

This does not ensure anything. It's a bandaid. If the task gets scheduled
out _before_ reexecuting then the timer will reenable AC and then when
scheduled back in it will trip it again.

> + * We want to avoid the situation when split lock is disabled on one
> + * thread (thus on the whole core), then split lock is re-enabled on
> + * another thread (thus on the whole core), and the faulting instruction
> + * generates another #AC on the first thread.
> + *
> + * Before re-enabling split lock, wait until there is no re-executed
> + * split lock instruction which may only exist before
> + * disable_split_lock_jiffies + reenable_split_lock_delay.
> + */
> +static void delayed_reenable_split_lock(struct work_struct *w)
> +{
> + mutex_lock(&reexecute_split_lock_mutex);
> + wait_for_reexecution();

And because the delayed work got scheduled after a tick and the other
thread just managed to make the timeout another tick longer, this thing
busy loops for a full tick with the mutex held. Groan.

> + _setup_split_lock(ENABLE_SPLIT_LOCK_AC);
> + mutex_unlock(&reexecute_split_lock_mutex);
> +}
> +
> +/* Will the faulting instruction be re-executed? */
> +static bool re_execute(struct pt_regs *regs)
> +{
> + /*
> + * The only reason for generating #AC from kernel is because of
> + * split lock. The kernel faulting instruction will be re-executed.

Will be reexecuted? By what?

> + */
> + if (!user_mode(regs))
> + return true;
> +
> + return false;
> +}

The extra value of this wrapper around !user_mode(regs) is that it is more
obfuscated than just having a sensible comment and a simple 'if
(user_mode(regs))' at the call site. Or is there anything I'm missing here?

> +static void disable_split_lock(void *unused)
> +{
> + _setup_split_lock(DISABLE_SPLIT_LOCK_AC);
> +}
> +
> +/*
> + * #AC handler for split lock is called by generic #AC handler.
> + *
> + * Disable #AC for split lock on the CPU that the current task runs on
> + * in order for the faulting instruction to get executed. The #AC for split
> + * lock is re-enabled later.
> + */
> +bool do_split_lock_exception(struct pt_regs *regs, unsigned long error_code)
> +{
> + static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 5);
> + char str[] = "alignment check for split lock";
> + struct task_struct *tsk = current;
> + int cpu = task_cpu(tsk);

What the heck is wrong with smp_processor_id()?

task_cpu(current) is the CPU on which current is running and that's nothing
else than smp_processor_id(). But sure, it would be too obvious and not
make a reviewer scratch his head while trying to figure out what the magic
cpu selection means.

> +
> + if (!re_execute(regs))
> + return false;

So user space just returns here. The explanatory comment for that is where?
It's not even in this hideous wrapper which inverts the condition.

> +
> + /* Pace logging with jiffies. */
> + if (__ratelimit(&ratelimit)) {
> + pr_info("%s[%d] %s ip:%lx sp:%lx error:%lx",
> + tsk->comm, tsk->pid, str,
> + regs->ip, regs->sp, error_code);
> + print_vma_addr(KERN_CONT " in ", regs->ip);

Why the heck would print_vma_addr() of a kernel IP provide any useful
information? Kernel IP is in the kernel not in some random VMA.

What's wrong with a simple WARN() which is prominent and immediately
pointing to the right place?

> +
> + mutex_lock(&reexecute_split_lock_mutex);

How is that supposed to work when the exception happens in a preempt or
interrupt disabled region? And what guarantees that there is no #AC inside
a reexecute_split_lock_mutex held region? It's just forbidden to have one
there, right? If there is one, then you still have the reset button.

> + smp_call_function_single(cpu, disable_split_lock, NULL, 1);

When #AC happens in a interrupt disabled region then this will trigger the
irq disabled warning in smp_call_function_single(). Cannot happen because
it's forbidden to trip #AC in an interrupt disabled region, right?

Aside of that the value of using smp_call_function_single() instead of
invoking the function directly is? It makes sure that it's executed on:

task_cpu(current)

which is obviously the same as smp_processor_id(), i.e. the current
CPU. Makes a lot of sense especially as it has the added value that it
makes sure by setting 'wait = 1' that the function actually was
called.

> + /*
> + * Mark the time when split lock is disabled for re-executing the
> + * faulting instruction.
> + */
> + disable_split_lock_jiffies = jiffies;
> + mutex_unlock(&reexecute_split_lock_mutex);
> +
> + /* The faulting instruction will be re-executed when
> + * split lock is re-enabled 1 HZ later.

Ah, the reenablement of that stuff will re-execute the instruction. Now it
becomes more clear - NOT! These re-execution comments are just misleading
and wrong.

Aside of that Linus will love this comment style:

https://lkml.kernel.org/r/CA+55aFyQYJerovMsSoSKS7PessZBr4vNp-3QUUwhqk4A4_jcbg@mail.gmail.com

Why don't you use it everywhere?

> + */
> + schedule_delayed_work_on(cpu, &per_cpu(reenable_delayed_work, cpu),
> + reenable_split_lock_delay);

That does work really well during early boot because #AC can only happen
after scheduler and workqueues are initialized and after late init calls
have been done which initialized the per cpu delayed work. Heck no. #AC
wants to be on right from the beginning.

This patch surely earns extra points in the trainwreck engineering contest,
but that's not taking place on LKML.

The whole thing is simply:

handle_ac()
{
if (user_mode(regs)) {
do_trap(AC, SIGBUS, ...);
} else {
disable_ac_on_local_cpu();
WARN_ONCE(1);
}
}

That wants #AC enabled as early as possible so the kernel gets as much
coverage as it can. If it trips in the kernel it's a bug and needs to be
fixed and we can then fix them ONE by ONE.

There is absolutely no requirement for all this so-called "re-execution"
hackery. If there were a true benefit of keeping #AC on after it
tripped in the kernel then you'd need to do

disable_ac()
emulate_faulting_insn()
enable_ac()

Obviously including refcounted synchronization of the disable/enable pair
with the CPUs which share the AC mechanics. Total overkill.

Thanks,

tglx

2018-06-22 12:00:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, 22 Jun 2018, Thomas Gleixner wrote:
> The whole thing is simply:
>
> handle_ac()
> {
> if (user_mode(regs)) {
> do_trap(AC, SIGBUS, ...);
> } else {
> disable_ac_on_local_cpu();
> WARN_ONCE(1);
> }
> }
>
> That wants #AC enabled as early as possible so the kernel gets as much
> coverage as it can. If it trips in the kernel it's a bug and needs to be
> fixed and we can then fix them ONE by ONE.

That said, #AC is just yet another badly defined and hastily bolted on
(mis)feature. This should have been:

Bit A: Enable #AC if CPL < 3
Bit B: Enable #AC if CPL == 3

But that would have been too useful and would allow sensible use of #AC
without creating software trainwrecks.

Aside of that the spec says:

31 Disable LOCK# assertion for split locked access.

Can you pretty please make sure that this bit enforces #AC enable? If 31 is
ever set and such an access happens then the resulting havoc will take
ages to decode.

That bit is also mentioned in the SDM with ZERO explanation why it exists
in the first place and why anyone would ever enable it and without a big
fat warning about the possible consequences. Can this pretty please be
fixed?

Thanks,

tglx

2018-06-22 15:13:45

by Alan

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

On Thu, 2018-06-21 at 21:58 +0200, Peter Zijlstra wrote:
> On Sun, May 27, 2018 at 08:45:55AM -0700, Fenghua Yu wrote:
> > Firmware may contain split locked instructions.
>
> I think that's the wrong attitude. You should mandate in your BIOS
> development guide that Firmware _MUST_NOT_ contain unaligned LOCK
> prefixed instructions.
>

In the longer term I would agree entirely with that sentiment.

Alan


2018-06-22 22:43:20

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, Jun 22, 2018 at 01:59:44PM +0200, Thomas Gleixner wrote:
> On Fri, 22 Jun 2018, Thomas Gleixner wrote:
> > The whole thing is simply:
> >
> > handle_ac()
> > {
> > if (user_mode(regs)) {
> > do_trap(AC, SIGBUS, ...);
> > } else {
> > disable_ac_on_local_cpu();
> > WARN_ONCE(1);
> > }
> > }
> >
> > That wants #AC enabled as early as possible so the kernel gets as much
> > coverage as it can. If it trips in the kernel it's a bug and needs to be
> > fixed and we can then fix them ONE by ONE.
>
> That said, #AC is just yet another badly defined and hastily bolted on
> (mis)feature. This should have been:
>
> Bit A: Enable #AC if CPL < 3
> Bit B: Enable #AC if CPL == 3
>
> But that would have been too useful and would allow sensible use of #AC
> without creating software trainwrecks.
>
> Aside of that the spec says:
>
> 31 Disable LOCK# assertion for split locked access.
>
> Can you pretty please make sure that this bit enforces #AC enable? If 31 is
> ever set and such an access happens then the resulting havoc will take
> ages to decode.
>
> That bit is also mentioned in the SDM with ZERO explanation why it exists
> in the first place and why anyone would ever enable it and without a big
> fat warning about the possible consequences. Can this pretty please be
> fixed?

Bit 31 already exists on all processors. Hardware always sets its value
to zero after power-on. It has been legacy for 20 years; it was added for
one customer 20 years ago. Now the Intel hardware design team doesn't expect
anyone to set the bit.

Currently Linux kernel doesn't define this bit and doesn't set this bit.

Thanks.

-Fenghua

2018-06-22 22:58:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, 22 Jun 2018, Fenghua Yu wrote:
> On Fri, Jun 22, 2018 at 01:59:44PM +0200, Thomas Gleixner wrote:
> > Aside of that the spec says:
> >
> > 31 Disable LOCK# assertion for split locked access.
> >
> > Can you pretty please make sure that this bit enforces #AC enable? If 31 is
> > ever set and such an access happens then the resulting havoc will take
> > ages to decode.
> >
> > That bit is also mentioned in the SDM with ZERO explanation why it exists
> > in the first place and why anyone would ever enable it and without a big
> > fat warning about the possible consequences. Can this pretty please be
> > fixed?
>
> Bit 31 already exists on all processors. Hardware always sets its value
> to zero after power-on. It has been legacy for 20 years; it was added for
> one customer 20 years ago. Now the Intel hardware design team doesn't expect
> anyone to set the bit.

Doesn't expect. ROTFL.

That's the stupidest excuse for not adding a big fat warning into the
SDM why this abomination should never be used at all.

Aside of that does the Intel hardware design team expect that this one
customer is still depending on this nonsense and is therefore proliferating
it forever?

> Currently Linux kernel doesn't define this bit and doesn't set this bit.

Thanks for the education. I knew that already, but it still does not make
the existence of it in contemporary CPUs any better or more justified.

Thanks,

tglx

2018-06-22 23:04:47

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sat, 23 Jun 2018, Thomas Gleixner wrote:
> On Fri, 22 Jun 2018, Fenghua Yu wrote:
> > On Fri, Jun 22, 2018 at 01:59:44PM +0200, Thomas Gleixner wrote:
> > > Aside of that the spec says:
> > >
> > > 31 Disable LOCK# assertion for split locked access.
> > >
> > > Can you pretty please make sure that this bit enforces #AC enable? If 31 is
> > > ever set and such an access happens then the resulting havoc will take
> > > ages to decode.
> > >
> > > That bit is also mentioned in the SDM with ZERO explanation why it exists
> > > in the first place and why anyone would ever enable it and without a big
> > > fat warning about the possible consequences. Can this pretty please be
> > > fixed?
> >
> > Bit 31 already exists on all processors. Hardware always sets its value
> > to zero after power-on. It has been legacy for 20 years; it was added for
> > one customer 20 years ago. Now the Intel hardware design team doesn't expect
> > anyone to set the bit.
>
> Doesn't expect. ROTFL.
>
> That's the stupidest excuse for not adding a big fat warning into the
> SDM why this abomination should never be used at all.
>
> Aside of that does the Intel hardware design team expect that this one
> customer is still depending on this nonsense and is therefore proliferating
> it forever?

Forgot to add that there are a lot of things nobody expects to be done, but
especially BIOS/SMM people have a tendency to flip random bits as they see
fit. Maybe not this one, but only for the reason that they did not notice
it yet.

Thanks,

tglx



2018-06-22 23:20:37

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, Jun 22, 2018 at 01:59:44PM +0200, Thomas Gleixner wrote:
> On Fri, 22 Jun 2018, Thomas Gleixner wrote:
> > The whole thing is simply:
> >
> > handle_ac()
> > {
> > if (user_mode(regs)) {
> > do_trap(AC, SIGBUS, ...);
> > } else {
> > disable_ac_on_local_cpu();
> > WARN_ONCE(1);
> > }
> > }
> >
> > That wants #AC enabled as early as possible so the kernel gets as much
> > coverage as it can. If it trips in the kernel it's a bug and needs to be
> > fixed and we can then fix them ONE by ONE.
>
> That said, #AC is just yet another badly defined and hastily bolted on
> (mis)feature. This should have been:
>
> Bit A: Enable #AC if CPL < 3
> Bit B: Enable #AC if CPL == 3
>
> But that would have been too useful and would allow sensible use of #AC
> without creating software trainwrecks.
>

The two bits would be ideal.

So I will do SIGBUS for user and WARN/disable for kernel in the next version.

Thanks.

-Fenghua

2018-06-23 04:22:42

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, Jun 22, 2018 at 12:49:00PM +0200, Thomas Gleixner wrote:
> On Sun, 27 May 2018, Fenghua Yu wrote:
> > +static void wait_for_reexecution(void)
> > +{
> > + while (time_before(jiffies, disable_split_lock_jiffies +
> > + reenable_split_lock_delay))
> > + cpu_relax();
> > +}
> > +
> > +/*
> > + * TEST_CTL MSR is shared among threads on the same core. To simplify
> > + * situation, disable_split_lock_jiffies is global instead of per core.
>
> This patch surely earns extra points in the trainwreck engineering contest,
> but that's not taking place on LKML.
>
> The whole thing is simply:
>
> handle_ac()
> {
> if (user_mode(regs)) {
> do_trap(AC, SIGBUS, ...);
> } else {
> disable_ac_on_local_cpu();
> WARN_ONCE(1);
> }
> }

Should I add a kernel parameter or control knob to opt out of the feature?
I'm afraid firmware may hang the system after handling a split lock if the
feature is enabled by the kernel, e.g. "reboot" hits a split lock in firmware
and firmware hangs the system after handling #AC.

Thanks.

-Fenghua

2018-06-23 09:09:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, 22 Jun 2018, Fenghua Yu wrote:
> On Fri, Jun 22, 2018 at 01:59:44PM +0200, Thomas Gleixner wrote:
> > On Fri, 22 Jun 2018, Thomas Gleixner wrote:
> > > The whole thing is simply:
> > >
> > > handle_ac()
> > > {
> > > if (user_mode(regs)) {
> > > do_trap(AC, SIGBUS, ...);
> > > } else {
> > > disable_ac_on_local_cpu();
> > > WARN_ONCE(1);
> > > }
> > > }
> > >
> > > That wants #AC enabled as early as possible so the kernel gets as much
> > > coverage as it can. If it trips in the kernel it's a bug and needs to be
> > > fixed and we can then fix them ONE by ONE.
> >
> > That said, #AC is just yet another badly defined and hastily bolted on
> > (mis)feature. This should have been:
> >
> > Bit A: Enable #AC if CPL < 3
> > Bit B: Enable #AC if CPL == 3
> >
> > But that would have been too useful and would allow sensible use of #AC
> > without creating software trainwrecks.
> >
>
> The two bits would be ideal.

Correct. And if hardware people would not base their stuff on 'we expect'
assumptions and talk to us _before_ casting half-baked features into
silicon, we would have two bits today.

Thanks,

tglx

2018-06-23 09:19:24

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Fri, 22 Jun 2018, Fenghua Yu wrote:
> On Fri, Jun 22, 2018 at 12:49:00PM +0200, Thomas Gleixner wrote:
> > On Sun, 27 May 2018, Fenghua Yu wrote:
> > > +static void wait_for_reexecution(void)
> > > +{
> > > + while (time_before(jiffies, disable_split_lock_jiffies +
> > > + reenable_split_lock_delay))
> > > + cpu_relax();
> > > +}
> > > +
> > > +/*
> > > + * TEST_CTL MSR is shared among threads on the same core. To simplify
> > > + * situation, disable_split_lock_jiffies is global instead of per core.
> >
> > This patch surely earns extra points in the trainwreck engineering contest,
> > but that's not taking place on LKML.
> >
> > The whole thing is simply:
> >
> > handle_ac()
> > {
> > if (user_mode(regs)) {
> > do_trap(AC, SIGBUS, ...);
> > } else {
> > disable_ac_on_local_cpu();
> > WARN_ONCE(1);
> > }
> > }
>
> Should I add a kernel parameter or control knob to opt out of the feature?

A simple command line option 'acoff' or something more sensible should be
ok. No sysfs knobs or whatever please. The Kconfig option is not required
either.
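For illustration, a minimal sketch of such an opt-out (the parameter name
"split_lock_ac_off" and the flag are placeholders, not the option that was
eventually chosen):

	#include <linux/init.h>

	static bool split_lock_ac_disabled;

	/* "split_lock_ac_off" on the kernel command line disables the feature. */
	static int __init split_lock_ac_off_setup(char *arg)
	{
		split_lock_ac_disabled = true;
		return 0;
	}
	early_param("split_lock_ac_off", split_lock_ac_off_setup);

	static void split_lock_setup(void)
	{
		if (split_lock_ac_disabled)
			return;
		/* ... set the split lock enable bit for this CPU ... */
	}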

> I'm afraid firmware may hang the system after handling a split lock if the
> feature is enabled by the kernel, e.g. "reboot" hits a split lock in firmware
> and firmware hangs the system after handling #AC.

Have you observed the problem in reality? I mean why would 'reboot' be the
critical path? I'd rather expect that EFI callbacks or SMM 'value add'
would trip over it.

Vs. reboot. If that is the only problem then we might just have to clear
#AC enable before issuing it, but that does not need to be part of the
initial patch set. It's an orthogonal issue.

Thanks,

tglx

2018-06-23 15:07:26

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sat, Jun 23, 2018 at 11:17:03AM +0200, Thomas Gleixner wrote:
> On Fri, 22 Jun 2018, Fenghua Yu wrote:
> > On Fri, Jun 22, 2018 at 12:49:00PM +0200, Thomas Gleixner wrote:
> > > On Sun, 27 May 2018, Fenghua Yu wrote:
> > > > +static void wait_for_reexecution(void)
> > > > +{
> > > > + while (time_before(jiffies, disable_split_lock_jiffies +
> > > > + reenable_split_lock_delay))
> > > > + cpu_relax();
> > > > +}
> > > > +
> > > > +/*
> > > > + * TEST_CTL MSR is shared among threads on the same core. To simplify
> > > > + * situation, disable_split_lock_jiffies is global instead of per core.
> > >
> > > This patch surely earns extra points in the trainwreck engineering contest,
> > > but that's not taking place on LKML.
> > >
> > > The whole thing is simply:
> > >
> > > handle_ac()
> > > {
> > > if (user_mode(regs)) {
> > > do_trap(AC, SIGBUS, ...);
> > > } else {
> > > disable_ac_on_local_cpu();
> > > WARN_ONCE(1);
> > > }
> > > }
> >
> > Should I add a kernel parameter or control knob to opt out of the feature?
>
> A simple command line option 'acoff' or something more sensible should be
> ok. No sysfs knobs or whatever please. The Kconfig option is not required
> either.

Ok. I will have a command line option.

BTW, I have a Kconfig option to enable split lock test in kernel mode in
patch #15. Are the Kconfig option and the kernel test code still needed
in next version?

>
> > I'm afraid firmware may hang the system after handling a split lock if the
> > feature is enabled by the kernel, e.g. "reboot" hits a split lock in firmware
> > and firmware hangs the system after handling #AC.
>
> Have you observed the problem in reality? I mean why would 'reboot' be the
> critical path? I'd rather expect that EFI callbacks or SMM 'value add'
> would trip over it.
>
> Vs. reboot. If that is the only problem then we might just have to clear
> #AC enable before issuing it, but that does not need to be part of the
> initial patch set. It's an orthogonal issue.

Yes, I do see a real firmware hang after hitting and handling a split lock
in firmware during "reboot" in one simulation test environment. Apparently
the split lock (i.e., the alignment check) is treated as a failure in firmware.

This real case triggered my concern that a split lock in any future
firmware may happen in any path, including run-time services, S3/S4/S5, and
hotplug. If we don't have an opt-out option or something similar, a system
hang from a split lock in firmware can be a blocking issue on some platforms.
If that happens, bisect will always find the split lock patch to blame.

Thanks.

-Fenghua

2018-06-23 18:51:14

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] x86/split_lock: Enable #AC exception for split locked accesses

On Fri, Jun 22, 2018 at 12:13:37AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 21, 2018 at 03:00:03PM -0700, Fenghua Yu wrote:
> > For example, on a consolidated real-time machine, some cores are running
>
> > Another example, in a public cloud deployed in the field, a user process
>
> In either case a single split-lock shouldn't be a real problem, if you
> program the event with a count of 1 and have the NMI handler kill the
> offending task, you should be good.
>
> Not saying the #AC isn't nicer, just saying the PMU based thing can
> still work.

The difference between the perf event and #AC for split lock is that the
perf event is after the fact while #AC is before the fact.

In the examples (esp. the first example), hard real time cannot afford
a split lock/bus lock. Perf captures the event and kills the task, but
the split lock has already been issued and the harm has been done. #AC for
split lock kills the task before it generates a split lock or bus lock that
hurts real-time tasks.
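(For reference, a minimal user-space sketch of counting such an event with
perf_event_open(); the raw encoding 0x10f4 is assumed to be SQ_MISC.SPLIT_LOCK
and must be verified against the target CPU's event list. It also illustrates
the after-the-fact nature: the count is only available once the split locks
have already happened.)

	#include <stdio.h>
	#include <stdint.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <linux/perf_event.h>

	static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
				    int cpu, int group_fd, unsigned long flags)
	{
		return syscall(__NR_perf_event_open, attr, pid, cpu,
			       group_fd, flags);
	}

	int main(void)
	{
		struct perf_event_attr attr;
		uint64_t count;
		int fd;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_RAW;
		attr.config = 0x10f4;	/* assumed SQ_MISC.SPLIT_LOCK encoding */
		attr.disabled = 1;
		attr.exclude_kernel = 1;

		fd = perf_event_open(&attr, 0, -1, -1, 0); /* this thread, any CPU */
		if (fd < 0) {
			perror("perf_event_open");
			return 1;
		}

		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
		/* ... run the workload under test here ... */
		ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

		if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
			printf("split locks observed: %llu\n",
			       (unsigned long long)count);
		close(fd);
		return 0;
	}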

Thanks.

-Fenghua

2018-06-24 00:57:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sat, 23 Jun 2018, Fenghua Yu wrote:
> On Sat, Jun 23, 2018 at 11:17:03AM +0200, Thomas Gleixner wrote:
> > On Fri, 22 Jun 2018, Fenghua Yu wrote:
> > > Should I add a kernel parameter or control knob to opt out of the feature?
> >
> > A simple command line option 'acoff' or something more sensible should be
> > ok. No sysfs knobs or whatever please. The Kconfig option is not required
> > either.
>
> Ok. I will have a command line option.
>
> BTW, I have a Kconfig option to enable split lock test in kernel mode in
> patch #15. Are the Kconfig option and the kernel test code still needed
> in next version?

Unless you do not trust #AC to work everywhere it is advertised, it's
pretty much pointless.

Btw, please also get rid of this bloated control_ac() stuff. We have
msr_set/clear_bit(), so there is no need to reinvent the wheel.
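For reference, a sketch of what the enable/disable paths could look like with
those helpers (the constant names here are illustrative, not existing kernel
defines):

	#include <asm/msr.h>

	/* Illustrative names: TEST_CTL is MSR 0x33, split lock #AC enable is bit 29. */
	#define MSR_TEST_CTL				0x00000033
	#define MSR_TEST_CTL_ENABLE_SPLIT_LOCK_AC_BIT	29

	static void split_lock_ac_enable_local(void)
	{
		/* msr_set_bit() only writes the MSR back if the bit actually changed. */
		msr_set_bit(MSR_TEST_CTL, MSR_TEST_CTL_ENABLE_SPLIT_LOCK_AC_BIT);
	}

	static void split_lock_ac_disable_local(void)
	{
		msr_clear_bit(MSR_TEST_CTL, MSR_TEST_CTL_ENABLE_SPLIT_LOCK_AC_BIT);
	}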

> > > I'm afraid firmware may hang the system after handling a split lock if the
> > > feature is enabled by the kernel, e.g. "reboot" hits a split lock in firmware
> > > and firmware hangs the system after handling #AC.
> >
> > Have you observed the problem in reality? I mean why would 'reboot' be the
> > critical path? I'd rather expect that EFI callbacks or SMM 'value add'
> > would trip over it.
> >
> > Vs. reboot. If that is the only problem then we might just have to clear
> > #AC enable before issuing it, but that does not need to be part of the
> > initial patch set. It's an orthogonal issue.
>
> Yes, I do see a real firmware hang after hitting and handling a split lock
> in firmware during "reboot" in one simulation test environment. Apparently
> the split lock (i.e., the alignment check) is treated as a failure in firmware.

It's not treated as a failure. The firmware simply does not have a handler
for #AC installed and dies. I hope you yelled at the firmware people
already.

> This real case triggered my concern that a split lock in any future
> firmware may happen in any path, including run-time services, S3/S4/S5, and
> hotplug. If we don't have an opt-out option or something similar, a system
> hang from a split lock in firmware can be a blocking issue on some platforms.
> If that happens, bisect will always find the split lock patch to blame.

That's fine. The changelog will hopefully explain it along with the text
that people should use the command line option and yell at their firmware
supplier. So what? Move on....

If that is a real widespread issue in practice, then we might have to go
for some ugly workarounds, but we won't find that out by adding them
upfront. So testing will show what's wrong in firmware land and we can
handle it from there. It's a completely orthogonal issue and has nothing to
do with the core functionality.

Thanks,

tglx



2018-06-24 01:21:15

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sun, Jun 24, 2018 at 02:55:08AM +0200, Thomas Gleixner wrote:
> On Sat, 23 Jun 2018, Fenghua Yu wrote:
> > On Sat, Jun 23, 2018 at 11:17:03AM +0200, Thomas Gleixner wrote:
> > > On Fri, 22 Jun 2018, Fenghua Yu wrote:
> It's not treated as a failure. The firmware simply does not have a handler
> for #AC installed and dies. I hope you yelled at the firmware people
> already.

I think firmware does handle the split lock in its #AC handler because
firmware shows the following log and system hangs:

!!!! IA32 Exception Type - 11(#AC - Alignment Check) CPU Apic ID - 00000014!
!
ExceptionData - 00000000
EIP - 6A9DB6C3, CS - 00000010, EFLAGS - 00010006
...

> If that is a real widespread issue in practice, then we might have to go
> for some ugly workarounds, but we won't find that out by adding them
> upfront. So testing will show what's wrong in firmware land and we can
> handle it from there. It's a completely orthogonal issue and has nothing to
> do with the core functionality.

Sure

Thanks.

-Fenghua

2018-06-24 06:46:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 02/16] x86/split_lock: Handle #AC exception for split lock in kernel mode

On Sat, 23 Jun 2018, Fenghua Yu wrote:

> On Sun, Jun 24, 2018 at 02:55:08AM +0200, Thomas Gleixner wrote:
> > On Sat, 23 Jun 2018, Fenghua Yu wrote:
> > > On Sat, Jun 23, 2018 at 11:17:03AM +0200, Thomas Gleixner wrote:
> > > > On Fri, 22 Jun 2018, Fenghua Yu wrote:
> > It's not treated as a failure. The firmware simply does not have a handler
> > for #AC installed and dies. I hope you yelled at the firmware people
> > already.
>
> I think firmware does handle the split lock in its #AC handler because
> firmware shows the following log and system hangs:
>
> !!!! IA32 Exception Type - 11(#AC - Alignment Check) CPU Apic ID - 00000014!
> !
> ExceptionData - 00000000
> EIP - 6A9DB6C3, CS - 00000010, EFLAGS - 00010006
> ...

They even have a handler which makes the thing die, but never bothered to
enable #AC in the first place. So the handler is there to catch the case
that an operating system dares to enable #AC. Thank god, I don't have to
review that source code.

Thanks,

tglx

2018-06-26 09:08:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

On Fri, Jun 22, 2018 at 04:11:07PM +0100, Alan Cox wrote:
> On Thu, 2018-06-21 at 21:58 +0200, Peter Zijlstra wrote:
> > On Sun, May 27, 2018 at 08:45:55AM -0700, Fenghua Yu wrote:
> > > Firmware may contain split locked instructions.
> >
> > I think that's the wrong attitude. You should mandate in your BIOS
> > development guide that Firmware _MUST_NOT_ contain unaligned LOCK
> > prefixed instructions.
> >
>
> In the longer term I would agree entirely with that sentiment.

But then how do we deal with SMIs ? The firmware people will at least
need to know about this, and the quick fix is to make the SMI handler
save/restore the MSR, but since they're aware and already changing their
code, they might as well fix the actual problem -- which is likely
trivial.

So no, I don't buy it. Just fix the firmware instead of allowing them to
fester and grow layers of duct tape.

Because even for SMM WRMSR is 100s of cycles, and why would they want to
make every single SMI more expensive.

Also, as mentioned earlier, what are we going to do about SMIs in
general? They're a _far_ _FAR_ bigger problem for RT workloads than a
sporadic split atomic. Esp. with some vendors thinking they can run
bitcoin miners in SMI (or whatever else it is that is taking milliseconds
of compute time).

Split atomics are an insignificant problem compared to the nightmare
trainwreck that is SMM.

2018-06-26 09:42:05

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

On Tue, 26 Jun 2018, Peter Zijlstra wrote:
> On Fri, Jun 22, 2018 at 04:11:07PM +0100, Alan Cox wrote:
> > On Thu, 2018-06-21 at 21:58 +0200, Peter Zijlstra wrote:
> > > On Sun, May 27, 2018 at 08:45:55AM -0700, Fenghua Yu wrote:
> > > > Firmware may contain split locked instructions.
> > >
> > > I think that's the wrong attitude. You should mandate in your BIOS
> > > development guide that Firmware _MUST_NOT_ contain unaligned LOCK
> > > prefixed instructions.
> > >
> >
> > In the longer term I would agree entirely with that sentiment.
>
> But then how do we deal with SMIs ? The firmware people will at least
> need to know about this, and the quick fix is to make the SMI handler
> save/restore the MSR, but since they're aware and already changing their
> code, they might as well fix the actual problem -- which is likely
> trivial.
>
> So no, I don't buy it. Just fix the firmware instead of allowing them to
> fester and grow layers of duct tape.
>
> Because even for SMM WRMSR is 100s of cycles, and why would they want to
> make every single SMI more expensive.
>
> Also, as mentioned earlier, what are we going to do about SMIs in
> general? They're a _far_ _FAR_ bigger problem for RT workloads than a
> sporadic split atomic. Esp. with some vendors thinking they can run
> bitcoin miners in SMI (or whatever else it is that is taking milliseconds
> of compute time).
>
> Split atomics are an insignificant problem compared to the nightmare
> trainwreck that is SMM.

Aside of that, the split lock detection is only available for not yet
released silicon. So there is absolutely _ZERO_ reason to force the kernel
to deal with broken BIOSes. This would be a different story if the feature
would break a gazillion shipped systems.

So there is no 'In the longer term'. There is still enough time to fix the
BIOS hackery _before_ any of this hits a production line.

Ergo. This is not going to go anywhere near the x86 kernel code ever. We
are not fostering completely avoidable engineering trainwrecks.

To be honest, this part of the patch set should never have seen the light
of LKML, and we should not be having this discussion in the first place.

Thanks,

tglx

2018-06-26 11:05:28

by Alan Cox

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

On Tue, 26 Jun 2018 11:05:21 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, Jun 22, 2018 at 04:11:07PM +0100, Alan Cox wrote:
> > On Thu, 2018-06-21 at 21:58 +0200, Peter Zijlstra wrote:
> > > On Sun, May 27, 2018 at 08:45:55AM -0700, Fenghua Yu wrote:
> > > > Firmware may contain split locked instructions.
> > >
> > > I think that's the wrong attitude. You should mandate in your BIOS
> > > development guide that Firmware _MUST_NOT_ contain unaligned LOCK
> > > prefixed instructions.
> > >
> >
> > In the longer term I would agree entirely with that sentiment.
>
> But then how do we deal with SMIs ?

On a system doing that level of real time you don't do SMIs.

I think Thomas' comment that this shouldn't be in a production kernel
until the firmware people have their house in order is a fair one.

(and I've already been poking people to make sure this is documented
properly as a requirement)

Alan

2018-06-26 11:32:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/split_lock: Save #AC setting for split lock in firmware in boot time and restore the setting in reboot

On Tue, Jun 26, 2018 at 12:02:21PM +0100, Alan Cox wrote:
> > But then how do we deal with SMIs ?
>
> On a system doing that level of real time you don't do SMIs.

And where do I buy one of those systems? SMIs are a real problem today,
we need help to make them go away.

2018-06-27 23:55:30

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

On Thu, Jun 21, 2018 at 09:55:40PM +0200, Peter Zijlstra wrote:
> On Sun, May 27, 2018 at 08:45:53AM -0700, Fenghua Yu wrote:
> > set_bit() called by set_cpu_cap() is a locked bit set instruction for
> > atomic operation.
> >
> > Since the c->x86_capability can span two cache lines depending on kernel
configuration and building environment, the locked bit set instruction may
> > cause #AC exception when #AC exception for split lock is enabled.
>
> That doesn't make sense. Sure the bitmap may be longer, but depending on
> if the argument is an immediate or not we either use a byte instruction
> (which can never cross a cacheline boundary) or a 'word' aligned BTS.
> And the bitmap really _should_ be 'unsigned long' aligned.
>
> If it is not aligned, fix that too.
>
> /me looks at cpuinfo_x86 and finds x86_capability is in fact a __u32
> array.. see that's broken and needs fixing first.

Do you mean x86_capability's type should be changed from __u32 to unsigned
long?

Changing x86_capability's type won't directly fix the split lock in
set_cpu_cap(), right? BTS may still access x86_capability across a cache
line regardless of x86_capability's type.

Thanks.

-Fenghua

2018-06-28 08:05:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

On Wed, 27 Jun 2018, Fenghua Yu wrote:
> On Thu, Jun 21, 2018 at 09:55:40PM +0200, Peter Zijlstra wrote:
> > On Sun, May 27, 2018 at 08:45:53AM -0700, Fenghua Yu wrote:
> > > set_bit() called by set_cpu_cap() is a locked bit set instruction for
> > > atomic operation.
> > >
> > > Since the c->x86_capability can span two cache lines depending on kernel
> > configuration and building environment, the locked bit set instruction may
> > > cause #AC exception when #AC exception for split lock is enabled.
> >
> > That doesn't make sense. Sure the bitmap may be longer, but depending on
> > if the argument is an immediate or not we either use a byte instruction
> > (which can never cross a cacheline boundary) or a 'word' aligned BTS.
> > And the bitmap really _should_ be 'unsigned long' aligned.
> >
> > If it is not aligned, fix that too.
> >
> > /me looks at cpuinfo_x86 and finds x86_capability is in fact a __u32
> > array.. see that's broken and needs fixing first.
>
> Do you mean x86_capability's type should be changed from __u32 to unsigned
> long?
>
> Changing x86_capability's type won't directly fix the split lock in
> set_cpu_cap(), right? BTS still may access x86_capability across cache
> line no matter x86_capability's type.

Errm. No. BTS & al are accessing a single 64bit location which is

base_address + (bit_offset / 64) * 8

So if the base address is properly aligned then BTS & al will _NEVER_ have
to lock more than a single cache line. And it does not matter whether we fix
the type or enforce 64-bit alignment of the array by other means.

If that's not true then BTS & al are terminally broken and you can stop
working on #AC right away.
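Expressed as a (purely illustrative) helper, with the base assumed to be
properly aligned:

	#include <stdint.h>

	/*
	 * The 64-bit word that BTS & al. lock for bit 'nr' of a bitmap at
	 * 'base'. With an 8-byte aligned base this word is naturally aligned
	 * and can never straddle a cache line.
	 */
	static inline void *bts_target_word(void *base, unsigned int nr)
	{
		return (uint8_t *)base + (nr / 64) * 8;
	}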

Thanks,

tglx

2018-06-28 14:25:23

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

On Thu, Jun 28, 2018 at 09:53:17AM +0200, Thomas Gleixner wrote:
> On Wed, 27 Jun 2018, Fenghua Yu wrote:
> > On Thu, Jun 21, 2018 at 09:55:40PM +0200, Peter Zijlstra wrote:
> > > On Sun, May 27, 2018 at 08:45:53AM -0700, Fenghua Yu wrote:
> > > > set_bit() called by set_cpu_cap() is a locked bit set instruction for
> > > > atomic operation.
> > > >
> > > > Since the c->x86_capability can span two cache lines depending on kernel
> > > configuration and building environment, the locked bit set instruction may
> > > > cause #AC exception when #AC exception for split lock is enabled.
> > >
> > > That doesn't make sense. Sure the bitmap may be longer, but depending on
> > > if the argument is an immediate or not we either use a byte instruction
> > > (which can never cross a cacheline boundary) or a 'word' aligned BTS.
> > > And the bitmap really _should_ be 'unsigned long' aligned.
> > >
> > > If it is not aligned, fix that too.
> > >
> > > /me looks at cpuinfo_x86 and finds x86_capability is in fact a __u32
> > > array.. see that's broken and needs fixing first.
> >
> > Do you mean x86_capability's type should be changed from __u32 to unsigned
> > long?
> >
> > Changing x86_capability's type won't directly fix the split lock in
> > set_cpu_cap(), right? BTS still may access x86_capability across cache
> > line no matter x86_capability's type.
>
> Errm. No. BTS & al are accessing a single 64bit location which is
>
> base_address + (bit_offset / 64) * 8
>
> So if the base address is properly aligned then BTS & al will _NEVER_ have
> to lock more than a single cache line. And it does not matter whether we fix
> the type or enforce 64-bit alignment of the array by other means.
>
> If that's not true then BTS & al are terminally broken and you can stop
> working on #AC right away.

Is the following patch the right fix for the x86_capability split lock issue
(i.e. this patch replaces patches 4 and 5)?

- __u32 x86_capability[NCAPINTS + NBUGINTS];
+ __u32 x86_capability[NCAPINTS + NBUGINTS]
+ __aligned(8);

Thanks.

-Fenghua

2018-06-28 14:56:22

by Fenghua Yu

[permalink] [raw]
Subject: Re: [RFC PATCH 04/16] x86/split_lock: Use non locked bit set instruction in set_cpu_cap

On Thu, Jun 28, 2018 at 07:23:22AM -0700, Fenghua Yu wrote:
> On Thu, Jun 28, 2018 at 09:53:17AM +0200, Thomas Gleixner wrote:
> > On Wed, 27 Jun 2018, Fenghua Yu wrote:
> > > On Thu, Jun 21, 2018 at 09:55:40PM +0200, Peter Zijlstra wrote:
> > > > On Sun, May 27, 2018 at 08:45:53AM -0700, Fenghua Yu wrote:
> > > > > set_bit() called by set_cpu_cap() is a locked bit set instruction for
> > > > > atomic operation.
> > > > >
> > > > > Since the c->x86_capability can span two cache lines depending on kernel
> > > > configuration and building environment, the locked bit set instruction may
> > > > > cause #AC exception when #AC exception for split lock is enabled.
> > > >
> > > > That doesn't make sense. Sure the bitmap may be longer, but depending on
> > > > if the argument is an immediate or not we either use a byte instruction
> > > > (which can never cross a cacheline boundary) or a 'word' aligned BTS.
> > > > And the bitmap really _should_ be 'unsigned long' aligned.
> > > >
> > > > If it is not aligned, fix that too.
> > > >
> > > > /me looks at cpuinfo_x86 and finds x86_capability is in fact a __u32
> > > > array.. see that's broken and needs fixing first.
> > >
> > > Do you mean x86_capability's type should be changed from __u32 to unsigned
> > > long?
> > >
> > > Changing x86_capability's type won't directly fix the split lock in
> > > set_cpu_cap(), right? BTS still may access x86_capability across cache
> > > line no matter x86_capability's type.
> >
> > Errm. No. BTS & al are accessing a single 64bit location which is
> >
> > base_address + (bit_offset / 64) * 8
> >
> > So if the base address is properly aligned then BTS & al will _NEVER_ have
> > to lock more than a single cache line. And it does not matter whether we fix
> > the type or enforce 64-bit alignment of the array by other means.
> >
> > If that's not true then BTS & al are terminally broken and you can stop
> > working on #AC right away.
>
> Is the following patch the right fix for the x86_capability split lock issue
> (i.e. this patch replaces patches 4 and 5)?
>
> - __u32 x86_capability[NCAPINTS + NBUGINTS];
> + __u32 x86_capability[NCAPINTS + NBUGINTS]
> + __aligned(8);

Should be '__aligned(sizeof(unsigned long))' for both 32-bit and 64-bit.
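In addition to the __aligned() annotation, a compile-time guard along these
lines could catch a future regression (placement and name are illustrative,
not part of the posted patches):

	#include <linux/build_bug.h>
	#include <linux/stddef.h>
	#include <asm/processor.h>

	static inline void check_x86_capability_alignment(void)
	{
		/*
		 * If x86_capability ever loses its natural alignment, fail the
		 * build instead of letting a locked BTS straddle a cache line
		 * at run time.
		 */
		BUILD_BUG_ON(offsetof(struct cpuinfo_x86, x86_capability) %
			     sizeof(unsigned long));
	}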


Thanks.

-Fenghua