2019-11-21 01:48:10

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

Split lock detection is disabled by default. Enable the feature by
kernel parameter "split_lock_detect".

It is usually enabled in real-time environments, where expensive split
lock issues cannot be tolerated and should be treated as fatal errors,
or for debugging and fixing split lock issues to improve performance.

Please note: enabling this feature will cause a kernel panic or send
SIGBUS to the user application when a split lock issue is detected.

Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 10 ++++++
arch/x86/kernel/cpu/intel.c | 34 +++++++++++++++++++
2 files changed, 44 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dee8f68fe15..1ed313891f44 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3166,6 +3166,16 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_detect
+ [X86] Enable split lock detection
+ This is a real time or debugging feature. When enabled
+ (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+ When triggered in applications the kernel will send
+ SIGBUS. The kernel will panic for a split lock in
+ OS code.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bc0c2f288509..9bf6daf185b9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -20,6 +20,7 @@
#include <asm/hwcap2.h>
#include <asm/elf.h>
#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -655,6 +656,26 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void)
+{
+	if (split_lock_detect_enabled) {
+		u64 test_ctrl_val;
+
+		/*
+		 * The TEST_CTRL MSR is per core, so multiple threads can
+		 * read/write the MSR in parallel. But it is possible to
+		 * simplify the read/write without locking and without
+		 * worrying about overwriting the MSR, because only bit 29
+		 * is implemented in the MSR and that bit is set to 1 by all
+		 * threads. Locking may be needed in the future if the
+		 * situation changes, e.g. if other bits are implemented.
+		 */
+		rdmsrl(MSR_TEST_CTRL, test_ctrl_val);
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+		wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
+	}
+}
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -770,6 +791,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -1032,9 +1055,20 @@ static const struct cpu_dev intel_cpu_dev = {

cpu_dev_register(intel_cpu_dev);

+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
static void __init split_lock_setup(void)
{
setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	if (cmdline_find_option_bool(boot_command_line,
+				     "split_lock_detect")) {
+		split_lock_detect_enabled = true;
+		pr_info("enabled\n");
+	} else {
+		pr_info("disabled\n");
+	}
}

#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
--
2.19.1


2019-11-21 06:08:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


* Fenghua Yu <[email protected]> wrote:

> Split lock detection is disabled by default. Enable the feature by
> kernel parameter "split_lock_detect".
>
> Usually it is enabled in real time when expensive split lock issues
> cannot be tolerated so should be fatal errors, or for debugging and
> fixing the split lock issues to improve performance.
>
> Please note: enabling this feature will cause kernel panic or SIGBUS
> to user application when a split lock issue is detected.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> .../admin-guide/kernel-parameters.txt | 10 ++++++
> arch/x86/kernel/cpu/intel.c | 34 +++++++++++++++++++
> 2 files changed, 44 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8dee8f68fe15..1ed313891f44 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3166,6 +3166,16 @@
>
> nosoftlockup [KNL] Disable the soft-lockup detector.
>
> + split_lock_detect
> + [X86] Enable split lock detection
> + This is a real time or debugging feature. When enabled
> + (and if hardware support is present), atomic
> + instructions that access data across cache line
> + boundaries will result in an alignment check exception.
> + When triggered in applications the kernel will send
> + SIGBUS. The kernel will panic for a split lock in
> + OS code.

It would be really nice to be able to enable/disable this runtime as
well, has this been raised before, and what was the conclusion?

Thanks,

Ingo

2019-11-21 08:03:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Wed, Nov 20, 2019 at 04:53:23PM -0800, Fenghua Yu wrote:
> Split lock detection is disabled by default. Enable the feature by
> kernel parameter "split_lock_detect".
>
> Usually it is enabled in real time when expensive split lock issues
> cannot be tolerated so should be fatal errors, or for debugging and
> fixing the split lock issues to improve performance.
>
> Please note: enabling this feature will cause kernel panic or SIGBUS
> to user application when a split lock issue is detected.

ARGGGHH, by having this default disabled, firmware will _NEVER_ be
exposed to this before it ships.

How will you guarantee the firmware will not explode the moment you
enable this?

2019-11-21 13:04:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> * Fenghua Yu <[email protected]> wrote:

> > + split_lock_detect
> > + [X86] Enable split lock detection
> > + This is a real time or debugging feature. When enabled
> > + (and if hardware support is present), atomic
> > + instructions that access data across cache line
> > + boundaries will result in an alignment check exception.
> > + When triggered in applications the kernel will send
> > + SIGBUS. The kernel will panic for a split lock in
> > + OS code.
>
> It would be really nice to be able to enable/disable this runtime as
> well, has this been raised before, and what was the conclusion?

It has, previous versions had that. Somehow a lot of things went missing
and we're back to a broken neutered useless mess.

The problem appears to be that due to hardware design the feature cannot
be virtualized, and instead of then disabling it when a VM runs/exists
they just threw in the towel and went back to useless mode.. :-(

This feature MUST be default enabled, otherwise everything will
be/remain broken and we'll end up in the situation where you can't use
it even if you wanted to.

Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
the feature becomes useless, because you cannot enable it without your
machine dying.

Now, from long and painful experience we all know that if a BIOS can be
wrong, it will be. Therefore this feature will be/is useless as
presented.

And I can't be arsed to look it up, but we've been making this very same
argument since a very early (possibly the very first) version.

So this version goes straight into the bit bucket. Please try again.

2019-11-21 13:18:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 02:01:53PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > * Fenghua Yu <[email protected]> wrote:
>
> > > + split_lock_detect
> > > + [X86] Enable split lock detection
> > > + This is a real time or debugging feature. When enabled
> > > + (and if hardware support is present), atomic
> > > + instructions that access data across cache line
> > > + boundaries will result in an alignment check exception.
> > > + When triggered in applications the kernel will send
> > > + SIGBUS. The kernel will panic for a split lock in
> > > + OS code.
> >
> > It would be really nice to be able to enable/disable this runtime as
> > well, has this been raised before, and what was the conclusion?
>
> It has, previous versions had that. Somehow a lot of things went missing
> and we're back to a broken neutered useless mess.
>
> The problem appears to be that due to hardware design the feature cannot
> be virtualized, and instead of then disabling it when a VM runs/exists
> they just threw in the towel and went back to useless mode.. :-(
>
> This feature MUST be default enabled, otherwise everything will
> be/remain broken and we'll end up in the situation where you can't use
> it even if you wanted to.
>
> Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
> the feature becomes useless, because you cannot enable it without your
> machine dying.
>
> Now, from long and painful experience we all know that if a BIOS can be
> wrong, it will be. Therefore this feature will be/is useless as
> presented.
>
> And I can't be arsed to look it up, but we've been making this very same
> argument since very early (possible the very first) version.
>
> So this version goes straight into the bit bucket. Please try again.

Also, just to remind everyone why we really want this. Split lock is a
potent, unprivileged, DoS vector.

It works nicely across guests and everything. Furthermore no sane
software should have #AC, because RISC machines have been throwing
alignment checks on stupid crap like that forever.

And even on x86, where it 'works' it has been a performance nightmare
for pretty much ever since we lost the Front Side Bus or something like
that.

2019-11-21 16:06:54

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 02:01:53PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > * Fenghua Yu <[email protected]> wrote:
>
> > > + split_lock_detect
> > > + [X86] Enable split lock detection
> > > + This is a real time or debugging feature. When enabled
> > > + (and if hardware support is present), atomic
> > > + instructions that access data across cache line
> > > + boundaries will result in an alignment check exception.
> > > + When triggered in applications the kernel will send
> > > + SIGBUS. The kernel will panic for a split lock in
> > > + OS code.
> >
> > It would be really nice to be able to enable/disable this runtime as
> > well, has this been raised before, and what was the conclusion?
>
> It has, previous versions had that. Somehow a lot of things went missing
> and we're back to a broken neutered useless mess.
>
> The problem appears to be that due to hardware design the feature cannot
> be virtualized, and instead of then disabling it when a VM runs/exists
> they just threw in the towel and went back to useless mode.. :-(

It's a bit complex to virtualize the TEST_CTRL MSR because it's per core
instead of per thread. But it's still doable to virtualize it, as
discussed here:
https://lore.kernel.org/lkml/[email protected]/

KVM code will be released later. Even if there is no KVM code for split
lock, the patch set will kill qemu/the guest if a split lock happens there.
The goal of this patch set is to have the basic enabling code.

>
> This feature MUST be default enabled, otherwise everything will
> be/remain broken and we'll end up in the situation where you can't use
> it even if you wanted to.

The usage scope of this patch set is largely reduced to real time only.
The long split lock processing time (>1000 cycles) cannot be tolerated
by real-time workloads.

Real-time customers do want to use this feature to detect the fatal
split lock error. They don't want any split lock issue from BIOS/EFI/
firmware/kernel/drivers/user apps.

Real-time users can enable the feature (set bit 29 in the TEST_CTRL MSR)
in the BIOS and don't need the OS to enable it. But the current #AC handler
cannot handle a split lock in the kernel: it will return to the faulting
instruction and re-enter #AC, so it doesn't provide useful information for
the customers. That's why we add the new #AC handler in this patch set.
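
As a rough sketch of the behaviour described above (illustrative only,
not the code from this series; the handler name and helper signatures
are approximations of the era's kernel APIs), the new #AC handler
amounts to:

/* Illustrative sketch only -- not the handler from this patch set. */
dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
{
	if (user_mode(regs)) {
		/* User split lock (or unaligned access with AC set): SIGBUS. */
		force_sig_fault(SIGBUS, BUS_ADRALN, NULL);
		return;
	}

	/*
	 * Split lock in kernel mode: returning would re-execute the
	 * faulting instruction and trap again, so die instead.
	 */
	die("split lock detected in kernel mode", regs, error_code);
}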

>
> Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
> the feature becomes useless, because you cannot enable it without your
> machine dying.

I believe the Intel real time team guarantees to deliver a split lock FREE
BIOS/EFI/firmware to their real time users.

From the kernel's point of view, we are working on a split lock free kernel.
Some blocking split lock issues have been fixed in the TIP tree.

Only a limited set of user apps can run on a real-time system, and they
should be split lock free before they are allowed to run there.

So the feature is enabled only for real-time systems that want to have
a controlled, split lock free environment.

The point is that a split lock is a FATAL error for real time. Whenever
it happens, the long processing time (>1000 cycles) cannot meet hard
real-time requirements any more and the system/user app has to die.

>
> Now, from long and painful experience we all know that if a BIOS can be
> wrong, it will be. Therefore this feature will be/is useless as
> presented.
>
> And I can't be arsed to look it up, but we've been making this very same
> argument since very early (possible the very first) version.
>
> So this version goes straight into the bit bucket. Please try again.

In summary, the patch set only wants to enable the feature for real time
and disable it by default.

Thanks.

-Fenghua

2019-11-21 17:14:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


* Peter Zijlstra <[email protected]> wrote:

> On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > * Fenghua Yu <[email protected]> wrote:
>
> > > + split_lock_detect
> > > + [X86] Enable split lock detection
> > > + This is a real time or debugging feature. When enabled
> > > + (and if hardware support is present), atomic
> > > + instructions that access data across cache line
> > > + boundaries will result in an alignment check exception.
> > > + When triggered in applications the kernel will send
> > > + SIGBUS. The kernel will panic for a split lock in
> > > + OS code.
> >
> > It would be really nice to be able to enable/disable this runtime as
> > well, has this been raised before, and what was the conclusion?
>
> It has, previous versions had that. Somehow a lot of things went missing
> and we're back to a broken neutered useless mess.
>
> The problem appears to be that due to hardware design the feature cannot
> be virtualized, and instead of then disabling it when a VM runs/exists
> they just threw in the towel and went back to useless mode.. :-(
>
> This feature MUST be default enabled, otherwise everything will
> be/remain broken and we'll end up in the situation where you can't use
> it even if you wanted to.

Agreed.

> And I can't be arsed to look it up, but we've been making this very
> same argument since very early (possible the very first) version.

Yeah, I now have a distinct deja vu...

Thanks,

Ingo

2019-11-21 17:16:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


* Fenghua Yu <[email protected]> wrote:

> > This feature MUST be default enabled, otherwise everything will
> > be/remain broken and we'll end up in the situation where you can't
> > use it even if you wanted to.
>
> The usage scope of this patch set is largely reduced to only real time.
> The long split lock processing time (>1000 cycles) cannot be tolerated
> by real time.
>
> Real time customers do want to use this feature to detect the fatal
> split lock error. They don't want any split lock issue from BIOS/EFI/
> firmware/kerne/drivers/user apps.
>
> Real time can enable the feature (set bit 29 in TEST_CTRL MSR) in BIOS
> and don't need OS to enable it. But, #AC handler cannot handle split
> lock in the kernel and will return to the faulting instruction and
> re-enter #AC. So current #AC handler doesn't provide useful information
> for the customers. That's why we add the new #AC handler in this patch
> set.

Immaterial - for this feature to be useful it must be default-enabled,
with reasonable quirk knobs offered to people who happen to be bitten by
such bugs and cannot fix the software.

But default-enabled is a must-have, as Peter said.

Thanks,

Ingo

2019-11-21 17:37:09

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 06:12:14PM +0100, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > > * Fenghua Yu <[email protected]> wrote:
> >
> > > > + split_lock_detect
> > > > + [X86] Enable split lock detection
> > > > + This is a real time or debugging feature. When enabled
> > > > + (and if hardware support is present), atomic
> > > > + instructions that access data across cache line
> > > > + boundaries will result in an alignment check exception.
> > > > + When triggered in applications the kernel will send
> > > > + SIGBUS. The kernel will panic for a split lock in
> > > > + OS code.
> > >
> > > It would be really nice to be able to enable/disable this runtime as
> > > well, has this been raised before, and what was the conclusion?
> >
> > It has, previous versions had that. Somehow a lot of things went missing
> > and we're back to a broken neutered useless mess.
> >
> > The problem appears to be that due to hardware design the feature cannot
> > be virtualized, and instead of then disabling it when a VM runs/exists
> > they just threw in the towel and went back to useless mode.. :-(
> >
> > This feature MUST be default enabled, otherwise everything will
> > be/remain broken and we'll end up in the situation where you can't use
> > it even if you wanted to.
>
> Agreed.
>
> > And I can't be arsed to look it up, but we've been making this very
> > same argument since very early (possible the very first) version.
>
> Yeah, I now have a distinct deja vu...

You'll notice that we are at version 10 ... lots of things have been tried
in previous versions. This new version is to get the core functionality
in, so we can build fancier features later. Painful experience has shown
that trying to do this all at once just leads to churn with no progress.

Enabling by default at this point would result in a flurry of complaints
about applications being killed and kernels panicking. That would be
followed by:

#include <linus/all-caps-rant-about-backwards-compatability.h>

and the patches being reverted.

This version can serve a very useful purpose. CI systems with h/w that
supports split lock can enable it and begin the process of finding
and fixing the remaining kernel issues. Especially helpful if they run
randconfig and fuzzers.

We'd also find out which libraries and applications currently use
split locks.

Real-time folks that have identified split lock as a fatal (can't meet
their deadlines) issue could also enable it as is (because it is better
to crash the kernel and have the laser be powered down than to keep
firing long past the point it should have stopped).

Any developer with concerns about their BIOS using split locks can also
enable it using this patch and begin testing today.

I'm totally on board with follow up patches providing extra features like:

A way to enable/disable at run time.

Providing a way to allow but rate limit applications that cause
split locks.

Figuring out something useful to do with virtualization.

Those are all good things to have - but we won't get *any* of them if we
wait until *all* have them have been perfected.

<soapbox>
So let's just take the first step now and solve world hunger tomorrow.
</soapbox>

-Tony

2019-11-21 17:37:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 08:14:10AM -0800, Fenghua Yu wrote:

> The usage scope of this patch set is largely reduced to only real time.
> The long split lock processing time (>1000 cycles) cannot be tolerated
> by real time.

I'm thinking you're clueless on realtime. There's plenty of things that
can cause many cycles to go out the window. And just a single
instruction soaking up cycles like that really isn't the problem.

The problem is that split lock defeats isolation. An otherwise contained
task can have pretty horrific side effects on other tasks.

> Real time customers do want to use this feature to detect the fatal
> split lock error. They don't want any split lock issue from BIOS/EFI/
> firmware/kerne/drivers/user apps.

Cloud vendors also don't want them. Nobody wants them, they stink. They
have a system wide impact.

I don't want them on my system.

> > Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
> > the feature becomes useless, because you cannot enable it without your
> > machine dying.
>
> I believe Intel real time team guarantees to deliever a split lock FREE
> BIOS/EFI/firmware to their real time users.

Not good enough. Any system shipping with this capability needs to have
a split lock free firmware blob. And the only way to make that happen is
to force enable it by default.

> From kernel point of view, we are working on a split lock free kernel.
> Some blocking split lock issues have been fixed in TIP tree.

Haven't we fixed them all by now?

> Only limited user apps can run on real time and should be split lock
> free before they are allowed to run on the real time system.

I'm thinking most of the normal Linux userspace will run just fine.
Seeing how other architectures have rejected such nonsense forever.

> In summary, the patch set only wants to enable the feature for real time
> and disable it by default.

We told you that wasn't good enough many times. Lots of people run the
preempt-rt kernel on lots of different hardware. And like I said, even
the cloudy folks would want this.

Features that require special firmware that nobody has are useless.

For giggles, run the below. You can notice your desktop getting slower.

---
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	/* Map two pages so a 4-byte variable can be placed across the
	 * boundary between them (which is also a cache line boundary). */
	void *addr = mmap(NULL, 4096*2, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	unsigned int *var;

	if (addr == (void *)-1) {
		printf("fail\n");
		return 1;
	}

	/* 2 bytes before the page boundary: the 4-byte access straddles it. */
	var = addr + 4096 - 2;

	/* Hammer the system with split-locked atomic increments. */
	for (;;)
		asm volatile ("lock incl %0" : : "m" (*var));
}

2019-11-21 17:46:56

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Ingo Molnar
> Sent: 21 November 2019 17:12
> * Peter Zijlstra <[email protected]> wrote:
...
> > This feature MUST be default enabled, otherwise everything will
> > be/remain broken and we'll end up in the situation where you can't use
> > it even if you wanted to.
>
> Agreed.

Before it can be enabled by default someone needs to go through the
kernel and fix all the code that abuses the 'bit' functions by using them
on int[] instead of long[].

I've only seen one fix go through for one use case of one piece of code
that repeatedly uses potentially misaligned int[] arrays for bitmasks.
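
To make the failure mode concrete (the struct below is invented for the
example, it is not taken from the kernel), casting an int[] bitmask that
is only 4-byte aligned to long[] can turn a LOCK'd bit operation into an
access that straddles a cache line:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical layout: an int[] bitmask that is only 4-byte aligned. */
struct dev_state {
	unsigned int flags;		/* offset 0 */
	unsigned int irq_mask[2];	/* offset 4 */
};

int main(void)
{
	/* Place the struct so the 8-byte access implied by a cast to
	 * (unsigned long *) straddles a 64-byte cache line boundary. */
	_Alignas(64) unsigned char buf[128];
	struct dev_state *s = (struct dev_state *)(buf + 56);
	unsigned long *abused = (unsigned long *)s->irq_mask;

	printf("irq_mask at %p\n", (void *)s->irq_mask);
	printf("long-sized access crosses a cache line: %s\n",
	       (((uintptr_t)abused & 63) > 56) ? "yes -> split lock" : "no");

	/* With split_lock_detect enabled, a locked RMW through 'abused',
	 * e.g. __atomic_fetch_or(abused, 1UL, __ATOMIC_SEQ_CST), would
	 * raise #AC here (left out on purpose). */
	return 0;
}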

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2019-11-21 17:53:02

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 9:43 AM David Laight <[email protected]> wrote:
>
> From: Ingo Molnar
> > Sent: 21 November 2019 17:12
> > * Peter Zijlstra <[email protected]> wrote:
> ...
> > > This feature MUST be default enabled, otherwise everything will
> > > be/remain broken and we'll end up in the situation where you can't use
> > > it even if you wanted to.
> >
> > Agreed.
>
> Before it can be enabled by default someone needs to go through the
> kernel and fix all the code that abuses the 'bit' functions by using them
> on int[] instead of long[].
>
> I've only seen one fix go through for one use case of one piece of code
> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
>

Can we really not just change the lock asm to use 32-bit accesses for
set_bit(), etc? Sure, it will fail if the bit index is greater than
2^32, but that seems nuts.

(Why the *hell* do the bitops use long anyway? They're *bit masks*
for crying out loud. As in, users generally want to operate on fixed
numbers of bits.)

--Andy

2019-11-21 18:44:45

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> On Thu, Nov 21, 2019 at 9:43 AM David Laight <[email protected]> wrote:
> >
> > From: Ingo Molnar
> > > Sent: 21 November 2019 17:12
> > > * Peter Zijlstra <[email protected]> wrote:
> > ...
> > > > This feature MUST be default enabled, otherwise everything will
> > > > be/remain broken and we'll end up in the situation where you can't use
> > > > it even if you wanted to.
> > >
> > > Agreed.
> >
> > Before it can be enabled by default someone needs to go through the
> > kernel and fix all the code that abuses the 'bit' functions by using them
> > on int[] instead of long[].
> >
> > I've only seen one fix go through for one use case of one piece of code
> > that repeatedly uses potentially misaligned int[] arrays for bitmasks.
> >
>
> Can we really not just change the lock asm to use 32-bit accesses for
> set_bit(), etc? Sure, it will fail if the bit index is greater than
> 2^32, but that seems nuts.
>
> (Why the *hell* do the bitops use long anyway? They're *bit masks*
> for crying out loud. As in, users generally want to operate on fixed
> numbers of bits.)

We are working on a separate patch set to fix all split lock issues
in atomic bitops. Per Peter Anvin's and Tony Luck's suggestions:
1. Still keep the byte optimization if nr is constant. No split lock.
2. If the type of *addr is unsigned long, do a quadword atomic instruction
on addr. No split lock.
3. If the type of *addr is unsigned int, do a word atomic instruction
on addr. No split lock.
4. Otherwise, re-calculate addr to point to the 32-bit address that contains
the bit and operate on that. No split lock.

Only a small percentage of atomic bitops calls fall into case 4 (e.g. 3%
for set_bit()); these need a few extra instructions to re-calculate the
address but avoid the big split lock overhead.

To get the real type of *addr instead of the cast type "unsigned long",
the atomic bitops APIs are changed from functions to macros. This change
needs to touch all architectures.
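
For illustration only, the type dispatch being proposed could look
roughly like the standalone C11 sketch below. The helper names are
invented for the example, cases 1 and 4 are omitted, __atomic builtins
stand in for the kernel's own primitives, and a 64-bit long (LP64) is
assumed:

#include <stdio.h>

static void set_bit_long(long nr, volatile unsigned long *addr)
{
	/* Quadword-wide atomic RMW on a naturally aligned long: no split lock. */
	__atomic_fetch_or(&addr[nr / 64], 1UL << (nr % 64), __ATOMIC_SEQ_CST);
}

static void set_bit_u32(long nr, volatile unsigned int *addr)
{
	/* Word-wide atomic RMW on a naturally aligned int: no split lock. */
	__atomic_fetch_or(&addr[nr / 32], 1U << (nr % 32), __ATOMIC_SEQ_CST);
}

/* Dispatch on the real pointer type instead of casting to unsigned long *. */
#define set_bit_typed(nr, addr)				\
	_Generic((addr),				\
		 unsigned long *: set_bit_long,		\
		 unsigned int  *: set_bit_u32)((nr), (addr))

int main(void)
{
	unsigned long lmask[2] = { 0, 0 };
	unsigned int  imask[2] = { 0, 0 };

	set_bit_typed(65, lmask);	/* case 2: 64-bit access */
	set_bit_typed(33, imask);	/* case 3: 32-bit access */

	printf("%lx %x\n", lmask[1], imask[1]);	/* prints: 2 2 */
	return 0;
}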

Thanks.

-Fenghua

2019-11-21 19:06:02

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


> On Nov 21, 2019, at 10:40 AM, Fenghua Yu <[email protected]> wrote:
>
> On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
>>> On Thu, Nov 21, 2019 at 9:43 AM David Laight <[email protected]> wrote:
>>>
>>> From: Ingo Molnar
>>>> Sent: 21 November 2019 17:12
>>>> * Peter Zijlstra <[email protected]> wrote:
>>> ...
>>>>> This feature MUST be default enabled, otherwise everything will
>>>>> be/remain broken and we'll end up in the situation where you can't use
>>>>> it even if you wanted to.
>>>>
>>>> Agreed.
>>>
>>> Before it can be enabled by default someone needs to go through the
>>> kernel and fix all the code that abuses the 'bit' functions by using them
>>> on int[] instead of long[].
>>>
>>> I've only seen one fix go through for one use case of one piece of code
>>> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
>>>
>>
>> Can we really not just change the lock asm to use 32-bit accesses for
>> set_bit(), etc? Sure, it will fail if the bit index is greater than
>> 2^32, but that seems nuts.
>>
>> (Why the *hell* do the bitops use long anyway? They're *bit masks*
>> for crying out loud. As in, users generally want to operate on fixed
>> numbers of bits.)
>
> We are working on a separate patch set to fix all split lock issues
> in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> 1. Still keep the byte optimization if nr is constant. No split lock.
> 2. If type of *addr is unsigned long, do quadword atomic instruction
> on addr. No split lock.
> 3. If type of *addr is unsigned int, do word atomic instruction
> on addr. No split lock.
> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> the bit and operate on the bit. No split lock.
>
> Only small percentage of atomic bitops calls are in case 4 (e.g. 3%
> for set_bit()) which need a few extra instructions to re-calculate
> address but can avoid big split lock overhead.
>
> To get real type of *addr instead of type cast type "unsigned long",
> the atomic bitops APIs are changed to macros from functions. This change
> need to touch all architectures.
>

Isn’t the kernel full of casts to long* to match the signature? Doing this based on type seems silly to me. I think it’s better to just do a 32-bit operation unconditionally and to try to optimize it using b*l when safe.

2019-11-21 19:51:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:

> We are working on a separate patch set to fix all split lock issues
> in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> 1. Still keep the byte optimization if nr is constant. No split lock.
> 2. If type of *addr is unsigned long, do quadword atomic instruction
> on addr. No split lock.
> 3. If type of *addr is unsigned int, do word atomic instruction
> on addr. No split lock.
> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> the bit and operate on the bit. No split lock.

Yeah, let's not do that. That sounds overly complicated for no real
purpose.

2019-11-21 19:59:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:

> Can we really not just change the lock asm to use 32-bit accesses for
> set_bit(), etc? Sure, it will fail if the bit index is greater than
> 2^32, but that seems nuts.

There are 64bit architectures that do exactly that: Alpha, IA64.

And because of the byte 'optimization' from x86 we already could not
rely on word atomicity (we actually play games with multi-bit atomicity
for PG_waiters and clear_bit_unlock_is_negative_byte).

Also, there's a fun paper on the properties of mixed size atomic
operations for when you want to hurt your brain real bad:

https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf

_If_ we're going to change the bitops interface, I would propose we
change it to u32 and mandate every operation is indeed 32bit wide.

2019-11-21 20:16:22

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 11:01:39AM -0800, Andy Lutomirski wrote:
>
> > On Nov 21, 2019, at 10:40 AM, Fenghua Yu <[email protected]> wrote:
> >
> > On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> >>> On Thu, Nov 21, 2019 at 9:43 AM David Laight <[email protected]> wrote:
> >>>
> >>> From: Ingo Molnar
> >>>> Sent: 21 November 2019 17:12
> >>>> * Peter Zijlstra <[email protected]> wrote:
> >>> ...
> >>>>> This feature MUST be default enabled, otherwise everything will
> >>>>> be/remain broken and we'll end up in the situation where you can't use
> >>>>> it even if you wanted to.
> >>>>
> >>>> Agreed.
> >>>
> >>> Before it can be enabled by default someone needs to go through the
> >>> kernel and fix all the code that abuses the 'bit' functions by using them
> >>> on int[] instead of long[].
> >>>
> >>> I've only seen one fix go through for one use case of one piece of code
> >>> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
> >>>
> >>
> >> Can we really not just change the lock asm to use 32-bit accesses for
> >> set_bit(), etc? Sure, it will fail if the bit index is greater than
> >> 2^32, but that seems nuts.
> >>
> >> (Why the *hell* do the bitops use long anyway? They're *bit masks*
> >> for crying out loud. As in, users generally want to operate on fixed
> >> numbers of bits.)
> >
> > We are working on a separate patch set to fix all split lock issues
> > in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> > 1. Still keep the byte optimization if nr is constant. No split lock.
> > 2. If type of *addr is unsigned long, do quadword atomic instruction
> > on addr. No split lock.
> > 3. If type of *addr is unsigned int, do word atomic instruction
> > on addr. No split lock.
> > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> > the bit and operate on the bit. No split lock.
> >
> > Only small percentage of atomic bitops calls are in case 4 (e.g. 3%
> > for set_bit()) which need a few extra instructions to re-calculate
> > address but can avoid big split lock overhead.
> >
> > To get real type of *addr instead of type cast type "unsigned long",
> > the atomic bitops APIs are changed to macros from functions. This change
> > need to touch all architectures.
> >
>
> Isn’t the kernel full of casts to long* to match the signature? Doing this based on type seems silly to me. I think it’s better to just to a 32-bit operation unconditionally and to try to optimize it
>using b*l when safe.

Actually we only found 8 places calling atomic bitops with an
"unsigned long *" type cast. After the above changes, another 8 patches
remove those type casts, and then the atomic bitops in the current kernel
are split lock free.

To catch type casts in new patches, we add a checkpatch.pl rule to warn
on any type cast in atomic bitops, because the APIs are macros and gcc
doesn't warn or error on the cast.

Using b*l will change those 8 places as well, plus a lot of other places
where *addr is defined as "unsigned long *", right?

Thanks.

-Fenghua

2019-11-21 20:24:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 12:25:35PM -0800, Fenghua Yu wrote:

> > > We are working on a separate patch set to fix all split lock issues
> > > in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> > > 1. Still keep the byte optimization if nr is constant. No split lock.
> > > 2. If type of *addr is unsigned long, do quadword atomic instruction
> > > on addr. No split lock.
> > > 3. If type of *addr is unsigned int, do word atomic instruction
> > > on addr. No split lock.
> > > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> > > the bit and operate on the bit. No split lock.

> Actually we only find 8 places calling atomic bitops using type casting
> "unsigned long *". After above changes, other 8 patches remove the type
> castings and then split lock free in atomic bitops in the current kernel.

Those above changes are never going to happen.

2019-11-21 20:27:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:

> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> the bit and operate on the bit. No split lock.

That sounds confused. Even BT{,CRS} have a RmW size. There is no
'operate on the bit'.

Specifically I hard rely on BTSL to be a 32bit RmW, see commit:

7aa54be29765 ("locking/qspinlock, x86: Provide liveness guarantee")

You might need to read this paper:

https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf

2019-11-21 21:03:55

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


> On Nov 21, 2019, at 11:56 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
>
>> Can we really not just change the lock asm to use 32-bit accesses for
>> set_bit(), etc? Sure, it will fail if the bit index is greater than
>> 2^32, but that seems nuts.
>
> There are 64bit architectures that do exactly that: Alpha, IA64.
>
> And because of the byte 'optimization' from x86 we already could not
> rely on word atomicity (we actually play games with multi-bit atomicity
> for PG_waiters and clear_bit_unlock_is_negative_byte).

I read a couple pages of the paper you linked and I didn’t spot what you’re talking about as it refers to x86. What are the relevant word properties of x86 bitops or the byte optimization?

2019-11-21 21:26:21

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 12:25 PM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:
>
> > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> > the bit and operate on the bit. No split lock.
>
> That sounds confused, Even BT{,CRS} have a RmW size. There is no
> 'operate on the bit'.
>
> Specifically I hard rely on BTSL to be a 32bit RmW, see commit:
>
> 7aa54be29765 ("locking/qspinlock, x86: Provide liveness guarantee")
>

Okay, spent a bit of time trying to grok this. Are you saying that
LOCK BTSL suffices in a case where LOCK BTSB or LOCK XCHG8 would not?
On x86, all the LOCK operations are full barriers, so they should
order with adjacent normal accesses even to unrelated addresses,
right?

I certainly understand that a *non-locked* RMW to a bit might need to
have a certain width to get the right ordering guarantees, but those
aren't affected by split-lock detection regardless.

--Andy

2019-11-21 21:53:44

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 02:15:22PM +0100, Peter Zijlstra wrote:
> Also, just to remind everyone why we really want this. Split lock is a
> potent, unprivileged, DoS vector.

So how much do we "really want this"?

It's been 543 days since the first version of this patch was
posted. We've made exactly zero progress.

The current cut-down patch series is the foundation for moving one
small step towards getting this done.

Almost all of what's in this set will be required in whatever
final solution we want to end up with. Out of this:

Documentation/admin-guide/kernel-parameters.txt | 10 +++
arch/x86/include/asm/cpu.h | 5 +
arch/x86/include/asm/cpufeatures.h | 2
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/include/asm/traps.h | 3 +
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/intel.c | 72 ++++++++++++++++++++++++
arch/x86/kernel/traps.c | 22 +++++++
8 files changed, 123 insertions(+), 1 deletion(-)

the only substantive thing that will *change* is to make the default
be "on" rather than "off".

Everything else we want to do is *additions* to this base. We could
wait until we have those done and maybe see if we can stall out this
series to an even thousand days. Or, we can take the imperfect base
and build incrementally on it.

You've expressed concern about firmware ... with a simple kernel command
line switch to flip, LUV (https://01.org/linux-uefi-validation) could begin
testing to make sure that firmware is ready for the big day when we throw
the switch from "off" to "on".

-Tony

2019-11-21 22:26:40

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


> On Nov 21, 2019, at 1:51 PM, Luck, Tony <[email protected]> wrote:
>
> On Thu, Nov 21, 2019 at 02:15:22PM +0100, Peter Zijlstra wrote:
>> Also, just to remind everyone why we really want this. Split lock is a
>> potent, unprivileged, DoS vector.
>
> So how much do we "really want this"?
>
> It's been 543 days since the first version of this patch was
> posted. We've made exactly zero progress.
>
> Current cut down patch series is the foundation to move one
> small step towards getting this done.
>
> Almost all of what's in this set will be required in whatever
> final solution we want to end up with. Out of this:

Why don’t we beat it into shape and apply it, hidden behind BROKEN. Then we can work on the rest of the patches and have a way to test them.

It would be really, really nice if we could pass this feature through to a VM. Can we?

2019-11-21 22:31:29

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

> It would be really, really nice if we could pass this feature through to a VM. Can we?

It's hard because the MSR is core scoped rather than thread scoped. So on an HT
enabled system a pair of logical processors gets enabled/disabled together.

-Tony

2019-11-21 23:21:44

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter



> On Nov 21, 2019, at 2:29 PM, Luck, Tony <[email protected]> wrote:
>
> 
>>
>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>
> It's hard because the MSR is core scoped rather than thread scoped. So on an HT
> enabled system a pair of logical processors gets enabled/disabled together.
>
>

Well that sucks.

Could we pass it through if the host has no HT? Debugging is *so* much easier in a VM. And HT is a bit dubious these days anyway.

2019-11-21 23:43:16

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>
>
> > On Nov 21, 2019, at 2:29 PM, Luck, Tony <[email protected]> wrote:
> >
> > 
> >>
> >> It would be really, really nice if we could pass this feature through to a VM. Can we?
> >
> > It's hard because the MSR is core scoped rather than thread scoped. So on an HT
> > enabled system a pair of logical processors gets enabled/disabled together.
> >
> >
>
> Well that sucks.
>
> Could we pass it through if the host has no HT? Debugging is *so* much easier in a VM. And HT is a bit dubious these days anyway.

I think it's doable to pass it through to KVM. The difficulty is disabling
split lock detection from within KVM, because that will disable it for the
whole core, including the host's threads. Without disabling split lock
detection in KVM, it's still doable to debug split locks in KVM.

Sean and Xiaoyao are working on split lock for KVM (in a separate patch set).
They may have insight on how to do this.

Thanks.

-Fenghua

2019-11-21 23:58:04

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

> Could we pass it through if the host has no HT? Debugging is *so* much easier in a VM. And HT is a bit dubious these days anyway.

Sure ... we can look at doing that in a future series once we get to agreement on the foundation pieces.

-Tony

2019-11-22 01:00:07

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 02:24:21PM -0800, Andy Lutomirski wrote:
>
> > On Nov 21, 2019, at 1:51 PM, Luck, Tony <[email protected]> wrote:

> > Almost all of what's in this set will be required in whatever
> > final solution we want to end up with. Out of this:
>
> Why don’t we beat it into shape and apply it, hidden behind BROKEN.
> Then we can work on the rest of the patches and have a way to test them.

That's my goal (and thanks for the help with the constructive beating;
"die" is a much better choice than "panic" at this stage of development).

I'm not sure I see the need to hide it behind BROKEN. The reasoning
behind choosing disabled by default was so that this wouldn't affect
anyone unless they chose to turn it on.

-Tony

2019-11-22 01:56:01

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
> >
> > > On Nov 21, 2019, at 2:29 PM, Luck, Tony <[email protected]> wrote:
> > >
> > >> It would be really, really nice if we could pass this feature through to a VM. Can we?
> > >
> > > It's hard because the MSR is core scoped rather than thread scoped. So on an HT
> > > enabled system a pair of logical processors gets enabled/disabled together.
> > >
> >
> > Well that sucks.
> >
> > Could we pass it through if the host has no HT? Debugging is *so* much
> > easier in a VM. And HT is a bit dubious these days anyway.
>
> I think it's doable to pass it through to KVM. The difficulty is to disable
> split lock detection in KVM because that will disable split lock on the whole
> core including threads for the host. Without disabling split lock in KVM,
> it's doable to debug split lock in KVM.
>
> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
> They may have insight on how to do this.

Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
for the initial implementation we'd want to allow it if and only if split
lock #AC is disabled in the host kernel. Otherwise we have to pull in the
logic to control whether or not a guest can disable split lock #AC, what
to do if a split lock #AC happens when it's enabled by the host but
disabled by the guest, etc...

2019-11-22 02:25:44

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


> On Nov 21, 2019, at 5:52 PM, Sean Christopherson <[email protected]> wrote:
>
> On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
>>> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>>>
>>>> On Nov 21, 2019, at 2:29 PM, Luck, Tony <[email protected]> wrote:
>>>>
>>>>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>>>>
>>>> It's hard because the MSR is core scoped rather than thread scoped. So on an HT
>>>> enabled system a pair of logical processors gets enabled/disabled together.
>>>>
>>>
>>> Well that sucks.
>>>
>>> Could we pass it through if the host has no HT? Debugging is *so* much
>>> easier in a VM. And HT is a bit dubious these days anyway.
>>
>> I think it's doable to pass it through to KVM. The difficulty is to disable
>> split lock detection in KVM because that will disable split lock on the whole
>> core including threads for the host. Without disabling split lock in KVM,
>> it's doable to debug split lock in KVM.
>>
>> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
>> They may have insight on how to do this.
>
> Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
> for the initial implementation we'd want to allow it if and only if split
> lock #AC is disabled in the host kernel. Otherwise we have to pull in the
> logic to control whether or not a guest can disable split lock #AC, what
> to do if a split lock #AC happens when it's enabled by the host but
> disabled by the guest, etc...

What’s the actual issue? There’s a window around entry and exit when a split lock in the host might not give #AC, but as long as no user code is run, this doesn’t seem like a big problem.

2019-11-22 02:44:27

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On 11/22/2019 10:21 AM, Andy Lutomirski wrote:
>
>> On Nov 21, 2019, at 5:52 PM, Sean Christopherson <[email protected]> wrote:
>>
>> On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
>>>> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>>>>
>>>>> On Nov 21, 2019, at 2:29 PM, Luck, Tony <[email protected]> wrote:
>>>>>
>>>>>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>>>>>
>>>>> It's hard because the MSR is core scoped rather than thread scoped. So on an HT
>>>>> enabled system a pair of logical processors gets enabled/disabled together.
>>>>>
>>>>
>>>> Well that sucks.
>>>>
>>>> Could we pass it through if the host has no HT? Debugging is *so* much
>>>> easier in a VM. And HT is a bit dubious these days anyway.
>>>
>>> I think it's doable to pass it through to KVM. The difficulty is to disable
>>> split lock detection in KVM because that will disable split lock on the whole
>>> core including threads for the host. Without disabling split lock in KVM,
>>> it's doable to debug split lock in KVM.
>>>
>>> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
>>> They may have insight on how to do this.
>>
>> Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
>> for the initial implementation we'd want to allow it if and only if split
>> lock #AC is disabled in the host kernel. Otherwise we have to pull in the
>> logic to control whether or not a guest can disable split lock #AC, what
>> to do if a split lock #AC happens when it's enabled by the host but
>> disabled by the guest, etc...
>
> What’s the actual issue? There’s a window around entry and exit when a split lock in the host might not give #AC, but as long as no user code is run, this doesn’t seem like a big problem.
>
The problem is that the guest can trigger split-locked memory accesses just by
disabling split lock #AC even when the host has it enabled. In this
situation a bus lock is held on the hardware without #AC being triggered,
which conflicts with the host's purpose in enabling split lock #AC.

2019-11-22 03:02:09

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


> On Nov 21, 2019, at 6:39 PM, Xiaoyao Li <[email protected]> wrote:
>
> On 11/22/2019 10:21 AM, Andy Lutomirski wrote:
>>>> On Nov 21, 2019, at 5:52 PM, Sean Christopherson <[email protected]> wrote:
>>>
>>> On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
>>>>> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>>>>>
>>>>>> On Nov 21, 2019, at 2:29 PM, Luck, Tony <[email protected]> wrote:
>>>>>>
>>>>>>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>>>>>>
>>>>>> It's hard because the MSR is core scoped rather than thread scoped. So on an HT
>>>>>> enabled system a pair of logical processors gets enabled/disabled together.
>>>>>>
>>>>>
>>>>> Well that sucks.
>>>>>
>>>>> Could we pass it through if the host has no HT? Debugging is *so* much
>>>>> easier in a VM. And HT is a bit dubious these days anyway.
>>>>
>>>> I think it's doable to pass it through to KVM. The difficulty is to disable
>>>> split lock detection in KVM because that will disable split lock on the whole
>>>> core including threads for the host. Without disabling split lock in KVM,
>>>> it's doable to debug split lock in KVM.
>>>>
>>>> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
>>>> They may have insight on how to do this.
>>>
>>> Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
>>> for the initial implementation we'd want to allow it if and only if split
>>> lock #AC is disabled in the host kernel. Otherwise we have to pull in the
>>> logic to control whether or not a guest can disable split lock #AC, what
>>> to do if a split lock #AC happens when it's enabled by the host but
>>> disabled by the guest, etc...
>> What’s the actual issue? There’s a window around entry and exit when a split lock in the host might not give #AC, but as long as no user code is run, this doesn’t seem like a big problem.
> The problem is that guest can trigger split locked memory access just by disabling split lock #AC even when host has it enabled. In this situation, there is bus lock held on the hardware without #AC triggered, which is conflict with the purpose that host enables split lock #AC

Fair enough. You need some way to get this enabled in guests eventually, though.

2019-11-22 09:30:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 01:22:13PM -0800, Andy Lutomirski wrote:
> On Thu, Nov 21, 2019 at 12:25 PM Peter Zijlstra <[email protected]> wrote:
> >
> > On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:
> >
> > > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> > > the bit and operate on the bit. No split lock.
> >
> > That sounds confused, Even BT{,CRS} have a RmW size. There is no
> > 'operate on the bit'.
> >
> > Specifically I hard rely on BTSL to be a 32bit RmW, see commit:
> >
> > 7aa54be29765 ("locking/qspinlock, x86: Provide liveness guarantee")
> >
>
> Okay, spent a bit of time trying to grok this. Are you saying that
> LOCK BTSL suffices in a case where LOCK BTSB or LOCK XCHG8 would not?

Yep.

> On x86, all the LOCK operations are full barriers, so they should
> order with adjacent normal accesses even to unrelated addresses,
> right?

Yep, still.

The barrier is not the problem here. Yes the whole value load must come
after the atomic op, be it XCHGB/BTSB or BTSL.

The problem with XCHGB is that it is an 8bit RmW and therefore it makes
no guarantees about the contents of the bytes next to it.

When we use byte ops, we must consider the word as 4 independent
variables. And in that case the later load might observe the lock-byte
state from 3, because the modification to the lock byte from 4 is in
CPU2's store-buffer.

However, by using a 32bit RmW, we force a write on all 4 bytes at the
same time which forces that store from CPU2 to be flushed (because the
operations overlap, whereas an 8bit RmW would not overlap and would be
independent).

Now, it _might_ work with an XCHGB anyway, _if_ coherency is per
cacheline, and not on a smaller granularity. But I don't think that is
something the architecture guarantees -- they could play fun and games
with partial forwards or whatever.

Specifically, we made this change:

450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")

Now, we know that MFENCE will in fact flush the store buffers, and LOCK
prefix being faster does seem to imply it does not. LOCK prefix only
guarantees order, it does not guarantee completion (that's what makes
MFENCE so much more expensive).


Also; if we're going to change the bitops API, that is a generic change
and we must consider all architectures. Having audited the atomic
bitops width a fair number of times now, to answer questions about what
code actually does and/or whether a proposed change is valid, tells me
the current state is crap, irrespective of the long vs u32 question.

So I'm saying that if we're going to muck with bitops, lets make it a
simple and consistent thing. This concurrency crap is hard enough
without fancy bells on.

2019-11-22 09:38:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 01:01:08PM -0800, Andy Lutomirski wrote:
>
> > On Nov 21, 2019, at 11:56 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> >
> >> Can we really not just change the lock asm to use 32-bit accesses for
> >> set_bit(), etc? Sure, it will fail if the bit index is greater than
> >> 2^32, but that seems nuts.
> >
> > There are 64bit architectures that do exactly that: Alpha, IA64.
> >
> > And because of the byte 'optimization' from x86 we already could not
> > rely on word atomicity (we actually play games with multi-bit atomicity
> > for PG_waiters and clear_bit_unlock_is_negative_byte).
>
> I read a couple pages of the paper you linked and I didn’t spot what
> you’re talking about as it refers to x86. What are the relevant word
> properties of x86 bitops or the byte optimization?

The paper mostly deals with Power and ARM, x86 only gets sporadic
mention. It does present a way to reason about mixed size atomic
operations though.

And the bitops API is very much cross-architecture. And like I wrote in
that other email, having audited the atomic bitop width a number of
times now makes me say no to anything complicated.

2019-11-22 09:49:21

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From Andy Lutomirski
> Sent: 21 November 2019 17:51
> On Thu, Nov 21, 2019 at 9:43 AM David Laight <[email protected]> wrote:
> >
> > From: Ingo Molnar
> > > Sent: 21 November 2019 17:12
> > > * Peter Zijlstra <[email protected]> wrote:
> > ...
> > > > This feature MUST be default enabled, otherwise everything will
> > > > be/remain broken and we'll end up in the situation where you can't use
> > > > it even if you wanted to.
> > >
> > > Agreed.
> >
> > Before it can be enabled by default someone needs to go through the
> > kernel and fix all the code that abuses the 'bit' functions by using them
> > on int[] instead of long[].
> >
> > I've only seen one fix go through for one use case of one piece of code
> > that repeatedly uses potentially misaligned int[] arrays for bitmasks.
> >
>
> Can we really not just change the lock asm to use 32-bit accesses for
> set_bit(), etc? Sure, it will fail if the bit index is greater than
> 2^32, but that seems nuts.

For little endian 64bit cpu it is safe(ish) to cast int [] to long [] for the bitops.
On BE 64bit cpu all hell breaks loose if you do that.
It really wasn't obvious that all the casts I found were anywhere near right
on 64bit BE systems.
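
A concrete illustration, with a made-up two-int bitmap rather than any
of the call sites I actually looked at:

        unsigned int map[2] = { 0, 0 };

        set_bit(0, (unsigned long *)map);

        /*
         * The long-sized RmW sets bit 0 of the 64-bit word at &map[0],
         * i.e. its least significant byte.
         * Little endian: that byte is the low byte of map[0], so map[0] == 1.
         * Big endian: that byte is the low byte of map[1], so map[1] == 1
         * and map[0] stays 0 - the bit lands in the wrong int.
         */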

So while it is almost certainly safe to change the x86-64 bitops to use
32 bit accesses, some of the code is horribly broken.

> (Why the *hell* do the bitops use long anyway? They're *bit masks*
> for crying out loud. As in, users generally want to operate on fixed
> numbers of bits.)

The bitops functions were (probably) written for large bitmaps that
are bigger than the size of a 'word' (> 32 bits) and likely to be
variable size.
Quite why they use long [] is anybody's guess, but that is the definition.
It also isn't quite clear to me why they are required to be atomic.
On x86 atomicity doesn't cost much; on other architectures the cost
is significant.

David


2019-11-22 10:12:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 01:51:26PM -0800, Luck, Tony wrote:
> On Thu, Nov 21, 2019 at 02:15:22PM +0100, Peter Zijlstra wrote:
> > Also, just to remind everyone why we really want this. Split lock is a
> > potent, unprivileged, DoS vector.
>
> So how much do we "really want this"?
>
> It's been 543 days since the first version of this patch was
> posted. We've made exactly zero progress.

Well, I was thinking we were getting there, but then, all of 58 days ago
you discovered the MSR was per core, which is rather fundamental and
would've been rather useful to know at v1.

http://lkml.kernel.org/r/[email protected]

So that is ~485 days wasted because we didn't know how the hardware
actually worked. I'm not thinking that's on us.


Also, talk like:

> I believe Intel real time team guarantees to deliever a split lock FREE
> BIOS/EFI/firmware to their real time users.

is fundamentally misguided. Everybody who buys a chip (with this on) is
a potential real-time customer.


2019-11-22 10:53:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Nov 21, 2019 at 09:34:44AM -0800, Luck, Tony wrote:

> You'll notice that we are at version 10 ... lots of things have been tried
> in previous versions. This new version is to get the core functionality
> in, so we can build fancier features later.

The cover letter actually mentions that as a non-goal. Seems like a
conflicting message here.

> Enabling by default at this point would result in a flurry of complaints
> about applications being killed and kernels panicing. That would be
> followed by:

I thought we had already found and fixed the few kernel users that got
it wrong?

And applications? I've desktop'ed around a little with:

perf stat -e sq_misc.split_lock -a -I 1000

running and that shows exactly, a grand total of, _ZERO_ split lock
usage. Except when I run my explicit split lock proglet, then it goes
through the roof.
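
(The proglet is nothing clever; a minimal sketch along these lines,
x86-64 and GCC inline asm assumed, is enough to make that counter
explode:)

        #include <stdint.h>

        int main(void)
        {
                /* a 4-byte locked access at offset 62 of a 64-byte aligned
                 * buffer straddles the cache line boundary -> split lock */
                static _Alignas(64) uint8_t buf[128];
                volatile uint32_t *p = (volatile uint32_t *)(buf + 62);

                for (long i = 0; i < 100000000L; i++)
                        asm volatile("lock addl $1, %0" : "+m" (*p));

                return 0;
        }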

So I really don't buy that argument. Like I've been saying forever, sane
architectures have never allowed unaligned atomics in the first place,
which means that sane software won't have any.

Furthermore, split_lock has been a performance issue on x86 for a long
long time, which is another reason why x86-specific software will not
have them.

And if you really really worry, just do a mode that pr_warn()s about the
userspace instead of SIGBUS.

> #include <linus/all-caps-rant-about-backwards-compatability.h>
>
> and the patches being reverted.

I don't buy that either, it would _maybe_ mean flipping the default. But
that very much depends on how many users and what sort of 'quality'
software they're running.

I suspect we can get away with a no_split_lock_detect boot flag. We've
had various such kernel flags in the past for new/dodgy features and
we've lived through that just fine.

Witness: no5lvl, noapic, noclflush, noefi, nofxsr, etc.

> This version can serve a very useful purpose. CI systems with h/w that
> supports split lock can enable it and begin the process of finding
> and fixing the remaining kernel issues. Especially helpful if they run
> randconfig and fuzzers.

A non-lethal default enabled variant would be even better for them :-)

> We'd also find out which libraries and applications currently use
> split locks.

On my debian desktop, absolutely nothing I've used in the past hour or
so. That includes both major browsers and some A/V stuff, as well as
building a kernel and writing emails.

> Any developer with concerns about their BIOS using split locks can also
> enable using this patch and begin testing today.

I don't worry about developers much; they can't fix their BIOS other
than to return the box and try and get their money back :/

2019-11-22 15:30:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:

> A non-lethal default enabled variant would be even better for them :-)

fresh from the keyboard, *completely* untested.

it requires we get the kernel and firmware clean, but only warns about
dodgy userspace, which I really don't think there is much of.

getting the kernel clean should be pretty simple.

---
Documentation/admin-guide/kernel-parameters.txt | 18 +++
arch/x86/include/asm/cpu.h | 17 +++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 165 ++++++++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 28 +++-
10 files changed, 246 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9983ac73b66d..18f15defdba6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3172,6 +3172,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will pr_alert about applications
+ triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..fa75bbd502b3 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_split_lock(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *prev);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_split_lock(void)
+{
+ return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6a3124664289..7b25cec494fd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..d23638a0525e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* split_lock_detect */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index b25e633033c3..2a7cfe8e8c3f 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -172,4 +172,5 @@ enum x86_pf_error_code {
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
};
+
#endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4fc016bc6abd..a6b176fc3996 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1233,6 +1233,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..d83b8031a124 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,14 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -1028,3 +1042,154 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "force", sld_force },
+};
+
+static void __init split_lock_setup(void)
+{
+ enum split_lock_detect_state sld = sld_state;
+ char arg[20];
+ int i, ret;
+
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld = sld_options[i].state;
+ break;
+ }
+ }
+
+ if (sld != sld_state)
+ sld_state = sld;
+
+print:
+ switch(sld) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return false;
+
+ return true;
+}
+
+static void split_lock_init(void)
+{
+ u64 test_ctrl_val;
+
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ sld_state = sld_off;
+}
+
+bool handle_split_lock(void)
+{
+ return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if (sld_state == sld_fatal)
+ return false;
+
+ pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+ __sld_msr_set(true);
+ clear_tsk_thread_flag(current, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index bd2a11ca5dd6..c04476a1f970 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if (tifp & _TIF_SLD)
+ switch_sld(prev_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3451a004e162..3cba28c9c4d9 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -242,7 +242,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -288,9 +287,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+ return;
+
+ if (!handle_split_lock())
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ cond_local_irq_enable(regs);
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,

2019-11-22 17:24:49

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> +void handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + if (sld_state == sld_fatal)
> + return false;
> +
> + pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> + current->comm, current->pid, regs->ip);
> +
> + __sld_set_msr(false);
> + set_tsk_thread_flag(current, TIF_CLD);
> + return true;
> +}

I think you need an extra check in here. While an #AC in the kernel
is an indication of a split lock, a user might have enabled alignment
checking, so an #AC from user space might not be from a split lock.

I think the extra change is just to make that first test:

if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
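
I.e. the function would end up something like this (an untested sketch,
spelled with the TIF_SLD / __sld_msr_set names that the rest of the
patch defines):

bool handle_user_split_lock(struct pt_regs *regs, long error_code)
{
        /*
         * A user-space #AC with EFLAGS.AC set is regular alignment
         * checking, not a split lock; take the normal SIGBUS path for
         * that case and for fatal mode.
         */
        if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
                return false;

        pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
                 current->comm, current->pid, regs->ip);

        __sld_msr_set(false);
        set_tsk_thread_flag(current, TIF_SLD);
        return true;
}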

-Tony

2019-11-22 17:51:51

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

> When we use byte ops, we must consider the word as 4 independent
> variables. And in that case the later load might observe the lock-byte
> state from 3, because the modification to the lock byte from 4 is in
> CPU2's store-buffer.

So we absolutely violate this with the optimization for constant arguments
to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.

So is code that does:

set_bit(0, bitmap);

on one CPU. While another is doing:

set_bit(mybit, bitmap);

on another CPU safe? The first operates on just one byte, the second on 8 bytes.
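
(For reference, this is roughly what the x86 set_bit() boils down to;
a paraphrase from memory, not the literal kernel source:)

static __always_inline void set_bit_sketch(long nr, volatile unsigned long *addr)
{
        if (__builtin_constant_p(nr)) {
                /* constant bit: byte-wide RmW on just the byte holding the bit */
                asm volatile("lock orb %1, %0"
                             : "+m" (((volatile char *)addr)[nr >> 3])
                             : "iq" ((char)(1 << (nr & 7)))
                             : "memory");
        } else {
                /* variable bit: full-width 64-bit RmW */
                asm volatile("lock btsq %1, %0"
                             : "+m" (*addr)
                             : "Ir" (nr)
                             : "memory");
        }
}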

-Tony

2019-11-22 18:03:42

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

> it requires we get the kernel and firmware clean, but only warns about
> dodgy userspace, which I really don't think there is much of.
>
> getting the kernel clean should be pretty simple.

Fenghua has a half dozen additional patches (I think they were
all posted in previous iterations of the patch) that were found by
code inspection, rather than by actually hitting them.

Those should go in ahead of this.

-Tony

2019-11-22 18:47:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:
>
> > A non-lethal default enabled variant would be even better for them :-)
>
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index d779366ce3f8..d23638a0525e 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
> #define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
> #define TIF_NOTSC 16 /* TSC is not accessible in userland */
> #define TIF_IA32 17 /* IA32 compatibility process */
> +#define TIF_SLD 18 /* split_lock_detect */

Maybe use SLAC (Split-Lock AC) as the acronym? I can't help but read
SLD as "split-lock disabled". And name this TIF_NOSLAC (or TIF_NOSLD if
you don't like SLAC) since it's set when the task is running without #AC?

> #define TIF_NOHZ 19 /* in adaptive nohz mode */
> #define TIF_MEMDIE 20 /* is terminating due to OOM killer */
> #define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
> #define _TIF_NOCPUID (1 << TIF_NOCPUID)
> #define _TIF_NOTSC (1 << TIF_NOTSC)
> #define _TIF_IA32 (1 << TIF_IA32)
> +#define _TIF_SLD (1 << TIF_SLD)
> #define _TIF_NOHZ (1 << TIF_NOHZ)
> #define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
> #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)

...

> +void handle_split_lock(void)
> +{
> + return sld_state != sld_off;
> +}
> +
> +void handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + if (sld_state == sld_fatal)
> + return false;
> +
> + pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> + current->comm, current->pid, regs->ip);
> +
> + __sld_set_msr(false);
> + set_tsk_thread_flag(current, TIF_CLD);
> + return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)
> +{
> + __sld_set_msr(true);
> + clear_tsk_thread_flag(current, TIF_CLD);
> +}

...

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index bd2a11ca5dd6..c04476a1f970 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> /* Enforce MSR update to ensure consistent state */
> __speculation_ctrl_update(~tifn, tifn);
> }
> +
> + if (tifp & _TIF_SLD)
> + switch_sld(prev_p);
> }

Re-enabling #AC when scheduling out the misbehaving task would also work
well for KVM, e.g. call a variant of handle_user_split_lock() on an
unhandled #AC in the guest. We can also reuse KVM's existing code to
restore the MSR on return to userspace so that an #AC in the guest doesn't
disable detection in the userspace VMM.

Alternatively, KVM could manually do its own thing and context switch
the MSR on VM-Enter/VM-Exit (after an unhandled #AC), but I'd rather keep
this out of the VM-Enter path and also avoid thrashing the MSR on an SMT
CPU. The only downside is that KVM itself would occasionally run with #AC
disabled, but that doesn't seem like a big deal since split locks should
not be magically appearing in KVM.

Last thought: KVM should only expose split lock #AC to the guest if SMT=n
or the host is in "fatal" mode, so that split lock #AC is always enabled
in hardware (for the guest) when the guest wants it enabled. KVM would
obviously not actually disable #AC in hardware when running in fatal mode,
regardless of the guest's wishes.

> /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3451a004e162..3cba28c9c4d9 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -242,7 +242,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> {
> struct task_struct *tsk = current;
>
> -
> if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> return;
>
> @@ -288,9 +287,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
> DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
> DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
> DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
> -DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
> #undef IP
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + unsigned int trapnr = X86_TRAP_AC;
> + char str[] = "alignment check";
> + int signr = SIGBUS;
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> + return;
> +
> + if (!handle_split_lock())

Pretty sure this should be omitted entirely. For an #AC in the kernel,
simply restarting the instruction will fault indefinitely, e.g. dying is
probably the best course of action if a (completely unexpected) #AC occurs
in "off" mode. Dropping this check also lets handle_user_split_lock() do
the right thing for #AC due to EFLAGS.AC=1 (pointed out by Tony).

> + return;
> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + cond_local_irq_enable(regs);
> +
> + if (handle_user_split_lock(regs, error_code))
> + return;
> +
> + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> + error_code, BUS_ADRALN, NULL);
> +}
> +
> #ifdef CONFIG_VMAP_STACK
> __visible void __noreturn handle_stack_overflow(const char *message,
> struct pt_regs *regs,

2019-11-22 20:26:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > it requires we get the kernel and firmware clean, but only warns about
> > dodgy userspace, which I really don't think there is much of.
> >
> > getting the kernel clean should be pretty simple.
>
> Fenghua has a half dozen additional patches (I think they were
> all posted in previous iterations of the patch) that were found by
> code inspection, rather than by actually hitting them.

I thought we merged at least some of that, but maybe my recollection is
faulty.

> Those should go in ahead of this.

Yes, we should make the kernel as clean as possible before doing this.

2019-11-22 20:27:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 09:22:46AM -0800, Luck, Tony wrote:
> On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> > +void handle_user_split_lock(struct pt_regs *regs, long error_code)
> > +{
> > + if (sld_state == sld_fatal)
> > + return false;
> > +
> > + pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> > + current->comm, current->pid, regs->ip);
> > +
> > + __sld_set_msr(false);
> > + set_tsk_thread_flag(current, TIF_CLD);
> > + return true;
> > +}
>
> I think you need an extra check in here. While a #AC in the kernel
> is an indication of a split lock. A user might have enabled alignment
> checking and so this #AC might not be from a split lock.
>
> I think the extra code if just to change that first test to:
>
> if ((regs->eflags & X86_EFLAGS_AC) || sld_fatal)

Indeed.

2019-11-22 20:31:33

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 09:23:45PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > > it requires we get the kernel and firmware clean, but only warns about
> > > dodgy userspace, which I really don't think there is much of.
> > >
> > > getting the kernel clean should be pretty simple.
> >
> > Fenghua has a half dozen additional patches (I think they were
> > all posted in previous iterations of the patch) that were found by
> > code inspection, rather than by actually hitting them.
>
> I thought we merged at least some of that, but maybe my recollection is
> faulty.

At least 2 key fixes are in the TIP tree:
https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/

Those two issues were blocking kernel boot when split lock is enabled.

>
> > Those should go in ahead of this.
>
> Yes, we should make the kernel as clean as possible before doing this.

I'll send out the other 6 fixes for atomic bitops shortly. These issues were
found by code inspection.

Thanks.

-Fenghua

2019-11-22 20:32:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 10:44:57AM -0800, Sean Christopherson wrote:
> On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:
> >
> > > A non-lethal default enabled variant would be even better for them :-)
> >
> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> > index d779366ce3f8..d23638a0525e 100644
> > --- a/arch/x86/include/asm/thread_info.h
> > +++ b/arch/x86/include/asm/thread_info.h
> > @@ -92,6 +92,7 @@ struct thread_info {
> > #define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
> > #define TIF_NOTSC 16 /* TSC is not accessible in userland */
> > #define TIF_IA32 17 /* IA32 compatibility process */
> > +#define TIF_SLD 18 /* split_lock_detect */
>
> Maybe use SLAC (Split-Lock AC) as the acronym? I can't help but read
> SLD as "split-lock disabled". And name this TIF_NOSLAC (or TIF_NOSLD if
> you don't like SLAC) since it's set when the task is running without #AC?

I'll take any other name, really. I was typing in a hurry and my
pick-a-sensible-name generator was definitely not running.

> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index bd2a11ca5dd6..c04476a1f970 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> > /* Enforce MSR update to ensure consistent state */
> > __speculation_ctrl_update(~tifn, tifn);
> > }
> > +
> > + if (tifp & _TIF_SLD)
> > + switch_sld(prev_p);
> > }
>
> Re-enabling #AC when scheduling out the misbehaving task would also work
> well for KVM, e.g. call a variant of handle_user_split_lock() on an
> unhandled #AC in the guest.

Initially I thought of having a timer to re-enable it, but this also
works. We really shouldn't be hitting this much, and any actual
occurrence needs to be investigated and fixed anyway.

I've not thought much about guests, that's not really my thing. But I'll
think about it a bit :-)

> > +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> > +{
> > + unsigned int trapnr = X86_TRAP_AC;
> > + char str[] = "alignment check";
> > + int signr = SIGBUS;
> > +
> > + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> > + return;
> > +
> > + if (!handle_split_lock())
>
> Pretty sure this should be omitted entirely.

Yes, I just wanted to early exit the thing for !SUP_INTEL.

> For an #AC in the kernel,
> simply restarting the instruction will fault indefinitely, e.g. dieing is
> probably the best course of action if a (completely unexpteced) #AC occurs
> in "off" mode. Dropping this check also lets handle_user_split_lock() do
> the right thing for #AC due to EFLAGS.AC=1 (pointed out by Tony).

However, I'd completely forgotten about EFLAGS.AC.

> > + return;
> > +
> > + if (!user_mode(regs))
> > + die("Split lock detected\n", regs, error_code);
> > +
> > + cond_local_irq_enable(regs);
> > +
> > + if (handle_user_split_lock(regs, error_code))
> > + return;
> > +
> > + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> > + error_code, BUS_ADRALN, NULL);
> > +}

2019-11-22 20:35:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > When we use byte ops, we must consider the word as 4 independent
> > variables. And in that case the later load might observe the lock-byte
> > state from 3, because the modification to the lock byte from 4 is in
> > CPU2's store-buffer.
>
> So we absolutely violate this with the optimization for constant arguments
> to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
>
> So is code that does:
>
> set_bit(0, bitmap);
>
> on one CPU. While another is doing:
>
> set_bit(mybit, bitmap);
>
> on another CPU safe? The first operates on just one byte, the second on 8 bytes.

It is safe if all you care about is the consistency of that one bit.

2019-11-22 20:36:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 09:46:16AM +0000, David Laight wrote:
> From Andy Lutomirski

> > Can we really not just change the lock asm to use 32-bit accesses for
> > set_bit(), etc? Sure, it will fail if the bit index is greater than
> > 2^32, but that seems nuts.
>
> For little endian 64bit cpu it is safe(ish) to cast int [] to long [] for the bitops.

But that generates the alignment issues this patch set is concerned
about.

2019-11-22 21:28:53

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 12:29 PM Fenghua Yu <[email protected]> wrote:
>
> On Fri, Nov 22, 2019 at 09:23:45PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > > > it requires we get the kernel and firmware clean, but only warns about
> > > > dodgy userspace, which I really don't think there is much of.
> > > >
> > > > getting the kernel clean should be pretty simple.
> > >
> > > Fenghua has a half dozen additional patches (I think they were
> > > all posted in previous iterations of the patch) that were found by
> > > code inspection, rather than by actually hitting them.
> >
> > I thought we merged at least some of that, but maybe my recollection is
> > faulty.
>
> At least 2 key fixes are in TIP tree:
> https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
> https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/

I do not like these patches at all. I would *much* rather see the
bitops fixed and those patches reverted.

Is there any Linux architecture that doesn't have 32-bit atomic
operations? If all architectures can support them, then we should add
set_bit_u32(), etc and/or make x86's set_bit() work for a
4-byte-aligned pointer.
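
Something along these lines for x86, say (a hypothetical sketch;
set_bit_u32() is not an existing API):

static __always_inline void set_bit_u32(unsigned int nr, volatile u32 *addr)
{
        /* dword-sized RmW; with a 4-byte aligned bitmap this never
         * straddles a cache line */
        asm volatile("lock btsl %1, %0"
                     : "+m" (*addr)
                     : "Ir" (nr)
                     : "memory");
}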

--Andy

2019-11-22 21:29:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > When we use byte ops, we must consider the word as 4 independent
> > > variables. And in that case the later load might observe the lock-byte
> > > state from 3, because the modification to the lock byte from 4 is in
> > > CPU2's store-buffer.
> >
> > So we absolutely violate this with the optimization for constant arguments
> > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> >
> > So is code that does:
> >
> > set_bit(0, bitmap);
> >
> > on one CPU. While another is doing:
> >
> > set_bit(mybit, bitmap);
> >
> > on another CPU safe? The first operates on just one byte, the second on 8 bytes.
>
> It is safe if all you care about is the consistency of that one bit.
>

I'm still lost here. Can you explain how one could write code that
observes an issue? My trusty SDM, Vol 3 8.2.2 says "Locked
instructions have a total order." 8.2.3.9 says "Loads and Stores Are
Not Reordered with Locked Instructions." Admittedly, the latter is an
"example", but the section is very clear about the fact that a locked
instruction prevents reordering of a load or a store issued by the
same CPU relative to the locked instruction *regardless of whether
they overlap*.

So using LOCK to implement smp_mb() is correct, and I still don't
understand your particular concern.

I understand that the CPU is probably permitted to optimize a LOCK RMW
operation such that it retires before the store buffers of earlier
instructions are fully flushed, but only if the store buffer and cache
coherency machinery work together to preserve the architecturally
guaranteed ordering.

2019-11-23 00:33:14

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:

This all looks dubious on an HT system .... three snips
from your patch:

> +static bool __sld_msr_set(bool on)
> +{
> + u64 test_ctrl_val;
> +
> + if (rdmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> + return false;
> +
> + if (on)
> + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> + else
> + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> + if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> + return false;
> +
> + return true;
> +}

> +void switch_sld(struct task_struct *prev)
> +{
> + __sld_set_msr(true);
> + clear_tsk_thread_flag(current, TIF_CLD);
> +}

> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> /* Enforce MSR update to ensure consistent state */
> __speculation_ctrl_update(~tifn, tifn);
> }
> +
> + if (tifp & _TIF_SLD)
> + switch_sld(prev_p);
> }

Don't you have some horrible races between the two logical
processors on the same core as they both try to set/clear the
MSR that is shared at the core level?

-Tony

2019-11-25 16:16:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 04:30:56PM -0800, Luck, Tony wrote:
> On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
>
> This all looks dubious on an HT system .... three snips
> from your patch:
>
> > +static bool __sld_msr_set(bool on)
> > +{
> > + u64 test_ctrl_val;
> > +
> > + if (rdmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > + return false;
> > +
> > + if (on)
> > + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > + else
> > + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > +
> > + if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > + return false;
> > +
> > + return true;
> > +}
>
> > +void switch_sld(struct task_struct *prev)
> > +{
> > + __sld_set_msr(true);
> > + clear_tsk_thread_flag(current, TIF_CLD);
> > +}
>
> > @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> > /* Enforce MSR update to ensure consistent state */
> > __speculation_ctrl_update(~tifn, tifn);
> > }
> > +
> > + if (tifp & _TIF_SLD)
> > + switch_sld(prev_p);
> > }
>
> Don't you have some horrible races between the two logical
> processors on the same core as they both try to set/clear the
> MSR that is shared at the core level?

Yes and no. Yes, there will be races, but they won't be fatal in any way.

- Only the split-lock bit is supported by the kernel, so there isn't a
risk of corrupting other bits as both threads will rewrite the current
hardware value.

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.

- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

2019-12-02 18:22:38

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Mon, Nov 25, 2019 at 08:13:48AM -0800, Sean Christopherson wrote:
> On Fri, Nov 22, 2019 at 04:30:56PM -0800, Luck, Tony wrote:
> > Don't you have some horrible races between the two logical
> > processors on the same core as they both try to set/clear the
> > MSR that is shared at the core level?
>
> Yes and no. Yes, there will be races, but they won't be fatal in any way.
>
> - Only the split-lock bit is supported by the kernel, so there isn't a
> risk of corrupting other bits as both threads will rewrite the current
> hardware value.
>
> - Toggling of split-lock is only done in "warn" mode. Worst case
> scenario of a race is that a misbehaving task will generate multiple
> #AC exceptions on the same instruction. And this race will only occur
> if both siblings are running tasks that generate split-lock #ACs, e.g.
> a race where sibling threads are writing different values will only
> occur if CPUx is disabling split-lock after an #AC and CPUy is
> re-enabling split-lock after *its* previous task generated an #AC.
>
> - Transitioning between modes at runtime isn't supported and disabling
> is tracked per task, so hardware will always reach a steady state that
> matches the configured mode. I.e. split-lock is guaranteed to be
> enabled in hardware once all _TIF_SLD threads have been scheduled out.

We should probably include this analysis in the commit
comment. Maybe a comment or two in the code too to note
that the races are mostly harmless and guaranteed to end
quickly.

-Tony

2019-12-11 17:53:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 01:23:30PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <[email protected]> wrote:
> >
> > On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > > When we use byte ops, we must consider the word as 4 independent
> > > > variables. And in that case the later load might observe the lock-byte
> > > > state from 3, because the modification to the lock byte from 4 is in
> > > > CPU2's store-buffer.
> > >
> > > So we absolutely violate this with the optimization for constant arguments
> > > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> > >
> > > So is code that does:
> > >
> > > set_bit(0, bitmap);
> > >
> > > on one CPU. While another is doing:
> > >
> > > set_bit(mybit, bitmap);
> > >
> > > on another CPU safe? The first operates on just one byte, the second on 8 bytes.
> >
> > It is safe if all you care about is the consistency of that one bit.
> >
>
> I'm still lost here. Can you explain how one could write code that
> observes an issue? My trusty SDM, Vol 3 8.2.2 says "Locked
> instructions have a total order."

This is the thing I don't fully believe. Per this thread the bus-lock is
*BAD* and not used for normal LOCK prefixed operations. But without the
bus-lock it becomes very hard to guarantee total order.

After all, if some CPU doesn't observe a specific variable, it doesn't
care where in the order it fell. So I'm thinking they punted and went
with some partial order that is near enough that it becomes very hard to
tell the difference the moment you actually do observe stuff.

> 8.2.3.9 says "Loads and Stores Are
> Not Reordered with Locked Instructions." Admittedly, the latter is an
> "example", but the section is very clear about the fact that a locked
> instruction prevents reordering of a load or a store issued by the
> same CPU relative to the locked instruction *regardless of whether
> they overlap*.

IIRC this rule is CPU-local.

Sure, but we're talking two cpus here.

u32 var = 0;
u8 *ptr = &var;

CPU0                            CPU1

                                xchg(ptr, 1)

xchg(ptr + 1, 1);
r = READ_ONCE(var);

AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
doesn't force a snoop or forward.

From the perspective of the LOCK prefixed instructions CPU0 never
observes the variable @ptr. And therefore doesn't need to provide order.

Note how the READ_ONCE() is a normal load on CPU0, and per the rules is
only forced to happen after its own LOCK prefixed instruction, but it
is free to observe ptr[0,2,3] from before, only ptr[1] will be forwarded
from its own store-buffer.

This is exactly the one reorder TSO allows.

> I understand that the CPU is probably permitted to optimize a LOCK RMW
> operation such that it retires before the store buffers of earlier
> instructions are fully flushed, but only if the store buffer and cache
> coherency machinery work together to preserve the architecturally
> guaranteed ordering.

Maybe, maybe not. I'm very loathe to trust this without things being
better specified.

Like I said, it is possible that it all works, but the way I understand
things I _really_ don't want to rely on it.

Therefore, I've written:

u32 var = 0;
u8 *ptr = &var;

CPU0                            CPU1

                                xchg(ptr, 1)

set_bit(8, ptr);

r = READ_ONCE(var);

Because then the LOCK BTSL overlaps with the LOCK XCHGB and CPU0 now
observes the variable @ptr and therefore must force order.

Did this clarify, or confuse more?

2019-12-11 18:14:05

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Wed, Dec 11, 2019 at 9:52 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Nov 22, 2019 at 01:23:30PM -0800, Andy Lutomirski wrote:
> > On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > > > When we use byte ops, we must consider the word as 4 independent
> > > > > variables. And in that case the later load might observe the lock-byte
> > > > > state from 3, because the modification to the lock byte from 4 is in
> > > > > CPU2's store-buffer.
> > > >
> > > > So we absolutely violate this with the optimization for constant arguments
> > > > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> > > >
> > > > So is code that does:
> > > >
> > > > set_bit(0, bitmap);
> > > >
> > > > on one CPU. While another is doing:
> > > >
> > > > set_bit(mybit, bitmap);
> > > >
> > > > on another CPU safe? The first operates on just one byte, the second on 8 bytes.
> > >
> > > It is safe if all you care about is the consistency of that one bit.
> > >
> >
> > I'm still lost here. Can you explain how one could write code that
> > observes an issue? My trusty SDM, Vol 3 8.2.2 says "Locked
> > instructions have a total order."
>
> This is the thing I don't fully believe. Per this thread the bus-lock is
> *BAD* and not used for normal LOCK prefixed operations. But without the
> bus-lock it becomes very hard to guarantee total order.
>
> After all, if some CPU doesn't observe a specific variable, it doesn't
> care where in the order it fell. So I'm thinking they punted and went
> with some partial order that is near enough that it becomes very hard to
> tell the difference the moment you actually do observe stuff.

I hope that, if the SDM is indeed wrong, Intel would fix the SDM.
It's definitely not fun to try to understand locking if we don't trust
the manual.

>
> > 8.2.3.9 says "Loads and Stores Are
> > Not Reordered with Locked Instructions." Admittedly, the latter is an
> > "example", but the section is very clear about the fact that a locked
> > instruction prevents reordering of a load or a store issued by the
> > same CPU relative to the locked instruction *regardless of whether
> > they overlap*.
>
> IIRC this rule is CPU-local.
>
> Sure, but we're talking two cpus here.
>
> u32 var = 0;
> u8 *ptr = &var;
>
> CPU0                          CPU1
>
>                               xchg(ptr, 1)
>
> xchg(ptr + 1, 1);
> r = READ_ONCE(var);
>
> AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> doesn't force a snoop or forward.

I think I don't quite understand. The final value of var had better
be 0x0101 or something is severely wrong. But r can be 0x0100 because
nothing in this example guarantees that the total order of the locked
instructions has CPU 1's instruction first.

>
> From the perspective of the LOCK prefixed instructions CPU0 never
> observes the variable @ptr. And therefore doesn't need to provide order.

I suspect that the implementation works on whole cache lines for
everything except the actual store buffer entries, which would mean
that CPU 0 does think it observed ptr[0].

>
> Note how the READ_ONCE() is a normal load on CPU0, and per the rules is
> only forced to happen after it's own LOCK prefixed instruction, but it
> is free to observe ptr[0,2,3] from before, only ptr[1] will be forwarded
> from its own store-buffer.
>
> This is exactly the one reorder TSO allows.

If so, then our optimized smp_mb() has all kinds of problems, no?

>
> > I understand that the CPU is probably permitted to optimize a LOCK RMW
> > operation such that it retires before the store buffers of earlier
> > instructions are fully flushed, but only if the store buffer and cache
> > coherency machinery work together to preserve the architecturally
> > guaranteed ordering.
>
> Maybe, maybe not. I'm very loathe to trust this without things being
> better specified.
>
> Like I said, it is possible that it all works, but the way I understand
> things I _really_ don't want to rely on it.
>
> Therefore, I've written:
>
> u32 var = 0;
> u8 *ptr = &var;
>
> CPU0                          CPU1
>
>                               xchg(ptr, 1)
>
> set_bit(8, ptr);
>
> r = READ_ONCE(var);
>
> Because then the LOCK BTSL overlaps with the LOCK XCHGB and CPU0 now
> observes the variable @ptr and therefore must force order.
>
> Did this clarify, or confuse more?

Probably confuses more.

If you're actually concerned that the SDM is wrong, I think that roping
in some architects would be a good idea.

I still think that making set_bit() do 32-bit or smaller accesses is okay.

2019-12-11 18:46:24

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Wed, Dec 11, 2019 at 06:52:02PM +0100, Peter Zijlstra wrote:
> Sure, but we're talking two cpus here.
>
> u32 var = 0;
> u8 *ptr = &var;
>
> CPU0                          CPU1
>
>                               xchg(ptr, 1)
>
> xchg(ptr + 1, 1);
> r = READ_ONCE(var);

It looks like our current implementation of set_bit() would already run
into this if some call sites for a particular bitmap pass in constant
bit positions (which get optimized to a byte-wide "orb") while others pass
in a variable bit (which executes as a 64-bit "bts").

I'm not a h/w architect ... but I've assumed that a LOCK operation
on something contained entirely within a cache line gets its atomicity
by keeping exclusive ownership of the cache line. Split lock happens
because you can't keep ownership for two cache lines, so it gets
escalated to a bus lock.

-Tony

2019-12-11 22:35:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Wed, Dec 11, 2019 at 10:12:56AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 11, 2019 at 9:52 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Fri, Nov 22, 2019 at 01:23:30PM -0800, Andy Lutomirski wrote:
> > > On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <[email protected]> wrote:
> > > >
> > > > On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > > > > When we use byte ops, we must consider the word as 4 independent
> > > > > > variables. And in that case the later load might observe the lock-byte
> > > > > > state from 3, because the modification to the lock byte from 4 is in
> > > > > > CPU2's store-buffer.
> > > > >
> > > > > So we absolutely violate this with the optimization for constant arguments
> > > > > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> > > > >
> > > > > So is code that does:
> > > > >
> > > > > set_bit(0, bitmap);
> > > > >
> > > > > on one CPU. While another is doing:
> > > > >
> > > > > set_bit(mybit, bitmap);
> > > > >
> > > > > on another CPU safe? The first operates on just one byte, the second on 8 bytes.
> > > >
> > > > It is safe if all you care about is the consistency of that one bit.
> > > >
> > >
> > > I'm still lost here. Can you explain how one could write code that
> > > observes an issue? My trusty SDM, Vol 3 8.2.2 says "Locked
> > > instructions have a total order."
> >
> > This is the thing I don't fully believe. Per this thread the bus-lock is
> > *BAD* and not used for normal LOCK prefixed operations. But without the
> > bus-lock it becomes very hard to guarantee total order.
> >
> > After all, if some CPU doesn't observe a specific variable, it doesn't
> > care where in the order it fell. So I'm thinking they punted and went
> > with some partial order that is near enough that it becomes very hard to
> > tell the difference the moment you actually do observe stuff.
>
> I hope that, if the SDM is indeed wrong, that Intel would fix the SDM.
> It's definitely not fun to try to understand locking if we don't trust
> the manual.

I can try and find a HW person; but getting the SDM updated is
difficult.

Anyway, the way I see it, it is a scalability thing. Absolute total
order is untenable, it cannot be; it would mean that if you have your 16
socket 20 core system with hyperthreads, with each logical CPU doing a
LOCK prefixed instruction on a separate page, all 640 of them need to sit
down and discuss who goes first.

Some sort of partial order that connects where variables/lines are
actually shared is needed. Then again, I'm not a HW person, just a poor
sod trying to understand how this can work.

> > Sure, but we're talking two cpus here.
> >
> > u32 var = 0;
> > u8 *ptr = &var;
> >
> > CPU0 CPU1
> >
> > xchg(ptr, 1)
> >
> > 				xchg(ptr+1, 1);
> > r = READ_ONCE(var);
> >
> > AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> > CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> > doesn't force a snoop or forward.
>
> I think I don't quite understand. The final value of var had better
> be 0x0101 or something is severely wrong.

> But r can be 0x0100 because
> nothing in this example guarantees that the total order of the locked
> instructions has CPU 1's instruction first.

Assuming CPU1 goes first, why would the load from CPU0 see CPU1's
ptr[0]? It can be in CPU1 store buffer, and TSO allows regular reads to
ignore (remote) store-buffers.

> > From the perspective of the LOCK prefixed instructions CPU0 never
> > observes the variable @ptr. And therefore doesn't need to provide order.
>
> I suspect that the implementation works on whole cache lines for
> everything except the actual store buffer entries, which would mean
> that CPU 0 does think it observed ptr[0].

Quite possible, but consider SMT where each thread has its own
store-buffer. Then the core owns the line, but the value is still not
visible.

I don't know if they want to tie down those semantics.

> > Note how the READ_ONCE() is a normal load on CPU0, and per the rules is
> > only forced to happen after it's own LOCK prefixed instruction, but it
> > is free to observe ptr[0,2,3] from before, only ptr[1] will be forwarded
> > from its own store-buffer.
> >
> > This is exactly the one reorder TSO allows.
>
> If so, then our optimized smp_mb() has all kinds of problems, no?

Why? All smp_mb() guarantees is order between two memops and it does
that just fine.

> > Did this clarify, or confuse more?
>
> Probably confuses more.

Let's put it this way: the first approach has many questions and subtle
points, the second approach must always work without question.

> If you're actual concerned that the SDM is wrong, I think that roping
> in some architects would be a good idea.

I'll see what I can do, getting them to commit to something is always
the hard part.

> I still think that making set_bit() do 32-bit or smaller accesses is okay.

Yes, that really should not be a problem. This whole subthread was more
of a cautionary tale that it is not immediately obvious that it is safe.
And like I've said before, the bitops interface is shared across all
archs, so we must consider the weakest behaviour.

Anyway, we considered these things when we did
clear_bit_unlock_is_negative_byte(), and there is a reason we ended up
with BIT(7), there is no way to slice up a byte.
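
For reference, the x86 version is roughly (a paraphrase, not the exact
kernel code) a single byte-wide LOCKed op, so the lock bit and the bit-7
flag can never be touched at different widths:

static inline bool sketch_clear_bit_unlock_is_negative_byte(long nr,
						volatile unsigned long *addr)
{
	bool negative;

	/* nr must be < 8: both bits of interest live in the same byte */
	asm volatile("lock andb %2, %1"
		     : "=@ccs" (negative), "+m" (*(volatile char *)addr)
		     : "ir" ((char)~(1 << nr))
		     : "memory");
	return negative;
}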

2019-12-11 22:40:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Wed, Dec 11, 2019 at 10:44:16AM -0800, Luck, Tony wrote:
> On Wed, Dec 11, 2019 at 06:52:02PM +0100, Peter Zijlstra wrote:
> > Sure, but we're talking two cpus here.
> >
> > u32 var = 0;
> > u8 *ptr = &var;
> >
> > CPU0 CPU1
> >
> > xchg(ptr, 1)
> >
> > 				xchg(ptr+1, 1);
> > r = READ_ONCE(var);
>
> It looks like our current implementation of set_bit() would already run
> into this if some call sites for a particular bitmap pass in constant
> bit positions (which get optimized to byte wide "orb") while others pass
> in a variable bit (which execute as 64-bit "bts").

Yes, but luckily almost nobody cares.

I only know of two places in the entire kernel where we considered this,
one is clear_bit_unlock_is_negative_byte() and there we punted and
stuffed everything in a single byte, and the other is that x86
queued_fetch_set_pending_acquire() thing I pointed out earlier.

> I'm not a h/w architect ... but I've assumed that a LOCK operation
> on something contained entirely within a cache line gets its atomicity
> by keeping exclusive ownership of the cache line.

Right, but like I just wrote to Andy, consider SMT where each thread has
its own store-buffer. Then the line is local to the core, but there
still is a remote sb to hide stores in.

I don't know if anything x86 does that, or even allows that, but I'm not
aware of specs that are clear enough to say either way.

2019-12-12 08:59:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Fri, Nov 22, 2019 at 01:25:45PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 22, 2019 at 12:29 PM Fenghua Yu <[email protected]> wrote:
> >
> > On Fri, Nov 22, 2019 at 09:23:45PM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > > > > it requires we get the kernel and firmware clean, but only warns about
> > > > > dodgy userspace, which I really don't think there is much of.
> > > > >
> > > > > getting the kernel clean should be pretty simple.
> > > >
> > > > Fenghua has a half dozen additional patches (I think they were
> > > > all posted in previous iterations of the patch) that were found by
> > > > code inspection, rather than by actually hitting them.
> > >
> > > I thought we merged at least some of that, but maybe my recollection is
> > > faulty.
> >
> > At least 2 key fixes are in TIP tree:
> > https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
> > https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/
>
> I do not like these patches at all. I would *much* rather see the
> bitops fixed and those patches reverted.
>
> Is there any Linux architecture that doesn't have 32-bit atomic
> operations?

Of course! The right question is if there's any architecture that has
SMP and doesn't have 32bit atomic instructions, and then I'd have to
tell you that yes we have those too :/

Personally I'd love to mandate any SMP system has proper atomic ops, but
for now we sorta have to make PARISC and SPARC32 (and some ARC variant
IIRC) limp along.

PARISC and SPARC32 only have the equivalent of an xchgb or something.
Using that you can build a test-and-set spinlock, and then you have to
build atomic primitives using a hashtable of spinlocks.

Awesome, right?
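
To illustrate the scheme, a rough userspace sketch (the kernel's real
version lives in arch/sparc/lib/atomic32.c and friends; this is just the
idea): hash the address to one of a small array of spinlocks and do a
plain RMW under that lock.

#include <pthread.h>

#define NR_ATOMIC_LOCKS	16		/* arbitrary; must be a power of two */

static pthread_spinlock_t atomic_locks[NR_ATOMIC_LOCKS];

static void atomic_locks_init(void)
{
	for (int i = 0; i < NR_ATOMIC_LOCKS; i++)
		pthread_spin_init(&atomic_locks[i], PTHREAD_PROCESS_PRIVATE);
}

/* hash the address, ignoring the low bits that share a cache line */
static pthread_spinlock_t *atomic_lock_for(const void *addr)
{
	return &atomic_locks[((unsigned long)addr >> 6) & (NR_ATOMIC_LOCKS - 1)];
}

static int emulated_atomic_add_return(int i, int *v)
{
	pthread_spinlock_t *lock = atomic_lock_for(v);
	int ret;

	pthread_spin_lock(lock);
	ret = (*v += i);	/* the "atomic" op is a plain RMW under the lock */
	pthread_spin_unlock(lock);
	return ret;
}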

> If all architectures can support them, then we should add
> set_bit_u32(), etc and/or make x86's set_bit() work for a
> 4-byte-aligned pointer.

I object to _u32() variants of the atomic bitops; the bitops interface
is a big enough trainwreck already, lets not make it worse. Making the
existing bitops use 32bit atomics on the inside should be fine though.

If anything we could switch the entire bitmap interface to unsigned int,
but I'm not sure that'd actually help much.

Anyway, many of the unaligned usages appear not to require atomicity
in the first place, see the other patches he sent [*]. And as pointed out
elsewhere, any code that casts random pointers to (unsigned long *) is
probably already broken due to endian issues. Just making the unaligned
check go away isn't fixing it.

[*] https://lkml.kernel.org/r/[email protected]
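
To make the endian point concrete, a sketch (hypothetical struct, using
the regular set_bit() from <linux/bitops.h>):

struct foo {
	u32	flags;		/* someone wants to set "bit 0" of this   */
	u32	other;		/* ... and this happens to sit next to it */
};

static void mark_ready(struct foo *f)
{
	/*
	 * set_bit() operates on an unsigned long.  On a little-endian
	 * 64-bit machine bit 0 of that word is bit 0 of 'flags', so this
	 * appears to work.  On a big-endian 64-bit machine bit 0 of the
	 * word at &f->flags lives in the last byte, which is inside
	 * 'other'.  The cast is broken well before any alignment check
	 * fires.
	 */
	set_bit(0, (unsigned long *)&f->flags);
}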


2019-12-12 09:02:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Mon, Nov 25, 2019 at 08:13:48AM -0800, Sean Christopherson wrote:
> On Fri, Nov 22, 2019 at 04:30:56PM -0800, Luck, Tony wrote:
> > On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> >
> > This all looks dubious on an HT system .... three snips
> > from your patch:
> >
> > > +static bool __sld_msr_set(bool on)
> > > +{
> > > + u64 test_ctrl_val;
> > > +
> > > + if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> > > + return false;
> > > +
> > > + if (on)
> > > + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > > + else
> > > + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > > +
> > > + if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > > + return false;
> > > +
> > > + return true;
> > > +}
> >
> > > +void switch_sld(struct task_struct *prev)
> > > +{
> > > + __sld_msr_set(true);
> > > + clear_tsk_thread_flag(current, TIF_SLD);
> > > +}
> >
> > > @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> > > /* Enforce MSR update to ensure consistent state */
> > > __speculation_ctrl_update(~tifn, tifn);
> > > }
> > > +
> > > + if (tifp & _TIF_SLD)
> > > + switch_sld(prev_p);
> > > }
> >
> > Don't you have some horrible races between the two logical
> > processors on the same core as they both try to set/clear the
> > MSR that is shared at the core level?
>
> Yes and no. Yes, there will be races, but they won't be fatal in any way.
>
> - Only the split-lock bit is supported by the kernel, so there isn't a
> risk of corrupting other bits as both threads will rewrite the current
> hardware value.
>
> - Toggling of split-lock is only done in "warn" mode. Worst case
> scenario of a race is that a misbehaving task will generate multiple
> #AC exceptions on the same instruction. And this race will only occur
> if both siblings are running tasks that generate split-lock #ACs, e.g.
> a race where sibling threads are writing different values will only
> occur if CPUx is disabling split-lock after an #AC and CPUy is
> re-enabling split-lock after *its* previous task generated an #AC.
>
> - Transitioning between modes at runtime isn't supported and disabling
> is tracked per task, so hardware will always reach a steady state that
> matches the configured mode. I.e. split-lock is guaranteed to be
> enabled in hardware once all _TIF_SLD threads have been scheduled out.

Just so, thanks for clarifying.

2019-12-12 10:38:27

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Peter Zijlstra
> Sent: 11 December 2019 22:39
> On Wed, Dec 11, 2019 at 10:44:16AM -0800, Luck, Tony wrote:
> > On Wed, Dec 11, 2019 at 06:52:02PM +0100, Peter Zijlstra wrote:
> > > Sure, but we're talking two cpus here.
> > >
> > > u32 var = 0;
> > > u8 *ptr = &var;
> > >
> > > CPU0 CPU1
> > >
> > > xchg(ptr, 1)
> > >
> > > 				xchg(ptr+1, 1);
> > > r = READ_ONCE(var);
> >
> > It looks like our current implementation of set_bit() would already run
> > into this if some call sites for a particular bitmap pass in constant
> > bit positions (which get optimized to byte wide "orb") while others pass
> > in a variable bit (which execute as 64-bit "bts").
>
> Yes, but luckily almost nobody cares.
>
> I only know of two places in the entire kernel where we considered this,
> one is clear_bit_unlock_is_negative_byte() and there we punted and
> stuffed everything in a single byte, and the other is that x86
> queued_fetch_set_pending_acquire() thing I pointed out earlier.
>
> > I'm not a h/w architect ... but I've assumed that a LOCK operation
> > on something contained entirely within a cache line gets its atomicity
> > by keeping exclusive ownership of the cache line.
>
> Right, but like I just wrote to Andy, consider SMT where each thread has
> its own store-buffer. Then the line is local to the core, but there
> still is a remote sb to hide stores in.
>
> I don't know if anything x86 does that, or even allows that, but I'm not
> aware of specs that are clear enough to say either way.

On x86 'xchg' is always 'locked' regardless of whether there is a 'lock' prefix.
set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).

For locked operations (including misaligned ones) that don't cross cache-line
boundaries the read operation almost certainly locks the cache line (against
a snoop) until the write has updated the cache line.
This won't happen until the write 'drains' from the store buffer.
(I suspect that locked read requests act like write requests in ensuring
that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
Although this will delay the response to the snoop it will only
stall the cpu (or other bus master), not the entire memory 'bus'.

If you read the description of 'lock btr' you'll see that it always does the
write cycle (to complete the atomic RMW expected by the memory
subsystem) even when the bit is clear.

Remote store buffers are irrelevant to locked accesses.
(If you are doing concurrent locked and unlocked accesses to the same
memory location something is badly broken.)

It really can't matter whether one access is a mis-aligned 64bit word
and the other a byte. Both do atomic RMW updates so the result
cannot be unexpected.

In principle two separate 8-bit RMW cycles could be done concurrently
to the two halves of a 16-bit 'flag' word without losing any bits or any
reads returning anything other than one of the 4 expected values.
Not that any memory system would support such updates.

David


2019-12-12 13:06:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Dec 12, 2019 at 10:36:27AM +0000, David Laight wrote:

> On x86 'xchg' is always 'locked' regardless of whether there is a 'lock' prefix.

Sure, irrelevant here though.

> set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).

Because it is the atomic set bit function, we have __set_bit() if you
want the non-atomic one.

Atomic bitops are (obviously) useful if you have concurrent changes to
your bitmap.

Lots of people seem confused on this though, as evidenced by a lot of
the broken crap we keep finding (then again, them using __set_bit()
would still be broken due to the endian thing).

> For locked operations (including misaligned ones) that don't cross cache-line
> boundaries the read operation almost certainly locks the cache line (against
> a snoop) until the write has updated the cache line.

Note your use of 'almost'. Almost isn't good enough. Note that other
architectures allow the store from atomic operations to hit the store
buffer. And I strongly suspect x86 does the same.

Waiting for a store-buffer drain is *expensive*.

Try timing:

LOCK INC (ptr);

vs

LOCK INC (ptr);
MFENCE

My guess is the second one is *far* more expensive. MFENCE drains (and waits
for completion thereof) the store-buffer -- it must since it fences
against non-coherent stuff.

I suppose ARM's DMB vs DSB is of similar distinction.
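
Something like this quick-and-dirty sketch should show the gap (RDTSC
without any serialization, so treat the numbers as indicative only):

#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	static volatile long var;
	enum { N = 10 * 1000 * 1000 };
	uint64_t t0, t1, t2;
	int i;

	t0 = rdtsc();
	for (i = 0; i < N; i++)
		asm volatile("lock incq %0" : "+m" (var) : : "memory", "cc");
	t1 = rdtsc();
	for (i = 0; i < N; i++)
		asm volatile("lock incq %0\n\tmfence" : "+m" (var) : : "memory", "cc");
	t2 = rdtsc();

	printf("lock inc         : %5.1f cycles/op\n", (double)(t1 - t0) / N);
	printf("lock inc; mfence : %5.1f cycles/op\n", (double)(t2 - t1) / N);
	return 0;
}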

> This won't happen until the write 'drains' from the store buffer.
> (I suspect that locked read requests act like write requests in ensuring
> that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
> Although this will delay the response to the snoop it will only
> stall the cpu (or other bus master), not the entire memory 'bus'.

I really don't think so. The commit I pointed to earlier in the thread,
that replaced MFENCE with LOCK ADD $0, -4(%RSP) for smp_mb(), strongly
indicates LOCK prefixed instructions do _NOT_ flush the store buffer.

All barriers impose is order, if your store-buffer can preserve order,
all should just work. One possible way would be to tag each entry, and
increment the tag on barrier. Then ensure that all smaller tags are
flushed before allowing a higher tagged entry to leave.

> If you read the description of 'lock btr' you'll see that it always does the
> write cycle (to complete the atomic RMW expected by the memory
> subsystem) even when the bit is clear.

I know it does, but I don't see how that is relevant here.

> Remote store buffers are irrelevant to locked accesses.

They are not in general and I've seen nothing to indicate this is the
case on x86.

> (If you are doing concurrent locked and unlocked accesses to the same
> memory location something is badly broken.)

It is actually quite common.

> It really can't matter whether one access is a mis-aligned 64bit word
> and the other a byte. Both do atomic RMW updates so the result
> cannot be unexpected.

Expectations are often violated. Esp when talking about memory ordering.

> > In principle two separate 8-bit RMW cycles could be done concurrently
> > to the two halves of a 16-bit 'flag' word without losing any bits or any
> > reads returning anything other than one of the 4 expected values.
> > Not that any memory system would support such updates.

I'm thinking you ought to go read that paper on mixed size concurrency I
referenced earlier in this thread. IIRC the conclusion was that PowerPC
does exactly that and ARM64 allows for it but it hasn't been observed,
yet.

Anyway, I'm not saying x86 behaves this way, I'm saying that I have lots
of questions and very little answers. I'm also saying that the variant
with non-overlapping atomics could conceivably misbehave, while the
variant with overlapping atomics is guaranteed not to.

Specifically smp_mb()/SYNC on PowerPC can not restore Sequential
Consistency under mixed size operations. How's that for expectations?

2019-12-12 16:05:13

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter


> On Dec 12, 2019, at 5:04 AM, Peter Zijlstra <[email protected]> wrote:
>
> Waiting for a store-buffer drain is *expensive*.
>
> Try timing:
>
> LOCK INC (ptr);
>
> vs
>
> LOCK INC (ptr);
> MFENCE
>
> My guess is the second one is *far* more expensive. MFENCE drains (and waits
> for completion thereof) the store-buffer -- it must since it fences
> against non-coherent stuff.

MFENCE also implies LFENCE, and LFENCE is fairly slow despite having no architectural semantics other than blocking speculative execution. AFAICT, in the absence of side channels and timing oddities, there is no code whatsoever that would be correct with LFENCE but incorrect without it. “Serialization” is, to some extent, a weaker example of this -- MOV to CR2 is *much* slower than MFENCE or LOCK despite the fact that, as far as the memory model is concerned, it doesn’t do a whole lot more.

So the fact that draining some buffer or stalling some superscalar thingy is expensive doesn’t necessarily mean that the lack of said draining is observable in the memory model.

(LFENCE before RDTSC counts as “timing” here.)

2019-12-12 16:24:22

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Andy Lutomirski
> Sent: 12 December 2019 16:02
...
> MFENCE also implies LFENCE, and LFENCE is fairly slow despite having no architectural semantics other than blocking speculative
> execution. AFAICT, in the absence of side channels and timing oddities, there is no code whatsoever that would be correct with LFENCE
> but incorrect without it. “Serialization” is, to some extent, a weaker example of this — MOV to CR2 is *much* slower than MFENCE or
> LOCK despite the fact that, as far as the memory model is concerned, it doesn’t do a whole lot more.

IIRC LFENCE does affect things when you are mixing non-temporal and/or write-combining
memory accesses.

I also thought there was a case where you needed to stop the speculative reads.
But can't remember why.

David


2019-12-12 16:30:54

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Peter Zijlstra
> Sent: 12 December 2019 13:04
> On Thu, Dec 12, 2019 at 10:36:27AM +0000, David Laight wrote:
...
> > set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).
>
> Because it is the atomic set bit function, we have __set_bit() if you
> want the non-atomic one.

Horrid name, looks like part of the implementation...
I know _ prefixes get used for functions that don't acquire the obvious lock,
but they usually require the caller to hold the lock.

set_bit_nonatomic() and set_bit_atomic() would be better names.

> Atomic bitops are (obviously) useful if you have concurrent changes to
> your bitmap.
>
> Lots of people seem confused on this though, as evidenced by a lot of
> the broken crap we keep finding (then again, them using __set_bit()
> would still be broken due to the endian thing).

Yep, quite a bit of code just wants x |= 1 << n;

> > For locked operations (including misaligned ones) that don't cross cache-line
> > boundaries the read operation almost certainly locks the cache line (against
> > a snoop) until the write has updated the cache line.
>
> Note your use of 'almost'. Almost isn't good enough. Note that other
> architectures allow the store from atomic operations to hit the store
> buffer. And I strongly suspect x86 does the same.
>
> Waiting for a store-buffer drain is *expensive*.

Right, the cpu doesn't need to wait for the store buffer to drain,
but the cache line needs to remain locked until it has drained.

...
> > This won't happen until the write 'drains' from the store buffer.
> > (I suspect that locked read requests act like write requests in ensuring
> > that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
> > Although this will delay the response to the snoop it will only
> > stall the cpu (or other bus master), not the entire memory 'bus'.
>
> I really don't think so. The commit I pointed to earlier in the thread,
> that replaced MFENCE with LOCK ADD $0, -4(%RSP) for smp_mb(), strongly
> indicates LOCK prefixed instructions do _NOT_ flush the store buffer.

They don't need to.
It is only a remote cpu trying to gain exclusive access to the cache line
that needs to be stalled by the LOCK prefix write.
Once that write has escaped the store buffer the cache line can be released.

Of course the store buffer may be able to contain the write data for multiple
atomic operations to different parts of the same cache line.

...
> > (If you are doing concurrent locked and unlocked accesses to the same
> > memory location something is badly broken.)
>
> It is actually quite common.

Sorry I meant unlocked writes.

> > It really can't matter whether one access is a mis-aligned 64bit word
> > and the other a byte. Both do atomic RMW updates so the result
> > cannot be unexpected.
>
> Expectations are often violated. Esp when talking about memory ordering.

Especially on DEC Alpha :-)

> > In principle two separate 8-bit RMW cycles could be done concurrently
> > to the two halves of a 16-bit 'flag' word without losing any bits or any
> > reads returning anything other than one of the 4 expected values.
> > Not that any memory system would support such updates.
>
> I'm thinking you ought to go read that paper on mixed size concurrency I
> referenced earlier in this thread. IIRC the conclusion was that PowerPC
> does exactly that and ARM64 allows for it but it hasn't been observed,
> yet.

CPUs with a shared L1 cache might manage to behave 'oddly'.
But they still need to do locked RMW cycles.

> Anyway, I'm not saying x86 behaves this way, I'm saying that I have lots
> of questions and very little answers. I'm also saying that the variant
> with non-overlapping atomics could conceivably misbehave, while the
> variant with overlapping atomics is guaranteed not to.
>
> Specifically smp_mb()/SYNC on PowerPC can not restore Sequential
> Consistency under mixed size operations. How's that for expectations?

Is that the Spanish Inquisition?

David


2019-12-12 18:53:43

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

> If anything we could switch the entire bitmap interface to unsigned int,
> but I'm not sure that'd actually help much.

As we've been looking for potential split lock issues in kernel code, most of
the ones we found relate to callers who have <=32 bits and thus stick:

u32 flags;

in their structure. So it would solve those places, and fix any future code
where someone does the same thing.
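
A made-up example of the pattern, just to illustrate (the field offset is
the whole problem):

struct widget {
	char	name[60];	/* assume the struct is cache line aligned */
	u32	flags;		/* occupies bytes 60..63 of the line */
};

static void widget_mark_busy(struct widget *w)
{
	/*
	 * The atomic bitops take an unsigned long *, so on x86-64 this is
	 * an 8-byte LOCKed access starting at byte 60: it straddles the
	 * cache line boundary and takes a split lock (or an #AC with
	 * split lock detection enabled).
	 */
	set_bit(0, (unsigned long *)&w->flags);
}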

-Tony

2019-12-12 19:41:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Wed, Dec 11, 2019 at 2:34 PM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Dec 11, 2019 at 10:12:56AM -0800, Andy Lutomirski wrote:

> > > Sure, but we're talking two cpus here.
> > >
> > > u32 var = 0;
> > > u8 *ptr = &var;
> > >
> > > CPU0 CPU1
> > >
> > > xchg(ptr, 1)
> > >
> > > 				xchg(ptr+1, 1);
> > > r = READ_ONCE(var);
> > >
> > > AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> > > CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> > > doesn't force a snoop or forward.
> >
> > I think I don't quite understand. The final value of var had better
> > be 0x0101 or something is severely wrong.
>
> > But r can be 0x0100 because
> > nothing in this example guarantees that the total order of the locked
> > instructions has CPU 1's instruction first.
>
> Assuming CPU1 goes first, why would the load from CPU0 see CPU1's
> ptr[0]? It can be in CPU1 store buffer, and TSO allows regular reads to
> ignore (remote) store-buffers.

What I'm saying is: if CPU0 goes first, then the three operations order as:



xchg(ptr+1, 1);
r = READ_ONCE(var); /* 0x0100 */
xchg(ptr, 1);

Anyway, this is all a bit too hypothetical for me. Is there a clear
example where the total ordering of LOCKed instructions is observable?
That is, is there a sequence of operations on, presumably, two or
three CPUs, such that LOCKed instructions being only partially ordered
allows an outcome that is disallowed by a total ordering? I suspect
there is, but I haven't come up with it yet. (I mean in an x86-like
memory model. Getting this in a relaxed atomic model is easy.)

As a probably bad example:

u32 x0, x1, a1, b0, b1;

CPU 0:
xchg(&x0, 1);
barrier();
a1 = READ_ONCE(x1);

CPU 1:
xchg(&x1, 1);

CPU 2:
b1 = READ_ONCE(x1);
smp_rmb(); /* which is just barrier() on x86 */
b0 = READ_ONCE(x0);

Suppose a1 == 0 and b1 == 1. Then we know that CPU0's READ_ONCE
happened before CPU1's xchg and hence CPU0's xchg happened before
CPU1's xchg. We also know that CPU2's first read observed the write
from CPU1's xchg, which means that CPU2's second read should have been
after CPU0's xchg (because the xchg operations have a total order
according to the SDM). This means that b0 can't be 0.

Hence the outcome (a1, b1, b0) == (0, 1, 0) is disallowed.
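
A minimal userspace harness for the test above, as a sketch: the xchg()s
become atomic_exchange() (LOCK XCHG), the READ_ONCE()s become plain
relaxed loads, and it can only look for the outcome, not prove it
impossible.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERS 1000000

static atomic_int x0, x1;
static int a1, b0, b1;
static pthread_barrier_t start_b, stop_b;

static void *cpu0(void *arg)
{
	for (int i = 0; i < ITERS; i++) {
		pthread_barrier_wait(&start_b);
		atomic_exchange(&x0, 1);		/* xchg(&x0, 1) */
		asm volatile("" ::: "memory");		/* barrier()    */
		a1 = atomic_load_explicit(&x1, memory_order_relaxed);
		pthread_barrier_wait(&stop_b);
	}
	return NULL;
}

static void *cpu1(void *arg)
{
	for (int i = 0; i < ITERS; i++) {
		pthread_barrier_wait(&start_b);
		atomic_exchange(&x1, 1);		/* xchg(&x1, 1) */
		pthread_barrier_wait(&stop_b);
	}
	return NULL;
}

static void *cpu2(void *arg)
{
	for (int i = 0; i < ITERS; i++) {
		pthread_barrier_wait(&start_b);
		b1 = atomic_load_explicit(&x1, memory_order_relaxed);
		asm volatile("" ::: "memory");		/* smp_rmb()    */
		b0 = atomic_load_explicit(&x0, memory_order_relaxed);
		pthread_barrier_wait(&stop_b);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[3];
	long hits = 0;

	pthread_barrier_init(&start_b, NULL, 4);
	pthread_barrier_init(&stop_b, NULL, 4);
	pthread_create(&t[0], NULL, cpu0, NULL);
	pthread_create(&t[1], NULL, cpu1, NULL);
	pthread_create(&t[2], NULL, cpu2, NULL);

	for (int i = 0; i < ITERS; i++) {
		atomic_store(&x0, 0);
		atomic_store(&x1, 0);
		pthread_barrier_wait(&start_b);
		pthread_barrier_wait(&stop_b);
		if (a1 == 0 && b1 == 1 && b0 == 0)
			hits++;
	}

	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	printf("forbidden (a1,b1,b0) == (0,1,0) seen %ld times\n", hits);
	return 0;
}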

It's entirely possible that I screwed up the analysis. But I think
this means that the cache coherency mechanism is doing something more
intelligent than just shoving the x0=1 write into the store buffer and
letting it hang out there. Something needs to make sure that CPU 2
observes everything in the same order that CPU 0 observes, and, as far
as I know, there is a considerable amount of complexity in the CPUs
that makes sure this happens.

So here's my question: do you have a concrete example of a series of
operations and an outcome that you suspect Intel CPUs allow but that
is disallowed in the SDM?

--Andy

2019-12-12 19:47:16

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

>> If anything we could switch the entire bitmap interface to unsigned int,
>> but I'm not sure that'd actually help much.
>
> As we've been looking for potential split lock issues in kernel code, most of
> the ones we found relate to callers who have <=32 bits and thus stick:
>
> u32 flags;
>
> in their structure. So it would solve those places, and fix any future code
> where someone does the same thing.

If different architectures can do better with 8-bit/16-bit/32-bit/64-bit instructions
to manipulate bitmaps, then perhaps this is justification to make all the
functions operate on "bitmap_t" and have each architecture provide the
typedef for their favorite width.

-Tony

2019-12-12 20:02:32

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Dec 12, 2019 at 11:46 AM Luck, Tony <[email protected]> wrote:
>
> >> If anything we could switch the entire bitmap interface to unsigned int,
> >> but I'm not sure that'd actually help much.
> >
> > As we've been looking for potential split lock issues in kernel code, most of
> > the ones we found relate to callers who have <=32 bits and thus stick:
> >
> > u32 flags;
> >
> > in their structure. So it would solve those places, and fix any future code
> > where someone does the same thing.
>
> If different architectures can do better with 8-bit/16-bit/32-bit/64-bit instructions
> to manipulate bitmaps, then perhaps this is justification to make all the
> functions operate on "bitmap_t" and have each architecture provide the
> typedef for their favorite width.
>

Hmm. IMO there are really two different types of uses of the API.

1. There's a field somewhere and I want to atomically set a bit. Something like:

struct whatever {
	...
	whatever_t field;
	...
};

struct whatever *w;
set_bit(3, &w->field);

If whatever_t is architecture-dependent, then it's really awkward to
use more than 32 bits, since some architectures won't have more than
32-bits.


2. DECLARE_BITMAP(), etc. That is, someone wants a biggish bitmap
with a certain number of bits.

Here the type doesn't really matter.

On an architecture with genuinely atomic bit operations (i.e. no
hashed spinlocks involved), the width really shouldn't matter.
set_bit() should promise to be atomic on that bit, to be a full
barrier, and to not modify adjacent bits. I don't see why the width
would matter for most use cases. If we're concerned, the
implementation could actually use the largest atomic operation and
just suitably align it. IOW, on x86, LOCK BTSQ *where we manually
align the pointer to 8 bytes and adjust the bit number accordingly*
should cover every possible case even if PeterZ's concerns are
correct.
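
Something along these lines (a sketch of the idea, not a drop-in
replacement for the existing bitops):

static inline void set_bit_aligned64(long nr, volatile void *addr)
{
	unsigned long base = (unsigned long)addr;
	volatile unsigned long *word = (volatile unsigned long *)(base & ~7UL);

	/* fold the byte misalignment back into the bit number */
	nr += (base & 7) * 8;

	/* every LOCKed access is now a naturally aligned 64-bit one */
	asm volatile("lock btsq %1, %0"
		     : "+m" (*word)
		     : "r" ((unsigned long)nr)
		     : "memory", "cc");
}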

For the "I have a field in a struct and I just want an atomic RMW that
changes one bit" case, an API that matches the rest of the atomic API seems
nice: just act on atomic_t and atomic64_t.

The current "unsigned long" thing basically can't be used on a 64-bit
big-endian architecture with a 32-bit field without gross hackery.
And sometimes we actually want a 32-bit field.

Or am I missing some annoying subtlely here?

2019-12-13 00:44:42

by Tony Luck

[permalink] [raw]
Subject: [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter

From: Peter Zijlstra <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Peter Zijlstra <[email protected]>
Signed-off-by: Tony Luck <[email protected]>

---

[Note that I gave PeterZ Author credit because the majority
of the code here came from his untested patch. I just fixed
the typos. He didn't give a "Signed-off-by" ... so he can
either add one to this, or disavow all knowledge - his choice]
---
.../admin-guide/kernel-parameters.txt | 18 ++
Makefile | 4 +-
arch/x86/include/asm/cpu.h | 17 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 8 +
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 170 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 29 ++-
11 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..173c1acff5f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3181,6 +3181,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will pr_alert about applications
+ triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/Makefile b/Makefile
index 999a197d67d2..73e3c2802927 100644
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
-PATCHLEVEL = 4
+PATCHLEVEL = 5
SUBLEVEL = 0
-EXTRAVERSION =
+EXTRAVERSION = -rc1
NAME = Kleptomaniac Octopus

# *DOCUMENTATION*
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..5223504c7e7c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_split_lock(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_split_lock(void)
+{
+ return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..8bb2e08ce4a3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..d23638a0525e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* split_lock_detect */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index ffa0dc8a535e..6ceab60370f0 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -175,4 +175,5 @@ enum x86_pf_error_code {
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
};
+
#endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2e4d90294fe6..39245f61fad0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1234,6 +1234,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..79cec85c5132 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,14 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -1028,3 +1042,159 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ enum split_lock_detect_state sld = sld_state;
+ char arg[20];
+ int i, ret;
+
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld = sld_options[i].state;
+ break;
+ }
+ }
+
+ if (sld != sld_state)
+ sld_state = sld;
+
+print:
+ switch(sld) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return false;
+
+ return true;
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+}
+
+bool handle_split_lock(void)
+{
+ return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+ __sld_msr_set(true);
+ clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61e93a318983..55d205820f35 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if (tifp & _TIF_SLD)
+ switch_sld(prev_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05da6b5b167b..a933a01f6e40 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+ return;
+
+ if (!handle_split_lock())
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ cond_local_irq_enable(regs);
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.20.1

2019-12-13 00:46:56

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter

On Thu, Dec 12, 2019 at 04:09:08PM -0800, Tony Luck wrote:
> diff --git a/Makefile b/Makefile
> index 999a197d67d2..73e3c2802927 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,8 +1,8 @@
> # SPDX-License-Identifier: GPL-2.0
> VERSION = 5
> -PATCHLEVEL = 4
> +PATCHLEVEL = 5
> SUBLEVEL = 0
> -EXTRAVERSION =
> +EXTRAVERSION = -rc1
> NAME = Kleptomaniac Octopus
>
> # *DOCUMENTATION*

Aaargh - brown paper bag time ... obviously this doesn't
belong here. Must have slipped in when I moved base from
5.4 to 5.5-rc1

-Tony

2020-01-10 19:25:22

by Tony Luck

[permalink] [raw]
Subject: [PATCH v11] x86/split_lock: Enable split lock detection by kernel

From: Peter Zijlstra <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Peter Zijlstra <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

I think all the known places where split locks occur in the kernel
have already been patched, or the patches are queued for the upcoming
merge window. If we missed some, well this patch will help find them
(for people with Icelake or Icelake Xeon systems). PeterZ didn't see
any application level use of split locks in a few hours of runtime
on his desktop. So likely little fallout there (default is just to
warn for applications, so just console noise rather than failure).

.../admin-guide/kernel-parameters.txt | 18 ++
arch/x86/include/asm/cpu.h | 17 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 8 +
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/include/asm/traps.h | 1 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 170 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 29 ++-
10 files changed, 252 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..173c1acff5f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3181,6 +3181,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will pr_alert about applications
+ triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..5223504c7e7c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_split_lock(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_split_lock(void)
+{
+ return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..8bb2e08ce4a3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..d23638a0525e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* split_lock_detect */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index ffa0dc8a535e..6ceab60370f0 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -175,4 +175,5 @@ enum x86_pf_error_code {
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
};
+
#endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2e4d90294fe6..39245f61fad0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1234,6 +1234,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..43cc7a8f077e 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,14 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -1028,3 +1042,159 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ enum split_lock_detect_state sld = sld_state;
+ char arg[20];
+ int i, ret;
+
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld = sld_options[i].state;
+ break;
+ }
+ }
+
+ if (sld != sld_state)
+ sld_state = sld;
+
+print:
+ switch(sld) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return false;
+
+ return true;
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+}
+
+bool handle_split_lock(void)
+{
+ return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+ __sld_msr_set(true);
+ clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61e93a318983..55d205820f35 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if (tifp & _TIF_SLD)
+ switch_sld(prev_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05da6b5b167b..a933a01f6e40 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ unsigned int trapnr = X86_TRAP_AC;
+ char str[] = "alignment check";
+ int signr = SIGBUS;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+ return;
+
+ if (!handle_split_lock())
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ cond_local_irq_enable(regs);
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.0

2020-01-14 05:59:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel

On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:
> From: Peter Zijlstra <[email protected]>
>
> A split-lock occurs when an atomic instruction operates on data
> that spans two cache lines. In order to maintain atomicity the
> core takes a global bus lock.
>
> This is typically >1000 cycles slower than an atomic operation
> within a cache line. It also disrupts performance on other cores
> (which must wait for the bus lock to be released before their
> memory operations can complete). For real-time systems this may
> mean missing deadlines. For other systems it may just be very
> annoying.
>
> Some CPUs have the capability to raise an #AC trap when a
> split lock is attempted.
>
> Provide a command line option to give the user choices on how
> to handle this. split_lock_detect=
> off - not enabled (no traps for split locks)
> warn - warn once when an application does a
> split lock, but allow it to continue
> running.
> fatal - Send SIGBUS to applications that cause split lock
>
> Default is "warn". Note that if the kernel hits a split lock
> in any mode other than "off" it will oops.
>
> One implementation wrinkle is that the MSR to control the
> split lock detection is per-core, not per thread. This might
> result in some short lived races on HT systems in "warn" mode
> if Linux tries to enable on one thread while disabling on
> the other. Race analysis by Sean Christopherson:
>
> - Toggling of split-lock is only done in "warn" mode. Worst case
> scenario of a race is that a misbehaving task will generate multiple
> #AC exceptions on the same instruction. And this race will only occur
> if both siblings are running tasks that generate split-lock #ACs, e.g.
> a race where sibling threads are writing different values will only
> occur if CPUx is disabling split-lock after an #AC and CPUy is
> re-enabling split-lock after *its* previous task generated an #AC.
> - Transitioning between modes at runtime isn't supported and disabling
> is tracked per task, so hardware will always reach a steady state that
> matches the configured mode. I.e. split-lock is guaranteed to be
> enabled in hardware once all _TIF_SLD threads have been scheduled out.
>
> Co-developed-by: Fenghua Yu <[email protected]>

Need Fenghua's SoB.

> Co-developed-by: Peter Zijlstra <[email protected]>

Co-developed-by for Peter not needed since he's the author (attributed
via From).

> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
>
> I think all the known places where split locks occur in the kernel
> have already been patched, or the patches are queued for the upcoming
> merge window. If we missed some, well this patch will help find them
> (for people with Icelake or Icelake Xeon systems). PeterZ didn't see
> any application level use of split locks in a few hours of runtime
> on his desktop. So likely little fallout there (default is just to
> warn for applications, so just console noise rather than failure).
>
> .../admin-guide/kernel-parameters.txt | 18 ++
> arch/x86/include/asm/cpu.h | 17 ++
> arch/x86/include/asm/cpufeatures.h | 2 +
> arch/x86/include/asm/msr-index.h | 8 +
> arch/x86/include/asm/thread_info.h | 6 +-
> arch/x86/include/asm/traps.h | 1 +
> arch/x86/kernel/cpu/common.c | 2 +
> arch/x86/kernel/cpu/intel.c | 170 ++++++++++++++++++
> arch/x86/kernel/process.c | 3 +
> arch/x86/kernel/traps.c | 29 ++-
> 10 files changed, 252 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..173c1acff5f0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3181,6 +3181,24 @@
>
> nosoftlockup [KNL] Disable the soft-lockup detector.
>
> + split_lock_detect=

Would it make sense to name this split_lock_ac? To help clarify what the
param does and to future proof a bit in the event split lock detection is
able to signal some other form of fault/trap.

> + [X86] Enable split lock detection
> +
> + When enabled (and if hardware support is present), atomic
> + instructions that access data across cache line
> + boundaries will result in an alignment check exception.
> +
> + off - not enabled
> +
> + warn - the kernel will pr_alert about applications
> + triggering the #AC exception
> +
> + fatal - the kernel will SIGBUS applications that
> + trigger the #AC exception.
> +
> + For any mode other than 'off' the kernel will die if
> + it (or firmware) triggers #AC.
> +
> nosync [HW,M68K] Disables sync negotiation for all devices.
>
> nowatchdog [KNL] Disable both lockup detectors, i.e.

...

> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index d779366ce3f8..d23638a0525e 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
> #define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
> #define TIF_NOTSC 16 /* TSC is not accessible in userland */
> #define TIF_IA32 17 /* IA32 compatibility process */
> +#define TIF_SLD 18 /* split_lock_detect */

A more informative name comment would be helpful since the flag is set when
SLD is disabled by the previous task. Something like?

#define TIF_NEED_SLD_RESTORE 18 /* Restore split lock detection on context switch */

> #define TIF_NOHZ 19 /* in adaptive nohz mode */
> #define TIF_MEMDIE 20 /* is terminating due to OOM killer */
> #define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
> #define _TIF_NOCPUID (1 << TIF_NOCPUID)
> #define _TIF_NOTSC (1 << TIF_NOTSC)
> #define _TIF_IA32 (1 << TIF_IA32)
> +#define _TIF_SLD (1 << TIF_SLD)
> #define _TIF_NOHZ (1 << TIF_NOHZ)
> #define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
> #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
> @@ -158,9 +160,9 @@ struct thread_info {
>
> #ifdef CONFIG_X86_IOPL_IOPERM
> # define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
> - _TIF_IO_BITMAP)
> + _TIF_IO_BITMAP | _TIF_SLD)
> #else
> -# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
> +# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
> #endif
>
> #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index ffa0dc8a535e..6ceab60370f0 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -175,4 +175,5 @@ enum x86_pf_error_code {
> X86_PF_INSTR = 1 << 4,
> X86_PF_PK = 1 << 5,
> };
> +

Spurious whitespace.

> #endif /* _ASM_X86_TRAPS_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2e4d90294fe6..39245f61fad0 100644

...

> +bool handle_split_lock(void)

This is a confusing name IMO, e.g. split_lock_detect_enabled() or similar
would be more intuitive. It'd also avoid the weirdness of having different
semantics for the return values of handle_split_lock() and
handle_user_split_lock().

> +{
> + return sld_state != sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> + return false;

Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
kernel from going fully into the weeds if a spurious #AC occurs.

> +
> + pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",

pr_warn_ratelimited since it's user controlled?

> + current->comm, current->pid, regs->ip);
> +
> + __sld_msr_set(false);
> + set_tsk_thread_flag(current, TIF_SLD);
> + return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)
> +{
> + __sld_msr_set(true);
> + clear_tsk_thread_flag(prev, TIF_SLD);
> +}
> +
> +#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
> +
> +/*
> + * The following processors have split lock detection feature. But since they
> + * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
> + * the MSR. So enumerate the feature by family and model on these processors.
> + */
> +static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
> + SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
> + SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
> + {}
> +};
> +
> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> + u64 ia32_core_caps = 0;
> +
> + if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> + /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> + rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
> + } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> + /* Enumerate split lock detection by family and model. */
> + if (x86_match_cpu(split_lock_cpu_ids))
> + ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
> + }
> +
> + if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
> + split_lock_setup();
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 61e93a318983..55d205820f35 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> /* Enforce MSR update to ensure consistent state */
> __speculation_ctrl_update(~tifn, tifn);
> }
> +
> + if (tifp & _TIF_SLD)
> + switch_sld(prev_p);
> }
>
> /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 05da6b5b167b..a933a01f6e40 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -46,6 +46,7 @@
> #include <asm/traps.h>
> #include <asm/desc.h>
> #include <asm/fpu/internal.h>
> +#include <asm/cpu.h>
> #include <asm/cpu_entry_area.h>
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> {
> struct task_struct *tsk = current;
>
> -

Whitespace.

> if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> return;
>
> @@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
> DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
> DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
> DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
> -DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
> #undef IP
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + unsigned int trapnr = X86_TRAP_AC;
> + char str[] = "alignment check";

const if you want to keep it.

> + int signr = SIGBUS;

Don't see any reason for these, e.g. they're not used for do_trap().
trapnr and signr in particular do more harm than good.

> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> + return;
> +
> + if (!handle_split_lock())
> + return;
> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + cond_local_irq_enable(regs);
> +
> + if (handle_user_split_lock(regs, error_code))
> + return;
> +
> + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> + error_code, BUS_ADRALN, NULL);
> +}
> +
> #ifdef CONFIG_VMAP_STACK
> __visible void __noreturn handle_stack_overflow(const char *message,
> struct pt_regs *regs,
> --
> 2.21.0
>

2020-01-15 22:29:15

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel

On Mon, Jan 13, 2020 at 09:55:21PM -0800, Sean Christopherson wrote:
> On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:

All comments accepted and code changed ... except for these three:

> > +#define TIF_SLD 18 /* split_lock_detect */
>
> A more informative name comment would be helpful since the flag is set when
> SLD is disabled by the previous task. Something like?
>
> #define TIF_NEED_SLD_RESTORE 18 /* Restore split lock detection on context switch */

That name is more informative ... but it is also really, really long. Are
you sure?

> > +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> > +{
> > + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> > + return false;
>
> Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
> kernel from going fully into the weeds if a spurious #AC occurs.

Can a spurious #AC occur? I don't see how.

> > @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> > {
> > struct task_struct *tsk = current;
> >
> > -
>
> Whitespace.
>
> > if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> > return;

I'm staring at the post patch code, and I can't see what whitespace
issue you see.

-Tony

2020-01-16 00:22:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 15, 2020 at 02:27:54PM -0800, Luck, Tony wrote:
> On Mon, Jan 13, 2020 at 09:55:21PM -0800, Sean Christopherson wrote:
> > On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:
>
> All comments accepted and code changed ... except for these three:

Sounds like you're also writing code, in which case you should give
yourself credit with your own Co-developed-by: tag.

> > > +#define TIF_SLD 18 /* split_lock_detect */
> >
> > A more informative name comment would be helpful since the flag is set when
> > SLD is disabled by the previous task. Something like?
> >
> > #define TIF_NEED_SLD_RESTORE 18 /* Restore split lock detection on context switch */
>
> That name is more informative ... but it is also really, really long. Are
> you sure?

Not at all. I picked a semi-arbitrary name that was similar to existing
TIF names; I'll defer to anyone with an opinion.

> > > +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> > > +{
> > > + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> > > + return false;
> >
> > Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
> > kernel from going fully into the weeds if a spurious #AC occurs.
>
> Can a spurious #AC occur? I don't see how.

It's mostly paranoia, e.g. if sld_state==sld_off but the MSR bit was
misconfigured. No objection if you want to omit the check.

> > > @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> > > {
> > > struct task_struct *tsk = current;
> > >
> > > -
> >
> > Whitespace.
> >
> > > if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> > > return;
>
> I'm staring at the post patch code, and I can't see what whitespace
> issue you see.

There's a random newline removal in do_trap(). It's a good change in the
sense that it eliminates an extra newline, bad in the sense that it's
unrelated to the rest of the patch.

2020-01-16 00:38:35

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v11] x86/split_lock: Enable split lock detection by kernel

>> All comments accepted and code changed ... except for these three:
>
> Sounds like you're also writing code, in which case you should give
> yourself credit with your own Co-developed-by: tag.

I just fixed some typos in PeterZ's untested example patch. Now changed
a few names as per your suggestions. I don't really think of that as "writing code".

-Tony

2020-01-22 18:57:31

by Tony Luck

[permalink] [raw]
Subject: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

From: Peter Zijlstra <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

v12: Applied all changes suggested by Sean except:
1) Keep the short name TIF_SLD (though I did take the
improved comment on what it does)
2) Did not add a WARN_ON in trap code for unexpected #AC
3) Kept the white space cleanup (delete unneeded blank line)
in do_trap()

.../admin-guide/kernel-parameters.txt | 18 ++
arch/x86/include/asm/cpu.h | 17 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 8 +
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 170 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 27 ++-
9 files changed, 249 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..36a4e0e2654b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3181,6 +3181,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_ac=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will pr_alert about applications
+ triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..32a295533e2d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool split_lock_detect_enabled(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool split_lock_detect_enabled(void)
+{
+ return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
#define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */

/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..8bb2e08ce4a3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..cd88642e9e15 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2e4d90294fe6..39245f61fad0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1234,6 +1234,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..708fde6db703 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,14 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -1028,3 +1042,159 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ enum split_lock_detect_state sld = sld_state;
+ char arg[20];
+ int i, ret;
+
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_ac",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld = sld_options[i].state;
+ break;
+ }
+ }
+
+ if (sld != sld_state)
+ sld_state = sld;
+
+print:
+ switch(sld) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return false;
+
+ return true;
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+}
+
+bool split_lock_detect_enabled(void)
+{
+ return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+ __sld_msr_set(true);
+ clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61e93a318983..55d205820f35 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if (tifp & _TIF_SLD)
+ switch_sld(prev_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05da6b5b167b..ef287effd8ba 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -288,9 +288,32 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ const char str[] = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!split_lock_detect_enabled())
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ cond_local_irq_enable(regs);
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.0

2020-01-22 19:05:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 10:55:14AM -0800, Luck, Tony wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index e9b62498fe75..c3edd2bba184 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -220,6 +220,7 @@
> #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
> #define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
> #define X86_FEATURE_IBRS_ENHANCED ( 7*32+30) /* Enhanced IBRS */
> +#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */

That word is already full in tip:

...
#define X86_FEATURE_MSR_IA32_FEAT_CTL ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */

use word 11 instead.

> +#define MSR_TEST_CTRL 0x00000033
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
> +
> #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
> #define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
> #define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
> @@ -70,6 +74,10 @@
> */
> #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)
>
> +#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)

Any chance making those shorter?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-01-22 20:06:03

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

>> +#define X86_FEATURE_SPLIT_LOCK_DETECT ( 7*32+31) /* #AC for split lock */
>
> That word is already full in tip:
> ...
> use word 11 instead.

Will rebase against tip/master and move to word 11.

>> +#define MSR_IA32_CORE_CAPABILITIES 0x000000cf
>> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT 5
>> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
>
> Any chance making those shorter?

I could abbreviate CAPABILITIES as "CAP", that would save 9 characters. Is that enough?

I'm not fond of the "remove the vowels": SPLT_LCK_DTCT, but that is sort of readable
and would save 4 more. What do you think?

-Tony

2020-01-22 20:58:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 08:03:28PM +0000, Luck, Tony wrote:
> I could abbreviate CAPABILITIES as "CAP", that would save 9
> characters. Is that enough?

Sure, except...

> I'm not fond of the "remove the vowels": SPLT_LCK_DTCT, but that is
> sort of readable and would save 4 more. What do you think?

... we've been trying to keep the MSR names as spelled in the SDM to
avoid confusion.

Looking at it,

MSR_IA32_CORE_CAPABILITIES -> MSR_IA32_CORE_CAPS

along with a comment above its definition sounds like a good compromise
to me. IMO, of course.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-01-22 22:45:34

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 10:55:14AM -0800, Luck, Tony wrote:
> +
> +static enum split_lock_detect_state sld_state = sld_warn;
> +

This sets sld_state to sld_warn even on CPUs that don't support
split-lock detection. split_lock_init will then try to read/write the
MSR to turn it on. Would it be better to initialize it to sld_off and
set it to sld_warn in split_lock_setup instead, which is only called if
the CPU supports the feature?
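
To make the call path concrete, here is a minimal sketch of the situation
being described (an illustration assembled from the v12 patch above, not
code posted in this thread; all function and symbol names are the ones the
patch uses):

/*
 * Sketch only: why the sld_warn default matters. init_intel() runs on
 * every Intel CPU, and with sld_state defaulting to sld_warn the early
 * return below never fires, so MSR_TEST_CTRL is read/written even on
 * parts that never had X86_FEATURE_SPLIT_LOCK_DETECT set.
 */
static enum split_lock_detect_state sld_state = sld_warn; /* default under discussion */

static void split_lock_init(void)
{
        if (sld_state == sld_off)       /* never true with the sld_warn default */
                return;

        __sld_msr_set(true);            /* rdmsrl_safe()/wrmsrl_safe() on MSR_TEST_CTRL */
}

static void init_intel(struct cpuinfo_x86 *c)
{
        /* ... */
        split_lock_init();              /* called unconditionally, supported or not */
}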

>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + const char str[] = "alignment check";
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> + return;
> +
> + if (!split_lock_detect_enabled())
> + return;

This misses one comment from Sean [1] that this check should be dropped,
otherwise user-space alignment check via EFLAGS.AC will get ignored when
split lock detection is disabled.

[1] https://lore.kernel.org/lkml/[email protected]/
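
For reference, this is roughly what the handler looks like with that check
dropped (a sketch of the suggestion, not code posted in this thread; the
rest is kept as in the v12 hunk quoted above). A user task that sets
EFLAGS.AC and does a misaligned access then falls through to do_trap() and
gets its SIGBUS even when sld_state is sld_off:

dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
{
        RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

        if (notify_die(DIE_TRAP, "alignment check", regs, error_code,
                       X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
                return;

        if (!user_mode(regs))
                die("Split lock detected\n", regs, error_code);

        cond_local_irq_enable(regs);

        /* Only consumes the trap in "warn" mode; returns false otherwise */
        if (handle_user_split_lock(regs, error_code))
                return;

        /* Legacy EFLAGS.AC alignment check, or split lock in "fatal" mode */
        do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
                error_code, BUS_ADRALN, NULL);
}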

> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + cond_local_irq_enable(regs);
> +
> + if (handle_user_split_lock(regs, error_code))
> + return;
> +
> + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> + error_code, BUS_ADRALN, NULL);
> +}
> +

Peter [2] called this a possible DoS vector. If userspace is malicious
rather than buggy, couldn't it simply ignore SIGBUS?

[2] https://lore.kernel.org/lkml/[email protected]/

2020-01-22 22:53:54

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 05:42:51PM -0500, Arvind Sankar wrote:
>
> Peter [2] called this a possible DOS vector. If userspace is malicious
> rather than buggy, couldn't it simply ignore SIGBUS?
>
> [2] https://lore.kernel.org/lkml/[email protected]/

Ignore this last bit, wasn't thinking right.

2020-01-22 23:26:25

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

>> +static enum split_lock_detect_state sld_state = sld_warn;
>> +
>
> This sets sld_state to sld_warn even on CPUs that don't support
> split-lock detection. split_lock_init will then try to read/write the
> MSR to turn it on. Would it be better to initialize it to sld_off and
> set it to sld_warn in split_lock_setup instead, which is only called if
> the CPU supports the feature?

I've lost some bits of this patch series somewhere along the way :-( There
was once code to decide whether the feature was supported (either with
x86_match_cpu() for a couple of models, or using the architectural test
based on some MSR bits). I need to dig that out and put it back in. Then
stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
that messes with MSRs.
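
A minimal sketch of the kind of gate being described here (an illustration
only; the flag spelled out below is the patch's X86_FEATURE_SPLIT_LOCK_DETECT,
and as the follow-up shows, the version that actually gets posted changes
the sld_state default instead of adding this check):

static void split_lock_init(void)
{
        /* Sketch only: bail out early on CPUs without the feature bit */
        if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
                return;

        if (sld_state == sld_off)
                return;

        if (__sld_msr_set(true))
                return;

        pr_warn("MSR fail -- disabled\n");
        __sld_msr_set(false);
}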

>> + if (!split_lock_detect_enabled())
>> + return;
>
> This misses one comment from Sean [1] that this check should be dropped,
> otherwise user-space alignment check via EFLAGS.AC will get ignored when
> split lock detection is disabled.

Ah yes. Good catch. Will fix.

Thanks for the review.

-Tony

2020-01-23 00:48:03

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 11:24:34PM +0000, Luck, Tony wrote:
> >> +static enum split_lock_detect_state sld_state = sld_warn;
> >> +
> >
> > This sets sld_state to sld_warn even on CPUs that don't support
> > split-lock detection. split_lock_init will then try to read/write the
> > MSR to turn it on. Would it be better to initialize it to sld_off and
> > set it to sld_warn in split_lock_setup instead, which is only called if
> > the CPU supports the feature?
>
> I've lost some bits of this patch series somewhere along the way :-( There
> was once code to decide whether the feature was supported (either with
> x86_match_cpu() for a couple of models, or using the architectural test
> based on some MSR bits). I need to dig that out and put it back in. Then
> stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
> that messes with MSRs.

That code is still there (cpu_set_core_cap_bits). The issue is that with
the initialization here, nothing ever sets sld_state to sld_off if the
feature isn't supported.

v10 had a corresponding split_lock_detect_enabled that was
0-initialized, but Peter's patch as he sent out had the flag initialized
to sld_warn.

>
> >> + if (!split_lock_detect_enabled())
> >> + return;
> >
> > This misses one comment from Sean [1] that this check should be dropped,
> > otherwise user-space alignment check via EFLAGS.AC will get ignored when
> > split lock detection is disabled.
>
> Ah yes. Good catch. Will fix.
>
> Thanks for the review.
>
> -Tony

2020-01-23 01:26:00

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 07:45:08PM -0500, Arvind Sankar wrote:
> On Wed, Jan 22, 2020 at 11:24:34PM +0000, Luck, Tony wrote:
> > >> +static enum split_lock_detect_state sld_state = sld_warn;
> > >> +
> > >
> > > This sets sld_state to sld_warn even on CPUs that don't support
> > > split-lock detection. split_lock_init will then try to read/write the
> > > MSR to turn it on. Would it be better to initialize it to sld_off and
> > > set it to sld_warn in split_lock_setup instead, which is only called if
> > > the CPU supports the feature?
> >
> > I've lost some bits of this patch series somewhere along the way :-( There
> > was once code to decide whether the feature was supported (either with
> > x86_match_cpu() for a couple of models, or using the architectural test
> > based on some MSR bits. I need to dig that out and put it back in. Then
> > stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
> > that messes with MSRs
>
> That code is still there (cpu_set_core_cap_bits). The issue is that with
> the initialization here, nothing ever sets sld_state to sld_off if the
> feature isn't supported.
>
> v10 had a corresponding split_lock_detect_enabled that was
> 0-initialized, but Peter's patch as he sent out had the flag initialized
> to sld_warn.

Ah yes. Maybe the problem is that split_lock_setup() is only
called on systems that support split lock detect, while we call
split_lock_init() unconditionally.

What if we start with sld_state = sld_off, and then have split_lock_setup
set it to either sld_warn, or whatever the user chose on the command
line? Patch below (on top of the patch so you can see what I'm saying,
but I will just merge it in for the next version).

-Tony


diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 7478bebcd735..b6046ccfa372 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -39,7 +39,13 @@ enum split_lock_detect_state {
sld_fatal,
};

-static enum split_lock_detect_state sld_state = sld_warn;
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn, and then check to see if there is a command
+ * line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;

/*
* Just in case our CPU detection goes bad, or you have a weird system,
@@ -1017,10 +1023,11 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)

static void __init split_lock_setup(void)
{
- enum split_lock_detect_state sld = sld_state;
+ enum split_lock_detect_state sld;
char arg[20];
int i, ret;

+ sld_state = sld = sld_warn;
setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);

ret = cmdline_find_option(boot_command_line, "split_lock_ac",

2020-01-23 03:56:13

by Tony Luck

[permalink] [raw]
Subject: [PATCH v13] x86/split_lock: Enable split lock detection by kernel

From: Peter Zijlstra <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

v13: (rebased to tip/master because of first item below)
Boris: X86 features word 7 is full, move to word 11
Boris: MSR_IA32_CORE_CAPABILITIES too long. Abbreviate
(but include comment with SDM matching name)
Arvind: Missed a comment from Sean about bogus test in
trap handling. Delete it.
Arvind: split_lock_init() accesses MSR on platforms that
don't support it. Change default to "off" and
only upgrade to "warn" on platforms that support
split lock detect.

.../admin-guide/kernel-parameters.txt | 18 ++
arch/x86/include/asm/cpu.h | 17 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 177 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 24 ++-
9 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..b420e0cebc0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3207,6 +3207,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_ac=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will pr_alert about applications
+ triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..32a295533e2d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool split_lock_detect_enabled(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool split_lock_detect_enabled(void)
+{
+ return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
#define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
#define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS 0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e0d12517f348 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..b6046ccfa372 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,20 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn, and then check to see if there is a command
+ * line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ enum split_lock_detect_state sld;
+ char arg[20];
+ int i, ret;
+
+ sld_state = sld = sld_warn;
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_ac",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld = sld_options[i].state;
+ break;
+ }
+ }
+
+ if (sld != sld_state)
+ sld_state = sld;
+
+print:
+ switch(sld) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return false;
+
+ return true;
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+}
+
+bool split_lock_detect_enabled(void)
+{
+ return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+ __sld_msr_set(true);
+ clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..355760d36505 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if (tifp & _TIF_SLD)
+ switch_sld(prev_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..61c576b95184 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ const char str[] = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ cond_local_irq_enable(regs);
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.0

2020-01-23 04:23:47

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 05:23:17PM -0800, Luck, Tony wrote:
> On Wed, Jan 22, 2020 at 07:45:08PM -0500, Arvind Sankar wrote:
> > On Wed, Jan 22, 2020 at 11:24:34PM +0000, Luck, Tony wrote:
> > > >> +static enum split_lock_detect_state sld_state = sld_warn;
> > > >> +
> > > >
> > > > This sets sld_state to sld_warn even on CPUs that don't support
> > > > split-lock detection. split_lock_init will then try to read/write the
> > > > MSR to turn it on. Would it be better to initialize it to sld_off and
> > > > set it to sld_warn in split_lock_setup instead, which is only called if
> > > > the CPU supports the feature?
> > >
> > > I've lost some bits of this patch series somewhere along the way :-( There
> > > was once code to decide whether the feature was supported (either with
> > > x86_match_cpu() for a couple of models, or using the architectural test
> > > based on some MSR bits. I need to dig that out and put it back in. Then
> > > stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
> > > that messes with MSRs
> >
> > That code is still there (cpu_set_core_cap_bits). The issue is that with
> > the initialization here, nothing ever sets sld_state to sld_off if the
> > feature isn't supported.
> >
> > v10 had a corresponding split_lock_detect_enabled that was
0-initialized, but Peter's patch as he sent it out had the flag initialized
> > to sld_warn.
>
> Ah yes. Maybe the problem is that split_lock_init() is only
> called on systems that support split lock detect, while we call
> split_lock_init() unconditionally.

It was unconditional in v10 too?

>
> What if we start with sld_state = sld_off, and then have split_lock_setup
> set it to either sld_warn, or whatever the user chose on the command
> line. Patch below (on top of patch so you can see what I'm saying,
> but will just merge it in for next version.

Yep, that's what I suggested.

>
> -Tony
>
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 7478bebcd735..b6046ccfa372 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -39,7 +39,13 @@ enum split_lock_detect_state {
> sld_fatal,
> };
>
> -static enum split_lock_detect_state sld_state = sld_warn;
> +/*
> + * Default to sld_off because most systems do not support
> + * split lock detection. split_lock_setup() will switch this
> + * to sld_warn, and then check to see if there is a command
> + * line override.
> + */
> +static enum split_lock_detect_state sld_state = sld_off;
>
> /*
> * Just in case our CPU detection goes bad, or you have a weird system,
> @@ -1017,10 +1023,11 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)
>
> static void __init split_lock_setup(void)
> {
> - enum split_lock_detect_state sld = sld_state;
> + enum split_lock_detect_state sld;

This is bike-shedding, but initializing sld = sld_warn here would have
been enough with no other changes to the patch, I think?

2020-01-23 04:55:06

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v13] x86/split_lock: Enable split lock detection by kernel

On Wed, Jan 22, 2020 at 07:53:59PM -0800, Luck, Tony wrote:
>
> + split_lock_ac=
> + [X86] Enable split lock detection

More bike-shedding: I actually don't get Sean's suggestion to rename
this to split_lock_ac [1]. If split lock detection ever becomes able to
trigger some other form of fault/trap, we would just change the
implementation to cope; we would not want to change the command line
argument that enables it, so split_lock_detect is more informative?

And if the concern is the earlier one [2], then surely everything should
be renamed sld -> slac?

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

2020-01-23 17:21:03

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v12] x86/split_lock: Enable split lock detection by kernel

>> static void __init split_lock_setup(void)
>> {
>> - enum split_lock_detect_state sld = sld_state;
>> + enum split_lock_detect_state sld;
>
> This is bike-shedding, but initializing sld = sld_warn here would have
> been enough with no other changes to the patch I think?

Not quite. If there isn't a command line option, we get here:

if (ret < 0)
goto print;

which skips copying the local "sld" to the global "sld_state".

-Tony

2020-01-23 23:25:19

by Tony Luck

[permalink] [raw]
Subject: [PATCH v14] x86/split_lock: Enable split lock detection by kernel

From: Peter Zijlstra <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

v14: I chatted offline with Sean about the kernel parameter name. He's
now OK with the more generic name "split_lock_detect" rather than
the trap specific split_lock_ac. So this reverts that change in
the code and Documentation. Thanks to Arvind for making us see
sense ... not bike shedding at all!
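
(I.e. the expected usage is simply booting with split_lock_detect=off,
split_lock_detect=warn or split_lock_detect=fatal, as in the
documentation hunk below.)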

.../admin-guide/kernel-parameters.txt | 18 ++
arch/x86/include/asm/cpu.h | 17 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 177 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 24 ++-
9 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..568d20c04441 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3207,6 +3207,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will pr_alert about applications
+ triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..32a295533e2d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool split_lock_detect_enabled(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool split_lock_detect_enabled(void)
+{
+ return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
#define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
#define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS 0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e0d12517f348 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..68d2a7044779 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,20 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn, and then check to see if there is a command
+ * line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ enum split_lock_detect_state sld;
+ char arg[20];
+ int i, ret;
+
+ sld_state = sld = sld_warn;
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld = sld_options[i].state;
+ break;
+ }
+ }
+
+ if (sld != sld_state)
+ sld_state = sld;
+
+print:
+ switch(sld) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return false;
+
+ return true;
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+}
+
+bool split_lock_detect_enabled(void)
+{
+ return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+ __sld_msr_set(true);
+ clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..355760d36505 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if (tifp & _TIF_SLD)
+ switch_sld(prev_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..61c576b95184 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ const char str[] = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ cond_local_irq_enable(regs);
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.0

2020-01-24 21:38:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v14] x86/split_lock: Enable split lock detection by kernel

Tony,

"Luck, Tony" <[email protected]> writes:
> + split_lock_detect=
> + [X86] Enable split lock detection
> +
> + When enabled (and if hardware support is present), atomic
> + instructions that access data across cache line
> + boundaries will result in an alignment check exception.
> +
> + off - not enabled
> +
> + warn - the kernel will pr_alert about applications

pr_alert is not a verb. And the implementation uses
pr_warn_ratelimited(). So this should be something like:

The kernel will emit rate limited warnings about
applications ...

> + triggering the #AC exception
> @@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
> unsigned int x86_family(unsigned int sig);
> unsigned int x86_model(unsigned int sig);
> unsigned int x86_stepping(unsigned int sig);
> +#ifdef CONFIG_CPU_SUP_INTEL
> +extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
> +extern bool split_lock_detect_enabled(void);

That function is unused.

> +extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern void switch_sld(struct task_struct *);
> +#else
> +static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> +static inline bool split_lock_detect_enabled(void)
> +{
> + return false;
> +}
> +static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + return false;
> +}
> +static inline void switch_sld(struct task_struct *prev) {}
> +#endif

> +enum split_lock_detect_state {
> + sld_off = 0,
> + sld_warn,
> + sld_fatal,
> +};
> +
> +/*
> + * Default to sld_off because most systems do not support
> + * split lock detection. split_lock_setup() will switch this

Can you please add: If supported, then ...

> + * to sld_warn, and then check to see if there is a command
> + * line override.

I had to read this 3 times and then stare at the code.

> + */
> +static enum split_lock_detect_state sld_state = sld_off;
> +
> +static void __init split_lock_setup(void)
> +{
> + enum split_lock_detect_state sld;
> + char arg[20];
> + int i, ret;
> +
> + sld_state = sld = sld_warn;

This intermediate variable is pointless.

> + setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> +
> + ret = cmdline_find_option(boot_command_line, "split_lock_detect",
> + arg, sizeof(arg));
> + if (ret < 0)
> + goto print;
> +
> + for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
> + if (match_option(arg, ret, sld_options[i].option)) {
> + sld = sld_options[i].state;
> + break;
> + }
> + }
> +
> + if (sld != sld_state)
> + sld_state = sld;
> +
> +print:

> +/*
> + * The TEST_CTRL MSR is per core. So multiple threads can
> + * read/write the MSR in parallel. But it's possible to
> + * simplify the read/write without locking and without
> + * worry about overwriting the MSR because only bit 29
> + * is implemented in the MSR and the bit is set as 1 by all
> + * threads. Locking may be needed in the future if situation
> + * is changed e.g. other bits are implemented.

This sentence doesn't parse. Something like this perhaps:

Locking is not required at the moment because only bit 29 of this
MSR is implemented and locking would not prevent that the operation
of one thread is immediately undone by the sibling thread.

This implies that locking might become necessary when new bits are added.

> + */
> +
> +static bool __sld_msr_set(bool on)
> +{
> + u64 test_ctrl_val;
> +
> + if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> + return false;
> +
> + if (on)
> + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> + else
> + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> + if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> + return false;
> +
> + return true;

return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);

> +}
> +
> +static void split_lock_init(void)
> +{
> + if (sld_state == sld_off)
> + return;
> +
> + if (__sld_msr_set(true))
> + return;
> +
> + /*
> + * If this is anything other than the boot-cpu, you've done
> + * funny things and you get to keep whatever pieces.
> + */
> + pr_warn("MSR fail -- disabled\n");
> + __sld_msr_set(sld_off);

That should do:

sld_state = sld_off;

for consistency sake.

> +}
> +
> +bool split_lock_detect_enabled(void)
> +{
> + return sld_state != sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> + return false;
> +
> + pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> + current->comm, current->pid, regs->ip);

So with 10 prints per 5 seconds an intentional offender can still fill dmesg
pretty good. A standard dmesg buffer should be full of this in
~15min. Not a big issue, but it might be annoying. Let's start with this
and deal with it when people complain.
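
(Rough arithmetic, and the buffer size is an assumption: the default
ratelimit of 10 per 5 seconds is 2 warnings/sec, and with printk records
in the 100-200 byte range a 256 KB log buffer wraps in roughly 15
minutes, a 128 KB one in about half that.)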

The magic below really lacks a comment. Something like:

/*
* Disable the split lock detection for this task so it can make
* progress and set TIF_SLD so the detection is reenabled via
* switch_to_sld() when the task is scheduled out.
*/

> + __sld_msr_set(false);
> + set_tsk_thread_flag(current, TIF_SLD);
> + return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)

switch_to_sld() perhaps?

> +{
> + __sld_msr_set(true);
> + clear_tsk_thread_flag(prev, TIF_SLD);
> +}
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + const char str[] = "alignment check";
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> + return;
> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + cond_local_irq_enable(regs);

This cond is pointless. We recently removed the ability for user space
to disable interrupts, and even if that were still allowed, keeping
interrupts disabled here would not make sense.

Other than those details, I really like this approach.

Thanks,

tglx

2020-01-25 02:49:13

by Tony Luck

[permalink] [raw]
Subject: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

From: Peter Zijlstra <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it
will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

tglx> Other than those details, I really like this approach.

Thanks for the review. Here is V15 with all your V14 comments addressed.

I did find something with a new test. Applications that hit a
split lock warn as expected. But if they sleep before they hit
a new split lock, we get another warning. This may be because
I messed up when fixing a PeterZ typo in the untested patch.
But I think there may have been bigger problems.
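
(For reference, the kind of user-space test that provokes this is tiny.
A hypothetical sketch, not the actual test used here -- the 62-byte
offset and the GCC __atomic builtin are just my choices for forcing a
LOCK'ed access across a 64-byte cache line:

	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		/* 128-byte aligned buffer: offset 62 straddles a cache line */
		char *buf = aligned_alloc(128, 128);
		uint32_t *p;

		if (!buf)
			return 1;

		/* deliberately misaligned pointer crossing the 64-byte boundary */
		p = (uint32_t *)(buf + 62);
		*p = 0;

		/* LOCK-prefixed read-modify-write across the boundary => split lock */
		__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);

		printf("still alive, *p = %u\n", *p);
		free(buf);
		return 0;
	}

With split_lock_detect=warn that should give one ratelimited warning and
keep running; with split_lock_detect=fatal it should die with SIGBUS.)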

Context switch in V14 code did:

if (tifp & _TIF_SLD)
switch_to_sld(prev_p);

void switch_to_sld(struct task_struct *prev)
{
__sld_msr_set(true);
clear_tsk_thread_flag(prev, TIF_SLD);
}

Which re-enables split lock checking for the next process to run. But
mysteriously clears the TIF_SLD bit on the previous task.

I think we need to consider TIF_SLD state of both previous and next
process when deciding what to do with the MSR. Three cases:

1) If they are both the same, leave the MSR alone; it is (probably) right (modulo
the other thread having messed with it).
2) Next process has _TIF_SLD set ... disable checking
3) Next process doesn't have _TIF_SLD set ... enable checking

So please look closely at the new version of switch_to_sld() which is
now called unconditionally on every switch ... but commonly will do
nothing.

.../admin-guide/kernel-parameters.txt | 18 ++
arch/x86/include/asm/cpu.h | 12 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 177 ++++++++++++++++++
arch/x86/kernel/process.c | 2 +
arch/x86/kernel/traps.c | 24 ++-
9 files changed, 248 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..27f61d44a37f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3207,6 +3207,24 @@

nosoftlockup [KNL] Disable the soft-lockup detector.

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will emit rate limited warnings
+ about applications triggering the #AC exception
+
+ fatal - the kernel will SIGBUS applications that
+ trigger the #AC exception.
+
+ For any mode other than 'off' the kernel will die if
+ it (or firmware) triggers #AC.
+
nosync [HW,M68K] Disables sync negotiation for all devices.

nowatchdog [KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..2dede2bbb7cf 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_to_sld(struct task_struct *, struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+static inline void switch_to_sld(struct task_struct *prev, struct task_struct *next) {}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
#define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
#define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS 0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e0d12517f348 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {

#ifdef CONFIG_X86_IOPL_IOPERM
# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..d9842c64e5af 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,20 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn on systems that support split lock detect, and
+ * then check to see if there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ char arg[20];
+ int i, ret;
+
+ sld_state = sld_warn;
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld_state = sld_options[i].state;
+ break;
+ }
+ }
+
+print:
+ switch(sld_state) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+ sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ /*
+ * Disable the split lock detection for this task so it can make
+ * progress and set TIF_SLD so the detection is reenabled via
+ * switch_to_sld() when the task is scheduled out.
+ */
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+void switch_to_sld(struct task_struct *prev, struct task_struct *next)
+{
+ bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
+ bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
+
+ /*
+ * If we are switching between tasks that have the same
+ * need for split lock checking, then the MSR is (probably)
+ * right (modulo the other thread messing with it).
+ * Otherwise look at whether the new task needs split
+ * lock enabled.
+ */
+ if (prevflag != nextflag)
+ __sld_msr_set(nextflag);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..b34d359c4e39 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ switch_to_sld(prev_p, next_p);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..884e8e59dafd 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ const char str[] = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ local_irq_enable();
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.1

2020-01-25 10:46:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> From: Peter Zijlstra <[email protected]>
>
> A split-lock occurs when an atomic instruction operates on data
> that spans two cache lines. In order to maintain atomicity the
> core takes a global bus lock.
>
> This is typically >1000 cycles slower than an atomic operation
> within a cache line. It also disrupts performance on other cores
> (which must wait for the bus lock to be released before their
> memory operations can complete). For real-time systems this may
> mean missing deadlines. For other systems it may just be very
> annoying.
>
> Some CPUs have the capability to raise an #AC trap when a
> split lock is attempted.
>
> Provide a command line option to give the user choices on how
> to handle this. split_lock_detect=
> off - not enabled (no traps for split locks)
> warn - warn once when an application does a
> split lock, but allow it to continue
> running.
> fatal - Send SIGBUS to applications that cause split lock
>
> On systems that support split lock detection the default is "warn". Note
> that if the kernel hits a split lock in any mode other than "off" it
> will OOPs.
>
> One implementation wrinkle is that the MSR to control the
> split lock detection is per-core, not per thread. This might
> result in some short lived races on HT systems in "warn" mode
> if Linux tries to enable on one thread while disabling on
> the other. Race analysis by Sean Christopherson:
>
> - Toggling of split-lock is only done in "warn" mode. Worst case
> scenario of a race is that a misbehaving task will generate multiple
> #AC exceptions on the same instruction. And this race will only occur
> if both siblings are running tasks that generate split-lock #ACs, e.g.
> a race where sibling threads are writing different values will only
> occur if CPUx is disabling split-lock after an #AC and CPUy is
> re-enabling split-lock after *its* previous task generated an #AC.
> - Transitioning between modes at runtime isn't supported and disabling
> is tracked per task, so hardware will always reach a steady state that
> matches the configured mode. I.e. split-lock is guaranteed to be
> enabled in hardware once all _TIF_SLD threads have been scheduled out.

I think this "wrinkle" needs to be written down somewhere more prominent
- not in the commit message only - so that people can find it when using
the thing and start seeing the multiple #ACs on the same insn.

> Co-developed-by: Fenghua Yu <[email protected]>
> Co-developed-by: Tony Luck <[email protected]>
> Signed-off-by: Fenghua Yu <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>

checkpatch is bitching here:

WARNING: Co-developed-by: must be immediately followed by Signed-off-by:
#66:
Co-developed-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
WARNING: Co-developed-by and Signed-off-by: name/email do not match
#67:
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>

> ---
>
> tglx> Other than those details, I really like this approach.
>
> Thanks for the review. Here is V15 with all your V14 comments addressed.
>
> I did find something with a new test. Applications that hit a
> split lock warn as expected. But if they sleep before they hit
> a new split lock, we get another warning. This may be because
> I messed up when fixing a PeterZ typo in the untested patch.
> But I think there may have been bigger problems.
>
> Context switch in V14 code did:
>
> if (tifp & _TIF_SLD)
> switch_to_sld(prev_p);
>
> void switch_to_sld(struct task_struct *prev)
> {
> __sld_msr_set(true);
> clear_tsk_thread_flag(prev, TIF_SLD);
> }
>
> Which re-enables split lock checking for the next process to run. But
> mysteriously clears the TIF_SLD bit on the previous task.
>
> I think we need to consider TIF_SLD state of both previous and next
> process when deciding what to do with the MSR. Three cases:
>
> 1) If they are both the same, leave the MSR alone it is (probably) right (modulo
> the other thread having messed with it).
> 2) Next process has _TIF_SLD set ... disable checking
> 3) Next process doesn't have _TIF_SLD set ... enable checking
>
> So please look closely at the new version of switch_to_sld() which is
> now called unconditionally on every switch ... but commonly will do
> nothing.
>
> .../admin-guide/kernel-parameters.txt | 18 ++
> arch/x86/include/asm/cpu.h | 12 ++
> arch/x86/include/asm/cpufeatures.h | 2 +
> arch/x86/include/asm/msr-index.h | 9 +
> arch/x86/include/asm/thread_info.h | 6 +-
> arch/x86/kernel/cpu/common.c | 2 +
> arch/x86/kernel/cpu/intel.c | 177 ++++++++++++++++++
> arch/x86/kernel/process.c | 2 +
> arch/x86/kernel/traps.c | 24 ++-
> 9 files changed, 248 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7f1e2f327e43..27f61d44a37f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3207,6 +3207,24 @@
>
> nosoftlockup [KNL] Disable the soft-lockup detector.
>
> + split_lock_detect=

Needs to be alphabetically sorted.

> + [X86] Enable split lock detection
> +
> + When enabled (and if hardware support is present), atomic
> + instructions that access data across cache line
> + boundaries will result in an alignment check exception.
> +
> + off - not enabled
> +
> + warn - the kernel will emit rate limited warnings
> + about applications triggering the #AC exception
> +
> + fatal - the kernel will SIGBUS applications that

"... the kernel will send a SIGBUG to applications..."

> + trigger the #AC exception.
> +
> + For any mode other than 'off' the kernel will die if
> + it (or firmware) will trigger #AC.

Why would the kernel die in the "warn" case? It prints ratelimited
warnings only, if I'm reading this help text correctly. Commit message says

" Note that if the kernel hits a split lock in any mode other than
"off" it will OOPs."

but this text doesn't say why and leaves people scratching their heads and
makes them look at the code...

/me scrolls down

aaha, you mean this:

if (!user_mode(regs))
die("Split lock detected\n", regs, error_code);

so what you're trying to say is, "if an #AC exception is hit in the
kernel or the firmware - not in a user task - then we will oops."

Yes?

If so, pls extend so that it is clear what this means.

And the default setting is? I.e., put a short sentence after "warn"
saying so.

> +
> nosync [HW,M68K] Disables sync negotiation for all devices.
>
> nowatchdog [KNL] Disable both lockup detectors, i.e.
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index adc6cc86b062..2dede2bbb7cf 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
> unsigned int x86_family(unsigned int sig);
> unsigned int x86_model(unsigned int sig);
> unsigned int x86_stepping(unsigned int sig);
> +#ifdef CONFIG_CPU_SUP_INTEL
> +extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
> +extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern void switch_to_sld(struct task_struct *, struct task_struct *);

WARNING: function definition argument 'struct task_struct *' should also have an identifier name
#160: FILE: arch/x86/include/asm/cpu.h:46:
+extern void switch_to_sld(struct task_struct *, struct task_struct *);

> +#else
> +static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> +static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + return false;
> +}
> +static inline void switch_to_sld(struct task_struct *prev, struct stack *next) {}
> +#endif
> #endif /* _ASM_X86_CPU_H */
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index f3327cb56edf..cd56ad5d308e 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -285,6 +285,7 @@
> #define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
> #define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
> #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
> +#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

Do you really want to have "split_lock_detect" in /proc/cpuinfo or
rather something shorter?

> /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
> @@ -367,6 +368,7 @@
> #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
> #define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
> #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
> +#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
> #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */
>
> /*
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ebe1685e92dd..8821697a7549 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -41,6 +41,10 @@
>
> /* Intel MSRs. Some also available on other CPUs */
>
> +#define MSR_TEST_CTRL 0x00000033
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
> +
> #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
> #define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
> #define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
> @@ -70,6 +74,11 @@
> */
> #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)
>
> +/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
> +#define MSR_IA32_CORE_CAPS 0x000000cf
> +#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
> +#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
> +
> #define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
> #define NHM_C3_AUTO_DEMOTE (1UL << 25)
> #define NHM_C1_AUTO_DEMOTE (1UL << 26)
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index cf4327986e98..e0d12517f348 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
> #define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
> #define TIF_NOTSC 16 /* TSC is not accessible in userland */
> #define TIF_IA32 17 /* IA32 compatibility process */
> +#define TIF_SLD 18 /* Restore split lock detection on context switch */
> #define TIF_NOHZ 19 /* in adaptive nohz mode */
> #define TIF_MEMDIE 20 /* is terminating due to OOM killer */
> #define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
> #define _TIF_NOCPUID (1 << TIF_NOCPUID)
> #define _TIF_NOTSC (1 << TIF_NOTSC)
> #define _TIF_IA32 (1 << TIF_IA32)
> +#define _TIF_SLD (1 << TIF_SLD)
> #define _TIF_NOHZ (1 << TIF_NOHZ)
> #define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
> #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
> @@ -158,9 +160,9 @@ struct thread_info {
>
> #ifdef CONFIG_X86_IOPL_IOPERM
> # define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
> - _TIF_IO_BITMAP)
> + _TIF_IO_BITMAP | _TIF_SLD)
> #else
> -# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
> +# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)

Can you fix those while at it pls:

ERROR: need consistent spacing around '|' (ctx:VxW)
#245: FILE: arch/x86/include/asm/thread_info.h:165:
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
^
> #endif
>
> #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 86b8241c8209..adb2f639f388 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>
> cpu_set_bug_bits(c);
>
> + cpu_set_core_cap_bits(c);
> +
> fpu__init_system(c);
>
> #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 57473e2c0869..d9842c64e5af 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -19,6 +19,8 @@
> #include <asm/microcode_intel.h>
> #include <asm/hwcap2.h>
> #include <asm/elf.h>
> +#include <asm/cpu_device_id.h>
> +#include <asm/cmdline.h>
>
> #ifdef CONFIG_X86_64
> #include <linux/topology.h>
> @@ -31,6 +33,20 @@
> #include <asm/apic.h>
> #endif
>
> +enum split_lock_detect_state {
> + sld_off = 0,
> + sld_warn,
> + sld_fatal,
> +};
> +
> +/*
> + * Default to sld_off because most systems do not support
> + * split lock detection. split_lock_setup() will switch this
> + * to sld_warn on systems that support split lock detect, and
> + * then check to see if there is a command line override.
> + */

That comment is shorter than 80 cols while others below aren't.

> +static enum split_lock_detect_state sld_state = sld_off;
> +
> /*
> * Just in case our CPU detection goes bad, or you have a weird system,
> * allow a way to override the automatic disabling of MPX.
> @@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
> wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
> }
>
> +static void split_lock_init(void);
> +
> static void init_intel(struct cpuinfo_x86 *c)
> {
> early_init_intel(c);
> @@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
> tsx_enable();
> if (tsx_ctrl_state == TSX_CTRL_DISABLE)
> tsx_disable();
> +
> + split_lock_init();
> }
>
> #ifdef CONFIG_X86_32
> @@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
> };
>
> cpu_dev_register(intel_cpu_dev);
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "x86/split lock detection: " fmt
> +
> +static const struct {
> + const char *option;
> + enum split_lock_detect_state state;
> +} sld_options[] __initconst = {
> + { "off", sld_off },
> + { "warn", sld_warn },
> + { "fatal", sld_fatal },
> +};
> +
> +static inline bool match_option(const char *arg, int arglen, const char *opt)
> +{
> + int len = strlen(opt);
> +
> + return len == arglen && !strncmp(arg, opt, len);
> +}

There's the same function in arch/x86/kernel/cpu/bugs.c. Why are you
duplicating it here?

Yeah, this whole chunk looks like it has been "influenced" by the sec
mitigations in bugs.c :-)

> +static void __init split_lock_setup(void)
> +{
> + char arg[20];
> + int i, ret;
> +
> + sld_state = sld_warn;
> + setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> +
> + ret = cmdline_find_option(boot_command_line, "split_lock_detect",
> + arg, sizeof(arg));
> + if (ret < 0)
> + goto print;
> +
> + for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
> + if (match_option(arg, ret, sld_options[i].option)) {
> + sld_state = sld_options[i].state;
> + break;
> + }
> + }
> +
> +print:
> + switch(sld_state) {

ERROR: space required before the open parenthesis '('
#359: FILE: arch/x86/kernel/cpu/intel.c:1045:
+ switch(sld_state) {

> + case sld_off:
> + pr_info("disabled\n");
> + break;
> +
> + case sld_warn:
> + pr_info("warning about user-space split_locks\n");
> + break;
> +
> + case sld_fatal:
> + pr_info("sending SIGBUS on user-space split_locks\n");
> + break;
> + }
> +}
> +
> +/*
> + * Locking is not required at the moment because only bit 29 of this
> + * MSR is implemented and locking would not prevent that the operation
> + * of one thread is immediately undone by the sibling thread.
> + */
> +

^ Superfluous newline.

> +static bool __sld_msr_set(bool on)
> +{
> + u64 test_ctrl_val;
> +
> + if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> + return false;
> +
> + if (on)
> + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> + else
> + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> + return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
> +}
> +
> +static void split_lock_init(void)
> +{
> + if (sld_state == sld_off)
> + return;
> +
> + if (__sld_msr_set(true))
> + return;
> +
> + /*
> + * If this is anything other than the boot-cpu, you've done
> + * funny things and you get to keep whatever pieces.
> + */
> + pr_warn("MSR fail -- disabled\n");

What's that for? Guests?

> + __sld_msr_set(sld_off);
> + sld_state = sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> + return false;
> +
> + pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> + current->comm, current->pid, regs->ip);
> +
> + /*
> + * Disable the split lock detection for this task so it can make
> + * progress and set TIF_SLD so the detection is reenabled via
> + * switch_to_sld() when the task is scheduled out.
> + */
> + __sld_msr_set(false);
> + set_tsk_thread_flag(current, TIF_SLD);
> + return true;
> +}
> +
> +void switch_to_sld(struct task_struct *prev, struct task_struct *next)

This will get called on other vendors but let's just assume, for
simplicity's sake, TIF_SLD won't be set there so it is only a couple of
insns on a task switch going to waste.

> +{
> + bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> + bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> +
> + /*
> + * If we are switching between tasks that have the same
> + * need for split lock checking, then the MSR is (probably)
> + * right (modulo the other thread messing with it.
> + * Otherwise look at whether the new task needs split
> + * lock enabled.
> + */
> + if (prevflag != nextflag)
> + __sld_msr_set(nextflag);
> +}
> +
> +#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
> +
> +/*
> + * The following processors have split lock detection feature. But since they
> + * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
> + * the MSR. So enumerate the feature by family and model on these processors.
> + */
> +static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
> + SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
> + SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
> + {}
> +};
> +
> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> + u64 ia32_core_caps = 0;

So this gets called on other vendors too and even if they should not
have set X86_FEATURE_CORE_CAPABILITIES, a vendor check here would be
prudent for the future:

if (c->x86_vendor != X86_VENDOR_INTEL)
return;

> +
> + if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> + /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> + rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
> + } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> + /* Enumerate split lock detection by family and model. */
> + if (x86_match_cpu(split_lock_cpu_ids))
> + ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> + }
> +
> + if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
> + split_lock_setup();
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 839b5244e3b7..b34d359c4e39 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -650,6 +650,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> /* Enforce MSR update to ensure consistent state */
> __speculation_ctrl_update(~tifn, tifn);
> }
> +
> + switch_to_sld(prev_p, next_p);
> }
>
> /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 9e6f822922a3..884e8e59dafd 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -46,6 +46,7 @@
> #include <asm/traps.h>
> #include <asm/desc.h>
> #include <asm/fpu/internal.h>
> +#include <asm/cpu.h>
> #include <asm/cpu_entry_area.h>
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> @@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> {
> struct task_struct *tsk = current;
>
> -
> if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> return;
>
> @@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
> DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
> DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
> DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
> -DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
> #undef IP
>
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + const char str[] = "alignment check";

WARNING: const array should probably be static const
#517: FILE: arch/x86/kernel/traps.c:297:
+ const char str[] = "alignment check";

> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> + return;
> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + local_irq_enable();
> +
> + if (handle_user_split_lock(regs, error_code))
> + return;
> +
> + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> + error_code, BUS_ADRALN, NULL);
> +}
> +
> #ifdef CONFIG_VMAP_STACK
> __visible void __noreturn handle_stack_overflow(const char *message,
> struct pt_regs *regs,
> --
> 2.21.1
>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-01-25 13:43:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

Tony,

"Luck, Tony" <[email protected]> writes:
> +
> +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> +{
> + bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> + bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> +
> + /*
> + * If we are switching between tasks that have the same
> + * need for split lock checking, then the MSR is (probably)
> + * right (modulo the other thread messing with it.
> + * Otherwise look at whether the new task needs split
> + * lock enabled.
> + */
> + if (prevflag != nextflag)
> + __sld_msr_set(nextflag);
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 839b5244e3b7..b34d359c4e39 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -650,6 +650,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> /* Enforce MSR update to ensure consistent state */
> __speculation_ctrl_update(~tifn, tifn);
> }
> +
> + switch_to_sld(prev_p, next_p);

This really wants to follow the logic of the other TIF checks.

if ((tifp ^ tifn) & _TIF_SLD)
switch_to_sld(tifn);

and

void switch_to_sld(tifn)
{
__sld_msr_set(tifn & _TIF_SLD);
}

That reuses tifp, tifn which are ready to consume there and calls only
out of line when the bits differ. The xor/and combo turned out to result
in the most efficient code.

Thanks,

tglx

2020-01-25 19:57:39

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 11:44:19AM +0100, Borislav Petkov wrote:

Boris,

Thanks for the review. All comments accepted and changes made, except as
listed below. Also will fix up some other checkpatch fluff.

-Tony


> > +#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */
>
> Do you really want to have "split_lock_detect" in /proc/cpuinfo or
> rather something shorter?

I don't have a good abbreviation. It would become the joint 2nd longest
flag name ... top ten lengths look like this on my test machine. So while
long, not unprecedented.

18 tsc_deadline_timer
17 split_lock_detect
17 arch_capabilities
16 avx512_vpopcntdq
14 tsc_known_freq
14 invpcid_single
14 hwp_act_window
13 ibrs_enhanced
13 cqm_occup_llc
13 cqm_mbm_total
13 cqm_mbm_local
13 avx512_bitalg
13 3dnowprefetch


> > +static inline bool match_option(const char *arg, int arglen, const char *opt)
> > +{
> > + int len = strlen(opt);
> > +
> > + return len == arglen && !strncmp(arg, opt, len);
> > +}
>
> There's the same function in arch/x86/kernel/cpu/bugs.c. Why are you
> duplicating it here?
>
> Yeah, this whole chunk looks like it has been "influenced" by the sec
> mitigations in bugs.c :-)

Blame PeterZ for that. For now I'd like to add the duplicate inline function
and then clean up by putting it into some header file (and maybe hunting down
other places where it could be used).

> > + /*
> > + * If this is anything other than the boot-cpu, you've done
> > + * funny things and you get to keep whatever pieces.
> > + */
> > + pr_warn("MSR fail -- disabled\n");
>
> What's that for? Guests?

Also some PeterZ code. As the comment implies we really shouldn't be able
to get here. This whole function should only be called on CPU models that
support the MSR ... but PeterZ is defending against the situation that sometimes
there are special SKUs with the same model number (since we may be here because
of an x86_match_cpu() hit, rather than the architectural enumeration check).

> > +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
>
> This will get called on other vendors but let's just assume, for
> simplicity's sake, TIF_SLD won't be set there so it is only a couple of
> insns on a task switch going to waste.

Thomas explained how to fix it so we only call the function if TIF_SLD
is set in either the previous or next process (but not both). So the
overhead is just extra XOR/AND in the caller.

-Tony

2020-01-25 20:13:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 11:55:13AM -0800, Luck, Tony wrote:
> > > +static inline bool match_option(const char *arg, int arglen, const char *opt)
> > > +{
> > > + int len = strlen(opt);
> > > +
> > > + return len == arglen && !strncmp(arg, opt, len);
> > > +}
> >
> > There's the same function in arch/x86/kernel/cpu/bugs.c. Why are you
> > duplicating it here?
> >
> > Yeah, this whole chunk looks like it has been "influenced" by the sec
> > mitigations in bugs.c :-)
>
> Blame PeterZ for that. For now I'd like to add the duplicate inline function
> and then clean up by putting it into some header file (and maybe hunting down
> other places where it could be used).

Yeah, I copy/paste cobbled that together. I figured it was easier to
'borrow' something that worked and adapt it than try and write
something new in a hurry.

> > > + /*
> > > + * If this is anything other than the boot-cpu, you've done
> > > + * funny things and you get to keep whatever pieces.
> > > + */
> > > + pr_warn("MSR fail -- disabled\n");
> >
> > What's that for? Guests?
>
> Also some PeterZ code. As the comment implies we really shouldn't be able
> to get here. This whole function should only be called on CPU models that
> support the MSR ... but PeterZ is defending against the situation that sometimes
> there are special SKUs with the same model number (since we may be here because
> of an x86_match_cpu() hit, rather than the architectural enumeration check).

My thinking was Virt, virt likes to mess up all msr expectations.

2020-01-25 20:30:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 11:55:13AM -0800, Luck, Tony wrote:
> I don't have a good abbreviation. It would become the joint 2nd longest
> flag name ... top ten lengths look like this on my test machine. So while
> long, not unprecedented.

Yah, I guess we lost that battle long ago.

> Thomas explained how to fix it so we only call the function if TIF_SLD
> is set in either the previous or next process (but not both). So the
> overhead is just extra XOR/AND in the caller.

Yeah.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-01-25 20:35:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 09:12:21PM +0100, Peter Zijlstra wrote:
> > Blame PeterZ for that. For now I'd like to add the duplicate inline function
> > and then clean up by putting it into some header file (and maybe hunting down
> > other places where it could be used).

Sounds like a good plan.

> Yeah, I copy/paste cobbled that together. I figured it was easier to
> 'borrow' something that worked and adapt it than try and write
> something new in a hurry.

Yeah.

> > Also some PeterZ code. As the comment implies we really shouldn't be able
> > to get here. This whole function should only be called on CPU models that
> > support the MSR ... but PeterZ is defending against the situation that sometimes
> > there are special SKUs with the same model number (since we may be here because
> > of an x86_match_cpu() hit, rather than the architectural enumeration check).
>
> My thinking was Virt, virt likes to mess up all msr expectations.

My only worry is to have it written down why we're doing this so that it
can be changed/removed later, when we've forgotten all about split lock.
Because pretty often we look at a comment-less chunk of code and wonder,
"why the hell did we add this in the first place."

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-01-25 21:26:36

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> I did find something with a new test. Applications that hit a
> split lock warn as expected. But if they sleep before they hit
> a new split lock, we get another warning. This may be because
> I messed up when fixing a PeterZ typo in the untested patch.
> But I think there may have been bigger problems.
>
> Context switch in V14 code did:
>
> if (tifp & _TIF_SLD)
> switch_to_sld(prev_p);
>
> void switch_to_sld(struct task_struct *prev)
> {
> __sld_msr_set(true);
> clear_tsk_thread_flag(prev, TIF_SLD);
> }
>
> Which re-enables split lock checking for the next process to run. But
> mysteriously clears the TIF_SLD bit on the previous task.

Did Peter mean to disable it only for the current timeslice and
re-enable it for the next time it's scheduled?

>
> I think we need to consider TIF_SLD state of both previous and next
> process when deciding what to do with the MSR. Three cases:
>
> 1) If they are both the same, leave the MSR alone it is (probably) right (modulo
> the other thread having messed with it).
> 2) Next process has _TIF_SLD set ... disable checking
> 3) Next process doesn't have _TIF_SLD set ... enable checking
>
> So please look closely at the new version of switch_to_sld() which is
> > now called unconditionally on every switch ... but commonly will do
> nothing.
...
> + /*
> + * Disable the split lock detection for this task so it can make
> + * progress and set TIF_SLD so the detection is reenabled via
> + * switch_to_sld() when the task is scheduled out.
> + */
> + __sld_msr_set(false);
> + set_tsk_thread_flag(current, TIF_SLD);
> + return true;
> +}
> +
> +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> +{
> + bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> + bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> +
> + /*
> + * If we are switching between tasks that have the same
> + * need for split lock checking, then the MSR is (probably)
> + * right (modulo the other thread messing with it.
> + * Otherwise look at whether the new task needs split
> + * lock enabled.
> + */
> + if (prevflag != nextflag)
> + __sld_msr_set(nextflag);
> +}

I might be missing something but shouldn't this be !nextflag given the
flag being unset is when the task wants sld?

2020-01-25 21:43:54

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 09:33:12PM +0100, Borislav Petkov wrote:
> On Sat, Jan 25, 2020 at 09:12:21PM +0100, Peter Zijlstra wrote:
> > My thinking was Virt, virt likes to mess up all msr expectations.
>
> My only worry is to have it written down why we're doing this so that it
> can be changed/removed later, when we've forgotten all about split lock.
> Because pretty often we look at a comment-less chunk of code and wonder,
> "why the hell did we add this in the first place."

Ok. I added a comment:

* Use the "safe" versions of rdmsr/wrmsr here because although code
* checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
* exist, there may be glitches in virtualization that leave a guest
* with an incorrect view of real h/w capabilities.

-Tony

2020-01-25 21:51:10

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > I did find something with a new test. Applications that hit a
> > split lock warn as expected. But if they sleep before they hit
> > a new split lock, we get another warning. This may be because
> > I messed up when fixing a PeterZ typo in the untested patch.
> > But I think there may have been bigger problems.
> >
> > Context switch in V14 code did:
> >
> > if (tifp & _TIF_SLD)
> > switch_to_sld(prev_p);
> >
> > void switch_to_sld(struct task_struct *prev)
> > {
> > __sld_msr_set(true);
> > clear_tsk_thread_flag(prev, TIF_SLD);
> > }
> >
> > Which re-enables split lock checking for the next process to run. But
> > mysteriously clears the TIF_SLD bit on the previous task.
>
> Did Peter mean to disable it only for the current timeslice and
> re-enable it for the next time it's scheduled?

He's seen and commented on this thread since I made this comment. So
I'll assume not. Things get really noisy on the console (even with
the rate limit) if split lock detection is re-enabled after a context
switch (my new test highlighted this!)
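
For reference, the kind of test in question boils down to repeatedly doing a
LOCK-prefixed read-modify-write on a value that straddles a cache-line
boundary. A minimal user-space sketch (hypothetical; the actual
split_lock_test program is not part of this thread, and the names and counts
are illustrative only):

/* split_lock_demo.c: deliberately misaligned atomic increment.
 * With split_lock_detect=warn the first increment should warn and set
 * TIF_SLD; if detection is re-enabled on every context switch (the
 * behaviour being debugged here), every pass after the usleep() warns
 * again.  With split_lock_detect=fatal the process gets SIGBUS.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	char *buf;
	int *p;
	int i;

	/* 128-byte aligned buffer; put a 4-byte int two bytes before the
	 * end of the first 64-byte cache line so it spans the boundary.
	 */
	if (posix_memalign((void **)&buf, 128, 128))
		return 1;
	p = (int *)(buf + 62);		/* deliberately misaligned */
	*p = 0;

	for (i = 0; i < 10; i++) {
		__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);	/* split lock */
		usleep(1000);
	}
	printf("done: %d\n", *p);
	free(buf);
	return 0;
}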

> > +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> > +{
> > + bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> > + bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> > +
> > + /*
> > + * If we are switching between tasks that have the same
> > + * need for split lock checking, then the MSR is (probably)
> > + * right (modulo the other thread messing with it.
> > + * Otherwise look at whether the new task needs split
> > + * lock enabled.
> > + */
> > + if (prevflag != nextflag)
> > + __sld_msr_set(nextflag);
> > +}
>
> I might be missing something but shouldn't this be !nextflag given the
> flag being unset is when the task wants sld?

That logic is convoluted ... but Thomas showed me a much better
way that is also much simpler ... so this code has gone now. The
new version is far easier to read (argument is flags for the new task
that we are switching to)

void switch_to_sld(unsigned long tifn)
{
__sld_msr_set(tifn & _TIF_SLD);
}

-Tony

2020-01-25 22:11:24

by Tony Luck

[permalink] [raw]
Subject: [PATCH v16] x86/split_lock: Enable split lock detection by kernel

From: "Peter Zijlstra (Intel)" <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it
will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between modes at runtime isn't supported and disabling
is tracked per task, so hardware will always reach a steady state that
matches the configured mode. I.e. split-lock is guaranteed to be
enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

V16:

Thomas: Rewrote the context switch as you suggested with XOR/AND
to avoid the function call when TIF_SLD hasn't changed
Boris: Fixed up all the bits from your comments (except the few
that I listed in the reply to your e-mail). I think the
only outstanding item is a followup patch to remove
the duplicate match_option() inline function pasted from
cpu/bugs.c ... we can bikeshed what to name it in another
thread.

.../admin-guide/kernel-parameters.txt | 22 +++
arch/x86/include/asm/cpu.h | 12 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/thread_info.h | 8 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 177 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 24 ++-
9 files changed, 254 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..869afed16154 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4625,6 +4625,28 @@
spia_pedr=
spia_peddr=

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will emit rate limited warnings
+ about applications triggering the #AC
+ exception. This mode is the default on h/w
+ that supports split lock detection.
+
+ fatal - the kernel will send SIGBUS to applications
+ that trigger the #AC exception.
+
+ If an #AC exception is hit in the kernel or in
+ firmware (i.e. not while executing in user mode)
+ then Linux will oops in either "warn" or "fatal"
+ mode.
+
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..ff6f3ca649b3 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void switch_to_sld(unsigned long tifn);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void switch_to_sld(unsigned long tifn) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
#define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
#define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS 0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e399dcefc2a7 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -157,10 +159,10 @@ struct thread_info {
#endif

#ifdef CONFIG_X86_IOPL_IOPERM
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
- _TIF_IO_BITMAP)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY | \
+ _TIF_IO_BITMAP | _TIF_SLD)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..a84de224ffb0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,19 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support split lock detection
+ * split_lock_setup() will switch this to sld_warn on systems that support
+ * split lock detect, and then check to see if there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -606,6 +621,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -720,6 +737,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -981,3 +1000,161 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ char arg[20];
+ int i, ret;
+
+ sld_state = sld_warn;
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld_state = sld_options[i].state;
+ break;
+ }
+ }
+
+print:
+ switch (sld_state) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ * Use the "safe" versions of rdmsr/wrmsr here because although code
+ * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
+ * exist, there may be glitches in virtualization that leave a guest
+ * with an incorrect view of real h/w capabilities.
+ */
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+ sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ /*
+ * Disable the split lock detection for this task so it can make
+ * progress and set TIF_SLD so the detection is re-enabled via
+ * switch_to_sld() when the task is scheduled out.
+ */
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+/*
+ * This function is called only when switching between tasks with
+ * different split-lock detection modes. It sets the MSR for the
+ * mode of the new task. This is right most of the time, but since
+ * the MSR is shared by hyperthreads on a physical core there can
+ * be glitches when the two threads need different modes.
+ */
+void switch_to_sld(unsigned long tifn)
+{
+ __sld_msr_set(tifn & _TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return;
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..a43c32868c3c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if ((tifp ^ tifn) & _TIF_SLD)
+ switch_to_sld(tifn);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..9f42f0a32185 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ char *str = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ local_irq_enable();
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.1

2020-01-25 22:19:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 01:42:32PM -0800, Luck, Tony wrote:
> On Sat, Jan 25, 2020 at 09:33:12PM +0100, Borislav Petkov wrote:
> > On Sat, Jan 25, 2020 at 09:12:21PM +0100, Peter Zijlstra wrote:
> > > My thinking was Virt, virt likes to mess up all msr expectations.
> >
> > My only worry is to have it written down why we're doing this so that it
> > can be changed/removed later, when we've forgotten all about split lock.
> > Because pretty often we look at a comment-less chunk of code and wonder,
> > "why the hell did we add this in the first place."
>
> Ok. I added a comment:
>
> * Use the "safe" versions of rdmsr/wrmsr here because although code
> * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
> * exist, there may be glitches in virtualization that leave a guest
> * with an incorrect view of real h/w capabilities.

Yap, nice.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-01-25 22:45:38

by Mark D Rustad

[permalink] [raw]
Subject: Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel

On Jan 25, 2020, at 2:07 PM, Luck, Tony <[email protected]> wrote:

> - Transitioning between modes at runtime isn't supported and disabling
> is tracked per task, so hardware will always reach a steady state that
> matches the configured mode.

Maybe "isn't supported" is not really the right wording. I would think that
if it truly weren't supported that you really shouldn't be changing the
mode at all at runtime. Do you really just mean "isn't atomic"? Or is there
something deeper about it? If so, are there other possible risks associated
with changing the mode at runtime?

Sorry, the wording just happened to catch my eye and my mind immediately
went to "how can you be doing something that is not supported?"

--
Mark Rustad, [email protected]


Attachments:
signature.asc (890.00 B)
Message signed with OpenPGP

2020-01-25 23:13:59

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel


>
> Maybe "isn't supported" is not really the right wording. I would think that if it truly weren't supported that you really shouldn't be changing the mode at all at runtime. Do you really just mean "isn't atomic"? Or is there something deeper about it? If so, are there other possible risks associated with changing the mode at runtime?
>
> Sorry, the wording just happened to catch my eye

The “modes” here means the three options selectable by the command line option: off/warn/fatal. Some earlier versions of this patch had a sysfs interface to switch things around.

Not whether we have the MSR enabled/disabled.

If Thomas or Boris finds more things to fix then I’ll take a look at clarifying this comment too.

-Tony

2020-01-26 00:03:53

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> >
> > I might be missing something but shouldnt this be !nextflag given the
> > flag being unset is when the task wants sld?
>
> That logic is convoluted ... but Thomas showed me a much better
> way that is also much simpler ... so this code has gone now. The
> new version is far easier to read (argument is flags for the new task
> that we are switching to)
>
> void switch_to_sld(unsigned long tifn)
> {
> __sld_msr_set(tifn & _TIF_SLD);
> }
>
> -Tony

why doesn't this have the same problem though? tifn & _TIF_SLD still
needs to be logically negated no?

2020-01-26 00:35:50

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 2:07 PM Luck, Tony <[email protected]> wrote:
>
> From: "Peter Zijlstra (Intel)" <[email protected]>
>

> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> + u64 ia32_core_caps = 0;
> +
> + if (c->x86_vendor != X86_VENDOR_INTEL)
> + return;
> + if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> + /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> + rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
> + } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> + /* Enumerate split lock detection by family and model. */
> + if (x86_match_cpu(split_lock_cpu_ids))
> + ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> + }

I was chatting with Andrew Cooper, and apparently there are a ton of
hypervisor bugs in this space, and the bugs take two forms. Some
hypervisors might #GP the read, and some might allow the read but
silently swallow writes. This isn't *that* likely given that the
hypervisor bit is the default, but we could improve this like (sorry
for awful whitespace):

static bool have_split_lock_detect(void)
{
	unsigned long tmp;

	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
		rdmsrl(MSR_IA32_CORE_CAPS, tmp);
		if (tmp & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
			return true;
	}

	if (cpu_has(c, X86_FEATURE_HYPERVISOR))
		return false;

	if (rdmsrl_safe(MSR_TEST_CTRL, &tmp))
		return false;

	if (wrmsrl_safe(MSR_TEST_CTRL, tmp ^ MSR_TEST_CTRL_SPLIT_LOCK_DETECT))
		return false;

	wrmsrl(MSR_TEST_CTRL, tmp);
	return true;
}

Although I suppose the pile of wrmsrl_safes() in the existing patch
might be sufficient.

All this being said, the current code appears wrong if a CPU is in the
list but does have X86_FEATURE_CORE_CAPABILITIES. Are there such
CPUs? I think either the logic should be changed or a comment should
be added.

2020-01-26 02:54:00

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 06:51:31PM -0500, Arvind Sankar wrote:
> On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> > >
> > > I might be missing something but shouldnt this be !nextflag given the
> > > flag being unset is when the task wants sld?
> >
> > That logic is convoluted ... but Thomas showed me a much better
> > way that is also much simpler ... so this code has gone now. The
> > new version is far easier to read (argument is flags for the new task
> > that we are switching to)
> >
> > void switch_to_sld(unsigned long tifn)
> > {
> > __sld_msr_set(tifn & _TIF_SLD);
> > }
> >
> > -Tony
>
> why doesn't this have the same problem though? tifn & _TIF_SLD still
> needs to be logically negated no?

There's something very odd happening. I added this trace code:

	if ((tifp ^ tifn) & _TIF_SLD) {
		pr_info("switch from %d (%d) to %d (%d)\n",
			task_tgid_nr(prev_p), (tifp & _TIF_SLD) != 0,
			task_tgid_nr(next_p), (tifn & _TIF_SLD) != 0);
		switch_to_sld(tifn);
	}

Then ran:

$ taskset -cp 10 $$ # bind everything to just one CPU
pid 3205's current affinity list: 0-55
pid 3205's new affinity list: 10
$ ./spin & # infinite loop
[1] 3289
$ ./split_lock_test & # 10 * split lock with udelay(1000) between
[2] 3294

I was expecting to see transitions back and forth between the "spin"
process (which won't have TIF_SLD set) and the test program (which
will have it set after the first split executes).

But I see:
[ 83.871629] x86/split lock detection: #AC: split_lock_test/3294 took a split_lock trap at address: 0x4007fc
[ 83.871638] process: switch from 3294 (1) to 3289 (0)
[ 83.882583] process: switch from 3294 (1) to 3289 (0)
[ 83.893555] process: switch from 3294 (1) to 3289 (0)
[ 83.904528] process: switch from 3294 (1) to 3289 (0)
[ 83.915501] process: switch from 3294 (1) to 3289 (0)
[ 83.926475] process: switch from 3294 (1) to 3289 (0)
[ 83.937448] process: switch from 3294 (1) to 3289 (0)
[ 83.948421] process: switch from 3294 (1) to 3289 (0)
[ 83.959394] process: switch from 3294 (1) to 3289 (0)
[ 83.970439] process: switch from 3294 (1) to 3289 (0)

i.e. only the switches from the test process to the spinner.

So split-lock testing is disabled when we first hit the #AC
and is never re-enabled because we don't pass through this
code when switching to the spinner.

So you are right that the argument is inverted. We should be
ENABLING split lock detection when switching to the spin loop process,
but we actually disable it.

So why don't we come through __switch_to_xtra() when the spinner
runs out its time slice (or the udelay interrupt happens and
preempts the spinner)?

-Tony
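
For reference, a minimal userspace reproducer in the spirit of the
"split_lock_test" program used above might look like the sketch below.
The real test source was not posted in this thread, so the buffer layout
and loop count here are illustrative assumptions only.

	#include <stdint.h>
	#include <stdio.h>

	/* 64-byte aligned buffer; a 4-byte word placed at offset 62
	 * straddles the cache-line boundary at offset 64. */
	static char buf[128] __attribute__((aligned(64)));

	int main(void)
	{
		volatile uint32_t *p = (volatile uint32_t *)(void *)(buf + 62);

		for (int i = 0; i < 10; i++)
			__sync_fetch_and_add(p, 1);	/* LOCK ADD across two cache lines */

		printf("final value: %u\n", *p);
		return 0;
	}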

2020-01-26 17:39:15

by Mark D Rustad

[permalink] [raw]
Subject: Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel

On Jan 25, 2020, at 3:10 PM, Luck, Tony <[email protected]> wrote:

> The “modes” here means the three options selectable by the command line
> option: off/warn/fatal. Some earlier versions of this patch had a sysfs
> interface to switch things around.
>
> Not whether we have the MSR enabled/disabled.

Ok. Thanks for the clarification.

--
Mark Rustad, [email protected]



2020-01-26 20:03:00

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 04:34:29PM -0800, Andy Lutomirski wrote:
> Although I suppose the pile of wrmsrl_safes() in the existing patch
> might be sufficient.
>
> All this being said, the current code appears wrong if a CPU is in the
> list but does have X86_FEATURE_CORE_CAPABILITIES. Are there such
> CPUs? I think either the logic should be changed or a comment should
> be added.

Is it really wrong? The code checks CPUID & CORE_CAPABILITIES first and
believes what they say. Otherwise it falls back to the x86_match_cpu()
list.

I don't believe we put a CPU on that list that currently says
it supports CORE_CAPABILITIES. That could theoretically change
with a microcode update. I doubt we'd waste microcode space to do
that, but if we did, I assume we'd include the split lock bit
in the newly present MSR. So behavior would not change.

-Tony

2020-01-26 20:06:41

by Tony Luck

[permalink] [raw]
Subject: [PATCH v17] x86/split_lock: Enable split lock detection by kernel

From: "Peter Zijlstra (Intel)" <[email protected]>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it
will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between off/warn/fatal modes at runtime isn't supported
and disabling is tracked per task, so hardware will always reach a steady
state that matches the configured mode. I.e. split-lock is guaranteed to
be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---

v17:
Mark Rustad:
Clarify in commit comment that changing modes refers to the
boot time option off/warn/fatal. Not to the split-lock detection
mode set by the TEST_CTL MSR.

Arvind Sankar:
The test for whether to reset the MSR in context switch was reversed.
Should be: __sld_msr_set(!(tifn & _TIF_SLD));
[Sorry you had to tell me twice]

Me:
Make sure we call __switch_to_xtra() both when switching to a task
with TIF_SLD set as well as when switching from a TIF_SLD task:
<asm/thread_info.h> now sets _TIF_SLD in _TIF_WORK_CTXSW_BASE
instead of in _TIF_WORK_CTXSW_PREV.

.../admin-guide/kernel-parameters.txt | 22 +++
arch/x86/include/asm/cpu.h | 12 ++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/msr-index.h | 9 +
arch/x86/include/asm/thread_info.h | 8 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 177 ++++++++++++++++++
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/traps.c | 24 ++-
9 files changed, 254 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0ab95f48292b..97d7c7cfd107 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4638,6 +4638,28 @@
spia_pedr=
spia_peddr=

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will emit rate limited warnings
+ about applications triggering the #AC
+ exception. This mode is the default on h/w
+ that supports split lock detection.
+
+ fatal - the kernel will send SIGBUS to applications
+ that trigger the #AC exception.
+
+ If an #AC exception is hit in the kernel or in
+ firmware (i.e. not while executing in user mode)
+ the kernel will oops in either "warn" or "fatal"
+ mode.
+
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..ff6f3ca649b3 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void switch_to_sld(unsigned long tifn);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void switch_to_sld(unsigned long tifn) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
#define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
#define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS 0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e90ddac22d11 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -145,7 +147,7 @@ struct thread_info {
/* flags to check in __switch_to() */
#define _TIF_WORK_CTXSW_BASE \
(_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP | \
- _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)
+ _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_SLD)

/*
* Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
@@ -157,10 +159,10 @@ struct thread_info {
#endif

#ifdef CONFIG_X86_IOPL_IOPERM
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY | \
_TIF_IO_BITMAP)
#else
-# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY)
#endif

#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..99f62e7eb4b0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,19 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support split lock detection.
+ * split_lock_setup() will switch this to sld_warn on systems that support
+ * split lock detect, and then check to see if there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -606,6 +621,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -720,6 +737,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -981,3 +1000,161 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ char arg[20];
+ int i, ret;
+
+ sld_state = sld_warn;
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret < 0)
+ goto print;
+
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld_state = sld_options[i].state;
+ break;
+ }
+ }
+
+print:
+ switch (sld_state) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ * Use the "safe" versions of rdmsr/wrmsr here because although code
+ * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
+ * exist, there may be glitches in virtualization that leave a guest
+ * with an incorrect view of real h/w capabilities.
+ */
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ __sld_msr_set(sld_off);
+ sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ /*
+ * Disable the split lock detection for this task so it can make
+ * progress and set TIF_SLD so the detection is re-enabled via
+ * switch_to_sld() when the task is scheduled out.
+ */
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+/*
+ * This function is called only when switching between tasks with
+ * different split-lock detection modes. It sets the MSR for the
+ * mode of the new task. This is right most of the time, but since
+ * the MSR is shared by hyperthreads on a physical core there can
+ * be glitches when the two threads need different modes.
+ */
+void switch_to_sld(unsigned long tifn)
+{
+ __sld_msr_set(!(tifn & _TIF_SLD));
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have the split lock detection feature. But since they
+ * don't have the IA32_CORE_CAPABILITIES MSR, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return;
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..a43c32868c3c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if ((tifp ^ tifn) & _TIF_SLD)
+ switch_to_sld(tifn);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..9f42f0a32185 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ char *str = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ local_irq_enable();
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
--
2.21.1

2020-01-27 02:09:28

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 6:53 PM Luck, Tony <[email protected]> wrote:

> So why don't we come through __switch_to_xtra() when the spinner
> runs out its time slice (or the udelay interrupt happens and
> preempts the spinner)?

To close out this part of the thread. Linux doesn't call __switch_to_xtra()
in this case because I didn't ask it to. There are separate masks to check
TIF bits for the previous and next tasks in a context switch. I'd only set the
_TIF_SLD bit in the mask for the previous task.

See the v17 I posted a few hours before this message for the fix.

-Tony
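
As a rough illustration of that answer: the call to __switch_to_xtra() is
gated on two separate TIF masks, one for the previous and one for the next
task, roughly like the sketch below. This is simplified pseudocode, not
the exact kernel source, and the helper name is made up.

	/* Simplified sketch of the gate on __switch_to_xtra(). */
	static inline void maybe_call_switch_to_xtra(struct task_struct *prev,
						     struct task_struct *next)
	{
		unsigned long prev_tif = task_thread_info(prev)->flags;
		unsigned long next_tif = task_thread_info(next)->flags;

		/*
		 * With _TIF_SLD only in the "prev" mask, a switch *to* a task
		 * whose flag differs is invisible here, so the MSR is never
		 * restored when the well-behaved task is scheduled back in.
		 */
		if ((prev_tif & _TIF_WORK_CTXSW_PREV) ||
		    (next_tif & _TIF_WORK_CTXSW_NEXT))
			__switch_to_xtra(prev, next);
	}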

2020-01-27 08:07:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> > On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > > I did find something with a new test. Applications that hit a
> > > split lock warn as expected. But if they sleep before they hit
> > > a new split lock, we get another warning. This may be because
> > > I messed up when fixing a PeterZ typo in the untested patch.
> > > But I think there may have been bigger problems.
> > >
> > > Context switch in V14 code did:
> > >
> > > if (tifp & _TIF_SLD)
> > > switch_to_sld(prev_p);
> > >
> > > void switch_to_sld(struct task_struct *prev)
> > > {
> > > __sld_msr_set(true);
> > > clear_tsk_thread_flag(prev, TIF_SLD);
> > > }
> > >
> > > Which re-enables split lock checking for the next process to run. But
> > > mysteriously clears the TIF_SLD bit on the previous task.
> >
> > Did Peter mean to disable it only for the current timeslice and
> > re-enable it for the next time its scheduled?
>
> He's seen and commented on this thread since I made this comment. So

Yeah, I sorta don't care either way :-)

> I'll assume not. Things get really noisy on the console (even with
> the rate limit) if split lock detection is re-enabled after a context
> switch (my new test highlighted this!)

Have you found any actual bad software? The only way I could trigger it
was by explicitly writing a program to tickle it.

2020-01-27 08:42:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > I did find something with a new test. Applications that hit a
> > split lock warn as expected. But if they sleep before they hit
> > a new split lock, we get another warning. This may be because
> > I messed up when fixing a PeterZ typo in the untested patch.
> > But I think there may have been bigger problems.
> >
> > Context switch in V14 code did:
> >
> > if (tifp & _TIF_SLD)
> > switch_to_sld(prev_p);
> >
> > void switch_to_sld(struct task_struct *prev)
> > {
> > __sld_msr_set(true);
> > clear_tsk_thread_flag(prev, TIF_SLD);
> > }
> >
> > Which re-enables split lock checking for the next process to run. But
> > mysteriously clears the TIF_SLD bit on the previous task.
>
> Did Peter mean to disable it only for the current timeslice and
> re-enable it for the next time its scheduled?

That was the initial approach, yes. I was thinking it might help find
multiple spots in bad programs.

And as I said, I used perf on my desktop and couldn't find a single bad
program, so I'm not actually expecting this to trigger much.

2020-01-27 09:37:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

On Mon, Jan 27, 2020 at 09:04:19AM +0100, Peter Zijlstra wrote:
> On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> > On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> > > On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > > > I did find something with a new test. Applications that hit a
> > > > split lock warn as expected. But if they sleep before they hit
> > > > a new split lock, we get another warning. This may be because
> > > > I messed up when fixing a PeterZ typo in the untested patch.
> > > > But I think there may have been bigger problems.
> > > >
> > > > Context switch in V14 code did:
> > > >
> > > > if (tifp & _TIF_SLD)
> > > > switch_to_sld(prev_p);
> > > >
> > > > void switch_to_sld(struct task_struct *prev)
> > > > {
> > > > __sld_msr_set(true);
> > > > clear_tsk_thread_flag(prev, TIF_SLD);
> > > > }
> > > >
> > > > Which re-enables split lock checking for the next process to run. But
> > > > mysteriously clears the TIF_SLD bit on the previous task.
> > >
> > > Did Peter mean to disable it only for the current timeslice and
> > > re-enable it for the next time its scheduled?
> >
> > He's seen and commented on this thread since I made this comment. So
>
> Yeah, I sorta don't care either way :-)

Part of the reason I did that was to get the MSR back to enabled ASAP,
to limit the blind spot on the sibling.

By no longer clearing TIF_SLD for a task, and using the XOR logic used
for other TIF flags, the blind spots will be much larger.

2020-01-27 17:37:13

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH v15] x86/split_lock: Enable split lock detection by kernel

> Have you found any actual bad software? The only way I could trigger it
> was by explicitly writing a program to tickle it.

No application or library issues found so far (though I'm not running the kind of multi-threaded
applications that might be using atomic operations for synchronization).

Only the Linux kernel seems to have APIs that make it easy for programmers to accidentally split
an atomic operation across cache lines.

-Tony
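
One hypothetical way a kernel API can produce such an accidental split
lock (an illustration for this point, not code from the patch set): place
a bitmap word at an unfortunate offset and use the atomic bit helpers on
it. The structure layout below is made up to force the misalignment.

	#include <linux/bitops.h>
	#include <linux/cache.h>

	/* Illustrative layout only: "flags" starts 4 bytes before the end of
	 * a 64-byte cache line, so the 8-byte word straddles the boundary. */
	struct awkward {
		char		pad[L1_CACHE_BYTES - 4];
		unsigned long	flags;
	} __packed;

	static struct awkward demo __aligned(L1_CACHE_BYTES);

	static void accidental_split_lock(void)
	{
		/* set_bit() issues a LOCK'ed RMW that now spans two cache lines. */
		set_bit(0, &demo.flags);
	}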

2020-01-29 12:33:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel

"Luck, Tony" <[email protected]> writes:
> +static bool __sld_msr_set(bool on)
> +{
> + u64 test_ctrl_val;
> +
> + if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> + return false;
> +
> + if (on)
> + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> + else
> + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> + return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
> +}
> +
> +static void split_lock_init(void)
> +{
> + if (sld_state == sld_off)
> + return;
> +
> + if (__sld_msr_set(true))
> + return;
> +
> + /*
> + * If this is anything other than the boot-cpu, you've done
> + * funny things and you get to keep whatever pieces.
> + */
> + pr_warn("MSR fail -- disabled\n");
> + __sld_msr_set(sld_off);

This one is pretty pointless. If the rdmsrl or the wrmsrl failed, then
the next attempt is going to fail too. Aside of that sld_off would be not
really the right argument value here. I just zap that line.

Thanks,

tglx

Subject: [tip: x86/cpu] x86/split_lock: Enable split lock detection by kernel

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: fdbfb51ae760d1bba3f89e4fa00da83016ec4dbe
Gitweb: https://git.kernel.org/tip/fdbfb51ae760d1bba3f89e4fa00da83016ec4dbe
Author: Peter Zijlstra (Intel) <[email protected]>
AuthorDate: Sun, 26 Jan 2020 12:05:35 -08:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Wed, 29 Jan 2020 13:42:39 +01:00

x86/split_lock: Enable split lock detection by kernel

A split-lock occurs when an atomic instruction operates on data that spans
two cache lines. In order to maintain atomicity the core takes a global bus
lock.

This is typically >1000 cycles slower than an atomic operation within a
cache line. It also disrupts performance on other cores (which must wait
for the bus lock to be released before their memory operations can
complete). For real-time systems this may mean missing deadlines. For other
systems it may just be very annoying.

Some CPUs have the capability to raise an #AC trap when a split lock is
attempted.

Provide a command line option to give the user choices on how to handle
this:

split_lock_detect=
off - not enabled (no traps for split locks)
warn - warn once when an application does a
split lock, but allow it to continue
running.
fatal - Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it will
OOPs.

One implementation wrinkle is that the MSR to control the split lock
detection is per-core, not per thread. This might result in some short
lived races on HT systems in "warn" mode if Linux tries to enable on one
thread while disabling on the other. Race analysis by Sean Christopherson:

- Toggling of split-lock is only done in "warn" mode. Worst case
scenario of a race is that a misbehaving task will generate multiple
#AC exceptions on the same instruction. And this race will only occur
if both siblings are running tasks that generate split-lock #ACs, e.g.
a race where sibling threads are writing different values will only
occur if CPUx is disabling split-lock after an #AC and CPUy is
re-enabling split-lock after *its* previous task generated an #AC.
- Transitioning between off/warn/fatal modes at runtime isn't supported
and disabling is tracked per task, so hardware will always reach a steady
state that matches the configured mode. I.e. split-lock is guaranteed to
be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Co-developed-by: Tony Luck <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

---
Documentation/admin-guide/kernel-parameters.txt | 22 ++-
arch/x86/include/asm/cpu.h | 12 +-
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/msr-index.h | 9 +-
arch/x86/include/asm/thread_info.h | 4 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/cpu/intel.c | 175 +++++++++++++++-
arch/x86/kernel/process.c | 3 +-
arch/x86/kernel/traps.c | 24 +-
9 files changed, 250 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec92120..87176a9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4637,6 +4637,28 @@
spia_pedr=
spia_peddr=

+ split_lock_detect=
+ [X86] Enable split lock detection
+
+ When enabled (and if hardware support is present), atomic
+ instructions that access data across cache line
+ boundaries will result in an alignment check exception.
+
+ off - not enabled
+
+ warn - the kernel will emit rate limited warnings
+ about applications triggering the #AC
+ exception. This mode is the default on CPUs
+ that support split lock detection.
+
+ fatal - the kernel will send SIGBUS to applications
+ that trigger the #AC exception.
+
+ If an #AC exception is hit in the kernel or in
+ firmware (i.e. not while executing in user mode)
+ the kernel will oops in either "warn" or "fatal"
+ mode.
+
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc8..ff6f3ca 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void switch_to_sld(unsigned long tifn);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void switch_to_sld(unsigned long tifn) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ return false;
+}
+#endif
#endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb..cd56ad5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
#define X86_FEATURE_CQM_MBM_LOCAL (11*32+ 3) /* LLC Local MBM monitoring */
#define X86_FEATURE_FENCE_SWAPGS_USER (11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
#define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT (11*32+ 6) /* #AC for split lock */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES (18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */

/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685..8821697 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@

/* Intel MSRs. Some also available on other CPUs */

+#define MSR_TEST_CTRL 0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */
#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */
#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
*/
#define MSR_IA32_UMWAIT_CONTROL_TIME_MASK (~0x03U)

+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS 0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
#define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2
#define NHM_C3_AUTO_DEMOTE (1UL << 25)
#define NHM_C1_AUTO_DEMOTE (1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf43279..f807930 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
#define TIF_NOCPUID 15 /* CPUID is not accessible in userland */
#define TIF_NOTSC 16 /* TSC is not accessible in userland */
#define TIF_IA32 17 /* IA32 compatibility process */
+#define TIF_SLD 18 /* Restore split lock detection on context switch */
#define TIF_NOHZ 19 /* in adaptive nohz mode */
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
#define _TIF_NOCPUID (1 << TIF_NOCPUID)
#define _TIF_NOTSC (1 << TIF_NOTSC)
#define _TIF_IA32 (1 << TIF_IA32)
+#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_NOHZ (1 << TIF_NOHZ)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
@@ -145,7 +147,7 @@ struct thread_info {
/* flags to check in __switch_to() */
#define _TIF_WORK_CTXSW_BASE \
(_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP | \
- _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)
+ _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_SLD)

/*
* Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241..adb2f63 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)

cpu_set_bug_bits(c);

+ cpu_set_core_cap_bits(c);
+
fpu__init_system(c);

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2..5d92e38 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
#include <asm/microcode_intel.h>
#include <asm/hwcap2.h>
#include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -31,6 +33,19 @@
#include <asm/apic.h>
#endif

+enum split_lock_detect_state {
+ sld_off = 0,
+ sld_warn,
+ sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support split lock detection.
+ * split_lock_setup() will switch this to sld_warn on systems that support
+ * split lock detect, unless there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
@@ -606,6 +621,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
}

+static void split_lock_init(void);
+
static void init_intel(struct cpuinfo_x86 *c)
{
early_init_intel(c);
@@ -720,6 +737,8 @@ static void init_intel(struct cpuinfo_x86 *c)
tsx_enable();
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
+
+ split_lock_init();
}

#ifdef CONFIG_X86_32
@@ -981,3 +1000,159 @@ static const struct cpu_dev intel_cpu_dev = {
};

cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+ const char *option;
+ enum split_lock_detect_state state;
+} sld_options[] __initconst = {
+ { "off", sld_off },
+ { "warn", sld_warn },
+ { "fatal", sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+ int len = strlen(opt);
+
+ return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+ char arg[20];
+ int i, ret;
+
+ setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+ sld_state = sld_warn;
+
+ ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+ arg, sizeof(arg));
+ if (ret >= 0) {
+ for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+ if (match_option(arg, ret, sld_options[i].option)) {
+ sld_state = sld_options[i].state;
+ break;
+ }
+ }
+ }
+
+ switch (sld_state) {
+ case sld_off:
+ pr_info("disabled\n");
+ break;
+
+ case sld_warn:
+ pr_info("warning about user-space split_locks\n");
+ break;
+
+ case sld_fatal:
+ pr_info("sending SIGBUS on user-space split_locks\n");
+ break;
+ }
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ * Use the "safe" versions of rdmsr/wrmsr here because although code
+ * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
+ * exist, there may be glitches in virtualization that leave a guest
+ * with an incorrect view of real h/w capabilities.
+ */
+static bool __sld_msr_set(bool on)
+{
+ u64 test_ctrl_val;
+
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return false;
+
+ if (on)
+ test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ else
+ test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+ if (sld_state == sld_off)
+ return;
+
+ if (__sld_msr_set(true))
+ return;
+
+ /*
+ * If this is anything other than the boot-cpu, you've done
+ * funny things and you get to keep whatever pieces.
+ */
+ pr_warn("MSR fail -- disabled\n");
+ sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+ if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+ return false;
+
+ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+ current->comm, current->pid, regs->ip);
+
+ /*
+ * Disable the split lock detection for this task so it can make
+ * progress and set TIF_SLD so the detection is re-enabled via
+ * switch_to_sld() when the task is scheduled out.
+ */
+ __sld_msr_set(false);
+ set_tsk_thread_flag(current, TIF_SLD);
+ return true;
+}
+
+/*
+ * This function is called only when switching between tasks with
+ * different split-lock detection modes. It sets the MSR for the
+ * mode of the new task. This is right most of the time, but since
+ * the MSR is shared by hyperthreads on a physical core there can
+ * be glitches when the two threads need different modes.
+ */
+void switch_to_sld(unsigned long tifn)
+{
+ __sld_msr_set(!(tifn & _TIF_SLD));
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have the split lock detection feature. But
+ * since they don't have the IA32_CORE_CAPABILITIES MSR, the feature cannot
+ * be enumerated. Enable it by family and model matching on these
+ * processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+ SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+ {}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+ u64 ia32_core_caps = 0;
+
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return;
+ if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+ /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+ rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+ } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ /* Enumerate split lock detection by family and model. */
+ if (x86_match_cpu(split_lock_cpu_ids))
+ ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+ }
+
+ if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+ split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b524..a43c328 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
/* Enforce MSR update to ensure consistent state */
__speculation_ctrl_update(~tifn, tifn);
}
+
+ if ((tifp ^ tifn) & _TIF_SLD)
+ switch_to_sld(tifn);
}

/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822..9f42f0a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
#include <asm/mce.h>
#include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
{
struct task_struct *tsk = current;

-
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
return;

@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overru
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
-DO_ERROR(X86_TRAP_AC, SIGBUS, BUS_ADRALN, NULL, "alignment check", alignment_check)
#undef IP

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+ char *str = "alignment check";
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+ return;
+
+ if (!user_mode(regs))
+ die("Split lock detected\n", regs, error_code);
+
+ local_irq_enable();
+
+ if (handle_user_split_lock(regs, error_code))
+ return;
+
+ do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+ error_code, BUS_ADRALN, NULL);
+}
+
#ifdef CONFIG_VMAP_STACK
__visible void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,

2020-02-03 20:43:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel

On Sun, Jan 26, 2020 at 12:05:35PM -0800, Luck, Tony wrote:
> +/*
> + * Locking is not required at the moment because only bit 29 of this
> + * MSR is implemented and locking would not prevent that the operation
> + * of one thread is immediately undone by the sibling thread.
> + * Use the "safe" versions of rdmsr/wrmsr here because although code
> + * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
> + * exist, there may be glitches in virtualization that leave a guest
> + * with an incorrect view of real h/w capabilities.
> + */
> +static bool __sld_msr_set(bool on)
> +{
> + u64 test_ctrl_val;
> +
> + if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> + return false;

How about caching the MSR value on a per-{cpu/core} basis at boot to avoid
the RDMSR when switching to/from a misbehaving task? E.g. to avoid
penalizing well-behaved tasks any more than necessary.

We've likely got bigger issues if MSR_TEST_CTRL is being written by BIOS
at runtime, even if the writes were limited to synchronous calls from the
kernel.

Probably makes sense to split the MSR's init sequence and runtime sequence,
e.g. to also use an unsafe wrmsrl() at runtime so that an unexpected #GP
generates a WARN.

> +
> + if (on)
> + test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> + else
> + test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> + return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
> +}
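
A hedged sketch of the init/runtime split being suggested here. The cache
variable name and the plain (unsafe) wrmsrl() at runtime are assumptions
about the proposal, not code from any posted patch:

	static u64 msr_test_ctrl_cache;	/* reserved bits captured at boot */

	static bool __init sld_msr_init(void)
	{
		if (rdmsrl_safe(MSR_TEST_CTRL, &msr_test_ctrl_cache))
			return false;

		return !wrmsrl_safe(MSR_TEST_CTRL,
				    msr_test_ctrl_cache | MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
	}

	static void sld_msr_set(bool on)
	{
		u64 val = msr_test_ctrl_cache;

		if (on)
			val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

		/* Plain write at runtime: an unexpected #GP should warn loudly. */
		wrmsrl(MSR_TEST_CTRL, val);
	}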

2020-02-04 00:06:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel

On Sun, Jan 26, 2020 at 12:05:35PM -0800, Luck, Tony wrote:

...

> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)

No reason to take the error code unless there's a plan to use it.

> +{
> + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> + return false;

Any objection to moving the EFLAGS.AC up to do_alignment_check()? And
take "unsigned long rip" instead of @regs?

That would allow KVM to reuse handle_user_split_lock() for guest faults
without any changes (other than exporting).

E.g. do_alignment_check() becomes:

if (!(regs->flags & X86_EFLAGS_AC) && handle_user_split_lock(regs->ip))
return;
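
And handle_user_split_lock() itself would then become roughly the sketch
below; this is just the v17 body with the EFLAGS.AC check hoisted into the
caller and @regs replaced by the instruction pointer, not a posted patch:

	bool handle_user_split_lock(unsigned long ip)
	{
		if (sld_state == sld_fatal)
			return false;

		pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
				    current->comm, current->pid, ip);

		/* Same as v17: disable detection for this task and mark it with TIF_SLD. */
		__sld_msr_set(false);
		set_tsk_thread_flag(current, TIF_SLD);
		return true;
	}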

> +
> + pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> + current->comm, current->pid, regs->ip);
> +
> + /*
> + * Disable the split lock detection for this task so it can make
> + * progress and set TIF_SLD so the detection is re-enabled via
> + * switch_to_sld() when the task is scheduled out.
> + */
> + __sld_msr_set(false);
> + set_tsk_thread_flag(current, TIF_SLD);
> + return true;
> +}

...

> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> + char *str = "alignment check";
> +
> + RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> + return;
> +
> + if (!user_mode(regs))
> + die("Split lock detected\n", regs, error_code);
> +
> + local_irq_enable();
> +
> + if (handle_user_split_lock(regs, error_code))
> + return;
> +
> + do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> + error_code, BUS_ADRALN, NULL);
> +}
> +
> #ifdef CONFIG_VMAP_STACK
> __visible void __noreturn handle_stack_overflow(const char *message,
> struct pt_regs *regs,
> --
> 2.21.1
>

2020-02-04 12:54:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel

Sean Christopherson <[email protected]> writes:

> On Sun, Jan 26, 2020 at 12:05:35PM -0800, Luck, Tony wrote:
>
> ...
>
>> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>
> No reason to take the error code unless there's a plan to use it.
>
>> +{
>> + if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
>> + return false;
>
> Any objection to moving the EFLAGS.AC up to do_alignment_check()? And
> take "unsigned long rip" instead of @regs?
>
> That would allow KVM to reuse handle_user_split_lock() for guest faults
> without any changes (other than exporting).
>
> E.g. do_alignment_check() becomes:
>
> if (!(regs->flags & X86_EFLAGS_AC) && handle_user_split_lock(regs->ip))
> return;

No objections.

Thanks,

tglx

2020-02-06 00:55:08

by Tony Luck

[permalink] [raw]
Subject: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR

In a context switch from a task that is detecting split locks
to one that is not (or vice versa) we need to update the TEST_CTRL
MSR. Currently this is done with the common sequence:
read the MSR
flip the bit
write the MSR
in order to avoid changing the value of any reserved bits in the MSR.

Cache the value of the TEST_CTRL MSR when we read it during initialization
so we can avoid an expensive RDMSR instruction during context switch.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 5d92e381fd91..78de69c5887a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1054,6 +1054,14 @@ static void __init split_lock_setup(void)
}
}

+/*
+ * Soft copy of MSR_TEST_CTRL initialized when we first read the
+ * MSR. Used at runtime to avoid using rdmsr again just to collect
+ * the reserved bits in the MSR. We assume reserved bits are the
+ * same on all CPUs.
+ */
+static u64 test_ctrl_val;
+
/*
* Locking is not required at the moment because only bit 29 of this
* MSR is implemented and locking would not prevent that the operation
@@ -1063,19 +1071,29 @@ static void __init split_lock_setup(void)
* exist, there may be glitches in virtualization that leave a guest
* with an incorrect view of real h/w capabilities.
*/
-static bool __sld_msr_set(bool on)
+static bool __sld_msr_init(void)
{
- u64 test_ctrl_val;
+ u64 val;

- if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ if (rdmsrl_safe(MSR_TEST_CTRL, &val))
return false;
+ test_ctrl_val = val;
+
+ val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+ return !wrmsrl_safe(MSR_TEST_CTRL, val);
+}
+
+static void __sld_msr_set(bool on)
+{
+ u64 val = test_ctrl_val;

if (on)
- test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
else
- test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+ val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

- return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+ wrmsrl_safe(MSR_TEST_CTRL, val);
}

static void split_lock_init(void)
@@ -1083,7 +1101,7 @@ static void split_lock_init(void)
if (sld_state == sld_off)
return;

- if (__sld_msr_set(true))
+ if (__sld_msr_init())
return;

/*
--
2.21.1

2020-02-06 01:20:38

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR

On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <[email protected]> wrote:
>
> In a context switch from a task that is detecting split locks
> to one that is not (or vice versa) we need to update the TEST_CTRL
> MSR. Currently this is done with the common sequence:
> read the MSR
> flip the bit
> write the MSR
> in order to avoid changing the value of any reserved bits in the MSR.
>
> Cache the value of the TEST_CTRL MSR when we read it during initialization
> so we can avoid an expensive RDMSR instruction during context switch.

If something else that is per-cpu-ish gets added to the MSR in the
future, I will personally make fun of you for not making this percpu.

2020-02-06 16:47:30

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR

On Wed, Feb 05, 2020 at 05:18:23PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <[email protected]> wrote:
> >
> > In a context switch from a task that is detecting split locks
> > to one that is not (or vice versa) we need to update the TEST_CTRL
> > MSR. Currently this is done with the common sequence:
> > read the MSR
> > flip the bit
> > write the MSR
> > in order to avoid changing the value of any reserved bits in the MSR.
> >
> > Cache the value of the TEST_CTRL MSR when we read it during initialization
> > so we can avoid an expensive RDMSR instruction during context switch.
>
> If something else that is per-cpu-ish gets added to the MSR in the
> future, I will personally make fun of you for not making this percpu.

Xiaoyao Li has posted a version using a percpu cache value:

https://lore.kernel.org/r/[email protected]

So take that if it makes you happier. My patch only used the
cached value to store the state of the reserved bits in the MSR
and assumed those are the same for all cores.

Xiaoyao Li's version updates with what was most recently written
on each thread (but doesn't, and can't, make use of that because we
know that the other thread on the core may have changed the actual
value in the MSR).

If more bits are implemented that need to be set at run time, we
are likely up the proverbial creek. I'll see if I can find out if
there are plans for that.

-Tony
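
For comparison, a hedged sketch of the per-CPU shadow approach being
described above. This is based on the description in this thread, not on
Xiaoyao Li's actual patch:

	static DEFINE_PER_CPU(u64, msr_test_ctrl_cache);

	static void sld_update_msr(bool on)
	{
		u64 val = this_cpu_read(msr_test_ctrl_cache);

		if (on)
			val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
		else
			val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

		/*
		 * Record what this logical CPU last wrote; the sibling thread
		 * may still overwrite the shared MSR behind our back.
		 */
		this_cpu_write(msr_test_ctrl_cache, val);
		wrmsrl(MSR_TEST_CTRL, val);
	}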

2020-02-06 19:38:23

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR


> On Feb 6, 2020, at 8:46 AM, Luck, Tony <[email protected]> wrote:
>
> On Wed, Feb 05, 2020 at 05:18:23PM -0800, Andy Lutomirski wrote:
>>> On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <[email protected]> wrote:
>>>
>>> In a context switch from a task that is detecting split locks
>>> to one that is not (or vice versa) we need to update the TEST_CTRL
>>> MSR. Currently this is done with the common sequence:
>>> read the MSR
>>> flip the bit
>>> write the MSR
>>> in order to avoid changing the value of any reserved bits in the MSR.
>>>
>>> Cache the value of the TEST_CTRL MSR when we read it during initialization
>>> so we can avoid an expensive RDMSR instruction during context switch.
>>
>> If something else that is per-cpu-ish gets added to the MSR in the
>> future, I will personally make fun of you for not making this percpu.
>
> Xiaoyao Li has posted a version using a percpu cache value:
>
> https://lore.kernel.org/r/[email protected]
>
> So take that if it makes you happier. My patch only used the
> cached value to store the state of the reserved bits in the MSR
> and assumed those are the same for all cores.
>
> Xiaoyao Li's version updates with what was most recently written
> on each thread (but doesn't, and can't, make use of that because we
> know that the other thread on the core may have changed the actual
> value in the MSR).
>
> If more bits are implemented that need to be set at run time, we
> are likely up the proverbial creek. I'll see if I can find out if
> there are plans for that.
>

I suppose that this whole thing is a giant mess, especially since at least one bit there is per-physical-core. Sigh.

So I don’t have a strong preference.

2020-03-03 21:42:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR

On Thu, Feb 06, 2020 at 11:37:04AM -0800, Andy Lutomirski wrote:
>
> > On Feb 6, 2020, at 8:46 AM, Luck, Tony <[email protected]> wrote:
> >
> > On Wed, Feb 05, 2020 at 05:18:23PM -0800, Andy Lutomirski wrote:
> >>> On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <[email protected]> wrote:
> >>>
> >>> In a context switch from a task that is detecting split locks
> >>> to one that is not (or vice versa) we need to update the TEST_CTRL
> >>> MSR. Currently this is done with the common sequence:
> >>> read the MSR
> >>> flip the bit
> >>> write the MSR
> >>> in order to avoid changing the value of any reserved bits in the MSR.
> >>>
> >>> Cache the value of the TEST_CTRL MSR when we read it during initialization
> >>> so we can avoid an expensive RDMSR instruction during context switch.
> >>
> >> If something else that is per-cpu-ish gets added to the MSR in the
> >> future, I will personally make fun of you for not making this percpu.
> >
> > Xiaoyao Li has posted a version using a percpu cache value:
> >
> > https://lore.kernel.org/r/[email protected]
> >
> > So take that if it makes you happier. My patch only used the
> > cached value to store the state of the reserved bits in the MSR
> > and assumed those are the same for all cores.
> >
> > Xiaoyao Li's version updates with what was most recently written
> > on each thread (but doesn't, and can't, make use of that because we
> > know that the other thread on the core may have changed the actual
> > value in the MSR).
> >
> > If more bits are implemented that need to be set at run time, we
> > are likely up the proverbial creek. I'll see if I can find out if
> > there are plans for that.
> >
>
> I suppose that this whole thing is a giant mess, especially since at least
> one bit there is per-physical-core. Sigh.
>
> So I don’t have a strong preference.

I'd prefer to go with this patch, i.e. not percpu, to remove the temptation
of incorrectly optimizing away toggling SPLIT_LOCK_DETECT.