2024-03-21 19:56:01

by Maksim Davydov

Subject: [PATCH] x86/split_lock: fix delayed detection enabling

If the warn mode with disabled mitigation mode is used, then on each cpu
where the split lock occurred detection will be disabled in order to make
progress and delayed work will be scheduled, which then will enable
detection back. Now it turns out that all CPUs use one global delayed
work structure. This leads to the fact that if a split lock occurs on
several CPUs at the same time (within 2 jiffies), only one cpu will
schedule delayed work, but the rest will not. The return value of
schedule_delayed_work_on() would have shown this, but it is not checked
in the code
In order to fix the warn mode with disabled mitigation mode, delayed work
has to be a per-cpu.

Fixes: 727209376f49 ("x86/split_lock: Add sysctl to control the misery mode")
Signed-off-by: Maksim Davydov <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 65 ++++++++++++++++++++++---------------
1 file changed, 39 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 40dec9b56f87..655165225d49 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -996,28 +996,6 @@ static struct ratelimit_state bld_ratelimit;
static unsigned int sysctl_sld_mitigate = 1;
static DEFINE_SEMAPHORE(buslock_sem, 1);

-#ifdef CONFIG_PROC_SYSCTL
-static struct ctl_table sld_sysctls[] = {
- {
- .procname = "split_lock_mitigate",
- .data = &sysctl_sld_mitigate,
- .maxlen = sizeof(unsigned int),
- .mode = 0644,
- .proc_handler = proc_douintvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
-};
-
-static int __init sld_mitigate_sysctl_init(void)
-{
- register_sysctl_init("kernel", sld_sysctls);
- return 0;
-}
-
-late_initcall(sld_mitigate_sysctl_init);
-#endif
-
static inline bool match_option(const char *arg, int arglen, const char *opt)
{
int len = strlen(opt), ratelimit;
@@ -1140,7 +1118,43 @@ static void __split_lock_reenable(struct work_struct *work)
{
sld_update_msr(true);
}
-static DECLARE_DELAYED_WORK(sl_reenable, __split_lock_reenable);
+/*
+ * In order for each cpu to schedule itself delayed work independently of the
+ * others, delayed work struct should be per-cpu. This is not required when
+ * sysctl_sld_mitigate is enabled because of the semaphore that limits
+ * the number of simultaneously scheduled delayed works to 1.
+ */
+static DEFINE_PER_CPU(struct delayed_work, sl_reenable);
+
+#ifdef CONFIG_PROC_SYSCTL
+static struct ctl_table sld_sysctls[] = {
+ {
+ .procname = "split_lock_mitigate",
+ .data = &sysctl_sld_mitigate,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+};
+
+static int __init sld_mitigate_sysctl_init(void)
+{
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct delayed_work *work = per_cpu_ptr(&sl_reenable, cpu);
+ *work = (struct delayed_work) __DELAYED_WORK_INITIALIZER(*work,
+ __split_lock_reenable, 0);
+ }
+
+ register_sysctl_init("kernel", sld_sysctls);
+ return 0;
+}
+
+late_initcall(sld_mitigate_sysctl_init);
+#endif

/*
* If a CPU goes offline with pending delayed work to re-enable split lock
@@ -1182,12 +1196,11 @@ static void split_lock_warn(unsigned long ip)
*/
if (down_interruptible(&buslock_sem) == -EINTR)
return;
- work = &sl_reenable_unlock;
- } else {
- work = &sl_reenable;
}

cpu = get_cpu();
+ work = (sysctl_sld_mitigate ?
+ &sl_reenable_unlock : this_cpu_ptr(&sl_reenable));
schedule_delayed_work_on(cpu, work, 2);

/* Disable split lock detection on this CPU to make progress */
--
2.34.1



2024-03-21 20:12:08

by Ingo Molnar

Subject: Re: [PATCH] x86/split_lock: fix delayed detection enabling


* Maksim Davydov <[email protected]> wrote:

> If the warn mode with disabled mitigation mode is used, then on each cpu
> where the split lock occurred detection will be disabled in order to make
> progress and delayed work will be scheduled, which then will enable
> detection back. Now it turns out that all CPUs use one global delayed
> work structure. This leads to the fact that if a split lock occurs on
> several CPUs at the same time (within 2 jiffies), only one cpu will
> schedule delayed work, but the rest will not. The return value of
> schedule_delayed_work_on() would have shown this, but it is not checked
> in the code
> In order to fix the warn mode with disabled mitigation mode, delayed work
> has to be a per-cpu.

Please be more careful about changelog typography. The above portion has:

- ~3 capitalization inconsistencies
- one missing period

> +/*
> + * In order for each cpu to schedule itself delayed work independently of the
> + * others, delayed work struct should be per-cpu. This is not required when
> + * sysctl_sld_mitigate is enabled because of the semaphore that limits
> + * the number of simultaneously scheduled delayed works to 1.
> + */

.. and some of that seeped into this comment block as well, plus there's
a missing comma.

Thanks,

Ingo

2024-03-31 17:25:11

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH] x86/split_lock: fix delayed detection enabling

On 21/03/2024 16:55, Maksim Davydov wrote:
> If the warn mode with disabled mitigation mode is used, then on each cpu
> where the split lock occurred detection will be disabled in order to make
> progress and delayed work will be scheduled, which then will enable
> detection back. Now it turns out that all CPUs use one global delayed
> work structure. This leads to the fact that if a split lock occurs on
> several CPUs at the same time (within 2 jiffies), only one cpu will
> schedule delayed work, but the rest will not. The return value of
> schedule_delayed_work_on() would have shown this, but it is not checked
> in the code
> In order to fix the warn mode with disabled mitigation mode, delayed work
> has to be a per-cpu.
>
> Fixes: 727209376f49 ("x86/split_lock: Add sysctl to control the misery mode")

Thanks Maksim! I confess I (think I) understand the theory behind the
possible problem, but I'm not seeing how it happens - probably just me
being silly, but can you help me to understand it clearly?

Let's say we have 2 CPUs, CPU0 and CPU1 and we're running with
sld_mitigate = 0, meaning we don't have "the misery".

If the code running in CPU0 reaches split_lock_warn(), my understanding
is that it warns the user, schedules the sld reenable [via
schedule_delayed_work_on()] and disables the feature with
sld_update_msr(false), correct? So, does this disabling happen only at
core level, or does it disable the whole CPU, including all cores?

But back to our example, if CPU1 detects the split lock, it'll run the
same procedure as CPU0 did - so are you saying we have a race there if
CPU1 faces a split lock before CPU0 disabled the MSR?

Maybe a more clear example of the issue would be even helpful in the
commit message, showing the path both CPUs would take and how the
problem happens exactly.

Thanks in advance,


Guilherme

2024-04-19 10:16:22

by Maksim Davydov

Subject: Re: [PATCH] x86/split_lock: fix delayed detection enabling


On 3/31/24 20:07, Guilherme G. Piccoli wrote:
> On 21/03/2024 16:55, Maksim Davydov wrote:
>> If the warn mode with disabled mitigation mode is used, then on each cpu
>> where the split lock occurred detection will be disabled in order to make
>> progress and delayed work will be scheduled, which then will enable
>> detection back. Now it turns out that all CPUs use one global delayed
>> work structure. This leads to the fact that if a split lock occurs on
>> several CPUs at the same time (within 2 jiffies), only one cpu will
>> schedule delayed work, but the rest will not. The return value of
>> schedule_delayed_work_on() would have shown this, but it is not checked
>> in the code
>> In order to fix the warn mode with disabled mitigation mode, delayed work
>> has to be a per-cpu.
>>
>> Fixes: 727209376f49 ("x86/split_lock: Add sysctl to control the misery mode")
> Thanks Maksim! I confess I (think I) understand the theory behind the
> possible problem, but I'm not seeing how it happens - probably just me
> being silly , but can you help me to understand it clearly?
>
> Let's say we have 2 CPUs, CPU0 and CPU1 and we're running with
> sld_mitigate = 0, meaning we don't have "the misery".
>
> If the code running in CPU0 reaches split_lock_warn(), my understanding
> is that it warns the user, schedule the sld reenable [via and
> schedule_delayed_work_on()] and disables the feature with
> sld_update_msr(false), correct? So, does this disabling happens only at
> core level, or it disables for the whole CPU including all cores?
>
> But back to our example, if CPU1 detects the split lock, it'll run the
> same procedure as CPU0 did - so are you saying we have a race there if
> CPU1 face a split lock before CPU0 disabled the MSR?
>
> Maybe a more clear example of the issue would be even helpful in the
> commit message, showing the path both CPUs would take and how the
> problem happens exactly.
>
> Thanks in advance,
>
>
> Guilherme

Sorry for the late reply.

I made a diagram to explain how this bug occurs. If it makes it clearer,
then I will include the diagram in the commit description.

Some information that should be taken into account:
* sld_update_msr() enables/disables SLD on both CPUs on the same core
* schedule_delayed_work_on() internally checks WORK_STRUCT_PENDING_BIT. If
  a work already has the 'pending' status, then schedule_delayed_work_on()
  will return false and, most importantly, the work will not be placed
  in the workqueue.

Let's say we have a multicore system on which split_lock_mitigate=0 and
a multithreaded application is running that triggers split locks in
multiple threads. Because sld_update_msr() affects the entire core (both
CPUs), we will consider 2 CPUs from different cores. Let the 2 threads of
this application be scheduled to CPU 0 (core 0) and CPU 2 (core 1), then:

|                                  ||                                    |
|          CPU 0 (core 0)          ||           CPU 2 (core 1)           |
|__________________________________||____________________________________|
|                                  ||                                    |
| 1) SPLIT LOCK occurred           ||                                    |
|                                  ||                                    |
| 2) split_lock_warn()             ||                                    |
|                                  ||                                    |
| 3) sysctl_sld_mitigate == 0      ||                                    |
|    (work = &sl_reenable)         ||                                    |
|                                  ||                                    |
| 4) schedule_delayed_work_on()    ||                                    |
|    (reenable will be called      ||                                    |
|     after 2 jiffies on CPU 0)    ||                                    |
|                                  ||                                    |
| 5) disable SLD for core 0        ||                                    |
|                                  ||                                    |
|    -------------------------     ||                                    |
|                                  ||                                    |
|                                  || 6) SPLIT LOCK occurred             |
|                                  ||                                    |
|                                  || 7) split_lock_warn()               |
|                                  ||                                    |
|                                  || 8) sysctl_sld_mitigate == 0        |
|                                  ||    (work = &sl_reenable,           |
|                                  ||     the same address as in 3) )    |
|                                  ||                                    |
|             2 jiffies            || 9) schedule_delayed_work_on()      |
|                                  ||    fails because the work is in    |
|                                  ||    the pending state since 4).     |
|                                  ||    The work wasn't placed in the   |
|                                  ||    workqueue. reenable won't be    |
|                                  ||    called on CPU 2                 |
|                                  ||                                    |
|                                  || 10) disable SLD for core 1         |
|                                  ||                                    |
|                                  ||     From now on, SLD will          |
|                                  ||     never be reenabled on core 1   |
|                                  ||                                    |
|    -------------------------     ||                                    |
|                                  ||                                    |
|    11) enable SLD for core 0 by  ||                                    |
|        __split_lock_reenable     ||                                    |
|                                  ||                                    |


If the application threads can be scheduled to all processor cores, then
over time there will be only one core left on which SLD is enabled and
split locks can be detected; on all other cores SLD will be disabled all
the time.
Most likely, this bug has not been noticed for so long because the
sysctl_sld_mitigate default value is 1, and in this case a semaphore is
used that does not allow 2 different cores to have SLD disabled at the
same time, that is, strictly one work at most is placed in the workqueue.

--
Best regards,
Maksim Davydov


2024-04-19 11:29:12

by Maksim Davydov

Subject: Re: [PATCH] x86/split_lock: fix delayed detection enabling



On 3/31/24 20:07, Guilherme G. Piccoli wrote:
> On 21/03/2024 16:55, Maksim Davydov wrote:
>> If the warn mode with disabled mitigation mode is used, then on each cpu
>> where the split lock occurred detection will be disabled in order to make
>> progress and delayed work will be scheduled, which then will enable
>> detection back. Now it turns out that all CPUs use one global delayed
>> work structure. This leads to the fact that if a split lock occurs on
>> several CPUs at the same time (within 2 jiffies), only one cpu will
>> schedule delayed work, but the rest will not. The return value of
>> schedule_delayed_work_on() would have shown this, but it is not checked
>> in the code
>> In order to fix the warn mode with disabled mitigation mode, delayed work
>> has to be a per-cpu.
>>
>> Fixes: 727209376f49 ("x86/split_lock: Add sysctl to control the misery mode")
>
> Thanks Maksim! I confess I (think I) understand the theory behind the
> possible problem, but I'm not seeing how it happens - probably just me
> being silly , but can you help me to understand it clearly?
>
> Let's say we have 2 CPUs, CPU0 and CPU1 and we're running with
> sld_mitigate = 0, meaning we don't have "the misery".
>
> If the code running in CPU0 reaches split_lock_warn(), my understanding
> is that it warns the user, schedule the sld reenable [via and
> schedule_delayed_work_on()] and disables the feature with
> sld_update_msr(false), correct? So, does this disabling happens only at
> core level, or it disables for the whole CPU including all cores?
>
> But back to our example, if CPU1 detects the split lock, it'll run the
> same procedure as CPU0 did - so are you saying we have a race there if
> CPU1 face a split lock before CPU0 disabled the MSR?
>
> Maybe a more clear example of the issue would be even helpful in the
> commit message, showing the path both CPUs would take and how the
> problem happens exactly.
>
> Thanks in advance,
>
>
> Guilherme



Resend with fixed formatting

Sorry for the late reply.

I made a diagram to explain how this bug occurs. If it makes it
clearer, then I will include the diagram in the commit
description.

Some information that should be taken into account:
* sld_update_msr() enables/disables SLD on both CPUs on the same core
* schedule_delayed_work_on() internally checks WORK_STRUCT_PENDING_BIT.
If a work already has the 'pending' status, then
schedule_delayed_work_on() will return false and, most importantly,
the work will not be placed in the workqueue.

Let's say we have a multicore system on which split_lock_mitigate=0 and
a multithreaded application is running that triggers split locks in
multiple threads. Because sld_update_msr() affects the entire core
(both CPUs), we will consider 2 CPUs from different cores. Let the 2
threads of this application be scheduled to CPU 0 (core 0) and CPU 2
(core 1), then:

|                                  ||                                    |
|          CPU 0 (core 0)          ||           CPU 2 (core 1)           |
|__________________________________||____________________________________|
|                                  ||                                    |
| 1) SPLIT LOCK occurred           ||                                    |
|                                  ||                                    |
| 2) split_lock_warn()             ||                                    |
|                                  ||                                    |
| 3) sysctl_sld_mitigate == 0      ||                                    |
|    (work = &sl_reenable)         ||                                    |
|                                  ||                                    |
| 4) schedule_delayed_work_on()    ||                                    |
|    (reenable will be called      ||                                    |
|     after 2 jiffies on CPU 0)    ||                                    |
|                                  ||                                    |
| 5) disable SLD for core 0        ||                                    |
|                                  ||                                    |
|    -------------------------     ||                                    |
|                                  ||                                    |
|                                  || 6) SPLIT LOCK occurred             |
|                                  ||                                    |
|                                  || 7) split_lock_warn()               |
|                                  ||                                    |
|                                  || 8) sysctl_sld_mitigate == 0        |
|                                  ||    (work = &sl_reenable,           |
|                                  ||     the same address as in 3) )    |
|                                  ||                                    |
|             2 jiffies            || 9) schedule_delayed_work_on()      |
|                                  ||    fails because the work is in    |
|                                  ||    the pending state since 4).     |
|                                  ||    The work wasn't placed in the   |
|                                  ||    workqueue. reenable won't be    |
|                                  ||    called on CPU 2                 |
|                                  ||                                    |
|                                  || 10) disable SLD for core 1         |
|                                  ||                                    |
|                                  ||     From now on, SLD will          |
|                                  ||     never be reenabled on core 1   |
|                                  ||                                    |
|    -------------------------     ||                                    |
|                                  ||                                    |
|    11) enable SLD for core 0 by  ||                                    |
|        __split_lock_reenable     ||                                    |
|                                  ||                                    |


If the application threads can be scheduled to all processor cores,
then over time there will be only one core left on which SLD is
enabled and split locks can still be detected; on all other cores SLD
will be disabled all the time.
Most likely, this bug has not been noticed for so long because the
sysctl_sld_mitigate default value is 1, and in this case a semaphore
is used that does not allow 2 different cores to have SLD disabled at
the same time, that is, strictly one work at most is placed in the
workqueue.

--
Best regards,
Maksim Davydov

2024-04-21 18:28:03

by Guilherme G. Piccoli

Subject: Re: [PATCH] x86/split_lock: fix delayed detection enabling

On 19/04/2024 08:26, Maksim Davydov wrote:
>
> [...]
> Some information that should be taken into account:
> * sld_update_msr() enables/disables SLD on both CPUs on the same core
> * schedule_delayed_work_on() internally checks WORK_STRUCT_PENDING_BIT.
> If a work has the 'pending' status, then schedule_delayed_work_on()
> will return an error code and, most importantly, the work will not
> be placed in the workqueue.
>
> Let's say we have a multicore system on which split_lock_mitigate=0 and
> a multithreaded application is running that calls splitlock in multiple
> threads. Due to the fact that sld_update_msr() affects the entire core
> (both CPUs), we will consider 2 CPUs from different cores. Let the 2
> threads of this application schedule to CPU0 (core 0) and to CPU 2
> (core 1), then:
>
> |                                  ||                                    |
> |          CPU 0 (core 0)          ||           CPU 2 (core 1)           |
> |__________________________________||____________________________________|
> |                                  ||                                    |
> | 1) SPLIT LOCK occurred           ||                                    |
> |                                  ||                                    |
> | 2) split_lock_warn()             ||                                    |
> |                                  ||                                    |
> | 3) sysctl_sld_mitigate == 0      ||                                    |
> |    (work = &sl_reenable)         ||                                    |
> |                                  ||                                    |
> | 4) schedule_delayed_work_on()    ||                                    |
> |    (reenable will be called      ||                                    |
> |     after 2 jiffies on CPU 0)    ||                                    |
> |                                  ||                                    |
> | 5) disable SLD for core 0        ||                                    |
> |                                  ||                                    |
> |    -------------------------     ||                                    |
> |                                  ||                                    |
> |                                  || 6) SPLIT LOCK occurred             |
> |                                  ||                                    |
> |                                  || 7) split_lock_warn()               |
> |                                  ||                                    |
> |                                  || 8) sysctl_sld_mitigate == 0        |
> |                                  ||    (work = &sl_reenable,           |
> |                                  ||     the same address as in 3) )    |
> |                                  ||                                    |
> |             2 jiffies            || 9) schedule_delayed_work_on()      |
> |                                  ||    fails because the work is in    |
> |                                  ||    the pending state since 4).     |
> |                                  ||    The work wasn't placed in the   |
> |                                  ||    workqueue. reenable won't be    |
> |                                  ||    called on CPU 2                 |
> |                                  ||                                    |
> |                                  || 10) disable SLD for core 1         |
> |                                  ||                                    |
> |                                  ||     From now on, SLD will          |
> |                                  ||     never be reenabled on core 1   |
> |                                  ||                                    |
> |    -------------------------     ||                                    |
> |                                  ||                                    |
> |    11) enable SLD for core 0 by  ||                                    |
> |        __split_lock_reenable     ||                                    |
> |                                  ||                                    |
>
>
> If the application threads can be scheduled to all processor cores,
> then over time there will be only one core left, on which SLD will be
> enabled and split lock will be able to be detected; and on all other
> cores SLD will be disabled all the time.
> Most likely, this bug has not been noticed for so long because
> sysctl_sld_mitigate default value is 1, and in this case a semaphore
> is used that does not allow 2 different cores to have SLD disabled at
> the same time, that is, strictly only one work is placed in the
> workqueue.
>

Hi Maksim, this is awesome! Thanks a lot for the diagram, super clear now.

Well, I think you nailed it and we should get the patch merged, right?
I'm not sure if the diagram should be included or not in the commit
message - it's good but big, maybe include a lore archive mentioning the
diagram in a V2?

Cheers,


Guilherme