2020-02-22 16:26:22

by srinivas pandruvada

[permalink] [raw]
Subject: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

During cpu-hotplug test with CONFIG_PREEMPTION and CONFIG_DEBUG_PREEMPT
enabled, Chris reported error:

BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
caller is throttle_active_work+0x12/0x280

Here throttle_active_work() is a work queue callback scheduled with
schedule_delayed_work_on(). This will not cause this error for the use
of smp_processor_id() under normal conditions as there is a check for
"current->nr_cpus_allowed == 1".
But when the target CPU is offline the workqueue becomes unbound.
Then the work queue callback can be scheduled on another CPU and the
error is printed for the use of smp_processor_id() in preemptible context.

When the workqueue is not getting called on the target CPU, simply return.
This is done by adding a cpu field in the _thermal_state struct and match
the current CPU id.

Once workqueue is scheduled, prevent CPU offline. In this way, the log
bits are checked and cleared on the correct CPU. Also use get_cpu() to
get current CPU id and prevent preemption before we finish processing.

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Chris Wilson <[email protected]>
Signed-off-by: Srinivas Pandruvada <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mce/therm_throt.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index 58b4ee3cda77..4dab8a4558f9 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -61,6 +61,7 @@
* @new_event: Stores the last high/low status of the
* THERM_STATUS_PROCHOT or
* THERM_STATUS_POWER_LIMIT.
+ * @cpu: CPU id for this instance.
* @level: Stores whether this _thermal_state instance is
* for a CORE level or for PACKAGE level.
* @sample_index: Index for storing the next sample in the buffer
@@ -86,6 +87,7 @@ struct _thermal_state {
unsigned long total_time_ms;
bool rate_control_active;
bool new_event;
+ int cpu;
u8 level;
u8 sample_index;
u8 sample_count;
@@ -239,11 +241,19 @@ static void __maybe_unused throttle_active_work(struct work_struct *work)
{
struct _thermal_state *state = container_of(to_delayed_work(work),
struct _thermal_state, therm_work);
- unsigned int i, avg, this_cpu = smp_processor_id();
+ unsigned int i, avg, this_cpu;
u64 now = get_jiffies_64();
bool hot;
u8 temp;

+ get_online_cpus();
+ this_cpu = get_cpu();
+
+ if (state->cpu != this_cpu) {
+ state->rate_control_active = false;
+ goto end;
+ }
+
get_therm_status(state->level, &hot, &temp);
/* temperature value is offset from the max so lesser means hotter */
if (!hot && temp > state->baseline_temp) {
@@ -254,7 +264,7 @@ static void __maybe_unused throttle_active_work(struct work_struct *work)
state->count);

state->rate_control_active = false;
- return;
+ goto end;
}

if (time_before64(now, state->next_check) &&
@@ -296,6 +306,10 @@ static void __maybe_unused throttle_active_work(struct work_struct *work)
re_arm:
clear_therm_status_log(state->level);
schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+
+end:
+ put_cpu();
+ put_online_cpus();
}

/***
@@ -359,6 +373,7 @@ static void therm_throt_process(bool new_event, int event, int level)

state->baseline_temp = temp;
state->last_interrupt_time = now;
+ state->cpu = this_cpu;
schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
} else if (old_event && state->last_interrupt_time) {
unsigned long throttle_time;
--
2.24.1


2020-02-22 16:54:11

by Chris Wilson

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

Quoting Srinivas Pandruvada (2020-02-22 16:24:32)
> During cpu-hotplug test with CONFIG_PREEMPTION and CONFIG_DEBUG_PREEMPT
> enabled, Chris reported error:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
> caller is throttle_active_work+0x12/0x280
>
> Here throttle_active_work() is a work queue callback scheduled with
> schedule_delayed_work_on(). This will not cause this error for the use
> of smp_processor_id() under normal conditions as there is a check for
> "current->nr_cpus_allowed == 1".
> But when the target CPU is offline the workqueue becomes unbound.
> Then the work queue callback can be scheduled on another CPU and the
> error is printed for the use of smp_processor_id() in preemptible context.
>
> When the workqueue is not getting called on the target CPU, simply return.
> This is done by adding a cpu field in the _thermal_state struct and match
> the current CPU id.
>
> Once workqueue is scheduled, prevent CPU offline. In this way, the log
> bits are checked and cleared on the correct CPU. Also use get_cpu() to
> get current CPU id and prevent preemption before we finish processing.
>
> Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
> Reported-by: Chris Wilson <[email protected]>
> Signed-off-by: Srinivas Pandruvada <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>

I've pushed the patch to our CI, but it's not a frequent occurrence, so
it may be some time before I can state a t-b with any confidence.
-Chris

2020-02-22 17:52:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

On Sat, Feb 22, 2020 at 08:24:32AM -0800, Srinivas Pandruvada wrote:
> During cpu-hotplug test with CONFIG_PREEMPTION and CONFIG_DEBUG_PREEMPT
> enabled, Chris reported error:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
> caller is throttle_active_work+0x12/0x280
>
> Here throttle_active_work() is a work queue callback scheduled with
> schedule_delayed_work_on(). This will not cause this error for the use
> of smp_processor_id() under normal conditions as there is a check for
> "current->nr_cpus_allowed == 1".
> But when the target CPU is offline the workqueue becomes unbound.
> Then the work queue callback can be scheduled on another CPU and the
> error is printed for the use of smp_processor_id() in preemptible context.

So what's wrong with simply doing:

if (cpu_is_offline(this_cpu))
return;

?

You don't need to run the callback on an offlined CPU anyway...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-23 00:26:26

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

On Sat, 2020-02-22 at 18:51 +0100, Borislav Petkov wrote:
> On Sat, Feb 22, 2020 at 08:24:32AM -0800, Srinivas Pandruvada wrote:
> > During cpu-hotplug test with CONFIG_PREEMPTION and
> > CONFIG_DEBUG_PREEMPT
> > enabled, Chris reported error:
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code:
> > kworker/1:0/17
> > caller is throttle_active_work+0x12/0x280
> >
> > Here throttle_active_work() is a work queue callback scheduled with
> > schedule_delayed_work_on(). This will not cause this error for the
> > use
> > of smp_processor_id() under normal conditions as there is a check
> > for
> > "current->nr_cpus_allowed == 1".
> > But when the target CPU is offline the workqueue becomes unbound.
> > Then the work queue callback can be scheduled on another CPU and
> > the
> > error is printed for the use of smp_processor_id() in preemptible
> > context.
>
> So what's wrong with simply doing:
>
> if (cpu_is_offline(this_cpu))
> return;
>
> ?
>
If the condition is false, will it prevent offline CPU before executing
next statement and reschedule on another CPU? Although It will not
cause any error or crash but in rare circumstance may print premature
warning/normal message based on the current CPU's state.

So I can submit something like this:

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c
b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..7aa7c9d1df2a 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -239,11 +239,14 @@ static void throttle_active_work(struct
work_struct *work)
{
struct _thermal_state *state =
container_of(to_delayed_work(work),
struct _thermal_state,
therm_work);
- unsigned int i, avg, this_cpu = smp_processor_id();
+ unsigned int i, avg, this_cpu = state->cpu;
u64 now = get_jiffies_64();
bool hot;
u8 temp;

+ if (cpu_is_offline(this_cpu))
+ return;
+
get_therm_status(state->level, &hot, &temp);
/* temperature value is offset from the max so lesser means
hotter */
if (!hot && temp > state->baseline_temp) {

Thanks,
Srinivas

> You don't need to run the callback on an offlined CPU anyway...
>

2020-02-24 12:55:53

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

On Sat, Feb 22, 2020 at 04:25:59PM -0800, Srinivas Pandruvada wrote:
> If the condition is false, will it prevent offline CPU before executing
> next statement and reschedule on another CPU? Although It will not
> cause any error or crash but in rare circumstance may print premature
> warning/normal message based on the current CPU's state.

Why, offline CPU is offline CPU?

Btw, I'm asking whether you can do the simpler thing *instead* of your
patch. You basically don't run the workqueue callback on offlined CPUs:

get_online_cpus();

if (cpu_is_offline(smp_processor_id()))
goto out;

...


out:
put_online_cpus();

Hmm?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-24 16:03:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

Borislav Petkov <[email protected]> writes:

> On Sat, Feb 22, 2020 at 04:25:59PM -0800, Srinivas Pandruvada wrote:
>> If the condition is false, will it prevent offline CPU before executing
>> next statement and reschedule on another CPU? Although It will not
>> cause any error or crash but in rare circumstance may print premature
>> warning/normal message based on the current CPU's state.
>
> Why, offline CPU is offline CPU?
>
> Btw, I'm asking whether you can do the simpler thing *instead* of your
> patch. You basically don't run the workqueue callback on offlined CPUs:
>
> get_online_cpus();
>
> if (cpu_is_offline(smp_processor_id()))
> goto out;
>
> ...
>
>
> out:
> put_online_cpus();

Which is wrong as well. Trying to "fix" it in the work queue callback is
papering over the root cause.

Why is any work scheduled on an outgoing CPU after this CPU executed
thermal_throttle_offline()?

When thermal_throttle_offline() is invoked the cpu bound work queues are
still functional and thermal_throttle_offline() cancels outstanding
work.

So no, please fix the root cause not the symptom.

Thanks,

tglx



2020-02-24 19:26:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

Thomas Gleixner <[email protected]> writes:
> Which is wrong as well. Trying to "fix" it in the work queue callback is
> papering over the root cause.
>
> Why is any work scheduled on an outgoing CPU after this CPU executed
> thermal_throttle_offline()?
>
> When thermal_throttle_offline() is invoked the cpu bound work queues are
> still functional and thermal_throttle_offline() cancels outstanding
> work.
>
> So no, please fix the root cause not the symptom.

And if you look at thermal_throttle_online() then you'll notice that it
is asymetric vs. thermal_throttle_offline().

Also you want to do cancel_delayed_work_sync() and not just
cancel_delayed_work() because only the latter guarantees that the work
is not enqueued anymore while the former does not take running or self
requeueing work into account.

Something like the untested patch below.

Thanks,

tglx
---
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -487,8 +487,12 @@ static int thermal_throttle_offline(unsi
struct thermal_state *state = &per_cpu(thermal_state, cpu);
struct device *dev = get_cpu_device(cpu);

- cancel_delayed_work(&state->package_throttle.therm_work);
- cancel_delayed_work(&state->core_throttle.therm_work);
+ /* Mask the thermal vector before draining evtl. pending work */
+ l = apic_read(APIC_LVTTHMR);
+ apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
+
+ cancel_delayed_work_sync(&state->package_throttle.therm_work);
+ cancel_delayed_work_sync(&state->core_throttle.therm_work);

state->package_throttle.rate_control_active = false;
state->core_throttle.rate_control_active = false;

2020-02-24 21:05:33

by Pandruvada, Srinivas

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

On Mon, 2020-02-24 at 20:25 +0100, Thomas Gleixner wrote:
> Thomas Gleixner <[email protected]> writes:
> > Which is wrong as well. Trying to "fix" it in the work queue
> > callback is
> > papering over the root cause.
> >
> > Why is any work scheduled on an outgoing CPU after this CPU
> > executed
> > thermal_throttle_offline()?
> >
> > When thermal_throttle_offline() is invoked the cpu bound work
> > queues are
> > still functional and thermal_throttle_offline() cancels outstanding
> > work.
> >
> > So no, please fix the root cause not the symptom.
>
> And if you look at thermal_throttle_online() then you'll notice that
> it
> is asymetric vs. thermal_throttle_offline().
>
> Also you want to do cancel_delayed_work_sync() and not just
> cancel_delayed_work() because only the latter guarantees that the
> work
> is not enqueued anymore while the former does not take running or
> self
> requeueing work into account.
>
> Something like the untested patch below.
Thanks Thomas. I am testing this patch.

-Srinivas

>
> Thanks,
>
> tglx
> ---
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -487,8 +487,12 @@ static int thermal_throttle_offline(unsi
> struct thermal_state *state = &per_cpu(thermal_state, cpu);
> struct device *dev = get_cpu_device(cpu);
>
> - cancel_delayed_work(&state->package_throttle.therm_work);
> - cancel_delayed_work(&state->core_throttle.therm_work);
> + /* Mask the thermal vector before draining evtl. pending work
> */
> + l = apic_read(APIC_LVTTHMR);
> + apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
> +
> + cancel_delayed_work_sync(&state->package_throttle.therm_work);
> + cancel_delayed_work_sync(&state->core_throttle.therm_work);
>
> state->package_throttle.rate_control_active = false;
> state->core_throttle.rate_control_active = false;

2020-02-25 09:51:29

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

On Mon, 2020-02-24 at 20:25 +0100, Thomas Gleixner wrote:
> Thomas Gleixner <[email protected]> writes:
> > Which is wrong as well. Trying to "fix" it in the work queue
> > callback is
> > papering over the root cause.
> >
> > Why is any work scheduled on an outgoing CPU after this CPU
> > executed
> > thermal_throttle_offline()?
> >
> > When thermal_throttle_offline() is invoked the cpu bound work
> > queues are
> > still functional and thermal_throttle_offline() cancels outstanding
> > work.
> >
> > So no, please fix the root cause not the symptom.
>
> And if you look at thermal_throttle_online() then you'll notice that
> it
> is asymetric vs. thermal_throttle_offline().
>
> Also you want to do cancel_delayed_work_sync() and not just
> cancel_delayed_work() because only the latter guarantees that the
> work
> is not enqueued anymore while the former does not take running or
> self
> requeueing work into account.
>
> Something like the untested patch below.
I tested this patch.
After simulating 20 million thermal interrupts and online/offline test
for 12+ hours, don't see the issue.

So this change fixed the issue.

I can send change on your behalf or you can add
Tested-by: Pandruvada, Srinivas <[email protected]>

Thanks,
Srinivas

>
> Thanks,
>
> tglx
> ---
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -487,8 +487,12 @@ static int thermal_throttle_offline(unsi
> struct thermal_state *state = &per_cpu(thermal_state, cpu);
> struct device *dev = get_cpu_device(cpu);
>
> - cancel_delayed_work(&state->package_throttle.therm_work);
> - cancel_delayed_work(&state->core_throttle.therm_work);
> + /* Mask the thermal vector before draining evtl. pending work
> */
> + l = apic_read(APIC_LVTTHMR);
> + apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
> +
> + cancel_delayed_work_sync(&state->package_throttle.therm_work);
> + cancel_delayed_work_sync(&state->core_throttle.therm_work);
>
> state->package_throttle.rate_control_active = false;
> state->core_throttle.rate_control_active = false;