2019-07-03 08:56:11

by Liwei Song

Subject: [PATCH] x86/microcode, cpuhotplug: move microcode hotplug callback after cpu teardown

From: Liwei Song <[email protected]>

Fix the following BUG:

[ 236.599792] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:255
[ 236.599796] in_atomic(): 1, irqs_disabled(): 1, pid: 14, name: migration/1
[ 236.599798] Preemption disabled at:
[ 236.599807] [<ffffffffb3998fa1>] cpu_stopper_thread+0x71/0x100
[ 236.599816] Call Trace:
[ 236.599826] dump_stack+0x4f/0x6a
[ 236.599830] ? cpu_stopper_thread+0x71/0x100
[ 236.599836] ___might_sleep.cold+0xd1/0xe2
[ 236.599841] __might_sleep+0x4b/0x80
[ 236.599847] mutex_lock+0x21/0x50
[ 236.599852] kernfs_find_and_get_ns+0x24/0x60
[ 236.599857] sysfs_remove_group+0x2a/0x80
[ 236.599862] ? mc_device_remove+0x50/0x50
[ 236.599866] mc_cpu_down_prep+0x1d/0x30
[ 236.599871] cpuhp_invoke_callback+0x98/0x670
[ 236.599876] ? cpu_disable_common+0x26a/0x280
[ 236.599882] take_cpu_down+0x70/0xb0
[ 236.599886] multi_cpu_stop+0x64/0xc0
[ 236.599890] ? cpu_stop_queue_work+0x110/0x110
[ 236.599894] cpu_stopper_thread+0x79/0x100
[ 236.599899] ? smpboot_thread_fn+0x2d/0x290
[ 236.599904] smpboot_thread_fn+0x1e7/0x290
[ 236.599910] kthread+0x112/0x150
[ 236.599914] ? sort_range+0x30/0x30
[ 236.599918] ? kthread_park+0x90/0x90
[ 236.599922] ret_from_fork+0x35/0x40
[ 236.599965] smpboot: CPU 1 is now offline

Once the CPUHP_TEARDOWN_CPU callback has been invoked, the context is
atomic with IRQs disabled, but mc_cpu_down_prep() calls
kernfs_find_and_get_ns(), which tries to acquire a mutex and may
sleep.

Move CPUHP_AP_MICROCODE_LOADER after CPUHP_TEARDOWN_CPU in enum
cpuhp_state so that its teardown callback runs before the
CPUHP_TEARDOWN_CPU one, while sleeping is still allowed.

Fixes: 78f4e932f776 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback")
Signed-off-by: Liwei Song <[email protected]>
---
include/linux/cpuhotplug.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 5c6062206760..6724bc8a17cd 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -101,7 +101,6 @@ enum cpuhp_state {
 	CPUHP_AP_IRQ_BCM2836_STARTING,
 	CPUHP_AP_IRQ_MIPS_GIC_STARTING,
 	CPUHP_AP_ARM_MVEBU_COHERENCY,
-	CPUHP_AP_MICROCODE_LOADER,
 	CPUHP_AP_PERF_X86_AMD_UNCORE_STARTING,
 	CPUHP_AP_PERF_X86_STARTING,
 	CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
@@ -143,6 +142,7 @@ enum cpuhp_state {
 	CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
+	CPUHP_AP_MICROCODE_LOADER,
 	CPUHP_AP_ONLINE_IDLE,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_X86_VDSO_VMA_ONLINE,
--
2.7.4


2019-07-03 09:18:18

by Thomas Gleixner

Subject: Re: [PATCH] x86/microcode, cpuhotplug: move microcode hotplug callback after cpu teardown

On Wed, 3 Jul 2019, Song liwei wrote:

> Once the CPUHP_TEARDOWN_CPU callback has been invoked, the context is
> atomic with IRQs disabled, but mc_cpu_down_prep() calls
> kernfs_find_and_get_ns(), which tries to acquire a mutex and may
> sleep.
>
> Move CPUHP_AP_MICROCODE_LOADER after CPUHP_TEARDOWN_CPU in enum
> cpuhp_state so that its teardown callback runs before the
> CPUHP_TEARDOWN_CPU one, while sleeping is still allowed.

That's just wrong: it reintroduces the bug which was fixed with that
commit, as perf will access a non-existent MSR which is brought in by the
microcode update.

Aside from that, the mutex issue _is_ fixed in rc7 already:

5423f5ce5ca4 ("x86/microcode: Fix the microcode load on CPU hotplug for real")

Thanks,

tglx