2020-12-17 13:10:21

by Lecopzer Chen

Subject: [PATCH] kernel/watchdog_hld.c: Fix access percpu in preemptible context

commit 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
reinitializes the lockup detector after the arm64 PMU is initialized, which
opens another path that calls smp_processor_id() in preemptible context.
Since hardlockup_detector_event_create() uses many percpu-related variables,
fix this with get_cpu()/put_cpu().

BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id+0x20/0x2c
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x3c0
show_stack+0x20/0x6c
dump_stack+0x2f0/0x42c
check_preemption_disabled+0x1cc/0x1dc
debug_smp_processor_id+0x20/0x2c
hardlockup_detector_event_create+0x34/0x18c
hardlockup_detector_perf_init+0x2c/0x134
watchdog_nmi_probe+0x18/0x24
lockup_detector_init+0x44/0xa8
armv8_pmu_driver_init+0x54/0x78
do_one_initcall+0x184/0x43c
kernel_init_freeable+0x368/0x380
kernel_init+0x1c/0x1cc
ret_from_fork+0x10/0x30


Fixes: 367c820ef08082 ("arm64: Enable perf events based hard lockup detector")
Signed-off-by: Lecopzer Chen <[email protected]>
Cc: Sumit Garg <[email protected]>

---
kernel/watchdog_hld.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..c591a1ea8eb3 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,7 +165,7 @@ static void watchdog_overflow_callback(struct perf_event *event,

 static int hardlockup_detector_event_create(void)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu = get_cpu();
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;

@@ -176,11 +176,13 @@ static int hardlockup_detector_event_create(void)
 	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
 					       watchdog_overflow_callback, NULL);
 	if (IS_ERR(evt)) {
+		put_cpu();
 		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
 			 PTR_ERR(evt));
 		return PTR_ERR(evt);
 	}
 	this_cpu_write(watchdog_ev, evt);
+	put_cpu();
 	return 0;
 }

--
2.25.1


2020-12-19 13:50:27

by Oliver Sang

Subject: [kernel/watchdog_hld.c] 6e37d53a67: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h


Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: 6e37d53a67753bcb12a0b9102cac85d98f8a0453 ("[PATCH] kernel/watchdog_hld.c: Fix access percpu in preemptible context")
url: https://github.com/0day-ci/linux/commits/Lecopzer-Chen/kernel-watchdog_hld-c-Fix-access-percpu-in-preemptible-context/20201217-211549
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git accefff5b547a9a1d959c7e76ad539bf2480e78b

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+--------------------------------------------------------------------------------------------------------------------------+------------+------------+
| | accefff5b5 | 6e37d53a67 |
+--------------------------------------------------------------------------------------------------------------------------+------------+------------+
| boot_successes | 18 | 0 |
| boot_failures | 2 | 35 |
| Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) | 2 | |
| BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h | 0 | 33 |
| kernel_BUG_at_kernel/sched/core.c | 0 | 1 |
| invalid_opcode:#[##] | 0 | 1 |
| RIP:sched_cpu_dying | 0 | 1 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 1 |
| BUG:kernel_failed_in_early-boot_stage,last_printk:LKP:HOSTNAME_vm-snb-#,MAC#:#:#:#:#:#,kernel##,serial_console/dev/ttyS0 | 0 | 2 |
+--------------------------------------------------------------------------------------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


[ 0.468064] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[ 0.468516] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[ 0.469509] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-10896-g6e37d53a6775 #1
[ 0.470503] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 0.470503] Call Trace:
[ 0.470503] dump_stack+0x6d/0x88
[ 0.470503] ___might_sleep.cold+0x94/0xa2
[ 0.470503] kmem_cache_alloc_trace+0x3ca/0x460
[ 0.470503] ? hardlockup_detector_event_create+0xa0/0xa0
[ 0.470503] perf_event_alloc+0x51/0xc00
[ 0.470503] ? dynamic_debug_init+0x159/0x1a3
[ 0.470503] perf_event_create_kernel_counter+0x2d/0x140
[ 0.470503] hardlockup_detector_event_create+0x47/0xa0
[ 0.470503] hardlockup_detector_perf_init+0xc/0x40
[ 0.470503] lockup_detector_init+0x51/0x88
[ 0.470503] kernel_init_freeable+0x104/0x255
[ 0.470503] ? rest_init+0xd0/0xd0
[ 0.470503] kernel_init+0xa/0x110
[ 0.470503] ret_from_fork+0x22/0x30
[ 0.470526] NMI watchdog: Perf NMI watchdog permanently disabled
[ 0.471647] smp: Bringing up secondary CPUs ...
[ 0.472699] x86: Booting SMP configuration:
[ 0.473511] .... node #0, CPUs: #1
[ 0.138070] kvm-clock: cpu 1, msr 337b041, secondary cpu clock
[ 0.138070] masked ExtINT on CPU#1
[ 0.138070] smpboot: CPU 1 Converting physical 0 to logical die 1
[ 0.507566] kvm-guest: stealtime: cpu 1, msr 23fd18540
[ 0.508605] smp: Brought up 1 node, 2 CPUs
[ 0.509515] smpboot: Max logical packages: 2
[ 0.510509] smpboot: Total of 2 processors activated (10774.03 BogoMIPS)
[ 0.512579] devtmpfs: initialized
[ 0.513583] x86/mm: Memory block size: 128MB
[ 0.516951] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 0.517516] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 0.519552] pinctrl core: initialized pinctrl subsystem
[ 0.520772] NET: Registered protocol family 16
[ 0.521876] audit: initializing netlink subsys (disabled)
[ 0.523545] audit: type=2000 audit(1608248685.973:1): state=initialized audit_enabled=0 res=1
[ 0.523736] thermal_sys: Registered thermal governor 'fair_share'
[ 0.524508] thermal_sys: Registered thermal governor 'bang_bang'
[ 0.525508] thermal_sys: Registered thermal governor 'step_wise'
[ 0.526508] thermal_sys: Registered thermal governor 'user_space'
[ 0.527537] cpuidle: using governor menu
[ 0.529821] ACPI: bus type PCI registered
[ 0.530508] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.532208] PCI: Using configuration type 1 for base access
[ 0.535587] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.538626] ACPI: Added _OSI(Module Device)
[ 0.539509] ACPI: Added _OSI(Processor Device)
[ 0.540483] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.540511] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.541523] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.542548] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.543518] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 0.546159] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 0.547852] ACPI: Interpreter enabled
[ 0.548531] ACPI: (supports S0 S3 S4 S5)
[ 0.549476] ACPI: Using IOAPIC for interrupt routing
[ 0.549545] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.550690] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 0.556249] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 0.556518] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 0.557536] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 0.559088] acpiphp: Slot [3] registered
[ 0.559544] acpiphp: Slot [4] registered
[ 0.560469] acpiphp: Slot [5] registered
[ 0.560541] acpiphp: Slot [6] registered
[ 0.561424] acpiphp: Slot [7] registered
[ 0.561544] acpiphp: Slot [8] registered
[ 0.562546] acpiphp: Slot [9] registered
[ 0.563462] acpiphp: Slot [10] registered
[ 0.563543] acpiphp: Slot [11] registered
[ 0.564546] acpiphp: Slot [12] registered
[ 0.565481] acpiphp: Slot [13] registered
[ 0.565545] acpiphp: Slot [14] registered
[ 0.566516] acpiphp: Slot [15] registered
[ 0.567492] acpiphp: Slot [16] registered
[ 0.567544] acpiphp: Slot [17] registered
[ 0.568528] acpiphp: Slot [18] registered
[ 0.569487] acpiphp: Slot [19] registered
[ 0.569551] acpiphp: Slot [20] registered
[ 0.570547] acpiphp: Slot [21] registered
[ 0.571532] acpiphp: Slot [22] registered
[ 0.572489] acpiphp: Slot [23] registered
[ 0.572544] acpiphp: Slot [24] registered
[ 0.573485] acpiphp: Slot [25] registered
[ 0.573550] acpiphp: Slot [26] registered
[ 0.574545] acpiphp: Slot [27] registered
[ 0.575545] acpiphp: Slot [28] registered
[ 0.576546] acpiphp: Slot [29] registered
[ 0.577548] acpiphp: Slot [30] registered
[ 0.578497] acpiphp: Slot [31] registered
[ 0.578531] PCI host bridge to bus 0000:00
[ 0.579510] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 0.580509] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 0.581509] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 0.582511] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
[ 0.583510] pci_bus 0000:00: root bus resource [mem 0x240000000-0x2bfffffff window]
[ 0.584511] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.585577] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[ 0.587080] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100


To reproduce:

# build kernel
cd linux
cp config-5.10.0-10896-g6e37d53a6775 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Oliver Sang


Attachments:
(No filename) (8.80 kB)
config-5.10.0-10896-g6e37d53a6775 (194.08 kB)
job-script (4.48 kB)
dmesg.xz (12.08 kB)

2020-12-19 15:05:33

by Lecopzer Chen

Subject: Re: [PATCH] kernel/watchdog_hld.c: Fix access percpu in preemptible context

BRs,
Lecopzer


> Greetings,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 6e37d53a67753bcb12a0b9102cac85d98f8a0453 ("[PATCH] kernel/watchdog_hld.c: Fix access percpu in preemptible context")
> url: https://github.com/0day-ci/linux/commits/Lecopzer-Chen/kernel-watchdog_hld-c-Fix-access-percpu-in-preemptible-context/20201217-211549
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git accefff5b547a9a1d959c7e76ad539bf2480e78b
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
>
> If you fix the issue, kindly add the following tag:
> Reported-by: kernel test robot <[email protected]>
>

2020-12-19 15:06:33

by Lecopzer Chen

Subject: Re: [PATCH] kernel/watchdog_hld.c: Fix access percpu in preemptible context


Thanks a lot. I'll try to fix this another way (patch v2) to avoid
regressions on architectures other than arm64.

BRs,
Lecopzer


> Greetings,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 6e37d53a67753bcb12a0b9102cac85d98f8a0453 ("[PATCH] kernel/watchdog_hld.c: Fix access percpu in preemptible context")
> url: https://github.com/0day-ci/linux/commits/Lecopzer-Chen/kernel-watchdog_hld-c-Fix-access-percpu-in-preemptible-context/20201217-211549
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git accefff5b547a9a1d959c7e76ad539bf2480e78b
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
>
> If you fix the issue, kindly add the following tag:
> Reported-by: kernel test robot <[email protected]>
>
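
A note on why the regression reported in this thread is consistent with the
patch: in the kernel, get_cpu() disables preemption until the matching
put_cpu(), and the robot's trace shows a sleeping-context check firing under
perf_event_create_kernel_counter() with in_atomic() reported as 1. The sketch
below is a userspace model of that interaction, not kernel code. Every name in
it (preempt_count, model_get_cpu(), model_sleeping_alloc(), and so on) is
hypothetical and only mimics the shape of the patched function; the CPU number
it returns is a placeholder.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: these are NOT the kernel's implementations. */

int preempt_count;       /* models the preemption-disable nesting depth */
int might_sleep_splats;  /* counts "sleeping function" style warnings */

/* Model of get_cpu(): enter a non-preemptible section, return a CPU id. */
int model_get_cpu(void)
{
	preempt_count++;
	return 0; /* placeholder CPU number */
}

/* Model of put_cpu(): leave the non-preemptible section. */
void model_put_cpu(void)
{
	preempt_count--;
}

bool model_in_atomic(void)
{
	return preempt_count != 0;
}

/* Model of an allocation that may sleep: only legal when not atomic. */
void model_sleeping_alloc(void)
{
	if (model_in_atomic()) {
		might_sleep_splats++;
		printf("BUG: sleeping function called from invalid context\n");
	}
}

/* Shape before the patch: no preempt-disabled section around the call. */
void event_create_original(void)
{
	model_sleeping_alloc(); /* stands in for the perf event creation */
}

/* Shape after the patch: the sleeping call sits inside get/put_cpu(). */
void event_create_patched(void)
{
	int cpu = model_get_cpu();

	(void)cpu;
	model_sleeping_alloc(); /* now runs with preemption disabled */
	model_put_cpu();
}
```

In this model, calling event_create_original() records no warning, while
event_create_patched() records one, because the call that may sleep now runs
between the get and the put. Any alternative fix would presumably need to keep
the possibly sleeping call outside such a section.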