The hard lockup detector is helpful for diagnosing unpaired irq
enable/disable sequences, but the current watchdog framework cannot
easily cope with the arm64 hw perf event.
On arm64, when lockup_detector_init()->watchdog_nmi_probe() runs, the
PMU is not ready; it only becomes available at
device_initcall(armv8_pmu_driver_init), which is deeply integrated with
the driver model and cpuhp. Hence it is hard to move the
initialization of armv8_pmu_driver_init() before smp_init().
It is much easier to take the opposite approach and let watchdog_hld
acquire the PMU capability asynchronously.
The async model is achieved by letting watchdog_nmi_probe() return
-EBUSY, plus a re-initializing work_struct which waits on a
wait_queue_head.
In this series, patches [1-2/4] are trivial cleanups, while [3-4/4]
implement the async model.
v2 -> v3:
ensure the delayed work has been woken, and flush the work before __initdata is freed.
improve the commit log of [4/4]
rebase to v5.15-rc5
v1 -> v2:
lift the async model from the hardlockup layer to the watchdog layer.
The benefit is simpler code; the drawback is that re-initialization
wastes an alloc/free cycle.
Cc: Sumit Garg <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
To: [email protected]
To: [email protected]
Pingfan Liu (3):
kernel/watchdog: trivial cleanups
kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup
detector event
kernel/watchdog: Adapt the watchdog_hld interface for async model
Sumit Garg (1):
arm64: Enable perf events based hard lockup detector
arch/arm64/Kconfig | 2 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/perf_event.c | 11 ++++--
arch/arm64/kernel/watchdog_hld.c | 36 +++++++++++++++++++
arch/sparc/kernel/nmi.c | 8 ++---
drivers/perf/arm_pmu.c | 5 +++
include/linux/nmi.h | 11 +++++-
include/linux/perf/arm_pmu.h | 2 ++
kernel/watchdog.c | 62 ++++++++++++++++++++++++++++----
kernel/watchdog_hld.c | 5 ++-
10 files changed, 129 insertions(+), 14 deletions(-)
create mode 100644 arch/arm64/kernel/watchdog_hld.c
--
2.31.1
There is no reference to WATCHDOG_DEFAULT, so remove it.
Also, nobody cares about the return value of watchdog_nmi_enable(),
so change its prototype to void.
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/sparc/kernel/nmi.c | 8 ++++----
include/linux/nmi.h | 2 +-
kernel/watchdog.c | 5 +----
3 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..8dc0f4e820b0 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
* sparc specific NMI watchdog enable function.
* Enables watchdog if it is not enabled already.
*/
-int watchdog_nmi_enable(unsigned int cpu)
+void watchdog_nmi_enable(unsigned int cpu)
{
if (atomic_read(&nmi_active) == -1) {
pr_warn("NMI watchdog cannot be enabled or disabled\n");
- return -1;
+ return;
}
/*
@@ -295,11 +295,11 @@ int watchdog_nmi_enable(unsigned int cpu)
* process first.
*/
if (!nmi_init_done)
- return 0;
+ return;
smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
- return 0;
+ return;
}
/*
* sparc specific NMI watchdog disable function.
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..b7bcd63c36b4 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
void watchdog_nmi_stop(void);
void watchdog_nmi_start(void);
int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+void watchdog_nmi_enable(unsigned int cpu);
void watchdog_nmi_disable(unsigned int cpu);
/**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index ad912511a0c0..6e6dd5f0bc3e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,10 +30,8 @@
static DEFINE_MUTEX(watchdog_mutex);
#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
# define NMI_WATCHDOG_DEFAULT 1
#else
-# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED)
# define NMI_WATCHDOG_DEFAULT 0
#endif
@@ -95,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
* softlockup watchdog start and stop. The arch must select the
* SOFTLOCKUP_DETECTOR Kconfig.
*/
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
{
hardlockup_detector_perf_enable();
- return 0;
}
void __weak watchdog_nmi_disable(unsigned int cpu)
--
2.31.1
hardlockup_detector_event_create() should create the perf_event on the
current CPU. Preemption cannot be disabled here because
perf_event_create_kernel_counter() allocates memory, which may sleep.
Instead, CPU locality is achieved by running the code in a per-CPU
bound kthread.
Add a check to catch mistakes if the code is ever called from another
code path.
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
Cc: [email protected]
To: [email protected]
---
kernel/watchdog_hld.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..df010df76576 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
static int hardlockup_detector_event_create(void)
{
- unsigned int cpu = smp_processor_id();
+ unsigned int cpu;
struct perf_event_attr *wd_attr;
struct perf_event *evt;
+ /* This function is expected to run in a per-CPU bound kthread */
+ WARN_ON(!is_percpu_thread());
+ cpu = raw_smp_processor_id();
wd_attr = &wd_hw_attr;
wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
--
2.31.1
Hi Pingfan,
Is this thread still in progress?
We are looking for the upstream solution for ARM64 Hardlockup detector.
I'd appreciate it if someone keeps working on it;
if not, I can take over.
thanks!
-Lecopzer
On Mon, Jan 17, 2022 at 6:19 PM Lecopzer Chen
<[email protected]> wrote:
>
> Hi Pingfan,
>
> Is this thread still in progress?
No, I am working on another topic at present, and this is not in my
queue for the near future.
> We are looking for the upstream solution for ARM64 Hardlockup detector.
>
> I'd appreciate it if someone keep working on it,
> if not, I can take over it.
>
Be my guest, and I hope you do great work. We badly want a hardlockup
detector on arm64.
Best Regards,
Pingfan