2021-10-14 02:43:19

by Pingfan Liu

[permalink] [raw]
Subject: [PATCHv3 0/4] watchdog_hld cleanup and async model for arm64

Hard lockup detector is helpful to diagnose unpaired irq enable/disable.
But the current watchdog framework can not cope with arm64 hw perf event
easily.

On arm64, when lockup_detector_init()->watchdog_nmi_probe(), PMU is not
ready until device_initcall(armv8_pmu_driver_init). And it is deeply
integrated with the driver model and cpuhp. Hence it is hard to push the
initialization of armv8_pmu_driver_init() before smp_init().

But it is easy to take an opposite approach by enabling watchdog_hld to
get the capability of PMU async.
The async model is achieved by expanding watchdog_nmi_probe() with
-EBUSY, and a re-initializing work_struct which waits on a
wait_queue_head.

In this series, [1-2/4] are trivial cleanup. [3-4/4] is for this async
model.

v2 -> v3:
check the delay work waken up and flush the work before __initdata is free.
improve the commit log of [4/4]
rebase to v5.15-rc5

v1 > v2:
uplift the async model from hard lockup layer to watchdog layter.
The benefit is simpler code, the drawback is re-initialize means wasted
alloc/free.

Cc: Sumit Garg <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
To: [email protected]
To: [email protected]

*** BLURB HERE ***

Pingfan Liu (3):
kernel/watchdog: trival cleanups
kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup
detector event
kernel/watchdog: Adapt the watchdog_hld interface for async model

Sumit Garg (1):
arm64: Enable perf events based hard lockup detector

arch/arm64/Kconfig | 2 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/perf_event.c | 11 ++++--
arch/arm64/kernel/watchdog_hld.c | 36 +++++++++++++++++++
arch/sparc/kernel/nmi.c | 8 ++---
drivers/perf/arm_pmu.c | 5 +++
include/linux/nmi.h | 11 +++++-
include/linux/perf/arm_pmu.h | 2 ++
kernel/watchdog.c | 62 ++++++++++++++++++++++++++++----
kernel/watchdog_hld.c | 5 ++-
10 files changed, 129 insertions(+), 14 deletions(-)
create mode 100644 arch/arm64/kernel/watchdog_hld.c

--
2.31.1


2021-10-14 02:43:29

by Pingfan Liu

[permalink] [raw]
Subject: [PATCHv3 1/4] kernel/watchdog: trival cleanups

No reference to WATCHDOG_DEFAULT, remove it.

And nobody cares about the return value of watchdog_nmi_enable(),
changing its prototype to void.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
Cc: [email protected]
To: [email protected]
---
arch/sparc/kernel/nmi.c | 8 ++++----
include/linux/nmi.h | 2 +-
kernel/watchdog.c | 5 +----
3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..8dc0f4e820b0 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
* sparc specific NMI watchdog enable function.
* Enables watchdog if it is not enabled already.
*/
-int watchdog_nmi_enable(unsigned int cpu)
+void watchdog_nmi_enable(unsigned int cpu)
{
if (atomic_read(&nmi_active) == -1) {
pr_warn("NMI watchdog cannot be enabled or disabled\n");
- return -1;
+ return;
}

/*
@@ -295,11 +295,11 @@ int watchdog_nmi_enable(unsigned int cpu)
* process first.
*/
if (!nmi_init_done)
- return 0;
+ return;

smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);

- return 0;
+ return;
}
/*
* sparc specific NMI watchdog disable function.
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..b7bcd63c36b4 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
void watchdog_nmi_stop(void);
void watchdog_nmi_start(void);
int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+void watchdog_nmi_enable(unsigned int cpu);
void watchdog_nmi_disable(unsigned int cpu);

/**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index ad912511a0c0..6e6dd5f0bc3e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,10 +30,8 @@
static DEFINE_MUTEX(watchdog_mutex);

#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
# define NMI_WATCHDOG_DEFAULT 1
#else
-# define WATCHDOG_DEFAULT (SOFT_WATCHDOG_ENABLED)
# define NMI_WATCHDOG_DEFAULT 0
#endif

@@ -95,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
* softlockup watchdog start and stop. The arch must select the
* SOFTLOCKUP_DETECTOR Kconfig.
*/
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
{
hardlockup_detector_perf_enable();
- return 0;
}

void __weak watchdog_nmi_disable(unsigned int cpu)
--
2.31.1

2021-10-14 02:43:39

by Pingfan Liu

[permalink] [raw]
Subject: [PATCHv3 2/4] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event

hardlockup_detector_event_create() should create perf_event on the
current CPU. Preemption could not get disabled because
perf_event_create_kernel_counter() allocates memory. Instead,
the CPU locality is achieved by processing the code in a per-CPU
bound kthread.

Add a check to prevent mistakes when calling the code in another
code path.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
Cc: [email protected]
To: [email protected]
---
kernel/watchdog_hld.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..df010df76576 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,

static int hardlockup_detector_event_create(void)
{
- unsigned int cpu = smp_processor_id();
+ unsigned int cpu;
struct perf_event_attr *wd_attr;
struct perf_event *evt;

+ /* This function plans to execute in cpu bound kthread */
+ WARN_ON(!is_percpu_thread());
+ cpu = raw_smp_processor_id();
wd_attr = &wd_hw_attr;
wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);

--
2.31.1

2022-01-17 17:08:13

by Lecopzer Chen

[permalink] [raw]
Subject: Re: [PATCHv3 0/4] watchdog_hld cleanup and async model for arm64

Hi Pingfan,

Is this thread sill in progress?
We are looking for the upstream solution for ARM64 Hardlockup detector.

I'd appreciate it if someone keep working on it,
if not, I can take over it.



thanks!

-Lecopzer


2022-01-24 15:04:17

by Pingfan Liu

[permalink] [raw]
Subject: Re: [PATCHv3 0/4] watchdog_hld cleanup and async model for arm64

On Mon, Jan 17, 2022 at 6:19 PM Lecopzer Chen
<[email protected]> wrote:
>
> Hi Pingfan,
>
> Is this thread sill in progress?

No, I am working on other topic at present, and this is not in my
queue in near future.
> We are looking for the upstream solution for ARM64 Hardlockup detector.
>
> I'd appreciate it if someone keep working on it,
> if not, I can take over it.
>
Be my guest, and hope you have great work. We badly wants hardlock up
detector on arm64

Best Regards,

Pingfan