2021-09-15 03:53:16

by Pingfan Liu

Subject: [PATCH 0/5] watchdog_hld cleanup and async model for arm64

The hard lockup detector is helpful for diagnosing problems such as an
unpaired irq enable/disable. Sumit has worked on a series for this; the
latest version is V5 [1]. Since it has seen no update for a long time, I am
taking another try, which addresses the delayed initialization of watchdog_hld.

There is an obstacle to integrating the arm64 hw perf event into
watchdog_hld: when lockup_detector_init()->watchdog_nmi_probe() runs, the
PMU is not ready yet on arm64, as it is only initialized at
device_initcall(armv8_pmu_driver_init) and is deeply integrated with the
driver model and cpuhp. Hence it is hard to move armv8_pmu_driver_init()
before smp_init().

But it is easy to take the opposite approach and let watchdog_hld pick up
the PMU capability asynchronously. The async model is achieved by
introducing an extra notifier parameter to watchdog_nmi_probe().
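
Roughly, the probe/report flow looks like the following simplified sketch
(the names match patches 3/5 and 5/5, except arch_report_pmu_ready(), which
is only an illustrative helper for how the arch side reports back):

/*
 * Simplified sketch of the async probe flow introduced by this series.
 * arch_report_pmu_ready() is only an illustrative name; on arm64 the
 * report happens from a kthread woken once the PMU driver initializes.
 */
struct watchdog_nmi_status {
	unsigned int cpu;
	int status;		/* 0 on success, -ENODEV on failure */
};

typedef void (*watchdog_nmi_status_reporter)(struct watchdog_nmi_status *);

static watchdog_nmi_status_reporter status_reporter;

/* arch side: the PMU is not ready yet, so remember the reporter and defer */
int __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
{
	status_reporter = notifier;
	return -EBUSY;
}

/* arch side, later: once the PMU driver has initialized, report the result */
static void arch_report_pmu_ready(int err)
{
	struct watchdog_nmi_status status = {
		.cpu = raw_smp_processor_id(),
		.status = err,
	};

	if (status_reporter)
		(*status_reporter)(&status);
}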

In this series, patches [1-2/5] are trivial cleanups, and [3-5/5] implement
this async model.

[1]: http://lore.kernel.org/linux-arm-kernel/[email protected]

Cc: Sumit Garg <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
To: [email protected]
To: [email protected]


Pingfan Liu (5):
kernel/watchdog: remove useless WATCHDOG_DEFAULT
kernel/watchdog_hld: clarify the condition in
hardlockup_detector_event_create()
kernel/watchdog: adapt the watchdog_hld interface for async model
kernel/watchdog_hld: simplify the detecting of hld watchdog
arm64/watchdog_hld: enable hard lockup on arm64 platform

arch/arm64/Kconfig | 3 ++
arch/arm64/include/asm/perf_event.h | 5 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/perf_event.c | 14 ++++-
arch/arm64/kernel/watchdog_hld.c | 83 +++++++++++++++++++++++++++++
drivers/perf/arm_pmu.c | 5 ++
include/linux/nmi.h | 20 ++++---
kernel/watchdog.c | 69 ++++++++++++++++++------
kernel/watchdog_hld.c | 26 +++------
9 files changed, 181 insertions(+), 45 deletions(-)
create mode 100644 arch/arm64/kernel/watchdog_hld.c

--
2.31.1


2021-09-15 03:53:24

by Pingfan Liu

Subject: [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform

With watchdog_hld converted to the async model, arm64 can probe and enable
the perf NMI after smp_init(). At boot, all CPUs arm the hard lockup
detector asynchronously.

The function hw_nmi_get_sample_period() is borrowed from [1]; credit goes
to Sumit.

[1]: http://lore.kernel.org/linux-arm-kernel/[email protected]

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Sami Tolvanen <[email protected]>
To: [email protected]
Cc: [email protected]
---
arch/arm64/Kconfig | 3 ++
arch/arm64/include/asm/perf_event.h | 5 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/perf_event.c | 14 ++++-
arch/arm64/kernel/watchdog_hld.c | 83 +++++++++++++++++++++++++++++
drivers/perf/arm_pmu.c | 5 ++
6 files changed, 109 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/kernel/watchdog_hld.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 62c3c1d2190f..0f49e58a9dd8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -193,6 +193,9 @@ config ARM64
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
+ select HAVE_PERF_EVENTS_NMI
+ select HAVE_HARDLOCKUP_DETECTOR_PERF \
+ if PERF_EVENTS && HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUTEX_CMPXCHG if FUTEX
diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 4ef6f19331f9..712a75f432f0 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -6,6 +6,7 @@
#ifndef __ASM_PERF_EVENT_H
#define __ASM_PERF_EVENT_H

+#include <linux/wait.h>
#include <asm/stack_pointer.h>
#include <asm/ptrace.h>

@@ -259,4 +260,8 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
(regs)->pstate = PSR_MODE_EL1h; \
}

+extern bool arm_pmu_initialized;
+extern wait_queue_head_t arm_pmu_wait;
+extern bool check_pmu_nmi_ability(void);
+
#endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3f1490bfb938..789c2fe5bb90 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o
obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_CPU_PM) += sleep.o suspend.o
obj-$(CONFIG_CPU_IDLE) += cpuidle.o
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index d07788dad388..c4144cea0d55 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -24,6 +24,9 @@
#include <linux/sched_clock.h>
#include <linux/smp.h>

+bool arm_pmu_initialized;
+DECLARE_WAIT_QUEUE_HEAD(arm_pmu_wait);
+
/* ARMv8 Cortex-A53 specific event types. */
#define ARMV8_A53_PERFCTR_PREF_LINEFILL 0xC2

@@ -1284,10 +1287,17 @@ static struct platform_driver armv8_pmu_driver = {

static int __init armv8_pmu_driver_init(void)
{
+ int ret;
+
if (acpi_disabled)
- return platform_driver_register(&armv8_pmu_driver);
+ ret = platform_driver_register(&armv8_pmu_driver);
else
- return arm_pmu_acpi_probe(armv8_pmuv3_init);
+ ret = arm_pmu_acpi_probe(armv8_pmuv3_init);
+
+ arm_pmu_initialized = true;
+ wake_up_all(&arm_pmu_wait);
+
+ return ret;
}
device_initcall(armv8_pmu_driver_init)

diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
new file mode 100644
index 000000000000..18bfa62f1058
--- /dev/null
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kthread.h>
+#include <linux/sched.h>
+#include <linux/nmi.h>
+#include <linux/cpufreq.h>
+#include <asm/perf_event.h>
+
+/*
+ * Safe maximum CPU frequency in case a particular platform doesn't implement
+ * a cpufreq driver. Although the architecture doesn't put any restrictions on
+ * the maximum frequency, 5 GHz seems to be a safe maximum given the available
+ * Arm CPUs on the market, which are clocked well below 5 GHz. On the other
+ * hand, we can't make it much higher as it would lead to a large hard-lockup
+ * detection timeout on parts which run slower (e.g. 1 GHz on Developerbox)
+ * and don't possess a cpufreq driver.
+ */
+#define SAFE_MAX_CPU_FREQ 5000000000UL // 5 GHz
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+ unsigned int cpu = smp_processor_id();
+ unsigned long max_cpu_freq;
+
+ max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+ if (!max_cpu_freq)
+ max_cpu_freq = SAFE_MAX_CPU_FREQ;
+
+ return (u64)max_cpu_freq * watchdog_thresh;
+}
+
+static watchdog_nmi_status_reporter status_reporter;
+
+static int hld_enabled_thread_fun(void *unused)
+{
+ struct watchdog_nmi_status status;
+ watchdog_nmi_status_reporter local_reporter;
+ int ret;
+
+ wait_event(arm_pmu_wait, arm_pmu_initialized);
+ status.cpu = raw_smp_processor_id();
+
+ if (!check_pmu_nmi_ability()) {
+ status.status = -ENODEV;
+ goto report;
+ }
+
+ ret = hardlockup_detector_perf_enable();
+ status.status = ret;
+
+report:
+ local_reporter = (watchdog_nmi_status_reporter)atomic64_fetch_and(0,
+ (atomic64_t *)&status_reporter);
+ if (local_reporter)
+ (*local_reporter)(&status);
+
+ return 0;
+}
+
+/* As for watchdog_nmi_disable(), use the default implementation */
+void watchdog_nmi_enable(unsigned int cpu)
+{
+ struct task_struct *t;
+
+ /* PMU is not ready */
+ if (!arm_pmu_initialized) {
+ t = kthread_create_on_cpu(hld_enabled_thread_fun, NULL, cpu,
+ "arm64_hld.%u");
+ if (IS_ERR(t))
+ return;
+ wake_up_process(t);
+ return;
+ }
+
+ /* For a hotplugged CPU */
+ hardlockup_detector_perf_enable();
+}
+
+int __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
+{
+ /* On arm64, the arm_pmu driver is not ready at this stage */
+ status_reporter = notifier;
+ return -EBUSY;
+}
+
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 3cbc3baf087f..e08961b37922 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -105,6 +105,11 @@ static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops);

static bool has_nmi;

+bool check_pmu_nmi_ability(void)
+{
+ return has_nmi;
+}
+
static inline u64 arm_pmu_event_max_period(struct perf_event *event)
{
if (event->hw.flags & ARMPMU_EVT_64BIT)
--
2.31.1

2021-09-15 03:53:50

by Pingfan Liu

Subject: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

When lockup_detector_init()->watchdog_nmi_probe() runs, the PMU may not be
ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init), and it is deeply integrated with the
driver model and cpuhp. Hence it is hard to move this initialization before
smp_init().

But it is easy to take the opposite approach and let watchdog_hld pick up
the PMU capability asynchronously.

The async model is achieved by introducing an extra notifier parameter to
watchdog_nmi_probe().

Note that after this patch, the async model, which is utilized by the next
patch, does not yet take effect.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Julien Thierry <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
To: [email protected]
---
include/linux/nmi.h | 12 +++++++++--
kernel/watchdog.c | 49 +++++++++++++++++++++++++++++++++++----------
2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..70665fa6e0a9 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -78,8 +78,10 @@ static inline void reset_hung_task_detector(void) { }
*/
#define NMI_WATCHDOG_ENABLED_BIT 0
#define SOFT_WATCHDOG_ENABLED_BIT 1
+#define NMI_WATCHDOG_UNDETERMINED_BIT 2
#define NMI_WATCHDOG_ENABLED (1 << NMI_WATCHDOG_ENABLED_BIT)
#define SOFT_WATCHDOG_ENABLED (1 << SOFT_WATCHDOG_ENABLED_BIT)
+#define NMI_WATCHDOG_UNDETERMINED (1 << NMI_WATCHDOG_UNDETERMINED_BIT)

#if defined(CONFIG_HARDLOCKUP_DETECTOR)
extern void hardlockup_detector_disable(void);
@@ -116,10 +118,16 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
# endif
#endif

+struct watchdog_nmi_status {
+ unsigned int cpu;
+ int status;
+};
+
+typedef void (*watchdog_nmi_status_reporter)(struct watchdog_nmi_status *);
void watchdog_nmi_stop(void);
void watchdog_nmi_start(void);
-int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+int watchdog_nmi_probe(watchdog_nmi_status_reporter notifier);
+void watchdog_nmi_enable(unsigned int cpu);
void watchdog_nmi_disable(unsigned int cpu);

/**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e2a9e3331416..4ab71943d65f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -40,7 +40,7 @@ int __read_mostly watchdog_user_enabled = 1;
int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
int __read_mostly soft_watchdog_user_enabled = 1;
int __read_mostly watchdog_thresh = 10;
-static int __read_mostly nmi_watchdog_available;
+static int __read_mostly nmi_watchdog_status;

struct cpumask watchdog_cpumask __read_mostly;
unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -85,6 +85,10 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);

#endif /* CONFIG_HARDLOCKUP_DETECTOR */

+static void lockup_detector_update_enable(void);
+
+static watchdog_nmi_status_reporter status_reporter;
+
/*
* These functions can be overridden if an architecture implements its
* own hardlockup detector.
@@ -93,10 +97,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
* softlockup watchdog start and stop. The arch must select the
* SOFTLOCKUP_DETECTOR Kconfig.
*/
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
{
hardlockup_detector_perf_enable();
- return 0;
}

void __weak watchdog_nmi_disable(unsigned int cpu)
@@ -104,8 +107,28 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
hardlockup_detector_perf_disable();
}

-/* Return 0, if a NMI watchdog is available. Error code otherwise */
-int __weak __init watchdog_nmi_probe(void)
+static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
+{
+ /* Set status to 1 temporarily to block any further access */
+ if (atomic_cmpxchg((atomic_t *)&nmi_watchdog_status, -EBUSY, 1)
+ == -EBUSY) {
+ if (!data->status) {
+ nmi_watchdog_status = 0;
+ lockup_detector_update_enable();
+ } else {
+ nmi_watchdog_status = -ENODEV;
+ /* turn off watchdog_enabled forever */
+ lockup_detector_update_enable();
+ pr_info("Perf NMI watchdog permanently disabled\n");
+ }
+ }
+}
+
+/*
+ * Return 0 if an NMI watchdog is available, -ENODEV if it is unavailable,
+ * and -EBUSY if it is undetermined at this stage; the async notifier will
+ * report the result later.
+ */
+int __weak __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
{
return hardlockup_detector_perf_init();
}
@@ -144,8 +167,12 @@ static void lockup_detector_update_enable(void)
watchdog_enabled = 0;
if (!watchdog_user_enabled)
return;
- if (nmi_watchdog_available && nmi_watchdog_user_enabled)
- watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+ if (nmi_watchdog_user_enabled) {
+ if (nmi_watchdog_status == 0)
+ watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+ else if (nmi_watchdog_status == -EBUSY)
+ watchdog_enabled |= NMI_WATCHDOG_UNDETERMINED;
+ }
if (soft_watchdog_user_enabled)
watchdog_enabled |= SOFT_WATCHDOG_ENABLED;
}
@@ -467,7 +494,8 @@ static void watchdog_enable(unsigned int cpu)
/* Initialize timestamp */
update_touch_ts();
/* Enable the perf event */
- if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
+ if (watchdog_enabled &
+ (NMI_WATCHDOG_ENABLED | NMI_WATCHDOG_UNDETERMINED))
watchdog_nmi_enable(cpu);
}

@@ -682,7 +710,7 @@ int proc_watchdog(struct ctl_table *table, int write,
int proc_nmi_watchdog(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
- if (!nmi_watchdog_available && write)
+ if (!nmi_watchdog_status && write)
return -ENOTSUPP;
return proc_watchdog_common(NMI_WATCHDOG_ENABLED,
table, write, buffer, lenp, ppos);
@@ -748,7 +776,6 @@ void __init lockup_detector_init(void)
cpumask_copy(&watchdog_cpumask,
housekeeping_cpumask(HK_FLAG_TIMER));

- if (!watchdog_nmi_probe())
- nmi_watchdog_available = true;
+ nmi_watchdog_status = watchdog_nmi_probe(watchdog_nmi_report_capability);
lockup_detector_setup();
}
--
2.31.1

2021-09-15 14:08:06

by Peter Zijlstra

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Wed, Sep 15, 2021 at 11:51:01AM +0800, Pingfan Liu wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
>
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
>
> The async model is achieved by introducing an extra parameter notifier
> of watchdog_nmi_probe().
>
> Note after this patch, the async model, which is utilized by the next
> patch, does not take effect yet.

I can't make any sense of what you're trying to do..

> +static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
> +{
> + /* Set status to 1 temporary to block any further access */
> + if (atomic_cmpxchg((atomic_t *)&nmi_watchdog_status, -EBUSY, 1)
> + == -EBUSY) {

But this..

> + if (!data->status) {
> + nmi_watchdog_status = 0;
> + lockup_detector_update_enable();
> + } else {
> + nmi_watchdog_status = -ENODEV;
> + /* turn offf watchdog_enabled forever */
> + lockup_detector_update_enable();
> + pr_info("Perf NMI watchdog permanently disabled\n");
> + }
> + }
> +}

> @@ -467,7 +494,8 @@ static void watchdog_enable(unsigned int cpu)
> /* Initialize timestamp */
> update_touch_ts();
> /* Enable the perf event */
> - if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
> + if (watchdog_enabled &
> + (NMI_WATCHDOG_ENABLED | NMI_WATCHDOG_UNDETERMINED))

and this, are horrible indenting.

2021-09-16 03:08:52

by Pingfan Liu

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Wed, Sep 15, 2021 at 04:02:28PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 15, 2021 at 11:51:01AM +0800, Pingfan Liu wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> >
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> >
> > The async model is achieved by introducing an extra parameter notifier
> > of watchdog_nmi_probe().
> >
> > Note after this patch, the async model, which is utilized by the next
> > patch, does not take effect yet.
>
> I can't make any sense of what you're trying to do..
>
Sorry for the poor wording. What I mean is: this patch [3/5] provides a
framework for the async model. But since watchdog_nmi_probe() still returns 0
or -ENODEV after this patch, the code's behavior is the same as before.

Does it make sense to you?
> > +static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
> > +{
> > + /* Set status to 1 temporary to block any further access */
> > + if (atomic_cmpxchg((atomic_t *)&nmi_watchdog_status, -EBUSY, 1)
> > + == -EBUSY) {
>
> But this..
>
Oh, looking at other code, blanks would be a better choice for a wrapped condition.
> > + if (!data->status) {
> > + nmi_watchdog_status = 0;
> > + lockup_detector_update_enable();
> > + } else {
> > + nmi_watchdog_status = -ENODEV;
> > + /* turn offf watchdog_enabled forever */
> > + lockup_detector_update_enable();
> > + pr_info("Perf NMI watchdog permanently disabled\n");
> > + }
> > + }
> > +}
>
> > @@ -467,7 +494,8 @@ static void watchdog_enable(unsigned int cpu)
> > /* Initialize timestamp */
> > update_touch_ts();
> > /* Enable the perf event */
> > - if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
> > + if (watchdog_enabled &
> > + (NMI_WATCHDOG_ENABLED | NMI_WATCHDOG_UNDETERMINED))
>
> and this, are horrible indenting.

Ditto.

Thanks for your comment and review.

Regards,

Pingfan

2021-09-16 08:29:59

by Petr Mladek

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
>
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.

This is another cryptic description. I probably only got it after
looking at the 5th patch (I was not Cc'ed :-(

> The async model is achieved by introducing an extra parameter notifier
> of watchdog_nmi_probe().

I would say that the code is horrible and looks too complex.

What about simply calling watchdog_nmi_probe() and
lockup_detector_setup() once again when watchdog_nmi_probe()
failed in lockup_detector_init()?

Or do not call lockup_detector_init() at all in
kernel_init_freeable() when PMU is not ready yet.
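
For the first alternative, an untested sketch might look like this (the
retry initcall name is only illustrative, and the existing cpumask setup in
lockup_detector_init() is elided):

static bool lockup_detector_pending_retry;

void __init lockup_detector_init(void)
{
	/* ... existing watchdog_cpumask setup elided ... */

	if (!watchdog_nmi_probe())
		nmi_watchdog_available = true;
	else
		lockup_detector_pending_retry = true;
	lockup_detector_setup();
}

/* retry once the PMU drivers had a chance to initialize */
static int __init lockup_detector_retry_init(void)
{
	if (lockup_detector_pending_retry && !watchdog_nmi_probe()) {
		nmi_watchdog_available = true;
		lockup_detector_setup();
	}
	return 0;
}
late_initcall_sync(lockup_detector_retry_init);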

Best Regards,
Petr

2021-09-16 08:37:04

by Petr Mladek

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Thu 2021-09-16 10:29:05, Petr Mladek wrote:
> On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> >
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
>
> This is another cryptic description. I have probably got it after
> looking at the 5th patch (was not Cc :-(
>
> > The async model is achieved by introducing an extra parameter notifier
> > of watchdog_nmi_probe().
>
> I would say that the code is horrible and looks too complex.
>
> What about simply calling watchdog_nmi_probe() and
> lockup_detector_setup() once again when watchdog_nmi_probe()
> failed in lockup_detector_init()?
>
> Or do not call lockup_detector_init() at all in
> kernel_init_freeable() when PMU is not ready yet.

BTW: It is overkill to create your own kthread just to run some
code once. And you implemented it the wrong way. The kthread
must wait in a loop until someone else stops it and reads
the exit code.

The easiest solution is to queue a work into system_wq for this.

I was not Cc'ed on the 5th patch, so I write it here.

Best Regards,
Petr

2021-09-17 15:48:05

by Pingfan Liu

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Thu, Sep 16, 2021 at 10:36:10AM +0200, Petr Mladek wrote:
> On Thu 2021-09-16 10:29:05, Petr Mladek wrote:
> > On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> > >
> > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > get the capability of PMU async.
> >
> > This is another cryptic description. I have probably got it after
> > looking at the 5th patch (was not Cc :-(
> >
> > > The async model is achieved by introducing an extra parameter notifier
> > > of watchdog_nmi_probe().
> >
> > I would say that the code is horrible and looks too complex.
> >
> > What about simply calling watchdog_nmi_probe() and
> > lockup_detector_setup() once again when watchdog_nmi_probe()
> > failed in lockup_detector_init()?
> >
> > Or do not call lockup_detector_init() at all in
> > kernel_init_freeable() when PMU is not ready yet.
>
> BTW: It is an overkill to create your own kthread just to run some
> code just once. And you implemeted it a wrong way. The kthread

I had thought about queue_work_on() in watchdog_nmi_enable(). But since
that work would block the worker kthread for this cpu, another worker
kthread would eventually have to be created for other work.

But now, I think queue_work_on() may be neater.
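
For instance, the kthread in patch 5/5 might be replaced by an untested
sketch along these lines (the work item and function names are only
illustrative, and the status reporting is elided):

static DEFINE_PER_CPU(struct work_struct, arm64_hld_work);

static void hld_enable_workfn(struct work_struct *work)
{
	/* same body as hld_enabled_thread_fun(), minus the kthread */
	wait_event(arm_pmu_wait, arm_pmu_initialized);

	if (check_pmu_nmi_ability())
		hardlockup_detector_perf_enable();
	/* ... report the status back as before ... */
}

void watchdog_nmi_enable(unsigned int cpu)
{
	/* PMU is not ready */
	if (!arm_pmu_initialized) {
		struct work_struct *work = per_cpu_ptr(&arm64_hld_work, cpu);

		INIT_WORK(work, hld_enable_workfn);
		queue_work_on(cpu, system_wq, work);
		return;
	}

	/* For a hotplugged CPU */
	hardlockup_detector_perf_enable();
}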

> must wait in a loop until someone else stop it and read
> the exit code.
>
Is this behavior mandatory? This kthread can decide the exit
condition by itself.

And from a quick look through static int kthread(void *_create), I am not
aware of any problem with it.

> The easiest solution is to queue a work into system_wq for this.
>
> I was not Cc for the 5th patch, so I write it here.
>
Sorry for the inconvenience; I am Cc'ing you now in case you have further
comments, and I will Cc you on the next version.

I appreciate all of your suggestions and comments.

Thanks,

Pingfan

2021-09-17 22:53:14

by Pingfan Liu

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Thu, Sep 16, 2021 at 10:29:04AM +0200, Petr Mladek wrote:
> On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> >
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
>
> This is another cryptic description. I have probably got it after
> looking at the 5th patch (was not Cc :-(
>
> > The async model is achieved by introducing an extra parameter notifier
> > of watchdog_nmi_probe().
>
> I would say that the code is horrible and looks too complex.
>
> What about simply calling watchdog_nmi_probe() and
> lockup_detector_setup() once again when watchdog_nmi_probe()
> failed in lockup_detector_init()?
>
It may work. But there still needs to be a way to report the PMU NMI
capability to the watchdog layer accurately, and the API should be extended
somehow.

I am thinking it over; maybe I can model it in another way.
> Or do not call lockup_detector_init() at all in
> kernel_init_freeable() when PMU is not ready yet.
>

This may not be a good choice, since lockup_detector_init() had better
run as early as possible, especially before drivers.


Thanks,

Pingfan

2021-09-17 23:23:17

by Pingfan Liu

Subject: Re: [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform

On Wed, Sep 15, 2021 at 11:51:03AM +0800, Pingfan Liu wrote:

+ Petr Mladek

> With watchdog_hld armed with the async model, arm64 can probe and enable
> perf NMI after smp_init(). At the boot stage, all of cpus arm hard
> lockup detector in the async model.
>
> In this patch, the function hw_nmi_get_sample_period() is borrowed from
> [1], credit goes to Sumit.
>
> [1]: http://lore.kernel.org/linux-arm-kernel/[email protected]
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Alexander Shishkin <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Cc: Julien Thierry <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Masahiro Yamada <[email protected]>
> Cc: Sami Tolvanen <[email protected]>
> To: [email protected]
> Cc: [email protected]

2021-09-20 12:24:42

by Petr Mladek

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Fri 2021-09-17 23:41:31, Pingfan Liu wrote:
> On Thu, Sep 16, 2021 at 10:36:10AM +0200, Petr Mladek wrote:
> > On Thu 2021-09-16 10:29:05, Petr Mladek wrote:
> > > On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > > yet. E.g. on arm64, PMU is not ready until
> > > > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > > > with the driver model and cpuhp. Hence it is hard to push this
> > > > initialization before smp_init().
> > > >
> > > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > > get the capability of PMU async.
> > >
> > > This is another cryptic description. I have probably got it after
> > > looking at the 5th patch (was not Cc :-(
> > >
> > > > The async model is achieved by introducing an extra parameter notifier
> > > > of watchdog_nmi_probe().
> > >
> > > I would say that the code is horrible and looks too complex.
> > >
> > > What about simply calling watchdog_nmi_probe() and
> > > lockup_detector_setup() once again when watchdog_nmi_probe()
> > > failed in lockup_detector_init()?
> > >
> > > Or do not call lockup_detector_init() at all in
> > > kernel_init_freeable() when PMU is not ready yet.
> >
> > BTW: It is an overkill to create your own kthread just to run some
> > code just once. And you implemeted it a wrong way. The kthread
>
> I had thought about queue_work_on() in watchdog_nmi_enable(). But since
> this work will block the worker kthread for this cpu. So finally,
> another worker kthread should be created for other work.

This is not a problem. Workqueues use a pool of workers that are
already created and can be used when one worker gets blocked.

> But now, I think queue_work_on() may be more neat.
>
> > must wait in a loop until someone else stop it and read
> > the exit code.
> >
> Is this behavior mandotory? Since this kthread can decide the exit
> condition by itself.

I am pretty sure. Unfortunately, I can't find it in the documentation.

My view is the following. Each process has a task_struct. The
scheduler needs task_struct so that it can switch processes.
The task_struct must still exist when the process exits.
The scheduler puts the task into TASK_DEAD state.
Another process has to read the exit code and destroy the
task struct.

See do_exit() in kernel/exit.c. It ends with do_task_dead().
It is the point when the process goes into TASK_DEAD state.

For a good example, see lib/test_vmalloc.c. The kthread waits
until someone wants it to stop:

static int test_func(void *private)
{
[...]

/*
* Wait for the kthread_stop() call.
*/
while (!kthread_should_stop())
msleep(10);

return 0;
}

The kthreads are started and stopped in:

static void do_concurrent_test(void)
{
[...]
for (i = 0; i < nr_threads; i++) {
[...]
t->task = kthread_run(test_func, t, "vmalloc_test/%d", i);
[...]
/*
* Sleep quiet until all workers are done with 1 second
* interval. Since the test can take a lot of time we
* can run into a stack trace of the hung task. That is
* why we go with completion_timeout and HZ value.
*/
do {
ret = wait_for_completion_timeout(&test_all_done_comp, HZ);
} while (!ret);
[...]
for (i = 0; i < nr_threads; i++) {
[...]
if (!IS_ERR(t->task))
kthread_stop(t->task);
[...]
}


You do not have to solve this if you use the system workqueue
(system_wq).

Best Regards,
Petr

2021-09-22 04:29:26

by Pingfan Liu

Subject: Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model

On Mon, Sep 20, 2021 at 10:20:46AM +0200, Petr Mladek wrote:
> On Fri 2021-09-17 23:41:31, Pingfan Liu wrote:
[...]
> >
> > I had thought about queue_work_on() in watchdog_nmi_enable(). But since
> > this work will block the worker kthread for this cpu. So finally,
> > another worker kthread should be created for other work.
>
> This is not a problem. workqueues use a pool of workers that are
> already created and can be used when one worker gets blocked.
>
Yes, you are right. The creation is dynamic and immune to blocking.

> > But now, I think queue_work_on() may be more neat.
> >
> > > must wait in a loop until someone else stop it and read
> > > the exit code.
> > >
> > Is this behavior mandotory? Since this kthread can decide the exit
> > condition by itself.
>
> I am pretty sure. Unfortunately, I can't find it in the documentation.
>
> My view is the following. Each process has a task_struct. The
> scheduler needs task_struct so that it can switch processes.
> The task_struct must still exist when the process exits.
> The scheduler puts the task into TASK_DEAD state.
> Another process has to read the exit code and destroy the
> task struct.
>
Thanks for bringing this up; it gives me an opportunity to think about it.

The core of the problem is put_task_struct() and who releases the
last reference.
It should be finish_task_switch()->put_task_struct_rcu_user()->delayed_put_task_struct()->put_task_struct()
when (unlikely(prev_state == TASK_DEAD)). It does not depend on another task.

> See, do_exit() in kernel/exit.c. It ends with do_dead_task().
> It is the point when the process goes into TASK_DEAD state.
>
> For a good example, see lib/test_vmalloc.c. The kthread waits
> until anyone want him to stop:
>
> static int test_func(void *private)
> {
> [...]
>
> /*
> * Wait for the kthread_stop() call.
> */
> while (!kthread_should_stop())
> msleep(10);
>
> return 0;
> }
>
> The kthreads are started and stopped in:
>
> static void do_concurrent_test(void)
> {
> [...]
> for (i = 0; i < nr_threads; i++) {
> [...]
> t->task = kthread_run(test_func, t, "vmalloc_test/%d", i);
> [...]
> /*
> * Sleep quiet until all workers are done with 1 second
> * interval. Since the test can take a lot of time we
> * can run into a stack trace of the hung task. That is
> * why we go with completion_timeout and HZ value.
> */
> do {
> ret = wait_for_completion_timeout(&test_all_done_comp, HZ);
> } while (!ret);
> [...]
> for (i = 0; i < nr_threads; i++) {
> [...]
> if (!IS_ERR(t->task))
> kthread_stop(t->task);
> [...]
> }

They are good and elegant examples.
>
>
> You do not have to solve this if you use the system workqueue
> (system_wq).
>
Yes, workqueue is a better choice.

Thanks for your great patience.


Regards,

Pingfan