The hard lockup detector is helpful for diagnosing unpaired irq
enable/disable. Sumit tried with a series, and the latest version is V5 [1].
Since it has gone a long time without any update, I am taking another try,
which addresses the delayed initialization of watchdog_hld.
( To: Sumit, I think the main body of [4/4] is contributed by you, so I
keep you as the author. Please let me know if you dislike that or my
modification.)
There is an obstacle to integrating the arm64 hw perf event into
watchdog_hld: when lockup_detector_init()->watchdog_nmi_probe() runs on
arm64, the PMU is not ready until device_initcall(armv8_pmu_driver_init).
And the PMU is deeply integrated with the driver model and cpuhp, hence it
is hard to move the initialization of armv8_pmu_driver_init() before
smp_init(). But it is easy to take the opposite approach: enable
watchdog_hld to acquire the PMU capability asynchronously.
The async model is achieved by extending watchdog_nmi_probe() with
-EBUSY, plus a re-initializing work_struct that waits on a
wait_queue_head.
In this series, [1-2/4] are trivial cleanups. [3-4/4] implement this
async model.
v1 -> v2:
  Lift the async model from the hard lockup layer to the watchdog layer.
The benefit is simpler code; the drawback is that re-initialization means
a wasted alloc/free.
[1]: http://lore.kernel.org/linux-arm-kernel/[email protected]
Cc: Sumit Garg <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
To: [email protected]
To: [email protected]
Pingfan Liu (3):
  kernel/watchdog: trivial cleanups
kernel/watchdog_hld: clarify the condition in
hardlockup_detector_event_create()
kernel/watchdog: adapt the watchdog_hld interface for async model
Sumit Garg (1):
arm64: Enable perf events based hard lockup detector
arch/arm64/Kconfig | 2 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/perf_event.c | 11 +++++++--
arch/arm64/kernel/watchdog_hld.c | 36 +++++++++++++++++++++++++++
drivers/perf/arm_pmu.c | 5 ++++
include/linux/nmi.h | 5 +++-
include/linux/perf/arm_pmu.h | 2 ++
kernel/watchdog.c | 42 +++++++++++++++++++++++++++-----
kernel/watchdog_hld.c | 5 +++-
9 files changed, 99 insertions(+), 10 deletions(-)
create mode 100644 arch/arm64/kernel/watchdog_hld.c
--
2.31.1
As for the context, there are two arguments for changing
debug_smp_processor_id() to is_percpu_thread().

-1. watchdog_ev is percpu, and migration would frustrate any attempt
to bind a watchdog_ev to a cpu by wrapping this function inside a
preempt_disable()/preempt_enable() pair.

-2. hardlockup_detector_event_create() indirectly calls
kmem_cache_alloc_node(), which may block.

So spell out the really intended context: is_percpu_thread().
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Wang Qing <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Santosh Sivaraj <[email protected]>
Cc: [email protected]
To: [email protected]
---
kernel/watchdog_hld.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..df010df76576 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
static int hardlockup_detector_event_create(void)
{
- unsigned int cpu = smp_processor_id();
+ unsigned int cpu;
struct perf_event_attr *wd_attr;
struct perf_event *evt;
+ /* This function plans to execute in cpu bound kthread */
+ WARN_ON(!is_percpu_thread());
+ cpu = raw_smp_processor_id();
wd_attr = &wd_hw_attr;
wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
--
2.31.1
On Thu 2021-09-23 22:09:49, Pingfan Liu wrote:
> As for the context, there are two arguments to change
> debug_smp_processor_id() to is_percpu_thread().
>
> -1. watchdog_ev is percpu, and migration will frustrate the attempt
> which try to bind a watchdog_ev to a cpu by protecting this func inside
> the pair of preempt_disable()/preempt_enable().
>
> -2. hardlockup_detector_event_create() indirectly calls
> kmem_cache_alloc_node(), which is blockable.
>
> So here, spelling out the really planned context "is_percpu_thread()".
The description is pretty hard to understand. I would suggest
something like:
Subject: kernel/watchdog_hld: Ensure CPU-bound context when creating
hardlockup detector event
hardlockup_detector_event_create() should create perf_event on the
current CPU. Preemption could not get disabled because
perf_event_create_kernel_counter() allocates memory. Instead,
the CPU locality is achieved by processing the code in a per-CPU
bound kthread.
Add a check to prevent mistakes when calling the code in another
code path.
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Wang Qing <[email protected]>
> Cc: "Peter Zijlstra (Intel)" <[email protected]>
> Cc: Santosh Sivaraj <[email protected]>
> Cc: [email protected]
> To: [email protected]
> ---
> kernel/watchdog_hld.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..df010df76576 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
>
> static int hardlockup_detector_event_create(void)
> {
> - unsigned int cpu = smp_processor_id();
> + unsigned int cpu;
> struct perf_event_attr *wd_attr;
> struct perf_event *evt;
>
> + /* This function plans to execute in cpu bound kthread */
This does not explain why it is needed. I suggest something like:
/*
* Preemption is not disabled because memory will be allocated.
* Ensure CPU-locality by calling this in per-CPU kthread.
*/
> + WARN_ON(!is_percpu_thread());
> + cpu = raw_smp_processor_id();
> wd_attr = &wd_hw_attr;
> wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
>
Otherwise the change looks good to me.
Best Regards,
Petr
On Mon, Oct 04, 2021 at 02:32:47PM +0200, Petr Mladek wrote:
> On Thu 2021-09-23 22:09:49, Pingfan Liu wrote:
> > As for the context, there are two arguments to change
> > debug_smp_processor_id() to is_percpu_thread().
> >
> > -1. watchdog_ev is percpu, and migration will frustrate the attempt
> > which try to bind a watchdog_ev to a cpu by protecting this func inside
> > the pair of preempt_disable()/preempt_enable().
> >
> > -2. hardlockup_detector_event_create() indirectly calls
> > kmem_cache_alloc_node(), which is blockable.
> >
> > So here, spelling out the really planned context "is_percpu_thread()".
>
> The description is pretty hard to understand. I would suggest
> something like:
>
> Subject: kernel/watchdog_hld: Ensure CPU-bound context when creating
> hardlockup detector event
>
> hardlockup_detector_event_create() should create perf_event on the
> current CPU. Preemption could not get disabled because
> perf_event_create_kernel_counter() allocates memory. Instead,
> the CPU locality is achieved by processing the code in a per-CPU
> bound kthread.
>
> Add a check to prevent mistakes when calling the code in another
> code path.
>
Much appreciated. I will use it.
> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Petr Mladek <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Wang Qing <[email protected]>
> > Cc: "Peter Zijlstra (Intel)" <[email protected]>
> > Cc: Santosh Sivaraj <[email protected]>
> > Cc: [email protected]
> > To: [email protected]
> > ---
> > kernel/watchdog_hld.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> > index 247bf0b1582c..df010df76576 100644
> > --- a/kernel/watchdog_hld.c
> > +++ b/kernel/watchdog_hld.c
> > @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
> >
> > static int hardlockup_detector_event_create(void)
> > {
> > - unsigned int cpu = smp_processor_id();
> > + unsigned int cpu;
> > struct perf_event_attr *wd_attr;
> > struct perf_event *evt;
> >
> > + /* This function plans to execute in cpu bound kthread */
>
> This does not explain why it is needed. I suggest something like:
>
> /*
> * Preemption is not disabled because memory will be allocated.
> * Ensure CPU-locality by calling this in per-CPU kthread.
> */
>
It sounds good. I will use it.
>
> > + WARN_ON(!is_percpu_thread());
> > + cpu = raw_smp_processor_id();
> > wd_attr = &wd_hw_attr;
> > wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> >
>
> Otherwise the change looks good to me.
>
Thanks for your help.
Regards,
Pingfan
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel