2022-05-07 16:01:28

by Ricardo Neri

Subject: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

Certain implementations of the hardlockup detector require support for
Inter-Processor Interrupt shorthands. On x86, support for these can only
be determined after all the possible CPUs have booted once (in
smp_init()). Other architectures may not need such a check.

lockup_detector_init() only initializes the data structures of the
lockup detector. Hence, there are no dependencies on
smp_init().

Cc: Andi Kleen <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v5:
* Introduced this patch

Changes since v4:
* N/A

Changes since v3:
* N/A

Changes since v2:
* N/A

Changes since v1:
* N/A
---
init/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 98182c3c2c4b..62c52c9e4c2b 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1600,9 +1600,11 @@ static noinline void __init kernel_init_freeable(void)

rcu_init_tasks_generic();
do_pre_smp_initcalls();
- lockup_detector_init();

smp_init();
+
+ lockup_detector_init();
+
sched_init_smp();

padata_init();
--
2.17.1
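
The ordering constraint described in the commit message can be illustrated with a small userspace simulation in C. This is only a sketch under stated assumptions: sim_smp_init(), ipi_shorthands_usable(), sim_lockup_detector_init() and the two detector choices are hypothetical stand-ins, not kernel code. The idea that a shorthand-dependent detector falls back to some other implementation when shorthands cannot be confirmed is also an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation of the boot ordering; not kernel code. */
#define SIM_NR_POSSIBLE_CPUS 4

enum sim_detector {
	SIM_DETECTOR_NONE,
	SIM_DETECTOR_HPET,	/* assumed to need IPI shorthands */
	SIM_DETECTOR_FALLBACK,	/* any detector that does not need them */
};

struct sim_boot {
	bool booted_once[SIM_NR_POSSIBLE_CPUS];
	enum sim_detector detector;
};

/* Shorthand support is only known once every possible CPU has booted once. */
static bool ipi_shorthands_usable(const struct sim_boot *b)
{
	for (int cpu = 0; cpu < SIM_NR_POSSIBLE_CPUS; cpu++)
		if (!b->booted_once[cpu])
			return false;
	return true;
}

/* Stand-in for smp_init(): bring up every secondary; CPU 0 is the boot CPU. */
static void sim_smp_init(struct sim_boot *b)
{
	for (int cpu = 1; cpu < SIM_NR_POSSIBLE_CPUS; cpu++)
		b->booted_once[cpu] = true;
}

/* Stand-in for lockup_detector_init(): choose a backend from what is known now. */
static void sim_lockup_detector_init(struct sim_boot *b)
{
	b->detector = ipi_shorthands_usable(b) ? SIM_DETECTOR_HPET
					       : SIM_DETECTOR_FALLBACK;
}

/* Run the boot sequence with the detector initialized before or after smp_init(). */
static enum sim_detector sim_boot_sequence(bool detector_after_smp_init)
{
	struct sim_boot b = { .booted_once = { [0] = true } };

	if (!detector_after_smp_init)
		sim_lockup_detector_init(&b);	/* old order */
	sim_smp_init(&b);
	if (detector_after_smp_init)
		sim_lockup_detector_init(&b);	/* order after this patch */
	return b.detector;
}
```

With the old order, sim_boot_sequence(false) yields SIM_DETECTOR_FALLBACK because the secondaries have not booted yet; with the patched order, sim_boot_sequence(true) yields SIM_DETECTOR_HPET.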



2022-05-10 13:36:01

by Nicholas Piggin

Subject: Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
> Certain implementations of the hardlockup detector require support for
> Inter-Processor Interrupt shorthands. On x86, support for these can only
> be determined after all the possible CPUs have booted once (in
> smp_init()). Other architectures may not need such a check.
>
> lockup_detector_init() only initializes the data structures of the
> lockup detector. Hence, there are no dependencies on
> smp_init().

I think this is the only real thing which affects other watchdog types?

Not sure if it's a big problem: the secondary CPUs coming up won't
have their watchdog active until quite late, and the primary could
implement its own timeout in __cpu_up() for a secondary coming up, and
IPI it to get traces if necessary, which is probably more robust.
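
A minimal userspace sketch of that suggestion, assuming a simulated tick counter in place of real time: the boot CPU bounds its wait for a secondary and, on timeout, requests a backtrace from it. struct sim_secondary, sim_wait_for_secondary() and sim_send_backtrace_ipi() are hypothetical names; in the kernel the request would be a real IPI or NMI, which this sketch only records.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a secondary CPU as seen by the boot CPU. */
struct sim_secondary {
	unsigned long online_at;	/* tick at which it comes online; 0 = never */
	bool backtrace_requested;	/* set by the simulated IPI */
};

/* Stand-in for sending a backtrace IPI/NMI; here it only records the request. */
static void sim_send_backtrace_ipi(struct sim_secondary *cpu)
{
	cpu->backtrace_requested = true;
}

/*
 * Wait loop with the boot CPU's own timeout, in the spirit of the suggestion
 * for __cpu_up(). Returns true if the secondary came online in time.
 */
static bool sim_wait_for_secondary(struct sim_secondary *cpu,
				   unsigned long timeout_ticks)
{
	for (unsigned long now = 1; now <= timeout_ticks; now++) {
		if (cpu->online_at != 0 && now >= cpu->online_at)
			return true;
		/* A real wait loop would relax or sleep here. */
	}
	/* Timed out: ask the stuck secondary for a trace instead of hanging silently. */
	sim_send_backtrace_ipi(cpu);
	return false;
}
```

A secondary that never comes online makes sim_wait_for_secondary() return false with backtrace_requested set; one that comes online before the timeout returns true and no IPI is sent.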

Acked-by: Nicholas Piggin <[email protected]>


2022-05-14 02:42:43

by Ricardo Neri

Subject: Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

On Tue, May 10, 2022 at 08:38:22PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
> > Certain implementations of the hardlockup detector require support for
> > Inter-Processor Interrupt shorthands. On x86, support for these can only
> > be determined after all the possible CPUs have booted once (in
> > smp_init()). Other architectures may not need such a check.
> >
> > lockup_detector_init() only initializes the data structures of the
> > lockup detector. Hence, there are no dependencies on
> > smp_init().
>

Thank you for your feedback, Nicholas!

> I think this is the only real thing which affects other watchdog types?

Also patches 18 and 19, which decouple the NMI watchdog functionality from
perf.

>
> Not sure if it's a big problem: the secondary CPUs coming up won't
> have their watchdog active until quite late, and the primary could
> implement its own timeout in __cpu_up() for a secondary coming up, and
> IPI it to get traces if necessary, which is probably more robust.

Indeed, that could work. Another alternative I have been pondering is to boot
the system with the perf-based NMI watchdog enabled. Once all CPUs are up
and running, switch to the HPET-based NMI watchdog and free the PMU counters.
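
A hypothetical sketch of the alternative described above: start with a perf-based NMI watchdog, then hand over to the HPET-based one once every CPU is up, releasing the PMU counter each CPU was holding. The state layout and every function name here are made up for illustration; the real handover would go through perf and HPET code that is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

enum sim_nmi_wd_backend { SIM_WD_PERF, SIM_WD_HPET };

/* Hypothetical watchdog state; in perf mode each CPU holds one PMU counter. */
struct sim_nmi_wd {
	enum sim_nmi_wd_backend backend;
	bool pmu_counter_held[SIM_NR_CPUS];
};

/* Boot-time setup: every CPU claims a PMU counter for the perf-based watchdog. */
static void sim_wd_start_perf(struct sim_nmi_wd *wd)
{
	wd->backend = SIM_WD_PERF;
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		wd->pmu_counter_held[cpu] = true;
}

/*
 * Once all CPUs are up and running, switch to the HPET-based watchdog and
 * give the PMU counters back. Returns the number of counters freed.
 */
static int sim_wd_switch_to_hpet(struct sim_nmi_wd *wd, bool all_cpus_up)
{
	int freed = 0;

	if (!all_cpus_up || wd->backend == SIM_WD_HPET)
		return 0;
	/* Assumption: enable the new backend before releasing the old one. */
	wd->backend = SIM_WD_HPET;
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (wd->pmu_counter_held[cpu]) {
			wd->pmu_counter_held[cpu] = false;
			freed++;
		}
	}
	return freed;
}
```

Switching the backend before releasing the counters is one way to avoid a window with no NMI coverage; the thread does not say how the real handover would be ordered.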

>
> Acked-by: Nicholas Piggin <[email protected]>

Thank you!

BR,
Ricardo

2022-05-21 00:05:15

by Nicholas Piggin

Subject: Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

Excerpts from Ricardo Neri's message of May 14, 2022 9:16 am:
> On Tue, May 10, 2022 at 08:38:22PM +1000, Nicholas Piggin wrote:
>> Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
>> > Certain implementations of the hardlockup detector require support for
>> > Inter-Processor Interrupt shorthands. On x86, support for these can only
>> > be determined after all the possible CPUs have booted once (in
>> > smp_init()). Other architectures may not need such a check.
>> >
>> > lockup_detector_init() only initializes the data structures of the
>> > lockup detector. Hence, there are no dependencies on
>> > smp_init().
>>
>
> Thank you for your feedback, Nicholas!
>
>> I think this is the only real thing which affects other watchdog types?
>
> Also patches 18 and 19, which decouple the NMI watchdog functionality from
> perf.
>
>>
>> Not sure if it's a big problem: the secondary CPUs coming up won't
>> have their watchdog active until quite late, and the primary could
>> implement its own timeout in __cpu_up() for a secondary coming up, and
>> IPI it to get traces if necessary, which is probably more robust.
>
> Indeed, that could work. Another alternative I have been pondering is to boot
> the system with the perf-based NMI watchdog enabled. Once all CPUs are up
> and running, switch to the HPET-based NMI watchdog and free the PMU counters.

Just to cover smp_init()? Unless you could move the watchdog
significantly earlier, I'd say it's probably not worth bothering
with.

Yes, the boot CPU is doing *some* work that could lock up, but most
complexity is in the secondaries coming up, and they won't have their own
watchdog coverage for a good chunk of that anyway.

If anything, I would just add some timeout warning or IPI in those wait
loops in x86's __cpu_up() code if you are worried about catching issues
here. Actually, the watchdog probably wouldn't catch any of those anyway,
because they either run with interrupts enabled or call
touch_nmi_watchdog()! So yeah, that'd be pretty pointless.
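
To see why the hardlockup watchdog would likely miss a hang in those wait loops, here is a simplified userspace model of the usual NMI-based check: the NMI reports a lockup only if the CPU's timer-interrupt count has not advanced since the previous NMI, and a touch skips one check. All names are hypothetical (sim_touch_nmi_watchdog() merely mimics the role of touch_nmi_watchdog()), and the model ignores most of the real watchdog logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for a simplified hardlockup check. */
struct sim_cpu {
	unsigned long timer_irqs;	/* bumped by the periodic timer interrupt */
	unsigned long timer_irqs_saved;	/* value seen at the previous NMI */
	bool nmi_touched;		/* set by sim_touch_nmi_watchdog() */
};

static void sim_timer_interrupt(struct sim_cpu *c) { c->timer_irqs++; }

static void sim_touch_nmi_watchdog(struct sim_cpu *c) { c->nmi_touched = true; }

/* Periodic NMI: returns true if it would report a hard lockup. */
static bool sim_watchdog_nmi(struct sim_cpu *c)
{
	if (c->nmi_touched) {		/* touched: skip this check */
		c->nmi_touched = false;
		return false;
	}
	if (c->timer_irqs == c->timer_irqs_saved)
		return true;		/* no timer interrupts since the last NMI */
	c->timer_irqs_saved = c->timer_irqs;
	return false;
}

/*
 * One NMI period of a wait loop spinning `spins` times.
 * irqs_on: timer interrupts still fire while spinning.
 * touches: the loop calls sim_touch_nmi_watchdog() while spinning.
 */
static bool sim_wait_loop_period(struct sim_cpu *c, int spins,
				 bool irqs_on, bool touches)
{
	for (int i = 0; i < spins; i++) {
		if (irqs_on)
			sim_timer_interrupt(c);
		if (touches)
			sim_touch_nmi_watchdog(c);
	}
	return sim_watchdog_nmi(c);
}
```

Only the loop that spins with interrupts disabled and never touches the watchdog is reported; the two kinds of loop described above both pass the check.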

Thanks,
Nick