2019-03-26 21:52:08

by Thomas Gleixner

[permalink] [raw]
Subject: [PATCH] watchdog: Respect watchdog cpumask on CPU hotplug

The rework of the watchdog core to use cpu_stop_work broke the watchdog
cpumask on CPU hotplug.

The watchdog_enable/disable() functions are now called unconditionally from
the hotplug callback, i.e. even on CPUs which are not in the watchdog
cpumask.

Only invoke them when the plugged CPU is in the watchdog cpumask.

Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---
kernel/watchdog.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -547,13 +547,15 @@ static void softlockup_start_all(void)

int lockup_detector_online_cpu(unsigned int cpu)
{
- watchdog_enable(cpu);
+ if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
+ watchdog_enable(cpu);
return 0;
}

int lockup_detector_offline_cpu(unsigned int cpu)
{
- watchdog_disable(cpu);
+ if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
+ watchdog_disable(cpu);
return 0;
}



2019-03-27 15:19:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] watchdog: Respect watchdog cpumask on CPU hotplug

On 03/26, Thomas Gleixner wrote:
>
> The rework of the watchdog core to use cpu_stop_work broke the watchdog
> cpumask on CPU hotplug.
>
> The watchdog_enable/disable() functions are now called unconditionally from
> the hotplug callback, i.e. even on CPUs which are not in the watchdog
> cpumask.
>
> Only invoke them when the plugged CPU is in the watchdog cpumask.
>
> Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> ---
> kernel/watchdog.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -547,13 +547,15 @@ static void softlockup_start_all(void)
>
> int lockup_detector_online_cpu(unsigned int cpu)
> {
> - watchdog_enable(cpu);
> + if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
> + watchdog_enable(cpu);
> return 0;
> }
>
> int lockup_detector_offline_cpu(unsigned int cpu)
> {
> - watchdog_disable(cpu);
> + if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
> + watchdog_disable(cpu);
> return 0;
> }

IIUC without this fix an NMI watchdog can too be enabled at boot time even
if the initial watchdog_cpumask = housekeeping_cpumask(HK_FLAG_TIMER) doesn't
include the plugged CPU.

And after that writing 0 to /proc/sys/kernel/nmi_watchdog clears
NMI_WATCHDOG_ENABLED but this can't disable NMI watchdog's outside of
watchdog_allowed_mask.

So may be this can explain the problem reported by Maxime ?
See https://lore.kernel.org/lkml/[email protected]/

Oleg.


2019-03-27 19:12:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] watchdog: Respect watchdog cpumask on CPU hotplug

On Wed, 27 Mar 2019, Oleg Nesterov wrote:
> On 03/26, Thomas Gleixner wrote:
> >
> > The rework of the watchdog core to use cpu_stop_work broke the watchdog
> > cpumask on CPU hotplug.
> >
> > The watchdog_enable/disable() functions are now called unconditionally from
> > the hotplug callback, i.e. even on CPUs which are not in the watchdog
> > cpumask.
> >
> > Only invoke them when the plugged CPU is in the watchdog cpumask.
>
> IIUC without this fix an NMI watchdog can too be enabled at boot time even
> if the initial watchdog_cpumask = housekeeping_cpumask(HK_FLAG_TIMER) doesn't
> include the plugged CPU.

Yes.

> And after that writing 0 to /proc/sys/kernel/nmi_watchdog clears
> NMI_WATCHDOG_ENABLED but this can't disable NMI watchdog's outside of
> watchdog_allowed_mask.

Correct

> So may be this can explain the problem reported by Maxime ?
> See https://lore.kernel.org/lkml/[email protected]/

That looks so.

Thanks,

tglx

2019-03-28 09:38:02

by Maxime Coquelin

[permalink] [raw]
Subject: Re: [PATCH] watchdog: Respect watchdog cpumask on CPU hotplug



On 3/27/19 8:10 PM, Thomas Gleixner wrote:
> On Wed, 27 Mar 2019, Oleg Nesterov wrote:
>> On 03/26, Thomas Gleixner wrote:
>>>
>>> The rework of the watchdog core to use cpu_stop_work broke the watchdog
>>> cpumask on CPU hotplug.
>>>
>>> The watchdog_enable/disable() functions are now called unconditionally from
>>> the hotplug callback, i.e. even on CPUs which are not in the watchdog
>>> cpumask.
>>>
>>> Only invoke them when the plugged CPU is in the watchdog cpumask.
>>
>> IIUC without this fix an NMI watchdog can too be enabled at boot time even
>> if the initial watchdog_cpumask = housekeeping_cpumask(HK_FLAG_TIMER) doesn't
>> include the plugged CPU.
>
> Yes.
>
>> And after that writing 0 to /proc/sys/kernel/nmi_watchdog clears
>> NMI_WATCHDOG_ENABLED but this can't disable NMI watchdog's outside of
>> watchdog_allowed_mask.
>
> Correct
>
>> So may be this can explain the problem reported by Maxime ?
>> See https://lore.kernel.org/lkml/[email protected]/
>
> That looks so.

I had a trial with your patch, and I can confirm it fixes my issue:

Tested-by: Maxime Coquelin <[email protected]>

Thanks,
Maxime

> Thanks,
>
> tglx
>

Subject: [tip:core/urgent] watchdog: Respect watchdog cpumask on CPU hotplug

Commit-ID: 7dd47617114921fdd8c095509e5e7b4373cc44a1
Gitweb: https://git.kernel.org/tip/7dd47617114921fdd8c095509e5e7b4373cc44a1
Author: Thomas Gleixner <[email protected]>
AuthorDate: Tue, 26 Mar 2019 22:51:02 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Thu, 28 Mar 2019 13:32:01 +0100

watchdog: Respect watchdog cpumask on CPU hotplug

The rework of the watchdog core to use cpu_stop_work broke the watchdog
cpumask on CPU hotplug.

The watchdog_enable/disable() functions are now called unconditionally from
the hotplug callback, i.e. even on CPUs which are not in the watchdog
cpumask. As a consequence the watchdog can become unstoppable.

Only invoke them when the plugged CPU is in the watchdog cpumask.

Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
Reported-by: Maxime Coquelin <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Maxime Coquelin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

---
kernel/watchdog.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 403c9bd90413..6a5787233113 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -554,13 +554,15 @@ static void softlockup_start_all(void)

int lockup_detector_online_cpu(unsigned int cpu)
{
- watchdog_enable(cpu);
+ if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
+ watchdog_enable(cpu);
return 0;
}

int lockup_detector_offline_cpu(unsigned int cpu)
{
- watchdog_disable(cpu);
+ if (cpumask_test_cpu(cpu, &watchdog_allowed_mask))
+ watchdog_disable(cpu);
return 0;
}