2016-10-18 19:14:50

by Babu Moger

[permalink] [raw]
Subject: [PATCH v3 0/2] Introduce arch specific nmi enable, disable handlers

During our testing we noticed that nmi watchdogs in sparc could not be disabled or
enabled dynamically using sysctl/proc interface. Sparc uses its own arch specific
nmi watchdogs. There is a sysctl and proc interface(proc/sys/kernel/nmi_watchdog)
to enable/disable nmi watchdogs. However, that is not working for sparc. There
is no interface to feed this parameter to arch specific nmi watchdogs.

These patches extend the same sysctl/proc interface to enable or disable
these arch specific nmi watchdogs dynamically. Introduced new functions
arch_watchdog_nmi_enable and arch_watchdog_nmi_disable which can be implemented
in arch specific handlers.
If you think there is a better way to do this. Please advice.

Tested on sparc. Compile tested on x86.

v3:
Made one more change per Don Zickus comments.
Moved failure path messages to into generic code inside watchdog_nmi_enable.
Also added matching prints in sparc to warn about the failure.

v2:
a)Sam Ravnborg's comments about making the definitions visible.
With the new approach we dont need those definitions((NMI_WATCHDOG_ENABLED,
SOFT_WATCHDOG_ENABLED etc..) outside watchdog.c. So no action.

b) Made changes per Don Zickus comments.
Don, I could not use your patches as is. Reason is sparc does not define
CONFIG_HARDLOCKUP_DETECTOR. So, defining default __weak function did not
work for me. However, I have used your idea to define __weak functions
arch_watchdog_nmi_enable and arch_watchdog_nmi_disable when CONFIG_HARDLOCKUP_DETECTOR
is not defined. I feel this should have very less impact on the races you are
concerned about. Please take a look. Feel free to suggest.

Patch2 changes: I had to introduce new variable nmi_init_done to synchronize
watchdog thread and kernel init thread.

v1:
Initial version. Discussion thread here
http://www.mail-archive.com/[email protected]/msg1245427.html

Babu Moger (2):
watchdog: Introduce arch_watchdog_nmi_enable and
arch_watchdog_nmi_disable
sparc: Implement arch_watchdog_nmi_enable and
arch_watchdog_nmi_disable

arch/sparc/kernel/nmi.c | 44 +++++++++++++++++++++++++++++-
kernel/watchdog.c | 69 +++++++++++++++++++++++++++++++---------------
2 files changed, 89 insertions(+), 24 deletions(-)


2016-10-18 19:14:58

by Babu Moger

[permalink] [raw]
Subject: [PATCH v3 2/2] sparc: Implement arch_watchdog_nmi_enable and arch_watchdog_nmi_disable

Implement functions arch_watchdog_nmi_enable and arch_watchdog_nmi_disable
to enable/disable nmi watchdog. Sparc uses arch specific nmi watchdog
handler. Currently, we do not have a way to enable/disable nmi watchdog
dynamically. With these patches we can enable or disable arch
specific nmi watchdogs using proc or sysctl interface.

Example commands.
To enable: echo 1 > /proc/sys/kernel/nmi_watchdog
To disable: echo 0 > /proc/sys/kernel/nmi_watchdog

It can also achieved using the sysctl parameter kernel.nmi_watchdog

Signed-off-by: Babu Moger <[email protected]>
---
arch/sparc/kernel/nmi.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index a9973bb..b55d518 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -42,7 +42,7 @@ static int panic_on_timeout;
*/
atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */
EXPORT_SYMBOL(nmi_active);
-
+static int nmi_init_done;
static unsigned int nmi_hz = HZ;
static DEFINE_PER_CPU(short, wd_enabled);
static int endflag __initdata;
@@ -153,6 +153,8 @@ static void report_broken_nmi(int cpu, int *prev_nmi_count)

void stop_nmi_watchdog(void *unused)
{
+ if (!__this_cpu_read(wd_enabled))
+ return;
pcr_ops->write_pcr(0, pcr_ops->pcr_nmi_disable);
__this_cpu_write(wd_enabled, 0);
atomic_dec(&nmi_active);
@@ -207,6 +209,9 @@ error:

void start_nmi_watchdog(void *unused)
{
+ if (__this_cpu_read(wd_enabled))
+ return;
+
__this_cpu_write(wd_enabled, 1);
atomic_inc(&nmi_active);

@@ -259,6 +264,8 @@ int __init nmi_init(void)
}
}

+ nmi_init_done = 1;
+
return err;
}

@@ -270,3 +277,38 @@ static int __init setup_nmi_watchdog(char *str)
return 0;
}
__setup("nmi_watchdog=", setup_nmi_watchdog);
+
+/*
+ * sparc specific NMI watchdog enable function.
+ * Enables watchdog if it is not enabled already.
+ */
+int arch_watchdog_nmi_enable(unsigned int cpu)
+{
+ if (atomic_read(&nmi_active) == -1) {
+ pr_warn("NMI watchdog cannot be enabled or disabled\n");
+ return -1;
+ }
+
+ /*
+ * watchdog thread could start even before nmi_init is called.
+ * Just Return in that case. Let nmi_init finish the init
+ * process first.
+ */
+ if (!nmi_init_done)
+ return 0;
+
+ smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
+
+ return 0;
+}
+/*
+ * sparc specific NMI watchdog disable function.
+ * Disables watchdog if it is not disabled already.
+ */
+void arch_watchdog_nmi_disable(unsigned int cpu)
+{
+ if (atomic_read(&nmi_active) == -1)
+ pr_warn_once("NMI watchdog cannot be enabled or disabled\n");
+ else
+ smp_call_function_single(cpu, stop_nmi_watchdog, NULL, 1);
+}
--
1.7.1

2016-10-18 19:15:07

by Babu Moger

[permalink] [raw]
Subject: [PATCH v3 1/2] watchdog: Introduce arch_watchdog_nmi_enable and arch_watchdog_nmi_disable

Currently we do not have a way to enable/disable arch specific
watchdog handlers if it was implemented by any of the architectures.

This patch introduces new functions arch_watchdog_nmi_enable and
arch_watchdog_nmi_disable which can be used to enable/disable architecture
specific NMI watchdog handlers. These functions are defined as weak as
architectures can override their definitions to enable/disable nmi
watchdog behaviour.

Signed-off-by: Babu Moger <[email protected]>
---
kernel/watchdog.c | 69 +++++++++++++++++++++++++++++++++++-----------------
1 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9acb29f..2d0765b 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -46,7 +46,7 @@

static DEFINE_MUTEX(watchdog_proc_mutex);

-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
static unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
#else
static unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
@@ -585,15 +585,11 @@ static void watchdog(unsigned int cpu)
*/
static unsigned long cpu0_err;

-static int watchdog_nmi_enable(unsigned int cpu)
+static int arch_watchdog_nmi_enable(unsigned int cpu)
{
struct perf_event_attr *wd_attr;
struct perf_event *event = per_cpu(watchdog_ev, cpu);

- /* nothing to do if the hard lockup detector is disabled */
- if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
- goto out;
-
/* is it already setup and enabled? */
if (event && event->state > PERF_EVENT_STATE_OFF)
goto out;
@@ -619,18 +615,6 @@ static int watchdog_nmi_enable(unsigned int cpu)
goto out_save;
}

- /*
- * Disable the hard lockup detector if _any_ CPU fails to set up
- * set up the hardware perf event. The watchdog() function checks
- * the NMI_WATCHDOG_ENABLED bit periodically.
- *
- * The barriers are for syncing up watchdog_enabled across all the
- * cpus, as clear_bit() does not use barriers.
- */
- smp_mb__before_atomic();
- clear_bit(NMI_WATCHDOG_ENABLED_BIT, &watchdog_enabled);
- smp_mb__after_atomic();
-
/* skip displaying the same error again */
if (cpu > 0 && (PTR_ERR(event) == cpu0_err))
return PTR_ERR(event);
@@ -645,8 +629,6 @@ static int watchdog_nmi_enable(unsigned int cpu)
pr_err("disabled (cpu%i): unable to create perf event: %ld\n",
cpu, PTR_ERR(event));

- pr_info("Shutting down hard lockup detector on all cpus\n");
-
return PTR_ERR(event);

/* success path */
@@ -658,7 +640,7 @@ out:
return 0;
}

-static void watchdog_nmi_disable(unsigned int cpu)
+static void arch_watchdog_nmi_disable(unsigned int cpu)
{
struct perf_event *event = per_cpu(watchdog_ev, cpu);

@@ -676,8 +658,13 @@ static void watchdog_nmi_disable(unsigned int cpu)
}

#else
-static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
-static void watchdog_nmi_disable(unsigned int cpu) { return; }
+/*
+ * These two functions are mostly architecture specific
+ * defining them as weak here.
+ */
+int __weak arch_watchdog_nmi_enable(unsigned int cpu) { return 0; }
+void __weak arch_watchdog_nmi_disable(unsigned int cpu) { return; }
+
#endif /* CONFIG_HARDLOCKUP_DETECTOR */

static struct smp_hotplug_thread watchdog_threads = {
@@ -781,6 +768,42 @@ void lockup_detector_resume(void)
put_online_cpus();
}

+void watchdog_nmi_disable(unsigned int cpu)
+{
+ arch_watchdog_nmi_disable(cpu);
+}
+
+int watchdog_nmi_enable(unsigned int cpu)
+{
+ int err;
+
+ /* nothing to do if the hard lockup detector is disabled */
+ if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+ return 0;
+
+ err = arch_watchdog_nmi_enable(cpu);
+
+ if (err) {
+ /*
+ * Disable the hard lockup detector if _any_ CPU fails to set up
+ * set up the hardware perf event. The watchdog() function checks
+ * the NMI_WATCHDOG_ENABLED bit periodically.
+ *
+ * The barriers are for syncing up watchdog_enabled across all the
+ * cpus, as clear_bit() does not use barriers.
+ */
+ smp_mb__before_atomic();
+ clear_bit(NMI_WATCHDOG_ENABLED_BIT, &watchdog_enabled);
+ smp_mb__after_atomic();
+
+ pr_info("Shutting down hard lockup detector on all cpus\n");
+
+ return err;
+ }
+
+ return 0;
+}
+
static int update_watchdog_all_cpus(void)
{
int ret;
--
1.7.1

2016-10-18 20:03:10

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] Introduce arch specific nmi enable, disable handlers

On Tue, Oct 18, 2016 at 12:14:08PM -0700, Babu Moger wrote:
> During our testing we noticed that nmi watchdogs in sparc could not be disabled or
> enabled dynamically using sysctl/proc interface. Sparc uses its own arch specific
> nmi watchdogs. There is a sysctl and proc interface(proc/sys/kernel/nmi_watchdog)
> to enable/disable nmi watchdogs. However, that is not working for sparc. There
> is no interface to feed this parameter to arch specific nmi watchdogs.
>
> These patches extend the same sysctl/proc interface to enable or disable
> these arch specific nmi watchdogs dynamically. Introduced new functions
> arch_watchdog_nmi_enable and arch_watchdog_nmi_disable which can be implemented
> in arch specific handlers.
> If you think there is a better way to do this. Please advice.
>
> Tested on sparc. Compile tested on x86.

Thanks Babu!

Tested-and-Reviewed-by: Don Zickus <[email protected]>

>
> v3:
> Made one more change per Don Zickus comments.
> Moved failure path messages to into generic code inside watchdog_nmi_enable.
> Also added matching prints in sparc to warn about the failure.
>
> v2:
> a)Sam Ravnborg's comments about making the definitions visible.
> With the new approach we dont need those definitions((NMI_WATCHDOG_ENABLED,
> SOFT_WATCHDOG_ENABLED etc..) outside watchdog.c. So no action.
>
> b) Made changes per Don Zickus comments.
> Don, I could not use your patches as is. Reason is sparc does not define
> CONFIG_HARDLOCKUP_DETECTOR. So, defining default __weak function did not
> work for me. However, I have used your idea to define __weak functions
> arch_watchdog_nmi_enable and arch_watchdog_nmi_disable when CONFIG_HARDLOCKUP_DETECTOR
> is not defined. I feel this should have very less impact on the races you are
> concerned about. Please take a look. Feel free to suggest.
>
> Patch2 changes: I had to introduce new variable nmi_init_done to synchronize
> watchdog thread and kernel init thread.
>
> v1:
> Initial version. Discussion thread here
> http://www.mail-archive.com/[email protected]/msg1245427.html
>
> Babu Moger (2):
> watchdog: Introduce arch_watchdog_nmi_enable and
> arch_watchdog_nmi_disable
> sparc: Implement arch_watchdog_nmi_enable and
> arch_watchdog_nmi_disable
>
> arch/sparc/kernel/nmi.c | 44 +++++++++++++++++++++++++++++-
> kernel/watchdog.c | 69 +++++++++++++++++++++++++++++++---------------
> 2 files changed, 89 insertions(+), 24 deletions(-)
>