Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
only called in watchdog thread context. However, the following commits
utilize these functions outside of watchdog thread context too.
commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
Author: Michal Hocko <[email protected]>
Date: Tue Sep 24 15:27:30 2013 -0700
watchdog: update watchdog_thresh properly
commit b3738d29323344da3017a91010530cf3a58590fc
Author: Stephane Eranian <[email protected]>
Date: Mon Nov 17 20:07:03 2014 +0100
watchdog: Add watchdog enable/disable all functions
Hence, it is now possible that these functions execute concurrently with
the same 'cpu' argument. This concurrency is problematic because per-cpu
'watchdog_ev' can be accessed/modified without adequate synchronization.
The patch series aims to address the above problem. However, instead of
introducing locks to protect the per-cpu 'watchdog_ev', a different approach
is taken: invoke these functions by parking and unparking the watchdog
threads (to ensure they are always called in watchdog thread context).
static struct smp_hotplug_thread watchdog_threads = {
...
.park = watchdog_disable, // calls watchdog_nmi_disable()
.unpark = watchdog_enable, // calls watchdog_nmi_enable()
};
Both previously mentioned commits call these functions in a similar way
and thus in principle contain some duplicate code. The patch series also
avoids this duplication by providing a commonly usable mechanism.
- Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
are intended to be called inside of kernel/watchdog.c only.
- Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
be utilized by external callers to deactivate the hard and soft lockup
detector temporarily.
- Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
- Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
A few corner cases should be mentioned here for completeness.
- kthread_park() of watchdog/N could hang if cpu N is already locked up.
However, if the watchdog is enabled, the lockup will be detected anyway.
- kthread_unpark() of watchdog/N could hang if cpu N got locked up after
kthread_park(). This scenario should be _very_ rare in practice, in
particular because temporary deactivation is not expected to happen
frequently, and if it does happen, its duration is expected to be short.
Ulrich Obergfell (4):
watchdog: introduce watchdog_park_threads() and
watchdog_unpark_threads()
watchdog: introduce watchdog_suspend() and watchdog_resume()
watchdog: use park/unpark functions in update_watchdog_all_cpus()
watchdog: use suspend/resume interface in fixup_ht_bug()
arch/x86/kernel/cpu/perf_event_intel.c | 9 +-
include/linux/nmi.h | 2 +
include/linux/watchdog.h | 8 --
kernel/watchdog.c | 157 +++++++++++++++++++--------------
4 files changed, 100 insertions(+), 76 deletions(-)
--
1.7.11.7
These functions are intended to be used only from inside kernel/watchdog.c
to park/unpark all watchdog threads that are specified in watchdog_cpumask.
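For illustration, the intended internal call pattern is sketched below (the
function name here is made up; update_watchdog_all_cpus() in patch 3/4 uses
the functions in exactly this way):

	/* sketch only: park and then unpark all watchdog threads */
	static void example_restart_watchdog_threads(void)
	{
		watchdog_park_threads();	/* runs .park (watchdog_disable) on each cpu */
		watchdog_unpark_threads();	/* runs .unpark (watchdog_enable) on each cpu */
	}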
Signed-off-by: Ulrich Obergfell <[email protected]>
---
kernel/watchdog.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a6ffa43..5571f20 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -24,6 +24,7 @@
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
#include <linux/perf_event.h>
+#include <linux/kthread.h>
/*
* The run state of the lockup detectors is controlled by the content of the
@@ -666,6 +667,41 @@ static struct smp_hotplug_thread watchdog_threads = {
.unpark = watchdog_enable,
};
+/*
+ * park all watchdog threads that are specified in 'watchdog_cpumask'
+ */
+static int watchdog_park_threads(void)
+{
+ int cpu, ret = 0;
+
+ get_online_cpus();
+ for_each_watchdog_cpu(cpu) {
+ ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
+ if (ret)
+ break;
+ }
+ if (ret) {
+ for_each_watchdog_cpu(cpu)
+ kthread_unpark(per_cpu(softlockup_watchdog, cpu));
+ }
+ put_online_cpus();
+
+ return ret;
+}
+
+/*
+ * unpark all watchdog threads that are specified in 'watchdog_cpumask'
+ */
+static void watchdog_unpark_threads(void)
+{
+ int cpu;
+
+ get_online_cpus();
+ for_each_watchdog_cpu(cpu)
+ kthread_unpark(per_cpu(softlockup_watchdog, cpu));
+ put_online_cpus();
+}
+
static void restart_watchdog_hrtimer(void *info)
{
struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
--
1.7.11.7
This interface can be utilized to deactivate the hard and soft lockup
detector temporarily. Callers are expected to minimize the duration of
deactivation. Multiple deactivations are allowed to occur in parallel
but should be rare in practice.
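For illustration, a minimal caller sketch (the function name is made up; the
error-handling pattern follows the fixup_ht_bug() change in patch 4/4):

	static void example_temporarily_disable_lockup_detector(void)
	{
		/* park the watchdog threads; fails only if a thread has already exited */
		if (watchdog_suspend() != 0)
			return;

		/* ... work that must not race with the hard/soft lockup detector ... */

		/* threads are unparked once the last active suspend request resumes */
		watchdog_resume();
	}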
Signed-off-by: Ulrich Obergfell <[email protected]>
---
include/linux/nmi.h | 2 ++
kernel/watchdog.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 67 insertions(+)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index f94da0e..60050c2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
extern int proc_watchdog_cpumask(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+extern int watchdog_suspend(void);
+extern void watchdog_resume(void);
#endif
#ifdef CONFIG_HAVE_ACPI_APEI_NMI
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5571f20..98d44b1 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -67,6 +67,7 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
#define for_each_watchdog_cpu(cpu) \
for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
+static int __read_mostly watchdog_suspended = 0;
static int __read_mostly watchdog_running;
static u64 __read_mostly sample_period;
@@ -702,6 +703,50 @@ static void watchdog_unpark_threads(void)
put_online_cpus();
}
+/*
+ * Suspend the hard and soft lockup detector by parking the watchdog threads.
+ */
+int watchdog_suspend(void)
+{
+ int ret = 0;
+
+ mutex_lock(&watchdog_proc_mutex);
+ /*
+ * Multiple suspend requests can be active in parallel (counted by
+ * the 'watchdog_suspended' variable). If the watchdog threads are
+ * running, the first caller takes care that they will be parked.
+ * The state of 'watchdog_running' cannot change while a suspend
+ * request is active (see related changes in 'proc' handlers).
+ */
+ if (watchdog_running && !watchdog_suspended)
+ ret = watchdog_park_threads();
+
+ if (ret == 0)
+ watchdog_suspended++;
+
+ mutex_unlock(&watchdog_proc_mutex);
+
+ return ret;
+}
+
+/*
+ * Resume the hard and soft lockup detector by unparking the watchdog threads.
+ */
+void watchdog_resume(void)
+{
+ mutex_lock(&watchdog_proc_mutex);
+
+ watchdog_suspended--;
+ /*
+ * The watchdog threads are unparked if they were previously running
+ * and if there is no more active suspend request.
+ */
+ if (watchdog_running && !watchdog_suspended)
+ watchdog_unpark_threads();
+
+ mutex_unlock(&watchdog_proc_mutex);
+}
+
static void restart_watchdog_hrtimer(void *info)
{
struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
@@ -823,6 +868,12 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
mutex_lock(&watchdog_proc_mutex);
+ if (watchdog_suspended) {
+ /* no parameter changes allowed while watchdog is suspended */
+ err = -EAGAIN;
+ goto out;
+ }
+
/*
* If the parameter is being read return the state of the corresponding
* bit(s) in 'watchdog_enabled', else update 'watchdog_enabled' and the
@@ -908,6 +959,12 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
mutex_lock(&watchdog_proc_mutex);
+ if (watchdog_suspended) {
+ /* no parameter changes allowed while watchdog is suspended */
+ err = -EAGAIN;
+ goto out;
+ }
+
old = ACCESS_ONCE(watchdog_thresh);
err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
@@ -939,6 +996,13 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
int err;
mutex_lock(&watchdog_proc_mutex);
+
+ if (watchdog_suspended) {
+ /* no parameter changes allowed while watchdog is suspended */
+ err = -EAGAIN;
+ goto out;
+ }
+
err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
if (!err && write) {
/* Remove impossible cpus to keep sysctl output cleaner. */
@@ -956,6 +1020,7 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
pr_err("cpumask update failed\n");
}
}
+out:
mutex_unlock(&watchdog_proc_mutex);
return err;
}
--
1.7.11.7
Remove update_watchdog() and restart_watchdog_hrtimer() since these
functions are no longer needed. Changes of parameters such as the
sample period are honored at the time when the watchdog threads are
being unparked.
Signed-off-by: Ulrich Obergfell <[email protected]>
---
kernel/watchdog.c | 40 ++--------------------------------------
1 file changed, 2 insertions(+), 38 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 98d44b1..3618e93 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -747,46 +747,10 @@ void watchdog_resume(void)
mutex_unlock(&watchdog_proc_mutex);
}
-static void restart_watchdog_hrtimer(void *info)
-{
- struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
- int ret;
-
- /*
- * No need to cancel and restart hrtimer if it is currently executing
- * because it will reprogram itself with the new period now.
- * We should never see it unqueued here because we are running per-cpu
- * with interrupts disabled.
- */
- ret = hrtimer_try_to_cancel(hrtimer);
- if (ret == 1)
- hrtimer_start(hrtimer, ns_to_ktime(sample_period),
- HRTIMER_MODE_REL_PINNED);
-}
-
-static void update_watchdog(int cpu)
-{
- /*
- * Make sure that perf event counter will adopt to a new
- * sampling period. Updating the sampling period directly would
- * be much nicer but we do not have an API for that now so
- * let's use a big hammer.
- * Hrtimer will adopt the new period on the next tick but this
- * might be late already so we have to restart the timer as well.
- */
- watchdog_nmi_disable(cpu);
- smp_call_function_single(cpu, restart_watchdog_hrtimer, NULL, 1);
- watchdog_nmi_enable(cpu);
-}
-
static void update_watchdog_all_cpus(void)
{
- int cpu;
-
- get_online_cpus();
- for_each_watchdog_cpu(cpu)
- update_watchdog(cpu);
- put_online_cpus();
+ watchdog_park_threads();
+ watchdog_unpark_threads();
}
static int watchdog_enable_all_cpus(void)
--
1.7.11.7
Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all()
since these functions are no longer needed. If a subsystem has a
need to deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.
Signed-off-by: Ulrich Obergfell <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 9 +++++---
include/linux/watchdog.h | 8 -------
kernel/watchdog.c | 38 ----------------------------------
3 files changed, 6 insertions(+), 49 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b9826a9..d4e1b0c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -12,7 +12,7 @@
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/export.h>
-#include <linux/watchdog.h>
+#include <linux/nmi.h>
#include <asm/cpufeature.h>
#include <asm/hardirq.h>
@@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
return 0;
}
- watchdog_nmi_disable_all();
+ if (watchdog_suspend() != 0) {
+ pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
+ return 0;
+ }
x86_pmu.flags &= ~(PMU_FL_EXCL_CNTRS | PMU_FL_EXCL_ENABLED);
@@ -3376,7 +3379,7 @@ static __init int fixup_ht_bug(void)
x86_pmu.commit_scheduling = NULL;
x86_pmu.stop_scheduling = NULL;
- watchdog_nmi_enable_all();
+ watchdog_resume();
get_online_cpus();
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index f47fead..d74a0e9 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -140,12 +140,4 @@ extern int watchdog_init_timeout(struct watchdog_device *wdd,
extern int watchdog_register_device(struct watchdog_device *);
extern void watchdog_unregister_device(struct watchdog_device *);
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
-void watchdog_nmi_disable_all(void);
-void watchdog_nmi_enable_all(void);
-#else
-static inline void watchdog_nmi_disable_all(void) {}
-static inline void watchdog_nmi_enable_all(void) {}
-#endif
-
#endif /* ifndef _LINUX_WATCHDOG_H */
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3618e93..bf751ef 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -614,47 +614,9 @@ static void watchdog_nmi_disable(unsigned int cpu)
cpu0_err = 0;
}
}
-
-void watchdog_nmi_enable_all(void)
-{
- int cpu;
-
- mutex_lock(&watchdog_proc_mutex);
-
- if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
- goto unlock;
-
- get_online_cpus();
- for_each_watchdog_cpu(cpu)
- watchdog_nmi_enable(cpu);
- put_online_cpus();
-
-unlock:
- mutex_unlock(&watchdog_proc_mutex);
-}
-
-void watchdog_nmi_disable_all(void)
-{
- int cpu;
-
- mutex_lock(&watchdog_proc_mutex);
-
- if (!watchdog_running)
- goto unlock;
-
- get_online_cpus();
- for_each_watchdog_cpu(cpu)
- watchdog_nmi_disable(cpu);
- put_online_cpus();
-
-unlock:
- mutex_unlock(&watchdog_proc_mutex);
-}
#else
static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
static void watchdog_nmi_disable(unsigned int cpu) { return; }
-void watchdog_nmi_enable_all(void) {}
-void watchdog_nmi_disable_all(void) {}
#endif /* CONFIG_HARDLOCKUP_DETECTOR */
static struct smp_hotplug_thread watchdog_threads = {
--
1.7.11.7
On Sat, Aug 01, 2015 at 02:49:23PM +0200, Ulrich Obergfell wrote:
> This interface can be utilized to deactivate the hard and soft lockup
> detector temporarily. Callers are expected to minimize the duration of
> deactivation. Multiple deactivations are allowed to occur in parallel
> but should be rare in practice.
>
> Signed-off-by: Ulrich Obergfell <[email protected]>
> ---
> include/linux/nmi.h | 2 ++
> kernel/watchdog.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 67 insertions(+)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index f94da0e..60050c2 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
> void __user *, size_t *, loff_t *);
> extern int proc_watchdog_cpumask(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> +extern int watchdog_suspend(void);
> +extern void watchdog_resume(void);
How about nmi_watchdog_enable() and nmi_watchdog_disable() to avoid confusion
with the watchdog subsystem ?
Thanks,
Guenter
> ----- Original Message -----
> From: "Guenter Roeck" <[email protected]>
> ...
> Subject: Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
>
> On Sat, Aug 01, 2015 at 02:49:23PM +0200, Ulrich Obergfell wrote:
>> This interface can be utilized to deactivate the hard and soft lockup
>> detector temporarily. Callers are expected to minimize the duration of
>> deactivation. Multiple deactivations are allowed to occur in parallel
>> but should be rare in practice.
>>
>> Signed-off-by: Ulrich Obergfell <[email protected]>
>> ---
>> include/linux/nmi.h | 2 ++
>> kernel/watchdog.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 67 insertions(+)
>>
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index f94da0e..60050c2 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>> void __user *, size_t *, loff_t *);
>> extern int proc_watchdog_cpumask(struct ctl_table *, int,
>> void __user *, size_t *, loff_t *);
>> +extern int watchdog_suspend(void);
>> +extern void watchdog_resume(void);
>
> How about nmi_watchdog_enable() and nmi_watchdog_disable() to avoid confusion
> with the watchdog subsystem ?
Guenter,
Good point. However, I would like to avoid the 'nmi_' prefix in the
function names as it could be misleading. watchdog_{suspend|resume}
affect both the hard and the soft lockup detector, so I think function
names like
lockup_detector_suspend() instead of watchdog_suspend()
lockup_detector_resume() instead of watchdog_resume()
would summarize better what these functions are intended to be used
for. The above names would also be consistent with the existing name
lockup_detector_init()
Many Thanks,
Uli
On Sat, Aug 01, 2015 at 10:39:22AM -0400, Ulrich Obergfell wrote:
> > ----- Original Message -----
> > From: "Guenter Roeck" <[email protected]>
> > ...
> > Subject: Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
> >
> > On Sat, Aug 01, 2015 at 02:49:23PM +0200, Ulrich Obergfell wrote:
> >> This interface can be utilized to deactivate the hard and soft lockup
> >> detector temporarily. Callers are expected to minimize the duration of
> >> deactivation. Multiple deactivations are allowed to occur in parallel
> >> but should be rare in practice.
> >>
> >> Signed-off-by: Ulrich Obergfell <[email protected]>
> >> ---
> >> include/linux/nmi.h | 2 ++
> >> kernel/watchdog.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 2 files changed, 67 insertions(+)
> >>
> >> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> >> index f94da0e..60050c2 100644
> >> --- a/include/linux/nmi.h
> >> +++ b/include/linux/nmi.h
> >> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
> >> void __user *, size_t *, loff_t *);
> >> extern int proc_watchdog_cpumask(struct ctl_table *, int,
> >> void __user *, size_t *, loff_t *);
> >> +extern int watchdog_suspend(void);
> >> +extern void watchdog_resume(void);
> >
> > How about nmi_watchdog_enable() and nmi_watchdog_disable() to avoid confusion
> > with the watchdog subsystem ?
>
> Guenter,
>
> Good point. However, I would like to avoid the 'nmi_' prefix in the
> function names as it could be misleading. watchdog_{suspend|resume}
> affect both -the hard and soft lockup detector- so I think function
> names like
>
> lockup_detector_suspend() instead of watchdog_suspend()
> lockup_detector_resume() instead of watchdog_resume()
>
> would summarize better what these functions are intended to be used
> for. The above names would also be consistent with the existing name
>
> lockup_detector_init()
>
Makes sense.
Thanks,
Guenter
On Sat 2015-08-01 14:49 +0200, Ulrich Obergfell wrote:
> Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
> only called in watchdog thread context. However, the following commits
> utilize these functions outside of watchdog thread context too.
>
> commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
> Author: Michal Hocko <[email protected]>
> Date: Tue Sep 24 15:27:30 2013 -0700
>
> watchdog: update watchdog_thresh properly
>
> commit b3738d29323344da3017a91010530cf3a58590fc
> Author: Stephane Eranian <[email protected]>
> Date: Mon Nov 17 20:07:03 2014 +0100
>
> watchdog: Add watchdog enable/disable all functions
>
> Hence, it is now possible that these functions execute concurrently with
> the same 'cpu' argument. This concurrency is problematic because per-cpu
> 'watchdog_ev' can be accessed/modified without adequate synchronization.
>
> The patch series aims to address the above problem. However, instead of
> introducing locks to protect per-cpu 'watchdog_ev' a different approach
> is taken: Invoke these functions by parking and unparking the watchdog
> threads (to ensure they are always called in watchdog thread context).
>
> static struct smp_hotplug_thread watchdog_threads = {
> ...
> .park = watchdog_disable, // calls watchdog_nmi_disable()
> .unpark = watchdog_enable, // calls watchdog_nmi_enable()
> };
>
> Both previously mentioned commits call these functions in a similar way
> and thus in principle contain some duplicate code. The patch series also
> avoids this duplication by providing a commonly usable mechanism.
>
> - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
> park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
> are intended to be called inside of kernel/watchdog.c only.
>
> - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
> be utilized by external callers to deactivate the hard and soft lockup
> detector temporarily.
>
> - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
> that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
>
> - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
> was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
>
> A few corner cases should be mentioned here for completeness.
>
> - kthread_park() of watchdog/N could hang if cpu N is already locked up.
> However, if watchdog is enabled the lockup will be detected anyway.
>
> - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
> kthread_park(). The occurrence of this scenario should be _very_ rare
> in practice, in particular because it is not expected that temporary
> deactivation will happen frequently, and if it happens at all it is
> expected that the duration of deactivation will be short.
>
> Ulrich Obergfell (4):
> watchdog: introduce watchdog_park_threads() and
> watchdog_unpark_threads()
> watchdog: introduce watchdog_suspend() and watchdog_resume()
> watchdog: use park/unpark functions in update_watchdog_all_cpus()
> watchdog: use suspend/resume interface in fixup_ht_bug()
>
> arch/x86/kernel/cpu/perf_event_intel.c | 9 +-
> include/linux/nmi.h | 2 +
> include/linux/watchdog.h | 8 --
> kernel/watchdog.c | 157 +++++++++++++++++++--------------
> 4 files changed, 100 insertions(+), 76 deletions(-)
>
The whole patch series looks good to me.
Reviewed-by: Aaron Tomlin <[email protected]>
On Sat, Aug 01, 2015 at 02:49:21PM +0200, Ulrich Obergfell wrote:
> Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
> only called in watchdog thread context. However, the following commits
> utilize these functions outside of watchdog thread context too.
>
> commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
> Author: Michal Hocko <[email protected]>
> Date: Tue Sep 24 15:27:30 2013 -0700
>
> watchdog: update watchdog_thresh properly
>
> commit b3738d29323344da3017a91010530cf3a58590fc
> Author: Stephane Eranian <[email protected]>
> Date: Mon Nov 17 20:07:03 2014 +0100
>
> watchdog: Add watchdog enable/disable all functions
>
> Hence, it is now possible that these functions execute concurrently with
> the same 'cpu' argument. This concurrency is problematic because per-cpu
> 'watchdog_ev' can be accessed/modified without adequate synchronization.
>
> The patch series aims to address the above problem. However, instead of
> introducing locks to protect per-cpu 'watchdog_ev' a different approach
> is taken: Invoke these functions by parking and unparking the watchdog
> threads (to ensure they are always called in watchdog thread context).
>
> static struct smp_hotplug_thread watchdog_threads = {
> ...
> .park = watchdog_disable, // calls watchdog_nmi_disable()
> .unpark = watchdog_enable, // calls watchdog_nmi_enable()
> };
>
> Both previously mentioned commits call these functions in a similar way
> and thus in principle contain some duplicate code. The patch series also
> avoids this duplication by providing a commonly usable mechanism.
>
> - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
> park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
> are intended to be called inside of kernel/watchdog.c only.
>
> - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
> be utilized by external callers to deactivate the hard and soft lockup
> detector temporarily.
>
> - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
> that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
>
> - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
> was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
>
> A few corner cases should be mentioned here for completeness.
>
> - kthread_park() of watchdog/N could hang if cpu N is already locked up.
> However, if watchdog is enabled the lockup will be detected anyway.
>
> - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
> kthread_park(). The occurrence of this scenario should be _very_ rare
> in practice, in particular because it is not expected that temporary
> deactivation will happen frequently, and if it happens at all it is
> expected that the duration of deactivation will be short.
>
> Ulrich Obergfell (4):
> watchdog: introduce watchdog_park_threads() and
> watchdog_unpark_threads()
> watchdog: introduce watchdog_suspend() and watchdog_resume()
> watchdog: use park/unpark functions in update_watchdog_all_cpus()
> watchdog: use suspend/resume interface in fixup_ht_bug()
tested the same way as I did for my other patch for this issue:
http://marc.info/?l=linux-kernel&m=143834383400803&w=2
works nicely ;-)
thanks,
jirka
On Sat 01-08-15 14:49:22, Ulrich Obergfell wrote:
> These functions are intended to be used only from inside kernel/watchdog.c
> to park/unpark all watchdog threads that are specified in watchdog_cpumask.
I would suggest merging this into Patch2. It is usually better to add
new functions along with their users.
> Signed-off-by: Ulrich Obergfell <[email protected]>
> ---
> kernel/watchdog.c | 36 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 36 insertions(+)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index a6ffa43..5571f20 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -24,6 +24,7 @@
> #include <asm/irq_regs.h>
> #include <linux/kvm_para.h>
> #include <linux/perf_event.h>
> +#include <linux/kthread.h>
>
> /*
> * The run state of the lockup detectors is controlled by the content of the
> @@ -666,6 +667,41 @@ static struct smp_hotplug_thread watchdog_threads = {
> .unpark = watchdog_enable,
> };
>
> +/*
> + * park all watchdog threads that are specified in 'watchdog_cpumask'
> + */
> +static int watchdog_park_threads(void)
> +{
> + int cpu, ret = 0;
> +
> + get_online_cpus();
> + for_each_watchdog_cpu(cpu) {
> + ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
> + if (ret)
> + break;
> + }
> + if (ret) {
> + for_each_watchdog_cpu(cpu)
> + kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> + }
> + put_online_cpus();
> +
> + return ret;
> +}
> +
> +/*
> + * unpark all watchdog threads that are specified in 'watchdog_cpumask'
> + */
> +static void watchdog_unpark_threads(void)
> +{
> + int cpu;
> +
> + get_online_cpus();
> + for_each_watchdog_cpu(cpu)
> + kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> + put_online_cpus();
> +}
> +
> static void restart_watchdog_hrtimer(void *info)
> {
> struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Michal Hocko
SUSE Labs
On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
[...]
> @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
> return 0;
> }
>
> - watchdog_nmi_disable_all();
> + if (watchdog_suspend() != 0) {
> + pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
> + return 0;
> + }
Is this really worth reporting to the log? What is an admin supposed to
do about it?
<looking into the code>
Ok, so kthread_park fails only when the kernel thread has already
exited. Can this ever happen during this call path?
--
Michal Hocko
SUSE Labs
On Tue, Aug 04, 2015 at 03:31:30PM +0200, Michal Hocko wrote:
> On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
> [...]
> > @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
> > return 0;
> > }
> >
> > - watchdog_nmi_disable_all();
> > + if (watchdog_suspend() != 0) {
> > + pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
> > + return 0;
> > + }
>
> Is this really worth reporting to the log? What is an admin supposed to
> do about it?
I think it was more for developers to aid in debugging a strange behaviour
of the performance counters.
> <looking into the code>
> Ok, so kthread_park fails only when the kernel thread has already
> exited. Can this ever happen during this call path?
It might be overkill, but it is just a harmless informational failure
message.
Cheers,
Don
On Tue 04-08-15 10:27:50, Don Zickus wrote:
> On Tue, Aug 04, 2015 at 03:31:30PM +0200, Michal Hocko wrote:
> > On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
> > [...]
> > > @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
> > > return 0;
> > > }
> > >
> > > - watchdog_nmi_disable_all();
> > > + if (watchdog_suspend() != 0) {
> > > + pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
> > > + return 0;
> > > + }
> >
> > Is this really worth reporting to the log? What is an admin supposed to
> > do about it?
>
> I think it was more for developers to aid in debugging a strange behaviour
> of the performance counters.
pr_debug then?
> > <looking into the code>
> > Ok, so kthread_park fails only when the kernel thread has already
> > exited. Can this ever happen during this call path?
>
> It might be overkill, but it is just a harmless informational failure
> message.
Maybe we have way too many of those harmless informational failure
messages that admins scratch their heads about...
--
Michal Hocko
SUSE Labs
> ----- Original Message -----
> From: "Don Zickus" <[email protected]>
...
> On Tue, Aug 04, 2015 at 03:31:30PM +0200, Michal Hocko wrote:
>> On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
>> [...]
>> > @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
>> > return 0;
>> > }
>> >
>> > - watchdog_nmi_disable_all();
>> > + if (watchdog_suspend() != 0) {
>> > + pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
>> > + return 0;
>> > + }
>>
>> Is this really worth reporting to the log? What is an admin supposed to
>> do about it?
>
> I think it was more for developers to aid in debugging a strange behaviour
> of the performance counters.
>
>> <looking into the code>
>> Ok, so kthread_park fails only when the kernel thread has already
>> exited. Can this ever happen during this call path?
>
> It might be overkill, but it is just a harmless informational failure
> message.
Don, Michal,
The module prints a message both when the workaround is enabled and when
it is disabled. Hence, I think we should keep the messages consistent and
thus also inform the user if we fail to disable the workaround. Even
though at the moment this seems to be an unlikely
failure case, I agree with Don that the message could be useful in
debugging.
Regards,
Uli
> ----- Original Message -----
> From: "Michal Hocko" <[email protected]>
...
> On Sat 01-08-15 14:49:22, Ulrich Obergfell wrote:
>> These functions are intended to be used only from inside kernel/watchdog.c
>> to park/unpark all watchdog threads that are specified in watchdog_cpumask.
>
> I would suggest merging this into Patch2. It is usually better to add
> new functions along with their users.
Michal,
watchdog_{park|unpark}_threads are called by watchdog_{suspend|resume}
in Patch 2/4 and by update_watchdog_all_cpus() in Patch 3/4, so I would
have to merge three patches into one, and I would end up with one large
patch and one smaller patch. I think the result would be harder to read,
review and understand. Thus I'd prefer to leave the patches split as they
currently are.
Regards,
Uli
On Sat, 1 Aug 2015 14:49:23 +0200 Ulrich Obergfell <[email protected]> wrote:
> This interface can be utilized to deactivate the hard and soft lockup
> detector temporarily. Callers are expected to minimize the duration of
> deactivation. Multiple deactivations are allowed to occur in parallel
> but should be rare in practice.
>
> ...
>
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
> void __user *, size_t *, loff_t *);
> extern int proc_watchdog_cpumask(struct ctl_table *, int,
> void __user *, size_t *, loff_t *);
> +extern int watchdog_suspend(void);
> +extern void watchdog_resume(void);
> #endif
>
> #ifdef CONFIG_HAVE_ACPI_APEI_NMI
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5571f20..98d44b1 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -67,6 +67,7 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> #define for_each_watchdog_cpu(cpu) \
> for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>
> +static int __read_mostly watchdog_suspended = 0;
With my compiler the "= 0" increases the size of watchdog.o data. For
some reason by 16 bytes(!).
> static int __read_mostly watchdog_running;
> static u64 __read_mostly sample_period;
The relationship between watchdog_running and watchdog_suspended hurts
my brain a bit. It appears that all watchdog_running transitions
happen under watchdog_proc_mutex so I don't think it's racy, but I
wonder if things would be simpler if these were folded into a single
up/down counter.
> ----- Original Message -----
> From: "Andrew Morton" <[email protected]>
> ...
> On Sat, 1 Aug 2015 14:49:23 +0200 Ulrich Obergfell <[email protected]> wrote:
>
>> This interface can be utilized to deactivate the hard and soft lockup
>> detector temporarily. Callers are expected to minimize the duration of
>> deactivation. Multiple deactivations are allowed to occur in parallel
>> but should be rare in practice.
>>
>> ...
>>
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>> void __user *, size_t *, loff_t *);
>> extern int proc_watchdog_cpumask(struct ctl_table *, int,
>> void __user *, size_t *, loff_t *);
>> +extern int watchdog_suspend(void);
>> +extern void watchdog_resume(void);
>> #endif
>>
>> #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 5571f20..98d44b1 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -67,6 +67,7 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>> #define for_each_watchdog_cpu(cpu) \
>> for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>>
>> +static int __read_mostly watchdog_suspended = 0;
>
> With my compiler the "= 0" increases the size of watchdog.o data. For
> some reason by 16 bytes(!).
I see that you already fixed this. Many Thanks.
>> static int __read_mostly watchdog_running;
>> static u64 __read_mostly sample_period;
>
> The relationship between watchdog_running and watchdog_suspended hurts
> my brain a bit. It appears that all watchdog_running transitions
> happen under watchdog_proc_mutex so I don't think it's racy, but I
> wonder if things would be simpler if these were folded into a single
> up/down counter.
The 'watchdog_running' variable indicates whether watchdog threads exist.
[Whether they have been launched via smpboot_register_percpu_thread().]
The 'watchdog_suspended' variable indicates whether existing threads are
currently parked; it counts the suspend requests that are active in
parallel (possibly from different callers).
I think folding them into one variable would not improve the readability
of the code, because instead of two variables with distinct semantics we
would then have one variable with multiple semantics (which I think would
also make the code more complex). If we want to improve readability, I'd
prefer to either just add a comment block explaining the semantics of the
variables or to rename them, for example:
watchdog_running to watchdog_threads_exist
watchdog_suspended to watchdog_suspend_count
Please let me know if you would want any changes in this regard.
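For illustration, the comment-block option could look roughly like this
(a sketch only, not taken from the patch set):

	/*
	 * watchdog_running:   watchdog threads exist, i.e. they have been
	 *                     launched via smpboot_register_percpu_thread().
	 * watchdog_suspended: number of suspend requests that are currently
	 *                     active; while non-zero the threads stay parked
	 *                     and the 'proc' handlers return -EAGAIN for
	 *                     parameter changes.
	 */
	static int __read_mostly watchdog_running;
	static int __read_mostly watchdog_suspended;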
I also received these suggestions:
- rename the watchdog_{suspend|resume} functions as discussed in
http://marc.info/?l=linux-kernel&m=143844050610220&w=2
- use pr_debug() instead of pr_info() as discussed in
http://marc.info/?l=linux-kernel&m=143869949229461&w=2
Please let me know if you want me to post a 'version 2' of the patch set
or if you want me to post these changes as separate follow-up patches.
Regards,
Uli
On Sat, 1 Aug 2015 14:49:25 +0200 Ulrich Obergfell <[email protected]> wrote:
> Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all()
> since these functions are no longer needed. If a subsystem has a
> need to deactivate the watchdog temporarily, it should utilize the
> watchdog_suspend() and watchdog_resume() functions.
>
With x86_64 allnoconfig I'm getting
arch/x86/kernel/cpu/perf_event_intel.c: In function 'fixup_ht_bug':
arch/x86/kernel/cpu/perf_event_intel.c:3371: error: implicit declaration of function 'watchdog_suspend'
arch/x86/kernel/cpu/perf_event_intel.c:3382: error: implicit declaration of function 'watchdog_resume'
I had to mangle your patches fairly heavily to make them fit against
other pending changes. Specifically
http://ozlabs.org/~akpm/mmots/broken-out/watchdog-move-nmi-function-header-declarations-from-watchdogh-to-nmih.patch
and
http://ozlabs.org/~akpm/mmots/broken-out/watchdog-move-nmi-function-header-declarations-from-watchdogh-to-nmih-v2.patch.
But I don't *think* I caused this... I'm testing this:
--- a/include/linux/nmi.h~watchdog-use-suspend-resume-interface-in-fixup_ht_bug-fix
+++ a/include/linux/nmi.h
@@ -80,6 +80,15 @@ extern int proc_watchdog_cpumask(struct
void __user *, size_t *, loff_t *);
extern int watchdog_suspend(void);
extern void watchdog_resume(void);
+#else
+static inline int watchdog_suspend(void)
+{
+ return 0;
+}
+
+static inline void watchdog_resume(void)
+{
+}
#endif
#ifdef CONFIG_HAVE_ACPI_APEI_NMI