This patch set addresses various races in relation to CPU hotplug
and a race in relation to watchdog timer expiry. I discovered the
corner cases during code inspection. I haven't seen any of these
issues occur in practice.
Ulrich Obergfell (4):
watchdog: avoid race between lockup detector suspend/resume and CPU
hotplug
watchdog: avoid races between /proc handlers and CPU hotplug
watchdog: remove {get|put}_online_cpus() from
watchdog_{park|unpark}_threads()
watchdog: fix race between proc_watchdog_thresh() and
watchdog_timer_fn()
kernel/watchdog.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
--
1.7.11.7
The lockup detector suspend/resume interface that was introduced by
commit 8c073d27d7ad293bf734cc8475689413afadab81 does not protect
itself against races with CPU hotplug. Hence, theoretically it is
possible that a new watchdog thread is started on a hotplugged CPU
while the lockup detector is suspended, and the thread could thus
interfere unexpectedly with the code that requested to suspend the
lockup detector. Avoid the race by calling
get_online_cpus() in lockup_detector_suspend()
put_online_cpus() in lockup_detector_resume()
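For illustration, a simplified sketch of the resulting suspend/resume
pairing (not the exact kernel code; the handling of a failed park and
other details are abbreviated):

int lockup_detector_suspend(void)
{
        int ret = 0;

        get_online_cpus();              /* block CPU hotplug ...         */
        mutex_lock(&watchdog_proc_mutex);
        if (watchdog_running && !watchdog_suspended)
                ret = watchdog_park_threads();
        if (ret == 0)
                watchdog_suspended++;
        mutex_unlock(&watchdog_proc_mutex);

        return ret;
}

void lockup_detector_resume(void)
{
        mutex_lock(&watchdog_proc_mutex);
        watchdog_suspended--;
        if (watchdog_running && !watchdog_suspended)
                watchdog_unpark_threads();
        mutex_unlock(&watchdog_proc_mutex);
        put_online_cpus();              /* ... until the matching resume */
}

Because each suspend request takes its own hotplug reference and each
resume drops one, the protection also holds while multiple suspend
requests are active in parallel.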
Signed-off-by: Ulrich Obergfell <[email protected]>
---
kernel/watchdog.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0a23125..7357842 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -719,6 +719,7 @@ int lockup_detector_suspend(void)
{
int ret = 0;
+ get_online_cpus();
mutex_lock(&watchdog_proc_mutex);
/*
* Multiple suspend requests can be active in parallel (counted by
@@ -759,6 +760,7 @@ void lockup_detector_resume(void)
watchdog_unpark_threads();
mutex_unlock(&watchdog_proc_mutex);
+ put_online_cpus();
}
static int update_watchdog_all_cpus(void)
--
1.7.11.7
The handler functions for watchdog parameters in /proc/sys/kernel
do not protect themselves against races with CPU hotplug. Hence,
theoretically it is possible that a new watchdog thread is started
on a hotplugged CPU while a parameter is being modified, and the
thread could thus use a parameter value that is 'in transition'.
For example, if 'watchdog_thresh' is being set to zero (note: this
disables the lockup detectors), the thread would erroneously use the
value zero as the sample period.
To avoid such races and to keep the /proc handler code consistent,
call
{get|put}_online_cpus() in proc_watchdog_common()
{get|put}_online_cpus() in proc_watchdog_thresh()
{get|put}_online_cpus() in proc_watchdog_cpumask()
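For illustration, a hypothetical skeleton (proc_watchdog_skeleton() is
not a real function) of the locking pattern that the three handlers
now share:

static int proc_watchdog_skeleton(void)
{
        int err = 0;

        get_online_cpus();              /* no new watchdog thread can start */
        mutex_lock(&watchdog_proc_mutex);

        if (watchdog_suspended) {
                /* no parameter changes allowed while watchdog is suspended */
                err = -EAGAIN;
                goto out;
        }

        /* read or update the watchdog parameter here */
out:
        mutex_unlock(&watchdog_proc_mutex);
        put_online_cpus();
        return err;
}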
Signed-off-by: Ulrich Obergfell <[email protected]>
---
kernel/watchdog.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7357842..13fdda1 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -857,6 +857,7 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
int err, old, new;
int *watchdog_param = (int *)table->data;
+ get_online_cpus();
mutex_lock(&watchdog_proc_mutex);
if (watchdog_suspended) {
@@ -908,6 +909,7 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
}
out:
mutex_unlock(&watchdog_proc_mutex);
+ put_online_cpus();
return err;
}
@@ -949,6 +951,7 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
{
int err, old;
+ get_online_cpus();
mutex_lock(&watchdog_proc_mutex);
if (watchdog_suspended) {
@@ -974,6 +977,7 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
}
out:
mutex_unlock(&watchdog_proc_mutex);
+ put_online_cpus();
return err;
}
@@ -988,6 +992,7 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
{
int err;
+ get_online_cpus();
mutex_lock(&watchdog_proc_mutex);
if (watchdog_suspended) {
@@ -1015,6 +1020,7 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
}
out:
mutex_unlock(&watchdog_proc_mutex);
+ put_online_cpus();
return err;
}
--
1.7.11.7
watchdog_{park|unpark}_threads() are now called in code paths that
protect themselves against CPU hotplug, so the {get|put}_online_cpus()
calls inside these functions are redundant and can be removed.
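For illustration, a hypothetical caller (example_update_path() is not
a real function) showing the convention that now applies, namely that
the hotplug lock is taken by the caller rather than inside the
park/unpark helpers:

static void example_update_path(void)
{
        get_online_cpus();              /* caller provides hotplug protection */
        mutex_lock(&watchdog_proc_mutex);

        if (watchdog_park_threads() == 0) {
                /* change watchdog parameters while the threads are parked */
                watchdog_unpark_threads();
        }

        mutex_unlock(&watchdog_proc_mutex);
        put_online_cpus();
}

The real callers are the lockup detector suspend/resume path and the
/proc handlers updated in the previous two patches.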
Signed-off-by: Ulrich Obergfell <[email protected]>
---
kernel/watchdog.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 13fdda1..84c4744 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -683,33 +683,35 @@ static struct smp_hotplug_thread watchdog_threads = {
* be parked and the watchdog threads of other CPUs can still be runnable.
* Callers are expected to handle this special condition as appropriate in
* their context.
+ *
+ * This function may only be called in a context that is protected against
+ * races with CPU hotplug - for example, via get_online_cpus().
*/
static int watchdog_park_threads(void)
{
int cpu, ret = 0;
- get_online_cpus();
for_each_watchdog_cpu(cpu) {
ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
if (ret)
break;
}
- put_online_cpus();
return ret;
}
/*
* unpark all watchdog threads that are specified in 'watchdog_cpumask'
+ *
+ * This function may only be called in a context that is protected against
+ * races with CPU hotplug - for example, via get_online_cpus().
*/
static void watchdog_unpark_threads(void)
{
int cpu;
- get_online_cpus();
for_each_watchdog_cpu(cpu)
kthread_unpark(per_cpu(softlockup_watchdog, cpu));
- put_online_cpus();
}
/*
--
1.7.11.7
Theoretically it is possible that the watchdog timer expires right at
the time when a user sets 'watchdog_thresh' to zero (note: this
disables the lockup detectors). In this scenario, the is_softlockup()
function, which is called by the timer, could produce a false positive
because the soft lockup threshold derived from 'watchdog_thresh' would
be zero as well.
Fix this by also checking the current value of 'watchdog_thresh' in
is_softlockup().
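For illustration, the mechanism behind the false positive, sketched
with an abbreviated is_softlockup() after this change
(get_softlockup_thresh() scales watchdog_thresh, so it returns 0
whenever watchdog_thresh is 0):

static int get_softlockup_thresh(void)
{
        return watchdog_thresh * 2;
}

static int is_softlockup(unsigned long touch_ts)
{
        unsigned long now = get_timestamp();

        /*
         * Without the additional watchdog_thresh check, a threshold
         * that has just been set to 0 makes the time_after() test
         * succeed almost immediately, producing a bogus soft lockup
         * report.
         */
        if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh) {
                if (time_after(now, touch_ts + get_softlockup_thresh()))
                        return now - touch_ts;
        }
        return 0;
}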
Signed-off-by: Ulrich Obergfell <[email protected]>
---
kernel/watchdog.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 84c4744..18f34cf 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -289,7 +289,7 @@ static int is_softlockup(unsigned long touch_ts)
{
unsigned long now = get_timestamp();
- if (watchdog_enabled & SOFT_WATCHDOG_ENABLED) {
+ if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh) {
/* Warn about unreasonable delays. */
if (time_after(now, touch_ts + get_softlockup_thresh()))
return now - touch_ts;
--
1.7.11.7
On Tue, Nov 03, 2015 at 04:20:57PM +0100, Ulrich Obergfell wrote:
> This patch set addresses various races in relation to CPU hotplug
> and a race in relation to watchdog timer expiry. I discovered the
> corner cases during code inspection. I haven't seen any of these
> issues occur in practice.
Series looks fine to me. I have run some local panic tests with no
problems, and I also modified various values; everything seems fine.
Acked-by: Don Zickus <[email protected]>
>
> Ulrich Obergfell (4):
> watchdog: avoid race between lockup detector suspend/resume and CPU
> hotplug
> watchdog: avoid races between /proc handlers and CPU hotplug
> watchdog: remove {get|put}_online_cpus() from
> watchdog_{park|unpark}_threads()
> watchdog: fix race between proc_watchdog_thresh() and
> watchdog_timer_fn()
>
> kernel/watchdog.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> --
> 1.7.11.7
>
On Tue 2015-11-03 16:20 +0100, Ulrich Obergfell wrote:
> This patch set addresses various races in relation to CPU hotplug
> and a race in relation to watchdog timer expiry. I discovered the
> corner cases during code inspection. I haven't seen any of these
> issues occur in practice.
This patch series does adequately address the race conditions mentioned
above. Thanks.
Reviewed-by: Aaron Tomlin <[email protected]>