Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752556AbbDBSpP (ORCPT ); Thu, 2 Apr 2015 14:45:15 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46658 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753894AbbDBSpJ (ORCPT ); Thu, 2 Apr 2015 14:45:09 -0400 Date: Thu, 2 Apr 2015 14:45:06 -0400 From: Don Zickus To: cmetcalf@ezchip.com Cc: Frederic Weisbecker , Ingo Molnar , Andrew Morton , Andrew Jones , chai wen , Ulrich Obergfell , Fabian Frederick , Aaron Tomlin , Ben Zhang , Christoph Lameter , Gilad Ben-Yossef , Steven Rostedt , linux-kernel@vger.kernel.org, Jonathan Corbet , linux-doc@vger.kernel.org, Peter Zijlstra Subject: Re: [PATCH v3] watchdog: add watchdog_cpumask sysctl to assist nohz Message-ID: <20150402184506.GI175361@redhat.com> References: <1427741465-15747-1-git-send-email-cmetcalf@ezchip.com> <20150331072502.GA16754@gmail.com> <551AE7D4.3020608@ezchip.com> <20150402133502.GA175361@redhat.com> <551D48F9.6090101@ezchip.com> <20150402141527.GD175361@redhat.com> <20150402153827.GC10357@lerouge> <551D6373.2030000@ezchip.com> <20150402164845.GD10357@lerouge> <1427996368-2199-1-git-send-email-cmetcalf@ezchip.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1427996368-2199-1-git-send-email-cmetcalf@ezchip.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8087 Lines: 223 On Thu, Apr 02, 2015 at 01:39:28PM -0400, cmetcalf@ezchip.com wrote: > From: Chris Metcalf > > Change the default behavior of watchdog so it only runs on the > housekeeping cores when nohz_full is enabled at build and boot time. > > Allow modifying the set of cores the watchdog is currently running > on with a new kernel.watchdog_cpumask sysctl. > > Signed-off-by: Chris Metcalf > --- > Technically this is only v2, but I accidentally replied to an > earlier email after adding v2 to the subject line, so for clarity > I'm calling this thread v3. > > This change depends on my earlier change to add a > tick_nohz_full_clear_cpus() API. If folks are OK with my doing so, I can > add it to the set of patches I'm planning to ask Linus to pull for 4.1. > > Documentation/lockup-watchdogs.txt | 6 ++++++ > Documentation/sysctl/kernel.txt | 9 +++++++++ > include/linux/nmi.h | 1 + > include/linux/sched.h | 3 +++ > kernel/sysctl.c | 7 +++++++ > kernel/watchdog.c | 33 ++++++++++++++++++++++++++++++++- > 6 files changed, 58 insertions(+), 1 deletion(-) > > diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt > index ab0baa692c13..82a99eedf904 100644 > --- a/Documentation/lockup-watchdogs.txt > +++ b/Documentation/lockup-watchdogs.txt > @@ -61,3 +61,9 @@ As explained above, a kernel knob is provided that allows > administrators to configure the period of the hrtimer and the perf > event. The right value for a particular environment is a trade-off > between fast response to lockups and detection overhead. > + > +By default, the watchdog runs on all online cores. However, on a > +kernel configured with NO_HZ_FULL, by default the watchdog runs only > +on the housekeeping cores, not the cores specified in the "nohz_full" > +boot argument. In either case, the set of cores running the watchdog > +may be adjusted via the kernel.watchdog_cpumask sysctl. > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index 83ab25660fc9..5821dc6bb5c2 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -858,6 +858,15 @@ example. If a system hangs up, try pressing the NMI switch. > > ============================================================== > > +watchdog_cpumask: > + > +This value can be used to control on which cpus the watchdog will run. > +The default cpumask specifies every core, but if NO_HZ_FULL is enabled > +in the kernel config, and cores are specified with the nohz_full= boot > +argument, those cores are excluded by default. > + > +============================================================== > + > watchdog_thresh: > > This value can be used to control the frequency of hrtimer and NMI > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > index 9b2022ab4d85..cebf36e618e0 100644 > --- a/include/linux/nmi.h > +++ b/include/linux/nmi.h > @@ -70,6 +70,7 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *); > u64 hw_nmi_get_sample_period(int watchdog_thresh); > extern int watchdog_user_enabled; > extern int watchdog_thresh; > +extern unsigned long *watchdog_mask_bits; > extern int sysctl_softlockup_all_cpu_backtrace; > struct ctl_table; > extern int proc_dowatchdog(struct ctl_table *, int , > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 6d77432e14ff..a6f048f4fbeb 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -377,6 +377,9 @@ extern void touch_all_softlockup_watchdogs(void); > extern int proc_dowatchdog_thresh(struct ctl_table *table, int write, > void __user *buffer, > size_t *lenp, loff_t *ppos); > +extern int proc_dowatchdog_mask(struct ctl_table *table, int write, > + void __user *buffer, > + size_t *lenp, loff_t *ppos); > extern unsigned int softlockup_panic; > void lockup_detector_init(void); > #else > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 88ea2d6e0031..2fb96ffa56d1 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -860,6 +860,13 @@ static struct ctl_table kern_table[] = { > .extra2 = &sixty, > }, > { > + .procname = "watchdog_cpumask", > + .data = &watchdog_mask_bits, > + .maxlen = NR_CPUS, > + .mode = 0644, > + .proc_handler = proc_dowatchdog_mask, > + }, > + { > .procname = "softlockup_panic", > .data = &softlockup_panic, > .maxlen = sizeof(int), > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 3174bf8e3538..2140c2d81dc9 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -31,6 +32,8 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace; > #else > #define sysctl_softlockup_all_cpu_backtrace 0 > #endif > +static cpumask_var_t watchdog_mask; > +unsigned long *watchdog_mask_bits; > > static int __read_mostly watchdog_running; > static u64 __read_mostly sample_period; > @@ -431,6 +434,10 @@ static void watchdog_enable(unsigned int cpu) > hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); > hrtimer->function = watchdog_timer_fn; > > + /* Exit if the cpu is not allowed for watchdog. */ > + if (!cpumask_test_cpu(cpu, watchdog_mask)) > + do_exit(0); > + Besides the do_exit(), a printk is probably needed. > /* Enable the perf event */ > watchdog_nmi_enable(cpu); > > @@ -653,6 +660,8 @@ static void watchdog_disable_all_cpus(void) > } > } > > +static DEFINE_MUTEX(watchdog_proc_mutex); > + I posted a patchset to akpm a while ago from Uli that changed things around with regards to the procfs stuff. Andrew queued it, but I wasn't sure if there was other issues with it or if it is good to go for 4.1. So this piece and the stuff below might get modified later.. > /* > * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh > */ > @@ -662,7 +671,6 @@ int proc_dowatchdog(struct ctl_table *table, int write, > { > int err, old_thresh, old_enabled; > bool old_hardlockup; > - static DEFINE_MUTEX(watchdog_proc_mutex); > > mutex_lock(&watchdog_proc_mutex); > old_thresh = ACCESS_ONCE(watchdog_thresh); > @@ -700,12 +708,35 @@ out: > mutex_unlock(&watchdog_proc_mutex); > return err; > } > + > +int proc_dowatchdog_mask(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, loff_t *ppos) > +{ > + int err; > + > + mutex_lock(&watchdog_proc_mutex); > + err = proc_do_large_bitmap(table, write, buffer, lenp, ppos); > + if (!err && write && watchdog_user_enabled) { > + watchdog_disable_all_cpus(); > + watchdog_enable_all_cpus(false); > + } > + mutex_unlock(&watchdog_proc_mutex); > + return err; > +} > + > #endif /* CONFIG_SYSCTL */ Hmm, based on the procfs changes in the new code, instead of a do_exit(), what if we do a 'return' instead. This keeps the thread registered but does nothing. Later if we update the watchdog_cpumask, a restart easily enables the soft/hard watchdog pieces. The new procfs changes tries hard to handle a 'restart' scenario better as the procfs variables are updated. This piece could fit nicely into that, I think. Those changes start here: https://lkml.org/lkml/2015/2/5/626 Cheers, Don > > void __init lockup_detector_init(void) > { > set_sample_period(); > > + alloc_bootmem_cpumask_var(&watchdog_mask); > + cpumask_copy(watchdog_mask, cpu_possible_mask); > + tick_nohz_full_clear_cpus(watchdog_mask); > + > + /* The sysctl API requires a variable holding a pointer to the mask. */ > + watchdog_mask_bits = cpumask_bits(watchdog_mask); > + > if (watchdog_user_enabled) > watchdog_enable_all_cpus(false); > } > -- > 2.1.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/