Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754194AbbDHOBQ (ORCPT ); Wed, 8 Apr 2015 10:01:16 -0400 Received: from mail-wg0-f43.google.com ([74.125.82.43]:35177 "EHLO mail-wg0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754011AbbDHOBN (ORCPT ); Wed, 8 Apr 2015 10:01:13 -0400 Date: Wed, 8 Apr 2015 16:01:09 +0200 From: Frederic Weisbecker To: cmetcalf@ezchip.com Cc: Don Zickus , Ingo Molnar , Andrew Morton , Andrew Jones , chai wen , Ulrich Obergfell , Fabian Frederick , Aaron Tomlin , Ben Zhang , Christoph Lameter , Gilad Ben-Yossef , Steven Rostedt , linux-kernel@vger.kernel.org, Jonathan Corbet , linux-doc@vger.kernel.org, Thomas Gleixner , Peter Zijlstra Subject: Re: [PATCH v5 2/2] watchdog: add watchdog_exclude sysctl to assist nohz Message-ID: <20150408140108.GB13227@lerouge> References: <1258649504.12464273.1428252407339.JavaMail.zimbra@redhat.com> <1428349556-21873-1-git-send-email-cmetcalf@ezchip.com> <1428349556-21873-3-git-send-email-cmetcalf@ezchip.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1428349556-21873-3-git-send-email-cmetcalf@ezchip.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6562 Lines: 178 On Mon, Apr 06, 2015 at 03:45:56PM -0400, cmetcalf@ezchip.com wrote: > From: Chris Metcalf > > Change the default behavior of watchdog so it only runs on the > housekeeping cores when nohz_full is enabled at build and boot time. > > Allow modifying the set of cores the watchdog is currently running > on with a new kernel.watchdog_exclude sysctl. > > Signed-off-by: Chris Metcalf > --- > Documentation/lockup-watchdogs.txt | 6 ++++++ > Documentation/sysctl/kernel.txt | 9 +++++++++ > include/linux/nmi.h | 3 +++ > kernel/sysctl.c | 7 +++++++ > kernel/watchdog.c | 33 +++++++++++++++++++++++++++++++++ > 5 files changed, 58 insertions(+) > > diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt > index ab0baa692c13..4f86aec1d69d 100644 > --- a/Documentation/lockup-watchdogs.txt > +++ b/Documentation/lockup-watchdogs.txt > @@ -61,3 +61,9 @@ As explained above, a kernel knob is provided that allows > administrators to configure the period of the hrtimer and the perf > event. The right value for a particular environment is a trade-off > between fast response to lockups and detection overhead. > + > +By default, the watchdog runs on all online cores. However, on a > +kernel configured with NO_HZ_FULL, by default the watchdog runs only > +on the housekeeping cores, not the cores specified in the "nohz_full" > +boot argument. In either case, the set of cores excluded from running > +the watchdog may be adjusted via the kernel.watchdog_exclude sysctl. > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt > index c831001c45f1..799a1fee3f26 100644 > --- a/Documentation/sysctl/kernel.txt > +++ b/Documentation/sysctl/kernel.txt > @@ -923,6 +923,15 @@ and nmi_watchdog. > > ============================================================== > > +watchdog_exclude: > + > +This value can be used to control on which cpus the watchdog is > +prohibited from running. The default exclude mask is empty, but if > +NO_HZ_FULL is enabled in the kernel config, and cores are specified > +with the nohz_full= boot argument, those cores are excluded by default. > + > +============================================================== Same here, cpumask are rather used as inclusive instead of exclusive. So I'd rather see "watchdog_cpumask". > + > watchdog_thresh: > > This value can be used to control the frequency of hrtimer and NMI > diff --git a/include/linux/nmi.h b/include/linux/nmi.h > index 3d46fb4708e0..5094386b4fb1 100644 > --- a/include/linux/nmi.h > +++ b/include/linux/nmi.h > @@ -67,6 +67,7 @@ extern int nmi_watchdog_enabled; > extern int soft_watchdog_enabled; > extern int watchdog_user_enabled; > extern int watchdog_thresh; > +extern unsigned long *watchdog_exclude_mask_bits; > extern int sysctl_softlockup_all_cpu_backtrace; > struct ctl_table; > extern int proc_watchdog(struct ctl_table *, int , > @@ -77,6 +78,8 @@ extern int proc_soft_watchdog(struct ctl_table *, int , > void __user *, size_t *, loff_t *); > extern int proc_watchdog_thresh(struct ctl_table *, int , > void __user *, size_t *, loff_t *); > +extern int proc_watchdog_exclude(struct ctl_table *, int, > + void __user *, size_t *, loff_t *); > #endif > > #ifdef CONFIG_HAVE_ACPI_APEI_NMI > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 2082b1a88fb9..b934a4a01f0a 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -881,6 +881,13 @@ static struct ctl_table kern_table[] = { > .extra2 = &one, > }, > { > + .procname = "watchdog_exclude", > + .data = &watchdog_exclude_mask_bits, > + .maxlen = NR_CPUS, > + .mode = 0644, > + .proc_handler = proc_watchdog_exclude, > + }, > + { > .procname = "softlockup_panic", > .data = &softlockup_panic, > .maxlen = sizeof(int), > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index f2be11ab7e08..3f4fbb208437 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -56,6 +57,8 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace; > #else > #define sysctl_softlockup_all_cpu_backtrace 0 > #endif > +static cpumask_var_t watchdog_exclude_mask; > +unsigned long *watchdog_exclude_mask_bits; > > static int __read_mostly watchdog_running; > static u64 __read_mostly sample_period; > @@ -841,12 +844,42 @@ out: > mutex_unlock(&watchdog_proc_mutex); > return err; > } > + > +int proc_watchdog_exclude(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, loff_t *ppos) > +{ > + int err; > + > + mutex_lock(&watchdog_proc_mutex); > + err = proc_do_large_bitmap(table, write, buffer, lenp, ppos); > + if (!err && write && watchdog_enabled && watchdog_thresh) { > + watchdog_disable_all_cpus(); > + watchdog_enable_all_cpus(); The problem is that it modifies watchdog_threads.exclude_mask and watchdog_threads is a live object handled by smpboot. It happens not to be racy now because smpboot only checks this cpumask on per_cpu_threads register/unregister time, but it could change and become racy in the future. How about creating smpboot_update_mask_percpu_thread() and handle it from smpboot, this way future evolutions of smpboot won't overlook this cpumask live change? Thanks. > + } > + mutex_unlock(&watchdog_proc_mutex); > + return err; > +} > + > #endif /* CONFIG_SYSCTL */ > > void __init lockup_detector_init(void) > { > set_sample_period(); > > + alloc_bootmem_cpumask_var(&watchdog_exclude_mask); > + watchdog_threads.exclude_mask = watchdog_exclude_mask; > + > +#ifdef CONFIG_NO_HZ_FULL > + if (!cpumask_empty(tick_nohz_full_mask)) > + pr_info("Disabling watchdog on nohz_full cores by default\n"); > + cpumask_copy(watchdog_exclude_mask, tick_nohz_full_mask); > +#else > + cpumask_clear(watchdog_exclude_mask); > +#endif > + > + /* The sysctl API requires a variable holding a pointer to the mask. */ > + watchdog_exclude_mask_bits = cpumask_bits(watchdog_exclude_mask); > + > if (watchdog_enabled) > watchdog_enable_all_cpus(); > } > -- > 2.1.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/