From: Chris Metcalf
To: Frederic Weisbecker, Don Zickus, Ingo Molnar, Andrew Morton,
    Andrew Jones, chai wen, Ulrich Obergfell, Fabian Frederick,
    Aaron Tomlin, Ben Zhang, "Christoph Lameter", Gilad Ben-Yossef,
    "Steven Rostedt", "Jonathan Corbet", Thomas Gleixner, Peter Zijlstra
CC: Chris Metcalf
Subject: [PATCH v7 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz
Date: Fri, 10 Apr 2015 16:48:19 -0400
Message-ID: <1428698900-13358-2-git-send-email-cmetcalf@ezchip.com>
X-Mailer: git-send-email 2.1.2
In-Reply-To: <1428698900-13358-1-git-send-email-cmetcalf@ezchip.com>
References: <20150410015842.GG18314@lerouge> <1428698900-13358-1-git-send-email-cmetcalf@ezchip.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running
on with a new kernel.watchdog_cpumask sysctl.

Acked-by: Don Zickus
Signed-off-by: Chris Metcalf
---
 Documentation/lockup-watchdogs.txt |  6 ++++++
 Documentation/sysctl/kernel.txt    | 11 ++++++++++
 include/linux/nmi.h                |  3 +++
 kernel/sysctl.c                    |  7 ++++++
 kernel/watchdog.c                  | 44 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 71 insertions(+)

diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt
index ab0baa692c13..31c312853d4c 100644
--- a/Documentation/lockup-watchdogs.txt
+++ b/Documentation/lockup-watchdogs.txt
@@ -61,3 +61,9 @@ As explained above, a kernel knob is provided that allows
 administrators to configure the period of the hrtimer and the perf
 event. The right value for a particular environment is a trade-off
 between fast response to lockups and detection overhead.
+
+By default, the watchdog runs on all online cores. However, on a
+kernel configured with NO_HZ_FULL, by default the watchdog runs only
+on the housekeeping cores, not the cores specified in the "nohz_full"
+boot argument. In either case, the set of cores excluded from running
+the watchdog may be adjusted via the kernel.watchdog_cpumask sysctl.
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index c831001c45f1..f6a9dca8c100 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -923,6 +923,17 @@ and nmi_watchdog.
 
 ==============================================================
 
+watchdog_cpumask:
+
+This value can be used to control on which cpus the watchdog may run.
+The default cpumask is all possible cores, but if NO_HZ_FULL is
+enabled in the kernel config, and cores are specified with the
+nohz_full= boot argument, those cores are excluded by default.
+Offline cores can be included in this mask, and if the core is later
+brought online, the watchdog will be started based on the mask value.
+
+==============================================================
+
 watchdog_thresh:
 
 This value can be used to control the frequency of hrtimer and NMI
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 3d46fb4708e0..f94da0e65dea 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -67,6 +67,7 @@ extern int nmi_watchdog_enabled;
 extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
+extern unsigned long *watchdog_cpumask_bits;
 extern int sysctl_softlockup_all_cpu_backtrace;
 struct ctl_table;
 extern int proc_watchdog(struct ctl_table *, int ,
@@ -77,6 +78,8 @@ extern int proc_soft_watchdog(struct ctl_table *, int ,
 			      void __user *, size_t *, loff_t *);
 extern int proc_watchdog_thresh(struct ctl_table *, int ,
 				void __user *, size_t *, loff_t *);
+extern int proc_watchdog_cpumask(struct ctl_table *, int,
+				 void __user *, size_t *, loff_t *);
 #endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2082b1a88fb9..699571a74e3b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -881,6 +881,13 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 	{
+		.procname	= "watchdog_cpumask",
+		.data		= &watchdog_cpumask_bits,
+		.maxlen		= NR_CPUS,
+		.mode		= 0644,
+		.proc_handler	= proc_watchdog_cpumask,
+	},
+	{
 		.procname	= "softlockup_panic",
 		.data		= &softlockup_panic,
 		.maxlen		= sizeof(int),
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 2316f50b07a4..2199f1f0b5a5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -56,6 +57,8 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace;
 #else
 #define sysctl_softlockup_all_cpu_backtrace 0
 #endif
+static cpumask_var_t watchdog_cpumask;
+unsigned long *watchdog_cpumask_bits;
 
 static int __read_mostly watchdog_running;
 static u64 __read_mostly sample_period;
@@ -869,12 +872,53 @@ out:
 	mutex_unlock(&watchdog_proc_mutex);
 	return err;
 }
+
+/*
+ * The cpumask is the mask of possible cpus that the watchdog can run
+ * on, not the mask of cpus it is actually running on. This allows the
+ * user to specify a mask that will include cpus that have not yet
+ * been brought online, if desired.
+ */
+int proc_watchdog_cpumask(struct ctl_table *table, int write,
+			  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int err;
+
+	mutex_lock(&watchdog_proc_mutex);
+	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
+	if (!err && write) {
+		/* Remove impossible cpus to keep sysctl output cleaner. */
+		cpumask_and(watchdog_cpumask, watchdog_cpumask,
+			    cpu_possible_mask);
+
+		if (watchdog_enabled && watchdog_thresh)
+			smpboot_update_cpumask_percpu_thread(&watchdog_threads);
+	}
+	mutex_unlock(&watchdog_proc_mutex);
+	return err;
+}
+
 #endif /* CONFIG_SYSCTL */
 
 void __init lockup_detector_init(void)
 {
 	set_sample_period();
 
+	alloc_cpumask_var(&watchdog_cpumask, GFP_KERNEL);
+	watchdog_threads.cpumask = watchdog_cpumask;
+
+#ifdef CONFIG_NO_HZ_FULL
+	if (!cpumask_empty(tick_nohz_full_mask))
+		pr_info("Disabling watchdog on nohz_full cores by default\n");
+	cpumask_andnot(watchdog_cpumask, cpu_possible_mask,
+		       tick_nohz_full_mask);
+#else
+	cpumask_copy(watchdog_cpumask, cpu_possible_mask);
+#endif
+
+	/* The sysctl API requires a variable holding a pointer to the mask. */
+	watchdog_cpumask_bits = cpumask_bits(watchdog_cpumask);
+
 	if (watchdog_enabled)
 		watchdog_enable_all_cpus();
 }
-- 
2.1.2
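
For readers who want to check the mask arithmetic outside the kernel, here
is a minimal user-space sketch of the three set operations the patch relies
on. It is not kernel code: cpu_mask_t and the three helper names are
hypothetical, a single 64-bit word stands in for a real cpumask, and the
"running" helper assumes the watchdog is enabled.

```c
#include <stdint.h>

/* Hypothetical user-space model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/*
 * Default mask: every possible CPU except the nohz_full ones.
 * Models the cpumask_andnot() call in lockup_detector_init().
 */
static cpu_mask_t default_watchdog_mask(cpu_mask_t possible,
					cpu_mask_t nohz_full)
{
	return possible & ~nohz_full;
}

/*
 * Sysctl write: drop bits for CPUs that can never exist.
 * Models the cpumask_and() call in proc_watchdog_cpumask().
 */
static cpu_mask_t sanitize_written_mask(cpu_mask_t requested,
					cpu_mask_t possible)
{
	return requested & possible;
}

/*
 * CPUs on which the watchdog would actually run: allowed by the mask
 * and currently online (assumption: the watchdog itself is enabled).
 */
static cpu_mask_t running_watchdog_mask(cpu_mask_t allowed,
					cpu_mask_t online)
{
	return allowed & online;
}
```

With eight possible CPUs (mask 0xff) and nohz_full=4-7 (mask 0xf0),
default_watchdog_mask() gives 0x0f, that is, CPUs 0-3. An offline CPU may
stay set in the allowed mask; it only shows up in the running set once it
comes online.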