Date: Fri, 1 Sep 2017 15:08:22 -0400
From: Don Zickus <dzickus@redhat.com>
To: Thomas Gleixner
Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
    Sebastian Siewior, Nicholas Piggin, Chris Metcalf, Ulrich Obergfell
Subject: Re: [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance
Message-ID: <20170901190822.4vmgl72qefp72s2j@redhat.com>
References: <20170831071558.995235362@linutronix.de>
 <20170831073054.346816852@linutronix.de>
In-Reply-To: <20170831073054.346816852@linutronix.de>

On Thu, Aug 31, 2017 at 09:16:15AM +0200, Thomas Gleixner wrote:
> The lockup detector reconfiguration tears down all watchdog threads when
> the watchdog is disabled and sets them up again when it's enabled.
>
> That's a pointless exercise. The watchdog threads are not consuming an
> insane amount of resources, so it's enough to set them up at init time and
> keep them in parked position when the watchdog is disabled and unpark them
> when it is reenabled. The smpboot thread infrastructure takes care of
> keeping the force parked threads in place even across cpu hotplug.

The original reasoning for implementing this complexity was that we were
worried about catching kthread_[un]park() errors.  The thought was that in
the rare case where parking failed, we would hang.  Perhaps we
misunderstood the kthread_[un]park() code.
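To make the concern concrete: the failure mode we had in mind is the one in
the watchdog_park_threads() code your patch removes below.  Repeating it
here in simplified form purely as an illustration (the helper name is made
up, the body mirrors the removed code):

        /*
         * Illustration only: if kthread_park() fails for one CPU, the CPUs
         * visited earlier are already parked while the remaining ones keep
         * running, and the caller has to unwind that half-parked state or
         * disable the whole detector to get back to something sane.
         */
        static int park_all_watchdog_threads(void)
        {
                int cpu, ret = 0;

                for_each_watchdog_cpu(cpu) {
                        ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
                        if (ret)
                                break;          /* half-parked state */
                }
                return ret;
        }

Handling that partial failure cleanly was what drove the extra complexity.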
Cheers,
Don

>
> Aside of that the code implements the park/unpark facility of smp hotplug
> threads on its own, which is even more pointless. We have functionality in
> the smpboot thread code to do so.
>
> Use the new thread management functions and get rid of the unholy mess.
>
> Signed-off-by: Thomas Gleixner
> ---
>  kernel/watchdog.c |  190 +++++------------------------------------------------
>  1 file changed, 19 insertions(+), 171 deletions(-)
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,13 +92,6 @@ struct cpumask watchdog_cpumask __read_m
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>
>  /*
> - * The 'watchdog_running' variable is set to 1 when the watchdog threads
> - * are registered/started and is set to 0 when the watchdog threads are
> - * unregistered/stopped, so it is an indicator whether the threads exist.
> - */
> -static int __read_mostly watchdog_running;
> -
> -/*
>   * These functions can be overridden if an architecture implements its
>   * own hardlockup detector.
>   *
> @@ -123,10 +116,6 @@ void __weak watchdog_nmi_reconfigure(voi
>
>  #ifdef CONFIG_SOFTLOCKUP_DETECTOR
>
> -/* Helper for online, unparked cpus. */
> -#define for_each_watchdog_cpu(cpu) \
> -        for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> -
>  /* Global variables, exported for sysctl */
>  unsigned int __read_mostly softlockup_panic =
>                         CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
> @@ -252,11 +241,15 @@ void touch_all_softlockup_watchdogs(void
>         int cpu;
>
>         /*
> -        * this is done lockless
> -        * do we care if a 0 races with a timestamp?
> -        * all it means is the softlock check starts one cycle later
> +        * watchdog_mutex cannot be taken here, as this might be called
> +        * from (soft)interrupt context, so the access to
> +        * watchdog_allowed_mask might race with a concurrent update.
> +        *
> +        * The watchdog time stamp can race against a concurrent real
> +        * update as well, the only side effect might be a cycle delay for
> +        * the softlockup check.
>          */
> -       for_each_watchdog_cpu(cpu)
> +       for_each_cpu(cpu, &watchdog_allowed_mask)
>                 per_cpu(watchdog_touch_ts, cpu) = 0;
>         wq_watchdog_touch(-1);
>  }
> @@ -296,9 +289,6 @@ static void watchdog_interrupt_count(voi
>         __this_cpu_inc(hrtimer_interrupts);
>  }
>
> -static int watchdog_enable_all_cpus(void);
> -static void watchdog_disable_all_cpus(void);
> -
>  /* watchdog kicker functions */
>  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  {
> @@ -492,95 +482,6 @@ static struct smp_hotplug_thread watchdo
>         .unpark                 = watchdog_enable,
>  };
>
> -/*
> - * park all watchdog threads that are specified in 'watchdog_cpumask'
> - *
> - * This function returns an error if kthread_park() of a watchdog thread
> - * fails. In this situation, the watchdog threads of some CPUs can already
> - * be parked and the watchdog threads of other CPUs can still be runnable.
> - * Callers are expected to handle this special condition as appropriate in
> - * their context.
> - *
> - * This function may only be called in a context that is protected against
> - * races with CPU hotplug - for example, via get_online_cpus().
> - */
> -static int watchdog_park_threads(void)
> -{
> -       int cpu, ret = 0;
> -
> -       for_each_watchdog_cpu(cpu) {
> -               ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
> -               if (ret)
> -                       break;
> -       }
> -       return ret;
> -}
> -
> -/*
> - * unpark all watchdog threads that are specified in 'watchdog_cpumask'
> - *
> - * This function may only be called in a context that is protected against
> - * races with CPU hotplug - for example, via get_online_cpus().
> - */
> -static void watchdog_unpark_threads(void)
> -{
> -       int cpu;
> -
> -       for_each_watchdog_cpu(cpu)
> -               kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> -}
> -
> -static int update_watchdog_all_cpus(void)
> -{
> -       int ret;
> -
> -       ret = watchdog_park_threads();
> -       if (ret)
> -               return ret;
> -
> -       watchdog_unpark_threads();
> -
> -       return 0;
> -}
> -
> -static int watchdog_enable_all_cpus(void)
> -{
> -       int err = 0;
> -
> -       if (!watchdog_running) {
> -               err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
> -                                                            &watchdog_cpumask);
> -               if (err)
> -                       pr_err("Failed to create watchdog threads, disabled\n");
> -               else
> -                       watchdog_running = 1;
> -       } else {
> -               /*
> -                * Enable/disable the lockup detectors or
> -                * change the sample period 'on the fly'.
> -                */
> -               err = update_watchdog_all_cpus();
> -
> -               if (err) {
> -                       watchdog_disable_all_cpus();
> -                       pr_err("Failed to update lockup detectors, disabled\n");
> -               }
> -       }
> -
> -       if (err)
> -               watchdog_enabled = 0;
> -
> -       return err;
> -}
> -
> -static void watchdog_disable_all_cpus(void)
> -{
> -       if (watchdog_running) {
> -               watchdog_running = 0;
> -               smpboot_unregister_percpu_thread(&watchdog_threads);
> -       }
> -}
> -
>  static void softlockup_update_smpboot_threads(void)
>  {
>         lockdep_assert_held(&watchdog_mutex);
> @@ -655,7 +556,6 @@ static inline int watchdog_park_threads(
>  static inline void watchdog_unpark_threads(void) { }
>  static inline int watchdog_enable_all_cpus(void) { return 0; }
>  static inline void watchdog_disable_all_cpus(void) { }
> -static inline void set_sample_period(void) { }
>  static inline void softlockup_init_threads(void) { }
>  static inline void softlockup_update_threads(void) { }
>  static inline void softlockup_reconfigure_threads(bool enabled) { }
> @@ -695,28 +595,10 @@ void lockup_detector_soft_poweroff(void)
>  /*
>   * Update the run state of the lockup detectors.
>   */
> -static int proc_watchdog_update(void)
> +static void proc_watchdog_update(void)
>  {
> -       int err = 0;
> -
> -       /*
> -        * Watchdog threads won't be started if they are already active.
> -        * The 'watchdog_running' variable in watchdog_*_all_cpus() takes
> -        * care of this. If those threads are already active, the sample
> -        * period will be updated and the lockup detectors will be enabled
> -        * or disabled 'on the fly'.
> -        */
> -       if (watchdog_enabled && watchdog_thresh)
> -               err = watchdog_enable_all_cpus();
> -       else
> -               watchdog_disable_all_cpus();
> -
> +       softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
>         watchdog_nmi_reconfigure();
> -
> -       __lockup_detector_cleanup();
> -
> -       return err;
> -
>  }
>
>  /*
> @@ -772,17 +654,8 @@ static int proc_watchdog_common(int whic
>                 new = old & ~which;
>         } while (cmpxchg(&watchdog_enabled, old, new) != old);
>
> -       /*
> -        * Update the run state of the lockup detectors. There is _no_
> -        * need to check the value returned by proc_watchdog_update()
> -        * and to restore the previous value of 'watchdog_enabled' as
> -        * both lockup detectors are disabled if proc_watchdog_update()
> -        * returns an error.
> -        */
> -       if (old == new)
> -               goto out;
> -
> -       err = proc_watchdog_update();
> +       if (old != new)
> +               proc_watchdog_update();
>         }
>  out:
>         mutex_unlock(&watchdog_mutex);
> @@ -826,50 +699,28 @@ int proc_soft_watchdog(struct ctl_table
>  int proc_watchdog_thresh(struct ctl_table *table, int write,
>                          void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> -       int err, old, new;
> +       int err, old;
>
>         cpu_hotplug_disable();
>         mutex_lock(&watchdog_mutex);
>
> -       old = ACCESS_ONCE(watchdog_thresh);
> +       old = READ_ONCE(watchdog_thresh);
>         err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>
> -       if (err || !write)
> -               goto out;
> +       if (!err && write && old != READ_ONCE(watchdog_thresh))
> +               proc_watchdog_update();
>
> -       /*
> -        * Update the sample period. Restore on failure.
> -        */
> -       new = ACCESS_ONCE(watchdog_thresh);
> -       if (old == new)
> -               goto out;
> -
> -       set_sample_period();
> -       err = proc_watchdog_update();
> -       if (err) {
> -               watchdog_thresh = old;
> -               set_sample_period();
> -       }
> -out:
>         mutex_unlock(&watchdog_mutex);
>         cpu_hotplug_enable();
>         return err;
>  }
>
> -static void watchdog_update_cpus(void)
> -{
> -       if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR) && watchdog_running) {
> -               smpboot_update_cpumask_percpu_thread(&watchdog_threads,
> -                                                    &watchdog_cpumask);
> -               __lockup_detector_cleanup();
> -       }
> -}
> -
>  static void proc_watchdog_cpumask_update(void)
>  {
>         /* Remove impossible cpus to keep sysctl output clean. */
>         cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
> -       watchdog_update_cpus();
> +
> +       softlockup_update_threads();
>         watchdog_nmi_reconfigure();
>  }
>
> @@ -899,8 +750,6 @@ int proc_watchdog_cpumask(struct ctl_tab
>
>  void __init lockup_detector_init(void)
>  {
> -       set_sample_period();
> -
>  #ifdef CONFIG_NO_HZ_FULL
>         if (tick_nohz_full_enabled()) {
>                 pr_info("Disabling watchdog on nohz_full cores by default\n");
> @@ -911,6 +760,5 @@ void __init lockup_detector_init(void)
>         cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
>  #endif
>
> -       if (watchdog_enabled)
> -               watchdog_enable_all_cpus();
> +       softlockup_init_threads();
>  }
>
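For what it's worth, the way I read the new scheme, the whole
enable/disable/threshold path collapses into a park/unpark cycle driven by
the allowed cpumask, roughly like the sketch below.  This is my paraphrase
of the series, not the literal patch code; softlockup_park_all_threads()
and softlockup_unpark_threads() are how I understood the helpers from the
other patches, and the real names/details may differ:

        /* Park every watchdog thread: an empty allowed mask parks them all. */
        static void softlockup_park_all_threads(void)
        {
                cpumask_clear(&watchdog_allowed_mask);
                softlockup_update_smpboot_threads();
        }

        /* Unpark only the cpus the user requested via watchdog_cpumask. */
        static void softlockup_unpark_threads(void)
        {
                cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
                softlockup_update_smpboot_threads();
        }

        static void softlockup_reconfigure_threads(bool enabled)
        {
                softlockup_park_all_threads();
                set_sample_period();
                if (enabled)
                        softlockup_unpark_threads();
        }

If that reading is right, the threads are created exactly once in
lockup_detector_init() and only ever parked/unparked afterwards, so the
error handling the old code needed simply goes away.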