Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751757AbdIEN6g (ORCPT ); Tue, 5 Sep 2017 09:58:36 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:45701 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751386AbdIEN6d (ORCPT ); Tue, 5 Sep 2017 09:58:33 -0400 Date: Tue, 5 Sep 2017 15:58:31 +0200 (CEST) From: Thomas Gleixner To: Ulrich Obergfell cc: LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Borislav Petkov , Sebastian Siewior , Nicholas Piggin , Don Zickus , Chris Metcalf Subject: Re: [patch 11/29] lockup_detector: Remove park_in_progress hackery In-Reply-To: Message-ID: References: <20170831071558.995235362@linutronix.de> <20170831073053.863251887@linutronix.de> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3866 Lines: 89 On Mon, 4 Sep 2017, Ulrich Obergfell wrote: > "b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded > system") tries to fix the following issue: > > watchdog_stop() > hrtimer_cancel() > perf_nmi_event_stop() > > If the task gets preempted after canceling the hrtimer or the VM gets > scheduled out long enough, then there is a chance that the next NMI will > see a stale hrtimer interrupt count and trigger a false positive hard > lockup splat." > > This is not exactly the issue that b94f51183b06 is actually supposed to fix. > For details, please see the accompanying commit log message at: > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/watchdog.c?id=b94f51183b0617e7b9b4fb4137d4cf1cab7547c2 > > For convenience, here is a slightly different description of the scenario > that b94f51183b06 originally aims to address: > > - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run. > - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration > requires parking/unparking of all watchdog threads. > - watchdog_timer_fn() on all CPUs can already pick up the new sample period, > but the perf counter frequency cannot be updated on all CPUs yet because > watchdog/N is unable to park, so watchdog_overflow_callback() still occurs > at a frequency that is based on the old sample period (on CPUs >= N which > have not had a chance to park their watchdog threads yet). > - If 'watchdog_thresh' was increased by the reconfiguration, the interval > at which watchdog_timer_fn() is now running could be too large for the > watchdog_overflow_callback() function to observe a change of the timer > interrupt counter. This can trigger a false positive. Oh well. You did not fix that at all with that hack. The real problem is that the watchdog threshold write in proc immediately updates sample_period, which is evaluated by the running thread. proc_write() set_sample_period() <----- Broken starts proc_watchdog_update() watchdog_enable_all_cpus() update_watchdog_all_cpus() watchdog_park_threads() <----- Broken ends watchdog_park_in_progress = 1 So up to the point where watchdog_park_in_progress is set to 1, the change to sample_period is visible to all active watchdog threads and will be picked up by the watchdog threads to rearm their timer. The NMI watchdog will run with the old frequency until the thread is parked and the watchdog NMI disarmed. The underlying problem is the non synchronized update of sample_period and this band aid hack is not solving that at all. > The above scenario should be rare in practice, so it seemed reasonable to > come up with a simple low-risk fix that addresses merely this particular > scenario (watchdog_timer_fn() and watchdog_overflow_callback() now return > immediately while park is in progress / no lockup detection is performed > during that window of time). That thing is neither reasonable nor a fix. It's mindless hackery which papers over the problem instead of fixing the root cause. It neither saw the problem in the park/unpark in general which is again a valid source for false positives. > In relation to: > > "That commit added a complete abstrusity with a atomic variable telling the > NMI watchdog function that stop is in progress. This is so stupid, that > it's not funny anymore." > > I cannot see any 'abstrusity' or 'stupidity' in the pragmatic approach that > was taken in b94f51183b06 to fix an issue that should be rare in practice > as described above. > > Please change the commit log of [patch 11/29] since it is inconsistent with > the commit log of b94f51183b06. Yes, I will change it to include the above reasoning why this is complete crap and keep the extra findings for reference. Thanks, tglx