Date: Mon, 26 Jun 2017 22:30:57 +0200 (CEST)
From: Thomas Gleixner
To: Don Zickus
cc: Kan Liang, linux-kernel@vger.kernel.org, mingo@kernel.org,
    akpm@linux-foundation.org, babu.moger@oracle.com, atomlin@redhat.com,
    prarit@redhat.com, torvalds@linux-foundation.org, peterz@infradead.org,
    eranian@google.com, acme@redhat.com, ak@linux.intel.com,
    stable@vger.kernel.org
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
In-Reply-To: <20170626201927.3ak7fk3yvdzbb4ay@redhat.com>
References: <20170621144118.5939-1-kan.liang@intel.com>
    <20170622154450.2lua7fdmigcixldw@redhat.com>
    <20170623162907.l6inpxgztwwkeaoi@redhat.com>
    <20170626201927.3ak7fk3yvdzbb4ay@redhat.com>

On Mon, 26 Jun 2017, Don Zickus wrote:

> On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote:
> > On Fri, 23 Jun 2017, Don Zickus wrote:
> > > Hmm, all this work for a temp fix.  Kan, how much longer until the real fix
> > > of having perf count the right cycles?
> >
> > Quite a while. The approach is wilfully breaking the user space ABI, which
> > is not going to happen.
> >
> > And there is a simpler solution as well, as I said here:
> >
> >   http://lkml.kernel.org/r/alpine.DEB.2.20.1706221730520.1885@nanos
>
> Hi Thomas,
>
> So, you are saying instead of slowing down the perf counter, speed up the
> hrtimer to sample more frequently like so:
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 03e0b69..8ff49de 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -160,7 +160,7 @@ static void set_sample_period(void)
>  	 * and hard thresholds) to increment before the
>  	 * hardlockup detector generates a warning
>  	 */
> -	sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
> +	sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 10);
>  }
>
>  /* Commands for resetting the watchdog */
>
>
> That is another way of doing it.  It just hits all the arches.  It does seem
> cleaner as the watchdog_thresh value still retains its correct meaning.  Are
> the laptop folks going to yell at me some more for waking their systems up
> more? :-)

Yes, that's bound to happen. You might make them less angry if you wake the
softlockup thread only on every second hrtimer expiry, i.e. keeping the
current wakeup rate. But I can't promise that this will significantly lower
their wrath. :)

Thanks,

	tglx
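
For illustration, the "wake only on every second expiry" idea can be modelled
in user space roughly as below. This is a sketch under stated assumptions, not
kernel code and not a proposed patch: softlockup_thresh() assumes the soft
threshold is twice watchdog_thresh, the value 10 used in the usage note is an
assumed default, and struct wd_state, timer_expired_should_wake() and
wakeups_in_window() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumption: models get_softlockup_thresh() as 2 * watchdog_thresh. */
static uint64_t softlockup_thresh(uint64_t watchdog_thresh)
{
	return watchdog_thresh * 2;
}

/* Sample period in ns; divisor is 5 before the quoted diff, 10 after. */
static uint64_t sample_period_ns(uint64_t watchdog_thresh, uint64_t divisor)
{
	assert(divisor != 0);
	return softlockup_thresh(watchdog_thresh) * (NSEC_PER_SEC / divisor);
}

/* Hypothetical per-CPU state: number of hrtimer expiries seen so far. */
struct wd_state {
	uint64_t expiries;
};

/*
 * Called on every simulated hrtimer expiry. Returns true only on every
 * second expiry, i.e. when the softlockup thread would be woken.
 */
static bool timer_expired_should_wake(struct wd_state *s)
{
	s->expiries++;
	return (s->expiries % 2) == 0;
}

/*
 * Count thread wakeups in a time window. With every_second == false the
 * thread is woken on each expiry; with true, only on every second one.
 */
static uint64_t wakeups_in_window(uint64_t window_ns, uint64_t period_ns,
				  bool every_second)
{
	struct wd_state s = { 0 };
	uint64_t wakeups = 0;

	for (uint64_t t = period_ns; t <= window_ns; t += period_ns) {
		bool second = timer_expired_should_wake(&s);

		if (!every_second || second)
			wakeups++;
	}
	return wakeups;
}
```

With an assumed watchdog_thresh of 10, the divisor 5 gives a 4 second period
and the divisor 10 gives 2 seconds. Over a 40 second window the thread is
woken 10 times in the old scheme and also 10 times in the new scheme when only
every second expiry wakes it. The hrtimer itself still fires twice as often,
so the timer wakeups that the laptop folks care about are not avoided.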