Date: Tue, 27 Jun 2017 17:09:22 -0400
From: Don Zickus
To: "Liang, Kan"
Cc: Thomas Gleixner, "linux-kernel@vger.kernel.org", "mingo@kernel.org", "akpm@linux-foundation.org", "babu.moger@oracle.com", "atomlin@redhat.com", "prarit@redhat.com", "torvalds@linux-foundation.org", "peterz@infradead.org", "eranian@google.com", "acme@redhat.com", "ak@linux.intel.com", "stable@vger.kernel.org"
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
Message-ID: <20170627210922.vsh2m6ajbtwnmd4d@redhat.com>
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F0775371357D@SHSMSX103.ccr.corp.intel.com>
References: <20170621144118.5939-1-kan.liang@intel.com>
 <20170622154450.2lua7fdmigcixldw@redhat.com>
 <20170623162907.l6inpxgztwwkeaoi@redhat.com>
 <20170626201927.3ak7fk3yvdzbb4ay@redhat.com>
 <20170627201249.ll34ecwhpme3vh2u@redhat.com>
 <37D7C6CF3E00A74B8858931C1DB2F0775371357D@SHSMSX103.ccr.corp.intel.com>

On Tue, Jun 27, 2017 at 08:49:19PM +0000, Liang, Kan wrote:
> 
> > On Mon, Jun 26, 2017 at 04:19:27PM -0400, Don Zickus wrote:
> > > On Fri, Jun 23, 2017 at 11:50:25PM +0200, Thomas Gleixner wrote:
> > > > On Fri, 23 Jun 2017, Don Zickus wrote:
> > > > > Hmm, all this work for a temp fix. Kan, how much longer until the
> > > > > real fix of having perf count the right cycles?
> > > > 
> > > > Quite a while. The approach is wilfully breaking the user space ABI,
> > > > which is not going to happen.
> > > > 
> > > > And there is a simpler solution as well, as I said here:
> > > > 
> > > > http://lkml.kernel.org/r/alpine.DEB.2.20.1706221730520.1885@nanos
> > > 
> > > Hi Thomas,
> > > 
> > > So, you are saying instead of slowing down the perf counter, speed up
> > > the hrtimer to sample more frequently like so:
> > > 
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index 03e0b69..8ff49de 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -160,7 +160,7 @@ static void set_sample_period(void)
> > >  	 * and hard thresholds) to increment before the
> > >  	 * hardlockup detector generates a warning
> > >  	 */
> > > -	sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
> > > +	sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 10);
> > >  }
> > 
> > Hi Kan,
> > 
> > Will the above patch work for you?
> > 
> 
> I haven't heard back any test result yet.
> The above patch looks good to me.
> But I'm not sure if /10 is enough. We may need /15.
> Anyway, I think we will test /10 first.
> 
> Which workaround do you prefer, the above one or the one checking timestamp?

Let's go with this one, it is simpler.

Cheers,
Don