Date: Wed, 21 Jun 2017 17:12:06 +0200 (CEST)
From: Thomas Gleixner
To: Kan Liang
cc: linux-kernel@vger.kernel.org, dzickus@redhat.com, mingo@kernel.org, akpm@linux-foundation.org, babu.moger@oracle.com, atomlin@redhat.com, prarit@redhat.com, torvalds@linux-foundation.org, peterz@infradead.org, eranian@google.com, acme@redhat.com, ak@linux.intel.com, stable@vger.kernel.org
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
In-Reply-To: <20170621144118.5939-1-kan.liang@intel.com>
References: <20170621144118.5939-1-kan.liang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 21 Jun 2017, kan.liang@intel.com wrote:
>
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> +/*
> + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
> + * can tick faster than the measured CPU Frequency due to Turbo mode.
> + * That can lead to spurious timeouts.
> + * To workaround the issue, extending the period by 3 times.
> + */
>  u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  {
> -	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> +	return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;

The maximum turbo frequency of any given machine can be retrieved. So why
don't you simply take that ratio into account and apply it for the machines
which have those insane turbo loaders? That's not a huge effort, can be
easily backported and does not inflict this unconditionally.
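
To put rough numbers on that ratio, here is a minimal userspace sketch. The
frequencies used below (2.0 GHz nominal, 3.4 GHz turbo) are made up for
illustration, and sample_period() and period_to_msec() are hypothetical
helpers, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the period calculation: the number of CPU
 * cycles the counter must see before the watchdog NMI fires.
 */
static uint64_t sample_period(uint64_t khz, unsigned int watchdog_thresh)
{
	return khz * 1000 * watchdog_thresh;
}

/*
 * Wall-clock milliseconds until the counter reaches 'period' when the
 * core really runs at actual_khz (kHz is cycles per millisecond).
 */
static uint64_t period_to_msec(uint64_t period, uint64_t actual_khz)
{
	assert(actual_khz > 0);
	return period / actual_khz;
}
```

With watchdog_thresh = 10, a period derived from the assumed nominal
2000000 kHz expires after about 5882 msec when the core runs at the assumed
3400000 kHz turbo frequency. Deriving the period from the turbo frequency
instead restores the full 10000 msec, without tripling the period on
machines that have no turbo at all.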

So what you want is:

	return get_max_turbo_khz() * 1000 * watchdog_thresh;

Where get_max_turbo_khz() by default returns cpu_khz for non-turbo motors.

And instead of silently doing this it should emit an info message into dmesg:

	u64 period, max_khz = get_max_turbo_khz();
	static int once;

	period = max_khz * 1000 * watchdog_thresh;
	if (max_khz != cpu_khz && !once) {
		unsigned int msec = period / cpu_khz;

		once = 1;
		pr_info("Adjusted watchdog threshold to %u.%03u sec\n",
			msec / 1000, msec % 1000);
	}
	return period;

Hmm?

Thanks,

	tglx