Date: Wed, 21 Jun 2017 10:07:34 -0700
From: Andi Kleen
To: Thomas Gleixner
Cc: Kan Liang, linux-kernel@vger.kernel.org, dzickus@redhat.com, mingo@kernel.org, akpm@linux-foundation.org, babu.moger@oracle.com, atomlin@redhat.com, prarit@redhat.com, torvalds@linux-foundation.org, peterz@infradead.org, eranian@google.com, acme@redhat.com, stable@vger.kernel.org
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
Message-ID: <20170621170734.GF23705@tassilo.jf.intel.com>
References: <20170621144118.5939-1-kan.liang@intel.com>

On Wed, Jun 21, 2017 at 05:12:06PM +0200, Thomas Gleixner wrote:
> On Wed, 21 Jun 2017, kan.liang@intel.com wrote:
> >
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > +/*
> > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
> > + * can tick faster than the measured CPU Frequency due to Turbo mode.
> > + * That can lead to spurious timeouts.
> > + * To workaround the issue, extending the period by 3 times.
> > + */
> >  u64 hw_nmi_get_sample_period(int watchdog_thresh)
> >  {
> > -	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> > +	return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;
>
> The maximum turbo frequency of any given machine can be retrieved.

Not reliably, e.g. not in virtualization.
It would also require model-specific checks, so a new CPU model running an older kernel could still fail randomly.

-Andi