Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
From: Prarit Bhargava
To: "Liang, Kan", Thomas Gleixner
Cc: "linux-kernel@vger.kernel.org", "dzickus@redhat.com", "mingo@kernel.org",
    "akpm@linux-foundation.org", "babu.moger@oracle.com", "atomlin@redhat.com",
    "torvalds@linux-foundation.org", "peterz@infradead.org", "eranian@google.com",
    "acme@redhat.com", "ak@linux.intel.com", "stable@vger.kernel.org"
Date: Wed, 21 Jun 2017 13:40:23 -0400
References: <20170621144118.5939-1-kan.liang@intel.com> <37D7C6CF3E00A74B8858931C1DB2F077537101A3@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F077537101A3@SHSMSX103.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/21/2017 11:47 AM, Liang, Kan wrote:
>
>
>> On Wed, 21 Jun 2017, kan.liang@intel.com wrote:
>>>
>>>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>> +/*
>>> + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
>>> + * can tick faster than the measured CPU Frequency due to Turbo mode.
>>> + * That can lead to spurious timeouts.
>>> + * To workaround the issue, extending the period by 3 times.
>>> + */
>>>  u64 hw_nmi_get_sample_period(int watchdog_thresh)  {
>>> -	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
>>> +	return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;
>>
>> The maximum turbo frequency of any given machine can be retrieved.
>
> The maximum turbo frequency is determined by the model of processor.
> I'm not sure if there is a generic way to get the maximum turbo frequency.
> Is there?
>

cpufreq_quick_get_max() but iff cpufreq subsystem is initialized.  O/w 0 is
returned for the freq.

Quick test shows the correct turbo max of 3700000 on my 2000000 (2.00GHz) system.

P.

> Thanks,
> Kan
>
>>
>> So why don't you simply take that ratio into account and apply it for the
>> machines which have those insane turbo loaders? That's not a huge effort,
>> can be easily backported and does not inflict this unconditially.
>>
>> So what you want is:
>>
>> 	return get_max_turbo_khz() * 1000 * watchdog_thresh;
>>
>> Where get_max_turbo_khz() by default returns cpu_khz for non turbo
>> motors.
>>
>> And instead of silently doing this it should emit a info into dmesg:
>>
>> 	u64 period, max_khz = get_max_turbo_khz();
>> 	static int once;
>>
>> 	period = max_khz * 1000 * watchdog_thresh;
>>
>> 	if (max_khz != cpu_khz && !once) {
>> 		unsigned int msec = period / cpu_khz;
>>
>> 		once = 1;
>> 		pr_info("Adjusted watchdog threshold to %u.%04u sec\n",
>> 			msec / 1000, msec % 1000);
>> 	}
>>
>> 	return period;
>>
>> Hmm?
>>
>> Thanks,
>>
>> tglx
>
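
To put numbers on the suggestion above, here is a rough userspace sketch of
the period arithmetic, using the values from my quick test (2000000 kHz base,
3700000 kHz turbo max). It is not kernel code: the cpufreq value is passed in
as a plain parameter standing in for what cpufreq_quick_get_max() would
return, the function signatures are made up for illustration, and treating 0
(cpufreq not initialized) as "fall back to cpu_khz" is my assumption about how
get_max_turbo_khz() might be wired up. The message formatter prints three
fractional digits, since msec % 1000 never has more than three.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the get_max_turbo_khz() idea.
 * cpufreq_max_khz models the value cpufreq would report; 0 means the
 * cpufreq subsystem is not initialized, so fall back to cpu_khz.
 */
static uint64_t get_max_turbo_khz(uint64_t cpu_khz, uint64_t cpufreq_max_khz)
{
	if (cpufreq_max_khz == 0 || cpufreq_max_khz < cpu_khz)
		return cpu_khz;
	return cpufreq_max_khz;
}

/* Sample period in CPU cycles: kHz * 1000 gives Hz, times the threshold in seconds. */
static uint64_t sample_period(uint64_t cpu_khz, uint64_t cpufreq_max_khz,
			      int watchdog_thresh)
{
	return get_max_turbo_khz(cpu_khz, cpufreq_max_khz) * 1000 *
	       (uint64_t)watchdog_thresh;
}

/* Milliseconds the period lasts if the CPU only runs at the base frequency. */
static uint64_t period_msec_at_base(uint64_t period, uint64_t cpu_khz)
{
	return period / cpu_khz;
}

/* Format the one-time notice; returns the snprintf() result. */
static int format_adjust_msg(char *buf, size_t len, uint64_t msec)
{
	return snprintf(buf, len,
			"Adjusted watchdog threshold to %llu.%03llu sec",
			(unsigned long long)(msec / 1000),
			(unsigned long long)(msec % 1000));
}
```

With watchdog_thresh = 10, sample_period(2000000, 3700000, 10) is
37000000000 cycles, which lasts 18500 msec if the CPU sits at its base
frequency and 10 seconds at full turbo. That compares with a flat 30 seconds
at base frequency for the unconditional "* 3" in the patch.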