From: "Liang, Kan"
To: Don Zickus
CC: Andrew Morton, linux-kernel@vger.kernel.org, mingo@kernel.org, babu.moger@oracle.com, atomlin@redhat.com, prarit@redhat.com, torvalds@linux-foundation.org, peterz@infradead.org, tglx@linutronix.de, eranian@google.com, acme@redhat.com, ak@linux.intel.com, stable@vger.kernel.org
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups
Date: Wed, 21 Jun 2017 14:10:27 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F077537100F7@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <20170621134747.kd6w5rq4zforzaad@redhat.com>

> On Wed, Jun 21, 2017 at 12:40:28PM +0000, Liang, Kan wrote:
> > > > The right fix for mainline can be found here.
> > > > perf/x86/intel: enable CPU ref_cycles for GP counter
> > > > perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
> > > > https://patchwork.kernel.org/patch/9779087/
> > > > https://patchwork.kernel.org/patch/9779089/
> > >
> > > Presumably the "right fix" will later be altered to revert this
> > > one-line workaround?
> >
> > The "right fix" itself will not touch the watchdog rate. I will modify
> > the changelog to notify the people who want to do the backport.
> >
> > As I understand it, it's not harmful even if we don't revert the
> > workaround. It can still detect the hard lockup; it only takes a tiny
> > bit longer.
>
> It depends on your perspective of harmful. :-) There are folks that would like
> that sampling rate to be more accurate, so they can detect problems sooner
> rather than later. You just took an input of 'watchdog_thresh' and blindly
> multiplied it by 3, which can confuse an end user who thought they set up a 5
> second threshold but instead it turned into a 15 second one. :-(

At the moment, the watchdog cannot get an accurate threshold because of Turbo.
We can only get an accurate one after applying the "right fix".

You're right, though: we should revert the workaround once the "right fix" is
applied.

Thanks,
Kan