From: "Liang, Kan"
To: Thomas Gleixner
CC: Don Zickus, "linux-kernel@vger.kernel.org", "mingo@kernel.org", "akpm@linux-foundation.org", "babu.moger@oracle.com", "atomlin@redhat.com", "prarit@redhat.com", "torvalds@linux-foundation.org", "peterz@infradead.org", "eranian@google.com", "acme@redhat.com", "ak@linux.intel.com", "stable@vger.kernel.org"
Subject: RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups
Date: Mon, 17 Jul 2017 12:18:45 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F0775371D8AA@SHSMSX103.ccr.corp.intel.com>
References: <20170621144118.5939-1-kan.liang@intel.com> <20170622154450.2lua7fdmigcixldw@redhat.com> <20170623162907.l6inpxgztwwkeaoi@redhat.com> <20170626201927.3ak7fk3yvdzbb4ay@redhat.com> <20170627201249.ll34ecwhpme3vh2u@redhat.com> <37D7C6CF3E00A74B8858931C1DB2F0775371D43E@SHSMSX103.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mon, 17 Jul 2017, Liang, Kan wrote:
> > There are three proposed patches so far.
> > Patch 1: The patch as above which speed up the hrtimer.
> > Patch 2: Thomas's first proposal.
> > https://patchwork.kernel.org/patch/9803033/
> > https://patchwork.kernel.org/patch/9805903/
> > Patch 3: my original proposal which increase the NMI watchdog timeout
> > by 3X https://patchwork.kernel.org/patch/9802053/
> >
> > According to our test, only patch 3 works well.
> > The other two patches will hang the system eventually.
> > For patch 1, the system hang after running our test case for ~1 hour.
> > For patch 2, the system hang in running the overnight test.
> > There is no error message shown when the system hang. So I don't know
> > the root cause yet.
>
> That doesn't make sense. What's the exact test procedure?

I don't know the exact test procedure. The test case is from our customer.
I only know that the test case makes calls into the x11 libs.

>
> > BTW: We set 1 to watchdog_thresh when we did the test.
> > It's believed that can speed up the failure.
>
> Believe is not really a technical measure....
>

1 is a valid value for watchdog_thresh.
It was set through the standard proc interface.
/proc/sys/kernel/watchdog_thresh

It should not affect the final test result.

Thanks,
Kan