Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751306AbdGQHOg (ORCPT ); Mon, 17 Jul 2017 03:14:36 -0400
Received: from Galois.linutronix.de ([146.0.238.70]:57474 "EHLO
	Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751261AbdGQHOf (ORCPT );
	Mon, 17 Jul 2017 03:14:35 -0400
Date: Mon, 17 Jul 2017 09:14:22 +0200 (CEST)
From: Thomas Gleixner
To: "Liang, Kan"
cc: Don Zickus, "linux-kernel@vger.kernel.org", "mingo@kernel.org",
	"akpm@linux-foundation.org", "babu.moger@oracle.com",
	"atomlin@redhat.com", "prarit@redhat.com",
	"torvalds@linux-foundation.org", "peterz@infradead.org",
	"eranian@google.com", "acme@redhat.com", "ak@linux.intel.com",
	"stable@vger.kernel.org"
Subject: RE: [PATCH V2] kernel/watchdog: fix spurious hard lockups
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F0775371D43E@SHSMSX103.ccr.corp.intel.com>
Message-ID:
References: <20170621144118.5939-1-kan.liang@intel.com>
	<20170622154450.2lua7fdmigcixldw@redhat.com>
	<20170623162907.l6inpxgztwwkeaoi@redhat.com>
	<20170626201927.3ak7fk3yvdzbb4ay@redhat.com>
	<20170627201249.ll34ecwhpme3vh2u@redhat.com>
	<37D7C6CF3E00A74B8858931C1DB2F0775371D43E@SHSMSX103.ccr.corp.intel.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 957
Lines: 26

On Mon, 17 Jul 2017, Liang, Kan wrote:
> There are three proposed patches so far.
> Patch 1: The patch as above which speed up the hrtimer.
> Patch 2: Thomas's first proposal.
> https://patchwork.kernel.org/patch/9803033/
> https://patchwork.kernel.org/patch/9805903/
> Patch 3: my original proposal which increase the NMI watchdog timeout by 3X
> https://patchwork.kernel.org/patch/9802053/
>
> According to our test, only patch 3 works well.
> The other two patches will hang the system eventually.
> For patch 1, the system hang after running our test case for ~1 hour.
> For patch 2, the system hang in running the overnight test.
> There is no error message shown when the system hang. So I don't know the
> root cause yet.

That doesn't make sense. What's the exact test procedure?

> BTW: We set 1 to watchdog_thresh when we did the test.
> It's believed that can speed up the failure.

Belief is not really a technical measure...

Thanks,

	tglx