Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751864AbcCNQaD (ORCPT ); Mon, 14 Mar 2016 12:30:03 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41426 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751381AbcCNQ37 (ORCPT ); Mon, 14 Mar 2016 12:29:59 -0400 Date: Mon, 14 Mar 2016 12:29:55 -0400 From: Don Zickus To: Josh Hunt Cc: akpm@linux-foundation.org, uobergfe@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH] watchdog: don't run proc_watchdog_update if new value is same as old Message-ID: <20160314162955.GQ194535@redhat.com> References: <1457826627-21727-1-git-send-email-johunt@akamai.com> <20160314143426.GK194535@redhat.com> <56E6CE86.5070606@akamai.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56E6CE86.5070606@akamai.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 14 Mar 2016 16:29:59 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2665 Lines: 65 On Mon, Mar 14, 2016 at 09:45:26AM -0500, Josh Hunt wrote: > On 03/14/2016 09:34 AM, Don Zickus wrote: > >On Sat, Mar 12, 2016 at 06:50:26PM -0500, Joshua Hunt wrote: > >>While working on a script to restore all sysctl params before a series of > >>tests I found that writing any value into the > >>/proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh} > >>causes them to call proc_watchdog_update(). Not only that, but when I > >>wrote to these proc files in a loop I could easily trigger a soft lockup. > >> > >>[ 955.756196] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. > >>[ 955.765994] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. > >>[ 955.774619] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. > >>[ 955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. > >>[ 959.788319] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 30s! [swapper/4:0] > >>[ 959.788325] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 30s! [swapper/5:0] > >> > >>There doesn't appear to be a reason for doing this work other every time a > >>write occurs, so only do the work when the values change. > > > >Hi Josh, > > > >Thanks for the patch. I have no objections to it, but Uli and myself were > >interested in the reason for the softlockups. Uli is going to provide a > >test patch to see if his theory is correct. That way we fix the underlying > >issue and then apply your patch on top. Make sense? > > Yep. Sounds good. I meant to mention I didn't diagnose the soft-lockup. If > you provide a patch I'm happy to test. I can also attempt to debug that part > more if needed. Hi Josh, I believe Uli thought the below patch might fix it. Cheers, Don diff --git a/kernel/watchdog.c b/kernel/watchdog.c index b3ace6e..dd298d2 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -517,12 +517,12 @@ static void watchdog_enable(unsigned int cpu) /* Enable the perf event */ watchdog_nmi_enable(cpu); + watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1); /* done here because hrtimer_start can only pin to smp_processor_id() */ hrtimer_start(hrtimer, ns_to_ktime(sample_period), HRTIMER_MODE_REL_PINNED); /* initialize timestamp */ - watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1); __touch_watchdog(); } @@ -530,8 +530,8 @@ static void watchdog_disable(unsigned int cpu) { struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer); - watchdog_set_prio(SCHED_NORMAL, 0); hrtimer_cancel(hrtimer); + watchdog_set_prio(SCHED_NORMAL, 0); /* disable the perf event */ watchdog_nmi_disable(cpu); }