Date: Thu, 3 Dec 2015 15:12:20 -0500 (EST)
From: Ulrich Obergfell
To: Tejun Heo
Cc: Don Zickus, Ingo Molnar, Peter Zijlstra, Andrew Morton,
    linux-kernel@vger.kernel.org, kernel-team@fb.com
Message-ID: <1971916814.34665208.1449173540866.JavaMail.zimbra@redhat.com>
In-Reply-To: <20151203194358.GK27463@mtj.duckdns.org>
References: <20151203002810.GJ19878@mtj.duckdns.org>
    <20151203002839.GK19878@mtj.duckdns.org>
    <20151203175024.GE27730@redhat.com>
    <20151203194358.GK27463@mtj.duckdns.org>
Subject: Re: [PATCH 2/2] workqueue: implement lockup detector

Tejun,

I share Don's concern about connecting the soft lockup detector and the
workqueue watchdog to the same kernel parameter in /proc. I would feel
more comfortable if the workqueue watchdog had its own dedicated parameter.

I also see a scenario that the proposed patch does not handle well:
The watchdog_thresh parameter can be changed 'on the fly' - i.e. it is
not necessary to disable and re-enable the watchdog. The flow of
execution looks like this.

  proc_watchdog_thresh
    proc_watchdog_update
      if (watchdog_enabled && watchdog_thresh)
          watchdog_enable_all_cpus
            if (!watchdog_running) {
                ...
            } else {
                //
                // update 'on the fly'
                //
                update_watchdog_all_cpus()
            }

The patched watchdog_enable_all_cpus() function disables the workqueue
watchdog unconditionally at [1]. However, the workqueue watchdog remains
disabled if the code path [2] is executed (and wq_watchdog_thresh is not
updated either).

  static int watchdog_enable_all_cpus(void)
  {
          int err = 0;

  [1] --> disable_workqueue_watchdog();

          if (!watchdog_running) {
                  ...
          } else {
            .-    /*
            |      * Enable/disable the lockup detectors or
            |      * change the sample period 'on the fly'.
            |      */
  [2]      <      err = update_watchdog_all_cpus();
            |
            |     if (err) {
            |             watchdog_disable_all_cpus();
            |             pr_err("Failed to update lockup detectors, disabled\n");
            '-    }
          }

          if (err)
                  watchdog_enabled = 0;

          return err;
  }

Another question that comes to my mind is: Would the workqueue watchdog
participate in the lockup detector suspend/resume mechanism, and if yes,
how would it be integrated into this?

Regards,

Uli


----- Original Message -----
From: "Tejun Heo"
To: "Don Zickus"
Cc: "Ulrich Obergfell", "Ingo Molnar", "Peter Zijlstra", "Andrew Morton",
    linux-kernel@vger.kernel.org, kernel-team@fb.com
Sent: Thursday, December 3, 2015 8:43:58 PM
Subject: Re: [PATCH 2/2] workqueue: implement lockup detector

Hello, Don.

On Thu, Dec 03, 2015 at 12:50:24PM -0500, Don Zickus wrote:
> This sort of looks like the hung task detector..
>
> I am a little concerned because we just made a big effort to properly
> separate the hardlockup and softlockup paths and yet retain the flexibility
> to enable/disable them separately. Now it seems the workqueue detector is
> permanently entwined with the softlockup detector. I am not entirely sure
> that is the correct thing to do.

The only area they get entwined is how it's controlled from userland.
While it isn't quite the same as softlockup detection, I think what it
monitors is close enough that it makes sense to put them under the same
interface.

> It also seems awkward for the lockup code to have to jump to the workqueue
> code to function properly. :-/ Though we have made exceptions for the virt
> stuff and the workqueue code is simple..

Softlockup code doesn't depend on workqueue in any way. Workqueue tags
on touch_softlockup to detect cases which shouldn't be warned about, and
its enabledness is controlled together with softlockup, and that's it.

> Actually, I am curious, it seems if you just added a
> /proc/sys/kernel/wq_watchdog entry, you could eliminate the entire need for
> modifying the watchdog code to begin with. As you really aren't using any
> of it other than piggybacking on the touch_softlockup_watchdog stuff, which
> could probably be easily added without all the extra enable/disable changes
> in watchdog.c.

Yeah, except for the touch signal, it's purely an interface thing. I don't
feel too strongly about this but it seems a bit silly to introduce a
whole different set of interfaces for this. e.g. if the user wanted to
disable softlockup detection, it'd be weird to leave wq lockup
detection running. The same goes for threshold.

> Again, this looks like what the hung task detector is doing, which I
> struggled with years ago to integrate with the lockup code because in the
> end I had trouble re-using much of it.

So, it's a stall detector and there are inherent similarities but the
conditions tested are pretty different and it's a lot lighter. I'm
not really sure what you're meaning to say.

Thanks.

--
tejun
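
The 'on the fly' scenario Uli describes at the top of this message can be
reproduced in a small user-space model. This is only a sketch: the function
and variable names mirror those in the thread, but every body is a stand-in
and none of it is kernel code. Two things are assumptions rather than facts
from the patch: that the elided first-time-enable branch is where the
workqueue watchdog gets switched back on, and that a helper called
enable_workqueue_watchdog() (a hypothetical name) copies watchdog_thresh
into wq_watchdog_thresh.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-ins for the state discussed in the thread. */
static bool watchdog_running;     /* lockup detectors already started */
static bool wq_watchdog_enabled;  /* workqueue watchdog state */
static int watchdog_thresh = 10;  /* value written via /proc */
static int wq_watchdog_thresh;    /* threshold the wq watchdog uses */

static void disable_workqueue_watchdog(void)
{
	wq_watchdog_enabled = false;
}

/* ASSUMPTION: hypothetical counterpart, not taken from the patch. */
static void enable_workqueue_watchdog(void)
{
	wq_watchdog_thresh = watchdog_thresh;
	wq_watchdog_enabled = true;
}

/* Stand-in: pretend the per-CPU update always succeeds. */
static int update_watchdog_all_cpus(void)
{
	return 0;
}

static int watchdog_enable_all_cpus(void)
{
	int err = 0;

	disable_workqueue_watchdog();           /* [1] unconditional */

	if (!watchdog_running) {
		/* ASSUMPTION: the branch elided as "..." re-enables it. */
		watchdog_running = true;
		enable_workqueue_watchdog();
	} else {
		err = update_watchdog_all_cpus();   /* [2] 'on the fly' */
		/* Nothing on this path re-enables the wq watchdog. */
	}

	return err;
}

/*
 * Returns 1 if changing watchdog_thresh on the fly leaves the workqueue
 * watchdog disabled with a stale threshold, 0 otherwise.
 */
static int scenario_leaves_wq_watchdog_disabled(void)
{
	watchdog_enable_all_cpus();     /* first enable */
	assert(wq_watchdog_enabled);

	watchdog_thresh = 20;           /* user writes a new threshold */
	watchdog_enable_all_cpus();     /* takes path [2] */

	return !wq_watchdog_enabled && wq_watchdog_thresh != watchdog_thresh;
}
```

In this model the second call takes path [2], so
scenario_leaves_wq_watchdog_disabled() reports the problem: the workqueue
watchdog stays off and wq_watchdog_thresh keeps the old value. Under the
same assumptions, moving the re-enable step after the if/else would cover
both paths in the model.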