Date: Fri, 4 Dec 2015 08:19:26 -0500 (EST)
From: Ulrich Obergfell
To: Tejun Heo
Cc: Don Zickus, Ingo Molnar, Peter Zijlstra, Andrew Morton, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/2] workqueue: implement lockup detector

Tejun,

> Sure, separating the knobs out isn't difficult. I still don't like
> the idea of having multiple set of similar knobs controlling about the
> same thing tho.
>
> For example, let's say there's a user who boots with "nosoftlockup"
> explicitly. I'm pretty sure the user wouldn't be intending to keep
> workqueue watchdog running. The same goes for threshold adjustments,
> so here's my question. What are the reasons for the concern? What
> are we worrying about?
I'm not sure it is obvious to a user that a workqueue stall is "about the same thing" as a soft lockup, and hence that both should be controlled by the same knob. From a usability perspective, I would still vote for a separate knob for each lockup detector. For example, /proc/sys/kernel/wq_watchdog_thresh could control both the on|off state of the workqueue watchdog and its timeout (0 means off; a value greater than 0 means on and specifies the timeout).

Separating wq_watchdog_thresh from watchdog_thresh might also be useful for diagnostic purposes, for example if, while investigating a problem, one wants to explicitly raise or lower one threshold without affecting the other.

>> And another question that comes to my mind is: Would the workqueue watchdog
>> participate in the lockup detector suspend/resume mechanism, and if yes, how
>> would it be integrated into this ?
>
> From the usage, I can't quite tell what the purpose of the mechanism
> is. The only user seems to be fixup_ht_bug() and when it fails it
> says "failed to disable PMU erratum BJ122, BV98, HSD29 workaround" so
> if you could give me a pointer, it'd be great. But at any rate, if
> shutting down watchdog is all that's necessary, it shouldn't be a
> problem to integrate.

The patch post that introduced the mechanism is here:

http://marc.info/?l=linux-kernel&m=143843318208917&w=2

The watchdog_{suspend|resume} functions were later renamed:

http://marc.info/?l=linux-kernel&m=143894132129982&w=2

At the moment I don't see a reason why the workqueue watchdog would have to participate in that mechanism. However, if the workqueue watchdog were tied to the soft lockup detector as you proposed, I think it should participate for the sake of consistency (it would be hard to understand if the interface suspended only parts of the lockup detector).
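To make the separate-knob proposal concrete, here is a minimal user-space sketch in C. It is not kernel code, and struct detector, detector_active(), detector_set_thresh(), detectors_suspend() and detectors_resume() are hypothetical names used only for illustration. It models one threshold per detector (0 means off, a value greater than 0 means on with that timeout in seconds), shows that changing one knob leaves the other detector alone, and models a suspend that covers every detector rather than only some of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model (not kernel code): each lockup detector has its
 * own threshold knob, e.g. watchdog_thresh for the soft lockup detector
 * and a separate wq_watchdog_thresh for the workqueue watchdog.
 * A threshold of 0 means "off"; a value > 0 means "on" and is the
 * timeout in seconds.
 */
struct detector {
	const char *name;
	unsigned int thresh;	/* timeout in seconds; 0 = disabled */
	bool suspended;		/* true while the modeled suspend is active */
};

/* A detector checks for stalls only if it is enabled and not suspended. */
static bool detector_active(const struct detector *d)
{
	return d->thresh > 0 && !d->suspended;
}

/* Models a write to one detector's knob; other detectors are untouched. */
static void detector_set_thresh(struct detector *d, unsigned int thresh)
{
	d->thresh = thresh;
}

/*
 * Models the suspend/resume interface. If the workqueue watchdog were
 * tied to the soft lockup detector, suspend must cover all detectors,
 * so the interface never suspends only part of the lockup detector.
 */
static void detectors_suspend(struct detector *dets, size_t n)
{
	assert(dets != NULL || n == 0);	/* caller passes a valid array */
	for (size_t i = 0; i < n; i++)
		dets[i].suspended = true;
}

static void detectors_resume(struct detector *dets, size_t n)
{
	assert(dets != NULL || n == 0);	/* caller passes a valid array */
	for (size_t i = 0; i < n; i++)
		dets[i].suspended = false;
}
```

With thresholds of, say, 20 and 30 seconds (arbitrary values), setting the workqueue knob to 0 disables only the workqueue watchdog while the soft lockup threshold stays at 20, and a suspend/resume cycle turns both detectors off and back on together.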
Regards,

Uli