Date: Tue, 5 Sep 2017 11:15:31 -0400
From: Don Zickus
To: Peter Zijlstra
Cc: Ulrich Obergfell, Thomas Gleixner, LKML, Ingo Molnar, Andrew Morton,
    Borislav Petkov, Sebastian Siewior, Nicholas Piggin, Chris Metcalf
Subject: Re: [patch 11/29] lockup_detector: Remove park_in_progress hackery
Message-ID: <20170905151531.4rw3e4ab5qpamstz@redhat.com>
References: <20170831071558.995235362@linutronix.de>
    <20170831073053.863251887@linutronix.de>
    <20170904121050.dsqbh3efmkteu3qj@hirez.programming.kicks-ass.net>
In-Reply-To: <20170904121050.dsqbh3efmkteu3qj@hirez.programming.kicks-ass.net>

On Mon, Sep 04, 2017 at 02:10:50PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 04, 2017 at 01:09:06PM +0200, Ulrich Obergfell wrote:
>
> >   - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run.
> >   - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration
> >     requires parking/unparking of all watchdog threads.
>
> This is where you fail, its silly to require parking for
> reconfiguration.

Hi Peter,

Ok, please elaborate.  Unless I am misunderstanding, that is what Thomas
asked us to do years ago when he implemented the parking/unparking scheme,
and it is what his current patch set does now.

The point of parking, I believe, was to avoid the overhead of tearing down a
thread and restarting it whenever the code needed to update various lockup
detector settings.

So if we can't depend on parking for reconfiguration, then what are the
other options (besides tearing down threads)?

I am not trying to be argumentative here, just trying to fill in the
disconnect between us.


Hi Uli,

I think the race you detailed is solved with Thomas's patches.  In the
original design we set the sample period first, then tried parking the
threads, which created the mess.  With this patchset, Thomas properly parks
the threads first, then sets the sample period, thus avoiding the race, I
believe.

You should be able to see that in patch 16, softlockup_reconfigure_threads().

Cheers,
Don
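
To make the ordering argument above concrete, here is a small user-space
model of the two sequences.  It is only a sketch and not the kernel code:
struct lockup_model, MODEL_NR_CPUS and all model_*() helpers are hypothetical
names made up for illustration, and "parked" is reduced to a per-CPU flag.
The model counts how often the sample period is written while any watchdog
thread is still running, which is the window the old ordering left open.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

struct lockup_model {
	bool parked[MODEL_NR_CPUS];	/* true: watchdog thread is parked */
	unsigned int sample_period;	/* setting shared by all threads */
	unsigned int unsafe_updates;	/* writes seen by a running thread */
};

/* Returns true if at least one modelled watchdog thread is not parked. */
static bool model_any_running(const struct lockup_model *m)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (!m->parked[cpu])
			return true;
	return false;
}

static void model_park_all(struct lockup_model *m)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		m->parked[cpu] = true;
}

static void model_unpark_all(struct lockup_model *m)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		m->parked[cpu] = false;
}

/* Record an unsafe update if any thread could observe the change. */
static void model_set_sample_period(struct lockup_model *m, unsigned int period)
{
	if (model_any_running(m))
		m->unsafe_updates++;
	m->sample_period = period;
}

/* Old ordering: change the setting first, then park. */
static void model_reconfigure_old(struct lockup_model *m, unsigned int period)
{
	model_set_sample_period(m, period);
	model_park_all(m);
	model_unpark_all(m);
}

/* New ordering: park first, then change the setting, then unpark. */
static void model_reconfigure_new(struct lockup_model *m, unsigned int period)
{
	model_park_all(m);
	assert(!model_any_running(m));	/* nobody can see a half-done update */
	model_set_sample_period(m, period);
	model_unpark_all(m);
}
```

Starting from a zero-initialized model (all threads running),
model_reconfigure_old() leaves unsafe_updates at 1, while
model_reconfigure_new() leaves it at 0.  The model deliberately ignores the
case where a hogged CPU keeps its watchdog thread from parking at all, which
is a separate question in this thread.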