Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751014AbdIAEmr (ORCPT ); Fri, 1 Sep 2017 00:42:47 -0400 Received: from mail-pf0-f195.google.com ([209.85.192.195]:37045 "EHLO mail-pf0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750707AbdIAEmp (ORCPT ); Fri, 1 Sep 2017 00:42:45 -0400 X-Google-Smtp-Source: ADKCNb7PPZc9d82c8+C9vM3xSs8wwWdXmplQNsFgK2vHdgc9m3tCSgEGxubL5xrgOBdbwnKawR3Sww== Date: Fri, 1 Sep 2017 14:42:29 +1000 From: Nicholas Piggin To: Don Zickus Cc: Thomas Gleixner , LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Borislav Petkov , Sebastian Siewior , Chris Metcalf , Ulrich Obergfell , Michael Ellerman Subject: Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Message-ID: <20170901144229.3791e5c9@roar.ozlabs.ibm.com> In-Reply-To: <20170831221014.b3tlpk3p6zxwclmy@redhat.com> References: <20170831071558.995235362@linutronix.de> <20170831221014.b3tlpk3p6zxwclmy@redhat.com> Organization: IBM X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2427 Lines: 53 On Thu, 31 Aug 2017 18:10:14 -0400 Don Zickus wrote: > On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote: > > The lockup detector is broken is several ways: > > > > - It's deadlock prone vs. CPU hotplug in various ways. Some of these > > are due to recursive cpus_read_lock() others are due to > > cpus_read_lock() from CPU hotplug callbacks which immediately lock > > the machine because cpus are write locked. > > > > - The handling of the cpu hotplug threads happens sideways to the > > smpboot thread infrastructure, which is racy and pointless > > > > - The handling of the user space sysctl interface is a complete > > trainwreck as it fiddles directly with variables which can be > > modified or evaluated by the running watchdogs. > > > > - The perf event initialization is a steaming pile of duct tape as it > > idiotically tries to create perf events over and over even if perf is > > not functional (no hardware, ....). To avoid excessive dmesg spam it > > contains magic printk ratelimiting along with either wrong or useless > > messages. > > > > - The code structure is horrible as ifdef sections are scattered all > > over the place which makes it unreadable > > > > - There is more wreckage, but see the changelogs for the ugly details. > > > > Before I get utterly grumpy, I just pretend that I don't give a sh*t! > > > > The following series sanitizes the facility and addresses the problems. > > Hi Thomas, > > Thanks for the patchset. I agree with most your issues you complained > about, just wasn't smart enough to figure out the right way to solve them. > Despite your aggressive comments, I will review the code to see if it covers > the scenarios that have popped up over the years and run some testing on my > side. Probably need a few days to do that. The powerpc bits look fine, there's no real changes pending there, so just take them through your tree if you like. I had a glance throught the series, no comments yet. The powerpc watchdog already duplicates the proc tunables rather than using them directly, so in theory it did not need the 2 stage reconfigure. In practice, it has a brown paper bag bug because it does not stop the watchdog before changing its internal variables :P 2 stage is probably safer and clearer way to go though. Thanks, Nick