Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755433AbdIGQEj (ORCPT ); Thu, 7 Sep 2017 12:04:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:4528 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754932AbdIGQEi (ORCPT ); Thu, 7 Sep 2017 12:04:38 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com F0CC081DFE Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=dzickus@redhat.com Date: Thu, 7 Sep 2017 12:04:36 -0400 From: Don Zickus To: Thomas Gleixner Cc: LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Borislav Petkov , Sebastian Siewior , Nicholas Piggin , Chris Metcalf , Ulrich Obergfell Subject: Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Message-ID: <20170907160436.b4mjttqugh3o77zl@redhat.com> References: <20170831071558.995235362@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170831071558.995235362@linutronix.de> User-Agent: NeoMutt/20170428-dirty (1.8.2) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 07 Sep 2017 16:04:38 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2565 Lines: 64 On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote: > The lockup detector is broken is several ways: > > - It's deadlock prone vs. CPU hotplug in various ways. Some of these > are due to recursive cpus_read_lock() others are due to > cpus_read_lock() from CPU hotplug callbacks which immediately lock > the machine because cpus are write locked. > > - The handling of the cpu hotplug threads happens sideways to the > smpboot thread infrastructure, which is racy and pointless > > - The handling of the user space sysctl interface is a complete > trainwreck as it fiddles directly with variables which can be > modified or evaluated by the running watchdogs. > > - The perf event initialization is a steaming pile of duct tape as it > idiotically tries to create perf events over and over even if perf is > not functional (no hardware, ....). To avoid excessive dmesg spam it > contains magic printk ratelimiting along with either wrong or useless > messages. > > - The code structure is horrible as ifdef sections are scattered all > over the place which makes it unreadable > > - There is more wreckage, but see the changelogs for the ugly details. > > Before I get utterly grumpy, I just pretend that I don't give a sh*t! > > The following series sanitizes the facility and addresses the problems. One of the goals I was trying to achieve with splitting out watchdog_hld.c was to abstract it as another hw nmi thing. As some arches wanted to move away from using perf as a hardlockup detector. So watchdog_nmi_enable/disable was an attempt to be that hook, maybe watchdog_nmi_reconfigure. I think some of your hardlockup_detector_perf_enable/disable/restart might fit into that. The _cleanup() probably not. Other than that and the compile issue, I don't really have much problems with the bulk of the changes and my simple tests seem to work fine. Cheers, Don > > Thanks, > > tglx > --- > arch/parisc/kernel/process.c | 2 > arch/powerpc/kernel/watchdog.c | 22 - > arch/x86/events/intel/core.c | 11 > include/linux/nmi.h | 121 +++---- > include/linux/smpboot.h | 4 > kernel/cpu.c | 6 > kernel/smpboot.c | 22 - > kernel/sysctl.c | 22 - > kernel/watchdog.c | 638 ++++++++++++++--------------------------- > kernel/watchdog_hld.c | 193 ++++++------ > 10 files changed, 433 insertions(+), 608 deletions(-) > >