Date: Fri, 1 Sep 2017 21:29:07 +0200 (CEST)
From: Thomas Gleixner
To: Don Zickus
cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
    Sebastian Siewior, Nicholas Piggin, Chris Metcalf, Ulrich Obergfell
Subject: Re: [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock
In-Reply-To: <20170901190208.pn4vq25udylxehph@redhat.com>

On Fri, 1 Sep 2017, Don Zickus wrote:
> On Thu, Aug 31, 2017 at 09:16:08AM +0200, Thomas Gleixner wrote:
> > The following deadlock is possible in the watchdog hotplug code:
> >
> >   cpus_write_lock()
> >     ...
> >       takedown_cpu()
> >         smpboot_park_threads()
> >           smpboot_park_thread()
> >             kthread_park()
> >               ->park() := watchdog_disable()
> >                 watchdog_nmi_disable()
> >                   perf_event_release_kernel();
> >                     put_event()
> >                       _free_event()
> >                         ->destroy() := hw_perf_event_destroy()
> >                           x86_release_hardware()
> >                             release_ds_buffers()
> >                               get_online_cpus()
> >
> > when a per cpu watchdog perf event is destroyed which drops the last
> > reference to the PMU hardware. The cleanup code there invokes
> > get_online_cpus() which instantly deadlocks because the hotplug percpu
> > rwsem is write locked.
>
> The main reason perf_event_release_kernel is in this path is because the
> oprofile folks complained they couldn't use the perf counters when the
> nmi_watchdog was disabled on the command line.

If the nmi watchdog is disabled on the command line then there are no
counters claimed at all.

Thanks,

	tglx