Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751663AbdH1Qc2 (ORCPT ); Mon, 28 Aug 2017 12:32:28 -0400 Received: from merlin.infradead.org ([205.233.59.134]:57704 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751552AbdH1Qc1 (ORCPT ); Mon, 28 Aug 2017 12:32:27 -0400 Date: Mon, 28 Aug 2017 18:32:22 +0200 From: Peter Zijlstra To: Borislav Petkov Cc: Sebastian Andrzej Siewior , Thomas Gleixner , lkml Subject: Re: WARNING: possible circular locking dependency detected Message-ID: <20170828163222.tn4a2dx7nsc6u7zu@hirez.programming.kicks-ass.net> References: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> <20170828145808.btuqpe2bvxymljyg@hirez.programming.kicks-ass.net> <20170828150617.wp6hh7flfjjjsu4m@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170828150617.wp6hh7flfjjjsu4m@hirez.programming.kicks-ass.net> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3095 Lines: 94 On Mon, Aug 28, 2017 at 05:06:17PM +0200, Peter Zijlstra wrote: > On Mon, Aug 28, 2017 at 04:58:08PM +0200, Peter Zijlstra wrote: > > On Fri, Aug 25, 2017 at 12:03:04PM +0200, Borislav Petkov wrote: > > > Hey, > > > > > > tglx says I have something for ya :-) > > > > > > ====================================================== > > > WARNING: possible circular locking dependency detected > > > 4.13.0-rc6+ #1 Not tainted > > > ------------------------------------------------------ > > > watchdog/3/27 is trying to acquire lock: > > > (cpu_hotplug_lock.rw_sem){++++}, at: [] release_ds_buffers+0x29/0xd0 > > > > > > but now in release context of a crosslock acquired at the following: > > > ((complete)&self->parked){+.+.}, at: [] kthread_park+0x46/0x60 > > > > > > So I'm thinking this one is an actual deadlock. > > > > So, as far as I can tell this ends up being: > > > > CPU0 CPU1 > > > > (smpboot_regiser_percpu_thread_cpumask) > > > > get_online_cpus() > > __smpboot_create_thread() > > kthread_park(); > > wait_for_completion(&X) > > > > > > (smpboot_thread_fn) > > > > ->park() := watchdog_disable() > > watchdog_nmi_disable() > > perf_event_release_kernel(); > > put_event() > > _free_event() > > ->destroy() := hw_perf_event_destroy() > > x86_release_hardware() > > release_ds_buffers() > > get_online_cpus() > > > > > > kthread_parkme() > > complete(&X) > > > > > > > > So CPU0 holds cpus_hotplug_lock while wait_for_completion() and CPU1 > > needs to acquire before complete(). > > > > So if, in between, CPU2 does down_write(), things will get unstuck. > > > > What's worse, there's also: > > > > cpus_write_lock() > > ... > > takedown_cpu() > > smpboot_park_threads() > > smpboot_park_thread() > > kthread_park() > > ->park() := watchdog_disable() > > watchdog_nmi_disable() > > perf_event_release_kernel(); > > put_event() > > _free_event() > > ->destroy() := hw_perf_event_destroy() > > x86_release_hardware() > > release_ds_buffers() > > get_online_cpus() > > > > which as far as I can tell, spells instant deadlock.. > > Aah, but that latter will never happen.. because each CPU will have a > &pmc_refcount and we can't unplug _all_ CPUs. > > So the first one will only ever happen on boot, where we park() the very > first watchdog thread and is a potential deadlock, but won't happen > because nobody is around to do down_write() just yet. I suspect however it is possible to interleave: sysctl.kernel.nmi_watchdog = 0 proc_watchdog_common() with: hot un-plug watchdog_disable() to tickle that exact problem. Just needs a bit of luck.