Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751424AbdH1PGX (ORCPT ); Mon, 28 Aug 2017 11:06:23 -0400 Received: from merlin.infradead.org ([205.233.59.134]:56448 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751242AbdH1PGV (ORCPT ); Mon, 28 Aug 2017 11:06:21 -0400 Date: Mon, 28 Aug 2017 17:06:17 +0200 From: Peter Zijlstra To: Borislav Petkov Cc: Sebastian Andrzej Siewior , Thomas Gleixner , lkml Subject: Re: WARNING: possible circular locking dependency detected Message-ID: <20170828150617.wp6hh7flfjjjsu4m@hirez.programming.kicks-ass.net> References: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> <20170828145808.btuqpe2bvxymljyg@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170828145808.btuqpe2bvxymljyg@hirez.programming.kicks-ass.net> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2670 Lines: 82 On Mon, Aug 28, 2017 at 04:58:08PM +0200, Peter Zijlstra wrote: > On Fri, Aug 25, 2017 at 12:03:04PM +0200, Borislav Petkov wrote: > > Hey, > > > > tglx says I have something for ya :-) > > > > ====================================================== > > WARNING: possible circular locking dependency detected > > 4.13.0-rc6+ #1 Not tainted > > ------------------------------------------------------ > > watchdog/3/27 is trying to acquire lock: > > (cpu_hotplug_lock.rw_sem){++++}, at: [] release_ds_buffers+0x29/0xd0 > > > > but now in release context of a crosslock acquired at the following: > > ((complete)&self->parked){+.+.}, at: [] kthread_park+0x46/0x60 > > > So I'm thinking this one is an actual deadlock. > > So, as far as I can tell this ends up being: > > CPU0 CPU1 > > (smpboot_regiser_percpu_thread_cpumask) > > get_online_cpus() > __smpboot_create_thread() > kthread_park(); > wait_for_completion(&X) > > > (smpboot_thread_fn) > > ->park() := watchdog_disable() > watchdog_nmi_disable() > perf_event_release_kernel(); > put_event() > _free_event() > ->destroy() := hw_perf_event_destroy() > x86_release_hardware() > release_ds_buffers() > get_online_cpus() > > > kthread_parkme() > complete(&X) > > > > So CPU0 holds cpus_hotplug_lock while wait_for_completion() and CPU1 > needs to acquire before complete(). > > So if, in between, CPU2 does down_write(), things will get unstuck. > > What's worse, there's also: > > cpus_write_lock() > ... > takedown_cpu() > smpboot_park_threads() > smpboot_park_thread() > kthread_park() > ->park() := watchdog_disable() > watchdog_nmi_disable() > perf_event_release_kernel(); > put_event() > _free_event() > ->destroy() := hw_perf_event_destroy() > x86_release_hardware() > release_ds_buffers() > get_online_cpus() > > which as far as I can tell, spells instant deadlock.. Aah, but that latter will never happen.. because each CPU will have a &pmc_refcount and we can't unplug _all_ CPUs. So the first one will only ever happen on boot, where we park() the very first watchdog thread and is a potential deadlock, but won't happen because nobody is around to do down_write() just yet. argh!