Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751954AbdH1O6R (ORCPT ); Mon, 28 Aug 2017 10:58:17 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:57867 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751259AbdH1O6O (ORCPT ); Mon, 28 Aug 2017 10:58:14 -0400 Date: Mon, 28 Aug 2017 16:58:08 +0200 From: Peter Zijlstra To: Borislav Petkov Cc: Sebastian Andrzej Siewior , Thomas Gleixner , lkml Subject: Re: WARNING: possible circular locking dependency detected Message-ID: <20170828145808.btuqpe2bvxymljyg@hirez.programming.kicks-ass.net> References: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2142 Lines: 72 On Fri, Aug 25, 2017 at 12:03:04PM +0200, Borislav Petkov wrote: > Hey, > > tglx says I have something for ya :-) > > ====================================================== > WARNING: possible circular locking dependency detected > 4.13.0-rc6+ #1 Not tainted > ------------------------------------------------------ > watchdog/3/27 is trying to acquire lock: > (cpu_hotplug_lock.rw_sem){++++}, at: [] release_ds_buffers+0x29/0xd0 > > but now in release context of a crosslock acquired at the following: > ((complete)&self->parked){+.+.}, at: [] kthread_park+0x46/0x60 So I'm thinking this one is an actual deadlock. So, as far as I can tell this ends up being: CPU0 CPU1 (smpboot_regiser_percpu_thread_cpumask) get_online_cpus() __smpboot_create_thread() kthread_park(); wait_for_completion(&X) (smpboot_thread_fn) ->park() := watchdog_disable() watchdog_nmi_disable() perf_event_release_kernel(); put_event() _free_event() ->destroy() := hw_perf_event_destroy() x86_release_hardware() release_ds_buffers() get_online_cpus() kthread_parkme() complete(&X) So CPU0 holds cpus_hotplug_lock while wait_for_completion() and CPU1 needs to acquire before complete(). So if, in between, CPU2 does down_write(), things will get unstuck. What's worse, there's also: cpus_write_lock() ... takedown_cpu() smpboot_park_threads() smpboot_park_thread() kthread_park() ->park() := watchdog_disable() watchdog_nmi_disable() perf_event_release_kernel(); put_event() _free_event() ->destroy() := hw_perf_event_destroy() x86_release_hardware() release_ds_buffers() get_online_cpus() which as far as I can tell, spells instant deadlock..