Date: Mon, 28 Aug 2017 16:11:48 +0200
From: Peter Zijlstra
To: Sebastian Andrzej Siewior
Cc: Borislav Petkov, Byungchul Park, Thomas Gleixner, lkml
Subject: Re: WARNING: possible circular locking dependency detected
Message-ID: <20170828141148.56z7vuoqom65j5xv@hirez.programming.kicks-ass.net>
In-Reply-To: <20170825144755.ms2h2j2xe6gznnqi@linutronix.de>
List-ID: linux-kernel@vger.kernel.org

On Fri, Aug 25, 2017 at 04:47:55PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-08-25 12:03:04 [+0200], Borislav Petkov wrote:
> | ======================================================
> | WARNING: possible circular locking dependency detected
> | 4.13.0-rc6-00758-gd80d4177391f-dirty #112 Not tainted
> | ------------------------------------------------------
> | cpu-off.sh/426 is trying to acquire lock:
> |  ((complete)&st->done){+.+.}, at: [] takedown_cpu+0x84/0xf0
> |
> | but task is already holding lock:
> |  (sparse_irq_lock){+.+.}, at: [] irq_lock_sparse+0x12/0x20
> |
> | which lock already depends on the new lock.
> |
> | the existing dependency chain (in reverse order) is:
> |
> | -> #1 (sparse_irq_lock){+.+.}:
> |        __mutex_lock+0x88/0x9a0
> |        mutex_lock_nested+0x16/0x20
> |        irq_lock_sparse+0x12/0x20
> |        irq_affinity_online_cpu+0x13/0xd0
> |        cpuhp_invoke_callback+0x4a/0x130
> |
> | -> #0 ((complete)&st->done){+.+.}:
> |        check_prev_add+0x351/0x700
> |        __lock_acquire+0x114a/0x1220
> |        lock_acquire+0x47/0x70
> |        wait_for_completion+0x5c/0x180
> |        takedown_cpu+0x84/0xf0
> |        cpuhp_invoke_callback+0x4a/0x130
> |        cpuhp_down_callbacks+0x3d/0x80
> …
> |
> | other info that might help us debug this:
> |
> |  Possible unsafe locking scenario:
> |
> |        CPU0                    CPU1
> |        ----                    ----
> |   lock(sparse_irq_lock);
> |                                lock((complete)&st->done);
> |                                lock(sparse_irq_lock);
> |   lock((complete)&st->done);
> |
> |  *** DEADLOCK ***
>
> We hold the sparse_irq_lock lock while waiting for the completion in the
> CPU-down case, and in the CPU-up case we acquire the sparse_irq_lock lock
> while the other CPU is waiting for the completion.
> This is not an issue if my interpretation of lockdep here is correct.
>
> How do we annotate this?

Right, so I'm also seeing this one on offline. It's not making sense to
me. wait_for_completion() cannot be #0; we don't hold any other locks
(except the scheduler locks) while we have the complete_acquire() thing.

Now, I have enough hackery here to make it go away, but I'm not in fact
sure I understand things yet :-(