From: Byungchul Park
Date: Sat, 26 Aug 2017 01:12:05 +0900
Subject: Re: WARNING: possible circular locking dependency detected
To: Sebastian Andrzej Siewior
Cc: Borislav Petkov, Byungchul Park, Thomas Gleixner, Peter Zijlstra, lkml, kernel-team@lge.com
In-Reply-To: <20170825144755.ms2h2j2xe6gznnqi@linutronix.de>
References: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> <20170825144755.ms2h2j2xe6gznnqi@linutronix.de>

On Fri, Aug 25, 2017 at 11:47 PM, Sebastian Andrzej Siewior wrote:
> On 2017-08-25 12:03:04 [+0200], Borislav Petkov wrote:
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 4.13.0-rc6+ #1 Not tainted
>> ------------------------------------------------------
>
> While looking at this, I stumbled upon another one also enabled by
> "completion annotation" in the TIP:
>
> | ======================================================
> | WARNING: possible circular locking dependency detected
> | 4.13.0-rc6-00758-gd80d4177391f-dirty #112 Not tainted
> | ------------------------------------------------------
> | cpu-off.sh/426 is trying to acquire lock:
> |  ((complete)&st->done){+.+.}, at: [] takedown_cpu+0x84/0xf0
> |
> | but task is already holding lock:
> |  (sparse_irq_lock){+.+.}, at: [] irq_lock_sparse+0x12/0x20
> |
> | which lock already depends on the new lock.
> |
> | the existing dependency chain (in reverse order) is:
> |
> | -> #1 (sparse_irq_lock){+.+.}:
> |        __mutex_lock+0x88/0x9a0
> |        mutex_lock_nested+0x16/0x20
> |        irq_lock_sparse+0x12/0x20
> |        irq_affinity_online_cpu+0x13/0xd0
> |        cpuhp_invoke_callback+0x4a/0x130
> |
> | -> #0 ((complete)&st->done){+.+.}:
> |        check_prev_add+0x351/0x700
> |        __lock_acquire+0x114a/0x1220
> |        lock_acquire+0x47/0x70
> |        wait_for_completion+0x5c/0x180
> |        takedown_cpu+0x84/0xf0
> |        cpuhp_invoke_callback+0x4a/0x130
> |        cpuhp_down_callbacks+0x3d/0x80
> …
> |
> | other info that might help us debug this:
> |
> |  Possible unsafe locking scenario:
> |        CPU0                    CPU1
> |        ----                    ----
> |   lock(sparse_irq_lock);
> |                                lock((complete)&st->done);
> |                                lock(sparse_irq_lock);
> |   lock((complete)&st->done);
> |
> |  *** DEADLOCK ***
>
> We hold the sparse_irq_lock lock while waiting for the completion in the
> CPU-down case and in the CPU-up case we acquire the sparse_irq_lock lock
> while the other CPU is waiting for the completion.
> This is not an issue if my interpretation of lockdep here is correct.

Hello Sebastian,

I think you parsed the message correctly. The message is saying that,
for example:

context A (maybe being up?)
--
lock(sparse_irq_lock) // wait for sparse_irq_lock in B to be released
complete(st->done) // impossible to hit here

context B (maybe wanting to synchronize with the cpu being up?)
--
lock(sparse_irq_lock) // acquired successfully
wait_for_completion(st->done) // wait for completion of st->done in A
unlock(sparse_irq_lock) // impossible to hit here

I cannot check the kernel code at the moment.. I wonder if this scenario
is impossible. Could you answer it?

-- 
Thanks,
Byungchul
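
The two contexts in the reply can be modelled in user space as a small dependency graph. The sketch below is only a toy illustration in Python, not lockdep's actual data structures or algorithm: the class `DependencyGraph` and its `record` method are hypothetical names, and the completion is treated as a pseudo-lock, as the "(complete)" prefix in the report suggests. An edge a -> b here means "some context needs b while it holds a, or before it can signal a".

```python
class DependencyGraph:
    """Toy dependency tracker (hypothetical; not the kernel's lockdep)."""

    def __init__(self):
        # Maps a lock name to the set of locks needed while it is outstanding.
        self.edges = {}

    def _reachable(self, start, target):
        # Iterative depth-first search over the recorded edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def record(self, held, new):
        """Record the edge held -> new.

        Returns True if 'held' was already reachable from 'new', i.e. the
        new edge closes a cycle and the ordering could deadlock.
        """
        closes_cycle = self._reachable(new, held)
        self.edges.setdefault(held, set()).add(new)
        return closes_cycle


SPARSE = "sparse_irq_lock"
DONE = "(complete)&st->done"

graph = DependencyGraph()

# Context A: complete(st->done) can only run after lock(sparse_irq_lock),
# so the completion depends on the mutex.
first = graph.record(DONE, SPARSE)

# Context B: wait_for_completion(st->done) is called with sparse_irq_lock held.
second = graph.record(SPARSE, DONE)

print(first)   # False: nothing recorded yet that could form a cycle
print(second)  # True: the second edge closes the cycle
```

Like the report itself, the model only says that the two orderings are contradictory. It cannot tell whether context A and context B can actually run concurrently, which is the open question in the reply.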