Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934583AbdHYQmn convert rfc822-to-8bit (ORCPT ); Fri, 25 Aug 2017 12:42:43 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:42260 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934474AbdHYQmm (ORCPT ); Fri, 25 Aug 2017 12:42:42 -0400 Date: Fri, 25 Aug 2017 18:42:34 +0200 From: Sebastian Andrzej Siewior To: Borislav Petkov , Byungchul Park Cc: Thomas Gleixner , Peter Zijlstra , lkml Subject: Re: WARNING: possible circular locking dependency detected Message-ID: <20170825164234.wlvg2mfvcx2pwlhg@linutronix.de> References: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8BIT In-Reply-To: <20170825100304.5cwrlrfwi7f3zcld@pd.tnic> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5259 Lines: 129 On 2017-08-25 12:03:04 [+0200], Borislav Petkov wrote: > Hey, Hi Borislav, > tglx says I have something for ya :-) :) > ====================================================== > WARNING: possible circular locking dependency detected > 4.13.0-rc6+ #1 Not tainted > ------------------------------------------------------ > watchdog/3/27 is trying to acquire lock: > (cpu_hotplug_lock.rw_sem){++++}, at: [] release_ds_buffers+0x29/0xd0 > > but now in release context of a crosslock acquired at the following: > ((complete)&self->parked){+.+.}, at: [] kthread_park+0x46/0x60 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #2 ((complete)&self->parked){+.+.}: > __lock_acquire+0x10af/0x1110 > lock_acquire+0xea/0x1f0 > wait_for_completion+0x3b/0x130 > kthread_park+0x46/0x60 > __smpboot_create_thread.part.5+0x7d/0xf0 > smpboot_register_percpu_thread_cpumask+0xa2/0x100 > spawn_ksoftirqd+0x3b/0x45 > do_one_initcall+0x52/0x198 > kernel_init_freeable+0x6f/0x1a1 > kernel_init+0xe/0x100 > ret_from_fork+0x2a/0x40 > > -> #1 (smpboot_threads_lock){+.+.}: > __lock_acquire+0x10af/0x1110 > lock_acquire+0xea/0x1f0 > __mutex_lock+0x6c/0x940 > mutex_lock_nested+0x1b/0x20 > smpboot_register_percpu_thread_cpumask+0x42/0x100 > spawn_ksoftirqd+0x3b/0x45 > do_one_initcall+0x52/0x198 > kernel_init_freeable+0x6f/0x1a1 > kernel_init+0xe/0x100 > ret_from_fork+0x2a/0x40 > > -> #0 (cpu_hotplug_lock.rw_sem){++++}: > cpus_read_lock+0x2a/0x90 > release_ds_buffers+0x29/0xd0 > x86_release_hardware+0x8f/0xa0 > hw_perf_event_destroy+0xe/0x20 > _free_event+0xa7/0x250 > > other info that might help us debug this: > > Chain exists of: > cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked > > Possible unsafe locking scenario by crosslock: > > CPU0 CPU1 > ---- ---- > lock(smpboot_threads_lock); > lock((complete)&self->parked); > lock(cpu_hotplug_lock.rw_sem); > unlock((complete)&self->parked); > > *** DEADLOCK *** > > 1 lock held by watchdog/3/27: > #0: (&x->wait){....}, at: [] complete+0x1d/0x60 > > stack backtrace: > CPU: 3 PID: 27 Comm: watchdog/3 Not tainted 4.13.0-rc6+ #1 > Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012 > Call Trace: > dump_stack+0x86/0xcf > print_circular_bug+0x1fa/0x2f0 > check_prev_add+0x3be/0x700 > ? __lock_acquire+0x4c6/0x1110 > ? trace_event_raw_event_lock+0xf0/0xf0 > lock_commit_crosslock+0x40d/0x590 > ? lock_commit_crosslock+0x40d/0x590 > complete+0x29/0x60 > __kthread_parkme+0x54/0x80 > kthread_parkme+0x24/0x40 > smpboot_thread_fn+0x95/0x230 > kthread+0x147/0x180 > ? sort_range+0x30/0x30 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x2a/0x40 I don't really get that one. I *think* it complains about the cpu_hotplug_lock.rw_sem which was held while waiting for the completion ->parked (this is smpboot_register_percpu_thread() -> kthread_park() -> wait_for_completion()) *and* then other case where kthread_park() waits for the thread (while holding cpu_hotplug_lock.rw_sem) and out of nowhere release_ds_buffers() comes on the CPU and asks for the cpu_hotplug_lock.rw_sem. So something like this happens: <...>-1 [002] .... 1.050786: kthread_park: watchdog/3 <...>-1 [002] .... 1.050794: kthread_park: wait… <- wait_for_completion() <...>-27 [003] .... 1.050844: release_ds_buffers: before <...>-27 [003] .... 1.050857: release_ds_buffers: after <...>-27 [003] .... 1.050905: release_ds_buffers: released <...>-27 [003] .... 1.050908: __kthread_parkme: do complete <- complete() <...>-27 [003] d..1 1.050916: print_circular_bug: <...>-1 [002] .... 1.459364: kthread_park: wait… done We can change the timing by adding a printk to kthread_parkme() and then lockdep won't see release_ds_buffers() within the wait_for_completion() section: <...>-1 [003] .... 1.065596: kthread_park: watchdog/3 <...>-27 [003] .... 1.065657: release_ds_buffers: before <...>-27 [003] .... 1.065669: release_ds_buffers: after <...>-27 [003] .... 1.065718: release_ds_buffers: released <...>-27 [003] .... 1.072390: __kthread_parkme: do complete <- complete <...>-1 [003] .... 1.072402: kthread_park: wait… <- wait_for_completion() <...>-1 [003] .... 1.072404: kthread_park: wait… done different timing, no complains. I don't think that this is a bug because we can have multiple readers of the rwsem and *if* a someone is holding a write lock then release_ds_buffers() would be simply deferred until smbboot_.* and the write-lock holder are done. If my interpretation of the situation is correct, how do we teach this lockdep. Sebastian