Date: Tue, 21 Mar 2017 16:00:25 +0000
From: Chris Wilson
To: Namhyung Kim
Cc: Kees Cook, LKML, intel-gfx@lists.freedesktop.org, Tomi Sarvela, Anton Vorontsov, Colin Cross, Tony Luck, Stefan Hajnoczi, "# v4 . 10+", kernel-team@lge.com
Subject: Re: [PATCH] fs/pstore: Perform erase from a worker
Message-ID: <20170321160025.GJ11338@nuc-i3427.alporthouse.com>

On Tue, Mar 21, 2017 at 02:58:48PM +0900, Namhyung Kim wrote:
> Hello,
>
> On Mon, Mar 20, 2017 at 10:49:16AM -0700, Kees Cook wrote:
> > On Fri, Mar 17, 2017 at 2:52 AM, Chris Wilson wrote:
> > > In order to prevent a cyclic recursion between psi->read_mutex and the
> > > inode_lock, we need to move the pse->erase to a worker.
> > >
> > > [ 605.374955] ======================================================
> > > [ 605.381281] [ INFO: possible circular locking dependency detected ]
> > > [ 605.387679] 4.11.0-rc2-CI-CI_DRM_2352+ #1 Not tainted
> > > [ 605.392826] -------------------------------------------------------
> > > [ 605.399196] rm/7298 is trying to acquire lock:
> > > [ 605.403720]  (&psinfo->read_mutex){+.+.+.}, at: [] pstore_unlink+0x3f/0xa0
> > > [ 605.412300]
> > > [ 605.412300] but task is already holding lock:
> > > [ 605.418237]  (&sb->s_type->i_mutex_key#14){++++++}, at: [] vfs_unlink+0x4c/0x190
> > > [ 605.427397]
> > > [ 605.427397] which lock already depends on the new lock.
> > > [ 605.427397]
> > > [ 605.435770]
> > > [ 605.435770] the existing dependency chain (in reverse order) is:
> > > [ 605.443396]
> > > [ 605.443396] -> #1 (&sb->s_type->i_mutex_key#14){++++++}:
> > > [ 605.450347]        lock_acquire+0xc9/0x220
> > > [ 605.454551]        down_write+0x3f/0x70
> > > [ 605.458484]        pstore_mkfile+0x1f4/0x460
> > > [ 605.462835]        pstore_get_records+0x17a/0x320
> > > [ 605.467664]        pstore_fill_super+0xa4/0xc0
> > > [ 605.472205]        mount_single+0x89/0xb0
> > > [ 605.476314]        pstore_mount+0x13/0x20
> > > [ 605.480411]        mount_fs+0xf/0x90
> > > [ 605.484122]        vfs_kern_mount+0x66/0x170
> > > [ 605.488464]        do_mount+0x190/0xd50
> > > [ 605.492397]        SyS_mount+0x90/0xd0
> > > [ 605.496212]        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > [ 605.501496]
> > > [ 605.501496] -> #0 (&psinfo->read_mutex){+.+.+.}:
> > > [ 605.507747]        __lock_acquire+0x1ac0/0x1bb0
> > > [ 605.512401]        lock_acquire+0xc9/0x220
> > > [ 605.516594]        __mutex_lock+0x6e/0x990
> > > [ 605.520755]        mutex_lock_nested+0x16/0x20
> > > [ 605.525279]        pstore_unlink+0x3f/0xa0
> > > [ 605.529465]        vfs_unlink+0xb5/0x190
> > > [ 605.533477]        do_unlinkat+0x24c/0x2a0
> > > [ 605.537672]        SyS_unlinkat+0x16/0x30
> > > [ 605.541781]        entry_SYSCALL_64_fastpath+0x1c/0xb1
> >
> > If I'm reading this right it's a race between mount and unlink...
> > that's quite a corner case. :)
> >
> > > [ 605.547067]
> > > [ 605.547067] other info that might help us debug this:
> > > [ 605.547067]
> > > [ 605.555221]  Possible unsafe locking scenario:
> > > [ 605.555221]
> > > [ 605.561280]        CPU0                    CPU1
> > > [ 605.565883]        ----                    ----
> > > [ 605.570502]   lock(&sb->s_type->i_mutex_key#14);
> > > [ 605.575217]                                lock(&psinfo->read_mutex);
> > > [ 605.581803]                                lock(&sb->s_type->i_mutex_key#14);
> > > [ 605.589159]   lock(&psinfo->read_mutex);
> >
> > I haven't had time to dig much yet, but I wonder if the locking order
> > on unlink could just be reversed, and the deadlock would go away?
>
> IIUC, the unlink path locks a file in the root directory, while the
> mount path locks the root directory. Maybe we can use a subclass?
> (not tested)

More puzzling (or perhaps just my confusion): reports from our CI farm
say that this patch breaks removing objects from pstore. :|

I look forward to better suggestions on how to avoid the lockdep warning.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
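For illustration only, here is a userspace sketch of the idea behind the patch, assuming POSIX threads. A pthread stands in for the kernel worker, and inode_lock, read_mutex, erase_worker(), unlink_record() and run_demo() are hypothetical names, not pstore code. The unlink path still holds inode_lock but only hands the erase off; the worker takes read_mutex with inode_lock released. That leaves mount's read_mutex -> inode_lock as the only ordering, so the cycle in the lockdep report cannot form. The sketch does not model whatever breaks removal in the CI reports.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not pstore code:
 *   inode_lock ~ the inode lock held across vfs_unlink()
 *   read_mutex ~ psinfo->read_mutex                      */
static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;

static int erased_count; /* records erased so far; guarded by read_mutex */

/* Hypothetical work item: what the deferred erase needs to know. */
struct erase_work {
	int record_id;
};

/* Worker: takes read_mutex while inode_lock is NOT held, so no
 * inode_lock -> read_mutex dependency is ever created. */
static void *erase_worker(void *arg)
{
	struct erase_work *w = arg;

	pthread_mutex_lock(&read_mutex);
	erased_count++; /* stand-in for the backend's erase callback */
	pthread_mutex_unlock(&read_mutex);
	free(w);
	return NULL;
}

/* Unlink path: holds inode_lock, but only queues the erase. */
static int unlink_record(int record_id, pthread_t *worker)
{
	struct erase_work *w = malloc(sizeof(*w));
	int err;

	if (!w)
		return -1;
	w->record_id = record_id;

	pthread_mutex_lock(&inode_lock);
	/* ... drop the dentry/inode here ... */
	err = pthread_create(worker, NULL, erase_worker, w);
	pthread_mutex_unlock(&inode_lock);

	if (err) {
		free(w);
		return -1;
	}
	return 0;
}

/* Mount path keeps the read_mutex -> inode_lock order. */
static void *mount_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&read_mutex);
	pthread_mutex_lock(&inode_lock);
	/* ... create one file per record ... */
	pthread_mutex_unlock(&inode_lock);
	pthread_mutex_unlock(&read_mutex);
	return NULL;
}

/* Runs mount and unlink concurrently; returns the number of erased records. */
int run_demo(void)
{
	pthread_t mounter, worker;
	int count, rc;

	if (pthread_create(&mounter, NULL, mount_thread, NULL) != 0)
		return -1;
	if (unlink_record(42, &worker) != 0) {
		pthread_join(mounter, NULL);
		return -1;
	}

	/* Analogous to flushing the work before relying on the erase. */
	rc = pthread_join(worker, NULL);
	assert(rc == 0);
	pthread_join(mounter, NULL);

	pthread_mutex_lock(&read_mutex);
	count = erased_count;
	pthread_mutex_unlock(&read_mutex);
	return count;
}
```

The explicit wait for the worker matters in this sketch. Without it, a caller could see the file gone while the erase has not run yet, which is the usual trade-off of deferring work asynchronously.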