Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail-qg0-f47.google.com ([209.85.192.47]:50626 "EHLO
	mail-qg0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751438AbaHKNEU (ORCPT );
	Mon, 11 Aug 2014 09:04:20 -0400
Received: by mail-qg0-f47.google.com with SMTP id i50so8348097qgf.34
	for ; Mon, 11 Aug 2014 06:04:20 -0700 (PDT)
From: Jeff Layton
Date: Mon, 11 Aug 2014 09:04:17 -0400
To: Christoph Hellwig
Cc: trond.myklebust@primarydata.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 2/3] nfs4: queue free_lock_state job submission to nfsiod
Message-ID: <20140811090417.0287b32b@tlielax.poochiereds.net>
In-Reply-To: <20140811075013.77e0a29e@tlielax.poochiereds.net>
References: <1398940127-31150-1-git-send-email-jlayton@poochiereds.net>
	<1398940127-31150-3-git-send-email-jlayton@poochiereds.net>
	<20140811104253.GA4747@infradead.org>
	<20140811075013.77e0a29e@tlielax.poochiereds.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Mon, 11 Aug 2014 07:50:13 -0400
Jeff Layton wrote:

> On Mon, 11 Aug 2014 03:42:53 -0700
> Christoph Hellwig wrote:
>
> > This seems to introduce a fairly easily reproducible oops, which I can
> > reproduce when running xfstests against a Linux NFSv4.1 server:
> >
> > generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP
> > [ 2187.042899] Modules linked in:
> > [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
> > [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [ 2187.044287] Workqueue: nfsiod free_lock_state_work
> > [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
> > [ 2187.044287] RIP: 0010:[] [] free_lock_state_work+0x16/0x30
> > [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296
> > [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
> > [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
> > [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
> > [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
> > [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
> > [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> > [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
> > [ 2187.044287] Stack:
> > [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
> > [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
> > [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
> > [ 2187.044287] Call Trace:
> > [ 2187.044287] [] process_one_work+0x1c7/0x490
> > [ 2187.044287] [] ? process_one_work+0x15d/0x490
> > [ 2187.044287] [] worker_thread+0x119/0x4f0
> > [ 2187.044287] [] ? trace_hardirqs_on+0xd/0x10
> > [ 2187.044287] [] ? init_pwq+0x190/0x190
> > [ 2187.044287] [] kthread+0xdf/0x100
> > [ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
> > [ 2187.044287] [] ret_from_fork+0x7c/0xb0
> > [ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
> > [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
> > [ 2187.044287] RIP [] free_lock_state_work+0x16/0x30
> > [ 2187.044287] RSP
> > [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---
> >
> > seems like this happens when trying to dereference the state owner in
> > the work queue:
> >
> > (gdb) l *(free_lock_state_work+0x16)
> > 0xffffffff81361ca6 is in free_lock_state_work (fs/nfs/nfs4state.c:808).
> > 803	free_lock_state_work(struct work_struct *work)
> > 804	{
> > 805		struct nfs4_lock_state *lsp = container_of(work,
> > 806					struct nfs4_lock_state, ls_release);
> > 807		struct nfs4_state *state = lsp->ls_state;
> > 808		struct nfs_server *server = state->owner->so_server;
> > 809		struct nfs_client *clp = server->nfs_client;
> > 810
> > 811		clp->cl_mvops->free_lock_state(server, lsp);
> > 812	}
> >
>
> Thanks for the report, Christoph.
>
> I suspect what's happening is that the lock state is outliving the open
> state that it's associated with. That does look to be possible now that
> we're queueing ->free_lock_state to a workqueue.
>
> It looks like the simplest fix would be to have a lock state hold a
> reference to the open state, but that could be problematic too. The
> open stateid holds an inode reference, but not necessarily one to the
> superblock. I think you could end up closing a file and unmounting
> before the work runs.
>
> Mostly we just need the server pointer, which implies that we need to
> pin the superblock somehow until the workqueue job is done. Should we
> just add a sb pointer to nfs4_lock_state, call nfs_sb_active before
> queueing the work, and then call nfs_sb_deactive on completion?
>

Christoph,

Does this patch fix it? Note that I haven't yet tested it, but it does
build. I'll see if I can reproduce this later today and test it out.
I'm also not sure whether it's really the _right_ solution, but I
believe it should fix the problem.
----------------------[snip]------------------------

[PATCH] nfs: pin superblock when running free_lock_state operations

Christoph reported seeing this oops when running xfstests on a v4.1
client:

generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[] [] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287] [] process_one_work+0x1c7/0x490
[ 2187.044287] [] ? process_one_work+0x15d/0x490
[ 2187.044287] [] worker_thread+0x119/0x4f0
[ 2187.044287] [] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287] [] ? init_pwq+0x190/0x190
[ 2187.044287] [] kthread+0xdf/0x100
[ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] [] ret_from_fork+0x7c/0xb0
[ 2187.044287] [] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP [] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

It appears that the lock state outlived the open state with which it
was associated. We don't actually need to get to the open state -- it
was just a convenient way to get to the server pointer. We do, however,
need to ensure that the nfs_server doesn't go away until the workqueue
job has run.

Fix this by pinning down the sb via nfs_sb_active prior to queueing the
workqueue job, and then calling nfs_sb_deactive after the
free_lock_state operation has completed. Once the rpc_task is queued,
we no longer need to hold down the sb reference.
Reported-by: Christoph Hellwig
Signed-off-by: Jeff Layton
---
 fs/nfs/nfs4_fs.h   |  1 +
 fs/nfs/nfs4state.c | 11 +++++++----
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 92193eddb41d..6eee1fd6b4f1 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -135,6 +135,7 @@ struct nfs4_lock_state {
 #define NFS_LOCK_INITIALIZED 0
 #define NFS_LOCK_LOST 1
 	unsigned long		ls_flags;
+	struct super_block	*ls_sb;
 	struct nfs_seqid_counter	ls_seqid;
 	nfs4_stateid		ls_stateid;
 	atomic_t		ls_count;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a770c8e469a7..0a5da4bedbb2 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -804,11 +804,12 @@ free_lock_state_work(struct work_struct *work)
 {
 	struct nfs4_lock_state *lsp = container_of(work,
 					struct nfs4_lock_state, ls_release);
-	struct nfs4_state *state = lsp->ls_state;
-	struct nfs_server *server = state->owner->so_server;
+	struct super_block *sb = lsp->ls_sb;
+	struct nfs_server *server = NFS_SB(sb);
 	struct nfs_client *clp = server->nfs_client;
 
 	clp->cl_mvops->free_lock_state(server, lsp);
+	nfs_sb_deactive(sb);
 }
 
 /*
@@ -896,9 +897,11 @@ void nfs4_put_lock_state(struct nfs4_lock_state *lsp)
 	if (list_empty(&state->lock_states))
 		clear_bit(LK_STATE_IN_USE, &state->flags);
 	spin_unlock(&state->state_lock);
-	if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags))
+	if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
+		lsp->ls_sb = lsp->ls_state->inode->i_sb;
+		nfs_sb_active(lsp->ls_sb);
 		queue_work(nfsiod_workqueue, &lsp->ls_release);
-	else {
+	} else {
 		server = state->owner->so_server;
 		nfs4_free_lock_state(server, lsp);
 	}
-- 
1.9.3

-- 
Jeff Layton