2011-03-28 01:56:17

by Ryan Mallon

[permalink] [raw]
Subject: [RFC PATCH 0/2] Introduce unlocked version of igrab

Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
inode->i_state with inode->i_lock" introduces a change to igrab to acquire
inode->i_lock.

This change causes a panic on boot on my ARM EP93xx board when the rootfs
uses NFS. The problem occurs because nfs_inode_add_request acquires
inode->i_lock and then calls igrab, resulting in the following panic:

BUG: spinlock recursion on CPU#0, getty/262
lock: cc421cb4, .magic: dead4ead, .owner: getty/262, .owner_cpu: 0
[<c0031b0c>] (unwind_backtrace+0x0/0xe4) from [<c015f16c>] (do_raw_spin_lock+0x40/0x13c)
[<c015f16c>] (do_raw_spin_lock+0x40/0x13c) from [<c00a938c>] (igrab+0x14/0x48)
[<c00a938c>] (igrab+0x14/0x48) from [<c01186bc>] (nfs_updatepage+0x2e0/0x524)
[<c01186bc>] (nfs_updatepage+0x2e0/0x524) from [<c010b19c>] (nfs_write_end+0x23c/0x270)
[<c010b19c>] (nfs_write_end+0x23c/0x270) from [<c006b484>] (generic_file_buffered_write+0x180/0x248)
[<c006b484>] (generic_file_buffered_write+0x180/0x248) from [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4)
[<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4) from [<c006d108>] (generic_file_aio_write+0x6c/0xdc)
[<c006d108>] (generic_file_aio_write+0x6c/0xdc) from [<c010bce0>] (nfs_file_write+0xec/0x178)
[<c010bce0>] (nfs_file_write+0xec/0x178) from [<c00956ac>] (do_sync_write+0xa4/0xe4)
[<c00956ac>] (do_sync_write+0xa4/0xe4) from [<c00960c8>] (vfs_write+0xb4/0x12c)
[<c00960c8>] (vfs_write+0xb4/0x12c) from [<c00961f0>] (sys_write+0x3c/0x68)
[<c00961f0>] (sys_write+0x3c/0x68) from [<c002c8e0>] (ret_fast_syscall+0x0/0x2c)

This series introduces a new function called __igrab, which is an unlocked
version of igrab and modifies nfs_inode_add_request to use the unlocked
version.

Ryan Mallon (2):
Add unlocked version of igrab.
Use __igrab instead of igrab in nfs_inode_add_request

fs/inode.c | 16 ++++++++++++----
fs/nfs/write.c | 2 +-
include/linux/fs.h | 1 +
3 files changed, 14 insertions(+), 5 deletions(-)


2011-03-28 01:55:48

by Ryan Mallon

[permalink] [raw]
Subject: [RFC PATCH 2/2] Use __igrab instead of igrab in nfs_inode_add_request

nfs_inode_add_request already holds inode->i_lock, so call the unlocked
__igrab instead of igrab.

Signed-off-by: Ryan Mallon <[email protected]>
---
fs/nfs/write.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 85d7525..b161642 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -390,7 +390,7 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
error = radix_tree_insert(&nfsi->nfs_page_tree, req->wb_index, req);
BUG_ON(error);
if (!nfsi->npages) {
- igrab(inode);
+ __igrab(inode);
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}
--
1.7.0.4

2011-03-28 01:56:14

by Ryan Mallon

[permalink] [raw]
Subject: [RFC PATCH 1/2] Add unlocked version of igrab.

Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
inode->i_state with inode->i_lock" changes igrab to acquire inode->i_lock,
however some callees, notably nfs_inode_add_request, already hold the lock
when calling igrab.

Introduce an unlocked version of igrab called __igrab which can be
called when inode->i_lock is already held.

Signed-off-by: Ryan Mallon <[email protected]>
---
fs/inode.c | 16 ++++++++++++----
include/linux/fs.h | 1 +
2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 05a1f75..bdcfbba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1134,14 +1134,11 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
}
EXPORT_SYMBOL(iunique);

-struct inode *igrab(struct inode *inode)
+struct inode *__igrab(struct inode *inode)
{
- spin_lock(&inode->i_lock);
if (!(inode->i_state & (I_FREEING|I_WILL_FREE))) {
__iget(inode);
- spin_unlock(&inode->i_lock);
} else {
- spin_unlock(&inode->i_lock);
/*
* Handle the case where s_op->clear_inode is not been
* called yet, and somebody is calling igrab
@@ -1149,6 +1146,17 @@ struct inode *igrab(struct inode *inode)
*/
inode = NULL;
}
+
+ return inode;
+}
+EXPORT_SYMBOL(__igrab);
+
+struct inode *igrab(struct inode *inode)
+{
+ spin_lock(&inode->i_lock);
+ inode = __igrab(inode);
+ spin_unlock(&inode->i_lock);
+
return inode;
}
EXPORT_SYMBOL(igrab);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b677bd7..32c4bc5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2240,6 +2240,7 @@ extern void inode_init_once(struct inode *);
extern void address_space_init_once(struct address_space *mapping);
extern void ihold(struct inode * inode);
extern void iput(struct inode *);
+extern struct inode *__igrab(struct inode *inode);
extern struct inode * igrab(struct inode *);
extern ino_t iunique(struct super_block *, ino_t);
extern int inode_needs_sync(struct inode *inode);
--
1.7.0.4

2011-03-28 02:54:26

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] Add unlocked version of igrab.

On Mon, Mar 28, 2011 at 02:56:00PM +1300, Ryan Mallon wrote:
> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
> inode->i_state with inode->i_lock" changes igrab to acquire inode->i_lock,
> however some callees, notably nfs_inode_add_request, already hold the lock
> when calling igrab.

I think a better solution to your problem is to notice that this is
called in the context of doing a write to an inode. That means we
must already have a reference count on this inode, so it can't possibly
be in I_FREEING or I_WILL_FREE. That means we can just call __iget()
instead ... except that __iget isn't exported to modules.

So we could just bump the refcount by hand ... or we could
EXPORT_SYMBOL(__iget).


Or Al's going to swear a lot about what NFS is doing here and how it's
utterly misdesigned. Anyway, this patch could be part of the solution.


diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 85d7525..330cef3 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -390,7 +390,7 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
error = radix_tree_insert(&nfsi->nfs_page_tree, req->wb_index, req);
BUG_ON(error);
if (!nfsi->npages) {
- igrab(inode);
+ __iget(inode);
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2011-03-28 04:38:57

by Ryan Mallon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] Add unlocked version of igrab.

On 03/28/2011 03:54 PM, Matthew Wilcox wrote:
> On Mon, Mar 28, 2011 at 02:56:00PM +1300, Ryan Mallon wrote:
>> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
>> inode->i_state with inode->i_lock" changes igrab to acquire inode->i_lock,
>> however some callees, notably nfs_inode_add_request, already hold the lock
>> when calling igrab.
>
> I think a better solution to your problem is to notice that this is
> called in the context of doing a write to an inode. That means we
> must already have a reference count on this inode, so it can't possibly
> be in I_FREEING or I_WILL_FREE. That means we can just call __iget()
> instead ... except that __iget isn't exported to modules.

Ah, okay. Thanks for the hint.

A few other locations that I can see that call igrab with inode->i_lock
held are:

fs/ceph/snap.c::ceph_queue_cap_snap
fs/ceph/addr.c::ceph_set_page_dirty
fs/nfs/nfs4state.c::nfs4_get_open_state

There may be some more cases where the locking is less obvious. I don't
know enough about the filesystem code to say whether each of those can
skip the (I_FREEING | I_WILL_FREE) check, or whether the correct
approach is to modify the filesystems themselves so that they do not
hold i_lock when calling igrab (i.e. rework to use a different outer lock)?

If the correct approach is to use __iget or __igrab then I can prepare a
patch for this. In the case of __iget, should it just be marked
EXPORT_SYMBOL and added to include/linux/fs.h?

~Ryan

--
Bluewater Systems Ltd - ARM Technology Solution Centre

Ryan Mallon 5 Amuri Park, 404 Barbadoes St
[email protected] PO Box 13 889, Christchurch 8013
http://www.bluewatersys.com New Zealand
Phone: +64 3 3779127 Freecall: Australia 1800 148 751
Fax: +64 3 3779135 USA 1800 261 2934

2011-03-28 04:43:52

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce unlocked version of igrab

On Mon, Mar 28, 2011 at 02:55:59PM +1300, Ryan Mallon wrote:
> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
> inode->i_state with inode->i_lock" introduces a change to igrab to acquire
> inode->i_lock.
>
> This change causes a panic on boot on my ARM EP93xx board when the rootfs
> uses NFS. The problem occurs because nfs_inode_add_request acquires
> inode->i_lock and then calls igrab, resulting in the following panic:
>
> BUG: spinlock recursion on CPU#0, getty/262
> lock: cc421cb4, .magic: dead4ead, .owner: getty/262, .owner_cpu: 0
> [<c0031b0c>] (unwind_backtrace+0x0/0xe4) from [<c015f16c>] (do_raw_spin_lock+0x40/0x13c)
> [<c015f16c>] (do_raw_spin_lock+0x40/0x13c) from [<c00a938c>] (igrab+0x14/0x48)
> [<c00a938c>] (igrab+0x14/0x48) from [<c01186bc>] (nfs_updatepage+0x2e0/0x524)
> [<c01186bc>] (nfs_updatepage+0x2e0/0x524) from [<c010b19c>] (nfs_write_end+0x23c/0x270)
> [<c010b19c>] (nfs_write_end+0x23c/0x270) from [<c006b484>] (generic_file_buffered_write+0x180/0x248)
> [<c006b484>] (generic_file_buffered_write+0x180/0x248) from [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4)
> [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4) from [<c006d108>] (generic_file_aio_write+0x6c/0xdc)
> [<c006d108>] (generic_file_aio_write+0x6c/0xdc) from [<c010bce0>] (nfs_file_write+0xec/0x178)
> [<c010bce0>] (nfs_file_write+0xec/0x178) from [<c00956ac>] (do_sync_write+0xa4/0xe4)
> [<c00956ac>] (do_sync_write+0xa4/0xe4) from [<c00960c8>] (vfs_write+0xb4/0x12c)
> [<c00960c8>] (vfs_write+0xb4/0x12c) from [<c00961f0>] (sys_write+0x3c/0x68)
> [<c00961f0>] (sys_write+0x3c/0x68) from [<c002c8e0>] (ret_fast_syscall+0x0/0x2c)
>
> This series introduces a new function called __igrab, which is an unlocked
> version of igrab and modifies nfs_inode_add_request to use the unlocked
> version.

It's called ihold() and already exists.

As it is, I thought I posted a igrab->ihold fix to lkml a couple of
days ago when this first came up. Trond then posted a better fix by
removingthe igrab() altoegther in the same thread.

Turns out, someone removed LKML from the cc list on the thread half
way through so neither of those patches made it to lkml.

Trond, can you push your fix for this (NFS: Fix a hang in the
writeback path) to Linus?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-03-28 04:46:45

by Ryan Mallon

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce unlocked version of igrab

On 03/28/2011 05:43 PM, Dave Chinner wrote:
> On Mon, Mar 28, 2011 at 02:55:59PM +1300, Ryan Mallon wrote:
>> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
>> inode->i_state with inode->i_lock" introduces a change to igrab to acquire
>> inode->i_lock.
>>
>> This change causes a panic on boot on my ARM EP93xx board when the rootfs
>> uses NFS. The problem occurs because nfs_inode_add_request acquires
>> inode->i_lock and then calls igrab, resulting in the following panic:
>>
>> BUG: spinlock recursion on CPU#0, getty/262
>> lock: cc421cb4, .magic: dead4ead, .owner: getty/262, .owner_cpu: 0
>> [<c0031b0c>] (unwind_backtrace+0x0/0xe4) from [<c015f16c>] (do_raw_spin_lock+0x40/0x13c)
>> [<c015f16c>] (do_raw_spin_lock+0x40/0x13c) from [<c00a938c>] (igrab+0x14/0x48)
>> [<c00a938c>] (igrab+0x14/0x48) from [<c01186bc>] (nfs_updatepage+0x2e0/0x524)
>> [<c01186bc>] (nfs_updatepage+0x2e0/0x524) from [<c010b19c>] (nfs_write_end+0x23c/0x270)
>> [<c010b19c>] (nfs_write_end+0x23c/0x270) from [<c006b484>] (generic_file_buffered_write+0x180/0x248)
>> [<c006b484>] (generic_file_buffered_write+0x180/0x248) from [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4)
>> [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4) from [<c006d108>] (generic_file_aio_write+0x6c/0xdc)
>> [<c006d108>] (generic_file_aio_write+0x6c/0xdc) from [<c010bce0>] (nfs_file_write+0xec/0x178)
>> [<c010bce0>] (nfs_file_write+0xec/0x178) from [<c00956ac>] (do_sync_write+0xa4/0xe4)
>> [<c00956ac>] (do_sync_write+0xa4/0xe4) from [<c00960c8>] (vfs_write+0xb4/0x12c)
>> [<c00960c8>] (vfs_write+0xb4/0x12c) from [<c00961f0>] (sys_write+0x3c/0x68)
>> [<c00961f0>] (sys_write+0x3c/0x68) from [<c002c8e0>] (ret_fast_syscall+0x0/0x2c)
>>
>> This series introduces a new function called __igrab, which is an unlocked
>> version of igrab and modifies nfs_inode_add_request to use the unlocked
>> version.
>
> It's called ihold() and already exists.

Thanks. Missed that one.

Is ihold the correct replacement for the fs/ceph cases I mentioned in my
other email?

~Ryan

--
Bluewater Systems Ltd - ARM Technology Solution Centre

Ryan Mallon 5 Amuri Park, 404 Barbadoes St
[email protected] PO Box 13 889, Christchurch 8013
http://www.bluewatersys.com New Zealand
Phone: +64 3 3779127 Freecall: Australia 1800 148 751
Fax: +64 3 3779135 USA 1800 261 2934

2011-03-28 05:03:41

by Ryan Mallon

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce unlocked version of igrab

On 03/28/2011 05:47 PM, Ryan Mallon wrote:
> On 03/28/2011 05:43 PM, Dave Chinner wrote:
>> On Mon, Mar 28, 2011 at 02:55:59PM +1300, Ryan Mallon wrote:
>>> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
>>> inode->i_state with inode->i_lock" introduces a change to igrab to acquire
>>> inode->i_lock.
>>>
>>> This change causes a panic on boot on my ARM EP93xx board when the rootfs
>>> uses NFS. The problem occurs because nfs_inode_add_request acquires
>>> inode->i_lock and then calls igrab, resulting in the following panic:
>>>
>>> BUG: spinlock recursion on CPU#0, getty/262
>>> lock: cc421cb4, .magic: dead4ead, .owner: getty/262, .owner_cpu: 0
>>> [<c0031b0c>] (unwind_backtrace+0x0/0xe4) from [<c015f16c>] (do_raw_spin_lock+0x40/0x13c)
>>> [<c015f16c>] (do_raw_spin_lock+0x40/0x13c) from [<c00a938c>] (igrab+0x14/0x48)
>>> [<c00a938c>] (igrab+0x14/0x48) from [<c01186bc>] (nfs_updatepage+0x2e0/0x524)
>>> [<c01186bc>] (nfs_updatepage+0x2e0/0x524) from [<c010b19c>] (nfs_write_end+0x23c/0x270)
>>> [<c010b19c>] (nfs_write_end+0x23c/0x270) from [<c006b484>] (generic_file_buffered_write+0x180/0x248)
>>> [<c006b484>] (generic_file_buffered_write+0x180/0x248) from [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4)
>>> [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4) from [<c006d108>] (generic_file_aio_write+0x6c/0xdc)
>>> [<c006d108>] (generic_file_aio_write+0x6c/0xdc) from [<c010bce0>] (nfs_file_write+0xec/0x178)
>>> [<c010bce0>] (nfs_file_write+0xec/0x178) from [<c00956ac>] (do_sync_write+0xa4/0xe4)
>>> [<c00956ac>] (do_sync_write+0xa4/0xe4) from [<c00960c8>] (vfs_write+0xb4/0x12c)
>>> [<c00960c8>] (vfs_write+0xb4/0x12c) from [<c00961f0>] (sys_write+0x3c/0x68)
>>> [<c00961f0>] (sys_write+0x3c/0x68) from [<c002c8e0>] (ret_fast_syscall+0x0/0x2c)
>>>
>>> This series introduces a new function called __igrab, which is an unlocked
>>> version of igrab and modifies nfs_inode_add_request to use the unlocked
>>> version.
>> It's called ihold() and already exists.
> Thanks. Missed that one.
>
> Is ihold the correct replacement for the fs/ceph cases I mentioned in my
> other email?
>
> ~Ryan
>

i.e. this:
---

fs/ceph: Use ihold instead of igrab when i_lock is already held

Signed-off-by: Ryan Mallon <[email protected]>
---
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 561438b..37368ba 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -92,7 +92,7 @@ static int ceph_set_page_dirty(struct page *page)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
if (ci->i_wrbuffer_ref == 0)
- igrab(inode);
+ ihold(inode);
++ci->i_wrbuffer_ref;
dout("%p set_page_dirty %p idx %lu head %d/%d -> %d/%d "
"snapc %p seq %lld (%d snaps)\n",
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index f40b913..03afa8e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -463,7 +463,7 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci)

dout("queue_cap_snap %p cap_snap %p queuing under %p\n", inode,
capsnap, snapc);
- igrab(inode);
+ ihold(inode);

atomic_set(&capsnap->nref, 1);
capsnap->ci = ci;

2011-03-28 05:19:36

by Dave Chinner

[permalink] [raw]
Subject: [PATCH] fs: don't use igrab() while holding i_lock (was Re: [RFC PATCH 1/2] Add unlocked version of igrab.)

On Mon, Mar 28, 2011 at 05:39:13PM +1300, Ryan Mallon wrote:
> On 03/28/2011 03:54 PM, Matthew Wilcox wrote:
> > On Mon, Mar 28, 2011 at 02:56:00PM +1300, Ryan Mallon wrote:
> >> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
> >> inode->i_state with inode->i_lock" changes igrab to acquire inode->i_lock,
> >> however some callees, notably nfs_inode_add_request, already hold the lock
> >> when calling igrab.
> >
> > I think a better solution to your problem is to notice that this is
> > called in the context of doing a write to an inode. That means we
> > must already have a reference count on this inode, so it can't possibly
> > be in I_FREEING or I_WILL_FREE. That means we can just call __iget()
> > instead ... except that __iget isn't exported to modules.
>
> Ah, okay. Thanks for the hint.
>
> A few other locations that I can see that call igrab with inode->i_lock
> held are:
>
> fs/ceph/snap.c::ceph_queue_cap_snap
> fs/ceph/addr.c::ceph_set_page_dirty

I don't know how I missed these uses when auditing Nick's code - we
caught the use of the dcache_lock inside i_lock and got that fixed,
but missed these ones.

> fs/nfs/nfs4state.c::nfs4_get_open_state

I know I fixed this one once, along with the first NFS issue you
tripped over. Somehow I lost them along the way.

> There may be some more cases where the locking is less obvious. I don't
> know enough about the filesystem code to say whether each of those can
> skip the (I_FREEING | I_WILL_FREE) check, or whether the correct
> approach is to modify the filesystems themselves so that they do not
> hold i_lock when calling igrab (i.e. rework to use a different outer lock)?
>
> If the correct approach is to use __iget or __igrab then I can prepare a
> patch for this. In the case of __iget, should it just be marked
> EXPORT_SYMBOL and added to include/linux/fs.h?

All of them should simply be a conversion from igrab() to ihold(),
which is already exported. Patch below for all 4 you've reported.

Cheers,

Dave.
--
Dave Chinner
[email protected]

fs: don't use igrab() while holding i_lock

From: Dave Chinner <[email protected]>

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock.

Reviewed-by: Dave Chinner <[email protected]>
---
fs/ceph/addr.c | 2 +-
fs/ceph/snap.c | 4 ++--
fs/nfs/nfs4state.c | 2 +-
fs/nfs/write.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 561438b..37368ba 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -92,7 +92,7 @@ static int ceph_set_page_dirty(struct page *page)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
if (ci->i_wrbuffer_ref == 0)
- igrab(inode);
+ ihold(inode);
++ci->i_wrbuffer_ref;
dout("%p set_page_dirty %p idx %lu head %d/%d -> %d/%d "
"snapc %p seq %lld (%d snaps)\n",
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index f40b913..0aee66b 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -463,8 +463,8 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci)

dout("queue_cap_snap %p cap_snap %p queuing under %p\n", inode,
capsnap, snapc);
- igrab(inode);
-
+ ihold(inode);
+
atomic_set(&capsnap->nref, 1);
capsnap->ci = ci;
INIT_LIST_HEAD(&capsnap->ci_item);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index ab1bf5b..da6e895 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -590,7 +590,7 @@ nfs4_get_open_state(struct inode *inode, struct nfs4_state_owner *owner)
state->owner = owner;
atomic_inc(&owner->so_count);
list_add(&state->inode_states, &nfsi->open_states);
- state->inode = igrab(inode);
+ state->inode = ihold(inode);
spin_unlock(&inode->i_lock);
/* Note: The reclaim code dictates that we add stateless
* and read-only stateids to the end of the list */
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 85d7525..3236951 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -390,7 +390,7 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
error = radix_tree_insert(&nfsi->nfs_page_tree, req->wb_index, req);
BUG_ON(error);
if (!nfsi->npages) {
- igrab(inode);
+ ihold(inode);
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}

2011-03-28 06:02:07

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Introduce unlocked version of igrab

On Mon, Mar 28, 2011 at 06:03:51PM +1300, Ryan Mallon wrote:
> On 03/28/2011 05:47 PM, Ryan Mallon wrote:
> > On 03/28/2011 05:43 PM, Dave Chinner wrote:
> >> On Mon, Mar 28, 2011 at 02:55:59PM +1300, Ryan Mallon wrote:
> >>> Commit 250df6ed274d767da844a5d9f05720b804240197 "fs: protect
> >>> inode->i_state with inode->i_lock" introduces a change to igrab to acquire
> >>> inode->i_lock.
> >>>
> >>> This change causes a panic on boot on my ARM EP93xx board when the rootfs
> >>> uses NFS. The problem occurs because nfs_inode_add_request acquires
> >>> inode->i_lock and then calls igrab, resulting in the following panic:
> >>>
> >>> BUG: spinlock recursion on CPU#0, getty/262
> >>> lock: cc421cb4, .magic: dead4ead, .owner: getty/262, .owner_cpu: 0
> >>> [<c0031b0c>] (unwind_backtrace+0x0/0xe4) from [<c015f16c>] (do_raw_spin_lock+0x40/0x13c)
> >>> [<c015f16c>] (do_raw_spin_lock+0x40/0x13c) from [<c00a938c>] (igrab+0x14/0x48)
> >>> [<c00a938c>] (igrab+0x14/0x48) from [<c01186bc>] (nfs_updatepage+0x2e0/0x524)
> >>> [<c01186bc>] (nfs_updatepage+0x2e0/0x524) from [<c010b19c>] (nfs_write_end+0x23c/0x270)
> >>> [<c010b19c>] (nfs_write_end+0x23c/0x270) from [<c006b484>] (generic_file_buffered_write+0x180/0x248)
> >>> [<c006b484>] (generic_file_buffered_write+0x180/0x248) from [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4)
> >>> [<c006d060>] (__generic_file_aio_write+0x3b8/0x3f4) from [<c006d108>] (generic_file_aio_write+0x6c/0xdc)
> >>> [<c006d108>] (generic_file_aio_write+0x6c/0xdc) from [<c010bce0>] (nfs_file_write+0xec/0x178)
> >>> [<c010bce0>] (nfs_file_write+0xec/0x178) from [<c00956ac>] (do_sync_write+0xa4/0xe4)
> >>> [<c00956ac>] (do_sync_write+0xa4/0xe4) from [<c00960c8>] (vfs_write+0xb4/0x12c)
> >>> [<c00960c8>] (vfs_write+0xb4/0x12c) from [<c00961f0>] (sys_write+0x3c/0x68)
> >>> [<c00961f0>] (sys_write+0x3c/0x68) from [<c002c8e0>] (ret_fast_syscall+0x0/0x2c)
> >>>
> >>> This series introduces a new function called __igrab, which is an unlocked
> >>> version of igrab and modifies nfs_inode_add_request to use the unlocked
> >>> version.
> >> It's called ihold() and already exists.
> > Thanks. Missed that one.
> >
> > Is ihold the correct replacement for the fs/ceph cases I mentioned in my
> > other email?
> >
> > ~Ryan
> >
>
> i.e. this:
> ---
>
> fs/ceph: Use ihold instead of igrab when i_lock is already held
>
> Signed-off-by: Ryan Mallon <[email protected]>

In essence, yes, though the NFS case (nfs4state.c) also needs the
same treatment. I posted a patch that fixes all the cases you
reported so you can continue to work without needing Tronnnd's
bigger fix for the initial problem.

Cheers,

Dave.
--
Dave Chinner
[email protected]