2005-07-06 21:25:10

by Nick Wilson

[permalink] [raw]
Subject: [PATCH] NFS: fix client hang due to race condition

The flags field in struct nfs_inode is protected by the BKL. The
following two code paths (there may be more, but my test program only
hits these two) modify the flags without obtaining the lock:

nfs_end_data_update
nfs_release
nfs_file_release
__fput
fput
filp_close
sys_close
syscall_call

nfs_revalidate_mapping
nfs_file_write
do_sync_write
vfs_write
sys_write
syscall_call

Running multiple instances of a simple program [1] that opens, writes
to, and closes NFS mounted files eventually results in the programs
hanging on an SMP system (see kernel .config [3]).

I've been testing this with 100 instances of the program:
$ ./breaknfs 100 &

Usually within 10 minutes, all instances of breaknfs will hang. They
disappear from the output of 'top' and there is no NFS activity between
the client and server.

/proc/*/wchan shows 22 instances of breaknfs are waiting on
nfs_wait_on_inode, and 78 on .text.lock.namei

echo t > /proc/sysrq-trigger output [2] shows 22 instances of breaknfs
similar to this...:
breaknfs S 00100100 5060 5530 5523 5531 5529 (NOTLB)
de0d1e24 00000086 c01178e0 00100100 de0d1de4 00000000 00000000 de0d1e14
de0d1dec c0309513 de0d1e0c c0127c7e 00000000 dfaff020 c140e400 000004c3
b37f50b5 0000003a c140e8c0 de7815b0 de7816d8 dbb5963c dbb59650 de0d0000
Call Trace:
[<c01eac01>] nfs_wait_on_inode+0x1b1/0x1c0
[<c01eb2ac>] __nfs_revalidate_inode+0x2cc/0x340
[<c01e8b1c>] nfs_file_flush+0x8c/0xc0
[<c0159366>] filp_close+0x56/0x70
[<c01593e9>] sys_close+0x69/0x90
[<c0103039>] syscall_call+0x7/0xb

... and 78 similar to this:
breaknfs D 00000310 5060 5523 5466 5524 (NOTLB)
ddcafebc 00000082 c0369810 00000310 ddcaff58 ddcafe90 db975690 00000000
ddcafee0 ddcafe94 c0170a75 ddcaff58 00000000 dfaff020 c140e400 00000178
b2b3096d 0000003a c140e8c0 df839550 df839678 dbb59e70 dbb59e78 00000286
Call Trace:
[<c03075b3>] __down+0x83/0xe0
[<c030772e>] __down_failed+0xa/0x10
[<c016b295>] .text.lock.namei+0xaa/0x1e5
[<c0158e5d>] filp_open+0x2d/0x50
[<c01592ad>] sys_open+0x4d/0x80
[<c0103039>] syscall_call+0x7/0xb

NFS mount options from /proc/mounts:
rw,v3,rsize=32768,wsize=32768,hard,intr,udp,lock,addr=njw

I've reproduced this bug on 2.6.11.10, 2.6.12-mm2, and 2.6.13-rc2.

With my patch against 2.6.13-rc2 below, I ran 100 instances of breaknfs
with this patch for 14 hours and I was unable to get the client to hang.

Thanks,

Nick Wilson

[1] http://developer.osdl.org/njw/nfs-bug/breaknfs.c
[2] http://developer.osdl.org/njw/nfs-bug/alt-sysrq-t.txt
[3] http://developer.osdl.org/njw/nfs-bug/kernel-config



The flags field in struct nfs_inode is protected by the BKL. This patch
fixes a couple places where the lock is not obtained before changing the
flags.

Signed-off-by: Nick Wilson <[email protected]>
---

inode.c | 4 ++++
1 files changed, 4 insertions(+)

--- linux.orig/fs/nfs/inode.c 2005-07-06 11:08:27.000000000 -0700
+++ linux/fs/nfs/inode.c 2005-07-06 11:20:19.000000000 -0700
@@ -1118,7 +1118,9 @@ void nfs_revalidate_mapping(struct inode
nfs_wb_all(inode);
}
invalidate_inode_pages2(mapping);
+ lock_kernel();
nfsi->flags &= ~NFS_INO_INVALID_DATA;
+ unlock_kernel();
if (S_ISDIR(inode->i_mode)) {
memset(nfsi->cookieverf, 0, sizeof(nfsi->cookieverf));
/* This ensures we revalidate child dentries */
@@ -1153,10 +1155,12 @@ void nfs_end_data_update(struct inode *i

if (!nfs_have_delegation(inode, FMODE_READ)) {
/* Mark the attribute cache for revalidation */
+ lock_kernel();
nfsi->flags |= NFS_INO_INVALID_ATTR;
/* Directories and symlinks: invalidate page cache too */
if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
nfsi->flags |= NFS_INO_INVALID_DATA;
+ unlock_kernel();
}
nfsi->cache_change_attribute ++;
atomic_dec(&nfsi->data_updates);
_


-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-07-06 21:41:14

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] NFS: fix client hang due to race condition

Nick Wilson <[email protected]> wrote:
>
> The flags field in struct nfs_inode is protected by the BKL. This patch
> fixes a couple places where the lock is not obtained before changing the
> flags.
>

Yeah, nasty. Well caught.

> }
> invalidate_inode_pages2(mapping);
> + lock_kernel();
> nfsi->flags &= ~NFS_INO_INVALID_DATA;
> + unlock_kernel();

Adding new lock_kernel()s is a bit retro. We might want to use a per-inode
lock for this, or set_bit/clear_bit and friends.

If we choose to use a per-inode lock then it is legal to use inode.i_lock
(coz I said) as long as no locks are nested inside it. i_lock's mandate is
"a general-purpose innermost per-inode lock".

2005-07-07 02:11:35

by Lever, Charles

[permalink] [raw]
Subject: RE: [PATCH] NFS: fix client hang due to race condition

> The flags field in struct nfs_inode is protected by the BKL. The
> following two code paths (there may be more, but my test program only
> hits these two) modify the flags without obtaining the lock:
>=20
> nfs_end_data_update
> nfs_release
> nfs_file_release
> __fput
> fput
> filp_close
> sys_close
> syscall_call
>=20
> nfs_revalidate_mapping
> nfs_file_write
> do_sync_write
> vfs_write
> sys_write
> syscall_call
>=20
> Running multiple instances of a simple program [1] that opens, writes
> to, and closes NFS mounted files eventually results in the programs
> hanging on an SMP system (see kernel .config [3]).
>=20
> I've been testing this with 100 instances of the program:
> $ ./breaknfs 100 &
>=20
> Usually within 10 minutes, all instances of breaknfs will hang. They
> disappear from the output of 'top' and there is no NFS=20
> activity between
> the client and server.

[ sysrq output snipped... ]

> I've reproduced this bug on 2.6.11.10, 2.6.12-mm2, and 2.6.13-rc2.
>=20
> With my patch against 2.6.13-rc2 below, I ran 100 instances=20
> of breaknfs
> with this patch for 14 hours and I was unable to get the=20
> client to hang.

i agree this is a problem.

but instead of using heavyweight synchronization, why not convert the
NFS_INO flags into atomic bitops? i have a patch that does that; would
need to be ported to the latest kernels and tested to see if it
addresses the problem.

nick, are you interested in trying it out?


-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2005-07-07 17:19:00

by Nick Wilson

[permalink] [raw]
Subject: Re: [PATCH] NFS: fix client hang due to race condition

On Wed, Jul 06, 2005 at 07:11:25PM -0700, Lever, Charles wrote:
> > The flags field in struct nfs_inode is protected by the BKL. The
> > following two code paths (there may be more, but my test program only
> > hits these two) modify the flags without obtaining the lock:
> >
> > nfs_end_data_update
> > nfs_release
> > nfs_file_release
> > __fput
> > fput
> > filp_close
> > sys_close
> > syscall_call
> >
> > nfs_revalidate_mapping
> > nfs_file_write
> > do_sync_write
> > vfs_write
> > sys_write
> > syscall_call
> >
> > Running multiple instances of a simple program [1] that opens, writes
> > to, and closes NFS mounted files eventually results in the programs
> > hanging on an SMP system (see kernel .config [3]).
> >
> > I've been testing this with 100 instances of the program:
> > $ ./breaknfs 100 &
> >
> > Usually within 10 minutes, all instances of breaknfs will hang. They
> > disappear from the output of 'top' and there is no NFS
> > activity between
> > the client and server.
>
> [ sysrq output snipped... ]
>
> > I've reproduced this bug on 2.6.11.10, 2.6.12-mm2, and 2.6.13-rc2.
> >
> > With my patch against 2.6.13-rc2 below, I ran 100 instances
> > of breaknfs
> > with this patch for 14 hours and I was unable to get the
> > client to hang.
>
> i agree this is a problem.
>
> but instead of using heavyweight synchronization, why not convert the
> NFS_INO flags into atomic bitops? i have a patch that does that; would
> need to be ported to the latest kernels and tested to see if it
> addresses the problem.
>
> nick, are you interested in trying it out?

Sure. Send it my way and I'll see if I can get it updated to the latest
kernels and test it out.

Nick


-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs