2009-01-06 10:02:52

by Eric Sesterhenn

[permalink] [raw]
Subject: (unknown)

hi,

with todays -git i see the following lockdep warning, that wasnt there
yesterday. It occurs after booting the box.

[ 83.741022] Installing knfsd (copyright (C) 1996 [email protected]).
[ 84.115838] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
recovery directory
[ 84.133473] NFSD: starting 90-second grace period
[ 174.132715]
[ 174.132724] =============================================
[ 174.133004] [ INFO: possible recursive locking detected ]
[ 174.133133] 2.6.28 #85
[ 174.133232] ---------------------------------------------
[ 174.133428] nfsd4/3693 is trying to acquire lock:
[ 174.133548] (&type->i_mutex_dir_key#2){--..}, at: [<c01c4212>]
vfs_fsync+0x62/0xe0
[ 174.133991]
[ 174.133994] but task is already holding lock:
[ 174.134251] (&type->i_mutex_dir_key#2){--..}, at: [<d1c224f7>]
nfsd4_sync_rec_dir+0x17/0x40 [nfsd]
[ 174.134774]
[ 174.134777] other info that might help us debug this:
[ 174.134983] 4 locks held by nfsd4/3693:
[ 174.135151] #0: (nfsd4){--..}, at: [<c0139d4a>]
run_workqueue+0x7a/0x1e0
[ 174.135539] #1: ((laundromat_work).work){--..}, at: [<c0139d4a>]
run_workqueue+0x7a/0x1e0
[ 174.135968] #2: (client_mutex){--..}, at: [<d1c1e9f6>]
laundromat_main+0x26/0x2a0 [nfsd]
[ 174.136021] #3: (&type->i_mutex_dir_key#2){--..}, at: [<d1c224f7>]
nfsd4_sync_rec_dir+0x17/0x40 [nfsd]
[ 174.136021]
[ 174.136021] stack backtrace:
[ 174.136021] Pid: 3693, comm: nfsd4 Not tainted 2.6.28 #85
[ 174.136021] Call Trace:
[ 174.136021] [<c05abdb6>] ? printk+0x18/0x1a
[ 174.136021] [<c014f626>] __lock_acquire+0xe76/0x1110
[ 174.136021] [<c014f934>] lock_acquire+0x74/0xa0
[ 174.136021] [<c01c4212>] ? vfs_fsync+0x62/0xe0
[ 174.136021] [<c05acfac>] mutex_lock_nested+0x8c/0x2c0
[ 174.136021] [<c01c4212>] ? vfs_fsync+0x62/0xe0
[ 174.136021] [<c01c4212>] ? vfs_fsync+0x62/0xe0
[ 174.136021] [<c01c4212>] vfs_fsync+0x62/0xe0
[ 174.136021] [<d1c074be>] nfsd_sync_dir+0xe/0x10 [nfsd]
[ 174.136021] [<d1c22501>] nfsd4_sync_rec_dir+0x21/0x40 [nfsd]
[ 174.136021] [<d1c22595>] nfsd4_recdir_purge_old+0x75/0x80 [nfsd]
[ 174.136021] [<d1c1ea1e>] laundromat_main+0x4e/0x2a0 [nfsd]
[ 174.136021] [<c0139d4a>] ? run_workqueue+0x7a/0x1e0
[ 174.136021] [<c0139dac>] run_workqueue+0xdc/0x1e0
[ 174.136021] [<c0139d4a>] ? run_workqueue+0x7a/0x1e0
[ 174.136021] [<d1c1e9d0>] ? laundromat_main+0x0/0x2a0 [nfsd]
[ 174.136021] [<c013a1a7>] worker_thread+0x87/0xf0
[ 174.136021] [<c013dac0>] ? autoremove_wake_function+0x0/0x50
[ 174.136021] [<c013a120>] ? worker_thread+0x0/0xf0
[ 174.136021] [<c013d80a>] kthread+0x3a/0x70
[ 174.136021] [<c013d7d0>] ? kthread+0x0/0x70
[ 174.136021] [<c0103cf3>] kernel_thread_helper+0x7/0x14

Greetings, Eric


2009-01-06 17:29:17

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] nfsd: fix double-lock of i_mutex on nfsv4 recovery directory

From: J. Bruce Fields <[email protected]>

This mutex should be unnecessary, and as of 4c728ef583b3d "add a
vfs_fsync helper" it triggers a lockdep warning.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4recover.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

On Tue, Jan 06, 2009 at 11:02:47AM +0100, Eric Sesterhenn wrote:
> hi,
>
> with todays -git i see the following lockdep warning, that wasnt there
> yesterday. It occurs after booting the box.

Thanks. The much-hated recovery code strikes again. I think this
should do it?

--b.

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 74f7b67..9fb5a10 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -127,9 +127,7 @@ out_no_tfm:
static void
nfsd4_sync_rec_dir(void)
{
- mutex_lock(&rec_dir.dentry->d_inode->i_mutex);
nfsd_sync_dir(rec_dir.dentry);
- mutex_unlock(&rec_dir.dentry->d_inode->i_mutex);
}

int
--
1.5.5.rc1


2009-01-06 17:58:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: fix double-lock of i_mutex on nfsv4 recovery directory

On Tue, Jan 06, 2009 at 12:29:14PM -0500, bfields wrote:
> From: J. Bruce Fields <[email protected]>
>
> This mutex should be unnecessary, and as of 4c728ef583b3d "add a
> vfs_fsync helper" it triggers a lockdep warning.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/nfsd/nfs4recover.c | 2 --
> 1 files changed, 0 insertions(+), 2 deletions(-)
>
> On Tue, Jan 06, 2009 at 11:02:47AM +0100, Eric Sesterhenn wrote:
> > hi,
> >
> > with todays -git i see the following lockdep warning, that wasnt there
> > yesterday. It occurs after booting the box.
>
> Thanks. The much-hated recovery code strikes again. I think this
> should do it?

No, then we just run into a deadlocks in unlink, create, or any of the
other nfsd operations that want the parent lock to cover more than just
the sync. So 4c728ef583b3d just doesn't work for nfsd.

--b.

>
> --b.
>
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 74f7b67..9fb5a10 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -127,9 +127,7 @@ out_no_tfm:
> static void
> nfsd4_sync_rec_dir(void)
> {
> - mutex_lock(&rec_dir.dentry->d_inode->i_mutex);
> nfsd_sync_dir(rec_dir.dentry);
> - mutex_unlock(&rec_dir.dentry->d_inode->i_mutex);
> }
>
> int
> --
> 1.5.5.rc1
>

2009-01-06 19:23:04

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: fix double-lock of i_mutex on nfsv4 recovery directory

On Tue, Jan 06, 2009 at 12:58:29PM -0500, bfields wrote:
> No, then we just run into a deadlocks in unlink, create, or any of the
> other nfsd operations that want the parent lock to cover more than just
> the sync. So 4c728ef583b3d just doesn't work for nfsd.

We could add another nfsd exception to vfs_sync() by taking the i_mutex
only in the "file != NULL" case. Perhaps there'd be some advantage to
having nfsd's peculiarity noted in the common code; I don't have
terifically strong feelings either way.

However I'm inclined to think that at that point the special cases get
out of hand and that it would be better to keep this back in the nfsd
code itself. The following (tested this time) seems to work.

--b.

commit 33e3950dc2eae7484e79685083c304d93013e3ec
Author: J. Bruce Fields <[email protected]>
Date: Tue Jan 6 13:37:03 2009 -0500

nfsd: fix double-locks of directory mutex

A number of nfsd operations depend on the i_mutex to cover more code
than just the fsync, so the approach of 4c728ef583b3d8 "add a vfs_fsync
helper" doesn't work for nfsd. Revert the parts of those patches that
touch nfsd, and remove the logic from vfs_nfsd that was needed only for
the special case of nfsd.

Reported-by: Eric Sesterhenn <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 44aa92a..6e50aaa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -744,16 +744,44 @@ nfsd_close(struct file *filp)
fput(filp);
}

+/*
+ * Sync a file
+ * As this calls fsync (not fdatasync) there is no need for a write_inode
+ * after it.
+ */
+static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
+ const struct file_operations *fop)
+{
+ struct inode *inode = dp->d_inode;
+ int (*fsync) (struct file *, struct dentry *, int);
+ int err;
+
+ err = filemap_fdatawrite(inode->i_mapping);
+ if (err == 0 && fop && (fsync = fop->fsync))
+ err = fsync(filp, dp, 0);
+ if (err == 0)
+ err = filemap_fdatawait(inode->i_mapping);
+
+ return err;
+}
+
static int
nfsd_sync(struct file *filp)
{
- return vfs_fsync(filp, filp->f_path.dentry, 0);
+ int err;
+ struct inode *inode = filp->f_path.dentry->d_inode;
+ dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
+ mutex_lock(&inode->i_mutex);
+ err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
+ mutex_unlock(&inode->i_mutex);
+
+ return err;
}

int
-nfsd_sync_dir(struct dentry *dentry)
+nfsd_sync_dir(struct dentry *dp)
{
- return vfs_fsync(NULL, dentry, 0);
+ return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
}

/*
diff --git a/fs/sync.c b/fs/sync.c
index 0921d6d..8e0a656 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -83,10 +83,6 @@ int file_fsync(struct file *filp, struct dentry *dentry, int datasync)
*
* Write back data and metadata for @file to disk. If @datasync is
* set only metadata needed to access modified file data is written.
- *
- * In case this function is called from nfsd @file may be %NULL and
- * only @dentry is set. This can only happen when the filesystem
- * implements the export_operations API.
*/
int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
{
@@ -94,18 +90,8 @@ int vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
struct address_space *mapping;
int err, ret;

- /*
- * Get mapping and operations from the file in case we have
- * as file, or get the default values for them in case we
- * don't have a struct file available. Damn nfsd..
- */
- if (file) {
- mapping = file->f_mapping;
- fop = file->f_op;
- } else {
- mapping = dentry->d_inode->i_mapping;
- fop = dentry->d_inode->i_fop;
- }
+ mapping = file->f_mapping;
+ fop = file->f_op;

if (!fop || !fop->fsync) {
ret = -EINVAL;

2009-01-06 19:32:59

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] nfsd: fix double-lock of i_mutex on nfsv4 recovery directory

On Tue, Jan 06, 2009 at 02:23:01PM -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 12:58:29PM -0500, bfields wrote:
> > No, then we just run into a deadlocks in unlink, create, or any of the
> > other nfsd operations that want the parent lock to cover more than just
> > the sync. So 4c728ef583b3d just doesn't work for nfsd.
>
> We could add another nfsd exception to vfs_sync() by taking the i_mutex
> only in the "file != NULL" case. Perhaps there'd be some advantage to
> having nfsd's peculiarity noted in the common code; I don't have
> terifically strong feelings either way.
>
> However I'm inclined to think that at that point the special cases get
> out of hand and that it would be better to keep this back in the nfsd
> code itself. The following (tested this time) seems to work.

Sorry, this is not going to do it. i_mutex is going to move into
->fsync soon unless we kill it entirely pretty soon. And for the
cases of stckable filesystems what you do (aswell as the <= 2.6.28 case)
is broken, as the NFS locking scheme can't apply to the lower stacked
filesystem.


2009-01-06 20:07:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: fix double-lock of i_mutex on nfsv4 recovery directory

On Tue, Jan 06, 2009 at 02:32:56PM -0500, Christoph Hellwig wrote:
> On Tue, Jan 06, 2009 at 02:23:01PM -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 12:58:29PM -0500, bfields wrote:
> > > No, then we just run into a deadlocks in unlink, create, or any of the
> > > other nfsd operations that want the parent lock to cover more than just
> > > the sync. So 4c728ef583b3d just doesn't work for nfsd.
> >
> > We could add another nfsd exception to vfs_sync() by taking the i_mutex
> > only in the "file != NULL" case. Perhaps there'd be some advantage to
> > having nfsd's peculiarity noted in the common code; I don't have
> > terifically strong feelings either way.
> >
> > However I'm inclined to think that at that point the special cases get
> > out of hand and that it would be better to keep this back in the nfsd
> > code itself. The following (tested this time) seems to work.
>
> Sorry, this is not going to do it. i_mutex is going to move into
> ->fsync soon unless we kill it entirely pretty soon. And for the
> cases of stckable filesystems what you do (aswell as the <= 2.6.28 case)
> is broken, as the NFS locking scheme can't apply to the lower stacked
> filesystem.

We need an immediate fix for the regression--nfsd (all versions) just
doesn't work at all in mainline right now. My apologies for not having
seen that earlier.

A more complete solution is going to take more than a day or two.

--b.