2009-09-24 16:10:59

by Sage Weil

Subject: [PATCH] vfs: make real_lookup do dentry revalidation with i_mutex held

real_lookup() is called by do_lookup() if dentry revalidation fails. If
the cache is re-populated while waiting for i_mutex, it may find that
a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).

Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
revalidate failed _again_, however, it would give up with -ENOENT. The
problem here is that network file systems may be invalidating dentries via
server callbacks, e.g. due to concurrent access from another client, and
-ENOENT is frequently the wrong answer.

This problem has been seen with both Lustre and Ceph. It seems possible
to hit this case with NFS as well if the cache lifetime is very short.

Instead, we should do_revalidate() while i_mutex is still held. If
revalidation fails, we can move on to a ->lookup() and ensure a correct
result without worrying about any subsequent races.

Note that do_revalidate() is called with i_mutex held elsewhere. For
example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
and possibly others all take the directory i_mutex, and then

-> lookup_hash
-> __lookup_hash
-> cached_lookup
-> do_revalidate

so this does not introduce any new locking rules for d_revalidate
implementations.

Yes, the goto is ugly. A cleanup patch follows.

CC: Ian Kent <[email protected]>
CC: Christoph Hellwig <[email protected]>
CC: Al Viro <[email protected]>
CC: Andreas Dilger <[email protected]>
Signed-off-by: Yehuda Sadeh <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
---
fs/namei.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11f404..f74ddb3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -497,6 +497,7 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	if (!result) {
 		struct dentry *dentry;
 
+do_the_lookup:
 		/* Don't create child dentry for a dead directory. */
 		result = ERR_PTR(-ENOENT);
 		if (IS_DEADDIR(dir))
@@ -520,12 +521,12 @@ out_unlock:
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	mutex_unlock(&dir->i_mutex);
 	if (result->d_op && result->d_op->d_revalidate) {
 		result = do_revalidate(result, nd);
 		if (!result)
-			result = ERR_PTR(-ENOENT);
+			goto do_the_lookup;
 	}
+	mutex_unlock(&dir->i_mutex);
 	return result;
 }

--
1.5.6.5
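
For readers following the thread, the behavioural change can be modelled in userspace. The following is a minimal sketch, not kernel code: every name prefixed with toy_ is hypothetical, the "mutex" is a plain flag standing in for dir->i_mutex, the dcache is a single slot, and a missing file is collapsed into a plain -ENOENT return rather than a negative dentry. It only illustrates why giving up after a failed revalidation can report -ENOENT for a file that still exists, and why falling through to the real lookup with the lock held avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; none of these are real APIs. */
struct toy_dentry {
	bool valid;			/* what a d_revalidate() would report */
};

struct toy_dir {
	bool locked;			/* models dir->i_mutex being held */
	struct toy_dentry *cached;	/* models a dcache hit from d_lookup() */
	struct toy_dentry fresh;	/* what the filesystem would instantiate */
	bool exists;			/* does the name really exist? */
};

/* Models do_revalidate(): returns the dentry if still valid, else NULL. */
static struct toy_dentry *toy_revalidate(struct toy_dentry *d)
{
	return d->valid ? d : NULL;
}

/* Models dir->i_op->lookup(): must be called with the lock held. */
static int toy_fs_lookup(struct toy_dir *dir, struct toy_dentry **out)
{
	assert(dir->locked);
	if (!dir->exists)
		return -ENOENT;
	dir->fresh.valid = true;
	dir->cached = &dir->fresh;
	*out = &dir->fresh;
	return 0;
}

/* Old flow: drop the lock, revalidate, give up if that fails. */
static int toy_lookup_old(struct toy_dir *dir, struct toy_dentry **out)
{
	struct toy_dentry *d;
	int err;

	dir->locked = true;
	d = dir->cached;
	if (!d) {
		err = toy_fs_lookup(dir, out);
		dir->locked = false;
		return err;
	}
	dir->locked = false;		/* lock dropped before revalidation */
	d = toy_revalidate(d);
	if (!d)
		return -ENOENT;		/* wrong if the name still exists */
	*out = d;
	return 0;
}

/* New flow: revalidate under the lock, fall through to the real lookup. */
static int toy_lookup_new(struct toy_dir *dir, struct toy_dentry **out)
{
	struct toy_dentry *d;
	int err = 0;

	dir->locked = true;
	d = dir->cached;
	if (d) {
		d = toy_revalidate(d);
		if (d) {
			*out = d;
			goto out_unlock;
		}
	}
	err = toy_fs_lookup(dir, out);
out_unlock:
	dir->locked = false;
	return err;
}
```

With a stale cached entry for a name that still exists, toy_lookup_old() returns -ENOENT while toy_lookup_new() instantiates and returns a fresh entry.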


2009-09-24 16:11:00

by Sage Weil

Subject: [PATCH] vfs: clean up real_lookup

Get rid of the goto by flipping the if (!result) over. Make the
comments a bit more descriptive. Fix a few kernel style problems.
No functional changes.

CC: Ian Kent <[email protected]>
CC: Christoph Hellwig <[email protected]>
Signed-off-by: Yehuda Sadeh <[email protected]>
Signed-off-by: Sage Weil <[email protected]>
---
fs/namei.c | 65 +++++++++++++++++++++++++++++++----------------------------
1 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f74ddb3..6770dde 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -469,19 +469,20 @@ ok:
  * This is called when everything else fails, and we actually have
  * to go to the low-level filesystem to find out what we should do..
  *
- * We get the directory semaphore, and after getting that we also
+ * We get the directory mutex, and after getting that we also
  * make sure that nobody added the entry to the dcache in the meantime..
  * SMP-safe
  */
-static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, struct nameidata *nd)
+static struct dentry *real_lookup(struct dentry *parent, struct qstr *name,
+				  struct nameidata *nd)
 {
-	struct dentry * result;
+	struct dentry *result, *dentry;
 	struct inode *dir = parent->d_inode;
 
 	mutex_lock(&dir->i_mutex);
 	/*
 	 * First re-do the cached lookup just in case it was created
-	 * while we waited for the directory semaphore..
+	 * while we waited for the directory mutex.
 	 *
 	 * FIXME! This could use version numbering or similar to
 	 * avoid unnecessary cache lookups.
@@ -494,38 +495,40 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
 	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
 	 */
 	result = d_lookup(parent, name);
-	if (!result) {
-		struct dentry *dentry;
-
-do_the_lookup:
-		/* Don't create child dentry for a dead directory. */
-		result = ERR_PTR(-ENOENT);
-		if (IS_DEADDIR(dir))
-			goto out_unlock;
-
-		dentry = d_alloc(parent, name);
-		result = ERR_PTR(-ENOMEM);
-		if (dentry) {
-			result = dir->i_op->lookup(dir, dentry, nd);
+	if (result) {
+		/*
+		 * The cache was re-populated while we waited on the
+		 * mutex. We need to revalidate, this time while
+		 * holding i_mutex (to avoid another race).
+		 */
+		if (result->d_op && result->d_op->d_revalidate) {
+			result = do_revalidate(result, nd);
 			if (result)
-				dput(dentry);
-			else
-				result = dentry;
+				goto out_unlock;
+			/*
+			 * The dentry was left behind invalid. Just
+			 * do the lookup.
+			 */
+		} else {
+			goto out_unlock;
 		}
-out_unlock:
-		mutex_unlock(&dir->i_mutex);
-		return result;
 	}
 
-	/*
-	 * Uhhuh! Nasty case: the cache was re-populated while
-	 * we waited on the semaphore. Need to revalidate.
-	 */
-	if (result->d_op && result->d_op->d_revalidate) {
-		result = do_revalidate(result, nd);
-		if (!result)
-			goto do_the_lookup;
+	/* Don't create child dentry for a dead directory. */
+	result = ERR_PTR(-ENOENT);
+	if (IS_DEADDIR(dir))
+		goto out_unlock;
+
+	dentry = d_alloc(parent, name);
+	result = ERR_PTR(-ENOMEM);
+	if (dentry) {
+		result = dir->i_op->lookup(dir, dentry, nd);
+		if (result)
+			dput(dentry);
+		else
+			result = dentry;
 	}
+out_unlock:
 	mutex_unlock(&dir->i_mutex);
 	return result;
 }
--
1.5.6.5