I'm currently working on removing the "dentry" vote from ocfs2. This is a
network message broadcasted to all mounted nodes on every unlink() or
rename(). Upon recieving the message, those nodes d_delete() the dentry in
question.
What I'm doing is replacing the broadcast mechanism with a cluster lock
which covers a set of dentries. Every node gets a read only lock, until an
unlink during which the unlinking node, will request an exclusive lock,
forcing the other nodes who care about that dentry to d_delete() it. The
effect is that we retain a very lightweight ->d_revalidate(), and at the
same time get to make large improvements to the average case performance
of the ocfs2 unlink and rename operations.
I have uncovered a very small race with rename and d_move() however. The
d_move() for a rename is after the ->rename() callback, outside the parent
directory cluster locks which normally protect the name creation/removal
mechanism. The worry is that after ocfs2_rename(), but before the d_move() a
different node can discover the new name (created during the rename) and
unlink it, forcing a d_delete(). d_move() it seems, unconditionally rehashes
the (renamed) dentry and so if it's done after a d_delete() we could get
some inconsitent state amongst the nodes.
My solution is to simply allow ocfs2 to do the d_move() inside of
ocfs2_rename() by introducing a FS_RENAME_DOES_D_MOVE flag. OCFS2 won't
actually change how or why d_move() is called during a rename, it just uses
this flag to change where the call is made.
For any interested parties, a preliminary ocfs2 patch is at
http://oss.oracle.com/~mfasheh/vote_removal/remove_dentry_vote_wip.patch
The interesting stuff is mostly in fs/ocfs2/dcache.[ch]
The ocfs2 patch isn't posted via e-mail because it's still a work in
progress. That said, it actually works quite well, and the only things I
have left to do are unrelated to rename - the patch needs to be split up
into smaller ones, and a pair of (minor) dlm related fixups are needed.
--Mark
From: Mark Fasheh <[email protected]>
[PATCH] Allow file systems to manually d_move() inside of ->rename()
Some file systems want to be able to manually d_move() the dentries involved
in a rename. Introduce a flag which instructs vfs_rename_dir() and
vfs_rename_other() that it has already been handled internally.
OCFS2 uses this to protect that part of the entire operation with a cluster
lock.
Signed-off-by: Mark Fasheh <[email protected]>
diff --git a/fs/namei.c b/fs/namei.c
index c784e8b..e5a8478 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode *
dput(new_dentry);
}
if (!error)
- d_move(old_dentry,new_dentry);
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+ d_move(old_dentry,new_dentry);
return error;
}
@@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (!error) {
/* The following d_move() should become unconditional */
- if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
+ if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE)))
d_move(old_dentry, new_dentry);
}
if (target)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e04a5cf..8e9a7ca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,6 +87,7 @@ #define SEL_EX 4
/* public flags for file_system_type */
#define FS_REQUIRES_DEV 1
#define FS_BINARY_MOUNTDATA 2
+#define FS_RENAME_DOES_D_MOVE 4
#define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
#define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon
* as nfs_rename() will be cleaned up
On Tue, 2006-08-29 at 14:54 -0700, Mark Fasheh wrote:
> diff --git a/fs/namei.c b/fs/namei.c
> index c784e8b..e5a8478 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode *
> dput(new_dentry);
> }
> if (!error)
> - d_move(old_dentry,new_dentry);
> + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
> + d_move(old_dentry,new_dentry);
> return error;
> }
>
> @@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode
> error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
> if (!error) {
> /* The following d_move() should become unconditional */
> - if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
> + if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE)))
> d_move(old_dentry, new_dentry);
> }
> if (target)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e04a5cf..8e9a7ca 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -87,6 +87,7 @@ #define SEL_EX 4
> /* public flags for file_system_type */
> #define FS_REQUIRES_DEV 1
> #define FS_BINARY_MOUNTDATA 2
> +#define FS_RENAME_DOES_D_MOVE 4
> #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
> #define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon
> * as nfs_rename() will be cleaned up
> -
Why have 2 synonyms for the FS_ODD_RENAME stuff? Just fix up the NFS
client to do the d_move() unconditionally, and add a check for
FS_ODD_RENAME to vfs_rename_dir().
Cheers,
Trond
Hi Trond,
On Tue, Aug 29, 2006 at 08:35:52PM -0400, Trond Myklebust wrote:
> Why have 2 synonyms for the FS_ODD_RENAME stuff? Just fix up the NFS
> client to do the d_move() unconditionally, and add a check for
> FS_ODD_RENAME to vfs_rename_dir().
Though I read through some of the nfs code when developing my ocfs2 patch, I
wasn't sure that approach was desired.
Partly because I'm not experienced enough with nfs internals to have known
whether that was safe to do, and partly because the comment in
vfs_rename_other() seemed to indicate that FS_ODD_RENAME was a temporary
solution for NFS, whereas I was looking for something more permanent.
That all said, how does this look? I took the liberty of renaming the flag
to something a little more descriptive.
During an irc conversation, Chuck pointed out that perhaps a better solution
is to just internalize the d_move() to all file systems. It makes the patch
a bit more sweeping, but I'm willing to handle it if everybody agrees on
that approach.
--Mark
From: Mark Fasheh <[email protected]>
[PATCH] Allow file systems to manually d_move() inside of ->rename()
Some file systems want to manually d_move() the dentries involved in a
rename. We can do this by making use of the FS_ODD_RENAME flag if we just
have nfs_rename() unconditionally do the d_move(). While there, we rename
the flag to be more descriptive.
OCFS2 uses this to protect that part of the rename operation with a cluster
lock.
Signed-off-by: Mark Fasheh <[email protected]>
diff --git a/fs/namei.c b/fs/namei.c
index c784e8b..29418ec 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode *
dput(new_dentry);
}
if (!error)
- d_move(old_dentry,new_dentry);
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+ d_move(old_dentry,new_dentry);
return error;
}
@@ -2376,8 +2377,7 @@ static int vfs_rename_other(struct inode
else
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (!error) {
- /* The following d_move() should become unconditional */
- if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
+ if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
d_move(old_dentry, new_dentry);
}
if (target)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3ddda6f..8ead2b9 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1623,8 +1623,7 @@ out:
if (rehash)
d_rehash(rehash);
if (!error) {
- if (!S_ISDIR(old_inode->i_mode))
- d_move(old_dentry, new_dentry);
+ d_move(old_dentry, new_dentry);
nfs_renew_times(new_dentry);
nfs_set_verifier(new_dentry, nfs_save_change_attribute(new_dir));
}
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index e8a9bee..6dd17db 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -120,7 +120,7 @@ static struct file_system_type nfs_fs_ty
.name = "nfs",
.get_sb = nfs_get_sb,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
};
struct file_system_type clone_nfs_fs_type = {
@@ -128,7 +128,7 @@ struct file_system_type clone_nfs_fs_typ
.name = "nfs",
.get_sb = nfs_clone_nfs_sb,
.kill_sb = nfs_kill_super,
- .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
};
static struct super_operations nfs_sops = {
@@ -156,7 +156,7 @@ static struct file_system_type nfs4_fs_t
.name = "nfs4",
.get_sb = nfs4_get_sb,
.kill_sb = nfs4_kill_super,
- .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
};
struct file_system_type clone_nfs4_fs_type = {
@@ -164,7 +164,7 @@ struct file_system_type clone_nfs4_fs_ty
.name = "nfs4",
.get_sb = nfs_clone_nfs4_sb,
.kill_sb = nfs4_kill_super,
- .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
};
struct file_system_type nfs_referral_nfs4_fs_type = {
@@ -172,7 +172,7 @@ struct file_system_type nfs_referral_nfs
.name = "nfs4",
.get_sb = nfs_referral_nfs4_sb,
.kill_sb = nfs4_kill_super,
- .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+ .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
};
static struct super_operations nfs4_sops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e04a5cf..7b5f889 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -88,9 +88,10 @@ #define SEL_EX 4
#define FS_REQUIRES_DEV 1
#define FS_BINARY_MOUNTDATA 2
#define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
-#define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon
- * as nfs_rename() will be cleaned up
- */
+#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move()
+ * during rename() internally.
+ */
+
/*
* These are the fs-independent mount-flags: up to 32 flags are supported
*/