2006-09-24 22:11:42

by Mark Fasheh

Subject: [git patches] ocfs2 post 2.6.18 features

Hi Linus,
This series completes the final set of ocfs2 patches which I wanted
to merge upstream before 2.6.19-rc1.

These patches build on top of each other to improve ocfs2 cluster
messaging/locking.

The patch is too large for e-mail, so the changes are broken up in git
and can also be found at:

http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/ocfs2_git_patches/ocfs2-upstream-linus-20060924/


The first set removes an expensive clusterwide message sent during
unlink/rename (we call this the "dentry vote"). It gets replaced with a
cluster lock which covers a set of dentries. This gives us an improvement in
average-case unlink performance and reduces the file system's reliance on
direct cluster messaging. A patch to the VFS and NFS was required to get
this going. It's the final version of a patch which was initially mailed to
linux-kernel and linux-fsdevel on August 29:

http://marc.theaimsgroup.com/?l=linux-kernel&m=115689222430028&w=2

The relevant parties are CC'd here, and the patch is attached to this e-mail
for any last-minute review.

Essentially, ocfs2 wanted to manually d_move() inside of rename. NFS already
does this for file renames, but ocfs2 wants to do it for all rename types,
which required also making NFS handle the d_move() for all types and fixing
up the VFS to check for the "FS_RENAME_DOES_D_MOVE" flag (which used to be
FS_ODD_RENAME) in vfs_rename_dir().
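
To illustrate the flag check described above, here is a minimal userspace
sketch. Only the flag name and value come from the attached patch; the
sketch_* types and functions are hypothetical stand-ins for the real dentry
and file_system_type structures, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag name and value as in the attached patch. */
#define FS_RENAME_DOES_D_MOVE 32768

/* Hypothetical stand-in for a dentry: just records d_move() calls. */
struct sketch_dentry {
	bool moved;
	int move_count;
};

/* Hypothetical stand-in for struct file_system_type plus ->rename(). */
struct sketch_fs_type {
	int fs_flags;
	int (*rename)(struct sketch_fs_type *fs, struct sketch_dentry *old_dentry);
};

void sketch_d_move(struct sketch_dentry *dentry)
{
	dentry->moved = true;
	dentry->move_count++;
}

/* A file system that moves the dentry itself, e.g. under its own lock. */
int sketch_rename_does_d_move(struct sketch_fs_type *fs,
			      struct sketch_dentry *old_dentry)
{
	(void)fs;
	sketch_d_move(old_dentry);
	return 0;
}

/* A file system that leaves d_move() to the generic layer. */
int sketch_rename_plain(struct sketch_fs_type *fs,
			struct sketch_dentry *old_dentry)
{
	(void)fs;
	(void)old_dentry;
	return 0;
}

/* Generic layer: only d_move() when the file system did not do it. */
int sketch_vfs_rename(struct sketch_fs_type *fs,
		      struct sketch_dentry *old_dentry)
{
	int error;

	assert(fs && fs->rename && old_dentry);
	error = fs->rename(fs, old_dentry);
	if (!error && !(fs->fs_flags & FS_RENAME_DOES_D_MOVE))
		sketch_d_move(old_dentry);
	return error;
}
```

Either way the dentry ends up moved exactly once; the flag only decides
which layer is responsible for doing it.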


The second set revamps the way inode meta data locks are named, removing
i_generation from them. This way, a meta data lock can be acquired in
ocfs2_read_locked_inode() before reading the inode block off disk. Since the
read is covered by a lock, it can remain cached and won't have to be re-read
at a later date when the lock is acquired. My tests of cold-cache stat
timings have shown this to give a performance improvement of up to 20%.
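
A rough sketch of why dropping i_generation from the name matters: a name
built only from the lock type and block number can be computed before the
inode has been read, while a name that includes the generation cannot. The
name layout, buffer size, and sketch_* function names below are assumptions
made for illustration and are not the actual ocfs2 lock name format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_LOCK_NAME_LEN 32

/*
 * Old style (hypothetical layout): the generation is part of the name,
 * so the inode block must be read from disk before the lock can be named.
 */
int sketch_name_with_generation(char *buf, size_t len, char type,
				uint64_t blkno, uint32_t generation)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%c%016" PRIx64 "%08" PRIx32,
			type, blkno, generation);
}

/*
 * New style (hypothetical layout): only the type and block number are
 * needed, both of which are known before any disk read.
 */
int sketch_name_blkno_only(char *buf, size_t len, char type, uint64_t blkno)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%c%016" PRIx64, type, blkno);
}
```

With the second form, the lock can be taken first and the inode block read
under it, which is what lets the read stay cached.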


The third set is a cleanup of dlmglue.c. No actual algorithms were changed,
but some duplicated code was removed and all the different lock type specific
DLM callbacks were collapsed into a generic set that all locks can share.
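
The general shape of such a cleanup can be sketched as one generic function
that consults optional per-lock-type callbacks. The callback names
->check_downconvert and ->set_lvb appear in the patch titles below; their
signatures, the sketch_* types, and the control flow here are assumptions
for illustration, not the dlmglue.c code.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_lockres;

/* Per-lock-type operations; a NULL callback means "nothing type specific". */
struct sketch_lock_res_ops {
	int (*check_downconvert)(struct sketch_lockres *res, int new_level);
	void (*set_lvb)(struct sketch_lockres *res);
};

struct sketch_lockres {
	const struct sketch_lock_res_ops *ops;
	int level;
	int lvb_writes;
};

/*
 * One generic downconvert path shared by all lock types.
 * Returns 1 if the lock was downconverted, 0 if the lock type vetoed it.
 */
int sketch_generic_downconvert(struct sketch_lockres *res, int new_level)
{
	assert(res && res->ops);
	if (res->ops->check_downconvert &&
	    !res->ops->check_downconvert(res, new_level))
		return 0;
	if (res->ops->set_lvb)
		res->ops->set_lvb(res);
	res->level = new_level;
	return 1;
}

/* Example type: a lock that fills in its value block before dropping. */
void sketch_meta_set_lvb(struct sketch_lockres *res)
{
	res->lvb_writes++;
}

const struct sketch_lock_res_ops sketch_meta_ops = {
	.check_downconvert = NULL,
	.set_lvb = sketch_meta_set_lvb,
};

/* Example type: a lock that currently refuses every downconvert. */
int sketch_never_downconvert(struct sketch_lockres *res, int new_level)
{
	(void)res;
	(void)new_level;
	return 0;
}

const struct sketch_lock_res_ops sketch_busy_ops = {
	.check_downconvert = sketch_never_downconvert,
	.set_lvb = NULL,
};
```

Lock types that need nothing special simply leave the callbacks NULL and
get the shared behaviour.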


And finally, my apologies for sending you multiple git pull requests so
closely spaced together. I mostly just wanted to see this patch set pushed
upstream as a logical unit.

Please pull from 'upstream-linus' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git

to receive the following updates:

 fs/namei.c                      |    6
 fs/nfs/dir.c                    |    3
 fs/nfs/super.c                  |   10
 fs/ocfs2/cluster/tcp_internal.h |    8
 fs/ocfs2/dcache.c               |  359 ++++++++++++-
 fs/ocfs2/dcache.h               |   27
 fs/ocfs2/dlm/dlmapi.h           |    1
 fs/ocfs2/dlm/dlmast.c           |    6
 fs/ocfs2/dlm/dlmcommon.h        |    1
 fs/ocfs2/dlm/dlmlock.c          |   10
 fs/ocfs2/dlm/dlmmaster.c        |    4
 fs/ocfs2/dlm/dlmrecovery.c      |    3
 fs/ocfs2/dlm/userdlm.c          |   81 +-
 fs/ocfs2/dlm/userdlm.h          |    1
 fs/ocfs2/dlmglue.c              | 1094 ++++++++++++++++++++--------------------
 fs/ocfs2/dlmglue.h              |   21
 fs/ocfs2/export.c               |    8
 fs/ocfs2/inode.c                |  156 ++++-
 fs/ocfs2/inode.h                |    8
 fs/ocfs2/journal.c              |    3
 fs/ocfs2/namei.c                |  116 ++--
 fs/ocfs2/ocfs2_lockid.h         |   25
 fs/ocfs2/super.c                |    6
 fs/ocfs2/sysfile.c              |    6
 fs/ocfs2/vote.c                 |  180 ------
 fs/ocfs2/vote.h                 |    5
 include/linux/fs.h              |    7
 27 files changed, 1245 insertions(+), 910 deletions(-)

Mark Fasheh:
      ocfs2: Silence dlm error print
      ocfs2: Allow binary names in the DLM
      ocfs2: Update dlmfs for new dlmlock() API
      ocfs2: Update dlmglue for new dlmlock() API
      ocfs2: Add new cluster lock type
      ocfs2: Add dentry tracking API
      ocfs2: Hook rest of the file system into dentry locking API
      ocfs2: Remove the dentry vote
      Allow file systems to manually d_move() inside of ->rename()
      ocfs2: manually d_move() during ocfs2_rename()
      ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock()
      ocfs2: Free up some space in the lvb
      ocfs2: Encode i_generation in the meta data lvb
      ocfs2: Remove i_generation from inode lock names
      ocfs2: Clean up lock resource refresh flags
      ocfs2: combine inode and generic AST functions
      ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops
      ocfs2: Add ->get_osb() dlmglue locking operation
      ocfs2: combine inode and generic blocking AST functions
      ocfs2: don't unconditionally pass LVB flags
      ocfs2: Check for refreshing locks in generic unblock function
      ocfs2: Add ->check_downconvert callback in dlmglue
      ocfs2: Add ->set_lvb callback in dlmglue
      ocfs2: Have the metadata lock use generic dlmglue functions
      ocfs2: Remove unused dlmglue functions
      ocfs2: move downconvert worker to lockres ops
      ocfs2: Remove ->unblock lockres operation
      ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback



From 349457ccf2592c14bdf13b6706170ae2e94931b1 Mon Sep 17 00:00:00 2001
From: Mark Fasheh <[email protected]>
Date: Fri, 8 Sep 2006 14:22:21 -0700
Subject: [PATCH] Allow file systems to manually d_move() inside of ->rename()

Some file systems want to manually d_move() the dentries involved in a
rename. We can do this by making use of the FS_ODD_RENAME flag if we just
have nfs_rename() unconditionally do the d_move(). While there, we rename
the flag to be more descriptive.

OCFS2 uses this to protect that part of the rename operation with a cluster
lock.

Signed-off-by: Mark Fasheh <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
 fs/namei.c         |    6 +++---
 fs/nfs/dir.c       |    3 +--
 fs/nfs/super.c     |   10 +++++-----
 include/linux/fs.h |    7 ++++---
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 432d6bc..6b591c0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2370,7 +2370,8 @@ static int vfs_rename_dir(struct inode *
 		dput(new_dentry);
 	}
 	if (!error)
-		d_move(old_dentry,new_dentry);
+		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+			d_move(old_dentry,new_dentry);
 	return error;
 }

@@ -2393,8 +2394,7 @@ static int vfs_rename_other(struct inode
 	else
 		error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (!error) {
-		/* The following d_move() should become unconditional */
-		if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
+		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
 			d_move(old_dentry, new_dentry);
 	}
 	if (target)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3419c2d..7432f1a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1669,8 +1669,7 @@ out:
 	if (rehash)
 		d_rehash(rehash);
 	if (!error) {
-		if (!S_ISDIR(old_inode->i_mode))
-			d_move(old_dentry, new_dentry);
+		d_move(old_dentry, new_dentry);
 		nfs_renew_times(new_dentry);
 		nfs_set_verifier(new_dentry, nfs_save_change_attribute(new_dir));
 	}
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index b99113b..e8d4003 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -71,7 +71,7 @@ static struct file_system_type nfs_fs_ty
 	.name = "nfs",
 	.get_sb = nfs_get_sb,
 	.kill_sb = nfs_kill_super,
-	.fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };

 struct file_system_type nfs_xdev_fs_type = {
@@ -79,7 +79,7 @@ struct file_system_type nfs_xdev_fs_type
 	.name = "nfs",
 	.get_sb = nfs_xdev_get_sb,
 	.kill_sb = nfs_kill_super,
-	.fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };

 static struct super_operations nfs_sops = {
@@ -107,7 +107,7 @@ static struct file_system_type nfs4_fs_t
 	.name = "nfs4",
 	.get_sb = nfs4_get_sb,
 	.kill_sb = nfs4_kill_super,
-	.fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };

 struct file_system_type nfs4_xdev_fs_type = {
@@ -115,7 +115,7 @@ struct file_system_type nfs4_xdev_fs_typ
 	.name = "nfs4",
 	.get_sb = nfs4_xdev_get_sb,
 	.kill_sb = nfs4_kill_super,
-	.fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };

 struct file_system_type nfs4_referral_fs_type = {
@@ -123,7 +123,7 @@ struct file_system_type nfs4_referral_fs
 	.name = "nfs4",
 	.get_sb = nfs4_referral_get_sb,
 	.kill_sb = nfs4_kill_super,
-	.fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };

 static struct super_operations nfs4_sops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 555bc19..1d3e601 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -92,9 +92,10 @@ #define SEL_EX 4
 #define FS_REQUIRES_DEV 1
 #define FS_BINARY_MOUNTDATA 2
 #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
-#define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon
-			      * as nfs_rename() will be cleaned up
-			      */
+#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move()
+				      * during rename() internally.
+				      */
+
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
  */
--
1.4.2.1


2006-09-24 22:36:27

by Linus Torvalds

Subject: Re: [git patches] ocfs2 post 2.6.18 features


Ok, pulled, and pushed out.

And btw, I appreciate how you separately explained the fs/namei.c change,
together with the diff for just that part. This is a prime example of how
to make things easier for me to verify, when I see something touching a
generic file. Thanks.

I do have a small nit: when you ask me to pull, you did:

> Please pull from 'upstream-linus' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git

I really prefer to see the branch-name at the end of the line (don't worry
if it's more than 80 characters), because that way I don't make the
mistake of cutting-and-pasting the git URL, and forgetting the branch.

So if you can update your "please pull" script to do that, I'd be even
happier.

Linus

2006-09-24 23:29:22

by Mark Fasheh

Subject: Re: [git patches] ocfs2 post 2.6.18 features

On Sun, Sep 24, 2006 at 03:36:14PM -0700, Linus Torvalds wrote:
>
> Ok, pulled, and pushed out.
Great, thanks!


> And btw, I appreciate how you separately explained the fs/namei.c change,
> together with the diff for just that part. This is a prime example of how
> to make things easier for me to verify, when I see something touching a
> generic file. Thanks.
Oh, excellent - I'm glad that worked out. I asked Andrew a couple weeks back
how we should handle that patch, and he indicated that I could push it if I
clearly noted its existence in my e-mail.


> I do have a small nit: when you ask me to pull, you did:
>
> > Please pull from 'upstream-linus' branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git
>
> I really prefer to see the branch-name at the end of the line (don't worry
> if it's more than 80 characters), because that way I don't make the
> mistake of cutting-and-pasting the git URL, and forgetting the branch.
No problem - script updated. I copied the format from one of Jeff's more
recent pull mails:

Please pull from 'upstream-linus' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus

Hope that works for you.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[email protected]