2003-07-08 13:51:20

by Trond Myklebust

[permalink] [raw]
Subject: RE: [PATCH] path_lookup for 2.4.20-pre4 ([email protected])


This patch breaks NFS close-to-open cache consistency as it undoes
those changes that provide path revalidation for the case of
open(".").
The changelog entry doesn't even attempt to document this removal...

If people want to revert to the old behaviour, then there are ways of
doing this that will not affect NFS. Something like the appended patch
for instance...

Cheers,
Trond

diff -u --recursive --new-file linux-2.4.22-odirect/fs/namei.c linux-2.4.22-fix_cto/fs/namei.c
--- linux-2.4.22-odirect/fs/namei.c 2003-06-27 13:34:41.000000000 +0200
+++ linux-2.4.22-fix_cto/fs/namei.c 2003-07-08 15:51:08.000000000 +0200
@@ -563,7 +563,7 @@
while (*name=='/')
name++;
if (!*name)
- goto return_base;
+ goto return_reval;

inode = nd->dentry->d_inode;
if (current->link_count)
@@ -686,7 +686,7 @@
inode = nd->dentry->d_inode;
/* fallthrough */
case 1:
- goto return_base;
+ goto return_reval;
}
if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
err = nd->dentry->d_op->d_hash(nd->dentry, &this);
@@ -732,6 +732,24 @@
nd->last_type = LAST_DOT;
else if (this.len == 2 && this.name[1] == '.')
nd->last_type = LAST_DOTDOT;
+return_reval:
+ /*
+ * We bypassed the ordinary revalidation routines.
+ * We may need to check the cached dentry for staleness.
+ */
+ if (nd->dentry && nd->dentry->d_sb &&
+ (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+ struct dentry *dentry = nd->dentry;
+ unlock_nd(nd);
+ dput(pinned.dentry);
+ mntput(pinned.mnt);
+ if (!dentry->d_op->d_revalidate(dentry, 0)) {
+ d_invalidate(dentry);
+ path_release(nd);
+ return -ESTALE;
+ }
+ return 0;
+ }
return_base:
unlock_nd(nd);
dput(pinned.dentry);
diff -u --recursive --new-file linux-2.4.22-odirect/fs/nfs/inode.c linux-2.4.22-fix_cto/fs/nfs/inode.c
--- linux-2.4.22-odirect/fs/nfs/inode.c 2002-08-15 03:05:32.000000000 +0200
+++ linux-2.4.22-fix_cto/fs/nfs/inode.c 2003-07-08 15:24:32.000000000 +0200
@@ -1125,7 +1125,7 @@
/*
* File system information
*/
-static DECLARE_FSTYPE(nfs_fs_type, "nfs", nfs_read_super, FS_ODD_RENAME);
+static DECLARE_FSTYPE(nfs_fs_type, "nfs", nfs_read_super, FS_ODD_RENAME|FS_REVAL_DOT);

extern int nfs_init_nfspagecache(void);
extern void nfs_destroy_nfspagecache(void);
diff -u --recursive --new-file linux-2.4.22-odirect/include/linux/fs.h linux-2.4.22-fix_cto/include/linux/fs.h
--- linux-2.4.22-odirect/include/linux/fs.h 2003-07-08 11:47:08.000000000 +0200
+++ linux-2.4.22-fix_cto/include/linux/fs.h 2003-07-08 15:47:06.000000000 +0200
@@ -92,6 +92,7 @@
#define FS_SINGLE 8 /* Filesystem that can have only one superblock */
#define FS_NOMOUNT 16 /* Never mount from userland */
#define FS_LITTER 32 /* Keeps the tree in dcache */
+#define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
#define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon
* as nfs_rename() will be cleaned up
*/


2003-07-08 14:07:45

by Alan

[permalink] [raw]
Subject: RE: [PATCH] path_lookup for 2.4.20-pre4 ([email protected])

On Maw, 2003-07-08 at 15:04, Trond Myklebust wrote:
> This patch breaks NFS close-to-open cache consistency as it undoes
> those changes that provide path revalidation for the case of
> open(".").
> The changelog entry doesn't even attempt to document this removal...

I was a little suprised it went in, it never seemed a candidate for
dealing with a stable tree, just optimisation stuff that is 2.5 material
only

2003-07-08 14:47:25

by Trond Myklebust

[permalink] [raw]
Subject: RE: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

>>>>> " " == Alan Cox <[email protected]> writes:

> I was a little suprised it went in, it never seemed a candidate
> for dealing with a stable tree, just optimisation stuff that is
> 2.5 material only

As you can see, I screwed up on the title, so I may have confused
you...
...but I do agree with your comment. The patch I meant to refer to
(see revised title) does not appear in the 2.5.x tree either.

Have we BTW been shown any numbers that support the alleged benefits?
I may have missed those...

Cheers,
Trond

2003-07-08 15:09:59

by Alan

[permalink] [raw]
Subject: RE: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

On Maw, 2003-07-08 at 16:00, Trond Myklebust wrote:
> ...but I do agree with your comment. The patch I meant to refer to
> (see revised title) does not appear in the 2.5.x tree either.
>
> Have we BTW been shown any numbers that support the alleged benefits?
> I may have missed those...

A while ago yes - on very big SMP boxes.

Its no big problem to me since I can just back it out of -ac

2003-07-08 16:05:45

by Jan Harkes

[permalink] [raw]
Subject: Re: [PATCH] path_lookup for 2.4.20-pre4 ([email protected])

One thing you might want to add to avoid revalidating all pathnames that
happen to start with a dot, like '.bashrc'.

Jan

On Tue, Jul 08, 2003 at 04:04:46PM +0200, Trond Myklebust wrote:
> diff -u --recursive --new-file linux-2.4.22-odirect/fs/namei.c linux-2.4.22-fix_cto/fs/namei.c
> --- linux-2.4.22-odirect/fs/namei.c 2003-06-27 13:34:41.000000000 +0200
> +++ linux-2.4.22-fix_cto/fs/namei.c 2003-07-08 15:51:08.000000000 +0200
> @@ -732,6 +732,24 @@
> nd->last_type = LAST_DOT;
> else if (this.len == 2 && this.name[1] == '.')
> nd->last_type = LAST_DOTDOT;
+ else
+ goto return_base;
> +return_reval:
> + /*
> + * We bypassed the ordinary revalidation routines.
> + * We may need to check the cached dentry for staleness.
> + */

2003-07-08 16:27:09

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

>>>>> " " == Jan Harkes <[email protected]> writes:

> One thing you might want to add to avoid revalidating all
> pathnames that happen to start with a dot, like '.bashrc'.

Oops. Well caught! Revised patch would be as follows.

Cheers,
Trond

diff -u --recursive --new-file linux-2.4.22-odirect/fs/namei.c linux-2.4.22-fix_cto/fs/namei.c
--- linux-2.4.22-odirect/fs/namei.c 2003-06-27 13:34:41.000000000 +0200
+++ linux-2.4.22-fix_cto/fs/namei.c 2003-07-08 18:34:55.000000000 +0200
@@ -563,7 +563,7 @@
while (*name=='/')
name++;
if (!*name)
- goto return_base;
+ goto return_reval;

inode = nd->dentry->d_inode;
if (current->link_count)
@@ -686,7 +686,7 @@
inode = nd->dentry->d_inode;
/* fallthrough */
case 1:
- goto return_base;
+ goto return_reval;
}
if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
err = nd->dentry->d_op->d_hash(nd->dentry, &this);
@@ -732,6 +732,26 @@
nd->last_type = LAST_DOT;
else if (this.len == 2 && this.name[1] == '.')
nd->last_type = LAST_DOTDOT;
+ else
+ goto return_base;
+return_reval:
+ /*
+ * We bypassed the ordinary revalidation routines.
+ * We may need to check the cached dentry for staleness.
+ */
+ if (nd->dentry && nd->dentry->d_sb &&
+ (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+ struct dentry *dentry = nd->dentry;
+ unlock_nd(nd);
+ dput(pinned.dentry);
+ mntput(pinned.mnt);
+ if (!dentry->d_op->d_revalidate(dentry, 0)) {
+ d_invalidate(dentry);
+ path_release(nd);
+ return -ESTALE;
+ }
+ return 0;
+ }
return_base:
unlock_nd(nd);
dput(pinned.dentry);
diff -u --recursive --new-file linux-2.4.22-odirect/fs/nfs/inode.c linux-2.4.22-fix_cto/fs/nfs/inode.c
--- linux-2.4.22-odirect/fs/nfs/inode.c 2002-08-15 03:05:32.000000000 +0200
+++ linux-2.4.22-fix_cto/fs/nfs/inode.c 2003-07-08 15:24:32.000000000 +0200
@@ -1125,7 +1125,7 @@
/*
* File system information
*/
-static DECLARE_FSTYPE(nfs_fs_type, "nfs", nfs_read_super, FS_ODD_RENAME);
+static DECLARE_FSTYPE(nfs_fs_type, "nfs", nfs_read_super, FS_ODD_RENAME|FS_REVAL_DOT);

extern int nfs_init_nfspagecache(void);
extern void nfs_destroy_nfspagecache(void);
diff -u --recursive --new-file linux-2.4.22-odirect/include/linux/fs.h linux-2.4.22-fix_cto/include/linux/fs.h
--- linux-2.4.22-odirect/include/linux/fs.h 2003-07-08 11:47:08.000000000 +0200
+++ linux-2.4.22-fix_cto/include/linux/fs.h 2003-07-08 15:57:26.000000000 +0200
@@ -92,6 +92,7 @@
#define FS_SINGLE 8 /* Filesystem that can have only one superblock */
#define FS_NOMOUNT 16 /* Never mount from userland */
#define FS_LITTER 32 /* Keeps the tree in dcache */
+#define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */
#define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon
* as nfs_rename() will be cleaned up
*/

2003-07-08 16:29:49

by Herbert Poetzl

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

On Tue, Jul 08, 2003 at 04:20:14PM +0100, Alan Cox wrote:
> On Maw, 2003-07-08 at 16:00, Trond Myklebust wrote:
> > ...but I do agree with your comment. The patch I meant to refer to
> > (see revised title) does not appear in the 2.5.x tree either.
> >
> > Have we BTW been shown any numbers that support the alleged benefits?
> > I may have missed those...
>
> A while ago yes - on very big SMP boxes.
>
> Its no big problem to me since I can just back it out of -ac

just curious, because I use this patch since early 2.4.20,
are there any reasons to 'back it out of -ac' for you?

anyway I totally agree that the NFS issue pointed out by
Trond should be addressed ...

TIA,
Herbert

2003-07-08 16:42:17

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

On Maw, 2003-07-08 at 17:44, Herbert Poetzl wrote:
> > Its no big problem to me since I can just back it out of -ac
>
> just curious, because I use this patch since early 2.4.20,
> are there any reasons to 'back it out of -ac' for you?
>
> anyway I totally agree that the NFS issue pointed out by
> Trond should be addressed ...

Its high risk, its got bugs as Trond already showed and it only
helps performance on giant SMP boxes. Its all risk and no
reward. Quota updates get you working 32bit uid quota and
the interactivity stuff helps all even tho its got some
risk.


2003-07-08 16:51:50

by Herbert Poetzl

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

On Tue, Jul 08, 2003 at 05:53:34PM +0100, Alan Cox wrote:
> On Maw, 2003-07-08 at 17:44, Herbert Poetzl wrote:
> > > Its no big problem to me since I can just back it out of -ac
> >
> > just curious, because I use this patch since early 2.4.20,
> > are there any reasons to 'back it out of -ac' for you?
> >
> > anyway I totally agree that the NFS issue pointed out by
> > Trond should be addressed ...
>
> Its high risk, its got bugs as Trond already showed and it only
> helps performance on giant SMP boxes. Its all risk and no
> reward. Quota updates get you working 32bit uid quota and
> the interactivity stuff helps all even tho its got some
> risk.

every change (if it's not a bugfix, and even those) bear
a risk, what I like about the fastwalk patch is not the
performance gain on giant SMP boxes, because I do not have
any (unfortunately ;) but the code change from ...

if (path_init(pathname, LOOKUP_PARENT, &nd))
error = path_walk(pathname, &nd);

to
error = path_lookup(pathname, LOOKUP_PARENT, &nd);


and

dentry = cached_lookup(nd->dentry, &this, 0);
if (!dentry) {
dentry = real_lookup(nd->dentry, &this, 0);
err = PTR_ERR(dentry);
if (IS_ERR(dentry))
break;
}

to
err = do_lookup(nd, &this, &next, &pinned, 0);


which (at least for me) is more read-/understandable ...

anyway, thanks for you answer,
Herbert

2003-07-08 17:05:55

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

On Tue, Jul 08, 2003 at 07:06:28PM +0200, Herbert Poetzl wrote:
> every change (if it's not a bugfix, and even those) bear
> a risk, what I like about the fastwalk patch is not the

exactly. 2.4 is not the place for cleanups that make the code easier
to read because those cleanups can introduce new bugs.

--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk

2003-07-08 17:28:08

by Hanna Linder

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

--On Tuesday, July 08, 2003 06:20:28 PM +0100 Matthew Wilcox <[email protected]> wrote:

> On Tue, Jul 08, 2003 at 07:06:28PM +0200, Herbert Poetzl wrote:
>> every change (if it's not a bugfix, and even those) bear
>> a risk, what I like about the fastwalk patch is not the
>
> exactly. 2.4 is not the place for cleanups that make the code easier
> to read because those cleanups can introduce new bugs.
>

Im not going to fight too hard for this. If people want it removed
that is OK. I pushed it because dcache_rcu was not going to be
accepted and Fastwalk is a fair second to dcache_rcu. The change Trond
pointed out was added by Al Viro after fastwalk was included in 2.5.11
which I backported.

Hanna

2003-07-08 17:56:00

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

>>>>> " " == Hanna Linder <[email protected]> writes:

> The change Trond pointed out was added by Al Viro
> after fastwalk was included in 2.5.11 which I backported.

IIRC, Al vetoed the NFS cto change that went into 2.4.x because he
claimed he was planning on providing an alternative fix that would
better fit the unionfs.
As that apparently won't materialize in 2.6.x, I'm in any case
planning on presenting the open(".") patch (or some variant of it) to
Linus.

That said, just backing a bugfix out of a stable kernel without
providing an alternative solution should only be done in those cases
where keeping it would cause a worse problem. For instance if the fix
is a security issue...

Cheers,
Trond

2003-07-08 18:05:00

by Hanna Linder

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])

--On Tuesday, July 08, 2003 08:10:10 PM +0200 Trond Myklebust <[email protected]> wrote:

>>>>>> " " == Hanna Linder <[email protected]> writes:
>
> > The change Trond pointed out was added by Al Viro
> > after fastwalk was included in 2.5.11 which I backported.
>
> IIRC, Al vetoed the NFS cto change that went into 2.4.x because he
> claimed he was planning on providing an alternative fix that would
> better fit the unionfs.
> As that apparently won't materialize in 2.6.x, I'm in any case
> planning on presenting the open(".") patch (or some variant of it) to
> Linus.

Then I apologize for my bad memory. I dont remember that thread.
This fastwalk stuff was originally written over a year ago.

Please push your bugfix as you know the NFS area much better
than I do.

Thanks.

Hanna


2003-07-08 19:15:19

by Marcelo Tosatti

[permalink] [raw]
Subject: RE: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])



On Tue, 8 Jul 2003, Alan Cox wrote:

> On Maw, 2003-07-08 at 16:00, Trond Myklebust wrote:
> > ...but I do agree with your comment. The patch I meant to refer to
> > (see revised title) does not appear in the 2.5.x tree either.
> >
> > Have we BTW been shown any numbers that support the alleged benefits?
> > I may have missed those...
>
> A while ago yes - on very big SMP boxes.
>
> Its no big problem to me since I can just back it out of -ac

Alan,

I included fastwalk patch because I thought it was a stable and very
useful optimisation. Even for 2.4. (I was expecting comments/flames on it
when I first included it.

Now if you think it is a 2.5 thing and can cause any potential problem for
2.4 I will remove it.

2003-07-08 19:15:32

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] Fastwalk: reduce cacheline bouncing of d_count ([email protected])



On Tue, 8 Jul 2003, Alan Cox wrote:

> On Maw, 2003-07-08 at 17:44, Herbert Poetzl wrote:
> > > Its no big problem to me since I can just back it out of -ac
> >
> > just curious, because I use this patch since early 2.4.20,
> > are there any reasons to 'back it out of -ac' for you?
> >
> > anyway I totally agree that the NFS issue pointed out by
> > Trond should be addressed ...
>
> Its high risk, its got bugs as Trond already showed and it only
> helps performance on giant SMP boxes. Its all risk and no
> reward. Quota updates get you working 32bit uid quota and
> the interactivity stuff helps all even tho its got some
> risk.

Ok, fine. Thats the feedback I wanted when I included it yesterday.

I'm going to revert it now.

Sorry, Hanna, but Alan saved the day again, and convinced me that fastwalk
is indeed a 2.5 thing.