by J. Bruce Fields

Subject: Re: [PATCH 2/2] exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

On Wed, Oct 02, 2013 at 05:28:14PM -0400, J. Bruce Fields wrote:
> @@ -268,6 +268,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
>  	if (!dir->i_fop)
>  		goto out;
>  	/*
> +	 * inode->i_ino is unsigned long, kstat->ino is u64, so the
> +	 * former would be insufficient on 32-bit hosts when the
> +	 * filesystem supports 64-bit inode numbers. So we need to
> +	 * actually call ->getattr, not just read i_ino:
> +	 */
> +	error = vfs_getattr_nosec(path, &stat);

Doh, "path" here is for the parent.... The following works better!

--b.

commit 231d38b6f775c4677a61b4d7dc15e0a0d6bbb709
Author: J. Bruce Fields <[email protected]>
Date: Tue Sep 10 11:41:12 2013 -0400

exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).

Reviewed-by: Christoph Hellwig <[email protected]>
Reported-by: Trevor Cordes <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index a235f00..c43fe9b 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -215,7 +215,7 @@ struct getdents_callback {
 	struct dir_context ctx;
 	char *name;		/* name that was found. It already points to a
 				   buffer NAME_MAX+1 is size */
-	unsigned long ino;	/* the inum we are looking for */
+	u64 ino;		/* the inum we are looking for */
 	int found;		/* inode matched? */
 	int sequence;		/* sequence counter */
 };
@@ -255,10 +255,14 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
 	struct inode *dir = path->dentry->d_inode;
 	int error;
 	struct file *file;
+	struct kstat stat;
+	struct path child_path = {
+		.mnt = path->mnt,
+		.dentry = child,
+	};
 	struct getdents_callback buffer = {
 		.ctx.actor = filldir_one,
 		.name = name,
-		.ino = child->d_inode->i_ino
 	};

 	error = -ENOTDIR;
@@ -268,6 +272,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
 	if (!dir->i_fop)
 		goto out;
 	/*
+	 * inode->i_ino is unsigned long, kstat->ino is u64, so the
+	 * former would be insufficient on 32-bit hosts when the
+	 * filesystem supports 64-bit inode numbers. So we need to
+	 * actually call ->getattr, not just read i_ino:
+	 */
+	error = vfs_getattr_nosec(&child_path, &stat);
+	if (error)
+		return error;
+	buffer.ino = stat.ino;
+	/*
 	 * Open the directory ...
 	 */
 	file = dentry_open(path, O_RDONLY, cred);
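
To make the failure mode concrete, here is a minimal userspace sketch of the comparison the patch fixes. It is not kernel code: `ulong32_t`, `match_truncated` and `match_full` are hypothetical names, and `uint32_t` merely stands in for `unsigned long` on a 32-bit host. The assumption is that the lookup compares the 64-bit inode number reported by the directory read against the number stored in the callback structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for "unsigned long" on a 32-bit host. */
typedef uint32_t ulong32_t;

/*
 * Models the pre-patch behaviour: the wanted inode number is kept in a
 * 32-bit field, so the upper 32 bits are dropped before the comparison.
 */
static bool match_truncated(uint64_t dirent_ino, uint64_t child_ino)
{
	ulong32_t wanted = (ulong32_t)child_ino;	/* high bits lost here */

	return dirent_ino == wanted;
}

/*
 * Models the post-patch behaviour: the wanted inode number is kept in a
 * 64-bit field, so nothing is lost.
 */
static bool match_full(uint64_t dirent_ino, uint64_t child_ino)
{
	uint64_t wanted = child_ino;

	return dirent_ino == wanted;
}
```

With an inode number such as 0x100000080, `match_truncated` never finds the entry even though it is present in the directory, which is consistent with the spurious -ENOENT described in the commit message. Inode numbers that fit in 32 bits match either way, so the problem only shows up on very large filesystems.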

2013-10-09 14:53:49

by J. Bruce Fields

Subject: Re: [PATCH 2/2] exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

On Wed, Oct 09, 2013 at 11:16:31AM +1100, Dave Chinner wrote:
> On Tue, Oct 08, 2013 at 05:56:56PM -0400, J. Bruce Fields wrote:
> > On Fri, Oct 04, 2013 at 06:15:22PM -0400, J. Bruce Fields wrote:
> > > On Fri, Oct 04, 2013 at 06:12:16PM -0400, bfields wrote:
> > > > On Wed, Oct 02, 2013 at 05:28:14PM -0400, J. Bruce Fields wrote:
> > > > > @@ -268,6 +268,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > > > >  	if (!dir->i_fop)
> > > > >  		goto out;
> > > > >  	/*
> > > > > +	 * inode->i_ino is unsigned long, kstat->ino is u64, so the
> > > > > +	 * former would be insufficient on 32-bit hosts when the
> > > > > +	 * filesystem supports 64-bit inode numbers. So we need to
> > > > > +	 * actually call ->getattr, not just read i_ino:
> > > > > +	 */
> > > > > +	error = vfs_getattr_nosec(path, &stat);
> > > >
> > > > Doh, "path" here is for the parent.... The following works better!
> > >
> > > By the way, I'm testing this with:
> > >
> > > 	- create a bunch of nested subdirectories, use
> > > 	  name_to_handle_at to get a handle for the bottom directory.
> > > 	- echo 2 >/proc/sys/vm/drop_caches
> > > 	- open_by_handle_at on the filehandle
> > >
> > > But this only actually exercises the reconnect path on the first run
> > > after boot. Is there something obvious I'm missing here?
> >
> > Looking at the code.... OK, most of the work of drop_caches is done by
> > shrink_slab_node, which doesn't actually try to free every single thing
> > that it could free--in particular, it won't try to free anything if it
> > thinks there are less than shrinker->batch_size (1024 in the
> > super_block->s_shrink case) objects to free.

(Oops, sorry, that should have been "less than half of
shrinker->batch_size", see below.)

> That's not quite right. Yes, the shrinker won't be called if the
> calculated scan count is less than the batch size, but the left over
> is added back to the shrinker scan count to carry over to the next call
> to the shrinker. Hence if you repeatedly call the shrinker on a small
> cache with a large batch size, it will eventually aggregate the scan
> counts to over the batch size and trim the cache....

No, in shrink_slab_node, we do this:

	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;

	while (total_scan >= batch_size) {
		...
	}

where max_pass is the value returned from count_objects. So as long as
count_objects returns less than half batch_size, nothing ever happens.
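
A rough userspace model of that clamp-and-loop logic, to illustrate why carrying the leftover forward does not help in this case. This is only a sketch: `model_shrink` and `model_repeat` are hypothetical names, `delta` stands in for whatever per-call scan count the kernel computes, and the assumption is that the leftover `total_scan` is simply carried into the next call.

```c
/*
 * Hypothetical model of one shrinker invocation. Returns the number of
 * objects that would be scanned and stores the leftover in *deferred.
 */
static unsigned long model_shrink(unsigned long max_pass,
				  unsigned long delta,
				  unsigned long batch_size,
				  unsigned long *deferred)
{
	unsigned long total_scan = *deferred + delta;
	unsigned long scanned = 0;

	/* The clamp from the snippet above. */
	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;

	/* Work is only done in whole batches. */
	while (total_scan >= batch_size) {
		scanned += batch_size;
		total_scan -= batch_size;
	}

	*deferred = total_scan;	/* carried over to the next call */
	return scanned;
}

/* Calls the model repeatedly and returns the total scanned. */
static unsigned long model_repeat(unsigned long max_pass,
				  unsigned long delta,
				  unsigned long batch_size,
				  int calls)
{
	unsigned long deferred = 0, total = 0;
	int i;

	for (i = 0; i < calls; i++)
		total += model_shrink(max_pass, delta, batch_size, &deferred);
	return total;
}
```

In this model, with batch_size 1024 and max_pass 511, the clamp caps total_scan at 1022 on every call, so the loop body never runs however many times the shrinker is invoked. With max_pass 512 the carried-over count reaches 1024 on the second call and one batch is scanned.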

(I wonder if that check's correct? The "forever" in the comment above
it seems wrong at least.)

--b.