2011-08-08 15:37:52

by Bernd Schubert

Subject: [PATCH 0/4] Series short description

With the ext3/ext4 directory index implementation, hashes are used to specify
offsets for llseek(). For compatibility with NFSv2 and with 32-bit user space
on 64-bit kernels, ext3/ext4 currently return only 32-bit
hashes, so the probability of hash collisions in larger directories
is rather high. As recently reported on the NFS mailing list, this theoretical
problem also occurs on real systems:
http://comments.gmane.org/gmane.linux.nfs/40863
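
To put a rough number on "rather high", here is a minimal userspace sketch
(not part of the series) that estimates how many pairs of directory entries
are expected to share an offset. It assumes uniformly distributed hashes,
which is an idealization, and the function name expected_colliding_pairs() is
made up for illustration. The 31-bit figure follows from the current
hash2pos(), which returns major >> 1.

```c
#include <stdint.h>

/*
 * Expected number of entry pairs that share an offset when 'entries'
 * names are hashed uniformly into a space of 2^bits values:
 *     entries * (entries - 1) / 2 / 2^bits
 * Hypothetical helper, for illustration only.
 */
static double expected_colliding_pairs(uint64_t entries, unsigned int bits)
{
	double pairs = (double)entries * (double)(entries - 1) / 2.0;
	double space = 1.0;
	unsigned int i;

	/* Build 2^bits as a double to avoid shifting past the integer width. */
	for (i = 0; i < bits; i++)
		space *= 2.0;

	return pairs / space;
}
```

Under these assumptions a directory with 100,000 entries already has roughly
2.3 expected colliding pairs with 31 usable bits, while the same calculation
with 63 bits yields a value far below one.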

The following series adds two new f_mode flags to tell ext4
to use 32-bit or 64-bit hash values for llseek() calls.
These flags can then be used by network file systems, such as NFS, to
request 32-bit or 64-bit offsets (hashes).

Version 2:
- use f_mode flags instead of O_* flags, and introduce them in a separate patch
- introduce EXT4_HTREE_EOF_32BIT and EXT4_HTREE_EOF_64BIT
- fix SEEK_END in ext4_dir_llseek()
- set f_mode flags in NFS code as early as possible and introduce a new
NFSD_MAY_64BIT_COOKIE flag for that

--
Bernd Schubert
Fraunhofer ITWM


2011-08-08 15:38:00

by Bernd Schubert

Subject: [PATCH 1/4] Add new FMODE flags: FMODE_32BITHASH and FMODE_64BITHASH

These flags are supposed to be set by NFS readdir() to tell ext3/ext4
to use 32-bit (NFSv2) or 64-bit hash values (offsets) in seekdir().

Signed-off-by: Bernd Schubert <[email protected]>
---
include/linux/fs.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 178cdb4..18d40ae 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -91,6 +91,11 @@ struct inodes_stat_t {
/* File is opened using open(.., 3, ..) and is writeable only for ioctls
(specialy hack for floppy.c) */
#define FMODE_WRITE_IOCTL ((__force fmode_t)0x100)
+/* 32bit hashes as llseek() offset (for directories) */
+#define FMODE_32BITHASH ((__force fmode_t)0x200)
+/* 64bit hashes as llseek() offset (for directories) */
+#define FMODE_64BITHASH ((__force fmode_t)0x400)
+

/*
* Don't update ctime and mtime.


2011-08-08 15:38:03

by Bernd Schubert

Subject: [PATCH 2/4] Return 32/64-bit dir name hash according to usage type

From: Fan Yong <[email protected]>

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir(). However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions. This still needs
integration on the NFS side.
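
As a rough illustration of the packing described above, the following
userspace sketch mirrors the hash-to-offset conversion outside the kernel.
The sketch_* names and the use_32bit parameter are hypothetical; use_32bit
stands in for the flag and compat-task test the patch performs on the file.

```c
#include <stdint.h>

/* Pack major/minor hash into an offset; 32-bit mode drops the minor hash. */
static int64_t sketch_hash2pos(int use_32bit, uint32_t major, uint32_t minor)
{
	if (use_32bit)
		return major >> 1;
	return (int64_t)(((uint64_t)(major >> 1) << 32) | (uint64_t)minor);
}

/* Recover the major hash from an offset produced by sketch_hash2pos(). */
static uint32_t sketch_pos2maj_hash(int use_32bit, int64_t pos)
{
	if (use_32bit)
		return (uint32_t)(((uint64_t)pos << 1) & 0xffffffffu);
	return (uint32_t)((((uint64_t)pos >> 32) << 1) & 0xffffffffu);
}

/* Recover the minor hash; it is not representable in 32-bit mode. */
static uint32_t sketch_pos2min_hash(int use_32bit, int64_t pos)
{
	if (use_32bit)
		return 0;
	return (uint32_t)((uint64_t)pos & 0xffffffffu);
}
```

Because the major hash has its low bit cleared before use, shifting it right
by one loses no information and keeps the offset non-negative. In 32-bit mode
two entries that differ only in the minor hash map to the same offset, which
is the collision case this patch tries to make less likely for 64-bit callers.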

Patch-updated-by: Bernd Schubert <[email protected]>
(blame me if something is not correct)

Signed-off-by: Fan Yong <[email protected]>
Signed-off-by: Andreas Dilger <[email protected]>
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/ext4/dir.c | 185 ++++++++++++++++++++++++++++++++++++++++++++------------
fs/ext4/ext4.h | 6 ++
fs/ext4/hash.c | 4 +
3 files changed, 154 insertions(+), 41 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..cc47087 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};

-static int ext4_readdir(struct file *, void *, filldir_t);
static int ext4_dx_readdir(struct file *filp,
void *dirent, filldir_t filldir);
-static int ext4_release_dir(struct inode *inode,
- struct file *filp);
-
-const struct file_operations ext4_dir_operations = {
- .llseek = ext4_llseek,
- .read = generic_read_dir,
- .readdir = ext4_readdir, /* we take BKL. needed?*/
- .unlocked_ioctl = ext4_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = ext4_compat_ioctl,
-#endif
- .fsync = ext4_sync_file,
- .release = ext4_release_dir,
-};
-

static unsigned char get_dtype(struct super_block *sb, int filetype)
{
@@ -254,22 +238,134 @@ out:
return ret;
}

+static inline int is_32bit_api(void)
+{
+#ifdef HAVE_IS_COMPAT_TASK
+ return is_compat_task();
+#else
+ return (BITS_PER_LONG == 32);
+#endif
+}
+
/*
* These functions convert from the major/minor hash to an f_pos
- * value.
+ * value for dx directories
+ *
+ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
+ * FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
+ * directly on both 32-bit and 64-bit nodes, under such case, neither
+ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
+ */
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return major >> 1;
+ else
+ return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return (pos << 1) & 0xffffffff;
+ else
+ return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return 0;
+ else
+ return pos & 0xffffffff;
+}
+
+/*
+ * Return 32- or 64-bit end-of-file for dx directories
+ */
+static inline loff_t ext4_get_htree_eof(struct file *filp)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return EXT4_HTREE_EOF_32BIT;
+ else
+ return EXT4_HTREE_EOF_64BIT;
+}
+
+
+/*
+ * ext4_dir_llseek() based on generic_file_llseek() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
*
- * Currently we only use major hash numer. This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call. Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie. Sigh.
+ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
+ * will be invalid once the directory was converted into a dx directory
*/
-#define hash2pos(major, minor) (major >> 1)
-#define pos2maj_hash(pos) ((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos) (0)
+loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ loff_t ret = -EINVAL;
+ int is_dx_dir = ext4_test_inode_flag(inode, EXT4_INODE_INDEX);
+
+ mutex_lock(&inode->i_mutex);
+
+ /* NOTE: relative offsets with dx directories might not work
+ * as expected, as it is difficult to figure out the
+ * correct offset between dx hashes */
+
+ switch (origin) {
+ case SEEK_END:
+ if (unlikely(offset > 0))
+ goto out_err; /* not supported for directories */
+
+ /* so only negative offsets are left, does that have a
+ * meaning for directories at all? */
+ if (is_dx_dir)
+ offset += ext4_get_htree_eof(file);
+ else
+ offset += inode->i_size;
+ break;
+ case SEEK_CUR:
+ /*
+ * Here we special-case the lseek(fd, 0, SEEK_CUR)
+ * position-querying operation. Avoid rewriting the "same"
+ * f_pos value back to the file because a concurrent read(),
+ * write() or lseek() might have altered it
+ */
+ if (offset == 0) {
+ offset = file->f_pos;
+ goto out_ok;
+ }
+
+ offset += file->f_pos;
+ break;
+ }
+
+ if (unlikely(offset < 0))
+ goto out_err;
+
+ if (!is_dx_dir) {
+ if (offset > inode->i_sb->s_maxbytes)
+ goto out_err;
+ } else if (offset > ext4_get_htree_eof(file))
+ goto out_err;
+
+ /* Special lock needed here? */
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_version = 0;
+ }
+
+out_ok:
+ ret = offset;
+out_err:
+ mutex_unlock(&inode->i_mutex);
+
+ return ret;
+}

/*
* This structure holds the nodes of the red-black tree used to store
@@ -330,15 +426,16 @@ static void free_rb_tree_fname(struct rb_root *root)
}


-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
+ loff_t pos)
{
struct dir_private_info *p;

p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
if (!p)
return NULL;
- p->curr_hash = pos2maj_hash(pos);
- p->curr_minor_hash = pos2min_hash(pos);
+ p->curr_hash = pos2maj_hash(filp, pos);
+ p->curr_minor_hash = pos2min_hash(filp, pos);
return p;
}

@@ -429,7 +526,7 @@ static int call_filldir(struct file *filp, void *dirent,
"null fname?!?\n");
return 0;
}
- curr_pos = hash2pos(fname->hash, fname->minor_hash);
+ curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
while (fname) {
error = filldir(dirent, fname->name,
fname->name_len, curr_pos,
@@ -454,13 +551,13 @@ static int ext4_dx_readdir(struct file *filp,
int ret;

if (!info) {
- info = ext4_htree_create_dir_info(filp->f_pos);
+ info = ext4_htree_create_dir_info(filp, filp->f_pos);
if (!info)
return -ENOMEM;
filp->private_data = info;
}

- if (filp->f_pos == EXT4_HTREE_EOF)
+ if (filp->f_pos == ext4_get_htree_eof(filp))
return 0; /* EOF */

/* Some one has messed with f_pos; reset the world */
@@ -468,8 +565,8 @@ static int ext4_dx_readdir(struct file *filp,
free_rb_tree_fname(&info->root);
info->curr_node = NULL;
info->extra_fname = NULL;
- info->curr_hash = pos2maj_hash(filp->f_pos);
- info->curr_minor_hash = pos2min_hash(filp->f_pos);
+ info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+ info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
}

/*
@@ -501,7 +598,7 @@ static int ext4_dx_readdir(struct file *filp,
if (ret < 0)
return ret;
if (ret == 0) {
- filp->f_pos = EXT4_HTREE_EOF;
+ filp->f_pos = ext4_get_htree_eof(filp);
break;
}
info->curr_node = rb_first(&info->root);
@@ -521,7 +618,7 @@ static int ext4_dx_readdir(struct file *filp,
info->curr_minor_hash = fname->minor_hash;
} else {
if (info->next_hash == ~0) {
- filp->f_pos = EXT4_HTREE_EOF;
+ filp->f_pos = ext4_get_htree_eof(filp);
break;
}
info->curr_hash = info->next_hash;
@@ -540,3 +637,15 @@ static int ext4_release_dir(struct inode *inode, struct file *filp)

return 0;
}
+
+const struct file_operations ext4_dir_operations = {
+ .llseek = ext4_dir_llseek,
+ .read = generic_read_dir,
+ .readdir = ext4_readdir,
+ .unlocked_ioctl = ext4_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ext4_compat_ioctl,
+#endif
+ .fsync = ext4_sync_file,
+ .release = ext4_release_dir,
+};
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e717dfd..31d9ba0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1560,7 +1560,11 @@ struct dx_hash_info
u32 *seed;
};

-#define EXT4_HTREE_EOF 0x7fffffff
+
+/* 32 and 64 bit signed EOF for dx directories */
+#define EXT4_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
+#define EXT4_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)
+

/*
* Control parameters used by ext4_htree_next_block
diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index ac8f168..fa8e491 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
return -1;
}
hash = hash & ~1;
- if (hash == (EXT4_HTREE_EOF << 1))
- hash = (EXT4_HTREE_EOF-1) << 1;
+ if (hash == (EXT4_HTREE_EOF_32BIT << 1))
+ hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
hinfo->hash = hash;
hinfo->minor_hash = minor_hash;
return 0;


2011-08-08 15:38:08

by Bernd Schubert

Subject: [PATCH 3/4] RFC: Remove check for a 32-bit cookie in nfsd4_readdir()

Fan Yong <[email protected]> noticed that setting
FMODE_32BITHASH wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32-bit cookies. However, according to RFC 3530,
cookies have a 64-bit type, and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.
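
For illustration only, the check that remains after this change can be
modelled in userspace roughly as follows. sketch_bad_cookie() and
SKETCH_VERIFIER_SIZE are hypothetical stand-ins for the kernel code and for
NFS4_VERIFIER_SIZE; the value 8 is an assumption made for this sketch.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_VERIFIER_SIZE 8	/* assumed size, stand-in for NFS4_VERIFIER_SIZE */

/*
 * Returns 1 if the cookie would be rejected, 0 otherwise. Cookies 1 and 2
 * are rejected, as is a zero cookie paired with a non-zero verifier.
 * Cookies above 32 bits are no longer rejected.
 */
static int sketch_bad_cookie(uint64_t cookie, const unsigned char *verf)
{
	static const unsigned char zeroverf[SKETCH_VERIFIER_SIZE];

	if (cookie == 1 || cookie == 2)
		return 1;
	if (cookie == 0 && memcmp(verf, zeroverf, SKETCH_VERIFIER_SIZE) != 0)
		return 1;
	return 0;
}
```

With the old (cookie > ~(u32)0) test in place, any 64-bit hash offset with
bits set above bit 31 would have been answered with a bad-cookie error.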

Signed-off-by: Bernd Schubert <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e807776..9bf0a66 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -691,7 +691,7 @@ nfsd4_readdir(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
readdir->rd_bmval[1] &= nfsd_suppattrs1(cstate->minorversion);
readdir->rd_bmval[2] &= nfsd_suppattrs2(cstate->minorversion);

- if ((cookie > ~(u32)0) || (cookie == 1) || (cookie == 2) ||
+ if ((cookie == 1) || (cookie == 2) ||
(cookie == 0 && memcmp(readdir->rd_verf.data, zeroverf.data, NFS4_VERIFIER_SIZE)))
return nfserr_bad_cookie;

2011-08-08 15:38:13

by Bernd Schubert

Subject: [PATCH 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)

Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.

NOTE: This patch got rather complex because Christoph asked to set the
filp->f_mode flag in the open call or immediately after dentry_open()
in nfsd_open() to avoid races.
Personally I still do not see a reason for that, and in my opinion the
FMODE_32BITHASH/FMODE_64BITHASH flags could be set in nfsd_readdir(), as it
follows directly after nfsd_open() without a chance of races.
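
The selection logic itself is small. A userspace sketch of it, with
hypothetical SKETCH_* constants and sketch_* helpers standing in for the
kernel definitions (the values of the hash and cookie flags are the ones
used in this series, the read flag value is an assumption), might look
like this:

```c
/* Stand-ins for the kernel constants; names are hypothetical. */
#define SKETCH_MAY_READ			4	/* assumed value */
#define SKETCH_MAY_64BIT_COOKIE		2048
#define SKETCH_FMODE_32BITHASH		0x200u
#define SKETCH_FMODE_64BITHASH		0x400u

/* Choose the open flags for a readdir request of the given NFS version. */
static int sketch_readdir_may_flags(unsigned int nfs_vers)
{
	int flags = SKETCH_MAY_READ;

	/* NFSv2 only supports 32-bit cookies */
	if (nfs_vers > 2)
		flags |= SKETCH_MAY_64BIT_COOKIE;

	return flags;
}

/* Translate the open flags into the f_mode bit set right after the open. */
static unsigned int sketch_hash_fmode(int may_flags)
{
	if (may_flags & SKETCH_MAY_64BIT_COOKIE)
		return SKETCH_FMODE_64BITHASH;
	return SKETCH_FMODE_32BITHASH;
}
```

Exactly one of the two hash bits is set for every file opened this way, so
the file system never has to fall back to guessing from the calling task.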


Signed-off-by: Bernd Schubert <[email protected]>
---
fs/nfsd/vfs.c | 33 +++++++++++++++++++++++----------
fs/nfsd/vfs.h | 2 ++
2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fd0acca..4bb517f 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -708,12 +708,13 @@ static int nfsd_open_break_lease(struct inode *inode, int access)

/*
* Open an existing file or directory.
- * The access argument indicates the type of open (read/write/lock)
+ * The may_flags argument indicates the type of open (read/write/lock)
+ * and additional flags.
* N.B. After this call fhp needs an fh_put
*/
__be32
nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
- int access, struct file **filp)
+ int may_flags, struct file **filp)
{
struct dentry *dentry;
struct inode *inode;
@@ -728,7 +729,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
* and (hopefully) checked permission - so allow OWNER_OVERRIDE
* in case a chmod has now revoked permission.
*/
- err = fh_verify(rqstp, fhp, type, access | NFSD_MAY_OWNER_OVERRIDE);
+ err = fh_verify(rqstp, fhp, type, may_flags | NFSD_MAY_OWNER_OVERRIDE);
if (err)
goto out;

@@ -739,7 +740,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
* or any access when mandatory locking enabled
*/
err = nfserr_perm;
- if (IS_APPEND(inode) && (access & NFSD_MAY_WRITE))
+ if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE))
goto out;
/*
* We must ignore files (but only files) which might have mandatory
@@ -752,12 +753,12 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
if (!inode->i_fop)
goto out;

- host_err = nfsd_open_break_lease(inode, access);
+ host_err = nfsd_open_break_lease(inode, may_flags);
if (host_err) /* NOMEM or WOULDBLOCK */
goto out_nfserr;

- if (access & NFSD_MAY_WRITE) {
- if (access & NFSD_MAY_READ)
+ if (may_flags & NFSD_MAY_WRITE) {
+ if (may_flags & NFSD_MAY_READ)
flags = O_RDWR|O_LARGEFILE;
else
flags = O_WRONLY|O_LARGEFILE;
@@ -766,8 +767,15 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
flags, current_cred());
if (IS_ERR(*filp))
host_err = PTR_ERR(*filp);
- else
- host_err = ima_file_check(*filp, access);
+ else {
+ host_err = ima_file_check(*filp, may_flags);
+
+ if (may_flags & NFSD_MAY_64BIT_COOKIE)
+ (*filp)->f_mode |= FMODE_64BITHASH;
+ else
+ (*filp)->f_mode |= FMODE_32BITHASH;
+ }
+
out_nfserr:
err = nfserrno(host_err);
out:
@@ -1989,8 +1997,13 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
__be32 err;
struct file *file;
loff_t offset = *offsetp;
+ int flags = NFSD_MAY_READ;
+
+ /* NFSv2 only supports 32 bit cookies */
+ if (rqstp->rq_vers > 2)
+ flags |= NFSD_MAY_64BIT_COOKIE;

- err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
+ err = nfsd_open(rqstp, fhp, S_IFDIR, flags, &file);
if (err)
goto out;

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index e0bbac0..ecd00e1 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -26,6 +26,8 @@
#define NFSD_MAY_NOT_BREAK_LEASE 512
#define NFSD_MAY_BYPASS_GSS 1024

+#define NFSD_MAY_64BIT_COOKIE 2048 /* 64 bit readdir cookies for >= NFSv3 */
+
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)

2011-08-08 15:47:21

by Bernd Schubert

Subject: Re: [PATCH 0/4] 32/64 bit llseek hashes v2

Oh sorry, I forgot to set the correct subject line in my stg command line :(

On 08/08/2011 05:37 PM, Bernd Schubert wrote:
> With the ext3/ext4 directory index implementation hashes are used to specify
> offsets for llseek(). For compatibility with NFSv2 and 32-bit user space
> on 64-bit systems (kernel space) ext3/ext4 currently only return 32-bit
> hashes and therefore the probability of hash collisions for larger directories
> is rather high. As recently reported on the NFS mailing list that theoretical
> problem also happens on real systems:
> http://comments.gmane.org/gmane.linux.nfs/40863
>
> The following series adds two new f_mode flags to tell ext4
> to use 32-bit or 64-bit hash values for llseek() calls.
> These flags can then be used by network file systems, such as NFS, to
> request 32-bit or 64-bit offsets (hashes).
>
> Version 2:
> - use f_mode instead of O_* flags and also in a separate patch
> - introduce EXT4_HTREE_EOF_32BIT and EXT4_HTREE_EOF_64BIT
> - fix SEEK_END in ext4_dir_llseek()
> - set f_mode flags in NFS code as early as possible and introduce a new
> NFSD_MAY_64BIT_COOKIE flag for that
>
> --
> Bernd Schubert
> Fraunhofer ITWM
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-08-09 17:31:50

by J. Bruce Fields

Subject: Re: [PATCH 3/4] RFC: Remove check for a 32-bit cookie in nfsd4_readdir()

On Mon, Aug 08, 2011 at 05:38:08PM +0200, Bernd Schubert wrote:
> Fan Yong <[email protected]> noticed setting
> FMODE_32BITHASH wouldn't work with nfsd v4, as
> nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
> cookies have a 64 bit type and cookies are also defined as u64 in
> 'struct nfsd4_readdir'. So remove the test for >32-bit values.

Wow, thanks, I wonder where that check came from. Looks like it was
there since the very first nfsv4 commit.

Applying.

--b.

>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index e807776..9bf0a66 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -691,7 +691,7 @@ nfsd4_readdir(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> readdir->rd_bmval[1] &= nfsd_suppattrs1(cstate->minorversion);
> readdir->rd_bmval[2] &= nfsd_suppattrs2(cstate->minorversion);
>
> - if ((cookie > ~(u32)0) || (cookie == 1) || (cookie == 2) ||
> + if ((cookie == 1) || (cookie == 2) ||
> (cookie == 0 && memcmp(readdir->rd_verf.data, zeroverf.data, NFS4_VERIFIER_SIZE)))
> return nfserr_bad_cookie;
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-08-09 17:33:46

by J. Bruce Fields

Subject: Re: [PATCH 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)

On Mon, Aug 08, 2011 at 05:38:13PM +0200, Bernd Schubert wrote:
> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
> the NFS version. NFSv2 gets 32-bit hashes only.
>
> NOTE: This patch got rather complex as Christoph asked to set the
> filp->f_mode flag in the open call or immediately after dentry_open()
> in nfsd_open() to avoid races.
> Personally I still do not see a reason for that and in my opinion
> FMODE_32BITHASH/FMODE_64BITHASH flags could be set in nfsd_readdir(), as it
> follows directly after nfsd_open() without a chance of races.

The bulk of the patch seems to be just an access->may_flags rename.
Could you please split that into a separate patch?

--b.

>
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/nfsd/vfs.c | 33 +++++++++++++++++++++++----------
> fs/nfsd/vfs.h | 2 ++
> 2 files changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index fd0acca..4bb517f 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -708,12 +708,13 @@ static int nfsd_open_break_lease(struct inode *inode, int access)
>
> /*
> * Open an existing file or directory.
> - * The access argument indicates the type of open (read/write/lock)
> + * The may_flags argument indicates the type of open (read/write/lock)
> + * and additional flags.
> * N.B. After this call fhp needs an fh_put
> */
> __be32
> nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> - int access, struct file **filp)
> + int may_flags, struct file **filp)
> {
> struct dentry *dentry;
> struct inode *inode;
> @@ -728,7 +729,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> * and (hopefully) checked permission - so allow OWNER_OVERRIDE
> * in case a chmod has now revoked permission.
> */
> - err = fh_verify(rqstp, fhp, type, access | NFSD_MAY_OWNER_OVERRIDE);
> + err = fh_verify(rqstp, fhp, type, may_flags | NFSD_MAY_OWNER_OVERRIDE);
> if (err)
> goto out;
>
> @@ -739,7 +740,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> * or any access when mandatory locking enabled
> */
> err = nfserr_perm;
> - if (IS_APPEND(inode) && (access & NFSD_MAY_WRITE))
> + if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE))
> goto out;
> /*
> * We must ignore files (but only files) which might have mandatory
> @@ -752,12 +753,12 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> if (!inode->i_fop)
> goto out;
>
> - host_err = nfsd_open_break_lease(inode, access);
> + host_err = nfsd_open_break_lease(inode, may_flags);
> if (host_err) /* NOMEM or WOULDBLOCK */
> goto out_nfserr;
>
> - if (access & NFSD_MAY_WRITE) {
> - if (access & NFSD_MAY_READ)
> + if (may_flags & NFSD_MAY_WRITE) {
> + if (may_flags & NFSD_MAY_READ)
> flags = O_RDWR|O_LARGEFILE;
> else
> flags = O_WRONLY|O_LARGEFILE;
> @@ -766,8 +767,15 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> flags, current_cred());
> if (IS_ERR(*filp))
> host_err = PTR_ERR(*filp);
> - else
> - host_err = ima_file_check(*filp, access);
> + else {
> + host_err = ima_file_check(*filp, may_flags);
> +
> + if (may_flags & NFSD_MAY_64BIT_COOKIE)
> + (*filp)->f_mode |= FMODE_64BITHASH;
> + else
> + (*filp)->f_mode |= FMODE_32BITHASH;
> + }
> +
> out_nfserr:
> err = nfserrno(host_err);
> out:
> @@ -1989,8 +1997,13 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
> __be32 err;
> struct file *file;
> loff_t offset = *offsetp;
> + int flags = NFSD_MAY_READ;
> +
> + /* NFSv2 only supports 32 bit cookies */
> + if (rqstp->rq_vers > 2)
> + flags |= NFSD_MAY_64BIT_COOKIE;
>
> - err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
> + err = nfsd_open(rqstp, fhp, S_IFDIR, flags, &file);
> if (err)
> goto out;
>
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index e0bbac0..ecd00e1 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -26,6 +26,8 @@
> #define NFSD_MAY_NOT_BREAK_LEASE 512
> #define NFSD_MAY_BYPASS_GSS 1024
>
> +#define NFSD_MAY_64BIT_COOKIE 2048 /* 64 bit readdir cookies for >= NFSv3 */
> +
> #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
>
>

2011-08-09 17:39:38

by Boaz Harrosh

Subject: Re: [PATCH 3/4] RFC: Remove check for a 32-bit cookie in nfsd4_readdir()

On 08/09/2011 10:31 AM, J. Bruce Fields wrote:
> On Mon, Aug 08, 2011 at 05:38:08PM +0200, Bernd Schubert wrote:
>> Fan Yong <[email protected]> noticed setting
>> FMODE_32BITHASH wouldn't work with nfsd v4, as
>> nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
>> cookies have a 64 bit type and cookies are also defined as u64 in
>> 'struct nfsd4_readdir'. So remove the test for >32-bit values.
>
> Wow, thanks, I wonder where that check came from. Looks like it was
> there since the very first nfsv4 commit.
>

Even for the "very first nfsv4 commit" it sounds like a stupid bug.
Probably a copy/paste from V2 code. What about V3, does V3 code have
this bug?

Boaz

> Applying.
>
> --b.
>
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>> fs/nfsd/nfs4proc.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index e807776..9bf0a66 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -691,7 +691,7 @@ nfsd4_readdir(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> readdir->rd_bmval[1] &= nfsd_suppattrs1(cstate->minorversion);
>> readdir->rd_bmval[2] &= nfsd_suppattrs2(cstate->minorversion);
>>
>> - if ((cookie > ~(u32)0) || (cookie == 1) || (cookie == 2) ||
>> + if ((cookie == 1) || (cookie == 2) ||
>> (cookie == 0 && memcmp(readdir->rd_verf.data, zeroverf.data, NFS4_VERIFIER_SIZE)))
>> return nfserr_bad_cookie;
>>
>>


2011-08-09 18:05:09

by J. Bruce Fields

Subject: Re: [PATCH 3/4] RFC: Remove check for a 32-bit cookie in nfsd4_readdir()

On Tue, Aug 09, 2011 at 10:39:04AM -0700, Boaz Harrosh wrote:
> On 08/09/2011 10:31 AM, J. Bruce Fields wrote:
> > On Mon, Aug 08, 2011 at 05:38:08PM +0200, Bernd Schubert wrote:
> >> Fan Yong <[email protected]> noticed setting
> >> FMODE_32BITHASH wouldn't work with nfsd v4, as
> >> nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
> >> cookies have a 64 bit type and cookies are also defined as u64 in
> >> 'struct nfsd4_readdir'. So remove the test for >32-bit values.
> >
> > Wow, thanks, I wonder where that check came from. Looks like it was
> > there since the very first nfsv4 commit.
> >
>
> Even for the "very first nfsv4 commit" it sounds like a stupid bug.
> Probably a copy/paste from V2 code. What about V3, does V3 code have
> this bug?

I didn't read it carefully, but on a quick skim I don't see any similar
checks.

Not only that, but I don't see them in any historical versions either.

So somebody went out of their way to do that check. Hm.

--b.

>
> Boaz
>
> > Applying.
> >
> > --b.
> >
> >>
> >> Signed-off-by: Bernd Schubert <[email protected]>
> >> ---
> >> fs/nfsd/nfs4proc.c | 2 +-
> >> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index e807776..9bf0a66 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -691,7 +691,7 @@ nfsd4_readdir(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >> readdir->rd_bmval[1] &= nfsd_suppattrs1(cstate->minorversion);
> >> readdir->rd_bmval[2] &= nfsd_suppattrs2(cstate->minorversion);
> >>
> >> - if ((cookie > ~(u32)0) || (cookie == 1) || (cookie == 2) ||
> >> + if ((cookie == 1) || (cookie == 2) ||
> >> (cookie == 0 && memcmp(readdir->rd_verf.data, zeroverf.data, NFS4_VERIFIER_SIZE)))
> >> return nfserr_bad_cookie;
> >>
> >>
>

2011-08-10 19:13:09

by Bernd Schubert

Subject: Re: [PATCH 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)

On 08/09/2011 07:33 PM, J. Bruce Fields wrote:
> On Mon, Aug 08, 2011 at 05:38:13PM +0200, Bernd Schubert wrote:
>> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
>> the NFS version. NFSv2 gets 32-bit hashes only.
>>
>> NOTE: This patch got rather complex as Christoph asked to set the
>> filp->f_mode flag in the open call or immediately after dentry_open()
>> in nfsd_open() to avoid races.
>> Personally I still do not see a reason for that and in my opinion
>> FMODE_32BITHASH/FMODE_64BITHASH flags could be set in nfsd_readdir(), as it
>> follows directly after nfsd_open() without a chance of races.
>
> The bulk of the patch seems to be just an access->may_flags rename.
> Could you please split that into a separate patch?

Ok, shall I resend the entire patch series, but without the
32-bit nfsd4_readdir() cookie patch? Or just this patch, split into
two parts?


Thanks,
Bernd

2011-08-10 19:35:11

by J. Bruce Fields

Subject: Re: [PATCH 4/4] nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)

On Wed, Aug 10, 2011 at 09:13:09PM +0200, Bernd Schubert wrote:
> On 08/09/2011 07:33 PM, J. Bruce Fields wrote:
> >On Mon, Aug 08, 2011 at 05:38:13PM +0200, Bernd Schubert wrote:
> >>Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
> >>the NFS version. NFSv2 gets 32-bit hashes only.
> >>
> >>NOTE: This patch got rather complex as Christoph asked to set the
> >>filp->f_mode flag in the open call or immediately after dentry_open()
> >>in nfsd_open() to avoid races.
> >>Personally I still do not see a reason for that and in my opinion
> >>FMODE_32BITHASH/FMODE_64BITHASH flags could be set in nfsd_readdir(), as it
> >>follows directly after nfsd_open() without a chance of races.
> >
> >The bulk of the patch seems to be just an access->may_flags rename.
> >Could you please split that into a separate patch?
>
> Ok, shall I resend the entire patch series, but already remove the
> 32-bit nfsd_readdir() cookie patch? Or only just this patch split
> into two parts?

Probably best to resend. Who's going to take these patches?

(Looked like it would probably make the most sense for an ext4 tree, as
that looked like the trickiest part? But I'll take the nfsd4 fix.)

--b.