With the ext3/ext4 directory index implementation hashes are used to specify
offsets for llseek(). For compatibility with NFSv2 and 32-bit user space
on 64-bit systems (kernel space) ext3/ext4 currently only return 32-bit
hashes and therefore the probability of hash collisions for larger directories
is rather high. As recently reported on the NFS mailing list that theoretical
problem also happens on real systems:
http://comments.gmane.org/gmane.linux.nfs/40863
The following series adds two new open flags to tell ext4
to to 32-bit or 64-bit hash values for llseek() calls.
These flags are then used by NFS to use 32-bit (NFSv2) or 64-bit
offsets (hashes in case of ext3/ext4) for readdir and seekdir.
User space does not need to specify these flags, but usually the check
for is_32bit_api() should be sufficient.
---
Bernd Schubert (2):
Remove check for a 32-bit cookie in nfsd4_readdir()
nfsd: vfs_llseek() with O_32BITHASH or O_64BITHASH
Fan Yong (1):
Return 32/64-bit dir name hash according to usage type
fs/ext4/dir.c | 160 ++++++++++++++++++++++++++++++++++---------
fs/fcntl.c | 5 +
fs/nfsd/nfs4proc.c | 2 -
fs/nfsd/vfs.c | 6 ++
include/asm-generic/fcntl.h | 9 ++
5 files changed, 145 insertions(+), 37 deletions(-)
--
Bernd Schubert
Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/nfsd/vfs.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fd0acca..d79bbcd 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1994,6 +1994,12 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
if (err)
goto out;
+ /* NFSv2 only supports 32 bit cookies */
+ if (rqstp->rq_vers > 2)
+ file->f_flags &= O_64BITHASH;
+ else
+ file->f_flags &= O_32BITHASH;
+
offset = vfs_llseek(file, offset, 0);
if (offset < 0) {
err = nfserrno((int)offset);
On Wed, Jul 27, 2011 at 01:02:59PM +0200, Bernd Schubert wrote:
> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
> the NFS version. NFSv2 gets 32-bit hashes only.
Independent of the O_ vs FMODE thing make sure you pass the correct
flag at open time, instead of racy runtime modifications.
On 07/27/2011 11:03 PM, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 01:02:59PM +0200, Bernd Schubert wrote:
>> Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
>> the NFS version. NFSv2 gets 32-bit hashes only.
>
> Independent of the O_ vs FMODE thing make sure you pass the correct
> flag at open time, instead of racy runtime modifications.
>
Christoph,
before I'm going to work further on the patch sets, I have few questions
first. Could you please help me with that?
file->f_mode is set in __dentry_open() based on O_ flags.
So if f_mode is supposed to be set directly during the NFS open call we
would need O_ *and* FMODE flags,
Now I still do not understand why we cannot add a flag *after* the open
call and only in nfsd_readdir()? I do not see any races there.
nfsd_readdir() gets its 'struct file' from get_empty_filp() and
__dentry_open(). Now as 'struct file' is allocated by get_empty_filp()
it also cannot be used by any other thread.
nfsd_readdir() just reads the directory and closes it immedeatily after
the readdir().
So where is there supposed to be a race?
And lastly, if we are going to set f_mode directly at open time, we have
the choice to
1) Specify those new O_ flags for all files. While setting a flag is a
cheap operation, it still wastes CPU cycles for file opens, as files do
not need that flag.
2) Duplicate nfsd_open() to implement a new function for directories
only. I think not a good idea either.
3) Rewrite the nfsd_open() function to allow to set flags from calling
functions. That would mean to update the nfsd code at at least 8 places.
Do we really want to go that way?
So altogether, updating the patches to replace O_ by FMODE flags is
easy, but setting that flag in nfsd at open time, would mean a large
overhead.
Thanks,
Bernd
On Thu, Jul 28, 2011 at 11:19:07AM +0200, Bernd Schubert wrote:
> before I'm going to work further on the patch sets, I have few
> questions first. Could you please help me with that?
>
> file->f_mode is set in __dentry_open() based on O_ flags.
Or right after dentry_open, before anyone can possibly grab a reference
to the file. Just do that in nfsd_open based on an NFSD_MAY_ flag.