This adds a new ioctl, EXT3_IOC_32BITHASH, which allows a
userspace application to request 32-bit rather than 64-bit
hashes from readdir on an indexed / dx / htree directory.
Gluster had been relying on the top bits of the d_off being
free; there are some reports that filling all 64 bits breaks
Samba as well. The infrastructure to return 32-bit hashes
already exists; NFS can turn it on, and it's turned on for
32-bit processes as well. So it's just a matter of flipping
on the f_mode flag before readdir starts.
Care needs to be taken that we don't change the FMODE flag
after readdir has been started, so we make sure that
filp->private_data has not yet been set before we set the flag
(Thanks Zach!).
Pre-submission-fixes-by: Zach Brown <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
---
diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index 87eccbb..83df29f 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -47,7 +47,7 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
*
* Return 1 if it is a dx dir, 0 if not
*/
-static int is_dx_dir(struct inode *inode)
+int is_dx_dir(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
index e85ff15..b88cf19 100644
--- a/fs/ext3/ext3.h
+++ b/fs/ext3/ext3.h
@@ -220,6 +220,7 @@ struct ext3_new_group_data {
#endif
#define EXT3_IOC_GETRSVSZ _IOR('f', 5, long)
#define EXT3_IOC_SETRSVSZ _IOW('f', 6, long)
+#define EXT3_IOC_32BITHASH _IOW('f', 13, long)
/*
* ioctl commands in 32 bit emulation
@@ -1010,6 +1011,7 @@ extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_wind
extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
/* dir.c */
+extern int is_dx_dir(struct inode *inode);
extern int ext3_check_dir_entry(const char *, struct inode *,
struct ext3_dir_entry_2 *,
struct buffer_head *, unsigned long);
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 4d96e9a..aab217d 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -251,6 +251,41 @@ group_add_out:
mnt_drop_write_file(filp);
return err;
}
+ case EXT3_IOC_32BITHASH: {
+ __u32 hash32bits;
+ int err = 0;
+
+ if (get_user(hash32bits, (int __user *) arg))
+ return -EFAULT;
+
+ /* Serialize with readdir */
+ if ((err = mutex_lock_killable(&inode->i_mutex)))
+ return err;
+
+ /* protect f_mode */
+ spin_lock(&filp->f_lock);
+
+ /* Only valid for htree directories */
+ if (!S_ISDIR(inode->i_mode) || !is_dx_dir(inode)) {
+ err = -EINVAL;
+ goto out_32bithash;
+ }
+
+ /* Have we already started readir on this dx dir? */
+ if (filp->private_data) {
+ err = -EINVAL;
+ goto out_32bithash;
+ }
+
+ if (hash32bits)
+ filp->f_mode |= FMODE_32BITHASH;
+ else
+ filp->f_mode &= ~FMODE_32BITHASH;
+out_32bithash:
+ spin_unlock(&filp->f_lock);
+ mutex_unlock(&inode->i_mutex);
+ return err;
+ }
case FITRIM: {
struct super_block *sb = inode->i_sb;
This adds a new ioctl, EXT3_IOC_32BITHASH, which allows a
userspace application to request 32-bit rather than 64-bit
hashes from readdir on an indexed / dx / htree directory.
Gluster had been relying on the top bits of the d_off being
free; there are some reports that filling all 64 bits breaks
Samba as well. The infrastructure to return 32-bit hashes
already exists; NFS can turn it on, and it's turned on for
32-bit processes as well. So it's just a matter of flipping
on the f_mode flag before readdir starts.
Care needs to be taken that we don't change the FMODE flag
after readdir has been started, so we make sure that
filp->private_data has not yet been set before we set the flag.
(Thanks Zach!).
Pre-submission-fixes-by: Zach Brown <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
---
V2:
fix "readir" typo
rename goto target to *_out like others
remove parameter; we can't really ever turn this back off once it's used.
closing and reopening is the only way to get back to 64 bit hashes.
diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index 87eccbb..83df29f 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -47,7 +47,7 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
*
* Return 1 if it is a dx dir, 0 if not
*/
-static int is_dx_dir(struct inode *inode)
+int is_dx_dir(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
index e85ff15..f3018f4 100644
--- a/fs/ext3/ext3.h
+++ b/fs/ext3/ext3.h
@@ -220,6 +220,7 @@ struct ext3_new_group_data {
#endif
#define EXT3_IOC_GETRSVSZ _IOR('f', 5, long)
#define EXT3_IOC_SETRSVSZ _IOW('f', 6, long)
+#define EXT3_IOC_32BITHASH _IO('f', 13)
/*
* ioctl commands in 32 bit emulation
@@ -1010,6 +1011,7 @@ extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_wind
extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
/* dir.c */
+extern int is_dx_dir(struct inode *inode);
extern int ext3_check_dir_entry(const char *, struct inode *,
struct ext3_dir_entry_2 *,
struct buffer_head *, unsigned long);
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 4d96e9a..3811664 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -251,6 +251,34 @@ group_add_out:
mnt_drop_write_file(filp);
return err;
}
+ case EXT3_IOC_32BITHASH: {
+ int err = 0;
+
+ /* Serialize with readdir */
+ if ((err = mutex_lock_killable(&inode->i_mutex)))
+ return err;
+
+ /* protect f_mode */
+ spin_lock(&filp->f_lock);
+
+ /* Only valid for htree directories */
+ if (!S_ISDIR(inode->i_mode) || !is_dx_dir(inode)) {
+ err = -EINVAL;
+ goto hash32bits_out;
+ }
+
+ /* Have we already started readdir on this dx dir? */
+ if (filp->private_data) {
+ err = -EINVAL;
+ goto hash32bits_out;
+ }
+
+ filp->f_mode |= FMODE_32BITHASH;
+hash32bits_out:
+ spin_unlock(&filp->f_lock);
+ mutex_unlock(&inode->i_mutex);
+ return err;
+ }
case FITRIM: {
struct super_block *sb = inode->i_sb;
On Thu 28-03-13 15:40:20, Eric Sandeen wrote:
> This adds a new ioctl, EXT3_IOC_32BITHASH, which allows a
> userspace application to request 32-bit rather than 64-bit
> hashes from readdir on an indexed / dx / htree directory.
>
> Gluster had been relying on the top bits of the d_off being
> free; there are some reports that filling all 64 bits breaks
> Samba as well. The infrastructure to return 32-bit hashes
> already exists; NFS can turn it on, and it's turned on for
> 32-bit processes as well. So it's just a matter of flipping
> on the f_mode flag before readdir starts.
>
> Care needs to be taken that we don't change the FMODE flag
> after readdir has been started, so we make sure that
> filp->private_data has not yet been set before we set the flag.
> (Thanks Zach!).
>
> Pre-submission-fixes-by: Zach Brown <[email protected]>
> Signed-off-by: Eric Sandeen <[email protected]>
> ---
>
> V2:
> fix "readir" typo
> rename goto target to *_out like others
> remove parameter; we can't really ever turn this back off once it's used.
> closing and reopening is the only way to get back to 64 bit hashes.
Looks good. Just one nit below:
> diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
> + case EXT3_IOC_32BITHASH: {
> + int err = 0;
> +
> + /* Serialize with readdir */
> + if ((err = mutex_lock_killable(&inode->i_mutex)))
> + return err;
> +
> + /* protect f_mode */
> + spin_lock(&filp->f_lock);
> +
> + /* Only valid for htree directories */
> + if (!S_ISDIR(inode->i_mode) || !is_dx_dir(inode)) {
Won't it be better to return ENOTDIR in !S_ISDIR case?
> + err = -EINVAL;
> + goto hash32bits_out;
> + }
> +
> + /* Have we already started readdir on this dx dir? */
> + if (filp->private_data) {
> + err = -EINVAL;
And here maybe EBUSY, but in this case I'm really undecided.
> + goto hash32bits_out;
> + }
> +
> + filp->f_mode |= FMODE_32BITHASH;
> +hash32bits_out:
> + spin_unlock(&filp->f_lock);
> + mutex_unlock(&inode->i_mutex);
> + return err;
> + }
> case FITRIM: {
>
> struct super_block *sb = inode->i_sb;
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 3/29/13 6:43 AM, Jan Kara wrote:
> On Thu 28-03-13 15:40:20, Eric Sandeen wrote:
>> This adds a new ioctl, EXT3_IOC_32BITHASH, which allows a
>> userspace application to request 32-bit rather than 64-bit
>> hashes from readdir on an indexed / dx / htree directory.
>>
>> Gluster had been relying on the top bits of the d_off being
>> free; there are some reports that filling all 64 bits breaks
>> Samba as well. The infrastructure to return 32-bit hashes
>> already exists; NFS can turn it on, and it's turned on for
>> 32-bit processes as well. So it's just a matter of flipping
>> on the f_mode flag before readdir starts.
>>
>> Care needs to be taken that we don't change the FMODE flag
>> after readdir has been started, so we make sure that
>> filp->private_data has not yet been set before we set the flag.
>> (Thanks Zach!).
>>
>> Pre-submission-fixes-by: Zach Brown <[email protected]>
>> Signed-off-by: Eric Sandeen <[email protected]>
>> ---
>>
>> V2:
>> fix "readir" typo
>> rename goto target to *_out like others
>> remove parameter; we can't really ever turn this back off once it's used.
>> closing and reopening is the only way to get back to 64 bit hashes.
> Looks good. Just one nit below:
Oh, of course, duh! I'll send V3, thanks.
-Eric
>> diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
>> + case EXT3_IOC_32BITHASH: {
>> + int err = 0;
>> +
>> + /* Serialize with readdir */
>> + if ((err = mutex_lock_killable(&inode->i_mutex)))
>> + return err;
>> +
>> + /* protect f_mode */
>> + spin_lock(&filp->f_lock);
>> +
>> + /* Only valid for htree directories */
>> + if (!S_ISDIR(inode->i_mode) || !is_dx_dir(inode)) {
> Won't it be better to return ENOTDIR in !S_ISDIR case?
>
>> + err = -EINVAL;
>> + goto hash32bits_out;
>> + }
>> +
>> + /* Have we already started readdir on this dx dir? */
>> + if (filp->private_data) {
>> + err = -EINVAL;
> And here maybe EBUSY, but in this case I'm really undecided.
>
>> + goto hash32bits_out;
>> + }
>> +
>> + filp->f_mode |= FMODE_32BITHASH;
>> +hash32bits_out:
>> + spin_unlock(&filp->f_lock);
>> + mutex_unlock(&inode->i_mutex);
>> + return err;
>> + }
>> case FITRIM: {
>>
>> struct super_block *sb = inode->i_sb;
>
> Honza
>
This adds a new ioctl, EXT3_IOC_32BITHASH, which allows a
userspace application to request 32-bit rather than 64-bit
hashes from readdir on an indexed / dx / htree directory.
Gluster had been relying on the top bits of the d_off being
free; there are some reports that filling all 64 bits breaks
Samba as well. The infrastructure to return 32-bit hashes
already exists; NFS can turn it on, and it's turned on for
32-bit processes as well. So it's just a matter of flipping
on the f_mode flag before readdir starts.
Care needs to be taken that we don't change the FMODE flag
after readdir has been started, so we make sure that
filp->private_data has not yet been set before we set the flag.
(Thanks Zach!).
Pre-submission-fixes-by: Zach Brown <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
---
V2:
fix "readir" typo
rename goto target to *_out like others
remove parameter; we can't really ever turn this back off once it's used.
closing and reopening is the only way to get back to 64 bit hashes.
V3:
return -ENOTDIR if the target is not a directory
diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index 87eccbb..83df29f 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -47,7 +47,7 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
*
* Return 1 if it is a dx dir, 0 if not
*/
-static int is_dx_dir(struct inode *inode)
+int is_dx_dir(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
index e85ff15..f3018f4 100644
--- a/fs/ext3/ext3.h
+++ b/fs/ext3/ext3.h
@@ -220,6 +220,7 @@ struct ext3_new_group_data {
#endif
#define EXT3_IOC_GETRSVSZ _IOR('f', 5, long)
#define EXT3_IOC_SETRSVSZ _IOW('f', 6, long)
+#define EXT3_IOC_32BITHASH _IO('f', 13)
/*
* ioctl commands in 32 bit emulation
@@ -1010,6 +1011,7 @@ extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_wind
extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
/* dir.c */
+extern int is_dx_dir(struct inode *inode);
extern int ext3_check_dir_entry(const char *, struct inode *,
struct ext3_dir_entry_2 *,
struct buffer_head *, unsigned long);
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 4d96e9a..a7d2b0a 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -251,6 +251,39 @@ group_add_out:
mnt_drop_write_file(filp);
return err;
}
+ case EXT3_IOC_32BITHASH: {
+ int err = 0;
+
+ /* Serialize with readdir */
+ if ((err = mutex_lock_killable(&inode->i_mutex)))
+ return err;
+
+ /* protect f_mode */
+ spin_lock(&filp->f_lock);
+
+ /* Only valid for htree directories */
+ if (!S_ISDIR(inode->i_mode)) {
+ err = -ENOTDIR;
+ goto hash32bits_out;
+ }
+
+ if (!is_dx_dir(inode)) {
+ err = -EINVAL;
+ goto hash32bits_out;
+ }
+
+ /* Have we already started readdir on this dx dir? */
+ if (filp->private_data) {
+ err = -EINVAL;
+ goto hash32bits_out;
+ }
+
+ filp->f_mode |= FMODE_32BITHASH;
+hash32bits_out:
+ spin_unlock(&filp->f_lock);
+ mutex_unlock(&inode->i_mutex);
+ return err;
+ }
case FITRIM: {
struct super_block *sb = inode->i_sb;
On Mon, Apr 01, 2013 at 10:33:41AM -0500, Eric Sandeen wrote:
> + /* Have we already started readdir on this dx dir? */
> + if (filp->private_data) {
> + err = -EINVAL;
> + goto hash32bits_out;
> + }
I'm not sure how much we care, but if filp->private_data is non-NULL
and filp->f_pos == 0, then the application must have called
rewinddir(), and at that point it would be fair game for us to free
the rb_tree (see the code in ext4_dx_readdir() just after the comment
"Some one has messed with f_pos; reset the world".)
This would allow a bit more flexibility than just requiring that the
ioctl be issued just after the opendir(), and allow it just after a
call to rewinddir().
This this would invalidate any previously issued telldir() cookies,
but according to SuSv3, telldir() cookies are not guaranteed to be
consistent after a rewinddir() or a closedir(), so it would be
standards compliant to do this.
- Ted
On 4/1/13 1:17 PM, Theodore Ts'o wrote:
> On Mon, Apr 01, 2013 at 10:33:41AM -0500, Eric Sandeen wrote:
>> + /* Have we already started readdir on this dx dir? */
>> + if (filp->private_data) {
>> + err = -EINVAL;
>> + goto hash32bits_out;
>> + }
>
> I'm not sure how much we care, but if filp->private_data is non-NULL
> and filp->f_pos == 0, then the application must have called
> rewinddir(), and at that point it would be fair game for us to free
> the rb_tree (see the code in ext4_dx_readdir() just after the comment
> "Some one has messed with f_pos; reset the world".)
Hm, I originally tested f_pos == 0, and Zach thought that wasn't the right
test, suggested f_private, and that seemed right to me, but yeah, I suppose
!f_pos && f_private means it got rewound.
> This would allow a bit more flexibility than just requiring that the
> ioctl be issued just after the opendir(), and allow it just after a
> call to rewinddir().
I guess I do wonder what real-world use that might have, though.
> This this would invalidate any previously issued telldir() cookies,
> but according to SuSv3, telldir() cookies are not guaranteed to be
> consistent after a rewinddir() or a closedir(), so it would be
> standards compliant to do this.
Well, could do. Think it's worth it?
If so, then we could also allow flipping the flag on and off.
I suppose it's good to get the interface right to start with;
if we really want to be able to flip it on and off after a rewinddir,
then . . . ok, V4? It's more flexible, but I don't know what the
usecase might be.
-Eric
> - Ted
>
On Mon, Apr 01, 2013 at 01:21:51PM -0500, Eric Sandeen wrote:
> > This would allow a bit more flexibility than just requiring that the
> > ioctl be issued just after the opendir(), and allow it just after a
> > call to rewinddir().
>
> I guess I do wonder what real-world use that might have, though.
To be honest, I can't think of one. And if the presumption is this is
just going to be a special case hack, maybe we shouldn't worry about
the general-use case.
Thinking about this some more, keeping this simple might be better way
to go. It's not like we really want to be encouraging people to use
this interface....
What do you think?
- Ted
On 4/1/13 2:08 PM, Theodore Ts'o wrote:
> On Mon, Apr 01, 2013 at 01:21:51PM -0500, Eric Sandeen wrote:
>>> This would allow a bit more flexibility than just requiring that the
>>> ioctl be issued just after the opendir(), and allow it just after a
>>> call to rewinddir().
>>
>> I guess I do wonder what real-world use that might have, though.
>
> To be honest, I can't think of one. And if the presumption is this is
> just going to be a special case hack, maybe we shouldn't worry about
> the general-use case.
>
> Thinking about this some more, keeping this simple might be better way
> to go. It's not like we really want to be encouraging people to use
> this interface....
>
> What do you think?
Urgh, I guess if we are adding an interface which will live "forever,"
we may as well make it full featured & flexible, as long as the complexity
isn't out of hand, and I don't think it will be in this case. So I'm at
least half inclined to go ahead & allow toggling it on and off under the
right circumstances, even though it goes against what I think is my better
judgement. ;)
-Eric
> - Ted
>
On Mon, Apr 01, 2013 at 02:49:00PM -0500, Eric Sandeen wrote:
>
> Urgh, I guess if we are adding an interface which will live "forever,"
> we may as well make it full featured & flexible, as long as the complexity
> isn't out of hand, and I don't think it will be in this case. So I'm at
> least half inclined to go ahead & allow toggling it on and off under the
> right circumstances, even though it goes against what I think is my better
> judgement. ;)
If you want to have a fully flexible interface, then we probably
should have a way to both get and set the flag. And from there the
next step down the slippery slope would be to make this be a bit more
like a fcntl-style F_GETFL/F_SETFL style interface, so we can in the
future set and get other ext4-specific struct_file-specific flags. :-)
I'll let you decide how far you want to go with this; I won't mind if
you keep it with the original KISS interface, but I also won't mind if
you want to create a somewhat more expansive interface.
Cheers,
- Ted
On 4/1/13 3:00 PM, Theodore Ts'o wrote:
> On Mon, Apr 01, 2013 at 02:49:00PM -0500, Eric Sandeen wrote:
>>
>> Urgh, I guess if we are adding an interface which will live "forever,"
>> we may as well make it full featured & flexible, as long as the complexity
>> isn't out of hand, and I don't think it will be in this case. So I'm at
>> least half inclined to go ahead & allow toggling it on and off under the
>> right circumstances, even though it goes against what I think is my better
>> judgement. ;)
>
> If you want to have a fully flexible interface, then we probably
> should have a way to both get and set the flag. And from there the
> next step down the slippery slope would be to make this be a bit more
> like a fcntl-style F_GETFL/F_SETFL style interface, so we can in the
> future set and get other ext4-specific struct_file-specific flags. :-)
>
> I'll let you decide how far you want to go with this; I won't mind if
> you keep it with the original KISS interface, but I also won't mind if
> you want to create a somewhat more expansive interface.
Meh, let's just keep it simple then. And I'd really like to know if
gluster still even needs this, or if their new scheme will work instead,
in which case we should drop it - but Samba made noise about needing it too,
though I've not seen specifics, so I hate to merge it "just in case."
I put it out mostly for review so it was ready if we needed it (since
certain quarters seem anxious) ;)
-Eric
> Cheers,
>
> - Ted
>
On Mon, Apr 01, 2013 at 03:05:35PM -0500, Eric Sandeen wrote:
>
> Meh, let's just keep it simple then. And I'd really like to know if
> gluster still even needs this, or if their new scheme will work instead,
> in which case we should drop it - but Samba made noise about needing it too,
> though I've not seen specifics, so I hate to merge it "just in case."
Yeah, it would be good to get some clarification about whether Samba
is doing some wierd and unnatural with telldir cookies and assuming
that they can expropriate the high 32-bits of the telldir cookie for
their own purposes or not....
Jeremy, any chance you could clarify whether or not Samba depends on
telldir cookies to be small integers?
Thanks,
- Ted
On Apr 1, 2013, at 1:05 PM, Eric Sandeen <[email protected]> wrote:
>
> Meh, let's just keep it simple then. And I'd really like to know if
> gluster still even needs this, or if their new scheme will work instead,
As of this morning the new d_off transformation (Zach's idea) is merged in gluster. We had to put in some kind of ext4 awareness, and the "more complex" d_off transformation (which is finally working properly after fixing some minor issues) seemed better than calling ioctls by detecting the backend is ext4.
> in which case we should drop it - but Samba made noise about needing it too,
> though I've not seen specifics, so I hate to merge it "just in case."
Yes, I too saw comments from Andrew Bartlett of the Samba team. It appeared to be the case that Samba could only present 32bit cookies while ext4 was now returning larger cookies (somewhat like the old NFSv2 clients problem?). This ioctl would be useful there I guess, bring it "in par" with knfsd's abilities of expressing desire for 32bit cookies? However, for knfsd legacy requirements, FMODE_32BITHASH is in generic VFS. But for a userspace file server, it would need to first gain the knowledge of which filesystems in the world actually present large cookies, and for the subset which support smaller cookies, issue filesystem specific ioctls() in their own specific ways.
Wouldn't it be "fair" to treat userspace file servers as equals, and provide a generic FS independent ioctl to set the common FMODE_32BITHASH flag on any dir fd? Think of it as a way of extending the "pointer access" to file->f_mode which NFS exercises, up to userspace?
> I put it out mostly for review so it was ready if we needed it (since
> certain quarters seem anxious) ;)
Appreciate your help Eric!
Avati
On Mon 01-04-13 10:33:41, Eric Sandeen wrote:
> This adds a new ioctl, EXT3_IOC_32BITHASH, which allows a
> userspace application to request 32-bit rather than 64-bit
> hashes from readdir on an indexed / dx / htree directory.
>
> Gluster had been relying on the top bits of the d_off being
> free; there are some reports that filling all 64 bits breaks
> Samba as well. The infrastructure to return 32-bit hashes
> already exists; NFS can turn it on, and it's turned on for
> 32-bit processes as well. So it's just a matter of flipping
> on the f_mode flag before readdir starts.
>
> Care needs to be taken that we don't change the FMODE flag
> after readdir has been started, so we make sure that
> filp->private_data has not yet been set before we set the flag.
> (Thanks Zach!).
OK, I'm happy with this patch. So if Samba people confirm they are really
going to use it, I'll merge the patch.
Honza
> Pre-submission-fixes-by: Zach Brown <[email protected]>
> Signed-off-by: Eric Sandeen <[email protected]>
> ---
>
> V2:
> fix "readir" typo
> rename goto target to *_out like others
> remove parameter; we can't really ever turn this back off once it's used.
> closing and reopening is the only way to get back to 64 bit hashes.
>
> V3:
> return -ENOTDIR if the target is not a directory
>
>
>
>
> diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
> index 87eccbb..83df29f 100644
> --- a/fs/ext3/dir.c
> +++ b/fs/ext3/dir.c
> @@ -47,7 +47,7 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
> *
> * Return 1 if it is a dx dir, 0 if not
> */
> -static int is_dx_dir(struct inode *inode)
> +int is_dx_dir(struct inode *inode)
> {
> struct super_block *sb = inode->i_sb;
>
> diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
> index e85ff15..f3018f4 100644
> --- a/fs/ext3/ext3.h
> +++ b/fs/ext3/ext3.h
> @@ -220,6 +220,7 @@ struct ext3_new_group_data {
> #endif
> #define EXT3_IOC_GETRSVSZ _IOR('f', 5, long)
> #define EXT3_IOC_SETRSVSZ _IOW('f', 6, long)
> +#define EXT3_IOC_32BITHASH _IO('f', 13)
>
> /*
> * ioctl commands in 32 bit emulation
> @@ -1010,6 +1011,7 @@ extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_wind
> extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
>
> /* dir.c */
> +extern int is_dx_dir(struct inode *inode);
> extern int ext3_check_dir_entry(const char *, struct inode *,
> struct ext3_dir_entry_2 *,
> struct buffer_head *, unsigned long);
> diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
> index 4d96e9a..a7d2b0a 100644
> --- a/fs/ext3/ioctl.c
> +++ b/fs/ext3/ioctl.c
> @@ -251,6 +251,39 @@ group_add_out:
> mnt_drop_write_file(filp);
> return err;
> }
> + case EXT3_IOC_32BITHASH: {
> + int err = 0;
> +
> + /* Serialize with readdir */
> + if ((err = mutex_lock_killable(&inode->i_mutex)))
> + return err;
> +
> + /* protect f_mode */
> + spin_lock(&filp->f_lock);
> +
> + /* Only valid for htree directories */
> + if (!S_ISDIR(inode->i_mode)) {
> + err = -ENOTDIR;
> + goto hash32bits_out;
> + }
> +
> + if (!is_dx_dir(inode)) {
> + err = -EINVAL;
> + goto hash32bits_out;
> + }
> +
> + /* Have we already started readdir on this dx dir? */
> + if (filp->private_data) {
> + err = -EINVAL;
> + goto hash32bits_out;
> + }
> +
> + filp->f_mode |= FMODE_32BITHASH;
> +hash32bits_out:
> + spin_unlock(&filp->f_lock);
> + mutex_unlock(&inode->i_mutex);
> + return err;
> + }
> case FITRIM: {
>
> struct super_block *sb = inode->i_sb;
>
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, 2013-04-01 at 13:34 -0700, Anand Avati wrote:
> On Apr 1, 2013, at 1:05 PM, Eric Sandeen <[email protected]> wrote:
> >
> > Meh, let's just keep it simple then. And I'd really like to know if
> > gluster still even needs this, or if their new scheme will work instead,
>
> As of this morning the new d_off transformation (Zach's idea) is merged in gluster. We had to put in some kind of ext4 awareness, and the "more complex" d_off transformation (which is finally working properly after fixing some minor issues) seemed better than calling ioctls by detecting the backend is ext4.
>
> > in which case we should drop it - but Samba made noise about needing it too,
> > though I've not seen specifics, so I hate to merge it "just in case."
>
> Yes, I too saw comments from Andrew Bartlett of the Samba team. It
> appeared to be the case that Samba could only present 32bit cookies
> while ext4 was now returning larger cookies (somewhat like the old
> NFSv2 clients problem?). This ioctl would be useful there I guess,
> bring it "in par" with knfsd's abilities of expressing desire for
> 32bit cookies? However, for knfsd legacy requirements, FMODE_32BITHASH
> is in generic VFS. But for a userspace file server, it would need to
> first gain the knowledge of which filesystems in the world actually
> present large cookies, and for the subset which support smaller
> cookies, issue filesystem specific ioctls() in their own specific
> ways.
>
> Wouldn't it be "fair" to treat userspace file servers as equals, and
> provide a generic FS independent ioctl to set the common
> FMODE_32BITHASH flag on any dir fd? Think of it as a way of extending
> the "pointer access" to file->f_mode which NFS exercises, up to
> userspace?
If 32-bit cookies are baked into current-generation NFS, even if Samba
doesn't take this up, wouldn't
http://sourceforge.net/apps/trac/nfs-ganesha/ need it just the same?
Samba's use case fortunately is for DOS clients, and there just are not
very many of those any more (and tests that behave like dos clients,
which is how we noticed).
You CC'ed Jeremy, who is our authority on this area (I just noticed and
inquired into the failing tests). I'm hoping he can give a more
authoritative view on if we would push for this.
Andrew Bartlett
--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org
On 04/05/2013 04:28 PM, J. Bruce Fields wrote:
> knfsd is still returning 32-bit cookies to v2 clients (that's the
> protocol), but I doubt v2 support is very critical for Ganesha.
The last I heard, Ganesha has officially removed all NFSv2 code from
their repo and do not support it at all anymore.
Avati
On Sat, Apr 06, 2013 at 10:05:25AM +1100, Andrew Bartlett wrote:
> On Mon, 2013-04-01 at 13:34 -0700, Anand Avati wrote:
> > On Apr 1, 2013, at 1:05 PM, Eric Sandeen <[email protected]> wrote:
> > >
> > > Meh, let's just keep it simple then. And I'd really like to know if
> > > gluster still even needs this, or if their new scheme will work instead,
> >
> > As of this morning the new d_off transformation (Zach's idea) is merged in gluster. We had to put in some kind of ext4 awareness, and the "more complex" d_off transformation (which is finally working properly after fixing some minor issues) seemed better than calling ioctls by detecting the backend is ext4.
> >
> > > in which case we should drop it - but Samba made noise about needing it too,
> > > though I've not seen specifics, so I hate to merge it "just in case."
> >
> > Yes, I too saw comments from Andrew Bartlett of the Samba team. It
> > appeared to be the case that Samba could only present 32bit cookies
> > while ext4 was now returning larger cookies (somewhat like the old
> > NFSv2 clients problem?). This ioctl would be useful there I guess,
> > bring it "in par" with knfsd's abilities of expressing desire for
> > 32bit cookies? However, for knfsd legacy requirements, FMODE_32BITHASH
> > is in generic VFS. But for a userspace file server, it would need to
> > first gain the knowledge of which filesystems in the world actually
> > present large cookies, and for the subset which support smaller
> > cookies, issue filesystem specific ioctls() in their own specific
> > ways.
> >
> > Wouldn't it be "fair" to treat userspace file servers as equals, and
> > provide a generic FS independent ioctl to set the common
> > FMODE_32BITHASH flag on any dir fd? Think of it as a way of extending
> > the "pointer access" to file->f_mode which NFS exercises, up to
> > userspace?
>
> If 32-bit cookies are baked into current-generation NFS, even if Samba
> doesn't take this up, wouldn't
> http://sourceforge.net/apps/trac/nfs-ganesha/ need it just the same?
I guess ganesha could use tricks similar to gluster's and throw out the
low bits of 64-bit cookies.
For knfsd I've been telling people they should either
- fix their clients (the protocol *does* define cookies to be 64
bits, and the Linux client has shown it's practical to remap
64-bit cookies to 32-bit cookies if necessary for applications
using 32-bit interfaces), or
- fix applications to use 64-bit interfaces, or
- just turn off dir_index: hopefully the fact that they've been
happily using ext4 without seeing hash collisions means they
aren't using very large directories, hence can live without
dir_index.
knfsd is still returning 32-bit cookies to v2 clients (that's the
protocol), but I doubt v2 support is very critical for Ganesha.
--b.
On Fri, 2013-04-05 at 16:26 -0700, Anand Avati wrote:
> On 04/05/2013 04:28 PM, J. Bruce Fields wrote:
>
> > knfsd is still returning 32-bit cookies to v2 clients (that's the
> > protocol), but I doubt v2 support is very critical for Ganesha.
>
> The last I heard, Ganesha has officially removed all NFSv2 code from
> their repo and do not support it at all anymore.
I spoke with Jeremy, and we wouldn't want Samba's support for equally
old/dead protocols to stand in the way of the kernel here. We both
agree that we can sort this out in userspace well enough (and would have
to anyway, as a there is a large deployed set of servers with this
behaviour already).
Thank you very much to those who took the time to loop us in for giving
Samba a chance to express a view here.
Andrew Bartlett
--
Andrew Bartlett http://samba.org/~abartlet/
Authentication Developer, Samba Team http://samba.org