This is version 2 of the hole punching series I posted last week. The following
things have changed:
- Hole punching doesn't change the file size
- Fixed the mode checks in ext4/btrfs/gfs2 so they do what they are supposed to
I posted updates to xfstests and xfsprogs in order to test this new interface.
I ran the test on xfs to make sure hole punching worked properly, and on btrfs
to make sure it failed properly. This series also adds support for hole punching
to ocfs2. I did not test that part, but it's an obvious fix so it should work
fine. Thanks,
Josef
Btrfs doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate. This support can
be added later. Thanks,
Signed-off-by: Josef Bacik <[email protected]>
---
fs/btrfs/inode.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 78877d7..6f08892 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6936,6 +6936,10 @@ static long btrfs_fallocate(struct inode *inode, int mode,
alloc_start = offset & ~mask;
alloc_end = (offset + len + mask) & ~mask;
+ /* We only support the FALLOC_FL_KEEP_SIZE mode */
+ if (mode && (mode != FALLOC_FL_KEEP_SIZE))
+ return -EOPNOTSUPP;
+
/*
* wait for ordered IO before we have any locks. We'll loop again
* below with the locks held.
--
1.6.6.1
Gfs2 doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate. This support can
be added later. Thanks,
Signed-off-by: Josef Bacik <[email protected]>
---
fs/gfs2/ops_inode.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 12cbea7..65cb5f6 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -1439,6 +1439,10 @@ static long gfs2_fallocate(struct inode *inode, int mode, loff_t offset,
loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
next = (next + 1) << sdp->sd_sb.sb_bsize_shift;
+ /* We only support the FALLOC_FL_KEEP_SIZE mode */
+ if (mode && (mode != FALLOC_FL_KEEP_SIZE))
+ return -EOPNOTSUPP;
+
offset = (offset >> sdp->sd_sb.sb_bsize_shift) <<
sdp->sd_sb.sb_bsize_shift;
--
1.6.6.1
Hole punching has already been implemented by XFS and OCFS2, and has the
potential to be implemented on both BTRFS and EXT4 so we need a generic way to
get to this feature. The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE
to fallocate() since it already looks like the normal fallocate() operation.
I've tested this patch with XFS and BTRFS to make sure XFS did what it's
supposed to do and that BTRFS failed like it was supposed to. Thank you,
Signed-off-by: Josef Bacik <[email protected]>
---
fs/open.c | 2 +-
include/linux/falloc.h | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 4197b9e..ab8dedf 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
return -EINVAL;
/* Return error if mode is not supported */
- if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
+ if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
return -EOPNOTSUPP;
if (!(file->f_mode & FMODE_WRITE))
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index 3c15510..851cba2 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -2,6 +2,7 @@
#define _FALLOC_H_
#define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */
+#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */
#ifdef __KERNEL__
--
1.6.6.1
This patch simply allows XFS to handle the hole punching flag in fallocate
properly. I've tested this with a little program that does a bunch of random
hole punching with FL_KEEP_SIZE and without it to make sure it does the right
thing. Thanks,
Signed-off-by: Josef Bacik <[email protected]>
---
fs/xfs/linux-2.6/xfs_iops.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 96107ef..63cc929 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -516,6 +516,7 @@ xfs_vn_fallocate(
loff_t new_size = 0;
xfs_flock64_t bf;
xfs_inode_t *ip = XFS_I(inode);
+ int cmd = XFS_IOC_RESVSP;
/* preallocation on directories not yet supported */
error = -ENODEV;
@@ -528,8 +529,11 @@ xfs_vn_fallocate(
xfs_ilock(ip, XFS_IOLOCK_EXCL);
+ if (mode & FALLOC_FL_PUNCH_HOLE)
+ cmd = XFS_IOC_UNRESVSP;
+
/* check the new inode size is valid before allocating */
- if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+ if (!(mode & (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) &&
offset + len > i_size_read(inode)) {
new_size = offset + len;
error = inode_newsize_ok(inode, new_size);
@@ -537,8 +541,7 @@ xfs_vn_fallocate(
goto out_unlock;
}
- error = -xfs_change_file_space(ip, XFS_IOC_RESVSP, &bf,
- 0, XFS_ATTR_NOLOCK);
+ error = -xfs_change_file_space(ip, cmd, &bf, 0, XFS_ATTR_NOLOCK);
if (error)
goto out_unlock;
--
1.6.6.1
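For readers following along, the mode handling in the XFS patch above can be restated as a small standalone sketch. The flag values mirror the falloc.h hunk in this series. The names demo_cmd, demo_pick_cmd and demo_new_size are hypothetical, chosen for illustration only, and are not part of XFS.

```c
#include <stdint.h>

/* Values mirror the falloc.h hunk earlier in this series. */
#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02

/* Hypothetical stand-ins for XFS_IOC_RESVSP / XFS_IOC_UNRESVSP. */
enum demo_cmd { DEMO_RESVSP, DEMO_UNRESVSP };

/* Punching a hole unreserves space; every other mode reserves it. */
static enum demo_cmd demo_pick_cmd(int mode)
{
	return (mode & FALLOC_FL_PUNCH_HOLE) ? DEMO_UNRESVSP : DEMO_RESVSP;
}

/*
 * Returns the new file size, or 0 if the size must stay as it is.
 * The size only grows for a plain preallocation that ends past EOF.
 */
static int64_t demo_new_size(int mode, int64_t offset, int64_t len,
			     int64_t i_size)
{
	if (!(mode & (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) &&
	    offset + len > i_size)
		return offset + len;
	return 0;
}
```

With either flag set, demo_new_size() reports no size change, which matches the "hole punching doesn't change file size" item in the cover letter.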
Ext4 doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate. This support can
be added later. Thanks,
Signed-off-by: Josef Bacik <[email protected]>
---
fs/ext4/extents.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0554c48..35bca73 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3622,6 +3622,10 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
struct ext4_map_blocks map;
unsigned int credits, blkbits = inode->i_blkbits;
+ /* We only support the FALLOC_FL_KEEP_SIZE mode */
+ if (mode && (mode != FALLOC_FL_KEEP_SIZE))
+ return -EOPNOTSUPP;
+
/*
* currently supporting (pre)allocate mode for extent-based
* files _only_
--
1.6.6.1
This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch
flag in fallocate. I didn't test it, but it seems simple enough. Thanks,
Signed-off-by: Josef Bacik <[email protected]>
---
fs/ocfs2/file.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 77b4c04..181ae52 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1992,6 +1992,7 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset,
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_space_resv sr;
int change_size = 1;
+ int cmd = OCFS2_IOC_RESVSP64;
if (!ocfs2_writes_unwritten_extents(osb))
return -EOPNOTSUPP;
@@ -2002,12 +2003,17 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset,
if (mode & FALLOC_FL_KEEP_SIZE)
change_size = 0;
+ if (mode & FALLOC_FL_PUNCH_HOLE) {
+ cmd = OCFS2_IOC_UNRESVSP64;
+ change_size = 0;
+ }
+
sr.l_whence = 0;
sr.l_start = (s64)offset;
sr.l_len = (s64)len;
- return __ocfs2_change_file_space(NULL, inode, offset,
- OCFS2_IOC_RESVSP64, &sr, change_size);
+ return __ocfs2_change_file_space(NULL, inode, offset, cmd, &sr,
+ change_size);
}
int ocfs2_check_range_for_refcount(struct inode *inode, loff_t pos,
--
1.6.6.1
On Mon 15-11-10 12:05:18, Josef Bacik wrote:
> diff --git a/fs/open.c b/fs/open.c
> index 4197b9e..ab8dedf 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> return -EINVAL;
>
> /* Return error if mode is not supported */
> - if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
> + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
Why not just:
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ?
> diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> index 3c15510..851cba2 100644
> --- a/include/linux/falloc.h
> +++ b/include/linux/falloc.h
> @@ -2,6 +2,7 @@
> #define _FALLOC_H_
>
> #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */
> +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */
^ use lowercase 'x' please...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
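The simplification suggested above works because the leading "mode &&" test is redundant: when mode is 0, the masked value is 0 as well. A minimal userspace sketch, assuming the flag values from the falloc.h hunk (the demo_* names are hypothetical), compares both forms over a range of modes:

```c
#include <stdbool.h>

#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02
#define DEMO_SUPPORTED (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)

/* The check as posted in the patch. */
static bool demo_rejects_posted(int mode)
{
	return mode && (mode & ~DEMO_SUPPORTED);
}

/* The simplified check suggested in review. */
static bool demo_rejects_simplified(int mode)
{
	return (mode & ~DEMO_SUPPORTED) != 0;
}

/* Returns true if both checks agree for every mode in [0, limit). */
static bool demo_checks_agree(int limit)
{
	for (int mode = 0; mode < limit; mode++)
		if (demo_rejects_posted(mode) != demo_rejects_simplified(mode))
			return false;
	return true;
}
```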
On Tue 16-11-10 12:16:11, Jan Kara wrote:
> On Mon 15-11-10 12:05:18, Josef Bacik wrote:
> > diff --git a/fs/open.c b/fs/open.c
> > index 4197b9e..ab8dedf 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > return -EINVAL;
> >
> > /* Return error if mode is not supported */
> > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
> > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
> Why not just:
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ?
And BTW, since FALLOC_FL_PUNCH_HOLE does not change the file size, should
not we enforce that FALLOC_FL_KEEP_SIZE is / is not set? I don't mind too
much which way but keeping it ambiguous (ignored) in the interface usually
proves as a bad idea in future when we want to further extend the interface...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 15-11-10 12:05:20, Josef Bacik wrote:
> This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch
> flag in fallocate. I didn't test it, but it seems simple enough. Thanks,
>
> Signed-off-by: Josef Bacik <[email protected]>
You might want to directly CC Joel Becker <[email protected]> who
maintains OCFS2. Otherwise the patch looks OK so you can add
Acked-by: Jan Kara <[email protected]>
for what it's worth ;).
Honza
> ---
> fs/ocfs2/file.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 77b4c04..181ae52 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1992,6 +1992,7 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset,
> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> struct ocfs2_space_resv sr;
> int change_size = 1;
> + int cmd = OCFS2_IOC_RESVSP64;
>
> if (!ocfs2_writes_unwritten_extents(osb))
> return -EOPNOTSUPP;
> @@ -2002,12 +2003,17 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset,
> if (mode & FALLOC_FL_KEEP_SIZE)
> change_size = 0;
>
> + if (mode & FALLOC_FL_PUNCH_HOLE) {
> + cmd = OCFS2_IOC_UNRESVSP64;
> + change_size = 0;
> + }
> +
> sr.l_whence = 0;
> sr.l_start = (s64)offset;
> sr.l_len = (s64)len;
>
> - return __ocfs2_change_file_space(NULL, inode, offset,
> - OCFS2_IOC_RESVSP64, &sr, change_size);
> + return __ocfs2_change_file_space(NULL, inode, offset, cmd, &sr,
> + change_size);
> }
>
> int ocfs2_check_range_for_refcount(struct inode *inode, loff_t pos,
> --
> 1.6.6.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 15-11-10 12:05:21, Josef Bacik wrote:
> Ext4 doesn't have the ability to punch holes yet, so make sure we return
> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
> be added later. Thanks,
>
> Signed-off-by: Josef Bacik <[email protected]>
Acked-by: Jan Kara <[email protected]>
Honza
> ---
> fs/ext4/extents.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 0554c48..35bca73 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3622,6 +3622,10 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
> struct ext4_map_blocks map;
> unsigned int credits, blkbits = inode->i_blkbits;
>
> + /* We only support the FALLOC_FL_KEEP_SIZE mode */
> + if (mode && (mode != FALLOC_FL_KEEP_SIZE))
> + return -EOPNOTSUPP;
> +
> /*
> * currently supporting (pre)allocate mode for extent-based
> * files _only_
> --
> 1.6.6.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 11/15/2010 07:05 PM, Josef Bacik wrote:
> Ext4 doesn't have the ability to punch holes yet, so make sure we return
> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
> be added later. Thanks,
>
Instead of teaching filesystems to fail if they don't support the
capability, why don't supporting filesystems say so, allowing the fail
code to be in common code?
--
error compiling committee.c: too many arguments to function
On Tue, Nov 16, 2010 at 02:25:35PM +0200, Avi Kivity wrote:
> On 11/15/2010 07:05 PM, Josef Bacik wrote:
>> Ext4 doesn't have the ability to punch holes yet, so make sure we return
>> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
>> be added later. Thanks,
>>
>
> Instead of teaching filesystems to fail if they don't support the
> capability, why don't supporting filesystems say so, allowing the fail
> code to be in common code?
>
There is no simple way to test if a filesystem supports hole punching or not so
the check has to be done per fs. Thanks,
Josef
On Tue, Nov 16, 2010 at 12:43:46PM +0100, Jan Kara wrote:
> On Tue 16-11-10 12:16:11, Jan Kara wrote:
> > On Mon 15-11-10 12:05:18, Josef Bacik wrote:
> > > diff --git a/fs/open.c b/fs/open.c
> > > index 4197b9e..ab8dedf 100644
> > > --- a/fs/open.c
> > > +++ b/fs/open.c
> > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > > return -EINVAL;
> > >
> > > /* Return error if mode is not supported */
> > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
> > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
> > Why not just:
> > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ?
> And BTW, since FALLOC_FL_PUNCH_HOLE does not change the file size, should
> not we enforce that FALLOC_FL_KEEP_SIZE is / is not set? I don't mind too
> much which way but keeping it ambiguous (ignored) in the interface usually
> proves as a bad idea in future when we want to further extend the interface...
>
Yeah I went back and forth on this. KEEP_SIZE won't change the behavior of
PUNCH_HOLE since PUNCH_HOLE implicitly means keep the size. I figured since its
"mode" and not "flags" it would be ok to make either way accepted, but if you
prefer PUNCH_HOLE means you have to have KEEP_SIZE set then I'm cool with that,
just let me know one way or the other. Thanks,
Josef
On Tue, Nov 16, 2010 at 12:16:11PM +0100, Jan Kara wrote:
> On Mon 15-11-10 12:05:18, Josef Bacik wrote:
> > diff --git a/fs/open.c b/fs/open.c
> > index 4197b9e..ab8dedf 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > return -EINVAL;
> >
> > /* Return error if mode is not supported */
> > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
> > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
> Why not just:
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ?
>
Good point, thank you.
> > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > index 3c15510..851cba2 100644
> > --- a/include/linux/falloc.h
> > +++ b/include/linux/falloc.h
> > @@ -2,6 +2,7 @@
> > #define _FALLOC_H_
> >
> > #define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */
> > +#define FALLOC_FL_PUNCH_HOLE 0X02 /* de-allocates range */
> ^ use lowercase 'x' please...
>
Argh bitten by caps-lock again. Thanks I'll fix this up,
Josef
On 11/16/2010 02:50 PM, Josef Bacik wrote:
> On Tue, Nov 16, 2010 at 02:25:35PM +0200, Avi Kivity wrote:
> > On 11/15/2010 07:05 PM, Josef Bacik wrote:
> >> Ext4 doesn't have the ability to punch holes yet, so make sure we return
> >> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
> >> be added later. Thanks,
> >>
> >
> > Instead of teaching filesystems to fail if they don't support the
> > capability, why don't supporting filesystems say so, allowing the fail
> > code to be in common code?
> >
>
> There is no simple way to test if a filesystem supports hole punching or not so
> the check has to be done per fs. Thanks,
Could put a flag word in superblock_operations. Filesystems which
support punching (or other features) can enable it there.
Or even have its own callback.
--
error compiling committee.c: too many arguments to function
On Tue 16-11-10 07:52:50, Josef Bacik wrote:
> On Tue, Nov 16, 2010 at 12:43:46PM +0100, Jan Kara wrote:
> > On Tue 16-11-10 12:16:11, Jan Kara wrote:
> > > On Mon 15-11-10 12:05:18, Josef Bacik wrote:
> > > > diff --git a/fs/open.c b/fs/open.c
> > > > index 4197b9e..ab8dedf 100644
> > > > --- a/fs/open.c
> > > > +++ b/fs/open.c
> > > > @@ -223,7 +223,7 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > > > return -EINVAL;
> > > >
> > > > /* Return error if mode is not supported */
> > > > - if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
> > > > + if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
> > > Why not just:
> > > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) ?
> > And BTW, since FALLOC_FL_PUNCH_HOLE does not change the file size, should
> > not we enforce that FALLOC_FL_KEEP_SIZE is / is not set? I don't mind too
> > much which way but keeping it ambiguous (ignored) in the interface usually
> > proves as a bad idea in future when we want to further extend the interface...
> >
>
> Yeah I went back and forth on this. KEEP_SIZE won't change the behavior of
> PUNCH_HOLE since PUNCH_HOLE implicitly means keep the size. I figured since its
> "mode" and not "flags" it would be ok to make either way accepted, but if you
> prefer PUNCH_HOLE means you have to have KEEP_SIZE set then I'm cool with that,
> just let me know one way or the other. Thanks,
I was wondering about 'mode' vs 'flags' as well. The manpage says:
The mode argument determines the operation to be performed on the given
range. Currently only one flag is supported for mode...
So we call it "mode" but speak about "flags"? Seems a bit inconsistent.
I'd maybe lean a bit at the "flags" side and just make sure that
only one of FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE is set (interpreting
FALLOC_FL_KEEP_SIZE as allocate blocks beyond i_size). But I'm not sure
what others think.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Tue, Nov 16, 2010 at 03:07:29PM +0200, Avi Kivity wrote:
> On 11/16/2010 02:50 PM, Josef Bacik wrote:
>> On Tue, Nov 16, 2010 at 02:25:35PM +0200, Avi Kivity wrote:
>> > On 11/15/2010 07:05 PM, Josef Bacik wrote:
>> >> Ext4 doesn't have the ability to punch holes yet, so make sure we return
>> >> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
>> >> be added later. Thanks,
>> >>
>> >
>> > Instead of teaching filesystems to fail if they don't support the
>> > capability, why don't supporting filesystems say so, allowing the fail
>> > code to be in common code?
>> >
>>
>> There is no simple way to test if a filesystem supports hole punching or not so
>> the check has to be done per fs. Thanks,
>
> Could put a flag word in superblock_operations. Filesystems which
> support punching (or other features) can enable it there.
>
> Or even have its own callback.
>
Sure but then you have to do the same thing for every other flag you add to
fallocate and then you have this huge mess of random flags just so you don't
call into the filesystem. This way is a lesser of two evils I think. Thanks,
Josef
On 15/11/10 17:05, Josef Bacik wrote:
> Ext4 doesn't have the ability to punch holes yet, so make sure we return
> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
> be added later. Thanks,
>
> Signed-off-by: Josef Bacik <[email protected]>
> ---
> fs/ext4/extents.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 0554c48..35bca73 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3622,6 +3622,10 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
> struct ext4_map_blocks map;
> unsigned int credits, blkbits = inode->i_blkbits;
>
> + /* We only support the FALLOC_FL_KEEP_SIZE mode */
> + if (mode && (mode != FALLOC_FL_KEEP_SIZE))
> + return -EOPNOTSUPP;
> +
> /*
> * currently supporting (pre)allocate mode for extent-based
> * files _only_
So for older versions of ext4 or other filesystems, how do we know
that fallocate(...,FALLOC_FL_PUNCH_HOLE) is not supported.
I.E. how do we detect at runtime that the call succeeded
and didn't just do a normal fallocate()?
cheers,
Pádraig.
On Tue, Nov 16, 2010 at 04:20:43PM +0000, Pádraig Brady wrote:
> On 15/11/10 17:05, Josef Bacik wrote:
> > Ext4 doesn't have the ability to punch holes yet, so make sure we return
> > EOPNOTSUPP if we try to use hole punching through fallocate. This support can
> > be added later. Thanks,
> >
> > Signed-off-by: Josef Bacik <[email protected]>
> > ---
> > fs/ext4/extents.c | 4 ++++
> > 1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 0554c48..35bca73 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -3622,6 +3622,10 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
> > struct ext4_map_blocks map;
> > unsigned int credits, blkbits = inode->i_blkbits;
> >
> > + /* We only support the FALLOC_FL_KEEP_SIZE mode */
> > + if (mode && (mode != FALLOC_FL_KEEP_SIZE))
> > + return -EOPNOTSUPP;
> > +
> > /*
> > * currently supporting (pre)allocate mode for extent-based
> > * files _only_
>
> So for older versions of ext4 or other filesystems, how do we know
> that fallocate(...,FALLOC_FL_PUNCH_HOLE) is not supported.
> I.E. how do we detect at runtime that the call succeeded
> and didn't just do a normal fallocate()?
>
Older kernels won't accept FALLOC_FL_PUNCH_HOLE, so you'll get an error.
Thanks,
Josef
On 16/11/10 16:33, Josef Bacik wrote:
> On Tue, Nov 16, 2010 at 04:20:43PM +0000, Pádraig Brady wrote:
>>
>> So for older versions of ext4 or other filesystems, how do we know
>> that fallocate(...,FALLOC_FL_PUNCH_HOLE) is not supported.
>> I.E. how do we detect at runtime that the call succeeded
>> and didn't just do a normal fallocate()?
>>
>
> Older kernels won't accept FALLOC_FL_PUNCH_HOLE, so you'll get an error.
> Thanks,
Oops right.
Older kernels filter FALLOC_FL_PUNCH_HOLE at a higher level in fs/open.c,
whereas newer kernels will do it per file system.
thanks,
Pádraig.
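A small sketch of the generic check before and after this series, taken from the fs/open.c hunk quoted earlier in the thread, may help here. It suggests that a bare FALLOC_FL_PUNCH_HOLE is rejected by the old check, while FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE passes it, because the old test only looks for the KEEP_SIZE bit. What an older filesystem then does with that mode is outside this sketch. The demo_* function names are hypothetical.

```c
#include <errno.h>

#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02

/* Generic check before the series (the removed line in fs/open.c). */
static int demo_old_generic_check(int mode)
{
	if (mode && !(mode & FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;
	return 0;
}

/* Generic check as posted in this series. */
static int demo_new_generic_check(int mode)
{
	if (mode && (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)))
		return -EOPNOTSUPP;
	return 0;
}
```

Under these assumptions, a runtime probe that wants a reliable error from the old generic check would pass FALLOC_FL_PUNCH_HOLE on its own.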
On Tue, Nov 16, 2010 at 8:05 AM, Josef Bacik <[email protected]> wrote:
> On Tue, Nov 16, 2010 at 03:07:29PM +0200, Avi Kivity wrote:
>> On 11/16/2010 02:50 PM, Josef Bacik wrote:
>>> On Tue, Nov 16, 2010 at 02:25:35PM +0200, Avi Kivity wrote:
>>> > On 11/15/2010 07:05 PM, Josef Bacik wrote:
>>> >> Ext4 doesn't have the ability to punch holes yet, so make sure we return
>>> >> EOPNOTSUPP if we try to use hole punching through fallocate. This support can
>>> >> be added later. Thanks,
>>> >>
>>> >
>>> > Instead of teaching filesystems to fail if they don't support the
>>> > capability, why don't supporting filesystems say so, allowing the fail
>>> > code to be in common code?
>>> >
>>>
>>> There is no simple way to test if a filesystem supports hole punching or not so
>>> the check has to be done per fs. Thanks,
>>
>> Could put a flag word in superblock_operations. Filesystems which
>> support punching (or other features) can enable it there.
>>
>> Or even have its own callback.
>>
>
> Sure but then you have to do the same thing for every other flag you add to
> fallocate and then you have this huge mess of random flags just so you don't
> call into the filesystem. This way is a lesser of two evils I think. Thanks,
>
> Josef
I'm not a true kernel hacker, so my opinion is not critical but I find
it hard to read / expand as
> + /* We only support the FALLOC_FL_KEEP_SIZE mode */
> + if (mode && (mode != FALLOC_FL_KEEP_SIZE))
> + return -EOPNOTSUPP;
How about:
#define EXT4_FALLOC_MODES_SUPPORTED (FALLOC_FL_KEEP_SIZE)
if (mode & ~EXT4_FALLOC_MODES_SUPPORTED)
return -EOPNOTSUPP;
Greg
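As a rough illustration of the mask idea above, the following userspace sketch puts a supported-modes mask next to the check from the posted patch. The flag values come from the falloc.h hunk; DEMO_FALLOC_MODES_SUPPORTED and the demo_* functions are hypothetical names, not anything in ext4.

```c
#include <errno.h>

#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02

/* Hypothetical per-filesystem mask, as suggested above. */
#define DEMO_FALLOC_MODES_SUPPORTED (FALLOC_FL_KEEP_SIZE)

/* Rejects any mode bit outside the supported mask. */
static int demo_check_mode(int mode)
{
	if (mode & ~DEMO_FALLOC_MODES_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}

/* The check from the posted patch, for comparison. */
static int demo_check_mode_posted(int mode)
{
	if (mode && (mode != FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;
	return 0;
}
```

Both forms accept 0 and FALLOC_FL_KEEP_SIZE and reject everything else. With the mask form, adding hole punching later would mean extending the mask rather than rewriting the condition.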
On 2010-11-16, at 07:14, Jan Kara wrote:
>> Yeah I went back and forth on this. KEEP_SIZE won't change the behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep the size. I figured since its "mode" and not "flags" it would be ok to make either way accepted, but if you prefer PUNCH_HOLE means you have to have KEEP_SIZE set then I'm cool with that, just let me know one way or the other.
>
> So we call it "mode" but speak about "flags"? Seems a bit inconsistent.
> I'd maybe lean a bit at the "flags" side and just make sure that only one of FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as allocate blocks beyond i_size). But I'm not sure what others think.
IMHO, it makes more sense for consistency and "get what users expect" that these be treated as flags. Some users will want KEEP_SIZE, but in other cases it may make sense that a hole punch at the end of a file should shrink the file (i.e. the opposite of an append).
Cheers, Andreas
On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote:
> On 2010-11-16, at 07:14, Jan Kara wrote:
> >> Yeah I went back and forth on this. KEEP_SIZE won't change the
> >> behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep
> >> the size. I figured since its "mode" and not "flags" it would
> >> be ok to make either way accepted, but if you prefer PUNCH_HOLE
> >> means you have to have KEEP_SIZE set then I'm cool with that,
> >> just let me know one way or the other.
> >
> > So we call it "mode" but speak about "flags"? Seems a bit
> > inconsistent. I'd maybe lean a bit at the "flags" side and just
> > make sure that only one of FALLOC_FL_KEEP_SIZE,
> > FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as
> > allocate blocks beyond i_size). But I'm not sure what others
> > think.
>
> IMHO, it makes more sense for consistency and "get what users
> expect" that these be treated as flags. Some users will want
> KEEP_SIZE, but in other cases it may make sense that a hole punch
> at the end of a file should shrink the file (i.e. the opposite of
> an append).
What's wrong with ftruncate() for this?
There's plenty of open questions about the interface if we allow
hole punching to change the file size. e.g. where do we set the EOF
(offset or offset+len)? What do we do with the rest of the blocks
that are now beyond EOF? We weren't asked to punch them out, so do
we leave them behind? What if we are leaving written blocks beyond
EOF - does any filesystem other than XFS support that (i.e. are we
introducing different behaviour on different filesystems)? And what
happens if the offset is beyond EOF? Do we extend the file, and if
so why wouldn't you just use ftruncate() instead?
IMO, allowing hole punching to change the file size makes it much
more complicated and hence less likely to simply do what the user
expects. It also is harder to implement and testing becomes much
more intricate. From that perspective, it does not seem desirable to
me...
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Nov 17, 2010 at 01:11:50PM +1100, Dave Chinner wrote:
> On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote:
> > On 2010-11-16, at 07:14, Jan Kara wrote:
> > >> Yeah I went back and forth on this. KEEP_SIZE won't change the
> > >> behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep
> > >> the size. I figured since its "mode" and not "flags" it would
> > >> be ok to make either way accepted, but if you prefer PUNCH_HOLE
> > >> means you have to have KEEP_SIZE set then I'm cool with that,
> > >> just let me know one way or the other.
> > >
> > > So we call it "mode" but speak about "flags"? Seems a bit
> > > inconsistent. I'd maybe lean a bit at the "flags" side and just
> > > make sure that only one of FALLOC_FL_KEEP_SIZE,
> > > FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as
> > > allocate blocks beyond i_size). But I'm not sure what others
> > > think.
> >
> > IMHO, it makes more sense for consistency and "get what users
> > expect" that these be treated as flags. Some users will want
> > KEEP_SIZE, but in other cases it may make sense that a hole punch
> > at the end of a file should shrink the file (i.e. the opposite of
> > an append).
>
> What's wrong with ftruncate() for this?
>
> There's plenty of open questions about the interface if we allow
> hole punching to change the file size. e.g. where do we set the EOF
> (offset or offset+len)? What do we do with the rest of the blocks
> that are now beyond EOF? We weren't asked to punch them out, so do
> we leave them behind? What if we are leaving written blocks beyond
> EOF - does any filesystem other than XFS support that (i.e. are we
> introducing different behaviour on different filesystems)? And what
> happens if the offset is beyond EOF? Do we extend the file, and if
> so why wouldn't you just use ftruncate() instead?
>
> IMO, allowing hole punching to change the file size makes it much
> more complicated and hence less likely to simply do what the user
> expects. It also is harder to implement and testing becomes much
> more intricate. From that perspective, it does not seem desirable to
> me...
>
FWIW I agree with Dave, the only question at this point is do we force users to
specify KEEP_SIZE with PUNCH_HOLE? On one hand it makes the interface a bit
more consistent, on the other hand it makes the documentation a little weird
"We have mode here, but if you want to use PUNCH_HOLE you also have to specify
KEEP_SIZE, so really it's like a flags field it's just named poorly"
I have no strong opinions the other way so if nobody else does then I'll just do
it Jan's way. Thanks,
Josef
On Tue, Nov 16, 2010 at 09:28:14PM -0500, Josef Bacik wrote:
> On Wed, Nov 17, 2010 at 01:11:50PM +1100, Dave Chinner wrote:
> > On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote:
> > > On 2010-11-16, at 07:14, Jan Kara wrote:
> > > >> Yeah I went back and forth on this. KEEP_SIZE won't change the
> > > >> behavior of PUNCH_HOLE since PUNCH_HOLE implicitly means keep
> > > >> the size. I figured since its "mode" and not "flags" it would
> > > >> be ok to make either way accepted, but if you prefer PUNCH_HOLE
> > > >> means you have to have KEEP_SIZE set then I'm cool with that,
> > > >> just let me know one way or the other.
> > > >
> > > > So we call it "mode" but speak about "flags"? Seems a bit
> > > > inconsistent. I'd maybe lean a bit at the "flags" side and just
> > > > make sure that only one of FALLOC_FL_KEEP_SIZE,
> > > > FALLOC_FL_PUNCH_HOLE is set (interpreting FALLOC_FL_KEEP_SIZE as
> > > > allocate blocks beyond i_size). But I'm not sure what others
> > > > think.
> > >
> > > IMHO, it makes more sense for consistency and "get what users
> > > expect" that these be treated as flags. Some users will want
> > > KEEP_SIZE, but in other cases it may make sense that a hole punch
> > > at the end of a file should shrink the file (i.e. the opposite of
> > > an append).
> >
> > What's wrong with ftruncate() for this?
> >
> > There's plenty of open questions about the interface if we allow
> > hole punching to change the file size. e.g. where do we set the EOF
> > (offset or offset+len)? What do we do with the rest of the blocks
> > that are now beyond EOF? We weren't asked to punch them out, so do
> > we leave them behind? What if we are leaving written blocks beyond
> > EOF - does any filesystem other than XFS support that (i.e. are we
> > introducing different behaviour on different filesystems)? And what
> > happens if the offset is beyond EOF? Do we extend the file, and if
> > so why wouldn't you just use ftruncate() instead?
> >
> > IMO, allowing hole punching to change the file size makes it much
> > more complicated and hence less likely to simply do what the user
> > expects. It also is harder to implement and testing becomes much
> > more intricate. From that perspective, it does not seem desirable to
> > me...
> >
>
> FWIW I agree with Dave, the only question at this point is do we force users to
> specify KEEP_SIZE with PUNCH_HOLE? On one hand it makes the interface a bit
> more consistent, on the other hand it makes the documentation a little weird
>
> "We have mode here, but if you want to use PUNCH_HOLE you also have to specify
> KEEP_SIZE, so really it's like a flags field it's just named poorly"
>
> I have no strong opinions the other way so if nobody else does then I'll just do
> it Jan's way. Thanks,
>
Sorry, child-induced sleep deprivation bleeding in there, that should read "I
have no strong opinions one way or the other." Sheesh,
Josef
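For reference, the stricter of the two options discussed above (PUNCH_HOLE is only accepted together with KEEP_SIZE) might be sketched as follows. This is a minimal illustration, not code from the series: check_falloc_mode() is a hypothetical helper, and the flag values are assumed here for the sake of a self-contained example.

```c
#include <errno.h>

/* Flag values assumed for illustration. */
#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02

/*
 * Hypothetical helper: returns 0 if the mode is acceptable and
 * -EOPNOTSUPP otherwise. PUNCH_HOLE is rejected unless KEEP_SIZE
 * is passed along with it.
 */
int check_falloc_mode(int mode)
{
	/* Reject any bits we do not know about. */
	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

	/* Treat mode as flags: a punch must also ask to keep the size. */
	if ((mode & FALLOC_FL_PUNCH_HOLE) && !(mode & FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;

	return 0;
}
```

With this check, a mode of 0 or KEEP_SIZE alone still means plain preallocation, and a bare PUNCH_HOLE fails instead of silently implying KEEP_SIZE.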
> >There is no simple way to test if a filesystem supports hole punching or not so
> >the check has to be done per fs. Thanks,
>
> Could put a flag word in superblock_operations. Filesystems which
> support punching (or other features) can enable it there.
No, it couldn't be in super_operations. It may vary on a per-inode
basis for some file systems, such as ext4 (depending on whether the
inode is extent-mapped or indirect-block mapped).
So at least for ext4 we'd need to call into fallocate() function
anyway, once we add support. I suppose if other file systems really
want it, we could add a flag to the super block ops structure, so they
don't have to do the "do we support the punch" operation. I can go
either way on that; although if we think the majority of file systems
are going to support punch in the long-term, then it might not be worth
it to add such a flag.
- Ted
On Tue, Nov 16, 2010 at 10:06:40PM -0500, Ted Ts'o wrote:
> > >There is no simple way to test if a filesystem supports hole punching or not so
> > >the check has to be done per fs. Thanks,
> >
> > Could put a flag word in superblock_operations. Filesystems which
> > support punching (or other features) can enable it there.
>
> No, it couldn't be in super_operations. It may vary on a per-inode
> basis for some file systems, such as ext4 (depending on whether the
> inode is extent-mapped or indirect-block mapped).
>
> So at least for ext4 we'd need to call into fallocate() function
> anyway, once we add support. I suppose if other file systems really
> want it, we could add a flag to the super block ops structure, so they
> don't have to do the "do we support the punch" operation. I can go
> either way on that; although if we think the majority of file systems
> are going to support punch in the long-term, then it might not be worth
> it to add such a flag.
>
Yeah, that's a lot of extra code just for one part of fallocate. Calling into
->fallocate is perfectly reasonable, especially since ext4 works the way it
does. I'm going to leave things as they are. Thanks,
Josef
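To illustrate why a per-superblock flag is not enough, here is a rough sketch of a check made inside the filesystem's own fallocate path. Everything in it is hypothetical: struct demo_inode and demo_fallocate() are stand-ins rather than kernel interfaces, the flag values are assumed, and the rule "only extent-mapped inodes can be punched" is an assumption made purely for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values assumed for illustration. */
#define FALLOC_FL_KEEP_SIZE  0x01
#define FALLOC_FL_PUNCH_HOLE 0x02

/* Hypothetical stand-in for a filesystem inode; not a kernel structure. */
struct demo_inode {
	bool extent_mapped;	/* false means indirect-block mapped */
};

/*
 * Hypothetical ->fallocate entry point. Support for punching is
 * decided per inode, so the answer can differ between two files
 * on the same mounted filesystem.
 */
int demo_fallocate(const struct demo_inode *inode, int mode)
{
	if ((mode & FALLOC_FL_PUNCH_HOLE) && !inode->extent_mapped)
		return -EOPNOTSUPP;

	/* The real allocation or punch work would happen here. */
	return 0;
}
```

Two inodes on the same superblock can return different results for the same mode, which a single flag word in the superblock operations could not express.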
On 2010-11-16, at 20:11, Dave Chinner wrote:
> On Tue, Nov 16, 2010 at 06:22:47PM -0600, Andreas Dilger wrote:
>> IMHO, it makes more sense for consistency and "get what users
>> expect" that these be treated as flags. Some users will want
>> KEEP_SIZE, but in other cases it may make sense that a hole punch
>> at the end of a file should shrink the file (i.e. the opposite of
>> an append).
>
> What's wrong with ftruncate() for this?
It makes the API usage from applications more consistent. It would be inconvenient, for example, if applications had to use a different system call if they were writing in the middle of the file vs. at the end, wouldn't it?
Similarly, if multiple threads are appending vs. punching (let's assume non-overlapping regions, for sanity, like a producer/consumer model punching out completed records) then using ftruncate() to remove the last record and shrink the file would require locking the whole file from userspace (unlike the append, which does this in the kernel), or risk discarding unprocessed data beyond the record that was punched out.
> There's plenty of open questions about the interface if we allow
> hole punching to change the file size. e.g. where do we set the EOF
> (offset or offset+len)?
I would think it natural that the new size is the start of the region, like an "anti-write" (where write sets the size at the end of the added bytes).
> What do we do with the rest of the blocks that are now beyond EOF?
> We weren't asked to punch them out, so do we leave them behind?
I definitely think they should be left as is. If they were in the punched-out range, they would be deallocated, and if they are beyond EOF they will remain as they are - we didn't ask to remove them unless the punched-out range went to ~0ULL (which would make it equivalent to an ftruncate()).
> What if we are leaving written blocks beyond EOF - does any filesystem other than XFS support that (i.e. are we introducing different behaviour on different filesystems)?
I'm not sure I understand what a "written block beyond EOF" means. How can there be data beyond EOF? I think the KEEP_SIZE flag is only relevant if the punch is spanning EOF, like the opposite of a write that is spanning EOF. If KEEP_SIZE is set, then it leaves the size unchanged, and if unset and punch spans EOF it reduces the file size. If the punch is not at EOF it doesn't change the file size, just like a write that is not at EOF.
> And what happens if the offset is beyond EOF? Do we extend the file, and if so why wouldn't you just use ftruncate() instead?
Even if the effects were the same, it makes sense because applications may be using fallocate(PUNCH_HOLE) to punch out records, and having them special case the use of ftruncate() to get certain semantics at the end of the file adds needless complexity.
Cheers, Andreas
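The size semantics proposed in the message above can be modelled in a few lines. This is only a sketch of that proposal, not of anything in the patch series: size_after_punch() is a hypothetical helper, and leaving the size unchanged when the offset is beyond EOF is an assumption, since the message does not settle that case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the proposed semantics: returns the file size
 * after punching [offset, offset + len). Assumes offset and len are
 * non-negative and that offset + len does not overflow.
 */
int64_t size_after_punch(int64_t size, int64_t offset, int64_t len,
			 bool keep_size)
{
	/* KEEP_SIZE set: the size never changes. */
	if (keep_size)
		return size;

	/*
	 * Punch spans EOF: the new size is the start of the region,
	 * the opposite of a write that extends the file.
	 */
	if (offset < size && offset + len >= size)
		return offset;

	/* Punch entirely inside the file, or (assumed) entirely beyond EOF. */
	return size;
}
```

For a 100-byte file, punching 20 bytes at offset 80 without KEEP_SIZE would give a new size of 80, while punching 20 bytes at offset 10 leaves the size at 100.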
On 2010-11-16, at 20:34, Josef Bacik wrote:
> FWIW I agree with Dave, the only question at this point is do we force users to specify KEEP_SIZE with PUNCH_HOLE? On one hand it makes the interface a bit more consistent, on the other hand it makes the documentation a little weird
>
> "We have mode here, but if you want to use PUNCH_HOLE you also have to specify KEEP_SIZE, so really it's like a flags field it's just named poorly"
Even if this is the case, and we decide today that PUNCH_HOLE without KEEP_SIZE is not desirable to implement, it would be better to just return -EOPNOTSUPP when PUNCH_HOLE is passed without KEEP_SIZE than to assume one or the other is what the user wanted. That leaves room to implement this in the future without breaking every application, whereas if KEEP_SIZE is assumed to be always implicit there will never be a way to add that functionality without something awful like a separate CHANGE_SIZE flag for PUNCH_HOLE.
One option is to define FALLOC_FL_PUNCH_HOLE as 0x3 (so that KEEP_SIZE is always passed) and in the future we can define some new flag name like TRUNCATE_HOLE (or whatever) that is 0x2 only.
Cheers, Andreas
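The encoding suggested above could look roughly like this. The EX_ names are hypothetical and prefixed to make clear they are not the definitions from the series. TRUNCATE_HOLE is only the placeholder name floated in the message, and the two helper functions are illustrative.

```c
/* Hypothetical encoding, following the 0x3 / 0x2 suggestion. */
#define EX_FALLOC_FL_KEEP_SIZE     0x1
#define EX_FALLOC_FL_PUNCH_HOLE    0x3	/* punch bit plus KEEP_SIZE bit */
#define EX_FALLOC_FL_TRUNCATE_HOLE 0x2	/* possible future flag: punch bit only */

/* Non-zero if the mode asks for blocks to be deallocated. */
int ex_is_punch(int mode)
{
	return (mode & EX_FALLOC_FL_TRUNCATE_HOLE) != 0;
}

/* Non-zero if the mode asks for the file size to stay unchanged. */
int ex_keeps_size(int mode)
{
	return (mode & EX_FALLOC_FL_KEEP_SIZE) != 0;
}
```

Because PUNCH_HOLE already carries the KEEP_SIZE bit, applications written today keep their meaning even if a size-changing variant is added later under the 0x2 value.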
On Mon, Nov 15, 2010 at 12:05:20PM -0500, Josef Bacik wrote:
> This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch
> flag in fallocate. I didn't test it, but it seems simple enough. Thanks,
>
> Signed-off-by: Josef Bacik <[email protected]>
Seems reasonable to me.
Acked-by: Joel Becker <[email protected]>
Joel
> ---
> fs/ocfs2/file.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 77b4c04..181ae52 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1992,6 +1992,7 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset,
> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> struct ocfs2_space_resv sr;
> int change_size = 1;
> + int cmd = OCFS2_IOC_RESVSP64;
>
> if (!ocfs2_writes_unwritten_extents(osb))
> return -EOPNOTSUPP;
> @@ -2002,12 +2003,17 @@ static long ocfs2_fallocate(struct inode *inode, int mode, loff_t offset,
> if (mode & FALLOC_FL_KEEP_SIZE)
> change_size = 0;
>
> + if (mode & FALLOC_FL_PUNCH_HOLE) {
> + cmd = OCFS2_IOC_UNRESVSP64;
> + change_size = 0;
> + }
> +
> sr.l_whence = 0;
> sr.l_start = (s64)offset;
> sr.l_len = (s64)len;
>
> - return __ocfs2_change_file_space(NULL, inode, offset,
> - OCFS2_IOC_RESVSP64, &sr, change_size);
> + return __ocfs2_change_file_space(NULL, inode, offset, cmd, &sr,
> + change_size);
> }
>
> int ocfs2_check_range_for_refcount(struct inode *inode, loff_t pos,
> --
> 1.6.6.1
>
--
"Born under a bad sign.
I been down since I began to crawl.
If it wasn't for bad luck,
I wouldn't have no luck at all."
Joel Becker
Senior Development Manager
Oracle
E-mail: [email protected]
Phone: (650) 506-8127