2014-02-18 16:37:16

Subject: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

From: Namjae Jeon <[email protected]>

This patch series is in response to the following post:
http://lwn.net/Articles/556136/
"ext4: introduce two new ioctls"

Dave Chinner suggested that truncate_block_range
(which was the name of one of the ioctls) should be an fallocate operation
and not a fs-specific ioctl, hence we add this functionality to fallocate.

This patch series introduces new flag FALLOC_FL_COLLAPSE_RANGE for fallocate
and implements it for XFS and Ext4.

The semantics of this flag are as follows:
1) It collapses the range between "offset" and "offset + len" by removing any data
blocks present in this range and then updates the logical offsets of all
extents beyond "offset + len" to close the hole created by
removing the blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag may be combined with it.
3) Offset and length supplied to fallocate should be fs block size aligned
in case of xfs and ext4.
4) Collapse range does not work beyond i_size.
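
As an illustration of the four rules above, a userspace call could look
roughly like the sketch below. It is not part of the series. The helper name
collapse_range() is hypothetical, st_blksize is used only as a stand-in for
the fs block size (an assumption that need not hold on every filesystem),
and the fallback value 0x08 for the flag is the one found in linux/falloc.h.
Rejecting a range that reaches EOF is one reading of rule 4.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

#ifndef FALLOC_FL_COLLAPSE_RANGE
#define FALLOC_FL_COLLAPSE_RANGE 0x08	/* value from linux/falloc.h */
#endif

/*
 * Hypothetical helper: remove [offset, offset + len) from the file and
 * shift the remaining data down. Returns 0 or a negative errno value.
 */
static int collapse_range(int fd, off_t offset, off_t len)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -errno;
	assert(st.st_blksize > 0);

	/* Rule 3: offset and len must be block size aligned
	 * (assumption: st_blksize matches the fs block size). */
	if (offset % st.st_blksize || len % st.st_blksize)
		return -EINVAL;

	/* Rule 4: the range must lie inside i_size. */
	if (offset < 0 || len <= 0 || offset + len >= st.st_size)
		return -EINVAL;

	/* Rule 2: the flag is passed on its own, with no other flag. */
	if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len) < 0)
		return -errno;	/* e.g. -EOPNOTSUPP on other filesystems */

	return 0;
}
```

The pre-checks only mirror what the kernel would reject anyway; they are
there to make the rules visible, not because the caller must repeat them.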

This new functionality of collapsing a range could be used by media editing tools
which do non-linear editing to quickly purge and edit parts of a media file.
This will immensely improve the performance of these operations.
The limitation of fs block size aligned offsets can be easily handled
by media codecs which are encapsulated in a container, as they just have to
change the offset to the next keyframe value to match the proper alignment.
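
The alignment step mentioned above can be sketched as plain arithmetic. The
helper below is hypothetical and knows nothing about keyframes: it only
shrinks a requested cut [start, end) to the largest block-aligned range
inside it, which an editing tool could then match against keyframe
positions. The struct and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct cut_range {
	uint64_t offset;	/* block-aligned start of the cut */
	uint64_t len;		/* block-aligned length, 0 if nothing fits */
};

/*
 * Hypothetical helper: round start up and end down to a multiple of
 * blksize. Overflow near UINT64_MAX is not handled in this sketch.
 */
static struct cut_range align_inward(uint64_t start, uint64_t end,
				     uint64_t blksize)
{
	struct cut_range r = { 0, 0 };
	uint64_t first, last;

	assert(blksize > 0);
	first = (start + blksize - 1) / blksize * blksize;	/* round up */
	last = end / blksize * blksize;				/* round down */

	if (last > first) {
		r.offset = first;
		r.len = last - first;
	}
	return r;
}
```

With 4096-byte blocks, a requested cut of [5000, 20000) becomes offset 8192
and length 8192. A cut that does not cover one whole block yields length 0.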

Namjae Jeon (10):
fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate
xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
xfsprog: xfsio: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
xfstest: shared/001: Standard collapse range tests
xfstest: shared/002: Delayed allocation collapse range
xfstest: shared/003: Multi collapse range tests
xfstest: shared/004: Delayed allocation multi collapse
xfstest: shared/005: Test multiple fallocate collapse
manpage: update FALLOC_FL_COLLAPSE_RANGE flag in fallocate
--
1.7.11-rc0

_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs

2014-02-24 00:57:10

by Dave Chinner

Subject: Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

On Wed, Feb 19, 2014 at 01:37:16AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <[email protected]>
>
> This patch series is in response of the following post:
> http://lwn.net/Articles/556136/
> "ext4: introduce two new ioctls"
>
> Dave chinner suggested that truncate_block_range
> (which was one of the ioctls name) should be an fallocate operation
> and not any fs specific ioctl, hence we add this functionality to fallocate.
>
> This patch series introduces new flag FALLOC_FL_COLLAPSE_RANGE for fallocate
> and implements it for XFS and Ext4.
>
> The semantics of this flag are following:
> 1) It collapses the range lying between offset and length by removing any data
> blocks which are present in this range and than updates all the logical
> offsets of extents beyond "offset + len" to nullify the hole created by
> removing blocks. In short, it does not leave a hole.
> 2) It should be used exclusively. No other fallocate flag in combination.
> 3) Offset and length supplied to fallocate should be fs block size aligned
> in case of xfs and ext4.
> 4) Collaspe range does not work beyond i_size.
>
> This new functionality of collapsing range could be used by media editing tools
> which does non linear editing to quickly purge and edit parts of a media file.
> This will immensely improve the performance of these operations.
> The limitation of fs block size aligned offsets can be easily handled
> by media codecs which are encapsulated in a conatiner as they have to
> just change the offset to next keyframe value to match the proper alignment.
>
> Namjae Jeon (10):
> fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate
> xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate

I've pushed these to the following branch:

git://oss.sgi.com/xfs/xfs.git xfs-collapse-range

And so they'll be in tomorrow's linux-next tree.

> ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate

I've left this one alone for the ext4 guys to sort out.

> xfsprog: xfsio: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate

That's already in a current xfstests tree.

> xfstest: shared/001: Standard collapse range tests
> xfstest: shared/002: Delayed allocation collapse range
> xfstest: shared/003: Multi collapse range tests
> xfstest: shared/004: Delayed allocation multi collapse
> xfstest: shared/005: Test multiple fallocate collapse

These are now in the xfstests git tree.

> manpage: update FALLOC_FL_COLLAPSE_RANGE flag in fallocate

And Michael will need to review and commit that to the kernel
manpages tree.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-02-24 01:34:49

by Namjae Jeon

Subject: Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

2014-02-24 9:57 GMT+09:00, Dave Chinner <[email protected]>:
> On Wed, Feb 19, 2014 at 01:37:16AM +0900, Namjae Jeon wrote:
>> From: Namjae Jeon <[email protected]>
>>
>> This patch series is in response of the following post:
>> http://lwn.net/Articles/556136/
>> "ext4: introduce two new ioctls"
>>
>> Dave chinner suggested that truncate_block_range
>> (which was one of the ioctls name) should be an fallocate operation
>> and not any fs specific ioctl, hence we add this functionality to
>> fallocate.
>>
>> This patch series introduces new flag FALLOC_FL_COLLAPSE_RANGE for
>> fallocate
>> and implements it for XFS and Ext4.
>>
>> The semantics of this flag are following:
>> 1) It collapses the range lying between offset and length by removing any
>> data
>> blocks which are present in this range and than updates all the
>> logical
>> offsets of extents beyond "offset + len" to nullify the hole created
>> by
>> removing blocks. In short, it does not leave a hole.
>> 2) It should be used exclusively. No other fallocate flag in combination.
>> 3) Offset and length supplied to fallocate should be fs block size
>> aligned
>> in case of xfs and ext4.
>> 4) Collaspe range does not work beyond i_size.
>>
>> This new functionality of collapsing range could be used by media editing
>> tools
>> which does non linear editing to quickly purge and edit parts of a media
>> file.
>> This will immensely improve the performance of these operations.
>> The limitation of fs block size aligned offsets can be easily handled
>> by media codecs which are encapsulated in a conatiner as they have to
>> just change the offset to next keyframe value to match the proper
>> alignment.
>>
>> Namjae Jeon (10):
>> fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate
>> xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
>
> I've pushed these to the following branch:
>
> git://oss.sgi.com/xfs/xfs.git xfs-collapse-range
>
> And so they'll be in tomorrow's linux-next tree.
Okay.
>
>> ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
>
> I've left this one alone for the ext4 guys to sort out.
I will try to follow up continuously.
>
>> xfsprog: xfsio: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
>
> That's already in a current xfstests tree.
Okay.
>
>> xfstest: shared/001: Standard collapse range tests
>> xfstest: shared/002: Delayed allocation collapse range
>> xfstest: shared/003: Multi collapse range tests
>> xfstest: shared/004: Delayed allocation multi collapse
>> xfstest: shared/005: Test multiple fallocate collapse
>
> These are now in the xfstests git tree.
Okay.
>
>> manpage: update FALLOC_FL_COLLAPSE_RANGE flag in fallocate
>
> And Michael will need to review and commit that to the kernel
> manpages tree.
Okay.
Hi Michael,
Could you please review the manpage patch?

Thanks :)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

2014-02-25 03:16:01

On Wed, Feb 26, 2014 at 03:08:58PM -0800, Hugh Dickins wrote:
> On Wed, 26 Feb 2014, Dave Chinner wrote:
> > On Tue, Feb 25, 2014 at 08:45:15PM -0800, Hugh Dickins wrote:
> > > On Wed, 26 Feb 2014, Dave Chinner wrote:
> > > > On Tue, Feb 25, 2014 at 03:23:35PM -0800, Hugh Dickins wrote:
> > > >
> > > > > I should mention that when "we" implemented this thirty years ago,
> > > > > we had a strong conviction that the system call should be idempotent:
> > > > > that is, the len argument should indicate the final i_size, not the
> > > > > amount being removed from it. Now, I don't remember the grounds for
> > > > > that conviction: maybe it was just an idealistic preference for how
> > > > > to design a good system call. I can certainly see that defining it
> > > > > that way round would surprise many app programmers. Just mentioning
> > > > > this in case anyone on these lists sees a practical advantage to
> > > > > doing it that way instead.
> > > >
> > > > I don't see how specifying the end file size is an improvement. What
> > > > happens if you collapse a range in a file that is still being
> > > > appended to by the application and so you race with a file size
> > > > update? IOWs, with such an API the range to be collapsed is
> > > > completely unpredictable, and IMO that's a fundamentally broken API.
> > >
> > > That's fine if you don't see the idempotent API as an improvement,
> > > I just wanted to put it on the table in case someone does see an
> > > advantage to it. But I think I'm missing something in your race
> > > example: I don't see a difference between the two APIs there.
> >
> >
> > Userspace can't sample the inode size via stat(2) and then use the value for a
> > syscall atomically. i.e. if you specify the offset you want to
> > collapse at, and the file size you want to have to define the region
> > to collapse, then the length you need to collapse is (current inode
> > size - end file size). If "current inode size" can change between
> > the stat(2) and fallocate() call (and it can), then the length being
> > collapsed is indeterminate....
>
> Thanks for explaining more, I was just about to acknowledge what a good
> example that is. Indeed, it seems not unreasonable to be editing the
> earlier part of a file while the later part of it is still streaming in.
>
> But damn, it now occurs to me that there's still a problem at the
> streaming end: its file write offset won't be updated to reflect
> the collapse, so there would be a sparse hole at that end. And
> collapse returns -EPERM if IS_APPEND(inode).

Well, we figure that most applications won't be using append only
inode flags for files that they know they want to edit at random
offsets later on. ;)

However, I can see how DVR apps would use open(O_APPEND) to obtain
the fd they write to because that sets the write position to the EOF
on every write() call (i.e. in generic_write_checks()). And collapse
range should behave sanely with this sort of usage.

e.g. XFS calls generic_write_checks() after it has taken the IO lock
to set the current write position to EOF. Hence it will be correctly
serialised against collapse range calls and so O_APPEND writes will
not leave sparse holes if collapse range calls are interleaved with
the write stream....
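
The O_APPEND behaviour described above can be observed from userspace with a
small sketch. It is hedged in two ways: ftruncate() is used purely as a
portable stand-in for a collapse that shrinks the file behind the writer,
and the function name and path argument are hypothetical. It says nothing
about the XFS locking itself, only about where an O_APPEND write lands.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: write through an O_APPEND fd, shrink the file,
 * write again. Returns the final file size, or -1 on error.
 */
static off_t append_after_shrink(const char *path)
{
	struct stat st;
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY | O_APPEND, 0600);
	if (fd < 0)
		return -1;

	if (write(fd, "0123456789", 10) != 10)	/* i_size = 10 */
		goto fail;
	if (ftruncate(fd, 4) < 0)		/* stand-in for a collapse */
		goto fail;
	if (write(fd, "ab", 2) != 2)		/* O_APPEND: goes to EOF (4) */
		goto fail;
	if (fstat(fd, &st) < 0)
		goto fail;

	close(fd);
	unlink(path);
	return st.st_size;	/* no hole between offset 4 and 10 */
fail:
	close(fd);
	unlink(path);
	return -1;
}
```

Without O_APPEND the second write would go to the stale file position 10
and leave a sparse hole, which is the case Hugh was worried about.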

Cheers,

Dave.

--
Dave Chinner
[email protected]

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2014-02-27 01:30:35

by Hugh Dickins

Subject: Re: [PATCH v5 0/10] fs: Introduce new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocate

On Thu, 27 Feb 2014, Dave Chinner wrote:
> On Wed, Feb 26, 2014 at 03:08:58PM -0800, Hugh Dickins wrote:
> >
> > Thanks for explaining more, I was just about to acknowledge what a good
> > example that is. Indeed, it seems not unreasonable to be editing the
> > earlier part of a file while the later part of it is still streaming in.
> >
> > But damn, it now occurs to me that there's still a problem at the
> > streaming end: its file write offset won't be updated to reflect
> > the collapse, so there would be a sparse hole at that end. And
> > collapse returns -EPERM if IS_APPEND(inode).
>
> Well, we figure that most applications won't be using append only
> inode flags for files that they know they want to edit at random
> offsets later on. ;)
>
> However, I can see how DVR apps would use open(O_APPEND) to obtain
> the fd they write to because that sets the write position to the EOF
> on every write() call (i.e. in generic_write_checks()). And collapse
> range should behave sanely with this sort of usage.
>
> e.g. XFS calls generic_write_checks() after it has taken the IO lock
> to set the current write position to EOF. Hence it will be correctly
> serialised against collapse range calls and so O_APPEND writes will
> not leave sparse holes if collapse range calls are interleaved with
> the write stream....

Right, I was getting confused between O_APPEND and APPEND_Only!
Thanks, I'm back to being convinced by your example.
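
For readers following along, the race in that example can be put into
numbers. The sketch below models the hypothetical "final size" API only as
arithmetic: the function name is an assumption, and no such syscall
interface exists in the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: with an API that takes the desired final size, the
 * length actually removed depends on i_size at the time of the call.
 */
static int64_t implied_collapse_len(int64_t i_size_at_call,
				    int64_t final_size)
{
	assert(final_size <= i_size_at_call);
	return i_size_at_call - final_size;
}
```

Say stat(2) reports 1048576 bytes and the application wants to drop 65536,
so it asks for a final size of 983040. If a writer appends 8192 bytes before
the call is made, i_size is 1056768 and the call would remove 73728 bytes
instead of 65536. Passing the length directly, as the series does, avoids
this dependence on i_size.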

Hugh
