2012-01-12 01:02:33

by Hugh Dickins

Subject: punch-hole should go beyond i_size

Hi Allison,

In thinking about fallocate() on tmpfs, I cross-check with ext4
and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:

rm -f temp
fallocate -l 4096 temp
du temp # shows 4, right
fallocate -p -l 4096 temp
du temp # shows 0, right
rm -f temp
fallocate -n -l 4096 temp
du temp # shows 4, right
fallocate -p -l 4096 temp
du temp # shows 4, wrong
rm temp

ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
early return, and trimming to i_size below, but forgets that the other
variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
blocks beyond i_size. They can be removed with ftruncate(), but it is
unexpected for fallocate() not to undo its own work, and xfs does so.
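
The second half of the sequence above can also be driven straight from
fallocate(2). The following is only a sketch under assumptions: the helper
names allocated_blocks() and blocks_after_keep_size_punch() are made up for
illustration, and the mapping of "fallocate -n" to FALLOC_FL_KEEP_SIZE and
of "fallocate -p" to FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE is taken
from the fallocate(1) and fallocate(2) manual pages, not from this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for fallocate() in <fcntl.h> */
#endif
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_KEEP_SIZE, FALLOC_FL_PUNCH_HOLE */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 512-byte blocks allocated to fd, or -1 on error. */
long long allocated_blocks(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks;
}

/*
 * Hypothetical helper: preallocate len bytes with KEEP_SIZE (so i_size
 * stays 0), punch the same range, and return how many blocks are still
 * allocated. Returns -1 if any step fails, for example on a filesystem
 * that does not support these fallocate() modes.
 */
long long blocks_after_keep_size_punch(const char *path, off_t len)
{
    long long left = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) == 0 &&
        fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, len) == 0)
        left = allocated_blocks(fd);
    close(fd);
    unlink(path);
    return left;
}
```

Calling blocks_after_keep_size_punch("temp", 4096) should return 0 where
punching beyond i_size works. A non-zero result would correspond to the
"shows 4, wrong" case above.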

Hugh


2012-01-12 02:55:49

by Dave Chinner

Subject: Re: punch-hole should go beyond i_size

On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
> Hi Allison,
>
> In thinking about fallocate() on tmpfs, I cross-check with ext4
> and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
>
> rm -f temp
> fallocate -l 4096 temp
> du temp # shows 4, right
> fallocate -p -l 4096 temp
> du temp # shows 0, right
> rm -f temp
> fallocate -n -l 4096 temp
> du temp # shows 4, right
> fallocate -p -l 4096 temp
> du temp # shows 4, wrong
> rm temp
>
> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
> early return, and trimming to i_size below, but forgets that the other
> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
> blocks beyond i_size. They can be removed with ftruncate(), but it is
> unexpected for fallocate() not to undo its own work, and xfs does so.

I'm pretty sure that's a bug as XFS allows punching holes in extents
beyond EOF.

Cheers,

Dave.
--
Dave Chinner

2012-01-12 16:27:23

by Allison Henderson

Subject: Re: punch-hole should go beyond i_size

On 01/11/2012 07:55 PM, Dave Chinner wrote:
> On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
>> Hi Allison,
>>
>> In thinking about fallocate() on tmpfs, I cross-check with ext4
>> and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
>>
>> rm -f temp
>> fallocate -l 4096 temp
>> du temp # shows 4, right
>> fallocate -p -l 4096 temp
>> du temp # shows 0, right
>> rm -f temp
>> fallocate -n -l 4096 temp
>> du temp # shows 4, right
>> fallocate -p -l 4096 temp
>> du temp # shows 4, wrong
>> rm temp
>>
>> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
>> early return, and trimming to i_size below, but forgets that the other
>> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
>> blocks beyond i_size. They can be removed with ftruncate(), but it is
>> unexpected for fallocate() not to undo its own work, and xfs does so.
>
> I'm pretty sure that's a bug as XFS allows punching holes in extents
> beyond EOF.
>
> Cheers,
>
> Dave.

Oh I see, I'll take a look at it, I think it will be ok to just take out
the early return. Thx!
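
A toy model of the two checks Hugh describes may help when reasoning about
the fix. It is not the ext4 code: struct punch_range, clamp_to_i_size() and
full_range() are hypothetical names, and the model only mirrors the "early
return" and the "trimming to i_size" mentioned in the report.

```c
#include <stdint.h>

/* Half-open byte range [start, end) that a punch would actually cover. */
struct punch_range {
    int64_t start;
    int64_t end;
};

/*
 * Model of the reported behaviour: nothing is punched when the range
 * starts at or beyond i_size, and the end is trimmed to i_size otherwise.
 */
struct punch_range clamp_to_i_size(int64_t offset, int64_t len, int64_t i_size)
{
    struct punch_range r = { offset, offset };  /* empty range by default */

    if (offset >= i_size)
        return r;               /* "No need to punch hole beyond i_size" */
    r.end = offset + len;
    if (r.end > i_size)
        r.end = i_size;         /* trimming to i_size */
    return r;
}

/* Model of the requested behaviour: honour the caller's range as given. */
struct punch_range full_range(int64_t offset, int64_t len)
{
    struct punch_range r = { offset, offset + len };

    return r;
}

/* Number of bytes covered by a range. */
int64_t range_bytes(struct punch_range r)
{
    return r.end - r.start;
}
```

For the failing case (offset 0, length 4096, i_size 0) the first function
yields an empty range and the second yields 4096 bytes. In this model the
trimming step would also have to go for a range that starts inside i_size
but extends past it.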

Allison


2012-01-13 00:21:50

by Hugh Dickins

Subject: Re: punch-hole should go beyond i_size

On Thu, 12 Jan 2012, Allison Henderson wrote:
> On 01/11/2012 07:55 PM, Dave Chinner wrote:
> > On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
> > >
> > > ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
> > > early return, and trimming to i_size below, but forgets that the other
> > > variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
> > > blocks beyond i_size. They can be removed with ftruncate(), but it is
> > > unexpected for fallocate() not to undo its own work, and xfs does so.
> >
> > I'm pretty sure that's a bug as XFS allows punching holes in extents
> > beyond EOF.
>
> Oh I see, I'll take a look at it, I think it will be ok to just take out the
> early return. Thx!

Thanks. And I've just noticed another, very easily fixed, error:
I believe those -ENOTSUPPs in ext4_punch_hole() should be -EOPNOTSUPPs.
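
For context, a minimal userspace sketch of why the distinction matters.
ENOTSUPP is a kernel-internal code (524 in include/linux/errno.h) with no
definition in userspace <errno.h>, whereas EOPNOTSUPP is the code callers
can actually test for. KERNEL_ENOTSUPP and is_unsupported_op() are names
invented for this illustration.

```c
#include <errno.h>

/* Kernel-internal value; userspace <errno.h> does not define ENOTSUPP. */
#define KERNEL_ENOTSUPP 524

/*
 * Hypothetical helper: returns 1 if err is the code a userspace caller
 * would check for after an unsupported fallocate() mode, 0 otherwise.
 */
int is_unsupported_op(int err)
{
    return err == EOPNOTSUPP;
}
```

A caller written as "if (errno == EOPNOTSUPP) fall back" would miss a
leaked 524, which is the practical reason to return -EOPNOTSUPP.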

Hugh

2012-01-13 03:18:39

by Allison Henderson

Subject: Re: punch-hole should go beyond i_size

On 01/12/2012 05:21 PM, Hugh Dickins wrote:
> On Thu, 12 Jan 2012, Allison Henderson wrote:
>> On 01/11/2012 07:55 PM, Dave Chinner wrote:
>>> On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
>>>>
>>>> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
>>>> early return, and trimming to i_size below, but forgets that the other
>>>> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
>>>> blocks beyond i_size. They can be removed with ftruncate(), but it is
>>>> unexpected for fallocate() not to undo its own work, and xfs does so.
>>>
>>> I'm pretty sure that's a bug as XFS allows punching holes in extents
>>> beyond EOF.
>>
>> Oh I see, I'll take a look at it, I think it will be ok to just take out the
>> early return. Thx!
>
> Thanks. And I've just noticed another, very easily fixed, error:
> I believe those -ENOTSUPPs in ext4_punch_hole() should be -EOPNOTSUPPs.
>
> Hugh
>
Ah, I think you're right, I will change the error values. I have the
current solution running in a test right now, and will post the patch
when it comes up clean. Thx!

Allison Henderson


2012-05-13 21:13:46

by Hugh Dickins

Subject: Re: punch-hole should go beyond i_size

On Thu, 12 Jan 2012, Allison Henderson wrote:
> On 01/11/2012 07:55 PM, Dave Chinner wrote:
> > On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
> > > Hi Allison,
> > >
> > > In thinking about fallocate() on tmpfs, I cross-check with ext4
> > > and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
> > >
> > > rm -f temp
> > > fallocate -l 4096 temp
> > > du temp # shows 4, right
> > > fallocate -p -l 4096 temp
> > > du temp # shows 0, right
> > > rm -f temp
> > > fallocate -n -l 4096 temp
> > > du temp # shows 4, right
> > > fallocate -p -l 4096 temp
> > > du temp # shows 4, wrong
> > > rm temp
> > >
> > > ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
> > > early return, and trimming to i_size below, but forgets that the other
> > > variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
> > > blocks beyond i_size. They can be removed with ftruncate(), but it is
> > > unexpected for fallocate() not to undo its own work, and xfs does so.
> >
> > I'm pretty sure that's a bug as XFS allows punching holes in extents
> > beyond EOF.
> >
> > Cheers,
> >
> > Dave.
>
> Oh I see, I'll take a look at it, I think it will be ok to just take out the
> early return. Thx!

I see the -EOPNOTSUPPs have gone into 3.4's ext4_punch_hole() - thanks -
but the i_size issue remains unfixed. I wouldn't be surprised if it were
more complicated than you had hoped - I had no intention of trying a patch
myself! It's not an actual problem for me, but I thought I'd just send a
reminder, before I move out of the hole-punching business.

Hugh

2012-05-15 21:37:58

by Allison Henderson

Subject: Re: punch-hole should go beyond i_size

On 05/13/2012 02:13 PM, Hugh Dickins wrote:
> On Thu, 12 Jan 2012, Allison Henderson wrote:
>> On 01/11/2012 07:55 PM, Dave Chinner wrote:
>>> On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
>>>> Hi Allison,
>>>>
>>>> In thinking about fallocate() on tmpfs, I cross-check with ext4
>>>> and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
>>>>
>>>> rm -f temp
>>>> fallocate -l 4096 temp
>>>> du temp # shows 4, right
>>>> fallocate -p -l 4096 temp
>>>> du temp # shows 0, right
>>>> rm -f temp
>>>> fallocate -n -l 4096 temp
>>>> du temp # shows 4, right
>>>> fallocate -p -l 4096 temp
>>>> du temp # shows 4, wrong
>>>> rm temp
>>>>
>>>> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
>>>> early return, and trimming to i_size below, but forgets that the other
>>>> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
>>>> blocks beyond i_size. They can be removed with ftruncate(), but it is
>>>> unexpected for fallocate() not to undo its own work, and xfs does so.
>>>
>>> I'm pretty sure that's a bug as XFS allows punching holes in extents
>>> beyond EOF.
>>>
>>> Cheers,
>>>
>>> Dave.
>>
>> Oh I see, I'll take a look at it, I think it will be ok to just take out the
>> early return. Thx!
>
> I see the -EOPNOTSUPPs have gone into 3.4's ext4_punch_hole() - thanks -
> but the i_size issue remains unfixed. I wouldn't be surprised if it were
> more complicated than you had hoped - I had no intention of trying a patch
> myself! It's not an actual problem for me, but I thought I'd just send a
> reminder, before I move out of the hole-punching business.
>
> Hugh

Hi all,

I had a fix for this a while ago and I believe Lukas had rebased it
when he was working on some punch hole optimizations, but I'm not sure
what happened to it after that. I think Lukas might still be working
on that set? If not, I can take a peek at it again and see if I can
get it updated and resent. Thx!

Allison Henderson


2012-05-15 22:38:57

by Hugh Dickins

Subject: Re: punch-hole should go beyond i_size

On Tue, 15 May 2012, Allison Henderson wrote:
> On 05/13/2012 02:13 PM, Hugh Dickins wrote:
> > On Thu, 12 Jan 2012, Allison Henderson wrote:
> >> On 01/11/2012 07:55 PM, Dave Chinner wrote:
> >>> On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
> >>>> Hi Allison,
> >>>>
> >>>> In thinking about fallocate() on tmpfs, I cross-check with ext4
> >>>> and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
> >>>>
> >>>> rm -f temp
> >>>> fallocate -l 4096 temp
> >>>> du temp # shows 4, right
> >>>> fallocate -p -l 4096 temp
> >>>> du temp # shows 0, right
> >>>> rm -f temp
> >>>> fallocate -n -l 4096 temp
> >>>> du temp # shows 4, right
> >>>> fallocate -p -l 4096 temp
> >>>> du temp # shows 4, wrong
> >>>> rm temp
> >>>>
> >>>> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
> >>>> early return, and trimming to i_size below, but forgets that the other
> >>>> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
> >>>> blocks beyond i_size. They can be removed with ftruncate(), but it is
> >>>> unexpected for fallocate() not to undo its own work, and xfs does so.
> >>>
> >>> I'm pretty sure that's a bug as XFS allows punching holes in extents
> >>> beyond EOF.
> >>>
> >>> Cheers,
> >>>
> >>> Dave.
> >>
> >> Oh I see, I'll take a look at it, I think it will be ok to just take out the
> >> early return. Thx!
> >
> > I see the -EOPNOTSUPPs have gone into 3.4's ext4_punch_hole() - thanks -
> > but the i_size issue remains unfixed. I wouldn't be surprised if it were
> > more complicated than you had hoped - I had no intention of trying a patch
> > myself! It's not an actual problem for me, but I thought I'd just send a
> > reminder, before I move out of the hole-punching business.
>
> Hi all,
>
> I had a fix for this a while ago and I believe Lukas had rebased it
> when he was working on some punch hole optimizations, but Im not sure
> what happened to it after that. I think Lukas might still be working
> on that set? If not, I can take a peek at it again and see if I can
> get it updated and resent. Thx!
>
> Allison Henderson

Thanks, Allison. I just added Jan to the Cc list to make sure he sees,
since we mentioned this in the inode_dio_wait thread (which I skilfully
directed to an almost disjoint set of addressees - though I expect he
already saw via linux-ext4).

Hugh

2012-05-16 06:15:42

by Lukas Czerner

Subject: Re: punch-hole should go beyond i_size

On Tue, 15 May 2012, Hugh Dickins wrote:

> Date: Tue, 15 May 2012 15:38:33 -0700 (PDT)
> From: Hugh Dickins <[email protected]>
> To: Allison Henderson <[email protected]>
> Cc: Jan Kara <[email protected]>, Dave Chinner <[email protected]>,
> Theodore Ts'o <[email protected]>, [email protected],
> Lukas Czerner <[email protected]>
> Subject: Re: punch-hole should go beyond i_size
>
> On Tue, 15 May 2012, Allison Henderson wrote:
> > On 05/13/2012 02:13 PM, Hugh Dickins wrote:
> > > On Thu, 12 Jan 2012, Allison Henderson wrote:
> > >> On 01/11/2012 07:55 PM, Dave Chinner wrote:
> > >>> On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
> > >>>> Hi Allison,
> > >>>>
> > >>>> In thinking about fallocate() on tmpfs, I cross-check with ext4
> > >>>> and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
> > >>>>
> > >>>> rm -f temp
> > >>>> fallocate -l 4096 temp
> > >>>> du temp # shows 4, right
> > >>>> fallocate -p -l 4096 temp
> > >>>> du temp # shows 0, right
> > >>>> rm -f temp
> > >>>> fallocate -n -l 4096 temp
> > >>>> du temp # shows 4, right
> > >>>> fallocate -p -l 4096 temp
> > >>>> du temp # shows 4, wrong
> > >>>> rm temp
> > >>>>
> > >>>> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
> > >>>> early return, and trimming to i_size below, but forgets that the other
> > >>>> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
> > >>>> blocks beyond i_size. They can be removed with ftruncate(), but it is
> > >>>> unexpected for fallocate() not to undo its own work, and xfs does so.
> > >>>
> > >>> I'm pretty sure that's a bug as XFS allows punching holes in extents
> > >>> beyond EOF.
> > >>>
> > >>> Cheers,
> > >>>
> > >>> Dave.
> > >>
> > >> Oh I see, I'll take a look at it, I think it will be ok to just take out the
> > >> early return. Thx!
> > >
> > > I see the -EOPNOTSUPPs have gone into 3.4's ext4_punch_hole() - thanks -
> > > but the i_size issue remains unfixed. I wouldn't be surprised if it were
> > > more complicated than you had hoped - I had no intention of trying a patch
> > > myself! It's not an actual problem for me, but I thought I'd just send a
> > > reminder, before I move out of the hole-punching business.
> >
> > Hi all,
> >
> > I had a fix for this a while ago and I believe Lukas had rebased it
> > when he was working on some punch hole optimizations, but Im not sure
> > what happened to it after that. I think Lukas might still be working
> > on that set? If not, I can take a peek at it again and see if I can
> > get it updated and resent. Thx!
> >
> > Allison Henderson
>
> Thanks, Allison. I just added Jan to the Cc list to make sure he sees,
> since we mentioned this in the inode_dio_wait thread (which I skilfully
> directed to an almost disjoint set of addressees - though I expect he
> already saw via linux-ext4).
>
> Hugh

Yes, we talked about this issue at LSF with Ted and the conclusion
was that we want to wait for the range locks to be ready. That way
we can avoid taking i_mutex for the punch hole when punching beyond
i_size, which we would have to do otherwise.

I am not sure how big an issue this is, probably not so big. If
we cannot wait for the range locks, I can make a patch with i_mutex
protection.

Thanks!
-Lukas

2012-05-16 18:09:54

by Hugh Dickins

Subject: Re: punch-hole should go beyond i_size

On Tue, May 15, 2012 at 11:14 PM, Lukáš Czerner <[email protected]> wrote:
> On Tue, 15 May 2012, Hugh Dickins wrote:
>> Date: Tue, 15 May 2012 15:38:33 -0700 (PDT)
>> From: Hugh Dickins <[email protected]>
>> To: Allison Henderson <[email protected]>
>> Cc: Jan Kara <[email protected]>, Dave Chinner <[email protected]>,
>>     Theodore Ts'o <[email protected]>, [email protected],
>>     Lukas Czerner <[email protected]>
>> Subject: Re: punch-hole should go beyond i_size
>>
>> On Tue, 15 May 2012, Allison Henderson wrote:
>> > On 05/13/2012 02:13 PM, Hugh Dickins wrote:
>> > > On Thu, 12 Jan 2012, Allison Henderson wrote:
>> > >> On 01/11/2012 07:55 PM, Dave Chinner wrote:
>> > >>> On Wed, Jan 11, 2012 at 05:02:12PM -0800, Hugh Dickins wrote:
>> > >>>> Hi Allison,
>> > >>>>
>> > >>>> In thinking about fallocate() on tmpfs, I cross-check with ext4
>> > >>>> and find this bug in its implementation of FALLOC_FL_PUNCH_HOLE:
>> > >>>>
>> > >>>> rm -f temp
>> > >>>> fallocate    -l 4096 temp
>> > >>>> du temp                                # shows 4, right
>> > >>>> fallocate -p -l 4096 temp
>> > >>>> du temp                                # shows 0, right
>> > >>>> rm -f temp
>> > >>>> fallocate -n -l 4096 temp
>> > >>>> du temp                                # shows 4, right
>> > >>>> fallocate -p -l 4096 temp
>> > >>>> du temp                                # shows 4, wrong
>> > >>>> rm temp
>> > >>>>
>> > >>>> ext4_ext_punch_hole() contains /* No need to punch hole beyond i_size */
>> > >>>> early return, and trimming to i_size below, but forgets that the other
>> > >>>> variety of fallocate(), with FALLOC_FL_KEEP_SIZE set, may have allocated
>> > >>>> blocks beyond i_size.  They can be removed with ftruncate(), but it is
>> > >>>> unexpected for fallocate() not to undo its own work, and xfs does so.
>> > >>>
>> > >>> I'm pretty sure that's a bug as XFS allows punching holes in extents
>> > >>> beyond EOF.
>> > >>>
>> > >>> Cheers,
>> > >>>
>> > >>> Dave.
>> > >>
>> > >> Oh I see, I'll take a look at it, I think it will be ok to just take out the
>> > >> early return.  Thx!
>> > >
>> > > I see the -EOPNOTSUPPs have gone into 3.4's ext4_punch_hole() - thanks -
>> > > but the i_size issue remains unfixed.  I wouldn't be surprised if it were
>> > > more complicated than you had hoped - I had no intention of trying a patch
>> > > myself!  It's not an actual problem for me, but I thought I'd just send a
>> > > reminder, before I move out of the hole-punching business.
>> >
>> > Hi all,
>> >
>> > I had a fix for this a while ago and I believe Lukas had rebased it
>> > when he was working on some punch hole optimizations, but Im not sure
>> > what happened to it after that.  I think Lukas might still be working
>> > on that set?  If not, I can take a peek at it again and see if I can
>> > get it updated and resent.  Thx!
>> >
>> > Allison Henderson
>>
>> Thanks, Allison.  I just added Jan to the Cc list to make sure he sees,
>> since we mentioned this in the inode_dio_wait thread (which I skilfully
>> directed to an almost disjoint set of addressees - though I expect he
>> already saw via linux-ext4).
>>
>> Hugh
>
> Yes, we've been talking about this issue on LSF with Ted and the
> conclusion is that we want to wait for the range locks to be ready.
> This way we can avoid taking imutex for the punch hole when punching
> beyond isize which we would have to do otherwise.
>
> I am not sure how big of an issue this is, probably not so big. If
> we can not wait for the range locks, I can make a patch with imutex
> protection.

I agree with you, this issue is not big enough to be worth reordering
ext4 priorities and making an interim fix. I don't think it has
actually inconvenienced anyone at all, but merely came to my notice
when I was trying to work out the correct behaviour for tmpfs.

However, the issues that Jan is grappling with in "Hole punching and
mmap races" seem more serious, and may end up affecting or solving
this one too.

Hugh