In ext4_zero_range(), removing a file's entire block range from the
extent status tree removes all records of that file's delalloc extents.
The delalloc accounting code uses this information, and its loss can
then lead to accounting errors and kernel warnings at writeback time and
subsequent file system damage. This is most noticeable on bigalloc
file systems where code in ext4_ext_map_blocks() handles cases where
delalloc extents share clusters with a newly allocated extent.

Because we're not deleting a block range and are correctly updating the
status of its associated extent, there is no need to remove anything
from the extent status tree.

When this patch is combined with an unrelated bug fix for
ext4_zero_range(), kernel warnings and e2fsck errors reported during
xfstests runs on bigalloc filesystems are greatly reduced without
introducing regressions on other xfstests-bld test scenarios.

Signed-off-by: Eric Whitney <[email protected]>
---
 fs/ext4/extents.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bed4308..c187cc3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4847,19 +4847,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 					     flags, mode);
 		if (ret)
 			goto out_dio;
-		/*
-		 * Remove entire range from the extent status tree.
-		 *
-		 * ext4_es_remove_extent(inode, lblk, max_blocks) is
-		 * NOT sufficient. I'm not sure why this is the case,
-		 * but let's be conservative and remove the extent
-		 * status tree for the entire inode. There should be
-		 * no outstanding delalloc extents thanks to the
-		 * filemap_write_and_wait_range() call above.
-		 */
-		ret = ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
-		if (ret)
-			goto out_dio;
 	}
 	if (!partial_begin && !partial_end)
 		goto out_dio;
--
2.1.0

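
For readers following the thread, the failure mode described in the
commit message can be sketched with a toy user-space model. This is not
kernel code: struct toy_extent, toy_es_remove(), toy_delayed_blocks(),
toy_delayed_after_zero_range(), and TOY_MAX_BLOCKS are hypothetical names
made up for illustration. The real extent status tree is more involved
(for example, it splits partially overlapping entries, which this sketch
does not attempt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EXT_MAX_BLOCKS; illustration only. */
#define TOY_MAX_BLOCKS 0xffffffffu

/* Toy cache entry; NOT the kernel's struct extent_status. */
struct toy_extent {
	unsigned int lblk;	/* first logical block */
	unsigned int len;	/* length in blocks */
	bool delayed;		/* true for a delalloc extent */
	bool present;		/* false once dropped from the toy cache */
};

/*
 * Drop every entry overlapping [lblk, lblk + len). Simplification: whole
 * entries are dropped, with no splitting of partial overlaps.
 */
static void toy_es_remove(struct toy_extent *es, size_t n,
			  unsigned int lblk, unsigned int len)
{
	unsigned long long end = (unsigned long long)lblk + len;

	assert(es != NULL);
	for (size_t i = 0; i < n; i++) {
		unsigned long long es_end =
			(unsigned long long)es[i].lblk + es[i].len;

		if (es[i].present && es[i].lblk < end && es_end > lblk)
			es[i].present = false;
	}
}

/* Count the delalloc blocks the toy cache still knows about. */
static unsigned int toy_delayed_blocks(const struct toy_extent *es, size_t n)
{
	unsigned int total = 0;

	assert(es != NULL);
	for (size_t i = 0; i < n; i++)
		if (es[i].present && es[i].delayed)
			total += es[i].len;
	return total;
}

/*
 * Model zeroing blocks [0, 8) of a file that also has an unrelated
 * delalloc extent at [16, 20). When remove_whole_inode is true, mimic the
 * removed code by dropping the cache for the entire inode; otherwise
 * leave the cache alone, as the patch does.
 */
unsigned int toy_delayed_after_zero_range(bool remove_whole_inode)
{
	struct toy_extent es[] = {
		{ .lblk = 0,  .len = 8, .delayed = false, .present = true },
		{ .lblk = 16, .len = 4, .delayed = true,  .present = true },
	};
	size_t n = sizeof(es) / sizeof(es[0]);

	if (remove_whole_inode)
		toy_es_remove(es, n, 0, TOY_MAX_BLOCKS);

	return toy_delayed_blocks(es, n);
}
```

In this model, toy_delayed_after_zero_range(true) returns 0 because the
whole-inode removal also drops the delalloc extent outside the zeroed
range, while toy_delayed_after_zero_range(false) returns 4 because the
delalloc record survives for later accounting.
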
On Fri, 20 Mar 2015, Eric Whitney wrote:
> Date: Fri, 20 Mar 2015 19:53:50 -0400
> From: Eric Whitney <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: [PATCH] ext4: fix loss of delalloc extent info in ext4_zero_range()
>
> In ext4_zero_range(), removing a file's entire block range from the
> extent status tree removes all records of that file's delalloc extents.
> The delalloc accounting code uses this information, and its loss can
> then lead to accounting errors and kernel warnings at writeback time and
> subsequent file system damage. This is most noticeable on bigalloc
> file systems where code in ext4_ext_map_blocks() handles cases where
> delalloc extents share clusters with a newly allocated extent.
>
> Because we're not deleting a block range and are correctly updating the
> status of its associated extent, there is no need to remove anything
> from the extent status tree.
>
> When this patch is combined with an unrelated bug fix for
> ext4_zero_range(), kernel warnings and e2fsck errors reported during
> xfstests runs on bigalloc filesystems are greatly reduced without
> introducing regressions on other xfstests-bld test scenarios.

Ah, this is my bad sorry. I didn't realize that we're actually
relying on the delayed extent information in the extent status tree
now.

However I remember that I've seen some problems when this extent
removal was not there (see the comment you removed). I am not
entirely sure anymore what it was all about, but I need to retest
with your patch.

Thanks!
-Lukas

* Lukáš Czerner <[email protected]>:
> Ah, this is my bad sorry. I didn't realize that we're actually
> relying on the delayed extent information in the extent status tree
> now.
>
> However I remember that I've seen some problems when this extent
> removal was not there (see the comment you removed). I am not
> entirely sure anymore what it was all about, but I need to retest
> with your patch.
>

Hi Lukas:

For a little more context, the unrelated bug fix I referenced is in fact
your pending zero_range fix. Without it, xfstests-bld runs on a kernel
with this patch applied will report test failures for generic/091 pretty
consistently and for generic/127 occasionally on various test scenarios
(including 4k, ext4, bigalloc, etc.). With your patch, things look pretty
good to me. (It's still possible to encounter occasional kernel warnings
related to delalloc reserved space accounting during bigalloc runs, but I
think those are related to another area I'm working on.)

We don't appear to have a record of the problems mentioned in the comment
I removed, and Ted doesn't recall what he might have seen. Since that
code was introduced a couple of releases ago, I'm speculating he might
possibly have been seeing the xfstests failures described above before you
posted your patch.

Any test results you can share would be very welcome - taken together, these
two patches address a bunch of test failures and kernel warning noise on
bigalloc. There are developers who want to do some ext4 SMR work leveraging
bigalloc right now, so if we can clear that up it will make things easier for
them.

Thanks,
Eric

On Fri, Mar 20, 2015 at 07:53:50PM -0400, Eric Whitney wrote:
> In ext4_zero_range(), removing a file's entire block range from the
> extent status tree removes all records of that file's delalloc extents.
> The delalloc accounting code uses this information, and its loss can
> then lead to accounting errors and kernel warnings at writeback time and
> subsequent file system damage. This is most noticeable on bigalloc
> file systems where code in ext4_ext_map_blocks() handles cases where
> delalloc extents share clusters with a newly allocated extent.
>
> Because we're not deleting a block range and are correctly updating the
> status of its associated extent, there is no need to remove anything
> from the extent status tree.
>
> When this patch is combined with an unrelated bug fix for
> ext4_zero_range(), kernel warnings and e2fsck errors reported during
> xfstests runs on bigalloc filesystems are greatly reduced without
> introducing regressions on other xfstests-bld test scenarios.
>
> Signed-off-by: Eric Whitney <[email protected]>

Applied, thanks.

- Ted