2021-08-04 13:25:14

by yangerkun

[permalink] [raw]
Subject: [PATCH] ext4: stop return ENOSPC from ext4_issue_zeroout

Our testcase(briefly described as fsstress on dm thin-provisioning which
ext4 see volume size with 100G but actual size 10G) trigger a hungtask
bug since ext4_writepages fall into a infinite loop:

static int ext4_writepages(xxx)
{
...
while (!done && mpd.first_page <= mpd.last_page) {
...
ret = mpage_prepare_extent_to_map(&mpd);
if (!ret) {
...
ret = mpage_map_and_submit_extent(handle,
&mpd,&give_up_on_write);
<----- will return -ENOSPC
...
}
...
if (ret == -ENOSPC && sbi->s_journal) {
<------ we cannot break since we will get ENOSPC forever
jbd2_journal_force_commit_nested(sbi->s_journal);
ret = 0;
continue;
}
...
}
}

Got ENOSPC with follow stack:
...
ext4_ext_map_blocks
ext4_ext_convert_to_initialized
ext4_ext_zeroout
ext4_issue_zeroout
...
submit_bio_wait <-- bio to thinpool will return ENOSPC

'df22291ff0fd ("ext4: Retry block allocation if we have free blocks
left")' add the logic to retry block allcate since we may get free block
after we commit a transaction. But the ENOSPC from thin-provisioning
will confuse ext4, and lead the upper infinite loop. Fix it by convert
the err to EIO.

Signed-off-by: yangerkun <[email protected]>
---
fs/ext4/inode.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 038aebd7eb2f..d9ded699a88c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -428,6 +428,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk, ext4_fsblk_t pblk,
if (ret > 0)
ret = 0;

+ if (ret == -ENOSPC)
+ ret = -EIO;
+
return ret;
}

--
2.31.1


2021-08-04 13:48:38

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: stop return ENOSPC from ext4_issue_zeroout

On Wed 04-08-21 20:50:44, yangerkun wrote:
> Our testcase(briefly described as fsstress on dm thin-provisioning which
> ext4 see volume size with 100G but actual size 10G) trigger a hungtask
> bug since ext4_writepages fall into a infinite loop:
>
> static int ext4_writepages(xxx)
> {
> ...
> while (!done && mpd.first_page <= mpd.last_page) {
> ...
> ret = mpage_prepare_extent_to_map(&mpd);
> if (!ret) {
> ...
> ret = mpage_map_and_submit_extent(handle,
> &mpd,&give_up_on_write);
> <----- will return -ENOSPC
> ...
> }
> ...
> if (ret == -ENOSPC && sbi->s_journal) {
> <------ we cannot break since we will get ENOSPC forever
> jbd2_journal_force_commit_nested(sbi->s_journal);
> ret = 0;
> continue;
> }
> ...
> }
> }
>
> Got ENOSPC with follow stack:
> ...
> ext4_ext_map_blocks
> ext4_ext_convert_to_initialized
> ext4_ext_zeroout
> ext4_issue_zeroout
> ...
> submit_bio_wait <-- bio to thinpool will return ENOSPC
>
> 'df22291ff0fd ("ext4: Retry block allocation if we have free blocks
> left")' add the logic to retry block allcate since we may get free block
> after we commit a transaction. But the ENOSPC from thin-provisioning
> will confuse ext4, and lead the upper infinite loop. Fix it by convert
> the err to EIO.
>
> Signed-off-by: yangerkun <[email protected]>

Thanks for the patch. As a quick fix for the problem this is probably fine.
But longer term we might need to implement a configurable behavior for this
because just dropping data on the floor (which is what would happen here)
need not be what sysadmin wants and blocking until space is provisioned may be
actually a preferable behavior. Anyway for now feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 038aebd7eb2f..d9ded699a88c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -428,6 +428,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk, ext4_fsblk_t pblk,
> if (ret > 0)
> ret = 0;
>
> + if (ret == -ENOSPC)
> + ret = -EIO;
> +
> return ret;
> }
>
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-13 15:22:18

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: stop return ENOSPC from ext4_issue_zeroout

On Wed, Aug 04, 2021 at 03:35:29PM +0200, Jan Kara wrote:
> On Wed 04-08-21 20:50:44, yangerkun wrote:
> > Our testcase(briefly described as fsstress on dm thin-provisioning which
> > ext4 see volume size with 100G but actual size 10G) trigger a hungtask
> > bug since ext4_writepages fall into a infinite loop:
> >
> > Got ENOSPC with follow stack:
> > ...
> > ext4_ext_map_blocks
> > ext4_ext_convert_to_initialized
> > ext4_ext_zeroout
> > ext4_issue_zeroout
> > ...
> > submit_bio_wait <-- bio to thinpool will return ENOSPC
> >
>
> Thanks for the patch. As a quick fix for the problem this is probably fine.
> But longer term we might need to implement a configurable behavior for this
> because just dropping data on the floor (which is what would happen here)
> need not be what sysadmin wants and blocking until space is provisioned may be
> actually a preferable behavior. Anyway for now feel free to add:
>
> Reviewed-by: Jan Kara <[email protected]>

Hmm, I wonder if this would be a better fix. (Not yet tested, may fry
your file system, etc....) What do folks think?

- Ted

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 92ad64b89d9b..501516cadc1b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
split_map.m_len - ee_block);
err = ext4_ext_zeroout(inode, &zero_ex1);
if (err)
- goto out;
+ goto fallback;
split_map.m_len = allocated;
}
if (split_map.m_lblk - ee_block + split_map.m_len <
@@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ext4_ext_pblock(ex));
err = ext4_ext_zeroout(inode, &zero_ex2);
if (err)
- goto out;
+ goto fallback;
}

split_map.m_len += split_map.m_lblk - ee_block;
@@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
}
}

+fallback:
err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
flags);
if (err > 0)

2021-08-13 21:29:43

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] ext4: if zeroout fails fall back to splitting the extent node

If the underlying storage device is using thin-provisioning, it's
possible for a zeroout operation to return ENOSPC.

Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks
left") added logic to retry block allocation since we might get free block
after we commit a transaction. But the ENOSPC from thin-provisioning
will confuse ext4, and lead to an infinite loop.

Since using zeroout instead of splitting the extent node is an
optimization, if it fails, we might as well fall back to splitting the
extent node.

Reported-by: yangerkun <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
---

I've run this through my battery of tests, and it doesn't cause any
regressions. Yangerkun, can you test this and see if this works for
you?

fs/ext4/extents.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 92ad64b89d9b..501516cadc1b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
split_map.m_len - ee_block);
err = ext4_ext_zeroout(inode, &zero_ex1);
if (err)
- goto out;
+ goto fallback;
split_map.m_len = allocated;
}
if (split_map.m_lblk - ee_block + split_map.m_len <
@@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ext4_ext_pblock(ex));
err = ext4_ext_zeroout(inode, &zero_ex2);
if (err)
- goto out;
+ goto fallback;
}

split_map.m_len += split_map.m_lblk - ee_block;
@@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
}
}

+fallback:
err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
flags);
if (err > 0)
--
2.31.0

2021-08-14 02:15:31

by yangerkun

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node



在 2021/8/14 5:27, Theodore Ts'o 写道:
> If the underlying storage device is using thin-provisioning, it's
> possible for a zeroout operation to return ENOSPC.
>
> Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks
> left") added logic to retry block allocation since we might get free block
> after we commit a transaction. But the ENOSPC from thin-provisioning
> will confuse ext4, and lead to an infinite loop.
>
> Since using zeroout instead of splitting the extent node is an
> optimization, if it fails, we might as well fall back to splitting the
> extent node.
>
> Reported-by: yangerkun <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>
> ---
>
> I've run this through my battery of tests, and it doesn't cause any
> regressions. Yangerkun, can you test this and see if this works for
> you?

Will do it.

>
> fs/ext4/extents.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..501516cadc1b 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> split_map.m_len - ee_block);
> err = ext4_ext_zeroout(inode, &zero_ex1);
> if (err)
> - goto out;
> + goto fallback;
> split_map.m_len = allocated;
> }
> if (split_map.m_lblk - ee_block + split_map.m_len <
> @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ext4_ext_pblock(ex));
> err = ext4_ext_zeroout(inode, &zero_ex2);
> if (err)
> - goto out;
> + goto fallback;
> }
>
> split_map.m_len += split_map.m_lblk - ee_block;
> @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> }
> }
>
> +fallback:
> err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
> flags);
> if (err > 0)
>

2021-08-16 06:58:28

by yangerkun

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node



在 2021/8/14 10:15, yangerkun 写道:
>
>
> 在 2021/8/14 5:27, Theodore Ts'o 写道:
>> If the underlying storage device is using thin-provisioning, it's
>> possible for a zeroout operation to return ENOSPC.
>>
>> Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks
>> left") added logic to retry block allocation since we might get free
>> block
>> after we commit a transaction. But the ENOSPC from thin-provisioning
>> will confuse ext4, and lead to an infinite loop.
>>
>> Since using zeroout instead of splitting the extent node is an
>> optimization, if it fails, we might as well fall back to splitting the
>> extent node.
>>
>> Reported-by: yangerkun <[email protected]>
>> Signed-off-by: Theodore Ts'o <[email protected]>
>> ---
>>
>> I've run this through my battery of tests, and it doesn't cause any
>> regressions.  Yangerkun, can you test this and see if this works for
>> you?
>
> Will do it.

Thanks for the patch, it can help us to pass the testcase. And after
some review, it's really a better fix for me.

Thanks,
Kun.

>
>>
>>   fs/ext4/extents.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 92ad64b89d9b..501516cadc1b 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3569,7 +3569,7 @@ static int
>> ext4_ext_convert_to_initialized(handle_t *handle,
>>                   split_map.m_len - ee_block);
>>               err = ext4_ext_zeroout(inode, &zero_ex1);
>>               if (err)
>> -                goto out;
>> +                goto fallback;
>>               split_map.m_len = allocated;
>>           }
>>           if (split_map.m_lblk - ee_block + split_map.m_len <
>> @@ -3583,7 +3583,7 @@ static int
>> ext4_ext_convert_to_initialized(handle_t *handle,
>>                                 ext4_ext_pblock(ex));
>>                   err = ext4_ext_zeroout(inode, &zero_ex2);
>>                   if (err)
>> -                    goto out;
>> +                    goto fallback;
>>               }
>>               split_map.m_len += split_map.m_lblk - ee_block;
>> @@ -3592,6 +3592,7 @@ static int
>> ext4_ext_convert_to_initialized(handle_t *handle,
>>           }
>>       }
>> +fallback:
>>       err = ext4_split_extent(handle, inode, ppath, &split_map,
>> split_flag,
>>                   flags);
>>       if (err > 0)
>>
> .

2021-08-16 10:06:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: stop return ENOSPC from ext4_issue_zeroout

On Fri 13-08-21 11:18:01, Theodore Ts'o wrote:
> On Wed, Aug 04, 2021 at 03:35:29PM +0200, Jan Kara wrote:
> > On Wed 04-08-21 20:50:44, yangerkun wrote:
> > > Our testcase(briefly described as fsstress on dm thin-provisioning which
> > > ext4 see volume size with 100G but actual size 10G) trigger a hungtask
> > > bug since ext4_writepages fall into a infinite loop:
> > >
> > > Got ENOSPC with follow stack:
> > > ...
> > > ext4_ext_map_blocks
> > > ext4_ext_convert_to_initialized
> > > ext4_ext_zeroout
> > > ext4_issue_zeroout
> > > ...
> > > submit_bio_wait <-- bio to thinpool will return ENOSPC
> > >
> >
> > Thanks for the patch. As a quick fix for the problem this is probably fine.
> > But longer term we might need to implement a configurable behavior for this
> > because just dropping data on the floor (which is what would happen here)
> > need not be what sysadmin wants and blocking until space is provisioned may be
> > actually a preferable behavior. Anyway for now feel free to add:
> >
> > Reviewed-by: Jan Kara <[email protected]>
>
> Hmm, I wonder if this would be a better fix. (Not yet tested, may fry
> your file system, etc....) What do folks think?

Yes, that looks indeed better. I'd note that even splitting extent may fail
due to ENOSPC on thin-provisioned storage but the chances are *much* lower.

Honza

> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..501516cadc1b 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> split_map.m_len - ee_block);
> err = ext4_ext_zeroout(inode, &zero_ex1);
> if (err)
> - goto out;
> + goto fallback;
> split_map.m_len = allocated;
> }
> if (split_map.m_lblk - ee_block + split_map.m_len <
> @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ext4_ext_pblock(ex));
> err = ext4_ext_zeroout(inode, &zero_ex2);
> if (err)
> - goto out;
> + goto fallback;
> }
>
> split_map.m_len += split_map.m_lblk - ee_block;
> @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> }
> }
>
> +fallback:
> err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
> flags);
> if (err > 0)
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-16 14:18:15

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: stop return ENOSPC from ext4_issue_zeroout

On Mon, Aug 16, 2021 at 12:05:45PM +0200, Jan Kara wrote:
>
> Yes, that looks indeed better. I'd note that even splitting extent may fail
> due to ENOSPC on thin-provisioned storage but the chances are *much* lower.

Indeed, any kind of metadata update (updating an inode atime, creating
a new inode, deleting a directory entry, etc.) can fail due to ENOSPC
on thin-provision storage, leading to a potentially corrupted file
system since some writes will succeed, while others won't --- and
that's not a scenario that's super well tested, nor is there much we
can do other than remounting the file system read/only or forcing a
reboot.

But at least we won't throw the kernel into an infinite loop, which is
what yangerkun was reporting...

- Ted

2021-09-26 11:35:42

by yangerkun

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node

Hi,

Rethink about this problem. Should we consider other place which call
ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
really happened)...

How about include follow patch which not only transfer ENOSPC to EIO.
But also stop to overwrite the error return by ext4_ext_insert_extent in
ext4_split_extent_at.

Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting
the extent node") can work together with this patch.



diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c0de30f25185..66767ede235f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3218,16 +3218,18 @@ static int ext4_split_extent_at(handle_t *handle,
goto out;

if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
+ int ret = 0;
+
if (split_flag &
(EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
if (split_flag & EXT4_EXT_DATA_VALID1) {
- err = ext4_ext_zeroout(inode, ex2);
+ ret = ext4_ext_zeroout(inode, ex2);
zero_ex.ee_block = ex2->ee_block;
zero_ex.ee_len = cpu_to_le16(

ext4_ext_get_actual_len(ex2));
ext4_ext_store_pblock(&zero_ex,

ext4_ext_pblock(ex2));
} else {
- err = ext4_ext_zeroout(inode, ex);
+ ret = ext4_ext_zeroout(inode, ex);
zero_ex.ee_block = ex->ee_block;
zero_ex.ee_len = cpu_to_le16(

ext4_ext_get_actual_len(ex));
@@ -3235,7 +3237,7 @@ static int ext4_split_extent_at(handle_t *handle,
ext4_ext_pblock(ex));
}
} else {
- err = ext4_ext_zeroout(inode, &orig_ex);
+ ret = ext4_ext_zeroout(inode, &orig_ex);
zero_ex.ee_block = orig_ex.ee_block;
zero_ex.ee_len = cpu_to_le16(

ext4_ext_get_actual_len(&orig_ex));
@@ -3243,7 +3245,7 @@ static int ext4_split_extent_at(handle_t *handle,
ext4_ext_pblock(&orig_ex));
}

- if (!err) {
+ if (!ret) {
/* update the extent length and mark as
initialized */
ex->ee_len = cpu_to_le16(ee_len);
ext4_ext_try_to_merge(handle, inode, path, ex);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d18852d6029c..95b970581864 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -427,6 +427,9 @@ int ext4_issue_zeroout(struct inode *inode,
ext4_lblk_t lblk, ext4_fsblk_t pblk,
if (ret > 0)
ret = 0;

+ if (ret == -ENOSPC)
+ ret = -EIO;
+
return ret;
}



在 2021/8/14 5:27, Theodore Ts'o 写道:
> If the underlying storage device is using thin-provisioning, it's
> possible for a zeroout operation to return ENOSPC.
>
> Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks
> left") added logic to retry block allocation since we might get free block
> after we commit a transaction. But the ENOSPC from thin-provisioning
> will confuse ext4, and lead to an infinite loop.
>
> Since using zeroout instead of splitting the extent node is an
> optimization, if it fails, we might as well fall back to splitting the
> extent node.
>
> Reported-by: yangerkun <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>
> ---
>
> I've run this through my battery of tests, and it doesn't cause any
> regressions. Yangerkun, can you test this and see if this works for
> you?
>
> fs/ext4/extents.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..501516cadc1b 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> split_map.m_len - ee_block);
> err = ext4_ext_zeroout(inode, &zero_ex1);
> if (err)
> - goto out;
> + goto fallback;
> split_map.m_len = allocated;
> }
> if (split_map.m_lblk - ee_block + split_map.m_len <
> @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ext4_ext_pblock(ex));
> err = ext4_ext_zeroout(inode, &zero_ex2);
> if (err)
> - goto out;
> + goto fallback;
> }
>
> split_map.m_len += split_map.m_lblk - ee_block;
> @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> }
> }
>
> +fallback:
> err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
> flags);
> if (err > 0)
>

2021-11-23 09:27:43

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node

Hello,

On Sun 26-09-21 19:35:01, yangerkun wrote:
> Rethink about this problem. Should we consider other place which call
> ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
> really happened)...
>
> How about include follow patch which not only transfer ENOSPC to EIO. But
> also stop to overwrite the error return by ext4_ext_insert_extent in
> ext4_split_extent_at.
>
> Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the
> extent node") can work together with this patch.

I've got back to this. The ext4_ext_zeroout() calls in
ext4_split_extent_at() seem to be there as fallback when insertion of a new
extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I
think returning an error as the code does now is correct and we don't have
much other option. Also we are really running out of disk space so I think
returning ENOSPC is fine. What exact scenario are you afraid of?

Honza

> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index c0de30f25185..66767ede235f 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3218,16 +3218,18 @@ static int ext4_split_extent_at(handle_t *handle,
> goto out;
>
> if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
> + int ret = 0;
> +
> if (split_flag &
> (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
> if (split_flag & EXT4_EXT_DATA_VALID1) {
> - err = ext4_ext_zeroout(inode, ex2);
> + ret = ext4_ext_zeroout(inode, ex2);
> zero_ex.ee_block = ex2->ee_block;
> zero_ex.ee_len = cpu_to_le16(
>
> ext4_ext_get_actual_len(ex2));
> ext4_ext_store_pblock(&zero_ex,
>
> ext4_ext_pblock(ex2));
> } else {
> - err = ext4_ext_zeroout(inode, ex);
> + ret = ext4_ext_zeroout(inode, ex);
> zero_ex.ee_block = ex->ee_block;
> zero_ex.ee_len = cpu_to_le16(
>
> ext4_ext_get_actual_len(ex));
> @@ -3235,7 +3237,7 @@ static int ext4_split_extent_at(handle_t *handle,
> ext4_ext_pblock(ex));
> }
> } else {
> - err = ext4_ext_zeroout(inode, &orig_ex);
> + ret = ext4_ext_zeroout(inode, &orig_ex);
> zero_ex.ee_block = orig_ex.ee_block;
> zero_ex.ee_len = cpu_to_le16(
>
> ext4_ext_get_actual_len(&orig_ex));
> @@ -3243,7 +3245,7 @@ static int ext4_split_extent_at(handle_t *handle,
> ext4_ext_pblock(&orig_ex));
> }
>
> - if (!err) {
> + if (!ret) {
> /* update the extent length and mark as initialized
> */
> ex->ee_len = cpu_to_le16(ee_len);
> ext4_ext_try_to_merge(handle, inode, path, ex);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d18852d6029c..95b970581864 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -427,6 +427,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t
> lblk, ext4_fsblk_t pblk,
> if (ret > 0)
> ret = 0;
>
> + if (ret == -ENOSPC)
> + ret = -EIO;
> +
> return ret;
> }
>
>
>
> 在 2021/8/14 5:27, Theodore Ts'o 写道:
> > If the underlying storage device is using thin-provisioning, it's
> > possible for a zeroout operation to return ENOSPC.
> >
> > Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks
> > left") added logic to retry block allocation since we might get free block
> > after we commit a transaction. But the ENOSPC from thin-provisioning
> > will confuse ext4, and lead to an infinite loop.
> >
> > Since using zeroout instead of splitting the extent node is an
> > optimization, if it fails, we might as well fall back to splitting the
> > extent node.
> >
> > Reported-by: yangerkun <[email protected]>
> > Signed-off-by: Theodore Ts'o <[email protected]>
> > ---
> >
> > I've run this through my battery of tests, and it doesn't cause any
> > regressions. Yangerkun, can you test this and see if this works for
> > you?
> >
> > fs/ext4/extents.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 92ad64b89d9b..501516cadc1b 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> > split_map.m_len - ee_block);
> > err = ext4_ext_zeroout(inode, &zero_ex1);
> > if (err)
> > - goto out;
> > + goto fallback;
> > split_map.m_len = allocated;
> > }
> > if (split_map.m_lblk - ee_block + split_map.m_len <
> > @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> > ext4_ext_pblock(ex));
> > err = ext4_ext_zeroout(inode, &zero_ex2);
> > if (err)
> > - goto out;
> > + goto fallback;
> > }
> > split_map.m_len += split_map.m_lblk - ee_block;
> > @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> > }
> > }
> > +fallback:
> > err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
> > flags);
> > if (err > 0)
> >
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-11-24 09:01:23

by yangerkun

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node



On 2021/11/23 17:27, Jan Kara wrote:
> Hello,
>
> On Sun 26-09-21 19:35:01, yangerkun wrote:
>> Rethink about this problem. Should we consider other place which call
>> ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
>> really happened)...
>>
>> How about include follow patch which not only transfer ENOSPC to EIO. But
>> also stop to overwrite the error return by ext4_ext_insert_extent in
>> ext4_split_extent_at.
>>
>> Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the
>> extent node") can work together with this patch.
>
> I've got back to this. The ext4_ext_zeroout() calls in
> ext4_split_extent_at() seem to be there as fallback when insertion of a new
> extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I
> think returning an error as the code does now is correct and we don't have
> much other option. Also we are really running out of disk space so I think
> returning ENOSPC is fine. What exact scenario are you afraid of?

I am afraid about the EDQUOT from ext4_ext_insert_extent may be
overwrite by ext4_ext_zeroout with ENOSPC. And this may lead to dead
loop since ext4_writepages will retry once get ENOSPC? Maybe I am wrong...

>
> Honza
>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index c0de30f25185..66767ede235f 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3218,16 +3218,18 @@ static int ext4_split_extent_at(handle_t *handle,
>> goto out;
>>
>> if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
>> + int ret = 0;
>> +
>> if (split_flag &
>> (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
>> if (split_flag & EXT4_EXT_DATA_VALID1) {
>> - err = ext4_ext_zeroout(inode, ex2);
>> + ret = ext4_ext_zeroout(inode, ex2);
>> zero_ex.ee_block = ex2->ee_block;
>> zero_ex.ee_len = cpu_to_le16(
>>
>> ext4_ext_get_actual_len(ex2));
>> ext4_ext_store_pblock(&zero_ex,
>>
>> ext4_ext_pblock(ex2));
>> } else {
>> - err = ext4_ext_zeroout(inode, ex);
>> + ret = ext4_ext_zeroout(inode, ex);
>> zero_ex.ee_block = ex->ee_block;
>> zero_ex.ee_len = cpu_to_le16(
>>
>> ext4_ext_get_actual_len(ex));
>> @@ -3235,7 +3237,7 @@ static int ext4_split_extent_at(handle_t *handle,
>> ext4_ext_pblock(ex));
>> }
>> } else {
>> - err = ext4_ext_zeroout(inode, &orig_ex);
>> + ret = ext4_ext_zeroout(inode, &orig_ex);
>> zero_ex.ee_block = orig_ex.ee_block;
>> zero_ex.ee_len = cpu_to_le16(
>>
>> ext4_ext_get_actual_len(&orig_ex));
>> @@ -3243,7 +3245,7 @@ static int ext4_split_extent_at(handle_t *handle,
>> ext4_ext_pblock(&orig_ex));
>> }
>>
>> - if (!err) {
>> + if (!ret) {
>> /* update the extent length and mark as initialized
>> */
>> ex->ee_len = cpu_to_le16(ee_len);
>> ext4_ext_try_to_merge(handle, inode, path, ex);
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index d18852d6029c..95b970581864 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -427,6 +427,9 @@ int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t
>> lblk, ext4_fsblk_t pblk,
>> if (ret > 0)
>> ret = 0;
>>
>> + if (ret == -ENOSPC)
>> + ret = -EIO;
>> +
>> return ret;
>> }
>>
>>
>>
>> 在 2021/8/14 5:27, Theodore Ts'o 写道:
>>> If the underlying storage device is using thin-provisioning, it's
>>> possible for a zeroout operation to return ENOSPC.
>>>
>>> Commit df22291ff0fd ("ext4: Retry block allocation if we have free blocks
>>> left") added logic to retry block allocation since we might get free block
>>> after we commit a transaction. But the ENOSPC from thin-provisioning
>>> will confuse ext4, and lead to an infinite loop.
>>>
>>> Since using zeroout instead of splitting the extent node is an
>>> optimization, if it fails, we might as well fall back to splitting the
>>> extent node.
>>>
>>> Reported-by: yangerkun <[email protected]>
>>> Signed-off-by: Theodore Ts'o <[email protected]>
>>> ---
>>>
>>> I've run this through my battery of tests, and it doesn't cause any
>>> regressions. Yangerkun, can you test this and see if this works for
>>> you?
>>>
>>> fs/ext4/extents.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>> index 92ad64b89d9b..501516cadc1b 100644
>>> --- a/fs/ext4/extents.c
>>> +++ b/fs/ext4/extents.c
>>> @@ -3569,7 +3569,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>>> split_map.m_len - ee_block);
>>> err = ext4_ext_zeroout(inode, &zero_ex1);
>>> if (err)
>>> - goto out;
>>> + goto fallback;
>>> split_map.m_len = allocated;
>>> }
>>> if (split_map.m_lblk - ee_block + split_map.m_len <
>>> @@ -3583,7 +3583,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>>> ext4_ext_pblock(ex));
>>> err = ext4_ext_zeroout(inode, &zero_ex2);
>>> if (err)
>>> - goto out;
>>> + goto fallback;
>>> }
>>> split_map.m_len += split_map.m_lblk - ee_block;
>>> @@ -3592,6 +3592,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>>> }
>>> }
>>> +fallback:
>>> err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
>>> flags);
>>> if (err > 0)
>>>

2021-11-24 10:37:44

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node

On Wed 24-11-21 17:01:12, yangerkun wrote:
> On 2021/11/23 17:27, Jan Kara wrote:
> > Hello,
> >
> > On Sun 26-09-21 19:35:01, yangerkun wrote:
> > > Rethink about this problem. Should we consider other place which call
> > > ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
> > > really happened)...
> > >
> > > How about include follow patch which not only transfer ENOSPC to EIO. But
> > > also stop to overwrite the error return by ext4_ext_insert_extent in
> > > ext4_split_extent_at.
> > >
> > > Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the
> > > extent node") can work together with this patch.
> >
> > I've got back to this. The ext4_ext_zeroout() calls in
> > ext4_split_extent_at() seem to be there as fallback when insertion of a new
> > extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I
> > think returning an error as the code does now is correct and we don't have
> > much other option. Also we are really running out of disk space so I think
> > returning ENOSPC is fine. What exact scenario are you afraid of?
>
> I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by
> ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since
> ext4_writepages will retry once get ENOSPC? Maybe I am wrong...

OK, so passing back original error instead of the error from
ext4_ext_zeroout() makes sense. But I don't think doing much more is needed
- firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at()
called from ext4_writepages() because we should have reserved enough
space for extent splits when writing data. So hitting that is already
unexpected. Committing transaction holding blocks that are expected to be
free is the most likely reason for us seeing ENOSPC and returning EIO in
that case would be bug. Secondly, returning EIO instead of ENOSPC is IMO a
bit confusing for upper layers and makes it harder to analyze where the
real problem is...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-11-24 12:13:48

by yangerkun

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node



On 2021/11/24 18:37, Jan Kara wrote:
> On Wed 24-11-21 17:01:12, yangerkun wrote:
>> On 2021/11/23 17:27, Jan Kara wrote:
>>> Hello,
>>>
>>> On Sun 26-09-21 19:35:01, yangerkun wrote:
>>>> Rethink about this problem. Should we consider other place which call
>>>> ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
>>>> really happened)...
>>>>
>>>> How about include follow patch which not only transfer ENOSPC to EIO. But
>>>> also stop to overwrite the error return by ext4_ext_insert_extent in
>>>> ext4_split_extent_at.
>>>>
>>>> Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the
>>>> extent node") can work together with this patch.
>>>
>>> I've got back to this. The ext4_ext_zeroout() calls in
>>> ext4_split_extent_at() seem to be there as fallback when insertion of a new
>>> extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I
>>> think returning an error as the code does now is correct and we don't have
>>> much other option. Also we are really running out of disk space so I think
>>> returning ENOSPC is fine. What exact scenario are you afraid of?
>>
>> I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by
>> ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since
>> ext4_writepages will retry once get ENOSPC? Maybe I am wrong...
>
> OK, so passing back original error instead of the error from
> ext4_ext_zeroout() makes sense. But I don't think doing much more is needed
> - firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at()
> called from ext4_writepages() because we should have reserved enough
> space for extent splits when writing data. So hitting that is already

ext4_da_write_begin
ext4_da_get_block_prep
ext4_insert_delayed_block
ext4_da_reserve_space

It seems we will only reserve space for data, no for metadata...


> unexpected. Committing transaction holding blocks that are expected to be
> free is the most likely reason for us seeing ENOSPC and returning EIO in
> that case would be bug.

Agree. EIO from ext4_ext_zeroout that overwrite the ENOSPC from
ext4_ext_insert_extent seems buggy too. Maybe we should ignore the error
from ext4_ext_zeroout and return the error from ext4_ext_insert_extent
once ext4_ext_zeroout in ext4_split_extent_at got a error. Something
like this:

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0ecf819bf189..56cc00ee42a1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3185,6 +3185,7 @@ static int ext4_split_extent_at(handle_t *handle,
struct ext4_extent *ex2 = NULL;
unsigned int ee_len, depth;
int err = 0;
+ int err1;

BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 |
EXT4_EXT_DATA_VALID2)) ==
(EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
@@ -3255,7 +3256,7 @@ static int ext4_split_extent_at(handle_t *handle,
if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
if (split_flag &
(EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
if (split_flag & EXT4_EXT_DATA_VALID1) {
- err = ext4_ext_zeroout(inode, ex2);
+ err1 = ext4_ext_zeroout(inode, ex2);
zero_ex.ee_block = ex2->ee_block;
zero_ex.ee_len = cpu_to_le16(

ext4_ext_get_actual_len(ex2));
@@ -3270,7 +3271,7 @@ static int ext4_split_extent_at(handle_t *handle,
ext4_ext_pblock(ex));
}
} else {
- err = ext4_ext_zeroout(inode, &orig_ex);
+ err1 = ext4_ext_zeroout(inode, &orig_ex);
zero_ex.ee_block = orig_ex.ee_block;
zero_ex.ee_len = cpu_to_le16(

ext4_ext_get_actual_len(&orig_ex));
@@ -3278,7 +3279,7 @@ static int ext4_split_extent_at(handle_t *handle,
ext4_ext_pblock(&orig_ex));
}

- if (!err) {
+ if (!err1) {
/* update the extent length and mark as
initialized */
ex->ee_len = cpu_to_le16(ee_len);
ext4_ext_try_to_merge(handle, inode, path, ex);



> Secondly, returning EIO instead of ENOSPC is IMO a
> bit confusing for upper layers and makes it harder to analyze where the
> real problem is...
>
> Honza
>

2021-11-24 17:15:17

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: if zeroout fails fall back to splitting the extent node

On Wed 24-11-21 20:11:43, yangerkun wrote:
>
>
> On 2021/11/24 18:37, Jan Kara wrote:
> > On Wed 24-11-21 17:01:12, yangerkun wrote:
> > > On 2021/11/23 17:27, Jan Kara wrote:
> > > > Hello,
> > > >
> > > > On Sun 26-09-21 19:35:01, yangerkun wrote:
> > > > > Rethink about this problem. Should we consider other place which call
> > > > > ext4_issue_zeroout? Maybe it can trigger the problem too(in theory, not
> > > > > really happened)...
> > > > >
> > > > > How about include follow patch which not only transfer ENOSPC to EIO. But
> > > > > also stop to overwrite the error return by ext4_ext_insert_extent in
> > > > > ext4_split_extent_at.
> > > > >
> > > > > Besides, 308c57ccf431 ("ext4: if zeroout fails fall back to splitting the
> > > > > extent node") can work together with this patch.
> > > >
> > > > I've got back to this. The ext4_ext_zeroout() calls in
> > > > ext4_split_extent_at() seem to be there as fallback when insertion of a new
> > > > extent fails due to ENOSPC / EDQUOT. If even ext4_ext_zeroout(), then I
> > > > think returning an error as the code does now is correct and we don't have
> > > > much other option. Also we are really running out of disk space so I think
> > > > returning ENOSPC is fine. What exact scenario are you afraid of?
> > >
> > > I am afraid about the EDQUOT from ext4_ext_insert_extent may be overwrite by
> > > ext4_ext_zeroout with ENOSPC. And this may lead to dead loop since
> > > ext4_writepages will retry once get ENOSPC? Maybe I am wrong...
> >
> > OK, so passing back original error instead of the error from
> > ext4_ext_zeroout() makes sense. But I don't think doing much more is needed
> > - firstly, ENOSPC or EDQUOT should not happen in ext4_split_extent_at()
> > called from ext4_writepages() because we should have reserved enough
> > space for extent splits when writing data. So hitting that is already
>
> ext4_da_write_begin
> ext4_da_get_block_prep
> ext4_insert_delayed_block
> ext4_da_reserve_space
>
> It seems we will only reserve space for data, no for metadata...
>
>
> > unexpected. Committing transaction holding blocks that are expected to be
> > free is the most likely reason for us seeing ENOSPC and returning EIO in
> > that case would be bug.
>
> Agree. EIO from ext4_ext_zeroout that overwrite the ENOSPC from
> ext4_ext_insert_extent seems buggy too. Maybe we should ignore the error
> from ext4_ext_zeroout and return the error from ext4_ext_insert_extent
> once ext4_ext_zeroout in ext4_split_extent_at got a error. Something
> like this:

Yep, something like that looks good to me.

Honza

>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 0ecf819bf189..56cc00ee42a1 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3185,6 +3185,7 @@ static int ext4_split_extent_at(handle_t *handle,
> struct ext4_extent *ex2 = NULL;
> unsigned int ee_len, depth;
> int err = 0;
> + int err1;
>
> BUG_ON((split_flag & (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2))
> ==
> (EXT4_EXT_DATA_VALID1 | EXT4_EXT_DATA_VALID2));
> @@ -3255,7 +3256,7 @@ static int ext4_split_extent_at(handle_t *handle,
> if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
> if (split_flag &
> (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
> if (split_flag & EXT4_EXT_DATA_VALID1) {
> - err = ext4_ext_zeroout(inode, ex2);
> + err1 = ext4_ext_zeroout(inode, ex2);
> zero_ex.ee_block = ex2->ee_block;
> zero_ex.ee_len = cpu_to_le16(
>
> ext4_ext_get_actual_len(ex2));
> @@ -3270,7 +3271,7 @@ static int ext4_split_extent_at(handle_t *handle,
> ext4_ext_pblock(ex));
> }
> } else {
> - err = ext4_ext_zeroout(inode, &orig_ex);
> + err1 = ext4_ext_zeroout(inode, &orig_ex);
> zero_ex.ee_block = orig_ex.ee_block;
> zero_ex.ee_len = cpu_to_le16(
>
> ext4_ext_get_actual_len(&orig_ex));
> @@ -3278,7 +3279,7 @@ static int ext4_split_extent_at(handle_t *handle,
> ext4_ext_pblock(&orig_ex));
> }
>
> - if (!err) {
> + if (!err1) {
> /* update the extent length and mark as initialized
> */
> ex->ee_len = cpu_to_le16(ee_len);
> ext4_ext_try_to_merge(handle, inode, path, ex);
>
>
>
> > Secondly, returning EIO instead of ENOSPC is IMO a
> > bit confusing for upper layers and makes it harder to analyze where the
> > real problem is...
> >
> > Honza
> >
--
Jan Kara <[email protected]>
SUSE Labs, CR