2021-03-25 03:43:41

by Ye Bin

[permalink] [raw]
Subject: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed

We got follow bug_on:
[130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
[130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
......
[130747.334329] Call trace:
[130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4]
[130747.334975] ext4_cache_extents+0x64/0xe8 [ext4]
[130747.335368] ext4_find_extent+0x300/0x330 [ext4]
[130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4]
[130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4]
[130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
[130747.336995] ext4_readpage+0x54/0x100 [ext4]
[130747.337359] generic_file_buffered_read+0x410/0xae8
[130747.337767] generic_file_read_iter+0x114/0x190
[130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4]
[130747.338556] __vfs_read+0x11c/0x188
[130747.338851] vfs_read+0x94/0x150
[130747.339110] ksys_read+0x74/0xf0

If call ext4_ext_insert_extent failed but new extent already inserted, we just
update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
cause bug on when cache extent.
If call ext4_ext_insert_extent failed don't update ex->ee_len with old value.
Maybe there will lead to block leak, but it can be fixed by fsck later.

Signed-off-by: Ye Bin <[email protected]>
---
fs/ext4/extents.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 77c84d6f1af6..970eb2dfcc46 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3246,7 +3246,7 @@ static int ext4_split_extent_at(handle_t *handle,

goto out;
} else if (err)
- goto fix_extent_len;
+ goto err;

out:
ext4_ext_show_leaf(inode, path);
@@ -3254,6 +3254,7 @@ static int ext4_split_extent_at(handle_t *handle,

fix_extent_len:
ex->ee_len = orig_ex.ee_len;
+err:
/*
* Ignore ext4_ext_dirty return value since we are already in error path
* and err is a non-zero error code.
--
2.25.4


2021-04-20 14:07:07

by Ye Bin

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed



On 2021/4/9 9:47, Theodore Ts'o wrote:
> On Wed, Apr 07, 2021 at 09:41:57AM +0800, yebin wrote:
>>>> If call ext4_ext_insert_extent failed but new extent already inserted, we just
>>>> update "ex->ee_len = orig_ex.ee_len", this will lead to extent overlap, then
>>>> cause bug on when cache extent.
>>> How did this happen in the first place? It sounds like if the extent
>>> was already inserted, that would be casue there was an on-disk file
>>> system corruption, no?
>>>
>>> In that case, shouldn't we call ext4_error() to declare the file
>>> system has an inconsistency, so it can be fixed by fsck?
>> We inject IO fault when runing fsstress, JBD detect IO error then trigger
>> JBD abort. At the same time,
>> if ext4_ext_insert_extent already insert new exntent then call
>> ext4_ext_dirty to dirty metadata , but
>> JBD already aborted , ext4_ext_dirty will return error.
>> In ext4_ext_dirty function call ext4_ext_check_inode check extent if ok, if
>> not, trigger BUG_ON and
>> also print extent detail information.
> In this particular case, skipping the "ex->ee_len = orig_ex.ee_len"
> may avoid the BUG_ON. But it's not clear that this is always the
> right thing to do. The fundamental question is what should we do we
> run into an error while we are in the middle of making changes to
> on-disk and in-memory data structures?
>
> In the ideal world, we should undo the changes that we were in the
> middle of making before we return an error. That way, the semantics
> are very clear; on success, the function has made the requested change
> to the file system. If the function returns an error, then no changes
> should be made.
>
> That was the reasoning behind resetting ex->ee_len to orig_ex.ee_len
> in the fix_extent_len inside ext4_split_extent_at(). Unofrtunately,
> ext4_ext_insert_extent() does *not* always follow this convention, and
> that's because it would be extremely difficult for it to do so --- the
> mutations that it makes can be quite complex, including potentially
> increasing the height of the extent tree.
>
> However, I don't think your fix is by any means the ideal one, because
> the much more common way that ext4_ext_insert_extent() is when it
> needs to insert a new leaf node, or need to increase the height of the
> extent tree --- and in it returns an ENOSPC failure. In that case, it
> won't have made any changes changes in the extent tree, and so having
> ext4_split_extent_at() undo the change to ex->ee_len is the right
> thing to do.
>
> Having blocks get leaked when there is an ENOSPC failure, requiring
> fsck to be run --- and without giving the user any warning that this
> has happened is *not* a good way to fail. So I don't think the
> proposed patch is the right way to go.
>
> A better way to go would be to teach ext4_ext_insert_extent() so if
> there is a late failure, that it unwinds the leaf node back to its
> original state (at least from a semantic value). Since the extent
> leaf node could have been split, and/or adjacent extent entries may
> have been merged, what it would need to do is to remember the starting
> block number and length, and make whatever changes are necessaries to
> the extent entries in that leaf node corresponding to that starting
> block number and length.
>
> If you don't want to do that, then a "do no harm" fix would be
> something like this:
>
> ...
> } else if (err == -EROFS) {
> return err;
> } else if (err)
> goto fix_extent_len;
Thanks for your advice. I will send v2 patch.
>
> So in the journal abort case, when err is set to EROFS, we don't try
> to reset the length, since in theory the file system is read-only
> already anyway. However, in the ENOSPC case, we won't end up silently
> leaking blocks that will be lost until the user somehow decides to run
> fsck.
>
> There are still times when this doesn't get things completely right
> (e.g., what if we get a late ENOMEM error versus an early ENOMEM
> failure), where the only real fix is to make ext4_ext_insert_extent()
> obey the convention that if it returns an error, it must not result in
> any user-visible state change.
>
> Cheers,
>
> - Ted
> .