2024-05-15 12:24:53

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH] ext4: fix infinite loop when replaying fast_commit

On 2024/5/15 17:13, Luis Henriques wrote:
> On Wed 15 May 2024 04:52:54 PM +08, Zhang Yi wrote;
>
>> On 2024/5/15 16:28, Luis Henriques wrote:
>>> On Wed 15 May 2024 12:59:26 PM +08, Zhang Yi wrote;
>>>
>>>> On 2024/5/14 21:04, Luis Henriques wrote:
>>>>> On Sat 11 May 2024 02:24:17 PM +08, Zhang Yi wrote;
>>>>>
>>>>>> On 2024/5/10 19:52, Luis Henriques (SUSE) wrote:
>>>>>>> When doing fast_commit replay an infinite loop may occur due to an
>>>>>>> uninitialized extent_status struct. ext4_ext_determine_insert_hole() does
>>>>>>> not detect the replay and calls ext4_es_find_extent_range(), which will
>>>>>>> return immediately without initializing the 'es' variable.
>>>>>>>
>>>>>>> Because 'es' contains garbage, an integer overflow may happen causing an
>>>>>>> infinite loop in this function, easily reproducible using fstest generic/039.
>>>>>>>
>>>>>>> This commit fixes this issue by detecting the replay in function
>>>>>>> ext4_ext_determine_insert_hole(). It also adds initialization code to the
>>>>>>> error path in function ext4_es_find_extent_range().
>>>>>>>
>>>>>>> Thanks to Zhang Yi, for figuring out the real problem!
>>>>>>>
>>>>>>> Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
>>>>>>> Signed-off-by: Luis Henriques (SUSE) <[email protected]>
>>>>>>> ---
>>>>>>> Hi!
>>>>>>>
>>>>>>> Two comments:
>>>>>>> 1) The change in ext4_ext_map_blocks() could probably use the min_not_zero
>>>>>>> macro instead. I decided not to do so simply because I wasn't sure if
>>>>>>> that would be safe, but I'm fine changing that if you think it is.
>>>>>>>
>>>>>>> 2) I thought about returning 'EXT_MAX_BLOCKS' instead of '0' in
>>>>>>> ext4_lblk_t ext4_ext_determine_insert_hole(), which would then avoid
>>>>>>> the extra change to ext4_ext_map_blocks(). '0' sounds like the right
>>>>>>> value to return, but I'm also OK using 'EXT_MAX_BLOCKS' instead.
>>>>>>>
>>>>>>> And again thanks to Zhang Yi for pointing me the *real* problem!
>>>>>>>
>>>>>>> fs/ext4/extents.c | 6 +++++-
>>>>>>> fs/ext4/extents_status.c | 5 ++++-
>>>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>>>>>> index e57054bdc5fd..b5bfcb6c18a0 100644
>>>>>>> --- a/fs/ext4/extents.c
>>>>>>> +++ b/fs/ext4/extents.c
>>>>>>> @@ -4052,6 +4052,9 @@ static ext4_lblk_t ext4_ext_determine_insert_hole(struct inode *inode,
>>>>>>> ext4_lblk_t hole_start, len;
>>>>>>> struct extent_status es;
>>>>>>>
>>>>>>> + if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
>>>>>>> + return 0;
>>>>>>> +
>>>>>>
>>>>>> Sorry, I think it's may not correct. When replaying the jouranl, although
>>>>>> we don't use the extent statue tree, we still need to query the accurate
>>>>>> hole length, e.g. please see skip_hole(). If you do this, the hole length
>>>>>> becomes incorrect, right?
>>>>>
>>>>> Thank you for your review (and sorry for my delay replying).
>>>>>
>>>>> So, I see three different options to follow your suggestion:
>>>>>
>>>>> 1) Initialize 'es' immediately when declaring it in function
>>>>> ext4_ext_determine_insert_hole():
>>>>>
>>>>> es.es_lblk = es.es_len = es.es_pblk = 0;
>>>>>
>>>>> 2) Initialize 'es' only in ext4_es_find_extent_range() when checking if an
>>>>> fc replay is in progress (my patch was already doing something like
>>>>> that):
>>>>>
>>>>> if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) {
>>>>> /* Initialize extent to zero */
>>>>> es->es_lblk = es->es_len = es->es_pblk = 0;
>>>>> return;
>>>>> }
>>>>>
>>>>> 3) Remove the check for fc replay in function ext4_es_find_extent_range(),
>>>>> which will then unconditionally call __es_find_extent_range(). This
>>>>> will effectively also initialize the 'es' fields to '0' and, because
>>>>> __es_tree_search() will return NULL (at least in generic/039 test!),
>>>>> nothing else will be done.
>>>>>
>>>>> Since all these 3 options seem to have the same result, I believe option
>>>>> 1) is probably the best as it initializes the structure shortly after it's
>>>>> declaration. Would you agree? Or did I misunderstood you?
>>>>>
>>>>
>>>> Both 1 and 2 are looks fine to me, but I would prefer to initialize it
>>>> unconditionally in ext4_es_find_extent_range().
>>>>
>>>> @@ -310,6 +310,8 @@ void ext4_es_find_extent_range(struct inode *inode,
>>>> ext4_lblk_t lblk, ext4_lblk_t end,
>>>> struct extent_status *es)
>>>> {
>>>> + es->es_lblk = es->es_len = es->es_pblk = 0;
>>>> +
>>>> if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
>>>> return;
>>>
>>> Thank you, Yi. I'll send out v2 shortly. Although, to be fair, the real
>>> patch author shouldn't be me. :-)
>>>
>>
>> Never mind, I just give a suggestion and also I didn't do a full test on
>> this change.
>
> Oh, talking about testing, I forgot to mention that I see the same
> behaviour with generic/311. I.e. this test also enters an infinite loop,
> but fixed with this patch.
>

Yeah, generic/311 also does a lot of mount && journal recovery operations,
and there maybe some other fault injection tests could have the same
results, it's all right now. :)

Thanks,
Yi.