2019-07-29 05:24:14

by Sahitya Tummala

[permalink] [raw]
Subject: [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Policy - foreground GC, LFS mode and greedy GC mode.

Under this policy, f2fs_gc() loops forever to GC as it doesn't have
enough free segements to proceed and thus it keeps calling gc_more
for the same victim segment. This can happen if the selected victim
segment could not be GC'd due to failed blkaddr validity check i.e.
is_alive() returns false for the blocks set in current validity map.

Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
the segment selected could not be GC'd. This helps to select another
segment for GC and thus helps to proceed forward with GC.

Signed-off-by: Sahitya Tummala <[email protected]>
---
fs/f2fs/gc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 8974672..7bbcc4a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
round++;
}

- if (gc_type == FG_GC)
+ if (gc_type == FG_GC && seg_freed)
sbi->cur_victim_sec = NULL_SEGNO;

if (sync)
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


2019-07-29 16:39:15

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Hi Sahitya,

On 2019-7-29 13:20, Sahitya Tummala wrote:
> Policy - foreground GC, LFS mode and greedy GC mode.
>
> Under this policy, f2fs_gc() loops forever to GC as it doesn't have
> enough free segements to proceed and thus it keeps calling gc_more
> for the same victim segment. This can happen if the selected victim
> segment could not be GC'd due to failed blkaddr validity check i.e.
> is_alive() returns false for the blocks set in current validity map.
>
> Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
> the segment selected could not be GC'd. This helps to select another
> segment for GC and thus helps to proceed forward with GC.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> ---
> fs/f2fs/gc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 8974672..7bbcc4a 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
> round++;
> }
>
> - if (gc_type == FG_GC)
> + if (gc_type == FG_GC && seg_freed)
> sbi->cur_victim_sec = NULL_SEGNO;

In some cases, we may remain last victim in sbi->cur_victim_sec, and jump out of
GC cycle, then SSR can skip the last victim due to sec_usage_check()...

Thanks,

>
> if (sync)
>

2019-07-30 05:53:14

by Sahitya Tummala

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Hi Chao,

On Tue, Jul 30, 2019 at 12:00:45AM +0800, Chao Yu wrote:
> Hi Sahitya,
>
> On 2019-7-29 13:20, Sahitya Tummala wrote:
> > Policy - foreground GC, LFS mode and greedy GC mode.
> >
> > Under this policy, f2fs_gc() loops forever to GC as it doesn't have
> > enough free segements to proceed and thus it keeps calling gc_more
> > for the same victim segment. This can happen if the selected victim
> > segment could not be GC'd due to failed blkaddr validity check i.e.
> > is_alive() returns false for the blocks set in current validity map.
> >
> > Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
> > the segment selected could not be GC'd. This helps to select another
> > segment for GC and thus helps to proceed forward with GC.
> >
> > Signed-off-by: Sahitya Tummala <[email protected]>
> > ---
> > fs/f2fs/gc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 8974672..7bbcc4a 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
> > round++;
> > }
> >
> > - if (gc_type == FG_GC)
> > + if (gc_type == FG_GC && seg_freed)
> > sbi->cur_victim_sec = NULL_SEGNO;
>
> In some cases, we may remain last victim in sbi->cur_victim_sec, and jump out of
> GC cycle, then SSR can skip the last victim due to sec_usage_check()...
>

I see. I have a few questions on how to fix this issue. Please share your
comments.

1. Do you think the scenario described is valid? It happens rarely, not very
easy to reproduce. From the dumps, I see that only block is set as valid in
the sentry->cur_valid_map for which I see that summary block check is_alive()
could return false. As only one block is set as valid, chances are there it
can be always selected as the victim by get_victim_by_default() under FG_GC.

2. What are the possible scenarios where summary block check is_alive() could
fail for a segment?

3. How does GC handle such segments?

Thanks,

> Thanks,
>
> >
> > if (sync)
> >

--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2019-07-30 17:12:09

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Hi Sahitya,

On 2019/7/30 12:36, Sahitya Tummala wrote:
> Hi Chao,
>
> On Tue, Jul 30, 2019 at 12:00:45AM +0800, Chao Yu wrote:
>> Hi Sahitya,
>>
>> On 2019-7-29 13:20, Sahitya Tummala wrote:
>>> Policy - foreground GC, LFS mode and greedy GC mode.
>>>
>>> Under this policy, f2fs_gc() loops forever to GC as it doesn't have
>>> enough free segements to proceed and thus it keeps calling gc_more
>>> for the same victim segment. This can happen if the selected victim
>>> segment could not be GC'd due to failed blkaddr validity check i.e.
>>> is_alive() returns false for the blocks set in current validity map.
>>>
>>> Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
>>> the segment selected could not be GC'd. This helps to select another
>>> segment for GC and thus helps to proceed forward with GC.
>>>
>>> Signed-off-by: Sahitya Tummala <[email protected]>
>>> ---
>>> fs/f2fs/gc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 8974672..7bbcc4a 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>> round++;
>>> }
>>>
>>> - if (gc_type == FG_GC)
>>> + if (gc_type == FG_GC && seg_freed)
>>> sbi->cur_victim_sec = NULL_SEGNO;
>>
>> In some cases, we may remain last victim in sbi->cur_victim_sec, and jump out of
>> GC cycle, then SSR can skip the last victim due to sec_usage_check()...
>>
>
> I see. I have a few questions on how to fix this issue. Please share your
> comments.
>
> 1. Do you think the scenario described is valid? It happens rarely, not very

IIRC, we suffered endless gc loop due to there is valid block belong to an
opened atomic write file. (because we will skip directly once we hit atomic file)

For your case, I'm not sure that would happen, did you look into is_alive(), why
will it fail? block address not match? If so, it looks like summary info and
dnode block and nat entry are inconsistent.

> easy to reproduce. From the dumps, I see that only block is set as valid in
> the sentry->cur_valid_map for which I see that summary block check is_alive()
> could return false. As only one block is set as valid, chances are there it
> can be always selected as the victim by get_victim_by_default() under FG_GC.
>
> 2. What are the possible scenarios where summary block check is_alive() could
> fail for a segment?

I guess, maybe after check_valid_map(), the block is been truncated before
is_alive(). If so the victim should be prefree directly instead of being
selected again...

>
> 3. How does GC handle such segments?

I think that's not a normal case, or I'm missing something.

Thanks,

>
> Thanks,
>
>> Thanks,
>>
>>>
>>> if (sync)
>>>
>

2019-07-31 05:45:03

by Sahitya Tummala

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Hi Chao,

On Tue, Jul 30, 2019 at 08:35:46PM +0800, Chao Yu wrote:
> Hi Sahitya,
>
> On 2019/7/30 12:36, Sahitya Tummala wrote:
> > Hi Chao,
> >
> > On Tue, Jul 30, 2019 at 12:00:45AM +0800, Chao Yu wrote:
> >> Hi Sahitya,
> >>
> >> On 2019-7-29 13:20, Sahitya Tummala wrote:
> >>> Policy - foreground GC, LFS mode and greedy GC mode.
> >>>
> >>> Under this policy, f2fs_gc() loops forever to GC as it doesn't have
> >>> enough free segements to proceed and thus it keeps calling gc_more
> >>> for the same victim segment. This can happen if the selected victim
> >>> segment could not be GC'd due to failed blkaddr validity check i.e.
> >>> is_alive() returns false for the blocks set in current validity map.
> >>>
> >>> Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
> >>> the segment selected could not be GC'd. This helps to select another
> >>> segment for GC and thus helps to proceed forward with GC.
> >>>
> >>> Signed-off-by: Sahitya Tummala <[email protected]>
> >>> ---
> >>> fs/f2fs/gc.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> >>> index 8974672..7bbcc4a 100644
> >>> --- a/fs/f2fs/gc.c
> >>> +++ b/fs/f2fs/gc.c
> >>> @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
> >>> round++;
> >>> }
> >>>
> >>> - if (gc_type == FG_GC)
> >>> + if (gc_type == FG_GC && seg_freed)
> >>> sbi->cur_victim_sec = NULL_SEGNO;
> >>
> >> In some cases, we may remain last victim in sbi->cur_victim_sec, and jump out of
> >> GC cycle, then SSR can skip the last victim due to sec_usage_check()...
> >>
> >
> > I see. I have a few questions on how to fix this issue. Please share your
> > comments.
> >
> > 1. Do you think the scenario described is valid? It happens rarely, not very
>
> IIRC, we suffered endless gc loop due to there is valid block belong to an
> opened atomic write file. (because we will skip directly once we hit atomic file)
>
> For your case, I'm not sure that would happen, did you look into is_alive(), why
> will it fail? block address not match? If so, it looks like summary info and
> dnode block and nat entry are inconsistent.

Yes, from the ramdumps, I could see that block address is not matching and
hence, is_alive() could fail in the issue scenario. Have you observed any such
cases before? What could be the reason for this mismatch?

Thanks,

>
> > easy to reproduce. From the dumps, I see that only block is set as valid in
> > the sentry->cur_valid_map for which I see that summary block check is_alive()
> > could return false. As only one block is set as valid, chances are there it
> > can be always selected as the victim by get_victim_by_default() under FG_GC.
> >
> > 2. What are the possible scenarios where summary block check is_alive() could
> > fail for a segment?
>
> I guess, maybe after check_valid_map(), the block is been truncated before
> is_alive(). If so the victim should be prefree directly instead of being
> selected again...
>
> >
> > 3. How does GC handle such segments?
>
> I think that's not a normal case, or I'm missing something.
>
> Thanks,
>
> >
> > Thanks,
> >
> >> Thanks,
> >>
> >>>
> >>> if (sync)
> >>>
> >

--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2019-07-31 12:18:33

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: Fix indefinite loop in f2fs_gc()

Hi Sahitya,

On 2019/7/31 11:41, Sahitya Tummala wrote:
> Hi Chao,
>
> On Tue, Jul 30, 2019 at 08:35:46PM +0800, Chao Yu wrote:
>> Hi Sahitya,
>>
>> On 2019/7/30 12:36, Sahitya Tummala wrote:
>>> Hi Chao,
>>>
>>> On Tue, Jul 30, 2019 at 12:00:45AM +0800, Chao Yu wrote:
>>>> Hi Sahitya,
>>>>
>>>> On 2019-7-29 13:20, Sahitya Tummala wrote:
>>>>> Policy - foreground GC, LFS mode and greedy GC mode.
>>>>>
>>>>> Under this policy, f2fs_gc() loops forever to GC as it doesn't have
>>>>> enough free segements to proceed and thus it keeps calling gc_more
>>>>> for the same victim segment. This can happen if the selected victim
>>>>> segment could not be GC'd due to failed blkaddr validity check i.e.
>>>>> is_alive() returns false for the blocks set in current validity map.
>>>>>
>>>>> Fix this by not resetting the sbi->cur_victim_sec to NULL_SEGNO, when
>>>>> the segment selected could not be GC'd. This helps to select another
>>>>> segment for GC and thus helps to proceed forward with GC.
>>>>>
>>>>> Signed-off-by: Sahitya Tummala <[email protected]>
>>>>> ---
>>>>> fs/f2fs/gc.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>>> index 8974672..7bbcc4a 100644
>>>>> --- a/fs/f2fs/gc.c
>>>>> +++ b/fs/f2fs/gc.c
>>>>> @@ -1303,7 +1303,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>>>> round++;
>>>>> }
>>>>>
>>>>> - if (gc_type == FG_GC)
>>>>> + if (gc_type == FG_GC && seg_freed)
>>>>> sbi->cur_victim_sec = NULL_SEGNO;
>>>>
>>>> In some cases, we may remain last victim in sbi->cur_victim_sec, and jump out of
>>>> GC cycle, then SSR can skip the last victim due to sec_usage_check()...
>>>>
>>>
>>> I see. I have a few questions on how to fix this issue. Please share your
>>> comments.
>>>
>>> 1. Do you think the scenario described is valid? It happens rarely, not very
>>
>> IIRC, we suffered endless gc loop due to there is valid block belong to an
>> opened atomic write file. (because we will skip directly once we hit atomic file)
>>
>> For your case, I'm not sure that would happen, did you look into is_alive(), why
>> will it fail? block address not match? If so, it looks like summary info and
>> dnode block and nat entry are inconsistent.
>
> Yes, from the ramdumps, I could see that block address is not matching and
> hence, is_alive() could fail in the issue scenario. Have you observed any such
> cases before? What could be the reason for this mismatch?

Alright, I didn't suffer such case before...

I don't know, too few clues to find the root cause. I guess maybe:
- random data caused by emmc/ufs firmware bugs
- bit-flip or memory overflow
- f2fs bugs

So, for the solution, I suggest to detect such inconsistency, and tag in
somewhere to just get rid of selecting the corrupted section.

BTW, do you try fsck on that image? what's the result?

Thanks,

>
> Thanks,
>
>>
>>> easy to reproduce. From the dumps, I see that only block is set as valid in
>>> the sentry->cur_valid_map for which I see that summary block check is_alive()
>>> could return false. As only one block is set as valid, chances are there it
>>> can be always selected as the victim by get_victim_by_default() under FG_GC.
>>>
>>> 2. What are the possible scenarios where summary block check is_alive() could
>>> fail for a segment?
>>
>> I guess, maybe after check_valid_map(), the block is been truncated before
>> is_alive(). If so the victim should be prefree directly instead of being
>> selected again...
>>
>>>
>>> 3. How does GC handle such segments?
>>
>> I think that's not a normal case, or I'm missing something.
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>> Thanks,
>>>>
>>>>>
>>>>> if (sync)
>>>>>
>>>
>