LinuxLists.cc - [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

2020-02-27 10:40:22

Subject: [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

Even when an online resize completes successfully, an SPO (sudden
power-off) immediately after the resize still causes the error below
on the next mount.

[ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
[ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint

This is because, after the FS metadata is updated in update_fs_metadata(),
no CP is done to reflect the new user_block_count unless SBI_IS_DIRTY
is set.

Signed-off-by: Sahitya Tummala <[email protected]>
---
fs/f2fs/gc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index a92fa49..a14a75f 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1577,6 +1577,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)

 	update_fs_metadata(sbi, -secs);
 	clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
+	set_sbi_flag(sbi, SBI_IS_DIRTY);
 	err = f2fs_sync_fs(sbi->sb, 1);
 	if (err) {
 		update_fs_metadata(sbi, secs);
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
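
The failure mode described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_sbi`, `toy_sync_fs()` and `toy_resize()` are hypothetical names, the block counts are arbitrary, and the model assumes that a sync writes a checkpoint only when the dirty flag is set, as the commit message describes. It is not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the f2fs superblock info; not the kernel struct. */
struct toy_sbi {
	bool dirty;                   /* models SBI_IS_DIRTY */
	uint64_t user_block_count;    /* in-memory value */
	uint64_t cp_user_block_count; /* value in the last written checkpoint */
};

/*
 * Models a sync that writes a checkpoint only when the dirty flag is set.
 * Returns true if a checkpoint was written.
 */
static bool toy_sync_fs(struct toy_sbi *sbi)
{
	if (!sbi->dirty)
		return false;
	sbi->cp_user_block_count = sbi->user_block_count;
	sbi->dirty = false;
	return true;
}

/* Models the tail of a resize: update metadata, optionally mark dirty, sync. */
static bool toy_resize(struct toy_sbi *sbi, uint64_t new_count, bool mark_dirty)
{
	sbi->user_block_count = new_count;
	if (mark_dirty)
		sbi->dirty = true;
	return toy_sync_fs(sbi);
}
```

Calling `toy_resize()` with `mark_dirty` set to false leaves `cp_user_block_count` at its old value, which is the stale state a power cut would then expose on the next mount. With `mark_dirty` set to true, the new count reaches the checkpoint.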

2020-02-27 10:40:29

by Sahitya Tummala

Subject: [PATCH 2/2] f2fs: Add a new CP flag to help fsck fix resize SPO issues

Add a new CP flag, CP_RESIZEFS_FLAG, and set it during online
resize to help fsck fix the metadata mismatch that may occur
when an SPO hits during resize, after the SB has been updated
but before the CP data could be written.

fsck errors -
Info: CKPT version = 6ed7bccb
Wrong user_block_count(2233856)
[f2fs_do_mount:3365] Checkpoint is polluted

Signed-off-by: Sahitya Tummala <[email protected]>
---
fs/f2fs/checkpoint.c | 8 ++++++--
include/linux/f2fs_fs.h | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index fdd7f3d..0bd4cdb 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1301,10 +1301,14 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	else
 		__clear_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);

-	if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) ||
-		is_sbi_flag_set(sbi, SBI_IS_RESIZEFS))
+	if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
 		__set_ckpt_flags(ckpt, CP_FSCK_FLAG);

+	if (is_sbi_flag_set(sbi, SBI_IS_RESIZEFS))
+		__set_ckpt_flags(ckpt, CP_RESIZEFS_FLAG);
+	else
+		__clear_ckpt_flags(ckpt, CP_RESIZEFS_FLAG);
+
 	if (is_sbi_flag_set(sbi, SBI_CP_DISABLED))
 		__set_ckpt_flags(ckpt, CP_DISABLED_FLAG);
 	else
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index ac3f488..3c383dd 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -125,6 +125,7 @@ struct f2fs_super_block {
 /*
  * For checkpoint
  */
+#define CP_RESIZEFS_FLAG		0x00004000
 #define CP_DISABLED_QUICK_FLAG		0x00002000
 #define CP_DISABLED_FLAG		0x00001000
 #define CP_QUOTA_NEED_FSCK_FLAG		0x00000800
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
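
The flag handling in this patch can be modelled as plain bit operations. The sketch below is an illustration only: `toy_update_ckpt_flags()` and `toy_resize_was_interrupted()` are hypothetical helpers, the value of `TOY_CP_FSCK_FLAG` is assumed for the example, and only the `CP_RESIZEFS_FLAG` value is taken from the patch. How fsck actually consumes the flag is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CP_RESIZEFS_FLAG  0x00004000u /* value from the patch above */
#define TOY_CP_FSCK_FLAG  0x00000010u /* assumed value, for illustration only */

/*
 * Models the patched logic: the fsck flag is only ever set here, while the
 * resize flag is set during a resize and cleared otherwise.
 */
static uint32_t toy_update_ckpt_flags(uint32_t flags, bool need_fsck,
				      bool resizing)
{
	if (need_fsck)
		flags |= TOY_CP_FSCK_FLAG;

	if (resizing)
		flags |= CP_RESIZEFS_FLAG;
	else
		flags &= ~CP_RESIZEFS_FLAG;

	return flags;
}

/* A checker could test the bit to learn that a resize did not finish. */
static bool toy_resize_was_interrupted(uint32_t flags)
{
	return (flags & CP_RESIZEFS_FLAG) != 0;
}
```

Because the resize flag is cleared by the first checkpoint written after the resize ends, finding it set in a checkpoint suggests that the power cut happened while the resize was still in progress.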

2020-02-28 08:36:20

by Chao Yu

Subject: Re: [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

Hi Sahitya,

Good catch.

On 2020/2/27 18:39, Sahitya Tummala wrote:
> Even though online resize is successfully done, a SPO immediately
> after resize, still causes below error in the next mount.
>
> [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
> [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint
>
> This is because after FS metadata is updated in update_fs_metadata()
> if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect
> the new user_block_count.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> ---
> fs/f2fs/gc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index a92fa49..a14a75f 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1577,6 +1577,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);

Do we need a barrier here to keep the ordering between the code above and
set_sbi_flag(DIRTY)?

> + set_sbi_flag(sbi, SBI_IS_DIRTY);
> err = f2fs_sync_fs(sbi->sb, 1);
> if (err) {
> update_fs_metadata(sbi, secs);

Do we need to add clear_sbi_flag(, SBI_IS_DIRTY) into update_fs_metadata(), so that
the above path is covered as well?

Thanks,

>

2020-03-02 04:40:28

by Sahitya Tummala

Subject: Re: [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

Hi Chao,

On Fri, Feb 28, 2020 at 04:35:37PM +0800, Chao Yu wrote:
> Hi Sahitya,
>
> Good catch.
>
> On 2020/2/27 18:39, Sahitya Tummala wrote:
> > Even though online resize is successfully done, a SPO immediately
> > after resize, still causes below error in the next mount.
> >
> > [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
> > [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint
> >
> > This is because after FS metadata is updated in update_fs_metadata()
> > if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect
> > the new user_block_count.
> >
> > Signed-off-by: Sahitya Tummala <[email protected]>
> > ---
> > fs/f2fs/gc.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index a92fa49..a14a75f 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1577,6 +1577,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> >
> > update_fs_metadata(sbi, -secs);
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
>
> Need a barrier here to keep order in between above code and set_sbi_flag(DIRTY)?

I don't think a barrier will help here. Suppose another context is already
doing CP: it then races with update_fs_metadata(), so it may or may not see
the resize updates, and it will also clear the SBI_IS_DIRTY flag set by resize
(even with a barrier).

I think we need to synchronize this with CP context, so that these resize changes
will be reflected properly. Please see the new diff below and help with the review.

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index a14a75f..5554af8 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1467,6 +1467,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
 	long long user_block_count =
 		le64_to_cpu(F2FS_CKPT(sbi)->user_block_count);

+	clear_sbi_flag(sbi, SBI_IS_DIRTY);
 	SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
 	MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
 	FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
@@ -1575,9 +1576,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
 		goto out;
 	}

+	mutex_lock(&sbi->cp_mutex);
 	update_fs_metadata(sbi, -secs);
 	clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
 	set_sbi_flag(sbi, SBI_IS_DIRTY);
+	mutex_unlock(&sbi->cp_mutex);
+
 	err = f2fs_sync_fs(sbi->sb, 1);
 	if (err) {
 		update_fs_metadata(sbi, secs);

thanks,

>
> > + set_sbi_flag(sbi, SBI_IS_DIRTY);
> > err = f2fs_sync_fs(sbi->sb, 1);
> > if (err) {
> > update_fs_metadata(sbi, secs);
>
> Do we need to add clear_sbi_flag(, SBI_IS_DIRTY) into update_fs_metadata(), so above
> path can be covered as well?
>
> Thanks,
>
> >

--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
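
The race described in this message can be replayed deterministically in a single-threaded model. This is a sketch, not kernel code: the `toy_` names are hypothetical, the block counts are arbitrary, and the checkpoint is split into a snapshot step and a commit step purely so that a resize can be placed between them.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_fs {
	bool dirty;                   /* models SBI_IS_DIRTY */
	uint64_t user_block_count;    /* in-memory value */
	uint64_t cp_user_block_count; /* value in the last checkpoint */
};

/* First half of a checkpoint: read the in-memory state. */
static uint64_t toy_cp_snapshot(const struct toy_fs *fs)
{
	return fs->user_block_count;
}

/* Second half: write the snapshot out and clear the dirty flag. */
static void toy_cp_commit(struct toy_fs *fs, uint64_t snapshot)
{
	fs->cp_user_block_count = snapshot;
	fs->dirty = false; /* also wipes a dirty mark set after the snapshot */
}

static void toy_resize_update(struct toy_fs *fs, uint64_t new_count)
{
	fs->user_block_count = new_count;
	fs->dirty = true;
}

/* Resize lands between snapshot and commit: the update is lost. */
static bool toy_racy_interleaving_loses_update(void)
{
	struct toy_fs fs = { true, 1000, 1000 };
	uint64_t snap = toy_cp_snapshot(&fs);

	toy_resize_update(&fs, 800);
	toy_cp_commit(&fs, snap);
	return !fs.dirty && fs.cp_user_block_count != fs.user_block_count;
}

/* Checkpoint runs to completion first: the dirty mark survives. */
static bool toy_serialized_order_keeps_update(void)
{
	struct toy_fs fs = { true, 1000, 1000 };

	toy_cp_commit(&fs, toy_cp_snapshot(&fs));
	toy_resize_update(&fs, 800);
	return fs.dirty;
}
```

In the racy ordering the checkpoint holds the old count and the flag is clean, so a later sync has no reason to write again. A memory barrier does not change that outcome, which is why the proposal is to serialize the two paths with a lock.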

2020-03-03 12:48:37

by Chao Yu

Subject: Re: [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

Hi Sahitya,

On 2020/3/2 12:39, Sahitya Tummala wrote:
> Hi Chao,
>
> On Fri, Feb 28, 2020 at 04:35:37PM +0800, Chao Yu wrote:
>> Hi Sahitya,
>>
>> Good catch.
>>
>> On 2020/2/27 18:39, Sahitya Tummala wrote:
>>> Even though online resize is successfully done, a SPO immediately
>>> after resize, still causes below error in the next mount.
>>>
>>> [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
>>> [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint
>>>
>>> This is because after FS metadata is updated in update_fs_metadata()
>>> if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect
>>> the new user_block_count.
>>>
>>> Signed-off-by: Sahitya Tummala <[email protected]>
>>> ---
>>> fs/f2fs/gc.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index a92fa49..a14a75f 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1577,6 +1577,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>>>
>>> update_fs_metadata(sbi, -secs);
>>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
>>
>> Need a barrier here to keep order in between above code and set_sbi_flag(DIRTY)?
>
> I don't think a barrier will help here. Let us say there is a another context
> doing CP already, then it races with update_fs_metadata(), so it may or may not
> see the resize updates and it will also clear the SBI_IS_DIRTY flag set by resize
> (even with a barrier).

I agree. Actually, we didn't consider the race condition between CP and
update_fs_metadata(); it should be fixed.

>
> I think we need to synchronize this with CP context, so that these resize changes
> will be reflected properly. Please see the new diff below and help with the review.
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index a14a75f..5554af8 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1467,6 +1467,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> long long user_block_count =
> le64_to_cpu(F2FS_CKPT(sbi)->user_block_count);
>
> + clear_sbi_flag(sbi, SBI_IS_DIRTY);

Why clear the dirty flag here?

And why not use cp_mutex to protect update_fs_metadata() in the error path of
f2fs_sync_fs() below?

> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> @@ -1575,9 +1576,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> goto out;
> }
>
> + mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> + mutex_unlock(&sbi->cp_mutex);
> +
> err = f2fs_sync_fs(sbi->sb, 1);
> if (err) {
> update_fs_metadata(sbi, secs);

^^^^^^^^^^^^^^

In addition, I found that we missed using sb_lock to protect updates to the
f2fs_super_block fields; I will submit a patch for that.

Thanks,

>
> thanks,
>
>>
>>> + set_sbi_flag(sbi, SBI_IS_DIRTY);
>>> err = f2fs_sync_fs(sbi->sb, 1);
>>> if (err) {
>>> update_fs_metadata(sbi, secs);
>>
>> Do we need to add clear_sbi_flag(, SBI_IS_DIRTY) into update_fs_metadata(), so above
>> path can be covered as well?
>>
>> Thanks,
>>
>>>
>

2020-03-03 14:13:14

by Sahitya Tummala

Subject: Re: [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

Hi Chao,

On Tue, Mar 03, 2020 at 08:06:21PM +0800, Chao Yu wrote:
> Hi Sahitya,
>
> On 2020/3/2 12:39, Sahitya Tummala wrote:
> > Hi Chao,
> >
> > On Fri, Feb 28, 2020 at 04:35:37PM +0800, Chao Yu wrote:
> >> Hi Sahitya,
> >>
> >> Good catch.
> >>
> >> On 2020/2/27 18:39, Sahitya Tummala wrote:
> >>> Even though online resize is successfully done, a SPO immediately
> >>> after resize, still causes below error in the next mount.
> >>>
> >>> [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
> >>> [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint
> >>>
> >>> This is because after FS metadata is updated in update_fs_metadata()
> >>> if the SBI_IS_DIRTY is not dirty, then CP will not be done to reflect
> >>> the new user_block_count.
> >>>
> >>> Signed-off-by: Sahitya Tummala <[email protected]>
> >>> ---
> >>> fs/f2fs/gc.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> >>> index a92fa49..a14a75f 100644
> >>> --- a/fs/f2fs/gc.c
> >>> +++ b/fs/f2fs/gc.c
> >>> @@ -1577,6 +1577,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> >>>
> >>> update_fs_metadata(sbi, -secs);
> >>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> >>
> >> Need a barrier here to keep order in between above code and set_sbi_flag(DIRTY)?
> >
> > I don't think a barrier will help here. Let us say there is a another context
> > doing CP already, then it races with update_fs_metadata(), so it may or may not
> > see the resize updates and it will also clear the SBI_IS_DIRTY flag set by resize
> > (even with a barrier).
>
> I agreed, actually, we didn't consider race condition in between CP and
> update_fs_metadata(), it should be fixed.
>
> >
> > I think we need to synchronize this with CP context, so that these resize changes
> > will be reflected properly. Please see the new diff below and help with the review.
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index a14a75f..5554af8 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1467,6 +1467,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> > long long user_block_count =
> > le64_to_cpu(F2FS_CKPT(sbi)->user_block_count);
> >
> > + clear_sbi_flag(sbi, SBI_IS_DIRTY);
>
> Why clear dirty flag here?

Yes, it is not required. I will remove it.

>
> And why not use cp_mutex to protect update_fs_metadata() in error path of
> f2fs_sync_fs() below?

Yes, will add a lock there too.

Thanks,

>
> > SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> > MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> > FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> > @@ -1575,9 +1576,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > goto out;
> > }
> >
> > + mutex_lock(&sbi->cp_mutex);
> > update_fs_metadata(sbi, -secs);
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > set_sbi_flag(sbi, SBI_IS_DIRTY);
> > + mutex_unlock(&sbi->cp_mutex);
> > +
> > err = f2fs_sync_fs(sbi->sb, 1);
> > if (err) {
> > update_fs_metadata(sbi, secs);
>
> ^^^^^^^^^^^^^^
>
> In addition, I found that we missed to use sb_lock to protect f2fs_super_block
> fields update, will submit a patch for that.
>
> Thanks,
>
> >
> > thanks,
> >
> >>
> >>> + set_sbi_flag(sbi, SBI_IS_DIRTY);
> >>> err = f2fs_sync_fs(sbi->sb, 1);
> >>> if (err) {
> >>> update_fs_metadata(sbi, secs);
> >>
> >> Do we need to add clear_sbi_flag(, SBI_IS_DIRTY) into update_fs_metadata(), so above
> >> path can be covered as well?
> >>
> >> Thanks,
> >>
> >>>
> >

--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
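
The two changes agreed on in this exchange (no clearing of the dirty flag inside the metadata update, and the lock taken on the error path too) might look roughly like the userspace sketch below. It is an assumption-laden illustration, since the revised patch itself is not part of this thread: the `toy_` names are hypothetical, a pthread mutex stands in for `cp_mutex`, and the `fail` parameter exists only to exercise the rollback path.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_sbi {
	pthread_mutex_t cp_mutex; /* stands in for sbi->cp_mutex */
	bool dirty;               /* models SBI_IS_DIRTY */
	int64_t free_sections;
};

static void toy_sbi_init(struct toy_sbi *sbi, int64_t free_sections)
{
	pthread_mutex_init(&sbi->cp_mutex, NULL);
	sbi->dirty = false;
	sbi->free_sections = free_sections;
}

/* Adjusts the metadata only; it no longer touches the dirty flag. */
static void toy_update_fs_metadata(struct toy_sbi *sbi, int secs)
{
	sbi->free_sections += secs;
}

/* Checkpoint stub; 'fail' injects an error to exercise the rollback. */
static int toy_sync_fs(struct toy_sbi *sbi, bool fail)
{
	if (fail)
		return -EIO;
	pthread_mutex_lock(&sbi->cp_mutex);
	sbi->dirty = false;
	pthread_mutex_unlock(&sbi->cp_mutex);
	return 0;
}

static int toy_resize_tail(struct toy_sbi *sbi, int secs, bool fail_sync)
{
	int err;

	/* Shrink and mark dirty while holding the checkpoint lock. */
	pthread_mutex_lock(&sbi->cp_mutex);
	toy_update_fs_metadata(sbi, -secs);
	sbi->dirty = true;
	pthread_mutex_unlock(&sbi->cp_mutex);

	err = toy_sync_fs(sbi, fail_sync);
	if (err) {
		/* Roll back under the same lock, as suggested in review. */
		pthread_mutex_lock(&sbi->cp_mutex);
		toy_update_fs_metadata(sbi, secs);
		pthread_mutex_unlock(&sbi->cp_mutex);
	}
	return err;
}
```

The lock is released before the sync call because the checkpoint stub takes the same lock itself; holding it across the call would deadlock in this model.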