2020-03-26 10:39:36

by Sahitya Tummala

Subject: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause meta page updates if
it needs to change the current node/data segments for resizing.
Stop these meta updates while a checkpoint is already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
---
fs/f2fs/gc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 5bca560..6122bad 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
int err = 0;

/* Move out cursegs from the target range */
+ f2fs_lock_op(sbi);
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
allocate_segment_for_resize(sbi, type, start, end);
+ f2fs_unlock_op(sbi);

/* do GC to move out valid blocks in the range */
for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


2020-03-27 19:26:09

by Jaegeuk Kim

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

Hi Sahitya,

On 03/26, Sahitya Tummala wrote:
> allocate_segment_for_resize() can cause metapage updates if
> it requires to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.

Doesn't freeze|thaw_bdev(sbi->sb->s_bdev); work for you?

>
> Signed-off-by: Sahitya Tummala <[email protected]>
> ---
> fs/f2fs/gc.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 5bca560..6122bad 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> int err = 0;
>
> /* Move out cursegs from the target range */
> + f2fs_lock_op(sbi);
> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> allocate_segment_for_resize(sbi, type, start, end);
> + f2fs_unlock_op(sbi);
>
> /* do GC to move out valid blocks in the range */
> for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-03-28 08:39:43

by Chao Yu

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

Hi all,

On 2020/3/28 3:24, Jaegeuk Kim wrote:
> Hi Sahitya,
>
> On 03/26, Sahitya Tummala wrote:
>> allocate_segment_for_resize() can cause metapage updates if
>> it requires to change the current node/data segments for resizing.
>> Stop these meta updates when there is a checkpoint already
>> in progress to prevent inconsistent CP data.
>
> Doesn't freeze|thaw_bdev(sbi->sb->s_bdev); work for you?

That avoids races with foreground ops, but not with background ops like
balance_fs() from a kworker, right?

BTW, I found that {freeze,thaw}_bdev is not enough to freeze all
foreground fs ops; we need to use {freeze,thaw}_super instead.

---
fs/f2fs/gc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db..acdc8b99b543 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1538,7 +1538,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
+ freeze_super(sbi->sb);

shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
@@ -1551,7 +1551,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
+ thaw_super(sbi->sb);
return err;
}

@@ -1613,6 +1613,6 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
}
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
+ thaw_super(sbi->sb);
return err;
}
--
2.18.0.rc1

>
>>
>> Signed-off-by: Sahitya Tummala <[email protected]>
>> ---
>> fs/f2fs/gc.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index 5bca560..6122bad 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
>> int err = 0;
>>
>> /* Move out cursegs from the target range */
>> + f2fs_lock_op(sbi);
>> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
>> allocate_segment_for_resize(sbi, type, start, end);
>> + f2fs_unlock_op(sbi);
>>
>> /* do GC to move out valid blocks in the range */
>> for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
>> --
>> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> .
>

2020-03-30 08:42:40

by Sahitya Tummala

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On Sat, Mar 28, 2020 at 04:38:00PM +0800, Chao Yu wrote:
> Hi all,
>
> On 2020/3/28 3:24, Jaegeuk Kim wrote:
> > Hi Sahitya,
> >
> > On 03/26, Sahitya Tummala wrote:
> >> allocate_segment_for_resize() can cause metapage updates if
> >> it requires to change the current node/data segments for resizing.
> >> Stop these meta updates when there is a checkpoint already
> >> in progress to prevent inconsistent CP data.
> >
> > Doesn't freeze|thaw_bdev(sbi->sb->s_bdev); work for you?
>
> That can avoid foreground ops racing? rather than background ops like
> balance_fs() from kworker?
>

Yes, that can only prevent foreground ops, but not the background ops
invoked in the context of a kworker thread.

> BTW, I found that {freeze,thaw}_bdev is not enough to freeze all
> foreground fs ops, it needs to use {freeze,thaw}_super instead.
>

Yes, I agree.

Thanks,

> ---
> fs/f2fs/gc.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db..acdc8b99b543 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1538,7 +1538,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> + freeze_super(sbi->sb);
>
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> @@ -1551,7 +1551,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> + thaw_super(sbi->sb);
> return err;
> }
>
> @@ -1613,6 +1613,6 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> }
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> + thaw_super(sbi->sb);
> return err;
> }
> --
> 2.18.0.rc1
>
> >
> >>
> >> Signed-off-by: Sahitya Tummala <[email protected]>
> >> ---
> >> fs/f2fs/gc.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> >> index 5bca560..6122bad 100644
> >> --- a/fs/f2fs/gc.c
> >> +++ b/fs/f2fs/gc.c
> >> @@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> >> int err = 0;
> >>
> >> /* Move out cursegs from the target range */
> >> + f2fs_lock_op(sbi);
> >> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> >> allocate_segment_for_resize(sbi, type, start, end);
> >> + f2fs_unlock_op(sbi);
> >>
> >> /* do GC to move out valid blocks in the range */
> >> for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
> >> --
> >> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> > .
> >

--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2020-03-30 18:35:40

by Jaegeuk Kim

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 03/30, Sahitya Tummala wrote:
> On Sat, Mar 28, 2020 at 04:38:00PM +0800, Chao Yu wrote:
> > Hi all,
> >
> > On 2020/3/28 3:24, Jaegeuk Kim wrote:
> > > Hi Sahitya,
> > >
> > > On 03/26, Sahitya Tummala wrote:
> > >> allocate_segment_for_resize() can cause metapage updates if
> > >> it requires to change the current node/data segments for resizing.
> > >> Stop these meta updates when there is a checkpoint already
> > >> in progress to prevent inconsistent CP data.
> > >
> > > Doesn't freeze|thaw_bdev(sbi->sb->s_bdev); work for you?
> >
> > That can avoid foreground ops racing? rather than background ops like
> > balance_fs() from kworker?
> >
>
> Yes, that can only prevent foreground ops but not the background ops
> invoked in the context of kworker thread.
>
> > BTW, I found that {freeze,thaw}_bdev is not enough to freeze all
> > foreground fs ops, it needs to use {freeze,thaw}_super instead.
> >
>
> Yes, I agree.

sgtm. :)

>
> Thanks,
>
> > ---
> > fs/f2fs/gc.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 26248c8936db..acdc8b99b543 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1538,7 +1538,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > return -EINVAL;
> > }
> >
> > - freeze_bdev(sbi->sb->s_bdev);
> > + freeze_super(sbi->sb);
> >
> > shrunk_blocks = old_block_count - block_count;
> > secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> > @@ -1551,7 +1551,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > sbi->user_block_count -= shrunk_blocks;
> > spin_unlock(&sbi->stat_lock);
> > if (err) {
> > - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> > + thaw_super(sbi->sb);
> > return err;
> > }
> >
> > @@ -1613,6 +1613,6 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > }
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > mutex_unlock(&sbi->resize_mutex);
> > - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> > + thaw_super(sbi->sb);
> > return err;
> > }
> > --
> > 2.18.0.rc1
> >
> > >
> > >>
> > >> Signed-off-by: Sahitya Tummala <[email protected]>
> > >> ---
> > >> fs/f2fs/gc.c | 2 ++
> > >> 1 file changed, 2 insertions(+)
> > >>
> > >> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > >> index 5bca560..6122bad 100644
> > >> --- a/fs/f2fs/gc.c
> > >> +++ b/fs/f2fs/gc.c
> > >> @@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > >> int err = 0;
> > >>
> > >> /* Move out cursegs from the target range */
> > >> + f2fs_lock_op(sbi);
> > >> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> > >> allocate_segment_for_resize(sbi, type, start, end);
> > >> + f2fs_unlock_op(sbi);
> > >>
> > >> /* do GC to move out valid blocks in the range */
> > >> for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
> > >> --
> > >> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> > > .
> > >
>
> --
> --
> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2020-03-31 00:58:42

by Chao Yu

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/3/26 18:36, Sahitya Tummala wrote:
> allocate_segment_for_resize() can cause metapage updates if
> it requires to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.
>
> Signed-off-by: Sahitya Tummala <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2020-03-31 03:55:51

by Jaegeuk Kim

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 03/26, Sahitya Tummala wrote:
> allocate_segment_for_resize() can cause metapage updates if
> it requires to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.

I'd prefer to use f2fs_lock_op() in bigger coverage.

>
> Signed-off-by: Sahitya Tummala <[email protected]>
> ---
> fs/f2fs/gc.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 5bca560..6122bad 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> int err = 0;
>
> /* Move out cursegs from the target range */
> + f2fs_lock_op(sbi);
> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> allocate_segment_for_resize(sbi, type, start, end);
> + f2fs_unlock_op(sbi);
>
> /* do GC to move out valid blocks in the range */
> for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-03-31 09:23:44

by Sahitya Tummala

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On Mon, Mar 30, 2020 at 08:54:19PM -0700, Jaegeuk Kim wrote:
> On 03/26, Sahitya Tummala wrote:
> > allocate_segment_for_resize() can cause metapage updates if
> > it requires to change the current node/data segments for resizing.
> > Stop these meta updates when there is a checkpoint already
> > in progress to prevent inconsistent CP data.
>
> I'd prefer to use f2fs_lock_op() in bigger coverage.

Do you mean to cover the entire free_segment_range() function within
f2fs_lock_op()? Please clarify.

Thanks,

>
> >
> > Signed-off-by: Sahitya Tummala <[email protected]>
> > ---
> > fs/f2fs/gc.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 5bca560..6122bad 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1399,8 +1399,10 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > int err = 0;
> >
> > /* Move out cursegs from the target range */
> > + f2fs_lock_op(sbi);
> > for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> > allocate_segment_for_resize(sbi, type, start, end);
> > + f2fs_unlock_op(sbi);
> >
> > /* do GC to move out valid blocks in the range */
> > for (segno = start; segno <= end; segno += sbi->segs_per_sec) {
> > --
> > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2020-03-31 18:45:08

by Jaegeuk Kim

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 03/31, Sahitya Tummala wrote:
> On Mon, Mar 30, 2020 at 08:54:19PM -0700, Jaegeuk Kim wrote:
> > On 03/26, Sahitya Tummala wrote:
> > > allocate_segment_for_resize() can cause metapage updates if
> > > it requires to change the current node/data segments for resizing.
> > > Stop these meta updates when there is a checkpoint already
> > > in progress to prevent inconsistent CP data.
> >
> > I'd prefer to use f2fs_lock_op() in bigger coverage.
>
> Do you mean to cover the entire free_segment_range() function within
> f2fs_lock_op()? Please clarify.

I didn't test tho, something like this?

---
fs/f2fs/checkpoint.c | 6 ++++--
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 28 ++++++++++++++--------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +++-
5 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 852890b72d6ac..531995192b714 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c84442eefc56d..7c98dca3ec1d6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -193,6 +193,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1417,7 +1418,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db0..1e5a06fda09d3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1402,8 +1402,9 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
unsigned int end)
{
- int type;
unsigned int segno, next_inuse;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int type;
int err = 0;

/* Move out cursegs from the target range */
@@ -1417,16 +1418,14 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

if (get_valid_blocks(sbi, segno, true))
return -EAGAIN;
}

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
return err;

@@ -1502,6 +1501,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
int gc_mode, gc_type;
int err = 0;
@@ -1538,7 +1538,9 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);

shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
@@ -1551,11 +1553,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
return err;
}

- mutex_lock(&sbi->resize_mutex);
set_sbi_flag(sbi, SBI_IS_RESIZEFS);

mutex_lock(&DIRTY_I(sbi)->seglist_lock);
@@ -1587,17 +1590,13 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
goto out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
@@ -1612,7 +1611,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
spin_unlock(&sbi->stat_lock);
}
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b83b17b54a0a6..1e7b1d21d0177 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index d97adfc327f03..f5eb03c54e96f 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.0.rc2.310.g2932bb562d-goog

2020-04-01 02:56:14

by Chao Yu

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/4/1 2:43, Jaegeuk Kim wrote:
> On 03/31, Sahitya Tummala wrote:
>> On Mon, Mar 30, 2020 at 08:54:19PM -0700, Jaegeuk Kim wrote:
>>> On 03/26, Sahitya Tummala wrote:
>>>> allocate_segment_for_resize() can cause metapage updates if
>>>> it requires to change the current node/data segments for resizing.
>>>> Stop these meta updates when there is a checkpoint already
>>>> in progress to prevent inconsistent CP data.
>>>
>>> I'd prefer to use f2fs_lock_op() in bigger coverage.
>>
>> Do you mean to cover the entire free_segment_range() function within
>> f2fs_lock_op()? Please clarify.
>
> I didn't test tho, something like this?
>
> ---
> fs/f2fs/checkpoint.c | 6 ++++--
> fs/f2fs/f2fs.h | 2 +-
> fs/f2fs/gc.c | 28 ++++++++++++++--------------
> fs/f2fs/super.c | 1 -
> include/trace/events/f2fs.h | 4 +++-
> 5 files changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 852890b72d6ac..531995192b714 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> return 0;
> f2fs_warn(sbi, "Start checkpoint disabled!");
> }
> - mutex_lock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_lock(&sbi->cp_mutex);
>
> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> f2fs_update_time(sbi, CP_TIME);
> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> out:
> - mutex_unlock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_unlock(&sbi->cp_mutex);
> return err;
> }
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index c84442eefc56d..7c98dca3ec1d6 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -193,6 +193,7 @@ enum {
> #define CP_DISCARD 0x00000010
> #define CP_TRIMMED 0x00000020
> #define CP_PAUSE 0x00000040
> +#define CP_RESIZE 0x00000080
>
> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> @@ -1417,7 +1418,6 @@ struct f2fs_sb_info {
> unsigned int segs_per_sec; /* segments per section */
> unsigned int secs_per_zone; /* sections per zone */
> unsigned int total_sections; /* total section count */
> - struct mutex resize_mutex; /* for resize exclusion */
> unsigned int total_node_count; /* total node block count */
> unsigned int total_valid_node_count; /* valid node block count */
> loff_t max_file_blocks; /* max block index of file */
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db0..1e5a06fda09d3 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1402,8 +1402,9 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> unsigned int end)
> {
> - int type;
> unsigned int segno, next_inuse;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> + int type;
> int err = 0;
>
> /* Move out cursegs from the target range */
> @@ -1417,16 +1418,14 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> };
>
> - down_write(&sbi->gc_lock);
> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> - up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);
>
> if (get_valid_blocks(sbi, segno, true))
> return -EAGAIN;
> }
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err)
> return err;
>
> @@ -1502,6 +1501,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> {
> __u64 old_block_count, shrunk_blocks;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> unsigned int secs;
> int gc_mode, gc_type;
> int err = 0;
> @@ -1538,7 +1538,9 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> + freeze_super(sbi->sb);

Looking at this again, I guess holding the freeze lock here may cause a
potential hung-task issue: imagine that on low-end storage, when shrinking
a large address space, free_segment_range() needs a very long time to
migrate all valid blocks at the tail of the device. That's why we
previously did the block migration with small gc_lock coverage.

Quoted:

Changelog v5 ==> v6:
- In free_segment_range(), reduce granularity of gc_mutex.

Thanks,

> + down_write(&sbi->gc_lock);
> + mutex_lock(&sbi->cp_mutex);
>
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> @@ -1551,11 +1553,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> return err;
> }
>
> - mutex_lock(&sbi->resize_mutex);
> set_sbi_flag(sbi, SBI_IS_RESIZEFS);
>
> mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> @@ -1587,17 +1590,13 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> goto out;
> }
>
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> - mutex_unlock(&sbi->cp_mutex);
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err) {
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, secs);
> - mutex_unlock(&sbi->cp_mutex);
> update_sb_metadata(sbi, secs);
> f2fs_commit_super(sbi, false);
> }
> @@ -1612,7 +1611,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> spin_unlock(&sbi->stat_lock);
> }
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> - mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> return err;
> }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index b83b17b54a0a6..1e7b1d21d0177 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> init_rwsem(&sbi->gc_lock);
> mutex_init(&sbi->writepages);
> mutex_init(&sbi->cp_mutex);
> - mutex_init(&sbi->resize_mutex);
> init_rwsem(&sbi->node_write);
> init_rwsem(&sbi->node_change);
>
> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> index d97adfc327f03..f5eb03c54e96f 100644
> --- a/include/trace/events/f2fs.h
> +++ b/include/trace/events/f2fs.h
> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> TRACE_DEFINE_ENUM(CP_DISCARD);
> TRACE_DEFINE_ENUM(CP_TRIMMED);
> TRACE_DEFINE_ENUM(CP_PAUSE);
> +TRACE_DEFINE_ENUM(CP_RESIZE);
>
> #define show_block_type(type) \
> __print_symbolic(type, \
> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> { CP_RECOVERY, "Recovery" }, \
> { CP_DISCARD, "Discard" }, \
> { CP_PAUSE, "Pause" }, \
> - { CP_TRIMMED, "Trimmed" })
> + { CP_TRIMMED, "Trimmed" }, \
> + { CP_RESIZE, "Resize" })
>
> #define show_fsync_cpreason(type) \
> __print_symbolic(type, \
>

2020-04-01 05:11:11

by Sahitya Tummala

Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

Hi Jaegeuk,

Got it.
The diff below looks good to me.
Would you like me to test it and post a patch for this?

Thanks,

On Tue, Mar 31, 2020 at 11:43:07AM -0700, Jaegeuk Kim wrote:
> On 03/31, Sahitya Tummala wrote:
> > On Mon, Mar 30, 2020 at 08:54:19PM -0700, Jaegeuk Kim wrote:
> > > On 03/26, Sahitya Tummala wrote:
> > > > allocate_segment_for_resize() can cause metapage updates if
> > > > it requires to change the current node/data segments for resizing.
> > > > Stop these meta updates when there is a checkpoint already
> > > > in progress to prevent inconsistent CP data.
> > >
> > > I'd prefer to use f2fs_lock_op() in bigger coverage.
> >
> > Do you mean to cover the entire free_segment_range() function within
> > f2fs_lock_op()? Please clarify.
>
> I didn't test tho, something like this?
>
> ---
> fs/f2fs/checkpoint.c | 6 ++++--
> fs/f2fs/f2fs.h | 2 +-
> fs/f2fs/gc.c | 28 ++++++++++++++--------------
> fs/f2fs/super.c | 1 -
> include/trace/events/f2fs.h | 4 +++-
> 5 files changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 852890b72d6ac..531995192b714 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> return 0;
> f2fs_warn(sbi, "Start checkpoint disabled!");
> }
> - mutex_lock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_lock(&sbi->cp_mutex);
>
> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> f2fs_update_time(sbi, CP_TIME);
> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> out:
> - mutex_unlock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_unlock(&sbi->cp_mutex);
> return err;
> }
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index c84442eefc56d..7c98dca3ec1d6 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -193,6 +193,7 @@ enum {
> #define CP_DISCARD 0x00000010
> #define CP_TRIMMED 0x00000020
> #define CP_PAUSE 0x00000040
> +#define CP_RESIZE 0x00000080
>
> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> @@ -1417,7 +1418,6 @@ struct f2fs_sb_info {
> unsigned int segs_per_sec; /* segments per section */
> unsigned int secs_per_zone; /* sections per zone */
> unsigned int total_sections; /* total section count */
> - struct mutex resize_mutex; /* for resize exclusion */
> unsigned int total_node_count; /* total node block count */
> unsigned int total_valid_node_count; /* valid node block count */
> loff_t max_file_blocks; /* max block index of file */
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db0..1e5a06fda09d3 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1402,8 +1402,9 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> unsigned int end)
> {
> - int type;
> unsigned int segno, next_inuse;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> + int type;
> int err = 0;
>
> /* Move out cursegs from the target range */
> @@ -1417,16 +1418,14 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> };
>
> - down_write(&sbi->gc_lock);
> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> - up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);
>
> if (get_valid_blocks(sbi, segno, true))
> return -EAGAIN;
> }
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err)
> return err;
>
> @@ -1502,6 +1501,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> {
> __u64 old_block_count, shrunk_blocks;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> unsigned int secs;
> int gc_mode, gc_type;
> int err = 0;
> @@ -1538,7 +1538,9 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> + freeze_super(sbi->sb);
> + down_write(&sbi->gc_lock);
> + mutex_lock(&sbi->cp_mutex);
>
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> @@ -1551,11 +1553,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> return err;
> }
>
> - mutex_lock(&sbi->resize_mutex);
> set_sbi_flag(sbi, SBI_IS_RESIZEFS);
>
> mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> @@ -1587,17 +1590,13 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> goto out;
> }
>
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> - mutex_unlock(&sbi->cp_mutex);
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err) {
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, secs);
> - mutex_unlock(&sbi->cp_mutex);
> update_sb_metadata(sbi, secs);
> f2fs_commit_super(sbi, false);
> }
> @@ -1612,7 +1611,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> spin_unlock(&sbi->stat_lock);
> }
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> - mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> return err;
> }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index b83b17b54a0a6..1e7b1d21d0177 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> init_rwsem(&sbi->gc_lock);
> mutex_init(&sbi->writepages);
> mutex_init(&sbi->cp_mutex);
> - mutex_init(&sbi->resize_mutex);
> init_rwsem(&sbi->node_write);
> init_rwsem(&sbi->node_change);
>
> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> index d97adfc327f03..f5eb03c54e96f 100644
> --- a/include/trace/events/f2fs.h
> +++ b/include/trace/events/f2fs.h
> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> TRACE_DEFINE_ENUM(CP_DISCARD);
> TRACE_DEFINE_ENUM(CP_TRIMMED);
> TRACE_DEFINE_ENUM(CP_PAUSE);
> +TRACE_DEFINE_ENUM(CP_RESIZE);
>
> #define show_block_type(type) \
> __print_symbolic(type, \
> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> { CP_RECOVERY, "Recovery" }, \
> { CP_DISCARD, "Discard" }, \
> { CP_PAUSE, "Pause" }, \
> - { CP_TRIMMED, "Trimmed" })
> + { CP_TRIMMED, "Trimmed" }, \
> + { CP_RESIZE, "Resize" })
>
> #define show_fsync_cpreason(type) \
> __print_symbolic(type, \
> --
> 2.26.0.rc2.310.g2932bb562d-goog
>

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2020-04-03 16:23:19

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/01, Chao Yu wrote:
> On 2020/4/1 2:43, Jaegeuk Kim wrote:
> > On 03/31, Sahitya Tummala wrote:
> >> On Mon, Mar 30, 2020 at 08:54:19PM -0700, Jaegeuk Kim wrote:
> >>> On 03/26, Sahitya Tummala wrote:
> >>>> allocate_segment_for_resize() can cause metapage updates if
> >>>> it requires to change the current node/data segments for resizing.
> >>>> Stop these meta updates when there is a checkpoint already
> >>>> in progress to prevent inconsistent CP data.
> >>>
> >>> I'd prefer to use f2fs_lock_op() in bigger coverage.
> >>
> >> Do you mean to cover the entire free_segment_range() function within
> >> f2fs_lock_op()? Please clarify.
> >
> > I didn't test tho, something like this?
> >
> > ---
> > fs/f2fs/checkpoint.c | 6 ++++--
> > fs/f2fs/f2fs.h | 2 +-
> > fs/f2fs/gc.c | 28 ++++++++++++++--------------
> > fs/f2fs/super.c | 1 -
> > include/trace/events/f2fs.h | 4 +++-
> > 5 files changed, 22 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 852890b72d6ac..531995192b714 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> > return 0;
> > f2fs_warn(sbi, "Start checkpoint disabled!");
> > }
> > - mutex_lock(&sbi->cp_mutex);
> > + if (cpc->reason != CP_RESIZE)
> > + mutex_lock(&sbi->cp_mutex);
> >
> > if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> > ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> > @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> > f2fs_update_time(sbi, CP_TIME);
> > trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> > out:
> > - mutex_unlock(&sbi->cp_mutex);
> > + if (cpc->reason != CP_RESIZE)
> > + mutex_unlock(&sbi->cp_mutex);
> > return err;
> > }
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index c84442eefc56d..7c98dca3ec1d6 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -193,6 +193,7 @@ enum {
> > #define CP_DISCARD 0x00000010
> > #define CP_TRIMMED 0x00000020
> > #define CP_PAUSE 0x00000040
> > +#define CP_RESIZE 0x00000080
> >
> > #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> > #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> > @@ -1417,7 +1418,6 @@ struct f2fs_sb_info {
> > unsigned int segs_per_sec; /* segments per section */
> > unsigned int secs_per_zone; /* sections per zone */
> > unsigned int total_sections; /* total section count */
> > - struct mutex resize_mutex; /* for resize exclusion */
> > unsigned int total_node_count; /* total node block count */
> > unsigned int total_valid_node_count; /* valid node block count */
> > loff_t max_file_blocks; /* max block index of file */
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 26248c8936db0..1e5a06fda09d3 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1402,8 +1402,9 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> > static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > unsigned int end)
> > {
> > - int type;
> > unsigned int segno, next_inuse;
> > + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> > + int type;
> > int err = 0;
> >
> > /* Move out cursegs from the target range */
> > @@ -1417,16 +1418,14 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> > };
> >
> > - down_write(&sbi->gc_lock);
> > do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> > - up_write(&sbi->gc_lock);
> > put_gc_inode(&gc_list);
> >
> > if (get_valid_blocks(sbi, segno, true))
> > return -EAGAIN;
> > }
> >
> > - err = f2fs_sync_fs(sbi->sb, 1);
> > + err = f2fs_write_checkpoint(sbi, &cpc);
> > if (err)
> > return err;
> >
> > @@ -1502,6 +1501,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> > int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > {
> > __u64 old_block_count, shrunk_blocks;
> > + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> > unsigned int secs;
> > int gc_mode, gc_type;
> > int err = 0;
> > @@ -1538,7 +1538,9 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > return -EINVAL;
> > }
> >
> > - freeze_bdev(sbi->sb->s_bdev);
> > + freeze_super(sbi->sb);
>
> Looking at this again, I guess holding the freeze lock here may cause a
> potential hung-task issue: imagine that on low-end storage, when shrinking
> a large address space, free_segment_range() needs a very long time to
> migrate all the valid blocks at the tail of the device. That's why we
> previously did the block migration with small gc_lock coverage.

Hmm, it seems we have to do something like:
1) do GC
2) freeze everything
3) check it's okay to switch to the new metadata
4) do the resize
5) thaw

>
> Quoted:
>
> Changelog v5 ==> v6:
> - In free_segment_range(), reduce granularity of gc_mutex.
>
> Thanks,
>
> > + down_write(&sbi->gc_lock);
> > + mutex_lock(&sbi->cp_mutex);
> >
> > shrunk_blocks = old_block_count - block_count;
> > secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> > @@ -1551,11 +1553,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > sbi->user_block_count -= shrunk_blocks;
> > spin_unlock(&sbi->stat_lock);
> > if (err) {
> > - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> > + mutex_unlock(&sbi->cp_mutex);
> > + up_write(&sbi->gc_lock);
> > + thaw_super(sbi->sb);
> > return err;
> > }
> >
> > - mutex_lock(&sbi->resize_mutex);
> > set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> >
> > mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> > @@ -1587,17 +1590,13 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > goto out;
> > }
> >
> > - mutex_lock(&sbi->cp_mutex);
> > update_fs_metadata(sbi, -secs);
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > set_sbi_flag(sbi, SBI_IS_DIRTY);
> > - mutex_unlock(&sbi->cp_mutex);
> >
> > - err = f2fs_sync_fs(sbi->sb, 1);
> > + err = f2fs_write_checkpoint(sbi, &cpc);
> > if (err) {
> > - mutex_lock(&sbi->cp_mutex);
> > update_fs_metadata(sbi, secs);
> > - mutex_unlock(&sbi->cp_mutex);
> > update_sb_metadata(sbi, secs);
> > f2fs_commit_super(sbi, false);
> > }
> > @@ -1612,7 +1611,8 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > spin_unlock(&sbi->stat_lock);
> > }
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > - mutex_unlock(&sbi->resize_mutex);
> > - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> > + mutex_unlock(&sbi->cp_mutex);
> > + up_write(&sbi->gc_lock);
> > + thaw_super(sbi->sb);
> > return err;
> > }
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index b83b17b54a0a6..1e7b1d21d0177 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > init_rwsem(&sbi->gc_lock);
> > mutex_init(&sbi->writepages);
> > mutex_init(&sbi->cp_mutex);
> > - mutex_init(&sbi->resize_mutex);
> > init_rwsem(&sbi->node_write);
> > init_rwsem(&sbi->node_change);
> >
> > diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> > index d97adfc327f03..f5eb03c54e96f 100644
> > --- a/include/trace/events/f2fs.h
> > +++ b/include/trace/events/f2fs.h
> > @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> > TRACE_DEFINE_ENUM(CP_DISCARD);
> > TRACE_DEFINE_ENUM(CP_TRIMMED);
> > TRACE_DEFINE_ENUM(CP_PAUSE);
> > +TRACE_DEFINE_ENUM(CP_RESIZE);
> >
> > #define show_block_type(type) \
> > __print_symbolic(type, \
> > @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> > { CP_RECOVERY, "Recovery" }, \
> > { CP_DISCARD, "Discard" }, \
> > { CP_PAUSE, "Pause" }, \
> > - { CP_TRIMMED, "Trimmed" })
> > + { CP_TRIMMED, "Trimmed" }, \
> > + { CP_RESIZE, "Resize" })
> >
> > #define show_fsync_cpreason(type) \
> > __print_symbolic(type, \
> >

2020-04-03 17:18:58

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/01, Sahitya Tummala wrote:
> Hi Jaegeuk,
>
> Got it.
> The diff below looks good to me.
> Would you like me to test it and put a patch for this?

Sahitya, Chao,

Could you please take a look at this patch and test intensively?

Thanks,

From b9126ba7437602d7945c420eb2eb411f9cb95600 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Tue, 31 Mar 2020 11:43:07 -0700
Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it requires to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 ++-
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 105 +++++++++++++++++++-----------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +-
6 files changed, 66 insertions(+), 57 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 852890b72d6ac..531995192b714 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index be02a5cadd944..f9b2caa2135bd 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -193,6 +193,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1421,7 +1422,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 257e61d0afffb..b4c12370bb3d6 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3305,7 +3305,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
__u64 block_count;
- int ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3317,9 +3316,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
sizeof(block_count)))
return -EFAULT;

- ret = f2fs_resize_fs(sbi, block_count);
-
- return ret;
+ return f2fs_resize_fs(sbi, block_count);
}

static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db0..ca07fa4a6fd68 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
}

-static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
- unsigned int end)
+static int free_segment_range(struct f2fs_sb_info *sbi,
+ unsigned int secs, bool gc_only)
{
- int type;
- unsigned int segno, next_inuse;
+ unsigned int segno, next_inuse, start, end;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int gc_mode, gc_type;
int err = 0;
+ int type;
+
+ /* Force block allocation for GC */
+ MAIN_SECS(sbi) -= secs;
+ start = MAIN_SECS(sbi) * sbi->segs_per_sec;
+ end = MAIN_SEGS(sbi) - 1;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
+ if (SIT_I(sbi)->last_victim[gc_mode] >= start)
+ SIT_I(sbi)->last_victim[gc_mode] = 0;
+
+ for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
+ if (sbi->next_victim_seg[gc_type] >= start)
+ sbi->next_victim_seg[gc_type] = NULL_SEGNO;
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
@@ -1417,18 +1433,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

- if (get_valid_blocks(sbi, segno, true))
- return -EAGAIN;
+ if (!gc_only && get_valid_blocks(sbi, segno, true)) {
+ err = -EAGAIN;
+ goto out;
+ }
}
+ if (gc_only)
+ goto out;

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
- return err;
+ goto out;

next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
if (next_inuse <= end) {
@@ -1436,6 +1454,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
next_inuse);
f2fs_bug_on(sbi, 1);
}
+out:
+ MAIN_SECS(sbi) += secs;
return err;
}

@@ -1481,6 +1501,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)

SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
+ MAIN_SECS(sbi) += secs;
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
@@ -1502,6 +1523,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
int gc_mode, gc_type;
int err = 0;
@@ -1538,10 +1560,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
-
shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
+
+ /* protect MAIN_SEC in free_segment_range */
+ f2fs_lock_op(sbi);
+ err = free_segment_range(sbi, secs, true);
+ f2fs_unlock_op(sbi);
+ if (err)
+ return err;
+
+ set_sbi_flag(sbi, SBI_IS_RESIZEFS);
+
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);
+
spin_lock(&sbi->stat_lock);
if (shrunk_blocks + valid_user_blocks(sbi) +
sbi->current_reserved_blocks + sbi->unusable_block_count +
@@ -1550,69 +1584,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
else
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
- if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
- return err;
- }
-
- mutex_lock(&sbi->resize_mutex);
- set_sbi_flag(sbi, SBI_IS_RESIZEFS);
-
- mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-
- MAIN_SECS(sbi) -= secs;
-
- for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
- if (SIT_I(sbi)->last_victim[gc_mode] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- SIT_I(sbi)->last_victim[gc_mode] = 0;
-
- for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
- if (sbi->next_victim_seg[gc_type] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-
- mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ if (err)
+ goto out_err;

- err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
- MAIN_SEGS(sbi) - 1);
+ err = free_segment_range(sbi, secs, false);
if (err)
- goto out;
+ goto recover_out;

update_sb_metadata(sbi, -secs);

err = f2fs_commit_super(sbi, false);
if (err) {
update_sb_metadata(sbi, secs);
- goto out;
+ goto recover_out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
-out:
+recover_out:
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");

- MAIN_SECS(sbi) += secs;
spin_lock(&sbi->stat_lock);
sbi->user_block_count += shrunk_blocks;
spin_unlock(&sbi->stat_lock);
}
+out_err:
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b83b17b54a0a6..1e7b1d21d0177 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 4d7d4c391879d..5d1a72001fdb4 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.0.292.g33ef6b2f38-goog

2020-04-03 17:47:31

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/03, Jaegeuk Kim wrote:
> On 04/01, Sahitya Tummala wrote:
> > Hi Jaegeuk,
> >
> > Got it.
> > The diff below looks good to me.
> > Would you like me to test it and put a patch for this?
>
> Sahitya, Chao,
>
> Could you please take a look at this patch and test intensively?
>
> Thanks,

v2:

From 6bf7d5b227d466b0fe90d4957af29bd184fb646e Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Tue, 31 Mar 2020 11:43:07 -0700
Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it requires to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 +-
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 107 +++++++++++++++++++-----------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +-
6 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 852890b72d6ac..531995192b714 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index be02a5cadd944..f9b2caa2135bd 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -193,6 +193,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1421,7 +1422,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 257e61d0afffb..b4c12370bb3d6 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3305,7 +3305,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
__u64 block_count;
- int ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3317,9 +3316,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
sizeof(block_count)))
return -EFAULT;

- ret = f2fs_resize_fs(sbi, block_count);
-
- return ret;
+ return f2fs_resize_fs(sbi, block_count);
}

static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db0..46c75ecb64a2e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
}

-static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
- unsigned int end)
+static int free_segment_range(struct f2fs_sb_info *sbi,
+ unsigned int secs, bool gc_only)
{
- int type;
- unsigned int segno, next_inuse;
+ unsigned int segno, next_inuse, start, end;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int gc_mode, gc_type;
int err = 0;
+ int type;
+
+ /* Force block allocation for GC */
+ MAIN_SECS(sbi) -= secs;
+ start = MAIN_SECS(sbi) * sbi->segs_per_sec;
+ end = MAIN_SEGS(sbi) - 1;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
+ if (SIT_I(sbi)->last_victim[gc_mode] >= start)
+ SIT_I(sbi)->last_victim[gc_mode] = 0;
+
+ for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
+ if (sbi->next_victim_seg[gc_type] >= start)
+ sbi->next_victim_seg[gc_type] = NULL_SEGNO;
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
@@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

- if (get_valid_blocks(sbi, segno, true))
- return -EAGAIN;
+ if (!gc_only && get_valid_blocks(sbi, segno, true)) {
+ err = -EAGAIN;
+ goto out;
+ }
}
+ if (gc_only)
+ goto out;

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
- return err;
+ goto out;

next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
if (next_inuse <= end) {
@@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
next_inuse);
f2fs_bug_on(sbi, 1);
}
+out:
+ MAIN_SECS(sbi) += secs;
return err;
}

@@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)

SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
+ MAIN_SECS(sbi) += secs;
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
@@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
- int gc_mode, gc_type;
int err = 0;
__u32 rem;

@@ -1538,10 +1560,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
-
shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
+
+ /* protect MAIN_SEC in free_segment_range */
+ f2fs_lock_op(sbi);
+ err = free_segment_range(sbi, secs, true);
+ f2fs_unlock_op(sbi);
+ if (err)
+ return err;
+
+ set_sbi_flag(sbi, SBI_IS_RESIZEFS);
+
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);
+
spin_lock(&sbi->stat_lock);
if (shrunk_blocks + valid_user_blocks(sbi) +
sbi->current_reserved_blocks + sbi->unusable_block_count +
@@ -1550,69 +1584,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
else
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
- if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
- return err;
- }
-
- mutex_lock(&sbi->resize_mutex);
- set_sbi_flag(sbi, SBI_IS_RESIZEFS);
-
- mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-
- MAIN_SECS(sbi) -= secs;
-
- for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
- if (SIT_I(sbi)->last_victim[gc_mode] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- SIT_I(sbi)->last_victim[gc_mode] = 0;
-
- for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
- if (sbi->next_victim_seg[gc_type] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-
- mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ if (err)
+ goto out_err;

- err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
- MAIN_SEGS(sbi) - 1);
+ err = free_segment_range(sbi, secs, false);
if (err)
- goto out;
+ goto recover_out;

update_sb_metadata(sbi, -secs);

err = f2fs_commit_super(sbi, false);
if (err) {
update_sb_metadata(sbi, secs);
- goto out;
+ goto recover_out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
-out:
+recover_out:
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");

- MAIN_SECS(sbi) += secs;
spin_lock(&sbi->stat_lock);
sbi->user_block_count += shrunk_blocks;
spin_unlock(&sbi->stat_lock);
}
+out_err:
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b83b17b54a0a6..1e7b1d21d0177 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 4d7d4c391879d..5d1a72001fdb4 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.0.292.g33ef6b2f38-goog

2020-04-07 02:35:33

by Chao Yu

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/4/4 1:27, Jaegeuk Kim wrote:
> On 04/03, Jaegeuk Kim wrote:
>> On 04/01, Sahitya Tummala wrote:
>>> Hi Jaegeuk,
>>>
>>> Got it.
>>> The diff below looks good to me.
>>> Would you like me to test it and put a patch for this?
>>
>> Sahitya, Chao,
>>
>> Could you please take a look at this patch and test intensively?
>>
>> Thanks,
>
> v2:
>
>>From 6bf7d5b227d466b0fe90d4957af29bd184fb646e Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <[email protected]>
> Date: Tue, 31 Mar 2020 11:43:07 -0700
> Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
>
> Sahitya raised an issue:
> - prevent meta updates while checkpoint is in progress
>
> allocate_segment_for_resize() can cause metapage updates if
> it requires to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 6 +-
> fs/f2fs/f2fs.h | 2 +-
> fs/f2fs/file.c | 5 +-
> fs/f2fs/gc.c | 107 +++++++++++++++++++-----------------
> fs/f2fs/super.c | 1 -
> include/trace/events/f2fs.h | 4 +-
> 6 files changed, 67 insertions(+), 58 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 852890b72d6ac..531995192b714 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> return 0;
> f2fs_warn(sbi, "Start checkpoint disabled!");
> }
> - mutex_lock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_lock(&sbi->cp_mutex);
>
> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> f2fs_update_time(sbi, CP_TIME);
> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> out:
> - mutex_unlock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_unlock(&sbi->cp_mutex);
> return err;
> }
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index be02a5cadd944..f9b2caa2135bd 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -193,6 +193,7 @@ enum {
> #define CP_DISCARD 0x00000010
> #define CP_TRIMMED 0x00000020
> #define CP_PAUSE 0x00000040
> +#define CP_RESIZE 0x00000080
>
> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> @@ -1421,7 +1422,6 @@ struct f2fs_sb_info {
> unsigned int segs_per_sec; /* segments per section */
> unsigned int secs_per_zone; /* sections per zone */
> unsigned int total_sections; /* total section count */
> - struct mutex resize_mutex; /* for resize exclusion */
> unsigned int total_node_count; /* total node block count */
> unsigned int total_valid_node_count; /* valid node block count */
> loff_t max_file_blocks; /* max block index of file */
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 257e61d0afffb..b4c12370bb3d6 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3305,7 +3305,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
> __u64 block_count;
> - int ret;
>
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -3317,9 +3316,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> sizeof(block_count)))
> return -EFAULT;
>
> - ret = f2fs_resize_fs(sbi, block_count);
> -
> - return ret;
> + return f2fs_resize_fs(sbi, block_count);
> }
>
> static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db0..46c75ecb64a2e 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
> }
>
> -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> - unsigned int end)
> +static int free_segment_range(struct f2fs_sb_info *sbi,
> + unsigned int secs, bool gc_only)
> {
> - int type;
> - unsigned int segno, next_inuse;
> + unsigned int segno, next_inuse, start, end;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> + int gc_mode, gc_type;
> int err = 0;
> + int type;
> +
> + /* Force block allocation for GC */
> + MAIN_SECS(sbi) -= secs;
> + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
> + end = MAIN_SEGS(sbi) - 1;
> +
> + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
> + SIT_I(sbi)->last_victim[gc_mode] = 0;
> +
> + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> + if (sbi->next_victim_seg[gc_type] >= start)
> + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
>
> /* Move out cursegs from the target range */
> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> @@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> };
>
> - down_write(&sbi->gc_lock);
> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> - up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);
>
> - if (get_valid_blocks(sbi, segno, true))
> - return -EAGAIN;
> + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
> + err = -EAGAIN;
> + goto out;
> + }
> }
> + if (gc_only)
> + goto out;
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err)
> - return err;
> + goto out;
>
> next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
> if (next_inuse <= end) {
> @@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> next_inuse);
> f2fs_bug_on(sbi, 1);
> }
> +out:
> + MAIN_SECS(sbi) -= secs;
> return err;
> }
>
> @@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
>
> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> + MAIN_SECS(sbi) += secs;
> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
> F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
> @@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> {
> __u64 old_block_count, shrunk_blocks;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> unsigned int secs;
> - int gc_mode, gc_type;
> int err = 0;
> __u32 rem;
>
> @@ -1538,10 +1560,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> -
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> +
> + /* protect MAIN_SEC in free_segment_range */
> + f2fs_lock_op(sbi);
> + err = free_segment_range(sbi, secs, true);

For this path, we break the rule that we need to hold gc_lock during
do_garbage_collect().

One other concern is that the granularity of lock_op is still too large.
To avoid a potential hang if it triggers heavy GC migration, how about
using a timeout mechanism in free_segment_range() like we did in
f2fs_disable_checkpoint()?

> + f2fs_unlock_op(sbi);
> + if (err)
> + return err;
> +
> + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> +
> + freeze_super(sbi->sb);
> + down_write(&sbi->gc_lock);
> + mutex_lock(&sbi->cp_mutex);
> +
> spin_lock(&sbi->stat_lock);
> if (shrunk_blocks + valid_user_blocks(sbi) +
> sbi->current_reserved_blocks + sbi->unusable_block_count +
> @@ -1550,69 +1584,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> else
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> - if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> - return err;
> - }
> -
> - mutex_lock(&sbi->resize_mutex);
> - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> -
> - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> -
> - MAIN_SECS(sbi) -= secs;
> -
> - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> - if (SIT_I(sbi)->last_victim[gc_mode] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - SIT_I(sbi)->last_victim[gc_mode] = 0;
> -
> - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> - if (sbi->next_victim_seg[gc_type] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> -
> - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> + if (err)
> + goto out_err;
>
> - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
> - MAIN_SEGS(sbi) - 1);
> + err = free_segment_range(sbi, secs, false);

Lock coverage is still large here. What about just checking the resize
condition with find_next_inuse(, end + 1, start)? If the migration has
finished, then call write_checkpoint(); otherwise, return -EAGAIN.

> if (err)
> - goto out;
> + goto recover_out;
>
> update_sb_metadata(sbi, -secs);
>
> err = f2fs_commit_super(sbi, false);
> if (err) {
> update_sb_metadata(sbi, secs);
> - goto out;
> + goto recover_out;
> }
>
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> - mutex_unlock(&sbi->cp_mutex);
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err) {
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, secs);
> - mutex_unlock(&sbi->cp_mutex);
> update_sb_metadata(sbi, secs);
> f2fs_commit_super(sbi, false);
> }
> -out:
> +recover_out:
> if (err) {
> set_sbi_flag(sbi, SBI_NEED_FSCK);
> f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
>
> - MAIN_SECS(sbi) += secs;
> spin_lock(&sbi->stat_lock);
> sbi->user_block_count += shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> }
> +out_err:
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> - mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> return err;
> }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index b83b17b54a0a6..1e7b1d21d0177 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> init_rwsem(&sbi->gc_lock);
> mutex_init(&sbi->writepages);
> mutex_init(&sbi->cp_mutex);
> - mutex_init(&sbi->resize_mutex);
> init_rwsem(&sbi->node_write);
> init_rwsem(&sbi->node_change);
>
> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> index 4d7d4c391879d..5d1a72001fdb4 100644
> --- a/include/trace/events/f2fs.h
> +++ b/include/trace/events/f2fs.h
> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> TRACE_DEFINE_ENUM(CP_DISCARD);
> TRACE_DEFINE_ENUM(CP_TRIMMED);
> TRACE_DEFINE_ENUM(CP_PAUSE);
> +TRACE_DEFINE_ENUM(CP_RESIZE);
>
> #define show_block_type(type) \
> __print_symbolic(type, \
> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> { CP_RECOVERY, "Recovery" }, \
> { CP_DISCARD, "Discard" }, \
> { CP_PAUSE, "Pause" }, \
> - { CP_TRIMMED, "Trimmed" })
> + { CP_TRIMMED, "Trimmed" }, \
> + { CP_RESIZE, "Resize" })
>
> #define show_fsync_cpreason(type) \
> __print_symbolic(type, \
>

2020-04-07 03:00:44

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/07, Chao Yu wrote:
> On 2020/4/4 1:27, Jaegeuk Kim wrote:
> > On 04/03, Jaegeuk Kim wrote:
> >> On 04/01, Sahitya Tummala wrote:
> >>> Hi Jaegeuk,
> >>>
> >>> Got it.
> >>> The diff below looks good to me.
> >>> Would you like me to test it and put a patch for this?
> >>
> >> Sahitya, Chao,
> >>
> >> Could you please take a look at this patch and test intensively?
> >>
> >> Thanks,
> >
> > v2:
> >
> >>From 6bf7d5b227d466b0fe90d4957af29bd184fb646e Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <[email protected]>
> > Date: Tue, 31 Mar 2020 11:43:07 -0700
> > Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
> >
> > Sahitya raised an issue:
> > - prevent meta updates while checkpoint is in progress
> >
> > allocate_segment_for_resize() can cause metapage updates if
> > it requires to change the current node/data segments for resizing.
> > Stop these meta updates when there is a checkpoint already
> > in progress to prevent inconsistent CP data.
> >
> > Signed-off-by: Sahitya Tummala <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/checkpoint.c | 6 +-
> > fs/f2fs/f2fs.h | 2 +-
> > fs/f2fs/file.c | 5 +-
> > fs/f2fs/gc.c | 107 +++++++++++++++++++-----------------
> > fs/f2fs/super.c | 1 -
> > include/trace/events/f2fs.h | 4 +-
> > 6 files changed, 67 insertions(+), 58 deletions(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 852890b72d6ac..531995192b714 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> > return 0;
> > f2fs_warn(sbi, "Start checkpoint disabled!");
> > }
> > - mutex_lock(&sbi->cp_mutex);
> > + if (cpc->reason != CP_RESIZE)
> > + mutex_lock(&sbi->cp_mutex);
> >
> > if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> > ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> > @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> > f2fs_update_time(sbi, CP_TIME);
> > trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> > out:
> > - mutex_unlock(&sbi->cp_mutex);
> > + if (cpc->reason != CP_RESIZE)
> > + mutex_unlock(&sbi->cp_mutex);
> > return err;
> > }
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index be02a5cadd944..f9b2caa2135bd 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -193,6 +193,7 @@ enum {
> > #define CP_DISCARD 0x00000010
> > #define CP_TRIMMED 0x00000020
> > #define CP_PAUSE 0x00000040
> > +#define CP_RESIZE 0x00000080
> >
> > #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> > #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> > @@ -1421,7 +1422,6 @@ struct f2fs_sb_info {
> > unsigned int segs_per_sec; /* segments per section */
> > unsigned int secs_per_zone; /* sections per zone */
> > unsigned int total_sections; /* total section count */
> > - struct mutex resize_mutex; /* for resize exclusion */
> > unsigned int total_node_count; /* total node block count */
> > unsigned int total_valid_node_count; /* valid node block count */
> > loff_t max_file_blocks; /* max block index of file */
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 257e61d0afffb..b4c12370bb3d6 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -3305,7 +3305,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> > {
> > struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
> > __u64 block_count;
> > - int ret;
> >
> > if (!capable(CAP_SYS_ADMIN))
> > return -EPERM;
> > @@ -3317,9 +3316,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> > sizeof(block_count)))
> > return -EFAULT;
> >
> > - ret = f2fs_resize_fs(sbi, block_count);
> > -
> > - return ret;
> > + return f2fs_resize_fs(sbi, block_count);
> > }
> >
> > static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 26248c8936db0..46c75ecb64a2e 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> > GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
> > }
> >
> > -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > - unsigned int end)
> > +static int free_segment_range(struct f2fs_sb_info *sbi,
> > + unsigned int secs, bool gc_only)
> > {
> > - int type;
> > - unsigned int segno, next_inuse;
> > + unsigned int segno, next_inuse, start, end;
> > + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> > + int gc_mode, gc_type;
> > int err = 0;
> > + int type;
> > +
> > + /* Force block allocation for GC */
> > + MAIN_SECS(sbi) -= secs;
> > + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
> > + end = MAIN_SEGS(sbi) - 1;
> > +
> > + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> > + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> > + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
> > + SIT_I(sbi)->last_victim[gc_mode] = 0;
> > +
> > + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> > + if (sbi->next_victim_seg[gc_type] >= start)
> > + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> > + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> >
> > /* Move out cursegs from the target range */
> > for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> > @@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> > };
> >
> > - down_write(&sbi->gc_lock);
> > do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> > - up_write(&sbi->gc_lock);
> > put_gc_inode(&gc_list);
> >
> > - if (get_valid_blocks(sbi, segno, true))
> > - return -EAGAIN;
> > + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
> > + err = -EAGAIN;
> > + goto out;
> > + }
> > }
> > + if (gc_only)
> > + goto out;
> >
> > - err = f2fs_sync_fs(sbi->sb, 1);
> > + err = f2fs_write_checkpoint(sbi, &cpc);
> > if (err)
> > - return err;
> > + goto out;
> >
> > next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
> > if (next_inuse <= end) {
> > @@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> > next_inuse);
> > f2fs_bug_on(sbi, 1);
> > }
> > +out:
> > + MAIN_SECS(sbi) -= secs;
> > return err;
> > }
> >
> > @@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> >
> > SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> > MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> > + MAIN_SECS(sbi) += secs;
> > FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> > FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
> > F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
> > @@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> > int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > {
> > __u64 old_block_count, shrunk_blocks;
> > + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> > unsigned int secs;
> > - int gc_mode, gc_type;
> > int err = 0;
> > __u32 rem;
> >
> > @@ -1538,10 +1560,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > return -EINVAL;
> > }
> >
> > - freeze_bdev(sbi->sb->s_bdev);
> > -
> > shrunk_blocks = old_block_count - block_count;
> > secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> > +
> > + /* protect MAIN_SEC in free_segment_range */
> > + f2fs_lock_op(sbi);
> > + err = free_segment_range(sbi, secs, true);
>
> For this path, we break the rule that we need to hold gc_lock during
> do_garbage_collect().

I don't get the point.
In free_segment_range(), gc_lock is held before/after do_garbage_collect().

>
> One other concern is that the granularity of lock_op is still too large.
> To avoid a potential hang if it triggers heavy GC migration, how about
> using a timeout mechanism in free_segment_range() like we did in
> f2fs_disable_checkpoint()?

We can do first round GC without f2fs_lock_op().

>
> > + f2fs_unlock_op(sbi);
> > + if (err)
> > + return err;
> > +
> > + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > +
> > + freeze_super(sbi->sb);
> > + down_write(&sbi->gc_lock);
> > + mutex_lock(&sbi->cp_mutex);
> > +
> > spin_lock(&sbi->stat_lock);
> > if (shrunk_blocks + valid_user_blocks(sbi) +
> > sbi->current_reserved_blocks + sbi->unusable_block_count +
> > @@ -1550,69 +1584,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> > else
> > sbi->user_block_count -= shrunk_blocks;
> > spin_unlock(&sbi->stat_lock);
> > - if (err) {
> > - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> > - return err;
> > - }
> > -
> > - mutex_lock(&sbi->resize_mutex);
> > - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > -
> > - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> > -
> > - MAIN_SECS(sbi) -= secs;
> > -
> > - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> > - if (SIT_I(sbi)->last_victim[gc_mode] >=
> > - MAIN_SECS(sbi) * sbi->segs_per_sec)
> > - SIT_I(sbi)->last_victim[gc_mode] = 0;
> > -
> > - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> > - if (sbi->next_victim_seg[gc_type] >=
> > - MAIN_SECS(sbi) * sbi->segs_per_sec)
> > - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> > -
> > - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> > + if (err)
> > + goto out_err;
> >
> > - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
> > - MAIN_SEGS(sbi) - 1);
> > + err = free_segment_range(sbi, secs, false);
>
> Lock coverage is still large here. What about just checking the resize
> condition with find_next_inuse(, end + 1, start)? If the migration has
> finished, then call write_checkpoint(); otherwise, return -EAGAIN.

We did GC above, so how much time do you expect this to take? Basically,
I hesitate to return EAGAIN, since there's no context with respect to how
many times the user needs to retry to get success. Disabling checkpoint has
some ways to get a sense of that, though. Nevertheless, if we want to return
EAGAIN, it'd be better to give the number of bytes that still need to be
migrated?

>
> > if (err)
> > - goto out;
> > + goto recover_out;
> >
> > update_sb_metadata(sbi, -secs);
> >
> > err = f2fs_commit_super(sbi, false);
> > if (err) {
> > update_sb_metadata(sbi, secs);
> > - goto out;
> > + goto recover_out;
> > }
> >
> > - mutex_lock(&sbi->cp_mutex);
> > update_fs_metadata(sbi, -secs);
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > set_sbi_flag(sbi, SBI_IS_DIRTY);
> > - mutex_unlock(&sbi->cp_mutex);
> >
> > - err = f2fs_sync_fs(sbi->sb, 1);
> > + err = f2fs_write_checkpoint(sbi, &cpc);
> > if (err) {
> > - mutex_lock(&sbi->cp_mutex);
> > update_fs_metadata(sbi, secs);
> > - mutex_unlock(&sbi->cp_mutex);
> > update_sb_metadata(sbi, secs);
> > f2fs_commit_super(sbi, false);
> > }
> > -out:
> > +recover_out:
> > if (err) {
> > set_sbi_flag(sbi, SBI_NEED_FSCK);
> > f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
> >
> > - MAIN_SECS(sbi) += secs;
> > spin_lock(&sbi->stat_lock);
> > sbi->user_block_count += shrunk_blocks;
> > spin_unlock(&sbi->stat_lock);
> > }
> > +out_err:
> > + mutex_unlock(&sbi->cp_mutex);
> > + up_write(&sbi->gc_lock);
> > + thaw_super(sbi->sb);
> > clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> > - mutex_unlock(&sbi->resize_mutex);
> > - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> > return err;
> > }
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index b83b17b54a0a6..1e7b1d21d0177 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> > init_rwsem(&sbi->gc_lock);
> > mutex_init(&sbi->writepages);
> > mutex_init(&sbi->cp_mutex);
> > - mutex_init(&sbi->resize_mutex);
> > init_rwsem(&sbi->node_write);
> > init_rwsem(&sbi->node_change);
> >
> > diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> > index 4d7d4c391879d..5d1a72001fdb4 100644
> > --- a/include/trace/events/f2fs.h
> > +++ b/include/trace/events/f2fs.h
> > @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> > TRACE_DEFINE_ENUM(CP_DISCARD);
> > TRACE_DEFINE_ENUM(CP_TRIMMED);
> > TRACE_DEFINE_ENUM(CP_PAUSE);
> > +TRACE_DEFINE_ENUM(CP_RESIZE);
> >
> > #define show_block_type(type) \
> > __print_symbolic(type, \
> > @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> > { CP_RECOVERY, "Recovery" }, \
> > { CP_DISCARD, "Discard" }, \
> > { CP_PAUSE, "Pause" }, \
> > - { CP_TRIMMED, "Trimmed" })
> > + { CP_TRIMMED, "Trimmed" }, \
> > + { CP_RESIZE, "Resize" })
> >
> > #define show_fsync_cpreason(type) \
> > __print_symbolic(type, \
> >

2020-04-07 03:23:19

by Chao Yu

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/4/7 10:58, Jaegeuk Kim wrote:
> On 04/07, Chao Yu wrote:
>> On 2020/4/4 1:27, Jaegeuk Kim wrote:
>>> On 04/03, Jaegeuk Kim wrote:
>>>> On 04/01, Sahitya Tummala wrote:
>>>>> Hi Jaegeuk,
>>>>>
>>>>> Got it.
>>>>> The diff below looks good to me.
>>>>> Would you like me to test it and put a patch for this?
>>>>
>>>> Sahitya, Chao,
>>>>
>>>> Could you please take a look at this patch and test intensively?
>>>>
>>>> Thanks,
>>>
>>> v2:
>>>
>>> >From 6bf7d5b227d466b0fe90d4957af29bd184fb646e Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <[email protected]>
>>> Date: Tue, 31 Mar 2020 11:43:07 -0700
>>> Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
>>>
>>> Sahitya raised an issue:
>>> - prevent meta updates while checkpoint is in progress
>>>
>>> allocate_segment_for_resize() can cause metapage updates if
>>> it requires to change the current node/data segments for resizing.
>>> Stop these meta updates when there is a checkpoint already
>>> in progress to prevent inconsistent CP data.
>>>
>>> Signed-off-by: Sahitya Tummala <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/checkpoint.c | 6 +-
>>> fs/f2fs/f2fs.h | 2 +-
>>> fs/f2fs/file.c | 5 +-
>>> fs/f2fs/gc.c | 107 +++++++++++++++++++-----------------
>>> fs/f2fs/super.c | 1 -
>>> include/trace/events/f2fs.h | 4 +-
>>> 6 files changed, 67 insertions(+), 58 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index 852890b72d6ac..531995192b714 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>> return 0;
>>> f2fs_warn(sbi, "Start checkpoint disabled!");
>>> }
>>> - mutex_lock(&sbi->cp_mutex);
>>> + if (cpc->reason != CP_RESIZE)
>>> + mutex_lock(&sbi->cp_mutex);
>>>
>>> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
>>> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
>>> @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>> f2fs_update_time(sbi, CP_TIME);
>>> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
>>> out:
>>> - mutex_unlock(&sbi->cp_mutex);
>>> + if (cpc->reason != CP_RESIZE)
>>> + mutex_unlock(&sbi->cp_mutex);
>>> return err;
>>> }
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index be02a5cadd944..f9b2caa2135bd 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -193,6 +193,7 @@ enum {
>>> #define CP_DISCARD 0x00000010
>>> #define CP_TRIMMED 0x00000020
>>> #define CP_PAUSE 0x00000040
>>> +#define CP_RESIZE 0x00000080
>>>
>>> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
>>> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
>>> @@ -1421,7 +1422,6 @@ struct f2fs_sb_info {
>>> unsigned int segs_per_sec; /* segments per section */
>>> unsigned int secs_per_zone; /* sections per zone */
>>> unsigned int total_sections; /* total section count */
>>> - struct mutex resize_mutex; /* for resize exclusion */
>>> unsigned int total_node_count; /* total node block count */
>>> unsigned int total_valid_node_count; /* valid node block count */
>>> loff_t max_file_blocks; /* max block index of file */
>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>> index 257e61d0afffb..b4c12370bb3d6 100644
>>> --- a/fs/f2fs/file.c
>>> +++ b/fs/f2fs/file.c
>>> @@ -3305,7 +3305,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
>>> {
>>> struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
>>> __u64 block_count;
>>> - int ret;
>>>
>>> if (!capable(CAP_SYS_ADMIN))
>>> return -EPERM;
>>> @@ -3317,9 +3316,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
>>> sizeof(block_count)))
>>> return -EFAULT;
>>>
>>> - ret = f2fs_resize_fs(sbi, block_count);
>>> -
>>> - return ret;
>>> + return f2fs_resize_fs(sbi, block_count);
>>> }
>>>
>>> static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 26248c8936db0..46c75ecb64a2e 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
>>> GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
>>> }
>>>
>>> -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
>>> - unsigned int end)
>>> +static int free_segment_range(struct f2fs_sb_info *sbi,
>>> + unsigned int secs, bool gc_only)
>>> {
>>> - int type;
>>> - unsigned int segno, next_inuse;
>>> + unsigned int segno, next_inuse, start, end;
>>> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
>>> + int gc_mode, gc_type;
>>> int err = 0;
>>> + int type;
>>> +
>>> + /* Force block allocation for GC */
>>> + MAIN_SECS(sbi) -= secs;
>>> + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
>>> + end = MAIN_SEGS(sbi) - 1;
>>> +
>>> + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
>>> + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
>>> + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
>>> + SIT_I(sbi)->last_victim[gc_mode] = 0;
>>> +
>>> + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
>>> + if (sbi->next_victim_seg[gc_type] >= start)
>>> + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
>>> + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
>>>
>>> /* Move out cursegs from the target range */
>>> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
>>> @@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
>>> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
>>> };
>>>
>>> - down_write(&sbi->gc_lock);
>>> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
>>> - up_write(&sbi->gc_lock);
>>> put_gc_inode(&gc_list);
>>>
>>> - if (get_valid_blocks(sbi, segno, true))
>>> - return -EAGAIN;
>>> + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
>>> + err = -EAGAIN;
>>> + goto out;
>>> + }
>>> }
>>> + if (gc_only)
>>> + goto out;
>>>
>>> - err = f2fs_sync_fs(sbi->sb, 1);
>>> + err = f2fs_write_checkpoint(sbi, &cpc);
>>> if (err)
>>> - return err;
>>> + goto out;
>>>
>>> next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
>>> if (next_inuse <= end) {
>>> @@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
>>> next_inuse);
>>> f2fs_bug_on(sbi, 1);
>>> }
>>> +out:
>>> + MAIN_SECS(sbi) -= secs;
>>> return err;
>>> }
>>>
>>> @@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
>>>
>>> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
>>> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
>>> + MAIN_SECS(sbi) += secs;
>>> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
>>> FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
>>> F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
>>> @@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
>>> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>>> {
>>> __u64 old_block_count, shrunk_blocks;
>>> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
>>> unsigned int secs;
>>> - int gc_mode, gc_type;
>>> int err = 0;
>>> __u32 rem;
>>>
>>> @@ -1538,10 +1560,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>>> return -EINVAL;
>>> }
>>>
>>> - freeze_bdev(sbi->sb->s_bdev);
>>> -
>>> shrunk_blocks = old_block_count - block_count;
>>> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
>>> +
>>> + /* protect MAIN_SEC in free_segment_range */
>>> + f2fs_lock_op(sbi);
>>> + err = free_segment_range(sbi, secs, true);
>>
>> For this path, we break the rule that we need to hold gc_lock during
>> do_garbage_collect().
>
> I don't get the point.
> In free_segment_range(), gc_lock is held before/after do_garbage_collect().

@@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

They were removed, right?

>
>>
>> One other concern is that the granularity of lock_op is still too large;
>> to avoid a potential hang if it triggers heavy GC migration, how about
>> using a timeout mechanism in free_segment_range() like we did in
>> f2fs_disable_checkpoint()?
>
> We can do first round GC without f2fs_lock_op().

Yup, that makes sense to me.

>
>>
>>> + f2fs_unlock_op(sbi);
>>> + if (err)
>>> + return err;
>>> +
>>> + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
>>> +
>>> + freeze_super(sbi->sb);
>>> + down_write(&sbi->gc_lock);
>>> + mutex_lock(&sbi->cp_mutex);
>>> +
>>> spin_lock(&sbi->stat_lock);
>>> if (shrunk_blocks + valid_user_blocks(sbi) +
>>> sbi->current_reserved_blocks + sbi->unusable_block_count +
>>> @@ -1550,69 +1584,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>>> else
>>> sbi->user_block_count -= shrunk_blocks;
>>> spin_unlock(&sbi->stat_lock);
>>> - if (err) {
>>> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
>>> - return err;
>>> - }
>>> -
>>> - mutex_lock(&sbi->resize_mutex);
>>> - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
>>> -
>>> - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
>>> -
>>> - MAIN_SECS(sbi) -= secs;
>>> -
>>> - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
>>> - if (SIT_I(sbi)->last_victim[gc_mode] >=
>>> - MAIN_SECS(sbi) * sbi->segs_per_sec)
>>> - SIT_I(sbi)->last_victim[gc_mode] = 0;
>>> -
>>> - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
>>> - if (sbi->next_victim_seg[gc_type] >=
>>> - MAIN_SECS(sbi) * sbi->segs_per_sec)
>>> - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
>>> -
>>> - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
>>> + if (err)
>>> + goto out_err;
>>>
>>> - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
>>> - MAIN_SEGS(sbi) - 1);
>>> + err = free_segment_range(sbi, secs, false);
>>
>> Lock coverage is still large here; what about just checking the resize condition
>> with find_next_inuse(, end + 1, start)? If the migration has finished, call
>> write_checkpoint(); otherwise, return -EAGAIN.
>
> We did GC above, so how much time do you expect to complete here? Basically

Just to consider a corner case here: in between the round 1 and round 2
GC, a fair amount of data writes may land in the tail of the device.

> I hesitate to return EAGAIN, since there's no context with respect to how
> many times the user needs to retry to succeed. Disabling checkpoint has

I think users can accept EAGAIN if we clearly describe that the interface
may fail due to concurrent IO on the device; if the administrator wants the
resize to succeed, concurrent operations on the device should be avoided.

> some ways to get a sense tho. Nevertheless, if we want to return EAGAIN,
> it'd be better to give the # of bytes that still need to be migrated?

Hmm.. I don't think there is such a number, as resize should be atomic.

Thanks,

>
>>
>>> if (err)
>>> - goto out;
>>> + goto recover_out;
>>>
>>> update_sb_metadata(sbi, -secs);
>>>
>>> err = f2fs_commit_super(sbi, false);
>>> if (err) {
>>> update_sb_metadata(sbi, secs);
>>> - goto out;
>>> + goto recover_out;
>>> }
>>>
>>> - mutex_lock(&sbi->cp_mutex);
>>> update_fs_metadata(sbi, -secs);
>>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
>>> set_sbi_flag(sbi, SBI_IS_DIRTY);
>>> - mutex_unlock(&sbi->cp_mutex);
>>>
>>> - err = f2fs_sync_fs(sbi->sb, 1);
>>> + err = f2fs_write_checkpoint(sbi, &cpc);
>>> if (err) {
>>> - mutex_lock(&sbi->cp_mutex);
>>> update_fs_metadata(sbi, secs);
>>> - mutex_unlock(&sbi->cp_mutex);
>>> update_sb_metadata(sbi, secs);
>>> f2fs_commit_super(sbi, false);
>>> }
>>> -out:
>>> +recover_out:
>>> if (err) {
>>> set_sbi_flag(sbi, SBI_NEED_FSCK);
>>> f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
>>>
>>> - MAIN_SECS(sbi) += secs;
>>> spin_lock(&sbi->stat_lock);
>>> sbi->user_block_count += shrunk_blocks;
>>> spin_unlock(&sbi->stat_lock);
>>> }
>>> +out_err:
>>> + mutex_unlock(&sbi->cp_mutex);
>>> + up_write(&sbi->gc_lock);
>>> + thaw_super(sbi->sb);
>>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
>>> - mutex_unlock(&sbi->resize_mutex);
>>> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
>>> return err;
>>> }
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index b83b17b54a0a6..1e7b1d21d0177 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>>> init_rwsem(&sbi->gc_lock);
>>> mutex_init(&sbi->writepages);
>>> mutex_init(&sbi->cp_mutex);
>>> - mutex_init(&sbi->resize_mutex);
>>> init_rwsem(&sbi->node_write);
>>> init_rwsem(&sbi->node_change);
>>>
>>> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
>>> index 4d7d4c391879d..5d1a72001fdb4 100644
>>> --- a/include/trace/events/f2fs.h
>>> +++ b/include/trace/events/f2fs.h
>>> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
>>> TRACE_DEFINE_ENUM(CP_DISCARD);
>>> TRACE_DEFINE_ENUM(CP_TRIMMED);
>>> TRACE_DEFINE_ENUM(CP_PAUSE);
>>> +TRACE_DEFINE_ENUM(CP_RESIZE);
>>>
>>> #define show_block_type(type) \
>>> __print_symbolic(type, \
>>> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
>>> { CP_RECOVERY, "Recovery" }, \
>>> { CP_DISCARD, "Discard" }, \
>>> { CP_PAUSE, "Pause" }, \
>>> - { CP_TRIMMED, "Trimmed" })
>>> + { CP_TRIMMED, "Trimmed" }, \
>>> + { CP_RESIZE, "Resize" })
>>>
>>> #define show_fsync_cpreason(type) \
>>> __print_symbolic(type, \
>>>
> .
>
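
The victim-state reset that this version moves into free_segment_range() can be sketched outside the kernel. The model below is a simplified user-space sketch, not f2fs code: the structure and constants are stand-ins for SIT_I(sbi)->last_victim and sbi->next_victim_seg, and the seglist_lock is omitted. The point it illustrates is that once MAIN_SECS is shrunk, any cached GC victim at or past the new boundary must be invalidated before the range is migrated out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GC_POLICY 2
#define NR_GC_TYPE    2          /* BG_GC, FG_GC */
#define NULL_SEGNO    ((unsigned int)~0)

struct gc_state {
	unsigned int last_victim[MAX_GC_POLICY];   /* per-policy search cursor */
	unsigned int next_victim_seg[NR_GC_TYPE];  /* pinned victim segments */
};

/*
 * Model of the reset done under DIRTY_I(sbi)->seglist_lock in
 * free_segment_range(): after MAIN_SECS has been shrunk, "start" is the
 * first segment of the range being freed, and any cached victim that
 * points at or beyond "start" must be dropped so GC cannot keep
 * allocating into the segments being resized away.
 */
static void reset_victims(struct gc_state *gc, unsigned int start)
{
	for (int mode = 0; mode < MAX_GC_POLICY; mode++)
		if (gc->last_victim[mode] >= start)
			gc->last_victim[mode] = 0;

	for (int type = 0; type < NR_GC_TYPE; type++)
		if (gc->next_victim_seg[type] >= start)
			gc->next_victim_seg[type] = NULL_SEGNO;
}
```

Victims below the boundary are untouched; those inside the doomed range are reset to their "no victim" values, matching the two loops in the patch.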

2020-04-09 02:28:03

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/07, Chao Yu wrote:
> On 2020/4/7 10:58, Jaegeuk Kim wrote:
> > On 04/07, Chao Yu wrote:
> >> On 2020/4/4 1:27, Jaegeuk Kim wrote:
> >>> On 04/03, Jaegeuk Kim wrote:
> >>>> On 04/01, Sahitya Tummala wrote:
> >>>>> Hi Jaegeuk,
> >>>>>
> >>>>> Got it.
> >>>>> The diff below looks good to me.
> >>>>> Would you like me to test it and put a patch for this?
> >>>>
> >>>> Sahitya, Chao,
> >>>>
> >>>> Could you please take a look at this patch and test intensively?
> >>>>
> >>>> Thanks,
> >>>
> >>> v2:
> >>>
> >>> >From 6bf7d5b227d466b0fe90d4957af29bd184fb646e Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <[email protected]>
> >>> Date: Tue, 31 Mar 2020 11:43:07 -0700
> >>> Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
> >>>
> >>> Sahitya raised an issue:
> >>> - prevent meta updates while checkpoint is in progress
> >>>
> >>> allocate_segment_for_resize() can cause metapage updates if
> >>> it requires to change the current node/data segments for resizing.
> >>> Stop these meta updates when there is a checkpoint already
> >>> in progress to prevent inconsistent CP data.
> >>>
> >>> Signed-off-by: Sahitya Tummala <[email protected]>
> >>> Signed-off-by: Jaegeuk Kim <[email protected]>
> >>> ---
> >>> fs/f2fs/checkpoint.c | 6 +-
> >>> fs/f2fs/f2fs.h | 2 +-
> >>> fs/f2fs/file.c | 5 +-
> >>> fs/f2fs/gc.c | 107 +++++++++++++++++++-----------------
> >>> fs/f2fs/super.c | 1 -
> >>> include/trace/events/f2fs.h | 4 +-
> >>> 6 files changed, 67 insertions(+), 58 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index 852890b72d6ac..531995192b714 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -1553,7 +1553,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >>> return 0;
> >>> f2fs_warn(sbi, "Start checkpoint disabled!");
> >>> }
> >>> - mutex_lock(&sbi->cp_mutex);
> >>> + if (cpc->reason != CP_RESIZE)
> >>> + mutex_lock(&sbi->cp_mutex);
> >>>
> >>> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> >>> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> >>> @@ -1622,7 +1623,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >>> f2fs_update_time(sbi, CP_TIME);
> >>> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> >>> out:
> >>> - mutex_unlock(&sbi->cp_mutex);
> >>> + if (cpc->reason != CP_RESIZE)
> >>> + mutex_unlock(&sbi->cp_mutex);
> >>> return err;
> >>> }
> >>>
> >>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >>> index be02a5cadd944..f9b2caa2135bd 100644
> >>> --- a/fs/f2fs/f2fs.h
> >>> +++ b/fs/f2fs/f2fs.h
> >>> @@ -193,6 +193,7 @@ enum {
> >>> #define CP_DISCARD 0x00000010
> >>> #define CP_TRIMMED 0x00000020
> >>> #define CP_PAUSE 0x00000040
> >>> +#define CP_RESIZE 0x00000080
> >>>
> >>> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> >>> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> >>> @@ -1421,7 +1422,6 @@ struct f2fs_sb_info {
> >>> unsigned int segs_per_sec; /* segments per section */
> >>> unsigned int secs_per_zone; /* sections per zone */
> >>> unsigned int total_sections; /* total section count */
> >>> - struct mutex resize_mutex; /* for resize exclusion */
> >>> unsigned int total_node_count; /* total node block count */
> >>> unsigned int total_valid_node_count; /* valid node block count */
> >>> loff_t max_file_blocks; /* max block index of file */
> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >>> index 257e61d0afffb..b4c12370bb3d6 100644
> >>> --- a/fs/f2fs/file.c
> >>> +++ b/fs/f2fs/file.c
> >>> @@ -3305,7 +3305,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> >>> {
> >>> struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
> >>> __u64 block_count;
> >>> - int ret;
> >>>
> >>> if (!capable(CAP_SYS_ADMIN))
> >>> return -EPERM;
> >>> @@ -3317,9 +3316,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> >>> sizeof(block_count)))
> >>> return -EFAULT;
> >>>
> >>> - ret = f2fs_resize_fs(sbi, block_count);
> >>> -
> >>> - return ret;
> >>> + return f2fs_resize_fs(sbi, block_count);
> >>> }
> >>>
> >>> static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> >>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> >>> index 26248c8936db0..46c75ecb64a2e 100644
> >>> --- a/fs/f2fs/gc.c
> >>> +++ b/fs/f2fs/gc.c
> >>> @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> >>> GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
> >>> }
> >>>
> >>> -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> >>> - unsigned int end)
> >>> +static int free_segment_range(struct f2fs_sb_info *sbi,
> >>> + unsigned int secs, bool gc_only)
> >>> {
> >>> - int type;
> >>> - unsigned int segno, next_inuse;
> >>> + unsigned int segno, next_inuse, start, end;
> >>> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> >>> + int gc_mode, gc_type;
> >>> int err = 0;
> >>> + int type;
> >>> +
> >>> + /* Force block allocation for GC */
> >>> + MAIN_SECS(sbi) -= secs;
> >>> + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
> >>> + end = MAIN_SEGS(sbi) - 1;
> >>> +
> >>> + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> >>> + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> >>> + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
> >>> + SIT_I(sbi)->last_victim[gc_mode] = 0;
> >>> +
> >>> + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> >>> + if (sbi->next_victim_seg[gc_type] >= start)
> >>> + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> >>> + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> >>>
> >>> /* Move out cursegs from the target range */
> >>> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> >>> @@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> >>> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> >>> };
> >>>
> >>> - down_write(&sbi->gc_lock);
> >>> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> >>> - up_write(&sbi->gc_lock);
> >>> put_gc_inode(&gc_list);
> >>>
> >>> - if (get_valid_blocks(sbi, segno, true))
> >>> - return -EAGAIN;
> >>> + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
> >>> + err = -EAGAIN;
> >>> + goto out;
> >>> + }
> >>> }
> >>> + if (gc_only)
> >>> + goto out;
> >>>
> >>> - err = f2fs_sync_fs(sbi->sb, 1);
> >>> + err = f2fs_write_checkpoint(sbi, &cpc);
> >>> if (err)
> >>> - return err;
> >>> + goto out;
> >>>
> >>> next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
> >>> if (next_inuse <= end) {
> >>> @@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> >>> next_inuse);
> >>> f2fs_bug_on(sbi, 1);
> >>> }
> >>> +out:
> >>> + MAIN_SECS(sbi) += secs;
> >>> return err;
> >>> }
> >>>
> >>> @@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> >>>
> >>> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> >>> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> >>> + MAIN_SECS(sbi) += secs;
> >>> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> >>> FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
> >>> F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
> >>> @@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> >>> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> >>> {
> >>> __u64 old_block_count, shrunk_blocks;
> >>> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> >>> unsigned int secs;
> >>> - int gc_mode, gc_type;
> >>> int err = 0;
> >>> __u32 rem;
> >>>
> >>> @@ -1538,10 +1560,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> >>> return -EINVAL;
> >>> }
> >>>
> >>> - freeze_bdev(sbi->sb->s_bdev);
> >>> -
> >>> shrunk_blocks = old_block_count - block_count;
> >>> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> >>> +
> >>> + /* protect MAIN_SEC in free_segment_range */
> >>> + f2fs_lock_op(sbi);
> >>> + err = free_segment_range(sbi, secs, true);
> >>
> >> For this path, we break the rule that we need to hold gc_lock during
> >> do_garbage_collect().
> >
> > I don't get the point.
> > In free_segment_range(), gc_lock is held before/after do_garbage_collect().
>
> @@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> };
>
> - down_write(&sbi->gc_lock);
> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> - up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);
>
> They were removed, right?

Ah, I messed up the patches. Yes, it seems I removed it.

>
> >
> >>
> >> One other concern is that the granularity of lock_op is still too large;
> >> to avoid a potential hang if it triggers heavy GC migration, how about
> >> using a timeout mechanism in free_segment_range() like we did in
> >> f2fs_disable_checkpoint()?
> >
> > We can do first round GC without f2fs_lock_op().
>
> Yup, that makes sense to me.
>
> >
> >>
> >>> + f2fs_unlock_op(sbi);
> >>> + if (err)
> >>> + return err;
> >>> +
> >>> + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> >>> +
> >>> + freeze_super(sbi->sb);
> >>> + down_write(&sbi->gc_lock);
> >>> + mutex_lock(&sbi->cp_mutex);
> >>> +
> >>> spin_lock(&sbi->stat_lock);
> >>> if (shrunk_blocks + valid_user_blocks(sbi) +
> >>> sbi->current_reserved_blocks + sbi->unusable_block_count +
> >>> @@ -1550,69 +1584,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> >>> else
> >>> sbi->user_block_count -= shrunk_blocks;
> >>> spin_unlock(&sbi->stat_lock);
> >>> - if (err) {
> >>> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> >>> - return err;
> >>> - }
> >>> -
> >>> - mutex_lock(&sbi->resize_mutex);
> >>> - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> >>> -
> >>> - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> >>> -
> >>> - MAIN_SECS(sbi) -= secs;
> >>> -
> >>> - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> >>> - if (SIT_I(sbi)->last_victim[gc_mode] >=
> >>> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> >>> - SIT_I(sbi)->last_victim[gc_mode] = 0;
> >>> -
> >>> - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> >>> - if (sbi->next_victim_seg[gc_type] >=
> >>> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> >>> - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> >>> -
> >>> - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> >>> + if (err)
> >>> + goto out_err;
> >>>
> >>> - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
> >>> - MAIN_SEGS(sbi) - 1);
> >>> + err = free_segment_range(sbi, secs, false);
> >>
> >> Lock coverage is still large here; what about just checking the resize condition
> >> with find_next_inuse(, end + 1, start)? If the migration has finished, call
> >> write_checkpoint(); otherwise, return -EAGAIN.
> >
> > We did GC above, so how much time do you expect to complete here? Basically
>
> Just to consider a corner case here: in between the round 1 and round 2
> GC, a fair amount of data writes may land in the tail of the device.
>
> > I hesitate to return EAGAIN, since there's no context with respect to how
> > many times the user needs to retry to succeed. Disabling checkpoint has
>
> I think users can accept EAGAIN if we clearly describe that the interface
> may fail due to concurrent IO on the device; if the administrator wants the
> resize to succeed, concurrent operations on the device should be avoided.
>
> > some ways to get a sense tho. Nevertheless, if we want to return EAGAIN,
> > it'd be better to give the # of bytes that still need to be migrated?
>
> Hmm.. I don't think there is such a number, as resize should be atomic.

I thought we could return roughly the # of dirty segments?

>
> Thanks,
>
> >
> >>
> >>> if (err)
> >>> - goto out;
> >>> + goto recover_out;
> >>>
> >>> update_sb_metadata(sbi, -secs);
> >>>
> >>> err = f2fs_commit_super(sbi, false);
> >>> if (err) {
> >>> update_sb_metadata(sbi, secs);
> >>> - goto out;
> >>> + goto recover_out;
> >>> }
> >>>
> >>> - mutex_lock(&sbi->cp_mutex);
> >>> update_fs_metadata(sbi, -secs);
> >>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> >>> set_sbi_flag(sbi, SBI_IS_DIRTY);
> >>> - mutex_unlock(&sbi->cp_mutex);
> >>>
> >>> - err = f2fs_sync_fs(sbi->sb, 1);
> >>> + err = f2fs_write_checkpoint(sbi, &cpc);
> >>> if (err) {
> >>> - mutex_lock(&sbi->cp_mutex);
> >>> update_fs_metadata(sbi, secs);
> >>> - mutex_unlock(&sbi->cp_mutex);
> >>> update_sb_metadata(sbi, secs);
> >>> f2fs_commit_super(sbi, false);
> >>> }
> >>> -out:
> >>> +recover_out:
> >>> if (err) {
> >>> set_sbi_flag(sbi, SBI_NEED_FSCK);
> >>> f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
> >>>
> >>> - MAIN_SECS(sbi) += secs;
> >>> spin_lock(&sbi->stat_lock);
> >>> sbi->user_block_count += shrunk_blocks;
> >>> spin_unlock(&sbi->stat_lock);
> >>> }
> >>> +out_err:
> >>> + mutex_unlock(&sbi->cp_mutex);
> >>> + up_write(&sbi->gc_lock);
> >>> + thaw_super(sbi->sb);
> >>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> >>> - mutex_unlock(&sbi->resize_mutex);
> >>> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> >>> return err;
> >>> }
> >>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >>> index b83b17b54a0a6..1e7b1d21d0177 100644
> >>> --- a/fs/f2fs/super.c
> >>> +++ b/fs/f2fs/super.c
> >>> @@ -3412,7 +3412,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> >>> init_rwsem(&sbi->gc_lock);
> >>> mutex_init(&sbi->writepages);
> >>> mutex_init(&sbi->cp_mutex);
> >>> - mutex_init(&sbi->resize_mutex);
> >>> init_rwsem(&sbi->node_write);
> >>> init_rwsem(&sbi->node_change);
> >>>
> >>> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> >>> index 4d7d4c391879d..5d1a72001fdb4 100644
> >>> --- a/include/trace/events/f2fs.h
> >>> +++ b/include/trace/events/f2fs.h
> >>> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> >>> TRACE_DEFINE_ENUM(CP_DISCARD);
> >>> TRACE_DEFINE_ENUM(CP_TRIMMED);
> >>> TRACE_DEFINE_ENUM(CP_PAUSE);
> >>> +TRACE_DEFINE_ENUM(CP_RESIZE);
> >>>
> >>> #define show_block_type(type) \
> >>> __print_symbolic(type, \
> >>> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> >>> { CP_RECOVERY, "Recovery" }, \
> >>> { CP_DISCARD, "Discard" }, \
> >>> { CP_PAUSE, "Pause" }, \
> >>> - { CP_TRIMMED, "Trimmed" })
> >>> + { CP_TRIMMED, "Trimmed" }, \
> >>> + { CP_RESIZE, "Resize" })
> >>>
> >>> #define show_fsync_cpreason(type) \
> >>> __print_symbolic(type, \
> >>>
> > .
> >
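
On the question of returning -EAGAIN: if the ioctl could fail transiently, the burden moves to user space to retry. Below is a hypothetical sketch of such a caller; resize_with_retry() and its callback are invented names for illustration (a real tool would issue F2FS_IOC_RESIZE_FS via ioctl(2) inside the callback), not part of any f2fs interface. The stub at the bottom only exists to exercise the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Retry an operation that may fail transiently with -EAGAIN, e.g. a
 * shrink racing with concurrent writes near the tail of the device.
 * do_resize() returns 0 on success or a negative errno; "ctx" threads
 * caller state (fd, target block count) through the callback.
 */
static int resize_with_retry(int (*do_resize)(void *ctx), void *ctx,
			     int max_attempts)
{
	int err = -EAGAIN;

	for (int i = 0; i < max_attempts && err == -EAGAIN; i++)
		err = do_resize(ctx);
	return err;
}

/* Test stub: fail with -EAGAIN twice, then succeed. */
static int stub_calls;
static int stub_resize(void *ctx)
{
	(void)ctx;
	return ++stub_calls <= 2 ? -EAGAIN : 0;
}
```

A hard failure (anything other than -EAGAIN) is returned immediately; only the transient case is retried, which matches the "concurrent IO should be avoided" guidance discussed above.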

2020-04-14 13:30:47

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/03, Jaegeuk Kim wrote:
> On 04/03, Jaegeuk Kim wrote:
> > On 04/01, Sahitya Tummala wrote:
> > > Hi Jaegeuk,
> > >
> > > Got it.
> > > The diff below looks good to me.
> > > Would you like me to test it and put a patch for this?
> >
> > Sahitya, Chao,
> >
> > Could you please take a look at this patch and test intensively?
> >
> > Thanks,
v3:
- fix gc_lock

From d10c09dfedc7a10cef7dd34493ddbd7c27889033 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Tue, 31 Mar 2020 11:43:07 -0700
Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it requires to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 ++-
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 105 ++++++++++++++++++++----------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +-
6 files changed, 67 insertions(+), 56 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 6be357c8e0020..dcb3a15574c99 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1554,7 +1554,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1623,7 +1624,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1241a397bf53c..e8e26ab723eba 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -194,6 +194,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1422,7 +1423,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index f06b029c00d8d..0514fab8d2eb8 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3313,7 +3313,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
__u64 block_count;
- int ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3325,9 +3324,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
sizeof(block_count)))
return -EFAULT;

- ret = f2fs_resize_fs(sbi, block_count);
-
- return ret;
+ return f2fs_resize_fs(sbi, block_count);
}

static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db0..3d003397252b8 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
}

-static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
- unsigned int end)
+static int free_segment_range(struct f2fs_sb_info *sbi,
+ unsigned int secs, bool gc_only)
{
- int type;
- unsigned int segno, next_inuse;
+ unsigned int segno, next_inuse, start, end;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int gc_mode, gc_type;
int err = 0;
+ int type;
+
+ /* Force block allocation for GC */
+ MAIN_SECS(sbi) -= secs;
+ start = MAIN_SECS(sbi) * sbi->segs_per_sec;
+ end = MAIN_SEGS(sbi) - 1;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
+ if (SIT_I(sbi)->last_victim[gc_mode] >= start)
+ SIT_I(sbi)->last_victim[gc_mode] = 0;
+
+ for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
+ if (sbi->next_victim_seg[gc_type] >= start)
+ sbi->next_victim_seg[gc_type] = NULL_SEGNO;
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
@@ -1422,13 +1439,17 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

- if (get_valid_blocks(sbi, segno, true))
- return -EAGAIN;
+ if (!gc_only && get_valid_blocks(sbi, segno, true)) {
+ err = -EAGAIN;
+ goto out;
+ }
}
+ if (gc_only)
+ goto out;

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
- return err;
+ goto out;

next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
if (next_inuse <= end) {
@@ -1436,6 +1457,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
next_inuse);
f2fs_bug_on(sbi, 1);
}
+out:
+ MAIN_SECS(sbi) += secs;
return err;
}

@@ -1481,6 +1504,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)

SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
+ MAIN_SECS(sbi) += secs;
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
@@ -1502,8 +1526,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
- int gc_mode, gc_type;
int err = 0;
__u32 rem;

@@ -1538,10 +1562,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
-
shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
+
+ /* protect MAIN_SEC in free_segment_range */
+ f2fs_lock_op(sbi);
+ err = free_segment_range(sbi, secs, true);
+ f2fs_unlock_op(sbi);
+ if (err)
+ return err;
+
+ set_sbi_flag(sbi, SBI_IS_RESIZEFS);
+
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);
+
spin_lock(&sbi->stat_lock);
if (shrunk_blocks + valid_user_blocks(sbi) +
sbi->current_reserved_blocks + sbi->unusable_block_count +
@@ -1550,69 +1586,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
else
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
- if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
- return err;
- }
-
- mutex_lock(&sbi->resize_mutex);
- set_sbi_flag(sbi, SBI_IS_RESIZEFS);
-
- mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-
- MAIN_SECS(sbi) -= secs;
-
- for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
- if (SIT_I(sbi)->last_victim[gc_mode] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- SIT_I(sbi)->last_victim[gc_mode] = 0;
-
- for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
- if (sbi->next_victim_seg[gc_type] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-
- mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ if (err)
+ goto out_err;

- err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
- MAIN_SEGS(sbi) - 1);
+ err = free_segment_range(sbi, secs, false);
if (err)
- goto out;
+ goto recover_out;

update_sb_metadata(sbi, -secs);

err = f2fs_commit_super(sbi, false);
if (err) {
update_sb_metadata(sbi, secs);
- goto out;
+ goto recover_out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
-out:
+recover_out:
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");

- MAIN_SECS(sbi) += secs;
spin_lock(&sbi->stat_lock);
sbi->user_block_count += shrunk_blocks;
spin_unlock(&sbi->stat_lock);
}
+out_err:
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index f2dfc21c6abb0..18b4a43a13438 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3413,7 +3413,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index e78c8696e2adc..e1ad392aae05a 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.0.110.g2183baf09c-goog
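
One detail of this version worth spelling out is why f2fs_write_checkpoint() skips cp_mutex for CP_RESIZE: f2fs_resize_fs() now takes cp_mutex itself around the shrink and the final checkpoint, so taking it again inside the checkpoint would self-deadlock. The toy model below captures just that hand-off; it is plain user-space C with a bool standing in for the mutex, not kernel code, and the helper names are simplified.

```c
#include <assert.h>
#include <stdbool.h>

#define CP_SYNC   0x04
#define CP_RESIZE 0x80   /* new reason: caller already holds cp_mutex */

struct sb_model {
	bool cp_mutex_held;  /* stand-in for sbi->cp_mutex */
};

/*
 * Model of the v3/v4 change to f2fs_write_checkpoint(): for CP_RESIZE
 * the resize path owns cp_mutex already, so the checkpoint must not
 * lock it again; for every other reason it locks/unlocks as before.
 */
static int write_checkpoint(struct sb_model *sb, int reason)
{
	if (reason != CP_RESIZE) {
		assert(!sb->cp_mutex_held);  /* would deadlock in the kernel */
		sb->cp_mutex_held = true;
	} else {
		assert(sb->cp_mutex_held);   /* resize caller must hold it */
	}

	/* ... checkpoint work elided ... */

	if (reason != CP_RESIZE)
		sb->cp_mutex_held = false;
	return 0;
}

/* Model of f2fs_resize_fs() taking cp_mutex before the final checkpoint. */
static int resize_fs(struct sb_model *sb)
{
	sb->cp_mutex_held = true;             /* mutex_lock(&sbi->cp_mutex) */
	int err = write_checkpoint(sb, CP_RESIZE);
	sb->cp_mutex_held = false;            /* mutex_unlock(&sbi->cp_mutex) */
	return err;
}
```

Both paths leave the "mutex" released on return; only the owner of the lock differs, which is exactly what the `if (cpc->reason != CP_RESIZE)` guards express in the patch.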

2020-04-14 16:00:47

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/13, Jaegeuk Kim wrote:
> On 04/03, Jaegeuk Kim wrote:
> > On 04/03, Jaegeuk Kim wrote:
> > > On 04/01, Sahitya Tummala wrote:
> > > > Hi Jaegeuk,
> > > >
> > > > Got it.
> > > > The diff below looks good to me.
> > > > Would you like me to test it and put a patch for this?
> > >
> > > Sahitya, Chao,
> > >
> > > Could you please take a look at this patch and test intensively?
> > >
> > > Thanks,

v4:
- fix deadlock

From fcbf75b308a8b933706c7e4dd18f275129baa928 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Tue, 31 Mar 2020 11:43:07 -0700
Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it needs to change the current node/data segments for resizing.
Stop these meta updates when there is a checkpoint already
in progress to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 +-
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 112 ++++++++++++++++++++----------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +-
6 files changed, 72 insertions(+), 58 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 6be357c8e0020..dcb3a15574c99 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1554,7 +1554,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1623,7 +1624,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 801c04858bc94..da5e9dd747fab 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -194,6 +194,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1423,7 +1424,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index dc470358f25eb..212c5996d3807 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3306,7 +3306,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
__u64 block_count;
- int ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3318,9 +3317,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
sizeof(block_count)))
return -EFAULT;

- ret = f2fs_resize_fs(sbi, block_count);
-
- return ret;
+ return f2fs_resize_fs(sbi, block_count);
}

static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db0..ad395b774a0b2 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
}

-static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
- unsigned int end)
+static int free_segment_range(struct f2fs_sb_info *sbi,
+ unsigned int secs, bool gc_only)
{
- int type;
- unsigned int segno, next_inuse;
+ unsigned int segno, next_inuse, start, end;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int gc_mode, gc_type;
int err = 0;
+ int type;
+
+ /* Force block allocation for GC */
+ MAIN_SECS(sbi) -= secs;
+ start = MAIN_SECS(sbi) * sbi->segs_per_sec;
+ end = MAIN_SEGS(sbi) - 1;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
+ if (SIT_I(sbi)->last_victim[gc_mode] >= start)
+ SIT_I(sbi)->last_victim[gc_mode] = 0;
+
+ for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
+ if (sbi->next_victim_seg[gc_type] >= start)
+ sbi->next_victim_seg[gc_type] = NULL_SEGNO;
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
@@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

- if (get_valid_blocks(sbi, segno, true))
- return -EAGAIN;
+ if (!gc_only && get_valid_blocks(sbi, segno, true)) {
+ err = -EAGAIN;
+ goto out;
+ }
}
+ if (gc_only)
+ goto out;

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
- return err;
+ goto out;

next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
if (next_inuse <= end) {
@@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
next_inuse);
f2fs_bug_on(sbi, 1);
}
+out:
+ MAIN_SECS(sbi) -= secs;
return err;
}

@@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)

SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
+ MAIN_SECS(sbi) += secs;
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
@@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
- int gc_mode, gc_type;
int err = 0;
__u32 rem;

@@ -1538,10 +1560,27 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
-
shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
+
+ /* stop other GC */
+ if (!down_write_trylock(&sbi->gc_lock))
+ return -EAGAIN;
+
+ /* stop CP to protect MAIN_SEC in free_segment_range */
+ f2fs_lock_op(sbi);
+ err = free_segment_range(sbi, secs, true);
+ f2fs_unlock_op(sbi);
+ up_write(&sbi->gc_lock);
+ if (err)
+ return err;
+
+ set_sbi_flag(sbi, SBI_IS_RESIZEFS);
+
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);
+
spin_lock(&sbi->stat_lock);
if (shrunk_blocks + valid_user_blocks(sbi) +
sbi->current_reserved_blocks + sbi->unusable_block_count +
@@ -1550,69 +1589,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
else
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
- if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
- return err;
- }
-
- mutex_lock(&sbi->resize_mutex);
- set_sbi_flag(sbi, SBI_IS_RESIZEFS);
-
- mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-
- MAIN_SECS(sbi) -= secs;
-
- for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
- if (SIT_I(sbi)->last_victim[gc_mode] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- SIT_I(sbi)->last_victim[gc_mode] = 0;
-
- for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
- if (sbi->next_victim_seg[gc_type] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-
- mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ if (err)
+ goto out_err;

- err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
- MAIN_SEGS(sbi) - 1);
+ err = free_segment_range(sbi, secs, false);
if (err)
- goto out;
+ goto recover_out;

update_sb_metadata(sbi, -secs);

err = f2fs_commit_super(sbi, false);
if (err) {
update_sb_metadata(sbi, secs);
- goto out;
+ goto recover_out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
-out:
+recover_out:
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");

- MAIN_SECS(sbi) += secs;
spin_lock(&sbi->stat_lock);
sbi->user_block_count += shrunk_blocks;
spin_unlock(&sbi->stat_lock);
}
+out_err:
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 43a61ed592c10..33da1ad238d72 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3420,7 +3420,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 3577fa67690af..421a661bfd5aa 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.0.110.g2183baf09c-goog

2020-04-15 20:19:34

by Sahitya Tummala

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On Mon, Apr 13, 2020 at 10:42:37AM -0700, Jaegeuk Kim wrote:
> On 04/03, Jaegeuk Kim wrote:
> > On 04/03, Jaegeuk Kim wrote:
> > > On 04/01, Sahitya Tummala wrote:
> > > > Hi Jaegeuk,
> > > >
> > > > Got it.
> > > > The diff below looks good to me.
> > > > Would you like me to test it and put a patch for this?
> > >
> > > Sahitya, Chao,
> > >
> > > Could you please take a look at this patch and test intensively?
> > >
> > > Thanks,
> v3:
> - fix gc_lock
>
> From d10c09dfedc7a10cef7dd34493ddbd7c27889033 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <[email protected]>
> Date: Tue, 31 Mar 2020 11:43:07 -0700
> Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
>
> Sahitya raised an issue:
> - prevent meta updates while checkpoint is in progress
>
> allocate_segment_for_resize() can cause metapage updates if
> it needs to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 6 ++-
> fs/f2fs/f2fs.h | 2 +-
> fs/f2fs/file.c | 5 +-
> fs/f2fs/gc.c | 105 ++++++++++++++++++++----------------
> fs/f2fs/super.c | 1 -
> include/trace/events/f2fs.h | 4 +-
> 6 files changed, 67 insertions(+), 56 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 6be357c8e0020..dcb3a15574c99 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1554,7 +1554,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> return 0;
> f2fs_warn(sbi, "Start checkpoint disabled!");
> }
> - mutex_lock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_lock(&sbi->cp_mutex);
>
> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> @@ -1623,7 +1624,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> f2fs_update_time(sbi, CP_TIME);
> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> out:
> - mutex_unlock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_unlock(&sbi->cp_mutex);
> return err;
> }
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 1241a397bf53c..e8e26ab723eba 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -194,6 +194,7 @@ enum {
> #define CP_DISCARD 0x00000010
> #define CP_TRIMMED 0x00000020
> #define CP_PAUSE 0x00000040
> +#define CP_RESIZE 0x00000080
>
> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> @@ -1422,7 +1423,6 @@ struct f2fs_sb_info {
> unsigned int segs_per_sec; /* segments per section */
> unsigned int secs_per_zone; /* sections per zone */
> unsigned int total_sections; /* total section count */
> - struct mutex resize_mutex; /* for resize exclusion */
> unsigned int total_node_count; /* total node block count */
> unsigned int total_valid_node_count; /* valid node block count */
> loff_t max_file_blocks; /* max block index of file */
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f06b029c00d8d..0514fab8d2eb8 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3313,7 +3313,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
> __u64 block_count;
> - int ret;
>
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -3325,9 +3324,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> sizeof(block_count)))
> return -EFAULT;
>
> - ret = f2fs_resize_fs(sbi, block_count);
> -
> - return ret;
> + return f2fs_resize_fs(sbi, block_count);
> }
>
> static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db0..3d003397252b8 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
> }
>
> -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> - unsigned int end)
> +static int free_segment_range(struct f2fs_sb_info *sbi,
> + unsigned int secs, bool gc_only)
> {
> - int type;
> - unsigned int segno, next_inuse;
> + unsigned int segno, next_inuse, start, end;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> + int gc_mode, gc_type;
> int err = 0;
> + int type;
> +
> + /* Force block allocation for GC */
> + MAIN_SECS(sbi) -= secs;
> + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
> + end = MAIN_SEGS(sbi) - 1;
> +
> + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
> + SIT_I(sbi)->last_victim[gc_mode] = 0;
> +
> + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> + if (sbi->next_victim_seg[gc_type] >= start)
> + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
>
> /* Move out cursegs from the target range */
> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> @@ -1422,13 +1439,17 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);
>
> - if (get_valid_blocks(sbi, segno, true))
> - return -EAGAIN;
> + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
> + err = -EAGAIN;
> + goto out;
> + }
> }
> + if (gc_only)
> + goto out;
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err)
> - return err;
> + goto out;
>
> next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
> if (next_inuse <= end) {
> @@ -1436,6 +1457,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> next_inuse);
> f2fs_bug_on(sbi, 1);
> }
> +out:
> + MAIN_SECS(sbi) -= secs;
> return err;
> }
>
> @@ -1481,6 +1504,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
>
> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> + MAIN_SECS(sbi) += secs;
> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
> F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
> @@ -1502,8 +1526,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> {
> __u64 old_block_count, shrunk_blocks;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> unsigned int secs;
> - int gc_mode, gc_type;
> int err = 0;
> __u32 rem;
>
> @@ -1538,10 +1562,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> -
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> +
> + /* protect MAIN_SEC in free_segment_range */
> + f2fs_lock_op(sbi);
> + err = free_segment_range(sbi, secs, true);
> + f2fs_unlock_op(sbi);
> + if (err)
> + return err;
> +
> + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> +
> + freeze_super(sbi->sb);
> + down_write(&sbi->gc_lock);

free_segment_range() tries to acquire gc_lock before do_garbage_collect(),
but by this point f2fs_resize_fs() already holds it, so it can deadlock.

Thanks,

> + mutex_lock(&sbi->cp_mutex);
> +
> spin_lock(&sbi->stat_lock);
> if (shrunk_blocks + valid_user_blocks(sbi) +
> sbi->current_reserved_blocks + sbi->unusable_block_count +
> @@ -1550,69 +1586,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> else
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> - if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> - return err;
> - }
> -
> - mutex_lock(&sbi->resize_mutex);
> - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> -
> - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> -
> - MAIN_SECS(sbi) -= secs;
> -
> - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> - if (SIT_I(sbi)->last_victim[gc_mode] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - SIT_I(sbi)->last_victim[gc_mode] = 0;
> -
> - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> - if (sbi->next_victim_seg[gc_type] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> -
> - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> + if (err)
> + goto out_err;
>
> - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
> - MAIN_SEGS(sbi) - 1);
> + err = free_segment_range(sbi, secs, false);
> if (err)
> - goto out;
> + goto recover_out;
>
> update_sb_metadata(sbi, -secs);
>
> err = f2fs_commit_super(sbi, false);
> if (err) {
> update_sb_metadata(sbi, secs);
> - goto out;
> + goto recover_out;
> }
>
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> - mutex_unlock(&sbi->cp_mutex);
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err) {
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, secs);
> - mutex_unlock(&sbi->cp_mutex);
> update_sb_metadata(sbi, secs);
> f2fs_commit_super(sbi, false);
> }
> -out:
> +recover_out:
> if (err) {
> set_sbi_flag(sbi, SBI_NEED_FSCK);
> f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
>
> - MAIN_SECS(sbi) += secs;
> spin_lock(&sbi->stat_lock);
> sbi->user_block_count += shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> }
> +out_err:
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> - mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> return err;
> }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index f2dfc21c6abb0..18b4a43a13438 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3413,7 +3413,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> init_rwsem(&sbi->gc_lock);
> mutex_init(&sbi->writepages);
> mutex_init(&sbi->cp_mutex);
> - mutex_init(&sbi->resize_mutex);
> init_rwsem(&sbi->node_write);
> init_rwsem(&sbi->node_change);
>
> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> index e78c8696e2adc..e1ad392aae05a 100644
> --- a/include/trace/events/f2fs.h
> +++ b/include/trace/events/f2fs.h
> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> TRACE_DEFINE_ENUM(CP_DISCARD);
> TRACE_DEFINE_ENUM(CP_TRIMMED);
> TRACE_DEFINE_ENUM(CP_PAUSE);
> +TRACE_DEFINE_ENUM(CP_RESIZE);
>
> #define show_block_type(type) \
> __print_symbolic(type, \
> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> { CP_RECOVERY, "Recovery" }, \
> { CP_DISCARD, "Discard" }, \
> { CP_PAUSE, "Pause" }, \
> - { CP_TRIMMED, "Trimmed" })
> + { CP_TRIMMED, "Trimmed" }, \
> + { CP_RESIZE, "Resize" })
>
> #define show_fsync_cpreason(type) \
> __print_symbolic(type, \
> --
> 2.26.0.110.g2183baf09c-goog
>

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2020-04-15 21:26:09

by Chao Yu

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/4/14 1:42, Jaegeuk Kim wrote:
> On 04/03, Jaegeuk Kim wrote:
>> On 04/03, Jaegeuk Kim wrote:
>>> On 04/01, Sahitya Tummala wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> Got it.
>>>> The diff below looks good to me.
>>>> Would you like me to test it and put a patch for this?
>>>
>>> Sahitya, Chao,
>>>
>>> Could you please take a look at this patch and test intensively?
>>>
>>> Thanks,
> v3:
> - fix gc_lock
>
> From d10c09dfedc7a10cef7dd34493ddbd7c27889033 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <[email protected]>
> Date: Tue, 31 Mar 2020 11:43:07 -0700
> Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
>
> Sahitya raised an issue:
> - prevent meta updates while checkpoint is in progress
>
> allocate_segment_for_resize() can cause metapage updates if
> it needs to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 6 ++-
> fs/f2fs/f2fs.h | 2 +-
> fs/f2fs/file.c | 5 +-
> fs/f2fs/gc.c | 105 ++++++++++++++++++++----------------
> fs/f2fs/super.c | 1 -
> include/trace/events/f2fs.h | 4 +-
> 6 files changed, 67 insertions(+), 56 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 6be357c8e0020..dcb3a15574c99 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1554,7 +1554,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> return 0;
> f2fs_warn(sbi, "Start checkpoint disabled!");
> }
> - mutex_lock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_lock(&sbi->cp_mutex);
>
> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> @@ -1623,7 +1624,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> f2fs_update_time(sbi, CP_TIME);
> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> out:
> - mutex_unlock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_unlock(&sbi->cp_mutex);
> return err;
> }
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 1241a397bf53c..e8e26ab723eba 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -194,6 +194,7 @@ enum {
> #define CP_DISCARD 0x00000010
> #define CP_TRIMMED 0x00000020
> #define CP_PAUSE 0x00000040
> +#define CP_RESIZE 0x00000080
>
> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> @@ -1422,7 +1423,6 @@ struct f2fs_sb_info {
> unsigned int segs_per_sec; /* segments per section */
> unsigned int secs_per_zone; /* sections per zone */
> unsigned int total_sections; /* total section count */
> - struct mutex resize_mutex; /* for resize exclusion */
> unsigned int total_node_count; /* total node block count */
> unsigned int total_valid_node_count; /* valid node block count */
> loff_t max_file_blocks; /* max block index of file */
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f06b029c00d8d..0514fab8d2eb8 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3313,7 +3313,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
> __u64 block_count;
> - int ret;
>
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -3325,9 +3324,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> sizeof(block_count)))
> return -EFAULT;
>
> - ret = f2fs_resize_fs(sbi, block_count);
> -
> - return ret;
> + return f2fs_resize_fs(sbi, block_count);
> }
>
> static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db0..3d003397252b8 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
> }
>
> -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> - unsigned int end)
> +static int free_segment_range(struct f2fs_sb_info *sbi,
> + unsigned int secs, bool gc_only)
> {
> - int type;
> - unsigned int segno, next_inuse;
> + unsigned int segno, next_inuse, start, end;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> + int gc_mode, gc_type;
> int err = 0;
> + int type;
> +
> + /* Force block allocation for GC */
> + MAIN_SECS(sbi) -= secs;
> + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
> + end = MAIN_SEGS(sbi) - 1;
> +
> + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
> + SIT_I(sbi)->last_victim[gc_mode] = 0;
> +
> + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> + if (sbi->next_victim_seg[gc_type] >= start)
> + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
>
> /* Move out cursegs from the target range */
> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> @@ -1422,13 +1439,17 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);
>
> - if (get_valid_blocks(sbi, segno, true))
> - return -EAGAIN;
> + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
> + err = -EAGAIN;
> + goto out;
> + }
> }
> + if (gc_only)
> + goto out;
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err)
> - return err;
> + goto out;
>
> next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
> if (next_inuse <= end) {
> @@ -1436,6 +1457,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> next_inuse);
> f2fs_bug_on(sbi, 1);
> }
> +out:
> + MAIN_SECS(sbi) -= secs;
> return err;
> }
>
> @@ -1481,6 +1504,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
>
> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> + MAIN_SECS(sbi) += secs;
> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
> F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
> @@ -1502,8 +1526,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> {
> __u64 old_block_count, shrunk_blocks;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> unsigned int secs;
> - int gc_mode, gc_type;
> int err = 0;
> __u32 rem;
>
> @@ -1538,10 +1562,22 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> -
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> +
> + /* protect MAIN_SEC in free_segment_range */
> + f2fs_lock_op(sbi);
> + err = free_segment_range(sbi, secs, true);
> + f2fs_unlock_op(sbi);

There will be an ABBA deadlock:

- lock_op()
- free_segment_range()
- f2fs_sync_fs()
- down_write(gc_lock)
- down_write(gc_lock)
- do_garbage_collect()
- f2fs_write_checkpoint()
- block_operations()
- f2fs_lock_all()

Thanks,

> + if (err)
> + return err;
> +
> + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> +
> + freeze_super(sbi->sb);
> + down_write(&sbi->gc_lock);
> + mutex_lock(&sbi->cp_mutex);
> +
> spin_lock(&sbi->stat_lock);
> if (shrunk_blocks + valid_user_blocks(sbi) +
> sbi->current_reserved_blocks + sbi->unusable_block_count +
> @@ -1550,69 +1586,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> else
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> - if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> - return err;
> - }
> -
> - mutex_lock(&sbi->resize_mutex);
> - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> -
> - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> -
> - MAIN_SECS(sbi) -= secs;
> -
> - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> - if (SIT_I(sbi)->last_victim[gc_mode] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - SIT_I(sbi)->last_victim[gc_mode] = 0;
> -
> - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> - if (sbi->next_victim_seg[gc_type] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> -
> - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> + if (err)
> + goto out_err;
>
> - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
> - MAIN_SEGS(sbi) - 1);
> + err = free_segment_range(sbi, secs, false);
> if (err)
> - goto out;
> + goto recover_out;
>
> update_sb_metadata(sbi, -secs);
>
> err = f2fs_commit_super(sbi, false);
> if (err) {
> update_sb_metadata(sbi, secs);
> - goto out;
> + goto recover_out;
> }
>
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> - mutex_unlock(&sbi->cp_mutex);
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err) {
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, secs);
> - mutex_unlock(&sbi->cp_mutex);
> update_sb_metadata(sbi, secs);
> f2fs_commit_super(sbi, false);
> }
> -out:
> +recover_out:
> if (err) {
> set_sbi_flag(sbi, SBI_NEED_FSCK);
> f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
>
> - MAIN_SECS(sbi) += secs;
> spin_lock(&sbi->stat_lock);
> sbi->user_block_count += shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> }
> +out_err:
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> - mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> return err;
> }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index f2dfc21c6abb0..18b4a43a13438 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3413,7 +3413,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> init_rwsem(&sbi->gc_lock);
> mutex_init(&sbi->writepages);
> mutex_init(&sbi->cp_mutex);
> - mutex_init(&sbi->resize_mutex);
> init_rwsem(&sbi->node_write);
> init_rwsem(&sbi->node_change);
>
> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> index e78c8696e2adc..e1ad392aae05a 100644
> --- a/include/trace/events/f2fs.h
> +++ b/include/trace/events/f2fs.h
> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> TRACE_DEFINE_ENUM(CP_DISCARD);
> TRACE_DEFINE_ENUM(CP_TRIMMED);
> TRACE_DEFINE_ENUM(CP_PAUSE);
> +TRACE_DEFINE_ENUM(CP_RESIZE);
>
> #define show_block_type(type) \
> __print_symbolic(type, \
> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> { CP_RECOVERY, "Recovery" }, \
> { CP_DISCARD, "Discard" }, \
> { CP_PAUSE, "Pause" }, \
> - { CP_TRIMMED, "Trimmed" })
> + { CP_TRIMMED, "Trimmed" }, \
> + { CP_RESIZE, "Resize" })
>
> #define show_fsync_cpreason(type) \
> __print_symbolic(type, \
>

2020-04-16 07:08:31

by Chao Yu

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/4/14 21:44, Jaegeuk Kim wrote:
> On 04/13, Jaegeuk Kim wrote:
>> On 04/03, Jaegeuk Kim wrote:
>>> On 04/03, Jaegeuk Kim wrote:
>>>> On 04/01, Sahitya Tummala wrote:
>>>>> Hi Jaegeuk,
>>>>>
>>>>> Got it.
>>>>> The diff below looks good to me.
>>>>> Would you like me to test it and put a patch for this?
>>>>
>>>> Sahitya, Chao,
>>>>
>>>> Could you please take a look at this patch and test intensively?
>>>>
>>>> Thanks,
>
> v4:
> - fix deadlock
>
> From fcbf75b308a8b933706c7e4dd18f275129baa928 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <[email protected]>
> Date: Tue, 31 Mar 2020 11:43:07 -0700
> Subject: [PATCH] f2fs: refactor resize_fs to avoid meta updates in progress
>
> Sahitya raised an issue:
> - prevent meta updates while checkpoint is in progress
>
> allocate_segment_for_resize() can cause metapage updates if
> it requires to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 6 +-
> fs/f2fs/f2fs.h | 2 +-
> fs/f2fs/file.c | 5 +-
> fs/f2fs/gc.c | 112 ++++++++++++++++++++----------------
> fs/f2fs/super.c | 1 -
> include/trace/events/f2fs.h | 4 +-
> 6 files changed, 72 insertions(+), 58 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 6be357c8e0020..dcb3a15574c99 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1554,7 +1554,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> return 0;
> f2fs_warn(sbi, "Start checkpoint disabled!");
> }
> - mutex_lock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_lock(&sbi->cp_mutex);
>
> if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
> ((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
> @@ -1623,7 +1624,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> f2fs_update_time(sbi, CP_TIME);
> trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
> out:
> - mutex_unlock(&sbi->cp_mutex);
> + if (cpc->reason != CP_RESIZE)
> + mutex_unlock(&sbi->cp_mutex);
> return err;
> }
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 801c04858bc94..da5e9dd747fab 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -194,6 +194,7 @@ enum {
> #define CP_DISCARD 0x00000010
> #define CP_TRIMMED 0x00000020
> #define CP_PAUSE 0x00000040
> +#define CP_RESIZE 0x00000080
>
> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
> @@ -1423,7 +1424,6 @@ struct f2fs_sb_info {
> unsigned int segs_per_sec; /* segments per section */
> unsigned int secs_per_zone; /* sections per zone */
> unsigned int total_sections; /* total section count */
> - struct mutex resize_mutex; /* for resize exclusion */
> unsigned int total_node_count; /* total node block count */
> unsigned int total_valid_node_count; /* valid node block count */
> loff_t max_file_blocks; /* max block index of file */
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index dc470358f25eb..212c5996d3807 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -3306,7 +3306,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> {
> struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
> __u64 block_count;
> - int ret;
>
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -3318,9 +3317,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
> sizeof(block_count)))
> return -EFAULT;
>
> - ret = f2fs_resize_fs(sbi, block_count);
> -
> - return ret;
> + return f2fs_resize_fs(sbi, block_count);
> }
>
> static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 26248c8936db0..ad395b774a0b2 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
> GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
> }
>
> -static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> - unsigned int end)
> +static int free_segment_range(struct f2fs_sb_info *sbi,
> + unsigned int secs, bool gc_only)
> {
> - int type;
> - unsigned int segno, next_inuse;
> + unsigned int segno, next_inuse, start, end;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> + int gc_mode, gc_type;
> int err = 0;
> + int type;
> +
> + /* Force block allocation for GC */
> + MAIN_SECS(sbi) -= secs;
> + start = MAIN_SECS(sbi) * sbi->segs_per_sec;
> + end = MAIN_SEGS(sbi) - 1;
> +
> + mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> + for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> + if (SIT_I(sbi)->last_victim[gc_mode] >= start)
> + SIT_I(sbi)->last_victim[gc_mode] = 0;
> +
> + for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> + if (sbi->next_victim_seg[gc_type] >= start)
> + sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> + mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
>
> /* Move out cursegs from the target range */
> for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
> @@ -1417,18 +1434,20 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
> };
>
> - down_write(&sbi->gc_lock);
> do_garbage_collect(sbi, segno, &gc_list, FG_GC);
> - up_write(&sbi->gc_lock);
> put_gc_inode(&gc_list);

The granularity here is still coarse; how about handling a userspace signal
at this point, to give users a way to terminate the operation in case they
don't want to, or can't, wait any longer?

if (fatal_signal_pending(current))
	return -ERESTARTSYS;

Thanks,

>
> - if (get_valid_blocks(sbi, segno, true))
> - return -EAGAIN;
> + if (!gc_only && get_valid_blocks(sbi, segno, true)) {
> + err = -EAGAIN;
> + goto out;
> + }
> }
> + if (gc_only)
> + goto out;
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err)
> - return err;
> + goto out;
>
> next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
> if (next_inuse <= end) {
> @@ -1436,6 +1455,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
> next_inuse);
> f2fs_bug_on(sbi, 1);
> }
> +out:
> + MAIN_SECS(sbi) -= secs;
> return err;
> }
>
> @@ -1481,6 +1502,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
>
> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> + MAIN_SECS(sbi) += secs;
> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
> F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
> @@ -1502,8 +1524,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> {
> __u64 old_block_count, shrunk_blocks;
> + struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
> unsigned int secs;
> - int gc_mode, gc_type;
> int err = 0;
> __u32 rem;
>
> @@ -1538,10 +1560,27 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> return -EINVAL;
> }
>
> - freeze_bdev(sbi->sb->s_bdev);
> -
> shrunk_blocks = old_block_count - block_count;
> secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
> +
> + /* stop other GC */
> + if (!down_write_trylock(&sbi->gc_lock))
> + return -EAGAIN;
> +
> + /* stop CP to protect MAIN_SEC in free_segment_range */
> + f2fs_lock_op(sbi);
> + err = free_segment_range(sbi, secs, true);
> + f2fs_unlock_op(sbi);
> + up_write(&sbi->gc_lock);
> + if (err)
> + return err;
> +
> + set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> +
> + freeze_super(sbi->sb);
> + down_write(&sbi->gc_lock);
> + mutex_lock(&sbi->cp_mutex);
> +
> spin_lock(&sbi->stat_lock);
> if (shrunk_blocks + valid_user_blocks(sbi) +
> sbi->current_reserved_blocks + sbi->unusable_block_count +
> @@ -1550,69 +1589,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> else
> sbi->user_block_count -= shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> - if (err) {
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> - return err;
> - }
> -
> - mutex_lock(&sbi->resize_mutex);
> - set_sbi_flag(sbi, SBI_IS_RESIZEFS);
> -
> - mutex_lock(&DIRTY_I(sbi)->seglist_lock);
> -
> - MAIN_SECS(sbi) -= secs;
> -
> - for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
> - if (SIT_I(sbi)->last_victim[gc_mode] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - SIT_I(sbi)->last_victim[gc_mode] = 0;
> -
> - for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
> - if (sbi->next_victim_seg[gc_type] >=
> - MAIN_SECS(sbi) * sbi->segs_per_sec)
> - sbi->next_victim_seg[gc_type] = NULL_SEGNO;
> -
> - mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
> + if (err)
> + goto out_err;
>
> - err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
> - MAIN_SEGS(sbi) - 1);
> + err = free_segment_range(sbi, secs, false);
> if (err)
> - goto out;
> + goto recover_out;
>
> update_sb_metadata(sbi, -secs);
>
> err = f2fs_commit_super(sbi, false);
> if (err) {
> update_sb_metadata(sbi, secs);
> - goto out;
> + goto recover_out;
> }
>
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> - mutex_unlock(&sbi->cp_mutex);
>
> - err = f2fs_sync_fs(sbi->sb, 1);
> + err = f2fs_write_checkpoint(sbi, &cpc);
> if (err) {
> - mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, secs);
> - mutex_unlock(&sbi->cp_mutex);
> update_sb_metadata(sbi, secs);
> f2fs_commit_super(sbi, false);
> }
> -out:
> +recover_out:
> if (err) {
> set_sbi_flag(sbi, SBI_NEED_FSCK);
> f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");
>
> - MAIN_SECS(sbi) += secs;
> spin_lock(&sbi->stat_lock);
> sbi->user_block_count += shrunk_blocks;
> spin_unlock(&sbi->stat_lock);
> }
> +out_err:
> + mutex_unlock(&sbi->cp_mutex);
> + up_write(&sbi->gc_lock);
> + thaw_super(sbi->sb);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> - mutex_unlock(&sbi->resize_mutex);
> - thaw_bdev(sbi->sb->s_bdev, sbi->sb);
> return err;
> }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 43a61ed592c10..33da1ad238d72 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3420,7 +3420,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
> init_rwsem(&sbi->gc_lock);
> mutex_init(&sbi->writepages);
> mutex_init(&sbi->cp_mutex);
> - mutex_init(&sbi->resize_mutex);
> init_rwsem(&sbi->node_write);
> init_rwsem(&sbi->node_change);
>
> diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
> index 3577fa67690af..421a661bfd5aa 100644
> --- a/include/trace/events/f2fs.h
> +++ b/include/trace/events/f2fs.h
> @@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
> TRACE_DEFINE_ENUM(CP_DISCARD);
> TRACE_DEFINE_ENUM(CP_TRIMMED);
> TRACE_DEFINE_ENUM(CP_PAUSE);
> +TRACE_DEFINE_ENUM(CP_RESIZE);
>
> #define show_block_type(type) \
> __print_symbolic(type, \
> @@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
> { CP_RECOVERY, "Recovery" }, \
> { CP_DISCARD, "Discard" }, \
> { CP_PAUSE, "Pause" }, \
> - { CP_TRIMMED, "Trimmed" })
> + { CP_TRIMMED, "Trimmed" }, \
> + { CP_RESIZE, "Resize" })
>
> #define show_fsync_cpreason(type) \
> __print_symbolic(type, \
>
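Chao's suggestion above is the standard kernel idiom for long-running loops: check for a pending fatal signal once per iteration so userspace can abort an otherwise long, unkillable operation, at a worst-case latency of one iteration. A hedged user-space sketch of the same cooperative-cancellation idea (the names are illustrative, not f2fs code):

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for fatal_signal_pending(current): a signal
 * handler (or, here, the loop itself) sets this flag to request exit. */
static volatile bool cancel_requested;

/* Process up to `total` segments, checking the cancel flag on every
 * iteration, as the kernel loop checks for a pending fatal signal.
 * Returns the number of segments actually processed. */
static int process_segments(int total, int cancel_after)
{
    int done = 0;

    for (int segno = 0; segno < total; segno++) {
        /* ... one segment's worth of GC work ... */
        done++;
        if (done == cancel_after)
            cancel_requested = true;   /* simulate a signal arriving */
        if (cancel_requested)
            return done;               /* kernel: return -ERESTARTSYS */
    }
    return done;
}
```

Because each iteration is bounded, the delay between the signal and the loop exiting is at most the cost of garbage-collecting one section.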

2020-04-16 21:43:22

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/14, Jaegeuk Kim wrote:
> On 04/13, Jaegeuk Kim wrote:
> > On 04/03, Jaegeuk Kim wrote:
> > > On 04/03, Jaegeuk Kim wrote:
> > > > On 04/01, Sahitya Tummala wrote:
> > > > > Hi Jaegeuk,
> > > > >
> > > > > Got it.
> > > > > The diff below looks good to me.
> > > > > Would you like me to test it and put a patch for this?
> > > >
> > > > Sahitya, Chao,
> > > >
> > > > Could you please take a look at this patch and test intensively?
> > > >
> > > > Thanks,

v5:
- add signal handler

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it needs to change the current node/data segments for resizing.
Stop these meta updates when a checkpoint is already in
progress, to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 +-
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 114 ++++++++++++++++++++----------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +-
6 files changed, 74 insertions(+), 58 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 6be357c8e0020..dcb3a15574c99 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1554,7 +1554,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1623,7 +1624,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index dd9c1de7c59d2..9dd1fc957b943 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -194,6 +194,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1423,7 +1424,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index dc470358f25eb..212c5996d3807 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3306,7 +3306,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
__u64 block_count;
- int ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3318,9 +3317,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
sizeof(block_count)))
return -EFAULT;

- ret = f2fs_resize_fs(sbi, block_count);
-
- return ret;
+ return f2fs_resize_fs(sbi, block_count);
}

static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 26248c8936db0..c979dd1add5de 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1399,12 +1399,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
}

-static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
- unsigned int end)
+static int free_segment_range(struct f2fs_sb_info *sbi,
+ unsigned int secs, bool gc_only)
{
- int type;
- unsigned int segno, next_inuse;
+ unsigned int segno, next_inuse, start, end;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int gc_mode, gc_type;
int err = 0;
+ int type;
+
+ /* Force block allocation for GC */
+ MAIN_SECS(sbi) -= secs;
+ start = MAIN_SECS(sbi) * sbi->segs_per_sec;
+ end = MAIN_SEGS(sbi) - 1;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
+ if (SIT_I(sbi)->last_victim[gc_mode] >= start)
+ SIT_I(sbi)->last_victim[gc_mode] = 0;
+
+ for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
+ if (sbi->next_victim_seg[gc_type] >= start)
+ sbi->next_victim_seg[gc_type] = NULL_SEGNO;
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
@@ -1417,18 +1434,22 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

- if (get_valid_blocks(sbi, segno, true))
- return -EAGAIN;
+ if (!gc_only && get_valid_blocks(sbi, segno, true)) {
+ err = -EAGAIN;
+ goto out;
+ }
+ if (fatal_signal_pending(current))
+ return -ERESTARTSYS;
}
+ if (gc_only)
+ goto out;

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
- return err;
+ goto out;

next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
if (next_inuse <= end) {
@@ -1436,6 +1457,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
next_inuse);
f2fs_bug_on(sbi, 1);
}
+out:
+ MAIN_SECS(sbi) -= secs;
return err;
}

@@ -1481,6 +1504,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)

SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
+ MAIN_SECS(sbi) += secs;
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
@@ -1502,8 +1526,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
- int gc_mode, gc_type;
int err = 0;
__u32 rem;

@@ -1538,10 +1562,27 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
-
shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
+
+ /* stop other GC */
+ if (!down_write_trylock(&sbi->gc_lock))
+ return -EAGAIN;
+
+ /* stop CP to protect MAIN_SEC in free_segment_range */
+ f2fs_lock_op(sbi);
+ err = free_segment_range(sbi, secs, true);
+ f2fs_unlock_op(sbi);
+ up_write(&sbi->gc_lock);
+ if (err)
+ return err;
+
+ set_sbi_flag(sbi, SBI_IS_RESIZEFS);
+
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);
+
spin_lock(&sbi->stat_lock);
if (shrunk_blocks + valid_user_blocks(sbi) +
sbi->current_reserved_blocks + sbi->unusable_block_count +
@@ -1550,69 +1591,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
else
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
- if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
- return err;
- }
-
- mutex_lock(&sbi->resize_mutex);
- set_sbi_flag(sbi, SBI_IS_RESIZEFS);
-
- mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-
- MAIN_SECS(sbi) -= secs;
-
- for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
- if (SIT_I(sbi)->last_victim[gc_mode] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- SIT_I(sbi)->last_victim[gc_mode] = 0;
-
- for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
- if (sbi->next_victim_seg[gc_type] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-
- mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ if (err)
+ goto out_err;

- err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
- MAIN_SEGS(sbi) - 1);
+ err = free_segment_range(sbi, secs, false);
if (err)
- goto out;
+ goto recover_out;

update_sb_metadata(sbi, -secs);

err = f2fs_commit_super(sbi, false);
if (err) {
update_sb_metadata(sbi, secs);
- goto out;
+ goto recover_out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
-out:
+recover_out:
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");

- MAIN_SECS(sbi) += secs;
spin_lock(&sbi->stat_lock);
sbi->user_block_count += shrunk_blocks;
spin_unlock(&sbi->stat_lock);
}
+out_err:
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index e3a323ff04c34..ad3b66c3dbe0e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3420,7 +3420,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 3577fa67690af..421a661bfd5aa 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.1.301.g55bc3eb7cb9-goog
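A recurring piece of the v5 patch above is conditional lock acquisition: `f2fs_resize_fs()` takes `cp_mutex` itself and then calls `f2fs_write_checkpoint()` with `CP_RESIZE`, which tells the checkpoint path to skip re-taking the mutex — re-taking a non-recursive mutex the caller already holds would self-deadlock. A minimal user-space sketch of the idea, with hypothetical names:

```c
#include <assert.h>
#include <pthread.h>

#define CP_RESIZE 0x80

static pthread_mutex_t cp_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Checkpoint body: takes cp_mutex only when the caller does not
 * already hold it (signalled by the CP_RESIZE reason flag). */
static int write_checkpoint(unsigned int reason)
{
    if (reason != CP_RESIZE)
        pthread_mutex_lock(&cp_mutex);

    int err = 0;   /* ... write the checkpoint ... */

    if (reason != CP_RESIZE)
        pthread_mutex_unlock(&cp_mutex);
    return err;
}

/* Resize path: holds cp_mutex across several steps, including the
 * checkpoint itself, so no other checkpoint can interleave. */
static int resize_fs(void)
{
    pthread_mutex_lock(&cp_mutex);
    int err = write_checkpoint(CP_RESIZE);   /* no self-deadlock */
    pthread_mutex_unlock(&cp_mutex);
    return err;
}
```

Passing lock ownership through a reason flag is fragile in general (a caller could pass `CP_RESIZE` without holding the lock), which is why the patch confines `CP_RESIZE` to the resize path alone.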

2020-04-17 07:29:46

by Chao Yu

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 2020/4/17 5:40, Jaegeuk Kim wrote:
> On 04/14, Jaegeuk Kim wrote:
>> On 04/13, Jaegeuk Kim wrote:
>>> On 04/03, Jaegeuk Kim wrote:
>>>> On 04/03, Jaegeuk Kim wrote:
>>>>> On 04/01, Sahitya Tummala wrote:
>>>>>> Hi Jaegeuk,
>>>>>>
>>>>>> Got it.
>>>>>> The diff below looks good to me.
>>>>>> Would you like me to test it and put a patch for this?
>>>>>
>>>>> Sahitya, Chao,
>>>>>
>>>>> Could you please take a look at this patch and test intensively?
>>>>>
>>>>> Thanks,
>
> v5:
> - add signal handler
>
> Sahitya raised an issue:
> - prevent meta updates while checkpoint is in progress
>
> allocate_segment_for_resize() can cause metapage updates if
> it requires to change the current node/data segments for resizing.
> Stop these meta updates when there is a checkpoint already
> in progress to prevent inconsistent CP data.
>
> Signed-off-by: Sahitya Tummala <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2020-04-17 16:19:00

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

Hi Sahitya,

Could you please test this patch fully? I didn't test at all.

Thanks,

On 04/17, Chao Yu wrote:
> On 2020/4/17 5:40, Jaegeuk Kim wrote:
> > On 04/14, Jaegeuk Kim wrote:
> >> On 04/13, Jaegeuk Kim wrote:
> >>> On 04/03, Jaegeuk Kim wrote:
> >>>> On 04/03, Jaegeuk Kim wrote:
> >>>>> On 04/01, Sahitya Tummala wrote:
> >>>>>> Hi Jaegeuk,
> >>>>>>
> >>>>>> Got it.
> >>>>>> The diff below looks good to me.
> >>>>>> Would you like me to test it and put a patch for this?
> >>>>>
> >>>>> Sahitya, Chao,
> >>>>>
> >>>>> Could you please take a look at this patch and test intensively?
> >>>>>
> >>>>> Thanks,
> >
> > v5:
> > - add signal handler
> >
> > Sahitya raised an issue:
> > - prevent meta updates while checkpoint is in progress
> >
> > allocate_segment_for_resize() can cause metapage updates if
> > it requires to change the current node/data segments for resizing.
> > Stop these meta updates when there is a checkpoint already
> > in progress to prevent inconsistent CP data.
> >
> > Signed-off-by: Sahitya Tummala <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
>
> Reviewed-by: Chao Yu <[email protected]>
>
> Thanks,

2020-04-20 11:38:47

by Sahitya Tummala

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

Hi Jaegeuk,

On Fri, Apr 17, 2020 at 09:15:16AM -0700, Jaegeuk Kim wrote:
> Hi Sahitya,
>
> Could you please test this patch fully? I didn't test at all.

I have tested v5 and so far found only one problem where MAIN_SECS(sbi)
isn't updated properly. Fixed it as below.

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 603f195..a5166b1 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1450,7 +1450,7 @@ static int free_segment_range(struct f2fs_sb_info *sbi,
f2fs_bug_on(sbi, 1);
}
out:
- MAIN_SECS(sbi) -= secs;
+ MAIN_SECS(sbi) += secs;
return err;
}
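The one-character fix above matters because `free_segment_range()` now borrows sections: it subtracts `secs` from `MAIN_SECS(sbi)` on entry to force allocation below the shrink boundary, and must add the same amount back on every exit path. v5 subtracted again at `out:`, leaving the count low by `2 * secs`. A hedged sketch of the borrow/restore pattern (illustrative names only):

```c
#include <assert.h>

static unsigned int main_secs = 100;   /* stand-in for MAIN_SECS(sbi) */

/* Temporarily shrink the usable section count, do the work, then
 * restore it on ALL exit paths -- including early error returns. */
static int free_segment_range_sketch(unsigned int secs, int fail)
{
    int err = 0;

    main_secs -= secs;      /* borrow: force allocation below boundary */

    if (fail) {             /* simulated mid-function failure */
        err = -1;
        goto out;
    }
    /* ... GC + checkpoint work against the reduced range ... */

out:
    main_secs += secs;      /* restore (v5 mistakenly did -= here) */
    return err;
}
```

With the fix, the counter is invariant across the call whether it succeeds or fails; the permanent shrink is applied separately by `update_fs_metadata()`.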

I will let you know in case anything else shows up later.

Thanks,

>
> Thanks,
>
> On 04/17, Chao Yu wrote:
> > On 2020/4/17 5:40, Jaegeuk Kim wrote:
> > > On 04/14, Jaegeuk Kim wrote:
> > >> On 04/13, Jaegeuk Kim wrote:
> > >>> On 04/03, Jaegeuk Kim wrote:
> > >>>> On 04/03, Jaegeuk Kim wrote:
> > >>>>> On 04/01, Sahitya Tummala wrote:
> > >>>>>> Hi Jaegeuk,
> > >>>>>>
> > >>>>>> Got it.
> > >>>>>> The diff below looks good to me.
> > >>>>>> Would you like me to test it and put a patch for this?
> > >>>>>
> > >>>>> Sahitya, Chao,
> > >>>>>
> > >>>>> Could you please take a look at this patch and test intensively?
> > >>>>>
> > >>>>> Thanks,
> > >
> > > v5:
> > > - add signal handler
> > >
> > > Sahitya raised an issue:
> > > - prevent meta updates while checkpoint is in progress
> > >
> > > allocate_segment_for_resize() can cause metapage updates if
> > > it requires to change the current node/data segments for resizing.
> > > Stop these meta updates when there is a checkpoint already
> > > in progress to prevent inconsistent CP data.
> > >
> > > Signed-off-by: Sahitya Tummala <[email protected]>
> > > Signed-off-by: Jaegeuk Kim <[email protected]>
> >
> > Reviewed-by: Chao Yu <[email protected]>
> >
> > Thanks,

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2020-04-20 15:55:07

by Jaegeuk Kim

Subject: Re: [f2fs-dev] [PATCH] f2fs: prevent meta updates while checkpoint is in progress

On 04/20, Sahitya Tummala wrote:
> Hi Jaegeuk,
>
> On Fri, Apr 17, 2020 at 09:15:16AM -0700, Jaegeuk Kim wrote:
> > Hi Sahitya,
> >
> > Could you please test this patch fully? I didn't test at all.
>
> I have tested v5 and so far found only one problem where MAIN_SECS(sbi)
> isn't updated properly. Fixed it as below.

Thanks. I fixed this with one more signal error case together.

Sahitya raised an issue:
- prevent meta updates while checkpoint is in progress

allocate_segment_for_resize() can cause metapage updates if
it needs to change the current node/data segments for resizing.
Stop these meta updates when a checkpoint is already in
progress, to prevent inconsistent CP data.

Signed-off-by: Sahitya Tummala <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 6 +-
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 116 +++++++++++++++++++++---------------
fs/f2fs/super.c | 1 -
include/trace/events/f2fs.h | 4 +-
6 files changed, 76 insertions(+), 58 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 5ba649e17c72b..eafe37eab5e0c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1559,7 +1559,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
return 0;
f2fs_warn(sbi, "Start checkpoint disabled!");
}
- mutex_lock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_lock(&sbi->cp_mutex);

if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
@@ -1628,7 +1629,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
f2fs_update_time(sbi, CP_TIME);
trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
out:
- mutex_unlock(&sbi->cp_mutex);
+ if (cpc->reason != CP_RESIZE)
+ mutex_unlock(&sbi->cp_mutex);
return err;
}

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6a80016acb85b..bae8e65deed6b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -194,6 +194,7 @@ enum {
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
#define CP_PAUSE 0x00000040
+#define CP_RESIZE 0x00000080

#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
@@ -1435,7 +1436,6 @@ struct f2fs_sb_info {
unsigned int segs_per_sec; /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections; /* total section count */
- struct mutex resize_mutex; /* for resize exclusion */
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 1f6c7c4738e30..ea04fb4dcdbdd 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3310,7 +3310,6 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
__u64 block_count;
- int ret;

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -3322,9 +3321,7 @@ static int f2fs_ioc_resize_fs(struct file *filp, unsigned long arg)
sizeof(block_count)))
return -EFAULT;

- ret = f2fs_resize_fs(sbi, block_count);
-
- return ret;
+ return f2fs_resize_fs(sbi, block_count);
}

static int f2fs_ioc_enable_verity(struct file *filp, unsigned long arg)
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 28a8c79c8bdc3..8dee6cd8e4d24 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1405,12 +1405,29 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
}

-static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
- unsigned int end)
+static int free_segment_range(struct f2fs_sb_info *sbi,
+ unsigned int secs, bool gc_only)
{
- int type;
- unsigned int segno, next_inuse;
+ unsigned int segno, next_inuse, start, end;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
+ int gc_mode, gc_type;
int err = 0;
+ int type;
+
+ /* Force block allocation for GC */
+ MAIN_SECS(sbi) -= secs;
+ start = MAIN_SECS(sbi) * sbi->segs_per_sec;
+ end = MAIN_SEGS(sbi) - 1;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
+ if (SIT_I(sbi)->last_victim[gc_mode] >= start)
+ SIT_I(sbi)->last_victim[gc_mode] = 0;
+
+ for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
+ if (sbi->next_victim_seg[gc_type] >= start)
+ sbi->next_victim_seg[gc_type] = NULL_SEGNO;
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
@@ -1423,18 +1440,24 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};

- down_write(&sbi->gc_lock);
do_garbage_collect(sbi, segno, &gc_list, FG_GC);
- up_write(&sbi->gc_lock);
put_gc_inode(&gc_list);

- if (get_valid_blocks(sbi, segno, true))
- return -EAGAIN;
+ if (!gc_only && get_valid_blocks(sbi, segno, true)) {
+ err = -EAGAIN;
+ goto out;
+ }
+ if (fatal_signal_pending(current)) {
+ err = -ERESTARTSYS;
+ goto out;
+ }
}
+ if (gc_only)
+ goto out;

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err)
- return err;
+ goto out;

next_inuse = find_next_inuse(FREE_I(sbi), end + 1, start);
if (next_inuse <= end) {
@@ -1442,6 +1465,8 @@ static int free_segment_range(struct f2fs_sb_info *sbi, unsigned int start,
next_inuse);
f2fs_bug_on(sbi, 1);
}
+out:
+ MAIN_SECS(sbi) += secs;
return err;
}

@@ -1487,6 +1512,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)

SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
+ MAIN_SECS(sbi) += secs;
FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
FREE_I(sbi)->free_segments = (int)FREE_I(sbi)->free_segments + segs;
F2FS_CKPT(sbi)->user_block_count = cpu_to_le64(user_block_count + blks);
@@ -1508,8 +1534,8 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
{
__u64 old_block_count, shrunk_blocks;
+ struct cp_control cpc = { CP_RESIZE, 0, 0, 0 };
unsigned int secs;
- int gc_mode, gc_type;
int err = 0;
__u32 rem;

@@ -1544,10 +1570,27 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
return -EINVAL;
}

- freeze_bdev(sbi->sb->s_bdev);
-
shrunk_blocks = old_block_count - block_count;
secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi));
+
+ /* stop other GC */
+ if (!down_write_trylock(&sbi->gc_lock))
+ return -EAGAIN;
+
+ /* stop CP to protect MAIN_SEC in free_segment_range */
+ f2fs_lock_op(sbi);
+ err = free_segment_range(sbi, secs, true);
+ f2fs_unlock_op(sbi);
+ up_write(&sbi->gc_lock);
+ if (err)
+ return err;
+
+ set_sbi_flag(sbi, SBI_IS_RESIZEFS);
+
+ freeze_super(sbi->sb);
+ down_write(&sbi->gc_lock);
+ mutex_lock(&sbi->cp_mutex);
+
spin_lock(&sbi->stat_lock);
if (shrunk_blocks + valid_user_blocks(sbi) +
sbi->current_reserved_blocks + sbi->unusable_block_count +
@@ -1556,69 +1599,44 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
else
sbi->user_block_count -= shrunk_blocks;
spin_unlock(&sbi->stat_lock);
- if (err) {
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
- return err;
- }
-
- mutex_lock(&sbi->resize_mutex);
- set_sbi_flag(sbi, SBI_IS_RESIZEFS);
-
- mutex_lock(&DIRTY_I(sbi)->seglist_lock);
-
- MAIN_SECS(sbi) -= secs;
-
- for (gc_mode = 0; gc_mode < MAX_GC_POLICY; gc_mode++)
- if (SIT_I(sbi)->last_victim[gc_mode] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- SIT_I(sbi)->last_victim[gc_mode] = 0;
-
- for (gc_type = BG_GC; gc_type <= FG_GC; gc_type++)
- if (sbi->next_victim_seg[gc_type] >=
- MAIN_SECS(sbi) * sbi->segs_per_sec)
- sbi->next_victim_seg[gc_type] = NULL_SEGNO;
-
- mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ if (err)
+ goto out_err;

- err = free_segment_range(sbi, MAIN_SECS(sbi) * sbi->segs_per_sec,
- MAIN_SEGS(sbi) - 1);
+ err = free_segment_range(sbi, secs, false);
if (err)
- goto out;
+ goto recover_out;

update_sb_metadata(sbi, -secs);

err = f2fs_commit_super(sbi, false);
if (err) {
update_sb_metadata(sbi, secs);
- goto out;
+ goto recover_out;
}

- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, -secs);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
set_sbi_flag(sbi, SBI_IS_DIRTY);
- mutex_unlock(&sbi->cp_mutex);

- err = f2fs_sync_fs(sbi->sb, 1);
+ err = f2fs_write_checkpoint(sbi, &cpc);
if (err) {
- mutex_lock(&sbi->cp_mutex);
update_fs_metadata(sbi, secs);
- mutex_unlock(&sbi->cp_mutex);
update_sb_metadata(sbi, secs);
f2fs_commit_super(sbi, false);
}
-out:
+recover_out:
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_err(sbi, "resize_fs failed, should run fsck to repair!");

- MAIN_SECS(sbi) += secs;
spin_lock(&sbi->stat_lock);
sbi->user_block_count += shrunk_blocks;
spin_unlock(&sbi->stat_lock);
}
+out_err:
+ mutex_unlock(&sbi->cp_mutex);
+ up_write(&sbi->gc_lock);
+ thaw_super(sbi->sb);
clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
- mutex_unlock(&sbi->resize_mutex);
- thaw_bdev(sbi->sb->s_bdev, sbi->sb);
return err;
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index e3a323ff04c34..ad3b66c3dbe0e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3420,7 +3420,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
init_rwsem(&sbi->gc_lock);
mutex_init(&sbi->writepages);
mutex_init(&sbi->cp_mutex);
- mutex_init(&sbi->resize_mutex);
init_rwsem(&sbi->node_write);
init_rwsem(&sbi->node_change);

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 757d3d6031e63..4dbcdc6d27383 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -50,6 +50,7 @@ TRACE_DEFINE_ENUM(CP_RECOVERY);
TRACE_DEFINE_ENUM(CP_DISCARD);
TRACE_DEFINE_ENUM(CP_TRIMMED);
TRACE_DEFINE_ENUM(CP_PAUSE);
+TRACE_DEFINE_ENUM(CP_RESIZE);

#define show_block_type(type) \
__print_symbolic(type, \
@@ -126,7 +127,8 @@ TRACE_DEFINE_ENUM(CP_PAUSE);
{ CP_RECOVERY, "Recovery" }, \
{ CP_DISCARD, "Discard" }, \
{ CP_PAUSE, "Pause" }, \
- { CP_TRIMMED, "Trimmed" })
+ { CP_TRIMMED, "Trimmed" }, \
+ { CP_RESIZE, "Resize" })

#define show_fsync_cpreason(type) \
__print_symbolic(type, \
--
2.26.1.301.g55bc3eb7cb9-goog