2017-03-25 07:59:42

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH 1/5] f2fs: relax node version check for victim data in gc

- has_not_enough_free_secs
node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104

- f2fs_gc
- get_victim_by_default
alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1

- do_garbage_collect
start_segno 3976, end_segno 3977 type 0

- is_alive
nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0

- gc_data_segment 766, segno 3976, block 512/426 not alive

So, this patch fixes subtle corrupted case where node version does not match
to summary version which results in infinite loop by gc.

Reported-by: Yunlei He <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/gc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 939be88a8833..bbeee41aaf73 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -551,8 +551,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
get_node_info(sbi, nid, dni);

if (sum->version != dni->version) {
- f2fs_put_page(node_page, 1);
- return false;
+ f2fs_msg(sbi->sb, KERN_WARNING,
+ "%s: valid data with mismatched node version.",
+ __func__);
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
}

*nofs = ofs_of_node(node_page);
--
2.11.0


2017-03-25 07:59:51

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH 2/5] f2fs: write small sized IO to hot log

It would better split small and large IOs separately in order to get more
consecutive big writes.

The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/data.c | 5 +++++
fs/f2fs/f2fs.h | 2 ++
fs/f2fs/segment.c | 13 ++++++-------
fs/f2fs/segment.h | 1 +
fs/f2fs/super.c | 2 ++
5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 090413236b27..7275697bc940 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1511,6 +1511,11 @@ static int f2fs_write_cache_pages(struct address_space *mapping,

pagevec_init(&pvec, 0);

+ if (get_dirty_pages(mapping->host) <= SM_I(F2FS_M_SB(mapping))->min_hot_blocks)
+ set_inode_flag(mapping->host, FI_HOT_DATA);
+ else
+ clear_inode_flag(mapping->host, FI_HOT_DATA);
+
if (wbc->range_cyclic) {
writeback_index = mapping->writeback_index; /* prev offset */
index = writeback_index;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6fbdcac01d9a..19e28127a725 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -674,6 +674,7 @@ struct f2fs_sm_info {
unsigned int ipu_policy; /* in-place-update policy */
unsigned int min_ipu_util; /* in-place-update threshold */
unsigned int min_fsync_blocks; /* threshold for fsync */
+ unsigned int min_hot_blocks; /* threshold for hot block allocation */

/* for flush command control */
struct flush_cmd_control *fcc_info;
@@ -1713,6 +1714,7 @@ enum {
FI_DO_DEFRAG, /* indicate defragment is running */
FI_DIRTY_FILE, /* indicate regular/symlink has dirty pages */
FI_NO_PREALLOC, /* indicate skipped preallocated blocks */
+ FI_HOT_DATA, /* indicate file is hot */
};

static inline void __mark_inode_dirty_flag(struct inode *inode,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c5a5258f71c5..0cba28f95bb8 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1797,18 +1797,16 @@ static int __get_segment_type_6(struct page *page, enum page_type p_type)
if (p_type == DATA) {
struct inode *inode = page->mapping->host;

- if (S_ISDIR(inode->i_mode))
- return CURSEG_HOT_DATA;
- else if (is_cold_data(page) || file_is_cold(inode))
+ if (is_cold_data(page) || file_is_cold(inode))
return CURSEG_COLD_DATA;
- else
- return CURSEG_WARM_DATA;
+ if (is_inode_flag_set(inode, FI_HOT_DATA))
+ return CURSEG_HOT_DATA;
+ return CURSEG_WARM_DATA;
} else {
if (IS_DNODE(page))
return is_cold_node(page) ? CURSEG_WARM_NODE :
CURSEG_HOT_NODE;
- else
- return CURSEG_COLD_NODE;
+ return CURSEG_COLD_NODE;
}
}

@@ -2915,6 +2913,7 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
+ sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;

sm_info->trim_sections = DEF_BATCHED_TRIM_SECTIONS;

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 5e8ad4280a50..4d0dd9f7f4ed 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -540,6 +540,7 @@ static inline int utilization(struct f2fs_sb_info *sbi)
*/
#define DEF_MIN_IPU_UTIL 70
#define DEF_MIN_FSYNC_BLOCKS 8
+#define DEF_MIN_HOT_BLOCKS 16

enum {
F2FS_IPU_FORCE,
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 49434f951ace..c4fa1ace8e55 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -294,6 +294,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, trim_sections);
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
+F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_hot_blocks, min_hot_blocks);
F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ra_nid_pages, ra_nid_pages);
F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, dirty_nats_ratio, dirty_nats_ratio);
@@ -319,6 +320,7 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(ipu_policy),
ATTR_LIST(min_ipu_util),
ATTR_LIST(min_fsync_blocks),
+ ATTR_LIST(min_hot_blocks),
ATTR_LIST(max_victim_search),
ATTR_LIST(dir_level),
ATTR_LIST(ram_thresh),
--
2.11.0

2017-03-25 08:00:01

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH 5/5] f2fs: fix wrong max cost initialization

This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.

Cc: <[email protected]> # v4.10+
Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/gc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 9d4158d55dbb..c52656ccbde5 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -186,7 +186,7 @@ static unsigned int get_max_cost(struct f2fs_sb_info *sbi,
if (p->alloc_mode == SSR)
return sbi->blocks_per_seg;
if (p->gc_mode == GC_GREEDY)
- return sbi->blocks_per_seg * p->ofs_unit;
+ return 2 * sbi->blocks_per_seg * p->ofs_unit;
else if (p->gc_mode == GC_CB)
return UINT_MAX;
else /* No other gc_mode */
--
2.11.0

2017-03-25 08:00:00

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH 4/5] f2fs: start SSR much eariler to avoid FG_GC

This patch initiates SSR much eariler, resulting in less FG_GC.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/segment.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 4d0dd9f7f4ed..57e36c1ce7bd 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -495,7 +495,7 @@ static inline bool need_SSR(struct f2fs_sb_info *sbi)
return false;

return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
- reserved_sections(sbi) + 1);
+ 2 * reserved_sections(sbi));
}

static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
--
2.11.0

2017-03-25 08:00:46

by Jaegeuk Kim

[permalink] [raw]
Subject: [PATCH 3/5] f2fs: allocate node and hot data in the beginning of partition

In order to give more spatial locality, this patch changes the block allocation
policy to assign beginning of partition for small and hot blocks.

So, in main area, it tries to split cold data from all the other data types.

Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/gc.c | 6 +++++-
fs/f2fs/segment.c | 9 +++++++++
fs/f2fs/super.c | 1 +
3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index bbeee41aaf73..9d4158d55dbb 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -172,7 +172,11 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
if (gc_type != FG_GC && p->max_search > sbi->max_victim_search)
p->max_search = sbi->max_victim_search;

- p->offset = sbi->last_victim[p->gc_mode];
+ /* let's select beginning hot/small space first */
+ if (type == CURSEG_HOT_DATA || IS_NODESEG(type))
+ p->offset = 0;
+ else
+ p->offset = sbi->last_victim[p->gc_mode];
}

static unsigned int get_max_cost(struct f2fs_sb_info *sbi,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 0cba28f95bb8..302d2accfe17 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1519,6 +1519,14 @@ static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)
__set_sit_entry_type(sbi, type, curseg->segno, modified);
}

+static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
+{
+ if (type == CURSEG_HOT_DATA || IS_NODESEG(type))
+ return 0;
+
+ return CURSEG_I(sbi, type)->segno;
+}
+
/*
* Allocate a current working segment.
* This function always allocates a free segment in LFS manner.
@@ -1537,6 +1545,7 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
if (test_opt(sbi, NOHEAP))
dir = ALLOC_RIGHT;

+ segno = __get_next_segno(sbi, type);
get_new_segment(sbi, &segno, new_sec, dir);
curseg->next_segno = segno;
reset_curseg(sbi, type, 1);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c4fa1ace8e55..32f08aeddcc5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1049,6 +1049,7 @@ static void default_options(struct f2fs_sb_info *sbi)
set_opt(sbi, INLINE_DATA);
set_opt(sbi, INLINE_DENTRY);
set_opt(sbi, EXTENT_CACHE);
+ set_opt(sbi, NOHEAP);
sbi->sb->s_flags |= MS_LAZYTIME;
set_opt(sbi, FLUSH_MERGE);
if (f2fs_sb_mounted_blkzoned(sbi->sb)) {
--
2.11.0

2017-03-25 09:06:00

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH 1/5] f2fs: relax node version check for victim data in gc

Hi Jaegeuk,

On 2017/3/25 15:59, Jaegeuk Kim wrote:
> - has_not_enough_free_secs
> node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
>
> - f2fs_gc
> - get_victim_by_default
> alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
>
> - do_garbage_collect
> start_segno 3976, end_segno 3977 type 0
>
> - is_alive
> nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
>
> - gc_data_segment 766, segno 3976, block 512/426 not alive
>
> So, this patch fixes subtle corrupted case where node version does not match
> to summary version which results in infinite loop by gc.
>
> Reported-by: Yunlei He <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/gc.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 939be88a8833..bbeee41aaf73 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -551,8 +551,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
> get_node_info(sbi, nid, dni);
>
> if (sum->version != dni->version) {

If the node was been truncated, we will increase its version number, since it
was been truncated, so it will never be writebacked to storage, so the version
in summary will not be updated.

So this case can happen, shouldn't we just set SBI_NEED_FSCK for the case:
sum->version != dni->version - 1

Thanks,

> - f2fs_put_page(node_page, 1);
> - return false;
> + f2fs_msg(sbi->sb, KERN_WARNING,
> + "%s: valid data with mismatched node version.",
> + __func__);
> + set_sbi_flag(sbi, SBI_NEED_FSCK);
> }
>
> *nofs = ofs_of_node(node_page);
>

2017-03-25 21:28:24

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH 1/5] f2fs: relax node version check for victim data in gc

On 03/25, Chao Yu wrote:
> Hi Jaegeuk,
>
> On 2017/3/25 15:59, Jaegeuk Kim wrote:
> > - has_not_enough_free_secs
> > node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
> >
> > - f2fs_gc
> > - get_victim_by_default
> > alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
> >
> > - do_garbage_collect
> > start_segno 3976, end_segno 3977 type 0
> >
> > - is_alive
> > nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
> >
> > - gc_data_segment 766, segno 3976, block 512/426 not alive
> >
> > So, this patch fixes subtle corrupted case where node version does not match
> > to summary version which results in infinite loop by gc.
> >
> > Reported-by: Yunlei He <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/gc.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 939be88a8833..bbeee41aaf73 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -551,8 +551,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
> > get_node_info(sbi, nid, dni);
> >
> > if (sum->version != dni->version) {
>
> If the node was been truncated, we will increase its version number, since it
> was been truncated, so it will never be writebacked to storage, so the version
> in summary will not be updated.

That's covered by node page lock, so we shouldn't be reached out to this point.
Let's think more about this.

Thanks,

> So this case can happen, shouldn't we just set SBI_NEED_FSCK for the case:
> sum->version != dni->version - 1
>
> Thanks,
>
> > - f2fs_put_page(node_page, 1);
> > - return false;
> > + f2fs_msg(sbi->sb, KERN_WARNING,
> > + "%s: valid data with mismatched node version.",
> > + __func__);
> > + set_sbi_flag(sbi, SBI_NEED_FSCK);
> > }
> >
> > *nofs = ofs_of_node(node_page);
> >

2017-03-27 08:18:50

by Chao Yu

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH 1/5] f2fs: relax node version check for victim data in gc

On 2017/3/26 5:27, Jaegeuk Kim wrote:
> On 03/25, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2017/3/25 15:59, Jaegeuk Kim wrote:
>>> - has_not_enough_free_secs
>>> node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
>>>
>>> - f2fs_gc
>>> - get_victim_by_default
>>> alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
>>>
>>> - do_garbage_collect
>>> start_segno 3976, end_segno 3977 type 0
>>>
>>> - is_alive
>>> nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
>>>
>>> - gc_data_segment 766, segno 3976, block 512/426 not alive
>>>
>>> So, this patch fixes subtle corrupted case where node version does not match
>>> to summary version which results in infinite loop by gc.
>>>
>>> Reported-by: Yunlei He <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/gc.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 939be88a8833..bbeee41aaf73 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -551,8 +551,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
>>> get_node_info(sbi, nid, dni);
>>>
>>> if (sum->version != dni->version) {
>>
>> If the node was been truncated, we will increase its version number, since it
>> was been truncated, so it will never be writebacked to storage, so the version
>> in summary will not be updated.
>
> That's covered by node page lock, so we shouldn't be reached out to this point.
> Let's think more about this.

Yes, agreed. ;)

Thanks,

>
> Thanks,
>
>> So this case can happen, shouldn't we just set SBI_NEED_FSCK for the case:
>> sum->version != dni->version - 1
>>
>> Thanks,
>>
>>> - f2fs_put_page(node_page, 1);
>>> - return false;
>>> + f2fs_msg(sbi->sb, KERN_WARNING,
>>> + "%s: valid data with mismatched node version.",
>>> + __func__);
>>> + set_sbi_flag(sbi, SBI_NEED_FSCK);
>>> }
>>>
>>> *nofs = ofs_of_node(node_page);
>>>
>
> .
>

2017-03-29 07:16:58

by heyunlei

[permalink] [raw]
Subject: Re: [f2fs-dev] [PATCH 1/5] f2fs: relax node version check for victim data in gc

Hi all,

On 2017/3/26 5:27, Jaegeuk Kim wrote:
> On 03/25, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2017/3/25 15:59, Jaegeuk Kim wrote:
>>> - has_not_enough_free_secs
>>> node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104
>>>
>>> - f2fs_gc
>>> - get_victim_by_default
>>> alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1
>>>
>>> - do_garbage_collect
>>> start_segno 3976, end_segno 3977 type 0
>>>
>>> - is_alive
>>> nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0
>>>
>>> - gc_data_segment 766, segno 3976, block 512/426 not alive
>>>
>>> So, this patch fixes subtle corrupted case where node version does not match
>>> to summary version which results in infinite loop by gc.
>>>
>>> Reported-by: Yunlei He <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/gc.c | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 939be88a8833..bbeee41aaf73 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -551,8 +551,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
>>> get_node_info(sbi, nid, dni);
>>>
>>> if (sum->version != dni->version) {
>>
>> If the node was been truncated, we will increase its version number, since it
>> was been truncated, so it will never be writebacked to storage, so the version
>> in summary will not be updated.
>

The same problem I came across with a node segment:

481 get_node_info(sbi, nid, &ni);
482 if (ni.blk_addr != start_addr + off) {
483 f2fs_put_page(node_page, 1);
484 continue;
485 }

Here, get victim method always selected segno 5169 for garbage collection,

but this section gc failed for upper condition:

gc_node_segment 494, blk_addr 1697572,start_addr 2668544,off 200

I think is same problem with is_alive function.

Thanks.


> That's covered by node page lock, so we shouldn't be reached out to this point.
> Let's think more about this.
>
> Thanks,
>
>> So this case can happen, shouldn't we just set SBI_NEED_FSCK for the case:
>> sum->version != dni->version - 1
>>
>> Thanks,
>>
>>> - f2fs_put_page(node_page, 1);
>>> - return false;
>>> + f2fs_msg(sbi->sb, KERN_WARNING,
>>> + "%s: valid data with mismatched node version.",
>>> + __func__);
>>> + set_sbi_flag(sbi, SBI_NEED_FSCK);
>>> }
>>>
>>> *nofs = ofs_of_node(node_page);
>>>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Linux-f2fs-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>
> .
>