2020-08-04 13:20:05

by Chao Yu

[permalink] [raw]
Subject: [PATCH v2 0/5] Support Age-Threshold based Garbage Collection for f2fs

This patch series introduce a new garbage collection algorithm named ATGC
(Age Threshold based Garbage Collection) in order to enhance efficiency and
effect of background garbage collection.

ATGC algorithm tries to fliter few oldest candidates according to defined
age threshold, it selects source section in those candidates and then
select target segment which has almost the same age of source section,
finally, it migrates valid blocks from source section into target segment
with SSR allocator, enhancement shows in below aspects:
- it avoids selecting young victim section which may have high update
frequency;
- SSR write avoids unneeded movement of blocks locate in target segment;
- source section and target segment has almost the same age (update
frequency), it keeps well hot/cold separation effect;

Patch 1 introduces in-memory curseg, since it only exists in memory, so it
avoids on-disk layout change whenever we want to add a new type of log.
Firstly, adapting aligned pinfile allocation to use in-memory curseg,
later, we will use in-memory curseg in ATGC feature for migration as well.

Patch 2 changes segment's mtime definition from recording last update time
to recording average update time, so that it can indicate more precise
update time of each valid blocks in segment.

Patch 3 changes segment mtime update policy during GC, 1) don't update
original segment's mtime; 2) inherit segment's mtime from original segment
to target segment.

Patch 4 adds to support 64-bits key in rb-tree node entry, and introduce
f2fs_lookup_rb_tree_ext() to support lookup functionality with 64-bits key.

Patch 5 adds to support ATGC skeleton to enhance effect and efficiency
of BGGC.

Test and result:
- create 160 dirty segments:
* half of them have 128 valid blocks per segment
* left of them have 384 valid blocks per segment
- run background GC

Benefit: GC count and block movement count both decrease obviously:

- Before:
GC calls: 162 (BG: 220)

Try to move 41454 blocks (BG: 41454)
- data blocks : 40960 (40960)

SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:
GC calls: 75 (BG: 76)

Try to move 12813 blocks (BG: 12813)
- data blocks : 12544 (12544)

SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

v1 -> v2:
- patch 1/5: fix wrong log type
- patch 3/5: add missing blank line in between functions
- patch 5/5: add mount option to enable atgc

Chao Yu (5):
f2fs: introduce inmem curseg
f2fs: record average update time of segment
f2fs: inherit mtime of original block during GC
f2fs: support 64-bits key in f2fs rb-tree node entry
f2fs: support age threshold based garbage collection

Documentation/filesystems/f2fs.rst | 2 +
fs/f2fs/checkpoint.c | 7 +-
fs/f2fs/data.c | 2 +-
fs/f2fs/debug.c | 10 +-
fs/f2fs/extent_cache.c | 37 ++-
fs/f2fs/f2fs.h | 51 +++-
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 372 ++++++++++++++++++++++++++++-
fs/f2fs/gc.h | 25 ++
fs/f2fs/segment.c | 332 +++++++++++++++++++------
fs/f2fs/segment.h | 40 +++-
fs/f2fs/super.c | 35 ++-
include/trace/events/f2fs.h | 8 +-
13 files changed, 818 insertions(+), 108 deletions(-)

--
2.26.2


2020-08-04 13:20:30

by Chao Yu

[permalink] [raw]
Subject: [PATCH v2 1/5] f2fs: introduce inmem curseg

Previous implementation of aligned pinfile allocation will:
- allocate new segment on cold data log no matter whether last used
segment is partially used or not, it makes IOs more random;
- force concurrent cold data/GCed IO going into warm data area, it
can make a bad effect on hot/cold data separation;

In this patch, we introduce a new type of log named 'inmem curseg',
the differents from normal curseg is:
- it reuses existed segment type (CURSEG_XXX_NODE/DATA);
- it only exists in memory, its segno, blkofs, summary will not b
persisted into checkpoint area;

With this new feature, we can enhance scalability of log, special
allocators can be created for purposes:
- pure lfs allocator for aligned pinfile allocation or file
defragmentation
- pure ssr allocator for later feature

So that, let's update aligned pinfile allocation to use this new
inmem curseg fwk.

Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/checkpoint.c | 7 ++-
fs/f2fs/debug.c | 6 ++-
fs/f2fs/f2fs.h | 12 +++--
fs/f2fs/file.c | 5 +-
fs/f2fs/gc.c | 2 +-
fs/f2fs/segment.c | 107 ++++++++++++++++++++++++++++++-------------
fs/f2fs/segment.h | 17 ++++---
fs/f2fs/super.c | 9 ++--
8 files changed, 113 insertions(+), 52 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index ff807e14c891..9e30ff6414b8 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1619,11 +1619,16 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)

f2fs_flush_sit_entries(sbi, cpc);

+ /* save inmem log status */
+ f2fs_save_inmem_curseg(sbi, CURSEG_COLD_DATA_PINNED);
+
err = do_checkpoint(sbi, cpc);
if (err)
f2fs_release_discard_addrs(sbi);
else
f2fs_clear_prefree_segments(sbi, cpc);
+
+ f2fs_restore_inmem_curseg(sbi, CURSEG_COLD_DATA_PINNED);
stop:
unblock_operations(sbi);
stat_inc_cp_count(sbi->stat_info);
@@ -1654,7 +1659,7 @@ void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi)
}

sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
- NR_CURSEG_TYPE - __cp_payload(sbi)) *
+ NR_CURSEG_PERSIST_TYPE - __cp_payload(sbi)) *
F2FS_ORPHANS_PER_BLOCK;
}

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 4276c0f79beb..41a91aa8c262 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -164,7 +164,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
si->util_invalid = 50 - si->util_free - si->util_valid;
- for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) {
+ for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
struct curseg_info *curseg = CURSEG_I(sbi, i);
si->curseg[i] = curseg->segno;
si->cursec[i] = GET_SEC_FROM_SEG(sbi, curseg->segno);
@@ -393,6 +393,10 @@ static int stat_show(struct seq_file *s, void *v)
si->dirty_seg[CURSEG_COLD_NODE],
si->full_seg[CURSEG_COLD_NODE],
si->valid_blks[CURSEG_COLD_NODE]);
+ seq_printf(s, " - Pinned file: %8d %8d %8d\n",
+ si->curseg[CURSEG_COLD_DATA_PINNED],
+ si->cursec[CURSEG_COLD_DATA_PINNED],
+ si->curzone[CURSEG_COLD_DATA_PINNED]);
seq_printf(s, "\n - Valid: %d\n - Dirty: %d\n",
si->main_area_segs - si->dirty_count -
si->prefree_count - si->free_segs,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 16322ea5b463..784de42a9f9c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -973,7 +973,9 @@ static inline void set_new_dnode(struct dnode_of_data *dn, struct inode *inode,
*/
#define NR_CURSEG_DATA_TYPE (3)
#define NR_CURSEG_NODE_TYPE (3)
-#define NR_CURSEG_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE)
+#define NR_CURSEG_INMEM_TYPE (1)
+#define NR_CURSEG_PERSIST_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE)
+#define NR_CURSEG_TYPE (NR_CURSEG_INMEM_TYPE + NR_CURSEG_PERSIST_TYPE)

enum {
CURSEG_HOT_DATA = 0, /* directory entry blocks */
@@ -982,8 +984,10 @@ enum {
CURSEG_HOT_NODE, /* direct node blocks of directory files */
CURSEG_WARM_NODE, /* direct node blocks of normal files */
CURSEG_COLD_NODE, /* indirect node blocks */
- NO_CHECK_TYPE,
- CURSEG_COLD_DATA_PINNED,/* cold data for pinned file */
+ NR_PERSISTENT_LOG, /* number of persistent log */
+ CURSEG_COLD_DATA_PINNED = NR_PERSISTENT_LOG,
+ /* pinned file that needs consecutive block address */
+ NO_CHECK_TYPE, /* number of persistent & inmem log */
};

struct flush_cmd {
@@ -3332,6 +3336,8 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi);
int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable);
void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
+void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi, int type);
+void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi, int type);
void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type,
unsigned int start, unsigned int end);
void f2fs_allocate_new_segment(struct f2fs_sb_info *sbi, int type);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 8a422400e824..cb7427bf2783 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1656,13 +1656,14 @@ static int expand_inode_data(struct inode *inode, loff_t offset,
}

down_write(&sbi->pin_sem);
- map.m_seg_type = CURSEG_COLD_DATA_PINNED;

f2fs_lock_op(sbi);
- f2fs_allocate_new_segment(sbi, CURSEG_COLD_DATA);
+ f2fs_allocate_new_segment(sbi, CURSEG_COLD_DATA_PINNED);
f2fs_unlock_op(sbi);

+ map.m_seg_type = CURSEG_COLD_DATA_PINNED;
err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO);
+
up_write(&sbi->pin_sem);

done += map.m_len;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 11b4adde9baf..755641047978 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1450,7 +1450,7 @@ static int free_segment_range(struct f2fs_sb_info *sbi,
mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

/* Move out cursegs from the target range */
- for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
+ for (type = CURSEG_HOT_DATA; type < NR_CURSEG_PERSIST_TYPE; type++)
f2fs_allocate_segment_for_resize(sbi, type, start, end);

/* do GC to move out valid blocks in the range */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a65d357f89a9..c9755a0cd304 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1958,7 +1958,7 @@ static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)

mutex_lock(&dirty_i->seglist_lock);
for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
- __set_test_and_free(sbi, segno);
+ __set_test_and_free(sbi, segno, false);
mutex_unlock(&dirty_i->seglist_lock);
}

@@ -2496,6 +2496,7 @@ static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)
struct curseg_info *curseg = CURSEG_I(sbi, type);
struct summary_footer *sum_footer;

+ curseg->inited = true;
curseg->segno = curseg->next_segno;
curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
curseg->next_blkoff = 0;
@@ -2503,24 +2504,31 @@ static void reset_curseg(struct f2fs_sb_info *sbi, int type, int modified)

sum_footer = &(curseg->sum_blk->footer);
memset(sum_footer, 0, sizeof(struct summary_footer));
- if (IS_DATASEG(type))
+ if (IS_DATASEG(curseg->seg_type))
SET_SUM_TYPE(sum_footer, SUM_TYPE_DATA);
- if (IS_NODESEG(type))
+ if (IS_NODESEG(curseg->seg_type))
SET_SUM_TYPE(sum_footer, SUM_TYPE_NODE);
- __set_sit_entry_type(sbi, type, curseg->segno, modified);
+ __set_sit_entry_type(sbi, curseg->seg_type, curseg->segno, modified);
}

static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+
/* if segs_per_sec is large than 1, we need to keep original policy. */
if (__is_large_section(sbi))
- return CURSEG_I(sbi, type)->segno;
+ return curseg->segno;
+
+ /* inmem log may not locate on any segment after mount */
+ if (!curseg->inited)
+ return 0;

if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
return 0;

if (test_opt(sbi, NOHEAP) &&
- (type == CURSEG_HOT_DATA || IS_NODESEG(type)))
+ (curseg->seg_type == CURSEG_HOT_DATA ||
+ IS_NODESEG(curseg->seg_type)))
return 0;

if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
@@ -2530,7 +2538,7 @@ static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_REUSE)
return 0;

- return CURSEG_I(sbi, type)->segno;
+ return curseg->segno;
}

/*
@@ -2540,12 +2548,14 @@ static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
+ unsigned short seg_type = curseg->seg_type;
unsigned int segno = curseg->segno;
int dir = ALLOC_LEFT;

- write_sum_page(sbi, curseg->sum_blk,
+ if (curseg->inited)
+ write_sum_page(sbi, curseg->sum_blk,
GET_SUM_BLOCK(sbi, segno));
- if (type == CURSEG_WARM_DATA || type == CURSEG_COLD_DATA)
+ if (seg_type == CURSEG_WARM_DATA || seg_type == CURSEG_COLD_DATA)
dir = ALLOC_RIGHT;

if (test_opt(sbi, NOHEAP))
@@ -2622,6 +2632,43 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type)
f2fs_put_page(sum_page, 1);
}

+void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi, int type)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+
+ mutex_lock(&curseg->curseg_mutex);
+ if (!curseg->inited)
+ goto out;
+
+ if (get_valid_blocks(sbi, curseg->segno, false)) {
+ write_sum_page(sbi, curseg->sum_blk,
+ GET_SUM_BLOCK(sbi, curseg->segno));
+ } else {
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ __set_test_and_free(sbi, curseg->segno, true);
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+ }
+out:
+ mutex_unlock(&curseg->curseg_mutex);
+}
+
+void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi, int type)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, type);
+
+ mutex_lock(&curseg->curseg_mutex);
+ if (!curseg->inited)
+ goto out;
+ if (get_valid_blocks(sbi, curseg->segno, false))
+ goto out;
+
+ mutex_lock(&DIRTY_I(sbi)->seglist_lock);
+ __set_test_and_inuse(sbi, curseg->segno);
+ mutex_unlock(&DIRTY_I(sbi)->seglist_lock);
+out:
+ mutex_unlock(&curseg->curseg_mutex);
+}
+
static int get_ssr_segment(struct f2fs_sb_info *sbi, int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
@@ -2738,11 +2785,15 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type)
struct curseg_info *curseg = CURSEG_I(sbi, type);
unsigned int old_segno;

+ if (!curseg->inited)
+ goto alloc;
+
if (!curseg->next_blkoff &&
!get_valid_blocks(sbi, curseg->segno, false) &&
!get_ckpt_valid_blocks(sbi, curseg->segno))
return;

+alloc:
old_segno = curseg->segno;
SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true);
locate_dirty_segment(sbi, old_segno);
@@ -3126,19 +3177,6 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
{
struct sit_info *sit_i = SIT_I(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, type);
- bool put_pin_sem = false;
-
- if (type == CURSEG_COLD_DATA) {
- /* GC during CURSEG_COLD_DATA_PINNED allocation */
- if (down_read_trylock(&sbi->pin_sem)) {
- put_pin_sem = true;
- } else {
- type = CURSEG_WARM_DATA;
- curseg = CURSEG_I(sbi, type);
- }
- } else if (type == CURSEG_COLD_DATA_PINNED) {
- type = CURSEG_COLD_DATA;
- }

down_read(&SM_I(sbi)->curseg_lock);

@@ -3204,9 +3242,6 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
mutex_unlock(&curseg->curseg_mutex);

up_read(&SM_I(sbi)->curseg_lock);
-
- if (put_pin_sem)
- up_read(&sbi->pin_sem);
}

static void update_device_state(struct f2fs_io_info *fio)
@@ -3574,7 +3609,7 @@ static int read_normal_summaries(struct f2fs_sb_info *sbi, int type)
blk_off = le16_to_cpu(ckpt->cur_data_blkoff[type -
CURSEG_HOT_DATA]);
if (__exist_node_summaries(sbi))
- blk_addr = sum_blk_addr(sbi, NR_CURSEG_TYPE, type);
+ blk_addr = sum_blk_addr(sbi, NR_CURSEG_PERSIST_TYPE, type);
else
blk_addr = sum_blk_addr(sbi, NR_CURSEG_DATA_TYPE, type);
} else {
@@ -3652,8 +3687,9 @@ static int restore_curseg_summaries(struct f2fs_sb_info *sbi)
}

if (__exist_node_summaries(sbi))
- f2fs_ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type),
- NR_CURSEG_TYPE - type, META_CP, true);
+ f2fs_ra_meta_pages(sbi,
+ sum_blk_addr(sbi, NR_CURSEG_PERSIST_TYPE, type),
+ NR_CURSEG_PERSIST_TYPE - type, META_CP, true);

for (; type <= CURSEG_COLD_NODE; type++) {
err = read_normal_summaries(sbi, type);
@@ -4155,14 +4191,14 @@ static int build_curseg(struct f2fs_sb_info *sbi)
struct curseg_info *array;
int i;

- array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE, sizeof(*array)),
- GFP_KERNEL);
+ array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE,
+ sizeof(*array)), GFP_KERNEL);
if (!array)
return -ENOMEM;

SM_I(sbi)->curseg_array = array;

- for (i = 0; i < NR_CURSEG_TYPE; i++) {
+ for (i = 0; i < NO_CHECK_TYPE; i++) {
mutex_init(&array[i].curseg_mutex);
array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL);
if (!array[i].sum_blk)
@@ -4172,8 +4208,13 @@ static int build_curseg(struct f2fs_sb_info *sbi)
sizeof(struct f2fs_journal), GFP_KERNEL);
if (!array[i].journal)
return -ENOMEM;
+ if (i < NR_PERSISTENT_LOG)
+ array[i].seg_type = CURSEG_HOT_DATA + i;
+ else if (i == CURSEG_COLD_DATA_PINNED)
+ array[i].seg_type = CURSEG_COLD_DATA;
array[i].segno = NULL_SEGNO;
array[i].next_blkoff = 0;
+ array[i].inited = false;
}
return restore_curseg_summaries(sbi);
}
@@ -4408,7 +4449,7 @@ static int sanity_check_curseg(struct f2fs_sb_info *sbi)
* In LFS/SSR curseg, .next_blkoff should point to an unused blkaddr;
* In LFS curseg, all blkaddr after .next_blkoff should be unused.
*/
- for (i = 0; i < NO_CHECK_TYPE; i++) {
+ for (i = 0; i < NR_PERSISTENT_LOG; i++) {
struct curseg_info *curseg = CURSEG_I(sbi, i);
struct seg_entry *se = get_seg_entry(sbi, curseg->segno);
unsigned int blkofs = curseg->next_blkoff;
@@ -4637,7 +4678,7 @@ int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi)
{
int i, ret;

- for (i = 0; i < NO_CHECK_TYPE; i++) {
+ for (i = 0; i < NR_PERSISTENT_LOG; i++) {
ret = fix_curseg_write_pointer(sbi, i);
if (ret)
return ret;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 752b177073b2..515d9ed2cd95 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -22,7 +22,7 @@
#define GET_R2L_SEGNO(free_i, segno) ((segno) + (free_i)->start_segno)

#define IS_DATASEG(t) ((t) <= CURSEG_COLD_DATA)
-#define IS_NODESEG(t) ((t) >= CURSEG_HOT_NODE)
+#define IS_NODESEG(t) ((t) >= CURSEG_HOT_NODE && (t) <= CURSEG_COLD_NODE)

#define IS_HOT(t) ((t) == CURSEG_HOT_NODE || (t) == CURSEG_HOT_DATA)
#define IS_WARM(t) ((t) == CURSEG_WARM_NODE || (t) == CURSEG_WARM_DATA)
@@ -34,7 +34,8 @@
((seg) == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno) || \
((seg) == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno) || \
((seg) == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno) || \
- ((seg) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno))
+ ((seg) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno) || \
+ ((seg) == CURSEG_I(sbi, CURSEG_COLD_DATA_PINNED)->segno))

#define IS_CURSEC(sbi, secno) \
(((secno) == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno / \
@@ -48,7 +49,9 @@
((secno) == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno / \
(sbi)->segs_per_sec) || \
((secno) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \
- (sbi)->segs_per_sec)) \
+ (sbi)->segs_per_sec) || \
+ ((secno) == CURSEG_I(sbi, CURSEG_COLD_DATA_PINNED)->segno / \
+ (sbi)->segs_per_sec))

#define MAIN_BLKADDR(sbi) \
(SM_I(sbi) ? SM_I(sbi)->main_blkaddr : \
@@ -288,10 +291,12 @@ struct curseg_info {
struct rw_semaphore journal_rwsem; /* protect journal area */
struct f2fs_journal *journal; /* cached journal info */
unsigned char alloc_type; /* current allocation type */
+ unsigned short seg_type; /* segment type like CURSEG_XXX_TYPE */
unsigned int segno; /* current segment number */
unsigned short next_blkoff; /* next block offset to write */
unsigned int zone; /* current zone number */
unsigned int next_segno; /* preallocated segment */
+ bool inited; /* indicate inmem log is inited */
};

struct sit_entry_set {
@@ -305,8 +310,6 @@ struct sit_entry_set {
*/
static inline struct curseg_info *CURSEG_I(struct f2fs_sb_info *sbi, int type)
{
- if (type == CURSEG_COLD_DATA_PINNED)
- type = CURSEG_COLD_DATA;
return (struct curseg_info *)(SM_I(sbi)->curseg_array + type);
}

@@ -438,7 +441,7 @@ static inline void __set_inuse(struct f2fs_sb_info *sbi,
}

static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
- unsigned int segno)
+ unsigned int segno, bool inmem)
{
struct free_segmap_info *free_i = FREE_I(sbi);
unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
@@ -449,7 +452,7 @@ static inline void __set_test_and_free(struct f2fs_sb_info *sbi,
if (test_and_clear_bit(segno, free_i->free_segmap)) {
free_i->free_segments++;

- if (IS_CURSEC(sbi, secno))
+ if (!inmem && IS_CURSEC(sbi, secno))
goto skip_free;
next = find_next_bit(free_i->free_segmap,
start_segno + sbi->segs_per_sec, start_segno);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index cdca2087dba0..bf90c8caeb1e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -578,7 +578,8 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
case Opt_active_logs:
if (args->from && match_int(args, &arg))
return -EINVAL;
- if (arg != 2 && arg != 4 && arg != NR_CURSEG_TYPE)
+ if (arg != 2 && arg != 4 &&
+ arg != NR_CURSEG_PERSIST_TYPE)
return -EINVAL;
F2FS_OPTION(sbi).active_logs = arg;
break;
@@ -992,7 +993,7 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
}

/* Not pass down write hints if the number of active logs is lesser
- * than NR_CURSEG_TYPE.
+ * than NR_CURSEG_PERSIST_TYPE.
*/
if (F2FS_OPTION(sbi).active_logs != NR_CURSEG_TYPE)
F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
@@ -1628,7 +1629,7 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
static void default_options(struct f2fs_sb_info *sbi)
{
/* init some FS parameters */
- F2FS_OPTION(sbi).active_logs = NR_CURSEG_TYPE;
+ F2FS_OPTION(sbi).active_logs = NR_CURSEG_PERSIST_TYPE;
F2FS_OPTION(sbi).inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT;
@@ -2960,7 +2961,7 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi)
cp_payload = __cp_payload(sbi);
if (cp_pack_start_sum < cp_payload + 1 ||
cp_pack_start_sum > blocks_per_seg - 1 -
- NR_CURSEG_TYPE) {
+ NR_CURSEG_PERSIST_TYPE) {
f2fs_err(sbi, "Wrong cp_pack_start_sum: %u",
cp_pack_start_sum);
return 1;
--
2.26.2

2020-08-04 13:21:23

by Chao Yu

[permalink] [raw]
Subject: [PATCH v2 4/5] f2fs: support 64-bits key in f2fs rb-tree node entry

then, we can add specified entry into rb-tree with 64-bits segment time
as key.

Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/extent_cache.c | 37 +++++++++++++++++++++++++++++++++++--
fs/f2fs/f2fs.h | 15 ++++++++++++---
fs/f2fs/segment.c | 4 ++--
3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index 686c68b98610..3ebf976a682d 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -58,6 +58,29 @@ struct rb_entry *f2fs_lookup_rb_tree(struct rb_root_cached *root,
return re;
}

+struct rb_node **f2fs_lookup_rb_tree_ext(struct f2fs_sb_info *sbi,
+ struct rb_root_cached *root,
+ struct rb_node **parent,
+ unsigned long long key, bool *leftmost)
+{
+ struct rb_node **p = &root->rb_root.rb_node;
+ struct rb_entry *re;
+
+ while (*p) {
+ *parent = *p;
+ re = rb_entry(*parent, struct rb_entry, rb_node);
+
+ if (key < re->key) {
+ p = &(*p)->rb_left;
+ } else {
+ p = &(*p)->rb_right;
+ *leftmost = false;
+ }
+ }
+
+ return p;
+}
+
struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
struct rb_root_cached *root,
struct rb_node **parent,
@@ -166,7 +189,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
}

bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
- struct rb_root_cached *root)
+ struct rb_root_cached *root, bool check_key)
{
#ifdef CONFIG_F2FS_CHECK_FS
struct rb_node *cur = rb_first_cached(root), *next;
@@ -183,13 +206,23 @@ bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
cur_re = rb_entry(cur, struct rb_entry, rb_node);
next_re = rb_entry(next, struct rb_entry, rb_node);

+ if (check_key) {
+ if (cur_re->key > next_re->key) {
+ f2fs_info(sbi, "inconsistent rbtree, "
+ "cur(%llu) next(%llu)",
+ cur_re->key, next_re->key);
+ return false;
+ }
+ goto next;
+ }
+
if (cur_re->ofs + cur_re->len > next_re->ofs) {
f2fs_info(sbi, "inconsistent rbtree, cur(%u, %u) next(%u, %u)",
cur_re->ofs, cur_re->len,
next_re->ofs, next_re->len);
return false;
}
-
+next:
cur = next;
}
#endif
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 20da627249c0..44445b9ccc1b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -612,8 +612,13 @@ enum {

struct rb_entry {
struct rb_node rb_node; /* rb node located in rb-tree */
- unsigned int ofs; /* start offset of the entry */
- unsigned int len; /* length of the entry */
+ union {
+ struct {
+ unsigned int ofs; /* start offset of the entry */
+ unsigned int len; /* length of the entry */
+ };
+ unsigned long long key; /* 64-bits key */
+ };
};

struct extent_info {
@@ -3801,6 +3806,10 @@ void f2fs_leave_shrinker(struct f2fs_sb_info *sbi);
*/
struct rb_entry *f2fs_lookup_rb_tree(struct rb_root_cached *root,
struct rb_entry *cached_re, unsigned int ofs);
+struct rb_node **f2fs_lookup_rb_tree_ext(struct f2fs_sb_info *sbi,
+ struct rb_root_cached *root,
+ struct rb_node **parent,
+ unsigned long long key, bool *left_most);
struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
struct rb_root_cached *root,
struct rb_node **parent,
@@ -3811,7 +3820,7 @@ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root,
struct rb_node ***insert_p, struct rb_node **insert_parent,
bool force, bool *leftmost);
bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
- struct rb_root_cached *root);
+ struct rb_root_cached *root, bool check_key);
unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
void f2fs_init_extent_tree(struct inode *inode, struct page *ipage);
void f2fs_drop_extent_tree(struct inode *inode);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 503d7799ea88..f89219db72a2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1521,7 +1521,7 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
goto next;
if (unlikely(dcc->rbtree_check))
f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
- &dcc->root));
+ &dcc->root, false));
blk_start_plug(&plug);
list_for_each_entry_safe(dc, tmp, pend_list, list) {
f2fs_bug_on(sbi, dc->state != D_PREP);
@@ -2888,7 +2888,7 @@ static unsigned int __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
mutex_lock(&dcc->cmd_lock);
if (unlikely(dcc->rbtree_check))
f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
- &dcc->root));
+ &dcc->root, false));

dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root,
NULL, start,
--
2.26.2

2020-08-04 13:22:06

by Chao Yu

[permalink] [raw]
Subject: [PATCH v2 2/5] f2fs: record average update time of segment

Previously, once we update one block in segment, we will update mtime of
segment to last time, making aged segment becoming freshest, result in
that GC with cost benefit algorithm missing such segment, So this patch
changes to record mtime as average block updating time instead of last
updating time.

It's not needed to reset mtime for prefree segment, as se->valid_blocks
is zero, then old se->mtime won't take any weight with below calculation:

se->mtime = div_u64(se->mtime * se->valid_blocks + mtime,
se->valid_blocks + 1);

Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/segment.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c9755a0cd304..3b9479cba1ef 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2150,6 +2150,22 @@ static void __set_sit_entry_type(struct f2fs_sb_info *sbi, int type,
__mark_sit_entry_dirty(sbi, segno);
}

+static void update_segment_mtime(struct f2fs_sb_info *sbi, block_t blkaddr)
+{
+ unsigned int segno = GET_SEGNO(sbi, blkaddr);
+ struct seg_entry *se = get_seg_entry(sbi, segno);
+ unsigned long long mtime = get_mtime(sbi, false);
+
+ if (!se->mtime)
+ se->mtime = mtime;
+ else
+ se->mtime = div_u64(se->mtime * se->valid_blocks + mtime,
+ se->valid_blocks + 1);
+
+ if (mtime > SIT_I(sbi)->max_mtime)
+ SIT_I(sbi)->max_mtime = mtime;
+}
+
static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
{
struct seg_entry *se;
@@ -2169,10 +2185,9 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
f2fs_bug_on(sbi, (new_vblocks < 0 ||
(new_vblocks > sbi->blocks_per_seg)));

+ update_segment_mtime(sbi, blkaddr);
+
se->valid_blocks = new_vblocks;
- se->mtime = get_mtime(sbi, false);
- if (se->mtime > SIT_I(sbi)->max_mtime)
- SIT_I(sbi)->max_mtime = se->mtime;

/* Update valid block bitmap */
if (del > 0) {
--
2.26.2