Subject: [PATCH v4] f2fs: New victim selection for GC
Reply-To: yonggil.song@samsung.com
From: Yonggil Song
To: "jaegeuk@kernel.org", "chao@kernel.org", "linux-f2fs-devel@lists.sourceforge.net", "linux-kernel@vger.kernel.org", Seokhwan Kim, Daejun Park, Siwoo Jung
CC: Yonggil Song
Message-ID: <20231228064508epcms2p1f74a30f7b615716d678950c0d5bc0748@epcms2p1>
Date: Thu, 28 Dec 2023 15:45:08 +0900
X-Mailing-List: linux-kernel@vger.kernel.org

From d08b97183bc830779c82b83d94f8b75ad11cb29a Mon Sep 17 00:00:00 2001
From: Yonggil Song
Date: Thu, 7 Dec 2023 16:34:38 +0900
Subject: [PATCH v4] f2fs: New victim selection for GC

Overview
========

This patch introduces a new way to prefer data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node
blocks. For exceptional situations where free sections are insufficient,
node blocks are selected as victims instead of data blocks to get extra
free sections.

Problem
=======

If the total amount of nodes is larger than the size of one section,
nodes occupy multiple sections, and node victims are often selected
because the GC cost is lowered by data block migration in GC. Since
moving the data section causes frequent node victim selection, victim
thrashing occurs in the node section. This results in an increase in
WAF.

Experiment
==========

Test environment is as follows.

        System info
          - 3.6GHz, 16 core CPU
          - 36GiB Memory
        Device info
          - a conventional null_blk with 228MiB
          - a sequential null_blk with 4068 zones of 8MiB
        Format
          - mkfs.f2fs -c -m -Z 8 -o 3.89
        Mount
          - mount
        Fio script
          - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
        WAF calculation
          - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==========

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3)
when data sections were selected first during GC victim selection. This
was achieved by reducing the migration of node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). With this GC victim selection
method, a low WAF can be achieved in environments where the section size
is relatively small.

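As a reading aid, the WAF formula from the "WAF calculation" item and the
percentages quoted in the Conclusion can be checked with a few lines of C.
This is only a sketch: the function and parameter names (waf, reduction_pct,
conv_ios, seq_ios, random_write_ios) are illustrative assumptions and do not
appear in the patch or in f2fs.

```c
#include <stdint.h>

/*
 * WAF as defined above: all IOs that reached the two null_blk devices,
 * divided by the random write IOs issued by fio.
 * Names are hypothetical; they are not taken from the patch.
 */
static double waf(uint64_t conv_ios, uint64_t seq_ios,
		  uint64_t random_write_ios)
{
	return (double)(conv_ios + seq_ios) / (double)random_write_ios;
}

/* Relative reduction from `before` to `after`, in percent. */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the figures reported above, reduction_pct(18.75, 13.3) is roughly 29
and reduction_pct(253131743.0, 77463278.0) is roughly 69.4, matching the
stated reductions.
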
Signed-off-by: Yonggil Song
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 99 +++++++++++++++++++++++++++++++++++++++++----------
 fs/f2fs/gc.h   |  6 +++
 3 files changed, 85 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9043cedfa12b..b2c0adfb2704 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
 	struct f2fs_mount_info mount_opt;	/* mount options */
 
 	/* for cleaning operations */
+	bool require_node_gc;			/* flag for node GC */
 	struct f2fs_rwsem gc_lock;		/*
 						 * semaphore for GC, avoid
 						 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..d8a81a6ed325 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno)
 	unsigned int i;
 	unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
+	/*
+	 * When BG_GC selects victims based on age, it prevents node victims
+	 * from being selected. This is because node blocks can be invalidated
+	 * by moving data blocks.
+	 */
+	if (__skip_node_gc(sbi, segno))
+		return UINT_MAX;
+
 	for (i = 0; i < usable_segs_per_sec; i++)
 		mtime += get_seg_entry(sbi, start + i)->mtime;
 	vblocks = get_valid_blocks(sbi, segno, true);
@@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
 		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
 	/* alloc_mode == LFS */
-	if (p->gc_mode == GC_GREEDY)
-		return get_valid_blocks(sbi, segno, true);
-	else if (p->gc_mode == GC_CB)
+	if (p->gc_mode == GC_GREEDY) {
+		/*
+		 * If the data block that the node block pointed to is GCed,
+		 * the node block is invalidated. For this reason, we add a
+		 * weight to cost of node victims to give priority to data
+		 * victims during the gc process. However, in a situation
+		 * where we run out of free sections, we remove the weight
+		 * because we need to clean up node blocks.
+		 */
+		unsigned int cost = get_valid_blocks(sbi, segno, true);
+
+		if (__skip_node_gc(sbi, segno))
+			return cost +
+				(sbi->segs_per_sec << sbi->log_blocks_per_seg);
+		return cost;
+	} else if (p->gc_mode == GC_CB) {
 		return get_cb_cost(sbi, segno);
+	}
 
 	f2fs_bug_on(sbi, 1);
 	return 0;
@@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
 		if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
 			goto skip;
 
+		/*
+		 * When BG_GC selects victims based on age, it prevents node victims
+		 * from being selected. This is because node blocks can be invalidated
+		 * by moving data blocks.
+		 */
+		if (__skip_node_gc(sbi, ve->segno))
+			goto skip;
+
 		/* age = 10000 * x% * 60 */
 		age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
 								age_weight;
@@ -913,7 +943,22 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result,
 		goto retry;
 	}
 
+
 	if (p.min_segno != NULL_SEGNO) {
+		if (sbi->require_node_gc &&
+			IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) {
+			/*
+			 * We need to clean node sections. but, data victim
+			 * cost is the lowest. If free sections are enough,
+			 * stop cleaning node victim. If not, it goes on
+			 * by GCing data victims.
+			 */
+			if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) {
+				sbi->require_node_gc = false;
+				p.min_segno = NULL_SEGNO;
+				goto out;
+			}
+		}
 got_it:
 		*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
 got_result:
@@ -1830,8 +1875,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
+	__get_secs_required(sbi, NULL, &upper_secs, NULL);
+
+	/*
+	 * Write checkpoint to reclaim prefree segments.
+	 * We need more three extra sections for writer's data/node/dentry.
+	 */
+	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) {
+		sbi->require_node_gc = true;
+
+		if (prefree_segments(sbi)) {
+			stat_inc_cp_call_count(sbi, TOTAL_CALL);
+			ret = f2fs_write_checkpoint(sbi, &cpc);
+			if (ret)
+				goto stop;
+			/* Reset due to checkpoint */
+			sec_freed = 0;
+		}
+	}
+
 	/* Let's run FG_GC, if we don't have enough space. */
-	if (has_not_enough_free_secs(sbi, 0, 0)) {
+	if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
 		gc_type = FG_GC;
 
 		/*
@@ -1882,7 +1946,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 			if (!gc_control->no_bg_gc &&
 			    total_sec_freed < gc_control->nr_free_secs)
 				goto go_gc_more;
-			goto stop;
+			/*
+			 * If require_node_gc flag is set even though there
+			 * are enough free sections, node cleaning will
+			 * continue.
+			 */
+			if (!sbi->require_node_gc)
+				goto stop;
 		}
 		if (sbi->skipped_gc_rwsem)
 			skipped_round++;
@@ -1897,21 +1967,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		goto stop;
 	}
 
-	__get_secs_required(sbi, NULL, &upper_secs, NULL);
-
-	/*
-	 * Write checkpoint to reclaim prefree segments.
-	 * We need more three extra sections for writer's data/node/dentry.
-	 */
-	if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS &&
-				prefree_segments(sbi)) {
-		stat_inc_cp_call_count(sbi, TOTAL_CALL);
-		ret = f2fs_write_checkpoint(sbi, &cpc);
-		if (ret)
-			goto stop;
-		/* Reset due to checkpoint */
-		sec_freed = 0;
-	}
 go_gc_more:
 	segno = NULL_SEGNO;
 	goto gc_more;
@@ -1920,8 +1975,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 	SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
 	SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno;
 
-	if (gc_type == FG_GC)
+	if (gc_type == FG_GC) {
 		f2fs_unpin_all_sections(sbi, true);
+		sbi->require_node_gc = false;
+	}
 
 	trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
 				get_pages(sbi, F2FS_DIRTY_NODES),
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 28a00942802c..cd07bf125177 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -166,3 +166,9 @@ static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
 		free_user_blocks(sbi) <
 		limit_free_user_blocks(invalid_user_blocks));
 }
+
+static inline bool __skip_node_gc(struct f2fs_sb_info *sbi, unsigned int segno)
+{
+	return (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
+		!sbi->require_node_gc);
+}
-- 
2.34.1
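
The effect of the GC_GREEDY weighting in get_gc_cost() can be illustrated
with a small userspace model. This is a simplified sketch, not kernel code:
the types and helpers below (model_sbi, model_section, model_skip_node_gc,
model_greedy_cost, model_pick_victim) are hypothetical stand-ins, and the
linear scan only approximates how f2fs_get_victim() searches for the
lowest-cost candidate.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for f2fs state (not kernel types). */
enum sec_type { SEC_DATA, SEC_NODE };

struct model_section {
	enum sec_type type;
	unsigned int valid_blocks;
};

struct model_sbi {
	bool require_node_gc;		/* mirrors sbi->require_node_gc */
	unsigned int blocks_per_sec;	/* segs_per_sec << log_blocks_per_seg */
};

/* Same predicate as __skip_node_gc(): a node section while node GC is not required. */
static bool model_skip_node_gc(const struct model_sbi *sbi,
			       const struct model_section *sec)
{
	return sec->type == SEC_NODE && !sbi->require_node_gc;
}

/* Greedy cost: valid blocks, plus one full section of weight for skipped node sections. */
static unsigned int model_greedy_cost(const struct model_sbi *sbi,
				      const struct model_section *sec)
{
	unsigned int cost = sec->valid_blocks;

	if (model_skip_node_gc(sbi, sec))
		return cost + sbi->blocks_per_sec;
	return cost;
}

/* Returns the index of the lowest-cost section, or -1 if n <= 0. */
static int model_pick_victim(const struct model_sbi *sbi,
			     const struct model_section *secs, int n)
{
	int best = -1;
	unsigned int best_cost = 0;

	for (int i = 0; i < n; i++) {
		unsigned int cost = model_greedy_cost(sbi, &secs[i]);

		if (best < 0 || cost < best_cost) {
			best = i;
			best_cost = cost;
		}
	}
	return best;
}
```

In this model, with blocks_per_sec set to an arbitrary example value such as
2048, a node section holding 10 valid blocks costs 2058 while
require_node_gc is false, so a data section holding 500 valid blocks is
picked first. Once require_node_gc is true the weight is dropped and the
node section, at cost 10, becomes the victim.
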