Received: by 2002:a05:7412:b995:b0:f9:9502:5bb8 with SMTP id it21csp6840137rdb; Tue, 2 Jan 2024 15:59:38 -0800 (PST) X-Google-Smtp-Source: AGHT+IF0eQIR+MmK2zczcQtvrRg0YDjs9TFuLfvrFUzy4XOuVGVGt0yXql7Y5qJ9KmBHkgeir/eM X-Received: by 2002:a05:600c:3ba6:b0:40d:6051:4c14 with SMTP id n38-20020a05600c3ba600b0040d60514c14mr5154533wms.93.1704239978767; Tue, 02 Jan 2024 15:59:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704239978; cv=none; d=google.com; s=arc-20160816; b=YTnt3zKs0zdYtaUmGoYyWlwKbw2GvKPCwNvVb57J6UebOGNluNfPZNQ01VjpYeSR+8 fec8YBXHfK/mAZU0vm2RXHybLJgXGWcZk98LNxgCASH3HodPgZPAf4Lwb1zoy87+Vg1x ziQuE5hzy3L0/jOy4gxIrC8A+DcfDVhqjf8dLB9Z+qwsqRodmGYaGWVe1Q10nHYeAcfc DyK7MlsIdwKJ9KwoFlzQKeGFYzA0Imcz9oSyUX/NPOlgaIRWhNPn1fgUvQmpi7JYybt2 KdR2FaGAGp3/FWcG3OHtx+UKbFfIVXW6MVbKWJ3nUp9/NIcqT130UiCGulsW+I5YIpTO NlPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=iJCWZg5UIGABNhhkQMq2rKGwg0YvF2KCQK/k4R3DLNU=; fh=QkmKqaC9l6H2vgX4LhceBKmzQ6ofqCyO72v5dpq4Ybs=; b=cnXgysk+aTtqDA5y9mb51Ew5+V85bq9oEIMzlqjzS+BrCudCrCVmW7wkzPseGP4cCu vRPf0+bwefhbatjylPV6S87/PKabB6zWMbnq+Tqlj4WLFMUw7Tt2OYNJVnUd3PCCEQca LXMdiKdkMr2cBWLHFhWKYTKTccD+SqPBWEMv0MOJ9PwZ7ItR8ThkvvO1O3pO6nTWSub0 8AzOZHpkSGFnZQQEqjNIll93CIvEAWmvoBfH2LlONtmPaGCV2p4qzFbKM4U925R3YC9Z q/+isPQ1ag2A7FeAU6MMYRWPLxBJCaetlaGUtAUj5uQpHSAiGDJDSNjZLk0kgNGcSkKF lDig== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=WyMGZDYP; spf=pass (google.com: domain of linux-kernel+bounces-14989-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-14989-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id n4-20020a170906b30400b00a1d885359cfsi10853306ejz.326.2024.01.02.15.59.38 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 02 Jan 2024 15:59:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-14989-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=WyMGZDYP; spf=pass (google.com: domain of linux-kernel+bounces-14989-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-14989-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 58FA61F232C3 for ; Tue, 2 Jan 2024 23:59:38 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EC6C817998; Tue, 2 Jan 2024 23:59:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WyMGZDYP" X-Original-To: linux-kernel@vger.kernel.org Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 31ED217981 for ; Tue, 2 Jan 2024 23:59:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 694FDC433C7; Tue, 2 Jan 2024 23:59:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704239968; bh=SSCsLGtofxHPBXzoUPPjIUSPKUH3AsoW7qd+vAaslpc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=WyMGZDYPnt9RZjkh5lVwz1wgQJGdXopTV9UiUko62OuF5aUssXVErIkIrAahWJkBy fhqnkIIRCuSTyVRdIyg3qiblyOWYVFSgJrDMXMXCmyDfXvEKorC9pwSN0MqUjTDJYw CABLWs4sFvVhNRz5iIuNk0F0hAlX3S8RisQpr/qRbnlK+U2K57J9u5aGkf1n2WzTqP COrvEh7141Jtgg3R1NRVTF8flqpBMzvvHisaUECMll4Ce9m4fHEQVX0p4HX2iO7kDX LniWhSgVxRCygdFHDkcB5SX+CAnbeRj0+CktcAtA7uCqbWd2NZ0lUuBfg8Qb1epnTt jy5enhQW10B3w== Date: Tue, 2 Jan 2024 15:59:26 -0800 From: Jaegeuk Kim To: Yonggil Song Cc: "chao@kernel.org" , "linux-f2fs-devel@lists.sourceforge.net" , "linux-kernel@vger.kernel.org" , Seokhwan Kim , Daejun Park , Siwoo Jung Subject: Re: [PATCH v4] f2fs: New victim selection for GC Message-ID: References: <20231228064508epcms2p1f74a30f7b615716d678950c0d5bc0748@epcms2p1> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231228064508epcms2p1f74a30f7b615716d678950c0d5bc0748@epcms2p1> On 12/28, Yonggil Song wrote: > >From d08b97183bc830779c82b83d94f8b75ad11cb29a Mon Sep 17 00:00:00 2001 > From: Yonggil Song > Date: Thu, 7 Dec 2023 16:34:38 +0900 > Subject: [PATCH v4] f2fs: New victim selection for GC > > Overview > ======== > > This patch introduces a new way to preference data sections when selecting > GC victims. Migration of data blocks causes invalidation of node blocks. > Therefore, in situations where GC is frequent, selecting data blocks as > victims can reduce unnecessary block migration by invalidating node blocks. > For exceptional situations where free sections are insufficient, node blocks > are selected as victims instead of data blocks to get extra free sections. > > Problem > ======= > > If the total amount of nodes is larger than the size of one section, nodes > occupy multiple sections, and node victims are often selected because the > gc cost is lowered by data block migration in GC. Since moving the data > section causes frequent node victim selection, victim threshing occurs in > the node section. This results in an increase in WAF. > > Experiment > ========== > > Test environment is as follows. > > System info > - 3.6GHz, 16 core CPU > - 36GiB Memory > Device info > - a conventional null_blk with 228MiB > - a sequential null_blk with 4068 zones of 8MiB > Format > - mkfs.f2fs -c -m -Z 8 -o 3.89 > Mount > - mount > Fio script > - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g > WAF calculation > - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs > > Conclusion > ========== > > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when > the data section was selected first when selecting GC victims. This was > achieved by reducing the migration of the node blocks by 69.4% > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve low WAF > performance with the GC victim selection method in environments where the > section size is relatively small. > > Signed-off-by: Yonggil Song > --- > fs/f2fs/f2fs.h | 1 + > fs/f2fs/gc.c | 99 +++++++++++++++++++++++++++++++++++++++----------- > fs/f2fs/gc.h | 6 +++ > 3 files changed, 85 insertions(+), 21 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 9043cedfa12b..b2c0adfb2704 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1649,6 +1649,7 @@ struct f2fs_sb_info { > struct f2fs_mount_info mount_opt; /* mount options */ > > /* for cleaning operations */ > + bool require_node_gc; /* flag for node GC */ > struct f2fs_rwsem gc_lock; /* > * semaphore for GC, avoid > * race between GC and GC or CP > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c > index f550cdeaa663..d8a81a6ed325 100644 > --- a/fs/f2fs/gc.c > +++ b/fs/f2fs/gc.c > @@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, unsigned int segno) > unsigned int i; > unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno); > > + /* > + * When BG_GC selects victims based on age, it prevents node victims > + * from being selected. This is because node blocks can be invalidated > + * by moving data blocks. > + */ > + if (__skip_node_gc(sbi, segno)) > + return UINT_MAX; > + > for (i = 0; i < usable_segs_per_sec; i++) > mtime += get_seg_entry(sbi, start + i)->mtime; > vblocks = get_valid_blocks(sbi, segno, true); > @@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi, > return get_seg_entry(sbi, segno)->ckpt_valid_blocks; > > /* alloc_mode == LFS */ > - if (p->gc_mode == GC_GREEDY) > - return get_valid_blocks(sbi, segno, true); > - else if (p->gc_mode == GC_CB) > + if (p->gc_mode == GC_GREEDY) { > + /* > + * If the data block that the node block pointed to is GCed, > + * the node block is invalidated. For this reason, we add a > + * weight to cost of node victims to give priority to data > + * victims during the gc process. However, in a situation > + * where we run out of free sections, we remove the weight > + * because we need to clean up node blocks. > + */ > + unsigned int cost = get_valid_blocks(sbi, segno, true); > + > + if (__skip_node_gc(sbi, segno)) > + return cost + > + (sbi->segs_per_sec << sbi->log_blocks_per_seg); > + return cost; Given the comment, can we use a weight instead of cost? - unsigned int cost = get_valid_blocks(sbi, segno, true); + unsigned int weight = 0; - if (__skip_node_gc(sbi, segno)) - return cost + - (sbi->segs_per_sec << sbi->log_blocks_per_seg); - return cost; + if (__skip_node_gc(sbi, segno)) { + weight = sbi->segs_per_sec << sbi->log_blocks_per_seg; + + return get_valid_blocks(sbi, segno, true) + weight; > + } else if (p->gc_mode == GC_CB) { > return get_cb_cost(sbi, segno); > + } > > f2fs_bug_on(sbi, 1); > return 0; > @@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi, > if (ve->mtime >= max_mtime || ve->mtime < min_mtime) > goto skip; > > + /* > + * When BG_GC selects victims based on age, it prevents node victims > + * from being selected. This is because node blocks can be invalidated > + * by moving data blocks. > + */ > + if (__skip_node_gc(sbi, ve->segno)) > + goto skip; > + > /* age = 10000 * x% * 60 */ > age = div64_u64(accu * (max_mtime - ve->mtime), total_time) * > age_weight; > @@ -913,7 +943,22 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int *result, > goto retry; > } > > + Unnecessary line. > if (p.min_segno != NULL_SEGNO) { I'm wondering whether we need to modify all the changes below. If we added more cost to the node segments, how about just managing require_node_gc in the specific cases? > + if (sbi->require_node_gc && > + IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) { > + /* > + * We need to clean node sections. but, data victim > + * cost is the lowest. If free sections are enough, > + * stop cleaning node victim. If not, it goes on > + * by GCing data victims. > + */ ^-- Wrong indentation. > + if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) { > + sbi->require_node_gc = false; > + p.min_segno = NULL_SEGNO; > + goto out; > + } > + } > got_it: > *result = (p.min_segno / p.ofs_unit) * p.ofs_unit; > got_result: > @@ -1830,8 +1875,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control) > goto stop; > } > > + __get_secs_required(sbi, NULL, &upper_secs, NULL); > + > + /* > + * Write checkpoint to reclaim prefree segments. > + * We need more three extra sections for writer's data/node/dentry. > + */ > + if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS) { > + sbi->require_node_gc = true; > + > + if (prefree_segments(sbi)) { > + stat_inc_cp_call_count(sbi, TOTAL_CALL); > + ret = f2fs_write_checkpoint(sbi, &cpc); > + if (ret) > + goto stop; > + /* Reset due to checkpoint */ > + sec_freed = 0; > + } > + } > + > /* Let's run FG_GC, if we don't have enough space. */ > - if (has_not_enough_free_secs(sbi, 0, 0)) { > + if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) { > gc_type = FG_GC; > > /* > @@ -1882,7 +1946,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control) > if (!gc_control->no_bg_gc && > total_sec_freed < gc_control->nr_free_secs) > goto go_gc_more; > - goto stop; > + /* > + * If require_node_gc flag is set even though there > + * are enough free sections, node cleaning will > + * continue. > + */ > + if (!sbi->require_node_gc) > + goto stop; > } > if (sbi->skipped_gc_rwsem) > skipped_round++; > @@ -1897,21 +1967,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control) > goto stop; > } > > - __get_secs_required(sbi, NULL, &upper_secs, NULL); > - > - /* > - * Write checkpoint to reclaim prefree segments. > - * We need more three extra sections for writer's data/node/dentry. > - */ > - if (free_sections(sbi) <= upper_secs + NR_GC_CHECKPOINT_SECS && > - prefree_segments(sbi)) { > - stat_inc_cp_call_count(sbi, TOTAL_CALL); > - ret = f2fs_write_checkpoint(sbi, &cpc); > - if (ret) > - goto stop; > - /* Reset due to checkpoint */ > - sec_freed = 0; > - } > go_gc_more: > segno = NULL_SEGNO; > goto gc_more; > @@ -1920,8 +1975,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control) > SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0; > SIT_I(sbi)->last_victim[FLUSH_DEVICE] = gc_control->victim_segno; > > - if (gc_type == FG_GC) > + if (gc_type == FG_GC) { > f2fs_unpin_all_sections(sbi, true); > + sbi->require_node_gc = false; > + } > > trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed, > get_pages(sbi, F2FS_DIRTY_NODES), > diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h > index 28a00942802c..cd07bf125177 100644 > --- a/fs/f2fs/gc.h > +++ b/fs/f2fs/gc.h > @@ -166,3 +166,9 @@ static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi) > free_user_blocks(sbi) < > limit_free_user_blocks(invalid_user_blocks)); > } > + > +static inline bool __skip_node_gc(struct f2fs_sb_info *sbi, unsigned int segno) > +{ > + return (IS_NODESEG(get_seg_entry(sbi, segno)->type) && > + !sbi->require_node_gc); > +} > -- > 2.34.1