Message-ID: <211c28eb-789e-e6e6-5daf-8040ac5ddd93@kernel.org>
Date: Fri, 4 Feb 2022 08:20:39 +0800
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: add a way to limit roll forward recovery time
To: Jaegeuk Kim
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
References: <20220127214102.2040254-1-jaegeuk@kernel.org> <142d2cc9-73f2-f9fa-2543-6426c62e77a6@kernel.org>
From: Chao Yu
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/2/4 1:42, Jaegeuk Kim wrote:
> On 02/03, Chao Yu wrote:
>> On 2022/2/3 8:34, Jaegeuk Kim wrote:
>>> This adds a sysfs entry to call checkpoint during fsync() in order to avoid
>>> long elapsed time to run roll-forward recovery when booting the device.
>>> Default value doesn't enforce the limitation which is same as before.
>>>
>>> Signed-off-by: Jaegeuk Kim
>>> ---
>>> v2 from v1:
>>> - make the default w/o enforcement
>>>
>>>  Documentation/ABI/testing/sysfs-fs-f2fs | 6 ++++++
>>>  fs/f2fs/checkpoint.c                    | 1 +
>>>  fs/f2fs/f2fs.h                          | 3 +++
>>>  fs/f2fs/node.c                          | 2 ++
>>>  fs/f2fs/node.h                          | 3 +++
>>>  fs/f2fs/recovery.c                      | 4 ++++
>>>  fs/f2fs/sysfs.c                         | 2 ++
>>>  7 files changed, 21 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> index 87d3884c90ea..ce8103f522cb 100644
>>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
>>> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> @@ -567,3 +567,9 @@ Contact: "Daeho Jeong"
>>>  Description:	You can set the trial count limit for GC urgent high mode with this value.
>>>  		If GC thread gets to the limit, the mode will turn back to GC normal mode.
>>>  		By default, the value is zero, which means there is no limit like before.
>>> +
>>> +What:		/sys/fs/f2fs//max_roll_forward_node_blocks
>>> +Date:		January 2022
>>> +Contact:	"Jaegeuk Kim"
>>> +Description:	Controls max # of node block writes to be used for roll forward
>>> +		recovery. This can limit the roll forward recovery time.
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index deeda95688f0..57a2d9164bee 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -1543,6 +1543,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>  	/* update user_block_counts */
>>>  	sbi->last_valid_block_count = sbi->total_valid_block_count;
>>>  	percpu_counter_set(&sbi->alloc_valid_block_count, 0);
>>> +	percpu_counter_set(&sbi->rf_node_block_count, 0);
>>>  	/* Here, we have one bio having CP pack except cp pack 2 page */
>>>  	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 63c90416364b..6ddb98ff0b7c 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -913,6 +913,7 @@ struct f2fs_nm_info {
>>>  	nid_t max_nid;			/* maximum possible node ids */
>>>  	nid_t available_nids;		/* # of available node ids */
>>>  	nid_t next_scan_nid;		/* the next nid to be scanned */
>>> +	nid_t max_rf_node_blocks;	/* max # of nodes for recovery */
>>>  	unsigned int ram_thresh;	/* control the memory footprint */
>>>  	unsigned int ra_nid_pages;	/* # of nid pages to be readaheaded */
>>>  	unsigned int dirty_nats_ratio;	/* control dirty nats ratio threshold */
>>> @@ -1684,6 +1685,8 @@ struct f2fs_sb_info {
>>>  	atomic_t nr_pages[NR_COUNT_TYPE];
>>>  	/* # of allocated blocks */
>>>  	struct percpu_counter alloc_valid_block_count;
>>> +	/* # of node block writes as roll forward recovery */
>>> +	struct percpu_counter rf_node_block_count;
>>>  	/* writeback control */
>>>  	atomic_t wb_sync_req[META];	/* count # of WB_SYNC threads */
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 93512f8859d5..0d9883457579 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1782,6 +1782,7 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>>  			if (!atomic || page == last_page) {
>>>  				set_fsync_mark(page, 1);
>>> +				percpu_counter_inc(&sbi->rf_node_block_count);
>>
>> if (NM_I(sbi)->max_rf_node_blocks)
>> 	percpu_counter_inc(&sbi->rf_node_block_count);
>
> I think we can just count this and adjust right away once sysfs is changed.

Since this long recovery latency issue is a corner case, I guess we can avoid
this to save CPU time...

BTW, shouldn't we account for all warm dnode blocks, since we will traverse
all blocks in the warm node list there?

Thanks,

>
>>
>> Thanks,
>>
>>>  				if (IS_INODE(page)) {
>>>  					if (is_inode_flag_set(inode,
>>>  								FI_DIRTY_INODE))
>>> @@ -3218,6 +3219,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
>>>  	nm_i->ram_thresh = DEF_RAM_THRESHOLD;
>>>  	nm_i->ra_nid_pages = DEF_RA_NID_PAGES;
>>>  	nm_i->dirty_nats_ratio = DEF_DIRTY_NAT_RATIO_THRESHOLD;
>>> +	nm_i->max_rf_node_blocks = DEF_RF_NODE_BLOCKS;
>>>  	INIT_RADIX_TREE(&nm_i->free_nid_root, GFP_ATOMIC);
>>>  	INIT_LIST_HEAD(&nm_i->free_nid_list);
>>> diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
>>> index 18b98cf0465b..4c1d34bfea78 100644
>>> --- a/fs/f2fs/node.h
>>> +++ b/fs/f2fs/node.h
>>> @@ -31,6 +31,9 @@
>>>  /* control total # of nats */
>>>  #define DEF_NAT_CACHE_THRESHOLD			100000
>>> +/* control total # of node writes used for roll-fowrad recovery */
>>> +#define DEF_RF_NODE_BLOCKS			0
>>> +
>>>  /* vector size for gang look-up from nat cache that consists of radix tree */
>>>  #define NATVEC_SIZE	64
>>>  #define SETVEC_SIZE	32
>>> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
>>> index 10d152cfa58d..1c8041fd854e 100644
>>> --- a/fs/f2fs/recovery.c
>>> +++ b/fs/f2fs/recovery.c
>>> @@ -53,9 +53,13 @@ extern struct kmem_cache *f2fs_cf_name_slab;
>>>  bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi)
>>>  {
>>>  	s64 nalloc = percpu_counter_sum_positive(&sbi->alloc_valid_block_count);
>>> +	u32 rf_node = percpu_counter_sum_positive(&sbi->rf_node_block_count);
>>>  	if (sbi->last_valid_block_count + nalloc > sbi->user_block_count)
>>>  		return false;
>>> +	if (NM_I(sbi)->max_rf_node_blocks &&
>>> +			rf_node >= NM_I(sbi)->max_rf_node_blocks)
>>> +		return false;
>>>  	return true;
>>>  }
>>> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
>>> index 281bc0133ee6..47efcf233afd 100644
>>> --- a/fs/f2fs/sysfs.c
>>> +++ b/fs/f2fs/sysfs.c
>>> @@ -732,6 +732,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ssr_sections, min_ssr_sections);
>>>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
>>>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ra_nid_pages, ra_nid_pages);
>>>  F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, dirty_nats_ratio, dirty_nats_ratio);
>>> +F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, max_roll_forward_node_blocks, max_rf_node_blocks);
>>>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_victim_search, max_victim_search);
>>>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, migration_granularity, migration_granularity);
>>>  F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, dir_level, dir_level);
>>> @@ -855,6 +856,7 @@ static struct attribute *f2fs_attrs[] = {
>>>  	ATTR_LIST(ram_thresh),
>>>  	ATTR_LIST(ra_nid_pages),
>>>  	ATTR_LIST(dirty_nats_ratio),
>>> +	ATTR_LIST(max_roll_forward_node_blocks),
>>>  	ATTR_LIST(cp_interval),
>>>  	ATTR_LIST(idle_interval),
>>>  	ATTR_LIST(discard_idle_interval),