Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp7163736pxv; Fri, 30 Jul 2021 11:34:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyZDRgEKwnM83Kvjmq42asPfQc3m2S5W1XvYprlYtMpp1McxmLpXdR0XSbM/AbnQDf3pI+0 X-Received: by 2002:a05:6402:1c19:: with SMTP id ck25mr4585759edb.128.1627670063560; Fri, 30 Jul 2021 11:34:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627670063; cv=none; d=google.com; s=arc-20160816; b=J8u0DrfWKY1gWhjIocPVS0T30A4/awJz2B8M6LMmYMlBW1WJIQo9xQzaGOwCiQzwRm fMqjgDLCPTNaunaAGOJD5Ahc4Zr37YsKl9h6bSRHf+BOulzYmQJqN1muyULzzVNBDKUn Ryos6tph19Y+lN9Wih/DY77kiD59qP+Dqp5FnDnrEWJZSbClHQqb+6WLmrU4YDdzyEnU LpNlVpUu8vfY8aOkuVZnDytn9Vid/BEl52pmc8GOgovrvPTrR6Pg/rSyLBAwihH7nnIo 663UDqu7fMkqdst5mdv7/teHS9xFbJ1r0JLdDIw31i3lCIic2qaohLQZAP/ZG32Qxw5t zJnQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=8F9nLwpIRqS29CC1m8hnVij55sNmxb7Nojvai5RF1nI=; b=0Lr8NrCwcdvHhoI49dtQFuwPP/i4XAaAA5trn12BAbIgXPrRi+RTd5EAqJj2Z+L+r3 SZWAO39oSNSxP35E04AeE5iDJU4cjeFR8Hyb5dXXaXiPmNv8YqWctLDtxWXYsKuqh/9c sr1sik1qZU17l11U6umWchqUzi0YEjjOawN86r2zvB7Hk+hlOQaA/0eDH1QTo8kfJabh q1NMYwfChBqGKb/JnDIOwsUJXMZoYcGjtKYkYPeIIoiGk+tiCO4sISpaSBe4Z8C0MoFL VdBvgnBCXE/SNaJGRpoEfJ1reFrvGqF/8/jfgneJ/hDkFYe62jb4yp44Ac0t1GcfRBw2 2Ung== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=G+3Pp6RY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id de17si2345321edb.444.2021.07.30.11.33.59; Fri, 30 Jul 2021 11:34:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=G+3Pp6RY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230521AbhG3Saa (ORCPT + 99 others); Fri, 30 Jul 2021 14:30:30 -0400 Received: from mail.kernel.org ([198.145.29.99]:40046 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230335AbhG3S2Z (ORCPT ); Fri, 30 Jul 2021 14:28:25 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 9203A60F4B; Fri, 30 Jul 2021 18:28:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1627669687; bh=KvEbNHPVlZ+FnXLTxS+GmWYfDCLL6A+oHGDmu713e1Y=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=G+3Pp6RYbqzvhmOMwPZ0Qr6LpvStWeRoopMIqSUEPOydLjXVlxGy82g+eF3ByHB5m 8Ftwkl73/6tP3gPBqhgM70cgBcrmXipl1xchi4LrpAhENVTeugxEFSo3M/8KAc7gHx GJChKuKA/nxC9I6t8Pg5wzw7zpGnInWv2MqLhIggrKbJUtEJWhDl4oWCdXhm+dQPIO 3iwGtGz6GMtJR6xDPRF+9jUh/FaskLRltNX7AIxiS1X5W1Uaz+fRX4jf101VjX3ArZ rJcIHLxieGmxAygcHSXB/1OpXwN5yRFJNUX1lf0UBoqmgoVMn/JrFwuDofC73MGehC uaeEAByWZ4QWw== Date: Fri, 30 Jul 2021 11:28:06 -0700 From: Jaegeuk Kim To: Chao Yu Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org, Chao Yu Subject: Re: [PATCH] f2fs: introduce nosmall_discard mount option Message-ID: References: <20210730100530.4401-1-chao@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210730100530.4401-1-chao@kernel.org> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/30, Chao Yu wrote: > As James Z reported in bugzilla: > > https://bugzilla.kernel.org/show_bug.cgi?id=213877 > > [1.] One-line summary of the problem: > Mount multiple SMR block devices exceed certain number cause system non-response > > [2.] Full description of the problem/report: > Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB). > Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang. > The number of SMR devices with other FS mounted on this system does not interfere with the result above. > > [3.] Keywords (i.e., modules, networking, kernel): > F2FS, SMR, Memory > > [4.] Kernel information > [4.1.] Kernel version (uname -a): > Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux > > [4.2.] Kernel .config file: > Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64 > > [5.] Most recent kernel version which did not have the bug: > None > > [6.] Output of Oops.. message (if applicable) with symbolic information > resolved (see Documentation/admin-guide/oops-tracing.rst) > None > > [7.] A small shell script or example program which triggers the > problem (if possible) > mount /dev/sdX /mnt/0X > > [8.] Memory consumption > > With 24 * 14T SMR Block device with F2FS > free -g > total used free shared buff/cache available > Mem: 46 36 0 0 10 10 > Swap: 0 0 0 > > With 3 * 14T SMR Block device with F2FS > free -g > total used free shared buff/cache available > Mem: 7 5 0 0 1 1 > Swap: 7 0 7 > > The root cause is, there are three bitmaps: > - cur_valid_map > - ckpt_valid_map > - discard_map > and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are > necessary, but discard_map is optional, since this bitmap will only be > useful in mountpoint that small discard is enabled. > > For a blkzoned device such as SMR or ZNS devices, f2fs will only issue > discard for a section(zone) when all blocks of that section are invalid, > so, for such device, we don't need small discard functionality at all. > > This patch introduces a new mountoption "nosmall_discard" to support > disabling small discard functionality, note that the mountoption can not > be removed by remount() due to related metadata should always be > initialized during mount(). > > Signed-off-by: Chao Yu > --- > Documentation/filesystems/f2fs.rst | 4 ++ > fs/f2fs/f2fs.h | 3 ++ > fs/f2fs/segment.c | 81 +++++++++++++++++++----------- > fs/f2fs/super.c | 17 ++++++- > 4 files changed, 75 insertions(+), 30 deletions(-) > > diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst > index ff9e7cc97c65..9ce32fd1c90f 100644 > --- a/Documentation/filesystems/f2fs.rst > +++ b/Documentation/filesystems/f2fs.rst > @@ -312,6 +312,10 @@ inlinecrypt When possible, encrypt/decrypt the contents of encrypted > Documentation/block/inline-encryption.rst. > atgc Enable age-threshold garbage collection, it provides high > effectiveness and efficiency on background GC. > +nosmall_discard Support to disable small discard, with this, filesystem will how about adding "discard_gran=[small|segment|section]", and keeping discard_map in the small case only? And, I think we should set the section mode for zoned device automatically. > + only issue discard aligned to segment or section in lfs mode, > + it is useful for large sized SMR or ZNS devices to reduce > + memory cost. > ======================== ============================================================ > > Debugfs Entries > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 5d16486feb8f..fb0ce582808f 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -99,6 +99,7 @@ extern const char *f2fs_fault_name[FAULT_MAX]; > #define F2FS_MOUNT_MERGE_CHECKPOINT 0x10000000 > #define F2FS_MOUNT_GC_MERGE 0x20000000 > #define F2FS_MOUNT_COMPRESS_CACHE 0x40000000 > +#define F2FS_MOUNT_NOSMALL_DISCARD 0x80000000 > > #define F2FS_OPTION(sbi) ((sbi)->mount_opt) > #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option) > @@ -302,6 +303,8 @@ struct discard_entry { > /* default discard granularity of inner discard thread, unit: block count */ > #define DEFAULT_DISCARD_GRANULARITY 16 > > +#define SEGMENT_ALIGNED_DISCARD_GRANULARITY 512 > + > /* max discard pend list number */ > #define MAX_PLIST_NUM 512 > #define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? \ > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index f9b7fb785e1d..73b532601aef 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -1871,6 +1871,7 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi, > unsigned int offset; > block_t i; > int err = 0; > + bool small_discard = !test_opt(sbi, NOSMALL_DISCARD); > > bdev = f2fs_target_device(sbi, blkstart, NULL); > > @@ -1893,8 +1894,10 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi, > se = get_seg_entry(sbi, GET_SEGNO(sbi, i)); > offset = GET_BLKOFF_FROM_SEG0(sbi, i); > > - if (!f2fs_test_and_set_bit(offset, se->discard_map)) > - sbi->discard_blks--; > + if (small_discard) { > + if (!f2fs_test_and_set_bit(offset, se->discard_map)) > + sbi->discard_blks--; > + } > } > > if (len) > @@ -1918,7 +1921,8 @@ static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc, > struct list_head *head = &SM_I(sbi)->dcc_info->entry_list; > int i; > > - if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi)) > + if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi) || > + test_opt(sbi, NOSMALL_DISCARD)) > return false; > > if (!force) { > @@ -2056,6 +2060,9 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi, > } > mutex_unlock(&dirty_i->seglist_lock); > > + if (test_opt(sbi, NOSMALL_DISCARD)) > + goto wakeup; > + > /* send small discards */ > list_for_each_entry_safe(entry, this, head, list) { > unsigned int cur_pos = 0, next_pos, len, total_len = 0; > @@ -2089,6 +2096,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi, > dcc->nr_discards -= total_len; > } > > +wakeup: > wake_up_discard_thread(sbi, false); > } > > @@ -2108,6 +2116,9 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi) > return -ENOMEM; > > dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY; > + if (test_opt(sbi, NOSMALL_DISCARD)) > + dcc->discard_granularity = SEGMENT_ALIGNED_DISCARD_GRANULARITY; > + > INIT_LIST_HEAD(&dcc->entry_list); > for (i = 0; i < MAX_PLIST_NUM; i++) > INIT_LIST_HEAD(&dcc->pend_list[i]); > @@ -2255,8 +2266,10 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del) > del = 0; > } > > - if (!f2fs_test_and_set_bit(offset, se->discard_map)) > - sbi->discard_blks--; > + if (!test_opt(sbi, NOSMALL_DISCARD)) { > + if (!f2fs_test_and_set_bit(offset, se->discard_map)) > + sbi->discard_blks--; > + } > > /* > * SSR should never reuse block which is checkpointed > @@ -2297,8 +2310,10 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del) > } > } > > - if (f2fs_test_and_clear_bit(offset, se->discard_map)) > - sbi->discard_blks++; > + if (!test_opt(sbi, NOSMALL_DISCARD)) { > + if (f2fs_test_and_clear_bit(offset, se->discard_map)) > + sbi->discard_blks++; > + } > } > if (!f2fs_test_bit(offset, se->ckpt_valid_map)) > se->ckpt_valid_blocks += del; > @@ -4282,6 +4297,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi) > unsigned int sit_segs, start; > char *src_bitmap, *bitmap; > unsigned int bitmap_size, main_bitmap_size, sit_bitmap_size; > + unsigned int discard_map = test_opt(sbi, NOSMALL_DISCARD) ? 0 : 1; > > /* allocate memory for SIT information */ > sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL); > @@ -4304,9 +4320,9 @@ static int build_sit_info(struct f2fs_sb_info *sbi) > return -ENOMEM; > > #ifdef CONFIG_F2FS_CHECK_FS > - bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 4; > + bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (3 + discard_map); > #else > - bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 3; > + bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (2 + discard_map); > #endif > sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL); > if (!sit_i->bitmap) > @@ -4326,8 +4342,10 @@ static int build_sit_info(struct f2fs_sb_info *sbi) > bitmap += SIT_VBLOCK_MAP_SIZE; > #endif > > - sit_i->sentries[start].discard_map = bitmap; > - bitmap += SIT_VBLOCK_MAP_SIZE; > + if (discard_map) { > + sit_i->sentries[start].discard_map = bitmap; > + bitmap += SIT_VBLOCK_MAP_SIZE; > + } > } > > sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL); > @@ -4462,6 +4480,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) > unsigned int readed, start_blk = 0; > int err = 0; > block_t total_node_blocks = 0; > + bool small_discard = !test_opt(sbi, NOSMALL_DISCARD); > > do { > readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_VECS, > @@ -4489,17 +4508,19 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) > if (IS_NODESEG(se->type)) > total_node_blocks += se->valid_blocks; > > - /* build discard map only one time */ > - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { > - memset(se->discard_map, 0xff, > - SIT_VBLOCK_MAP_SIZE); > - } else { > - memcpy(se->discard_map, > - se->cur_valid_map, > - SIT_VBLOCK_MAP_SIZE); > - sbi->discard_blks += > - sbi->blocks_per_seg - > - se->valid_blocks; > + if (small_discard) { > + /* build discard map only one time */ > + if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { > + memset(se->discard_map, 0xff, > + SIT_VBLOCK_MAP_SIZE); > + } else { > + memcpy(se->discard_map, > + se->cur_valid_map, > + SIT_VBLOCK_MAP_SIZE); > + sbi->discard_blks += > + sbi->blocks_per_seg - > + se->valid_blocks; > + } > } > > if (__is_large_section(sbi)) > @@ -4535,13 +4556,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) > if (IS_NODESEG(se->type)) > total_node_blocks += se->valid_blocks; > > - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { > - memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE); > - } else { > - memcpy(se->discard_map, se->cur_valid_map, > - SIT_VBLOCK_MAP_SIZE); > - sbi->discard_blks += old_valid_blocks; > - sbi->discard_blks -= se->valid_blocks; > + if (small_discard) { > + if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { > + memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE); > + } else { > + memcpy(se->discard_map, se->cur_valid_map, > + SIT_VBLOCK_MAP_SIZE); > + sbi->discard_blks += old_valid_blocks; > + sbi->discard_blks -= se->valid_blocks; > + } > } > > if (__is_large_section(sbi)) { > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 18c1ffb8710e..98f0a84d1b1f 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -155,6 +155,7 @@ enum { > Opt_atgc, > Opt_gc_merge, > Opt_nogc_merge, > + Opt_nosmall_discard, > Opt_err, > }; > > @@ -231,6 +232,7 @@ static match_table_t f2fs_tokens = { > {Opt_atgc, "atgc"}, > {Opt_gc_merge, "gc_merge"}, > {Opt_nogc_merge, "nogc_merge"}, > + {Opt_nosmall_discard, "nosmall_discard"}, > {Opt_err, NULL}, > }; > > @@ -1173,6 +1175,9 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount) > case Opt_nogc_merge: > clear_opt(sbi, GC_MERGE); > break; > + case Opt_nosmall_discard: > + set_opt(sbi, NOSMALL_DISCARD); > + break; > default: > f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value", > p); > @@ -1925,6 +1930,8 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root) > > if (test_opt(sbi, ATGC)) > seq_puts(seq, ",atgc"); > + if (test_opt(sbi, NOSMALL_DISCARD)) > + seq_puts(seq, ",nosmall_discard"); > return 0; > } > > @@ -2066,6 +2073,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) > bool no_io_align = !F2FS_IO_ALIGNED(sbi); > bool no_atgc = !test_opt(sbi, ATGC); > bool no_compress_cache = !test_opt(sbi, COMPRESS_CACHE); > + bool small_discard = !test_opt(sbi, NOSMALL_DISCARD); > #ifdef CONFIG_QUOTA > int i, j; > #endif > @@ -2166,6 +2174,12 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) > goto restore_opts; > } > > + if (small_discard == !!test_opt(sbi, NOSMALL_DISCARD)) { > + err = -EINVAL; > + f2fs_warn(sbi, "switch nosmall_discard option is not allowed"); > + goto restore_opts; > + } > + > if ((*flags & SB_RDONLY) && test_opt(sbi, DISABLE_CHECKPOINT)) { > err = -EINVAL; > f2fs_warn(sbi, "disabling checkpoint not compatible with read-only"); > @@ -3779,7 +3793,8 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi) > /* adjust parameters according to the volume size */ > if (sm_i->main_segments <= SMALL_VOLUME_SEGMENTS) { > F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE; > - sm_i->dcc_info->discard_granularity = 1; > + if (!test_opt(sbi, NOSMALL_DISCARD)) > + sm_i->dcc_info->discard_granularity = 1; > sm_i->ipu_policy = 1 << F2FS_IPU_FORCE; > } > > -- > 2.22.1