As James Z reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213877
[1.] One-line summary of the problem:
Mount multiple SMR block devices exceed certain number cause system non-response
[2.] Full description of the problem/report:
Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
The number of SMR devices with other FS mounted on this system does not interfere with the result above.
[3.] Keywords (i.e., modules, networking, kernel):
F2FS, SMR, Memory
[4.] Kernel information
[4.1.] Kernel version (uname -a):
Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[4.2.] Kernel .config file:
Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
[5.] Most recent kernel version which did not have the bug:
None
[6.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/admin-guide/oops-tracing.rst)
None
[7.] A small shell script or example program which triggers the
problem (if possible)
mount /dev/sdX /mnt/0X
[8.] Memory consumption
With 24 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 46 36 0 0 10 10
Swap: 0 0 0
With 3 * 14T SMR Block device with F2FS
free -g
total used free shared buff/cache available
Mem: 7 5 0 0 1 1
Swap: 7 0 7
The root cause is, there are three bitmaps:
- cur_valid_map
- ckpt_valid_map
- discard_map
and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
necessary, but discard_map is optional, since this bitmap will only be
useful in mountpoint that small discard is enabled.
For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
discard for a section(zone) when all blocks of that section are invalid,
so, for such device, we don't need small discard functionality at all.
This patch introduces a new mountoption "nosmall_discard" to support
disabling small discard functionality, note that the mountoption can not
be removed by remount() due to related metadata should always be
initialized during mount().
Signed-off-by: Chao Yu <[email protected]>
---
Documentation/filesystems/f2fs.rst | 4 ++
fs/f2fs/f2fs.h | 3 ++
fs/f2fs/segment.c | 81 +++++++++++++++++++-----------
fs/f2fs/super.c | 17 ++++++-
4 files changed, 75 insertions(+), 30 deletions(-)
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index ff9e7cc97c65..9ce32fd1c90f 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -312,6 +312,10 @@ inlinecrypt When possible, encrypt/decrypt the contents of encrypted
Documentation/block/inline-encryption.rst.
atgc Enable age-threshold garbage collection, it provides high
effectiveness and efficiency on background GC.
+nosmall_discard Support to disable small discard, with this, filesystem will
+ only issue discard aligned to segment or section in lfs mode,
+ it is useful for large sized SMR or ZNS devices to reduce
+ memory cost.
======================== ============================================================
Debugfs Entries
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 5d16486feb8f..fb0ce582808f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -99,6 +99,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
#define F2FS_MOUNT_MERGE_CHECKPOINT 0x10000000
#define F2FS_MOUNT_GC_MERGE 0x20000000
#define F2FS_MOUNT_COMPRESS_CACHE 0x40000000
+#define F2FS_MOUNT_NOSMALL_DISCARD 0x80000000
#define F2FS_OPTION(sbi) ((sbi)->mount_opt)
#define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
@@ -302,6 +303,8 @@ struct discard_entry {
/* default discard granularity of inner discard thread, unit: block count */
#define DEFAULT_DISCARD_GRANULARITY 16
+#define SEGMENT_ALIGNED_DISCARD_GRANULARITY 512
+
/* max discard pend list number */
#define MAX_PLIST_NUM 512
#define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? \
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index f9b7fb785e1d..73b532601aef 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1871,6 +1871,7 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
unsigned int offset;
block_t i;
int err = 0;
+ bool small_discard = !test_opt(sbi, NOSMALL_DISCARD);
bdev = f2fs_target_device(sbi, blkstart, NULL);
@@ -1893,8 +1894,10 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
se = get_seg_entry(sbi, GET_SEGNO(sbi, i));
offset = GET_BLKOFF_FROM_SEG0(sbi, i);
- if (!f2fs_test_and_set_bit(offset, se->discard_map))
- sbi->discard_blks--;
+ if (small_discard) {
+ if (!f2fs_test_and_set_bit(offset, se->discard_map))
+ sbi->discard_blks--;
+ }
}
if (len)
@@ -1918,7 +1921,8 @@ static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
struct list_head *head = &SM_I(sbi)->dcc_info->entry_list;
int i;
- if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi))
+ if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi) ||
+ test_opt(sbi, NOSMALL_DISCARD))
return false;
if (!force) {
@@ -2056,6 +2060,9 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
}
mutex_unlock(&dirty_i->seglist_lock);
+ if (test_opt(sbi, NOSMALL_DISCARD))
+ goto wakeup;
+
/* send small discards */
list_for_each_entry_safe(entry, this, head, list) {
unsigned int cur_pos = 0, next_pos, len, total_len = 0;
@@ -2089,6 +2096,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
dcc->nr_discards -= total_len;
}
+wakeup:
wake_up_discard_thread(sbi, false);
}
@@ -2108,6 +2116,9 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
return -ENOMEM;
dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
+ if (test_opt(sbi, NOSMALL_DISCARD))
+ dcc->discard_granularity = SEGMENT_ALIGNED_DISCARD_GRANULARITY;
+
INIT_LIST_HEAD(&dcc->entry_list);
for (i = 0; i < MAX_PLIST_NUM; i++)
INIT_LIST_HEAD(&dcc->pend_list[i]);
@@ -2255,8 +2266,10 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
del = 0;
}
- if (!f2fs_test_and_set_bit(offset, se->discard_map))
- sbi->discard_blks--;
+ if (!test_opt(sbi, NOSMALL_DISCARD)) {
+ if (!f2fs_test_and_set_bit(offset, se->discard_map))
+ sbi->discard_blks--;
+ }
/*
* SSR should never reuse block which is checkpointed
@@ -2297,8 +2310,10 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
}
}
- if (f2fs_test_and_clear_bit(offset, se->discard_map))
- sbi->discard_blks++;
+ if (!test_opt(sbi, NOSMALL_DISCARD)) {
+ if (f2fs_test_and_clear_bit(offset, se->discard_map))
+ sbi->discard_blks++;
+ }
}
if (!f2fs_test_bit(offset, se->ckpt_valid_map))
se->ckpt_valid_blocks += del;
@@ -4282,6 +4297,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
unsigned int sit_segs, start;
char *src_bitmap, *bitmap;
unsigned int bitmap_size, main_bitmap_size, sit_bitmap_size;
+ unsigned int discard_map = test_opt(sbi, NOSMALL_DISCARD) ? 0 : 1;
/* allocate memory for SIT information */
sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
@@ -4304,9 +4320,9 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
return -ENOMEM;
#ifdef CONFIG_F2FS_CHECK_FS
- bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 4;
+ bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (3 + discard_map);
#else
- bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 3;
+ bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (2 + discard_map);
#endif
sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
if (!sit_i->bitmap)
@@ -4326,8 +4342,10 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
bitmap += SIT_VBLOCK_MAP_SIZE;
#endif
- sit_i->sentries[start].discard_map = bitmap;
- bitmap += SIT_VBLOCK_MAP_SIZE;
+ if (discard_map) {
+ sit_i->sentries[start].discard_map = bitmap;
+ bitmap += SIT_VBLOCK_MAP_SIZE;
+ }
}
sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
@@ -4462,6 +4480,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
unsigned int readed, start_blk = 0;
int err = 0;
block_t total_node_blocks = 0;
+ bool small_discard = !test_opt(sbi, NOSMALL_DISCARD);
do {
readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_VECS,
@@ -4489,17 +4508,19 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
if (IS_NODESEG(se->type))
total_node_blocks += se->valid_blocks;
- /* build discard map only one time */
- if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
- memset(se->discard_map, 0xff,
- SIT_VBLOCK_MAP_SIZE);
- } else {
- memcpy(se->discard_map,
- se->cur_valid_map,
- SIT_VBLOCK_MAP_SIZE);
- sbi->discard_blks +=
- sbi->blocks_per_seg -
- se->valid_blocks;
+ if (small_discard) {
+ /* build discard map only one time */
+ if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
+ memset(se->discard_map, 0xff,
+ SIT_VBLOCK_MAP_SIZE);
+ } else {
+ memcpy(se->discard_map,
+ se->cur_valid_map,
+ SIT_VBLOCK_MAP_SIZE);
+ sbi->discard_blks +=
+ sbi->blocks_per_seg -
+ se->valid_blocks;
+ }
}
if (__is_large_section(sbi))
@@ -4535,13 +4556,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
if (IS_NODESEG(se->type))
total_node_blocks += se->valid_blocks;
- if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
- memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
- } else {
- memcpy(se->discard_map, se->cur_valid_map,
- SIT_VBLOCK_MAP_SIZE);
- sbi->discard_blks += old_valid_blocks;
- sbi->discard_blks -= se->valid_blocks;
+ if (small_discard) {
+ if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
+ memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
+ } else {
+ memcpy(se->discard_map, se->cur_valid_map,
+ SIT_VBLOCK_MAP_SIZE);
+ sbi->discard_blks += old_valid_blocks;
+ sbi->discard_blks -= se->valid_blocks;
+ }
}
if (__is_large_section(sbi)) {
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 18c1ffb8710e..98f0a84d1b1f 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -155,6 +155,7 @@ enum {
Opt_atgc,
Opt_gc_merge,
Opt_nogc_merge,
+ Opt_nosmall_discard,
Opt_err,
};
@@ -231,6 +232,7 @@ static match_table_t f2fs_tokens = {
{Opt_atgc, "atgc"},
{Opt_gc_merge, "gc_merge"},
{Opt_nogc_merge, "nogc_merge"},
+ {Opt_nosmall_discard, "nosmall_discard"},
{Opt_err, NULL},
};
@@ -1173,6 +1175,9 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
case Opt_nogc_merge:
clear_opt(sbi, GC_MERGE);
break;
+ case Opt_nosmall_discard:
+ set_opt(sbi, NOSMALL_DISCARD);
+ break;
default:
f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value",
p);
@@ -1925,6 +1930,8 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
if (test_opt(sbi, ATGC))
seq_puts(seq, ",atgc");
+ if (test_opt(sbi, NOSMALL_DISCARD))
+ seq_puts(seq, ",nosmall_discard");
return 0;
}
@@ -2066,6 +2073,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
bool no_io_align = !F2FS_IO_ALIGNED(sbi);
bool no_atgc = !test_opt(sbi, ATGC);
bool no_compress_cache = !test_opt(sbi, COMPRESS_CACHE);
+ bool small_discard = !test_opt(sbi, NOSMALL_DISCARD);
#ifdef CONFIG_QUOTA
int i, j;
#endif
@@ -2166,6 +2174,12 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
goto restore_opts;
}
+ if (small_discard == !!test_opt(sbi, NOSMALL_DISCARD)) {
+ err = -EINVAL;
+ f2fs_warn(sbi, "switch nosmall_discard option is not allowed");
+ goto restore_opts;
+ }
+
if ((*flags & SB_RDONLY) && test_opt(sbi, DISABLE_CHECKPOINT)) {
err = -EINVAL;
f2fs_warn(sbi, "disabling checkpoint not compatible with read-only");
@@ -3779,7 +3793,8 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
/* adjust parameters according to the volume size */
if (sm_i->main_segments <= SMALL_VOLUME_SEGMENTS) {
F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
- sm_i->dcc_info->discard_granularity = 1;
+ if (!test_opt(sbi, NOSMALL_DISCARD))
+ sm_i->dcc_info->discard_granularity = 1;
sm_i->ipu_policy = 1 << F2FS_IPU_FORCE;
}
--
2.22.1
On 07/30, Chao Yu wrote:
> As James Z reported in bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=213877
>
> [1.] One-line summary of the problem:
> Mount multiple SMR block devices exceed certain number cause system non-response
>
> [2.] Full description of the problem/report:
> Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
> Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
> The number of SMR devices with other FS mounted on this system does not interfere with the result above.
>
> [3.] Keywords (i.e., modules, networking, kernel):
> F2FS, SMR, Memory
>
> [4.] Kernel information
> [4.1.] Kernel version (uname -a):
> Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>
> [4.2.] Kernel .config file:
> Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
>
> [5.] Most recent kernel version which did not have the bug:
> None
>
> [6.] Output of Oops.. message (if applicable) with symbolic information
> resolved (see Documentation/admin-guide/oops-tracing.rst)
> None
>
> [7.] A small shell script or example program which triggers the
> problem (if possible)
> mount /dev/sdX /mnt/0X
>
> [8.] Memory consumption
>
> With 24 * 14T SMR Block device with F2FS
> free -g
> total used free shared buff/cache available
> Mem: 46 36 0 0 10 10
> Swap: 0 0 0
>
> With 3 * 14T SMR Block device with F2FS
> free -g
> total used free shared buff/cache available
> Mem: 7 5 0 0 1 1
> Swap: 7 0 7
>
> The root cause is, there are three bitmaps:
> - cur_valid_map
> - ckpt_valid_map
> - discard_map
> and each of them will cost ~500MB memory, {cur, ckpt}_valid_map are
> necessary, but discard_map is optional, since this bitmap will only be
> useful in mountpoint that small discard is enabled.
>
> For a blkzoned device such as SMR or ZNS devices, f2fs will only issue
> discard for a section(zone) when all blocks of that section are invalid,
> so, for such device, we don't need small discard functionality at all.
>
> This patch introduces a new mountoption "nosmall_discard" to support
> disabling small discard functionality, note that the mountoption can not
> be removed by remount() due to related metadata should always be
> initialized during mount().
>
> Signed-off-by: Chao Yu <[email protected]>
> ---
> Documentation/filesystems/f2fs.rst | 4 ++
> fs/f2fs/f2fs.h | 3 ++
> fs/f2fs/segment.c | 81 +++++++++++++++++++-----------
> fs/f2fs/super.c | 17 ++++++-
> 4 files changed, 75 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
> index ff9e7cc97c65..9ce32fd1c90f 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -312,6 +312,10 @@ inlinecrypt When possible, encrypt/decrypt the contents of encrypted
> Documentation/block/inline-encryption.rst.
> atgc Enable age-threshold garbage collection, it provides high
> effectiveness and efficiency on background GC.
> +nosmall_discard Support to disable small discard, with this, filesystem will
how about adding "discard_gran=[small|segment|section]", and keeping discard_map
in the small case only? And, I think we should set the section mode for zoned
device automatically.
> + only issue discard aligned to segment or section in lfs mode,
> + it is useful for large sized SMR or ZNS devices to reduce
> + memory cost.
> ======================== ============================================================
>
> Debugfs Entries
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 5d16486feb8f..fb0ce582808f 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -99,6 +99,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
> #define F2FS_MOUNT_MERGE_CHECKPOINT 0x10000000
> #define F2FS_MOUNT_GC_MERGE 0x20000000
> #define F2FS_MOUNT_COMPRESS_CACHE 0x40000000
> +#define F2FS_MOUNT_NOSMALL_DISCARD 0x80000000
>
> #define F2FS_OPTION(sbi) ((sbi)->mount_opt)
> #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
> @@ -302,6 +303,8 @@ struct discard_entry {
> /* default discard granularity of inner discard thread, unit: block count */
> #define DEFAULT_DISCARD_GRANULARITY 16
>
> +#define SEGMENT_ALIGNED_DISCARD_GRANULARITY 512
> +
> /* max discard pend list number */
> #define MAX_PLIST_NUM 512
> #define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? \
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index f9b7fb785e1d..73b532601aef 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1871,6 +1871,7 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
> unsigned int offset;
> block_t i;
> int err = 0;
> + bool small_discard = !test_opt(sbi, NOSMALL_DISCARD);
>
> bdev = f2fs_target_device(sbi, blkstart, NULL);
>
> @@ -1893,8 +1894,10 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
> se = get_seg_entry(sbi, GET_SEGNO(sbi, i));
> offset = GET_BLKOFF_FROM_SEG0(sbi, i);
>
> - if (!f2fs_test_and_set_bit(offset, se->discard_map))
> - sbi->discard_blks--;
> + if (small_discard) {
> + if (!f2fs_test_and_set_bit(offset, se->discard_map))
> + sbi->discard_blks--;
> + }
> }
>
> if (len)
> @@ -1918,7 +1921,8 @@ static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
> struct list_head *head = &SM_I(sbi)->dcc_info->entry_list;
> int i;
>
> - if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi))
> + if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi) ||
> + test_opt(sbi, NOSMALL_DISCARD))
> return false;
>
> if (!force) {
> @@ -2056,6 +2060,9 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> }
> mutex_unlock(&dirty_i->seglist_lock);
>
> + if (test_opt(sbi, NOSMALL_DISCARD))
> + goto wakeup;
> +
> /* send small discards */
> list_for_each_entry_safe(entry, this, head, list) {
> unsigned int cur_pos = 0, next_pos, len, total_len = 0;
> @@ -2089,6 +2096,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
> dcc->nr_discards -= total_len;
> }
>
> +wakeup:
> wake_up_discard_thread(sbi, false);
> }
>
> @@ -2108,6 +2116,9 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
> return -ENOMEM;
>
> dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
> + if (test_opt(sbi, NOSMALL_DISCARD))
> + dcc->discard_granularity = SEGMENT_ALIGNED_DISCARD_GRANULARITY;
> +
> INIT_LIST_HEAD(&dcc->entry_list);
> for (i = 0; i < MAX_PLIST_NUM; i++)
> INIT_LIST_HEAD(&dcc->pend_list[i]);
> @@ -2255,8 +2266,10 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> del = 0;
> }
>
> - if (!f2fs_test_and_set_bit(offset, se->discard_map))
> - sbi->discard_blks--;
> + if (!test_opt(sbi, NOSMALL_DISCARD)) {
> + if (!f2fs_test_and_set_bit(offset, se->discard_map))
> + sbi->discard_blks--;
> + }
>
> /*
> * SSR should never reuse block which is checkpointed
> @@ -2297,8 +2310,10 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
> }
> }
>
> - if (f2fs_test_and_clear_bit(offset, se->discard_map))
> - sbi->discard_blks++;
> + if (!test_opt(sbi, NOSMALL_DISCARD)) {
> + if (f2fs_test_and_clear_bit(offset, se->discard_map))
> + sbi->discard_blks++;
> + }
> }
> if (!f2fs_test_bit(offset, se->ckpt_valid_map))
> se->ckpt_valid_blocks += del;
> @@ -4282,6 +4297,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
> unsigned int sit_segs, start;
> char *src_bitmap, *bitmap;
> unsigned int bitmap_size, main_bitmap_size, sit_bitmap_size;
> + unsigned int discard_map = test_opt(sbi, NOSMALL_DISCARD) ? 0 : 1;
>
> /* allocate memory for SIT information */
> sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
> @@ -4304,9 +4320,9 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
> return -ENOMEM;
>
> #ifdef CONFIG_F2FS_CHECK_FS
> - bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 4;
> + bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (3 + discard_map);
> #else
> - bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 3;
> + bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (2 + discard_map);
> #endif
> sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
> if (!sit_i->bitmap)
> @@ -4326,8 +4342,10 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
> bitmap += SIT_VBLOCK_MAP_SIZE;
> #endif
>
> - sit_i->sentries[start].discard_map = bitmap;
> - bitmap += SIT_VBLOCK_MAP_SIZE;
> + if (discard_map) {
> + sit_i->sentries[start].discard_map = bitmap;
> + bitmap += SIT_VBLOCK_MAP_SIZE;
> + }
> }
>
> sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> @@ -4462,6 +4480,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
> unsigned int readed, start_blk = 0;
> int err = 0;
> block_t total_node_blocks = 0;
> + bool small_discard = !test_opt(sbi, NOSMALL_DISCARD);
>
> do {
> readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_VECS,
> @@ -4489,17 +4508,19 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
> if (IS_NODESEG(se->type))
> total_node_blocks += se->valid_blocks;
>
> - /* build discard map only one time */
> - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> - memset(se->discard_map, 0xff,
> - SIT_VBLOCK_MAP_SIZE);
> - } else {
> - memcpy(se->discard_map,
> - se->cur_valid_map,
> - SIT_VBLOCK_MAP_SIZE);
> - sbi->discard_blks +=
> - sbi->blocks_per_seg -
> - se->valid_blocks;
> + if (small_discard) {
> + /* build discard map only one time */
> + if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> + memset(se->discard_map, 0xff,
> + SIT_VBLOCK_MAP_SIZE);
> + } else {
> + memcpy(se->discard_map,
> + se->cur_valid_map,
> + SIT_VBLOCK_MAP_SIZE);
> + sbi->discard_blks +=
> + sbi->blocks_per_seg -
> + se->valid_blocks;
> + }
> }
>
> if (__is_large_section(sbi))
> @@ -4535,13 +4556,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
> if (IS_NODESEG(se->type))
> total_node_blocks += se->valid_blocks;
>
> - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> - memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
> - } else {
> - memcpy(se->discard_map, se->cur_valid_map,
> - SIT_VBLOCK_MAP_SIZE);
> - sbi->discard_blks += old_valid_blocks;
> - sbi->discard_blks -= se->valid_blocks;
> + if (small_discard) {
> + if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> + memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
> + } else {
> + memcpy(se->discard_map, se->cur_valid_map,
> + SIT_VBLOCK_MAP_SIZE);
> + sbi->discard_blks += old_valid_blocks;
> + sbi->discard_blks -= se->valid_blocks;
> + }
> }
>
> if (__is_large_section(sbi)) {
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 18c1ffb8710e..98f0a84d1b1f 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -155,6 +155,7 @@ enum {
> Opt_atgc,
> Opt_gc_merge,
> Opt_nogc_merge,
> + Opt_nosmall_discard,
> Opt_err,
> };
>
> @@ -231,6 +232,7 @@ static match_table_t f2fs_tokens = {
> {Opt_atgc, "atgc"},
> {Opt_gc_merge, "gc_merge"},
> {Opt_nogc_merge, "nogc_merge"},
> + {Opt_nosmall_discard, "nosmall_discard"},
> {Opt_err, NULL},
> };
>
> @@ -1173,6 +1175,9 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
> case Opt_nogc_merge:
> clear_opt(sbi, GC_MERGE);
> break;
> + case Opt_nosmall_discard:
> + set_opt(sbi, NOSMALL_DISCARD);
> + break;
> default:
> f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value",
> p);
> @@ -1925,6 +1930,8 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
>
> if (test_opt(sbi, ATGC))
> seq_puts(seq, ",atgc");
> + if (test_opt(sbi, NOSMALL_DISCARD))
> + seq_puts(seq, ",nosmall_discard");
> return 0;
> }
>
> @@ -2066,6 +2073,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
> bool no_io_align = !F2FS_IO_ALIGNED(sbi);
> bool no_atgc = !test_opt(sbi, ATGC);
> bool no_compress_cache = !test_opt(sbi, COMPRESS_CACHE);
> + bool small_discard = !test_opt(sbi, NOSMALL_DISCARD);
> #ifdef CONFIG_QUOTA
> int i, j;
> #endif
> @@ -2166,6 +2174,12 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
> goto restore_opts;
> }
>
> + if (small_discard == !!test_opt(sbi, NOSMALL_DISCARD)) {
> + err = -EINVAL;
> + f2fs_warn(sbi, "switch nosmall_discard option is not allowed");
> + goto restore_opts;
> + }
> +
> if ((*flags & SB_RDONLY) && test_opt(sbi, DISABLE_CHECKPOINT)) {
> err = -EINVAL;
> f2fs_warn(sbi, "disabling checkpoint not compatible with read-only");
> @@ -3779,7 +3793,8 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> /* adjust parameters according to the volume size */
> if (sm_i->main_segments <= SMALL_VOLUME_SEGMENTS) {
> F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
> - sm_i->dcc_info->discard_granularity = 1;
> + if (!test_opt(sbi, NOSMALL_DISCARD))
> + sm_i->dcc_info->discard_granularity = 1;
> sm_i->ipu_policy = 1 << F2FS_IPU_FORCE;
> }
>
> --
> 2.22.1
On 2021/7/31 2:28, Jaegeuk Kim wrote:
> how about adding "discard_gran=[small|segment|section]", and keeping discard_map
> in the small case only? And, I think we should set the section mode for zoned
> device automatically.
Yup, better,
About the naming, it look discard_gran is not clear here, since there is another
sysfs entry /sys/fs/f2fs/<devname>/discard_granularity has the same name, and
also semantics of newly added mount option is not only control the smallest discard
size, but also indicate discard start offset should be aligned to segment or section.
So how about using: "aligned_discard=[none|segment|section]" instead?
Thanks,
On 2021/7/31 7:41, Chao Yu wrote:
> On 2021/7/31 2:28, Jaegeuk Kim wrote:
>> how about adding "discard_gran=[small|segment|section]", and keeping discard_map
>> in the small case only? And, I think we should set the section mode for zoned
>> device automatically.
>
> Yup, better,
>
> About the naming, it look discard_gran is not clear here, since there is another
> sysfs entry /sys/fs/f2fs/<devname>/discard_granularity has the same name, and
> also semantics of newly added mount option is not only control the smallest discard
> size, but also indicate discard start offset should be aligned to segment or section.
> So how about using: "aligned_discard=[none|segment|section]" instead?
Maybe: "aligned_discard=[block|segment|section]"?
>
> Thanks,
>
>
> _______________________________________________
> Linux-f2fs-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>
On 07/31, Chao Yu wrote:
> On 2021/7/31 7:41, Chao Yu wrote:
> > On 2021/7/31 2:28, Jaegeuk Kim wrote:
> > > how about adding "discard_gran=[small|segment|section]", and keeping discard_map
> > > in the small case only? And, I think we should set the section mode for zoned
> > > device automatically.
> >
> > Yup, better,
> >
> > About the naming, it look discard_gran is not clear here, since there is another
> > sysfs entry /sys/fs/f2fs/<devname>/discard_granularity has the same name, and
> > also semantics of newly added mount option is not only control the smallest discard
> > size, but also indicate discard start offset should be aligned to segment or section.
> > So how about using: "aligned_discard=[none|segment|section]" instead?
>
> Maybe: "aligned_discard=[block|segment|section]"?
How about discard_unit? I feel that issuing discards in one of those unit
indicate the offset.
>
> >
> > Thanks,
> >
> >
> > _______________________________________________
> > Linux-f2fs-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> >
On 2021/7/31 8:42, Jaegeuk Kim wrote:
> On 07/31, Chao Yu wrote:
>> On 2021/7/31 7:41, Chao Yu wrote:
>>> On 2021/7/31 2:28, Jaegeuk Kim wrote:
>>>> how about adding "discard_gran=[small|segment|section]", and keeping discard_map
>>>> in the small case only? And, I think we should set the section mode for zoned
>>>> device automatically.
>>>
>>> Yup, better,
>>>
>>> About the naming, it look discard_gran is not clear here, since there is another
>>> sysfs entry /sys/fs/f2fs/<devname>/discard_granularity has the same name, and
>>> also semantics of newly added mount option is not only control the smallest discard
>>> size, but also indicate discard start offset should be aligned to segment or section.
>>> So how about using: "aligned_discard=[none|segment|section]" instead?
>>
>> Maybe: "aligned_discard=[block|segment|section]"?
>
> How about discard_unit? I feel that issuing discards in one of those unit
> indicate the offset.
Looks fine to me, let me update the patch.
Thanks,
>
>>
>>>
>>> Thanks,
>>>
>>>
>>> _______________________________________________
>>> Linux-f2fs-devel mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>
On Fri, Jul 30, 2021 at 06:05:30PM +0800, Chao Yu wrote:
> This patch introduces a new mountoption "nosmall_discard" to support
> disabling small discard functionality, note that the mountoption can not
> be removed by remount() due to related metadata should always be
> initialized during mount().
Why does this need an option? It should be enable IFF and only IFF a
SMR drive is detected.
On 2021/8/4 21:52, Christoph Hellwig wrote:
> On Fri, Jul 30, 2021 at 06:05:30PM +0800, Chao Yu wrote:
>> This patch introduces a new mountoption "nosmall_discard" to support
>> disabling small discard functionality, note that the mountoption can not
>> be removed by remount() due to related metadata should always be
>> initialized during mount().
>
> Why does this need an option? It should be enable IFF and only IFF a
> SMR drive is detected.
>
We've updated this patch as below:
@@ -1961,10 +1998,13 @@ static void default_options(struct f2fs_sb_info *sbi)
sbi->sb->s_flags |= SB_LAZYTIME;
set_opt(sbi, FLUSH_MERGE);
set_opt(sbi, DISCARD);
- if (f2fs_sb_has_blkzoned(sbi))
+ if (f2fs_sb_has_blkzoned(sbi)) {
F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;
- else
+ F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_SECTION;
+ } else {
F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
+ F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_BLOCK;
+ }
https://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git/commit/?h=dev&id=4f993264fe2965c344f223d854ccbb549b16ed71
For blkzoned device, section unit based discard will be enabled by default.
For other type of devices, IMO, this is a tradeoff in between small-sized
discard support and memory overhead, so I proposed to add such mount option
to support selection from different benefit point.
Thanks,