From: Daeho Jeong
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, kernel-team@android.com
Cc: Daeho Jeong
Subject: [PATCH v5] f2fs: introduce fragment allocation mode mount option
Date: Wed, 29 Sep 2021 11:12:03 -0700
Message-Id: <20210929181203.1606675-1-daeho43@gmail.com>

From: Daeho Jeong

Add two options to the "mode=" mount option so that developers can
simulate filesystem fragmentation and after-GC conditions. Developers
can use these modes to understand the fragmentation and after-GC
conditions well, and eventually get some insights to handle them better.

"fragment:segment": f2fs allocates a new segment at a random position.
  With this, we can simulate the after-GC condition.
"fragment:block": We can scatter block allocation with the
  "max_fragment_chunk" and "max_fragment_hole" sysfs nodes. f2fs will
  allocate 1..max_fragment_chunk blocks in a chunk and make a hole in
  the length of 1..max_fragment_hole by turns in a newly allocated free
  segment. Plus, this mode implicitly enables the "fragment:segment"
  option for more randomness.

Signed-off-by: Daeho Jeong
---
v5: get back to original two modes.
v4: implicitly enabled "fragment:segment" option only in
    "fragment:rand_block".
v3: divided "fragment:block" mode and fixed a race condition related to
    making chunks.
v2: changed mode name and added sysfs nodes to control the fragmentation
    pattern.
---
 Documentation/ABI/testing/sysfs-fs-f2fs | 16 ++++++++++++++++
 Documentation/filesystems/f2fs.rst      | 18 ++++++++++++++++++
 fs/f2fs/f2fs.h                          | 19 +++++++++++++++++--
 fs/f2fs/gc.c                            |  5 ++++-
 fs/f2fs/segment.c                       | 20 ++++++++++++++++++--
 fs/f2fs/segment.h                       |  1 +
 fs/f2fs/super.c                         | 10 ++++++++++
 fs/f2fs/sysfs.c                         | 20 ++++++++++++++++++++
 8 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index f627e705e663..b268e3e18b4a 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -512,3 +512,19 @@ Date:		July 2021
 Contact:	"Daeho Jeong"
 Description:	You can control the multiplier value of	bdi device readahead window size
 		between 2 (default) and 256 for POSIX_FADV_SEQUENTIAL advise option.
+
+What:		/sys/fs/f2fs//max_fragment_chunk
+Date:		August 2021
+Contact:	"Daeho Jeong"
+Description:	With "mode=fragment:block" mount options, we can scatter block allocation.
+		f2fs will allocate 1..max_fragment_chunk blocks in a chunk and make a hole
+		in the length of 1..max_fragment_hole by turns. This value can be set
+		between 1..512 and the default value is 4.
+
+What:		/sys/fs/f2fs//max_fragment_hole
+Date:		August 2021
+Contact:	"Daeho Jeong"
+Description:	With "mode=fragment:block" mount options, we can scatter block allocation.
+		f2fs will allocate 1..max_fragment_chunk
+		blocks in a chunk and make a hole in the length of 1..max_fragment_hole
+		by turns. This value can be set between 1..512 and the default value is 4.
diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
index 09de6ebbbdfa..4294db649fa8 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -201,6 +201,24 @@ fault_type=%d		 Support configuring fault injection type, should be
 mode=%s			 Control block allocation mode which supports "adaptive"
 			 and "lfs". In "lfs" mode, there should be no random
 			 writes towards main area.
+			 "fragment:segment" and "fragment:block" are newly added here.
+			 These are developer options for experiments to simulate filesystem
+			 fragmentation/after-GC situation itself. The developers use these
+			 modes to understand filesystem fragmentation/after-GC condition well,
+			 and eventually get some insights to handle them better.
+			 In "fragment:segment", f2fs allocates a new segment at a random
+			 position. With this, we can simulate the after-GC condition.
+			 In "fragment:block", we can scatter block allocation with
+			 "max_fragment_chunk" and "max_fragment_hole" sysfs nodes.
+			 We added some randomness to both chunk and hole size to make
+			 it close to realistic IO pattern. So, in this mode, f2fs will allocate
+			 1..max_fragment_chunk blocks in a chunk and make a hole in the
+			 length of 1..max_fragment_hole by turns. With this, the newly
+			 allocated blocks will be scattered throughout the whole partition.
+			 Note that "fragment:block" implicitly enables "fragment:segment"
+			 option for more randomness.
+			 Please, use these options for your experiments and we strongly
+			 recommend to re-format the filesystem after using these options.
 io_bits=%u		 Set the bit size of write IO requests. It should be set
 			 with "mode=lfs".
 usrquota		 Enable plain user disk quota accounting.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c24f03e054cb..6b574c3d426e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1285,8 +1285,10 @@ enum {
 };
 
 enum {
-	FS_MODE_ADAPTIVE,	/* use both lfs/ssr allocation */
-	FS_MODE_LFS,		/* use lfs allocation only */
+	FS_MODE_ADAPTIVE,		/* use both lfs/ssr allocation */
+	FS_MODE_LFS,			/* use lfs allocation only */
+	FS_MODE_FRAGMENT_SEG,		/* segment fragmentation mode */
+	FS_MODE_FRAGMENT_BLK,		/* block fragmentation mode */
 };
 
 enum {
@@ -1757,6 +1759,9 @@ struct f2fs_sb_info {
 
 	unsigned long seq_file_ra_mul;		/* multiplier for ra_pages of seq. files in fadvise */
 
+	int max_fragment_chunk;			/* max chunk size for block fragmentation mode */
+	int max_fragment_hole;			/* max hole size for block fragmentation mode */
+
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 	struct kmem_cache *page_array_slab;	/* page array entry */
 	unsigned int page_array_slab_size;	/* default page array slab size */
@@ -3517,6 +3522,16 @@ unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
 unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
 			unsigned int segno);
 
+#define DEF_FRAGMENT_SIZE	4
+#define MIN_FRAGMENT_SIZE	1
+#define MAX_FRAGMENT_SIZE	512
+
+static inline bool f2fs_need_rand_seg(struct f2fs_sb_info *sbi)
+{
+	return F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_SEG ||
+		F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK;
+}
+
 /*
  * checkpoint.c
  */
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 2c18443972b6..72cfad99bbbe 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 
 #include "f2fs.h"
 #include "node.h"
@@ -257,7 +258,9 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
 		p->max_search = sbi->max_victim_search;
 
 	/* let's select beginning hot/small space first in no_heap mode*/
-	if (test_opt(sbi, NOHEAP) &&
+	if (f2fs_need_rand_seg(sbi))
+		p->offset = prandom_u32() % (MAIN_SECS(sbi) * sbi->segs_per_sec);
+	else if (test_opt(sbi, NOHEAP) &&
 		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
 		p->offset = 0;
 	else
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a135d2247415..d648297a0132 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 
 #include "f2fs.h"
 #include "segment.h"
@@ -2630,6 +2631,8 @@ static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
 	unsigned short seg_type = curseg->seg_type;
 
 	sanity_check_seg_type(sbi, seg_type);
+	if (f2fs_need_rand_seg(sbi))
+		return prandom_u32() % (MAIN_SECS(sbi) * sbi->segs_per_sec);
 
 	/* if segs_per_sec is large than 1, we need to keep original policy. */
 	if (__is_large_section(sbi))
@@ -2681,6 +2684,9 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
 	curseg->next_segno = segno;
 	reset_curseg(sbi, type, 1);
 	curseg->alloc_type = LFS;
+	if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK)
+		curseg->fragment_remained_chunk =
+				prandom_u32() % sbi->max_fragment_chunk + 1;
 }
 
 static int __next_free_blkoff(struct f2fs_sb_info *sbi,
@@ -2707,12 +2713,22 @@ static int __next_free_blkoff(struct f2fs_sb_info *sbi,
 static void __refresh_next_blkoff(struct f2fs_sb_info *sbi,
 				struct curseg_info *seg)
 {
-	if (seg->alloc_type == SSR)
+	if (seg->alloc_type == SSR) {
 		seg->next_blkoff =
 			__next_free_blkoff(sbi, seg->segno,
 						seg->next_blkoff + 1);
-	else
+	} else {
 		seg->next_blkoff++;
+		if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK) {
+			/* To allocate block chunks in different sizes, use random number */
+			if (--seg->fragment_remained_chunk <= 0) {
+				seg->fragment_remained_chunk =
+				   prandom_u32() % sbi->max_fragment_chunk + 1;
+				seg->next_blkoff +=
+				   prandom_u32() % sbi->max_fragment_hole + 1;
+			}
+		}
+	}
 }
 
 bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 89fff258727d..46fde9f3f28e 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -314,6 +314,7 @@ struct curseg_info {
 	unsigned short next_blkoff;		/* next block offset to write */
 	unsigned int zone;			/* current zone number */
 	unsigned int next_segno;		/* preallocated segment */
+	int fragment_remained_chunk;		/* remained block size in a chunk for block fragmentation mode */
 	bool inited;				/* indicate inmem log is inited */
 };
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 49e153fd8183..495a76e77927 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -817,6 +817,10 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
 				F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
 			} else if (!strcmp(name, "lfs")) {
 				F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;
+			} else if (!strcmp(name, "fragment:segment")) {
+				F2FS_OPTION(sbi).fs_mode = FS_MODE_FRAGMENT_SEG;
+			} else if (!strcmp(name, "fragment:block")) {
+				F2FS_OPTION(sbi).fs_mode = FS_MODE_FRAGMENT_BLK;
 			} else {
 				kfree(name);
 				return -EINVAL;
@@ -1897,6 +1901,10 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, "adaptive");
 	else if (F2FS_OPTION(sbi).fs_mode == FS_MODE_LFS)
 		seq_puts(seq, "lfs");
+	else if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_SEG)
+		seq_puts(seq, "fragment:segment");
+	else if (F2FS_OPTION(sbi).fs_mode == FS_MODE_FRAGMENT_BLK)
+		seq_puts(seq, "fragment:block");
 	seq_printf(seq, ",active_logs=%u", F2FS_OPTION(sbi).active_logs);
 	if (test_opt(sbi, RESERVE_ROOT))
 		seq_printf(seq, ",reserve_root=%u,resuid=%u,resgid=%u",
@@ -3515,6 +3523,8 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
 	sbi->max_victim_search = DEF_MAX_VICTIM_SEARCH;
 	sbi->migration_granularity = sbi->segs_per_sec;
 	sbi->seq_file_ra_mul = MIN_RA_MUL;
+	sbi->max_fragment_chunk = DEF_FRAGMENT_SIZE;
+	sbi->max_fragment_hole = DEF_FRAGMENT_SIZE;
 
 	sbi->dir_level = DEF_DIR_LEVEL;
 	sbi->interval_time[CP_TIME] = DEF_CP_INTERVAL;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index a1a3e0f6d658..54fc600a07b3 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -551,6 +551,22 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "max_fragment_chunk")) {
+		if (t >= MIN_FRAGMENT_SIZE && t <= MAX_FRAGMENT_SIZE)
+			sbi->max_fragment_chunk = t;
+		else
+			return -EINVAL;
+		return count;
+	}
+
+	if (!strcmp(a->attr.name, "max_fragment_hole")) {
+		if (t >= MIN_FRAGMENT_SIZE && t <= MAX_FRAGMENT_SIZE)
+			sbi->max_fragment_hole = t;
+		else
+			return -EINVAL;
+		return count;
+	}
+
 	*ui = (unsigned int)t;
 
 	return count;
@@ -781,6 +797,8 @@ F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_age_threshold, age_threshold);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, seq_file_ra_mul, seq_file_ra_mul);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_segment_mode, gc_segment_mode);
 F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_reclaimed_segments, gc_reclaimed_segs);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_fragment_chunk, max_fragment_chunk);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_fragment_hole, max_fragment_hole);
 
 #define ATTR_LIST(name) (&f2fs_attr_##name.attr)
 static struct attribute *f2fs_attrs[] = {
@@ -859,6 +877,8 @@ static struct attribute *f2fs_attrs[] = {
 	ATTR_LIST(seq_file_ra_mul),
 	ATTR_LIST(gc_segment_mode),
 	ATTR_LIST(gc_reclaimed_segments),
+	ATTR_LIST(max_fragment_chunk),
+	ATTR_LIST(max_fragment_hole),
 	NULL,
 };
 ATTRIBUTE_GROUPS(f2fs);
-- 
2.33.0.685.g46640cef36-goog