2023-02-21 03:57:45

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 0/7] A few patches to improve ext4 balloc

Hi, this series contain patches to properly handle error in
ext4_read_block_bitmap_nowait to avoid potential data
inconsistent, correct validation check of inode table in
ext4_valid_block_bitmap and other random cleanup patches.
More details can be found in respective log.
Thanks!

Kemeng Shi (7):
ext4: properly handle error of ext4_init_block_bitmap in
ext4_read_block_bitmap_nowait
ext4: correct validation check of inode table in
ext4_valid_block_bitmap
ext4: call ext4_bg_num_gdb_[no]meta directly in
ext4_num_base_meta_clusters
ext4: remove unnecessary check in ext4_bg_num_gdb_nometa
ext4: remove stale comment in ext4_init_block_bitmap
ext4: stop trying to verify just initialized bitmap in
ext4_read_block_bitmap_nowait
ext4: improve inode table blocks counting in
ext4_num_overhead_clusters

fs/ext4/balloc.c | 121 ++++++++++++++++++++++++-----------------------
1 file changed, 61 insertions(+), 60 deletions(-)

--
2.30.0



2023-02-21 03:57:47

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 3/7] ext4: call ext4_bg_num_gdb_[no]meta directly in ext4_num_base_meta_clusters

ext4_num_base_meta_clusters is already aware of meta_bg feature and test
if block_group is inside real meta block groups before calling
ext4_bg_num_gdb. Then ext4_bg_num_gdb will check if block group is inside
a real meta block groups again to decide either ext4_bg_num_gdb_meta or
ext4_bg_num_gdb_nometa is needed.
Call ext4_bg_num_gdb_meta or ext4_bg_num_gdb_nometa directly after we
check if block_group is inside a meta block groups in
ext4_num_base_meta_clusters to remove redundant check of meta block
groups in ext4_bg_num_gdb.

Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/balloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 22be5cd70505..9b8a32b90ddc 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -889,11 +889,11 @@ static unsigned ext4_num_base_meta_clusters(struct super_block *sb,
block_group < le32_to_cpu(sbi->s_es->s_first_meta_bg) *
sbi->s_desc_per_block) {
if (num) {
- num += ext4_bg_num_gdb(sb, block_group);
+ num += ext4_bg_num_gdb_nometa(sb, block_group);
num += le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks);
}
} else { /* For META_BG_BLOCK_GROUPS */
- num += ext4_bg_num_gdb(sb, block_group);
+ num += ext4_bg_num_gdb_meta(sb, block_group);
}
return EXT4_NUM_B2C(sbi, num);
}
--
2.30.0


2023-02-21 03:57:48

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 6/7] ext4: stop trying to verify just initialized bitmap in ext4_read_block_bitmap_nowait

For case we initialize a bitmap bh, we will set bitmap bh verified.
We can return immediately instead of goto verify to remove unnecessary
work for trying to verify bitmap bh.

Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/balloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 4de06d68e9e7..dab46274d591 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -484,7 +484,7 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
set_buffer_verified(bh);
ext4_unlock_group(sb, block_group);
unlock_buffer(bh);
- goto verify;
+ return bh;
}
ext4_unlock_group(sb, block_group);
if (buffer_uptodate(bh)) {
--
2.30.0


2023-02-21 03:57:48

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 2/7] ext4: correct validation check of inode table in ext4_valid_block_bitmap

1.Last valid cluster of inode table is EXT4_B2C(sbi, offset +
sbi->s_itb_per_group - 1). We should make sure last valid cluster is <
max_bit, i.e., EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) is <
max_bit rather than EXT4_B2C(sbi, offset + sbi->s_itb_per_group) is
< max_bit.

2.Bit search length should be last valid cluster plus 1, i.e.,
EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) + 1 rather than
EXT4_B2C(sbi, offset + sbi->s_itb_per_group).

Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/balloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 10cab743e313..22be5cd70505 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -350,13 +350,13 @@ static ext4_fsblk_t ext4_valid_block_bitmap(struct super_block *sb,
blk = ext4_inode_table(sb, desc);
offset = blk - group_first_block;
if (offset < 0 || EXT4_B2C(sbi, offset) >= max_bit ||
- EXT4_B2C(sbi, offset + sbi->s_itb_per_group) >= max_bit)
+ EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) >= max_bit)
return blk;
next_zero_bit = ext4_find_next_zero_bit(bh->b_data,
- EXT4_B2C(sbi, offset + sbi->s_itb_per_group),
+ EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) + 1,
EXT4_B2C(sbi, offset));
if (next_zero_bit <
- EXT4_B2C(sbi, offset + sbi->s_itb_per_group))
+ EXT4_B2C(sbi, offset + sbi->s_itb_per_group - 1) + 1)
/* bad bitmap for inode tables */
return blk;
return 0;
--
2.30.0


2023-02-21 03:57:50

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 4/7] ext4: remove unnecessary check in ext4_bg_num_gdb_nometa

We only call ext4_bg_num_gdb_nometa if there is no meta_bg feature or group
does not reach first_meta_bg, so group must not reside in meta group. Then
group has a global gdt if superblock exists. Just remove confusing branch
that meta_bg feature exists.

Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/balloc.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 9b8a32b90ddc..08f1692f7d2f 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -844,10 +844,7 @@ static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb,
if (!ext4_bg_has_super(sb, group))
return 0;

- if (ext4_has_feature_meta_bg(sb))
- return le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg);
- else
- return EXT4_SB(sb)->s_gdb_count;
+ return EXT4_SB(sb)->s_gdb_count;
}

/**
--
2.30.0


2023-02-21 03:57:51

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 5/7] ext4: remove stale comment in ext4_init_block_bitmap

Commit bdfb6ff4a255d ("ext4: mark group corrupt on group descriptor
checksum") added flag to indicate corruption of group instead of
marking all blocks used. Just remove stale comment.

Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/balloc.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 08f1692f7d2f..4de06d68e9e7 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -187,8 +187,6 @@ static int ext4_init_block_bitmap(struct super_block *sb,

ASSERT(buffer_locked(bh));

- /* If checksum is bad mark all blocks used to prevent allocation
- * essentially implementing a per-group read-only flag. */
if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
ext4_mark_group_bitmap_corrupted(sb, block_group,
EXT4_GROUP_INFO_BBITMAP_CORRUPT |
--
2.30.0


2023-02-21 03:58:05

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters

As inode table blocks are contiguous, inode table blocks inside the
block_group can be represented as range [itbl_cluster_start,
itbl_cluster_last]. Then we can simply account inode table cluters and
check cluster overlap with [itbl_cluster_start, itbl_cluster_last] instead
of traverse each block of inode table.
By the way, this patch fixes code style problem of comment for
ext4_num_overhead_clusters.

Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/balloc.c | 90 +++++++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 43 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index dab46274d591..689edfed231a 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -80,32 +80,56 @@ static inline int ext4_block_in_group(struct super_block *sb,
return (actual_group == block_group) ? 1 : 0;
}

-/* Return the number of clusters used for file system metadata; this
+/*
+ * Return the number of clusters used for file system metadata; this
* represents the overhead needed by the file system.
*/
static unsigned ext4_num_overhead_clusters(struct super_block *sb,
ext4_group_t block_group,
struct ext4_group_desc *gdp)
{
- unsigned num_clusters;
- int block_cluster = -1, inode_cluster = -1, itbl_cluster = -1, i, c;
+ unsigned base_clusters, num_clusters;
+ int block_cluster, inode_cluster;
+ int itbl_cluster_start = -1, itbl_cluster_end = -1;
ext4_fsblk_t start = ext4_group_first_block_no(sb, block_group);
- ext4_fsblk_t itbl_blk;
+ ext4_fsblk_t end = start + EXT4_BLOCKS_PER_GROUP(sb) - 1;
+ ext4_fsblk_t itbl_blk_start, itbl_blk_end;
struct ext4_sb_info *sbi = EXT4_SB(sb);

/* This is the number of clusters used by the superblock,
* block group descriptors, and reserved block group
* descriptor blocks */
- num_clusters = ext4_num_base_meta_clusters(sb, block_group);
+ base_clusters = ext4_num_base_meta_clusters(sb, block_group);
+ num_clusters = base_clusters;

/*
- * For the allocation bitmaps and inode table, we first need
- * to check to see if the block is in the block group. If it
- * is, then check to see if the cluster is already accounted
- * for in the clusters used for the base metadata cluster, or
- * if we can increment the base metadata cluster to include
- * that block. Otherwise, we will have to track the cluster
- * used for the allocation bitmap or inode table explicitly.
+ * Account and record inode table clusters if any cluster
+ * is in the block group, or inode table cluster range is
+ * [-1, -1] and won't overlap with block/inode bitmap cluster
+ * accounted below.
+ */
+ itbl_blk_start = ext4_inode_table(sb, gdp);
+ itbl_blk_end = itbl_blk_start + sbi->s_itb_per_group - 1;
+ if (itbl_blk_start <= end && itbl_blk_end >= start) {
+ itbl_blk_start = itbl_blk_start >= start ?
+ itbl_blk_start : start;
+ itbl_blk_end = itbl_blk_end <= end ?
+ itbl_blk_end : end;
+
+ itbl_cluster_start = EXT4_B2C(sbi, itbl_blk_start - start);
+ itbl_cluster_end = EXT4_B2C(sbi, itbl_blk_end - start);
+
+ num_clusters += itbl_cluster_end - itbl_cluster_start + 1;
+ /* check if border cluster is overlapped */
+ if (itbl_cluster_start == base_clusters - 1)
+ num_clusters--;
+ }
+
+ /*
+ * For the allocation bitmaps, we first need to check to see
+ * if the block is in the block group. If it is, then check
+ * to see if the cluster is already accounted for in the clusters
+ * used for the base metadata cluster and inode tables cluster.
* Normally all of these blocks are contiguous, so the special
* case handling shouldn't be necessary except for *very*
* unusual file system layouts.
@@ -113,46 +137,26 @@ static unsigned ext4_num_overhead_clusters(struct super_block *sb,
if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp), block_group)) {
block_cluster = EXT4_B2C(sbi,
ext4_block_bitmap(sb, gdp) - start);
- if (block_cluster < num_clusters)
- block_cluster = -1;
- else if (block_cluster == num_clusters) {
+ if (block_cluster >= base_clusters &&
+ (block_cluster < itbl_cluster_start ||
+ block_cluster > itbl_cluster_end))
num_clusters++;
- block_cluster = -1;
- }
}

if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp), block_group)) {
inode_cluster = EXT4_B2C(sbi,
ext4_inode_bitmap(sb, gdp) - start);
- if (inode_cluster < num_clusters)
- inode_cluster = -1;
- else if (inode_cluster == num_clusters) {
- num_clusters++;
- inode_cluster = -1;
- }
- }
-
- itbl_blk = ext4_inode_table(sb, gdp);
- for (i = 0; i < sbi->s_itb_per_group; i++) {
- if (ext4_block_in_group(sb, itbl_blk + i, block_group)) {
- c = EXT4_B2C(sbi, itbl_blk + i - start);
- if ((c < num_clusters) || (c == inode_cluster) ||
- (c == block_cluster) || (c == itbl_cluster))
- continue;
- if (c == num_clusters) {
- num_clusters++;
- continue;
- }
+ /*
+ * Additional check if inode bitmap is in just accounted
+ * block_cluster
+ */
+ if (inode_cluster != block_cluster &&
+ inode_cluster >= base_clusters &&
+ (inode_cluster < itbl_cluster_start ||
+ inode_cluster > itbl_cluster_end))
num_clusters++;
- itbl_cluster = c;
- }
}

- if (block_cluster != -1)
- num_clusters++;
- if (inode_cluster != -1)
- num_clusters++;
-
return num_clusters;
}

--
2.30.0


2023-02-22 15:14:15

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters

Hi Kemeng,

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Kemeng-Shi/ext4-properly-handle-error-of-ext4_init_block_bitmap-in-ext4_read_block_bitmap_nowait/20230221-115830
base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/r/20230221115919.1918161-8-shikemeng%40huaweicloud.com
patch subject: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters
config: riscv-randconfig-m031-20230219 (https://download.01.org/0day-ci/archive/20230222/[email protected]/config)
compiler: riscv32-linux-gcc (GCC) 12.1.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Reported-by: Dan Carpenter <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]/

New smatch warnings:
fs/ext4/balloc.c:153 ext4_num_overhead_clusters() error: uninitialized symbol 'block_cluster'.

vim +/block_cluster +153 fs/ext4/balloc.c

c197855ea14175 Stephen Hemminger 2014-05-12 87 static unsigned ext4_num_overhead_clusters(struct super_block *sb,
e187c6588d6ef3 Theodore Ts'o 2009-02-06 88 ext4_group_t block_group,
e187c6588d6ef3 Theodore Ts'o 2009-02-06 89 struct ext4_group_desc *gdp)
0bf7e8379ce7e0 Jose R. Santos 2008-06-03 90 {
2b59a2fd93873a Kemeng Shi 2023-02-21 91 unsigned base_clusters, num_clusters;
2b59a2fd93873a Kemeng Shi 2023-02-21 92 int block_cluster, inode_cluster;
2b59a2fd93873a Kemeng Shi 2023-02-21 93 int itbl_cluster_start = -1, itbl_cluster_end = -1;
d5b8f31007a937 Theodore Ts'o 2011-09-09 94 ext4_fsblk_t start = ext4_group_first_block_no(sb, block_group);
2b59a2fd93873a Kemeng Shi 2023-02-21 95 ext4_fsblk_t end = start + EXT4_BLOCKS_PER_GROUP(sb) - 1;
2b59a2fd93873a Kemeng Shi 2023-02-21 96 ext4_fsblk_t itbl_blk_start, itbl_blk_end;
0bf7e8379ce7e0 Jose R. Santos 2008-06-03 97 struct ext4_sb_info *sbi = EXT4_SB(sb);
0bf7e8379ce7e0 Jose R. Santos 2008-06-03 98
d5b8f31007a937 Theodore Ts'o 2011-09-09 99 /* This is the number of clusters used by the superblock,
d5b8f31007a937 Theodore Ts'o 2011-09-09 100 * block group descriptors, and reserved block group
d5b8f31007a937 Theodore Ts'o 2011-09-09 101 * descriptor blocks */
2b59a2fd93873a Kemeng Shi 2023-02-21 102 base_clusters = ext4_num_base_meta_clusters(sb, block_group);
2b59a2fd93873a Kemeng Shi 2023-02-21 103 num_clusters = base_clusters;
2b59a2fd93873a Kemeng Shi 2023-02-21 104
2b59a2fd93873a Kemeng Shi 2023-02-21 105 /*
2b59a2fd93873a Kemeng Shi 2023-02-21 106 * Account and record inode table clusters if any cluster
2b59a2fd93873a Kemeng Shi 2023-02-21 107 * is in the block group, or inode table cluster range is
2b59a2fd93873a Kemeng Shi 2023-02-21 108 * [-1, -1] and won't overlap with block/inode bitmap cluster
2b59a2fd93873a Kemeng Shi 2023-02-21 109 * accounted below.
2b59a2fd93873a Kemeng Shi 2023-02-21 110 */
2b59a2fd93873a Kemeng Shi 2023-02-21 111 itbl_blk_start = ext4_inode_table(sb, gdp);
2b59a2fd93873a Kemeng Shi 2023-02-21 112 itbl_blk_end = itbl_blk_start + sbi->s_itb_per_group - 1;
2b59a2fd93873a Kemeng Shi 2023-02-21 113 if (itbl_blk_start <= end && itbl_blk_end >= start) {
2b59a2fd93873a Kemeng Shi 2023-02-21 114 itbl_blk_start = itbl_blk_start >= start ?
2b59a2fd93873a Kemeng Shi 2023-02-21 115 itbl_blk_start : start;
2b59a2fd93873a Kemeng Shi 2023-02-21 116 itbl_blk_end = itbl_blk_end <= end ?
2b59a2fd93873a Kemeng Shi 2023-02-21 117 itbl_blk_end : end;
2b59a2fd93873a Kemeng Shi 2023-02-21 118
2b59a2fd93873a Kemeng Shi 2023-02-21 119 itbl_cluster_start = EXT4_B2C(sbi, itbl_blk_start - start);
2b59a2fd93873a Kemeng Shi 2023-02-21 120 itbl_cluster_end = EXT4_B2C(sbi, itbl_blk_end - start);
2b59a2fd93873a Kemeng Shi 2023-02-21 121
2b59a2fd93873a Kemeng Shi 2023-02-21 122 num_clusters += itbl_cluster_end - itbl_cluster_start + 1;
2b59a2fd93873a Kemeng Shi 2023-02-21 123 /* check if border cluster is overlapped */
2b59a2fd93873a Kemeng Shi 2023-02-21 124 if (itbl_cluster_start == base_clusters - 1)
2b59a2fd93873a Kemeng Shi 2023-02-21 125 num_clusters--;
2b59a2fd93873a Kemeng Shi 2023-02-21 126 }
0bf7e8379ce7e0 Jose R. Santos 2008-06-03 127
d5b8f31007a937 Theodore Ts'o 2011-09-09 128 /*
2b59a2fd93873a Kemeng Shi 2023-02-21 129 * For the allocation bitmaps, we first need to check to see
2b59a2fd93873a Kemeng Shi 2023-02-21 130 * if the block is in the block group. If it is, then check
2b59a2fd93873a Kemeng Shi 2023-02-21 131 * to see if the cluster is already accounted for in the clusters
2b59a2fd93873a Kemeng Shi 2023-02-21 132 * used for the base metadata cluster and inode tables cluster.
d5b8f31007a937 Theodore Ts'o 2011-09-09 133 * Normally all of these blocks are contiguous, so the special
d5b8f31007a937 Theodore Ts'o 2011-09-09 134 * case handling shouldn't be necessary except for *very*
d5b8f31007a937 Theodore Ts'o 2011-09-09 135 * unusual file system layouts.
d5b8f31007a937 Theodore Ts'o 2011-09-09 136 */
d5b8f31007a937 Theodore Ts'o 2011-09-09 137 if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp), block_group)) {
b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 138 block_cluster = EXT4_B2C(sbi,
b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 139 ext4_block_bitmap(sb, gdp) - start);
2b59a2fd93873a Kemeng Shi 2023-02-21 140 if (block_cluster >= base_clusters &&
2b59a2fd93873a Kemeng Shi 2023-02-21 141 (block_cluster < itbl_cluster_start ||
2b59a2fd93873a Kemeng Shi 2023-02-21 142 block_cluster > itbl_cluster_end))
d5b8f31007a937 Theodore Ts'o 2011-09-09 143 num_clusters++;
d5b8f31007a937 Theodore Ts'o 2011-09-09 144 }
d5b8f31007a937 Theodore Ts'o 2011-09-09 145
d5b8f31007a937 Theodore Ts'o 2011-09-09 146 if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp), block_group)) {

These two conditions are exactly the same so I don't see why they can't
be combined into one condition. I have read the comment, but I guess I
don't understand ext4 well enough to really understand it.

d5b8f31007a937 Theodore Ts'o 2011-09-09 147 inode_cluster = EXT4_B2C(sbi,
b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 148 ext4_inode_bitmap(sb, gdp) - start);
2b59a2fd93873a Kemeng Shi 2023-02-21 149 /*
2b59a2fd93873a Kemeng Shi 2023-02-21 150 * Additional check if inode bitmap is in just accounted
2b59a2fd93873a Kemeng Shi 2023-02-21 151 * block_cluster
2b59a2fd93873a Kemeng Shi 2023-02-21 152 */
2b59a2fd93873a Kemeng Shi 2023-02-21 @153 if (inode_cluster != block_cluster &&

So this seems like a false positive to me. But the code is puzzling for
a human or for a static checker.

2b59a2fd93873a Kemeng Shi 2023-02-21 154 inode_cluster >= base_clusters &&
2b59a2fd93873a Kemeng Shi 2023-02-21 155 (inode_cluster < itbl_cluster_start ||
2b59a2fd93873a Kemeng Shi 2023-02-21 156 inode_cluster > itbl_cluster_end))
d5b8f31007a937 Theodore Ts'o 2011-09-09 157 num_clusters++;
0bf7e8379ce7e0 Jose R. Santos 2008-06-03 158 }
d5b8f31007a937 Theodore Ts'o 2011-09-09 159
d5b8f31007a937 Theodore Ts'o 2011-09-09 160 return num_clusters;
0bf7e8379ce7e0 Jose R. Santos 2008-06-03 161 }

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


2023-02-23 01:32:08

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters



on 2/22/2023 11:13 PM, Dan Carpenter wrote:
> Hi Kemeng,
>
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Kemeng-Shi/ext4-properly-handle-error-of-ext4_init_block_bitmap-in-ext4_read_block_bitmap_nowait/20230221-115830
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
> patch link: https://lore.kernel.org/r/20230221115919.1918161-8-shikemeng%40huaweicloud.com
> patch subject: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters
> config: riscv-randconfig-m031-20230219 (https://download.01.org/0day-ci/archive/20230222/[email protected]/config)
> compiler: riscv32-linux-gcc (GCC) 12.1.0
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <[email protected]>
> | Reported-by: Dan Carpenter <[email protected]>
> | Link: https://lore.kernel.org/r/[email protected]/
>
> New smatch warnings:
> fs/ext4/balloc.c:153 ext4_num_overhead_clusters() error: uninitialized symbol 'block_cluster'.
>
> vim +/block_cluster +153 fs/ext4/balloc.c
[...]
> d5b8f31007a937 Theodore Ts'o 2011-09-09 128 /*
> 2b59a2fd93873a Kemeng Shi 2023-02-21 129 * For the allocation bitmaps, we first need to check to see
> 2b59a2fd93873a Kemeng Shi 2023-02-21 130 * if the block is in the block group. If it is, then check
> 2b59a2fd93873a Kemeng Shi 2023-02-21 131 * to see if the cluster is already accounted for in the clusters
> 2b59a2fd93873a Kemeng Shi 2023-02-21 132 * used for the base metadata cluster and inode tables cluster.
> d5b8f31007a937 Theodore Ts'o 2011-09-09 133 * Normally all of these blocks are contiguous, so the special
> d5b8f31007a937 Theodore Ts'o 2011-09-09 134 * case handling shouldn't be necessary except for *very*
> d5b8f31007a937 Theodore Ts'o 2011-09-09 135 * unusual file system layouts.
> d5b8f31007a937 Theodore Ts'o 2011-09-09 136 */
> d5b8f31007a937 Theodore Ts'o 2011-09-09 137 if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp), block_group)) {
> b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 138 block_cluster = EXT4_B2C(sbi,
> b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 139 ext4_block_bitmap(sb, gdp) - start);
> 2b59a2fd93873a Kemeng Shi 2023-02-21 140 if (block_cluster >= base_clusters &&
> 2b59a2fd93873a Kemeng Shi 2023-02-21 141 (block_cluster < itbl_cluster_start ||
> 2b59a2fd93873a Kemeng Shi 2023-02-21 142 block_cluster > itbl_cluster_end))
> d5b8f31007a937 Theodore Ts'o 2011-09-09 143 num_clusters++;
> d5b8f31007a937 Theodore Ts'o 2011-09-09 144 }
> d5b8f31007a937 Theodore Ts'o 2011-09-09 145
> d5b8f31007a937 Theodore Ts'o 2011-09-09 146 if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp), block_group)) {
>
> These two conditions are exactly the same so I don't see why they can't
> be combined into one condition. I have read the comment, but I guess I
> don't understand ext4 well enough to really understand it.
These two conditions check two kinds of bitmap block: *block* bitmap block
and *inode* bitmap block. For case that block bitmap in the block group
while inode bitmap in a different group, there is a risk to access
uninitialized block_cluster.
I will fix this in next version, Thanks!

--
Best wishes
Kemeng Shi


2023-02-23 04:04:44

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters

On Thu, Feb 23, 2023 at 09:31:54AM +0800, Kemeng Shi wrote:
>
>
> on 2/22/2023 11:13 PM, Dan Carpenter wrote:
> > Hi Kemeng,
> >
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Kemeng-Shi/ext4-properly-handle-error-of-ext4_init_block_bitmap-in-ext4_read_block_bitmap_nowait/20230221-115830
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
> > patch link: https://lore.kernel.org/r/20230221115919.1918161-8-shikemeng%40huaweicloud.com
> > patch subject: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters
> > config: riscv-randconfig-m031-20230219 (https://download.01.org/0day-ci/archive/20230222/[email protected]/config)
> > compiler: riscv32-linux-gcc (GCC) 12.1.0
> >
> > If you fix the issue, kindly add following tag where applicable
> > | Reported-by: kernel test robot <[email protected]>
> > | Reported-by: Dan Carpenter <[email protected]>
> > | Link: https://lore.kernel.org/r/[email protected]/
> >
> > New smatch warnings:
> > fs/ext4/balloc.c:153 ext4_num_overhead_clusters() error: uninitialized symbol 'block_cluster'.
> >
> > vim +/block_cluster +153 fs/ext4/balloc.c
> [...]
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 128 /*
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 129 * For the allocation bitmaps, we first need to check to see
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 130 * if the block is in the block group. If it is, then check
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 131 * to see if the cluster is already accounted for in the clusters
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 132 * used for the base metadata cluster and inode tables cluster.
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 133 * Normally all of these blocks are contiguous, so the special
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 134 * case handling shouldn't be necessary except for *very*
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 135 * unusual file system layouts.
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 136 */
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 137 if (ext4_block_in_group(sb, ext4_block_bitmap(sb, gdp), block_group)) {
> > b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 138 block_cluster = EXT4_B2C(sbi,
> > b0dd6b70f0fda1 Theodore Ts'o 2012-06-07 139 ext4_block_bitmap(sb, gdp) - start);
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 140 if (block_cluster >= base_clusters &&
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 141 (block_cluster < itbl_cluster_start ||
> > 2b59a2fd93873a Kemeng Shi 2023-02-21 142 block_cluster > itbl_cluster_end))
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 143 num_clusters++;
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 144 }
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 145
> > d5b8f31007a937 Theodore Ts'o 2011-09-09 146 if (ext4_block_in_group(sb, ext4_inode_bitmap(sb, gdp), block_group)) {
> >
> > These two conditions are exactly the same so I don't see why they can't
> > be combined into one condition. I have read the comment, but I guess I
> > don't understand ext4 well enough to really understand it.
> These two conditions check two kinds of bitmap block: *block* bitmap block
> and *inode* bitmap block.

Ah right. When I was reviewing this code, I copy and pasted the if
conditions so they were exactly lined up with each other and I still
didn't see the block vs inode difference until you pointed it out. :P

Thanks!

regards,
dan carpenter


2023-03-17 01:53:01

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 0/7] A few patches to improve ext4 balloc

On Tue, 21 Feb 2023 19:59:12 +0800, Kemeng Shi wrote:
> ext4_read_block_bitmap_nowait to avoid potential data
> inconsistent, correct validation check of inode table in
> ext4_valid_block_bitmap and other random cleanup patches.
> More details can be found in respective log.
> Thanks!
>
> Kemeng Shi (7):
> ext4: properly handle error of ext4_init_block_bitmap in
> ext4_read_block_bitmap_nowait
> ext4: correct validation check of inode table in
> ext4_valid_block_bitmap
> ext4: call ext4_bg_num_gdb_[no]meta directly in
> ext4_num_base_meta_clusters
> ext4: remove unnecessary check in ext4_bg_num_gdb_nometa
> ext4: remove stale comment in ext4_init_block_bitmap
> ext4: stop trying to verify just initialized bitmap in
> ext4_read_block_bitmap_nowait
> ext4: improve inode table blocks counting in
> ext4_num_overhead_clusters
>
> [...]

Applied, thanks!

[1/7] ext4: properly handle error of ext4_init_block_bitmap in ext4_read_block_bitmap_nowait
commit: 68070432da0de5e346f5f75a76f76858000c8e53
[2/7] ext4: correct validation check of inode table in ext4_valid_block_bitmap
commit: 1d4ef2264e0f18275d8176fa3083a7988117de40
[3/7] ext4: call ext4_bg_num_gdb_[no]meta directly in ext4_num_base_meta_clusters
commit: 8bd501d2ef66bb06765aecfd3a69a5ac3213e28f
[4/7] ext4: remove unnecessary check in ext4_bg_num_gdb_nometa
commit: 6b8948ef40e97de05fa6329eb985c9cc1168aefa
[5/7] ext4: remove stale comment in ext4_init_block_bitmap
commit: 35690178606526d33a8ae254660a8a2239a08d6d
[6/7] ext4: stop trying to verify just initialized bitmap in ext4_read_block_bitmap_nowait
commit: 23a7838694dc48315b38d0cf111afaaedfc5081f
[7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters
commit: e3c70113e2cbeb3dadb3768964920337eff290f6

Best regards,
--
Theodore Ts'o <[email protected]>

2023-03-20 12:45:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters

On Tue 21-02-23 19:59:19, Kemeng Shi wrote:
> As inode table blocks are contiguous, inode table blocks inside the
> block_group can be represented as range [itbl_cluster_start,
> itbl_cluster_last]. Then we can simply account inode table cluters and
> check cluster overlap with [itbl_cluster_start, itbl_cluster_last] instead
> of traverse each block of inode table.
> By the way, this patch fixes code style problem of comment for
> ext4_num_overhead_clusters.
>
> Signed-off-by: Kemeng Shi <[email protected]>

FWIW this is triggering Coverity warning:

*** CID 1536792: Uninitialized variables (UNINIT)
/fs/ext4/balloc.c: 153 in ext4_num_overhead_clusters()
147 inode_cluster = EXT4_B2C(sbi,
148 ext4_inode_bitmap(sb, gdp) - st
149 /*
150 * Additional check if inode bitmap is in just accounted
151 * block_cluster
152 */
>>> CID 1536792: Uninitialized variables (UNINIT)
>>> Using uninitialized value "block_cluster".
153 if (inode_cluster != block_cluster &&
154 inode_cluster >= base_clusters &&
155 (inode_cluster < itbl_cluster_start ||
156 inode_cluster > itbl_cluster_end))
157 num_clusters++;
158 }

which actually looks valid AFAICT.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-03-20 13:20:14

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH 7/7] ext4: improve inode table blocks counting in ext4_num_overhead_clusters



on 3/20/2023 8:44 PM, Jan Kara wrote:
> On Tue 21-02-23 19:59:19, Kemeng Shi wrote:
>> As inode table blocks are contiguous, inode table blocks inside the
>> block_group can be represented as range [itbl_cluster_start,
>> itbl_cluster_last]. Then we can simply account inode table cluters and
>> check cluster overlap with [itbl_cluster_start, itbl_cluster_last] instead
>> of traverse each block of inode table.
>> By the way, this patch fixes code style problem of comment for
>> ext4_num_overhead_clusters.
>>
>> Signed-off-by: Kemeng Shi <[email protected]>
>
> FWIW this is triggering Coverity warning:
>
> *** CID 1536792: Uninitialized variables (UNINIT)
> /fs/ext4/balloc.c: 153 in ext4_num_overhead_clusters()
> 147 inode_cluster = EXT4_B2C(sbi,
> 148 ext4_inode_bitmap(sb, gdp) - st
> 149 /*
> 150 * Additional check if inode bitmap is in just accounted
> 151 * block_cluster
> 152 */
>>>> CID 1536792: Uninitialized variables (UNINIT)
>>>> Using uninitialized value "block_cluster".
> 153 if (inode_cluster != block_cluster &&
> 154 inode_cluster >= base_clusters &&
> 155 (inode_cluster < itbl_cluster_start ||
> 156 inode_cluster > itbl_cluster_end))
> 157 num_clusters++;
> 158 }
>
> which actually looks valid AFAICT.
Yes, there is a risk to access uninitialized block_cluster if block bitmap block
and inode bitmap block are in different groups. Patch to fix is just sent. Thanks!

--
Best wishes
Kemeng Shi


2023-06-13 12:13:28

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [PATCH 4/7] ext4: remove unnecessary check in ext4_bg_num_gdb_nometa

[just resend while CCing the regression list, as the report apparently
didn't make it to the lists]

On 13.06.23 13:15, Stefan Lippers-Hollmann wrote:
> On 2023-02-21, Kemeng Shi wrote:
>> We only call ext4_bg_num_gdb_nometa if there is no meta_bg feature or group
>> does not reach first_meta_bg, so group must not reside in meta group. Then
>> group has a global gdt if superblock exists. Just remove confusing branch
>> that meta_bg feature exists.
>>
>> Signed-off-by: Kemeng Shi <[email protected]>
>> ---
>> fs/ext4/balloc.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
>> index 9b8a32b90ddc..08f1692f7d2f 100644
>> --- a/fs/ext4/balloc.c
>> +++ b/fs/ext4/balloc.c
>> @@ -844,10 +844,7 @@ static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb,
>> if (!ext4_bg_has_super(sb, group))
>> return 0;
>>
>> - if (ext4_has_feature_meta_bg(sb))
>> - return le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg);
>> - else
>> - return EXT4_SB(sb)->s_gdb_count;
>> + return EXT4_SB(sb)->s_gdb_count;
>> }
>>
>> /**
>
> This change as part of v6.4-rc6-26-gfb054096aea0 seems to cause a
> regression for mounting a (large, lvm2 backed, grown several times,
> probably also shrunk at least once) ext4 filesystem for me (other,
> smaller/ simpler ext4 filesystems mount fine).
>
> # LANG= mount /dev/vg-trident/storage /mnt/
> mount: /mnt: mount(2) system call failed: Structure needs cleaning.
> dmesg(1) may have more information after failed mount system call.
>
> [ 569.576241] EXT4-fs (dm-6): ext4_check_descriptors: Inode table for group 2 overlaps block group descriptors
> [ 569.576251] EXT4-fs (dm-6): group descriptors corrupted!
>
> e2fsck (1.47.0-2, Debian/unstable, amd64) does not find any problems
> about this fs.
>
> # fsck -f /dev/vg-trident/storage
> fsck from util-linux 2.38.1
> e2fsck 1.47.0 (5-Feb-2023)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> storage: 1274324/595369984 files (0.4% non-contiguous), 4194120223/4762936320 blocks
>
> Bisecting leads me to this patch:
>
> $ git bisect log
> git bisect start
> # Status: warte auf guten und schlechten Commit
> # good: [457391b0380335d5e9a5babdec90ac53928b23b4] Linux 6.3
> git bisect good 457391b0380335d5e9a5babdec90ac53928b23b4
> # Status: warte auf schlechten Commit, 1 guter Commit bekannt
> # bad: [fb054096aea0576f0c0a61c598e5e9676443ee86] Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad fb054096aea0576f0c0a61c598e5e9676443ee86
> # bad: [6e98b09da931a00bf4e0477d0fa52748bf28fcce] Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect bad 6e98b09da931a00bf4e0477d0fa52748bf28fcce
> # good: [088e0c188513b58a0056a488cf5b7df094a8a48a] Merge tag 'platform-drivers-x86-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
> git bisect good 088e0c188513b58a0056a488cf5b7df094a8a48a
> # good: [ca288965801572fe41386560d4e6c5cc0e5cc56d] Merge tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
> git bisect good ca288965801572fe41386560d4e6c5cc0e5cc56d
> # bad: [94fc0792661a96d64a4bb79cf10d0793ecadf76e] Merge tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
> git bisect bad 94fc0792661a96d64a4bb79cf10d0793ecadf76e
> # good: [4173cf6fb6b7d1b4569cca08af318c4561356fb5] Merge tag 'hwmon-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
> git bisect good 4173cf6fb6b7d1b4569cca08af318c4561356fb5
> # good: [98f99e67a1dc456e9a542584819b2aa265ffc737] Merge tag 'flex-array-transformations-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> git bisect good 98f99e67a1dc456e9a542584819b2aa265ffc737
> # good: [ba24b8eb3ef676cb7d6cef4a2a53f3624f880d42] crypto: testmgr - Add some test vectors for cmac(camellia)
> git bisect good ba24b8eb3ef676cb7d6cef4a2a53f3624f880d42
> # bad: [4d934a5e6caa6dcdd3fbee7b96fe512a455863b6] ext4: Convert ext4_write_begin() to use a folio
> git bisect bad 4d934a5e6caa6dcdd3fbee7b96fe512a455863b6
> # bad: [78dc9f844f4ec999a30517313040948a4c4bbc00] ext4: use best found when complex scan of group finishs
> git bisect bad 78dc9f844f4ec999a30517313040948a4c4bbc00
> # bad: [b83acc77718644b2c19832226ca284ce01efc550] ext4: remove unused group parameter in ext4_inode_bitmap_csum_verify
> git bisect bad b83acc77718644b2c19832226ca284ce01efc550
> # good: [e6c28a26b799c7640b77daff3e4a67808c74381c] ext4: Fix warnings when freezing filesystem with journaled data
> git bisect good e6c28a26b799c7640b77daff3e4a67808c74381c
> # good: [a38627f14356f505f621b31197fd872b99a10563] ext4: call ext4_bg_num_gdb_[no]meta directly in ext4_num_base_meta_clusters
> git bisect good a38627f14356f505f621b31197fd872b99a10563
> # bad: [f567ea7843562b7d7dfe1e6cfea455a1e9623208] ext4: remove stale comment in ext4_init_block_bitmap
> git bisect bad f567ea7843562b7d7dfe1e6cfea455a1e9623208
>
> # tune2fs -l /dev/vg-trident/storage
> tune2fs 1.47.0 (5-Feb-2023)
> Filesystem volume name: storage
> Last mounted on: /srv/storage
> Filesystem UUID: ad4e7e8d-f375-491f-a5dd-6251d32229e6
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 595369984
> Block count: 4762936320
> Reserved block count: 0
> Overhead clusters: 37571453
> Free blocks: 568816097
> Free inodes: 594095660
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Group descriptor size: 64
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 4096
> Inode blocks per group: 256
> First meta block group: 1453
> Flex block group size: 16
> Filesystem created: Tue Dec 16 03:26:32 2014
> Last mount time: Tue Jun 13 08:54:33 2023
> Last write time: Tue Jun 13 08:55:57 2023
> Mount count: 0
> Maximum mount count: -1
> Last checked: Tue Jun 13 08:55:57 2023
> Check interval: 0 (<none>)
> Lifetime writes: 145 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 3e89594a-1c09-4c46-bb6e-402ad3d697b8
> Journal backup: inode blocks
> Checksum type: crc32c
> Checksum: 0xa57a2ce7
>
> -- dumpe2fs --
>
> dumpe2fs 1.47.0 (5-Feb-2023)
> Filesystem volume name: storage
> Last mounted on: /srv/storage
> Filesystem UUID: ad4e7e8d-f375-491f-a5dd-6251d32229e6
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 595369984
> Block count: 4762936320
> Reserved block count: 0
> Overhead clusters: 37571453
> Free blocks: 568816097
> Free inodes: 594095660
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Group descriptor size: 64
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 4096
> Inode blocks per group: 256
> First meta block group: 1453
> Flex block group size: 16
> Filesystem created: Tue Dec 16 03:26:32 2014
> Last mount time: Tue Jun 13 08:54:33 2023
> Last write time: Tue Jun 13 09:07:45 2023
> Mount count: 0
> Maximum mount count: -1
> Last checked: Tue Jun 13 09:07:45 2023
> Check interval: 0 (<none>)
> Lifetime writes: 145 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 3e89594a-1c09-4c46-bb6e-402ad3d697b8
> Journal backup: inode blocks
> Checksum type: crc32c
> Checksum: 0xf32a54d5
> Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
> Total journal size: 128M
> Total journal blocks: 32768
> Max transaction length: 32768
> Fast commit length: 0
> Journal sequence: 0x0074096a
> Journal start: 0
> Journal checksum type: crc32c
> Journal checksum: 0x478ec5ac
>
> [... cut, the full dumpe2fs is ~60 MB, and even xz -9 compressed
> still ~6 MB large ...]
>
> Reverting (just) ad3f09be6cfe332be8ff46c78e6ec0f8839107aa from
> v6.4-rc6-27-ge328c473340c allows me to mount this filesystem again.
>
> This seems to be the same issue as:
> https://bugzilla.kernel.org/show_bug.cgi?id=217534
> https://lkml.kernel.org/r/<[email protected]>
>
> Regards
> Stefan Lippers-Hollmann
>

2023-06-13 14:51:56

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH 4/7] ext4: remove unnecessary check in ext4_bg_num_gdb_nometa



on 6/13/2023 7:15 PM, Stefan Lippers-Hollmann wrote:
> Hi
>
> On 2023-02-21, Kemeng Shi wrote:
>> We only call ext4_bg_num_gdb_nometa if there is no meta_bg feature or group
>> does not reach first_meta_bg, so group must not reside in meta group. Then
>> group has a global gdt if superblock exists. Just remove confusing branch
>> that meta_bg feature exists.
>>
>> Signed-off-by: Kemeng Shi <[email protected]>
>> ---
>> fs/ext4/balloc.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
>> index 9b8a32b90ddc..08f1692f7d2f 100644
>> --- a/fs/ext4/balloc.c
>> +++ b/fs/ext4/balloc.c
>> @@ -844,10 +844,7 @@ static unsigned long ext4_bg_num_gdb_nometa(struct super_block *sb,
>> if (!ext4_bg_has_super(sb, group))
>> return 0;
>>
>> - if (ext4_has_feature_meta_bg(sb))
>> - return le32_to_cpu(EXT4_SB(sb)->s_es->s_first_meta_bg);
>> - else
>> - return EXT4_SB(sb)->s_gdb_count;
>> + return EXT4_SB(sb)->s_gdb_count;
>> }
>>
>> /**
>
> This change as part of v6.4-rc6-26-gfb054096aea0 seems to cause a
> regression for mounting a (large, lvm2 backed, grown several times,
> probably also shrunk at least once) ext4 filesystem for me (other,
> smaller/ simpler ext4 filesystems mount fine).
>
Hi Stefan, thank you for let me know the issue. It seems that s_gdb_count
is block number used for both non-meta and meta block group descriptors.
However we should return block number used for non-meta block group in
ext4_bg_num_gdb_nometa.

Sorry for this problem, I will send a revert patch to fix this.
> # LANG= mount /dev/vg-trident/storage /mnt/
> mount: /mnt: mount(2) system call failed: Structure needs cleaning.
> dmesg(1) may have more information after failed mount system call.
>
> [ 569.576241] EXT4-fs (dm-6): ext4_check_descriptors: Inode table for group 2 overlaps block group descriptors
> [ 569.576251] EXT4-fs (dm-6): group descriptors corrupted!
>
> e2fsck (1.47.0-2, Debian/unstable, amd64) does not find any problems
> about this fs.
>
> # fsck -f /dev/vg-trident/storage
> fsck from util-linux 2.38.1
> e2fsck 1.47.0 (5-Feb-2023)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> storage: 1274324/595369984 files (0.4% non-contiguous), 4194120223/4762936320 blocks
>
> Bisecting leads me to this patch:
>
> $ git bisect log
> git bisect start
> # Status: warte auf guten und schlechten Commit
> # good: [457391b0380335d5e9a5babdec90ac53928b23b4] Linux 6.3
> git bisect good 457391b0380335d5e9a5babdec90ac53928b23b4
> # Status: warte auf schlechten Commit, 1 guter Commit bekannt
> # bad: [fb054096aea0576f0c0a61c598e5e9676443ee86] Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad fb054096aea0576f0c0a61c598e5e9676443ee86
> # bad: [6e98b09da931a00bf4e0477d0fa52748bf28fcce] Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect bad 6e98b09da931a00bf4e0477d0fa52748bf28fcce
> # good: [088e0c188513b58a0056a488cf5b7df094a8a48a] Merge tag 'platform-drivers-x86-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
> git bisect good 088e0c188513b58a0056a488cf5b7df094a8a48a
> # good: [ca288965801572fe41386560d4e6c5cc0e5cc56d] Merge tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
> git bisect good ca288965801572fe41386560d4e6c5cc0e5cc56d
> # bad: [94fc0792661a96d64a4bb79cf10d0793ecadf76e] Merge tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
> git bisect bad 94fc0792661a96d64a4bb79cf10d0793ecadf76e
> # good: [4173cf6fb6b7d1b4569cca08af318c4561356fb5] Merge tag 'hwmon-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
> git bisect good 4173cf6fb6b7d1b4569cca08af318c4561356fb5
> # good: [98f99e67a1dc456e9a542584819b2aa265ffc737] Merge tag 'flex-array-transformations-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
> git bisect good 98f99e67a1dc456e9a542584819b2aa265ffc737
> # good: [ba24b8eb3ef676cb7d6cef4a2a53f3624f880d42] crypto: testmgr - Add some test vectors for cmac(camellia)
> git bisect good ba24b8eb3ef676cb7d6cef4a2a53f3624f880d42
> # bad: [4d934a5e6caa6dcdd3fbee7b96fe512a455863b6] ext4: Convert ext4_write_begin() to use a folio
> git bisect bad 4d934a5e6caa6dcdd3fbee7b96fe512a455863b6
> # bad: [78dc9f844f4ec999a30517313040948a4c4bbc00] ext4: use best found when complex scan of group finishs
> git bisect bad 78dc9f844f4ec999a30517313040948a4c4bbc00
> # bad: [b83acc77718644b2c19832226ca284ce01efc550] ext4: remove unused group parameter in ext4_inode_bitmap_csum_verify
> git bisect bad b83acc77718644b2c19832226ca284ce01efc550
> # good: [e6c28a26b799c7640b77daff3e4a67808c74381c] ext4: Fix warnings when freezing filesystem with journaled data
> git bisect good e6c28a26b799c7640b77daff3e4a67808c74381c
> # good: [a38627f14356f505f621b31197fd872b99a10563] ext4: call ext4_bg_num_gdb_[no]meta directly in ext4_num_base_meta_clusters
> git bisect good a38627f14356f505f621b31197fd872b99a10563
> # bad: [f567ea7843562b7d7dfe1e6cfea455a1e9623208] ext4: remove stale comment in ext4_init_block_bitmap
> git bisect bad f567ea7843562b7d7dfe1e6cfea455a1e9623208
>
> # tune2fs -l /dev/vg-trident/storage
> tune2fs 1.47.0 (5-Feb-2023)
> Filesystem volume name: storage
> Last mounted on: /srv/storage
> Filesystem UUID: ad4e7e8d-f375-491f-a5dd-6251d32229e6
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 595369984
> Block count: 4762936320
> Reserved block count: 0
> Overhead clusters: 37571453
> Free blocks: 568816097
> Free inodes: 594095660
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Group descriptor size: 64
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 4096
> Inode blocks per group: 256
> First meta block group: 1453
> Flex block group size: 16
> Filesystem created: Tue Dec 16 03:26:32 2014
> Last mount time: Tue Jun 13 08:54:33 2023
> Last write time: Tue Jun 13 08:55:57 2023
> Mount count: 0
> Maximum mount count: -1
> Last checked: Tue Jun 13 08:55:57 2023
> Check interval: 0 (<none>)
> Lifetime writes: 145 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 3e89594a-1c09-4c46-bb6e-402ad3d697b8
> Journal backup: inode blocks
> Checksum type: crc32c
> Checksum: 0xa57a2ce7
>
> -- dumpe2fs --
>
> dumpe2fs 1.47.0 (5-Feb-2023)
> Filesystem volume name: storage
> Last mounted on: /srv/storage
> Filesystem UUID: ad4e7e8d-f375-491f-a5dd-6251d32229e6
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 595369984
> Block count: 4762936320
> Reserved block count: 0
> Overhead clusters: 37571453
> Free blocks: 568816097
> Free inodes: 594095660
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Group descriptor size: 64
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 4096
> Inode blocks per group: 256
> First meta block group: 1453
> Flex block group size: 16
> Filesystem created: Tue Dec 16 03:26:32 2014
> Last mount time: Tue Jun 13 08:54:33 2023
> Last write time: Tue Jun 13 09:07:45 2023
> Mount count: 0
> Maximum mount count: -1
> Last checked: Tue Jun 13 09:07:45 2023
> Check interval: 0 (<none>)
> Lifetime writes: 145 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 3e89594a-1c09-4c46-bb6e-402ad3d697b8
> Journal backup: inode blocks
> Checksum type: crc32c
> Checksum: 0xf32a54d5
> Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
> Total journal size: 128M
> Total journal blocks: 32768
> Max transaction length: 32768
> Fast commit length: 0
> Journal sequence: 0x0074096a
> Journal start: 0
> Journal checksum type: crc32c
> Journal checksum: 0x478ec5ac
>
> [... cut, the full dumpe2fs is ~60 MB, and even xz -9 compressed
> still ~6 MB large ...]
>
> Reverting (just) ad3f09be6cfe332be8ff46c78e6ec0f8839107aa from
> v6.4-rc6-27-ge328c473340c allows me to mount this filesystem again.
>
> This seems to be the same issue as:
> https://bugzilla.kernel.org/show_bug.cgi?id=217534
> https://lkml.kernel.org/r/<[email protected]>
>
> Regards
> Stefan Lippers-Hollmann
>

--
Best wishes
Kemeng Shi