2022-03-25 19:30:04

by Fariya F

Subject: df returns incorrect size of partition due to huge overhead block count in ext4 partition

My eMMC partition is formatted as ext4 and is about 100 MB in size.
The df -h command lists the size of the partition and the used
percentage as below:
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk2p4   16Z   16Z   79M 100% /data

For your reference, the values returned by statfs64() are

statfs64("/data", 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=1024,
f_blocks=18446744073659310077, f_bfree=87628, f_bavail=80460,
f_files=25688, f_ffree=25189, f_fsid={-1446355608, 1063639410},
f_namelen=255, f_frsize=1024, f_flags=4128}) = 0

dumpe2fs reports the following:


Block count: 102400
Reserved block count: 5120
Overhead blocks: 50343939

As per my kernel (4.9.31) code, f_blocks is computed as the block
count minus the overhead blocks. With the values above the subtraction
yields a negative number, which, stored in an unsigned 64-bit field,
is interpreted as the huge value 18446744073659310077.
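For illustration, here is a minimal stand-alone C snippet (not taken
from the kernel; it simply reuses the dumpe2fs numbers above) showing
how the subtraction wraps around in unsigned 64-bit arithmetic:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t block_count = 102400;    /* Block count from dumpe2fs */
    uint64_t overhead    = 50343939;  /* corrupted Overhead blocks */

    /* The subtraction is done in unsigned arithmetic, so a "negative"
     * result wraps around modulo 2^64 instead of going below zero. */
    uint64_t f_blocks = block_count - overhead;

    printf("f_blocks = %llu\n", (unsigned long long) f_blocks);
    /* Prints 18446744073659310077, matching the statfs64() value. */
    return 0;
}

That wrapped block count, multiplied by the 1 KiB block size, is about
16 ZiB, which is why df shows 16Z for both Size and Used.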

I have a script which monitors the used percentage of the partition
using the df -h command, and when the used percentage is greater than
70%, it deletes files until the used percentage comes down. Since df
is reporting 100% usage all the time, all my files get deleted.

My questions are:

a) Where do the overhead blocks get set?

b) Why is this value huge for my partition, and how can I correct it,
considering fsck is also not correcting this?

Please note fsck on this partition doesn't report any issues at all.

I am also able to create files in this partition.


2022-03-25 22:39:28

by Theodore Ts'o

Subject: Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

On Fri, Mar 25, 2022 at 12:12:30PM +0530, Fariya F wrote:
> dumpe2fs reports the following:
>
> Block count: 102400
> Reserved block count: 5120
> Overhead blocks: 50343939

Yeah, that value is obviously wrong; I'm not sure how it got
corrupted, but that's the cause of your problem.

> a) Where do the overhead blocks get set?

The kernel can calculate the overhead value, but it can be slow for
very large file systems. For that reason, it is cached in the
superblock. So if the s_overhead_clusters field is zero, the kernel will
calculate the overhead value, and then update the superblock.

In newer versions of e2fsprogs, mkfs.ext4 / mke2fs will write the
overhead value into the superblock.
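To illustrate the idea, here is a toy user-space model (made-up names
and numbers, not the actual kernel code; the real logic lives in
fs/ext4/super.c): a non-zero cached value is trusted, otherwise the
overhead is computed once and written back.

#include <stdio.h>
#include <stdint.h>

struct toy_super { uint32_t s_overhead_clusters; };

/* Stand-in for the (potentially slow) full overhead calculation. */
static uint32_t calculate_overhead_slow(void) { return 4387; }

static uint32_t get_overhead(struct toy_super *sb)
{
    if (sb->s_overhead_clusters != 0)
        return sb->s_overhead_clusters;  /* cached value: fast path */
    sb->s_overhead_clusters = calculate_overhead_slow(); /* compute once */
    return sb->s_overhead_clusters;      /* ...and cache the result */
}

int main(void)
{
    struct toy_super sb = { .s_overhead_clusters = 0 };
    printf("first mount:  overhead = %u\n", get_overhead(&sb)); /* computed */
    printf("later mounts: overhead = %u\n", get_overhead(&sb)); /* cached */
    return 0;
}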

> b) Why is this value huge for my partition, and how can I correct it,
> considering fsck is also not correcting this?

The simplest way is to run the following command with the file system
unmounted:

debugfs -w -R "set_super_value overhead_clusters 0" /dev/sdXX

Then the next time you mount the file system, the correct value should
get calculated and filled in.

It's a bug that fsck isn't noticing the problem and correcting it.
I'll work on getting that fixed in a future version of e2fsprogs.

My apologies for the inconvenience.

Cheers,

- Ted

2022-03-28 20:45:31

by Fariya F

Subject: Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

Hi Ted,

Thanks for the response. Really appreciate it. Some questions:

a) This issue is observed on one of the customer boards, so a fix is
a must for us, or at least I will need a work-around so that other
customer boards do not face this issue. As I mentioned, my script
relies on the used percentage in the df -h output. On the board
reporting 16Z for size and used space, the available space is somehow
reported correctly. Should my script rely on available space rather
than on the Use% output of df (see the sketch at the end of this
mail)? Would that be a reliable work-around? Do you see any issue with
continuing to use the partition, or could the overhead block count
create a problem somewhere down the line, so that the partition ends
up misbehaving or some sort of data loss occurs? Data loss would be a
concern for us. Please guide.

(More info on my script: I have a script which monitors the used
percentage of the partition using the df -h command, and when the used
percentage is greater than 70%, it deletes files until the used
percentage comes down. Since df is reporting 100% usage all the time,
all my files get deleted.)

b) Any other suggestions for a work-around so that, even if the
overhead block count reports more blocks than actually exist on the
partition, I am able to use the partition reliably? Or do you think it
would be better to wait for the fix in e2fsprogs?

I think that, apart from the fix in the e2fsprogs tools, a kernel fix
is also required, wherein the kernel checks that the overhead block
count is not greater than the actual number of blocks on the
partition.
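As mentioned above, here is a minimal user-space sketch of the
work-around I am considering (hypothetical code, not my actual
monitoring script; the 30 MB threshold is just an example). It checks
f_bavail directly instead of a percentage derived from f_blocks:

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs st;

    if (statvfs("/data", &st) != 0) {
        perror("statvfs");
        return 1;
    }

    unsigned long long avail_bytes =
        (unsigned long long) st.f_bavail * st.f_frsize;

    /* Start cleaning up when less than 30 MB remains available,
     * instead of comparing a derived used percentage against 70%. */
    if (avail_bytes < 30ULL * 1024 * 1024)
        printf("low space: %llu bytes available, cleanup needed\n",
               avail_bytes);
    else
        printf("ok: %llu bytes available\n", avail_bytes);
    return 0;
}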

Regards


2022-03-29 15:08:32

by Theodore Ts'o

Subject: Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

(Removing linux-fsdevel from the cc list since this is an ext4
specific issue.)

On Mon, Mar 28, 2022 at 09:38:18PM +0530, Fariya F wrote:
> Hi Ted,
>
> Thanks for the response. Really appreciate it. Some questions:
>
> a) This issue is observed on one of the customer boards, so a fix is
> a must for us, or at least I will need a work-around so that other
> customer boards do not face this issue. As I mentioned, my script
> relies on the used percentage in the df -h output. On the board
> reporting 16Z for size and used space, the available space is somehow
> reported correctly. Should my script rely on available space rather
> than on the Use% output of df? Would that be a reliable work-around?
> Do you see any issue with continuing to use the partition, or could
> the overhead block count create a problem somewhere down the line, so
> that the partition ends up misbehaving or some sort of data loss
> occurs? Data loss would be a concern for us. Please guide.

I'm guessing that the problem was caused by a bit-flip in the
superblock, so it was just a matter of a hardware error. What version
of e2fsprogs are you using, and did you have the metadata checksum
(metadata_csum) feature enabled? Depending on where the bit-flip
happened --- e.g., whether it was in memory and the superblock was
then written out, or on the eMMC or other storage device --- the
metadata checksum feature may have caught the superblock error; if so,
it would have detected the issue, and while it would have required a
manual fsck to fix it, at that point fsck would have fallen back to
using the backup superblock version.

> b) Any other suggestions for a work-around so that, even if the
> overhead block count reports more blocks than actually exist on the
> partition, I am able to use the partition reliably? Or do you think it
> would be better to wait for the fix in e2fsprogs?
>
> I think that, apart from the fix in the e2fsprogs tools, a kernel fix
> is also required, wherein the kernel checks that the overhead block
> count is not greater than the actual number of blocks on the
> partition.

Yes, we can certainly have the kernel check to see if the overhead
value is completely insane, and if so, recalculate it (even though it
would slow down the mount).

Another thing we could do is to always recalculate the overhead amount
if the file system is smaller than some arbitrary size, on the theory
that (a) for small file systems, the increased time to mount the file
system will not be noticeable, and (b) embedded and mobile devices are
often where "cost optimized" components (my polite way of saying crappy
quality to save a penny or two in Bill of Materials costs) are most
likely to be found, and so those are where bit flips are more likely.

Cheers,

- Ted

2022-04-12 23:31:03

by Fariya F

Subject: Re: df returns incorrect size of partition due to huge overhead block count in ext4 partition

The e2fsprogs version is 1.42.99. The exact version of the df utility
I am using is 8.25. The Linux kernel is 4.9.31. Please note that the
e2fsprogs ipk package was available as part of the Arago distribution
for the ARM processor I use.

From your email, I understand that the options as of now are:
a) Fix in the fsck tool plus the kernel fix: this is what I am looking
forward to. Could you please help prioritize it?
b) Recalculating the overhead at mount time: is it possible to do this
with some specific mount options? I still think option (a) is what
works best for us.
c) Enabling metadata checksums: this may not be possible for us at the
moment.

Thanks a lot for all your help, Ted. I would appreciate it if you
could prioritize the fix.


2022-04-15 06:28:23

by Theodore Ts'o

Subject: [PATCH 1/3] ext4: fix overhead calculation to account for the reserved gdt blocks

The kernel calculation was underestimating the overhead by not taking
into account the reserved gdt blocks. With this change, the overhead
calculated by the kernel matches the overhead calculation in mke2fs.

Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
---
fs/ext4/super.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f2a5e78f93a9..23a9b2c086ed 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4177,9 +4177,11 @@ static int count_overhead(struct super_block *sb, ext4_group_t grp,
 	ext4_fsblk_t first_block, last_block, b;
 	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
 	int s, j, count = 0;
+	int has_super = ext4_bg_has_super(sb, grp);
 
 	if (!ext4_has_feature_bigalloc(sb))
-		return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, grp) +
+		return (has_super + ext4_bg_num_gdb(sb, grp) +
+			(has_super ? le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) : 0) +
 			sbi->s_itb_per_group + 2);
 
 	first_block = le32_to_cpu(sbi->s_es->s_first_data_block) +
--
2.31.0

2022-04-15 10:58:00

by Theodore Ts'o

Subject: [PATCH 3/3] ext4: update the cached overhead value in the superblock

If we (re-)calculate the file system overhead amount and it's
different from the on-disk s_overhead_clusters value, update the
on-disk version, since the calculation can potentially take quite a
while on bigalloc file systems.

Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
---
fs/ext4/ext4.h | 1 +
fs/ext4/ioctl.c | 16 ++++++++++++++++
fs/ext4/super.c | 2 ++
3 files changed, 19 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 48dc2c3247ad..a743b1e3b89e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3068,6 +3068,7 @@ int ext4_fileattr_set(struct user_namespace *mnt_userns,
 		      struct dentry *dentry, struct fileattr *fa);
 int ext4_fileattr_get(struct dentry *dentry, struct fileattr *fa);
 extern void ext4_reset_inode_seed(struct inode *inode);
+int ext4_update_overhead(struct super_block *sb);
 
 /* migrate.c */
 extern int ext4_ext_migrate(struct inode *);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 992229ca2d83..ba44fa1be70a 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1652,3 +1652,19 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	return ext4_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
 }
 #endif
+
+static void set_overhead(struct ext4_super_block *es, const void *arg)
+{
+	es->s_overhead_clusters = cpu_to_le32(*((unsigned long *) arg));
+}
+
+int ext4_update_overhead(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (sb_rdonly(sb) || sbi->s_overhead == 0 ||
+	    sbi->s_overhead == le32_to_cpu(sbi->s_es->s_overhead_clusters))
+		return 0;
+
+	return ext4_update_superblocks_fn(sb, set_overhead, &sbi->s_overhead);
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d08820fdfdee..1847b46af808 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5618,6 +5618,8 @@ static int ext4_fill_super(struct super_block *sb, struct fs_context *fc)
 	ext4_msg(sb, KERN_INFO, "mounted filesystem with%s. "
 		 "Quota mode: %s.", descr, ext4_quota_mode(sb));
 
+	/* Update the s_overhead_clusters if necessary */
+	ext4_update_overhead(sb);
 	return 0;
 
 free_sbi:
--
2.31.0

2022-04-16 02:18:24

by Theodore Ts'o

Subject: [PATCH 2/3] ext4: force overhead calculation if the s_overhead_clusters value makes no sense

If the file system does not use bigalloc, calculating the overhead is
cheap, so force the recalculation of the overhead so we don't have to
trust the precalculated overhead in the superblock.

Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
---
fs/ext4/super.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 23a9b2c086ed..d08820fdfdee 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5289,9 +5289,18 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * Get the # of file system overhead blocks from the
 	 * superblock if present.
 	 */
-	if (es->s_overhead_clusters)
-		sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
-	else {
+	sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
+	/* ignore the precalculated value if it is ridiculous */
+	if (sbi->s_overhead > ext4_blocks_count(es))
+		sbi->s_overhead = 0;
+	/*
+	 * If the bigalloc feature is not enabled recalculating the
+	 * overhead doesn't take long, so we might as well just redo
+	 * it to make sure we are using the correct value.
+	 */
+	if (!ext4_has_feature_bigalloc(sb))
+		sbi->s_overhead = 0;
+	if (sbi->s_overhead == 0) {
 		err = ext4_calculate_overhead(sb);
 		if (err)
 			goto failed_mount_wq;
--
2.31.0