LinuxLists.cc - e2scrub finds corruption immediately after mounting

2024-01-03 21:14:47

Subject: e2scrub finds corruption immediately after mounting

I am trying to migrate from lvcheck
(https://github.com/BryanKadzban/lvcheck) to using the officially
supported e2scrub[_all] kit.

I am finding that e2scrub very often (much more than lvcheck even)
finds corruption and wants me to do an offline e2fsck. Not only does
it do this immediately after booting a system that includes filesystem
checks (that were caused by e2scrub previously setting a filesystem to
be checked on next boot), but it happens immediately after I run an
e2fsck and then mount the filesystem, even without any activity on it.
Observe:

# umount /opt
# e2fsck -y /dev/rootvol_tmp/almalinux8_opt
e2fsck 1.45.6 (20-Mar-2020)
/dev/mapper/rootvol_tmp-almalinux8_opt: clean, 1698/178816 files, 482404/716800 blocks
# e2scrub /dev/rootvol_tmp/almalinux8_opt
Logical volume "almalinux8_opt.e2scrub" created.
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (86.9% non-contiguous), 482404/716800 blocks
/dev/rootvol_tmp/almalinux8_opt: Scrub succeeded.
tune2fs 1.45.6 (20-Mar-2020)
Setting current mount count to 0
Setting time filesystem last checked to Wed Jan 3 11:37:04 2024

Logical volume "almalinux8_opt.e2scrub" successfully removed.
# mount /opt
# e2scrub /dev/rootvol_tmp/almalinux8_opt
Logical volume "almalinux8_opt.e2scrub" created.
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (86.9% non-contiguous), 482404/716800 blocks
/dev/rootvol_tmp/almalinux8_opt: Scrub FAILED due to corruption! Unmount and run e2fsck -y.
tune2fs 1.45.6 (20-Mar-2020)
Setting filesystem error flag to force fsck.
Logical volume "almalinux8_opt.e2scrub" successfully removed.

So as you can see, I unmount /opt, run an e2fsck -y on it to clean it
and then before mounting run e2scrub and it finds the filesystem clean.
Good so far.

I then mount it and then immediately run another e2scrub on it and that
finds it dirty and wants me to unmount and run another e2fsck -y on it.
But how can that be? Surely an e2scrub on a freshly cleaned and
mounted filesystem (with no activity on it in between) should be clean,
yes?

Cheers,
b.

Attachments:

signature.asc (499.00 B)
This is a digitally signed message part

2024-01-04 04:38:33

by Theodore Ts'o

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, Jan 03, 2024 at 04:14:36PM -0500, Brian J. Murrell wrote:
> I am trying to migrate from lvcheck
> (https://github.com/BryanKadzban/lvcheck) to using the officially
> supported e2scrub[_all] kit.

What distribution are you using, and what version of the kernel are
you using? I note that you are using e2fsprogs 1.45.6, and Debian
Stable is shipping with e2fsprogs 1.47.0.

That being said, this is the first time I've seen any report of an
issue like what you've reported.

> # e2scrub /dev/rootvol_tmp/almalinux8_opt
> Logical volume "almalinux8_opt.e2scrub" created.
> e2fsck 1.45.6 (20-Mar-2020)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (86.9% non-
> contiguous), 482404/716800 blocks
> /dev/rootvol_tmp/almalinux8_opt: Scrub FAILED due to corruption!

This error means that e2fsck exited with a non-zero exit status.
Which is strange because there is no report of any kind of problem
from e2fsck in its output. From the e2scrub script:

check() {
	# First we recover the journal, then we see if e2fsck tries any
	# non-optimization repairs. If either of these two returns a
	# non-zero status (errors fixed or remaining) then this fs is bad.
	E2FSCK_FIXES_ONLY=1
	export E2FSCK_FIXES_ONLY
	${DBG} "@root_sbindir@/e2fsck" -E journal_only -p ${e2fsck_opts} "${snap_dev}" || return $?
	${DBG} "@root_sbindir@/e2fsck" -f -y ${e2fsck_opts} "${snap_dev}"
}

...

check
case "$?" in
"0")
	# Clean check!
	echo "${arg}: Scrub succeeded."
	...

"8")
	# Operational error, what now?
	echo "${arg}: e2fsck operational error."
	...

*)
	# fsck failed. Check if the snapshot is invalid; if so, make a
	# note of that at the end of the log. This isn't necessarily a
	# failure because the mounted fs could have overflowed the
	# snapshot with regular disk writes /or/ our repair process
	# could have done it by repairing too much.
	#
	# If it's really corrupt we ought to fsck at next boot.
	is_invalid="$(lvs -o lv_snapshot_invalid --noheadings "${snap_dev}" | awk '{print $1}')"
	if [ -n "${is_invalid}" ]; then
		echo "${arg}: Scrub FAILED due to invalid snapshot."
		ret=8
	else
		echo "${arg}: Scrub FAILED due to corruption! Unmount and run e2fsck -y."
		mark_corrupt
		ret=6
	fi
	...

My best guess is that e2fsck from 1.45.6 is somehow returning a
non-zero exit status for some reason. So the first thing I'd suggest
is upgrading to e2fsprogs 1.47.0 and see if that causes the problem to
resolve itself.

Cheers,

- Ted

2024-01-04 04:55:46

by Darrick J. Wong

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, Jan 03, 2024 at 04:14:36PM -0500, Brian J. Murrell wrote:
> I am trying to migrate from lvcheck
> (https://github.com/BryanKadzban/lvcheck) to using the officially
> supported e2scrub[_all] kit.
>
> I am finding that e2scrub very often (much more than lvcheck even)
> finds corruption and wants me to do an offline e2fsck. Not only does
> it do this immediately after booting a system that includes filesystem
> checks (that were caused by e2scrub previously setting a filesystem to
> be checked on next boot), but it happens immediately after I run an
> e2fsck and then mount the filesystem, even without any activity on it.
> Observe:
>
> # umount /opt
> # e2fsck -y /dev/rootvol_tmp/almalinux8_opt
> e2fsck 1.45.6 (20-Mar-2020)
> /dev/mapper/rootvol_tmp-almalinux8_opt: clean, 1698/178816 files,
> 482404/716800 blocks
> # e2scrub /dev/rootvol_tmp/almalinux8_opt
> Logical volume "almalinux8_opt.e2scrub" created.
> e2fsck 1.45.6 (20-Mar-2020)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (86.9% non-
> contiguous), 482404/716800 blocks
> /dev/rootvol_tmp/almalinux8_opt: Scrub succeeded.
> tune2fs 1.45.6 (20-Mar-2020)
> Setting current mount count to 0
> Setting time filesystem last checked to Wed Jan 3 11:37:04 2024
>
> Logical volume "almalinux8_opt.e2scrub" successfully removed.
> # mount /opt
> # e2scrub /dev/rootvol_tmp/almalinux8_opt
> Logical volume "almalinux8_opt.e2scrub" created.
> e2fsck 1.45.6 (20-Mar-2020)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (86.9% non-
> contiguous), 482404/716800 blocks
> /dev/rootvol_tmp/almalinux8_opt: Scrub FAILED due to corruption!

Curious. Normally e2scrub will run e2fsck twice: Once in journal-only
preen mode to replay the journal, then again with -fy to perform the
full filesystem (snapshot) check.

I wonder if you would paste the output of
"bash -x e2scrub /dev/rootvol_tmp/almalinux8_opt" here? I'd be curious
to see what the command flow is.

Assuming that 1.47.0 doesn't magically fix it. :)

> Unmount and run e2fsck -y.
> tune2fs 1.45.6 (20-Mar-2020)
> Setting filesystem error flag to force fsck.
> Logical volume "almalinux8_opt.e2scrub" successfully removed.
>
> So as you can see, I unmount /opt, run an e2fsck -y on it to clean it
> and then before mounting run e2scrub and it finds the filesystem clean.
> Good so far.
>
> I then mount it and then immediately run another e2scrub on it and that
> finds it dirty and wants me to unmount and run another e2fsck -y on it.
> But how can that be? Surely an e2scrub on a freshly cleaned and
> mounted filesystem (with no activity on it in between) should be clean,
> yes?

Right. Unless something's broken in e2fsck. :/

--D

>
> Cheers,
> b.
>

2024-01-04 14:10:34

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, 2024-01-03 at 23:38 -0500, Theodore Ts'o wrote:
> What distribution are you using,

EL8, specifically AlmaLinux 8.9.

> and what version of the kernel are
> you using?

EL8 is currently shipping 4.18.0-513.9.1.el8_9.x86_64 but as you know
at this point in an EL8 kernel's life, the version hardly reflects
what's actually in the kernel due to the copious backporting RH do
to their kernel.

> This error means that e2fsck exited with a non-zero exit status.
> Which is strange because there is no report of any kind of problem
> from e2fsck in its output.

Indeed! I even added debug output to e2scrub to print e2fsck's exit
value and it's usually 1.
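
For reference, e2fsck(8) documents its exit status as a bitmask, so a
status of 1 reads as "errors corrected" rather than as a generic
failure. A minimal sketch of a decoder follows; the function name
decode_e2fsck_status is made up for illustration and is not part of
e2scrub.

```shell
#!/bin/sh
# Decode an e2fsck exit status using the meanings listed in e2fsck(8).
# The status is a bitmask, so several conditions can be set at once.
# decode_e2fsck_status is a hypothetical helper, not part of e2fsprogs.
decode_e2fsck_status() {
	status=$1
	if [ "$status" -eq 0 ]; then
		echo "no errors"
		return 0
	fi
	out=""
	for pair in \
		"1:errors corrected" \
		"2:errors corrected, reboot recommended" \
		"4:errors left uncorrected" \
		"8:operational error" \
		"16:usage or syntax error" \
		"32:canceled by user request" \
		"128:shared-library error"
	do
		bit=${pair%%:*}    # number before the colon
		text=${pair#*:}    # description after the colon
		if [ $(( status & bit )) -ne 0 ]; then
			out="${out:+$out; }$text"
		fi
	done
	echo "$out"
}

decode_e2fsck_status 1
```

Since e2scrub treats every status other than 0 and 8 as corruption, a
status of 1 from the second e2fsck pass is enough to trigger the
"Scrub FAILED" branch quoted earlier in the thread.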

> My best guess is that e2fsck from 1.45.6 is somehow returning a
> non-zero exit status for some reason. So the first thing I'd suggest
> is upgrading to e2fsprogs 1.47.0 and see if that causes the problem to
> resolve itself.

Unfortunately, that doesn't seem to be the solution. :-(

+ umount /opt
+ e2fsck -y /dev/rootvol_tmp/almalinux8_opt
e2fsck 1.47.0 (5-Feb-2023)
/dev/rootvol_tmp/almalinux8_opt: clean, 1698/178816 files, 482473/716800 blocks
+ e2scrub /dev/rootvol_tmp/almalinux8_opt
Logical volume "almalinux8_opt.e2scrub" created.
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (87.0% non-contiguous), 482473/716800 blocks
/dev/rootvol_tmp/almalinux8_opt: Scrub succeeded.
tune2fs 1.47.0 (5-Feb-2023)
Setting current mount count to 0
Setting time filesystem last checked to Thu Jan 4 09:07:56 2024

Logical volume "almalinux8_opt.e2scrub" successfully removed.
+ mount /opt
+ e2scrub /dev/rootvol_tmp/almalinux8_opt
Logical volume "almalinux8_opt.e2scrub" created.
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (87.0% non-contiguous), 482473/716800 blocks
/dev/rootvol_tmp/almalinux8_opt: Scrub FAILED due to corruption! Unmount and run e2fsck -y.
tune2fs 1.47.0 (5-Feb-2023)
Setting filesystem error flag to force fsck.
Logical volume "almalinux8_opt.e2scrub" successfully removed.

Cheers,
b.

2024-01-04 14:14:05

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, 2024-01-03 at 20:55 -0800, Darrick J. Wong wrote:
> Curious. Normally e2scrub will run e2fsck twice: Once in journal-only
> preen mode to replay the journal, then again with -fy to perform the
> full filesystem (snapshot) check.

It is doing that. I suspect the first e2fsck is silent.

> I wonder if you would paste the output of
> "bash -x e2scrub /dev/rootvol_tmp/almalinux8_opt" here? I'd be curious
> to see what the command flow is.

Sure.

+ PATH=/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/.dotnet/tools
+ (( 0 != 0 ))
+ snap_size_mb=256
+ fstrim=0
+ reap=0
+ e2fsck_opts=
+ conffile=/etc/e2scrub.conf
+ test -f /etc/e2scrub.conf
+ . /etc/e2scrub.conf
++ periodic_e2scrub=1
++ [email protected]
+ getopts nrtV opt
+ shift 0
+ arg=/dev/rootvol_tmp/almalinux8_opt
+ '[' -z /dev/rootvol_tmp/almalinux8_opt ']'
+ type lsblk
+ type lvcreate
+ exec
+ '[' -b /dev/rootvol_tmp/almalinux8_opt ']'
++ dev_from_arg /dev/rootvol_tmp/almalinux8_opt
++ local dev=/dev/rootvol_tmp/almalinux8_opt
+++ lsblk -o FSTYPE -n /dev/rootvol_tmp/almalinux8_opt
++ local fstype=ext2
++ case "${fstype}" in
++ echo /dev/rootvol_tmp/almalinux8_opt
++ return 0
+ dev=/dev/rootvol_tmp/almalinux8_opt
++ mnt_from_dev /dev/rootvol_tmp/almalinux8_opt
++ local dev=/dev/rootvol_tmp/almalinux8_opt
++ '[' -n /dev/rootvol_tmp/almalinux8_opt ']'
++ lsblk -o MOUNTPOINT -n /dev/rootvol_tmp/almalinux8_opt
+ mnt=/opt
+ '[' '!' -e /dev/rootvol_tmp/almalinux8_opt ']'
++ lvs --nameprefixes -o name,vgname,lv_role --noheadings /dev/rootvol_tmp/almalinux8_opt
+ lvm_vars=' LVM2_LV_NAME='\''almalinux8_opt'\'' LVM2_VG_NAME='\''rootvol_tmp'\'' LVM2_LV_ROLE='\''public'\'''
+ eval ' LVM2_LV_NAME='\''almalinux8_opt'\'' LVM2_VG_NAME='\''rootvol_tmp'\'' LVM2_LV_ROLE='\''public'\'''
++ LVM2_LV_NAME=almalinux8_opt
++ LVM2_VG_NAME=rootvol_tmp
++ LVM2_LV_ROLE=public
+ '[' -z rootvol_tmp ']'
+ '[' -z almalinux8_opt ']'
+ echo public
+ grep -q snapshot
++ date +%Y%m%d%H%M%S
+ start_time=20240104091039
+ snap=almalinux8_opt.e2scrub
+ snap_dev=/dev/rootvol_tmp/almalinux8_opt.e2scrub
+ '[' 0 -gt 0 ']'
+ setup
++ date +%s
+ lvremove_deadline=1704377469
+ lvremove -f rootvol_tmp/almalinux8_opt.e2scrub
+ '[' -e /dev/rootvol_tmp/almalinux8_opt.e2scrub ']'
+ '[' -e /dev/rootvol_tmp/almalinux8_opt.e2scrub ']'
+ lvcreate -s -L 256m -n almalinux8_opt.e2scrub rootvol_tmp/almalinux8_opt
Logical volume "almalinux8_opt.e2scrub" created.
+ '[' 0 -ne 0 ']'
+ udevadm settle
+ return 0
+ trap 'teardown; exit 1' EXIT INT QUIT TERM
+ check
+ E2FSCK_FIXES_ONLY=1
+ export E2FSCK_FIXES_ONLY
+ /usr/sbin/e2fsck -E journal_only -p /dev/rootvol_tmp/almalinux8_opt.e2scrub
+ /usr/sbin/e2fsck -f -y /dev/rootvol_tmp/almalinux8_opt.e2scrub
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rootvol_tmp/almalinux8_opt.e2scrub: 1698/178816 files (87.0% non-contiguous), 482473/716800 blocks
+ case "$?" in
++ lvs -o lv_snapshot_invalid --noheadings /dev/rootvol_tmp/almalinux8_opt.e2scrub
++ awk '{print $1}'
+ is_invalid=
+ '[' -n '' ']'
+ echo '/dev/rootvol_tmp/almalinux8_opt: Scrub FAILED due to corruption! Unmount and run e2fsck -y.'
/dev/rootvol_tmp/almalinux8_opt: Scrub FAILED due to corruption! Unmount and run e2fsck -y.
+ mark_corrupt
+ /usr/sbin/tune2fs -E force_fsck /dev/rootvol_tmp/almalinux8_opt
tune2fs 1.47.0 (5-Feb-2023)
Setting filesystem error flag to force fsck.
+ ret=6
+ teardown
+ lvremove -f rootvol_tmp/almalinux8_opt.e2scrub
Logical volume "almalinux8_opt.e2scrub" successfully removed.
+ '[' -e /dev/rootvol_tmp/almalinux8_opt.e2scrub ']'
+ trap '' EXIT
+ exitcode 6
+ ret=6
+ '[' -n '' -a 6 -ne 0 ']'
+ exit 6

Cheers,
b.

2024-01-04 14:37:47

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

As a point of reference, the aforementioned lvcheck doesn't seem to
find any corruption on the same device and here is what it's doing:

…
+ lvcreate -s -L 256M -n almalinux8_opt-lvcheck-temp-20240104 rootvol_tmp/almalinux8_opt
Logical volume "almalinux8_opt-lvcheck-temp-20240104" created.
+ perform_check /dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104 ext2 /tmp/lvcheck.log.e0Xq523Wio
+ local dev=/dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104
+ local fstype=ext2
+ local tmpfile=/tmp/lvcheck.log.e0Xq523Wio
+ case "$fstype" in
+ nice logsave -as /tmp/lvcheck.log.e0Xq523Wio e2fsck -p -C 0 /dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104
/dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104 contains a file system with errors, check forced.
/dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104: 1698/178816 files (87.0% non-contiguous), 482473/716800 blocks
e2fsck exited with status code 1
+ nice logsave -as /tmp/lvcheck.log.e0Xq523Wio e2fsck -fy -C 0 /dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rootvol_tmp/almalinux8_opt-lvcheck-temp-20240104: 1698/178816 files (87.0% non-contiguous), 482473/716800 blocks
+ return 0
+ log info 'Background scrubbing of /dev/rootvol_tmp/almalinux8_opt succeeded.'
+ local sev=info
+ local 'msg=Background scrubbing of /dev/rootvol_tmp/almalinux8_opt succeeded.'
+ local arg=
+ '[' info == emerg -o info == alert -o info == crit -o info == err -o info == warning ']'
+ logger -t lvcheck -p user.info -- 'Background scrubbing of /dev/rootvol_tmp/almalinux8_opt succeeded.'
+ try_delay_checks /dev/rootvol_tmp/almalinux8_opt ext2
+ local dev=/dev/rootvol_tmp/almalinux8_opt
+ local fstype=ext2
+ case "$fstype" in
+ tune2fs -C 0 -T now /dev/rootvol_tmp/almalinux8_opt
tune2fs 1.47.0 (5-Feb-2023)
Setting current mount count to 0
Setting time filesystem last checked to Thu Jan 4 09:29:25 2024

The significant difference between lvcheck and e2scrub seems to be the
'-E journal_only' option to e2fsck that e2scrub is adding.

Cheers,
b.

2024-01-08 12:53:35

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Thu, 2024-01-04 at 09:13 -0500, Brian J. Murrell wrote:
> On Wed, 2024-01-03 at 20:55 -0800, Darrick J. Wong wrote:
> > Curious. Normally e2scrub will run e2fsck twice: Once in journal-only
> > preen mode to replay the journal, then again with -fy to perform the
> > full filesystem (snapshot) check.
>
> It is doing that. I suspect the first e2fsck is silent.
>
> > I wonder if you would paste the output of
> > "bash -x e2scrub /dev/rootvol_tmp/almalinux8_opt" here? I'd be curious
> > to see what the command flow is.
>
> Sure.

Was the bash -x output useful in any way, or was any of the information
I supplied in my other replies on this list:

https://lore.kernel.org/linux-ext4/[email protected]/
https://lore.kernel.org/linux-ext4/[email protected]/

useful including the test of 1.47.0 being able to reproduce the
behaviour?

Any thoughts on how to proceed?

Cheers,
b.

2024-01-09 06:06:42

by Darrick J. Wong

Subject: Re: e2scrub finds corruption immediately after mounting

On Mon, Jan 08, 2024 at 07:52:33AM -0500, Brian J. Murrell wrote:
> On Thu, 2024-01-04 at 09:13 -0500, Brian J. Murrell wrote:
> > On Wed, 2024-01-03 at 20:55 -0800, Darrick J. Wong wrote:
> > > Curious. Normally e2scrub will run e2fsck twice: Once in
> > > journal-only preen mode to replay the journal, then again with -fy
> > > to perform the full filesystem (snapshot) check.
> >
> > It is doing that. I suspect the first e2fsck is silent.
> >
> > > I wonder if you would paste the output of
> > > "bash -x e2scrub /dev/rootvol_tmp/almalinux8_opt" here? I'd be
> > > curious to see what the command flow is.
> >
> > Sure.
>
> Was the bash -x output useful in any way, or was any of the information
> I supplied in my other replies on this list:
>
> https://lore.kernel.org/linux-ext4/[email protected]/
> https://lore.kernel.org/linux-ext4/[email protected]/
>
> useful including the test of 1.47.0 being able to reproduce the
> behaviour?

It was good and bad -- good in that it eliminated all of my hypotheses
about what could be causing it; and bad in that now I have no idea.

*Something* is causing the e2fsck exit code to be nonzero, but there's
nothing identifying what did that in the stdout/stderr dump.

> Any thoughts on how to proceed?

If you're willing to share a metadata dump of the filesystem, injecting:

e2image -Q "${snap_dev}" /tmp/disk.qcow2

right before the second e2fsck invocation in check() might help us get a
reproducer going. Please compress the qcow2 file before uploading it
somewhere.
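
As a rough illustration of where that line would go, here is a dry-run
sketch of check() with the e2image call added between the two e2fsck
passes. Two things are assumptions made for this sketch only: DBG is
set to plain "echo" so the commands are printed rather than executed,
and the "@root_sbindir@/" prefix from the packaged script is dropped.
The snapshot device name is the one from the bash -x trace.

```shell
#!/bin/sh
# Dry-run sketch: with DBG=echo each command is printed, not executed.
DBG=echo
snap_dev=/dev/rootvol_tmp/almalinux8_opt.e2scrub
e2fsck_opts=""

check() {
	E2FSCK_FIXES_ONLY=1
	export E2FSCK_FIXES_ONLY
	# Pass 1: journal replay only.
	${DBG} e2fsck -E journal_only -p ${e2fsck_opts} "${snap_dev}" || return $?
	# Added for debugging: save a metadata-only image of the snapshot
	# in qcow2 format before the full check touches it.
	${DBG} e2image -Q "${snap_dev}" /tmp/disk.qcow2
	# Pass 2: full forced check.
	${DBG} e2fsck -f -y ${e2fsck_opts} "${snap_dev}"
}

check
```

In the real script the e2image line would be added without changing
DBG, and /tmp needs enough free space for the image.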

--D

> Cheers,
> b.
>

2024-01-10 05:31:42

by Darrick J. Wong

Subject: Re: e2scrub finds corruption immediately after mounting

On Mon, Jan 08, 2024 at 10:06:29PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 08, 2024 at 07:52:33AM -0500, Brian J. Murrell wrote:
> > On Thu, 2024-01-04 at 09:13 -0500, Brian J. Murrell wrote:
> > > On Wed, 2024-01-03 at 20:55 -0800, Darrick J. Wong wrote:
> > > > Curious. Normally e2scrub will run e2fsck twice: Once in
> > > > journal-only preen mode to replay the journal, then again with
> > > > -fy to perform the full filesystem (snapshot) check.
> > >
> > > It is doing that. I suspect the first e2fsck is silent.
> > >
> > > > I wonder if you would paste the output of
> > > > "bash -x e2scrub /dev/rootvol_tmp/almalinux8_opt" here? I'd be
> > > > curious to see what the command flow is.
> > >
> > > Sure.
> >
> > Was the bash -x output useful in any way, or was any of the information
> > I supplied in my other replies on this list:
> >
> > https://lore.kernel.org/linux-ext4/[email protected]/
> > https://lore.kernel.org/linux-ext4/[email protected]/
> >
> > useful including the test of 1.47.0 being able to reproduce the
> > behaviour?
>
> It was good and bad -- good in that it eliminated all of my hypotheses
> about what could be causing it; and bad in that now I have no idea.
>
> *Something* is causing the e2fsck exit code to be nonzero, but there's
> nothing identifying what did that in the stdout/stderr dump.
>
> > Any thoughts on how to proceed?
>
> If you're willing to share a metadata dump of the filesystem, injecting:
>
> e2image -Q "${snap_dev}" /tmp/disk.qcow2
>
> right before the second e2fsck invocation in check() might help us get a
> reproducer going. Please compress the qcow2 file before uploading it
> somewhere.

/me downloads dump, takes a look...

AHA! This is an ext2 filesystem, since it doesn't have the
"has_journal" or "extents" features turned on:

# e2image -r /tmp/disk.qcow2 /dev/sda
# dumpe2fs /dev/sda -h
dumpe2fs 1.47.1~WIP-2023-12-27 (27-Dec-2023)
Filesystem volume name: <none>
Last mounted on: /opt
Filesystem UUID: 2c70368a-0d54-4805-8620-fda19466d819
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr resize_inode dir_index filetype sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: not clean with errors

(Note: Filesystem state == "clean" means that EXT2_VALID_FS is set in
the superblock s_state field; "not clean with errors" means that the
flag is not set.)
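
To make the wording concrete, here is a small sketch of how the two
s_state bits map onto the strings dumpe2fs prints. The bit values are
the ones in e2fsprogs' ext2_fs.h as far as I know; the function name
fs_state_string is invented for this example. The "with errors" part
comes from the separate EXT2_ERROR_FS bit, which is what
"tune2fs -E force_fsck" sets.

```shell
#!/bin/sh
# Superblock s_state bits (values as found in e2fsprogs' ext2_fs.h).
EXT2_VALID_FS=1   # 0x0001: filesystem was unmounted cleanly
EXT2_ERROR_FS=2   # 0x0002: errors have been detected

# Render an s_state value in dumpe2fs-style wording.
# fs_state_string is a hypothetical helper for illustration only.
fs_state_string() {
	state=$1
	if [ $(( state & EXT2_VALID_FS )) -ne 0 ]; then
		text="clean"
	else
		text="not clean"
	fi
	if [ $(( state & EXT2_ERROR_FS )) -ne 0 ]; then
		text="$text with errors"
	fi
	echo "$text"
}

fs_state_string 1   # both e2fsck passes happy
fs_state_string 0   # VALID_FS cleared, no error flag
fs_state_string 2   # VALID_FS cleared and error flag set
```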

I bet the "journal only" preen doesn't actually reset the filesystem
state either:

# e2fsck -E journal_only -p /dev/sda
# dumpe2fs /dev/sda -h | grep state
dumpe2fs 1.47.1~WIP-2023-12-27 (27-Dec-2023)
Filesystem state: not clean with errors

Nope.

So now I know what happened -- when mounting an ext* filesystem that
doesn't have a journal, the driver clears EXT2_VALID_FS from the primary
superblock. This forces the system to run e2fsck after a crash, because
that's what you have to do for unjournalled filesystems.

The "e2fsck -E journal_only -p" call in e2scrub only replays the
journal. Since there is no journal, it exits almost immediately.
That's the intended behavior, but then it means that the "e2fsck -fy"
call immediately after sees that the superblock doesn't have
EXT2_VALID_FS set, sets it, and makes e2fsck return 1.

So that's why you're getting the e2scrub failures.

Contrast this to what you get when the filesystem has a journal:

# dumpe2fs -h /dev/sdb
dumpe2fs 1.47.0 (5-Feb-2023)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: e18b8b57-a75e-4316-87ce-6a08969476c3
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean

Filesystems with journals retain their EXT4_VALID_FS state when they're
mounted.

Hmm. I'll have to think tomorrow morning about what e2scrub should do
about unjournalled filesystems. My initial thought is that it should skip
them, because a mounted unjournalled filesystem cannot by definition be
made to be consistent.
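
One way such a skip could look, purely as a sketch and not as what
e2scrub does or will do: read the "Filesystem features:" line from
"dumpe2fs -h" output and decline to scrub when has_journal is absent.
The function names has_journal and scrub_decision are hypothetical,
and the sample input is the feature line from the dump above.

```shell
#!/bin/sh
# Return 0 if the "Filesystem features:" line of `dumpe2fs -h` output
# (read from stdin) lists has_journal, 1 otherwise.
has_journal() {
	features=$(sed -n 's/^Filesystem features:[[:space:]]*//p')
	case " $features " in
		*" has_journal "*) return 0 ;;
		*) return 1 ;;
	esac
}

# Hypothetical guard: prints the decision instead of running a scrub.
scrub_decision() {
	if has_journal; then
		echo "scrub"
	else
		echo "skip: no journal"
	fi
}

# Feature line taken from the metadata dump discussed in this thread.
printf 'Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file\n' |
	scrub_decision
```

On a live system the input would come from something like
dumpe2fs -h "$dev" 2>/dev/null | scrub_decision
with $dev being the logical volume to check.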

Restricting the scope of e2scrub sucks, but in the meantime at least it
means that your filesystem isn't massively corrupt. Thanks for the
metadump, it was very useful for root cause analysis.

Ted: do you have any ideas?

--D

> --D
>
> > Cheers,
> > b.
> >
>
>
>

2024-01-10 13:46:05

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Tue, 2024-01-09 at 21:31 -0800, Darrick J. Wong wrote:
>
> AHA! This is an ext2 filesystem, since it doesn't have the
> "has_journal" or "extents" features turned on:

This is very odd. I haven't (intentionally) created an ext2 filesystem
since ext3 became available. :-)

Moreover /proc/mounts says it's an ext4 filesystem:

/dev/mapper/rootvol_tmp-almalinux8_opt /opt ext4 rw,seclabel,relatime 0 0

Do ext2 filesystems actually mount successfully and quietly when
mounted as ext4? Surely if one asks to mount an ext2 filesystem as
ext4 mount should fail and complain, yes?

Is https://ext4.wiki.kernel.org/index.php/UpgradeToExt4 still
considered accurate, in terms of an in-place upgrade of ext2 to ext4
being sub-optimal?

Is metadata locality the only thing you don't get with an in-place
upgrade? If so, how important is that, really?

> Thanks for the
> metadump, it was very useful for root cause analysis.

NPAA. Thank-you very much for your time and analysis on this issue.

Cheers,
b.

2024-01-10 18:06:18

by Darrick J. Wong

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, Jan 10, 2024 at 08:44:31AM -0500, Brian J. Murrell wrote:
> On Tue, 2024-01-09 at 21:31 -0800, Darrick J. Wong wrote:
> >
> > AHA! This is an ext2 filesystem, since it doesn't have the
> > "has_journal" or "extents" features turned on:
>
> This is very odd. I haven't (intentionally) created an ext2 filesystem
> since ext3 became available. :-)

Huh. Do you remember the exact command that was used to format this
filesystem? "mke2fs" still formats ext2 filesystems unless you pass
-T ext4 or call its cousin mkfs.ext4.

> Moreover /proc/mounts says it's an ext4 filesystem:
>
> /dev/mapper/rootvol_tmp-almalinux8_opt /opt ext4 rw,seclabel,relatime 0 0

Check /etc/fstab -- if the type is specified as ext4, then that's what
ends up in /proc/mounts, even if it's an ext2 filesystem.

> Do ext2 filesystems actually mount successfully and quietly when
> mounted as ext4?

Yes. Most distros enable ext4.ko and do not enable ext2.ko, and the
ext4 driver is happy to mount ext2 filesystems but report them as ext4.

> Surely if one asks to mount an ext2 filesystem as ext4 mount should
> fail and complain, yes?

Nope. ext4 is really just ext2 plus a bunch of new features (journal,
extents, uninit_bg, dir_index). Or another way to look at it is that
ext2 is really just ext4 minus a bunch of features.

Muddying the water here is the fact that you're allowed to turn /off/
all these new features from the past 20 years, which means that the
integer after "ext" is not actually a gestalt id.
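
A rough way to see this is to derive the "ext number" from the feature
list alone, which is also why lsblk reported FSTYPE=ext2 in the
bash -x trace while /proc/mounts said ext4. The heuristic below is only
a sketch: the function name ext_flavor and the exact set of features it
treats as ext4-only are assumptions, not the logic of any real probing
library.

```shell
#!/bin/sh
# Guess an "ext generation" label from a space-separated feature list.
# ext_flavor is a hypothetical helper; the feature sets are a rough
# approximation chosen for this example.
ext_flavor() {
	case " $1 " in
		*" extent "*|*" flex_bg "*|*" huge_file "*|*" 64bit "*|*" metadata_csum "*)
			echo ext4 ;;
		*" has_journal "*)
			echo ext3 ;;
		*)
			echo ext2 ;;
	esac
}

# The filesystem from the metadata dump in this thread.
ext_flavor "ext_attr resize_inode dir_index filetype sparse_super large_file"
# The journalled example shown for contrast.
ext_flavor "has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file"
```

Turning features off changes the label without changing which kernel
driver mounts the filesystem, which is the point being made above.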

> Is https://ext4.wiki.kernel.org/index.php/UpgradeToExt4 still
> considered accurate, in terms of an in-place upgrade of ext2 to ext4
> being sub-optimal?

Yes, that's accurate. It's suboptimal in the sense that you ought to
back up the directory tree before running any of those commands in case
something goes wrong (program bug, power outage, etc) but if you have a
backup, you might as well format fresh and restore the backup.

> Is metadata locality the only thing you don't get with an in-place
> upgrade? If so, how important is that, really?

IIRC I think you don't get flex_bg, which means that the bitmaps are
every 128M instead of every 1G or so, which leads to more seeking.

> > Thanks for the
> > metadump, it was very useful for root cause analysis.
>
> NPAA. Thank-you very much for your time and analysis on this issue.

No problem. It's always fun to do a bit of Why, Tho? ;)

--D

>
> Cheers,
> b.
>

2024-01-10 23:43:36

by Andreas Dilger

Subject: Re: e2scrub finds corruption immediately after mounting

On Jan 10, 2024, at 11:06 AM, Darrick J. Wong <[email protected]> wrote:
>
> On Wed, Jan 10, 2024 at 08:44:31AM -0500, Brian J. Murrell wrote:
>> On Tue, 2024-01-09 at 21:31 -0800, Darrick J. Wong wrote:
>>>
>>> AHA! This is an ext2 filesystem, since it doesn't have the
>>> "has_journal" or "extents" features turned on:
>>
>> This is very odd. I haven't (intentionally) created an ext2 filesystem
>> since ext3 became available. :-)
>
> Huh. Do you remember the exact command that was used to format this
> filesystem? "mke2fs" still formats ext2 filesystems unless you pass
> -T ext4 or call its cousin mkfs.ext4.
>
>> Moreover /proc/mounts says it's an ext4 filesystem:
>>
>> /dev/mapper/rootvol_tmp-almalinux8_opt /opt ext4 rw,seclabel,relatime 0 0
>
> Check /etc/fstab -- if the type is specified as ext4, then that's what
> ends up in /proc/mounts, even if it's an ext2 filesystem.
>
>> Do ext2 filesystems actually mount successfully and quietly when
>> mounted as ext4?
>
> Yes. Most distros enable ext4.ko and do not enable ext2.ko, and the
> ext4 driver is happy to mount ext2 filesystems but report them as ext4.
>
>> Surely if one asks to mount an ext2 filesystem as ext4 mount should
>> fail and complain, yes?
>
> Nope. ext4 is really just ext2 plus a bunch of new features (journal,
> extents, uninit_bg, dir_index). Or another way to look at it is that
> ext2 is really just ext4 minus a bunch of features.
>
> Muddying the water here is the fact that you're allowed to turn /off/
> all these new features from the past 20 years, which means that the
> integer after "ext" is not actually a gestalt id.
>
>> Is https://ext4.wiki.kernel.org/index.php/UpgradeToExt4 still
>> considered accurate, in terms of an in-place upgrade of ext2 to ext4
>> being sub-optimal?
>
> Yes, that's accurate. It's suboptimal in the sense that you ought to
> back up the directory tree before running any of those commands in case
> something goes wrong (program bug, power outage, etc) but if you have a
> backup, you might as well format fresh and restore the backup.
>
>> Is metadata locality the only thing you don't get with an in-place
>> upgrade? If so, how important is that, really?
>
> IIRC I think you don't get flex_bg, which means that the bitmaps are
> every 128M instead of every 1G or so, which leads to more seeking.
>
>>> Thanks for the
>>> metadump, it was very useful for root cause analysis.
>>
>> NPAA. Thank-you very much for your time and analysis on this issue.
>
> No problem. It's always fun to do a bit of Why, Tho? ;)

Hello Brian, long time no see!

I was wondering if this might be a case where e2fsck removed the journal
on an ext4 filesystem, and then it wasn't recreated (e.g. if e2fsck was
killed before it finished cleanly).

However, looking at the features enabled on the filesystem, it definitely
looks like this was originally formatted as ext2. Like Darrick mentioned,
it is missing flex_bg, along with a whole slew of newer features. On one
of my local ext4 filesystems it has:

Filesystem features: has_journal ext_attr resize_inode dir_index
filetype extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize

compared to your filesystem:

Filesystem features: ext_attr resize_inode dir_index filetype
sparse_super large_file
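
The difference between the two lists can be worked out mechanically. The
following is a minimal sketch, assuming the two "Filesystem features:" lines
above have been copied into shell variables by hand; the function name
missing_features is made up for illustration. On a live system the list
would normally come from "dumpe2fs -h" or "tune2fs -l" output.

```shell
#!/bin/sh
# Feature lists copied from the two "Filesystem features:" lines above.
reference="has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize"
actual="ext_attr resize_inode dir_index filetype sparse_super large_file"

# missing_features REFERENCE ACTUAL   (hypothetical helper)
# Prints, one per line, each feature in REFERENCE that is absent from ACTUAL.
missing_features() {
    for feature in $1; do
        case " $2 " in
            *" $feature "*) ;;                 # present in both lists
            *) printf '%s\n' "$feature" ;;     # only in the reference list
        esac
    done
}

missing_features "$reference" "$actual"
```

For these two lists it prints has_journal, extent, flex_bg, huge_file,
uninit_bg, dir_nlink and extra_isize, one per line.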

Many of these features can be enabled on an existing filesystem, like
has_journal (ext3/4 journal), extents (improved large file allocation),
huge_file (> 2TB files), dir_nlink (> 32000 subdirs) if you want them.
I _think_ uninit_bg (e2fsck may skip unused metadata) is included here.

Some cannot be enabled on an existing filesystem, like flex_bg (localized
metadata) and extra_isize (fast xattrs).

Whether that is worthwhile for you to enable, or just backup/reformat/sync
is up to you.
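
For the in-place route, a rough sketch of what the commands might look like
follows. It only prints the commands rather than running them. The device
path and the helper name are placeholders, and the feature set is simply
the list of features named above as possible to enable; treat it as an
assumption, not a tested recipe. The filesystem should be unmounted and
backed up first, and a forced e2fsck afterwards is the usual follow-up.

```shell
#!/bin/sh
# Placeholder device path; substitute the real logical volume.
device=/dev/VG/LV

# enable_features_cmds DEVICE FEATURE...   (hypothetical helper)
# Prints, without executing, a tune2fs command that turns on the given
# features, followed by a forced e2fsck of the same device.
enable_features_cmds() {
    dev=$1
    shift
    # Join the remaining arguments with commas, as "tune2fs -O" expects.
    features=$(IFS=,; printf '%s' "$*")
    printf 'tune2fs -O %s %s\n' "$features" "$dev"
    printf 'e2fsck -f %s\n' "$dev"
}

enable_features_cmds "$device" has_journal extent huge_file dir_nlink uninit_bg
```

Printing instead of executing keeps the sketch safe to run anywhere; the
output can be reviewed and then pasted into a root shell.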

Cheers, Andreas

2024-01-16 13:22:22

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, 2024-01-10 at 10:06 -0800, Darrick J. Wong wrote:
>
> Huh. Do you remember the exact command that was used to format this
> filesystem?

I do not. It was created quite a while ago.

> "mke2fs" still formats ext2 filesystems unless you pass
> -T ext4 or call its cousin mkfs.ext4.

I wonder if that's what I did perhaps.

> Nope. ext4 is really just ext2 plus a bunch of new features
> (journal,
> extents, uninit_bg, dir_index).

Yes, that's completely understood. I would have thought it an
interesting "safety" measure to refuse the mount when a user requests
ext4 and the file system is actually only ext2. The refusal would tell
the user that their ext* file system does not have the features
required to be called ext4.
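
As an illustration only, such a check could be approximated from the feature
list alone. The sketch below is a heuristic and not what the kernel or mount
does: it assumes that has_journal marks ext3 and that any of the newer
features named in this thread marks ext4. The function name ext_generation
is made up.

```shell
#!/bin/sh
# ext_generation "FEATURE LIST"   (hypothetical helper)
# Prints ext2, ext3 or ext4 based on a space-separated feature list.
# Assumption: the ext4-only set below is just the features named in this
# thread, not an authoritative list.
ext_generation() {
    gen=ext2
    for feature in $1; do
        case "$feature" in
            has_journal)
                # A journal alone upgrades ext2 to ext3, never downgrades ext4.
                if [ "$gen" = ext2 ]; then gen=ext3; fi ;;
            extent|flex_bg|huge_file|uninit_bg|dir_nlink|extra_isize)
                gen=ext4 ;;
        esac
    done
    printf '%s\n' "$gen"
}

ext_generation "ext_attr resize_inode dir_index filetype sparse_super large_file"
```

With the feature list from the filesystem in question this prints ext2, which
is the case a stricter mount could warn about or refuse.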

Cheers,
b.

2024-01-16 13:29:48

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, 2024-01-10 at 16:43 -0700, Andreas Dilger wrote:
>
> Hello Brian, long time no see!

Hi Andreas. Indeed, it has been a while. :-)

> I was wondering if this might be a case where e2fsck removed the
> journal
> on an ext4 filesystem, and then it wasn't recreated (e.g. if e2fsck
> was
> killed before it finished cleanly).
>
> However, looking at the features enabled on the filesystem, it
> definitely
> looks like this was originally formatted as ext4.

I suspect you mean s/4/2/ above?

> Many of these features can be enabled on an existing filesystem, like
> has_journal (ext3/4 journal), extents (improved large file
> allocation),
> huge_file (> 2TB files), dir_nlink (> 32000 subdirs) if you want
> them.
> I _think_ uninit_bg (e2fsck may skip unused metadata) is included
> here.
>
> Some cannot be enabled on an existing filesystem like flex_bg
> (localized
> metadata), and extra_isize (fast xattrs).
>
> Whether that is worthwhile for you to enable, or just
> backup/reformat/sync
> is up to you.

Indeed. I did simply re-create and copy as suggested by the wiki
entry.

Cheers,
b.

2024-01-17 19:42:36

by Andreas Dilger

Subject: Re: e2scrub finds corruption immediately after mounting

On Jan 16, 2024, at 6:29 AM, Brian J. Murrell wrote:
>
> On Wed, 2024-01-10 at 10:06 -0800, Darrick J. Wong wrote:
>>
>> Huh. Do you remember the exact command that was used to format this
>> filesystem?
>
> I do not. It was created quite a while ago.
>
>> "mke2fs" still formats ext2 filesystems unless you pass
>> -T ext4 or call its cousin mkfs.ext4.
>
> I wonder if that's what I did perhaps.
>
>
>> Nope. ext4 is really just ext2 plus a bunch of new features
>> (journal,
>> extents, uninit_bg, dir_index).
>
> Yes, that's completely understood. I would have thought it an
> interesting "safety" measure to flag that when a user requests an ext4
> mount and the file system is actually only ext2 that a refusal to mount
> would indicate to the user that their ext* file system does not have
> the required features to be called ext4.

At this stage in the game, it _probably_ makes sense that bare "mke2fs"
default to ext4 instead of ext2 to avoid this issue?

Cheers, Andreas

2024-01-17 22:29:59

by Brian J. Murrell

Subject: Re: e2scrub finds corruption immediately after mounting

On Wed, 2024-01-17 at 12:42 -0700, Andreas Dilger wrote:
>
> At this stage in the game, it _probably_ makes sense that bare
> "mke2fs"
> default to ext4 instead of ext2 to avoid this issue?

Seems reasonable to me. :-)

Cheers,
b.
