2020-07-13 22:27:50

by Nathan Royce

Subject: F2FS Segmentation Fault

I won't re-format unless I hear something within a few days in case
you want me to try something.

Preface: There was a notable power outage a couple of nights ago.
When the power returned, everything seemed fine. No issues during
bootup or anything.
Then today, I went to open an application and my system started
schitzing out with programs suddenly closing(/crashing?).
I switched tty and tried to log in but was unable to even be allowed
to enter in my password.
I switched to another and tried logging in as root which succeeded (somehow).
I looked at the journal and saw an entry saying something about
/bin/login not being a valid exec format.
I went to reboot and when it got to fsck part of initramfs, it failed
and I was kicked to root.
I ran fsck and saw a bunch of issues, but I guess nothing could get
resolved enough to let me reboot.
Oh, in case you're wondering, my / (system) is on a 64GB SDHC card.
I just happened to also have an older / system on my mechanical drive
using BTRFS which I could boot to (which I'm on now).
I ran fsck from this older system and it seems I got the same results:

*****
Info: Fix the reported corruption.
Info: Force to fix corruption
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 124168159 (60628 MB)
Info: MKFS version
"Linux version 5.1.15.a-1-hardened ([email protected]) (gcc version
9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019"
Info: FSCK version
from "Linux version 4.19.13-dirty ([email protected]) (gcc
version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST
2018"
to "Linux version 4.19.13-dirty ([email protected]) (gcc
version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST
2018"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 124168152 (60628 MB)
Info: CKPT version = 63f2b4a
Info: checkpoint state = 55 : crc fsck compacted_summary unmount

NID[0x18eca] is unreachable, blkaddr:0xcf1d9d3c
NID[0x18ecb] is unreachable, blkaddr:0x5db5f91f
NID[0x18ecc] is unreachable, blkaddr:0x4653d
NID[0x18ee3] is unreachable, blkaddr:0x144dc401
NID[0x18ee4] is unreachable, blkaddr:0x558cfba9
NID[0x18ee5] is unreachable, blkaddr:0x45553
NID[0x18f78] is unreachable, blkaddr:0x560555ac
NID[0x18f79] is unreachable, blkaddr:0x58cccb0d
NID[0x18f7a] is unreachable, blkaddr:0x53d84
NID[0x4d621] is unreachable, blkaddr:0x4fc1d
NID[0x4d622] is unreachable, blkaddr:0x4fc1e
NID[0x7fa32] is unreachable, blkaddr:0x20b0ca3a
NID[0x7fa33] is unreachable, blkaddr:0xf71b60
[FSCK] Unreachable nat entries [Fail] [0xd]
[FSCK] SIT valid block bitmap checking [Fail]
[FSCK] Hard link checking for regular file [Ok..] [0x4f6]
[FSCK] valid_block_count matching with CP [Fail] [0x736fcb]
[FSCK] valid_node_count matcing with CP (de lookup) [Fail] [0x70327]
[FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0x70334]
[FSCK] valid_inode_count matched with CP [Fail] [0x6f09e]
[FSCK] free segment_count matched with CP [Ok..] [0x3bfc]
[FSCK] next block offset is free [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs [Fail]

Do you want to restore lost files into ./lost_found/? [Y/N] Y
Segmentation fault
*****

*****
Message: Process 3425 (fsck.f2fs) of user 0 dumped core.

Stack trace of thread 3425:
#0 0x000055f8515739c8 n/a (fsck.f2fs)
#1 0x000055f851575261 n/a (fsck.f2fs)
#2 0x000055f851572c56 n/a (fsck.f2fs)
#3 0x000055f85156a3f0 n/a (fsck.f2fs)
#4 0x00007f51420feee3 __libc_start_main (libc.so.6)
#5 0x000055f85156a95e n/a (fsck.f2fs)
*****

So if you want more information or need me to try something, let me
know soon if you would. Otherwise, I'll just be reformatting my card
in a few days.
It could just have been a fluke caused by the power outage that
didn't manifest itself until today.
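
For readers following the log above, here is a minimal sketch of how the fsck.f2fs summary could be tallied. It is not part of f2fs-tools; the function name `summarize` and the regular expressions are assumptions made for illustration, based only on the line shapes visible in the quoted output. It collects the checks marked [Fail] and the unreachable NIDs, whose count (13) agrees with the [0xd] reported on the "Unreachable nat entries" line.

```python
import re

# Lines copied from the fsck.f2fs output quoted above (a subset of the checks).
LOG = """\
NID[0x18eca] is unreachable, blkaddr:0xcf1d9d3c
NID[0x18ecb] is unreachable, blkaddr:0x5db5f91f
NID[0x18ecc] is unreachable, blkaddr:0x4653d
NID[0x18ee3] is unreachable, blkaddr:0x144dc401
NID[0x18ee4] is unreachable, blkaddr:0x558cfba9
NID[0x18ee5] is unreachable, blkaddr:0x45553
NID[0x18f78] is unreachable, blkaddr:0x560555ac
NID[0x18f79] is unreachable, blkaddr:0x58cccb0d
NID[0x18f7a] is unreachable, blkaddr:0x53d84
NID[0x4d621] is unreachable, blkaddr:0x4fc1d
NID[0x4d622] is unreachable, blkaddr:0x4fc1e
NID[0x7fa32] is unreachable, blkaddr:0x20b0ca3a
NID[0x7fa33] is unreachable, blkaddr:0xf71b60
[FSCK] Unreachable nat entries [Fail] [0xd]
[FSCK] SIT valid block bitmap checking [Fail]
[FSCK] Hard link checking for regular file [Ok..] [0x4f6]
[FSCK] valid_block_count matching with CP [Fail] [0x736fcb]
[FSCK] next block offset is free [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs [Fail]
"""

# Hypothetical patterns, inferred from the log format shown in this thread.
CHECK_RE = re.compile(r"^\[FSCK\]\s+(?P<name>.+?)\s+\[(?P<status>Fail|Ok\.\.)\]")
NID_RE = re.compile(r"^NID\[(0x[0-9a-f]+)\] is unreachable")


def summarize(log):
    """Return (names of failed checks, list of unreachable NIDs as ints)."""
    failed, unreachable = [], []
    for line in log.splitlines():
        check = CHECK_RE.match(line)
        if check:
            if check.group("status") == "Fail":
                failed.append(check.group("name"))
            continue
        nid = NID_RE.match(line)
        if nid:
            unreachable.append(int(nid.group(1), 16))
    return failed, unreachable


failed, unreachable = summarize(LOG)
print("failed checks:", failed)
print("unreachable NIDs:", len(unreachable))
```

Lines without a status bracket, such as "fixing SIT types", are skipped rather than counted as failures.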


2020-07-14 00:06:50

by Jaegeuk Kim

Subject: Re: F2FS Segmentation Fault

Hi Nathan,

On 07/13, Nathan Royce wrote:
> I won't re-format unless I hear something within a few days in case
> you want me to try something.
>
> Preface: There was a notable power outage a couple of nights ago.
> When the power returned, everything seemed fine. No issues during
> bootup or anything.
> Then today, I went to open an application and my system started
> schitzing out with programs suddenly closing(/crashing?).
> I switched tty and tried to log in but was unable to even be allowed
> to enter in my password.
> I switched to another and tried logging in as root which succeeded (somehow).
> I looked at the journal and saw an entry saying something about
> /bin/login not being a valid exec format.
> I went to reboot and when it got to fsck part of initramfs, it failed
> and I was kicked to root.
> I ran fsck and saw a bunch of issues, but I guess nothing could get
> resolved enough to let me reboot.
> Oh, in case you're wondering, my / (system) is on a 64GB SDHC card.
> I just happened to also have an older / system on my mechanical drive
> using BTRFS which I could boot to (which I'm on now).
> I ran fsck from this older system and it seems I got the same results:
>
> *****
> Info: Fix the reported corruption.
> Info: Force to fix corruption
> Info: Segments per section = 1
> Info: Sections per zone = 1
> Info: sector size = 512
> Info: total sectors = 124168159 (60628 MB)
> Info: MKFS version
> "Linux version 5.1.15.a-1-hardened ([email protected]) (gcc version
> 9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019"
> Info: FSCK version
> from "Linux version 4.19.13-dirty ([email protected]) (gcc
> version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST
> 2018"
> to "Linux version 4.19.13-dirty ([email protected]) (gcc
> version 8.2.1 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST
> 2018"
> Info: superblock features = 0 :
> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
> Info: total FS sectors = 124168152 (60628 MB)
> Info: CKPT version = 63f2b4a
> Info: checkpoint state = 55 : crc fsck compacted_summary unmount
>
> NID[0x18eca] is unreachable, blkaddr:0xcf1d9d3c
> NID[0x18ecb] is unreachable, blkaddr:0x5db5f91f
> NID[0x18ecc] is unreachable, blkaddr:0x4653d
> NID[0x18ee3] is unreachable, blkaddr:0x144dc401
> NID[0x18ee4] is unreachable, blkaddr:0x558cfba9
> NID[0x18ee5] is unreachable, blkaddr:0x45553
> NID[0x18f78] is unreachable, blkaddr:0x560555ac
> NID[0x18f79] is unreachable, blkaddr:0x58cccb0d
> NID[0x18f7a] is unreachable, blkaddr:0x53d84
> NID[0x4d621] is unreachable, blkaddr:0x4fc1d
> NID[0x4d622] is unreachable, blkaddr:0x4fc1e
> NID[0x7fa32] is unreachable, blkaddr:0x20b0ca3a
> NID[0x7fa33] is unreachable, blkaddr:0xf71b60
> [FSCK] Unreachable nat entries [Fail] [0xd]
> [FSCK] SIT valid block bitmap checking [Fail]
> [FSCK] Hard link checking for regular file [Ok..] [0x4f6]
> [FSCK] valid_block_count matching with CP [Fail] [0x736fcb]
> [FSCK] valid_node_count matcing with CP (de lookup) [Fail] [0x70327]
> [FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0x70334]
> [FSCK] valid_inode_count matched with CP [Fail] [0x6f09e]
> [FSCK] free segment_count matched with CP [Ok..] [0x3bfc]
> [FSCK] next block offset is free [Ok..]
> [FSCK] fixing SIT types
> [FSCK] other corrupted bugs [Fail]
>
> Do you want to restore lost files into ./lost_found/? [Y/N] Y

Could you try to say "N" here to move forward to fix the corrupted metadata?

Thanks,

> Segmentation fault
> *****
>
> *****
> Message: Process 3425 (fsck.f2fs) of user 0 dumped core.
>
> Stack trace of thread 3425:
> #0 0x000055f8515739c8 n/a (fsck.f2fs)
> #1 0x000055f851575261 n/a (fsck.f2fs)
> #2 0x000055f851572c56 n/a (fsck.f2fs)
> #3 0x000055f85156a3f0 n/a (fsck.f2fs)
> #4 0x00007f51420feee3 __libc_start_main (libc.so.6)
> #5 0x000055f85156a95e n/a (fsck.f2fs)
> *****
>
> So if you want more information or need me to try something, let me
> know soon if you would. Otherwise, I'll just be reformatting my card
> in a few days.
> It could just have been a fluke caused by the power outage that
> didn't manifest itself until today.

2020-07-14 02:27:24

by Nathan Royce

Subject: Re: F2FS Segmentation Fault

On Mon, Jul 13, 2020 at 7:03 PM Jaegeuk Kim <[email protected]> wrote:
>
> Hi Nathan,
>
> Could you try to say "N" here to move forward to fix the corrupted metadata?
>
> Thanks,
*****
Do you want to restore lost files into ./lost_found/? [Y/N] N
Info: Write valid nat_bits in checkpoint
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18eca] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ecb] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ecc] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee3] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee4] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee5] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f78] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f79] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f7a] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x4d621] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x4d622] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x7fa32] in NAT
[FIX] (nullify_nat_entry:2273) --> Remove nid [0x7fa33] in NAT
Info: Write valid nat_bits in checkpoint

Done.
*****

*****
Info: Fix the reported corruption.
Info: Force to fix corruption
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 124168159 (60628 MB)
Info: MKFS version
"Linux version 5.1.15.a-1-hardened ([email protected]) (gcc version
9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019"
Info: FSCK version
from "Linux version 4.19.13-dirty ([email protected]) (gcc version 8.2.1
20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018"
to "Linux version 4.19.13-dirty ([email protected]) (gcc version 8.2.1
20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 124168152 (60628 MB)
Info: CKPT version = 63f2b4a
Info: checkpoint state = 281 : allow_nocrc nat_bits unmount
Info: No error was reported
*****
I'm now booted in from my SDHC card. So it "seems" I'm good to go.
But with the actions taken and the files I've seen displayed during
the fsck, I'm thinking I'm going to reinstall all packages.
Assuming the issue was related to the power outage, I do wonder why
there weren't any fsck issues at bootup at that time. I hadn't had any
disk issues before with that card.
At least now I know the issue would be resolved by not saving the lost
files and I can continue on my merry way.
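
The two "checkpoint state" lines (55 before the repair, 281 after) read as hexadecimal bitmasks. The sketch below decodes them. The bit values are an assumption based on the checkpoint flag definitions in the f2fs on-disk format as best recalled here, and the names for bits that do not appear in this thread (trimmed, fastboot, error, orphan_inodes) are likewise assumed. The two values from the logs do decode to exactly the flag names fsck printed next to them.

```python
# Assumed f2fs checkpoint flag bits, highest first. Only crc, fsck,
# compacted_summary, unmount, allow_nocrc and nat_bits are confirmed
# by the logs in this thread; the rest are illustrative.
CP_FLAGS = [
    (0x200, "allow_nocrc"),
    (0x100, "trimmed"),
    (0x080, "nat_bits"),
    (0x040, "crc"),
    (0x020, "fastboot"),
    (0x010, "fsck"),
    (0x008, "error"),
    (0x004, "compacted_summary"),
    (0x002, "orphan_inodes"),
    (0x001, "unmount"),
]


def decode_cp_state(hex_text):
    """Decode the hex string printed after 'checkpoint state =' into flag names."""
    state = int(hex_text, 16)
    return [name for bit, name in CP_FLAGS if state & bit]


print(decode_cp_state("55"))   # state before the repair
print(decode_cp_state("281"))  # state after answering "N"
```

Read this way, the "fsck" flag was set before the repair and is absent afterwards, which fits the final "No error was reported" line.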

2020-07-14 05:55:13

by Jaegeuk Kim

Subject: Re: F2FS Segmentation Fault

On 07/13, Nathan Royce wrote:
> On Mon, Jul 13, 2020 at 7:03 PM Jaegeuk Kim <[email protected]> wrote:
> >
> > Hi Nathan,
> >
> > Could you try to say "N" here to move forward to fix the corrupted metadata?
> >
> > Thanks,
> *****
> Do you want to restore lost files into ./lost_found/? [Y/N] N
> Info: Write valid nat_bits in checkpoint
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18eca] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ecb] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ecc] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee3] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee4] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18ee5] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f78] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f79] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x18f7a] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x4d621] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x4d622] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x7fa32] in NAT
> [FIX] (nullify_nat_entry:2273) --> Remove nid [0x7fa33] in NAT
> Info: Write valid nat_bits in checkpoint
>
> Done.
> *****
>
> *****
> Info: Fix the reported corruption.
> Info: Force to fix corruption
> Info: Segments per section = 1
> Info: Sections per zone = 1
> Info: sector size = 512
> Info: total sectors = 124168159 (60628 MB)
> Info: MKFS version
> "Linux version 5.1.15.a-1-hardened ([email protected]) (gcc version
> 9.1.0 (GCC)) #1 SMP PREEMPT Thu Jun 27 11:33:04 CEST 2019"
> Info: FSCK version
> from "Linux version 4.19.13-dirty ([email protected]) (gcc version 8.2.1
> 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018"
> to "Linux version 4.19.13-dirty ([email protected]) (gcc version 8.2.1
> 20181127 (GCC)) #2 SMP PREEMPT Mon Dec 31 00:15:50 CST 2018"
> Info: superblock features = 0 :
> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
> Info: total FS sectors = 124168152 (60628 MB)
> Info: CKPT version = 63f2b4a
> Info: checkpoint state = 281 : allow_nocrc nat_bits unmount
> Info: No error was reported
> *****
> I'm now booted in from my SDHC card. So it "seems" I'm good to go.
> But with the actions taken and the files I've seen displayed during
> the fsck, I'm thinking I'm going to reinstall all packages.
> Assuming the issue was related to the power outage, I do wonder why
> there weren't any fsck issues at bootup at that time. I hadn't had any
> disk issues before with that card.
> At least now I know the issue would be resolved by not saving the lost
> files and I can continue on my merry way.

I suspect the last power outage caused the FTL in the card firmware to map
the f2fs NAT area to the wrong flash cells. Damage like this may not show up
as filesystem corruption easily, since it is more likely to hit the data
area, and that doesn't lead to filesystem inconsistency.

I guess you use "-a" for fsck at boot, which means the disk is scanned only
when the f2fs runtime has detected inconsistent data. But that only holds if the
disk at least guarantees SYNC_CACHE. Unfortunately, IMHO, this kind of flash card
doesn't handle it gracefully, and thus we can't rely on SYNC_CACHE, which should
be the baseline for guaranteeing filesystem consistency. I think it'd be good to
run "-f" if a sudden power cut happens in this case.

Thanks,
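
As a rough illustration of the suggestion above, the sketch below picks between the two fsck.f2fs modes mentioned in this thread: "-a" for the routine boot-time check and "-f" to force a full check after a sudden power cut. The helper name `build_fsck_command` and the device path are hypothetical. The code only builds and prints the command line; it does not run fsck, which should be done on an unmounted filesystem.

```python
import shlex


def build_fsck_command(device, after_power_cut):
    """Build an fsck.f2fs command line.

    "-f" forces a full check; "-a" only checks when f2fs has already
    flagged an inconsistency, as described in the message above.
    """
    mode = "-f" if after_power_cut else "-a"
    return ["fsck.f2fs", mode, device]


# "/dev/mmcblk0p1" is a placeholder; substitute the real partition.
cmd = build_fsck_command("/dev/mmcblk0p1", after_power_cut=True)
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` from a rescue system or initramfs shell where the card is not mounted.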