2018-10-28 17:52:11

by Anatoly Trosinenko

[permalink] [raw]
Subject: XFS: Hang and dmesg flood on mounting invalid FS image

Hello,

When mounting a broken XFS image, the kernel hangs and floods dmesg
with stack traces.

How to reproduce with kvm-xfstests:
1) Checkout v4.19, copy x86_64-config-4.14 to .config, `make
olddefconfig` and compile
2) Unpack the attached image (128 Mb uncompressed) to /tmp/kvm-xfstests-$USER
3) Inside the `kvm-xfstests shell`:

root@kvm-xfstests:~# mount /vtmp
root@kvm-xfstests:~# mount /vtmp/xfs.img /mnt
[ 39.280840] XFS (loop0): Mounting V5 Filesystem
[ 39.303657] XFS (loop0): Internal error xlog_valid_rec_header(2) at
line 5283 of file fs/xfs/xfs_log_recover.c. Caller
xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.304886] CPU: 0 PID: 373 Comm: mount Not tainted 4.19.0-xfstests #1
[ 39.305491] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.1-1ubuntu1 04/01/2014
[ 39.306247] Call Trace:
[ 39.306464] dump_stack+0x85/0xc0
[ 39.306751] xlog_valid_rec_header.isra.6+0xd5/0xe0
[ 39.307250] xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.307624] ? xlog_bread_noalign+0x98/0x110
[ 39.307983] ? xlog_bread_noalign+0x98/0x110
[ 39.308344] xlog_verify_head+0xab/0x190
[ 39.308675] xlog_find_tail+0x208/0x350
[ 39.308999] xlog_recover+0x2b/0x160
[ 39.309307] xfs_log_mount+0x280/0x2a0
[ 39.309625] xfs_mountfs+0x42f/0x890
[ 39.309928] ? xfs_mru_cache_create+0x18b/0x1f0
[ 39.310309] xfs_fs_fill_super+0x4f8/0x6d0
[ 39.310655] ? xfs_test_remount_options+0x60/0x60
[ 39.311050] mount_bdev+0x17f/0x1b0
[ 39.311347] mount_fs+0x3b/0x170
[ 39.311624] ? __init_waitqueue_head+0x36/0x50
[ 39.311999] vfs_kern_mount.part.16+0x54/0x160
[ 39.312374] do_mount+0x20e/0xdf0
[ 39.312657] ? memdup_user+0x3e/0x70
[ 39.312961] __ia32_compat_sys_mount+0xa3/0x140
[ 39.313346] do_fast_syscall_32+0x9d/0x2f0
[ 39.313694] entry_SYSENTER_compat+0x84/0x96
[ 39.314126] XFS (loop0): Torn write (CRC failure) detected at log
block 0xaaf. Truncating head block from 0xab7.
[ 39.316912] XFS (loop0): Internal error xlog_valid_rec_header(2) at
line 5283 of file fs/xfs/xfs_log_recover.c. Caller
xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.318057] CPU: 0 PID: 373 Comm: mount Not tainted 4.19.0-xfstests #1
[ 39.318601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.1-1ubuntu1 04/01/2014
[ 39.319339] Call Trace:
[ 39.319552] dump_stack+0x85/0xc0
[ 39.319834] xlog_valid_rec_header.isra.6+0xd5/0xe0
[ 39.320244] xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.320618] ? xlog_bread_noalign+0x98/0x110
[ 39.320979] ? xlog_bread_noalign+0x98/0x110
[ 39.321344] xlog_verify_tail+0x144/0x1e0
[ 39.321686] xlog_verify_head+0xd4/0x190
[ 39.322021] xlog_find_tail+0x208/0x350
[ 39.322348] xlog_recover+0x2b/0x160
[ 39.322654] xfs_log_mount+0x280/0x2a0
[ 39.322974] xfs_mountfs+0x42f/0x890
[ 39.323280] ? xfs_mru_cache_create+0x18b/0x1f0
[ 39.323664] xfs_fs_fill_super+0x4f8/0x6d0
[ 39.324013] ? xfs_test_remount_options+0x60/0x60
[ 39.324411] mount_bdev+0x17f/0x1b0
[ 39.324710] mount_fs+0x3b/0x170
[ 39.324988] ? __init_waitqueue_head+0x36/0x50
[ 39.325369] vfs_kern_mount.part.16+0x54/0x160
[ 39.325747] do_mount+0x20e/0xdf0
[ 39.326063] ? memdup_user+0x3e/0x70
[ 39.326369] __ia32_compat_sys_mount+0xa3/0x140
[ 39.326753] do_fast_syscall_32+0x9d/0x2f0
[ 39.327101] entry_SYSENTER_compat+0x84/0x96
[ 39.328873] XFS (loop0): Internal error xlog_valid_rec_header(2) at
line 5283 of file fs/xfs/xfs_log_recover.c. Caller
xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.330024] CPU: 0 PID: 373 Comm: mount Not tainted 4.19.0-xfstests #1
[ 39.330574] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.1-1ubuntu1 04/01/2014
[ 39.331349] Call Trace:
[ 39.331565] dump_stack+0x85/0xc0
[ 39.331850] xlog_valid_rec_header.isra.6+0xd5/0xe0
[ 39.332264] xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.332642] ? xlog_bread_noalign+0x98/0x110
[ 39.333005] ? xlog_bread_noalign+0x98/0x110
[ 39.333371] xlog_verify_tail+0x144/0x1e0
[ 39.333713] xlog_verify_head+0xd4/0x190
[ 39.334048] xlog_find_tail+0x208/0x350
[ 39.334376] xlog_recover+0x2b/0x160
[ 39.334682] xfs_log_mount+0x280/0x2a0
[ 39.335003] xfs_mountfs+0x42f/0x890
[ 39.335309] ? xfs_mru_cache_create+0x18b/0x1f0
[ 39.335693] xfs_fs_fill_super+0x4f8/0x6d0
[ 39.336041] ? xfs_test_remount_options+0x60/0x60
[ 39.336439] mount_bdev+0x17f/0x1b0
[ 39.336739] mount_fs+0x3b/0x170
[ 39.337016] ? __init_waitqueue_head+0x36/0x50
[ 39.337396] vfs_kern_mount.part.16+0x54/0x160
[ 39.337774] do_mount+0x20e/0xdf0
[ 39.338059] ? memdup_user+0x3e/0x70
[ 39.338365] __ia32_compat_sys_mount+0xa3/0x140
[ 39.338749] do_fast_syscall_32+0x9d/0x2f0
[ 39.339098] entry_SYSENTER_compat+0x84/0x96
[ 39.340891] XFS (loop0): Internal error xlog_valid_rec_header(2) at
line 5283 of file fs/xfs/xfs_log_recover.c. Caller
xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.342084] CPU: 0 PID: 373 Comm: mount Not tainted 4.19.0-xfstests #1
[ 39.342636] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.1-1ubuntu1 04/01/2014
[ 39.343385] Call Trace:
[ 39.343601] dump_stack+0x85/0xc0
[ 39.343890] xlog_valid_rec_header.isra.6+0xd5/0xe0
[ 39.344309] xlog_do_recovery_pass+0x2bd/0x5d0
[ 39.344692] ? xlog_bread_noalign+0x98/0x110
[ 39.345092] ? xlog_bread_noalign+0x98/0x110
[ 39.345709] xlog_verify_tail+0x144/0x1e0
[ 39.346059] xlog_verify_head+0xd4/0x190
[ 39.346392] xlog_find_tail+0x208/0x350
[ 39.346718] xlog_recover+0x2b/0x160
[ 39.347025] xfs_log_mount+0x280/0x2a0
[ 39.347345] xfs_mountfs+0x42f/0x890
[ 39.347650] ? xfs_mru_cache_create+0x18b/0x1f0
[ 39.348032] xfs_fs_fill_super+0x4f8/0x6d0
[ 39.348380] ? xfs_test_remount_options+0x60/0x60
[ 39.348778] mount_bdev+0x17f/0x1b0
[ 39.349077] mount_fs+0x3b/0x170
[ 39.349366] ? __init_waitqueue_head+0x36/0x50
[ 39.349753] vfs_kern_mount.part.16+0x54/0x160
[ 39.350138] do_mount+0x20e/0xdf0
[ 39.350425] ? memdup_user+0x3e/0x70
[ 39.350731] __ia32_compat_sys_mount+0xa3/0x140
[ 39.351115] do_fast_syscall_32+0x9d/0x2f0
[ 39.351464] entry_SYSENTER_compat+0x84/0x96
[ 39.353323] XFS (loop0): Internal error xlog_valid_rec_header(2) at
line 5283 of file fs/xfs/xfs_log_recover.c. Caller
xlog_do_recovery_pass+0x2bd/0x5d0
// ... and so on ...

Best regards
Anatoly


Attachments:
xfs_128mb.img.bz2 (5.15 kB)

2018-10-29 01:36:21

by Dave Chinner

[permalink] [raw]
Subject: Re: XFS: Hang and dmesg flood on mounting invalid FS image

On Sun, Oct 28, 2018 at 08:50:46PM +0300, Anatoly Trosinenko wrote:
> Hello,
>
> When mounting a broken XFS image, the kernel hangs and floods dmesg
> with stack traces.

How did the corruption occur?

$ sudo xfs_logprint -d /dev/vdc
xfs_logprint:
data device: 0xfd20
log device: 0xfd20 daddr: 131112 length: 6840

0 HEADER Cycle 1 tail 1:000000 len 512 ops 1
[00000 - 00000] Cycle 0xffffffff New Cycle 0x00000001
2 HEADER Cycle 1 tail 1:000002 len 512 ops 5
4 HEADER Cycle 1 tail -2147483647:000002 len 512 ops 1
^^^^^^^^^^^^
6 HEADER Cycle 0 tail 1:000000 len 0 ops 0
[00000 - 00006] Cycle 0x00000001 New Cycle 0x00000000
7 HEADER Cycle 0 tail 1:000000 len 0 ops 0

Ok, so from this the head of the log is block 4, and it has a
corrupt tail pointer it points to:


$ sudo xfs_logprint -D -s 4 /dev/vdc |head -10
xfs_logprint:
data device: 0xfd20
log device: 0xfd20 daddr: 131112 length: 6840

BLKNO: 4
0 bebaedfe 1000000 2000000 20000 1000000 3610000 1000080 2000000
^^^^^^^ ^ ^
wrong wrong wrong

8 2f27bae6 2000000 1000000 dabdbab0 0 0 0 0
10 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0

They decode as:

cycle: 1 version: 2 lsn: 1,24835 tail_lsn: 2147483649,2

So the tail LSN points to an invalid log cycle and the previous
block. IOWs, the block number in the tail indicates the whole log is
valid and needs to be scanned. but the cycle is not valid.

And that's the problem. Neither the head or tail blocks are
validated before they are used. CRC checking of the head and tail
blocks comes later....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2018-10-29 06:58:10

by Anatoly Trosinenko

[permalink] [raw]
Subject: Re: XFS: Hang and dmesg flood on mounting invalid FS image

> How did the corruption occur?

It is a fuzzed image. Most probably, it was artificially "patched" by
fuzzer. Or do you mean "what particular bytes were changed"?

Best regards
Anatoly
пн, 29 окт. 2018 г. в 4:32, Dave Chinner <[email protected]>:
>
> On Sun, Oct 28, 2018 at 08:50:46PM +0300, Anatoly Trosinenko wrote:
> > Hello,
> >
> > When mounting a broken XFS image, the kernel hangs and floods dmesg
> > with stack traces.
>
> How did the corruption occur?
>
> $ sudo xfs_logprint -d /dev/vdc
> xfs_logprint:
> data device: 0xfd20
> log device: 0xfd20 daddr: 131112 length: 6840
>
> 0 HEADER Cycle 1 tail 1:000000 len 512 ops 1
> [00000 - 00000] Cycle 0xffffffff New Cycle 0x00000001
> 2 HEADER Cycle 1 tail 1:000002 len 512 ops 5
> 4 HEADER Cycle 1 tail -2147483647:000002 len 512 ops 1
> ^^^^^^^^^^^^
> 6 HEADER Cycle 0 tail 1:000000 len 0 ops 0
> [00000 - 00006] Cycle 0x00000001 New Cycle 0x00000000
> 7 HEADER Cycle 0 tail 1:000000 len 0 ops 0
>
> Ok, so from this the head of the log is block 4, and it has a
> corrupt tail pointer it points to:
>
>
> $ sudo xfs_logprint -D -s 4 /dev/vdc |head -10
> xfs_logprint:
> data device: 0xfd20
> log device: 0xfd20 daddr: 131112 length: 6840
>
> BLKNO: 4
> 0 bebaedfe 1000000 2000000 20000 1000000 3610000 1000080 2000000
> ^^^^^^^ ^ ^
> wrong wrong wrong
>
> 8 2f27bae6 2000000 1000000 dabdbab0 0 0 0 0
> 10 0 0 0 0 0 0 0 0
> 18 0 0 0 0 0 0 0 0
> 20 0 0 0 0 0 0 0 0
>
> They decode as:
>
> cycle: 1 version: 2 lsn: 1,24835 tail_lsn: 2147483649,2
>
> So the tail LSN points to an invalid log cycle and the previous
> block. IOWs, the block number in the tail indicates the whole log is
> valid and needs to be scanned. but the cycle is not valid.
>
> And that's the problem. Neither the head or tail blocks are
> validated before they are used. CRC checking of the head and tail
> blocks comes later....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]

2018-10-29 07:11:36

by Dave Chinner

[permalink] [raw]
Subject: Re: XFS: Hang and dmesg flood on mounting invalid FS image

On Mon, Oct 29, 2018 at 09:57:20AM +0300, Anatoly Trosinenko wrote:
> > How did the corruption occur?
>
> It is a fuzzed image. Most probably, it was artificially "patched" by
> fuzzer. Or do you mean "what particular bytes were changed"?

I wondered how this specific corruption occurred in the real world.
If i was a real world problem, it would have been indicative of a
code bug if it did occur (i.e. whatever wrote the log record would
have been broken) but seeing as it's a fuzzer problem, I don't need
to look for any code bugs other than "we didn't validate the input
properly".

Cheers,

Dave.
--
Dave Chinner
[email protected]