2013-06-15 02:36:51

by Fengguang Wu

Subject: XFS (vdb): Corruption detected. Unmount and run xfs_repair

Greetings,

I got the below dmesg in both upstream and linux-next, and the first
bad commit *might be* commit 211d022c43ca ("xfs: Avoid pathological
backwards allocation").

[ 74.595386]
[ 74.603826] CPU: 0 PID: 2137 Comm: kworker/0:1H Not tainted 3.10.0-rc1-00031-gade1335 #1508
[ 74.609255] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 74.612498] Workqueue: xfslogd xfs_buf_iodone_work
[ 74.615690] 0000000000000001 ffff880016815c68 ffffffff81aa456f ffff880016815c88
[ 74.621548] ffffffff8130b179 ffffffff81309514 0000000000000001 ffff880016815cc8
[ 74.627417] ffffffff8130b1d0 0000000000000000 00000000000002da 0000000000000016
[ 74.633321] Call Trace:
[ 74.635427] [<ffffffff81aa456f>] dump_stack+0x19/0x1b
[ 74.638412] [<ffffffff8130b179>] xfs_error_report+0x3d/0x3f
[ 74.641627] [<ffffffff81309514>] ? xfs_buf_iodone_work+0x4a/0x83
[ 74.644970] [<ffffffff8130b1d0>] xfs_corruption_error+0x55/0x71
[ 74.648217] [<ffffffff81352ce1>] xfs_sb_read_verify+0xee/0x105
[ 74.651478] [<ffffffff81309514>] ? xfs_buf_iodone_work+0x4a/0x83
[ 74.654820] [<ffffffff8108045d>] ? ftrace_raw_event_workqueue_execute_start+0x92/0xa1
[ 74.659821] [<ffffffff81309514>] xfs_buf_iodone_work+0x4a/0x83
[ 74.663042] [<ffffffff81082561>] process_one_work+0x26c/0x470
[ 74.666296] [<ffffffff810824bf>] ? process_one_work+0x1ca/0x470
[ 74.669647] [<ffffffff81082ee6>] worker_thread+0x1d0/0x2cb
[ 74.672770] [<ffffffff81082d16>] ? manage_workers.isra.19+0x1c3/0x1c3
[ 74.676201] [<ffffffff8108a590>] kthread+0xd5/0xdd
[ 74.679151] [<ffffffff810bd47c>] ? trace_hardirqs_on+0xd/0xf
[ 74.682411] [<ffffffff8108a4bb>] ? __init_kthread_worker+0x5a/0x5a
[ 74.685776] [<ffffffff81ab74dc>] ret_from_fork+0x7c/0xb0
[ 74.688798] [<ffffffff8108a4bb>] ? __init_kthread_worker+0x5a/0x5a
[ 74.692206] XFS (vdb): Corruption detected. Unmount and run xfs_repair
[ 74.696000] XFS (vdb): SB validate failed with error 22.

I'm not sure whether it's the first bad commit because

- the parent commit f722406faae2d073cc1d01063d1123c35425939e reliably
crashes in a slightly earlier boot stage (2nd dmesg)

- the error is still there after reverting the patch

The bisect log does indicate that the error is introduced somewhere
after v3.10-rc1.

git bisect start ade1335afef556df6538eb02e8c0dc91fbd9cc37 f722406faae2d073cc1d01063d1123c35425939e --
git bisect bad d4c712bcf26a25c2b67c90e44e0b74c7993b5334 # 06:21 0- xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
git bisect bad b38958d715316031fe9ea0cc6c22043072a55f49 # 06:27 0- xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
git bisect bad 28ca489c63e9aceed8801d2f82d731b3c9aa50f5 # 06:35 1- xfs: fix rounding in xfs_free_file_space
git bisect bad 49b137cbbcc836ef231866c137d24f42c42bb483 # 06:41 0- xfs: fix sub-page blocksize data integrity writes
git bisect bad 211d022c43cac3aecbe967fcaf9b10156bfa63ad # 06:47 0- xfs: Avoid pathological backwards allocation
git bisect good f722406faae2d073cc1d01063d1123c35425939e # 09:52 300+ Linux 3.10-rc1
git bisect bad ade1335afef556df6538eb02e8c0dc91fbd9cc37 # 09:52 0- xfs: ensure btree root split sets blkno correctly
git bisect bad 7872bbb055d8070cdd8e7691d7a4181fbfa43caa # 09:59 2- Revert "xfs: Avoid pathological backwards allocation"
git bisect bad a2648ebb7ed69ef209d9c8a76fadeb3252d9a023 # 10:03 9- Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect bad c04efed734409f5a44715b54a6ca1b54b0ccf215 # 10:06 0- Add linux-next specific files for 20130607

Thanks,
Fengguang


Attachments:
(No filename) (3.59 kB)
dmesg-kvm-kbuild-31646-20130615005640-3.10.0-rc1-00031-gade1335-1508 (314.08 kB)
bisect-ade1335afef556df6538eb02e8c0dc91fbd9cc37-x86_64-nfsroot-xfs_sb_read_verify-25543.log (10.94 kB)
.config-bisect (91.81 kB)
dmesg-kvm-xian-2307-20130615060932-3.10.0-rc1-587 (53.50 kB)

2013-06-15 03:09:33

by Dave Chinner

Subject: Re: XFS (vdb): Corruption detected. Unmount and run xfs_repair

[cc [email protected], where XFS bug reports should go]

On Sat, Jun 15, 2013 at 10:36:20AM +0800, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg in both upstream and linux-next, and the first
> bad commit *might be* commit 211d022c43ca ("xfs: Avoid pathological
> backwards allocation").
>
> [ 74.595386]
> [ 74.603826] CPU: 0 PID: 2137 Comm: kworker/0:1H Not tainted 3.10.0-rc1-00031-gade1335 #1508
> [ 74.609255] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [ 74.612498] Workqueue: xfslogd xfs_buf_iodone_work
> [ 74.615690] 0000000000000001 ffff880016815c68 ffffffff81aa456f ffff880016815c88
> [ 74.621548] ffffffff8130b179 ffffffff81309514 0000000000000001 ffff880016815cc8
> [ 74.627417] ffffffff8130b1d0 0000000000000000 00000000000002da 0000000000000016
> [ 74.633321] Call Trace:
> [ 74.635427] [<ffffffff81aa456f>] dump_stack+0x19/0x1b
> [ 74.638412] [<ffffffff8130b179>] xfs_error_report+0x3d/0x3f
> [ 74.641627] [<ffffffff81309514>] ? xfs_buf_iodone_work+0x4a/0x83
> [ 74.644970] [<ffffffff8130b1d0>] xfs_corruption_error+0x55/0x71
> [ 74.648217] [<ffffffff81352ce1>] xfs_sb_read_verify+0xee/0x105
> [ 74.651478] [<ffffffff81309514>] ? xfs_buf_iodone_work+0x4a/0x83
> [ 74.654820] [<ffffffff8108045d>] ? ftrace_raw_event_workqueue_execute_start+0x92/0xa1
> [ 74.659821] [<ffffffff81309514>] xfs_buf_iodone_work+0x4a/0x83
> [ 74.663042] [<ffffffff81082561>] process_one_work+0x26c/0x470
> [ 74.666296] [<ffffffff810824bf>] ? process_one_work+0x1ca/0x470
> [ 74.669647] [<ffffffff81082ee6>] worker_thread+0x1d0/0x2cb
> [ 74.672770] [<ffffffff81082d16>] ? manage_workers.isra.19+0x1c3/0x1c3
> [ 74.676201] [<ffffffff8108a590>] kthread+0xd5/0xdd
> [ 74.679151] [<ffffffff810bd47c>] ? trace_hardirqs_on+0xd/0xf
> [ 74.682411] [<ffffffff8108a4bb>] ? __init_kthread_worker+0x5a/0x5a
> [ 74.685776] [<ffffffff81ab74dc>] ret_from_fork+0x7c/0xb0
> [ 74.688798] [<ffffffff8108a4bb>] ? __init_kthread_worker+0x5a/0x5a
> [ 74.692206] XFS (vdb): Corruption detected. Unmount and run xfs_repair
> [ 74.696000] XFS (vdb): SB validate failed with error 22.

Error 22 is EINVAL, which means there should have been some kind of
output in the log before the -corruption report- that explains why
EINVAL was returned.

> I'm not sure whether it's the first bad commit because

It's not. That commit isn't in the upstream kernel, so if you are
seeing the error in the upstream kernel, it can't be the cause. And,
besides:

> [ 74.570969] XFS (vdb): bad magic number
> [ 74.573837] ffff8800170ed000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.579266] ffff8800170ed010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.584581] ffff8800170ed020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.590036] ffff8800170ed030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.595386] XFS (vdb): Internal error xfs_sb_read_verify at line 730 of file /c/kernel-tests/src/stable/fs/xfs/xfs_mount.c. Caller 0xffffffff81309514
.....
> [ 74.692206] XFS (vdb): Corruption detected. Unmount and run xfs_repair
> [ 74.696000] XFS (vdb): SB validate failed with error 22.

It's obviously not an XFS filesystem you are asking the kernel to
mount, so it's perfectly valid to throw a corruption error at you.
What it has actually thrown is EWRONGFS, but because you've asked
the kernel specifically to mount the device as an XFS filesystem,
the kernel is explicitly telling you that it's a corrupt
filesystem... :)

> common.rc: retrying test device mount with external set
> [ 74.782247] XFS (vdb): bad magic number
> [ 74.784895] ffff8800170e7000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.790201] ffff8800170e7010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.795466] ffff8800170e7020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.800759] ffff8800170e7030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [ 74.806031] XFS (vdb): Internal error xfs_sb_read_verify at line 730 of file /c/kernel-tests/src/stable/fs/xfs/xfs_mount.c. Caller 0xffffffff81309514

It still isn't an XFS filesystem.... :/

This looks like user error, not a bug.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2013-06-17 18:37:54

by Ben Myers

Subject: Re: XFS (vdb): Corruption detected. Unmount and run xfs_repair

Hey Fengguang,

On Sat, Jun 15, 2013 at 01:09:28PM +1000, Dave Chinner wrote:
> [cc [email protected], where XFS bug reports should go]
>
> On Sat, Jun 15, 2013 at 10:36:20AM +0800, Fengguang Wu wrote:
> > [ 74.570969] XFS (vdb): bad magic number
> > [ 74.573837] ffff8800170ed000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
                                   X  F  S  B

That's the magic it's looking for...
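To make the point above concrete, here is a minimal user-space sketch of the check that fails. It is not the kernel's xfs_sb_read_verify(). The helper names (read_be32, sketch_check_sb_magic) and the SKETCH_* macros are hypothetical. The code assumes that the XFS headers of this era alias EWRONGFS to EINVAL, which matches the "error 22" in the log. The superblock magic is the four ASCII bytes 'X' 'F' 'S' 'B', stored big-endian as 0x58465342. An all-zero buffer, like the one in the hex dump, is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* XFS superblock magic: ASCII "XFSB" read as a big-endian 32-bit value. */
#define SKETCH_XFS_SB_MAGIC 0x58465342u

/* Assumption: mirrors the XFS convention of this era where EWRONGFS == EINVAL. */
#define SKETCH_EWRONGFS EINVAL

/* Decode the first four bytes of p as a big-endian 32-bit integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return 0 if buf begins with the XFS superblock
 * magic, otherwise -EWRONGFS (that is, -EINVAL, errno 22 on Linux).
 */
static int sketch_check_sb_magic(const unsigned char *buf, size_t len)
{
    if (len < 4)
        return -SKETCH_EWRONGFS;
    if (read_be32(buf) != SKETCH_XFS_SB_MAGIC)
        return -SKETCH_EWRONGFS; /* the kernel logs "bad magic number" here */
    return 0;
}
```

Called on a zero-filled 64-byte buffer, sketch_check_sb_magic() returns -EINVAL. Called on a buffer that starts with the bytes 58 46 53 42, it returns 0. Because the device was explicitly mounted as XFS, the verifier reports this "wrong filesystem" result as corruption.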

> > [ 74.579266] ffff8800170ed010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 74.584581] ffff8800170ed020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 74.590036] ffff8800170ed030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 74.595386] XFS (vdb): Internal error xfs_sb_read_verify at line 730 of file /c/kernel-tests/src/stable/fs/xfs/xfs_mount.c. Caller 0xffffffff81309514
> .....
> > [ 74.692206] XFS (vdb): Corruption detected. Unmount and run xfs_repair
> > [ 74.696000] XFS (vdb): SB validate failed with error 22.
>
> It's obviously not an XFS filesystem you are asking the kernel to
> mount, so it's perfectly valid to throw a corruption error at you.
> What it has actually thrown is EWRONGFS, but because you've asked
> the kernel specifically to mount the device as an XFS filesystem,
> the kernel is explicitly telling you that it's a corrupt
> filesystem... :)

We did have an issue in this area in 3.7, which was fixed by commit
aeb4f20a in 3.8: we were returning EFSCORRUPTED instead of EWRONGFS.
Maybe that's not your kernel.

Regards,
Ben