2016-03-11 15:40:21

by Eryu Guan

Subject: [PATCH 1/2] common: make _dmerror_init accept device and mount point as param

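Currently the dmerror code uses SCRATCH_DEV and SCRATCH_MNT as the
backing device and mount point, and there's no way to change them.

Teach _dmerror_init to accept an optional first argument as the backing
device and an optional second argument as an alternative mount point,
falling back to SCRATCH_DEV and SCRATCH_MNT when they are not given. The
mount point is saved in DMERROR_MNT, which _dmerror_mount_options,
_dmerror_unmount and _dmerror_cleanup now use. This is useful when
SCRATCH_DEV and/or SCRATCH_MNT is not suitable for the test.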
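A minimal sketch of the fallback behaviour, assuming bash: ${1:-default} expands to the first argument when it is set and non-empty, and to the default otherwise. pick_backing is a hypothetical stand-in for _dmerror_init that shows only the argument handling, and the SCRATCH_DEV/SCRATCH_MNT values are placeholders.

```shell
#!/bin/bash
# Placeholder values; in xfstests these come from the test configuration.
SCRATCH_DEV=/dev/sdb
SCRATCH_MNT=/mnt/scratch

# Hypothetical stand-in for _dmerror_init, showing only the argument handling.
pick_backing()
{
	# First argument overrides the backing device, if given and non-empty.
	local dm_backing_dev=${1:-$SCRATCH_DEV}
	# Deliberately global, so helpers called later see the chosen mount point.
	DMERROR_MNT=${2:-$SCRATCH_MNT}
	echo "$dm_backing_dev $DMERROR_MNT"
}

pick_backing                         # prints: /dev/sdb /mnt/scratch
pick_backing /dev/loop0 /mnt/loop    # prints: /dev/loop0 /mnt/loop
```

DMERROR_MNT is not declared local, mirroring the patch: the mount, unmount and cleanup helpers run later and still need to see it.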

Signed-off-by: Eryu Guan <[email protected]>
---
common/dmerror | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/common/dmerror b/common/dmerror
index 004530d..b2f1e8f 100644
--- a/common/dmerror
+++ b/common/dmerror
@@ -20,7 +20,8 @@

_dmerror_init()
{
- local dm_backing_dev=$SCRATCH_DEV
+ local dm_backing_dev=${1:-$SCRATCH_DEV}
+ DMERROR_MNT=${2:-$SCRATCH_MNT}

$DMSETUP_PROG remove error-test > /dev/null 2>&1

@@ -38,7 +39,7 @@ _dmerror_init()

_dmerror_mount_options()
{
- echo `_common_dev_mount_options $*` $DMERROR_DEV $SCRATCH_MNT
+ echo `_common_dev_mount_options $*` $DMERROR_DEV $DMERROR_MNT
}

_dmerror_mount()
@@ -48,12 +49,12 @@ _dmerror_mount()

_dmerror_unmount()
{
- umount $SCRATCH_MNT
+ umount $DMERROR_MNT
}

_dmerror_cleanup()
{
- $UMOUNT_PROG $SCRATCH_MNT > /dev/null 2>&1
+ $UMOUNT_PROG $DMERROR_MNT > /dev/null 2>&1
$DMSETUP_PROG remove error-test > /dev/null 2>&1
}

--
2.5.0


2016-03-11 15:41:52

by Eryu Guan

Subject: [PATCH 2/2] generic: test I/O on dm error device

This test performs simple I/O on a dm error device, which
returns EIO for all I/O requests.

This is motivated by an ext4 bug that crashes the kernel on the error path
when trying to update atime. The following kernel patch should fix the issue:

ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()

Signed-off-by: Eryu Guan <[email protected]>
---
tests/generic/338 | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/338.out | 2 ++
tests/generic/group | 1 +
3 files changed, 93 insertions(+)
create mode 100755 tests/generic/338
create mode 100644 tests/generic/338.out

diff --git a/tests/generic/338 b/tests/generic/338
new file mode 100755
index 0000000..cea4d82
--- /dev/null
+++ b/tests/generic/338
@@ -0,0 +1,90 @@
+#! /bin/bash
+# FS QA Test 338
+#
+# Test I/O on dm error device.
+#
+# Motivated by an ext4 bug that crashes the kernel on the error path when
+# trying to update atime.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Red Hat Inc., All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+ _dmerror_cleanup
+ _destroy_loop_device $LOOP_DEV
+ rm -f $LOOP_FILE
+ rm -rf $LOOP_MNT
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmerror
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_test
+_require_loop
+_require_dm_target error
+# If TEST_DEV is not a valid block device, FSTYP cannot be mkfs'ed either
+_require_block_device $TEST_DEV
+
+echo "Silence is golden"
+
+# Use a loop device as backend of the dm error device: drop_caches also drops
+# caches held by the loop device, forcing inode info to be read from disk and
+# triggering the NULL pointer dereference on buggy ext4
+LOOP_FILE=$TEST_DIR/$seq-$$.img
+LOOP_MNT=$TEST_DIR/$seq-$$.mnt
+mkdir -p $LOOP_MNT
+$XFS_IO_PROG -fc "truncate 512M" $LOOP_FILE >>$seqres.full 2>&1
+LOOP_DEV=`_create_loop_device $LOOP_FILE`
+
+_dmerror_init $LOOP_DEV $LOOP_MNT
+_mkfs_dev $DMERROR_DEV
+# Use strictatime mount option here to force atime updates, which could help
+# trigger the NULL pointer dereference on ext4 more easily
+_dmerror_mount "-o strictatime"
+_dmerror_load_error_table
+
+# drop all caches to force reading from the error device
+echo 3 > /proc/sys/vm/drop_caches
+
+# do some test I/O
+ls -l $LOOP_MNT >>$seqres.full 2>&1
+$XFS_IO_PROG -fc "pwrite 0 1M" $LOOP_MNT/testfile >>$seqres.full 2>&1
+
+# no panic no hang, success, all done
+status=0
+exit
diff --git a/tests/generic/338.out b/tests/generic/338.out
new file mode 100644
index 0000000..3482cf4
--- /dev/null
+++ b/tests/generic/338.out
@@ -0,0 +1,2 @@
+QA output created by 338
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 727648c..8818827 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -340,3 +340,4 @@
335 auto quick metadata
336 auto quick metadata
337 auto quick metadata
+338 auto quick rw
--
2.5.0


2016-03-15 02:46:32

by Dave Chinner

Subject: Re: [PATCH 2/2] generic: test I/O on dm error device

On Fri, Mar 11, 2016 at 11:40:22PM +0800, Eryu Guan wrote:
> This test performs simple I/O on a dm error device, which
> returns EIO for all I/O requests.
>
> This is motivated by an ext4 bug that crashes the kernel on the error path
> when trying to update atime. The following kernel patch should fix the issue:
>
> ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()

Why does this test require the loop device? Why can't you just
unmount the filesystem, run 'blockdev --flushbufs <dev>' to ensure
there are no cached buffers/pages on the block device, then mount
it again?
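A rough sketch of the sequence suggested above, assuming bash and the util-linux blockdev tool. The function name remount_cold and the device and mount paths are hypothetical. RUN defaults to echo, so the commands are only printed; setting RUN to an empty string runs them for real, which requires root.

```shell
#!/bin/bash
# Dry run by default: each command is printed instead of executed.
# Set RUN= (empty) to actually run them; that requires root.
RUN=${RUN-echo}

# Hypothetical helper: unmount, flush the block device's cached buffers,
# then mount again so subsequent reads have to go to the device.
remount_cold()
{
	local dev=$1 mnt=$2
	$RUN umount "$mnt" || return 1
	$RUN blockdev --flushbufs "$dev" || return 1
	$RUN mount "$dev" "$mnt"
}

remount_cold /dev/sdb1 /mnt/scratch
```

In dry-run mode this prints the three commands in order, which makes it easy to check the sequence before running it against a real device.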

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-03-15 08:02:14

by Eryu Guan

Subject: Re: [PATCH 2/2] generic: test I/O on dm error device

On Tue, Mar 15, 2016 at 01:46:16PM +1100, Dave Chinner wrote:
> On Fri, Mar 11, 2016 at 11:40:22PM +0800, Eryu Guan wrote:
> > This test performs simple I/O on a dm error device, which
> > returns EIO for all I/O requests.
> >
> > This is motivated by an ext4 bug that crashes the kernel on the error path
> > when trying to update atime. The following kernel patch should fix the issue:
> >
> > ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
>
> Why does this test require the loop device? Why can't you just
> unmount the filesystem, run 'blockdev --flushbufs <dev>' to ensure
> there are no cached buffers/pages on the block device, then mount
> it again?

Yes, 'blockdev --flushbufs <dev>' works, and I found that I only need to
add a blockdev call before dropping caches. This makes the code much
cleaner and easier to read; perhaps the first patch can be dropped as
well. I'll send out v2 shortly. Thanks for the review!

Eryu

2016-03-15 08:12:14

by Eryu Guan

Subject: [PATCH v2] generic: test I/O on dm error device

This test performs simple I/O on a dm error device, which
returns EIO for all I/O requests.

This is motivated by an ext4 bug that crashes the kernel on the error path
when trying to update atime. The following kernel patch should fix the issue:

ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()

Signed-off-by: Eryu Guan <[email protected]>
---

v2:
- use SCRATCH_DEV directly instead of a loop device, and call
blockdev --flushbufs $DMERROR_DEV before dropping caches (suggested by Dave)

tests/generic/338 | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/338.out | 2 ++
tests/generic/group | 1 +
3 files changed, 83 insertions(+)
create mode 100755 tests/generic/338
create mode 100644 tests/generic/338.out

diff --git a/tests/generic/338 b/tests/generic/338
new file mode 100755
index 0000000..235549a
--- /dev/null
+++ b/tests/generic/338
@@ -0,0 +1,80 @@
+#! /bin/bash
+# FS QA Test 338
+#
+# Test I/O on dm error device.
+#
+# Motivated by an ext4 bug that crashes the kernel on the error path when
+# trying to update atime.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Red Hat Inc., All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+ _dmerror_cleanup
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmerror
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_dm_target error
+# If SCRATCH_DEV is not a valid block device, FSTYP cannot be mkfs'ed either
+_require_block_device $SCRATCH_DEV
+
+echo "Silence is golden"
+
+_dmerror_init
+_mkfs_dev $DMERROR_DEV
+
+# Use strictatime mount option here to force atime updates, which could help
+# trigger the NULL pointer dereference on ext4 more easily
+_dmerror_mount "-o strictatime"
+_dmerror_load_error_table
+
+# flush dmerror block device buffers and drop all caches to force reading
+# from the error device
+blockdev --flushbufs $DMERROR_DEV
+echo 3 > /proc/sys/vm/drop_caches
+
+# do some test I/O
+ls -l $SCRATCH_MNT >>$seqres.full 2>&1
+$XFS_IO_PROG -fc "pwrite 0 1M" $SCRATCH_MNT/testfile >>$seqres.full 2>&1
+
+# no panic no hang, success, all done
+status=0
+exit
diff --git a/tests/generic/338.out b/tests/generic/338.out
new file mode 100644
index 0000000..3482cf4
--- /dev/null
+++ b/tests/generic/338.out
@@ -0,0 +1,2 @@
+QA output created by 338
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 727648c..8818827 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -340,3 +340,4 @@
335 auto quick metadata
336 auto quick metadata
337 auto quick metadata
+338 auto quick rw
--
1.8.3.1

2016-03-23 02:53:27

by Dave Chinner

Subject: Re: [PATCH v2] generic: test I/O on dm error device

On Tue, Mar 15, 2016 at 04:12:14PM +0800, Eryu Guan wrote:
> This test performs simple I/O on a dm error device, which
> returns EIO for all I/O requests.
>
> This is motivated by an ext4 bug that crashes the kernel on the error path
> when trying to update atime. The following kernel patch should fix the issue:
>
> ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
>
> Signed-off-by: Eryu Guan <[email protected]>
> ---

Fails with:

@@ -1,2 +1,6 @@
QA output created by 338
Silence is golden
+specified blocksize 1024 is less than device physical sector size 4096
+switching to logical sector size 512
+mkfs.xfs: /dev/mapper/error-test appears to contain an existing filesystem (xfs).
+mkfs.xfs: Use the -f option to force overwrite.

And then it failed to clean up properly and caused all sorts of
subsequent problems.

-Dave.
--
Dave Chinner
[email protected]

2016-03-23 03:25:31

by Eryu Guan

Subject: Re: [PATCH v2] generic: test I/O on dm error device

On Wed, Mar 23, 2016 at 01:53:27PM +1100, Dave Chinner wrote:
> On Tue, Mar 15, 2016 at 04:12:14PM +0800, Eryu Guan wrote:
> > This test performs simple I/O on a dm error device, which
> > returns EIO for all I/O requests.
> >
> > This is motivated by an ext4 bug that crashes the kernel on the error path
> > when trying to update atime. The following kernel patch should fix the issue:
> >
> > ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
> >
> > Signed-off-by: Eryu Guan <[email protected]>
> > ---
>
> Fails with:
>
> @@ -1,2 +1,6 @@
> QA output created by 338
> Silence is golden
> +specified blocksize 1024 is less than device physical sector size 4096
> +switching to logical sector size 512
> +mkfs.xfs: /dev/mapper/error-test appears to contain an existing filesystem (xfs).
> +mkfs.xfs: Use the -f option to force overwrite.
>
> And then it failed to clean up properly and caused all sorts of
> subsequent problems.

The test passed for me; it seems to have something to do with the "physical
sector size 4096" device. I'll look into it. Thanks for the review!

Eryu

2016-03-23 04:14:42

by Eryu Guan

Subject: Re: [PATCH v2] generic: test I/O on dm error device

On Wed, Mar 23, 2016 at 11:25:31AM +0800, Eryu Guan wrote:
> On Wed, Mar 23, 2016 at 01:53:27PM +1100, Dave Chinner wrote:
> > On Tue, Mar 15, 2016 at 04:12:14PM +0800, Eryu Guan wrote:
> > > This test performs simple I/O on a dm error device, which
> > > returns EIO for all I/O requests.
> > >
> > > This is motivated by an ext4 bug that crashes the kernel on the error path
> > > when trying to update atime. The following kernel patch should fix the issue:
> > >
> > > ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
> > >
> > > Signed-off-by: Eryu Guan <[email protected]>
> > > ---
> >
> > Fails with:
> >
> > @@ -1,2 +1,6 @@
> > QA output created by 338
> > Silence is golden
> > +specified blocksize 1024 is less than device physical sector size 4096
> > +switching to logical sector size 512
> > +mkfs.xfs: /dev/mapper/error-test appears to contain an existing filesystem (xfs).
> > +mkfs.xfs: Use the -f option to force overwrite.
> >
> > And then it failed to clean up properly and caused all sorts of
> > subsequent problems.
>
> The test passed for me; it seems to have something to do with the "physical
> sector size 4096" device. I'll look into it. Thanks for the review!

It fails because "_mkfs_dev $DMERROR_DEV" refuses to create a new fs on a
device that already contains one unless the "-f" option is given; it has
nothing to do with the 4k sector device. It passed for me because I have
the "-f" mkfs option in my local.config for the xfs sections, so _mkfs_dev
succeeded. I'll send v3 to fix this.
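The thread does not say how v3 addresses this. One hypothetical approach is to pass each filesystem's own "force overwrite" option when creating the filesystem. The helper below (mkfs_force_opt is an illustrative name, not an xfstests function) maps a filesystem type to that option, based on mkfs.xfs and mkfs.btrfs using -f and mke2fs using -F; other types are left without a flag.

```shell
#!/bin/bash
# Hypothetical helper: print the mkfs option that forces overwriting an
# existing filesystem signature for the given filesystem type, or print
# nothing if the type is not handled here.
mkfs_force_opt()
{
	case "$1" in
	xfs|btrfs)
		printf '%s\n' "-f" ;;
	ext2|ext3|ext4)
		printf '%s\n' "-F" ;;
	*)
		;;
	esac
}

# Example: the option for XFS (prints: -f).
mkfs_force_opt xfs
```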

The test also fails to clean up after a failure because "dmsetup remove
error-test" reports that the device is busy. Adding a "$UDEV_SETTLE_PROG"
call before "dmsetup remove error-test" in common/dmerror fixes the issue
for me. I'll send a separate patch for that.
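A minimal sketch of the cleanup change described above, assuming bash. DMSETUP_PROG and UDEV_SETTLE_PROG are the variables the thread refers to; the standalone defaults (dmsetup, udevadm settle) are assumptions. The retry loop and the dm_remove_settled name are hypothetical additions on top of the single settle call the thread describes.

```shell
#!/bin/bash
# In xfstests these are normally set by the common config; the defaults
# here are assumptions for standalone use.
DMSETUP_PROG=${DMSETUP_PROG:-dmsetup}
UDEV_SETTLE_PROG=${UDEV_SETTLE_PROG:-"udevadm settle"}

# Hypothetical helper: wait for udev before removing the dm device, and
# retry a few times if the device is still reported busy.
dm_remove_settled()
{
	local name=$1 tries=${2:-3} i
	for ((i = 1; i <= tries; i++)); do
		# Let udev finish processing events that may still hold the device open.
		$UDEV_SETTLE_PROG >/dev/null 2>&1
		$DMSETUP_PROG remove "$name" >/dev/null 2>&1 && return 0
		# Still busy: wait briefly before the next attempt, unless this was the last.
		((i < tries)) && sleep 1
	done
	return 1
}

# Usage: dm_remove_settled error-test
```

The settle call addresses the race the thread reports; the retry only helps if the device is still busy for some other transient reason, and it gives up after a bounded number of attempts so cleanup cannot hang.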

Thanks,
Eryu