2022-03-31 15:39:44

by Ritesh Harjani

Subject: [PATCHv3 1/4] generic/468: Add another falloc test entry

From: Ritesh Harjani <[email protected]>

Add another falloc test entry which could hit a kernel bug
with ext4 fast_commit feature w/o below kernel commit [1].

<log>
[ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
[ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743

This happens when falloc -k size is huge which spans across more than
1 flex block group in ext4. This causes a bug in fast_commit replay
code which is fixed by kernel commit at [1].

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d

Signed-off-by: Ritesh Harjani <[email protected]>
---
tests/generic/468 | 8 ++++++++
tests/generic/468.out | 2 ++
2 files changed, 10 insertions(+)

diff --git a/tests/generic/468 b/tests/generic/468
index 95752d3b..5e73cff9 100755
--- a/tests/generic/468
+++ b/tests/generic/468
@@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
_require_metadata_journaling $SCRATCH_DEV
_scratch_mount

+# blocksize and fact are used in the last case of the fsync/fdatasync test.
+# This is mainly trying to test recovery operation in case where the data
+# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
+blocks=32768
+blocksize=4096
+fact=18
+
testfile=$SCRATCH_MNT/testfile

# check inode metadata after shutdown
@@ -85,6 +92,7 @@ for i in fsync fdatasync; do
test_falloc $i "-k " 1024
test_falloc $i "-k " 4096
test_falloc $i "-k " 104857600
+ test_falloc $i "-k " $(($blocks*$blocksize*$fact))
done

status=0
diff --git a/tests/generic/468.out b/tests/generic/468.out
index b3a28d5e..a09cedb8 100644
--- a/tests/generic/468.out
+++ b/tests/generic/468.out
@@ -5,9 +5,11 @@ QA output created by 468
==== falloc -k 1024 test with fsync ====
==== falloc -k 4096 test with fsync ====
==== falloc -k 104857600 test with fsync ====
+==== falloc -k 2415919104 test with fsync ====
==== falloc 1024 test with fdatasync ====
==== falloc 4096 test with fdatasync ====
==== falloc 104857600 test with fdatasync ====
==== falloc -k 1024 test with fdatasync ====
==== falloc -k 4096 test with fdatasync ====
==== falloc -k 104857600 test with fdatasync ====
+==== falloc -k 2415919104 test with fdatasync ====
--
2.31.1


2022-04-04 08:22:38

by Dave Chinner

Subject: Re: [PATCHv3 1/4] generic/468: Add another falloc test entry

On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> From: Ritesh Harjani <[email protected]>
>
> Add another falloc test entry which could hit a kernel bug
> with ext4 fast_commit feature w/o below kernel commit [1].
>
> <log>
> [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
>
> This happens when falloc -k size is huge which spans across more than
> 1 flex block group in ext4. This causes a bug in fast_commit replay
> code which is fixed by kernel commit at [1].
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
>
> Signed-off-by: Ritesh Harjani <[email protected]>
> ---
> tests/generic/468 | 8 ++++++++
> tests/generic/468.out | 2 ++
> 2 files changed, 10 insertions(+)
>
> diff --git a/tests/generic/468 b/tests/generic/468
> index 95752d3b..5e73cff9 100755
> --- a/tests/generic/468
> +++ b/tests/generic/468
> @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> _require_metadata_journaling $SCRATCH_DEV
> _scratch_mount
>
> +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> +# This is mainly trying to test recovery operation in case where the data
> +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> +blocks=32768
> +blocksize=4096

Block size can change based on mkfs parameters. You should extract
this dynamically from the filesystem the test is being run on.

> +fact=18

What is "fact" supposed to mean?

Indeed, wouldn't this simply be better as something like:

larger_than_ext4_fg_size=$((32768 * $blksize * 18))

And then

> testfile=$SCRATCH_MNT/testfile
>
> # check inode metadata after shutdown
> @@ -85,6 +92,7 @@ for i in fsync fdatasync; do
> test_falloc $i "-k " 1024
> test_falloc $i "-k " 4096
> test_falloc $i "-k " 104857600
> + test_falloc $i "-k " $(($blocks*$blocksize*$fact))

test_falloc $i "-k " $larger_than_ext4_fg_size

And just scrub all the sizes from the golden output?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2022-04-05 13:38:06

by Ritesh Harjani

Subject: Re: [PATCHv3 1/4] generic/468: Add another falloc test entry

On 22/04/04 09:28AM, Dave Chinner wrote:
> On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> > From: Ritesh Harjani <[email protected]>
> >
> > Add another falloc test entry which could hit a kernel bug
> > with ext4 fast_commit feature w/o below kernel commit [1].
> >
> > <log>
> > [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> > [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
> >
> > This happens when falloc -k size is huge which spans across more than
> > 1 flex block group in ext4. This causes a bug in fast_commit replay
> > code which is fixed by kernel commit at [1].
> >
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
> >
> > Signed-off-by: Ritesh Harjani <[email protected]>
> > ---
> > tests/generic/468 | 8 ++++++++
> > tests/generic/468.out | 2 ++
> > 2 files changed, 10 insertions(+)
> >
> > diff --git a/tests/generic/468 b/tests/generic/468
> > index 95752d3b..5e73cff9 100755
> > --- a/tests/generic/468
> > +++ b/tests/generic/468
> > @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> > _require_metadata_journaling $SCRATCH_DEV
> > _scratch_mount
> >
> > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > +# This is mainly trying to test recovery operation in case where the data
> > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > +blocks=32768
> > +blocksize=4096
>
> Block size can change based on mkfs parameters. You should extract
> this dynamically from the filesystem the test is being run on.
>

Yes, but we have kept it at 4096 because anything bigger, such as 65536,
would need a larger disk to test. The overall disk size requirement would
then become ~36G (32768 * 65536 * 18).
Hence I went ahead with 4096, which is good enough for testing.

But sure, I will add a comment explaining why it is hardcoded to 4096
so that others don't get confused. Disks larger than this don't get
tested much anyway, right?


> > +fact=18
>
> What is "fact" supposed to mean?
>
> Indeed, wouldn't this simply be better as something like:
>
> larger_than_ext4_fg_size=$((32768 * $blksize * 18))
>
> And then
>
> > testfile=$SCRATCH_MNT/testfile
> >
> > # check inode metadata after shutdown
> > @@ -85,6 +92,7 @@ for i in fsync fdatasync; do
> > test_falloc $i "-k " 1024
> > test_falloc $i "-k " 4096
> > test_falloc $i "-k " 104857600
> > + test_falloc $i "-k " $(($blocks*$blocksize*$fact))
>
> test_falloc $i "-k " $larger_than_ext4_fg_size
>

Yes, looks good to me. Thanks for suggestion.


> And just scrub all the sizes from the golden output?
>

This won't be needed, since I would still like to go with a 4096 block size
to avoid a large disk size requirement, which won't be tested much anyway.

If this sounds good to you, I will make the rest of the changes as discussed
in the next revision.

-ritesh
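
The size arithmetic in this exchange can be checked with a small standalone
sketch. It is not part of the patch: `falloc_test_size` is a hypothetical
helper name, and the formula is simply the 32768 * blocksize * 18 expression
quoted above, evaluated for a few block sizes.

```shell
#!/bin/bash
# Hypothetical helper (not from the patch): bytes to preallocate for a given
# block size, using the 32768 * blocksize * 18 formula from the thread.
falloc_test_size() {
	local blksize=$1
	echo $((32768 * blksize * 18))
}

for bs in 1024 4096 65536; do
	bytes=$(falloc_test_size "$bs")
	# Whole GiB, rounded down, for readability only.
	echo "blocksize=$bs bytes=$bytes gib=$((bytes / 1024 / 1024 / 1024))"
done
```

For a 4096 block size this gives the 2415919104 bytes seen in the golden
output, and for 65536 it gives 38654705664 bytes, which is the ~36G figure
mentioned above.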

2022-04-06 13:59:23

by Theodore Ts'o

Subject: Re: [PATCHv3 1/4] generic/468: Add another falloc test entry

On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > +# This is mainly trying to test recovery operation in case where the data
> > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > +blocks=32768
> > > +blocksize=4096
> >
> > Block size can change based on mkfs parameters. You should extract
> > this dynamically from the filesystem the test is being run on.
> >
>
> Yes, but we still have kept just 4096 because, anything bigger than that like
> 65536 might require a bigger disk size itself to test. The overall size
> requirement of the disk will then become ~36G (32768 * 65536 * 18)
> Hence I went ahead with 4096 which is good enough for testing.

What if the block size is *smaller*? For example, I run an ext4/1k
configuration (which is how I test block size < page size on x86 VMs :-).

> But sure, I will add a comment explaining why we have hardcoded it to 4096
> so that others don't get confused. Larger than this size disk anyway doesn't get
> tested much right?

At $WORK we use a 100GB disk by default when running xfstests, and I
wouldn't be surprised if there are other folks who might use larger
disk sizes.

Maybe test to see whether the scratch disk is too small for the given
parameters and if so skip the test using _notrun?

- Ted

2022-04-06 14:03:25

by Dave Chinner

Subject: Re: [PATCHv3 1/4] generic/468: Add another falloc test entry

On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> On 22/04/04 09:28AM, Dave Chinner wrote:
> > On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> > > From: Ritesh Harjani <[email protected]>
> > >
> > > Add another falloc test entry which could hit a kernel bug
> > > with ext4 fast_commit feature w/o below kernel commit [1].
> > >
> > > <log>
> > > [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> > > [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
> > >
> > > This happens when falloc -k size is huge which spans across more than
> > > 1 flex block group in ext4. This causes a bug in fast_commit replay
> > > code which is fixed by kernel commit at [1].
> > >
> > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
> > >
> > > Signed-off-by: Ritesh Harjani <[email protected]>
> > > ---
> > > tests/generic/468 | 8 ++++++++
> > > tests/generic/468.out | 2 ++
> > > 2 files changed, 10 insertions(+)
> > >
> > > diff --git a/tests/generic/468 b/tests/generic/468
> > > index 95752d3b..5e73cff9 100755
> > > --- a/tests/generic/468
> > > +++ b/tests/generic/468
> > > @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> > > _require_metadata_journaling $SCRATCH_DEV
> > > _scratch_mount
> > >
> > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > +# This is mainly trying to test recovery operation in case where the data
> > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > +blocks=32768
> > > +blocksize=4096
> >
> > Block size can change based on mkfs parameters. You should extract
> > this dynamically from the filesystem the test is being run on.
> >
>
> Yes, but we still have kept just 4096 because, anything bigger than that like
> 65536 might require a bigger disk size itself to test. The overall size
> requirement of the disk will then become ~36G (32768 * 65536 * 18)
> Hence I went ahead with 4096 which is good enough for testing.

If the test setup doesn't have a disk large enough, then the test
should be skipped. That's what '_require_scratch_size' is for.

i.e. _require_scratch_size $larger_than_ext4_fg_size

Will do that check once we've calculated the size needed.

> But sure, I will add a comment explaining why we have hardcoded it to 4096
> so that others don't get confused. Larger than this size disk anyway doesn't get
> tested much right?

You shouldn't be constricting the test based on assumptions about
test configurations. If someone decides to test 64k block size, then
they can size their devices appropriately for the configuration they
want to test. If a 64kB block size filesystem can overrun the
on-disk structure and fail, then the test should exercise that and
fail appropriately.

Cheers,

Dave.
--
Dave Chinner
[email protected]
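
The skip-if-too-small logic suggested above can be sketched standalone as
follows. `scratch_big_enough` is a hypothetical stand-in, not an fstests
helper. One assumption worth verifying against common/rc: as far as I recall,
`_require_scratch_size` takes its argument in KiB rather than bytes, so a
byte count would need converting first. The sketch does that conversion
explicitly.

```shell
#!/bin/bash
# Hypothetical stand-in for the fstests size check: succeed only when a
# device of dev_kib KiB can hold need_bytes bytes.
scratch_big_enough() {
	local dev_kib=$1 need_bytes=$2
	# Round the requirement up to whole KiB before comparing.
	local need_kib=$(( (need_bytes + 1023) / 1024 ))
	[ "$dev_kib" -ge "$need_kib" ]
}

need=$((32768 * 65536 * 18))	# 64k block size case from the thread
if scratch_big_enough $((100 * 1024 * 1024)) "$need"; then
	echo "100G scratch device: run the test"
fi
if ! scratch_big_enough $((20 * 1024 * 1024)) "$need"; then
	echo "20G scratch device: skip (the case _notrun would cover)"
fi
```

The sketch ignores filesystem metadata overhead, so a real check would likely
want some headroom beyond the raw preallocation size.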

2022-04-06 15:42:47

by Ritesh Harjani

Subject: Re: [PATCHv3 1/4] generic/468: Add another falloc test entry

On 22/04/06 02:05PM, Dave Chinner wrote:
> On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> > On 22/04/04 09:28AM, Dave Chinner wrote:
> > > On Thu, Mar 31, 2022 at 06:24:20PM +0530, Ritesh Harjani wrote:
> > > > From: Ritesh Harjani <[email protected]>
> > > >
> > > > Add another falloc test entry which could hit a kernel bug
> > > > with ext4 fast_commit feature w/o below kernel commit [1].
> > > >
> > > > <log>
> > > > [ 410.888496][ T2743] BUG: KASAN: use-after-free in ext4_mb_mark_bb+0x26a/0x6c0
> > > > [ 410.890432][ T2743] Read of size 8 at addr ffff888171886000 by task mount/2743
> > > >
> > > > This happens when falloc -k size is huge which spans across more than
> > > > 1 flex block group in ext4. This causes a bug in fast_commit replay
> > > > code which is fixed by kernel commit at [1].
> > > >
> > > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=bfdc502a4a4c058bf4cbb1df0c297761d528f54d
> > > >
> > > > Signed-off-by: Ritesh Harjani <[email protected]>
> > > > ---
> > > > tests/generic/468 | 8 ++++++++
> > > > tests/generic/468.out | 2 ++
> > > > 2 files changed, 10 insertions(+)
> > > >
> > > > diff --git a/tests/generic/468 b/tests/generic/468
> > > > index 95752d3b..5e73cff9 100755
> > > > --- a/tests/generic/468
> > > > +++ b/tests/generic/468
> > > > @@ -34,6 +34,13 @@ _scratch_mkfs >/dev/null 2>&1
> > > > _require_metadata_journaling $SCRATCH_DEV
> > > > _scratch_mount
> > > >
> > > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > > +# This is mainly trying to test recovery operation in case where the data
> > > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > > +blocks=32768
> > > > +blocksize=4096
> > >
> > > Block size can change based on mkfs parameters. You should extract
> > > this dynamically from the filesystem the test is being run on.
> > >
> >
> > Yes, but we still have kept just 4096 because, anything bigger than that like
> > 65536 might require a bigger disk size itself to test. The overall size
> > requirement of the disk will then become ~36G (32768 * 65536 * 18)
> > Hence I went ahead with 4096 which is good enough for testing.
>
> If the test setup doesn't have a disk large enough, then the test
> should be skipped. That's what '_require_scratch_size' is for.
>
> i.e. _require_scratch_size $larger_than_ext4_fg_size
>
> Will do that check once we've calculated the size needed.

Sure.

>
> > But sure, I will add a comment explaining why we have hardcoded it to 4096
> > so that others don't get confused. Larger than this size disk anyway doesn't get
> > tested much right?
>
> You shouldn't be constricting the test based on assumptions about
> test configurations. If someone decides to test 64k block size, then
> they can size their devices appropriately for the configuration they
> want to test. If a 64kB block size filesystem can overrun the
> on-disk structure and fail, then the test should exercise that and
> fail appropriately.

Sure Dave, got the point. I will try to make the changes such that the
test doesn't assume any particular user test configuration, and is as generic
as possible so that we can still hit the issue we are aiming at with this test.

-ritesh

2022-04-06 15:44:40

by Ritesh Harjani

Subject: Re: [PATCHv3 1/4] generic/468: Add another falloc test entry

On 22/04/05 06:00PM, Theodore Ts'o wrote:
> On Tue, Apr 05, 2022 at 04:36:03PM +0530, Ritesh Harjani wrote:
> > > > +# blocksize and fact are used in the last case of the fsync/fdatasync test.
> > > > +# This is mainly trying to test recovery operation in case where the data
> > > > +# blocks written, exceeds the default flex group size (32768*4096*16) in ext4.
> > > > +blocks=32768
> > > > +blocksize=4096
> > >
> > > Block size can change based on mkfs parameters. You should extract
> > > this dynamically from the filesystem the test is being run on.
> > >
> >
> > Yes, but we still have kept just 4096 because, anything bigger than that like
> > 65536 might require a bigger disk size itself to test. The overall size
> > requirement of the disk will then become ~36G (32768 * 65536 * 18)
> > Hence I went ahead with 4096 which is good enough for testing.
>
> What if the block size is *smaller*? For example, I run an ext4/1k
> configuration (which is how I test block size < page size on x86 VMs :-).

For a 1k block size, this test can still reproduce the problem, because the
given size will easily exceed the required number of blocks in the 1K case.

>
> > But sure, I will add a comment explaining why we have hardcoded it to 4096
> > so that others don't get confused. Larger than this size disk anyway doesn't get
> > tested much right?
>
> At $WORK we use a 100GB disk by default when running xfstests, and I
> wouldn't be surprised if there are other folks who might use larger
> disk sizes.

OK, sure. Thanks for the info.

>
> Maybe test to see whether the scratch disk is too small for the given
> parameters and if so skip the test using _notrun?
>

Yes, I think I got the point. I will make the changes accordingly.

-ritesh
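
The 1k block size claim above can be sanity-checked with a short sketch. It
rests on an assumption not stated in the thread: that ext4 defaults to
8 * blocksize blocks per group (32768 for 4k blocks, matching the patch
comment), with 16 groups per flex group. `flex_group_bytes` is a hypothetical
helper name.

```shell
#!/bin/bash
# Assumption (not from the thread): ext4 defaults to 8 * blocksize blocks per
# group; 16 groups per flex group is taken from the patch comment.
flex_group_bytes() {
	local blksize=$1
	echo $((8 * blksize * blksize * 16))
}

size=2415919104		# fixed falloc size from the v3 golden output
for bs in 1024 4096; do
	fg=$(flex_group_bytes "$bs")
	# spans_more_than_one is 1 when the falloc size exceeds one flex group.
	echo "blocksize=$bs flex_group_bytes=$fg spans_more_than_one=$((size > fg))"
done
```

Under that assumption the fixed 2415919104-byte size exceeds one flex group
for both 1k and 4k block sizes, which is consistent with the statement that
the 1k configuration can still reproduce the problem.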