2023-03-17 01:18:19

by kernel test robot

[permalink] [raw]
Subject: [linus:master] [btrfs] 5f58d783fd: xfstests.btrfs.172.fail


Greeting,

FYI, we noticed xfstests.btrfs.172.fail due to commit (built with gcc-11):

commit: 5f58d783fd7823b2c2d5954d1126e702f94bfc4c ("btrfs: free device in btrfs_close_devices for a single device filesystem")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: xfstests
version: xfstests-i386-5a5e419-1_20220926
with following parameters:

disk: 6HDD
fs: btrfs
test: btrfs-logwrites

test-description: xfstests is a regression test suite for xfs and other files ystems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 8G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


we did this test on i386 and we found the issue doesn't always happen,
on both this commit and v6.3-rc2, it failed in about half of tests.
however, always clean on parent commit 519b7e13b5ae8dd3.

519b7e13b5ae8dd3 5f58d783fd7823b2c2d5954d112 v6.3-rc2
---------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
:14 23% 7:13 33% 10:19 xfstests.btrfs.172.fail
14:14 -27% 6:13 -17% 9:19 xfstests.btrfs.172.pass



If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]

2023-03-08 04:47:50 export TEST_DIR=/fs/sdb1
2023-03-08 04:47:50 export TEST_DEV=/dev/sdb1
2023-03-08 04:47:50 export FSTYP=btrfs
2023-03-08 04:47:50 export SCRATCH_MNT=/fs/scratch
2023-03-08 04:47:50 mkdir /fs/scratch -p
2023-03-08 04:47:50 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6"
2023-03-08 04:47:50 export LOGWRITES_DEV=/dev/sdb2
2023-03-08 04:47:50 export SCRATCH_DEV=/dev/sdb6
2023-03-08 04:47:50 unset SCRATCH_DEV_POOL
2023-03-08 04:47:50 sed "s:^:btrfs/:" //lkp/benchmarks/xfstests/tests/btrfs-logwrites
2023-03-08 04:47:50 ./check btrfs/291 btrfs/206 btrfs/196 btrfs/192 btrfs/190 btrfs/172
FSTYP -- btrfs
PLATFORM -- Linux/i686 lkp-hsw-d01 6.1.0-rc8-00284-g5f58d783fd78 #1 SMP Wed Mar 8 11:05:36 CST 2023
MKFS_OPTIONS -- /dev/sdb6
MOUNT_OPTIONS -- /dev/sdb6 /fs/scratch

btrfs/172 [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/172.out.bad)
--- tests/btrfs/172.out 2022-09-26 09:38:15.000000000 +0000
+++ /lkp/benchmarks/xfstests/results//btrfs/172.out.bad 2023-03-08 04:47:54.134026209 +0000
@@ -1,3 +1,5 @@
QA output created by 172
wrote 5242880/5242880 bytes at offset 0
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+_check_btrfs_filesystem: filesystem on /dev/sdb6 is inconsistent
+(see /lkp/benchmarks/xfstests/results//btrfs/172.full for details)
...
(Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/172.out /lkp/benchmarks/xfstests/results//btrfs/172.out.bad' to see the entire diff)
btrfs/190 15s
btrfs/192 101s
btrfs/196 163s
btrfs/206 4s
btrfs/291 [not run] kernel btrfs isn't configured with verity support
Ran: btrfs/172 btrfs/190 btrfs/192 btrfs/196 btrfs/206 btrfs/291
Not run: btrfs/291
Failures: btrfs/172
Failed 1 of 6 tests




To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



Attachments:
(No filename) (3.90 kB)
config-6.1.0-rc8-00284-g5f58d783fd78 (150.24 kB)
job-script (5.54 kB)
dmesg.xz (44.83 kB)
xfstests (1.80 kB)
job.yaml (4.79 kB)
reproduce (1.03 kB)
Download all attachments

2023-03-17 04:15:03

by Anand Jain

[permalink] [raw]
Subject: Re: [linus:master] [btrfs] 5f58d783fd: xfstests.btrfs.172.fail



On 3/17/23 09:17, kernel test rboot wrote:
>
> Greeting,
>
> FYI, we noticed xfstests.btrfs.172.fail due to commit (built with gcc-11):
>
> commit: 5f58d783fd7823b2c2d5954d1126e702f94bfc4c ("btrfs: free device in btrfs_close_devices for a single device filesystem")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: xfstests
> version: xfstests-i386-5a5e419-1_20220926
> with following parameters:
>
> disk: 6HDD
> fs: btrfs
> test: btrfs-logwrites
>
> test-description: xfstests is a regression test suite for xfs and other files ystems.
> test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
>
>
> on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 8G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> we did this test on i386 and we found the issue doesn't always happen,
> on both this commit and v6.3-rc2, it failed in about half of tests.
> however, always clean on parent commit 519b7e13b5ae8dd3.
>
> 519b7e13b5ae8dd3 5f58d783fd7823b2c2d5954d112 v6.3-rc2
> ---------------- --------------------------- ---------------------------
> fail:runs %reproduction fail:runs %reproduction fail:runs
> | | | | |
> :14 23% 7:13 33% 10:19 xfstests.btrfs.172.fail
> 14:14 -27% 6:13 -17% 9:19 xfstests.btrfs.172.pass
>
>
>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <[email protected]>
> | Link: https://lore.kernel.org/oe-lkp/[email protected]
>
> 2023-03-08 04:47:50 export TEST_DIR=/fs/sdb1
> 2023-03-08 04:47:50 export TEST_DEV=/dev/sdb1
> 2023-03-08 04:47:50 export FSTYP=btrfs
> 2023-03-08 04:47:50 export SCRATCH_MNT=/fs/scratch
> 2023-03-08 04:47:50 mkdir /fs/scratch -p
> 2023-03-08 04:47:50 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 /dev/sdb5 /dev/sdb6"
> 2023-03-08 04:47:50 export LOGWRITES_DEV=/dev/sdb2
> 2023-03-08 04:47:50 export SCRATCH_DEV=/dev/sdb6
> 2023-03-08 04:47:50 unset SCRATCH_DEV_POOL
> 2023-03-08 04:47:50 sed "s:^:btrfs/:" //lkp/benchmarks/xfstests/tests/btrfs-logwrites
> 2023-03-08 04:47:50 ./check btrfs/291 btrfs/206 btrfs/196 btrfs/192 btrfs/190 btrfs/172
> FSTYP -- btrfs
> PLATFORM -- Linux/i686 lkp-hsw-d01 6.1.0-rc8-00284-g5f58d783fd78 #1 SMP Wed Mar 8 11:05:36 CST 2023
> MKFS_OPTIONS -- /dev/sdb6
> MOUNT_OPTIONS -- /dev/sdb6 /fs/scratch
>
> btrfs/172 [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//btrfs/172.out.bad)
> --- tests/btrfs/172.out 2022-09-26 09:38:15.000000000 +0000
> +++ /lkp/benchmarks/xfstests/results//btrfs/172.out.bad 2023-03-08 04:47:54.134026209 +0000
> @@ -1,3 +1,5 @@
> QA output created by 172
> wrote 5242880/5242880 bytes at offset 0
> XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

> +_check_btrfs_filesystem: filesystem on /dev/sdb6 is inconsistent


Thank you for reporting it. I am currently looking into it. It appears
that the device is being freed after it's closed, which could indicate
that we overlooked something at close, just a hypothesis at this point.

-Anand


> +(see /lkp/benchmarks/xfstests/results//btrfs/172.full for details)
> ...
> (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/172.out /lkp/benchmarks/xfstests/results//btrfs/172.out.bad' to see the entire diff)
> btrfs/190 15s
> btrfs/192 101s
> btrfs/196 163s
> btrfs/206 4s
> btrfs/291 [not run] kernel btrfs isn't configured with verity support
> Ran: btrfs/172 btrfs/190 btrfs/192 btrfs/196 btrfs/206 btrfs/291
> Not run: btrfs/291
> Failures: btrfs/172
> Failed 1 of 6 tests
>
>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
>

2023-03-23 00:08:51

by Anand Jain

[permalink] [raw]
Subject: Re: [linus:master] [btrfs] 5f58d783fd: xfstests.btrfs.172.fail

On 17/03/2023 12:14, Anand Jain wrote:
>
>
> On 3/17/23 09:17, kernel test rboot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed xfstests.btrfs.172.fail due to commit (built with
>> gcc-11):
>>
>> commit: 5f58d783fd7823b2c2d5954d1126e702f94bfc4c ("btrfs: free device
>> in btrfs_close_devices for a single device filesystem")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: xfstests
>> version: xfstests-i386-5a5e419-1_20220926
>> with following parameters:
>>
>>     disk: 6HDD
>>     fs: btrfs
>>     test: btrfs-logwrites
>>
>> test-description: xfstests is a regression test suite for xfs and
>> other files ystems.
>> test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
>>
>>
>> on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @
>> 3.40GHz (Haswell) with 8G memory
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire
>> log/backtrace):
>>
>>
>> we did this test on i386 and we found the issue doesn't always happen,
>> on both this commit and v6.3-rc2, it failed in about half of tests.
>> however, always clean on parent commit 519b7e13b5ae8dd3.
>>
>> 519b7e13b5ae8dd3 5f58d783fd7823b2c2d5954d112                    v6.3-rc2
>> ---------------- --------------------------- ---------------------------
>>         fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
>>             |             |             |             |             |
>>             :14          23%           7:13          33%
>> 10:19    xfstests.btrfs.172.fail
>>           14:14         -27%           6:13         -17%
>> 9:19    xfstests.btrfs.172.pass
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> | Reported-by: kernel test robot <[email protected]>
>> | Link:
>> https://lore.kernel.org/oe-lkp/[email protected]
>>
>> 2023-03-08 04:47:50 export TEST_DIR=/fs/sdb1
>> 2023-03-08 04:47:50 export TEST_DEV=/dev/sdb1
>> 2023-03-08 04:47:50 export FSTYP=btrfs
>> 2023-03-08 04:47:50 export SCRATCH_MNT=/fs/scratch
>> 2023-03-08 04:47:50 mkdir /fs/scratch -p
>> 2023-03-08 04:47:50 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3
>> /dev/sdb4 /dev/sdb5 /dev/sdb6"
>> 2023-03-08 04:47:50 export LOGWRITES_DEV=/dev/sdb2
>> 2023-03-08 04:47:50 export SCRATCH_DEV=/dev/sdb6
>> 2023-03-08 04:47:50 unset SCRATCH_DEV_POOL
>> 2023-03-08 04:47:50 sed "s:^:btrfs/:"
>> //lkp/benchmarks/xfstests/tests/btrfs-logwrites
>> 2023-03-08 04:47:50 ./check btrfs/291 btrfs/206 btrfs/196 btrfs/192
>> btrfs/190 btrfs/172
>> FSTYP         -- btrfs
>> PLATFORM      -- Linux/i686 lkp-hsw-d01 6.1.0-rc8-00284-g5f58d783fd78
>> #1 SMP Wed Mar  8 11:05:36 CST 2023
>> MKFS_OPTIONS  -- /dev/sdb6
>> MOUNT_OPTIONS -- /dev/sdb6 /fs/scratch
>>
>> btrfs/172       [failed, exit status 1]- output mismatch (see
>> /lkp/benchmarks/xfstests/results//btrfs/172.out.bad)
>>      --- tests/btrfs/172.out    2022-09-26 09:38:15.000000000 +0000
>>      +++ /lkp/benchmarks/xfstests/results//btrfs/172.out.bad
>> 2023-03-08 04:47:54.134026209 +0000
>>      @@ -1,3 +1,5 @@
>>       QA output created by 172
>>       wrote 5242880/5242880 bytes at offset 0
>>       XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>
>>      +_check_btrfs_filesystem: filesystem on /dev/sdb6 is inconsistent
>
>
> Thank you for reporting it. I am currently looking into it. It appears
> that the device is being freed after it's closed, which could indicate
> that we overlooked something at close, just a hypothesis at this point.
>
> -Anand
>
>


>>      +(see /lkp/benchmarks/xfstests/results//btrfs/172.full for details)


Hmm, I'm unable to reproduce the issue. However, it's possible that the
O_EXCL open for the 'btrfs check' command failed because 'systemd-udev'
was scanning the btrfs device at the same time. I noticed from the dmesg
that the 'systemd-udevd' thread was running [1] at some point. I'm not
entirely sure if it raced with the 'btrfs check' command to successfully
acquire the O_EXCL lock. If you could send me the 'testcase.full' and
'testcase.out.bad' logs from the system, I could verify the issue.

[1]
[ 337.769932][ T6408] BTRFS: device fsid
8dcaa8fb-b317-4e13-9e04-e9b63fe91948 devid 1 transid 6 /dev/sdb6 scanned
by systemd-udevd (6408)


Could you please also attach the full log and out.bad file for
the test case in the report as shown below?

/lkp/benchmarks/xfstests/results/btrfs/<test-case-number>.full
and
/lkp/benchmarks/xfstests/results/btrfs/<test-case-number>.out.bad


Thanks, Anand

>>      ...
>>      (Run 'diff -u /lkp/benchmarks/xfstests/tests/btrfs/172.out
>> /lkp/benchmarks/xfstests/results//btrfs/172.out.bad'  to see the
>> entire diff)
>> btrfs/190        15s
>> btrfs/192        101s
>> btrfs/196        163s
>> btrfs/206        4s
>> btrfs/291       [not run] kernel btrfs isn't configured with verity
>> support
>> Ran: btrfs/172 btrfs/190 btrfs/192 btrfs/196 btrfs/206 btrfs/291
>> Not run: btrfs/291
>> Failures: btrfs/172
>> Failed 1 of 6 tests
>>
>>
>>
>>
>> To reproduce:
>>
>>          git clone https://github.com/intel/lkp-tests.git
>>          cd lkp-tests
>>          sudo bin/lkp install job.yaml           # job file is
>> attached in this email
>>          bin/lkp split-job --compatible job.yaml # generate the yaml
>> file for lkp run
>>          sudo bin/lkp run generated-yaml-file
>>
>>          # if come across any failure that blocks the test,
>>          # please remove ~/.lkp and /lkp dir to run from a clean state.
>>
>>
>>

2023-03-23 04:11:04

by kernel test robot

[permalink] [raw]
Subject: Re: [linus:master] [btrfs] 5f58d783fd: xfstests.btrfs.172.fail

hi, Anand,

On Thu, Mar 23, 2023 at 07:49:27AM +0800, Anand Jain wrote:
> On 17/03/2023 12:14, Anand Jain wrote:
> >
> >
> > Thank you for reporting it. I am currently looking into it. It appears
> > that the device is being freed after it's closed, which could indicate
> > that we overlooked something at close, just a hypothesis at this point.
> >
> > -Anand
> >
> >
>
>
> > > ???? +(see /lkp/benchmarks/xfstests/results//btrfs/172.full for details)
>
>
> Hmm, I'm unable to reproduce the issue. However, it's possible that the
> O_EXCL open for the 'btrfs check' command failed because 'systemd-udev'
> was scanning the btrfs device at the same time. I noticed from the dmesg
> that the 'systemd-udevd' thread was running [1] at some point. I'm not
> entirely sure if it raced with the 'btrfs check' command to successfully
> acquire the O_EXCL lock. If you could send me the 'testcase.full' and
> 'testcase.out.bad' logs from the system, I could verify the issue.
>
> [1]
> [ 337.769932][ T6408] BTRFS: device fsid
> 8dcaa8fb-b317-4e13-9e04-e9b63fe91948 devid 1 transid 6 /dev/sdb6 scanned by
> systemd-udevd (6408)
>
>
> Could you please also attach the full log and out.bad file for
> the test case in the report as shown below?
>
> /lkp/benchmarks/xfstests/results/btrfs/<test-case-number>.full
> and
> /lkp/benchmarks/xfstests/results/btrfs/<test-case-number>.out.bad
>

as attached. Thanks!

>
> Thanks, Anand
>


Attachments:
(No filename) (1.45 kB)
172.full (75.45 kB)
172.out.bad (261.00 B)
Download all attachments

2023-03-23 04:27:13

by Anand Jain

[permalink] [raw]
Subject: Re: [linus:master] [btrfs] 5f58d783fd: xfstests.btrfs.172.fail



On 23/03/2023 12:07, Oliver Sang wrote:
> hi, Anand,
>
> On Thu, Mar 23, 2023 at 07:49:27AM +0800, Anand Jain wrote:
>> On 17/03/2023 12:14, Anand Jain wrote:
>>>
>>>
>>> Thank you for reporting it. I am currently looking into it. It appears
>>> that the device is being freed after it's closed, which could indicate
>>> that we overlooked something at close, just a hypothesis at this point.
>>>
>>> -Anand
>>>
>>>
>>
>>
>>>>      +(see /lkp/benchmarks/xfstests/results//btrfs/172.full for details)
>>
>>
>> Hmm, I'm unable to reproduce the issue. However, it's possible that the
>> O_EXCL open for the 'btrfs check' command failed because 'systemd-udev'
>> was scanning the btrfs device at the same time. I noticed from the dmesg
>> that the 'systemd-udevd' thread was running [1] at some point. I'm not
>> entirely sure if it raced with the 'btrfs check' command to successfully
>> acquire the O_EXCL lock. If you could send me the 'testcase.full' and
>> 'testcase.out.bad' logs from the system, I could verify the issue.
>>
>> [1]
>> [ 337.769932][ T6408] BTRFS: device fsid
>> 8dcaa8fb-b317-4e13-9e04-e9b63fe91948 devid 1 transid 6 /dev/sdb6 scanned by
>> systemd-udevd (6408)
>>
>>
>> Could you please also attach the full log and out.bad file for
>> the test case in the report as shown below?
>>
>> /lkp/benchmarks/xfstests/results/btrfs/<test-case-number>.full
>> and
>> /lkp/benchmarks/xfstests/results/btrfs/<test-case-number>.out.bad
>>
>
> as attached. Thanks!


Thanks.

My assumption about the reason for fsck failure was correct, it failed
due to device busy.

*** fsck.btrfs output ***
ERROR: cannot open device '/dev/sdb6': Device or resource busy
ERROR: cannot open file system
Opening filesystem to check...
*** end fsck.btrfs output

I am testing a fix, I will send it.

Anand