Hi,
It is curious, but it happens only on a machine which has a BTRFS volume
combined from two high-speed NVMe (PCIe 4) SSDs in RAID 0. On machines
with a BTRFS volume on a single HDD the bug does not appear.

To bisect the problematic commit, I had to sweat a lot. At each step,
I downloaded the 150 GB game "Assassin's Creed Valhalla" 4 times and
deleted it. To make sure that the commit previous to
947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by
the bug, I downloaded this game 10 times, which should have produced
more than 1.5 TB of data written to the btrfs volume.

Here is the result of my bisection:
947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
commit 947a629988f191807d2d22ba63ae18259bb645c5
Author: Qu Wenruo <[email protected]>
Date: Wed Sep 14 13:32:51 2022 +0800
btrfs: move tree block parentness check into validate_extent_buffer()
[BACKGROUND]
Although both btrfs metadata and data has their read time verification
done at endio time (btrfs_validate_metadata_buffer() and
btrfs_verify_data_csum()), metadata has extra verification, mostly
parentness check including first key/transid/owner_root/level, done at
read_tree_block() and btrfs_read_extent_buffer().
On the other hand, all the data verification is done at endio context.
[ENHANCEMENT]
This patch will make a new union in btrfs_bio, taking the space of the
old data checksums, thus it will not increase the memory usage.
With that extra btrfs_tree_parent_check inside btrfs_bio, we can just
pass the check parameter into read_extent_buffer_pages(), and before
submitting the bio, we can copy the check structure into btrfs_bio.
And finally at endio time, we can grab btrfs_bio::parent_check and pass
it to validate_extent_buffer(), to move the remaining checks into it.
This brings the following benefits:
- Much simpler btrfs_read_extent_buffer()
Now it only needs to iterate through all mirrors.
- Simpler read-time transid check
Previously we go verify_parent_transid() after reading out the extent
buffer.
Now the transid check is done inside the endio function, no other
code can modify the content.
Thus no need to use the extent lock anymore.
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
fs/btrfs/disk-io.c | 73 ++++++++++++++++++++++++++++++++++++++--------------
fs/btrfs/extent_io.c | 18 ++++++++++---
fs/btrfs/extent_io.h | 5 ++--
fs/btrfs/volumes.h | 25 +++++++++++++++---
4 files changed, 93 insertions(+), 28 deletions(-)
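For readers not familiar with the patch, the check structure and the
btrfs_bio union described in the commit message look roughly like this
(a sketch pieced together from the commit text and the field names
printed by the debug output later in this thread, not a verbatim copy
of the kernel headers):

  /* sketch only -- field names as they appear in the debug messages below */
  struct btrfs_tree_parent_check {
          u64 owner_root;             /* expected owner root, 0 = don't check */
          u64 transid;                /* expected generation, 0 = don't check */
          struct btrfs_key first_key; /* expected first key of the block */
          bool has_first_key;         /* whether first_key should be verified */
          u8 level;                   /* expected tree level */
  };

  /* inside struct btrfs_bio: shares space with the data-read checksums */
  union {
          u8 *csum;                                    /* data reads */
          struct btrfs_tree_parent_check parent_check; /* metadata reads */
  };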
Before going readonly, the preceding line in the kernel log displays a message:
[ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in
btrfs_run_delayed_refs:2147: errno=-5 IO failure
I also attached a full kernel log.
--
Best Regards,
Mike Gavrilov.
On 2022/12/26 05:32, Mikhail Gavrilov wrote:
> Hi,
> It is curious, but it happens only on a machine which has a BTRFS volume
> combined from two high-speed NVMe (PCIe 4) SSDs in RAID 0. On machines
> with a BTRFS volume on a single HDD the bug does not appear.
>
> To bisect the problematic commit, I had to sweat a lot. At each step,
> I downloaded the 150 GB game "Assassin's Creed Valhalla" 4 times and
> deleted it. To make sure that the commit previous to
> 947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by
> the bug, I downloaded this game 10 times, which should have produced
> more than 1.5 TB of data written to the btrfs volume.
>
> Here is the result of my bisection:
> 947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
> commit 947a629988f191807d2d22ba63ae18259bb645c5
> Author: Qu Wenruo <[email protected]>
> Date: Wed Sep 14 13:32:51 2022 +0800
>
> btrfs: move tree block parentness check into validate_extent_buffer()
>
[...]
> Signed-off-by: Qu Wenruo <[email protected]>
> Signed-off-by: David Sterba <[email protected]>
>
> fs/btrfs/disk-io.c | 73 ++++++++++++++++++++++++++++++++++++++--------------
> fs/btrfs/extent_io.c | 18 ++++++++++---
> fs/btrfs/extent_io.h | 5 ++--
> fs/btrfs/volumes.h | 25 +++++++++++++++---
> 4 files changed, 93 insertions(+), 28 deletions(-)
>
> Before going readonly, the preceding line in the kernel log displays a message:
> [ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in
> btrfs_run_delayed_refs:2147: errno=-5 IO failure
>
> I also attached a full kernel log.
>
Thanks a lot for the full kernel log.
It indeed shows something is wrong in the run_one_delayed_ref().
But surprisingly, if there is something wrong, I'd expect more output
from btrfs, as normally if one tree block failed to pass whatever the
checks, it should cause an error message at least.
Since you can reproduce the bug (although I don't think this is easy to
reproduce), mind to apply the extra debug patch and then try to reproduce?
(Part of the patch will go upstream soon.)
Another thing is, mind to run "btrfs check --readonly" on the fs?
I don't believe that's the case, but just in case.
Thanks,
Qu
On Mon, Dec 26, 2022 at 5:15 AM Qu Wenruo <[email protected]> wrote:
>
> Thanks a lot for the full kernel log.
>
> It indeed shows something is wrong in the run_one_delayed_ref().
> But surprisingly, if there is something wrong, I'd expect more output
> from btrfs, as normally if one tree block failed to pass whatever the
> checks, it should cause an error message at least.
>
> Since you can reproduce the bug (although I don't think this is easy to
> reproduce), mind to apply the extra debug patch and then try to reproduce?
Of course I am still able to reproduce.
The number of messages preceding the forced readonly has become somewhat larger:
[ 2295.155437] BTRFS error (device nvme0n1p3): level check failed on
logical 4957418700800 mirror 1 wanted 0 found 1
[ 2295.155831] BTRFS error (device nvme0n1p3: state A): Transaction
aborted (error -5)
[ 2295.155946] BTRFS: error (device nvme0n1p3: state A) in
do_free_extent_accounting:2849: errno=-5 IO failure
[ 2295.155978] BTRFS info (device nvme0n1p3: state EA): forced readonly
[ 2295.155985] BTRFS error (device nvme0n1p3: state EA):
run_one_delayed_ref returned -5
[ 2295.156051] BTRFS: error (device nvme0n1p3: state EA) in
btrfs_run_delayed_refs:2153: errno=-5 IO failure
Of course full logs are also attached.
> Another thing is, mind to run "btrfs check --readonly" on the fs?
Result of check attached too.
--
Best Regards,
Mike Gavrilov.
On 2022/12/26 10:47, Mikhail Gavrilov wrote:
> On Mon, Dec 26, 2022 at 5:15 AM Qu Wenruo <[email protected]> wrote:
>>
>> Thanks a lot for the full kernel log.
>>
>> It indeed shows something is wrong in the run_one_delayed_ref().
>> But surprisingly, if there is something wrong, I'd expect more output
>> from btrfs, as normally if one tree block failed to pass whatever the
>> checks, it should cause an error message at least.
>>
>> Since you can reproduce the bug (although I don't think this is easy to
>> reproduce), mind to apply the extra debug patch and then try to reproduce?
>
> Of course I am still able to reproduce.
> The number of messages foreshadowing readonly has become somewhat more:
> [ 2295.155437] BTRFS error (device nvme0n1p3): level check failed on
> logical 4957418700800 mirror 1 wanted 0 found 1
OK, indeed a level mismatch.
From the remaining lines, it shows we're failing at
do_free_extent_accounting(), which failed at the btrfs_del_csums().
And inside btrfs_del_csums(), what we do are all regular btree
operations, thus the tree level check should work without problem.
Thus it seems to be a corrupted csum tree.
> [ 2295.155831] BTRFS error (device nvme0n1p3: state A): Transaction
> aborted (error -5)
> [ 2295.155946] BTRFS: error (device nvme0n1p3: state A) in
> do_free_extent_accounting:2849: errno=-5 IO failure
> [ 2295.155978] BTRFS info (device nvme0n1p3: state EA): forced readonly
> [ 2295.155985] BTRFS error (device nvme0n1p3: state EA):
> run_one_delayed_ref returned -5
> [ 2295.156051] BTRFS: error (device nvme0n1p3: state EA) in
> btrfs_run_delayed_refs:2153: errno=-5 IO failure
>
> Of course full logs are also attached.
>
>> Another thing is, mind to run "btrfs check --readonly" on the fs?
> Result of check attached too.
>
Could you please run "btrfs check --readonly" from a liveCD?
There are tons of possible false alerts if run on a RW mounted fs.
Thanks,
Qu
Hi,
I am experiencing a similar issue using kernel 6.1.1. The problem
started with kernel 6.1; it never happened before.
A btrfs scrub and a btrfs check --readonly returned 0 errors.
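(For reference, these were run roughly as follows; the mount point here
is just an example:)

  btrfs scrub start -B /data            # -B: stay in foreground until done
  btrfs check --readonly /dev/sda2      # read-only check of the device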
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS critical
(device sda2): corrupt leaf: root=18446744073709551610
block=203366612992 slot=73, bad key order, prev (484119 96 1312873)
current (484119 96 1312849)
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS info
(device sda2): leaf 203366612992 gen 5068802 total ptrs 105 free space
10820 owner 18446744073709551610
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 0 key
(484119 1 0) itemoff 16123 itemsize 160
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
generation 45 size 2250 mode 40700
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 1 key
(484119 12 484118) itemoff 16097 itemsize 26
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 2 key
(484119 72 15) itemoff 16089 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 3 key
(484119 72 20) itemoff 16081 itemsize 8
...
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 82 key
(484119 96 1312873) itemoff 14665 itemsize 51
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 83 key
(484119 96 1312877) itemoff 14609 itemsize 56
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 84 key
(484128 1 0) itemoff 14449 itemsize 160
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
generation 45 size 98304 mode 100644
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 85 key
(484128 108 0) itemoff 14396 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10674830381056 nr 65536
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 65536 ram 65536
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 86 key
(484129 1 0) itemoff 14236 itemsize 160
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
generation 45 size 26214400 mode 100644
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 87 key
(484129 108 589824) itemoff 14183 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665699962880 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 88 key
(484129 108 2850816) itemoff 14130 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665848733696 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 89 key
(484129 108 11042816) itemoff 14077 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10660869349376 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 90 key
(484129 108 13402112) itemoff 14024 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10660207378432 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 91 key
(484129 108 19628032) itemoff 13971 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665835618304 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 92 key
(484129 108 21266432) itemoff 13918 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10661222666240 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 93 key
(484129 108 22740992) itemoff 13865 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665565814784 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 94 key
(484129 108 23101440) itemoff 13812 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665836371968 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 95 key
(484129 108 24084480) itemoff 13759 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665836404736 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 96 key
(484129 108 24150016) itemoff 13706 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665849159680 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 97 key
(484129 108 24313856) itemoff 13653 itemsize 53
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data disk bytenr 10665849192448 nr 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
data offset 0 nr 32768 ram 32768
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 98 key
(484147 1 0) itemoff 13493 itemsize 160
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
generation 45 size 886 mode 40755
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 99 key
(484147 72 4) itemoff 13485 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 100 key
(484147 72 27) itemoff 13477 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 101 key
(484147 72 35) itemoff 13469 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 102 key
(484147 72 40) itemoff 13461 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 103 key
(484147 72 45) itemoff 13453 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 104 key
(484147 72 52) itemoff 13445 itemsize 8
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS error
(device sda2): block=203366612992 write time tree block corruption detected
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS: error
(device sda2: state AL) in free_log_tree:3284: errno=-5 IO failure
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS info
(device sda2: state EAL): forced readonly
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS warning
(device sda2: state EAL): Skipping commit of aborted transaction.
2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS: error
(device sda2: state EAL) in cleanup_transaction:1958: errno=-5 IO failure
There are no SSD access errors in the kernel logs. SMART data for the
SSD is normal. I also did a 12-hour memtest to rule out bad RAM.
The filesystem consists of a single 480GB SATA SSD (Corsair Neutron
XTI). As with the original poster, the problem occurs only on one
machine.
The error appears every few days and seems to be triggered by high CPU
utilization combined with some disk IO. The CPU temperature never
exceeds 60 degrees Celsius.
On 26/12/2022 04:29, Qu Wenruo wrote:
>
>
> On 2022/12/26 10:47, Mikhail Gavrilov wrote:
>> On Mon, Dec 26, 2022 at 5:15 AM Qu Wenruo <[email protected]>
>> wrote:
>>>
>>> Thanks a lot for the full kernel log.
>>>
>>> It indeed shows something is wrong in the run_one_delayed_ref().
>>> But surprisingly, if there is something wrong, I'd expect more output
>>> from btrfs, as normally if one tree block failed to pass whatever the
>>> checks, it should cause an error message at least.
>>>
>>> Since you can reproduce the bug (although I don't think this is easy to
>>> reproduce), mind to apply the extra debug patch and then try to
>>> reproduce?
>>
>> Of course I am still able to reproduce.
>> The number of messages foreshadowing readonly has become somewhat more:
>> [ 2295.155437] BTRFS error (device nvme0n1p3): level check failed on
>> logical 4957418700800 mirror 1 wanted 0 found 1
>
> OK, indeed a level mismatch.
>
> From the remaining lines, it shows we're failing at
> do_free_extent_accounting(), which failed at the btrfs_del_csums().
>
> And inside btrfs_del_csums(), what we do are all regular btree
> operations, thus the tree level check should work without problem.
>
> Thus it seems to be a corrupted csum tree.
>
>> [ 2295.155831] BTRFS error (device nvme0n1p3: state A): Transaction
>> aborted (error -5)
>> [ 2295.155946] BTRFS: error (device nvme0n1p3: state A) in
>> do_free_extent_accounting:2849: errno=-5 IO failure
>> [ 2295.155978] BTRFS info (device nvme0n1p3: state EA): forced readonly
>> [ 2295.155985] BTRFS error (device nvme0n1p3: state EA):
>> run_one_delayed_ref returned -5
>> [ 2295.156051] BTRFS: error (device nvme0n1p3: state EA) in
>> btrfs_run_delayed_refs:2153: errno=-5 IO failure
>>
>> Of course full logs are also attached.
>>
>>> Another thing is, mind to run "btrfs check --readonly" on the fs?
>> Result of check attached too.
>>
> Could you please run "btrfs check --readonly" from a liveCD?
> There are tons of possible false alerts if ran on a RW mounted fs.
>
> Thanks,
> Qu
On Mon, Dec 26, 2022 at 8:29 AM Qu Wenruo <[email protected]> wrote:
>
>
> OK, indeed a level mismatch.
>
> From the remaining lines, it shows we're failing at
> do_free_extent_accounting(), which failed at the btrfs_del_csums().
>
> And inside btrfs_del_csums(), what we do are all regular btree
> operations, thus the tree level check should work without problem.
>
> Thus it seems to be a corrupted csum tree.
Do I need to debug anything else to understand the cause of the error?
Thanks.
> Could you please run "btrfs check --readonly" from a liveCD?
> There are tons of possible false alerts if ran on a RW mounted fs.
>
# btrfs check --readonly /dev/nvme0n1p3
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p3
UUID: 40e0b5d2-df54-46e0-b6f4-2f868296271d
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 6828416307200 bytes used, no error found
total csum bytes: 6651838248
total tree bytes: 16378380288
total fs tree bytes: 7483179008
total extent tree bytes: 1228210176
btree space waste bytes: 2413299694
file data blocks allocated: 6899999100928
referenced 7488299450368
[root@localhost-live ~]#
With the liveCD everything looks OK (no errors found).
--
Best Regards,
Mike Gavrilov.
On 2022/12/26 16:15, Mikhail Gavrilov wrote:
> On Mon, Dec 26, 2022 at 8:29 AM Qu Wenruo <[email protected]> wrote:
>>
>>
>> OK, indeed a level mismatch.
>>
>> From the remaining lines, it shows we're failing at
>> do_free_extent_accounting(), which failed at the btrfs_del_csums().
>>
>> And inside btrfs_del_csums(), what we do are all regular btree
>> operations, thus the tree level check should work without problem.
>>
>> Thus it seems to be a corrupted csum tree.
>
> Do I need to debug anything else to understand the cause of the error?
> Thanks.
With the check output, it's indeed a runtime error.
(At least no corruption to your fs)
And it can be some call paths not properly initializing the level to check.
Here is the new debug patch.
It should be applied without any previous debug patch.
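(If it helps, one way to apply and build it on top of your bisected
tree; the patch file name here is just an example:)

  cd linux
  git am 0001-btrfs-debug-parent-check.patch   # or: patch -p1 < the-debug.patch
  make -j"$(nproc)" && sudo make modules_install install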
Thanks,
Qu
>
>> Could you please run "btrfs check --readonly" from a liveCD?
>> There are tons of possible false alerts if ran on a RW mounted fs.
>>
>
> # btrfs check --readonly /dev/nvme0n1p3
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p3
> UUID: 40e0b5d2-df54-46e0-b6f4-2f868296271d
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space tree
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 6828416307200 bytes used, no error found
> total csum bytes: 6651838248
> total tree bytes: 16378380288
> total fs tree bytes: 7483179008
> total extent tree bytes: 1228210176
> btree space waste bytes: 2413299694
> file data blocks allocated: 6899999100928
> referenced 7488299450368
> [root@localhost-live ~]#
>
> With liveCD looks like all OK (no errors found).
>
On Mon, Dec 26, 2022 at 1:47 PM Qu Wenruo <[email protected]> wrote:
>
>
> With the check output, it's indeed a runtime error.
> (At least no corruption to your fs)
>
> And it can be some call paths not properly initializing the level to check.
>
> Here is the new debug patch.
> It should be applied without any previous debug patch.
>
yeah, gotcha trace
[ 2561.764141] BTRFS error (device nvme1n1p3): level verify failed on
logical 45170688 mirror 1 wanted 0 found 1
[ 2561.764220] BTRFS error (device nvme1n1p3): read time tree block
corruption detected on logical 45170688 mirror 1
[ 2561.764224] BTRFS error (device nvme1n1p3): check owner_root=0
transid=0 first_key=(0 0 0) has_first_key=0 level=0
[ 2561.764242] BTRFS error (device nvme1n1p3): read failed, check
owner_root=259 transid=650936 has_first_key=1 first_key=(2221294 1 0)
level=1
[ 2561.764342] CPU: 8 PID: 727 Comm: systemd-udevd Tainted: G W
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-debug2+
#3
[ 2561.764348] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[ 2561.764352] Call Trace:
[ 2561.764355] <TASK>
[ 2561.764360] dump_stack_lvl+0x5b/0x77
[ 2561.764369] read_extent_buffer_pages.cold+0x5a/0x9d
[ 2561.764382] ? __pfx_end_bio_extent_readpage+0x10/0x10
[ 2561.764390] ? folio_unlock+0x1d/0x50
[ 2561.764400] btrfs_read_extent_buffer+0x98/0x150
[ 2561.764410] read_tree_block+0x43/0xa0
[ 2561.764417] read_block_for_search+0x218/0x320
[ 2561.764432] btrfs_search_slot+0x351/0xd30
[ 2561.764450] btrfs_lookup_xattr+0x7b/0xe0
[ 2561.764460] btrfs_getxattr+0x55/0x110
[ 2561.764470] __vfs_getxattr+0x7e/0xb0
[ 2561.764480] get_vfs_caps_from_disk+0x7a/0x220
[ 2561.764492] audit_copy_inode+0x95/0xd0
[ 2561.764501] filename_lookup+0x16b/0x1c0
[ 2561.764525] ? _raw_spin_unlock+0x29/0x50
[ 2561.764534] vfs_statx+0x8e/0x150
[ 2561.764545] vfs_fstatat+0x51/0x70
[ 2561.764553] __do_sys_newfstatat+0x2e/0x50
[ 2561.764571] do_syscall_64+0x5b/0x80
[ 2561.764576] ? do_syscall_64+0x67/0x80
[ 2561.764585] ? up_read+0x17/0x20
[ 2561.764593] ? lock_is_held_type+0xe8/0x140
[ 2561.764601] ? asm_exc_page_fault+0x22/0x30
[ 2561.764607] ? lockdep_hardirqs_on+0x7d/0x100
[ 2561.764613] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 2561.764618] RIP: 0033:0x7f7b1b91e05e
[ 2561.764639] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff
e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00
00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b
15 71
[ 2561.764643] RSP: 002b:00007fffdc4062c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000106
[ 2561.764649] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b1b91e05e
[ 2561.764653] RDX: 00007fffdc406320 RSI: 000055efe7d04bc0 RDI: 00000000ffffff9c
[ 2561.764656] RBP: 000055efe7e79668 R08: 00000000000000f0 R09: 000055efe7ea85b0
[ 2561.764659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffdc4064b0
[ 2561.764663] R13: 00007fffdc406320 R14: 00007fffdc4064a8 R15: 00007fffdc406308
[ 2561.764680] </TASK>
[ 2561.764684] BTRFS error (device nvme1n1p3): read failed, check
owner_root=259 transid=650936 has_first_key=1 first_key=(2221294 1 0)
level=1
Of course full logs are also attached.
Thanks.
--
Best Regards,
Mike Gavrilov.
On 2022/12/26 19:14, Mikhail Gavrilov wrote:
> On Mon, Dec 26, 2022 at 1:47 PM Qu Wenruo <[email protected]> wrote:
>>
>>
>> With the check output, it's indeed a runtime error.
>> (At least no corruption to your fs)
>>
>> And it can be some call paths not properly initializing the level to check.
>>
>> Here is the new debug patch.
>> It should be applied without any previous debug patch.
>>
>
> yeah, gotcha trace
>
> [ 2561.764141] BTRFS error (device nvme1n1p3): level verify failed on
> logical 45170688 mirror 1 wanted 0 found 1
> [ 2561.764220] BTRFS error (device nvme1n1p3): read time tree block
> corruption detected on logical 45170688 mirror 1
> [ 2561.764224] BTRFS error (device nvme1n1p3): check owner_root=0
> transid=0 first_key=(0 0 0) has_first_key=0 level=0
Damn it, some check structure is completely uninitialized.
> [ 2561.764242] BTRFS error (device nvme1n1p3): read failed, check
> owner_root=259 transid=650936 has_first_key=1 first_key=(2221294 1 0)
> level=1
While at the caller, the structure is properly passed in.
So there is something wrong between the endio function and the check.
I have created the v2 version patch to debug, please apply without any
previous debug patch.
Meanwhile this really looks like a race, thus I'm not 100% sure if my
debug patch would reduce the possibility to reproduce.
Your bisect should be the determining evidence, for the worst case we
can revert the offending patch.
Thank you very much for all of the testing, it really helps a lot.
Qu
> [ 2561.764342] CPU: 8 PID: 727 Comm: systemd-udevd Tainted: G W
> L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-debug2+
> #3
> [ 2561.764348] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
> G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
> [ 2561.764352] Call Trace:
> [ 2561.764355] <TASK>
> [ 2561.764360] dump_stack_lvl+0x5b/0x77
> [ 2561.764369] read_extent_buffer_pages.cold+0x5a/0x9d
> [ 2561.764382] ? __pfx_end_bio_extent_readpage+0x10/0x10
> [ 2561.764390] ? folio_unlock+0x1d/0x50
> [ 2561.764400] btrfs_read_extent_buffer+0x98/0x150
> [ 2561.764410] read_tree_block+0x43/0xa0
> [ 2561.764417] read_block_for_search+0x218/0x320
> [ 2561.764432] btrfs_search_slot+0x351/0xd30
> [ 2561.764450] btrfs_lookup_xattr+0x7b/0xe0
> [ 2561.764460] btrfs_getxattr+0x55/0x110
> [ 2561.764470] __vfs_getxattr+0x7e/0xb0
> [ 2561.764480] get_vfs_caps_from_disk+0x7a/0x220
> [ 2561.764492] audit_copy_inode+0x95/0xd0
> [ 2561.764501] filename_lookup+0x16b/0x1c0
> [ 2561.764525] ? _raw_spin_unlock+0x29/0x50
> [ 2561.764534] vfs_statx+0x8e/0x150
> [ 2561.764545] vfs_fstatat+0x51/0x70
> [ 2561.764553] __do_sys_newfstatat+0x2e/0x50
> [ 2561.764571] do_syscall_64+0x5b/0x80
> [ 2561.764576] ? do_syscall_64+0x67/0x80
> [ 2561.764585] ? up_read+0x17/0x20
> [ 2561.764593] ? lock_is_held_type+0xe8/0x140
> [ 2561.764601] ? asm_exc_page_fault+0x22/0x30
> [ 2561.764607] ? lockdep_hardirqs_on+0x7d/0x100
> [ 2561.764613] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 2561.764618] RIP: 0033:0x7f7b1b91e05e
> [ 2561.764639] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff
> e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00
> 00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b
> 15 71
> [ 2561.764643] RSP: 002b:00007fffdc4062c8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000106
> [ 2561.764649] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b1b91e05e
> [ 2561.764653] RDX: 00007fffdc406320 RSI: 000055efe7d04bc0 RDI: 00000000ffffff9c
> [ 2561.764656] RBP: 000055efe7e79668 R08: 00000000000000f0 R09: 000055efe7ea85b0
> [ 2561.764659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffdc4064b0
> [ 2561.764663] R13: 00007fffdc406320 R14: 00007fffdc4064a8 R15: 00007fffdc406308
> [ 2561.764680] </TASK>
> [ 2561.764684] BTRFS error (device nvme1n1p3): read failed, check
> owner_root=259 transid=650936 has_first_key=1 first_key=(2221294 1 0)
> level=1
>
> Of course full logs are also attached.
> Thanks.
>
On Mon, Dec 26, 2022 at 5:11 PM Qu Wenruo <[email protected]> wrote:
>
>
> While at the caller, the structure is properly passed in.
>
> So there is something wrong between the endio function and the check.
>
> I have created the v2 version patch to debug, please apply without any
> previous debug patch.
>
> Meanwhile this really looks like a race, thus I'm not 100% sure if my
> debug patch would reduce the possibility to reproduce.
>
> Your bisect should be the determining evidence, for the worst case we
> can revert the offending patch.
>
> Thank you very much for all of the testing, it really helps a lot.
> Qu
>
Looks like the v2 patch is missing from the last message.
--
Best Regards,
Mike Gavrilov.
[Note: this mail contains only information for Linux kernel regression
tracking. Mails like these contain '#forregzbot' in the subject to make
them easy to spot and filter out. The author also tried to remove most
or all individuals from the list of recipients to spare them the hassle.]
On 25.12.22 22:32, Mikhail Gavrilov wrote:
> Hi,
> It is curious, but it happens only on a machine which has a BTRFS volume
> combined from two high-speed NVMe (PCIe 4) SSDs in RAID 0. On machines
> with a BTRFS volume on a single HDD the bug does not appear.
>
> To bisect the problematic commit, I had to sweat a lot. At each step,
> I downloaded the 150 GB game "Assassin's Creed Valhalla" 4 times and
> deleted it. To make sure that the commit previous to
> 947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by
> the bug, I downloaded this game 10 times, which should have produced
> more than 1.5 TB of data written to the btrfs volume.
>
> Here is the result of my bisection:
> 947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
> [...]
> Before going readonly, the preceding line in the kernel log displays a message:
> [ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in
> btrfs_run_delayed_refs:2147: errno=-5 IO failure
>
> I also attached a full kernel log.
Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:
#regzbot ^introduced 947a629988f191807d2d22ba63ae18259bb645c5
#regzbot title btrfs: volume periodical forced switch to readonly after
a lot of disk writes
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
On 2022/12/26 20:15, Mikhail Gavrilov wrote:
> On Mon, Dec 26, 2022 at 5:11 PM Qu Wenruo <[email protected]> wrote:
>>
>>
>> While at the caller, the structure is properly passed in.
>>
>> So there is something wrong between the endio function and the check.
>>
>> I have created the v2 version patch to debug, please apply without any
>> previous debug patch.
>>
>> Meanwhile this really looks like a race, thus I'm not 100% sure if my
>> debug patch would reduce the possibility to reproduce.
>>
>> Your bisect should be the determining evidence, for the worst case we
>> can revert the offending patch.
>>
>> Thank you very much for all of the testing, it really helps a lot.
>> Qu
>>
>
> Looks like v2 version patch is missed in last message
>
Oh, my bad.
Please be aware that this newer version may cause extra kernel
warnings if it finds any uninitialized btrfs_tree_parent_check
structure at submit time.
But there are completely valid cases where that can happen, thus it may
cause much larger dmesg output, and you should not rely on the kernel
warnings to determine whether the bug was triggered.
Thanks,
Qu
On Tue, Dec 27, 2022 at 4:36 AM Qu Wenruo <[email protected]> wrote:
>
>
> Please be aware that, this newer version may cause extra kernel
> warnings, if it found any uninitialized btrfs_tree_parent_check
> structure at submit time.
>
> But there are completely valid cases we can do that, thus it may cause
> much larger dmesg output, and you should not rely on the kernel warning
> to determine if the bug is triggered.
>
The kernel failed to build with the v2 patch.
Here is the list of compiler error messages:
fs/btrfs/disk-io.c: In function ‘btrfs_submit_metadata_bio’:
fs/btrfs/disk-io.c:5413:2: error: unterminated argument list invoking
macro "WARN_ON"
5413 | }
| ^
CC net/ipv4/icmp.o
fs/btrfs/disk-io.c:851:17: error: ‘WARN_ON’ undeclared (first use in
this function)
851 | WARN_ON(!memcmp(&check, &bbio->parent_check,
| ^~~~~~~
fs/btrfs/disk-io.c:851:17: note: each undeclared identifier is
reported only once for each function it appears in
fs/btrfs/disk-io.c:851:24: error: expected ‘;’ at end of input
851 | WARN_ON(!memcmp(&check, &bbio->parent_check,
| ^
| ;
......
fs/btrfs/disk-io.c:851:17: error: expected declaration or statement at
end of input
851 | WARN_ON(!memcmp(&check, &bbio->parent_check,
| ^~~~~~~
fs/btrfs/disk-io.c:851:17: error: expected declaration or statement at
end of input
fs/btrfs/disk-io.c:845:22: warning: unused variable ‘ret’ [-Wunused-variable]
845 | blk_status_t ret;
| ^~~
fs/btrfs/disk-io.c:844:40: warning: unused variable ‘check’ [-Wunused-variable]
844 | struct btrfs_tree_parent_check check = {0};
| ^~~~~
fs/btrfs/disk-io.c:842:31: warning: unused variable ‘fs_info’
[-Wunused-variable]
842 | struct btrfs_fs_info *fs_info = inode->root->fs_info;
| ^~~~~~~
HDRTEST usr/include/linux/loadpin.h
fs/btrfs/disk-io.c: At top level:
fs/btrfs/disk-io.c:63:13: warning: ‘btrfs_destroy_ordered_extents’
declared ‘static’ but never defined [-Wunused-function]
63 | static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:64:12: warning: ‘btrfs_destroy_delayed_refs’
declared ‘static’ but never defined [-Wunused-function]
64 | static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:66:13: warning: ‘btrfs_destroy_delalloc_inodes’
declared ‘static’ but never defined [-Wunused-function]
66 | static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:67:12: warning: ‘btrfs_destroy_marked_extents’
declared ‘static’ but never defined [-Wunused-function]
67 | static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:70:12: warning: ‘btrfs_destroy_pinned_extent’
declared ‘static’ but never defined [-Wunused-function]
70 | static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:72:12: warning: ‘btrfs_cleanup_transaction’
declared ‘static’ but never defined [-Wunused-function]
72 | static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:73:13: warning: ‘btrfs_error_commit_super’ declared
‘static’ but never defined [-Wunused-function]
73 | static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info);
| ^~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:828:13: warning: ‘should_async_write’ defined but
not used [-Wunused-function]
828 | static bool should_async_write(struct btrfs_fs_info *fs_info,
| ^~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:161:13: warning: ‘btrfs_supported_super_csum’
defined but not used [-Wunused-function]
161 | static bool btrfs_supported_super_csum(u16 csum_type)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:130:12: warning: ‘verify_parent_transid’ defined
but not used [-Wunused-function]
130 | static int verify_parent_transid(struct extent_io_tree *io_tree,
| ^~~~~~~~~~~~~~~~~~~~~
fs/btrfs/disk-io.c:75:13: warning: ‘btrfs_free_csum_hash’ defined but
not used [-Wunused-function]
75 | static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
| ^~~~~~~~~~~~~~~~~~~~
CC kernel/notifier.o
make[3]: *** [scripts/Makefile.build:252: fs/btrfs/disk-io.o] Error 1
make[3]: *** Waiting for unfinished jobs....
--
Best Regards,
Mike Gavrilov.
On 2022/12/27 07:52, Mikhail Gavrilov wrote:
> On Tue, Dec 27, 2022 at 4:36 AM Qu Wenruo <[email protected]> wrote:
>>
>>
>> Please be aware that, this newer version may cause extra kernel
>> warnings, if it found any uninitialized btrfs_tree_parent_check
>> structure at submit time.
>>
>> But there are completely valid cases we can do that, thus it may cause
>> much larger dmesg output, and you should not rely on the kernel warning
>> to determine if the bug is triggered.
>>
>
> The kernel failed to built with v2 patch.
>
> Here list of compilatore error messages:
> fs/btrfs/disk-io.c: In function ‘btrfs_submit_metadata_bio’:
> fs/btrfs/disk-io.c:5413:2: error: unterminated argument list invoking
> macro "WARN_ON"
> 5413 | }
> | ^
> CC net/ipv4/icmp.o
> fs/btrfs/disk-io.c:851:17: error: ‘WARN_ON’ undeclared (first use in
> this function)
> 851 | WARN_ON(!memcmp(&check, &bbio->parent_check,
> | ^~~~~~~
> fs/btrfs/disk-io.c:851:17: note: each undeclared identifier is
> reported only once for each function it appears in
> fs/btrfs/disk-io.c:851:24: error: expected ‘;’ at end of input
> 851 | WARN_ON(!memcmp(&check, &bbio->parent_check,
> | ^
> | ;
> ......
> fs/btrfs/disk-io.c:851:17: error: expected declaration or statement at
> end of input
> 851 | WARN_ON(!memcmp(&check, &bbio->parent_check,
> | ^~~~~~~
> fs/btrfs/disk-io.c:851:17: error: expected declaration or statement at
> end of input
> fs/btrfs/disk-io.c:845:22: warning: unused variable ‘ret’ [-Wunused-variable]
> 845 | blk_status_t ret;
> | ^~~
> fs/btrfs/disk-io.c:844:40: warning: unused variable ‘check’ [-Wunused-variable]
> 844 | struct btrfs_tree_parent_check check = {0};
> | ^~~~~
> fs/btrfs/disk-io.c:842:31: warning: unused variable ‘fs_info’
> [-Wunused-variable]
> 842 | struct btrfs_fs_info *fs_info = inode->root->fs_info;
> | ^~~~~~~
> HDRTEST usr/include/linux/loadpin.h
> fs/btrfs/disk-io.c: At top level:
> fs/btrfs/disk-io.c:63:13: warning: ‘btrfs_destroy_ordered_extents’
> declared ‘static’ but never defined [-Wunused-function]
> 63 | static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:64:12: warning: ‘btrfs_destroy_delayed_refs’
> declared ‘static’ but never defined [-Wunused-function]
> 64 | static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:66:13: warning: ‘btrfs_destroy_delalloc_inodes’
> declared ‘static’ but never defined [-Wunused-function]
> 66 | static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:67:12: warning: ‘btrfs_destroy_marked_extents’
> declared ‘static’ but never defined [-Wunused-function]
> 67 | static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:70:12: warning: ‘btrfs_destroy_pinned_extent’
> declared ‘static’ but never defined [-Wunused-function]
> 70 | static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:72:12: warning: ‘btrfs_cleanup_transaction’
> declared ‘static’ but never defined [-Wunused-function]
> 72 | static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:73:13: warning: ‘btrfs_error_commit_super’ declared
> ‘static’ but never defined [-Wunused-function]
> 73 | static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info);
> | ^~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:828:13: warning: ‘should_async_write’ defined but
> not used [-Wunused-function]
> 828 | static bool should_async_write(struct btrfs_fs_info *fs_info,
> | ^~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:161:13: warning: ‘btrfs_supported_super_csum’
> defined but not used [-Wunused-function]
> 161 | static bool btrfs_supported_super_csum(u16 csum_type)
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:130:12: warning: ‘verify_parent_transid’ defined
> but not used [-Wunused-function]
> 130 | static int verify_parent_transid(struct extent_io_tree *io_tree,
> | ^~~~~~~~~~~~~~~~~~~~~
> fs/btrfs/disk-io.c:75:13: warning: ‘btrfs_free_csum_hash’ defined but
> not used [-Wunused-function]
> 75 | static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
> | ^~~~~~~~~~~~~~~~~~~~
> CC kernel/notifier.o
> make[3]: *** [scripts/Makefile.build:252: fs/btrfs/disk-io.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
>
>
My bad, missing a ")" in the WARN_ON() line.
Fixed with v3 patch.
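For reference, the intent of that line is roughly the following (a
sketch reconstructed from the build errors above, not the exact patch):

  /* In btrfs_submit_metadata_bio(): warn if the parent check in the bio
   * is still completely zeroed (i.e. uninitialized) at submit time. */
  struct btrfs_tree_parent_check check = {0};

  WARN_ON(!memcmp(&check, &bbio->parent_check, sizeof(check)));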
Thanks,
Qu
On Tue, Dec 27, 2022 at 5:18 AM Qu Wenruo <[email protected]> wrote:
>
> My bad, missing a ")" in the WARN_ON() line.
>
> Fixed with v3 patch.
>
New kernel logs are ready.
--
Best Regards,
Mike Gavrilov.
On 2022/12/27 09:35, Mikhail Gavrilov wrote:
> On Tue, Dec 27, 2022 at 5:18 AM Qu Wenruo <[email protected]> wrote:
>>
>> My bad, missing a ")" in the WARN_ON() line.
>>
>> Fixed with v3 patch.
>>
>
> New kernel logs are ready.
>
The result doesn't make sense...
A lot of read_block_for_search() and btrfs_read_node_slot() are
triggering the warning.
But inside both functions, we have just set the numbers before the call:
In read_block_for_search() we got:
check.has_first_key = true;
check.level = parent_level - 1;
check.transid = gen;
check.owner_root = root->root_key.objectid;
Thus at least check.has_first_key is always true, and the WARN_ON()
should never get triggered.
The same applies to btrfs_read_node_slot().
It looks like something involving memory barriers.
Anyway, the latest debug patch has extra memory barriers to be sure.
And besides the possible fix, could you provide some extra info about
your build? Include:
- Hardware (mostly CPU and RAM spec)
- Toolchain used to compile the kernel (include compiler and its
version)
- Kernel config
Thanks,
Qu
On Tue, Dec 27, 2022 at 10:13 AM Qu Wenruo <[email protected]> wrote:
> The result doesn't make sense...
>
> A lot of read_block_for_search() and btrfs_read_node_slot() are
> triggering the warning.
>
> But inside both functions, we have just set the numbers before the call:
>
> In read_block_for_search() we got:
>
> check.has_first_key = true;
> check.level = parent_level - 1;
> check.transid = gen;
> check.owner_root = root->root_key.objectid;
>
> Thus at least check.has_first_key is always true, and the WARN_ON()
> should never get triggered.
> The same applies to btrfs_read_node_slot().
>
> It looks like something involved in memory barrier.
>
> Anyway, the latest debug patch has extra mb to be sure.
>
> And despite the possible fix, could you provide extra info of your
> build? Include:
Kernel log is attached. All answers are below.
> - Hardware (mostly CPU and RAM spec)
This is laptop ASUS ROG Strix G15 Advantage Edition G513QY-HQ007
with CPU AMD Ryzen 9 5900HX and GPU Radeon RX 6800M
I upgraded RAM to Crucial CT32G4SFD832A DDR4 - 32GB x 2 (64GB) and SSD
to Seagate FireCuda 530 ZP4000GM3A013 4TB x 2 (8TB)
https://linux-hardware.org/?probe=0e063e5fd5
> - Toolchain used to compile the kernel (include compiler and its version)
$ rpm -q binutils
binutils-2.39-6.fc38.x86_64
$ ld --version
GNU ld version 2.39-6.fc38
$ rpm -q gcc
gcc-12.2.1-4.fc38.x86_64
$ gcc --version
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
$ rpm -q make
make-4.3-11.fc37.x86_64
$ make --version
GNU Make 4.3
> - Kernel config
Attached with debug logs.
--
Best Regards,
Mike Gavrilov.
On 2022/12/27 18:19, Mikhail Gavrilov wrote:
> On Tue, Dec 27, 2022 at 10:13 AM Qu Wenruo <[email protected]> wrote:
>> The result doesn't make sense...
>>
>> A lot of read_block_for_search() and btrfs_read_node_slot() are
>> triggering the warning.
>>
>> But inside both functions, we have just set the numbers before the call:
>>
>> In read_block_for_search() we got:
>>
>> check.has_first_key = true;
>> check.level = parent_level - 1;
>> check.transid = gen;
>> check.owner_root = root->root_key.objectid;
>>
>> Thus at least check.has_first_key is always true, and the WARN_ON()
>> should never get triggered.
>> The same applies to btrfs_read_node_slot().
>>
>> It looks like something involved in memory barrier.
>>
>> Anyway, the latest debug patch has extra mb to be sure.
>>
>> And despite the possible fix, could you provide extra info of your
>> build? Include:
> Kernel log is attached. All answers are below.
>
>> - Hardware (mostly CPU and RAM spec)
> This is laptop ASUS ROG Strix G15 Advantage Edition G513QY-HQ007
> with CPU AMD Ryzen 9 5900HX and GPU Radeon RX 6800M
I have a similar laptop (G14), only GPU is different (RTX3060), and I
failed to reproduce this so far...
> I upgraded RAM to Crucial CT32G4SFD832A DDR4 - 32GB x 2 (64GB) and SSD
> to Seagate FireCuda 530 ZP4000GM3A013 4TB x 2 (8TB)
> https://linux-hardware.org/?probe=0e063e5fd5
>
>> - Toolchain used to compile the kernel (include compiler and its version)
> $ rpm -q binutils
> binutils-2.39-6.fc38.x86_64
>
> $ ld --version
> GNU ld version 2.39-6.fc38
>
> $ rpm -q gcc
> gcc-12.2.1-4.fc38.x86_64
My gcc is only a small version behind (12.2.0).
Thus none of the hardware seems suspicious at all...
Anyway I have attached my last struggle for the weird problem.
For now, I have no idea why this can even happen...
Thanks,
Qu
>
> $ gcc --version
> gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
>
> $ rpm -q make
> make-4.3-11.fc37.x86_64
>
> $ make --version
> GNU Make 4.3
>
>> - Kernel config
> Attached with debug logs.
>
On Tue, Dec 27, 2022 at 4:03 PM Qu Wenruo <[email protected]> wrote:
>
> I have a similar laptop (G14), only GPU is different (RTX3060), and I
> failed to reproduce this so far...
>
> My gcc is only a small version behind (12.2.0).
>
> Thus none of the hardware seems suspicious at all...
>
> Anyway I have attached my last struggle for the weird problem.
> For now, I have no idea why this can even happen...
The new Kernel log is attached.
This time, the main difference was that the file system did not
immediately switch to readonly.
The Steam client stopped a couple of times with a write error, but
after pressing the resume button, it resumed downloading. On the
third or fourth attempt it refused to download at all.
--
Best Regards,
Mike Gavrilov.
On 2022/12/27 21:11, Mikhail Gavrilov wrote:
> On Tue, Dec 27, 2022 at 4:03 PM Qu Wenruo <[email protected]> wrote:
>>
>> I have a similar laptop (G14), only GPU is different (RTX3060), and I
>> failed to reproduce this so far...
>>
>> My gcc is only a small version behind (12.2.0).
>>
>> Thus none of the hardware seems suspicious at all...
>>
>> Anyway I have attached my last struggle for the weird problem.
>> For now, I have no idea why this can even happen...
>
> The new Kernel log is attached.
> This time, the main difference was that the file system did not
> immediately switch to readonly.
> The Steam client stopped a couple of times with a write error, but
> after pressing the resume button, it resumed downloading. For the
> third or fourth time refused to download.
>
I'm a total idiot.
From the very first dmesg with a call trace, it already shows that
submit_one_bio() is called from submit_extent_page(), which means the
read crosses a stripe boundary, and such a bio has no parent_check
populated at all.
And since you're using RAID0 on two NVMEs, it matches the symptom, while
most tests done here are using single device (DUP and SINGLE), thus no
stripe boundary cases at all.
(In fact it should still be possible to trigger on SINGLE, but way too
hard to trigger)
With proper root cause found, this version should mostly handle the
regression correctly.
This version should mostly be the formal one I'd later send to the
mailing list.
I can not thank you more for all the testing you have provided, it not
only pinned down the bug, but also proves I'm a total idiot...
Thanks,
Qu
On Wed, Dec 28, 2022 at 6:08 AM Qu Wenruo <[email protected]> wrote:
>
> From the very first dmesg with calltrack, it already shows the
> submit_one_bio() is called from submit_extent_page(), which means cases
> cross stripe boundary, and has no parent_check populated at all.
>
> And since you're using RAID0 on two NVMEs, it matches the symptom, while
> most tests done here are using single device (DUP and SINGLE), thus no
> stripe boundary cases at all.
> (In fact it should still be possible to trigger on SINGLE, but way too
> hard to trigger)
>
> With proper root cause found, this version should mostly handle the
> regression correctly.
>
> This version should mostly be the formal one I'd later send to the
> mailing list.
>
> I can not thank you more for all the testing you have provided, it not
> only pinned down the bug, but also proves I'm a total idiot...
I have already written over 1.6 TB of data to disk and there are no
hints of errors.
For me, this is a sign that the problem has been fixed.
Tested-by: Mikhail Gavrilov <[email protected]>
❯ dmesg | grep -i btrfs
[ 0.000000] Linux version
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
(mikhail@mikhail-laptop) (gcc (GCC) 12.2.1 20221121 (Red Hat
12.2.1-4), GNU ld version 2.39-6.fc38) #7 SMP PREEMPT_DYNAMIC Wed Dec
28 10:00:39 +05 2022
[ 0.000000] Command line:
BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
rootflags=subvol=@root
resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
sysrq_always_enabled=1 nmi_watchdog=1
amdgpu.lockup_timeout=-1,-1,-1,-1
[ 0.154567] Kernel command line:
BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
rootflags=subvol=@root
resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
sysrq_always_enabled=1 nmi_watchdog=1
amdgpu.lockup_timeout=-1,-1,-1,-1
[ 0.154654] Unknown kernel command line parameters
"BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+",
will be passed to user space.
[ 4.496766] usb usb2: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.498963] usb usb1: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.500665] usb usb3: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.501851] usb usb4: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.735212] Btrfs loaded, crc32c=crc32c-generic, assert=on,
zoned=yes, fsverity=yes
[ 5.223368]
BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
[ 6.923453] BTRFS: device label fedora_localhost-live devid 2
transid 652981 /dev/nvme1n1p1 scanned by systemd-udevd (448)
[ 6.974412] BTRFS: device label fedora_localhost-live devid 1
transid 652981 /dev/nvme0n1p3 scanned by systemd-udevd (484)
[ 11.113437] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
[ 11.221359] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
W L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
[ 13.731015] BTRFS info (device nvme0n1p3): using crc32c
(crc32c-intel) checksum algorithm
[ 13.731147] BTRFS info (device nvme0n1p3): using free space tree
[ 14.328439] BTRFS info (device nvme0n1p3): enabling ssd optimizations
[ 14.328469] BTRFS info (device nvme0n1p3): auto enabling async discard
[ 16.592713] BTRFS info (device nvme0n1p3: state M): use zstd
compression, level 1
[11691.071176] CPU: 11 PID: 2068 Comm: gnome-shell Tainted: G W
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
<OFFTOPIC>
As I mentioned in the first message, I also have a computer where the
btrfs partition is located on a slow HDD.
When I update a container (podman pull), the system becomes
unresponsive for half an hour, which is how long it takes to update
the container.
I do not expect any super speed from the HDD, I just would like to be
able to do something else with this computer, at least watch videos on
YouTube. Is there anything that can be done here, or is there nothing
that can be done?
[46944.301588] INFO: task btrfs-transacti:1184 blocked for more than
122 seconds.
[46944.301825] Tainted: G W L ------- ---
6.2.0-0.rc1.14.fc38.x86_64+debug #1
[46944.301829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[46944.301832] task:btrfs-transacti state:D stack:12000 pid:1184
ppid:2 flags:0x00004000
[46944.301840] Call Trace:
[46944.301843] <TASK>
[46944.301851] __schedule+0x50c/0x1780
[46944.301863] ? _raw_spin_unlock_irqrestore+0x30/0x60
[46944.301876] schedule+0x5d/0xe0
[46944.301881] wait_current_trans+0x110/0x170
[46944.301888] ? __pfx_autoremove_wake_function+0x10/0x10
[46944.301895] start_transaction+0x36c/0x680
[46944.301904] transaction_kthread+0xb6/0x1b0
[46944.301912] ? __pfx_transaction_kthread+0x10/0x10
[46944.301916] kthread+0xf5/0x120
[46944.301920] ? __pfx_kthread+0x10/0x10
[46944.301926] ret_from_fork+0x2c/0x50
[46944.301941] </TASK>
I attached a full kernel log from this machine.
I can start a separate thread if it makes sense.
Sorry for the offtopic.
</OFFTOPIC>
--
Best Regards,
Mike Gavrilov.
On 2022/12/28 22:12, Mikhail Gavrilov wrote:
> On Wed, Dec 28, 2022 at 6:08 AM Qu Wenruo <[email protected]> wrote:
>>
>> From the very first dmesg with calltrack, it already shows the
>> submit_one_bio() is called from submit_extent_page(), which means cases
>> cross stripe boundary, and has no parent_check populated at all.
>>
>> And since you're using RAID0 on two NVMEs, it matches the symptom, while
>> most tests done here are using single device (DUP and SINGLE), thus no
>> stripe boundary cases at all.
>> (In fact it should still be possible to trigger on SINGLE, but way too
>> hard to trigger)
>>
>> With proper root cause found, this version should mostly handle the
>> regression correctly.
>>
>> This version should mostly be the formal one I'd later send to the
>> mailing list.
>>
>> I can not thank you more for all the testing you have provided, it not
>> only pinned down the bug, but also proves I'm a total idiot...
>
> I have already written over 1.6Tb of data to disk and there are no
> hints of errors.
> For me, this is a sign that the problem has been fixed.
> Tested-by: Mikhail Gavrilov <[email protected]>
Just one last thing to confirm, mind to provide the following dump?
# btrfs ins dump-tree -t chunk <device>
If the fix works, it means you have some tree block crossing stripe
boundary, which is not the ideal situation.
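(With the default 64KiB RAID0 stripe length and a 16KiB nodesize, that
means a tree block whose logical start is, say, 56KiB into a stripe:
its first 8KiB land in one stripe and the remaining 8KiB in the next,
so the read has to be split across devices.)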
But on the other hand, you have provided "btrfs check" output, which
shows no such sign.
I'm not sure if it's a bug in the newer btrfs check.
Thanks for your comprehensive testing and detailed report!
Qu
>
> ❯ dmesg | grep -i btrfs
> [ 0.000000] Linux version
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> (mikhail@mikhail-laptop) (gcc (GCC) 12.2.1 20221121 (Red Hat
> 12.2.1-4), GNU ld version 2.39-6.fc38) #7 SMP PREEMPT_DYNAMIC Wed Dec
> 28 10:00:39 +05 2022
> [ 0.000000] Command line:
> BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
> rootflags=subvol=@root
> resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
> sysrq_always_enabled=1 nmi_watchdog=1
> amdgpu.lockup_timeout=-1,-1,-1,-1
> [ 0.154567] Kernel command line:
> BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
> rootflags=subvol=@root
> resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
> sysrq_always_enabled=1 nmi_watchdog=1
> amdgpu.lockup_timeout=-1,-1,-1,-1
> [ 0.154654] Unknown kernel command line parameters
> "BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+",
> will be passed to user space.
> [ 4.496766] usb usb2: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.498963] usb usb1: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.500665] usb usb3: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.501851] usb usb4: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.735212] Btrfs loaded, crc32c=crc32c-generic, assert=on,
> zoned=yes, fsverity=yes
> [ 5.223368]
> BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> [ 6.923453] BTRFS: device label fedora_localhost-live devid 2
> transid 652981 /dev/nvme1n1p1 scanned by systemd-udevd (448)
> [ 6.974412] BTRFS: device label fedora_localhost-live devid 1
> transid 652981 /dev/nvme0n1p3 scanned by systemd-udevd (484)
> [ 11.113437] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
> L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> #7
> [ 11.221359] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
> W L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> #7
> [ 13.731015] BTRFS info (device nvme0n1p3): using crc32c
> (crc32c-intel) checksum algorithm
> [ 13.731147] BTRFS info (device nvme0n1p3): using free space tree
> [ 14.328439] BTRFS info (device nvme0n1p3): enabling ssd optimizations
> [ 14.328469] BTRFS info (device nvme0n1p3): auto enabling async discard
> [ 16.592713] BTRFS info (device nvme0n1p3: state M): use zstd
> compression, level 1
> [11691.071176] CPU: 11 PID: 2068 Comm: gnome-shell Tainted: G W
> L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> #7
>
>
> <OFFTOPIC>
> As I mentioned at the first message I also have a computer where the
> btrfs partition is located on a slow HDD.
> When I update the container (podman pull), the system becomes
> unresponsive for half an hour, which is how long it takes to update
> the container.
> I do not expect any super-speed from the HDD, I just would like to do
> something else with this computer. Yes, at least watching videos on
> youtube. Is there anything that can be done here or is there nothing
> that we can do?
> [46944.301588] INFO: task btrfs-transacti:1184 blocked for more than
> 122 seconds.
> [46944.301825] Tainted: G W L ------- ---
> 6.2.0-0.rc1.14.fc38.x86_64+debug #1
> [46944.301829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [46944.301832] task:btrfs-transacti state:D stack:12000 pid:1184
> ppid:2 flags:0x00004000
> [46944.301840] Call Trace:
> [46944.301843] <TASK>
> [46944.301851] __schedule+0x50c/0x1780
> [46944.301863] ? _raw_spin_unlock_irqrestore+0x30/0x60
> [46944.301876] schedule+0x5d/0xe0
> [46944.301881] wait_current_trans+0x110/0x170
> [46944.301888] ? __pfx_autoremove_wake_function+0x10/0x10
> [46944.301895] start_transaction+0x36c/0x680
> [46944.301904] transaction_kthread+0xb6/0x1b0
> [46944.301912] ? __pfx_transaction_kthread+0x10/0x10
> [46944.301916] kthread+0xf5/0x120
> [46944.301920] ? __pfx_kthread+0x10/0x10
> [46944.301926] ret_from_fork+0x2c/0x50
> [46944.301941] </TASK>
>
>
> I attached a full kernel log from this machine.
> I can start a separate thread if it makes sense.
> Sorry for going off topic.
> </OFFTOPIC>
>
> --
> Best Regards,
> Mike Gavrilov.
On Thu, Dec 29, 2022 at 4:24 AM Qu Wenruo <[email protected]> wrote:
>
> Just one last thing to confirm, mind to provide the following dump?
>
> # btrfs ins dump-tree -t chunk <device>
>
> If the fix works, it means you have some tree block crossing stripe
> boundary, which is not the ideal situation.
I attached this dump.
The only thing is that I took the dump from a live (mounted) system;
I hope that does not affect the result?
--
Best Regards,
Mike Gavrilov.
On 2022/12/28 22:12, Mikhail Gavrilov wrote:
> On Wed, Dec 28, 2022 at 6:08 AM Qu Wenruo <[email protected]> wrote:
>>
>> From the very first dmesg with a call trace, it already shows that
>> submit_one_bio() is called from submit_extent_page(), which means the
>> read crosses a stripe boundary and has no parent_check populated at all.
>>
>> And since you're using RAID0 on two NVMEs, it matches the symptom, while
>> most tests done here are using single device (DUP and SINGLE), thus no
>> stripe boundary cases at all.
>> (In fact it should still be possible to trigger on SINGLE, but way too
>> hard to trigger)
>>
>> With proper root cause found, this version should mostly handle the
>> regression correctly.
>>
>> This version should mostly be the formal one I'd later send to the
>> mailing list.
>>
>> I can not thank you more for all the testing you have provided, it not
>> only pinned down the bug, but also proves I'm a total idiot...
>
> I have already written over 1.6 TB of data to disk and there are no
> hints of errors.
> For me, this is a sign that the problem has been fixed.
> Tested-by: Mikhail Gavrilov <[email protected]>
>
> ❯ dmesg | grep -i btrfs
> [ 0.000000] Linux version
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> (mikhail@mikhail-laptop) (gcc (GCC) 12.2.1 20221121 (Red Hat
> 12.2.1-4), GNU ld version 2.39-6.fc38) #7 SMP PREEMPT_DYNAMIC Wed Dec
> 28 10:00:39 +05 2022
> [ 0.000000] Command line:
> BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
> rootflags=subvol=@root
> resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
> sysrq_always_enabled=1 nmi_watchdog=1
> amdgpu.lockup_timeout=-1,-1,-1,-1
> [ 0.154567] Kernel command line:
> BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
> rootflags=subvol=@root
> resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
> sysrq_always_enabled=1 nmi_watchdog=1
> amdgpu.lockup_timeout=-1,-1,-1,-1
> [ 0.154654] Unknown kernel command line parameters
> "BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+",
> will be passed to user space.
> [ 4.496766] usb usb2: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.498963] usb usb1: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.500665] usb usb3: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.501851] usb usb4: Manufacturer: Linux
> 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
> [ 4.735212] Btrfs loaded, crc32c=crc32c-generic, assert=on,
> zoned=yes, fsverity=yes
> [ 5.223368]
> BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> [ 6.923453] BTRFS: device label fedora_localhost-live devid 2
> transid 652981 /dev/nvme1n1p1 scanned by systemd-udevd (448)
> [ 6.974412] BTRFS: device label fedora_localhost-live devid 1
> transid 652981 /dev/nvme0n1p3 scanned by systemd-udevd (484)
> [ 11.113437] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
> L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> #7
> [ 11.221359] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
> W L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> #7
> [ 13.731015] BTRFS info (device nvme0n1p3): using crc32c
> (crc32c-intel) checksum algorithm
> [ 13.731147] BTRFS info (device nvme0n1p3): using free space tree
> [ 14.328439] BTRFS info (device nvme0n1p3): enabling ssd optimizations
> [ 14.328469] BTRFS info (device nvme0n1p3): auto enabling async discard
> [ 16.592713] BTRFS info (device nvme0n1p3: state M): use zstd
> compression, level 1
> [11691.071176] CPU: 11 PID: 2068 Comm: gnome-shell Tainted: G W
> L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
> #7
>
>
> <OFFTOPIC>
> As I mentioned at the first message I also have a computer where the
> btrfs partition is located on a slow HDD.
> When I update the container (podman pull), the system becomes
> unresponsive for half an hour, which is how long it takes to update
> the container.
> I do not expect any super-speed from the HDD, I just would like to do
> something else with this computer. Yes, at least watching videos on
> youtube. Is there anything that can be done here or is there nothing
> that we can do?
Are you using qgroups? If so, it may be worth trying to disable them.
But on newer kernels, a qgroup hang should only happen when dropping a
large snapshot; I don't know whether podman pull would cause older
snapshots to be deleted...
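For reference, and with <mount point> as a placeholder, something like
this should show whether quotas are enabled and disable them if they are:

# btrfs qgroup show <mount point>
# btrfs quota disable <mount point>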
Thanks,
Qu
> [46944.301588] INFO: task btrfs-transacti:1184 blocked for more than
> 122 seconds.
> [46944.301825] Tainted: G W L ------- ---
> 6.2.0-0.rc1.14.fc38.x86_64+debug #1
> [46944.301829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [46944.301832] task:btrfs-transacti state:D stack:12000 pid:1184
> ppid:2 flags:0x00004000
> [46944.301840] Call Trace:
> [46944.301843] <TASK>
> [46944.301851] __schedule+0x50c/0x1780
> [46944.301863] ? _raw_spin_unlock_irqrestore+0x30/0x60
> [46944.301876] schedule+0x5d/0xe0
> [46944.301881] wait_current_trans+0x110/0x170
> [46944.301888] ? __pfx_autoremove_wake_function+0x10/0x10
> [46944.301895] start_transaction+0x36c/0x680
> [46944.301904] transaction_kthread+0xb6/0x1b0
> [46944.301912] ? __pfx_transaction_kthread+0x10/0x10
> [46944.301916] kthread+0xf5/0x120
> [46944.301920] ? __pfx_kthread+0x10/0x10
> [46944.301926] ret_from_fork+0x2c/0x50
> [46944.301941] </TASK>
>
>
> I attached a full kernel log from this machine.
> I can start a separate thread if it makes sense.
> Sorry for going off topic.
> </OFFTOPIC>
>
> --
> Best Regards,
> Mike Gavrilov.
On 2022/12/29 07:42, Mikhail Gavrilov wrote:
> On Thu, Dec 29, 2022 at 4:24 AM Qu Wenruo <[email protected]> wrote:
>>
>> Just one last thing to confirm, mind to provide the following dump?
>>
>> # btrfs ins dump-tree -t chunk <device>
>>
>> If the fix works, it means you have some tree block crossing stripe
>> boundary, which is not the ideal situation.
>
> I attached this dump.
> The only thing is that I took the dump from a live (mounted) system;
> I hope that does not affect the result?
>
For small dumps like chunk tree, it's completely fine to dump from live
systems.
And your dumps (along with your previous debug output) show no metadata
crossing stripe boundaries.
I'd need to dig deeper into why btrfs is splitting extent buffers unnecessarily.
Thanks,
Qu
On Thu, Dec 29, 2022 at 4:31 AM Qu Wenruo <[email protected]> wrote:
>
> Are you using qgroups? If so, it may be worth trying to disable them.
I do not use quotas.
And it looks like my distro does not enable quotas by default.
❯ btrfs qgroup show -f /
ERROR: can't list qgroups: quotas not enabled
> But on newer kernels, a qgroup hang should only happen when dropping a
> large snapshot; I don't know whether podman pull would cause older
> snapshots to be deleted...
It is not a regression; it also happened on older kernels.
But it is really annoying that the browser process has to wait while
"podman pull" writes its changes to disk.
In fact, I have been waiting for 5 years for caching of slow HDDs with
an SSD cache, but apparently I can't wait any longer.
So I started slowly buying expensive large SSDs to replace the big
HDD. I still can't find time to connect a D5 P5316 30.72 TB drive to
the primary workstation.
I want to make a video review of it. I understand this is an expensive
solution and not suitable for everyone, unlike an affordable SSD
cache.
--
Best Regards,
Mike Gavrilov.
On 2022/12/29 08:08, Mikhail Gavrilov wrote:
> On Thu, Dec 29, 2022 at 4:31 AM Qu Wenruo <[email protected]> wrote:
>>
>> Are you using qgroups? If so, it may be worth trying to disable them.
>
> I do not use quotas.
> And it looks like my distro does not enable quotas by default.
> ❯ btrfs qgroup show -f /
> ERROR: can't list qgroups: quotas not enabled
>
>> But on newer kernels, a qgroup hang should only happen when dropping a
>> large snapshot; I don't know whether podman pull would cause older
>> snapshots to be deleted...
>
> It is not a regression; it also happened on older kernels.
> But it is really annoying that the browser process has to wait while
> "podman pull" writes its changes to disk.
> In fact, I have been waiting for 5 years for caching of slow HDDs with
> an SSD cache, but apparently I can't wait any longer.
> So I started slowly buying expensive large SSDs to replace the big
> HDD. I still can't find time to connect a D5 P5316 30.72 TB drive to
> the primary workstation.
> I want to make a video review of it. I understand this is an expensive
> solution and not suitable for everyone, unlike an affordable SSD
> cache.
>
This is really sad to hear.
For now, I only have several guesses on how this could happen.
- Extra seeks between metadata and data chunks
You can try with mixed block groups, but this needs mkfs time tuning.
- Extra inlined extents causing too much metadata overhead
You can disable inline extents by using max_inline=0 as a mount option.
But that only affects newly created files, not the existing ones.
Otherwise, using bcache may be a solution.
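Roughly, with <device> and <mnt> as placeholders only:

# mkfs.btrfs --mixed <device>           (mixed block groups, mkfs time only)
# mount -o max_inline=0 <device> <mnt>  (no inlined extents for new files)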
For now I'm not aware of any HDD-specific tests, other than for zoned
devices, thus the performance problem can be really hard to debug.
Thanks,
Qu
On Mon, Dec 26, 2022 at 7:50 AM David Arendt <[email protected]> wrote:
>
> Hi,
>
> I am experiencing a similar issue using kernel 6.1.1. The problem
> started in kernel 6.1. It never happened before.
>
> A btrfs scrub and a btrfs check --readonly returned 0 errors.
>
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS critical
> (device sda2): corrupt leaf: root=18446744073709551610
> block=203366612992 slot=73, bad key order, prev (484119 96 1312873)
> current (484119 96 1312849)
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS info
> (device sda2): leaf 203366612992 gen 5068802 total ptrs 105 free space
> 10820 owner 18446744073709551610
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 0 key
> (484119 1 0) itemoff 16123 itemsize 160
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
> generation 45 size 2250 mode 40700
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 1 key
> (484119 12 484118) itemoff 16097 itemsize 26
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 2 key
> (484119 72 15) itemoff 16089 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 3 key
> (484119 72 20) itemoff 16081 itemsize 8
> ...
You've removed the most interesting part of the message.
Can you paste it without any cuts (...)?
Thanks.
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 82 key
> (484119 96 1312873) itemoff 14665 itemsize 51
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 83 key
> (484119 96 1312877) itemoff 14609 itemsize 56
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 84 key
> (484128 1 0) itemoff 14449 itemsize 160
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
> generation 45 size 98304 mode 100644
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 85 key
> (484128 108 0) itemoff 14396 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10674830381056 nr 65536
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 65536 ram 65536
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 86 key
> (484129 1 0) itemoff 14236 itemsize 160
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
> generation 45 size 26214400 mode 100644
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 87 key
> (484129 108 589824) itemoff 14183 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665699962880 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 88 key
> (484129 108 2850816) itemoff 14130 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665848733696 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 89 key
> (484129 108 11042816) itemoff 14077 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10660869349376 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 90 key
> (484129 108 13402112) itemoff 14024 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10660207378432 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 91 key
> (484129 108 19628032) itemoff 13971 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665835618304 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 92 key
> (484129 108 21266432) itemoff 13918 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10661222666240 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 93 key
> (484129 108 22740992) itemoff 13865 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665565814784 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 94 key
> (484129 108 23101440) itemoff 13812 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665836371968 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 95 key
> (484129 108 24084480) itemoff 13759 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665836404736 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 96 key
> (484129 108 24150016) itemoff 13706 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665849159680 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 97 key
> (484129 108 24313856) itemoff 13653 itemsize 53
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data disk bytenr 10665849192448 nr 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09extent
> data offset 0 nr 32768 ram 32768
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 98 key
> (484147 1 0) itemoff 13493 itemsize 160
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09\x09inode
> generation 45 size 886 mode 40755
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 99 key
> (484147 72 4) itemoff 13485 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 100 key
> (484147 72 27) itemoff 13477 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 101 key
> (484147 72 35) itemoff 13469 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 102 key
> (484147 72 40) itemoff 13461 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 103 key
> (484147 72 45) itemoff 13453 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - \x09item 104 key
> (484147 72 52) itemoff 13445 itemsize 8
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS error
> (device sda2): block=203366612992 write time tree block corruption detected
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS: error
> (device sda2: state AL) in free_log_tree:3284: errno=-5 IO failure
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS info
> (device sda2: state EAL): forced readonly
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS warning
> (device sda2: state EAL): Skipping commit of aborted transaction.
> 2022-12-26T07:44:45.000000+01:00 server02 kernel - - - BTRFS: error
> (device sda2: state EAL) in cleanup_transaction:1958: errno=-5 IO failure
>
>
> There are no SSD access errors in the kernel logs. Smart data for the
> SSD is normal. I also did a 12 hour memtest to rule out bad RAM.
>
> The filesystem consists of a single 480GB SATA SSD (Corsair Neutron
> XTI). Like for the original poster, the problem occurs only on one
> machine.
>
> The error appears about every few days and seems to be triggered
> especially by high CPU utilization combined with some disk I/O. CPU
> temperature never exceeds 60 degrees.
>
>
> On 26/12/2022 04:29, Qu Wenruo wrote:
> >
> >
> > On 2022/12/26 10:47, Mikhail Gavrilov wrote:
> >> On Mon, Dec 26, 2022 at 5:15 AM Qu Wenruo <[email protected]>
> >> wrote:
> >>>
> >>> Thanks a lot for the full kernel log.
> >>>
> >>> It indeed shows something is wrong in the run_one_delayed_ref().
> >>> But surprisingly, if there is something wrong, I'd expect more output
> >>> from btrfs, as normally if one tree block failed to pass whatever the
> >>> checks, it should cause an error message at least.
> >>>
> >>> Since you can reproduce the bug (although I don't think this is easy to
> >>> reproduce), mind to apply the extra debug patch and then try to
> >>> reproduce?
> >>
> >> Of course I am still able to reproduce.
> >> The number of messages foreshadowing readonly has become somewhat more:
> >> [ 2295.155437] BTRFS error (device nvme0n1p3): level check failed on
> >> logical 4957418700800 mirror 1 wanted 0 found 1
> >
> > OK, indeed a level mismatch.
> >
> > From the remaining lines, it shows we're failing at
> > do_free_extent_accounting(), which failed at the btrfs_del_csums().
> >
> > And inside btrfs_del_csums(), what we do are all regular btree
> > operations, thus the tree level check should work without problem.
> >
> > Thus it seems to be a corrupted csum tree.
> >
> >> [ 2295.155831] BTRFS error (device nvme0n1p3: state A): Transaction
> >> aborted (error -5)
> >> [ 2295.155946] BTRFS: error (device nvme0n1p3: state A) in
> >> do_free_extent_accounting:2849: errno=-5 IO failure
> >> [ 2295.155978] BTRFS info (device nvme0n1p3: state EA): forced readonly
> >> [ 2295.155985] BTRFS error (device nvme0n1p3: state EA):
> >> run_one_delayed_ref returned -5
> >> [ 2295.156051] BTRFS: error (device nvme0n1p3: state EA) in
> >> btrfs_run_delayed_refs:2153: errno=-5 IO failure
> >>
> >> Of course full logs are also attached.
> >>
> >>> Another thing is, mind to run "btrfs check --readonly" on the fs?
> >> Result of check attached too.
> >>
> > Could you please run "btrfs check --readonly" from a liveCD?
> > There are tons of possible false alerts if run on a RW mounted fs.
> >
> > Thanks,
> > Qu
>
>