2023-06-27 00:00:44

by Mirsad Todorovac

[permalink] [raw]
Subject: [BUG] btrfs: MAX_LOCKDEP_CHAIN_HLOCKS too low!

Hi,

There is a bug apparently in btrfs file system.

The platform is an assembled box with Ryzen 9 processor and ASRock X670E PG
Lightning motherboard.

I do not have a reproducer, just kernel log:

Jun 26 20:41:58 defiant kernel: [ 2273.786736] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!
Jun 26 20:41:58 defiant kernel: [ 2273.786742] turning off the locking correctness validator.
Jun 26 20:41:58 defiant kernel: [ 2273.786743] CPU: 20 PID: 6316 Comm: URL Classifier Not tainted 6.4.0-kmlk-netdbg-iwlwifi #3
Jun 26 20:41:58 defiant kernel: [ 2273.786746] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
Jun 26 20:41:58 defiant kernel: [ 2273.786748] Call Trace:
Jun 26 20:41:58 defiant kernel: [ 2273.786749] <TASK>
Jun 26 20:41:58 defiant kernel: [ 2273.786751] dump_stack_lvl+0x60/0xc0
Jun 26 20:41:58 defiant kernel: [ 2273.786757] dump_stack+0x10/0x20
Jun 26 20:41:58 defiant kernel: [ 2273.786759] __lock_acquire+0x1dae/0x2340
Jun 26 20:41:58 defiant kernel: [ 2273.786764] lock_acquire+0xca/0x2c0
Jun 26 20:41:58 defiant kernel: [ 2273.786767] ? kmemleak_free+0x36/0xc0
Jun 26 20:41:58 defiant kernel: [ 2273.786772] _raw_spin_lock_irqsave+0x4e/0x80
Jun 26 20:41:58 defiant kernel: [ 2273.786775] ? kmemleak_free+0x36/0xc0
Jun 26 20:41:58 defiant kernel: [ 2273.786778] kmemleak_free+0x36/0xc0
Jun 26 20:41:58 defiant kernel: [ 2273.786781] slab_free_freelist_hook.constprop.0+0xe7/0x210
Jun 26 20:41:58 defiant kernel: [ 2273.786784] ? add_delayed_ref_head+0x24e/0x280 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.786843] kmem_cache_free+0x1a4/0x320
Jun 26 20:41:58 defiant kernel: [ 2273.786847] add_delayed_ref_head+0x24e/0x280 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.786881] btrfs_add_delayed_tree_ref+0x284/0x450 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.786912] btrfs_free_tree_block+0xc0/0x2d0 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.786946] btrfs_del_leaf+0xb8/0xf0 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.786975] btrfs_del_items+0x46b/0x4e0 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.787004] btrfs_delete_one_dir_name+0xf9/0x120 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.787034] __btrfs_unlink_inode+0xb5/0x4e0 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.787067] btrfs_unlink+0xdb/0x150 [btrfs]
Jun 26 20:41:58 defiant kernel: [ 2273.787094] vfs_unlink+0x131/0x2a0
Jun 26 20:41:58 defiant kernel: [ 2273.787098] do_unlinkat+0x1b1/0x2d0
Jun 26 20:41:58 defiant kernel: [ 2273.787102] __x64_sys_unlink+0x42/0x70
Jun 26 20:41:58 defiant kernel: [ 2273.787104] do_syscall_64+0x59/0x90
Jun 26 20:41:58 defiant kernel: [ 2273.787107] ? syscall_exit_to_user_mode+0x39/0x60
Jun 26 20:41:58 defiant kernel: [ 2273.787110] ? do_syscall_64+0x69/0x90
Jun 26 20:41:58 defiant kernel: [ 2273.787113] ? syscall_exit_to_user_mode+0x39/0x60
Jun 26 20:41:58 defiant kernel: [ 2273.787116] ? do_syscall_64+0x69/0x90
Jun 26 20:41:58 defiant kernel: [ 2273.787118] ? do_syscall_64+0x69/0x90
Jun 26 20:41:58 defiant kernel: [ 2273.787120] ? do_syscall_64+0x69/0x90
Jun 26 20:41:58 defiant kernel: [ 2273.787122] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jun 26 20:41:58 defiant kernel: [ 2273.787124] RIP: 0033:0x7fe80911625b
Jun 26 20:41:58 defiant kernel: [ 2273.787127] Code: f0 ff ff 73 01 c3 48 8b 0d d2 2b 10 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a5 2b 10 00 f7 d8 64 89 01 48
Jun 26 20:41:58 defiant kernel: [ 2273.787129] RSP: 002b:00007fe7da0895f8 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
Jun 26 20:41:58 defiant kernel: [ 2273.787131] RAX: ffffffffffffffda RBX: 00007fe77e9c29d0 RCX: 00007fe80911625b
Jun 26 20:41:58 defiant kernel: [ 2273.787133] RDX: 00007fe7da089620 RSI: 00007fe77e705608 RDI: 00007fe77e705608
Jun 26 20:41:58 defiant kernel: [ 2273.787134] RBP: 0000000080520015 R08: 0000000000000000 R09: 0000000000000053
Jun 26 20:41:58 defiant kernel: [ 2273.787135] R10: 0000000000000100 R11: 0000000000000206 R12: 00007fe77e9c2a70
Jun 26 20:41:58 defiant kernel: [ 2273.787137] R13: 00007fe77e9c29d0 R14: 0000000000000000 R15: 0000000000000001
Jun 26 20:41:58 defiant kernel: [ 2273.787140] </TASK>

Later the machine crashed completely, with the broken record sound during the
"make kselftest", which I will try to reproduce later.

The config and lshw are in attachment.

The kernel is the recent 6.4.0 from torvalds tree, with Ubuntu generic config
merged with per kselftest configs.

I understand that the bugs with locking are extremely hard to reproduce, so
I add this bug report for documentation sake.

Best regards,
Mirsad Todorovac


Attachments:
config-6.4.0-kmlk-netdbg-iwlwifi.xz (56.39 kB)
lshw.txt.xz (6.42 kB)
Download all attachments

2023-06-27 00:32:46

by Qu Wenruo

[permalink] [raw]
Subject: Re: [BUG] btrfs: MAX_LOCKDEP_CHAIN_HLOCKS too low!



On 2023/6/27 07:40, Mirsad Goran Todorovac wrote:
> Hi,
>
> There is a bug apparently in btrfs file system.
>
> The platform is an assembled box with Ryzen 9 processor and ASRock X670E PG
> Lightning motherboard.
>
> I do not have a reproducer, just kernel log:
>
> Jun 26 20:41:58 defiant kernel: [ 2273.786736] BUG:
> MAX_LOCKDEP_CHAIN_HLOCKS too low!

This is really what it said.

The setting is too low for certain workload.

In fact Fedora is already increasing this value.
If you want lockdep, I guess that's the only way to go.

Thanks,
Qu
> Jun 26 20:41:58 defiant kernel: [ 2273.786742] turning off the locking
> correctness validator.
> Jun 26 20:41:58 defiant kernel: [ 2273.786743] CPU: 20 PID: 6316 Comm:
> URL Classifier Not tainted 6.4.0-kmlk-netdbg-iwlwifi #3
> Jun 26 20:41:58 defiant kernel: [ 2273.786746] Hardware name: ASRock
> X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> Jun 26 20:41:58 defiant kernel: [ 2273.786748] Call Trace:
> Jun 26 20:41:58 defiant kernel: [ 2273.786749]  <TASK>
> Jun 26 20:41:58 defiant kernel: [ 2273.786751]  dump_stack_lvl+0x60/0xc0
> Jun 26 20:41:58 defiant kernel: [ 2273.786757]  dump_stack+0x10/0x20
> Jun 26 20:41:58 defiant kernel: [ 2273.786759]
> __lock_acquire+0x1dae/0x2340
> Jun 26 20:41:58 defiant kernel: [ 2273.786764]  lock_acquire+0xca/0x2c0
> Jun 26 20:41:58 defiant kernel: [ 2273.786767]  ? kmemleak_free+0x36/0xc0
> Jun 26 20:41:58 defiant kernel: [ 2273.786772]
> _raw_spin_lock_irqsave+0x4e/0x80
> Jun 26 20:41:58 defiant kernel: [ 2273.786775]  ? kmemleak_free+0x36/0xc0
> Jun 26 20:41:58 defiant kernel: [ 2273.786778]  kmemleak_free+0x36/0xc0
> Jun 26 20:41:58 defiant kernel: [ 2273.786781]
> slab_free_freelist_hook.constprop.0+0xe7/0x210
> Jun 26 20:41:58 defiant kernel: [ 2273.786784]  ?
> add_delayed_ref_head+0x24e/0x280 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.786843]  kmem_cache_free+0x1a4/0x320
> Jun 26 20:41:58 defiant kernel: [ 2273.786847]
> add_delayed_ref_head+0x24e/0x280 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.786881]
> btrfs_add_delayed_tree_ref+0x284/0x450 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.786912]
> btrfs_free_tree_block+0xc0/0x2d0 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.786946]  btrfs_del_leaf+0xb8/0xf0
> [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.786975]
> btrfs_del_items+0x46b/0x4e0 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.787004]
> btrfs_delete_one_dir_name+0xf9/0x120 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.787034]
> __btrfs_unlink_inode+0xb5/0x4e0 [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.787067]  btrfs_unlink+0xdb/0x150
> [btrfs]
> Jun 26 20:41:58 defiant kernel: [ 2273.787094]  vfs_unlink+0x131/0x2a0
> Jun 26 20:41:58 defiant kernel: [ 2273.787098]  do_unlinkat+0x1b1/0x2d0
> Jun 26 20:41:58 defiant kernel: [ 2273.787102]  __x64_sys_unlink+0x42/0x70
> Jun 26 20:41:58 defiant kernel: [ 2273.787104]  do_syscall_64+0x59/0x90
> Jun 26 20:41:58 defiant kernel: [ 2273.787107]  ?
> syscall_exit_to_user_mode+0x39/0x60
> Jun 26 20:41:58 defiant kernel: [ 2273.787110]  ? do_syscall_64+0x69/0x90
> Jun 26 20:41:58 defiant kernel: [ 2273.787113]  ?
> syscall_exit_to_user_mode+0x39/0x60
> Jun 26 20:41:58 defiant kernel: [ 2273.787116]  ? do_syscall_64+0x69/0x90
> Jun 26 20:41:58 defiant kernel: [ 2273.787118]  ? do_syscall_64+0x69/0x90
> Jun 26 20:41:58 defiant kernel: [ 2273.787120]  ? do_syscall_64+0x69/0x90
> Jun 26 20:41:58 defiant kernel: [ 2273.787122]
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> Jun 26 20:41:58 defiant kernel: [ 2273.787124] RIP: 0033:0x7fe80911625b
> Jun 26 20:41:58 defiant kernel: [ 2273.787127] Code: f0 ff ff 73 01 c3
> 48 8b 0d d2 2b 10 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00
> 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b
> 0d a5 2b 10 00 f7 d8 64 89 01 48
> Jun 26 20:41:58 defiant kernel: [ 2273.787129] RSP:
> 002b:00007fe7da0895f8 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
> Jun 26 20:41:58 defiant kernel: [ 2273.787131] RAX: ffffffffffffffda
> RBX: 00007fe77e9c29d0 RCX: 00007fe80911625b
> Jun 26 20:41:58 defiant kernel: [ 2273.787133] RDX: 00007fe7da089620
> RSI: 00007fe77e705608 RDI: 00007fe77e705608
> Jun 26 20:41:58 defiant kernel: [ 2273.787134] RBP: 0000000080520015
> R08: 0000000000000000 R09: 0000000000000053
> Jun 26 20:41:58 defiant kernel: [ 2273.787135] R10: 0000000000000100
> R11: 0000000000000206 R12: 00007fe77e9c2a70
> Jun 26 20:41:58 defiant kernel: [ 2273.787137] R13: 00007fe77e9c29d0
> R14: 0000000000000000 R15: 0000000000000001
> Jun 26 20:41:58 defiant kernel: [ 2273.787140]  </TASK>
>
> Later the machine crashed completely, with the broken record sound
> during the
> "make kselftest", which I will try to reproduce later.
>
> The config and lshw are in attachment.
>
> The kernel is the recent 6.4.0 from torvalds tree, with Ubuntu generic
> config
> merged with per kselftest configs.
>
> I understand that the bugs with locking are extremely hard to reproduce, so
> I add this bug report for documentation sake. >
> Best regards,
> Mirsad Todorovac

2023-06-27 17:00:57

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [BUG] btrfs: MAX_LOCKDEP_CHAIN_HLOCKS too low!

Hi, Qu, David,

On 6/27/23 16:27, David Sterba wrote:
> On Tue, Jun 27, 2023 at 08:19:37AM +0800, Qu Wenruo wrote:
>>
>> On 2023/6/27 07:40, Mirsad Goran Todorovac wrote:
>>> Hi,
>>>
>>> There is a bug apparently in btrfs file system.
>>>
>>> The platform is an assembled box with Ryzen 9 processor and ASRock X670E PG
>>> Lightning motherboard.
>>>
>>> I do not have a reproducer, just kernel log:
>>>
>>> Jun 26 20:41:58 defiant kernel: [ 2273.786736] BUG:
>>> MAX_LOCKDEP_CHAIN_HLOCKS too low!
>>
>> This is really what it said.
>>
>> The setting is too low for certain workload.
>>
>> In fact Fedora is already increasing this value.
>> If you want lockdep, I guess that's the only way to go.
>
> https://btrfs.readthedocs.io/en/latest/dev/Development-notes.html#bug-max-lockdep-chain-hlocks-too-low
>
> As the reports will probably keep coming until lockdep changes I'll move
> the answer to a more user-visible page.

Thank you for your prompt response. It wasn't obvious from the kernel
message though which .config setting needs adjustment.

I am yet unable to tell whether the lockdep setting being too low and
turning off the locking correctness validator caused the kernel crash, but
I will know more after restarting the kselftest with the new lockdep
setting ...

Best regards,
Mirsad Todorovac

2023-06-27 18:37:53

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [BUG] btrfs: MAX_LOCKDEP_CHAIN_HLOCKS too low!

On 6/27/23 18:27, Mirsad Goran Todorovac wrote:
> Hi, Qu, David,
>
> On 6/27/23 16:27, David Sterba wrote:
>> On Tue, Jun 27, 2023 at 08:19:37AM +0800, Qu Wenruo wrote:
>>>
>>> On 2023/6/27 07:40, Mirsad Goran Todorovac wrote:
>>>> Hi,
>>>>
>>>> There is a bug apparently in btrfs file system.
>>>>
>>>> The platform is an assembled box with Ryzen 9 processor and ASRock X670E PG
>>>> Lightning motherboard.
>>>>
>>>> I do not have a reproducer, just kernel log:
>>>>
>>>> Jun 26 20:41:58 defiant kernel: [ 2273.786736] BUG:
>>>> MAX_LOCKDEP_CHAIN_HLOCKS too low!
>>>
>>> This is really what it said.
>>>
>>> The setting is too low for certain workload.
>>>
>>> In fact Fedora is already increasing this value.
>>> If you want lockdep, I guess that's the only way to go.
>>
>> https://btrfs.readthedocs.io/en/latest/dev/Development-notes.html#bug-max-lockdep-chain-hlocks-too-low
>>
>> As the reports will probably keep coming until lockdep changes I'll move
>> the answer to a more user-visible page.
>
> Thank you for your prompt response. It wasn't obvious from the kernel
> message though which .config setting needs adjustment.
>
> I am yet unable to tell whether the lockdep setting being too low and
> turning off the locking correctness validator caused the kernel crash, but
> I will know more after restarting the kselftest with the new lockdep
> setting ...

P.S.

I heard it is uncool to answer one's own email, but I have been unable to
reproduce the crash of yesterday without or with pcscd smartcard daemon
in debug mode.

This of course doesn't mean that the LOCKDEP setting was a direct cause
of crash, I believe that findings are inconclusive. (IMHO, correct locking
should work even with locking correctness validator turned off.)

FYI, here are the crash logs, in case it could help improve btrfs:

[1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-06-27/

Best regards,
Mirsad Todorovac

2023-06-27 19:13:14

by Chris Murphy

[permalink] [raw]
Subject: Re: [BUG] btrfs: MAX_LOCKDEP_CHAIN_HLOCKS too low!



On Tue, Jun 27, 2023, at 12:27 PM, Mirsad Goran Todorovac wrote:

> Thank you for your prompt response. It wasn't obvious from the kernel
> message though which .config setting needs adjustment.


https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2271


> I am yet unable to tell whether the lockdep setting being too low and
> turning off the locking correctness validator caused the kernel crash, but
> I will know more after restarting the kselftest with the new lockdep
> setting ...

Lockdep is complex and so are real file system workloads, so the warnings might seem transient.

--
Chris Murphy