2020-12-06 23:04:42

by Richard Weinberger

[permalink] [raw]
Subject: BUG: Invalid wait context with KMEMLEAK and KASAN enabled

Hi!

With both KMEMLEAK and KASAN enabled, I'm facing the following lockdep
splat at random times on Linus' tree as of today.
Sometimes it happens at bootup, sometimes much later when userspace has started.

Does this ring a bell?

[ 2.298447] =============================
[ 2.298971] [ BUG: Invalid wait context ]
[ 2.298971] 5.10.0-rc6+ #388 Not tainted
[ 2.298971] -----------------------------
[ 2.298971] ksoftirqd/1/15 is trying to lock:
[ 2.298971] ffff888100b94598 (&n->list_lock){....}-{3:3}, at:
free_debug_processing+0x3d/0x210
[ 2.298971] other info that might help us debug this:
[ 2.298971] context-{2:2}
[ 2.298971] 1 lock held by ksoftirqd/1/15:
[ 2.298971] #0: ffffffff835f4140 (rcu_callback){....}-{0:0}, at:
rcu_core+0x408/0x1040
[ 2.298971] stack backtrace:
[ 2.298971] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 5.10.0-rc6+ #388
[ 2.298971] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-0-ga698c89-rebuilt.opensuse.org 04/01/2014
[ 2.298971] Call Trace:
[ 2.298971] <IRQ>
[ 2.298971] dump_stack+0x9a/0xcc
[ 2.298971] __lock_acquire.cold+0xce/0x34b
[ 2.298971] ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2.298971] lock_acquire+0x153/0x4c0
[ 2.298971] ? free_debug_processing+0x3d/0x210
[ 2.298971] ? lock_release+0x690/0x690
[ 2.298971] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 2.298971] ? pvclock_clocksource_read+0xd9/0x1a0
[ 2.298971] _raw_spin_lock_irqsave+0x3b/0x80
[ 2.298971] ? free_debug_processing+0x3d/0x210
[ 2.298971] ? qlist_free_all+0x35/0xd0
[ 2.298971] free_debug_processing+0x3d/0x210
[ 2.298971] __slab_free+0x286/0x490
[ 2.298971] ? lockdep_enabled+0x39/0x50
[ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2.298971] ? run_posix_cpu_timers+0x256/0x2c0
[ 2.298971] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 2.298971] ? posix_cpu_timers_exit_group+0x30/0x30
[ 2.298971] qlist_free_all+0x59/0xd0
[ 2.298971] ? qlist_free_all+0xd0/0xd0
[ 2.298971] per_cpu_remove_cache+0x47/0x50
[ 2.298971] flush_smp_call_function_queue+0xea/0x2b0
[ 2.298971] __sysvec_call_function+0x6c/0x250
[ 2.298971] asm_call_irq_on_stack+0x12/0x20
[ 2.298971] </IRQ>
[ 2.298971] sysvec_call_function+0x84/0xa0
[ 2.298971] asm_sysvec_call_function+0x12/0x20
[ 2.298971] RIP: 0010:__asan_load4+0x1d/0x80
[ 2.298971] Code: 10 00 75 ee c3 0f 1f 84 00 00 00 00 00 4c 8b 04
24 48 83 ff fb 77 4d 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 76 3e 48
8d 47 03 <48> 89 c2 83 e2 07 48 83 fa 02 76 17 48 b9 00 00 00 00 00 fc
ff df
[ 2.298971] RSP: 0000:ffff888100e4f858 EFLAGS: 00000216
[ 2.298971] RAX: ffffffff83c55773 RBX: ffffffff81002431 RCX: dffffc0000000000
[ 2.298971] RDX: 0000000000000001 RSI: ffffffff83ee8d78 RDI: ffffffff83c55770
[ 2.298971] RBP: ffffffff83c5576c R08: ffffffff81083433 R09: fffffbfff07e333d
[ 2.298971] R10: 000000000001803d R11: fffffbfff07e333c R12: ffffffff83c5575c
[ 2.298971] R13: ffffffff83c55774 R14: ffffffff83c55770 R15: ffffffff83c55770
[ 2.298971] ? ret_from_fork+0x21/0x30
[ 2.298971] ? __orc_find+0x63/0xc0
[ 2.298971] ? stack_access_ok+0x35/0x90
[ 2.298971] __orc_find+0x63/0xc0
[ 2.298971] unwind_next_frame+0x1ee/0xbd0
[ 2.298971] ? ret_from_fork+0x22/0x30
[ 2.298971] ? ret_from_fork+0x21/0x30
[ 2.298971] ? deref_stack_reg+0x40/0x40
[ 2.298971] ? __unwind_start+0x2e8/0x370
[ 2.298971] ? create_prof_cpu_mask+0x20/0x20
[ 2.298971] arch_stack_walk+0x83/0xf0
[ 2.298971] ? ret_from_fork+0x22/0x30
[ 2.298971] ? rcu_core+0x488/0x1040
[ 2.298971] stack_trace_save+0x8c/0xc0
[ 2.298971] ? stack_trace_consume_entry+0x80/0x80
[ 2.298971] ? sched_clock_local+0x99/0xc0
[ 2.298971] kasan_save_stack+0x1b/0x40
[ 2.298971] ? kasan_save_stack+0x1b/0x40
[ 2.298971] ? kasan_set_track+0x1c/0x30
[ 2.298971] ? kasan_set_free_info+0x1b/0x30
[ 2.298971] ? __kasan_slab_free+0x10f/0x150
[ 2.298971] ? kmem_cache_free+0xa8/0x350
[ 2.298971] ? rcu_core+0x488/0x1040
[ 2.298971] ? __do_softirq+0x101/0x573
[ 2.298971] ? run_ksoftirqd+0x21/0x50
[ 2.298971] ? smpboot_thread_fn+0x1fc/0x380
[ 2.298971] ? kthread+0x1c7/0x220
[ 2.298971] ? ret_from_fork+0x22/0x30
[ 2.298971] ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
[ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2.298971] ? lock_acquire+0x153/0x4c0
[ 2.298971] ? rcu_core+0x408/0x1040
[ 2.298971] ? lock_release+0x690/0x690
[ 2.298971] ? lockdep_enabled+0x39/0x50
[ 2.298971] ? mark_held_locks+0x49/0x90
[ 2.298971] kasan_set_track+0x1c/0x30
[ 2.298971] kasan_set_free_info+0x1b/0x30
[ 2.298971] __kasan_slab_free+0x10f/0x150
[ 2.298971] ? rcu_core+0x488/0x1040
[ 2.298971] kmem_cache_free+0xa8/0x350
[ 2.298971] ? __ia32_compat_sys_move_pages+0x130/0x130
[ 2.298971] rcu_core+0x488/0x1040
[ 2.298971] ? call_rcu+0x5d0/0x5d0
[ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
[ 2.298971] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 2.298971] __do_softirq+0x101/0x573
[ 2.298971] ? trace_event_raw_event_irq_handler_exit+0x150/0x150
[ 2.298971] run_ksoftirqd+0x21/0x50
[ 2.298971] smpboot_thread_fn+0x1fc/0x380
[ 2.298971] ? smpboot_register_percpu_thread+0x180/0x180
[ 2.298971] ? __kthread_parkme+0xbb/0xd0
[ 2.298971] ? smpboot_register_percpu_thread+0x180/0x180
[ 2.298971] kthread+0x1c7/0x220
[ 2.298971] ? kthread_create_on_node+0xd0/0xd0
[ 2.298971] ret_from_fork+0x22/0x30

--
Thanks,
//richard


2020-12-07 04:39:40

by Boqun Feng

[permalink] [raw]
Subject: Re: BUG: Invalid wait context with KMEMLEAK and KASAN enabled

Hi Richard,

On Sun, Dec 06, 2020 at 11:59:16PM +0100, Richard Weinberger wrote:
> Hi!
>
> With both KMEMLEAK and KASAN enabled, I'm facing the following lockdep
> splat at random times on Linus' tree as of today.
> Sometimes it happens at bootup, sometimes much later when userspace has started.
>
> Does this ring a bell?
>
> [ 2.298447] =============================
> [ 2.298971] [ BUG: Invalid wait context ]
> [ 2.298971] 5.10.0-rc6+ #388 Not tainted
> [ 2.298971] -----------------------------
> [ 2.298971] ksoftirqd/1/15 is trying to lock:
> [ 2.298971] ffff888100b94598 (&n->list_lock){....}-{3:3}, at:
> free_debug_processing+0x3d/0x210

I guest you also had CONFIG_PROVE_RAW_LOCK_NESTING=y, right? With that
config, the wait context detetion of lockdep will treat spinlock_t as
sleepable locks (considering PREEMPT_RT kernel), and here it complained
about trying to acquire a sleepable lock (in PREEMPT_RT kernel) inside a
irq context which cannot be threaded (in this case, it's the IPI). A
proper fix will be modifying kmem_cache_node->list_lock to
raw_spinlock_t.

Regards,
Boqun

> [ 2.298971] other info that might help us debug this:
> [ 2.298971] context-{2:2}
> [ 2.298971] 1 lock held by ksoftirqd/1/15:
> [ 2.298971] #0: ffffffff835f4140 (rcu_callback){....}-{0:0}, at:
> rcu_core+0x408/0x1040
> [ 2.298971] stack backtrace:
> [ 2.298971] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 5.10.0-rc6+ #388
> [ 2.298971] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.12.0-0-ga698c89-rebuilt.opensuse.org 04/01/2014
> [ 2.298971] Call Trace:
> [ 2.298971] <IRQ>
> [ 2.298971] dump_stack+0x9a/0xcc
> [ 2.298971] __lock_acquire.cold+0xce/0x34b
> [ 2.298971] ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
> [ 2.298971] lock_acquire+0x153/0x4c0
> [ 2.298971] ? free_debug_processing+0x3d/0x210
> [ 2.298971] ? lock_release+0x690/0x690
> [ 2.298971] ? rcu_read_lock_bh_held+0xb0/0xb0
> [ 2.298971] ? pvclock_clocksource_read+0xd9/0x1a0
> [ 2.298971] _raw_spin_lock_irqsave+0x3b/0x80
> [ 2.298971] ? free_debug_processing+0x3d/0x210
> [ 2.298971] ? qlist_free_all+0x35/0xd0
> [ 2.298971] free_debug_processing+0x3d/0x210
> [ 2.298971] __slab_free+0x286/0x490
> [ 2.298971] ? lockdep_enabled+0x39/0x50
> [ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
> [ 2.298971] ? run_posix_cpu_timers+0x256/0x2c0
> [ 2.298971] ? rcu_read_lock_bh_held+0xb0/0xb0
> [ 2.298971] ? posix_cpu_timers_exit_group+0x30/0x30
> [ 2.298971] qlist_free_all+0x59/0xd0
> [ 2.298971] ? qlist_free_all+0xd0/0xd0
> [ 2.298971] per_cpu_remove_cache+0x47/0x50
> [ 2.298971] flush_smp_call_function_queue+0xea/0x2b0
> [ 2.298971] __sysvec_call_function+0x6c/0x250
> [ 2.298971] asm_call_irq_on_stack+0x12/0x20
> [ 2.298971] </IRQ>
> [ 2.298971] sysvec_call_function+0x84/0xa0
> [ 2.298971] asm_sysvec_call_function+0x12/0x20
> [ 2.298971] RIP: 0010:__asan_load4+0x1d/0x80
> [ 2.298971] Code: 10 00 75 ee c3 0f 1f 84 00 00 00 00 00 4c 8b 04
> 24 48 83 ff fb 77 4d 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 76 3e 48
> 8d 47 03 <48> 89 c2 83 e2 07 48 83 fa 02 76 17 48 b9 00 00 00 00 00 fc
> ff df
> [ 2.298971] RSP: 0000:ffff888100e4f858 EFLAGS: 00000216
> [ 2.298971] RAX: ffffffff83c55773 RBX: ffffffff81002431 RCX: dffffc0000000000
> [ 2.298971] RDX: 0000000000000001 RSI: ffffffff83ee8d78 RDI: ffffffff83c55770
> [ 2.298971] RBP: ffffffff83c5576c R08: ffffffff81083433 R09: fffffbfff07e333d
> [ 2.298971] R10: 000000000001803d R11: fffffbfff07e333c R12: ffffffff83c5575c
> [ 2.298971] R13: ffffffff83c55774 R14: ffffffff83c55770 R15: ffffffff83c55770
> [ 2.298971] ? ret_from_fork+0x21/0x30
> [ 2.298971] ? __orc_find+0x63/0xc0
> [ 2.298971] ? stack_access_ok+0x35/0x90
> [ 2.298971] __orc_find+0x63/0xc0
> [ 2.298971] unwind_next_frame+0x1ee/0xbd0
> [ 2.298971] ? ret_from_fork+0x22/0x30
> [ 2.298971] ? ret_from_fork+0x21/0x30
> [ 2.298971] ? deref_stack_reg+0x40/0x40
> [ 2.298971] ? __unwind_start+0x2e8/0x370
> [ 2.298971] ? create_prof_cpu_mask+0x20/0x20
> [ 2.298971] arch_stack_walk+0x83/0xf0
> [ 2.298971] ? ret_from_fork+0x22/0x30
> [ 2.298971] ? rcu_core+0x488/0x1040
> [ 2.298971] stack_trace_save+0x8c/0xc0
> [ 2.298971] ? stack_trace_consume_entry+0x80/0x80
> [ 2.298971] ? sched_clock_local+0x99/0xc0
> [ 2.298971] kasan_save_stack+0x1b/0x40
> [ 2.298971] ? kasan_save_stack+0x1b/0x40
> [ 2.298971] ? kasan_set_track+0x1c/0x30
> [ 2.298971] ? kasan_set_free_info+0x1b/0x30
> [ 2.298971] ? __kasan_slab_free+0x10f/0x150
> [ 2.298971] ? kmem_cache_free+0xa8/0x350
> [ 2.298971] ? rcu_core+0x488/0x1040
> [ 2.298971] ? __do_softirq+0x101/0x573
> [ 2.298971] ? run_ksoftirqd+0x21/0x50
> [ 2.298971] ? smpboot_thread_fn+0x1fc/0x380
> [ 2.298971] ? kthread+0x1c7/0x220
> [ 2.298971] ? ret_from_fork+0x22/0x30
> [ 2.298971] ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
> [ 2.298971] ? lock_acquire+0x153/0x4c0
> [ 2.298971] ? rcu_core+0x408/0x1040
> [ 2.298971] ? lock_release+0x690/0x690
> [ 2.298971] ? lockdep_enabled+0x39/0x50
> [ 2.298971] ? mark_held_locks+0x49/0x90
> [ 2.298971] kasan_set_track+0x1c/0x30
> [ 2.298971] kasan_set_free_info+0x1b/0x30
> [ 2.298971] __kasan_slab_free+0x10f/0x150
> [ 2.298971] ? rcu_core+0x488/0x1040
> [ 2.298971] kmem_cache_free+0xa8/0x350
> [ 2.298971] ? __ia32_compat_sys_move_pages+0x130/0x130
> [ 2.298971] rcu_core+0x488/0x1040
> [ 2.298971] ? call_rcu+0x5d0/0x5d0
> [ 2.298971] ? rcu_read_lock_sched_held+0x9c/0xd0
> [ 2.298971] ? rcu_read_lock_bh_held+0xb0/0xb0
> [ 2.298971] __do_softirq+0x101/0x573
> [ 2.298971] ? trace_event_raw_event_irq_handler_exit+0x150/0x150
> [ 2.298971] run_ksoftirqd+0x21/0x50
> [ 2.298971] smpboot_thread_fn+0x1fc/0x380
> [ 2.298971] ? smpboot_register_percpu_thread+0x180/0x180
> [ 2.298971] ? __kthread_parkme+0xbb/0xd0
> [ 2.298971] ? smpboot_register_percpu_thread+0x180/0x180
> [ 2.298971] kthread+0x1c7/0x220
> [ 2.298971] ? kthread_create_on_node+0xd0/0xd0
> [ 2.298971] ret_from_fork+0x22/0x30
>
> --
> Thanks,
> //richard

2020-12-07 07:41:21

by Richard Weinberger

[permalink] [raw]
Subject: Re: BUG: Invalid wait context with KMEMLEAK and KASAN enabled

Boqun,

----- Ursprüngliche Mail -----
>> Does this ring a bell?
>>
>> [ 2.298447] =============================
>> [ 2.298971] [ BUG: Invalid wait context ]
>> [ 2.298971] 5.10.0-rc6+ #388 Not tainted
>> [ 2.298971] -----------------------------
>> [ 2.298971] ksoftirqd/1/15 is trying to lock:
>> [ 2.298971] ffff888100b94598 (&n->list_lock){....}-{3:3}, at:
>> free_debug_processing+0x3d/0x210
>
> I guest you also had CONFIG_PROVE_RAW_LOCK_NESTING=y, right?

Yes, this is the case!

Thanks,
//richard