2018-04-29 02:28:41

by Fengguang Wu

Subject: [crng_reseed] WARNING: inconsistent lock state

Hello,

FYI this happens in mainline kernel 4.17.0-rc2.
It looks like a new regression.

It occurs in 3 out of 3 boots.

There is another "[ 294.642506] BUG: sleeping function called from
invalid context at mm/slab.h:421" at the bottom of this long dmesg:

[ 123.885728] Writes: Total: 41708535 Max/Min: 0/0 Fail: 0
[ 185.312613] Writes: Total: 61148802 Max/Min: 0/0 Fail: 0
[ 246.752095] Writes: Total: 86067258 Max/Min: 0/0 Fail: 0
[ 294.560671]
[ 294.560924] ================================
[ 294.561539] WARNING: inconsistent lock state
[ 294.562165] 4.17.0-rc2 #145 Not tainted
[ 294.562724] --------------------------------
[ 294.563338] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 294.564201] kworker/0:2/28 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 294.564976] (ptrval) (fs_reclaim){?.+.}, at: fs_reclaim_acquire+0x0/0x30
[ 294.566157] {HARDIRQ-ON-W} state was registered at:
[ 294.566863] lock_acquire+0x79/0xd0:
lock_acquire at kernel/locking/lockdep.c:3922
[ 294.567379] fs_reclaim_acquire+0x24/0x30
[ 294.568069] fs_reclaim_acquire+0x14/0x20:
fs_reclaim_acquire at mm/page_alloc.c:3740
[ 294.568660] kmem_cache_alloc_node+0x32/0x240:
slab_pre_alloc_hook at mm/slab.h:419
(inlined by) slab_alloc_node at mm/slab.c:3299
(inlined by) kmem_cache_alloc_node at mm/slab.c:3642
[ 294.569294] __kmalloc_node+0x24/0x30:
__kmalloc_node at mm/slab.c:3690

[ 294.569841] alloc_cpumask_var_node+0x1f/0x50
[ 294.570477] zalloc_cpumask_var+0x14/0x20
[ 294.571071] native_smp_prepare_cpus+0xf4/0x364
[ 294.571738] kvm_smp_prepare_cpus+0x21/0xff
[ 294.572349] kernel_init_freeable+0x2b5/0x4db
[ 294.572995] kernel_init+0x9/0x100
[ 294.573499] ret_from_fork+0x24/0x30
[ 294.574169] irq event stamp: 141042
[ 294.574681] hardirqs last enabled at (141041): [<ffffffff81d6dd81>] _raw_spin_unlock_irqrestore+0x31/0x60
[ 294.576050] hardirqs last disabled at (141042): [<ffffffff81e008d4>] interrupt_entry+0xd4/0x100
[ 294.577278] softirqs last enabled at (140644): [<ffffffff81aaec82>] update_defense_level+0x122/0x450
[ 294.578577] softirqs last disabled at (140642): [<ffffffff81aaeb60>] update_defense_level+0x0/0x450
[ 294.579855]
[ 294.579855] other info that might help us debug this:
[ 294.580867] Possible unsafe locking scenario:
[ 294.580867]
[ 294.581714] CPU0
[ 294.582073] ----
[ 294.582431] lock(fs_reclaim);
[ 294.582892] <Interrupt>
[ 294.583273] lock(fs_reclaim);
[ 294.583757]
[ 294.583757] *** DEADLOCK ***
[ 294.583757]
[ 294.584592] 2 locks held by kworker/0:2/28:
[ 294.585196] #0: (ptrval) ((wq_completion)"events"){+.+.}, at: process_one_work+0x1d6/0x460
[ 294.586497] #1: (ptrval) ((work_completion)(&(&adapter->watchdog_task)->work)){+.+.}, at: process_one_work+0x1d6/0x460
[ 294.588125]
[ 294.588125] stack backtrace:
[ 294.588753] CPU: 0 PID: 28 Comm: kworker/0:2 Not tainted 4.17.0-rc2 #145
[ 294.589803] Workqueue: events e1000_watchdog
[ 294.590410] Call Trace:
[ 294.590776] <IRQ>
[ 294.591078] dump_stack+0x8e/0xd5
[ 294.591557] print_usage_bug+0x247/0x262
[ 294.592221] mark_lock+0x5c5/0x660
[ 294.592719] ? check_usage_backwards+0x160/0x160
[ 294.593436] __lock_acquire+0xdcf/0x1b70
[ 294.594006] ? __lock_acquire+0x3f1/0x1b70
[ 294.594595] ? trace_hardirqs_off+0xd/0x10
[ 294.595190] lock_acquire+0x79/0xd0
[ 294.595706] ? lock_acquire+0x79/0xd0
[ 294.596259] ? find_suitable_fallback+0x80/0x80
[ 294.609933] ? crng_reseed+0x13e/0x2e0
[ 294.610479] fs_reclaim_acquire+0x24/0x30
[ 294.611163] ? find_suitable_fallback+0x80/0x80
[ 294.611915] fs_reclaim_acquire+0x14/0x20
[ 294.612490] __kmalloc+0x35/0x240
[ 294.612982] crng_reseed+0x13e/0x2e0
[ 294.613499] credit_entropy_bits+0x21c/0x230
[ 294.625342] add_interrupt_randomness+0x293/0x300
[ 294.626072] handle_irq_event_percpu+0x3b/0x70
[ 294.626718] handle_irq_event+0x34/0x60
[ 294.627272] handle_fasteoi_irq+0x70/0x120
[ 294.627867] handle_irq+0x15/0x20
[ 294.628349] do_IRQ+0x53/0x100
[ 294.628803] common_interrupt+0xf/0xf
[ 294.629331] </IRQ>
[ 294.629653] RIP: 0010:e1000_watchdog+0x126/0x530
[ 294.630335] RSP: 0000:ffff880016823e00 EFLAGS: 00000297 ORIG_RAX: ffffffffffffffda
[ 294.631690] RAX: 0000000000000010 RBX: ffff880014f992c0 RCX: 0000000000000006
[ 294.632971] RDX: ffffc900008e0000 RSI: ffff88001681d3b8 RDI: ffff880014f98e70
[ 294.634237] RBP: ffff880016823e38 R08: 0000000097c673cd R09: 0000000000000000
[ 294.635549] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880014f98a80
[ 294.636838] R13: ffff880014f98000 R14: ffff880018994480 R15: ffff880014f98e70
[ 294.638276] process_one_work+0x236/0x460
[ 294.638993] ? process_one_work+0x1d6/0x460
[ 294.639652] worker_thread+0x35/0x3f0
[ 294.640175] kthread+0x122/0x140
[ 294.640646] ? process_one_work+0x460/0x460
[ 294.641242] ? __kthread_create_on_node+0x190/0x190
[ 294.641941] ret_from_fork+0x24/0x30
[ 294.642506] BUG: sleeping function called from invalid context at mm/slab.h:421
[ 294.644135] in_atomic(): 1, irqs_disabled(): 1, pid: 28, name: kworker/0:2
[ 294.646373] INFO: lockdep is turned off.
[ 294.647940] irq event stamp: 141042
[ 294.648638] hardirqs last enabled at (141041): [<ffffffff81d6dd81>] _raw_spin_unlock_irqrestore+0x31/0x60
[ 294.655362] hardirqs last disabled at (141042): [<ffffffff81e008d4>] interrupt_entry+0xd4/0x100
[ 294.676115] softirqs last enabled at (140644): [<ffffffff81aaec82>] update_defense_level+0x122/0x450
[ 294.677737] softirqs last disabled at (140642): [<ffffffff81aaeb60>] update_defense_level+0x0/0x450
[ 294.682738] CPU: 0 PID: 28 Comm: kworker/0:2 Not tainted 4.17.0-rc2 #145
[ 294.683763] Workqueue: events e1000_watchdog
[ 294.684414] Call Trace:
[ 294.697850] <IRQ>
[ 294.698174] dump_stack+0x8e/0xd5
[ 294.698697] ___might_sleep+0x15f/0x250
[ 294.699288] __might_sleep+0x45/0x80
[ 294.699853] __kmalloc+0x1a3/0x240
[ 294.700387] crng_reseed+0x13e/0x2e0
[ 294.700957] credit_entropy_bits+0x21c/0x230
[ 294.701642] add_interrupt_randomness+0x293/0x300
[ 294.702382] handle_irq_event_percpu+0x3b/0x70
[ 294.703080] handle_irq_event+0x34/0x60
[ 294.703689] handle_fasteoi_irq+0x70/0x120
[ 294.704325] handle_irq+0x15/0x20
[ 294.704863] do_IRQ+0x53/0x100
[ 294.705347] common_interrupt+0xf/0xf
[ 294.705927] </IRQ>
[ 294.706267] RIP: 0010:e1000_watchdog+0x126/0x530
[ 294.706983] RSP: 0000:ffff880016823e00 EFLAGS: 00000297 ORIG_RAX: ffffffffffffffda
[ 294.708142] RAX: 0000000000000010 RBX: ffff880014f992c0 RCX: 0000000000000006
[ 294.709228] RDX: ffffc900008e0000 RSI: ffff88001681d3b8 RDI: ffff880014f98e70
[ 294.710314] RBP: ffff880016823e38 R08: 0000000097c673cd R09: 0000000000000000
[ 294.715815] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880014f98a80
[ 294.716907] R13: ffff880014f98000 R14: ffff880018994480 R15: ffff880014f98e70
[ 294.718019] process_one_work+0x236/0x460
[ 294.718643] ? process_one_work+0x1d6/0x460
[ 294.719281] worker_thread+0x35/0x3f0
[ 294.719859] kthread+0x122/0x140
[ 294.720361] ? process_one_work+0x460/0x460
[ 294.721008] ? __kthread_create_on_node+0x190/0x190
[ 294.721766] ret_from_fork+0x24/0x30
[ 294.722324] random: crng init done
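
For context, both splats above share one root cause: a GFP_KERNEL allocation
reached from hard-IRQ context via add_interrupt_randomness() ->
credit_entropy_bits() -> crng_reseed() -> __kmalloc(). GFP_KERNEL may enter
direct reclaim, which lockdep models as taking the fs_reclaim pseudo-lock
and which the might_sleep() check in the slab pre-alloc hook rejects in
atomic context. A minimal illustrative sketch of the problematic pattern
(hypothetical handler, not the actual drivers/char/random.c code):

#include <linux/interrupt.h>
#include <linux/slab.h>

/*
 * Illustrative only: a hard-IRQ handler must not use a sleeping
 * allocation.  GFP_KERNEL may enter direct reclaim, which lockdep
 * tracks via the fs_reclaim pseudo-lock and which might_sleep()
 * in slab_pre_alloc_hook() flags in atomic context.
 */
static irqreturn_t bad_irq_handler(int irq, void *dev_id)
{
	void *buf = kmalloc(128, GFP_KERNEL);	/* wrong: may sleep in hard IRQ */

	kfree(buf);
	return IRQ_HANDLED;
}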

Attached are the full dmesg, kconfig and reproduce scripts.

Thanks,
Fengguang


Attachments:
dmesg-vm-ivb41-yocto-ia32-18:20180425110746:x86_64-randconfig-s4-04251005:4.17.0-rc2:145 (65.92 kB)
.config (114.46 kB)
job-script (3.84 kB)
reproduce-vm-ivb41-yocto-ia32-18:20180425110746:x86_64-randconfig-s4-04251005:4.17.0-rc2:145 (2.18 kB)

2018-04-29 03:08:42

by Linus Torvalds

Subject: Re: [crng_reseed] WARNING: inconsistent lock state

On Sat, Apr 28, 2018 at 7:26 PM Fengguang Wu <[email protected]> wrote:

> FYI this happens in mainline kernel 4.17.0-rc2.
> It looks like a new regression.

> It occurs in 3 out of 3 boots.

> There is another "[ 294.642506] BUG: sleeping function called from
> invalid context at mm/slab.h:421" at the bottom of this long dmesg:

This should be fixed by commit 6c1e851c4edc ("random: fix possible sleeping
allocation from irq context").

Linus

2018-04-29 12:19:25

by Fengguang Wu

Subject: Re: [crng_reseed] WARNING: inconsistent lock state

On Sun, Apr 29, 2018 at 03:07:06AM +0000, Linus Torvalds wrote:
>On Sat, Apr 28, 2018 at 7:26 PM Fengguang Wu <[email protected]> wrote:
>
>> FYI this happens in mainline kernel 4.17.0-rc2.
>> It looks like a new regression.
>
>> It occurs in 3 out of 3 boots.
>
>> There is another "[ 294.642506] BUG: sleeping function called from
>> invalid context at mm/slab.h:421" at the bottom of this long dmesg:
>
>This should be fixed by commit 6c1e851c4edc ("random: fix possible sleeping
>allocation from irq context").

Yes, that fixes the bug. Sorry for the late report!

Regards,
Fengguang