Return-path: Received: from mail-pa0-f52.google.com ([209.85.220.52]:46417 "EHLO mail-pa0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750818AbaHQD2s (ORCPT ); Sat, 16 Aug 2014 23:28:48 -0400 Received: by mail-pa0-f52.google.com with SMTP id bj1so5549061pad.39 for ; Sat, 16 Aug 2014 20:28:47 -0700 (PDT) Date: Sat, 16 Aug 2014 20:27:01 -0700 (PDT) From: Hugh Dickins To: Vincent Donnefort cc: Bryan Wu , linux-leds@vger.kernel.org, linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org Subject: 3.17-rc1: leds blink workqueue causes sleeping BUGs Message-ID: (sfid-20140817_052921_862124_157BF5E2) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-wireless-owner@vger.kernel.org List-ID: Can we safely revert your 8b37e1bef5a6 ("leds: convert blink timer to workqueue"), or have there been other changes which now depend upon it? Your commit comment says: This patch converts the blink timer from led-core to workqueue which is more suitable for this kind of non-priority operations. Moreover, timer may lead to errors when a LED setting function use a scheduling function such as pinctrl which is using mutex. Which sounds like a good change, except led_blink_set() itself may now sleep, and at least one established user calls it while holding a lock. I have CONFIG_DEBUG_ATOMIC_SLEEP=y, plus lockdep: once wireless comes up, I get the stream of messages below. Reverting 8b37e1bef5a6 works for me, but perhaps something else would need to be reverted too? Hugh BUG: sleeping function called from invalid context at kernel/workqueue.c:2650 in_atomic(): 1, irqs_disabled(): 0, pid: 332, name: wpa_supplicant 7 locks held by wpa_supplicant/332: #0: (cb_lock){++++++}, at: [] genl_rcv+0x14/0x32 #1: (genl_mutex){+.+.+.}, at: [] genl_lock+0x12/0x14 #2: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x14 #3: (&wdev->mtx){+.+.+.}, at: [] nl80211_authenticate+0x20f/0x2ad #4: (&local->mtx){+.+.+.}, at: [] ieee80211_prep_connection+0x37a/0xbe9 #5: (&local->chanctx_mtx){+.+.+.}, at: [] ieee80211_vif_use_channel+0x6c/0x21e #6: (&trig->leddev_list_lock){.+.+..}, at: [] tpt_trig_timer+0xd0/0x11b Preemption disabled at:[] tpt_trig_timer+0xd0/0x11b CPU: 3 PID: 332 Comm: wpa_supplicant Not tainted 3.17.0-rc1 #2 Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011 0000000000000000 ffff8800b359b5e8 ffffffff815b4eb1 0000000000000000 ffff8800b359b610 ffffffff810a2f46 ffff8800b34051b0 ffff8800b34051d0 0000000ffffffff1 ffff8800b359b6d8 ffffffff8109966f ffffffff81099610 Call Trace: [] dump_stack+0x4e/0x7a [] __might_sleep+0x1fa/0x201 [] flush_work+0x5f/0x213 [] ? mod_delayed_work_on+0x75/0x75 [] ? __lock_acquire+0x10ec/0x17e8 [] ? mark_held_locks+0x50/0x6e [] ? __cancel_work_timer+0x9d/0xec [] ? trace_hardirqs_on_caller+0x142/0x19e [] __cancel_work_timer+0xa9/0xec [] cancel_delayed_work_sync+0xe/0x10 [] led_blink_set+0x1d/0x39 [] tpt_trig_timer+0xec/0x11b [] ieee80211_mod_tpt_led_trig+0x103/0x130 [] __ieee80211_recalc_idle+0xcf/0x122 [] ieee80211_idle_off+0xe/0x10 [] ieee80211_add_chanctx+0x65/0x110 [] ieee80211_new_chanctx+0x6c/0xcb [] ieee80211_vif_use_channel+0x15d/0x21e [] ieee80211_prep_connection+0x388/0xbe9 [] ieee80211_mgd_auth+0x1db/0x266 [] ? cfg80211_get_bss+0x196/0x1b2 [] ieee80211_auth+0x13/0x15 [] cfg80211_mlme_auth+0x123/0x171 [] nl80211_authenticate+0x26f/0x2ad [] genl_family_rcv_msg+0x1f3/0x254 [] genl_rcv_msg+0x3b/0x5c [] ? genl_family_rcv_msg+0x254/0x254 [] ? genl_family_rcv_msg+0x254/0x254 [] netlink_rcv_skb+0x3c/0x88 [] genl_rcv+0x23/0x32 [] netlink_unicast+0xf4/0x19c [] netlink_sendmsg+0x325/0x37b [] sock_sendmsg+0x69/0x7a [] ? might_fault+0x9c/0xa1 [] ? might_fault+0x53/0xa1 [] ? verify_iovec+0x64/0xb6 [] ___sys_sendmsg+0x1f4/0x272 [] ? debug_smp_processor_id+0x17/0x19 [] ? __fget_light+0xb3/0xda [] __sys_sendmsg+0x3d/0x5e [] SyS_sendmsg+0xd/0x19 [] system_call_fastpath+0x16/0x1b wlp3s0: send auth to c0:3f:0e:ad:ff:ee (try 1/3) wlp3s0: authenticated wlp3s0: associate with c0:3f:0e:ad:ff:ee (try 1/3) wlp3s0: RX AssocResp from c0:3f:0e:ad:ff:ee (capab=0x411 status=0 aid=4) wlp3s0: associated IPv6: ADDRCONF(NETDEV_CHANGE): wlp3s0: link becomes ready ================================= [ INFO: inconsistent lock state ] 3.17.0-rc1 #2 Not tainted --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/3/0 [HC0[0]:SC1[1]:HE1:SE0] takes: ((&(&led_cdev->blink_work)->work)){+.?...}, at: [] flush_work+0x0/0x213 {SOFTIRQ-ON-W} state was registered at: [] __lock_acquire+0x616/0x17e8 [] lock_acquire+0x61/0x78 [] flush_work+0x38/0x213 [] __cancel_work_timer+0xa9/0xec [] cancel_delayed_work_sync+0xe/0x10 [] led_blink_set+0x1d/0x39 [] tpt_trig_timer+0xec/0x11b [] ieee80211_mod_tpt_led_trig+0x103/0x130 [] __ieee80211_recalc_idle+0xcf/0x122 [] ieee80211_idle_off+0xe/0x10 [] ieee80211_add_chanctx+0x65/0x110 [] ieee80211_new_chanctx+0x6c/0xcb [] ieee80211_vif_use_channel+0x15d/0x21e [] ieee80211_prep_connection+0x388/0xbe9 [] ieee80211_mgd_auth+0x1db/0x266 [] ieee80211_auth+0x13/0x15 [] cfg80211_mlme_auth+0x123/0x171 [] nl80211_authenticate+0x26f/0x2ad [] genl_family_rcv_msg+0x1f3/0x254 [] genl_rcv_msg+0x3b/0x5c [] netlink_rcv_skb+0x3c/0x88 [] genl_rcv+0x23/0x32 [] netlink_unicast+0xf4/0x19c [] netlink_sendmsg+0x325/0x37b [] sock_sendmsg+0x69/0x7a [] ___sys_sendmsg+0x1f4/0x272 [] __sys_sendmsg+0x3d/0x5e [] SyS_sendmsg+0xd/0x19 [] system_call_fastpath+0x16/0x1b irq event stamp: 45440 hardirqs last enabled at (45440): [] __cancel_work_timer+0x9d/0xec hardirqs last disabled at (45439): [] try_to_grab_pending+0x21/0x14d softirqs last enabled at (45432): [] _local_bh_enable+0x3e/0x40 softirqs last disabled at (45433): [] irq_exit+0x3d/0x92 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock((&(&led_cdev->blink_work)->work)); lock((&(&led_cdev->blink_work)->work)); *** DEADLOCK *** 2 locks held by swapper/3/0: #0: (((&tpt_trig->timer))){+.-...}, at: [] call_timer_fn+0x0/0xd4 #1: (&trig->leddev_list_lock){.+.?..}, at: [] tpt_trig_timer+0xd0/0x11b stack backtrace: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.17.0-rc1 #2 Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011 0000000000000000 ffff88023e383b98 ffffffff815b4eb1 ffff8802339ac310 ffff88023e383be8 ffffffff815b1351 0000000000000001 0000000000000001 ffff880200000000 ffff8802339acb28 0000000000000004 0000000000000006 Call Trace: [] dump_stack+0x4e/0x7a [] print_usage_bug+0x2ac/0x2bd [] ? print_irq_inversion_bug+0x1cc/0x1cc [] mark_lock+0x348/0x58e [] __lock_acquire+0x59f/0x17e8 [] ? check_usage_forwards+0xab/0xe7 [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] lock_acquire+0x61/0x78 [] ? mod_delayed_work_on+0x75/0x75 [] flush_work+0x38/0x213 [] ? mod_delayed_work_on+0x75/0x75 [] ? __lock_acquire+0x10ec/0x17e8 [] ? mark_lock+0x253/0x58e [] ? mark_held_locks+0x50/0x6e [] ? __cancel_work_timer+0x9d/0xec [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] ? trace_hardirqs_on_caller+0x18f/0x19e [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] __cancel_work_timer+0xa9/0xec [] cancel_delayed_work_sync+0xe/0x10 [] led_blink_set+0x1d/0x39 [] tpt_trig_timer+0xec/0x11b [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] call_timer_fn+0x67/0xd4 [] ? process_timeout+0xb/0xb [] run_timer_softirq+0x1aa/0x1f2 [] __do_softirq+0xfc/0x21f [] irq_exit+0x3d/0x92 [] smp_apic_timer_interrupt+0x3f/0x4b [] apic_timer_interrupt+0x6c/0x80 [] ? cpuidle_enter_state+0x44/0xa0 [] ? cpuidle_enter_state+0x50/0xa0 [] cpuidle_enter+0x12/0x14 [] cpu_startup_entry+0x183/0x23f [] start_secondary+0x1b0/0x1b5 BUG: sleeping function called from invalid context at kernel/workqueue.c:2650 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/3 INFO: lockdep is turned off. Preemption disabled at:[] cpu_startup_entry+0x22f/0x23f CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.17.0-rc1 #2 Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011 0000000000000000 ffff88023e383cf8 ffffffff815b4eb1 0000000000000000 ffff88023e383d20 ffffffff810a2f46 ffff8800b34051b0 ffff8800b34051d0 0000000ffffffff1 ffff88023e383de8 ffffffff8109966f ffffffff81099610 Call Trace: [] dump_stack+0x4e/0x7a [] __might_sleep+0x1fa/0x201 [] flush_work+0x5f/0x213 [] ? mod_delayed_work_on+0x75/0x75 [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] ? try_to_grab_pending+0x3c/0x14d [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] __cancel_work_timer+0xa9/0xec [] cancel_delayed_work_sync+0xe/0x10 [] led_blink_set+0x1d/0x39 [] tpt_trig_timer+0xec/0x11b [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] call_timer_fn+0x67/0xd4 [] ? process_timeout+0xb/0xb [] run_timer_softirq+0x1aa/0x1f2 [] __do_softirq+0xfc/0x21f [] irq_exit+0x3d/0x92 [] smp_apic_timer_interrupt+0x3f/0x4b [] apic_timer_interrupt+0x6c/0x80 [] ? cpuidle_enter_state+0x44/0xa0 [] ? cpuidle_enter_state+0x4c/0xa0 [] ? cpuidle_enter_state+0x50/0xa0 [] cpuidle_enter+0x12/0x14 [] cpu_startup_entry+0x183/0x23f [] start_secondary+0x1b0/0x1b5 BUG: sleeping function called from invalid context at kernel/workqueue.c:2650 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/3 INFO: lockdep is turned off. Preemption disabled at:[] cpu_startup_entry+0x22f/0x23f CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.17.0-rc1 #2 Hardware name: LENOVO 4174EH1/4174EH1, BIOS 8CET51WW (1.31 ) 11/29/2011 0000000000000000 ffff88023e383cf8 ffffffff815b4eb1 0000000000000000 ffff88023e383d20 ffffffff810a2f46 ffff8800b34051b0 ffff8800b34051d0 0000000ffffffff1 ffff88023e383de8 ffffffff8109966f ffffffff81099610 Call Trace: [] dump_stack+0x4e/0x7a [] __might_sleep+0x1fa/0x201 [] flush_work+0x5f/0x213 [] ? mod_delayed_work_on+0x75/0x75 [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] ? try_to_grab_pending+0x3c/0x14d [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] __cancel_work_timer+0xa9/0xec [] cancel_delayed_work_sync+0xe/0x10 [] led_blink_set+0x1d/0x39 [] tpt_trig_timer+0xec/0x11b [] ? __ieee80211_create_tpt_led_trigger+0xf3/0xf3 [] call_timer_fn+0x67/0xd4 [] ? process_timeout+0xb/0xb [] run_timer_softirq+0x1aa/0x1f2 [] __do_softirq+0xfc/0x21f [] irq_exit+0x3d/0x92 [] smp_apic_timer_interrupt+0x3f/0x4b [] apic_timer_interrupt+0x6c/0x80 [] ? cpuidle_enter_state+0x44/0xa0 [] ? cpuidle_enter_state+0x4c/0xa0 [] ? cpuidle_enter_state+0x50/0xa0 [] cpuidle_enter+0x12/0x14 [] cpu_startup_entry+0x183/0x23f [] start_secondary+0x1b0/0x1b5 and another such message every second.