Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S965260AbdCWPrI (ORCPT ); Thu, 23 Mar 2017 11:47:08 -0400
Received: from s3.sipsolutions.net ([5.9.151.49]:42718 "EHLO
        sipsolutions.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S964908AbdCWPrH (ORCPT );
        Thu, 23 Mar 2017 11:47:07 -0400
Message-ID: <1490284024.2766.12.camel@sipsolutions.net>
Subject: Re: deadlock in synchronize_srcu() in debugfs?
From: Johannes Berg
To: Nicolai Stange
Cc: linux-kernel, "Paul E.McKenney", gregkh
Date: Thu, 23 Mar 2017 16:47:04 +0100
In-Reply-To: <87o9ws6m4s.fsf@gmail.com> (sfid-20170323_163621_602585_CBD64B58)
References: <1490280886.2766.4.camel@sipsolutions.net>
         <87o9ws6m4s.fsf@gmail.com> (sfid-20170323_163621_602585_CBD64B58)
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.22.4-1
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4330
Lines: 85

Hi,

> Not yet. How reproducible is this?

Apparently quite. I haven't tried myself - it happens during some
automated test that I need to analyse further.

> > We're observing that with our (backported, but very recent) driver
> > against 4.9 (and 4.10, I think),
>
> Do I understand it correctly that this driver has been backported
> from 4.11-rcX to 4.9/10

Yes.

> and that there isn't any issue with 4.11-rcX?

No, I can't say this, we haven't run that test.

> > but there are no backports of any debugfs things so the backport
> > itself doesn't seem like a likely problem.
>
> Right, there haven't been any SRCU related changes to debugfs after
> 4.8.

Right.

> > sysrq-w shows a lot of tasks blocked on various locks (e.g. RTNL),
> > but the ultimate problem is the wireless stack getting blocked on
> > debugfs_remove_recursive(), in __synchronize_srcu(), in
> > wait_for_completion() (while holding lots of locks, hence the other
> > tasks getting stuck).
>
> Could you share a complete backtrace? For example, is the
> debugfs_remove_recursive() called from any debugfs file's fops and
> thus, possibly from within a SRCU read side critical section?

No, it's called from netlink:

[  884.634857] wpa_supplicant  D    0  1769   1005 0x00000000
[  884.634874]  0000000000000000 ffff8ca50633d140 ffff8ca507b219c0 ffff8ca5455d4cc0
[  884.634898]  ffff8ca54f599d98 ffff97df431c36a0 ffffffff878dadf3 ffff8ca500000001
[  884.634927]  81ed67337c8469e4 ffff8ca54f599d98 0000932a07b219c0 ffff8ca507b219c0
[  884.634952] Call Trace:
[  884.634969]  [] ? __schedule+0x303/0xb00
[  884.634985]  [] schedule+0x3d/0x90
[  884.635002]  [] schedule_timeout+0x2fc/0x600
[  884.635021]  [] ? mark_held_locks+0x66/0x90
[  884.635041]  [] ? _raw_spin_unlock_irq+0x2c/0x40
[  884.635059]  [] wait_for_completion+0xdc/0x110
[  884.635073]  [] ? wake_up_q+0x80/0x80
[  884.635091]  [] __synchronize_srcu+0x11e/0x1c0
[  884.635109]  [] ? trace_raw_output_rcu_utilization+0x60/0x60
[  884.635131]  [] synchronize_srcu+0x32/0x40
[  884.635145]  [] debugfs_remove_recursive+0x17d/0x190
[  884.635239]  [] ieee80211_debugfs_key_remove+0x1e/0x30 [mac80211]
[  884.635333]  [] __ieee80211_key_destroy+0x1b3/0x480 [mac80211]
[  884.635440]  [] ieee80211_free_sta_keys+0x117/0x170 [mac80211]
[  884.635524]  [] __sta_info_destroy_part2+0x4c/0x200 [mac80211]
[  884.635597]  [] __sta_info_flush+0x10d/0x1a0 [mac80211]
[  884.635706]  [] ieee80211_set_disassoc+0xcb/0x530 [mac80211]
[  884.635802]  [] ieee80211_mgd_deauth+0x2e6/0x7b0 [mac80211]
[  884.635901]  [] ieee80211_deauth+0x18/0x20 [mac80211]
[  884.636024]  [] cfg80211_mlme_deauth+0x14f/0x3b0 [cfg80211]
[  884.636110]  [] nl80211_deauthenticate+0xe5/0x130 [cfg80211]
[  884.636133]  [] genl_family_rcv_msg+0x1bc/0x370
[  884.636151]  [] ? genl_family_rcv_msg+0x370/0x370
[  884.636262]  [] genl_rcv_msg+0x80/0xc0
[  884.636275]  [] netlink_rcv_skb+0xa7/0xc0
[  884.636289]  [] genl_rcv+0x28/0x40
[  884.636303]  [] netlink_unicast+0x15b/0x210
[  884.636318]  [] netlink_sendmsg+0x31a/0x3a0
[  884.636335]  [] sock_sendmsg+0x38/0x50
[  884.636354]  [] ___sys_sendmsg+0x26c/0x280
[  884.636378]  [] ? ring_buffer_unlock_commit+0x32/0x290
[  884.636393]  [] ? __buffer_unlock_commit+0x1e/0x40
[  884.636407]  [] ? tracing_mark_write+0x162/0x2b0
[  884.636423]  [] ? __lock_is_held+0x49/0x70
[  884.636440]  [] __sys_sendmsg+0x45/0x80
[  884.636459]  [] SyS_sendmsg+0x12/0x20
[  884.636477]  [] entry_SYSCALL_64_fastpath+0x23/0xc6

johannes