2016-04-01 21:12:20

by Ben Greear

[permalink] [raw]
Subject: [PATCH v2 1/5] ath10k: Ensure txrx-compl-task is stopped when cleaning htt-tx.

From: Ben Greear <[email protected]>

Otherwise, the txrx-compl-task may access some bad memory?

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath10k/htt_tx.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 07b960e..58e88d3 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -376,6 +376,8 @@ void ath10k_htt_tx_free(struct ath10k_htt *htt)
{
int size;

+ tasklet_kill(&htt->txrx_compl_task);
+
idr_for_each(&htt->pending_tx, ath10k_htt_tx_clean_up_pending, htt->ar);
idr_destroy(&htt->pending_tx);

--
2.4.3



2016-04-01 21:12:20

by Ben Greear

[permalink] [raw]
Subject: [PATCH v2 2/5] ath10k: Ensure peer_map references are cleaned up.

From: Ben Greear <[email protected]>

While debugging OS crashes due to firmware crashes, I enabled
kasan, and it noticed that peer objects were being used-after-freed.

Looks like there are two places we could be leaving stale references
in the peer-map, so clean that up.

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath10k/mac.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 8783119..5e5cc9c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -794,6 +794,7 @@ static void ath10k_peer_cleanup(struct ath10k *ar, u32 vdev_id)
{
struct ath10k_peer *peer, *tmp;
int peer_id;
+ int i;

lockdep_assert_held(&ar->conf_mutex);

@@ -812,6 +813,17 @@ static void ath10k_peer_cleanup(struct ath10k *ar, u32 vdev_id)
ar->peer_map[peer_id] = NULL;
}

+ /* Double check that peer is properly un-referenced from
+ * the peer_map
+ */
+ for (i = 0; i < ARRAY_SIZE(ar->peer_map); i++) {
+ if (ar->peer_map[i] == peer) {
+ ath10k_warn(ar, "removing stale peer_map entry for %pM (ptr %p idx %d)\n",
+ peer->addr, peer, i);
+ ar->peer_map[i] = NULL;
+ }
+ }
+
list_del(&peer->list);
kfree(peer);
ar->num_peers--;
@@ -840,6 +852,7 @@ void ath10k_dump_peer_info(struct ath10k *ar)
static void ath10k_peer_cleanup_all(struct ath10k *ar)
{
struct ath10k_peer *peer, *tmp;
+ int i;

lockdep_assert_held(&ar->conf_mutex);

@@ -850,6 +863,10 @@ static void ath10k_peer_cleanup_all(struct ath10k *ar)
list_del(&peer->list);
kfree(peer);
}
+
+ for (i = 0; i < ARRAY_SIZE(ar->peer_map); i++)
+ ar->peer_map[i] = NULL;
+
spin_unlock_bh(&ar->data_lock);

ar->num_peers = 0;
--
2.4.3


2016-04-01 21:12:20

by Ben Greear

[permalink] [raw]
Subject: [PATCH v2 3/5] ath10k: Add WARN_ON if we over-write peer-map pointer.

From: Ben Greear <[email protected]>

Not sure this can happen, but seems like a reasonable sanity
check.

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath10k/txrx.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index a779a4e..2edef8a 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -309,6 +309,7 @@ void ath10k_peer_map_event(struct ath10k_htt *htt,
ath10k_warn(ar, /*ATH10K_DBG_HTT,*/ "htt peer map vdev %d peer %pM id %d\n",
ev->vdev_id, ev->addr, ev->peer_id);

+ WARN_ON(ar->peer_map[ev->peer_id] && (ar->peer_map[ev->peer_id] != peer));
ar->peer_map[ev->peer_id] = peer;
set_bit(ev->peer_id, peer->peer_ids);
exit:
--
2.4.3


2016-04-01 21:12:20

by Ben Greear

[permalink] [raw]
Subject: [PATCH v2 4/5] ath10k: Clean up peer when sta goes away.

From: Ben Greear <[email protected]>

If WMI and/or firmware has issues removing the peer object,
then we still need to clean up the peer object in the driver.

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath10k/mac.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 5e5cc9c..020dd25 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -6501,9 +6501,17 @@ static int ath10k_sta_state(struct ieee80211_hw *hw,
continue;

if (peer->sta == sta) {
- ath10k_warn(ar, "found sta peer %pM id: %d entry on vdev %i after it was supposedly removed\n",
- sta->addr, i, arvif->vdev_id);
+ ath10k_warn(ar, "found sta peer %pM (ptr %p id %d) entry on vdev %i after it was supposedly removed\n",
+ sta->addr, peer, i, arvif->vdev_id);
peer->sta = NULL;
+
+ /* Clean up the peer object as well since we
+ * must have failed to do this above.
+ */
+ list_del(&peer->list);
+ ar->peer_map[i] = NULL;
+ kfree(peer);
+ ar->num_peers--;
}
}
spin_unlock_bh(&ar->data_lock);
--
2.4.3


2016-04-01 21:12:21

by Ben Greear

[permalink] [raw]
Subject: [PATCH v2 5/5] ath10k: Fix deadlock when peer cannot be created.

From: Ben Greear <[email protected]>

We must not attempt to send WMI packets while holding the data-lock,
as it may deadlock:

BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant

=============================================
[ INFO: possible recursive locking detected ]
4.4.6+ #21 Tainted: G W O
---------------------------------------------
wpa_supplicant/2878 is trying to acquire lock:
(&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]

but task is already holding lock:
(&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(&ar->data_lock)->rlock);
lock(&(&ar->data_lock)->rlock);

*** DEADLOCK ***

May be due to missing lock nesting notation

4 locks held by wpa_supplicant/2878:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816493ca>] rtnl_lock+0x12/0x14
#1: (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0706932>] ath10k_add_interface+0x3b/0xbda [ath10k_core]
#2: (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
#3: (rcu_read_lock){......}, at: [<ffffffffa062f304>] rcu_read_lock+0x0/0x66 [mac80211]

stack backtrace:
CPU: 3 PID: 2878 Comm: wpa_supplicant Tainted: G W O 4.4.6+ #21
Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
0000000000000000 ffff8801fcadf8f0 ffffffff8137086d ffffffff82681720
ffffffff82681720 ffff8801fcadf9b0 ffffffff8112e3be ffff8801fcadf920
0000000100000000 ffffffff82681720 ffffffffa0721500 ffff8801fcb8d348
Call Trace:
[<ffffffff8137086d>] dump_stack+0x81/0xb6
[<ffffffff8112e3be>] __lock_acquire+0xc5b/0xde7
[<ffffffffa0721500>] ? ath10k_wmi_tx_beacons_iter+0x15/0x11a [ath10k_core]
[<ffffffff8112d0d0>] ? mark_lock+0x24/0x201
[<ffffffff8112e908>] lock_acquire+0x132/0x1cb
[<ffffffff8112e908>] ? lock_acquire+0x132/0x1cb
[<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
[<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
[<ffffffff816f9e2b>] _raw_spin_lock_bh+0x31/0x40
[<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
[<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
[<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
[<ffffffffa062eb18>] __iterate_interfaces+0x9d/0x13d [mac80211]
[<ffffffffa062f609>] ieee80211_iterate_active_interfaces_atomic+0x32/0x3e [mac80211]
[<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
[<ffffffffa071fa9f>] ath10k_wmi_tx_beacons_nowait.isra.13+0x14/0x16 [ath10k_core]
[<ffffffffa0721676>] ath10k_wmi_cmd_send+0x71/0x242 [ath10k_core]
[<ffffffffa07023f6>] ath10k_wmi_peer_delete+0x3f/0x42 [ath10k_core]
[<ffffffffa0702557>] ath10k_peer_create+0x15e/0x1ae [ath10k_core]
[<ffffffffa0707004>] ath10k_add_interface+0x70d/0xbda [ath10k_core]
[<ffffffffa05fffcc>] drv_add_interface+0x123/0x1a5 [mac80211]
[<ffffffffa061554b>] ieee80211_do_open+0x351/0x667 [mac80211]
[<ffffffffa06158aa>] ieee80211_open+0x49/0x4c [mac80211]
[<ffffffff8163ecf9>] __dev_open+0x88/0xde
[<ffffffff8163ef6e>] __dev_change_flags+0xa4/0x13a
[<ffffffff8163f023>] dev_change_flags+0x1f/0x54
[<ffffffff816a5532>] devinet_ioctl+0x2b9/0x5c9
[<ffffffff816514dd>] ? copy_to_user+0x32/0x38
[<ffffffff816a6115>] inet_ioctl+0x81/0x9d
[<ffffffff816a6115>] ? inet_ioctl+0x81/0x9d
[<ffffffff81621cf8>] sock_do_ioctl+0x20/0x3d
[<ffffffff816223c4>] sock_ioctl+0x222/0x22e
[<ffffffff8121cf95>] do_vfs_ioctl+0x453/0x4d7
[<ffffffff81625603>] ? __sys_recvmsg+0x4c/0x5b
[<ffffffff81225af1>] ? __fget_light+0x48/0x6c
[<ffffffff8121d06b>] SyS_ioctl+0x52/0x74
[<ffffffff816fa736>] entry_SYSCALL_64_fastpath+0x16/0x7a

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath10k/mac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 020dd25..be8345c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -700,10 +700,10 @@ static int ath10k_peer_create(struct ath10k *ar,

peer = ath10k_peer_find(ar, vdev_id, addr);
if (!peer) {
+ spin_unlock_bh(&ar->data_lock);
ath10k_warn(ar, "failed to find peer %pM on vdev %i after creation\n",
addr, vdev_id);
ath10k_wmi_peer_delete(ar, vdev_id, addr);
- spin_unlock_bh(&ar->data_lock);
return -ENOENT;
}

--
2.4.3


2016-05-10 16:38:36

by Mohammed Shafi Shajakhan

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] ath10k: Add WARN_ON if we over-write peer-map pointer.

On Tue, May 10, 2016 at 07:41:44AM -0700, Ben Greear wrote:
>
>
> On 05/10/2016 12:12 AM, Mohammed Shafi Shajakhan wrote:
> >Hi Ben,
> >
> >On Fri, Apr 01, 2016 at 02:12:10PM -0700, [email protected] wrote:
> >>From: Ben Greear <[email protected]>
> >>
> >>Not sure this can happen, but seems like a reasonable sanity
> >>check.
> >
> >[shafi] possibly if the peer is removed and the bit is not cleared in the driver
> >? when the new peer occupies the slot
>
> A firmware bug could cause this, but I never saw this WARN hit. Still seems like
> a good sanity check to me, though...it would be quite hard to debug this sort of bug
> if it actually did happen.

[shafi] yeah its good to have, in one of the case i saw a duplicate event
(though its harmless )

>
> >
> >>
> >>Signed-off-by: Ben Greear <[email protected]>
> >>---
> >> drivers/net/wireless/ath/ath10k/txrx.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >>diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
> >>index a779a4e..2edef8a 100644
> >>--- a/drivers/net/wireless/ath/ath10k/txrx.c
> >>+++ b/drivers/net/wireless/ath/ath10k/txrx.c
> >>@@ -309,6 +309,7 @@ void ath10k_peer_map_event(struct ath10k_htt *htt,
> >> ath10k_warn(ar, /*ATH10K_DBG_HTT,*/ "htt peer map vdev %d peer %pM id %d\n",
> >> ev->vdev_id, ev->addr, ev->peer_id);
> >>
> >>+ WARN_ON(ar->peer_map[ev->peer_id] && (ar->peer_map[ev->peer_id] != peer));
> >> ar->peer_map[ev->peer_id] = peer;
> >> set_bit(ev->peer_id, peer->peer_ids);
> >> exit:
> >>--
> >>2.4.3
> >
> >regards,
> >shafi
> >
> >>
> >>
> >>_______________________________________________
> >>ath10k mailing list
> >>[email protected]
> >>http://lists.infradead.org/mailman/listinfo/ath10k
> >
> >_______________________________________________
> >ath10k mailing list
> >[email protected]
> >http://lists.infradead.org/mailman/listinfo/ath10k
> >
>
> --
> Ben Greear <[email protected]>
> Candela Technologies Inc http://www.candelatech.com
>
> _______________________________________________
> ath10k mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/ath10k

2016-05-10 06:48:10

by Mohammed Shafi Shajakhan

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] ath10k: Ensure txrx-compl-task is stopped when cleaning htt-tx.

Hi Ben,

On Fri, Apr 01, 2016 at 02:12:08PM -0700, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> Otherwise, the txrx-compl-task may access some bad memory?

good to mention when this happens, will be helpful ifsome one recreates
the issue and matches your call trace

>
> Signed-off-by: Ben Greear <[email protected]>
> ---
> drivers/net/wireless/ath/ath10k/htt_tx.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
> index 07b960e..58e88d3 100644
> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
> @@ -376,6 +376,8 @@ void ath10k_htt_tx_free(struct ath10k_htt *htt)
> {
> int size;
>
> + tasklet_kill(&htt->txrx_compl_task);
> +
> idr_for_each(&htt->pending_tx, ath10k_htt_tx_clean_up_pending, htt->ar);
> idr_destroy(&htt->pending_tx);
>
> --
> 2.4.3

regards,
shafi

>
>
> _______________________________________________
> ath10k mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/ath10k

2016-05-09 17:58:33

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ath10k: Fix deadlock when peer cannot be created.

On 05/09/2016 10:54 AM, Manoharan, Rajkumar wrote:
>> On Monday, May 9, 2016 10:49 PM, [email protected] wrote:
>>> On 04/01/2016 02:12 PM, [email protected] wrote:
>>> From: Ben Greear <[email protected]>
>>>
>>> We must not attempt to send WMI packets while holding the data-lock,
>>> as it may deadlock:
>>>
>>> BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
>>> in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant
>>>
>> Kalle: I notice these 5 patches are not in the latest wireless-testing.
>>
>> Are they not acceptable, or???
>>
> Aah!.. I recently cooked up similar patch for BUG_ON issue. I think this one is stable candidate. no?

All 5 have a possibility of being worth stable I think, but at least they should probably
go into upstream!

They were not a lot of fun to find or fix, so it would be nice if no one else had to waste
time on it.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2016-05-09 17:54:57

by Rajkumar Manoharan

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ath10k: Fix deadlock when peer cannot be created.

> On Monday, May 9, 2016 10:49 PM, [email protected] wrote:
>> On 04/01/2016 02:12 PM, [email protected] wrote:
>> From: Ben Greear <[email protected]>
>>
>> We must not attempt to send WMI packets while holding the data-lock,
>> as it may deadlock:
>>
>> BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
>> in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant
>>
> Kalle: I notice these 5 patches are not in the latest wireless-testing.
>
> Are they not acceptable, or???
>
Aah!.. I recently cooked up similar patch for BUG_ON issue. I think this one is stable candidate. no?

-Rajkumar

2016-05-10 14:40:01

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] ath10k: Ensure txrx-compl-task is stopped when cleaning htt-tx.



On 05/09/2016 11:48 PM, Mohammed Shafi Shajakhan wrote:
> Hi Ben,
>
> On Fri, Apr 01, 2016 at 02:12:08PM -0700, [email protected] wrote:
>> From: Ben Greear <[email protected]>
>>
>> Otherwise, the txrx-compl-task may access some bad memory?
>
> good to mention when this happens, will be helpful ifsome one recreates
> the issue and matches your call trace

The backtraces were all over the place because the driver was writing to
memory after it was freed, so I am not sure they are worth while. I'm sure I posted
some to the mailing lists around the time where I was working on this, but since
I was fighting several different problems, hard to know exactly what was what.

Thanks,
Ben

>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>> drivers/net/wireless/ath/ath10k/htt_tx.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
>> index 07b960e..58e88d3 100644
>> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
>> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
>> @@ -376,6 +376,8 @@ void ath10k_htt_tx_free(struct ath10k_htt *htt)
>> {
>> int size;
>>
>> + tasklet_kill(&htt->txrx_compl_task);
>> +
>> idr_for_each(&htt->pending_tx, ath10k_htt_tx_clean_up_pending, htt->ar);
>> idr_destroy(&htt->pending_tx);
>>
>> --
>> 2.4.3
>
> regards,
> shafi
>
>>
>>
>> _______________________________________________
>> ath10k mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/ath10k
>
> _______________________________________________
> ath10k mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/ath10k
>

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2016-05-10 07:12:25

by Mohammed Shafi Shajakhan

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] ath10k: Add WARN_ON if we over-write peer-map pointer.

Hi Ben,

On Fri, Apr 01, 2016 at 02:12:10PM -0700, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> Not sure this can happen, but seems like a reasonable sanity
> check.

[shafi] possibly if the peer is removed and the bit is not cleared in the driver
? when the new peer occupies the slot

>
> Signed-off-by: Ben Greear <[email protected]>
> ---
> drivers/net/wireless/ath/ath10k/txrx.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
> index a779a4e..2edef8a 100644
> --- a/drivers/net/wireless/ath/ath10k/txrx.c
> +++ b/drivers/net/wireless/ath/ath10k/txrx.c
> @@ -309,6 +309,7 @@ void ath10k_peer_map_event(struct ath10k_htt *htt,
> ath10k_warn(ar, /*ATH10K_DBG_HTT,*/ "htt peer map vdev %d peer %pM id %d\n",
> ev->vdev_id, ev->addr, ev->peer_id);
>
> + WARN_ON(ar->peer_map[ev->peer_id] && (ar->peer_map[ev->peer_id] != peer));
> ar->peer_map[ev->peer_id] = peer;
> set_bit(ev->peer_id, peer->peer_ids);
> exit:
> --
> 2.4.3

regards,
shafi

>
>
> _______________________________________________
> ath10k mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/ath10k

2016-05-10 14:41:46

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] ath10k: Add WARN_ON if we over-write peer-map pointer.



On 05/10/2016 12:12 AM, Mohammed Shafi Shajakhan wrote:
> Hi Ben,
>
> On Fri, Apr 01, 2016 at 02:12:10PM -0700, [email protected] wrote:
>> From: Ben Greear <[email protected]>
>>
>> Not sure this can happen, but seems like a reasonable sanity
>> check.
>
> [shafi] possibly if the peer is removed and the bit is not cleared in the driver
> ? when the new peer occupies the slot

A firmware bug could cause this, but I never saw this WARN hit. Still seems like
a good sanity check to me, though...it would be quite hard to debug this sort of bug
if it actually did happen.

Thanks,
Ben

>
>>
>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>> drivers/net/wireless/ath/ath10k/txrx.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
>> index a779a4e..2edef8a 100644
>> --- a/drivers/net/wireless/ath/ath10k/txrx.c
>> +++ b/drivers/net/wireless/ath/ath10k/txrx.c
>> @@ -309,6 +309,7 @@ void ath10k_peer_map_event(struct ath10k_htt *htt,
>> ath10k_warn(ar, /*ATH10K_DBG_HTT,*/ "htt peer map vdev %d peer %pM id %d\n",
>> ev->vdev_id, ev->addr, ev->peer_id);
>>
>> + WARN_ON(ar->peer_map[ev->peer_id] && (ar->peer_map[ev->peer_id] != peer));
>> ar->peer_map[ev->peer_id] = peer;
>> set_bit(ev->peer_id, peer->peer_ids);
>> exit:
>> --
>> 2.4.3
>
> regards,
> shafi
>
>>
>>
>> _______________________________________________
>> ath10k mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/ath10k
>
> _______________________________________________
> ath10k mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/ath10k
>

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2016-05-09 17:19:21

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ath10k: Fix deadlock when peer cannot be created.

Kalle: I notice these 5 patches are not in the latest wireless-testing.

Are they not acceptable, or???

Thanks,
Ben

On 04/01/2016 02:12 PM, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> We must not attempt to send WMI packets while holding the data-lock,
> as it may deadlock:
>
> BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
> in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 4.4.6+ #21 Tainted: G W O
> ---------------------------------------------
> wpa_supplicant/2878 is trying to acquire lock:
> (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
>
> but task is already holding lock:
> (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&ar->data_lock)->rlock);
> lock(&(&ar->data_lock)->rlock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 4 locks held by wpa_supplicant/2878:
> #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816493ca>] rtnl_lock+0x12/0x14
> #1: (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0706932>] ath10k_add_interface+0x3b/0xbda [ath10k_core]
> #2: (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
> #3: (rcu_read_lock){......}, at: [<ffffffffa062f304>] rcu_read_lock+0x0/0x66 [mac80211]
>
> stack backtrace:
> CPU: 3 PID: 2878 Comm: wpa_supplicant Tainted: G W O 4.4.6+ #21
> Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
> 0000000000000000 ffff8801fcadf8f0 ffffffff8137086d ffffffff82681720
> ffffffff82681720 ffff8801fcadf9b0 ffffffff8112e3be ffff8801fcadf920
> 0000000100000000 ffffffff82681720 ffffffffa0721500 ffff8801fcb8d348
> Call Trace:
> [<ffffffff8137086d>] dump_stack+0x81/0xb6
> [<ffffffff8112e3be>] __lock_acquire+0xc5b/0xde7
> [<ffffffffa0721500>] ? ath10k_wmi_tx_beacons_iter+0x15/0x11a [ath10k_core]
> [<ffffffff8112d0d0>] ? mark_lock+0x24/0x201
> [<ffffffff8112e908>] lock_acquire+0x132/0x1cb
> [<ffffffff8112e908>] ? lock_acquire+0x132/0x1cb
> [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
> [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
> [<ffffffff816f9e2b>] _raw_spin_lock_bh+0x31/0x40
> [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
> [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
> [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
> [<ffffffffa062eb18>] __iterate_interfaces+0x9d/0x13d [mac80211]
> [<ffffffffa062f609>] ieee80211_iterate_active_interfaces_atomic+0x32/0x3e [mac80211]
> [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
> [<ffffffffa071fa9f>] ath10k_wmi_tx_beacons_nowait.isra.13+0x14/0x16 [ath10k_core]
> [<ffffffffa0721676>] ath10k_wmi_cmd_send+0x71/0x242 [ath10k_core]
> [<ffffffffa07023f6>] ath10k_wmi_peer_delete+0x3f/0x42 [ath10k_core]
> [<ffffffffa0702557>] ath10k_peer_create+0x15e/0x1ae [ath10k_core]
> [<ffffffffa0707004>] ath10k_add_interface+0x70d/0xbda [ath10k_core]
> [<ffffffffa05fffcc>] drv_add_interface+0x123/0x1a5 [mac80211]
> [<ffffffffa061554b>] ieee80211_do_open+0x351/0x667 [mac80211]
> [<ffffffffa06158aa>] ieee80211_open+0x49/0x4c [mac80211]
> [<ffffffff8163ecf9>] __dev_open+0x88/0xde
> [<ffffffff8163ef6e>] __dev_change_flags+0xa4/0x13a
> [<ffffffff8163f023>] dev_change_flags+0x1f/0x54
> [<ffffffff816a5532>] devinet_ioctl+0x2b9/0x5c9
> [<ffffffff816514dd>] ? copy_to_user+0x32/0x38
> [<ffffffff816a6115>] inet_ioctl+0x81/0x9d
> [<ffffffff816a6115>] ? inet_ioctl+0x81/0x9d
> [<ffffffff81621cf8>] sock_do_ioctl+0x20/0x3d
> [<ffffffff816223c4>] sock_ioctl+0x222/0x22e
> [<ffffffff8121cf95>] do_vfs_ioctl+0x453/0x4d7
> [<ffffffff81625603>] ? __sys_recvmsg+0x4c/0x5b
> [<ffffffff81225af1>] ? __fget_light+0x48/0x6c
> [<ffffffff8121d06b>] SyS_ioctl+0x52/0x74
> [<ffffffff816fa736>] entry_SYSCALL_64_fastpath+0x16/0x7a
>
> Signed-off-by: Ben Greear <[email protected]>
> ---
> drivers/net/wireless/ath/ath10k/mac.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 020dd25..be8345c 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -700,10 +700,10 @@ static int ath10k_peer_create(struct ath10k *ar,
>
> peer = ath10k_peer_find(ar, vdev_id, addr);
> if (!peer) {
> + spin_unlock_bh(&ar->data_lock);
> ath10k_warn(ar, "failed to find peer %pM on vdev %i after creation\n",
> addr, vdev_id);
> ath10k_wmi_peer_delete(ar, vdev_id, addr);
> - spin_unlock_bh(&ar->data_lock);
> return -ENOENT;
> }
>
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2016-05-13 14:07:40

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] ath10k: Fix deadlock when peer cannot be created.

Ben Greear <[email protected]> writes:

> Kalle: I notice these 5 patches are not in the latest wireless-testing.
>
> Are they not acceptable, or???

These five are in deferred state:

https://patchwork.kernel.org/patch/8727841/

The deferred state means that I postponed them for some reason (for thse
I just can't remember right now why) and will take a look at them later.

--
Kalle Valo

2016-05-10 16:36:47

by Mohammed Shafi Shajakhan

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] ath10k: Ensure txrx-compl-task is stopped when cleaning htt-tx.

> >
> >On Fri, Apr 01, 2016 at 02:12:08PM -0700, [email protected] wrote:
> >>From: Ben Greear <[email protected]>
> >>
> >>Otherwise, the txrx-compl-task may access some bad memory?
> >
> >good to mention when this happens, will be helpful ifsome one recreates
> >the issue and matches your call trace
>
> The backtraces were all over the place because the driver was writing to
> memory after it was freed, so I am not sure they are worth while. I'm sure I posted
> some to the mailing lists around the time where I was working on this, but since
> I was fighting several different problems, hard to know exactly what was what.

[shafi] sure Ben, lets wait for the comments from ath10k reviewers.

>
> Thanks,
> Ben
>
> >>Signed-off-by: Ben Greear <[email protected]>
> >>---
> >> drivers/net/wireless/ath/ath10k/htt_tx.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >>diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
> >>index 07b960e..58e88d3 100644
> >>--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
> >>+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
> >>@@ -376,6 +376,8 @@ void ath10k_htt_tx_free(struct ath10k_htt *htt)
> >> {
> >> int size;
> >>
> >>+ tasklet_kill(&htt->txrx_compl_task);
> >>+
> >> idr_for_each(&htt->pending_tx, ath10k_htt_tx_clean_up_pending, htt->ar);
> >> idr_destroy(&htt->pending_tx);
> >>
> >>--
> >>2.4.3
> >
> >regards,
> >shafi
> >
> >>
> >>
> >>_______________________________________________
> >>ath10k mailing list
> >>[email protected]
> >>http://lists.infradead.org/mailman/listinfo/ath10k
> >
> >_______________________________________________
> >ath10k mailing list
> >[email protected]
> >http://lists.infradead.org/mailman/listinfo/ath10k
> >
>
> --
> Ben Greear <[email protected]>
> Candela Technologies Inc http://www.candelatech.com

2016-06-06 17:24:52

by Kalle Valo

[permalink] [raw]
Subject: Re: [v2,5/5] ath10k: Fix deadlock when peer cannot be created.

Ben Greear <[email protected]> wrote:
> From: Ben Greear <[email protected]>
>
> We must not attempt to send WMI packets while holding the data-lock,
> as it may deadlock:
>
> BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
> in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 4.4.6+ #21 Tainted: G W O
> ---------------------------------------------
> wpa_supplicant/2878 is trying to acquire lock:
> (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
>
> but task is already holding lock:
> (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&ar->data_lock)->rlock);
> lock(&(&ar->data_lock)->rlock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 4 locks held by wpa_supplicant/2878:
> #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816493ca>] rtnl_lock+0x12/0x14
> #1: (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0706932>] ath10k_add_interface+0x3b/0xbda [ath10k_core]
> #2: (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
> #3: (rcu_read_lock){......}, at: [<ffffffffa062f304>] rcu_read_lock+0x0/0x66 [mac80211]
>
> stack backtrace:
> CPU: 3 PID: 2878 Comm: wpa_supplicant Tainted: G W O 4.4.6+ #21
> Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
> 0000000000000000 ffff8801fcadf8f0 ffffffff8137086d ffffffff82681720
> ffffffff82681720 ffff8801fcadf9b0 ffffffff8112e3be ffff8801fcadf920
> 0000000100000000 ffffffff82681720 ffffffffa0721500 ffff8801fcb8d348
> Call Trace:
> [<ffffffff8137086d>] dump_stack+0x81/0xb6
> [<ffffffff8112e3be>] __lock_acquire+0xc5b/0xde7
> [<ffffffffa0721500>] ? ath10k_wmi_tx_beacons_iter+0x15/0x11a [ath10k_core]
> [<ffffffff8112d0d0>] ? mark_lock+0x24/0x201
> [<ffffffff8112e908>] lock_acquire+0x132/0x1cb
> [<ffffffff8112e908>] ? lock_acquire+0x132/0x1cb
> [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
> [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
> [<ffffffff816f9e2b>] _raw_spin_lock_bh+0x31/0x40
> [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
> [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
> [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
> [<ffffffffa062eb18>] __iterate_interfaces+0x9d/0x13d [mac80211]
> [<ffffffffa062f609>] ieee80211_iterate_active_interfaces_atomic+0x32/0x3e [mac80211]
> [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
> [<ffffffffa071fa9f>] ath10k_wmi_tx_beacons_nowait.isra.13+0x14/0x16 [ath10k_core]
> [<ffffffffa0721676>] ath10k_wmi_cmd_send+0x71/0x242 [ath10k_core]
> [<ffffffffa07023f6>] ath10k_wmi_peer_delete+0x3f/0x42 [ath10k_core]
> [<ffffffffa0702557>] ath10k_peer_create+0x15e/0x1ae [ath10k_core]
> [<ffffffffa0707004>] ath10k_add_interface+0x70d/0xbda [ath10k_core]
> [<ffffffffa05fffcc>] drv_add_interface+0x123/0x1a5 [mac80211]
> [<ffffffffa061554b>] ieee80211_do_open+0x351/0x667 [mac80211]
> [<ffffffffa06158aa>] ieee80211_open+0x49/0x4c [mac80211]
> [<ffffffff8163ecf9>] __dev_open+0x88/0xde
> [<ffffffff8163ef6e>] __dev_change_flags+0xa4/0x13a
> [<ffffffff8163f023>] dev_change_flags+0x1f/0x54
> [<ffffffff816a5532>] devinet_ioctl+0x2b9/0x5c9
> [<ffffffff816514dd>] ? copy_to_user+0x32/0x38
> [<ffffffff816a6115>] inet_ioctl+0x81/0x9d
> [<ffffffff816a6115>] ? inet_ioctl+0x81/0x9d
> [<ffffffff81621cf8>] sock_do_ioctl+0x20/0x3d
> [<ffffffff816223c4>] sock_ioctl+0x222/0x22e
> [<ffffffff8121cf95>] do_vfs_ioctl+0x453/0x4d7
> [<ffffffff81625603>] ? __sys_recvmsg+0x4c/0x5b
> [<ffffffff81225af1>] ? __fget_light+0x48/0x6c
> [<ffffffff8121d06b>] SyS_ioctl+0x52/0x74
> [<ffffffff816fa736>] entry_SYSCALL_64_fastpath+0x16/0x7a
>
> Signed-off-by: Ben Greear <[email protected]>

Thanks, 1 patch applied to ath-current branch of ath.git:

fee48cf83745 ath10k: fix deadlock when peer cannot be created

--
Sent by pwcli
https://patchwork.kernel.org/patch/8727841/


2016-07-08 06:48:47

by Kalle Valo

[permalink] [raw]
Subject: Re: [v2,3/5] ath10k: Add WARN_ON if we over-write peer-map pointer.

Ben Greear <[email protected]> wrote:
> From: Ben Greear <[email protected]>
>
> Not sure this can happen, but seems like a reasonable sanity
> check.
>
> Signed-off-by: Ben Greear <[email protected]>

Thanks, 2 patches applied to ath-next branch of ath.git:

c5ace87a886d ath10k: Add WARN_ON if we over-write peer-map pointer.
d0eeafad1189 ath10k: Clean up peer when sta goes away.

--
Sent by pwcli
https://patchwork.kernel.org/patch/8727861/


2016-07-08 06:43:09

by Kalle Valo

[permalink] [raw]
Subject: Re: [v2, 1/5] ath10k: Ensure txrx-compl-task is stopped when cleaning htt-tx.

Ben Greear <[email protected]> wrote:
> From: Ben Greear <[email protected]>
>
> Otherwise, the txrx-compl-task may access some bad memory?
>
> Signed-off-by: Ben Greear <[email protected]>

Thanks, 2 patches applied to ath-next branch of ath.git:

de0170beaa88 ath10k: ensure txrx-compl-task is stopped when cleaning htt-tx
6d68f7900d25 ath10k: ensure peer_map references are cleaned up

--
Sent by pwcli
https://patchwork.kernel.org/patch/8727831/