Subject: ath10k: Fix soft lockup during firmware crash/hw-restart

From: Mohammed Shafi Shajakhan <[email protected]>

During firmware crash (or) user requested manual restart
the system gets into a soft lock up state because of the
below root cause.

During user requested hardware restart / firmware crash
the system goes into a soft lockup state as 'napi_synchronize'
is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
bit) and it sleeps into infinite loop as it waits for
'NAPI_STATE_SCHED' to be cleared. This condition is hit because
'ath10k_hif_stop' is called twice as below (resulting in calling
'napi_synchronize' after 'napi_disable')

'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
-> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)

Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
as it makes more sense before informing mac80211 to restart h/w
Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'

Signed-off-by: Mohammed Shafi Shajakhan <[email protected]>
---
[thanks to Kalle and Michal for their inputs]

drivers/net/wireless/ath/ath10k/core.c | 2 +-
drivers/net/wireless/ath/ath10k/mac.c | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 7005e2a..5bc6847 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
switch (ar->state) {
case ATH10K_STATE_ON:
ar->state = ATH10K_STATE_RESTARTING;
- ath10k_hif_stop(ar);
+ ath10k_halt(ar);
ath10k_scan_finish(ar);
ieee80211_restart_hw(ar->hw);
break;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 717b2fa..481842b 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
ar->state = ATH10K_STATE_ON;
break;
case ATH10K_STATE_RESTARTING:
- ath10k_halt(ar);
ar->state = ATH10K_STATE_RESTARTED;
break;
case ATH10K_STATE_ON:
--
1.9.1


2016-11-30 06:10:47

by Mohammed Shafi Shajakhan

[permalink] [raw]
Subject: Re: ath10k: Fix soft lockup during firmware crash/hw-restart

On Tue, Nov 29, 2016 at 04:43:48PM -0800, Ben Greear wrote:
> I have not seen this hang since adding this patch, so
> hopefully it has resolved the problem.

thanks Ben ! I had updated the commit log with Fixes tag
https://patchwork.kernel.org/patch/9453609/

>
> On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> >From: Mohammed Shafi Shajakhan <[email protected]>
> >
> >During firmware crash (or) user requested manual restart
> >the system gets into a soft lock up state because of the
> >below root cause.
> >
> >During user requested hardware restart / firmware crash
> >the system goes into a soft lockup state as 'napi_synchronize'
> >is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> >bit) and it sleeps into infinite loop as it waits for
> >'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> >'ath10k_hif_stop' is called twice as below (resulting in calling
> >'napi_synchronize' after 'napi_disable')
> >
> >'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> >-> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> >'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
> >
> >Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> >as it makes more sense before informing mac80211 to restart h/w
> >Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
> >
> >Signed-off-by: Mohammed Shafi Shajakhan <[email protected]>
> >---
> >[thanks to Kalle and Michal for their inputs]
> >
> > drivers/net/wireless/ath/ath10k/core.c | 2 +-
> > drivers/net/wireless/ath/ath10k/mac.c | 1 -
> > 2 files changed, 1 insertion(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> >index 7005e2a..5bc6847 100644
> >--- a/drivers/net/wireless/ath/ath10k/core.c
> >+++ b/drivers/net/wireless/ath/ath10k/core.c
> >@@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
> > switch (ar->state) {
> > case ATH10K_STATE_ON:
> > ar->state = ATH10K_STATE_RESTARTING;
> >- ath10k_hif_stop(ar);
> >+ ath10k_halt(ar);
> > ath10k_scan_finish(ar);
> > ieee80211_restart_hw(ar->hw);
> > break;
> >diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> >index 717b2fa..481842b 100644
> >--- a/drivers/net/wireless/ath/ath10k/mac.c
> >+++ b/drivers/net/wireless/ath/ath10k/mac.c
> >@@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
> > ar->state = ATH10K_STATE_ON;
> > break;
> > case ATH10K_STATE_RESTARTING:
> >- ath10k_halt(ar);
> > ar->state = ATH10K_STATE_RESTARTED;
> > break;
> > case ATH10K_STATE_ON:
> >
>
>
> --
> Ben Greear <[email protected]>
> Candela Technologies Inc http://www.candelatech.com
>

2016-11-29 18:16:51

by Ben Greear

[permalink] [raw]
Subject: Re: ath10k: Fix soft lockup during firmware crash/hw-restart

Is this something for stable? And if so, how far back should it be applied?

I'll add this patch to my 4.9 tree and test it...

Thanks,
Ben

On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> From: Mohammed Shafi Shajakhan <[email protected]>
>
> During firmware crash (or) user requested manual restart
> the system gets into a soft lock up state because of the
> below root cause.
>
> During user requested hardware restart / firmware crash
> the system goes into a soft lockup state as 'napi_synchronize'
> is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> bit) and it sleeps into infinite loop as it waits for
> 'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> 'ath10k_hif_stop' is called twice as below (resulting in calling
> 'napi_synchronize' after 'napi_disable')
>
> 'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> -> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> 'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
>
> Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> as it makes more sense before informing mac80211 to restart h/w
> Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
>
> Signed-off-by: Mohammed Shafi Shajakhan <[email protected]>
> ---
> [thanks to Kalle and Michal for their inputs]
>
> drivers/net/wireless/ath/ath10k/core.c | 2 +-
> drivers/net/wireless/ath/ath10k/mac.c | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 7005e2a..5bc6847 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
> switch (ar->state) {
> case ATH10K_STATE_ON:
> ar->state = ATH10K_STATE_RESTARTING;
> - ath10k_hif_stop(ar);
> + ath10k_halt(ar);
> ath10k_scan_finish(ar);
> ieee80211_restart_hw(ar->hw);
> break;
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 717b2fa..481842b 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
> ar->state = ATH10K_STATE_ON;
> break;
> case ATH10K_STATE_RESTARTING:
> - ath10k_halt(ar);
> ar->state = ATH10K_STATE_RESTARTED;
> break;
> case ATH10K_STATE_ON:
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2016-11-30 00:43:49

by Ben Greear

[permalink] [raw]
Subject: Re: ath10k: Fix soft lockup during firmware crash/hw-restart

I have not seen this hang since adding this patch, so
hopefully it has resolved the problem.

Thanks,
Ben

On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> From: Mohammed Shafi Shajakhan <[email protected]>
>
> During firmware crash (or) user requested manual restart
> the system gets into a soft lock up state because of the
> below root cause.
>
> During user requested hardware restart / firmware crash
> the system goes into a soft lockup state as 'napi_synchronize'
> is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> bit) and it sleeps into infinite loop as it waits for
> 'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> 'ath10k_hif_stop' is called twice as below (resulting in calling
> 'napi_synchronize' after 'napi_disable')
>
> 'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> -> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> 'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
>
> Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> as it makes more sense before informing mac80211 to restart h/w
> Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
>
> Signed-off-by: Mohammed Shafi Shajakhan <[email protected]>
> ---
> [thanks to Kalle and Michal for their inputs]
>
> drivers/net/wireless/ath/ath10k/core.c | 2 +-
> drivers/net/wireless/ath/ath10k/mac.c | 1 -
> 2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 7005e2a..5bc6847 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
> switch (ar->state) {
> case ATH10K_STATE_ON:
> ar->state = ATH10K_STATE_RESTARTING;
> - ath10k_hif_stop(ar);
> + ath10k_halt(ar);
> ath10k_scan_finish(ar);
> ieee80211_restart_hw(ar->hw);
> break;
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 717b2fa..481842b 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
> ar->state = ATH10K_STATE_ON;
> break;
> case ATH10K_STATE_RESTARTING:
> - ath10k_halt(ar);
> ar->state = ATH10K_STATE_RESTARTED;
> break;
> case ATH10K_STATE_ON:
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2016-12-01 10:34:57

by Kalle Valo

[permalink] [raw]
Subject: Re: ath10k: Fix soft lockup during firmware crash/hw-restart

Mohammed Shafi Shajakhan <[email protected]> writes:

> On Tue, Nov 29, 2016 at 10:16:50AM -0800, Ben Greear wrote:
>> Is this something for stable? And if so, how far back should it be appl=
ied?
>
> @Kalle,
>
> [shafi] kindly suggest. If i am not wrong this is only needed for 4.9

Correct, commit 3c97f5de1f28 went to 4.9-rc1, it was not in 4.8. I'll
add CC stable to the commit log.

--=20
Kalle Valo=

2016-12-01 06:06:42

by Mohammed Shafi Shajakhan

[permalink] [raw]
Subject: Re: ath10k: Fix soft lockup during firmware crash/hw-restart

On Tue, Nov 29, 2016 at 10:16:50AM -0800, Ben Greear wrote:
> Is this something for stable? And if so, how far back should it be applied?

@Kalle,

[shafi] kindly suggest. If i am not wrong this is only needed for 4.9

>
> I'll add this patch to my 4.9 tree and test it...
>

[shafi] thanks a lot Ben.

>
> On 11/29/2016 06:46 AM, Mohammed Shafi Shajakhan wrote:
> >From: Mohammed Shafi Shajakhan <[email protected]>
> >
> >During firmware crash (or) user requested manual restart
> >the system gets into a soft lock up state because of the
> >below root cause.
> >
> >During user requested hardware restart / firmware crash
> >the system goes into a soft lockup state as 'napi_synchronize'
> >is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
> >bit) and it sleeps into infinite loop as it waits for
> >'NAPI_STATE_SCHED' to be cleared. This condition is hit because
> >'ath10k_hif_stop' is called twice as below (resulting in calling
> >'napi_synchronize' after 'napi_disable')
> >
> >'ath10k_core_restart' -> 'ath10k_hif_stop' (ATH10K_STATE_ON) ->
> >-> 'ieee80211_restart_hw' -> 'ath10k_start' -> 'ath10k_halt' ->
> >'ath10k_core_stop' -> 'ath10k_hif_stop' (ATH10K_STATE_RESTARTING)
> >
> >Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
> >as it makes more sense before informing mac80211 to restart h/w
> >Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
> >
> >Signed-off-by: Mohammed Shafi Shajakhan <[email protected]>
> >---
> >[thanks to Kalle and Michal for their inputs]
> >
> > drivers/net/wireless/ath/ath10k/core.c | 2 +-
> > drivers/net/wireless/ath/ath10k/mac.c | 1 -
> > 2 files changed, 1 insertion(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> >index 7005e2a..5bc6847 100644
> >--- a/drivers/net/wireless/ath/ath10k/core.c
> >+++ b/drivers/net/wireless/ath/ath10k/core.c
> >@@ -1536,7 +1536,7 @@ static void ath10k_core_restart(struct work_struct *work)
> > switch (ar->state) {
> > case ATH10K_STATE_ON:
> > ar->state = ATH10K_STATE_RESTARTING;
> >- ath10k_hif_stop(ar);
> >+ ath10k_halt(ar);
> > ath10k_scan_finish(ar);
> > ieee80211_restart_hw(ar->hw);
> > break;
> >diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> >index 717b2fa..481842b 100644
> >--- a/drivers/net/wireless/ath/ath10k/mac.c
> >+++ b/drivers/net/wireless/ath/ath10k/mac.c
> >@@ -4449,7 +4449,6 @@ static int ath10k_start(struct ieee80211_hw *hw)
> > ar->state = ATH10K_STATE_ON;
> > break;
> > case ATH10K_STATE_RESTARTING:
> >- ath10k_halt(ar);
> > ar->state = ATH10K_STATE_RESTARTED;
> > break;
> > case ATH10K_STATE_ON:
> >
>
>
> --
> Ben Greear <[email protected]>
> Candela Technologies Inc http://www.candelatech.com
>