2023-06-30 22:19:52

by Doug Anderson

[permalink] [raw]
Subject: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

As talked about in commit d66d24ac300c ("ath10k: Keep track of which
interrupts fired, don't poll them"), if we access the copy engine
register at a bad time then ath10k can go boom. However, it's not
necessarily easy to know when it's safe to access them.

The ChromeOS test labs saw a crash that looked like this at
shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
problem could also reproduce upstream):

Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
...
CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
Hardware name: Google Kingoftown (DT)
...
pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
...
Call trace:
ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
ath10k_halt+0x398/0x5b0 [ath10k_core ...]
ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
drv_stop+0x148/0x6b4 [mac80211 ...]
ieee80211_stop_device+0x70/0x80 [mac80211 ...]
ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
ieee80211_stop+0x144/0x1a0 [mac80211 ...]
__dev_close_many+0x1e8/0x2c0
dev_close_many+0x198/0x33c
dev_close+0x140/0x210
cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
platform_shutdown+0x7c/0xa0
device_shutdown+0x3e0/0x58c
kernel_restart_prepare+0x68/0xa0
kernel_restart+0x28/0x7c

Though there's no known way to reproduce the problem, it makes sense
that it would be the same issue where we're trying to access copy
engine registers when it's not allowed.

Let's fix this by changing how we "disable" the interrupts. Instead of
tweaking the copy engine registers we'll just use disable_irq() and
enable_irq(). Then we'll configure the interrupts once at power up
time.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1

Signed-off-by: Douglas Anderson <[email protected]>
---

drivers/net/wireless/ath/ath10k/snoc.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 26214c00cd0d..2c39bad7ebfb 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -828,12 +828,20 @@ static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,

static inline void ath10k_snoc_irq_disable(struct ath10k *ar)
{
- ath10k_ce_disable_interrupts(ar);
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int id;
+
+ for (id = 0; id < CE_COUNT_MAX; id++)
+ disable_irq(ar_snoc->ce_irqs[id].irq_line);
}

static inline void ath10k_snoc_irq_enable(struct ath10k *ar)
{
- ath10k_ce_enable_interrupts(ar);
+ struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+ int id;
+
+ for (id = 0; id < CE_COUNT_MAX; id++)
+ enable_irq(ar_snoc->ce_irqs[id].irq_line);
}

static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
@@ -1090,6 +1098,8 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar,
goto err_free_rri;
}

+ ath10k_ce_enable_interrupts(ar);
+
return 0;

err_free_rri:
@@ -1253,8 +1263,8 @@ static int ath10k_snoc_request_irq(struct ath10k *ar)

for (id = 0; id < CE_COUNT_MAX; id++) {
ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
- ath10k_snoc_per_engine_handler, 0,
- ce_name[id], ar);
+ ath10k_snoc_per_engine_handler,
+ IRQF_NO_AUTOEN, ce_name[id], ar);
if (ret) {
ath10k_err(ar,
"failed to register IRQ handler for CE %d: %d\n",
--
2.41.0.255.g8b1d071c50-goog



2023-10-02 17:07:20

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

Douglas Anderson <[email protected]> wrote:

> As talked about in commit d66d24ac300c ("ath10k: Keep track of which
> interrupts fired, don't poll them"), if we access the copy engine
> register at a bad time then ath10k can go boom. However, it's not
> necessarily easy to know when it's safe to access them.
>
> The ChromeOS test labs saw a crash that looked like this at
> shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
> problem could also reproduce upstream):
>
> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> ...
> CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
> Hardware name: Google Kingoftown (DT)
> ...
> pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
> lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
> ...
> Call trace:
> ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
> ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
> ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
> ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
> ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
> ath10k_halt+0x398/0x5b0 [ath10k_core ...]
> ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
> drv_stop+0x148/0x6b4 [mac80211 ...]
> ieee80211_stop_device+0x70/0x80 [mac80211 ...]
> ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
> ieee80211_stop+0x144/0x1a0 [mac80211 ...]
> __dev_close_many+0x1e8/0x2c0
> dev_close_many+0x198/0x33c
> dev_close+0x140/0x210
> cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
> ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
> ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
> ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
> ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
> ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
> ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
> platform_shutdown+0x7c/0xa0
> device_shutdown+0x3e0/0x58c
> kernel_restart_prepare+0x68/0xa0
> kernel_restart+0x28/0x7c
>
> Though there's no known way to reproduce the problem, it makes sense
> that it would be the same issue where we're trying to access copy
> engine registers when it's not allowed.
>
> Let's fix this by changing how we "disable" the interrupts. Instead of
> tweaking the copy engine registers we'll just use disable_irq() and
> enable_irq(). Then we'll configure the interrupts once at power up
> time.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
>
> Signed-off-by: Douglas Anderson <[email protected]>
> Signed-off-by: Kalle Valo <[email protected]>

Patch applied to ath-next branch of ath.git, thanks.

170c75d43a77 wifi: ath10k: Don't touch the CE interrupt registers after power up

--
https://patchwork.kernel.org/project/linux-wireless/patch/20230630151842.1.If764ede23c4e09a43a842771c2ddf99608f25f8e@changeid/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-12-07 14:29:24

by Yongqin Liu

[permalink] [raw]
Subject: Re: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

Hi, Douglas

On Sat, 1 Jul 2023 at 06:19, Douglas Anderson <[email protected]> wrote:
>
> As talked about in commit d66d24ac300c ("ath10k: Keep track of which
> interrupts fired, don't poll them"), if we access the copy engine
> register at a bad time then ath10k can go boom. However, it's not
> necessarily easy to know when it's safe to access them.
>
> The ChromeOS test labs saw a crash that looked like this at
> shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
> problem could also reproduce upstream):
>
> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> ...
> CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
> Hardware name: Google Kingoftown (DT)
> ...
> pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
> lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
> ...
> Call trace:
> ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
> ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
> ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
> ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
> ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
> ath10k_halt+0x398/0x5b0 [ath10k_core ...]
> ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
> drv_stop+0x148/0x6b4 [mac80211 ...]
> ieee80211_stop_device+0x70/0x80 [mac80211 ...]
> ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
> ieee80211_stop+0x144/0x1a0 [mac80211 ...]
> __dev_close_many+0x1e8/0x2c0
> dev_close_many+0x198/0x33c
> dev_close+0x140/0x210
> cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
> ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
> ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
> ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
> ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
> ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
> ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
> platform_shutdown+0x7c/0xa0
> device_shutdown+0x3e0/0x58c
> kernel_restart_prepare+0x68/0xa0
> kernel_restart+0x28/0x7c
>
> Though there's no known way to reproduce the problem, it makes sense
> that it would be the same issue where we're trying to access copy
> engine registers when it's not allowed.
>
> Let's fix this by changing how we "disable" the interrupts. Instead of
> tweaking the copy engine registers we'll just use disable_irq() and
> enable_irq(). Then we'll configure the interrupts once at power up
> time.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---

Recently during our Android build test on the Dragonboard 845c board,
with the Android Common Kernel android11-5.4-lts and android12-5.4-lts branches,

we found there are some ufshcd related changes printed,
and the serial console gets stuck, no response for input,
and the Android boot is stuck at the animation window.

The problem is reported here
https://issuetracker.google.com/issues/314366682
You could check there for more log details.

And with some bisection, I found it's related to this commit,
when I revert this commit, the problem is gone.

So replied here, not sure if you have any idea about it,
or any suggestions on what we should do next to resolve the problem?

Thanks,
Yongqin Liu

> drivers/net/wireless/ath/ath10k/snoc.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index 26214c00cd0d..2c39bad7ebfb 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -828,12 +828,20 @@ static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,
>
> static inline void ath10k_snoc_irq_disable(struct ath10k *ar)
> {
> - ath10k_ce_disable_interrupts(ar);
> + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> + int id;
> +
> + for (id = 0; id < CE_COUNT_MAX; id++)
> + disable_irq(ar_snoc->ce_irqs[id].irq_line);
> }
>
> static inline void ath10k_snoc_irq_enable(struct ath10k *ar)
> {
> - ath10k_ce_enable_interrupts(ar);
> + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> + int id;
> +
> + for (id = 0; id < CE_COUNT_MAX; id++)
> + enable_irq(ar_snoc->ce_irqs[id].irq_line);
> }
>
> static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
> @@ -1090,6 +1098,8 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar,
> goto err_free_rri;
> }
>
> + ath10k_ce_enable_interrupts(ar);
> +
> return 0;
>
> err_free_rri:
> @@ -1253,8 +1263,8 @@ static int ath10k_snoc_request_irq(struct ath10k *ar)
>
> for (id = 0; id < CE_COUNT_MAX; id++) {
> ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
> - ath10k_snoc_per_engine_handler, 0,
> - ce_name[id], ar);
> + ath10k_snoc_per_engine_handler,
> + IRQF_NO_AUTOEN, ce_name[id], ar);
> if (ret) {
> ath10k_err(ar,
> "failed to register IRQ handler for CE %d: %d\n",
> --
> 2.41.0.255.g8b1d071c50-goog
>


--
Best Regards,
Yongqin Liu
---------------------------------------------------------------
#mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-android

2023-12-07 14:49:31

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

Yongqin Liu <[email protected]> writes:

> On Sat, 1 Jul 2023 at 06:19, Douglas Anderson <[email protected]> wrote:
>>
>> As talked about in commit d66d24ac300c ("ath10k: Keep track of which
>> interrupts fired, don't poll them"), if we access the copy engine
>> register at a bad time then ath10k can go boom. However, it's not
>> necessarily easy to know when it's safe to access them.
>>
>> The ChromeOS test labs saw a crash that looked like this at
>> shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
>> problem could also reproduce upstream):
>>
>> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
>> ...
>> CPU: 4 PID: 6168 Comm: reboot Not tainted
>> 5.15.111-lockdep-19350-g1d624fe6758f #1
>> 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
>> Hardware name: Google Kingoftown (DT)
>> ...
>> pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
>> lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
>> ...
>> Call trace:
>> ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
>> ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
>> ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
>> ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
>> ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
>> ath10k_halt+0x398/0x5b0 [ath10k_core ...]
>> ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
>> drv_stop+0x148/0x6b4 [mac80211 ...]
>> ieee80211_stop_device+0x70/0x80 [mac80211 ...]
>> ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
>> ieee80211_stop+0x144/0x1a0 [mac80211 ...]
>> __dev_close_many+0x1e8/0x2c0
>> dev_close_many+0x198/0x33c
>> dev_close+0x140/0x210
>> cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
>> ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
>> ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
>> ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
>> ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
>> ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
>> ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
>> platform_shutdown+0x7c/0xa0
>> device_shutdown+0x3e0/0x58c
>> kernel_restart_prepare+0x68/0xa0
>> kernel_restart+0x28/0x7c
>>
>> Though there's no known way to reproduce the problem, it makes sense
>> that it would be the same issue where we're trying to access copy
>> engine registers when it's not allowed.
>>
>> Let's fix this by changing how we "disable" the interrupts. Instead of
>> tweaking the copy engine registers we'll just use disable_irq() and
>> enable_irq(). Then we'll configure the interrupts once at power up
>> time.
>>
>> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
>>
>> Signed-off-by: Douglas Anderson <[email protected]>
>> ---
>
> Recently during our Android build test on the Dragonboard 845c board,
> with the Android Common Kernel android11-5.4-lts and android12-5.4-lts branches,
>
> we found there are some ufshcd related changes printed,
> and the serial console gets stuck, no response for input,
> and the Android boot is stuck at the animation window.
>
> The problem is reported here
> https://issuetracker.google.com/issues/314366682
> You could check there for more log details.
>
> And with some bisection, I found it's related to this commit,
> when I revert this commit, the problem is gone.
>
> So replied here, not sure if you have any idea about it,
> or any suggestions on what we should do next to resolve the problem?

FWIW we don't support Android kernels, only kernel.org releases.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-12-08 02:11:15

by Yongqin Liu

[permalink] [raw]
Subject: Re: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

On Fri, 8 Dec 2023 at 00:49, Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Thu, Dec 7, 2023 at 6:49 AM Kalle Valo <[email protected]> wrote:
> >
> > > Recently during our Android build test on the Dragonboard 845c board,
> > > with the Android Common Kernel android11-5.4-lts and android12-5.4-lts branches,
> > >
> > > we found there are some ufshcd related changes printed,
> > > and the serial console gets stuck, no response for input,
> > > and the Android boot is stuck at the animation window.
> > >
> > > The problem is reported here
> > > https://issuetracker.google.com/issues/314366682
> > > You could check there for more log details.
> > >
> > > And with some bisection, I found it's related to this commit,
> > > when I revert this commit, the problem is gone.
> > >
> > > So replied here, not sure if you have any idea about it,
> > > or any suggestions on what we should do next to resolve the problem?
> >
> > FWIW we don't support Android kernels, only kernel.org releases.
>
> Right. If the problem also reproduces on mainline Linux then that
> would be interesting to know. I think db845c is at least somewhat well
> supported by mainline so it should be possible to test it there.

I checked with the ACK android-mainline branch, which is based on the
mainline Linux,
this commit is there, but the problem is not seen.

> If I had to guess, I'd think that probably the CE interrupts are
> firing nonstop for you and not getting handled. Then those constant
> interrupts are (presumably) causing the UFS controller to timeout. If
> this is true, the question is: why? Maybe you could use ftrace to
> confirm this by adding some traces to
> ath10k_snoc_per_engine_handler()? There's a way to get ftrace buffers
> dumped on panic (or, if you use kdb, it has a command for it).

Thanks for the suggestions, I will check internally on how to debug that.

> If this reproduces on mainline and it's not obvious how to fix this, I
> don't object to a revert. As per the description of the original
> patch, the problem being fixed was fairly rare and I didn't have a way
> to reproduce it. The fix seemed safe to me and we've been using it on
> Chromebooks based on sc7180, but if it had to get reverted it wouldn't
> be the end of the world.
>
> -Doug



--
Best Regards,
Yongqin Liu
---------------------------------------------------------------
#mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-android

2024-01-03 16:32:17

by Amit Pundir

[permalink] [raw]
Subject: Re: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

On Sat, 1 Jul 2023 at 03:49, Douglas Anderson <[email protected]> wrote:
>
> As talked about in commit d66d24ac300c ("ath10k: Keep track of which
> interrupts fired, don't poll them"),

Hi Douglas, does this fix has a dependency on the above upstream
commit d66d24ac300c, that you refer to?

Asking because this patch landed on stable v5.4.y branch recently and
now I see RCU stalls and lockups around "ath10k_snoc 18800000.wifi:
failed to receive control response completion, polling.." message
during ath10k_snoc initialization/bringup on DB845c. Here is the
relevant log https://www.irccloud.com/pastebin/raw/NjKm3mLc, with
DB845c rebooting into USB crash dump mode eventually.

I wonder if commit d66d24ac300c need to be backported to v5.4.y as
well? I tried cherry-picking it but ran into non-trivial conflicts, so
didn't spend much time on it.

Regards,
Amit Pundir

> if we access the copy engine
> register at a bad time then ath10k can go boom. However, it's not
> necessarily easy to know when it's safe to access them.
>
> The ChromeOS test labs saw a crash that looked like this at
> shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
> problem could also reproduce upstream):
>
> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> ...
> CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
> Hardware name: Google Kingoftown (DT)
> ...
> pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
> lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
> ...
> Call trace:
> ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
> ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
> ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
> ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
> ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
> ath10k_halt+0x398/0x5b0 [ath10k_core ...]
> ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
> drv_stop+0x148/0x6b4 [mac80211 ...]
> ieee80211_stop_device+0x70/0x80 [mac80211 ...]
> ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
> ieee80211_stop+0x144/0x1a0 [mac80211 ...]
> __dev_close_many+0x1e8/0x2c0
> dev_close_many+0x198/0x33c
> dev_close+0x140/0x210
> cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
> ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
> ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
> ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
> ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
> ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
> ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
> platform_shutdown+0x7c/0xa0
> device_shutdown+0x3e0/0x58c
> kernel_restart_prepare+0x68/0xa0
> kernel_restart+0x28/0x7c
>
> Though there's no known way to reproduce the problem, it makes sense
> that it would be the same issue where we're trying to access copy
> engine registers when it's not allowed.
>
> Let's fix this by changing how we "disable" the interrupts. Instead of
> tweaking the copy engine registers we'll just use disable_irq() and
> enable_irq(). Then we'll configure the interrupts once at power up
> time.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> drivers/net/wireless/ath/ath10k/snoc.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index 26214c00cd0d..2c39bad7ebfb 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -828,12 +828,20 @@ static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,
>
> static inline void ath10k_snoc_irq_disable(struct ath10k *ar)
> {
> - ath10k_ce_disable_interrupts(ar);
> + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> + int id;
> +
> + for (id = 0; id < CE_COUNT_MAX; id++)
> + disable_irq(ar_snoc->ce_irqs[id].irq_line);
> }
>
> static inline void ath10k_snoc_irq_enable(struct ath10k *ar)
> {
> - ath10k_ce_enable_interrupts(ar);
> + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> + int id;
> +
> + for (id = 0; id < CE_COUNT_MAX; id++)
> + enable_irq(ar_snoc->ce_irqs[id].irq_line);
> }
>
> static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
> @@ -1090,6 +1098,8 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar,
> goto err_free_rri;
> }
>
> + ath10k_ce_enable_interrupts(ar);
> +
> return 0;
>
> err_free_rri:
> @@ -1253,8 +1263,8 @@ static int ath10k_snoc_request_irq(struct ath10k *ar)
>
> for (id = 0; id < CE_COUNT_MAX; id++) {
> ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
> - ath10k_snoc_per_engine_handler, 0,
> - ce_name[id], ar);
> + ath10k_snoc_per_engine_handler,
> + IRQF_NO_AUTOEN, ce_name[id], ar);
> if (ret) {
> ath10k_err(ar,
> "failed to register IRQ handler for CE %d: %d\n",
> --
> 2.41.0.255.g8b1d071c50-goog
>