2020-06-02 05:28:48

by John Stultz

Subject: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Ever since 5.7-rc1, if we call
ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
reboot, resulting in the device getting stuck in the usb crash
debug mode and not coming back up without a hard power off.

This hack avoids the issue by returning early in
ath10k_qmi_event_server_exit().

A better solution is very much desired!

Feedback and suggestions welcome!

Cc: Rakesh Pillai <[email protected]>
Cc: Govind Singh <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Manivannan Sadhasivam <[email protected]>
Cc: Amit Pundir <[email protected]>
Cc: Brian Norris <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Reported-by: Amit Pundir <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
drivers/net/wireless/ath/ath10k/qmi.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 85dce43c5439..ab38562ce1cb 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -854,6 +854,11 @@ static void ath10k_qmi_event_server_exit(struct ath10k_qmi *qmi)
 	struct ath10k *ar = qmi->ar;
 	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
 
+	/*
+	 * HACK: Calling ath10k_qmi_remove_msa_permission causes
+	 * hardware to hard crash on reboot
+	 */
+	return;
 	ath10k_qmi_remove_msa_permission(qmi);
 	ath10k_core_free_board_files(ar);
 	if (!test_bit(ATH10K_SNOC_FLAG_UNREGISTERING, &ar_snoc->flags))
--
2.17.1
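The effect of the early return can be illustrated outside the kernel. The sketch below is a user-space model, not ath10k code: remove_msa_permission(), free_board_files(), reset_model() and the two server_exit_*() functions are hypothetical stand-ins that only record whether they ran. It shows that the unconditional return skips everything after it in the function, including the board-file cleanup and the ATH10K_SNOC_FLAG_UNREGISTERING check, not just the MSA call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the calls in the hunk above; they only
 * record that they ran. */
static bool msa_permission_removed;
static bool board_files_freed;

static void remove_msa_permission(void) { msa_permission_removed = true; }
static void free_board_files(void) { board_files_freed = true; }

static void reset_model(void)
{
	msa_permission_removed = false;
	board_files_freed = false;
}

/* Shape of the patched function: the unconditional return leaves every
 * statement after it unreachable. */
static void server_exit_with_hack(void)
{
	return;
	remove_msa_permission();
	free_board_files();
	/* the ATH10K_SNOC_FLAG_UNREGISTERING check would follow here */
}

/* Shape of the unpatched function, for comparison. */
static void server_exit_original(void)
{
	remove_msa_permission();
	free_board_files();
}
```

After server_exit_with_hack() both flags stay false; only server_exit_original() sets them, which is why a narrower fix than an early return was being asked for.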


2020-06-02 19:20:47

by Brian Norris

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

+ Sibi

On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
>
> Ever since 5.7-rc1, if we call
> ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> reboot, resulting in the device getting stuck in the usb crash
> debug mode and not coming back up without a hard power off.
>
> This hack avoids the issue by returning early in
> ath10k_qmi_event_server_exit().
>
> A better solution is very much desired!

Any chance you can bisect what caused this? There are a lot of
non-ath10k pieces involved in this stuff.

Brian

2020-06-02 19:43:26

by John Stultz

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <[email protected]> wrote:
>
> + Sibi
>
> On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
> >
> > Ever since 5.7-rc1, if we call
> > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > reboot, resulting in the device getting stuck in the usb crash
> > debug mode and not coming back up without a hard power off.
> >
> > This hack avoids the issue by returning early in
> > ath10k_qmi_event_server_exit().
> >
> > A better solution is very much desired!
>
> Any chance you can bisect what caused this? There are a lot of
> non-ath10k pieces involved in this stuff.

Amit had spent some work on chasing it down to the in kernel qrtr-ns
work, and reported it here:
https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html

But that discussion seemingly stalled out, so I came up with this hack
to work around it for us.

thanks
-john

2020-06-02 20:07:27

by Brian Norris

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

On Tue, Jun 2, 2020 at 12:40 PM John Stultz <[email protected]> wrote:
> On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <[email protected]> wrote:
> > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
> > >
> > > Ever since 5.7-rc1, if we call
> > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > reboot, resulting in the device getting stuck in the usb crash
> > > debug mode and not coming back up without a hard power off.
> > >
> > > This hack avoids the issue by returning early in
> > > ath10k_qmi_event_server_exit().
> > >
> > > A better solution is very much desired!
> >
> > Any chance you can bisect what caused this? There are a lot of
> > non-ath10k pieces involved in this stuff.
>
> Amit had spent some work on chasing it down to the in kernel qrtr-ns
> work, and reported it here:
> https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>
> But that discussion seemingly stalled out, so I came up with this hack
> to work around it for us.

If I'm reading it right, then that means we should revert this stuff
from v5.7-rc1:

0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace

At least, until people can resolve the tail end of that thread. New
features (ath11k, etc.) are not a reason to break existing features
(ath10k/wcn3990).

Brian

2020-06-03 00:31:54

by Manivannan Sadhasivam

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <[email protected]> wrote:
> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <[email protected]> wrote:
> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
> > > >
> > > > Ever since 5.7-rc1, if we call
> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > > reboot, resulting in the device getting stuck in the usb crash
> > > > debug mode and not coming back up without a hard power off.
> > > >
> > > > This hack avoids the issue by returning early in
> > > > ath10k_qmi_event_server_exit().
> > > >
> > > > A better solution is very much desired!
> > >
> > > Any chance you can bisect what caused this? There are a lot of
> > > non-ath10k pieces involved in this stuff.
> >
> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
> > work, and reported it here:
> > https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
> >
> > But that discussion seemingly stalled out, so I came up with this hack
> > to work around it for us.
>
> If I'm reading it right, then that means we should revert this stuff
> from v5.7-rc1:
>
> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>
> At least, until people can resolve the tail end of that thread. New
> features (ath11k, etc.) are not a reason to break existing features
> (ath10k/wcn3990).

I don't agree with this. If you read through the replies to the bug report,
it is clear that NS migration uncovered a corner case or even a bug. So we
should try to fix that indeed.

Govind: Did you get a chance to work on fixing this issue?

Thanks,
Mani

>
> Brian

2020-06-03 10:09:51

by Govind Singh

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Hi Mani,

On 2020-06-03 05:57, Manivannan Sadhasivam wrote:
> On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
>> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <[email protected]> wrote:
>> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <[email protected]> wrote:
>> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
>> > > >
>> > > > Ever since 5.7-rc1, if we call
>> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
>> > > > reboot, resulting in the device getting stuck in the usb crash
>> > > > debug mode and not coming back up without a hard power off.
>> > > >
>> > > > This hack avoids the issue by returning early in
>> > > > ath10k_qmi_event_server_exit().
>> > > >
>> > > > A better solution is very much desired!
>> > >
>> > > Any chance you can bisect what caused this? There are a lot of
>> > > non-ath10k pieces involved in this stuff.
>> >
>> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
>> > work, and reported it here:
>> > https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>> >
>> > But that discussion seemingly stalled out, so I came up with this hack
>> > to work around it for us.
>>
>> If I'm reading it right, then that means we should revert this stuff
>> from v5.7-rc1:
>>
>> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>>
>> At least, until people can resolve the tail end of that thread. New
>> features (ath11k, etc.) are not a reason to break existing features
>> (ath10k/wcn3990).
>
> I don't agree with this. If you read through the replies to the bug report,
> it is clear that NS migration uncovered a corner case or even a bug. So we
> should try to fix that indeed.
>
> Govind: Did you get a chance to work on fixing this issue?
>

I have done basic testing by moving the MSA map/unmap from the QMI service
callbacks to the init/de-init path.
I will send a patch for review.
The reason for the del_server needs to be investigated from the rproc side.

> Thanks,
> Mani
>
>>
>> Brian

Thanks,
Govind

2020-06-04 18:23:34

by Sibi Sankar

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

On 2020-06-03 15:37, [email protected] wrote:
> Hi Mani,
>
> On 2020-06-03 05:57, Manivannan Sadhasivam wrote:
>> On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
>>> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <[email protected]> wrote:
>>> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <[email protected]> wrote:
>>> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
>>> > > >
>>> > > > Ever since 5.7-rc1, if we call
>>> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
>>> > > > reboot, resulting in the device getting stuck in the usb crash
>>> > > > debug mode and not coming back up without a hard power off.
>>> > > >
>>> > > > This hack avoids the issue by returning early in
>>> > > > ath10k_qmi_event_server_exit().
>>> > > >
>>> > > > A better solution is very much desired!
>>> > >
>>> > > Any chance you can bisect what caused this? There are a lot of
>>> > > non-ath10k pieces involved in this stuff.
>>> >
>>> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
>>> > work, and reported it here:
>>> > https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>>> >
>>> > But that discussion seemingly stalled out, so I came up with this hack
>>> > to work around it for us.
>>>
>>> If I'm reading it right, then that means we should revert this stuff
>>> from v5.7-rc1:
>>>
>>> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>>>
>>> At least, until people can resolve the tail end of that thread. New
>>> features (ath11k, etc.) are not a reason to break existing features
>>> (ath10k/wcn3990).
>>
>> I don't agree with this. If you read through the replies to the bug report,
>> it is clear that NS migration uncovered a corner case or even a bug. So we
>> should try to fix that indeed.
>>
>> Govind: Did you get a chance to work on fixing this issue?
>>
>
> I have done basic testing by moving the MSA map/unmap from the QMI service
> callbacks to the init/de-init path.
> I will send a patch for review.
> The reason for the del_server needs to be investigated from the rproc side.

Govind,
On receiving SIGTERM, rmtfs would try
to perform a graceful shutdown of the
modem; that should be the source of
the del_server.

>
>> Thanks,
>> Mani
>>
>>>
>>> Brian
>
> Thanks,
> Govind

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.
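Sibi's explanation relies on the difference between a signal a daemon can handle and one it cannot. A minimal POSIX sketch, assuming a hypothetical stand-in daemon rather than rmtfs itself: the child installs a SIGTERM handler that represents the graceful-shutdown path (here it just exits with the made-up status GRACEFUL_EXIT), and the parent reports how the child ended. Whether rmtfs is structured exactly like this internally is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit status meaning "graceful shutdown path ran". */
#define GRACEFUL_EXIT 42

static void on_sigterm(int sig)
{
	(void)sig;
	/* A real daemon would start its orderly teardown here; per the
	 * thread, for rmtfs that includes shutting down the modem. */
	_exit(GRACEFUL_EXIT);
}

/* Forks a stand-in daemon with a SIGTERM handler, sends it `sig` once
 * the handler is installed, and returns the child's wait status
 * (or -1 on error). */
static int run_and_signal(int sig)
{
	int ready[2];
	char byte;
	int status;
	pid_t pid;

	if (pipe(ready) != 0)
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct sigaction sa = { 0 };

		sa.sa_handler = on_sigterm;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGTERM, &sa, NULL);
		close(ready[0]);
		if (write(ready[1], "r", 1) != 1) /* tell parent: handler ready */
			_exit(1);
		for (;;)
			pause();
	}
	close(ready[1]);
	if (read(ready[0], &byte, 1) != 1) { /* wait for the child */
		close(ready[0]);
		return -1;
	}
	close(ready[0]);
	if (kill(pid, sig) != 0 || waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

With SIGTERM the handler runs and the child exits normally with GRACEFUL_EXIT, i.e. the graceful path that, per Sibi, ends in the del_server.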

2020-06-08 11:22:27

by Kalle Valo

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

John Stultz <[email protected]> writes:

> Ever since 5.7-rc1, if we call
> ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> reboot, resulting in the device getting stuck in the usb crash
> debug mode and not coming back up without a hard power off.
>
> This hack avoids the issue by returning early in
> ath10k_qmi_event_server_exit().
>
> A better solution is very much desired!
>
> Feedback and suggestions welcome!
>
> Cc: Rakesh Pillai <[email protected]>
> Cc: Govind Singh <[email protected]>
> Cc: Bjorn Andersson <[email protected]>
> Cc: Niklas Cassel <[email protected]>
> Cc: Manivannan Sadhasivam <[email protected]>
> Cc: Amit Pundir <[email protected]>
> Cc: Brian Norris <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: [email protected]
> Reported-by: Amit Pundir <[email protected]>
> Signed-off-by: John Stultz <[email protected]>

Just so you know: as you didn't CC linux-wireless, it's not on patchwork
and hence not on my radar. But hopefully we find a better solution to
fix this.

--
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2020-06-08 11:42:06

by Kalle Valo

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Manivannan Sadhasivam <[email protected]> writes:

> On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
>> On Tue, Jun 2, 2020 at 12:40 PM John Stultz <[email protected]> wrote:
>> > On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <[email protected]> wrote:
>> > > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <[email protected]> wrote:
>> > > >
>> > > > Ever since 5.7-rc1, if we call
>> > > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
>> > > > reboot, resulting in the device getting stuck in the usb crash
>> > > > debug mode and not coming back up without a hard power off.
>> > > >
>> > > > This hack avoids the issue by returning early in
>> > > > ath10k_qmi_event_server_exit().
>> > > >
>> > > > A better solution is very much desired!
>> > >
>> > > Any chance you can bisect what caused this? There are a lot of
>> > > non-ath10k pieces involved in this stuff.
>> >
>> > Amit had spent some work on chasing it down to the in kernel qrtr-ns
>> > work, and reported it here:
>> > https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>> >
>> > But that discussion seemingly stalled out, so I came up with this hack
>> > to work around it for us.
>>
>> If I'm reading it right, then that means we should revert this stuff
>> from v5.7-rc1:
>>
>> 0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace
>>
>> At least, until people can resolve the tail end of that thread. New
>> features (ath11k, etc.) are not a reason to break existing features
>> (ath10k/wcn3990).
>
> I don't agree with this. If you read through the replies to the bug report,
> it is clear that NS migration uncovered a corner case or even a bug. So we
> should try to fix that indeed.

I'm with Mani, we should try to fix ath10k instead. Hopefully we can
find a fix soon.

Forcing QCA6390 users to use the userspace qrtr-ns would be a bad user
experience; I would really like to avoid that.


2020-08-17 09:09:57

by Amit Pundir

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

On Mon, 8 Jun 2020 at 17:07, Kalle Valo <[email protected]> wrote:
> > I don't agree with this. If you read through the replies to the bug report,
> > it is clear that NS migration uncovered a corner case or even a bug. So we
> > should try to fix that indeed.
>
> I'm with Mani, we should try to fix ath10k instead. Hopefully we can
> find a fix soon.

Hi Team,

Any updates on this? I can reproduce this hard crash on v5.9-rc1 as well.

It is not a blocker for us because we switched to a userspace
workaround: we do not wait for the modem to shut down gracefully and
SIGKILL it instead during the shutdown/reboot process. But I'm happy
to take a swing at any intermediate/in-progress solution available.

Regards,
Amit Pundir

>
> Forcing QCA6390 users to use the userspace qrtr-ns would be a bad user
> experience; I would really like to avoid that.
>
> --
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
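Amit's workaround skips the graceful path entirely. A minimal POSIX sketch of the idea, with hypothetical helper names (the thread does not show how their userspace actually does it): force_stop() sends SIGKILL and reaps the process. Because SIGKILL cannot be caught or ignored, no shutdown handler in the target runs, which per Sibi's explanation means rmtfs never gets to shut the modem down gracefully.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends SIGKILL and reaps the child. SIGKILL cannot be caught or
 * ignored, so no graceful-shutdown code in the target gets to run.
 * Returns the wait status, or -1 on error. */
static int force_stop(pid_t pid)
{
	int status;

	if (kill(pid, SIGKILL) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}

/* Hypothetical stand-in daemon that ignores SIGTERM and sleeps forever. */
static pid_t spawn_stubborn_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		signal(SIGTERM, SIG_IGN);
		for (;;)
			pause();
	}
	return pid; /* negative on fork failure */
}
```

Even a process that ignores SIGTERM is terminated by force_stop(), and the wait status reports death by SIGKILL rather than a normal exit.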

2020-08-28 12:54:56

by Kalle Valo

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Amit Pundir <[email protected]> writes:

> On Mon, 8 Jun 2020 at 17:07, Kalle Valo <[email protected]> wrote:
>> > I don't agree with this. If you read through the replies to the bug report,
>> > it is clear that NS migration uncovered a corner case or even a bug. So we
>> > should try to fix that indeed.
>>
>> I'm with Mani, we should try to fix ath10k instead. Hopefully we can
>> find a fix soon.
>
> Hi Team,
>
> Any updates on this? I can reproduce this hard crash on v5.9-rc1 as well.
>
> It is not a blocker for us because we switched to a userspace
> workaround: we do not wait for the modem to shut down gracefully and
> SIGKILL it instead during the shutdown/reboot process. But I'm happy
> to take a swing at any intermediate/in-progress solution available.

Govind submitted this patch and later he asked to drop it, but I think
it would be a good idea to test it anyway:

ath10k: Move msa region map/unmap to init/deinit path

https://lkml.kernel.org/r/[email protected]

(patchwork is down so I cannot give a patchwork link)


2020-08-28 13:18:30

by Govind Singh

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Hi Kalle,

On 2020-08-28 18:22, Kalle Valo wrote:
> Amit Pundir <[email protected]> writes:
>
>> On Mon, 8 Jun 2020 at 17:07, Kalle Valo <[email protected]> wrote:
>>> > I don't agree with this. If you read through the replies to the bug report,
>>> > it is clear that NS migration uncovered a corner case or even a bug. So we
>>> > should try to fix that indeed.
>>>
>>> I'm with Mani, we should try to fix ath10k instead. Hopefully we can
>>> find a fix soon.
>>
>> Hi Team,
>>
>> Any updates on this? I can reproduce this hard crash on v5.9-rc1 as well.
>>
>> It is not a blocker for us because we switched to a userspace
>> workaround: we do not wait for the modem to shut down gracefully and
>> SIGKILL it instead during the shutdown/reboot process. But I'm happy
>> to take a swing at any intermediate/in-progress solution available.
>
> Govind submitted this patch and later he asked to drop it, but I think
> it would be a good idea to test it anyway:
>
> ath10k: Move msa region map/unmap to init/deinit path
>
> https://lkml.kernel.org/r/[email protected]
>
> (patchwork is down so I cannot give a patchwork link)

This patch does not fix the issue, and changing the MSA mapping sequence
is a major design change.
This issue is only seen with DB845, which uses an SCM call; newer targets
(QCS404/SC7180/SM8150) will not have this issue as the MSA mapping is
hard-coded in TZ.
Changes in the QMI layer to give a different indication for this
scenario, and changes in FW, are probably required to mitigate this
issue gracefully.

BR,
Govind
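Govind's point is that only targets which change MSA permissions through an SCM call at runtime are affected. A hypothetical capability table (struct msa_platform, msa_platforms[] and msa_unmap_needs_scm() are made up for illustration, and the platform list comes only from his message) shows the kind of per-target check a driver could use to tell the SCM-based case apart from targets where TZ hard-codes the mapping. Unknown targets default to the SCM path here, which is a conservative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-target description; not an ath10k structure. */
struct msa_platform {
	const char *name;
	bool msa_perm_via_scm; /* true: permissions changed by an SCM call */
};

/* Targets as described in the thread: DB845 uses an SCM call, the
 * newer ones have the MSA mapping hard-coded in TZ. */
static const struct msa_platform msa_platforms[] = {
	{ "db845c", true },
	{ "qcs404", false },
	{ "sc7180", false },
	{ "sm8150", false },
};

/* Returns true if removing MSA permission would need an SCM call.
 * Unknown targets are treated as needing it (conservative default). */
static bool msa_unmap_needs_scm(const char *target)
{
	size_t i;

	for (i = 0; i < sizeof(msa_platforms) / sizeof(msa_platforms[0]); i++)
		if (strcmp(msa_platforms[i].name, target) == 0)
			return msa_platforms[i].msa_perm_via_scm;
	return true;
}
```

Only the DB845 entry takes the SCM path, which matches the observation that the crash is confined to that board.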

2020-09-07 16:26:59

by Kalle Valo

Subject: Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

Govind Singh <[email protected]> writes:

> On 2020-08-28 18:22, Kalle Valo wrote:
>> Amit Pundir <[email protected]> writes:
>>
>>> On Mon, 8 Jun 2020 at 17:07, Kalle Valo <[email protected]> wrote:
>>>> > I don't agree with this. If you read through the replies to the bug report,
>>>> > it is clear that NS migration uncovered a corner case or even a bug. So we
>>>> > should try to fix that indeed.
>>>>
>>>> I'm with Mani, we should try to fix ath10k instead. Hopefully we can
>>>> find a fix soon.
>>>
>>> Hi Team,
>>>
>>> Any updates on this? I can reproduce this hard crash on v5.9-rc1 as well.
>>>
>>> It is not a blocker for us because we switched to a userspace
>>> workaround: we do not wait for the modem to shut down gracefully and
>>> SIGKILL it instead during the shutdown/reboot process. But I'm happy
>>> to take a swing at any intermediate/in-progress solution available.
>>
>> Govind submitted this patch and later he asked to drop it, but I think
>> it would be a good idea to test it anyway:
>>
>> ath10k: Move msa region map/unmap to init/deinit path
>>
>> https://lkml.kernel.org/r/[email protected]
>>
>> (patchwork is down so I cannot give a patchwork link)
>
> This patch does not fix the issue, and changing the MSA mapping
> sequence is a major design change. This issue is only seen with DB845,
> which uses an SCM call; newer targets (QCS404/SC7180/SM8150) will not
> have this issue as the MSA mapping is hard-coded in TZ. Changes in the
> QMI layer to give a different indication for this scenario, and changes
> in FW, are probably required to mitigate this issue gracefully.

Oh, bad news :/ Can anyone look at that in detail? Even a quick hack
patch would move this forward.
