2018-07-13 17:23:44

by Ben Greear

[permalink] [raw]
Subject: [PATCH] ath10k: fix vdev-start timeout on error

From: Ben Greear <[email protected]>

The vdev-start-response message should cause the
completion to fire, even in the error case. Otherwise,
the user still gets no useful information and everything
is blocked until the timeout period.

Add some warning text to print out the invalid status
code to aid debugging.

Signed-off-by: Ben Greear <[email protected]>
---
drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index a60de71..ec4cd1e 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct ath10k *ar, struct sk_buff *skb)
ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
if (ret) {
ath10k_warn(ar, "failed to parse vdev start event: %d\n", ret);
- return;
+ goto out;
}

- if (WARN_ON(__le32_to_cpu(arg.status)))
- return;
+ if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
+ ath10k_warn(ar, "vdev-start-response reports status error: %d\n",
+ __le32_to_cpu(arg.status));
+ /* Setup is done one way or another though, so we should still
+ * do the completion, so don't return here.
+ */
+ }

+out:
complete(&ar->vdev_setup_done);
}

--
2.4.11


2018-07-13 17:53:00

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix vdev-start timeout on error

On 13 July 2018 at 19:21, Ben Greear <[email protected]> wrote:
> On 07/13/2018 10:17 AM, Micha=C5=82 Kazior wrote:
>>
>> On 13 July 2018 at 19:08, <[email protected]> wrote:
>>>
>>> From: Ben Greear <[email protected]>
>>>
>>> The vdev-start-response message should cause the
>>> completion to fire, even in the error case. Otherwise,
>>> the user still gets no useful information and everything
>>> is blocked until the timeout period.
>>>
>>> Add some warning text to print out the invalid status
>>> code to aid debugging.
>>>
>>> Signed-off-by: Ben Greear <[email protected]>
>>> ---
>>> drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c
>>> b/drivers/net/wireless/ath/ath10k/wmi.c
>>> index a60de71..ec4cd1e 100644
>>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct
>>> ath10k *ar, struct sk_buff *skb)
>>> ret =3D ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>>> if (ret) {
>>> ath10k_warn(ar, "failed to parse vdev start event: %d\n=
",
>>> ret);
>>> - return;
>>> + goto out;
>>> }
>>>
>>> - if (WARN_ON(__le32_to_cpu(arg.status)))
>>> - return;
>>> + if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
>>> + ath10k_warn(ar, "vdev-start-response reports status
>>> error: %d\n",
>>> + __le32_to_cpu(arg.status));
>>> + /* Setup is done one way or another though, so we shoul=
d
>>> still
>>> + * do the completion, so don't return here.
>>> + */
>>> + }
>>>
>>> +out:
>>> complete(&ar->vdev_setup_done);
>>
>>
>> With this the waiter can no longer tell if vdev_start succeeded or
>> not. It'll always think it succeeded even if arg.status or parsing
>> failed. Waiter instead of erroring out will continue to play out happy
>> scenario and may end up crashing firmware.
>>
>> Not stalling is nice, but I'd argue the status should be propagated
>> back to the waiter so it can error-check.
>
>
> So, maybe set ar->last_wmi_error =3D __le32_to_cpu(arg.status) before cal=
ling
> the complete and change the code that waits for vdev_setup_done to check
> that
> error code?
>
> Or, maybe we need an error code specific to this call,
> ar->last_wmi_vdev_start_status?

Tough call. I can't find any compelling argument to prefer one over
the other. Maybe last_wmi_error is a bit too generic?


Micha=C5=82

2018-07-13 17:33:04

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix vdev-start timeout on error

On 13 July 2018 at 19:08, <[email protected]> wrote:
> From: Ben Greear <[email protected]>
>
> The vdev-start-response message should cause the
> completion to fire, even in the error case. Otherwise,
> the user still gets no useful information and everything
> is blocked until the timeout period.
>
> Add some warning text to print out the invalid status
> code to aid debugging.
>
> Signed-off-by: Ben Greear <[email protected]>
> ---
> drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
> index a60de71..ec4cd1e 100644
> --- a/drivers/net/wireless/ath/ath10k/wmi.c
> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct ath10k *ar, struct sk_buff *skb)
> ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
> if (ret) {
> ath10k_warn(ar, "failed to parse vdev start event: %d\n", ret);
> - return;
> + goto out;
> }
>
> - if (WARN_ON(__le32_to_cpu(arg.status)))
> - return;
> + if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
> + ath10k_warn(ar, "vdev-start-response reports status error: %d\n",
> + __le32_to_cpu(arg.status));
> + /* Setup is done one way or another though, so we should still
> + * do the completion, so don't return here.
> + */
> + }
>
> +out:
> complete(&ar->vdev_setup_done);

With this the waiter can no longer tell if vdev_start succeeded or
not. It'll always think it succeeded even if arg.status or parsing
failed. Waiter instead of erroring out will continue to play out happy
scenario and may end up crashing firmware.

Not stalling is nice, but I'd argue the status should be propagated
back to the waiter so it can error-check.


Michal

2018-07-13 17:36:39

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix vdev-start timeout on error

On 07/13/2018 10:17 AM, Michał Kazior wrote:
> On 13 July 2018 at 19:08, <[email protected]> wrote:
>> From: Ben Greear <[email protected]>
>>
>> The vdev-start-response message should cause the
>> completion to fire, even in the error case. Otherwise,
>> the user still gets no useful information and everything
>> is blocked until the timeout period.
>>
>> Add some warning text to print out the invalid status
>> code to aid debugging.
>>
>> Signed-off-by: Ben Greear <[email protected]>
>> ---
>> drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
>> index a60de71..ec4cd1e 100644
>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct ath10k *ar, struct sk_buff *skb)
>> ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>> if (ret) {
>> ath10k_warn(ar, "failed to parse vdev start event: %d\n", ret);
>> - return;
>> + goto out;
>> }
>>
>> - if (WARN_ON(__le32_to_cpu(arg.status)))
>> - return;
>> + if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
>> + ath10k_warn(ar, "vdev-start-response reports status error: %d\n",
>> + __le32_to_cpu(arg.status));
>> + /* Setup is done one way or another though, so we should still
>> + * do the completion, so don't return here.
>> + */
>> + }
>>
>> +out:
>> complete(&ar->vdev_setup_done);
>
> With this the waiter can no longer tell if vdev_start succeeded or
> not. It'll always think it succeeded even if arg.status or parsing
> failed. Waiter instead of erroring out will continue to play out happy
> scenario and may end up crashing firmware.
>
> Not stalling is nice, but I'd argue the status should be propagated
> back to the waiter so it can error-check.

So, maybe set ar->last_wmi_error = __le32_to_cpu(arg.status) before calling
the complete and change the code that waits for vdev_setup_done to check that
error code?

Or, maybe we need an error code specific to this call, ar->last_wmi_vdev_start_status?

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-07-13 17:56:22

by Ben Greear

[permalink] [raw]
Subject: Re: [PATCH] ath10k: fix vdev-start timeout on error

On 07/13/2018 10:37 AM, Michał Kazior wrote:
> On 13 July 2018 at 19:21, Ben Greear <[email protected]> wrote:
>> On 07/13/2018 10:17 AM, Michał Kazior wrote:
>>>
>>> On 13 July 2018 at 19:08, <[email protected]> wrote:
>>>>
>>>> From: Ben Greear <[email protected]>
>>>>
>>>> The vdev-start-response message should cause the
>>>> completion to fire, even in the error case. Otherwise,
>>>> the user still gets no useful information and everything
>>>> is blocked until the timeout period.
>>>>
>>>> Add some warning text to print out the invalid status
>>>> code to aid debugging.
>>>>
>>>> Signed-off-by: Ben Greear <[email protected]>
>>>> ---
>>>> drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++---
>>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c
>>>> b/drivers/net/wireless/ath/ath10k/wmi.c
>>>> index a60de71..ec4cd1e 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/wmi.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
>>>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct
>>>> ath10k *ar, struct sk_buff *skb)
>>>> ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg);
>>>> if (ret) {
>>>> ath10k_warn(ar, "failed to parse vdev start event: %d\n",
>>>> ret);
>>>> - return;
>>>> + goto out;
>>>> }
>>>>
>>>> - if (WARN_ON(__le32_to_cpu(arg.status)))
>>>> - return;
>>>> + if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) {
>>>> + ath10k_warn(ar, "vdev-start-response reports status
>>>> error: %d\n",
>>>> + __le32_to_cpu(arg.status));
>>>> + /* Setup is done one way or another though, so we should
>>>> still
>>>> + * do the completion, so don't return here.
>>>> + */
>>>> + }
>>>>
>>>> +out:
>>>> complete(&ar->vdev_setup_done);
>>>
>>>
>>> With this the waiter can no longer tell if vdev_start succeeded or
>>> not. It'll always think it succeeded even if arg.status or parsing
>>> failed. Waiter instead of erroring out will continue to play out happy
>>> scenario and may end up crashing firmware.
>>>
>>> Not stalling is nice, but I'd argue the status should be propagated
>>> back to the waiter so it can error-check.
>>
>>
>> So, maybe set ar->last_wmi_error = __le32_to_cpu(arg.status) before calling
>> the complete and change the code that waits for vdev_setup_done to check
>> that
>> error code?
>>
>> Or, maybe we need an error code specific to this call,
>> ar->last_wmi_vdev_start_status?
>
> Tough call. I can't find any compelling argument to prefer one over
> the other. Maybe last_wmi_error is a bit too generic?

I was thinking some actions might require multiple wmi calls, and so
each one we track would need its own variable.

I'll implement it with a more specific error code and post a new patch.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com