Return-path: Received: from mail2.candelatech.com ([208.74.158.173]:55676 "EHLO mail2.candelatech.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729982AbeGMR4W (ORCPT ); Fri, 13 Jul 2018 13:56:22 -0400 Subject: Re: [PATCH] ath10k: fix vdev-start timeout on error To: =?UTF-8?Q?Micha=c5=82_Kazior?= References: <1531501683-32299-1-git-send-email-greearb@candelatech.com> <4f4470db-d044-ef1c-4cc9-aab6bbc812f4@candelatech.com> Cc: linux-wireless , ath10k@lists.infradead.org, kvalo@codeaurora.org From: Ben Greear Message-ID: <78726366-ac9e-41c6-ce51-b378ac3408ec@candelatech.com> (sfid-20180713_194048_979343_54A4EB3C) Date: Fri, 13 Jul 2018 10:40:43 -0700 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-wireless-owner@vger.kernel.org List-ID: On 07/13/2018 10:37 AM, Michał Kazior wrote: > On 13 July 2018 at 19:21, Ben Greear wrote: >> On 07/13/2018 10:17 AM, Michał Kazior wrote: >>> >>> On 13 July 2018 at 19:08, wrote: >>>> >>>> From: Ben Greear >>>> >>>> The vdev-start-response message should cause the >>>> completion to fire, even in the error case. Otherwise, >>>> the user still gets no useful information and everything >>>> is blocked until the timeout period. >>>> >>>> Add some warning text to print out the invalid status >>>> code to aid debugging. >>>> >>>> Signed-off-by: Ben Greear >>>> --- >>>> drivers/net/wireless/ath/ath10k/wmi.c | 12 +++++++++--- >>>> 1 file changed, 9 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/net/wireless/ath/ath10k/wmi.c >>>> b/drivers/net/wireless/ath/ath10k/wmi.c >>>> index a60de71..ec4cd1e 100644 >>>> --- a/drivers/net/wireless/ath/ath10k/wmi.c >>>> +++ b/drivers/net/wireless/ath/ath10k/wmi.c >>>> @@ -3443,12 +3443,18 @@ void ath10k_wmi_event_vdev_start_resp(struct >>>> ath10k *ar, struct sk_buff *skb) >>>> ret = ath10k_wmi_pull_vdev_start(ar, skb, &arg); >>>> if (ret) { >>>> ath10k_warn(ar, "failed to parse vdev start event: %d\n", >>>> ret); >>>> - return; >>>> + goto out; >>>> } >>>> >>>> - if (WARN_ON(__le32_to_cpu(arg.status))) >>>> - return; >>>> + if (WARN_ON_ONCE(__le32_to_cpu(arg.status))) { >>>> + ath10k_warn(ar, "vdev-start-response reports status >>>> error: %d\n", >>>> + __le32_to_cpu(arg.status)); >>>> + /* Setup is done one way or another though, so we should >>>> still >>>> + * do the completion, so don't return here. >>>> + */ >>>> + } >>>> >>>> +out: >>>> complete(&ar->vdev_setup_done); >>> >>> >>> With this the waiter can no longer tell if vdev_start succeeded or >>> not. It'll always think it succeeded even if arg.status or parsing >>> failed. Waiter instead of erroring out will continue to play out happy >>> scenario and may end up crashing firmware. >>> >>> Not stalling is nice, but I'd argue the status should be propagated >>> back to the waiter so it can error-check. >> >> >> So, maybe set ar->last_wmi_error = __le32_to_cpu(arg.status) before calling >> the complete and change the code that waits for vdev_setup_done to check >> that >> error code? >> >> Or, maybe we need an error code specific to this call, >> ar->last_wmi_vdev_start_status? > > Tough call. I can't find any compelling argument to prefer one over > the other. Maybe last_wmi_error is a bit too generic? I was thinking some actions might require multiple wmi calls, and so each one we track would need its own variable. I'll implement it with a more specific error code and post a new patch. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com