2023-06-01 03:50:28

by Baochen Qiang

[permalink] [raw]
Subject: [PATCH] wifi: ath11k: fix boot failure with one MSI vector

Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
PCI common code") breaks with one MSI vector because it moves
affinity setting after IRQ request, see below log:

[ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
[ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
[ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
[ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110

The detail is, if do affinity request after IRQ activated,
which is done in request_irq(), kernel caches that request and
returns success directly. Later when a subsequent MHI interrupt is
fired, kernel will do the real affinity setting work, as a result,
changs the MSI vector. However at that time host has configured
old vector to hardware, so host never receives CE or DP interrupts.

Fix it by setting affinity before registering MHI controller
where host is, for the first time, doing IRQ request.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3

Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
Signed-off-by: Baochen Qiang <[email protected]>
---
drivers/net/wireless/ath/ath11k/pci.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c
index 7b33731a50ee..a67576226380 100644
--- a/drivers/net/wireless/ath/ath11k/pci.c
+++ b/drivers/net/wireless/ath/ath11k/pci.c
@@ -850,10 +850,16 @@ static int ath11k_pci_probe(struct pci_dev *pdev,
if (ret)
goto err_pci_disable_msi;

+ ret = ath11k_pci_set_irq_affinity_hint(ab_pci, cpumask_of(0));
+ if (ret) {
+ ath11k_err(ab, "failed to set irq affinity %d\n", ret);
+ goto err_pci_disable_msi;
+ }
+
ret = ath11k_mhi_register(ab_pci);
if (ret) {
ath11k_err(ab, "failed to register mhi: %d\n", ret);
- goto err_pci_disable_msi;
+ goto err_irq_affinity_cleanup;
}

ret = ath11k_hal_srng_init(ab);
@@ -874,12 +880,6 @@ static int ath11k_pci_probe(struct pci_dev *pdev,
goto err_ce_free;
}

- ret = ath11k_pci_set_irq_affinity_hint(ab_pci, cpumask_of(0));
- if (ret) {
- ath11k_err(ab, "failed to set irq affinity %d\n", ret);
- goto err_free_irq;
- }
-
/* kernel may allocate a dummy vector before request_irq and
* then allocate a real vector when request_irq is called.
* So get msi_data here again to avoid spurious interrupt
@@ -888,19 +888,16 @@ static int ath11k_pci_probe(struct pci_dev *pdev,
ret = ath11k_pci_config_msi_data(ab_pci);
if (ret) {
ath11k_err(ab, "failed to config msi_data: %d\n", ret);
- goto err_irq_affinity_cleanup;
+ goto err_free_irq;
}

ret = ath11k_core_init(ab);
if (ret) {
ath11k_err(ab, "failed to init core: %d\n", ret);
- goto err_irq_affinity_cleanup;
+ goto err_free_irq;
}
return 0;

-err_irq_affinity_cleanup:
- ath11k_pci_set_irq_affinity_hint(ab_pci, NULL);
-
err_free_irq:
ath11k_pcic_free_irq(ab);

@@ -913,6 +910,9 @@ static int ath11k_pci_probe(struct pci_dev *pdev,
err_mhi_unregister:
ath11k_mhi_unregister(ab_pci);

+err_irq_affinity_cleanup:
+ ath11k_pci_set_irq_affinity_hint(ab_pci, NULL);
+
err_pci_disable_msi:
ath11k_pci_free_msi(ab_pci);


base-commit: 4ab7f08db5310ded48a5c1f3ec3f2e177ba6b1c2
--
2.25.1



2023-06-09 12:33:15

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector

Baochen Qiang <[email protected]> writes:

> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
> PCI common code") breaks with one MSI vector because it moves
> affinity setting after IRQ request, see below log:
>
> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>
> The detail is, if do affinity request after IRQ activated,
> which is done in request_irq(), kernel caches that request and
> returns success directly. Later when a subsequent MHI interrupt is
> fired, kernel will do the real affinity setting work, as a result,
> changs the MSI vector. However at that time host has configured
> old vector to hardware, so host never receives CE or DP interrupts.
>
> Fix it by setting affinity before registering MHI controller
> where host is, for the first time, doing IRQ request.
>
> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>
> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
> Signed-off-by: Baochen Qiang <[email protected]>

I'm worried if this breaks WCN6750 support. Manikanta, would able to
test this patch on WCN6750 and let us know if breaks anything?

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-08-07 07:29:10

by Baochen Qiang

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector


On 6/9/2023 8:26 PM, Kalle Valo wrote:
> Baochen Qiang <[email protected]> writes:
>
>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>> PCI common code") breaks with one MSI vector because it moves
>> affinity setting after IRQ request, see below log:
>>
>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control response completion, polling..
>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>
>> The detail is, if do affinity request after IRQ activated,
>> which is done in request_irq(), kernel caches that request and
>> returns success directly. Later when a subsequent MHI interrupt is
>> fired, kernel will do the real affinity setting work, as a result,
>> changs the MSI vector. However at that time host has configured
>> old vector to hardware, so host never receives CE or DP interrupts.
>>
>> Fix it by setting affinity before registering MHI controller
>> where host is, for the first time, doing IRQ request.
>>
>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>
>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI common code")
>> Signed-off-by: Baochen Qiang <[email protected]>
> I'm worried if this breaks WCN6750 support. Manikanta, would able to
> test this patch on WCN6750 and let us know if breaks anything?
Hi Manikanta, could you help test this patch on WCN6750?

2023-08-07 07:52:37

by Manikanta Pubbisetty

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector

On 8/7/2023 12:44 PM, Baochen Qiang wrote:
>
> On 6/9/2023 8:26 PM, Kalle Valo wrote:
>> Baochen Qiang <[email protected]> writes:
>>
>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>>> PCI common code") breaks with one MSI vector because it moves
>>> affinity setting after IRQ request, see below log:
>>>
>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control
>>> response completion, polling..
>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>>
>>> The detail is, if do affinity request after IRQ activated,
>>> which is done in request_irq(), kernel caches that request and
>>> returns success directly. Later when a subsequent MHI interrupt is
>>> fired, kernel will do the real affinity setting work, as a result,
>>> changs the MSI vector. However at that time host has configured
>>> old vector to hardware, so host never receives CE or DP interrupts.
>>>
>>> Fix it by setting affinity before registering MHI controller
>>> where host is, for the first time, doing IRQ request.
>>>
>>> Tested-on: WCN6855 hw2.0 PCI
>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>>
>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI
>>> common code")
>>> Signed-off-by: Baochen Qiang <[email protected]>
>> I'm worried if this breaks WCN6750 support. Manikanta, would able to
>> test this patch on WCN6750 and let us know if breaks anything?
> Hi Manikanta, could you help test this patch on WCN6750?

Sure, we will test and update this here.

Thanks,
Manikanta

2023-08-07 10:44:52

by Baochen Qiang

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector


On 8/7/2023 6:16 PM, Manikanta Pubbisetty wrote:
> On 8/7/2023 12:53 PM, Manikanta Pubbisetty wrote:
>> On 8/7/2023 12:44 PM, Baochen Qiang wrote:
>>>
>>> On 6/9/2023 8:26 PM, Kalle Valo wrote:
>>>> Baochen Qiang <[email protected]> writes:
>>>>
>>>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>>>>> PCI common code") breaks with one MSI vector because it moves
>>>>> affinity setting after IRQ request, see below log:
>>>>>
>>>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control
>>>>> response completion, polling..
>>>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>>>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT:
>>>>> -110
>>>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>>>>
>>>>> The detail is, if do affinity request after IRQ activated,
>>>>> which is done in request_irq(), kernel caches that request and
>>>>> returns success directly. Later when a subsequent MHI interrupt is
>>>>> fired, kernel will do the real affinity setting work, as a result,
>>>>> changs the MSI vector. However at that time host has configured
>>>>> old vector to hardware, so host never receives CE or DP interrupts.
>>>>>
>>>>> Fix it by setting affinity before registering MHI controller
>>>>> where host is, for the first time, doing IRQ request.
>>>>>
>>>>> Tested-on: WCN6855 hw2.0 PCI
>>>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>>>>
>>>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI
>>>>> common code")
>>>>> Signed-off-by: Baochen Qiang <[email protected]>
>>>> I'm worried if this breaks WCN6750 support. Manikanta, would able to
>>>> test this patch on WCN6750 and let us know if breaks anything?
>>> Hi Manikanta, could you help test this patch on WCN6750?
>>
>> Sure, we will test and update this here.
>>
>
> No impact on WCN6750.
>
> Tested-by: Manikanta Pubbisetty <[email protected]>
>
>
> Thanks,
> Manikanta

Thanks Mani.

Hi Kalle, any more comments?


2023-08-07 10:46:15

by Manikanta Pubbisetty

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector

On 8/7/2023 12:53 PM, Manikanta Pubbisetty wrote:
> On 8/7/2023 12:44 PM, Baochen Qiang wrote:
>>
>> On 6/9/2023 8:26 PM, Kalle Valo wrote:
>>> Baochen Qiang <[email protected]> writes:
>>>
>>>> Commit 5b32b6dd96633 ("ath11k: Remove core PCI references from
>>>> PCI common code") breaks with one MSI vector because it moves
>>>> affinity setting after IRQ request, see below log:
>>>>
>>>> [ 1417.278835] ath11k_pci 0000:02:00.0: failed to receive control
>>>> response completion, polling..
>>>> [ 1418.302829] ath11k_pci 0000:02:00.0: Service connect timeout
>>>> [ 1418.302833] ath11k_pci 0000:02:00.0: failed to connect to HTT: -110
>>>> [ 1418.303669] ath11k_pci 0000:02:00.0: failed to start core: -110
>>>>
>>>> The detail is, if do affinity request after IRQ activated,
>>>> which is done in request_irq(), kernel caches that request and
>>>> returns success directly. Later when a subsequent MHI interrupt is
>>>> fired, kernel will do the real affinity setting work, as a result,
>>>> changs the MSI vector. However at that time host has configured
>>>> old vector to hardware, so host never receives CE or DP interrupts.
>>>>
>>>> Fix it by setting affinity before registering MHI controller
>>>> where host is, for the first time, doing IRQ request.
>>>>
>>>> Tested-on: WCN6855 hw2.0 PCI
>>>> WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3
>>>>
>>>> Fixes: 5b32b6dd9663 ("ath11k: Remove core PCI references from PCI
>>>> common code")
>>>> Signed-off-by: Baochen Qiang <[email protected]>
>>> I'm worried if this breaks WCN6750 support. Manikanta, would able to
>>> test this patch on WCN6750 and let us know if breaks anything?
>> Hi Manikanta, could you help test this patch on WCN6750?
>
> Sure, we will test and update this here.
>

No impact on WCN6750.

Tested-by: Manikanta Pubbisetty <[email protected]>


Thanks,
Manikanta

2023-08-07 15:50:12

by Jeff Johnson

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector

On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote:
> No impact on WCN6750.
>
> Tested-by: Manikanta Pubbisetty <[email protected]>

Can you provide a Tested-on: flag that Kalle can add to the commit text?

>
>
> Thanks,
> Manikanta


2023-08-08 16:11:46

by Manikanta Pubbisetty

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath11k: fix boot failure with one MSI vector



On 8/7/2023 8:59 PM, Jeff Johnson wrote:
> On 8/7/2023 3:16 AM, Manikanta Pubbisetty wrote:
>> No impact on WCN6750.
>>
>> Tested-by: Manikanta Pubbisetty <[email protected]>
>
> Can you provide a Tested-on: flag that Kalle can add to the commit text?
>

Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01160-QCAMSLSWPLZ-1

Thanks,
Manikanta