2023-08-09 09:31:22

by Manikanta Pubbisetty

Subject: Re: [PATCH] Revert "Revert "wifi: ath11k: Enable threaded NAPI""

On 8/9/2023 1:04 PM, Johan Hovold wrote:
> This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
>
> Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
> more interrupts received) almost immediately during RX.
>
> Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
> Enable threaded NAPI") so that a simple revert is no longer possible.
>

This is getting as weird as it can get :)

> As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
> does not address the underlying issue reported with QCN9074, it seems we
> need to reenable threaded NAPI before fixing both bugs properly.
>

It seems that the revert has actually solved the issue reported with
QCN9074.

https://bugzilla.kernel.org/show_bug.cgi?id=217536

We have been trying to reproduce the problem on X86+QCN9074 (with
threaded NAPI) for quite some time, but there is no repro yet.

Actually, enabling/disabling threaded NAPI is a simple affair; I'm
surprised to hear that interrupts are blocked due to not having
threaded NAPI.

Which chip does the Lenovo ThinkPad X13s have?

Thanks,
Manikanta


2023-08-09 10:15:24

by Johan Hovold

Subject: Re: [PATCH] Revert "Revert "wifi: ath11k: Enable threaded NAPI""

On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
> On 8/9/2023 1:04 PM, Johan Hovold wrote:
> > This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
> >
> > Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
> > more interrupts received) almost immediately during RX.
> >
> > Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
> > Enable threaded NAPI") so that a simple revert is no longer possible.
> >
>
> This is getting as weird as it would get :)
>
> > As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
> > does not address the underlying issue reported with QCN9074, it seems we
> > need to reenable threaded NAPI before fixing both bugs properly.
> >
>
> It seems that the revert has actually solved the issue reported with
> QCN9074.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217536

Sure, but it's only a workaround as the underlying cause has not been
identified.

> We were trying to reproduce the problem on X86+QCN9074 (with threaded
> NAPI) from quite some time, but there is no repro yet.
>
> Actually, enabling/disabling threaded NAPI is a simple affair; I'm
> wondering to hear that interrupts are blocked due to not having
> threaded NAPI.

It sounds to me like the driver's locking is broken if moving to softirq
processing hangs the machine like this. But I have not had time to try
to track it down besides verifying that reenabling threaded NAPI makes
the problem go away.

> What is the chip that Lenovo Thinkpad X13s is having?

It's a WCN6855 (QCNFA765).

Johan

2023-08-10 05:00:37

by Manikanta Pubbisetty

Subject: Re: [PATCH] Revert "Revert "wifi: ath11k: Enable threaded NAPI""



On 8/9/2023 2:46 PM, Johan Hovold wrote:
> On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
>> On 8/9/2023 1:04 PM, Johan Hovold wrote:
>>> This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
>>>
>>> Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
>>> more interrupts received) almost immediately during RX.
>>>
>>> Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
>>> Enable threaded NAPI") so that a simple revert is no longer possible.
>>>
>>
>> This is getting as weird as it would get :)
>>
>>> As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
>>> does not address the underlying issue reported with QCN9074, it seems we
>>> need to reenable threaded NAPI before fixing both bugs properly.
>>>
>>
>> It seems that the revert has actually solved the issue reported with
>> QCN9074.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217536
>
> Sure, but it's only a workaround as the underlying cause has not been
> identified.
>
>> We were trying to reproduce the problem on X86+QCN9074 (with threaded
>> NAPI) from quite some time, but there is no repro yet.
>>
>> Actually, enabling/disabling threaded NAPI is a simple affair; I'm
>> wondering to hear that interrupts are blocked due to not having
>> threaded NAPI.
>
> It sounds to me like the driver's locking is broken if moving to softirq
> processing hangs the machine like this. But I have not had time to try
> to try to track it down besides verifying that reenabling threaded NAPI
> makes the problem go away.
>
>> What is the chip that Lenovo Thinkpad X13s is having?
>
> It's a WCN6855 (QCNFA765).
>

It is also worth trying this patch:
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
It seems to fix a known interrupt issue on WCN6855. Could you please
give it a try?

Thanks,
Manikanta

2023-08-10 06:03:09

by Manikanta Pubbisetty

Subject: Re: [PATCH] Revert "Revert "wifi: ath11k: Enable threaded NAPI""

On 8/9/2023 2:46 PM, Johan Hovold wrote:
> On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
>> On 8/9/2023 1:04 PM, Johan Hovold wrote:
>>> This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
>>>
>>> Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
>>> more interrupts received) almost immediately during RX.
>>>
>>> Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
>>> Enable threaded NAPI") so that a simple revert is no longer possible.
>>>
>>
>> This is getting as weird as it would get :)
>>
>>> As commit d265ebe41c91 ("Revert "wifi: ath11k: Enable threaded NAPI"")
>>> does not address the underlying issue reported with QCN9074, it seems we
>>> need to reenable threaded NAPI before fixing both bugs properly.
>>>
>>
>> It seems that the revert has actually solved the issue reported with
>> QCN9074.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217536
>
> Sure, but it's only a workaround as the underlying cause has not been
> identified.
>
>> We were trying to reproduce the problem on X86+QCN9074 (with threaded
>> NAPI) from quite some time, but there is no repro yet.
>>
>> Actually, enabling/disabling threaded NAPI is a simple affair; I'm
>> wondering to hear that interrupts are blocked due to not having
>> threaded NAPI.
>
> It sounds to me like the driver's locking is broken if moving to softirq
> processing hangs the machine like this. But I have not had time to try
> to try to track it down besides verifying that reenabling threaded NAPI
> makes the problem go away.
>
>> What is the chip that Lenovo Thinkpad X13s is having?
>
> It's a WCN6855 (QCNFA765).
>

WCN6855 & QCN9074 share the same driver code base since both are PCIe
devices. It is surprising that one works and the other does not. Do you
have a dmesg log from when this problem occurred?

We are working to root-cause the original problem. The hindrance as of
today is that we have not been able to reproduce it at Qualcomm so far.
We are planning to work with the reporter to get more logs.

Thanks,
Manikanta

2023-08-21 14:16:49

by Johan Hovold

Subject: Re: [PATCH] Revert "Revert "wifi: ath11k: Enable threaded NAPI""

On Thu, Aug 10, 2023 at 10:03:55AM +0530, Manikanta Pubbisetty wrote:
> On 8/9/2023 2:46 PM, Johan Hovold wrote:
> > On Wed, Aug 09, 2023 at 02:32:37PM +0530, Manikanta Pubbisetty wrote:
> > > On 8/9/2023 1:04 PM, Johan Hovold wrote:
> > > > This reverts commit d265ebe41c911314bd273c218a37088835959fa1.
> > > >
> > > > Disabling threaded NAPI causes the Lenovo ThinkPad X13s to hang (e.g. no
> > > > more interrupts received) almost immediately during RX.
> > > >
> > > > Apparently something broke since commit 13aa2fb692d3 ("wifi: ath11k:
> > > > Enable threaded NAPI") so that a simple revert is no longer possible.

> > > What is the chip that Lenovo Thinkpad X13s is having?
> >
> > It's a WCN6855 (QCNFA765).
>
> WCN6855 & QCN9074 share the same driver code base since both being PCIe
> devices. One working and another not working seems to be surprising. Do you
> have a dmesg log when this problem occurred?

I can't access the logs after I hit this bug (e.g. as there are no
interrupts received from the keyboard) but when I trigger this from the
console, there is nothing logged when the hang happens.

Later, secondary errors are logged from other drivers that are no longer
receiving interrupts either, and RCU detects a stall as I mentioned
elsewhere.

Johan