2023-11-20 17:18:49

by James Prestwood

[permalink] [raw]
Subject: Power save setting while interface is down

Hi,

I've noticed a somewhat rare behavior (<1% of the time) where bringing
an interface up fails after disabling power save. This is done by IWD
when a user-option is enabled, and follows an interface
removal/creation. The sequence is:

1. Remove default interface

2. Create new interface

3. Disable power save on new interface

4. Bring up new interface (rarely fails with -110).

iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save
for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110

I'm thinking maybe there is a race between creating the fresh interface
and disabling PS, then bringing the interface back up?

Setting PS after the interface is up (so far) seems to be 100% reliable
and have not yet seen this behavior. Regardless of any kernel fixes I'll
need to reorder the command sequence in IWD to support existing kernels,
but before I work around this issue I wanted to point it out so its not
hidden from view.

Thanks,

James


2023-11-21 14:16:44

by Kalle Valo

[permalink] [raw]
Subject: Re: Power save setting while interface is down

James Prestwood <[email protected]> writes:

> Hi,
>
> I've noticed a somewhat rare behavior (<1% of the time) where bringing
> an interface up fails after disabling power save. This is done by IWD
> when a user-option is enabled, and follows an interface
> removal/creation. The sequence is:
>
> 1. Remove default interface
>
> 2. Create new interface
>
> 3. Disable power save on new interface
>
> 4. Bring up new interface (rarely fails with -110).
>
> iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save
> for ifindex 54
> kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
> iwd[1286641]: Error bringing interface 54 up: Connection timed out
> kernel: ath10k_pci 0000:02:00.0: Could not init core: -110

So I guess you have a test setup to reproduce this easily? How long does
it take to see this?

What hardware and firmware version are you using?

> I'm thinking maybe there is a race between creating the fresh
> interface and disabling PS, then bringing the interface back up?

Yeah, looks like it. If it's relatively easy for you to reproduce this
you could try adding delays to suitable spots in ath10k. That might be
give more ideas where the race is exactly.

> Setting PS after the interface is up (so far) seems to be 100%
> reliable and have not yet seen this behavior. Regardless of any kernel
> fixes I'll need to reorder the command sequence in IWD to support
> existing kernels, but before I work around this issue I wanted to
> point it out so its not hidden from view.

To me working around this in user space doesn't sound like a good idea.
I would rather have a workaround in the kernel (but obviously I hope we
can find a proper fix).

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-11-21 15:14:23

by James Prestwood

[permalink] [raw]
Subject: Re: Power save setting while interface is down

Hi Kalle,

On 11/21/23 06:15, Kalle Valo wrote:
> James Prestwood <[email protected]> writes:
>
>> Hi,
>>
>> I've noticed a somewhat rare behavior (<1% of the time) where bringing
>> an interface up fails after disabling power save. This is done by IWD
>> when a user-option is enabled, and follows an interface
>> removal/creation. The sequence is:
>>
>> 1. Remove default interface
>>
>> 2. Create new interface
>>
>> 3. Disable power save on new interface
>>
>> 4. Bring up new interface (rarely fails with -110).
>>
>> iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save
>> for ifindex 54
>> kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
>> iwd[1286641]: Error bringing interface 54 up: Connection timed out
>> kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
> So I guess you have a test setup to reproduce this easily? How long does
> it take to see this?

It really depends. I was letting IWD do the work and waiting excessively
which could sometimes take an hour. But I managed to get it to happen 20
seconds apart using ip/iw in a loop. On average it usually happens
within 5-10 minutes, sometimes sooner.

$ dmesg --follow & $ for i in $(seq 1 4000); do sudo ip link set wlan0
down; sudo iw dev wlan0 set power_save off; sudo ip link set wlan0 up; done

[ 882.927218] ath10k_pci 0000:02:00.0: wmi service ready event not
received RTNETLINK answers: Connection timed out [ 883.016550]
ath10k_pci 0000:02:00.0: Could not init core: -110 [ 904.434930]
ath10k_pci 0000:02:00.0: wmi service ready event not received RTNETLINK
answers: Connection timed out [ 904.524287] ath10k_pci 0000:02:00.0:
Could not init core: -110

> What hardware and firmware version are you using?
[    5.680909] ath10k_pci 0000:02:00.0: qca6174 hw3.2 target 0x05030000
chip_id 0x00340aff sub 1dac:0261
[    5.680921] ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1
tracing 1 dfs 0 testmode 0

[    5.681607] ath10k_pci 0000:02:00.0: firmware ver
WLAN.RM.4.4.1-00309- api 6 features wowlan,ignore-otp,mfp crc32 0793bcf2

I've seen this on 00241, 00288, and the latest firmware above.

>> I'm thinking maybe there is a race between creating the fresh
>> interface and disabling PS, then bringing the interface back up?
> Yeah, looks like it. If it's relatively easy for you to reproduce this
> you could try adding delays to suitable spots in ath10k. That might be
> give more ideas where the race is exactly.

Ok, I can add a delay ahead of the WMI request that's timing out and see
what happens.

>
>> Setting PS after the interface is up (so far) seems to be 100%
>> reliable and have not yet seen this behavior. Regardless of any kernel
>> fixes I'll need to reorder the command sequence in IWD to support
>> existing kernels, but before I work around this issue I wanted to
>> point it out so its not hidden from view.
> To me working around this in user space doesn't sound like a good idea.
> I would rather have a workaround in the kernel (but obviously I hope we
> can find a proper fix).

Yep, totally agree.

Thanks,

James