2024-01-29 10:10:55

by Baochen Qiang

Subject: ath11k resume fails due to kernel blocks probing MHI virtual devices

Hi Rafael and Pavel,

I am currently facing a resume issue in ath11k (a kernel WLAN driver)
that involves the kernel PM framework and the MHI module.

Before going into the details, I'd like to summarize how ath11k
interacts with the MHI stack to download WLAN firmware to the hardware
target:
1. When booting or restarting, ath11k powers on the MHI module and waits
for the MHI channels to be ready.
2. On power-on, the MHI stack creates virtual MHI devices, which
represent the MHI hardware channels, and adds them to the MHI bus. This
causes the MHI client driver, QRTR, to be matched and to probe those
MHI devices. In probe, QRTR initializes the MHI channels and finally
moves them to the ready state.
3. Once the MHI channels are ready, ath11k downloads the WLAN firmware
to the hardware target, and WLAN is then operational.
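
The three steps above can be sketched as a small userspace toy model.
This is not kernel code: the class and function names (Bus, Channel,
qrtr_probe, power_on, boot) and the channel names "chan0"/"chan1" are
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str            # hypothetical channel name
    ready: bool = False


@dataclass
class Bus:
    """Toy stand-in for the MHI bus: adding a device probes it at once."""
    devices: list = field(default_factory=list)

    def add_device(self, chan, probe):
        self.devices.append(chan)
        probe(chan)  # the client driver is matched and its probe runs


def qrtr_probe(chan):
    # Step 2: the client driver initializes the channel and marks it ready.
    chan.ready = True


def power_on(bus, names):
    # Step 2: powering on creates one virtual device per hardware channel.
    chans = [Channel(n) for n in names]
    for c in chans:
        bus.add_device(c, qrtr_probe)
    return chans


def boot(bus):
    # Step 1: power on and wait for the channels.
    chans = power_on(bus, ["chan0", "chan1"])
    # Step 3: firmware download is possible only once every channel is ready.
    return "firmware downloaded" if all(c.ready for c in chans) else "timeout"
```

In this model the probe runs synchronously inside add_device(), which
is why boot() succeeds; the hibernation problem appears once that probe
can no longer run at the time the device is added.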

This flow works well in general, but it causes problems in the
hibernation cycle. When preparing for hibernation, ath11k powers down
MHI, which destroys the MHI devices and causes QRTR to reset the MHI
channels. When resuming from hibernation, ath11k powers on MHI and waits
in its resume callback for the MHI channels to be ready. As described
above, MHI creates the MHI devices and adds them to the MHI bus, but
they cannot be probed at that point because device probing is still
blocked by device_block_probing(). As a result, the ath11k resume times
out.
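
The failure mode can be illustrated with a userspace toy model of probe
gating. It is only a sketch of the behavior described above, not the
kernel's implementation: DriverCore, its methods, the resume() helper
and the channel name "chan0" are all hypothetical.

```python
class DriverCore:
    """Toy model of probe gating; not the kernel's implementation."""

    def __init__(self):
        self.probing_blocked = False
        self.deferred = []   # devices whose probe has been postponed
        self.ready = set()   # channels the client driver has initialized

    def block_probing(self):
        self.probing_blocked = True

    def unblock_probing(self):
        self.probing_blocked = False
        pending, self.deferred = self.deferred, []
        for dev in pending:
            self.add_device(dev)  # postponed probes run now

    def add_device(self, dev):
        if self.probing_blocked:
            self.deferred.append(dev)  # probe postponed
        else:
            self.ready.add(dev)        # probe runs, channel becomes ready


def resume(core, channels, polls=3):
    # The resume callback re-creates the devices, then polls for readiness.
    for ch in channels:
        core.add_device(ch)
    for _ in range(polls):
        # Nothing changes between polls: no probe can run while blocked.
        if core.ready >= set(channels):
            return "resumed"
    return "timeout"
```

With probing blocked, resume(core, ["chan0"]) can only time out, because
the one event that would make the channel ready is queued behind the
unblock step, which in this model happens after resume() has returned.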

There is a potential fix for this issue that requires changes in the
MHI stack, i.e., not destroying the MHI devices while hibernating. We
have discussed this change at length with the MHI community, see [1]
and [2].

However, Mani (the MHI maintainer) doesn't think it is right to fix
this in the MHI stack. Instead, he suggested that we might need to add a
new PM callback that is called after device probing is unblocked. By
registering such a callback, ath11k can wait for the driver it depends
on, i.e., QRTR, to probe and initialize those MHI devices.

Your thoughts?


[1] https://lists.infradead.org/pipermail/ath11k/2023-December/005098.html
[2] https://lists.infradead.org/pipermail/ath11k/2024-January/005205.html


2024-01-29 12:23:12

by Rafael J. Wysocki

Subject: Re: ath11k resume fails due to kernel blocks probing MHI virtual devices

On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <[email protected]> wrote:
>
> There is a potential fix for this issue that requires changes in the
> MHI stack, i.e., not destroying the MHI devices while hibernating.

Exactly.

> We have discussed this change at length with the MHI community, see [1]
> and [2].
>
> However, Mani (the MHI maintainer) doesn't think it is right to fix
> this in the MHI stack. Instead, he suggested that we might need to add a
> new PM callback that is called after device probing is unblocked. By
> registering such a callback, ath11k can wait for the driver it depends
> on, i.e., QRTR, to probe and initialize those MHI devices.
>
> Your thoughts?

I'm not quite sure why one would do the pointless device destruction
and re-creation in the hibernation flow and then add a new callback to
the PM core to work around it.

It doesn't sound like a straightforward approach to me.

> [1] https://lists.infradead.org/pipermail/ath11k/2023-December/005098.html
> [2] https://lists.infradead.org/pipermail/ath11k/2024-January/005205.html

2024-01-29 12:31:30

by Manivannan Sadhasivam

Subject: Re: ath11k resume fails due to kernel blocks probing MHI virtual devices

On Mon, Jan 29, 2024 at 01:22:27PM +0100, Rafael J. Wysocki wrote:
> On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <[email protected]> wrote:
> >
> > There is a potential fix for this issue that requires changes in the
> > MHI stack, i.e., not destroying the MHI devices while hibernating.
>
> Exactly.
>

During hibernation, power to the ath11k device could be lost, and in that case
there will be no channels available from the device. So keeping the "struct
device" when there is no real device attached to the system goes against the
driver model IMO, since we would be messing with the refcount.

For instance, in the case of USB, if the device gets unplugged, would it make
sense to keep the "struct device" for the device in the kernel in the hope that
it would come back again?

The driver model, as I understand it, is that once the actual physical device
gets removed, the refcount for "struct device" should be decremented and it
should be destroyed.

- Mani

--
மணிவண்ணன் சதாசிவம்

2024-01-29 12:37:59

by Rafael J. Wysocki

Subject: Re: ath11k resume fails due to kernel blocks probing MHI virtual devices

On Mon, Jan 29, 2024 at 1:31 PM Manivannan Sadhasivam <[email protected]> wrote:
>
> During hibernation, power to the ath11k device could be lost, and in that case
> there will be no channels available from the device. So keeping the "struct
> device" when there is no real device attached to the system goes against the
> driver model IMO, since we would be messing with the refcount.

But this is system hibernation or suspend and the reason for the power
loss is quite different from device removal at run time.

The device is going to be back during resume (or at least it is not
expected to go away in the meantime), so it is pointless to destroy
its representation in memory.
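
The difference between the two policies can be sketched with a userspace
toy model. It is an illustration under stated assumptions, not kernel or
MHI code: the Stack class, its keep_devices flag, the cycle() helper and
the device name "chan0" are hypothetical.

```python
class Stack:
    """Toy MHI-like stack; keep_devices selects the hibernation policy."""

    def __init__(self, keep_devices):
        self.keep_devices = keep_devices
        self.bound = set()            # devices with a driver bound
        self.probing_blocked = False

    def add_device(self, name):
        # A newly added device only gets a driver if probing is allowed.
        if not self.probing_blocked:
            self.bound.add(name)

    def hibernate(self):
        if not self.keep_devices:
            self.bound.clear()        # devices destroyed, driver unbound
        self.probing_blocked = True

    def resume(self, names):
        if not self.keep_devices:
            for n in names:
                self.add_device(n)    # re-created while probing is blocked
        return self.bound >= set(names)


def cycle(keep_devices):
    # One boot, hibernate, resume cycle; True means resume found its devices.
    stack = Stack(keep_devices)
    stack.add_device("chan0")
    stack.hibernate()
    return stack.resume(["chan0"])
```

Here cycle(True) succeeds because the device representation survives and
no new probe is needed during resume, while cycle(False) fails because
the re-created device cannot be probed yet.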

> For instance, in the case of USB, if the device gets unplugged, would it make
> sense to keep the "struct device" for the device in the kernel in the hope that
> it would come back again?

At run time - no, during system suspend - yes.

It is not even recommended to free IRQs during system suspend.

> The driver model, as I understand it, is that once the actual physical device
> gets removed, the refcount for "struct device" should be decremented and it
> should be destroyed.

Not really.

Thanks!

2024-01-29 12:51:14

by Manivannan Sadhasivam

Subject: Re: ath11k resume fails due to kernel blocks probing MHI virtual devices

On Mon, Jan 29, 2024 at 01:37:41PM +0100, Rafael J. Wysocki wrote:
> On Mon, Jan 29, 2024 at 1:31 PM Manivannan Sadhasivam <[email protected]> wrote:
> >
> > During hibernation, power to the ath11k device could be lost, and in that case
> > there will be no channels available from the device. So keeping the "struct
> > device" when there is no real device attached to the system goes against the
> > driver model IMO, since we would be messing with the refcount.
>
> But this is system hibernation or suspend and the reason for the power
> loss is quite different from device removal at run time.
>
> The device is going to be back during resume (or at least it is not
> expected to go away in the meantime), so it is pointless to destroy
> its representation in memory.
>
> > For instance, in the case of USB, if the device gets unplugged, would it make
> > sense to keep the "struct device" for the device in the kernel in the hope that
> > it would come back again?
>
> At run time - no, during system suspend - yes.
>
> It is not even recommended to free IRQs during system suspend.
>

Hmm, okay. Thanks for clearing it up.

> > The driver model, as I understand it, is that once the actual physical device
> > gets removed, the refcount for "struct device" should be decremented and it
> > should be destroyed.
>
> Not really.
>

Okay. My understanding seems to be wrong then. I will move forward with the
proposal to keep the devices.

- Mani

--
மணிவண்ணன் சதாசிவம்