2022-02-02 09:52:59

by Keith Busch

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
> Hi Rafael & Christoph,
> My query is regarding the comment and the code that follows after it at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
> What I understood from it is that, there is an underlying assumption
> that the power to the devices is not removed during the suspend call.
> In the case of device-tree based platforms like Tegra194, power is
> indeed removed to the devices during suspend-resume process. Hence, the
> NVMe devices need to be taken through the shutdown path irrespective of
> whether the ASPM states are enabled or not.
> I would like to hear from you the best method to follow to achieve this.

Since platform makers can't converge on how to let a driver know what
it's supposed to do, I suggest we default to the simple shutdown suspend
all the time. We can add a module parameter to let a user request nvme
power management if they really want it. No matter what we do here,
someone is going to complain, but at least simple shutdown is safe...


2022-02-02 14:53:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On Tuesday, February 1, 2022 5:30:54 PM CET Keith Busch wrote:
> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
> > Hi Rafael & Christoph,
> > My query is regarding the comment and the code that follows after it at
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
> > What I understood from it is that, there is an underlying assumption
> > that the power to the devices is not removed during the suspend call.
> > In the case of device-tree based platforms like Tegra194, power is
> > indeed removed to the devices during suspend-resume process. Hence, the
> > NVMe devices need to be taken through the shutdown path irrespective of
> > whether the ASPM states are enabled or not.
> > I would like to hear from you the best method to follow to achieve this.
>
> Since platform makers can't converge on how to let a driver know what
> it's supposed to do, I suggest we default to the simple shutdown suspend
> all the time. We can add a module parameter to let a user request nvme
> power management if they really want it. No matter what we do here,
> someone is going to complain, but at least simple shutdown is safe...

Agreed.



2022-02-03 15:40:54

by Vidya Sagar

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

Thanks for the super quick reply and I couldn't agree more.

On 2/1/2022 10:00 PM, Keith Busch wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
>> Hi Rafael & Christoph,
>> My query is regarding the comment and the code that follows after it at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
>> What I understood from it is that, there is an underlying assumption
>> that the power to the devices is not removed during the suspend call.
>> In the case of device-tree based platforms like Tegra194, power is
>> indeed removed to the devices during suspend-resume process. Hence, the
>> NVMe devices need to be taken through the shutdown path irrespective of
>> whether the ASPM states are enabled or not.
>> I would like to hear from you the best method to follow to achieve this.
>
> Since platform makers can't converge on how to let a driver know what
> it's supposed to do, I suggest we default to the simple shutdown suspend
> all the time. We can add a module parameter to let a user request nvme
> power management if they really want it. No matter what we do here,
> someone is going to complain, but at least simple shutdown is safe...
>

2022-02-09 06:21:58

by Nitin Rawat

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On 2022-02-01 22:28, Vidya Sagar wrote:
> Thanks for the super quick reply and I couldn't agree more.
>
> On 2/1/2022 10:00 PM, Keith Busch wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
>>> Hi Rafael & Christoph,
>>> My query is regarding the comment and the code that follows after it
>>> at
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
>>> What I understood from it is that, there is an underlying assumption
>>> that the power to the devices is not removed during the suspend call.
>>> In the case of device-tree based platforms like Tegra194, power is
>>> indeed removed to the devices during suspend-resume process. Hence,
>>> the
>>> NVMe devices need to be taken through the shutdown path irrespective
>>> of
>>> whether the ASPM states are enabled or not.
>>> I would like to hear from you the best method to follow to achieve
>>> this.
>>
>> Since platform makers can't converge on how to let a driver know what
>> it's supposed to do, I suggest we default to the simple shutdown
>> suspend
>> all the time. We can add a module parameter to let a user request nvme
>> power management if they really want it. No matter what we do here,
>> someone is going to complain, but at least simple shutdown is safe...
>>

Hi Vidya,

Are you planning to add module parameter based on above discussion. I
see similar behaviour even with qualcomm platform.

[ 119.994092] nvme nvme0: I/O 9 QID 0 timeout, reset controller
[ 120.006612] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe4 returns
-16
[ 120.013502] nvme 0001:01:00.0: PM: pci_pm_resume+0x0/0xe4 returned
-16 after 60059958 usecs
[ 120.022239] nvme 0001:01:00.0: PM: failed to resume async: error -16

Regards,
Nitin