2022-02-02 13:15:07

by Vidya Sagar

[permalink] [raw]
Subject: Query related to shutting down NVMe during system suspend

Hi Rafael & Christoph,
My query is regarding the comment and the code that follows after it at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
What I understood from it is that, there is an underlying assumption
that the power to the devices is not removed during the suspend call.
In the case of device-tree based platforms like Tegra194, power is
indeed removed to the devices during suspend-resume process. Hence, the
NVMe devices need to be taken through the shutdown path irrespective of
whether the ASPM states are enabled or not.
I would like to hear from you the best method to follow to achieve this.

Thanks & Regards,
Vidya Sagar


2022-02-08 08:42:43

by Vidya Sagar

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend



On 2/7/2022 4:27 PM, [email protected] wrote:
> External email: Use caution opening links or attachments
>
>
> On 2022-02-01 22:28, Vidya Sagar wrote:
>> Thanks for the super quick reply and I couldn't agree more.
>>
>> On 2/1/2022 10:00 PM, Keith Busch wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
>>>> Hi Rafael & Christoph,
>>>> My query is regarding the comment and the code that follows after it
>>>> at
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
>>>>
>>>> What I understood from it is that, there is an underlying assumption
>>>> that the power to the devices is not removed during the suspend call.
>>>> In the case of device-tree based platforms like Tegra194, power is
>>>> indeed removed to the devices during suspend-resume process. Hence,
>>>> the
>>>> NVMe devices need to be taken through the shutdown path irrespective
>>>> of
>>>> whether the ASPM states are enabled or not.
>>>> I would like to hear from you the best method to follow to achieve
>>>> this.
>>>
>>> Since platform makers can't converge on how to let a driver know what
>>> it's supposed to do, I suggest we default to the simple shutdown
>>> suspend
>>> all the time. We can add a module parameter to let a user request nvme
>>> power management if they really want it. No matter what we do here,
>>> someone is going to complain, but at least simple shutdown is safe...
>>>
>
> Hi Vidya,
>
> Are you planning to add module parameter based on above discussion. I
> see similar behaviour even with  qualcomm platform.
>
> [  119.994092] nvme nvme0: I/O 9 QID 0 timeout, reset controller
> [  120.006612] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe4 returns
> -16
> [  120.013502] nvme 0001:01:00.0: PM: pci_pm_resume+0x0/0xe4 returned
> -16 after 60059958 usecs
> [  120.022239] nvme 0001:01:00.0: PM: failed to resume async: error -16
Not really.
Keith Busch has already pushed a patch to fix it in a different way and
issue is resolved (on Tegra platforms) with that patch.
https://lore.kernel.org/all/[email protected]/
is that patch.

Thanks & Regards,
Vidya Sagar
>
> Regards,
> Nitin
>
>

2022-02-08 22:26:51

by Nitin Rawat

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On 2022-02-07 17:41, Vidya Sagar wrote:
> On 2/7/2022 4:27 PM, [email protected] wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2022-02-01 22:28, Vidya Sagar wrote:
>>> Thanks for the super quick reply and I couldn't agree more.
>>>
>>> On 2/1/2022 10:00 PM, Keith Busch wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
>>>>> Hi Rafael & Christoph,
>>>>> My query is regarding the comment and the code that follows after
>>>>> it
>>>>> at
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
>>>>> What I understood from it is that, there is an underlying
>>>>> assumption
>>>>> that the power to the devices is not removed during the suspend
>>>>> call.
>>>>> In the case of device-tree based platforms like Tegra194, power is
>>>>> indeed removed to the devices during suspend-resume process. Hence,
>>>>> the
>>>>> NVMe devices need to be taken through the shutdown path
>>>>> irrespective
>>>>> of
>>>>> whether the ASPM states are enabled or not.
>>>>> I would like to hear from you the best method to follow to achieve
>>>>> this.
>>>>
>>>> Since platform makers can't converge on how to let a driver know
>>>> what
>>>> it's supposed to do, I suggest we default to the simple shutdown
>>>> suspend
>>>> all the time. We can add a module parameter to let a user request
>>>> nvme
>>>> power management if they really want it. No matter what we do here,
>>>> someone is going to complain, but at least simple shutdown is
>>>> safe...
>>>>
>>
>> Hi Vidya,
>>
>> Are you planning to add module parameter based on above discussion. I
>> see similar behaviour even with  qualcomm platform.
>>
>> [  119.994092] nvme nvme0: I/O 9 QID 0 timeout, reset controller
>> [  120.006612] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe4 returns
>> -16
>> [  120.013502] nvme 0001:01:00.0: PM: pci_pm_resume+0x0/0xe4 returned
>> -16 after 60059958 usecs
>> [  120.022239] nvme 0001:01:00.0: PM: failed to resume async: error
>> -16
> Not really.
> Keith Busch has already pushed a patch to fix it in a different way
> and issue is resolved (on Tegra platforms) with that patch.
> https://lore.kernel.org/all/[email protected]/
> is that patch.
>
> Thanks & Regards,
> Vidya Sagar
>>
>> Regards,
>> Nitin
>>
>>

Thanks Vidya for pointing out the patch . This patch worked for us as
well.
@keith - Please can we get this merged .

Regards,
Nitin

2022-02-09 08:38:57

by Keith Busch

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On Mon, Feb 07, 2022 at 09:14:49PM +0530, [email protected] wrote:
> Thanks Vidya for pointing out the patch . This patch worked for us as well.
> @keith - Please can we get this merged .

I'm working on it, but I don't think the proposed patch was acceptable.
I'm going to look at another option today.

2022-02-09 23:19:32

by Keith Busch

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On Thu, Feb 10, 2022 at 02:47:47AM +0530, [email protected] wrote:
> IMO, the NVME driver is not associated with any device tree, Instead PCI
> driver is associated with device tree.

That may be good, though: the idle behavior isn't unqiue to nvme. It may
be fine if PCI is the common layer to own this, as long as drivers can
query it.

> So unlike ACPI based platform where we have platform specific DMI matching,
> we don't have equivalent check for DT based platform.

Is there any existing kernel API a driver can call to uniquely identify
such a platform?

> Do we see any concern if we introduce a module param with default not set to
> quick suspend.

As of now, the idea was proposed and was not accepted.

2022-02-09 23:25:57

by Keith Busch

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
> Hi Rafael & Christoph,
> My query is regarding the comment and the code that follows after it at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
> What I understood from it is that, there is an underlying assumption
> that the power to the devices is not removed during the suspend call.
> In the case of device-tree based platforms like Tegra194, power is
> indeed removed to the devices during suspend-resume process. Hence, the
> NVMe devices need to be taken through the shutdown path irrespective of
> whether the ASPM states are enabled or not.
> I would like to hear from you the best method to follow to achieve this.

Christoph prefers to append quirks for platforms that need full device
shutdown on s2idle instead of changing the driver default.

We use dmi matching for our current platform quirk list. I do not know
what the equivalent is for device-tree based platforms. Do you know?

2022-02-09 23:27:57

by Nitin Rawat

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On 2022-02-10 01:56, Keith Busch wrote:
> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
>> Hi Rafael & Christoph,
>> My query is regarding the comment and the code that follows after it
>> at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
>> What I understood from it is that, there is an underlying assumption
>> that the power to the devices is not removed during the suspend call.
>> In the case of device-tree based platforms like Tegra194, power is
>> indeed removed to the devices during suspend-resume process. Hence,
>> the
>> NVMe devices need to be taken through the shutdown path irrespective
>> of
>> whether the ASPM states are enabled or not.
>> I would like to hear from you the best method to follow to achieve
>> this.
>
> Christoph prefers to append quirks for platforms that need full device
> shutdown on s2idle instead of changing the driver default.
>
> We use dmi matching for our current platform quirk list. I do not know
> what the equivalent is for device-tree based platforms. Do you know?

Hi Keith,

IMO, the NVME driver is not associated with any device tree, Instead PCI
driver is associated with device tree.
So unlike ACPI based platform where we have platform specific DMI
matching, we don't have equivalent check for DT based platform.

Do we see any concern if we introduce a module param with default not
set to quick suspend.
For platform that need full shutdown durring resume , they can set it to
quick suspend.

2022-02-10 04:59:32

by Vidya Sagar

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend



On 2/10/2022 1:56 AM, Keith Busch wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Feb 01, 2022 at 09:52:28PM +0530, Vidya Sagar wrote:
>> Hi Rafael & Christoph,
>> My query is regarding the comment and the code that follows after it at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3243
>> What I understood from it is that, there is an underlying assumption
>> that the power to the devices is not removed during the suspend call.
>> In the case of device-tree based platforms like Tegra194, power is
>> indeed removed to the devices during suspend-resume process. Hence, the
>> NVMe devices need to be taken through the shutdown path irrespective of
>> whether the ASPM states are enabled or not.
>> I would like to hear from you the best method to follow to achieve this.
>
> Christoph prefers to append quirks for platforms that need full device
> shutdown on s2idle instead of changing the driver default.
>
> We use dmi matching for our current platform quirk list. I do not know
> what the equivalent is for device-tree based platforms. Do you know?
I'm afraid I don't.
Rob, could you please help here?

>

2022-02-10 09:55:47

by Lukas Wunner

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On Thu, Feb 10, 2022 at 09:41:04AM +0530, Vidya Sagar wrote:
> On 2/10/2022 1:56 AM, Keith Busch wrote:
> > Christoph prefers to append quirks for platforms that need full device
> > shutdown on s2idle instead of changing the driver default.
> >
> > We use dmi matching for our current platform quirk list. I do not know
> > what the equivalent is for device-tree based platforms. Do you know?
>
> I'm afraid I don't.

of_machine_is_compatible()

2022-02-10 13:38:41

by Nitin Rawat

[permalink] [raw]
Subject: Re: Query related to shutting down NVMe during system suspend

On 2022-02-10 11:10, Lukas Wunner wrote:
> On Thu, Feb 10, 2022 at 09:41:04AM +0530, Vidya Sagar wrote:
>> On 2/10/2022 1:56 AM, Keith Busch wrote:
>> > Christoph prefers to append quirks for platforms that need full device
>> > shutdown on s2idle instead of changing the driver default.
>> >
>> > We use dmi matching for our current platform quirk list. I do not know
>> > what the equivalent is for device-tree based platforms. Do you know?
>>
>> I'm afraid I don't.
>
> of_machine_is_compatible()

Thanks Lukas and keith . Yes it worked using of_machine_is_compatible as
it checks root node of device tree for a given compatible value.
@keith - I have posted the change using above API to enable nvme quick
suspend quirks for sc7280 platform. Please can you review it.