2021-10-14 07:40:03

by Oliver Sang

[permalink] [raw]
Subject: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not assume that "mem" is always present")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


in testcase: kernel-selftests
version: kernel-selftests-x86_64-c8c9111a-1_20210929
with following parameters:

group: group-00
ucode: 0x11

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):




If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


TAP version 13
1..2
# selftests: breakpoints: step_after_suspend_test
# TAP version 13
# Bail out! Failed to enter Suspend state
# # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
not ok 1 selftests: breakpoints: step_after_suspend_test # exit=1



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (1.84 kB)
config-5.15.0-rc1-00001-gbfcc1e67ff1e (179.13 kB)
job-script (6.25 kB)
dmesg.xz (27.78 kB)
kernel-selftests (8.41 kB)
job.yaml (5.27 kB)
reproduce (166.00 B)
Download all attachments

2021-10-14 18:49:12

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/14/21 12:57 AM, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not assume that "mem" is always present")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
>
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-c8c9111a-1_20210929
> with following parameters:
>
> group: group-00
> ucode: 0x11
>
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>
>
> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>

Thanks for your report. Assuming that the code responsible for
registering the suspend operations is drivers/acpi/sleep.c for your
platform, and that acpi_sleep_suspend_setup() iterated over all possible
sleep states, your platform must somehow be returning that ACPI_STATE_S3
is not a supported state somehow?

Rafael have you ever encountered something like that?

>
>
> TAP version 13
> 1..2
> # selftests: breakpoints: step_after_suspend_test
> # TAP version 13
> # Bail out! Failed to enter Suspend state
> # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
> not ok 1 selftests: breakpoints: step_after_suspend_test # exit=1
>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
>
> ---
> 0DAY/LKP+ Test Infrastructure Open Source Technology Center
> https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
>
> Thanks,
> Oliver Sang
>


--
Florian

2021-10-15 00:30:15

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/14/2021 6:26 PM, Florian Fainelli wrote:
> On 10/14/21 12:57 AM, kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not assume that "mem" is always present")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>>
>> in testcase: kernel-selftests
>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>> with following parameters:
>>
>> group: group-00
>> ucode: 0x11
>>
>> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>
>>
>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>>
>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <[email protected]>
> Thanks for your report. Assuming that the code responsible for
> registering the suspend operations is drivers/acpi/sleep.c for your
> platform, and that acpi_sleep_suspend_setup() iterated over all possible
> sleep states, your platform must somehow be returning that ACPI_STATE_S3
> is not a supported state somehow?
>
> Rafael have you ever encountered something like that?

Yes, there are systems with ACPI that don't support S3.


>>
>> TAP version 13
>> 1..2
>> # selftests: breakpoints: step_after_suspend_test
>> # TAP version 13
>> # Bail out! Failed to enter Suspend state
>> # # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
>> not ok 1 selftests: breakpoints: step_after_suspend_test # exit=1
>>
>>
>>
>> To reproduce:
>>
>> git clone https://github.com/intel/lkp-tests.git
>> cd lkp-tests
>> sudo bin/lkp install job.yaml # job file is attached in this email
>> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>> sudo bin/lkp run generated-yaml-file
>>
>> # if come across any failure that blocks the test,
>> # please remove ~/.lkp and /lkp dir to run from a clean state.
>>
>>
>>
>> ---
>> 0DAY/LKP+ Test Infrastructure Open Source Technology Center
>> https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
>>
>> Thanks,
>> Oliver Sang
>>
>

2021-10-15 03:47:11

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>
>>> Greeting,
>>>
>>> FYI, we noticed the following commit (built with gcc-9):
>>>
>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>> assume that "mem" is always present")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>>
>>> in testcase: kernel-selftests
>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>> with following parameters:
>>>
>>>     group: group-00
>>>     ucode: 0x11
>>>
>>> test-description: The kernel contains a set of "self tests" under the
>>> tools/testing/selftests/ directory. These are intended to be small
>>> unit tests to exercise individual code paths in the kernel.
>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>
>>>
>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295
>>> @ 1.50GHz with 80G memory
>>>
>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>> log/backtrace):
>>>
>>>
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <[email protected]>
>> Thanks for your report. Assuming that the code responsible for
>> registering the suspend operations is drivers/acpi/sleep.c for your
>> platform, and that acpi_sleep_suspend_setup() iterated over all possible
>> sleep states, your platform must somehow be returning that ACPI_STATE_S3
>> is not a supported state somehow?
>>
>> Rafael have you ever encountered something like that?
>
> Yes, there are systems with ACPI that don't support S3.

OK and do you know what happens when we enter suspend with "mem" in
those cases? Do we immediately return because ultimately the firmware
does not support ACPI S3?

I will rework
tools/testing/selftests/breakpoints/step_after_suspend_test.c not to
assume that "mem" is always available, since it may not be.
--
Florian

2021-10-17 18:34:06

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/14/2021 11:55 PM, Florian Fainelli wrote:
> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>> assume that "mem" is always present")
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>
>>>>
>>>> in testcase: kernel-selftests
>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>> with following parameters:
>>>>
>>>>     group: group-00
>>>>     ucode: 0x11
>>>>
>>>> test-description: The kernel contains a set of "self tests" under the
>>>> tools/testing/selftests/ directory. These are intended to be small
>>>> unit tests to exercise individual code paths in the kernel.
>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>
>>>>
>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295
>>>> @ 1.50GHz with 80G memory
>>>>
>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>> log/backtrace):
>>>>
>>>>
>>>>
>>>>
>>>> If you fix the issue, kindly add following tag
>>>> Reported-by: kernel test robot <[email protected]>
>>> Thanks for your report. Assuming that the code responsible for
>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>> platform, and that acpi_sleep_suspend_setup() iterated over all possible
>>> sleep states, your platform must somehow be returning that ACPI_STATE_S3
>>> is not a supported state somehow?
>>>
>>> Rafael have you ever encountered something like that?
>> Yes, there are systems with ACPI that don't support S3.
> OK and do you know what happens when we enter suspend with "mem" in
> those cases? Do we immediately return because ultimately the firmware
> does not support ACPI S3?

"mem" should not be present in the list of available strings then, so it
should be rejected right away.


> I will rework
> tools/testing/selftests/breakpoints/step_after_suspend_test.c not to
> assume that "mem" is always available, since it may not be.


2021-10-17 22:04:18

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>
>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>> assume that "mem" is always present")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>> master
>>>>>
>>>>>
>>>>> in testcase: kernel-selftests
>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>> with following parameters:
>>>>>
>>>>>      group: group-00
>>>>>      ucode: 0x11
>>>>>
>>>>> test-description: The kernel contains a set of "self tests" under the
>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>> unit tests to exercise individual code paths in the kernel.
>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>
>>>>>
>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295
>>>>> @ 1.50GHz with 80G memory
>>>>>
>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>> log/backtrace):
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If you fix the issue, kindly add following tag
>>>>> Reported-by: kernel test robot <[email protected]>
>>>> Thanks for your report. Assuming that the code responsible for
>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>> possible
>>>> sleep states, your platform must somehow be returning that
>>>> ACPI_STATE_S3
>>>> is not a supported state somehow?
>>>>
>>>> Rafael have you ever encountered something like that?
>>> Yes, there are systems with ACPI that don't support S3.
>> OK and do you know what happens when we enter suspend with "mem" in
>> those cases? Do we immediately return because ultimately the firmware
>> does not support ACPI S3?
>
> "mem" should not be present in the list of available strings then, so it
> should be rejected right away.

Well yes, that was the purpose of the patch I submitted, but assuming
that we did provide "mem" as one of the possible standby modes even
though that was wrong (before patch), and the test was trying to enter
ACPI S3 standby, what would have happened, would the ACPI firmware honor
the request but return an error, or would it actually enter ACPI S3?

In any case, I will change the test to check that this is a supported
standby mode before trying it.
--
Florian

2021-10-19 18:57:45

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/15/2021 9:40 PM, Florian Fainelli wrote:
> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>>> Greeting,
>>>>>>
>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>
>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>>> assume that "mem" is always present")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>> master
>>>>>>
>>>>>>
>>>>>> in testcase: kernel-selftests
>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>>> with following parameters:
>>>>>>
>>>>>>      group: group-00
>>>>>>      ucode: 0x11
>>>>>>
>>>>>> test-description: The kernel contains a set of "self tests" under the
>>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>>> unit tests to exercise individual code paths in the kernel.
>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>>
>>>>>>
>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU 7295
>>>>>> @ 1.50GHz with 80G memory
>>>>>>
>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>> log/backtrace):
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> If you fix the issue, kindly add following tag
>>>>>> Reported-by: kernel test robot <[email protected]>
>>>>> Thanks for your report. Assuming that the code responsible for
>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>>> possible
>>>>> sleep states, your platform must somehow be returning that
>>>>> ACPI_STATE_S3
>>>>> is not a supported state somehow?
>>>>>
>>>>> Rafael have you ever encountered something like that?
>>>> Yes, there are systems with ACPI that don't support S3.
>>> OK and do you know what happens when we enter suspend with "mem" in
>>> those cases? Do we immediately return because ultimately the firmware
>>> does not support ACPI S3?
>> "mem" should not be present in the list of available strings then, so it
>> should be rejected right away.
> Well yes, that was the purpose of the patch I submitted, but assuming
> that we did provide "mem" as one of the possible standby modes even
> though that was wrong (before patch), and the test was trying to enter
> ACPI S3 standby, what would have happened, would the ACPI firmware honor
> the request but return an error, or would it actually enter ACPI S3?
>
> In any case, I will change the test to check that this is a supported
> standby mode before trying it.

Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
breaks user space compatibility and that's got caught properly by the test.

What happens is that "mem" is a "pointer" to a secondary list of
possible states and that generally is "s2idle shallow deep" and if
s2idle is the only available option, it will be just "s2idle".

This list is there in /sys/power/mem_sleep.

It was done this way, because some variants of user space expect "mem"
to be always present and don't recognize "freeze" properly.

Sorry for the confusion.


2021-10-19 19:05:44

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>>>> Greeting,
>>>>>>>
>>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>>
>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>>>> assume that "mem" is always present")
>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>>> master
>>>>>>>
>>>>>>>
>>>>>>> in testcase: kernel-selftests
>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>>>> with following parameters:
>>>>>>>
>>>>>>>       group: group-00
>>>>>>>       ucode: 0x11
>>>>>>>
>>>>>>> test-description: The kernel contains a set of "self tests" under
>>>>>>> the
>>>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>>>> unit tests to exercise individual code paths in the kernel.
>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>>>
>>>>>>>
>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
>>>>>>> 7295
>>>>>>> @ 1.50GHz with 80G memory
>>>>>>>
>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>>> log/backtrace):
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If you fix the issue, kindly add following tag
>>>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>> Thanks for your report. Assuming that the code responsible for
>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>>>> possible
>>>>>> sleep states, your platform must somehow be returning that
>>>>>> ACPI_STATE_S3
>>>>>> is not a supported state somehow?
>>>>>>
>>>>>> Rafael have you ever encountered something like that?
>>>>> Yes, there are systems with ACPI that don't support S3.
>>>> OK and do you know what happens when we enter suspend with "mem" in
>>>> those cases? Do we immediately return because ultimately the firmware
>>>> does not support ACPI S3?
>>> "mem" should not be present in the list of available strings then, so it
>>> should be rejected right away.
>> Well yes, that was the purpose of the patch I submitted, but assuming
>> that we did provide "mem" as one of the possible standby modes even
>> though that was wrong (before patch), and the test was trying to enter
>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
>> the request but return an error, or would it actually enter ACPI S3?
>>
>> In any case, I will change the test to check that this is a supported
>> standby mode before trying it.
>
> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
> breaks user space compatibility and that's got caught properly by the test.

Reverting my commit will break powerpc and other ARM/ARM64 platforms
where mem is not supported (via PSCI), I have a change pending for PSCI
that will actually check that SYSTEM_SUSPEND is supported before
unconditionally making use of it.

>
> What happens is that "mem" is a "pointer" to a secondary list of
> possible states and that generally is "s2idle shallow deep" and if
> s2idle is the only available option, it will be just "s2idle".
>
> This list is there in /sys/power/mem_sleep.
>
> It was done this way, because some variants of user space expect "mem"
> to be always present and don't recognize "freeze" properly.
>
> Sorry for the confusion.

So how do we all get our cookie here? Should we just slap an #ifndef
CONFIG_ACPI in order to allow platforms that do not have "mem" to not
have it?
--
Florian

2021-10-20 13:51:03

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <[email protected]> wrote:
>
> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
> > On 10/15/2021 9:40 PM, Florian Fainelli wrote:
> >> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
> >>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
> >>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
> >>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
> >>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
> >>>>>>> Greeting,
> >>>>>>>
> >>>>>>> FYI, we noticed the following commit (built with gcc-9):
> >>>>>>>
> >>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
> >>>>>>> assume that "mem" is always present")
> >>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
> >>>>>>> master
> >>>>>>>
> >>>>>>>
> >>>>>>> in testcase: kernel-selftests
> >>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
> >>>>>>> with following parameters:
> >>>>>>>
> >>>>>>> group: group-00
> >>>>>>> ucode: 0x11
> >>>>>>>
> >>>>>>> test-description: The kernel contains a set of "self tests" under
> >>>>>>> the
> >>>>>>> tools/testing/selftests/ directory. These are intended to be small
> >>>>>>> unit tests to exercise individual code paths in the kernel.
> >>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >>>>>>>
> >>>>>>>
> >>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
> >>>>>>> 7295
> >>>>>>> @ 1.50GHz with 80G memory
> >>>>>>>
> >>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
> >>>>>>> log/backtrace):
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> If you fix the issue, kindly add following tag
> >>>>>>> Reported-by: kernel test robot <[email protected]>
> >>>>>> Thanks for your report. Assuming that the code responsible for
> >>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
> >>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
> >>>>>> possible
> >>>>>> sleep states, your platform must somehow be returning that
> >>>>>> ACPI_STATE_S3
> >>>>>> is not a supported state somehow?
> >>>>>>
> >>>>>> Rafael have you ever encountered something like that?
> >>>>> Yes, there are systems with ACPI that don't support S3.
> >>>> OK and do you know what happens when we enter suspend with "mem" in
> >>>> those cases? Do we immediately return because ultimately the firmware
> >>>> does not support ACPI S3?
> >>> "mem" should not be present in the list of available strings then, so it
> >>> should be rejected right away.
> >> Well yes, that was the purpose of the patch I submitted, but assuming
> >> that we did provide "mem" as one of the possible standby modes even
> >> though that was wrong (before patch), and the test was trying to enter
> >> ACPI S3 standby, what would have happened, would the ACPI firmware honor
> >> the request but return an error, or would it actually enter ACPI S3?
> >>
> >> In any case, I will change the test to check that this is a supported
> >> standby mode before trying it.
> >
> > Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
> > breaks user space compatibility and that's got caught properly by the test.
>
> Reverting my commit will break powerpc and other ARM/ARM64 platforms
> where mem is not supported (via PSCI),

It won't break anything, although the things that didn't work before
will still not work after it.

And "mem" is always supported even if there are no suspend_ops at all,
in which case it becomes an alternative way to trigger s2idle.

So, on the affected systems, what's there in /sys/power/? Is
mem_sleep present? If so, what's in it?

> I have a change pending for PSCI
> that will actually check that SYSTEM_SUSPEND is supported before
> unconditionally making use of it.
>
> >
> > What happens is that "mem" is a "pointer" to a secondary list of
> > possible states and that generally is "s2idle shallow deep" and if
> > s2idle is the only available option, it will be just "s2idle".
> >
> > This list is there in /sys/power/mem_sleep.
> >
> > It was done this way, because some variants of user space expect "mem"
> > to be always present and don't recognize "freeze" properly.
> >
> > Sorry for the confusion.
>
> So how do we all get our cookie here? Should we just slap an #ifndef
> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
> have it?

Certainly not.

I've just hacked my test-bed system with ACPI so it does not register
any suspend_ops at all and I have "freeze mem disk" in
/sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
to /sys/power/state causes s2idle to be carried out.

Since this is the expected behavior, I'm not sure what the problem is.

2021-10-20 15:36:15

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail



On 10/20/2021 6:49 AM, Rafael J. Wysocki wrote:
> On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <[email protected]> wrote:
>>
>> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
>>> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
>>>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
>>>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>>>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>>>>>> Greeting,
>>>>>>>>>
>>>>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>>>>
>>>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>>>>>> assume that "mem" is always present")
>>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>>>>> master
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> in testcase: kernel-selftests
>>>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>>>>>> with following parameters:
>>>>>>>>>
>>>>>>>>> group: group-00
>>>>>>>>> ucode: 0x11
>>>>>>>>>
>>>>>>>>> test-description: The kernel contains a set of "self tests" under
>>>>>>>>> the
>>>>>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>>>>>> unit tests to exercise individual code paths in the kernel.
>>>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
>>>>>>>>> 7295
>>>>>>>>> @ 1.50GHz with 80G memory
>>>>>>>>>
>>>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>>>>> log/backtrace):
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you fix the issue, kindly add following tag
>>>>>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>>>> Thanks for your report. Assuming that the code responsible for
>>>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>>>>>> possible
>>>>>>>> sleep states, your platform must somehow be returning that
>>>>>>>> ACPI_STATE_S3
>>>>>>>> is not a supported state somehow?
>>>>>>>>
>>>>>>>> Rafael have you ever encountered something like that?
>>>>>>> Yes, there are systems with ACPI that don't support S3.
>>>>>> OK and do you know what happens when we enter suspend with "mem" in
>>>>>> those cases? Do we immediately return because ultimately the firmware
>>>>>> does not support ACPI S3?
>>>>> "mem" should not be present in the list of available strings then, so it
>>>>> should be rejected right away.
>>>> Well yes, that was the purpose of the patch I submitted, but assuming
>>>> that we did provide "mem" as one of the possible standby modes even
>>>> though that was wrong (before patch), and the test was trying to enter
>>>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
>>>> the request but return an error, or would it actually enter ACPI S3?
>>>>
>>>> In any case, I will change the test to check that this is a supported
>>>> standby mode before trying it.
>>>
>>> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
>>> breaks user space compatibility and that's got caught properly by the test.
>>
>> Reverting my commit will break powerpc and other ARM/ARM64 platforms
>> where mem is not supported (via PSCI),
>
> It won't break anything, although the things that didn't work before
> will still not work after it.
>
> And "mem" is always supported even if there are no suspend_ops at all,
> in which case it becomes an alternative way to trigger s2idle.
>
> So, on the affected systems, what's there in /sys/power/? Is
> mem_sleep present? If so, what's in it?

With 4.9 which is what I used initially:

# cat /sys/power/state
freeze standby
# cat /sys/power/
pm_async pm_print_times pm_wakeup_irq wakeup_count
pm_freeze_timeout pm_test state

With a newer kernel without my patch:

# cat /sys/power/state
freeze standby mem
# cat /sys/power/mem_sleep
s2idle shallow [deep]
# cat /sys/power/
mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
pm_async pm_print_times state
pm_debug_messages pm_test suspend_stats/


>
>> I have a change pending for PSCI
>> that will actually check that SYSTEM_SUSPEND is supported before
>> unconditionally making use of it.
>>
>>>
>>> What happens is that "mem" is a "pointer" to a secondary list of
>>> possible states and that generally is "s2idle shallow deep" and if
>>> s2idle is the only available option, it will be just "s2idle".
>>>
>>> This list is there in /sys/power/mem_sleep.
>>>
>>> It was done this way, because some variants of user space expect "mem"
>>> to be always present and don't recognize "freeze" properly.
>>>
>>> Sorry for the confusion.
>>
>> So how do we all get our cookie here? Should we just slap an #ifndef
>> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
>> have it?
>
> Certainly not.
>
> I've just hacked my test-bed system with ACPI so it does not register
> any suspend_ops at all and I have "freeze mem disk" in
> /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
> to /sys/power/state causes s2idle to be carried out.
>
> Since this is the expected behavior, I'm not sure what the problem is.

The problem is advertising "mem" in /sys/power/state when the state is
not actually supported by the platform firmware here, whether that
translates into the form of s2idle or not. It is not supported, and it
should not be there IMHO. I was late to the game in identifying that,
but the 4.9 kernel makes sense to me.

Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
don't look at what ->valid tells us.
--
Florian

2021-10-20 16:02:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On Wed, Oct 20, 2021 at 5:34 PM Florian Fainelli <[email protected]> wrote:
>
>
>
> On 10/20/2021 6:49 AM, Rafael J. Wysocki wrote:
> > On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <[email protected]> wrote:
> >>
> >> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
> >>> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
> >>>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
> >>>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
> >>>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
> >>>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
> >>>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
> >>>>>>>>> Greeting,
> >>>>>>>>>
> >>>>>>>>> FYI, we noticed the following commit (built with gcc-9):
> >>>>>>>>>
> >>>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
> >>>>>>>>> assume that "mem" is always present")
> >>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
> >>>>>>>>> master
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> in testcase: kernel-selftests
> >>>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
> >>>>>>>>> with following parameters:
> >>>>>>>>>
> >>>>>>>>> group: group-00
> >>>>>>>>> ucode: 0x11
> >>>>>>>>>
> >>>>>>>>> test-description: The kernel contains a set of "self tests" under
> >>>>>>>>> the
> >>>>>>>>> tools/testing/selftests/ directory. These are intended to be small
> >>>>>>>>> unit tests to exercise individual code paths in the kernel.
> >>>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
> >>>>>>>>> 7295
> >>>>>>>>> @ 1.50GHz with 80G memory
> >>>>>>>>>
> >>>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
> >>>>>>>>> log/backtrace):
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> If you fix the issue, kindly add following tag
> >>>>>>>>> Reported-by: kernel test robot <[email protected]>
> >>>>>>>> Thanks for your report. Assuming that the code responsible for
> >>>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
> >>>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
> >>>>>>>> possible
> >>>>>>>> sleep states, your platform must somehow be returning that
> >>>>>>>> ACPI_STATE_S3
> >>>>>>>> is not a supported state somehow?
> >>>>>>>>
> >>>>>>>> Rafael have you ever encountered something like that?
> >>>>>>> Yes, there are systems with ACPI that don't support S3.
> >>>>>> OK and do you know what happens when we enter suspend with "mem" in
> >>>>>> those cases? Do we immediately return because ultimately the firmware
> >>>>>> does not support ACPI S3?
> >>>>> "mem" should not be present in the list of available strings then, so it
> >>>>> should be rejected right away.
> >>>> Well yes, that was the purpose of the patch I submitted, but assuming
> >>>> that we did provide "mem" as one of the possible standby modes even
> >>>> though that was wrong (before patch), and the test was trying to enter
> >>>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
> >>>> the request but return an error, or would it actually enter ACPI S3?
> >>>>
> >>>> In any case, I will change the test to check that this is a supported
> >>>> standby mode before trying it.
> >>>
> >>> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
> >>> breaks user space compatibility and that's got caught properly by the test.
> >>
> >> Reverting my commit will break powerpc and other ARM/ARM64 platforms
> >> where mem is not supported (via PSCI),
> >
> > It won't break anything, although the things that didn't work before
> > will still not work after it.
> >
> > And "mem" is always supported even if there are no suspend_ops at all,
> > in which case it becomes an alternative way to trigger s2idle.
> >
> > So, on the affected systems, what's there in /sys/power/? Is
> > mem_sleep present? If so, what's in it?
>
> With 4.9 which is what I used initially:
>
> # cat /sys/power/state
> freeze standby
> # cat /sys/power/
> pm_async pm_print_times pm_wakeup_irq wakeup_count
> pm_freeze_timeout pm_test state
>
> With a newer kernel without my patch:
>
> # cat /sys/power/state
> freeze standby mem
> # cat /sys/power/mem_sleep
> s2idle shallow [deep]

OK, so the "deep" and "shallow" suspend variants appear to be
supported. What's the problem with advertising "mem" then?

> # cat /sys/power/
> mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
> pm_async pm_print_times state
> pm_debug_messages pm_test suspend_stats/
>
>
> >
> >> I have a change pending for PSCI
> >> that will actually check that SYSTEM_SUSPEND is supported before
> >> unconditionally making use of it.
> >>
> >>>
> >>> What happens is that "mem" is a "pointer" to a secondary list of
> >>> possible states and that generally is "s2idle shallow deep" and if
> >>> s2idle is the only available option, it will be just "s2idle".
> >>>
> >>> This list is there in /sys/power/mem_sleep.
> >>>
> >>> It was done this way, because some variants of user space expect "mem"
> >>> to be always present and don't recognize "freeze" properly.
> >>>
> >>> Sorry for the confusion.
> >>
> >> So how do we all get our cookie here? Should we just slap an #ifndef
> >> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
> >> have it?
> >
> > Certainly not.
> >
> > I've just hacked my test-bed system with ACPI so it does not register
> > any suspend_ops at all and I have "freeze mem disk" in
> > /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
> > to /sys/power/state causes s2idle to be carried out.
> >
> > Since this is the expected behavior, I'm not sure what the problem is.
>
> The problem is advertising "mem" in /sys/power/state when the state is
> not actually supported by the platform firmware here, whether that
> translates into the form of s2idle or not. It is not supported, and it
> should not be there IMHO.

Well, it is there, because some user space expects it to be there on
systems supporting any kind of system-wide suspend, including s2idle.
Like it or not.

If it is not there, the utilities in question assume that system-wide
suspend is not supported at all.

>I was late to the game in identifying that,
> but the 4.9 kernel makes sense to me.
>
> Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
> PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
> don't look at what ->valid tells us.

Again: "mem" appears in /sys/power/state if the system supports any
kind of system-wide suspend (because of the expectations of user space
mentioned above) and mem_sleep decides what it really means.

And this is documented too (see Documentation/admin-guide/pm/sleep-states.html).

2021-10-20 18:19:08

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/20/21 9:00 AM, Rafael J. Wysocki wrote:
> On Wed, Oct 20, 2021 at 5:34 PM Florian Fainelli <[email protected]> wrote:
>>
>>
>>
>> On 10/20/2021 6:49 AM, Rafael J. Wysocki wrote:
>>> On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <[email protected]> wrote:
>>>>
>>>> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
>>>>> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
>>>>>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
>>>>>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>>>>>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>>>>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>>>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>>>>>>>> Greeting,
>>>>>>>>>>>
>>>>>>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>>>>>>
>>>>>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>>>>>>>> assume that "mem" is always present")
>>>>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>>>>>>> master
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> in testcase: kernel-selftests
>>>>>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>>>>>>>> with following parameters:
>>>>>>>>>>>
>>>>>>>>>>> group: group-00
>>>>>>>>>>> ucode: 0x11
>>>>>>>>>>>
>>>>>>>>>>> test-description: The kernel contains a set of "self tests" under
>>>>>>>>>>> the
>>>>>>>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>>>>>>>> unit tests to exercise individual code paths in the kernel.
>>>>>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
>>>>>>>>>>> 7295
>>>>>>>>>>> @ 1.50GHz with 80G memory
>>>>>>>>>>>
>>>>>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>>>>>>> log/backtrace):
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If you fix the issue, kindly add following tag
>>>>>>>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>>>>>> Thanks for your report. Assuming that the code responsible for
>>>>>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>>>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>>>>>>>> possible
>>>>>>>>>> sleep states, your platform must somehow be returning that
>>>>>>>>>> ACPI_STATE_S3
>>>>>>>>>> is not a supported state somehow?
>>>>>>>>>>
>>>>>>>>>> Rafael have you ever encountered something like that?
>>>>>>>>> Yes, there are systems with ACPI that don't support S3.
>>>>>>>> OK and do you know what happens when we enter suspend with "mem" in
>>>>>>>> those cases? Do we immediately return because ultimately the firmware
>>>>>>>> does not support ACPI S3?
>>>>>>> "mem" should not be present in the list of available strings then, so it
>>>>>>> should be rejected right away.
>>>>>> Well yes, that was the purpose of the patch I submitted, but assuming
>>>>>> that we did provide "mem" as one of the possible standby modes even
>>>>>> though that was wrong (before patch), and the test was trying to enter
>>>>>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
>>>>>> the request but return an error, or would it actually enter ACPI S3?
>>>>>>
>>>>>> In any case, I will change the test to check that this is a supported
>>>>>> standby mode before trying it.
>>>>>
>>>>> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
>>>>> breaks user space compatibility and that's got caught properly by the test.
>>>>
>>>> Reverting my commit will break powerpc and other ARM/ARM64 platforms
>>>> where mem is not supported (via PSCI),
>>>
>>> It won't break anything, although the things that didn't work before
>>> will still not work after it.
>>>
>>> And "mem" is always supported even if there are no suspend_ops at all,
>>> in which case it becomes an alternative way to trigger s2idle.
>>>
>>> So, on the affected systems, what's there in /sys/power/? Is
>>> mem_sleep present? If so, what's in it?
>>
>> With 4.9 which is what I used initially:
>>
>> # cat /sys/power/state
>> freeze standby
>> # cat /sys/power/
>> pm_async pm_print_times pm_wakeup_irq wakeup_count
>> pm_freeze_timeout pm_test state
>>
>> With a newer kernel without my patch:
>>
>> # cat /sys/power/state
>> freeze standby mem
>> # cat /sys/power/mem_sleep
>> s2idle shallow [deep]
>
> OK, so the "deep" and "shallow" suspend variants appear to be
> supported. What's the problem with advertising "mem" then?

s2idle and shallow are, but deep is not.

>
>> # cat /sys/power/
>> mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
>> pm_async pm_print_times state
>> pm_debug_messages pm_test suspend_stats/
>>
>>
>>>
>>>> I have a change pending for PSCI
>>>> that will actually check that SYSTEM_SUSPEND is supported before
>>>> unconditionally making use of it.
>>>>
>>>>>
>>>>> What happens is that "mem" is a "pointer" to a secondary list of
>>>>> possible states and that generally is "s2idle shallow deep" and if
>>>>> s2idle is the only available option, it will be just "s2idle".
>>>>>
>>>>> This list is there in /sys/power/mem_sleep.
>>>>>
>>>>> It was done this way, because some variants of user space expect "mem"
>>>>> to be always present and don't recognize "freeze" properly.
>>>>>
>>>>> Sorry for the confusion.
>>>>
>>>> So how do we all get our cookie here? Should we just slap an #ifndef
>>>> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
>>>> have it?
>>>
>>> Certainly not.
>>>
>>> I've just hacked my test-bed system with ACPI so it does not register
>>> any suspend_ops at all and I have "freeze mem disk" in
>>> /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
>>> to /sys/power/state causes s2idle to be carried out.
>>>
>>> Since this is the expected behavior, I'm not sure what the problem is.
>>
>> The problem is advertising "mem" in /sys/power/state when the state is
>> not actually supported by the platform firmware here, whether that
>> translates into the form of s2idle or not. It is not supported, and it
>> should not be there IMHO.
>
> Well, it is there, because some user space expects it to be there on
> systems supporting any kind of system-wide suspend, including s2idle.
> Like it or not.

But that was not the case before 406e79385f32 ("PM / sleep: System sleep
state selection interface rework") and clearly nobody complained about
that, did they?

>
> If it is not there, the utilities in question assume that system-wide
> suspend is not supported at all.

What utilities do depend on that? That selftest that does not even check
that "mem" is actually present in /sys/power/state and just fails its
test if it is not, yes it's not great, but that can be fixed.

>
>> I was late to the game in identifying that,
>> but the 4.9 kernel makes sense to me.
>>
>> Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
>> PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
>> don't look at what ->valid tells us.
>
> Again: "mem" appears in /sys/power/state if the system supports any
> kind of system-wide suspend (because of the expectations of user space
> mentioned above) and mem_sleep decides what it really means.
>
> And this is documented too (see Documentation/admin-guide/pm/sleep-states.html).

The documentation just states that if the kernel supports *any* suspend
state, then /sys/power/state will be present and likewise for
/sys/power/mem_sleep, it does not say what the contents will be and that
"mem" would always be present in there.
--
Florian

2021-10-20 18:49:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On Wed, Oct 20, 2021 at 8:17 PM Florian Fainelli <[email protected]> wrote:
>
> On 10/20/21 9:00 AM, Rafael J. Wysocki wrote:
> > On Wed, Oct 20, 2021 at 5:34 PM Florian Fainelli <[email protected]> wrote:
> >>
> >>
> >>
> >> On 10/20/2021 6:49 AM, Rafael J. Wysocki wrote:
> >>> On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <[email protected]> wrote:
> >>>>
> >>>> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
> >>>>> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
> >>>>>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
> >>>>>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
> >>>>>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
> >>>>>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
> >>>>>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
> >>>>>>>>>>> Greeting,
> >>>>>>>>>>>
> >>>>>>>>>>> FYI, we noticed the following commit (built with gcc-9):
> >>>>>>>>>>>
> >>>>>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
> >>>>>>>>>>> assume that "mem" is always present")
> >>>>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
> >>>>>>>>>>> master
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> in testcase: kernel-selftests
> >>>>>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
> >>>>>>>>>>> with following parameters:
> >>>>>>>>>>>
> >>>>>>>>>>> group: group-00
> >>>>>>>>>>> ucode: 0x11
> >>>>>>>>>>>
> >>>>>>>>>>> test-description: The kernel contains a set of "self tests" under
> >>>>>>>>>>> the
> >>>>>>>>>>> tools/testing/selftests/ directory. These are intended to be small
> >>>>>>>>>>> unit tests to exercise individual code paths in the kernel.
> >>>>>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
> >>>>>>>>>>> 7295
> >>>>>>>>>>> @ 1.50GHz with 80G memory
> >>>>>>>>>>>
> >>>>>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
> >>>>>>>>>>> log/backtrace):
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> If you fix the issue, kindly add following tag
> >>>>>>>>>>> Reported-by: kernel test robot <[email protected]>
> >>>>>>>>>> Thanks for your report. Assuming that the code responsible for
> >>>>>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
> >>>>>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
> >>>>>>>>>> possible
> >>>>>>>>>> sleep states, your platform must somehow be returning that
> >>>>>>>>>> ACPI_STATE_S3
> >>>>>>>>>> is not a supported state somehow?
> >>>>>>>>>>
> >>>>>>>>>> Rafael have you ever encountered something like that?
> >>>>>>>>> Yes, there are systems with ACPI that don't support S3.
> >>>>>>>> OK and do you know what happens when we enter suspend with "mem" in
> >>>>>>>> those cases? Do we immediately return because ultimately the firmware
> >>>>>>>> does not support ACPI S3?
> >>>>>>> "mem" should not be present in the list of available strings then, so it
> >>>>>>> should be rejected right away.
> >>>>>> Well yes, that was the purpose of the patch I submitted, but assuming
> >>>>>> that we did provide "mem" as one of the possible standby modes even
> >>>>>> though that was wrong (before patch), and the test was trying to enter
> >>>>>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
> >>>>>> the request but return an error, or would it actually enter ACPI S3?
> >>>>>>
> >>>>>> In any case, I will change the test to check that this is a supported
> >>>>>> standby mode before trying it.
> >>>>>
> >>>>> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
> >>>>> breaks user space compatibility and that's got caught properly by the test.
> >>>>
> >>>> Reverting my commit will break powerpc and other ARM/ARM64 platforms
> >>>> where mem is not supported (via PSCI),
> >>>
> >>> It won't break anything, although the things that didn't work before
> >>> will still not work after it.
> >>>
> >>> And "mem" is always supported even if there are no suspend_ops at all,
> >>> in which case it becomes an alternative way to trigger s2idle.
> >>>
> >>> So, on the affected systems, what's there in /sys/power/? Is
> >>> mem_sleep present? If so, what's in it?
> >>
> >> With 4.9 which is what I used initially:
> >>
> >> # cat /sys/power/state
> >> freeze standby
> >> # cat /sys/power/
> >> pm_async pm_print_times pm_wakeup_irq wakeup_count
> >> pm_freeze_timeout pm_test state
> >>
> >> With a newer kernel without my patch:
> >>
> >> # cat /sys/power/state
> >> freeze standby mem
> >> # cat /sys/power/mem_sleep
> >> s2idle shallow [deep]
> >
> > OK, so the "deep" and "shallow" suspend variants appear to be
> > supported. What's the problem with advertising "mem" then?
>
> s2idle and shallow are, but deep is not.

Why is it there in mem_sleep, then? It should not be there if
valid_state() returns 'false' for it.

mem_sleep_states[PM_SUSPEND_MEM] is only set by suspend_set_ops() if
valid_state(PM_SUSPEND_MEM) is 'true'.

> >
> >> # cat /sys/power/
> >> mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
> >> pm_async pm_print_times state
> >> pm_debug_messages pm_test suspend_stats/
> >>
> >>
> >>>
> >>>> I have a change pending for PSCI
> >>>> that will actually check that SYSTEM_SUSPEND is supported before
> >>>> unconditionally making use of it.
> >>>>
> >>>>>
> >>>>> What happens is that "mem" is a "pointer" to a secondary list of
> >>>>> possible states and that generally is "s2idle shallow deep" and if
> >>>>> s2idle is the only available option, it will be just "s2idle".
> >>>>>
> >>>>> This list is there in /sys/power/mem_sleep.
> >>>>>
> >>>>> It was done this way, because some variants of user space expect "mem"
> >>>>> to be always present and don't recognize "freeze" properly.
> >>>>>
> >>>>> Sorry for the confusion.
> >>>>
> >>>> So how do we all get our cookie here? Should we just slap an #ifndef
> >>>> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
> >>>> have it?
> >>>
> >>> Certainly not.
> >>>
> >>> I've just hacked my test-bed system with ACPI so it does not register
> >>> any suspend_ops at all and I have "freeze mem disk" in
> >>> /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
> >>> to /sys/power/state causes s2idle to be carried out.
> >>>
> >>> Since this is the expected behavior, I'm not sure what the problem is.
> >>
> >> The problem is advertising "mem" in /sys/power/state when the state is
> >> not actually supported by the platform firmware here, whether that
> >> translates into the form of s2idle or not. It is not supported, and it
> >> should not be there IMHO.
> >
> > Well, it is there, because some user space expects it to be there on
> > systems supporting any kind of system-wide suspend, including s2idle.
> > Like it or not.
>
> But that was not the case before 406e79385f32 ("PM / sleep: System sleep
> state selection interface rework") and clearly nobody complained about
> that, did they?

Yes, it was and yes, they did. Changes like that are not made without a reason.

> >
> > If it is not there, the utilities in question assume that system-wide
> > suspend is not supported at all.
>
> What utilities do depend on that? That selftest that does not even check
> that "mem" is actually present in /sys/power/state and just fails its
> test if it is not, yes it's not great, but that can be fixed.

Various GUI-based things like KDE, GNOME and similar plus the Chrome
user space IIRC.

> >
> >> I was late to the game in identifying that,
> >> but the 4.9 kernel makes sense to me.
> >>
> >> Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
> >> PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
> >> don't look at what ->valid tells us.
> >
> > Again: "mem" appears in /sys/power/state if the system supports any
> > kind of system-wide suspend (because of the expectations of user space
> > mentioned above) and mem_sleep decides what it really means.
> >
> > And this is documented too (see Documentation/admin-guide/pm/sleep-states.html).
>
> The documentation just states that if the kernel supports *any* suspend
> state, then /sys/power/state will be present and likewise for
> /sys/power/mem_sleep, it does not say what the contents will be and that
> "mem" would always be present in there.

It doesn't say so directly, but it kind of wouldn't make sense to have
"mem_sleep" without "mem" in "state" and it implies that "mem_sleep"
is not empty if it is present. Ergo "mem" is present in "state" if
"mem_sleep" is present which is the case if (at least) s2idle is
supported. That is always the case if CONFIG_SUSPEND is set which
follows from the suspend-to-idle description.

Anyway, I'm still not sure what the problem really is. Commit
406e79385f32 still allows user space to only trigger transitions to
s2idle and other states explicitly reported as valid by the platform.

2021-10-21 19:21:54

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On 10/20/21 11:48 AM, Rafael J. Wysocki wrote:
> On Wed, Oct 20, 2021 at 8:17 PM Florian Fainelli <[email protected]> wrote:
>>
>> On 10/20/21 9:00 AM, Rafael J. Wysocki wrote:
>>> On Wed, Oct 20, 2021 at 5:34 PM Florian Fainelli <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 10/20/2021 6:49 AM, Rafael J. Wysocki wrote:
>>>>> On Tue, Oct 19, 2021 at 9:04 PM Florian Fainelli <[email protected]> wrote:
>>>>>>
>>>>>> On 10/19/21 11:53 AM, Rafael J. Wysocki wrote:
>>>>>>> On 10/15/2021 9:40 PM, Florian Fainelli wrote:
>>>>>>>> On 10/15/21 11:45 AM, Rafael J. Wysocki wrote:
>>>>>>>>> On 10/14/2021 11:55 PM, Florian Fainelli wrote:
>>>>>>>>>> On 10/14/21 12:23 PM, Rafael J. Wysocki wrote:
>>>>>>>>>>> On 10/14/2021 6:26 PM, Florian Fainelli wrote:
>>>>>>>>>>>> On 10/14/21 12:57 AM, kernel test robot wrote:
>>>>>>>>>>>>> Greeting,
>>>>>>>>>>>>>
>>>>>>>>>>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>>>>>>>>>>
>>>>>>>>>>>>> commit: bfcc1e67ff1e4aa8bfe2ca57f99390fc284c799d ("PM: sleep: Do not
>>>>>>>>>>>>> assume that "mem" is always present")
>>>>>>>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git
>>>>>>>>>>>>> master
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> in testcase: kernel-selftests
>>>>>>>>>>>>> version: kernel-selftests-x86_64-c8c9111a-1_20210929
>>>>>>>>>>>>> with following parameters:
>>>>>>>>>>>>>
>>>>>>>>>>>>> group: group-00
>>>>>>>>>>>>> ucode: 0x11
>>>>>>>>>>>>>
>>>>>>>>>>>>> test-description: The kernel contains a set of "self tests" under
>>>>>>>>>>>>> the
>>>>>>>>>>>>> tools/testing/selftests/ directory. These are intended to be small
>>>>>>>>>>>>> unit tests to exercise individual code paths in the kernel.
>>>>>>>>>>>>> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> on test machine: 288 threads 2 sockets Intel(R) Xeon Phi(TM) CPU
>>>>>>>>>>>>> 7295
>>>>>>>>>>>>> @ 1.50GHz with 80G memory
>>>>>>>>>>>>>
>>>>>>>>>>>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>>>>>>>>>>>> log/backtrace):
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you fix the issue, kindly add following tag
>>>>>>>>>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>>>>>>>> Thanks for your report. Assuming that the code responsible for
>>>>>>>>>>>> registering the suspend operations is drivers/acpi/sleep.c for your
>>>>>>>>>>>> platform, and that acpi_sleep_suspend_setup() iterated over all
>>>>>>>>>>>> possible
>>>>>>>>>>>> sleep states, your platform must somehow be returning that
>>>>>>>>>>>> ACPI_STATE_S3
>>>>>>>>>>>> is not a supported state somehow?
>>>>>>>>>>>>
>>>>>>>>>>>> Rafael have you ever encountered something like that?
>>>>>>>>>>> Yes, there are systems with ACPI that don't support S3.
>>>>>>>>>> OK and do you know what happens when we enter suspend with "mem" in
>>>>>>>>>> those cases? Do we immediately return because ultimately the firmware
>>>>>>>>>> does not support ACPI S3?
>>>>>>>>> "mem" should not be present in the list of available strings then, so it
>>>>>>>>> should be rejected right away.
>>>>>>>> Well yes, that was the purpose of the patch I submitted, but assuming
>>>>>>>> that we did provide "mem" as one of the possible standby modes even
>>>>>>>> though that was wrong (before patch), and the test was trying to enter
>>>>>>>> ACPI S3 standby, what would have happened, would the ACPI firmware honor
>>>>>>>> the request but return an error, or would it actually enter ACPI S3?
>>>>>>>>
>>>>>>>> In any case, I will change the test to check that this is a supported
>>>>>>>> standby mode before trying it.
>>>>>>>
>>>>>>> Unfortunately, I will need to revert bfcc1e67ff1e4aa8bfe2, because it
>>>>>>> breaks user space compatibility and that's got caught properly by the test.
>>>>>>
>>>>>> Reverting my commit will break powerpc and other ARM/ARM64 platforms
>>>>>> where mem is not supported (via PSCI),
>>>>>
>>>>> It won't break anything, although the things that didn't work before
>>>>> will still not work after it.
>>>>>
>>>>> And "mem" is always supported even if there are no suspend_ops at all,
>>>>> in which case it becomes an alternative way to trigger s2idle.
>>>>>
>>>>> So, on the affected systems, what's there in /sys/power/? Is
>>>>> mem_sleep present? If so, what's in it?
>>>>
>>>> With 4.9 which is what I used initially:
>>>>
>>>> # cat /sys/power/state
>>>> freeze standby
>>>> # cat /sys/power/
>>>> pm_async pm_print_times pm_wakeup_irq wakeup_count
>>>> pm_freeze_timeout pm_test state
>>>>
>>>> With a newer kernel without my patch:
>>>>
>>>> # cat /sys/power/state
>>>> freeze standby mem
>>>> # cat /sys/power/mem_sleep
>>>> s2idle shallow [deep]
>>>
>>> OK, so the "deep" and "shallow" suspend variants appear to be
>>> supported. What's the problem with advertising "mem" then?
>>
>> s2idle and shallow are, but deep is not.
>
> Why is it there in mem_sleep, then? It should not be there if
> valid_state() returns 'false' for it.

The suspend_ops that is registered has a ->valid that will return false
for the PM_SUSPEND_MEM case, yet we ignore that and we still populate
valide_state[PM]

>
> mem_sleep_states[PM_SUSPEND_MEM] is only set by suspend_set_ops() if
> valid_state(PM_SUSPEND_MEM) is 'true'.

That is true, but pm_states[PM_SUSPEND_MEM] was (before my patch that
is) unconditionally present. And for the same reason that you expect
user-space to find the string "mem" in /sys/power/state, we expected not
to find it, if PM_SUSPEND_MEM is not supported.

>
>>>
>>>> # cat /sys/power/
>>>> mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
>>>> pm_async pm_print_times state
>>>> pm_debug_messages pm_test suspend_stats/
>>>>
>>>>
>>>>>
>>>>>> I have a change pending for PSCI
>>>>>> that will actually check that SYSTEM_SUSPEND is supported before
>>>>>> unconditionally making use of it.
>>>>>>
>>>>>>>
>>>>>>> What happens is that "mem" is a "pointer" to a secondary list of
>>>>>>> possible states and that generally is "s2idle shallow deep" and if
>>>>>>> s2idle is the only available option, it will be just "s2idle".
>>>>>>>
>>>>>>> This list is there in /sys/power/mem_sleep.
>>>>>>>
>>>>>>> It was done this way, because some variants of user space expect "mem"
>>>>>>> to be always present and don't recognize "freeze" properly.
>>>>>>>
>>>>>>> Sorry for the confusion.
>>>>>>
>>>>>> So how do we all get our cookie here? Should we just slap an #ifndef
>>>>>> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
>>>>>> have it?
>>>>>
>>>>> Certainly not.
>>>>>
>>>>> I've just hacked my test-bed system with ACPI so it does not register
>>>>> any suspend_ops at all and I have "freeze mem disk" in
>>>>> /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
>>>>> to /sys/power/state causes s2idle to be carried out.
>>>>>
>>>>> Since this is the expected behavior, I'm not sure what the problem is.
>>>>
>>>> The problem is advertising "mem" in /sys/power/state when the state is
>>>> not actually supported by the platform firmware here, whether that
>>>> translates into the form of s2idle or not. It is not supported, and it
>>>> should not be there IMHO.
>>>
>>> Well, it is there, because some user space expects it to be there on
>>> systems supporting any kind of system-wide suspend, including s2idle.
>>> Like it or not.
>>
>> But that was not the case before 406e79385f32 ("PM / sleep: System sleep
>> state selection interface rework") and clearly nobody complained about
>> that, did they?
>
> Yes, it was and yes, they did. Changes like that are not made without a reason.
>
>>>
>>> If it is not there, the utilities in question assume that system-wide
>>> suspend is not supported at all.
>>
>> What utilities do depend on that? That selftest that does not even check
>> that "mem" is actually present in /sys/power/state and just fails its
>> test if it is not, yes it's not great, but that can be fixed.
>
> Various GUI-based things like KDE, GNOME and similar plus the Chrome
> user space IIRC.

OK.

>
>>>
>>>> I was late to the game in identifying that,
>>>> but the 4.9 kernel makes sense to me.
>>>>
>>>> Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
>>>> PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
>>>> don't look at what ->valid tells us.
>>>
>>> Again: "mem" appears in /sys/power/state if the system supports any
>>> kind of system-wide suspend (because of the expectations of user space
>>> mentioned above) and mem_sleep decides what it really means.
>>>
>>> And this is documented too (see Documentation/admin-guide/pm/sleep-states.html).
>>
>> The documentation just states that if the kernel supports *any* suspend
>> state, then /sys/power/state will be present and likewise for
>> /sys/power/mem_sleep, it does not say what the contents will be and that
>> "mem" would always be present in there.
>
> It doesn't say so directly, but it kind of wouldn't make sense to have
> "mem_sleep" without "mem" in "state" and it implies that "mem_sleep"
> is not empty if it is present. Ergo "mem" is present in "state" if
> "mem_sleep" is present which is the case if (at least) s2idle is
> supported. That is always the case if CONFIG_SUSPEND is set which
> follows from the suspend-to-idle description.
>
> Anyway, I'm still not sure what the problem really is. Commit
> 406e79385f32 still allows user space to only trigger transitions to
> s2idle and other states explicitly reported as valid by the platform.
>

The problem from my perspective is still that "mem" is present even with
PM_SUSPEND_PM not being valid for the said platform, and this is just
confusing my/our user-space here as well as our users. This was not like
that back in the 4.9 kernel, but it changed later, therefore it also
constitutes an user-space regression from my angle.
--
Florian

2021-10-22 14:02:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PM] bfcc1e67ff: kernel-selftests.breakpoints.step_after_suspend_test.fail

On Thu, Oct 21, 2021 at 9:19 PM Florian Fainelli <[email protected]> wrote:
>
> On 10/20/21 11:48 AM, Rafael J. Wysocki wrote:
> > On Wed, Oct 20, 2021 at 8:17 PM Florian Fainelli <[email protected]> wrote:
> >>
> >> On 10/20/21 9:00 AM, Rafael J. Wysocki wrote:

[cut]

> >>>> With a newer kernel without my patch:
> >>>>
> >>>> # cat /sys/power/state
> >>>> freeze standby mem
> >>>> # cat /sys/power/mem_sleep
> >>>> s2idle shallow [deep]
> >>>
> >>> OK, so the "deep" and "shallow" suspend variants appear to be
> >>> supported. What's the problem with advertising "mem" then?
> >>
> >> s2idle and shallow are, but deep is not.
> >
> > Why is it there in mem_sleep, then? It should not be there if
> > valid_state() returns 'false' for it.
>
> The suspend_ops that is registered has a ->valid that will return false
> for the PM_SUSPEND_MEM case, yet we ignore that and we still populate
> valide_state[PM]

I guess you mean "we still populate pm_states[PM_SUSPEND_MEM]".
Otherwise I'm not sure what you mean. :-)

Yes, we do, but this doesn't explain why "deep" is present in
"mem_sleep" in that case, because mem_sleep_states[PM_SUSPEND_MEM] is
only populated by suspend_set_ops() and that does take ->valid() into
account.

> >
> > mem_sleep_states[PM_SUSPEND_MEM] is only set by suspend_set_ops() if
> > valid_state(PM_SUSPEND_MEM) is 'true'.
>
> That is true, but pm_states[PM_SUSPEND_MEM] was (before my patch that
> is) unconditionally present. And for the same reason that you expect
> user-space to find the string "mem" in /sys/power/state, we expected not
> to find it, if PM_SUSPEND_MEM is not supported.
>
> >
> >>>
> >>>> # cat /sys/power/
> >>>> mem_sleep pm_freeze_timeout pm_wakeup_irq wakeup_count
> >>>> pm_async pm_print_times state
> >>>> pm_debug_messages pm_test suspend_stats/
> >>>>
> >>>>
> >>>>>
> >>>>>> I have a change pending for PSCI
> >>>>>> that will actually check that SYSTEM_SUSPEND is supported before
> >>>>>> unconditionally making use of it.
> >>>>>>
> >>>>>>>
> >>>>>>> What happens is that "mem" is a "pointer" to a secondary list of
> >>>>>>> possible states and that generally is "s2idle shallow deep" and if
> >>>>>>> s2idle is the only available option, it will be just "s2idle".
> >>>>>>>
> >>>>>>> This list is there in /sys/power/mem_sleep.
> >>>>>>>
> >>>>>>> It was done this way, because some variants of user space expect "mem"
> >>>>>>> to be always present and don't recognize "freeze" properly.
> >>>>>>>
> >>>>>>> Sorry for the confusion.
> >>>>>>
> >>>>>> So how do we all get our cookie here? Should we just slap an #ifndef
> >>>>>> CONFIG_ACPI in order to allow platforms that do not have "mem" to not
> >>>>>> have it?
> >>>>>
> >>>>> Certainly not.
> >>>>>
> >>>>> I've just hacked my test-bed system with ACPI so it does not register
> >>>>> any suspend_ops at all and I have "freeze mem disk" in
> >>>>> /sys/power/state and "s2idle" in /sys/power/mem_sleep. Writing "mem"
> >>>>> to /sys/power/state causes s2idle to be carried out.
> >>>>>
> >>>>> Since this is the expected behavior, I'm not sure what the problem is.
> >>>>
> >>>> The problem is advertising "mem" in /sys/power/state when the state is
> >>>> not actually supported by the platform firmware here, whether that
> >>>> translates into the form of s2idle or not. It is not supported, and it
> >>>> should not be there IMHO.
> >>>
> >>> Well, it is there, because some user space expects it to be there on
> >>> systems supporting any kind of system-wide suspend, including s2idle.
> >>> Like it or not.
> >>
> >> But that was not the case before 406e79385f32 ("PM / sleep: System sleep
> >> state selection interface rework") and clearly nobody complained about
> >> that, did they?
> >
> > Yes, it was and yes, they did. Changes like that are not made without a reason.
> >
> >>>
> >>> If it is not there, the utilities in question assume that system-wide
> >>> suspend is not supported at all.
> >>
> >> What utilities do depend on that? That selftest that does not even check
> >> that "mem" is actually present in /sys/power/state and just fails its
> >> test if it is not, yes it's not great, but that can be fixed.
> >
> > Various GUI-based things like KDE, GNOME and similar plus the Chrome
> > user space IIRC.
>
> OK.
>
> >
> >>>
> >>>> I was late to the game in identifying that,
> >>>> but the 4.9 kernel makes sense to me.
> >>>>
> >>>> Similarly, if you take arch/powerpc/sysdev/fsl_pmc.c only
> >>>> PM_SUSPEND_STANDBY is valid, so advertising mem would be wrong if we
> >>>> don't look at what ->valid tells us.
> >>>
> >>> Again: "mem" appears in /sys/power/state if the system supports any
> >>> kind of system-wide suspend (because of the expectations of user space
> >>> mentioned above) and mem_sleep decides what it really means.
> >>>
> >>> And this is documented too (see Documentation/admin-guide/pm/sleep-states.html).
> >>
> >> The documentation just states that if the kernel supports *any* suspend
> >> state, then /sys/power/state will be present and likewise for
> >> /sys/power/mem_sleep, it does not say what the contents will be and that
> >> "mem" would always be present in there.
> >
> > It doesn't say so directly, but it kind of wouldn't make sense to have
> > "mem_sleep" without "mem" in "state" and it implies that "mem_sleep"
> > is not empty if it is present. Ergo "mem" is present in "state" if
> > "mem_sleep" is present which is the case if (at least) s2idle is
> > supported. That is always the case if CONFIG_SUSPEND is set which
> > follows from the suspend-to-idle description.
> >
> > Anyway, I'm still not sure what the problem really is. Commit
> > 406e79385f32 still allows user space to only trigger transitions to
> > s2idle and other states explicitly reported as valid by the platform.
> >
>
> The problem from my perspective is still that "mem" is present even with
> PM_SUSPEND_PM not being valid for the said platform, and this is just
> confusing my/our user-space here as well as our users.

The "mem" string in /sys/power/state simply doesn't correspond to
PM_SUSPEND_MEM now, but PM_SUSPEND_MEM is just a number it has never
been well defined which state exactly is to be represented by it
(except for the ACPI case, but that's irrelevant here).

You are effectively saying something like "if my suspend_ops->valid()
returns 'false' for PM_SUSPEND_MEM, then the "mem" string should not
be there in /sys/power/state", but I don't really see why this matters
to user space and how, so please explain that to me. What does it do
differently depending on whether or not "mem" is present?

The only case in which it may matter that I can imagine is if some
functionality is lost when "mem" is present in /sys/power/state, but
quite frankly I have no idea what functionality that may be.

> This was not like
> that back in the 4.9 kernel, but it changed later, therefore it also
> constitutes an user-space regression from my angle.

The best I can offer as a remedy would be an explicit opt-out
mechanism, say in the form of a kernel command line option, for
systems that cannot cope with the new sysfs interface behavior, but I
need to be convinced that they really cannot cope with it.