An important part of validating that s0ix worked properly is to check how much
of a cycle was spent in a hardware sleep state.
The reporting of hardware sleep is a mix of kernel messages and sysfs files
that vary from vendor to vendor. Collecting this information requires extra
information on the kernel command line or fetching from debugfs.
To make this information more readily accessible introduce a new file in
suspend_stats that drivers can report into during their resume routine.
Userspace can fetch this information and compare it against the duration of
the cycle to allow determining residency percentages and flagging problems.
v4->v5
* Stop calculating a percentage, let userspace do this if desirable.
Userspace may just care != 0.
* Fix S3 case for Intel PMC
Previous submission:
https://lore.kernel.org/all/[email protected]/
Mario Limonciello (4):
PM: Add a sysfs file to represent time spent in hardware sleep state
platform/x86/amd: pmc: Report duration of time in hw sleep state
platform/x86/intel/pmc: core: Always capture counters on suspend
platform/x86/intel/pmc: core: Report duration of time in HW sleep
state
Documentation/ABI/testing/sysfs-power | 9 +++++++++
drivers/platform/x86/amd/pmc.c | 5 ++---
drivers/platform/x86/intel/pmc/core.c | 15 +++++++-------
drivers/platform/x86/intel/pmc/core.h | 2 --
include/linux/suspend.h | 2 ++
kernel/power/main.c | 29 +++++++++++++++++++++++++++
6 files changed, 49 insertions(+), 13 deletions(-)
--
2.34.1
Userspace can't easily discover how much of a sleep cycle was spent in a
hardware sleep state without using kernel tracing and vendor specific sysfs
or debugfs files.
To make this information more discoverable, introduce a new sysfs file
to represent the time spent in a sleep state.
This file will be present only if the system supports s2idle.
Signed-off-by: Mario Limonciello <[email protected]>
---
v4->v5:
* Provide time in microseconds instead of percent. Userspace can convert
this if desirable.
---
Documentation/ABI/testing/sysfs-power | 9 +++++++++
include/linux/suspend.h | 2 ++
kernel/power/main.c | 29 +++++++++++++++++++++++++++
3 files changed, 40 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index f99d433ff311..9e0c31b9ce85 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -413,6 +413,15 @@ Description:
The /sys/power/suspend_stats/last_failed_step file contains
the last failed step in the suspend/resume path.
+What: /sys/power/suspend_stats/last_hw_sleep
+Date: June 2023
+Contact: Mario Limonciello <[email protected]>
+Description:
+ The /sys/power/suspend_stats/last_hw_sleep file
+ contains the duration of time spent in a hardware sleep
+ state during from the previous suspend cycle. This number
+ is measured in microseconds.
+
What: /sys/power/sync_on_suspend
Date: October 2019
Contact: Jonas Meurer <[email protected]>
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index cfe19a028918..e0f2ac5f4406 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -68,6 +68,7 @@ struct suspend_stats {
int last_failed_errno;
int errno[REC_FAILED_NUM];
int last_failed_step;
+ u64 last_hw_sleep;
enum suspend_stat_step failed_steps[REC_FAILED_NUM];
};
@@ -489,6 +490,7 @@ void restore_processor_state(void);
extern int register_pm_notifier(struct notifier_block *nb);
extern int unregister_pm_notifier(struct notifier_block *nb);
extern void ksys_sync_helper(void);
+extern void pm_set_hw_sleep_time(u64 t);
#define pm_notifier(fn, pri) { \
static struct notifier_block fn##_nb = \
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 31ec4a9b9d70..6a2bf8784ce8 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -6,6 +6,7 @@
* Copyright (c) 2003 Open Source Development Lab
*/
+#include <linux/acpi.h>
#include <linux/export.h>
#include <linux/kobject.h>
#include <linux/string.h>
@@ -83,6 +84,12 @@ int unregister_pm_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL_GPL(unregister_pm_notifier);
+void pm_set_hw_sleep_time(u64 t)
+{
+ suspend_stats.last_hw_sleep = t;
+}
+EXPORT_SYMBOL_GPL(pm_set_hw_sleep_time);
+
int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
{
int ret;
@@ -377,6 +384,13 @@ static ssize_t last_failed_step_show(struct kobject *kobj,
}
static struct kobj_attribute last_failed_step = __ATTR_RO(last_failed_step);
+static ssize_t last_hw_sleep_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%llu\n", suspend_stats.last_hw_sleep);
+}
+static struct kobj_attribute last_hw_sleep = __ATTR_RO(last_hw_sleep);
+
static struct attribute *suspend_attrs[] = {
&success.attr,
&fail.attr,
@@ -391,12 +405,27 @@ static struct attribute *suspend_attrs[] = {
&last_failed_dev.attr,
&last_failed_errno.attr,
&last_failed_step.attr,
+ &last_hw_sleep.attr,
NULL,
};
+static umode_t suspend_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
+{
+ if (attr == &last_hw_sleep.attr) {
+#ifdef CONFIG_ACPI
+ if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
+ return 0444;
+#endif
+ return 0;
+ }
+
+ return 0444;
+}
+
static const struct attribute_group suspend_attr_group = {
.name = "suspend_stats",
.attrs = suspend_attrs,
+ .is_visible = suspend_attr_is_visible,
};
#ifdef CONFIG_DEBUG_FS
--
2.34.1
amd_pmc displays a warning when a suspend didn't get to the deepest
state and a dynamic debugging message with the duration if it did.
Rather than logging to dynamic debugging the duration spent in the
deepest state, report this to the standard kernel reporting
infrastructure so that userspace software can query after the
suspend cycle is done.
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/platform/x86/amd/pmc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/platform/x86/amd/pmc.c b/drivers/platform/x86/amd/pmc.c
index 2edaae04a691..2c1ea9c14819 100644
--- a/drivers/platform/x86/amd/pmc.c
+++ b/drivers/platform/x86/amd/pmc.c
@@ -393,9 +393,8 @@ static void amd_pmc_validate_deepest(struct amd_pmc_dev *pdev)
if (!table.s0i3_last_entry_status)
dev_warn(pdev->dev, "Last suspend didn't reach deepest state\n");
- else
- dev_dbg(pdev->dev, "Last suspend in deepest state for %lluus\n",
- table.timein_s0i3_lastcapture);
+ pm_set_hw_sleep_time(table.s0i3_last_entry_status ?
+ table.timein_s0i3_lastcapture : 0);
}
static int amd_pmc_get_smu_version(struct amd_pmc_dev *dev)
--
2.34.1
intel_pmc_core displays a warning when the module parameter
`warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
state.
Report this to the standard kernel reporting infrastructure so that
userspace software can query after the suspend cycle is done.
Signed-off-by: Mario Limonciello <[email protected]>
---
v4->v5:
* Reword commit message
---
drivers/platform/x86/intel/pmc/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
index e2f171fac094..980af32dd48a 100644
--- a/drivers/platform/x86/intel/pmc/core.c
+++ b/drivers/platform/x86/intel/pmc/core.c
@@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
return false;
+ pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
+
if (s0ix_counter == pmcdev->s0ix_counter)
return true;
--
2.34.1
Currently counters are only captured during suspend when the
warn_on_s0ix_failures module parameter is set.
In order to relay this counter information to the kernel reporting
infrastructure adjust it so that the counters are always captured.
warn_on_s0ix_failures will be utilized solely for messaging by
the driver instead.
Signed-off-by: Mario Limonciello <[email protected]>
---
v4->v5:
* Squash patches together
* Add extra pm_suspend_via_firmware() check for resume routine too
---
drivers/platform/x86/intel/pmc/core.c | 13 +++++--------
drivers/platform/x86/intel/pmc/core.h | 2 --
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
index 3a15d32d7644..e2f171fac094 100644
--- a/drivers/platform/x86/intel/pmc/core.c
+++ b/drivers/platform/x86/intel/pmc/core.c
@@ -1168,12 +1168,6 @@ static __maybe_unused int pmc_core_suspend(struct device *dev)
{
struct pmc_dev *pmcdev = dev_get_drvdata(dev);
- pmcdev->check_counters = false;
-
- /* No warnings on S0ix failures */
- if (!warn_on_s0ix_failures)
- return 0;
-
/* Check if the syspend will actually use S0ix */
if (pm_suspend_via_firmware())
return 0;
@@ -1186,7 +1180,6 @@ static __maybe_unused int pmc_core_suspend(struct device *dev)
if (pmc_core_dev_state_get(pmcdev, &pmcdev->s0ix_counter))
return -EIO;
- pmcdev->check_counters = true;
return 0;
}
@@ -1222,12 +1215,16 @@ static __maybe_unused int pmc_core_resume(struct device *dev)
const struct pmc_bit_map **maps = pmcdev->map->lpm_sts;
int offset = pmcdev->map->lpm_status_offset;
- if (!pmcdev->check_counters)
+ /* Check if the syspend used S0ix */
+ if (pm_suspend_via_firmware())
return 0;
if (!pmc_core_is_s0ix_failed(pmcdev))
return 0;
+ if (!warn_on_s0ix_failures)
+ return 0;
+
if (pmc_core_is_pc10_failed(pmcdev)) {
/* S0ix failed because of PC10 entry failure */
dev_info(dev, "CPU did not enter PC10!!! (PC10 cnt=0x%llx)\n",
diff --git a/drivers/platform/x86/intel/pmc/core.h b/drivers/platform/x86/intel/pmc/core.h
index 810204d758ab..51d73efceaf3 100644
--- a/drivers/platform/x86/intel/pmc/core.h
+++ b/drivers/platform/x86/intel/pmc/core.h
@@ -319,7 +319,6 @@ struct pmc_reg_map {
* @pmc_xram_read_bit: flag to indicate whether PMC XRAM shadow registers
* used to read MPHY PG and PLL status are available
* @mutex_lock: mutex to complete one transcation
- * @check_counters: On resume, check if counters are getting incremented
* @pc10_counter: PC10 residency counter
* @s0ix_counter: S0ix residency (step adjusted)
* @num_lpm_modes: Count of enabled modes
@@ -338,7 +337,6 @@ struct pmc_dev {
int pmc_xram_read_bit;
struct mutex lock; /* generic mutex lock for PMC Core */
- bool check_counters; /* Check for counter increments on resume */
u64 pc10_counter;
u64 s0ix_counter;
int num_lpm_modes;
--
2.34.1
Hi Mario,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc4 next-20230330]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/PM-Add-a-sysfs-file-to-represent-time-spent-in-hardware-sleep-state/20230331-034714
patch link: https://lore.kernel.org/r/20230330194439.14361-5-mario.limonciello%40amd.com
patch subject: [PATCH v5 4/4] platform/x86/intel/pmc: core: Report duration of time in HW sleep state
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20230331/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/ab8a5cd564c9bd22860612acbd76477f7515ca7b
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Mario-Limonciello/PM-Add-a-sysfs-file-to-represent-time-spent-in-hardware-sleep-state/20230331-034714
git checkout ab8a5cd564c9bd22860612acbd76477f7515ca7b
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=i386 olddefconfig
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
drivers/platform/x86/intel/pmc/core.c: In function 'pmc_core_is_s0ix_failed':
>> drivers/platform/x86/intel/pmc/core.c:1206:9: error: implicit declaration of function 'pm_set_hw_sleep_time' [-Werror=implicit-function-declaration]
1206 | pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
| ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/pm_set_hw_sleep_time +1206 drivers/platform/x86/intel/pmc/core.c
1198
1199 static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
1200 {
1201 u64 s0ix_counter;
1202
1203 if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
1204 return false;
1205
> 1206 pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
1207
1208 if (s0ix_counter == pmcdev->s0ix_counter)
1209 return true;
1210
1211 return false;
1212 }
1213
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
<[email protected]> wrote:
>
> Userspace can't easily discover how much of a sleep cycle was spent in a
> hardware sleep state without using kernel tracing and vendor specific sysfs
> or debugfs files.
>
> To make this information more discoverable, introduce a new sysfs file
> to represent the time spent in a sleep state.
This is only in the most recent suspend-resume cycle, isn't it?
Wouldn't it be useful to have another attribute printing the
accumulated total HW sleep time?
> This file will be present only if the system supports s2idle.
>
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> v4->v5:
> * Provide time in microseconds instead of percent. Userspace can convert
> this if desirable.
> ---
> Documentation/ABI/testing/sysfs-power | 9 +++++++++
> include/linux/suspend.h | 2 ++
> kernel/power/main.c | 29 +++++++++++++++++++++++++++
> 3 files changed, 40 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
> index f99d433ff311..9e0c31b9ce85 100644
> --- a/Documentation/ABI/testing/sysfs-power
> +++ b/Documentation/ABI/testing/sysfs-power
> @@ -413,6 +413,15 @@ Description:
> The /sys/power/suspend_stats/last_failed_step file contains
> the last failed step in the suspend/resume path.
>
> +What: /sys/power/suspend_stats/last_hw_sleep
> +Date: June 2023
> +Contact: Mario Limonciello <[email protected]>
> +Description:
> + The /sys/power/suspend_stats/last_hw_sleep file
> + contains the duration of time spent in a hardware sleep
> + state during from the previous suspend cycle. This number
"during from"?
I would say "in the most recent system suspend-resume cycle".
> + is measured in microseconds.
> +
> What: /sys/power/sync_on_suspend
> Date: October 2019
> Contact: Jonas Meurer <[email protected]>
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index cfe19a028918..e0f2ac5f4406 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -68,6 +68,7 @@ struct suspend_stats {
> int last_failed_errno;
> int errno[REC_FAILED_NUM];
> int last_failed_step;
> + u64 last_hw_sleep;
> enum suspend_stat_step failed_steps[REC_FAILED_NUM];
> };
>
> @@ -489,6 +490,7 @@ void restore_processor_state(void);
> extern int register_pm_notifier(struct notifier_block *nb);
> extern int unregister_pm_notifier(struct notifier_block *nb);
> extern void ksys_sync_helper(void);
> +extern void pm_set_hw_sleep_time(u64 t);
>
> #define pm_notifier(fn, pri) { \
> static struct notifier_block fn##_nb = \
> diff --git a/kernel/power/main.c b/kernel/power/main.c
> index 31ec4a9b9d70..6a2bf8784ce8 100644
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -6,6 +6,7 @@
> * Copyright (c) 2003 Open Source Development Lab
> */
>
> +#include <linux/acpi.h>
> #include <linux/export.h>
> #include <linux/kobject.h>
> #include <linux/string.h>
> @@ -83,6 +84,12 @@ int unregister_pm_notifier(struct notifier_block *nb)
> }
> EXPORT_SYMBOL_GPL(unregister_pm_notifier);
>
> +void pm_set_hw_sleep_time(u64 t)
> +{
> + suspend_stats.last_hw_sleep = t;
> +}
> +EXPORT_SYMBOL_GPL(pm_set_hw_sleep_time);
> +
> int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
> {
> int ret;
> @@ -377,6 +384,13 @@ static ssize_t last_failed_step_show(struct kobject *kobj,
> }
> static struct kobj_attribute last_failed_step = __ATTR_RO(last_failed_step);
>
> +static ssize_t last_hw_sleep_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + return sysfs_emit(buf, "%llu\n", suspend_stats.last_hw_sleep);
> +}
> +static struct kobj_attribute last_hw_sleep = __ATTR_RO(last_hw_sleep);
> +
> static struct attribute *suspend_attrs[] = {
> &success.attr,
> &fail.attr,
> @@ -391,12 +405,27 @@ static struct attribute *suspend_attrs[] = {
> &last_failed_dev.attr,
> &last_failed_errno.attr,
> &last_failed_step.attr,
> + &last_hw_sleep.attr,
> NULL,
> };
>
> +static umode_t suspend_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
> +{
> + if (attr == &last_hw_sleep.attr) {
> +#ifdef CONFIG_ACPI
> + if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
> + return 0444;
> +#endif
> + return 0;
> + }
> +
> + return 0444;
if (attr != &last_hw_sleep.attr)
return 0444;
#ifdef CONFIG_ACPI
if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
return 0444;
#endif
return 0;
> +}
> +
> static const struct attribute_group suspend_attr_group = {
> .name = "suspend_stats",
> .attrs = suspend_attrs,
> + .is_visible = suspend_attr_is_visible,
> };
>
> #ifdef CONFIG_DEBUG_FS
> --
On 3/31/2023 13:01, Rafael J. Wysocki wrote:
> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> <[email protected]> wrote:
>>
>> Userspace can't easily discover how much of a sleep cycle was spent in a
>> hardware sleep state without using kernel tracing and vendor specific sysfs
>> or debugfs files.
>>
>> To make this information more discoverable, introduce a new sysfs file
>> to represent the time spent in a sleep state.
>
> This is only in the most recent suspend-resume cycle, isn't it?
Yes; that's correct.
>
> Wouldn't it be useful to have another attribute printing the
> accumulated total HW sleep time?
>
I had considered this; but I didn't think it was actually very useful
because userspace will get control at the end of every cycle and can
accumulate those numbers if desirable.
>> This file will be present only if the system supports s2idle.
>>
>> Signed-off-by: Mario Limonciello <[email protected]>
>> ---
>> v4->v5:
>> * Provide time in microseconds instead of percent. Userspace can convert
>> this if desirable.
>> ---
>> Documentation/ABI/testing/sysfs-power | 9 +++++++++
>> include/linux/suspend.h | 2 ++
>> kernel/power/main.c | 29 +++++++++++++++++++++++++++
>> 3 files changed, 40 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
>> index f99d433ff311..9e0c31b9ce85 100644
>> --- a/Documentation/ABI/testing/sysfs-power
>> +++ b/Documentation/ABI/testing/sysfs-power
>> @@ -413,6 +413,15 @@ Description:
>> The /sys/power/suspend_stats/last_failed_step file contains
>> the last failed step in the suspend/resume path.
>>
>> +What: /sys/power/suspend_stats/last_hw_sleep
>> +Date: June 2023
>> +Contact: Mario Limonciello <[email protected]>
>> +Description:
>> + The /sys/power/suspend_stats/last_hw_sleep file
>> + contains the duration of time spent in a hardware sleep
>> + state during from the previous suspend cycle. This number
>
> "during from"?
>
> I would say "in the most recent system suspend-resume cycle".
Ack, thanks.
>
>> + is measured in microseconds.
>> +
>> What: /sys/power/sync_on_suspend
>> Date: October 2019
>> Contact: Jonas Meurer <[email protected]>
>> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
>> index cfe19a028918..e0f2ac5f4406 100644
>> --- a/include/linux/suspend.h
>> +++ b/include/linux/suspend.h
>> @@ -68,6 +68,7 @@ struct suspend_stats {
>> int last_failed_errno;
>> int errno[REC_FAILED_NUM];
>> int last_failed_step;
>> + u64 last_hw_sleep;
>> enum suspend_stat_step failed_steps[REC_FAILED_NUM];
>> };
>>
>> @@ -489,6 +490,7 @@ void restore_processor_state(void);
>> extern int register_pm_notifier(struct notifier_block *nb);
>> extern int unregister_pm_notifier(struct notifier_block *nb);
>> extern void ksys_sync_helper(void);
>> +extern void pm_set_hw_sleep_time(u64 t);
>>
>> #define pm_notifier(fn, pri) { \
>> static struct notifier_block fn##_nb = \
>> diff --git a/kernel/power/main.c b/kernel/power/main.c
>> index 31ec4a9b9d70..6a2bf8784ce8 100644
>> --- a/kernel/power/main.c
>> +++ b/kernel/power/main.c
>> @@ -6,6 +6,7 @@
>> * Copyright (c) 2003 Open Source Development Lab
>> */
>>
>> +#include <linux/acpi.h>
>> #include <linux/export.h>
>> #include <linux/kobject.h>
>> #include <linux/string.h>
>> @@ -83,6 +84,12 @@ int unregister_pm_notifier(struct notifier_block *nb)
>> }
>> EXPORT_SYMBOL_GPL(unregister_pm_notifier);
>>
>> +void pm_set_hw_sleep_time(u64 t)
>> +{
>> + suspend_stats.last_hw_sleep = t;
>> +}
>> +EXPORT_SYMBOL_GPL(pm_set_hw_sleep_time);
>> +
>> int pm_notifier_call_chain_robust(unsigned long val_up, unsigned long val_down)
>> {
>> int ret;
>> @@ -377,6 +384,13 @@ static ssize_t last_failed_step_show(struct kobject *kobj,
>> }
>> static struct kobj_attribute last_failed_step = __ATTR_RO(last_failed_step);
>>
>> +static ssize_t last_hw_sleep_show(struct kobject *kobj,
>> + struct kobj_attribute *attr, char *buf)
>> +{
>> + return sysfs_emit(buf, "%llu\n", suspend_stats.last_hw_sleep);
>> +}
>> +static struct kobj_attribute last_hw_sleep = __ATTR_RO(last_hw_sleep);
>> +
>> static struct attribute *suspend_attrs[] = {
>> &success.attr,
>> &fail.attr,
>> @@ -391,12 +405,27 @@ static struct attribute *suspend_attrs[] = {
>> &last_failed_dev.attr,
>> &last_failed_errno.attr,
>> &last_failed_step.attr,
>> + &last_hw_sleep.attr,
>> NULL,
>> };
>>
>> +static umode_t suspend_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
>> +{
>> + if (attr == &last_hw_sleep.attr) {
>> +#ifdef CONFIG_ACPI
>> + if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
>> + return 0444;
>> +#endif
>> + return 0;
>> + }
>> +
>> + return 0444;
>
> if (attr != &last_hw_sleep.attr)
> return 0444;
>
> #ifdef CONFIG_ACPI
> if (acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)
> return 0444;
> #endif
>
> return 0;
>
Ack, thanks.
>> +}
>> +
>> static const struct attribute_group suspend_attr_group = {
>> .name = "suspend_stats",
>> .attrs = suspend_attrs,
>> + .is_visible = suspend_attr_is_visible,
>> };
>>
>> #ifdef CONFIG_DEBUG_FS
>> --
On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
<[email protected]> wrote:
>
> intel_pmc_core displays a warning when the module parameter
> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> state.
>
> Report this to the standard kernel reporting infrastructure so that
> userspace software can query after the suspend cycle is done.
>
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> v4->v5:
> * Reword commit message
> ---
> drivers/platform/x86/intel/pmc/core.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index e2f171fac094..980af32dd48a 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
> if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> return false;
>
> + pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> +
Maybe check if this is really accumulating? In case of a counter
overflow, for instance?
> if (s0ix_counter == pmcdev->s0ix_counter)
> return true;
>
> --
On Fri, Mar 31, 2023 at 8:05 PM Limonciello, Mario
<[email protected]> wrote:
>
> On 3/31/2023 13:01, Rafael J. Wysocki wrote:
> > On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> > <[email protected]> wrote:
> >>
> >> Userspace can't easily discover how much of a sleep cycle was spent in a
> >> hardware sleep state without using kernel tracing and vendor specific sysfs
> >> or debugfs files.
> >>
> >> To make this information more discoverable, introduce a new sysfs file
> >> to represent the time spent in a sleep state.
> >
> > This is only in the most recent suspend-resume cycle, isn't it?
>
> Yes; that's correct.
>
> >
> > Wouldn't it be useful to have another attribute printing the
> > accumulated total HW sleep time?
> >
>
> I had considered this; but I didn't think it was actually very useful
> because userspace will get control at the end of every cycle and can
> accumulate those numbers if desirable.
Unless "user space" in question is actually a human, that is.
On 3/31/2023 13:07, Rafael J. Wysocki wrote:
> On Fri, Mar 31, 2023 at 8:05 PM Limonciello, Mario
> <[email protected]> wrote:
>>
>> On 3/31/2023 13:01, Rafael J. Wysocki wrote:
>>> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
>>> <[email protected]> wrote:
>>>>
>>>> Userspace can't easily discover how much of a sleep cycle was spent in a
>>>> hardware sleep state without using kernel tracing and vendor specific sysfs
>>>> or debugfs files.
>>>>
>>>> To make this information more discoverable, introduce a new sysfs file
>>>> to represent the time spent in a sleep state.
>>>
>>> This is only in the most recent suspend-resume cycle, isn't it?
>>
>> Yes; that's correct.
>>
>>>
>>> Wouldn't it be useful to have another attribute printing the
>>> accumulated total HW sleep time?
>>>
>>
>> I had considered this; but I didn't think it was actually very useful
>> because userspace will get control at the end of every cycle and can
>> accumulate those numbers if desirable.
>
> Unless "user space" in question is actually a human, that is.
This is what I envisioned:
* In traditional Linux some software like systemd could capture this and
log it.
It could subtract at the time the suspend started from the time it ended
and use that to calculate a percentage of time in hardware sleep state
and then put that percentage in the system journal.
* In ChromeOS something like powerd would be the only thing reading this
number and it could be adding it up during dark resume until the system
gets to a full resume.
* If a human is manually capturing these values they'll be most
interested in whether an individual cycle reached hardware sleep.
If it didn't they'll want to look at debug data from that cycle.
The aggregate will be noise.
Do you think of another use case?
On Fri, Mar 31, 2023 at 8:13 PM Limonciello, Mario
<[email protected]> wrote:
>
> On 3/31/2023 13:07, Rafael J. Wysocki wrote:
> > On Fri, Mar 31, 2023 at 8:05 PM Limonciello, Mario
> > <[email protected]> wrote:
> >>
> >> On 3/31/2023 13:01, Rafael J. Wysocki wrote:
> >>> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> >>> <[email protected]> wrote:
> >>>>
> >>>> Userspace can't easily discover how much of a sleep cycle was spent in a
> >>>> hardware sleep state without using kernel tracing and vendor specific sysfs
> >>>> or debugfs files.
> >>>>
> >>>> To make this information more discoverable, introduce a new sysfs file
> >>>> to represent the time spent in a sleep state.
> >>>
> >>> This is only in the most recent suspend-resume cycle, isn't it?
> >>
> >> Yes; that's correct.
> >>
> >>>
> >>> Wouldn't it be useful to have another attribute printing the
> >>> accumulated total HW sleep time?
> >>>
> >>
> >> I had considered this; but I didn't think it was actually very useful
> >> because userspace will get control at the end of every cycle and can
> >> accumulate those numbers if desirable.
> >
> > Unless "user space" in question is actually a human, that is.
>
> This is what I envisioned:
>
> * In traditional Linux some software like systemd could capture this and
> log it.
> It could subtract at the time the suspend started from the time it ended
> and use that to calculate a percentage of time in hardware sleep state
> and then put that percentage in the system journal.
>
> * In ChromeOS something like powerd would be the only thing reading this
> number and it could be adding it up during dark resume until the system
> gets to a full resume.
>
> * If a human is manually capturing these values they'll be most
> interested in whether an individual cycle reached hardware sleep.
Or whether or not it has been reached in any cycle so far? Or in the
most recent 10 cycles? Or similar?
Or how much time the system spent in HW sleep in general?
> If it didn't they'll want to look at debug data from that cycle.
> The aggregate will be noise.
Not necessarily and the point is that you can relatively easily
provide both values, the one from the most recent cycle and the grand
total.
> Do you think of another use case?
Well, if the kernel can easily compute and expose the grand total
value, why is it better to offload that to user space, possibly in
different ways in different system configurations? What if somebody
runs a minimum user space?
On Thu, 2023-03-30 at 14:44 -0500, Mario Limonciello wrote:
> Currently counters are only captured during suspend when the
> warn_on_s0ix_failures module parameter is set.
>
> In order to relay this counter information to the kernel reporting
> infrastructure adjust it so that the counters are always captured.
>
> warn_on_s0ix_failures will be utilized solely for messaging by
> the driver instead.
>
> Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: David E. Box <[email protected]>
> ---
> v4->v5:
> * Squash patches together
> * Add extra pm_suspend_via_firmware() check for resume routine too
> ---
> drivers/platform/x86/intel/pmc/core.c | 13 +++++--------
> drivers/platform/x86/intel/pmc/core.h | 2 --
> 2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/platform/x86/intel/pmc/core.c
> b/drivers/platform/x86/intel/pmc/core.c
> index 3a15d32d7644..e2f171fac094 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1168,12 +1168,6 @@ static __maybe_unused int pmc_core_suspend(struct
> device *dev)
> {
> struct pmc_dev *pmcdev = dev_get_drvdata(dev);
>
> - pmcdev->check_counters = false;
> -
> - /* No warnings on S0ix failures */
> - if (!warn_on_s0ix_failures)
> - return 0;
> -
> /* Check if the syspend will actually use S0ix */
> if (pm_suspend_via_firmware())
> return 0;
> @@ -1186,7 +1180,6 @@ static __maybe_unused int pmc_core_suspend(struct device
> *dev)
> if (pmc_core_dev_state_get(pmcdev, &pmcdev->s0ix_counter))
> return -EIO;
>
> - pmcdev->check_counters = true;
> return 0;
> }
>
> @@ -1222,12 +1215,16 @@ static __maybe_unused int pmc_core_resume(struct
> device *dev)
> const struct pmc_bit_map **maps = pmcdev->map->lpm_sts;
> int offset = pmcdev->map->lpm_status_offset;
>
> - if (!pmcdev->check_counters)
> + /* Check if the syspend used S0ix */
> + if (pm_suspend_via_firmware())
> return 0;
>
> if (!pmc_core_is_s0ix_failed(pmcdev))
> return 0;
>
> + if (!warn_on_s0ix_failures)
> + return 0;
> +
> if (pmc_core_is_pc10_failed(pmcdev)) {
> /* S0ix failed because of PC10 entry failure */
> dev_info(dev, "CPU did not enter PC10!!! (PC10 cnt=0x%llx)\n",
> diff --git a/drivers/platform/x86/intel/pmc/core.h
> b/drivers/platform/x86/intel/pmc/core.h
> index 810204d758ab..51d73efceaf3 100644
> --- a/drivers/platform/x86/intel/pmc/core.h
> +++ b/drivers/platform/x86/intel/pmc/core.h
> @@ -319,7 +319,6 @@ struct pmc_reg_map {
> * @pmc_xram_read_bit: flag to indicate whether PMC XRAM shadow registers
> * used to read MPHY PG and PLL status are available
> * @mutex_lock: mutex to complete one transcation
> - * @check_counters: On resume, check if counters are getting incremented
> * @pc10_counter: PC10 residency counter
> * @s0ix_counter: S0ix residency (step adjusted)
> * @num_lpm_modes: Count of enabled modes
> @@ -338,7 +337,6 @@ struct pmc_dev {
> int pmc_xram_read_bit;
> struct mutex lock; /* generic mutex lock for PMC Core */
>
> - bool check_counters; /* Check for counter increments on resume */
> u64 pc10_counter;
> u64 s0ix_counter;
> int num_lpm_modes;
On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> <[email protected]> wrote:
> >
> > intel_pmc_core displays a warning when the module parameter
> > `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> > state.
> >
> > Report this to the standard kernel reporting infrastructure so that
> > userspace software can query after the suspend cycle is done.
> >
> > Signed-off-by: Mario Limonciello <[email protected]>
> > ---
> > v4->v5:
> > * Reword commit message
> > ---
> > drivers/platform/x86/intel/pmc/core.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/platform/x86/intel/pmc/core.c
> > b/drivers/platform/x86/intel/pmc/core.c
> > index e2f171fac094..980af32dd48a 100644
> > --- a/drivers/platform/x86/intel/pmc/core.c
> > +++ b/drivers/platform/x86/intel/pmc/core.c
> > @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
> > pmc_dev *pmcdev)
> > if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> > return false;
> >
> > + pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> > +
>
> Maybe check if this is really accumulating? In case of a counter
> overflow, for instance?
Overflow is likely on some systems. The counter is only 32-bit and at our
smallest granularity of 30.5us per tick it could overflow after a day and a half
of s0ix time, though most of our systems have a higher granularity that puts
them around 6 days.
This brings up an issue that the attribute cannot be trusted if the system is
suspended for longer than the maximum hardware counter time. Should be noted in
the Documentation.
David
>
> > if (s0ix_counter == pmcdev->s0ix_counter)
> > return true;
> >
> > --
On 4/3/2023 13:00, Box, David E wrote:
> On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
>> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
>> <[email protected]> wrote:
>>>
>>> intel_pmc_core displays a warning when the module parameter
>>> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
>>> state.
>>>
>>> Report this to the standard kernel reporting infrastructure so that
>>> userspace software can query after the suspend cycle is done.
>>>
>>> Signed-off-by: Mario Limonciello <[email protected]>
>>> ---
>>> v4->v5:
>>> * Reword commit message
>>> ---
>>> drivers/platform/x86/intel/pmc/core.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/platform/x86/intel/pmc/core.c
>>> b/drivers/platform/x86/intel/pmc/core.c
>>> index e2f171fac094..980af32dd48a 100644
>>> --- a/drivers/platform/x86/intel/pmc/core.c
>>> +++ b/drivers/platform/x86/intel/pmc/core.c
>>> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
>>> pmc_dev *pmcdev)
>>> if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
>>> return false;
>>>
>>> + pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
>>> +
>>
>> Maybe check if this is really accumulating? In case of a counter
>> overflow, for instance?
>
> Overflow is likely on some systems. The counter is only 32-bit and at our
> smallest granularity of 30.5us per tick it could overflow after a day and a half
> of s0ix time, though most of our systems have a higher granularity that puts
> them around 6 days.
>
> This brings up an issue that the attribute cannot be trusted if the system is
> suspended for longer than the maximum hardware counter time. Should be noted in
> the Documentation.
I think it would be rather confusing for userspace having to account for
this and it's better to abstract it in the kernel.
How can you discover the granularity a system can support?
How would you know overflow actually happened? Is there a bit somewhere
else that could tell you?
In terms of ABI how about when we know overflow occurred and userspace
reads the sysfs file we return -EOVERFLOW instead of a potentially bad
value?
On Mon, Apr 3, 2023 at 8:07 PM Limonciello, Mario
<[email protected]> wrote:
>
> On 4/3/2023 13:00, Box, David E wrote:
> > On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
> >> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> >> <[email protected]> wrote:
> >>>
> >>> intel_pmc_core displays a warning when the module parameter
> >>> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> >>> state.
> >>>
> >>> Report this to the standard kernel reporting infrastructure so that
> >>> userspace software can query after the suspend cycle is done.
> >>>
> >>> Signed-off-by: Mario Limonciello <[email protected]>
> >>> ---
> >>> v4->v5:
> >>> * Reword commit message
> >>> ---
> >>> drivers/platform/x86/intel/pmc/core.c | 2 ++
> >>> 1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/drivers/platform/x86/intel/pmc/core.c
> >>> b/drivers/platform/x86/intel/pmc/core.c
> >>> index e2f171fac094..980af32dd48a 100644
> >>> --- a/drivers/platform/x86/intel/pmc/core.c
> >>> +++ b/drivers/platform/x86/intel/pmc/core.c
> >>> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
> >>> pmc_dev *pmcdev)
> >>> if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> >>> return false;
> >>>
> >>> + pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> >>> +
> >>
> >> Maybe check if this is really accumulating? In case of a counter
> >> overflow, for instance?
> >
> > Overflow is likely on some systems. The counter is only 32-bit and at our
> > smallest granularity of 30.5us per tick it could overflow after a day and a half
> > of s0ix time, though most of our systems have a higher granularity that puts
> > them around 6 days.
> >
> > This brings up an issue that the attribute cannot be trusted if the system is
> > suspended for longer than the maximum hardware counter time. Should be noted in
> > the Documentation.
>
> I think it would be rather confusing for userspace having to account for
> this and it's better to abstract it in the kernel.
>
> How can you discover the granularity a system can support?
> How would you know overflow actually happened? Is there a bit somewhere
> else that could tell you?
I'm not really sure if there is a generally usable overflow detection for this.
> In terms of ABI how about when we know overflow occurred and userspace
> reads the sysfs file we return -EOVERFLOW instead of a potentially bad
> value?
So if the new value is greater than the old one, you don't really know
whether or not an overflow has taken place.
And so I would just document the fact that the underlying HW/firmware
counter overflows as suggested by Dave.