2022-11-15 20:06:17

by Mario Limonciello

[permalink] [raw]
Subject: [RFC v3 0/4] Make it easier to measure % in HW sleep state

Sven van Ashbrook brought a patch to the kernel mailing list that
attempted to change the reporting level of a s0ix entry issue to a
different debugging level so that infastructure used by Google could
better scan logs to catch problems.

This approach was rejected, but during the conversation another
suggestion was made by David E. Box to introduce some infrastructure
into the kernel to report this information.

As it's information that is reported by both AMD and Intel platforms
over s2idle, this seems to make sense.

RFC v1 and v2 introduced two new sysfs files to report the information, but
Rafael pointed out that there was already a file that could be used on
Intel platforms: `low_power_idle_system_residency_us`.

RFC v3 creates this file for AMD platforms and also introduces another file
that can be used to determine total sleep time:
`/sys/power/suspend_stats/last_total`.

With these two files a simple shell script can be run after suspend to
calculate the percentage.

```
#!/bin/sh
total=$(cat /sys/power/suspend_stats/last_total)
hw=$(cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us)
percent=$(awk -v hw=$hw -v total=$total 'BEGIN { printf "%.2f%%", (hw/total*100) }')
echo "Last ${total}us suspend cycle spent $percent of the time in a hardware sleep state."
```

A sample run on an AMD platform that was just sleeping with this series on
top of 6.1-rc5 shows the following:
# ./compare.sh
Last 15699838us suspend cycle spent 98.63% of the time in a hardware sleep state.

Further discussion to be expected on this series:

* What last_total will represent from the suspend cycle

* Whether the semantics of all platforms will be the same for
`low_power_idle_system_residency_us`
- AMD platforms reset this counter before s2idle entry. Do Intel? Others?

* Maybe the *kernel* should be responsible to do the calculation and export
a `last_hw_sleep_percent` file instead. Platform differences can be
abstracted then within individual drivers.

Mario Limonciello (4):
PM: Add a sysfs file to represent the total sleep duration
platform/x86/intel/pmc: core: Drop check_counters
platform/x86/amd: pmc: Report duration of time in deepest hw state
platform/x86/amd: pmc: Populate cpuidle sysfs file with hw sleep data

Documentation/ABI/testing/sysfs-amd-pmc | 6 ++++++
Documentation/ABI/testing/sysfs-power | 8 ++++++++
drivers/platform/x86/amd/pmc.c | 27 ++++++++++++++++++++++---
drivers/platform/x86/intel/pmc/core.c | 7 ++-----
drivers/platform/x86/intel/pmc/core.h | 1 -
include/linux/suspend.h | 2 ++
kernel/power/main.c | 15 ++++++++++++++
kernel/power/suspend.c | 2 ++
kernel/time/timekeeping.c | 2 ++
9 files changed, 61 insertions(+), 9 deletions(-)

--
2.34.1



2022-11-15 20:22:49

by Mario Limonciello

[permalink] [raw]
Subject: [RFC v3 3/4] platform/x86/amd: pmc: Report duration of time in deepest hw state

amd_pmc displays a warning when a suspend didn't get to the deepest
state and a dynamic debugging message with the duration if it did.

Rather than logging to dynamic debugging the duration spent in the
deepest state, report this to a file in sysfs.

Signed-off-by: Mario Limonciello <[email protected]>
---
RFC v2->v3:
* Create new sysfs file instead of reporting using kernel symbol
---
Documentation/ABI/testing/sysfs-amd-pmc | 6 ++++++
drivers/platform/x86/amd/pmc.c | 22 +++++++++++++++++++---
2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-amd-pmc b/Documentation/ABI/testing/sysfs-amd-pmc
index c421b72844f1..89354da8e680 100644
--- a/Documentation/ABI/testing/sysfs-amd-pmc
+++ b/Documentation/ABI/testing/sysfs-amd-pmc
@@ -11,3 +11,9 @@ Contact: Mario Limonciello <[email protected]>
Description: Reading this file reports the program corresponding to the SMU
firmware version. The program field is used to disambiguate two
APU/CPU models that can share the same firmware binary.
+
+What: /sys/bus/platform/drivers/amd_pmc/*/low_power_idle_system_residency_us
+Date: December 2022
+Contact: Mario Limonciello <[email protected]>
+Description: Reading this file reports the duration of the last suspend that
+ was spent in the deepest hardware sleep state in microseconds.
diff --git a/drivers/platform/x86/amd/pmc.c b/drivers/platform/x86/amd/pmc.c
index 96e790e639a2..2550ba6d28f5 100644
--- a/drivers/platform/x86/amd/pmc.c
+++ b/drivers/platform/x86/amd/pmc.c
@@ -13,6 +13,7 @@
#include <linux/acpi.h>
#include <linux/bitfield.h>
#include <linux/bits.h>
+#include <linux/cpu.h>
#include <linux/debugfs.h>
#include <linux/delay.h>
#include <linux/io.h>
@@ -363,9 +364,6 @@ static void amd_pmc_validate_deepest(struct amd_pmc_dev *pdev)

if (!table.s0i3_last_entry_status)
dev_warn(pdev->dev, "Last suspend didn't reach deepest state\n");
- else
- dev_dbg(pdev->dev, "Last suspend in deepest state for %lluus\n",
- table.timein_s0i3_lastcapture);
}
#endif

@@ -417,12 +415,30 @@ static ssize_t smu_program_show(struct device *d, struct device_attribute *attr,
return sysfs_emit(buf, "%u\n", dev->smu_program);
}

+static ssize_t low_power_idle_system_residency_us_show(struct device *d,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct amd_pmc_dev *dev = &pmc;
+ struct smu_metrics table;
+ int ret;
+
+ ret = get_metrics_table(dev, &table);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%llu\n", table.s0i3_last_entry_status ?
+ table.timein_s0i3_lastcapture : 0);
+}
+
static DEVICE_ATTR_RO(smu_fw_version);
static DEVICE_ATTR_RO(smu_program);
+static DEVICE_ATTR_RO(low_power_idle_system_residency_us);

static struct attribute *pmc_attrs[] = {
&dev_attr_smu_fw_version.attr,
&dev_attr_smu_program.attr,
+ &dev_attr_low_power_idle_system_residency_us.attr,
NULL,
};
ATTRIBUTE_GROUPS(pmc);
--
2.34.1


2022-11-15 20:23:11

by Mario Limonciello

[permalink] [raw]
Subject: [RFC v3 4/4] platform/x86/amd: pmc: Populate cpuidle sysfs file with hw sleep data

AMD platforms don't provide an ACPI LPIT table, but the data about
duration of time in a hardware sleep state is part of the SMU metrics
table.

To allow userspace to query this from a standard location, export this
information to the standard
`/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us`.

Signed-off-by: Mario Limonciello <[email protected]>
---
RFC v2->v3:
* New patch
---
drivers/platform/x86/amd/pmc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/platform/x86/amd/pmc.c b/drivers/platform/x86/amd/pmc.c
index 2550ba6d28f5..ef5c23df1a5b 100644
--- a/drivers/platform/x86/amd/pmc.c
+++ b/drivers/platform/x86/amd/pmc.c
@@ -787,6 +787,11 @@ static void amd_pmc_s2idle_restore(void)

/* Notify on failed entry */
amd_pmc_validate_deepest(pdev);
+
+ /* Silently fail, if cpuidle attribute group is not present */
+ sysfs_add_file_to_group(&cpu_subsys.dev_root->kobj,
+ &dev_attr_low_power_idle_system_residency_us.attr,
+ "cpuidle");
}

static struct acpi_s2idle_dev_ops amd_pmc_s2idle_dev_ops = {
--
2.34.1


2022-11-15 20:39:25

by Mario Limonciello

[permalink] [raw]
Subject: [RFC v3 2/4] platform/x86/intel/pmc: core: Drop check_counters

`check_counters` is a stateful variable for indicating whether or
not to be checking if counters incremented on resume from s2idle.

As the module already has code to gate whether to check the counters
that will fail the suspend when this is enabled, use that instead.

Signed-off-by: Mario Limonciello <[email protected]>
---
RFC v2->v3:
* New patch
---
drivers/platform/x86/intel/pmc/core.c | 7 ++-----
drivers/platform/x86/intel/pmc/core.h | 1 -
2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
index 17ec5825d13d..adc2cae4db28 100644
--- a/drivers/platform/x86/intel/pmc/core.c
+++ b/drivers/platform/x86/intel/pmc/core.c
@@ -2059,8 +2059,6 @@ static __maybe_unused int pmc_core_suspend(struct device *dev)
{
struct pmc_dev *pmcdev = dev_get_drvdata(dev);

- pmcdev->check_counters = false;
-
/* No warnings on S0ix failures */
if (!warn_on_s0ix_failures)
return 0;
@@ -2077,7 +2075,6 @@ static __maybe_unused int pmc_core_suspend(struct device *dev)
if (pmc_core_dev_state_get(pmcdev, &pmcdev->s0ix_counter))
return -EIO;

- pmcdev->check_counters = true;
return 0;
}

@@ -2113,10 +2110,10 @@ static __maybe_unused int pmc_core_resume(struct device *dev)
const struct pmc_bit_map **maps = pmcdev->map->lpm_sts;
int offset = pmcdev->map->lpm_status_offset;

- if (!pmcdev->check_counters)
+ if (!pmc_core_is_s0ix_failed(pmcdev))
return 0;

- if (!pmc_core_is_s0ix_failed(pmcdev))
+ if (!warn_on_s0ix_failures)
return 0;

if (pmc_core_is_pc10_failed(pmcdev)) {
diff --git a/drivers/platform/x86/intel/pmc/core.h b/drivers/platform/x86/intel/pmc/core.h
index 7a059e02c265..5687e91e884c 100644
--- a/drivers/platform/x86/intel/pmc/core.h
+++ b/drivers/platform/x86/intel/pmc/core.h
@@ -316,7 +316,6 @@ struct pmc_reg_map {
* @pmc_xram_read_bit: flag to indicate whether PMC XRAM shadow registers
* used to read MPHY PG and PLL status are available
* @mutex_lock: mutex to complete one transcation
- * @check_counters: On resume, check if counters are getting incremented
* @pc10_counter: PC10 residency counter
* @s0ix_counter: S0ix residency (step adjusted)
* @num_lpm_modes: Count of enabled modes
--
2.34.1


2022-11-17 15:22:06

by Hans de Goede

[permalink] [raw]
Subject: Re: [RFC v3 0/4] Make it easier to measure % in HW sleep state

Hi,

On 11/15/22 21:01, Mario Limonciello wrote:
> Sven van Ashbrook brought a patch to the kernel mailing list that
> attempted to change the reporting level of a s0ix entry issue to a
> different debugging level so that infastructure used by Google could
> better scan logs to catch problems.
>
> This approach was rejected, but during the conversation another
> suggestion was made by David E. Box to introduce some infrastructure
> into the kernel to report this information.
>
> As it's information that is reported by both AMD and Intel platforms
> over s2idle, this seems to make sense.
>
> RFC v1 and v2 introduced two new sysfs files to report the information, but
> Rafael pointed out that there was already a file that could be used on
> Intel platforms: `low_power_idle_system_residency_us`.
>
> RFC v3 creates this file for AMD platforms and also introduces another file
> that can be used to determine total sleep time:
> `/sys/power/suspend_stats/last_total`.
>
> With these two files a simple shell script can be run after suspend to
> calculate the percentage.
>
> ```
> #!/bin/sh
> total=$(cat /sys/power/suspend_stats/last_total)
> hw=$(cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us)
> percent=$(awk -v hw=$hw -v total=$total 'BEGIN { printf "%.2f%%", (hw/total*100) }')
> echo "Last ${total}us suspend cycle spent $percent of the time in a hardware sleep state."
> ```
>
> A sample run on an AMD platform that was just sleeping with this series on
> top of 6.1-rc5 shows the following:
> # ./compare.sh
> Last 15699838us suspend cycle spent 98.63% of the time in a hardware sleep state.
>
> Further discussion to be expected on this series:
>
> * What last_total will represent from the suspend cycle
>
> * Whether the semantics of all platforms will be the same for
> `low_power_idle_system_residency_us`
> - AMD platforms reset this counter before s2idle entry. Do Intel? Others?
>
> * Maybe the *kernel* should be responsible to do the calculation and export
> a `last_hw_sleep_percent` file instead. Platform differences can be
> abstracted then within individual drivers.

That (`last_hw_sleep_percent` file) is an interesting proposal,
I can see that being a better interface because as you say this allows
the kernel / platform-drivers to take care of any platform quirks /
weirdness, avoiding any userspace monitoring of this to possibly
give false positive warnings.

Regards,

Hans





>
> Mario Limonciello (4):
> PM: Add a sysfs file to represent the total sleep duration
> platform/x86/intel/pmc: core: Drop check_counters
> platform/x86/amd: pmc: Report duration of time in deepest hw state
> platform/x86/amd: pmc: Populate cpuidle sysfs file with hw sleep data
>
> Documentation/ABI/testing/sysfs-amd-pmc | 6 ++++++
> Documentation/ABI/testing/sysfs-power | 8 ++++++++
> drivers/platform/x86/amd/pmc.c | 27 ++++++++++++++++++++++---
> drivers/platform/x86/intel/pmc/core.c | 7 ++-----
> drivers/platform/x86/intel/pmc/core.h | 1 -
> include/linux/suspend.h | 2 ++
> kernel/power/main.c | 15 ++++++++++++++
> kernel/power/suspend.c | 2 ++
> kernel/time/timekeeping.c | 2 ++
> 9 files changed, 61 insertions(+), 9 deletions(-)
>