2020-03-11 09:39:01

by Gautham R Shenoy

Subject: [PATCH v3 0/6] Track and expose idle PURR and SPURR ticks

From: "Gautham R. Shenoy" <[email protected]>

Hi,

This is the third version of the patches to track and expose idle PURR
and SPURR ticks. These patches are required by tools such as lparstat
to compute system utilization for capacity planning purposes.

The previous versions can be found here:
v2: https://lkml.org/lkml/2020/2/21/21
v1: https://lore.kernel.org/patchwork/cover/1159341/

The key changes from v2 are:

- The prolog and epilog functions have been renamed
pseries_idle_prolog() and pseries_idle_epilog() respectively, to
indicate their pseries-specific nature.

- Fixed the Documentation for
/sys/devices/system/cpu/cpuX/idle_spurr as pointed out by
Nathan Lynch.

- Introduced a patch (Patch 6/6) that sends an IPI in order to read
and cache the values of purr, spurr, idle_purr and idle_spurr of
the target CPU when any one of them is read via sysfs. These
cached values are returned if any of these sysfs files is read
again within the next 10ms. If they are read more than 10ms after
the earlier IPI, a fresh IPI is issued to read and cache the
values again. This minimizes the number of IPIs required when
these values are read back-to-back via the sysfs interface.

Test results (reading the four sysfs files back-to-back for a
given CPU, once per second for 100 seconds):

Without patch 6/6 (Without caching):
16 [XICS 2 Edge IPI] = 422 times
DBL [Doorbell interrupts] = 13 times
Total : 435 IPIs.

With patch 6/6 (With caching):
16 [XICS 2 Edge IPI] = 111 times
DBL [Doorbell interrupts] = 17 times
Total : 128 IPIs.

Motivation:
===========
On pseries LPARs, data center planners want a more accurate view of
system utilization per resource, such as CPU, to better plan the
system's capacity requirements. Such accuracy can be obtained by
reading the PURR/SPURR registers for CPU resource utilization.

Tools such as lparstat, which compute the utilization, need to know
the [S]PURR ticks when the CPU was busy or idle. The [S]PURR
counters are already exposed through sysfs. We already account for
PURR ticks when we go idle so that we can update the VPA area. This
patchset extends that support to also account for SPURR ticks when
idle, and exposes both via per-cpu sysfs files.

These patches are required by an enhancement to the lparstat utility
that computes CPU utilization based on PURR and SPURR, which can be
found here:
https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4


With the patches, when lparstat is run on an LPAR running CPU hogs:
=========================================================================
sudo ./src/lparstat -E 1 3
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=2 mem=4834176 kB cpus=0 ent=2.00
        ---Actual---                -Normalized-
       %busy  %idle      Frequency  %busy  %idle
      ------ ------  ------------- ------ ------
1      99.99   0.00  3.35GHz[111%] 110.99   0.00
2     100.00   0.00  3.35GHz[111%] 111.00   0.00
3     100.00   0.00  3.35GHz[111%] 111.00   0.00
=========================================================================

When lparstat is run on an LPAR that is idle:
=========================================================================
$ sudo ./src/lparstat -E 1 3
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=2 mem=4834176 kB cpus=0 ent=2.00
        ---Actual---                -Normalized-
       %busy  %idle      Frequency  %busy  %idle
      ------ ------  ------------- ------ ------
1       0.71  99.30  2.18GHz[ 72%]   0.53  71.48
2       0.56  99.44  2.11GHz[ 70%]   0.43  69.57
3       0.54  99.46  2.11GHz[ 70%]   0.43  69.57
=========================================================================


Gautham R. Shenoy (6):
powerpc: Move idle_loop_prolog()/epilog() functions to header file
powerpc/idle: Add accessor function to always read latest idle PURR
powerpc/pseries: Account for SPURR ticks on idle CPUs
powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr
pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr

Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++
arch/powerpc/include/asm/idle.h | 89 ++++++++++++++
arch/powerpc/kernel/sysfs.c | 133 +++++++++++++++++++--
arch/powerpc/platforms/pseries/setup.c | 8 +-
drivers/cpuidle/cpuidle-pseries.c | 39 ++----
5 files changed, 267 insertions(+), 41 deletions(-)
create mode 100644 arch/powerpc/include/asm/idle.h

--
1.9.4


2020-03-11 09:39:15

by Gautham R Shenoy

Subject: [PATCH v3 6/6] pseries/sysfs: Minimise IPI noise while reading [idle_][s]purr

From: "Gautham R. Shenoy" <[email protected]>

Currently purr, spurr, idle_purr and idle_spurr are exposed for every
CPU via the sysfs interface
/sys/devices/system/cpu/cpuX/[idle_][s]purr. Each sysfs read currently
generates an IPI to obtain the desired value from the target CPU X.
Since these sysfs files are typically read one after another, we end
up generating four IPIs per CPU within a short duration.

To minimize the IPI noise, this patch caches the values of all four
quantities whenever one of them is read. If any of them is
subsequently read within the next 10ms, the cached value is
returned. With this, we generate at most one IPI per CPU every 10ms.

Test results (reading the four sysfs files back-to-back for a
given CPU, once per second for 100 seconds):

Without the patch:
16 [XICS 2 Edge IPI] = 422 times
DBL [Doorbell interrupts] = 13 times
Total : 435 IPIs.

With the patch:
16 [XICS 2 Edge IPI] = 111 times
DBL [Doorbell interrupts] = 17 times
Total : 128 IPIs.

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/kernel/sysfs.c | 109 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 90 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index c9ddb83..db8fc90 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -586,8 +586,6 @@ void ppc_enable_pmcs(void)
* SPRs which are not related to PMU.
*/
#ifdef CONFIG_PPC64
-SYSFS_SPRSETUP(purr, SPRN_PURR);
-SYSFS_SPRSETUP(spurr, SPRN_SPURR);
SYSFS_SPRSETUP(pir, SPRN_PIR);
SYSFS_SPRSETUP(tscr, SPRN_TSCR);

@@ -596,8 +594,6 @@ void ppc_enable_pmcs(void)
enable write when needed with a separate function.
Lets be conservative and default to pseries.
*/
-static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
-static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
static DEVICE_ATTR(pir, 0400, show_pir, NULL);
static DEVICE_ATTR(tscr, 0600, show_tscr, store_tscr);
#endif /* CONFIG_PPC64 */
@@ -761,39 +757,114 @@ static void create_svm_file(void)
}
#endif /* CONFIG_PPC_SVM */

-static void read_idle_purr(void *val)
+/*
+ * The duration (in ms) from the last IPI to the target CPU until
+ * which a cached value of purr, spurr, idle_purr, idle_spurr can be
+ * reported to the user on a corresponding sysfs file read. Beyond
+ * this duration, fresh values need to be obtained by sending IPIs to
+ * the target CPU when the sysfs files are read.
+ */
+static unsigned long util_stats_staleness_tolerance_ms = 10;
+struct util_acct_stats {
+ u64 latest_purr;
+ u64 latest_spurr;
+ u64 latest_idle_purr;
+ u64 latest_idle_spurr;
+ unsigned long last_update_jiffies;
+};
+
+DEFINE_PER_CPU(struct util_acct_stats, util_acct_stats);
+
+static void update_util_acct_stats(void *ptr)
{
- u64 *ret = val;
+ struct util_acct_stats *stats = ptr;

- *ret = read_this_idle_purr();
+ stats->latest_purr = mfspr(SPRN_PURR);
+ stats->latest_spurr = mfspr(SPRN_SPURR);
+ stats->latest_idle_purr = read_this_idle_purr();
+ stats->latest_idle_spurr = read_this_idle_spurr();
+ stats->last_update_jiffies = jiffies;
}

-static ssize_t idle_purr_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+struct util_acct_stats *get_util_stats_ptr(int cpu)
+{
+ struct util_acct_stats *stats = per_cpu_ptr(&util_acct_stats, cpu);
+ unsigned long delta_jiffies;
+
+ delta_jiffies = jiffies - stats->last_update_jiffies;
+
+ /*
+ * If we have a recent enough data, reuse that instead of
+ * sending an IPI.
+ */
+ if (jiffies_to_msecs(delta_jiffies) < util_stats_staleness_tolerance_ms)
+ return stats;
+
+ smp_call_function_single(cpu, update_util_acct_stats, stats, 1);
+ return stats;
+}
+
+static ssize_t show_purr(struct device *dev,
+ struct device_attribute *attr, char *buf)
{
struct cpu *cpu = container_of(dev, struct cpu, dev);
- u64 val;
+ struct util_acct_stats *stats;

- smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
- return sprintf(buf, "%llx\n", val);
+ stats = get_util_stats_ptr(cpu->dev.id);
+ return sprintf(buf, "%llx\n", stats->latest_purr);
}
-static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);

-static void read_idle_spurr(void *val)
+static void write_purr(void *val)
{
- u64 *ret = val;
+ mtspr(SPRN_PURR, *(unsigned long *)val);
+}

- *ret = read_this_idle_spurr();
+static ssize_t __used store_purr(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ unsigned long val;
+ int ret = kstrtoul(buf, 16, &val);
+
+ if (ret != 0)
+ return -EINVAL;
+
+ smp_call_function_single(cpu->dev.id, write_purr, &val, 1);
+ return count;
+}
+static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
+
+static ssize_t show_spurr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ struct util_acct_stats *stats;
+
+ stats = get_util_stats_ptr(cpu->dev.id);
+ return sprintf(buf, "%llx\n", stats->latest_spurr);
}
+static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
+
+static ssize_t idle_purr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ struct util_acct_stats *stats;
+
+ stats = get_util_stats_ptr(cpu->dev.id);
+ return sprintf(buf, "%llx\n", stats->latest_idle_purr);
+}
+static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);

static ssize_t idle_spurr_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cpu *cpu = container_of(dev, struct cpu, dev);
- u64 val;
+ struct util_acct_stats *stats;

- smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
- return sprintf(buf, "%llx\n", val);
+ stats = get_util_stats_ptr(cpu->dev.id);
+ return sprintf(buf, "%llx\n", stats->latest_idle_spurr);
}
static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);

--
1.9.4

2020-03-11 09:39:25

by Gautham R Shenoy

Subject: [PATCH v3 2/6] powerpc/idle: Add accessor function to always read latest idle PURR

From: "Gautham R. Shenoy" <[email protected]>

Currently, when a CPU goes idle, we take a snapshot of PURR in
pseries_idle_prolog(); at CPU idle exit, pseries_idle_epilog() uses
it to compute the idle PURR cycles. Thus, the idle PURR cycle count
read before pseries_idle_prolog() or after pseries_idle_epilog() is
always correct.

However, if we were to read the idle PURR cycles from an interrupt
context between pseries_idle_prolog() and pseries_idle_epilog() (this
will be done in a future patch), the value thus read would not
include the cycles spent in the most recent idle period.

This patch addresses the issue by providing an accessor function to
read the idle PURR such that it includes the cycles spent in the most
recent idle period, even when read between pseries_idle_prolog() and
pseries_idle_epilog(). To achieve this, the patch saves the snapshot
of PURR taken in pseries_idle_prolog() in a per-cpu variable instead
of on the stack, so that it can be accessed from an interrupt
context.

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/include/asm/idle.h | 46 +++++++++++++++++++++++++++-------
arch/powerpc/platforms/pseries/setup.c | 7 +++---
drivers/cpuidle/cpuidle-pseries.c | 15 +++++------
3 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/idle.h b/arch/powerpc/include/asm/idle.h
index e838ea5..7552823 100644
--- a/arch/powerpc/include/asm/idle.h
+++ b/arch/powerpc/include/asm/idle.h
@@ -3,10 +3,27 @@
#define _ASM_POWERPC_IDLE_H
#include <asm/runlatch.h>

-static inline void pseries_idle_prolog(unsigned long *in_purr)
+DECLARE_PER_CPU(u64, idle_entry_purr_snap);
+
+static inline void snapshot_purr_idle_entry(void)
+{
+ *this_cpu_ptr(&idle_entry_purr_snap) = mfspr(SPRN_PURR);
+}
+
+static inline void update_idle_purr_accounting(void)
+{
+ u64 wait_cycles;
+ u64 in_purr = *this_cpu_ptr(&idle_entry_purr_snap);
+
+ wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
+ wait_cycles += mfspr(SPRN_PURR) - in_purr;
+ get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+}
+
+static inline void pseries_idle_prolog(void)
{
ppc64_runlatch_off();
- *in_purr = mfspr(SPRN_PURR);
+ snapshot_purr_idle_entry();
/*
* Indicate to the HV that we are idle. Now would be
* a good time to find other work to dispatch.
@@ -14,15 +31,26 @@ static inline void pseries_idle_prolog(unsigned long *in_purr)
get_lppaca()->idle = 1;
}

-static inline void pseries_idle_epilog(unsigned long in_purr)
+static inline void pseries_idle_epilog(void)
{
- u64 wait_cycles;
-
- wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
- wait_cycles += mfspr(SPRN_PURR) - in_purr;
- get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+ update_idle_purr_accounting();
get_lppaca()->idle = 0;
-
ppc64_runlatch_on();
}
+
+static inline u64 read_this_idle_purr(void)
+{
+ /*
+ * If we are reading from an idle context, update the
+ * idle-purr cycles corresponding to the last idle period.
+ * Since the idle context is not yet over, take a fresh
+ * snapshot of the idle-purr.
+ */
+ if (unlikely(get_lppaca()->idle == 1)) {
+ update_idle_purr_accounting();
+ snapshot_purr_idle_entry();
+ }
+
+ return be64_to_cpu(get_lppaca()->wait_state_cycles);
+}
#endif
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 2f53e6b..4905c96 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -318,10 +318,9 @@ static int alloc_dispatch_log_kmem_cache(void)
}
machine_early_initcall(pseries, alloc_dispatch_log_kmem_cache);

+DEFINE_PER_CPU(u64, idle_entry_purr_snap);
static void pseries_lpar_idle(void)
{
- unsigned long in_purr;
-
/*
* Default handler to go into low thread priority and possibly
* low power mode by ceding processor to hypervisor
@@ -331,7 +330,7 @@ static void pseries_lpar_idle(void)
return;

/* Indicate to hypervisor that we are idle. */
- pseries_idle_prolog(&in_purr);
+ pseries_idle_prolog();

/*
* Yield the processor to the hypervisor. We return if
@@ -342,7 +341,7 @@ static void pseries_lpar_idle(void)
*/
cede_processor();

- pseries_idle_epilog(in_purr);
+ pseries_idle_epilog();
}

/*
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 46d5e05..6513ef2 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -36,12 +36,11 @@ static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- unsigned long in_purr;
u64 snooze_exit_time;

set_thread_flag(TIF_POLLING_NRFLAG);

- pseries_idle_prolog(&in_purr);
+ pseries_idle_prolog();
local_irq_enable();
snooze_exit_time = get_tb() + snooze_timeout;

@@ -65,7 +64,7 @@ static int snooze_loop(struct cpuidle_device *dev,

local_irq_disable();

- pseries_idle_epilog(in_purr);
+ pseries_idle_epilog();

return index;
}
@@ -91,9 +90,8 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- unsigned long in_purr;

- pseries_idle_prolog(&in_purr);
+ pseries_idle_prolog();
get_lppaca()->donate_dedicated_cpu = 1;

HMT_medium();
@@ -102,7 +100,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
local_irq_disable();
get_lppaca()->donate_dedicated_cpu = 0;

- pseries_idle_epilog(in_purr);
+ pseries_idle_epilog();

return index;
}
@@ -111,9 +109,8 @@ static int shared_cede_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- unsigned long in_purr;

- pseries_idle_prolog(&in_purr);
+ pseries_idle_prolog();

/*
* Yield the processor to the hypervisor. We return if
@@ -125,7 +122,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
check_and_cede_processor();

local_irq_disable();
- pseries_idle_epilog(in_purr);
+ pseries_idle_epilog();

return index;
}
--
1.9.4

2020-03-11 09:40:34

by Gautham R Shenoy

Subject: [PATCH v3 4/6] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

From: "Gautham R. Shenoy" <[email protected]>

On pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

The total PURR and SPURR ticks are already exposed via the per-cpu
sysfs files "purr" and "spurr". This patch adds support for exposing
the idle PURR and SPURR ticks via new per-cpu sysfs files named
"idle_purr" and "idle_spurr".

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/kernel/sysfs.c | 54 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 479c706..c9ddb83 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -19,6 +19,7 @@
#include <asm/smp.h>
#include <asm/pmc.h>
#include <asm/firmware.h>
+#include <asm/idle.h>
#include <asm/svm.h>

#include "cacheinfo.h"
@@ -760,6 +761,42 @@ static void create_svm_file(void)
}
#endif /* CONFIG_PPC_SVM */

+static void read_idle_purr(void *val)
+{
+ u64 *ret = val;
+
+ *ret = read_this_idle_purr();
+}
+
+static ssize_t idle_purr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ u64 val;
+
+ smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
+ return sprintf(buf, "%llx\n", val);
+}
+static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
+
+static void read_idle_spurr(void *val)
+{
+ u64 *ret = val;
+
+ *ret = read_this_idle_spurr();
+}
+
+static ssize_t idle_spurr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ u64 val;
+
+ smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
+ return sprintf(buf, "%llx\n", val);
+}
+static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
+
static int register_cpu_online(unsigned int cpu)
{
struct cpu *c = &per_cpu(cpu_devices, cpu);
@@ -823,10 +860,15 @@ static int register_cpu_online(unsigned int cpu)
if (!firmware_has_feature(FW_FEATURE_LPAR))
add_write_permission_dev_attr(&dev_attr_purr);
device_create_file(s, &dev_attr_purr);
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ device_create_file(s, &dev_attr_idle_purr);
}

- if (cpu_has_feature(CPU_FTR_SPURR))
+ if (cpu_has_feature(CPU_FTR_SPURR)) {
device_create_file(s, &dev_attr_spurr);
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ device_create_file(s, &dev_attr_idle_spurr);
+ }

if (cpu_has_feature(CPU_FTR_DSCR))
device_create_file(s, &dev_attr_dscr);
@@ -910,11 +952,17 @@ static int unregister_cpu_online(unsigned int cpu)
device_remove_file(s, &dev_attr_mmcra);
#endif /* CONFIG_PMU_SYSFS */

- if (cpu_has_feature(CPU_FTR_PURR))
+ if (cpu_has_feature(CPU_FTR_PURR)) {
device_remove_file(s, &dev_attr_purr);
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ device_remove_file(s, &dev_attr_idle_purr);
+ }

- if (cpu_has_feature(CPU_FTR_SPURR))
+ if (cpu_has_feature(CPU_FTR_SPURR)) {
device_remove_file(s, &dev_attr_spurr);
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ device_remove_file(s, &dev_attr_idle_spurr);
+ }

if (cpu_has_feature(CPU_FTR_DSCR))
device_remove_file(s, &dev_attr_dscr);
--
1.9.4