2019-11-27 12:04:02

by Gautham R Shenoy

Subject: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

From: "Gautham R. Shenoy" <[email protected]>

On PSeries LPARs, data center planners want a more accurate view of
system utilization per resource, such as CPU, to better plan system
capacity requirements. Such accuracy can be obtained by reading the
PURR/SPURR registers for CPU resource utilization.

Tools such as lparstat, which are used to compute the utilization, need
to know the [S]PURR ticks when the CPU was busy or idle. The [S]PURR
counters are already exposed through sysfs. We already account for
PURR ticks when we go to idle so that we can update the VPA area. This
patchset extends support to account for SPURR ticks when idle, and
exposes both via per-cpu sysfs files.

These patches are required for an enhancement to the lparstat utility
that computes the CPU utilization based on PURR and SPURR, which can be
found here:
https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4
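
As a rough illustration of how a tool in the style of lparstat might turn
these counters into a utilization figure, the user-space sketch below takes
two samples of a total counter and its idle counterpart and reports the busy
percentage over the interval. The struct and function names are hypothetical
and are not taken from lparstat or the kernel; the only things assumed from
this series are that both counters increase monotonically and are sampled as
64-bit values.

```c
#include <stdint.h>

/* Hypothetical sample of one CPU's counters (total and idle ticks). */
struct purr_sample {
	uint64_t total;	/* e.g. value read from .../cpuX/purr */
	uint64_t idle;	/* e.g. value read from .../cpuX/idle_purr */
};

/*
 * Busy percentage between two samples. Unsigned subtraction keeps the
 * deltas correct even if a counter wraps once between samples.
 * Returns 0.0 if no ticks elapsed.
 */
static double busy_percent(struct purr_sample prev, struct purr_sample cur)
{
	uint64_t total_delta = cur.total - prev.total;
	uint64_t idle_delta = cur.idle - prev.idle;

	if (total_delta == 0)
		return 0.0;
	/* The two files are not read atomically, so clamp the idle delta. */
	if (idle_delta > total_delta)
		idle_delta = total_delta;

	return 100.0 * (double)(total_delta - idle_delta) / (double)total_delta;
}
```

The same calculation would apply to the spurr/idle_spurr pair; which of the
two a given report should use is discussed later in this thread.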

Gautham R. Shenoy (3):
  powerpc/pseries: Account for SPURR ticks on idle CPUs
  powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
  Documentation: Document sysfs interfaces purr, spurr, idle_purr,
    idle_spurr

Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++++++++++++++++++
arch/powerpc/kernel/idle.c | 2 ++
arch/powerpc/kernel/sysfs.c | 32 ++++++++++++++++++
drivers/cpuidle/cpuidle-pseries.c | 28 ++++++++++------
4 files changed, 90 insertions(+), 11 deletions(-)

--
1.9.4


2019-11-27 12:05:28

by Gautham R Shenoy

Subject: [PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

From: "Gautham R. Shenoy" <[email protected]>

On PSeries LPARs, to compute the utilization, tools such as lparstat
need to know the [S]PURR ticks when the CPUs were busy or idle.

In the pseries cpuidle driver, we keep track of the idle PURR ticks in
the VPA variable "wait_state_cycles". This patch extends the support
to account for the idle SPURR ticks.

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/kernel/idle.c | 2 ++
drivers/cpuidle/cpuidle-pseries.c | 28 +++++++++++++++++-----------
2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index a36fd05..708ec68 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -33,6 +33,8 @@
unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
EXPORT_SYMBOL(cpuidle_disable);

+DEFINE_PER_CPU(u64, idle_spurr_cycles);
+
static int __init powersave_off(char *arg)
{
ppc_md.power_save = NULL;
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 74c2479..45e2be4 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -30,11 +30,14 @@ struct cpuidle_driver pseries_idle_driver = {
static struct cpuidle_state *cpuidle_state_table __read_mostly;
static u64 snooze_timeout __read_mostly;
static bool snooze_timeout_en __read_mostly;
+DECLARE_PER_CPU(u64, idle_spurr_cycles);

-static inline void idle_loop_prolog(unsigned long *in_purr)
+static inline void idle_loop_prolog(unsigned long *in_purr,
+ unsigned long *in_spurr)
{
ppc64_runlatch_off();
*in_purr = mfspr(SPRN_PURR);
+ *in_spurr = mfspr(SPRN_SPURR);
/*
* Indicate to the HV that we are idle. Now would be
* a good time to find other work to dispatch.
@@ -42,13 +45,16 @@ static inline void idle_loop_prolog(unsigned long *in_purr)
get_lppaca()->idle = 1;
}

-static inline void idle_loop_epilog(unsigned long in_purr)
+static inline void idle_loop_epilog(unsigned long in_purr,
+ unsigned long in_spurr)
{
u64 wait_cycles;
+ u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);

wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
wait_cycles += mfspr(SPRN_PURR) - in_purr;
get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
+ *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;
get_lppaca()->idle = 0;

ppc64_runlatch_on();
@@ -58,12 +64,12 @@ static int snooze_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- unsigned long in_purr;
+ unsigned long in_purr, in_spurr;
u64 snooze_exit_time;

set_thread_flag(TIF_POLLING_NRFLAG);

- idle_loop_prolog(&in_purr);
+ idle_loop_prolog(&in_purr, &in_spurr);
local_irq_enable();
snooze_exit_time = get_tb() + snooze_timeout;

@@ -87,7 +93,7 @@ static int snooze_loop(struct cpuidle_device *dev,

local_irq_disable();

- idle_loop_epilog(in_purr);
+ idle_loop_epilog(in_purr, in_spurr);

return index;
}
@@ -113,9 +119,9 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- unsigned long in_purr;
+ unsigned long in_purr, in_spurr;

- idle_loop_prolog(&in_purr);
+ idle_loop_prolog(&in_purr, &in_spurr);
get_lppaca()->donate_dedicated_cpu = 1;

HMT_medium();
@@ -124,7 +130,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
local_irq_disable();
get_lppaca()->donate_dedicated_cpu = 0;

- idle_loop_epilog(in_purr);
+ idle_loop_epilog(in_purr, in_spurr);

return index;
}
@@ -133,9 +139,9 @@ static int shared_cede_loop(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index)
{
- unsigned long in_purr;
+ unsigned long in_purr, in_spurr;

- idle_loop_prolog(&in_purr);
+ idle_loop_prolog(&in_purr, &in_spurr);

/*
* Yield the processor to the hypervisor. We return if
@@ -147,7 +153,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
check_and_cede_processor();

local_irq_disable();
- idle_loop_epilog(in_purr);
+ idle_loop_epilog(in_purr, in_spurr);

return index;
}
--
1.9.4

2019-11-27 12:05:45

by Gautham R Shenoy

Subject: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

From: "Gautham R. Shenoy" <[email protected]>

On Pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

The total PURR and SPURR ticks are already exposed via the per-cpu
sysfs files /sys/devices/system/cpu/cpuX/purr and
/sys/devices/system/cpu/cpuX/spurr.

This patch adds support for exposing the idle PURR and SPURR ticks via
/sys/devices/system/cpu/cpuX/idle_purr and
/sys/devices/system/cpu/cpuX/idle_spurr.
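
For readers consuming these files from user space: the show functions in
this patch format the value with "%llx\n", that is, as hexadecimal without
a "0x" prefix. A minimal parsing sketch follows. The helper name
parse_hex_ticks() is hypothetical and not part of this series or of any
existing tool.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the text of an idle_purr/idle_spurr style attribute: a hexadecimal
 * number without a "0x" prefix, followed by a newline.
 * Returns 0 on success and stores the value in *out, -1 on malformed input.
 */
static int parse_hex_ticks(const char *text, uint64_t *out)
{
	char *end;
	unsigned long long val;

	errno = 0;
	val = strtoull(text, &end, 16);
	if (errno != 0 || end == text)
		return -1;	/* overflow, or no hex digits at all */
	if (*end != '\n' && *end != '\0')
		return -1;	/* trailing garbage after the number */

	*out = (uint64_t)val;
	return 0;
}
```

A caller would read the attribute into a buffer first (for example with
fgets()) and pass that buffer to the helper.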

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
arch/powerpc/kernel/sysfs.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 80a676d..42ade55 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
}
static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);

+static ssize_t idle_purr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ unsigned int cpuid = cpu->dev.id;
+ struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
+ u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
+
+ return sprintf(buf, "%llx\n", idle_purr_cycles);
+}
+static DEVICE_ATTR_RO(idle_purr);
+
+DECLARE_PER_CPU(u64, idle_spurr_cycles);
+static ssize_t idle_spurr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpu *cpu = container_of(dev, struct cpu, dev);
+ unsigned int cpuid = cpu->dev.id;
+ u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
+
+ return sprintf(buf, "%llx\n", *idle_spurr_cycles_ptr);
+}
+static DEVICE_ATTR_RO(idle_spurr);
+
+static void create_idle_purr_spurr_sysfs_entry(struct device *cpudev)
+{
+ device_create_file(cpudev, &dev_attr_idle_purr);
+ device_create_file(cpudev, &dev_attr_idle_spurr);
+}
+
static int __init topology_init(void)
{
int cpu, r;
@@ -1067,6 +1097,8 @@ static int __init topology_init(void)
register_cpu(c, cpu);

device_create_file(&c->dev, &dev_attr_physical_id);
+ if (firmware_has_feature(FW_FEATURE_SPLPAR))
+ create_idle_purr_spurr_sysfs_entry(&c->dev);
}
}
r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
--
1.9.4

2019-11-27 12:07:16

by Gautham R Shenoy

Subject: [PATCH 3/3] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr

From: "Gautham R. Shenoy" <[email protected]>

Add documentation for the following sysfs interfaces:
/sys/devices/system/cpu/cpuX/purr
/sys/devices/system/cpu/cpuX/spurr
/sys/devices/system/cpu/cpuX/idle_purr
/sys/devices/system/cpu/cpuX/idle_spurr

Signed-off-by: Gautham R. Shenoy <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index fc20cde..ecd23fb 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -574,3 +574,42 @@ Description: Secure Virtual Machine
If 1, it means the system is using the Protected Execution
Facility in POWER9 and newer processors. i.e., it is a Secure
Virtual Machine.
+
+What: /sys/devices/system/cpu/cpuX/purr
+Date: Apr 2005
+Contact: Linux for PowerPC mailing list <[email protected]>
+Description: PURR ticks for this CPU since the system boot.
+
+ The Processor Utilization Resources Register (PURR) is
+ a 64-bit counter which provides an estimate of the
+ resources used by the CPU thread. The contents of this
+ register increase monotonically. This sysfs interface
+ exposes the number of PURR ticks for cpuX.
+
+What: /sys/devices/system/cpu/cpuX/spurr
+Date: Dec 2006
+Contact: Linux for PowerPC mailing list <[email protected]>
+Description: SPURR ticks for this CPU since the system boot.
+
+ The Scaled Processor Utilization Resources Register
+ (SPURR) is a 64-bit counter that provides a frequency
+ invariant estimate of the resources used by the CPU
+ thread. The contents of this register increase
+ monotonically. This sysfs interface exposes the number
+ of SPURR ticks for cpuX.
+
+What: /sys/devices/system/cpu/cpuX/idle_purr
+Date: Nov 2019
+Contact: Linux for PowerPC mailing list <[email protected]>
+Description: PURR ticks for cpuX when it was idle.
+
+ This sysfs interface exposes the number of PURR ticks
+ for cpuX when it was idle.
+
+What: /sys/devices/system/cpu/cpuX/spurr
+Date: Nov 2019
+Contact: Linux for PowerPC mailing list <[email protected]>
+Description: SPURR ticks for cpuX when it was idle.
+
+ This sysfs interface exposes the number of SPURR ticks
+ for cpuX when it was idle.
--
1.9.4

2019-12-03 13:39:08

by Kamalesh Babulal

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

On 11/27/19 5:31 PM, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> On Pseries LPARs, to calculate utilization, we need to know the
> [S]PURR ticks when the CPUs were busy or idle.
>
> The total PURR and SPURR ticks are already exposed via the per-cpu
> sysfs files /sys/devices/system/cpu/cpuX/purr and
> /sys/devices/system/cpu/cpuX/spurr.
>
> This patch adds support for exposing the idle PURR and SPURR ticks via
> /sys/devices/system/cpu/cpuX/idle_purr and
> /sys/devices/system/cpu/cpuX/idle_spurr.

The patch looks good to me, with a minor file mode nitpick mentioned below.

>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
> ---
> arch/powerpc/kernel/sysfs.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 80a676d..42ade55 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> }
> static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
>
> +static ssize_t idle_purr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cpu *cpu = container_of(dev, struct cpu, dev);
> + unsigned int cpuid = cpu->dev.id;
> + struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> + u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> +
> + return sprintf(buf, "%llx\n", idle_purr_cycles);
> +}
> +static DEVICE_ATTR_RO(idle_purr);

The per-cpu purr/spurr sysfs files are created with file mode 0400. Using
DEVICE_ATTR_RO for their idle_* variants will create sysfs files with file
mode 0444; you should probably use DEVICE_ATTR() with file mode 0400 to
have consistent permissions for both variants.

--
Kamalesh

2019-12-03 13:41:12

by Kamalesh Babulal

Subject: Re: [PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

On 11/27/19 5:31 PM, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> On PSeries LPARs, to compute the utilization, tools such as lparstat
> need to know the [S]PURR ticks when the CPUs were busy or idle.
>
> In the pseries cpuidle driver, we keep track of the idle PURR ticks in
> the VPA variable "wait_state_cycles". This patch extends the support
> to account for the idle SPURR ticks.

Thanks for working on it.

>
> Signed-off-by: Gautham R. Shenoy <[email protected]>

Reviewed-by: Kamalesh Babulal <[email protected]>

2019-12-03 21:04:12

by kernel test robot

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Hi Gautham,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.4 next-20191203]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Gautham-R-Shenoy/powerpc-pseries-Account-for-SPURR-ticks-on-idle-CPUs/20191127-234537
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 7.5.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.5.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

Note: the linux-review/Gautham-R-Shenoy/powerpc-pseries-Account-for-SPURR-ticks-on-idle-CPUs/20191127-234537 HEAD 54932d09dae77a0b15c47c7e51f0fb6e7d34900f builds fine.
It only hurts bisectability.

All errors (new ones prefixed by >>):

arch/powerpc/kernel/sysfs.c: In function 'idle_purr_show':
>> arch/powerpc/kernel/sysfs.c:1052:34: error: 'paca_ptrs' undeclared (first use in this function); did you mean 'hash_ptr'?
struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
^~~~~~~~~
hash_ptr
arch/powerpc/kernel/sysfs.c:1052:34: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/byteorder/big_endian.h:5:0,
from arch/powerpc/include/uapi/asm/byteorder.h:14,
from include/asm-generic/bitops/le.h:6,
from arch/powerpc/include/asm/bitops.h:243,
from include/linux/bitops.h:26,
from include/linux/kernel.h:12,
from include/linux/list.h:9,
from include/linux/kobject.h:19,
from include/linux/device.h:16,
from arch/powerpc/kernel/sysfs.c:2:
>> arch/powerpc/kernel/sysfs.c:1053:51: error: dereferencing pointer to incomplete type 'struct lppaca'
u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
^
include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__be64_to_cpu'
#define __be64_to_cpu(x) ((__force __u64)(__be64)(x))
^
arch/powerpc/kernel/sysfs.c:1053:25: note: in expansion of macro 'be64_to_cpu'
u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
^~~~~~~~~~~

vim +1052 arch/powerpc/kernel/sysfs.c

1046
1047 static ssize_t idle_purr_show(struct device *dev,
1048 struct device_attribute *attr, char *buf)
1049 {
1050 struct cpu *cpu = container_of(dev, struct cpu, dev);
1051 unsigned int cpuid = cpu->dev.id;
> 1052 struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> 1053 u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
1054
1055 return sprintf(buf, "%llx\n", idle_purr_cycles);
1056 }
1057 static DEVICE_ATTR_RO(idle_purr);
1058

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation


Attachments:
(No filename) (3.68 kB)
.config.gz (6.23 kB)

2019-12-04 12:45:50

by Gautham R Shenoy

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Hi Kamalesh,

On Tue, Dec 03, 2019 at 07:07:53PM +0530, Kamalesh Babulal wrote:
> On 11/27/19 5:31 PM, Gautham R. Shenoy wrote:
> > From: "Gautham R. Shenoy" <[email protected]>
> >
> > On Pseries LPARs, to calculate utilization, we need to know the
> > [S]PURR ticks when the CPUs were busy or idle.
> >
> > The total PURR and SPURR ticks are already exposed via the per-cpu
> > sysfs files /sys/devices/system/cpu/cpuX/purr and
> > /sys/devices/system/cpu/cpuX/spurr.
> >
> > This patch adds support for exposing the idle PURR and SPURR ticks via
> > /sys/devices/system/cpu/cpuX/idle_purr and
> > /sys/devices/system/cpu/cpuX/idle_spurr.
>
> The patch looks good to me, with a minor file mode nit pick mentioned below.
>
> >
> > Signed-off-by: Gautham R. Shenoy <[email protected]>
> > ---
> > arch/powerpc/kernel/sysfs.c | 32 ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> > index 80a676d..42ade55 100644
> > --- a/arch/powerpc/kernel/sysfs.c
> > +++ b/arch/powerpc/kernel/sysfs.c
> > @@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> > }
> > static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
> >
> > +static ssize_t idle_purr_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct cpu *cpu = container_of(dev, struct cpu, dev);
> > + unsigned int cpuid = cpu->dev.id;
> > + struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> > + u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> > +
> > + return sprintf(buf, "%llx\n", idle_purr_cycles);
> > +}
> > +static DEVICE_ATTR_RO(idle_purr);
>
> per cpu purr/spurr sysfs file is created with file mode 0400. Using
> DEVICE_ATTR_RO for their idle_* variants will create sysfs files with 0444 as
> their file mode, you should probably use DEVICE_ATTR() with file mode 0400 to
> have consist permission for both variants.

Thanks for catching this. I missed checking the permissions of purr
and spurr. Will send another version.


>
> --
> Kamalesh

2019-12-04 22:25:22

by Nathan Lynch

Subject: Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

"Gautham R. Shenoy" <[email protected]> writes:
> From: "Gautham R. Shenoy" <[email protected]>
>
> On PSeries LPARs, the data centers planners desire a more accurate
> view of system utilization per resource such as CPU to plan the system
> capacity requirements better. Such accuracy can be obtained by reading
> PURR/SPURR registers for CPU resource utilization.
>
> Tools such as lparstat which are used to compute the utilization need
> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
> counters are already exposed through sysfs. We already account for
> PURR ticks when we go to idle so that we can update the VPA area. This
> patchset extends support to account for SPURR ticks when idle, and
> expose both via per-cpu sysfs files.

Does anything really want to use PURR instead of SPURR? Seems like we
should expose only SPURR idle values if possible.

2019-12-04 22:25:22

by Nathan Lynch

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

"Gautham R. Shenoy" <[email protected]> writes:
> @@ -1067,6 +1097,8 @@ static int __init topology_init(void)
> register_cpu(c, cpu);
>
> device_create_file(&c->dev, &dev_attr_physical_id);
> + if (firmware_has_feature(FW_FEATURE_SPLPAR))
> + create_idle_purr_spurr_sysfs_entry(&c->dev);

Architecturally speaking PURR/SPURR aren't strongly linked to the PAPR
SPLPAR option, are they? I'm not sure it's right for these attributes to
be absent if the platform does not support shared processor mode.

2019-12-04 22:26:17

by Nathan Lynch

Subject: Re: [PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

"Gautham R. Shenoy" <[email protected]> writes:
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index a36fd05..708ec68 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -33,6 +33,8 @@
> unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
> EXPORT_SYMBOL(cpuidle_disable);
>
> +DEFINE_PER_CPU(u64, idle_spurr_cycles);
> +

Does idle_spurr_cycles need any special treatment for CPU
online/offline?

> static int __init powersave_off(char *arg)
> {
> ppc_md.power_save = NULL;
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 74c2479..45e2be4 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -30,11 +30,14 @@ struct cpuidle_driver pseries_idle_driver = {
> static struct cpuidle_state *cpuidle_state_table __read_mostly;
> static u64 snooze_timeout __read_mostly;
> static bool snooze_timeout_en __read_mostly;
> +DECLARE_PER_CPU(u64, idle_spurr_cycles);

This belongs in a header...


> -static inline void idle_loop_prolog(unsigned long *in_purr)
> +static inline void idle_loop_prolog(unsigned long *in_purr,
> + unsigned long *in_spurr)
> {
> ppc64_runlatch_off();
> *in_purr = mfspr(SPRN_PURR);
> + *in_spurr = mfspr(SPRN_SPURR);
> /*
> * Indicate to the HV that we are idle. Now would be
> * a good time to find other work to dispatch.
> @@ -42,13 +45,16 @@ static inline void idle_loop_prolog(unsigned long *in_purr)
> get_lppaca()->idle = 1;
> }
>
> -static inline void idle_loop_epilog(unsigned long in_purr)
> +static inline void idle_loop_epilog(unsigned long in_purr,
> + unsigned long in_spurr)
> {
> u64 wait_cycles;
> + u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
>
> wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> wait_cycles += mfspr(SPRN_PURR) - in_purr;
> get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> + *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;

... and the sampling and increment logic probably should be further
encapsulated in accessor functions that can be used in both the cpuidle
driver and the default/generic idle implementation. Or is there some
reason this is specific to the pseries cpuidle driver?
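
One possible shape for such accessors, written as a user-space model so it
can be compiled and run on its own, is sketched below. Everything here is
hypothetical: struct idle_acct and the two functions are illustrative names,
and the counter values are passed in as arguments, whereas kernel code would
read them with mfspr(SPRN_PURR) and mfspr(SPRN_SPURR) as the patch does.

```c
#include <stdint.h>

/* Hypothetical per-CPU idle accounting state (user-space model). */
struct idle_acct {
	uint64_t in_purr;	/* PURR snapshot taken on idle entry */
	uint64_t in_spurr;	/* SPURR snapshot taken on idle entry */
	uint64_t idle_purr;	/* accumulated idle PURR ticks */
	uint64_t idle_spurr;	/* accumulated idle SPURR ticks */
};

/* Record the counter values on idle entry. */
static void idle_acct_enter(struct idle_acct *a, uint64_t purr, uint64_t spurr)
{
	a->in_purr = purr;
	a->in_spurr = spurr;
}

/* Add the ticks elapsed since idle entry to the running totals. */
static void idle_acct_exit(struct idle_acct *a, uint64_t purr, uint64_t spurr)
{
	a->idle_purr += purr - a->in_purr;
	a->idle_spurr += spurr - a->in_spurr;
}
```

Keeping the snapshots next to the totals means each idle loop would only
need a matching enter/exit pair, instead of threading in_purr and in_spurr
through every caller.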

2019-12-04 22:27:21

by Nathan Lynch

Subject: Re: [PATCH 3/3] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr

"Gautham R. Shenoy" <[email protected]> writes:
> +
> +What: /sys/devices/system/cpu/cpuX/idle_purr
> +Date: Nov 2019
> +Contact: Linux for PowerPC mailing list <[email protected]>
> +Description: PURR ticks for cpuX when it was idle.
> +
> + This sysfs interface exposes the number of PURR ticks
> + for cpuX when it was idle.
> +
> +What: /sys/devices/system/cpu/cpuX/spurr
/sys/devices/system/cpu/cpuX/idle_spurr


> +Date: Nov 2019
> +Contact: Linux for PowerPC mailing list <[email protected]>
> +Description: SPURR ticks for cpuX when it was idle.
> +
> + This sysfs interface exposes the number of SPURR ticks
> + for cpuX when it was idle.

2019-12-05 15:07:17

by Kamalesh Babulal

Subject: Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

On 12/5/19 3:54 AM, Nathan Lynch wrote:
> "Gautham R. Shenoy" <[email protected]> writes:
>> From: "Gautham R. Shenoy" <[email protected]>
>>
>> On PSeries LPARs, the data centers planners desire a more accurate
>> view of system utilization per resource such as CPU to plan the system
>> capacity requirements better. Such accuracy can be obtained by reading
>> PURR/SPURR registers for CPU resource utilization.
>>
>> Tools such as lparstat which are used to compute the utilization need
>> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
>> counters are already exposed through sysfs. We already account for
>> PURR ticks when we go to idle so that we can update the VPA area. This
>> patchset extends support to account for SPURR ticks when idle, and
>> expose both via per-cpu sysfs files.
>
> Does anything really want to use PURR instead of SPURR? Seems like we
> should expose only SPURR idle values if possible.
>

lparstat is one of the consumers of the PURR idle metric
(https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4).
I agree that system utilization metrics based on SPURR accounting are
more accurate than those based on PURR, which isn't proportional to CPU
frequency. PURR has traditionally been used to understand system
utilization, whereas SPURR is used to understand how much capacity is
left or exceeded in the system under the current power saving mode.

--
Kamalesh

2019-12-05 16:19:44

by Nathan Lynch

Subject: Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

Hi Kamalesh,

Kamalesh Babulal <[email protected]> writes:
> On 12/5/19 3:54 AM, Nathan Lynch wrote:
>> "Gautham R. Shenoy" <[email protected]> writes:
>>>
>>> Tools such as lparstat which are used to compute the utilization need
>>> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
>>> counters are already exposed through sysfs. We already account for
>>> PURR ticks when we go to idle so that we can update the VPA area. This
>>> patchset extends support to account for SPURR ticks when idle, and
>>> expose both via per-cpu sysfs files.
>>
>> Does anything really want to use PURR instead of SPURR? Seems like we
>> should expose only SPURR idle values if possible.
>>
>
> lparstat is one of the consumers of PURR idle metric
> (https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4).
> Agree, on the argument that system utilization metrics based on SPURR
> accounting is accurate in comparison to PURR, which isn't proportional to
> CPU frequency. PURR has been traditionally used to understand the system
> utilization, whereas SPURR is used for understanding how much capacity is
> left/exceeding in the system based on the current power saving mode.

I'll phrase my question differently: does SPURR complement or supersede
PURR? You seem to be saying they serve different purposes. If PURR is
actually useful rather than vestigial, then I have no objection to
exposing idle_purr.

2019-12-05 16:56:08

by Naveen N. Rao

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> On Pseries LPARs, to calculate utilization, we need to know the
> [S]PURR ticks when the CPUs were busy or idle.
>
> The total PURR and SPURR ticks are already exposed via the per-cpu
> sysfs files /sys/devices/system/cpu/cpuX/purr and
> /sys/devices/system/cpu/cpuX/spurr.
>
> This patch adds support for exposing the idle PURR and SPURR ticks via
> /sys/devices/system/cpu/cpuX/idle_purr and
> /sys/devices/system/cpu/cpuX/idle_spurr.
>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
> ---
> arch/powerpc/kernel/sysfs.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 80a676d..42ade55 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> }
> static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
>
> +static ssize_t idle_purr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cpu *cpu = container_of(dev, struct cpu, dev);
> + unsigned int cpuid = cpu->dev.id;
> + struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> + u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> +
> + return sprintf(buf, "%llx\n", idle_purr_cycles);
> +}
> +static DEVICE_ATTR_RO(idle_purr);
> +
> +DECLARE_PER_CPU(u64, idle_spurr_cycles);
> +static ssize_t idle_spurr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cpu *cpu = container_of(dev, struct cpu, dev);
> + unsigned int cpuid = cpu->dev.id;
> + u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);

Is it possible for a user to read stale values if a particular cpu is in
an extended cede? Is it possible to use smp_call_function_single() to
force the cpu out of idle?

- Naveen

2019-12-05 17:26:00

by Naveen N. Rao

Subject: Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

Hi Nathan,

Nathan Lynch wrote:
> Hi Kamalesh,
>
> Kamalesh Babulal <[email protected]> writes:
>> On 12/5/19 3:54 AM, Nathan Lynch wrote:
>>> "Gautham R. Shenoy" <[email protected]> writes:
>>>>
>>>> Tools such as lparstat which are used to compute the utilization need
>>>> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
>>>> counters are already exposed through sysfs. We already account for
>>>> PURR ticks when we go to idle so that we can update the VPA area. This
>>>> patchset extends support to account for SPURR ticks when idle, and
>>>> expose both via per-cpu sysfs files.
>>>
>>> Does anything really want to use PURR instead of SPURR? Seems like we
>>> should expose only SPURR idle values if possible.
>>>
>>
>> lparstat is one of the consumers of PURR idle metric
>> (https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4).
>> Agree, on the argument that system utilization metrics based on SPURR
>> accounting is accurate in comparison to PURR, which isn't proportional to
>> CPU frequency. PURR has been traditionally used to understand the system
>> utilization, whereas SPURR is used for understanding how much capacity is
>> left/exceeding in the system based on the current power saving mode.
>
> I'll phrase my question differently: does SPURR complement or supercede
> PURR? You seem to be saying they serve different purposes. If PURR is
> actually useful rather then vestigial then I have no objection to
> exposing idle_purr.

SPURR complements PURR, so we need both. SPURR/PURR ratio helps provide
an indication of the available headroom in terms of core resources, at
maximum frequency.


- Naveen

2019-12-06 09:16:15

by Naveen N. Rao

Subject: Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

Naveen N. Rao wrote:
> Hi Nathan,
>
> Nathan Lynch wrote:
>> Hi Kamalesh,
>>
>> Kamalesh Babulal <[email protected]> writes:
>>> On 12/5/19 3:54 AM, Nathan Lynch wrote:
>>>> "Gautham R. Shenoy" <[email protected]> writes:
>>>>>
>>>>> Tools such as lparstat which are used to compute the utilization need
>>>>> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
>>>>> counters are already exposed through sysfs. We already account for
>>>>> PURR ticks when we go to idle so that we can update the VPA area. This
>>>>> patchset extends support to account for SPURR ticks when idle, and
>>>>> expose both via per-cpu sysfs files.
>>>>
>>>> Does anything really want to use PURR instead of SPURR? Seems like we
>>>> should expose only SPURR idle values if possible.
>>>>
>>>
>>> lparstat is one of the consumers of PURR idle metric
>>> (https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4).
>>> Agree, on the argument that system utilization metrics based on SPURR
>>> accounting is accurate in comparison to PURR, which isn't proportional to
>>> CPU frequency. PURR has been traditionally used to understand the system
>>> utilization, whereas SPURR is used for understanding how much capacity is
>>> left/exceeding in the system based on the current power saving mode.
>>
>> I'll phrase my question differently: does SPURR complement or supersede
>> PURR? You seem to be saying they serve different purposes. If PURR is
>> actually useful rather than vestigial then I have no objection to
>> exposing idle_purr.
>
> SPURR complements PURR, so we need both. SPURR/PURR ratio helps provide
> an indication of the available headroom in terms of core resources, at
> maximum frequency.

Re-reading this this morning, I realize that this isn't entirely
accurate. SPURR alone is sufficient to understand core resource
utilization.

Kamalesh is using PURR to display non-normalized utilization values
(under 'actual' column), as reported by lparstat on AIX. I am not
entirely sure if it is ok to derive these based on the SPURR busy/idle
ratio.

- Naveen

2020-02-03 06:42:38

by Gautham R Shenoy

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Hello Nathan,


On Wed, Dec 04, 2019 at 04:24:31PM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy" <[email protected]> writes:
> > @@ -1067,6 +1097,8 @@ static int __init topology_init(void)
> > register_cpu(c, cpu);
> >
> > device_create_file(&c->dev, &dev_attr_physical_id);
> > + if (firmware_has_feature(FW_FEATURE_SPLPAR))
> > + create_idle_purr_spurr_sysfs_entry(&c->dev);
>
> Architecturally speaking PURR/SPURR aren't strongly linked to the PAPR
> SPLPAR option, are they? I'm not sure it's right for these attributes to
> be absent if the platform does not support shared processor mode.

Doesn't FW_FEATURE_SPLPAR refer to all Pseries guests? It is perhaps
incorrectly named, but from its other uses in the kernel, it seems to
indicate that we are running as a guest rather than on a bare-metal
system.

--
Thanks and Regards
gautham.

2020-02-03 06:44:16

by Gautham R Shenoy

Subject: Re: [PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

Hello Nathan,

On Wed, Dec 04, 2019 at 04:24:52PM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy" <[email protected]> writes:
> > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> > index a36fd05..708ec68 100644
> > --- a/arch/powerpc/kernel/idle.c
> > +++ b/arch/powerpc/kernel/idle.c
> > @@ -33,6 +33,8 @@
> > unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
> > EXPORT_SYMBOL(cpuidle_disable);
> >
> > +DEFINE_PER_CPU(u64, idle_spurr_cycles);
> > +
>
> Does idle_spurr_cycles need any special treatment for CPU
> online/offline?

If offline uses extended cede, then we need to take a snapshot of the
idle_spurr_cycles before going offline and add the delta once we are
back online. However, since the plan is to deprecate the use of
extended cede for CPU-Offline and use only rtas-stop-self, we don't
need any special handling there.
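
For illustration only, here is a minimal userland model of the snapshot-and-delta idea described above, as it might look if extended cede were kept for CPU offline. It reads "snapshot" as sampling SPURR when the CPU goes offline, which is one interpretation of the paragraph. All names (`struct cpu_acct`, `offline_begin()`, `offline_end()`) are hypothetical, and the caller-supplied `spurr_now` stands in for `mfspr(SPRN_SPURR)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping; not a kernel structure. */
struct cpu_acct {
	uint64_t idle_spurr_cycles;	/* accumulated idle SPURR ticks */
	uint64_t offline_snapshot;	/* SPURR sampled when going offline */
	bool offline;
};

/* Record the SPURR value at the point the CPU goes offline. */
static void offline_begin(struct cpu_acct *c, uint64_t spurr_now)
{
	assert(!c->offline);
	c->offline_snapshot = spurr_now;
	c->offline = true;
}

/* Once back online, credit the ticks that elapsed while offline as idle. */
static void offline_end(struct cpu_acct *c, uint64_t spurr_now)
{
	assert(c->offline);
	c->idle_spurr_cycles += spurr_now - c->offline_snapshot;
	c->offline = false;
}
```

With rtas-stop-self as the only offline path, as planned above, none of this bookkeeping would be needed.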


>
> > static int __init powersave_off(char *arg)
> > {
> > ppc_md.power_save = NULL;
> > diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> > index 74c2479..45e2be4 100644
> > --- a/drivers/cpuidle/cpuidle-pseries.c
> > +++ b/drivers/cpuidle/cpuidle-pseries.c
> > @@ -30,11 +30,14 @@ struct cpuidle_driver pseries_idle_driver = {
> > static struct cpuidle_state *cpuidle_state_table __read_mostly;
> > static u64 snooze_timeout __read_mostly;
> > static bool snooze_timeout_en __read_mostly;
> > +DECLARE_PER_CPU(u64, idle_spurr_cycles);
>
> This belongs in a header...

Will move it to the header file.

>
>
> > -static inline void idle_loop_prolog(unsigned long *in_purr)
> > +static inline void idle_loop_prolog(unsigned long *in_purr,
> > + unsigned long *in_spurr)
> > {
> > ppc64_runlatch_off();
> > *in_purr = mfspr(SPRN_PURR);
> > + *in_spurr = mfspr(SPRN_SPURR);
> > /*
> > * Indicate to the HV that we are idle. Now would be
> > * a good time to find other work to dispatch.
> > @@ -42,13 +45,16 @@ static inline void idle_loop_prolog(unsigned long *in_purr)
> > get_lppaca()->idle = 1;
> > }
> >
> > -static inline void idle_loop_epilog(unsigned long in_purr)
> > +static inline void idle_loop_epilog(unsigned long in_purr,
> > + unsigned long in_spurr)
> > {
> > u64 wait_cycles;
> > + u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
> >
> > wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> > wait_cycles += mfspr(SPRN_PURR) - in_purr;
> > get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> > + *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;
>
> ... and the sampling and increment logic probably should be further
> encapsulated in accessor functions that can be used in both the cpuidle
> driver and the default/generic idle implementation. Or is there some
> reason this is specific to the pseries cpuidle driver?

I am not sure if we use SPURR and PURR for performing accounting on
bare-metal systems. IIUC, the patches proposed by Kamalesh are only meant
to use idle_[s]purr and [s]purr on PowerVM LPARs. This is why I coded the
sampling/increment logic in the pseries cpuidle driver. But you are
right, in the absence of cpuidle, when we use the default idle
implementation, we will still need to note the value of
idle_purr/spurr.
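
A rough userland sketch of the accessor encapsulation suggested in the review comment, so that the same enter/exit pair could be called from both the cpuidle driver and the default idle path. This is a model, not kernel code: `struct fake_sprs` is a hypothetical stand-in for `mfspr(SPRN_PURR)` and `mfspr(SPRN_SPURR)`, `struct idle_acct` and both function names are invented here, and the model ignores that the patch keeps idle PURR in the big-endian lppaca field `wait_state_cycles`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PURR and SPURR register values. */
struct fake_sprs {
	uint64_t purr;
	uint64_t spurr;
};

/* Hypothetical per-CPU idle accounting state. */
struct idle_acct {
	uint64_t idle_purr_cycles;
	uint64_t idle_spurr_cycles;
	uint64_t in_purr;	/* PURR sampled at idle entry */
	uint64_t in_spurr;	/* SPURR sampled at idle entry */
	bool in_idle;
};

/* Sample both registers on idle entry. */
static void idle_acct_enter(struct idle_acct *a, const struct fake_sprs *r)
{
	assert(!a->in_idle);
	a->in_purr = r->purr;
	a->in_spurr = r->spurr;
	a->in_idle = true;
}

/* On idle exit, accumulate the ticks that elapsed while idle. */
static void idle_acct_exit(struct idle_acct *a, const struct fake_sprs *r)
{
	assert(a->in_idle);
	a->idle_purr_cycles += r->purr - a->in_purr;
	a->idle_spurr_cycles += r->spurr - a->in_spurr;
	a->in_idle = false;
}
```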

--
Thanks and Regards
gautham.


2020-02-03 06:46:44

by Gautham R Shenoy

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Hi Naveen,

On Thu, Dec 05, 2019 at 10:23:58PM +0530, Naveen N. Rao wrote:
> >diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> >index 80a676d..42ade55 100644
> >--- a/arch/powerpc/kernel/sysfs.c
> >+++ b/arch/powerpc/kernel/sysfs.c
> >@@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> > }
> > static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
> >
> >+static ssize_t idle_purr_show(struct device *dev,
> >+ struct device_attribute *attr, char *buf)
> >+{
> >+ struct cpu *cpu = container_of(dev, struct cpu, dev);
> >+ unsigned int cpuid = cpu->dev.id;
> >+ struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> >+ u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> >+
> >+ return sprintf(buf, "%llx\n", idle_purr_cycles);
> >+}
> >+static DEVICE_ATTR_RO(idle_purr);
> >+
> >+DECLARE_PER_CPU(u64, idle_spurr_cycles);
> >+static ssize_t idle_spurr_show(struct device *dev,
> >+ struct device_attribute *attr, char *buf)
> >+{
> >+ struct cpu *cpu = container_of(dev, struct cpu, dev);
> >+ unsigned int cpuid = cpu->dev.id;
> >+ u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
>
> Is it possible for a user to read stale values if a particular cpu is in an
> extended cede? Is it possible to use smp_call_function_single() to force the
> cpu out of idle?

Yes, if the CPU whose idle_spurr cycles are being read is still idle,
then we will miss reporting the delta SPURR cycles for this last
idle duration. We can use smp_call_function_single(), though
that will introduce IPI noise. How often will idle_[s]purr be read?

>
> - Naveen
>

--
Thanks and Regards
gautham.

2020-02-04 07:54:11

by Naveen N. Rao

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Gautham R Shenoy wrote:
> Hi Naveen,
>
> On Thu, Dec 05, 2019 at 10:23:58PM +0530, Naveen N. Rao wrote:
>> >diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
>> >index 80a676d..42ade55 100644
>> >--- a/arch/powerpc/kernel/sysfs.c
>> >+++ b/arch/powerpc/kernel/sysfs.c
>> >@@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
>> > }
>> > static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
>> >
>> >+static ssize_t idle_purr_show(struct device *dev,
>> >+ struct device_attribute *attr, char *buf)
>> >+{
>> >+ struct cpu *cpu = container_of(dev, struct cpu, dev);
>> >+ unsigned int cpuid = cpu->dev.id;
>> >+ struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
>> >+ u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
>> >+
>> >+ return sprintf(buf, "%llx\n", idle_purr_cycles);
>> >+}
>> >+static DEVICE_ATTR_RO(idle_purr);
>> >+
>> >+DECLARE_PER_CPU(u64, idle_spurr_cycles);
>> >+static ssize_t idle_spurr_show(struct device *dev,
>> >+ struct device_attribute *attr, char *buf)
>> >+{
>> >+ struct cpu *cpu = container_of(dev, struct cpu, dev);
>> >+ unsigned int cpuid = cpu->dev.id;
>> >+ u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
>>
>> Is it possible for a user to read stale values if a particular cpu is in an
>> extended cede? Is it possible to use smp_call_function_single() to force the
>> cpu out of idle?
>
> Yes, if the CPU whose idle_spurr cycle is being read is still in idle,
> then we will miss reporting the delta spurr cycles for this last
> idle-duration. Yes, we can use an smp_call_function_single(), though
> that will introduce IPI noise. How often will idle_[s]purr be read ?

Since it is possible for a cpu to go into extended cede for multiple
seconds, during which time utilization could be mis-reported, I
think it is better to ensure that the sysfs interface for idle_[s]purr
reports the proper values through use of an IPI.
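
To make the stale-read concern concrete, here is a small userland model (not kernel code) of a CPU whose accumulator is only updated on idle exit. The names `struct cpu_model`, `read_stale()` and `read_after_wakeup()` are hypothetical; `read_after_wakeup()` only models the effect that forcing the CPU out of idle (for example via `smp_call_function_single()`) would have on the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CPU's idle SPURR accounting. */
struct cpu_model {
	uint64_t idle_spurr_cycles;	/* updated only on idle exit */
	uint64_t in_spurr;		/* SPURR sampled at idle entry */
	bool in_idle;
};

/* A plain read of the accumulator: misses the idle period in progress. */
static uint64_t read_stale(const struct cpu_model *c)
{
	return c->idle_spurr_cycles;
}

/*
 * A read taken after the CPU has been forced through idle exit:
 * the in-flight delta is folded into the accumulator first.
 */
static uint64_t read_after_wakeup(struct cpu_model *c, uint64_t spurr_now)
{
	if (c->in_idle) {
		assert(spurr_now >= c->in_spurr);
		c->idle_spurr_cycles += spurr_now - c->in_spurr;
		/* Model the CPU going straight back to idle afterwards. */
		c->in_spurr = spurr_now;
	}
	return c->idle_spurr_cycles;
}
```

In this model, a CPU that entered idle at SPURR 1000 with 500 ticks already accumulated still reports 500 from `read_stale()` when SPURR has reached 4000, while `read_after_wakeup()` reports 3500.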

With respect to lparstat, the read interval is user-specified and just
gets passed on to sleep().

- Naveen

2020-02-04 09:15:08

by Kamalesh Babulal

Subject: Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

On 12/6/19 2:44 PM, Naveen N. Rao wrote:
> Naveen N. Rao wrote:
>> Hi Nathan,
>>
>> Nathan Lynch wrote:
>>> Hi Kamalesh,
>>>
>>> Kamalesh Babulal <[email protected]> writes:
>>>> On 12/5/19 3:54 AM, Nathan Lynch wrote:
>>>>> "Gautham R. Shenoy" <[email protected]> writes:
>>>>>>
>>>>>> Tools such as lparstat which are used to compute the utilization need
>>>>>> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
>>>>>> counters are already exposed through sysfs.  We already account for
>>>>>> PURR ticks when we go to idle so that we can update the VPA area. This
>>>>>> patchset extends support to account for SPURR ticks when idle, and
>>>>>> expose both via per-cpu sysfs files.
>>>>>
>>>>> Does anything really want to use PURR instead of SPURR? Seems like we
>>>>> should expose only SPURR idle values if possible.
>>>>>
>>>>
>>>> lparstat is one of the consumers of PURR idle metric
>>>> (https://groups.google.com/forum/#!topic/powerpc-utils-devel/fYRo69xO9r4).
>>>> Agree, on the argument that system utilization metrics based on SPURR
>>>> accounting is accurate in comparison to PURR, which isn't proportional to
>>>> CPU frequency.  PURR has been traditionally used to understand the system
>>>> utilization, whereas SPURR is used for understanding how much capacity is
>>>> left/exceeding in the system based on the current power saving mode.
>>>
>>> I'll phrase my question differently: does SPURR complement or supersede
>>> PURR? You seem to be saying they serve different purposes. If PURR is
>>> actually useful rather than vestigial then I have no objection to
>>> exposing idle_purr.
>>
>> SPURR complements PURR, so we need both. SPURR/PURR ratio helps provide
>> an indication of the available headroom in terms of core resources, at
>> maximum frequency.
>
> Re-reading this this morning, I realize that this isn't entirely
> accurate. SPURR alone is sufficient to understand core resource
> utilization.
>
> Kamalesh is using PURR to display non-normalized utilization values
> (under 'actual' column), as reported by lparstat on AIX. I am not
> entirely sure if it is ok to derive these based on the SPURR busy/idle
> ratio.

idle_purr and idle_spurr complement each other, and we need to expose both of them.
Doing so will improve the accounting accuracy of tools that currently consume
system-wide PURR and/or SPURR numbers to report system usage. In my experience,
deriving one from the other makes it hard for tools or custom scripts to give an
accurate system view. One tool I am aware of is lparstat, which uses PURR-based metrics.
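
As a hedged illustration of how a tool might combine the total and idle counters, the sketch below computes a busy percentage from two samples of a CPU's `purr` and `idle_purr` values. The formula (busy = total minus idle over the interval) is an assumption made for this example, not necessarily what lparstat implements, and `struct purr_sample` and `purr_busy_pct()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One reading of a CPU's purr and idle_purr files (hypothetical struct). */
struct purr_sample {
	uint64_t purr;
	uint64_t idle_purr;
};

/* Percentage of PURR ticks between prev and cur that were not idle. */
static double purr_busy_pct(const struct purr_sample *prev,
			    const struct purr_sample *cur)
{
	uint64_t total, idle;

	assert(prev && cur);
	total = cur->purr - prev->purr;
	idle = cur->idle_purr - prev->idle_purr;

	if (total == 0)
		return 0.0;
	/* The two files are read at slightly different times; clamp. */
	if (idle > total)
		idle = total;
	return 100.0 * (double)(total - idle) / (double)total;
}
```

The same calculation applies unchanged to `spurr` and `idle_spurr` samples.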

--
Kamalesh

2020-02-05 04:21:18

by Gautham R Shenoy

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Hi Naveen,

On Tue, Feb 04, 2020 at 01:22:19PM +0530, Naveen N. Rao wrote:
> Gautham R Shenoy wrote:
> >Hi Naveen,
> >
> >On Thu, Dec 05, 2019 at 10:23:58PM +0530, Naveen N. Rao wrote:
> >>>diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> >>>index 80a676d..42ade55 100644
> >>>--- a/arch/powerpc/kernel/sysfs.c
> >>>+++ b/arch/powerpc/kernel/sysfs.c
> >>>@@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> >>> }
> >>> static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
> >>>
> >>>+static ssize_t idle_purr_show(struct device *dev,
> >>>+ struct device_attribute *attr, char *buf)
> >>>+{
> >>>+ struct cpu *cpu = container_of(dev, struct cpu, dev);
> >>>+ unsigned int cpuid = cpu->dev.id;
> >>>+ struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> >>>+ u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> >>>+
> >>>+ return sprintf(buf, "%llx\n", idle_purr_cycles);
> >>>+}
> >>>+static DEVICE_ATTR_RO(idle_purr);
> >>>+
> >>>+DECLARE_PER_CPU(u64, idle_spurr_cycles);
> >>>+static ssize_t idle_spurr_show(struct device *dev,
> >>>+ struct device_attribute *attr, char *buf)
> >>>+{
> >>>+ struct cpu *cpu = container_of(dev, struct cpu, dev);
> >>>+ unsigned int cpuid = cpu->dev.id;
> >>>+ u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
> >>
> >>Is it possible for a user to read stale values if a particular cpu is in an
> >>extended cede? Is it possible to use smp_call_function_single() to force the
> >>cpu out of idle?
> >
> >Yes, if the CPU whose idle_spurr cycle is being read is still in idle,
> >then we will miss reporting the delta spurr cycles for this last
> >idle-duration. Yes, we can use an smp_call_function_single(), though
> >that will introduce IPI noise. How often will idle_[s]purr be read ?
>
> Since it is possible for a cpu to go into extended cede for multiple seconds,
> during which time utilization could be mis-reported, I think it is
> better to ensure that the sysfs interface for idle_[s]purr reports the proper
> values through use of an IPI.
>

Fair enough.


> With respect to lparstat, the read interval is user-specified and just gets
> passed on to sleep().

OK. So I guess you are currently sending an smp_call_function every
time you read PURR and SPURR. That number will now double when we
also read idle_purr and idle_spurr.


>
> - Naveen
>

--
Thanks and Regards
gautham.

2020-02-05 07:00:05

by Naveen N. Rao

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Gautham R Shenoy wrote:
>
>> With respect to lparstat, the read interval is user-specified and just gets
>> passed on to sleep().
>
> Ok. So I guess currently you will be sending smp_call_function every
> time you read a PURR and SPURR. That number will now increase by 2
> times when we read idle_purr and idle_spurr.

Yes, not really efficient. I just wanted to point out that we can't have
stale data being returned if we choose to add another sysfs file.

We should be able to use any other interface too, if you have a
different interface in mind.


- Naveen

2020-02-05 07:09:53

by Christophe Leroy

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU



On 27/11/2019 at 13:01, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <[email protected]>
>
> On Pseries LPARs, to calculate utilization, we need to know the
> [S]PURR ticks when the CPUs were busy or idle.
>
> The total PURR and SPURR ticks are already exposed via the per-cpu
> sysfs files /sys/devices/system/cpu/cpuX/purr and
> /sys/devices/system/cpu/cpuX/spurr.
>
> This patch adds support for exposing the idle PURR and SPURR ticks via
> /sys/devices/system/cpu/cpuX/idle_purr and
> /sys/devices/system/cpu/cpuX/idle_spurr.

Might be a naive question, but I see in arch/powerpc/kernel/time.c that
PURR/SPURR are already taken into account by the kernel to calculate
utilisation when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.

As far as I understand, you want to expose this to userland to
redo the calculation there. What is wrong with the values reported by
the kernel?

Christophe

>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
> ---
> arch/powerpc/kernel/sysfs.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 80a676d..42ade55 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> }
> static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
>
> +static ssize_t idle_purr_show(struct device *dev,
> +			      struct device_attribute *attr, char *buf)
> +{
> +	struct cpu *cpu = container_of(dev, struct cpu, dev);
> +	unsigned int cpuid = cpu->dev.id;
> +	struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> +	u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> +
> +	return sprintf(buf, "%llx\n", idle_purr_cycles);
> +}
> +static DEVICE_ATTR_RO(idle_purr);
> +
> +DECLARE_PER_CPU(u64, idle_spurr_cycles);
> +static ssize_t idle_spurr_show(struct device *dev,
> +			       struct device_attribute *attr, char *buf)
> +{
> +	struct cpu *cpu = container_of(dev, struct cpu, dev);
> +	unsigned int cpuid = cpu->dev.id;
> +	u64 *idle_spurr_cycles_ptr = per_cpu_ptr(&idle_spurr_cycles, cpuid);
> +
> +	return sprintf(buf, "%llx\n", *idle_spurr_cycles_ptr);
> +}
> +static DEVICE_ATTR_RO(idle_spurr);
> +
> +static void create_idle_purr_spurr_sysfs_entry(struct device *cpudev)
> +{
> +	device_create_file(cpudev, &dev_attr_idle_purr);
> +	device_create_file(cpudev, &dev_attr_idle_spurr);
> +}
> +
> static int __init topology_init(void)
> {
> int cpu, r;
> @@ -1067,6 +1097,8 @@ static int __init topology_init(void)
> register_cpu(c, cpu);
>
> device_create_file(&c->dev, &dev_attr_physical_id);
> + if (firmware_has_feature(FW_FEATURE_SPLPAR))
> + create_idle_purr_spurr_sysfs_entry(&c->dev);
> }
> }
> r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
>
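
Since both show functions in the quoted patch print with `"%llx\n"`, a userspace consumer has to parse bare hexadecimal with no `0x` prefix. The helper below is a minimal sketch of that parsing step; `parse_tick_count()` is a hypothetical name, and actually reading the text from `/sys/devices/system/cpu/cpuX/idle_purr` or `idle_spurr` is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the text returned by an idle_purr/idle_spurr read: hexadecimal
 * digits followed by a newline. Returns 0 on success, -1 on bad input.
 */
static int parse_tick_count(const char *text, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(text && out);
	errno = 0;
	v = strtoull(text, &end, 16);
	if (errno != 0 || end == text)
		return -1;	/* overflow, or no hex digits at all */
	if (*end != '\n' && *end != '\0')
		return -1;	/* trailing garbage after the number */
	*out = (uint64_t)v;
	return 0;
}
```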

2020-02-05 08:09:23

by Naveen N. Rao

Subject: Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

Christophe Leroy wrote:
>
>
> On 27/11/2019 at 13:01, Gautham R. Shenoy wrote:
>> From: "Gautham R. Shenoy" <[email protected]>
>>
>> On Pseries LPARs, to calculate utilization, we need to know the
>> [S]PURR ticks when the CPUs were busy or idle.
>>
>> The total PURR and SPURR ticks are already exposed via the per-cpu
>> sysfs files /sys/devices/system/cpu/cpuX/purr and
>> /sys/devices/system/cpu/cpuX/spurr.
>>
>> This patch adds support for exposing the idle PURR and SPURR ticks via
>> /sys/devices/system/cpu/cpuX/idle_purr and
>> /sys/devices/system/cpu/cpuX/idle_spurr.
>
> Might be a naive question, but I see in arch/powerpc/kernel/time.c that
> PURR/SPURR are already taken into account by the kernel to calculate
> utilisation when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.
>
> As far as I understand, you are wanting to expose this to userland to
> redo the calculation there. What is wrong with the values reported by
> the kernel ?

As you point out, that is only done with
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE, which isn't available with NO_HZ_FULL,
and NO_HZ_FULL happens to be the distro default nowadays.

- Naveen