2021-01-15 09:13:37

by Zhang Rui

Subject: [PATCH 1/3] perf/x86/rapl: Add msr mask support

In some cases, when probing a perf MSR, only certain bits of the MSR are
meaningful rather than the whole register, so only those bits should be
checked.

For example, in the RAPL ENERGY_STATUS MSRs, only the lower 32 bits
represent the energy counter; the upper 32 bits are reserved.

Introduce a new mask field in struct perf_msr to allow probing only
certain bits of an MSR.

This change is transparent to the current perf_msr_probe() users: a mask
of 0 is treated as U64_MAX, so existing entries are still checked against
the full register.

Signed-off-by: Zhang Rui <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/events/probe.c | 5 ++++-
arch/x86/events/probe.h | 7 ++++---
2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/probe.c b/arch/x86/events/probe.c
index 136a1e847254..a0a19c404cb5 100644
--- a/arch/x86/events/probe.c
+++ b/arch/x86/events/probe.c
@@ -28,6 +28,7 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
 	for (bit = 0; bit < cnt; bit++) {
 		if (!msr[bit].no_check) {
 			struct attribute_group *grp = msr[bit].grp;
+			u64 mask;

 			/* skip entry with no group */
 			if (!grp)
@@ -44,8 +45,10 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
 			/* Virt sucks; you cannot tell if a R/O MSR is present :/ */
 			if (rdmsrl_safe(msr[bit].msr, &val))
 				continue;
+
+			mask = msr[bit].mask ? msr[bit].mask : U64_MAX;
 			/* Disable zero counters if requested. */
-			if (!zero && !val)
+			if (!zero && !(val & mask))
 				continue;

 			grp->is_visible = NULL;
diff --git a/arch/x86/events/probe.h b/arch/x86/events/probe.h
index 4c8e0afc5fb5..261b9bda24e3 100644
--- a/arch/x86/events/probe.h
+++ b/arch/x86/events/probe.h
@@ -4,10 +4,11 @@
 #include <linux/sysfs.h>

 struct perf_msr {
-	u64 msr;
-	struct attribute_group *grp;
+	u64 msr;
+	struct attribute_group *grp;
 	bool (*test)(int idx, void *data);
-	bool no_check;
+	bool no_check;
+	u64 mask;
 };

 unsigned long
--
2.17.1
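
The masked zero check added by this patch can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
counter_is_usable() is a hypothetical helper, not a kernel function, and
UINT64_MAX stands in for the kernel's U64_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the zero check inside
 * perf_msr_probe(); it is not kernel code.
 */
static bool counter_is_usable(uint64_t val, uint64_t mask, bool zero)
{
	/* A mask of 0 means "not set": fall back to checking all 64 bits. */
	uint64_t effective = mask ? mask : UINT64_MAX;

	/* Reject the counter if its masked value is zero and zero is not allowed. */
	if (!zero && !(val & effective))
		return false;

	return true;
}
```

With a raw value whose lower 32 bits are zero but whose upper bits are
set, the helper accepts the counter when no mask is given and rejects it
when the mask is 0xFFFFFFFF, which is the behavior change the patch is
after.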


2021-01-15 09:13:43

by Zhang Rui

Subject: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection

In the RAPL ENERGY_STATUS MSRs, only the lower 32 bits represent the
energy counter; the upper 32 bits are reserved.

Set the MSR mask for these MSRs so that RAPL PMU events are not added
erroneously when the counter is zero but the upper 32 bits contain a
non-zero value.

Signed-off-by: Zhang Rui <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/events/rapl.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7dbbeaacd995..7ed25b2ba05f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -523,12 +523,15 @@ static bool test_msr(int idx, void *data)
 	return test_bit(idx, (unsigned long *) data);
 }

+/* Only lower 32bits of the MSR represents the energy counter */
+#define RAPL_MSR_MASK 0xFFFFFFFF
+
 static struct perf_msr intel_rapl_msrs[] = {
-	[PERF_RAPL_PP0] = { MSR_PP0_ENERGY_STATUS, &rapl_events_cores_group, test_msr },
-	[PERF_RAPL_PKG] = { MSR_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr },
-	[PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr },
-	[PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr },
-	[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr },
+	[PERF_RAPL_PP0] = { MSR_PP0_ENERGY_STATUS, &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_PKG] = { MSR_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, false, RAPL_MSR_MASK },
 };

 /*
--
2.17.1
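
The table entries above use positional initializers, so the extra
"false" is needed to fill .no_check before the mask can land in .mask.
A minimal userspace sketch of that mapping follows. The struct name,
the void * group pointer, and the same_entry() helper are illustrative
assumptions; 0x611 is used only as an example MSR address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAPL_MSR_MASK 0xFFFFFFFFULL	/* same value as in the patch */

/* Userspace mirror of struct perf_msr after patch 1; grp is simplified. */
struct perf_msr_sketch {
	uint64_t msr;
	void *grp;
	bool (*test)(int idx, void *data);
	bool no_check;
	uint64_t mask;
};

/* Positional form, as in the patch: the 4th value is no_check, the 5th is mask. */
static const struct perf_msr_sketch positional = {
	0x611, NULL, NULL, false, RAPL_MSR_MASK
};

/* Equivalent designated form, which names each field explicitly. */
static const struct perf_msr_sketch designated = {
	.msr = 0x611,
	.grp = NULL,
	.test = NULL,
	.no_check = false,
	.mask = RAPL_MSR_MASK,
};

/* Hypothetical helper: compare two entries field by field. */
static bool same_entry(const struct perf_msr_sketch *a,
		       const struct perf_msr_sketch *b)
{
	return a->msr == b->msr && a->grp == b->grp && a->test == b->test &&
	       a->no_check == b->no_check && a->mask == b->mask;
}
```

Entries that omit the last two values, as the tables did before this
patch, leave .no_check as false and .mask as 0, and patch 1 treats a
mask of 0 as "check all bits".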

2021-01-15 09:14:38

by Zhang Rui

Subject: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

Several things are special about the RAPL Psys energy counter on the
Intel Sapphire Rapids platform.
1. The platform contains one Psys master package, and only CPUs on the
   master package can read a valid value of the Psys energy counter;
   reading the MSR on CPUs in a slave package returns 0.
2. The master package does not have to be physical package 0. When all
   the CPUs on the Psys master package are offlined, we lose the Psys
   energy counter at runtime.
3. The Psys energy counter can be disabled by the BIOS, while all the
   other energy counters are not affected.

It is not easy to handle all of these in the current RAPL PMU design
because
a) perf_msr_probe() validates the MSR on some random CPU, which may be
   in either the Psys master package or a Psys slave package.
b) all the RAPL events share the same PMU, and there is no API to remove
   the psys-energy event cleanly without affecting the other events in
   the same PMU.

This patch addresses the problems in a simple way.

First, by setting the .no_check bit for the RAPL Psys MSR, the psys-energy
event is always added, so we don't have to check the Psys ENERGY_STATUS
MSR on the master package.

Then, rapl_not_visible() is removed because
1. it is useless for RAPL MSRs with .no_check cleared, because the
   .is_visible() callback is always overridden in perf_msr_probe().
2. it is useless for RAPL MSRs with .no_check set, because we actually
   want the sysfs attributes to always be visible for those MSRs.

With the above changes, we always add the psys-energy event on Intel SPR
platforms. The difference is that the event counter returns 0 when the
Psys RAPL domain is disabled by the BIOS, or when the Psys master package
is offlined.

Signed-off-by: Zhang Rui <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/events/rapl.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7ed25b2ba05f..f42a70496a24 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -454,16 +454,9 @@ static struct attribute *rapl_events_cores[] = {
 	NULL,
 };

-static umode_t
-rapl_not_visible(struct kobject *kobj, struct attribute *attr, int i)
-{
-	return 0;
-}
-
 static struct attribute_group rapl_events_cores_group = {
 	.name = "events",
 	.attrs = rapl_events_cores,
-	.is_visible = rapl_not_visible,
 };

 static struct attribute *rapl_events_pkg[] = {
@@ -476,7 +469,6 @@ static struct attribute *rapl_events_pkg[] = {
 static struct attribute_group rapl_events_pkg_group = {
 	.name = "events",
 	.attrs = rapl_events_pkg,
-	.is_visible = rapl_not_visible,
 };

 static struct attribute *rapl_events_ram[] = {
@@ -489,7 +481,6 @@ static struct attribute *rapl_events_ram[] = {
 static struct attribute_group rapl_events_ram_group = {
 	.name = "events",
 	.attrs = rapl_events_ram,
-	.is_visible = rapl_not_visible,
 };

 static struct attribute *rapl_events_gpu[] = {
@@ -502,7 +493,6 @@ static struct attribute *rapl_events_gpu[] = {
 static struct attribute_group rapl_events_gpu_group = {
 	.name = "events",
 	.attrs = rapl_events_gpu,
-	.is_visible = rapl_not_visible,
 };

 static struct attribute *rapl_events_psys[] = {
@@ -515,7 +505,6 @@ static struct attribute *rapl_events_psys[] = {
 static struct attribute_group rapl_events_psys_group = {
 	.name = "events",
 	.attrs = rapl_events_psys,
-	.is_visible = rapl_not_visible,
 };

 static bool test_msr(int idx, void *data)
@@ -534,6 +523,14 @@ static struct perf_msr intel_rapl_msrs[] = {
 	[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, false, RAPL_MSR_MASK },
 };

+static struct perf_msr intel_rapl_spr_msrs[] = {
+	[PERF_RAPL_PP0] = { MSR_PP0_ENERGY_STATUS, &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_PKG] = { MSR_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr, false, RAPL_MSR_MASK },
+	[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, true, RAPL_MSR_MASK },
+};
+
 /*
  * Force to PERF_RAPL_MAX size due to:
  * - perf_msr_probe(PERF_RAPL_MAX)
@@ -764,7 +761,7 @@ static struct rapl_model model_spr = {
 			  BIT(PERF_RAPL_PSYS),
 	.unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
 	.msr_power_unit = MSR_RAPL_POWER_UNIT,
-	.rapl_msrs = intel_rapl_msrs,
+	.rapl_msrs = intel_rapl_spr_msrs,
 };

 static struct rapl_model model_amd_fam17h = {
--
2.17.1
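
The effect of .no_check on probing can be sketched in userspace C. This
is a simplified model under stated assumptions, not the kernel's
perf_msr_probe(): the MSR read is replaced by a simulated_val field, the
group and test callbacks are left out, and entry_sketch, demo and
probe_sketch() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified table entry; simulated_val replaces a real MSR read. */
struct entry_sketch {
	uint64_t simulated_val;
	uint64_t mask;
	bool no_check;
};

/* Example table: a counting MSR, a zero MSR, and a zero MSR with no_check. */
static const struct entry_sketch demo[] = {
	{ 100, 0xFFFFFFFFULL, false },
	{ 0,   0xFFFFFFFFULL, false },
	{ 0,   0xFFFFFFFFULL, true  },
};

/* Returns a bitmap with bit N set if entry N is considered available. */
static unsigned long probe_sketch(const struct entry_sketch *e, int cnt, bool zero)
{
	unsigned long avail = 0;

	for (int bit = 0; bit < cnt; bit++) {
		if (!e[bit].no_check) {
			/* A mask of 0 means all 64 bits are checked. */
			uint64_t mask = e[bit].mask ? e[bit].mask : UINT64_MAX;

			/* Drop zero counters unless zero values are allowed. */
			if (!zero && !(e[bit].simulated_val & mask))
				continue;
		}
		/* Entries with no_check set skip every check above. */
		avail |= 1UL << bit;
	}

	return avail;
}
```

In this model the third entry is reported as available even though it
reads 0, which mirrors how the psys-energy event stays present on SPR
regardless of which package the probe happens to run on.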

2021-01-15 20:09:13

by Peter Zijlstra

Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection

On Fri, Jan 15, 2021 at 05:22:07PM +0800, Zhang Rui wrote:
> In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the
> energy counter, and the higher 32bits are reserved.
>
> Add the MSR mask for these MSRs to fix a problem that the RAPL PMU events
> are added erroneously when higher 32bits contain non-zero value.

Why would these high bits be non-zero?

2021-01-16 08:23:53

by Zhang Rui

Subject: RE: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection



> -----Original Message-----
> From: Peter Zijlstra <[email protected]>
> Sent: Saturday, January 16, 2021 4:03 AM
> To: Zhang, Rui <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> Importance: High
>
> On Fri, Jan 15, 2021 at 05:22:07PM +0800, Zhang Rui wrote:
> > In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the
> > energy counter, and the higher 32bits are reserved.
> >
> > Add the MSR mask for these MSRs to fix a problem that the RAPL PMU
> > events are added erroneously when higher 32bits contain non-zero value.
>
> Why would these high bits be non-zero?

On the SPR platform, the high bits of the Psys energy counter are reused
for other purposes. The high bits of the other RAPL domains' energy
counters still read as 0.

I didn't mention this because I thought this patch should be okay as a
generic fix.

Thanks,
rui

2021-01-16 12:51:18

by Peter Zijlstra

Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection

On Sat, Jan 16, 2021 at 08:19:35AM +0000, Zhang, Rui wrote:
>
>
> > -----Original Message-----
> > From: Peter Zijlstra <[email protected]>
> > Sent: Saturday, January 16, 2021 4:03 AM
> > To: Zhang, Rui <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> > Importance: High
> >
> > On Fri, Jan 15, 2021 at 05:22:07PM +0800, Zhang Rui wrote:
> > > In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the
> > > energy counter, and the higher 32bits are reserved.
> > >
> > > Add the MSR mask for these MSRs to fix a problem that the RAPL PMU
> > > events are added erroneously when higher 32bits contain non-zero value.
> >
> > Why would these high bits be non-zero?
>
> On SPR platform, the high bits of Psys energy counter are reused for other purpose.
> High bits for other RAPL domains energy counters still return 0.
>
> I didn't mention this because I thought this patch should be okay as a generic fix.

But it doesn't fix anything.. there's not anything broken, except on
that daft SPR thing.

2021-01-16 12:55:30

by Peter Zijlstra

Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> There are several things special for the RAPL Psys energy counter, on
> Intel Sapphire Rapids platform.
> 1. it contains one Psys master package, and only CPUs on the master
> package can read valid value of the Psys energy counter, reading the
> MSR on CPUs in the slave package returns 0.
> 2. The master package does not have to be Physical package 0. And when
> all the CPUs on the Psys master package are offlined, we lose the Psys
> energy counter, at runtime.
> 3. The Psys energy counter can be disabled by BIOS, while all the other
> energy counters are not affected.
>
> It is not easy to handle all of these in the current RAPL PMU design
> because
> a) perf_msr_probe() validates the MSR on some random CPU, which may either
> be in the Psys master package or in the Psys slave package.
> b) all the RAPL events share the same PMU, and there is not API to remove
> the psys-energy event cleanly, without affecting the other events in
> the same PMU.
>
> This patch addresses the problems in a simple way.
>
> First, by setting .no_check bit for RAPL Psys MSR, the psys-energy event
> is always added, so we don't have to check the Psys ENERGY_STATUS MSR on
> master package.
>
> Then, rapl_not_visible() is removed because
> 1. it is useless for RAPL MSRs with .no_check cleared, because the
> .is_visible() callbacks is always overridden in perf_msr_probe().
> 2. it is useless for RAPL MSRs with .no_check set, because we actually
> want the sysfs attributes always be visible for those MSRs.
>
> With the above changes, we always probe the psys-energy event on Intel SPR
> platform. Difference is that the event counter returns 0 when the Psys
> RAPL Domain is disabled by BIOS, or the Psys master package is offlined.

Maybe I'm too tired, but I cannot follow. How does this cure the fact
that the rapl_cpu_mask might not include that master thing. And how can
software detect what the master thing is to begin with?

2021-01-17 14:37:19

by Zhang Rui

Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

Hi, Peter,

> -----Original Message-----
> From: Peter Zijlstra <[email protected]>
> Sent: Saturday, January 16, 2021 8:50 PM
> To: Zhang, Rui <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR
> platform
> Importance: High
>
> On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> > There are several things special for the RAPL Psys energy counter, on
> > Intel Sapphire Rapids platform.
> > 1. it contains one Psys master package, and only CPUs on the master
> > package can read valid value of the Psys energy counter, reading the
> > MSR on CPUs in the slave package returns 0.
> > 2. The master package does not have to be Physical package 0. And when
> > all the CPUs on the Psys master package are offlined, we lose the Psys
> > energy counter, at runtime.
> > 3. The Psys energy counter can be disabled by BIOS, while all the other
> > energy counters are not affected.
> >
> > It is not easy to handle all of these in the current RAPL PMU design
> > because
> > a) perf_msr_probe() validates the MSR on some random CPU, which may
> either
> > be in the Psys master package or in the Psys slave package.
> > b) all the RAPL events share the same PMU, and there is not API to remove
> > the psys-energy event cleanly, without affecting the other events in
> > the same PMU.
> >
> > This patch addresses the problems in a simple way.
> >
> > First, by setting .no_check bit for RAPL Psys MSR, the psys-energy
> > event is always added, so we don't have to check the Psys
> > ENERGY_STATUS MSR on master package.
> >
> > Then, rapl_not_visible() is removed because 1. it is useless for RAPL
> > MSRs with .no_check cleared, because the
> > .is_visible() callbacks is always overridden in perf_msr_probe().
> > 2. it is useless for RAPL MSRs with .no_check set, because we actually
> > want the sysfs attributes always be visible for those MSRs.
> >
> > With the above changes, we always probe the psys-energy event on Intel
> > SPR platform. Difference is that the event counter returns 0 when the
> > Psys RAPL Domain is disabled by BIOS, or the Psys master package is
> offlined.
>
> Maybe I'm too tired, but I cannot follow. How does this cure the fact that the
> rapl_cpu_mask might not include that master thing. And how can software
> detect what the master thing is to begin with?

To keep things simple, I ignore the master package and probe the
psys-energy counter blindly on SPR, so rapl_cpu_mask still includes all
the online CPUs.
This means that psys-energy is "valid" on all packages; it just returns
different values on different packages: whole-system power consumption
on the Psys master package, and zero on the Psys slave packages.

Not sure if I answered your question or not.

Thanks,
rui

2021-01-17 15:46:31

by Zhang Rui

Subject: RE: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection



> -----Original Message-----
> From: Peter Zijlstra <[email protected]>
> Sent: Saturday, January 16, 2021 8:48 PM
> To: Zhang, Rui <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> Importance: High
>
> On Sat, Jan 16, 2021 at 08:19:35AM +0000, Zhang, Rui wrote:
> >
> >
> > > -----Original Message-----
> > > From: Peter Zijlstra <[email protected]>
> > > Sent: Saturday, January 16, 2021 4:03 AM
> > > To: Zhang, Rui <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> > > Importance: High
> > >
> > > On Fri, Jan 15, 2021 at 05:22:07PM +0800, Zhang Rui wrote:
> > > > In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent
> > > > the energy counter, and the higher 32bits are reserved.
> > > >
> > > > Add the MSR mask for these MSRs to fix a problem that the RAPL PMU
> > > > events are added erroneously when higher 32bits contain non-zero
> value.
> > >
> > > Why would these high bits be non-zero?
> >
> > On SPR platform, the high bits of Psys energy counter are reused for other
> purpose.
> > High bits for other RAPL domains energy counters still return 0.
> >
> > I didn't mention this because I thought this patch should be okay as a
> generic fix.
>
> But it doesn't fix anything.. there's not anything broken, except on that daft
> SPR thing.

Well, yes.
Before SPR, this was just a potential issue, but the behavior on SPR
suggests that it may become a real one.
So are you suggesting that I also include the SPR information as the
justification for this patch?

Thanks,
rui

2021-01-25 06:13:26

by Zhang Rui

Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

Hi, Peter,

> -----Original Message-----
> From: Zhang, Rui
> Sent: Sunday, January 17, 2021 10:34 PM
> To: 'Peter Zijlstra' <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR
> platform
>
> Hi, Peter,
>
> > -----Original Message-----
> > From: Peter Zijlstra <[email protected]>
> > Sent: Saturday, January 16, 2021 8:50 PM
> > To: Zhang, Rui <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel
> > SPR platform
> > Importance: High
> >
> > On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> > > There are several things special for the RAPL Psys energy counter,
> > > on Intel Sapphire Rapids platform.
> > > 1. it contains one Psys master package, and only CPUs on the master
> > > package can read valid value of the Psys energy counter, reading the
> > > MSR on CPUs in the slave package returns 0.
> > > 2. The master package does not have to be Physical package 0. And when
> > > all the CPUs on the Psys master package are offlined, we lose the Psys
> > > energy counter, at runtime.
> > > 3. The Psys energy counter can be disabled by BIOS, while all the other
> > > energy counters are not affected.
> > >
> > > It is not easy to handle all of these in the current RAPL PMU design
> > > because
> > > a) perf_msr_probe() validates the MSR on some random CPU, which may
> > either
> > > be in the Psys master package or in the Psys slave package.
> > > b) all the RAPL events share the same PMU, and there is not API to
> remove
> > > the psys-energy event cleanly, without affecting the other events in
> > > the same PMU.
> > >
> > > This patch addresses the problems in a simple way.
> > >
> > > First, by setting .no_check bit for RAPL Psys MSR, the psys-energy
> > > event is always added, so we don't have to check the Psys
> > > ENERGY_STATUS MSR on master package.
> > >
> > > Then, rapl_not_visible() is removed because 1. it is useless for
> > > RAPL MSRs with .no_check cleared, because the
> > > .is_visible() callbacks is always overridden in perf_msr_probe().
> > > 2. it is useless for RAPL MSRs with .no_check set, because we actually
> > > want the sysfs attributes always be visible for those MSRs.
> > >
> > > With the above changes, we always probe the psys-energy event on
> > > Intel SPR platform. Difference is that the event counter returns 0
> > > when the Psys RAPL Domain is disabled by BIOS, or the Psys master
> > > package is
> > offlined.
> >
> > Maybe I'm too tired, but I cannot follow. How does this cure the fact
> > that the rapl_cpu_mask might not include that master thing. And how
> > can software detect what the master thing is to begin with?
>
> To make things simple, I ignore the master thing, and probe the psys-energy
> counter blindly on SPR.
> So rapl_cpu_mask still includes all the online CPUs.
> This means that psys-energy is "valid" on all packages, and it just returns
> different values on different packages.
> AKA, whole system power consumption on Psys master package, and Zero
> on Psys slave packages.
>
In short, the current code does not allow a RAPL energy counter to return
0, and all this patch does is allow the Psys energy counter to return 0.
In this way, the Psys event is "valid" on all CPUs, so we don't need to
handle the master package specially.
The drawback is that we still see the psys-energy event, but with a 0
readout, when the Psys counter is not available (master package offlined,
or Psys disabled).

TBH, I'm not sure whether I understood your original question correctly,
so please let me know if anything is still unclear.

Thanks,
rui
>
> Thanks,
> rui

2021-02-03 14:19:23

by Zhang Rui

Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

Hi, Peter,

> -----Original Message-----
> From: Zhang, Rui
> Sent: Monday, January 25, 2021 2:11 PM
> To: 'Peter Zijlstra' <[email protected]>
> Cc: '[email protected]' <[email protected]>; '[email protected]'
> <[email protected]>; '[email protected]' <[email protected]>;
> '[email protected]' <[email protected]>;
> '[email protected]' <[email protected]>; '[email protected]'
> <[email protected]>; '[email protected]' <linux-
> [email protected]>; '[email protected]' <[email protected]>;
> '[email protected]' <[email protected]>; '[email protected]'
> <[email protected]>
> Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR
> platform
>
> Hi, Peter,
>
> > -----Original Message-----
> > From: Zhang, Rui
> > Sent: Sunday, January 17, 2021 10:34 PM
> > To: 'Peter Zijlstra' <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel
> > SPR platform
> >
> > Hi, Peter,
> >
> > > -----Original Message-----
> > > From: Peter Zijlstra <[email protected]>
> > > Sent: Saturday, January 16, 2021 8:50 PM
> > > To: Zhang, Rui <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on
> > > Intel SPR platform
> > > Importance: High
> > >
> > > On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> > > > There are several things special for the RAPL Psys energy counter,
> > > > on Intel Sapphire Rapids platform.
> > > > 1. it contains one Psys master package, and only CPUs on the master
> > > > package can read valid value of the Psys energy counter, reading the
> > > > MSR on CPUs in the slave package returns 0.
> > > > 2. The master package does not have to be Physical package 0. And
> when
> > > > all the CPUs on the Psys master package are offlined, we lose the Psys
> > > > energy counter, at runtime.
> > > > 3. The Psys energy counter can be disabled by BIOS, while all the other
> > > > energy counters are not affected.
> > > >
> > > > It is not easy to handle all of these in the current RAPL PMU
> > > > design because
> > > > a) perf_msr_probe() validates the MSR on some random CPU, which
> > > > may
> > > either
> > > > be in the Psys master package or in the Psys slave package.
> > > > b) all the RAPL events share the same PMU, and there is not API to
> > remove
> > > > the psys-energy event cleanly, without affecting the other events in
> > > > the same PMU.
> > > >
> > > > This patch addresses the problems in a simple way.
> > > >
> > > > First, by setting .no_check bit for RAPL Psys MSR, the psys-energy
> > > > event is always added, so we don't have to check the Psys
> > > > ENERGY_STATUS MSR on master package.
> > > >
> > > > Then, rapl_not_visible() is removed because 1. it is useless for
> > > > RAPL MSRs with .no_check cleared, because the
> > > > .is_visible() callbacks is always overridden in perf_msr_probe().
> > > > 2. it is useless for RAPL MSRs with .no_check set, because we actually
> > > > want the sysfs attributes always be visible for those MSRs.
> > > >
> > > > With the above changes, we always probe the psys-energy event on
> > > > Intel SPR platform. Difference is that the event counter returns 0
> > > > when the Psys RAPL Domain is disabled by BIOS, or the Psys master
> > > > package is
> > > offlined.
> > >
> > > Maybe I'm too tired, but I cannot follow. How does this cure the
> > > fact that the rapl_cpu_mask might not include that master thing. And
> > > how can software detect what the master thing is to begin with?
> >
> > To make things simple, I ignore the master thing, and probe the
> > psys-energy counter blindly on SPR.
> > So rapl_cpu_mask still includes all the online CPUs.
> > This means that psys-energy is "valid" on all packages, and it just
> > returns different values on different packages.
> > AKA, whole system power consumption on Psys master package, and Zero
> > on Psys slave packages.
> >
> In short, the current code does not allow RAPL energy counter to return 0.
> And all the work I do is to allow Psys energy counter to return 0.
> In this way, the Psys event is "valid" on all CPUs, so we don't need to handle
> the master thing.
> The drawback is that we still see psys-energy event, but with 0 readout,
> when Psys counter is not available (master package offlined, or psys
> disabled).
>
> TBH, I'm not quite sure if I understand your original question correctly or not,
> so please let me know if there is still something unclear.
>
Sorry to bother you, but may I know your concerns about this patch series?

Thanks,
rui
> Thanks,
> rui
> >
> > Thanks,
> > rui

2021-02-03 14:26:20

by Peter Zijlstra

Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection

On Sun, Jan 17, 2021 at 02:44:04PM +0000, Zhang, Rui wrote:

> > But it doesn't fix anything.. there's not anything broken, except on that daft
> > SPR thing.
>
> Well, yes.
> Before SPR, this is just a potential issue. But things on SPR suggests
> that this potential issue may become a real one. So are you
> suggesting me to also include the SPR information as the justification
> of this patch?

Yes, and fix the subject. It doesn't fix anything, as there isn't
anything broken.

Just say that upcoming SPR will need this.

2021-02-03 14:50:36

by Peter Zijlstra

Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform


FWIW, your email is malformed, please wrap at 78 chars.

On Mon, Jan 25, 2021 at 06:11:14AM +0000, Zhang, Rui wrote:
> In short, the current code does not allow RAPL energy counter to
> return 0. And all the work I do is to allow Psys energy counter to
> return 0.

Ok.

> In this way, the Psys event is "valid" on all CPUs, so we don't need
> to handle the master thing.

So RAPL is mapped to DIEs, and IIRC we can have multiple DIEs per
Package. But the master thing is a Package.

Is this all moot because SPR has one DIE per Package? Because if it
would have more, there'd be more interesting problems I suppose.

2021-02-05 00:55:08

by Zhang Rui

Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

Hi, Peter,

On Wed, 2021-02-03 at 15:47 +0100, Peter Zijlstra wrote:
> FWIW, your email is malformed, please wrap at 78 chars.
>
> On Mon, Jan 25, 2021 at 06:11:14AM +0000, Zhang, Rui wrote:
> > In short, the current code does not allow RAPL energy counter to
> > return 0. And all the work I do is to allow Psys energy counter to
> > return 0.
>
> Ok.
>
> > In this way, the Psys event is "valid" on all CPUs, so we don't
> > need
> > to handle the master thing.
>
> So RAPL is mapped to DIEs, and IIRC we can have multiple DIEs per
> Package. But the master thing is a Package.
>
> Is this all moot because SPR has one DIE per Package?

Oh, right.
This is not a problem on SPR because it is a single-die
platform.

> Because if it
> would have more, there's be more interesting problems I suppose.

Agreed.

thanks,
rui