2022-01-05 18:57:17

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH] perf/x86/rapl: fix AMD event handling

The RAPL events exposed under /sys/devices/power/events should only reflect
what the underlying hardware actually support. This is how it works on Intel
RAPL and Intel core/uncore PMUs in general.
But on AMD, this was not the case. All possible RAPL events were advertised.

This is what it showed on an AMD Fam17h:
$ ls /sys/devices/power/events/
energy-cores energy-gpu energy-pkg energy-psys
energy-ram energy-cores.scale energy-gpu.scale energy-pkg.scale
energy-psys.scale energy-ram.scale energy-cores.unit energy-gpu.unit
energy-pkg.unit energy-psys.unit energy-ram.unit

Yet, on AMD Fam17h, only energy-pkg is supported.

This patch fixes the problem. Given the way perf_msr_probe() works, the
amd_rapl_msrs[] table has to have all entries filled out and in particular
the group field, otherwise perf_msr_probe() defaults to making the event
visible.

With the patch applied, the kernel now only shows was is actually supported:

$ ls /sys/devices/power/events/
energy-pkg energy-pkg.scale energy-pkg.unit

The patch also uses the RAPL_MSR_MASK because only the 32-bits LSB of the
RAPL counters are relevant when reading power consumption.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/events/rapl.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 85feafacc445..77e3a47af5ad 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -536,11 +536,14 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
* - perf_msr_probe(PERF_RAPL_MAX)
* - want to use same event codes across both architectures
*/
-static struct perf_msr amd_rapl_msrs[PERF_RAPL_MAX] = {
- [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr },
+static struct perf_msr amd_rapl_msrs[] = {
+ [PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, 0, false, 0 },
+ [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
+ [PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, 0, false, 0 },
+ [PERF_RAPL_PP1] = { 0, &rapl_events_gpu_group, 0, false, 0 },
+ [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, 0, false, 0 },
};

-
static int rapl_cpu_offline(unsigned int cpu)
{
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
--
2.34.1.448.ga2b2bfdf31-goog



2022-01-06 00:20:08

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] perf/x86/rapl: fix AMD event handling

Hi Stephane,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/perf/core]
[also build test WARNING on tip/master linux/master linus/master v5.16-rc8 next-20220105]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Stephane-Eranian/perf-x86-rapl-fix-AMD-event-handling/20220106-025808
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git a9f4a6e92b3b319296fb078da2615f618f6cd80c
config: i386-randconfig-s002-20220105 (https://download.01.org/0day-ci/archive/20220106/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/3c196dc3aa384eb70492fdb07371de164e98e238
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Stephane-Eranian/perf-x86-rapl-fix-AMD-event-handling/20220106-025808
git checkout 3c196dc3aa384eb70492fdb07371de164e98e238
# save the config file to linux build tree
mkdir build_dir
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/events/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
>> arch/x86/events/rapl.c:540:59: sparse: sparse: Using plain integer as NULL pointer
arch/x86/events/rapl.c:542:59: sparse: sparse: Using plain integer as NULL pointer
arch/x86/events/rapl.c:543:59: sparse: sparse: Using plain integer as NULL pointer
arch/x86/events/rapl.c:544:59: sparse: sparse: Using plain integer as NULL pointer

vim +540 arch/x86/events/rapl.c

533
534 /*
535 * Force to PERF_RAPL_MAX size due to:
536 * - perf_msr_probe(PERF_RAPL_MAX)
537 * - want to use same event codes across both architectures
538 */
539 static struct perf_msr amd_rapl_msrs[] = {
> 540 [PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, 0, false, 0 },
541 [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
542 [PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, 0, false, 0 },
543 [PERF_RAPL_PP1] = { 0, &rapl_events_gpu_group, 0, false, 0 },
544 [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, 0, false, 0 },
545 };
546

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

2022-01-20 07:41:30

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/urgent] perf/x86/rapl: fix AMD event handling

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: 0036fb00a756a2f6e360d44e2e3d2200a8afbc9b
Gitweb: https://git.kernel.org/tip/0036fb00a756a2f6e360d44e2e3d2200a8afbc9b
Author: Stephane Eranian <[email protected]>
AuthorDate: Wed, 05 Jan 2022 10:56:59 -08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 18 Jan 2022 12:09:48 +01:00

perf/x86/rapl: fix AMD event handling

The RAPL events exposed under /sys/devices/power/events should only reflect
what the underlying hardware actually support. This is how it works on Intel
RAPL and Intel core/uncore PMUs in general.
But on AMD, this was not the case. All possible RAPL events were advertised.

This is what it showed on an AMD Fam17h:
$ ls /sys/devices/power/events/
energy-cores energy-gpu energy-pkg energy-psys
energy-ram energy-cores.scale energy-gpu.scale energy-pkg.scale
energy-psys.scale energy-ram.scale energy-cores.unit energy-gpu.unit
energy-pkg.unit energy-psys.unit energy-ram.unit

Yet, on AMD Fam17h, only energy-pkg is supported.

This patch fixes the problem. Given the way perf_msr_probe() works, the
amd_rapl_msrs[] table has to have all entries filled out and in particular
the group field, otherwise perf_msr_probe() defaults to making the event
visible.

With the patch applied, the kernel now only shows was is actually supported:

$ ls /sys/devices/power/events/
energy-pkg energy-pkg.scale energy-pkg.unit

The patch also uses the RAPL_MSR_MASK because only the 32-bits LSB of the
RAPL counters are relevant when reading power consumption.

Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/events/rapl.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 85feafa..77e3a47 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -536,11 +536,14 @@ static struct perf_msr intel_rapl_spr_msrs[] = {
* - perf_msr_probe(PERF_RAPL_MAX)
* - want to use same event codes across both architectures
*/
-static struct perf_msr amd_rapl_msrs[PERF_RAPL_MAX] = {
- [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr },
+static struct perf_msr amd_rapl_msrs[] = {
+ [PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, 0, false, 0 },
+ [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK },
+ [PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, 0, false, 0 },
+ [PERF_RAPL_PP1] = { 0, &rapl_events_gpu_group, 0, false, 0 },
+ [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, 0, false, 0 },
};

-
static int rapl_cpu_offline(unsigned int cpu)
{
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);