2014-01-08 10:16:04

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v2 0/2] perf/x86: various RAPL improvements and fixes

This short patch series provides the following improvements
to the RAPL support for perf_events:
- support for RAPL PP1 energy counter
- fix struct perf_event active_list field initialization

The PP1 counter maps to the builtin graphic card for client
processors. that is why it is called energy-gpu.

The new energy event is:
- name: power/energy-gpu/
- code: 0x4
- unit: Joules
- scaling factor: 2^-32 Joules.

In v2, we renamed energy-pp1 to energy-gpu. We dropped the Haswell
Celeron patch because it is committed already. But we add a fix for
the broken initialization of the active_entry list.

Signed-off-by: Stephane Eranian <[email protected]>

Stephane Eranian (2):
perf/x86: fix active_entry initialization
perf/x86: add RAPL PP1 energy counter support

arch/x86/kernel/cpu/perf_event_intel_rapl.c | 31 +++++++++++++++++++++------
include/linux/perf_event.h | 6 ++----
kernel/events/core.c | 2 ++
3 files changed, 28 insertions(+), 11 deletions(-)

--
1.7.9.5


2014-01-08 10:16:10

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v2 1/2] perf/x86: fix active_entry initialization

This patch fixes a problem with the initialization of the
struct perf_event active_entry field. It is defined inside
an anonymous union and was initialized in perf_event_alloc()
using INIT_LIST_HEAD(). However at that time, we do not know
whether the event is going to use active_entry or hlist_entry (SW).
Or at last, we don't want to make that determination there.
The problem is that hlist and list_head are not initialized
the same way. One is okay with NULL (from kzmalloc), the other
needs to pointers to point to self.

This patch resolves this problem by dropping the union.
This will avoid problems later on, if someone starts using
active_entry or hlist_entry without verifying that they
actually overlap. This also solves the initialization
problem.

Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/perf_event.h | 6 ++----
kernel/events/core.c | 2 ++
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8f4a70f..e56b07f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -319,10 +319,8 @@ struct perf_event {
*/
struct list_head migrate_entry;

- union {
- struct hlist_node hlist_entry;
- struct list_head active_entry;
- };
+ struct hlist_node hlist_entry;
+ struct list_head active_entry;
int nr_siblings;
int group_flags;
struct perf_event *group_leader;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4b743b2..74fdef5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6685,6 +6685,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
INIT_LIST_HEAD(&event->sibling_list);
INIT_LIST_HEAD(&event->rb_entry);
INIT_LIST_HEAD(&event->active_entry);
+ INIT_HLIST_NODE(&event->hlist_entry);
+

init_waitqueue_head(&event->waitq);
init_irq_work(&event->pending, perf_pending_event);
--
1.7.9.5

2014-01-08 10:16:46

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v2 2/2] perf/x86: add RAPL PP1 energy counter support

This patch adds support for the RAPL energy counter
PP1 (Power Plane 1).

On client processors, it usually corresponds to the
energy consumption of the builtin graphic card. That
is why the sysfs event is called energy-gpu.

New event:
- name: power/energy-gpu/
- code: event=0x4
- unit: 2^-32 Joules

On processors without graphics, this should count 0.
The patch only enables this event on client processors.

Reviewed-by: Maria Dimakopoulou <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_rapl.c | 31 +++++++++++++++++++++------
1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index 0e3754e..5ad35ad 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -27,6 +27,10 @@
* event: rapl_energy_dram
* perf code: 0x3
*
+ * dram counter: consumption of the builtin-gpu domain (client only)
+ * event: rapl_energy_gpu
+ * perf code: 0x4
+ *
* We manage those counters as free running (read-only). They may be
* use simultaneously by other tools, such as turbostat.
*
@@ -55,10 +59,13 @@
#define INTEL_RAPL_PKG 0x2 /* pseudo-encoding */
#define RAPL_IDX_RAM_NRG_STAT 2 /* DRAM */
#define INTEL_RAPL_RAM 0x3 /* pseudo-encoding */
+#define RAPL_IDX_PP1_NRG_STAT 3 /* DRAM */
+#define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */

/* Clients have PP0, PKG */
#define RAPL_IDX_CLN (1<<RAPL_IDX_PP0_NRG_STAT|\
- 1<<RAPL_IDX_PKG_NRG_STAT)
+ 1<<RAPL_IDX_PKG_NRG_STAT|\
+ 1<<RAPL_IDX_PP1_NRG_STAT)

/* Servers have PP0, PKG, RAM */
#define RAPL_IDX_SRV (1<<RAPL_IDX_PP0_NRG_STAT|\
@@ -315,6 +322,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
bit = RAPL_IDX_RAM_NRG_STAT;
msr = MSR_DRAM_ENERGY_STATUS;
break;
+ case INTEL_RAPL_PP1:
+ bit = RAPL_IDX_PP1_NRG_STAT;
+ msr = MSR_PP1_ENERGY_STATUS;
+ break;
default:
return -EINVAL;
}
@@ -367,19 +378,22 @@ static struct attribute_group rapl_pmu_attr_group = {
};

EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
-EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
-EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
+EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
+EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
+EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");

EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
-EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
-EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
+EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
+EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
+EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");

/*
* we compute in 0.23 nJ increments regardless of MSR
*/
EVENT_ATTR_STR(energy-cores.scale, rapl_cores_scale, "2.3283064365386962890625e-10");
-EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890625e-10");
-EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
+EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890625e-10");
+EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
+EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");

static struct attribute *rapl_events_srv_attr[] = {
EVENT_PTR(rapl_cores),
@@ -399,12 +413,15 @@ static struct attribute *rapl_events_srv_attr[] = {
static struct attribute *rapl_events_cln_attr[] = {
EVENT_PTR(rapl_cores),
EVENT_PTR(rapl_pkg),
+ EVENT_PTR(rapl_gpu),

EVENT_PTR(rapl_cores_unit),
EVENT_PTR(rapl_pkg_unit),
+ EVENT_PTR(rapl_gpu_unit),

EVENT_PTR(rapl_cores_scale),
EVENT_PTR(rapl_pkg_scale),
+ EVENT_PTR(rapl_gpu_scale),
NULL,
};

--
1.7.9.5

2014-01-08 11:29:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] perf/x86: various RAPL improvements and fixes

On Wed, Jan 08, 2014 at 11:15:51AM +0100, Stephane Eranian wrote:
> This short patch series provides the following improvements
> to the RAPL support for perf_events:
> - support for RAPL PP1 energy counter
> - fix struct perf_event active_list field initialization
>
> The PP1 counter maps to the builtin graphic card for client
> processors. that is why it is called energy-gpu.
>
> The new energy event is:
> - name: power/energy-gpu/
> - code: 0x4
> - unit: Joules
> - scaling factor: 2^-32 Joules.
>
> In v2, we renamed energy-pp1 to energy-gpu. We dropped the Haswell
> Celeron patch because it is committed already. But we add a fix for
> the broken initialization of the active_entry list.
>

Thanks!

Subject: [tip:perf/core] perf/x86: Fix active_entry initialization

Commit-ID: f3ae75de98c4bac145a87d830c156c96f9414022
Gitweb: http://git.kernel.org/tip/f3ae75de98c4bac145a87d830c156c96f9414022
Author: Stephane Eranian <[email protected]>
AuthorDate: Wed, 8 Jan 2014 11:15:52 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 12 Jan 2014 10:16:07 +0100

perf/x86: Fix active_entry initialization

This patch fixes a problem with the initialization of the
struct perf_event active_entry field. It is defined inside
an anonymous union and was initialized in perf_event_alloc()
using INIT_LIST_HEAD(). However at that time, we do not know
whether the event is going to use active_entry or hlist_entry (SW).
Or at last, we don't want to make that determination there.
The problem is that hlist and list_head are not initialized
the same way. One is okay with NULL (from kzmalloc), the other
needs to pointers to point to self.

This patch resolves this problem by dropping the union.
This will avoid problems later on, if someone starts using
active_entry or hlist_entry without verifying that they
actually overlap. This also solves the initialization
problem.

Signed-off-by: Stephane Eranian <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/perf_event.h | 6 ++----
kernel/events/core.c | 2 ++
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8f4a70f..e56b07f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -319,10 +319,8 @@ struct perf_event {
*/
struct list_head migrate_entry;

- union {
- struct hlist_node hlist_entry;
- struct list_head active_entry;
- };
+ struct hlist_node hlist_entry;
+ struct list_head active_entry;
int nr_siblings;
int group_flags;
struct perf_event *group_leader;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89d34f9..c3b6c27 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6670,6 +6670,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
INIT_LIST_HEAD(&event->sibling_list);
INIT_LIST_HEAD(&event->rb_entry);
INIT_LIST_HEAD(&event->active_entry);
+ INIT_HLIST_NODE(&event->hlist_entry);
+

init_waitqueue_head(&event->waitq);
init_irq_work(&event->pending, perf_pending_event);

Subject: [tip:perf/core] perf/x86/intel: Add Intel RAPL PP1 energy counter support

Commit-ID: f228c5b882602697a1adb50d61ff688b0df1eced
Gitweb: http://git.kernel.org/tip/f228c5b882602697a1adb50d61ff688b0df1eced
Author: Stephane Eranian <[email protected]>
AuthorDate: Wed, 8 Jan 2014 11:15:53 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 12 Jan 2014 10:16:08 +0100

perf/x86/intel: Add Intel RAPL PP1 energy counter support

This patch adds support for the Intel RAPL energy counter
PP1 (Power Plane 1).

On client processors, it usually corresponds to the
energy consumption of the builtin graphic card. That
is why the sysfs event is called energy-gpu.

New event:
- name: power/energy-gpu/
- code: event=0x4
- unit: 2^-32 Joules

On processors without graphics, this should count 0.
The patch only enables this event on client processors.

Reviewed-by: Maria Dimakopoulou <[email protected]>
Signed-off-by: Stephane Eranian <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_rapl.c | 31 ++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_rapl.c b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
index 0e3754e..5ad35ad 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_rapl.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_rapl.c
@@ -27,6 +27,10 @@
* event: rapl_energy_dram
* perf code: 0x3
*
+ * dram counter: consumption of the builtin-gpu domain (client only)
+ * event: rapl_energy_gpu
+ * perf code: 0x4
+ *
* We manage those counters as free running (read-only). They may be
* use simultaneously by other tools, such as turbostat.
*
@@ -55,10 +59,13 @@
#define INTEL_RAPL_PKG 0x2 /* pseudo-encoding */
#define RAPL_IDX_RAM_NRG_STAT 2 /* DRAM */
#define INTEL_RAPL_RAM 0x3 /* pseudo-encoding */
+#define RAPL_IDX_PP1_NRG_STAT 3 /* DRAM */
+#define INTEL_RAPL_PP1 0x4 /* pseudo-encoding */

/* Clients have PP0, PKG */
#define RAPL_IDX_CLN (1<<RAPL_IDX_PP0_NRG_STAT|\
- 1<<RAPL_IDX_PKG_NRG_STAT)
+ 1<<RAPL_IDX_PKG_NRG_STAT|\
+ 1<<RAPL_IDX_PP1_NRG_STAT)

/* Servers have PP0, PKG, RAM */
#define RAPL_IDX_SRV (1<<RAPL_IDX_PP0_NRG_STAT|\
@@ -315,6 +322,10 @@ static int rapl_pmu_event_init(struct perf_event *event)
bit = RAPL_IDX_RAM_NRG_STAT;
msr = MSR_DRAM_ENERGY_STATUS;
break;
+ case INTEL_RAPL_PP1:
+ bit = RAPL_IDX_PP1_NRG_STAT;
+ msr = MSR_PP1_ENERGY_STATUS;
+ break;
default:
return -EINVAL;
}
@@ -367,19 +378,22 @@ static struct attribute_group rapl_pmu_attr_group = {
};

EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
-EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
-EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
+EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
+EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
+EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");

EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules");
-EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
-EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
+EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules");
+EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules");
+EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules");

/*
* we compute in 0.23 nJ increments regardless of MSR
*/
EVENT_ATTR_STR(energy-cores.scale, rapl_cores_scale, "2.3283064365386962890625e-10");
-EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890625e-10");
-EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
+EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890625e-10");
+EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10");
+EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10");

static struct attribute *rapl_events_srv_attr[] = {
EVENT_PTR(rapl_cores),
@@ -399,12 +413,15 @@ static struct attribute *rapl_events_srv_attr[] = {
static struct attribute *rapl_events_cln_attr[] = {
EVENT_PTR(rapl_cores),
EVENT_PTR(rapl_pkg),
+ EVENT_PTR(rapl_gpu),

EVENT_PTR(rapl_cores_unit),
EVENT_PTR(rapl_pkg_unit),
+ EVENT_PTR(rapl_gpu_unit),

EVENT_PTR(rapl_cores_scale),
EVENT_PTR(rapl_pkg_scale),
+ EVENT_PTR(rapl_gpu_scale),
NULL,
};

2014-01-13 15:46:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [tip:perf/core] perf/x86: Fix active_entry initialization

On Sun, Jan 12, 2014 at 10:42:47AM -0800, tip-bot for Stephane Eranian wrote:
> - union {
> - struct hlist_node hlist_entry;
> - struct list_head active_entry;
> - };
> + struct hlist_node hlist_entry;
> + struct list_head active_entry;

Fwiw, we could probably stick both of those in hw_perf_event.

the software event's hrtimer isn't the biggest entry anymore, so we
could add the hlist_entry in there, and we can start a new entry for
RAPL events.

2014-01-13 15:49:41

by Stephane Eranian

[permalink] [raw]
Subject: Re: [tip:perf/core] perf/x86: Fix active_entry initialization

On Mon, Jan 13, 2014 at 4:45 PM, Peter Zijlstra <[email protected]> wrote:
>
> On Sun, Jan 12, 2014 at 10:42:47AM -0800, tip-bot for Stephane Eranian wrote:
> > - union {
> > - struct hlist_node hlist_entry;
> > - struct list_head active_entry;
> > - };
> > + struct hlist_node hlist_entry;
> > + struct list_head active_entry;
>
> Fwiw, we could probably stick both of those in hw_perf_event.
>
> the software event's hrtimer isn't the biggest entry anymore, so we
> could add the hlist_entry in there, and we can start a new entry for
> RAPL events.

Yeah, that would be possible.
The other thing I noticed too in another uncore patch I am preparing is
that we could also use that active_entry to avoid tracking association
of uncore events to counters to update them in the hrtimer routine.
All we need there is the list of active event on the uncore PMU and
we iterate over them.