From: Stephane Eranian <[email protected]>
The following patch set enforces exclusive PMU access for the
Intel SandyBridge INST_RETIRED:PREC_DIST event when used
with PEBS, as described in the SDM Vol 3b. Without this,
the sample distribution may not be correct.
The kernel now rejects PEBS + INST_RETIRED:PREC_DIST
on SNB if attr->exclusive is not set. One reason to
do it this way is to make sure users understand the
restriction. If the kernel were to force exclusive
on the fly, there would be cases where users get no
samples and no error message explaining why.
The first patch extends perf to allow users to request
exclusive PMU access by introducing a new modifier: x.
For instance:
$ perf record -e r01c0:ppx .....
or
$ perf record -e cpu/event=0xc0,umask=0x1/ppx ....
In V2, we fixed the pr_info() issue. We now use
pr_warn_once(). We also fixed the max repeat factor
for the event modifier in the event parser.
Signed-off-by: Stephane Eranian <[email protected]>
Stephane Eranian (2):
perf tools: add event modifier to request exclusive PMU access
perf: enforce SNB exclusive access for INST_RETIRED:PREC_DIST
arch/x86/kernel/cpu/perf_event.h | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 30 +++++++++++++++++++++++++-----
tools/perf/util/parse-events.c | 7 +++++++
tools/perf/util/parse-events.l | 2 +-
4 files changed, 34 insertions(+), 7 deletions(-)
--
1.7.9.5
This patch adds the x modifier for events. It allows users to
request exclusive PMU access (attr->exclusive):
perf stat -e cycles:x ......
or
perf stat -e cpu/cycles/x ....
Exclusive mode is a feature of perf_events which was not yet
supported by the perf tool. Some events may require exclusive
PMU access (like on Intel SandyBridge).
In V2, we fixed the max repeat factor for the event
modifier. It must be 9, not 8.
Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/parse-events.c | 7 +++++++
tools/perf/util/parse-events.l | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
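Note (not part of the patch): for anyone wondering what the new 'x'
modifier maps to, it simply sets the exclusive bit in the
perf_event_attr passed to perf_event_open(). A minimal standalone
sketch, using INST_RETIRED:PREC_DIST (raw 0x01c0) purely as an
example event:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = 0x01c0;	/* event=0xc0, umask=0x1 (example only) */
	attr.precise_ip = 2;	/* the 'pp' part of :ppx */
	attr.exclusive = 1;	/* the 'x' part of :ppx */
	attr.disabled = 1;

	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		perror("perf_event_open");
	else
		close(fd);
	return 0;
}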
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index aed38e4..aa73392 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -642,6 +642,7 @@ struct event_modifier {
int eG;
int precise;
int exclude_GH;
+ int exclusive;
};
static int get_event_modifier(struct event_modifier *mod, char *str,
@@ -656,6 +657,7 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
int exclude = eu | ek | eh;
int exclude_GH = evsel ? evsel->exclude_GH : 0;
+ int exclusive = evsel ? evsel->attr.exclusive : 0;
/*
* We are here for group and 'GH' was not set as event
@@ -690,6 +692,8 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
eH = 0;
} else if (*str == 'p') {
precise++;
+ } else if (*str == 'x') {
+ exclusive = 1;
} else
break;
@@ -716,6 +720,8 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
mod->eG = eG;
mod->precise = precise;
mod->exclude_GH = exclude_GH;
+ mod->exclusive = exclusive;
+
return 0;
}
@@ -741,6 +747,7 @@ int parse_events__modifier_event(struct list_head *list, char *str, bool add)
evsel->attr.precise_ip = mod.precise;
evsel->attr.exclude_host = mod.eH;
evsel->attr.exclude_guest = mod.eG;
+ evsel->attr.exclusive = mod.exclusive;
evsel->exclude_GH = mod.exclude_GH;
}
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..9c8a06d 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -81,7 +81,7 @@ num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
name [a-zA-Z_*?][a-zA-Z0-9_*?]*
-modifier_event [ukhpGH]{1,8}
+modifier_event [ukhpGHx]{1,9}
modifier_bp [rwx]{1,3}
%%
--
1.7.9.5
On all Intel SandyBridge processors, the INST_RETIRED:PREC_DIST
event, when used with PEBS, must be measured alone. That means
no other event can be active on the PMU at the same time, as per
the Intel SDM Vol 3b. This is what the exclusive mode of
perf_events provides. However, it was not enforced for that event.
This patch forces the INST_RETIRED:PREC_DIST event to have
the attr->exclusive bit set in order to be used with PEBS
on SNB. On Intel Ivybridge, that restriction is gone.
In V2, we addressed the pr_info() issue. We now use
the pr_warn_once() call.
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 30 +++++++++++++++++++++++++-----
2 files changed, 26 insertions(+), 6 deletions(-)
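A quick sanity check one could run (an assumption about the expected
behaviour, not part of the patch itself): with this applied on SNB, a
PEBS INST_RETIRED:PREC_DIST open should fail with EINVAL unless
attr.exclusive is set. Something along these lines:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int try_open(int exclusive)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = 0x01c0;		/* INST_RETIRED:PREC_DIST */
	attr.precise_ip = 2;		/* request PEBS */
	attr.exclusive = exclusive;
	attr.disabled = 1;

	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
	int fd;

	fd = try_open(0);
	printf("without exclusive: %s\n", fd < 0 ? strerror(errno) : "opened");
	if (fd >= 0)
		close(fd);

	fd = try_open(1);
	printf("with exclusive:    %s\n", fd < 0 ? strerror(errno) : "opened");
	if (fd >= 0)
		close(fd);
	return 0;
}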
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 271d257..722ab12 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -382,7 +382,7 @@ struct x86_pmu {
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
- void (*pebs_aliases)(struct perf_event *event);
+ int (*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 324bb52..93d61d5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1416,7 +1416,7 @@ static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
intel_put_shared_regs_event_constraints(cpuc, event);
}
-static void intel_pebs_aliases_core2(struct perf_event *event)
+static int intel_pebs_aliases_core2(struct perf_event *event)
{
if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
@@ -1442,9 +1442,10 @@ static void intel_pebs_aliases_core2(struct perf_event *event)
alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
}
+ return 0;
}
-static void intel_pebs_aliases_snb(struct perf_event *event)
+static int intel_pebs_aliases_ivb(struct perf_event *event)
{
if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
@@ -1470,6 +1471,22 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
}
+ return 0;
+}
+
+static int intel_pebs_aliases_snb(struct perf_event *event)
+{
+ u64 cfg = event->hw.config;
+ /*
+ * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
+ * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
+ */
+ if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
+ pr_warn_once("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
+ return -EINVAL;
+ }
+
+ return intel_pebs_aliases_ivb(event);
}
static int intel_pmu_hw_config(struct perf_event *event)
@@ -1479,8 +1496,11 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;
- if (event->attr.precise_ip && x86_pmu.pebs_aliases)
- x86_pmu.pebs_aliases(event);
+ if (event->attr.precise_ip && x86_pmu.pebs_aliases) {
+ ret = x86_pmu.pebs_aliases(event);
+ if (ret)
+ return ret;
+ }
if (intel_pmu_needs_lbr_smpl(event)) {
ret = intel_pmu_setup_lbr_filter(event);
@@ -2084,7 +2104,7 @@ __init int intel_pmu_init(void)
x86_pmu.event_constraints = intel_snb_event_constraints;
x86_pmu.pebs_constraints = intel_ivb_pebs_event_constraints;
- x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
+ x86_pmu.pebs_aliases = intel_pebs_aliases_ivb;
x86_pmu.extra_regs = intel_snb_extra_regs;
/* all extra regs are per-cpu when HT is on */
x86_pmu.er_flags |= ERF_HAS_RSP_1;
--
1.7.9.5
On Mon, Oct 22, 2012 at 03:20:51PM +0200, Stephane Eranian wrote:
> This patch adds the x modifier for events. It allows users to
> request exclusive PMU access (attr->exclusive):
>
> perf stat -e cycles:x ......
> or
> perf stat -e cpu/cycles/x ....
>
> Exclusive mode is a feature of perf_events which was not yet
> supported by the perf tool. Some events may require exclusive
> PMU access (like on Intel SandyBridge).
You probably should add it also to be readable from sysfs
(like I did with precise in my haswell patchkit)
Then the SNB version of instructions-p can just force this
without the user having to do anything special.
-Andi
On Mon, Oct 22, 2012 at 3:53 PM, Andi Kleen <[email protected]> wrote:
> On Mon, Oct 22, 2012 at 03:20:51PM +0200, Stephane Eranian wrote:
>> This patch adds the x modifier for events. It allows users to
>> request exclusive PMU access (attr->exclusive):
>>
>> perf stat -e cycles:x ......
>> or
>> perf stat -e cpu/cycles/x ....
>>
>> Exclusive mode is a feature of perf_events which was not yet
>> supported by the perf tool. Some events may require exclusive
>> PMU access (like on Intel SandyBridge).
>
> You probably should add it also to be readable from sysfs
>
Not sure I understand what you mean here.
x is not a HW property; it is a perf_event feature. It should NOT be
exposed in /sys/devices/cpu/format/.
But maybe you're talking about generic event exports in
/sys/devices/cpu/events.
But why would I expose prec_dist there? And under what name?
> (like I did with precise in my haswell patchkit)
>
> Then the SNB version of instructions-p can just force this
> without the user having to do anything special.
>
> -Andi
> But maybe you're talking about generic event exports in
> /sys/devices/cpu/events.
> But why would I expose prec_dist there? And under what name?
I already exposed it in the Haswell patchkit (instructions-p).
I also exposed precise there so that some events can be forced to PEBS.
If you want to expose it on Sandy Bridge too, you need to add the
event to the user tool grammar, like I did for precise.
Then the user can just use
perf record -e instructions-p ...
and the kernel automagically does the right thing.
-Andi
--
[email protected] -- Speaking for myself only
On Mon, Oct 22, 2012 at 5:39 PM, Andi Kleen <[email protected]> wrote:
>> But maybe you're talking about generic event exports in
>> /sys/devices/cpu/events.
>> But why would I expose prec_dist there? And under what name?
>
> I already exposed it in the Haswell patchkit (instructions-p).
>
> I also exposed precise there so that some events can be forced to PEBS.
>
> If you want to expose it on Sandy Bridge too, you need to add the
> event to the user tool grammar, like I did for precise.
>
> Then the user can just use
>
> perf record -e instructions-p ...
>
> and the kernel automagically does the right thing.
>
Ok, I understand now.
But I am wondering what would be the meaning of:
$ perf stat -e instructions-p ...
I know the answer, because I know what's going on under the
hood. But what about the average user?
On Mon, 2012-10-22 at 17:44 +0200, Stephane Eranian wrote:
>
> I know the answer, because I know what's going on under the
> hood. But what about the average user?
I'm still wondering if the avg user really thinks 'instructions' is a
useful metric for anything other than obtaining IPC measurements.
The avg user wants time domain measurements, like what's the most
expensive function, or the most expensive cache-miss, etc.
On Mon, Oct 22, 2012 at 5:59 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2012-10-22 at 17:44 +0200, Stephane Eranian wrote:
>>
>> I know the answer, because I know what's going on under the
>> hood. But what about the average user?
>
> I'm still wondering if the avg user really thinks 'instructions' is a
> useful metric for anything other than obtaining IPC measurements.
>
Yeah, for many users CPI (or IPC) is a useful metric.
> The avg user wants time domain measurements, like what's the most
> expensive function, or the most expensive cache-miss, etc.
>
Yes, you have those too.
On Mon, 2012-10-22 at 18:08 +0200, Stephane Eranian wrote:
> > I'm still wondering if the avg user really thinks 'instructions' is a
> > useful metric for anything other than obtaining IPC measurements.
> >
> Yeah, for many users CPI (or IPC) is a useful metric.
Right, but you don't get that using instruction sampling, right? Or you
need to pair it with a cycle count one way or another. And I don't
really see PDIST helping with that.