2012-10-19 14:55:25

by Stephane Eranian

Subject: [PATCH 0/2] perf: enforce exclusive PMU access for SNB INST_RETIRED:PREC_DIST

From: Stephane Eranian <[email protected]>

The following patch set enforces exclusive PMU access for
the Intel SandyBridge INST_RETIRED:PREC_DIST event when used
with PEBS, as described in the SDM Vol 3b. Without this,
the sample distribution may not be correct.

The kernel now rejects PEBS + INST_RETIRED:PREC_DIST
on SNB if attr->exclusive is not set. One reason to
do it this way is to make sure users understand the
restriction. If the kernel were to force exclusive
mode on the fly, users could end up with no samples
and no error message explaining why.

The first patch extends perf to allow users to request
exclusive PMU access by introducing a new modifier: x.
For instance:
$ perf record -e r01c0:ppx .....
or
$ perf record -e cpu/event=0xc0,umask=0x1/ppx ....
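As a rough illustration, assuming the usual raw Intel encoding (umask in bits 15:8, event select in bits 7:0), both spellings above should request approximately the perf_event_attr below. The helpers intel_raw_config() and init_prec_dist_attr() are hypothetical; only the struct fields and PERF_TYPE_RAW come from <linux/perf_event.h>, and perf record sets further fields (sample period, sample type) that are left out here.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a raw Intel config from event select and umask. */
static uint64_t intel_raw_config(uint8_t event, uint8_t umask)
{
	return ((uint64_t)umask << 8) | event;
}

/* Hypothetical helper: approximate attr for "r01c0:ppx". */
static void init_prec_dist_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;
	attr->config = intel_raw_config(0xc0, 0x01); /* INST_RETIRED:PREC_DIST */
	attr->precise_ip = 2;  /* "pp": request 0 skid, i.e. PEBS */
	attr->exclusive = 1;   /* "x": ask for exclusive PMU access */
}
```

The attr would then be handed to perf_event_open(2) via syscall(SYS_perf_event_open, ...), since glibc has no wrapper for it. With this series applied, the same attr with exclusive left at 0 is expected to fail with EINVAL on SNB.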

Signed-off-by: Stephane Eranian <[email protected]>
---

Stephane Eranian (2):
perf tools: add event modifier to request exclusive PMU access
perf: enforce SNB exclusive access for INST_RETIRED:PREC_DIST

arch/x86/kernel/cpu/perf_event.h | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 30 +++++++++++++++++++++++++-----
tools/perf/util/parse-events.c | 7 +++++++
tools/perf/util/parse-events.l | 2 +-
4 files changed, 34 insertions(+), 7 deletions(-)

--
1.7.9.5


2012-10-19 14:55:33

by Stephane Eranian

Subject: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On all Intel SandyBridge processors, the INST_RETIRED:PREC_DIST
event, when used with PEBS, must be measured alone: as per the
Intel SDM Vol3b, no other event can be active on the PMU at the
same time. This is what the exclusive mode of perf_events provides.
However, it was not enforced for that event.

This patch forces the INST_RETIRED:PREC_DIST event to have
the attr->exclusive bit set in order to be used with PEBS
on SNB. On Intel Ivybridge, that restriction is gone.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 30 +++++++++++++++++++++++++-----
2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 271d257..722ab12 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -382,7 +382,7 @@ struct x86_pmu {
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
- void (*pebs_aliases)(struct perf_event *event);
+ int (*pebs_aliases)(struct perf_event *event);
int max_pebs_events;

/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 324bb52..d7546dc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1416,7 +1416,7 @@ static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
intel_put_shared_regs_event_constraints(cpuc, event);
}

-static void intel_pebs_aliases_core2(struct perf_event *event)
+static int intel_pebs_aliases_core2(struct perf_event *event)
{
if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
@@ -1442,9 +1442,10 @@ static void intel_pebs_aliases_core2(struct perf_event *event)
alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
}
+ return 0;
}

-static void intel_pebs_aliases_snb(struct perf_event *event)
+static int intel_pebs_aliases_ivb(struct perf_event *event)
{
if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
@@ -1470,6 +1471,22 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
}
+ return 0;
+}
+
+static int intel_pebs_aliases_snb(struct perf_event *event)
+{
+ u64 cfg = event->hw.config;
+ /*
+ * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
+ * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
+ */
+ if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
+ pr_info("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
+ return -EINVAL;
+ }
+
+ return intel_pebs_aliases_ivb(event);
}

static int intel_pmu_hw_config(struct perf_event *event)
@@ -1479,8 +1496,11 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;

- if (event->attr.precise_ip && x86_pmu.pebs_aliases)
- x86_pmu.pebs_aliases(event);
+ if (event->attr.precise_ip && x86_pmu.pebs_aliases) {
+ ret = x86_pmu.pebs_aliases(event);
+ if (ret)
+ return ret;
+ }

if (intel_pmu_needs_lbr_smpl(event)) {
ret = intel_pmu_setup_lbr_filter(event);
@@ -2084,7 +2104,7 @@ __init int intel_pmu_init(void)

x86_pmu.event_constraints = intel_snb_event_constraints;
x86_pmu.pebs_constraints = intel_ivb_pebs_event_constraints;
- x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
+ x86_pmu.pebs_aliases = intel_pebs_aliases_ivb;
x86_pmu.extra_regs = intel_snb_extra_regs;
/* all extra regs are per-cpu when HT is on */
x86_pmu.er_flags |= ERF_HAS_RSP_1;
--
1.7.9.5
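A user-space model of the control flow this patch sets up, assuming simplified stand-in types (struct fake_event and the lower-case function names are hypothetical, not kernel code): the SNB callback rejects PREC_DIST without exclusive mode and otherwise defers to the IVB callback, while hw_config() only invokes the callback for precise events and now propagates its return value instead of ignoring it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the kernel macro: umask (bits 15:8) | event select (bits 7:0). */
#define INTEL_ARCH_EVENT_MASK 0xffffULL

struct fake_event {
	uint64_t config;      /* models event->hw.config */
	unsigned precise_ip;  /* models event->attr.precise_ip */
	bool exclusive;       /* models event->attr.exclusive */
};

/* IVB flavour: no exclusivity rule. The real function also rewrites a
 * 0x003c (cycles) config to a PEBS-capable alias; omitted here. */
static int pebs_aliases_ivb(struct fake_event *ev)
{
	(void)ev;
	return 0;
}

/* SNB flavour: reject PREC_DIST unless exclusive, then reuse the IVB logic. */
static int pebs_aliases_snb(struct fake_event *ev)
{
	if ((ev->config & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !ev->exclusive)
		return -EINVAL;
	return pebs_aliases_ivb(ev);
}

/* Models intel_pmu_hw_config(): the callback runs only for precise (PEBS)
 * events, and a non-zero result is returned to the caller. */
static int hw_config(struct fake_event *ev,
		     int (*pebs_aliases)(struct fake_event *))
{
	if (ev->precise_ip && pebs_aliases) {
		int ret = pebs_aliases(ev);
		if (ret)
			return ret;
	}
	return 0;
}
```

Because of the precise_ip gate in hw_config(), the SNB callback itself does not need to test precise_ip; a non-PEBS PREC_DIST event never reaches it.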

2012-10-19 14:55:29

by Stephane Eranian

Subject: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

This patch adds the x modifier for events. It allows users to
request exclusive PMU access (attr->exclusive):

perf stat -e cycles:x ......
or
perf stat -e cpu/cycles/x ....

Exclusive mode is a feature of perf_events that was not yet
supported by the perf tool. Some events require exclusive
PMU access (e.g., INST_RETIRED:PREC_DIST with PEBS on Intel
SandyBridge).
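A simplified model of how the modifier loop treats the new letter, using hypothetical names rather than the actual get_event_modifier() code: each character of the suffix after ':' (or after the closing '/') is examined in turn, 'p' bumps the precision level and 'x' sets the exclusive flag.

```c
#include <stdbool.h>

/* Simplified modifier state; only 'p' and 'x' are tracked. */
struct mods {
	int precise;     /* number of 'p' characters -> attr.precise_ip */
	bool exclusive;  /* set by the new 'x' modifier -> attr.exclusive */
};

/* Walk the modifier string; returns 0 on success, -1 on an unknown character. */
static int parse_mods(const char *str, struct mods *m)
{
	m->precise = 0;
	m->exclusive = false;

	for (; *str; str++) {
		switch (*str) {
		case 'p':
			m->precise++;
			break;
		case 'x':
			m->exclusive = true;
			break;
		case 'u': case 'k': case 'h': case 'G': case 'H':
			break; /* accepted; their exclude_* effects are not modelled */
		default:
			return -1;
		}
	}
	return 0;
}
```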

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/parse-events.c | 7 +++++++
tools/perf/util/parse-events.l | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index aed38e4..aa73392 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -642,6 +642,7 @@ struct event_modifier {
int eG;
int precise;
int exclude_GH;
+ int exclusive;
};

static int get_event_modifier(struct event_modifier *mod, char *str,
@@ -656,6 +657,7 @@ static int get_event_modifier(struct event_modifier *mod, char *str,

int exclude = eu | ek | eh;
int exclude_GH = evsel ? evsel->exclude_GH : 0;
+ int exclusive = evsel ? evsel->attr.exclusive : 0;

/*
* We are here for group and 'GH' was not set as event
@@ -690,6 +692,8 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
eH = 0;
} else if (*str == 'p') {
precise++;
+ } else if (*str == 'x') {
+ exclusive = 1;
} else
break;

@@ -716,6 +720,8 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
mod->eG = eG;
mod->precise = precise;
mod->exclude_GH = exclude_GH;
+ mod->exclusive = exclusive;
+
return 0;
}

@@ -741,6 +747,7 @@ int parse_events__modifier_event(struct list_head *list, char *str, bool add)
evsel->attr.precise_ip = mod.precise;
evsel->attr.exclude_host = mod.eH;
evsel->attr.exclude_guest = mod.eG;
+ evsel->attr.exclusive = mod.exclusive;
evsel->exclude_GH = mod.exclude_GH;
}

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..9c8a06d 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -81,7 +81,7 @@ num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
num_raw_hex [a-fA-F0-9]+
name [a-zA-Z_*?][a-zA-Z0-9_*?]*
-modifier_event [ukhpGH]{1,8}
+modifier_event [ukhpGHx]{1,8}
modifier_bp [rwx]{1,3}

%%
--
1.7.9.5

2012-10-19 15:13:55

by Peter Zijlstra

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
> -modifier_event [ukhpGH]{1,8}
> +modifier_event [ukhpGHx]{1,8}

wouldn't the max modifier string length grow by adding another possible
modifier?

2012-10-19 15:18:00

by Stephane Eranian

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, Oct 19, 2012 at 5:13 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
>> -modifier_event [ukhpGH]{1,8}
>> +modifier_event [ukhpGHx]{1,8}
>
> wouldn't the max modifier string length grow by adding another possible
> modifier?

That's what I thought too, but then I don't understand why it was at
eight before and not seven: one instance of each letter, + a second
for pp (precise=2). Or am I missing something here?

2012-10-19 15:23:56

by Jiri Olsa

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, Oct 19, 2012 at 05:17:57PM +0200, Stephane Eranian wrote:
> On Fri, Oct 19, 2012 at 5:13 PM, Peter Zijlstra <[email protected]> wrote:
> > On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
> >> -modifier_event [ukhpGH]{1,8}
> >> +modifier_event [ukhpGHx]{1,8}
> >
> > wouldn't the max modifier string length grow by adding another possible
> > modifier?
>
> That's what I thought too, but then I don't understand why it was at
> eight before and not seven: one instance of each letter, + a second
> for pp (precise=2). Or am I missing something here?

hm, I think I assumed for some reason that 'ppp' is valid as well

jirka

2012-10-19 15:46:13

by Andi Kleen

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

> That's what I thought too, but then I don't understand why it was at
> eight before and not seven: one instance of each letter, + a second
> for pp (precise=2). Or am I missing something here?

The number is pretty useless imho (it's unlikely to catch any real user error)
and most likely generates a much larger automaton in flex than just a +
So I would drop it.

-Andi

--
[email protected] -- Speaking for myself only

2012-10-19 15:47:14

by Stephane Eranian

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, Oct 19, 2012 at 5:23 PM, Jiri Olsa <[email protected]> wrote:
> On Fri, Oct 19, 2012 at 05:17:57PM +0200, Stephane Eranian wrote:
>> On Fri, Oct 19, 2012 at 5:13 PM, Peter Zijlstra <[email protected]> wrote:
>> > On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
>> >> -modifier_event [ukhpGH]{1,8}
>> >> +modifier_event [ukhpGHx]{1,8}
>> >
>> > wouldn't the max modifier string length grow by adding another possible
>> > modifier?
>>
>> That's what I thought too, but then I don't understand why it was at
>> eight before and not seven: one instance of each letter, + a second
>> for pp (precise=2). Or am I missing something here?
>
> hm, I think I assumed for some reason that 'ppp' is valid as well
>
If that's the case, then yes, I need to bump that number to 9.

> jirka

2012-10-19 15:50:01

by Andi Kleen

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

> + /*
> + * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
> + * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
> + */
> + if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
> + pr_info("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
> + return -EINVAL;
> + }


Strictly you have to check for precise too, right?

-Andi

2012-10-19 15:53:14

by Jiri Olsa

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, Oct 19, 2012 at 05:47:11PM +0200, Stephane Eranian wrote:
> On Fri, Oct 19, 2012 at 5:23 PM, Jiri Olsa <[email protected]> wrote:
> > On Fri, Oct 19, 2012 at 05:17:57PM +0200, Stephane Eranian wrote:
> >> On Fri, Oct 19, 2012 at 5:13 PM, Peter Zijlstra <[email protected]> wrote:
> >> > On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
> >> >> -modifier_event [ukhpGH]{1,8}
> >> >> +modifier_event [ukhpGHx]{1,8}
> >> >
> >> > wouldn't the max modifier string length grow by adding another possible
> >> > modifier?
> >>
> >> That's what I thought too, but then I don't understand why it was at
> >> eight before and not seven: one instance of each letter, + a second
> >> for pp (precise=2). Or am I missing something here?
> >
> > hm, I think I assumed for some reason that 'ppp' is valid as well
> >
> If that's the case, then yes, I need to bump that number to 9.

found my source ;)

include/linux/perf_event.h:
...

/*
* precise_ip:
*
* 0 - SAMPLE_IP can have arbitrary skid
* 1 - SAMPLE_IP must have constant skid
* 2 - SAMPLE_IP requested to have 0 skid
* 3 - SAMPLE_IP must have 0 skid
*
* See also PERF_RECORD_MISC_EXACT_IP
*/
precise_ip : 2, /* skid constraint */
...


jirka
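The arithmetic behind the 8 vs. 9 question, assuming (as the length bound in the lexer rule implies) one occurrence of each non-'p' letter: since precise_ip is a 2-bit field it can hold 0 to 3, so 'ppp' is a legal spelling and 'p' can contribute three characters. max_modifier_len() and the PRECISE_IP_* macros are illustrative names only.

```c
/* precise_ip is declared as a 2-bit bitfield, so its largest value is 3. */
#define PRECISE_IP_BITS 2
#define PRECISE_IP_MAX  ((1 << PRECISE_IP_BITS) - 1)

/* Longest modifier string: each non-'p' letter once, plus up to
 * PRECISE_IP_MAX 'p' characters. */
static int max_modifier_len(const char *letters_without_p)
{
	int n = 0;

	while (letters_without_p[n] != '\0')
		n++;
	return n + PRECISE_IP_MAX;
}
```

With the old letter set (ukhGH plus p) this gives 5 + 3 = 8, matching the existing {1,8} bound; adding 'x' gives 6 + 3 = 9.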

2012-10-19 15:58:13

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On Fri, Oct 19, 2012 at 5:49 PM, Andi Kleen <[email protected]> wrote:
>> + /*
>> + * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
>> + * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
>> + */
>> + if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
>> + pr_info("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
>> + return -EINVAL;
>> + }
>
>
> Strictly you have to check for precise too, right?
>
Yes, the restriction only matters when PEBS is active, and that is
already covered: pebs_aliases() is only called when precise_ip is set.


> -Andi

2012-10-19 15:58:55

by Stephane Eranian

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, Oct 19, 2012 at 5:53 PM, Jiri Olsa <[email protected]> wrote:
> On Fri, Oct 19, 2012 at 05:47:11PM +0200, Stephane Eranian wrote:
>> On Fri, Oct 19, 2012 at 5:23 PM, Jiri Olsa <[email protected]> wrote:
>> > On Fri, Oct 19, 2012 at 05:17:57PM +0200, Stephane Eranian wrote:
>> >> On Fri, Oct 19, 2012 at 5:13 PM, Peter Zijlstra <[email protected]> wrote:
>> >> > On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
>> >> >> -modifier_event [ukhpGH]{1,8}
>> >> >> +modifier_event [ukhpGHx]{1,8}
>> >> >
>> >> > wouldn't the max modifier string length grow by adding another possible
>> >> > modifier?
>> >>
>> >> That's what I thought too, but then I don't understand why it was at
>> >> eight before and not seven: one instance of each letter, + a second
>> >> for pp (precise=2). Or am I missing something here?
>> >
>> > hm, I think I assumed for some reason that 'ppp' is valid as well
>> >
>> If that's the case, then yes, I need to bump that number to 9.
>
> found my source ;)
>
> include/linux/perf_event.h:
> ...
>
> /*
> * precise_ip:
> *
> * 0 - SAMPLE_IP can have arbitrary skid
> * 1 - SAMPLE_IP must have constant skid
> * 2 - SAMPLE_IP requested to have 0 skid
> * 3 - SAMPLE_IP must have 0 skid
> *
> * See also PERF_RECORD_MISC_EXACT_IP
> */
> precise_ip : 2, /* skid constraint */
> ...
>
Ok, nobody supports 3 today, but that's fine. I can change that value to 9 then.

2012-10-19 16:07:41

by Jiri Olsa

Subject: Re: [PATCH 1/2] perf tools: add event modifier to request exclusive PMU access

On Fri, Oct 19, 2012 at 08:46:10AM -0700, Andi Kleen wrote:
> > That's what I thought too, but then I don't understand why it was at
> > eight before and not seven: one instance of each letter, + a second
> > for pp (precise=2). Or am I missing something here?
>
> The number is pretty useless imho (it's unlikely to catch any real user error)
> and most likely generates a much larger automaton in flex than just a +
> So I would drop it.

yep, we could probably do a better job by checking the modifiers
after parsing (with just + in flex)... added to my todo ;)

jirka
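One possible shape for the post-parse check discussed here, assuming the lexer rule is relaxed to [ukhpGHx]+: the hypothetical validate_modifiers() allows each letter once, except 'p', which may repeat up to three times to match the range of precise_ip. This is a sketch of the idea, not code from the perf tool.

```c
#include <string.h>

/* Hypothetical post-parse validator for an event modifier string.
 * Returns 0 if acceptable, -1 on an unknown or over-repeated letter. */
static int validate_modifiers(const char *str)
{
	static const char allowed[] = "ukhpGHx";
	int count[sizeof(allowed) - 1] = { 0 };

	for (; *str; str++) {
		const char *pos = strchr(allowed, *str);
		int limit;

		if (!pos)
			return -1;                  /* not a known modifier */
		limit = (*str == 'p') ? 3 : 1;  /* 'ppp' maps to precise_ip = 3 */
		if (++count[pos - allowed] > limit)
			return -1;                  /* repeated too many times */
	}
	return 0;
}
```

Compared with a {1,N} bound in the lexer, a check like this can pinpoint which letter was rejected, which makes a more useful error message possible.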

2012-10-19 16:28:42

by Peter Zijlstra

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
> +static int intel_pebs_aliases_snb(struct perf_event *event)
> +{
> + u64 cfg = event->hw.config;
> + /*
> + * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
> + * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
> + */
> + if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
> + pr_info("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
> + return -EINVAL;

This isn't limited to admin, right? So the above turns into a DoS on the
console.

> + }
> +
> + return intel_pebs_aliases_ivb(event);
> }

2012-10-19 16:31:16

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On Fri, Oct 19, 2012 at 6:27 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
>> +static int intel_pebs_aliases_snb(struct perf_event *event)
>> +{
>> + u64 cfg = event->hw.config;
>> + /*
>> + * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
>> + * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
>> + */
>> + if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
>> + pr_info("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
>> + return -EINVAL;
>
> This isn't limited to admin, right? So the above turns into a DoS on the
> console.
>
Ok, so how about a WARN_ON_ONCE() instead?

2012-10-19 16:46:37

by Peter Zijlstra

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On Fri, 2012-10-19 at 18:31 +0200, Stephane Eranian wrote:
> On Fri, Oct 19, 2012 at 6:27 PM, Peter Zijlstra <[email protected]> wrote:
> > On Fri, 2012-10-19 at 16:52 +0200, Stephane Eranian wrote:
> >> +static int intel_pebs_aliases_snb(struct perf_event *event)
> >> +{
> >> + u64 cfg = event->hw.config;
> >> + /*
> >> + * for INST_RETIRED.PREC_DIST to work correctly with PEBS, it must
> >> + * be measured alone on SNB (exclusive PMU access) as per Intel SDM.
> >> + */
> >> + if ((cfg & INTEL_ARCH_EVENT_MASK) == 0x01c0 && !event->attr.exclusive) {
> >> + pr_info("perf: INST_RETIRED.PREC_DIST only works in exclusive mode\n");
> >> + return -EINVAL;
> >
> > This isn't limited to admin, right? So the above turns into a DoS on the
> > console.
> >
> Ok, so how about a WARN_ON_ONCE() instead?

That should be fine I guess ;-)
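A user-space sketch of the report-once idea, for illustration only: the hypothetical report_once() prints its message the first time and stays silent afterwards, so an unprivileged user retrying the bad event cannot flood the console. The kernel's WARN_ON_ONCE() follows the same once-per-call-site principle (and additionally dumps a backtrace); in either case the -EINVAL return would still be delivered on every attempt.

```c
#include <stdbool.h>
#include <stdio.h>

/* Print msg to stderr on the first call only; return true if it printed. */
static bool report_once(const char *msg)
{
	static bool done;  /* the kernel macro keeps one flag per call site */

	if (done)
		return false;
	done = true;
	fprintf(stderr, "%s\n", msg);
	return true;
}
```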

2012-10-19 17:20:20

by Andi Kleen

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

> > > This isn't limited to admin, right? So the above turns into a DoS on the
> > > console.
> > >
> > Ok, so how about a WARN_ON_ONCE() instead?
>
> That should be fine I guess ;-)

imho there is need for a generic mechanism to return an error string
to the user program without such hacks.

-Andi

--
[email protected] -- Speaking for myself only

2012-10-21 16:55:29

by Ingo Molnar

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST


* Andi Kleen <[email protected]> wrote:

> > > > This isn't limited to admin, right? So the above turns into a DoS on the
> > > > console.
> > > >
> > > Ok, so how about a WARN_ON_ONCE() instead?
> >
> > That should be fine I guess ;-)
>
> imho there is need for a generic mechanism to return an error
> string to the user program without such hacks.

Agreed - we could return an 'extended errno' long error return
value, which in reality is a pointer to an error string (in
perf_attr::error_str), and copy that string to user-space at
perf syscall return time.

Thus error-string aware tooling could print the error string.

So PMU drivers could do something obvious like:

return (long)"perf: INST_RETIRED.PREC_DIST only works in exclusive mode";

The perf syscall notices these pointers by noticing that the
error code returned is outside the errno range.

Old userspace will get a -EINVAL and no string copied into the
error string buffer.

New userspace would get the error string copied into
perf_attr::error_str, plus a 'normal' -EINVAL error code.

The only cost on the kernel side is to make sure all "string
errors" are returned as long.

Thanks,

Ingo
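A sketch of how the syscall side might tell the two kinds of return value apart under this proposal, assuming the convention the kernel already uses for IS_ERR_VALUE() (MAX_ERRNO = 4095): 0 is success, the top 4095 values of the unsigned range are errno codes, and anything else is taken to be a pointer to a static string. The comparison is unsigned because on x86-64 kernel addresses have the top bit set and are negative as a signed long, so a plain "ret > 0" test would not work. classify_ret() and enum ret_kind are hypothetical names.

```c
#include <errno.h>

#define MAX_ERRNO 4095  /* same bound as the kernel's include/linux/err.h */

enum ret_kind { RET_OK, RET_ERRNO, RET_STRING };

/* Classify a driver's long return value under the proposed scheme. */
static enum ret_kind classify_ret(long ret)
{
	if (ret == 0)
		return RET_OK;
	if ((unsigned long)ret >= (unsigned long)-MAX_ERRNO)
		return RET_ERRNO;   /* -1 .. -4095: ordinary errno */
	return RET_STRING;      /* assumed to point at an error string */
}
```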

2012-10-21 17:54:46

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On Sun, Oct 21, 2012 at 6:55 PM, Ingo Molnar <[email protected]> wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
>> > > > This isn't limited to admin, right? So the above turns into a DoS on the
>> > > > console.
>> > > >
>> > > Ok, so how about a WARN_ON_ONCE() instead?
>> >
>> > That should be fine I guess ;-)
>>
>> imho there is need for a generic mechanism to return an error
>> string to the user program without such hacks.
>
> Agreed - we could return an 'extended errno' long error return
> value, which in reality is a pointer to an error string (in
> perf_attr::error_str), and copy that string to user-space at
> perf syscall return time.
>
I assume by perf_attr::error_str, you actually mean:

struct perf_event_attr {
char error_str[PERF_ERR_LEN];
};

Right?

> Thus error-string aware tooling could print the error string.
>
> So PMU drivers could do something obvious like:
>
> return (long)"perf: INST_RETIRED.PREC_DIST only works in exclusive mode";
>
> The perf syscall notices these pointers by noticing that the
> error code returned is outside the errno range.
>
Is that always the case on all archs?

> Old userspace will get a -EINVAL and no string copied into the
> error string buffer.
>
> New userspace would get the error string copied into
> perf_attr::error_str, plus a 'normal' -EINVAL error code.
>
> The only cost on the kernel side is to make sure all "string
> errors" are returned as long.
>
> Thanks,
>
> Ingo

2012-10-21 18:03:47

by Ingo Molnar

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST


* Stephane Eranian <[email protected]> wrote:

> On Sun, Oct 21, 2012 at 6:55 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Andi Kleen <[email protected]> wrote:
> >
> >> > > > This isn't limited to admin, right? So the above turns into a DoS on the
> >> > > > console.
> >> > > >
> >> > > Ok, so how about a WARN_ON_ONCE() instead?
> >> >
> >> > That should be fine I guess ;-)
> >>
> >> imho there is need for a generic mechanism to return an error
> >> string to the user program without such hacks.
> >
> > Agreed - we could return an 'extended errno' long error return
> > value, which in reality is a pointer to an error string (in
> > perf_attr::error_str), and copy that string to user-space at
> > perf syscall return time.
> >
> I assume by perf_attr::error_str, you actually mean:
>
> struct perf_event_attr {
> char error_str[PERF_ERR_LEN];
> };
>
> Right?

I don't think we should allocate space in the attr, instead we
should use something like:

u8 __user *err_str;
u32 err_str_len;

which would be filled in by tooling with a string and a max_len
value, and strncpy_to_user() could do the rest on the kernel
side. [ A minor complication is that we don't have a
strncpy_to_user() method at the moment. ]

Static strings could be handled this way.

[ Dynamic strings could be supported too with a few tricks,
although I doubt it matters in practice. ]

> > Thus error-string aware tooling could print the error string.
> >
> > So PMU drivers could do something obvious like:
> >
> > return (long)"perf: INST_RETIRED.PREC_DIST only works in exclusive mode";
> >
> > The perf syscall notices these pointers by noticing that the
> > error code returned is outside the errno range.
>
> Is that always the case on all archs?

I think yes - and if not then it can be solved via some trivial
offset value added to it on such an architecture, without
complicating the code on normal architectures.

Thanks,

Ingo
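A user-space approximation of the pointer-plus-length variant, with hypothetical names: tooling passes a buffer and its size, the kernel side copies a bounded, NUL-terminated message into it (standing in for the strncpy_to_user() helper that does not exist yet), and the syscall still returns -EINVAL. Old binaries that never set these fields would presumably leave them zero, so nothing is copied and they see the same error code as before.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attr extension: buffer and size supplied by tooling. */
struct err_attr {
	char *err_str;         /* models u8 __user *err_str */
	uint32_t err_str_len;  /* models u32 err_str_len (buffer size in bytes) */
};

/* Copy msg into the caller's buffer, truncating and always NUL-terminating.
 * Returns -EINVAL so old and new callers see the same error code. */
static int report_error(const struct err_attr *attr, const char *msg)
{
	if (attr->err_str && attr->err_str_len) {
		size_t n = strlen(msg);

		if (n >= attr->err_str_len)
			n = attr->err_str_len - 1;
		memcpy(attr->err_str, msg, n);
		attr->err_str[n] = '\0';
	}
	return -EINVAL;
}
```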

2012-10-22 11:31:55

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST

On Sun, Oct 21, 2012 at 8:03 PM, Ingo Molnar <[email protected]> wrote:
>
> * Stephane Eranian <[email protected]> wrote:
>
>> On Sun, Oct 21, 2012 at 6:55 PM, Ingo Molnar <[email protected]> wrote:
>> >
>> > * Andi Kleen <[email protected]> wrote:
>> >
>> >> > > > This isn't limited to admin, right? So the above turns into a DoS on the
>> >> > > > console.
>> >> > > >
>> >> > > Ok, so how about a WARN_ON_ONCE() instead?
>> >> >
>> >> > That should be fine I guess ;-)
>> >>
>> >> imho there is need for a generic mechanism to return an error
>> >> string to the user program without such hacks.
>> >
>> > Agreed - we could return an 'extended errno' long error return
>> > value, which in reality is a pointer to an error string (in
>> > perf_attr::error_str), and copy that string to user-space at
>> > perf syscall return time.
>> >
>> I assume by perf_attr::error_str, you actually mean:
>>
>> struct perf_event_attr {
>> char error_str[PERF_ERR_LEN];
>> };
>>
>> Right?
>
> I don't think we should allocate space in the attr, instead we
> should use something like:
>
> u8 __user *err_str;
> u32 err_str_len;
>
> which would be filled in by tooling with a string and a max_len
> value, and strncpy_to_user() could do the rest on the kernel
> side. [ A minor complication is that we don't have a
> strncpy_to_user() method at the moment. ]
>
> Static strings could be handled this way.
>
> [ Dynamic strings could be supported too with a few tricks,
> although I doubt it matters in practice. ]
>
Ok, but this still limits returning error strings to the perf_event_open()
syscall, not read(), ioctl() and such.

I am fine with this change. However, I think it should be added separately
from my inst_retired:prec_dist patch. It has a broader impact.


>> > Thus error-string aware tooling could print the error string.
>> >
>> > So PMU drivers could do something obvious like:
>> >
>> > return (long)"perf: INST_RETIRED.PREC_DIST only works in exclusive mode";
>> >
>> > The perf syscall notices these pointers by noticing that the
>> > error code returned is outside the errno range.
>>
>> Is that always the case on all archs?
>
> I think yes - and if not then it can be solved via some trivial
> offset value added to it on such an architecture, without
> complicating the code on normal architectures.
>
> Thanks,
>
> Ingo

2012-10-24 08:15:31

by Ingo Molnar

Subject: Re: [PATCH 2/2] perf: SNB exclusive PMU access for INST_RETIRED:PREC_DIST


* Stephane Eranian <[email protected]> wrote:

> On Sun, Oct 21, 2012 at 8:03 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Stephane Eranian <[email protected]> wrote:
> >
> >> On Sun, Oct 21, 2012 at 6:55 PM, Ingo Molnar <[email protected]> wrote:
> >> >
> >> > * Andi Kleen <[email protected]> wrote:
> >> >
> >> >> > > > This isn't limited to admin, right? So the above turns into a DoS on the
> >> >> > > > console.
> >> >> > > >
> >> >> > > Ok, so how about a WARN_ON_ONCE() instead?
> >> >> >
> >> >> > That should be fine I guess ;-)
> >> >>
> >> >> imho there is need for a generic mechanism to return an error
> >> >> string to the user program without such hacks.
> >> >
> >> > Agreed - we could return an 'extended errno' long error return
> >> > value, which in reality is a pointer to an error string (in
> >> > perf_attr::error_str), and copy that string to user-space at
> >> > perf syscall return time.
> >> >
> >> I assume by perf_attr::error_str, you actually mean:
> >>
> >> struct perf_event_attr {
> >> char error_str[PERF_ERR_LEN];
> >> };
> >>
> >> Right?
> >
> > I don't think we should allocate space in the attr, instead we
> > should use something like:
> >
> > u8 __user *err_str;
> > u32 err_str_len;
> >
> > which would be filled in by tooling with a string and a max_len
> > value, and strncpy_to_user() could do the rest on the kernel
> > side. [ A minor complication is that we don't have a
> > strncpy_to_user() method at the moment. ]
> >
> > Static strings could be handled this way.
> >
> > [ Dynamic strings could be supported too with a few tricks,
> > although I doubt it matters in practice. ]
> >
>
> Ok, but this still limits returning error strings to the
> perf_event_open() syscall, not read(), ioctl() and such.

Yes - but this should be enough to handle most of the cases in
practice - because the richness of the various perf components
is mostly exposed via the perf syscall. By the time we get to
read() and ioctl() we are in a pretty well defined domain.

Also, I don't think people want the (small but nonzero) overhead
of extended error reporting for read or ioctl.

> I am fine with this change. However, I think it should be
> added separately from my inst_retired:prec_dist patch. It has
> a broader impact.

Most definitely.

Thanks,

Ingo