2013-04-24 23:04:59

by Andi Kleen

Subject: [PATCH 1/2] Fix perf LBR filtering

From: Andi Kleen <[email protected]>

The perf LBR code contains special logic to filter specific
instructions in software.

The LBR logs any instruction address, even if IP just faulted.
This means user space can control any address by just branching
to a bad address.

On a modern Intel system the only software filtering needed
is to include SYSCALL/RETs in PERF_SAMPLE_BRANCH_ANY_CALL/RETURN.
The hardware call filter only handles short calls, but syscall
is a far call. So it enables far call logging too, but removes
any other far calls (like interrupts) by looking at the instruction.
On older systems some additional software filtering is done too,
to work around a problem where CALLs can only be logged together
with indirect jumps.

It currently assumes that any address that looks like a kernel
address can be safely referenced.

But that is dangerous if it can be controlled by the user:
- It can be used to crash the kernel
- It allows probing any physical address for a small set of values
(valid call opcodes), which is an information leak.
- It may point to an MMIO region with side effects on read

So we cannot reference kernel addresses safely.

Possible options:

I) Disable FAR calls for ANY_CALL/RETURNS.
This just means syscalls are not logged
as calls. This also lowers the overhead of call logging.
This changes semantics slightly.
This is reasonable on Sandy Bridge and later, but would
cause additional problems on Nehalem and Westmere with
their additional filters.

II) Simply disable any filtering for kernel space.
This means interrupts in kernel space are reported as calls,
and on Nehalem/Westmere some indirect jumps are reported
as calls too.

III) Enumerate all the kernel entry points and check.
Any bad call must have a kernel entry point as its target.
This seemed too fragile to maintain.

IV) Enumerate all kernel code and check against these ranges.
Quite complicated, especially with the new kernel code JITs.
Would also allow probing for kernel code (defeating kernel
address randomization).

This patch implements II: simply disable software filtering for
any kernel address, which seemed the best option.
(I) would also be an option and was earlier implemented in
https://patchwork.kernel.org/patch/2468351/
(however that patch still leaves Nehalem/Westmere/Atom open to the problem).
(III) and (IV) appear too complicated and risky.

Should be applied to applicable stable branches too. The problem
goes back a long time.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..ae8c76f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -442,15 +442,27 @@ static int branch_type(unsigned long from, unsigned long to)
return X86_BR_NONE;

addr = buf;
- } else
- addr = (void *)from;
+ } else {
+ /*
+ * The LBR logs any address in IP, even if IP just faulted.
+ * This means user space can control any address. Since
+ * it's dangerous to reference a user-controlled kernel
+ * address, we don't do any software filtering for addresses
+ * that look like kernel addresses.
+ *
+ * On modern Intel systems (Sandy Bridge+) this implies that
+ * exceptions and interrupts in kernel space may be reported as
+ * calls.
+ */
+ return X86_BR_NONE;
+ }

/*
* decoder needs to know the ABI especially
* on 64-bit systems running 32-bit apps
*/
#ifdef CONFIG_X86_64
- is64 = kernel_ip((unsigned long)addr) || !test_thread_flag(TIF_IA32);
+ is64 = !test_thread_flag(TIF_IA32);
#endif
insn_init(&insn, addr, is64);
insn_get_opcode(&insn);
--
1.7.7.6
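
The change above boils down to one early-out before any memory is touched. A rough userspace model of that behavior (all constants and names here are made-up stand-ins for kernel_ip(), TASK_SIZE and branch_type(); not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical boundary standing in for the real kernel_ip() test:
 * on x86-64, addresses with the high bits set "look like" kernel
 * addresses.
 */
#define MODEL_KERNEL_START 0xffff800000000000ULL

#define BR_NONE 0
#define BR_CALL 1

static bool model_kernel_ip(uint64_t ip)
{
    return ip >= MODEL_KERNEL_START;
}

/*
 * Old behavior: dereference 'from' no matter where it points.
 * New behavior: refuse to classify a branch whose 'from' looks like
 * a kernel address, because user space can feed us any value.
 */
static int model_branch_type(uint64_t from)
{
    if (model_kernel_ip(from))
        return BR_NONE; /* never touch the memory at all */
    /* ... copy and decode the instruction at 'from' here ... */
    return BR_CALL;
}
```

The cost of the early-out is exactly the semantic change described in the comment: kernel-side far calls (interrupts, exceptions) are no longer filtered.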


2013-04-24 23:05:03

by Andi Kleen

Subject: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

From: Andi Kleen <[email protected]>

The PEBS documentation in the Intel SDM 18.6.1.1 states:

"""
PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
zero: AnyThread, Edge, Invert, CMask.
"""

Since we had problems with this earlier, don't allow cmask, any, edge,
or invert with raw PEBS events, except for the ones explicitly listed
as pebs_aliases.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 24 ++++++++++++++++++++----
2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 7f5c75c..e46f932 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -386,7 +386,7 @@ struct x86_pmu {
int pebs_record_size;
void (*drain_pebs)(struct pt_regs *regs);
struct event_constraint *pebs_constraints;
- void (*pebs_aliases)(struct perf_event *event);
+ bool (*pebs_aliases)(struct perf_event *event);
int max_pebs_events;

/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index cc45deb..126c971 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1447,7 +1447,7 @@ static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
intel_put_shared_regs_event_constraints(cpuc, event);
}

-static void intel_pebs_aliases_core2(struct perf_event *event)
+static bool intel_pebs_aliases_core2(struct perf_event *event)
{
if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
@@ -1472,10 +1472,12 @@ static void intel_pebs_aliases_core2(struct perf_event *event)

alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
+ return true;
}
+ return false;
}

-static void intel_pebs_aliases_snb(struct perf_event *event)
+static bool intel_pebs_aliases_snb(struct perf_event *event)
{
if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
/*
@@ -1500,7 +1502,9 @@ static void intel_pebs_aliases_snb(struct perf_event *event)

alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
event->hw.config = alt_config;
+ return true;
}
+ return false;
}

static int intel_pmu_hw_config(struct perf_event *event)
@@ -1510,8 +1514,20 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;

- if (event->attr.precise_ip && x86_pmu.pebs_aliases)
- x86_pmu.pebs_aliases(event);
+ if (event->attr.precise_ip) {
+ /*
+ * Don't allow unusual flags with PEBS, as that
+ * may confuse the CPU.
+ *
+ * The only exceptions are the explicitly listed pebs_aliases
+ */
+ if ((!x86_pmu.pebs_aliases || !x86_pmu.pebs_aliases(event)) &&
+ (event->attr.config & (ARCH_PERFMON_EVENTSEL_CMASK|
+ ARCH_PERFMON_EVENTSEL_EDGE|
+ ARCH_PERFMON_EVENTSEL_INV|
+ ARCH_PERFMON_EVENTSEL_ANY)))
+ return -EIO;
+ }

if (intel_pmu_needs_lbr_smpl(event)) {
ret = intel_pmu_setup_lbr_filter(event);
--
1.7.7.6
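
The new check is a mask over the filter bits of IA32_PERFEVTSELx. A small userspace sketch using the real bit positions from arch/x86/include/asm/perf_event.h (the helper name pebs_config_allowed() is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of IA32_PERFEVTSELx, as in arch/x86/include/asm/perf_event.h */
#define ARCH_PERFMON_EVENTSEL_EDGE  (1ULL << 18)
#define ARCH_PERFMON_EVENTSEL_ANY   (1ULL << 21)
#define ARCH_PERFMON_EVENTSEL_INV   (1ULL << 23)
#define ARCH_PERFMON_EVENTSEL_CMASK (0xFFULL << 24)

/*
 * Model of the check the patch adds: a precise (PEBS) event is
 * rejected when any filter field is set, unless a pebs_aliases
 * handler already rewrote the config (aliased == true).
 */
static bool pebs_config_allowed(uint64_t config, bool aliased)
{
    const uint64_t bad = ARCH_PERFMON_EVENTSEL_CMASK |
                         ARCH_PERFMON_EVENTSEL_EDGE |
                         ARCH_PERFMON_EVENTSEL_INV |
                         ARCH_PERFMON_EVENTSEL_ANY;

    if (aliased)
        return true;
    return (config & bad) == 0;
}
```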

2013-04-24 23:20:49

by Ben Hutchings

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Wed, Apr 24, 2013 at 04:04:53PM -0700, Andi Kleen wrote:
[....]
> Should be applied to applicable stable branches too. The problem
> goes back a long time.
>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
[...]

This is not the correct way to submit a change to stable.
See Documentation/stable_kernel_rules.txt

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

2013-04-24 23:24:21

by Greg KH

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Wed, Apr 24, 2013 at 04:04:53PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The perf LBR code has special code to filter specific
> instructions in software.
>
> The LBR logs any instruction address, even if IP just faulted.
> This means user space can control any address by just branching
> to a bad address.
>
> On a modern Intel system the only software filtering needed
> is to include SYSCALL/RETs in PERF_SAMPLE_BRANCH_ANY_CALL/RETURN.
> The hardware call filter only handles short calls, but syscall
> is a far call. So it enables far call logging too, but removes
> any other far calls (like interrupts) by looking at the instruction.
> On older systems some additional software filtering is done too,
> to work a problem that CALLs can be only logged together with
> indirect jumps.
>
> It currently assumes that any address that looks like a kernel
> address can be safely referenced.
>
> But that is dangerous if can be controlled by the user:
> - It can be used to crash the kernel
> - It allows to probe any physical address for a small set of values
> (valid call op codes) which is an information leak.
> - It may point to a side effect on read MMIO region
>
> So we cannot reference kernel addresses safely.
>
> Possible options:
>
> I) Disable FAR calls for ANY_CALL/RETURNS.
> This just means syscalls are not logged
> as calls. This also lowers the overhead of call logging.
> This changes semantics slightly.
> This is reasonable on Sandy Bridge and later, but would
> cause additional problems on Nehalem and Westmere with
> their additional filters.
>
> II) Simple disable any filtering for kernel space.
> This means interrupts in kernel space are reported as calls
> and on Nehalem/Westmere some indirect jumps are reported
> as calls too
>
> III) Enumerate all the kernel entry points and check.
> Any bad call must have a kernel entry point as to.
> This seemed to fragile to maintain.
>
> IV) Enumerate all kernel code and check for these ranges.
> Quite complicated, especially with the new kernel code JITs.
> Would also allow to probe for kernel code (defeating randomized kernel)
>
> This patch implements II: Simply disable software filtering for
> any kernel address, which seemed the best.
> (I) would be also an option and was earlier implemented in
> https://patchwork.kernel.org/patch/2468351/
> (however this patch still leaves Nehalem/Westmere/Atom open to the problem)
> (III) and (IV) appear too complicated and risky.
>
> Should be applied to applicable stable branches too. The problem
> goes back a long time.
>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_lbr.c | 18 +++++++++++++++---
> 1 files changed, 15 insertions(+), 3 deletions(-)


<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

2013-04-25 16:27:02

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Wed, Apr 24, 2013 at 04:04:53PM -0700, Andi Kleen wrote:
> Possible options:
>
> I) Disable FAR calls for ANY_CALL/RETURNS.
> This just means syscalls are not logged
> as calls. This also lowers the overhead of call logging.
> This changes semantics slightly.
> This is reasonable on Sandy Bridge and later, but would
> cause additional problems on Nehalem and Westmere with
> their additional filters.
>
> II) Simple disable any filtering for kernel space.
> This means interrupts in kernel space are reported as calls
> and on Nehalem/Westmere some indirect jumps are reported
> as calls too
>
> III) Enumerate all the kernel entry points and check.
> Any bad call must have a kernel entry point as to.
> This seemed to fragile to maintain.
>
> IV) Enumerate all kernel code and check for these ranges.
> Quite complicated, especially with the new kernel code JITs.
> Would also allow to probe for kernel code (defeating randomized kernel)

So why not do the same as we do for userspace? Copy MAX_INSN_SIZE bytes
and trap -EFAULT.

With Steven's recent NMI nesting stuff we should be able to take the fault and
do the fixup_exception() thing. Or alternatively we could software walk the
kernel pagetables.
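
The suggestion amounts to the probe_kernel_read() contract: either all MAX_INSN_SIZE bytes are copied or -EFAULT comes back, never a crash. A userspace sketch of that contract (a valid-range check stands in for the exception-table fixup the kernel would actually use):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSN_SIZE 16

/*
 * Model of a faulting-safe instruction fetch: copy MAX_INSN_SIZE
 * bytes if the source lies entirely inside "mapped" memory,
 * otherwise report -EFAULT instead of crashing.
 */
static int model_copy_insn(uint8_t *dst, const uint8_t *src,
                           const uint8_t *valid, size_t valid_len)
{
    if (src < valid || src + MAX_INSN_SIZE > valid + valid_len)
        return -EFAULT; /* the real code would take and fix up the fault */
    memcpy(dst, src, MAX_INSN_SIZE);
    return 0;
}
```

Note this only addresses the crash concern; as the follow-ups discuss, it does nothing about information leaks or MMIO read side effects.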

2013-04-25 16:41:05

by Andi Kleen

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

> So why not do the same as we do for userspace? Copy MAX_INSN_SIZE bytes
> and trap -EFAULT.

Read the whole description, then you'll know why that is insecure.

-Andi

2013-04-25 16:49:57

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Thu, Apr 25, 2013 at 06:41:00PM +0200, Andi Kleen wrote:
> > So why not do the same as we do for userspace? Copy MAX_INSN_SIZE bytes
> > and trap -EFAULT.
>
> Read the whole description, then you'll know why that is insecure.

You didn't actually explicitly mention it; you just said unconditional reading
of random addresses was bad.

You list:

> But that is dangerous if can be controlled by the user:
> - It can be used to crash the kernel
> - It allows to probe any physical address for a small set of values
> (valid call op codes) which is an information leak.
> - It may point to a side effect on read MMIO region

Trapping the read deals with the first. The second shouldn't be a problem since
we generally only allow kernel info for CAP_ADMIN; if we don't already for LBR
that needs to be fixed separately.

That only leaves the third.. can we discern MMIO maps from the kernel page tables?

2013-04-25 17:00:41

by Andi Kleen

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

> Traping the read deals with the first. The second shouldn't be a problem since
> we generally only allow kernel info for CAP_ADMIN; if we don't already for LBR
> that needs to be fixed separately.

Where is that check? I don't see it.

Also remember that precise == 2 can enable LBR implicitly.

> That only leaves the third.. can we descern MMIO maps from the kernel page tables?

In theory you could use some bits in the PTE for vmalloc, but it would need quite a
few changes.

Also there may be corner cases where MMIO is in the direct mapping or in
the kernel mapping.


-Andi

--
[email protected] -- Speaking for myself only.

2013-04-25 17:20:03

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Thu, Apr 25, 2013 at 07:00:37PM +0200, Andi Kleen wrote:
> > Traping the read deals with the first. The second shouldn't be a problem since
> > we generally only allow kernel info for CAP_ADMIN; if we don't already for LBR
> > that needs to be fixed separately.
>
> Where is that check? I don't see it.

Then that might need fixing.

> Also remember that precise == 2 can enable LBR implicitly.

Sure.. but it doesn't need the filter stuff. Now I completely forgot if it will
actually still use the filter muck.. /me goes check

I think intel_pmu_lbr_filter() will typically bail on the X86_BR_ALL test for
PEBS fixup, it might only end up in the filter code if precise_br_compat()
finds another LBR user compatible with the fixup.

> > That only leaves the third.. can we descern MMIO maps from the kernel page tables?
>
> In theory you could use some bits in the PTE for vmalloc, but it would need quite a
> few changes.
>
> Also there may be corner cases where MMIO is in the direct mapping or in
> the kernel mapping.

Hrmm... do we keep track of the MMIO regions somewhere at all?

2013-04-25 17:42:15

by Andi Kleen

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Thu, Apr 25, 2013 at 07:18:42PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 25, 2013 at 07:00:37PM +0200, Andi Kleen wrote:
> > > Traping the read deals with the first. The second shouldn't be a problem since
> > > we generally only allow kernel info for CAP_ADMIN; if we don't already for LBR
> > > that needs to be fixed separately.
> >
> > Where is that check? I don't see it.
>
> Then that might need fixing.

Ok.

BTW I would still argue that even paranoid == -1 should not allow
crashes. So even with that added it would still be a problem.

>
> > Also remember that precise == 2 can enable LBR implicitly.
>
> Sure.. but it doesn't need the filter stuff. Now I completely forgot if it will
> actually still use the filter muck.. /me goes check

True. It doesn't filter.

> I think intel_pmu_lbr_filter() will typically bail on the X86_BR_ALL test for
> PEBS fixup, it might only end up in the filter code if precise_br_compat()
> finds another LBR user compatible with the fixup.

Yes it does.

>
> > > That only leaves the third.. can we descern MMIO maps from the kernel page tables?
> >
> > In theory you could use some bits in the PTE for vmalloc, but it would need quite a
> > few changes.
> >
> > Also there may be corner cases where MMIO is in the direct mapping or in
> > the kernel mapping.
>
> Hrmm... do we keep track of the MMIO regions somewhere at all?

There's the non-cachable region tracking. But there's no guarantee an
MMIO region has to be in there; a driver may still rely just on MTRRs.
Also there may be MMIOs the kernel doesn't know about which just happen
to be somewhere in the direct mapping.

I don't see any reliable way to detect all MMIOs.

-Andi

2013-04-26 07:57:24

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

On Thu, Apr 25, 2013 at 07:42:11PM +0200, Andi Kleen wrote:
> There's the non cachable region tracking. But there's no guarantee a
> MMIO has to be in there, driver may still rely just on MTRRs. Also
> there may be MMIOs the kernel doesn't know about which just happen
> to be somewhere in the direct mapping.
>
> I don't see any reliable way to detect all mmios.

Hmm.. that's unfortunate.

OK, so how about we use something like:

is_kernel_text() || is_module_text_address()

And bail if not; that should at least make it work for the simple case or
normal text and not be worse than what you propose for JIT stuff.
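
A rough model of that filter (hypothetical address ranges stand in for _stext.._etext and per-module text sections; the real helpers are is_kernel_text() and is_module_text_address()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct text_range { uint64_t start, end; };

static bool in_range(const struct text_range *r, uint64_t addr)
{
    return addr >= r->start && addr < r->end;
}

/*
 * Model of the proposed filter: only decode addresses that fall in
 * core kernel text or in some module's text.  The per-module check
 * is a linear walk, which is the cost objection raised below.
 */
static bool model_may_decode(uint64_t addr,
                             const struct text_range *core,
                             const struct text_range *mods, int nmods)
{
    int i;

    if (in_range(core, addr))
        return true;
    for (i = 0; i < nmods; i++)
        if (in_range(&mods[i], addr))
            return true;
    return false;
}
```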

2013-04-26 19:46:39

by Andi Kleen

Subject: Re: [PATCH 1/2] Fix perf LBR filtering

>
> OK, so how about we use something like:
>
> is_kernel_text() || is_module_text_address()

is_module_text_address() has to walk all modules.
A random system with a distro kernel I checked has 101 modules loaded.

16 * 101 = too much

I don't think you want to spend that many cycles in the NMI
handler for a dubious feature. OK, in theory you could
add something with binary search, but that would be quite
a bit of effort and it would probably be challenging
to make it all NMI-safe.

Also it wouldn't work for all these new kernel JITs people
are doing of course.

Still think my patch is the best so far? (plus the missing
root check)

-Andi
--
[email protected] -- Speaking for myself only.

2013-04-29 22:16:26

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Thu, Apr 25, 2013 at 1:04 AM, Andi Kleen <[email protected]> wrote:
> From: Andi Kleen <[email protected]>
>
> The PEBS documentation in the Intel SDM 18.6.1.1 states:
>
> """
> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
> zero: AnyThread, Edge, Invert, CMask.
> """
>
> Since we had problems with this earlier, don't allow cmask, any, edge, invert
> as raw events, except for the ones explicitly listed as pebs_aliases.
>
Yes, this is indeed needed. Those filters are otherwise ignored. So you are
not measuring what you think you are.

Could you explain why those listed in the aliases avoid that problem?

> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event.h | 2 +-
> arch/x86/kernel/cpu/perf_event_intel.c | 24 ++++++++++++++++++++----
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
> index 7f5c75c..e46f932 100644
> --- a/arch/x86/kernel/cpu/perf_event.h
> +++ b/arch/x86/kernel/cpu/perf_event.h
> @@ -386,7 +386,7 @@ struct x86_pmu {
> int pebs_record_size;
> void (*drain_pebs)(struct pt_regs *regs);
> struct event_constraint *pebs_constraints;
> - void (*pebs_aliases)(struct perf_event *event);
> + bool (*pebs_aliases)(struct perf_event *event);
> int max_pebs_events;
>
> /*
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index cc45deb..126c971 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1447,7 +1447,7 @@ static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
> intel_put_shared_regs_event_constraints(cpuc, event);
> }
>
> -static void intel_pebs_aliases_core2(struct perf_event *event)
> +static bool intel_pebs_aliases_core2(struct perf_event *event)
> {
> if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
> /*
> @@ -1472,10 +1472,12 @@ static void intel_pebs_aliases_core2(struct perf_event *event)
>
> alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
> event->hw.config = alt_config;
> + return true;
> }
> + return false;
> }
>
> -static void intel_pebs_aliases_snb(struct perf_event *event)
> +static bool intel_pebs_aliases_snb(struct perf_event *event)
> {
> if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
> /*
> @@ -1500,7 +1502,9 @@ static void intel_pebs_aliases_snb(struct perf_event *event)
>
> alt_config |= (event->hw.config & ~X86_RAW_EVENT_MASK);
> event->hw.config = alt_config;
> + return true;
> }
> + return false;
> }
>
> static int intel_pmu_hw_config(struct perf_event *event)
> @@ -1510,8 +1514,20 @@ static int intel_pmu_hw_config(struct perf_event *event)
> if (ret)
> return ret;
>
> - if (event->attr.precise_ip && x86_pmu.pebs_aliases)
> - x86_pmu.pebs_aliases(event);
> + if (event->attr.precise_ip) {
> + /*
> + * Don't allow unusal flags with PEBS, as that
> + * may confuse the CPU.
> + *
> + * The only exception are explicitely listed pebs_aliases
> + */
> + if ((!x86_pmu.pebs_aliases || !x86_pmu.pebs_aliases(event)) &&
> + (event->attr.config & (ARCH_PERFMON_EVENTSEL_CMASK|
> + ARCH_PERFMON_EVENTSEL_EDGE|
> + ARCH_PERFMON_EVENTSEL_INV|
> + ARCH_PERFMON_EVENTSEL_ANY)))
> + return -EIO;
> + }
>
> if (intel_pmu_needs_lbr_smpl(event)) {
> ret = intel_pmu_setup_lbr_filter(event);
> --
> 1.7.7.6
>

2013-04-29 22:34:58

by Andi Kleen

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

> Could you explain why those listed in the aliases avoid that problem?

The aliases are widely used now (by perf), so as a special case they are
expected to work (at least as long as we enable them by model number).

-Andi

2013-04-29 23:05:11

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Tue, Apr 30, 2013 at 12:34 AM, Andi Kleen <[email protected]> wrote:
>> Could you explain why those listed in the aliases avoid that problem?
>
> The aliases are widely used now (by perf), so as a special case are expected
> to work now (at least as long as we enable it by model number)
>
I don't get that.

uops_retired:cmask=16:i with precise=2 does not measure that, based
on your comments: it just measures uops_retired, because the filters
are ignored with PEBS.

2013-05-01 11:51:37

by Ingo Molnar

Subject: Re: [PATCH 1/2] Fix perf LBR filtering


* Andi Kleen <[email protected]> wrote:

> >
> > OK, so how about we use something like:
> >
> > is_kernel_text() || is_module_text_address()
>
> is_module_text_address() has to walk all modules.
> A random system with a distro kernel I checked has 101 modules loaded.

I checked another random distro and it had 30 modules loaded.

> 16 * 101 = too much
>
> I don't think you want to spend that many cycles in the NMI
> handler for a dubious feature. Ok in theory you could
> add something with binary search, but that would be quite
> a bit of effort and it would be probably challenging
> to do that all NMI safe.

If anyone using LBR sees that overhead it can be improved. You or others
who care can improve it.

(Or if the hardware gets fully fixed, it can be removed for that
hardware.)

> Also it wouldn't work for all these new kernel JITs people are doing of
> course.

They'll miss the filtering, until they offer the proper
is_kernel_jit_text() primitive that is.

> Still think my patch is the best so far? (plus the missing root check)

Your patch simply disables the filtering for kernel addresses, it's the
worst of all options so far.

Anyway, what Peter asked for is a trivial change that solves the bug - are
you willing to respond to his review feedback and submit an updated
series?

Thanks,

Ingo

2013-05-01 11:56:00

by Ingo Molnar

Subject: Re: [PATCH 1/2] Fix perf LBR filtering


* Ingo Molnar <[email protected]> wrote:

> > I don't think you want to spend that many cycles in the NMI
> > handler for a dubious feature. Ok in theory you could
> > add something with binary search, but that would be quite
> > a bit of effort and it would be probably challenging
> > to do that all NMI safe.
>
> If anyone using LBR sees that overhead it can be improved. You or others
> who care can improve it.

Also, improving the performance of is_module_text_address() shouldn't be
too hard: an RCU rbtree should be enough.

It's NMI-safe: when the rb-tree is in the middle of a rotation we'll
simply not find the address and 'revert' to the worst case non-filtering
your patch does all the time, but in the likely case it does find it and
works as expected.

Thanks,

Ingo
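
A sketch of such a lookup, with a sorted, non-overlapping array standing in for the RCU-protected rbtree (names are made up; a real version would also need RCU-safe updates on module load/unload):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mod_range { uint64_t start, end; }; /* sorted by start */

/*
 * O(log n) module-text lookup instead of walking every module.
 * On a racing update the real RCU version would simply miss the
 * address and fall back to not filtering, as described above.
 */
static bool mod_text_lookup(const struct mod_range *tree, int n,
                            uint64_t addr)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (addr < tree[mid].start)
            hi = mid - 1;
        else if (addr >= tree[mid].end)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}
```

With 16 LBR entries and ~100 modules this turns ~1600 range checks per PMI into ~112.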

2013-05-02 07:39:01

by Peter Zijlstra

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Wed, Apr 24, 2013 at 04:04:54PM -0700, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> The PEBS documentation in the Intel SDM 18.6.1.1 states:
>
> """
> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
> zero: AnyThread, Edge, Invert, CMask.
> """
>
> Since we had problems with this earlier, don't allow cmask, any, edge, invert
> as raw events, except for the ones explicitly listed as pebs_aliases.

If it's a simple matter of crap in crap out without affecting anything
else we shouldn't do anything.

The only reason to interfere with the programming is if invalid programming can
affect others like with that MEM_*_RETIRED failure on ivb.

2013-05-06 17:44:22

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Thu, May 2, 2013 at 9:37 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Apr 24, 2013 at 04:04:54PM -0700, Andi Kleen wrote:
>> From: Andi Kleen <[email protected]>
>>
>> The PEBS documentation in the Intel SDM 18.6.1.1 states:
>>
>> """
>> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
>> zero: AnyThread, Edge, Invert, CMask.
>> """
>>
>> Since we had problems with this earlier, don't allow cmask, any, edge, invert
>> as raw events, except for the ones explicitly listed as pebs_aliases.
>
> If its a simple matter of crap in crap out without affecting anything else we
> shouldn't do anything.
>
The problem here is that you are sampling an instruction which did not cause
the event you are measuring. Remember that using cmask changes the
nature of what's being measured (from event counts to cycles).
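
Per the SDM, with a non-zero CMask the counter increments once per cycle in which the per-cycle event count meets the threshold, and Invert flips the comparison; so uops_retired:cmask=16:i counts cycles with fewer than 16 uops retired, not uops. A small model of that semantics (helper name made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * per_cycle[c] = number of event occurrences in cycle c.
 * cmask == 0: plain event counting.
 * cmask != 0: count cycles meeting the threshold; 'inv' inverts
 * the comparison (< instead of >=).
 */
static uint64_t count_with_cmask(const unsigned *per_cycle, int cycles,
                                 unsigned cmask, bool inv)
{
    uint64_t n = 0;
    int c;

    for (c = 0; c < cycles; c++) {
        if (cmask == 0)
            n += per_cycle[c];
        else if (inv ? per_cycle[c] < cmask : per_cycle[c] >= cmask)
            n++;
    }
    return n;
}
```

This is why silently dropping the filters under PEBS changes the meaning of the sample, not just its precision.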


> The only reason to interfere with the programming is if invalid programming can
> affect others like with that MEM_*_RETIRED failure on ivb.

2013-05-06 18:54:13

by Peter Zijlstra

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Mon, May 06, 2013 at 07:44:19PM +0200, Stephane Eranian wrote:
> On Thu, May 2, 2013 at 9:37 AM, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Apr 24, 2013 at 04:04:54PM -0700, Andi Kleen wrote:
> >> From: Andi Kleen <[email protected]>
> >>
> >> The PEBS documentation in the Intel SDM 18.6.1.1 states:
> >>
> >> """
> >> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
> >> zero: AnyThread, Edge, Invert, CMask.
> >> """
> >>
> >> Since we had problems with this earlier, don't allow cmask, any, edge, invert
> >> as raw events, except for the ones explicitly listed as pebs_aliases.
> >
> > If its a simple matter of crap in crap out without affecting anything else we
> > shouldn't do anything.
> >
> The problem here is that you are sampling an instruction which did not cause
> the event you are measuring. Remember that using cmask, changes the
> nature of what's being measured (from event to cycles).

Yeah.. I don't see the problem though. If you're using cmask and the like
you're supposed to know wth you're doing; which includes knowing your cpu and
what it thinks of such an event.

The only reason to disallow events is if they (badly) interact with other
counters.

2013-05-06 22:43:08

by Stephane Eranian

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Mon, May 6, 2013 at 8:52 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, May 06, 2013 at 07:44:19PM +0200, Stephane Eranian wrote:
>> On Thu, May 2, 2013 at 9:37 AM, Peter Zijlstra <[email protected]> wrote:
>> > On Wed, Apr 24, 2013 at 04:04:54PM -0700, Andi Kleen wrote:
>> >> From: Andi Kleen <[email protected]>
>> >>
>> >> The PEBS documentation in the Intel SDM 18.6.1.1 states:
>> >>
>> >> """
>> >> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
>> >> zero: AnyThread, Edge, Invert, CMask.
>> >> """
>> >>
>> >> Since we had problems with this earlier, don't allow cmask, any, edge, invert
>> >> as raw events, except for the ones explicitly listed as pebs_aliases.
>> >
>> > If its a simple matter of crap in crap out without affecting anything else we
>> > shouldn't do anything.
>> >
>> The problem here is that you are sampling an instruction which did not cause
>> the event you are measuring. Remember that using cmask, changes the
>> nature of what's being measured (from event to cycles).
>
> Yeah.. I don't see the problem though. If you're using cmask and the like
> you're supposed to know wth you're doing; which includes knowing your cpu and
> what it thinks of such an event.
>
But that implies that you'd know that on Intel precise mode uses PEBS
and that PEBS does not take cmask events. That seems to contradict the
philosophy of perf_events, where the kernel does the work for you.

> The only reason to disallow events is if they (badly) interact with other
> counters.

2013-05-07 06:48:11

by Ingo Molnar

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags


* Stephane Eranian <[email protected]> wrote:

> On Mon, May 6, 2013 at 8:52 PM, Peter Zijlstra <[email protected]> wrote:
> > On Mon, May 06, 2013 at 07:44:19PM +0200, Stephane Eranian wrote:
> >> On Thu, May 2, 2013 at 9:37 AM, Peter Zijlstra <[email protected]> wrote:
> >> > On Wed, Apr 24, 2013 at 04:04:54PM -0700, Andi Kleen wrote:
> >> >> From: Andi Kleen <[email protected]>
> >> >>
> >> >> The PEBS documentation in the Intel SDM 18.6.1.1 states:
> >> >>
> >> >> """
> >> >> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
> >> >> zero: AnyThread, Edge, Invert, CMask.
> >> >> """
> >> >>
> >> >> Since we had problems with this earlier, don't allow cmask, any, edge, invert
> >> >> as raw events, except for the ones explicitly listed as pebs_aliases.
> >> >
> >> > If its a simple matter of crap in crap out without affecting anything else we
> >> > shouldn't do anything.
> >> >
> >> The problem here is that you are sampling an instruction which did not cause
> >> the event you are measuring. Remember that using cmask, changes the
> >> nature of what's being measured (from event to cycles).
> >
> > Yeah.. I don't see the problem though. If you're using cmask and the like
> > you're supposed to know wth you're doing; which includes knowing your cpu and
> > what it thinks of such an event.
>
> But that implies that you'd know that on Intel precise mode uses PEBS
> and that PEBS does not take cmask events. That seems to contradict the
> philosophy of perf_events where the kernel does the work for you.

Also, this code only runs when the event is set up, so a bit of sanity
checking can only help, right?

Thanks,

Ingo

2013-05-07 08:44:50

by Peter Zijlstra

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Tue, May 07, 2013 at 12:43:04AM +0200, Stephane Eranian wrote:
> But that implies that you'd know that on Intel precise mode uses PEBS
> and that PEBS
> does not take cmask events. That seems to contradict the philosophy of
> perf_events
> where the kernel does the work for you.

This is basically raw event territory (hidden in a nice syntax). This is
very much a "you're on your own" case.

But it appears there's more behind this than was said; which makes this patch
submission duplicitous and trying to sneak one past the maintainers -- that's
very much not how things are done.

I'm not taking anything like this until the Changelog reflects the true
purpose.

2013-05-07 08:49:40

by Peter Zijlstra

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags

On Tue, May 07, 2013 at 08:48:05AM +0200, Ingo Molnar wrote:
> Also, this code only runs when the event is set up, so a bit of sanity
> checking can only help, right?

Nah, it's all very circumspect. In fact, while what Andi states is 'true':

> documentation in the Intel SDM 18.6.1.1 states:
>
> """
> PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
> zero: AnyThread, Edge, Invert, CMask.
> """

It is also true that Intel themselves gave us events that contradict this; look
at the intel_pebs_aliases*() functions.

This patch would make it impossible to manually create those events.

Further, there's something entirely different behind this.

2013-05-07 11:04:21

by Ingo Molnar

Subject: Re: [PATCH 2/2] perf, x86: Don't allow unusual PEBS raw flags


* Peter Zijlstra <[email protected]> wrote:

> On Tue, May 07, 2013 at 08:48:05AM +0200, Ingo Molnar wrote:
> > Also, this code only runs when the event is set up, so a bit of sanity
> > checking can only help, right?
>
> Nah, its all very circumspect. In fact; while what Andi states is 'true':
>
> > documentation in the Intel SDM 18.6.1.1 states:
> >
> > """
> > PEBS events are only valid when the following fields of IA32_PERFEVTSELx are all
> > zero: AnyThread, Edge, Invert, CMask.
> > """
>
> It is also true that Intel themselves gave us events that contradict this; look
> at the intel_pebs_aliases*() functions.
>
> This patch would make it impossible to manually create those events.

I missed that bit of obfuscation ...

> Further, there's something entirely different behind this.

Ok, I agree with you, this needs to be fully explained.

Thanks,

Ingo